Google Data APIs Protocol

Google Data APIs Protocol - interesting move from Google. I (and others) have thought for a while that combining OpenSearch’s read capabilities with the Atom Publishing Protocol’s write capabilities would create a very powerful API, and that’s roughly what Google is doing here.

It’s great to see the OpenSearch support (a bit: they’re using startIndex, totalResults and itemsPerPage), but I’d like to see them use more of it. Some of what they’re doing runs contrary to how OpenSearch works (that’s not a problem per se): they’re using predefined query parameter names such as q and max-results (and a URL folder for categories) rather than letting each service use whichever names it wants and then declare them in an OpenSearch Description file.
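
For anyone who hasn’t seen how OpenSearch handles this: a description file publishes a URL template, and the client fills in named parameters such as {searchTerms}, {count?} and {startIndex?} (the ? marks a parameter as optional). The server can therefore call its query parameters whatever it likes. Here’s a minimal Python sketch of that substitution step. The template and example.com URL are hypothetical and borrow the q and max-results names only for illustration. The sketch also skips finer points of the spec, such as resolving namespace prefixes on parameter names.

```python
import re
from urllib.parse import quote

# Matches {name} or {name?}; the trailing ? marks an optional parameter.
PARAM = re.compile(r"\{([\w:.-]+)(\?)?\}")

def fill_template(template: str, values: dict) -> str:
    """Fill an OpenSearch-style URL template with percent-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # unknown optional parameters are left empty
        raise KeyError(f"no value for required parameter {name!r}")
    return PARAM.sub(substitute, template)

# Hypothetical template, as it might appear in a description's <Url> element.
TEMPLATE = ("http://example.com/feeds?q={searchTerms}"
            "&max-results={count?}&start-index={startIndex?}")
```

The point is that the client only needs to know searchTerms, count and startIndex. Whether the server spells them q or max-results is the description file’s business.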

In that same vein, it would be nice to see them make use of autodiscovery, as Atom, RSS, OpenSearch, and others do. At first glance I would say the autodiscovered documents could be OpenSearch Descriptions, but I may be wrong about that.
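
Autodiscovery works by putting <link> elements in a page’s <head>. Feeds use rel="alternate" with type application/atom+xml or application/rss+xml, and OpenSearch Descriptions use rel="search" with type application/opensearchdescription+xml. Below is a rough sketch of a client finding them with Python’s standard html.parser. The class and function names are my own, and the sketch doesn’t resolve relative hrefs against the page URL.

```python
from html.parser import HTMLParser

# (rel, type) pairs used by feed and OpenSearch autodiscovery.
DISCOVERABLE = {
    ("alternate", "application/atom+xml"): "atom",
    ("alternate", "application/rss+xml"): "rss",
    ("search", "application/opensearchdescription+xml"): "opensearch",
}

class LinkFinder(HTMLParser):
    """Collects <link> elements that advertise a feed or search description."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: value or "" for name, value in attrs}
        media_type = attr.get("type", "").strip().lower()
        # rel may hold several space-separated values.
        for rel in attr.get("rel", "").lower().split():
            kind = DISCOVERABLE.get((rel, media_type))
            if kind:
                self.found.append((kind, attr.get("href", ""), attr.get("title", "")))
                break

def discover(html: str):
    """Return (kind, href, title) tuples for every discoverable link."""
    finder = LinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```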

One interesting thing to note is that they mention that startIndex is 1-based (which is true), and then show an example with a value of “0”. Sounds like DeWitt is right: it does need to handle 0-based numbers too; even Google is making that mistake.
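
The off-by-one is an easy mistake to make. Below is a small, hypothetical helper that turns a page number into a startIndex window. first_index is 1 for OpenSearch’s default counting. A client that also wants to cope with zero-based servers, as DeWitt suggests, can pass 0. (Later OpenSearch 1.1 drafts let a description declare this with an indexOffset attribute, which defaults to 1.)

```python
def page_window(page: int, per_page: int, first_index: int = 1):
    """Return (first, last) result indices for a 1-based page number.

    first_index is the index of the very first result: 1 for OpenSearch's
    default counting, 0 for a server that counts from zero.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = first_index + (page - 1) * per_page
    return start, start + per_page - 1
```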

DeWitt brings up some other good points as well.

Via Niall.

Update: Joe Gregorio weighs in

Update 2: Marc Canter (one of my favourite bloggers) finds this linkworthy ;-) although I’m always amazed at the spellings my name gets.

There is no XML without namespaces

Yes, this makes two blog posts today, and yes, I’m going to talk about XML again.

I’ve suspected this for a while, but hadn’t looked into it. Thanks to Sam Ruby, I see that someone has: “Who knows an XML document from a hole in the ground?” shows that, indeed, a lot of RSS/Atom parsers are not reading XML as XML… or at least, they’re not understanding the namespaces.

This wasn’t a problem when most feeds were bare-bones, and before Atom. Now, only a couple of years later than I expected, all sorts of data and metadata are starting to be put into feeds, with lots of different namespaces.

This is one of those things where, if you’re a feed reader and you don’t understand namespaces, you are broken and need to be fixed. There’s no way around it, end of story.
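
To make the point concrete, the two documents below are the same Atom feed as far as XML is concerned. One uses a default namespace, and the other binds the Atom namespace to an arbitrary prefix. Here’s a Python sketch comparing a naive string-matching “parser” with one that uses the standard xml.etree.ElementTree and matches on the namespace URI. The helper names are just for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Two equivalent feeds: only the namespace prefix differs.
FEED_DEFAULT = f'<feed xmlns="{ATOM}"><title>Example</title></feed>'
FEED_PREFIXED = f'<a:feed xmlns:a="{ATOM}"><a:title>Example</a:title></a:feed>'

def title_naive(xml_text):
    """Broken: looks for the literal text <title>, ignoring namespaces."""
    start = xml_text.find("<title>")
    if start == -1:
        return None
    end = xml_text.find("</title>", start)
    return xml_text[start + len("<title>"):end]

def title_namespaced(xml_text):
    """Correct: matches on the namespace URI, whatever prefix is used."""
    root = ET.fromstring(xml_text)
    element = root.find(f"{{{ATOM}}}title")  # Clark notation: {uri}localname
    return element.text if element is not None else None
```

Only the second function finds the title in both documents. A real feed reader needs the same treatment for every extension namespace it cares about.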

That being said, I’m much more optimistic now than I was about those fixes actually happening. Phil Ringnalda’s Atom title tests really did help and pushed a lot more readers into supporting it properly. Now let’s see some real XML parsing.

FeedBurner is cool, but…

FeedBurner offers a very attractive service, and their new FeedFlare is just one part of that. But please, FeedBurner… when a user changes some settings, record the time of that change and only allow that change to affect new items. Not that it isn’t fun to see a whole lot of my subscriptions suddenly all marked as unread.

What’s wrong with MSN’s RSS search

News from Luigi about RSS search from MSN leads me to think MSN Search knows what they’re doing. Or not.

They are integrating RSS/Atom search right into their web search. This is good. But… they’re displaying RSS feeds as regular search results, without modification. That means that when you click on an RSS feed result, you are taken to (surprise) the RSS feed itself, which, most of the time, is not in a human-readable format. Hello, usability? This is acceptable for a major engine to put out for average web users?? Additionally, the ‘cache’ link for RSS feed results shows a somewhat more human-readable version, but it could definitely be improved.

Virtually all, if not all, RSS feeds today are representations of existing web pages. It would make a little more sense to point to those, and provide an additional link to the actual RSS feed. This is essentially what all the major RSS search engines are smart enough to do, including Feedster, Blogdigger, and Bloglines.

Actually, those engines are all smarter still, since they’re indexing individual RSS items rather than treating whole RSS feeds as single documents. That’s a huge benefit of RSS: the individual items have been separated, and usually come with important metadata, like the date. MSN doesn’t seem to make use of this at all, although admittedly their implementation is new.
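
Here’s a sketch of what item-level indexing might look like, assuming plain RSS 2.0 without namespaces. It produces one index document per <item>, pointing at the item’s own web page (falling back to the channel’s link) and keeping the feed URL as a secondary link. The field names are hypothetical, and a real engine would also have to handle Atom and RSS 1.0.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def items_to_documents(rss_text: str, feed_url: str):
    """Split one RSS 2.0 feed into one search document per <item>."""
    channel = ET.fromstring(rss_text).find("channel")
    site_url = channel.findtext("link")  # the web page the feed represents
    documents = []
    for item in channel.findall("item"):
        date_text = item.findtext("pubDate")
        documents.append({
            "title": item.findtext("title", default=""),
            # Send searchers to the item's web page, not to the raw feed.
            "url": item.findtext("link") or site_url,
            "feed": feed_url,  # kept as a secondary link
            # RSS 2.0 dates use the RFC 822 format that email.utils parses.
            "published": parsedate_to_datetime(date_text) if date_text else None,
        })
    return documents
```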

It does appear that Yahoo has got some of this right, linking to web pages (and sometimes the web pages of the individual items). However, the same does not apply to their search API, which uses RSS feed URLs as the main link for each search result and does not provide the web page alternative. Which leads me to today’s news of Yahoo Weather in RSS. They’re even including some excellent data in there, but they’ve defined a new namespace for some of this data, http://xml.weather.yahoo.com/ns/rss/1.0, which currently returns a 404. Also, it’d be nice if they had used the prefix ‘weather’ rather than ‘yweather’. And I strongly suspect that there are existing weather vocabularies they could have used instead.

Anyway, back to MSN Search: they’ve introduced two new syntaxes, feed:, to look for RSS feeds, and hasfeed:, to restrict results to web pages that have RSS feeds. That seems okay, but the way to use the syntax is odd. For example: feed: site:bbc.co.uk. It has been semi-standard for a while to use the form syntax:foo, as in the site: keyword used there, but the new syntax is just syntax: on its own. Confusing. Let’s just assume that this is temporary, until there’s a web-based interface for choosing to find RSS feeds.
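
For illustration only (this is not how MSN actually parses queries), here is a hypothetical tokenizer for the syntax:foo convention. Note that a bare feed: has to be special-cased as a flag with no value, whereas something like site:bbc.co.uk fits the usual pattern.

```python
def parse_query(query: str):
    """Split a query into plain terms, name:value operators and bare name: flags."""
    terms, operators, flags = [], {}, set()
    for token in query.split():
        name, sep, value = token.partition(":")
        if not sep or not name:
            terms.append(token)                # ordinary search word
        elif value:
            operators[name.lower()] = value    # conventional form, e.g. site:bbc.co.uk
        else:
            flags.add(name.lower())            # bare form, e.g. feed:
    return terms, operators, flags
```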

</rant>