randomness

Work seems to be keeping me rather busy.

Yesterday I got around to fixing a several-month-old bug with my University of Waterloo search engine. Turns out the problem was Yahoo having changed their query parser. The query I was sending used to be

search terms (site:example.com OR site:example2.com OR ... site:exampleN.com)

However, example.com wasn’t showing up in the results. The fix was adding a space before the closing parenthesis:

search terms (site:example.com OR site:example2.com OR ... site:exampleN.com )
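For anyone building this kind of query in code, here is a minimal Python sketch of the workaround. The function name `build_site_query` and its parameters are made up for illustration, and the trailing space only reflects the behaviour I observed, not anything Yahoo documents.

```python
def build_site_query(terms, sites):
    """Restrict `terms` to a list of sites (hypothetical helper)."""
    # Join the site: filters with an uppercase OR.
    site_clause = " OR ".join(f"site:{site}" for site in sites)
    # Assumption: the space before ")" is the workaround described above.
    return f"{terms} ({site_clause} )"


print(build_site_query("search terms", ["example.com", "example2.com"]))
# search terms (site:example.com OR site:example2.com )
```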

I wish Yahoo would publicly document all of their advanced search syntax, including the maximum query length.

I’ve been meaning to do another OpenSearch Update post. I’ve recently started adding some of these to del.icio.us. Noticing lots of non-English blog posts on OpenSearch lately, which is very cool. Today someone asked about including thumbnails. I’ve replied suggesting Media RSS but asking for consensus (although my email still needs to be moderated).

Lots of neat stuff in the mapping space lately. Thanks to Mikel Maron, Virtual Earth now has georss feeds.

So for years I’ve been largely ignoring the social networking websites. Or to be more accurate, reading up on them a lot, but not actually using them. Among other things, I don’t want to waste my time, nor provide a lot of my personal data to some walled garden. Regarding the latter, PeopleAggregator has been out for a while, and I hadn’t gotten around to congratulating Marc and Phillip. Anyhow, Facebook came to my school (this year I believe) and I’ve found that I’m actually using it. Not much, but more than I’ve ever used another similar site. Unlike the first generation of these websites, it actually has a point to it. I’m still resisting uploading photos to it (if I annotate those photos, am I ever going to be able to export that? highly unlikely) and I don’t like using it for messaging, because it won’t be searchable and integrated with my email or instant messaging services. Amusingly enough, I do think Facebook will actually succeed in making money. Hmm... I guess I don’t have any major point to make here.

yahoo site searching syntax

Here’s a summary of what I’ve learned when restricting Yahoo! search to specific websites.

  • always use brackets

    Prevents errors, especially with boolean operators. Not really necessary in this example, but nevertheless: (search terms) (site:example.com)

  • use capitals for boolean

    site:example.com OR site:example2.com

  • specify field names always

    Use site:example.com OR site:example2.com, not site:(example.com OR example2.com)

  • how to specify paths

    To specify a website that isn’t a (sub)domain, use site:example.com inurl:folder/folder2 for the website example.com/folder/folder2/

    One problem is that if you are specifying a domain name and a website with a path, results for the latter will be ranked higher, because they match both site: and inurl:. To compensate for that, you could use a different method: inurl:example_com/folder/folder2. Note the use of the underscore instead of a dot for the last (and only the last) dot in the domain name. Also, in rare circumstances, this will find pages that are not in example.com, but have those terms in the URL somewhere.

  • specifying multiple folders in a site

    site:example.com (inurl:folder OR inurl:folder2)
    or
    inurl:example_com/folder OR inurl:example_com/folder2

  • specifying multiple sites with paths

    this can be derived from previous points, but here goes:
    site:example.com OR (site:example2.com inurl:folder)
    or more advanced:
    site:example.com OR site:example2.com OR (site:example3.com (inurl:folder OR inurl:folder2 OR inurl:folder3))

  • Use OR for multiple exclusion

    NOT (site:example.com OR site:example2.com)

  • putting it all together

    (search terms) (site:example.com OR (site:example2.com inurl:folder)) NOT (site:sub.example.com OR (site:example.com inurl:somefolder))
    Note that this restricts the search to two websites (one with a path) while excluding pages from the first website that are in a specific subdomain or folder.
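The rules above can be sketched as a few small Python helpers. This is only an illustration under my own assumptions: the names `site_clause`, `any_of`, `inurl_only` and `build_query` are hypothetical, nothing here calls a Yahoo API, and the output is just the query string you would send.

```python
def site_clause(domain, folders=()):
    """site:domain, optionally narrowed to one or more folders with inurl:."""
    if not folders:
        return f"site:{domain}"
    paths = [f"inurl:{folder}" for folder in folders]
    # Several folders are ORed together inside their own brackets.
    path_part = paths[0] if len(paths) == 1 else "(" + " OR ".join(paths) + ")"
    return f"(site:{domain} {path_part})"


def any_of(clauses):
    """Bracket a group of clauses joined by an uppercase OR."""
    return "(" + " OR ".join(clauses) + ")"


def inurl_only(domain, folder):
    """Alternative path form: swap only the last dot for an underscore.

    Assumes `domain` contains at least one dot.
    """
    head, _, tld = domain.rpartition(".")
    return f"inurl:{head}_{tld}/{folder}"


def build_query(terms, include, exclude=()):
    """Bracketed terms, then included sites, then an optional NOT group."""
    query = f"({terms}) {any_of(include)}"
    if exclude:
        query += f" NOT {any_of(exclude)}"
    return query


print(build_query(
    "search terms",
    include=[site_clause("example.com"), site_clause("example2.com", ["folder"])],
    exclude=[site_clause("sub.example.com"), site_clause("example.com", ["somefolder"])],
))
print(inurl_only("example.com", "folder/folder2"))
```

The first print reproduces the "putting it all together" query, and the second gives inurl:example_com/folder/folder2. Keeping each rule in its own helper means the bracketing and the uppercase OR live in one place each, which is where I kept making mistakes by hand.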