Thursday, December 28, 2006

Search Strategies for the Web and Databases Part I

Over the next few days I'll be posting, in parts, an edited version of a guide I have been giving my students since 2002. In this first part I'll try to give you a feel for the types of search resources available to you through the Web, and the logic behind them. The second part will list what I consider the best search engines and directories available. The third installment will include tips on how to keep abreast of new developments in the search and research space. The fourth and last installment will include a short tutorial on Boolean searching.

So - What kind of search resources are available on the Web?

Asked to name a search engine, most people would likely come up with either Yahoo! or Google. But Yahoo! and Google are as different from each other as night and day... Yahoo! is a directory - typically, directories work by cataloging entries that Web site owners and other interested parties submit. Therefore, if you don't submit your Web site to a given directory, you don't exist in that directory. But in the case of Yahoo!, even if you do submit your site, you may not appear in their directory... Why? A number of reasons. Yahoo! has cut back significantly on manpower in its directory department and has built up a backlog of several months (if not longer...), so they may only now be getting around to approving and cataloging sites submitted mid-year (if that). Some sites Yahoo!'s librarians won't approve at all - or won't approve for the classifications under which they were submitted. And Yahoo! has gotten into the nasty habit of trying to charge everyone for everything it once happily provided for free (because it used to earn umpteen-hundred million dollars a quarter from Web-vertising!). For an interesting perspective on Yahoo!'s new attitude, see for example Fast Company - http://www.fastcompany.com/online/60/jellis.html.

Google, on the other hand, is a true search engine. To appear in a search engine's results, you don't have to submit your site - in fact, in many instances, you can't. Rather, you need to structure your site in a way that makes it easy for search engine "spiders" and "crawlers" to find you. Spiders and crawlers are softbots (short for software robots...) which "live" on the Web, constantly following links from one Web site to the next and continuously feeding their "owners" (the search engine companies, typically...) with data. To help ensure that such softbots find your site, you can employ many different strategies, beginning with well-defined and well-targeted "meta tags" in your html source code. An entire industry has sprung up around the very idea of developing strategies that improve the chances such engines will find you - and that when individuals search for something relevant to you, your site will appear among the very top hits.
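
To make the spider idea a bit more concrete, here is a minimal sketch in Python of how a crawler might hop from link to link and pick up meta tags along the way. Everything in it is an assumption for illustration: the four-page TOY_WEB, its paths and the PageParser and crawl names are made up, and a real crawler would fetch pages over the network rather than read them from memory. It is not how any particular search engine works.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects outgoing links and named meta tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]


# Hypothetical four-page "Web", held in memory so nothing is fetched.
TOY_WEB = {
    "/home": '<meta name="keywords" content="tropical birds, parrots">'
             '<a href="/parrots">Parrots</a><a href="/about">About</a>',
    "/parrots": '<meta name="description" content="All about parrots">'
                '<a href="/home">Home</a>',
    "/about": '<a href="/home">Home</a>',
    "/orphan": '<meta name="keywords" content="never linked">',
}


def crawl(start):
    """Breadth-first traversal: read a page, then queue every unseen link."""
    seen = {start}
    queue = deque([start])
    index = {}
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(TOY_WEB.get(url, ""))
        index[url] = parser.meta  # what the softbot reports back
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("/home")
for url, meta in index.items():
    print(url, meta)
```

Notice that "/orphan" never shows up in the result: no other page links to it, so the crawler has no way of reaching it. That is the whole point of structuring your site so that softbots can find you.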

Search engines use various algorithms to index and rank the Web - Google ranks pages with an algorithm called PageRank (named after one of the company's co-founders, Larry Page), which scores a page by the links pointing to it - but most engines index only a fraction of the Web; by some estimates, more than 70% of the Web is never returned in a search, meaning that, effectively, unless you know of a specific site or page and go there directly, you'll probably never come across it. Google indexes more of the Web than any other search engine, and indexes it more efficiently, which is why you can count on getting results back lightnin' fast.
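
For the curious, the basic idea behind PageRank can be sketched in a few lines of Python. This is the simplified textbook version, not Google's actual system: a page's score is built from the scores of the pages that link to it, recomputed over and over until the numbers settle. The four-page TOY_LINKS graph is made up, and the damping value of 0.85 and the 50 iterations are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    links maps each page to the list of pages it links to; every link
    target must itself be a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B and D; nobody links to D.
TOY_LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(TOY_LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C comes out on top because three pages point to it, while D, which nobody links to, ends up at the bottom with only the baseline score.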

To summarize this section, then - there are basically two types of search resources on the Web - search engines and directories. Engines are better when you have only a general idea of what you are looking for; directories are better when you know exactly what you want. For example, if I am trying to locate the Web site for American Airlines, my best bet is probably to head over to Yahoo! and search for "American Airlines" - Yahoo! pulls the "American Airlines" category directly from its directory; click on it and you will be whisked off to a page that includes several links to "American Airlines" sites. But if I am looking for information about, say, "tropical birds" - I'm much better off heading over to Google - I'll get a much richer and more up-to-date result set. In other words, engines are good for a subject or concept search, while directories work better for "object" searches (company, person, product and so forth).
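
The difference between the two can also be sketched in code. In this rough Python illustration, a directory is a hand-built catalog you look things up in by name, while an engine is a word-by-word index built from the text of the pages themselves. All of the data is invented (the .example addresses, the page text, the single directory category), and real directories and engines are of course far more elaborate.

```python
from collections import defaultdict

# Hypothetical directory: categories filed by hand, looked up by exact name.
DIRECTORY = {
    "American Airlines": ["aa.example/home", "aa.example/careers"],
}

# Hypothetical crawled pages: address -> text found on the page.
PAGES = {
    "birds.example/parrots": "tropical birds such as parrots and macaws",
    "travel.example/costa-rica": "see tropical birds on a costa rica tour",
    "aa.example/home": "american airlines flights and reservations",
}


def directory_lookup(directory, name):
    """Return the sites filed under a category, or nothing if it isn't listed."""
    return directory.get(name, [])


def build_index(pages):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def engine_search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(directory_lookup(DIRECTORY, "American Airlines"))
print(directory_lookup(DIRECTORY, "tropical birds"))
print(engine_search(index, "tropical birds"))
```

The directory answers the "object" search for American Airlines straight away but has nothing filed under "tropical birds", whereas the engine turns up both pages that mention the subject.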
