Search Engines: How They Work

Search engines have three parts: (1) a mechanism that identifies web pages to be included in the database, (2) a mechanism that indexes those pages, and (3) a search mechanism with an interface, which scans the index for keywords.

When you conduct a search, you enter search terms and the engine searches its index. The index is a database that holds information about the web documents. Documents in which the search terms occur are presented as "hits."

Most search tools retrieve "hits" or "matches" by seeking occurrences of your search terms within their databases, that is, by attempting to match the terms against their indexes.
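
The matching step can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the toy index, the document IDs, and the function name find_hits are all made up for the example, and it assumes a hit must contain every search term.

```python
# Hypothetical toy index: each term maps to the set of document IDs containing it.
index = {
    "search": {"doc1", "doc2"},
    "engine": {"doc1"},
    "spider": {"doc2", "doc3"},
}

def find_hits(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start with the documents for the first term, then intersect with the rest.
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits

print(sorted(find_hits(index, "search engine")))  # prints ['doc1']
```

A real engine may also return documents that match only some of the terms; requiring all of them is a simplifying assumption here.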

'Bot, intelligent agent

A 'bot is an automated device (software) which may be programmed to search for terms (data "strings") matching certain criteria. 'Bots are also known as intelligent agents, spiders, crawlers, robots, or worms.

A 'bot identifies and notes the URLs of web pages to be included in the database. Then another 'bot scans the contents of those web documents and records each occurrence of a word and its position within the text. This is the information used to create an index.
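
As a rough sketch of that indexing step, the Python below records every word and its position for a handful of documents. The URLs, the sample text, and the function name build_index are assumptions made for illustration; the text is supplied directly rather than fetched from the web.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each word to {url: [positions]} across all documents."""
    index = defaultdict(dict)
    for url, text in documents.items():
        # Split the text into lowercase words; position 0 is the first word.
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(url, []).append(position)
    return index

# Hypothetical documents standing in for pages a 'bot has already collected.
documents = {
    "http://example.com/a": "Spiders crawl the web",
    "http://example.com/b": "The web is indexed by spiders",
}

index = build_index(documents)
print(index["web"])
# prints {'http://example.com/a': [3], 'http://example.com/b': [1]}
```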

'Bots crawl from one hypertext link to another.
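
Crawling from link to link can be pictured as a walk over a graph of pages. The sketch below uses a small made-up link table in place of the real web, so nothing is downloaded; the page names and the function name crawl are hypothetical, and visiting pages breadth-first is just one possible order.

```python
from collections import deque

# Hypothetical link table standing in for the web: page -> pages it links to.
links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["story", "about"],
    "story": [],
}

def crawl(start, links):
    """Visit every page reachable from start, following links breadth-first."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            # Skip pages already noted, so link loops do not trap the 'bot.
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

print(crawl("home", links))  # prints ['home', 'about', 'news', 'story']
```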

Listing Hits:

Every search engine has its own method for calculating relevance. Relevance is a rank assigned to the hits that your search term(s) have generated. Some search engines assign a number to each hit. That number, shown next to the URL, indicates the hit's "relevance ranking." Relevance is simply an estimate of the probability that the "hit" or "match" is on-target with your query.
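
Because actual ranking formulas are not published, the following Python is only a toy stand-in: it scores each document by the fraction of its words that match the search terms and lists the hits from highest to lowest score. The scoring rule, the document names, and the function names relevance and rank are all assumptions made for illustration.

```python
def relevance(text, query):
    """Fraction of the words in text that match a search term (toy score)."""
    words = text.lower().split()
    if not words:
        return 0.0
    terms = query.lower().split()
    matches = sum(words.count(term) for term in terms)
    return matches / len(words)

def rank(documents, query):
    """Return (score, name) pairs for matching documents, best first."""
    scored = [(relevance(text, query), name) for name, text in documents.items()]
    hits = [(score, name) for score, name in scored if score > 0]
    return sorted(hits, reverse=True)

# Hypothetical documents for the example.
documents = {
    "a": "web search engines index the web",
    "b": "a spider crawls the web",
    "c": "cooking recipes",
}

for score, name in rank(documents, "web"):
    print(f"{score:.2f} {name}")
```

Here document "a" ranks above "b" because a larger share of its words match, and "c" is not listed because it contains no match.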

Search engine masters do not divulge their secrets for calculating relevance. Appearing high in the major search engines' rankings on a topic means big business.

Some search engines index documents by looking only in certain fields, such as the title field, the first paragraph, and something called "meta-tags." Meta-tags allow the creator of a web site to add descriptive keywords which are not displayed in the actual web documents; they exist specifically to enhance retrieval of the document.
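
To show what reading a meta-tag might involve, the sketch below uses Python's standard html.parser module to pull the keywords out of a small sample page. The page, the keywords, and the class name MetaKeywordParser are made up for the example, and real engines may treat meta-tags quite differently or ignore them.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collect the keywords listed in <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())

# Hypothetical page: the keywords are in the source but not in the visible text.
page = """<html><head>
<title>Spiders</title>
<meta name="keywords" content="search engines, spiders, indexing">
</head><body><p>Visible text.</p></body></html>"""

parser = MetaKeywordParser()
parser.feed(page)
print(parser.keywords)  # prints ['search engines', 'spiders', 'indexing']
```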
