| An Internet search engine is a software program designed to search for data on the Web. The search results are usually presented as a list, and the individual results are commonly called hits. The results may include web pages, images, and other types of files. Some search engines also collect data available in databases or open directories. Unlike Web directories, which are maintained by human editors, search engines operate automatically or combine algorithmic and human input.
Internet search engines operate by storing information about the many web pages they retrieve from the World Wide Web. These pages are retrieved by a web crawler, also known as a spider: an automated Web browser that follows every link it discovers. The content of each page is then analyzed to decide how it should be indexed. Words, for example, are extracted from titles, headings, or special fields called meta tags. Data about web pages are stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page (known as a cache) as well as data about the page, whereas others, such as AltaVista, store every word of every page they find. The cached page always contains the text that matched the search, because it is the copy that was actually indexed. It can therefore be very useful when it holds information that is no longer available elsewhere.
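The crawl-then-index process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular search engine is implemented: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`), and the function names `crawl`, `build_index`, and `lookup` are invented for this example.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("Search engines index web pages", ["b.example", "c.example"]),
    "b.example": ("A crawler follows every link", ["a.example"]),
    "c.example": ("Pages are stored in an index", []),
    "d.example": ("This page is not linked from anywhere", []),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(start, web):
    """Breadth-first traversal that follows every link reachable from start."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in web[url][1]:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

def build_index(urls, web):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url in urls:
        for word in tokenize(web[url][0]):
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Calling `crawl("a.example", TOY_WEB)` visits only the pages reachable by links, so `d.example` never enters the index. A query is then answered from the index alone, without touching the pages again.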
When a user types search words into the search field, the search engine checks its index and displays a list of the most suitable web pages according to its criteria, usually with a brief summary containing the title of the document and sometimes excerpts from the text. Some search engines offer an advanced feature called proximity search, which lets users specify the maximum distance between keywords.
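Proximity search can be illustrated with a short Python sketch. It assumes distance is measured in word positions, which is one common convention; the function name `within_distance` and its parameters are hypothetical and chosen only for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within_distance(text, first, second, max_gap):
    """Return True if `first` and `second` occur at most `max_gap`
    word positions apart somewhere in the text."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) <= max_gap
        for i in first_positions
        for j in second_positions
    )
```

A real engine would typically store word positions in its index rather than rescanning the page text, but the comparison of positions is the same idea.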
The usefulness of a search engine depends on the relevance of the results it provides. Millions of web pages may contain a particular keyword or phrase, but some of them will be far more relevant than others. Search engines therefore rank the results so that the "best" ones are shown first.
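As a rough illustration of ranking, the sketch below scores each page by how often the query words occur in it and lists the highest-scoring pages first. This term-frequency score is an assumption made for the example only; real search engines use their own, far more elaborate criteria. The names `score` and `rank` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(text, query):
    """Count how many times the query words occur in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[word] for word in tokenize(query))

def rank(pages, query):
    """Return the URLs of matching pages, best score first.
    Pages with a score of zero are treated as irrelevant and dropped;
    ties are broken alphabetically by URL."""
    scored = [(score(text, query), url) for url, text in pages.items()]
    relevant = [(s, url) for s, url in scored if s > 0]
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in relevant]
```

For example, with pages `{"x": "cats and dogs", "y": "cats cats cats", "z": "birds"}` and the query `"cats"`, page `y` is ranked ahead of `x`, and `z` is left out.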
How a search engine determines which pages are the best matches, and in what order the results should be shown, varies from one search engine to another. The techniques also change over time, as Internet usage changes and new techniques emerge. |