Total Number of Subscribers: 464   

 

  Date: 16th October 2009

 Compiled by: M Sathya Kumar  


Search Engines — some fundamentals

‘Not all explorers leave home’ says the caption of a popular advertisement. True indeed. Some explorers may opt to explore their world from the desktop at home ! Just connect to the Internet and you have access to an estimated 5 billion pages (well, there is no consensus on this figure !) of information that almost doubles every two years. Although exploring the World Wide Web and hunting for relevant information is by itself a stimulating task, the objective is accomplished only when one finds pertinent information.

Searching for information on the Internet with the help of search engines is a popular way of finding information (although not the best way in all cases). All those who have tried it will agree that finding relevant information efficiently is not an easy task. It is difficult for several reasons. Firstly, there is the sheer vastness of the information. However, what makes the task of finding high-quality, relevant information really difficult is that this vast sea of information is not indexed in any standard way. Searching becomes a matter of guessing what words would be used to describe the information that you are looking for. This article discusses the fundamental principles by which search engines search for information on the Internet.

Search Engines :

A. What is a search engine ?

Search engines are huge databases that contain the full text (every word) of the web pages they link to.

B. Who builds this database ?

The database of a search engine is built by computer robot programs called ‘spiders’ and not by human selection. Although it is said they ‘crawl’ the web in their hunt for pages to include, in truth they stay in one place. They find the pages for potential inclusion by following the links in the pages they already have in their database (i.e., already ‘know about’). These spiders carry out their exercise of following links and creating new records in the database according to programmed time intervals.
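The link-following behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions: the miniature "web" (TOY_WEB), its URLs, and the function name crawl are all hypothetical, and a real spider would fetch pages over the network rather than read them from a dictionary.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/home": ["a.example/about", "b.example/news"],
    "a.example/about": ["a.example/home"],
    "b.example/news": ["c.example/post"],
    "c.example/post": [],
    "d.example/orphan": ["a.example/home"],  # no other page links to this one
}

def crawl(seed_urls, web):
    """Return every page reachable by following links from the seed pages."""
    known = set(seed_urls)      # pages the spider already 'knows about'
    queue = deque(seed_urls)    # pages whose links still have to be read
    while queue:
        url = queue.popleft()
        for link in web.get(url, []):   # stand-in for fetching the page
            if link not in known:
                known.add(link)
                queue.append(link)
    return known

discovered = crawl(["a.example/home"], TOY_WEB)
print(sorted(discovered))
```

Note that the page nobody links to never shows up in the result, even though it exists in the toy web.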

C. How does a page, to which no other page links, ever get an entry in the database ?

Quite obviously, if a web page is never linked to from any other page, search engine spiders cannot find it. The only way a brand new page (one that no other page has ever linked to) can get into a search engine is for some person to send its URL to the search engine company with a request that the new page be included. All search engine companies offer ways to do this.
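A small sketch of why submission matters, assuming a toy link table (LINKS) and a hypothetical helper named reachable: a submitted URL simply becomes one more starting point for the spider.

```python
def reachable(seeds, links):
    """Return all pages that can be reached from the seeds by following links."""
    known, stack = set(seeds), list(seeds)
    while stack:
        page = stack.pop()
        for target in links.get(page, []):
            if target not in known:
                known.add(target)
                stack.append(target)
    return known

# Hypothetical link table: "brand_new" links out, but nothing links to it.
LINKS = {"home": ["news"], "news": [], "brand_new": ["home"]}

before = reachable(["home"], LINKS)               # spider alone
after = reachable(["home", "brand_new"], LINKS)   # URL submitted by a person

print("brand_new" in before, "brand_new" in after)
```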

D. After finding pages, what’s next?

After spiders find pages, they pass them on for ‘indexing’. In very simple words, ‘indexing’ means relating (linking) the record of a page to the words it contains. The concept is similar to the subject index at the end of a book, which lists, in order of topic, all pages in the book that cover each topic. Indexing is done by another computer program, again without any human intervention. This program identifies the text and other content in the page and stores it in the search engine database’s files so that the database can be searched by keyword. Search engines may index all of the terms on a given web page, or they may index only the terms within the first few sentences, the website title, or the document’s meta tags.
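The book-index analogy corresponds to what is commonly called an inverted index: a table from each word to the pages that contain it. The following is a minimal sketch, assuming two made-up pages (PAGES) and a hypothetical function build_index; real indexers also handle word forms, positions, and much more.

```python
import re
from collections import defaultdict

# Hypothetical page texts, keyed by an illustrative page name.
PAGES = {
    "page1": "Search engines index the web",
    "page2": "Spiders crawl the web for pages",
}

def build_index(pages):
    """Map every word to the set of pages on which it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

index = build_index(PAGES)
print(sorted(index["web"]))      # pages containing the word "web"
print(sorted(index["spiders"]))  # pages containing the word "spiders"
```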

E. Finally, how are the words searched ?

Contrary to popular perception, a search engine does not really search the World Wide Web directly. When you enter a word or a phrase in a search engine and click the ‘Search’ button, the search engine looks for your word or phrase in its database of the full text of web pages, selected from the billions of web pages residing on servers out there. This means that you are always searching a somewhat stale copy of the real web page. When you click on a link in a search engine’s results, you retrieve the current version of the page from its server.
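To make the ‘stale copy’ point concrete, here is a small sketch in which the query runs against an index built at crawl time, not against the live page. The site name, page texts, and the functions build_index and search are assumptions made for illustration.

```python
def build_index(snapshot):
    """Build a word -> pages table from the page texts as they are right now."""
    index = {}
    for url, text in snapshot.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return pages whose stored copy contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

live_web = {"shop.example": "winter sale on boots"}
index = build_index(live_web)                        # copy taken at crawl time
live_web["shop.example"] = "summer sale on sandals"  # the real page changes later

stale_hits = search(index, "winter boots")                  # uses the old copy
fresh_hits = search(build_index(live_web), "winter boots")  # after re-indexing

print(stale_hits, fresh_hits)
```

The old index still reports the page for "winter boots" until the spider revisits it and the index is rebuilt.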

F. How are Search results ranked ?

All pages in the search results are ranked by a computer algorithm. For example, Google uses a complex mathematical algorithm based on the number and quality of links to a particular web page to rank that web page in its search results. However, certain search engines rank pages in their search results on the basis of fees paid by the owners of those websites.
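The idea of ranking by ‘number and quality of links’ can be illustrated with a simplified scoring loop in the spirit of the publicly described PageRank idea. This is not Google's actual algorithm: the toy link graph, the function name link_scores, the damping value of 0.85, and the iteration count are all assumptions chosen for the sketch.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = list(links)
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}    # start with equal scores
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page without outgoing links spreads its score over all pages.
            targets = outgoing if outgoing else pages
            share = damping * scores[page] / len(targets)
            for target in targets:
                new[target] += share
        scores = new
    return scores

# Hypothetical link graph: three pages link to "hub"; nothing links to "c".
LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

scores = link_scores(LINKS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A link from a high-scoring page passes on more score than a link from a low-scoring one, which is one way to read ‘quality of links’.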

G. How much of the World Wide Web does a search engine database cover ?

Well, again contrary to popular perception, even the most popular search engines cover not more than 65% of the World Wide Web. One reason is the time lag between the moment a web page is put up on a server somewhere and the moment its record is created in the search engine’s database, either after the search engine’s spider locates it or after some person requests that the link be included. Another reason is that some types of pages and links are excluded from most search engines by policy. Still others are excluded because search engine spiders cannot access them. Pages that are excluded are referred to as the ‘Invisible Web’: what you don’t see in search engine results. The Invisible Web is estimated to be two to three or more times bigger than the visible web !

H. When should I use a search engine ?

A search engine is a good tool for finding information, especially when :

1. You have a narrow or obscure topic or idea to research

2. You are looking for a specific site

3. You want to search the full text of millions of pages

4. You want to retrieve a large number of documents on your topic, i.e., you are inclined to do some hardcore research.

5. You want to search for particular types of documents, file types, source locations, languages, date last modified, etc.

I. Meta-Search Engines, now what are those ?

Meta-search engines (also known as multi-threaded engines) pass on the search query to several major search engines at once. They do not maintain their own database of web pages. Instead, they act as a middle agent, passing on the query to the major engines and then returning the results. Because the major search engines often produce very different results, meta-search engines provide a quick way to determine which engines are retrieving the best match for your information need.

Because search engines vary in ability to interpret complex searches, meta-search engines work best with simple searches.
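The ‘middle agent’ role of a meta-search engine can be sketched as follows. The two engines here are hypothetical stand-ins that return canned results; a real meta-search engine would send the query to actual search services, and the merging rule (pages reported by more engines come first) is just one possible choice.

```python
# Hypothetical stand-ins for major search engines, returning canned results.
ENGINES = {
    "engine_one": lambda query: ["x.example", "y.example"],
    "engine_two": lambda query: ["y.example", "z.example"],
}

def meta_search(query, engines):
    """Send one query to every engine and merge the results."""
    merged = {}
    for name, engine in engines.items():
        for url in engine(query):
            merged.setdefault(url, []).append(name)  # remember who found it
    # Pages found by more engines come first; ties keep first-seen order.
    return sorted(merged.items(), key=lambda item: -len(item[1]))

results = meta_search("search engine basics", ENGINES)
for url, found_by in results:
    print(url, found_by)
```

Keeping track of which engine returned each page is what lets you compare the engines against one another.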

J. Good Search Engines ?

1. Google — it’s my personal favourite.

2. AltaVista

3. HotBot

4. Excite

Having discussed the fundamentals of search engines, in forthcoming issues we shall take up tips and tricks for finding information using Google. Kindly note, our focus now shifts from ‘searching’ to ‘finding’ !

Article by Mr. Nikunj S. Shah, Chartered Accountant

 


Rewards waiting for feedback at
E-mail : smarttrainee@gmail.com


www.primeonlinetest.com

Disclaimer: We believe that the information contained in this e-zine is true.

Prime Academy - In Pursuit of excellence

 
