Tuesday, September 07, 2010
A good way to understand how Google indexes webpages is to think of the web as a large book with an impressive index that identifies where everything is located. When you run a search on Google, it checks the index of webpages it has compiled and determines the most relevant results to return to you.
The three key processes in delivering search results to you are:
  • Crawling: Does Google know about your site? Can it find it?
  • Indexing: Can Google index your site?
  • Serving: Does the site have good and useful content that is relevant to the user's search?

Crawling
Google uses automated software known as Googlebot to scour the internet for new and recently updated pages to be added to its index.
Google uses a vast set of computers to fetch (or “crawl”) billions of webpages. The program that performs this amazing job is known as Googlebot; other names for it are robot, bot or spider. Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site.
Indexing
When Googlebot begins to crawl the internet it starts with a list of URLs from previous sessions, augmented with Sitemap data provided by webmasters. When Googlebot lands on a page it takes the links from that page and adds them to its list of pages to crawl. New websites and updates to existing ones are noted and used to update the Google index.
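To make this concrete, here is a minimal sketch (not Googlebot's actual code) of how a crawler can keep a list of pages to crawl and add the links it discovers along the way. It assumes the third-party requests and BeautifulSoup libraries and a hypothetical seed URL:

```python
# A minimal, illustrative crawler sketch -- not Googlebot's real implementation.
# Assumes the third-party "requests" and "beautifulsoup4" packages are installed.
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(seed_urls, max_pages=10):
    """Fetch pages starting from seed_urls, collecting new links as we go."""
    frontier = deque(seed_urls)   # URLs still to crawl
    seen = set(seed_urls)         # URLs already discovered
    fetched = {}                  # url -> raw HTML

    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        try:
            response = requests.get(url, timeout=5)
        except requests.RequestException:
            continue  # skip pages we cannot reach
        fetched[url] = response.text

        # Add every link found on the page to the list of pages to crawl.
        soup = BeautifulSoup(response.text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched

# Example (hypothetical seed list):
# pages = crawl(["http://yourdomain.com/"])
```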
The good thing to note here is that you cannot pay Google to crawl a site more frequently; crawling is not part of Google's revenue-generating services.
Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, Google processes information included in key content tags and attributes, such as title tags and ALT attributes. Googlebot can process many, but not all, content types; for example, it cannot process the content of some rich media files, dynamic pages or iframes.
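As a rough illustration of what an index of "all the words it sees and their location on each page" looks like, here is a simplified inverted-index sketch over two made-up pages (the real index also weighs title tags, ALT attributes and far more):

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the pages (and word positions) where it appears."""
    index = defaultdict(list)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((url, position))
    return index

# Hypothetical crawled pages:
pages = {
    "http://example.com/dogs": "dog care tips and dog products",
    "http://example.com/cats": "cat care and cat food products",
}

index = build_index(pages)
print(index["care"])      # [('http://example.com/dogs', 1), ('http://example.com/cats', 1)]
print(index["products"])  # every page containing the word "products"
```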
Serving results
When you run a search on Google, it matches your query to relevant pages within its index and then displays the results in the order Google believes best answers your query. Relevancy is determined by over 200 factors, one of which is PageRank. PageRank is based on the importance of the incoming links from other sites: each link from another site to your own contributes to how well your page will rank. But don't think you can go out and get thousands of incoming links by automatically submitting to sites; not all links are equal in Google's eyes. Google works hard to ensure that you, the user, are given the best results for your search by identifying spam links and other practices that have a negative impact on search results. In simple terms, if you have a site that provides information on pet care and products, your PageRank will increase when you get incoming links from websites with similar interests and content.
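For the curious, the published PageRank formula is short enough to sketch. The example below is a simplified illustration over a made-up link graph, not Google's production ranking (which, as noted, combines PageRank with 200+ other factors):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # A page's rank comes from the pages linking to it,
            # each sharing its own rank among its outgoing links.
            incoming = sum(
                rank[other] / len(links[other])
                for other in pages
                if page in links[other]
            )
            new_rank[page] = (1 - damping) / len(pages) + damping * incoming
        rank = new_rank
    return rank

# Hypothetical link graph: a pet-care site linked to by two related sites.
links = {
    "petcare.com": ["dogfood.com"],
    "dogfood.com": ["petcare.com"],
    "catblog.com": ["petcare.com"],
}
print(pagerank(links))  # petcare.com ends up with the highest rank
```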
Before Google can index and rank your site well in search results, you need to ensure that Googlebot can crawl and index your site properly. Broken and dead links will have a negative impact on how your site ranks. It's important to use Google Webmaster Tools not only to confirm your site can be crawled but also to make sure you comply with Google's guidelines and improve your site's ranking.
Caching your Site

One of the major advantages I find with Google caching the content of your website is that if you ever mistakenly save over your index.html page, either on your main website or in one of its subfolders, you can retrieve that data without too much hassle through Google's Webmaster Tools.

1. Log into your Webmaster Tools Account.

2. Click on Statistics on the left hand side panel.

3. Click on Index Stats.

4. Then click on cache: to view the current cache of your site (cache:yourdomain.com).

From here it will show you a snapshot of your page as it looked the last time Googlebot crawled it.

What you want to do from here is save that page.

1. Go to 'File' in your browser's menu bar.

2. Go down to 'Save As' and save the page as index.html (or index1, or whatever you choose).

3. Use your FTP program to transfer it over to your hosting folder (a scripted alternative is sketched after these steps).

4. Using your website design tool, edit the page from what Google has cached, then save it back over your index.html file.
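Step 3 above mentions transferring the restored file with your FTP program; if you would rather script the upload, here is a minimal sketch using Python's standard ftplib module. The host name, login details and folder are placeholders to replace with your own:

```python
from ftplib import FTP

# Placeholder connection details -- substitute your own host, login and folder.
HOST = "ftp.yourdomain.com"
USER = "your-username"
PASSWORD = "your-password"

with FTP(HOST) as ftp:
    ftp.login(USER, PASSWORD)
    ftp.cwd("public_html")  # change to your hosting folder
    with open("index.html", "rb") as local_file:
        # Upload the page you saved from Google's cache.
        ftp.storbinary("STOR index.html", local_file)
```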

By this point your original website, which was lost, should be restored. You may want to make a backup of it now.
