Playing in Googlebot's Sandbox with Slurp, Teoma, & MSNbot - Spiders Display Differing Personalities

There has been endless webmaster speculation and worry about the so-called "Google Sandbox" - the indexing time delay for new domain names - rumored to last for at least 45 days from the date of first "discovery" by Googlebot. This recognized listing delay came to be called the "Google Sandbox effect."

Ruminations on the algorithmic elements of this sandbox time delay have ranged widely since the indexing delay was first noticed in spring of 2004. Some believe it comes down to one single element of good search engine optimization, such as linking campaigns. Link building has been the focus of most discussion, but others have pointed to the size of a new site, its internal linking structure, or a simple fixed time delay as the most relevant algorithmic elements.

Rather than contribute to this speculation and further muddy the Sandbox, we'll be looking at a case study of a site on a new domain name, established May 11, 2005, and at its specific site structure, submission activity, and external and internal linking. We'll see how this plays out in search engine spider activity vs. indexing dates at the top four search engines.

Ready? We'll give dates and crawler action in daily lists and see how this all plays out on this single new site over time.

* May 11, 2005 - Basic text on large site posted on newly purchased domain name, going live by day's end. Search-friendly structure implemented with text linking, making full discovery of all content possible by robots. Home page updated with 10 new text content pages added daily. Submitted site at Google's "Add URL" submission page.

* May 12 - 14 - No visits by Slurp, MSNbot, Teoma or Googlebot. (Slurp is Yahoo's spider and Teoma is from Ask Jeeves.) Posted link on WebSite101 to new domain at Publish101.com.

* May 15 - Googlebot arrives and eagerly crawls 245 pages on new domain after looking for, but not finding the robots.txt file. Oooops! Gotta add that robots.txt file!

* May 16 - Googlebot returns for 5 more pages and stops. Slurp greedily gobbles 1480 pages and 1892 bad links! Those bad links were caused by our email masking meant to keep out bad bots. How ironic that Slurp likes these.

* May 17 - Slurp finds 1409 more masking links & only 209 new content pages. MSNbot visits for the first time and asks for robots.txt 75 times during the day, but leaves when it finds that file missing! Finally get around to adding robots.txt by day's end, which stops Slurp crawling email masking links and lets MSNbot know it's safe to come in!

* May 23 - Teoma spider shows up for the first time and crawls 93 pages. Site gets slammed by BecomeBot, a spider that hits a page every 5 to 7 seconds and strains our resources with 2409 rapid fire requests for pages. Added BecomeBot to robots.txt exclusion list to keep 'em out.

* May 24 - MSNbot has not shown up for a week, ever since finding the robots.txt file missing. Slurp is showing up every few hours, looking at robots.txt and leaving again without crawling anything now that it is excluded from the email masking links. BecomeBot appears to be honoring the robots.txt exclusion but asks for that file 109 times during the day. Teoma crawls 139 more pages.

* May 25 - We realize that we need to re-allocate server resources and change the database design. This requires changes to URLs, which means all previously crawled pages are now bad links! Implement subdomains and wonder what now? Slurp shows up and finds thousands of new email masking links, as the robots.txt was not moved to the new directory structures. Spiders are getting error pages upon new visits. Scampering to put out fires after wide-ranging changes to the site, we miss this for a week. Spider action is spotty for 10 days until we fix robots.txt.

* June 4 - Teoma returns and crawls 590 pages! No others.

* June 5 - Teoma returns and crawls 1902 pages! No others.

* June 6 - Teoma returns and crawls 290 pages. No others.

* June 7 - Teoma returns and crawls 471 pages. No others.

* June 8 - 14 - Odd spider behavior, looking at robots.txt only.

* June 15 - Slurp gets thirsty, gulps 1396 pages! No others.

* June 16 - Slurp still thirsty, gulps 1379 pages! No others.
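
Daily tallies like the ones in this list can be pulled from a raw server access log. Here is a minimal Python sketch of one way to do that. It assumes the log is in the common "combined" format, and it identifies spiders by a lowercase substring of the user-agent string. Both are assumptions, since the log format and tooling behind the numbers above aren't specified. The function name `tally_spiders` is made up for illustration.

```python
import re
from collections import Counter

# Lowercase substrings used to spot each spider in the user-agent field.
# Real user-agent strings vary by crawler version, so treat these as assumptions.
SPIDERS = {
    "Googlebot": "googlebot",
    "Slurp": "slurp",
    "MSNbot": "msnbot",
    "Teoma": "teoma",
    "BecomeBot": "becomebot",
}

# Combined log format (assumed):
# host ident user [day:time zone] "METHOD path proto" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally_spiders(lines):
    """Count spider requests per (day, spider, kind).

    kind is 'robots.txt' for requests for that file (whatever the status),
    'error' for other requests answered with a 4xx/5xx status (bad links),
    and 'page' for everything else.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        for name, needle in SPIDERS.items():
            if needle in agent:
                if match.group("path") == "/robots.txt":
                    kind = "robots.txt"
                elif match.group("status").startswith(("4", "5")):
                    kind = "error"
                else:
                    kind = "page"
                counts[(match.group("day"), name, kind)] += 1
                break  # one spider per request
    return counts
```

Feeding it the lines of a log file (for example `tally_spiders(open("access.log"))`, where the file name is hypothetical) gives a counter you can sort by day. Keeping 'robots.txt' and 'error' apart from 'page' lets a spider that only asks for robots.txt, or one that is chewing through bad links, stand out from one that is crawling real content.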

So we'll take a break here at the five-week point and take note of the very different behavior of the top crawlers. Googlebot visits on two consecutive days, looks at a substantial number of pages, and then doesn't return for over a month. Slurp finds bad links and seems addicted to them, as it stops crawling good pages until it is told to lay off the bad liquor (er, that is, links) by a robots.txt that slaps Slurp to its senses. MSNbot visits looking for that robots.txt and won't crawl any pages until told what NOT to do by the robots.txt file. Teoma just crawls like crazy, takes breaks, then comes back for more.

This behavior may imitate the differing personalities of the software engineers who designed them. Teoma is tenacious and hard working. MSNbot is timid, needs instruction and some reassurance that it is doing the right thing, and picks up pages slowly and carefully. Slurp has an addictive personality and performs erratically on a random schedule. Googlebot takes a good long look and leaves. Who knows whether it will be back, and when.

Now let's look at indexing by each engine. As of this writing on July 7, each engine shows differing indexing behavior as well. Google shows no pages indexed, although it crawled 250 pages nearly two months ago. Yahoo has three pages indexed, in a clear aging routine that doesn't list any of the nearly 8,000 pages it has crawled to date (not all itemized above). MSN has 187 pages indexed while crawling fewer pages than any of the others. Ask Jeeves has crawled more pages to date than any other search engine, yet has not indexed a single page.

Each of the engines will show the number of pages indexed if you use the query operator "site:publish101.com" without the quotes. MSN 187 pages, Ask none, Yahoo 3 pages, Google none.

The daily activity in the three weeks since June 16, not listed above, has not varied dramatically, with Teoma crawling a bit more than other engines, Slurp erratically up and down, and MSNbot slowly gathering 30 to 50 pages daily. Googlebot is absent.

The linking campaign has been minimal, with posts to discussion lists, a couple of articles and some blog activity. Looking back over this time, it is apparent that a listing delay is actually quite sensible from the view of the search engines. Our site restructuring and bobbled robots.txt implementation seem to have abruptly stalled crawling, but the indexing behavior of each engine displays a distinctly different policy for each major player.

The sandbox is apparently not just Google's playground, but it is certainly tiresome after nearly two months. I think I'd like to leave for home, have some lunch and take a nap now.

Back to class before we leave for the day, kiddies. What did we learn today? Watch early crawler activity, and be certain to implement robots.txt early and adjust it often for bad bots. Oh yes, and the sandbox belongs to all search engines.
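
Since a missing or misplaced robots.txt caused most of the trouble in this case study, here is a hedged Python sketch of how the rules could be checked before the spiders find the mistakes for you. It uses the standard library's `urllib.robotparser`. The rules and the `/mailmask/` path are hypothetical stand-ins, since the real URLs of the email masking links aren't given here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep BecomeBot out entirely, and keep every other
# spider away from the email masking links. "/mailmask/" is an assumed path.
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow: /mailmask/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser, with no network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Spot-check a few (spider, URL) pairs. The URLs are made up for illustration.
checks = [
    ("BecomeBot", "http://publish101.com/articles/page.html"),
    ("Slurp", "http://publish101.com/mailmask/abc"),
    ("Slurp", "http://publish101.com/articles/page.html"),
    ("msnbot", "http://publish101.com/"),
]
for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:10} {url:45} {verdict}")
```

Spiders look for robots.txt at the root of each host, so after a move to subdomains the same check is worth running against the file served from every new hostname.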

Mike Banks Valentine is a search engine optimization specialist who operates http://WebSite101.com and will continue reporting on this case study chronicling search indexing of http://Publish101.com
