60 Day Sandbox for Google & AskJeeves; MSN Indexes Quickest, Yahoo Next
The search engine listing delays that have come to be called the Google Sandbox effect are real in practice, in one form or another, at each of the four top-tier search engines. MSN, it seems, has the shortest indexing delay at 30 days. This article is the second in a series following the spiders through a brand new web site, beginning on May 11, 2005, when the site first went live under a newly purchased domain name.

Previously we looked at the first 35 days and detailed the crawling behavior of Googlebot, Teoma, MSNbot and Slurp as they traversed the pages of this new site. We discovered that each robot spider displays distinctly different behavior in crawling frequency and similarly differing indexing patterns.

For reference, about 15 to 20 new pages are added to the site daily, and each is linked from the home page for a day. Site structure is non-traditional, with no categories and a linking structure tied to author pages listing their articles, as well as a "related articles" index that links to relevant pages containing similar content.

So let's review where we are with each spider, look at pages crawled, and compare pages indexed by engine.

The AskJeeves spider, Teoma, has crawled most of the pages on the site, yet has indexed no pages 60 days later at this writing. This is clearly a site aging delay that's modeled on Google's Sandbox behavior. The Teoma spider from Ask.com has crawled more pages on this site than any other engine over the 60-day period, but it now appears to be tired of crawling: it has not returned since July 13, its first break in 60 days.

In the first two days, Googlebot gobbled up 250 pages and then didn't return until 60 days later. It has not indexed even a single page in the 60 days since that initial crawl. But Googlebot has shown renewed interest in crawling the site since this crawling case study was published on several high-traffic sites. Now Googlebot is looking at a few pages each day.
So far it has looked at no more than about 20 pages, at a decidedly lackluster pace, a true "crawl" that will keep it occupied for years if it continues that slowly.

MSNbot crawled timidly for the first 45 days, looking over 30 to 50 pages daily, and it did not start at all until it found a robots.txt file. We had neglected to post one to the site for the first week, then bobbled the ball as we changed the site structure and failed to implement robots.txt in the new subdomains until day 25 - and THEN MSNbot didn't return until day 30. If we have learned little else about initial crawls and indexing, we have seen that MSNbot relies heavily on the robots.txt file, and that proper implementation of that file will speed crawling.

MSNbot is now crawling with enthusiasm at anywhere between 200 and 800 pages daily. As a matter of fact, we had to use a "crawl-delay" directive in the robots.txt file after MSNbot began hitting 6 pages per second last week. The MSN index now shows 4905 pages 60 days into this experiment. Cached pages change weekly. MSNbot has apparently found that it likes how we changed the page structure to include a new feature which links to questions from several other article pages.

Slurp gets strangely inactive, then alternately hyperactive, for periods of time. The Yahoo crawler will look at 40 pages one day and 4000 the next, then simply look at the home page for a few days, then jump back in for 3000 pages the next day and go back to only reviewing robots.txt for two days. Consistency is not a curse suffered by Slurp. Yahoo now shows 6 pages in its index; one is an error page and another is an "index/of" page, because we have not posted a home page to several subdomains. But Slurp has crawled easily 15,000 pages to date.

Lessons learned in the first 60 days on a new site follow:

1) Google crawls 250 pages on first discovery of links to the site. Then it doesn't return until it finds more links, and it crawls slowly. Google has failed to index the new domain for 60 days.
2) Yahoo looks for error pages, and once it finds bad links it will crawl them ceaselessly until you tell it to stop. Then it won't crawl at all for weeks, until it crawls heavily one day and lightly the next in random fashion.

3) MSNbot requires robots.txt files, and once it decides it likes your site, it may crawl too fast, requiring "crawl-delay" instructions in that robots.txt file. Implement immediately.

4) Bad bots can strain resources and hit too many pages too quickly until you tell them to stay out. We banned 3 bots outright after they slammed our servers for a day or two. We noted that "aipbot" crawled first, then "BecomeBot" came along, and then "Pbot" from Picsearch.com crawled heavily, looking for image files we don't have. Bad bots, stay out. It is best to implement robots.txt exclusions for all but the top engines if their crawlers strain your server resources. We considered excluding the Chinese search engine Baidu.com when it began crawling heavily early on. We don't expect much traffic from China, but why exclude one billion people? Especially since Google is rumored to be considering a possible purchase of Baidu.com as an entry to the Chinese market.

The bottom line is that all engines seem to delay indexing of new domain names for at least thirty days. Google so far has delayed indexing THIS new domain for 60 days since first crawling it. AskJeeves has crawled thousands of pages while indexing none of them. MSN indexes faster than all other engines but requires a robots.txt file. Yahoo's Slurp has crawled on again, off again for 60 days, but has indexed only six of the 15,000 or more pages crawled to date.

We seem to have settled that there is a clear indexing delay, but whether this site specifically is "Sandboxed" and whether delays apply universally is less clear. Many webmasters claim that they have been indexed fully within 30 days of first posting a new domain.
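The robots.txt measures from lessons 3 and 4 can be sketched and checked before they are posted to a live site. The example below is a minimal sketch, not the file used on this site: the bot names come from the article, while the 10-second delay, the `load_rules` helper and the test paths are assumptions made for illustration. It uses Python's standard `urllib.robotparser` module to confirm how the rules would be read. Note that Crawl-delay is a non-standard extension that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt reflecting lessons 3 and 4. Bot names are taken from
# the article; the 10-second delay is an illustrative value, not the real one.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: Pbot
Disallow: /

User-agent: *
Disallow:
"""


def load_rules(text):
    """Parse robots.txt text (hypothetical helper) and return the parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Report what each crawler would be allowed to do with a sample path.
for bot in ("msnbot", "Googlebot", "aipbot", "BecomeBot", "Pbot"):
    allowed = rules.can_fetch(bot, "/articles/example.html")
    delay = rules.crawl_delay(bot)
    print(f"{bot}: allowed={allowed}, crawl_delay={delay}")
```

The same file has to be served from the root of every subdomain, since the article found that MSNbot waited on subdomains where robots.txt was missing.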
We'd love to see others track spiders through new sites following launch and document their results publicly, so that indexing and crawling behavior can be proven.

© Copyright July 18, 2005 Mike Banks Valentine

Mike Banks Valentine is a search engine optimization specialist who operates WebSite101 eCommerce Tutorial and will continue reports of this case study chronicling search indexing of Publish101 Article Resource.
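For readers who want to track spiders through their own new site, the daily per-crawler page counts reported in this article could be tallied from a server access log roughly as follows. This is a sketch under stated assumptions, not the method used for this case study: it assumes the Apache "combined" log format, and the `count_crawler_hits` function, the regular expression and the sample log lines (IPs, paths, agent strings) are all hypothetical.

```python
import re
from collections import Counter

# Substrings that identify the four crawlers followed in this article.
CRAWLERS = ("Googlebot", "Teoma", "msnbot", "Slurp")

# Assumes Apache "combined" format: the date sits in [...] and the user agent
# is the last quoted field on the line. Only those two parts are captured.
LINE_RE = re.compile(r'\[(?P<day>[^:\]]+):[^\]]*\].*"(?P<agent>[^"]*)"\s*$')


def count_crawler_hits(lines):
    """Return a Counter keyed by (day, crawler) with the number of requests."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        for crawler in CRAWLERS:
            if crawler.lower() in agent:
                hits[(match.group("day"), crawler)] += 1
                break
    return hits


# Hypothetical sample lines; every value here is made up for illustration.
SAMPLE_LOG = [
    '192.0.2.1 - - [11/Jul/2005:06:25:01 -0700] "GET /robots.txt HTTP/1.1" 200 120 "-" "msnbot/1.0"',
    '192.0.2.1 - - [11/Jul/2005:06:25:03 -0700] "GET /a1.html HTTP/1.1" 200 5120 "-" "msnbot/1.0"',
    '192.0.2.2 - - [12/Jul/2005:09:10:44 -0700] "GET / HTTP/1.1" 200 8040 "-" "Googlebot/2.1"',
    '192.0.2.3 - - [12/Jul/2005:09:11:02 -0700] "GET / HTTP/1.1" 200 8040 "-" "Mozilla/4.0"',
]

hits = count_crawler_hits(SAMPLE_LOG)
for (day, crawler), count in sorted(hits.items()):
    print(day, crawler, count)
```

Matching on user-agent substrings is a simplification: agents can be spoofed, so counts gathered this way are indicative rather than proof of which engine made a request.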