SEO is Search Engine Optimization

Search Engine Spiders Lost Without Guidance - Post This Sign!

The robots.txt file follows the Robots Exclusion Standard, which web crawlers/robots are expected (but not required) to honor. It tells them which files and directories you want them to stay OUT of on your site. Not all crawlers/bots follow the exclusion standard; some will continue crawling your site anyway. I like to call them "Bad Bots" or trespassers. We block them by IP exclusion, which is another story entirely.

This is a very simple overview of robots.txt basics for webmasters. For a complete and thorough lesson, visit http://www.robotstxt.org/

To see the proper format for a somewhat standard robots.txt file, look directly below. That file should be at the root of the domain, because that is where the crawlers expect it to be, not in some secondary directory.

Below is the proper format for a robots.txt file ----->

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /

--------> End of robots.txt file
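To check how a standards-following parser actually reads the file above, here is a small sketch using Python's standard-library `urllib.robotparser` (Python 3.6 or later for `crawl_delay`). It parses the same rules offline, with no network request, and asks which crawler may fetch which path. The domain `www.example.com` and the sample paths are placeholders. Python's parser is only one implementation, so individual search engines may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt file from this article, reproduced verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /
"""


def load_rules(text):
    """Parse robots.txt text directly instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Placeholder site; only the path part matters for the rule matching.
for agent, path in [
    ("Googlebot", "/index.html"),
    ("Googlebot", "/images/logo.gif"),
    ("msnbot", "/images/logo.gif"),
    ("aipbot", "/index.html"),
]:
    allowed = rules.can_fetch(agent, "http://www.example.com" + path)
    print(f"{agent:10} {path:18} allowed={allowed} delay={rules.crawl_delay(agent)}")
```

Run as written, the results are as follows:

- Googlebot may fetch /index.html but is refused /images/logo.gif.
- aipbot is refused everything.
- msnbot gets its 10-second delay, but it is *allowed* into /images/.

That last result reflects the exclusion standard's matching rule: a crawler that finds a record naming it obeys only that record. So msnbot, Teoma and Slurp never see the * record's Disallow lines. If the intent is for them to skip /cgi-bin/, /images/ and /group/ as well, those Disallow lines would have to be repeated inside each of their records.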

This tiny text file is saved as a plain text document and ALWAYS with the name "robots.txt" in the root of your domain.

A quick review of the listed information from the robots.txt file above follows. The "User-agent: msnbot" record is for MSN, Slurp is from Yahoo and Teoma is from AskJeeves. The others listed are "Bad" bots that crawl very fast, to nobody's benefit but their own, so we ask them to stay out entirely. The * asterisk is a wildcard meaning "all" crawlers/spiders/bots should stay out of the files or directories listed under it.

The bots given the instruction "Disallow: /" should stay out entirely. Those given "Crawl-delay: 10" are the ones that crawled our site too quickly, bogging it down and overusing server resources. Google crawls more slowly than the others and doesn't require that instruction, so it is not specifically listed in the robots.txt file above (Googlebot does not obey Crawl-delay in any case). The Crawl-delay instruction is only needed on very large sites with hundreds or thousands of pages. The wildcard asterisk * record applies to every crawler, bot and spider that has no record of its own, which includes Googlebot here. A crawler that finds a record naming it specifically, such as msnbot above, follows only that record and ignores the * record.

We gave the "Crawl-delay: 10" instruction to crawlers that were requesting as many as 7 pages every second, asking them to slow down. The number is in seconds, and you can change it to suit your server capacity, based on their crawling rate. Ten seconds between page requests is far more leisurely and stops them from asking for more pages than your server can dish up.
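A quick worked calculation shows what the delay buys. It assumes a crawler that would otherwise keep requesting pages nonstop all day, which is a worst case rather than a measured figure. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_requests_at_rate(pages_per_second):
    """Requests per day if a crawler keeps up a steady rate around the clock."""
    return pages_per_second * SECONDS_PER_DAY


def daily_requests_with_delay(crawl_delay_seconds):
    """Upper bound on requests per day for a crawler that honors Crawl-delay."""
    return SECONDS_PER_DAY / crawl_delay_seconds


fast = daily_requests_at_rate(7)        # the 7 pages/second described above
polite = daily_requests_with_delay(10)  # Crawl-delay: 10
print(f"{fast:,.0f} vs {polite:,.0f} requests/day ({fast / polite:.0f}x fewer)")
```

At 7 pages per second, a single crawler could ask for 604,800 pages a day. A 10-second delay caps it at 8,640 pages a day, which is 70 times fewer.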

(You can discover how fast robots and spiders are crawling by looking at your raw server logs, which show the precise time of each page request. The logs are available from your web host, or you can ask your web or IT person for them. If you have server access, your server logs can be found in the root directory, and you can usually download compressed server log files by calendar day right off your server. You'll need a utility that can expand compressed files to open and read those plain text raw server log files.)
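Here is a minimal Python sketch of that log check. It expands a compressed log and reports the busiest single second for each user agent. It **assumes the Apache "combined" log format**, which many hosts use but not all. The pattern, the function names and the sample lines (with their documentation-range IP addresses and "ExampleBot" agent) are hypothetical. The combined format records time to the second, so this measures requests per second, not finer.

```python
import gzip
import re
from collections import Counter

# Hypothetical pattern for the Apache "combined" log format; adjust it if
# your host writes a different layout.
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def read_log_lines(path):
    """Yield lines from a log file, expanding it first if it is gzip-compressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle


def peak_requests_per_second(lines):
    """Map each user agent to the most requests it made within a single second."""
    hits = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match:  # silently skip lines that are not in combined format
            hits[(match["agent"], match["time"])] += 1
    peaks = {}
    for (agent, _second), count in hits.items():
        peaks[agent] = max(peaks.get(agent, 0), count)
    return peaks


# Made-up sample lines standing in for a real log file.
sample = [
    '192.0.2.10 - - [10/May/2005:13:55:36 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/May/2005:13:55:36 -0700] "GET /b.html HTTP/1.1" 200 734 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/May/2005:13:55:36 -0700] "GET /c.html HTTP/1.1" 200 401 "-" "ExampleBot/1.0"',
    '192.0.2.20 - - [10/May/2005:13:55:40 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]
print(peak_requests_per_second(sample))
```

With a real file, the call would be `peak_requests_per_second(read_log_lines("access_log.gz"))`, where the file name is a placeholder. Agents that peak at several requests per second are candidates for a Crawl-delay line.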

To see the contents of any robots.txt file, just type /robots.txt after any domain name. If the site has that file up, you will see it displayed as a text file in your web browser. Click on the link below to see that file for Amazon.com:

http://www.Amazon.com/robots.txt

You can see the contents of any website's robots.txt file that way.
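Crawlers only ever look for the file at the root of a host, so the robots.txt address for any page can be derived mechanically. Here is a minimal Python sketch. `robots_url` and `fetch_rules` are hypothetical helper names. `fetch_rules` needs network access, and what a given site returns will change over time.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the robots.txt location for the site that hosts page_url.

    The path, query and fragment are discarded because crawlers only look
    for robots.txt at the root of the host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_rules(page_url):
    """Download and parse a site's robots.txt (requires network access)."""
    parser = RobotFileParser(robots_url(page_url))
    parser.read()
    return parser
```

For example, `robots_url("http://www.Amazon.com/gp/help")` gives `http://www.Amazon.com/robots.txt`, the same address as the link above.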

The robots.txt shown above is what we currently use at Publish101 Web Content Distributor, which launched in May of 2005. We did an extensive case study and published a series of articles on crawler behavior and the indexing delays known as the Google Sandbox. That Google Sandbox Case Study is highly instructive on many levels for webmasters everywhere about the importance of this often ignored little text file.

One thing we didn't expect to glean from the research into indexing delays (known as the Google Sandbox) was the importance of robots.txt files to quick and efficient crawling by the spiders from the major search engines. Nor did we expect the number of heavy crawls from bots that do no earthly good to the site owner. These bots crawl most sites extensively and heavily, straining servers to the breaking point with requests for pages coming as fast as 7 pages per second.

We discovered in our launch of the new site that Google and Yahoo will crawl the site whether or not you use a robots.txt file, but MSN seems to REQUIRE it before they will begin crawling at all. All of the search engine robots seem to request the file on a regular basis to verify that it hasn't changed.

Then when you DO change it, they will stop crawling for brief periods and repeatedly ask for that robots.txt file during that time without crawling any additional pages. (Perhaps they had a list of pages to visit that included the directories or files you have instructed them to stay out of, and must now adjust their crawling schedule to eliminate those files from their list.)
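From the crawler's side, "request the file on a regular basis" can be sketched as a cached copy that is re-read once it gets too old. This is a hypothetical illustration, not how any particular search engine works. RFC 9309, which later formalized the exclusion protocol, recommends not relying on a cached copy for more than 24 hours, so that is used as the default here. `CachedRobots`, `fetch_text`, `max_age_seconds` and `clock` are made-up names. `fetch_text` stands in for an HTTP request so the sketch runs offline.

```python
import time
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Hold a parsed robots.txt and re-read it when the cached copy is stale.

    fetch_text: any callable returning the current robots.txt text
    (a stand-in for downloading /robots.txt).
    """

    def __init__(self, fetch_text, max_age_seconds=24 * 60 * 60, clock=time.time):
        self._fetch_text = fetch_text
        self._max_age = max_age_seconds
        self._clock = clock
        self._parser = None
        self._fetched_at = 0.0

    def _refresh_if_stale(self):
        now = self._clock()
        if self._parser is None or now - self._fetched_at >= self._max_age:
            parser = RobotFileParser()
            parser.parse(self._fetch_text().splitlines())
            self._parser = parser
            self._fetched_at = now

    def can_fetch(self, user_agent, url):
        """Check a URL against the rules, refreshing them first if needed."""
        self._refresh_if_stale()
        return self._parser.can_fetch(user_agent, url)
```

A change to robots.txt only takes effect for such a crawler at its next refresh. That is one reason a crawler might pause and re-request the file before continuing.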

Most webmasters instruct the bots to stay out of "image" directories and the "cgi-bin" directory. They also exclude any directories containing private or proprietary files intended only for users of an intranet or of password-protected sections of the site. Clearly, you should direct the bots to stay out of any private areas that you don't want indexed by the search engines.
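If the list of directories to exclude changes often, the file can be generated rather than hand-edited. Here is a minimal Python sketch. `build_robots_txt` and its parameters are hypothetical names. One caution: robots.txt is itself publicly readable by anyone who types /robots.txt, so listing a private directory there reveals where it is. It asks well-behaved crawlers to stay out but does not lock anyone out. The password protection mentioned above is what actually keeps people out.

```python
def build_robots_txt(disallowed_dirs, blocked_agents=()):
    """Build robots.txt text: one "*" record for shared exclusions, plus one
    full-exclusion record for each unwanted crawler."""
    lines = ["User-agent: *"]
    for directory in disallowed_dirs:
        name = directory.strip("/")
        if name:  # skip empty entries, which would otherwise become "Disallow: //"
            # Write "/name/" so the rule covers everything inside the directory.
            lines.append(f"Disallow: /{name}/")
    for agent in blocked_agents:
        lines += ["", f"User-agent: {agent}", "Disallow: /"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(["cgi-bin", "/images/", "group"], ["aipbot", "psbot"]))
```

Directory names are normalized to the `/name/` form used in the article's file, so "images", "/images" and "/images/" all produce the same rule.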

The importance of robots.txt is rarely discussed by average webmasters. I've even had some of my clients' webmasters ask me what it is and how to implement it when I tell them how important it is to both site security and efficient crawling by the search engines. This should be standard knowledge for webmasters at substantial companies, but it illustrates how little attention is paid to the use of robots.txt.

The search engine spiders really do want your guidance, and this tiny text file is the best way to give crawlers and bots a clear signpost. It warns off trespassers and protects private property. It also warmly welcomes invited guests, such as the big three search engines, while asking them nicely to stay out of private areas.

Google Sandbox Case Study: http://publish101.com/Sandbox2

Mike Banks Valentine operates http://Publish101.com, Free Web Content Distribution for Article Marketers, and provides content aggregation, press release optimization and custom web content for Search Engine Positioning: http://www.seoptimism.com/SEO_Contact.htm
