The brief (and by no means complete) definition of a robots.txt is that it is a document that tells search engine spiders which pages on your site to index and which to ignore. Many people choose to keep their BANS folders and files from being indexed. If these do get indexed, they could be included in search engine results, and they are not pages you want potential customers going to.
Be aware that creating a robots.txt does not guarantee that search engine spiders will follow your requests. Most will, but a few will do what they want regardless of what you enter in robots.txt.
My instructions to add a basic robots.txt to your site:

Step One
Open Notepad or an equivalent plain-text editor (do NOT use a word processor, as these add hidden characters that can do funny things).
Copy the contents of robots-txt.txt into Notepad.
Note: If you don’t want certain folders or pages to be indexed, you can disallow them by adding a line for each one in the same format as the lines already in the file, for example Disallow: /admin/.
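Be careful with what you put after Disallow: in each line. A line reading Disallow: / (a single slash and nothing else) will prevent your entire site from being indexed. That would be bad! A Disallow: with nothing after it does the opposite: it blocks nothing, so it will not protect any of your folders.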
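If you want to see how a spider reads a Disallow line before you upload anything, here is a minimal sketch using Python’s standard urllib.robotparser module. The example.com address and the helper name is_allowed are made up for illustration, and real search engines may differ in small details, so treat this as a quick sanity check rather than the final word.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_text, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file's contents as a list of lines.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


# Disallow with nothing after it: nothing is blocked.
BLANK_DISALLOW = "User-agent: *\nDisallow:\n"

# Disallow followed by a single slash: the whole site is blocked.
SLASH_DISALLOW = "User-agent: *\nDisallow: /\n"

# Disallow followed by a folder: only that folder is blocked.
FOLDER_DISALLOW = "User-agent: *\nDisallow: /admin/\n"

if __name__ == "__main__":
    page = "https://example.com/page.html"  # placeholder address
    print("blank: ", is_allowed(BLANK_DISALLOW, page))
    print("slash: ", is_allowed(SLASH_DISALLOW, page))
    print("folder:", is_allowed(FOLDER_DISALLOW, page))
```

The three sample files differ only in what follows Disallow:, which makes it easy to compare them side by side.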
Step Two
Save your document as robots.txt.
Step Three
Upload your robots.txt to your site’s root folder (where your site’s index.php is located). (Click here to read my opinion of the best FTP client.)
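Spiders look for robots.txt at the top level of your domain, so after uploading it helps to know the exact address where the file should appear. The sketch below works that address out from any page on the site; the function name robots_url and the example.com addresses are placeholders of mine, not part of any tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the address where spiders expect robots.txt for this site.

    robots_url is a hypothetical helper written for this example.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Placeholder address; substitute a page from your own site.
    print(robots_url("https://example.com/store/index.php?page=2"))
```

Opening the resulting address in a browser should show the plain text of the file you uploaded. If you get a "not found" page, the file is probably sitting in a subfolder instead of the root.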
Done
Rochelle



I checked my robots.txt in Google tools, and for many of the lines the result was “Syntax not understood”. For line six the result was “Accepted but correct syntax includes a colon”.
Is this normal? I followed the instructions. Thanks for any help, I love-love-love this site, Joette
Joette,
My sincere apologies to you. I had some HTML codes in the robots-txt.txt file that I have now removed. Thank you for bringing this to my attention.
Rochelle
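For anyone who runs into the same “Syntax not understood” message, a rough check like the one below can point at the lines to look at before you upload. It is only a sketch: the function name find_problem_lines, the list of field names it accepts, and the sample text are my own assumptions, and it is not how Google’s tool works internally.

```python
# Field names this sketch treats as valid; an assumption, not an official list.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def find_problem_lines(robots_text):
    """Return (line_number, reason) pairs for lines that look wrong.

    find_problem_lines is a hypothetical helper written for this example.
    """
    problems = []
    for number, raw in enumerate(robots_text.splitlines(), start=1):
        # Ignore comments (everything after '#') and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "<" in line or ">" in line:
            problems.append((number, "looks like HTML"))
            continue
        field, separator, _value = line.partition(":")
        if not separator:
            problems.append((number, "missing colon"))
        elif field.strip().lower() not in KNOWN_FIELDS:
            problems.append((number, "unknown field"))
    return problems


if __name__ == "__main__":
    # Made-up sample with stray HTML and a missing colon.
    sample = "<p>User-agent: *</p>\nDisallow /admin/\nDisallow: /cgi-bin/\n"
    for number, reason in find_problem_lines(sample):
        print(f"line {number}: {reason}")
```

An empty result does not prove the file is perfect, but any line it does report is worth retyping by hand in Notepad.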
Hi Rochelle,
When working through your checklist, I completed the task to disallow the Safety on eBay page in my robots.txt file.
I’ve just gone back into Google Webmaster tools and there is a warning on the site:
“URLs roboted out
When we tested a sample of the URLs from your Sitemap, we found that the site’s robots.txt file was blocking access to some of the URLs. If you don’t intend to block some of the URLs contained in the Sitemap, please use our robots.txt analysis tool to verify that the URLs you submitted in your Sitemap are accessible by Googlebot. All accessible URLs will still be submitted.”
Can you please advise if this is normal? If so, will it affect any indexing that may occur?
Thanks
Shelley
Shelley,
I’m so sorry for not responding sooner to your question. It was an oversight on my part.
This message is ok. If you created a robots.txt file using my instructions then there are five places in your site you are telling robots to ignore and not index:
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /cont/
Disallow: /themes/
Disallow: /scripts/
The message you saw is letting you know that there are places that are disallowed. It’s letting you know about this in case you didn’t mean to disallow any pages, or in case you disallowed a location you actually want indexed.
Once I did exactly that. I accidentally disallowed my entire site (not good for indexing!). So, if you see this message it is a good idea to verify that you are only disallowing pages that you don’t want indexed. If you find that you accidentally disallowed locations that you DO want indexed, just remove the ‘Disallow: /whatever file or folder is disallowed/’ line from your robots.txt.
Rochelle
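To double-check which sitemap addresses the five Disallow lines above would block, something along these lines can help. It is a sketch using Python’s standard urllib.robotparser module. The User-agent: * line, the example.com addresses, and the function name blocked_urls are assumptions added for illustration; swap in the real contents of your robots.txt and the addresses from your own sitemap.

```python
from urllib.robotparser import RobotFileParser

# The five Disallow lines from the reply above.
# The "User-agent: *" line is an assumption added so the rules apply to all spiders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /cont/
Disallow: /themes/
Disallow: /scripts/
"""


def blocked_urls(robots_text, urls, agent="Googlebot"):
    """Return the addresses from `urls` that `agent` may not fetch.

    blocked_urls is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Placeholder addresses standing in for entries from a sitemap.
    sitemap_urls = [
        "https://example.com/index.php",
        "https://example.com/cont/page.html",
        "https://example.com/themes/style.css",
    ]
    for url in blocked_urls(ROBOTS_TXT, sitemap_urls):
        print("blocked:", url)
```

If every address it lists is one you meant to hide, the warning is nothing to worry about. If a page you want indexed shows up, remove the matching Disallow line.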
Hi Rochelle,
Thanks for your reassurance.
That’s what I did. I went and checked which pages it couldn’t view and they were the ones I had disallowed.
Just gave me a bit of a scare at first.
Thanks again for your help.
Shelley