
Jan
30

How to Add a robots.txt to Your Site

Rochelle | How To..., Strategies

The brief (and by no means complete) definition of robots.txt is that it is a plain text file that tells search engine spiders which pages on your site to index and which to ignore. Many people choose to keep their BANS folders and files from being indexed. If these folders and files do get indexed, they can show up in search engine results, and they are not pages you want potential customers landing on.

Be aware that creating a robots.txt does not guarantee that search engine spiders will follow your requests. Most will, but a few will do what they want regardless of what you put in robots.txt.

 

My instructions to add a basic robots.txt to your site:

Step One
  Open Notepad or an equivalent plain text editor (do NOT use a word processing program, as these add hidden formatting characters that can do funny things).
   
  Copy the contents of robots-txt.txt to your Notepad.
   
  Note: If you don’t want certain folders or pages to be indexed, you can disallow them by adding lines in this format:
 
  • Disallow: /name of folder to disallow/
  or
  • Disallow: /NameOfPageToDisallow.htm
   
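  Make sure you don’t disallow the site root by mistake:
 
  • Disallow: /
  Including Disallow: / (a slash on its own) will prevent your entire site from being indexed. That would be bad! A blank Disallow: with nothing after the colon does the opposite and blocks nothing.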
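Before uploading, it can help to check how a parser reads Disallow lines like these. Here is a minimal sketch using Python's standard urllib.robotparser module. The example.com addresses, the /admin/ folder, and the is_allowed helper are hypothetical and only there for illustration, and real spiders may read the rules a little differently.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_text, url, user_agent="*"):
    """Return True if robots_text lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: one folder and one page are disallowed.
SAMPLE_RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /NameOfPageToDisallow.htm
"""

# Hypothetical rules that disallow the site root, and with it every page.
BLOCK_ALL_RULES = """\
User-agent: *
Disallow: /
"""

for url in (
    "http://example.com/index.php",
    "http://example.com/admin/login.php",
    "http://example.com/NameOfPageToDisallow.htm",
):
    print(url, is_allowed(SAMPLE_RULES, url))

print("root rule:", is_allowed(BLOCK_ALL_RULES, "http://example.com/index.php"))
```

A folder rule covers everything inside that folder, while a page rule covers only addresses that start with that page's path.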
Step Two
  Save your document as robots.txt.
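If you would rather create the file with a script than with Notepad, here is a small sketch in Python. It writes the rules as plain text so that no hidden formatting characters end up in the file. The write_robots_txt helper and the single /admin/ rule are hypothetical; you would pass in the lines you copied in Step One.

```python
import tempfile
from pathlib import Path


def write_robots_txt(folder, rules):
    """Write rules to <folder>/robots.txt as plain text (hypothetical helper)."""
    path = Path(folder) / "robots.txt"
    # encoding="utf-8" writes no byte-order mark, and newline="\n" writes
    # plain line feeds on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(rules) + "\n")
    return path


# Demo: write into a temporary folder and show what was saved.
with tempfile.TemporaryDirectory() as folder:
    saved = write_robots_txt(folder, ["User-agent: *", "Disallow: /admin/"])
    print(saved.name)
    print(saved.read_text(encoding="utf-8"))
```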
Step Three
  Upload your robots.txt to your site’s root folder (where your site’s index.php is located). (Click here to read my opinion of the best FTP client.)
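Spiders look for robots.txt only at the top level of your domain, which is why the file has to sit in the root. A minimal sketch of where the file should end up, using Python's standard urllib.parse module; the example.com addresses and the robots_txt_url helper are hypothetical.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url):
    """Return the address where spiders expect robots.txt for the site serving page_url."""
    # A leading slash makes urljoin replace the whole path and query of page_url.
    return urljoin(page_url, "/robots.txt")


print(robots_txt_url("http://example.com/themes/page.htm"))
print(robots_txt_url("http://example.com/index.php?id=3"))
```

After uploading, opening that address in your browser should show the contents of your file.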
done

Rochelle


    10 Comments »

    Comment by Joette
    2008-04-20 11:47:23

    I checked my robots.txt in Google's tools, and for many of the lines the result was “Syntax not understood”. For line six the message was “Accepted but correct syntax includes a colon”.
    Is this normal? I followed the instructions. Thanks for any help. I love-love-love this site, Joette

     
    Comment by Rochelle
    2008-04-20 20:01:02

    Joette,

    My sincere apologies to you. I had some HTML codes in the robots-txt.txt file that I have now removed. Thank you for bringing this to my attention.

    Rochelle

     
    Comment by Shelley
    2008-06-12 03:30:45

    Hi Rochelle,

    When working through your checklist, I completed the task to disallow the safety on ebay page from my robots.txt file.

    I’ve just gone back into Google Webmaster tools and there is a warning on the site:

    “URLs roboted out
    When we tested a sample of the URLs from your Sitemap, we found that the site’s robots.txt file was blocking access to some of the URLs. If you don’t intend to block some of the URLs contained in the Sitemap, please use our robots.txt analysis tool to verify that the URLs you submitted in your Sitemap are accessible by Googlebot. All accessible URLs will still be submitted.”

    Can you please advise if this is normal? If so, will it affect any indexing that may occur?

    Thanks

    Shelley

    Comment by Rochelle
    2008-06-21 09:08:40

    Shelley,

    I’m so sorry for not responding sooner to your question. It was an oversight on my part.

    This message is ok. If you created a robots.txt file using my instructions then there are five places in your site you are telling robots to ignore and not index:

    Disallow: /cgi-bin/
    Disallow: /admin/
    Disallow: /cont/
    Disallow: /themes/
    Disallow: /scripts/

    The message you saw is letting you know that there are places that are disallowed. It’s letting you know about this in case you didn’t mean to disallow any pages, or in case you disallowed a location you actually want indexed.

    Once I did exactly that. I accidentally disallowed my entire site (not good for indexing!). So, if you see this message, it is a good idea to verify that you are only disallowing pages that you don’t want indexed. If you find that you accidentally disallowed locations that you DO want indexed, just remove the ‘Disallow: /whatever file or folder is disallowed/’ line from your robots.txt.

    Rochelle
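One way to see which Sitemap addresses a warning like this refers to is to test them against the five Disallow lines listed above. This is a rough sketch with Python's standard urllib.robotparser module. The "User-agent: *" line, the example.com addresses, and the blocked_urls helper are assumptions for illustration, and Google's own tool remains the real check.

```python
from urllib.robotparser import RobotFileParser

# The five Disallow lines from the reply above. The User-agent line is an
# assumption, since the full file is not shown here.
ROBOTS_TEXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /cont/
Disallow: /themes/
Disallow: /scripts/
"""


def blocked_urls(robots_text, urls, user_agent="Googlebot"):
    """Return the urls that robots_text tells user_agent not to fetch (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical Sitemap entries; replace them with addresses from your own Sitemap.
SITEMAP_URLS = [
    "http://example.com/index.php",
    "http://example.com/cont/some-page.htm",
    "http://example.com/themes/default/style.css",
]

for url in blocked_urls(ROBOTS_TEXT, SITEMAP_URLS):
    print("blocked:", url)
```

If an address you want indexed shows up as blocked, the Disallow line covering it is the one to remove.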

     
     
    Comment by Shelley
    2008-06-21 23:45:19

    Hi Rochelle,

    Thanks for your reassurance.
    That’s what I did. I went and checked which pages it couldn’t view and they were the ones I had disallowed.
    Just gave me a bit of a scare at first. :)

    Thanks again for your help.

    Shelley

     