Search engine software, otherwise known as robots or bots, keeps visiting websites to crawl updated information and help index it. This helps users find information about a website when they type a query into the search box.
In short, we can define the robots.txt file as: “A file in the root of your website that can either allow or disallow (restrict) search engine robots/bots from crawling pages of your website(s).”
Now the question is: how do these robots or bots crawl websites, and why do they do so?
I have already answered the second question in the first paragraph, so the remaining question is how they crawl websites. They do this by following links, as shown in the diagram below:
Circles represent websites; arrows represent the links between them.
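To make the idea of crawling via links concrete, here is a minimal sketch in Python. The link graph and the site names in it are made up for illustration, and a real crawler would fetch pages over the network rather than read a dictionary; this only shows how following links leads a robot from one site to the next.

```python
from collections import deque

# Hypothetical link graph: each key is a site (a circle in the diagram),
# each value lists the sites it links to (the arrows).
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["d.example"],
    "d.example": ["a.example"],
    "e.example": [],  # no site links here
}


def crawl(start, links):
    """Return sites in the order a breadth-first crawler discovers them."""
    seen = [start]
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for target in links.get(site, []):
            if target not in seen:  # visit each site only once
                seen.append(target)
                queue.append(target)
    return seen


order = crawl("a.example", LINKS)
print(order)  # ['a.example', 'b.example', 'c.example', 'd.example']
```

Note that "e.example" never shows up: since no other site links to it, a robot that starts from "a.example" has no way to discover it.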
With the help of the robots.txt file, we can tell a robot whether to crawl the website at all and which pages it is allowed to crawl. Well-behaved robots check these instructions first and do not go beyond them.
“If you want search engines to index everything in your site, you don’t need a robots.txt file.”
Therefore, check whether your website has a proper robots.txt file; if it does not, create one with the help of the guidelines below:
How to create a /robots.txt file?
Where can I put my generated robots.txt file?
Below I have shared how one can add a robots.txt file. There are two ways to do so:
1) Shortest method: put it in the top-level directory of your web server.
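2) Long method: work out the URL a robot will request. When a robot looks for the “/robots.txt” file for a URL, it strips the path component from the URL (everything from the first single slash) and puts “/robots.txt” in its place.
For example, for “http://www.domainname.com/seo/index.html”, it will remove the “/seo/index.html”, replace it with “/robots.txt”, and end up with “http://www.domainname.com/robots.txt”.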
So, as a website owner, you need to put the file in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your website’s main “index.html” welcome page.
- Exactly how and where to put the file depends on your web server software.
- Remember to use all lower case for the filename: “robots.txt”, not “Robots.TXT”.
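The URL rewriting described above can be sketched in a few lines of Python using the standard urllib.parse module. The helper name robots_url is my own, chosen for illustration; it simply keeps the scheme and host of a page URL and swaps the path for “/robots.txt”.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL a robot would request for the given page URL."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.domainname.com/seo/index.html"))
# http://www.domainname.com/robots.txt
```

Whatever page the robot starts from, the result always points at the top level of the host, which is why the file has to live there.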
The “/robots.txt” file is a text file, with one or more records.
Below I am sharing the format for creating the file and for telling robots what they may crawl:
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:
(or just create an empty “/robots.txt” file, or don’t use one at all)
To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
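If you want to see how a robot would read these rules, Python's standard urllib.robotparser module can parse them. This is only a sketch: the agent name “ExampleBot” and the paths being checked are assumptions for illustration, and real search engines use their own parsers, which may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The "exclude part of the server" rules, one directive per line.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) agent may fetch the given path."""
    return parser.can_fetch(agent, path)


print(allowed("/index.html"))      # True
print(allowed("/tmp/cache.html"))  # False
```

Anything under one of the three listed directories is refused, and everything else stays open to the robot.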
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot
User-agent: Google
Disallow:

User-agent: *
Disallow: /
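The two-record file for allowing a single robot can be checked the same way. In this sketch, again using Python's urllib.robotparser, “/page.html” is a made-up path and “BadBot” stands in for any robot other than the allowed one.

```python
from urllib.robotparser import RobotFileParser

# Two records separated by a blank line: one for Google, one for everyone else.
RULES = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

google_ok = parser.can_fetch("Google", "/page.html")
other_ok = parser.can_fetch("BadBot", "/page.html")
print(google_ok, other_ok)  # True False
```

The blank line matters: it ends the first record, so the empty “Disallow:” applies only to Google and the “Disallow: /” applies to all remaining robots.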
To exclude all files except one
This is a bit awkward under the original standard, which has no “Allow” field (many modern crawlers do support one). The easy way is to put all files to be disallowed into a separate directory, say “stuff”, and leave the one file in the level above this directory:
User-agent: *
Disallow: /~manoj/stuff/
Alternatively you can explicitly disallow all disallowed pages:
User-agent: *
Disallow: /~manoj/junk.html
Disallow: /~manoj/images.html
Disallow: /~manoj/bar.html
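When the list of pages to disallow gets long, typing each line by hand is error-prone. Here is a small Python sketch that builds the record from a list of paths and then checks it with urllib.robotparser. The helper name build_robots_txt, the agent name “ExampleBot” and the page “/~manoj/index.html” are my own assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths, agent="*"):
    """Build one robots.txt record that disallows each of the given paths."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


text = build_robots_txt(
    ["/~manoj/junk.html", "/~manoj/images.html", "/~manoj/bar.html"]
)

# Check the generated rules: listed pages are blocked, any other page is not.
parser = RobotFileParser()
parser.parse(text.splitlines())
blocked = parser.can_fetch("ExampleBot", "/~manoj/junk.html")
open_page = parser.can_fetch("ExampleBot", "/~manoj/index.html")
print(blocked, open_page)  # False True
```

You would then save the generated text as “robots.txt” in the top-level directory of your web server.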
You can also create or edit your own robots.txt file using the robots.txt Tester tool, which lets you test your changes as you adjust them.
I hope you liked this topic. Please share your opinions in the comments, along with any issues you are trying to solve.