Excluding Pages from a Search Engine’s Index
Two common and important problems for websites are keeping certain pages out of search engine indexes and fighting spam. Link spam is clearly a problem that every website, especially a blog, has. Not wanting certain pages to be indexed is not a universal problem, though it is a pressing one for some websites. Reasons for not wanting a page indexed include links that lead to duplicate content or to stub “no content” pages. This matters in SEO because duplicate content is not desirable. Whatever the reason, there are ways to mark certain pages as “not for indexing”.
One way to do this is with the nofollow attribute. However, as discussed in the previous post on the nofollow attribute, keeping pages out of the index is not really what the attribute is for. Besides, not all search engine bots recognize or pay attention to the nofollow attribute (the Yahoo! and Ask bots do not). To exclude a page from being indexed, what you need is a robots.txt file, placed in the root directory of your site, that lists the page(s), file(s) or even whole directories you don’t want indexed.
So what do you put in the robots.txt file? The standard file names the user-agent(s) and the files, pages or directories that those user-agents may not enter. For example:
User-agent: *
Disallow: /private/
The * means the rule applies to all bots: none of them may enter or index the directory “private”.
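If you would like to check how a well-behaved bot reads these two lines before you upload the file, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the bot name "SomeBot" and the example.com URLs are placeholders made up for illustration; they are not part of any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown in the example, as they would sit in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules.

    Hypothetical helper: it parses the rules from a string instead of
    downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "SomeBot" and example.com are illustrative placeholders.
print(is_allowed(ROBOTS_TXT, "SomeBot", "http://example.com/private/page.html"))  # False
print(is_allowed(ROBOTS_TXT, "SomeBot", "http://example.com/index.html"))         # True
```

Anything under /private/ is refused for every bot name, while the rest of the site stays open. Keep in mind that this only shows what a bot that obeys robots.txt will do.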
If you want to keep out specific bots you can specify this by replacing the * with the name of the bot.
For example:
User-agent: Googlebot
Disallow: /private/
This means that Googlebot is kept out of your private directory, while all the other bots can still crawl and index it.
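The per-bot rule can be checked the same way. The sketch below again relies on Python's standard urllib.robotparser module; the helper access_by_bot, the bot name "OtherBot" and the example.com URL are assumptions used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only rule from the example.
GOOGLE_ONLY_RULES = """\
User-agent: Googlebot
Disallow: /private/
"""


def access_by_bot(robots_txt, bots, url):
    """Map each bot name to True/False: may it fetch url under these rules?

    Hypothetical helper for comparing several bots against one rule set.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


# "OtherBot" stands in for any crawler that is not named in the file.
result = access_by_bot(
    GOOGLE_ONLY_RULES,
    ["Googlebot", "OtherBot"],
    "http://example.com/private/",
)
print(result)  # {'Googlebot': False, 'OtherBot': True}
```

Because the file has no User-agent: * section, any bot that is not named in it falls through with no restrictions at all.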