The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to visiting robots which parts of their site should not be visited. Before indexing your site, a spider downloads the robots.txt file, which contains instructions on what should and what should not be indexed. The robots.txt file is therefore the key to controlling spiders. If you have a large website or update it frequently, creating and editing this file by hand is hard and tedious work.
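To illustrate how a well-behaved robot might consult these instructions, here is a minimal sketch using Python's standard `urllib.robotparser` module. The file content, the `www.example.com` URLs and the `is_allowed` helper are assumptions made up for this illustration; they are not part of Robots.txt Editor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robot may fetch the URL under these rules."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Slurp", "http://www.example.com/private/page.html"),
        ("OtherBot", "http://www.example.com/tmp/a.html"),
        ("OtherBot", "http://www.example.com/index.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

In this sketch a robot that has its own `User-agent` group follows only that group, while every other robot falls back to the `*` group.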
Robots.txt Editor is an easy-to-navigate visual editor that will enable you to specify different directives for selected spiders in specific areas of the site and generate the robots.txt file quickly and easily.
The robots.txt file is just a simple text file containing instructions for search engine robots. There are different types of instructions: for individual robots, for all robots, for certain folders, for file types, etc. The file can be created using a simple text editor like Notepad or WordPad, but it is very difficult to create such a file manually without making a mistake somewhere.
There are two tabs in this module: the Spider List tab and the Disallow tab.
On the Spider tab you can view the names of all robots that are in the program database (176 altogether). There are three fields for each robot. You can sort the robots alphabetically by any of these fields by clicking the column header.
- Spiders (User-agents) - the name of the spider. The user-agent name does not necessarily correspond to the name of the search engine; for example, Inktomi's user-agent is Slurp. In this field the user-agent of the search robot is given in brackets.
- Country - the country where the search engine is located.
- Primary language - the primary language of the search engine. Although some search engines (like Google) provide an interface in different languages for users in different countries, their primary language is in most cases that of the country where they are located.
If you right-click in the Robots window, the following context menu appears:
- Select all - select all robots in the list
- Deselect all - clear the selection
- Select by... - select only robots from a certain category (country, primary language)
- Invert selection - deselect the selected robots and select all the others
- Spider homepage - open the spider's homepage in the browser window
Select the robots you would like to exclude from indexing your site by clicking their check boxes, then click Next > to go to the Disallow tab.
Import robots.txt
If you want to modify an existing robots.txt file, you can import it into the program for editing. Click Import robots.txt to launch the Import Wizard. It consists of three steps:
- On the first step of the wizard you must select the location of the file to import. Select Local folder if your file is saved on a local computer or on a local area network. Select FTP Server if you access your site via the FTP protocol. Select HTTP Server if you can access your site only via the HTTP protocol.
- On the second step you must select the file to import.
  - For a local folder, click Browse and show the path to your file.
  - For an FTP server, enter the hostname of your FTP server and click Next > or FTP Browse. If you click FTP Browse, you will be able to view the contents of your server. If you click Next >, the program will connect to the server and try to find the file on its own.
  - For an HTTP server, enter the URL of the site's homepage, which is usually the address of its root folder, and click Get to receive the file.
- On the third step you will see whether the import was successful or not. If the message says the import was successful, click Finish and select the location where you would like your file to be saved.
  - Local PC - click Browse and show the location of the folder where you want to save the imported robots.txt file.
  - FTP Server - enter your FTP server hostname, your username and password, and the path to the target folder. You can also select the FTP Browse option and show the path to the folder manually.
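The import steps above can be pictured with a short Python sketch. It derives the robots.txt address from a homepage URL, since the file sits in the site's root folder, and splits an existing file into groups of user-agents and disallowed paths. The function names `robots_url` and `parse_groups` and the sample address are hypothetical; how the Import Wizard works internally is not documented here.

```python
from urllib.parse import urljoin


def robots_url(homepage: str) -> str:
    """Return the address of robots.txt in the root folder of the site."""
    return urljoin(homepage, "/robots.txt")


def parse_groups(text: str) -> list[tuple[list[str], list[str]]]:
    """Split robots.txt text into (user-agents, disallowed paths) groups."""
    groups = []
    agents, paths = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if paths:
                # A rule was already seen, so this line starts a new group.
                groups.append((agents, paths))
                agents, paths = [], []
            agents.append(value)
        elif field == "disallow" and agents:
            paths.append(value)
    if agents:
        groups.append((agents, paths))
    return groups


if __name__ == "__main__":
    print(robots_url("http://www.example.com/index.html"))
    sample = "User-agent: Slurp\nDisallow: /private/\n"
    print(parse_groups(sample))
```

This sketch reads only `User-agent` and `Disallow` lines and ignores every other field.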
On the Disallow tab you can select files and folders of your site you would like to leave unindexed.
There are two windows on this tab. In the first window you can view the spiders you have selected on the Spider tab. You can group them by the country of their location or by their primary language by selecting the appropriate value in the Group by drop-down menu. In the second window you can see the directory structure of your site. When the program is started for the first time, the program installation folder is set as your default site folder.
To change the location of your site, click Site Location. If your site is stored on a local computer, select Local PC, browse to the site's root folder, select it and click OK. If your site is accessed via FTP, select FTP Server, enter the address of your FTP site into the Host field, then enter the port number, your username, the associated password and the path to the root folder of your site. If your computer is already connected to the Internet, you can browse to the site's root folder by clicking FTP Browse. If you are working behind a firewall, select the Passive Mode check box.
If the root folder of your site is set properly, you will see its file and directory tree. You can expand a directory node by clicking next to the folder icon. To add a folder to the list of folders that are forbidden for indexing, select its check box.
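A folder selected in the tree has to end up in robots.txt as a path relative to the site's root. The sketch below shows one way to make that conversion; the helper name `disallow_path` and the sample folders are assumptions for illustration, not the program's actual code.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def disallow_path(site_root: PurePath, folder: PurePath) -> str:
    """Convert a folder inside the site root to a robots.txt path."""
    # relative_to() raises ValueError if the folder is outside the root.
    relative = folder.relative_to(site_root).as_posix()
    # The root itself becomes "/", any other folder becomes "/name/".
    return "/" if relative == "." else f"/{relative}/"


if __name__ == "__main__":
    root = PureWindowsPath("C:/sites/demo")
    print(disallow_path(root, PureWindowsPath("C:/sites/demo/cgi-bin/tools")))
    print(disallow_path(PurePosixPath("/var/www/site"),
                        PurePosixPath("/var/www/site/images")))
```

The trailing slash limits the rule to the folder's contents, so a file whose name merely starts with the same characters is not affected.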
To deselect the selected folders, click Clear selection. In the dialog that appears, indicate whether this should apply to the selected search robot only or to all robots, then click OK.
To prevent folders from being indexed, do the following:
- Select a spider.
- Select the check boxes of the folders you want to disallow for this spider.
- Click Generate robot.txt.
Note: if you want to apply the same rules to all robots, make sure the *(All Spiders) item is selected; otherwise the restrictions will be applied only to the selected robot (i.e. the one marked by the cursor).
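The result of these steps is a set of `User-agent` groups, each followed by its `Disallow` lines. A minimal sketch of such a generator is shown below, assuming that the *(All Spiders) item corresponds to the `*` user-agent; the function name and the sample rules are hypothetical and the program's real output may be formatted differently.

```python
ALL_SPIDERS = "*"  # robots.txt token that applies a group to every robot


def generate_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text from a mapping of user-agent to folders."""
    blocks = []
    for agent, folders in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means nothing is forbidden for this robot.
        lines += [f"Disallow: {folder}" for folder in folders] or ["Disallow:"]
        blocks.append("\n".join(lines))
    # Groups are separated by one blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(generate_robots_txt({
        "Slurp": ["/private/"],
        ALL_SPIDERS: ["/tmp/", "/cgi-bin/"],
    }))
```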
You can also disallow access to files with certain extensions. Click Disallow File Extensions and add or remove extensions in the pop-up window.
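How the program writes extension rules is not described here. One common approach, shown in the sketch below, relies on the `*` and `$` pattern characters. These are not part of the original robot exclusion standard; they are an extension honoured by major search engines and described in RFC 9309, so some robots may ignore such rules. The function name `extension_rules` is hypothetical.

```python
def extension_rules(extensions: list[str]) -> list[str]:
    """Return one wildcard Disallow line per file extension."""
    rules = []
    for ext in extensions:
        ext = ext.strip().lstrip(".")  # accept both "pdf" and ".pdf"
        if ext:
            # "*" matches any characters, "$" anchors the end of the URL.
            rules.append(f"Disallow: /*.{ext}$")
    return rules


if __name__ == "__main__":
    for rule in extension_rules([".pdf", "doc"]):
        print(rule)
```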
In the pop-up window you can preview the output robots.txt file. If the result looks correct, save the file by clicking Save file. Indicate whether you want to save the file locally or on your remote FTP server and click Next>. If you select Local folder, the program opens the standard Windows Save As... dialog. If you select FTP Folder, the program opens the FTP configuration window, where you should enter your FTP settings and click Next>. The program will connect to the remote FTP server and save your robots.txt file automatically. After the upload is complete, click Finish.
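For readers who want to script the same two save options, here is a rough sketch based on Python's standard `pathlib` and `ftplib` modules. The function names, the parameters and the `ftp_factory` argument (added so the upload can be tried without a real server) are assumptions of this example and do not describe how Robots.txt Editor itself saves the file.

```python
import io
from ftplib import FTP
from pathlib import Path


def save_local(content: str, folder: str) -> Path:
    """Write robots.txt into a local folder and return its path."""
    target = Path(folder) / "robots.txt"
    target.write_text(content, encoding="utf-8")
    return target


def save_ftp(content: str, host: str, user: str, password: str,
             remote_dir: str = "/", port: int = 21,
             passive: bool = True, ftp_factory=FTP) -> None:
    """Upload robots.txt to the given folder on an FTP server."""
    ftp = ftp_factory()
    try:
        ftp.connect(host, port)
        ftp.login(user, password)
        ftp.set_pasv(passive)  # passive mode helps behind a firewall
        ftp.cwd(remote_dir)
        ftp.storbinary("STOR robots.txt", io.BytesIO(content.encode("utf-8")))
        ftp.quit()
    finally:
        ftp.close()  # safe to call even after quit()
```

The file must be placed in the root folder of the site, because that is where robots look for it.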
See on the Internet: A Standard for Robot Exclusion