<<< Analyzing Log Files
Log File Analysis >>>
Let us take a closer look at an example of importing log files. You start a five-step Import Wizard from the Log Analyzer tab.
At the first step, select the profile that corresponds to the format of your log file. If you do not want to specify the log file format (profile), you may select Autodetect from the list of available profiles. The list always contains three standard profiles (Apache 2.0 standard, Zeus 1.0, and W3C Extended Log Format) as well as Autodetect. When you select Autodetect, the program tries to locate the fields with key data in the log file records automatically. Autodetect works only for logs in Combined Log Format or W3C Extended Log Format. If necessary, you may switch to the Log Profile Manager (e.g., if the log type you need is missing or you want to adjust an existing format).
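To illustrate how the two formats supported by Autodetect can be told apart, here is a minimal Python sketch. It is not the program's actual detection routine; the function name `guess_log_format` and the regular expression are assumptions made for this example. It relies on two facts: a W3C Extended Log Format file declares its columns in a `#Fields:` directive, and a Combined Log Format record has a fixed layout ending in two quoted strings (referrer and User-Agent).

```python
import re

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) "[^"]*" "[^"]*"\s*$'
)


def guess_log_format(lines):
    """Return 'w3c', 'combined', or None for a sample of log lines.

    Hypothetical helper: it inspects lines until the first one that
    settles the question.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            return "w3c"       # W3C files list their columns in this directive
        if line.startswith("#"):
            continue           # other W3C directives, e.g. #Version or #Date
        if COMBINED_RE.match(line):
            return "combined"
        return None            # first data line matches neither format
    return None
```

The pattern does not handle escaped quotes inside quoted fields, so treat it as a starting point rather than a complete parser.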
After you select the profile and press Next, you'll be taken to the next step, where you pick the import source. This may be a local file or a file on a remote FTP server. In the latter case, you will have to enter your FTP server address and connection parameters. If you have entered these settings before and saved them as the defaults, you'll be able to load them instantly. When you are done, press Next to proceed to the next step.
At this step you select the log files to be imported. Naturally, all of them must be of the same format. The window shows a list of the last 10 imported log files (this list is project-specific); you may move them to the list above for import. The Add button opens the file selection window. Log files may be stored as ordinary text files, .gz archives (a packed text log), or .tar.gz archives (several log files packed together).
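The three storage forms listed above can be read uniformly with Python's standard library. The sketch below only illustrates that idea and is not how the program itself reads files; the function name `iter_log_lines` and the choice to decode as UTF-8 with replacement of invalid bytes are assumptions.

```python
import gzip
import tarfile


def iter_log_lines(path):
    """Yield lines from a plain-text, .gz, or .tar.gz log file."""
    if path.endswith(".tar.gz"):
        # Several log files packed together: walk every regular file.
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                with archive.extractfile(member) as handle:
                    for raw in handle:
                        yield raw.decode("utf-8", errors="replace").rstrip("\r\n")
    elif path.endswith(".gz"):
        # A single packed text log.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\r\n")
    else:
        # An ordinary text file.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\r\n")
```

The `.tar.gz` check comes first because such a name also ends in `.gz`.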
If you import files from an FTP server, the download starts after you press Next. You may abort the download by pressing Stop. Stopping may take a few seconds due to the peculiarities of the FTP protocol.
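For readers who script their own log retrieval, an abortable FTP download can be sketched as follows. This is an illustration under stated assumptions, not the program's implementation: `download_log` and `DownloadAborted` are hypothetical names, and the `ftp` argument is expected to behave like a logged-in `ftplib.FTP` object, whose `retrbinary` method passes each received chunk to a callback.

```python
import threading


class DownloadAborted(Exception):
    """Raised when the stop flag is set during a transfer (hypothetical)."""


def download_log(ftp, remote_name, local_path, stop_event):
    """Download remote_name to local_path, checking stop_event per chunk."""
    with open(local_path, "wb") as out:

        def write_chunk(chunk):
            # Called once per received block; this is the only point
            # at which the transfer can notice the stop request.
            if stop_event.is_set():
                raise DownloadAborted(remote_name)
            out.write(chunk)

        ftp.retrbinary(f"RETR {remote_name}", write_chunk)
```

With a real server, you would create `ftplib.FTP(host)`, call `login(user, password)`, and pass that object as `ftp`; another thread can then call `stop_event.set()` to request the stop. After an abort the connection should be closed rather than reused, since the transfer was interrupted midway.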
The fourth step (Autodetect profile) is activated only if you selected Autodetect at Step 1; otherwise this step is skipped. One line from the log file is displayed at the top of the window. You may change the current line using the Previous Line and Next Line buttons. Autodetection runs automatically whenever you change the current line. When it completes successfully, there are two possible outcomes. If the log file was saved in W3C format, you'll receive a message stating this. If it was saved in Combined Log Format, the detected information is displayed in the fields at the bottom. If all is well, press Correct to proceed to the final stage of the import. If the results are not satisfactory, Autodetect was unable to locate the fields properly and you need to specify the exact profile.
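To show what locating the fields of a Combined Log Format line involves, here is a short Python sketch. The field names and the function `parse_combined_line` are chosen for this example only and do not necessarily correspond to the fields shown in the wizard.

```python
import re
from datetime import datetime

COMBINED_FIELDS = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"\s*$'
)


def parse_combined_line(line):
    """Return a dict of fields, or None if the line does not fit the format."""
    match = COMBINED_FIELDS.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 25/Jul/2003:09:53:53 -0500
    # (month names are parsed according to the current locale).
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields
```

A `None` result corresponds to the case where the fields cannot be located and an exact profile has to be chosen instead.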
At the final step you can specify which data is saved into the log database of the current project. You can record information on the last visit only (Keep last spider visit), only the last spider visit to each unique page (Keep last spider visit to individual page), or all visits (Keep all visits). If you do not check the Save import results in project database option, the above choices are applied only to the imported log file. When the Clear log database before import option is selected, the log database will be cleared!
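The difference between the three options can be pictured with a small Python sketch. It rests on an assumed interpretation: that "last visit" is tracked per spider, and that "individual page" means per spider and page together. The mode names, the tuple layout, and the function `filter_visits` are hypothetical.

```python
def filter_visits(visits, mode):
    """Reduce (spider, url, timestamp) tuples according to a keep mode.

    mode is 'all', 'last_visit', or 'last_visit_per_page'
    (hypothetical names for the three wizard options).
    """
    visits = list(visits)
    if mode == "all":
        return visits
    if mode == "last_visit":
        key = lambda v: v[0]             # one record per spider
    elif mode == "last_visit_per_page":
        key = lambda v: (v[0], v[1])     # one record per spider and page
    else:
        raise ValueError(f"unknown mode: {mode}")

    latest = {}
    for visit in visits:
        k = key(visit)
        # Keep the visit with the most recent timestamp for each key.
        if k not in latest or visit[2] >= latest[k][2]:
            latest[k] = visit
    return sorted(latest.values(), key=lambda v: v[2])
```

For example, given three Googlebot visits to two pages, `last_visit` keeps one record and `last_visit_per_page` keeps two; the spider names used here are sample data only.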
When you press the Finish button, the import of log files starts according to the parameters you have specified. You can abort the process by pressing the Stop button in the progress window. In this case you will be able to work with the information that has been loaded up to that point.
After the import is completed, the spider visits tree is built on the Log Analyzer tab.
The program may find requests from unknown robots when importing your log files. In that case, a message is displayed asking whether you would like to add the new spider to your spider database. If you agree, the window for adding a new spider to the database appears, partly filled in, together with the log line containing the unknown robot's request for your reference. The line fragment that normally contains the spider name is placed in the User-Agent field. You need to remove the unnecessary characters so that only the spider name remains. If you believe this robot is not a search engine spider, you may open the list of Known User-Agents (non-spiders) and add the full line from the User-Agent field to that list.
In the situation shown in the screenshot, the content of the Log File String field is:
81.2.119.212 - - [25/Jul/2003:09:53:53 -0500] "GET /robots.txt HTTP/1.1" 404 290 "-" "Comodo HTTP(S) Crawler - http://www.instantssl.com/crawler, http://www.whichssl.com/crawler"
The program will fill in the User-Agent field using the value Comodo HTTP(S) Crawler - http://www.instantssl.com/crawler, http://www.whichssl.com/crawler. You delete what is unnecessary and leave only Comodo in this field, and enter the full name, "Comodo", in the Spider Name field. The spider will then be added to the database.
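The reason for trimming the User-Agent value to a short fragment can be illustrated in Python. The sketch assumes that spiders are recognized by checking whether a stored fragment occurs in a request's User-Agent string; that matching rule, and the names `extract_user_agent` and `match_spider`, are assumptions for this example and are not confirmed by the program's documentation.

```python
def extract_user_agent(line):
    """Return the last double-quoted field of a Combined Log Format line."""
    parts = line.rsplit('"', 2)
    if len(parts) != 3:
        return None
    return parts[1]


def match_spider(user_agent, spiders):
    """Return the spider name whose fragment occurs in user_agent.

    spiders maps a User-Agent fragment to a spider name (hypothetical
    structure standing in for the spider database).
    """
    lowered = user_agent.lower()
    for fragment, name in spiders.items():
        if fragment.lower() in lowered:
            return name
    return None
```

Under this assumption, a short fragment such as `Comodo` would keep matching even if the crawler later changes the URLs in its User-Agent string, whereas the full original value would not.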