
The Difference Between Log Analyzers and their Results


A large number of tools give you reports and statistics about the visitors who come to your website. If we just talk about log analyzers, the surprising thing is that you will get as many different (and contradictory) results as the number of programs that you try. Why do log analyzers give you different results when they scan the same log files? Shouldn't you get the same number of visitors, hits, time spent on your site per visitor, etc.? If you also try Google Analytics or MS AdWord Analytics, you may see even larger differences. Is it preferable to use a page tagging system instead of software that analyzes your log files?


Differences between log analyzers:

In some cases, bugs or a lazy interpretation of the web statistics (by the programmer) can lead to inaccurate reports. For the most serious log analyzers, however, differences in fundamental data like the number of visitors stem from subtler causes. If you do not use cookies, or if a small fraction of your visitors' browsers do not accept cookies, the presence of visitors from AOL America, or from any ISP that shares IP addresses between users, can introduce a significant distortion. Let's say that such visitors account for 5% of all your visitors; if each of them generates an average of 20 hits (images, pages, downloadable files) with a different IP address for each requested file, your software will tell you that the number of visitors is almost twice the real number! Some software will combine the IP address with the client's browser in such cases; however, if 90% of the people are using the two most popular browser versions, it won't help. When Expert Data Miner encounters an IP address that falls within the range of such reusable IPs, it uses a rule of thumb based on the number of hits from this IP range to estimate the number of visitors for these providers.
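The arithmetic behind the "almost twice" figure can be checked with a short sketch. The function names and the hits-per-visitor divisor below are assumptions chosen for illustration; the actual rule of thumb used by Expert Data Miner is not described here.

```python
def apparent_visitors(real_visitors, rotating_share, hits_per_visitor):
    """Visitor count reported by a naive 'one IP address = one visitor' analyzer."""
    rotating = real_visitors * rotating_share
    stable = real_visitors - rotating
    # Every hit of a rotating-IP visitor arrives from a different address,
    # so each of those hits is mistaken for a separate visitor.
    return stable + rotating * hits_per_visitor


def estimate_visitors_from_hits(hits_in_range, assumed_hits_per_visitor=20):
    """Hypothetical rule of thumb: infer visitors in a shared IP range from hits."""
    if hits_in_range <= 0:
        return 0
    return max(1, round(hits_in_range / assumed_hits_per_visitor))


real = 1000
seen = apparent_visitors(real, rotating_share=0.05, hits_per_visitor=20)
ratio = seen / real  # about 1.95, i.e. almost twice the real number
```

With 1000 real visitors, the 50 rotating-IP visitors alone are reported as 1000 visitors, which is why such a small share has such a large effect.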

But even if you are using cookies, there is still a way to overcount the number of visitors. Let's say that a visitor requests an image or a PDF file from a direct link, and only a few minutes later requests a page containing the JavaScript that assigns a cookie; many programs will count two visitors, one with a specific IP address and one with a cookie. EDM, on the other hand, will merge those two visitors into a single one.
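One way such a merge could work is sketched below. This is not EDM's actual algorithm, which is not documented here: the 30-minute window, the tuple layout of a hit, and the function names are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def naive_count(hits):
    """Count one visitor per cookie, or per IP address when no cookie is present."""
    return len({("cookie", c) if c else ("ip", ip) for _, ip, c in hits})


def merged_count(hits, window=timedelta(minutes=30)):
    """hits: list of (timestamp, ip, cookie_or_None).

    A cookie-less hit is attributed to a cookie first seen on the same IP
    address within `window` after that hit (an assumed heuristic).
    """
    first_seen = {}  # (ip, cookie) -> earliest timestamp
    for ts, ip, cookie in hits:
        if cookie:
            key = (ip, cookie)
            if key not in first_seen or ts < first_seen[key]:
                first_seen[key] = ts

    visitors = set()
    for ts, ip, cookie in hits:
        if cookie:
            visitors.add(("cookie", cookie))
            continue
        owner = None
        for (c_ip, c), c_ts in first_seen.items():
            if c_ip == ip and timedelta(0) <= c_ts - ts <= window:
                owner = c
                break
        visitors.add(("cookie", owner) if owner else ("ip", ip))
    return len(visitors)


t0 = datetime(2024, 1, 1, 12, 0)
hits = [
    (t0, "203.0.113.7", None),                           # direct link to a PDF
    (t0 + timedelta(minutes=3), "203.0.113.7", "c-42"),  # page that sets the cookie
    (t0 + timedelta(minutes=4), "203.0.113.7", "c-42"),
    (t0, "198.51.100.9", None),                          # unrelated cookie-less visitor
]
```

On this sample the naive count reports three visitors, while the merged count reports two, because the cookie-less PDF request is attributed to the cookie that appears on the same address three minutes later.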

There is also a potential difference if you use page tagging software instead. With a log analyzer, visitors who only ask for one popular page may be missed unless you specify 'no-cache' in the header of your page. That is because if the page is served from a cache, their request never reaches the server, so it is not recorded in the log file. You can roughly estimate the importance of this factor in one of the reports of Expert Data Miner, 'Referrer Type (Hits)/Entry Page'. This report contains an Internal Referrers section, which lists pages whose referrer field contains one of your own pages that the visitor does not seem to have requested earlier.
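The idea behind such an internal-referrer check can be illustrated as follows. This is only a sketch under assumed data: the hit layout, the host name www.example.com, and the function name are hypothetical and do not reflect how the EDM report is built.

```python
from urllib.parse import urlsplit


def unseen_internal_referrers(hits, site_host):
    """hits: chronological list of (visitor, requested_path, referrer_url_or_None).

    Returns (visitor, referrer_path) pairs where the referrer is one of our
    own pages that this visitor never requested, which suggests the page
    was served from a cache and never reached the server.
    """
    seen = {}  # visitor -> set of paths requested so far
    missing = []
    for visitor, path, referrer in hits:
        requested = seen.setdefault(visitor, set())
        if referrer:
            ref = urlsplit(referrer)
            if ref.hostname == site_host and ref.path not in requested:
                missing.append((visitor, ref.path))
        requested.add(path)
    return missing


hits = [
    ("v1", "/index.html", "http://www.google.com/search?q=logs"),
    ("v1", "/products.html", "http://www.example.com/index.html"),
    ("v2", "/products.html", "http://www.example.com/index.html"),
]
cached = unseen_internal_referrers(hits, "www.example.com")
```

Here visitor v2 arrives on /products.html with /index.html as referrer without ever having requested /index.html, so that page view was probably served from a cache.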

However, you can also miss visitors with a page tagging system if JavaScript is disabled in the client's browser. More importantly, you will miss virtually all spiders and robots, and they are a kind of visitor too. For the same reason, visitors who only request an image or a video through a direct link, and who therefore bypass the JavaScript that you implemented, are not detected. Visitors who get an error code and cannot fetch a page are also ignored. Combining a log analyzer with a page tagging system is thus not a bad idea in some cases.

The number of hits can also differ between two log analyzers if one of them decides that a hit with an error code is not a real 'hit'. For some reports, like the Search Phrases, the number of hits will not be the same if the search engine definitions differ; this is always the case, although you can add your own specialized search engine to the list that EDM uses by default.

In a report like the Most Popular Pages or the Most Popular Downloads, the difference does not come only from spiders being included in or excluded from the results; the software maker's choice to include or exclude requests with an error code (like file not found) also has an impact. Status codes like 206 (Partial Content) or 304 (Not Modified, or 'is it in cache?') and the way they are processed can also lead to a significant difference. A request for a file can generate many hits with a code 206, or one code 304, or even one code 304 along with another hit with a code 200 (OK). Here EDM uses an algorithm roughly similar to that of Urchin (aka Google Analytics) to determine how many times a file was requested.
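To show how much the handling of these codes matters, here is a minimal sketch of one possible counting policy. It is an assumption for illustration, not the algorithm used by EDM or Urchin: error hits are dropped, and hits from the same visitor for the same file that are less than `gap` seconds apart are collapsed into one request.

```python
from collections import defaultdict

COUNTED_CODES = {200, 206, 304}  # assumed policy: error codes are not counted


def count_requests(hits, gap=60):
    """hits: iterable of (epoch_seconds, visitor, url, status).

    Returns {url: number of requests} after collapsing bursts of hits.
    """
    stamps_by_key = defaultdict(list)
    for ts, visitor, url, status in hits:
        if status in COUNTED_CODES:
            stamps_by_key[(visitor, url)].append(ts)

    totals = defaultdict(int)
    for (_, url), stamps in stamps_by_key.items():
        stamps.sort()
        last = None
        for ts in stamps:
            # A new request starts only after a quiet period longer than `gap`.
            if last is None or ts - last > gap:
                totals[url] += 1
            last = ts
    return dict(totals)


hits = [
    (0, "v1", "/manual.pdf", 206),
    (5, "v1", "/manual.pdf", 206),
    (12, "v1", "/manual.pdf", 206),   # three partial-content hits, one download
    (100, "v2", "/manual.pdf", 304),
    (101, "v2", "/manual.pdf", 200),  # 304 followed by 200, one request
    (50, "v3", "/manual.pdf", 404),   # error, ignored under this policy
]
requests = count_requests(hits)
```

The six raw hits collapse to two requests for /manual.pdf. A tool that counts every hit would report six, and one that counts only code 200 would report one, which is the kind of gap described above.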

You will also get very different values for the average time that your visitors spend on your website. The reason is that many programs rely on assumptions: even though the time spent on the last page is not available (no signal is sent to your server when a visitor closes the browser or presses the back button to return to Google), they still assign a hypothetical value to the last page. And here again, some software will mix spiders and humans to compute this 'average time per session'.
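The effect of the last-page assumption is easy to see on a small example. The sketch below compares two conventions; the 60-second guess is an arbitrary value chosen for illustration and is not taken from any particular product.

```python
def session_duration(timestamps, last_page_guess=0):
    """timestamps: sorted page-request times (seconds) of one session.

    Only the gaps between requests are measurable; `last_page_guess` is the
    hypothetical number of seconds credited to the final page.
    """
    measured = timestamps[-1] - timestamps[0]
    return measured + last_page_guess


def average_time_per_session(sessions, last_page_guess=0):
    durations = [session_duration(s, last_page_guess) for s in sessions]
    return sum(durations) / len(durations)


sessions = [
    [0, 30, 90],  # three pages, 90 measurable seconds
    [0],          # single-page visit, nothing measurable
]
measured_only = average_time_per_session(sessions)
with_guess = average_time_per_session(sessions, last_page_guess=60)
```

On the same two sessions, the first convention gives an average of 45 seconds and the second gives 105 seconds, so two tools reading the same log can legitimately disagree by more than a factor of two.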

Advantages of Expert Data Miner:

You can get the user path of any visitor. Very few log analyzers allow you to do this; the reason is not just that it is technically difficult to achieve, but also that it is hard to process and save all this data on a hard drive while keeping the application relatively fast. Many people will find it practical to fetch the navigation path, or the pages requested, for any visitor (the click trail). In some cases the DNS will give you the identity of those visitors; in other cases you can still use the IP address and fetch further information with the tool of your choice. And if you are using cookies or an authentication ID, EDM also lets you view all the sessions and the activity of a specific visitor before or after the session that you are currently viewing. If you get the navigation path of a visitor who found your site with some keywords, you just have to click the 'next visit' button to see what this visitor did later, provided that he came back.

Being able to trace back all the pages or files requested by a visitor also helps you understand what lies behind the statistics and how they were calculated. Last but not least, it is a helpful tool to isolate bugs and validate the statistical results of the chosen line.

You can create your own columns and combine a rich set of filters with OR/AND conditions to get personalized reports with a flexibility that is unmatched. There is also a very good level of flexibility regarding query segments if you use Perl/PHP/ASP pages. You can regroup the hits of your pages according to the query variables that you decide to keep or drop.
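Regrouping hits by query variables can be pictured with the following sketch, which uses Python's standard urllib.parse module. The URLs, the variable names `id` and `sessid`, and the function name are made up for the example and say nothing about how EDM is configured.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit


def page_key(url, keep):
    """Reduce a URL to its path plus only the query variables listed in `keep`."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Sort so that the order of variables in the URL does not split the group.
    kept = sorted((k, v) for k, v in pairs if k in keep)
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


urls = [
    "/product.php?id=12&sessid=abc",
    "/product.php?sessid=xyz&id=12",
    "/product.php?id=7&sessid=abc",
]
by_product = Counter(page_key(u, keep={"id"}) for u in urls)
by_script = Counter(page_key(u, keep=set()) for u in urls)
```

Keeping `id` and dropping `sessid` groups the three hits into two product pages, while dropping every variable groups them all under /product.php.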

In spite of this, the interface is relatively easy to use and the concepts can be grasped quickly. After all, your goal is often to improve your profits and optimize your website; if you are locked into a system so complex that you need to pay specialists to configure it or interpret your reports, the software is not much help in increasing your profits. An online tutorial is available on this website; nevertheless, even with two links in the software that point to the tutorial, fewer than 5% of the people who download the trial version use it.

Custom modifications: Don't expect custom modifications from large companies that offer you page tagging software. You are locked into the screen layout that comes with their software, and you have to adapt to the changes that they make. With ASCO IT, if you have a specific need for a report, it is still possible in many cases to adapt the software at a reasonable cost.


