When interpreting your web statistics becomes
an easy task with the proper data mining tools
by Jean-François Beaulieu
If you have invested a lot of time and money in your web site, the next step should be getting the proper tools to find out who your visitors are and what they do. Many statistical packages exist on the Internet, but their quality varies greatly. Most free packages will give you a rough picture, but in general they fail to correctly separate spiders and other scripts from human visitors.

Often your Internet Service Provider gives you results produced by a predefined package. However, your provider is in a difficult position: to attract customers it must slash its prices, and paying for a very expensive package could harm its own business if only a fraction of its customers really use it. Furthermore, processing mountains of log files to get much more than a rough picture puts an extra strain on the server and slows it down. In the end you penalize the visitors who wish to fetch your pages quickly, and ultimately you penalize yourself if those visitors never come back.

One alternative is to download your log files from the server (or main computer) to your own computer. In such an environment, you can analyze your log files as much as you want without penalizing potential visitors to your web site. Furthermore, the Windows environment offers a wide range of pop-up and graphical tools that make the analysis of such data much easier.

Among several packages, Expert Data Miner (EDM) is likely to shed new light on your view of web statistics. Produced by ASCO IT, a Montreal-based company, Expert Data Miner has four main qualities that distinguish it from its competitors: accuracy, speed, configurability and originality.

Accuracy: An enormous effort has been made to offer accuracy, not 'junk' or bogus statistics. You will be surprised to learn that spiders can make up as much as 40% of your 'visitors' in some cases.
Those scripts and programs will certainly help your site get indexed by search engines, but if you wish to track how much time a typical visitor spends on a page or where he goes, removing spiders from your sample is a necessity. However, most packages fail to do it properly. With EDM you can configure virtually every report and choose whether to include or exclude spiders from your statistics.

Counting the real number of visitors is also a task that may not be so easy, mostly because some providers, such as AOL (America Online), constantly recycle IP addresses. This leads many packages to drastically overestimate your number of visitors. Furthermore, if you cannot properly identify each visitor with an ID, many related statistics will also be incorrect; there is a 'snowball' effect here.
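
To make the two accuracy problems concrete, here is a minimal Python sketch. It is an illustration only and not EDM's actual method: the marker list, the `is_spider` and `count_visitors` helpers, and the record fields (`ip`, `visitor_id`, `user_agent`) are all assumptions made for this example. It shows how counting by IP address can inflate the visitor total when a provider recycles addresses, and how spiders can be left out of the sample.

```python
# Hypothetical illustration only; this is not how Expert Data Miner works internally.
SPIDER_MARKERS = ("bot", "crawler", "spider", "slurp")  # assumed heuristic list


def is_spider(user_agent):
    """Return True when the user-agent string contains a known spider marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in SPIDER_MARKERS)


def count_visitors(hits, exclude_spiders=True):
    """Count distinct visitors two ways: by IP address and by visitor ID.

    Each hit is a dict with 'ip', 'visitor_id' (a cookie value, may be None)
    and 'user_agent'. Hits without a cookie fall back to the IP address.
    """
    by_ip, by_id = set(), set()
    for hit in hits:
        if exclude_spiders and is_spider(hit["user_agent"]):
            continue
        by_ip.add(hit["ip"])
        by_id.add(hit["visitor_id"] or hit["ip"])
    return len(by_ip), len(by_id)


hits = [
    # One human whose provider recycles IP addresses between requests.
    {"ip": "10.0.0.1", "visitor_id": "abc123", "user_agent": "Mozilla/5.0"},
    {"ip": "10.0.0.2", "visitor_id": "abc123", "user_agent": "Mozilla/5.0"},
    {"ip": "10.0.0.3", "visitor_id": "abc123", "user_agent": "Mozilla/5.0"},
    # A second human on a stable address.
    {"ip": "10.0.0.9", "visitor_id": "xyz789", "user_agent": "Mozilla/5.0"},
    # A search-engine spider, which carries no cookie.
    {"ip": "10.0.0.50", "visitor_id": None, "user_agent": "Googlebot/2.1"},
]

visitors_by_ip, visitors_by_id = count_visitors(hits)
print(visitors_by_ip, visitors_by_id)
```

In this made-up sample, counting by IP address reports four human visitors while counting by visitor ID reports two, which is the kind of overestimate described above.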

Speed: Among the web statistics packages designed for
Windows, EDM is one of the fastest, and probably even the
fastest in the world. This matters especially if your web site
receives thousands or even tens of thousands of visitors per day.
Several tests were performed with log files of hundreds of
thousands of lines, and 1.5 million lines in one case. EDM is 12 to 15 times
faster than Deep Tracker and, in spite of the large amount of data it
produces, it can process a log 3 to 10 times faster than most other packages
(often with more reports or more columns per report).

Configurability: Most other packages let you define 'projects' but do not offer database support. With their projects, you scan a couple of logs, save the results, and that is all. Should you wish to carry out a larger study based on months of activity, or add new log files a week later, you can't. A handful of companies offer some level of database support, but always at the expense of speed. With Expert Data Miner, you can define cumulative or non-cumulative projects. The database was also designed to be fast, so switching to database mode adds only a fraction to the processing time.

Most other packages give you some predefined columns in each report. A few of them allow you to hide or discard some predefined columns. With EDM, you can not only swap two columns or insert or delete a predefined column in your reports, you can also create columns from scratch. This is any statistician's dream. You define a target, an operand and the column header, and bingo! EDM will produce the newly defined reports and columns from your log files. Combining such customizable columns with customizable filters on rows gives you a powerful tool for isolating the behavior of particular users.

If you are already using cookies to count your returning visitors, you may not wish to change your system. EDM can be configured to accept your cookies and spot returning visitors, even cookies with alphanumeric values. If you own a specialized web site and your visitors do not follow the classical incoming path of Google, Yahoo or Ask Jeeves, you can also add new search engines to the predefined database.

Originality: Many reports and concepts in EDM are pure creations that do not exist in any other package. In addition to the classical (and mandatory) reports on daily traffic, hourly traffic, etc., you will find several reports that have no equivalent.
EDM relies heavily on concepts such as entry pages and referrers. Tracking where your visitors come from and what they do in each case is the core of the application.
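
For readers unfamiliar with entry pages and referrers, the following Python sketch shows one simple way such a report could be built. It is a rough illustration under stated assumptions, not a description of EDM's reports: the `entry_report` function, the record fields (`visitor`, `time`, `page`, `referrer`), the sample URLs and the `(direct)` label are all hypothetical. The entry page is taken to be the first page a visitor requests, and the referrer of that first request indicates where the visitor came from.

```python
from collections import Counter
from urllib.parse import urlparse


def entry_report(hits):
    """Count (referrer host, entry page) pairs, one per visitor.

    Hits are dicts with 'visitor', 'time', 'page' and 'referrer'.
    Only the earliest hit of each visitor contributes to the report.
    """
    first_hit = {}
    for hit in sorted(hits, key=lambda h: h["time"]):
        # setdefault keeps the earliest hit seen for each visitor.
        first_hit.setdefault(hit["visitor"], hit)

    report = Counter()
    for hit in first_hit.values():
        # An empty referrer means the visitor arrived directly (assumed label).
        host = urlparse(hit["referrer"]).netloc or "(direct)"
        report[(host, hit["page"])] += 1
    return report


# Made-up sample data for illustration.
hits = [
    {"visitor": "a", "time": 1, "page": "/index.html",
     "referrer": "http://www.google.com/search?q=logs"},
    {"visitor": "a", "time": 2, "page": "/products.html",
     "referrer": "http://www.example.com/index.html"},
    {"visitor": "b", "time": 3, "page": "/products.html", "referrer": ""},
    {"visitor": "c", "time": 4, "page": "/index.html",
     "referrer": "http://www.google.com/search?q=stats"},
]

report = entry_report(hits)
for (host, page), count in report.most_common():
    print(host, page, count)
```

Grouping by both referrer host and entry page, rather than by either one alone, is a design choice made here to show what visitors from each source look at first.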