I- Expert Data Miner - Getting Started


Suppose that you are using Expert Data Miner (EDM) for the first time and that you have already fetched your log files from your web server. Here we will use Apache log files. The first step is to open your log files in the default project.

In this case, as long as you have enough memory, it is preferable to open several log files at once. To select several files, hold the Shift key while clicking with the mouse:

The dialog above appears in French because my Windows XP workstation is in French; this is controlled by your Windows version, not by EDM. The next step is to parse the log files. Since you are using the system for the first time, you are prompted for the domain name to use. This domain will be used for the default project; if you manage several websites, you will have to create a project for each site rather than rely on the default project.

The fictitious domain name typed here is 'shonxxx.com', not 'www.shonxxx.com' or 'http://www.shonxxx.com'. Typing a wrong domain name would negatively affect one report, External Referrers.
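To see why the bare domain matters for External Referrers, here is a minimal Python sketch of how a referrer could be classified as internal or external. It is not EDM's actual code; the function name is_external_referrer and the matching rule (exact host or any subdomain of the project domain) are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def is_external_referrer(referrer, project_domain):
    """Return True when the referrer's host lies outside project_domain.

    Hypothetical helper: EDM's real matching rule is not documented here.
    """
    host = (urlsplit(referrer).hostname or "").lower()
    domain = project_domain.lower()
    if not host:
        # "-" or an empty field: a direct visit, so there is no referrer.
        return False
    # The bare domain also covers subdomains such as www.shonxxx.com.
    return not (host == domain or host.endswith("." + domain))
```

With the domain typed as 'shonxxx.com', a referrer such as http://www.shonxxx.com/page.html is recognized as internal. With 'http://www.shonxxx.com' typed instead, no host can ever match, so every internal link would be counted as an external referrer.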

The Default (main page) field can be left blank in most cases, unless your default page is not index.htm, index.html, default.asp or default.aspx (on IIS). When someone visits your domain, they can type http://www.yourdomain.com/ or, in some cases, http://www.yourdomain.com/index.html (or default.asp on IIS) and get the same page. EDM will merge the statistics of all those different requests into one equivalent page, "/", your root page, provided that it knows which equivalent pages to use. Equivalences like index.html or default.asp are so frequent that EDM handles them by default.
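The merging of equivalent pages can be pictured with the short Python sketch below. It is only an illustration under stated assumptions: the names DEFAULT_PAGES and normalize_path are hypothetical, and the sketch handles the root page only, since that is the only case described above.

```python
# Default page names that, per the text above, are recognized without configuration.
DEFAULT_PAGES = {"index.htm", "index.html", "default.asp", "default.aspx"}

def normalize_path(path, extra_default=""):
    """Map a request for the site's default page onto the root page "/".

    extra_default plays the role of the Default (main page) field:
    a custom default page name, e.g. "home.php" (hypothetical example).
    """
    defaults = set(DEFAULT_PAGES)
    if extra_default:
        defaults.add(extra_default.lower())
    # Only a file directly under the root is treated as the root page.
    if path.startswith("/") and path[1:].lower() in defaults:
        return "/"
    return path
```

For example, normalize_path("/index.html") and normalize_path("/default.asp") both give "/", while normalize_path("/home.php") is left unchanged unless "home.php" is supplied as the custom default page.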

The query extensions are the characters that come after the "?" in a request to your pages. Often with PHP, ASP, Perl or other kinds of dynamic pages, the output is controlled by the query extensions; for example, http://mypage.asp?id=4 and http://mypage.asp?id=48 will not display the same content. Your log files contain the whole string, but if you wish to group the statistics related to http://mypage.asp into a single page, you can discard the query extensions.

If you have a page like http://mypage.asp?user=A7dhn8562&idPage=4&uid=33&grp=100, you can also decide to selectively discard some query extensions and keep others. If every hit on mypage.asp contains a different user number, your statistics are useless, because each request will be counted as a single hit on a different page. Since the variable 'user' plays no logical role in the output content, it is better to selectively discard this query extension here. Discarding the variable 'user' means that the string user=A7dhn8562 is discarded, along with its '&' separator.

You can also decide later to selectively discard or keep query variables (and their content) for all your pages or for some specific pages. All you need to do is go to the menu Configuration, Global and choose the option Parameters. For now, let's parse the log files.
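The selective discarding of query variables described above can be sketched in Python as follows. This is an illustration, not EDM's implementation; the function name discard_query_vars is hypothetical, and the sketch relies only on the standard urllib.parse module.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def discard_query_vars(url, discard):
    """Remove the named query variables (and their values) from a request URL.

    Hypothetical helper: `discard` is a set of variable names such as {"user"}.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in discard
    ]
    # urlencode joins the remaining pairs with '&', so a discarded variable
    # takes its separator with it; an empty query drops the '?' as well.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Applied to /mypage.asp?user=A7dhn8562&idPage=4&uid=33&grp=100 with discard={"user"}, the result is /mypage.asp?idPage=4&uid=33&grp=100, so all visitors to that page are counted together regardless of their user number.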

To do so, press the second button under the 'Run' command in the menu. A progress dialog appears, and after a few seconds the task is over.

Let's choose the General Data - Traffic section, which shows the daily activity:

The '% of people who visited your root page' column is a user-defined column. To see how it is defined, I will click on the button showing a hammer and a screwdriver, just beside the light bulb. The following configuration screen appears:

Here you can configure either the layout of your screen or the layout of your HTML report, if you decide to export these results to an HTML report later. There is one such screen for each report. The combo box Sorted By defines the default sort order used when you open this report or output it to an HTML file. You can always change the current sort column by clicking on a column header in the main reports.

The column '% who visit root page' can be selected and deleted. You can also add a new column from a pool of predefined columns, or create new columns in this pool by clicking the button Define Action. If you click this button, here is what you get:

If you want to add a new column to your report, open the combo box 'Action Type'. The available actions for this report are then shown.

If you want to know what percentage of your users request the page http://mydomain.com/sub1/mypage.html during their session, select 'Match a Page'. The same choice could be made for a downloadable file (zip, mp3, etc.). You then need to type the target in the target box:


You also need to define the column header and click on the Update List button when you have finished, before clicking Save/Exit. The Long Description & Tool Tip field is optional; since the column header is limited to 22 characters, you may prefer to get a longer description when you move your mouse over the column header in your report later on. But let's say that you don't want to add a column right now and just want to see the content of an existing action. Select the first line in the list and click the Edit button. You will get this:

The target '/' is the last character after your domain name in http://www.mydomain.com/. It is the root page. When EDM scans your log, it transforms URLs like http://www.mydomain.com/index.html into http://www.mydomain.com/, or '/', for the reason we saw earlier: you get the same page when you type either of these URLs in your browser. For IIS logs, /default.asp and /default.aspx are also transformed into '/' for the same reason.
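To make a 'Match a Page' column such as '% who visit root page' more concrete, here is a small Python sketch of the underlying arithmetic. It assumes that each session is available as a list of already normalized page paths; the function name percent_matching and the sample sessions (including /contact.html and /products.html) are made up for illustration.

```python
def percent_matching(sessions, target):
    """Percentage of sessions that contain at least one request for target."""
    if not sessions:
        return 0.0
    hits = sum(1 for pages in sessions if target in pages)
    return 100.0 * hits / len(sessions)

# Hypothetical sessions: one list of requested pages per visit.
sessions = [
    ["/", "/sub1/mypage.html"],
    ["/sub1/mypage.html"],
    ["/", "/contact.html"],
    ["/", "/products.html"],
]

root_share = percent_matching(sessions, "/")                   # 3 of 4 sessions
page_share = percent_matching(sessions, "/sub1/mypage.html")   # 2 of 4 sessions
```

Here the root page is matched in 75% of the sessions and /sub1/mypage.html in 50%. A session counts once no matter how many times the target page was requested within it.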

There is no reason to modify this now, so let's click on the Cancel Item button. The action that we will create in the pool is 'people who come from Canada', so the choice 'Coming from a Zone' is selected in the combo box Action Type.

The button Update List is then pressed, and finally the button Save/Exit. We are back in the previous screen, but we still need to add this new column somewhere in the report; for the moment it is only in the global pool, not yet attached to a report. Let's select the column Visits and then press the insert button. The new column will thus appear after the column Visits.

Once this change is made and saved, click again on the 'Parse Log' button from the main screen and wait until processing is complete.

Actions can be added or removed in nearly three quarters of the reports. You can use your imagination and spot situations where cross-linking new columns with a row will give you valuable information. Especially if you work in marketing, there are many interesting conclusions that you can draw.

This was for the default project; any time you reopen the application, you will be able to parse with the domain name that you defined earlier. If you wish to parse log files for other domain names, you can use the option Project from the main menu. You can also create a new project if you wish to use filters in a specific situation, or if you have a large number of log files to process and not enough memory with the default project. If you create a cumulative project, you need either to keep the columns that you define at the outset or to rescan all your log files whenever the format of your database changes. One advantage of a cumulative project is that you can use the option Fetch Files From Server and leave it to EDM to discard log lines that were already processed earlier. This makes it possible to update your statistics quickly when you press this button.
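How EDM recognizes lines that were already processed is not described here. As one possible illustration of the idea, the Python sketch below remembers a digest of every line it has seen and returns only the new ones. The function name new_lines_only and the digest-based approach are assumptions; a real tool might instead remember a file offset or the timestamp of the last processed line.

```python
import hashlib

def new_lines_only(lines, seen):
    """Return the lines not processed before, recording them in `seen`.

    `seen` is a set of digests that would be stored with the cumulative project.
    Caveat of this sketch: two genuinely identical log lines count only once.
    """
    fresh = []
    for line in lines:
        digest = hashlib.sha1(line.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(line)
    return fresh
```

If the same log file is fetched twice, the second call returns only the lines appended since the first fetch, so only those need to be parsed.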

Next Section: User Path
