AWStats Log Analysis Paper

 


This is all you need to know to get statistics for your web sites:
the different ways to obtain them, how they differ, how accurate they are, and so on.


How to get statistics for a website


HTML Tag counter

The idea behind this method is to add a tag to each web page hosted by the web site www.mydomain. The tag points to a page (possibly a CGI script) hosted on a server called tagserver (which may be the same server), so that each time a visitor downloads a page, the browser also sends a request to tagserver. When tagserver answers the request for this tag, it increases a value by one in a database.
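The mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names tag_html, count_app and hits, the /count URL and the page parameter are all hypothetical, the "database" is a plain in-memory counter, and the script answers with an empty response where a real counter would usually return a small image.

```python
from collections import Counter
from urllib.parse import parse_qs, quote

# Hypothetical in-memory stand-in for the database mentioned in the text.
hits = Counter()


def tag_html(page, tagserver="tagserver"):
    """Build the tag to paste into a page hosted by www.mydomain.

    The /count path and the 'page' parameter are assumptions of this sketch.
    """
    return '<img src="http://%s/count?page=%s" width="1" height="1" alt="">' % (
        tagserver,
        quote(page, safe=""),
    )


def count_app(environ, start_response):
    """WSGI application running on tagserver.

    Each request triggered by the tag adds one to the value stored
    for the page named in the query string.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    page = query.get("page", ["unknown"])[0]
    hits[page] += 1
    # Empty answer; a real counter would typically send back a tiny image.
    start_response("204 No Content", [("Cache-Control", "no-store")])
    return [b""]
```

To try it locally, the application can be served with the standard library, for example wsgiref.simple_server.make_server("", 8000, count_app).serve_forever(). Note that the counter only moves when the browser really requests the tag, which is exactly the weakness discussed in the list of drawbacks.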
What's good:
- Easy to set up for beginners hosted by a free web hosting provider.
- No need to have access to the web server log file.
- This is the only way to get some kinds of information, such as screen size, the technologies supported by the browser (Java, Flash, PDF), or the media formats the visitor's computer can play (Real, QuickTime or Windows Media files).
What's wrong:
- Depending on whether the browser uses a cache, the page and the counter link may or may not be requested, so a hit may or may not be counted.
- You need to put the tag on every page (not only the home page), because a visitor can arrive directly on a secondary page without passing through the home page. Depending on the size, popularity and content of the web site, between 5% and 95% of visits go through the home page. Adding tags to all pages can be a lot of work.
- You can't detect robots, as a lot of them do not download sub-links they identify as CGI counters.
- You miss a lot of information, such as keywords, referrers, search engines, ...
- When the same user loads a page several times by going back to a previous page, the tag link may or may not be downloaded again.
Summary:
A counter counts something, but nothing clearly defined (neither visits, nor visitors, nor page views). Because we don't know what such tools count, it is not even possible to run a study to establish an average error rate for this way of getting web statistics!

Log analysis

Because so much rubbish has been said, and is still being said, about log analysis, instead of explaining why log analysis can be good, I prefer to first list everything that is said about it and then say whether it is true or false.
You will see that a lot of people who talk about log analysis (authors of commercial products, but also of old free log analyzers) are still living in prehistoric times.
...Still a work in progress...

Applicative tracking (cookies or session ...)

...Still a work in progress...


So how does AWStats work?


As we have seen previously, there are different ways to get and compute data. AWStats was built with one goal: to be more accurate than any other tool by using the most accurate technology to compute its statistics. That's why AWStats simply uses all of those methods. The default setup (also called the 'easy setup for newbies') uses only log analysis (enhanced with all the clever tricks suggested in this paper to avoid the errors made by most log analyzers), but if you are an experienced user it is highly recommended to activate all AWStats features, so that the other methods (HTML tag counting and applicative tracking) are used as well and you benefit from all their advantages.

The conclusion is that:

A log analyzer that does not apply a very high level of intelligence when processing log files will give you very bad results. So most authors of commercial products are wrong when they say that log analysis is a very good way to get statistics. IT IS, BUT ONLY if the product uses complex rules to reduce errors, and that is not the case for their products. (I studied two of them, among the most popular, and both were so simple in their "algorithm" that the error rate was between 40% and 250%. Log analysis is more than just counting lines in a file!)

On the other hand, people who say a log analyzer can't give accurate results are also wrong, because in saying so THEY FORGET THAT THERE ARE NOWADAYS NEW TECHNIQUES (like those developed in AWStats) TO EXCLUDE OR SERIOUSLY REDUCE BROWSER CACHE, PROXY, IP LOCATION, WORM, ROBOT AND LOG WRITE BUFFERING PROBLEMS. Of course, using those techniques takes a lot of time and makes the log analyzer 2 to 3 times slower, but it gives you very precise results.
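To make the "more than just counting lines" point concrete, here is a minimal sketch in Python comparing a naive line count with a filtered page-view count. It is not the set of rules AWStats applies: the robot hints, the list of non-page extensions, the accepted status codes and the five sample log lines (which use documentation-only IP addresses) are all assumptions made up for this illustration, and real rules need to be far more elaborate.

```python
import re

# NCSA "combined" log format: host, identity, user, time, request,
# status, size, referrer, user agent.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative assumptions only, much simpler than a real analyzer needs.
ROBOT_HINTS = ("bot", "crawler", "spider")
NOT_A_PAGE = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
VALID_STATUS = ("200", "304")


def is_page_view(line):
    """Return True if the log line looks like a page viewed by a human."""
    match = LINE_RE.match(line)
    if match is None:
        return False  # corrupted or truncated record
    if match.group("status") not in VALID_STATUS:
        return False  # errors and redirects are not page views
    path = match.group("url").split("?")[0].lower()
    if path.endswith(NOT_A_PAGE):
        return False  # images, styles and scripts are hits, not pages
    agent = match.group("agent").lower()
    return not any(hint in agent for hint in ROBOT_HINTS)


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2004:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [10/Oct/2004:13:55:37 +0200] "GET /logo.gif HTTP/1.1" 200 512 "http://www.mydomain/index.html" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2004:13:56:01 +0200] "GET /index.html HTTP/1.0" 200 2326 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [10/Oct/2004:13:57:12 +0200] "GET /missing.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2004:13:58:40 +0200] "GET /docs/faq.html HTTP/1.1" 200 8120 "-" "Mozilla/5.0"',
]

naive_count = len(SAMPLE_LOG)                            # counts every line: 5
page_views = sum(is_page_view(l) for l in SAMPLE_LOG)    # filtered count: 2
```

On this tiny sample, counting lines reports 5 where only 2 requests are page views by a browser. Each extra rule costs processing time, which is the speed trade-off mentioned above.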



Other articles


Measuring Web Site Usage: Log File Analysis by Susan Haigh and Janette Megarity. Being on a Canadian government site, it's available in both English and French.
Article written by .

