Why does my web reporting tool show different numbers to the old one that we used to use?
It is a common question, and one that inevitably comes up when the technology changes. You move from one web analytics tool to another and find that even the simplest metrics, such as the number of visitors, change. Like most people, your first question is likely to be: “shouldn’t they report the same numbers?”
In an ideal world, yes, they would. However, there are a number of factors to consider. These are:
- How the data is collected
- How the software determines what a visitor is
- How the software filters the data it reports on
Different Data Collection Methods
There are three primary ways in which data is collected for reporting. These are:
- Web server log files.
- Page tagging, where a small script embedded in each page reports the page view to a collection server.
- Packet-based collection, where a dedicated device inspects the network traffic to a web server and records this information.
The last option is relatively uncommon.
The problem is that each of these three methods has its weaknesses. Both log files and packet-based collection rely on the end user requesting a web page and the web server delivering it. It sounds logical that every page view would be recorded in the log files. That is, until we factor in proxy servers and the web browser cache.
A large ISP, and even some corporate networks, will have a proxy server that temporarily saves a copy of a web page and makes it available to the customers within that network. When the first customer from that network requests the file, your web server delivers it and records the transaction in the log file. When a second or subsequent visitor requests the page, they get it from the proxy server and not from your web server, so it is likely that nothing about this event will be recorded in the web server log files. A similar thing happens with your own web browser. Once you have viewed a page for the first time then, depending on your browser settings, later visits to that page may never reach the web server.
In short, if the web server doesn’t record the event, then it is missing from the reports.
Again as above, if it isn’t recorded then it will be missing from the reports.
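To make the caching effect concrete, here is a minimal sketch in Python. It is a toy model and not how any real proxy or browser decides what to cache: the function name `simulate`, the visitor names and the page paths are all invented for illustration, and it assumes every visitor sits behind the same proxy.

```python
def simulate(requests, shared_proxy=True):
    """Return (page views that really happened, hits written to the server log).

    Toy model: every visitor is assumed to sit behind one shared proxy,
    and both caches keep a page forever once it has been fetched.
    """
    proxy_cache = set()    # pages held by the ISP or corporate proxy
    browser_cache = {}     # visitor -> pages held in that visitor's browser
    server_log = []        # what a log file tool would get to see

    for visitor, page in requests:
        seen = browser_cache.setdefault(visitor, set())
        if page in seen:
            continue       # answered by the browser cache, server never contacted
        seen.add(page)
        if shared_proxy and page in proxy_cache:
            continue       # answered by the proxy, server never contacted
        proxy_cache.add(page)
        server_log.append((visitor, page))

    return len(requests), len(server_log)


# Hypothetical traffic: four real page views.
requests = [
    ("alice", "/home"),
    ("bob", "/home"),      # same network as alice, so the proxy answers
    ("alice", "/home"),    # alice's own browser cache answers
    ("carol", "/pricing"),
]

views, logged = simulate(requests)
print(f"page views: {views}, recorded in log: {logged}")
```

In this made-up example only two of the four page views reach the server, so a log file tool would report half of what actually happened.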
Different Visitor and Session Detection Methods
All web analytics applications use rules to determine what counts as a visit or session on the website. Generally speaking they all use a timeout value (usually 30 minutes of inactivity), so that a very long session is either cut off or treated as multiple visits.
A web log file tool will usually offer a choice of how a visitor is identified. The options fall into three main groups:
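As a rough illustration of the timeout rule, the sketch below counts sessions for a single visitor. It is one plausible reading of the rule, not the algorithm of any particular product; the function name `count_sessions` and the timestamps are invented for the example.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the commonly used default


def count_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Count sessions for ONE visitor from the times of their requests.

    A new session starts whenever the gap since the previous request
    is longer than the timeout.
    """
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timeout:
            sessions += 1
        previous = ts
    return sessions


# Hypothetical requests from one visitor on one morning.
hits = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),   # 10 minutes after the previous request
    datetime(2024, 1, 1, 9, 45),   # 35 minutes after the previous request
    datetime(2024, 1, 1, 10, 5),   # 20 minutes after the previous request
]

print(count_sessions(hits))                         # 30 minute timeout
print(count_sessions(hits, timedelta(minutes=15)))  # 15 minute timeout
```

The same four requests count as two visits with a 30 minute timeout and three visits with a 15 minute timeout, which is one reason two tools can disagree on identical traffic.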
- IP address based
- Cookie based
- Login based (authenticated users)
The last option is relatively rare.
Within these groups, each product has subtle variations in how visitors are calculated. The general industry recommendation is to use a cookie based tracking method with a visitor’s session set to time out if there is no activity after 30 minutes.
In summary the methods are:
IP address – usually combined with the web browser’s signature – this is the least accurate method, however it is commonly the default method used by web server log file analysis tools.
Cookies – This is more accurate, however visitors do delete cookies or may choose not to accept them. It is commonly considered the best practicable method to detect visitors.
Authenticated users – This is where a visitor logs in to the website using a user name and password. Where this data is available, a very high level of accuracy can be achieved. However, it is generally not practical on a public internet website, because the unauthenticated visitor traffic is usually of significant importance.
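The sketch below shows how the three methods can give three different visitor counts for exactly the same traffic. The hits are invented for illustration (two real people, Ann and Bob), and the field names `ip`, `agent`, `cookie` and `login` are assumptions rather than the format of any particular log file.

```python
# Hypothetical hits from two real people. Ann changes network several times
# and clears her cookies once; Bob shares a proxy address with Ann.
hits = [
    {"ip": "203.0.113.5",  "agent": "Firefox", "cookie": "c1", "login": "ann"},
    {"ip": "203.0.113.5",  "agent": "Firefox", "cookie": "c2", "login": "bob"},
    {"ip": "198.51.100.7", "agent": "Firefox", "cookie": "c1", "login": "ann"},
    {"ip": "192.0.2.44",   "agent": "Firefox", "cookie": "c1", "login": "ann"},
    {"ip": "192.0.2.44",   "agent": "Firefox", "cookie": "c3", "login": "ann"},
    {"ip": "192.0.2.99",   "agent": "Firefox", "cookie": "c3", "login": "ann"},
]


def count_visitors(hits, key):
    """Count distinct visitors, where `key` decides what identifies a visitor."""
    return len({key(hit) for hit in hits})


by_ip = count_visitors(hits, lambda h: (h["ip"], h["agent"]))  # IP + browser signature
by_cookie = count_visitors(hits, lambda h: h["cookie"])        # cookie based
by_login = count_visitors(hits, lambda h: h["login"])          # authenticated users

print(f"IP based: {by_ip}, cookie based: {by_cookie}, login based: {by_login}")
```

Here the IP based count is 4, the cookie based count is 3 and the login based count is 2, although only two people visited. Real traffic is messier, but the direction of the disagreement is typical.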
Different Filters
Depending on how a reporting/analytics tool is implemented or designed, it may make use of one or more filters to exclude certain transactions from the reports.
The reason for this is that a large percentage of traffic to a website is likely to be of little interest to a business owner or manager. Search engines, for instance, send automated software programs to index a website on a regular basis, and email harvesters repeatedly crawl sites looking for email addresses. There are many automated applications like these, and their traffic should be discounted from any reporting. Most log file based software products do not filter this traffic out of their reports, which results in over counting.
If there is even a slight variation in any or all of these three factors, then your reports will differ between the software products, sometimes greatly.
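A simple filter of this kind might look like the sketch below. It is deliberately naive: the marker list `BOT_MARKERS` and the log entries are made up for the example, and real products typically rely on maintained lists of known robots rather than a handful of substrings.

```python
# Hypothetical substrings that suggest an automated client.
BOT_MARKERS = ("bot", "crawler", "spider", "harvest")


def is_automated(user_agent):
    """Guess whether a request came from an automated program."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


# Hypothetical (page, user agent) pairs taken from a log file.
log = [
    ("/home", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/121.0"),
    ("/home", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/contact", "ExampleEmailHarvester/1.0"),
    ("/pricing", "Mozilla/5.0 (Macintosh) Safari/605.1.15"),
]

unfiltered = len(log)
filtered = sum(1 for _, user_agent in log if not is_automated(user_agent))

print(f"without filter: {unfiltered} page views, with filter: {filtered}")
```

A tool that applies no filter would report all four page views here, while a tool with this filter would report two.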
The next question that you are likely to have is “which is the most accurate method or product?”
Unfortunately it isn’t that simple and there is no definitive answer to this question. The best option is to stick to one tool. If you do need to change or use multiple tools, then planning a transition between the old and the new numbers is a good step. Run the two tools together for a period of time (say 3 months) and then move across to the new reporting system at the end of this period. If you do this though, be very sure to have clear communication as to why the change in tool was necessary and to provide information in advance that each report user should expect to see changes in their numbers.
If this seems too painful to consider, then an alternative approach is to use the original tool to report figures to management and the new tool for the detailed work for which it was purchased.
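If you do run two tools side by side, a small comparison like the sketch below can help you explain the change to report users in advance. The weekly figures are invented purely for illustration and the function name `compare` is hypothetical. The idea is only to show how far apart the tools are and whether that gap is stable.

```python
# Hypothetical weekly visitor counts from the parallel-run period.
old_tool = {"week 1": 1200, "week 2": 1100, "week 3": 1300}
new_tool = {"week 1": 900, "week 2": 880, "week 3": 975}


def compare(old, new):
    """Return (period, old count, new count, new/old ratio) for each period."""
    rows = []
    for period in old:
        ratio = new[period] / old[period]
        rows.append((period, old[period], new[period], round(ratio, 2)))
    return rows


rows = compare(old_tool, new_tool)
for period, old_count, new_count, ratio in rows:
    print(f"{period}: old {old_count}, new {new_count}, ratio {ratio}")
```

If the ratio stays roughly constant over the period, as it does in these made-up figures, report users can be told in advance roughly what size of shift to expect when the old tool is switched off.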