Should I worry about missing data?

By Rod Jacka

I am often asked, "Which method of data collection is best: log files or browser-based JavaScript?" Another common question goes along the lines of "What about people deleting cookies? I delete mine all the time...".

It is the stuff of strong opinions, and can be the cause of heated debate.

In both cases there isn't a perfect solution. Both data collection methods lose data and yes, visitors don't behave themselves and they do occasionally (or even regularly) clear their cookies.

Either way, I am not worried about it. Why? Because I am comfortable making decisions based on sampled data. Virtually every survey conducted is based on sample data. It is impractical to ask everyone to answer a survey, so we take a sample of the population and estimate the results from it. With web analytics we can apply the same thinking by treating the visitors to our website as one or more population groups.

There are two key issues that we need to understand in order to correctly use the data.

  1. The data that is collected isn't perfect, but it represents a very large sample of the population of visitors to the website, and
  2. We need to define exactly which population group we are looking to measure.

Let's start with the data. Whilst it may seem strange, your web server log files don't capture every interaction with the website. Your web browser holds a temporary copy (a cache) of web pages when you view them, and your ISP may do the same to make browsing the web faster for others. In both cases, repeat requests for the page may be served from the cache and never make it to the web server.

JavaScript-based tagging is generally thought to capture more page views. However, if a visitor has JavaScript turned off, or leaves the page before the scripts complete their work, their data is lost.

Another issue is how the tool that you are using determines what a visitor is. Cookies are not perfect, but they are the best that we have. If we use them and the cookie isn't there, then as far as the tool is concerned the returning visitor doesn't exist: they are counted as somebody new.

The end result is that we can't rely on our underlying data to be an accurate record of every visit to the website. Whilst it is difficult or even impossible to test how much data is lost, it would be safe to say that even if we lost 50% of the data, we could still draw valid conclusions from the 50% we do have, provided the missing data is not concentrated in one particular group of visitors. A 50% sample of a population is far larger than the samples used in most scientific studies.
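To put a rough number on this, the sketch below estimates the margin of error on a conversion rate measured from a partial record of visits. It is an illustration only: it uses the standard normal approximation for a proportion with a finite population correction, it assumes the lost visits are a random subset of all visits, and the visit counts and the 2% conversion rate are made-up figures.

```python
import math


def margin_of_error(rate, sample_size, population=None, z=1.96):
    """Approximate margin of error for an observed proportion.

    rate        -- observed proportion in the sample, e.g. 0.02 for 2%
    sample_size -- number of visits actually recorded
    population  -- estimated true number of visits, if known (assumption)
    z           -- z-score; 1.96 corresponds to roughly 95% confidence
    """
    standard_error = math.sqrt(rate * (1 - rate) / sample_size)
    if population is not None and population > 1:
        # Finite population correction: the sample is a large share
        # of the whole population, so the uncertainty shrinks.
        standard_error *= math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error


# Hypothetical figures: 100,000 real visits, only 50,000 recorded,
# and a 2% conversion rate observed in the recorded visits.
moe = margin_of_error(0.02, 50_000, population=100_000)
print(f"Conversion rate: 2.00% plus or minus {moe * 100:.2f} percentage points")
```

With these invented figures the uncertainty works out to roughly 0.09 of a percentage point either side of the 2% estimate. The calculation says nothing about bias: if the lost visits come mostly from one kind of visitor, no sample size will correct for that.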

The second, and probably most important, issue is defining the population group that we are measuring. The art (and the science) of all this is to find the right sample to represent the group that we want to measure. To get meaningful results we need to use a sample that represents the people we are trying to reach. This means that we need to define:

  1. The population group, i.e. who our customers, users, etc. are and where they are located.
  2. The percentage of our population group that visits and interacts with our website.
  3. The behaviours they are likely to exhibit when interacting with the website.

From this we can start to define segments and key performance indicators that we can use to observe whether we are successful or not in reaching these people.

For many sites, this is relatively straightforward. There is a product for sale or an enquiry page, and visitors arrive at the site via advertisements or search engine rankings. Once they are there, they either convert or they don't. In these cases web analytics is relatively straightforward, and simply aligning our advertising, landing pages and sales processes will improve the results (see note 1 below).

When we look at more complex sites, such as those of governments, universities and large multifaceted corporations, defining each of our population groups (read: segments) is critical to understanding what is going on with the website. For example, one of our government clients has classified its visitors into three groupings representing different levels of interaction and service needs. Broadly, these visitor segments are:

  1. Those visitors with simple needs and who only access the site on a casual basis.
  2. Those visitors who regularly access the site and download key resources more than once.
  3. The "hard core" users who use the site extensively and have specific and unique requirements.

These definitions can be used to classify visitors to the website based on their behaviour.
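As a minimal sketch of how such a classification might work, the function below assigns a visitor to one of the three groupings from a few behavioural counts. The field names and every threshold are hypothetical, chosen only to make the example concrete; they are not the client's actual rules.

```python
from dataclasses import dataclass


@dataclass
class VisitorStats:
    visits: int        # number of visits in the reporting period
    downloads: int     # key resources downloaded in the period
    pages_viewed: int  # total pages viewed in the period


def classify(visitor, regular_visits=3, heavy_visits=10, heavy_pages=50):
    """Return 'casual', 'regular' or 'hard core' (thresholds are assumptions)."""
    # Check the most demanding segment first so it takes priority.
    if visitor.visits >= heavy_visits and visitor.pages_viewed >= heavy_pages:
        return "hard core"
    # Regular: comes back often and downloads key resources more than once.
    if visitor.visits >= regular_visits and visitor.downloads > 1:
        return "regular"
    # Everyone else has simple needs and visits only casually.
    return "casual"


print(classify(VisitorStats(visits=1, downloads=0, pages_viewed=2)))
print(classify(VisitorStats(visits=5, downloads=3, pages_viewed=12)))
print(classify(VisitorStats(visits=20, downloads=8, pages_viewed=120)))
```

In practice the thresholds would be set by looking at the site's own data and agreeing with the business what "regular" and "extensive" use mean.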

A basic set of segments that you can use on your website is qualified visitors vs. unqualified visitors. A qualified visitor is someone who meets the criteria that we are interested in. For instance, let's say your business sells specialist scientific instruments to research institutions and universities in Australia. A qualified visitor is likely to be one who:

  • Is located in an Australian research institution or university.
  • Is likely to visit the website on more than one occasion whilst they research the product.
  • Will possibly request quotes, specifications and other detailed information.
  • Will probably visit more than 2 product pages during each visit.

These attributes can be used to place a visitor into the qualified visitor group. The techniques to do this are beyond the scope of this article, but the following tools provide the ability to do all or part of it:

  • ClickTracks
  • Google Analytics
  • Urchin
  • Omniture SiteCatalyst
  • HBX Analytics
  • WebTrends Marketing Lab Enterprise

If your website is a simple one, say a lead generation website or a small ecommerce site, then you probably don't need to worry too much about this. However, if your website isn't converting visitors to customers, is large or complex, or has many different audiences, then combining segmentation with a sampling methodology will give you the best results when analysing your website.

Panalysis are experts in helping you to create segments, define key performance indicators and track your website against them. For further details, please contact our sales team.



1. The problem is that overall conversion rates for selling products and services or generating enquiries will be very low. With conversion rates hovering around 2% for many websites, I am not the first person to take the pessimistic view that the glass is not 2% full, it is 98% empty. For further reading on this, I thoroughly recommend Avinash Kaushik's book "Web Analytics: An Hour a Day".


