Why Data Quality in E-commerce is So Very Important

Damien Smith - Thursday, July 06, 2017

Nothing causes senior management or a business owner to distrust the numbers faster than errors in the data. Once that trust is gone it’s very difficult to win back.

Anyone using Google Analytics Enhanced E-commerce should understand that the transactions it records will not match your accounting system exactly, but the two should be very close. Typically we aim for an error rate of less than 5%, but we have seen differences of up to 10% on healthy websites with no tracking issues.
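To put numbers on that, here is a minimal sketch that compares a figure from Google Analytics (revenue or transaction count) with the same figure from your accounting system. The function names and the three labels are hypothetical; the 5% target and the roughly 10% ceiling come from the paragraph above.

```typescript
// Percentage gap between a figure recorded in Google Analytics (revenue or
// transaction count) and the same figure from your accounting system.
function discrepancyPercent(analyticsTotal: number, accountingTotal: number): number {
  if (accountingTotal <= 0) {
    throw new RangeError("accountingTotal must be positive");
  }
  // Multiply before dividing to reduce floating-point rounding.
  return (Math.abs(accountingTotal - analyticsTotal) * 100) / accountingTotal;
}

type DiscrepancyLevel = "on target" | "within normal range" | "investigate";

// Hypothetical labels: under 5% is the target, up to about 10% has been
// seen on healthy sites, and anything above that deserves a closer look.
function assessDiscrepancy(percent: number): DiscrepancyLevel {
  if (percent < 5) return "on target";
  if (percent <= 10) return "within normal range";
  return "investigate";
}
```

For example, 9,420 transactions in Google Analytics against 10,000 in the accounting system is a 5.8% gap: just outside the target, but still within the range seen on healthy sites.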

Some of the key errors that might be influencing your data are listed below:

Missing Transactions

Whilst a small number of missing transactions is virtually unavoidable, tracking errors can sometimes creep in and make the gap much larger.

Common issues that cause this are:

  • Unexpected characters in the JavaScript code, e.g. the $ or , characters appearing in the price field, or quotation marks in the product description.
  • Incomplete data being sent to Google Analytics. A common cause is the user closing the browser because the confirmation page takes too long to load.
  • The customer not being sent to the confirmation page that sends the data to Google Analytics. PayPal and other external payment gateways can cause this if the customer doesn’t fully complete the process and return to the thank-you page.
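For the first issue, one approach is to clean values on the server before they are written into the tracking code. This is a minimal sketch; cleanPrice and toJsStringLiteral are hypothetical helpers, and cleanPrice assumes "." is the decimal separator and "," the thousands separator, so it would mangle formats such as "1.299,00".

```typescript
// Hypothetical helper: turn a display price such as "$1,299.00" into the
// plain numeric string "1299.00" for the tracking code.
function cleanPrice(raw: string): string {
  // Keep only digits, the decimal point and a minus sign.
  const stripped = raw.replace(/[^0-9.-]/g, "");
  const value = Number.parseFloat(stripped);
  if (Number.isNaN(value)) {
    throw new Error(`Unparseable price: ${raw}`);
  }
  return value.toFixed(2);
}

// Hypothetical helper: quote free text (e.g. a product description) so that
// apostrophes and quotation marks cannot break the surrounding JavaScript.
function toJsStringLiteral(text: string): string {
  // JSON.stringify escapes quotes and backslashes; "<" is also escaped so a
  // "</script>" inside the text cannot close an inline script tag.
  return JSON.stringify(text).replace(/</g, "\\u003c");
}
```

The output of toJsStringLiteral already includes its surrounding quotes, so it is inserted into the page as-is rather than wrapped in another pair of quotes.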

Too Many Transactions

If the total count of transactions is higher than expected, it is likely that some customers are reloading the confirmation page that sends the data to Google Analytics, causing those transactions to be double counted.

This can happen if you send the customer a confirmation email that encourages them to return to their order details, and your system reloads the Google Analytics code when they visit that page. It can also happen if the customer refreshes their confirmation page.

The solution is to send the data to Google Analytics only the first time. This may require modifications to your shopping cart, as many carts reload the full transaction details whenever the confirmation page is refreshed.
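The "send only once" rule can be sketched as follows, assuming your cart can record which transaction IDs have already been sent. SentLog, InMemorySentLog and sendTransactionOnce are hypothetical names, and the in-memory set only stands in for something persistent, such as a flag on the order record.

```typescript
// Hypothetical record of transaction IDs already sent to Google Analytics.
// In a real cart this would be persistent (e.g. a flag on the order).
interface SentLog {
  has(transactionId: string): boolean;
  add(transactionId: string): void;
}

// In-memory stand-in, for illustration only.
class InMemorySentLog implements SentLog {
  private readonly ids = new Set<string>();
  has(transactionId: string): boolean {
    return this.ids.has(transactionId);
  }
  add(transactionId: string): void {
    this.ids.add(transactionId);
  }
}

// Call `send` only the first time an order is seen. Returns true if the
// transaction was sent, false if it was skipped as a repeat.
function sendTransactionOnce(
  transactionId: string,
  log: SentLog,
  send: (id: string) => void,
): boolean {
  if (log.has(transactionId)) {
    return false; // page refresh or revisit: skip to avoid double counting
  }
  send(transactionId);
  log.add(transactionId); // marked only after a successful send
  return true;
}
```

Marking the order only after send returns means a failed send can be retried on the next page load.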

To check whether your data includes any duplicate transactions, click on the link below to view a custom Google Analytics report and select your normal Google Analytics view when prompted. The number of transactions per transaction ID should always be “1”.

Duplicate Transaction Report
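The same check can be run on an export of that report. This sketch assumes a hypothetical row shape of transaction ID plus transaction count; it flags any ID counted more than once.

```typescript
// One row of a report export (hypothetical shape): a transaction ID and
// the number of transactions recorded against it.
interface TransactionRow {
  transactionId: string;
  transactions: number;
}

// Return the transaction IDs that were counted more than once.
function findDuplicateTransactions(rows: readonly TransactionRow[]): TransactionRow[] {
  // Merge repeated IDs first, in case the export splits one ID across rows
  // (for example, when a second dimension such as date is included).
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.transactionId, (totals.get(row.transactionId) ?? 0) + row.transactions);
  }
  const duplicates: TransactionRow[] = [];
  totals.forEach((transactions, transactionId) => {
    if (transactions > 1) duplicates.push({ transactionId, transactions });
  });
  return duplicates;
}
```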

Not Tracking Refunds

Refund tracking is a little harder to implement, but if your store receives a large number of refunds, leaving them out can significantly bias the data. Google Analytics can reverse transactions when you send it a refund. Removing these sales is important to ensure that you accurately measure the results of your marketing campaigns.
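As a sketch of what sending a refund looks like with the analytics.js Enhanced Ecommerce plugin ("ec"): a full refund needs only the transaction ID, while a partial refund also lists the product IDs and quantities. It assumes the tracker has already been created and the plugin loaded with ga("require", "ec"); sendRefund, RefundLine and the recording stub are hypothetical names so the example runs outside a browser.

```typescript
// Signature of the analytics.js command queue function `ga`.
type Ga = (command: string, ...fields: unknown[]) => void;

// One refunded product line for a partial refund.
interface RefundLine {
  id: string; // product ID as originally sent with the purchase
  quantity: number; // number of units refunded
}

// No lines = refund the whole transaction; lines = partial refund.
function sendRefund(ga: Ga, transactionId: string, lines: RefundLine[] = []): void {
  for (const line of lines) {
    ga("ec:addProduct", { id: line.id, quantity: line.quantity });
  }
  ga("ec:setAction", "refund", { id: transactionId });
  // Enhanced Ecommerce data travels with the next hit; a non-interaction
  // event sends it without affecting bounce rate.
  ga("send", "event", "Ecommerce", "Refund", { nonInteraction: 1 });
}

// Recording stand-in for `ga`; on a live page, pass the global `ga` instead.
function makeRecordingGa(): { ga: Ga; calls: unknown[][] } {
  const calls: unknown[][] = [];
  const ga: Ga = (command, ...fields) => {
    calls.push([command, ...fields]);
  };
  return { ga, calls };
}
```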

Incorrect Sales Attribution

This is the number one issue that we are brought in to solve time and time again.

There are two main causes of this:

Domain Issues

The first occurs when your payment gateway is on a different domain and the final confirmation page is on that domain. The symptom that you will see is that all sales are attributed to that domain, because the customer is tracked as a different user on the payment gateway site.

The solution is to implement cross-domain tracking so that the user is tracked consistently across both domains.

Tip: Use Google Tag Manager. It makes solving this problem much, much easier than doing this using straight Google Analytics code.
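For sites using the analytics.js snippet directly rather than Google Tag Manager, cross-domain tracking is handled by the linker plugin. The sketch below assumes the payment gateway lets you place the same tracking code on its pages, which cross-domain tracking requires; setUpCrossDomain, the property ID "UA-XXXXX-Y" and the recording stub are placeholders.

```typescript
// Signature of the analytics.js command queue function `ga`.
type Ga = (command: string, ...fields: unknown[]) => void;

// Create a tracker that accepts linker parameters and decorates links to
// the listed domains with the _ga client ID, so the visitor keeps the same
// identity on both sites. Run the same setup on every linked domain.
function setUpCrossDomain(
  ga: Ga,
  propertyId: string,
  linkedDomains: string[],
  decorateForms: boolean, // also decorate form actions (common with gateways)
): void {
  ga("create", propertyId, "auto", { allowLinker: true });
  ga("require", "linker");
  // false: pass the parameter in the query string rather than the URL fragment.
  ga("linker:autoLink", linkedDomains, false, decorateForms);
}

// Recording stand-in for `ga`; on a live page, pass the global `ga` instead.
function makeRecordingGa(): { ga: Ga; calls: unknown[][] } {
  const calls: unknown[][] = [];
  const ga: Ga = (command, ...fields) => {
    calls.push([command, ...fields]);
  };
  return { ga, calls };
}
```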

Referrer Issues

The second occurs when the customer is sent to a payment gateway on another domain and then redirected back to your own store to see the confirmation page. In this case you might see your own site, PayPal or some other payment gateway as the source of your sales.

The solution is to add all of your own domains and the payment gateway domains to the referral exclusion list in your Google Analytics property settings.
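The exclusion list itself is set in the Google Analytics admin screens, but a short audit can show which domains belong on it. This sketch assumes you have a list of source values from an acquisition report; sourcesToExclude is a hypothetical helper and the domains in the example are placeholders.

```typescript
// Return the traffic sources that are one of the given domains or a
// subdomain of one (e.g. "www.paypal.com" matches "paypal.com").
function sourcesToExclude(sources: readonly string[], domains: readonly string[]): string[] {
  const flagged: string[] = [];
  for (const source of sources) {
    const host = source.trim().toLowerCase();
    const matches = domains.some((domain) => {
      const d = domain.trim().toLowerCase();
      // Require a "." boundary so "notpaypal.com" does not match "paypal.com".
      return host === d || host.endsWith("." + d);
    });
    if (matches && flagged.indexOf(host) === -1) {
      flagged.push(host);
    }
  }
  return flagged;
}
```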

There are many other ways in which you can receive unexpected data. If you need help in solving these please contact us for support.

The problem with prediction, in particular presidential prediction

Rod Jacka - Thursday, November 10, 2016

I really like the word phantasmagorical but I rarely have a chance to use it.

It’s by no means a common word – so a simple definition might be useful:

‘a confusing or strange scene that is like a dream because it is always changing in an odd way’
Merriam-Webster Online Dictionary

With that in mind, it’s fair to say that Donald Trump’s election surprised more than a few people, and that the word phantasmagorical is highly justified. From many perspectives, it was inconceivable that someone with his rhetoric and political inexperience would be elected leader of the USA.

From the start, Nate Silver’s FiveThirtyEight website has provided great insight. It has a solid history of good predictions and is seen as a reliable source of information, condensing large numbers of variables into a single forecast. During the election I also chose to revisit Nassim Taleb’s book Antifragile as a companion to FiveThirtyEight. Ever since reading his first two books, Fooled by Randomness and The Black Swan, several years ago, I have been fascinated with statistics, volatility and randomness in general.

A key message I took from Taleb’s books is that rare events cannot be predicted. The world is a chaotic, complex system with many variables whose interactions lead to unexpected and unpredictable outcomes. Whilst we can measure many things using solid historical data, it is the unseen variables, and the even less knowable interactions between seen and unseen variables, that can quickly send predictions through the floor… as we saw.

As a long-term practitioner of analysis using statistical tools and approaches, I have come to respect that models are just models – they are not the world itself. In the oft-quoted words of George E. P. Box and Norman R. Draper (1987), “Essentially, all models are wrong, but some are useful.” Equally, I have come to understand that to use statistical tools correctly you absolutely must deeply understand how they work and what their limitations are. (Like whether the media is involved?)… a side thought.

There are many cases where forecasts and predictions based on data and analytics have worked, but we must always remember that these tools make predictions based on past observations. There have been 58 presidential elections in the entire history of the USA. Any model trained to predict an event that occurs only once every four years, with just 58 outcomes across 227 years, is likely to get it wildly wrong from time to time. To give an analogy, all swans are white until you see a black one.

What I think we have seen with the US election results, and with Brexit too, are black swan events. They are emerging from many factors that are leading towards some large changes in the future. It seems very likely to me that the forecasts made during the election campaign were inaccurate simply because nothing like this had been observed before.


http://www.dailywire.com/news/10660/just-how-wrong-were-pollsters-final-polls-vs-final-james-barrett

Statistics and predictive analytics are great tools, but I also strongly encourage you to examine and understand the work of Nassim Taleb, Daniel Kahneman, Amos Tversky, Philip Tetlock and the many others working to understand human behaviour, the limitations of our minds and complexity science. They may not help you to predict your outcomes, but adding their approaches to your toolkit will help you better understand the world, appreciate the limitations of data and analytics, and become a much better marketer and analyst.