Comparing the statistics of different site-centric systems

When comparing statistical values of two site-centric systems, the following issues should be taken into consideration:

  1. The range of web pages that are measured by the site-centric systems should be exactly the same.
    If one site-centric system measures a different range of pages than the other, the statistical values may differ, above all the number of page views. Most page-view discrepancies are in fact caused by measuring different ranges of pages; a typical symptom is that the difference in the number of cookies is much smaller than the difference in page views.
    It is recommended to compare the statistics for a single chosen page (e.g. the main page only). If the statistics for that page are equal in both systems while the site-wide totals differ, this may mean that one system measures all the pages that were intended to be examined and the other omits some pages (because the tracking tags have not been inserted into those pages).

  2. Definitions of the indicators that are being compared must be the same.
    The same indicator may have different definitions in different site-centric systems.
    In particular, it is important to know how each system defines a user. If both systems count cookies, verify how each system creates new cookies (and whether it can distinguish between page views with and without cookies). For visits, verify whether both systems use the same maximum time gap between two consecutive page views within one visit.

  3. Comparing daily statistics.
    Compare the statistics day by day and check whether the percentage difference remains roughly constant from day to day. If the difference is significantly higher on a particular day, it usually means that the tags did not work for the whole day (e.g. because they had been removed from the web pages or because the site-centric system failed).

  4. Exclusion of part of the traffic by one of the site-centric systems.
    Verify which ranges of IP addresses each system ignores when calculating the statistical values (e.g. traffic generated from the company's own IP addresses may be excluded by one system but not by the other).

  5. The position where the tags are inserted into the web page.
    Tags placed closer to the beginning of the HTML code (e.g. in the HEAD section) usually run earlier than tags placed further down the page (e.g. just before the closing BODY tag). They are therefore also executed for visitors who leave before the page has fully loaded, so a system with tags placed higher may report higher statistics than one with tags placed lower.

  6. Technical differences in the construction of tracking tags.
    One such difference is the quality of the protection against caching: if a browser or proxy serves the tag request from its cache, the page view never reaches the measuring server. Another is whether the tag downloads a .js file or a .gif (or other image) file. A .js file is usually requested earlier than an image, but if the measuring server fails, the script request can delay the display of the page. A further source of bias is the NOSCRIPT section. A browser that does not run the SCRIPT section executes the NOSCRIPT section instead, so the site-centric system can still count those page views. On the other hand, some browsers do not handle the NOSCRIPT section correctly (e.g. IE 4 runs both the SCRIPT and the NOSCRIPT sections, so a single page view may be counted twice), and images defined in the NOSCRIPT section can be downloaded by search-engine spiders, which also inflates the number of page views.
    Finally, the Internet protocols do not guarantee that every request will reach the server.
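For item 1, the per-page check can be sketched in Python. The sketch assumes that each system can export page views per page path as a dictionary; the function names `compare_pages` and `page_difference` are hypothetical. Pages reported by only one system are candidates for missing tags, and the main-page difference shows whether the systems agree on a page where both are tagged.

```python
def compare_pages(views_a: dict[str, int], views_b: dict[str, int]) -> dict[str, list[str]]:
    """Group page paths by which system reported them (hypothetical helper)."""
    pages_a, pages_b = set(views_a), set(views_b)
    return {
        "only_a": sorted(pages_a - pages_b),  # possibly untagged for system B
        "only_b": sorted(pages_b - pages_a),  # possibly untagged for system A
        "both": sorted(pages_a & pages_b),
    }


def page_difference(views_a: dict[str, int], views_b: dict[str, int], page: str) -> float:
    """Percentage difference of system B relative to system A for one page."""
    return 100.0 * (views_b[page] - views_a[page]) / views_a[page]


# Example with made-up numbers: "/contact" is missing from system B.
a = {"/": 1000, "/news": 400, "/contact": 50}
b = {"/": 990, "/news": 395}
print(compare_pages(a, b))         # {'only_a': ['/contact'], 'only_b': [], 'both': ['/', '/news']}
print(page_difference(a, b, "/"))  # -1.0
```

Here the main page differs by only 1%, while "/contact" is counted by system A alone, which points to a missing tag rather than a general measurement difference.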
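For item 2, the effect of the maximum time gap between page views can be shown with a small sessionization sketch. It assumes that a visit ends when the gap between two consecutive page views of the same cookie is longer than `max_gap`; whether a gap exactly equal to the limit starts a new visit is another detail that may differ between real systems. `count_visits` is a hypothetical name.

```python
from datetime import datetime, timedelta


def count_visits(timestamps: list[datetime], max_gap: timedelta) -> int:
    """Count visits of one cookie: a new visit starts when the gap since
    the previous page view is longer than max_gap."""
    if not timestamps:
        return 0
    ordered = sorted(timestamps)
    visits = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > max_gap:
            visits += 1
    return visits


# Three page views of one cookie, with gaps of 25 and 35 minutes.
views = [datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 25), datetime(2024, 5, 1, 11, 0)]
print(count_visits(views, timedelta(minutes=30)))  # 2
print(count_visits(views, timedelta(minutes=20)))  # 3
```

The same page views yield two visits with a 30-minute limit and three with a 20-minute limit, so visit counts from two systems are only comparable when their limits match.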
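For item 3, the day-by-day comparison can be sketched as follows. The sketch assumes daily totals keyed by ISO date strings and flags days whose difference deviates from the median daily difference by more than `tolerance` percentage points; using the median as the reference and the 5-point default are illustrative assumptions, and the function names are hypothetical.

```python
from statistics import median


def daily_differences(a: dict[str, int], b: dict[str, int]) -> dict[str, float]:
    """Percentage difference of system B relative to system A, for each day present in both."""
    return {day: 100.0 * (b[day] - a[day]) / a[day] for day in sorted(a.keys() & b.keys())}


def suspicious_days(diffs: dict[str, float], tolerance: float = 5.0) -> list[str]:
    """Days whose difference deviates from the typical (median) difference by more than tolerance."""
    typical = median(diffs.values())
    return [day for day, d in diffs.items() if abs(d - typical) > tolerance]


# Made-up page-view totals: on 2024-05-03 system B lost a large part of the traffic.
a = {"2024-05-01": 1000, "2024-05-02": 1200, "2024-05-03": 1100}
b = {"2024-05-01": 970, "2024-05-02": 1164, "2024-05-03": 700}
diffs = daily_differences(a, b)
print(suspicious_days(diffs))  # ['2024-05-03']
```

A stable difference of about 3% on most days with one outlier is the pattern that suggests the tags were not working on that day.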
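For item 4, if per-request IP addresses are available for the system that does not apply an exclusion, its page views can be split using the other system's excluded ranges to see how much of the gap the exclusion explains. The sketch uses Python's standard `ipaddress` module; the function name `split_by_exclusion` and the input lists are hypothetical, and the addresses come from documentation ranges.

```python
import ipaddress


def split_by_exclusion(ips: list[str], excluded: list[str]) -> tuple[list[str], list[str]]:
    """Split page-view IP addresses into (kept, dropped) using CIDR ranges such as '192.0.2.0/24'."""
    networks = [ipaddress.ip_network(cidr) for cidr in excluded]
    kept, dropped = [], []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
        (dropped if any(addr in net for net in networks) else kept).append(ip)
    return kept, dropped


ips = ["192.0.2.10", "198.51.100.7", "192.0.2.200", "2001:db8::1"]
kept, dropped = split_by_exclusion(ips, ["192.0.2.0/24"])
print(len(dropped), "of", len(ips), "page views fall into excluded ranges")
```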

If two site-centric systems report values of a given indicator that differ by more than 5%, it is worth investigating the reasons for the discrepancy using the checklist above.
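The 5% rule of thumb can be applied to several indicators at once. The sketch below takes system A as the base of the relative difference, which is an assumption (the text does not say which value is the base); `discrepancies` is a hypothetical helper, and indicators with a zero reference value are skipped to avoid division by zero.

```python
def discrepancies(a: dict[str, float], b: dict[str, float], threshold: float = 5.0) -> dict[str, float]:
    """Indicators whose relative difference (in percent of system A's value) exceeds threshold."""
    result = {}
    for name in sorted(a.keys() & b.keys()):
        if a[name] == 0:
            continue  # no meaningful relative difference against a zero reference
        diff = 100.0 * abs(b[name] - a[name]) / a[name]
        if diff > threshold:
            result[name] = diff
    return result


# Made-up monthly totals from two systems.
a = {"page_views": 100000, "cookies": 20000, "visits": 30000}
b = {"page_views": 88000, "cookies": 19600, "visits": 29000}
print(discrepancies(a, b))  # {'page_views': 12.0}
```

Only page views exceed the threshold here, while cookies differ by 2%, which matches the pattern described in item 1 for a differing range of measured pages.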

Please note that this document refers to comparing statistics collected by site-centric systems that use JavaScript tags inserted into the HTML code of web pages. When comparing a site-centric system with a web-log analyzer, additional issues must be taken into account (e.g. the difference between a browser requesting a page and the page actually being displayed in the browser, or pages downloaded by search-engine spiders).


