Krux Uncovers Significant Data Collection Activity


SAN FRANCISCO and NEW YORK, May 7, 2013 – Third-party data collection across many leading websites continues at very significant levels, while data collection via social media/sharing widgets is growing rapidly, according to the third annual Cross-Industry Study (CIS) of web data collection activity from Krux. The full report is available for download from the Krux website.

The Krux study is the only research project focused solely on ‘data leakage’: consumer and website data collected by companies other than the site owner, often without compensation or consent. This year, in addition to the top 50 ad-supported content sites, Krux expanded its analysis to include the top 100 e-commerce and marketer sites, as well as 50 smaller content sites. Krux is the global technology leader in cloud-based consumer data management solutions.

“Though the increase of third-party data collection has moderated due to better data governance by website operators, there’s still a great deal of unknown and unwanted data harvesting happening out there,” said Tom Chavez, Krux co-founder and CEO.  “Given Krux’s role in the data ecosystem, we understand that not all collection is bad.  However, when 46 percent of this collection comes from higher risk intermediaries and market middlemen, it raises questions as to which companies are mining the data, if that collection is fully sanctioned and how that data will ultimately be used.”

Study highlights include:

  • Data collection activity that is within the website owners’ control dropped by 40 percent.  This illustrates a concerted effort on the part of many web operators to implement better data governance practices and exert greater control over what’s happening on their pages.
  • The number of third parties observed participating in data collection continued its rise, from 168 individual companies in 2011, to 300 in 2012 and 328 in 2013.
  • Collection from higher-risk categories rose as well, from 40 percent in 2012 to 46 percent in 2013.  Collectors are considered ‘higher-risk’ if there is the potential that they will use the data they collect to power competitive market activity.


“Data will be one of the core drivers of our business,” said Mark Howard, senior vice president of digital advertising strategy, Forbes Media. “Our work with the Krux platform is one of the major reasons we're comfortable entering into the data business, to help ward off data leakage and ensure that all of our media and data transactions are conducted in a secure and policy-controlled manner.”

When digging into the data collection dynamics this year, Krux uncovered a number of interesting trends:

  • Data collection volume from social media/sharing widgets, including mainstream social media principals like Twitter, Facebook and Google+, as well as intermediaries such as AddThis, grew almost 30 percent from 2012 to 2013.  Data collection from social media/sharing widgets now represents 20 percent of all third-party data collection.  This growth reflects both more aggressive collection activities by the social players as well as increased reliance on social/sharing tools by websites to grow their consumer reach. 
  • Data collection from advertising Supply Side Platforms (SSPs) dropped 70 percent from 2012 to 2013. This reflects a significant drop in volumes from AdMeld since its acquisition by and integration into Google. It may also reflect increased frequency capping, server-to-server cookie syncing and other techniques that are harder to observe at the page level.
  • Like their content counterparts, e-commerce and marketer sites experience a similarly high proportion of third-party collection activity that is beyond their control: 60 percent and 54 percent, respectively.

“In expanding the scope of this year’s report, we uncovered one of our most compelling findings: everyone is a publisher,” stated Krux President Gordon McLeod. “For commerce and marketer sites, much of the data collection stems from the same types of media and technology players we see on content sites. And on commerce sites specifically, we saw a prevalence of higher-risk collectors, not surprising given the relative value of consumer purchase and intent data. Ultimately, this underscores the fact that first-party data is extremely valuable. If not, why are so many third parties so intent upon collecting it?”

Survey Methodology
Krux used the proprietary, cloud-based data scanning capabilities of its platform to analyze data collection activity across more than 50 top ad-supported websites in the U.S. Krux selected a representative set of pages from each site, emulated individual browser sessions for each of those URLs and precisely measured the resulting data collection events within those browser sessions. This year’s study also analyzed the top 100 commerce/marketer sites plus an additional 50 smaller content sites. The study defines a data collection event as any instance when a cookie is set, referenced or modified on a user’s browser.
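To make the event definition above concrete, here is a minimal sketch of how cookie operations observed in an emulated browser session might be counted and split into first-party and third-party collection. It is not Krux's proprietary scanner, which this release does not describe in detail. The log format (`CookieEvent`), the helper names and the "last two host labels" notion of a site are hypothetical assumptions for illustration only.

```python
from collections import Counter
from dataclasses import dataclass

# Cookie actions that count as a "data collection event" under the study's definition.
COLLECTION_ACTIONS = {"set", "referenced", "modified"}


@dataclass(frozen=True)
class CookieEvent:
    """One cookie operation seen during an emulated browser session (hypothetical log format)."""
    page_host: str      # host of the page being scanned, e.g. "www.news.example"
    cookie_domain: str  # domain the cookie belongs to, e.g. ".adnet.example"
    action: str         # e.g. "set", "referenced", "modified", "deleted"


def site_of(host: str) -> str:
    """Naive site key: the last two labels of a host name.

    Assumption for illustration only; a real scanner would use the
    Public Suffix List so that hosts like "example.co.uk" are handled correctly.
    """
    labels = host.lstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def is_third_party(event: CookieEvent) -> bool:
    """A cookie is third-party if its site differs from the page's site."""
    return site_of(event.cookie_domain) != site_of(event.page_host)


def summarize(events: list[CookieEvent]) -> dict:
    """Count collection events and the share attributable to third parties."""
    collection = [e for e in events if e.action in COLLECTION_ACTIONS]
    third_party = [e for e in collection if is_third_party(e)]
    collectors = Counter(site_of(e.cookie_domain) for e in third_party)
    share = len(third_party) / len(collection) if collection else 0.0
    return {
        "collection_events": len(collection),
        "third_party_events": len(third_party),
        "third_party_share": share,
        "distinct_third_parties": len(collectors),
    }


if __name__ == "__main__":
    # Hypothetical events from a single scanned page.
    events = [
        CookieEvent("www.news.example", ".news.example", "set"),        # first party
        CookieEvent("www.news.example", ".adnet.example", "set"),       # third party
        CookieEvent("www.news.example", ".adnet.example", "referenced"),
        CookieEvent("www.news.example", ".social.example", "modified"),
        CookieEvent("www.news.example", ".cdn.example", "deleted"),     # not a collection action
    ]
    print(summarize(events))
```

On this made-up input, four of the five operations count as collection events, three of those are third-party (a 0.75 share), and two distinct third parties are involved. Counting per event rather than per cookie mirrors the study's definition, where reading or modifying an existing cookie counts just as setting one does.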

Krux’s cloud-based infrastructure helps companies protect, manage and monetize the consumer data created across all their screens and sources. The 2013 Krux Cross-Industry Study is available as an infographic and is downloadable from the Krux website: CIS2013.