musings of a data scientist
04.18.2012 - Roopak Gupta
Today's guest blogger, Krux data scientist and all around good human being, Roopak Gupta, shares his reflections on how Big Data is changing the world.
There’s no doubt about it. We’re living in the era of Big Data. And as a data scientist, I can honestly say I’ve never been more excited, energized, and amazed – and thankful my career has led me where it has. The new breed of data-driven applications that are coming to market aren’t just better than the ones that have come before, they are truly redefining what’s possible.
Historically, we’ve relied on statistics, sampling, and probability when the data set was too large to process in full. We developed clever methods to work around processing limitations so that meaningful insights could still be gained. While I loved this challenge in the past, it always felt as though something was missing. Sampling solved an important problem, but it sure left a lot of data on the table, unprocessed and unread.
Big Data, you complete me
Today we are in a different world, and I couldn’t be happier. I come to work each day with petabytes of data at my fingertips and with the ability to process every last bit of it. And while the Big Data business applications are getting most of the attention, these new levels of data processing and analytic horsepower have very real applications for the broader good. A great example of this is the Santa Cruz predictive policing program, which has taken eight years of historical public safety data and harnessed it to predict the potential for criminal activity day to day. A recent assessment of that program can be found here.
The other thing that fascinates me is the importance of visualization of the abundant data that we have. You need to look at the entire landscape and visualize it to understand the big picture. Only then can you isolate the actionable insights. And if you do it right, you’re not only making the analyst’s job easier, you’re making the process of discovery far more personal.
Why I’m excited to go to work each day
We at Krux take pride in giving the users of our platform a terrific experience as they use our technology to better understand their audiences and put that insight to work improving their advertising, content, and commerce businesses. Just as a search engine takes into account referral sites, search terms, the link selected, and how long it took a user to click a link in order to improve the quality of search results overall, we help publishers make the right content, advertising, or commerce decisions in support of a better user experience and a stronger bottom line.
One of the ways we help publishers understand their audience is through our loyalty analytics, analyzing site visit and content consumption patterns and providing intuitive and powerful visualizations of what content groups of users gravitate towards. Every user is assigned to a category such as Fly-Bys, Occasionals, Regulars, Fans and Zealots based on their site activity, frequency of visits, amount of time spent and their probability of returning to the site. Once a user is categorized, it becomes easy to focus on who they are, what they want, and how to keep them coming back.
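To make the idea of loyalty tiers concrete, here is a minimal sketch in Python. Only the five tier names come from this post. The `UserActivity` fields, the 30-day window, the caps of 20 visits and 300 minutes, and the equal weighting are all hypothetical placeholders chosen for illustration; they are not a description of how Krux actually scores users.

```python
from dataclasses import dataclass

# Tier names are taken from the post, ordered from least to most loyal.
TIERS = ["Fly-Bys", "Occasionals", "Regulars", "Fans", "Zealots"]


@dataclass
class UserActivity:
    """Hypothetical per-user summary; the field names are assumptions."""
    visits_30d: int            # number of site visits in the last 30 days
    minutes_30d: float         # total time on site in the last 30 days
    return_probability: float  # estimated chance of coming back, 0.0 to 1.0


def loyalty_score(user: UserActivity) -> float:
    """Blend the three signals into one score between 0.0 and 1.0.

    The caps (20 visits, 300 minutes) and the equal weights are
    made-up values, not real product parameters.
    """
    visit_part = min(user.visits_30d / 20, 1.0)
    time_part = min(user.minutes_30d / 300, 1.0)
    return (visit_part + time_part + user.return_probability) / 3


def loyalty_tier(user: UserActivity) -> str:
    """Map the score onto one of the five tiers using equal-width bands."""
    index = min(int(loyalty_score(user) * len(TIERS)), len(TIERS) - 1)
    return TIERS[index]


if __name__ == "__main__":
    print(loyalty_tier(UserActivity(visits_30d=1, minutes_30d=0.5, return_probability=0.02)))
    print(loyalty_tier(UserActivity(visits_30d=30, minutes_30d=600, return_probability=0.95)))
```

In practice the return probability would itself come from a model trained on visit history, and the tier boundaries would be tuned per site rather than split into equal bands.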
For example, if you want to understand your Zealots and find more of them, you can use the profile data we append to those users to help find others like them elsewhere on the web. Alternatively, if you want to convert your Fly-Bys to Regulars, you can retarget those users across the web to invite them back to your site. These are simple yet extremely powerful user engagement tools, and they draw on real-time analytic and visualization techniques that have only recently become possible.
An example of Krux Loyalty Analytics
Beyond broad visitation trends, it’s also important to have a sharp view of what actions users are taking once on your site. For these scenarios, Krux has developed an event-based segmentation framework. This allows a publisher to identify key actions on their site and then flag and segment users who take those actions. This could be useful for an eCommerce site with the age-old abandoned shopping cart conundrum, a finance blog with a whitepaper download, or publishers who want to understand which kinds of users are consuming video content.
The event framework itself is important, but it becomes even more powerful when those event-based segments are further parameterized by other user information you have on hand. For example, in the case of video, you can understand which of those video viewers came to the site (or to the video itself) via social channels or search engines, and which are regular visitors (Fly-Bys or Zealots, perhaps?!). The possibilities are endless, and the end results are rewarding: first to the publisher via deeper insight, then to the end user as that insight is put to work to deliver a better experience.
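As a rough illustration of the two steps just described (flag the users who take a key action, then slice that segment by other user-level data), here is a small Python sketch. The event and profile shapes, the field names `user_id`, `action`, and `referrer`, and both function names are assumptions made up for this example, not part of Krux's framework.

```python
from collections import defaultdict


def segment_by_event(events, action):
    """Return the set of user ids that performed `action` at least once."""
    return {event["user_id"] for event in events if event["action"] == action}


def break_down(segment, profiles, attribute):
    """Group the members of a segment by one profile attribute.

    Users with no profile, or no value for the attribute, land in "unknown".
    """
    groups = defaultdict(set)
    for user_id in segment:
        value = profiles.get(user_id, {}).get(attribute, "unknown")
        groups[value].add(user_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample data.
    events = [
        {"user_id": "u1", "action": "video_view"},
        {"user_id": "u2", "action": "cart_abandon"},
        {"user_id": "u3", "action": "video_view"},
        {"user_id": "u1", "action": "video_view"},
    ]
    profiles = {
        "u1": {"referrer": "social", "loyalty": "Zealots"},
        "u3": {"referrer": "search", "loyalty": "Fly-Bys"},
    }

    viewers = segment_by_event(events, "video_view")
    print(break_down(viewers, profiles, "referrer"))
```

Using sets means a user who fires the same event several times is still counted once, which is usually what you want when sizing a segment.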
An example of Krux event-based segmentation and analytics
The power of the event-based framework, when married with a rich mix of corresponding user-level data points, is what’s at the center of Atomicity, one of the five core tenets of Krux’s framework for the must-have data management infrastructure for operators on this ‘new’ consumer web. As described in that report (available in its original form here):
Atomicity. Site- and ad-based systems work well over large chunks or flows of users but hit the complexity barrier when they attempt to describe audiences across combinations of segments, sections, demographics, behaviors, or time. Atomicity is the ability to unpack cookies connected to individual users and to analyze who they are, the value they create, what they want, and where they’re headed.
Today, Krux’s cloud-based infrastructure manages and processes more than 10,000 requests per second. Every one of those requests is analyzed in real time and translated into insight so our platform clients can better serve their end users by delivering better web experiences. And all the while, I’m proud to know we’re doing it with the consumer as our true north. Through the great work of others on the team, we are developing some of the best solutions for respecting user privacy and user preference, and for honoring DNT – I encourage you to take a look at my colleague Jos Boumans’ recent posts on the state of DNT and our response here and here.
This is an amazing world of – truly – “big” data we live in, and I invite you to visit us, whether you’re a consumer who wants to know more, a publisher who might like our solutions, or a thrill-seeker who might like to join the team.
The future glistens. The application of Big Data and cloud-scale processing and analytics is virtually uncapped, in digital media and beyond. It is going to be a very different world in a few years and an exciting place to be – as a web user certainly, but as a data scientist most of all.