Identity Management: An Open Letter to the Industry

CEO's Corner Nov 23, 2015

Identity management is a pillar of the data economy, but it is also a process that is imprecise and subject to the disparate practices of many marketing cloud players. With this post, we hope to expose some of the dynamics at play and elevate the industry dialogue about many of the related concepts.


Every web-enabled enterprise these days is on a mission to engage with its customers and prospects across every nook and cranny of their digital lives. Finding the same user who read your email offer and presenting him with a compelling online coupon, orchestrating and personalizing a conversation with prospects across mobile, video, and display – these challenges naturally compel marketers to invest in services for matching users across systems and screens. Identity management has, accordingly, become the most pressing, most strategic challenge for the marketing cloud.

There is, however, a fly in the stew: ‘match rates,’ which are intended to measure the effectiveness of identity management services, don’t make sense. Numerators require denominators, and ratios work when they’re both correctly computed. When it comes to match rate metrics today, in the best case we’re calculating them incorrectly. In the worst case, we’re putting apples into the numerator and oranges into the denominator. The situation is dire, but recoverable. If we crisply articulate what we want to measure, with attention to understanding what’s actually measurable, it’s possible to harness full value from the many identity services that are coming into the market. 

  1. Apples to oranges: offline-to-online matching

On-boarding’s objective is to find offline users online. Implicit in the notion of a match rate for on-boarding is the idea of a direct correspondence between a unique offline identifier and a unique online identifier. A match rate of 80% would suggest that you’ve ‘found’ 80% of your offline users online. But that’s not how it works.

On-boarding in its current form works by taking a blob of CRM records, each containing a scrap of PII (an email address or phone number, for example), and possibly returning a blob of online records in which the PII has been replaced with a cookie or online user ID and the remaining data has been appended as key-value pairs (see: Figure 1).
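To make the mechanics concrete, here is a minimal sketch of that transformation in Python. The field names and the hashed-email lookup are hypothetical stand-ins for illustration, not any on-boarder’s actual interface.

```python
from typing import Dict, List

# Hypothetical lookup from a hashed email to zero or more online IDs. A real
# on-boarder maintains this mapping at enormous scale; here it is a stub.
HASHED_EMAIL_TO_ONLINE_IDS: Dict[str, List[str]] = {
    "sha256:a1b2...": ["cookie_123", "mobile_id_456"],  # one person, two devices
    "sha256:c3d4...": ["cookie_789"],
}

def onboard_record(crm_record: dict) -> List[dict]:
    """Drop the PII and append the remaining fields to each matched online ID."""
    attributes = {k: v for k, v in crm_record.items() if k != "hashed_email"}
    online_ids = HASHED_EMAIL_TO_ONLINE_IDS.get(crm_record["hashed_email"], [])
    return [{"online_id": oid, **attributes} for oid in online_ids]

# A single CRM record may come back as zero, one, or several online records.
print(onboard_record({"hashed_email": "sha256:a1b2...", "loyalty_tier": "gold"}))
```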

Figure 1: Bringing Offline Records Online

Figure 1: This illustrates how offline records are – and are not – matched against online user IDs

To make this concrete, let’s consider an example (see: Figure 2). The left column of the table below represents a set of records funneled into an on-boarding service.


Figure 2: Considerations in the Onboarding Process

Figure 2: In short, numerators and denominators matter when evaluating match rates. A change in the basis of comparison can yield dramatically different results.


As we can see, for the first record the onboarding service returns two online IDs, corresponding, say, to a person seen on both his mobile and his desktop devices. This is an example of on-boarding’s ‘one-to-many’ aspect. Notice that the second and fourth records resolve to the same online ID, pointing to the same person; this might correspond to a scenario where someone maintains two email accounts (work and personal, say) and toggles between them on a single browser or device. This is an example of on-boarding’s ‘many-to-one’ aspect. Finally, notice that records 3 and 5 fail to match even a single online cookie. The offline-to-online universe isn’t fully mapped, so not every record results in an online match.

Figure 3: The Inherent Complexities in Mapping Records to IDs

Figure 3: Matching is rarely a one-to-one exercise, and a clear view into disparate user IDs and cross-device usage is critical to success.


Match rates as currently measured for onboarding put apples (non-unique records containing potentially duplicated online IDs) in the numerator and oranges (non-unique records containing offline IDs) in the denominator. When you correct for the many-to-one, one-to-many, and one-to-none dynamics, it’s easy to see how dicey match rate calculations can be. For the example we’ve considered, current match rate calculations simply take the number of output records (4 in our example) and divide it by the number of input records (5), yielding a match rate of 80%.

But if you tally the number of distinct online IDs returned and divide by the number of actual people in the input file to the on-boarder, you get a match rate of 60%. And, when you calibrate further for the many-to-one aspect and the fact that two of the offline records point to different devices owned by a single person, the match rate declines further to 40%. Plainly, offline-to-online match rates aren’t as simple as they appear at first blush.
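Here is a minimal sketch of those competing calculations. The records below mirror the example as described above (the first record matches two devices belonging to one person, the second and fourth are two email accounts of one person resolving to the same cookie, the third and fifth go unmatched); the exact contents of Figure 2 are an assumption here, and for simplicity every denominator is the raw count of input records.

```python
# Illustrative stand-in for the Figure 2 example described in the text.
input_records = [
    {"record": 1, "person": "A", "online_ids": ["cookie_1", "cookie_2"]},  # one-to-many
    {"record": 2, "person": "B", "online_ids": ["cookie_3"]},              # many-to-one (with 4)
    {"record": 3, "person": "C", "online_ids": []},                        # one-to-none
    {"record": 4, "person": "B", "online_ids": ["cookie_3"]},              # many-to-one (with 2)
    {"record": 5, "person": "D", "online_ids": []},                        # one-to-none
]

n_inputs = len(input_records)                                              # 5 input records
output_records = sum(len(r["online_ids"]) for r in input_records)          # 4 output records
distinct_ids = {i for r in input_records for i in r["online_ids"]}         # 3 distinct online IDs
matched_people = {r["person"] for r in input_records if r["online_ids"]}   # 2 distinct people matched

print(f"record-based match rate: {output_records / n_inputs:.0%}")         # 80%
print(f"ID-based match rate:     {len(distinct_ids) / n_inputs:.0%}")      # 60%
print(f"person-based match rate: {len(matched_people) / n_inputs:.0%}")    # 40%
```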


  2. Apples to apples: online user matching

The next question to wrestle to the ground involves match rates tied to identity across different online channels and websites. Marketers seeking to interact with customers and prospects across video and display, for example, will frequently buy media from many different sources, and they will need to map their “master” list of cookies/user records with those of their media buying partners.

The good news is that, for multichannel online matching, we’re talking about apples on both sides. There are extremely well-established practices for syncing users across different systems. But there is still a rub: unlike the onboarding scenario above, where match rates tend to be overstated, today’s approach to calculating match rates can grossly understate success. Further, it can gloss over some of the challenges that influence the population of users that are actually ‘matchable’ in the first place.

There are three dynamics at play (see: Figure 4):

  • Safari browsers – Safari does not accept third-party cookies. If a marketer has placed their own first-party cookie on a device or browser, their buying partner will not be able to reference that same cookie; that user is therefore ‘dark’ and unmatchable.
  • Recency – some buying partners apply recency controls, meaning that if they have not seen a user themselves within X days, they will disregard that browser cookie or device record and not even attempt a match, even if one is possible.
  • Mobile devices – Android and iOS devices do not accept cookies for app-based marketing, relying instead on unique device IDs. Not all buying partners support this kind of ID-based targeting.

The key takeaway? When calculating match rates for online user matching, the denominator needs to be adjusted for Safari, recency, and mobile device IDs to arrive at an intelligent estimate of the total matchable population. As Figure 4 illustrates, that denominator can be calculated, but it requires a degree of cross-web and cross-device visibility few systems provide.
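A minimal sketch of that adjustment appears below; the counts are purely illustrative, and the three excluded buckets are assumed not to overlap.

```python
# Illustrative counts only; the excluded buckets are assumed to be disjoint.
total_user_ids = 1_000_000
safari_blocked = 250_000      # third-party cookie unreadable by the buying partner
stale_by_recency = 150_000    # not seen by the partner within its recency window
unsupported_mobile = 100_000  # app device IDs the partner cannot target

matchable = total_user_ids - safari_blocked - stale_by_recency - unsupported_mobile
matches_found = 400_000

print(f"naive match rate (all users):          {matches_found / total_user_ids:.0%}")  # 40%
print(f"adjusted match rate (matchable users): {matches_found / matchable:.0%}")       # 80%
```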

Figure 4: Understanding the True Matchable Population

Figure 4: In short, not all users or user IDs are matchable. This can lead to dramatic differences in Match Rates based on how calculations are performed.


  3. The cross-device challenge

The scenarios outlined above all relate to matching one known thing to another known thing – an offline record with an online cookie, or one cookie with another. But marketers and media companies increasingly encounter users across different devices – web, phones, tablets. The challenge they face is to understand if and how those disparate device-level IDs correspond to distinct individuals.

When it comes to cross-device, there are two types of matching: deterministic and probabilistic.

For deterministic matching, one relies on known or declared data to connect a person with a device. For example, if you are a digital subscriber of a publication and you are logged in when you visit on your laptop (see: Figure 5), phone, and tablet, there is little doubt that you are you. It’s worth noting, however, that this carries its own imperfections. First, there is the shared device conundrum. If someone visits this same publication on her spouse’s machine while her spouse is signed in, the data signal can be muddied. Further, if one uses multiple email accounts on the same device, for example, there can be two user records associated with the same machine.

Figure 5: An Illustration of Deterministic Matching

Figure 5: Mapping subscriptions to online user IDs is a prime example of a ‘deterministic’ match.
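Below is a minimal sketch of the deterministic approach, assuming a hypothetical stream of login events that pair a hashed account identifier with a device ID; it also shows how a shared laptop muddies the signal.

```python
from collections import defaultdict

# Hypothetical login events: (hashed account identifier, device or browser ID).
login_events = [
    ("sha256:subscriber_1", "laptop_cookie_A"),
    ("sha256:subscriber_1", "phone_device_id_B"),
    ("sha256:subscriber_1", "tablet_cookie_C"),
    ("sha256:subscriber_2", "laptop_cookie_A"),  # spouse signs in on the shared laptop
]

devices_by_account = defaultdict(set)
accounts_by_device = defaultdict(set)
for account, device in login_events:
    devices_by_account[account].add(device)
    accounts_by_device[device].add(account)

# Subscriber 1 deterministically links three devices...
print(devices_by_account["sha256:subscriber_1"])
# ...but the shared laptop shows the muddied signal: one device, two accounts.
print(accounts_by_device["laptop_cookie_A"])
```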

For probabilistic matching, one relies on observation, inference, and some advanced machine-learning techniques to predict who’s on the other side of the screen. For example, if you are a reader of a publication that asks for a subscriber login on your phone, tablet, and computer, but you’re not a subscriber and never log in, the publisher has no way of knowing that the three cookies they see correspond to you, a single person. The right probabilistic solution can go a long way to solving this challenge for a large segment of those otherwise unknown users.

Predictive algorithms are hungry for data; the more data they crunch, the more accurate their predictions. The leading approaches to probabilistic matches rely on access to a large sample of users. The size of the sample matters for two reasons. First, with increased reach, the algorithms can observe behaviors, activities, and trends that indicate strong correlations between otherwise disconnected device IDs. Second, access to more user records – and a more diverse user universe – allows for more advanced testing routines to measure performance and enhance the accuracy of their results. This could, for example, take the form of a “hold out” test, whereby the algorithm’s matches are continuously tested against a set of known users. The larger the data set, the more at-bats the match algorithm has to practice and continuously improve its on-base percentage.
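As a rough sketch of such a hold-out test, the snippet below scores a set of predicted device pairs against a withheld set of deterministically known pairs. The pairs and the scoring here are illustrative, not any vendor’s benchmark.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

def evaluate_holdout(predicted: Set[Pair], held_out_truth: Set[Pair]) -> Dict[str, float]:
    """Score predicted device pairs against a withheld set of known pairs."""
    pred = {tuple(sorted(p)) for p in predicted}        # make pairs order-insensitive
    truth = {tuple(sorted(p)) for p in held_out_truth}
    hits = pred & truth
    precision = len(hits) / len(pred) if pred else 0.0
    recall = len(hits) / len(truth) if truth else 0.0
    return {"precision": precision, "recall": recall}

# Purely illustrative: one predicted link is confirmed, one known link is missed.
predicted = {("cookie_A", "device_id_B"), ("cookie_C", "device_id_D")}
held_out = {("device_id_B", "cookie_A"), ("cookie_E", "device_id_F")}
print(evaluate_holdout(predicted, held_out))  # {'precision': 0.5, 'recall': 0.5}
```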

So, there are two takeaways regarding the cross-device challenge. First, even the best deterministic approaches are subject to imprecision in light of the way we share devices and host multiple accounts on a single machine. Second, when it comes to probabilistic approaches, we do not yet have a standard benchmark by which to measure the efficacy of probabilistic matching algorithms. A good first step is to ask the size of the sample population behind the match rate claim; if it was tested on a sample set of 1000 where 900 of the users were already known, you have a right to be skeptical. And if the probabilistic algorithm was tested on a very large population of users (ideally measured in the hundreds of millions), your confidence in the results should be much higher.

Conclusion

Identity management is a pillar of the data economy, but it is also a process that is imprecise and subject to the disparate practices of many marketing cloud players. I offer three recommendations for navigating the maze ahead:

  • First, as an industry we must get crisper in the language and the distinctions we use to describe the data we handle. We owe our customers and partners greater transparency and visibility into the calculations that underpin the metrics we bandy about, “match rate” chief among them. A good start, when asked for your match rate, would be to courteously respond, “Well, what match rate do you have in mind? Offline/online, cross-device, or cross-web?”
  • Second, we need to educate our customers and partners regarding the nuances of identity management so they can start to make sense of the results they’re seeing. For example, if they’re vexed by online user match rates, we should ask them to consider whether they’re looking at total users or matchable users. We should prepare for inconsistencies in the assertions of competing vendors and recognize that the final output of user, record, and device totals will rarely foot exactly. It’s an inexact science in the best cases, and in the worst case, it’s not a science at all.
  • Third, and most importantly, as enterprises engage with service providers, they should never hesitate to ask the follow-on questions about the what, how, and why that sits behind the curtain or inside the black box. A single match rate assertion needs to serve as just the beginning of the conversation, not the end.

Taking the longer view, the ultimate solution lies in the development and deployment of a broad new layer of identity management solutions. Few systems deliver the scale to effectively handle the diversity of data most businesses are integrating and throwing off. The ultimate solution is a cocktail spanning multiple systems and approaches, not a silver bullet.

This has been an area of considerable investment for Krux and will continue to be a prime focus. We have the benefit of drawing on a standard-setting proprietary technology, a broad partner ecosystem, and unprecedented global consumer reach across 3B browsers and devices.

While the preceding was just my perspective, it reflects the work and focus of the entire Krux team. Stay tuned for much more from us on this topic in the months ahead.

To find out more about the issues highlighted in this post, or to discuss how they may be impacting your business, feel free to reach out to the Krux team today.