On the Absence of Anonymity
04.11.2011 - Krux Digital
In a somewhat cautionary post, Slate’s Farhad Manjoo reminds us all of the inherent value of *some* anonymous tracking online, highlighting many of the ways it powers our web experience and delivers features that we’ve grown to love and, frankly, take for granted.
The best example is Google’s contextual spellcheck, something this Krux writer himself has observed to beat the pants off MS Word and Apple.
Take the spell checker: How does Google know you meant Rebecca Black when you typed Rebeca Blacke? Note that this is a trick that no ordinary, dictionary-based spell-checker could perform—these are proper nouns, and we're dealing with an ephemeral personality. But since Google has stored lots of other people's search requests for Black, it knows you're looking for the phenom behind "Friday." The theory behind the spell-checker can be applied more broadly. By studying words that often come together in search terms—for instance, people may either search for "los angeles murder rate" or "los angeles homicide rate"—Google can detect that two completely different words may have the same meaning. This has profound implications for the future of computing: In a very real sense, mining search queries is teaching computers how to understand language... If Google were forced to forget every search query right after it served up a result, none of these things would be possible.
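To make the idea concrete, here is a minimal sketch of how a stored query log could drive spelling suggestions. It is not Google's method, just an illustration of the principle Manjoo describes: the log contents, the counts, the `suggest` function, and the similarity cutoff are all made up for this example, and string similarity comes from Python's standard `difflib` module.

```python
from collections import Counter
import difflib

# Hypothetical query log with invented counts; a real log would be vastly larger.
QUERY_LOG = Counter({
    "rebecca black": 9500,
    "rebecca black friday": 4200,
    "rebecca blake": 40,
    "los angeles murder rate": 800,
})


def suggest(query, log=QUERY_LOG, cutoff=0.8):
    """Return the most frequently logged query that closely resembles `query`.

    Falls back to the normalized query itself when nothing in the log is
    similar enough.
    """
    query = query.lower().strip()
    if query in log:
        return query
    # Find logged queries whose similarity ratio is at least `cutoff`.
    candidates = difflib.get_close_matches(query, list(log), n=5, cutoff=cutoff)
    if not candidates:
        return query
    # Among the near matches, prefer the one other people searched for most.
    return max(candidates, key=lambda c: log[c])


if __name__ == "__main__":
    print(suggest("Rebeca Blacke"))
```

The point of the sketch is the last step: the misspelling is about equally close to "rebecca black" and "rebecca blake", and only the stored popularity of past searches breaks the tie. Delete the log and the tie cannot be broken.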
While carefully considered moves to limit so-called anonymous data collection would hardly cripple the internet, such moves could certainly have a negative impact on our experiences as users and could slow online economic growth. But Manjoo stops short of drawing a line between good and bad data collection. None of us is able to draw that line just yet, as the myriad ways in which data is employed are extremely complex, ever-changing, and woven deep into the fabric of the web experience. And to Manjoo's credit, he doesn't let the reader off easy, either. Take the inherent Facebook conundrum:
Every year or so, the media and activists get exercised over some new and alarming slight by the social-networking company. Yet our actions belie our concerns; while we all holler about how much we hate Facebook, none of us quit it—and, in fact, hundreds of thousands more keep signing up.
Manjoo also offers the always-sobering reminder of how quaint the PII and non-PII distinction may soon become. If you don’t recall the EFF’s report from 2009 about the identification of individuals using only non-PII, go back for a refresher.
Gender, ZIP code, and birth date feel anonymous, but Prof. Sweeney was able to identify Governor Weld through them for two reasons. First, each of these facts about an individual (or other kinds of facts we might not usually think of as identifying) independently narrows down the population, so much so that the combination of (gender, ZIP code, birthdate) was unique for about 87% of the U.S. population. If you live in the United States, there's an 87% chance that you don't share all three of these attributes with any other U.S. resident. Second, there may be particular data sources available (Sweeney used a Massachusetts voter registration database) that let people do searches to bootstrap what they know about someone in order to learn more -- including traditional identifiers like name and address. In a very concrete sense, "anonymized" or "merely demographic" information about people may be neither.
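The narrowing-down effect described in that passage is easy to see on a toy dataset. The sketch below is only an illustration: the five records are invented, and `unique_fraction` is a hypothetical helper name, not something from the EFF report or Sweeney's work. It measures what share of records is singled out by a given combination of attributes.

```python
from collections import Counter

# Invented records for illustration only; they describe no real people.
RECORDS = [
    {"gender": "M", "zip": "02138", "birthdate": "1945-07-31"},
    {"gender": "M", "zip": "02138", "birthdate": "1962-01-09"},
    {"gender": "F", "zip": "02138", "birthdate": "1962-01-09"},
    {"gender": "F", "zip": "02139", "birthdate": "1980-05-20"},
    {"gender": "F", "zip": "02139", "birthdate": "1980-05-20"},
]


def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return unique / len(records)


if __name__ == "__main__":
    for keys in (["gender"], ["gender", "zip"], ["gender", "zip", "birthdate"]):
        print(keys, unique_fraction(RECORDS, keys))
```

In this toy data, gender alone singles out nobody, gender plus ZIP code singles out one record in five, and all three attributes together single out three in five. Each added attribute splits the population into smaller groups, which is the mechanism behind the 87% figure quoted above.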
As tracking becomes more pervasive, and computers get, well, smarter, it's easy to see that we'll all soon be on the hook for taking a more active role in managing our individual data signatures. A decent one-size-fits-most solution has yet to be proffered by the industry, regulators, or consumer/privacy advocates, and it’ll be hard to come by. Here’s hoping we find one.