Mar 9, 2019 - Technology

Practices to anonymize data often reveal a lot of detail about individuals

Illustration: Sarah Grillo/Axios

Most data brokers avoid scrutiny by saying the data they collect and sell is “anonymized,” or is a summary of many people’s information rather than any single individual’s data.

Yes, but: That “anonymous” data can be used to pinpoint real people, or be matched to other supposedly anonymous profiles.

  • In 2006, AOL released search histories of 657,000 anonymous Americans, hoping the data could spur new research. But those searches contained things like locations, ages and genders — ultimately linkable back to specific people.
  • Researchers like Latanya Sweeney have discovered a variety of other ways to re-identify people from subtler kinds of information. In fact, most Americans can be identified by date of birth, gender and zip code alone, she discovered when she led Harvard’s Data Privacy Lab.
  • Last year, researchers noticed that hashed email addresses could be reversed. Hashing, a mathematical function that turns an email address into gibberish, was thought to anonymize it, but anyone holding lists of leaked email addresses can hash each one and look for matches by trial and error.
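How the email example works: a hash is repeatable, so the same address always produces the same gibberish. The sketch below is a minimal illustration of that trial-and-error search, not the researchers' actual method. The choice of SHA-256, the lowercase normalization, the function names and the example.com addresses are all assumptions made for illustration.

```python
import hashlib


def hash_email(email: str) -> str:
    """Hash an email address the way a data set might 'anonymize' it.

    Assumption: SHA-256 over the trimmed, lowercased address.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def reverse_hashes(hashed_emails, candidate_emails):
    """Try every candidate address and keep the ones whose hash matches."""
    lookup = {hash_email(candidate): candidate for candidate in candidate_emails}
    return {h: lookup[h] for h in hashed_emails if h in lookup}


# Hypothetical data: a "leaked" address list and an "anonymized" data set.
leaked = ["alice@example.com", "bob@example.com", "carol@example.com"]
anonymized = [hash_email("bob@example.com"), hash_email("dave@example.com")]

recovered = reverse_hashes(anonymized, leaked)
for digest, email in recovered.items():
    print(digest[:12], "->", email)
```

Only addresses that appear in the candidate list can be recovered this way, which is why large leaked lists make the approach effective.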
“Once released, information is hard to control. Thus, over time, the more information and data can be linked and analyzed, the higher the likelihood of being able to make sensitive inferences from it for larger groups of people.”
— Alessandro Acquisti, a Carnegie Mellon Professor who has studied re-identification
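What linking looks like in practice: a record stripped of names can still carry a date of birth, gender and zip code, and a second data set with names attached may share those same fields. The sketch below is a simplified illustration under stated assumptions, not a description of any broker's or researcher's method. The field names, record layout and every row of data are invented for the example.

```python
from collections import defaultdict

# Assumed field names for the three traits mentioned above.
QUASI_IDENTIFIERS = ("birth_date", "gender", "zip_code")


def quasi_key(record):
    """Combine the three traits into one lookup key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous_rows, named_rows):
    """Attach a name to each anonymous row whose traits match exactly one person."""
    names_by_key = defaultdict(list)
    for row in named_rows:
        names_by_key[quasi_key(row)].append(row["name"])

    matches = {}
    for row in anonymous_rows:
        names = names_by_key.get(quasi_key(row), [])
        if len(names) == 1:  # a unique combination pinpoints one person
            matches[row["record_id"]] = names[0]
    return matches


# Invented "anonymized" records with no names attached.
anonymous_rows = [
    {"record_id": "r1", "birth_date": "1970-01-15", "gender": "F", "zip_code": "02138"},
    {"record_id": "r2", "birth_date": "1985-06-30", "gender": "M", "zip_code": "15213"},
    {"record_id": "r3", "birth_date": "1992-11-02", "gender": "F", "zip_code": "60601"},
]

# Invented public list (for example, a roster) that includes names.
named_rows = [
    {"name": "Pat Example", "birth_date": "1970-01-15", "gender": "F", "zip_code": "02138"},
    {"name": "Sam Sample", "birth_date": "1985-06-30", "gender": "M", "zip_code": "15213"},
    {"name": "Lee Sample", "birth_date": "1985-06-30", "gender": "M", "zip_code": "15213"},
]

matches = link(anonymous_rows, named_rows)
print(matches)
```

In this toy data, only the first record is pinned to a single name: the second matches two people and the third matches no one. Each additional data set that can be joined in narrows those remaining cases.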