Quasi-identifier
Information that reveals identity when combined with other information

Quasi-identifiers are pieces of information that are not of themselves unique identifiers, but are sufficiently well correlated with an entity that they can be combined with other quasi-identifiers to create a unique identifier.

Quasi-identifiers can thus, when combined, become personally identifying information. This process is called re-identification. As an example, Latanya Sweeney has shown that even though gender, birth date and postal code do not individually identify a person, the combination of all three is sufficient to identify 87% of individuals in the United States.
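The effect of combining attributes can be illustrated with a minimal sketch. The function name `unique_fraction` and the toy records below are invented for illustration and are not taken from Sweeney's study; the sketch only measures what share of records is singled out by a given set of columns.

```python
from collections import Counter


def unique_fraction(records, columns):
    """Return the fraction of records whose values on `columns`
    are shared by no other record in `records`."""
    if not records:
        return 0.0
    # Project every record onto the chosen columns.
    keys = [tuple(record[c] for c in columns) for record in records]
    counts = Counter(keys)
    # A record is singled out when its projected key occurs exactly once.
    return sum(1 for key in keys if counts[key] == 1) / len(records)


# Hypothetical toy data, made up for this example.
people = [
    {"gender": "F", "birth_date": "1970-01-01", "zip": "02138"},
    {"gender": "F", "birth_date": "1970-01-01", "zip": "02139"},
    {"gender": "M", "birth_date": "1970-01-01", "zip": "02138"},
    {"gender": "M", "birth_date": "1985-06-15", "zip": "02138"},
]

for cols in (["gender"], ["birth_date"], ["zip"],
             ["gender", "birth_date", "zip"]):
    print(cols, unique_fraction(people, cols))
```

In this toy data no single column singles out more than a quarter of the records, while the three columns together single out every record.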

The term was introduced by Tore Dalenius in 1986. Since then, quasi-identifiers have been the basis of several attacks on released data. For instance, Sweeney linked health records to publicly available information to locate the hospital records of the then-governor of Massachusetts using a uniquely identifying combination of quasi-identifiers, and Sweeney, Abu and Winn used public voter records to re-identify participants in the Personal Genome Project. Additionally, Arvind Narayanan and Vitaly Shmatikov drew on quasi-identifiers to describe statistical conditions for de-anonymizing data released by Netflix.
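A linkage attack of this general kind can be sketched as a join between a de-identified release and a public, named dataset on their shared quasi-identifiers. The following is a simplified illustration only, not a reconstruction of any of the published attacks; the function `link_records`, the column names and all records are hypothetical.

```python
from collections import defaultdict

# Assumed quasi-identifier columns shared by both datasets.
QUASI_IDENTIFIERS = ("gender", "birth_date", "zip")


def link_records(deidentified, public, columns=QUASI_IDENTIFIERS):
    """Pair names from `public` with rows of `deidentified` whenever
    the quasi-identifier combination is unique in both datasets."""

    def index(rows):
        # Group rows by their quasi-identifier values.
        buckets = defaultdict(list)
        for row in rows:
            buckets[tuple(row[c] for c in columns)].append(row)
        return buckets

    released = index(deidentified)
    known = index(public)
    matches = []
    for key, rows in released.items():
        candidates = known.get(key, [])
        # Only an unambiguous one-to-one match counts as a re-identification.
        if len(rows) == 1 and len(candidates) == 1:
            matches.append((candidates[0]["name"], rows[0]))
    return matches


# Hypothetical data: a release without names and a public list with names.
hospital = [
    {"gender": "M", "birth_date": "1950-03-02", "zip": "02138", "diagnosis": "condition A"},
    {"gender": "F", "birth_date": "1962-11-20", "zip": "02139", "diagnosis": "condition B"},
    {"gender": "F", "birth_date": "1962-11-20", "zip": "02139", "diagnosis": "condition C"},
]
voters = [
    {"name": "Alex Example", "gender": "M", "birth_date": "1950-03-02", "zip": "02138"},
    {"name": "Blake Sample", "gender": "F", "birth_date": "1962-11-20", "zip": "02139"},
]

for name, record in link_records(hospital, voters):
    print(name, "->", record["diagnosis"])
```

Here only the first hospital row is re-identified. The other two share the same quasi-identifier values, so the match with the second voter stays ambiguous, which is the property that generalization-based protections aim to enforce for every record.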

Motwani and Xu warn that the publication of large volumes of government and business data containing quasi-identifiers can enable privacy breaches.


References

  1. "Glossary of Statistical Terms: Quasi-identifier". OECD. November 10, 2005. Retrieved 29 September 2013. http://stats.oecd.org/glossary/detail.asp?ID=6961

  2. Sweeney, Latanya. Simple demographics often identify people uniquely. Carnegie Mellon University, 2000. http://dataprivacylab.org/projects/identifiability/paper1.pdf

  3. Dalenius, Tore. Finding a Needle In a Haystack or Identifying Anonymous Census Records. Journal of Official Statistics, Vol. 2, No. 3, 1986. pp. 329–336. http://www.jos.nu/Articles/abstract.asp?article=23329 (archived 2017-08-08 at the Wayback Machine)

  4. Anderson, Nate. Anonymized data really isn’t—and here’s why not. Ars Technica, 2009. https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/

  5. Barth-Jones, Daniel C. The 're-identification' of Governor William Weld's medical information: a critical re-examination of health data identification risks and privacy protections, then and now. June 4, 2012.

  6. Sweeney, Latanya, Akua Abu, and Julia Winn. "Identifying participants in the personal genome project by name." Available at SSRN 2257732 (2013).

  7. Narayanan, Arvind and Shmatikov, Vitaly. Robust De-anonymization of Large Sparse Datasets. The University of Texas at Austin, 2008. https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf

  8. Rajeev Motwani and Ying Xu (2008). Efficient Algorithms for Masking and Finding Quasi-Identifiers (PDF). Proceedings of SDM’08 International Workshop on Practical Privacy-Preserving Data Mining. https://www.csee.umbc.edu/~kunliu1/p3dm08/proceedings/2.pdf