Cybercriminals are waging war on personal data. Exposure of personal data is not only a matter of security and financial cost; privacy is a vital consideration too. Consumers want their privacy protected and respected, so much so that privacy protection has become a competitive advantage.
However, protecting and securing personal data is a complicated business. Common approaches include taking data protection courses and applying techniques such as anonymization and pseudonymization. Here, we focus on the pros and cons of these techniques.
Definitions of De-identification, Anonymization and Pseudonymization
Personal data is information that could be used to identify an individual. Most privacy and data security regulations apply to data that can be connected back to an individual. Removing the connections between data and an individual can therefore, in theory, help with meeting some of these regulatory requirements.
Anonymization converts identifiable data into anonymized data in such a way that the converted data cannot be reverted to the original personal data.
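As a rough illustration, the sketch below applies two common anonymization steps to a single record: suppression (dropping direct identifiers) and generalization (replacing exact values with ranges). The field names, the 10-year age bands and the three-digit postal-code prefix are assumptions chosen for illustration, not a recommended scheme.

```python
from typing import Any

# Hypothetical field names; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(record: dict[str, Any]) -> dict[str, Any]:
    """Suppress direct identifiers and generalize quasi-identifiers.

    Nothing needed to undo the transformation is kept, so the original
    values cannot be recovered from the output alone.
    """
    # Suppression: drop fields that identify the person directly.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalization: coarsen fields that could identify in combination.
    if "age" in out:
        out["age"] = generalize_age(out["age"])
    if "zip" in out:
        # Keep only the first three digits of the postal code.
        out["zip"] = out["zip"][:3] + "**"
    return out


record = {"name": "Jane Doe", "email": "jane@example.com",
          "age": 34, "zip": "90210", "diagnosis": "asthma"}
print(anonymize(record))
```

Because the dropped values and the exact age are not stored anywhere, the output cannot be turned back into the original record. On its own, though, this does not guarantee anonymity: the remaining fields, combined with outside data, may still single someone out.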
The term de-identification is often used in connection with health data, although it can be applied to any personal data to enhance privacy. Generally, de-identification separates personally identifiable information (PII) from health data. The data is not anonymized, however: the de-identified data can be re-associated with the individual at a later date.
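A minimal sketch of de-identification as described above, assuming a simple list of patient records: identifying fields are moved into a separate table, and a random link key (generated with Python's `secrets` module) ties each health row back to its owner. The field names and the `deidentify` and `reidentify` helpers are hypothetical.

```python
import secrets
from typing import Any

PII_FIELDS = {"name", "date_of_birth", "address"}  # hypothetical field names


def deidentify(records: list[dict[str, Any]]) -> tuple[dict[str, dict], list[dict]]:
    """Split records into a PII table and a health-data table.

    Each record gets a random link key. The health table carries the key
    but no PII; the PII table maps the key back to the identifying fields.
    """
    pii_table: dict[str, dict] = {}
    health_table: list[dict] = []
    for rec in records:
        key = secrets.token_hex(8)  # random, so it reveals nothing by itself
        pii_table[key] = {k: v for k, v in rec.items() if k in PII_FIELDS}
        health = {k: v for k, v in rec.items() if k not in PII_FIELDS}
        health["link_key"] = key
        health_table.append(health)
    return pii_table, health_table


def reidentify(health_row: dict[str, Any], pii_table: dict[str, dict]) -> dict[str, Any]:
    """Re-associate a health row with its PII (for authorized holders only)."""
    merged = {k: v for k, v in health_row.items() if k != "link_key"}
    merged.update(pii_table[health_row["link_key"]])
    return merged


patients = [{"name": "Jane Doe", "date_of_birth": "1990-04-01",
             "address": "1 Main St", "diagnosis": "asthma"}]
pii, health = deidentify(patients)
print(health)                      # diagnosis plus a random link_key, no PII
print(reidentify(health[0], pii))  # original fields restored
```

The PII table is what makes re-association possible, so in practice it would be stored separately from the health data, under stricter access controls.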
Pseudonymization is a technique that replaces personal identifiers with artificial identifiers, or pseudonyms. For instance, it may replace a person's first name and surname with a pseudonym.
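One common way to implement pseudonymization is keyed hashing. The sketch below, assuming a secret key held separately from the data, uses HMAC-SHA256 from Python's standard library to turn a name into a stable pseudonym. The key value, the `P-` prefix and the 16-character truncation are illustrative assumptions.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key store and be
# kept separate from the pseudonymized data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(first_name: str, surname: str, key: bytes = SECRET_KEY) -> str:
    """Replace a name with a stable pseudonym using HMAC-SHA256.

    The same name and key always give the same pseudonym, so records can
    still be linked. Anyone holding the key can recompute pseudonyms for
    candidate names, which is why this is pseudonymization, not anonymization.
    """
    # Normalize so that trivial spelling variants map to the same pseudonym.
    normalized = f"{first_name.strip().lower()} {surname.strip().lower()}"
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:16]


print(pseudonymize("Jane", "Doe"))
print(pseudonymize(" jane", "DOE ") == pseudonymize("Jane", "Doe"))  # True
```

A keyed hash is used rather than a plain hash because names are easy to guess: anyone could hash a list of common names and match the results, whereas without the key they cannot. The same property means that whoever holds the key can link pseudonyms back to candidate names, so the output should still be treated as personal data.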
When and Why are De-identification, Anonymization or Pseudonymization Used?
There are several cases where the use of one of these strategies may be ideal. Some instances include:
Health Research
Anonymization can make it possible to share health data for research.
Smart Cities
Many smart city services depend on personal data, including geolocation and behavioral data. Sensors and connected devices need a continuous feed of this data to optimize services.
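For location feeds like these, one way to reduce (but not eliminate) re-identification risk is to lower the precision of each event before it is stored. The sketch below, a hypothetical `coarsen_location` helper, rounds coordinates and truncates timestamps to fixed windows; the 2-decimal and 15-minute defaults are assumptions for illustration.

```python
from datetime import datetime


def coarsen_location(lat: float, lon: float, ts: datetime,
                     decimals: int = 2, minutes: int = 15) -> tuple[float, float, datetime]:
    """Reduce the precision of a single location event.

    Coordinates are rounded to `decimals` places (2 places is about 1.1 km
    of latitude) and the timestamp is truncated to the start of its
    `minutes`-long window within the hour.
    """
    bucket = ts.replace(minute=(ts.minute // minutes) * minutes,
                        second=0, microsecond=0)
    return round(lat, decimals), round(lon, decimals), bucket


event = (51.50735, -0.12776, datetime(2024, 5, 1, 8, 47, 31))
print(coarsen_location(*event))
```

Rounding latitude to two decimal places gives cells about 1.1 km tall; their width shrinks away from the equator. Coarsened movement traces can still be distinctive, so this is a risk-reduction step rather than anonymization.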
Training Sets for Artificial Intelligence
AI models require training data, and this data is often personal and sensitive. Anonymizing training data is therefore a vital part of protecting privacy in AI.
Methods and Practices in De-identifying Personal Data
There is a growing number of ways to decouple identifying data from the rest of a dataset. Anonymization software is relatively common and uses a variety of techniques, such as removing or replacing identifiers. For photos, such software typically blurs identifying parts of the image, such as faces.
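The sketch below shows the idea of removing or replacing identifiers applied to free text, using Python's `re` module to replace email addresses and phone numbers with type placeholders. The patterns are deliberately simplified assumptions: they miss many real-world formats and will also flag some non-identifiers, such as dates written as 2024-05-01.

```python
import re

# Simplified patterns for illustration; real identifiers take many more forms.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace matched identifiers with a placeholder naming their type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Contact Jane at jane.doe@example.com or +1 (555) 010-2030."
print(redact(note))
```

Note that the name "Jane" survives: pattern-based redaction only catches identifiers it has rules for, which is one reason such tools need human review and governance around them.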
De-identifying data is not sufficient on its own, however. Using de-identification and anonymization techniques is as much about people and governance as it is about technology.
De-identification needs a governance framework to ensure the technology is implemented successfully. If you only apply technology to anonymize data, you miss a vital part of the overall strategy: the people and decisions behind the solution. Without these elements, you lose the core tenets of governance: accountability, transparency and applicability.
The best way to keep data secure is not to collect it in the first place. In reality, however, this is never going to happen: various use cases across banking, government, e-commerce and healthcare need to process personal and health data. Nevertheless, minimizing the data collected should be part of an overall approach to maintaining the security and privacy of personal data. As data protection courses often stress, online digital footprints containing location, behavioral and other data will only become harder to contain.
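Data minimization can be built into collection itself. As a sketch, assuming a hypothetical delivery-tracking form, the allowlist below keeps only the fields the stated purpose needs and discards everything else before storage.

```python
from typing import Any

# Hypothetical allowlist for a delivery-tracking feature: only the fields
# this purpose actually needs are kept.
ALLOWED_FIELDS = {"order_id", "postcode", "delivery_window"}


def minimize(submission: dict[str, Any], allowed: set[str] = ALLOWED_FIELDS) -> dict[str, Any]:
    """Keep only allow-listed fields; everything else is never stored."""
    return {k: v for k, v in submission.items() if k in allowed}


form = {"order_id": "A123", "postcode": "SW1A 1AA", "delivery_window": "AM",
        "date_of_birth": "1990-04-01", "device_location": (51.5, -0.14)}
print(minimize(form))
```

An allowlist is used rather than a blocklist so that any new field added to the form is dropped by default until someone decides it is needed.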
We must accept that de-identification, pseudonymization, and even full anonymization still come with risks. However, a program of de-identification, together with good data practices and governance, can help mitigate those risks.
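One way to make these risks measurable is k-anonymity: for a chosen set of quasi-identifiers (fields such as age band or postal code that could be combined with outside data), k is the size of the smallest group of records sharing the same values. The hypothetical `k_anonymity` helper below computes it for a small released dataset.

```python
from collections import Counter
from typing import Any, Iterable


def k_anonymity(records: Iterable[dict[str, Any]], quasi_identifiers: list[str]) -> int:
    """Return k, the size of the smallest group of records that share the
    same values for all quasi-identifiers. A small k (especially 1) means
    some records are unique on those fields and easier to re-identify.
    Returns 0 for an empty dataset."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


released = [
    {"age": "30-39", "zip": "902**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "902**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "902**", "diagnosis": "asthma"},
]
print(k_anonymity(released, ["age", "zip"]))  # 1: the 40-49 record is unique
```

A larger k makes it harder to single out one record, but it is not a guarantee: if everyone in a group shares the same diagnosis, for example, that diagnosis is revealed for anyone known to be in the group.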