The Essentials of Personally Identifiable Information (PII)
Modern privacy regulation is centered on the concept of personal information. The General Data Protection Regulation (GDPR) popularized it, and similar laws since then, like the California Consumer Privacy Act (CCPA), have expanded the definition of "personal information."
If your application collects any information about your users or customers, it is important to track when, how, and for what purpose you collect their data. In this article we will look at what qualifies as personally identifiable information (PII), along with some less obvious data types you should be aware of when capturing user data.
Linked vs. Linkable Information
Before looking at how each regulation defines personal data, it helps to understand that user data generally falls into one of two categories: linked or linkable.
Personal data that can be directly tied to an individual is linked information. It is overtly personally identifiable, and nearly all privacy legislation covers it as a baseline. Examples of linked information include:
- Full names
- Personal or business addresses
- ID numbers
- Telephone numbers
- Driver's license or state ID numbers
- Social Security numbers
- Banking or credit card numbers
- Email addresses
This data is typically collected directly from the user, usually for the explicit purpose of identifying them or confirming their identity.
While linked information is easier to recognize, it isn't the only way to identify an individual. Combining otherwise unrelated pieces of data can also identify someone. This type of personal information is considered linkable information. Examples of linkable data include:
- A partial name
- Location information like city, state, or postal code
- Race or ethnic background
- Job titles
- An age range
- Search history
- Geolocation data
- Shopping preferences
A single piece of linkable information on its own is generally harmless, but as more data is captured about a user, they can be identified with a high degree of accuracy.
This is how highly targeted advertising often works. Details about an individual's shopping habits, their region, and the habits of their social groups can paint a picture accurate enough to single out a specific person. As a result, some legislation now treats linkable information as personally identifiable information.
Regulations differ on whether "personal information" includes linkable information, but unless your organization operates in a single region, your aim should be to treat all data, linked or linkable, as personal information.
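To make the effect of combining linkable data concrete, the sketch below counts how many records in a small, made-up dataset become unique (and therefore singled out) as more linkable attributes are combined. The records and field names are hypothetical; the idea mirrors the intuition behind k-anonymity checks, and this is not a complete re-identification analysis.

```python
from collections import Counter

# Hypothetical records: no names or ID numbers, only linkable attributes.
records = [
    {"postal_code": "94110", "age_range": "30-39", "job_title": "Engineer"},
    {"postal_code": "94110", "age_range": "30-39", "job_title": "Designer"},
    {"postal_code": "94110", "age_range": "40-49", "job_title": "Engineer"},
    {"postal_code": "94110", "age_range": "30-39", "job_title": "Engineer"},
    {"postal_code": "10001", "age_range": "30-39", "job_title": "Engineer"},
]

def unique_count(rows, fields):
    """Count rows whose combination of `fields` matches no other row."""
    key = lambda row: tuple(row[f] for f in fields)
    combos = Counter(key(row) for row in rows)
    return sum(1 for row in rows if combos[key(row)] == 1)

# Each extra attribute splits the groups further, so more people stand out.
for fields in (["postal_code"],
               ["postal_code", "age_range"],
               ["postal_code", "age_range", "job_title"]):
    print(fields, unique_count(records, fields))
```

With one attribute, only one of the five records is unique; with all three, three of the five are, even though no single field identifies anyone.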
Sensitive vs. Non-Sensitive Personal Information
Personally identifiable information is also classified as sensitive or non-sensitive. Sensitive information requires more care during storage and transmission, such as encryption at rest and in transit. Whether a data point is labeled sensitive depends on how much harm its disclosure would cause if a breach occurred.
Non-sensitive information may be publicly available and includes things like:
- City or ZIP code
- Political affiliation (in parts of the world where that data is public)
- Date of birth
This information, by itself, is not enough to identify an individual. As the section on linkable information showed, however, its sensitivity can increase when it is combined with other data points.
Sensitive information is data that carries a higher risk of harm if it is disclosed outside its original collection purpose. This includes linked identifiers like full names, ID numbers, biometric data, and medical records. As with linked vs. linkable, the legal definition of sensitivity differs from one regulation to the next.
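One way to act on this distinction in code is to tag each field with a sensitivity level and mask sensitive values before they reach logs or other secondary systems. The field names and classification map below are hypothetical and should be adapted to the regulations that apply to you. Unclassified fields default to sensitive, in line with the advice to treat all data as personal information unless you know otherwise.

```python
from enum import Enum

class Sensitivity(Enum):
    NON_SENSITIVE = "non_sensitive"
    SENSITIVE = "sensitive"

# Hypothetical classification map; not taken from any specific regulation.
FIELD_CLASSIFICATION = {
    "city": Sensitivity.NON_SENSITIVE,
    "postal_code": Sensitivity.NON_SENSITIVE,
    "full_name": Sensitivity.SENSITIVE,
    "ssn": Sensitivity.SENSITIVE,
    "medical_record": Sensitivity.SENSITIVE,
}

def redact_for_logging(record: dict) -> dict:
    """Return a copy of `record` with sensitive or unclassified fields masked."""
    redacted = {}
    for name, value in record.items():
        # Unknown fields are treated as sensitive until someone classifies them.
        level = FIELD_CLASSIFICATION.get(name, Sensitivity.SENSITIVE)
        redacted[name] = "[REDACTED]" if level is Sensitivity.SENSITIVE else value
    return redacted

print(redact_for_logging(
    {"full_name": "Jane Doe", "city": "Wellington", "favorite_color": "blue"}
))
```

Defaulting to "sensitive" means a newly added field is protected until someone deliberately reviews it, rather than leaking until someone notices.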
General Data Protection Regulation and Expansions on PII
Each regulation has its own definition of what qualifies as personal information. The GDPR covers many of the data types listed in this article. Article 4(1) defines personal data as any information relating to an identified or identifiable natural person, meaning someone who can be identified, directly or indirectly, by reference to an identifier such as:
...a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.
It expands on this in Recital 30, which notes that individuals may be associated with online identifiers provided by their devices, applications, tools, and protocols, such as IP addresses, cookie identifiers, and RFID tags, and that these can be used, especially when combined with other information, to profile and identify them.
Regional regulations, as well as organization-specific policies, further define what data is classified as personal. The GDPR's definition is intentionally broad, but the California Consumer Privacy Act (CCPA) goes further by covering information that "could reasonably be linked," directly or indirectly, with a particular consumer or household. This phrasing closely tracks the idea of linkable information: it protects any information that could be used to identify an individual, even when the link is not obvious.
Other regional legislation, like the New Zealand Privacy Act 2020, takes a less specific approach. New Zealand's Privacy Commissioner describes personal information as "any information which tells us something about a specific individual."
Industry-specific regulations can also narrow the scope of what constitutes personal information and who can access it, such as the Health Insurance Portability and Accountability Act (HIPAA) for the healthcare industry or the Family Educational Rights and Privacy Act (FERPA) for education.
How Your Application Should Handle Personal Information
Most of the larger regulations, like the GDPR, require organizations to keep track of how individuals' personal data is collected, used, and shared. This becomes more complicated as third-party APIs and web services integrate with that data. Track whether and how your users' personal information is transmitted to third parties, and decide in advance what you will do if one of those vendors suffers a data breach. It is also worth assessing your third-party vendors to confirm that they comply with applicable local regulations and to check whether they have undergone security and data privacy audits.
Perhaps most important of all is visibility into which APIs are used within your organization and which of them have access to user data. Many companies are unaware of the full scope of the integrations their teams use. We call these untracked integrations shadow APIs. They pose a risk to data privacy, application performance, and security.
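A lightweight way to start tracking these flows is a machine-readable inventory of which personal data fields go to which vendor, and why. The sketch below uses a hypothetical `DataFlow` schema and made-up vendor names. It shows how such an inventory can answer breach questions ("what did this vendor hold?") and review questions ("which vendors have not been audited?"); it is an illustration, not a compliance tool.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class DataFlow:
    """One flow of personal data to a third party (hypothetical schema)."""
    vendor: str
    fields: set[str]
    purpose: str
    audited: bool = False

# Hypothetical inventory entries for illustration only.
INVENTORY = [
    DataFlow("email-provider", {"email", "full_name"}, "transactional email", audited=True),
    DataFlow("analytics-vendor", {"ip_address", "search_history"}, "product analytics"),
    DataFlow("payments-vendor", {"full_name", "card_number"}, "billing", audited=True),
]

def exposed_fields(vendor: str) -> set[str]:
    """Fields that would be at risk if `vendor` suffered a breach."""
    result: set[str] = set()
    for flow in INVENTORY:
        if flow.vendor == vendor:
            result |= flow.fields
    return result

def vendors_receiving(field_name: str) -> list[str]:
    """Vendors that receive a given personal data field."""
    return sorted({flow.vendor for flow in INVENTORY if field_name in flow.fields})

def unaudited_vendors() -> list[str]:
    """Vendors with no recorded security or privacy audit."""
    return sorted({flow.vendor for flow in INVENTORY if not flow.audited})

print(sorted(exposed_fields("payments-vendor")))
print(vendors_receiving("full_name"))
print(unaudited_vendors())
```

Keeping the inventory as data rather than a document makes it easier to check automatically, for example in code review whenever a new integration is added.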
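One simple starting point for finding shadow APIs is to compare the hosts your services actually call against the integrations you have documented. The sketch below assumes you can export outbound request URLs, for example from an egress proxy or API gateway log; the host names and the approved list are hypothetical. Dedicated discovery tools also inspect payloads for personal data, which this sketch does not attempt.

```python
from urllib.parse import urlparse

# Hypothetical inventory of approved, documented third-party API hosts.
APPROVED_HOSTS = {"api.email-provider.example", "api.payments.example"}

# Hypothetical outbound request URLs, e.g. exported from egress proxy logs.
observed_requests = [
    "https://api.email-provider.example/v1/send",
    "https://api.payments.example/charges",
    "https://tracker.unknown-analytics.example/collect?uid=123",
    "https://api.payments.example/refunds",
]

def find_shadow_hosts(urls, approved):
    """Return hosts that appear in traffic but not in the approved inventory."""
    hosts = {urlparse(url).hostname for url in urls}
    # Skip None, which urlparse returns for URLs without a network location.
    return sorted(host for host in hosts if host and host not in approved)

print(find_shadow_hosts(observed_requests, APPROVED_HOSTS))
```

Any host this reports is a candidate for review: either it belongs in the inventory, or it is an integration nobody has assessed for data privacy.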