Data Classification

Labeling every dataset by its sensitivity so the right controls (access, encryption, retention) apply automatically.


Data classification means putting a sensitivity label on every dataset. A common four-level taxonomy is 'Public', 'Internal', 'Confidential', 'Restricted'. The label decides what protection applies: encryption, access rules, where the data can be stored, and who can see it.

Think of it like classification levels in government: Top Secret data doesn't go in the same drawer as today's lunch menu. The levels only help if they differ. If everything is 'secret', nothing is, and you lose the ability to prioritize.

Concrete example: Public (the team's names on the website), Internal (the org chart), Confidential (salary data, the roadmap), Restricted (customers' credit card numbers, health records, authentication secrets).

Why it matters operationally: when a new dataset is created, the label answers the practical questions. Is it encrypted at rest? Can it go into a dev environment? Can it leave the EU? Who needs to approve access? Without a label, these questions get answered case by case, and inconsistently.
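To make the "label decides the controls" idea concrete, here is a minimal sketch of a policy lookup table. Everything in it is hypothetical: the names Sensitivity, Controls, POLICY and controls_for, the approver roles, and the specific control values are illustrative assumptions, not taken from any standard or from the source article.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Sensitivity(IntEnum):
    """The four-level taxonomy, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Controls:
    encrypt_at_rest: bool
    allowed_in_dev: bool
    may_leave_eu: bool
    approver: Optional[str]  # None means no access approval is needed


# Illustrative values only; a real organization sets these in its own policy.
POLICY = {
    Sensitivity.PUBLIC: Controls(False, True, True, None),
    Sensitivity.INTERNAL: Controls(False, True, True, "team lead"),
    Sensitivity.CONFIDENTIAL: Controls(True, False, True, "data owner"),
    Sensitivity.RESTRICTED: Controls(True, False, False, "security officer"),
}


def controls_for(label: Sensitivity) -> Controls:
    """Answer the operational questions once per label, not once per dataset."""
    return POLICY[label]


if __name__ == "__main__":
    for label in Sensitivity:
        print(label.name, controls_for(label))
```

The point of the table is that each question is answered once per level. A new dataset only needs its label, and the four answers follow from it.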

A good classification is minimal (3 to 5 levels, because more becomes unusable), actionable (each level has distinct controls), and applied at ingest (datasets get tagged when they enter your systems, not months later during an audit).
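"Applied at ingest" can be enforced rather than hoped for. The sketch below assumes a hypothetical in-memory Catalog whose ingest method refuses any dataset that arrives without a valid label. The class, the method names, and the label strings are made up for illustration; a real data catalog would have its own API.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical label set; kept small on purpose (3 to 5 levels).
ALLOWED_LABELS = ("public", "internal", "confidential", "restricted")


@dataclass(frozen=True)
class Dataset:
    name: str
    label: str


class Catalog:
    """Toy registry that only accepts datasets carrying a valid label."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}

    def ingest(self, name: str, label: Optional[str] = None) -> Dataset:
        # Reject unlabeled data at the door instead of fixing it during an audit.
        if label is None:
            raise ValueError(f"dataset {name!r} has no sensitivity label")
        normalized = label.strip().lower()
        if normalized not in ALLOWED_LABELS:
            raise ValueError(f"unknown label {label!r} for dataset {name!r}")
        dataset = Dataset(name=name, label=normalized)
        self._datasets[name] = dataset
        return dataset

    def label_of(self, name: str) -> str:
        return self._datasets[name].label


if __name__ == "__main__":
    catalog = Catalog()
    catalog.ingest("org_chart", "Internal")
    print(catalog.label_of("org_chart"))
    try:
        catalog.ingest("payments_export")  # no label supplied
    except ValueError as err:
        print("rejected:", err)
```

Failing loudly at ingest is a design choice: it costs the producer a few seconds up front, and it means no dataset exists in the system without an answer to the control questions.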

Grounded on https://www.isaca.org/resources/news-and-trends/industry-news/2020/data-classification
