Retention & Data Lifecycle
When to keep data, when to archive it, when to delete it — driven by legal, business, and privacy requirements.
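Retention policy = declarative mapping (data class → retention duration + destination tier + deletion mode). Retention duration is driven by: statutory requirements (e.g., commercial records up to 10y in several EU member states such as Germany, tax records 6-10y depending on country, HIPAA-required documentation 6y, PCI DSS audit logs 1y with sensitive authentication data not stored at all after authorization), contractual terms (customer master agreements), business need (analytical value decays over time), and privacy (GDPR purpose and storage limitation).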
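A minimal sketch of such a declarative policy table, assuming hypothetical data-class names, tiers, and durations (placeholders, not legal advice). The point is that the schedule is data, so every deletion job reads from one reviewable source instead of hard-coding ages.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class DeletionMode(Enum):
    HARD_DELETE = "hard_delete"  # physically remove rows/objects
    ANONYMIZE = "anonymize"      # strip identifiers, keep aggregates


@dataclass(frozen=True)
class RetentionRule:
    retention_days: int           # how long to keep after the trigger event
    destination_tier: str         # tier the data lives in before expiry
    deletion_mode: DeletionMode   # what happens at expiry
    basis: str                    # why: statute, contract, business need, privacy


# Hypothetical policy table. 10 * 365 ignores leap days; real statutory rules
# often count from the end of the fiscal year instead.
POLICY: dict[str, RetentionRule] = {
    "invoice": RetentionRule(10 * 365, "archive", DeletionMode.HARD_DELETE, "statutory: commercial records"),
    "web_clickstream": RetentionRule(395, "analytical", DeletionMode.ANONYMIZE, "business need"),
    "marketing_profile": RetentionRule(2 * 365, "active", DeletionMode.HARD_DELETE, "GDPR purpose limitation"),
}


def expiry_date(data_class: str, trigger: date) -> date:
    """Date after which the record must be deleted or anonymized."""
    rule = POLICY[data_class]  # KeyError means unclassified data: fail loudly
    return trigger + timedelta(days=rule.retention_days)
```

Raising on an unknown data class is deliberate in this sketch: data with no policy entry should block the pipeline rather than silently live forever.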
Lifecycle stages: **active** (OLTP, real-time querying), **analytical** (warehouse, aggregation), **archive** (cold storage such as S3 Glacier Deep Archive, GCS Archive, Azure Archive tier), **anonymized** (personal identifiers stripped, aggregates kept for long-term trends), **deleted** (actually purged, including from backups once those backups age out under the backup rotation policy).
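To make the stages concrete, here is a sketch that assigns a stage from a record's age, assuming one hypothetical data class whose thresholds are 90 days to analytical, 1 year to archive, and 10 years to deletion. Real policies would look these thresholds up per data class.

```python
from datetime import date
from enum import Enum


class Stage(Enum):
    ACTIVE = "active"
    ANALYTICAL = "analytical"
    ARCHIVE = "archive"
    ANONYMIZED = "anonymized"  # a class that ends in anonymization would use this instead of DELETED
    DELETED = "deleted"


# Hypothetical thresholds: (age in days at which the stage begins, stage), ascending.
THRESHOLDS = [
    (0, Stage.ACTIVE),
    (90, Stage.ANALYTICAL),
    (365, Stage.ARCHIVE),
    (10 * 365, Stage.DELETED),
]


def stage_for(created: date, today: date) -> Stage:
    """Return the latest stage whose start threshold the record's age has reached."""
    age = (today - created).days
    current = THRESHOLDS[0][1]
    for start_day, stage in THRESHOLDS:
        if age >= start_day:
            current = stage
    return current
```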
Automation patterns: (i) **object storage lifecycle rules** (S3 Lifecycle, GCS Lifecycle, Azure Management Policies) transition and delete objects by age/tag; (ii) **warehouse partition expiration** (BigQuery table and partition expiration; Snowflake has no native partition expiration, so aged rows are removed by scheduled tasks, and its Time Travel/Fail-safe retention then determines how long the deleted data stays recoverable); (iii) **app-level jobs** (scheduled tasks invoking deletes and writing an audit log); (iv) **tombstone columns** plus scheduled hard-delete for soft-delete patterns.
**Right to erasure (GDPR Art. 17)** vs retention: if a data subject requests deletion but statutory retention requires keeping the record (e.g., invoices), you can refuse erasure of that record citing the legal obligation (Art. 17(3)(b)), but you must still delete everything not required for the statutory purpose (e.g., the marketing profile linked to that customer).
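A sketch of how an erasure request might be split into "delete" and "retain with cited basis", assuming hypothetical category names and a hypothetical `STATUTORY_HOLDS` table. Anything not covered by a statutory obligation is deleted by default.

```python
from dataclasses import dataclass, field


@dataclass
class ErasureOutcome:
    deleted: list[str] = field(default_factory=list)
    retained: dict[str, str] = field(default_factory=dict)  # category -> legal basis cited to the subject


# Hypothetical: categories a statute obliges us to keep, with the basis to cite.
STATUTORY_HOLDS = {
    "invoices": "GDPR Art. 17(3)(b): legal obligation (commercial/tax record retention)",
}


def handle_erasure(subject_categories: list[str]) -> ErasureOutcome:
    out = ErasureOutcome()
    for category in subject_categories:
        if category in STATUTORY_HOLDS:
            # Keep, but only for the statutory purpose (restrict other processing).
            out.retained[category] = STATUTORY_HOLDS[category]
        else:
            # Everything else goes, e.g. the marketing profile.
            out.deleted.append(category)
    return out
```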
**Anonymization vs pseudonymization**: anonymized data (identifiers irreversibly removed so individuals can no longer be identified) is NOT personal data and falls outside GDPR (Recital 26). Pseudonymized data (identifiers replaced with tokens, with the re-identification key kept elsewhere) IS still personal data. Many datasets labeled 'anonymized' are actually only pseudonymized; privacy models such as k-anonymity, l-diversity, and t-closeness are used to test whether re-identification risk is low enough to treat the data as anonymous.
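The first two tests can be sketched in a few lines, assuming rows are dicts and the caller chooses which columns are quasi-identifiers. This computes k-anonymity and distinct l-diversity only; t-closeness, which compares each group's sensitive-value distribution to the overall one, is omitted.

```python
from collections import Counter


def k_anonymity(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest equivalence class over the quasi-identifier columns."""
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values()) if classes else 0


def l_diversity(rows: list[dict], quasi_identifiers: list[str], sensitive: str) -> int:
    """Distinct l-diversity: fewest distinct sensitive values in any equivalence class."""
    groups: dict[tuple, set] = {}
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups.setdefault(key, set()).add(row[sensitive])
    return min(len(values) for values in groups.values()) if groups else 0


# Hypothetical generalized dataset: zip prefix and age band are quasi-identifiers.
rows = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "021**", "age": "40-49", "diagnosis": "flu"},
]
```

Here the data is 2-anonymous, yet the 40-49 group has only one diagnosis, so anyone known to be in that group is revealed to have flu. That homogeneity gap is why k-anonymity alone is not sufficient.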
**Legal hold**: implemented as a flag on records/accounts that overrides retention expiry until the hold is lifted (litigation closed, investigation closed). It must be auditable (who placed it, who lifted it, for which case, when). Many orgs fail here: the hold is defined in policy but not enforced by tooling.
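A sketch of a tool-enforced hold, assuming a hypothetical in-memory `LegalHoldRegistry` (a real one would persist events append-only). A record can be under several cases at once, so deletion is allowed only after expiry *and* once every open hold is lifted.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass
class HoldEvent:
    record_id: str
    case_id: str
    action: str    # "placed" or "lifted"
    actor: str
    at: datetime


class LegalHoldRegistry:
    """Hypothetical registry: every place/lift is recorded as an audit event."""

    def __init__(self) -> None:
        self.events: list[HoldEvent] = []
        self._active: dict[str, set[str]] = {}  # record_id -> open case ids

    def place(self, record_id: str, case_id: str, actor: str) -> None:
        self._active.setdefault(record_id, set()).add(case_id)
        self.events.append(HoldEvent(record_id, case_id, "placed", actor, datetime.now(timezone.utc)))

    def lift(self, record_id: str, case_id: str, actor: str) -> None:
        open_cases = self._active.get(record_id, set())
        if case_id not in open_cases:
            raise ValueError(f"no open hold on {record_id} for case {case_id}")
        open_cases.remove(case_id)
        self.events.append(HoldEvent(record_id, case_id, "lifted", actor, datetime.now(timezone.utc)))

    def may_delete(self, record_id: str, retention_expiry: date, today: date) -> bool:
        # Any open hold overrides expiry; deletion waits until all holds are lifted.
        return today >= retention_expiry and not self._active.get(record_id)
```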
**Defensible deletion**: you must be able to prove that you deleted (audit trail) and that the deletion followed a consistent, documented policy rather than being selective. Ad hoc deletion ('we just deleted this tape') without a documented retention schedule, especially once litigation is reasonably anticipated, risks being treated as 'spoliation' of evidence, which can bring severe court sanctions.
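One way to make the audit trail itself trustworthy is a hash chain, sketched below under stated assumptions: each entry's SHA-256 covers the previous entry's hash, and each deletion must cite a policy id from a hypothetical documented `schedule`. Editing any past entry breaks verification from that point on.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list[dict], record_id: str, policy_id: str, actor: str, schedule: set[str]) -> dict:
    """Append a deletion event chained to the previous entry's hash."""
    if policy_id not in schedule:
        # Refuse deletions that cannot be tied to the documented retention schedule.
        raise ValueError(f"unknown retention policy {policy_id!r}")
    entry = {
        "record_id": record_id,
        "policy_id": policy_id,
        "actor": actor,
        "deleted_at": datetime.now(timezone.utc).isoformat(),
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)  # computed before the "hash" key is added
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A hash chain is tamper-evident, not tamper-proof: someone who can rewrite the whole log can recompute it, so in practice the latest hash would also be anchored somewhere outside the operator's control (for example, WORM storage).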
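**Immutable / WORM storage** (S3 Object Lock, Glacier Vault Lock, legal hold modes) is the standard way to meet record-keeping rules for regulated records (SEC Rule 17a-4, FINRA): data cannot be deleted or modified until retention expires, not even by admins. In S3 Object Lock this is compliance mode; governance mode still lets specially permissioned users bypass the lock.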
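The retention semantics can be modeled with a small in-memory store. This is a simulation written for illustration (the `WormStore` class is hypothetical and talks to no real service), loosely following the Object Lock modes described above. It simplifies one point: real Object Lock keeps earlier object versions immutable rather than rejecting a second write to the same key.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LockedObject:
    body: bytes
    mode: str               # "COMPLIANCE" or "GOVERNANCE"
    retain_until: datetime
    legal_hold: bool = False


class WormStore:
    """Hypothetical model of Object Lock-style retention; not a client for any real API."""

    def __init__(self) -> None:
        self._objects: dict[str, LockedObject] = {}

    def __contains__(self, key: str) -> bool:
        return key in self._objects

    def put(self, key: str, body: bytes, mode: str, retain_until: datetime) -> None:
        if mode not in ("COMPLIANCE", "GOVERNANCE"):
            raise ValueError(f"unsupported mode {mode!r}")
        if key in self._objects:
            raise PermissionError("write-once: key already exists")
        self._objects[key] = LockedObject(body, mode, retain_until)

    def set_legal_hold(self, key: str, on: bool) -> None:
        self._objects[key].legal_hold = on

    def delete(self, key: str, now: datetime, bypass_governance: bool = False) -> None:
        obj = self._objects[key]
        if obj.legal_hold:
            raise PermissionError("legal hold in place")
        if now < obj.retain_until:
            # Compliance mode cannot be bypassed by anyone; governance mode only with explicit bypass.
            if obj.mode == "COMPLIANCE" or not bypass_governance:
                raise PermissionError(f"{obj.mode} retention until {obj.retain_until.isoformat()}")
        del self._objects[key]
```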
Grounded on https://www.dama.org/