Retention & Data Lifecycle

When to keep data, when to archive it, when to delete it — driven by legal, business, and privacy requirements.

Data shouldn't live forever. Keeping everything indefinitely is a regulatory risk (privacy regulations such as GDPR require you to delete PII once the purpose it was collected for has ended), a cost (storage plus maintenance of stale data), and a liability (more data means a bigger breach surface). Deleting too aggressively, on the other hand, breaks audits and analytics.

A retention policy assigns each dataset a lifecycle: how long to keep it in hot storage (fast, expensive), how long in archive (slow, cheap), and when to delete it permanently. Three factors drive it: **legal** (invoices: commonly around 7 years, though the exact period varies by country), **business** (trend analysis on 3 years of sales), and **privacy** (PII from a cancelled customer: delete after 30 days).
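As a rough illustration, a retention policy can be written down as data. This is a minimal sketch, not a standard schema: the `RetentionPolicy` class, the category names, and the split between hot and archive days are all assumptions made for the example; only the 7-year, 3-year, and 30-day totals come from the text above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    """Lifecycle for one data category (all names here are illustrative)."""
    category: str
    driver: str        # which factor sets the period: "legal", "business", "privacy"
    hot_days: int      # time in fast, expensive storage
    archive_days: int  # additional time in slow, cheap storage

    def archive_on(self, start: date) -> date:
        """Date on which the data leaves hot storage."""
        return start + timedelta(days=self.hot_days)

    def delete_on(self, start: date) -> date:
        """Date on which the data is deleted permanently."""
        return start + timedelta(days=self.hot_days + self.archive_days)


YEAR = 365  # simplification: ignores leap days

# The hot/archive split per category is assumed, not taken from any regulation.
POLICIES = {
    p.category: p
    for p in (
        RetentionPolicy("invoices", "legal", hot_days=1 * YEAR, archive_days=6 * YEAR),
        RetentionPolicy("sales_facts", "business", hot_days=3 * YEAR, archive_days=0),
        RetentionPolicy("cancelled_customer_pii", "privacy", hot_days=30, archive_days=0),
    )
}
```

With these assumed values, `POLICIES["cancelled_customer_pii"].delete_on(date(2024, 1, 1))` returns `date(2024, 1, 31)`, that is, 30 days after the cancellation date.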

The lifecycle is a pipeline: **hot** (recent, frequently queried), **warm** (less frequently queried, cheaper tier), **cold/archive** (rarely queried, cheapest; for example Amazon S3 Glacier or an Archive storage tier), **deleted** (gone). Each transition is automated by policy rather than performed manually.
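One way to picture policy-driven transitions is a function that maps a record's age to its target tier, plus a planner that lists the moves a scheduled job would make. The thresholds (90 days, 1 year, 7 years) and all names below are hypothetical; in practice the storage platform's own lifecycle rules would usually do this work.

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    DELETED = "deleted"


# (exclusive upper age bound in days, tier); thresholds are hypothetical
TIER_BOUNDS = [(90, Tier.HOT), (365, Tier.WARM), (7 * 365, Tier.COLD)]


def tier_for_age(age_days: int) -> Tier:
    """Return the tier a record of this age belongs in under TIER_BOUNDS."""
    for upper_bound, tier in TIER_BOUNDS:
        if age_days < upper_bound:
            return tier
    return Tier.DELETED


def planned_transitions(records):
    """Yield (record_id, current_tier, target_tier) for records in the wrong tier.

    `records` is an iterable of (record_id, current_tier, age_days) tuples.
    """
    for record_id, current, age_days in records:
        target = tier_for_age(age_days)
        if target is not current:
            yield record_id, current, target
```

Because the planner only compares the current tier with the target tier, it can be rerun safely: records already in the right tier produce no move.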

Legal holds are an exception: if there is ongoing litigation, the records involved must be frozen, with no deletion even if the retention policy says delete. The freeze lifts when the case closes.
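The override can be sketched as a check that runs before any deletion. The `LegalHoldRegistry` class and `may_delete` function are hypothetical helpers, assuming holds are tracked per case so that a record named in two cases stays frozen until both have closed.

```python
from collections import defaultdict


class LegalHoldRegistry:
    """Tracks which records are frozen by which open cases (hypothetical helper)."""

    def __init__(self):
        self._records_by_case = defaultdict(set)

    def place_hold(self, case_id, record_ids):
        self._records_by_case[case_id].update(record_ids)

    def release_hold(self, case_id):
        # Called when the case closes; records held by other cases stay frozen.
        self._records_by_case.pop(case_id, None)

    def is_held(self, record_id):
        return any(record_id in ids for ids in self._records_by_case.values())


def may_delete(record_id, retention_expired, registry):
    """Deletion needs an expired retention period AND no active hold."""
    return retention_expired and not registry.is_held(record_id)
```

For example, after `registry.place_hold("case-1", ["order-42"])`, the call `may_delete("order-42", True, registry)` returns `False` until `registry.release_hold("case-1")` is called.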

A concrete example pipeline for e-commerce orders: 2 years in hot storage (OLTP DB), then 5 years in the warehouse (analytics, aggregated); at year 7, delete the orders from the warehouse and replace them with anonymized aggregates for historical trends. Run separate pipelines for personal identifiers and transactional facts, so you can delete the PII long before the transaction records.
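The order pipeline above might be sketched as follows. It assumes the 5 warehouse years follow the 2 hot years (so deletion lands at year 7), and that orders are plain dicts with the hypothetical keys `customer_id`, `order_date`, and `amount_cents`. The stage names are also illustrative.

```python
from collections import defaultdict
from datetime import date

HOT_YEARS = 2        # in the OLTP database
WAREHOUSE_YEARS = 5  # then in the warehouse, so deletion lands at year 7


def stage_for_order(order_date: date, today: date) -> str:
    """Map an order's age to its stage in the example pipeline."""
    age_years = (today - order_date).days / 365.25  # approximate year length
    if age_years < HOT_YEARS:
        return "oltp"
    if age_years < HOT_YEARS + WAREHOUSE_YEARS:
        return "warehouse"
    return "aggregate_then_delete"


def anonymized_monthly_totals(orders):
    """Roll expiring orders up to per-month counts and revenue.

    The customer identifier is deliberately not carried into the result.
    """
    totals = defaultdict(lambda: {"orders": 0, "revenue_cents": 0})
    for order in orders:
        month = order["order_date"].strftime("%Y-%m")
        totals[month]["orders"] += 1
        totals[month]["revenue_cents"] += order["amount_cents"]
    return dict(totals)
```

Orders whose stage is `aggregate_then_delete` would first be passed to `anonymized_monthly_totals` and only then removed from the warehouse, so the historical trend survives the deletion.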

Grounded on https://www.dama.org/
