Notes on the architecture of data
Designing the structures that make data trustworthy.
Essays and field notes on data architecture, data modeling, dimensional modeling, data contracts, and the lakehouse — practical writing for data engineers and architects who design systems that make information trustworthy.
Essays & field notes (13 pieces)
01 Essay
Your AI Is Only as Good as Your Data Architecture
Retrieval-augmented generation, AI agents, and LLMs querying your warehouse are all only as reliable as the data beneath them. GenAI doesn't replace data architectur...
02 Reconsidered
What GenAI Actually Changes About Data Architecture — and What It Doesn't
Cutting through the hype: GenAI adds vector storage and new retrieval patterns to the data stack, but the fundamentals — structure, quality, governance, ownership — ...
03 Field Notes
Slowly Changing Dimensions, Explained Without the Jargon
Slowly changing dimensions answer one question: when a dimension attribute changes, do you overwrite history or preserve it? Here are SCD Types 1, 2, and 3, and exac...
04 Field Notes
Data Warehouse vs Data Lake vs Lakehouse: A Clear Comparison
A data warehouse stores structured, modeled data for analytics. A data lake stores raw data of any shape, cheaply. A lakehouse tries to be both. Here's the real trad...
05 Field Notes
How to Make a Data Pipeline Idempotent
An idempotent data pipeline produces the same result whether it runs once or five times. Here are the concrete patterns — partition overwrite, merge on keys, delete-...
06 Essay
What Is a Semantic Layer, and Why Does Your Data Stack Need One?
A semantic layer is the single, governed place where business metrics are defined once — independent of any dashboard. Here's what it is, what it isn't, and why it f...
07 Reconsidered
The Medallion Architecture, Reconsidered
Bronze, silver, gold is a useful default and a dangerous dogma. A second look at what the layers get right, and where they quietly fall apart.
08 Field Notes
OLTP vs OLAP: Why You Shouldn't Run Analytics on Your App Database
OLTP systems handle many small transactions fast. OLAP systems scan huge volumes for analysis. They're optimized for opposite things — which is why querying your pro...
09 Essay
Data Contracts Are a Cultural Problem
A schema check is the easy 10% of a data contract. The other 90% is an organizational agreement that no YAML file can enforce for you.
10 Field Notes
Star Schema vs Snowflake Schema: Which to Use and When
Star schema vs snowflake schema comes down to one decision — whether to normalize your dimensions. Here's the trade-off, and why the star usually wins in a modern wa...
11 Field Notes
A Field Guide to Dimensional Modeling
Facts, dimensions, and grain — the three ideas that quietly run most analytics, explained without the dogma.
12 Field Notes
Surrogate Keys vs Natural Keys: A Practical Rule
Surrogate key vs natural key is a decision every data model faces. The practical rule: use surrogate keys for dimensions, keep the natural key as an attribute, and h...
13 Manifesto
The Shape of Data
Every dataset has a shape. The only question is whether you chose it, or whether it happened to you.