dataarchitect.studio

Notes on the architecture of data

Designing the structures that make data trustworthy.

Essays and field notes on data architecture, data modeling, dimensional modeling, data contracts, and the lakehouse — practical writing for data engineers and architects who design systems that make information trustworthy.

01

Essay

Your AI Is Only as Good as Your Data Architecture

Retrieval-augmented generation, AI agents, and LLMs querying your warehouse are all only as reliable as the data beneath them. GenAI doesn't replace data architectur...

May 31, 2026 5 min
02

Reconsidered

What GenAI Actually Changes About Data Architecture — and What It Doesn't

Cutting through the hype: GenAI adds vector storage and new retrieval patterns to the data stack, but the fundamentals — structure, quality, governance, ownership — ...

May 31, 2026 5 min
03

Field Notes

Slowly Changing Dimensions, Explained Without the Jargon

Slowly changing dimensions answer one question: when a dimension attribute changes, do you overwrite history or preserve it? Here are SCD Types 1, 2, and 3, and exac...

May 31, 2026 6 min
04

Field Notes

Data Warehouse vs Data Lake vs Lakehouse: A Clear Comparison

A data warehouse stores structured, modeled data for analytics. A data lake stores raw data of any shape, cheaply. A lakehouse tries to be both. Here's the real trad...

May 30, 2026 5 min
05

Field Notes

How to Make a Data Pipeline Idempotent

An idempotent data pipeline produces the same result whether it runs once or five times. Here are the concrete patterns — partition overwrite, merge on keys, delete-...

May 29, 2026 6 min
06

Essay

What Is a Semantic Layer, and Why Does Your Data Stack Need One?

A semantic layer is the single, governed place where business metrics are defined once — independent of any dashboard. Here's what it is, what it isn't, and why it f...

May 28, 2026 4 min
07

Reconsidered

The Medallion Architecture, Reconsidered

Bronze, silver, gold is a useful default and a dangerous dogma. A second look at what the layers get right, and where they quietly fall apart.

May 27, 2026 4 min
08

Field Notes

OLTP vs OLAP: Why You Shouldn't Run Analytics on Your App Database

OLTP systems handle many small transactions fast. OLAP systems scan huge volumes of data for analysis. They're optimized for opposite things, which is why querying your pro...

May 23, 2026 5 min
09

Essay

Data Contracts Are a Cultural Problem

A schema check is the easy 10% of a data contract. The other 90% is an organizational agreement that no YAML file can enforce for you.

May 19, 2026 5 min
10

Field Notes

Star Schema vs Snowflake Schema: Which to Use and When

Star schema vs snowflake schema comes down to one decision — whether to normalize your dimensions. Here's the trade-off, and why the star usually wins in a modern wa...

May 12, 2026 4 min
11

Field Notes

A Field Guide to Dimensional Modeling

Facts, dimensions, and grain — the three ideas that quietly run most analytics, explained without the dogma.

May 06, 2026 5 min
12

Field Notes

Surrogate Keys vs Natural Keys: A Practical Rule

Choosing between surrogate keys and natural keys is a decision every data model faces. The practical rule: use surrogate keys for dimensions, keep the natural key as an attribute, and h...

Apr 29, 2026 4 min
13

Manifesto

The Shape of Data

Every dataset has a shape. The only question is whether you chose it, or whether it happened to you.

Apr 22, 2026 4 min