Field Notes
The Grain of a Fact Table: The First Decision That Decides Everything Else
The grain of a fact table is the business definition of what exactly one row represents. Not the list of columns — a sentence: one row per order line, one row per account per month, one row per boarding-pass scan. Declaring that sentence is the first design decision in dimensional modeling, made before you pick a single dimension or measure, because it is the test every later choice must pass. A dimension belongs in the table only if it’s single-valued at that grain; a measure belongs only if it’s true at that grain. Skip the declaration and both mistakes walk in unnoticed — which is why nearly every double-counting bug in a warehouse traces back to a grain violation.
Kimball’s design method makes this order explicit: choose the business process, declare the grain, then — and only then — choose dimensions and facts. The grain sits second because everything downstream inherits from it.
Atomic grain first, always
The safest declaration is the most atomic one — the finest level of detail the source’s measurement event actually produces. For a retail order that’s the order line, not the order; for web traffic, the page view, not the session.
| Atomic grain (order line) | Coarse grain (order) | |
|---|---|---|
| “Sales by product?” | Yes — product is a line attribute | Impossible — products vary within an order |
| Unanticipated questions | Survives them | Fails the first new one |
| Roll up / drill down | Aggregates up freely | Can never drill back down |
| Row volume | Larger | Smaller |
| Right role | The foundation fact table | A derived aggregate on top |
The asymmetry is the whole argument: atomic data can always become coarse, coarse data can never become atomic again. If query cost hurts, add an aggregate table (or let a semantic layer manage one) — as a second table with its own declared grain, not a compromise in the first.
The classic bug: two grains in one table
Here’s how the mistake usually arrives. Orders have lines (product, quantity, price)
— but the source also carries an order-level shipping fee. Someone helpfully adds
shipping_fee to the line-grain table, repeated on every line:
-- Grain: one row per order line ... except one column isn't
SELECT
order_id, product_key, quantity,
line_amount, -- true at line grain ✓
shipping_fee -- true at ORDER grain ✗ (repeated on every line)
FROM fact_order_line;
-- The inevitable report, quietly wrong:
SELECT sum(line_amount) AS revenue,
sum(shipping_fee) AS shipping -- 3-line orders count fees 3x
FROM fact_order_line;
The fix is never “remember not to sum that column.” It’s one of two structural moves:
-- Fix 1: allocate the order fact down to line grain
shipping_fee * line_amount / sum(line_amount) OVER (PARTITION BY order_id)
AS allocated_shipping_fee
Allocation keeps one table and makes every column summable; a second fact table keeps each measurement honest at its natural grain. Both work. Hoping people remember doesn’t.
Grain declares the fact table’s type
Say the grain sentence out loud and you’ve usually also named which of the three fact table types you’re building: one row per event is a transaction fact, one row per thing per period is a periodic snapshot, one row per process lifecycle is an accumulating snapshot. Even factless fact tables — rows with no measures at all — still have a perfectly crisp grain; the row itself is the fact.
The grain also disciplines dimensions. Ask of each candidate: is it single-valued at this grain? Product is single-valued per order line but multivalued per order — so a line grain earns the product dimension and an order grain forfeits it. That one test, applied ruthlessly, is most of what separates a star schema that holds up from a wide table that slowly stops being true.
Declare the grain in one sentence. Write it at the top of the model file. Reject anything that violates it. It is the cheapest correctness guarantee in all of data architecture.
Common questions
What is the grain of a fact table?
The grain is the precise business definition of what a single row in the fact table represents — for example, 'one row per order line' or 'one row per account per month.' It's declared in business terms, before choosing dimensions or measures, and every element of the table must honor it.
How do I choose the right grain?
Start at the most atomic grain the source system produces — the individual measurement event, like an order line or a sensor reading. Atomic grain answers unpredictable questions and can always be aggregated up; a coarser grain can never be drilled back down. Only pre-aggregate as a deliberate performance layer on top.
Can one fact table have two grains?
No — mixing grains in one table is the classic dimensional modeling bug. If a business process produces measurements at two grains (order lines and order-level shipping fees, say), either allocate the coarser fact down to the fine grain or build two fact tables. Never store both grains in the same table.
What's the difference between grain and granularity?
In dimensional modeling they're used interchangeably, but 'grain' is the Kimball term of art for the declared meaning of one row. 'Granularity' more loosely describes how detailed data is. The discipline lies in the declaration: writing the grain down as a sentence and rejecting anything that violates it.
Essays by email
One new essay on data architecture, straight to your inbox. No noise, unsubscribe anytime.