dataarchitect.studio

Field Notes

OLTP vs OLAP: Why You Shouldn't Run Analytics on Your App Database

Every data architecture eventually runs into the difference between OLTP and OLAP, usually the hard way: someone runs a big analytical query against the production application database, and the app slows to a crawl for everyone. The two acronyms describe two kinds of database workload that are optimised for opposite things, and understanding why is the foundation under nearly every decision about where analytics should live.

Two opposite jobs

OLTP — Online Transaction Processing — is the workload your application runs. It’s defined by many small, fast operations: a user places an order, updates their profile, adds an item to a cart. Thousands of these happen concurrently, each touching a handful of rows, mixing reads and writes. The database behind your app — Postgres, MySQL, SQL Server — is an OLTP system, and it’s tuned to do this well: insert a row, update a record, look up a single customer by ID, all in milliseconds.

OLAP — Online Analytical Processing — is the workload your analytics runs. It’s defined by few large, complex operations: “total revenue by product category by month for the last three years.” A single such query might scan millions or billions of rows, aggregate them, and join across large tables. There are far fewer of these queries, they’re almost entirely reads, and each one is enormous compared to an OLTP operation.

OLTP is many people each touching a little data. OLAP is a few people each touching a lot of data. Almost every difference between the two follows from that.

Why the same database can’t do both well

The two workloads don’t just differ in size — they pull the underlying design in incompatible directions. Three differences matter most.

Row storage vs column storage. OLTP databases store data by row, because a transaction typically wants a whole record at once (give me everything about this order). OLAP systems store data by column, because an analytical query typically wants one or two columns across a vast number of rows (give me the amount column for ten million orders, and sum it). Reading a single column from row-stored data means touching every row to extract one field — slow. Column storage reads just that column — fast. The storage layout that’s right for one workload is actively wrong for the other.

Normalized vs denormalized. OLTP schemas are normalized — data split across many tables to avoid redundancy and keep writes consistent and cheap. That’s ideal for transactions but painful for analysis, where answering a business question means joining a dozen normalized tables together every time. OLAP systems are denormalized — data deliberately pre-joined into wide tables and dimensional models like star schemas — so analytical queries are simple and fast. Again: opposite choices, each correct for its own job.

Contention. This is the one that bites you in production. A heavy OLAP query scanning millions of rows consumes huge amounts of memory, CPU, and I/O, and can hold locks or saturate the database while it runs. On a dedicated analytical system, fine — that’s what it’s for. On your production OLTP database, that same query starves the fast little transactions your application depends on, and real users feel it: checkouts hang, pages time out. You’ve made your app slow to compute a report.

The architectural answer

Because one system can’t serve both workloads well, the standard architecture is to keep them separate and move data from one to the other. Your application writes to its OLTP database, optimised for transactions. On some cadence — batch jobs, or change data capture streaming changes continuously — that data is copied into a separate analytical store (a warehouse or lakehouse) optimised for OLAP, where it’s reshaped into denormalized, column-stored, analytics-friendly models.

This separation is why the modern data stack looks the way it does. The warehouse isn’t a second copy of your database for no reason — it exists precisely because the analytical workload needed its own home, with the opposite storage model, the opposite schema design, and its own compute that can’t slow down your app. The pipelines that keep it in sync are the bridge between the two worlds.

The rule of thumb

When you catch yourself about to run analytics against a production application database, stop and recognise the workload mismatch. A little reporting on a small app is survivable; real analytics at scale is not. The instinct to “just query the prod DB” is the instinct that takes the site down at month-end close.

Keep transactions on OLTP. Keep analytics on OLAP. Move data deliberately between them. It looks like more infrastructure than necessary right up until the moment a single analyst’s query would have frozen your checkout flow — and then it looks like exactly the right amount.