dataarchitect.studio

ETL vs ELT: The Same Three Letters in a Different Order

ETL and ELT are the same three steps — extract, transform, load — in a different order. ETL transforms the data before loading it into the warehouse; ELT loads the raw data first and transforms it inside the warehouse. Swapping the order of two letters sounds like the kind of distinction only a consultant could love, but the reorder isn’t cosmetic. It encodes a genuine shift in where transformation happens, who owns it, and what becomes possible — and understanding it explains most of how the modern data stack got its shape.

The only difference

Both approaches extract data from source systems. The fork is what happens next.

ETL — extract, transform, load. Pull the data out, run it through a separate processing engine that cleans, reshapes, and aggregates it, and load the finished result into the warehouse. The warehouse only ever sees data that’s already been transformed into its final form.

ELT — extract, load, transform. Pull the data out and load it into the warehouse raw, more or less as it arrived. Then transform it in place, using the warehouse’s own compute — typically with SQL — to build the cleaned, modeled tables your analysts use.

That’s the whole technical distinction. But notice what moves when you swap the order: in ETL, transformation lives in a dedicated engine outside the warehouse; in ELT, transformation lives inside it.
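To make the reorder concrete, here is a minimal sketch that runs the same small dataset through both orders. It assumes an in-memory SQLite database as a stand-in for the warehouse, and the table names, column names, and sample rows are all hypothetical. The point is only where the cleaning logic runs: in application code before the load, or in SQL after it.

```python
import sqlite3

# Hypothetical source rows as they might arrive from an extract step.
SOURCE_ROWS = [
    ("1001", " alice@example.com ", "19.99"),
    ("1002", "BOB@example.com", "5.00"),
    ("1003", "alice@example.com", "7.50"),
]


def clean(row):
    """Transform one row: cast the id, normalize the email, convert to cents."""
    order_id, email, amount = row
    return int(order_id), email.strip().lower(), round(float(amount) * 100)


def run_etl(warehouse):
    """ETL: transform in application code, then load only the finished rows."""
    warehouse.execute(
        "CREATE TABLE orders (order_id INTEGER, email TEXT, amount_cents INTEGER)"
    )
    warehouse.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [clean(r) for r in SOURCE_ROWS]
    )


def run_elt(warehouse):
    """ELT: load the rows untouched, then transform with the warehouse's SQL."""
    warehouse.execute(
        "CREATE TABLE raw_orders (order_id TEXT, email TEXT, amount TEXT)"
    )
    warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", SOURCE_ROWS)
    warehouse.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(email)) AS email,
               CAST(ROUND(CAST(amount AS REAL) * 100) AS INTEGER) AS amount_cents
        FROM raw_orders
        """
    )


# SQLite stands in for the warehouse here; that is an assumption of this sketch.
etl_wh = sqlite3.connect(":memory:")
elt_wh = sqlite3.connect(":memory:")
run_etl(etl_wh)
run_elt(elt_wh)

query = "SELECT order_id, email, amount_cents FROM orders ORDER BY order_id"
etl_rows = etl_wh.execute(query).fetchall()
elt_rows = elt_wh.execute(query).fetchall()
```

Both paths end with the same modeled `orders` table. The difference is that only the ELT warehouse also holds `raw_orders`, and only the ELT transformation is expressed in SQL.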

Why ETL came first

ETL was the default for decades for a sound reason: warehouse storage and compute were expensive and finite. When every gigabyte and every CPU cycle in the warehouse cost real money, you couldn’t afford to dump raw data in and sort it out later. You transformed and aggregated first, on a separate ETL server, so that only the clean, necessary, final data consumed precious warehouse space. The cost structure of the hardware made transform-before-load the rational choice.

Why ELT took over

Then the cloud warehouse changed the economics underneath the whole decision. Storage became cheap, compute became elastic and separable, and columnar engines made large-scale in-warehouse transformation fast. Suddenly the old reason to transform first — don’t waste expensive warehouse resources — mostly evaporated, and loading raw data became not just feasible but advantageous.

ETL made sense when the warehouse was the scarcest resource in the building. ELT makes sense when the warehouse is cheap, elastic, and the most capable processing engine you own. The order flipped because the economics flipped.

The benefits of ELT compound. You load raw data fast, decoupling ingestion from transformation. You keep full fidelity: the raw data stays available, so you can reprocess it when your logic changes or a bug surfaces. This is the same instinct behind the immutable raw layer in the medallion architecture. And you transform in SQL, the language analysts already speak, which is what let analytics engineers, not just specialized data engineers, own the transformation layer. The modern cloud-warehouse-and-lakehouse stack is built on this ELT foundation.
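The reprocessing benefit is easiest to see with a bug. The sketch below assumes an in-memory SQLite database as the warehouse, a hypothetical `raw_payments` table, and two hypothetical versions of a revenue model. Because the raw layer is never modified, fixing the logic is just a rebuild.

```python
import sqlite3

wh = sqlite3.connect(":memory:")

# Raw layer: loaded once, never modified afterwards.
wh.execute(
    "CREATE TABLE raw_payments (payment_id INTEGER, status TEXT, amount_cents INTEGER)"
)
wh.executemany(
    "INSERT INTO raw_payments VALUES (?, ?, ?)",
    [(1, "paid", 1000), (2, "refunded", 400), (3, "paid", 250)],
)

# Version 1 of the model has a bug: it counts refunded payments as revenue.
MODEL_V1 = "SELECT SUM(amount_cents) AS revenue_cents FROM raw_payments"

# Version 2 fixes the logic by counting only paid payments.
MODEL_V2 = (
    "SELECT SUM(amount_cents) AS revenue_cents "
    "FROM raw_payments WHERE status = 'paid'"
)


def rebuild(conn, model_sql):
    """Drop the modeled table and rebuild it from the raw layer."""
    conn.execute("DROP TABLE IF EXISTS revenue")
    conn.execute(f"CREATE TABLE revenue AS {model_sql}")
    return conn.execute("SELECT revenue_cents FROM revenue").fetchone()[0]


buggy_revenue = rebuild(wh, MODEL_V1)   # 1650: includes the refund
fixed_revenue = rebuild(wh, MODEL_V2)   # 1250: refund excluded
raw_count = wh.execute("SELECT COUNT(*) FROM raw_payments").fetchone()[0]
```

Had only the aggregated total been loaded, the refund would be baked into the number with no way to separate it out after the fact.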

The real shift: where transformation lives

Frame ETL-versus-ELT only as “order of operations” and you miss the point. The deeper change is about location and ownership. In the ETL world, transformation happened in a separate engine, often owned by a specialized team, and the warehouse received data already shaped — which also meant the raw data frequently wasn’t kept, so the transformation was effectively irreversible. In the ELT world, transformation is version-controlled SQL running in the warehouse, accessible to a broader set of people, with the raw data preserved underneath as a safety net you can always rebuild from.

That’s why ELT didn’t just change pipelines; it changed who does the work. It pulled transformation out of a niche engineering specialty and into the hands of anyone fluent in SQL.

When ETL still wins

ELT is the modern default, but defaults have exceptions, and ETL remains genuinely correct in specific cases — almost all of which share one trait: something must happen to the data before it’s allowed to land.

  • Compliance and sensitive data. This is the big one. If regulations forbid raw personal, health, or financial data from sitting in your warehouse, you cannot load it raw and clean it later — you must mask, tokenize, or drop those fields before loading. That’s transform-before-load by definition: ETL.
  • Transformations that don’t fit SQL or the warehouse. Heavy non-relational logic, specialized processing, or certain machine-learning feature pipelines may be better suited to a dedicated engine than to in-warehouse SQL.
  • Constrained targets. Loading into a system that can’t economically store or process raw volume pushes the transformation back upstream.

Outside cases like these, on a modern cloud warehouse, ELT is almost always the better default.
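For the compliance case, a minimal sketch of transform-before-load might look like the following. Everything named here is an assumption for illustration: the field names, the `TOKEN_KEY` secret, the choice of HMAC-SHA256 as the tokenization scheme, and SQLite as the warehouse. Whether a given scheme satisfies a given regulation is a question for your compliance team, not something this sketch settles.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical secret; in practice this would come from a secrets manager.
TOKEN_KEY = b"example-key-not-for-production"

# Hypothetical extracted rows containing sensitive fields.
SOURCE_ROWS = [
    {"patient_id": "P-17", "ssn": "123-45-6789", "diagnosis_code": "J45", "visit_cost": 120},
    {"patient_id": "P-42", "ssn": "987-65-4321", "diagnosis_code": "E11", "visit_cost": 85},
]


def tokenize(value):
    """Replace a sensitive value with a keyed, deterministic token (HMAC-SHA256).

    The digest is truncated to 16 hex characters purely for readability.
    """
    return hmac.new(TOKEN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def protect(row):
    """Transform before load: tokenize the identifier and drop the SSN entirely."""
    return (tokenize(row["patient_id"]), row["diagnosis_code"], row["visit_cost"])


wh = sqlite3.connect(":memory:")
wh.execute(
    "CREATE TABLE visits (patient_token TEXT, diagnosis_code TEXT, visit_cost INTEGER)"
)
# Only protected rows ever reach the warehouse; there is no raw table.
wh.executemany("INSERT INTO visits VALUES (?, ?, ?)", [protect(r) for r in SOURCE_ROWS])

loaded = wh.execute("SELECT * FROM visits ORDER BY rowid").fetchall()
```

Because the token is deterministic, the same patient still joins to the same token across loads, so analysts can count and group by patient without ever seeing the identifier.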

The synthesis

So ETL versus ELT isn’t a war with a winner; it’s a decision that the cloud quietly re-defaulted. For most analytics on a modern warehouse, load raw and transform in place — ELT — because the economics that justified transforming first are gone, and keeping raw data buys you fidelity and flexibility. Reach for ETL when something genuinely must be done to the data before it lands, compliance most of all.

It also fits neatly with the rest of the modern toolkit: change data capture streams raw changes out of source systems, you land them in the warehouse, and you transform them there. That is extract, load, transform, with the loading cadence (batch, micro-batch, or continuous) chosen independently of the transformation logic. And once the data is loaded and transformed, what you’re usually building is the dimensional models the warehouse exists to serve: the consumption layer where analysts actually live, however you architect the warehouse itself.
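As a rough illustration of the change-data-capture pattern, the sketch below lands a hypothetical change log as-is and then derives the current state with SQL. The event shape (`seq`, `op`, `customer_id`, `email`) is an assumption and does not follow any particular CDC tool's format, and SQLite again stands in for the warehouse.

```python
import sqlite3

wh = sqlite3.connect(":memory:")

# Raw change log as landed: one row per change event, append-only.
wh.execute(
    "CREATE TABLE raw_customer_changes "
    "(seq INTEGER, op TEXT, customer_id INTEGER, email TEXT)"
)
wh.executemany(
    "INSERT INTO raw_customer_changes VALUES (?, ?, ?, ?)",
    [
        (1, "insert", 1, "a@example.com"),
        (2, "insert", 2, "b@example.com"),
        (3, "update", 1, "a.new@example.com"),
        (4, "delete", 2, None),
        (5, "insert", 3, "c@example.com"),
    ],
)

# Transform in place: keep each customer's latest event, skip deleted customers.
wh.execute(
    """
    CREATE TABLE customers AS
    SELECT c.customer_id, c.email
    FROM raw_customer_changes AS c
    JOIN (
        SELECT customer_id, MAX(seq) AS last_seq
        FROM raw_customer_changes
        GROUP BY customer_id
    ) AS latest
      ON latest.customer_id = c.customer_id AND latest.last_seq = c.seq
    WHERE c.op <> 'delete'
    """
)

current = wh.execute(
    "SELECT customer_id, email FROM customers ORDER BY customer_id"
).fetchall()
```

The change log stays in the warehouse untouched, so the same events could later feed a different model, such as a history table, without going back to the source system.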

Same three letters, different order — and that small reorder carries the whole story of how data warehousing moved from scarce and expensive to cheap and elastic.

Common questions

What is the difference between ETL and ELT?

The order of the last two steps. ETL extracts data, transforms it in a separate processing engine, then loads the cleaned result into the warehouse. ELT extracts data, loads it raw into the warehouse first, then transforms it there using the warehouse's own compute. ETL transforms before landing; ELT transforms after.

Is ELT replacing ETL?

For most cloud-warehouse analytics, ELT has become the default, because cheap storage and elastic compute made it practical to load raw data and transform it in place with SQL. But ETL isn't obsolete — it remains the right choice when data must be transformed before it lands, such as masking sensitive fields, or when transformation doesn't fit SQL.

When should you use ETL instead of ELT?

Use ETL when you cannot load raw data as-is — most often for compliance, where personal or sensitive data must be masked or dropped before it reaches the warehouse — or when transformations are complex enough that a dedicated processing engine suits them better than in-warehouse SQL.