The switch from ETL (Extract, Transform, Load) to EL>T (Extract, Load, decoupled from Transform) is neither sudden nor new. It has been in the works for some time, driven partly by business needs and partly by the modernization of data architectures.
The concept of ETL really took root in the late 1980s/early 1990s, when organizations started centralizing their data in data warehouses. They needed tools to “extract” data from multiple source systems, “transform” it to make it compatible with the target system format, and then “load” it into the target repository. Since traditional data warehouses were the target repository, source data was transformed into relational format before it could be ingested/loaded into the data warehouse.
Extract Source Data ==> Transform Source Data ==> Load Data into Data Mart ==> Consume Data
Although the ETL approach offered data consumers clean, standardized, quality data that was ready for analysis, it also had its own challenges. Extracting, transforming, and loading data required specialized resources – programmers, engineers, IT personnel – to manually code ETL scripts, which was expensive. Because the data needed to be cleansed and transformed before being loaded into the data warehouse, data access, and thereby business insights, were slow. And growing data volumes meant more compute power for ETL processing, and the loss of data granularity to ensure performance.
The emergence of data lakes in the 2000s initiated the shift from ETL to EL>T. Designed to store large volumes of diverse data quickly and cheaply, data lakes didn’t require data to be structured. Businesses could extract and load raw data quickly into the centralized repository, allowing business users to transform data into the right structure at the time of analysis.
Even though data lakes initiated the shift to ELT, the more recent move to the cloud has really accelerated it. Cloud-based data lakes and data warehouses offer near-endless storage and extensive processing power, allowing organizations to extract and load all the data they may need, in near real time – and then use the processing power of the platform itself, be it Amazon Redshift, Snowflake, Azure Synapse, Databricks, Amazon EMR, or others, to transform data for multiple use cases and user groups: BI, analytics, or predictive modeling.
Extract Source Data ==> Load Data into the Cloud DL / DWH ==> Transform(s) for Multiple Use-cases
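The flow above can be sketched in a few lines of Python. This is a minimal, illustrative sketch, not Qlik's implementation: the raw-events source is made up, and a local JSON-lines file stands in for the cloud data lake. The point is the ordering – raw data is landed as-is, and shaping happens later, only when a use case asks for it.

```python
import json
import tempfile
from pathlib import Path

# --- Extract: pull raw records from a (hypothetical) source system ---
def extract():
    return [
        {"user": "ana", "event": "login", "ts": "2023-05-01T09:00:00"},
        {"user": "ben", "event": "purchase", "ts": "2023-05-01T09:05:00", "amount": 42.0},
    ]

# --- Load: land the records untouched (JSON lines), no upfront schema ---
def load(records, lake_dir: Path) -> Path:
    target = lake_dir / "events.jsonl"
    with target.open("w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return target

# --- Transform: shape the raw data only when a use case needs it ---
def transform_for_bi(raw_file: Path) -> dict:
    with raw_file.open() as f:
        events = [json.loads(line) for line in f]
    # e.g. a simple per-event-type count for a BI dashboard
    counts = {}
    for e in events:
        counts[e["event"]] = counts.get(e["event"], 0) + 1
    return counts

lake = Path(tempfile.mkdtemp())   # stand-in for the cloud DL/DWH
raw = load(extract(), lake)
print(transform_for_bi(raw))      # {'login': 1, 'purchase': 1}
```

Because the loaded file keeps every field of every record, a second transform (say, revenue per user) can later be run against the same raw data without touching the source systems.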
The Beauty of EL>T Is Two-Fold
One: Data (all sources and types) is now available to data consumers/business users in near real time. There is no time lost on first formatting and structuring data. From the data consumer’s point of view, it makes sense too: they don’t have to know what they are going to do with the data from the start.
Two: Organizations have the flexibility to perform multiple transformations, serving different use cases at the same time. Using the processing power of their cloud data platforms, they can quickly and easily process data for different user groups and use cases, enabling more data exploration and experimentation.
With ETL, the processing is 1:1. Data is extracted, transformed, and loaded for a specific, highly targeted use case. If the transformed data doesn’t work for that previously defined business use case, or the business needs more data added, you have to go back to the drawing board and process the data again. ELT, on the other hand, gives organizations the ability to transform data on the fly. Its 1:many approach to transformation is better aligned with today’s agile development approaches.
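To make the 1:many point concrete, here is a minimal sketch in which one loaded raw dataset feeds two independent transforms – one for a BI dashboard, one for a predictive model. The records and both transforms are hypothetical; the takeaway is that neither transform requires going back to the source.

```python
# One raw, already-loaded dataset (illustrative records)
raw_orders = [
    {"customer": "ana", "region": "EU", "amount": 120.0},
    {"customer": "ben", "region": "US", "amount": 80.0},
    {"customer": "ana", "region": "EU", "amount": 60.0},
]

# Transform 1: revenue per region, for a BI dashboard
def revenue_by_region(orders):
    totals = {}
    for o in orders:
        totals[o["region"]] = totals.get(o["region"], 0.0) + o["amount"]
    return totals

# Transform 2: per-customer features, for a predictive model
def customer_features(orders):
    feats = {}
    for o in orders:
        f = feats.setdefault(o["customer"], {"orders": 0, "total": 0.0})
        f["orders"] += 1
        f["total"] += o["amount"]
    return feats

print(revenue_by_region(raw_orders))   # {'EU': 180.0, 'US': 80.0}
print(customer_features(raw_orders))
```

Adding a third use case later means writing one more function over the same raw data – no re-extraction, no reworked pipeline.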
That said, reports of the demise of ETL are premature, just as the FUD (fear, uncertainty, and doubt) around ELT is exaggerated. Both approaches have their place depending on your needs and use cases, and the likelihood is that your business is leveraging both.
Irrespective of the approach you take, Qlik can help. With our industry-leading capabilities in real-time data ingestion through Change Data Capture, Data Lake and Data Warehouse Automation, and cataloging, as well as support for multi-cloud environments, we can automate the entire data integration process, making it quicker, easier, and more secure.