There’s a long-running argument among Business Intelligence professionals that’s been around since ETL tools came into existence. It is this: Is it Extraction, Transformation, and Load (ETL) or is it Extraction, Load, and Transform (ELT). Sounds like a play on words, but in translation it dictates how your data movement processes are designed, coded, and supported. The former dictates that you do your transformations/business logic in the data movement processes, while the latter implies that you perform most of your transformations/business logic after landing the data in the database.
You’ll also hear the argument around where to put business logic: in the ETL or the database? This argument has morphed and become more prevalent with the advent of the cloud and the debate over cloud vs on-premises. The days of picking one ETL tool/solution to satisfy the needs of the enterprise are over. A whole host of questions have to be addressed, and the options boiled-down, before one can decide:
- Data warehouse and data lake/big data
- Structured and unstructured data
- Streaming vs data handoff/data extraction
- Cloud vs on-premises.
- New tools/services and traditional ETL
- Explosion in potential data sources
- Business logic storage location
Let’s say you’ve been tasked with having to spend money to build out your data infrastructure.
Where do you begin?
Today’s CIO has difficult choices to make. Our hands are being forced, by the cloud and other considerations. A good example is SQL Server 2008 R2. As it approaches the end of extended support (or life support), many companies will be put in the position of having to either migrate or sunset ETL packages that have been in place longer that some of the support staff have been in the industry!
So, is traditional ETL dead?
Yes, I’d say so. Should I design and create my processes the same way I did ten years ago? The answer is no. The traditional ETL approach is dead (or dying), and we must all change our mindset to adapt. The capabilities of newer tools, languages and services—and the introduction of these items—is taking place at a staggering pace. Selection options have never been more prevalent, (along with the marketing hype). Products such as AWS Glue, Data Factory, SSIS, Boomi, Mulesoft, etc, and even cloud DW products like Snowflake, complicate the decision around where to allocate your limited resources. Now add licensing into the mix. Yikes! Assume that tools, programming languages, and services will come and go, and that platforms will move in the next few years. This means you must have a clearer approach to your selection and that it must map to a longer-term strategic vision, as well as to your business processes.
The days of the all-encompassing/big bang ETL packages are over. They’re too difficult to maintain and are not flexible. My advice: make the processes smaller, more focused, and dedicated in purpose, and keep business-specific logic closer to the database/destination. I Repeat: keep your business logic close to your data. Make the ETL more manageable, more modularized, more agile, and more portable. When you look at your data movement infrastructure/architecture, break-up the process into specific segments and align your ETL towards it:
- Data access/data movement/delivery
- Data transformations
- Data services
- Business-specific logic
Define and split-up your generic transformations from your business logic. If I have a requirement for generic data scrubbing, or a generic process (e.g. name standardization), a host of ETL tasks or cloud micro-services can handle it. Keep the generic transforms separate from the core business processes that are specific to your business or application. ETL processes, in general, are still great at managing the necessary workflow.
I’ve seen many SSIS packages over the years, and recently find myself having to convert a massive number of them. Leverage tools for what they do best. Dumb-down your ETL, have dedicated focus, and leverage its ability to access a wide range of data sources and network resources, coordinate activities/workflow. But store the business logic with the database/destination. With stored procedures (or something similar) that’s your call. What if you want to elevate your database to the cloud, but you won’t have SSIS there for support? If SSIS has to go away, you can more readily replace SSIS with a cloud based ETL, and then have your business logic (in stored procedures) move with the data. The choice is the same if you have a SQL Azure database, and your data is being supplied via data factory (or something similar). Business-specific logic should be close to the database/destination, and non-business transforms may be done with the ETL. So, ETL vs ELT? I’l give you the classic Microsoft answer: it depends. But at this point, I’m more inclined towards ELT.
So, what are you doing at your company? Let Anexinet’s expert consultants help keep your feet on the ground and your BI in the cloud. That’s why we’re here. Because not all platforms are created equal. And regardless of which one you choose, your data will likely remain locked in siloes, making quality a relentless challenge. And while omni-channel thinking is fun to brag about, it remains difficult to implement. So, if your organization needs any help with its Business Intelligence Strategy, please don’t hesitate to reach out to us. We’d love to help you get started.
Business Intelligence Architect