Analytics Engineer: a Former Furious Analyst
What the hell is an analytics engineer? One thing is certain: it's a nice repackaging of a name, addressing today's need to pinpoint roles in an expanding data ecosystem. The backstory: when you raise capital at a $1.5 billion valuation and there are rumors that the next round would value you at $6 billion, you can name things however you like. People will definitely notice.
There are two schools of thought on where the title comes from. One states that an analytics engineer is a pissed-off analyst or data scientist. The second is broadcast by dbt Labs, the VC-world unicorn alluded to above.
In what situation does an analyst boil over most often? Picture a mission-critical task: business logic, checked; tooling, checked; no-distractions window, checked. Just get the data and go…
Ah, the data. There are two major pitfalls: the data isn't there, or the data is no good.
In both cases, the analyst needs to get to the source, and this is where the trouble starts. The vast majority of data ecosystems are built around one process: Extract, Transform, Load, or ETL for short.
There are modern variants such as ELT and ETLT, but I won't go into the details right now.
Normally, a data engineer provides a platform where the data can be reached. When something goes wrong (and remember, this is a mission-critical situation), the data engineer is not there. Not that she or he has better things to do. Unless she or he does… Maintaining a living system is hard work, and interventions like the one in our case often cannot be performed on the spot.
There is no clear answer as to who is closer to the imaginary middle point. Is it a data engineer who can quickly pick up analytics and the art of visualization, or a data analyst who can learn the tooling and the art of handling data where it's raw? And dirty. And all over the place. The land of raw data used to mean reaching down to bare metal: servers, the real infrastructure. Today, cloud solutions abstract away most of this layer, so you can manage world-class remote infrastructure with much more ease.
Even for analysts equipped with new skills and tooling, reaching the exact sources of raw data is still not that easy. You can't just plug the analyst's environment into the broader data stack. Managing data systems is like steering an oil tanker: in our case, a major new feature takes at least 6 months of preparation and another 6 months of implementation. The other burden is compliance. There is no way an analyst gets into the raw space until a clear and safe access policy has been established.
So what can the analyst do in the short term? It turns out the biggest returns may come after the ETL process is completed. There is no real need to dig deep into the system: the data has already been put in a central place, such as a data warehouse, which is quite a friendly environment.
It's natural for the analyst to reach the data there. However, it still needs to be thoroughly curated to be valuable. The open-source community around dbt Labs created a tool for this: data build tool, or dbt for short. It takes care of the "Transform" part of the process. Transformations are performed not within the ETL task, but after the data has been loaded. Swap ETL for ELT, or use ETLT, and you get the idea: it's all about the sequence. This is still engineering, but inside a safe area. No angry calls from the infrastructure team for wandering around the fragile parts of the system.
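To make the "it's about the sequence" point concrete, here is a minimal ELT sketch. It assumes Python's built-in sqlite3 module as a stand-in for a data warehouse; the table names (raw_orders, stg_orders) and the sample records are hypothetical. The transform step mimics the idea behind a dbt model (a SELECT statement materialized as a table), but it is plain SQL, not dbt's own interface.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system:
# inconsistent casing, amounts stored as text, a duplicate and a gap.
RAW_ORDERS = [
    (1, " Alice ", "19.90", "2024-01-05"),
    (2, "BOB", "5.00", "2024-01-06"),
    (2, "BOB", "5.00", "2024-01-06"),  # duplicate from a retried extract
    (3, "carol", None, "2024-01-07"),  # missing amount
]


def extract_and_load(conn: sqlite3.Connection) -> None:
    """E + L: land the data in the warehouse as-is, without cleaning it."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, customer TEXT, amount TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)


def transform(conn: sqlite3.Connection) -> None:
    """T: build a curated table from the raw one, inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE stg_orders AS
        SELECT DISTINCT
            order_id,
            LOWER(TRIM(customer)) AS customer,
            CAST(amount AS REAL) AS amount,
            order_date
        FROM raw_orders
        WHERE amount IS NOT NULL
        """
    )


conn = sqlite3.connect(":memory:")
extract_and_load(conn)  # raw data lands first...
transform(conn)         # ...and is curated afterwards, in place
rows = conn.execute("SELECT * FROM stg_orders ORDER BY order_id").fetchall()
print(rows)
```

The raw table keeps all four rows untouched, while stg_orders holds two cleaned, de-duplicated rows. Because the raw data stays in the warehouse, the transformation can be rewritten and rerun at any time without going back to the source systems.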
The detailed story could go on. To wrap up, an analytics engineer is an alloy of a data engineer and a data analyst or data scientist. Since it is a new actor in the data ecosystem, the precise duties vary across organizations.