Modern Data Stack for Non-Engineers in the Trenches
You have heard about the data revolution. You have encountered some buzzwords, such as big data, a data-driven culture and, more recently perhaps, the modern data stack. You don’t have time to dig deep into the subject because you are out there, in the field, with hot deadlines. Still, you sense that this wave of business culture shift is approaching. Take the shortest possible trip with me so I can paint a picture of what’s happening.
If you have a short attention span, or other things to do right now and no time even for a short trip, take away this essential piece:
At the end of the day you don’t want to just have cool, new, modular, cloud-based IT stuff. You want to perform better. You do this with better business decisions. Better business decisions are based on better use of data. Better use of data is possible by bringing data itself to the most approachable level. Most approachable level means the most efficient IT solutions concerning data. Most efficient IT solutions concerning data — today — mean the modern data stack. The modern data stack… IS THE cool, new, modular, cloud-based IT stuff. Oh.. And it works.
Data is already here
Economics wouldn’t exist without a scarcity of resources. Resources are limited, so everyone at least tries to make rational decisions. Business is also about creating something (profit being the most obvious objective) from scarce inputs. The key word here is decision. It’s nice to make rational ones.
It’s amazing how tempting it is to jump into definitions, axioms and philosophy itself right now to unpack the depth of what I’ve just written. What does it mean to be rational in business and in general, for instance? I promised the trip would be short, though, so some abstraction and flat definitions are a must here.
Better decisions are made not with data. Better decisions are made with better use of data. It’s impossible to avoid exposure to data. Just being here, conscious, means collecting data through the senses. A business does the same, only on a different level. However, being aware of the data thrown at you and doing something useful with it are two very different things.
The game of abstractions
In IT engineering a data stack is a collection of tools and solutions for data management, put together to work as one abstract unit. It’s a popular subject today because you don’t need to be a full-scale IT engineer to run some sophisticated pieces of infrastructure to serve your purposes.
Picture this. There were times when you couldn’t effectively use a car without knowing how it works. Fixing it on the spot was a regular occurrence. Ask your grandparents and parents. Now you don’t even have to buy a car to drive one, and you don’t need to care how it works.
The same goes for IT infrastructure. It’s less and less about mysterious rooms with computers piled on top of each other in closets. Having easier access to the best solutions without huge upfront costs gets you closer to the idea of creating new insights from the massive amounts of data you now want to collect and, again, to making better decisions.
At the end of the day you don’t want to just have cool, new, modular, cloud-based IT stuff. You want to perform better. You do this with better business decisions. Better business decisions are based on better use of data. Better use of data is possible by bringing data itself to the most approachable level. Most approachable level means the most efficient IT solutions concerning data. Most efficient IT solutions concerning data, today, mean the modern data stack. The modern data stack IS THE cool, new, modular, cloud-based IT stuff. Oh... And it works.
All levels of management, sales, accounting, legal, HR and many more departments have data, will have even more data and make decisions based on some data. The thing is, a lot of time and money is wasted in the process. There are some mature tools and, let’s say, seasoned ways of doing business (do manual entries into multiple databases embedded in Excel spreadsheets on local hard drives ring a bell?). Get rid of silly, repetitive procedures and more valuable outputs can be created.
Most of the biggest businesses have been using the best solutions available at any given time. The data warehouse concept is one of those. Usually regular databases are spread across departments, each a closed ecosystem of people and tools: in other words, silos. Compliance in an investment firm may investigate a particular collection of assets from the whole pile using its own database, while accounting has its own database. Every one of them has its own so-called source of truth. A data warehouse centralizes all of those silos, then transforms and unifies the data to become the one and only source of truth. The same goes for universal metrics across the company. You have no idea how often I have seen people struggle to define something that’s obvious (is it, though?), like a customer or an engagement metric, or, closer to my dear financial industry, what the risk-free rate is.
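For the curious, here is a toy sketch of that centralizing idea in Python, using the built-in sqlite3 module as a stand-in for a real warehouse. Everything in it is an assumption made for illustration: the two silos (sales and accounting), the column names, and the rule that a customer is identified by a lowercased, trimmed email address. A real warehouse does far more, but the principle of one agreed definition applied to every source is the same.

```python
import sqlite3


def build_warehouse(sales_rows, accounting_rows):
    """Load two hypothetical departmental silos into one in-memory database
    and return a unified customer view: (email_key, name, total_invoiced)."""
    con = sqlite3.connect(":memory:")
    # Each silo keeps its own table, as it would arrive from its department.
    con.execute("CREATE TABLE sales (email TEXT, name TEXT)")
    con.execute("CREATE TABLE accounting (email TEXT, invoiced REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", sales_rows)
    con.executemany("INSERT INTO accounting VALUES (?, ?)", accounting_rows)

    # One shared definition of "customer": the lowercased, trimmed email.
    rows = con.execute(
        """
        SELECT s.email_key, s.name, COALESCE(SUM(a.invoiced), 0)
        FROM (
            SELECT DISTINCT LOWER(TRIM(email)) AS email_key, name
            FROM sales
        ) AS s
        LEFT JOIN accounting AS a
            ON LOWER(TRIM(a.email)) = s.email_key
        GROUP BY s.email_key, s.name
        ORDER BY s.email_key
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    sales = [("Ann@Example.com ", "Ann"), ("bob@example.com", "Bob")]
    accounting = [("ann@example.com", 100.0), ("ANN@example.com", 50.0)]
    for row in build_warehouse(sales, accounting):
        print(row)
```

The point is not the SQL itself. Once both departments agree on what a customer is, Ann stops being three different people in three different spreadsheets.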
There are some neat additional features of data warehouses, but not today…
A decade makes a difference
10 years ago, if you wanted to have a data warehouse you had to pay seven figures in US dollars upfront. Today you can spin up world-class infrastructure in 5 minutes to see how it works and operate it at a fraction of that upfront cost, mostly in a pay-as-you-go manner. Add a few days to put it into production, if you start from scratch. Big business transformations take much longer, since switching from legacy on-premise systems to the cloud is a big thing.
10 years ago even gathering data in one place (even if it was a classic database) was a nightmare. Today specialized companies sell connectors in thousands of flavors so that IT teams won’t spend time writing low-level code to get simple sales and customer data from outside vendors.
10 years ago whoever was producing reports and predictions was mostly using MS Excel and thought 3D pie charts were funky. Today specialized business analysts use Tableau, Looker, Power BI and dozens of free, open-source, high-class solutions to perform near real-time analytics.
10 years ago only the biggest players could afford special-ops teams to maintain the infrastructure needed for gigantic calculations. Those gigantic calculations are not that useful for regular accounting, reporting and so on. However, imagine merging the usual business data with video data, precise measurements from hundreds of sensors (the so-called Internet of Things) sending signals every second, or streams of data coming from social media platforms. This part of the story could take hours of your time. In short, this is where the Artificial Intelligence revolution kicks in. Massive calculations done very fast have created an incredible space of possibilities that will be explored (or already have been) in every aspect of business.
And last but not least, 10 years ago the idea of experiments was about some data science research projects, maybe some online A/B tests conducted by marketing teams, or manually curated data collected through online or offline forms. Today companies are building massive platforms where you can test new solutions applied to a business without deep statistical knowledge. Those are mainly focused on the user interfaces of online businesses, but the idea is spreading. I’ll touch on the experimental culture aspect in the coming post.
It’s all about triggering superpowers of regular employees
With somewhat fewer concerns about infrastructure constraints, intellectual effort has also been channeled into data governance (access policies and establishing clear ownership) and the overall quality of data.
All of this is, in my opinion, for one thing and one thing only: to build trust around using data at a more advanced level.
The story of the modern data stack from the pure engineering perspective is also exciting. It is basically what brought me to the field of analytics engineering.
You are somewhere out there in the trenches of business, performing the best way you can at whatever you’re best at. I’ve just succinctly announced that some high-tech reinforcements are right behind you.