The BI world is rife with featureful, visually stunning, and well-marketed solutions designed to automate and simplify business intelligence. So why does Prominence do all its data modeling in plain old SQL? And why do we think you should, too? Read on!
What do SQL Server, Oracle, MySQL, PostgreSQL, Spark, Hive, Presto, Snowflake, Azure Synapse, AWS Redshift, Google BigQuery, … and on and on … , have in common? You guessed it—SQL.
All of these platforms support SQL as their go-to language for data modeling, transforming and aggregation. The ability to run SQL on so many different systems and in so many different contexts makes portability perhaps the most important virtue of SQL. Portability means choices. Portability means freedom. Portability is the opposite of lock-in.
“Sequel” or “S-Q-L”? As much as we’d love to settle the matter for you, the truth is, there doesn’t seem to be much consensus and ardent camps on both sides of the issue. Choose the pronunciation that works for you! That said, your author tends to use both pronunciations interchangeably—sometimes in the same sentence—so maybe you should look elsewhere for advice on this topic.
As a practical matter, this means the business-critical dimensions, metrics, and logic you build on one of these platforms—the lifeblood of a data-driven organization—are fundamentally portable across all of these platforms.
Not to suggest it’s copy-paste-done, SQL implementations still have their variations, but it’s worlds easier than migrating to or from a proprietary system with a proprietary language or the pipes and boxes of a data flow GUI.
Ask yourself: “Where do I want to spend my time and budget?” Migrating old work off of proprietary legacy systems or building new and novel solutions?
SQL has won its place in the big data world, with the most popular big-data processing technologies—Spark, Impala, Hive, Presto, BigQuery, etc.—and all of the cloud warehouses supporting, and in some cases exclusively supporting, SQL as the data processing language.
For anything from a single file backing a personal organization system (SQLite) to on-premises transaction processing and data warehousing (SQL Server, Oracle, Postgres) to petabyte-scale data crunching across a 100-node cloud cluster (SparkSQL), SQL runs at all scales.
As your analytics adoption matures, you will grow to tackle increasingly ambitious projects, and your solutions will naturally become larger and more complex. Where you once thought only of how to get the right data into the right hands, you’ll now have to tackle the ancillary complexities of managing a large analytics ecosystem. This is where tools like version control (git), Continuous Integration and Delivery (CI/CD), automated testing, sane deployment and rollback controls, and external automation evolve from nice-to-have to absolutely essential.
It’s at this level of complexity that SQL shines and its competitors falter. Among the most common complaints we hear about proprietary systems and applications is that they don’t play nicely with the world of tools designed to help data engineers do their jobs. In fact, many such tools represent walled gardens that are strictly limited to the features the vendor chose to ship with and are entirely closed off to the vast and mature world of code management and DevOps tools. SQL, which is coded in plain text after all, works seamlessly with all of these systems, allowing your data engineering practices to grow in pace with your analytics ambitions.
It can seem like everything in the software world is short-lived—you’ve barely caught wind of the hot new technology, and someone’s already written a blog post about how it’s already obsolete!
Sure, that’s true for a lot of technologies, but a look closer reveals a handful of exceptions, and SQL is a stand out.
SQL’s been around since 1974, standardized since 1986, and a predominant technology for decades now. And it’s showing no sign of slowing down. If anything, the opposite is true, with all the cutting edge cloud BI technologies proudly advertising their support for this 47-year-old technological neanderthal.
On its face, the dream of self service is one of empowered data citizens mixing and matching data, discovering insights, and revolutionizing business models before lunch without so much as touching IT’s ticketing system!
Naturally, this dream is more than a little over-simplified. The reality is that self-service users need access to curated, well-documented, well-designed data sources that have been explicitly and expertly modeled to facilitate said mixing and matching, insight-creating, and business revolutionizing.
In our experience, this modeling is still well within the realm of experienced and trained data architects. And you can probably guess what tool the data architects we know reach for.
Okay, admittedly, not everything can be done in SQL, nor should it. In the realm of data modeling, we believe that SQL is king. For some data engineering and many data science tasks, tools like Python and R can reach places SQL cannot.
The good news is it’s not either/or – it’s yes+and! SQL for relational modeling and data aggregation, and all your favorite Python libraries for data science and machine learning. If they’re lucky, your data scientists will be working on nice, clean data prepped into tables using SQL! (If they’re unlucky, they’ll be doing their own ad-hoc prep in spreadsheets.)
The best news is that many modern, notebook-centric technologies (like Databricks) allow data engineers to seamlessly switch from one to the other. It’s chocolate and peanut butter, all in one delicious notebook.
Convinced? Unconvinced? Reach out and let us know. We’d love to learn more about your data struggles and data aspirations and to see how Prominence can help you reach the summit on your data journey!