Feb 2022
24m 34s

Column by your name: The analytics datab...

The Stack Overflow Podcast
About this episode

These days, every company that wants to analyze its data for insights has a data pipeline set up. Many companies have a fast production database, often a NoSQL or key-value store, whose data flows through that pipeline. The pipeline runs some form of extract-transform-load (ETL) process on the data, then routes it to a larger data store that the analytics tools can access. But what if you could skip some of those steps and speed up the process with a database purpose-built for analytics?
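To make the pipeline concrete, here is a minimal Python sketch of the extract-transform-load flow described above. It is an illustration under stated assumptions, not how Vertica or any particular company does it: the key-value store is modeled as a plain dict of JSON strings, the analytics store is an in-memory SQLite database, and the record fields (`state`, `amount_cents`) and function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical production data: a key-value store modeled as a dict of JSON strings.
production_kv = {
    "order:1": json.dumps({"state": "CA", "amount_cents": 1250}),
    "order:2": json.dumps({"state": "NY", "amount_cents": 800}),
    "order:3": json.dumps({"state": "CA", "amount_cents": 400}),
}

def extract(store):
    """Extract: pull every raw value out of the key-value store."""
    for key, raw in store.items():
        yield key, json.loads(raw)

def transform(records):
    """Transform: flatten each record into a typed row (amount converted to dollars)."""
    for key, rec in records:
        order_id = int(key.split(":", 1)[1])
        yield order_id, rec["state"], rec["amount_cents"] / 100

def load(rows, conn):
    """Load: write the rows into a table that analytics tools can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, state TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for the larger analytics store
load(transform(extract(production_kv)), warehouse)

# An analytics-style query over the loaded data: total sales per state.
totals = dict(warehouse.execute(
    "SELECT state, SUM(amount) FROM orders GROUP BY state ORDER BY state"
).fetchall())
print(totals)  # {'CA': 16.5, 'NY': 8.0}
```

Each stage is a separate generator so the steps stay visible; a database built for analytics aims to shrink or remove some of these hops.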

On this sponsored episode of the podcast, we chat with Rohit (Ro) Amarnath, the CTO at Vertica, to find out how your analytics engine can speed up your workflow. After a humble beginning with a ZX Spectrum 128, he’s now in charge of Vertica Accelerator, a SaaS version of the Vertica database. 

Vertica was founded by database researcher Dr. Michael Stonebraker and Andrew Palmer. Dr. Stonebraker helped develop several databases, including Postgres, StreamBase, and VoltDB. Vertica was born out of research into purpose-built databases. Stonebraker’s research found that columnar storage was faster for data warehouses because analytical queries read only the columns they need, so each request requires fewer reads.

Here’s a quick example that shows how columnar databases work. Suppose that you want all the records from a specific US state or territory. There are 52 possible values here (depending on how you count territories). To find all instances of a single state in a row-based DB without an index, the search must read every row, all of its columns included, just to check the value of the state column. Searching by column can be faster by an order of magnitude: the database runs down the state column alone to find matching values, then retrieves row data only for the matches.
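The Python sketch below mimics the two layouts in memory to show where the savings come from. It uses a simplified, hypothetical cost model that counts stored values read rather than disk pages, along with made-up records; real column stores also compress columns and read from disk, which this ignores.

```python
# Hypothetical records; a real table would have far more rows and columns.
rows = [
    ("Alice", "alice@example.com", "CA", 120.0),
    ("Bob", "bob@example.com", "NY", 75.5),
    ("Carol", "carol@example.com", "CA", 42.0),
    ("Dan", "dan@example.com", "PR", 10.0),
]
COLUMNS = ("name", "email", "state", "spend")

# Row-oriented layout: each record's fields are stored together.
row_store = list(rows)

# Column-oriented layout: each column's values are stored together.
column_store = {col: [r[i] for r in rows] for i, col in enumerate(COLUMNS)}

def row_scan(store, state):
    """Row store: read every whole row just to check its state field."""
    values_read = 0
    matches = []
    for row in store:
        values_read += len(row)  # the entire row is read
        if row[2] == state:
            matches.append(row)
    return matches, values_read

def column_scan(store, state):
    """Column store: scan only the state column, then fetch the matching rows."""
    states = store["state"]
    values_read = len(states)  # only one column is scanned
    hits = [i for i, s in enumerate(states) if s == state]
    matches = [tuple(store[c][i] for c in COLUMNS) for i in hits]
    values_read += len(hits) * len(COLUMNS)  # reassemble only the matching rows
    return matches, values_read

row_result, row_cost = row_scan(row_store, "CA")
col_result, col_cost = column_scan(column_store, "CA")
assert row_result == col_result  # same answer from both layouts
print(row_cost, col_cost)  # 16 12
```

With only four columns the gap is small (16 values read versus 12), but the row scan's cost grows with every column in the table, while the column scan's cost grows with the number of columns only for the rows that actually match.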

The Vertica database was designed specifically for analytics, as opposed to transactional workloads. Ro spent some time at a Wall Street firm building reports on P&L, performance, profitability, and the like. Transactions were important to day-to-day operations, but the real value of the data came from analyses that showed where to cut costs or increase investment in a particular business. Analytics inform overall strategy, which tends to be more far-reaching and effective.

For most of its life, Vertica has been an on-premises database managing a data warehouse. But now that cloud storage is easy to scale, Vertica Accelerator aims to give you a data lake as a service. If you’re unfamiliar, data lakes take the data warehouse concept (central storage for all your data) and remove the limits on how much you can store. You can have “rivers” of data flowing into your stores; if you go from a terabyte to a petabyte overnight, your cloud provider will handle it for you.

Vertica has worked with plenty of industries that push massive amounts of data: healthcare, aviation, online games. They’ve built a lot of functionality into the database itself to speed up all manner of applications. One of their prospective customers had a machine learning model with thousands of lines of code that was reduced to about ten lines because so much was being done in the database itself. 
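Vertica's own in-database machine learning functions aren't shown here. As a hedged stand-in, this Python sketch uses the standard library's SQLite to illustrate the general idea of pushing computation to the data: it fits a simple linear regression with a single SQL query, so only two numbers leave the database instead of every training row. The `samples` table, its columns, and the data are hypothetical, and this closed-form least-squares fit is not a reproduction of the customer's model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (x REAL, y REAL)")
# Hypothetical training data lying exactly on y = 2x + 1.
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(float(x), 2.0 * x + 1.0) for x in range(10)],
)

# Ordinary least squares for y = slope * x + intercept, computed entirely in SQL:
#   slope     = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
#   intercept = (Sy - slope*Sx) / n
slope, intercept = conn.execute("""
    WITH s AS (
        SELECT COUNT(*) AS n, SUM(x) AS sx, SUM(y) AS sy,
               SUM(x * y) AS sxy, SUM(x * x) AS sxx
        FROM samples
    )
    SELECT (n * sxy - sx * sy) / (n * sxx - sx * sx),
           (sy - (n * sxy - sx * sy) / (n * sxx - sx * sx) * sx) / n
    FROM s
""").fetchone()

print(slope, intercept)  # 2.0 1.0
```

The application code shrinks to a query plus a fetch because the aggregation runs where the data lives; a database with built-in ML functions takes the same idea much further.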

In the future, Vertica plans to offer more powerful management of data warehouses and lakes, including handling the metadata that comes with them. To learn more about Vertica’s analytics databases, check out our conversation or visit their website.
