May 2022

41m 22s

A Multipurpose Database For Transactions...

About this episode

Summary

A large fraction of data engineering work involves moving data from one storage location to another in order to support different access and query patterns. SingleStore aims to cut down on the number of database engines that you need to run, which reduces the amount of copying that is required. It supports both fast, in-memory, row-based queries and a columnar on-disk representation, so your transactional and analytical workloads can run in the same database. In this episode SVP of Engineering Shireesh Thota describes the impact that SingleStore can have on your overall system architecture and the benefits of using a cloud-native database engine for your next application.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $3,000 on an annual subscription.
So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at dataengineeringpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan.
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% reported being at or over capacity. With 72% of data experts reporting that demands on their team are going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $5,000 when you become a customer.
Your host is Tobias Macey and today I’m interviewing Shireesh Thota about SingleStore (formerly MemSQL), the industry’s first modern relational database for multi-cloud, hybrid, and on-premises workloads.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what SingleStore is and the story behind it?
The database market has gotten very crowded, with different areas of specialization and nuance being the differentiating factors. What are the core sets of workloads that SingleStore is aimed at addressing?
- What are some of the capabilities that it offers to reduce the need to incorporate multiple data stores for application and analytical architectures?
What are some of the most valuable lessons that you learned in your time at Microsoft that are applicable to SingleStore’s product focus and direction?
Nikita Shamgunov joined the show in October of 2018 to talk about what was then MemSQL. What are the notable changes in the engine and business that have occurred in the intervening time?
- What are the macroscopic trends in data management and application development that are having the most impact on product direction?
For engineering teams that are already invested in, or considering adoption of, the "modern data stack" paradigm, where does SingleStore fit in that architecture?
- What are the services or tools that might be replaced by an installation of SingleStore?
What are the efficiencies or new capabilities that an engineering team might expect by adopting SingleStore?
What are some of the features that are underappreciated or overlooked which you would like to call attention to?
What are the most interesting, innovative, or unexpected ways that you have seen SingleStore used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on SingleStore?
When is SingleStore the wrong choice?
What do you have planned for the future of SingleStore?

Contact Info

LinkedIn
@ShireeshThota on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA