About this episode
Summary
Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and compuational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities. In this episode Jillian Rowe shares her experience of working in the field and supporting teams of scientists and analysts with the data infrastructure that they need to get their work done. This is a fascinating exploration of the collaboration between data professionals and scientists.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit a half-day virtual event featuring the first U.S. Chief Data Scientist, founder of the Data Mesh, Creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don’t want to miss it!
- Your host is Tobias Macey and today I’m interviewing Jillian Rowe about data engineering practices for bioinformatics projects
Interview
- Introduction
- How did you get involved in the area of data management?
- How did you get into the field of bioinformatics?
- Can you describe what is unique about data needs in bioinformatics?
- What are some of the problems that you have found yourself regularly solving for your clients?
- When building data engineering stacks for bioinformatics, what are the attributes that you are optimizing for? (e.g. speed, UX, scale, correctness, etc.)
- Can you describe a typical set of technologies that you implement when working on a new project?
- What kinds of systems do you need to integrate with?
- What are the data formats that are widely used for bioinformatics?
- What are some details that a data engineer would need to know to work effectively with those formats while preparing data for analysis?
- What amount of domain expertise is necessary for a data engineer to work in life sciences?
- What are the most interesting, innovative, or unexpected solutions that you have seen for manipulating bioinformatics data?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on bioinformatics projects?
- What are some of the industry/academic trends or upcoming technologies that you are tracking for bioinformatics?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast
Nov 16
State, Scale, and Signals: Rethinking Orchestration with Durable Execution
Summary <br />In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, ... Show More
51m 46s
Nov 9
The AI Data Paradox: High Trust in Models, Low Trust in Data
Summary<br />In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in ... Show More
51m 35s
Nov 2
Bridging the AI–Data Gap: Collect, Curate, Serve
SummaryIn this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand ... Show More
50m 40s
Mar 2025
#295 How To Get Hired As A Data Or AI Engineer with Deepak Goyal, CEO & Founder at Azurelib Academy
The role of data and AI engineers is more critical than ever. With organizations collecting massive amounts of data, the challenge lies in building efficient data infrastructures that can support AI systems and deliver actionable insights. But what does it take to become a succes ... Show More
52m 27s
Sep 9
Leading across technical domains, strategic deep-dives & applying your skills in new industries w/ Simone Kalmakis #231
<p>How do you apply your leadership skills to a new, mission-driven industry and effectively lead teams across multiple technical domains? In this episode, Simone Kalmakis (VPE @ Viam) shares her playbook for successfully transitioning between industries from health-tech and clim ... Show More
43m 17s
Apr 2025
Specialized AI brains for physical industry
Everyone wants a piece of general purpose models. Instacart has deployed ChatGPT for recipes and meal planning. The Mayo Clinic is using it to summarize patient records. Schneider Electric is using an OpenAI LLM to generate sustainability reports. With such powerful models, what’ ... Show More
37m 2s
Jul 2022
IoT, IIoT and Managing Edge Data
<p>Brian Gilmore (@BrianMGilmore, Director IoT/Emerging Technology @InfluxDB) talks about Edge and Industrial Edge Computing, as well as application and data challenges at the edge.</p><p><b>SHOW: 634</b></p><p><b>CLOUD NEWS OF THE WEEK - </b><a href='http://bit.ly/cloudcast-cnot ... Show More
35m 37s
Aug 12
172: Tesla Data Analyst: This is how to land a data job (Lily BL)
<p>What does it take to land a data analyst job at Tesla, and what challenges await you once you're there? Join me as I interview Lily BL, a former Tesla data analyst, who reveals her exhilarating journey in the world of data at one of the world's most innovative companies.</p><p ... Show More
33m 45s
Mar 2025
150: 9 Huge LIES About Becoming a Data Analyst Nobody Talks About
<p>In this episode, I uncover the nine biggest LIES about landing a data job. Maybe what's stopping you from pursuing a data career is just a big lie.</p><p>No College Degree As A Data Analyst YT Playlist: https://www.youtube.com/playlist?list=PLo0oTKi2fPNjHi6iXT3Pu68kUmiT-xDWs</ ... Show More
16m 11s
Apr 2025
AGI is still 30 years away — Ege Erdil & Tamay Besiroglu
Ege Erdil and Tamay Besiroglu have 2045+ timelines, think the whole "alignment" framing is wrong, don't think an intelligence explosion is plausible, but are convinced we'll see explosive economic growth (economy literally doubling every year or two).This discussion offers a tota ... Show More
3h 8m
Jan 2025
3164: Breaking Data Silos: How Hammerspace is Powering AI Storage and Hybrid Cloud
<p>As part of the IT Press Tour in Silicon Valley, I had the opportunity to sit down with David Flynn, CEO of Hammerspace, to explore how the company is redefining the future of enterprise data storage.</p> <p>At a time when AI-driven workloads and hybrid cloud computing are push ... Show More
24m 26s
Mar 2025
NVIDIA RAPIDS and Open Source ML Acceleration with Chris Deotte and Jean-Francois Puget
<p>NVIDIA RAPIDS is an open-source suite of GPU-accelerated data science and AI libraries. It leverages CUDA and significantly enhances the performance of core Python frameworks including Polars, pandas, scikit-learn and NetworkX. Chris Deotte is a Senior Data Scientist at NVIDIA ... Show More
42m 6s
Jun 2025
Architecting AI-Driven Financial Systems: Innovation at the Intersection of Fintech and Emerging Tech
In this episode of the Data Science Salon Podcast, we sit down with Sasibhushan Rao Chanthati, AVP and Senior Software Engineer at T. Rowe Price, where he’s building the future of finance through intelligent, scalable technologies. Sasi specializes in creating secure digital ecos ... Show More
29m 7s