About this episode
Summary
Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and compuational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities. In this episode Jillian Rowe shares her experience of working in the field and supporting teams of scientists and analysts with the data infrastructure that they need to get their work done. This is a fascinating exploration of the collaboration between data professionals and scientists.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit a half-day virtual event featuring the first U.S. Chief Data Scientist, founder of the Data Mesh, Creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don’t want to miss it!
- Your host is Tobias Macey and today I’m interviewing Jillian Rowe about data engineering practices for bioinformatics projects
Interview
- Introduction
- How did you get involved in the area of data management?
- How did you get into the field of bioinformatics?
- Can you describe what is unique about data needs in bioinformatics?
- What are some of the problems that you have found yourself regularly solving for your clients?
- When building data engineering stacks for bioinformatics, what are the attributes that you are optimizing for? (e.g. speed, UX, scale, correctness, etc.)
- Can you describe a typical set of technologies that you implement when working on a new project?
- What kinds of systems do you need to integrate with?
- What are the data formats that are widely used for bioinformatics?
- What are some details that a data engineer would need to know to work effectively with those formats while preparing data for analysis?
- What amount of domain expertise is necessary for a data engineer to work in life sciences?
- What are the most interesting, innovative, or unexpected solutions that you have seen for manipulating bioinformatics data?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on bioinformatics projects?
- What are some of the industry/academic trends or upcoming technologies that you are tracking for bioinformatics?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast
Oct 5
The Data Model That Captures Your Business: Metric Trees Explained
SummaryIn this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data pr ... Show More
1h 1m
Sep 28
From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra
SummaryIn this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC archite ... Show More
56m 31s
Sep 18
From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture
SummaryIn this episode of the AI Engineering Podcast Mark Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to m ... Show More
52m 58s
Mar 2025
#295 How To Get Hired As A Data Or AI Engineer with Deepak Goyal, CEO & Founder at Azurelib Academy
The role of data and AI engineers is more critical than ever. With organizations collecting massive amounts of data, the challenge lies in building efficient data infrastructures that can support AI systems and deliver actionable insights. But what does it take to become a succes ... Show More
52m 27s
Sep 9
Leading across technical domains, strategic deep-dives & applying your skills in new industries w/ Simone Kalmakis #231
How do you apply your leadership skills to a new, mission-driven industry and effectively lead teams across multiple technical domains? In this episode, Simone Kalmakis (VPE @ Viam) shares her playbook for successfully transitioning between industries from health-tech and climate ... Show More
43m 17s
Apr 2025
Specialized AI brains for physical industry
Everyone wants a piece of general purpose models. Instacart has deployed ChatGPT for recipes and meal planning. The Mayo Clinic is using it to summarize patient records. Schneider Electric is using an OpenAI LLM to generate sustainability reports. With such powerful models, what’ ... Show More
39m 2s
Jul 2022
IoT, IIoT and Managing Edge Data
Brian Gilmore (@BrianMGilmore, Director IoT/Emerging Technology @InfluxDB) talks about Edge and Industrial Edge Computing, as well as application and data challenges at the edge.SHOW: 634CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotwCHECK OUT OUR NEW PODCAST - "CLOUDCAST ... Show More
35m 37s
Mar 2025
150: 9 Huge LIES About Becoming a Data Analyst Nobody Talks About
In this episode, I uncover the nine biggest LIES about landing a data job. Maybe what's stopping you from pursuing a data career is just a big lie.No College Degree As A Data Analyst YT Playlist: https://www.youtube.com/playlist?list=PLo0oTKi2fPNjHi6iXT3Pu68kUmiT-xDWsDon’t Learn ... Show More
16m 46s
Jan 2025
3164: Breaking Data Silos: How Hammerspace is Powering AI Storage and Hybrid Cloud
As part of the IT Press Tour in Silicon Valley, I had the opportunity to sit down with David Flynn, CEO of Hammerspace, to explore how the company is redefining the future of enterprise data storage. At a time when AI-driven workloads and hybrid cloud computing are pushing storag ... Show More
24m 26s
Mar 2025
NVIDIA RAPIDS and Open Source ML Acceleration with Chris Deotte and Jean-Francois Puget
NVIDIA RAPIDS is an open-source suite of GPU-accelerated data science and AI libraries. It leverages CUDA and significantly enhances the performance of core Python frameworks including Polars, pandas, scikit-learn and NetworkX.
Chris Deotte is a Senior Data Scientist at NVIDIA an ... Show More
42m 6s
Jun 2025
Architecting AI-Driven Financial Systems: Innovation at the Intersection of Fintech and Emerging Tech
In this episode of the Data Science Salon Podcast, we sit down with Sasibhushan Rao Chanthati, AVP and Senior Software Engineer at T. Rowe Price, where he’s building the future of finance through intelligent, scalable technologies. Sasi specializes in creating secure digital ecos ... Show More
29m 7s
Nov 2024
DataStax and the Future of Real-Time Data Applications with Jonathan Ellis
DataStax is known for its expertise in scalable data solutions, particularly for Apache Cassandra, a leading NoSQL database. Recently, the company has focused on enhancing platform support for AI-driven applications, including vector search capabilities.
Jonathan Ellis is the Co- ... Show More
43m 24s
Sep 15
#321 Developing Financial AI Products at Experian with Vijay Mehta, EVP of Global Solutions & Analytics at Experian
Financial institutions are racing to harness the power of AI, but the path to implementation is filled with challenges. From feature engineering to model deployment, the technical complexities of AI adoption in finance require careful navigation of both technological and regulato ... Show More
49m 28s