Accelerating privacy-enhancing data processing

Track:: Data Engineering and MLOps (2025)
Type:: Talk (long session)
Level:: intermediate
Room:: North Hall
Start:: 10:45 on Wednesday, 16 July 2025
End:: 11:30 on Wednesday, 16 July 2025
Duration:: 45 minutes

Abstract

Our mission is simple but profound: to improve and extend lives by learning from the experience of every person with cancer. Achieving this requires seamless feedback loops between scientists, engineers, and clinicians working with sensitive data across heterogeneous environments.

You might expect a story about how we tried and failed. And yes, we’ll share some failures and surprises along the way. But this is, at its core, a success story - because it works.

In this talk, we’ll dive into the data architecture and core technologies that enable us to learn from every patient’s journey with cancer. We’ll reveal how our cross-functional teams - scientists, engineers, and clinicians - collaborate to transform raw data into research-grade datasets. Along the way, we’ll share the challenges we faced, the lessons we learned, and the tools we developed to ensure the high data quality required for groundbreaking research.

We’ll also show how we combine domain knowledge, data science, and established technologies to build tools that maintain this quality and accelerate feedback loops. By shifting insight generation left, that is, earlier in the pipeline, we empower teams to iterate faster and drive more impactful outcomes.

This talk is ideal for anyone with data warehouse or lakehouse experience who is curious about using the Pythonic data stack to iteratively generate insights from sensitive data in highly regulated environments.
