Unveiling Data's Hidden Shape with TDA in Python (GUDHI)

Track:
Data preparation and visualisation
Type:
Talk (long session)
Level:
intermediate
Duration:
45 minutes

Abstract

As datasets grow in size and dimensionality, traditional analysis methods often fail to uncover their underlying structure. How can we move beyond linear models to reveal the shape of the data? Topological Data Analysis (TDA) addresses this by studying features such as connected components, loops, and voids that persist across scales, yet it remains underexplored in the Python ecosystem. This session demonstrates how TDA can be made accessible to a wider audience and shows the kinds of patterns it can find that traditional methods miss.

Why is this interesting to the Python community? Python is the go-to language for data analysis, but TDA is an underused tool that offers new insights, especially for high-dimensional or complex data. This session introduces TDA and explores two popular libraries with Python interfaces, GUDHI and Ripser. Attendees will learn how these tools can uncover structure in data that other methods, such as clustering and dimensionality reduction, may overlook.

My Perspective on the Problem: As a TDA researcher, I’ve used GUDHI and Ripser to analyze large, high-dimensional datasets, such as those from the Galaxy Zoo project. These libraries revealed topological features that deepened my understanding of the data's structure. I’ll compare GUDHI and Ripser and share practical insights into how each can be applied in Python to extract meaningful topological features from your data.

What will the audience take away?
Introduction to TDA: Learn core concepts such as persistent homology and simplicial complexes, and how they reveal the shape of data.
Hands-On with GUDHI and Ripser: Discover how to compute persistent homology using GUDHI and Ripser, and how to integrate them into your workflow.
Practical Insights: Apply TDA to real-world data and uncover hidden patterns in high-dimensional datasets.
Comparing GUDHI and Ripser: Understand the strengths of both libraries and when to choose one over the other.
Applications Beyond Machine Learning: See how TDA complements clustering and dimensionality reduction, and opens up new possibilities in fields beyond machine learning.
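
To make the idea of persistent homology concrete, here is a minimal sketch of its simplest case, written from scratch with only the Python standard library. It is not the GUDHI or Ripser API, and the function name h0_persistence and the toy point cloud are illustrative assumptions. It tracks connected components (dimension 0) of a Vietoris-Rips filtration: every point starts as its own component at scale 0, and a component "dies" at the edge length that first merges it into another.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of the finite 0-dimensional bars of a Rips filtration.

    Hypothetical helper for illustration only. Each point is born at
    scale 0; one bar dies each time an edge joins two components.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length: the order in which they
    # enter the filtration as the scale grows.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge here
            deaths.append(length)    # so one bar dies at this scale
    return deaths


# Toy data (an assumption, not from the talk): two tight, distant clusters.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
deaths = h0_persistence(points)

# Every bar is born at 0; the last surviving component never dies.
bars = [(0.0, d) for d in deaths] + [(0.0, math.inf)]
for birth, death in bars:
    print(f"H0 bar: [{birth}, {death})")
```

Four bars die at a small scale (points merging within each cluster), while one bar survives until the two clusters finally connect. That long-lived bar is the topological signal that the data has two components. Libraries such as GUDHI and Ripser compute the same kind of summary, including higher-dimensional features like loops, far more efficiently.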