QuackOSM & OvertureMaestro: Open geospatial data at your fingertips
- Track: Data Engineering and MLOps
- Type: Talk
- Level: intermediate
- Duration: 30 minutes
Abstract
Have you ever wanted to process big geospatial data from OpenStreetMap or the Overture Maps Foundation? What if I told you that you don't need an entire technology stack to retrieve country-sized data, or an eternity to process it?
With QuackOSM and OvertureMaestro, you can easily work with whole-country vector and tags data without installing any additional dependencies. Come and find out how you can use them in your next project!
QuackOSM is a powerful and user-friendly library that streamlines accessing and manipulating OpenStreetMap (OSM) vector and tags data. It uses the DuckDB engine with its Spatial extension, together with the PyArrow library, to let users efficiently retrieve large-scale OSM data in the GeoParquet format.
It is similar in functionality to other available libraries, but it is faster, can work with larger-than-memory datasets, and doesn't require any additional dependencies.
OvertureMaestro is a twin library for accessing Overture Maps data through a very similar API. On top of that, it uses multiprocessing and a special index to reduce the amount of data that has to be retrieved.
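To make the idea of an index that limits retrieval more concrete, here is a minimal sketch of bounding-box pruning: each chunk of a dataset is described by a bounding box, and only chunks intersecting the query area are fetched. This illustrates the general technique only; it is not OvertureMaestro's actual index, and `BBox`, `INDEX`, and `chunks_to_fetch` are hypothetical names with made-up coordinates.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in (lon, lat) order."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes are disjoint if one lies entirely to one side of the other.
        return not (
            self.xmax < other.xmin
            or other.xmax < self.xmin
            or self.ymax < other.ymin
            or other.ymax < self.ymin
        )


# Hypothetical index: one bounding box per chunk of a remote dataset.
INDEX: Dict[str, BBox] = {
    "chunk-0": BBox(-10.0, 35.0, 5.0, 45.0),
    "chunk-1": BBox(5.0, 45.0, 15.0, 55.0),
    "chunk-2": BBox(14.0, 49.0, 24.5, 55.0),
    "chunk-3": BBox(100.0, -10.0, 120.0, 5.0),
}


def chunks_to_fetch(index: Dict[str, BBox], query: BBox) -> List[str]:
    """Return the names of chunks whose bounding box intersects the query."""
    return sorted(name for name, bbox in index.items() if bbox.intersects(query))


# A small query area: only one of the four chunks needs to be downloaded.
query = BBox(16.8, 51.0, 17.2, 51.2)
selected = chunks_to_fetch(INDEX, query)
print(selected)  # ['chunk-2']
```

The point of such an index is that the filtering happens on small metadata before any geometry is downloaded, so the cost of a query scales with the area of interest rather than with the size of the whole dataset.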
Besides the API, both libraries include a CLI, so they can be used in data engineering tasks, for example when loading data into cloud databases.
This talk will dive deep into the inner workings of the QuackOSM and OvertureMaestro libraries. It will also cover how DuckDB and PyArrow work with the GeoParquet file format.
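As background for the GeoParquet part, the sketch below shows the two building blocks the format relies on: geometries stored as WKB (well-known binary) values in an ordinary binary column, and a JSON document stored under the `geo` key of the Parquet file metadata. It uses only the standard library and does not write an actual Parquet file. The helper names and the sample coordinates are assumptions made for illustration, and real files produced by the libraries may carry additional optional fields.

```python
import json
import struct


def point_to_wkb(lon: float, lat: float) -> bytes:
    """Encode a 2D point as little-endian WKB."""
    # Layout: byte-order flag (1 = little endian), geometry type (1 = Point),
    # then the x and y coordinates as 64-bit floats.
    return struct.pack("<BIdd", 1, 1, lon, lat)


def wkb_to_point(data: bytes):
    """Decode a little-endian WKB point back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian WKB points are supported here")
    return x, y


# Value that would be stored in the geometry column for one row.
wkb = point_to_wkb(17.03, 51.11)

# Minimal "geo" metadata with the fields required by GeoParquet 1.0.0.
geo_metadata = {
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Point"],
        }
    },
}

# Parquet key-value metadata is stored as bytes, so the JSON is serialized.
file_metadata = {b"geo": json.dumps(geo_metadata).encode("utf-8")}

print(len(wkb), wkb_to_point(wkb))
print(file_metadata[b"geo"].decode("utf-8"))
```

Because the geometry column is plain binary and the spatial information lives in file metadata, any Parquet-capable engine can read such a file as a regular table, while spatially aware tools use the `geo` key to interpret the geometry column.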