Design a Distributed Geospatial Data Platform | System Design


In this video, we discuss a high-level design of a geospatial data aggregation platform. This system would be responsible for ingesting multiple formats of data from a variety of sources, aggregating and cleaning the data, and providing a performant and convenient dashboard to interact with the processed dataset.
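As a rough illustration of the ingest, clean, and aggregate stages described above, here is a minimal single-node sketch using only the Python standard library. It is not the design from the video: the record fields (`lat`, `lon`, `value`), the function names, and the one-degree grid used for aggregation are all assumptions made for this example.

```python
import csv
import io
import json
import math
from collections import defaultdict


def parse_csv(text):
    """Read rows with the (hypothetical) columns lat, lon, value."""
    return [
        {"lat": float(r["lat"]), "lon": float(r["lon"]), "value": float(r["value"])}
        for r in csv.DictReader(io.StringIO(text))
    ]


def parse_geojson(text):
    """Read Point features; GeoJSON stores coordinates as [lon, lat]."""
    records = []
    for feature in json.loads(text)["features"]:
        lon, lat = feature["geometry"]["coordinates"][:2]
        records.append(
            {"lat": lat, "lon": lon, "value": feature["properties"]["value"]}
        )
    return records


def clean(records):
    """Drop records with out-of-range coordinates and exact duplicates."""
    seen, kept = set(), []
    for r in records:
        key = (r["lat"], r["lon"], r["value"])
        in_range = -90 <= r["lat"] <= 90 and -180 <= r["lon"] <= 180
        if in_range and key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def aggregate(records, cell_deg=1.0):
    """Average the values that fall into each cell_deg x cell_deg grid cell."""
    cells = defaultdict(list)
    for r in records:
        cell = (math.floor(r["lat"] / cell_deg), math.floor(r["lon"] / cell_deg))
        cells[cell].append(r["value"])
    return {cell: sum(v) / len(v) for cell, v in cells.items()}


# Two sources in different formats; the CSV row with lat=95.0 is invalid.
csv_text = "lat,lon,value\n10.2,20.5,4.0\n10.7,20.1,6.0\n95.0,0.0,1.0\n"
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [20.9, 10.4]},
        "properties": {"value": 8.0},
    }],
})

records = clean(parse_csv(csv_text) + parse_geojson(geojson_text))
summary = aggregate(records)
print(summary)  # {(10, 20): 6.0}
```

Each stage is a plain function over a list of records, so in principle the same steps could later be handed to a distributed scheduler one input file at a time; that mapping is left out of this sketch.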

Table of Contents:
0:00 - Introduction
0:35 - Requirements
2:12 - Data Processing (Single-Node)
3:22 - Data Processing (Distributed)
4:14 - Workflow Orchestration
4:58 - Data API
5:30 - Caching
6:12 - Conclusion

Comments

Thanks for the video! It would be great to also see how you would write it in a real application.

joseavellaneda

I would like to point out that there are database extensions for GIS data, such as PostGIS for PostgreSQL, so you could in fact query a database. Other databases also have extensions or native features.

rankala

Very good and clear video, thank you very much.
I am currently building an internal data platform, and I was going to use Prefect on a VM, but after seeing your video I believe the best way to go would be: Prefect + Dask Scheduler + Dask Worker on Azure Kubernetes Service. Does that make sense to you? Then I could benefit from autoscaling of the workers.
Thanks again!

pmshadow

This made me wonder whether systems like Hadoop and MapReduce are still used/built.

pieter

I did something similar, but on a very large scale, at PayPal.

yashpandey

Hi, what exactly is this subject? Is it data science?

ocean