Jay Chia & Sammy Sidhu: Daft - The Distributed Python Dataframe for Complex Data

Daft is an open-source distributed dataframe library built for "Complex Data": data that doesn't usually fit in a SQL table, such as images, videos, and documents.
Experiment Locally, Scale Up in the Cloud
Daft grows with you. It is built to run just as efficiently and seamlessly in a notebook on your laptop as on a Ray cluster of thousands of GPU-equipped machines.
Pythonic
Daft lets you build tables that hold any Python object, such as images, audio, documents, or genomic files. This makes it easy to process your Complex Data alongside your regular tabular data. Daft is dynamically typed and built for fast iteration, experimentation, and productionization.
Blazing Fast
Daft is built for distributed computing and fully utilizes all of your machine's or cluster's resources. It uses modern technologies such as Apache Arrow, Parquet, and Iceberg to optimize data serialization and transport.
PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
00:00 Welcome!