Uncluster Your Data Science Using Vaex • Maarten Breddels & Jovan Veljanoski • GOTO 2021

This presentation was recorded at GOTO Copenhagen 2021. #GOTOcon #GOTOcph

ABSTRACT
Would you like to build a snappy dashboard visualising hundreds of millions of data points, or interactively explore hundreds of gigabytes of data, all on a single machine?
Meet Vaex, an out-of-core DataFrame library in Python that can do all the typical data manipulations, filtering, and aggregations on a billion rows in real time and on a single computer. This approach lets your team focus much more on the business problem, as it removes the large DevOps overhead of configuring and maintaining a cluster.
Vaex fully supports Apache Arrow, which both facilitates the interoperability with other systems and enables storage and manipulation of more complex data structures like lists [...]

TIMECODES
00:00 Intro
00:50 Motivation
05:20 Vaex
06:14 Concepts: Memory mapping
07:56 Concepts: Column based storage
09:37 Concepts: No memory copies
10:50 Concepts: Compute & expression system
13:30 Demo
32:32 In production
34:23 In the wild
35:00 In production: Dash example
37:13 Summary
37:58 Outro

#Vaex #ApacheArrow #DataScience #AI #ML #ArtificialIntelligence #MachineLearning #DataFrame #Programming #VaexIO #Astronomy

Comments

Extremely proud of these guys. I have been doing analytics with Vaex for almost 3 years, and they are actively pushing updates, fixing bugs, adding new features, and improving Vaex. I have sometimes thought that one day Vaex will be as powerful as Spark. They definitely deserve funding!

kleyersosa

Wow! Thanks! Super awesome talk and tool! Will research it and probably use it!

TaranovskiAlex