Apache Spark Announcement | Single node data science meets big data | Keynote Data + AI Summit 2021

Reynold, Co-founder and Chief Architect at Databricks, will talk about the latest in Apache Spark and Koalas and how data scientists can move beyond the laptop. Brooke, a data scientist and machine learning expert, will then demonstrate and discuss some of the new features in Apache Spark as the community improves usability for Python and data science.

Stay tuned for some big news!

Register for free to see the rest of the keynotes and exciting announcements live, plus more than 200 sessions. Learn from the creators and top contributors of technologies like PyTorch, TensorFlow, MLflow, Delta Lake, Apache Spark, Hugging Face, DBT and more.

Comments

Based on the title of this talk, I expected an improved single-node install experience. Still not pip installable, nor installable via any OS package managers like apt/brew/chocolatey? And nothing to ease the burden of enabling AWS/Azure/GCP storage? If these were announced elsewhere, please lmk. The workspace install experience is still very painful IMHO, and I still struggle to find turnkey, full-featured Docker images designed for local workstation use cases.

aaronsteers

I dislike the fact that Databricks gives Spark for Scala less consideration than Spark for Python.

The main issue I see is that while writing code in either language is equally flexible and easy on Databricks, building an end-to-end application is straightforward in Python but not in Scala. With Scala, you need to manually download the script, convert it to a JAR file, upload it, and only then can it run. And you repeat the cycle for every small change you make.

This is the kind of friction that I think is driving developers to switch to Python.

On the other hand, SQL, I agree, is really good, but it is often written inside a main application in Python/Scala, where the friction above tips the choice toward Python.

.Counting

Glad to see pandas being implemented into PySpark. I liked Koalas, but yeah, this will be better.

twinkazz