So You Wanna Be a Pandas Expert? (Tutorial) - James Powell | PyData Global 2021

So You Wanna Be a Pandas Expert? | (Pre-recorded Tutorial)
Speaker: James Powell

So… you want to be a Pandas expert.

What’s it going to take? Should you memorize the Pandas API? Should you read through the source code, line-by-line, file-by-file? Should you try to write your own Pandas from scratch? Or could it be much simpler than that? Could there be an idea, a small, tiny idea, sitting there in plain sight; an idea so obvious that you may have overlooked it; yet an idea that unlocks the complexity of the tool?

In this talk, we’ll discuss what Pandas really is, how it distinguishes itself from NumPy, and what the index and index alignment are all about. Through the lens of index alignment, we will see how to unlock the power of Pandas, how to understand the vagaries of the API, and how to create a simple framework for understanding the many details of the tool. We will also motivate how deliberate, fluent use of indexing is the one thing sitting between you and effective use of Pandas for everyday tasks.

James Powell's Bio
James serves as lead instructor for Don't Use This Code. Don't Use This Code provides consulting, coaching, and training services to a number of clients in the financial services and tech industry, helping them develop greater expertise in the use of Python for data analysis, computational simulation, and automation.

PyData Global 2021

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.

Comments

I saw James Powell, I clicked. You always learn something from this guy, at least it refreshes your knowledge.

arnoldwolfstein

please give a tutorial on how to understand this tutorial

sahilsharma-hjgq

Had it playing in the background, and with a complete straight face at 16:00 he accelerates like a bat out of hell, figuratively leaving me in a dust cloud (switching to action replay to eventually catch up). Hilarious and epic! 🤩

cjsveningsson

Now that is a true expert. A breath of fresh air compared to the countless others who only stay at intro level.

nccamsc

James Powell has this uncanny capacity to make me feel like a dummy every time I see his presentations.

Half of it just goes right over my head

DuarteMolha

I watched this video more than 3 times over, every minute of it is valuable :)

grigorytrofimov

Two hours worth watching. Twice. The tutorial I needed to finally "get" Pandas.

NormanBaatz

1:13:30 what is a series and a dataframe

morenoh

Inspiring! Took me two days to watch this, but I've learned so much. Didn't come here to learn about the path and subprocess modules, but I had to learn how you created the dataframe with file paths and line numbers for the pandas module!

stiankarlsen

anyone getting stressed about the tea getting cold?

lbermude

Epic presentation!! Looking forward to the follow up, thanks James!

longtailfinancial

Thank god for the playback speed = 0.75 option!
First time it was really useful.

djchrisi

Does anyone have the code / PDF of this session?

modakad

Can you explain your IDE? Is that Vim+TMUX?

longtailfinancial

How do you execute the code just by navigating down under a multi-line comment?

indrabhushansingh

With data frames and series it works OK, but one has to remember that the operations respect indexes. Yes, indexes are the key to efficient manipulation of data frames and series. It makes perfect sense once you understand this. We do want this to happen in data analysis: aligning data frames by indexes AND COLUMN NAMES. This makes operations predictable without relying on the order of rows/columns. When there are duplicates in indexes, joining takes place between corresponding values, and this is why they get cross-joined. It makes perfect sense from the point of view of data analysis. SQL joins work exactly the same. And indexes, even though pandas allows duplicates, should not have duplicates. There are functions and safety valves in pandas that can notify the user automatically that indexes are not unique. Pandas is a very well thought-out and well-designed API, but one has to dedicate some time to understanding the underlying principles. As usual :)
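A minimal sketch of the alignment behaviour described above, not taken from the talk itself. All data and variable names here are made up for illustration, and the two uniqueness checks shown (`Index.is_unique` and `verify_integrity=True` in `pd.concat`) are just two of the possible "safety valves".

```python
import pandas as pd

# Two Series with the same labels in a different order.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])

# Addition matches values by label, not by position.
aligned_sum = a + b  # x: 1 + 30, y: 2 + 20, z: 3 + 10

# DataFrames align on both the row index and the column names.
df1 = pd.DataFrame({"p": [1, 2], "q": [3, 4]}, index=["r1", "r2"])
df2 = pd.DataFrame({"q": [10, 20], "p": [30, 40]}, index=["r2", "r1"])
frame_sum = df1 + df2

# With duplicate labels (and indexes that are not identical), every "k" on
# the left is paired with every "k" on the right, much like a SQL join.
dup_left = pd.Series([1, 2], index=["k", "k"])
dup_right = pd.Series([10, 20, 30], index=["k", "k", "m"])
dup_sum = dup_left + dup_right  # four "k" rows plus one NaN row for "m"

# Two ways to detect duplicate labels before they cause surprises.
has_unique_index = dup_left.index.is_unique
try:
    pd.concat([a, a], verify_integrity=True)
    concat_rejected = False
except ValueError:
    concat_rejected = True
```

Note that when two indexes are exactly identical, pandas has nothing to align and combines the values position by position, even if the labels repeat.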

There are many functions in pandas, like, say, 'transform', 'apply', 'agg', because they are supposed to do DIFFERENT things. For instance, 'transform' will always return the result of the computation with the same shape as the input data (which is very, very useful in so many situations), whereas 'apply' is free to return ANYTHING. There are very good reasons, as I've already pointed out, to have these specialized functions do different things instead of having just one giant function that takes a million arguments.
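To make the distinction concrete, here is a small sketch on a grouped column. The `team`/`score` data is invented for illustration, and the `nlargest` call is only one example of what a function passed to `apply` might return.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1, 3, 10, 20, 30],
})
grouped = df.groupby("team")["score"]

# agg reduces each group to a single value: one row per team.
per_group = grouped.agg("mean")

# transform broadcasts the group result back onto the original rows,
# so the output carries the same index as df.
per_row = grouped.transform("mean")

# Because the index matches, the result lines up directly with the input.
demeaned = df["score"] - per_row

# apply returns whatever the function returns; here, the top score per
# team, labelled by both the team and the original row label.
top = grouped.apply(lambda s: s.nlargest(1))
```

Because `transform` keeps the input's index, its result can be assigned straight back as a new column, which is what makes group-wise normalisation a one-liner.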

Another thing you're missing, James, is that pandas allows for the fluent style of programming. Without these functions and functionalities, you'd be stuck with very ugly code and you'd litter it with variables, myriads of variables... This is what essentially happens with your numpy code :)
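As a rough illustration of the fluent style mentioned here, the sketch below chains several steps without naming any intermediate result. The city and temperature data are hypothetical and the pipeline is only one way to write it.

```python
import pandas as pd

raw = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen", "Bergen"],
    "temp_c": [1.0, 3.0, 5.0, 7.0],
})

# Each method returns a new object, so the steps read top to bottom
# and no temporary variables are needed.
summary = (
    raw
    .assign(temp_f=lambda d: d["temp_c"] * 9 / 5 + 32)  # add a derived column
    .query("temp_c > 1")                                # keep rows above 1 °C
    .groupby("city")["temp_f"]
    .mean()                                             # average per city
    .sort_values(ascending=False)                       # warmest city first
)
```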

dariuszspiewak

How can someone be such a great programmer AND stylish presenter at the same time?

vladimirkraus

Dear James, thank you for the lesson. It would be even greater if you could speak a bit more slowly, as many people here are from an international audience.

AkaExcel

The only thing wrong with this presentation is him not using regular Jupyter notebooks. Like, I get people thinking they need to use vim for fking anything. I tried to use it myself for anything interactive with Python, but all the options suck in the end. I love vim for anything webdev or developing Python, but not for interactive stuff. Like, I don't even care what he uses in his free time, but when you make educational content and there is a tool which more people are familiar with and which might be easier for the viewer to understand, then maybe don't use vim.

Don't get me wrong, I love all of his videos and have watched many of them multiple times; the educational value is gigantic. But in this one maybe just use Jupyter :/

jeffrey

Dude writes python code like an R user

BingbongRecto