Group By - Pandas

“There should be one-- and preferably only one --obvious way to do it.” (Zen of Python). I certainly wish that were the case with pandas. Reading the docs, it feels like there are a thousand ways to do each operation, and it is hard to tell whether they do exactly the same thing or which one you should use. That's why I made An Opinionated Guide to pandas: to present one consistent (and a bit opinionated) way of doing data science with pandas and to cut out all the confusion and cruft.

I'll talk about which methods I use, why I use them, and, most importantly, which parts of pandas I've never touched in my years of data science practice. If this sounds helpful to you, please watch and leave feedback in the comments.

This series is beginner-friendly but aimed most directly at intermediate users.

“Opinionated Guide – Group Operations” contents:

Helpful links:

Comments

Lecture notes - Group By
1. Groupby: specify the column(s) to group on -> df_gb = df.groupby(['aa', 'bb'])
2. Agg: df_gb.agg({dictionary}) -> the result has a MultiIndex on both rows and columns. To flatten it:
2-1. MultiIndex (rows): use df_agg.reset_index()
2-2. MultiIndex (columns): join each column tuple into a single name:
df_agg.columns = ['__'.join(col).strip() for col in df_agg.columns.values]
3. filter, transform
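
The steps in these notes can be sketched end to end as follows. This is only an illustration under assumptions: the toy data, the value columns 'x' and 'y', and the chosen aggregations are made up here, and only 'aa', 'bb', df_gb and df_agg come from the notes. The filter and transform calls at the end are one plausible reading of step 3, not necessarily what the video shows.

```python
import pandas as pd

# Hypothetical toy data: 'aa' and 'bb' are the grouping columns from the notes,
# 'x' and 'y' are invented value columns for illustration.
df = pd.DataFrame({
    "aa": ["a", "a", "b", "b"],
    "bb": ["p", "p", "q", "r"],
    "x": [1, 2, 3, 4],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# 1. Group by one or more columns.
df_gb = df.groupby(["aa", "bb"])

# 2. Aggregate with a dictionary mapping column -> list of aggregations.
#    The result has a MultiIndex on the rows (aa, bb) and on the columns,
#    e.g. ('x', 'sum'), ('x', 'mean'), ('y', 'max').
df_agg = df_gb.agg({"x": ["sum", "mean"], "y": ["max"]})

# 2-2. Flatten the column MultiIndex by joining each tuple with '__'.
df_agg.columns = ["__".join(col).strip() for col in df_agg.columns.values]

# 2-1. Turn the row MultiIndex back into ordinary columns.
df_agg = df_agg.reset_index()

# 3. filter keeps or drops whole groups; transform returns one value per
#    original row, aligned with the original index.
big = df.groupby("aa").filter(lambda g: g["x"].sum() > 3)
df["x_group_mean"] = df.groupby("aa")["x"].transform("mean")

print(df_agg)
print(big)
print(df)
```

After flattening, df_agg is an ordinary table with columns named aa, bb, x__sum, x__mean and y__max, which is usually easier to merge or export than the MultiIndex version.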

arhataria

Great video. I am an experienced SQL/ETL engineer working on my (weak) pandas game. I ran into this exact MultiIndex column stuff yesterday and it made my head hurt for a few minutes... funny that I would stumble on this video today.

jasonclement

Thank you very much, the way he explains it is very clear. I'm going to look for the tutorial on working with MultiIndex columns.

futuromejor

I use .apply() and I find it very slow.

rchuso