Python pandas — Chipotle Exercises

“There should be one—and preferably only one—obvious way to do it,” — Zen of Python. I certainly wish that were the case with pandas. In reading the docs it feels like there are a thousand ways to do each operation. And it is hard to tell if they do the exact same thing or which one you should use. That's why I made An Opinionated Guide to pandas—to present you one consistent (and a bit opinionated) way of doing data science with pandas and cut out all the confusion and cruft.

In this video I work through the examples—cold! So, this should be entertaining for you. 

I'll talk about which methods I use, why I use them and most importantly tell you the stuff that I've never touched in my years of data science practice. If this sounds helpful to you then please watch and provide feedback in your comments.

This series is beginner-friendly but aimed most directly at intermediate users.

“Getting and Knowing - Chipotle” contents:
Exercise

13:55. Step 16. Avg revenue/order. The two approaches above change the result, but obviously avg revenue/order = revenue/count_unique_orders.

Helpful links:
An Opinionated Guide to pandas – Intro to Data Structures P1:
An Opinionated Guide to pandas – Intro to Data Structures P2:
An Opinionated Guide to pandas – Intro to Data Structures P3:
Link to GitHub repo including environment setup for tutorials:
Link to GitHub Intro To Data Structures Jupyter Notebook:
PEP 20 – The Zen of Python link:
Comments

Some corrections/modifications to the answers above!

13:31. Step 14. This answer might be incorrect. Summing item_price fails to account for quantity. Perhaps a better solution was: revenue = (chipo.item_price * chipo.quantity).sum()
13:44. Step 15. Quantity of orders. This is an ambiguous question. Perhaps a better solution was: count_unique_orders = chipo.order_id.nunique()
13:55. Step 16. Avg revenue/order. The two approaches above change the result, but obviously avg revenue/order = revenue/count_unique_orders.
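
The three corrections above can be tried together on a small frame. This is only a sketch: the rows below are made-up stand-ins for the Chipotle data, and it assumes item_price has already been converted to a float and holds a per-unit price, as the corrections imply.

```python
import pandas as pd

# Made-up stand-in for the Chipotle data (illustrative values only).
chipo = pd.DataFrame({
    "order_id":   [1, 1, 2, 3, 3],
    "quantity":   [1, 2, 1, 1, 3],
    "item_price": [2.50, 3.00, 9.25, 8.75, 1.25],  # assumed numeric, per unit
})

# Step 14: revenue, weighting each row's price by its quantity.
revenue = (chipo.item_price * chipo.quantity).sum()

# Step 15: orders counted as distinct order_id values, not as rows.
count_unique_orders = chipo.order_id.nunique()

# Step 16: average revenue per order.
avg_revenue_per_order = revenue / count_unique_orders

print(revenue, count_unique_orders, avg_revenue_per_order)
```

Counting rows instead of distinct order_id values would give 5 orders here rather than 3, which is why the choice made in steps 14 and 15 changes the step 16 result.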

DataTalks

That's the best thing one can find on the whole internet regarding pandas practice, superb

sugammehta

1. chipo['item_price'] =
2. chipo['item_price'] = x: float(x.strip('$')))
3. chipo['item_price'] =

The first one runs faster but the second one may be more readable.
The third one is sort of a combination of the first and second ones.
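
The code in the three numbered lines above did not survive intact, so the following is only a guess at the kind of alternatives being compared, not the commenter's exact code. It shows three common ways to turn price strings such as "$2.39" into floats; the sample values are made up, and the second form is built around the visible fragment `x: float(x.strip('$'))`.

```python
import pandas as pd

# Made-up price strings in the "$x.xx" shape (illustrative only).
prices = pd.Series(["$2.39", "$3.39", "$16.98"])

# Vectorized string methods: remove the dollar sign, then cast the column.
vectorized = prices.str.replace("$", "", regex=False).astype(float)

# Element-wise lambda, matching the fragment visible in the comment.
with_lambda = prices.apply(lambda x: float(x.strip("$")))

# Vectorized strip followed by a cast.
combined = prices.str.strip("$").astype(float)

print(vectorized.tolist())
```

All three produce the same float values; which one is preferable for speed or readability is the commenter's judgment above.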

jimmymesa

Good tutorial. However, the revenue is incorrectly calculated as the price should be multiplied by the quantity. Also, the number of orders should be revised as there are many items in the same order, so the number of rows in the dataset is higher than the number of orders

garciarogerio

Dang!! You just became my favorite teacher!!!

santiagopr

The methods idxmax() and max() can be used to obtain the max values in steps 9 and 10:


item_quants['quantity'].max()
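
A minimal sketch of how idxmax() and max() fit together for steps 9 and 10. The name item_quants comes from the comment, but how it was built in the video is an assumption here (total quantity per item), and the rows are made up.

```python
import pandas as pd

# Made-up rows (illustrative only).
chipo = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips", "Chicken Bowl", "Canned Soda"],
    "quantity":  [2, 1, 1, 2],
})

# Assumed construction: total quantity ordered per item.
item_quants = chipo.groupby("item_name")[["quantity"]].sum()

# idxmax() returns the index label (the item name) of the largest total.
most_ordered_item = item_quants["quantity"].idxmax()

# max() returns the largest total itself.
most_ordered_qty = item_quants["quantity"].max()

print(most_ordered_item, most_ordered_qty)
```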

jimmymesa

Hi, wondering why no one else commented about Step 14; revenue is not just the sum of item_price but the sum of item_price * quantity, which is then $39237.02 :-)

brotherlui

a little rusty haha

Those Python techniques with slicing in the float part are amazing, a truly Pythonic way to solve things.

Levy

At step 3, chipo = pd.read_table(url) is possible, too.
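
To illustrate the point: pd.read_table uses a tab as its default separator, so on tab-separated data it behaves like pd.read_csv with sep="\t". The sketch below uses a small in-memory string instead of the exercise URL, and the rows are made up.

```python
import io
import pandas as pd

# Made-up tab-separated text standing in for the file behind the exercise URL.
tsv_text = "order_id\tquantity\titem_name\n1\t1\tChips\n1\t2\tIzze\n"

# read_table defaults to a tab separator.
via_read_table = pd.read_table(io.StringIO(tsv_text))

# read_csv needs the separator spelled out for the same data.
via_read_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(via_read_table.equals(via_read_csv))
```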

jimmymesa

Blind speed-runs of exercises like this are fantastic. Great content, very helpful!

JonnyD

why are you so cute! really appreciate your updates especially the one about deep learning. Thank you so much! looking forward to your updates often

liyingfeng

We come here to learn, not to run a marathon, so there's no need to hurry, bro: go slow, go correct. Very nice job, keep it up.

Nachiketa_TheCutiePie

Thank you so much for those videos!
I have learned a lot

evyatarse

For step 16, you're simply computing the overall average price across ALL orders. The question is PER order, so I believe it should be:

'mean'})
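
The commenter's code is cut off above, so this is not a reconstruction of it. As a sketch of one way to get a per-order average, the example below first totals revenue within each order and then averages those totals. The rows are made up, the revenue column is a hypothetical helper, and item_price is assumed to be a numeric per-unit price.

```python
import pandas as pd

# Made-up stand-in for the Chipotle data (illustrative values only).
chipo = pd.DataFrame({
    "order_id":   [1, 1, 2, 3, 3],
    "quantity":   [1, 2, 1, 1, 3],
    "item_price": [2.50, 3.00, 9.25, 8.75, 1.25],  # assumed numeric, per unit
})

# Hypothetical helper column: revenue contributed by each row.
chipo["revenue"] = chipo["item_price"] * chipo["quantity"]

# Total revenue inside each order, then the mean of those order totals.
order_totals = chipo.groupby("order_id")["revenue"].sum()
avg_per_order = order_totals.mean()

print(order_totals.tolist(), avg_per_order)
```

The mean of the per-order totals is the same number as total revenue divided by the count of unique orders, so both routes agree.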

Lion-wmmf

This may sound a bit naive but how is he opening the repository in his notebook????

zaidazim

Hi, I did exactly the same, but the URL cannot be opened. What should I do? Thanks.

Lironoosh

I got the same answer with one line of code at 6:52.

koorahd

Hi Data Talks,
For the 9th question (which was the most-ordered item?),
why is it not like this:
chipo.item_name.max()? What's wrong with this? I mean, what's the difference?
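
A short sketch of the difference, on made-up rows: calling max() on a column of strings returns the value that sorts last alphabetically, which says nothing about how often an item was ordered. Summing quantity per item and taking idxmax() is one way to get the most-ordered item.

```python
import pandas as pd

# Made-up rows (illustrative only).
chipo = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Veggie Salad", "Chicken Bowl", "Chips"],
    "quantity":  [1, 1, 2, 1],
})

# max() on strings compares them lexicographically.
alphabetical_last = chipo.item_name.max()

# Most-ordered item: total quantity per item, then the label of the largest total.
most_ordered = chipo.groupby("item_name")["quantity"].sum().idxmax()

print(alphabetical_last, most_ordered)
```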

janakiyeluripati

For me, the easiest way to complete steps 9 and 10 is to use the value_counts() function:

To filter the data after value_counts(), we can use head(1) or nlargest(1).
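
The commenter's code is missing above; the approach presumably refers to pandas' value_counts(). A minimal sketch on made-up rows follows. Note that value_counts() counts rows per item, so it matches the quantity-based answer only when that is the intended reading of the question.

```python
import pandas as pd

# Made-up rows (illustrative only).
chipo = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips", "Chicken Bowl", "Canned Soda", "Chicken Bowl"],
    "quantity":  [1, 1, 1, 2, 1],
})

# Number of rows per item, sorted from most to least frequent.
counts = chipo["item_name"].value_counts()

# Two ways to keep only the top entry.
top_by_head = counts.head(1)
top_by_nlargest = counts.nlargest(1)

print(top_by_head.index[0], int(top_by_head.iloc[0]))
```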

bartomiejonak

6:28 How do I look up a function's description/usage/signature in Kaggle notebooks, as shown here?

rashmimahadevaiah