Stop using inplace=True in Pandas!

Lots of Pandas users think that inplace=True is a great way to save memory, to make their queries more efficient, or generally to be better. But none of these is true. Moreover, inplace=True is going away in the future. So if you're using it, you should stop! In this video, I demonstrate what inplace=True means and does, what you can (should) do instead, and some of the methods that are affected.
Comments

I stopped using inplace=True once I started doing method chaining. The other bonus is that you can play with the data without changing the original dataframe; once everything looks good during EDA, you can overwrite the original.

christophertyler
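A minimal sketch of the method-chaining workflow described in the comment above. The sample data and column names are hypothetical, invented only for illustration; each method returns a new dataframe, so the original stays untouched until you choose to overwrite it.

```python
import pandas as pd

# Hypothetical sample data, not taken from the video
raw = pd.DataFrame({"Name ": ["a", "b", None], "score": [1.0, None, 3.0]})

# Each step returns a new dataframe, so the steps can be chained
cleaned = (
    raw
    .rename(columns={"Name ": "name"})   # fix the column name
    .dropna(subset=["name"])             # drop rows with no name
    .fillna({"score": 0.0})              # fill missing scores
    .reset_index(drop=True)
)

# raw still has its original columns and all three rows.
# Once the result looks right, overwrite explicitly: raw = cleaned
```

Because nothing is modified in place, any step in the chain can be commented out or reordered during exploration without having to reload the data.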

What about in a loop scenario, e.g. preparing train and test?
data = [train, test]
for df in data:
    df.drop(columns=['BLOCKID', 'SUMLEVEL', 'primary'], inplace=True)

thanks

johnbainbridge
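One possible way to handle the loop scenario above without inplace=True, offered as a sketch rather than as the video's own answer. The two small frames are hypothetical stand-ins for the commenter's train and test data; only the dropped column names come from the comment.

```python
import pandas as pd

# Hypothetical stand-ins for the commenter's train and test frames
train = pd.DataFrame({"BLOCKID": [1], "SUMLEVEL": [2], "primary": [3], "x": [4]})
test = pd.DataFrame({"BLOCKID": [5], "SUMLEVEL": [6], "primary": [7], "x": [8]})

drop_cols = ["BLOCKID", "SUMLEVEL", "primary"]

# Writing df = df.drop(...) inside a for loop only rebinds the loop variable,
# so instead build the new frames and unpack them back into the same names.
train, test = (df.drop(columns=drop_cols) for df in (train, test))
```

The unpacking makes the reassignment explicit, at the cost of listing the variable names twice.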

Awesome advice. I'm working in Pandas to subset database records. I was considering inplace=True for memory efficiency. I'll assign back to the original variable for the reasons mentioned. Thanks for the tip.

BrownStain_Silver

Can you point to where the Pandas core developers have mentioned that it does not save memory? I was under the illusion that it saves memory.

tushartiwari

Interesting, and I'll bear that in mind. But didn't you once say that assigning a dataframe back to itself (i.e. df = df.reset_index(), or some such) was problematic? I can't remember why. Don't you have to force it to make a copy or something like that? That's why I've been using inplace=True up to now.

P.S. I really enjoy these bite-sized insights.

WildRover

I think it's pretty important to lead off with the fact that the Pandas development team is working to remove "inplace" support. Then mention chaining and no performance/memory improvement.

sloanlance

Let's say you have a list of pandas dataframes, and you want to rename some of the columns of each frame using a dictionary. If you loop through the dataframes and apply the rename method, the column names will not change unless you use inplace=True.
Prove me wrong.

cybernne
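A short sketch of the list-of-dataframes case raised above, using made-up frames and a made-up rename dictionary. It illustrates that a plain loop assignment indeed leaves the list unchanged, and that rebuilding the list is one way to get the renamed frames without inplace=True.

```python
import pandas as pd

# Hypothetical frames and rename mapping, for illustration only
frames = [
    pd.DataFrame({"a": [1], "b": [2]}),
    pd.DataFrame({"a": [3], "c": [4]}),
]
mapping = {"a": "alpha", "b": "beta"}

# This loop rebinds the local name df; the frames in the list are untouched
for df in frames:
    df = df.rename(columns=mapping)
columns_after_loop = [list(df.columns) for df in frames]

# Rebuilding the list stores the renamed frames
frames = [df.rename(columns=mapping) for df in frames]
columns_after_rebuild = [list(df.columns) for df in frames]
```

Assigning by position inside the loop (frames[i] = frames[i].rename(columns=mapping), with i from enumerate) would have the same effect as the list comprehension.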

I also thought that I would save memory by using "inplace=True". I will stop using it from now on. Thanks for the hint.

franky

I had the None problem today and could not figure out why. Once I removed inplace=True, it worked.

method
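The "None problem" mentioned above is presumably the common case of assigning the result of a call that uses inplace=True. A minimal sketch with a hypothetical one-column frame:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"a": [3, 1, 2]})

# With inplace=True the method modifies df and returns None,
# so assigning the result loses the dataframe
result = df.sort_values("a", inplace=True)
print(result)  # None

# Without inplace=True the method returns a new, sorted dataframe
sorted_df = df.sort_values("a", ascending=False)
```

Writing df = df.sort_values("a", inplace=True) would therefore replace df with None, which matches the symptom of things working again once inplace=True is removed.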

Thanks. Basically it says to apply method chaining on the fly.

elu

Not the most convincing video. The memory issue aside, the rest was just filler: basic playing around that has nothing to do with why someone would use inplace in the first place, which is that they are happy with their exploratory edits and now explicitly want to change the df for good.

lade_edal

Thanks for the tip! Will implement that from now on 😊

Arne_Boeses