Calculate the mean, median, and mode in pandas Python

In this video we go over how to calculate the measures of central tendency (i.e., mean, median, and mode) for an entire DataFrame and a Series.
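As a rough illustration of what the video covers, the sketch below computes all three measures for a whole DataFrame and for a single column. The sample data and the variable names are made up for this example; the video's own dataset is not shown in the description.

```python
import pandas as pd

# Hypothetical sample data (not the dataset used in the video).
df = pd.DataFrame({
    "col_1": [1, 2, 2, 3, 10],
    "col_2": [4.0, 4.0, 5.0, 6.0, 6.0],
})

# Whole DataFrame: each call works column by column.
df_mean = df.mean()      # Series indexed by column name
df_median = df.median()  # Series indexed by column name
df_mode = df.mode()      # DataFrame: one row per tied mode, padded with NaN

# Single column (a Series).
col_mean = df["col_1"].mean()
col_median = df["col_1"].median()
col_mode = df["col_1"].mode()  # a Series, because several values can tie
```

Note that `mode()` returns a Series or DataFrame rather than a scalar, since a column can have more than one most frequent value (here `col_2` has two).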

We also discuss some optimization techniques to improve the speed of your calculation.

Did you find this video helpful? Consider subscribing for weekly tips, tricks, and tutorials.

Join my Discord server for data analysts, data scientists, and those who aspire to be.

0:00 Intro
0:17 Data
0:39 Mean for DF
1:00 Median for DF
1:07 Mode for DF
1:24 For Single Column
2:00 Optimization
Comments

Thank you for your video, which was very helpful for me. While processing my data, I found a special case when computing the median. When the column dtype is "object", np.median(df_col.values) is slower than df_col.median(). If possible, it is best to convert the data from object to float or int before computing the median.
Here is the code:
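import numpy as np
import pandas as pd
# One million random integers, then stored as Python objects
da1 = pd.DataFrame({"col_1": np.random.randint(0, 10, 10**6)})
da1["col_1"] = da1["col_1"].astype(object)
%timeit da1["col_1"].median()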
%timeit
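The conversion suggested in this comment could look something like the sketch below. It assumes every value in the column is actually numeric; `pd.to_numeric` is one way to do the conversion (`astype(float)` or `astype(int)` would be another), and the smaller row count and the names `numeric_col`, `median_pandas`, and `median_numpy` are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the comment, but smaller so it runs quickly.
rng = np.random.default_rng(0)
da1 = pd.DataFrame({"col_1": rng.integers(0, 10, 10**4)})
da1["col_1"] = da1["col_1"].astype(object)

# Convert the object column to a numeric dtype once, up front.
# With the default errors="raise", a non-numeric value fails loudly.
numeric_col = pd.to_numeric(da1["col_1"])

# Both approaches now operate on a native numeric array.
median_pandas = numeric_col.median()
median_numpy = np.median(numeric_col.to_numpy())
```

Timing each call with `%timeit` before and after the conversion is the simplest way to see whether it helps on a given machine.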

lyguoguo
Автор

Hi, thanks for your video. By the way, I ran this code:
%timeit df['col1'].median()
%timeit np.median(df['col1'].values)
and got this output:
69.9 µs ± 3.22 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
112 µs ± 4.77 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
So the first call is faster than the second one. Can you clarify? TIA.
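Which call wins can depend on the number of rows, the dtype, and the installed pandas and NumPy versions, so it may be worth measuring at several sizes rather than one. The following is a minimal sketch of such a comparison using the standard `timeit` module; the helper name `compare_median_timings`, the row counts, and the repeat settings are arbitrary choices for this example, and it makes no claim about which approach will be faster on your machine.

```python
import timeit

import numpy as np
import pandas as pd


def compare_median_timings(n_rows, repeats=3, number=10):
    """Return the best per-call time in seconds for (pandas, numpy)."""
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"col1": rng.integers(0, 10, n_rows)})
    col = df["col1"]

    # Take the minimum over several repeats to reduce timing noise.
    pandas_best = min(
        timeit.repeat(col.median, repeat=repeats, number=number)
    ) / number
    numpy_best = min(
        timeit.repeat(lambda: np.median(col.to_numpy()),
                      repeat=repeats, number=number)
    ) / number
    return pandas_best, numpy_best


if __name__ == "__main__":
    for n_rows in (10**2, 10**4, 10**6):
        pandas_s, numpy_s = compare_median_timings(n_rows)
        print(f"{n_rows:>9,} rows  pandas {pandas_s:.2e} s  numpy {numpy_s:.2e} s")
```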

abdullahfaizal