PySpark Tutorial

Learn PySpark, an interface for Apache Spark in Python. PySpark is often used for large-scale data processing and machine learning.

⌨️ (0:00:10) Pyspark Introduction
⌨️ (0:15:25) Pyspark Dataframe Part 1
⌨️ (0:31:35) Pyspark Handling Missing Values
⌨️ (0:45:19) Pyspark Dataframe Part 2
⌨️ (0:52:44) Pyspark Groupby And Aggregate Functions
⌨️ (1:02:58) Pyspark MLlib Installation And Implementation
⌨️ (1:12:46) Introduction To Databricks
⌨️ (1:24:65) Implementing Linear Regression using Databricks in Single Clusters

--

🎉 Thanks to our Champion and Sponsor supporters:
👾 Wong Voon jinq
👾 hexploitation
👾 Katia Moran
👾 BlckPhantom
👾 Nick Raker
👾 Otis Morgan
👾 DeezMaster
👾 Treehouse

--

Comments

This man is singlehandedly responsible for spawning data scientists in the industry.

stingfiretube

Why are u uploading the good stuff during my exams bro

anikinskywalker

Sir Krish Naik is an amazing tutor; I learned a lot about statistics and data science from his channel.

shritishaw

I ran into an issue (ImportError) while importing pyspark in my notebook, even after installing it within the environment. After doing some research, I found that the notebook was using the default kernel, even though the notebook resides within the virtual env. We need to create a new kernel within the virtual env and select that kernel in the notebook.

Steps:
1. Activate the env by executing "source bin/activate" inside the environment directory
2. From within the environment, execute "pip install ipykernel" to install IPyKernel
3. Create a new kernel by executing "ipython kernel install --user --name=projectname"
4. Launch jupyter notebook
5. In the notebook, go to Kernel > Change kernel and pick the new kernel you created.

Hope this helps! :)
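The steps above can be sketched as follows (the `venv` directory and the `my-project` kernel name are made-up placeholders — substitute your own):

```shell
# 1. Activate the virtual environment (path is an assumption)
source venv/bin/activate

# 2. Install IPyKernel inside the environment
pip install ipykernel

# 3. Register a kernel for this environment (kernel name is an assumption)
ipython kernel install --user --name=my-project

# 4. Launch Jupyter, then in the notebook go to Kernel > Change kernel
#    and pick the newly registered kernel
jupyter notebook
```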

vivekadithyamohankumar

I am happy that I completed this video in one sitting

MSuriyaPrakaashJL

Dear Mr Beau, thank you so much for amazing courses on this channel.
I am really grateful how such invaluable courses are available for free.

ygproduction

You guys are literally reading everyone's mind. Just yesterday I searched for pyspark tutorial and today it's here. Thank you so much. ❤️

nagarjunp

I have to say, it is nice and clear. The pace is really good as well. There are many tutorials online that are either too fast or too slow.

yitezeng

IMPORTANT NOTICE:

The na.fill() method now works only on subset columns with a matching datatype, e.g. if the fill value is a string and the subset contains a non-string column, then the non-string column is simply ignored.

So it is now impossible to replace NaN values across columns of different datatypes in a single call.

Another important question is: how come the values in his csv file are treated as strings if he has set inferSchema=True?

arturo.gonzalex

Uploaded at the right time. I was looking for this course. Thank you so much.

candicerusser

VERY MUCH HAPPY IN SEEING MY FAVORITE TEACHER COLLABORATING WITH THE FREE CODE CAMP

lakshyapratapsigh

0:52:44 - to complement Pyspark Groupby And Aggregate Functions:

from pyspark.sql import functions as F

df3 = df3.groupBy("departaments").agg(
    F.sum("salary").alias("sum_salary"),
    F.max("salary").alias("max_salary"),
)

oiwelder

42:17 Here the 'Missing values' replacement only happens in the 'Name' column, not anywhere else. Even if I specify the column names as 'age' or 'experience', it's not replacing the null values in those columns.

baneous

I didn't expect krish.... Amazingly explained

mohandev

Hi krishnaik,

All I can say is: just beautiful. I followed from start to finish, and you were amazing. I was most interested in the transformation and cleaning aspect, and you did it justice. I realize some lines of code didn't work like yours, but thanks to Google for the rescue.

This is a great resource for an introduction to PySpark, keep up the good work.

dataisfun

I just love how he says

“Very very simple guys”

And it turns out to be simple xD

MiguelPerez-nvyw

Biggest crossover: Krish Naik sir teaching for freeCodeCamp

sharanphadke

I am very happy to see krish sir on this channel.

siddhantbhagat

@Krish Naik Sir, just to clarify: at 26:33 I think the Name column min/max is decided by lexicographic order, not by index number.

yashbhawsar

There is an update in na.fill(): an integer fill value will replace nulls only in columns with an integer data type, and the same holds for a string fill value.

ujjawalhanda