Summarizing a DataFrame in PySpark | min, max, count, percentile, schema

In this video, I show how to summarize a Spark DataFrame. You can compute basic statistics directly on large datasets in Spark, instead of converting the DataFrame to pandas first and exploring it there.

I have put my 3 years of learning experience into this playlist.

Contents of this video:
00:00 - Setting up the PySpark environment
00:50 - Initialize Spark Session object
00:58 - Read data from UCI
01:22 - Summarize DataFrame using PySpark
02:07 - Shape of PySpark DataFrame
02:57 - Print schema of PySpark DataFrame
03:37 - Describe a DataFrame in PySpark
05:14 - Percentile in PySpark
07:39 - Summary and Subscribe :)

Please like and subscribe to this channel, and share this video with your friends. Keep learning :)

Follow me here:

Tags:
abhishek mamidi, data science, machine learning, deep learning, artificial intelligence, internship, career, college, job, experience, krish naik, ai engineering, fresher, data science enthusiasts, pyspark, apache spark, python, pysparkling
Comments

Sorry for being late, but the video is worth it 🔥

bornnoob

Please upload at least 2 videos every week.

anilkumar

Subscribed and liked. Nice video.

Can we make a histogram of any one column from a DataFrame created from a CSV? Please show a sample or make a video.

viane