14. explode(), split(), array() & array_contains() functions in PySpark | #PySpark #azuredatabricks

In this video, I explain the usage of the explode(), split(), array() and array_contains() functions with ArrayType columns in PySpark.
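
For quick reference, here is a minimal, self-contained sketch of the four functions (the sample data, column names and values are illustrative and not taken from the video):

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split, array, array_contains, col

spark = SparkSession.builder.appName("array-functions-demo").getOrCreate()

# Sample data: an ArrayType "skills" column and a comma-separated "langs" string column
data = [(1, "maheer", ["dotnet", "azure"], "sql,python"),
        (2, "wafa", ["java", "aws"], "scala,spark")]
df = spark.createDataFrame(data, ["id", "name", "skills", "langs"])

# explode(): produces one output row per element of the array column
df.select("id", "name", explode(col("skills")).alias("skill")).show()

# split(): turns a delimited string into an ArrayType column
df.select("id", split(col("langs"), ",").alias("langs_arr")).show()

# array(): combines individual columns into a single array column
df.select("id", array(col("name"), col("langs")).alias("combined")).show()

# array_contains(): True/False depending on whether the array holds the given value
df.select("id", array_contains(col("skills"), "azure").alias("knows_azure")).show()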

Link for PySpark Playlist:

Link for PySpark Real Time Scenarios Playlist:

Link for Azure Synapse Analytics Playlist:

Link for Azure Synapse Real Time Scenarios Playlist:

Link for Azure Databricks Playlist:

Link for Azure Functions Playlist:

Link for Azure Basics Playlist:

Link for Azure Data Factory Playlist:

Link for Azure Data Factory Real Time Scenarios Playlist:

Link for Azure Logic Apps Playlist:

#PySpark #Spark #databricks #azuresynapse #synapse #notebook #azuredatabricks #PySparkcode #dataframe #WafaStudies #maheer #azure
Comments

This is worth watching. The speed I picked up after following you is unbelievable. Thank you so much for this amazing content, and without doubt your explanation is the finest I have ever seen.

raghunathpanse

Awesome video; I can thoroughly understand it.

VivekKBangaru

Thank you Maheer, you are doing really good work. Have you prepared notes for these videos, I mean slides or something similar?

Aelmasri-htsv

You are doing an amazing job, brother. Keep it up. Thanks for all your contributions to data engineering tutorials.

deepjyotimitra

Thanks a lot for sharing, Maheer. Can we create a trial account for practice? As of now, Microsoft does not provide a free community trial subscription, I think.

phanidivi

Thanks very much for the tutorial :). I have a query regarding reading in JSON files.

I have an array of structs where each struct has a different structure/schema.
Based on a certain property value of the struct, I apply a filter to get that nested struct. However, when I display it using printSchema(), it contains fields that do not belong to that object but are somehow associated with it from the schemas of the other structs. How can I fix this issue?

shreyaspatil

Nice video. How can we remove duplicates from an array column?
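
One approach worth mentioning, though it is not covered in this video: Spark 2.4 and later ship array_distinct(), which removes duplicate elements from an array column. A minimal sketch, assuming an active spark session (as in a Databricks notebook) and illustrative data:

from pyspark.sql.functions import array_distinct, col

# Illustrative data: the "skills" array contains a duplicate entry
dup_data = [(1, 'abhi', ['java', 'aws', 'java'])]
dup_df = spark.createDataFrame(dup_data, ['id', 'name', 'skills'])

# array_distinct() drops the duplicate elements within each array
dup_df.withColumn('skills', array_distinct(col('skills'))).show()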

RakeshGandu-wbeu

When you used array(), what happens if the number of skills is different between rows?

yosaki-fvyy

In the case of split(), what will happen if we give the delimiter as | instead of , ?
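
A note on this, offered as a hedged aside rather than something from the video: split() treats its pattern argument as a Java regular expression, so a bare | acts as regex alternation instead of a literal pipe and must be escaped. A small sketch, assuming an active spark session and illustrative data:

from pyspark.sql.functions import split, col

# Illustrative data: skills stored as a pipe-delimited string
pipe_data = [(1, 'maheer', 'dotnet|azure'), (2, 'wafa', 'java|aws')]
pipe_df = spark.createDataFrame(pipe_data, ['id', 'name', 'skills_str'])

# Escape the pipe ('\\|') so the regex matches the literal | character
pipe_df.withColumn('skills', split(col('skills_str'), '\\|')).show()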

sahilgarg

@WafaStudies
Are there any other ways to explode the array without the explode command?
I ask because I made a script with the explode command, but the performance is really bad and I'm looking for another way to do this.
Thank you!

julianalilian

Sir, how can we explode more than two columns, or many more, like 150?
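
One pattern that can help here, offered as a sketch rather than the method shown in the video: arrays_zip() (Spark 2.4+) zips several parallel array columns into one array of structs, so a single explode() covers all of them. Column names are illustrative and an active spark session is assumed:

from pyspark.sql.functions import arrays_zip, explode, col

# Illustrative data: two parallel array columns (the same idea extends to many more)
zip_data = [(1, ['java', 'aws'], ['beginner', 'expert'])]
zip_df = spark.createDataFrame(zip_data, ['id', 'skills', 'levels'])

# Zip the parallel arrays into one array of structs, explode once,
# then pull the struct fields back out as ordinary columns
zip_df = (zip_df
          .withColumn('zipped', explode(arrays_zip(col('skills'), col('levels'))))
          .select('id', col('zipped.skills').alias('skill'),
                  col('zipped.levels').alias('level')))
zip_df.show()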

mohitpande

Please drop the notebook details in the description so that it will be easy for us to refer to, or you can share it in a GitHub repository.

vasanthasworld

0:48 Have you mixed soap into the water here? Please explain it properly, this is causing a lot of confusion.

maskally

For me, I am not sure why it was not working. I changed the script and then I got both the skills and skill columns:

from pyspark.sql.functions import explode, col

# Sample data
data = [(1, 'abhishek', ['dotnet', 'azure']), (2, 'abhi', ['java', 'aws'])]
schema = ['id', 'name', 'skills']

# Create DataFrame
df = spark.createDataFrame(data, schema)
df.show()

# Apply explode function on the "skills" column and rename the exploded column
df1 = df.withColumn('skill', explode(col('skills'))).select('id', 'name', 'skills', 'skill')
df1.show()

abhishekstatus_