1. Remove double quotes from value of json string using PySpark

In this video, I discuss removing double quotes from the value of a JSON string property using PySpark.

Link for Azure Synapse Analytics Playlist:

Link to Azure Synapse Real Time scenarios Playlist:

Link for Azure Databricks playlist:

Link for Azure Functions playlist:

Link for Azure Basics playlist:

Link for Azure Data Factory playlist:

Link for Azure Data Factory Real Time scenarios playlist:

Link for Azure Logic Apps playlist:

#PySpark #Spark #DatabricksNotebook #PySparkLogic #WafaStudies #maheer
Comments
Author

Thanks for the video.
Please continue this series. I learnt a lot from your ADF videos.

polakigowtam
Author

Hi Maheer, when the data looks like the example below, the logic also replaces a double quote in the 1st row, which has no extra quote. Please check the output of this.

string = '{"ID":"2", "Name":"Suresh"Gajula", "City":"Hyd"}'

data = [(1, '{"ID":"1", "Name":"Suresh", "City":"Hyd"}'), (2, string)]
schema = ['id', 'value']

df = spark.createDataFrame(data, schema)
# display(df)

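from pyspark.sql.functions import split, lit, concat_ws, concat

# Split each JSON string around the "Name" key:
# value1 is the text before the key, value3 is the text after it.
df = df.withColumn('value1', split('value', '"Name":"')[0])\
    .withColumn('value2', lit('"Name":"'))\
    .withColumn('value3', split('value', '"Name":"')[1])
# display(df)

# Split value3 at the first double quote only (limit 2 gives at most two pieces).
df = df.withColumn('value4', split('value3', '"', 2))
# display(df)

# Join the two pieces with a space, which drops the quote they were split on.
df = df.withColumn('value5', concat_ws(' ', 'value4'))
# display(df)

# Reassemble the string and keep only the original and rebuilt columns.
df = df.withColumn('value6', concat('value1', 'value2', 'value5')).select('id', 'value', 'value6')
display(df)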

g.suresh
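
To see why the 1st row is affected, the same steps can be traced on plain Python strings. This is only a sketch: the function name rebuild is made up for illustration, and it assumes that Python's str.split with maxsplit=1 behaves like Spark's split with limit 2 for this literal pattern.

```python
import json


def rebuild(value, key='"Name":"'):
    """Mirror the value1..value6 steps of the snippet above on one string."""
    parts = value.split(key)
    head, tail = parts[0], parts[1]   # value1 and value3
    pieces = tail.split('"', 1)       # value4: cut at the first quote only
    joined = " ".join(pieces)         # value5: the quote becomes a space
    return head + key + joined        # value6


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


row1 = '{"ID":"1", "Name":"Suresh", "City":"Hyd"}'
row2 = '{"ID":"2", "Name":"Suresh"Gajula", "City":"Hyd"}'

for row in (row1, row2):
    fixed = rebuild(row)
    print(fixed, "valid JSON:", is_valid_json(fixed))
```

In row 2 the first quote after the key is the stray one, so removing it repairs the string. In row 1 the first quote after the key is the closing quote of the name, so removing it breaks a string that was valid to begin with.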
Author

You are my guru, Bhai. Watching this video now. Hope you will bring all the real-time scenario videos on PySpark. Love from Odisha❣️❣️

dipti
Author

Very helpful. Thank you so much for sharing.

azurecontentannu
Author

DataFrame[col1: bigint, col2: string, col3: string, col4: string, col5: string]
I'm facing the above issue while trying to execute the code in Google Colab. I also tried defining a schema, but no luck. Please let me know how to resolve it.

mukeshkola
Author

Why do we need even if we use ""? Will it give the same result?

suryameenchirel
Author

Hi Maheer bhai,
In the video you said you would paste the code in the description, but I could not find it.

shuaibsaqib
Author

Sir, is there any playlist or set of videos to learn the basics of PySpark, like the Python playlist?

Please create videos for real-time scenarios based on Databricks.

barbershop
Author

Thanks for all your efforts to help us upskill!

Can we try this with a lambda function?
The logic looks lengthy, and the value in split is hardcoded.

zoptyhf
Author

Hi @Maheer, eagerly waiting for more content in this playlist.

jaymakam
Author

This is a good practice to split strings. How about we use map? Is that easier?

jerryyang
Author

Hi Bro, please upload Databricks real-time scenarios and interview questions. I learned a lot from your videos, thanks a lot!

nharshimha
Author

Thanks for all your efforts to help us upskill!
Could you please make a video on setting up a PySpark environment locally in PyCharm?

RaushanKumar-bwqf
Author

Hi Maheer

Please let me know if you also take live classes.

phalgun
Author

Please, bro, make a series on Cosmos DB and how to use JavaScript to write queries.

focus
Author

Whatever you did to accomplish that requirement was good, but is there any way to reduce the code, boss?

hemakshudukoduru
Author

Is there any way to do the same for all columns without using .withColumn?
The number of columns will be dynamic in my case.

mohitjoshi
Author

Hi Maheer, could you please suggest some books for the Databricks Spark certification?

abhishekjoshi
Author

Nice explanation 👏. But this works only if the name contains a single extra double quote. We have to modify the code if there is more than one. How about using a replace function? Can you try that?

venkateshgunturu
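
Following up on the replace suggestion above, here is one possible sketch, not the method from the video. It assumes that a stray quote always sits directly between two word characters (as in Suresh"Gajula), so a single regular expression can remove any number of them without touching the quotes that delimit keys and values. The names INNER_QUOTE, strip_inner_quotes and strip_inner_quotes_df are made up for illustration. The Spark variant uses pyspark.sql.functions.regexp_replace and imports it lazily, so the plain-Python part runs without Spark.

```python
import json
import re

# Assumption: a stray quote has a word character on both sides.
# Quotes next to {, }, :, comma or whitespace are treated as structural.
INNER_QUOTE = r'(?<=\w)"(?=\w)'


def strip_inner_quotes(text, replacement=""):
    """Remove every quote that matches the heuristic above."""
    return re.sub(INNER_QUOTE, replacement, text)


def strip_inner_quotes_df(df, column="value", out="value_clean"):
    """Same idea on a Spark DataFrame column (requires pyspark)."""
    from pyspark.sql.functions import regexp_replace
    return df.withColumn(out, regexp_replace(column, INNER_QUOTE, ""))


good = '{"ID":"1", "Name":"Suresh", "City":"Hyd"}'
bad = '{"ID":"2", "Name":"Suresh"Gajula", "City":"Hyd"}'
many = '{"ID":"3", "Name":"A"B"C", "City":"Hyd"}'

for raw in (good, bad, many):
    cleaned = strip_inner_quotes(raw)
    print(json.loads(cleaned)["Name"])
```

A valid row passes through unchanged, and rows with one or several stray quotes become parseable. The heuristic does not catch a stray quote that sits next to a space or punctuation inside the value, so it should be checked against real data before relying on it.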