Spark Scenario Based Question | Handle JSON in Apache Spark | Using PySpark | LearntoSpark

In this video, we will learn how to process a JSON file and load it as a DataFrame in Apache Spark using PySpark. I hope this video helps you prepare for Spark interviews with scenario-based questions.

Comments

Explode turns each element of an array of structs into its own row, and the struct fields can then be accessed directly. Nice video, man. It would be great if you uploaded more PySpark interview questions for 2024.

ArabindaMohapatra

This is good, I'm actually going to watch this again and take notes.

stephenmartin

Thanks a lot for the valuable topic. I really appreciate your efforts.

saikannanravichandar

Bro, your video is amazing. I really appreciate the way you teach. Thanks a lot.

MohanakrishnanR

Thanks for sharing. I am looking for a PySpark command to read JSON files that are single-line (struct) and multi-line (array) into a single DataFrame.

kesavakrishnan

Thanks for the useful videos. I got stuck flattening a StructType into strings (I was able to do array to string and map to string). Can you please share a piece of code showing how to flatten StructType --> Array?
Use case: one of the files has StructType --> Array --> Struct. The relevant piece from printSchema() is below.

root
 |-- batters: struct (nullable = true)
 |    |-- batter: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- type: string (nullable = true)

gunturaudi

Hi, your video is amazing! I would like to know how to handle a table with three columns, where the first column is a nested array holding all the column names and the second column is an array nested inside another array. How do I map the values in the second column to the column names in the first? Thank you.

sudippandit

Thanks. This helped me read a string field inside a nested struct. It is a simpler and better way. Other articles I found used UDFs to extract it.

surendrabisht

Maybe a mundane question, but does explode work on struct types too?

albinchandy

Flatten? What if I want just one particular value?

puggyk

How do I calculate the number of partitions required for 10 GB of data, and when should I use repartition versus coalesce? Please help.

MrManish
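
A common rule of thumb, offered here as a rough starting point and not as a rule from the video, is to divide the data size by a target partition size. The sketch assumes 128 MB per partition, which matches the default of Spark's `spark.sql.files.maxPartitionBytes` setting; the helper name `estimate_partitions` is hypothetical.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def estimate_partitions(total_bytes: int, target_partition_bytes: int = 128 * MB) -> int:
    """Return a rough partition count: data size divided by target size, rounded up."""
    return max(1, math.ceil(total_bytes / target_partition_bytes))


partitions_for_10_gb = estimate_partitions(10 * GB)
print(partitions_for_10_gb)  # 10 * 1024 MB / 128 MB = 80
```

As for the two methods, `repartition(n)` performs a full shuffle and can raise or lower the partition count, while `coalesce(n)` merges existing partitions without a full shuffle and is normally used only to reduce the count.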

I get a Java out-of-memory error (Java heap space) while reading a 4 MB JSON file.

francis.joseph

Brother, can you please do a video on Spark Structured Streaming with PySpark and Kafka (with the streaming data as JSON strings)?

albinchandy

Bro, thank you very much. Please let us know how to handle the same JSON and nested JSON using Scala Spark.

maheshk

Thank you for sharing.
I have one question. Can you please guide me on how to solve it?
A Hive table column holds JSON object data. I want to parse that JSON and load it into another table using Spark/PySpark, not Hive.

My Hive column data looks like this:

fruits (column name)

{
  "fruits": [
    {
      "fruit": "apple",
      "rate": 10.25
    },
    {
      "fruit": "mango",
      "rate": 9.50
    },
    {
      "fruit": "orange",
      "rate": 5.50
    }
  ]
}



My expected output looks like this:

fruit1  fruit2  fruit3
apple   mango   orange

sathishkolla

Hi Azar, I'm Shalini. Hope you are doing well. I tried to read the same kind of JSON file, but I got _corrupt_record. I thought the JSON was badly formatted, so I reformatted it with an online JSON editor and read the file again. I still got _corrupt_record.

srmr

Bro, please make videos on Scala with these examples.

madhanmohanreddy

Thanks for sharing. Can I send you some questions by email?

blhijez