Dealing Multi source CSV file in Spark SQL | json_tuple | CSV & JSON in a single file | using Scala

Hi Friends,
In today's video, I explain how to flatten a CSV file using Scala when the file contains both comma-delimited values and JSON in the same records.
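As the title suggests, one way to pull fields out of a JSON column is `json_tuple`. Below is a minimal sketch of that approach; the column names, sample data, and app name are illustrative assumptions, not taken from the video, and it requires a Spark runtime to execute.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.json_tuple

// Minimal sketch: flatten a JSON string column with json_tuple.
// Column names and sample data are illustrative assumptions.
val spark = SparkSession.builder()
  .appName("csv-json-flatten")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  (1, """{"Zipcode":704,"City":"PARC PARQUE","State":"PR"}""")
).toDF("id", "request")

// json_tuple extracts the named top-level JSON fields as string columns;
// .as(Seq(...): _*) gives the generated columns readable names
val flat = df.select(
  $"id",
  json_tuple($"request", "Zipcode", "City", "State")
    .as(Seq("Zipcode", "City", "State"): _*)
)
```

Note that `json_tuple` returns every field as a string; if you need typed columns, `from_json` with an explicit schema (shown later in the comments) is the usual alternative.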

Please subscribe to my channel and provide your feedback in the comments section.
Comments

Useful information, presented in detail. Thanks Sravana

vasudeorane

Hello, thank you for this video. I'm looking at a similar scenario: I have a huge CSV file with both comma-delimited and JSON-formatted data, and I don't know the column details of the JSON up front. How can I derive the column names and assign them dynamically, or is there another option available? Thanks in advance

suprajar
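One way to handle an unknown JSON layout is to let Spark infer the schema from the JSON strings themselves instead of hard-coding a `StructType`. This is a sketch under assumptions: the file path and the column names (`id`, `request`) are illustrative, and it needs a Spark runtime.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Assumed input: a CSV with an "id" column and a "request" column holding JSON
val input_df = spark.read.option("header", true).csv("input.csv")

// Infer the JSON schema dynamically by reading the JSON column as a Dataset[String]
val jsonSchema = spark.read.json(input_df.select("request").as[String]).schema

// Apply the inferred schema with from_json and flatten the struct
val output_df = input_df
  .select(col("id"), from_json(col("request"), jsonSchema).as("json_request"))
  .select("id", "json_request.*")
```

The trade-off of `spark.read.json` on the column is an extra pass over the data to infer the schema; for very large files you may prefer sampling or caching that column first.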

Can you please provide the sample data?

mondritaroy

Hi Sravana
I am getting null values when using from_json; can you help me figure out the missing piece here? TY
~ input is a .csv file with JSON, e.g.:
id, request
1, {"Zipcode":704, "ZipCodeType":"STANDARD", "City":"PARC PARQUE", "State":"PR"}
2, {"Zipcode":704, "ZipCodeType":"STANDARD", "City":"PASEO COSTA DEL SUR", "State":"PR"}
~ my code (Scala/Spark)
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._
import spark.implicits._

val input_df = spark.read
  .option("header", true)
  .option("escape", "\"")
  .csv(json_file_input)

val json_schema_abc = StructType(Array(
  StructField("Zipcode", IntegerType, true),
  StructField("ZipCodeType", StringType, true),
  StructField("City", StringType, true),
  StructField("State", StringType, true)
))

val output_df = input_df
  .select($"id", from_json(col("request"), json_schema_abc).as("json_request"))
  .select("id", "json_request.*")

thesadanand
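A likely cause of the nulls, judging from the sample shown (this is an assumption, not confirmed by the author): the JSON payload is not quoted, so the CSV reader splits each line at the commas inside the braces, and `from_json` receives a truncated string such as `{"Zipcode":704`, which fails to parse and yields null. One workaround is to read the lines as plain text and split only on the first comma, so the commas inside the JSON survive. The sketch below assumes the file layout shown above and Spark 3.0+ (for the three-argument `split`).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, split, trim}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val json_schema_abc = StructType(Array(
  StructField("Zipcode", IntegerType, true),
  StructField("ZipCodeType", StringType, true),
  StructField("City", StringType, true),
  StructField("State", StringType, true)
))

// Read whole lines, drop the header row, and split on the FIRST comma only
// (limit = 2), so commas inside the JSON object are left untouched
val raw = spark.read.text(json_file_input).filter(!$"value".startsWith("id,"))
val parsed = raw.select(
  trim(split($"value", ",", 2).getItem(0)).as("id"),
  trim(split($"value", ",", 2).getItem(1)).as("request")
)

val output_df = parsed
  .select(col("id"), from_json(col("request"), json_schema_abc).as("json_request"))
  .select("id", "json_request.*")
```

Alternatively, if you control the input file, quoting the JSON column (`1,"{""Zipcode"":704, ...}"`) lets the original `.csv()` read with `escape` set to `"` work unchanged.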

Can you please provide the sample data and source code? (A GitHub link should do.) ~ty

thesadanand