Spark Scenario Based Question | Handle Bad Records in File using Spark | LearntoSpark

In this video, we will learn how to handle bad or corrupt records in Spark, and we will also look at a useful Databricks feature for capturing and saving bad records.
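
In short: plain Spark can keep bad rows in a corrupt-record column via PERMISSIVE mode, while Databricks can additionally redirect them to files via the badRecordsPath reader option. A minimal PySpark sketch of both approaches; the FileStore paths are placeholders borrowed from the comments below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bad-records-demo").getOrCreate()

# Plain Spark: PERMISSIVE mode keeps unparseable rows and stores their raw
# text in the corrupt-record column (the column shows up in the inferred
# schema only when corrupt records are actually encountered).
df = (spark.read
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("/FileStore/tables/ford_json.json"))

df.cache()  # Spark disallows raw-file queries that touch only the
            # corrupt-record column; caching first avoids that error
bad_records = df.filter(df["_corrupt_record"].isNotNull())
good_records = df.filter(df["_corrupt_record"].isNull())

# Databricks only: badRecordsPath redirects bad records to files in the
# given folder instead of keeping them in the DataFrame.
df2 = (spark.read
       .option("badRecordsPath", "/FileStore/tables/badRecords_ford")
       .json("/FileStore/tables/ford_json.json"))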

Dataset with corrupt records:

Blog link to learn more on Spark:

LinkedIn profile:

FB page:
Comments

Hi Shahul,

Is it possible to turn the corrupt records into proper JSON with Spark?

Actually, I have similar JSON log data from streams, and I have written custom code in Java/Python to clean the raw JSON into proper JSON. Instead of that approach, can we do this with Spark?

DiverseDestinationsDiaries
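
On the question above: one way to do the cleanup inside Spark itself is to read the raw lines as text, repair them with string functions, and parse the result with from_json. This is only a sketch; the path, the cleanup rule, and the schema are hypothetical and depend on how the stream mangles the JSON.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Read the malformed file as plain text, one row per line.
raw = spark.read.text("/FileStore/tables/raw_stream.json")  # placeholder path

# Repair the text, then parse it; rows that still fail become null structs.
repaired = raw.select(
    F.from_json(
        F.regexp_replace("value", "'", '"'),  # example cleanup rule only
        "id INT, name STRING"                 # hypothetical schema
    ).alias("rec")
).select("rec.*")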

Hi Azar, thanks for your explanation. How do we handle mismatched data while reading a CSV file from any source and store it in another DataFrame using Spark Scala code? Could you please explain?

narayanareddy
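
Sketching in PySpark for brevity (the same reader options exist on the Scala DataFrameReader): with an explicit schema that declares the corrupt-record column, PERMISSIVE mode lets you split mismatched CSV rows into a separate DataFrame. The schema and path below are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.getOrCreate()

# For CSV, the corrupt-record column must be declared in the schema itself.
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("_corrupt_record", StringType(), True),
])

df = (spark.read
      .schema(schema)
      .option("header", "true")
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .csv("/FileStore/tables/source.csv"))

df.cache()  # required before querying only the corrupt-record column
bad_df = df.filter(df["_corrupt_record"].isNotNull())
good_df = df.filter(df["_corrupt_record"].isNull()).drop("_corrupt_record")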

Hi Azar, I was applying a schema manually to create a DataFrame using IntelliJ. The problem is that if the schema does not match for even a single record, Spark moves all the JSON records to the corrupt column because multiline is set to "true". How can we make Spark treat the records separately in this scenario?

dippusingh
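
With multiline set to true, Spark parses each whole file as one unit, so a single bad record can send everything to the corrupt column. The usual workaround, sketched below with hypothetical paths, is to store the input as JSON Lines (one record per line) and drop the multiline option so each record is parsed independently.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# multiline=true: the file is one parse unit; one failure poisons it all.
df_whole_file = (spark.read
                 .option("multiline", "true")
                 .json("/FileStore/tables/input.json"))

# JSON Lines input (default multiline=false): each line is parsed on its
# own, so only the genuinely bad records land in the corrupt column.
df_per_record = (spark.read
                 .option("mode", "PERMISSIVE")
                 .json("/FileStore/tables/input_jsonlines.json"))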

Hi Azar, how do we read a CLOB column of length 92351 using PySpark and store it in Hive?

etlquery
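
A hedged sketch of one common route: pull the CLOB over JDBC, where Spark's JDBC source maps it to an ordinary string column, and write it to Hive with saveAsTable. The connection URL, table, and column names are placeholders.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("clob-to-hive")
         .enableHiveSupport()
         .getOrCreate())

# Spark's JDBC source reads CLOB columns as StringType, so even a
# 92351-character value arrives as a normal string.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")  # placeholder
      .option("dbtable", "docs")                                 # placeholder
      .option("user", "user")
      .option("password", "password")
      .load())

df.write.mode("overwrite").saveAsTable("default.docs_clob")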

Hi Azar, I have created the same code snippet from the explanation above, but the badRecords folder does not get created in the FileStore. Below is the code snippet:

df = (spark.read
      .option("badRecordsPath", "/FileStore/tables/badRecords_ford")
      .json("/FileStore/tables/ford_json.json"))
df.show()


Please guide.

swaroopsuki
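
Two things worth checking with the snippet above: badRecordsPath is a Databricks-specific option (open-source Spark silently ignores reader options it does not recognize), and the folder is created lazily, only when an action actually encounters bad records. A sketch of how to verify, assuming a Databricks notebook where spark, display, and dbutils are predefined:

# Runs only on Databricks; badRecordsPath is not part of open-source Spark.
df = (spark.read
      .option("badRecordsPath", "/FileStore/tables/badRecords_ford")
      .json("/FileStore/tables/ford_json.json"))

df.show()  # the folder appears only after an action hits bad records

# List the folder with the Databricks dbutils helper:
display(dbutils.fs.ls("/FileStore/tables/badRecords_ford"))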