How to Generate MD5 for PySpark dataframe | PySpark Tutorial


In this video, I share a quick method to generate an MD5 value for an entire row of a PySpark DataFrame.
An MD5 hash is typically generated when we want a practically unique ID (a fingerprint) for each entire row. This is especially helpful for data validation, for example when comparing rows between two datasets.
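For reference, one common PySpark idiom for this is `F.md5(F.concat_ws("||", *df.columns))`, with `from pyspark.sql import functions as F`. It is not necessarily the exact method shown in the video. The plain-Python sketch below reproduces the same idea with `hashlib`, so the logic can be checked without a Spark session. The `row_md5` helper and the `"||"` separator are illustrative assumptions. Python's string conversion of values may differ from Spark's casts for some types (for example booleans), so the hashes are not guaranteed to match Spark's byte for byte.

```python
import hashlib

def row_md5(row, sep="||"):
    """Return the hex MD5 of one row, mimicking md5(concat_ws(sep, *cols)).

    Like Spark's concat_ws, None values are skipped. The remaining values
    are converted to strings and joined with `sep` before hashing.
    """
    joined = sep.join(str(v) for v in row if v is not None)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Example rows (hypothetical data): each one gets a 32-character hex fingerprint.
rows = [
    (1, "alice", 30.5),
    (2, "bob", None),
]
for r in rows:
    print(r, row_md5(r))
```

Design note: because `concat_ws` skips nulls, rows such as `(1, None, "x")` and `(1, "x", None)` produce the same hash. Replacing nulls with a sentinel value first (for example with `F.coalesce`) avoids that collision.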
Comments

Hi. This video was very useful for me. Could you confirm that this approach doesn't work for nested JSON? If so, do you have another approach for creating an MD5 or other hash value?

gbhargavi
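On the nested-JSON question: `concat_ws` expects string (or array-of-string) inputs, so struct columns coming from nested JSON typically cannot be passed to it directly. One possible route is to serialize each whole row to JSON and then hash it, for example `F.md5(F.to_json(F.struct(*df.columns)))` in PySpark. This is an assumption and is not confirmed in the video. The sketch below shows the same idea in plain Python with `json` and `hashlib`. The `nested_row_md5` helper and the sample records are hypothetical.

```python
import hashlib
import json

def nested_row_md5(record):
    """Return the hex MD5 of an arbitrarily nested record (dicts, lists, scalars).

    The record is first serialized to canonical JSON: keys are sorted and
    whitespace is removed. Logically equal records therefore hash identically
    regardless of key insertion order.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# Two records with the same content but different key order (hypothetical data).
a = {"id": 1, "address": {"city": "Springfield", "zip": "12345"}, "tags": ["x", "y"]}
b = {"tags": ["x", "y"], "address": {"zip": "12345", "city": "Springfield"}, "id": 1}
print(nested_row_md5(a) == nested_row_md5(b))
```

In Spark, `to_json` writes struct fields in schema order rather than sorted order. The resulting hash is therefore stable only across DataFrames that share the same schema and column order.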

Hi, I'm trying to create a DataFrame using the code below:
val x = (1 to 10).toList
val numbersDf = x.toDF("number")


but I'm getting an error.
Could you please tell me what I'm missing in this code?

ur