4. Skip lines while loading data into a DataFrame | Top 10 PySpark Scenario Based Interview Questions

Please enroll in data engineering project courses

#databricks #interviewquestion #pyspark
Comments
Author

alternative

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

cols = StructType([StructField('id', IntegerType()), StructField('name', StringType())])

df = spark.read.option('mode', 'DROPMALFORMED').schema(cols).format('csv').load('<enterpathhere>')

df.show()
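To see what the DROPMALFORMED mode in the snippet above actually does, here is a plain-Python sketch (no Spark needed, sample data is made up): rows that don't match the declared schema (an integer id and a string name) are silently dropped instead of failing the load.

```python
import csv
import io

# Hypothetical raw CSV with a header and two malformed lines.
raw = """id,name
1,A
oops-bad-line
2,B
3
4,D
"""

def parse_row(fields):
    """Return (id, name) if the row matches the schema, else None."""
    if len(fields) != 2:
        return None
    try:
        return (int(fields[0]), fields[1])
    except ValueError:
        return None

reader = csv.reader(io.StringIO(raw))
next(reader)  # skip the header line
# Keep only rows that parsed cleanly -- the DROPMALFORMED idea.
rows = [r for r in (parse_row(f) for f in reader) if r is not None]
print(rows)  # the two malformed lines are gone
```

Spark applies the same idea per partition while reading, so malformed lines never reach the DataFrame at all.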

namratachavan
Author

first() is not returning a list [ID, Name] but a string 'ID, Name'. And the filter to remove the column names is also returning a list of strings rather than a list of lists: ['1, A', '2, B', '3, C', '4, D']

SireeshaPulipati
Author

Please add the thumbnails at the very end of the video; I couldn't see the last 15 seconds of your work.

mohitupadhayay
Author

We can also achieve the same using read modes, with badRecordsPath or DROPMALFORMED.

VikasChavan-vc
Author

Hi Sir, my solution:

filteredRdd = rdd.filter(lambda x: x[1] > 3)
finalRdd = filteredRdd.map(lambda x: (x[0][0], x[0][2]))
df1 = spark.createDataFrame(finalRdd, ['id', 'Name'])
# or: finalRdd.toDF(['id', 'Name']).show()
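The solution above appears to rely on rdd.zipWithIndex(), which pairs each element with its position, so filtering on x[1] > 3 drops the first four lines. A plain-Python sketch of that index-and-filter pattern (sample lines are hypothetical):

```python
# Pair each line with its index, then keep only lines past a cutoff --
# the same shape as rdd.zipWithIndex().filter(lambda x: x[1] > 3).
lines = ['junk header 1', 'junk header 2', 'id,name', 'extra note', '1,A', '2,B']

indexed = list(zip(lines, range(len(lines))))      # like rdd.zipWithIndex()
kept = [line for line, idx in indexed if idx > 3]  # like .filter(lambda x: x[1] > 3)
print(kept)  # only the data lines survive
```

In Spark the index from zipWithIndex() is assigned across partitions in order, which is why this works for skipping a fixed number of leading lines.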

rawat
Author

How to skip the last 10 rows while reading a CSV in PySpark?
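One common approach to the question above (not from the video, just a sketch): index every row, count the total, and keep only indices below total minus N. In PySpark this would combine rdd.zipWithIndex() with rdd.count(); here is the same logic in plain Python with made-up rows:

```python
# Drop the last N rows: index each row, then keep idx < total - N.
# (In PySpark: indexed = rdd.zipWithIndex(); total = rdd.count().)
rows = [f'row-{i}' for i in range(15)]
n_to_drop = 10

total = len(rows)                        # rdd.count()
indexed = list(zip(rows, range(total)))  # rdd.zipWithIndex()
kept = [r for r, idx in indexed if idx < total - n_to_drop]
print(kept)  # only the first total - N rows remain
```

Note that count() triggers an extra pass over the data, so on large inputs it may be cheaper to know the row count in advance.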

ruinmaster