How to create a dataframe from a text file

Creating a DataFrame from a text file involves four steps:
1. Create an RDD from the text file.
2. Create a case class.
3. Convert the RDD to a row RDD using the case class.
4. Create the DataFrame from the row RDD using the toDF method.

1. Let's start with the first step. Reading the text file creates an RDD called employeeRdd.
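A minimal sketch of this step, assuming Spark 2.x or later and a local SparkSession. It uses SparkContext.textFile, which returns an RDD with one string element per line. The sample records, the app name, and the helper name loadEmployees are hypothetical; the file is written to a temporary path only so that the example runs on its own.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object Step1ReadTextFile {
  // Read a text file into an RDD with one String element per line.
  // loadEmployees is a hypothetical helper name used for illustration.
  def loadEmployees(spark: SparkSession, path: String): RDD[String] =
    spark.sparkContext.textFile(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("text-file-to-rdd") // hypothetical app name
      .master("local[*]")          // assumption: run locally
      .getOrCreate()

    // Hypothetical sample data: one "id, name" record per line.
    val sample: Path = Files.createTempFile("employees", ".txt")
    Files.write(sample, "1, Alice\n2, Bob\n".getBytes("UTF-8"))

    val employeeRdd = loadEmployees(spark, sample.toUri.toString)
    employeeRdd.collect().foreach(println)

    spark.stop()
  }
}
```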

2. Now let's do the second step and create the case class. Creating a case class is easy.
All we need to do is type

case class, followed by the class name and then, within parentheses, the field names and their data types, separated by commas.
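As an illustration, a case class for the employee records might look like the sketch below. The class name Employee and the field names id and name are assumptions; the text only establishes that the first field is an integer and the second is a string that gets trimmed.

```scala
// Hypothetical schema: the names Employee, id and name are illustrative.
case class Employee(id: Int, name: String)

object Step2CaseClass {
  def main(args: Array[String]): Unit = {
    // Case classes are constructed without `new` and compare by value.
    val e = Employee(1, "Alice")
    println(e)
    println(e == Employee(1, "Alice"))
  }
}
```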

3. Now let's do the third step: converting the RDD to a row RDD using the case class. First, apply a map transformation to employeeRdd that splits each record into its individual elements, using the split function with a comma as the delimiter. On top of that, apply a second map transformation that converts each record into a row object of the case class, reading the first field as an integer and trimming the second field. This creates an RDD called empRowRdd.
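The two map transformations can be sketched as follows, assuming Spark 2.x or later. The Employee case class, its field names, the helper name toRowRdd, and the in-memory sample lines are hypothetical. The sketch also trims the first field before calling toInt, which is an addition to guard against stray spaces.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object Step3RowRdd {
  // Hypothetical schema: first field an integer, second a string.
  case class Employee(id: Int, name: String)

  // First map: split each line on commas.
  // Second map: build one Employee per record, parsing the first
  // field as an Int and trimming the second field.
  def toRowRdd(employeeRdd: RDD[String]): RDD[Employee] =
    employeeRdd
      .map(line => line.split(","))
      .map(fields => Employee(fields(0).trim.toInt, fields(1).trim))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-row-rdd") // hypothetical app name
      .master("local[*]")        // assumption: run locally
      .getOrCreate()

    // Stand-in for the RDD read from the text file in step 1.
    val employeeRdd = spark.sparkContext.parallelize(Seq("1, Alice", "2, Bob"))

    val empRowRdd = toRowRdd(employeeRdd)
    empRowRdd.collect().foreach(println)

    spark.stop()
  }
}
```

Both transformations are lazy: nothing is parsed until an action such as collect runs, so a malformed line only raises an error at that point.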

4. Finally, we can apply the toDF method to empRowRdd to create the employeeDf DataFrame. Now that we have created employeeDf successfully, let's see how to save the contents of the DataFrame to an external text file.
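A minimal sketch of the toDF step, assuming Spark 2.x or later, where toDF becomes available on an RDD of case class instances after importing spark.implicits._. The Employee case class, the helper name toDataFrame, and the sample rows are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object Step4ToDF {
  // Hypothetical schema; defined outside any method so that Spark
  // can derive an encoder for it.
  case class Employee(id: Int, name: String)

  // Convert an RDD of Employee objects to a DataFrame.
  def toDataFrame(spark: SparkSession, empRowRdd: RDD[Employee]): DataFrame = {
    // Brings the toDF conversion into scope.
    import spark.implicits._
    empRowRdd.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("row-rdd-to-df") // hypothetical app name
      .master("local[*]")       // assumption: run locally
      .getOrCreate()

    // Stand-in for the empRowRdd produced in step 3.
    val empRowRdd = spark.sparkContext.parallelize(
      Seq(Employee(1, "Alice"), Employee(2, "Bob")))

    val employeeDf = toDataFrame(spark, empRowRdd)
    employeeDf.printSchema()
    employeeDf.show()

    spark.stop()
  }
}
```

The DataFrame's column names and types are taken from the case class fields, which is why the case class is created before the conversion.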
