25. split function in pyspark | pyspark advanced tutorial | getitem in pyspark | databricks tutorial

Azure Databricks #spark #pyspark #azuredatabricks #azure
In this video, I discuss how to use the split function in PySpark.

1. split function in pyspark
2. getItem function in pyspark

Create dataframe:
======================================================

data = [('Susheel','','Singh','1992-08-26'),
('Indra','Bahadur','Singh','1967-01-01'),
('Amrita','','Singh','1992-09-05'),
('Vaibhav','Kumar','Singh','1999-12-01'),
('Prabhu','Srimant','Darshanal','1992-02-17')
]
columns = ["firstname","middlename","lastname","dob"]
df = spark.createDataFrame(data, columns)
display(df)
-----------------------------------------------------------------------------------------------------------------------
Split Column using withColumn()
from pyspark.sql.functions import split

df1 = df.withColumn('year' , split(df['dob'], '-').getItem(0))\
.withColumn('month', split(df['dob'], '-').getItem(1))\
.withColumn('day' , split(df['dob'], '-').getItem(2))
display(df1)
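To see which element each getItem index picks, the same 0-based split semantics can be sketched in plain Python (a minimal illustration, assuming the dob format used above):

```python
# getItem(n) returns the n-th element (0-based) of the array that split produces.
dob = "1992-08-26"
parts = dob.split("-")          # ['1992', '08', '26']

year  = parts[0]                # same index as getItem(0)
month = parts[1]                # same index as getItem(1)
day   = parts[2]                # same index as getItem(2)
print(year, month, day)
```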


============================================================

Learn PySpark, an interface for Apache Spark in Python. PySpark is often used for large-scale data processing and machine learning.

1. pyspark introduction | pyspark tutorial for beginners | pyspark tutorial for data engineers:

2. what is dataframe in pyspark | dataframe in azure databricks | pyspark tutorial for data engineer:

3. How to read write csv file in PySpark | Databricks Tutorial | pyspark tutorial for data engineer:

4. Different types of write modes in Dataframe using PySpark | pyspark tutorial for data engineers:

5. read data from parquet file in pyspark | write data to parquet file in pyspark:

6. datatypes in PySpark | pyspark data types | pyspark tutorial for beginners:

7. how to define the schema in pyspark | structtype & structfield in pyspark | Pyspark tutorial:

8. how to read CSV file using PySpark | How to read csv file with schema option in pyspark:

9. read json file in pyspark | read nested json file in pyspark | read multiline json file:

10. add, modify, rename and drop columns in dataframe | withcolumn and withcolumnrename in pyspark:

11. filter in pyspark | how to filter dataframe using like operator | like in pyspark:

12. startswith in pyspark | endswith in pyspark | contains in pyspark | pyspark tutorial:

13. isin in pyspark and not isin in pyspark | in and not in in pyspark | pyspark tutorial:

14. select in PySpark | alias in pyspark | azure Databricks #spark #pyspark #azuredatabricks #azure

15. when in pyspark | otherwise in pyspark | alias in pyspark | case statement in pyspark:

16. Null handling in pySpark DataFrame | isNull function in pyspark | isNotNull function in pyspark:

17. fill() & fillna() functions in PySpark | how to replace null values in pyspark | Azure Databrick:

18. GroupBy function in PySpark | agg function in pyspark | aggregate function in pyspark:

19. count function in pyspark | countDistinct function in pyspark | pyspark tutorial for beginners:

20. orderBy in pyspark | sort in pyspark | difference between orderby and sort in pyspark:

21. distinct and dropduplicates in pyspark | how to remove duplicate in pyspark | pyspark tutorial:

Azure Databricks Tutorial Playlist:

Azure data factory tutorial playlist:

ADF interview question & answer:
Comments

Thanks, very good explanation.

Ques:-
val lis = List("A$B", "C$D", "E$F")
val lis1 = lis.flatMap(x => x.split("$"))
println(lis1)
lis1.foreach(println) // does not split the elements, what is the reason?

But

val lis = List("A~B", "C~D", "E~F")
val lis1 = lis.flatMap(x => x.split("~"))
println(lis1)
lis1.foreach(println)

OutPut:-
A
B
C
D
E
F
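The likely reason: in Scala/Java, String.split takes a regular expression, and $ is a regex metacharacter (the end-of-string anchor), so split("$") never matches a literal dollar sign and each element comes back unsplit; escaping it as split("\\$") fixes it. ~ has no special regex meaning, which is why the second version works. The same regex semantics can be shown in Python with re.split:

```python
import re

# '$' is a regex anchor (end of string), so it must be escaped
# to match a literal dollar sign.
print(re.split(r"\$", "A$B"))   # escaped: splits on the literal '$'
print(re.split(r"~", "A~B"))    # '~' is not special: splits directly
```

Note that PySpark's split() also treats its pattern as a (Java) regex, so splitting on a dollar sign there likewise needs the escaped form '\\$'.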

subhashyadav