Spark Scenario Based Question | Deal with Ambiguous Column in Spark | Using PySpark | LearntoSpark

In this video, we will learn how to solve the ambiguous column issue while reading a file in Spark.

Fb page:

Dataset:

Comments
Author

In PySpark we can simply apply
df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")

In newer versions of PySpark, duplicate columns are displayed with an index suffix by default,
so create a new column referencing any one of the duplicate columns and then drop the duplicates; that should work (a runnable sketch follows below this comment).

Thank you so much for this playlist Sir!

Akshaykumar-puvi
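For illustration, here is a minimal runnable sketch of the rename-and-drop approach from the comment above, assuming a hypothetical DataFrame whose duplicate "name" field was auto-suffixed to name0 and name4 when the file was read:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ambiguous-columns").getOrCreate()

# Hypothetical data standing in for a file whose duplicate "name" field
# was auto-suffixed to name0 and name4 on read.
df = spark.createDataFrame(
    [("John", "Mobile", "John Sr"), ("Mary", "Laptop", "Mary K")],
    ["name0", "product", "name4"],
)

# Keep one duplicate under the desired name and drop both suffixed columns.
df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")
df_final.show()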
Author

Before unwrapping the inner JSON, we can rename the name column and then unwrap the inner JSON, right? (a sketch of that follows below this comment)

bhaskarreddy-wtrc
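A rough sketch of that idea, assuming a hypothetical schema in which a "name" field exists both at the top level and inside an inner struct called details:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("rename-before-unwrap").getOrCreate()

# Hypothetical nested record: "name" appears at the top level and inside "details".
schema = StructType([
    StructField("name", StringType()),
    StructField("details", StructType([
        StructField("name", StringType()),
        StructField("city", StringType()),
    ])),
])
df = spark.createDataFrame([("John", ("John Sr", "Chennai"))], schema)

# Rename the top-level column first, then unwrap the inner struct without a clash.
flat = df.withColumnRenamed("name", "customer_name").select("customer_name", "details.*")
flat.show()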
Author

Hi Azarudeen, when I convert the JSON to a DataFrame, one of the ambiguous columns comes out as null. What should I do in that case? (one possible cause is sketched below this comment)

sushantshekhar
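Without seeing the file it is hard to be sure, but one common cause worth ruling out (an assumption on my part) is a multi-line JSON file read without the multiLine option, which tends to produce null columns or a _corrupt_record column:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multiline-json").getOrCreate()

# Hypothetical path; if each JSON record spans several lines,
# reading without multiLine often yields nulls or _corrupt_record.
df = spark.read.option("multiLine", True).json("/tmp/customers.json")
df.printSchema()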
Author

Wanted to deal with duplicate columns as well... This is nice

akshayanand
Author

Great work. Keep posting new use cases. You will definitely make it big. Thank you.

ashutoshrai
Author

I have made a Python machine learning web app. Can I do the same with PySpark MLlib?
If yes, then how?
I have used Heroku for my Python ML apps.

bhavitavyashrivastava
Author

Thanks for your efforts. Amazing work.
Could you please put this logic in Spark Scala also?

ramyagudivaka
Author

Creating our own schema does not help, does it? (see the sketch below this comment)

subramanyams
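For reference, a minimal sketch of passing a user-defined schema to the JSON reader (the path and field names here are hypothetical); as the comment suggests, this alone may not remove the ambiguity if the source itself repeats a field name:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("json-with-schema").getOrCreate()

# Hypothetical schema and path for the input file.
schema = StructType([
    StructField("name", StringType()),
    StructField("product", StringType()),
    StructField("mob", StringType()),
])
df = spark.read.schema(schema).json("/tmp/customers.json")
df.printSchema()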
Author

Can't we rename the columns with this code?

df_cols = df.columns   # original column names read from the file
lst = []
for i in df_cols:
    if i in lst:
        i = i + "new"
    lst.append(i)

It will check whether the column already exists in the list and, if it does, append "new" to it. As simple as that. Indirectly you are just counting the occurrences and then appending; instead of that, we can do the above (applying the renamed list is sketched below this comment).

Output:
['name', 'product', 'address', 'mob', 'namenew']

ayushmittal
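To apply such a de-duplicated list of names, one option (a sketch with hypothetical data) is to pass it to toDF, which renames the columns positionally:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("apply-renamed-columns").getOrCreate()

# Hypothetical DataFrame with a duplicate "name" column.
df = spark.createDataFrame(
    [("John", "Mobile", "Chennai", "12345", "John Sr")],
    ["name", "product", "address", "mob", "name"],
)

# De-duplicate the names as in the comment above, then assign them with toDF.
lst = []
for c in df.columns:
    lst.append(c + "new" if c in lst else c)

df_renamed = df.toDF(*lst)
df_renamed.printSchema()   # name, product, address, mob, namenew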
Author

Can you share a project template for a PySpark project to submit a job to a cluster?

ravikirantuduru
Author

Bro, can you make a video on unit testing?

manojkalyan
Author

Good one... please post it in Scala as well!

Shiva-kztn
Author

Hi sir, could you please explain the same in Spark Scala 🙏

ppriya
Author

Sir, could you please explain the same thing in Spark Scala in the next video?

ppriya