Use Regular Expression to split string into Dataframe columns (Pandas)

This video shows how regular expressions help when data arrives in raw, unformatted form. It demonstrates how to break up DataFrame rows stored as one continuous string, that is, several column values run together into a single string per row. Regular expressions come in handy for separating the individual column values, which have different data types.
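
As a rough illustration of the idea, here is a minimal sketch that splits one string column into typed columns with a single `str.extract` call. The row layout (date, id, salary, name separated by spaces) and the sample values are assumptions made for this example, not the data used in the video; the column names follow the ones used in the comments.

```python
import pandas as pd

# Hypothetical raw data: date of joining, id, salary and name in one string.
raw = pd.DataFrame({"data": [
    "2019-01-15 101 45000.50 Alice",
    "2020-07-01 102 52000 Bob",
]})

# One named group per target column. The optional fraction lets integer
# salaries such as "52000" match as well.
pattern = (
    r"(?P<DOJ>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<id>\d+)\s+"
    r"(?P<Salary>\d+(?:\.\d+)?)\s+"
    r"(?P<Emp_Name>[A-Za-z]+)"
)
parts = raw["data"].str.extract(pattern)

# str.extract returns strings, so convert each column to its real type.
parts["DOJ"] = pd.to_datetime(parts["DOJ"])
parts["id"] = parts["id"].astype(int)
parts["Salary"] = parts["Salary"].astype(float)
print(parts.dtypes)
```

Named groups become column names directly, which avoids assigning each column by position.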

********************GitHub Repo*****************

#DataScience #TheAIUniversity #Pandas
Comments
Author

Have you ever been given a single column and asked to split it into separate columns?

TheAIUniversity
Author

I tried about 10 different ways of doing this, and this is by far the best and shortest code I could find.

texasfossilguy
Author

I would be grateful if you could help resolve the NaN problem in the Salary column.

shailmodi
Author

This is exactly what I was looking for. Great presentation! Thanks.

thetpainghmoo
Author

Just to share some ideas, please see below:


import pandas as pd  # dataf is the DataFrame from the video, with the raw strings in 'data'

a = dataf['data'].str.split(' ', expand=True)  # split each string on spaces, one column per token
dataf['DOJ'] = a[0]  # the first token becomes the DOJ column
dataf['id'] = a[1]  # the second token becomes the id column
dataf['Salary'] = a[2]  # the third token becomes the Salary column
dataf['Emp_Name'] = dataf['data'].str.extract(r'([A-Za-z]\w*)', expand=True)  # first word that starts with a letter, as in the video


dataf['DOJ'] = pd.to_datetime(dataf['DOJ'])  # convert object to datetime
dataf['Salary'] = dataf['Salary'].astype(float)  # convert to float
dataf['id'] = dataf['id'].astype(int)  # convert to integer


dataf = dataf.iloc[:, 1:5]  # keep only the four new columns (drops 'data')
dataf  # result of the steps above

marceloperes
Author

You should have resolved that NaN problem in salary. Needs improvement.

HARSHRAJ-
Author

Thanks a lot. I think this works better for the salary: `dataf['Salary'] = dataf['data'].str.extract(r'(\d+\.\d+)', expand=True)`

KhalilYasser
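
The thread does not say what causes the NaN values in the video, so the following is only one plausible scenario: a pattern that requires a decimal point returns NaN for whole-number salaries. The sample rows and the relaxed pattern below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows; the second salary has no decimal part.
dataf = pd.DataFrame({"data": [
    "2019-01-15 101 45000.50 Alice",
    "2020-07-01 102 52000 Bob",
]})

# The pattern from the comment above needs a decimal point,
# so "52000" produces NaN.
strict = dataf["data"].str.extract(r"(\d+\.\d+)", expand=False)

# Assumed fix for this layout: take the number that sits right before
# the name and make the fractional part optional.
relaxed = dataf["data"].str.extract(r"(\d+(?:\.\d+)?)\s+[A-Za-z]", expand=False)

dataf["Salary"] = relaxed.astype(float)
print(dataf[strict.isna()])  # rows that the strict pattern missed
```

Checking `isna()` on the extracted column is a quick way to see which rows a pattern fails on before adjusting it.
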
Author

How can I do the following task? The Size column has sizes in KB as well as MB. To analyze them, we need to convert these to numeric values.
1. Extract the numeric value from the column
2. Multiply the value by 1,000 if the size is given in MB
18KB
25MB
50mb
120KB

Please make a video

mukundab
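
One possible way to approach the two steps in this question, sketched with the four sample values from the comment. The column names `Size` and `Size_KB` are assumptions, and the factor of 1,000 is taken from the question as written.

```python
import pandas as pd

df = pd.DataFrame({"Size": ["18KB", "25MB", "50mb", "120KB"]})

# Step 1: capture the number and the unit separately, ignoring case.
parts = df["Size"].str.extract(r"(?i)(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>kb|mb)")

# Step 2: multiply MB values by 1,000 so that everything ends up in KB.
factor = parts["unit"].str.lower().map({"kb": 1, "mb": 1000})
df["Size_KB"] = parts["value"].astype(float) * factor
print(df)
```

Mapping the unit to a factor keeps the conversion rule in one small dictionary, which is easy to extend if other units appear.
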
Author

I want to extract the values from a continuous string such as "name abc gender male salary 15 k loc hydrabad", with no commas or colons as separators, into a DataFrame. How can I do it?

devendrachaudhari
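
A minimal sketch for this kind of string, assuming the keys (`name`, `gender`, `salary`, `loc`) are known in advance and always appear in this order. The column name `raw` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"raw": ["name abc gender male salary 15 k loc hydrabad"]})

# Each key acts as a delimiter; the lazy groups capture the text up to the next key.
pattern = (
    r"name\s+(?P<name>.+?)\s+"
    r"gender\s+(?P<gender>.+?)\s+"
    r"salary\s+(?P<salary>.+?)\s+"
    r"loc\s+(?P<loc>.+)"
)
fields = df["raw"].str.extract(pattern)
print(fields)
```

If the keys can appear in any order, a fixed pattern like this no longer applies and a different approach is needed.
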
Author

Hi,
I have a DataFrame column that contains string data. I need to extract the text between parentheses and store it as a new column in the same DataFrame. Can you help me with this?
df[Col Name]: it is found in 2 erindale crt (TX). This is the column from which I need to extract the TX between the parentheses and save it as a separate column in the same DataFrame. Please help.

janakiyeluripati
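
A short sketch of one way to do this with `str.extract`. The new column name `State` and the second sample row are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Col Name": [
    "it is found in 2 erindale crt (TX)",
    "no code in this row",
]})

# Capture everything between the first "(" and the next ")".
df["State"] = df["Col Name"].str.extract(r"\(([^)]+)\)", expand=False)
print(df)
```

Rows without parentheses get NaN in the new column.
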
Author

If the file name is ABBV 29 NOV 19 PUT 81.5.csv,
then the following columns should be added to the corresponding DataFrame: Symbol = "ABBV", ExpiryDate = "29Nov19", OptionType = "PUT", StrikePrice = 81.5. Please help me with this.

creativeKDR
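
A possible sketch using Python's `re` module on the file name. The small `df` stands in for the CSV contents and is hypothetical, as is the `CALL` alternative, since the comment only shows a `PUT` example.

```python
import re
import pandas as pd

filename = "ABBV 29 NOV 19 PUT 81.5.csv"

# One named group per part of the file name.
pattern = re.compile(
    r"(?P<symbol>[A-Z]+) (?P<day>\d{1,2}) (?P<month>[A-Z]{3}) (?P<year>\d{2}) "
    r"(?P<type>PUT|CALL) (?P<strike>\d+(?:\.\d+)?)\.csv$"
)
m = pattern.match(filename)
if m is None:
    raise ValueError(f"Unexpected file name: {filename}")

# Hypothetical frame standing in for the data read from the CSV file.
df = pd.DataFrame({"price": [1.2, 1.3]})
df["Symbol"] = m["symbol"]
df["ExpiryDate"] = f"{m['day']}{m['month'].title()}{m['year']}"
df["OptionType"] = m["type"]
df["StrikePrice"] = float(m["strike"])
print(df)
```

Assigning a scalar to a new column repeats that value on every row, which matches the request.
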
Author

How do I change strings to floats in rows?

tanvikurademusic
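
One common option is `pd.to_numeric`, sketched here on made-up values. With `errors="coerce"`, anything that cannot be parsed becomes NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical column of numbers stored as strings, with one bad value.
df = pd.DataFrame({"Salary": ["45000.50", "52000", "n/a"]})

df["Salary"] = pd.to_numeric(df["Salary"], errors="coerce")
print(df["Salary"].dtype)
```

If every value is known to be a clean number, `df["Salary"].astype(float)` does the same job and fails loudly on bad input.
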
Author

You need to improve your regular expressions.
You are missing out on great capabilities!
But thank you for this video anyway; this is what I was looking for right now.

esmaelawad
Author

Hi, I need help solving a problem.

Assume two DataFrames:

df1 = pd.DataFrame({'Text': ['Some text 1', 'Some text 2', 'The monkey eats a banana', 'Some text 4']})
df2 = pd.DataFrame({'Keyword': ['apple', 'banana', 'chicken'], 'Type': ['fruit', 'fruit', 'meat']})

df1

                       Text
0               Some text 1
1               Some text 2
2  The monkey eats a banana
3               Some text 4

df2

   Keyword   Type
0    apple  fruit
1   banana  fruit
2  chicken   meat


Thus, the preferable outcome would be:

                       Text   Type
0               Some text 1      -
1               Some text 2      -
2  The monkey eats a banana  fruit
3               Some text 4      -


The problem, however, is that "banana" appears inside a sentence, not as a standalone value.

Thanks in advance

smstoaj
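
A minimal sketch of one way to get this outcome: build a single alternation pattern from the keywords, extract the first keyword found in each sentence, and map it to its type. It assumes the keywords in `df2` are lowercase and that at most one keyword per row matters.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Text': ['Some text 1', 'Some text 2', 'The monkey eats a banana', 'Some text 4']})
df2 = pd.DataFrame({'Keyword': ['apple', 'banana', 'chicken'], 'Type': ['fruit', 'fruit', 'meat']})

# Whole-word alternation of all keywords, escaped in case they contain regex characters.
pattern = r"\b(" + "|".join(map(re.escape, df2["Keyword"])) + r")\b"

# First keyword found in each sentence (NaN when none is present).
keyword = df1["Text"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Look up the type for each keyword and use "-" where nothing matched.
lookup = df2.set_index("Keyword")["Type"]
df1["Type"] = keyword.str.lower().map(lookup).fillna("-")
print(df1)
```

The `\b` word boundaries keep a keyword such as "apple" from matching inside a longer word like "pineapple".
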
Author

Please take a regex course first, but nice video.

uploadvoice
Author

Stopped watching after the NaN salary... This needs improvement.

sicmikeg