Preprocessing data for Machine Learning - Python Programming for Finance p. 9

Hello and welcome to part 9 of the Python for Finance tutorial series. In the previous tutorials, we've covered how to pull in stock pricing data for a large number of companies, how to combine that data into one large dataset, and how to visually represent at least one relationship between all of the companies. Now, we're going to try to take this data and do some machine learning with it!

Comments
Author

I cannot express how much I enjoy your videos. Thank you for making them!

fuba
Author

Just discovered your Python Finance series two days ago and am working along on a different screen while watching.
Thank you so much - the videos are very informative.

hankblack
Author

Thank you for the videos! However, I would suggest taking log returns, as they are more linear than classical returns.
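
One way to read "more linear" is that log returns add up across days, while simple returns compound multiplicatively. A minimal sketch of that difference, using made-up prices (the numbers are an assumption, not data from the series):

```python
import numpy as np
import pandas as pd

# Made-up closing prices, purely for illustration.
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

# Simple (classical) return: p_t / p_{t-1} - 1
simple_returns = prices.pct_change().dropna()

# Log return: ln(p_t / p_{t-1})
log_returns = np.log(prices / prices.shift(1)).dropna()

# Log returns sum to the log return over the whole period.
total_log_return = np.log(prices.iloc[-1] / prices.iloc[0])
print(np.isclose(log_returns.sum(), total_log_return))

# Simple returns have to be compounded instead of summed.
total_simple_return = prices.iloc[-1] / prices.iloc[0] - 1
print(np.isclose((1 + simple_returns).prod() - 1, total_simple_return))
```

For small daily moves the two kinds of return are numerically close, so the choice mostly matters when returns are aggregated over longer horizons.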

dmitrypetrov
Author

Hi, thanks for everything.
One point: what you calculated is not a percentage.

alie
Author

6:29 I think the divisor should be the previous day's price, not the current price.

hajaksksnsjksksbsnsn
Author

I feel like there is bias in the data (data snooping or look-ahead).
The percentage change for a one-day look-back period shows up earlier than we would have access to the data, and the same holds for the other periods.

For example:
df = pd.read_csv('sp500_joined_closes.csv', parse_dates=True, index_col=0)
df['AAPL_rets'] = df['AAPL'].pct_change()
df['AAPL_1d'] = (df['AAPL'].shift(-1) - df['AAPL']) / df['AAPL']
Both columns should have the same result.

I'm confused: each look-back period assumes that the future data is already available for use, which is a huge bias. Please correct me if I'm wrong.
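
The relationship between the two columns can be checked on a toy frame. This is only a sketch with made-up prices; whether the forward-looking column is a bias depends on how it is used. Assuming it serves as the value to predict (the target), it is expected to contain future information; used as an input feature, it would leak the future.

```python
import numpy as np
import pandas as pd

# Made-up closing prices, purely for illustration.
df = pd.DataFrame(
    {"AAPL": [100.0, 102.0, 101.0, 105.0]},
    index=pd.date_range("2017-01-02", periods=4, freq="D"),
)

# Backward-looking return: known at the close of each day.
df["AAPL_rets"] = df["AAPL"].pct_change()

# Forward-looking return: needs the next day's close.
df["AAPL_1d"] = (df["AAPL"].shift(-1) - df["AAPL"]) / df["AAPL"]

# The forward column holds the same values as the backward column,
# moved up by one row.
forward = df["AAPL_1d"].dropna().to_numpy()
backward_moved_up = df["AAPL_rets"].shift(-1).dropna().to_numpy()
print(np.allclose(forward, backward_moved_up))
print(df)
```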

adeshmallaHQ
Author

I still don't understand the shift part in the line: df["{}_{}d".format(ticker, i)] = (df[ticker].shift(-i) - df[ticker]) / df[ticker]

First question: you said that df["{}_{}d".format(ticker, i)] is the Adj Close value for i days in the future. But you also don't know df[ticker].shift(-i), because that just gives NaN since future data is not known, so how does this work? The equation is now a = (b - c)/c, where you don't know a or b.

And the other question: why do you fillna two times?
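
On the shift question, a small sketch may help. In a historical dataset, shift(-i) moves each later price up by i rows, so only the last i rows (which have no later price in the file) come out as NaN. The ticker name and prices below are made up for illustration.

```python
import pandas as pd

# Made-up closing prices for a hypothetical ticker "XYZ".
prices = pd.Series([10.0, 11.0, 12.1, 13.31, 14.641], name="XYZ")

i = 2  # look i rows ahead

# Row t of `future` holds the price from row t + i.
future = prices.shift(-i)

# a = (b - c) / c with b = price in i days, c = price today.
change = (future - prices) / prices

print(pd.DataFrame({"price": prices, "price_in_2d": future, "change_2d": change}))

# Only the last i rows have no future price available.
n_missing = int(change.isna().sum())
print(n_missing)
```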

tomvkgames
Author

minor is no longer supported in np.range() function

amaboh
Author

I am having a very stupid problem: some lists write the share class as .B or .A, while others use -B and -A. What is a good strategy to avoid this conflict? I keep running into this problem. Thank you
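
One possible strategy is to normalize every ticker to a single separator before using it as a key or file name. The helper below is hypothetical (normalize_ticker is not part of the tutorial code), and the choice of "-" as the default assumes the price source expects hyphens; swap the separator if yours expects dots.

```python
def normalize_ticker(ticker: str, sep: str = "-") -> str:
    """Unify '.' and '-' share-class separators to `sep` (hypothetical helper)."""
    cleaned = ticker.strip().upper()
    return cleaned.replace(".", sep).replace("-", sep)


# Example tickers written in both conventions.
raw = ["BRK.B", "BRK-B", "bf.b", "AAPL"]
normalized = [normalize_ticker(t) for t in raw]

# Duplicates collapse once both spellings map to the same form.
unique = sorted(set(normalized))
print(unique)
```

Applying the same function both when saving per-ticker files and when reading them back keeps the two spellings from drifting apart.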

rubcaspac
Author

I just ran this (9 Sep 19). I wasn't able to use tickers = df.columns.values.tolist(), as it says .values doesn't have .tolist(). I'm hoping this doesn't impact the next video.
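
It is not clear from the comment why .values.tolist() failed in that setup. Two alternatives that avoid going through .values are sketched below on a made-up frame; both should give a plain Python list of column names.

```python
import pandas as pd

# Made-up frame standing in for the joined closes.
df = pd.DataFrame({"AAPL": [1.0], "MSFT": [2.0]})

# Ask the column Index for a list directly.
tickers = df.columns.tolist()

# Or build the list with the built-in list() constructor.
tickers_alt = list(df.columns)

print(tickers)
```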

BrandonJacobson
Author

not pronx by that, no such thing as typox, type anyx and anyx can be perfx. type and talk can be perfx

zes
Author

I wanna hit the fucking monitor everytime he sings a word

robsonvonbrum