Machine Learning for Trading - Stocks - Shares - Python - AI - Deep Learning Course Algorithmic

Pandas, Time series analysis, Computational Investing, Algorithmic trading, Reinforcement learning for Trading

This course introduces students to the real-world challenges of implementing machine learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches like linear regression, KNN, and regression trees, and how to apply them to actual stock trading situations.

#python #machinelearning #ai
Comments

Thank you for your teaching!

0:01:31 3 mini courses
1. Manipulating Financial Data in Python
2. Computational Investing
3. Learning Algorithms for Trading

0:02:33 Textbooks
1. Python for Finance
2. What Hedge Funds Really Do
3. Machine Learning

0:04:24 Python Features

0:05:23 Comma Separated Values files with headers

0:12:21 Pandas dataframe

0:15:33 df = pd.read_csv('data.csv') # read the CSV into a dataframe

0:16:35 print('The last 5 lines of XXX data\n\n', df.tail()) # df.head() gives the first 5 lines

0:17:00 index column added by Pandas dataframe to access rows

0:17:20 Slicing to get the data for a range by index
print('The range from the 11th line to 20th line of 10 lines for XXX data\n\n', df[11:20 + 1])
print(df[33:99 + 1][['Date', 'High']])
df['Date'] = pd.to_datetime(df['Date'])
df_october = df[(df['Date'].dt.year == 2020) & (df['Date'].dt.month == 10)]
start_date = pd.to_datetime("2020-10-01")
end_date = pd.to_datetime("2020-10-31")
df_filtered = df[df["Date"].between(start_date, end_date)]

0:18:06 max(), mean(), min() methods in Pandas dataframe
df['Close'].max()
print('The maximum value of the "Close" column for XXX data\n', df['Close'].max())

0:30:47 Build a dataframe in Pandas with a time range
start_date = '2020-01-01'
end_date = '2020-01-31'
df_dates = pd.date_range(start_date, end_date)

0:34:43 df = pd.read_csv('data.csv', index_col = 'Date', parse_dates = True, usecols = ['Date', 'Close'])
# index_col = 'Date' already makes 'Date' the index, so a separate df.set_index("Date") is not needed here

0:36:36 df = df.dropna()

0:46:26 df = df.set_index("Date")
print(df["2020-10-01":"2020-10-02"][['Open', 'High']])

0:50:30 Plotting
plt_df = df.plot(title = 'title', fontsize = 12)
plt_df.set_xlabel('Date')
plt_df.set_ylabel('Price')
plt.show()

0:53:05 Normalize data: divide every row by the first row
df = df / df.iloc[0] # equivalently df.div(df.iloc[0], axis = 1)

0:55:45 Slicing NumPy ndarrays: nda[0:3, -5:-2]

1:02:30 Creating ndarrays
np.empty(5)
np.empty((5, 3))
print(np.ones((2, 3, 4, 5)))

1:06:01 Generating random numbers
print(np.random.random((3, 4))) # floats from 0.0 up to (but not including) 1.0
print(np.random.randint(10)) # an integer from 0 to 9
print(np.random.randint(6, 10)) # an integer from 6 to 9
print(np.random.randint(10, 110, size = 5)) # 5 integers in an array
print(np.random.randint(1, 11, size = (2, 3, 4, 5)))

1:09:31 Checking the rows and columns of ndarrays
print(df.shape)
print(df.shape[0])
print(df.shape[1])

1:10:16 Checking the dimension of ndarrays
print(len(df.shape))

1:10:35 Counting the total number of elements of an ndarray and checking their type
print(df.size)
print(df.dtype)

1:12:18 Calculation for ndarrays
print('Sum of all elements: ', df.sum())
print('Sum of each column: ', df.sum(axis = 0))
print('Sum of each row: ', df.sum(axis = 1))

1:14:03 print('Maximum of each column: ', df.max(axis = 0))
print('Minimum of each row: ', df.min(axis = 1))
print('Mean of all elements: ', df.mean())

1:15:30 Checking the index of a specific condition
print('The index of the maximum value: ', df.argmax())

1:19:05 Accessing ndarray elements

1:20:55 Slicing columns with a step: df[:, 0:3:2]

1:21:43 Assigning ndarray elements df[0, :] = [1, 2, 3, 4]

1:23:08 Indexing ndarrays with other ndarrays
nda1 = np.array([1, 1, 2, 4])
print(nda2[nda1]) # the elements of nda2 at the indices stored in nda1

1:24:40 Boolean ndarrays

1:25:40 Masking
mean_df = df.mean()
print(df[df < mean_df])
df[df < mean_df] = mean_df

1:26:27 Arithmetic operations by elements' indices

1:27:05 Division on integer ndarrays follows C semantics, not Python's

1:30:13 Global statistics

1:31:28 df.mean(), df.median(), df.std()

1:33:21 Rolling statistics
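A minimal sketch, assuming df holds price data with a 'Close' column as above:
rolling_mean = df['Close'].rolling(window = 20).mean() # 20-day rolling mean
rolling_std = df['Close'].rolling(window = 20).std() # 20-day rolling standard deviation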

1:37:09 Bollinger bands
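Bollinger bands are the rolling mean plus/minus two rolling standard deviations; reusing the sketch above:
upper_band = rolling_mean + 2 * rolling_std
lower_band = rolling_mean - 2 * rolling_std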

1:44:39 Daily returns

1:50:24 Shift column
diff_df = (df[1:] / df[:-1].values) - 1
diff_df = (df / df.shift(1)) - 1
diff_df.iloc[0, :] = 0

1:50:47 Cumulative returns
cum_df = (df.iloc[-1] / df.iloc[0]) - 1 # total return over the whole period
cum_df = (df / df.iloc[0]) - 1 # cumulative return for every day relative to day one

1:53:09 Historical financial data

2:02:53 Pandas fillna()
df.fillna(0, inplace = True)
df.ffill(inplace = True)
df.bfill(inplace = True)

2:12:40 Plot a histogram
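A sketch, assuming df as above and matplotlib.pyplot imported as plt:
daily_returns = (df['Close'] / df['Close'].shift(1)) - 1
daily_returns.hist(bins = 20) # distribution of daily returns
plt.axvline(daily_returns.mean(), color = 'w', linestyle = 'dashed', linewidth = 2) # mark the mean
plt.show()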

2:25:00 Scatter plots
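A sketch of the fit-a-line idea; a daily_returns dataframe with 'SPY' and 'XYZ' columns (NaNs filled) and numpy as np are assumptions:
daily_returns.plot(kind = 'scatter', x = 'SPY', y = 'XYZ') # stock vs market daily returns
beta, alpha = np.polyfit(daily_returns['SPY'], daily_returns['XYZ'], 1) # slope = beta, intercept = alpha
plt.plot(daily_returns['SPY'], beta * daily_returns['SPY'] + alpha, '-', color = 'r') # fitted line
plt.show()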

2:31:20 Daily portfolio value
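A sketch of the pipeline from prices to daily portfolio value; the prices dataframe, the allocations, and the start value below are assumptions:
start_val = 1000000
allocs = [0.4, 0.4, 0.1, 0.1] # one allocation per column of prices
normed = prices / prices.iloc[0] # normalize prices to the first day
alloced = normed * allocs # weight each asset by its allocation
pos_vals = alloced * start_val # dollar value of each position
port_val = pos_vals.sum(axis = 1) # total portfolio value per day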

2:35:57 Portfolio statistics

2:40:21 Sharpe ratio (risk-adjusted return)
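A sketch of the annualized Sharpe ratio for daily data, assuming port_val from the sketch above and a risk-free rate of 0:
daily_rets = ((port_val / port_val.shift(1)) - 1).iloc[1:] # daily returns, dropping the first NaN
sharpe_ratio = (252 ** 0.5) * daily_rets.mean() / daily_rets.std() # sqrt(252) annualizes daily sampling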

2:52:30 Optimizer

3:02:40 Convex problems

3:05:49 Building a parameterized model

3:24:52 Framing the problem
provide a function to minimize
provide an initial guess for x
call the optimizer

3:26:53 Ranges and constraints
Ranges: limits on values for x
Constraints: properties of x that must be true
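A minimal scipy.optimize sketch tying the framing steps and the ranges/constraints together; the objective f, the guess, and the limits are assumptions:
import scipy.optimize as spo

def f(x):
    return (x[0] - 1.5) ** 2 + 0.5 # example function to minimize, lowest at x = 1.5

x_guess = 2.0 # initial guess for x
result = spo.minimize(f, x_guess, method = 'SLSQP',
                      bounds = [(0.0, 10.0)], # range: limits on values for x
                      constraints = [{'type': 'ineq', 'fun': lambda x: x[0] - 1.0}], # property that must hold: x >= 1
                      options = {'disp': True})
print(result.x, result.fun) # the minimizing x and its function value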

3:29:38 Types of funds
ETF: buy/sell like stocks; baskets of stocks; transparent; liquid
Mutual fund: buy/sell at end of day; quarterly disclosure; less transparent; large cap
Hedge fund: buy/sell by agreement; no disclosure; not transparent

3:37:16 Incentives: How are they compensated?

4:01:13 What is in an order?

4:03:58 The order book

4:11:34 How orders get to the exchange

4:15:40 How hedge funds exploit market mechanics

4:19:48 Additional order types

4:29:35 Why company value matters

4:35:51 The value of a future dollar

4:45:25 What is the value?
intrinsic value

4:45:58 Book value: total assets minus intangible assets and liabilities

4:48:05 Market capitalization = number of shares x price

4:57:09 Definition of a portfolio

5:04:09 The CAPM equation
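For reference, the equation is r_i(t) = beta_i * r_m(t) + alpha_i(t): the return of stock i is its beta times the market return plus a residual alpha, whose expected value CAPM takes to be zero.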

5:17:08 Arbitrage pricing theory

5:33:03 Characteristics of Technical Analysis

5:55:02 How data is aggregated

6:00:43 Stock splits

6:07:04 Dividends

6:14:15 Survivor bias

6:17:07 EMH assumption

6:21:02 3 forms of the EMH
Weak: future prices cannot be predicted by analyzing historical prices
Semi-strong: prices adjust rapidly to new public information
Strong: prices reflect all information, public and private

6:28:36 Grinold's Fundamental Law
Performance, Skill, Breadth

6:29:24 Performance = Skill * Breadth**(1/2)
Skill dominates breadth, but breadth must be at least 1 so it does not diminish overall performance.


6:44:16 Real World
RenTec (Renaissance Technologies) trades about 100k times per day (some say 2k per 0.3 sec)
Berkshire Hathaway holds about 120 stocks

6:48:55 IR = IC * BR**(1/2)
IR: Information Ratio
IC: Information Coefficient (correlation of forecasts to returns)
BR: Breadth (number of trading opportunities per year)

6:53:00 What is risk?

6:58:53 The importance of covariance

7:02:26 Mean Variance Optimization
Inputs: expected return, volatility, covariance, target return
Output: asset weights for the portfolio that minimize risk
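A minimal sketch of this optimization with scipy; the expected returns, covariance matrix, and target are made-up placeholders:
import numpy as np
import scipy.optimize as spo

exp_rets = np.array([0.10, 0.07, 0.12]) # placeholder expected returns
cov = np.array([[0.090, 0.020, 0.010],
                [0.020, 0.040, 0.015],
                [0.010, 0.015, 0.160]]) # placeholder covariance matrix
target = 0.09 # placeholder target return

def port_var(w):
    return w @ cov @ w # portfolio variance, the risk to minimize

cons = [{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}, # weights sum to 1
        {'type': 'eq', 'fun': lambda w: w @ exp_rets - target}] # hit the target return
res = spo.minimize(port_var, np.ones(3) / 3, method = 'SLSQP',
                   bounds = [(0.0, 1.0)] * 3, constraints = cons)
print(res.x) # asset weights for the portfolio that minimize risk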

jaekunyoo

Thanks! Hi Tucker and Dave, I am really enjoying the power of pandas and Python.

merv

Why do the majority of people use price-lagging indicators? They don't give profits in short-term trading. Could you use price action trading instead, since most viewers want to see price action as the main trading tool? Please use price action alone for trading with your machine learning; that would be the game changer.

munivoltarc

I see a lot of stock market analysis, and the majority use moving averages. Why don't they talk much about quantitative analysis using a price-action top-down approach, Elliott wave theory, or Wyckoff? Could you use these price-action methods in machine learning models, in combination with your technical indicators?

munivoltarc

For anyone stuck on the def normalise_data function, you can use this: return df / df.head(1).squeeze(). Had a slight problem on Colab.

merv

Could you explain what volume actually means in a particular time period? For example, if a trader bought 10 shares and another trader sold those 10 shares to the buyer, is that 20 shares of volume? Please explain.

munivoltarc

Could you also give us the code from the presentation?

costelbossul

Where did you get the video? I want to find the source code and I can't find anything.

plugk

7:08:44 Machine Learning

7:09:13 The Machine Learning problem

7:10:00 The model in ML: observations are multi-dimensional, the prediction is single-dimensional

7:11:42 Supervised regression learning
regression: numerical prediction
supervised: provide example x, y
learning: train with data

7:13:00 Linear regression (parametric)
k nearest neighbor (KNN) (instance based)
decision trees
decision forests

7:14:50 Example Training Episode Robot car

7:17:52 How it works with stock data

7:26:00 Backtesting

7:29:40 Problems with regression
noisy and uncertain
challenging to estimate confidence
holding time, allocation

7:31:19 Policy Learning RL

7:33:25 Parametric regression

7:37:37 K nearest neighbor(KNN)

7:40:49 Parametric or non-parametric

7:45:42 Training and testing

7:48:31 Learning APIs
For Linear regression:
learner = LinRegLearner()
learner.train(Xtrain, Ytrain)
y = learner.query(Xtest)
For KNN:
learner = KNNLearner(K = 3)
learner.train(Xtrain, Ytrain)
y = learner.query(Xtest)

7:49:46 Example for linear regression
class LinRegLearner():
    def __init__(self):
        pass
    def train(self, x, y):
        self.m, self.b = linreg(x, y) # any line-fitting routine, e.g. np.polyfit(x, y, 1)
    def query(self, x):
        y = self.m * x + self.b
        return y
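By the same pattern, a minimal KNNLearner matching the API above; a mean-of-neighbors sketch assuming 2-D numpy feature arrays, not the course's exact code:
import numpy as np

class KNNLearner():
    def __init__(self, K = 3):
        self.K = K
    def train(self, Xtrain, Ytrain):
        self.X = np.atleast_2d(Xtrain)
        self.Y = np.asarray(Ytrain)
    def query(self, Xtest):
        preds = []
        for x in np.atleast_2d(Xtest):
            dists = np.linalg.norm(self.X - x, axis = 1) # distance to every training point
            nearest = np.argsort(dists)[:self.K] # indices of the K closest points
            preds.append(self.Y[nearest].mean()) # predict the mean of their Y values
        return np.array(preds)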

7:52:23 A closer look at KNN solutions

7:59:14 Metric 1: RMS error rmse = (sum((Ytest - Ypredict) ** 2) / N) ** (1/2)

8:02:11 Cross validation: typically 60% of the data for training and 40% for testing; cross validation cuts the data into slices (e.g. 80:20) and rotates which slice is the test set

8:04:07 Metric2: Correlation

8:07:13 Overfitting
The train and test results diverge

8:11:17 A few other considerations

8:13:53 Ensemble learners: take the mean of the results from multiple types of models

8:16:45 How to build an ensemble?

8:18:19 Bootstrap aggregating - bagging
n: number of instances
n': number in each bag, drawn at random with replacement
m: number of bags (different models); see the sketch below
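A sketch of bagging as just described; the BagLearner wrapper and numpy-array inputs are assumptions, not the course's code:
import numpy as np

class BagLearner():
    def __init__(self, learner, bags = 20, **kwargs):
        self.learners = [learner(**kwargs) for _ in range(bags)] # m bags of one model type
    def train(self, X, Y):
        n = len(X)
        for lrn in self.learners:
            idx = np.random.randint(0, n, size = n) # n' = n samples, drawn with replacement
            lrn.train(X[idx], Y[idx])
    def query(self, Xtest):
        return np.mean([lrn.query(Xtest) for lrn in self.learners], axis = 0) # mean of the ensemble
For example, BagLearner(KNNLearner, bags = 20, K = 3) would train 20 KNN learners, each on its own bootstrap sample.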

8:23:32 Boosting: AdaBoost
an ensemble learning algorithm that combines weak learners to make a strong learner

8:26:39 Boosting and bagging
wrappers for existing methods
reduces error
reduces overfitting

8:27:41 Reinforcement Learning

8:28:10 The Reinforcement Learning problem
state policy action reward

8:32:05 Q: Trading as an RL problem

8:34:00 Mapping trading to RL

8:35:51 Markov decision problems
Set of states S
Set of actions A
Transition function T[s, a, s']
Reward function R[s, a]
Find the policy π*(s) that will maximize reward (the * denotes 'optimal')

8:38:14 Unknown transitions and rewards
Model-based
Build model of T[s, a, s'] R[s, a]
Value/Policy iteration
Model-free Q-Learning

8:41:09 What to optimize?
infinite horizon
finite horizon
discounted reward - for Q-Learning

8:47:41 Q: Which gets $1M?

8:49:14 RL summary
RL algos solve MDPs
S, A, T[s, a, s'], R[s, a]
Find π(s) -> a
Map trading to RL

8:51:03 Q-Learning - a model-free approach

8:51:42 What is Q? - a table, not greedy
Q[s, a] = immediate reward + discounted reward (for future actions)
How to use Q?
π(s) = argmax_a(Q[s, a])
π*(s) uses the optimal table Q*[s, a]

8:54:34 Q-Learning procedure
Big picture
select training data
iterate over time <s, a, s', r>
test policy π
repeat until convergence
Details
set start time, init Q[]
compute s
select a
observe r, s' <s, a, s', r>
update Q

8:57:57 Update rule
alpha: learning rate, 0 to 1 (e.g. 0.2); larger means faster learning
gamma: discount rate, 0 to 1; a low gamma discounts later rewards heavily
Q'[s, a] = (1 - alpha) * Q[s, a] + alpha * improved estimate
Q'[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * later rewards)
Q'[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q[s', argmax_a'(Q[s', a'])])
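The same update as one line of numpy over a 2-D Q table (the names Q, s, a, s_prime, r, alpha, gamma are assumed to be defined):
Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q[s_prime].max()) # Q[s_prime].max() is max over a' of Q[s', a']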

9:03:04 Two finer points
Success depends on exploration
choose a random action with probability c

9:04:35 The trading problem: Actions Buy Sell Nothing

9:07:58 Q: The trading problem: Rewards

9:08:29 The trading problem: State

9:10:31 Creating the state
state is an integer
discretize each factor
combine

9:12:23 Discretizing
stepsize = size(data) / steps
data.sort()
for i in range(0, steps):
    threshold[i] = data[(i + 1) * stepsize]

9:14:17 Q-Learning Recap
Building a model
define states, actions, rewards
choose in-sample training period
iterate: Q-table update
backtest
Testing a model
backtest on later data

9:15:54 Dyna-Q Big Picture
Q-Learn
init Q table
observe s
execute a, observe s', r
update Q with <s, a, s', r>
repeat => expensive
Dyna-Q
Learn model: T (state transition function), R (reward function)
Hallucinate experience
Update Q
repeat 100 - 200 => cheap
T'[s, a, s'] R'[s, a] update each model
s = random
a = random
s' = infer from T[s, a]
r = R[s, a]
update Q w/ <s, a, s', r>

9:20:08 Learning T
T[s, a, s'] prob s, a -> s'
init Tc[] = 0.00001 (c stands for count)
while executing, observe s, a, s'
increment Tc[s, a, s']

9:21:43 Q: How to evaluate T?
T[s, a, s'] = Tc[s, a, s'] / sum_i Tc[s, a, i]
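In numpy, with Tc a 3-D count array indexed [s, a, s'], the normalization is one line (a sketch, not the course's code):
T = Tc / Tc.sum(axis = 2, keepdims = True) # turn counts into transition probabilities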

9:23:23 Learning R
R[s, a]: expected reward for state s and action a
r: immediate reward
R'[s, a] = (1 - alpha) * R[s, a] + alpha * r

9:25:03 Dyna-Q recap
Q-Learn
init Q table
observe s
execute a, observe s', r
update Q with <s, a, s', r>
repeat => expensive
Dyna-Q
T'[s, a, s'] R'[s, a] update each model
s = random
a = random
s' = infer from T[s, a]
r = R[s, a]
update Q w/ <s, a, s', r>

9:25:59 Interview with Quandl founder

Nov 4, 2023 Mon 14:32 PST

jaekunyoo