Random Forest in Python - Machine Learning From Scratch 10 - Python Tutorial

Get my Free NumPy Handbook:

In this Machine Learning from Scratch Tutorial, we are going to implement a Random Forest algorithm using only built-in Python modules and numpy. We will also learn about the concept and the math behind this popular ML algorithm.

~~~~~~~~~~~~~~ GREAT PLUGINS FOR YOUR CODE EDITOR ~~~~~~~~~~~~~~

📓 Notebooks available on Patreon:

If you enjoyed this video, please subscribe to the channel!

The code can be found here:

You can find me here:

#Python #MachineLearning

----------------------------------------------------------------------------------------------------------
* This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏
Comments

I thoroughly enjoyed learning (1) Decision Tree and (2) Random Forest from your videos. Thanks a lot. The decision tree program is sleek and modular, and easy to understand and remember. If you add some points from your lecture as comments, it will be a great learning tool.

airesearch

Again, great work, as in the others. I learned how to code ML algorithms from you. Thanks a lot. I added random feature selection and random row selection to your algorithm. Interested friends can try it:

import numpy as np
from collections import Counter

from dt import DecisionTree  # the DecisionTree class from the previous video

def bootstrap_sample(X, y):
    n_samples, n_columns = X.shape
    # random row selection
    n_samples_row = int(n_samples / 10)
    row_idxs = np.random.choice(n_samples, size=n_samples_row, replace=False)
    # random feature selection
    n_samp_col = int(np.sqrt(n_columns))
    col_idxs = np.random.choice(n_columns, size=n_samp_col, replace=False)
    # create the sub-dataset from the selected rows and columns
    newX = X[row_idxs][:, col_idxs]
    return newX, y[row_idxs], col_idxs

def most_common_label(y):
    counter = Counter(y)
    most_common = counter.most_common(1)[0][0]
    return most_common

class RandomForest:

    def __init__(self, n_trees=100, min_samples_split=2, max_depth=10, n_feats=None):
        self.n_trees = n_trees
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.n_feats = n_feats
        self.trees = []

    def fit(self, X, y):
        self.trees = []
        self.rand_feats = []
        for _ in range(self.n_trees):
            X_sample, y_sample, rand_feat = bootstrap_sample(X, y)
            tree = DecisionTree(min_samples_split=self.min_samples_split,
                                max_depth=self.max_depth, n_feats=self.n_feats)
            tree.fit(X_sample, y_sample)
            self.trees.append(tree)
            # remember which feature columns this tree was trained on
            self.rand_feats.append(rand_feat)

    def predict(self, X):
        tree_preds = []
        for j, tree in enumerate(self.trees):
            # select the feature columns the j-th tree was trained on
            new_X = X[:, self.rand_feats[j]]
            tree_preds.append(tree.predict(new_X))
        # majority vote: swap to one row of tree predictions per sample
        tree_preds = np.swapaxes(np.array(tree_preds), 0, 1)
        y_pred_final = [most_common_label(tree_pred) for tree_pred in tree_preds]
        return np.array(y_pred_final)

muratsahin

Your code looks like bagging, but in a random forest we randomly choose a subset of features at each node of the decision tree.

The process of building a tree is randomized: at the stage of choosing the optimal feature on which to split, the search is not over the entire set of features but over a random subset of size q.

Special attention should be paid to the fact that a random subset of size q is selected anew every time another node needs to be split. This is the main difference between this approach and the random subspace method, where a random subset of features is selected once, before constructing the base learner.

Am I wrong?
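To illustrate the distinction, here is a minimal sketch of per-split feature subsampling, as a random forest's tree builder would do it (the greedy Gini splitter and all identifiers here are illustrative, not the video's code):

```python
import numpy as np

def best_split(X, y, n_feats):
    # draw a FRESH random subset of features for this split; a random
    # forest repeats this at every node, whereas the random-subspace
    # method draws the subset once per tree
    feat_idxs = np.random.choice(X.shape[1], size=n_feats, replace=False)
    best_feat, best_thresh, best_gini = None, None, float("inf")
    for feat in feat_idxs:
        for thresh in np.unique(X[:, feat]):
            left = y[X[:, feat] <= thresh]
            right = y[X[:, feat] > thresh]
            if len(left) == 0 or len(right) == 0:
                continue
            # weighted Gini impurity of the two children
            gini = sum(
                len(part) / len(y) * (1.0 - np.sum((np.bincount(part) / len(part)) ** 2))
                for part in (left, right)
            )
            if gini < best_gini:
                best_feat, best_thresh, best_gini = feat, thresh, gini
    return best_feat, best_thresh
```

Calling this inside the tree's recursive split (with n_feats around sqrt of the total) is what makes it a random forest rather than plain bagging.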

fedorlaputin

Hi, nice tutorial. What about plotting each tree in a notebook?

abseenahabeeb

For regression, instead of most_common_label we can calculate the mean of the predictions, right?
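That is the usual approach: the forest stays the same and only the aggregation step changes. A small sketch (the function names are illustrative, assuming predictions arrive as one row per tree):

```python
import numpy as np
from collections import Counter

def aggregate_classification(tree_preds):
    # majority vote over the trees, one column of votes per sample
    return np.array([Counter(col).most_common(1)[0][0] for col in tree_preds.T])

def aggregate_regression(tree_preds):
    # for regression, average the trees' numeric predictions instead
    return tree_preds.mean(axis=0)
```

Both expect an array of shape (n_trees, n_samples), which is what collecting each tree's predict output produces.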

dheerajkumark

What about making a playlist like this for deep learning? It would be very helpful!

fedorlaputin

You referred to your previous video to better understand this one, but you did not include a link to it anywhere. This is not good. You should put the previous video's link in the description or a pinned comment.

MdMainuddincse

Shouldn't it be
np.random.choice(n_samples, size=subset_sample_value, replace=True)
so that it chooses a subset of the data with replacement?
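For reference, a classic bootstrap does sample with replacement, drawing a resample the same size as the dataset so that rows can repeat; a sketch (the function name here is illustrative):

```python
import numpy as np

def bootstrap_rows(X, y):
    # draw n_samples row indices WITH replacement: on average only
    # ~63.2% of the original rows appear, many of them duplicated
    n_samples = X.shape[0]
    idxs = np.random.choice(n_samples, size=n_samples, replace=True)
    return X[idxs], y[idxs]
```

With replace=False and a smaller size, as in the code above, each tree instead sees a strict subsample without duplicates, which is subsampling (pasting) rather than bootstrapping.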

BTW, you should make a paid course; I would surely purchase it.
I have bought lots of Udemy courses; yours is the best, and it's free lol.

All the other courses just give a slight overview (theory), then import scikit-learn:
1. import data: do feature scaling, feature engineering, and other necessary data-preparation stuff, then import the model.
2. train_test_split.
3. fit.
4. predict.
5. accuracy.
done lol.

yours is the best.

jacjacl

Here I see the RandomForest class uses almost all the __init__ parameters from DecisionTree. Can I use super() to get the __init__ from the DecisionTree class instead of manually copying the parameters?
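Mechanically, yes: subclassing DecisionTree and calling super().__init__() would work. But a forest has trees rather than being one, so a common alternative is to forward keyword arguments instead of inheriting; a sketch (the stub classes here are illustrative, not the video's full code):

```python
class DecisionTree:
    def __init__(self, min_samples_split=2, max_depth=10, n_feats=None):
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.n_feats = n_feats

class RandomForest:
    # keep the has-a relationship: store the tree settings once and
    # forward them to every DecisionTree instead of inheriting them
    def __init__(self, n_trees=100, **tree_kwargs):
        self.n_trees = n_trees
        self.tree_kwargs = tree_kwargs

    def _make_tree(self):
        return DecisionTree(**self.tree_kwargs)
```

This avoids re-listing the tree parameters in two places, and new DecisionTree parameters become available on the forest with no changes.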

jasonyam

Hi, np.swapaxes is giving this error:
"numpy.AxisError: axis2: axis 1 is out of bounds for array of dimension 1"
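That error means the array reaching np.swapaxes is 1-D (for example, an empty list converted to an array because no per-tree predictions were collected) rather than the expected 2-D (n_trees, n_samples) array; a small repro:

```python
import numpy as np

# np.swapaxes(a, 0, 1) needs at least a 2-D array
empty = np.array([])  # 1-D: what an empty prediction list becomes
try:
    np.swapaxes(empty, 0, 1)
except Exception as e:  # AxisError (numpy.exceptions.AxisError on NumPy 2.x)
    print(type(e).__name__, e)

# with one row of predictions per tree, the swap works
preds = np.array([[0, 1], [1, 1], [0, 1]])  # 3 trees x 2 samples
per_sample = np.swapaxes(preds, 0, 1)       # 2 samples x 3 trees
```

So the fix is to make sure each tree's predictions are actually appended before the list is converted and swapped.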

prashantsharmastunning

Can you please make one more AdaBoost video to complete this series?

jasony