Mask R-CNN on Custom Dataset | Practical Implementation


In this video, I have explained step by step how to train Mask R-CNN on a custom dataset.

If you have any questions about what we covered in this video, feel free to ask in the comment section below and I'll do my best to answer your queries.

Please consider clicking the SUBSCRIBE button to be notified of future videos, and thank you all for watching.
Support my channel 🙏 by LIKE, SHARE & SUBSCRIBE

Instance Segmentation with Mask R-CNN:
Mask R-CNN is a deep neural network designed to solve the instance segmentation problem in machine learning and computer vision. In other words, it can separate the different objects in an image or a video: you give it an image, and it gives you the object bounding boxes, classes, and masks.
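For readers following along with the Matterport Mask R-CNN repository used in this video, that input/output behavior is exposed through a small API. Below is a minimal inference sketch under some assumptions: the class count, the test image name, and the weights file mask_rcnn_custom.h5 are placeholders for whatever your own training produced.

import skimage.io
import mrcnn.model as modellib
from mrcnn.config import Config

class InferenceConfig(Config):
    NAME = "custom"
    NUM_CLASSES = 1 + 1    # background + 1 object class (adjust for your dataset)
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1     # detect one image at a time

# Build the model in inference mode and load the weights trained on the custom dataset.
model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir="logs")
model.load_weights("mask_rcnn_custom.h5", by_name=True)  # hypothetical weights file

image = skimage.io.imread("test.jpg")  # hypothetical test image
r = model.detect([image], verbose=0)[0]
# r['rois'] holds the bounding boxes, r['class_ids'] the classes, r['masks'] the per-instance masks
print(r['rois'].shape, r['class_ids'], r['masks'].shape)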
Mask R-CNN has two stages. First, it generates proposals for regions of the input image that might contain an object. Second, based on the first-stage proposals, it predicts the class of each object, refines its bounding box, and generates a pixel-level mask of the object. Both stages are connected to the backbone network.
What is the backbone? The backbone is an FPN-style deep neural network. It consists of a bottom-up pathway, a top-down pathway, and lateral connections. The bottom-up pathway can be any ConvNet, usually a ResNet or VGG, which extracts features from the raw image. The top-down pathway generates a feature pyramid whose levels match the sizes of the bottom-up pathway's feature maps. The lateral connections are convolution and addition operations between the corresponding levels of the two pathways. FPN outperforms single ConvNets mainly because it maintains semantically strong features at multiple resolution scales.
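As a concrete illustration of one lateral connection, here is a minimal Keras sketch (written for TF 2.x eager execution; the feature-map shapes are made-up stand-ins for a ResNet stage C4 and the previous top-down level P5):

import tensorflow as tf
from tensorflow.keras import layers

# Stand-in feature maps (batch, H, W, C).
c4 = tf.random.normal((1, 32, 32, 1024))  # bottom-up level C4
p5 = tf.random.normal((1, 16, 16, 256))   # coarser top-down level P5

lateral = layers.Conv2D(256, (1, 1))(c4)     # 1x1 conv aligns the channel depth
upsampled = layers.UpSampling2D((2, 2))(p5)  # grow the coarser map to C4's size
p4 = layers.Add()([lateral, upsampled])      # lateral connection: elementwise add

print(p4.shape)  # (1, 32, 32, 256): C4's resolution, the pyramid's channel depth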
Now let’s look at the first stage. A lightweight neural network called the RPN scans all levels of the FPN top-down pathway (hereinafter referred to as the feature map) and proposes regions that may contain objects. That’s all it is. While scanning the feature map is efficient, we need a method to bind features to their locations in the raw image. Here come the anchors. Anchors are a set of boxes with predefined locations and scales relative to the image. Ground-truth classes (at this stage only a binary object-vs-background label) and bounding boxes are assigned to individual anchors according to an IoU threshold. As anchors at different scales bind to different levels of the feature map, the RPN uses these anchors to figure out where in the feature map an object ‘should’ be and what size its bounding box is. Here we may agree that convolving, downsampling, and upsampling keep features at the same relative locations as the objects in the original image, rather than scrambling them.
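The assignment rule can be made concrete with a small NumPy sketch; the 0.7/0.3 IoU thresholds below are the commonly used Faster R-CNN defaults, not values specific to this tutorial:

import numpy as np

def iou(box, boxes):
    # IoU between one box and an array of boxes, all as [y1, x1, y2, x2].
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    intersection = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return intersection / (area_box + areas - intersection)

# Toy ground-truth box and three anchors.
gt = np.array([10, 10, 50, 50])
anchors = np.array([[8, 8, 52, 52], [0, 0, 20, 20], [100, 100, 140, 140]])

overlaps = iou(gt, anchors)
# 1 = object, 0 = background, -1 = ignored during training
labels = np.where(overlaps >= 0.7, 1, np.where(overlaps < 0.3, 0, -1))
print(overlaps.round(2), labels)  # [0.83 0.05 0.  ] [1 0 0]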
In the second stage, another neural network takes the regions proposed by the first stage, assigns them to specific areas of a feature-map level, scans these areas, and generates object classes (now multi-class), bounding boxes, and masks. The procedure looks similar to the RPN. The differences are that, instead of relying on anchors, stage two uses a trick called ROIAlign to locate the relevant areas of the feature map, and that there is a branch generating a pixel-level mask for each object. Work complete.
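The fixed-size pooling that ROIAlign performs can be sketched with TensorFlow's bilinear crop-and-resize op (as far as I can tell, the Matterport implementation's PyramidROIAlign layer is built on the same op); the feature map and the two proposals below are made-up stand-ins:

import tensorflow as tf

# Stand-in FPN feature map (batch, H, W, C) and two proposals in
# normalized [y1, x1, y2, x2] coordinates, as the RPN would emit them.
feature_map = tf.random.normal((1, 32, 32, 256))
rois = tf.constant([[0.10, 0.10, 0.50, 0.50],
                    [0.20, 0.60, 0.90, 0.95]])
box_indices = tf.zeros((2,), dtype=tf.int32)  # both ROIs belong to image 0

# Bilinear sampling into a fixed 7x7 grid with no coordinate quantization:
# this is the core idea behind ROIAlign.
pooled = tf.image.crop_and_resize(feature_map, rois, box_indices, crop_size=(7, 7))
print(pooled.shape)  # (2, 7, 7, 256): one fixed-size feature block per ROI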

#maskrcnn #objectdetection #deeplearning #ai #artificialintelligence #ml #computervision #cnn #convolutionalneuralnetwork
Comments

Thank you ma'am. Most of the other tutorials don't explain it in such a way. Your explanation helps me understand the code in depth. Thank you very much.

sumedh

Your videos are so well "architected". They take us to depths never reached by others. THANKS

debarunkumer

Thank you very much ma'am. It is my first time watching a clear-cut example with a step-by-step explanation of Mask R-CNN training on a custom dataset.

anjalipc

I looked up the Waleed Abdulla blog post. The customData class had me stuck for two days; this explanation solves it all. Thanks a lot, Aarohi.
This is the 2nd time your efforts are helping me; the first one, I guess, was the FPN pyramid using CNN.

adityanjsg

Madam,
We are using only one annotations1 =
and while preparing the dataset we are sending two, train and val, but using only one annotations file.
Please clarify.

seetharamnageshappe

Hello, I have tried to test detection on my own dataset.
There is no error, but I got this message: "NO INSTANCE TO DISPLAY"
I see instances, but the detections are not good.

nadabelhadjslimen

Thank you for your answer. The problem is solved, but now I get a high loss, which implies bad detection, so please, what should I do to correct it?
Epoch 30/30
100/100 - 784s 8s/step - batch: 49.5000 - size: 1.0000 - loss: 0.3552 - rpn_class_loss: 0.0052 - rpn_bbox_loss: 0.1705 - mrcnn_class_loss: 0.0218 - mrcnn_bbox_loss: 0.0532 - mrcnn_mask_loss: 0.1044 - val_loss: 3.8523 - val_rpn_class_loss: 0.1956 - val_rpn_bbox_loss: 2.3940 - val_mrcnn_class_loss: 0.3829 - val_mrcnn_bbox_loss: 0.3561 - val_mrcnn_mask_loss: 0.5238

PLEASE HELP ME

ikrambekkar

Thank you so much ma'am. Your explanation is really amazing. It has helped us a lot. It is very informative. Thank you again, madam.

Automotive_Engineer

polygons = [r['shape_attributes'] for r in a['regions']]
objects = [s['region_attributes'] for s in a['regions']]
print("objects:", objects)

TypeError: string indices must be integers

Why am I getting this error?

saikiran

Please give me a solution for this error: FutureWarning: Input image dtype is bool. Interpolation is not defined with bool data type. Please set order to 0 or explicitely cast input image to another data type. Starting from version 0.19 a ValueError will be raised instead of this warning.
order = _validate_interpolation_order(image.dtype, order)

nedjmahachemi

I tried to do this project on Colab, but it's not moving beyond step 9/10 of epoch 1. I also have a doubt about where we use the val JSON file, as you didn't mention it anywhere.

sheshankk

Thank you ma'am for the explanation. Could you please help with why I'm getting this error when I run this on my dataset: AttributeError: 'str' object has no attribute 'decode'

hargunkaur

Hi, I have a question. I implemented everything as you said; however, no .h5 files are being created in the logs folder. How do I fix this? (I have tf=1.13.1, python=3.7, keras=2.3.0)

manasaprabhala

Thank you for this video. In the training phase I got the following error:
ResourceExhaustedError: OOM when allocating tensor of shape [] and type float
[[{{node mul/y}}]]
PLEASE HELP ME

ikrambekkar

Ma'am, how do I resize the custom data, as the model is not accepting the images?

rajnigoyal

Ma'am, I'm trying to train my model in Colab, but it's using the CPU rather than the GPU provided by Colab, and because of that training is taking much more time to complete. What could be the issue?

harshchindarkar

Hello. Thank you for your video. It helps me a lot.

One little question about the annotation section though.
Q) You said we need to change the annotation directory to the training folder so that the JSON file of the training folder can be used. However, we actually labeled the training folder AND the validation folder as well. What about the JSON file of the val folder? Are we not going to use it?

Please answer my question when you are available! Thank you once again for the help!

Seung

May I ask you something?

As far as I understood, in the logs folder we'll get trained weights, not the full model, won't we?
And could you help me: how can I convert this output to a JS model?
Should I train a model and load these weights (which I got after your tutorial) into my new model?

foxil

Thank you so much for this video. Really helpful.

sushantparajuli

Thank you for this video. I'm implementing Mask R-CNN on satellite images, but I'm not getting the desired accuracy score. Having considered augmentation techniques, I was wondering if there is a way to choose the best augmentation technique. Please help! Thanks in advance!

GlenBennetHermon