YOLO v4 object detector test / YOLO version 4 example detections

YOLO v4 test by Prof. Dr. Jürgen Brauer, University of Applied Sciences Kempten

Introduction:
==========
In this video I test the performance of the new YOLO v4 object detector ("You Only Look Once"), which sets the new state of the art (SOTA) in object detection regarding the trade-off between speed and detection performance, measured by the mAP metric.

Input data:
=========
The test frames are from this video:
It shows a 24-minute walk by Manfred Auer through the nice city of Kempten, located in southern Germany near the Alps.

This video provides us with very interesting test objects: persons (mainly pedestrians), cars, bikes, motorbikes, buses, chairs, etc.

The 24-minute video was split into 86473 individual frames using ffmpeg with this command:
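The exact command is not included in the description; a typical ffmpeg invocation for splitting a video into numbered frames looks roughly like this (the input and output filenames are placeholders, not the author's actual paths):

```shell
# Extract every frame of the walk video as a numbered JPEG image.
# -qscale:v 2 keeps the JPEG quality high; %06d numbers the frames
# as frame_000001.jpg, frame_000002.jpg, ...
ffmpeg -i kempten_walk.mp4 -qscale:v 2 frames/frame_%06d.jpg
```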

YOLO v4 implementation used:
=========================
For object detection I used the new YOLO v4 object detector, available as the original implementation / source code:

It is NOT based on PyTorch or TensorFlow/Keras, but uses its own deep learning framework, called "Darknet". Darknet is written in C and the code is easy to understand.

I compiled the code with GPU support for faster single-frame processing. You can read a little about how I got the code running here:

Compared to TensorFlow's Object Detection API (which does not support the new TF 2.x framework but is based on the old TF 1.x framework), getting this code working is a piece of cake.

How YOLO was called to produce the output:
====================================
After I got the code compiling, I downloaded the pre-trained YOLO v4 weights from here:

This model was trained on the MS COCO dataset:

I further made a small modification to the code in order to save each prediction image as an individual file.

In /src/detector.c I therefore changed the test_detector() function so that each prediction image is saved (as I said, the code is easy to understand and thus easy to modify...)

Then I used ffmpeg again in order to compile a video from the 86473 prediction images.
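Again, the actual command is not given; assembling an image sequence back into a video with ffmpeg typically looks like this (filenames and the 20 fps rate, chosen to match the measured detection speed below, are my assumptions):

```shell
# Re-assemble the numbered prediction images into a video at 20 fps.
# libx264 with yuv420p gives a widely playable output file.
ffmpeg -framerate 20 -i predictions/pred_%06d.jpg -c:v libx264 -pix_fmt yuv420p yolo_v4_detections.mp4
```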

This is how I called YOLO v4:
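The description omits the actual command line. Based on the usage documented in the original Darknet repository, a batch detection call over a list of frames would look roughly like this (the image-list filename is a placeholder):

```shell
# Run the Darknet YOLO v4 detector on a list of frame filenames.
# cfg/coco.data and cfg/yolov4.cfg ship with the repository;
# yolov4.weights is the pre-trained MS COCO model mentioned above.
# image_list.txt contains one frame path per line (my assumption).
./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights -dont_show < image_list.txt
```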

Results:
=======
I have to say that I am really impressed by YOLO v4.
It is extremely fast: a single prediction took less than 50 ms on my notebook GPU:

Which GPU?
"hwinfo --gfxcard --short" (on command line)
returned "nVidia GP104GLM Quadro P5200 Mobile"

So I had a frame rate of about 20 frames per second, and the complete prediction for the 24-minute video took about:
86473/20 seconds = 72 minutes

... which was roughly my lunch time yesterday ;-)

On the other hand, the predictions are incredibly good! I did my PhD in computer vision in an era when the best detectors were the Viola-Jones detector, the HOG detector and the Implicit Shape Model (ISM), and later the Deformable Parts Model (DPM). Honestly, object detection at that time did not really work. Now things have changed, as you can see in this video!

Link to YOLO v4 original paper:
=========================
"YOLOv4: Optimal Speed and Accuracy of Object Detection"
by Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao

Keywords: YOLOv4, YOLO v4, YOLO version4, "You Only Look Once" v4, object detector, object detection, object recognition, object localization, Deep Convolutional Neural Network for object detection, YOLO v4 performance test, YOLO v4 object detection examples, Deep Learning

Another similar video I have produced:
===============================
"Canny Edge World"
(edge detection results for the same input video)