Low-Latency GPU Motion Tracking (C++/CUDA/OpenCV) - Test 1
Test 1 of my GPU-based motion tracking program, on a scene from Apocalypse Now. This test used a variable number of keypoints, but with a typical search area size. This configuration achieves latencies low enough for very fast real-time operations.
Using C++, CUDA, and OpenCV (for video input/output only), I created a motion tracking program that works similarly to the H.264 motion vector search algorithm, but is heavily parallelized to run on a graphics card. For a typical search size (16x16 match area, 48x48 search window), it runs about 20-40 times faster than the serial CPU algorithm.
In the video above, the GPU algorithm managed a worst-case latency of 2.5 ms (400 FPS *minimum*), whereas the CPU-based algorithm had a worst-case latency of over 200 ms (under 5 FPS).
The algorithm works similarly to the one in the previous video: it uses thresholds between a reference frame, the last tracking point's location, and the current frame to determine each keypoint's new location (full search, using the sum of absolute differences over every pixel of the block).
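To make the full-search, sum-of-absolute-differences idea concrete, here is a minimal serial C++ sketch of the matching step. It is not the program from the video: the `Gray` image type, the `Match` struct, and the functions `blockSad` and `fullSearch` are hypothetical names made up for illustration, the window is assumed to be centered on the last known position, and the threshold logic mentioned above is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical minimal grayscale image (illustration only).
struct Gray {
    int width, height;
    std::vector<std::uint8_t> px;  // row-major, width * height bytes
    std::uint8_t at(int x, int y) const { return px[y * width + x]; }
};

struct Match { int x, y; std::uint32_t sad; };

// Sum of absolute differences between the block x block patch at (rx, ry)
// in `ref` and the patch at (cx, cy) in `cur`.
std::uint32_t blockSad(const Gray& ref, int rx, int ry,
                       const Gray& cur, int cx, int cy, int block) {
    std::uint32_t sum = 0;
    for (int dy = 0; dy < block; ++dy)
        for (int dx = 0; dx < block; ++dx)
            sum += std::abs(int(ref.at(rx + dx, ry + dy)) -
                            int(cur.at(cx + dx, cy + dy)));
    return sum;
}

// Full search: evaluate every candidate top-left position whose block lies
// inside a window x window area centered on the last position (lx, ly),
// and keep the candidate with the lowest SAD.
Match fullSearch(const Gray& ref, int rx, int ry,
                 const Gray& cur, int lx, int ly,
                 int block = 16, int window = 48) {
    assert(block > 0 && window >= block);
    assert(rx >= 0 && ry >= 0 &&
           rx + block <= ref.width && ry + block <= ref.height);
    const int half = (window - block) / 2;  // 16 for the default sizes
    Match best{lx, ly, std::numeric_limits<std::uint32_t>::max()};
    for (int cy = ly - half; cy <= ly + half; ++cy) {
        if (cy < 0 || cy + block > cur.height) continue;  // skip off-image rows
        for (int cx = lx - half; cx <= lx + half; ++cx) {
            if (cx < 0 || cx + block > cur.width) continue;
            const std::uint32_t s = blockSad(ref, rx, ry, cur, cx, cy, block);
            if (s < best.sad) best = {cx, cy, s};
        }
    }
    return best;
}
```

Calling `fullSearch(ref, rx, ry, cur, lx, ly)` once per keypoint gives the serial baseline. Every candidate position is independent of the others, which is what makes this search a natural fit for a GPU; how the actual program maps candidates and pixels to CUDA threads is not described here.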
With the search window constrained to 48x48, 16x16 reference blocks, and 64 keypoints, I get around 4 ms average latency (250 FPS).
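For a rough sense of the work behind those numbers, the arithmetic can be sketched in C++ as below. The function names are hypothetical, and the count assumes the 16x16 block must lie entirely inside the 48x48 window, which gives 33 x 33 candidate positions per keypoint; the actual program may define the window differently.

```cpp
#include <cassert>
#include <cstdint>

// Candidate block positions per keypoint, assuming the block must stay
// fully inside the search window (an assumption, not stated in the source).
std::uint64_t candidatesPerKeypoint(int window, int block) {
    assert(block > 0 && window >= block);
    const std::uint64_t perAxis = std::uint64_t(window - block + 1);
    return perAxis * perAxis;
}

// Absolute-difference operations per frame for a full search:
// candidates * pixels per block * number of keypoints.
std::uint64_t absDiffsPerFrame(int window, int block, int keypoints) {
    assert(keypoints >= 0);
    return candidatesPerKeypoint(window, block) *
           std::uint64_t(block) * std::uint64_t(block) *
           std::uint64_t(keypoints);
}

// Frame rate implied by a per-frame latency in milliseconds.
double framesPerSecond(double latencyMs) {
    assert(latencyMs > 0.0);
    return 1000.0 / latencyMs;
}
```

Under that assumption, `candidatesPerKeypoint(48, 16)` is 1089 and `absDiffsPerFrame(48, 16, 64)` is 17,842,176, so a 4 ms frame corresponds to roughly 4.5 billion absolute differences per second. `framesPerSecond(4.0)` gives the 250 FPS quoted above.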
Tests were run on an Intel i7-2600K @ 4.0 GHz (8GB DDR-1866) and an Nvidia GeForce 560 Ti.