CUDA Crash Course: Sum Reduction Part 5

In this video we look at another optimization of our sum reduction kernel using a device function and loop unrolling!
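For readers who want to see what "a device function and loop unrolling" can look like in practice, here is a minimal sketch. It is not the code from the video. The names `warp_reduce`, `sum_reduction`, `sum_on_gpu`, and `BLOCK` are assumptions made for illustration, as is the block size of 256. The sketch also places `__syncwarp()` between the unrolled steps rather than relying on implicit warp-synchronous execution, and it omits CUDA error checking for brevity.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

constexpr int BLOCK = 256;  // threads per block (assumed; power of two, >= 64)

// Device function holding the last six reduction steps, unrolled by hand.
// Only the 32 threads of warp 0 call it, so no __syncthreads() is needed;
// __syncwarp() orders the shared-memory reads and writes inside the warp.
__device__ void warp_reduce(int *partial, int tid) {
  int v = partial[tid];
  v += partial[tid + 32]; __syncwarp(); partial[tid] = v; __syncwarp();
  v += partial[tid + 16]; __syncwarp(); partial[tid] = v; __syncwarp();
  v += partial[tid + 8];  __syncwarp(); partial[tid] = v; __syncwarp();
  v += partial[tid + 4];  __syncwarp(); partial[tid] = v; __syncwarp();
  v += partial[tid + 2];  __syncwarp(); partial[tid] = v; __syncwarp();
  v += partial[tid + 1];  __syncwarp(); partial[tid] = v; __syncwarp();
}

// Each block sums up to 2 * BLOCK input elements and writes one partial sum.
__global__ void sum_reduction(const int *in, int *out, int n) {
  __shared__ int partial[BLOCK];
  int tid = threadIdx.x;
  int i = blockIdx.x * (blockDim.x * 2) + threadIdx.x;

  // Every thread loads two elements and adds them; out-of-range reads count as 0.
  int a = (i < n) ? in[i] : 0;
  int b = (i + blockDim.x < n) ? in[i + blockDim.x] : 0;
  partial[tid] = a + b;
  __syncthreads();

  // Tree reduction in shared memory, stopping once a single warp remains.
  for (int s = blockDim.x / 2; s > 32; s >>= 1) {
    if (tid < s) partial[tid] += partial[tid + s];
    __syncthreads();
  }

  // Final steps are handled by warp 0 in the unrolled device function.
  if (tid < 32) warp_reduce(partial, tid);
  if (tid == 0) out[blockIdx.x] = partial[0];
}

// Hypothetical host helper: relaunches the kernel until one value is left.
int sum_on_gpu(const std::vector<int> &host) {
  int n = static_cast<int>(host.size());
  if (n == 0) return 0;
  int max_blocks = (n + 2 * BLOCK - 1) / (2 * BLOCK);

  int *src = nullptr, *dst = nullptr;
  cudaMalloc(&src, n * sizeof(int));
  cudaMalloc(&dst, max_blocks * sizeof(int));
  cudaMemcpy(src, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

  int count = n;
  while (count > 1) {
    int blocks = (count + 2 * BLOCK - 1) / (2 * BLOCK);
    sum_reduction<<<blocks, BLOCK>>>(src, dst, count);
    count = blocks;       // the partial sums become the next pass's input
    std::swap(src, dst);
  }

  int result = 0;
  cudaMemcpy(&result, src, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(src);
  cudaFree(dst);
  return result;
}
```

With an input of `1 << 16` ones, the first launch uses 128 blocks that each produce 512, and a second single-block launch adds those partial sums, so `sum_on_gpu` should return 65536.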

Comments

If the idle warps are an issue, can we relaunch a kernel at each iteration? Then the freed warps could be used for other work. Is that fine?

summerQuanta

Hi, sir. I have a question. My accumulated result is 16, not 65536, and I don't know why. Do you know why?

谢晗-zd