Let Us time Some Code - Solution - Intro to Parallel Programming

Comments

Tested on a Titan Xp, which showed that option 3 is slower than option 1, and even slower than option 4, which uses atomic operations.
The detailed elapsed times are as follows:
Time elapsed = 0.024576 ms
Time elapsed = 0.032704 ms
Time elapsed = 0.436224 ms
Time elapsed = 0.197632 ms
Time elapsed = 1.8943 ms

jasonperhaps
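
For anyone who wants to reproduce numbers like these, here is a minimal sketch of timing a kernel launch with CUDA events. The option definitions are not spelled out in this thread, so the kernel names, the block width of 1000, and the thread and array counts in `main` are all placeholders of mine, not the course's exact code. A warm-up launch is included because the first launch in a process tends to carry one-time setup cost.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread increments one slot, wrapping around the array.
// Not atomic, so updates can be lost when several threads share a slot.
__global__ void increment_naive(int *g, int array_size) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) % array_size;
    g[i] = g[i] + 1;
}

// Same thread-to-slot mapping, but the read-modify-write is done with atomicAdd.
__global__ void increment_atomic(int *g, int array_size) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) % array_size;
    atomicAdd(&g[i], 1);
}

// Times a single launch in milliseconds using CUDA events.
// Assumes num_threads is a multiple of block_width.
float time_launch_ms(bool use_atomic, int num_threads, int block_width, int array_size) {
    int *d_array = nullptr;
    cudaMalloc(&d_array, array_size * sizeof(int));
    cudaMemset(d_array, 0, array_size * sizeof(int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    if (use_atomic) {
        increment_atomic<<<num_threads / block_width, block_width>>>(d_array, array_size);
    } else {
        increment_naive<<<num_threads / block_width, block_width>>>(d_array, array_size);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel and the stop event have completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_array);
    return ms;
}

int main() {
    const int block_width = 1000;  // placeholder; must not exceed the device limit

    // Warm-up launch so one-time setup cost is not counted in the measurements.
    time_launch_ms(false, 1000000, block_width, 1000000);

    // Placeholder configurations: (threads, elements).
    printf("naive,  1e6 threads, 1e6 elements: %f ms\n",
           time_launch_ms(false, 1000000, block_width, 1000000));
    printf("atomic, 1e6 threads, 1e6 elements: %f ms\n",
           time_launch_ms(true, 1000000, block_width, 1000000));
    printf("naive,  1e6 threads, 100 elements: %f ms\n",
           time_launch_ms(false, 1000000, block_width, 100));
    printf("atomic, 1e6 threads, 100 elements: %f ms\n",
           time_launch_ms(true, 1000000, block_width, 100));
    return 0;
}
```

Single-launch timings at this scale are noisy, so averaging over several launches would probably give more stable numbers than the one-shot measurement sketched here.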

Hello Sir,

I have a question about option 1 and option 2. Why don't both of them take the same time? The CUDA guide states the following:
"The operation is atomic in the sense that it is guaranteed to be performed without interference from other threads. In other words, no other thread can access this address until the operation is complete."
So no other thread can access the same address, which means threads can still operate on other addresses. Since the number of threads equals the number of elements, why shouldn't both options take the same time?

Thanks

shaurakar
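
A rough way to explore this question: the quoted guarantee is per address, so when every thread targets its own element, no two `atomicAdd` calls should ever have to wait on each other. Any remaining gap between the two options would then come from the per-operation cost of the atomic itself rather than from contention, which seems consistent with the small difference between the first two timings reported above (0.024576 ms vs 0.032704 ms). The sketch below only checks the correctness side of that picture. The kernel, the helper name `atomic_counts_are_exact`, and the sizes are my own assumptions, not the course code.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread t atomically increments slot (t % array_size).
__global__ void increment_atomic(int *g, int array_size) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) % array_size;
    atomicAdd(&g[i], 1);
}

// Launches num_threads atomic increments over array_size slots and reports
// whether every slot ended up with exactly num_threads / array_size.
// Assumes num_threads is a multiple of both block_width and array_size.
bool atomic_counts_are_exact(int num_threads, int block_width, int array_size) {
    int *d_array = nullptr;
    cudaMalloc(&d_array, array_size * sizeof(int));
    cudaMemset(d_array, 0, array_size * sizeof(int));

    increment_atomic<<<num_threads / block_width, block_width>>>(d_array, array_size);

    // cudaMemcpy on the default stream waits for the kernel to finish first.
    std::vector<int> h_array(array_size);
    cudaMemcpy(h_array.data(), d_array, array_size * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_array);

    const int expected = num_threads / array_size;
    for (int value : h_array) {
        if (value != expected) return false;
    }
    return true;
}

int main() {
    // One thread per element: every atomicAdd hits a different address.
    printf("1e6 threads, 1e6 elements exact: %d\n",
           atomic_counts_are_exact(1000000, 1000, 1000000));
    // Many threads per element: atomics on the same address must be serialized.
    printf("1e6 threads, 100 elements exact: %d\n",
           atomic_counts_are_exact(1000000, 1000, 100));
    return 0;
}
```

In the second configuration 10,000 threads share each address, so that is where the "no other thread can access this address" rule would actually force threads to take turns.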

Can I just assume that the majority of the time is spent allocating threads rather than on computation or the cost of atomic ops?

thinkingaloud