Let us consider the example in the last post, where we underlined how asynchronous copies enable true multi-GPU concurrency. In particular, let us consider Test case #8 of that post.
The full code of Test case #8 is available on our GitHub website, while its profiler timeline is reported here for clarity:
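For readers without the GitHub code at hand, here is a minimal sketch of the general technique. It is not the code of Test case #8. Each GPU gets its own stream, and the host buffers are pinned with `cudaMallocHost` so that `cudaMemcpyAsync` is truly asynchronous. The host waits only after work has been enqueued on every GPU. The kernel `scaleKernel`, the function `runAsyncMultiGPU` and the `MAX_GPUS` cap are hypothetical names chosen for illustration.

```cuda
// Hypothetical sketch: concurrent multi-GPU execution with asynchronous copies.
#include <cuda_runtime.h>
#include <cassert>

#define MAX_GPUS 16  // hypothetical upper bound on the number of GPUs used

__global__ void scaleKernel(float *d_data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= factor;
}

// Returns true if the results from every GPU are correct.
bool runAsyncMultiGPU(int chunkSize) {
    int numGPUs = 0;
    if (cudaGetDeviceCount(&numGPUs) != cudaSuccess || numGPUs == 0) return false;
    if (numGPUs > MAX_GPUS) numGPUs = MAX_GPUS;

    float *h_data[MAX_GPUS], *d_data[MAX_GPUS];
    cudaStream_t streams[MAX_GPUS];
    const size_t bytes = (size_t)chunkSize * sizeof(float);

    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        cudaMallocHost(&h_data[k], bytes);  // pinned memory: needed for truly asynchronous copies
        cudaMalloc(&d_data[k], bytes);
        cudaStreamCreate(&streams[k]);      // one stream per GPU
        for (int i = 0; i < chunkSize; i++) h_data[k][i] = (float)i;
    }

    // Enqueue copy-in, kernel and copy-out on every GPU. All calls return
    // immediately, so the host reaches GPU k+1 while GPU k is still busy.
    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        cudaMemcpyAsync(d_data[k], h_data[k], bytes, cudaMemcpyHostToDevice, streams[k]);
        scaleKernel<<<(chunkSize + 255) / 256, 256, 0, streams[k]>>>(d_data[k], chunkSize, 2.0f);
        cudaMemcpyAsync(h_data[k], d_data[k], bytes, cudaMemcpyDeviceToHost, streams[k]);
    }

    // Synchronize only after everything has been enqueued on all GPUs.
    bool ok = true;
    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        if (cudaStreamSynchronize(streams[k]) != cudaSuccess) ok = false;
        for (int i = 0; i < chunkSize; i++)
            if (h_data[k][i] != 2.0f * i) ok = false;
        cudaStreamDestroy(streams[k]);
        cudaFree(d_data[k]);
        cudaFreeHost(h_data[k]);
    }
    return ok;
}
```

Issuing all operations before any synchronization is what lets the profiler timeline show the GPUs working side by side.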
The full code of the timing example reported here is available on our GitHub website.
Timing the asynchronous copies - concurrency is destroyed
Now...
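The sketch below shows one plausible way that timing can destroy concurrency. It reflects our own assumptions and is not the code on GitHub. The function `timeAsyncCopies` and its `syncInsideLoop` flag are hypothetical. Suppose the host calls `cudaEventSynchronize` on each GPU's stop event inside the loop that issues the copies. The host then blocks before it issues the next GPU's copy, so the copies become serialized. Recording the events inside the loop and reading them only after all copies are enqueued keeps the copies overlapping.

```cuda
// Hypothetical sketch: timing asynchronous copies on several GPUs with CUDA events.
#include <cuda_runtime.h>
#include <cassert>

#define MAX_GPUS 16  // hypothetical upper bound on the number of GPUs used

// Times the host-to-device copy on each GPU and stores the results in ms[].
// syncInsideLoop == true reproduces the serializing pattern.
// Returns the number of GPUs timed, or 0 if no GPU is available.
int timeAsyncCopies(int n, bool syncInsideLoop, float ms[MAX_GPUS]) {
    int numGPUs = 0;
    if (cudaGetDeviceCount(&numGPUs) != cudaSuccess || numGPUs == 0) return 0;
    if (numGPUs > MAX_GPUS) numGPUs = MAX_GPUS;

    float *h[MAX_GPUS], *d[MAX_GPUS];
    cudaStream_t s[MAX_GPUS];
    cudaEvent_t start[MAX_GPUS], stop[MAX_GPUS];
    const size_t bytes = (size_t)n * sizeof(float);

    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);                  // events and streams belong to device k
        cudaMallocHost(&h[k], bytes);      // pinned host buffer
        cudaMalloc(&d[k], bytes);
        cudaStreamCreate(&s[k]);
        cudaEventCreate(&start[k]);
        cudaEventCreate(&stop[k]);
    }

    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        cudaEventRecord(start[k], s[k]);
        cudaMemcpyAsync(d[k], h[k], bytes, cudaMemcpyHostToDevice, s[k]);
        cudaEventRecord(stop[k], s[k]);
        if (syncInsideLoop) {
            // Serializing pattern: the host blocks here, so the copy on
            // GPU k+1 is not even issued until the copy on GPU k has finished.
            cudaEventSynchronize(stop[k]);
            cudaEventElapsedTime(&ms[k], start[k], stop[k]);
        }
    }

    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        if (!syncInsideLoop) {
            // All copies are already in flight, so waiting now does not serialize them.
            cudaEventSynchronize(stop[k]);
            cudaEventElapsedTime(&ms[k], start[k], stop[k]);
        }
        cudaEventDestroy(start[k]);
        cudaEventDestroy(stop[k]);
        cudaStreamDestroy(s[k]);
        cudaFree(d[k]);
        cudaFreeHost(h[k]);
    }
    return numGPUs;
}
```

The per-GPU elapsed times alone may look similar in both modes. Whether the copies overlap is best checked in a profiler timeline, as done in the post.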
Concurrency in CUDA multi-GPU executions

Achieving concurrent execution on multi-GPU systems is very appealing, since it can further reduce the execution time of embarrassingly parallel problems, ideally in proportion to the number of GPUs.
We have run some experiments on achieving concurrent execution on a cluster of 4 Kepler K20c GPUs. We have considered 8 test cases, whose codes and profiler timelines are reported below.
Test case #1 - "Breadth-first" approach - synchronous copy
Code - https://github.com/OrangeOwlSolutions/Multi...
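Here is a minimal sketch of a breadth-first issue order with synchronous copies. It assumes that "breadth-first" means issuing each stage (host-to-device copy, kernel, device-to-host copy) on all GPUs before moving to the next stage. The names `scaleKernel`, `runBreadthFirstSync` and `MAX_GPUS` are hypothetical, and the linked code may differ. Because `cudaMemcpy` blocks the host, the copies to different GPUs are largely serialized. The kernel launches are asynchronous, so the kernels can still overlap.

```cuda
// Hypothetical sketch: breadth-first issue order with synchronous cudaMemcpy.
#include <cuda_runtime.h>
#include <cassert>
#include <cstdlib>

#define MAX_GPUS 16  // hypothetical upper bound on the number of GPUs used

__global__ void scaleKernel(float *d_data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= factor;
}

// Returns true if the results from every GPU are correct.
bool runBreadthFirstSync(int n) {
    int numGPUs = 0;
    if (cudaGetDeviceCount(&numGPUs) != cudaSuccess || numGPUs == 0) return false;
    if (numGPUs > MAX_GPUS) numGPUs = MAX_GPUS;

    float *h[MAX_GPUS], *d[MAX_GPUS];
    const size_t bytes = (size_t)n * sizeof(float);
    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        h[k] = (float *)malloc(bytes);   // ordinary pageable host memory
        cudaMalloc(&d[k], bytes);
        for (int i = 0; i < n; i++) h[k][i] = (float)i;
    }

    // Stage 1: host-to-device copies on all GPUs. cudaMemcpy does not return
    // until h[k] can be reused, so these copies are largely serialized.
    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        cudaMemcpy(d[k], h[k], bytes, cudaMemcpyHostToDevice);
    }

    // Stage 2: kernel launches on all GPUs. Launches return immediately,
    // so the kernels can run concurrently on different GPUs.
    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        scaleKernel<<<(n + 255) / 256, 256>>>(d[k], n, 2.0f);
    }

    // Stage 3: device-to-host copies. Each cudaMemcpy waits for the kernel
    // on its own GPU and blocks the host until the copy is done.
    bool ok = true;
    for (int k = 0; k < numGPUs; k++) {
        cudaSetDevice(k);
        if (cudaMemcpy(h[k], d[k], bytes, cudaMemcpyDeviceToHost) != cudaSuccess) ok = false;
        for (int i = 0; i < n; i++)
            if (h[k][i] != 2.0f * i) ok = false;
        cudaFree(d[k]);
        free(h[k]);
    }
    return ok;
}
```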