CUDA Timing for Multi-GPU Applications

Consider the example from the previous post, which showed how asynchronous copies enable true multi-GPU concurrency. In particular, consider Test case #8 of that post. Its full code is available on our GitHub website, and the profiler timeline is reported here for clarity. The full code for the timing example reported here is also available on our GitHub website.

Timing the asynchronous copies - concurrency is destroyed

Now...
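As a rough illustration of the timing problem named above, the following sketch times one asynchronous host-to-device copy per GPU with CUDA events. It is not the code from the GitHub repository: the buffer size `N`, the `CUDA_CHECK` macro, and all variable names are assumptions made for this example. Each device gets its own stream and its own pair of events, all copies are enqueued first, and the host waits on the events only afterwards, so that the timing calls themselves do not force the devices to run one after another.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch: abort main() on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    const size_t N = 1 << 20;               // elements per GPU (illustrative)
    const size_t bytes = N * sizeof(float);

    int numGPUs = 0;
    CUDA_CHECK(cudaGetDeviceCount(&numGPUs));

    std::vector<float*> h(numGPUs), d(numGPUs);
    std::vector<cudaStream_t> stream(numGPUs);
    std::vector<cudaEvent_t> start(numGPUs), stop(numGPUs);

    // Per-device resources. Events and streams belong to the device that
    // is current when they are created.
    for (int k = 0; k < numGPUs; ++k) {
        CUDA_CHECK(cudaSetDevice(k));
        CUDA_CHECK(cudaMallocHost((void**)&h[k], bytes));  // pinned host memory
        CUDA_CHECK(cudaMalloc((void**)&d[k], bytes));
        CUDA_CHECK(cudaStreamCreate(&stream[k]));
        CUDA_CHECK(cudaEventCreate(&start[k]));
        CUDA_CHECK(cudaEventCreate(&stop[k]));
        memset(h[k], 0, bytes);
    }

    // Enqueue everything without waiting on any device in between.
    for (int k = 0; k < numGPUs; ++k) {
        CUDA_CHECK(cudaSetDevice(k));
        CUDA_CHECK(cudaEventRecord(start[k], stream[k]));
        CUDA_CHECK(cudaMemcpyAsync(d[k], h[k], bytes,
                                   cudaMemcpyHostToDevice, stream[k]));
        CUDA_CHECK(cudaEventRecord(stop[k], stream[k]));
    }

    // Only now block on each device and read its elapsed time.
    for (int k = 0; k < numGPUs; ++k) {
        CUDA_CHECK(cudaSetDevice(k));
        CUDA_CHECK(cudaEventSynchronize(stop[k]));
        float ms = 0.0f;
        CUDA_CHECK(cudaEventElapsedTime(&ms, start[k], stop[k]));
        printf("GPU %d: host-to-device copy took %.3f ms\n", k, ms);
    }

    // Release per-device resources.
    for (int k = 0; k < numGPUs; ++k) {
        CUDA_CHECK(cudaSetDevice(k));
        CUDA_CHECK(cudaEventDestroy(start[k]));
        CUDA_CHECK(cudaEventDestroy(stop[k]));
        CUDA_CHECK(cudaStreamDestroy(stream[k]));
        CUDA_CHECK(cudaFree(d[k]));
        CUDA_CHECK(cudaFreeHost(h[k]));
    }
    return 0;
}
```

Calling `cudaEventSynchronize` inside the enqueue loop instead would make the host wait for device k before issuing any work to device k+1, which serializes the copies.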

Concurrency in CUDA multi-GPU executions

Concurrent execution on multi-GPU systems is very appealing because, for embarrassingly parallel problems, the execution time can scale down roughly linearly with the number of GPUs. We ran some experiments on concurrent execution on a cluster of 4 Kepler K20c GPUs. We considered 8 test cases, whose codes and profiler timelines are reported below.

Test case #1 - "Breadth-first" approach - synchronous copy

Code - https://github.com/OrangeOwlSolutions/Multi...
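The excerpt does not show the code of Test case #1, so the following is only a minimal sketch of what a "breadth-first" loop with synchronous copies might look like. It reads "breadth-first" as issuing each stage (copy in, kernel, copy out) across all devices before moving on to the next stage, which is an interpretation. The `scale` kernel, the size `N`, the `CUDA_CHECK` macro, and all variable names are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch: abort main() on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Illustrative kernel: multiply every element by a constant.
__global__ void scale(float *x, size_t n, float a) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const size_t N = 1 << 20;               // elements per GPU (illustrative)
    const size_t bytes = N * sizeof(float);
    const int threads = 256;
    const unsigned int blocks = (unsigned int)((N + threads - 1) / threads);

    int numGPUs = 0;
    CUDA_CHECK(cudaGetDeviceCount(&numGPUs));

    // Pageable host buffers, one per device, initialised to 1.0f.
    std::vector<std::vector<float>> h(numGPUs, std::vector<float>(N, 1.0f));
    std::vector<float*> d(numGPUs);

    for (int k = 0; k < numGPUs; ++k) {
        CUDA_CHECK(cudaSetDevice(k));
        CUDA_CHECK(cudaMalloc((void**)&d[k], bytes));
    }

    // Stage 1: host-to-device copies for all devices.
    for (int k = 0; k < numGPUs; ++k) {
        CUDA_CHECK(cudaSetDevice(k));
        CUDA_CHECK(cudaMemcpy(d[k], h[k].data(), bytes, cudaMemcpyHostToDevice));
    }

    // Stage 2: kernel launches for all devices (asynchronous w.r.t. the host).
    for (int k = 0; k < numGPUs; ++k) {
        CUDA_CHECK(cudaSetDevice(k));
        scale<<<blocks, threads>>>(d[k], N, 2.0f);
        CUDA_CHECK(cudaGetLastError());
    }

    // Stage 3: device-to-host copies for all devices.
    for (int k = 0; k < numGPUs; ++k) {
        CUDA_CHECK(cudaSetDevice(k));
        CUDA_CHECK(cudaMemcpy(h[k].data(), d[k], bytes, cudaMemcpyDeviceToHost));
    }

    // Check the first and last element on each device, then clean up.
    for (int k = 0; k < numGPUs; ++k) {
        bool ok = (h[k][0] == 2.0f) && (h[k][N - 1] == 2.0f);
        printf("GPU %d: %s\n", k, ok ? "ok" : "mismatch");
        CUDA_CHECK(cudaSetDevice(k));
        CUDA_CHECK(cudaFree(d[k]));
    }
    return 0;
}
```

In this sketch the kernel launches return to the host immediately, so the kernels on different GPUs can overlap, while each `cudaMemcpy` call has to return before the loop issues the copy for the next device.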