Sharing Data between GPU Kernels within GPU Memory

Yes. Sharing data between GPU kernels without having to transfer it back to the host CPU is the ideal situation: once one kernel has written its results to device memory, a later kernel can read them in place.

For example, given a kernel `__global__ void Addition(float *aD, float *bD, float *cD)`, where aD, bD, and cD are pointers to device memory and the kernel computes cD[index] = aD[index] + bD[index], you can use cD in the next kernel without copying it back to the CPU. A second kernel, `__global__ void ScalarMultiply(float *arrD, int scalar)`, can therefore consume the output of Addition without any copy to the CPU. The code might look something like this:

```
__global__ void Addition(float *aD, float *bD, float *cD)
{
    // Assumes gridSize * blockSize equals the array length.
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    cD[index] = aD[index] + bD[index];
}

__global__ void ScalarMultiply(float *arrD, int scalar)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    arrD[index] *= scalar;
}

// Do cudaMalloc() allocations and cudaMemcpy() transfers here.

// Kernel launches are asynchronous with respect to the host, but kernels
// launched on the same (default) stream execute in launch order, so no
// explicit cudaDeviceSynchronize() is needed between these two calls.
// A later cudaMemcpy() or cudaDeviceSynchronize() on the host will make
// the host wait for the kernels to complete.
Addition<<<gridSize, blockSize>>>(num1D, num2D, sumD);

// This call can reuse the sumD memory already resident on the GPU.
ScalarMultiply<<<gridSize, blockSize>>>(sumD, 5);
```
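For completeness, here is a minimal end-to-end sketch of the host side, assuming the two kernels above and an array length `N` that is a multiple of the block size; the buffer names `num1H`, `num2H`, and `sumH` are illustrative assumptions, not from the original post:

```
#include <cuda_runtime.h>
#include <stdio.h>

#define N 1024

int main(void)
{
    float num1H[N], num2H[N], sumH[N];
    for (int i = 0; i < N; ++i) { num1H[i] = i; num2H[i] = 2.0f * i; }

    // Allocate device memory and copy the inputs up once.
    float *num1D, *num2D, *sumD;
    cudaMalloc(&num1D, N * sizeof(float));
    cudaMalloc(&num2D, N * sizeof(float));
    cudaMalloc(&sumD,  N * sizeof(float));
    cudaMemcpy(num1D, num1H, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(num2D, num2H, N * sizeof(float), cudaMemcpyHostToDevice);

    int blockSize = 256;
    int gridSize  = N / blockSize;  // assumes N is a multiple of blockSize

    // sumD stays on the GPU between the two launches; the default stream
    // guarantees ScalarMultiply sees the completed output of Addition.
    Addition<<<gridSize, blockSize>>>(num1D, num2D, sumD);
    ScalarMultiply<<<gridSize, blockSize>>>(sumD, 5);

    // The device-to-host copy implicitly waits for both kernels to finish.
    cudaMemcpy(sumH, sumD, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("sumH[1] = %f (expected 15.0)\n", sumH[1]);

    cudaFree(num1D);
    cudaFree(num2D);
    cudaFree(sumD);
    return 0;
}
```

Note that the only device-to-host transfer happens after both kernels have run; the intermediate result in sumD never leaves GPU memory.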
