What is the performance difference between using cudaMallocManaged() and cudaMalloc()?

On pre-Pascal architectures, using cudaMallocManaged() can be 2x slower than using cudaMalloc().

On post-Pascal architectures, cudaMallocManaged() is faster.  (Results TBD.)
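Until measured results are posted, one way to check the difference on your own GPU is a small end-to-end timing test. The following is only a sketch under stated assumptions: the kernel name scale, the helpers run_explicit and run_managed, and the array size are hypothetical choices made for illustration, and wall-clock numbers will vary with the GPU, driver, and problem size.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical test kernel: multiply every element by a factor.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

using Clock = std::chrono::steady_clock;

static double elapsed_ms(Clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(Clock::now() - t0).count();
}

// Explicit path: host buffer + cudaMalloc() + cudaMemcpy() in both directions.
static double run_explicit(size_t n, int blocks, int threads) {
    const size_t bytes = n * sizeof(float);
    auto t0 = Clock::now();
    float *host = (float *)std::malloc(bytes);
    if (!host) std::exit(EXIT_FAILURE);
    for (size_t i = 0; i < n; ++i) host[i] = 1.0f;
    float *dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));
    CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
    scale<<<blocks, threads>>>(dev, n, 2.0f);
    CHECK(cudaGetLastError());
    // This blocking copy also waits for the kernel to finish.
    CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
    bool ok = (host[0] == 2.0f && host[n - 1] == 2.0f);
    double ms = elapsed_ms(t0);
    CHECK(cudaFree(dev));
    std::free(host);
    if (!ok) std::fprintf(stderr, "explicit path: unexpected result\n");
    return ms;
}

// Managed path: one cudaMallocManaged() pointer used by both host and device.
static double run_managed(size_t n, int blocks, int threads) {
    const size_t bytes = n * sizeof(float);
    auto t0 = Clock::now();
    float *data = nullptr;
    CHECK(cudaMallocManaged(&data, bytes));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;
    scale<<<blocks, threads>>>(data, n, 2.0f);
    CHECK(cudaGetLastError());
    // The host must not touch managed memory until the kernel has finished.
    CHECK(cudaDeviceSynchronize());
    bool ok = (data[0] == 2.0f && data[n - 1] == 2.0f);
    double ms = elapsed_ms(t0);
    CHECK(cudaFree(data));
    if (!ok) std::fprintf(stderr, "managed path: unexpected result\n");
    return ms;
}

int main() {
    const size_t n = 1 << 24;  // assumed problem size: 16M floats (64 MB)
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);

    CHECK(cudaFree(0));  // create the CUDA context before timing anything

    std::printf("cudaMalloc + cudaMemcpy: %.2f ms\n",
                run_explicit(n, blocks, threads));
    std::printf("cudaMallocManaged:       %.2f ms\n",
                run_managed(n, blocks, threads));
    return 0;
}
```

Build with nvcc (for example, nvcc -O2 bench.cu -o bench, where bench.cu is a placeholder file name) and run it several times, since a single run can be noisy. Both paths time allocation, host initialization, the kernel, and reading the result back on the host, so the comparison is end to end rather than kernel only.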
