GPU profiling tools

Siddharth Srinivasan · Post by **Siddharth Srinivasan** » Thu Jul 30, 2009 12:07 pm

When you guys were developing the CUDA kernels and wanted to test the performance of your code, were there any tools you found easy to use and useful for profiling it? I'm interested in things like:
1. Time spent on the GPU vs. the CPU for my code, if I outsource only the force computations to OpenMM versus outsourcing everything to the GPU.
2. How GPU-to-CPU bandwidth and latency affect my code if I need to transfer coordinates and velocities to the CPU every X steps.

Michael Sherman · Post by **Michael Sherman** » Thu Jul 30, 2009 12:57 pm

I don't have a direct answer to your question, just a comment. I also outsourced only the force computations from Molmodel to OpenMM. At first I had a lot of trouble figuring out what was going on, because my CPU was 100% busy no matter what I did. I looked at it with Intel VTune and determined that it was spin-looping in the driver on Windows, waiting for a response from the GPU. So no matter how much work was getting done on the GPU, the CPU was pegged waiting for it.

This changed with CUDA 2.2. Although the default is still to spin-loop, there is now a CUDA option for blocking; i.e., the CPU thread that invokes CUDA yields until the GPU responds. Peter is planning to make the non-spinning behavior the default in the next version of OpenMM. For now, you have to add a CUDA call to change the default behavior. When I did that, my CPU was only 10-20% busy with moderate-size problems being handled by OpenMM.

Assuming you're building your own OpenMMCuda library and have CUDA 2.2, you can add the following lines to the beginning of the gpuInit() function in gpu.cpp:

```cpp
cudaError_t status;

// Prevent spin-looping when waiting for response from the GPU (applies
// to the current thread only). Must be done prior to initializing the
// CUDA runtime in this thread.
status = cudaSetDeviceFlags(cudaDeviceBlockingSync);
RTERROR(status, "Error setting device flags")
```

(And remove the now-duplicate declaration of "status" just below that.)

Sherm

Siddharth Srinivasan · Post by **Siddharth Srinivasan** » Thu Jul 30, 2009 1:30 pm

Thanks, Sherm. I'll try that out and see how it affects performance.

Peter Eastman · Post by **Peter Eastman** » Thu Jul 30, 2009 1:48 pm

Nvidia's cudaprof tool is very useful for that sort of thing. You can see exactly how much GPU time is spent in each kernel, as well as copying data between GPU and CPU.

Peter

Siddharth Srinivasan · Post by **Siddharth Srinivasan** » Thu Jul 30, 2009 2:29 pm

Hi Peter

I tried cudaprof first, but 64-bit support seems to be nonexistent right now.
