Benchmarks of CUDA enabled Gromacs


Benchmarks of CUDA enabled Gromacs

Post by Esben Friis » Wed May 26, 2010 1:14 am

Hi All

I ran some benchmarks of the CUDA-enabled Gromacs, and the results were quite disappointing, as you can see below. The price/performance ratio of the GPU is (very!) poor: I am nowhere near the 10-20x speedup that has been reported. Does anyone have a test system that shows good performance on the GPU?

Cheers,

Esben


System:

Protein in water
58339 atoms
NVE
temp = 305 K
pbc
no electrostatic cutoffs


1 CPU core (Intel(R) Xeon(R) X5450 @ 3.00GHz)

start: 2010-05-11 11:11
end: 2010-05-11 13:11
wall time: 120 min = 2.00 h

frames: 408
time: 40.8 ps

speed: 20.4 ps/h
normalized: 1.00x


4 CPU cores (Intel(R) Xeon(R) X5450 @ 3.00GHz)

start: 2010-05-11 11:23
end: 2010-05-11 11:43
wall time: 20 min = 0.333 h

frames: 241
time: 24.1 ps

speed: 72.3 ps/h
normalized: 3.54x


CUDA, Nvidia Quadro FX5800

start: 2010-05-11 15:17
end: 2010-05-12 11:02
wall time: 1185 min = 19.75 h

frames: 5802
time: 580.2 ps

speed: 29.4 ps/h
normalized: 1.44x
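The speed and normalized figures above can be recomputed directly from the frame counts and wall times. A minimal sketch, assuming 0.1 ps of simulated time per saved frame (inferred from the logs, e.g. 408 frames = 40.8 ps):

```python
# Sketch: recompute the benchmark numbers above.
# Assumption: each saved frame corresponds to 0.1 ps of simulated time.
PS_PER_FRAME = 0.1

def speed_ps_per_hour(frames, wall_hours):
    """Simulated picoseconds per wall-clock hour."""
    return frames * PS_PER_FRAME / wall_hours

baseline = speed_ps_per_hour(408, 2.00)        # 1 CPU core   -> 20.4 ps/h
cpu4     = speed_ps_per_hour(241, 20 / 60)     # 4 CPU cores  -> 72.3 ps/h
gpu      = speed_ps_per_hour(5802, 1185 / 60)  # Quadro FX5800 -> ~29.4 ps/h

print(f"4-core speedup: {cpu4 / baseline:.2f}x")  # 3.54x
print(f"GPU speedup:    {gpu / baseline:.2f}x")   # 1.44x
```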


RE: Benchmarks of CUDA enabled Gromacs

Post by francesco oteri » Wed May 26, 2010 6:06 am

Dear Esben,
I found the same performance difference. The problem is the algorithm used for the electrostatics. The published figures refer to simulations run with reaction field, which is 15x-20x faster in the GPU implementation. PME is slower, and simulations that use it are only about 3x faster than a single CPU.


RE: Benchmarks of CUDA enabled Gromacs

Post by Peter Eastman » Wed May 26, 2010 11:20 am

You're correct that PME is significantly slower than reaction field. We hope to improve its speed in future releases, but it's a hard algorithm to adapt to a GPU.

Keep in mind that the parameters that give optimal performance will often be different for the GPU than for the CPU, so if you run both simulations with identical parameters that isn't really a fair comparison. In particular, you should try increasing the cutoff distance a bit. That will allow it to do more of the computation in direct space (where the GPU is faster) and less in reciprocal space. I usually find that the CPU gives optimal performance with a cutoff of around 1 to 1.2 nm, while the GPU gives optimal performance with a cutoff of around 1.5 to 1.8 nm.
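In Gromacs, the cutoff Peter describes is set in the .mdp file. A hedged sketch of GPU-leaning settings along the lines he suggests (the values are illustrative starting points, not tuned for this system):

```
; Illustrative .mdp fragment: shift PME work toward direct space for the GPU
coulombtype    = PME
rcoulomb       = 1.6    ; larger direct-space cutoff (GPU-friendly range ~1.5-1.8 nm)
rvdw           = 1.6
fourierspacing = 0.16   ; with a larger cutoff, a coarser PME grid can keep accuracy
```

The trade-off is that a larger cutoff makes the reciprocal-space (FFT) part cheaper at the cost of more pair interactions, which is exactly the work the GPU handles well.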

Peter
