Benchmarks of CUDA enabled Gromacs


Benchmarks of CUDA enabled Gromacs

Post by Esben Friis » Wed May 26, 2010 1:14 am

Hi All

I ran some benchmarks of the CUDA-enabled Gromacs, and the results were quite disappointing, as you can see below. The price/performance ratio of the GPU is (very!) poor: I am nowhere near the 10-20x speedup that has been reported. Does anyone have a test system that shows good performance on the GPU?

Cheers,

Esben


System:

Protein in water
58339 atoms
NVE
temp = 305 K
pbc
no electrostatic cutoffs


1 CPU core (Intel(R) Xeon(R) X5450 @ 3.00GHz)

start: 2010-05-11 11:11
end: 2010-05-11 13:11
wall time: 120 min = 2.00 h

frames: 408
time: 40.8 ps

speed: 20.4 ps/h
normalized: 1.00x


4 CPU cores (Intel(R) Xeon(R) X5450 @ 3.00GHz)

start: 2010-05-11 11:23
end: 2010-05-11 11:43
wall time: 20 min = 0.333 h

frames: 241
time: 24.1 ps

speed: 72.3 ps/h
normalized: 3.54x


CUDA, Nvidia Quadro FX5800

start: 2010-05-11 15:17
end: 2010-05-12 11:02
wall time: 1185 min = 19.75 h

frames: 5802
time: 580.2 ps

speed: 29.4 ps/h
normalized: 1.44x
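The speed and normalized figures above can be recomputed directly from the frame counts and wall times. A minimal sketch, assuming 0.1 ps of simulated time per saved frame (inferred from the logs, e.g. 408 frames = 40.8 ps):

```python
# Sketch: recompute the benchmark numbers above.
# Assumption: each saved frame corresponds to 0.1 ps of simulated time.
PS_PER_FRAME = 0.1

def speed_ps_per_hour(frames, wall_hours):
    """Simulated picoseconds per wall-clock hour."""
    return frames * PS_PER_FRAME / wall_hours

baseline = speed_ps_per_hour(408, 2.00)        # 1 CPU core   -> 20.4 ps/h
cpu4     = speed_ps_per_hour(241, 20 / 60)     # 4 CPU cores  -> 72.3 ps/h
gpu      = speed_ps_per_hour(5802, 1185 / 60)  # Quadro FX5800 -> ~29.4 ps/h

print(f"4-core speedup: {cpu4 / baseline:.2f}x")  # 3.54x
print(f"GPU speedup:    {gpu / baseline:.2f}x")   # 1.44x
```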


RE: Benchmarks of CUDA enabled Gromacs

Post by francesco oteri » Wed May 26, 2010 6:06 am

Dear Esben,
I found the same performance difference. The problem is the algorithm used for the electrostatics. The published figures refer to simulations run with reaction field, which is 15x-20x faster in the GPU implementation. PME is slower, and simulations that use it are only about 3x faster than a single CPU.


RE: Benchmarks of CUDA enabled Gromacs

Post by Peter Eastman » Wed May 26, 2010 11:20 am

You're correct that PME is significantly slower than reaction field. We hope to improve its speed in future releases, but it's a hard algorithm to adapt to a GPU.

Keep in mind that the parameters that give optimal performance will often be different for the GPU than for the CPU, so if you run both simulations with identical parameters that isn't really a fair comparison. In particular, you should try increasing the cutoff distance a bit. That will allow it to do more of the computation in direct space (where the GPU is faster) and less in reciprocal space. I usually find that the CPU gives optimal performance with a cutoff of around 1 to 1.2 nm, while the GPU gives optimal performance with a cutoff of around 1.5 to 1.8 nm.
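In Gromacs, the cutoff Peter describes is set in the .mdp file. A hedged sketch of GPU-leaning settings along the lines he suggests (the values are illustrative starting points, not tuned for this system):

```
; Illustrative .mdp fragment: shift PME work toward direct space for the GPU
coulombtype    = PME
rcoulomb       = 1.6    ; larger direct-space cutoff (GPU-friendly range ~1.5-1.8 nm)
rvdw           = 1.6
fourierspacing = 0.16   ; with a larger cutoff, a coarser PME grid can keep accuracy
```

The trade-off is that a larger cutoff makes the reciprocal-space (FFT) part cheaper at the cost of more pair interactions, which is exactly the work the GPU handles well.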

Peter
