I was benchmarking OpenMM and some other MD software on Kepler cards (K20s). At the moment OpenMM seems to run at about the same speed on a K20 as on an older M2070. I also can't get the CUDA platform working, and I'm not sure whether that's contributing (though I have no idea why it isn't working: testInstallation.py just says there was an error with the CUDA test, and whenever I run my own code it uses OpenCL).
I'm wondering if you have any sense of how things might eventually compare between a Kepler and a Fermi card. My benchmarking of AMBER (which seems to agree with others') suggests a ~2x-2.5x speed boost going from an M2070 to a K20. If that were also the case for OpenMM it would be fantastic, but at the moment I can't generate much enthusiasm for the new Kepler cards.
~Aron
Current State of OpenMM on Kepler
- Peter Eastman
- Posts: 2610
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Current State of OpenMM on Kepler
Hi Aron,
I find that CUDA is usually faster on Kepler than on Fermi, whereas OpenCL is significantly slower on Kepler than on Fermi.
Or putting it a different way: on Fermi, CUDA and OpenCL are roughly the same speed, but on Kepler, CUDA is much faster than OpenCL.
Try forcing your code to use CUDA by passing Platform.getPlatformByName("CUDA") to the Context/Simulation constructor. Does it give an error? If so, what?
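For example, here's a minimal sketch using the Python application layer (topology, system, and integrator stand in for whatever your script already builds):

from simtk.openmm import Platform
from simtk.openmm.app import Simulation

# Request the CUDA platform explicitly instead of letting OpenMM pick one
platform = Platform.getPlatformByName("CUDA")
simulation = Simulation(topology, system, integrator, platform)

Requesting the platform explicitly makes any CUDA problem surface as an exception instead of a silent fall-back to OpenCL.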
Peter
- Aron Broom
- Posts: 54
- Joined: Tue Mar 13, 2012 11:33 am
Re: Current State of OpenMM on Kepler
OK, that highlighted the issue. The error is:
Exception: Error launching CUDA compiler: 32512
sh: /usr/local/cuda/bin/nvcc: No such file or directory
I've set LD_LIBRARY_PATH, but I'm not sure whether there's supposed to be another variable that tells OpenMM where to find the rest of the CUDA toolkit. My CUDA install is in a non-standard location (sadly, I have no control over that). I thought there used to be something like CUDAHOME or CUDA_HOME that OpenMM used, but I don't see it referenced anywhere.
During the install I don't think I had to specify where CUDA was; I just gave it the Python path and the install path.
- Peter Eastman
- Posts: 2610
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Current State of OpenMM on Kepler
Set OPENMM_CUDA_COMPILER to point to nvcc. See section 11.2 of the manual for details.
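You can export it in your shell before launching your script, or set it from Python before the Context/Simulation is created, since that's when OpenMM compiles its CUDA kernels. A sketch (the nvcc path below is just an example; use wherever nvcc actually lives on your system):

import os

# Point OpenMM at a non-standard nvcc; must happen before Context creation
os.environ["OPENMM_CUDA_COMPILER"] = "/opt/cuda-5.0/bin/nvcc"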
Peter
- Aron Broom
- Posts: 54
- Joined: Tue Mar 13, 2012 11:33 am
Re: Current State of OpenMM on Kepler
Thanks, that worked great.
So it seems that, averaged across explicit solvent simulations of various sizes plus some implicit solvent and vacuum runs, the K20 is ~2.5x faster than an M2070 (I'll assume the K20X is about that much faster than an M2090).
That's fantastic!!!
Just for people's information, here are my benchmark speeds for some systems on a K20:
All runs used a 2 fs timestep with HBonds constrained and a Langevin integrator
All explicit solvent simulations used PME with a 10 angstrom cutoff, run in the NVT ensemble
The implicit solvent simulation used OBC2 with no cutoff
The vacuum simulation used no cutoff
(A rough sketch of the setup follows the numbers below.)
Explicit solvent, 150 amino acid protein in water with ions, total system size 30k atoms: 21.84 ns/day
Explicit solvent, 150 amino acid protein in water with ions, total system size 50k atoms: 14.32 ns/day
Explicit solvent, 150 amino acid protein in water with ions, total system size 100k atoms: 6.83 ns/day
Implicit solvent, 150 amino acid protein (~2500 atoms): 171.06 ns/day
Vacuum, 150 amino acid protein (~2500 atoms): 585.88 ns/day
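In case it's useful for reproducing a comparable setup, here's a rough sketch of how the explicit solvent runs were configured (the input file name, force field choice, and 300 K temperature are illustrative, not necessarily the exact ones I used):

from simtk.openmm import LangevinIntegrator, Platform
from simtk.openmm import app
from simtk.unit import kelvin, picosecond, femtoseconds, angstroms

pdb = app.PDBFile("protein_solvated.pdb")  # hypothetical input file
forcefield = app.ForceField("amber99sb.xml", "tip3p.xml")

# PME with a 10 angstrom cutoff, bonds to hydrogen constrained
system = forcefield.createSystem(pdb.topology,
                                 nonbondedMethod=app.PME,
                                 nonbondedCutoff=10*angstroms,
                                 constraints=app.HBonds)

# Langevin dynamics with a 2 fs timestep (300 K is an assumed temperature)
integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 2*femtoseconds)

platform = Platform.getPlatformByName("CUDA")
simulation = app.Simulation(pdb.topology, system, integrator, platform)
simulation.context.setPositions(pdb.positions)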