Relative performance of various GPUs
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Relative performance of various GPUs
Thanks a lot! It looks like the speedup is very significant, and the upgrade is definitely worth it.
Max
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Relative performance of various GPUs
On a GTX 580 I'm getting 28 ns/day (CUDA 5.5, OpenMM 6.2).
On a GTX 680 I'm getting 48 ns/day (Zotac) and 53 ns/day (EVGA Superclocked) (CUDA 6.5, OpenMM 6.2).
- Lee-Ping Wang
- Posts: 102
- Joined: Sun Jun 19, 2011 5:14 pm
Re: Relative performance of various GPUs
Thanks, Peter, for the writeup; it covers most of my concerns and more.
What are the different ways to define energy drift? I have seen numbers reported as a percentage of the kinetic energy over a time interval, like in this article: http://dx.doi.org/10.1063/1.478995
I think it's also helpful to have a "minimal" version of the benchmark if possible, something that I could run in 2 minutes. It's impractical to run the full suite half-casually like I did this morning.
- Peter Eastman
- Posts: 2593
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Relative performance of various GPUs
Any number reported "per picosecond" is meaningless, because energy drift is not linear in time. It also is not linear in the number of degrees of freedom, so dividing it by the kinetic energy will not give anything meaningful.
Peter
- Lee-Ping Wang
- Posts: 102
- Joined: Sun Jun 19, 2011 5:14 pm
Re: Relative performance of various GPUs
What's a reasonable number, then? How about simply reporting a number in kJ/mol over some time interval we choose (100 ps, for example)?
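For concreteness, a minimal sketch of how such a number could be measured with the Python API, assuming an existing Simulation object named simulation that is already set up for a constant-energy (NVE) run with a 2 fs timestep; the names and values are illustrative, not part of any agreed benchmark.

Code:
from simtk import unit  # exposed as openmm.unit in newer releases

def total_energy(sim):
    # Potential + kinetic energy of the current state.
    state = sim.context.getState(getEnergy=True)
    return state.getPotentialEnergy() + state.getKineticEnergy()

interval_ps = 100          # reporting interval proposed above
timestep_ps = 0.002        # assumes a 2 fs integration timestep

e_start = total_energy(simulation)
simulation.step(int(interval_ps / timestep_ps))
e_end = total_energy(simulation)

drift = (e_end - e_start).value_in_unit(unit.kilojoule_per_mole)
print('Total energy drift over %d ps: %.3f kJ/mol' % (interval_ps, drift))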
- Lee-Ping
- Peter Eastman
- Posts: 2593
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Relative performance of various GPUs
There's no unique answer to that. It all depends on the details of your simulation and what you hope to learn from it. A given change in energy will have more effect on a small system than a large one. Constant temperature simulations are much less affected by drift than constant energy simulations. Some quantities are very sensitive to small changes in temperature (and hence energy), while others are much less sensitive to it.
Peter
- Lee-Ping Wang
- Posts: 102
- Joined: Sun Jun 19, 2011 5:14 pm
Re: Relative performance of various GPUs
I agree. I was mainly asking what would be a reasonable unit system for reporting energy drift, since I hardly ever see this number being quantitatively reported. Once we decide on the unit system, we can then try to set reasonable values for constant temperature and constant energy simulations.
- Lee-Ping
- Mark Williamson
- Posts: 31
- Joined: Tue Feb 10, 2009 5:06 pm
Re: Relative performance of various GPUs
Dear Lee-Ping,

leeping wrote:
Hi Mark,
I found that your benchmark reported performance that was almost 40% lower than the benchmark included with the OpenMM source, which also uses the JAC system. I found out where the difference was coming from - mainly the PME settings and the reporters.
....
Thus, I would recommend making the above changes to make things consistent with OpenMM's benchmark.
Thanks,
- Lee-Ping
Thank you for the feedback; I did not realise that this was going to generate so much discussion.
As Peter has mentioned in this thread, the area of benchmarking is really quite complex and, even with the best of objective intentions, can unfortunately drift into a political realm. I've had this benchmark internally for a while and I've tried to be as fair as possible; this feedback and discussion is very useful. The performance improvements in OpenMM over time have been exciting, and my aim here has been to ensure that the two calculations within that repository (which I run on the same machine) are as similar as possible.
If I remove the two reporters from the OpenMM calculation, the performance increases from ~56 ns/day to ~73 ns/day, which is great. However, if you look at the associated AMBER mdin file, the parameters ntpr=1000 and ntwx=1000 are essentially acting as the StateDataReporter and PDBReporter within pmemd.cuda, hence turning these off in OpenMM would not be a fair comparison. It seems that the PDBReporter is a bit slow, and it may be fairer to use one of Jason Swails' mdcrd reporters instead. What do you think?
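For what it's worth, a rough sketch of what a closer match might look like, using OpenMM's own binary DCDReporter in place of PDBReporter, with report intervals of 1000 steps to mirror ntpr=1000 and ntwx=1000; the output file name and the choice of DCD rather than an Amber-format trajectory are only illustrative assumptions.

Code:
from sys import stdout
from simtk.openmm.app import StateDataReporter, DCDReporter

# ntpr=1000 -> print energies/temperature every 1000 steps
# ntwx=1000 -> write coordinates every 1000 steps (binary DCD instead of PDB text)
simulation.reporters.append(StateDataReporter(stdout, 1000, step=True,
                                              potentialEnergy=True,
                                              kineticEnergy=True,
                                              totalEnergy=True,
                                              temperature=True))
simulation.reporters.append(DCDReporter('benchmark.dcd', 1000))  # illustrative file name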
With respect to precision, single precision is not a fair comparison here since pmemd.cuda is running in the SPDP mode, which, to my knowledge, is the same as OpenMM's mixed precision mode.
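For reference, this is roughly how mixed precision is requested on the CUDA platform in the Python API; note that the platform property was called 'CudaPrecision' in the OpenMM 6.x series and 'Precision' in later releases, and the prmtop, system and integrator objects are assumed to exist already.

Code:
from simtk.openmm import Platform
from simtk.openmm.app import Simulation

# Mixed precision: single-precision forces with double-precision accumulation,
# broadly comparable to pmemd.cuda's SPDP mode.
platform = Platform.getPlatformByName('CUDA')
properties = {'CudaPrecision': 'mixed'}  # 'Precision' in newer OpenMM releases

simulation = Simulation(prmtop.topology, system, integrator, platform, properties)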
With respect to the cutoff: within the mdin file this is set to 8 A (cut=8), hence I have set it to the same value in OpenMM. If I set it to 9 A, performance increases from ~56 ns/day to ~60 ns/day. Walker has variants of the JAC benchmark, and I'm specifically using JAC_production_NVE here (http://ambermd.org/amber10.bench1.html#jac), which has this set to 8 A (I initially started this with an interest in energy conservation). So, yes, this is technically not the "JAC" benchmark, but within the context of the two runs in the repository they are, IMHO, a fair comparison.
Relating to this, and returning to another important point from Peter: I think the community needs to agree on a set of standard, modern benchmarks, but I can empathise with the difficulty of persuading people from all camps to decide on this. It is difficult, and I think Peter's attached document provides a solid foundation for moving towards such a goal.
Thanks,
Mark
- Peter Eastman
- Posts: 2593
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Relative performance of various GPUs
PME parameters are set completely differently in OpenMM than in AMBER. Making one parameter (direct space cutoff) match while all the other relevant parameters (grid size, spline order, alpha) are different is not a meaningful comparison. You need to adjust all of them at once to give optimal performance while maintaining the same level of accuracy.
OpenMM tries to make this somewhat easy by letting you explicitly specify the accuracy you want (ewaldErrorTolerance). You then adjust the direct space cutoff, and it automatically adjusts the grid dimensions and alpha appropriately. So changing the direct space cutoff only affects performance, not accuracy. When you increase it from 8 to 9 A, OpenMM automatically responds by decreasing the grid size. In this case, that leads to a net performance improvement.
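As an illustration of this workflow (not taken from either benchmark script), the system might be built roughly as follows; the file name, cutoff and tolerance values are only examples.

Code:
from simtk import unit
from simtk.openmm.app import AmberPrmtopFile, PME

prmtop = AmberPrmtopFile('JAC.prmtop')  # illustrative file name

# Specify the accuracy target and a direct-space cutoff; OpenMM chooses the
# PME grid dimensions and alpha to meet the tolerance, so changing the cutoff
# shifts work between direct and reciprocal space without changing accuracy.
system = prmtop.createSystem(nonbondedMethod=PME,
                             nonbondedCutoff=0.9*unit.nanometer,
                             ewaldErrorTolerance=5e-4)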
Peter
- Mark Williamson
- Posts: 31
- Joined: Tue Feb 10, 2009 5:06 pm
Re: Relative performance of various GPUs
peastman wrote:
Making one parameter (direct space cutoff) match while all the other relevant parameters (grid size, spline order, alpha) are different is not a meaningful comparison. You need to adjust all of them at once to give optimal performance while maintaining the same level of accuracy.

Agreed; hence, given the AMBER mdout file present, I assume the following would be correct?
Code:
# From the AMBER mdout: NFFT1 = NFFT2 = NFFT3 = 64, Ewald Coefficient = 0.39467 A^-1.
# OpenMM expects alpha in nm^-1, so 0.39467 A^-1 corresponds to 3.9467 nm^-1.
for i in range(system.getNumForces()):
    force = system.getForce(i)
    if isinstance(force, openmm.NonbondedForce):
        force.setPMEParameters(3.9467, 64, 64, 64)
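As a follow-up sanity check, and assuming a recent enough OpenMM release (getPMEParametersInContext was not available in early versions) plus an existing Simulation object named simulation, one could print the values actually in use once a Context exists:

Code:
# Report the alpha and grid dimensions actually used inside the Context.
for i in range(system.getNumForces()):
    force = system.getForce(i)
    if isinstance(force, openmm.NonbondedForce):
        alpha, nx, ny, nz = force.getPMEParametersInContext(simulation.context)
        print('alpha = %s, grid = %d x %d x %d' % (alpha, nx, ny, nz))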