Hi Peter,
We have been doing polymer simulations using OpenMM for many years now. We typically have fairly simple systems: 50k to 400k particles connected by harmonic bonds, a simple repulsive potential between them, a variable Langevin integrator, and PBC or external confinement. Our polymers are typically at a high volume density.
Every time we upgraded GPUs we typically saw a large performance increase. This time I built a new machine for testing, with 4x 3080 GPUs, and I only found a modest increase (maybe 1.5x compared to 1080 Ti) even at large system sizes (200k or 400k particles).
Is this consistent with your expectations? Do you think we have reached a bottleneck where a stronger GPU does not make simulations faster because we are using very simplistic potentials? I would be interested in your opinion and in potential troubleshooting steps. I'm using the most recent OpenMM, which I just installed from conda-forge. The PCIe link is generation 3, x8 or x16. CPU utilization is at 40-50%.
Best,
Max
Slow performance on 3080 compared to 1080 Ti
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
- Peter Eastman
- Posts: 2588
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Slow performance on 3080 compared to 1080 Ti
200k particles ought to be enough to keep even a 3080 busy. That suggests something else is the bottleneck. Some possibilities are PCIe bandwidth if you're retrieving results too frequently, disk I/O if you're saving results too frequently, slow operations taking place on the CPU that starve the GPU, etc.
Can you profile it to see what's happening on the GPU, for example with Nsight Systems? I would look for fluctuations where the GPU is busy for a while, then idle for a while. Depending on the time scale, you might even be able to see the GPU utilization fluctuating with nvidia-smi.
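As a rough first check before a full Nsight Systems profile, the nvidia-smi suggestion above can be scripted. The sketch below polls `nvidia-smi --query-gpu=index,utilization.gpu` and reports the minimum, maximum, and mean utilization per GPU. The function names and the sampling interval are illustrative assumptions, and nvidia-smi's utilization counter is itself averaged over a sampling window, so short idle gaps may not show up.

```python
import subprocess
import time

# Standard nvidia-smi query flags; the output is one "index, utilization" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Turn lines such as '0, 84' into {gpu_index: utilization_percent}."""
    readings = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        readings[int(index)] = int(util)
    return readings


def sample_utilization(n_samples=60, interval_s=0.5):
    """Poll nvidia-smi n_samples times and collect a utilization history per GPU."""
    history = {}
    for _ in range(n_samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for gpu, util in parse_utilization(out).items():
            history.setdefault(gpu, []).append(util)
        time.sleep(interval_s)
    return history


def summarize(history):
    """Return {gpu_index: (min, max, mean)} for the collected samples."""
    return {gpu: (min(v), max(v), sum(v) / len(v)) for gpu, v in history.items()}


# Example (run while the simulation is going):
#   for gpu, (lo, hi, mean) in summarize(sample_utilization()).items():
#       print(f"GPU {gpu}: min {lo}%  max {hi}%  mean {mean:.1f}%")
```

A wide gap between the minimum and maximum would be consistent with the busy/idle pattern described above, but only a timeline profiler can show what the GPU is waiting on.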
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Slow performance on 3080 compared to 1080 Ti
Hi Peter! Thanks for your advice.
Before I dive deeply into this, I wanted to share some general stats with you.
Running a simulation with 500k particles at a volume density of 50%, I'm getting 200 timesteps per second on the 3080 and 140 timesteps per second on the 1080 Ti. Interestingly, the 1080 Ti shows 100% GPU utilization in nvidia-smi (as I have always seen in our simulations, even with 10k-particle systems), while the 3080 alternates between 83-84% and 94% utilization. On a smaller 10k-particle system I saw utilization of 80% or so on the 3080.
I run the simulation in blocks of 3000 timesteps, so it is definitely not limited by PCIe bandwidth.
I'm assuming that the non-100%-utilization is something that may need to be investigated further.
Do these numbers make sense to you? And how much faster would you expect the 3080 to be compared to the 1080 Ti?
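For comparing cards on equal terms, the block-based measurement described above could be wrapped in a small timing helper along these lines. This is only a sketch: `steps_per_second`, `run_block`, and `sync` are hypothetical names, not part of OpenMM. The `sync` callback is there because OpenMM queues GPU work asynchronously, so the timer should only stop after something forces the GPU to finish, such as retrieving a state.

```python
import time


def steps_per_second(run_block, sync, steps_per_block=3000, n_blocks=5):
    """Time n_blocks blocks of steps and return the rate of each block.

    run_block(n) advances the simulation by n steps.
    sync() blocks until the GPU has finished all queued work.
    """
    rates = []
    sync()  # make sure earlier work does not leak into the first block
    for _ in range(n_blocks):
        start = time.perf_counter()
        run_block(steps_per_block)
        sync()
        elapsed = time.perf_counter() - start
        rates.append(steps_per_block / elapsed)
    return rates


# With an existing OpenMM Context and integrator (assumed to be set up elsewhere):
#   rates = steps_per_second(
#       integrator.step,
#       lambda: context.getState(getEnergy=True),
#   )
#   print(min(rates), max(rates))
```

Reporting per-block rates rather than one average makes it easier to spot a slow first block (kernel compilation, neighbor-list setup) or periodic slowdowns.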
- Peter Eastman
- Posts: 2588
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Slow performance on 3080 compared to 1080 Ti
I don't have any direct experience with either 1080 Ti or 3080. The lower utilization does suggest something else is slowing it down. That would be the place to start investigating.
Just looking at the official specs for the two cards, 3080 has theoretical performance for single precision that's more than 2x 1080 Ti, but the difference is much smaller for double precision. What precision mode do you use?
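For readers following along, the precision mode of the CUDA platform is chosen through a platform property when the Context is created. The sketch below assumes a recent OpenMM where the property is named `Precision` and the package is imported as `openmm`; older releases used `CudaPrecision` and `from simtk import openmm`. The helper names `cuda_properties` and `make_context` are made up for illustration.

```python
VALID_PRECISIONS = ("single", "mixed", "double")


def cuda_properties(precision="mixed", device_index="0"):
    """Build the platform property dict for the CUDA platform.

    Property values are strings. 'Precision' and 'DeviceIndex' are the names
    used by recent OpenMM releases (an assumption about the installed version).
    """
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"precision must be one of {VALID_PRECISIONS}")
    return {"Precision": precision, "DeviceIndex": str(device_index)}


def make_context(system, integrator, precision="mixed", device_index="0"):
    """Create a Context on the CUDA platform with the requested precision."""
    import openmm  # imported lazily; older installs: from simtk import openmm

    platform = openmm.Platform.getPlatformByName("CUDA")
    properties = cuda_properties(precision, device_index)
    return openmm.Context(system, integrator, platform, properties)


# Example, given an existing System and integrator:
#   context = make_context(system, integrator, precision="single")
```

In mixed mode OpenMM computes forces in single precision but integrates in double precision, so part of every step still depends on the card's double precision throughput.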
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Slow performance on 3080 compared to 1080 Ti
I've been using "mixed" precision and have been assuming it's best since it was introduced.
Now I'm testing it, and I'm finding that single precision is visibly faster on the 3080 (260 steps per second compared to 200 with mixed!). The sub-100% utilization is still present, though. On the 1080 Ti, however, switching from mixed to single makes essentially no difference in speed (140 -> 144 maybe).
Overall, 260 vs 143 is more consistent with my expectations, and if the ~90% utilization could be brought up to 100%, the 3080 would be almost exactly 2x faster, which would make sense.
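The arithmetic behind that estimate can be written out as below. It assumes throughput scales linearly with the utilization reported by nvidia-smi, which is a simplification rather than something measured in this thread.

```python
def speedup(rate_new, rate_old):
    """Ratio of two throughputs in steps per second."""
    return rate_new / rate_old


def projected_speedup(rate_new, rate_old, utilization):
    """Speedup if the new card ran at 100% instead of `utilization`.

    Assumes throughput is proportional to utilization (a simplification).
    """
    return speedup(rate_new, rate_old) / utilization


# Numbers quoted in the post: 260 steps/s (3080, single) vs 143 steps/s (1080 Ti).
observed = speedup(260.0, 143.0)
projected = projected_speedup(260.0, 143.0, 0.90)
print(f"observed: {observed:.2f}x, projected at full utilization: {projected:.2f}x")
```

With the quoted figures this gives about 1.82x observed and about 2.02x projected.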
- Peter Eastman
- Posts: 2588
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Slow performance on 3080 compared to 1080 Ti
NVIDIA's consumer GPUs have terrible double precision performance, and the latest generation made it even worse. The 1080 has a 32:1 ratio of single-to-double precision performance. That's what it had been for several generations. In the 3080 it's 64:1.
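To see what those ratios imply, one can estimate peak double precision throughput from the single precision figure. The single precision numbers below (roughly 11.3 TFLOPS for the 1080 Ti and 29.8 TFLOPS for the 3080) are approximate spec-sheet values that were not given in this thread and should be treated as assumptions. The 32:1 ratio is applied to the 1080 Ti on the assumption that it matches the 1080.

```python
def peak_fp64_tflops(fp32_tflops, single_to_double_ratio):
    """Estimate peak double precision throughput from the FP32 peak and the ratio."""
    return fp32_tflops / single_to_double_ratio


# Approximate published peak FP32 figures (assumptions, not from the thread).
FP32_1080TI = 11.3
FP32_3080 = 29.8

fp64_1080ti = peak_fp64_tflops(FP32_1080TI, 32)  # 32:1 ratio
fp64_3080 = peak_fp64_tflops(FP32_3080, 64)      # 64:1 ratio

fp32_gain = FP32_3080 / FP32_1080TI
fp64_gain = fp64_3080 / fp64_1080ti
print(f"FP32 gain: {fp32_gain:.2f}x, FP64 gain: {fp64_gain:.2f}x")
```

Under these assumptions the 3080 has a bit more than 2.6x the single precision peak but only about 1.3x the double precision peak, which is in line with mixed precision gaining much less from the new card than single precision does.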