OpenMM benchmark on multiple RTX GPUs?

Ilias Hurley
Posts: 2
Joined: Fri Feb 16, 2024 12:52 pm

OpenMM benchmark on multiple RTX GPUs?

Post by Ilias Hurley » Tue Apr 02, 2024 12:10 pm

Hi,

I've been looking for a GPU setup that would make microsecond-scale simulations of my 100,000-atom system feasible in a reasonable amount of time. In the benchmark tests on the OpenMM website, I saw that an array of 4x A100s is not much faster than a single RTX 4090 on a system of similar size to mine. Has anyone tried running OpenMM on an array of multiple RTX 4090s (or similar)?

Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: OpenMM benchmark on multiple RTX GPUs?

Post by Peter Eastman » Tue Apr 02, 2024 12:24 pm

You're better off just using a single GPU. For one thing, 100,000 atoms isn't really that large. It probably doesn't provide enough work to saturate multiple 4090s. For another thing, the performance of multi-GPU simulations depends strongly on the speed of communication between GPUs. All consumer GPUs use PCIe, which isn't that fast. The A100 benchmark used NVLink for communication, which is much faster.
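
In case it's useful, here is a minimal sketch of how you would pin a run to a single GPU with the CUDA platform (the input file and force field choice are just placeholders); a multi-GPU run would only differ in the DeviceIndex property:

Code: Select all

    import openmm as mm
    import openmm.app as app
    import openmm.unit as unit

    # Build a typical explicit-solvent system (input file and force field are placeholders)
    pdb = app.PDBFile('input.pdb')
    forcefield = app.ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
    system = forcefield.createSystem(pdb.topology, nonbondedMethod=app.PME,
                                     nonbondedCutoff=1.0*unit.nanometer,
                                     constraints=app.HBonds)
    integrator = mm.LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond,
                                             0.004*unit.picoseconds)

    platform = mm.Platform.getPlatformByName('CUDA')
    # Single GPU: usually the fastest choice for ~100,000 atoms
    properties = {'DeviceIndex': '0', 'Precision': 'mixed'}
    # A multi-GPU run would list several indices, e.g. {'DeviceIndex': '0,1'},
    # but over PCIe the communication cost tends to eat up the extra compute.

    simulation = app.Simulation(pdb.topology, system, integrator, platform, properties)
    simulation.context.setPositions(pdb.positions)
    simulation.minimizeEnergy()
    simulation.step(10000)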

The benchmark page has numbers for ApoA1, which is a similar size (92,224 atoms). Running on a single 4090, it gets 784 ns/day.
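
If you want to check the throughput on your own hardware, a quick way (just a sketch; `simulation` is assumed to be set up as in the snippet above) is to time a short stretch of dynamics and convert wall-clock time to ns/day:

Code: Select all

    import time
    import openmm.unit as unit

    def estimate_ns_per_day(simulation, steps=5000):
        """Time a short stretch of MD and report throughput in ns/day."""
        dt = simulation.integrator.getStepSize()      # step size, e.g. 4 fs
        simulation.step(100)                          # warm up (kernel compilation, etc.)
        start = time.time()
        simulation.step(steps)
        elapsed = time.time() - start                 # wall-clock seconds
        simulated_ns = (steps * dt).value_in_unit(unit.nanosecond)
        return simulated_ns * 86400.0 / elapsed       # ns of MD per day of wall time

    print(f'{estimate_ns_per_day(simulation):.0f} ns/day')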

Ilias Hurley
Posts: 2
Joined: Fri Feb 16, 2024 12:52 pm

Re: OpenMM benchmark on multiple RTX GPUs?

Post by Ilias Hurley » Thu Apr 11, 2024 8:56 am

Hi Peter,

Thank you for the detailed answer! Has OpenMM been optimized for Tensor Cores? If not, are there plans to do so? What would it take to optimize it?

Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: OpenMM benchmark on multiple RTX GPUs?

Post by Peter Eastman » Thu Apr 11, 2024 9:11 am

No, it does not use tensor cores. We've looked into it, but there isn't really anything they would be useful for. They have a couple of problems.

1. Really all they do is matrix multiplication. They aren't useful for anything that doesn't look like a matrix multiplication.

2. They don't support 32-bit floating point, only reduced-precision modes (and, curiously, double precision, but that would make it much slower).

There aren't many operations in MD that look like matrix multiplication, there aren't many operations where reduced precision is good enough, and I haven't found any where both of those are true at once.
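
For what it's worth, the precision OpenMM does use is set per platform: on CUDA the 'Precision' property can be 'single', 'mixed' (single-precision forces with double-precision integration), or 'double'. A quick way to see what a context is actually running with (assuming a Simulation already created on the CUDA platform, as in the earlier sketch):

Code: Select all

    # Query the precision mode of an existing CUDA context
    # (illustrative; `simulation` is assumed to exist already).
    platform = simulation.context.getPlatform()
    print(platform.getName(),
          platform.getPropertyValue(simulation.context, 'Precision'))  # e.g. 'CUDA mixed'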
