OpenMM benchmark on multiple RTX GPUs?
Hi,
I've been looking for a GPU setup that would make microsecond-scale simulations of my 100,000-atom system feasible in a reasonable amount of time. In the benchmarks on the OpenMM website, a 4x A100 array is not much faster than a single RTX 4090 on a system of similar size to mine. Has anyone tried running OpenMM on an array of multiple RTX 4090s (or similar)?
- Peter Eastman
- Posts: 2602
- Joined: Thu Aug 09, 2007 1:25 pm
Re: OpenMM benchmark on multiple RTX GPUs?
You're better off just using a single GPU. For one thing, 100,000 atoms isn't really that large. It probably doesn't provide enough work to saturate multiple 4090s. For another thing, the performance of multi-GPU simulations depends strongly on the speed of communication between GPUs. All consumer GPUs use PCIe, which isn't that fast. The A100 benchmark used NVLink for communication, which is much faster.
The benchmark page has numbers for ApoA1, which is a similar size (92,224 atoms). Running on a single 4090, it gets 784 ns/day.
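For reference, here is a minimal sketch of how GPU selection is expressed in the OpenMM Python API. The input file name, force field, and integrator settings are placeholders rather than anything taken from this thread; the relevant part is the `DeviceIndex` and `Precision` platform properties used to pin the run to one GPU (or, in principle, several).

```python
# Minimal sketch: pinning an OpenMM simulation to a single CUDA GPU.
# 'input.pdb' and the force field files are placeholders.
from openmm import LangevinMiddleIntegrator, Platform
from openmm.app import PDBFile, ForceField, Simulation, PME, HBonds
from openmm.unit import kelvin, picosecond, picoseconds

pdb = PDBFile('input.pdb')
forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME,
                                 constraints=HBonds)
integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond,
                                      0.002*picoseconds)

platform = Platform.getPlatformByName('CUDA')

# Single GPU: usually the fastest choice for ~100,000 atoms.
properties = {'DeviceIndex': '0', 'Precision': 'mixed'}

# Multiple GPUs are requested with a comma-separated list, e.g.
# properties = {'DeviceIndex': '0,1', 'Precision': 'mixed'}
# but over PCIe the extra communication tends to eat the gain.

simulation = Simulation(pdb.topology, system, integrator, platform, properties)
simulation.context.setPositions(pdb.positions)
simulation.step(1000)
```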
- Ilias Hurley
- Posts: 2
- Joined: Fri Feb 16, 2024 12:52 pm
Re: OpenMM benchmark on multiple RTX GPUs?
Hi Peter,
Thank you for the detailed answer! Has OpenMM been optimized for Tensor Cores? If not, are there plans to do so? What would it take to optimize it?
- Peter Eastman
- Posts: 2602
- Joined: Thu Aug 09, 2007 1:25 pm
Re: OpenMM benchmark on multiple RTX GPUs?
No, it does not use Tensor Cores. We've looked into it, but there isn't really anything they would be useful for. They have a couple of problems.
1. Really all they do is matrix multiplication. They aren't useful for anything that doesn't look like a matrix multiplication.
2. They don't support 32-bit floating point, just reduced-precision modes (and, curiously, double precision, but that would make it much slower).
There aren't a lot of operations in MD that look like matrix multiplication, and there aren't a lot of operations where reduced precision is good enough, and I haven't found any where both of those are true at once.
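As a concrete illustration of the precision point, the CUDA platform exposes exactly three precision modes, selected through the same platform properties as in the earlier sketch; none of them maps onto the reduced-precision formats discussed above. This is just a sketch of that setting, not anything specific to this thread:

```python
# Sketch: the precision modes offered by OpenMM's CUDA platform.
from openmm import Platform

platform = Platform.getPlatformByName('CUDA')

# 'single' - everything in fp32 (fastest, the default)
# 'mixed'  - forces in fp32, integration/accumulation in fp64 (common choice)
# 'double' - everything in fp64 (much slower on consumer GPUs)
properties = {'Precision': 'mixed'}
```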