CPU overhead

Saugat Kandel
Posts: 7
Joined: Wed Nov 05, 2014 4:50 pm

CPU overhead

Post by Saugat Kandel » Thu Apr 23, 2015 2:58 pm

When I run OpenMM on the CPU platform, I find that it routinely exceeds the CPU usage I set using the OPENMM_CPU_THREADS environment variable. For the included simulateAmber.py example, when I set OPENMM_CPU_THREADS=1, the CPU usage (from the top command) is usually between 130% and 150%. With 2 threads, the CPU usage hovers at around 250%, and so on. This becomes an issue when I try to run multiple instances of OpenMM on the same node, particularly in a cluster environment. Is this a problem on my end, or a general issue with OpenMM? Do you have any recommendations on how to limit the CPU usage?

In case it helps, I am running CentOS 6.6, with gcc version 4.8.4, on Intel Xeon E5620 CPUs.
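
For reference, here is roughly how I am launching the example and watching the usage (just a minimal sketch; the script is the simulateAmber.py that ships with OpenMM):

    # limit the CPU platform to one worker thread
    export OPENMM_CPU_THREADS=1
    python simulateAmber.py &

    # in a second terminal, watch the per-process CPU usage;
    # this is where I see roughly 130-150% instead of ~100%
    top -p $(pgrep -f simulateAmber.py)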

Peter Eastman
Posts: 2543
Joined: Thu Aug 09, 2007 1:25 pm

Re: CPU overhead

Post by Peter Eastman » Mon Apr 27, 2015 3:03 pm

Is this actually a problem? Operating systems are pretty good at multitasking. If you run multiple simulations at once, with the numbers of threads adding up to a reasonable total based on your number of cores, they should have no trouble sharing the system.
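
For instance, on a node with eight cores, something like the sketch below should work fine (the thread counts are just chosen to add up to the core count, and the log file names are placeholders):

    # two independent simulations sharing one node; the requested
    # thread counts add up to the number of cores
    OPENMM_CPU_THREADS=4 python simulateAmber.py > run1.log &
    OPENMM_CPU_THREADS=4 python simulateAmber.py > run2.log &
    wait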

Peter

Saugat Kandel
Posts: 7
Joined: Wed Nov 05, 2014 4:50 pm

Re: CPU overhead

Post by Saugat Kandel » Fri May 01, 2015 11:34 am

Apologies for the late reply.

This has been a problem when running simulations on computing clusters. If I set OPENMM_CPU_THREADS=1 and then assign each simulation one CPU in my submit script, the OpenMM CPU overhead affects all the other jobs running on the same node, making some CPU threads entirely unavailable. If I assign each simulation two CPUs, but keep OPENMM_CPU_THREADS=1, then the node is undertasked and resources are wasted.

Basically, the CPU overhead makes it difficult to assign the proper number of CPU threads to each simulation. While this is not a problem when I run the simulations on a desktop system, it becomes a problem when I submit simulations to computing clusters, where every CPU thread has to be accounted for.
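
To make this concrete, my submit script looks roughly like the sketch below (the parallel environment name "smp" and the job name are placeholders, not the exact ones from our cluster):

    #!/bin/bash
    #$ -N openmm_run    # job name (placeholder)
    #$ -pe smp 1        # request one slot/core from SGE ("smp" is a placeholder PE name)
    #$ -cwd

    # match the OpenMM thread count to the single slot requested above
    export OPENMM_CPU_THREADS=1
    python simulateAmber.py

Even with the two numbers matched like this, top reports well over 100% for the job, so the other slots on the node end up oversubscribed.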

Peter Eastman
Posts: 2543
Joined: Thu Aug 09, 2007 1:25 pm

Re: CPU overhead

Post by Peter Eastman » Fri May 01, 2015 12:02 pm

I don't think I've ever encountered an environment like that, and I'm not really sure how to deal with it. OpenMM creates many threads for many different purposes. At any given time the majority of them are sleeping, and OpenMM tries to limit how many are active at once, but there are no hard guarantees.

I'm still having trouble believing a system would actually act like you describe. Certainly, it's common to allocate specific cores to specific jobs, but that shouldn't cause problems. If a job only has one core allocated to it but has multiple active threads, that just means those threads will get timesliced on the one core. But requiring a unique core for every thread? What kind of system is this?
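
If the scheduler (or you) really did pin the job to a specific core, the extra threads would simply share that core. For example, with the standard Linux taskset tool (a sketch; the core number is arbitrary):

    # pin the simulation, including any helper threads OpenMM creates,
    # to logical core 0; the threads get timesliced on that one core
    export OPENMM_CPU_THREADS=1
    taskset -c 0 python simulateAmber.py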

Peter

Lee-Ping Wang
Posts: 102
Joined: Sun Jun 19, 2011 5:14 pm

Re: CPU overhead

Post by Lee-Ping Wang » Fri May 01, 2015 12:46 pm

Hi Peter,

I don't know all the nuts and bolts of how common batch systems work (e.g. PBS, SGE, Slurm), but my impression is that they do not actually restrict the number of cores a job uses; it's up to the user to keep the job within its requested allocation. This is distinct from other limits such as memory, where SGE will kill the job if its memory usage is too high.

In simulation programs like GROMACS or TINKER, when the user specifies the number of threads, the program appears to occupy exactly that many cores. At least when looking at "top", the CPU percentage is very close to 100% for one core, 200% for two cores, and so on. I think this behavior is fairly common.
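
With GROMACS, for example, the thread count goes on the command line, and top then sits right at that many cores (a sketch; "myrun" is just a placeholder file prefix):

    # ask mdrun for two threads in total; top then shows about 200%
    gmx mdrun -nt 2 -deffnm myrun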

Thanks,

- Lee-Ping

Saugat Kandel
Posts: 7
Joined: Wed Nov 05, 2014 4:50 pm

Re: CPU overhead

Post by Saugat Kandel » Fri May 01, 2015 1:53 pm

Hi Lee-Ping,

I am working with an SGE submit system, and my experience has been exactly as you describe. I can submit a job and assign one core to it. SGE then counts that job as using one core in its accounting and assumes that all the other cores are free. However, SGE does not actually restrict the job to only one core. In effect, this means that if the job internally uses more than 100% of a CPU, there is a mismatch between the cores *actually* available and the cores that SGE thinks are available. I don't really know how to get around this.

Peter Eastman
Posts: 2543
Joined: Thu Aug 09, 2007 1:25 pm

Re: CPU overhead

Post by Peter Eastman » Fri May 01, 2015 2:56 pm

I just committed a change that will make it do a better job of using the specified number of cores. Still no hard guarantees, but it should be closer than it was before.

Peter
