CPU overhead

Saugat Kandel
Posts: 7
Joined: Wed Nov 05, 2014 4:50 pm

CPU overhead

Post by Saugat Kandel » Thu Apr 23, 2015 2:58 pm

When I run OpenMM on the CPU platform, I find that it routinely exceeds the CPU usage I set using the OPENMM_CPU_THREADS environment variable. For the included simulateAmber.py example, when I set OPENMM_CPU_THREADS=1, the CPU usage (from the top command) is usually between 130% and 150%. With 2 threads, the CPU usage hovers at around 250%, and so on. This becomes an issue when I try to run multiple instances of OpenMM on the same node, particularly in a cluster environment. Is this a problem on my end, or a general issue with OpenMM? Do you have any recommendations on how to limit the CPU usage?

In case it helps, I am running CentOS 6.6, with gcc version 4.8.4, on Intel Xeon E5620 CPUs.
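
For reference, here is roughly how I am launching the example and watching the usage (just a minimal sketch; the script is the simulateAmber.py that ships with OpenMM):

    # limit the CPU platform to one worker thread
    export OPENMM_CPU_THREADS=1
    python simulateAmber.py &

    # in a second terminal, watch the per-process CPU usage;
    # this is where I see roughly 130-150% instead of ~100%
    top -p $(pgrep -f simulateAmber.py)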

Peter Eastman
Posts: 2543
Joined: Thu Aug 09, 2007 1:25 pm

Re: CPU overhead

Post by Peter Eastman » Mon Apr 27, 2015 3:03 pm

Is this actually a problem? Operating systems are pretty good at multitasking. If you run multiple simulations at once, with the numbers of threads adding up to a reasonable total based on your number of cores, they should have no trouble sharing the system.
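
For instance, on a node with eight cores, something like the sketch below should work fine (the thread counts are just chosen to add up to the core count, and the log file names are placeholders):

    # two independent simulations sharing one node; the requested
    # thread counts add up to the number of cores
    OPENMM_CPU_THREADS=4 python simulateAmber.py > run1.log &
    OPENMM_CPU_THREADS=4 python simulateAmber.py > run2.log &
    wait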

Peter

Saugat Kandel
Posts: 7
Joined: Wed Nov 05, 2014 4:50 pm

Re: CPU overhead

Post by Saugat Kandel » Fri May 01, 2015 11:34 am

Apologies for the late reply.

This has been a problem when running simulations on computing clusters. If I set OPENMM_CPU_THREADS=1 and then assign each simulation one CPU in my submit script, the OpenMM CPU overhead affects all the other jobs running on the same node, making some CPU threads entirely unavailable. If I assign each simulation two CPUs, but keep OPENMM_CPU_THREADS=1, then the node is undertasked and resources are wasted.

Basically, the CPU overhead makes it difficult to assign the proper number of CPU threads to each simulation. While this is not a problem when I run the simulations on a desktop system, it becomes a problem when I submit simulations to computing clusters, where every CPU thread has to be accounted for.
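
To make this concrete, my submit script looks roughly like the sketch below (the parallel environment name "smp" and the job name are placeholders, not the exact ones from our cluster):

    #!/bin/bash
    #$ -N openmm_run    # job name (placeholder)
    #$ -pe smp 1        # request one slot/core from SGE ("smp" is a placeholder PE name)
    #$ -cwd

    # match the OpenMM thread count to the single slot requested above
    export OPENMM_CPU_THREADS=1
    python simulateAmber.py

Even with the two numbers matched like this, top reports well over 100% for the job, so the other slots on the node end up oversubscribed.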

Peter Eastman
Posts: 2543
Joined: Thu Aug 09, 2007 1:25 pm

Re: CPU overhead

Post by Peter Eastman » Fri May 01, 2015 12:02 pm

I don't think I've ever encountered an environment like that, and I'm not really sure how to deal with it. OpenMM creates many threads for many different purposes. At any given time the majority of them are sleeping, and OpenMM tries to limit how many are active at once, but there are no hard guarantees.

I'm still having trouble believing a system would actually act like you describe. Certainly, it's common to allocate specific cores to specific jobs, but that shouldn't cause problems. If a job only has one core allocated to it but has multiple active threads, that just means those threads will get timesliced on the one core. But requiring a unique core for every thread? What kind of system is this?
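
If the scheduler (or you) really did pin the job to a specific core, the extra threads would simply share that core. For example, with the standard Linux taskset tool (a sketch; the core number is arbitrary):

    # pin the simulation, including any helper threads OpenMM creates,
    # to logical core 0; the threads get timesliced on that one core
    export OPENMM_CPU_THREADS=1
    taskset -c 0 python simulateAmber.py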

Peter

Lee-Ping Wang
Posts: 102
Joined: Sun Jun 19, 2011 5:14 pm

Re: CPU overhead

Post by Lee-Ping Wang » Fri May 01, 2015 12:46 pm

Hi Peter,

I don't know all the nuts and bolts of how common batch systems work (e.g. PBS, SGE, Slurm), but my impression is that they do not actually restrict the number of cores a job uses; it's up to the user to keep the job within its requested allocation. This is distinct from other limits such as memory, where SGE will kill the job if its memory usage is too high.

In simulation programs like GROMACS or TINKER, when the user specifies the number of threads, the program appears to occupy exactly that many cores. At least when looking at "top", the CPU percentage is very close to 100% for one core, 200% for two cores, and so on. I think this behavior is fairly common.
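
With GROMACS, for example, the thread count goes on the command line, and top then sits right at that many cores (a sketch; "myrun" is just a placeholder file prefix):

    # ask mdrun for two threads in total; top then shows about 200%
    gmx mdrun -nt 2 -deffnm myrun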

Thanks,

- Lee-Ping

Saugat Kandel
Posts: 7
Joined: Wed Nov 05, 2014 4:50 pm

Re: CPU overhead

Post by Saugat Kandel » Fri May 01, 2015 1:53 pm

Hi Lee-Ping,

I am working with an SGE submit system, and my experience has been exactly as you describe. I can submit a job and assign one core to it. SGE then counts that job as using one core in its accounting and assumes that all the other cores are free. However, SGE does not actually restrict the job to only one core. In effect, this means that if the job internally uses more than 100% of a CPU, there is a mismatch between the cores *actually* available and the cores that SGE thinks are available. I don't really know how to get around this.

Peter Eastman
Posts: 2543
Joined: Thu Aug 09, 2007 1:25 pm

Re: CPU overhead

Post by Peter Eastman » Fri May 01, 2015 2:56 pm

I just committed a change that will make it do a better job of using the specified number of cores. Still no hard guarantees, but it should be closer than it was before.

Peter
