Simulations with >100k particles.

Simulations with >100k particles.

Post by Maxim Imakaev » Mon Aug 13, 2012 6:55 am

Dear Peter,

I'm running simulations of long polymers (<170k particles) using the OpenCL platform.
I always thought I couldn't run larger simulations because of memory limits...
(For larger systems, the simulation segfaults while creating the context, after all forces have been added to the system.)

Surprisingly, I found that it is not because of memory size (I can run two 170k simulations in parallel).
Adding forces one by one showed that the bond force and the external force are responsible for this.


I can run a simulation of 300k particles, but only if it has fewer than ~200k bonds (the "maximum" number of bonds varied a bit).
Also, having only the external force makes it hang instead of segfaulting.

I tried splitting the bonds among multiple bond forces, but it didn't help.
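
By splitting I mean something along these lines (just a sketch with a placeholder chain length and placeholder bond parameters, not my actual script):

import simtk.openmm as mm
import simtk.unit as unit

N = 170000                 # particles in the chain (placeholder)
chunk = 50000              # bonds per HarmonicBondForce (arbitrary)

system = mm.System()
for i in range(N):
    system.addParticle(1.0 * unit.dalton)

# Spread the N-1 consecutive bonds over several HarmonicBondForce objects
# instead of putting them all into a single force.
for start in range(0, N - 1, chunk):
    force = mm.HarmonicBondForce()
    for i in range(start, min(start + chunk, N - 1)):
        force.addBond(i, i + 1, 0.1 * unit.nanometer,
                      1000.0 * unit.kilojoule_per_mole / unit.nanometer ** 2)
    system.addForce(force)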

Max

Re: Simulations with >100k particles.

Post by Peter Eastman » Fri Aug 17, 2012 11:06 am

Hi Max,

Could you provide a few more details about your situation?

1. How much memory does your GPU have?

2. What do you mean by "bondforce"? HarmonicBondForce? CustomBondForce? The combination of several different bonded forces?

3. What other forces are in your system?

4. If you have any sort of nonbonded force (NonbondedForce, CustomNonbondedForce, GBSAOBCForce, etc.), what nonbonded method does it use?

The memory requirements for bonded forces are usually fairly modest, unless they have a really large number of parameters.

Peter

Re: Simulations with >100k particles.

Post by Maxim Imakaev » Mon Aug 20, 2012 2:23 pm

Peter,

I have an Nvidia GTX 580 GPU with 1.5 GB of RAM.
I'm using CUDA 4.2 with the default Ubuntu 12.04 drivers, and OpenMM 4.1.1.
I use the OpenCL platform. It works on the Reference platform. On CUDA it runs out of memory for any configuration:
(Exception: InteractionFlag: cudaMalloc in CUDAStream::Allocate failed out of memory)

I studied the problem further and found exactly how it shows up.

My system is a polymer chain of 300000 particles.
It starts from a folded coil in space (not from a straight line).
At first, the only force I have is a harmonic bond force.

I add consecutive bonds: (0,1), (1,2), ..., (299998,299999) - segfault.
I add 99% of these bonds, selected randomly - it runs perfectly.

I add all bonds but the middle one: (0,1), ..., (149998,149999), (150000,150001), ..., (299998,299999) - it runs perfectly!

I go further: I omit only bond #250000 instead, and it segfaults.
Omitting bond #180000 - it segfaults.
#170000 - it runs.

I found that the boundary is somewhere between 174400 and 174550, and it's inconsistent within that range.
The number seems to be the same for both the 200k and the 300k system.

The same was true at the other end: with the break at bond 125000 it fails; at 130000 it runs.
To summarize: if there is a linearly connected subchain of at least ~175000 particles, it fails.

Surprisingly, when I created a chain that is broken in the middle (consecutive bonds 0...X and X+1...N-1, plus a bond between 0 and N-1), it ran for X=150000 and failed for X>175000
(even though it was in fact a single connected chain).

I don't know what's going on, but it seems there is some sort of calculation of clusters of particles connected by bonds.

I tried reshuffling the bonds (like 0...X1, X1...X2, X2...X3, etc.), but then the pattern was much more confusing; it generally allowed for longer clusters of bonded particles, though still not the entire chain.
I also tried adding the regular bonds (0...N-1) in a shuffled order, but I got the same result.
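
For concreteness, the bond-adding part of these experiments looks roughly like this (again only a sketch; the mass, bond length, and force constant are placeholders):

import simtk.openmm as mm
import simtk.unit as unit

N = 300000          # particles in the chain
skip = 149999       # index of the single bond to omit (the middle bond here)

system = mm.System()
for i in range(N):
    system.addParticle(1.0 * unit.dalton)

bonds = mm.HarmonicBondForce()
for i in range(N - 1):
    if i == skip:   # which bond is omitted decides whether the context segfaults
        continue
    bonds.addBond(i, i + 1, 0.1 * unit.nanometer,
                  1000.0 * unit.kilojoule_per_mole / unit.nanometer ** 2)
system.addForce(bonds)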

This seems to be the only problem. A system of 500k particles with a couple of external forces and a custom nonbonded force works fine!
However, something similar appears to be true for the angle force... I can look deeper into it.

Re: Simulations with >100k particles.

Post by Peter Eastman » Tue Aug 21, 2012 10:05 am

Using the OpenCL platform, does it run out of host memory or device memory? For CUDA you said that it runs out of device memory (the failure is in cudaMalloc). Where and how does it fail with OpenCL?

By any chance do you have constraints in your system? The device memory should simply scale with the number of bonds, regardless of whether they form a connected chain. (Host memory might be another matter, which is why I asked for that clarification.) But device memory can vary based on the number of connected constraints.

Peter

Re: Simulations with >100k particles.

Post by Maxim Imakaev » Thu Aug 23, 2012 8:04 am

The issue I described is only for the OpenCL platform.
The CUDA platform does not tolerate this number of particles at all, and gives me an out-of-memory error for any bond configuration.

No, I don't use any constraints.
When I observed this issue, the only things I used were a HarmonicBondForce and a LangevinIntegrator.

I can try to produce a minimal example that reproduces this issue.
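
Roughly, it would look like this (just a sketch, assuming the OpenMM Python API; the mass, bond parameters, and the random-walk starting coordinates are placeholders, not my production settings):

import random
import simtk.openmm as mm
import simtk.unit as unit

N = 300000                       # number of particles in the chain

system = mm.System()
for i in range(N):
    system.addParticle(1.0 * unit.dalton)

# One harmonic bond force holding the whole chain together.
bonds = mm.HarmonicBondForce()
for i in range(N - 1):
    bonds.addBond(i, i + 1, 0.1 * unit.nanometer,
                  1000.0 * unit.kilojoule_per_mole / unit.nanometer ** 2)
system.addForce(bonds)

integrator = mm.LangevinIntegrator(300 * unit.kelvin, 1.0 / unit.picosecond,
                                   0.002 * unit.picoseconds)
platform = mm.Platform.getPlatformByName("OpenCL")

# Start from a compact random walk (a folded coil), not a straight line.
positions = []
x = y = z = 0.0
for i in range(N):
    x += random.uniform(-0.06, 0.06)
    y += random.uniform(-0.06, 0.06)
    z += random.uniform(-0.06, 0.06)
    positions.append((x, y, z))

# The crash happens here, while the context is being created.
context = mm.Context(system, integrator, platform)
context.setPositions(positions * unit.nanometer)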

Max

Re: Simulations with >100k particles.

Post by Maxim Imakaev » Thu Aug 23, 2012 8:05 am

It's not a memory issue on the OpenCL platform.
When it works, I can easily run two simulations in parallel.

Re: Simulations with >100k particles.

Post by Peter Eastman » Thu Aug 23, 2012 9:58 am

With the OpenCL platform, is it running out of host or device memory? What is the exact behavior you're seeing? Does it produce an error message? If so, what does that message say?

Peter

Re: Simulations with >100k particles.

Post by Maxim Imakaev » Sat Sep 01, 2012 10:38 am

With OpenCL it silently segfaults; it does not print anything.
It is not running out of device memory, as far as I can tell from watching nvidia-smi.
It is definitely not running out of host memory.

It segfaults when I initialize the context.

Re: Simulations with >100k particles.

Post by Peter Eastman » Wed Sep 05, 2012 11:05 am

Ok, so the problem isn't actually about running out of memory. That's good to know.

Can you run your code inside gdb? When it hits the segfault it should break into the debugger. Then type "bt" to get a trace of the stack. That might tell us something about where the error is happening.

Peter

Re: Simulations with >100k particles.

Post by Maxim Imakaev » Fri Sep 07, 2012 10:53 am

I could... but is there a way to run it inside gdb with the Python API?
Or should I rewrite it in C++?
