Simulations with >100k particles.
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Simulations with >100k particles.
Hi Peter,
Sorry it took me so long.
So here is the segmentation fault:
0x00007ffff6776847 in OpenMM::OpenCLContext::tagAtomsInMolecule(int, int, std::vector<int, std::allocator<int> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) () from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
And for CUDA platform:
0x00007ffff54d15e9 in OpenMM::CudaContext::tagAtomsInMolecule(int, int, std::vector<int, std::allocator<int> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) () from /usr/local/openmm/lib/plugins/libOpenMMCUDA.so
The error is still there in OpenMM 5.0.
Thanks,
Max
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Simulations with >100k particles.
Perhaps this is because of the use of recursion in CudaContext::tagAtomsInMolecule...
- Spela Ivekovic
- Posts: 26
- Joined: Thu Mar 17, 2011 4:27 am
Re: Simulations with >100k particles.
Hi Peter,
I'm experiencing a similar issue. I can run simulations of up to 330,000 atoms on an NVIDIA Quadro 4000 with 2 GB of graphics memory, and only around 430,000 on a Tesla M2075 with 6 GB of memory. It seems strange that the scaling is not better.
I've run some tests on HelloWaterBox.cpp to establish just how big the simulation could get. On a Quadro 4000 with 2 GB memory, with the standard compilation (Release, with gcc optimisation switched on), I can run up to 46x46x46 water molecules. At 47x47x47 it crashes with:
REMARK Using OpenMM platform CUDA
EXCEPTION: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (700)
I also tried 100x100x100 water molecules and the error message is different:
EXCEPTION: Error creating array random: CUDA_ERROR_OUT_OF_MEMORY (2)
I attempted to run this through gdb to see what was going on, but because the code was optimised, I could not get to the exact place where it crashed. Simple printout statements pointed to the Verlet kernel in CudaKernels.cpp, but I got no further than that:
void CudaIntegrateVerletStepKernel::execute(ContextImpl& context, const VerletIntegrator& integrator)
When I compiled the OpenMM library and HelloWaterBox.cpp in debug mode with -g -O0, the simulation crashed at the very end, after seemingly returning from main() in HelloWaterBox, and I began getting the following error messages for any simulation size (even 10x10x10):
*** glibc detected *** ./HelloWaterBox: double free or corruption (!prev): 0x0000000000f70300 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7e626)[0x7facba7f0626]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSsD1Ev+0x23)[0x7facbb0ddc13]
/lib/x86_64-linux-gnu/libc.so.6(+0x3b921)[0x7facba7ad921]
/lib/x86_64-linux-gnu/libc.so.6(+0x3b9a5)[0x7facba7ad9a5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf4)[0x7facba793774]
./HelloWaterBox[0x401a29]
This last error message is for the original HelloWaterBox.cpp with 10x10x10. If I try running 47x47x47, I get the additional error message
EXCEPTION: Error downloading array posq: clEnqueueReadBuffer (-36)
here:
EXCEPTION: Error downloading array posq: clEnqueueReadBuffer (-36)
*** glibc detected *** ./HelloWaterBox: double free or corruption (!prev): 0x0000000000f70300 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7e626)[0x7facba7f0626]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSsD1Ev+0x23)[0x7facbb0ddc13]
/lib/x86_64-linux-gnu/libc.so.6(+0x3b921)[0x7facba7ad921]
/lib/x86_64-linux-gnu/libc.so.6(+0x3b9a5)[0x7facba7ad9a5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf4)[0x7facba793774]
./HelloWaterBox[0x401a29]
Any idea what might be going on?
Spela
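For reference, the box sizes above translate directly into atom counts: an NxNxN water box contains N³ molecules at three atoms each (O, H1, H2). A quick sketch of the arithmetic:

```python
# Atom counts for the N x N x N water boxes discussed above:
# each water molecule contributes 3 atoms (O, H1, H2).
for n in (45, 46, 47, 100):
    waters = n ** 3
    atoms = 3 * waters
    print(f"{n}x{n}x{n}: {waters:,} waters, {atoms:,} atoms")
```

So the 46x46x46 box that still runs works out to 97,336 waters (292,008 atoms), and the 47x47x47 box that crashes to 311,469 atoms.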
- Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Simulations with >100k particles.
Hi Spela,
Those numbers are very low. Our memory use has improved a lot in the last few versions, and at least on the Tesla you should have no problem simulating over a million atoms. Could you confirm that you're using OpenMM 5.1? Also, I gather the results are the same with both CUDA and OpenCL (since some of your error messages mentioned CUDA, and others mentioned OpenCL)?
Also, can you confirm that nothing else is using significant device memory? Check for anything else using the GPU, either for computation or graphics.
Peter
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Simulations with >100k particles.
Hi Peter,
I confirm that the bug is still there.
For 200k particles connected in a chain (by harmonic bonds), it still segfaults.
For 150k it works, using just 80 MB of GPU memory, so it's not a memory issue (CUDA platform, OpenMM 5.1).
For a million particles, connected in 10 chains of 100k, it works perfectly (doing as many as 5 steps per second and using 320 MB of GPU memory).
Please see below where it segfaulted last time; I couldn't find my C++ code this time, so I can't check whether it still crashes at the same place.
Again, in my case it's not a memory issue. It is, perhaps, a stack overflow in a recursive algorithm which calculates connected blocks of atoms.
Max
Where it segfaulted for OpenMM 5.0:
0x00007ffff6776847 in OpenMM::OpenCLContext::tagAtomsInMolecule(int, int, std::vector<int, std::allocator<int> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) () from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
And for CUDA platform:
0x00007ffff54d15e9 in OpenMM::CudaContext::tagAtomsInMolecule(int, int, std::vector<int, std::allocator<int> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) () from /usr/local/openmm/lib/plugins/libOpenMMCUDA.so
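If the crash really is a stack overflow in recursive molecule tagging, the usual fix is an explicit work stack: a 200k-atom chain would need roughly 200k nested calls in a recursive flood fill, while the iterative version uses constant call-stack depth. A minimal sketch of the idea (the function name and data layout here are illustrative, not OpenMM's actual code):

```python
def tag_atoms_in_molecule(start_atom, molecule_id, atom_molecule, atom_bonds):
    """Tag every atom connected to start_atom with molecule_id.

    atom_molecule: list of molecule ids, -1 meaning 'not yet tagged'.
    atom_bonds: adjacency list, atom_bonds[i] = atoms bonded to atom i.
    Uses an explicit stack instead of recursion, so even a very long
    chain cannot overflow the call stack.
    """
    stack = [start_atom]
    while stack:
        atom = stack.pop()
        if atom_molecule[atom] != -1:
            continue  # already tagged via another path
        atom_molecule[atom] = molecule_id
        for neighbor in atom_bonds[atom]:
            if atom_molecule[neighbor] == -1:
                stack.append(neighbor)

# Example: a linear chain of 200,000 atoms joined by bonds,
# like the harmonic-bond chain described above.
n = 200000
bonds = [[] for _ in range(n)]
for i in range(n - 1):
    bonds[i].append(i + 1)
    bonds[i + 1].append(i)
molecule = [-1] * n
tag_atoms_in_molecule(0, 0, molecule, bonds)
print(all(m == 0 for m in molecule))  # -> True: every atom was tagged
```

The recursive equivalent would blow past Python's default recursion limit (and a fixed-size thread stack in C++) on exactly this kind of input, which would explain why the 200k chain segfaults while ten independent 100k chains do not.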
- Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Simulations with >100k particles.
Hi Maxim,
In your case, I believe you're correct that it's a stack overflow. I can fix that pretty easily. But Spela's test was a box of water, so there would be very little recursion. So I think it's something different.
Peter
- Spela Ivekovic
- Posts: 26
- Joined: Thu Mar 17, 2011 4:27 am
Re: Simulations with >100k particles.
Hi Peter,
I am running OpenMM 5.0.1 at the moment, and the tests with error printouts in my previous post were all done on a desktop PC (Dell Precision T5500) with two Quadro 4000 cards, each with 2 GB memory. On the Tesla, I only ran the simulation at various sizes to see where it gave up; I did not recompile and run it in gdb, but I can if that helps.
Regarding CUDA/OpenCL: I let the code choose the platform to run on, and it looks like it ran on CUDA while the code was still gcc-optimised; when I compiled it without optimisation, it switched to OpenCL. I hadn't spotted that earlier and it seems a bit strange. I can hard-code the platform choice to avoid that in the future.
I have nothing else running on the desktop other than the OpenMM code, and since the machine has two GPUs, I can designate the one that doesn't drive the screen to run the simulation, which I normally do with an environment variable:
export CUDA_VISIBLE_DEVICES="1"
The typical output I get from nvidia-smi -q | grep % before I run the simulation, when the GPUs are idle, is the following:
> ~/openmm5.0.1/examples$ nvidia-smi -q | grep %
Fan Speed : 40 %
Gpu : 7 %
Memory : 7 %
Fan Speed : 40 %
Gpu : 0 %
Memory : 0 %
and while I am running the simulation:
> ~/openmm5.0.1/examples$ nvidia-smi -q | grep %
Fan Speed : 40 %
Gpu : 1 %
Memory : 5 %
Fan Speed : 40 %
Gpu : 98 %
Memory : 6 %
I'll install OpenMM 5.1 to see if any of the "double free" issues go away, but I think the simulation size will still be an issue.
Spela
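The same device pinning can also be done from inside a program rather than the shell, as long as the variable is set before any CUDA context is created (the device index here mirrors the export above and is illustrative):

```python
import os

# Equivalent to `export CUDA_VISIBLE_DEVICES="1"` in the shell.
# This must run before the first CUDA context is created in the
# process, or the driver will not pick it up.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> 1
```

This keeps the simulation off the GPU that drives the display without relying on the shell environment being set up correctly.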
- Spela Ivekovic
- Posts: 26
- Joined: Thu Mar 17, 2011 4:27 am
Re: Simulations with >100k particles.
Hi Peter,
I've installed OpenMM 5.1 from source and tested again on the Quadro 4000. The installation is Release with optimisation on.
This time, the water box simulation runs up to 45x45x45 water molecules (in OpenMM 5.0.1 it reached 46x46x46).
At 46x46x46 it fails in the middle of the simulation run (the location of the failure changes; I have had the crash after frame 5 in one attempt and after frame 6 in another) with the following message:
HETATM292006 O HOH 97336 139.848 139.242 140.253 1.00 0.00
HETATM292007 H1 HOH 97336 139.675 139.415 141.178 1.00 0.00
HETATM292008 H2 HOH 97336 140.794 139.380 140.292 1.00 0.00
ENDMDL
EXCEPTION: Error downloading array posq: CUDA_ERROR_LAUNCH_FAILED (700)
I have not modified anything other than:
- the number of water molecules in the HelloWaterBox.cpp code
- the simulation time: I set it to run for 1 ps instead of 10 ps
like so:
// MODELING AND SIMULATION PARAMETERS
const int NumWatersAlongEdge = 46; // Size of box is NxNxN waters.
const double Temperature = 300; // Kelvins
const double FrictionInPerPs = 91.; // collisions per picosecond
const double CutoffDistanceInAng = 10.; // Angstroms
const bool UseConstraints = true; // Should we constrain O-H bonds?
const double StepSizeInFs = 2; // integration step size (fs)
const double ReportIntervalInFs = 100; // how often to generate PDB frame (fs)
const double SimulationTimeInPs = 1; // total simulation time (ps)
The platform it runs on is CUDA:
REMARK Using OpenMM platform CUDA
MODEL 1
REMARK 250 time=0.000 picoseconds
HETATM 1 O HOH 1 0.000 0.000 0.000 1.00 0.00
HETATM 2 H1 HOH 1 0.957 0.000 0.000 1.00 0.00
HETATM 3 H2 HOH 1 -0.240 0.927 0.000 1.00 0.00
The interesting thing is that I can now run a 100x100x100 simulation too, without the "out of memory" message. Instead, it just fails with the same "Error downloading array posq" message after the first frame:
HETATM2999995 O HOH 999999 307.593 307.593 304.486 1.00 0.00
HETATM2999996 H1 HOH 999999 306.636 307.593 304.486 1.00 0.00
HETATM2999997 H2 HOH 999999 307.833 306.666 304.486 1.00 0.00
HETATM2999998 O HOH 1000000 307.593 307.593 307.593 1.00 0.00
HETATM2999999 H1 HOH 1000000 308.550 307.593 307.593 1.00 0.00
HETATM3000000 H2 HOH 1000000 307.353 308.520 307.593 1.00 0.00
ENDMDL
EXCEPTION: Error downloading array posq: CUDA_ERROR_LAUNCH_FAILED (700)
It looks like the code in 5.1 has become a lot more memory efficient, but the posq problem remains.
Would you be able to hazard a guess at what might be going on?
Spela
- Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Simulations with >100k particles.
Hi Spela,
Sorry for not replying sooner - I've been on vacation. Here is the script I've been using to test this. Try running it and see what happens for you:
from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *
boxSize = 20
forcefield = ForceField('amber99sb.xml', 'spce.xml')
modeller = Modeller(Topology(), [])
print "Adding solvent"
modeller.addSolvent(forcefield, boxSize=Vec3(boxSize, boxSize, boxSize)*nanometers)
print "Building system"
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME, nonbondedCutoff=0.9*nanometers)
print system.getNumParticles(), "atoms"
integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
print "Creating context"
simulation = Simulation(modeller.topology, system, integrator, Platform.getPlatformByName('CUDA'))
simulation.context.setPositions(modeller.positions)
print "Computing"
print 'initial energy:', simulation.context.getState(getEnergy=True).getPotentialEnergy()
simulation.step(50)
print 'final energy:', simulation.context.getState(getEnergy=True).getPotentialEnergy()
You can adjust boxSize at the beginning to vary the number of water molecules that get added. On a machine with a GTX 680, 2 GB of device memory and 16 GB of system memory, running Ubuntu 12.04, it can get up to 24 (over 1.3 million atoms) without crashing.
Also, how much system memory does your computer have?
Peter
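Note that boxSize in this script is an edge length in nanometres, not a count of waters per edge, so the atom count depends on water's density. Assuming roughly 33.4 water molecules per nm³ (liquid water near 300 K; that density figure is an assumption, not something in the script), the numbers quoted above can be estimated like this:

```python
# Rough atom count for a cubic water box of edge box_size_nm nanometres,
# assuming ~33.4 water molecules per nm^3 (liquid water near 300 K).
WATERS_PER_NM3 = 33.4  # assumed density

def estimate_atoms(box_size_nm):
    waters = WATERS_PER_NM3 * box_size_nm ** 3
    return int(waters * 3)  # 3 atoms per water molecule

print(estimate_atoms(20))  # ~800k atoms for the script's default boxSize
print(estimate_atoms(24))  # ~1.39 million, matching the ">1.3 million" above
```

This is consistent with Peter's report of over 1.3 million atoms at boxSize = 24.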
- Spela Ivekovic
- Posts: 26
- Joined: Thu Mar 17, 2011 4:27 am
Re: Simulations with >100k particles.
Hi Peter,
My PC has 12 GB of system memory, and I am also running Ubuntu 12.04.
I tried your script but discovered that the Python installation in my OpenMM 5.1 does not work. I'll have to sort that out first. I don't normally use the Python scripts; I work with the OpenMM C++ API directly.
I'll get back to you when I have the Python stuff sorted out.
Spela