Simulations with >100k particles.
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Simulations with >100k particles.
Hi Peter,
Sorry it took me so long.
So here is the segmentation fault:
0x00007ffff6776847 in OpenMM::OpenCLContext::tagAtomsInMolecule(int, int, std::vector<int, std::allocator<int> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) () from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
And for CUDA platform:
0x00007ffff54d15e9 in OpenMM::CudaContext::tagAtomsInMolecule(int, int, std::vector<int, std::allocator<int> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) () from /usr/local/openmm/lib/plugins/libOpenMMCUDA.so
The error is still there in OpenMM 5.0.
Thanks,
Max
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Simulations with >100k particles.
Perhaps this is because of the use of recursion in CudaContext::tagAtomsInMolecule...
- Spela Ivekovic
- Posts: 26
- Joined: Thu Mar 17, 2011 4:27 am
Re: Simulations with >100k particles.
Hi Peter,
I'm experiencing a similar issue. I can run simulations of up to 330,000 atoms on an NVIDIA Quadro 4000 with 2 GB of graphics memory, and only around 430,000 on a Tesla M2075 with 6 GB of memory. It seems strange that the scaling is not better.
I've run some tests on HelloWaterBox.cpp to establish just how big the simulation could get. On a Quadro 4000 with 2 GB memory, with the standard compilation (Release, with gcc optimisation switched on), I can run up to 46x46x46 water molecules. At 47x47x47 it crashes with:
REMARK Using OpenMM platform CUDA
EXCEPTION: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (700)
I also tried 100x100x100 water molecules and the error message is different:
EXCEPTION: Error creating array random: CUDA_ERROR_OUT_OF_MEMORY (2)
I attempted to run this through gdb to see what was going on, but because the code was optimised, I could not get to the exact place where it crashed. Simple printout statements pointed to the Verlet kernel in CudaKernels.cpp, but I got no further than that:
void CudaIntegrateVerletStepKernel::execute(ContextImpl& context, const VerletIntegrator& integrator)
When I compiled the OpenMM library and HelloWaterBox.cpp in debug mode with -g -O0, the simulation crashed at the very end, after seemingly returning from main() in HelloWaterBox, and I began getting the following error messages for any simulation size (even 10x10x10):
*** glibc detected *** ./HelloWaterBox: double free or corruption (!prev): 0x0000000000f70300 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7e626)[0x7facba7f0626]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSsD1Ev+0x23)[0x7facbb0ddc13]
/lib/x86_64-linux-gnu/libc.so.6(+0x3b921)[0x7facba7ad921]
/lib/x86_64-linux-gnu/libc.so.6(+0x3b9a5)[0x7facba7ad9a5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf4)[0x7facba793774]
./HelloWaterBox[0x401a29]
This last error message is for the original HelloWaterBox.cpp with 10x10x10. If I try running 47x47x47, I get the additional error message
EXCEPTION: Error downloading array posq: clEnqueueReadBuffer (-36)
here:
EXCEPTION: Error downloading array posq: clEnqueueReadBuffer (-36)
*** glibc detected *** ./HelloWaterBox: double free or corruption (!prev): 0x0000000000f70300 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7e626)[0x7facba7f0626]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSsD1Ev+0x23)[0x7facbb0ddc13]
/lib/x86_64-linux-gnu/libc.so.6(+0x3b921)[0x7facba7ad921]
/lib/x86_64-linux-gnu/libc.so.6(+0x3b9a5)[0x7facba7ad9a5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf4)[0x7facba793774]
./HelloWaterBox[0x401a29]
Any idea what might be going on?
Spela
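For reference, the box sizes above translate directly into atom counts: an NxNxN water box contains N³ molecules at three atoms each (O, H1, H2). A quick sketch of the arithmetic:

```python
# Atom counts for the N x N x N water boxes discussed above:
# each water molecule contributes 3 atoms (O, H1, H2).
for n in (45, 46, 47, 100):
    waters = n ** 3
    atoms = 3 * waters
    print(f"{n}x{n}x{n}: {waters:,} waters, {atoms:,} atoms")
```

So the 46x46x46 box that still runs works out to 97,336 waters (292,008 atoms), and the 47x47x47 box that crashes to 311,469 atoms.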
- Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Simulations with >100k particles.
Hi Spela,
Those numbers are very low. Our memory use has improved a lot in the last few versions, and at least on the Tesla you should have no problem simulating over a million atoms. Could you confirm that you're using OpenMM 5.1? Also, I gather the results are the same with both CUDA and OpenCL (since some of your error messages mentioned CUDA, and others mentioned OpenCL)?
Also, can you confirm that nothing else is using significant device memory? Check for anything else using the GPU, either for computation or graphics.
Peter
- Maxim Imakaev
- Posts: 87
- Joined: Sun Oct 24, 2010 2:03 pm
Re: Simulations with >100k particles.
Hi Peter,
I confirm that the bug is still there.
For 200k particles connected in a chain (by harmonic bonds), it still segfaults.
For 150k it works, using just 80 MB of GPU memory, so it's not a memory issue (CUDA platform, OpenMM 5.1).
For a million particles, connected in 10 chains of 100k, it works perfectly (doing as many as 5 steps per second and using 320 MB of GPU memory).
Please see below where it segfaulted last time; I couldn't find my C++ code this time, so I can't check whether it still crashes at the same place.
Again, in my case it's not a memory issue. It is, perhaps, a stack overflow in a recursive algorithm which calculates connected blocks of atoms.
Max
Where it segfaulted for OpenMM 5.0:
0x00007ffff6776847 in OpenMM::OpenCLContext::tagAtomsInMolecule(int, int, std::vector<int, std::allocator<int> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) () from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
And for CUDA platform:
0x00007ffff54d15e9 in OpenMM::CudaContext::tagAtomsInMolecule(int, int, std::vector<int, std::allocator<int> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) () from /usr/local/openmm/lib/plugins/libOpenMMCUDA.so
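If the crash really is a stack overflow in recursive molecule tagging, the usual fix is an explicit work stack: a 200k-atom chain would need roughly 200k nested calls in a recursive flood fill, while the iterative version uses constant call-stack depth. A minimal sketch of the idea (the function name and data layout here are illustrative, not OpenMM's actual code):

```python
def tag_atoms_in_molecule(start_atom, molecule_id, atom_molecule, atom_bonds):
    """Tag every atom connected to start_atom with molecule_id.

    atom_molecule: list of molecule ids, -1 meaning 'not yet tagged'.
    atom_bonds: adjacency list, atom_bonds[i] = atoms bonded to atom i.
    Uses an explicit stack instead of recursion, so even a very long
    chain cannot overflow the call stack.
    """
    stack = [start_atom]
    while stack:
        atom = stack.pop()
        if atom_molecule[atom] != -1:
            continue  # already tagged via another path
        atom_molecule[atom] = molecule_id
        for neighbor in atom_bonds[atom]:
            if atom_molecule[neighbor] == -1:
                stack.append(neighbor)

# Example: a linear chain of 200,000 atoms joined by bonds,
# like the harmonic-bond chain described above.
n = 200000
bonds = [[] for _ in range(n)]
for i in range(n - 1):
    bonds[i].append(i + 1)
    bonds[i + 1].append(i)
molecule = [-1] * n
tag_atoms_in_molecule(0, 0, molecule, bonds)
print(all(m == 0 for m in molecule))  # -> True: every atom was tagged
```

The recursive equivalent would blow past Python's default recursion limit (and a fixed-size thread stack in C++) on exactly this kind of input, which would explain why the 200k chain segfaults while ten independent 100k chains do not.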
- Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Simulations with >100k particles.
Hi Maxim,
In your case, I believe you're correct that it's a stack overflow. I can fix that pretty easily. But Spela's test was a box of water, so there would be very little recursion. So I think it's something different.
Peter
- Spela Ivekovic
- Posts: 26
- Joined: Thu Mar 17, 2011 4:27 am
Re: Simulations with >100k particles.
Hi Peter,
I am running OpenMM 5.0.1 at the moment, and the tests with error printouts in my previous post were all done on a desktop PC (Dell Precision T5500) with two Quadro 4000 cards, each with 2 GB memory. On the Tesla, I only ran the simulation at various sizes to see where it gave up; I did not recompile and run it in gdb, but I can if that helps.
Regarding CUDA/OpenCL: I let the code choose the platform to run on, and it looks like it ran on CUDA while the code was still gcc-optimised; when I compiled it without optimisation, it switched to OpenCL. I hadn't spotted that earlier and it seems a bit strange. I can hard-code the platform choice to avoid that in the future.
I have nothing else running on the desktop other than the OpenMM code, and since the machine has two GPUs, I can designate the one that doesn't drive the screen to run the simulation, which I normally do with an environment variable:
export CUDA_VISIBLE_DEVICES="1"
The typical output I get from nvidia-smi -q | grep % before I run the simulation, when the GPUs are idle, is the following:
> ~/openmm5.0.1/examples$ nvidia-smi -q | grep %
Fan Speed : 40 %
Gpu : 7 %
Memory : 7 %
Fan Speed : 40 %
Gpu : 0 %
Memory : 0 %
and while I am running the simulation:
> ~/openmm5.0.1/examples$ nvidia-smi -q | grep %
Fan Speed : 40 %
Gpu : 1 %
Memory : 5 %
Fan Speed : 40 %
Gpu : 98 %
Memory : 6 %
I'll install OpenMM 5.1 to see if any of the "double free" issues go away, but I think the simulation size will still be an issue.
Spela
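The same device pinning can also be done from inside a program rather than the shell, as long as the variable is set before any CUDA context is created (the device index here mirrors the export above and is illustrative):

```python
import os

# Equivalent to `export CUDA_VISIBLE_DEVICES="1"` in the shell.
# This must run before the first CUDA context is created in the
# process, or the driver will not pick it up.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> 1
```

This keeps the simulation off the GPU that drives the display without relying on the shell environment being set up correctly.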
- Spela Ivekovic
- Posts: 26
- Joined: Thu Mar 17, 2011 4:27 am
Re: Simulations with >100k particles.
Hi Peter,
I've installed OpenMM 5.1 from source and tested again on the Quadro 4000. The installation is Release with optimisation on.
This time, the water box simulation runs up to 45x45x45 water molecules (in OpenMM 5.0.1 it reached 46x46x46).
At 46x46x46 it fails in the middle of the simulation run (the location of the failure changes; I have had the crash after frame 5 in one attempt and after frame 6 in another) with the following message:
HETATM292006 O HOH 97336 139.848 139.242 140.253 1.00 0.00
HETATM292007 H1 HOH 97336 139.675 139.415 141.178 1.00 0.00
HETATM292008 H2 HOH 97336 140.794 139.380 140.292 1.00 0.00
ENDMDL
EXCEPTION: Error downloading array posq: CUDA_ERROR_LAUNCH_FAILED (700)
I have not modified anything other than:
- the number of water molecules in the HelloWaterBox.cpp code
- the simulation time: I set it to run for 1 ps instead of 10 ps
like so:
// MODELING AND SIMULATION PARAMETERS
const int NumWatersAlongEdge = 46; // Size of box is NxNxN waters.
const double Temperature = 300; // Kelvins
const double FrictionInPerPs = 91.; // collisions per picosecond
const double CutoffDistanceInAng = 10.; // Angstroms
const bool UseConstraints = true; // Should we constrain O-H bonds?
const double StepSizeInFs = 2; // integration step size (fs)
const double ReportIntervalInFs = 100; // how often to generate PDB frame (fs)
const double SimulationTimeInPs = 1; // total simulation time (ps)
The platform it runs on is CUDA:
REMARK Using OpenMM platform CUDA
MODEL 1
REMARK 250 time=0.000 picoseconds
HETATM 1 O HOH 1 0.000 0.000 0.000 1.00 0.00
HETATM 2 H1 HOH 1 0.957 0.000 0.000 1.00 0.00
HETATM 3 H2 HOH 1 -0.240 0.927 0.000 1.00 0.00
The interesting thing is that I can now run a 100x100x100 simulation too, without the "out of memory" message. Instead, it just fails with the same "Error downloading array posq" message after the first frame:
HETATM2999995 O HOH 999999 307.593 307.593 304.486 1.00 0.00
HETATM2999996 H1 HOH 999999 306.636 307.593 304.486 1.00 0.00
HETATM2999997 H2 HOH 999999 307.833 306.666 304.486 1.00 0.00
HETATM2999998 O HOH 1000000 307.593 307.593 307.593 1.00 0.00
HETATM2999999 H1 HOH 1000000 308.550 307.593 307.593 1.00 0.00
HETATM3000000 H2 HOH 1000000 307.353 308.520 307.593 1.00 0.00
ENDMDL
EXCEPTION: Error downloading array posq: CUDA_ERROR_LAUNCH_FAILED (700)
It looks like the code in 5.1 has become a lot more memory efficient, but the posq problem remains.
Would you be able to hazard a guess at what might be going on?
Spela
- Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Simulations with >100k particles.
Hi Spela,
Sorry for not replying sooner - I've been on vacation. Here is the script I've been using to test this. Try running it and see what happens for you:
from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *
boxSize = 20
forcefield = ForceField('amber99sb.xml', 'spce.xml')
modeller = Modeller(Topology(), [])
print "Adding solvent"
modeller.addSolvent(forcefield, boxSize=Vec3(boxSize, boxSize, boxSize)*nanometers)
print "Building system"
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME, nonbondedCutoff=0.9*nanometers)
print system.getNumParticles(), "atoms"
integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
print "Creating context"
simulation = Simulation(modeller.topology, system, integrator, Platform.getPlatformByName('CUDA'))
simulation.context.setPositions(modeller.positions)
print "Computing"
print 'initial energy:', simulation.context.getState(getEnergy=True).getPotentialEnergy()
simulation.step(50)
print 'final energy:', simulation.context.getState(getEnergy=True).getPotentialEnergy()
You can adjust boxSize at the beginning to vary the number of water molecules that get added. On a machine with a GTX 680, 2 GB of device memory and 16 GB of system memory, running Ubuntu 12.04, it can get up to 24 (over 1.3 million atoms) without crashing.
Also, how much system memory does your computer have?
Peter
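Note that boxSize in this script is an edge length in nanometres, not a count of waters per edge, so the atom count depends on water's density. Assuming roughly 33.4 water molecules per nm³ (liquid water near 300 K; that density figure is an assumption, not something in the script), the numbers quoted above can be estimated like this:

```python
# Rough atom count for a cubic water box of edge box_size_nm nanometres,
# assuming ~33.4 water molecules per nm^3 (liquid water near 300 K).
WATERS_PER_NM3 = 33.4  # assumed density

def estimate_atoms(box_size_nm):
    waters = WATERS_PER_NM3 * box_size_nm ** 3
    return int(waters * 3)  # 3 atoms per water molecule

print(estimate_atoms(20))  # ~800k atoms for the script's default boxSize
print(estimate_atoms(24))  # ~1.39 million, matching the ">1.3 million" above
```

This is consistent with Peter's report of over 1.3 million atoms at boxSize = 24.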
- Spela Ivekovic
- Posts: 26
- Joined: Thu Mar 17, 2011 4:27 am
Re: Simulations with >100k particles.
Hi Peter,
My PC has 12 GB of system memory, and I am also running Ubuntu 12.04.
I tried your script but discovered that the Python installation in my OpenMM 5.1 does not work. I'll have to sort that out first. I don't normally use the Python scripts; I work with the OpenMM C++ API directly.
I'll get back to you when I have the Python stuff sorted out.
Spela