
Problem running openmm on Retina MBP

Posted: Sat Feb 23, 2013 10:30 am
by jlmaccal
I'm having a problem installing and running OpenMM 5.0 on my 15" Retina MBP running OS X 10.8.2 with a GeForce GT 650M.

I have installed CUDA 5.0.37 from NVIDIA's website. I can compile and run the deviceQuery and bandwidthTest programs without any problems. The output from deviceQuery is below:

Code: Select all

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 650M"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 1024 MBytes (1073414144 bytes)
  ( 2) Multiprocessors x (192) CUDA Cores/MP:    384 CUDA Cores
  GPU Clock rate:                                900 MHz (0.90 GHz)
  Memory Clock rate:                             2508 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GT 650M
I have downloaded and installed the binary version of openmm5.0-mac. When I run testInstallation.py, it gets through the test of the Reference platform, but hangs my machine completely when it gets to the CUDA platform. I can compile the Hello* example programs, but none of them run successfully. Below is the output from HelloArgon; running others, e.g. HelloEthane, crashes my machine completely.

Code: Select all

REMARK  Using OpenMM platform CUDA
MODEL     1
ATOM      1  AR   AR     1       0.000   0.000   0.000  1.00  0.00
ATOM      2  AR   AR     1       5.000   0.000   0.000  1.00  0.00
ATOM      3  AR   AR     1      10.000   0.000   0.000  1.00  0.00
ENDMDL
libc++abi.dylib: terminate called without an active exception
[1]    1582 abort      ./HelloArgon
I am trying to prepare for the upcoming OpenMM workshop, so any help would be appreciated. I have turned off automatic graphics switching, and my PATH and DYLD_LIBRARY_PATH are set correctly. I have also tried to build OpenMM from source, but I ran into a number of build errors.
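For reference, the Hello* examples boil down to something like the following minimal sketch (my own stripped-down version, not verbatim from the distribution), which is just enough to create a CUDA context and take one step:

Code: Select all

#include "OpenMM.h"
#include <cstdio>
#include <exception>
#include <vector>

int main() {
    try {
        // Load the platform plugins (this is where the CUDA platform comes from).
        OpenMM::Platform::loadPluginsFromDirectory(
            OpenMM::Platform::getDefaultPluginsDirectory());

        // One argon atom, no forces -- just enough to exercise the platform.
        OpenMM::System system;
        system.addParticle(39.95);
        OpenMM::VerletIntegrator integrator(0.004);  // step size in ps
        OpenMM::Platform& platform = OpenMM::Platform::getPlatformByName("CUDA");
        OpenMM::Context context(system, integrator, platform);

        std::vector<OpenMM::Vec3> positions(1, OpenMM::Vec3(0, 0, 0));
        context.setPositions(positions);
        integrator.step(1);  // the failures above all show up once the CUDA context is actually used
        std::printf("CUDA platform took one step successfully\n");
    }
    catch (const std::exception& e) {
        std::printf("EXCEPTION: %s\n", e.what());
        return 1;
    }
    return 0;
}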

-- Justin

Re: Problem running openmm on Retina MBP

Posted: Sat Feb 23, 2013 3:56 pm
by jlmaccal
A couple of updates.

After some hacking on the CMake files, I was able to build from source. The resulting build crashes with exactly the same issues. Many of the TestCuda* programs cause a total freeze of the system as well.

Running cuda-memcheck on any of the tests produces a whole load of errors:

Code: Select all

cuda-memcheck ./HelloArgon

MODEL     1
ATOM      1  AR   AR     1       0.000   0.000   0.000  1.00  0.00
ATOM      2  AR   AR     1       5.000   0.000   0.000  1.00  0.00
ATOM      3  AR   AR     1      10.000   0.000   0.000  1.00  0.00
ENDMDL
========= Error: process didn't terminate successfully
========= CUDA-MEMCHECK
========= Invalid __global__ write of size 16
=========     at 0x000002a0 in clearTwoBuffers
=========     by thread (95,0,0) in block (2,0,0)
=========     Address 0x002415f0 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/Library/Frameworks/CUDA.framework/Versions/A/Libraries/libcuda_304.10.20.dylib (cuLaunchKernel + 0x320) [0x128f0]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x6e) [0x1237e]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM11CudaContext21clearAutoclearBuffersEv + 0x454) [0x12c54]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM29CudaCalcForcesAndEnergyKernel16beginComputationERNS_11ContextImplEbbi + 0x9f) [0x37a1f]
=========     Host Frame:/usr/local/openmm/lib/libOpenMM.dylib (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0x110) [0x9be0]
=========     Host Frame:/usr/local/openmm/lib/libOpenMM.dylib (_ZN6OpenMM16VerletIntegrator4stepEi + 0x4e) [0x40dee]
=========     Host Frame:/Users/jlmaccal/Downloads/OpenMM5.0-Source/./HelloArgon (_Z13simulateArgonv + 0x3ee) [0x20de]
=========     Host Frame:/Users/jlmaccal/Downloads/OpenMM5.0-Source/./HelloArgon (main + 0x9) [0x23c9]
=========     Host Frame:/Users/jlmaccal/Downloads/OpenMM5.0-Source/./HelloArgon (start + 0x34) [0x1cc4]
=========     Host Frame:[0x1]
=========
========= Invalid __global__ write of size 16
=========     at 0x000002a0 in clearTwoBuffers
=========     by thread (94,0,0) in block (2,0,0)
=========     Address 0x002415e0 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/Library/Frameworks/CUDA.framework/Versions/A/Libraries/libcuda_304.10.20.dylib (cuLaunchKernel + 0x320) [0x128f0]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x6e) [0x1237e]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM11CudaContext21clearAutoclearBuffersEv + 0x454) [0x12c54]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM29CudaCalcForcesAndEnergyKernel16beginComputationERNS_11ContextImplEbbi + 0x9f) [0x37a1f]
=========     Host Frame:/usr/local/openmm/lib/libOpenMM.dylib (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0x110) [0x9be0]
=========     Host Frame:/usr/local/openmm/lib/libOpenMM.dylib (_ZN6OpenMM16VerletIntegrator4stepEi + 0x4e) [0x40dee]
=========     Host Frame:/Users/jlmaccal/Downloads/OpenMM5.0-Source/./HelloArgon (_Z13simulateArgonv + 0x3ee) [0x20de]
=========     Host Frame:/Users/jlmaccal/Downloads/OpenMM5.0-Source/./HelloArgon (main + 0x9) [0x23c9]
=========     Host Frame:/Users/jlmaccal/Downloads/OpenMM5.0-Source/./HelloArgon (start + 0x34) [0x1cc4]
=========     Host Frame:[0x1]
...
...
There are about 1000 errors like that. I'm not sure if this is a red herring, but if writes are happening outside of array bounds, it could explain the GPU freeze I'm observing.

Re: Problem running openmm on Retina MBP

Posted: Mon Feb 25, 2013 11:01 am
by peastman
Hi Justin,

Thanks. Other people have also reported this problem. It only seems to affect that one particular model: MacBook Pro with retina display. I suspect a driver bug, but that's just a guess. We're trying to track it down.

Peter

Re: Problem running openmm on Retina MBP

Posted: Mon Feb 25, 2013 11:15 am
by jlmaccal
Some progress... but still not working. I believe this is a 32-/64-bit issue.

First, in order to compile with recent versions of Xcode from the Mac App Store, you must remove the line:

Code: Select all

SET (CMAKE_OSX_SYSROOT "/Developer/SDKs/MacOSX10.6.sdk")
from the files:

Code: Select all

platforms/cuda/CMakeLists.txt
platforms/opencl/CMakeLists.txt
plugins/rpmd/platforms/cuda/CMakeLists.txt
plugins/rpmd/platforms/opencl/CMakeLists.txt
Xcode no longer installs SDKs to /Developer; they are now installed inside the application bundle. Removing this line allows CMake to find the correct version of the SDK.

Building from source still produces the error above. However, if I build a 32-bit version only, then all of the C++ programs (Hello*, TestCuda*) work correctly. But my build of Python is 64-bit, so I can't import any of the Python modules. I don't really feel like rebuilding all of my Python infrastructure as 32-bit, so I'm pressing on trying to debug why the 64-bit version isn't working.

Upon further investigation, the errors occur when the autoclear buffers are cleared at the start of the first frame. autoclearBuffers is a std::vector<CUdeviceptr>, and on 64-bit platforms CUdeviceptr is defined as an unsigned long long. However, clearSingleBuffer and related kernels all take an int * argument. If I add extra diagnostic output to CudaArray::CudaArray, CudaContext::addAutoclearBuffer, and CudaContext::clearAutoclearBuffers and then run with cuda-memcheck, I get the following:

Code: Select all

Allocated array at 30066081792 name: posq size: 32 elemSize: 16
Result 0
Allocated array at 30066082304 name: velm size: 32 elemSize: 16
Result 0
Allocated array at 30067130368 name: energyBuffer size: 1536 elemSize: 4
Result 0
Allocated array at 30066082816 name: posDelta size: 32 elemSize: 16
Result 0
Allocated array at 30066083328 name: stepSize size: 1 elemSize: 8
Result 0
Allocated array at 30066083840 name: vsite2AvgAtoms size: 1 elemSize: 16
Result 0
Allocated array at 30066084352 name: vsite3AvgAtoms size: 1 elemSize: 16
Result 0
Allocated array at 30066084864 name: vsiteOutOfPlaneAtoms size: 1 elemSize: 16
Result 0
Allocated array at 30066085376 name: vsite2AvgWeights size: 1 elemSize: 8
Result 0
Allocated array at 30066085888 name: vsite3AvgWeights size: 1 elemSize: 16
Result 0
Allocated array at 30066086400 name: vsiteOutOfPlaneWeights size: 1 elemSize: 16
Result 0
Allocated array at 30066086912 name: sigmaEpsilon size: 32 elemSize: 8
Result 0
Allocated array at 30066087424 name: force size: 96 elemSize: 8
Result 0
Adding autoclear buffer at location 30066087424 with size 192
Current contents of autoclearBuffers and autoclearBufferSizes:
        30066087424     192
Adding autoclear buffer at location 30067130368 with size 1536
Current contents of autoclearBuffers and autoclearBufferSizes:
        30066087424     192
        30067130368     1536
Allocated array at 30066088448 name: atomIndex size: 32 elemSize: 4
Result 0
Allocated array at 30066088960 name: exclusionIndices size: 1 elemSize: 4
Result 0
Allocated array at 30066089472 name: exclusionRowIndices size: 2 elemSize: 4
Result 0
Allocated array at 30066089984 name: exclusions size: 32 elemSize: 4
Result 0
REMARK  Using OpenMM platform CUDA
MODEL     1
ATOM      1  AR   AR     1       0.000   0.000   0.000  1.00  0.00
ATOM      2  AR   AR     1       5.000   0.000   0.000  1.00  0.00
ATOM      3  AR   AR     1      10.000   0.000   0.000  1.00  0.00
ENDMDL
clearAutoclearBuffers called
        total: 2
        30066087424     192
        30067130368     1536

========= CUDA-MEMCHECK

...snipped...

=========
========= Invalid __global__ write of size 16
=========     at 0x000000d8 in clearTwoBuffers
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x00141600 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/Library/Frameworks/CUDA.framework/Versions/A/Libraries/libcuda_304.10.20.dylib (cuLaunchKernel + 0x320) [0x128f0]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x6e) [0x149ce]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM11CudaContext21clearAutoclearBuffersEv + 0x5e4) [0x15454]
=========     Host Frame:/usr/local/openmm/lib/plugins/libOpenMMCUDA.dylib (_ZN6OpenMM29CudaCalcForcesAndEnergyKernel16beginComputationERNS_11ContextImplEbbi + 0x9f) [0x3a4bf]
=========     Host Frame:/usr/local/openmm/lib/libOpenMM.dylib (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0x77) [0x9f07]
=========     Host Frame:/usr/local/openmm/lib/libOpenMM.dylib (_ZN6OpenMM16VerletIntegrator4stepEi + 0x4e) [0x4264e]
=========     Host Frame:/Users/jlmaccal/Downloads/OpenMM5.0-Source/./HelloArgon (_Z13simulateArgonv + 0x3ee) [0x213e]
=========     Host Frame:/Users/jlmaccal/Downloads/OpenMM5.0-Source/./HelloArgon (main + 0x9) [0x2429]
=========     Host Frame:/usr/lib/system/libdyld.dylib (start + 0x0) [0x27e1]
=========     Host Frame:[0x1]
=========

...snipped...

I think the key thing to note here is that the pointers printed by the C++ code are 64-bit, while the addresses reported as out of bounds by cuda-memcheck are 32-bit. The 32-bit addresses match the low-order 32 bits of the 64-bit addresses (30066087424 = 0x700141600, vs. 0x00141600 from cuda-memcheck). It appears that the C++ code is using 64-bit pointers, while the GPU code expects 32-bit pointers. I suspect this is just some kind of CMake issue where a compiler flag isn't being set correctly on the Mac, but I don't really know where to look.
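
As a quick sanity check on that arithmetic, a minimal standalone program (just a sketch built from the addresses above) shows that truncating the 64-bit device pointer to 32 bits yields exactly the address cuda-memcheck flags:

Code: Select all

#include <cstdint>
#include <cstdio>

int main() {
    // 64-bit device address printed by my diagnostics in CudaArray::CudaArray
    uint64_t devicePtr = 30066087424ULL;          // == 0x700141600
    // What device code compiled for 32-bit pointers would see
    uint32_t truncated = (uint32_t) devicePtr;    // keeps low-order 32 bits
    std::printf("64-bit address: 0x%llx\n", (unsigned long long) devicePtr);
    std::printf("truncated:      0x%08x\n", truncated);  // prints 0x00141600
    return 0;
}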

Re: Problem running openmm on Retina MBP

Posted: Mon Feb 25, 2013 11:54 am
by peastman
Hi Justin,

Great debugging work! From what you're saying, it sounds like the host code is compiled in 64-bit mode but the device code is compiled in 32-bit mode, and that inconsistency is causing the problem. If so, we should be able to fix it by telling it to compile the device code in 64-bit mode.

Device code is compiled in CudaContext::createModule(). Within that method you'll find the following lines, which create a command line for invoking nvcc:

Code: Select all

#ifdef WIN32
    string command = ""+compiler+" --ptx --machine 32 -arch=sm_"+gpuArchitecture+" -o "+outputFile+" "+options+" "+inputFile+" 2> "+logFile;
#else
    string command = "\""+compiler+"\" --ptx -arch=sm_"+gpuArchitecture+" -o \""+outputFile+"\" "+options+" \""+inputFile+"\" 2> \""+logFile+"\"";
#endif
Try adding the option "--machine 64" to the second version (the non-WIN32 one). Does that fix the problem?

Peter

Re: Problem running openmm on Retina MBP

Posted: Mon Feb 25, 2013 11:56 am
by jlmaccal
Editing createModule in CudaContext.cpp:

Code: Select all

string command = "\""+compiler+"\" -m64 --ptx -arch=sm_"+gpuArchitecture+" -o \""+outputFile+"\" "+options+" \""+inputFile+"\" 2> \""+logFile+"\"";
I've hard-coded the -m64 flag to nvcc to force it to build 64-bit code. This hack makes everything work fine on my system. It's obviously not a general fix, but it should give some idea of where to look. I'm not sure why this error would be specific to Retina MBPs; I have a slightly older MacBook at home with an NVIDIA GPU in it, and I'll report back what happens on that machine.

Re: Problem running openmm on Retina MBP

Posted: Mon Feb 25, 2013 11:57 am
by jlmaccal
Great minds think alike!

I independently made a change similar to the one you suggested and everything is working now.

Re: Problem running openmm on Retina MBP

Posted: Mon Feb 25, 2013 12:17 pm
by peastman
Great! And the general fix is only very slightly more complicated: replace "64" with "intToString(8*sizeof(void*))" (the pointer width in bits), and it will always use the same mode for host and device code.
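
In other words, the non-WIN32 branch of the command in createModule() would become something like this (a sketch, assuming the intToString() helper mentioned above):

Code: Select all

// Derive --machine from the host pointer size: 8*sizeof(void*) is the
// pointer width in bits (32 or 64), so host and device code always match.
string bits = intToString(8*sizeof(void*));
string command = "\""+compiler+"\" --ptx --machine "+bits+" -arch=sm_"+gpuArchitecture+
        " -o \""+outputFile+"\" "+options+" \""+inputFile+"\" 2> \""+logFile+"\"";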

I'm still mystified why this problem shows up on that particular model and no other...

Peter

Re: Problem running openmm on Retina MBP

Posted: Mon Feb 25, 2013 1:15 pm
by jchodera
Justin,

Thanks so much for figuring this out! We were suffering from exactly this problem and couldn't make head or tail of what was going on...

Best,

John

Re: Problem running openmm on Retina MBP

Posted: Mon Feb 25, 2013 3:32 pm
by jlmaccal
All of the tests pass except:

Code: Select all

./TestCudaCustomGBForce

exception: Assertion failure at TestCudaCustomGBForce.cpp:253.  Expected 6568.5, found 6696.99
The failure is in testMembrane; if I comment out that test, everything passes. Should I be concerned?