Compiling OpenMM on Ubuntu Linux


Post by Peter Eastman » Thu Jul 15, 2010 12:44 pm

I've seen errors like those too, though only on certain machines. I believe they're caused by an Nvidia bug. Those test cases run a large number of tests in rapid succession, where each test creates a new Context, does some calculation with it, and then disposes of the Context. On certain machines, it appears the driver can't keep up with the rapid creation and destruction of Contexts, so at some point an attempt to create a new one fails with CL_DEVICE_NOT_AVAILABLE.

Peter

Post by Joshua Adelman » Thu Jul 22, 2010 10:32 am

I've switched the machine on which I'm attempting to build OpenMM (r2367) from one with an ATI card in addition to the GTX480 to one with two GTX480s, which should eliminate any questions about what type of resource is being used. I'm still getting errors when I run the unit tests:

93% tests passed, 6 tests failed out of 83

Total Test time (real) = 311.39 sec

The following tests FAILED:
55 - TestOpenCLCustomAngleForce (Failed)
57 - TestOpenCLSort (Failed)
60 - TestOpenCLEwald (Failed)
74 - TestOpenCLNonbondedForce (SEGFAULT)
75 - TestOpenCLCustomHbondForce (Failed)
79 - TestOpenCLGBSAOBCForce (Failed)

David and I saw similar failures on his machine. The breakdown for the individual tests was:
$ ./TestOpenCLCustomAngleForce
exception: Error initializing context: clCreateContextFromType (-2)
$ ./TestOpenCLSort
exception: Error downloading array sortData: clEnqueueReadBuffer (-5)
$ ./TestOpenCLEwald
exception: Error invoking kernel findAtomRangeForGrid: clEnqueueNDRangeKernel (-5)
$ ./TestOpenCLNonbondedForce
Segmentation fault
$ ./TestOpenCLCustomHbondForce
exception: Error initializing context: clCreateContextFromType (-2)
$ ./TestOpenCLGBSAOBCForce
exception: Error initializing context: clCreateContextFromType (-2)
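
For reference, the numbers in parentheses in these messages are OpenCL status codes as defined in the Khronos CL/cl.h header: -2 is CL_DEVICE_NOT_AVAILABLE and -5 is CL_OUT_OF_RESOURCES. Below is a minimal Python sketch that decodes lines of this shape. The table is only a partial copy of the header, and the function name describe is made up for illustration; it is not part of OpenMM.

```python
import re

# Partial table of OpenCL status codes, copied from the Khronos CL/cl.h header.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def describe(line):
    """Return (api_call, error_name) for a line ending in 'clSomething (-N)'.

    Returns None when the line does not end with that pattern,
    e.g. for a plain 'Segmentation fault'.
    """
    match = re.search(r"\b(cl\w+) \((-?\d+)\)\s*$", line)
    if match is None:
        return None
    code = int(match.group(2))
    return match.group(1), CL_ERRORS.get(code, "unknown (%d)" % code)


if __name__ == "__main__":
    # The three distinct messages from the test output above.
    for msg in (
        "exception: Error initializing context: clCreateContextFromType (-2)",
        "exception: Error downloading array sortData: clEnqueueReadBuffer (-5)",
        "exception: Error invoking kernel findAtomRangeForGrid: clEnqueueNDRangeKernel (-5)",
    ):
        print(describe(msg))
```

Read this way, three of the six failures are the context-creation error discussed earlier in the thread, and two others report the device running out of resources partway through a test.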

Any ideas about what might be going on? The machine is again running Ubuntu Linux, and deviceQuery gives the following:


CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: "GeForce GTX 480"
CUDA Driver Version: 3.10
CUDA Runtime Version: 3.10
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1609760768 bytes
Number of multiprocessors: 15
Number of cores: 480
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.40 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No

Device 1: "GeForce GTX 480"
CUDA Driver Version: 3.10
CUDA Runtime Version: 3.10
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1610285056 bytes
Number of multiprocessors: 15
Number of cores: 480
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.40 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 2, Device = GeForce GTX 480, Device = GeForce GTX 480


PASSED

Post by Peter Eastman » Thu Jul 22, 2010 10:48 am

I've contacted Nvidia about the CL_DEVICE_NOT_AVAILABLE errors, so hopefully they'll be able to track down the problem with them.

Could you try running TestOpenCLNonbondedForce in gdb to see where the segfault is occurring? If possible, use a version of OpenMM compiled in debug mode.

Peter

Post by Joshua Adelman » Thu Jul 22, 2010 11:02 am

Hi Peter,

I rebuilt OpenMM in debug mode, ran the test under cuda-gdb, and did a backtrace. Let me know if you need additional tests to be run.

Josh

cuda-gdb TestOpenCLNonbondedForce
NVIDIA (R) CUDA Debugger
3.1 beta release
Portions Copyright (C) 2008,2009,2010 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(cuda-gdb) run
Starting program: /home/jadelman/OpenMM_r2367/TestOpenCLNonbondedForce
[Thread debugging using libthread_db enabled]
[New process 28114]
Error while reading shared library symbols:
Cannot find new threads: generic error
[New Thread 140214759069472 (LWP 28114)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 140214759069472 (LWP 28114)]
0x00007f8648b9c243 in ?? () from /usr/lib/libcuda.so
(cuda-gdb) backtrace
#0 0x00007f8648b9c243 in ?? () from /usr/lib/libcuda.so
#1 0x00007f8648bd5bec in ?? () from /usr/lib/libcuda.so
#2 0x00007f8648b93d8a in ?? () from /usr/lib/libcuda.so
#3 0x00007f8648ba229a in ?? () from /usr/lib/libcuda.so
#4 0x00007f8648c10f5f in ?? () from /usr/lib/libcuda.so
#5 0x00007f8648c04e7f in ?? () from /usr/lib/libcuda.so
#6 0x00007f8648c0f9dd in ?? () from /usr/lib/libcuda.so
#7 0x00007f864a9f28e1 in cl::CommandQueue::enqueueNDRangeKernel (this=0x16dd6f8, kernel=@0x7fff9ee155e0, offset=@0x7f864ac858e0, global=@0x7fff9ee15290, local=@0x7fff9ee15260, events=0x0,
event=0x0) at /home/jadelman/OpenMM_r2367/platforms/opencl/src/cl.hpp:2951
#8 0x00007f864a9ee938 in OpenMM::OpenCLContext::executeKernel (this=0x16dd690, kernel=@0x7fff9ee155e0, workUnits=20, blockSize=64)
at /home/jadelman/OpenMM_r2367/platforms/opencl/src/OpenCLContext.cpp:243
#9 0x00007f864a9ec839 in OpenCLContext (this=0x16dd690, numParticles=2, deviceIndex=0) at /home/jadelman/OpenMM_r2367/platforms/opencl/src/OpenCLContext.cpp:119
#10 0x00007f864aa0ca8f in PlatformData (this=0x16dd630, numParticles=2, deviceIndex=-1) at /home/jadelman/OpenMM_r2367/platforms/opencl/src/OpenCLPlatform.cpp:109
#11 0x00007f864aa0c7ae in OpenMM::OpenCLPlatform::contextCreated (this=0x7fff9ee16140, context=@0x16dd360, properties=@0x7fff9ee16208)
at /home/jadelman/OpenMM_r2367/platforms/opencl/src/OpenCLPlatform.cpp:100
#12 0x00007f864a573ab3 in ContextImpl (this=0x16dd360, owner=@0x7fff9ee16200, system=@0x7fff9ee160b0, integrator=@0x7fff9ee161c0, platform=0x7fff9ee16140, properties=@0x7fff9ee16208)
at /home/jadelman/OpenMM_r2367/openmmapi/src/ContextImpl.cpp:69
#13 0x00007f864a55ccb1 in Context (this=0x7fff9ee16200, system=@0x7fff9ee160b0, integrator=@0x7fff9ee161c0, platform=@0x7fff9ee16140) at /home/jadelman/OpenMM_r2367/openmmapi/src/Context.cpp:44
#14 0x000000000040e026 in testCoulomb () at /home/jadelman/OpenMM_r2367/platforms/opencl/tests/TestOpenCLNonbondedForce.cpp:68
#15 0x0000000000421dec in main () at /home/jadelman/OpenMM_r2367/platforms/opencl/tests/TestOpenCLNonbondedForce.cpp:680

Post by Timo Stich » Fri Jul 23, 2010 4:48 am

Joshua,

Can you also please report which driver you have installed (e.g. by running glxinfo | grep NVIDIA)?

Which Ubuntu version are you on - 64bit 10.04?

Thanks,
Timo

Post by Joshua Adelman » Fri Jul 23, 2010 5:02 am

Hi Timo,

The computer is running Ubuntu 10.04 LTS 64-bit.

glxinfo | grep NVIDIA reports the following:
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce 9600M GT OpenGL Engine
OpenGL version string: 1.4 (2.1 NVIDIA-1.6.16)

I'm not sure what the OpenGL driver version means in terms of OpenCL/CUDA, though. According to deviceQuery, the drivers are v3.1 (see previous post).

Post by Timo Stich » Mon Jul 26, 2010 4:19 am

Joshua,

Ubuntu 10.04 is not an officially supported distro for CUDA 3.1, so that might be the reason for the failures.

Also, I am wondering whether the output is from the same machine. You reported earlier that the system has two GTX480s, but your glxinfo reports a 9600M...

The reason I am asking is that, besides the CUDA version, the display driver plays a role as well. The official developer driver for the CUDA 3.1 release is 256.40 on Linux. Can you check that this is the one you have installed on that system? Alternatively, on a headless system you can find the driver version with cat /proc/driver/nvidia/version.
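
If it helps to script that check across several machines, here is a minimal Python sketch that pulls the version number out of /proc/driver/nvidia/version. It assumes the file contains a line of the form "NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version>  <build date>"; that layout is an assumption, and the function name parse_driver_version is invented for this example.

```python
import re
from pathlib import Path

# Path taken from the post above; only present when the NVIDIA module is loaded.
VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text):
    """Extract the driver version from the 'NVRM version:' line.

    Assumes the version number follows the words 'Kernel Module'.
    Returns None when no such line or number is found.
    """
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", line)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    if VERSION_FILE.exists():
        found = parse_driver_version(VERSION_FILE.read_text())
        print("installed driver:", found)
        # 256.40 is the developer driver named in the post above.
        print("matches 256.40:", found == "256.40")
    else:
        print("NVIDIA kernel module not loaded (version file missing)")
```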

Thanks.

Post by David Koes » Mon Jul 26, 2010 6:24 am

Just to clear things up: Joshua is trying to run benchmarks on some of the machines in our group. We have two GPU-centric machines; one has a GTX 480 and a Radeon 5970, the other has two GTX 480s and an older nVidia card (hence the 9600M). Both machines are running Ubuntu 10.04. Both machines have (non-developer) nVidia drivers 256.35. Both machines have the same set of failures when running make test; however, we don't have any problems running accelerated GROMACS.

Previously, I had tried installing the beta OpenCL 1.1 developer drivers, and that actually resulted in more tests failing. I will try installing the released developer drivers, but I am not very hopeful.

Post by Darren Weber » Tue May 24, 2011 1:53 pm


Same problem with CUDA 3.2 on Ubuntu 10.04, i.e.:

GPU check failed

Change Dir: /data/symbiosWorkshop/OpenMM3.0-Build/CMakeFiles/CMakeTmp

Run Build Command:/usr/bin/make "cmTryCompileExec/fast"
/usr/bin/make -f CMakeFiles/cmTryCompileExec.dir/build.make
CMakeFiles/cmTryCompileExec.dir/build
make[1]: Entering directory `/data/symbiosWorkshop/OpenMM3.0-Build/CMakeFiles/CMakeTmp'
/usr/bin/cmake -E cmake_progress_report
/data/symbiosWorkshop/OpenMM3.0-Build/CMakeFiles/CMakeTmp/CMakeFiles 1
Building C object CMakeFiles/cmTryCompileExec.dir/has_cuda_gpu.c.o
/usr/bin/gcc -I/usr/local/cuda/include -o CMakeFiles/cmTryCompileExec.dir/has_cuda_gpu.c.o
-c /data/symbiosWorkshop/OpenMM3.0-Source/src/platforms/cuda/tests/has_cuda_gpu.c
Linking C executable cmTryCompileExec
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTryCompileExec.dir/link.txt --verbose=1
/usr/bin/gcc -L/usr/local/cuda/lib64 CMakeFiles/cmTryCompileExec.dir/has_cuda_gpu.c.o
-o cmTryCompileExec -rdynamic /usr/local/cuda/lib/libcudart.so -Wl,-rpath,/usr/local/cuda/lib
/usr/local/cuda/lib/libcudart.so: could not read symbols: File in wrong format
collect2: ld returned 1 exit status
make[1]: *** [cmTryCompileExec] Error 1
make[1]: Leaving directory `/data/symbiosWorkshop/OpenMM3.0-Build/CMakeFiles/CMakeTmp'
make: *** [cmTryCompileExec/fast] Error 2


Post by Darren Weber » Tue May 24, 2011 4:11 pm


This solution arrives at a complete build on Ubuntu 10.04 with CUDA 3.2:

$ cd OpenMM3.0-build
$ ccmake -i ../OpenMM3.0-source -DFOUND_CUBLAS=/usr/local/cuda/lib64/libcublas.so -DFOUND_CUDART=/usr/local/cuda/lib64/libcudart.so -DFOUND_CUFFT=/usr/local/cuda/lib64/libcufft.so
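
The "File in wrong format" message in the earlier log is what ld typically reports when a library's architecture does not match the build. There, the link line picked up /usr/local/cuda/lib/libcudart.so rather than the lib64 copy that the -D overrides above point to, so presumably a 32-bit library was being linked into a 64-bit build. A minimal Python sketch for checking which is which by reading the ELF header (the function name elf_class is made up for this example):

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF.

    Byte 4 of the ELF identification (EI_CLASS) is 1 for 32-bit
    objects and 2 for 64-bit objects.
    """
    with open(path, "rb") as handle:
        ident = handle.read(5)
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    return {1: 32, 2: 64}.get(ident[4])


if __name__ == "__main__":
    # The two library locations that appear in this thread.
    for lib in (
        "/usr/local/cuda/lib/libcudart.so",
        "/usr/local/cuda/lib64/libcudart.so",
    ):
        try:
            print(lib, "->", elf_class(lib))
        except OSError as err:
            print(lib, "-> could not read:", err)
```

On a 64-bit build, only a library that reports 64 will link.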
