Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 1:51 pm
by jadelman
I'm having some problems compiling OpenMM (SVN revisions 2355 and 2356) on two different Ubuntu Linux 10.04 machines. One has a GTX 480 and the other has a GTX 260. Both machines have the CUDA 3.1 drivers and runtime.

I've set LD_LIBRARY_PATH to point to the cuda lib and PATH to point to the cuda bin directory. When I run 'cmake .' from within the OpenMM directory, I get:

cmake .
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
GPU check failed
Change Dir: /home/jadelman/code/OpenMM_r2356/CMakeFiles/CMakeTmp

Run Build Command:/usr/bin/make "cmTryCompileExec/fast"
/usr/bin/make -f CMakeFiles/cmTryCompileExec.dir/build.make CMakeFiles/cmTryCompileExec.dir/build
make[1]: Entering directory `/home/jadelman/code/OpenMM_r2356/CMakeFiles/CMakeTmp'
/usr/bin/cmake -E cmake_progress_report /home/jadelman/code/OpenMM_r2356/CMakeFiles/CMakeTmp/CMakeFiles 1
Building C object CMakeFiles/cmTryCompileExec.dir/has_cuda_gpu.c.o
/usr/bin/gcc -I/usr/local/cuda/include -o CMakeFiles/cmTryCompileExec.dir/has_cuda_gpu.c.o -c /home/jadelman/code/OpenMM_r2356/platforms/cuda/tests/has_cuda_gpu.c
Linking C executable cmTryCompileExec
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTryCompileExec.dir/link.txt --verbose=1
/usr/bin/gcc CMakeFiles/cmTryCompileExec.dir/has_cuda_gpu.c.o -o cmTryCompileExec -rdynamic /usr/local/cuda/lib/libcudart.so -Wl,-rpath,/usr/local/cuda/lib
/usr/local/cuda/lib/libcudart.so: could not read symbols: File in wrong format
collect2: ld returned 1 exit status
make[1]: *** [cmTryCompileExec] Error 1
make[1]: Leaving directory `/home/jadelman/code/OpenMM_r2356/CMakeFiles/CMakeTmp'
make: *** [cmTryCompileExec/fast] Error 2


-- Found OPENCL: /usr/lib/libOpenCL.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jadelman/code/OpenMM_r2356

The line that reads "could not read symbols: File in wrong format" may be due to a 32-bit vs. 64-bit mismatch, so I switched LD_LIBRARY_PATH to point to /usr/local/cuda/lib64 and started a fresh build, but cmake still populated CMakeCache.txt with references to the /lib directory.
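One way to confirm the 32-bit vs. 64-bit suspicion is to read the ELF header of the library the linker rejected. The following is a minimal sketch, not part of OpenMM or CUDA; the function names are made up for illustration. It relies only on the ELF format itself: the file starts with the magic bytes 0x7f 'E' 'L' 'F', and the fifth byte is 1 for a 32-bit object and 2 for a 64-bit one.

```python
# Hypothetical helper names; only the ELF header layout is standard.
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class_from_header(header: bytes) -> str:
    """Classify a file from its first five bytes."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not an ELF file"
    # Byte 4 is EI_CLASS in the ELF identification block.
    return ELF_CLASSES.get(header[4], "unknown ELF class")


def elf_class(path: str) -> str:
    """Open a library (symlinks are followed) and report its ELF class."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))


# Example call, using the paths mentioned in this post:
#   print(elf_class("/usr/local/cuda/lib/libcudart.so"))
#   print(elf_class("/usr/local/cuda/lib64/libcudart.so"))
```

If the copy under /lib reports 32-bit on a 64-bit system, that would be consistent with the "File in wrong format" message from ld.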

I've also tried editing CMakeCache.txt by hand to point to all of the proper CUDA-related lib files, but I continue to get a message that the GPU check has failed (I also get 'No OpenCL platforms found.' errors, even if cmake finds /usr/lib64/libOpenCL.so). After hand-editing the CMakeCache.txt file, I can get it to run through the make sequence, getting lines like:
[ 71%] Converting NVCC dependency to CMake (/home/jla65/code/OpenMM_r2355/src/cuda/kCalculateCDLJForces.cu_OpenMMCuda_generated.cpp.depend)
[ 71%] Building (Device) NVCC /home/jla65/code/OpenMM_r2355/platforms/cuda/./src/kernels//kCalculateCDLJForces.cu: /home/jla65/code/OpenMM_r2355/src/cuda/kCalculateCDLJForces.cu_OpenMMCuda_generated.cpp
Scanning dependencies of target OpenMMCuda
[ 72%] Building CXX object platforms/cuda/sharedTarget/CMakeFiles/OpenMMCuda.dir/__/src/kernels/rng.cpp.o
[ 72%] Building CXX object platforms/cuda/sharedTarget/CMakeFiles/OpenMMCuda.dir/__/src/kernels/gpu.cpp.o

but then it doesn't generate any of the Cuda tests for 'make test'.

I haven't had any problems with the cmake tool setting things up properly on various OSX platforms or a RHEL machine.

Does anyone have any advice on how to proceed on an Ubuntu machine or have had similar experiences?

RE: Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 2:19 pm
by cmbruns
Try setting the cmake variable FOUND_CUDART to the location of the 64-bit cudart library. If that doesn't help, set CUDA_HAVE_GPU to TRUE, which should at least get the CUDA tests built.
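Before changing anything, it can help to see what those two variables currently hold. Below is a small sketch that parses the text of CMakeCache.txt, whose entries have the form NAME:TYPE=VALUE with comment lines starting with # or //. The function names are illustrative only, and the parser ignores the rarer case of quoted entry names.

```python
def read_cmake_cache(text: str) -> dict:
    """Parse CMakeCache.txt content into a {name: value} dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles used in the cache.
        if not line or line.startswith(("#", "//")):
            continue
        name_and_type, sep, value = line.partition("=")
        if not sep:
            continue
        name = name_and_type.split(":", 1)[0]
        entries[name] = value
    return entries


def report(entries: dict, names=("FOUND_CUDART", "CUDA_HAVE_GPU")) -> list:
    """Return one human-readable line per variable of interest."""
    lines = []
    for name in names:
        if name in entries:
            lines.append(f"{name} = {entries[name]}")
        else:
            lines.append(f"{name} is not set")
    return lines


# Example call from the build directory:
#   with open("CMakeCache.txt") as f:
#       print("\n".join(report(read_cmake_cache(f.read()))))
```

The values can then be overridden on the command line with cmake's -D option, for example -DCUDA_HAVE_GPU=TRUE. The exact path to give FOUND_CUDART depends on the installation; /usr/local/cuda/lib64/libcudart.so is only a guess based on the directory mentioned earlier in this thread.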

RE: Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 2:47 pm
by dkoes
A large part of the problem was that nvidia.icd was not installed in /etc/OpenCL/vendors. I don't know why the NVIDIA installer didn't put it there, but manually extracting it from the installer (-x) and copying it to that location helped.
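A quick way to check for this condition is to list the .icd files in the vendors directory along with the library each one names. This is a minimal sketch with a made-up function name; it assumes the usual Linux ICD loader convention, in which each .icd file is a small text file containing the name or path of a vendor's OpenCL library.

```python
import os


def list_icd_entries(vendors_dir: str = "/etc/OpenCL/vendors") -> dict:
    """Return {icd_file_name: library_named_inside} for a vendors directory."""
    if not os.path.isdir(vendors_dir):
        # A missing directory means no vendor is registered at all.
        return {}
    entries = {}
    for name in sorted(os.listdir(vendors_dir)):
        if name.endswith(".icd"):
            with open(os.path.join(vendors_dir, name)) as f:
                entries[name] = f.read().strip()
    return entries


# Example call:
#   for icd, library in list_icd_entries().items():
#       print(icd, "->", library)
```

An empty result would be consistent with the 'No OpenCL platforms found.' errors reported earlier in the thread.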

RE: Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 2:56 pm
by jadelman
I should start off by mentioning that this is Dave's machine, so many thanks to him for hacking around on this.

After the changes that Dave made, cmake was much happier and set up the build system without any errors. I was able to compile OpenMM, but it failed several tests (see below). This is one of the first times I've seen 'make test' fail a subset of tests. My past experience is that it fails all or none:


87% tests passed, 10 tests failed out of 79

Total Test time (real) = 236.06 sec

The following tests FAILED:
52 - TestOpenCLGBSAOBCForce (Failed)
57 - TestOpenCLEwald (Failed)
61 - TestOpenCLCustomHbondForce (Failed)
67 - TestOpenCLNonbondedForce (SEGFAULT)
69 - TestOpenCLCustomAngleForce (Failed)
70 - TestOpenCLSort (Failed)
76 - TestCudaGBVISoftcoreForce (Not Run)
77 - TestFindExclusions (Not Run)
78 - TestParser (Not Run)
79 - TestLocalEnergyMinimizer (Not Run)

Ignoring the last three tests, which are new and I'm guessing are part of the developer debugging work, the errors that I'm getting when I run the other tests individually are:

$ ./TestOpenCLGBSAOBCForce
exception: Error initializing context: clCreateContextFromType (-2)
$ ./TestOpenCLEwald
exception: Error invoking kernel findAtomRangeForGrid: clEnqueueNDRangeKernel (-5)
$ ./TestOpenCLCustomHbondForce
exception: Error initializing context: clCreateContextFromType (-2)
$ ./TestOpenCLNonbondedForce
Segmentation fault (core dumped)
$ ./TestOpenCLCustomAngleForce
exception: Error initializing context: clCreateContextFromType (-2)

RE: Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 3:02 pm
by peastman
TestFindExclusions and TestParser aren't at all new. They've both been around for a long time and don't even use the GPU. It looks to me like you still aren't getting a clean compile.

The first thing to do is "make clean", just to be sure you aren't somehow mixing code compiled by two different versions of CUDA. That would certainly cause problems.

Peter

RE: Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 3:03 pm
by friedrim
The return value -2 in

exception: Error initializing context: clCreateContextFromType (-2)

corresponds to CL_DEVICE_NOT_AVAILABLE


RE: Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 3:05 pm
by dkoes
I suspect the problem with the OpenCL tests is that this machine has both an ATI and an nVidia card in it. How do you control which card the opencl tests run on?

RE: Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 3:10 pm
by peastman
> How do you control which card the opencl tests run on?

By which set of OpenCL libraries you include in LD_LIBRARY_PATH. If they're both included, it will just use whichever comes up as platform 0.

All the tests should pass with either GPU, but if you're using the ATI one, make sure you have Stream SDK 2.1 installed. With earlier versions you'll get a lot of failures (which may be what is happening here).
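To see which platform the OpenCL loader reports first on a given machine, something like the following sketch can be used. It is not taken from OpenMM. It assumes that ctypes can locate a library named OpenCL on the system, and it calls only the standard OpenCL entry points clGetPlatformIDs and clGetPlatformInfo. The function name list_opencl_platforms is made up for illustration.

```python
import ctypes
import ctypes.util

CL_PLATFORM_NAME = 0x0902  # constant from the OpenCL headers


def list_opencl_platforms() -> list:
    """Return platform names in the order the OpenCL loader reports them."""
    libname = ctypes.util.find_library("OpenCL")
    if libname is None:
        return []
    try:
        cl = ctypes.CDLL(libname)
    except OSError:
        return []

    cl.clGetPlatformIDs.restype = ctypes.c_int
    cl.clGetPlatformIDs.argtypes = [
        ctypes.c_uint,
        ctypes.POINTER(ctypes.c_void_p),
        ctypes.POINTER(ctypes.c_uint),
    ]
    cl.clGetPlatformInfo.restype = ctypes.c_int
    cl.clGetPlatformInfo.argtypes = [
        ctypes.c_void_p,
        ctypes.c_uint,
        ctypes.c_size_t,
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_size_t),
    ]

    # First call asks only for the number of platforms.
    count = ctypes.c_uint(0)
    if cl.clGetPlatformIDs(0, None, ctypes.byref(count)) != 0 or count.value == 0:
        return []

    # Second call fills in the platform handles.
    ids = (ctypes.c_void_p * count.value)()
    if cl.clGetPlatformIDs(count.value, ids, None) != 0:
        return []

    names = []
    for platform_id in ids:
        buf = ctypes.create_string_buffer(256)
        err = cl.clGetPlatformInfo(platform_id, CL_PLATFORM_NAME, len(buf), buf, None)
        names.append(buf.value.decode(errors="replace") if err == 0 else "<unknown>")
    return names


# Example call:
#   for index, name in enumerate(list_opencl_platforms()):
#       print(index, name)
```

Running it with different LD_LIBRARY_PATH settings should show whether the ATI or the NVIDIA platform comes up at index 0.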

Peter

RE: Compiling OpenMM on Ubuntu Linux

Posted: Wed Jul 14, 2010 3:17 pm
by dkoes
I should mention that I'm compiling the 2.0 release on the same machine and the failures I get are:
52 - TestOpenCLGBSAOBCForce (Failed)
57 - TestOpenCLEwald (Failed)
61 - TestOpenCLCustomHbondForce (Failed)
67 - TestOpenCLNonbondedForce (SEGFAULT)
69 - TestOpenCLCustomAngleForce (Failed)
70 - TestOpenCLSort (Failed)
76 - TestCudaGBVISoftcoreForce (Not Run)

76 doesn't compile since it's missing a
#include <cstdio>
(stderr and printf aren't defined).

Also, make install fails since the user guide isn't where the installer was expecting it.

RE: Compiling OpenMM on Ubuntu Linux

Posted: Thu Jul 15, 2010 8:46 am
by dkoes
It seems to be using the nvidia opencl library. The errors generated by the failing tests are:
dkoes@quasar:~/build/build_openmm$ ./TestOpenCLGBSAOBCForce
exception: Error initializing context: clCreateContextFromType (-2)
dkoes@quasar:~/build/build_openmm$ ./TestOpenCLEwald
exception: Error invoking kernel findAtomRangeForGrid: clEnqueueNDRangeKernel (-5)
dkoes@quasar:~/build/build_openmm$ ./TestOpenCLCustomHbondForce
exception: Error initializing context: clCreateContextFromType (-2)
dkoes@quasar:~/build/build_openmm$ ./TestOpenCLNonbondedForce
Segmentation fault (core dumped)
dkoes@quasar:~/build/build_openmm$ ./TestOpenCLCustomAngleForce
exception: Error initializing context: clCreateContextFromType (-2)
dkoes@quasar:~/build/build_openmm$ ./TestOpenCLSort
exception: Error downloading array sortData: clEnqueueReadBuffer (-5)


-2 is CL_DEVICE_NOT_AVAILABLE and -5 is CL_OUT_OF_RESOURCES
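For reading further failures, a small lookup like the one below saves a trip to the OpenCL headers. It is only a sketch with illustrative names: the table covers the first few error codes defined in cl.h, and the regular expression assumes messages end in the "function (code)" form shown in the output above.

```python
import re

# First few error codes from the OpenCL header cl.h.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches a trailing "someFunction (-N)" as printed by the failing tests.
_FAILURE_RE = re.compile(r"(\w+) \((-?\d+)\)\s*$")


def describe_failure(message: str):
    """Return (opencl_function, error_name), or None if no code is present."""
    match = _FAILURE_RE.search(message)
    if match is None:
        return None
    code = int(match.group(2))
    return match.group(1), CL_ERROR_NAMES.get(code, f"unknown ({code})")


# Example call:
#   describe_failure("exception: Error initializing context: clCreateContextFromType (-2)")
```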

If I reduce the size of the array in TestOpenCLSort to 100 (instead of 10000) then the test passes.

Any suggestions as to how to debug this?

Thanks,
-Dave