Problem installing OpenMM 7.3.1 with CUDA 9.2 and 10.0 on a heterogeneous cluster

Hyuntae Jung

Post by Hyuntae Jung » Fri May 17, 2019 8:37 pm

Hello,

I would appreciate any suggestions for solving the following problem.
My cluster (Intel x86_64 Linux) previously had only GTX 1080 Ti nodes, but I recently added a new node with RTX 2070 cards.
OpenMM was already installed on the cluster with CUDA 9.2, using

Code:

conda install -c omnia/label/cuda92 -c conda-forge openmm

For the new node I installed a separate copy of OpenMM with CUDA 10.0, using

Code:

conda install -c omnia/label/cuda100 -c conda-forge openmm

However, I get an error when I run

Code:

python -m simtk.testInstallation

The output is:

Code:

There are 4 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Error computing forces with CUDA platform
4 OpenCL - Successfully computed forces

CUDA platform error: Error launching CUDA compiler: 256
nvcc fatal   : Value 'sm_75' is not defined for option 'gpu-architecture'

Median difference in forces between platforms:

Reference vs. CPU: 6.30224e-06
Reference vs. OpenCL: 6.75426e-06
CPU vs. OpenCL: 8.12575e-07

As a first troubleshooting step, I suspected that the wrong CUDA libraries were being picked up,
so I checked which libraries the plugin links against:

Code:

(openmm_cuda10_env) htjung@compute-0-3:~>ldd ~/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/libOpenMMCUDA.so
        linux-vdso.so.1 =>  (0x00007ffff79ec000)
        librt.so.1 => /lib64/librt.so.1 (0x00002b28ecabe000)
        libOpenMM.so => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libOpenMM.so (0x00002b28eccc6000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00002b28ed20b000)
        libcufft.so.10.0 => /usr/local/cuda-10.0/lib64/libcufft.so.10.0 (0x00002b28ee380000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b28f4834000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b28f4a50000)
        libstdc++.so.6 => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libstdc++.so.6 (0x00002b28ec458000)
        libm.so.6 => /lib64/libm.so.6 (0x00002b28f4c54000)
        libgcc_s.so.1 => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libgcc_s.so.1 (0x00002b28ec59a000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b28f4f56000)
        libnvidia-fatbinaryloader.so.418.74 => /lib64/libnvidia-fatbinaryloader.so.418.74 (0x00002b28f5323000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b28ec422000)

My CUDA 10.0 libraries are in /usr/local/cuda-10.0, and the CUDA 9.2 libraries are in /usr/local/cuda and /usr/local/cuda-9.2.
I am not sure about the libraries resolved from /lib64 (do they matter for this problem?).

libOpenMMCudaCompiler.so also seems to be okay:

Code:

(openmm_cuda10_env) htjung@compute-0-3:~>ldd ~/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/libOpenMMCudaCompiler.so
        linux-vdso.so.1 =>  (0x00007ffc63977000)
        librt.so.1 => /lib64/librt.so.1 (0x00002b61ef483000)
        libnvrtc.so.10.0 => /usr/local/cuda-10.0/lib64/libnvrtc.so.10.0 (0x00002b61ef68b000)
        libOpenMMCUDA.so => not found
        libOpenMM.so => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libOpenMM.so (0x00002b61f0ca7000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b61f11ec000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00002b61f13f0000)
        libcufft.so.10.0 => /usr/local/cuda-10.0/lib64/libcufft.so.10.0 (0x00002b61f2565000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b61f8a19000)
        libstdc++.so.6 => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libstdc++.so.6 (0x00002b61ef08e000)
        libm.so.6 => /lib64/libm.so.6 (0x00002b61f8c35000)
        libgcc_s.so.1 => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libgcc_s.so.1 (0x00002b61ef1d0000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b61f8f37000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b61ef057000)
        libnvidia-fatbinaryloader.so.418.74 => /lib64/libnvidia-fatbinaryloader.so.418.74 (0x00002b61f9304000)

I would be grateful for any comments on what I should check or what I could do differently.
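
Since the nvcc error above complains about 'sm_75' (compute capability 7.5, which is what Turing cards such as the RTX 2070 report), one thing worth checking is which toolkit release the nvcc at a given path belongs to. Support for sm_75 arrived with CUDA 10.0. Below is a minimal sketch of such a check; the helper names are made up for illustration, and the two paths are simply the ones mentioned in this thread.

```python
import re
import subprocess

# Turing (compute capability 7.5, e.g. RTX 2070) needs CUDA 10.0 or newer.
MIN_RELEASE_FOR_SM75 = (10, 0)


def parse_nvcc_release(version_text):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_release(nvcc_path):
    """Run `<nvcc_path> --version` and return its (major, minor) release."""
    result = subprocess.run(
        [nvcc_path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_nvcc_release(result.stdout)


def supports_sm75(release):
    """True if a toolkit release tuple is new enough to compile for sm_75."""
    return release is not None and release >= MIN_RELEASE_FOR_SM75


if __name__ == "__main__":
    # Paths taken from this thread; adjust them for your own cluster.
    for path in ("/usr/local/cuda/bin/nvcc", "/usr/local/cuda-10.0/bin/nvcc"):
        try:
            release = nvcc_release(path)
        except OSError:
            # The compiler is missing or not executable at this path.
            release = None
        status = "sm_75 ok" if supports_sm75(release) else "no sm_75"
        print(path, release, status)
```

Running this on the new node shows at a glance whether the default /usr/local/cuda path points at a toolkit that can target the RTX 2070.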

Peter Eastman

Post by Peter Eastman » Sat May 18, 2019 2:16 pm

CUDA 9.2 doesn't support Turing GPUs; it was released before they came out. In addition to the runtime libraries, OpenMM also needs the nvcc compiler, which by default it expects to find at /usr/local/cuda/bin/nvcc. In your case, that's the CUDA 9.2 version, which doesn't support your GPU. You can set the OPENMM_CUDA_COMPILER environment variable to point it at the correct compiler:

Code:

export OPENMM_CUDA_COMPILER=/usr/local/cuda-10.0/bin/nvcc
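
On a heterogeneous cluster it may be more convenient to set the variable from the Python job script than in every shell profile. The following is only a sketch under some assumptions: the helper names are hypothetical, the candidate path is the one from this thread, and it assumes the variable has to be in the environment before OpenMM is imported. The nvcc you select should presumably belong to the same CUDA version as the OpenMM build in the active conda environment.

```python
import os

# Candidate nvcc locations, in order of preference. This path comes from the
# thread; adjust the list for your own cluster.
CANDIDATE_NVCC = (
    "/usr/local/cuda-10.0/bin/nvcc",
)


def pick_nvcc(candidates=CANDIDATE_NVCC):
    """Return the first candidate that exists and is executable, else None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def configure_openmm_compiler(candidates=CANDIDATE_NVCC, environ=os.environ):
    """Set OPENMM_CUDA_COMPILER unless it is already set; return its value."""
    if "OPENMM_CUDA_COMPILER" not in environ:
        nvcc = pick_nvcc(candidates)
        if nvcc is not None:
            environ["OPENMM_CUDA_COMPILER"] = nvcc
    return environ.get("OPENMM_CUDA_COMPILER")


if __name__ == "__main__":
    print("OPENMM_CUDA_COMPILER =", configure_openmm_compiler())
    # Import OpenMM only after this point (assumption: the variable is read
    # when the CUDA platform is loaded).
    # from simtk import openmm
```

A value exported in the shell takes precedence, so the one-line export above keeps working unchanged.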

Hyuntae Jung

Post by Hyuntae Jung » Mon May 20, 2019 9:30 am

It works. Thank you for your comment!
