
problem with Installation of OpenMM 7.3.1 with Cuda 9.2 and 10.0 for heterogeneous cluster

Posted: Fri May 17, 2019 8:37 pm
by jht0664
Hello,

I would appreciate any suggestions for solving the following problem.
My cluster (Intel x86_64 Linux) originally had only GTX 1080 Ti nodes, and I recently added a new node with RTX 2070 cards.
OpenMM is already installed on the cluster with CUDA 9.2, using

Code: Select all

conda install -c omnia/label/cuda92 -c conda-forge openmm
I then tried to install a second copy of OpenMM built for CUDA 10.0, using

Code: Select all

conda install -c omnia/label/cuda100 -c conda-forge openmm
However, the CUDA platform fails when I run the installation test:

Code: Select all

python -m simtk.testInstallation
The output is:

Code: Select all

There are 4 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Error computing forces with CUDA platform
4 OpenCL - Successfully computed forces

CUDA platform error: Error launching CUDA compiler: 256
nvcc fatal   : Value 'sm_75' is not defined for option 'gpu-architecture'

Median difference in forces between platforms:

Reference vs. CPU: 6.30224e-06
Reference vs. OpenCL: 6.75426e-06
CPU vs. OpenCL: 8.12575e-07
As a first attempt, I suspected a wrong path to the CUDA libraries,
so I checked which shared libraries the CUDA plugin links against:

Code: Select all

(openmm_cuda10_env) htjung@compute-0-3:~>ldd ~/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/libOpenMMCUDA.so
        linux-vdso.so.1 =>  (0x00007ffff79ec000)
        librt.so.1 => /lib64/librt.so.1 (0x00002b28ecabe000)
        libOpenMM.so => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libOpenMM.so (0x00002b28eccc6000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00002b28ed20b000)
        libcufft.so.10.0 => /usr/local/cuda-10.0/lib64/libcufft.so.10.0 (0x00002b28ee380000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b28f4834000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b28f4a50000)
        libstdc++.so.6 => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libstdc++.so.6 (0x00002b28ec458000)
        libm.so.6 => /lib64/libm.so.6 (0x00002b28f4c54000)
        libgcc_s.so.1 => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libgcc_s.so.1 (0x00002b28ec59a000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b28f4f56000)
        libnvidia-fatbinaryloader.so.418.74 => /lib64/libnvidia-fatbinaryloader.so.418.74 (0x00002b28f5323000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b28ec422000)
My CUDA 10.0 libraries are located in /usr/local/cuda-10.0, and the CUDA 9.2 libraries are in /usr/local/cuda and /usr/local/cuda-9.2.
I am not sure about the entries under /lib64 (do they matter for this problem?).

As for libOpenMMCudaCompiler.so, it also seems to be okay:

Code: Select all

(openmm_cuda10_env) htjung@compute-0-3:~>ldd ~/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/libOpenMMCudaCompiler.so
        linux-vdso.so.1 =>  (0x00007ffc63977000)
        librt.so.1 => /lib64/librt.so.1 (0x00002b61ef483000)
        libnvrtc.so.10.0 => /usr/local/cuda-10.0/lib64/libnvrtc.so.10.0 (0x00002b61ef68b000)
        libOpenMMCUDA.so => not found
        libOpenMM.so => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libOpenMM.so (0x00002b61f0ca7000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b61f11ec000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00002b61f13f0000)
        libcufft.so.10.0 => /usr/local/cuda-10.0/lib64/libcufft.so.10.0 (0x00002b61f2565000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b61f8a19000)
        libstdc++.so.6 => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libstdc++.so.6 (0x00002b61ef08e000)
        libm.so.6 => /lib64/libm.so.6 (0x00002b61f8c35000)
        libgcc_s.so.1 => /home/htjung/miniconda3_cuda10/envs/openmm_cuda10_env/lib/plugins/../libgcc_s.so.1 (0x00002b61ef1d0000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b61f8f37000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b61ef057000)
        libnvidia-fatbinaryloader.so.418.74 => /lib64/libnvidia-fatbinaryloader.so.418.74 (0x00002b61f9304000)
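Long ldd listings like the two above are easy to misread, so a small script can help summarize them. The following is only a minimal sketch: `parse_ldd` is a hypothetical helper name, and the sample text is copied from the output above. It maps each library to the path it resolved to and reports entries marked "not found".

```python
def parse_ldd(output):
    """Map each library name in ldd output to its resolved path.

    Libraries reported as "not found" map to None. Lines without a
    resolved file (the vdso entry, the dynamic loader line) are skipped.
    """
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the /lib64/ld-linux-x86-64.so.2 loader line
        name, _, target = line.partition("=>")
        name = name.strip()
        target = target.strip()
        if target.startswith("not found"):
            resolved[name] = None
            continue
        # Drop the trailing load address such as "(0x00002b61ef68b000)".
        path = target.split("(")[0].strip()
        if path:  # linux-vdso has no file on disk, so its path is empty
            resolved[name] = path
    return resolved


# Sample lines taken from the libOpenMMCudaCompiler.so listing above.
sample = """\
        linux-vdso.so.1 =>  (0x00007ffc63977000)
        libnvrtc.so.10.0 => /usr/local/cuda-10.0/lib64/libnvrtc.so.10.0 (0x00002b61ef68b000)
        libOpenMMCUDA.so => not found
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00002b61f13f0000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b61ef057000)
"""

libs = parse_ldd(sample)
missing = [name for name, path in libs.items() if path is None]
print("not found:", missing)
```

On a real node, the input could come from running ldd through `subprocess.run(["ldd", plugin_path], capture_output=True, text=True).stdout`, where `plugin_path` is whichever plugin file is being inspected.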
I would be glad to have any comments on what I should check and what I could do better.

Re: problem with Installation of OpenMM 7.3.1 with Cuda 9.2 and 10.0 for heterogeneous cluster

Posted: Sat May 18, 2019 2:16 pm
by peastman
CUDA 9.2 doesn't support Turing GPUs such as the RTX 2070; it was released before they came out. In addition to the runtime libraries, OpenMM also needs the nvcc compiler, which by default it expects to find at /usr/local/cuda/bin/nvcc. In your case, that's the CUDA 9.2 version, which doesn't support your GPU (hence the 'sm_75' error). You can set the OPENMM_CUDA_COMPILER environment variable to point it at the correct compiler:

Code: Select all

export OPENMM_CUDA_COMPILER=/usr/local/cuda-10.0/bin/nvcc
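
On a heterogeneous cluster where not every node has the same toolkit installed, the same variable could also be set from the job script itself instead of the shell profile. The following is only a minimal sketch under stated assumptions: `pick_nvcc` and `configure_openmm_compiler` are hypothetical helper names (not part of OpenMM), and the candidate directories are the ones mentioned in this thread. To be safe, the variable should be set before `simtk.openmm` is imported, since the CUDA plugin may read it when it is loaded.

```python
import os


def pick_nvcc(candidate_roots, exists=os.path.isfile):
    """Return the first <root>/bin/nvcc that exists, or None if none do."""
    for root in candidate_roots:
        nvcc = os.path.join(root, "bin", "nvcc")
        if exists(nvcc):
            return nvcc
    return None


def configure_openmm_compiler(candidate_roots, env=os.environ,
                              exists=os.path.isfile):
    """Set OPENMM_CUDA_COMPILER in env to the first nvcc found.

    Leaves env untouched and returns None when no candidate exists,
    so OpenMM falls back to its default compiler location.
    """
    nvcc = pick_nvcc(candidate_roots, exists)
    if nvcc is not None:
        env["OPENMM_CUDA_COMPILER"] = nvcc
    return nvcc


if __name__ == "__main__":
    # Prefer the CUDA 10.0 toolkit; fall back to the default location.
    chosen = configure_openmm_compiler(
        ["/usr/local/cuda-10.0", "/usr/local/cuda"]
    )
    print("OPENMM_CUDA_COMPILER =", chosen)
    # Import simtk.openmm only after this point.
```

Listing the newer toolkit first means a node that has CUDA 10.0 uses it, while a node that only has /usr/local/cuda keeps using that one. The sketch does not check that the chosen nvcc matches the CUDA version the OpenMM package was built against.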

Re: problem with Installation of OpenMM 7.3.1 with Cuda 9.2 and 10.0 for heterogeneous cluster

Posted: Mon May 20, 2019 9:30 am
by jht0664
It works. Thank you for your comment!