The functionality of OpenMM will (eventually) include everything one would need to run modern molecular simulations.
-
Istvan Kolossvary
- Posts: 34
- Joined: Fri Jul 20, 2018 1:48 pm
Post by Istvan Kolossvary » Thu Jan 30, 2020 12:10 pm
I am trying to install OpenMM with conda on a machine that has multiple CUDA installations in /usr/local and /usr/local/cuda is a symlink.
Code: Select all
lrwxrwxrwx 1 root root 19 Sep 11 13:57 cuda -> /usr/local/cuda-8.0/
drwxr-xr-x 19 root root 4096 Sep 11 13:45 cuda-10.0/
drwxr-xr-x 17 root root 4096 Jun 5 2017 cuda-8.0/
drwxr-xr-x 18 root root 4096 Jan 8 2019 cuda-9.1/
The /usr/local directory is read-only for me, so I cannot change the symlink. When I install OpenMM with conda via
Code: Select all
conda install -c omnia/label/cuda100 -c conda-forge openmm
the GPU platform won't work because it tries to link with CUDA 8.0 libraries. Is there a way to modify the conda command to tell it explicitly to use the /usr/local/cuda-10.0 directory instead of the default /usr/local/cuda?
Thanks,
Istvan
-
Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Post by Peter Eastman » Thu Jan 30, 2020 12:18 pm
There are two distinct issues. The first is which libraries OpenMM links to. If it's finding the libraries under /usr/local/cuda, you probably have LD_LIBRARY_PATH set to include that directory. Check the current value with
Code: Select all
echo $LD_LIBRARY_PATH
Then you can set it to an appropriate value with, for example
Code: Select all
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64
The second issue is that OpenMM needs to find the CUDA compiler at runtime. You can use the OPENMM_CUDA_COMPILER environment variable to control that:
Code: Select all
export OPENMM_CUDA_COMPILER=/usr/local/cuda-10.0/bin/nvcc
-
Istvan Kolossvary
- Posts: 34
- Joined: Fri Jul 20, 2018 1:48 pm
Post by Istvan Kolossvary » Thu Jan 30, 2020 12:45 pm
I had already tried this, but it still shows the error message below. My guess was that CUDA 8.0 was still being referenced somehow. Does OpenMM link to CUDA only at runtime? And how do I know that the anaconda environment where I installed OpenMM sees the values of my Linux environment variables?
Code: Select all
$ python -m simtk.testInstallation
OpenMM Version: 7.4.1
Git Revision: 068f120206160d5151c9af0baf810384bba8d052
There are 4 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Error computing forces with CUDA platform
4 OpenCL - Successfully computed forces
CUDA platform error: Error loading CUDA module: CUDA_ERROR_INVALID_PTX (218)
Median difference in forces between platforms:
Reference vs. CPU: 6.29717e-06
Reference vs. OpenCL: 6.75312e-06
CPU vs. OpenCL: 8.07169e-07
All differences are within tolerance.
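To answer the environment-variable question directly: a conda environment normally inherits the shell's environment, and you can verify what the interpreter actually sees with a short check (the variable names are the ones discussed above):

```python
import os

# Print the variables OpenMM cares about, as this interpreter sees them.
# A value of None means the variable never reached the Python process.
for name in ("OPENMM_CUDA_COMPILER", "LD_LIBRARY_PATH", "PATH"):
    print(name, "=", os.environ.get(name))
```

Run this inside the activated environment; if OPENMM_CUDA_COMPILER prints as None there, OpenMM will not see it either.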
-
Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Post by Peter Eastman » Thu Jan 30, 2020 1:02 pm
It's possible this is just caused by a cached kernel that was compiled on an earlier run with the 8.0 compiler. OpenMM saves compiled kernels in /tmp to speed up context creation. They all get deleted when you reboot, or you can delete them yourself. Look in /tmp, and you should see a lot of files whose names are a 20 character hash, followed by the GPU's compute capability and either _32 or _64.
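As a sketch, those cached files could be located programmatically like this; the filename pattern (a 20-character hash, then compute capability, then _32 or _64) is taken from the description above and may vary between OpenMM versions:

```python
import os
import re

# Matches names like "<20-character hash>_70_64" (pattern assumed from
# the description above: hash, compute capability, pointer width).
CACHE_NAME = re.compile(r"^\w{20}_\d+_(32|64)$")

def cached_kernels(directory="/tmp"):
    """Return paths in `directory` that look like OpenMM's kernel cache."""
    return [os.path.join(directory, name)
            for name in os.listdir(directory)
            if CACHE_NAME.match(name)]

# Removing them forces a recompile with whatever nvcc OpenMM finds now:
# for path in cached_kernels():
#     os.remove(path)
```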
-
Istvan Kolossvary
- Posts: 34
- Joined: Fri Jul 20, 2018 1:48 pm
Post by Istvan Kolossvary » Thu Jan 30, 2020 1:51 pm
The machine is a cluster front end, so I can't reboot it, and I couldn't find any compiled kernels in /tmp. I tried starting with a clean slate: I removed the entire anaconda3 directory, opened a new console, and then ran
Code: Select all
$ export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64
$ export PATH=/usr/local/cuda-10.0/bin/:$PATH
$ export OPENMM_CUDA_COMPILER=/usr/local/cuda-10.0/bin/nvcc
$ bash Anaconda3-2019.10-Linux-x86_64.sh
$ source .bashrc
# make sure envars are still set
(base)$ printenv | grep OPENMM
(base)$ printenv | grep PATH
# install OpenMM
(base)$ conda install -c omnia/label/cuda100 -c conda-forge openmm
(base)$ python -m simtk.testInstallation
OpenMM Version: 7.4.1
Git Revision: 068f120206160d5151c9af0baf810384bba8d052
There are 4 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Error computing forces with CUDA platform
4 OpenCL - Successfully computed forces
CUDA platform error: Error loading CUDA module: CUDA_ERROR_INVALID_PTX (218)
Median difference in forces between platforms:
Reference vs. CPU: 6.29856e-06
Reference vs. OpenCL: 6.75312e-06
CPU vs. OpenCL: 8.10363e-07
All differences are within tolerance.
-
Peter Eastman
- Posts: 2583
- Joined: Thu Aug 09, 2007 1:25 pm
Post by Peter Eastman » Thu Jan 30, 2020 1:59 pm
Reinstalling wouldn't have any effect on the cached files. Is the environment variable TMPDIR set? If so, that's the directory where they're being created. If not, try setting it to point to a directory you create inside your home directory.
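For example, a minimal sketch of that suggestion (the directory name here is arbitrary):

```shell
# Create a cache directory you control and point TMPDIR at it,
# so newly compiled kernels land there instead of /tmp.
mkdir -p "$HOME/openmm-cache"
export TMPDIR="$HOME/openmm-cache"
```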
Of course there's another option. We provide prebuilt OpenMM libraries for all CUDA releases since 7.5. If the cluster administrators really want you to be using 8.0, you could just install that one by specifying cuda80 instead of cuda100.
-
Istvan Kolossvary
- Posts: 34
- Joined: Fri Jul 20, 2018 1:48 pm
Post by Istvan Kolossvary » Thu Jan 30, 2020 2:12 pm
TMPDIR was not set, and setting it to ~/tmp didn't make any difference. CUDA 8.0 will go away from the cluster soon, but there will be multiple CUDA installations available for different applications, and there is no guarantee that /usr/local/cuda will point to CUDA 10.0 or CUDA 10.1. I would like to use OpenMM with the latest CUDA, though, and that may not be the default on the cluster.
-
Istvan Kolossvary
- Posts: 34
- Joined: Fri Jul 20, 2018 1:48 pm
Post by Istvan Kolossvary » Thu Jan 30, 2020 3:26 pm
I wonder if this might be a CUDA driver issue. The cluster has a rather old driver, version 390.87.
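For what it's worth, a driver that old would be consistent with the error: PTX compiled by a newer toolkit than the driver's JIT understands is a common cause of CUDA_ERROR_INVALID_PTX, and NVIDIA's compatibility tables list a 410-series driver as the minimum for CUDA 10.0 on Linux (410.48 is the commonly cited number; treat it as an assumption to check against the release notes). A sketch of the comparison:

```python
# Compare dotted driver-version strings numerically.
# 410.48 as the CUDA 10.0 minimum is an assumption taken from
# NVIDIA's compatibility tables; verify against the release notes.
def meets_minimum(driver: str, minimum: str) -> bool:
    as_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return as_tuple(driver) >= as_tuple(minimum)

print(meets_minimum("390.87", "410.48"))  # the cluster's driver -> False
```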