OpenMM segfault

Posted: Mon Apr 08, 2013 6:26 am
by silvio
Good morning,

I am trying to run OpenMM on my CPU with OpenCL (I plan to use it on a CPU cluster). I have an Intel Xeon E5440 @ 2.83GHz in a Dell Precision T5400. I downloaded and installed the AMD-APP-SDK version 2.8, and after that the OpenMM 5.0.1 binary distribution. Both installations went through fine.

When running testInstallation.py, this is the output I get:

Code:

There are 2 Platforms available:

1 Reference - Successfully computed forces
Setting of real/effective user Id to 0/0 failed
FATAL: Module fglrx not found.
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
segmentation fault (core dumped)
So it segfaults right after the fglrx messages. I also ran a few tests from the AMD-APP-SDK, and they ran fine.

I then tried to compile and run helloArgon.cpp with gdb, and this is the backtrace:

Code:

GNU gdb (GDB) 7.5-ubuntu
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/XXX/src/OpenMM5.0.1-Linux64/examples/a.out...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/XXX/src/OpenMM5.0.1-Linux64/examples/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Setting of real/effective user Id to 0/0 failed
FATAL: Module fglrx not found.
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
[New Thread 0x7ffff7ff0700 (LWP 27566)]
[New Thread 0x7ffff23f4700 (LWP 27567)]
[New Thread 0x7ffff1be1700 (LWP 27568)]
[New Thread 0x7ffff13ce700 (LWP 27569)]
[New Thread 0x7ffff0bbb700 (LWP 27570)]
[New Thread 0x7fffdbfff700 (LWP 27571)]
[New Thread 0x7fffdb7ec700 (LWP 27572)]
[New Thread 0x7fffdafd9700 (LWP 27573)]
[New Thread 0x7fffda7c6700 (LWP 27574)]
[New Thread 0x7fffd9fb3700 (LWP 27577)]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6e16895 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
(gdb) bt
#0  0x00007ffff6e16895 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#1  0x00007ffff656ff05 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#2  0x00007ffff658db44 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#3  0x00007ffff658eff0 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#4  0x00007ffff659197d in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#5  0x00007ffff6627502 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#6  0x00007ffff66289f5 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#7  0x00007ffff662e38d in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#8  0x00007ffff662f395 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#9  0x00007ffff6e04789 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#10 0x00007ffff6e048c5 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#11 0x00007ffff6e04ac6 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#12 0x00007ffff5fe72c8 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#13 0x00007ffff5fe7519 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#14 0x00007ffff5fec8cf in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#15 0x00007ffff5feeb88 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#16 0x00007ffff5fc2a24 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#17 0x00007ffff59fa540 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#18 0x00007ffff59fafe2 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#19 0x00007ffff59d05f5 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#20 0x00007ffff59e28a0 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#21 0x00007ffff59c89d7 in clBuildProgram () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#22 0x00007ffff3d7328f in OpenMM::OpenCLContext::createProgram(std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, char const*) () from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
#23 0x00007ffff3da3bfd in OpenMM::OpenCLNonbondedUtilities::initialize(OpenMM::System const&) () from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
#24 0x00007ffff3d7122d in OpenMM::OpenCLContext::initialize() () from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
#25 0x00007ffff3e34120 in OpenMM::OpenCLPlatform::PlatformData::initializeContexts(OpenMM::System const&) ()
    from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
#26 0x00007ffff3dbae91 in OpenMM::OpenCLIntegrateVerletStepKernel::initialize(OpenMM::System const&, OpenMM::VerletIntegrator const&) ()
    from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
#27 0x00007ffff53705f5 in OpenMM::VerletIntegrator::initialize(OpenMM::ContextImpl&) () from /usr/local/openmm/lib/libOpenMM.so
#28 0x00007ffff53a1941 in OpenMM::ContextImpl::ContextImpl(OpenMM::Context&, OpenMM::System&, OpenMM::Integrator&, OpenMM::Platform*, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&) () from /usr/local/openmm/lib/libOpenMM.so
#29 0x00007ffff5390a19 in OpenMM::Context::Context(OpenMM::System&, OpenMM::Integrator&) () from /usr/local/openmm/lib/libOpenMM.so
#30 0x000000000040263c in myInitializeOpenMM(int, double, double, double, std::string&) ()
#31 0x0000000000401c93 in main ()

Considering that the examples from the AMD-APP-SDK work, I think there must be some clash between OpenMM and the AMD-APP-SDK, but I don't know how to fix it, and I know of no alternative way to use OpenMM on a CPU (apart from the Reference platform).

Kind regards

Silvio a Beccara
FBK Foundation
Trento, Italy

Re: OpenMM segfault

Posted: Mon Apr 08, 2013 11:13 am
by peastman
fglrx is the AMD graphics driver, so it's complaining you don't have that installed. I didn't think it was required for running on the CPU, but perhaps that's changed? Anyway, try installing it and see if that fixes it.

OpenMM 5.1 will also support Intel's OpenCL, so that will be another way to run on the CPU.

Peter

Re: OpenMM segfault

Posted: Tue Apr 09, 2013 1:05 am
by silvio
Hi Peter,
peastman wrote:fglrx is the AMD graphics driver, so it's complaining you don't have that installed.
Actually, OpenMM complains about fglrx even when I use my NVIDIA Quadro graphics card with OpenCL, but the problem is not there when I run AMD's tests.

I also tried to install the fglrx module, but this is what I get:

Code:

FATAL: Error inserting fglrx (/lib/modules/3.2.0-39-generic/updates/dkms/fglrx.ko): Operation not permitted
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
UNREACHABLE executed!
and if I try sudo, the path to the library is not there anymore:

Code:

Failed to import OpenMM packages: libOpenMM.so: cannot open shared object file: No such file or directory
Make sure OpenMM is installed and the library path is set correctly.
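
A possible explanation, stated as an assumption rather than a confirmed diagnosis: sudo usually runs commands with a reset environment, so an LD_LIBRARY_PATH set in the user's shell may not be passed on. The sketch below only checks which directories on a search path contain a given library. The helper name find_library is made up for illustration, and the dynamic loader also consults other locations (for example the ldconfig cache) that this does not cover.

```python
import os


def find_library(libname, search_path):
    """Return the directories in search_path that contain a file named libname.

    search_path uses the platform separator (':' on Linux), like LD_LIBRARY_PATH.
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, libname)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH =", repr(path))
    print("libOpenMM.so found in:", find_library("libOpenMM.so", path))
```

Running it once normally and once under sudo would show whether the two environments see the same search path.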

peastman wrote:OpenMM 5.1 will also support Intel's OpenCL, so that will be another way to run on the CPU.
When is OpenMM 5.1 going to be released?

Thanks

Silvio

Re: OpenMM segfault

Posted: Tue Apr 09, 2013 10:24 am
by peastman
silvio wrote:Actually, OpenMM complains about fglrx even when I use my NVIDIA Quadro graphics card with OpenCL
If it's complaining about fglrx, then it's definitely not using an NVIDIA GPU. How are you telling it which one to use? If you have both AMD and NVIDIA versions of OpenCL installed, they'll show up as different "OpenCL platforms", not to be confused with OpenMM platforms. When creating your Context, use the OpenCLPlatformIndex property to tell it which one to use.
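
A minimal sketch of passing that property from Python, not a tested fix for this crash. It assumes the Python wrappers are importable as simtk.openmm, that a System and an Integrator already exist, and that property values are given as strings. The helper names opencl_properties and make_context are hypothetical, and which index corresponds to which vendor depends on the installation.

```python
def opencl_properties(platform_index):
    """Return the property map that selects an OpenCL platform by index.

    Platform properties are string-to-string, so the index is converted.
    """
    if platform_index < 0:
        raise ValueError("platform_index must be non-negative")
    return {"OpenCLPlatformIndex": str(platform_index)}


def make_context(mm, system, integrator, platform_index):
    """Create a Context on the OpenMM "OpenCL" platform.

    mm is the imported OpenMM module (assumed: import simtk.openmm as mm).
    """
    platform = mm.Platform.getPlatformByName("OpenCL")
    return mm.Context(system, integrator, platform,
                      opencl_properties(platform_index))
```

With two OpenCL implementations installed, calling make_context with index 0 and then with index 1 would be expected to select different vendors.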
silvio wrote:and if I try sudo, the path to the library is not there anymore:
You only need to use sudo when installing the driver. Running OpenMM can be done normally.
silvio wrote:When is OpenMM 5.1 going to be released?
We hope to have a beta out quite soon, within the next week or two.

Peter

Re: OpenMM segfault

Posted: Tue Apr 09, 2013 11:51 am
by silvio
peastman wrote:If it's complaining about fglrx, then it's definitely not using an NVIDIA GPU. How are you telling it which one to use?
To see which one is in use, I run perf top while the program is running. The NVIDIA libOpenCL.so.1 shows up.
peastman wrote:If you have both AMD and NVIDIA versions of OpenCL installed, they'll show up as different "OpenCL platforms"
Only one shows up.
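
One way to check what the OpenCL loader can see, offered as a sketch rather than a diagnosis: on Linux the ICD loader typically discovers vendor implementations through small .icd files, conventionally under /etc/OpenCL/vendors, each naming one vendor library. The default directory and the helper name list_icd_entries are assumptions for illustration; a given installation may register its libraries elsewhere or not at all.

```python
import os


def list_icd_entries(vendor_dir="/etc/OpenCL/vendors"):
    """Return sorted (icd_file, library) pairs found in vendor_dir.

    Each .icd file is expected to hold the name or path of one vendor
    OpenCL library. A missing directory yields an empty list.
    """
    entries = []
    if not os.path.isdir(vendor_dir):
        return entries
    for name in sorted(os.listdir(vendor_dir)):
        if not name.endswith(".icd"):
            continue
        with open(os.path.join(vendor_dir, name)) as handle:
            library = handle.read().strip()
        entries.append((name, library))
    return entries


if __name__ == "__main__":
    for icd_file, library in list_icd_entries():
        print(icd_file, "->", library)
```

If only one entry is listed, that would be consistent with only one OpenCL platform showing up.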
peastman wrote:When creating your Context, use the OpenCLPlatformIndex property to tell it which one to use.
I tried it, but OpenMM seems to ignore it. Only if I set LD_PRELOAD=<path to AMD libOpenCL> does the AMD library get loaded, but then OpenMM crashes. Would it be possible for you to try and reproduce the error by running a Python example on a CPU OpenCL platform with the AMD libraries?
peastman wrote:You only need to use sudo when installing the driver. Running OpenMM can be done normally.
I checked dmesg, and the reason the driver doesn't get loaded is that I don't have a compatible AMD GPU. But, as I said, the AMD examples run fine with their library, despite the fglrx "error".
peastman wrote:We hope to have a beta out quite soon, within the next week or two.
That would be really nice, especially considering the performance increase announced for the next release.

Silvio

Re: OpenMM segfault

Posted: Tue Apr 09, 2013 3:27 pm
by cjryan
I'm hoping to do the same thing (use CPU multithreading via OpenCL in the AMD APP SDK) and am getting very similar errors. Also, I have noticed that the following error message, which appears when selecting the OpenCL-CPU platform:

Code:

FATAL: Module fglrx not found.
Error! Fail to load fglrx kernel module! Maybe you can switch to root user 
to load kernel module directly
can be due to a known issue and, if so, would not interfere with the execution of the program (see the last bullet point here: http://developer.amd.com/wordpress/medi ... amples.pdf). So it seems the segfault that happens afterward could be unrelated.

In case it helps, the output of testInstallation.py for me is:

Code:

There are 2 Platforms available:

1 Reference - Successfully computed forces
Setting of real/effective user Id to 0/0 failed
FATAL: Module fglrx not found.
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
Segmentation fault
and the GDB output (with backtrace) of this is:

Code:

(gdb) run testInstallation.py 
Starting program: /global/home/users/cjryan/epd/epd-7.3-2-rh3-x86_64/bin/python testInstallation.py
[Thread debugging using libthread_db enabled]
There are 2 Platforms available:

1 Reference - Successfully computed forces
Detaching after fork from child process 56135.
Setting of real/effective user Id to 0/0 failed
FATAL: Module fglrx not found.
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
[New Thread 0x2aaab2797700 (LWP 56136)]
[New Thread 0x2aaabc812700 (LWP 56137)]
[New Thread 0x2aaabd025700 (LWP 56138)]
[New Thread 0x2aaac8812700 (LWP 56139)]
[New Thread 0x2aaac9025700 (LWP 56140)]
[New Thread 0x2aaad4812700 (LWP 56141)]
[New Thread 0x2aaad5025700 (LWP 56142)]
[New Thread 0x2aaae0812700 (LWP 56143)]
[New Thread 0x2aaae1025700 (LWP 56144)]
[New Thread 0x2aaaec812700 (LWP 56145)]
[New Thread 0x2aaaed025700 (LWP 56146)]
[New Thread 0x2aaaf8812700 (LWP 56147)]
[New Thread 0x2aaaf9025700 (LWP 56148)]
[New Thread 0x2aab04812700 (LWP 56149)]
[New Thread 0x2aab05025700 (LWP 56150)]
[New Thread 0x2aab10812700 (LWP 56151)]
[New Thread 0x2aab11025700 (LWP 56152)]
[New Thread 0x2aab1c812700 (LWP 56153)]
[New Thread 0x2aab1d025700 (LWP 56154)]
[New Thread 0x2aab28812700 (LWP 56155)]
[New Thread 0x2aab29025700 (LWP 56156)]
[New Thread 0x2aab34812700 (LWP 56157)]
[New Thread 0x2aab35025700 (LWP 56158)]
[New Thread 0x2aab40812700 (LWP 56159)]
[New Thread 0x2aab41025700 (LWP 56160)]
[New Thread 0x2aab4c812700 (LWP 56161)]
[New Thread 0x2aab4d025700 (LWP 56162)]
[New Thread 0x2aab58812700 (LWP 56163)]
[New Thread 0x2aab59025700 (LWP 56164)]
[New Thread 0x2aab64812700 (LWP 56165)]
[New Thread 0x2aab65025700 (LWP 56166)]
[New Thread 0x2aab70812700 (LWP 56167)]
[New Thread 0x2aab71025700 (LWP 56168)]
[New Thread 0x2aab7c812700 (LWP 56169)]
[New Thread 0x2aab7d025700 (LWP 56170)]
[New Thread 0x2aab88812700 (LWP 56171)]
[New Thread 0x2aab89025700 (LWP 56172)]
[New Thread 0x2aab94812700 (LWP 56173)]
[New Thread 0x2aab95025700 (LWP 56174)]
[New Thread 0x2aaba0812700 (LWP 56175)]
[New Thread 0x2aaba1025700 (LWP 56176)]
[New Thread 0x2aabac812700 (LWP 56177)]
[New Thread 0x2aabad025700 (LWP 56178)]
[New Thread 0x2aabb8812700 (LWP 56179)]
[New Thread 0x2aabb9025700 (LWP 56180)]
[New Thread 0x2aabc4812700 (LWP 56181)]
[New Thread 0x2aabc5025700 (LWP 56182)]
[New Thread 0x2aabd0812700 (LWP 56183)]
[New Thread 0x2aabd1025700 (LWP 56184)]
[New Thread 0x2aabdc812700 (LWP 56185)]
[New Thread 0x2aabdd025700 (LWP 56186)]
[New Thread 0x2aabe8812700 (LWP 56187)]
[New Thread 0x2aabe9025700 (LWP 56188)]
[New Thread 0x2aabf4812700 (LWP 56189)]
[New Thread 0x2aabf5025700 (LWP 56190)]
[New Thread 0x2aac00812700 (LWP 56191)]
[New Thread 0x2aac01025700 (LWP 56192)]
[New Thread 0x2aac0c812700 (LWP 56193)]
[New Thread 0x2aac0d025700 (LWP 56194)]
[New Thread 0x2aac18812700 (LWP 56195)]
[New Thread 0x2aac19025700 (LWP 56196)]
[New Thread 0x2aac24812700 (LWP 56197)]
[New Thread 0x2aac25025700 (LWP 56198)]
[New Thread 0x2aac30812700 (LWP 56199)]
[New Thread 0x2aac31025700 (LWP 56200)]
Detaching after fork from child process 56201.
Detaching after fork from child process 56202.
[New Thread 0x2aac3c405700 (LWP 56203)]
Detaching after fork from child process 56204.
Detaching after fork from child process 56205.
Detaching after fork from child process 56206.
Detaching after fork from child process 56207.
Detaching after fork from child process 56208.
Detaching after fork from child process 56209.
Detaching after fork from child process 56210.
Detaching after fork from child process 56211.
Detaching after fork from child process 56212.
Detaching after fork from child process 56213.
Detaching after fork from child process 56214.
Detaching after fork from child process 56215.
Detaching after fork from child process 56216.
Detaching after fork from child process 56217.
Detaching after fork from child process 56218.
Detaching after fork from child process 56219.
Detaching after fork from child process 56220.
Detaching after fork from child process 56221.
Detaching after fork from child process 56222.
Detaching after fork from child process 56223.
Detaching after fork from child process 56224.
Detaching after fork from child process 56225.

Program received signal SIGSEGV, Segmentation fault.
0x00002aaab5ccc895 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.5.x86_64 libX11-1.3-2.el6.x86_64 libXau-1.0.5-1.el6.x86_64 libXext-1.1-3.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 libxcb-1.5-1.el6.x86_64
(gdb) bt
#0  0x00002aaab5ccc895 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#1  0x00002aaab5425f05 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#2  0x00002aaab5443b44 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#3  0x00002aaab5444ff0 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#4  0x00002aaab544797d in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#5  0x00002aaab54dd502 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#6  0x00002aaab54de9f5 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#7  0x00002aaab54e438d in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#8  0x00002aaab54e5395 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#9  0x00002aaab5cba789 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#10 0x00002aaab5cba8c5 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#11 0x00002aaab5cbaac6 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#12 0x00002aaab4e9d2c8 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#13 0x00002aaab4e9d519 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#14 0x00002aaab4ea28cf in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#15 0x00002aaab4ea4b88 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#16 0x00002aaab4e78a24 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#17 0x00002aaab48b0540 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#18 0x00002aaab48b0fe2 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#19 0x00002aaab48865f5 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#20 0x00002aaab48988a0 in ?? ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#21 0x00002aaab487e9d7 in clBuildProgram ()
   from /global/home/users/cjryan/ksong/AMDAPP/lib/x86_64/libamdocl64.so
#22 0x00002aaab16447d2 in OpenMM::OpenCLContext::createProgram(std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, char const*) ()
   from /global/home/users/cjryan/openmm/lib/plugins/libOpenMMOpenCL.so
#23 0x00002aaab15b54a1 in OpenMM::OpenCLNonbondedUtilities::initialize(OpenMM::System const&) ()
   from /global/home/users/cjryan/openmm/lib/plugins/libOpenMMOpenCL.so
#24 0x00002aaab16432a6 in OpenMM::OpenCLContext::initialize() ()
   from /global/home/users/cjryan/openmm/lib/plugins/libOpenMMOpenCL.so
#25 0x00002aaab1652810 in OpenMM::OpenCLPlatform::PlatformData::initializeContexts(OpenMM::System const&) ()
   from /global/home/users/cjryan/openmm/lib/plugins/libOpenMMOpenCL.so
#26 0x00002aaab15d4b8c in OpenMM::OpenCLIntegrateLangevinStepKernel::initialize(OpenMM::System const&, OpenMM::LangevinIntegrator const&) ()
   from /global/home/users/cjryan/openmm/lib/plugins/libOpenMMOpenCL.so
#27 0x00002aaaac4695c6 in OpenMM::LangevinIntegrator::initialize(OpenMM::ContextImpl&) () from /global/home/users/cjryan/openmm/lib/libOpenMM.so
#28 0x00002aaaac48c707 in OpenMM::ContextImpl::ContextImpl(OpenMM::Context&, OpenMM::System&, OpenMM::Integrator&, OpenMM::Platform*, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)
    () from /global/home/users/cjryan/openmm/lib/libOpenMM.so
#29 0x00002aaaac4a0c99 in OpenMM::Context::Context(OpenMM::System&, OpenMM::Integrator&, OpenMM::Platform&) ()
   from /global/home/users/cjryan/openmm/lib/libOpenMM.so
#30 0x00002aaaac11e4a6 in _wrap_new_Context__SWIG_1 (
    self=<value optimized out>, args=<value optimized out>)
    at src/swig_doxygen/OpenMMSwig.cxx:33875
#31 _wrap_new_Context (self=<value optimized out>, args=<value optimized out>)
    at src/swig_doxygen/OpenMMSwig.cxx:35133
#32 0x00002aaaaada514b in PyEval_EvalFrameEx (f=0x1881540, throwflag=5247136)
    at Python/ceval.c:4331
#33 0x00002aaaaada7324 in PyEval_EvalCodeEx (co=0x2aaaabfb9a30, 
    globals=0xfffffff8, locals=0x2aaab52cd760, args=0x36df220, argcount=4, 
    kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3253
#34 0x00002aaaaad39c58 in function_call (func=0xb9c398, arg=0x36df208, kw=0x0)
    at Objects/funcobject.c:526
#35 0x00002aaaaad15cfd in PyObject_Call (func=0xb9c398, arg=0x36df208, kw=0x0)
    at Objects/abstract.c:2529
#36 0x00002aaaaad25094 in instancemethod_call (func=0xb9c398, arg=0x36df208, 
    kw=0x0) at Objects/classobject.c:2578
#37 0x00002aaaaad15cfd in PyObject_Call (func=0x16e1280, arg=0x3be0c80, kw=0x0)
    at Objects/abstract.c:2529
#38 0x00002aaaaad69a97 in slot_tp_init (self=0x7fffffff8cd0, args=0x3be0c80, 
    kwds=0x0) at Objects/typeobject.c:5663
#39 0x00002aaaaad62a3b in type_call (type=0xd475e0, args=0x3be0c80, kwds=0x0)
    at Objects/typeobject.c:735
#40 0x00002aaaaad15cfd in PyObject_Call (func=0xd475e0, arg=0x3be0c80, kw=0x0)
    at Objects/abstract.c:2529
#41 0x00002aaaaada481e in PyEval_EvalFrameEx (f=0x1eb3f90, throwflag=32194880)
    at Python/ceval.c:4239
#42 0x00002aaaaada7324 in PyEval_EvalCodeEx (co=0xd8fdb0, globals=0xfffffff8, 
    locals=0x2aaab52cd760, args=0x4, argcount=5, kws=0x0, kwcount=0, 
    defs=0xd9c9f8, defcount=2, closure=0x0) at Python/ceval.c:3253
#43 0x00002aaaaad39c58 in function_call (func=0xe62b90, arg=0xf29d70, kw=0x0)
    at Objects/funcobject.c:526
#44 0x00002aaaaad15cfd in PyObject_Call (func=0xe62b90, arg=0xf29d70, kw=0x0)
    at Objects/abstract.c:2529
#45 0x00002aaaaad25094 in instancemethod_call (func=0xe62b90, arg=0xf29d70, 
    kw=0x0) at Objects/classobject.c:2578
#46 0x00002aaaaad15cfd in PyObject_Call (func=0x16e1140, arg=0x3be8158, kw=0x0)
    at Objects/abstract.c:2529
#47 0x00002aaaaad69a97 in slot_tp_init (self=0x7fffffff8cd0, args=0x3be8158, 
    kwds=0x0) at Objects/typeobject.c:5663
#48 0x00002aaaaad62a3b in type_call (type=0xe8d2b0, args=0x3be8158, kwds=0x0)
    at Objects/typeobject.c:735
#49 0x00002aaaaad15cfd in PyObject_Call (func=0xe8d2b0, arg=0x3be8158, kw=0x0)
    at Objects/abstract.c:2529
#50 0x00002aaaaada481e in PyEval_EvalFrameEx (f=0x5c97e0, throwflag=6068584)
    at Python/ceval.c:4239
#51 0x00002aaaaada7324 in PyEval_EvalCodeEx (co=0x2aaaabcbaf30, 
    globals=0xfffffff8, locals=0x2aaab52cd760, args=0x0, argcount=0, kws=0x0, 
    kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253
#52 0x00002aaaaada7522 in PyEval_EvalCode (co=0x7fffffff8cd0, 
    globals=0xfffffff8, locals=0x2aaab52cd760) at Python/ceval.c:667
#53 0x00002aaaaadc029c in run_mod (mod=0x7fffffff8cd0, 
    filename=0xfffffff8 <Address 0xfffffff8 out of bounds>, globals=0x540280, 
    locals=0x540280, flags=0x2aaab52cd760, arena=0x0)
    at Python/pythonrun.c:1353
#54 0x00002aaaaadc08fc in PyRun_FileExFlags (fp=0x5a6ed0, 
    filename=0x7fffffffe0ba "testInstallation.py", start=6149536, 
    globals=0x540280, locals=0x540280, closeit=1, flags=0x7fffffffda4c)
    at Python/pythonrun.c:1339
#55 0x00002aaaaadc1811 in PyRun_SimpleFileExFlags (fp=0x5a6ed0, 
    filename=0x7fffffffe0ba "testInstallation.py", closeit=1, 
    flags=0x7fffffffda4c) at Python/pythonrun.c:943
#56 0x00002aaaaadd1f06 in Py_Main (argc=1, argv=0x7fffffffdcf8)
    at Modules/main.c:729
#57 0x00002aaaab84dcdd in __libc_start_main () from /lib64/libc.so.6
#58 0x00000000004008ea in _start ()
I should also mention that, since the AMD APP SDK cannot typically be installed as a non-root user, a cluster support team member helped me rewrite AMD's installation script to install inside my home directory. Example programs from this SDK seem to run correctly, but perhaps OpenMM does not work well with such a non-standard configuration in my case.

Re: OpenMM segfault

Posted: Thu Apr 11, 2013 6:05 am
by silvio
Dear Christopher,

Thank you for your contribution. So it really seems that the segfault is due to some conflict between OpenMM and the AMD SDK. Looking through the gdb output, it appears that OpenMM is not able to build the OpenCL program, since the crash occurs inside clBuildProgram:

Code:

#21 0x00007ffff59c89d7 in clBuildProgram () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#22 0x00007fffeee4b28f in OpenMM::OpenCLContext::createProgram(std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, char const*) ()
   from /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so
Have you made any progress with using OpenMM with the AMD CPU platform?
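
A small helper for reading traces like the ones in this thread, given as a sketch: it scans gdb output for the innermost frame that has a symbol name. The function first_named_frame and its regular expression are illustrative, cover only the frame layouts seen here, and say nothing about why clBuildProgram crashes.

```python
import re

# Matches frames such as
#   "#21 0x00007ffff59c89d7 in clBuildProgram () from /path/lib.so"
#   "#31 _wrap_new_Context (self=..., args=...)"
# and captures the frame number and the symbol up to the first "(" or space.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+)")


def first_named_frame(backtrace):
    """Return (frame_number, symbol) for the innermost frame with a known symbol.

    Frames that gdb prints as "??" have no symbol and are skipped.
    Returns None when no named frame is found.
    """
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(2) != "??":
            return int(match.group(1)), match.group(2)
    return None


if __name__ == "__main__":
    sample = (
        "#0  0x00007ffff6e16895 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so\n"
        "#21 0x00007ffff59c89d7 in clBuildProgram () from /opt/AMDAPP/lib/x86_64/libamdocl64.so\n"
    )
    print(first_named_frame(sample))  # (21, 'clBuildProgram')
```

Applied to the backtraces posted so far, the first named frame is clBuildProgram in each case, with all deeper frames inside libamdocl64.so.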

Silvio

Re: OpenMM segfault

Posted: Mon Apr 15, 2013 12:30 pm
by cjryan
Hi Silvio,

In addition to the errors I mention above when using CPU-multithreading in OpenMM/OpenCL(AMD) on a cluster, I also get very similar errors (perhaps identical) to yours when doing this on my desktop. Unlike the configuration I've set up on the cluster, the installation of AMD APP SDK and OpenMM, etc, is fairly standard on this machine.

The output of testInstallation.py is:

Code:

$ python testInstallation.py 
There are 3 Platforms available:

1 Reference - Successfully computed forces
Setting of real/effective user Id to 0/0 failed
FATAL: Module fglrx not found.
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
Segmentation fault (core dumped)
and the gdb output of this, with backtrace, is:

Code:

$ gdb python
(gdb) r testInstallation.py
Starting program: /usr/bin/python testInstallation.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
There are 3 Platforms available:

1 Reference - Successfully computed forces
Setting of real/effective user Id to 0/0 failed
FATAL: Module fglrx not found.
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly
[New Thread 0x7fffed622700 (LWP 8250)]
[New Thread 0x7fffe9c5d700 (LWP 8251)]
[New Thread 0x7fffe944a700 (LWP 8252)]
[New Thread 0x7fffe8c37700 (LWP 8253)]
[New Thread 0x7fffdbfff700 (LWP 8254)]
[New Thread 0x7fffdb7ec700 (LWP 8257)]
[New Thread 0x7fffed3dc700 (LWP 8266)]
[New Thread 0x7fffdabe7700 (LWP 8267)]
[New Thread 0x7fffda3d4700 (LWP 8268)]
[New Thread 0x7fffd9bc1700 (LWP 8269)]
[New Thread 0x7fffd93ae700 (LWP 8270)]
[New Thread 0x7fffd8996700 (LWP 8273)]

Program received signal SIGSEGV, Segmentation fault.
0x00007fffeb3cf895 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
(gdb) bt
#0  0x00007fffeb3cf895 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#1  0x00007fffeab28f05 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#2  0x00007fffeab46b44 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#3  0x00007fffeab47ff0 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#4  0x00007fffeab4a97d in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#5  0x00007fffeabe0502 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#6  0x00007fffeabe19f5 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#7  0x00007fffeabe738d in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#8  0x00007fffeabe8395 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#9  0x00007fffeb3bd789 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#10 0x00007fffeb3bd8c5 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#11 0x00007fffeb3bdac6 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#12 0x00007fffea5a02c8 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#13 0x00007fffea5a0519 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#14 0x00007fffea5a58cf in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#15 0x00007fffea5a7b88 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#16 0x00007fffea57ba24 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#17 0x00007fffe9fb3540 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#18 0x00007fffe9fb3fe2 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#19 0x00007fffe9f895f5 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#20 0x00007fffe9f9b8a0 in ?? () from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#21 0x00007fffe9f819d7 in clBuildProgram ()
   from /opt/AMDAPP/lib/x86_64/libamdocl64.so
#22 0x00007ffff237d28f in OpenMM::OpenCLContext::createProgram(std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, char const*) ()
   from /home/christopherryan/OtherPrograms/openmm/lib/plugins/libOpenMMOpenCL.so
#23 0x00007ffff23adbfd in OpenMM::OpenCLNonbondedUtilities::initialize(OpenMM::System const&) ()
   from /home/christopherryan/OtherPrograms/openmm/lib/plugins/libOpenMMOpenCL.so
#24 0x00007ffff237b22d in OpenMM::OpenCLContext::initialize() ()
   from /home/christopherryan/OtherPrograms/openmm/lib/plugins/libOpenMMOpenCL.so
#25 0x00007ffff243e120 in OpenMM::OpenCLPlatform::PlatformData::initializeContexts(OpenMM::System const&) ()
   from /home/christopherryan/OtherPrograms/openmm/lib/plugins/libOpenMMOpenCL.so
#26 0x00007ffff23ca59e in OpenMM::OpenCLIntegrateLangevinStepKernel::initialize(OpenMM::System const&, OpenMM::LangevinIntegrator const&) ()
   from /home/christopherryan/OtherPrograms/openmm/lib/plugins/libOpenMMOpenCL.so
#27 0x00007ffff5c4dfe5 in OpenMM::LangevinIntegrator::initialize(OpenMM::ContextImpl&) () from /home/christopherryan/OtherPrograms/openmm/lib/libOpenMM.so
#28 0x00007ffff5c4a941 in OpenMM::ContextImpl::ContextImpl(OpenMM::Context&, OpenMM::System&, OpenMM::Integrator&, OpenMM::Platform*, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&) ()
   from /home/christopherryan/OtherPrograms/openmm/lib/libOpenMM.so
#29 0x00007ffff5c3a2ce in OpenMM::Context::Context(OpenMM::System&, OpenMM::Integrator&, OpenMM::Platform&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&) ()
   from /home/christopherryan/OtherPrograms/openmm/lib/libOpenMM.so
#30 0x00007ffff601b7af in _wrap_new_Context__SWIG_2 (args=<optimized out>)
    at src/swig_doxygen/OpenMMSwig.cxx:35713
#31 _wrap_new_Context (self=<optimized out>, args=<optimized out>)
    at src/swig_doxygen/OpenMMSwig.cxx:36923
#32 0x0000000000463ea7 in PyEval_EvalFrameEx ()
#33 0x0000000000467209 in PyEval_EvalCodeEx ()
#34 0x00000000004a9fea in ?? ()
#35 0x000000000048249d in ?? ()
#36 0x0000000000491bb4 in ?? ()
#37 0x00000000004aac6e in ?? ()
#38 0x00000000004600be in PyEval_EvalFrameEx ()
#39 0x0000000000467209 in PyEval_EvalCodeEx ()
#40 0x00000000004a9fea in ?? ()
#41 0x000000000048249d in ?? ()
#42 0x0000000000491bb4 in ?? ()
#43 0x00000000004aac6e in ?? ()
#44 0x00000000004600be in PyEval_EvalFrameEx ()
#45 0x0000000000467209 in PyEval_EvalCodeEx ()
#46 0x00000000004d0242 in PyEval_EvalCode ()
#47 0x00000000005102bb in ?? ()
#48 0x000000000044a466 in PyRun_FileExFlags ()
#49 0x000000000044a97a in PyRun_SimpleFileExFlags ()
#50 0x000000000044b6bc in Py_Main ()
#51 0x00007ffff6f0576d in __libc_start_main (main=0x44b77b <main>, argc=2, 
    ubp_av=0x7fffffffdc98, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffdc88) at libc-start.c:226
#52 0x00000000004ce0ad in _start ()
At the moment, I don't have any new thoughts about how to address this.

Chris

Re: OpenMM segfault

Posted: Tue Apr 16, 2013 12:20 am
by silvio
cjryan wrote:Hi Silvio,
In addition to the errors I mention above when using CPU-multithreading in OpenMM/OpenCL(AMD) on a cluster, I also get very similar errors (perhaps identical) to yours when doing this on my desktop. Unlike the configuration I've set up on the cluster, the installation of AMD APP SDK and OpenMM, etc, is fairly standard on this machine.
We tested the combination OpenMM+OpenCL(AMD) on another machine we have here, and we got exactly the error you're reporting.

At this point we have three quite different systems where the combination OpenMM+OpenCL(AMD) does not work.
I hope that Peter and the developers will try to reproduce the error and address this problem. Until OpenMM 5.1 is available to the public, this remains the only way to use OpenMM efficiently on CPUs.

Silvio

Re: OpenMM segfault

Posted: Wed Apr 17, 2013 10:18 am
by peastman
I've asked someone at AMD about this. I'll report back as soon as I have information.

Peter