Simbody3.3.1 make test errors compiled on Scientific Linux 6

Simbody is useful for internal coordinate and coarse grained molecule modeling, large scale mechanical models like skeletons, and anything else that can be modeled as bodies interconnected by joints, acted upon by forces, and restricted by constraints.

Post by Colin Smith » Tue Mar 25, 2014 2:11 pm

I compiled Simbody without any errors on Scientific Linux 6; however, when I run make test, the following tests fail:

The following tests FAILED:
183 - TestCustomConstraints (Failed)
184 - TestCustomConstraintsStatic (Failed)
Errors while running CTest
make: *** [test] Error 8

If I run just these tests directly (i.e., ./TestCustomConstraints in the build directory), I get the following:

./TestCustomConstraints
Starting test TestCustomConstraints ...
testCoordinateCoupler1 ... done. testCoordinateCoupler1 time: 0ms.
testCoordinateCoupler2 ... done. testCoordinateCoupler2 time: 30ms.
testCoordinateCoupler3 ... done. testCoordinateCoupler3 time: 30ms.
testSpeedCoupler1 ... done. testSpeedCoupler1 time: 0ms.
testSpeedCoupler2 ... done. testSpeedCoupler2 time: 380ms.
Test failed due to exception: SimTK Exception thrown at TestCustomConstraints.cpp:451:
Internal bug detected: Test values should have been numerically equivalent at tolerance=1e-06.
(Assertion 'SimTK::Test::numericallyEqual((0.0),(power),1,(10*integ.getConstraintToleranceInUse()))' failed).
Please file a bug report at https://simtk.org/home/simbody (Advanced tab).
Include the above information and anything else needed to reproduce the problem.
Done. TestCustomConstraints time: 440ms.

################################################################################

./TestCustomConstraintsStatic
Starting test TestCustomConstraints ...
testCoordinateCoupler1 ... done. testCoordinateCoupler1 time: 0ms.
testCoordinateCoupler2 ... done. testCoordinateCoupler2 time: 20ms.
testCoordinateCoupler3 ... done. testCoordinateCoupler3 time: 20ms.
testSpeedCoupler1 ... done. testSpeedCoupler1 time: 0ms.
testSpeedCoupler2 ... done. testSpeedCoupler2 time: 310ms.
Test failed due to exception: SimTK Exception thrown at TestCustomConstraints.cpp:451:
Internal bug detected: Test values should have been numerically equivalent at tolerance=1e-06.
(Assertion 'SimTK::Test::numericallyEqual((0.0),(power),1,(10*integ.getConstraintToleranceInUse()))' failed).
Please file a bug report at https://simtk.org/home/simbody (Advanced tab).
Include the above information and anything else needed to reproduce the problem.
Done. TestCustomConstraints time: 350ms.


Do you have any suggestions to fix this?
Thanks in advance!
Colin

Post by Michael Sherman » Tue Mar 25, 2014 4:21 pm

Hi, Colin. This is likely to be due to the particular compiler being used on Scientific Linux 6. Two possible explanations come to mind:
- The compiler may have an optimization bug and is generating an incorrect result.
- The test may be too touchy and the result is actually OK.

To test the first hypothesis, you can reduce the optimization level. The most extreme version is to build a Debug version of Simbody, which will do no optimization at all. The tests will run very slowly but should all pass. If that works, you can try modifying the top level CMakeLists.txt file -- currently in Release mode the compiler is told to use -O3, which is the most extreme optimization level. Change all the O3's to O1, and rebuild from scratch. If that allows the tests to pass, try O2. Settle on the highest optimization level that allows all the tests to pass.

Alternatively, the test is too strict. To check that, you will need to extract some more information from TestCustomConstraints.cpp. It is failing in subtest testSpeedCoupler2() at line 451 where it expects the constraint power (which is analytically zero for a workless constraint) to be less than 1e-6. Check the actual value for power (by printing it out). If it is very small but not quite within tolerance (say 1e-5) this may be just an overly strict test.

Please post what you try and what you find out. If the test turns out to be too strict we should loosen it. But if it passes with -O1 or -O2 but not -O3 then I think it more likely that the optimizer on your compiler is broken. Please post the particular compiler type & version also.

Regards,
Sherm

Post by Colin Smith » Mon Mar 31, 2014 3:47 pm

Sherm,

I am using gcc 4.4.7

Thanks for your help. I tried building the Debug version, but the same tests failed.

To check if the test was too strict, I added the following just above line 451:
std::cout<<power<<std::endl;

I tried to attach the output values I got from running this modified version of the test, but I get "Sorry, the board attachment quota has been reached".
To summarize the output:
- there are 1449 values output
- the early values are around -1e-16 and slowly trend towards -1e-6 (although not monotonically)
- the last ~100 values are included below

-6.47248e-10
-6.14904e-10
-5.80826e-10
-5.38535e-10
-4.95703e-10
-4.53156e-10
-4.11148e-10
-3.70051e-10
-3.34921e-10
-3.00403e-10
-2.66382e-10
-2.36582e-10
-2.06882e-10
-1.77089e-10
-1.47011e-10
-1.16408e-10
-8.502e-11
-5.25748e-11
-8.93774e-12
3.11307e-11
6.71321e-11
1.05224e-10
1.38925e-10
1.74261e-10
2.11159e-10
2.49486e-10
2.88907e-10
3.28967e-10
3.63684e-10
3.9762e-10
4.29907e-10
4.59522e-10
4.82487e-10
5.01473e-10
5.15485e-10
5.23585e-10
5.23897e-10
5.12046e-10
4.86864e-10
4.4767e-10
3.9438e-10
3.27589e-10
2.31367e-10
1.21496e-10
3.30225e-12
-1.16415e-10
-2.29875e-10
-3.29266e-10
-4.07937e-10
-4.61739e-10
-4.89649e-10
-4.93657e-10
-4.77939e-10
-4.47614e-10
-4.07709e-10
-3.62746e-10
-3.16092e-10
-2.59988e-10
-2.07713e-10
-1.60661e-10
-1.1935e-10
-8.37481e-11
-5.35376e-11
-2.82965e-11
-3.76488e-12
1.50249e-11
2.41762e-11
3.03242e-11
3.4424e-11
3.72911e-11
3.93499e-11
4.08651e-11
4.2002e-11
4.28724e-11
4.35474e-11
4.40785e-11
4.45066e-11
4.48583e-11
4.51568e-11
4.5409e-11
4.56275e-11
4.58424e-11
4.60147e-11
4.61924e-11
4.63469e-11
4.65157e-11
4.66684e-11
4.68443e-11
4.70095e-11
4.72014e-11
4.73968e-11
4.76224e-11
4.78977e-11
4.8237e-11
4.86011e-11
4.90541e-11
4.96243e-11
5.0246e-11
5.10241e-11
5.19123e-11
5.30385e-11
5.43086e-11
5.59339e-11
5.77653e-11
6.0103e-11
6.27463e-11
6.61142e-11
6.99139e-11
7.47775e-11
8.02558e-11
8.72546e-11
9.51488e-11
1.05221e-10
1.16593e-10
1.31099e-10
1.47466e-10
1.68342e-10
1.9191e-10
2.21952e-10
2.55852e-10
2.99082e-10
3.47868e-10
4.10068e-10
4.80256e-10
5.69742e-10
6.70724e-10
7.99474e-10
9.44823e-10
1.12999e-09
1.33912e-09
1.60571e-09
1.90659e-09
2.29011e-09
2.72325e-09
3.27498e-09
3.89855e-09
4.69299e-09
5.59066e-09
6.73435e-09
8.02788e-09
9.6752e-09
1.15397e-08
1.39148e-08
1.66037e-08
2.00289e-08
2.39124e-08
2.88574e-08
3.44726e-08
4.16221e-08
4.97585e-08
6.01121e-08
7.19228e-08
8.6955e-08
1.06345e-07
1.27464e-07
1.54309e-07
1.88877e-07
2.27272e-07
2.76079e-07
3.38885e-07
4.10946e-07
5.02827e-07
6.20945e-07
7.73638e-07
9.71981e-07
1.1817e-06

Post by Michael Sherman » Mon Apr 07, 2014 10:17 am

Hi, Colin. I just got a chance to take a look at this. I instrumented the test like this:

Real worstPower=-Infinity; // before the loop (line 440)
// ...
if (std::abs(power) > worstPower) { // above line 451
    printf("t=%g power=%g\n", integ.getTime(), power);
    worstPower = std::abs(power);
}

On Windows using Visual Studio 2013 (64 bit) the last output line was

t=5.96841 power=-5.2014e-008

with identical results whether optimized or debug.

On Mac OSX 10.9.1 using clang llvm 5.0 (64 bit), the last line was

t=5.96816 power=-1.64746e-08

and that also was consistent between optimized and debug builds.

There is certainly some room for drift in this test and numerical differences between compilers. You were getting around 1e-06 though, which is 20X worse than 5e-08. It is conceivable that this could be chaotic amplification of a small numerical difference but it seems suspicious to me.

I will try to find a gcc-using Linux system to run this on to see if I get the same behavior you are seeing. Meanwhile would you mind re-running this with the above instrumentation to see if at least it is reaching the worst case at around 5.968 seconds and thus behaving at least qualitatively reasonably?

Regards,
Sherm

Post by Michael Sherman » Mon Apr 07, 2014 11:23 am

I ran the instrumented test on Ubuntu 12.04 LTS 64 bit using gcc 4.6.3 (on the same hardware on which I ran the Windows test). The result was:

t=5.96806 power=-1.30201e-07

That's about 2X worse than the Visual Studio result on the same hardware, showing that the test is clearly sensitive to the compiler's numerics. But this is still 10X better than what you were seeing. Also it occurs to me that since the test dies as soon as the 1e-06 tolerance is exceeded you should probably run with that check disabled to see whether the simulation continues to drift further.

Since you are using an older version of gcc (4.4.7), it is possible that you are seeing numerical issues that have since been addressed. These might be very minor and are getting amplified by this sensitive test, or they could be major and the test is revealing them. Hard to say! Is it possible for you to upgrade to a more recent OS or at least a more recent build toolchain?

BTW, all the test runs I did were 64 bit builds. In theory that shouldn't matter since all the floating point is 64 bit either way, but the compilers do make different instruction choices in 32 and 64 bit compilations. Also I suspect that the compiler developers are putting more effort into their 64 bit code generators now. Are you building 32 bit or 64 bit binaries? If 32 you might want to try 64 instead.

Sherm

Post by Michael Sherman » Mon Apr 07, 2014 11:27 am

I should also mention that I built Simbody 3.4.1 (recently released) rather than 3.3.1. That should have no effect on the numerics since 3.4.1 was almost exclusively installation and packaging changes.

Post by Christopher Dembia » Wed Aug 13, 2014 12:53 pm

Colin, have you tried this out any further? Carmichael and I just compiled Simbody 3.3.1 in Release mode on Red Hat Enterprise Linux 6.5 (Fedora 12, 13) using gcc 4.4.7. TestCustomConstraints fails for us as well.

Using Sherm's instrumentation, the last line of output is:

t=5.96747 power=1.1817e-06

Our power matches Colin's, actually (at all times).

Carmichael and I are trying this out on Stanford's new Sherlock cluster, on which they only have gcc 4.4.7. We'll try running this test using gcc 4.8 soon.

Post by Colin Smith » Wed Aug 13, 2014 2:20 pm

Sherm and Chris,

Sorry I missed your replies to this; I got sidetracked working on other projects, so I haven't gotten it working. I was trying to install the 64-bit version. I just tried it with Sherm's instrumentation and my result is exactly the same as Chris' (t=5.96747 power=1.1817e-06).

I was able to compile it on another computer running Fedora 20 (which I believe is gcc 4.8.2).

I am also hoping to run Simbody (well, actually OpenSim) on a cluster that is running Scientific Linux 6, hence the reason for wanting to use the older gcc compiler.

Post by Christopher Dembia » Wed Aug 13, 2014 4:39 pm

Colin wrote: "I was able to compile it on another computer running Fedora 20 (which I believe is gcc 4.8.2)."

Did the test pass with the newer Fedora?

I just tried compiling on the same cluster (Red Hat) but with gcc 4.8.1 and with clang 3.4. The test still failed with both compilers. So I think it must be (math?) libraries on Red Hat Enterprise Linux 6.5. We cannot really change the cluster...maybe we can try using different libraries? I'm not sure. Any ideas?

Post by Christopher Dembia » Wed Aug 13, 2014 5:38 pm

We solved the issue by building and using our own LAPACK/BLAS from http://www.netlib.org/lapack/ :D We used version 3.4.2.
