Install package without tests?

Hi all,

I’m trying to update lsst_distrib and obs_subaru to the latest weekly tag, w_2018_09, and am running into a problem: astshim seems to fail one of its tests.

What I’m running is this:
eups distrib install astshim -t w_2018_09 -v -v -v

After it’s been running for a while, it finally aborts with this error:

Writing log to: /opt/lsst/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/astshim-14.0-15-g77b58a8+1/build.log

***** error: from /opt/lsst/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/astshim-14.0-15-g77b58a8+1/build.log:
E       (mismatch 33.33333333333333%)
E        x: array([[ -1.110223e-15,  -1.110223e-15],
E              [  1.300000e+00,   0.000000e+00],
E              [  0.000000e+00,   1.300000e+00]])
E        y: array([[ 0. ,  0. ],
E              [ 1.3,  0. ],
E              [ 0. ,  1.3]])

tests/ AssertionError
==================== 1 failed, 164 passed in 73.07 seconds =====================
Global pytest run: failed
Failed test output:
Global pytest output is in /opt/lsst/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/astshim-14.0-15-g77b58a8+1/astshim-14.0-15-g77b58a8+1/tests/.tests/pytest-astshim.xml.failed
The following tests failed:
1 tests failed
scons: *** [checkTestStatus] Error 1
scons: building terminated because of errors.
+ exit -4
eups distrib: Failed to build astshim-14.0-15-g77b58a8+1.eupspkg: Command:
        source "/opt/lsst/lsst_stack/eups/2.1.4/bin/"; export EUPS_PATH="/opt/lsst/lsst_stack/stack/miniconda3-4.3.21-10a4fa6"; (/opt/lsst/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/astshim-14.0-15-g77b58a8+1/ >> /opt/lsst/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/astshim-14.0-15-g77b58a8+1/build.log 2>&1 4>/opt/lsst/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/astshim-14.0-15-g77b58a8+1/build.msg 
exited with code 252
Using Distrib type: eupspkg
Removing lockfile /opt/lsst/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/.lockDir/exclusive-rkotulla.228554

Looking at the assert statement, it seems to get the right answer to within some numeric noise, but still fails the assert_allclose statement, likely because atol (the absolute tolerance) is set to 0, while the error is -1e-15. Is there a way to just skip the tests and declare the package to be installed correctly? I’ve tried the --nobuild option, but that didn’t seem to do anything. In most other cases I could just comment something out or modify the code to make it work, but the whole integrated eups package doesn’t allow for any of that.
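For reference, the failure can be reproduced outside the stack with a few lines of numpy. This is only a sketch: the two arrays are copied from the log above, and nothing else about the astshim test is assumed.

```python
import numpy as np
from numpy.testing import assert_allclose

# Values copied from the failing test output in build.log.
x = np.array([[-1.110223e-15, -1.110223e-15],
              [1.3, 0.0],
              [0.0, 1.3]])
y = np.array([[0.0, 0.0],
              [1.3, 0.0],
              [0.0, 1.3]])

# assert_allclose checks |x - y| <= atol + rtol * |y| element by element.
# With the defaults (rtol=1e-7, atol=0) the right-hand side is 0 wherever
# y == 0, so any nonzero x in those positions fails.
try:
    assert_allclose(x, y)
    failed_with_defaults = False
except AssertionError:
    failed_with_defaults = True

# Fraction of elements that violate the default tolerance (2 of 6).
mismatch = np.mean(np.abs(x - y) > 1e-7 * np.abs(y))

print(failed_with_defaults, mismatch)
```

The two offending elements are exactly the ones whose expected value is 0, which matches the 33.3% mismatch reported in the log.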

I’ve tried earlier weekly tags as well, but run into the same problem (w_2018_08) or a similar one (w_2018_05, where a different test fails):
<test_object.TestObject testMethod=test_multiprocessing>
E RuntimeError: AST: Error at line 59 in file src/ Object pointer given (value is 389514).This pointer is currently owned by another thread (possible programming error).

As a side question, what is the best way to pass additional compiler flags through the whole chain of nested install/compile tools? For me astshim also failed to link because the link command did not include the C++ standard library, which can be fixed by adding -lstdc++ to the link command. I’ve tried adding it to LDFLAGS, CFLAGS, and CXXFLAGS, both with an explicit export and on the eups command line (i.e. LDFLAGS='-lstdc++' eups distrib …), but none of them made it through to the actual c++ invocation. In the end I hacked the sconsUtils/ file to add this flag and get it to compile, but I doubt that’s the officially endorsed way.

I’ve attached the full build-log just in case that might be helpful: build.log (1.4 MB)


I don’t know of a way to disable tests. What operating system and compiler do you use? What version of python and numpy do you have (with the LSST installation)? It would be ideal to fix the test, but it may be a bit tricky because it passes on my system and Jenkins. I would appreciate it if you could file a ticket on JIRA containing this information.

Alright, here’s some information on my system, but please let me know if you need anything else:

Intel Xeon Phi CPU (64 cores, 256 threads), 192 GB RAM, running OpenSuSE Leap 42.3
Kernel: Linux trantor 4.4.76-1-default #1 SMP Fri Jul 14 08:48:13 UTC 2017 (9a2885c) x86_64 x86_64 x86_64 GNU/Linux

Python: Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
>>> numpy.__version__

Note: That’s the one installed using the LSST distribution

Compiler: I’m using gcc-6 (I tried gcc-4.8.5 first, but although it should support C++14 from what I read online, it never managed to compile any of the LSST stuff, so I switched over to gcc-6, which seems to do just fine)

> gcc --version
gcc (SUSE Linux) 6.2.1 20160826 [gcc-6-branch revision 239773]
> g++ --version
g++ (SUSE Linux) 6.2.1 20160826 [gcc-6-branch revision 239773]

As an alternative to disabling the tests, is there a way to install the package outside of eups, i.e. by downloading the code from github by hand, followed by running XXX (scons???)? In that case I could just disable that particular test, compile away and live happily ever after.

One way that might work to get around this is:

  • set your stack up
  • git clone astshim
  • set up your version of astshim: “setup -r .”
  • build your astshim: “scons” and delete any broken tests (or fix them by specifying a reasonable atol, such as 1e-14 – if you do that you can commit them to the ticket branch using the ticket you filed)
  • install your astshim: “scons install declare version=…” using the version lsstdistrib was trying to install

Once that’s installed you can probably use lsstdistrib to finish your installation.

As you said, the long term fix seems to be to provide a value for atol. Some astshim tests already do this, but many use the default of 0, which is unsafe, even though it happens to work on our test systems.
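A test with an explicit atol might look something like the sketch below. The test case name and the data are hypothetical (they are not taken from the astshim test suite), and 1e-14 is just the value suggested above.

```python
import unittest

import numpy as np
from numpy.testing import assert_allclose


class NoisyZeroTestCase(unittest.TestCase):
    """Hypothetical test case; not part of the astshim test suite."""

    def setUp(self):
        self.expected = np.array([[0.0, 0.0], [1.3, 0.0], [0.0, 1.3]])
        # Expected values plus rounding noise of the size seen in the log.
        noise = np.array([[-1.1e-15, -1.1e-15], [0.0, 0.0], [0.0, 0.0]])
        self.actual = self.expected + noise

    def test_explicit_atol_accepts_noise(self):
        # A small absolute tolerance absorbs noise around zero.
        assert_allclose(self.actual, self.expected, atol=1e-14)

    def test_explicit_atol_still_catches_errors(self):
        # A real discrepancy (1e-6 here) is still reported.
        with self.assertRaises(AssertionError):
            assert_allclose(self.actual + 1e-6, self.expected, atol=1e-14)


# Run the two tests directly instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NoisyZeroTestCase)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test is there to show that a small atol does not make the comparison meaningless: the relative tolerance still applies to the nonzero values.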

Thanks, Russell, I’ll give that a try!

First, gcc 4.8.5 is definitely too old; gcc 6 is fine.

I’m worried about the need/desire to add -lstdc++; I think the compiler should be adding this automatically when necessary. I’ve seen failures of the dynamic linker to find the library at runtime, but I don’t think I’ve seen a failure of the linker to build a shared object because it was missing.

As you’ve seen, we use scons to build, which strips out (most) environment variables in order to provide reproducible builds. It is possible to pass some arguments into the build, but generally things should be fixed in other ways.

I’m even more worried because it looks to me like this case (a simple two-coordinate zoom matrix) should actually provide an exact solution. I’m not sure what in the chain of CPU, OS, and libstdc++ might be causing the divergence.
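To illustrate why an exact result would be expected, here is a minimal numpy sketch. It is not the astshim code path: the input points are an assumption inferred from the expected output in the log, and the zoom is written as a plain diagonal matrix.

```python
import numpy as np

zoom = 1.3
# Assumed inputs: the origin and the two unit vectors.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Expected values as printed in the log.
expected = np.array([[0.0, 0.0], [1.3, 0.0], [0.0, 1.3]])

# Element-wise scaling: 0.0 * 1.3 and 1.0 * 1.3 involve no rounding.
scaled = points * zoom

# The same zoom written as a 2x2 diagonal matrix; every product is exact
# and every sum only adds zero, so no rounding occurs here either.
zoom_matrix = np.diag([zoom, zoom])
transformed = points @ zoom_matrix

exact_scaling = np.array_equal(scaled, expected)
exact_matrix = np.array_equal(transformed, expected)
print(exact_scaling, exact_matrix)
```

If a plain multiplication like this is exact, the -1.1e-15 presumably enters somewhere else in the chain, which is what makes the failure surprising.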

As a follow-up: I got astshim compiled and installed in the end, with only minor path fiddling to make the rest of the packages install. Following that all other dependencies compiled and installed perfectly fine without any further issues.

As for the tests, I think there is some random element to it, maybe related to the CPU and its many cores/threads (just my random guess). What I noticed is that in one try a given test was reported as “failed”, while in the next try it seemed to pass just fine (all in all there were 3-4 different tests that seemed to fail randomly). I’m not sure why that is or how to fix it, but since it doesn’t seem to cause any problems during actual operation I’m fine with this hack for now.

This is probably worth a ticket for further study. Although, do we have a machine with this architecture we can use to experiment on?

If not, let me know and I can try to give you access to my machine for testing etc.

I’m not entirely surprised there are machines that show problems, though I wonder why none of our standard machines do. numpy.testing.assert_allclose has a default value of atol=0, which makes it too picky when checking values at or near 0.0. The default atol for numpy.allclose is 1e-8, which is better for “reasonable” values, and is what I thought the default was for assert_allclose. I filed DM-13836 to specify atol everywhere that uses assert_allclose.
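The difference between the two sets of defaults can be seen in a short sketch. The helper within_tolerance is hypothetical and just spells out the criterion both numpy functions apply; the default values in the comments are those documented by numpy.

```python
import numpy as np
from numpy.testing import assert_allclose


def within_tolerance(actual, desired, rtol, atol):
    """Hypothetical helper: |actual - desired| <= atol + rtol * |desired|."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


noise = 1e-15  # size of the rounding noise seen in the failing test

# numpy.allclose defaults (rtol=1e-5, atol=1e-8) accept noise around zero.
allclose_accepts = within_tolerance(noise, 0.0, rtol=1e-5, atol=1e-8)

# numpy.testing.assert_allclose defaults (rtol=1e-7, atol=0) reject it.
assert_allclose_accepts = within_tolerance(noise, 0.0, rtol=1e-7, atol=0.0)

# Cross-check the hand-written criterion against numpy itself.
numpy_allclose = bool(np.allclose(noise, 0.0))
try:
    assert_allclose(noise, 0.0)
    numpy_assert_passes = True
except AssertionError:
    numpy_assert_passes = False

# atol=1e-8 is not always appropriate either: it treats 1e-9 as equal to 0.
too_lenient = bool(np.allclose(1e-9, 0.0))

print(allclose_accepts, assert_allclose_accepts,
      numpy_allclose, numpy_assert_passes, too_lenient)
```

The last check is why a per-test atol is probably better than one global default: the right value depends on the magnitude of the quantities being compared.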