Installing lsst_sims from source fails

Hello,

I’m trying to install lsst_sims from source on a CentOS 7 VM. I started by running the “newinstall.sh” script and then installing the main LSST distribution (using eups distrib install -t v13_0 lsst_distrib) which all seemed to go smoothly. However, I get an error when installing lsst_sims. I used the command:

eups distrib install lsst_sims -t sims

The error occurs while installing sims_alertsim:

  [ 86/88 ]  sims_alertsim 2.3.4.sims+3 ... 

***** error: from /home/james/lsst/EupsBuildDir/Linux64/sims_alertsim-2.3.4.sims+3/build.log:
        </ObsDataLocation>
    </WhereWhen>
    <Citations>
        <EventIVORN cite="followup">
        </EventIVORN>
        <EventIVORN cite="followup">
        </EventIVORN>
    </Citations>
    <Description></Description>
</voe:VOEvent>


received data: 000009b0
Number of events from this visit : 3. Time from first to last event 0.003985 or 0.001328 per event
The following tests failed:
/home/james/lsst/EupsBuildDir/Linux64/sims_alertsim-2.3.4.sims+3/sims_alertsim-2.3.4.sims+3/tests/.tests/testEndToEnd.py.failed
1 tests failed
scons: *** [checkTestStatus] Error 1
scons: building terminated because of errors.
+ exit -4
eups distrib: Failed to build sims_alertsim-2.3.4.sims+3.eupspkg: Command:
     source /home/james/lsst/eups/bin/setups.sh; export EUPS_PATH=/home/james/lsst; (/home/james/lsst/EupsBuildDir/Linux64/sims_alertsim-2.3.4.sims+3/build.sh) >> /home/james/lsst/EupsBuildDir/Linux64/sims_alertsim-2.3.4.sims+3/build.log 2>&1 4>/home/james/lsst/EupsBuildDir/Linux64/sims_alertsim-2.3.4.sims+3/build.msg 
exited with code 252
[james@localhost lsst]$  

It looks to be a test failure, but I am new to the LSST software and have no idea where to start with troubleshooting it. Does anyone know what might have gone wrong?

Edit: another thing I noticed, in case it’s relevant, is that if I re-run the installation, the test still fails, but the numbers in the “Time from first to last event” line are completely different this time around.

Many thanks,
James

Hi James —

I don’t know the sims codebase at all, so I probably can’t help with the specifics of this question. However, the details of the test failure itself should be stored in the /home/james/lsst/EupsBuildDir/Linux64/sims_alertsim-2.3.4.sims+3/sims_alertsim-2.3.4.sims+3/tests/.tests/testEndToEnd.py.failed file. That might not tell you enough to work out the problem for yourself, but if you could post the contents here, it will give the experts something further to ponder.
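
Since a .failed file can be large, a small script can pull out just the failing test names before you read the full tracebacks. This is only a sketch: it assumes the file contains standard Python unittest output with "FAIL:" or "ERROR:" header lines, and the function name failing_tests is made up for this illustration.

```python
import re
import sys

# Matches unittest header lines such as:
#   FAIL: test_name (module.ClassName)
HEADER = re.compile(r"^(FAIL|ERROR): (\S+) \((\S+)\)", re.MULTILINE)


def failing_tests(log_text):
    """Return (status, test name, test class) for each failure header found."""
    return [(m.group(1), m.group(2), m.group(3)) for m in HEADER.finditer(log_text)]


# Optional command-line use: pass the path to the .failed file as the argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        for status, name, cls in failing_tests(handle.read()):
            print(status, name, cls)
```

Saved under any file name, the script takes the path to testEndToEnd.py.failed as its only argument.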

testEndToEnd.py.failed (151.5 KB)
Thanks. I have uploaded the testEndToEnd.py.failed file which does seem to contain some more detail.

======================================================================
FAIL: test_alert_sim_end_to_end (__main__.AlertSimEndToEndTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/testEndToEnd.py", line 181, in test_alert_sim_end_to_end
    % (len(control_data), len(voevent_data_dicts))))
AssertionError: 30 != 29 : 30 catalog entries; 29 voevents

----------------------------------------------------------------------

It looks like a race. There is one event that is printed after the test has already failed. I don’t know how that test runs. It could be as simple as rerunning the test to see if the events win the race a second time.

Funny enough, I’m currently rebuilding my stack with lsstsw and I’m getting a failure in this test as well.
Running the test by hand failed the first time (got a timeout, but had a VPN with NOAO up) but succeeded the second time (after I turned off the VPN). It also succeeded when I turned off the internet connection completely, but I think in this case it’s testing if it can reach the UW database it’s querying and if it can’t, it skips the test.

That said, I don’t know enough about alertsim to know why the test is actually failing here. The test looks like it gets the data, writes the alerts to disk, parses them, and then tries to read them back, as sequential steps in the test code. It’s not clear if one of these steps can somehow return before it’s actually finished?
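
To make the question concrete, here is a generic sketch of how an off-by-one like "30 != 29" can come out of a race. It is not the sims_alertsim code: the sender, the receiver thread, and every name in it are hypothetical. A background receiver is still processing the last event when the sender has finished, so counting immediately after sending can come up short, while waiting for the expected count (with a timeout) avoids that.

```python
import queue
import threading
import time


def receiver(inbox, received, stop):
    """Move events from the queue into a list until told to stop."""
    while not stop.is_set() or not inbox.empty():
        try:
            event = inbox.get(timeout=0.05)
        except queue.Empty:
            continue
        time.sleep(0.001)  # stand-in for the time spent parsing one event
        received.append(event)


def wait_for_count(received, expected, timeout=5.0, poll=0.01):
    """Poll until `expected` events have arrived or the timeout expires."""
    deadline = time.monotonic() + timeout
    while len(received) < expected and time.monotonic() < deadline:
        time.sleep(poll)
    return len(received) == expected


def send_and_collect(n_events):
    """Send n_events to a receiver thread and wait for all of them."""
    inbox, received, stop = queue.Queue(), [], threading.Event()
    worker = threading.Thread(target=receiver, args=(inbox, received, stop))
    worker.start()
    for i in range(n_events):
        inbox.put(i)
    # Checking len(received) == n_events right here would be racy:
    # the receiver may not have handled the last event yet.
    arrived = wait_for_count(received, n_events)
    stop.set()
    worker.join()
    return arrived, received
```

Calling send_and_collect(30) returns whether all 30 events arrived within the timeout, along with the list of received events.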

So I think trying to rerun the test is the thing to do, but perhaps @darko can comment further?

@jamesp did rerunning the install a second time work? That will at least tell us definitively if it’s a race (though not how to fix it).
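
If it is easier than rerunning the whole install, the test can also be run by hand several times to count how often it fails. A minimal sketch, assuming the stack is already set up in your shell and you are in the sims_alertsim package directory; count_failures is a name invented for this example.

```python
import subprocess
import sys


def count_failures(command, runs=5):
    """Run `command` repeatedly and count the runs that exit non-zero."""
    failures = 0
    for _ in range(runs):
        completed = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if completed.returncode != 0:
            failures += 1
    return failures
```

For example, count_failures([sys.executable, "tests/testEndToEnd.py"], runs=10) runs the test ten times. A count that is neither 0 nor 10 would point to an intermittent failure rather than a deterministic one.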

Thanks for all the replies. I re-ran the install a second time. The test still failed, though the numbers in the “Time from first to last event” line were different from the first time. Unfortunately I am currently having to rebuild my VM from scratch due to another problem. I will let you know what happens when I try to install the sims packages on the new VM.

I pinged Veljko and @danielsf (guys who designed tests for alertsim) and expect their response soon.

The test is not trying to contact any external resources. It is generating the alerts and monitoring them itself. That being said: the problem is still a race condition. I will mark this test as an expectedFailure and issue a new tagged version of lsst_sims today. I will post here when that is available.
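
For anyone unfamiliar with it, this is roughly what marking a test as an expected failure looks like with the standard unittest decorator. The test class and its body below are placeholders, not the real sims_alertsim test.

```python
import unittest


class RacyEndToEndTest(unittest.TestCase):
    """Placeholder test case standing in for the real end-to-end test."""

    @unittest.expectedFailure
    def test_end_to_end(self):
        # Stand-in for the racy comparison of catalog entries and voevents.
        self.assertEqual(30, 29, "30 catalog entries; 29 voevents")


def run_suite():
    """Run the placeholder test and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RacyEndToEndTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With the decorator in place, the assertion failure is recorded under expectedFailures instead of failures, so the run as a whole still counts as successful. A run in which the test happens to pass is recorded as an unexpected success.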

Veljko and I will need to figure out a better way to test alertsim.

@jamesp I have updated the distribution of lsst_sims so that it will ignore the offending test. eups distrib install lsst_sims should now work (or, at least, not fail because of a race condition in the sims_alertsim tests).

Thanks for all your help. I just re-ran the installation, and it works fine now.