Install failure with newinstall.sh and eups distrib


(Craig Lage) #1

Hi, I have a machine where I have successfully been running the LSST stack. It recently went through an OS upgrade and is now running Ubuntu 16.04. When re-installing the stack, eups distrib failed on [ 55/122 ] galsim 1.5.1.lsst2 ... The error is below. I’d appreciate any help.
Thanks, Craig Lage

***** error: from /home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/galsim-1.5.1.lsst2/build.log:
Numpy version is 1.13.1
Checking for PyFITS... yes
PyFITS version is 3.4
Checking for future... yes
Future version is 0.16.0
Checking if we can build against Boost.Python... yes
Checking if C++ exceptions are propagated up to python... yes
GalSim version  1.5.1
scons: done reading SConscript files.
scons: Building targets ...
Install file: "lib/libgalsim.so.1.5" as "/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/galsim/1.5.1.lsst2/lib/libgalsim.so.1.5"
BuildExecutableScript(["bin/galsim"], ["bin/galsim.py"])
g++ -o galsim/_galsim.so -fopenmp -pthread -shared -L/home/cslage/Software/lsst_stack/python/miniconda3-4.3.21/lib -Wl,-rpath=/home/cslage/Software/lsst_stack/python/miniconda3-4.3.21/lib,--no-as-needed -Wl,-rpath=/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/galsim/1.5.1.lsst2/lib pysrc/.obj/module.os pysrc/.obj/Angle.os pysrc/.obj/Bounds.os pysrc/.obj/PhotonArray.os pysrc/.obj/Image.os pysrc/.obj/SBProfile.os pysrc/.obj/SBAdd.os pysrc/.obj/SBConvolve.os pysrc/.obj/SBDeconvolve.os pysrc/.obj/SBFourierSqrt.os pysrc/.obj/SBTransform.os pysrc/.obj/SBBox.os pysrc/.obj/SBGaussian.os pysrc/.obj/SBDeltaFunction.os pysrc/.obj/SBExponential.os pysrc/.obj/SBSersic.os pysrc/.obj/SBMoffat.os pysrc/.obj/SBAiry.os pysrc/.obj/SBShapelet.os pysrc/.obj/SBInterpolatedImage.os pysrc/.obj/SBKolmogorov.os pysrc/.obj/SBSpergel.os pysrc/.obj/SBInclinedExponential.os pysrc/.obj/SBInclinedSersic.os pysrc/.obj/Random.os pysrc/.obj/Noise.os pysrc/.obj/HSM.os pysrc/.obj/Integ.os pysrc/.obj/Table.os pysrc/.obj/Interpolant.os pysrc/.obj/CorrelatedNoise.os pysrc/.obj/Bessel.os pysrc/.obj/CDModel.os pysrc/.obj/Silicon.os pysrc/.obj/RealGalaxy.os pysrc/.obj/WCS.os -Llib -L/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.66.0/lib -L/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/tmv/0.73.lsst2/lib -L/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/fftw/3.3.4.lsst2/lib -L/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/tmv/0.73.lsst2/lib -lboost_python3 -ltmv_symband -lfftw3 -lpthread -ltmv -lpthread -lgalsim
BuildExecutableScript(["bin/galsim_yaml"], ["bin/galsim_yaml.py"])
BuildExecutableScript(["bin/galsim_json"], ["bin/galsim_json.py"])
BuildExecutableScript(["bin/galsim_download_cosmos"], ["bin/galsim_download_cosmos.py"])
Install directory: "share/SEDs" as "/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/galsim/1.5.1.lsst2/share/galsim/SEDs"
scons: *** [/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/galsim/1.5.1.lsst2/share/galsim/SEDs/CWW_E_ext.sed] TypeError : decoding to str: need a bytes-like object, NoneType found
scons: building terminated because of errors.
+ exit -5
eups distrib: Failed to build galsim-1.5.1.lsst2.eupspkg: Command:
        source "/home/cslage/Software/lsst_stack/eups/2.1.4/bin/setups.sh"; export EUPS_PATH="/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6"; (/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/galsim-1.5.1.lsst2/build.sh) >> /home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/galsim-1.5.1.lsst2/build.log 2>&1 4>/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/galsim-1.5.1.lsst2/build.msg 
exited with code 251

(Paul Price) #2

My first thought is that this is a bug in galsim, but to be sure, could you please post the full build log (/home/cslage/Software/lsst_stack/stack/miniconda3-4.3.21-10a4fa6/EupsBuildDir/Linux64/galsim-1.5.1.lsst2/build.log)?


(Tim Jenness) #3

We are currently using galsim 1.6, so it might be worth trying a weekly from the last two months to see if that helps.


(Craig Lage) #4

Here it is. Thanks for your help

build.log (592.4 KB)


(Craig Lage) #5

When you say “trying a weekly from the last two months…”, can you give me some guidance on how to do this?


(Paul Price) #6

Well, I don’t see anything other than the error right at the end. Maybe you’re now using Python 3 when you weren’t before, and that older version of galsim doesn’t work with Python 3? If that’s the case, you should build using Python 2 (-2 argument to newinstall.sh, I believe), or build a more recent release as suggested by @timj. What release are you trying to build now?


(Craig Lage) #7

I’m trying to build release 15.0.


(Paul Price) #8

Well, I think that disproves my theory, as that version should work with Python 3.


(Paul Price) #9

eups distrib install -t w_2018_27 lsst_distrib


(Craig Lage) #10

OK. Trying this now. I’ll keep you posted. Thanks


(Craig Lage) #11

Same results. Attached is the build log.
build_galsim-1.6.0.log (582.1 KB)


(Craig Lage) #12

I’m stuck here. I don’t really need GalSim at this point. Is there a way to bypass GalSim and have it keep installing?


(Craig Lage) #13

I tried building v16.0 and got the same failures. However, I noticed that it doesn’t always fail at the same place. It fails in a similar way, with the “TypeError : decoding to str: need a bytes-like object, NoneType found”, but not always on the same file. Here are two different failures from two consecutive attempts. Is this a clue?
build_2.log (553.9 KB)
build_1.log (553.6 KB)


(Michael Jarvis) #14

At this point in the installation, SCons is just copying files. It’s not trying to compile or anything like that. So I’m not sure what the TypeError means, but it smells like an OS problem to me. Like the files that it thinks are there aren’t able to be read or written or something. I’ve never seen that particular error before though, so I don’t really know what it means, but maybe there is some issue with permissions or a bad sector on the hard drive or something along those lines.


(Paul Price) #15

Thanks for having a look, @rmjarvis!

I wonder if there’s something peculiar about your filesystem, @craiglagegit. What kind of filesystem are you installing on? Are you playing any tricks with directory links?

Do you have SCONSFLAGS or EUPS_SCONSFLAGS set?
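
A quick way to collect answers to the last two questions is sketched below in Python. It is only an illustration: the helper names `symlinked_components` and `scons_env_flags` are made up for this example, and the directory passed in would be wherever the stack is being installed. It does not report the filesystem type.

```python
import os


def symlinked_components(path):
    """Return every component of `path` (ancestors and itself) that is a symbolic link."""
    current = os.path.abspath(path)  # abspath does not resolve symlinks
    links = []
    while True:
        if os.path.islink(current):
            links.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return list(reversed(links))


def scons_env_flags(environ=None):
    """Return the SCons-related environment variables that are set, as a dict."""
    environ = os.environ if environ is None else environ
    names = ("SCONSFLAGS", "EUPS_SCONSFLAGS")
    return {name: environ[name] for name in names if name in environ}


if __name__ == "__main__":
    # Run from inside the install directory (an assumption of this sketch).
    print("symlinks on the path:", symlinked_components(os.getcwd()))
    print("SCons flags set:", scons_env_flags())
```

An empty list and an empty dict would mean no symbolic links on the path and neither variable set.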


(Craig Lage) #16

I didn’t set the system up. I think it is a standard Ubuntu system. The filesystem type reports as “nfs4”. Neither SCONSFLAGS nor EUPS_SCONSFLAGS is set. There are no links within the “lsst_stack” directory where I am building the stack, but there are some symbolic links at a higher level.

Also, as I reported above, I’ve noticed that when I run the “eups distrib install -t v16_0 lsst_distrib” command repeatedly, it always fails with the same TypeError, but not always on the same file. Sometimes it has failed on SED files, sometimes on sensor files, and sometimes on files in the share directory.
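
To compare which file each attempt failed on, something like the following Python sketch could be run over the saved build logs. It assumes the failure lines look like the `scons: *** [...] TypeError : ...` line in the first post; the function name `failing_targets` is just a name chosen for this example.

```python
import re

# Matches lines such as:
#   scons: *** [/path/to/file] TypeError : decoding to str: ...
FAILURE_RE = re.compile(r"^scons: \*\*\* \[(?P<target>[^\]]+)\]\s*(?P<error>\w+)\s*:")


def failing_targets(log_lines):
    """Return (target, error type) pairs for every SCons failure line."""
    failures = []
    for line in log_lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            failures.append((match.group("target"), match.group("error")))
    return failures


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py build_1.log build_2.log
    for log_path in sys.argv[1:]:
        with open(log_path, errors="replace") as handle:
            for target, error in failing_targets(handle):
                print(f"{log_path}: {error} on {target}")
```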


(Craig Lage) #17

I notice that the galsim install can have a higher level of verbosity for logging. Is there a way for me to turn this on when running eups distrib? It might help figure out what is going on.


(Craig Lage) #18

@price, @rmjarvis

Well, I don’t fully understand what is happening, but I found the link below online, where others were having the same problem with SCons 3.0 and Python 3.6.

https://pairlist4.pair.net/pipermail/scons-users/2017-October/006383.html

So with that hint, I hacked the SCons.Util.to_str() function as shown below. After I did that, the GalSim build completed, and the rest of the “eups distrib install -t v16_0 lsst_distrib” install is continuing as we speak. While building GalSim, it printed “TypeError, returning empty string” three times. Hope this helps.

Craig

old:
def to_str(s):
    if bytes is str or is_String(s):
        return s
    return str(s, 'utf-8')

new:
def to_str(s):
    print("In SCons.Util.to_str(), modified to test for TypeError")
    try:
        if bytes is str or is_String(s):
            return s
        return str(s, 'utf-8')
    except TypeError:
        print("TypeError, returning empty string")
        return ""
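
For anyone who wants to see the behavior outside of SCons, here is a minimal standalone sketch. It assumes Python 3 and that the value reaching to_str() is None, which is what the "NoneType found" part of the error suggests. The `is_String` here is a simplified stand-in, not the real SCons.Util.is_String, and `to_str_guarded` is a name made up for this example.

```python
def is_String(obj):
    """Simplified stand-in for SCons.Util.is_String (assumption: str only)."""
    return isinstance(obj, str)


def to_str(s):
    """Same logic as the old version quoted above."""
    if bytes is str or is_String(s):
        return s
    return str(s, 'utf-8')


def to_str_guarded(s):
    """Same idea as the hack: fall back to an empty string on TypeError."""
    try:
        return to_str(s)
    except TypeError:
        return ""


if __name__ == "__main__":
    print(to_str(b"CWW_E_ext.sed"))    # bytes are decoded to str
    try:
        to_str(None)                   # None is neither str nor bytes
    except TypeError as err:
        print("TypeError:", err)
    print(repr(to_str_guarded(None)))  # the guarded version returns ''
```

Note that the guard only hides the symptom. It does not explain why None is being passed in the first place.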


(Craig Lage) #19

@price, @rmjarvis,

More on this. The failures happen after it installs an empty directory, which is why there are three of them: one after installing share/SEDs, one after installing share/bandpasses, and one after installing share/sensors. Attached is a partial build log. It took me a while to find the build log, because it gets moved after the build is complete. Also, I had to cut it down to just the relevant part, because it was too big to upload. If you need the whole thing, let me know.

Craig


(Craig Lage) #20

Forgot to attach it:
build_v16.0_partial.log (33.7 KB)