If lsstsw `rebuild` fails with flake8 but a local checkout and build succeed, what should I look into?

When I use an updated lsstsw to `rebuild -u verify`, the build fails with flake8 errors.

When I explicitly build a locally checked-out copy, it’s fine.

I’ve read through DM-11822 and DM-11809 and tried to follow the discussion in #dm-square.

What is the proper behavior supposed to be, and what should my local configuration be to ensure it is correct?
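
For reference, the two build paths being compared are roughly the following (a sketch; the exact local-build commands are assumed, with `setup -k -r .` standing in for however the local package is set up):

# lsstsw path
rebuild -u verify

# local path: build an explicitly checked-out copy
cd verify
setup -k -r .
scons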

@jsick How do I post an attachment to a community post? I’d like to post the build log and failed.xml.

A snippet of the end of the .failed file will be fine. I was able to build verify using lsstsw yesterday, so this will be interesting.

============================= test session starts ==============================
platform darwin -- Python 2.7.12, pytest-3.2.0, py-1.4.34, pluggy-0.4.0
rootdir: /Volumes/PS1/lsstsw/build/verify, inifile: setup.cfg
plugins: session2file-0.1.9, xdist-1.19.2.dev0+g459d52e.d20170907, forked-0.3.dev0+g1dd93f6.d20170907, flake8-0.8.1
gw0 I / gw1 I / gw2 I / gw3 I / gw4 I / gw5 I / gw6 I / gw7 I / gw8 I / gw9 I / gw10 I / gw11 I / gw12 I / gw13 I / gw14 I / gw15 I
gw0 [533] / gw1 [533] / gw2 [533] / gw3 [533] / gw4 [533] / gw5 [533] / gw6 [533] / gw7 [533] / gw8 [533] / gw9 [533] / gw10 [533] / gw11 [533] / gw12 [533] / gw13 [533] / gw14 [533] / gw15 [533]

scheduling tests via LoadScheduling
........................................F..FF..FFFF....................................................................................................................F....F.......................F......F................................s.........s.......................F..F...F.F......ss.F.s...s...............F.........................................F.............................................................F....F......................................................................................s.........s....s..ss..s...
 generated xml file: /Volumes/PS1/lsstsw/build/verify/tests/.tests/pytest-verify.xml
=================================== FAILURES ===================================
_____________ FLAKE8-check(ignoring E133 E226 E228 N802 N803 N806) _____________
[gw10] darwin -- Python 2.7.12 /Users/wmwv/lsstsw/miniconda/bin/python
build/bdist.macosx-10.6-x86_64/egg/pytest_flake8.py:115: in runtest
    ???
/Users/wmwv/lsstsw/miniconda/lib/python2.7/site-packages/py/_io/capture.py:150: in call
    res = func(*args, **kwargs)
build/bdist.macosx-10.6-x86_64/egg/pytest_flake8.py:187: in check_file
    ???
build/bdist.macosx-10.6-x86_64/egg/flake8/main/application.py:229: in make_file_checker_manager
    ???
build/bdist.macosx-10.6-x86_64/egg/flake8/checker.py:89: in __init__
    ???
/Users/wmwv/lsstsw/miniconda/lib/python2.7/multiprocessing/__init__.py:232: in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
/Users/wmwv/lsstsw/miniconda/lib/python2.7/multiprocessing/pool.py:159: in __init__
    self._repopulate_pool()
/Users/wmwv/lsstsw/miniconda/lib/python2.7/multiprocessing/pool.py:223: in _repopulate_pool
    w.start()
/Users/wmwv/lsstsw/miniconda/lib/python2.7/multiprocessing/process.py:130: in start
    self._popen = Popen(self)
/Users/wmwv/lsstsw/miniconda/lib/python2.7/multiprocessing/forking.py:121: in __init__
    self.pid = os.fork()
E   OSError: [Errno 35] Resource temporarily unavailable

E   OSError: [Errno 35] Resource temporarily unavailable

seems to be the key line.

Yes. Do you have 16 cores?

8 real cores.
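
On macOS both counts can be checked directly; `hw.ncpu` is the logical (hyperthreaded) count and `hw.physicalcpu` the physical one:

sysctl -n hw.ncpu         # logical CPUs, e.g. 16
sysctl -n hw.physicalcpu  # physical cores, e.g. 8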

You can drag the file into your editing window. If xml isn’t whitelisted (yet), you might want to add a .txt extension.

Thanks. “Upload” was what I was looking for. I understand now that attachments are associated with a comment, not the thread (the opposite of JIRA tickets).

But I am guessing the operating system reports 16 cores if you query it (hyperthreading). That means that scons is doing the “right” thing. When you say your build succeeds with a local checkout, I am guessing that’s because you are running plain scons and not scons -j16.
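
One way to test that guess is to force the parallelism by hand in the package directory (a sketch; -j is scons’s standard parallel-jobs flag):

scons -j16   # should reproduce the failure if over-parallelism is the cause
scons        # serial, as in the local build that worked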

You might have to set EUPSPKG_NJOBS=8 to get the build to work if your machine has too many cores for the default parallelism to be usable.

Does it look like it’s running out of processes or RAM?

It could be a low ulimit for processes or open files. ulimit -a will list all user limits.
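
The two limits most likely to matter for a failed fork() can also be queried individually (standard bash/zsh builtins):

ulimit -u    # max user processes
ulimit -n    # max open files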

@mwv does pytest -n 16 (if run in the verify dir) also fail for you?
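
That is, roughly (the path is taken from the log above, and `setup -k -r .` assumes an EUPS environment is already loaded):

cd /Volumes/PS1/lsstsw/build/verify
setup -k -r .
pytest -n 16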

Setting

export EUPSPKG_NJOBS=8

makes the build succeed.
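
To make that the default, the variable can be exported persistently or set just for one command (a sketch, assuming a bash-style shell):

# one-off
EUPSPKG_NJOBS=8 rebuild -u verify

# or persistently, e.g. in ~/.bash_profile
export EUPSPKG_NJOBS=8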

I just ran into this again with meas_base this morning.

I don’t really know what to say. It doesn’t seem to be because the tests themselves use lots of resources: I just ran a 16-worker test with verify and didn’t even see the memory usage blip. What does ulimit -a say for you?

I get:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1418
virtual memory          (kbytes, -v) unlimited

[serenity ~] ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 7168
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

Try changing that one.
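
Assuming “that one” means the open-files limit (256 here versus 7168 on serenity), it can be raised for the current shell, up to the hard limit (a sketch for bash/zsh on macOS):

ulimit -n 4096   # raise max open files for this shell
ulimit -Hn       # show the hard limit, which the soft value cannot exceed
ulimit -a        # confirm the new values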