Using pytest for test execution

I think I would prefer to have $NJOBS supported.

i.e., do you want $NJOBS to always be used to decide how many processes to use, even if that is inefficient because there are only two test files and both are quick? Or do you want a flag in the SConscript file to decide whether $NJOBS should be used?

Sorry, I think it’s good that we use $NJOBS if we are running in parallel, but we should only run in parallel if it’s desirable.

Missed this. So you want a switch in the SConscript file, which could be something like runTestsSerially defaulting to False (use whatever SCons is using from -j), or runTestsInParallel defaulting to True (still implicitly using -j rather than pytest's -n auto). It's easy to add; most of the time will probably go into working out the naming and behavior.
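As a sketch of what such a switch might look like in a package's tests/SConscript (note: the `runTestsSerially` flag name is hypothetical, since the final spelling is still being decided; `scripts.BasicSConscript.tests` is the existing sconsUtils entry point):

```python
# tests/SConscript -- sketch only; runTestsSerially is a hypothetical flag
from lsst.sconsUtils import scripts

# Run this package's tests one at a time even when scons is invoked
# with -j N; other packages would keep the parallel default.
scripts.BasicSConscript.tests(runTestsSerially=True)
```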

Here is the current status of pytest:

  • sconsUtils now runs pytest and by default will use the *.py files detected by sconsUtils.
  • The output from pytest goes to standard out (the _build.log) and to the tests/.tests directory.
  • If pytest fails, the standard .failed file appears.
  • You can explicitly single out test files that do not work when run alongside other tests.
  • If you use pyList=[] in the SConscript file pytest will use automatic test discovery – you can enable this once your package is RFC-229 compliant.
  • pytest will use the number of jobs that scons is using (scons -j aka $NJOBS from eupspkg) using pytest-xdist.
  • If auto test discovery is enabled you can also enable automated flake8 testing by using a setup.cfg file. I have already put one in meas_base as an example.
  • lsst_sims and lsst_ci pass without any problems with serial pytest execution. There is a race condition in obs_base that I am looking at (similar to one in daf_persistence that I just fixed).
  • We have demonstrated JUnit visualization in Jenkins but @josh has to implement the xml discovery mechanism.
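For reference, enabling automatic test discovery as described above is a one-line change to the tests/SConscript file (this mirrors the `pyList=[]` behaviour listed above; it is not runnable outside an SCons build):

```python
# tests/SConscript
from lsst.sconsUtils import scripts

# An empty pyList tells sconsUtils to let pytest discover the test
# files itself rather than passing it an explicit list of *.py files.
scripts.BasicSConscript.tests(pyList=[])
```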

I believe I have addressed everyone’s concerns. Please let me know if people see any show stoppers. In particular, @rowen and @rhl, are you (relatively) happy? I will move this summary to the RFC in a bit.

@ktl has some concerns over the default behavior of pytest-xdist, although the current implementation does emulate previous behavior (except that pytest-xdist can run different tests from the same test file on different processes: much cleverer than I expected, but that is what triggers the races I have found relating to non-randomized temp directory usage).


Wait, I have concerns? I think that any absolute penalty for parallel execution is likely to be small and that it might pick up test issues that we might otherwise miss. So I’m fine with the above.

I think the absolute overhead from launching ~8 subprocesses is about 5 seconds. You notice that if you have a package with only a couple of test files that each take less than a second to run. You don't notice it for afw, where running in parallel more than saves those 5 seconds.

If we want to add fine-grained control per-package of whether to multiprocess and how many processes to use then it’s relatively straightforward to add that later.

I've fixed race conditions in daf_persistence, afw, and obs_base so far. In each case the tests were reusing either a temporary directory or a temporary file with a fixed name. I have just seen that sims_catUtils also has a problem.
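The general shape of the fix is the same in each case: replace the fixed name with a randomized one from the standard tempfile module, so two pytest-xdist workers can never collide on the same path. An illustrative sketch (not the actual diff from any of those packages):

```python
import os
import shutil
import tempfile
import unittest


class OutputFilesTestCase(unittest.TestCase):
    """Illustrative test case that avoids the fixed-name race."""

    def setUp(self):
        # mkdtemp returns a freshly created, uniquely named directory,
        # so concurrent test processes each get their own workspace.
        self.testDir = tempfile.mkdtemp(prefix="race-demo-")

    def tearDown(self):
        shutil.rmtree(self.testDir, ignore_errors=True)

    def testWriteOutput(self):
        # Any output files are written inside the per-test directory.
        outPath = os.path.join(self.testDir, "output.fits")
        with open(outPath, "w") as f:
            f.write("data")
        self.assertTrue(os.path.exists(outPath))
```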

One more wrinkle: given that pytest-xdist allocates individual tests to different processes and ordering is no longer guaranteed, the leak checks become ineffective: all of the leak tests can end up running in their own process, where they will obviously not find any leaks.

The only way leak testing can work in the new parallel world is for the checks to be configured in setUp and evaluated in every tearDown. Alternatively, we could have one weekly Jenkins job that does a build with $NJOBS=1.
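A per-test version of the check can be sketched like this (illustrative only: the `Widget` class and the object-counting mechanism are stand-ins for whatever resource the real leak detector measures):

```python
import gc
import unittest


class Widget:
    """Stand-in for a resource whose instances must not leak."""


class LeakCheckedTestCase(unittest.TestCase):
    """Record a baseline in setUp and compare in tearDown, so the leak
    check is valid regardless of which xdist worker runs the test."""

    def setUp(self):
        gc.collect()
        self.baseline = sum(isinstance(o, Widget) for o in gc.get_objects())

    def tearDown(self):
        gc.collect()
        current = sum(isinstance(o, Widget) for o in gc.get_objects())
        self.assertEqual(current, self.baseline, "Widget instances leaked")

    def testNoLeak(self):
        w = Widget()
        del w  # released before tearDown runs, so no leak is reported
```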

Status report:

I have been able to build lsst_apps and lsst_sims. I had to make extensive changes to Sims packages to get things to work (they write a lot of files in their tests) and I have open PRs for sims_catUtils, sims_catalogs, sims_maf, sims_GalSimInterface, and sims_alertsim. sims_skybrightness has a resource issue on Jenkins (it uses too much memory if 8 subprocesses each attempt to load all the sky models; it works fine on my laptop, which has fewer cores) that I need to talk to @yoachim about. There is a problem with ctrl_execute that @srp is looking into.

To do your own testing, run rebuild with:

rebuild -r tickets/DM-11514 -r tickets/DM-11514-base -r tickets/DM-11514-pipe_tasks-xdist -r tickets/DM-11614 -r tickets/SIM-2422 -r tickets/SIM-2424 -r tickets/SIM-2425 -r tickets/SIM-2426 -r tickets/SIM-2427

The base and pipe_tasks branches can’t be merged until the sconsUtils branch is merged because they use new features to enable tests to be run standalone. The jointcal change is minor: the global filter list needs to be reset for each test.

The number of outstanding tickets that need to be merged suggests that we won't be able to switch to pytest this week. @josh is working on Jenkins integration of the JUnit files. Regarding RFC-370, there have been no comments against pytest, so it looks like I can adopt it, pending a clean Jenkins run of everything.