Using pytest for test execution

I don’t really understand what this means, but I think it may be a misunderstanding of how things work now. When we run scons or scons tests now, all of the Python tests are supposed to be run if anything has changed. The C++ tests are indeed run based on whether any of their dependencies have changed, but SCons doesn’t do any dependency tracking for Python files. The only time those commands run only the tests with .failed files is when SCons believes nothing has changed (it often fails to notice when Python code has changed, but I think that’s a bug, not a feature).

even if you add the shebang target to your list?

Some people have advocated removing tests from the default scons target list.

Fundamentally, this all comes back to @ktl’s point above. If the testing overhead drops by a factor of five (which seems to be the case when using pytest to do it all rather than scons to run each one separately) then we have significant gains for the general case of doing a build that is mostly going to pass all the tests.

In the proposed scheme scons will always run pytest once, and we will rely on `--lf` to decide whether to run all the tests or only the ones that failed last time. There will be some cases where you don’t want tests to run because you just tweaked some documentation, but are they enough of a reason to throw away all the other performance gains that come with this change?

It’s obviously a tradeoff, but some tests (e.g. the Kron ones) take quite a long time to run; the PFS tests are even worse, but that’s outside MREFC scope. Still, the fact that you can specify that you don’t want tests run may be enough, and some variant of scons python && python tests/test_foo.py will presumably still work.

Jim’s right about dependency tracking; I didn’t try to write a scanner to run a minimal number of tests.

And yes, adding shebang works (and I do that when I remember).

We would need an RFC to stop that working.

Note that pytest has significantly more options for test filtering, so you could use $PYTEST_ADDOPTS to add filtering options by default; pytest would then only run the subset you care about.

I have made the changes described above:

  • pytest can now run all the *.py tests using a single pytest process (with --lf).
  • pytest can use auto test discovery mode if the SConscript tests() call has pyList=[] as an argument.
  • For tests that should not be run in a shared process (for example, one test in base that relies on LSST classes not having been loaded yet) you can now mark those tests using the pySingles argument to tests(); see the SConscript sketch after this list. You are warned if a single-process test has a name starting with test_. This does mean you get a .failed file for that one, and therefore possibly multiple JUnit files for Jenkins to find.
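
Here is a minimal sketch of what a tests/SConscript using these two options might look like, assuming the usual scripts.BasicSConscript.tests() entry point; the pySingles file name is purely illustrative.

```python
# tests/SConscript -- minimal sketch of the pyList/pySingles options described above.
from lsst.sconsUtils import scripts

scripts.BasicSConscript.tests(
    pyList=[],                          # empty list: let pytest auto-discover the test files
    pySingles=["testImportOrder.py"],   # hypothetical test that must run in its own process
)
```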

As an added bonus, if you use auto-discovery mode you can then enable flake8 testing of all the python files. I have enabled that in sconsUtils on the relevant ticket branch.

We should also be enabling the --doctest-modules pytest argument to verify examples in docstrings with doctest. This worked very well with the verify package.
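
For anyone who hasn’t used it: with --doctest-modules pytest imports each module and runs the interactive examples embedded in its docstrings as tests. A trivial illustration (the function is made up):

```python
# Trivial illustration of a docstring example that --doctest-modules would execute.
def scale(value, factor=2):
    """Multiply value by factor.

    >>> scale(3)
    6
    >>> scale(3, factor=10)
    30
    """
    return value * factor
```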

Yes, but that is at the per-package setup.cfg level.

RFC filed: RFC-370.

I made some changes to the pytest handling in sconsUtils today so that the output from pytest is sent to standard out and also to tests/.tests/ (and is renamed to .failed on failure). This seems to work reasonably well; thanks to @josh for pointing out the existence of the pytest-session2file plugin.

The only remaining issue is, I believe, multi-process dispatch of tests. There is a pytest-xdist plugin that allows this, and @parejkoj is looking into eups packaging for it to give us the option of using multi-processing. I’m a little reluctant to enable this globally inside sconsUtils, as it is a lot of overhead for packages that have a handful of quick tests. It might be best if we enable it on a per-package basis by adding -n auto to a setup.cfg file (as we are doing for --flake8 settings). Packages that would benefit then get the multiple processes all the time. The downside is that you can’t then use $NJOBS to control that in the standard way. If we want to support $NJOBS then I think we’d need a flag in the SConscript file in the tests/ directory along the lines of canMultiProcess=Bool. Thoughts?

I think I would prefer to have $NJOBS supported.

i.e., you want $NJOBS to always be used to decide how many processes to use, even if that is inefficient because there are only two test files and they are both quick? Or, you want a flag in the SConscript file to decide whether $NJOBS should be used?

Sorry, I think it’s good that we use $NJOBS if we are running in parallel, but we should only run in parallel if it’s desirable.

Missed this. So you want a switch in SConscript, something like runTestsSerially defaulting to False (use whatever SCons is using from -j)? Or runTestsInParallel defaulting to True (still implicitly using -j, rather than True meaning -n auto)? It’s easy to add; I’ll probably spend more time working out the naming and behavior.
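
To make that concrete, here is a purely hypothetical sketch of what such a switch could look like in a tests/SConscript; neither the flag name nor its default has been settled.

```python
# Purely hypothetical sketch of the proposed per-package switch; the flag name
# (runTestsSerially) and its semantics come from the discussion above, not an
# existing sconsUtils API.
from lsst.sconsUtils import scripts

scripts.BasicSConscript.tests(
    runTestsSerially=True,   # proposed: keep this package’s tests in a single pytest process
)
```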

Here is the current status of pytest:

  • sconsUtils now runs pytest and by default will use the *.py files detected by sconsUtils.
  • The output from pytest goes to standard out (the _build.log) and to the tests/.tests directory.
  • If pytest fails, the standard .failed file turns up.
  • You can specifically call out test files that do not work if run with other tests.
  • If you use pyList=[] in the SConscript file, pytest will use automatic test discovery; you can enable this once your package is RFC-229 compliant.
  • pytest will use pytest-xdist with the number of jobs that scons is using (scons -j, aka $NJOBS from eupspkg).
  • If auto test discovery is enabled you can also enable automated flake8 testing by using a setup.cfg file. I have already put one in meas_base as an example.
  • lsst_sims and lsst_ci pass without any problems with serial pytest execution. There is a race condition in obs_base that I am looking at (similar to one in daf_persistence that I just fixed).
  • We have demonstrated JUnit visualization in Jenkins, but @josh has to implement the XML discovery mechanism.

I believe I have addressed everyone’s concerns. Please let me know if people see any show stoppers. In particular, @rowen and @rhl, are you (relatively) happy? I will move this summary to the RFC in a bit.

@ktl has some concerns over the default behavior of pytest-xdist, although the current implementation does emulate the previous behavior (except that pytest-xdist can run different tests from the same test file on different processes: much cleverer than I expected it to be, but that is what triggers the races I have found relating to non-randomized temp directory usage).

Wait, I have concerns? I think that any absolute penalty for parallel execution is likely to be small and that it might pick up test issues that we might otherwise miss. So I’m fine with the above.

I think the absolute overhead from launching ~8 subprocesses is about 5 seconds. You notice that if you have a package that only has a couple of test files that take less than a second to run. You don’t notice it for afw, where you more than recover those 5 seconds by running in parallel.

If we want to add fine-grained control per-package of whether to multiprocess and how many processes to use then it’s relatively straightforward to add that later.

I’ve fixed race conditions in daf_persistence, afw, and obs_base so far. They were all either reusing a temporary directory with a fixed name or reusing a temporary file with a fixed name. I have just seen that sims_catUtils also has a problem.
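
The general shape of those fixes is to replace the hard-coded scratch path with a uniquely named one created per test, so that xdist workers cannot collide. A minimal sketch (the test content is invented):

```python
# Minimal sketch of the fix pattern: a uniquely named scratch directory per test,
# created in setUp and removed in tearDown, so parallel xdist workers never write
# to the same path.
import os
import shutil
import tempfile
import unittest


class OutputTestCase(unittest.TestCase):
    def setUp(self):
        self.tmpdir = tempfile.mkdtemp(prefix="test_output-")

    def tearDown(self):
        shutil.rmtree(self.tmpdir, ignore_errors=True)

    def test_writes_file(self):
        path = os.path.join(self.tmpdir, "result.txt")
        with open(path, "w") as fh:
            fh.write("data")
        self.assertTrue(os.path.exists(path))
```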

One more wrinkle: given that pytest-xdist allocates individual tests to different processes and test order is no longer guaranteed, the leak checks become essentially meaningless: all the leak tests can end up running in their own process, where they will obviously not find any leaks.

The only way leak testing can ever work in the new parallel world is for the checks to be configured in setUp and verified in every tearDown. Alternatively, we could have one weekly Jenkins job that does a build with $NJOBS=1.
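
A rough sketch of what the per-test variant could look like; the Widget class is just a stand-in for whatever resource the real leak checker counts.

```python
# Rough sketch of a per-test leak check: capture a resource count in setUp and
# compare it in tearDown, so the check travels with each test regardless of
# which xdist worker runs it. Widget is a stand-in for the real tracked resource.
import gc
import unittest


class Widget:
    """Stand-in for a resource whose instances should not leak."""


class LeakCheckedTestCase(unittest.TestCase):
    def setUp(self):
        gc.collect()
        self._before = sum(isinstance(obj, Widget) for obj in gc.get_objects())

    def tearDown(self):
        gc.collect()
        after = sum(isinstance(obj, Widget) for obj in gc.get_objects())
        self.assertEqual(after, self._before, "test leaked Widget instances")

    def test_no_leak(self):
        w = Widget()  # local, so it is freed before tearDown runs
        self.assertIsInstance(w, Widget)
```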

Status report:

I have been able to build lsst_apps and lsst_sims. I had to make extensive changes to the Sims packages to get things to work (they write a lot of files in their tests) and I have open PRs for sims_catUtils, sims_catalogs, sims_maf, sims_GalSimInterface, and sims_alertsim. sims_skybrightness has a resource issue on Jenkins (it uses too much memory if 8 subprocesses each attempt to load all the sky models; it works fine on my laptop, which has fewer cores) that I need to talk to @yoachim about. There is a problem with ctrl_execute that @srp is looking into.

To do your own testing at the moment, run rebuild with:

rebuild -r tickets/DM-11514 -r tickets/DM-11514-base -r tickets/DM-11514-pipe_tasks-xdist -r tickets/DM-11614 -r tickets/SIM-2422 -r tickets/SIM-2424 -r tickets/SIM-2425 -r tickets/SIM-2426 -r tickets/SIM-2427

The base and pipe_tasks branches can’t be merged until the sconsUtils branch is merged because they use new features to enable tests to be run standalone. The jointcal change is minor: the global filter list needs to be reset for each test.
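
For context, “reset for each test” just means clearing the global state in setUp rather than relying on test ordering. A generic sketch follows; the registry and reset call are stand-ins, not the actual jointcal/afw API.

```python
# Generic sketch of resetting module-level state per test; FILTER_REGISTRY and
# reset_filters() are stand-ins, not the actual jointcal/afw API.
import unittest

FILTER_REGISTRY = {}


def reset_filters():
    """Clear the module-level filter registry."""
    FILTER_REGISTRY.clear()


class FilterTestCase(unittest.TestCase):
    def setUp(self):
        # start from a known-empty registry, regardless of which tests ran
        # before this one or on which xdist worker it runs
        reset_filters()

    def test_register(self):
        FILTER_REGISTRY["g"] = 477.0
        self.assertIn("g", FILTER_REGISTRY)
```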

The number of outstanding tickets that need to be merged suggests that we won’t be able to switch to pytest this week. @josh is working on Jenkins integration of the JUnit files. Regarding RFC-370, there have been no comments against pytest, so it looks like I can adopt it, pending a clean Jenkins run of everything.