Using pytest for test execution

unittest
dm-dev
Tags: #<Tag:0x00007f61b2ceeaa8> #<Tag:0x00007f61b2cee918>

(Tim Jenness) #1

In DM-11514 I have got to a point where “everything” (lsst_ci, lsst_distrib, lsst_obs, lsst_sims) can build and have all tests pass if, for each test file, we replace the python testfile.py call with pytest testfile.py. This was the simplest possible change to the test execution environment but has the downside that we have a slightly increased overhead (possibly of order 0.5 second) when executing each file’s tests because of the pytest infrastructure spinning up and the creation of the JUnit XML output files.

Here are some test execution statistics for afw (117 files; 1227 tests passed; 13 skipped; 2 CPUs):

  • Running all tests with one pytest process: 1minute 45 seconds.
  • Running all tests with one pytest using pytest-xdist parallelization (auto mode: 4 “threads”): 1m5s.
  • Running all tests serially with one pytest process per test file: 10m30s minutes (!!)
  • Running all tests serially with one python process per test file (the current mode): 9m30s.

These statistics are frightening; the import overhead is killing us and pytest is adding about 10% overhead to those imports. Even in multi-threaded mode with scons managing the subprocesses, our builds are slower than they could be.

This run time discrepancy suggests to me that I should not merge the current sconsUtils changes but should make a more extensive change to trigger a single pytest process. This would result in the per-testfile .failed files disappearing (binary executables should be run by the test_executables.py test file and should not be run explicitly by scons) and being replaced by a JUnit XML file and a file containing the output from pytest itself (which in theory I could call .failed if we wanted to do that).

Jenkins will use the XML file to give a detailed test report.

Another advantage of using pytest is that we can automatically add flake8 testing via a plugin if pytest is run in auto discovery mode. I think the way forward would be:

  • By default collect all the python test files and explicitly pass them to pytest.
  • If a SConscript flag is set indicating that auto-discovery is fine, skip the collection and just run pytest without arguments. (if all the test files are renamed following RFC-229 before we enable this mode then no flag is needed).
  • If a product is flake8-clean add the --flake8 option to a setup.cfg file to enable that (do not use a SConscript flag so that pytest will work correctly from the command-line.

Does anyone see a problem with changing test execution in a more extreme manner described above? This should make builds faster.


(josh) #2

I don’t think the .failed files are necessary if;

  • scons test, which is presumablly what a developer working locally would run, sends the pytest output to the console and the developer as the option of enabling the pedantic per test output with the -v flag (eg., PYTEST_ADDOPTS="-v")
  • the CI system (jenkins), is collecting the junit xml output and formatting a test report
  • the pytest console output appears in the _build.log when being driven by eupspkg/_build.sh

(Tim Jenness) #3

I am reasonably comfortable with a clean break from the old system as you describe. I welcome counter arguments though.


(Robert Lupton) #4

I don’t know enough about pytest so this may be a stupid question, but the .failed files are used to make scons only rerun failed tests (I don’t think we always set the dependencies correctly to rerun when we change python, but that’s a different problem). Will the new approach always run all tests? If so, I think it’s a problem.


(K-T Lim) #5

If running all the tests in the new mode takes the same time as running 12-20 tests in the old mode, and less than 2 minutes overall, it’s not clear that rerunning only failed tests is a win.


(Jonathan Sick) #6

pytest has built-in functionality that can let us deprecate the .failed files. We can use these command-line arguments:

--lf, --last-failed   rerun only the tests that failed last time (or all if none failed)
--ff, --failed-first  run all tests but run the last failures first.

See https://docs.pytest.org/en/latest/cache.html for details.


(Paul Price) #7

The *.failed files have been useful for when users have reported test failures — it’s then simple for them to post the log of the failure. If the *.failed files are removed, please make the path to posting logs of the failures clear for the end-user.


(Tim Jenness) #8

One option is to get the users to send us their JUnit .xml file. That contains all the information and is relatively readable if you run it through xmllint --format first. I’m not sure if there is a JUnit parsing command-line tool or some such thing.


(Tim Jenness) #9

This sounds good. I think the execution times described above indicate that using the per-file testing approach has too much overhead in the general case. I think it’s reasonable to add --lf to the scons pytest options when we run all tests.


(Robert Lupton) #10

I don’t think this is quite good enough. The --ff sounds good when you’re working on fixing on a test failure, but people (or at least I) tend to build all targets: export SCONSFLAGS="-Q opt=3 -j 6; scons. When you’re just rebuilding something that isn’t explicitly tested that’ll run all the tests (and it’s really hard to do algorithmic development by TDD). I know how to avoid this: scons lib python (but that doesn’t rebuild the ./bin scripts on os/x); I think we’d prefer people to be using a vanilla scons.

Now, I’m hoping that an ever decreasing portion of DM won’t be rebuilding C++ very often so maybe this isn’t too bad.


(Jim Bosch) #11

I don’t really understand what this means, but I think it may be a misunderstanding of how things work now. When we run scons or scons tests now, all of the Python tests are supposed to be run if anything has changed. The C++ tests are indeed run based on whether any of their dependencies have changed, but SCons doesn’t do any dependency tracking for Python files. The only time those commands run only the tests with .failed files are when SCons believes nothing has changed (it often fails to notice when Python code has changed, but I think that’s a bug, not a feature).


(Tim Jenness) #12

even if you add the shebang target to your list?

Some people have advocated removing tests from the default scons target list.

Fundamentally, this all comes back to @ktl’s point above. If the testing overhead drops by a factor of five (which seems to be the case when using pytest to do it all rather than scons to run each one separately) then we have significant gains for the general case of doing a build that is mostly going to pass all the tests.

In the proposed scheme scons will always run pytest once and we will rely on `–lf`` to decide whether to run all the tests or only the ones that failed last time. There will be some cases where you don’t want tests to run because you just tweaked some documentation but are they enough of a reason to throw away all the other performance gains that come with this change?


(Robert Lupton) #13

It’s obviously a tradeoff, but some tests (e.g. the Kron ones) run quite a long time – the PFS tests are even worse, but that’s outside MREFC scope. Still, given that you can specify that you don’t want tests run may be enough – some variant of scons python && python tests/test_foo.py will presumably still work.

Jim’s right about dependency tracking; I didn’t try to write a scanner to run a minimal number of tests.

And yes, adding shebang works (and I do that when I remember).


(Tim Jenness) #14

We would need an RFC to stop that working.

Note that pytest has significantly more options for test filtering so you could use $PYTEST_ADDOPTS to add filtering options by default and then pytest would only run a subset you care about.


(Tim Jenness) #15

I have made the changes described above:

  • pytest can now run all the *.py tests using a single pytest process (with --lf).
  • pytest can use auto test discovery mode if the SConscript tests() call has pyList=[] as an argument.
  • For tests that should not be run in a shared process (for example, one test in base that relies on LSST classes not having been loaded yet) you can now mark those tests using the pySingles argument to tests(). You are warned if a single-process test has a name starting with test_. This does mean you get a .failed file for that one and therefore possibly multiple JUnit files for Jenkins to find.

(Tim Jenness) #16

As an added bonus, if you use auto-discovery mode you can then enable flake8 testing of all the python files. I have enabled that in sconsUtils on the relevant ticket branch.


(Jonathan Sick) #17

We should also be enabling the --doctest-modules pytest argument to verify examples in docstrings with doctest. This worked very with the verify package.


(Tim Jenness) #18

Yes, but that is at the per-package setup.cfg level.


(Tim Jenness) #19

RFC filed: RFC-370.


(Tim Jenness) #20

I made some changes to pytest in sconsUtils today so that the output from pytest is sent to standard out and also to tests/.tests/ (and is renamed to .failed on failure). This seems to work reasonably well and thanks to @josh for pointing out the existence of pytest-session2file plugin.

The only remaining issue is, I believe, multi-process dispatch of tests. There is a pytest-xdist plugin which allows this and @parejkoj is looking into eups packages for it to give us the option of using multi-processing. I’m a little bit reticent to enable this globally inside sconsUtils as it is a lot of overhead for packages that have a handful of quick tests. It might be best if we enable it on a per-package basis by adding -n auto to a setup.cfg file (as we are doing for --flake8 settings). Packages that would benefit then get the multiple processes all the time. The downside is that you can’t then use $NJOBS to control that in the standard way. If we want to support $NJOBS then I think we’d need a flag in the SConscript file in the tests/ directory along the lines of canMultiProcess=Bool. Thoughts?