First of all, we clearly need a real test framework to run our unittest unit tests. @timj has been experimenting with nose, which is used by numpy, but I also know that py.test is a newer entry in the market. On the surface, both seem very similar, so it’s hard to tell which is better. So I asked the community what they like to use, and this is what I got:
- we’d be happy with either nose or py.test for basic functionality
- py.test has a better plugin architecture, so we (SQuaRE and friends) would likely be much happier writing our own plugins against py.test
- py.test and nose are largely compatible in terms of the tests they run when used in a basic sense, so we could even try both initially if we really wanted to
- I’m perfectly okay with switching to py.test. It wouldn’t hurt to try both initially and ensure that we get the same failure modes when we port things over.
What I’ve not seen – either in the above or in previous discussions – is a straightforward enumeration of our requirements from a test framework. I understand that the ability to produce xUnit output which can be ingested by Jenkins is important, but is there anything else?
My own feeling is that the “activation energy” for requiring our developers and contributors to learn a new, non-standard-library test framework (or, indeed, any new framework) should be quite high: I’d like to see a really concrete discussion of what we’re buying into.
My opening position here is that I’m not asking for any move away from the unittest approach to testing. To start with, py.test or nose would just be used for driving the tests and gathering the output. I’m not worrying about using py.test-specific features to begin with, and they are not what is driving this change.
The reason tests have to change is that in many cases they are not actually compliant with the unittest philosophy. In particular, the way tests are skipped differs wildly from test to test, and only occasionally are tests skipped properly. Furthermore, the use of explicit suites is not compatible with test drivers that automatically discover which tests are present, so removing those suites is a side effect of this work.
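For concreteness, here is a minimal sketch of what “skipping properly” looks like with stock unittest and automatic discovery; the module, class, and optional dependency names are invented for illustration:

```python
# A sketch only: the class and optional dependency are stand-ins, used to
# illustrate discovery-friendly skipping with stock unittest.
import os
import unittest

try:
    import scipy  # stand-in for an optional dependency a test might need
except ImportError:
    scipy = None


class FitterTestCase(unittest.TestCase):

    @unittest.skipIf(scipy is None, "scipy is not available")
    def test_fit(self):
        self.assertAlmostEqual(1.0, 1.0)

    @unittest.skipUnless("TEST_DATA_DIR" in os.environ,
                         "TEST_DATA_DIR is not set")
    def test_against_data(self):
        self.assertTrue(os.path.isdir(os.environ["TEST_DATA_DIR"]))


# No suite() function and no hand-rolled runner: python -m unittest, nose,
# and py.test can all discover FitterTestCase on their own, and the skips
# are reported as skips rather than silently doing nothing.

if __name__ == "__main__":
    unittest.main()
```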
The other outcome of using a test driver is that the tests tend to run in a different namespace (not __main__), and some tests seem to misbehave when they are not run in a separate process (the SIP fitting in meas_astrom has that problem). We probably don’t want tests that fail simply because some other code has run first, so these occurrences worry me and need to be examined further.
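To illustrate the namespace point (a sketch only; the file and test names are made up), a test module is imported under its own name by a driver, so anything guarded by __main__ never runs:

```python
# test_example.py -- a sketch of why the execution namespace matters.
# Run directly, this module is "__main__"; collected by py.test, nose, or
# unittest discovery, it is imported as "test_example".
import unittest


class NamespaceTestCase(unittest.TestCase):

    def test_report_namespace(self):
        # Under a driver this prints "test_example", not "__main__"; any
        # test or fixture that assumes the latter will behave differently.
        print("running in module:", __name__)


if __name__ == "__main__":
    # Anything done only here (one-off setup, resetting global state, the
    # implicit "fresh process" guarantee) is skipped entirely when a driver
    # imports the module, which is one way tests come to depend on running
    # first or in their own process.
    unittest.main()
```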
The memory test case seems to be important, and it is relatively easy to make it trigger without using an explicit suite. This doesn’t mean we will like the results, though.
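As a rough sketch, and assuming lsst.utils.tests provides MemoryTestCase and init() roughly as they do today (treat the exact names and behaviour as assumptions), the memory check could be made discoverable simply by defining it at module level:

```python
# A sketch only: assumes lsst.utils.tests exposes MemoryTestCase and init();
# the other test class is invented for illustration.
import unittest

import lsst.utils.tests


class SomethingTestCase(unittest.TestCase):
    def test_something(self):
        self.assertEqual(1 + 1, 2)


class MemoryTester(lsst.utils.tests.MemoryTestCase):
    """Defined at module level so that py.test, nose, or unittest discovery
    picks it up without any explicit suite() being built."""
    pass


def setup_module(module):
    # Module-level setup hook recognised by py.test and nose; resets the
    # leak-check baseline when a driver (rather than __main__) runs the file.
    lsst.utils.tests.init()


if __name__ == "__main__":
    lsst.utils.tests.init()
    unittest.main()
```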