Is nbval a viable candidate for testing our example Jupyter notebooks?

mwv · January 25, 2017, 3:32pm

Testing of example code in Jupyter notebooks will be helpful in ensuring that our tutorials and related examples remain current. Does anyone have any experience with nbval, which is supposed to integrate with the py.test framework?

Are there other similar packages that are worth investigating?

jsick · January 25, 2017, 3:55pm

Thanks @mwv, I still need to try it out, but I imagine that’s what we’ll want for documentation testing.

Mini-update: it seems to do the testing by comparing computed output against saved output cells. I’ll have to see if this can do everything for us, or if there’s an additional need for ‘hidden’ test cells that contain py.test test functions.
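
To make the "compare computed output against saved output" idea concrete, here is a toy sketch in plain Python. It is not nbval's implementation: the cell dictionaries, the `saved_output` key, and the `run_cell` and `check_notebook` helpers are all hypothetical, and it only looks at text printed to stdout.

```python
import contextlib
import io


def run_cell(source, namespace):
    """Execute one code cell and return whatever it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()


def check_notebook(cells):
    """Return the indices of cells whose fresh output differs from the saved output."""
    namespace = {}  # shared across cells, like a single kernel session
    failures = []
    for index, cell in enumerate(cells):
        fresh = run_cell(cell["source"], namespace)
        if fresh != cell["saved_output"]:
            failures.append(index)
    return failures


# Hypothetical two-cell "notebook"; the second saved output is deliberately stale.
toy_notebook = [
    {"source": "x = 2 + 2\nprint(x)", "saved_output": "4\n"},
    {"source": "print(x * 10)", "saved_output": "41\n"},
]
failed_cells = check_notebook(toy_notebook)
```

With nbval itself, the equivalent check is run through py.test with the plugin's `--nbval` option pointed at the `.ipynb` file.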

I also noticed nblint recently for flake8-ing notebook code, and then there’s nbdime for diffing notebooks. This looks like a good technology stack for CI’ing our docs.

swinbank · January 25, 2017, 3:59pm

I have no experience of nbval, so I can’t answer the question directly.

However, I wonder if we should actually take a step back and ask what our preferred mechanism for presenting tutorials and examples actually is (or, rather, will be). I’d prefer us to take a unified, coherent approach to this, rather than attempting to expand our systems (and our readers’ patience) to cover whatever formats we can think of.

For example, given the approach to pipelines documentation described in DMTN-030, I’d imagine one could construct a good argument that we should embed tutorials directly in the documentation, written as reStructuredText documents and tested appropriately, rather than trying to graft on notebooks (which, arguably, will never integrate properly with Sphinx).

I’m not taking a position on which route we should go down for now, but I do think we should step back and think about how we actually want our tutorials to work before we focus on the details of the technology.

timj · January 25, 2017, 4:00pm

Slight tangent here, but is the plan to replace all our current examples/ code with tested code? I get the impression that most of our examples will only work by fluke.

jsick · January 25, 2017, 4:02pm

I’ve got some ideas about a workflow where examples are written natively in Sphinx, but can be output into notebooks for users to try at home or on the Science Platform. You’re absolutely right that I should write up a proposal for this.

swinbank · January 25, 2017, 4:03pm

My opinion is that untested examples are actually worse than useless: they are actively instructing folks to do things which won’t work. From that point of view, I strongly think that ought to be the plan.

In terms of resource allocation, it rather depends on how our current work prototyping the ideas set forth in DMTN-030 proceeds, together with whether and how we converge in discussions like this one, and the availability of person-power to actually do the work.

jsick · January 25, 2017, 4:04pm

Yes, it’d also be worth having a discussion about this. I think we should eventually deprecate examples/ in their current form and only have examples that are driven by documentation and tested as part of documentation CI. So whether there’s anything in examples/ or not is an implementation detail.

KSK · January 25, 2017, 4:42pm

I agree strongly with this. I don’t have a notion of how much work it would take, but (at least for the Python examples) it would be useful to go through and convert the ones that can be doctested in rST and pitch the rest of them.
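
As a rough illustration of what "doctested in rST" could look like, the sketch below runs the interactive examples embedded in a small reStructuredText snippet using the standard library's `doctest` module. The snippet text and the `exposure_times` name are made up for illustration; a real setup would more likely point a doctest runner at the documentation files themselves.

```python
import doctest

# Hypothetical rST content: prose plus interactive examples with expected output.
RST_EXAMPLE = """
Adding exposure times
=====================

The total exposure time is the sum of the individual exposures:

>>> exposure_times = [15.0, 15.0]
>>> sum(exposure_times)
30.0
"""

# Pull the ">>>" examples out of the text and run them against their expected output.
parser = doctest.DocTestParser()
test = parser.get_doctest(RST_EXAMPLE, {}, "rst_example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # TestResults(failed=..., attempted=...)
```

For a file on disk, `python -m doctest example.rst` does much the same thing, and Sphinx ships a `sphinx.ext.doctest` extension for running examples as part of a documentation build.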

timj · January 25, 2017, 4:44pm

This might be an interesting JTM hack session: collaboratively migrating a few examples into a testable form.