Is nbval a viable candidate for testing our example Jupyter notebooks?

Testing of example code in Jupyter notebooks will be helpful in ensuring that our tutorials and related examples remain current. Does anyone have any experience with nbval, which is supposed to integrate with the py.test framework?

Are there other similar packages that are worth investigating?

Thanks @mwv, I need to try it out still but I imagine that’s what we’ll want for documentation testing. :+1:

Mini-update: it seems to do the testing by re-running each code cell and comparing the computed output against the saved output cells. I’ll have to see if this can do everything for us, or if there’s an additional need for ‘hidden’ test cells that contain py.test test functions.
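
To make the "compare computed output against saved output" idea concrete, here is a minimal sketch in plain Python. It is not nbval's implementation: the notebook content, the `check_notebook` helper, and the choice to compare only stdout streams are all assumptions made for illustration, and it runs cells with `exec` rather than through a Jupyter kernel.

```python
import contextlib
import io
import json

# A tiny hand-written notebook in the .ipynb JSON layout (hypothetical content).
# The second code cell deliberately has a stale saved output ("41" instead of "40").
NOTEBOOK_JSON = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code",
         "source": ["x = 2 + 2\n", "print(x)"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["4\n"]}]},
        {"cell_type": "code",
         "source": ["print(x * 10)"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["41\n"]}]},
    ]
})


def check_notebook(notebook_json):
    """Re-run each code cell and compare its stdout with the saved stdout.

    Returns a list of (cell_index, matches) pairs, one per code cell.
    """
    notebook = json.loads(notebook_json)
    namespace = {}  # shared across cells, like a kernel session
    results = []
    for index, cell in enumerate(notebook["cells"]):
        if cell["cell_type"] != "code":
            continue
        # Collect the stdout text that was saved with the notebook.
        saved = "".join(
            "".join(output["text"])
            for output in cell.get("outputs", [])
            if output.get("output_type") == "stream"
            and output.get("name") == "stdout"
        )
        # Execute the cell source and capture what it prints now.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec("".join(cell["source"]), namespace)
        results.append((index, buffer.getvalue() == saved))
    return results


results = check_notebook(NOTEBOOK_JSON)
```

Here the first code cell matches its saved output and the second does not, which is the kind of drift we would want CI to catch. As far as I can tell from the nbval documentation, the real plugin is run as `py.test --nbval notebook.ipynb`, so none of this would need to be written by hand.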

I also noticed nblint recently for flake8-ing notebook code, and then there’s nbdime for diffing notebooks. This looks like a good technology stack for CI’ing our docs.

I have no experience of nbval, so I can’t answer the question directly.

However, I wonder if we should actually take a step back and ask what our preferred mechanism for presenting tutorials and examples actually is (or, rather, will be). I’d prefer us to take a unified, coherent approach to this, rather than attempting to expand our systems (and our readers’ patience) to cover whatever formats we can think of.

For example, given the approach to pipelines documentation described in DMTN-030, I’d imagine one could construct a good argument that we should embed tutorials directly in the documentation, written as reStructuredText documents and tested appropriately, rather than trying to graft on notebooks (which, arguably, will never integrate properly with Sphinx).

I don’t take an opinion on which route we should go down for now, but I do think we should step back and think about how we actually want our tutorials to work before we focus on the details of the technology.


Slight tangent here, but is the plan to replace all our current examples/ code with tested code? I get the impression that most of our examples will only work by fluke.

I’ve got some ideas about a workflow where examples are written natively in Sphinx, but can be output into notebooks for users to try at home or on the Science Platform. You’re absolutely right that I should write up a proposal for this.

My opinion is that untested examples are actually worse than useless: they are actively instructing folks to do things which won’t work. From that point of view, I strongly think that ought to be the plan.

In terms of resource allocation, it rather depends on how our current work prototyping the ideas set forth in DMTN-030 proceeds, on whether and how we converge in discussions like this one, and on the availability of person-power to actually do the work.


Yes, it’d also be worth having a discussion about this. I think we should eventually deprecate examples/ in their current form and only have examples that are driven by documentation and tested as part of documentation CI. So whether there’s anything in examples/ or not is an implementation detail.

I agree strongly with this. I don’t have a notion of how much work it would take, but (at least for the Python examples) it would be useful to go through, convert the ones that can be doctested in rST, and pitch the rest of them.
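
For anyone unfamiliar with what "doctested in rST" could look like, here is a minimal sketch using only the standard-library `doctest` module. The rST text and the `run_rst_doctests` helper are hypothetical, and this is only one possible mechanism; Sphinx also ships a `sphinx.ext.doctest` extension that may turn out to be a better fit for our documentation CI.

```python
import doctest

# A hypothetical rST page with interactive examples embedded in the prose.
RST_EXAMPLE = """
Adding numbers
==============

Plain Python arithmetic works as you would expect:

>>> 1 + 2
3
>>> sorted([3, 1, 2])
[1, 2, 3]
"""


def run_rst_doctests(text, name="example.rst"):
    """Run every ``>>>`` example found in ``text``.

    Returns a (failed, attempted) pair of counts.
    """
    parser = doctest.DocTestParser()
    # Extract the examples from the surrounding prose into one DocTest.
    test = parser.get_doctest(text, globs={}, name=name, filename=name, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted


failed, attempted = run_rst_doctests(RST_EXAMPLE)
```

For files on disk, `doctest.testfile` does the same job in one call. Examples that cannot be expressed this way (long-running, data-dependent, or plot-producing ones) are presumably the ones that would need a different treatment or be dropped.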

This might be an interesting JTM hack session: collaboratively migrating a few examples into a testable form.