Last week we ran a small hack day at the end of our Joint Technical Meeting. The main focus of the hack session was to begin migrating our EUPS stack packages into the new framework for https://pipelines.lsst.io. See DMTN-030 for background on the technical design of the new system.
More information on the hack day:
This post will debrief the outcomes from this hack session.
We split the hack into two parts:
- Add Sphinx boilerplate files to the doc/ directory of a package.
- Migrate docstrings in that package to the Numpydoc format.
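For readers who have not seen the format, here is a minimal sketch of a Numpydoc-style docstring. The function scale_flux and its parameters are hypothetical, invented purely for illustration; they do not come from any stack package.

```python
def scale_flux(flux, factor=1.0):
    """Scale a flux value by a constant factor.

    Parameters
    ----------
    flux : float
        Input flux, in arbitrary units.
    factor : float, optional
        Multiplicative scale factor. Default is 1.0.

    Returns
    -------
    scaled : float
        The scaled flux.
    """
    # Hypothetical example body; the docstring layout is the point here.
    return flux * factor
```

The underlined section titles (Parameters, Returns) are what the Numpydoc tooling parses into a structured API reference.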
As of this writing:
- 11 packages have upgraded doc/ directories.
- 4 packages have fully-converted docstrings.
After the initial learning curve, it seemed that folks could tackle the doc/ directory conversions fairly easily (maybe 15 minutes of work per package, or less, once the tooling was installed and the instructions were understood).
The more time-consuming bit is the Numpydoc conversion. I continue to see GitHub activity related to this, and I think we’re on a slow-but-steady trajectory towards completion.
The good thing is that Numpydoc conversion does not have to be a big-bang merge. In fact, it’s possible to convert a single module’s (as in, Python file’s) docstrings at a time, have automodapi render documentation for that module, and have the package in a merge-able state. I’ll talk about this more at the end of the post.
Task configuration docs: the unexpected discovery
A fantastic outcome of the hack day was @parejkoj’s discovery that configuration classes document themselves.
pex_config configuration classes generate their docstrings (__doc__ attributes) based on user-supplied configuration parameters. The configuration field objects are attributes of a Task’s configuration class, so the API reference page of a configuration class effectively reads as the Task’s configuration.
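To make the idea of a self-documenting configuration class concrete, here is a simplified sketch of the general pattern. It is not the pex_config implementation: Field, ConfigMeta, Config, and ExampleTaskConfig are hypothetical names, and the metaclass approach is just one assumed way to assemble a __doc__ attribute from field documentation.

```python
class Field:
    """Hypothetical configuration field that carries its own documentation."""

    def __init__(self, doc, dtype, default=None):
        self.doc = doc
        self.dtype = dtype
        self.default = default


class ConfigMeta(type):
    """Metaclass that builds the class docstring from its Field attributes."""

    def __new__(mcls, name, bases, namespace):
        # Collect the fields declared directly on this class, in order.
        fields = {k: v for k, v in namespace.items() if isinstance(v, Field)}
        lines = [namespace.get("__doc__") or name, "", "Configuration fields:", ""]
        for key, field in fields.items():
            lines.append(f"{key} : {field.dtype.__name__}, default {field.default!r}")
            lines.append(f"    {field.doc}")
        # Replace the hand-written docstring with the generated one.
        namespace["__doc__"] = "\n".join(lines)
        return super().__new__(mcls, name, bases, namespace)


class Config(metaclass=ConfigMeta):
    """Base class for illustrative configurations."""


class ExampleTaskConfig(Config):
    """Configuration for a hypothetical ExampleTask."""

    threshold = Field(doc="Detection threshold in sigma.", dtype=float, default=5.0)
    n_iter = Field(doc="Number of iterations.", dtype=int, default=3)


print(ExampleTaskConfig.__doc__)
```

Because the generated text lives in __doc__, any tool that renders class docstrings picks up the field documentation without extra work.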
These configuration field docstrings aren’t necessarily Numpydoc- and reStructuredText-compatible, but @KSK heroically stepped up and implemented a PR from an LAX bar on the way back from JTM (using the prototype LSST Science Platform’s JupyterLab as a development environment). The ticket is DM-13755.
In DMTN-030 we talk about Task documentation, and how we intend to have a specialized topic type for Tasks that combines information from both the Task and Task configuration classes. With this discovery, it should be possible to generate the configuration reference automatically by single-sourcing documentation directly from the configuration class.
The hack day was a great test environment for the developer experience around writing package documentation. Again, thanks to all the participants for also beta-testing my infrastructure.
Here are some takeaways:
- The templates were successful. I think there was some confusion about how to interpret the Jinja cookiecutter template, but using the rendered example worked well. The templates are still a work in progress, and adding per-file instructions as planned should solve most problems.
- In the package homepage, I want to link to the package’s corresponding JIRA component. It’s a little hard to discover a permanent link to a JIRA component, though. For now we commented out that information. I think the solution will be to curate a centralized DB in the documenteer package and provide a directive/role to add that link for a given package.
- Sphinx emits a warning about missing _static/ directories in per-package builds. This doesn’t have any adverse effects, but finding a way to mute this warning when necessary will improve developer confidence.
- While iterating on docstrings, developers will often need to run rm -R _build && rm -R py-api to clear the cached build. I think it will be useful to provide a custom Sphinx front-end for per-package builds that includes a clear sub-command (normal Sphinx projects use a Makefile for this, but I’ve run into SIP issues specific to Sphinx docs inside the EUPS stack environment).
- Overall, we’re not getting useful error reporting from the Sphinx/numpydoc/automodapi toolchain. The immediate solution is to convert docstrings deliberately, one module at a time, to identify bad syntax that breaks a build. Long-term, though, I think I should try to contribute error reporting improvements upstream.
- We realized that we need to document graphviz as a new development dependency for the Stack.
- We underscored the need for a Jenkins job dedicated to documentation development. Watch DM-13681.
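As a rough sketch of what the cache-clearing step mentioned above could look like in Python, the helper below removes the _build and py-api directories from a package's doc/ directory. The function name clear_build and its signature are assumptions made for illustration; this is not an existing documenteer command.

```python
import shutil
from pathlib import Path


def clear_build(doc_dir, names=("_build", "py-api")):
    """Remove cached Sphinx build directories under doc_dir.

    Returns the list of directories that were actually removed.
    """
    removed = []
    for name in names:
        target = Path(doc_dir) / name
        # Skip directories that do not exist instead of raising an error.
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(target)
    return removed
```

For example, clear_build("doc") would clear a package's cached build. Unlike chaining two rm commands with &&, this still removes py-api when _build is already gone.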
As I see it, there are two milestones to work towards.
1. Flipping the switch on pipelines.lsst.io
The first milestone is making the new pipelines.lsst.io, with integrated package documentation, the default. DM-11216 is currently serving as an integration branch in the pipelines_lsst_io repo. Merging that branch to master requires three things:
- A Jenkins job for building pipelines.lsst.io from arbitrary git refs, like stack-os-matrix (DM-13681).
- A decision on whether we want the https://pipelines.lsst.io homepage to reflect master or the most recent release tag (a weekly or a stable release).
- Some design improvements to the Sphinx theme, including a version switcher.
2. Numpydoc conversion
The second milestone is to have 100% conversion of Python docstrings to Numpydoc so that we have a comprehensive Python API reference. As I said earlier, this can be done incrementally, even on a file-by-file basis, so long as the Sphinx build is successful.
I think this means that we can see organic growth of the API reference as the code base is developed. Nearly all LSST developers I’ve talked to have been enthusiastic about writing Numpydoc API references, and that’s awesome.
I think the places we’ll need to create specific tickets for Numpydoc conversion are in fairly mature packages that aren’t frequently developed. I can provide some effort here, but we can also look towards external contributions (hi @drphilmarshall).
I recognize that it will be easier to plan and estimate effort towards docstring conversion if we could quantify the number of APIs that need to be documented, and track the number that have already been converted. I’ll look into this, but I’m also open to suggestions.
Bottom line, thanks to all the DM developers for caring about documentation. Together we’ll make https://pipelines.lsst.io awesome.