Package documentation hack session at JTM 2018

jsick · March 13, 2018, 5:49pm

Last week we ran a small hack day at the end of our Joint Technical Meeting. The main focus of the hack session was to begin migrating our EUPS stack packages into the new framework for https://pipelines.lsst.io. See DMTN-030 for background on the technical design of the new system.

More information on the hack day:

This post summarizes the outcomes of the hack session.

Before we go further, I want to give shout-outs to @hsinfang, @merlin, @timj, @kennylo, @jmeyers314, @cmorrison, @parejkoj, @mrawls, and @sophiereed for their participation.

Overall progress

We split the hack into two parts:

Add Sphinx boilerplate files to the doc/ directory of a package.
Migrate docstrings in that package to the Numpydoc format.

As of this writing:

11 packages have upgraded doc/ directories.
4 packages have fully-converted docstrings.

After the initial learning curve, it seemed that folks could tackle the doc/ directory conversions fairly easily (maybe 15 minutes of work per package, or less, once the tooling was installed and the instructions were understood).

The more time-consuming bit is the Numpydoc conversion. I continue to see GitHub activity related to this, and I think we’re on a slow-but-steady trajectory towards completion.

The good thing is that Numpydoc conversion does not have to be a big-bang merge. In fact, it’s possible to convert the docstrings of one module (as in, one Python file) at a time, have automodapi render documentation for that module, and still leave the package in a mergeable state. I’ll talk about this more at the end of the post.
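For anyone who hasn’t seen the target format yet, here is a minimal sketch of a Numpydoc-style docstring. The function `scale_flux` and its parameters are made up for illustration and don’t come from any Stack package; wrapping type names in backticks is an assumption about house style, not something Numpydoc itself requires.

```python
def scale_flux(flux, factor=1.0):
    """Scale a flux value by a constant factor.

    Parameters
    ----------
    flux : `float`
        Input flux, in arbitrary units.
    factor : `float`, optional
        Multiplicative scale factor. Default is 1.0.

    Returns
    -------
    scaled : `float`
        The scaled flux.

    Examples
    --------
    >>> scale_flux(2.0, factor=3.0)
    6.0
    """
    # Hypothetical example function; only the docstring layout matters here.
    return flux * factor
```

Each section is a title underlined with dashes, which is what the Sphinx toolchain keys on when it renders the API page.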

Task configuration docs: the unexpected discovery

A fantastic outcome of the hack day was @parejkoj’s discovery that configuration classes document themselves.

pex_config configuration classes generate their docstrings (__doc__ attributes) based on user-supplied configuration parameters. The configuration field objects are attributes of a Task’s configuration class, so the API reference page of a configuration class effectively reads as the Task’s configuration.
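To make the idea concrete, here is a toy sketch of self-documenting fields in plain Python. It is not the pex_config implementation: `Field`, `ExampleConfig`, and `field_docs` are hypothetical stand-ins that only show how a docstring can be assembled from the values a user supplies.

```python
class Field:
    """Hypothetical stand-in for a configuration field (not pex_config)."""

    def __init__(self, doc, dtype, default=None):
        self.doc = doc
        self.dtype = dtype
        self.default = default
        # Assemble the field's docstring from the user-supplied pieces.
        self.__doc__ = f"{doc} (`{dtype.__name__}`, default ``{default!r}``)"


class ExampleConfig:
    """Hypothetical configuration class for an imaginary task."""

    threshold = Field(doc="Detection threshold in sigma.", dtype=float, default=5.0)
    n_iter = Field(doc="Number of iterations.", dtype=int, default=3)


def field_docs(config_class):
    """Map each field name on a config class to its generated docstring."""
    return {
        name: attr.__doc__
        for name, attr in vars(config_class).items()
        if isinstance(attr, Field)
    }
```

Because each field carries its own `__doc__`, an API documentation tool that walks the class attributes picks up a description for every configuration parameter without anyone writing it twice.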

These configuration field docstrings aren’t necessarily Numpydoc- and reStructuredText-compatible, but @KSK heroically stepped up and implemented a PR from an LAX bar on the way back from JTM (using the prototype LSST Science Platform’s JupyterLab as a development environment). The ticket is DM-13755.

In DMTN-030 we talk about Task documentation, and how we intend to have a specialized topic type for Tasks that combines information from both the Task and Task configuration classes. With this discovery, it should be possible to generate the configuration reference automatically by single-sourcing documentation directly from the configuration class.

Lessons learned

The hack day was a great test environment for the developer experience around writing package documentation. Again, thanks to all the participants for also beta-testing my infrastructure.

Here are some takeaways:

The templates were successful. I think there was some confusion about how to interpret the Jinja cookiecutter template, but using the rendered example worked well. The templates are still a work in progress, and adding per-file instructions as planned should solve most problems.
In the package homepage, I want to link to the package’s corresponding JIRA component. It’s a little hard to discover a permanent link to a JIRA component, though. For now we commented out that information. I think the solution will be to curate a centralized DB in the documenteer package and provide a directive/role to add that link for a given package.
Sphinx emits a warning about missing _static/ directories in per-package builds. This doesn’t have any adverse effects, but finding a way to mute this warning when necessary will improve developer confidence.
While iterating on docstrings, developers will often need to run rm -R _build && rm -R py-api to clear the cached build. I think it will be useful to provide a custom Sphinx front-end for per-package builds that includes a clear sub-command (normal Sphinx projects use a Makefile for this, but I’ve run into SIP issues specific to Sphinx docs inside the EUPS stack environment).
Overall, we’re not getting useful error reporting from the Sphinx/numpydoc/automodapi toolchain. The immediate solution is to convert docstrings deliberately, one module at a time, to identify bad syntax that breaks a build. Long-term, though, I think I should try to contribute error reporting improvements upstream.
We realized that we need to document graphviz as a new development dependency for the Stack.
We underscored the need for a Jenkins job dedicated to documentation development. Watch DM-13681.
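As a rough sketch of the cache-clearing front-end mentioned in the list above: the directory names `_build` and `py-api` come from the `rm -R` command, while the sub-command name, the `--dir` option, and the function names are placeholders rather than an existing documenteer interface.

```python
import argparse
import shutil
from pathlib import Path

# Cached build products named in the rm -R command above.
BUILD_DIRS = ("_build", "py-api")


def clean(doc_dir):
    """Delete cached build directories under doc_dir; return the names removed."""
    removed = []
    for name in BUILD_DIRS:
        path = Path(doc_dir) / name
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(name)
    return removed


def main(argv=None):
    """Hypothetical command-line entry point with a single sub-command."""
    parser = argparse.ArgumentParser(description="Per-package documentation helper (sketch).")
    subparsers = parser.add_subparsers(dest="command", required=True)
    clean_parser = subparsers.add_parser("clean", help="Remove cached build directories.")
    clean_parser.add_argument("--dir", default="doc", help="Package documentation directory.")
    args = parser.parse_args(argv)
    if args.command == "clean":
        for name in clean(args.dir):
            print(f"removed {name}")
    return 0
```

Calling `main(["clean", "--dir", "doc"])` from a package root would remove both directories if they exist and quietly skip any that don’t, so it is safe to run before every build.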

Next steps

As I see it, there are two milestones to work towards.

1. Flipping the switch on pipelines.lsst.io

The first milestone is making the new pipelines.lsst.io, with integrated package documentation, the default. DM-11216 is currently serving as an integration branch in the pipelines_lsst_io repo. Merging that branch to master requires three things:

A Jenkins job for building pipelines.lsst.io from arbitrary git refs, like stack-os-matrix (DM-13681).
A decision on whether we want the https://pipelines.lsst.io homepage to reflect master or the most recent release tag (a weekly or a stable release).
Some design improvements to the Sphinx theme, including a version switcher.

2. Numpydoc conversion

The second milestone is to have 100% conversion of Python docstrings to Numpydoc so that we have a comprehensive Python API reference. As I said earlier, this can be done incrementally, even on a file-by-file basis, so long as the Sphinx build is successful.

I think this means that we can see organic growth of the API reference as the code base is developed. Nearly all LSST developers I’ve talked to have been enthusiastic about writing Numpydoc API references, and that’s awesome.

I think the places we’ll need to create specific tickets for Numpydoc conversion are in fairly mature packages that aren’t frequently developed. I can provide some effort here, but we can also look towards external contributions (hi @drphilmarshall).

I recognize that it will be easier to plan and estimate effort towards docstring conversion if we can quantify the number of APIs that need to be documented, and track the number that have already been converted. I’ll look into this, but I’m also open to suggestions.

Bottom line, thanks to all the DM developers for caring about documentation. Together we’ll make https://pipelines.lsst.io awesome.

parejkoj · March 13, 2018, 5:57pm

I think that should be called clean instead, for consistency with scons et al.

kfindeisen · March 13, 2018, 6:09pm

In the package homepage, I want to link to the package’s corresponding JIRA component. It’s a little hard to discover a permanent link to a JIRA component, though.

Oh dear. I thought I’d found the permanent link fairly easily when I was documenting ap_verify, but now the link is broken.

Since I now can’t find a “component homepage” for ap_verify on Jira (the obvious path just takes me to a list of issues), I wonder if those pages were removed as part of the recent Jira upgrade.

timj · March 13, 2018, 7:09pm

What do we want this to mean? In my mind it’s a query returning all the tickets relating to the component.

jsick · March 13, 2018, 7:29pm

That’s right. I really miss GitHub Issues as a way of learning about known bugs and understanding the development trajectory of a package. I think a readily-available JIRA query that shows that, proxied through JIRA tickets associated with a JIRA component, would help a lot of people.

timj · March 13, 2018, 7:31pm

https://jira.lsstcorp.org/issues/?jql=project%20%3D%20DM%20AND%20component%20%3D%20daf_butler ?

kfindeisen · March 13, 2018, 7:37pm

I think if the current behavior is what we want, then there’s no need to have a centralized database as proposed in the original post – we could just provide a link in the documentation template and the substitution would be fairly obvious (at least, if the link uses literal spaces instead of %20 for readability).
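The substitution is simple enough to sketch in a few lines of Python. The base URL and the JQL come from the link above; the function name `component_query_url` and the `project` default are just for illustration.

```python
from urllib.parse import quote

# Base of the issue-search URL posted above.
JIRA_ISSUES_URL = "https://jira.lsstcorp.org/issues/"


def component_query_url(component, project="DM"):
    """Build a JIRA search URL listing the tickets for one component."""
    jql = f"project = {project} AND component = {component}"
    # Percent-encode the query, including spaces and '=' signs.
    return f"{JIRA_ISSUES_URL}?jql={quote(jql, safe='')}"
```

With `component_query_url("daf_butler")` this should reproduce the percent-encoded link posted above, so only the component name needs to change from package to package.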

jsick · March 13, 2018, 7:44pm

Great. I’ve updated the template to use the JIRA query URL: