Today I’ve shipped a new framework for documenting our tasks in the pipelines.lsst.io documentation site. I’d also like to acknowledge the contributions along the way from @mssgill, @swinbank, and @KSK. I’d love for you to take a look and give your feedback in this topic thread.
This framework iterates on what we’ve done previously in Doxygen. This Sphinx-based reimagining of task documentation leverages custom Sphinx extensions (implemented in Documenteer) to automate as much of the documentation process as possible. For me, this is a fun moment because we’re finally tapping into the extensibility of Sphinx that drove our decision to adopt it.
There’s a lot more we can do to make task documentation useful, but I think now is a good time to show this work to the DM team and get your feedback.
Brief tour of task documentation in action
You can see the new task documentation today in the daily builds of the lsst.pipe.tasks documentation. The module homepage lists tasks with this brief bit of boilerplate:
```rst
Task reference
==============

Command-line tasks
------------------

.. lsst-cmdlinetasks::
   :root: lsst.pipe.tasks

Tasks
-----

.. lsst-tasks::
   :root: lsst.pipe.tasks
   :toctree: tasks

Configurations
--------------

.. lsst-configs::
   :root: lsst.pipe.tasks
   :toctree: configs
```
The task and config summaries come from the one-sentence summaries in the corresponding class docstrings.
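As a rough illustration of where those summaries come from, here is a minimal Python sketch that pulls the one-sentence summary out of a numpydoc-style class docstring. `ExampleTask`, `ExampleConfig`, and `summarize` are hypothetical; Documenteer’s extensions do their own docstring handling, which may differ in the details.

```python
import inspect


class ExampleTask:
    """Subtract a sky background model from an exposure.

    Notes
    -----
    Everything after the summary line is left to numpydoc and the
    API reference page.
    """


class ExampleConfig:
    """Configuration for `ExampleTask`."""


def summarize(cls):
    """Return the summary of a class docstring (hypothetical helper).

    numpydoc puts a short summary first, followed by a blank line, so
    this sketch keeps everything before the first blank line.
    """
    doc = inspect.getdoc(cls) or ""
    summary_lines = []
    for line in doc.splitlines():
        if not line.strip():
            break
        summary_lines.append(line.strip())
    return " ".join(summary_lines)


for cls in (ExampleTask, ExampleConfig):
    print(f"{cls.__name__}: {summarize(cls)}")
```

The practical upshot is that the summary line in each class docstring should read well on its own, because it is what appears in the module-level listings.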
Look at the ProcessCcdTask documentation as an example of what task documentation can look like. All the subtasks and other configuration fields are automatically documented with this boilerplate:
```rst
Retargetable subtasks
=====================

.. lsst-task-config-subtasks:: lsst.pipe.tasks.processCcd.ProcessCcdTask

Configuration fields
====================

.. lsst-task-config-fields:: lsst.pipe.tasks.processCcd.ProcessCcdTask
```
There are a lot of places in the ProcessCcdTask documentation that would normally be links but currently aren’t, because the corresponding API reference pages aren’t available yet.
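What those two directives automate can be pictured with a toy Python model: walk a config class, separate retargetable subtask fields from ordinary fields, and print each with its default and doc string. `ToyField`, `ToySubtaskField`, `ToyProcessConfig`, and `split_fields` are hypothetical stand-ins, not `lsst.pex.config` classes, and this is not how Documenteer introspects configs.

```python
class ToyField:
    """Stand-in for a config field: a documented, typed default value."""

    def __init__(self, doc, dtype, default):
        self.doc = doc
        self.dtype = dtype
        self.default = default

    def __set_name__(self, owner, name):
        # Record the attribute name the field was assigned to.
        self.name = name


class ToySubtaskField:
    """Stand-in for a retargetable subtask field: a documented default task."""

    def __init__(self, doc, target):
        self.doc = doc
        self.target = target

    def __set_name__(self, owner, name):
        self.name = name


class ToyIsrTask:
    """Hypothetical subtask."""


class ToyProcessConfig:
    """Hypothetical config class with one subtask and two ordinary fields."""

    isr = ToySubtaskField("Instrument signature removal.", ToyIsrTask)
    doCalibrate = ToyField("Perform calibration?", bool, True)
    maxIter = ToyField("Maximum number of iterations.", int, 10)


def split_fields(config_cls):
    """Return (subtasks, fields), each sorted by attribute name."""
    subtasks, fields = [], []
    for value in vars(config_cls).values():
        if isinstance(value, ToySubtaskField):
            subtasks.append(value)
        elif isinstance(value, ToyField):
            fields.append(value)
    return (
        sorted(subtasks, key=lambda f: f.name),
        sorted(fields, key=lambda f: f.name),
    )


subtasks, fields = split_fields(ToyProcessConfig)
print("Retargetable subtasks")
for s in subtasks:
    print(f"  {s.name} (default {s.target.__name__}): {s.doc}")
print("Configuration fields")
for f in fields:
    print(f"  {f.name} ({f.dtype.__name__}, default {f.default!r}): {f.doc}")
```

Because everything is read from the config class itself, the generated sections stay in sync with the code without anyone editing the task topic page.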
Next, take a look at the AssembleCoaddTask documentation. Content-wise it isn’t complete, but there you can see how the Python API summary section is intended to work.
With the new task documentation, one of my design goals was to move task documentation out of docstrings. We’re doing this for a couple of reasons. First, it gives us more flexibility than the numpydoc standard allows. With tasks, we’re documenting more than a single class, and using class docstrings the way we were was a bit of a stretch. Second, tasks will be used by more than just our API user base. For example, users on the Science Platform may fire off tasks (thanks to `PipelineTask`) without using the Python API. Task topic pages cater to multiple audiences, be it users of different `PipelineTask` activators or API users, and they can even serve as general scientific documentation.
All this to say, the Python API summary section is designed as a bridge from a task topic page to the API reference page. It lets an API user quickly jump to the numpydoc-generated documentation, which then provides the Python API-specific details such as parameters, return values, and exceptions.
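The bridge idea can be sketched in a few lines of Python: the summary keeps only the class path, its one-line summary, and the `run` signature, and leaves parameter descriptions to the numpydoc page. `ExampleTask` and `api_summary` are hypothetical, and this is an illustration of the concept rather than how Documenteer builds the section.

```python
import inspect


class ExampleTask:
    """Subtract a sky background model from an exposure."""

    def run(self, exposure, background=None):
        """Fit and subtract the background.

        Parameters
        ----------
        exposure : object
            Exposure to process.
        background : object, optional
            Initial background model.
        """


def api_summary(task_cls, method_name="run"):
    """Return short summary lines that point readers at the API reference."""
    full_name = f"{task_cls.__module__}.{task_cls.__qualname__}"
    method = getattr(task_cls, method_name)
    sig = inspect.signature(method)
    # Drop ``self`` so the signature reads the way callers write it.
    params = list(sig.parameters.values())[1:]
    sig = sig.replace(parameters=params)
    class_summary = inspect.getdoc(task_cls).splitlines()[0]
    method_summary = inspect.getdoc(method).splitlines()[0]
    return [
        f"{full_name}: {class_summary}",
        f"{method_name}{sig}: {method_summary}",
    ]


for line in api_summary(ExampleTask):
    print(line)
```

Whether the summary should go further and repeat the `Parameters` section is exactly the trade-off raised in the feedback questions; this sketch stops at the signature.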
Documentation for task documentation
This new task documentation framework is documented in the “DM Stack” section of developer.lsst.io:
- The Task topic type page describes how to write task documentation following the template.
- The Config topic type page describes how to document those rare Config classes that aren’t directly associated with a task but are referenced by a config field.
- The new Task reference section in the module homepage describes how to list the tasks that a module implements.
Those pages reference new templates in the github.com/lsst/templates repository. They are:
Lastly, there is some brief documentation about the new Sphinx extensions implemented in Documenteer. For example, there are lsst-task and lsst-config-field roles that link to a task or configuration field:
```rst
:lsst-task:`~lsst.pipe.tasks.processCcd.ProcessCcdTask`
:lsst-config-field:`lsst.pipe.tasks.processCcd.ProcessCcdConfig.isr`
```
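In Sphinx’s Python roles, a leading `~` links to the full path but displays only its last dotted component. Assuming the `lsst-task` role follows the same convention (the `~` in the first example suggests it does), the following Python sketch shows how each role’s content splits into a link target and link text. `resolve_role_target` is a hypothetical helper, not part of Documenteer.

```python
def resolve_role_target(text):
    """Split role content into (target path, link text).

    Assumes the Sphinx convention: a leading ``~`` means "link to the full
    path, but display only its last dotted component".
    """
    if text.startswith("~"):
        target = text[1:]
        return target, target.rsplit(".", 1)[-1]
    return text, text


print(resolve_role_target("~lsst.pipe.tasks.processCcd.ProcessCcdTask"))
print(resolve_role_target("lsst.pipe.tasks.processCcd.ProcessCcdConfig.isr"))
```

In practice, the `~` form keeps prose readable (“ProcessCcdTask”), while the full form is useful when the complete config path is the point.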
Feedback requested
Over the next couple of weeks, I invite you to give your impressions of the current template for task documentation, and give suggestions for where we can go next with it.
Here are some starter questions I have:
- Overall, does the template give you all the sections you need to document your tasks? Are there any common situations that could be incorporated into the template?
- Do you like the Python API summary section? Does it strike the right balance between keeping the task topic page brief and useful and letting the actual API reference page do its job? For example, should the Python API summary section include the parameters and return types from the class docstring and the `run` method?
- How can we document dynamic and obs package configuration overrides? I’m thinking of simply showing the code for the config class’s `setDefaults` method and the contents of the corresponding modules in the obs package’s `config` directory.
- Configurations of subtasks. Right now these task topics are designed for users to click through to subtasks to see their configuration fields. Should we include the configuration fields of (default) subtasks on the parent task’s page? Perhaps this requires a search box and JavaScript-enabled progressive disclosure.
- Would it be useful for a task to automatically list all the known tasks that use it?
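To make the override question concrete, here is a toy Python model of the order in which a value can end up in a config: the class’s field defaults, then `setDefaults`, then an obs package override file, which is Python code that assigns to a `config` object. `ToyConfig`, `apply_override_file`, and the field names are hypothetical stand-ins, not `lsst.pex.config`, and the layering is a simplified sketch.

```python
class ToyConfig:
    """Hypothetical stand-in for a task config class."""

    def __init__(self):
        # Layer 1: field defaults declared on the config class.
        self.doBias = False
        self.doDark = False
        self.maxIter = 10
        # Layer 2: dynamic defaults computed in code.
        self.setDefaults()

    def setDefaults(self):
        self.doBias = True


def apply_override_file(config, source):
    """Layer 3: run override-file text with ``config`` bound, like a config file."""
    exec(source, {"config": config})


# Contents a hypothetical obs package ``config/`` module might hold.
OBS_OVERRIDES = """
config.doDark = True
"""

config = ToyConfig()
apply_override_file(config, OBS_OVERRIDES)
print(config.doBias, config.doDark, config.maxIter)
```

A reader who sees only the field defaults on a task page would get `doBias` and `doDark` wrong here, which is why showing both the `setDefaults` code and the obs package files seems necessary.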
There’s also a well-known need for the following things, which I look forward to hearing your ideas about:
- Testable, Jupyter-enabled examples.
- Dataset documentation (ideally integrated with content in the codebase thanks to the Gen 3 middleware work).
- Replacement of the command-line task documentation with activator documentation.
I’ll be unable to personally respond to questions and feedback for the next couple of months, but I look forward to reviewing the discussion soon. Thanks!