DMTN-030: Science Pipelines Documentation Design (technote)

Available at https://dmtn-030.lsst.io/v/v1.

Last month, @mssgill, @KSK, @swinbank and I met in Tucson to design the documentation system for the LSST Science Pipelines (https://pipelines.lsst.io). This documentation will cover everything from installing the pipelines and processing data, to detailed task and API references.

DMTN-030 is a report of the design decisions made in that meeting, and will serve as an evolving reference for our implementation plans.

A large part of DMTN-030 is devoted to designing a topic-based documentation architecture. Following Every Page is Page One, topic-based documentation is organized as a network of self-contained pages, each of which orients a reader who arrives from a web search, answers the reader’s question, and lets them move to related information with links. Unlike a linearly structured manual, topic-based documentation works naturally on the web. Some advantages of topic-based docs:

  • Topics follow predefined types. This makes contributing documentation easier because we’ll provide templates and style documentation for every type of documentation we’ll need. As a developer, you’ll be able to contribute your knowledge without having to worry about how your content is organized.
  • Since topics are self-contained, it’s easy to assign topic writing in JIRA tickets. The scope of each topic, and its relationships to other topics, should be well understood before the topic is written. There will be less information duplication because topics link out to, rather than re-explain, subjects that are beyond their scope.
  • The consistent presentation afforded by the type system will also improve readers’ wayfinding.

Combined with the doc-as-code approach, we think that topic-based documentation is an ideal system for documentation that is convenient for DM to manage, contribute to, and maintain.

DMTN-030 (v1) has the following sections:

I hope you can take a moment to browse through DMTN-030 to become familiar with the roadmap for https://pipelines.lsst.io. We welcome your questions and feedback in this forum topic.

Note that the https://pipelines.lsst.io project is part of the overall Data Management Documentation Architecture proposal, currently being considered by the TCT as LDM-493.

Soon, we will follow up DMTN-030 with author documentation and ready-to-use templates so that we can finally begin to write the docs that DM needs.

Looks great! Just a few comments I came up with while reading (skimming) the tech note:

  • LSST Camera and T&S are also major consumers of the Science Pipelines that should be included in the list in Section 3. I don’t think that omission has any effect on the conclusions of Section 3; I think Camera and T&S documentation needs are essentially fully covered by the needs of DM and DESC.

  • 4.2 implies that stack-wide documentation configuration will live in sconsUtils or one of its dependencies. I think we should try to move as much of that as possible to base instead, where we currently keep most of our Doxygen configuration. I’d like to keep sconsUtils limited to build logic as much as possible, with configuration in downstream packages.

  • On the connection to LDM-151 in the Note in 6.2: I think the primary difference is that LDM-151 is a guess at the list of future high-level pipelines, and your list in 6.2 is essentially a list of current high-level pipelines. I expect the codebase and the documentation to evolve during construction to ultimately look more like LDM-151’s list (but perhaps not be identical to it, not least because it’s just a guess). I could also imagine that we’d ultimately want to expand out to a level below what’s in the LDM-151 list and flatten, perhaps doubling the number of entries (the future pipelines will simply have many more independently-useful stages than the current ones do).

  • 10.5: Task See-Also sections should probably link to child Tasks. These aren’t strictly an implementation detail: they’re quite visible in the configuration tree, which is perhaps the most public interface a Task has.

  • 10.7: We’ll definitely need sections on Task initialization, because it’s the most counter-intuitive part of some of our most important Tasks, and very easy to forget to document well.

  • 10.12: We should have an explicit option to describe the algorithm on a separate page (or separate document); some of these descriptions may ultimately be many pages long, and I think we’ll want to link to those descriptions from documentation entry points designed for people who aren’t interested in the software at all.

  • I think we may want to consider moving the parts of the Processing/Postprocessing topic that discuss programmatic access to data products (rather than CmdLineTasks that do Postprocessing) up to a new homepage-level topic. That’s one of the most important contexts for using the Science Pipelines software, and it’s really quite different from running processing jobs on the command-line.

Finally got around to reading this. The proposed system looks very nice, especially the prospect of unifying C++ and Python API documentation in a coherent manner once Breathe’s formatting is improved.

I do have one correction, however: in section 11.1.2, you say Doxygen documents all classes and functions on the same page. This is true for functions but not true for classes, for the reasons you describe. Did you mean to talk about Breathe, rather than Doxygen’s own HTML output?

Thanks for spotting this typo, I’ve pushed the fix. You’re right, I meant Breathe’s output rather than Doxygen’s.

Here’s a brief update on Pipelines documentation and the ongoing implementation of the DMTN-030 plan.

New build tool

We’ve just released documenteer 0.2.0, which includes a Sphinx front-end tool for building multi-package Science Pipelines documentation.

Briefly, here’s how the build tool works:

  • We’re using an EUPS table file in the pipelines_lsst_io root documentation repository to set up packages that are part of the pipelines.lsst.io project.
  • The build tool uses EUPS to find those packages and their doc directories.
  • We’ve placed a new manifest.yaml file in the doc directories of packages to tell the build tool what documentation content exists.
  • The build tool creates symlinks between the root pipelines_lsst_io project and documentation content in individual packages.
  • Finally, the build tool runs a Sphinx build and optionally uploads the product to LSST the Docs.
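
To make the symlinking step concrete, here is a minimal Python sketch. It is not documenteer's actual code. The function name link_package_docs, the packages/ subdirectory in the root project, and the rule that a package is linked only if its doc directory contains a manifest.yaml are all assumptions made for illustration. The EUPS lookup is also replaced by a plain dictionary of package directories, and the manifest is only checked for existence, not parsed.

```python
import os
from pathlib import Path


def link_package_docs(root_doc_dir, package_dirs):
    """Symlink each package's doc directory into the root documentation project.

    package_dirs maps a package name to its installed directory. The real
    build tool finds these through EUPS; this sketch takes them as input.
    Returns the list of symlinks that were created.
    """
    root = Path(root_doc_dir)
    links = []
    for name, package_dir in sorted(package_dirs.items()):
        doc_dir = Path(package_dir) / "doc"
        # Assumed rule: skip packages that do not ship a manifest.yaml.
        if not (doc_dir / "manifest.yaml").is_file():
            continue
        # Hypothetical layout: <root>/packages/<package name> -> <package>/doc
        link = root / "packages" / name
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # replace a stale link left by a previous build
        os.symlink(doc_dir.resolve(), link, target_is_directory=True)
        links.append(link)
    return links
```

Calling link_package_docs("pipelines_lsst_io", {"afw": "/path/to/afw"}) would create pipelines_lsst_io/packages/afw pointing at /path/to/afw/doc, so that a single Sphinx build run from the root project can see content that physically lives in each package. The paths here are placeholders.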

The new manifest.yaml file is currently described in DMTN-030.

New package documentation organization

Our design now considers two types of documentation in package doc directories:

  1. Documentation for Python modules (e.g., lsst.afw.table).
  2. Documentation for the package itself (e.g., afw).

Previously we only considered the first type of documentation, but it’s becoming clear we need to support both.

Module documentation lets us document Python modules (and underlying C++) without reference to the segmentation of the lsst namespace across EUPS-managed packages. This will be good for our API users who aren’t Stack developers.

Package documentation gives us a place to document the EUPS package itself. Here we can link to the Git repository, the JIRA component, and document EUPS dependencies. We can also document data packages this way.

I’ve updated DMTN-030 to describe this information architecture a bit more.

Implementation in pipe_base

I’ve been working on user documentation for pipe_base that follows the new documentation system. You can see what it looks like here: https://github.com/lsst/pipe_base/tree/tickets/DM-11253/doc

Is there some plan for migrating packages to the new system? Should we add a manifest.yaml to any new packages?

Hey @kfindeisen, I don’t recommend that anyone do this for their own packages at this point. I’m still slowly growing the prototype out and learning from the experience.

Once the system matures it will be fully documented and templated so that others can contribute.

The last update was just to gain early feedback from DM so I can still pivot if there’s an issue.