Pipelines Documentation Site Organization Sketch

In a separate discussion today I sketched out my vision for the Pipelines documentation site (pipelines.lsst.io). Although this is still a proposal, I wanted to share this outline widely both so that the pipelines team can understand my vision for the docs, and also to start a conversation about the types of content we’ll need to produce and how it will be organized. Eventually this can be distilled into a technote to help focus pipelines documentation production.

The proposed organization reflects patterns I’ve noticed in successful documentation sites. Some key features of the plan are:

  • An ‘Overview’ section to promote key functionality in an approachable way to attract and orient new users.
  • Command line tasks are featured as a main interface to the Stack for astronomers. Collecting and curating tasks in a top-level section makes this interface approachable.
  • In-depth user guides and API references for EUPS packages live in situ with the code. Still, this organization scheme curates packages into logical groups on the homepage and allows us to write umbrella user guides around API topics (e.g., for all meas packages, or all obs packages).

Again, what follows is my vision for the organization of the docs, primarily from the perspective of the layout of the homepage.

The pipelines team might have a different vision and I’d love to hear about that perspective. The main thing is that I’d like us to have a concrete vision for the Pipelines docs so that we can work together to ship it.


Getting started

This will link into:

  • Installation docs
  • Release notes and a high-level “what’s new in this release” page
  • A user-oriented EUPS guide/recipes so that people know how to integrate additional packages with their installed Stack.
  • A page describing the meta-packages (lsst_apps, lsst_distrib, etc.) and what they contain

Overview

This section will link into a series of short pages that introduce astronomers to the world-view of the LSST Science Pipelines. These pages are written for people who are new to the Stack (and maybe haven’t even invested in installing it yet) and just want to know what the Stack is and what it can offer for their work.

This is where we introduce all the key concepts behind the Stack, such as tasks and command line tasks, the butler, key data structures like exposures and tables, ISR algorithms, calibration algorithms, measurement algorithms, and so on.

I see these pages being short, approachable, and non-exhaustive. These pages will link into more in-depth documentation later in the doc site. You can think of these pages as marketing.

These pages will be a souped-up version of something like https://www.djangoproject.com/start/overview/ and the “Intro to Django” section of https://www.djangoproject.com/start/

Tutorials

This section will link into a series of general tutorials that exercise the Stack and give astronomers a feel for what it’s like to use both the command line tasks and the Python API.

Command Line Tasks

This section will be a gateway for all command line task (pipeline) documentation. I’m putting it here because a lot of astronomers will be focused on this interface. There will be two parts to this section:

  1. General user guides for command line tasks: how they can be run; how they are configured; and how command line tasks can be strung together to form whole pipelines.

  2. A reference section linking to pages documenting all command line tasks from across the Stack. These tasks will be organized into topical groups and ordered by flow of execution.

API Guide

This section will link to the user guides of individual packages (i.e., content in the doc/ directories of each package’s Git repository).

At the highest level, the API Guide section will be divided into topical subsections: for example, groupings of all the measurement packages, all the obs packages, all the butler and data access packages, etc. Each subsection can have an umbrella guide that covers the API topic as a whole and orients the user towards what each package does. There might also be tutorials that specifically focus on that part of the API.

Within the API topic subsection would be the user guides for each package. I expect these user guides will be similar to the user guides that astropy produces for their subpackages (e.g. see the astropy.table user guide as a great archetype). The main sections of each package’s user guide are:

  1. Introduction: A brief statement of what the package is for and a high-level list of key features.
  2. Getting Started: If appropriate, a section with examples that show what it’s like to use the API.
  3. Using {{package name}}: Detailed user guides covering all of the API’s functionality and behaviour.
  4. Task Reference: A section linking to the detailed reference documentation for each task and command line task.
  5. Python API Reference: A section linking to the detailed API reference pages for all Python classes and functions, i.e., content from docstrings.
  6. C++ API Reference: A section linking to the complementary API reference oriented to C++ consumers.

@jsick the organization looks great. We could also have a “Developing for LSST” section pointing to the developer guide, to have everything in one place. Tutorials and command line tasks, like Jim’s technote DMTN-023, are great to have.

Absolutely. There’ll be some tension between developer.lsst.io and pipelines.lsst.io. I think that developer guidance that applies only to pipelines can and should live in pipelines.lsst.io. For example, documentation on making a package for the stack, or an obs package in particular should be in pipelines.lsst.io.

A clear policy for defining what developer content goes where is needed.

And you’re right, any related content in developer.lsst.io can be linked from pipelines.lsst.io as if it were part of the Pipelines documentation. I’ll try to design the documentation projects such that it’s visually clear when you’ve jumped from one project to another.

This is more a comment on the goals of the pipeline docs rather than on the site organisation, but I think that they are related so I’m posting this in this thread.

I agree that command line tasks are one major entry point, but in particular, tasks that handle data files specified on the command line (whether processFile.py or via obs_test) are the most accessible way to start using the stack, and I think we should design the docs around this.

I also think that it’s important to document the reusability of functional tasks (e.g. object detection), and to explain how this can be done. This is partly a matter of configuration parameters, for which we need a documentation strategy (and a way of making them easier to use!), and partly a matter of how someone would make small Python-level changes.


Do you intend for the {{package name}}s in section 3 of the API guide to be EUPS or Python packages, or some other conceptual grouping of functionality?

Unfortunately, I don’t think those two concepts of packages map well to one another right now - or at least they don’t map well enough that I’d recommend our existing Python or EUPS packages as an organizing principle for top-level documentation (the package boundaries are of course not arbitrary at all, but there’s also a lot of historical package contents that I think new users would find confusing). Having a separate page listing what’s in each package would be very worthwhile, but probably not in a table-of-contents sense. Hopefully we’ll eventually reorganize our packages to improve matters.

Thanks, this is a very good point. My idea is that these package API guides will map to the EUPS package/GitHub repo, since they’ll live directly in the packages’ Git repos rather than in the root doc repo (https://github.com/lsst/pipelines_docs).

However, I want the docs to refer to packages by their root Python namespace rather than EUPS package name, since I think that makes the most sense to astronomers who will be consuming the Python API and won’t immediately care about the Stack’s core organization into EUPS packages (I’ve seen us refer to units of the stack by C++ namespace, EUPS package name, or Python namespace).

Implicit in this is that EUPS packages map 1:1 to a unique root Python namespace. Is this generally true?
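
To make the question concrete, here is a minimal Python sketch of the naming convention being assumed: underscores in an EUPS package name become dots under the `lsst` root namespace. The function names are hypothetical, `meas_base` is included only as an illustrative package, and the rule itself is an assumption rather than a documented guarantee. Meta-packages such as lsst_apps would not fit this pattern.

```python
def eups_to_namespace(eups_name):
    """Guess the root Python namespace for an EUPS package name.

    Assumption (not confirmed in this thread): underscores map to dots
    under the top-level ``lsst`` namespace.
    """
    return "lsst." + eups_name.replace("_", ".")


def is_one_to_one(eups_names):
    """Return True if no two EUPS packages share a guessed namespace."""
    namespaces = [eups_to_namespace(name) for name in eups_names]
    return len(set(namespaces)) == len(namespaces)


packages = ["afw", "meas_base", "obs_test"]  # illustrative sample only
for name in packages:
    print(f"{name} -> {eups_to_namespace(name)}")
print("one-to-one:", is_one_to_one(packages))
```

A check like `is_one_to_one` only tests the guessed names against each other; confirming the real mapping would still require inspecting each package.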

For some EUPS packages, like afw, I intend to split the package guides into the sub-packages, like afw.table.

And given that many EUPS packages naturally group together, the docs will group those packages together and provide a layer of documentation that describes that system of packages as a whole.

I think we’re in agreement—let me know if I’m wrong. What I could do to make this more clear is actually implement the full doc outline so we can talk specifically about where packages will end up in the docs.

Yes. At least in Science Pipelines, I believe it’s always true with the exception of afw (which you’ve already noted).

I think we generally agree about what goes in per-package docs. My biggest concern here is more about how much of a role the package organization plays in structuring the cross-package entry points into the API documentation. I suppose I’m saying that I think we need something beyond the marketing-level overview and command-line-oriented tutorials to serve as a well-organized entry point, since we shouldn’t rely on the per-package overview pages (or a list of their summaries) for that. I don’t have a comprehensive solution in mind, but I think we need user guide and tutorial level docs for using the stack via Python as well as from the command line, and I’m skeptical we can do that well in per-package documentation. Much of the content in those would need to come later, as it will require work to create, but I think starting off with a place for it and a skeleton for its content might help it along.

Yes, I think I understand. I think my original proposal was coloured by astropy, where the API structure is central. The Pipelines are a little different in that they’re ‘opinionated’, fully-implemented solutions in addition to an API for building a la carte solutions.

Maybe what we need are topical guides that cover key concepts and provide natural pathways to the docs of their implementations (as either command line tasks or the APIs). Then the contents would look something like

  1. Getting Started
  2. Overview
  3. Tutorials
  4. Topical Guides ← new
  5. Command line task reference (organized to mirror the structure of topic guides)
  6. API Guides (organized to mirror the structure of topic guides)

Each topic guide would link to its associated set of command line task pages and API guides, but these lists of tasks and API guides would also be on the homepage for discoverability.
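
As a rough illustration of what "mirroring" could mean in practice, the following Python sketch builds the homepage outline from a single list of topics, so that the two reference sections always share the grouping of the topical guides. The topic names are taken from this discussion; the function name and data layout are hypothetical and not part of any existing tooling.

```python
# Topic names are examples from this thread; the list is not exhaustive.
TOPICS = [
    "Measurement",
    "Instrument data mapping",
    "Data containers",
    "Calibration",
]


def build_homepage(topics):
    """Return the proposed homepage sections, in order, with their groupings."""
    return {
        "Getting Started": [],
        "Overview": [],
        "Tutorials": [],
        "Topical Guides": list(topics),
        # Both reference sections reuse the topic list so they mirror the guides.
        "Command line task reference": list(topics),
        "API Guides": list(topics),
    }


homepage = build_homepage(TOPICS)
for number, (section, groups) in enumerate(homepage.items(), start=1):
    print(f"{number}. {section}")
    for group in groups:
        print(f"   - {group}")
```

Deriving all three sections from one list is only one way to keep them aligned; the point is that adding or renaming a topic would change the guides and both references together.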

We’ll have to be very careful about having the appropriate level of documentation at each level of the hierarchy so that there isn’t unnecessary duplication. Topical guides would speak to science user stories (all about measurement, all about instrument data mapping, all about data containers, all about calibration, etc.), while the task documentation and API guides would speak specifically to practical usage and the specific behaviours of the API.

Inserting the topical guides section sounds like a very reasonable way to address my concern.

I’m closing this thread so we can move continued discussion to DMTN-030: Science Pipelines Documentation Design (technote) (discussion around DMTN-030 Science Pipelines Documentation Design).