Adding end user documentation to the product tree

jsick · May 16, 2017, 7:43pm

I’ve been tasked with making an end user documentation tree for LDM-294. This message is to coordinate how this can be best integrated with the DM Product Tree.

For background, at the highest level, this is a map of what I see as the main end user documentation sub-sites for the DM system:

The key documentation sites I expect we’ll make for astronomers are:

pipelines.lsst.io: integrated documentation for our science pipeline stack.
alerts.lsst.io: integrated documentation for the alert stream service.
drN.lsst.io: integrated documentation for a specific L2 data release and data access services.

There are additional documentation sites for each software product and service that aren’t directly astronomer-facing. These include Qserv, Firefly, DAX servers, and so on. These additional doc sites are tied to code bases, and are aimed at that product’s developers and operators. Every software repo will have a documentation site.

For the documentation tree, what I’d like to do is map each WBS to a software or service product and then to a documentation site, for example:

For many documentation sites we’ll have multiple software projects integrated into one documentation product (for example, each science pipelines package is integrated into pipelines.lsst.io).

Some products, particularly those for services like DAX and Qserv, will have their own documentation sites for developers and operators tied closely to the code base, in addition to being covered in astronomer-facing documentation for DR or alert stream products:

@ktl: I’d like to see how we can modify the Product Tree CSV to accommodate this mapping. I think the best way to do this is to add new CSV files with SQL-style foreign key relationships.

One is a softwareproducts.csv with one row per Git repository. The columns are:

name (i.e. repo name in the GitHub URL)
Git repo URL
WBS (key into productlist.csv)

(Effectively this replaces the “Package List” column in the product tree CSV.)

Then also add a webservices.csv table, with columns:

service name
software repo names (key into softwareproducts.csv)
WBS (only if the web service is not associated with a software repo, otherwise the WBS automatically flows from the softwareproducts.csv table)

And finally add a docproducts.csv table with columns:

site domain name
root repository URL (e.g., of the Sphinx project)
list of software repo and service names (keys into softwareproducts.csv and webservices.csv)
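
To make the proposed foreign-key layout concrete, here is a minimal Python sketch of the three tables and a lookup that traces a documentation site back to its WBS entries. Everything specific in it is an assumption for illustration: the column header names, the semicolon separator for multi-valued cells, the example.org URLs, the "qserv-service" service name, and the WBS-A/WBS-B codes are placeholders, not real product tree values.

```python
import csv
import io

# Hypothetical sample tables. Column names, URLs, and WBS codes are placeholders.
SOFTWARE_PRODUCTS = """name,repo_url,wbs
afw,https://example.org/afw,WBS-A
qserv,https://example.org/qserv,WBS-B
"""

WEB_SERVICES = """name,software_repos,wbs
qserv-service,qserv,
"""

DOC_PRODUCTS = """domain,root_repo_url,members
pipelines.lsst.io,https://example.org/pipelines-docs,afw
drN.lsst.io,https://example.org/dr-docs,qserv-service
"""


def load(text, key):
    """Read CSV text into a dict of rows indexed by the given key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def split_keys(cell):
    """Split a multi-valued cell (assumed semicolon-separated) into keys."""
    return [k for k in cell.split(";") if k]


def wbs_for_service(service, software):
    """Use the service's own WBS if set, else inherit from its repos."""
    if service["wbs"]:
        return {service["wbs"]}
    return {software[r]["wbs"] for r in split_keys(service["software_repos"])}


def wbs_for_doc_site(domain, docs, services, software):
    """Collect every WBS entry reachable from one documentation site."""
    result = set()
    for key in split_keys(docs[domain]["members"]):
        if key in software:
            result.add(software[key]["wbs"])
        elif key in services:
            result |= wbs_for_service(services[key], software)
        else:
            raise KeyError(f"{key!r} is not a known repo or service")
    return result


software = load(SOFTWARE_PRODUCTS, "name")
services = load(WEB_SERVICES, "name")
docs = load(DOC_PRODUCTS, "domain")

pipelines_wbs = wbs_for_doc_site("pipelines.lsst.io", docs, services, software)
dr_wbs = wbs_for_doc_site("drN.lsst.io", docs, services, software)
```

Because the lookups return sets, a documentation site that draws on several repos or services naturally reports several WBS entries.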

Then, if necessary, I can add additional tables that map documentation projects down to the section and page level, but that may be beyond the immediate scope of LDM-294 (we’re developing content architectures for documentation sites elsewhere, in https://dmtn-030.lsst.io, for example).

@ktl what do you think of this CSV refactor? @womullan, @timj, and @swinbank do you have any other input on how we’re capturing this product and documentation tree?

Another aspect is that this data model is a graph with some many-to-one relationships, rather than a tree. This might impact some of the Python visualization tools we have.
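
As a rough illustration of the graph-versus-tree point, the sketch below flags any node that has more than one parent, which is the situation a strictly tree-based visualization tool could not draw. The edge list is hypothetical, and "qserv-dev-docs" is a made-up name standing in for a developer-facing Qserv documentation site.

```python
from collections import defaultdict


def multi_parent_nodes(edges):
    """Return {child: sorted parents} for nodes with more than one parent.

    An empty result means the (parent, child) edges could be drawn as a tree
    or forest; any entry means the data is a more general graph.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return {c: sorted(ps) for c, ps in parents.items() if len(ps) > 1}


# Hypothetical (documentation site, software repo) edges.
EDGES = [
    ("pipelines.lsst.io", "afw"),
    ("drN.lsst.io", "qserv"),
    ("qserv-dev-docs", "qserv"),
]

shared = multi_parent_nodes(EDGES)
```

Here `shared` would contain only "qserv", since it is documented from two sites.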

swinbank · May 16, 2017, 10:57pm

I’m a little concerned that mapping from WBS to repository to documentation goes beyond “some many-to-one relationships”: several of our repositories have contributions from more than one WBS and contain products which map to more than one section of the documentation (the obvious example is afw, but there are others). That turns the nice simple model in your post into a whole bunch of many-to-many relationships, and the complexity might balloon out of control.

That said, it’s also possible that thinking this through will help us better understand and manage that complexity, so perhaps it’s worth giving it a try to see how far we can get. I’ll be passing through Tucson next week — would it be worth our while getting together to chat about this?

jsick · May 16, 2017, 11:14pm

You’re spot on that the tracing of WBS -> software/services -> documentation sites involves a lot of many-to-many graphs.

On one hand the complexity of the mapping relates to the real complexity of our architecture. Perhaps we can more clearly define what the use cases for this sort of document tree are, and then figure out what data model best addresses it. It’d be great to talk about this next week.

rowen · May 17, 2017, 12:10am

I’m a bit nervous that developer.lsst.io and pipelines.lsst.io are so far apart on the diagram, in that they have different parents. I realize they are different, but developers, at least, may appreciate closer coupling.

What is the Contribution Guide that you list under pipelines.lsst.io? On the face of it, it sounds like it may overlap with the developer guide.

jsick · May 17, 2017, 1:31am

Thanks for pointing out that concern @rowen. I’ll try to explain how I’m approaching this.

developer.lsst.io is meant to be a sort of handbook for DM staff to learn how the DM organization works and be productive in it. It’s where we’ve published policies in an accessible place. The word “Developer” in “Developer Guide” is intended to mean “a site for DM developers” not “how to develop the DM Stack.” Perhaps my branding has been confusing; it doesn’t help that developer.lsst.io has been the main documentation project for some time now. It could just as easily be branded as dm-team.lsst.io, for example. Ultimately, developer.lsst.io is primarily an internal policy site for staff, and we won’t be highlighting it to LSST’s main audience of astronomers.

pipelines.lsst.io is the proposed documentation site for the product that’s the scientific processing part of the DM stack. This is one of many products that DM makes, but it’s one that is directly usable by astronomers and that’s why it’s a top-level site in the diagram, not collected in internal documentation projects under the www.lsst.io hub. The pipelines are also an open source project, and although we don’t operate it as a true open source project today, it’s not inconceivable that by operations, the pipelines will be mature enough that the project starts regularly accepting contributions and participation from the wider community, as is standard in projects like Astropy.

The proposed Contribution Guide in pipelines.lsst.io will specifically guide readers who want to make code contributions, but aren’t necessarily DM staff. These contributions could be PRs to the LSST repos, or guidance on how to create your own independent packages that integrate with the pipelines stack. For matters of DM policy, like the Code Style Guides, pipelines.lsst.io will link to the relevant pages in developer.lsst.io. I’d also expect that some of the very pipelines-specific coding guides will migrate to pipelines.lsst.io itself. The guide for lsst.log is a prime example.

One way to think of this is that I don’t document how to deploy LSST the Docs in the Developer Guide; instead I document it with the LSST the Docs product’s documentation. The same pattern holds for Science Pipelines.

In summary, both pipelines.lsst.io and developer.lsst.io will continue to exist. developer.lsst.io covers organization and policy. pipelines.lsst.io documents the pipelines software at a technical level for both users and contributors/developers.

ktl · May 17, 2017, 7:26pm

First: the product tree is not intended to (and does not) directly align with the Work Breakdown Structure (WBS). Every leaf product should belong to exactly one WBS entry, but higher-level products (for which a documentation site may be appropriate) may correspond to multiple WBS entries, and many WBS entries will not correspond to any product.
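
The "exactly one WBS entry per leaf product" rule could be checked mechanically. This is only a sketch under assumed data structures: the product names, the WBS-A/WBS-B codes, and the two dictionaries are hypothetical and do not reflect the actual product tree CSV.

```python
def leaf_wbs_violations(children, wbs):
    """Return leaf products that do not map to exactly one WBS entry.

    children: product name -> list of child product names (empty for a leaf)
    wbs:      product name -> list of WBS entries for that product
    """
    violations = {}
    for product, kids in children.items():
        if kids:
            # Higher-level products may span several WBS entries, so skip them.
            continue
        entries = wbs.get(product, [])
        if len(entries) != 1:
            violations[product] = list(entries)
    return violations


# Hypothetical product tree and WBS assignments.
CHILDREN = {
    "DM": ["pipelines", "data-access"],
    "pipelines": [],
    "data-access": [],
}
WBS = {
    "DM": ["WBS-A", "WBS-B"],
    "pipelines": ["WBS-A"],
    "data-access": [],
}

violations = leaf_wbs_violations(CHILDREN, WBS)
```

In this made-up data only "data-access" is reported, because it is a leaf with no WBS entry; "DM" is allowed to span two entries because it is not a leaf.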

As you and John both said, there are many-to-one, one-to-many, and potentially many-to-many relationships between products, git repos, and documentation sites. I think the goal should be that every high-level product has sufficient user and developer documentation; I’m not sure why software git repos should play a role here unless they happen to also be hosts for documentation (because we want to keep the docs close to the code).

swinbank · May 18, 2017, 4:35am

I think we should expect that all software products are documented in some way (possibly as part of a higher level document), and that all documentation lives in a repository, which is, presumably, git. Beyond that, things get fuzzy, which is precisely why I’d like to talk it through with @jsick in person.