Introducing LSST the Docs for Continuous Documentation Delivery

jsick · May 16, 2016, 4:11pm

On behalf of the crew at SQuaRE, I’m pleased to introduce LSST the Docs, Data Management’s new documentation publishing platform. LSST the Docs will allow Data Management to create and iterate on documentation more effectively, while also giving readers a better experience.

Soon, you’ll see DM’s technotes, Developer Guide, and some design documents migrate from Read the Docs to the new platform. In the upcoming Fall 2016 cycle we will begin publishing a rebooted Science Pipelines documentation site on LSST the Docs.

You can read more about the platform in SQR-006: The LSST the Docs Platform for Continuous Documentation Delivery.

Why did we build LSST the Docs?

I really admire what Read the Docs has done for open source documentation. Read the Docs has made it so much easier for developers to continuously deploy documentation alongside their projects. At one point, LSST Data Management had 39 projects published with Read the Docs. I have been, and continue to be, grateful for what Read the Docs has done for open source software and the Python community in particular.

But we learned two things from using Read the Docs. First, LSST’s projects demand a lot of flexibility in their build environments. Second, we needed more automation to help manage the fleet of documents that Data Management ships.

Read the Docs is built to be an easy-to-use integrated documentation publishing service, and that integration includes the environment where documentation is built. Unfortunately, LSST Science Pipelines simply can’t fit in that environment, both in terms of computational resources and that LSST speaks a different build language than most Python projects (EUPS versus pip). We already have continuous integration services for LSST projects and it makes sense to build documentation on those as well.

Beyond EUPS, we can also envision projects where data intensive computation, testing, and figure generation are part of the documentation build process. Having flexibility in the build environment makes this possible.

We also found that Read the Docs projects needed a bit of administrative effort to provision new projects, their domain names, and set up new branch builds. While we tried to hide this administrative effort, it became a bottleneck for the team. LSST the Docs is built around an API, meaning that it’s ready to automate and integrate into LSST’s systems and workflows.

What can LSST the Docs do?

Here are some of the most exciting features of the LSST the Docs platform. See SQR-006: The LSST the Docs Platform for Continuous Documentation Delivery for additional detail.

Flexible documentation builds

Documentation can be built on any continuous integration platform. Big projects, like the LSST Science Pipelines documentation, will be built on DM’s Jenkins CI. Smaller documents, like technotes, will be built on Travis CI. We’ve written documentation describing how to setup a .travis.yml. We also have an elegant system for building multi-repository documentation for EUPS-based projects.

LSST the Docs is very flexible in how documentation is built. In essence, it’s a generator-agnostic static site publishing platform. Even Sphinx isn’t a hard dependency; alternative formats, like LaTeX documents, can be published too.

Beautiful, versioned URLs

Every documentation project has its own subdomain on lsst.io, for example ltd-keeper.lsst.io or sqr-006.lsst.io. These URLs are memorable and mean you won’t need a link shortener to refer to projects.

From these domains we publish multiple editions of documentation that map to branches on GitHub. The root URL, example.lsst.io/, hosts the master branch by default (though this is configurable). This gives us beautiful URLs for the canonical versions of the site we want readers to visit by default.

Documentation for branches of projects are published under /v/. For example, a release branch might be published to example.lsst.io/v/v1/ and a ticket branch at example.lsst.io/v/DM-1234/. Documentation for branches will be published automatically as soon as you push to GitHub. I think this feature will be tremendously valuable for documentation reviews during pull requests.

As a bonus, we retain old documentation builds. Individual builds are published to example.lsst.io/builds/(id)/. This will be helpful for seeing, and sharing, A/B comparisons of your documentation. It also means that if one of the main documentation editions breaks we can immediately hot-swap to any previous documentation build without having to rebuild the documentation from scratch.

And don’t worry, we’ll add <link rel="canonical" href="..."> headers for our HTML templates to help search engines sort through our documentation versions.

Served by Fastly

To give readers the best experience we’re using the Fastly content distribution network for everything published by LSST the Docs. Whether you’re West Coast, East Coast, down in Chile, over in France, or anywhere else on Earth, there will be a nearby Fastly point of presence serving you docs.

Besides performance, we’re also taking advantage of the Varnish caching layer that Fastly hosts. Varnish lets us map URLs for all documentation projects, and their individual builds, to directories in a single AWS S3 bucket (see SQR-006 for details). This will allows us to scale LSST the Docs to host an enormous number of projects without breaking a sweat. (Hat tip to HashiCorp for advocating this pattern.)

Last but not least, Fastly allows us to securely deliver content over TLS (i.e., HTTPS). This is nice to have for static documentation projects, but will become critical for serving interactive content with client-side JavaScript.

API Driven

Starting with our earliest whiteboard design sessions, we knew that LSST the Docs needed to be decomposed into discrete microservices with well-defined interfaces. This design gives us flexibility, and isolates details. For example, LSST the Docs can publish documentation for EUPS projects without having to be aware of EUPS. Below is an architectural diagram describing how an EUPS-based documentation project, like the Science Pipelines, is published by LSST the Docs.

At the heart of LSST the Docs is LTD Keeper, a RESTful web app. LTD Keeper maintains the state of documentation projects and builds, and coordinates the builders on CI servers (LTD Mason) and other web services (AWS S3 and Route 53, and Fastly).

This API can also be consumed by external services. For example, documents can use this API to power user interface elements that help readers find the right version of the docs. Dashboards can use the API to list documentation projects and their versions. Even ChatOps bots could use this API.

Deployed with Kubernetes

Being my first major DevOps project, I wanted to cultivate modern best practices for deploying applications to the web. We decided to deploy the LTD Keeper API server (built on Flask in Python 3) in Docker containers orchestrated by Kubernetes. This is all done in the Google Container Engine. Below is a diagram of what the application deployment looks like.

A Kubernetes load balancer service receives traffic from the internet and routes it to pods with Nginx containers that terminate TLS traffic. These forward the traffic, via another internal load balancer, to pods composed of a Docker container that reverse-proxies traffic and finally a container with the uWSGI-run Flask application. All of the pods are managed by Kubernetes replication controllers, meaning that it’s easy to scale the number of pods, and also to deploy updated pods without service interruptions.

The best part is that this entire infrastructure is configured and managed on the command line with a few YAML files. The LTD Keeper documentation contains complete deployment instructions.

I couldn’t be happier with Kubernetes, and I believe that this deployment architecture will be a useful template for future SQuaRE projects.

Onwards

With LSST the Docs, we are at last in a position to move forward on DM’s documentation projects, not least of which will be a reboot of the LSST Science Pipelines documentation. We look forward to migrating Science Pipelines to Sphinx during the Fall 2016 development cycle.

This platform will also enable exciting integrations and automations for the LSST DM Technote platform (SQR-000) and the DocHub project (SQR-011) for LSST documentation search and discovery.

We’re continuously improving LSST the Docs. The Fall 2016 DM-5858 epic lists some of the planned work, including dashboards for listing documentation versions and builds.

Get the code and read the docs

LSST the Docs code is MIT-Licensed open source. It’s built either natively for, or compatible with, Python 3. Here are the main repositories and their documentation:

LTD Mason
- Docs: https://ltd-mason.lsst.io.
- GitHub: https://github.com/lsst-sqre/ltd-mason
LTD Keeper
- Docs: https://ltd-keeper.lsst.io.
- GitHub: https://github.com/lsst-sqre/ltd-keeper.
- Docker Hub:
  - https://hub.docker.com/r/lsstsqre/ltd-keeper/
  - https://hub.docker.com/r/lsstsqre/nginx-python/

You can follow the progress of LSST the Docs on JIRA by searching for the label: ‘lsst-the-docs.’

The technote describing this project, its philosophy, architecture, and implementation is available at https://sqr-006.lsst.io.

heather999 · June 3, 2016, 4:42pm

@jsick I’m interested in setting up the DESC project Twinkles to deploy its documentation to LSST the Docs. I’m getting there but I have some general questions.

Concerning the nice URLs, are there plans for a DESC subdomain or would each project, like Twinkles, have a URL something like: lsst.io.twinkles?

Looking at the examples described here: https://ltd-mason.lsst.io/travis.html and https://github.com/lsst-sqre/sqr-006/blob/master/.travis.yml
In the requirements.txt and .travis.yml files used to install Sphinx and other modules, I note the use of specific versions or at least a range of versions for those modules. Is there a suggested way to stay up to date? In particular, I see: pip install “ltd-mason>=0.2,<0.3” in the sqr-006 example, but the documentation indicates no such need for a specific version. Which resource is more correct? And how would a project hoping to utilize LSST the Docs hope to stay in line with the choices the developers feel are best?

Twinkles isn’t building against a matrix of python versions, rather it is using DMstack installed with its version of miniconda and various python modules. I noted the choice of python 3.5 in the LSST the Docs example - if we’re still on python 2.7 (for now) is that a problem or perhaps it really doesn’t matter?

timj · June 3, 2016, 8:50pm

I’ll try some quick responses as @jsick is on vacation.

The version constraints are there to ensure that a deployed document build will work in the same way today as it does a year ago. This relies on semantic versioning where in this case the assumption is that the API will break when the version number changes from 0.2 to 0.3. Rather than be forced to check the rendering of all documentation builds every time a major new release is made the version constraints provide safety.

The python version for building the docs on the LTD servers is not related to the version of python you use to build your DM stack.

heather999 · June 3, 2016, 9:26pm

Thanks, @timj
so for ltd-mason I see the most recent release is 0.2.1 - is there some major change coming in 0.3 that a user may need to keep in mind? Maybe I’m just wondering where to keep track of the ltd-mason development - if it’s deemed relevant to the end-users. I suspect that’s in JIRA somewhere. Maybe it really doesn’t matter to someone like me and I always want to access the latest and greatest ltd-mason.

I’ll admit I’m still learning to play with and use travis-ci… in our environment to run our Twinkles tests in travis-ci, we are setting up python via DMstack - isn’t that the python that will be used for building the doc as well in the same travis-ci job? Maybe it’s a matter of setting up the .travis.yml appropriately so that these activities are completely separated - I don’t know how to do that yet
Should I prefer to use python 3.5 to build the docs?

timj · June 3, 2016, 9:30pm

As far as I know 0.3 doesn’t exist and there may well be no plan for it. The point is that in semantic versioning 0.3 might result in some breakage compared to 0.2 so you just set up your build system to install 0.2 only.

When you work on some documentation after 0.3 is released you will put in the constraint of >= 0.3 and < 0.4 to play it safe. Your previous document will still work because it will only be installing a 0.2.x version.

It may be that a purported 0.3 won’t break anything but it is better if updating to that version is under your control rather than something which happens behind the scenes next time the document is rebuilt.

Preferring python 3 is always my position…

jsick · June 14, 2016, 4:49pm

Sorry for the long delay in responding to your post @heather999. Tim covered most things, but I’ll offer a couple clarifications and also touch on some additional points relevant to publishing existing Twinkles docs with LTD.

I think you mean having a domain like twinkles.io or twinkles.org. This can be done, but it’s likely more ambitious than necessary. LTD uses Fastly to serve traffic, and Data Management is currently paying for a ‘wildcard certificate’ to secure traffic to any *.lsst.io domain. Twinkles, or anyone else, could buy a certificate for their domain ($250/mo). An alternative is serve unencrypted data. That’s a fair trade-off now, but in the 2020s it’ll be a strong anti-pattern.

I think the real solution is that DESC/Twinkles can serve documentation/websites from any number of lsst.io subdomains, such as twinkles.lsst.io. Since DESC/Twinkles is affiliated with LSST, Data Management is happy to provide this as complimentary service to you (and all other LSST Science Collaborations).

And to clarify, you can choose any number of subdomains for any number of different projects. You might have a site for software documentation, another for a white paper, another for a general DESC website at desc.lsst.io, etc… There’s no need to be conservative with the number of projects you publish—even LaTeX papers are ‘publishable,’ as DM does for LDM-151 drafts right now.

At the moment I’ll be working directly with you, and any other interested groups, in providing and maintaining the Travis configuration files for publishing with LSST the Docs. The reason for this is that I need to sprinkle in some encrypted passwords to authenticate with LTD servers. This means that you don’t really need to worry about about the version of LTD Mason used by Travis—that’s on me to keep every project up to date. I, or in the short future, a bot, will be sending PRs to update projects on the rare occasion an update is needed.

In addition to setting up the LTD Mason documentation build, I can also work with you to ensure that your software is built and tested as needed.

SQuaRE’s policy is to write and use Python 3 whenever possible, since that’s inevitably where the LSST Stack is headed. LTD Mason works in Python 2.7 environments as well. If you’re software uses Python 2.7 (and/or leverages the LSST Stack) you’ll want to use Python 2.7 in your documentation build environment.

Yes, once you ‘setup’ the Stack you’ll be using (I assume) the Python 2.7 that comes with the Conda build of the LSST Stack. This is totally fine; that’s what DM will be doing to build the LSST Stack’s documentation. In this case the Python version made available from the Travis environment is moot.

For the specific case of Twinkles, you already have a lot of great content in Markdown. I think that Sphinx is still useful for helping you assemble those markdown files into a website. Recently Sphinx gained the ability to parse content in Markdown, instead of only reStructuredText.. I haven’t tried it yet, but I think a great option for you would be to use this to bridge your existing content into Sphinx. Longer term, you may want to convert content to reStructuredText to gain additional functionality—but at least that would be a gradual transition.

Send me a note when you’re ready to set up an LTD site and I’ll help you out.

heather999 · June 23, 2016, 3:43pm

Thank you for the additional details, and sorry to take so long to respond!
Yes, I’m ready to pursue this further. I would love to have something up and running in time for the DESC Oxford meeting next month. Initially I had thought I had to set up the encrypted passwords myself and had set off getting set up to do that If you’re in a position to help out with that set up - so much the better! But please let me know what I can do to help.

I have a created a branch in the Twinkles GitHub area for this effort:

So far, all I’ve done is add three files: Makefile, conf.py, and index.rst
At the moment, I’d just want to see that the index.rst file can be processed via travis-ci and published to an area like twinkles.lsst.io

About Sphinx and Markdown - I saw that *.md files were an option - my initial trial though resulted in pages that were not rendered well. Twinkles is wiling to move to ReStructedText, and I still need to work with the group to determine precisely what portions of their existing pages they want to make available. It seems like a good time to just make the jump to *.rst. Of course, there is wiggle room to try *.md again if that becomes helpful.

Please feel free to update this branch or just let me know what I can do to move forward. I was starting to look at adding a setup/docRequirements.txt file and editing our existing .travis,yml, but it sounds like you are in a better position to set that up appropriately. I would like to learn as much as possible about how this works, so do not hesitate to just point me in some direction.
Thank you,
Heather

jsick · June 25, 2016, 6:59pm

Great. I’ll let you know when I’ve got everything hooked up.

heather999 · June 29, 2016, 4:53pm

Thanks, @jsick Actually I have question for you… there will likely be ReStructuredText docs in another GitHub repo https://github.com/DarkEnergyScienceCollaboration/ComputingInfrastructure that we may want to access when producing our documentation. Is that realistic or perhaps we would have to set up this other repo to publish its doc to LSST the Docs and then the Twinkles pages would just link to them?
Sorry I’m imagining the content under “Computing Infrastructure” is going to be generally useful to DESC folks and not specific to Twinkles - hence the two separate areas.

jsick · June 29, 2016, 5:54pm

Sure—sounds like a similar model to how we’ve got a DM Developer Guide at developer.lsst.io separate from our software projects e.g. https://pipelines.lsst.io.

What URL subdomain would you like for the Computing Infrastructure doc site? e.g. desc-computing.lsst.io, desc-developer.lsst.io… I’m open to suggestions

heather999 · June 29, 2016, 6:14pm

Thank you! How about desc-computing.lsst.io or even desc-ci.lsst.io, most of us have been using CI to refer to our “computing infrastructure” group.

price · June 29, 2016, 6:29pm

I suggest you steer away from “CI”, as it also stands for “continuous integration”, and we would like to get some DESC code under CI so there’s potential for confusion.

heather999 · June 29, 2016, 6:33pm

Ok - fortunately people actually read these posts
Let’s go with desc-computing.lsst.io