Search and browse Rubin Observatory docs with our new website

jsick · July 16, 2020, 4:13pm

Today we’re pleased to launch a new portal for Rubin Observatory documentation:

https://www.lsst.io

With this new website, you can browse the Rubin Observatory’s documents and search across the full-text content. This new search site will update immediately as we produce new documentation.

Content coverage and expansion possibilities

At launch, the portal is focused on PDF documents and Sphinx-based technotes hosted on LSST the Docs (the lsst.io domain). This covers many of our documents with handles such as DMTN, DMTR, ITTN, LDM, LPM, LSE, OPSTN, PSTN, RTN, SMTN, SITCOMTN, SQR, and TSTN. There is a lot more documentation out there, though, and our goal is ultimately to make all of these other types of content discoverable and searchable from the documentation portal:

Sphinx-formatted user guides (such as pipelines.lsst.io)
Community forum posts
GitHub repository READMEs
Confluence wiki pages
Documents held in DocuShare that don’t already have counterparts on LSST the Docs
Presentations archived on Zenodo
RFCs and other key Jira projects

We hope to add these sources gradually, with user guides as our highest priority.

Giving feedback

We’re subscribing to the adage that if everything is perfect, you’ve launched too late. You can help us improve the documentation portal and prioritize new features by giving us feedback. Since you’re already here on the Community forum, feel free to reply to this post. You can also give feedback on these other platforms:

Create a GitHub issue
Create a Jira ticket in the www_lsst_io component (internal)
Chat in #dm-docs on Slack (internal)

How we built it

We built the documentation portal on an exciting technology stack that both leverages our existing expertise in backend infrastructure and points to new directions in front-end engineering.

On the backend, we built an application called Ook (after the Librarian in Discworld) that is responsible for receiving and processing content ingest requests. Ook transforms content into small, structured records that it uploads into our search database.

Ook itself is a Python microservice built on aiohttp, our Safir framework, and our Kafkit library. We’ve deployed Ook on our Roundtable Kubernetes cluster, which runs in the Google Cloud. Since its launch last year, Roundtable has proved to be a tremendously successful Kubernetes platform for the SQuaRE team to deploy apps that don’t need direct access to the Science Platform. LSST the Docs, sqrbot-jr, templatebot, Vault, checkerboard, segwarides, and neophile are some other apps that serve Rubin Observatory through the Roundtable platform.

Ook receives ingest requests either through an HTTP API or messages in a Kafka topic. For example, we added a service (LTD Events) to LSST the Docs that produces a Kafka message whenever documentation is updated. Through either method, Ook quickly classifies the content to determine if and how it should be ingested. Ook queues ingest tasks into another Kafka topic, which lets Ook buffer and distribute the load to a dedicated set of Kubernetes pods.
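The classify-then-queue flow can be illustrated with a small, self-contained Python sketch. This is not Ook’s code, and it deliberately leaves Kafka out: an in-process asyncio.Queue stands in for the ingest Kafka topic, a few coroutines stand in for the dedicated Kubernetes pods, and the message fields (url, content_type), the strategy labels, and the function names are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IngestTask:
    """A hypothetical unit of ingest work handed to a worker."""

    url: str
    kind: str


# Hypothetical mapping from a message's declared content type to an ingest
# strategy. These labels are illustrative, not Ook's real ones.
STRATEGIES = {
    "lander": "lander-pdf",
    "sphinx": "sphinx-technote",
}


def classify(message: dict) -> Optional[IngestTask]:
    """Decide if and how a content-update message should be ingested.

    Returns None for content this sketch does not know how to ingest.
    """
    kind = STRATEGIES.get(message.get("content_type", ""))
    url = message.get("url")
    if kind is None or not url:
        return None
    return IngestTask(url=url, kind=kind)


async def worker(name: str, queue: asyncio.Queue, done: List[Tuple[str, str, str]]) -> None:
    """Consume tasks forever. A real worker would fetch and index the content;
    this one only records which worker handled which task."""
    while True:
        task = await queue.get()
        try:
            done.append((name, task.url, task.kind))
        finally:
            queue.task_done()


async def run(messages: List[dict], n_workers: int = 2) -> List[Tuple[str, str, str]]:
    """Classify messages, buffer accepted tasks in a queue, and drain it."""
    # Stand-in for the ingest Kafka topic (assumption: an in-process queue).
    queue: asyncio.Queue = asyncio.Queue()
    done: List[Tuple[str, str, str]] = []
    for message in messages:
        task = classify(message)
        if task is not None:
            queue.put_nowait(task)
    workers = [
        asyncio.create_task(worker(f"worker-{i}", queue, done))
        for i in range(n_workers)
    ]
    await queue.join()  # wait until every queued task is marked done
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return done


if __name__ == "__main__":
    demo = [
        {"url": "https://example.com/technote/", "content_type": "sphinx"},
        {"url": "https://example.com/pdf-doc/", "content_type": "lander"},
        {"url": "https://example.com/other/", "content_type": "unknown"},
    ]
    for entry in asyncio.run(run(demo)):
        print(entry)
```

Separating the quick classification step from the slower ingest work is what makes buffering possible: messages are accepted right away, and throughput grows by adding consumers (in Ook’s case, pods reading the Kafka topic) rather than by making the receiver do more work.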

How Ook goes about ingesting a document depends on the content type. For PDF documents, Ook uses metadata and content that’s extracted directly from the TeX source by our PDF landing page static site generator, Lander. For Sphinx-formatted technotes, Ook extracts metadata and content directly from the published HTML. In both cases, we break a full page of content into smaller records covering individual subsections or paragraphs. Smaller content chunks lead to faster and more relevant search results. To structure our search records, and in turn configure our search indices, we followed the recommendations in this Algolia blog post.
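To make the chunking idea concrete, here is a minimal Python sketch that uses only the standard library’s html.parser. It is not Ook’s implementation: it assumes a simplified, well-formed page (explicitly closed tags) where content lives in <p> elements under <h1> to <h3> headings, and the record fields (objectID, url, h1, h2, h3, content) and the ID scheme are hypothetical. Algolia does identify records by an objectID attribute; the rest of the layout is illustrative. Each paragraph becomes one record that carries its enclosing headings, so a search hit can be shown in context.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

HEADINGS = ("h1", "h2", "h3")


class RecordExtractor(HTMLParser):
    """Split an HTML page into one search record per <p> paragraph."""

    def __init__(self, url: str) -> None:
        super().__init__()
        self.url = url
        # Text of the heading currently in effect at each level.
        self.headings: Dict[str, Optional[str]] = {h: None for h in HEADINGS}
        self.records: List[dict] = []
        self._tag: Optional[str] = None  # heading or <p> currently being read
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Only headings and paragraphs open a new text buffer; inline tags
        # such as <em> or <a> are skipped, but their text is still collected.
        if self._tag is None and (tag in HEADINGS or tag == "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        if tag in HEADINGS:
            # Drop permalink markers such as the pilcrow Sphinx themes often append.
            self.headings[tag] = text.replace("\u00b6", "").strip()
            # A new heading closes any deeper sections that were open.
            for deeper in HEADINGS[HEADINGS.index(tag) + 1:]:
                self.headings[deeper] = None
        elif text:
            record = {"objectID": f"{self.url}#{len(self.records)}", "url": self.url}
            record.update(self.headings)
            record["content"] = text
            self.records.append(record)
        self._tag = None


def extract_records(html: str, url: str) -> List[dict]:
    """Return one record per non-empty paragraph in ``html``."""
    parser = RecordExtractor(url)
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    page = "<h1>Guide</h1><h2>Install</h2><p>Run the installer.</p>"
    for record in extract_records(page, "https://example.com/guide/"):
        print(record)
```

Because each record repeats its heading context, a query that matches one paragraph returns just that paragraph while still showing which section it came from. A fuller version would also need to handle lists, tables, code blocks, and section anchors, which this sketch ignores.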

Algolia is our search database and API. For this initial launch, we’re fortunate to use Algolia’s open-source tier. By working with Algolia, we could focus on aspects specific to Rubin Observatory, such as ingesting our specific types of content and building out a front-end that meets our needs, without getting bogged down with the general problem of operating a search engine.

To build the website (the front-end), we adopted Gatsby, a static site generator built on React. Since Gatsby builds static sites, we could deploy it on LSST the Docs behind the Fastly CDN. To build the search interfaces, we used Algolia’s InstantSearch library for React. We loved how InstantSearch facilitates progressive customization. For example, we could prototype a new search component directly from InstantSearch with only minimal configuration. Once we adopted the component, we could style it through styled-components. If we needed to dramatically customize the component, we could do so with the Connectors API to implement our own UI but keep the prebuilt Algolia integration.

jsick · August 4, 2020, 7:55pm

In the latest update to our documentation portal, we’ve added curated collections of our user guides. For example, the Guides for scientists page collects documentation for software that is relevant to astronomers working with Rubin Observatory data.

At the moment, these guide collections are for browsing only. In a future update, the full-text content of these guides will become searchable. (Many of our guides, though, include their own search services.) To help you learn more about the status of the documentation portal, we’ve published an About page.

Let us know what you think!