Today we’re pleased to launch a new portal for Rubin Observatory documentation:
With this new website, you can browse the Rubin Observatory’s documents and search across the full-text content. This new search site will update immediately as we produce new documentation.
Content coverage and expansion possibilities
At launch, the portal is focused on PDF documents and Sphinx-based technotes hosted on LSST the Docs (the lsst.io domain). This covers many of our documents with handles such as DMTN, DMTR, ITTN, LDM, LPM, LSE, OPSTN, PSTN, RTN, SMTN, SITCOMTN, SQR, and TSTN. There is a lot more documentation out there though, and our goal is to ultimately make all of these types of content discoverable and searchable from the documentation portal:
- Sphinx-formatted user guides (such as pipelines.lsst.io)
- Community forum posts
- GitHub repository READMEs
- Confluence wiki pages
- Documents held in DocuShare that don’t already have counterparts on LSST the Docs
- Presentations archived on Zenodo
- RFCs and other key Jira projects
We hope to add this content gradually, with user guides as our highest priority.
We’re subscribing to the adage that if everything is perfect, you’ve launched too late. You can help us improve the documentation portal and prioritize new features by giving us feedback. Since you’re already here on the Community forum, feel free to reply to this post. You can also give feedback on these other platforms:
- Create a GitHub issue
- Create a Jira ticket in the
- Chat in #dm-docs on Slack (internal)
How we built it
We built the documentation portal on an exciting technology stack that both leverages our existing expertise in backend infrastructure and points to new directions in front-end engineering.
On the backend, we built an application called Ook (after the Librarian in Discworld) that is responsible for receiving and processing content ingest requests. Ook transforms content into small, structured records that it uploads into our search database.
Ook itself is a Python microservice built on aiohttp, along with our Safir framework and Kafkit library. We’ve deployed Ook on our Roundtable Kubernetes cluster, which runs on Google Cloud. Since its launch last year, Roundtable has proved to be a tremendously successful Kubernetes platform for the SQuaRE team to deploy apps that don’t need direct access to the Science Platform. LSST the Docs, sqrbot-jr, templatebot, Vault, checkerboard, segwarides, and neophile are some other apps that serve Rubin Observatory through the Roundtable platform.
Ook receives ingest requests either through an HTTP API or messages in a Kafka topic. For example, we added a service (LTD Events) to LSST the Docs that produces a Kafka message whenever documentation is updated. Through either method, Ook quickly classifies the content to determine if and how it should be ingested. Ook queues ingest tasks into another Kafka topic, which lets Ook buffer and distribute the load to a dedicated set of Kubernetes pods.
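To make the classify-then-queue flow a bit more concrete, here is a minimal sketch in Python. It is not Ook's actual code: the `ContentType` enum, the `classify` and `handle_request` functions, and the mapping of handle series to content types are all hypothetical, and a plain `deque` stands in for the Kafka topic that holds queued ingest tasks.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Deque, Optional
from urllib.parse import urlparse


class ContentType(Enum):
    PDF_DOCUMENT = "pdf_document"
    SPHINX_TECHNOTE = "sphinx_technote"


# Hypothetical groupings for illustration only; the real classification
# rules are not described in this post.
PDF_SERIES = {"ldm", "lpm", "lse"}
TECHNOTE_SERIES = {"dmtn", "sqr", "rtn"}


@dataclass(frozen=True)
class IngestTask:
    url: str
    content_type: ContentType


def classify(url: str) -> Optional[ContentType]:
    """Decide if and how a URL should be ingested (None means skip it)."""
    host = urlparse(url).hostname or ""
    if not host.endswith(".lsst.io"):
        return None
    # "sqr-000.lsst.io" -> "sqr-000" -> "sqr"
    series = host.split(".")[0].split("-")[0].lower()
    if series in PDF_SERIES:
        return ContentType.PDF_DOCUMENT
    if series in TECHNOTE_SERIES:
        return ContentType.SPHINX_TECHNOTE
    return None


def handle_request(url: str, queue: Deque[IngestTask]) -> bool:
    """Classify a request and, if it is ingestible, queue a task for a worker."""
    content_type = classify(url)
    if content_type is None:
        return False
    queue.append(IngestTask(url, content_type))
    return True


# Stand-in for the Kafka topic of queued ingest tasks.
task_queue: Deque[IngestTask] = deque()
handle_request("https://sqr-000.lsst.io/", task_queue)
handle_request("https://example.com/", task_queue)  # skipped, not on lsst.io
```

Separating the quick classification step from the slower ingest work is what lets the queue absorb bursts of update events while workers process tasks at their own pace.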
How Ook goes about ingesting a document depends on the content type. For PDF documents, Ook uses metadata and content that’s extracted directly from the TeX source by our PDF landing page static site generator, Lander. For Sphinx-formatted technotes, Ook extracts metadata and content directly from the published HTML. In both cases, we break a full page of content into smaller records covering individual subsections or paragraphs. Smaller content chunks lead to faster and more relevant search results. To structure our search records, and in turn configure our search indices, we followed the recommendations in this Algolia blog post.
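As a rough illustration of breaking a page into smaller records, the sketch below splits HTML into one record per heading using only Python's standard library. This is an assumption-laden simplification rather than Ook's real parser: the `SectionSplitter` class, the `build_records` function, and the `url`, `heading`, and `content` field names are hypothetical. Only `objectID` is taken from Algolia, where it is the unique identifier of a record.

```python
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Collect text under each heading (text before the first heading is dropped)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.sections = []  # list of (heading text, list of text chunks)
        self._in_heading = False
        self._heading_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            self._in_heading = False
            # Normalize whitespace in the heading and start a new section.
            heading = " ".join("".join(self._heading_parts).split())
            self.sections.append((heading, []))

    def handle_data(self, data):
        if self._in_heading:
            self._heading_parts.append(data)
        elif self.sections and data.strip():
            self.sections[-1][1].append(data.strip())


def build_records(base_url, html):
    """Turn one HTML page into a list of small, section-level search records."""
    splitter = SectionSplitter()
    splitter.feed(html)
    splitter.close()
    return [
        {
            "objectID": f"{base_url}#{index}",  # unique per section
            "url": base_url,
            "heading": heading,
            "content": " ".join(chunks),
        }
        for index, (heading, chunks) in enumerate(splitter.sections)
    ]


page = "<h1>Title</h1><p>Intro text.</p><h2>Method</h2><p>First.</p><p>Second.</p>"
records = build_records("https://example.lsst.io/", page)
```

Each record carries enough context (the page URL and section heading) to be displayed on its own in search results, which is the point of chunking the page in the first place.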
Algolia is our search database and API. For this initial launch, we’re fortunate to use Algolia’s open-source tier. By working with Algolia, we could focus on aspects specific to Rubin Observatory, such as ingesting our specific types of content and building out a front-end that meets our needs, without getting bogged down with the general problem of operating a search engine.
To build the website (the front-end), we adopted Gatsby, a static site generator built on React. Since Gatsby builds static sites, we could deploy it on LSST the Docs behind the Fastly CDN. To build the search interfaces, we used Algolia’s InstantSearch library for React. We loved how InstantSearch facilitates progressive customization. For example, we could prototype a new search component directly from InstantSearch with only minimal configuration. Once we adopted the component, we could style it through styled-components. If we needed to dramatically customize the component, we could do so with the Connectors API to implement our own UI but keep the prebuilt Algolia integration.