On most Thursday afternoons we deploy updates and perform routine maintenance on our various Science Platform instances, including those at the IDF (data.lsst.cloud, operations) and the LDF (lsp-stable, construction). These usually start at 3pm Pacific (22:00 UT in summer, 23:00 UT in winter). Things to note:
- This is not a planned outage period and you will not be asked to stop using the Platform.
- However, there may be transient service interruptions or instabilities during this time, and very rarely a deployment can take longer than anticipated because of unforeseen issues that did not show up in testing.
- If you are using the RSP during this time, save your work early and often, as on rare occasions user sessions may need to be terminated.
- Please wait until after this window to file bug reports, and only if you still see the problem.
Patch Thursdays typically last no more than two hours, often significantly less. You will know Patch Thursday is finished when the red banner is removed from the landing page (for the moment, you will have to refresh the page in your browser to check that it has disappeared).
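If you want to know when the usual 3pm Pacific start falls in your own timezone, the conversion can be done with the IANA timezone database. The sketch below is only an illustration, not an official RSP tool: it assumes Python 3.9+ (for `zoneinfo`), assumes "Pacific" means the `America/Los_Angeles` zone, and the helper names `next_thursday` and `patch_window_start` are hypothetical.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

# Assumption: "3pm Pacific" means 15:00 in the IANA zone America/Los_Angeles,
# so US daylight-saving rules decide whether it is 22:00 or 23:00 UTC.
PACIFIC = ZoneInfo("America/Los_Angeles")
START = time(15, 0)


def next_thursday(today: date) -> date:
    """Return today if it is a Thursday, otherwise the next Thursday (hypothetical helper)."""
    days_ahead = (3 - today.weekday()) % 7  # Monday == 0, Thursday == 3
    return today + timedelta(days=days_ahead)


def patch_window_start(day: date, tz_name: str = "UTC") -> datetime:
    """Return the usual 15:00 Pacific start on `day`, expressed in `tz_name` (hypothetical helper)."""
    local_start = datetime.combine(day, START, tzinfo=PACIFIC)
    return local_start.astimezone(ZoneInfo(tz_name))


if __name__ == "__main__":
    thursday = next_thursday(date.today())
    for zone in ("UTC", "Europe/Paris", "America/Santiago"):
        print(zone, patch_window_start(thursday, zone).strftime("%Y-%m-%d %H:%M %Z"))
```

Because the conversion goes through the timezone database, the switch between 22:00 and 23:00 UT follows the US daylight-saving dates automatically. It knows nothing about weeks when Patch Thursday is skipped or runs late, so the banner remains the authoritative signal.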
Additional information that you don’t have to know but might be curious about.
What actually happens?
A lot of activity, including applying security patches to third-party dependencies and bringing our own services up to their latest versions (picking up both bugfixes and improvements). For the Interim Data Facility hosted at Google, we also roll out infrastructure updates (update practices on our on-premises Kubernetes clusters differ by infrastructure provider). Notable user-visible changes are announced in our Community Forum.
Why does this happen every week?
The RSP is a primary user interface to our data and services and is under active development, so we balance the desire for new features and quick bugfix turnaround against the desire for reliable, uninterrupted service. Because of the public profile of the project, we also need to stay on top of security updates. Earlier in construction we did nearly continuous deployment, but as we add more external users we have found that a weekly cadence for production clusters offers a good compromise. In some circumstances, hotfixes for serious issues can be deployed immediately; the weekly cadence is a policy choice, not a technical constraint. For a detailed look at our deployment practices, see sqr-056.lsst.io.
Why Thursday?
Partly for historical reasons and partly for practical ones: it falls between Wednesday (the weekly release day for Science Pipelines code) and Friday (when we try to keep production systems stable, both to avoid problems running into the weekend and to support internal programmes such as Data Management’s Focus Fridays).
Why can’t you do this out of hours?
We have users all over the world, including construction teams in European timezones, so there is no real “out of hours”. We also give priority attention to our summit (telescope) services at night. And if we run into issues with our on-premises infrastructure deployments, it is easier to find IT help during the day.
Why do I have to refresh my browser like an animal?
This is interim functionality to remind users (particularly those external to the project) of Patch Thursday and other important service news. The RSP roadmap includes a number of more sophisticated approaches, including per-user notifications, browser notifications, and a platform status page indicating service health.