2025-07-04 RSP @ data.lsst.cloud : Severe disruption Monday July 7th (and some happier news)

They Call It Stormy Monday

First the bad news: due to extensive electrical work at our US Data Facility at SLAC, services at data.lsst.cloud will be severely disrupted on Monday July 7th. This will include all catalogs being offline. The estimated outage period is from 4am Pacific (11:00 UT) until approximately 8pm Pacific (03:00 UT on Tuesday).

The following table shows service availability while SLAC is down:

| Service | DP1 | DP0.3 | DP0.2 |
|---|---|---|---|
| TAP (catalogs) | :x: | :x: | :x: |
| HiPS | :white_check_mark: | N/A | :x: |
| SIA (images) | :white_check_mark: | N/A | :x: |
| SODA (cutouts) | :white_check_mark: | N/A | :x: |
| Butler (images) | :white_check_mark: | N/A | :x: |
| ObsTAP (images) | :x: | N/A | :x: |

Things you can do during this time:

  • Access the Notebook aspect
  • Use services marked with :white_check_mark: above
  • Analyze previously retrieved data
  • Access services from external archives

To understand more about what is unavailable and why, check out this page in the RSP docs. Needless to say, we really regret the interruption, especially since…

You folks sure got busy with DP1

It has been tremendous fun (and a little bit terrifying) to see the intense user activity following the release of Data Preview 1. We are learning a lot about how to prepare our system for the survey-era data releases - no matter how much automated testing we do, it’s nothing like having real users on the system[1]. By the end of the first week we had 534 active user sessions and millions of calls to our APIs.

And it seems discoveries made with the RSP have already been announced, after only five days online! Amazing. And thank you to those who gave us a shout-out in your papers; it's not that we need the citations per se, but it sure makes us feel good to know our work is helping you out.

That said…

Steady on the enthusiasm (and the service calls)

We are not currently applying service call rate limits during this RSP preview period, as one of our goals is to understand natural usage patterns. We will introduce rate limits gradually as we determine what values allow you the best performance while preserving the service for all users.

That said, even without limits you can really slow yourself down (and others) by flooding the services. If you are making massive and/or parallel scripted calls, adjust your code so that you are not requesting data at a higher rate than you are receiving it back. This will give you optimal throughput.
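One way to follow this advice is simple client-side pacing: send queries sequentially, wait for each response before issuing the next, and cap the request rate. Here is a minimal sketch of that pattern; `run_query`, `fetch_all`, the `Throttle` class, and the rate value are all hypothetical illustrations, not part of any RSP API or an official limit.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self._last_call = 0.0

    def wait(self) -> None:
        """Sleep just long enough to respect the minimum interval."""
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval_s:
            time.sleep(self.min_interval_s - elapsed)
        self._last_call = time.monotonic()


def fetch_all(queries, run_query, max_rate_hz=2.0):
    """Run queries one at a time, waiting for each result before sending
    the next, and never exceeding max_rate_hz requests per second.

    run_query is a placeholder for whatever service call your script
    makes (e.g. submitting a TAP query and fetching its result).
    """
    throttle = Throttle(1.0 / max_rate_hz)
    results = []
    for q in queries:
        throttle.wait()               # pace the outgoing requests...
        results.append(run_query(q))  # ...and block until each one returns
    return results
```

Because each call blocks until its result arrives, you never have more than one request outstanding, which is exactly the "don't ask faster than you receive" behavior described above.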

Latest updates

Most of the updates last Patch Thursday focused on performance fixes for the services. We do have an ongoing issue that has led us to truncate the display of the user query history in the portal Job Monitor to only the last 50 queries, but the full history will be available once the bug is fixed.

Stay tuned as we expect to roll out new capabilities in the coming weeks.


  1. For example, bot testers don't put their laptop lid down and walk away; one of the more fun bugs we found this week was a proxy issue in the notebook service caused by suspended sessions, even though the service had been automatically tested with 10× more users than we saw this week.


I am very sorry to report that system recovery at our Rubin US Data Facility has run into complications. The ETA for restoring services to the Rubin Science Platform is not clear and is heading into Tuesday at this point.

The banner on data.lsst.cloud is updated with the latest information we have.

We feel your frustration.