FAQ: Technical Aspects of Rubin Science Platform Accounts for DP0

This year, the Rubin Observatory will allocate up to 300 Rubin Science Platform (RSP) accounts to data rights holders to enable community participation in Data Preview 0 (DP0): the release of simulated LSST-like data products in the RSP at the Interim Data Facility (IDF; deployed on Google Cloud) at the end of June 2021.

This post addresses some of the main differences between what these 300 DP0 delegates will see in the DP0-era RSP and what users will see in the final operations-era RSP (which is described in the RSP Vision Document). It also presents some of the Q&A that arose during the DP0 Info Sessions held in January and February 2021.

Q1: What are the main technical differences between the DP0-era RSP and the operations-era RSP as described in the RSP Vision Document?

A1: A description of the operations-era RSP for future users is provided in the RSP Vision Document. The DP0-era RSP will have limited features in many respects, as it is still under active development. Details and instructions related to the following items will be provided to DP0 delegates in June. Some of the more notable differences of the DP0-era RSP are:

  • Authentication will be performed using GitHub as the identity provider. DP0 delegates will have to create a GitHub account if they do not already have one. During operations, authentication will be done with, e.g., a US university (InCommon) identity.
  • Some major usability features will not be available, such as support for user database tables; support for parallelized or batch computation; the ability to sync files between RSP accounts and personal devices; and the ability to manage the sharing of data within private groups.
  • For the Notebook Aspect, only Python notebooks and the terminal interface are supported, and RSP users will not be able to access their Portal queries from the Notebook Aspect for DP0.1 (see also Q2).
  • For the API Aspect, only the Table Access Protocol (TAP) will be available for DP0 (i.e., catalog queries).
  • For the Portal Aspect, only catalog queries will be available. Note that the Portal Aspect has not been under active development recently and it is expected to evolve significantly before the first LSST annual data release (DR1).
  • A number of safeguards against downtime or temporary data loss will not be present; the resources are still in “trusted user” mode. DP0 delegates will be provided with guidelines that they must follow for safe usage of the RSP during DP0.
  • Performance during DP0 may not reflect the performance of the final system (see also Q4), and the resources made available to DP0 delegates may not reflect the final user quotas of the operations-era RSP.
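To illustrate the catalog-only TAP access mentioned in the list above, here is a minimal sketch of how a synchronous TAP request is formed under the IVOA TAP standard (a `sync` resource with `REQUEST`, `LANG` and `QUERY` parameters). The endpoint URL, table name and column names below are placeholders invented for illustration; the actual DP0 values are not given in this post. The sketch only builds the request URL and does not contact any service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values: the real DP0 TAP endpoint, table and column names
# are not stated in this post and will differ.
TAP_BASE_URL = "https://example.org/api/tap"
TABLE = "example_schema.object"


def build_sync_query_url(base_url, adql):
    """Return the URL of a synchronous TAP query for the given ADQL string."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


# A small cone search: at most 10 rows within 0.1 deg of (62.0, -37.0).
adql = (
    f"SELECT TOP 10 ra, dec FROM {TABLE} "
    "WHERE CONTAINS(POINT('ICRS', ra, dec), "
    "CIRCLE('ICRS', 62.0, -37.0, 0.1)) = 1"
)

url = build_sync_query_url(TAP_BASE_URL, adql)
print(url)

# Decode the query string again to confirm the ADQL survives URL encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["QUERY"][0] == adql)
```

In practice a TAP client library would usually handle the request and parse the returned table; the point here is only that DP0 API access amounts to catalog queries of this shape.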

Q2: What are the main differences between RSP functionality for DP0.1 and DP0.2?

A2: In DP0.1, the Notebook Aspect of the RSP will offer image access through a Python interface to the Butler (a middleware component of the DMS for persisting and retrieving image datasets). In DP0.2, image access will also be available via select Virtual Observatory (VO) services, both from the Notebook Aspect and through the API Aspect.

Q3: Will the ability to process the data directly using the LSST Science Pipeline Butler interface be available for DP0?

A3: Limited processing capability will be available as part of DP0. All delegates will share access to a common Butler (‘Gen3’) repository. The data products provided in the repository for DP0.1 will be read-only. Users may create their own data products, and collections thereof, in the repository; these will be available to all other delegates.

Q4: Will it be possible to prove that algorithms can scale using the DP0-era RSP?

A4: No. User batch processing and parallelization are not available in DP0. The resources provided as part of DP0.1 are limited, so meaningful scalability testing of algorithms is not practical.

Q5: Will it be possible to bulk download data from the RSP/IDF?

A5: No, bulk data downloads from the RSP specifically, or the IDF in general, are not supported during DP0. This is because bulk download services are not yet in production, and because the major goals of DP0 are to allow Rubin Observatory to evaluate how the community uses the services, and for the community to become familiar with the RSP environment. Note that DESC has made the DC2 DR6 catalogs publicly available via: https://lsstdesc-portal.nersc.gov/.

Q6: Will it be possible for DP0 delegates to add their own Google Cloud instances at the IDF in order to have additional computational resources?

A6: No.

Q7: Will there be a software environment for R in the DP0-era RSP?

A7: No, the RSP Notebook Aspect will only have Python environments. The Community Engagement Team is interested in hearing from DP0 delegates about which other JupyterLab-supported environments might be necessary for their science goals; bear in mind that using the LSST Science Pipelines requires a Python environment.

Q8: Will RSP users be able to install their own code in the DP0-era RSP?

A8: DP0 delegates will be able to add software that can be installed without privileges into their own home space (e.g., via pip install --user) for their personal use, to the extent that their quota allows. In the operations-era RSP there will be a process for users to request that certain popular packages be added to the common environment for use by all.
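As a rough illustration of what an unprivileged install means in practice, packages installed with pip install --user land in the per-user site-packages directory reported by Python's standard `site` module. The sketch below only inspects the running interpreter; the helper name `user_install_report` is made up for this example, and how the RSP's own Python environments are configured is not described in this post, so the reported values are whatever the interpreter in use provides.

```python
import site
import sys


def user_install_report():
    """Summarize where per-user packages go for the running interpreter."""
    user_site = site.getusersitepackages()
    return {
        # Base directory for per-user installs (scripts, libraries, data).
        "user_base": site.getuserbase(),
        # Directory that receives packages from `pip install --user`.
        "user_site": user_site,
        # False or None means this interpreter ignores the user site
        # directory (for example, inside some virtual environments).
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # Whether the directory is currently on the import path.
        "user_site_on_path": user_site in sys.path,
    }


report = user_install_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `user_site_enabled` is reported as false, packages installed with `--user` would not be importable from that interpreter, which is worth checking before spending quota on an install.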

Q9: Will there be support for parallel processing in the DP0-era RSP?

A9: No, there is no plan to support parallelized computation or user batch processing in DP0. In the operations-era RSP there will be support for user batch processing, subject to the limitations outlined in the Science Platform Vision Document (ls.st/lse-319): i.e., 10% of the total LSST computing and storage resources will be reserved for RSP users to process and analyze LSST data.