Intensive Compute Time

I know that the intention for data processing in the Rubin environment is to use Python notebooks to retrieve and process data. I am interested in understanding how this might function when data extraction is only a small fraction of the total computational requirement.

To evaluate the use of the Rubin/LSST environment for exoplanet research, we have been working with the TRILEGAL Rubin/LSST simulation hosted at the NOIRLab Astro Data Lab. This has allowed us to simulate the performance of the database under the conditions required for exoplanet recoveries. The focus of our on-sky work has been limited to the six Deep Drilling Field (DDF) survey fields and will remain so.
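For context, our retrieval step looks something like the sketch below, which assumes the Data Lab `queryClient` interface; the table and column names here are illustrative placeholders, not the exact ones we use:

```python
# Sketch of the retrieval step against the Astro Data Lab.
# Table and column names are placeholders for illustration only.
from dl import queryClient as qc

sql = """
SELECT ra, dec, gmag, rmag, imag
FROM lsst_sim.simdr2
WHERE ra BETWEEN 59.5 AND 60.5
  AND dec BETWEEN -49.0 AND -48.0
"""

# Synchronous query; fmt='pandas' returns a DataFrame directly.
df = qc.query(sql=sql, fmt='pandas')
print(len(df), "simulated stars retrieved")
```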

In the operational Rubin/LSST environment, our focus will be on visit data in those six fields.

In an early stage of our current work, I attempted to use Python notebooks to do the “after query” processing. In communicating with the NOIRLab support team, I understood that the facility’s strengths lie in data storage and retrieval, not in “after query” processing. Because that processing is compute-intensive, I downloaded the required data and have been running multiprocessing on my side of the interface.
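To make that concrete, the “after query” step is essentially an independent per-star computation, so it parallelizes naturally. A minimal sketch of the pattern we run locally (the per-star function here is a dummy stand-in for the real transit search):

```python
# Minimal sketch of the local "after query" parallelization.
# process_star() is a stand-in for the real per-star work.
from multiprocessing import Pool

import numpy as np

def process_star(star_id):
    """Dummy computation standing in for detrending, period
    search, and transit fitting on one star's light curve."""
    rng = np.random.default_rng(star_id)
    flux = rng.normal(1.0, 1e-3, 5000)   # fake light curve
    return star_id, float(flux.std())    # fake statistic

if __name__ == "__main__":
    star_ids = range(10_000)  # one entry per downloaded star
    with Pool(processes=8) as pool:
        # imap_unordered yields results as workers finish them,
        # which keeps memory use flat for large catalogs.
        for star_id, stat in pool.imap_unordered(process_star, star_ids, chunksize=64):
            pass  # in practice, write each result to disk here
```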

Even with multiprocessing, the “after query” computing times are measured in days, if not longer. At NOIRLab, the use of multiprocessing appears to be expressly forbidden, and the single-thread time in the NOIRLab environment was even longer than the single-thread time in our own environment. We also need access to light-curve processing tools, such as Transit Least Squares, that we may not find in the current Python environment, and to AI/ML technologies such as XGBoost.
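For concreteness, the per-light-curve search we need to run looks like the following, using the `transitleastsquares` package; the light curve here is synthetic, just to show the call pattern:

```python
# Single-star search with Transit Least Squares (TLS).
# The light curve is synthetic noise, to illustrate the call pattern.
import numpy as np
from transitleastsquares import transitleastsquares

t = np.linspace(0, 180, 5000)             # days of (fake) DDF sampling
rng = np.random.default_rng(0)
flux = 1.0 + rng.normal(0, 1e-3, t.size)  # flat light curve plus noise

model = transitleastsquares(t, flux)
results = model.power()                   # the expensive step
print(results.period, results.SDE)        # best period and detection statistic
```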

I’m interested in any suggestions that might aid us in planning for retrieval and processing for the exoplanet transit science case. Thanks.


Hi @sdcorle1, thanks for posting your question.

I can’t comment on the NOIRLab Astro Data Lab resources, and it sounds like you’re already familiar with their helpdesk resources, but I can speak to the Rubin Science Platform (RSP) capabilities.

In the RSP it is possible to install additional Python packages that might be needed for data analysis (see the example below). But the computational resources are shared, and yes, the default allocation is minimal for the types of processing you mention.
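For example, from an RSP notebook you can do a user-level install, assuming the packages are available from PyPI:

```python
# In an RSP notebook cell: user-level installs persist in your home area.
%pip install --user transitleastsquares xgboost

# After restarting the kernel, confirm the packages are importable:
from importlib.metadata import version
print(version("transitleastsquares"), version("xgboost"))
```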

In the future, there will be additional resources for LSST data processing. One will come from the Independent Data Access Centers (IDACs), some of which may offer more compute power or even GPUs. The other will be additional resources hosted by Rubin, and allocated by the Resource Allocation Committee through a proposal process. At this time there is not much concrete information on these resources or their timeline, but I can quote the RSP Roadmap: “There is a high demand for more performant computation, which we are committed to provide within our resources. A Dask (parallel Python computing) service is on the roadmap, and we are investigating ways to competitively provide access to GPU and/or other resources friendly to machine learning.”
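To give a flavor of what a Dask service would enable, a per-object workload like yours maps naturally onto Dask. A minimal sketch, assuming a Dask scheduler is available (here the local default; a future RSP service would supply the cluster instead):

```python
# Illustrative sketch of a per-star workload expressed with Dask.
# Runs on the local default scheduler; a Dask cluster would
# distribute the same code across workers.
import numpy as np
import dask.bag as db

def process_star(star_id):
    """Dummy per-star computation standing in for a real transit search."""
    rng = np.random.default_rng(star_id)
    flux = rng.normal(1.0, 1e-3, 5000)
    return star_id, float(flux.std())

bag = db.from_sequence(range(10_000), npartitions=64)
results = bag.map(process_star).compute()  # runs in parallel across workers
```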

Further information will be advertised here in the Forum when these opportunities become available.

I think this provides the current answer to your question, so I’m going to tentatively mark this post as the solution. If this didn’t answer your question, please unmark it and reply in the thread, and we’ll continue the discussion.