I know that the intention for data processing in the Rubin environment is to use Python notebooks to retrieve and process data. I am interested in understanding how this might work when data extraction is only a small part of the overall computational requirement.
To evaluate the use of the Rubin/LSST environment for exoplanet research, we have been working with the TRILEGAL Rubin/LSST simulation housed in the NOIRLab Astro Data Lab. This has allowed us to simulate the performance of the database under the conditions required for exoplanet recoveries. Our on-sky focus has been limited to the 6 DDF survey fields and will remain so.
In the operational Rubin/LSST environment the focus will be on visit data in those 6 fields.
In an early stage of our current work, I attempted to use Python notebooks to do the “after query” processing. In communicating with the support team from NOIRLab, I understood that the facility’s strengths were in data storage and retrieval, not in “after query” processing. Because the “after query” processing is intense, I downloaded the required data and have been running multi-processing on my side of the interface.
Even with multi-processing, the “after query” computing times are measured in days, if not longer. In the NOIRLab environment, the use of multi-processing seems to be expressly forbidden, and the single-thread time there was even longer than the single-thread time in our own environment. We also need access to light curve processing tools that we may not be able to find in the current Python environment, such as Transit Least Squares, and to AI/ML technologies such as XGBoost.
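To give a sense of scale, the per-target workload is roughly the following sketch (the transitleastsquares call reflects that package’s documented interface; the light-curve loading, worker count, and variable names are just placeholders, not our actual pipeline):

```python
from multiprocessing import Pool

import numpy as np
from transitleastsquares import transitleastsquares  # external, pip-installable package


def search_one_target(lightcurve):
    """Run a Transit Least Squares period search on one (time, flux) light curve."""
    time, flux = lightcurve  # placeholder structure; real inputs come from the database pull
    model = transitleastsquares(time, flux)
    results = model.power()  # this is the compute-intensive step
    return results.period, results.SDE


if __name__ == "__main__":
    # In practice this list is built from the downloaded per-target photometry.
    lightcurves = [(np.linspace(0, 180, 2000), np.random.normal(1.0, 1e-3, 2000))]
    with Pool(processes=8) as pool:
        hits = pool.map(search_one_target, lightcurves)
```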
I’m interested in any suggestions that might aid us in planning for retrieval and processing for the exoplanet transit science case. Thanks.
I can’t comment on the NOIRLab Astro DataLab resources, and it sounds like you’re already familiar with their helpdesk resources, but I can speak to the Rubin Science Platform (RSP) capabilities.
In the RSP it is possible to install other Python packages that might be needed for data analysis. But the computational resources are shared, and yes, the default allocation is minimal for the types of processing you mention.
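For example, something like the following can be run from a notebook cell (or a terminal, dropping the “!”); the package names here are just the ones mentioned above:

```python
# In an RSP notebook cell; "--user" installs go to your home area,
# which persists across sessions (drop the "!" if running from a terminal).
!pip install --user transitleastsquares xgboost
```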
In the future, there will be additional resources for LSST data processing. One will come from the Independent Data Access Centers (IDACs), some of which may offer more compute power or even GPUs. The other will be additional resources hosted by Rubin, and allocated by the Resource Allocation Committee through a proposal process. At this time there is not much concrete information on these resources or their timeline, but I can quote the RSP Roadmap: “There is a high demand for more performant computation, which we are committed to provide within our resources. A Dask (parallel Python computing) service is on the roadmap, and we are investigating ways to competitively provide access to GPU and/or other resources friendly to machine learning.”
Further information will be advertised here in the Forum when these opportunities become available.
I think this provides the current answer to your question, so I’m going to tentatively mark this post as the solution – but if this didn’t answer your question, please unmark it and reply in the thread, and we’ll continue the discussion.
Melissa, Thanks for your response. You confirmed my understanding.
In the meantime, I have worked through the basics of using the API portal rather than the Python notebooks, and I believe I have the answer to our needs. The documentation is just a tiny bit stale, but I was able to obtain the necessary API access token and retrieve basic information on the dp1 and dp2 schema tables just as a test.
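For anyone following along, the test amounted to something like the sketch below (assuming a token created as described further down, and the data.lsst.cloud TAP endpoint; whether a plain Bearer-token session suffices may depend on the deployment, so treat this as illustrative rather than the official recipe):

```python
import requests
import pyvo

# Token created via Account settings -> Access Tokens (see the notes below).
token = "<your RSP access token>"

session = requests.Session()
session.headers["Authorization"] = f"Bearer {token}"

# TAP endpoint for the Rubin Science Platform at data.lsst.cloud
tap = pyvo.dal.TAPService("https://data.lsst.cloud/api/tap", session=session)

# Standard TAP metadata query: list the schemas and tables visible to this account.
tables = tap.search("SELECT schema_name, table_name FROM tap_schema.tables").to_table()
print(tables)
```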
It appears that we will be able to continue the process used with the NOIRLab Astro Data Lab. Since we will only be monitoring about 5,000 targets in the DDF surveys, and we have the basic identifying parameters for those targets, we should be able to query the information we need and do the compute-intensive work in our external environment.
Our plan would be to pull the data for exoplanet transit light curves as each new release becomes available, and I intend to limit each pull to only the data new since the prior release, to keep our use of Rubin/LSST computing resources to a minimum.
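As a rough sketch of what each incremental pull might look like, reusing the authenticated `tap` service from the earlier snippet: the table and column names below are placeholders patterned on the DP0.2 catalogs (ForcedSource joined to CcdVisit) and would need to be checked against each release’s actual schema; the MJD boundary and objectId are likewise placeholders.

```python
# Hypothetical incremental pull for one DDF target: forced photometry
# with visit midpoints newer than the prior release boundary.
last_release_mjd = 60500.0          # placeholder boundary from the prior release
target_object_id = 1234567890123    # placeholder objectId from our target list

query = f"""
SELECT src.band, src.psfFlux, src.psfFluxErr, vis.expMidptMJD
FROM dp02_dc2_catalogs.ForcedSource AS src
JOIN dp02_dc2_catalogs.CcdVisit AS vis ON vis.ccdVisitId = src.ccdVisitId
WHERE src.objectId = {target_object_id}
  AND vis.expMidptMJD > {last_release_mjd}
"""

lightcurve = tap.search(query).to_table()
```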
If this thinking is counter to the plan of the operations team, please let me know.
On the API page, the sentence that begins “To access APIs” contains two links. If I had taken the first link, I probably would have experienced things differently, but I took the second link to the user guide (Creating user tokens — Rubin Science Platform).
The second step of the instructions on the user guide page (“Select Security tokens from the user drop-down menu at the upper right.”) clearly points the user to the drop-down next to the user name at the top right of the screen.
After realizing that I needed to log on, I did as instructed and opened the drop-down next to the user name. There is no mention of tokens in that drop-down, as the guide suggests there would be.
However, there is a link to “Settings” (Account settings | Rubin Science Platform). On the settings page, “Access Tokens” is the first item in the settings list. It’s easy, breezy after that.
So the crux of this story is that only first-timers will fall into this documentation trap, and even then only if they don’t sign in before reading the user guide.