I know that the intention for data processing in the Rubin environment is to use Python notebooks to retrieve and process data. I am interested in understanding how this will work when data extraction is only a small part of the overall computational requirement.
To evaluate the use of the Rubin/LSST environment for exoplanet research, we have been working with the TRILEGAL Rubin/LSST simulation hosted at NOIRLab's Astro Data Lab. This has allowed us to simulate the performance of the database under the conditions required for exoplanet recovery. Our on-sky focus has been limited to the six Deep Drilling Field (DDF) survey fields and will remain so.
In the operational Rubin/LSST environment the focus will be on visit data in those six fields.
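For context, below is a minimal sketch of the kind of extraction step we run against Data Lab. The table name (lsst_sim.simdr2), the column list, and the sky cut are placeholders rather than our exact query, and the default CSV return format is assumed.

```python
# Sketch of a Data Lab extraction step for the TRILEGAL simulation.
# Table/column names and the footprint cut are placeholders only.
from io import StringIO

import pandas as pd
from dl import queryClient as qc

query = """
SELECT ra, dec, gmag, rmag, imag
FROM lsst_sim.simdr2
WHERE ra BETWEEN 52.0 AND 54.0
  AND dec BETWEEN -29.0 AND -27.0
LIMIT 100000
"""

# queryClient.query is assumed to return CSV text by default;
# parse it into a DataFrame locally for the "after query" work.
result_csv = qc.query(sql=query)
stars = pd.read_csv(StringIO(result_csv))
print(len(stars), "simulated stars retrieved")
```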
In an early stage of our current work, I attempted to use Python notebooks to do the “after query” processing. In communicating with the NOIRLab support team, I understood that the facility's strengths lie in data storage and retrieval, not in “after query” processing. Because the “after query” processing is compute-intensive, I downloaded the required data and have been running it with multiprocessing on my side of the interface.
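For concreteness, a stripped-down sketch of that local step is below; the file layout and the per-star function are placeholders rather than our actual pipeline.

```python
# Rough sketch of the local "after query" step, assuming each downloaded
# light curve lives in its own CSV file under downloads/lightcurves/.
import glob
from multiprocessing import Pool

import pandas as pd


def process_light_curve(path):
    """Load one light curve and run the per-star analysis (placeholder)."""
    lc = pd.read_csv(path)  # columns assumed: time, flux, flux_err
    # ... detrending / period search / transit fitting would go here ...
    return path, len(lc)


if __name__ == "__main__":
    files = glob.glob("downloads/lightcurves/*.csv")
    # Spread the per-star work across all available cores.
    with Pool() as pool:
        results = pool.map(process_light_curve, files)
```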
Even with multiprocessing, the “after query” computing times are measured in days, if not longer. At NOIRLab, the use of multiprocessing appears to be expressly forbidden, and the single-thread time in the NOIRLab environment was even longer than the single-thread time in our own environment. We also need access to light-curve processing tools that we may not find in the current Python environment, such as Transit Least Squares, and to AI/ML tools such as XGBoost.
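As an illustration of the per-star processing we mean, here is a minimal Transit Least Squares call on a synthetic light curve; the injected signal and array sizes are only for demonstration, not representative of our data volumes.

```python
# Example of the per-star tool we need in the notebook environment:
# a Transit Least Squares period search on one light curve.
import numpy as np
from transitleastsquares import transitleastsquares

rng = np.random.default_rng(42)
t = np.linspace(0.0, 60.0, 3000)              # time in days
flux = 1.0 + rng.normal(0.0, 5e-4, t.size)    # flat star plus noise

# Inject a crude 0.2%-deep box transit every 3.5 days, 0.1 days long.
period, depth, duration = 3.5, 0.002, 0.1
flux[(t % period) < duration] -= depth

model = transitleastsquares(t, flux)
results = model.power()
print("Recovered period:", results.period, "d, SDE:", results.SDE)
```

Scaling this kind of search to every star in the DDF footprints is what drives the multi-day run times mentioned above.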
I’m interested in any suggestions that might aid us in planning for retrieval and processing for the exoplanet transit science case. Thanks.