We plan to start sessions/runs of the Stack in a production environment.
We’ll need to ensure that ANY data produced in these runs is properly tagged with the conditions used during the run:
- software version (Stack, …)
- origin of input data
- description of input data
- values of run parameters
Is there anything foreseen or available in the LSST software, in terms of logging data, conventions, formats, or even a database, to store and retrieve run data in a conventional way?
Thanks for any suggestions.
In calendar 2016 we hope to investigate and develop “workflow” solutions that would provide storage and discovery of run information (“processing metadata”) in a database such as you are asking for, but our current facilities are rather more primitive.
CmdLineTasks persist the configuration (“values of run parameters”) to the output repository. Software versions can be obtained using “eups list --setup” and are persisted by the ctrl_orca orchestration layer, if that is used (itself or with
(DM-3372, which is not currently scheduled, may also be relevant.)
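Until a workflow system exists, one stopgap is to write a small metadata file next to each run's outputs. The following is only a sketch of that idea and is not part of the Stack: the function names, the JSON layout, and the field names are all assumptions made for illustration. The only real command it relies on is “eups list --setup”, whose raw output is stored unparsed.

```python
import json
import shutil
import subprocess
from datetime import datetime, timezone


def capture_setup_products():
    """Return the raw text of `eups list --setup`, or None if it cannot be run."""
    if shutil.which("eups") is None:
        return None  # eups is not on PATH in this environment
    result = subprocess.run(
        ["eups", "list", "--setup"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout if result.returncode == 0 else None


def build_run_record(input_origin, input_description, run_parameters,
                     setup_products=None):
    """Collect the four kinds of run conditions into one dictionary.

    The keys are hypothetical; they simply mirror the list in the question.
    """
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "software_versions": setup_products,
        "input_origin": input_origin,
        "input_description": input_description,
        "run_parameters": run_parameters,
    }


def write_run_record(record, path):
    """Write the record as JSON so it can be read back or loaded into a database later."""
    with open(path, "w") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

A run script could call `write_run_record(build_run_record(..., setup_products=capture_setup_products()), "run_metadata.json")` after processing; the file name here is likewise just a placeholder.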
I think some of this would also relate to the provenance system that @jbecla is prototyping?
Yup, I’d be very interested in connecting my provenance prototype with pipelines but I heard pipeline orchestration code is not ready for that yet and it is better to wait.
ctrl_orca, when executed through ctrl_execute (via runOrca.py), will run “eups list --setup” and deposit its output in the output directory.
The orchestration software itself used ctrl_provenance when policy files were in use, and saved the software versions to the database (along with other things). When the switchover to pex_config happened, orchestration switched to that as well. When the new provenance code is ready, we can get that functionality added back into Orca.