We plan to start sessions/runs of the Stack in a production environment.
We’ll need to ensure that ANY data produced in these runs are properly tagged with the conditions used during these runs:
date
software version (Stack, …)
origin of input data
description of input data
values of run parameters
Is there anything foreseen or available in the LSST software, in terms of logging data, conventions, formats, or even a database, to conventionally store and retrieve run data?
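To make the question concrete, here is a minimal sketch of the kind of record we would like attached to every run. This is purely illustrative: the field names, values, and the idea of a JSON sidecar file are my own placeholders, not an existing Stack convention.

```python
# Hypothetical run-metadata record; every field name and value below is a
# placeholder for illustration, not an existing Stack or LSST convention.
import json
import datetime

run_metadata = {
    "date": datetime.datetime.utcnow().isoformat() + "Z",
    "software_versions": {"lsst_apps": "w_2015_45"},  # e.g. taken from "eups list --setup"
    "input_origin": "CFHT MegaCam raw exposures",     # placeholder
    "input_description": "r-band images, run 1234",   # placeholder
    "run_parameters": {"doFlatten": True},            # placeholder config values
}

# Write the record next to the run outputs so it can be retrieved later.
with open("run_metadata.json", "w") as f:
    json.dump(run_metadata, f, indent=2)
```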
In calendar 2016 we hope to investigate and develop “workflow” solutions that would provide storage and discovery of run information (“processing metadata”) in a database such as you are asking for, but our current facilities are rather more primitive.
Our CmdLineTasks persist the configuration (“values of run parameters”) to the output repository. Software versions can be obtained using “eups list --setup” and are persisted by the ctrl_orca orchestration layer, if that is used (either by itself or with ctrl_execute).
(DM-3372, which is not currently scheduled, may also be relevant.)
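For runs launched without the orchestration layer, something equivalent can be done by hand. Below is a minimal sketch, assuming only that “eups list --setup” prints the set-up products; the output path and file name are my own choices, not a Stack convention.

```python
# Minimal sketch: capture the set-up package versions the same way one would
# want them recorded for provenance, and store them next to the output
# repository.  The repository path and file name are arbitrary choices.
import os
import subprocess

output_repo = "/path/to/output/repo"  # hypothetical output repository
listing = subprocess.check_output(["eups", "list", "--setup"])

with open(os.path.join(output_repo, "eups_versions.txt"), "wb") as f:
    f.write(listing)
```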
Yup, I’d be very interested in connecting my provenance prototype with the pipelines, but I heard the pipeline orchestration code is not ready for that yet and that it is better to wait.
ctrl_orca, when executed through ctrl_execute (via runOrca.py), will execute “eups list --setup” and deposit its output in the output directory.
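To use that deposited listing for provenance checks, one could read it back into a product-to-version mapping. The sketch below assumes the usual “product  version  [tags]” one-line-per-product layout of “eups list” output; the file name is hypothetical, since the exact name ctrl_execute uses is not specified here.

```python
# Sketch: read a saved "eups list --setup" dump back into a
# {product: version} mapping.  Assumes a "product  version  [tags]"
# one-line-per-product layout; the file name is hypothetical.
def read_eups_versions(path):
    versions = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                versions[fields[0]] = fields[1]
    return versions

versions = read_eups_versions("eups_versions.txt")
print(versions.get("afw"))  # e.g. the afw version used for the run
```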
The orchestration software itself used ctrl_provenance when policy files were in use, and saved the software versions to the database (along with other things). When the switchover to pex_config happened, orchestration switched to that as well. When the new provenance code is ready, we can get that functionality added back into Orca.