Composite Datasets in Butler

A feature has been added to Butler that lets a single object, accessed through Butler.get or Butler.put, contain one or more component dataset objects that are read from, or written to, disk separately. Documentation about Composite Datasets can be found in the draft version of LDM-463.

@jbosch provided the description and use cases for this feature. The use cases and some of the supporting design pseudocode can be found on Confluence.

This feature is ‘done’ for the time being. Jim and I would like people to use the feature, ask for help where needed, and provide feedback. We will gather feature requests and work with the schedulers to plan any additional work that is needed.

To clarify a bit: actually being able to use the composite datasets functionality requires work in three places:

  • Within the butler itself. This is what @natepease has completed.

  • In the policy files for the mappers. This should be quite simple.

  • In the low-level persistence for objects (such as Exposure) we’d like to treat as composites. The work needed to get one of these usable as a composite is probably quite small, but the work to make it efficient is more substantial.
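
As a rough illustration of how those three pieces fit together, here is a toy, in-memory sketch. Nothing in it is the real Butler API or the real policy syntax: `POLICY`, `ToyButler`, the `assemble`/`disassemble` methods, and the dataset type names are all hypothetical stand-ins, and the real policy layout is whatever LDM-463 describes. The only point is that a composite `put` is split into separately stored components, and a composite `get` rebuilds the object from them.

```python
# Toy illustration only: none of these names are the real Butler API.

# Policy level (hypothetical layout, a dict instead of a mapper policy file):
# declares which dataset types are composites and names their components.
POLICY = {
    "exposure": {
        "components": {
            "image": "exposure_image",
            "visit_info": "exposure_visit_info",
        }
    }
}


class Exposure:
    """Stand-in for an object we would like to treat as a composite."""

    def __init__(self, image, visit_info):
        self.image = image
        self.visit_info = visit_info

    # Object level: how to split into, and rebuild from, components.
    def disassemble(self):
        return {"image": self.image, "visit_info": self.visit_info}

    @classmethod
    def assemble(cls, components):
        return cls(**components)


class ToyButler:
    """Butler level: routes get/put through the policy."""

    def __init__(self, policy, python_types):
        self.policy = policy
        self.python_types = python_types
        self.storage = {}  # stands in for separate files on disk

    @staticmethod
    def _key(dataset_type, data_id):
        # Make the dict-style data id hashable so it can be a storage key.
        return (dataset_type, tuple(sorted(data_id.items())))

    def put(self, obj, dataset_type, data_id):
        entry = self.policy.get(dataset_type)
        if entry is None:
            # Not a composite: store the object as a single dataset.
            self.storage[self._key(dataset_type, data_id)] = obj
            return
        # Composite: write each component as its own dataset.
        parts = obj.disassemble()
        for name, component_type in entry["components"].items():
            self.put(parts[name], component_type, data_id)

    def get(self, dataset_type, data_id):
        entry = self.policy.get(dataset_type)
        if entry is None:
            return self.storage[self._key(dataset_type, data_id)]
        # Composite: read each component, then assemble the parent object.
        parts = {
            name: self.get(component_type, data_id)
            for name, component_type in entry["components"].items()
        }
        return self.python_types[dataset_type].assemble(parts)


butler = ToyButler(POLICY, {"exposure": Exposure})
butler.put(
    Exposure(image=[[1, 2], [3, 4]], visit_info={"exptime": 30.0}),
    "exposure",
    {"visit": 1},
)
exposure = butler.get("exposure", {"visit": 1})  # rebuilt from its components
visit_info = butler.get("exposure_visit_info", {"visit": 1})  # one component alone
```

In this toy version the object-level support is trivial because the components are already separate in memory. The efficiency concern in the last bullet arises when the existing persistence code reads or writes everything together, which the sketch does not model.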

Per a recent discussion in the #dm-science-pipelines Slack channel, turning ExposureInfo and VisitInfo into components of an Exposure composite dataset might be a good idea™ soon-ish.
