Use cases for Butler Composite Datasets

I’m gathering high-level descriptions of use cases for Butler Composite Datasets. @jbosch recently provided some to me; they are captured on Confluence at Use Cases for Composite Datasets.

The requirements document provides a high-level description of the composite dataset feature.

If you’re interested, please take a look. I’m specifically seeking:

  • questions and corrections about currently published use cases
  • new use cases

(Note: I wrote the use case titles based on my reading of what Jim wrote. I’ve asked him to review them, but he’s traveling, so it may take a few days for him to get to it. If a title seems wrong, it may very well be my mistake; let me know.)

Per conversations at the All Hands meeting, after defining the use cases the next step will be to write pseudocode for them. Revisions to the requirements and to the design/implementation proposal will follow after that.

If you’d prefer discussion on the other page, feel free to kick me over there. One use case that I think is not covered (correct me if I’m wrong) is subsectioning repositories along the dataset axis.

For example, the calexp is a composite dataset comprising a MaskedImage and other things like the Psf, Wcs, and Calib objects. The src is effectively a composite dataset too (though it is not stored this way), because it needs the Calib object from the calexp in order to put the instrumental fluxes in a calibrated system.

We know of cases where we want to create a new repository with just the src catalogs for all the dataIds because the images are so large. Maybe this is captured elsewhere in the requirements, but this implies that composite datasets should be able to share components and be able to (un)persist the shared components independently.
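To make the shared-component idea concrete, here is a minimal Python sketch. It is not the Butler API: `ToyRepository`, `put_component`, `define_composite`, `get`, and `subsection` are hypothetical names invented for illustration, and a dataId is assumed to be a plain `(visit, ccd)` tuple. Each component is persisted once under its own key, and a composite such as calexp or src is only a list of component keys. The Calib is therefore stored once and shared, and a repository can be subsectioned down to src plus the components it references, leaving the large MaskedImage behind.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepository:
    """Hypothetical in-memory store: components are persisted individually,
    and a composite is just a list of the component keys it is built from."""
    components: dict = field(default_factory=dict)  # (component, dataId) -> object
    composites: dict = field(default_factory=dict)  # (datasetType, dataId) -> [component keys]

    def put_component(self, name, data_id, obj):
        # Persist one component on its own, independent of any composite.
        self.components[(name, data_id)] = obj

    def define_composite(self, dataset_type, data_id, component_names):
        # A composite only records references; nothing is copied.
        self.composites[(dataset_type, data_id)] = [(n, data_id) for n in component_names]

    def get(self, dataset_type, data_id):
        # Assemble the composite from its (possibly shared) components.
        keys = self.composites[(dataset_type, data_id)]
        return {name: self.components[(name, did)] for name, did in keys}

    def subsection(self, dataset_types):
        """Return a new repository holding only the requested dataset types
        and the components they reference."""
        new = ToyRepository()
        for (dtype, data_id), keys in self.composites.items():
            if dtype in dataset_types:
                new.composites[(dtype, data_id)] = list(keys)
                for key in keys:
                    new.components[key] = self.components[key]
        return new


# Usage: calexp and src share the same Calib component.
repo = ToyRepository()
data_id = (1234, 5)  # hypothetical (visit, ccd)
repo.put_component("maskedImage", data_id, "big pixel array")
for name in ("psf", "wcs", "calib"):
    repo.put_component(name, data_id, f"{name} object")
repo.put_component("catalog", data_id, "source table")
repo.define_composite("calexp", data_id, ["maskedImage", "psf", "wcs", "calib"])
repo.define_composite("src", data_id, ["catalog", "calib"])

small = repo.subsection({"src"})  # src catalogs only, no pixels
```

The design choice being illustrated is that sharing lives in the references, not in the stored objects, which is what lets shared components be (un)persisted independently of any one composite.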

This is a fine place for conversation and sharing use case ideas. This use case does seem unique, especially where it identifies the need to support subsectioning repositories. I’ve added it to the other page.

Simon’s use case is one that I would have suggested too.

My other suggestion is, I think, already captured in your (6): full-camera Visit metadata that is shared by all the CCD Exposures associated with that visit.
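What distinguishes the visit-metadata case is that the shared component is keyed at a coarser granularity (visit) than the composites that use it (visit, ccd). A minimal sketch, assuming plain dictionaries stand in for the repository; `put_visit_metadata`, `put_ccd`, `get_exposure`, and the metadata fields are hypothetical and not part of any Butler interface.

```python
# Hypothetical sketch: visit-level metadata is stored once per visit and
# shared by every CCD-level exposure composite from that visit.
visit_metadata = {}  # visit -> metadata dict (one entry per visit)
ccd_exposures = {}   # (visit, ccd) -> per-CCD components


def put_visit_metadata(visit, metadata):
    visit_metadata[visit] = metadata


def put_ccd(visit, ccd, pixels):
    ccd_exposures[(visit, ccd)] = {"pixels": pixels}


def get_exposure(visit, ccd):
    """Assemble a CCD exposure composite. The visit component is looked up
    by the visit-only key, so every CCD in the visit sees the same object."""
    exposure = dict(ccd_exposures[(visit, ccd)])
    exposure["visitInfo"] = visit_metadata[visit]
    return exposure


# Usage: three CCDs from one visit share a single metadata object.
put_visit_metadata(1234, {"boresight": (10.0, -30.0), "exptime": 30.0})
for ccd in range(3):
    put_ccd(1234, ccd, f"pixels for ccd {ccd}")

a = get_exposure(1234, 0)
b = get_exposure(1234, 2)
```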

That wasn’t exactly what I intended by (6), though it’s closely related. I’d recommend adding this explicitly to the Confluence page.

I’ll let @natepease manage adding items to the list. Let me know if you want a more detailed description or if the above is enough that you can expand on it (we talked about this during the mini-session at the All Hands).

Hey @parejkoj, it would be great if you could write a more detailed description of your idea. I added an item 10 (“Full Camera Visit Metadata Shared by All the CCD Exposures Associated With That Visit”) to the list. Please write your use case in that section.

I’ve put in some more notes. Since I’m not exactly sure what design I want, I didn’t want to go into too many implementation details. Let me know if you need more.

@natelust @hsinfang You were both involved in partially convincing me that we shouldn’t have a full focal plane Exposure/Catalog object, but should instead let all of that be managed by the Butler. That conversation is relevant to use case #10, which I added to the list above. Sadly, I don’t remember much of that conversation: could you please add some thoughts here about managing full focal plane visits, since they bear on what the Butler composite dataset requirements are?