Custom class as a recognized dataset type in the Gen3 Butler

Snyder005 · April 15, 2021, 11:52pm

I am currently working on a use case of the Gen3 Butler and pipetask analysis for the BOT data, and I could use some guidance on implementing my code.

I am planning to run an analysis on a Source Catalog and encapsulate the results in a custom Python class that has read/write functions for saving to a FITS file. I would like the butler to recognize this new class as a valid dataset type, to be able to create a collection of datasets of that type, and to associate them with the original source catalog, raw exposure, post-ISR exposure, etc. Is this possible, and if so, what are the code requirements for the custom Python class to interface properly with the butler?

Currently my ideal case would be a class that inherits from Source Catalog but has additional code to handle a new set of information (a new HDU in the FITS file).

timj · April 22, 2021, 4:37pm

Creation of new DatasetTypes usually happens in the PipelineTask definition but can be done directly with the registry interface. At the moment DatasetTypes are global to a registry.
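For the direct registry route, a minimal sketch might look like the following. The dataset type name, the storage class name, and the choice of dimensions are hypothetical and should be adapted to the analysis; the storage class must already be known to the butler before this is called.

```python
def register_bot_dataset_type(
    repo: str,
    name: str = "botSourceAnalysis",          # hypothetical dataset type name
    storage_class: str = "BotSourceAnalysis",  # hypothetical storage class name
) -> bool:
    """Register a per-detector, per-exposure dataset type in a Gen3 repo."""
    # Imported lazily so this module can be loaded without the LSST stack.
    from lsst.daf.butler import Butler, DatasetType

    butler = Butler(repo, writeable=True)
    dataset_type = DatasetType(
        name,
        # Assumed dimensions: one dataset per instrument/exposure/detector.
        dimensions=("instrument", "exposure", "detector"),
        storageClass=storage_class,
        universe=butler.registry.dimensions,
    )
    # Returns True if the dataset type was newly inserted, False if an
    # identical definition was already present.
    return butler.registry.registerDatasetType(dataset_type)
```

Because DatasetTypes are global to the registry, the name chosen here is shared by every collection in that repository.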

To create a DatasetType you first need to define a new StorageClass. This effectively associates a name with a Python type and lets you declare any components or derived read-only components. If you look at daf_butler/storageClasses.yaml at master · lsst/daf_butler · GitHub you can see the current ones.

In theory you can make a local StorageClass definition in a storageClasses.yaml using a Butler config search path – that works fine for testing but as soon as you want others to use your code you’ll need it to be defined in the usual place and that means a daf_butler PR for now (although we have some ideas on how to relocate those).
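As a rough sketch of the local-testing route, the helper below writes a one-entry storageClasses.yaml into a directory and prepends that directory to the Butler config search path. The storage class name and `mypackage.analysis.BotSourceAnalysis` are hypothetical, and the use of `DAF_BUTLER_CONFIG_PATH` as the search-path variable is an assumption to check against your daf_butler version.

```python
import os
from pathlib import Path

# Hypothetical entry: a storage class name mapped to an importable Python type.
STORAGE_CLASS_YAML = """\
storageClasses:
  BotSourceAnalysis:
    pytype: mypackage.analysis.BotSourceAnalysis
"""


def write_local_storage_class(config_dir: str) -> Path:
    """Write storageClasses.yaml into config_dir and add it to the search path."""
    directory = Path(config_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "storageClasses.yaml"
    target.write_text(STORAGE_CLASS_YAML)

    # Assumption: DAF_BUTLER_CONFIG_PATH holds the config search path.
    existing = os.environ.get("DAF_BUTLER_CONFIG_PATH")
    parts = [str(directory)] + ([existing] if existing else [])
    os.environ["DAF_BUTLER_CONFIG_PATH"] = os.pathsep.join(parts)
    return target
```

Setting `os.environ` only affects the current process and its children, so the variable would also need to be exported in the shell that runs pipetask.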

Once you have the StorageClass defined you need to tell the datastore how to format the Python type to a file. In your case, if you support readFits and writeFits you can use the generic FITS formatter; examples are in daf_butler/formatters.yaml at master · lsst/daf_butler · GitHub. Again, you can define your own datastores/fileDatastore.yaml in a directory with an environment variable pointing at it, but if you want others to be able to use your code you’ll need to make a change to the daf_butler one.
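To illustrate what "support readFits and writeFits" could mean for a standalone class, here is a minimal sketch. It assumes the generic FITS formatter only needs a `writeFits(path)` method and a `readFits(path)` class-level constructor. The class name, its fields, and the use of astropy for the actual I/O are my own choices for illustration and are not taken from the thread; a class derived from Source Catalog would rely on the catalog's own FITS persistence instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BotSourceAnalysis:
    """Hypothetical analysis result stored in a single binary-table HDU."""

    source_ids: List[int] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)

    def writeFits(self, path: str) -> None:
        """Write the results to `path` as a FITS file."""
        # Imported lazily so the class can be imported without astropy.
        import numpy as np
        from astropy.io import fits

        columns = [
            fits.Column(name="source_id", format="K",
                        array=np.asarray(self.source_ids, dtype="int64")),
            fits.Column(name="score", format="D",
                        array=np.asarray(self.scores, dtype="float64")),
        ]
        table = fits.BinTableHDU.from_columns(columns, name="ANALYSIS")
        fits.HDUList([fits.PrimaryHDU(), table]).writeto(path, overwrite=True)

    @classmethod
    def readFits(cls, path: str) -> "BotSourceAnalysis":
        """Rebuild an instance from a file written by writeFits."""
        from astropy.io import fits

        with fits.open(path) as hdus:
            data = hdus["ANALYSIS"].data
            return cls(
                source_ids=[int(v) for v in data["source_id"]],
                scores=[float(v) for v in data["score"]],
            )
```

The storage class's Python type would then point at this class, and the datastore's formatter configuration would map that storage class to the generic FITS formatter.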