Creating new storage class to ingest DES sky frames

Hi all, I’m looking to implement sky modeling à la DES, which uses 4 eigenimages instead of HSC’s single image. I’m trying to ingest a collection of these into my butler repository. I would like to create a new storage class that saves the 4 images of each .fits file as a multi-array ExposureF object, and I’m having difficulty navigating the documentation to (1) create this new storage class using the pre-existing ExposureF storage class and (2) register the storage class with the butler. Any assistance would be appreciated!

Step 1 is to work out what your Python type is going to be. It sounds like you want a container class that has multiple images inside it, something like list[ExposureF], but you need to define an explicit Python type (not something generic like list). Once you have settled on a Python type, you can define the corresponding StorageClass and decide whether it needs any components. Once that is agreed upon, we can talk about the formatter code that will read and write these Python objects.

Speaking with @jbosch about this, we were wondering whether something simple like a NumPy ndarray (ArrowNumpy) would suffice here, unless there’s specific metadata that you need to bring along with your ingested data.

I would not recommend actually using just a NumPy array; that doesn’t leave you any room to extend things in the future, and it’s not descriptive about what’s in it. But a simple class (maybe a dataclass) built on top of NumPy arrays would be easier to maintain than something built on top of afw.image objects, unless there’s something specific you’re getting out of those afw objects.

What I was trying to say in my original comment was that I strongly suggest you create a proper Python class for this data. Using a generic type seems easy to begin with, but it quickly becomes a problem if you decide you want to add some extra metadata or rearrange the way you do things. A proper type also gives you type safety, since butler will check that it is receiving the right thing, and it leaves room to support storage class conversions in a more robust way in the future. A dataclass that has the NumPy arrays in it gives you much more flexibility to evolve.

Thanks for the response! I suppose the only requirements I have for the data are that it can store the 4 images of each .fits file (as NumPy arrays), along with their corresponding filter and CCD numbers, in a way that can be read by the HSC sky modeling pipeline, and of course that it can be manipulated with the butler.
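
For reference, here is a minimal sketch of the kind of dataclass suggested above, covering just those requirements. The class name SkyEigenImages, its field names, and the (4, ny, nx) array layout are all assumptions made for illustration; nothing here is an existing LSST class, and the StorageClass definition and formatter would still need to be written separately.

```python
from dataclasses import dataclass, field

import numpy as np


# eq=False: the auto-generated __eq__ would compare ndarrays element-wise
# and fail when the result is used as a single truth value.
@dataclass(eq=False)
class SkyEigenImages:
    """Hypothetical container for the eigenimages of one DES sky frame."""

    images: np.ndarray  # assumed layout: (n_components, ny, nx)
    physical_filter: str  # filter the sky frame belongs to
    detector: int  # CCD number
    metadata: dict = field(default_factory=dict)  # room for future extras

    # Plain class attribute (no annotation), so it is not a dataclass field.
    N_COMPONENTS = 4

    def __post_init__(self):
        # Normalise the input and fail early on an unexpected shape.
        self.images = np.asarray(self.images, dtype=np.float32)
        if self.images.ndim != 3 or self.images.shape[0] != self.N_COMPONENTS:
            raise ValueError(
                f"expected shape ({self.N_COMPONENTS}, ny, nx), "
                f"got {self.images.shape}"
            )

    def component(self, index: int) -> np.ndarray:
        """Return a single eigenimage as a 2-D array."""
        return self.images[index]


# Example with placeholder data standing in for the 4 images of one .fits file.
sky = SkyEigenImages(images=np.zeros((4, 8, 16)), physical_filter="r", detector=12)
first = sky.component(0)
```

Keeping the four images in one 3-D array makes the component count easy to validate, and the metadata dict is one way to leave room for additional header information later without changing the type.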