I’m currently working on data structures to hold “cell-based coadds” - coadds that are built in small, sub-patch cells, in which all pixels in each cell have exactly the same set of input images, and hence discontinuities in the PSF and other properties only appear on cell boundaries. I have a good sense for how to handle the in-memory data structures, even though there’s still plenty of prototyping work to be done, but there are some big choices to be made about how to handle the (FITS) persistence of these coadds.
First off, I do not think writing them via afw.image.Exposure is a good idea; we are adding a few image planes beyond the image-mask-variance triplet that really should be coequal with image-mask-variance, and ExposureInfo components like Psf, SkyWcs, PhotoCalib, ApCorrMap, and CoaddInputs don’t naturally map well to the cell-based structure. Even more importantly, the cells will have overlaps, so we can’t represent the full multi-cell content as a piecewise image. We can and will provide an Exposure view that is a piecewise image, with piecewise components, as that will play an important role in being able to use existing algorithms on these new coadds, but this will be a lossy view, and that makes it unsuitable for persistence.
To seed discussion, here are a few ideas for how we might save these multi-cell coadds instead. I am assuming FITS because it is perfectly capable of representing all of these forms, we’re all familiar with it, and we have some requirements involving providing FITS versions of at least some data products. I think using e.g. HDF5 or ASDF instead is mostly orthogonal to the data-organization questions I’d prefer to focus on here.
Exploded piecewise images: Save full cells for each plane in a single piecewise image HDU (different HDUs for each plane), with the right positions in the grid but overlaps duplicated. This is similar in spirit to how raw amplifier images (including overscan regions) are stitched into a single image on disk for HSC and DECam (but not LSST, though we simulate this form on read for LSST as well). We would presumably do the same for PSF postage stamp images (one per cell), even though these have different dimensions, with non-image components in more-or-less opaque binary tables (as they are for Exposure components today). The actual geometry would be written to the headers somehow.
- This form is pretty good for humans who want to look at images by eye; they have to mentally clip out the overlap regions, but the eye is pretty good at that.
- This form is pretty good for advanced measurement algorithms that want to use the full cell regions and handle overlaps themselves; they don’t have to stitch anything together, but they do have to seek around the file quite a bit to find all of the data relevant for a single cell, if they want to read only some cells.
- This form is bad for naive algorithms that just want to work on the stitched-together piecewise image.
- This form is fine for compression: a careful tile compression configuration could handle this well.
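To make the exploded layout concrete, here is a minimal NumPy sketch of one plane. The function names (`explode`, `read_cell`) and the grid and cell sizes are hypothetical, chosen only for illustration; real code would also need to write the geometry to the header.

```python
import numpy as np

def explode(cells):
    """Tile full cells (overlaps included) into one 2-d image.

    cells has shape (ny, nx, h, w): an ny-by-nx grid of full cell images,
    each h-by-w pixels, whose borders duplicate pixels of their neighbours.
    """
    ny, nx, h, w = cells.shape
    exploded = np.empty((ny * h, nx * w), dtype=cells.dtype)
    for iy in range(ny):
        for ix in range(nx):
            exploded[iy * h:(iy + 1) * h, ix * w:(ix + 1) * w] = cells[iy, ix]
    return exploded

def read_cell(exploded, iy, ix, h, w):
    """Cut one full cell back out as a view; no stitching is needed."""
    return exploded[iy * h:(iy + 1) * h, ix * w:(ix + 1) * w]

# Hypothetical geometry: 2x3 cells, each 4x5 pixels including overlap.
cells = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
image = explode(cells)
cell = read_cell(image, 1, 2, 4, 5)
```

Reading a single cell is a single rectangular slice per plane, which is why full-cell algorithms need no stitching here, though each plane lives in a different HDU.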
Per-plane data 4-cubes: Same as “Exploded piecewise images”, except each plane is a 4-d array, with two dimensions for the pixels within a cell and the other two for the index of each cell in the grid. Pros and cons seem to be pretty much the same, but with a bit more of the geometry encoded in the array rather than the header, and for human inspection it makes looking at cells independently a bit easier while making it nearly impossible to view them together.
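The relationship between the 4-cube and the exploded image can be sketched with NumPy as a reshape and transpose of the same pixels. The axis order and sizes below are assumptions for illustration, not a proposed on-disk convention.

```python
import numpy as np

ny, nx, h, w = 2, 3, 4, 5  # hypothetical grid shape and full-cell size
cube = np.arange(ny * nx * h * w, dtype=np.float32).reshape(ny, nx, h, w)

# One cell is a plain index on the two leading (grid) axes.
cell = cube[1, 2]

# 4-cube -> exploded image: interleave grid and pixel axes, then flatten.
exploded = cube.transpose(0, 2, 1, 3).reshape(ny * h, nx * w)

# Exploded image -> 4-cube: the inverse reshape and transpose.
roundtrip = exploded.reshape(ny, h, nx, w).transpose(0, 2, 1, 3)
```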
Big binary table: Save full cells for each image as 2d array columns in a binary table, where each row is a cell and each column is a different image plane. Non-image components could be stored in other columns in the same row. The actual geometry could be written to the header and/or stored (with some duplication) in columns.
- This form is bad for humans who want to look at images by eye, because most FITS viewers aren’t designed for binary tables that contain images.
- This form is good for advanced measurement algorithms that want to use the full cell regions and handle overlaps themselves; all information for each cell is close together on disk.
- This form is the easiest to map to our in-memory form, and may require fewer temporaries or computation in I/O.
- This form is good for providing a complete description of the FITS data model. Such a description would be possible for other forms, but I think this one is particularly easy to fully describe and understand.
- This form is bad for compression (FITS binary table compression is experimental, not standard).
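As a rough in-memory analogue of this table layout, a NumPy structured array can hold one row per cell with 2-d array columns for each plane. The column names, dtypes, and sizes here are hypothetical; this only illustrates the row-per-cell organization, not an actual FITS schema.

```python
import numpy as np

h, w = 4, 5      # hypothetical full-cell size, overlap included
psf_size = 3     # hypothetical PSF stamp size

# One record per cell: grid indices plus one 2-d array per plane.
cell_dtype = np.dtype([
    ("iy", np.int32),
    ("ix", np.int32),
    ("image", np.float32, (h, w)),
    ("mask", np.int32, (h, w)),
    ("variance", np.float32, (h, w)),
    ("psf", np.float64, (psf_size, psf_size)),
])

ny, nx = 2, 3
table = np.zeros(ny * nx, dtype=cell_dtype)
for row, (iy, ix) in enumerate(np.ndindex(ny, nx)):
    table["iy"][row] = iy
    table["ix"][row] = ix
    table["image"][row] = 10 * iy + ix  # placeholder pixel values

# Everything for one cell is a single contiguous record.
last = table[ny * nx - 1]
```

Because each record is contiguous, reading all planes for one cell touches one region of the table rather than several HDUs.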
Piecewise stitched images with overlaps separate: Save the inner, non-overlapping regions of each cell together as a piecewise image, with one plane per HDU. For each plane, add another image HDU that has the alternate, non-primary pixels from those overlap regions, with zero or NaN where there is no alternate. The PSF images would be another plane, and it would probably make sense to use the same grid with some zero or NaN padding (it is safe to assume that PSF images are smaller than the other planes for cells). Non-image components would be saved in more-or-less opaque binary tables.
- This form is very good for humans who want to look at images by eye; the image is as continuous as it can be, and the alternate pixels can be easily blinked with the primary pixels.
- This form is bad for advanced measurement algorithms that want to use the full cell regions and handle overlaps themselves; these need to stitch together information from multiple HDUs to construct full-cell images for even a single plane.
- This form is good for naive algorithms that just want to work on the stitched-together piecewise image, as this image is directly available as a natural FITS image per plane.
- This form requires at least lossless compression to avoid inefficiencies due to the zero/NaN padding it implies. It is very bad for lossy compression, because the regions we would want to compress together are the full-cell images, and those are spread across HDUs.
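The cost of rebuilding full cells from this form can be seen in a small NumPy sketch. To keep the bookkeeping short it is restricted to a single row of cells (overlaps in x only), assumes the stitched image keeps the outer border of the edge cells, and assumes the overlap is at most half the inner width. The names `split` and `full_cell` and all sizes are hypothetical.

```python
import numpy as np

def split(cells, pad):
    """Split a row of full cells into primary and alternate images.

    cells has shape (n, h, inner + 2 * pad); assumes 2 * pad <= inner.
    """
    n, h, full = cells.shape
    inner = full - 2 * pad
    width = n * inner + 2 * pad
    primary = np.full((h, width), np.nan, dtype=cells.dtype)
    alternate = np.full((h, width), np.nan, dtype=cells.dtype)
    for i in range(n):
        x0 = i * inner  # left edge of full cell i in the stitched image
        # Cell i owns its inner region, plus the outer border on edge cells.
        lo = 0 if i == 0 else pad
        hi = full if i == n - 1 else pad + inner
        primary[:, x0 + lo:x0 + hi] = cells[i, :, lo:hi]
        # The rest of the cell overlaps a neighbour's inner region.
        alternate[:, x0:x0 + lo] = cells[i, :, :lo]
        alternate[:, x0 + hi:x0 + full] = cells[i, :, hi:]
    return primary, alternate

def full_cell(primary, alternate, i, n, pad):
    """Rebuild full cell i; this needs pixels from both images."""
    inner = (primary.shape[1] - 2 * pad) // n
    full = inner + 2 * pad
    x0 = i * inner
    lo = 0 if i == 0 else pad
    hi = full if i == n - 1 else pad + inner
    cell = alternate[:, x0:x0 + full].copy()
    cell[:, lo:hi] = primary[:, x0 + lo:x0 + hi]
    return cell

# Hypothetical geometry: 3 cells, 2 rows high, inner width 4, overlap 1.
cells = np.arange(3 * 2 * 6, dtype=np.float64).reshape(3, 2, 6)
primary, alternate = split(cells, pad=1)
rebuilt = [full_cell(primary, alternate, i, 3, 1) for i in range(3)]
nan_fraction = np.isnan(alternate).mean()
```

In this sketch the primary image has no gaps, while most of the alternate image is NaN padding, which is the inefficiency that compression would have to absorb.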