A few data products came up in my conversation with @jbecla and @npease yesterday as potentially having a larger contribution to the overall size of the outputs than the Data Access team anticipated. I agreed to get them some rough size measurements from the HSC pipeline:
A Psf object for a single CCD is 180k, shrinking down to 118k when gzipped. I could imagine that going up quite a bit (factor of 10?) for LSST as the complexity of the models increases. It’s also likely we’ll switch to having per-visit Psf objects rather than per-CCD, but with probably the same information content overall.
A CoaddPsf for a single patch is 20m, shrinking down to 14m when gzipped. This will scale with the number of exposures; the one I measured had 22 (but you can’t just divide the size by 22, because 121 CCDs actually contributed to this particular coadd patch, due to dithers). However, nearly all of this duplicates data (mostly Psf objects) that is also stored with the per-CCD Exposures, and we could eliminate that redundancy by being a bit more clever.
For a per-patch coadd measurement catalog that contains 33524 objects, the HeavyFootprints are 256m (yes, that’s more than the 169m of pixel data in the corresponding MaskedImage), compressing down to 164m with gzip. The regular (non-heavy) Footprints are just 9m uncompressed and 2m compressed.
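
For anyone who wants to plug these numbers into a sizing estimate, here is a rough sketch of the arithmetic in Python. It only uses the figures quoted above; the dictionary labels and function names are made up for illustration, and it assumes "k" and "m" mean 1e3 and 1e6 bytes, which may not match exactly how the sizes were measured.

```python
# Sizes quoted in the post, as (uncompressed, gzipped) pairs in bytes.
# Assumption: "k" and "m" are taken as 1e3 and 1e6 bytes.
K, M = 1e3, 1e6

SIZES = {
    "Psf (per CCD)": (180 * K, 118 * K),
    "CoaddPsf (per patch)": (20 * M, 14 * M),
    "HeavyFootprints (per patch)": (256 * M, 164 * M),
    "Footprints (per patch)": (9 * M, 2 * M),
}

N_OBJECTS = 33524       # objects in the measured coadd catalog
N_COADD_CCDS = 121      # CCDs that contributed to the measured patch
PIXEL_DATA = 169 * M    # MaskedImage pixel data for the same patch


def gzip_ratio(raw, gzipped):
    """Fraction of the original size that remains after gzip."""
    return gzipped / raw


def summarize(sizes=SIZES):
    """Map each product name to its gzip ratio, rounded to 2 places."""
    return {name: round(gzip_ratio(raw, gz), 2)
            for name, (raw, gz) in sizes.items()}


# Derived per-unit figures (uncompressed).
heavy_raw = SIZES["HeavyFootprints (per patch)"][0]
heavy_bytes_per_object = heavy_raw / N_OBJECTS
heavy_vs_pixels = heavy_raw / PIXEL_DATA
coaddpsf_bytes_per_ccd = SIZES["CoaddPsf (per patch)"][0] / N_COADD_CCDS

if __name__ == "__main__":
    for name, ratio in summarize().items():
        print(f"{name}: gzip keeps {ratio:.0%} of the original size")
    print(f"HeavyFootprints per object: {heavy_bytes_per_object:.0f} bytes")
    print(f"HeavyFootprints / pixel data: {heavy_vs_pixels:.2f}")
    print(f"CoaddPsf per contributing CCD: {coaddpsf_bytes_per_ccd:.0f} bytes")
```

Under those assumptions the HeavyFootprints come to roughly 7.6k per object, and the CoaddPsf comes to roughly 165k per contributing CCD, which is in the same ballpark as the 180k per-CCD Psf and fits the point that most of the CoaddPsf is duplicated Psf data.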