A few data products came up in my conversation with @jbecla and @npease yesterday as potentially having a larger contribution to the overall size of the outputs than the Data Access team anticipated. I agreed to get them some rough size measurements from the HSC pipeline:
- A `Psf` object for a single CCD is 180k, shrinking down to 118k when gzipped. I could imagine that going up quite a bit (factor of 10?) for LSST as the complexity of the models increases. It’s also likely we’ll switch to having per-visit `Psf` objects rather than per-CCD, but with probably the same information content overall.
- A `CoaddPsf` for a single patch is 20m, shrinking down to 14m when gzipped. This will scale with the number of exposures; the one I measured had 22 (but you can’t just divide the size by 22 - there are actually 121 CCDs that contributed to this particular coadd patch, due to dithers). However, nearly all of this duplicates stuff (mostly `Psf` objects) also stored with the per-CCD `Exposure`s, and we could eliminate that redundancy by being a bit more clever (a rough sketch of the idea is below, after this list).
- For a per-patch coadd measurement catalog that contains 33524 objects, the `HeavyFootprint`s are 256m (yes, that’s more than the 169m of pixel data in the corresponding `MaskedImage` - each `HeavyFootprint` stores its own copy of the image, mask, and variance pixels under its spans, so parent footprints and their deblended children cover the same pixels more than once), compressing down to 164m in gzip. The regular (non-heavy) `Footprint`s are just 9m uncompressed and 2m compressed.
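For anyone who wants to reproduce or extend these numbers, the measurement itself is just a raw-size vs. gzipped-size comparison on the persisted file. Here’s a minimal sketch; the filename is a placeholder, and the stdlib `gzip` module stands in for whatever compression we’d actually deploy:

```python
import gzip
import os

def measure(path):
    """Return (raw, gzipped) sizes in bytes for a persisted data product."""
    raw = os.path.getsize(path)
    with open(path, "rb") as f:
        gzipped = len(gzip.compress(f.read()))
    return raw, gzipped

raw, gz = measure("coaddPsf.fits")  # placeholder path, not a real dataset
print(f"{raw / 1024:.0f}k raw, {gz / 1024:.0f}k gzipped")
```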
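And to make “a bit more clever” concrete for the `CoaddPsf` redundancy: one shape the deduplication could take is to persist only an identifier for each contributing CCD and resolve the actual `Psf` from the per-CCD `Exposure` on demand, rather than embedding a full copy of each one. This is purely illustrative - the class and the `ccd_ids`/`load_ccd_psf` names below are hypothetical, not existing pipeline APIs:

```python
class DedupedCoaddPsf:
    """Hypothetical sketch of a coadd PSF that references the per-CCD
    Psfs instead of embedding full copies of them.

    ccd_ids might be e.g. (visit, ccd) tuples identifying the inputs;
    load_ccd_psf stands in for whatever mechanism fetches the Psf
    already persisted with the corresponding per-CCD Exposure.
    """

    def __init__(self, ccd_ids, load_ccd_psf):
        self._ccd_ids = list(ccd_ids)
        self._load = load_ccd_psf
        self._cache = {}

    def psf_for(self, ccd_id):
        # Resolve lazily and cache, so each Psf is held in memory once
        # and never stored twice on disk.
        if ccd_id not in self._cache:
            self._cache[ccd_id] = self._load(ccd_id)
        return self._cache[ccd_id]
```

Persisting only the 121 CCD identifiers (plus the coadd bookkeeping) instead of 121 `Psf` copies is where nearly all of that 20m would come back, at the cost of requiring the per-CCD `Exposure`s to be available at read time.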