PSF and Footprint sizes

A few data products came up in my conversation with @jbecla and @npease yesterday as potentially having a larger contribution to the overall size of the outputs than the Data Access team anticipated. I agreed to get them some rough size measurements from the HSC pipeline:

  • A Psf object for a single CCD is 180 KB, shrinking to 118 KB when gzipped. I could imagine that going up quite a bit (a factor of 10?) for LSST as the complexity of the models increases. It’s also likely we’ll switch to per-visit Psf objects rather than per-CCD, but with probably the same information content overall.

  • A CoaddPsf for a single patch is 20 MB, shrinking to 14 MB when gzipped. This will scale with the number of exposures; the one I measured had 22, but you can’t just divide the size by 22: due to dithers, 121 CCDs actually contributed to this particular coadd patch. However, nearly all of this duplicates data (mostly Psf objects) already stored with the per-CCD Exposures, and we could eliminate that redundancy by being a bit more clever.

  • For a per-patch coadd measurement catalog containing 33,524 objects, the HeavyFootprints are 256 MB (yes, that’s more than the 169 MB of pixel data in the corresponding MaskedImage), compressing down to 164 MB with gzip. The regular (non-heavy) Footprints are just 9 MB uncompressed and 2 MB compressed.
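For anyone who wants to reproduce these rough numbers on their own outputs, a minimal sketch of the measurement (raw file size vs. gzipped size) using only the Python standard library is below. The file path is a placeholder, not an actual pipeline output; I’m just reading the persisted file back and compressing it in memory.

```python
import gzip
import os


def raw_and_gzipped_sizes(path):
    """Return (raw, gzipped) sizes in bytes for a persisted data product.

    Reads the whole file into memory, so it's only suitable for the
    few-hundred-MB products discussed above.
    """
    raw = os.path.getsize(path)
    with open(path, "rb") as f:
        gzipped = len(gzip.compress(f.read()))
    return raw, gzipped
```

This compresses with gzip’s default level (9 via `gzip.compress`), so the numbers may differ slightly from running the `gzip` command-line tool on disk.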

And Jim was telling us that it would be useful to keep these data products (or perhaps regenerate the biggest ones, like HeavyFootprints, on the fly) and release them as part of the Data Release, because some advanced users might need them, e.g. when developing better algorithms for DRP. As I understand it, these data products do not need to be indexed and do not need to be in the database; they can be stored as “blobs” somewhere alongside the images. I don’t recall seeing any mention of space for Psf and CoaddPsf in our storage estimates.