When producing temporary warped exposures for coadds, we’re noticing that our warped frames are being padded with extremely large values (i.e., the maximum single-precision floating-point value, 3.40282e38). This appears to prevent the files from being effectively lossy-compressed (we’re using a quantizeLevel of 16, which appears to be robust for singleFrameDriver.py/processCcd.py outputs), which is a bit of a problem for those of us with limited disk space.
Does anyone know if there is a way to specify the pad value when producing temporary warp exposures? We’re also encountering a similar effect in the outputs of imageDifference.py, so a similar solution there would also be really handy.
I think that value is the result of round-trip quantising a NaN. I would, however, be surprised if that gets in the way of lossy compression. Do you have evidence for that assertion?
Before compression, the pixels in the regions with no data are set to (image, mask, var) = (np.nan, "NO_DATA", np.inf). That is currently not configurable.
I appreciate that this isn’t an ideal test, but it’s what gave me my first suspicions that the large dynamic range of the image may be messing with the compression. The next thing I was going to try was hacking the coadd code to do something similar to the above and seeing if I got a similar difference in file size.
Oh, you’re reading and compressing with external tools, not the LSST tools. Well, I can see why that might happen (astropy not masking the bad pixels when determining the stdev). Why don’t you write the files as compressed directly with the LSST tools?
OK - thanks, that’s a great help. At least if I know where this is set, I can hack my local version of the stack to see if it helps. Or, even better, retarget my own version from coaddDriver.py.
I could give that a try and take full responsibility for the consequences.
Oh, apologies, I should have made myself clearer. I am indeed using the LSST tools for compression, and so was surprised when I was getting such large files from the warp stage (it’s what first drew my attention). The original file I opened there - the warp-L-132-3,2-27128.fits file - is 198 MB - and that’s written with this in my writeRecipes.yaml file:
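```yaml
# (recipe reconstructed from context: quantizeLevel sits under compression,
#  with no algorithm and no scaling section specified)
default:
    image: &default
        compression:
            quantizeLevel: 16.0
    mask: *default
    variance: *default
```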
198 MB is quite large. I wonder if that is doing lossy compression at all. Are you setting a compression algorithm? There is none in your example above. See the example for “Basic lossy (quantizing) compression” here.
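For reference, that example looks something like this (quoting from memory - check the linked page for the authoritative version; the field names follow the compression/scaling options in lsst.afw.fits):

```yaml
lossyBasic:
    image: &default
        compression:
            algorithm: GZIP_SHUFFLE   # lossless compression of the quantised integers
        scaling:
            algorithm: STDEV_POSITIVE # set the quantum relative to the background stdev
            bitpix: 32                # quantise to 32-bit integers
            quantizeLevel: 10.0       # quantum = stdev/quantizeLevel
    mask:
        compression:
            algorithm: GZIP_SHUFFLE
    variance: *default
```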
In fact, I don’t think your example should work at all: the quantizeLevel should be defined only under scaling, not under compression. Maybe it’s a bug that we’re not catching it?
I’ll check, but I’m definitely* getting lossy compression with the calexp images, which use the same writeRecipes.yaml. My understanding was that by default the stack uses lossless GZIP_SHUFFLE compression, and that the writeRecipes.yaml just overrides those specific components.
*Before I introduced that writeRecipes.yaml file my calexps were of the order of 300 MB, whereas as soon as I implemented quantizeLevel: 16, they dropped to about 100 MB.
Moving quantizeLevel under scaling, as suggested, gives a calexp file that’s 88 MB (and removing the scaling component completely also produces an 88 MB calexp). So, for me, it seems the stack is only considering the quantizeLevel when it’s under compression, rather than under scaling.
Hacking the stack to replace the NaNs and Infs in the warps before writing, while keeping everything else the same (including my writeRecipes.yaml file), results in a warped file of 66 MB, as opposed to 198 MB. So, removing the NaNs and Infs seems, for me at least, to have a dramatic impact on the compression. Although, I appreciate that could just be because I’ve changed the file.
I think you’re not getting any scaling because you haven’t specified a scaling algorithm. As suspected by @ktl, putting quantizeLevel under compression allows cfitsio to do the quantisation, which always scares me because it is ignorant of our masks.
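To get the stack’s own (mask-aware) quantisation, you want something like:

```yaml
image:
    compression:
        algorithm: GZIP_SHUFFLE    # compress the quantised integers losslessly
    scaling:
        algorithm: STDEV_POSITIVE  # required: without an algorithm, the scaling section does nothing
        bitpix: 32
        quantizeLevel: 16.0        # now honoured by the stack rather than cfitsio
```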
Note also that astropy does not read our lossy-compressed images correctly (I’d need to dig to remember what the problem is; I think it’s related to the fuzz that gets applied, which it wants to do while it’s an integer image rather than floating-point).
With quantizeLevel under scaling I get calexps that are 300 MB, whereas moving quantizeLevel to compression gives me 88 MB files. I acknowledge the concerns re: masks when using cfitsio for quantization, but the level of compression it achieves is remarkable by comparison.
I also don’t understand why I’m getting so much more compression when I remove the infs and nans from the warped images within the stack (i.e., no outside packages).
OK - using lossyBasic now seems to have solved the problem I first raised. Apologies for my confusion earlier, and thanks again for your help and persistence.
I still do find it kinda weird, though, that cfitsio compression doesn’t seem to like those infs and nans. Oh well, I’ll stick to the stack’s in-house compression from now on.
The stack does its own scaling because I don’t trust cfitsio to get it right. Besides the handling of NaN and Inf, cfitsio doesn’t know about our masks, so bad pixels end up in the same statistical pot as the good pixels and the scaling can be wrong.
Another reason for doing our own scaling is that we can add features that cfitsio doesn’t have. Perhaps in the future we’ll want to do an asinh or other non-linear scaling.
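For example, the scaling options include a maskPlanes parameter (if I recall the schema correctly), so flagged pixels can be excluded from the statistics:

```yaml
scaling:
    algorithm: STDEV_POSITIVE
    maskPlanes: ["NO_DATA", "BAD", "SAT"]  # illustrative plane names; pixels with these set are ignored in the stats
    quantizeLevel: 16.0
```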