When producing temporary warped exposures for coadds, we’re noticing that our warped frames are being padded with extremely large values (i.e., the maximum floating-point value, 3.40282e38). This appears to prevent the files from being effectively lossy-compressed (we’re using a `quantizeLevel` of 16, which appears to be robust for `processCcd.py` outputs), which is a bit of a problem for those of us with limited disk space.
Does anyone know if there is a way to specify the pad value when producing temporary warp exposures? We’re also encountering a similar effect in the outputs of `imageDifference.py`, so a similar solution there would also be really handy.
I think that value is the result of round-trip quantising a NaN. I would, however, be surprised if that gets in the way of lossy compression. Do you have evidence for that assertion?
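For what it’s worth, the pad value quoted in the question matches FLT_MAX exactly, which fits the round-tripped-NaN reading; a one-line check in Python:

```python
import numpy as np

# The pad value reported above is exactly the largest finite 32-bit
# float (FLT_MAX), consistent with a NaN surviving a lossy round trip
# as "the biggest representable number" rather than as a NaN.
flt_max = np.finfo(np.float32).max
print(flt_max)  # 3.4028235e+38
```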
Before compression, the pixels in the regions with no data are set to (image, mask, var) = (np.nan, "NO_DATA", np.inf). That is currently not configurable.
The only evidence I have is from doing the following in Python 3.5:
>>> from astropy.io import fits
>>> hdulist = fits.open('warp-L-132-3,2-27128.fits')
>>> im = hdulist[1].data
>>> fits.CompImageHDU(im, quantize_level=16).writeto('c_orig.fits')
>>> im[im > 1e5] = 0.
>>> fits.CompImageHDU(im, quantize_level=16).writeto('c_rem.fits')
> ls -lh
-rw-r--r-- 1 xxxxxx xxxxxx 102M Jul 13 16:54 c_orig.fits
-rw-r--r-- 1 xxxxxx xxxxxx 23M Jul 13 16:54 c_rem.fits
I appreciate that this isn’t an ideal test, but it’s what first gave me the suspicion that the large dynamic range of the image might be messing with the compression. The next thing I was going to try was hacking the coadd code to do something similar to the above and seeing whether I got a similar difference in file size.
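As a minimal numpy sketch of that suspicion (made-up numbers, and not the stack’s actual noise estimator; it just assumes the quantization step is set from something like the image scatter divided by `quantizeLevel`):

```python
import numpy as np

rng = np.random.default_rng(0)
im = rng.normal(100.0, 10.0, size=(512, 512)).astype(np.float32)
im[:64, :] = np.finfo(np.float32).max   # pad region, as in the warps

# If the per-pixel noise estimate that sets the quantization step
# (roughly sigma / quantizeLevel) includes the pad pixels, it explodes.
# float64 accumulation avoids overflowing float32 when squaring FLT_MAX.
quantize_level = 16
sigma_all = im.std(dtype=np.float64)
sigma_good = im[64:, :].std(dtype=np.float64)

print(sigma_all / quantize_level)    # astronomically large step
print(sigma_good / quantize_level)   # ~0.6: sensible for 10-count noise
```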
Oh, you’re reading and compressing with external tools, not the LSST tools. Well, I can see why it might behave like that (astropy doesn’t mask the bad pixels when determining the stdev). Why don’t you write the files as compressed directly with the LSST tools?
OK - thanks, that’s a great help. At least if I know where this is set, I can hack my local version of the stack to see if it helps. Or even better, retarget my own from
I could give that a try and take full responsibility for the consequences.
Oh, apologies, I should have made myself clearer. I am indeed using the LSST tools for compression, and so was surprised to be getting such large files from the warp stage (it’s what first drew my attention). The original file I opened there, the warp-L-132-3,2-27128.fits file, is 198MB, and that’s written with this in my
That use of outside packages was just me checking whether I got similar results outside the stack.
198 MB is quite large. I wonder if that is doing lossy compression at all. Are you setting a compression algorithm? There is none in your example above. See the example for “Basic lossy (quantizing) compression” here.
In fact, I don’t think your example should work at all: the `quantizeLevel` should be defined only under `scaling`, not under `compression`. Maybe it’s a bug that we’re not catching it?
I’ll check, but I’m definitely* getting lossy compression with the `calexp` images, which use the same `writeRecipes.yaml`. My understanding was that by default the stack uses lossless `GZIP_SHUFFLE` compression, and that the `writeRecipes.yaml` just overrides those specific components.
*Before I introduced that `writeRecipes.yaml` file my calexps were of the order of 300MB, whereas as soon as I implemented `quantizeLevel: 16` they dropped to about 100MB.
I’ll smarten up my `writeRecipes.yaml`, though, as you suggest, and keep you posted.
Just noticed something a bit odd, given @price’s last reply. Running `singleFrameDriver.py` with the following `writeRecipes.yaml`:
Gives a calexp file that’s around 300MB, whereas the following:
Gives a calexp file that’s 88MB (and removing the `scaling` component completely also produces an 88MB calexp). So, for me, it seems the stack is only considering the `quantizeLevel` when it’s under `compression`, rather than under `scaling`.
https://github.com/lsst/obs_base/blob/master/python/lsst/obs/base/cameraMapper.py#L1241-L1294 seems to indicate that `quantizeLevel` is valid under both `compression:` and `scaling:`, with possibly different meanings.
Good spot, @ktl. That goes some way to explaining the above outcome.
I’ve just edited `makeCoaddTempExp.py` at the line that @yusra highlighted so it instead reads:
exp.getMaskedImage().set(1e5, afwImage.Mask.getPlaneBitMask("NO_DATA"), 1e5)
and keeping everything else the same (including my `writeRecipes.yaml` file) results in a warped file of 66MB, as opposed to 198MB. So, removing the NaNs and Infs seems, for me at least, to have a dramatic impact on the compression, although I appreciate that could just be because I’ve changed the file.
I think you’re not getting any scaling because you haven’t specified a scaling algorithm. As suspected by @ktl, putting `quantizeLevel` under `compression` allows cfitsio to do the quantisation, which always scares me because it is ignorant of our masks.
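The mask-ignorance is easy to illustrate with a toy example (pure numpy, with a made-up `NO_DATA` bit; the stack’s real scaling code is more involved):

```python
import numpy as np

# Hypothetical NO_DATA bit for illustration; in the stack the real bit
# would come from afwImage.Mask.getPlaneBitMask("NO_DATA").
NO_DATA = 1 << 0

rng = np.random.default_rng(1)
image = rng.normal(0.0, 10.0, size=(256, 256))
mask = np.zeros(image.shape, dtype=np.int32)
image[:32, :] = np.inf      # the no-data fill applied before compression
mask[:32, :] = NO_DATA

good = (mask & NO_DATA) == 0

# cfitsio sees only the pixel array, so the Infs poison its statistics;
# a mask-aware estimate restricts itself to the good pixels.
print(np.std(image))        # nan: useless for choosing a quantization step
print(np.std(image[good]))  # ~10: the real per-pixel scatter
```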
Note also that astropy does not read our lossy-compressed images correctly (I’d need to dig to remember exactly what the problem is; I think it’s related to the fuzz that gets applied, which astropy wants to do while the image is still integer rather than floating-point).
Then I really don’t understand, having now tried both methods. When using:
I get calexps that are 300MB, whereas moving `quantizeLevel` under `compression` gives me 88MB files. I acknowledge the concerns about masks when using cfitsio for quantization, but its level of compression is remarkable compared to the alternative.
I also don’t understand why I’m getting so much more compression when I remove the NaNs from the warped images within the stack (i.e., with no outside packages).
Apologies, @price, I missed your point. I see now that you meant I need to add an algorithm under `scaling`.
OK, using `lossyBasic` now seems to have solved the problem I first raised. Apologies for my confusion earlier, and thanks again for your help and persistence.
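For anyone landing here later, the shape of recipe that worked looks roughly like this (a hypothetical sketch, not my exact file: the `STDEV_POSITIVE` algorithm name and `bitpix` value are assumptions; the key points from this thread are that `quantizeLevel` belongs under `scaling` and that `scaling` needs an `algorithm`):

```yaml
# writeRecipes.yaml -- illustrative sketch only
calexp:
  default:
    image:
      compression:
        algorithm: GZIP_SHUFFLE     # lossless byte-shuffled gzip
      scaling:
        algorithm: STDEV_POSITIVE   # assumed; a scaling algorithm must be set
        bitpix: 32
        quantizeLevel: 16.0
```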
I still find it kind of weird, though, that cfitsio compression doesn’t seem to like those Infs and NaNs. Oh well, I’ll stick to the stack’s in-house compression from now on.
Glad to hear it’s working now!
The stack does its own scaling because I don’t trust cfitsio to get the scaling right. Besides its handling of Inf, cfitsio doesn’t know about our masks, so bad pixels end up in the same statistical pot as the good pixels and the scaling can be wrong.
Another reason for doing our own scaling is that we can add features that cfitsio doesn’t have. Perhaps in the future we’ll want to do an asinh or other non-linear scaling.
That does, indeed, sound much better than using the standard cfitsio compression. Nice job! Thanks, everyone, for your help.