Warped images padded with extremely large values


(James Mullaney) #1

Hi there,

When producing temporary warped exposures for coadds, we’re noticing that our warped frames are padded with extremely large values (i.e., the maximum single-precision floating point value, 3.40282e38). This appears to prevent the files from being effectively lossy-compressed (we’re using a quantizeLevel of 16, which has proven robust for singleFrameDriver.py/processCcd.py outputs), which is a bit of a problem for those of us with limited disk space.

Does anyone know if there is a way to specify the pad value when producing temporary warp exposures? We’re also encountering a similar effect in the outputs of imageDifference.py, so a similar solution there would also be really handy.

Thanks


(Paul Price) #2

I think that value is the result of round-trip quantising a NaN. I would, however, be surprised if that gets in the way of lossy compression. Do you have evidence for that assertion?
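For reference, 3.40282e38 is FLT_MAX, the largest finite single-precision value, which is exactly the sort of thing you end up with when a NaN gets replaced during a float32 round trip. A quick stdlib check:

```python
import struct

# the bit pattern 0x7f7fffff is the largest finite IEEE-754 single:
# (2 - 2**-23) * 2**127, i.e. FLT_MAX
flt_max = struct.unpack('<f', struct.pack('<I', 0x7f7fffff))[0]
print(flt_max)  # 3.4028234663852886e+38
```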


(Yusra AlSayyad) #3

Before compression, the pixels in the regions with no data are set to (image, mask, var) = (np.nan, "NO_DATA", np.inf). That is currently not configurable.


(James Mullaney) #4

The only evidence I have is from doing the following in python 3.5:

>>> from astropy.io import fits
>>> hdu = fits.open('warp-L-132-3,2-27128.fits')
>>> im = hdu[1].data
>>> hdu_c = fits.CompImageHDU(im, quantize_level=16)
>>> hdu_c.writeto('c_orig.fits')
>>> im[im > 1e5] = 0.
>>> hdu_c = fits.CompImageHDU(im, quantize_level=16)
>>> hdu_c.writeto('c_rem.fits')
>>> exit()
> ls -lh
-rw-r--r-- 1 xxxxxx xxxxxx 102M Jul 13 16:54 c_orig.fits
-rw-r--r-- 1 xxxxxx xxxxxx 23M Jul 13 16:54 c_rem.fits

I appreciate that this isn’t an ideal test, but it’s what gave me my first suspicion that the large dynamic range of the image may be messing with the compression. The next thing I was going to try was hacking the coadd code to do something similar to the above and seeing if I got a similar difference in file size.
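As a sanity check on that suspicion, here’s a toy pure-Python sketch (this is not cfitsio’s actual algorithm; the noise estimator and the overflow threshold are my guesses) of why a single huge pad value can defeat quantization entirely: the quantization step comes from a robust noise estimate of the ordinary pixels, and the pad value divided by that small step overflows the integer range, so the data has to fall back to lossless floats:

```python
def quantize(pixels, q=16):
    """Toy cfitsio-style quantization: step = robust noise estimate / q.
    Returns None when the quantized range would overflow 32-bit integers,
    mimicking a fall-back to lossless compression of the raw floats."""
    diffs = sorted(abs(a - b) for a, b in zip(pixels, pixels[1:]))
    noise = diffs[len(diffs) // 2]          # median absolute neighbour difference
    step = noise / q
    ints = [round(p / step) for p in pixels]
    if max(ints) - min(ints) > 2**31 - 1:   # can't represent: don't quantize
        return None
    return ints

# synthetic 1-D "image": flat sky around 1000 counts with ~unit noise
sky = [1000.0 + ((i * 57) % 13 - 6) / 3.0 for i in range(4096)]
padded = sky[:-64] + [3.40282e38] * 64      # NO_DATA-style padding at the edge

print(quantize(sky) is not None)   # True: quantizes happily
print(quantize(padded) is None)    # True: the pad blows the integer range
```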


(Paul Price) #5

Oh, you’re reading and compressing with external tools, not the LSST tools. Well, I can see why that might happen: astropy doesn’t mask the bad pixels when determining the stdev. Why don’t you write the files as compressed directly with the LSST tools?


(James Mullaney) #6

OK - thanks, that’s a great help. At least if I know where this is set, I can hack my local version of the stack to see if it helps. Or, even better, retarget my own version from coaddDriver.py.

I could give that a try and take full responsibility for the consequences :slight_smile:


(James Mullaney) #7

Oh, apologies, I should have made myself clearer. I am indeed using the LSST tools for compression, and so was surprised when I was getting such large files from the warp stage (it’s what first drew my attention). The original file I opened there - the warp-L-132-3,2-27128.fits file - is 198MB - and that’s written with this in my writeRecipes.yaml file:

FitsStorage:
  default:
    image:
      compression:
        quantizeLevel: 16
    variance:
      compression:
        quantizeLevel: 16

That use of outside packages was just for me to check if I got similar results outside the stack.


(Paul Price) #8

198 MB is quite large. I wonder if that is doing lossy compression at all. Are you setting a compression algorithm? There is none in your example above. See the example for “Basic lossy (quantizing) compression” here.

In fact, I don’t think your example should work at all: the quantizeLevel should be defined only under scaling, not under compression. Maybe it’s a bug that we’re not catching it?


(James Mullaney) #9

I’ll check, but I’m definitely* getting lossy compression with the calexp images, which use the same writeRecipes.yaml. My understanding was that, by default, the stack uses lossless GZIP_SHUFFLE compression, and that writeRecipes.yaml just overrides those specific components.

*Before I introduced that writeRecipes.yaml file, my calexps were of the order of 300MB; as soon as I implemented quantizeLevel: 16, they dropped to about 100MB.


(James Mullaney) #10

I’ll smarten up my writeRecipes.yaml, though, as you suggest and keep you posted.

Thanks!


(James Mullaney) #11

Just noticed something a bit odd, given @price’s last reply. Running singleFrameDriver.py with the following writeRecipes.yaml:

FitsStorage:
  default:
    image:
      compression:
        algorithm: GZIP_SHUFFLE
      scaling:
        quantizeLevel: 16
    variance:
      compression:
        algorithm: GZIP_SHUFFLE
      scaling:
        quantizeLevel: 16

Gives a calexp file that’s around 300MB, whereas the following:

FitsStorage:
  default:
    image:
      compression:
        algorithm: GZIP_SHUFFLE
        quantizeLevel: 16
      scaling:
        quantizeLevel: 16
    variance:
      compression:
        algorithm: GZIP_SHUFFLE
        quantizeLevel: 16
      scaling:
        quantizeLevel: 16

Gives a calexp file that’s 88MB (and removing the scaling component completely also produces an 88MB calexp). So, for me, it seems the stack is only considering the quantizeLevel when it’s under compression, rather than scaling.


(K-T Lim) #12

https://github.com/lsst/obs_base/blob/master/python/lsst/obs/base/cameraMapper.py#L1241-L1294 seems to indicate that quantizeLevel is valid for both compression: and scaling: with possibly different meanings.


(James Mullaney) #13

Good spot, @ktl. That goes some way to explaining the above outcome.


(James Mullaney) #14

I’ve just edited makeCoaddTempExp.py at the line that @yusra highlighted so it instead reads:

exp.getMaskedImage().set(1e5, afwImage.Mask.getPlaneBitMask("NO_DATA"), 1e5)

and keeping everything else the same (including my writeRecipes.yaml file) results in a warped file of 66MB, as opposed to 198MB. So, removing the NaNs and Infs seems, for me at least, to have a dramatic impact on the compression. Although I appreciate that could just be because I’ve changed the file.


(Paul Price) #15

I think you’re not getting any scaling because you haven’t specified a scaling algorithm. As suspected by @ktl, putting quantizeLevel under compression allows cfitsio to do the quantisation, which always scares me because it is ignorant of our masks.

Note also that astropy does not read our lossy-compressed images correctly (I’d need to dig to remember what the problem is; I think it’s related to the fuzz that gets applied, which it wants to do while it’s an integer image rather than floating-point).
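To put a toy number on the masking point (plain Python, not our actual code): if NO_DATA pads are left in the same statistical pot as the science pixels, the standard deviation, and hence the quantization step, gets inflated to the point where every science pixel collapses to a single integer:

```python
import statistics

q = 16
good = [100.0 + ((i * 7) % 5 - 2.0) for i in range(1000)]  # science pixels, spread ~1.4
bad = [3.40282e38] * 10                                    # unmasked NO_DATA pixels

step_masked = statistics.pstdev(good) / q          # sensible step, resolves the noise
step_unmasked = statistics.pstdev(good + bad) / q  # wildly inflated by the pads

# with the inflated step, all the science pixels quantize to the same code
codes = {round(p / step_unmasked) for p in good}
print(len(codes))         # 1: all information in the good pixels is destroyed
print(step_masked < 1.0)  # True
```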


(James Mullaney) #16

Then I really don’t understand, having now tried both methods. When using:

FitsStorage:
  default:
    image:
      compression:
        algorithm: GZIP_SHUFFLE
      scaling:
        quantizeLevel: 16
    variance:
      compression:
        algorithm: GZIP_SHUFFLE
      scaling:
        quantizeLevel: 16

I get calexps that are 300MB, whereas moving quantizeLevel to compression gives me 88MB files. I acknowledge the concerns re: masks when using cfitsio for quantization, but its level of compression is remarkable compared to the alternative.

I also don’t understand why I’m getting so much more compression when I remove the Infs and NaNs from the warped images within the stack (i.e., no outside packages).


(James Mullaney) #17

Apologies, @price, I missed your point. I see now that you meant I need to add an algorithm under scaling.


(James Mullaney) #18

OK - using lossyBasic now seems to have solved the problem I first raised. Apologies for my confusion earlier, and thanks again for your help and persistence.

I still do find it kinda weird, though, that cfitsio compression doesn’t seem to like those infs and nans. Oh well, I’ll stick to the stack’s in-house compression from now on. :slight_smile:
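For the record, the recipe that fixed it for me looks like the earlier examples with a scaling algorithm added (do check the exact key names against the obs_base documentation if you copy this):

```yaml
FitsStorage:
  default:
    image:
      compression:
        algorithm: GZIP_SHUFFLE
      scaling:
        algorithm: lossyBasic
        quantizeLevel: 16
    variance:
      compression:
        algorithm: GZIP_SHUFFLE
      scaling:
        algorithm: lossyBasic
        quantizeLevel: 16
```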


(Paul Price) #19

Glad to hear it’s working now!

The stack does its own scaling because I don’t trust cfitsio to get it right. Besides the handling of NaN and Inf, it doesn’t know about our masks, so bad pixels end up in the same statistical pot as the good pixels and the scaling can be wrong.

Another reason for doing our own scaling is that we can add features that cfitsio doesn’t have. Perhaps in the future we’ll want to do an asinh or other non-linear scaling.


(James Mullaney) #20

That does, indeed, sound much better than using the standard cfitsio compression. Nice job! Thanks, everyone, for your help.