FITS tile compression support in the stack?

parejkoj · July 5, 2016, 11:54pm

Has anyone tried using the stack with tile-compressed FITS images? When I tile-compressed some calexps for jointcal’s validation data, jointcal produced errors. I haven’t fully tracked down why, but it may have to do with the handful of header keywords that change when a file is tile-compressed.

In general, it would be useful for us to support FITS tile compression if we are going to use FITS files: in principle it still lets one read the headers, and memory-map much of the data, without decompressing the whole file. Tile compression is lossless for integer data, and lossy with a specifiable level of precision for floating-point data.

boutigny · July 6, 2016, 6:56am

Some time ago, I tried compressing the calexps and ran into a lot of trouble. I will try to dig through my notes to retrieve the details, but the result was that it was not usable for calexps.

price · July 6, 2016, 7:49pm

The stack needs some work to support convenient on-the-fly FITS compression; for example, we don’t want cfitsio scaling our floating-point data itself, because it doesn’t know about masked pixels. We can probably just adapt the code used in Pan-STARRS, which works fine. This is one of those things that we’ve always planned to do, but have just never gotten around to doing. Now that HSC’s data volume is growing rather rapidly, maybe it’s about time to do it?
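To illustrate the masked-pixel concern, here is a minimal sketch. It is not the Pan-STARRS or cfitsio code: it simply picks a quantization step from the scatter of unmasked pixels only, so that flagged values (a saturated pixel, say) do not inflate the step. The function names, the `q` parameter (loosely analogous to a quantization level in noise units), and the use of a plain standard deviation as the noise estimate are all assumptions made for illustration.

```python
import statistics


def quantization_step(pixels, mask, q=16.0):
    """Return a step of sigma/q, with sigma measured on unmasked pixels only.

    `mask` holds 0 for good pixels and non-zero for flagged ones
    (a hypothetical convention for this sketch).
    """
    good = [p for p, m in zip(pixels, mask) if m == 0]
    if len(good) < 2:
        raise ValueError("need at least two unmasked pixels")
    sigma = statistics.pstdev(good)
    if sigma == 0.0:
        raise ValueError("unmasked pixels have zero scatter")
    return sigma / q


def quantize(pixels, step, zero=0.0):
    """Map floats to integers so that value ~= zero + step * n."""
    return [round((p - zero) / step) for p in pixels]


def dequantize(ints, step, zero=0.0):
    """Invert quantize(), up to an error of at most step / 2 per pixel."""
    return [zero + step * n for n in ints]


# Five sky-like pixels plus one saturated pixel that is flagged in the mask.
pixels = [10.0, 10.5, 9.5, 10.2, 9.8, 65535.0]
mask = [0, 0, 0, 0, 0, 1]

step_masked = quantization_step(pixels, mask)               # ignores the flagged pixel
step_naive = quantization_step(pixels, [0] * len(pixels))   # flagged pixel included
restored = dequantize(quantize(pixels, step_masked), step_masked)
```

With the flagged pixel included, the step is dominated by that one outlier and the good pixels lose most of their precision; measuring the scatter on unmasked pixels keeps the step tied to the actual noise.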

parejkoj · July 6, 2016, 8:07pm

These all seem like good points. But whether we can read tile-compressed FITS images seems to me like a separate question from how we actually manage compressed data.

price · July 6, 2016, 8:22pm

I think the stack will already support reading tile-compressed raw FITS images.

parejkoj · July 6, 2016, 8:33pm

Hmm, so what might be the problem with compressing calexps after the fact? Particularly when I’m not actually accessing the pixel data?

ktl · July 6, 2016, 8:35pm

I would compare the results of cfitsio fpack (which I believe the stack should read) with those from astropy.io.fits (which I think you’re using) to see what differences there are.
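One way to make such a comparison concrete is to diff the header keywords of the two files. The sketch below uses only the standard library and operates on raw 80-character header cards; `parse_header`, `diff_headers`, and `card` are hypothetical helpers, the parser ignores quote escaping and continuation cards, and the two example headers are invented for illustration rather than taken from real fpack or astropy output. In practice astropy's `fitsdiff` script gives a much fuller comparison.

```python
def parse_header(raw):
    """Parse 80-character FITS cards from bytes into a dict, stopping at END.

    Simplified: no '' escapes inside strings, no CONTINUE cards.
    """
    header = {}
    for i in range(0, len(raw), 80):
        card_text = raw[i:i + 80].decode("ascii")
        key = card_text[:8].strip()
        if key == "END":
            break
        if card_text[8:10] != "= ":
            continue  # COMMENT, HISTORY, or blank card
        body = card_text[10:]
        if body.lstrip().startswith("'"):
            start = body.index("'")
            end = body.index("'", start + 1)
            value = body[start + 1:end].rstrip()
        else:
            value = body.split("/", 1)[0].strip()
        header[key] = value
    return header


def diff_headers(a, b):
    """Return {keyword: (value_in_a, value_in_b)}; None marks a missing keyword."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def card(key, value):
    """Build one 80-byte header card (illustrative formatting only)."""
    return f"{key:<8}= {value:>20}".ljust(80).encode("ascii")


END = "END".ljust(80).encode("ascii")

# Invented example headers; NOT the real output of either tool.
header_a = parse_header(b"".join([
    card("ZIMAGE", "T"), card("ZCMPTYPE", "'RICE_1'"), card("ZTILE1", "2048"), END,
]))
header_b = parse_header(b"".join([
    card("ZIMAGE", "T"), card("ZCMPTYPE", "'GZIP_1'"), card("ZTILE1", "2048"),
    card("ZQUANTIZ", "'NO_DITHER'"), END,
]))

differences = diff_headers(header_a, header_b)
```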

parejkoj · July 6, 2016, 8:37pm

Ok, I’ll give that a try. Note that I did find and fix the bug in astropy that was causing it to write non-valid compressed files before, so the files astropy produces now pass fitsverify.

parejkoj · July 7, 2016, 2:01am

Following up on this: I ran fpack (no arguments) on the images (without zeroing them out), and got a similar compression ratio to what astropy gave me. jointcal did not crash, but also did not fit the images correctly, suggesting that something is still going wrong in ingesting the data.

For my particular use case (generating small test catalogs for jointcal), I have found a workaround: zero the images, gzip them, and rely on the fact that cfitsio will read the .fits.gz files “automatically” when the butler requests the .fits file. This isn’t an ideal solution (it’s vaguely magical), but I’m only using it for my test data, so it will do for now.
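As a rough illustration of why zeroing before gzipping shrinks the files so much, the sketch below compresses a buffer of zeroed pixels and, for contrast, a buffer of random bytes. The buffer size is arbitrary, and the random bytes are only a stand-in for noisy pixel data (real images compress somewhat better than pure random bytes).

```python
import gzip
import os

n_pixels = 512 * 512
zeroed = bytes(4 * n_pixels)       # float32 zeros are all-zero bytes
noisy = os.urandom(4 * n_pixels)   # stand-in for incompressible noise

zeroed_gz = gzip.compress(zeroed)
noisy_gz = gzip.compress(noisy)

# Compression ratios: original size divided by compressed size.
ratio_zeroed = len(zeroed) / len(zeroed_gz)
ratio_noisy = len(noisy) / len(noisy_gz)

print(f"zeroed pixels: {ratio_zeroed:.0f}x smaller")
print(f"random bytes:  {ratio_noisy:.2f}x smaller")
```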

I guess I’ll file a ticket about reading tile compressed images and someone can try to create some tests to see what exactly is failing?

ktl · July 7, 2016, 2:17pm

I didn’t realize what your actual problem was until reading @RHL’s other post. If all you really want is the image metadata including the binary-persisted ExposureInfo, you ought to be able to replace the image HDUs with single pixels rather than just replacing the current pixels with zeros. Then you shouldn’t need gzip. This will eventually be solved by persisting the ExposureInfo separately.

RHL · July 7, 2016, 2:22pm

@parejkoj tried that, but jointcal gets the bounding boxes from the image data.

ktl · July 7, 2016, 2:24pm

Darn. I believe composite datasets are a high priority item for the Butler.

price · July 7, 2016, 3:39pm

A common problem is trying to read NAXIS[12] from the header of a compressed image, but cfitsio doesn’t convert those when you read the image in. The Pan-STARRS FITS code includes some header manipulation to restore the header, which we may want to copy.
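For context, the FITS tiled-image convention stores a compressed image as a binary table, so the HDU's own `NAXIS1`/`NAXIS2` describe the table, while the original image geometry is kept in `ZNAXIS1`/`ZNAXIS2` (and the original `BITPIX` in `ZBITPIX`). A minimal sketch of the kind of header restoration described above might look like the following. It is not the Pan-STARRS code; `restore_image_keywords` is a hypothetical helper that works on a plain dict, and the example header values are illustrative.

```python
def restore_image_keywords(header):
    """Return a copy of a tile-compressed HDU header with the original
    image keywords (BITPIX, NAXIS, NAXISn) restored from the Z-prefixed ones.

    Headers without ZIMAGE set are returned unchanged (as a copy).
    """
    restored = dict(header)
    if not header.get("ZIMAGE"):
        return restored
    naxis = header["ZNAXIS"]
    restored["BITPIX"] = header["ZBITPIX"]
    restored["NAXIS"] = naxis
    for i in range(1, naxis + 1):
        restored[f"NAXIS{i}"] = header[f"ZNAXIS{i}"]
    # Drop table axes that have no counterpart in the original image.
    for i in range(naxis + 1, header.get("NAXIS", 0) + 1):
        restored.pop(f"NAXIS{i}", None)
    return restored


# Illustrative header of a compressed image HDU: NAXIS1/NAXIS2 here are the
# table row width in bytes and the number of rows, not the image size.
compressed = {
    "XTENSION": "BINTABLE", "BITPIX": 8, "NAXIS": 2, "NAXIS1": 8, "NAXIS2": 4176,
    "ZIMAGE": True, "ZBITPIX": -32, "ZNAXIS": 2, "ZNAXIS1": 2048, "ZNAXIS2": 4176,
}

restored = restore_image_keywords(compressed)
width, height = restored["NAXIS1"], restored["NAXIS2"]
```

Code that builds a bounding box directly from `NAXIS1` and `NAXIS2` of the unrestored header would see the table dimensions instead of the image dimensions.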

parejkoj · July 7, 2016, 5:18pm

I can’t confirm that NAXIS/NDIM was the actual problem, though I would not bet against it. In any case, neither a single zero “pixel” nor a 2x2 array of zeros was accepted.