Scaling the variance plane and accounting for covariance

raphaelshirley · August 3, 2021, 1:36pm

This is a somewhat detailed post regarding the variance plane, its scaling, and the effect on subsequent measurement errors. It is quite long so thank you in advance for any interest.

We are joint processing VISTA and HSC imaging in preparation for Rubin. The standard multiband pipeline requires a consistent pixel scale which effectively means warping the native 0.336 arcsec/pixel VISTA images to the higher resolution 0.168 HSC arcsec/pixel pixel scale. This makes the VISTA imaging subject to significantly higher pixel covariance. That said, my questions are relavent to the native scale pipelines due to the general warping process and interpolation.

In the standard run, as I understand it, the variance plane of the final coadd is scaled by the ScaleVarianceTask such that the median variance is equal to the coadd image pixel variance. Using this standard method we find that measured aperture magnitude errors are offset from those measured on the native scale using SExtractor by a factor of around 4. We are weighing up two possible mitigation strategies:

Firstly, we are considering simply scaling the measured catalogue errors to account for this offset. This is what has been done in previous studies using SExtractor in dual image mode on VISTA and DES imaging (Banerji, et al. MMRAS 446.3, 2015).

Secondly, we are considering measuring the image autocorrelation matrix. The sum of the normalised matrix should then give us the relative factor of variance lost to covariance. By scaling the whole variance plane according to this factor the various measurement algorithms should have their error measurements scaled.

The first is certainly simpler partly because it only involves post processing of the catalogues. It also should mean that pixel variances are appropriate for performing signal to noise detection thresholds on pixels. I am also nervous that the latter depoends on the assumption that all measurement algorithms will be similarly impacted by covariance which seems questionable.

In the long term a pipeline with multiple pixel scales would have advantages. Presumably given the large differences in resolution with, eventually, Euclid will necessitate such a new pipeline. For now we would like to work with a consistent pixel scale.

Finally, it might be that we have a more fundamental problem with the variance plane. During the detectCoaddSources it appears that the variance plane is being scaled by a factor of 0.1:

coaddDriver.detectCoaddSources.scaleVariance INFO: Renormalizing variance by 0.099319
coaddDriver.detectCoaddSources.detection INFO: Applying temporary wide background subtraction
coaddDriver.detectCoaddSources.detection INFO: Detected 2012 positive peaks in 1775 footprints to 5 sigma
coaddDriver.detectCoaddSources.detection.skyObjects INFO: Added 1000 of 1000 requested sky sources (100%)
coaddDriver.detectCoaddSources.detection.skyMeasurement INFO: Performing forced measurement on 1000 sources
coaddDriver.detectCoaddSources.detection INFO: Modifying configured detection threshold by factor 0.300488 to 3.004881
coaddDriver.detectCoaddSources.detection INFO: Detected 4432 positive peaks in 3513 footprints to 3.00488 sigma
coaddDriver.detectCoaddSources.detection INFO: Detected 5104 positive peaks in 3678 footprints to 3.00488 sigma
coaddDriver.detectCoaddSources.detection.skyObjects INFO: Added 1000 of 1000 requested sky sources (100%)
coaddDriver.detectCoaddSources.detection.skyMeasurement INFO: Performing forced measurement on 1000 sources
coaddDriver.detectCoaddSources.detection INFO: Tweaking background by -0.000661 to match sky photometry

This seems high. Perhaps we should switch off the variance scaling? In a previous post I discussed how we use image standard deviation to determine source detection because the pixel based standard deviation was yielding too many sources. Is there an obvious reason why the scale variance task would fail when using non-native pixel scales. Another issue with our runs is that we are using six exposure stacks as input images instead of raw exposures because the single exposures were not deep enough to identify enough calibration stars. Does the variance scaling propagate through to the measurements? I am curious that following the variance scaling the detection threshold is also scaled by a similar factor (sqrt(0.1)~0.3).

Many thanks for reading this detailed question.

Raphael.

price · August 3, 2021, 3:27pm

As you’ve seen, when dealing with resampled images (and coadds are sums of resampled images), the variance plane can only ever be a rough guide to the pixel variances because resampling introduces co-variance between pixels which the pipeline does not track, and hence it is lost from the accounting. The pipeline scales the variance plane to match the measured image variance as a desperate attempt to bring the variance plane closer to what it should be, but this can only ever be an approximation because the pipeline does not track covariance, so any combination of the variance over multiple pixels is going to be wrong. This means that photometric (and astrometric and shape) errors computed from the variance plane are incorrect. I believe this is the case for any photometry on any coadd image (if constructed with resampling): do not trust the errors you get out of Source Extractor either. If you need accurate errors, Monte Carlo is the way to go.

In your case, with images of different native pixel sizes that require large resampling kernels, the problem is going to be magnified, as you’ve recognised.

Now, what can be done? In Pan-STARRS, we calculated a “covariance pseudo-matrix”, which was an average measure of the ratio of the covariance to the variance for an image (similar to your second method, but calculated from the resampling kernels rather than measuring the image autocorrelation). I think it was a better approximation, but it was still only ever an approximation because resampling kernels vary on very fine scales. I believe someone in LSST played around with tracking the covariance (at least, over a small scale) for the entire image, but that’s got to be expensive (in terms of both compute and storage). I don’t think anyone has come up with a clean solution that’s universally acknowledged to be good, and so we’ve attempted to proceed by duct tape and Monte Carlo.

Rescaling the variance plane is a stupid duct tape-style solution because it only deals with scales of a single pixel, but I like rescaling the detection limit using sky objects (Modifying configured detection threshold by factor 0.300488 to 3.004881): it’s Monte Carlo on the detection limit. I would hope that the problem with the measurement errors can similarly be dealt with using measurements of fake sources.

Now, regarding your question about the variance scaling, my first impression is that you shouldn’t turn it off. We’re going to adjust the detection limits later using sky objects which may suggest that scaling the variance plane isn’t important, but there’s a detection stage before we adjust the detection limits that we use to identify empty sky. But I suppose that if you’re not using thresholdType=pixel_stdev then maybe it really doesn’t matter. I would love to see images with detected sources overlaid that you’re getting with different settings.

raphaelshirley · August 3, 2021, 4:31pm

Here are the detected pixels in blue over the image for various detectCoaddSources config values

For this reason we settled on stdev and 10 sigma based on the highly unscientific criteria of the detection planes ‘looking’ sensible. We are also considering simply including a threshold multiplier based on our measurements of the autocorrelation matrix sum.

price · August 3, 2021, 5:41pm

My guess on what is happening here is that you’ve got a lot of covariance in the images even before resampling (probably from your “raw” images actually being coadds themselves; the images you show above look like they’ve been smoothed), and it’s getting away from us, so that the attempt at finding sky leaves only the lowest valleys, which ruins the algorithm (how many sky objects is it using?).

I don’t like using thresholdType=stdev for coadds because then the detection doesn’t account for different noise levels over the image (e.g., chip gaps, edge of the field), but thresholdType=pixel_stdev requires getting the variance twiddle factors reasonable. One advantage you’ve got is that there’s a lot of sky, so we might be able to tolerate some source contamination in our sky because we can identify those as outliers. Some parameters to play with include:

prelimThresholdFactor: fraction of the threshold to use for the initial pass (to find sky objects). This defaults to 0.5; I suggest raising it to 1 or 2 or maybe even more.
skyObjects.avoidMask: mask planes that are avoided when placing sky objects. Since you’ve got so much sky, maybe you could get away with the otherwise ludicrous idea of removing DETECTED from this list.

raphaelshirley · August 3, 2021, 6:10pm

Ok I will experiment with those parameters tomorrow. This is a small test in a deep field and we also saw strong dependence on the number of exposures going in to the preliminary coadds. I will run tests on a selection of situations. We are still thinking of retrying using the individual exposures if we can tweak parameters to get enough calibration stars. There are also issues with the variance at the edge of the coadd which might involve fewer coadds on a given pixel and around bad pixels. We are planning to implement our own pipeline’s confidence maps to account for these features as we curently simply multiply the gain by the number of exposures going in to the preliminary coadds.

I am adding 1000 sky objects but in the worst case it only manages to add around 20.

coaddDriver.detectCoaddSources.detection.skyObjects INFO: Added 18 of 1000 requested sky sources (2%)
coaddDriver.detectCoaddSources.detection.skyMeasurement INFO: Performing forced measurement on 18 sources
coaddDriver.detectCoaddSources.detection INFO: Tweaking background by -0.015537 to match sky photometry

Thank you for your advice.

price · August 3, 2021, 6:50pm

I think the way to do this Properly would be to rewrite early elements of the pipeline (mostly processCcd) to be more suitable for NIR data (low S/N per exposure). The main goal would be to have only a single resampling stage (specifically, in makeWarp just before assembleCoadd). Maybe you have to do a simple coaddition before running something like processCcd in order to measure good astrometry and PSF model; if you can do that with integer pixel shifts only, then you can use those simple coadds in the same way we treat the optical raw images and everything is easy, but if not then I would transfer the astrometric solutions and PSF models from the simple coadds back to the original images so that those can be input into the proper coadd. The other big problem would be that CoaddPsf doesn’t work well with large numbers of inputs, but that’s a problem LSST knows it has to fix.

raphaelshirley · August 4, 2021, 11:47am

The original coadds performed by the CASU pipeline are on integer pixel shift observations. I agree that ideally we would reimplement the first stages of the CASU pipeline at the processCcd stage. We currently have a first test run in gen 2 with the present method ingesting the preliminary coadds. Our next step is to reimplement this run with the gen 3 Butler after which we have around a year to improve the pipeline so I think looking again at the processCcd stage and possibly using the single exposures will be a priority.

For the deep VIDEO deep coadds we will be coadding very large numbers of images (~150 images of six exposure coadds or 150 *6 = 900 single exposures). The main purpose of the work however is the VHS survey which only has around ten images contributing to a given pixel. VHS covers the entirety of the wide fast deep Rubin survey.

Thanks again for the advice I will reply here if we make significant progress with the covariance tracking and creation of the initial variance planes.