Why was detection.includeThresholdMultiplier=10 for old ProcessCcdTask?

The old ProcessCcdTask explicitly set detection.includeThresholdMultiplier=10 (as opposed to its default of 1), and otherwise used the default of detecting at 5 sigma.

In contrast, the new task will start with a high-S/N detection pass (e.g. 10 sigma), use that to determine the PSF (and write the results as “icSrc”), then perform a low-S/N detection (e.g. 5 sigma), measure those sources, and use them for astrometric and photometric calibration (and write the results as “src”). I’m trying to figure out whether I should also set includeThresholdMultiplier.
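Purely for illustration, the two passes can be sketched as follows (the function names and toy data are made up for this sketch, not the real Task API; sources here are just (name, S/N) pairs):

```python
# Toy sketch of the proposed two-pass flow; "detect" stands in for real
# source detection, with sources represented as (name, snr) pairs.
def detect(sources, threshold_sigma):
    return [s for s in sources if s[1] >= threshold_sigma]

sources = [("a", 120.0), ("b", 30.0), ("c", 7.0), ("d", 4.0)]

ic_src = detect(sources, threshold_sigma=10)  # high-S/N pass -> PSF, "icSrc"
src = detect(sources, threshold_sigma=5)      # low-S/N pass -> calib, "src"

print(len(ic_src), len(src))  # 2 3
```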

Also…a more obscure question. The old calibrate task did the following in its config’s setDefault method. Why? It looks like a no-op to me, and what is special about the catalog star selector that it needs such a tweak?

 initflags = [x for x in self.measurePsf.starSelector["catalog"].badStarPixelFlags]
 self.measurePsf.starSelector["catalog"].badStarPixelFlags.extend(initflags)
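For what it’s worth, the snippet’s effect is easy to see on a plain list: the comprehension makes a shallow copy, and extend then appends that copy, so every flag simply appears twice while the set of flags is unchanged (the flag names below are placeholders, not the real config values):

```python
# Same pattern as the setDefaults snippet, applied to a plain list: it
# appends a shallow copy of the list to itself, duplicating each entry
# but leaving the *set* of flags unchanged -- hence it reads as a no-op.
flags = ["flag_edge", "flag_saturated"]
initflags = [x for x in flags]   # shallow copy of the list
flags.extend(initflags)          # flags now holds each entry twice

print(len(flags), len(set(flags)))  # 4 2
```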

I never understood the point of includeThresholdMultiplier; it was introduced long ago (by @price?) to solve some problem in HSC data, and rather than adopt or change it now, I’d prefer to figure out which problem we were trying to solve.

I have no insight to add on the initflags part of the question.

We want to specify the threshold using two numbers: the first number is the threshold for including a source in the catalog and the second is the threshold to use for setting the Footprint. I don’t think these numbers should be the same — we don’t want the Footprint set to a single pixel (or even a single pixel grown by the PSF) for a 50-sigma detection when we’re doing the calibration; I think we want about the same Footprint as we’ll use later for doing the actual photometry.

I chose to represent these two numbers with the threshold for setting the Footprint, and then a multiplier of that threshold for the threshold for including a source in the catalog — hence includeThresholdMultiplier. No doubt we could choose a better name or scheme, but I believe the whole detection scheme needs rethinking so I suggest changing it when everything else gets updated.
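Concretely, with the old ProcessCcdTask numbers, the scheme works out as:

```python
# The two-number scheme: thresholdValue sets the Footprint threshold,
# and includeThresholdMultiplier scales it up to the inclusion threshold.
threshold_value = 5                 # sigma; sets the Footprints
include_threshold_multiplier = 10   # old ProcessCcdTask override
include_threshold = threshold_value * include_threshold_multiplier

# A source must reach 50 sigma to be included in the catalog, but its
# Footprint extends down to the 5-sigma contour.
print(include_threshold)  # 50
```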

I don’t agree with Paul’s fundamental point. We detect in PSF-smoothed images, so a single pixel above threshold is a significant detection of an object. We grow to the size of the PSF, and that is the correct footprint for a point-source of any magnitude.

Now, in practice, we don’t grow by a PSF large enough to capture the faint wings of the source, but I don’t think a second threshold is the correct way to capture this. It’d be better to enable the multi-scale detection code and merge the detected pixels, which is what we did in SDSS, or fit the wings and mask them, or something cleverer that doesn’t spring to mind.

So while this may be a pragmatic practice for now, it is not the right thing to do and I don’t think we should be doing things this way in a year’s time. If you want multiple thresholds (e.g. to find bright objects and mask faint ones) it’s easy enough to write efficient code which detects Footprints containing nested Footprints at higher thresholds (which is what SDSS did).
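A toy 1-D sketch of that nested-Footprint idea (not the SDSS code, just an illustration): find runs of pixels above a low threshold, then, inside each run, the runs above a higher threshold:

```python
# Toy 1-D illustration of Footprints containing nested Footprints at a
# higher threshold: find contiguous runs above a low threshold, then
# search inside each for runs above a high threshold.
def runs_above(pixels, threshold, start=0):
    """Return (begin, end) index pairs of contiguous runs above threshold."""
    out, begin = [], None
    for i, v in enumerate(pixels):
        if v > threshold and begin is None:
            begin = i
        elif v <= threshold and begin is not None:
            out.append((start + begin, start + i))
            begin = None
    if begin is not None:
        out.append((start + begin, start + len(pixels)))
    return out

def nested_footprints(pixels, low, high):
    return [(lo, hi, runs_above(pixels[lo:hi], high, start=lo))
            for lo, hi in runs_above(pixels, low)]

image = [0, 6, 8, 60, 9, 6, 0, 0, 7, 6, 0]
print(nested_footprints(image, low=5, high=50))
# [(1, 6, [(3, 4)]), (8, 10, [])]
```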

@price the existing CalibrationTask has the following defaults for detection (and, as far as I can tell, these are the values actually used, based on the config/processCcd.py override files in the various obs_ packages):

  • includeThresholdMultiplier = 10 (overriding the DetectionTask default of 1)
  • thresholdValue = 5
  • thresholdType = "stddev"
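Spelled out in the style of an obs_* package’s config/processCcd.py override file (the exact attribute path is an assumption for illustration; the values are the ones listed above):

```python
# Hypothetical override fragment; the attribute path is assumed.
config.calibrate.detection.thresholdValue = 5
config.calibrate.detection.thresholdType = "stddev"
config.calibrate.detection.includeThresholdMultiplier = 10
# Effective inclusion threshold: 5 * 10 = 50 sigma.
```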

Do you have any insight into why includeThresholdMultiplier is so large? From your description it sounds as if we are detecting sources at 50 sigma (and footprints at 5). I am surprised that we detect any sources.

Also, we are moving from a system that detects once and measures twice to one with two separate detect-and-measure passes: first detect and measure bright sources and use them to fit a PSF, then detect and measure more sources for the final source catalog and use that for astrometric and photometric calibration. We’d like to use 10 sigma detection for the bright stars and 5 sigma for the more complete catalog. Can you suggest appropriate values for thresholdValue and includeThresholdMultiplier?

For calibration, we have explicitly been detecting bright objects only — 50 sigma (that threshold is not usually as bright as you seem to think). Of course, this is one area where the pipeline can fall over: in low-S/N data (short exposures, lots of cloud) we fail to recover many objects. Ideally we’d like to detect everything we can, but that can be hard on the PSF estimation and perhaps other calibration steps, which end up wasting a lot of time on faint sources. I think the best way around that is to modify the PSF estimation to take only the brightest N (or brightest F%) of sources. I hope you can move things in this direction, as it would make the pipeline much more stable (and get rid of the need for my includeThresholdMultiplier hack).
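That “brightest N” selection is easy to sketch (toy data; a real selector would use measured fluxes plus quality cuts):

```python
# Sketch of the "brightest N" idea: detect faint, then keep only the N
# brightest candidates for PSF estimation. Sources are (id, flux) pairs.
def brightest_n(sources, n):
    return sorted(sources, key=lambda s: s[1], reverse=True)[:n]

sources = [(1, 5.0), (2, 900.0), (3, 55.0), (4, 300.0), (5, 12.0)]
psf_candidates = brightest_n(sources, n=3)
print([s[0] for s in psf_candidates])  # [2, 4, 3]
```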

PSF estimation is now done in CharacterizeImageTask, which uses the old detection config values of includeThresholdMultiplier = 10 and thresholdValue = 5, and thus detects only bright stars.

The CalibrationTask now only measures sources for the “src” catalog and for astrometric and photometric calibration. At the moment it uses includeThresholdMultiplier = 1 and thresholdValue = 5, since @jbosch had suggested 5 sigma. I wonder if such a low threshold multiplier is asking for trouble (e.g. footprints that are too small).

I figured we’d tweak these settings based on the results from running on real data, as will be reported here, but if you have any suggestions, please let me know.

I’ve created DM-4973 for @price’s suggestion that we switch from high thresholds to brightest N stars.