I have run into an issue implementing the image characterization task: the iterated detect/measure/fit-PSF loop is not converging for my test data (an exposure from obs_test). In more detail, CharacterizeImageTask does the following; note the extra steps before the detect/measure iterations (a rough sketch of the flow follows the list):
- add a simple PSF model to the exposure (if it doesn't have one, and it usually won't), like the old task; measurement requires this
- run repair, leaving cosmic rays in, as the old task does
- subtract an initial estimate of the background, like the old task, since detection gives insane results otherwise
- iterate the following N times (N is a config parameter):
  - DetectAndMeasureTask:
    - detect sources
    - deblend
    - measure sources
  - fit a PSF
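To make the flow concrete, here is a rough Python-flavored sketch of the sequence above; the subtask object and its method names are placeholders rather than the actual CharacterizeImageTask API:

```python
# Rough sketch of the flow described above. The "subtasks" object and its
# method names (makeSimplePsf, repair, detectAndMeasure, measurePsf, ...)
# are illustrative placeholders, not the real task interface.
def characterize(exposure, subtasks, numIter, initialFwhm=1.5):
    if exposure.getPsf() is None:
        # install a simple starting PSF model; measurement requires one
        exposure.setPsf(subtasks.makeSimplePsf(fwhm=initialFwhm))
    subtasks.repair(exposure, keepCRs=True)   # leave cosmic rays in, as the old task did
    subtasks.subtractBackground(exposure)     # detection misbehaves without this
    for _ in range(numIter):                  # N is a config parameter
        sources = subtasks.detectAndMeasure(exposure)  # detect, deblend, measure
        psf = subtasks.measurePsf(exposure, sources)   # fit a PSF from selected candidates
        exposure.setPsf(psf)
    return exposure
```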
I observe that the sigma of the fitted PSF does not converge, though it does not vary much. For example, starting with a naive estimate of 1.5 pixels, I see .55, 1.58, 1.50, 1.56, 1.55, 1.58, all based on a single source, alas; then 1.55 based on 2 sources, and the next iteration fails with no eigen images. I will append a log as a reply, since attaching text files is not supported.
I realize this could be a misconfiguration, but it worries me that it does not converge. Any ideas for improving convergence?
Also, I think @jbosch was hoping not to need an initial estimate for PSF FWHM, but I don’t see how to avoid it with the algorithm.
I am surprised that you found only 17 stars down to 5 sigma. What kind of dataset is that? Do you have any insight into their width and brightness distribution? Are there resolved objects and CRs?
In my experience with the SDSS PSF code (which appears similar in spirit to what you are trying to do here), below 10-15 bright stars per sample, all kinds of problems start appearing. It's not trivial to select a self-similar subset of objects when you don't have any prior information…
The data is visit 1 of obs_test: data/input/raw/raw_v1_fg.fits.gz, but after ISR and background removal. It is too large to upload to Discourse, so I put a copy here.
(Note: at this point it should be detecting to 10 sigma, which will make things even worse. That is an issue I haven’t yet tried to track down).
This dataset is fairly sparse and the PSF is a little funky looking. We should look at the debugging info to see why it’s not selecting more template stars (and what stars are being considered as candidates).
Fair enough. I tried again after improving the config (setting detection.thresholdValue = 10.0 and leaving detection.includeThresholdMultiplier at its default) and it detects 37 stars, but still only measures one as a PSF candidate and still doesn't converge. Using a fixed number of iterations is probably unwise in any case, so I'll have to think about convergence criteria.
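For reference, the override I used amounts to the following in a config file (the exact attribute path under the task config may differ slightly from what is shown here):

```python
# Config override for the second attempt, using the parameter names above.
config.detection.thresholdValue = 10.0
# config.detection.includeThresholdMultiplier is left at its default
```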
We may need to have different configuration for different iterations. In particular, I think the first-pass star selector may need to use a different algorithm from the later passes if we have no idea what the seeing is at first, and the measurement algorithms those depend on may change as well. I'm guessing that's not possible right now, just because I know it'd be difficult to wedge that concept into our current config system. But I think we will need it - if not here, then elsewhere soon - and we should think about how we could support that in pex_config.
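Just to make the idea concrete, here is a purely hypothetical sketch of what per-pass overrides might look like, reusing pex_config building blocks; all the names are made up, the per-pass mechanism does not exist today, and I'm not certain of the exact field signatures:

```python
# Hypothetical sketch only: per-iteration configuration expressed as a
# baseline config plus overrides keyed by pass index.
from lsst.pex.config import Config, Field, ConfigDictField

class PassConfig(Config):
    starSelectorName = Field(dtype=str, default="secondMoment",
                             doc="star selector algorithm to use on this pass (hypothetical name)")

class IterativeCharacterizeConfig(Config):
    numIter = Field(dtype=int, default=2,
                    doc="number of detect/measure/fit-PSF iterations")
    passOverrides = ConfigDictField(keytype=int, itemtype=PassConfig, default={},
                                    doc="per-pass overrides, keyed by iteration index")
```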
On the other hand, if our “loop” actually does something slightly different at each iteration, and we expect a very small number of iterations, we may be able to express that better by explicitly unrolling it (which at some level is what the old task did very inflexibly).
Right now we make a repair pass before we have any idea of the actual PSF. I wonder if it's safe to allow the results of that initial repair to remain in the final exposure. An alternative might be something like the following (sketched in code after the list):
- if the exposure has no PSF model, add a simple model whose FWHM comes from the config
- loop a few times:
  - make a deep copy of the exposure and use that for the following steps
  - repair
  - estimate background and subtract it
  - detect
  - deblend
  - measure
  - measure PSF
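In code, the alternative would look roughly like this (illustrative names only; each pass starts from a fresh deep copy of the exposure, and only the PSF model carries over between passes):

```python
# Sketch of the alternative loop above; subtask names are placeholders.
def characterizeAlternative(exposure, subtasks, numIter, initialFwhm=1.5):
    if exposure.getPsf() is None:
        exposure.setPsf(subtasks.makeSimplePsf(fwhm=initialFwhm))
    for _ in range(numIter):
        working = subtasks.deepCopy(exposure)   # deep copy; the exact afw call is assumed
        subtasks.repair(working)
        subtasks.subtractBackground(working)
        sources = subtasks.detectAndMeasure(working)  # detect, deblend, measure
        psf = subtasks.measurePsf(working, sources)
        exposure.setPsf(psf)                    # carry only the new PSF into the next pass
    return exposure
```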
The first measurement of the PSF is likely to be close enough that we can trust the repair and background estimation, and if those are slow then they could be restricted to the first two passes through the loop.
I don’t anticipate that the deep copy will be necessary to avoid problems caused by the first-pass PSF, and it seems a bit wasteful, but I suppose it would be somewhat safer, and it may make problems easier to debug.
Note that the only way the background should depend on the PSF here is through detection, which includes an implicit background estimation step itself, so the previous explicit one you’ve listed may be unnecessary. It’s also pretty much guaranteed that some combination of the last three steps (deblend, measure, estimate PSF) will dominate the overall computation time, so there’s no need to worry about trying to optimize repair and background estimation.
I do think we can do better than just starting with a PSF whose FWHM is set via config, though we’d have to write some new algorithmic code for that and hence it might be out of scope. It shouldn’t be hard to run a special detection/measurement stage that just aims to find a couple of unsaturated stars and measure their width, and that could be extended to something that works on crowded fields at some point too. We should make sure the design permits that even if we don’t have it now. I could imagine that being expressed as a completely separate subtask we run early, or as a specialized configuration of many of the existing steps in the iterative sequence.
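As a strawman, that special stage could reduce to something like the following, where the inputs are plain numpy arrays standing in for real catalog columns and flags:

```python
# Hedged sketch: take the brightest few unsaturated detections and convert
# the median of their second-moment (trace) radii into an initial FWHM guess.
import numpy as np

def estimateInitialFwhm(ixx, iyy, flux, saturated, nStars=5):
    """ixx, iyy: second moments; flux: instrumental flux; saturated: boolean mask."""
    use = ~saturated
    brightest = np.argsort(flux[use])[::-1][:nStars]
    # trace radius sqrt((Ixx + Iyy) / 2) equals sigma for a circular Gaussian
    sigma = np.median(np.sqrt(0.5 * (ixx[use][brightest] + iyy[use][brightest])))
    return 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma  # FWHM = 2*sqrt(2 ln 2) * sigma
```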
Detection misbehaves if background is not subtracted first. I found this out when running tests/testProcessCcd.py. This explains why the old CalibrateTask also performed this step.
Your suggestion for factoring out predicting the initial PSF model as a separate subtask makes sense.
I switched to using a deep copy of the image for each iteration and found that PSF estimation still did not converge for visit 1 of the obs_test data. It turns out the PSF model itself is causing the problem, though I’m still not sure why. I found rapid convergence when I changed the PSF model to a simple Gaussian whose sigma equals the sigma of the most recently computed PSF model at the start of each iteration. Just one repeat is required to converge. I’m not at all sure that’s the best we can do, but it ought to be better than our current code, which starts with a Gaussian PSF and fits only once.
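The change amounts to something like this at the start of each iteration (GaussianPsf and the shape accessors are my recollection of the afw API and may differ in detail):

```python
# Sketch of the workaround: replace the exposure's PSF with a circular
# Gaussian whose sigma matches the most recently fitted model.
import lsst.afw.detection as afwDetection

def reinstallSimplePsf(exposure, kernelSize=25):
    sigma = exposure.getPsf().computeShape().getDeterminantRadius()
    exposure.setPsf(afwDetection.GaussianPsf(kernelSize, kernelSize, sigma))
```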
I have made installing that simple PSF model a subtask.
I love PSFs and would like to learn more about the algorithm that selects stars for PSF estimation (e.g. what measured quantities are used in the per-star good4psf vs. bad4psf decision, and what the decision algorithm does with those quantities).
I am ok with eventually reading (well-commented) code, but first I’d like to have some high-level descriptions to start with (I don’t even know if the implemented algorithm is a clone of SDSS methods, or something totally different).