Image differencing gen3 migration - call for comments on historical features

gkovacs · February 5, 2020, 3:06am

The first version of gen3 support in imageDifference.py (ImageDifferenceTask) will be merged soon. This is going to be a “backward compatible” implementation. It will be executable for a while in both the gen2 and gen3 middleware worlds and is supposed to produce (mostly) identical results.

The image difference code in the LSST stack has a long development history and consists of two parts:

There is the C++ Alard-Lupton PSF matching (AL) code, untouched for many years.
There are many layers of higher-level Python pre- and postprocessing code that have accumulated around this core over the years.

Unfortunately, the origin or purpose of some logical branches in this pre- and postprocessing code is not obvious to us (the AP team at UW). There are also technical solutions that are not straightforward to migrate to gen3, occasionally because they deviate from the current LSST standard design structures, or just because of the more rigorous data handling in gen3.

Here, I’d like to mention a few examples of such functionality (configuration). I’d like to poll the Community whether these features are actually known or used by anybody, or can become gradually unsupported.

getTemplate = GetCalexpAsTemplateTask: calexp from calexp subtraction.
doSelectSources = True and kernelSourcesFromRef = True - Selecting AL kernel candidates (image stamps around sources) from cross matching calexp sources with an external catalog.
doSelectSources = True and kernelSourcesFromRef = False - Selecting AL kernel candidates from cross matching to an existing source catalog of the template. This mode seems to be supported in calexp - calexp mode only at the moment.
doSubtract = False - “subtraction without subtraction” i.e. detection and cataloging only on an already persisted difference image product.
doAddMetrics = True - adding AL kernel debug metrics to the output catalog.
doMatchSources = True - Matching diasources to the calexp source catalog or to an external reference catalog (“srcMatchId”, “refMatchId” in the output catalog).
class Winter2013ImageDifferenceTask - a subclass of ImageDifferenceTask
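
To make it easier to check whether a given set of config overrides touches any of the modes listed above, here is a minimal sketch. It is not part of the stack: `legacy_modes` is a hypothetical helper, the overrides are represented as a plain dict, and recording the template getter by its class name is an assumption made only for illustration. Fields that are not set are treated as unknown, so no default values of the real task are assumed.

```python
def legacy_modes(overrides):
    """Return the modes from the list above that `overrides` explicitly selects.

    `overrides` is a plain dict of explicitly set config values (hypothetical
    representation). Unset fields are ignored rather than given defaults.
    """
    modes = []
    # Assumption: the template getter is recorded by its class name.
    if overrides.get("getTemplate") == "GetCalexpAsTemplateTask":
        modes.append("calexp-calexp subtraction")
    if overrides.get("doSelectSources") is True:
        if overrides.get("kernelSourcesFromRef") is True:
            modes.append("kernel candidates from external catalog")
        elif overrides.get("kernelSourcesFromRef") is False:
            modes.append("kernel candidates from template source catalog")
    if overrides.get("doSubtract") is False:
        modes.append("detection only on persisted difference image")
    if overrides.get("doAddMetrics") is True:
        modes.append("AL kernel debug metrics")
    if overrides.get("doMatchSources") is True:
        modes.append("diaSource matching")
    return modes


print(legacy_modes({"doSubtract": False, "doAddMetrics": True}))
```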

Please shout if the removal of any of the above mentioned configurations from the LSST stack would break your daily processing. Also, please don’t hesitate to share your memories with us if you can recall a motivation or a use case for them.

Gabor

jbosch · February 5, 2020, 3:14pm

I’d strongly advocate making detection and measurement on image differences a separate PipelineTask from the one that actually does the matching and subtraction, as soon as it is convenient to do so (maybe even worth doing on the CmdLineTask forms).

That would give you the “doSubtract = False” functionality in a much more natural form (i.e. “just run the second task”), and the second task might be a better place to put the “doMatchSources = True” functionality, if it does turn out that you need to retain it.
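
The split can be illustrated with a toy sketch, assuming images are just flat lists of pixel values. The function names `subtract` and `detect` are hypothetical and stand in for the two tasks; neither reflects the real PSF matching or detection code.

```python
def subtract(science, template):
    """First stage (toy): pixel-wise difference, with no PSF matching."""
    return [s - t for s, t in zip(science, template)]


def detect(difference, threshold):
    """Second stage (toy): indices of pixels whose absolute value exceeds threshold."""
    return [i for i, value in enumerate(difference) if abs(value) > threshold]


# Full chain: run both stages.
science = [10.0, 15.0, 9.8, 17.5]
template = [10.0, 10.0, 10.0, 10.0]
difference = subtract(science, template)
sources = detect(difference, threshold=3.0)

# "doSubtract = False" equivalent: run only the second stage on a
# difference image that was produced (and persisted) earlier.
persisted_difference = [0.0, -4.0, 1.0]
sources_only = detect(persisted_difference, threshold=3.0)
```

With this layout, skipping the subtraction needs no special flag; it is simply a matter of which stage is run.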

mwv · February 11, 2020, 4:43pm

I would ask that the ability to do calexp-calexp subtractions should be retained.

I think in the past this has been de-prioritized because of the simplifying idea that there is just one template and getTemplate can simply fetch it. But I think that both in development and even in operations, the ability to coherently use different coadd templates will be very useful and important. And so the bookkeeping challenges of image-image differencing become the same as those of image-coaddtemplate differencing.

jbosch · February 11, 2020, 5:07pm

I’m afraid this isn’t quite true: any kind of coadd is going to live on tracts and patches defined by some skymap, and that means there’s a very clear answer to “what is the best coadd image of some region”. It’s still not a trivial question to answer, because there are overlap regions, but the geometry is still such that we can define an answer via a (configurable) algorithm. There’s simply no way to do that for calexp-calexp subtractions, so the high-level interface is quite different (you need to provide two calexp data IDs, not just one), and it doesn’t fit naturally into the way the Gen3 middleware thinks about data relationships as being primarily defined by spatial/temporal/spectral overlaps. That’s not insurmountable, but it does mean that there is a lot of development work that would be just for calexp-calexp mode, and that may extend into the middleware itself (some already-planned improvements may or may not deliver the middleware support that would be needed).
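
To illustrate why a skymap makes the template lookup well defined, here is a minimal sketch of one possible selection rule: pick the patch with the largest overlap area. Everything in it is an assumption for illustration (flat axis-aligned boxes, the `Box` type, the `best_patch` function, the patch names); it is not the algorithm the stack uses.

```python
from collections import namedtuple

# Axis-aligned box in some flat projection (hypothetical simplification).
Box = namedtuple("Box", ["x0", "y0", "x1", "y1"])


def overlap_area(a, b):
    """Area of the intersection of two boxes, or 0 if they do not overlap."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(width, 0) * max(height, 0)


def best_patch(image_box, patches):
    """Return the name of the patch overlapping the image most, or None."""
    best_name, best_area = None, 0
    for name, box in patches.items():
        area = overlap_area(image_box, box)
        if area > best_area:
            best_name, best_area = name, area
    return best_name


patches = {"0,0": Box(0, 0, 10, 10), "1,0": Box(10, 0, 20, 10)}
science_box = Box(6, 2, 18, 9)
print(best_patch(science_box, patches))
```

A single science data ID is enough input here because the geometry supplies the rest. For calexp-calexp subtraction there is no analogous rule, which is why the second data ID has to come from the user.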

Would making a single-calexp “coadd” and using that as the template meet some of your needs?

mwv · February 11, 2020, 5:11pm

The use case I have in mind is definitely that you would specify both dataIDs.

There doesn’t need to be any particular orchestration for this. Just a Task that does this operation.

mwv · February 11, 2020, 5:16pm

that means there’s a very clear answer to “what is the best coadd image of some region”.

I don’t think that answer will stabilize until Year 3. It depends on the differences in PSF, how much de-convolution you can allow, and the trade-off between minimizing template noise by allowing more images and maximizing control over the differences in PSF and DCR by being more restrictive.

You’ll at least want to be able to easily switch between DCR aware and more vanilla coadds.

But perhaps you were just making the point about finding something that overlaps in RA, Dec, and filter.

jbosch · February 11, 2020, 5:25pm

I was thinking that the other variables would be handled by different types of coadds, and that could be controlled via configuration (so essentially the template-finding algorithm would be able to down-select to just one suite of self-consistent coadds at the beginning). Choosing between different types of coadds on-the-fly based on the properties of the new science image would indeed be more complex, and I hope it doesn’t come to that, but I think we could at least support that in the middleware already, even if it would require additional complexity in the task.

mrawls · February 11, 2020, 7:29pm

It sounds to me like @mwv wants to be able to specify 2 calexp-like dataIds (e.g., visit + ccd) and have difference imaging happen, whereas @gkovacs and @jbosch et al. want to require a coadd data type for diffim templates. I think the simplest way to reconcile this is probably what @jbosch proposed above: a “turn my calexp into a coadd” task. A user could then use a gen3-style pipeline to execute

do ISR, calibration, etc. on both calexp1 and calexp2
create coadd1 from calexp1
difference image calexp2 using coadd1 as the template
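
The dependency structure of these steps can be sketched as a small graph, assuming each step is described only by the datasets it reads and the one it writes. The step and dataset names are hypothetical, and this uses the standard-library `graphlib` module (Python 3.9+), not the gen3 middleware.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# step name -> (input datasets, output dataset); all names are illustrative.
steps = {
    "calibrate_calexp1": ((), "calexp1"),
    "calibrate_calexp2": ((), "calexp2"),
    "make_coadd1": (("calexp1",), "coadd1"),
    "difference": (("calexp2", "coadd1"), "diffim"),
}

# Map each dataset to the step that produces it.
producer = {output: name for name, (_, output) in steps.items()}

# Map each step to the set of steps it depends on.
graph = {
    name: {producer[dataset] for dataset in inputs}
    for name, (inputs, _) in steps.items()
}

# Any valid execution order runs a step only after its inputs exist.
order = list(TopologicalSorter(graph).static_order())
print(order)
```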

Would this pipeline be supported with the present gen3 dataId and collection concepts?

I agree we should be thoughtful moving forward about how users choose among multiple template options. Right now in gen2 (in coadd template mode), it is done via a combination of configs and by specifying inputs/outputs/reruns that include the science calexps as well as the template coadds. In my experience - and I think @isullivan’s too - this is not as transparent as we’d like and can lead to unintended science/template pairings which is tedious at best.

mwv · February 12, 2020, 4:03pm

I think adding a “turn the calexp into a coadd” task adds an additional unnecessary layer. It also doesn’t solve the problem. E.g., perhaps I want to subtract several different reference images from the same science image. Do I make three different pretend coadds, one for each different template image? This seems like just adding a layer of complexity that doesn’t really simplify anything.

Under either scenario, at some point there will be presumably a piece of code that takes two images and runs image subtraction. I’d like to easily be able to call that piece of code from the command line by passing two dataIds. I don’t think I’m asking for any middleware support.
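
A minimal sketch of what such a command-line entry point could look like, assuming data IDs are given as `key=value` tokens. The `--science` and `--template` flags, the function names, and the example values are all hypothetical; this is not the stack's actual argument parser.

```python
import argparse


def parse_data_id(tokens):
    """Turn ['visit=1', 'ccd=2'] into {'visit': 1, 'ccd': 2}."""
    data_id = {}
    for token in tokens:
        key, _, value = token.partition("=")
        # Keep purely numeric values as integers, everything else as text.
        data_id[key] = int(value) if value.isdigit() else value
    return data_id


def build_parser():
    parser = argparse.ArgumentParser(
        description="Subtract one calexp from another (illustrative only)."
    )
    parser.add_argument("--science", nargs="+", required=True, metavar="KEY=VALUE")
    parser.add_argument("--template", nargs="+", required=True, metavar="KEY=VALUE")
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    science_id = parse_data_id(args.science)
    template_id = parse_data_id(args.template)
    # A real tool would load both calexps here and call the subtraction code.
    return science_id, template_id


ids = main(["--science", "visit=1", "ccd=2", "--template", "visit=3", "ccd=2"])
print(ids)
```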

(I do think that Commissioning and Operations may want such middleware support, but I’m not asking for it right now as part of this present query.)