Single-frame processing sketch

jbosch · November 4, 2015, 7:45pm

Here’s the single-frame processing pipeline sketch we came up with today:

ISR (instrument signature removal)

Largest data unit: amp, then CCD (includes assembly).
Run on Snaps independently.
Flat field assuming sky SED.
We’re guessing (hoping?) this includes all the brighter-fatter correction we need.
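To make the amp-then-CCD ordering concrete, here is a minimal sketch of the ISR step in plain Python. The bias levels, gains, and the left-to-right amp layout are assumptions for illustration (none of them are specified above), and the flat is assumed to have been built for the sky SED. Real code would work on array images with masks and variance planes.

```python
def isr_amp(raw, bias_level, gain, flat):
    """Bias-subtract, gain-correct, and flat-field one amplifier image."""
    return [
        [(pixel - bias_level) * gain / f for pixel, f in zip(row, flat_row)]
        for row, flat_row in zip(raw, flat)
    ]


def assemble_ccd(amps):
    """Join corrected amp images side by side (hypothetical layout)."""
    n_rows = len(amps[0])
    return [[p for amp in amps for p in amp[y]] for y in range(n_rows)]


# Two one-row amps with different (made-up) bias levels and gains.
amp_a = isr_amp([[1100.0, 1200.0]], bias_level=1000.0, gain=2.0, flat=[[1.0, 1.25]])
amp_b = isr_amp([[1050.0, 1010.0]], bias_level=1010.0, gain=1.5, flat=[[1.0, 1.0]])
ccd = assemble_ccd([amp_a, amp_b])
```

The per-amp corrections run first because bias and gain differ between amps; assembly only makes sense once the amps are on a common scale.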

Combine Snaps

Largest data unit: CCD
We’re expecting to just be able to add them, but that’s not guaranteed. If we have to warp and/or PSF-match, a lot needs to be changed (and maybe we should rethink the observing strategy).
Could find and mask some cosmic rays here (by diffing); can’t interpolate them yet without the PSF.
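A rough sketch of the "just add them, and diff to find cosmic rays" idea, assuming the two Snaps are already aligned and that a per-pixel variance is available. The function name and the 5-sigma cut are illustrative choices, not anything decided above.

```python
import math


def combine_snaps(snap_a, snap_b, var_a, var_b, n_sigma=5.0):
    """Sum two snaps pixel by pixel and flag cosmic-ray candidates.

    A pixel is flagged when the two snaps differ by more than n_sigma
    times the expected noise on the difference. Flagged pixels are only
    masked, not interpolated, because no PSF model exists yet.
    """
    combined, mask = [], []
    for row_a, row_b, vrow_a, vrow_b in zip(snap_a, snap_b, var_a, var_b):
        out_row, mask_row = [], []
        for a, b, va, vb in zip(row_a, row_b, vrow_a, vrow_b):
            out_row.append(a + b)
            sigma_diff = math.sqrt(va + vb)
            mask_row.append(abs(a - b) > n_sigma * sigma_diff)
        combined.append(out_row)
        mask.append(mask_row)
    return combined, mask


snap_a = [[100.0, 101.0], [99.0, 100.0]]
snap_b = [[101.0, 100.0], [400.0, 99.0]]  # pixel (row 1, col 0) hit by a cosmic ray
var = [[100.0, 100.0], [100.0, 100.0]]    # per-pixel variance, assumed known
combined, cr_mask = combine_snaps(snap_a, snap_b, var, var)
```

If the Snaps turn out to need warping or PSF-matching before they can be added, this simple pixel-by-pixel comparison no longer applies as written.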

Detect+Background

Largest data unit: CCD
Only concerned with moderately bright stars at this point (i.e. inputs to PSF modeling); can make tradeoffs in the algorithms that adversely affect galaxies.
Use wavefront sensors to provide a PSF size guess for detection.

Measurement

Largest data unit: CCD
Only algorithms needed for star selection and PSF determination - aperture fluxes, centroider, 2nd-moments.
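For reference, the three measurements named here can be sketched in a few lines. This is a minimal, unweighted version on a background-subtracted postage stamp; the function name and the dictionary it returns are made up for illustration, and production algorithms would use weighting and handle masks and variance.

```python
def measure_stamp(stamp, x0, y0, radius):
    """Aperture flux, centroid, and second moments within a circular aperture.

    Assumes the stamp is background-subtracted and the aperture
    contains positive total flux.
    """
    pixels = [
        (x, y, value)
        for y, row in enumerate(stamp)
        for x, value in enumerate(row)
        if (x - x0) ** 2 + (y - y0) ** 2 <= radius ** 2
    ]
    flux = sum(v for _, _, v in pixels)
    xc = sum(x * v for x, _, v in pixels) / flux
    yc = sum(y * v for _, y, v in pixels) / flux
    ixx = sum(v * (x - xc) ** 2 for x, _, v in pixels) / flux
    iyy = sum(v * (y - yc) ** 2 for _, y, v in pixels) / flux
    ixy = sum(v * (x - xc) * (y - yc) for x, y, v in pixels) / flux
    return {"flux": flux, "x": xc, "y": yc, "ixx": ixx, "iyy": iyy, "ixy": ixy}


stamp = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 4.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
result = measure_stamp(stamp, x0=2, y0=2, radius=2)
```

The second moments are what a star selector would typically cut on, since stars cluster at the PSF size while galaxies and cosmic rays do not.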

Initial PSF Estimation

Largest data unit: CCD
Expect to use traditional, non-physical (e.g. PCA/polynomial) methods, or relatively simple extensions.
This needs to include some approach to modeling the extended wings of stars.

Repair CRs/Defects

Largest data unit: CCD
Run on Snaps independently; recombine afterwards.
Can use both differences and morphology to find CRs.
Need to use PSF model for interpolation as well as morphological CR detection (why we can’t do this in ISR).

Resampling (?)

Largest data unit: CCD
We may need to resample some or all of the image at this stage to deal with astrometric features in the sensors (I don’t recall the name for the effect; perhaps @rhl can remind me - and expound).
We can’t just roll these distortions into the WCS, because they’re not linear over the scale of the PSF.
If we don’t correct for these effects, they could mess up downstream operations (like warping) that assume good sampling.
It’s not clear whether we’ll have to do this, and if we do, how much of the image needs to be modified - it depends on sensor characteristics we haven’t really determined. The traditional approach has been to treat these astrometric effects as QE variation, which (mostly?) works for photometry, but doesn’t work for other measurements.

Detect+Background (again)

Largest data unit: CCD
Lower threshold than last time, now only mostly concerned with stars.
Can use the PSF model for detection smoothing.
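A small sketch of why smoothing with the PSF model helps here: correlating the image with the PSF acts as a matched filter, so a star whose peak pixel is below the threshold can still be detected. The function names, the toy 3x3 PSF, and the assumption of uncorrelated pixel noise with a known sigma are all illustrative.

```python
def smooth_with_psf(image, psf):
    """Correlate the image with the PSF kernel, treating off-image pixels as zero."""
    h, w = len(image), len(image[0])
    kh, kw = len(psf), len(psf[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + j - oy, x + i - ox
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx] * psf[j][i]
            out[y][x] = total
    return out


def detect(image, psf, sigma, n_sigma=5.0):
    """Return (x, y) of pixels above n_sigma in the PSF-smoothed image."""
    smoothed = smooth_with_psf(image, psf)
    # Noise in the smoothed image, for uncorrelated per-pixel noise sigma.
    smoothed_sigma = sigma * sum(p * p for row in psf for p in row) ** 0.5
    threshold = n_sigma * smoothed_sigma
    return [
        (x, y)
        for y, row in enumerate(smoothed)
        for x, value in enumerate(row)
        if value > threshold
    ]


psf = [
    [0.0625, 0.125, 0.0625],
    [0.125, 0.25, 0.125],
    [0.0625, 0.125, 0.0625],
]
# A noise-free star of total flux 16 centred on pixel (2, 2).
image = [[0.0] * 5 for _ in range(5)]
for j in range(3):
    for i in range(3):
        image[1 + j][1 + i] = 16.0 * psf[j][i]
detections = detect(image, psf, sigma=1.0)
```

In this toy case the peak pixel is 4.0, below a 5-sigma cut on the unsmoothed image, yet the smoothed image puts the star at 6 sigma. An unsmoothed detection pass would miss it.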

Deblend?

Largest data unit: CCD
May not be necessary; depends on how faint we need to go here. Since these are still just for calibration purposes, we’d prefer isolated objects anyway.

Measure (again)

Largest data unit: CCD
Can now use PSF-dependent algorithms.
Still focused on measurements needed for calibration purposes (this is not what populates the final Source table).
Need debiased centroid estimates.

Match to External Catalog

Largest data unit: Visit
We expect some CCDs not to have enough well-measured objects; want to use full visit to recover them, since we know the relationship between CCDs in advance.
Includes initial astrometric solution and initial photometric calibration.
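One way to picture the "use the full visit to recover sparse CCDs" idea is below. It is a deliberately simplified sketch: the astrometric solution is reduced to a plain (dx, dy) shift, the residuals are assumed to be already expressed in a common focal-plane frame, and the function name and `min_matches` cut are hypothetical. A real solver would fit a full WCS.

```python
from statistics import mean


def solve_offsets(offsets_by_ccd, min_matches=3):
    """Return a (dx, dy) shift per CCD.

    offsets_by_ccd maps a CCD id to a list of (dx, dy) residuals between
    matched sources and the reference catalog. CCDs with too few matches
    fall back to the shift fitted to all matches in the visit.
    """
    pooled = [off for offs in offsets_by_ccd.values() for off in offs]
    visit_shift = (mean(dx for dx, _ in pooled), mean(dy for _, dy in pooled))
    solution = {}
    for ccd, offs in offsets_by_ccd.items():
        if len(offs) >= min_matches:
            solution[ccd] = (mean(dx for dx, _ in offs), mean(dy for _, dy in offs))
        else:
            solution[ccd] = visit_shift
    return solution


offsets = {
    1: [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],  # enough matches: solved on its own
    2: [(6.0, 4.0)],                          # too few: uses the visit-wide shift
}
solution = solve_offsets(offsets)
```

The fallback only works because the relative geometry of the CCDs is known in advance, which is what lets residuals from different CCDs be pooled at all.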

Before we can actually continue to the final stage of Source measurement and detection, we’ll have to start other pipelines, because we’ll use them to find backgrounds, PSF estimation stars, and artifacts that we’ll want to use in the final SFM.

Relative Calibration

Largest data unit: all visits in an area of sky, size TBD
This is a “local ubercal”; we don’t expect to need to use the entire sky at once, but that’s not completely ruled out.
Solves for final astrometric solution (including small-scale atmospheric variations we can’t constrain from lab data or matching to Gaia).
Solves for final photometric solution, including correction from raw DN (assuming sky SED we used for flat-fielding) to more reasonable nominal SEDs for objects.
If we can’t rely on Gaia to tie the photometry together at large scales, we may need to have a larger-scale relative solver for that (Gaia will definitely be sufficient for large-scale astrometry).
We may use galaxies for astrometry, but not for photometry.
We might be able to identify our best set of stars to use for PSF estimation here, by matching stars considered for PSF estimation on individual frames. We’ll also find the true (per-epoch) centroids of PSF stars with proper motion, or at least be able to reject them.

PSF-Matched Coaddition and Background Matching

Largest data unit: all visits (sequentially?) within a patch
This generates templates for difference images and coadds for aperture fluxes (which are either the same, or just different subsets of visits with the same processing, modulo DCR correction choices).
We’ll do background matching at the same time, which (after we subtract the final background from the coadd) will allow us to compute our best background for individual visit images.
Background matching will also give us an opportunity to identify artifacts, and because we’re making PSF-matched coadds we can also use per-pixel outlier rejection to help with this.
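The per-pixel outlier rejection mentioned above could look roughly like this. It assumes the input images are already warped to a common grid, PSF-matched, and background-matched, so that the values at a given pixel are directly comparable across visits. The median/MAD clipping and the 3-sigma cut are illustrative choices, not a decided algorithm.

```python
from statistics import median


def clipped_coadd(stack, n_sigma=3.0):
    """Mean-combine a stack of images with per-pixel outlier rejection.

    Returns the coadd and, for each pixel, the indices of the visits
    that were rejected (candidate artifacts in those visits).
    """
    h, w = len(stack[0]), len(stack[0][0])
    coadd = [[0.0] * w for _ in range(h)]
    rejected = [[[] for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = [img[y][x] for img in stack]
            med = median(values)
            # Robust scatter estimate from the median absolute deviation.
            sigma = 1.4826 * median(abs(v - med) for v in values)
            keep = []
            for visit, v in enumerate(values):
                if abs(v - med) <= n_sigma * sigma:
                    keep.append(v)
                else:
                    rejected[y][x].append(visit)
            coadd[y][x] = sum(keep) / len(keep)
    return coadd, rejected


# Five visits of a 1x2 image; visit 4 has an artifact on the first pixel.
stack = [
    [[10.0, 5.0]],
    [[11.0, 7.0]],
    [[9.0, 6.0]],
    [[10.0, 4.0]],
    [[60.0, 3.0]],
]
coadd, rejected = clipped_coadd(stack)
```

The `rejected` lists are the useful by-product: they say which visit was discrepant at which pixel, which is the information needed to build per-visit artifact masks.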

Likelihood and/or direct coaddition and processing (?)

Largest data unit: all visits (sequentially?) within a patch, patch
Only need to do this before the final SFM if coadd processing is necessary to reject galaxies from the PSF estimation star catalog.

Image Differencing

Largest data unit: visits?
May be necessary to find all artifacts and moving objects that we’ll need to flag in single-frame measurement; we have better sensitivity than we will in background matching because we don’t have to warp the science image here.

Once we’ve got a consistent PSF star catalog, good backgrounds, and good artifact masks from some combination of the above steps, we can complete single-frame processing:

Final PSF Determination

Largest data unit: visit (probably; some chance we could use multiple overlapping visits)
This will be a much more sophisticated PSF modeling algorithm than what came before, mostly focused on requirements for weak lensing. Will likely use some sort of physical models and wavefront sensor data.

Final Single Frame Measurement

Largest data unit: CCD
Need to have best PSFs here so we can run shear estimation codes for systematics tests and reject exposures with unconstrained PSFs/astrometry from multi-fit. Will not use shear estimation here for science, and we may not use the same algorithm we use later.
Not clear what the other algorithmic requirements are here; we suspect that these measurements will be superseded by forced photometry or multi-fit for most science.

A few closing thoughts:

We forgot about aperture correction. Probably goes with initial PSF determination, and it’s closely related to modeling the wings of stars (they at least need to be consistent) but how to coadd that information is tricky.
We’re assuming here that the final PSF estimation is only needed by multi-fit, and hence it’s okay to make all coadds using the initial PSF. If we do need the final PSF when we build the coadds, but we can create a good PSF star catalog without needing the coadds, then that’s easy - we just build coadds after we do the final PSF determination (which would still have to happen after relative calibration). If we do need the coadds to reject galaxies/binaries/etc. from the PSF star catalog and we can’t use the initial PSF to build the coadds, things get tricky; we’d either have to add another (final) round of coaddition or another (intermediate) round of PSF determination.
Given that the final PSF estimation and single-frame measurement stages happen completely separately from the image characterization stuff that happens earlier, and they may need to happen after we build coadds, it may make sense to try to accomplish these when we’re doing multi-fit, since that’s the next time we’d naturally be revisiting the individual visit images. The challenge would be that the PSF estimation at least will definitely need to operate on full visits (though maybe not with access to all of the pixels).

gpdf · November 11, 2015, 8:09pm

In which step does the initial determination of a WCS take place?

ktl · November 11, 2015, 8:13pm

Isn’t that step 11 (Match to External Catalog), “includes initial astrometric solution”?

gpdf · November 11, 2015, 8:18pm

I saw that, but it wasn’t clear to me whether the “includes” meant “performs” or “takes into account the previously determined”.

gpdf · November 11, 2015, 8:22pm

Can you clarify what processing of the WFS data is required in order to enable this?

And am I understanding correctly that this makes this WFS data processing a dependency of image differencing in Level 1 (Alert Production)?

price · November 11, 2015, 9:40pm

I don’t think we need WFS to guess the PSF size. We can simply do a detection pass without any PSF smoothing. It doesn’t go as deep, but we can bootstrap that.

nidever · November 11, 2015, 9:42pm

Super useful!

Have you talked to the relevant T&S people to make sure that we can get that information from them? We should also tell them that we are interested in this information. I don’t think it’s a trivial matter. We should start inter-subsystem communications about what we need. I’ve started on this from the QA side.

Will this sketch work in crowded fields? How will you select PSF stars, define a PSF, get the background and derive aperture corrections in a crowded field? Also, is simultaneous-fitting of objects involved at single-frame processing?

I think I missed something. Why would you not have a single-frame “final PSF” at the time you are creating coadds?

swinbank · November 11, 2015, 9:57pm

Already discussed with Sandrine Thomas. We’ll be following up with T&S at the Joint Technical Meeting.

nidever · November 11, 2015, 10:13pm

Great.

One of the common issues that I see is that there is a lot of information captured on the mountain by the Engineering Facility Database (EFD), but in the current plan it is only synced to NCSA once a day. That’s too slow to make use of that data in Level 1. The two solutions I see are that we either (1) require the sync to be more frequent or (2) ship certain EFD metadata with the data (i.e. as FITS headers, or however we are shipping snap/visit-level metadata to NCSA). I think #2 is probably easier. I would be interested in hearing what solutions people come up with.

timj · November 11, 2015, 10:15pm

Metadata will be shipped with the data (otherwise it can’t be reduced). I think the important issue is to ensure that all the data that are needed are included in that transfer. L1 is on a tight schedule and can’t be hanging around waiting for an asynchronous syncing of data.

L1 also has to send data back to the telescope.

gpdf · November 11, 2015, 10:57pm

We already have this in the architecture, as we are directly acquiring the WFS data from the Camera for archiving along with the science sensor data. The acquisition of the WFS data is therefore not a T&S interface for us. In fact, the T&S team will get the WFS data from us when they want to do historical analyses. (They have a separate live path for WFS data for the near-real-time active optics correction pipeline.)

nidever · November 11, 2015, 11:13pm

So does the WFS code on the mountain need to get the focus/guiding pixel data from us? But the output of the code goes through the DDS system and is captured by the Facilities Engineering Database, correct?

gpdf · November 11, 2015, 11:26pm

This is also already in the existing architecture.

The Alert Production / Level 1 pipeline has to have a “whitelist” of the telemetry / EFD data that it needs, and then the data ingest part of the Level 1 system - running in La Serena - will subscribe to the appropriate telemetry channels and/or retrieve data from the EFD (when not available as telemetry) and “attach” this data to the image data as it’s sent to NCSA for analysis. So that’s your Model 2.
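As a rough illustration of the whitelist-and-attach idea, the sketch below copies whitelisted telemetry values into an image's metadata and reports anything that was not available. Every name, key, and value here is hypothetical; the actual ingest interfaces are not described in this thread.

```python
def attach_telemetry(image_metadata, telemetry, whitelist):
    """Copy whitelisted telemetry values into a visit's metadata.

    Returns the augmented metadata and the whitelisted keys that were
    not available, so the caller can decide whether to fall back to an
    EFD query or proceed without them.
    """
    attached = dict(image_metadata)
    missing = []
    for key in whitelist:
        if key in telemetry:
            attached[key] = telemetry[key]
        else:
            missing.append(key)
    return attached, missing


# All keys and values below are hypothetical.
whitelist = ["dome_temperature", "wfs_psf_fwhm"]
telemetry = {"dome_temperature": 11.5, "wind_speed": 4.2}
metadata, missing = attach_telemetry({"visit": 1234}, telemetry, whitelist)
```

Telemetry that is not on the whitelist (here `wind_speed`) is simply not shipped, which keeps the payload small and the Level 1 dependencies explicit.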

P.S. Just out of curiosity, where are you picking up the “in the current plan [the EFD] is only synced to NCSA once a day”? I believe we don’t have a requirement to sync it with shorter latency, but I wasn’t aware (or missed) that the engineering choice to actually do it once a day had been made.

jbosch · November 12, 2015, 2:17am

The final WCS isn’t actually determined until Relative Calibration. I did intend for there to be a WCS determination step in Match to External Catalog (11), which would be the last time we determine a WCS in single-frame processing, but I don’t think that should be used for anything other than feeding the Relative Calibration algorithm.

jbosch · November 12, 2015, 2:45am

I’m afraid I haven’t thought enough about how to process the WFS data to really say what we’d need to do here.

I don’t think this makes processing the WFS data a Level 1 dependency; as @price says…

…we may not need the WFS at all to solve this problem. It’s really not a hard problem to solve.

But there are harder problems in both Level 2 and maybe Level 1 (initial PSF estimation and maybe difference kernel estimation) where WFS information could also be useful, and that suggests that processing the WFS data before or jointly with the science CCDs is a good idea, if it’s not too hard to get access to them at that stage (ditto for other auxiliary information from the telescope).

jbosch · November 12, 2015, 3:10am

If we can get through the initial PSF modeling stage (5) with something halfway decent, I think this general approach will do okay in semi-crowded fields, where most objects may be blended but we’re still able to find a handful of semi-isolated stars that we can use to constrain the PSF model.

For really crowded fields, I think we’ll essentially want to do an iterative loop over steps 3-5 (Detect+Background, Measurement, and Initial PSF Estimation), allowing us to subtract previously detected objects during the next background measurement step, and with a bit of sophistication in detection, measurement, and PSF modeling to account for previously-detected objects and blending. That will have to include some simultaneous fitting.
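A toy version of that loop, reduced to its skeleton: estimate the background, detect, exclude what was detected, and re-estimate until the detections stop changing. Masking detected pixels stands in for subtracting fitted objects, and the measurement and PSF modeling steps are left out entirely, so this is only a sketch of the iteration, with made-up names and thresholds.

```python
from statistics import median


def iterate_background(pixels, sigma, n_sigma=3.0, max_iter=10):
    """Alternate background estimation and detection until the mask is stable."""
    masked = [False] * len(pixels)
    background = median(pixels)
    for _ in range(max_iter):
        new_masked = [p - background > n_sigma * sigma for p in pixels]
        if new_masked == masked:
            break
        masked = new_masked
        # Re-estimate the background from pixels not claimed by a detection.
        background = median(p for p, m in zip(pixels, masked) if not m)
    return background, masked


# A crowded 1-D "image": true sky level is about 10, but most pixels hold stars.
pixels = [10.0, 50.0, 60.0, 11.0, 55.0, 9.0, 70.0, 65.0, 10.0, 58.0]
background, masked = iterate_background(pixels, sigma=2.0)
```

The first background estimate here is 52.5, badly biased by the stars. It drops to 11.0 after one pass and settles at 10.0 after two, at which point all six star pixels are detected.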

As for the final single-frame measurement stage, I’m not prepared to make any strong statements about how we’ll do blended measurement (whether that’s via simultaneous fitting or measurement on deblended pixels). Given that we don’t have to worry about motion or variability (since it’s single epoch), and we’re not going to really care about single-frame measurements for galaxy science, I think the gap between the two approaches is really quite small.

I’d like to, certainly. And I think the PSFs we have at this stage will have to be pretty good - at least as good as the best PSF modeling anyone is doing today.

The “final PSF” is really quite focused on weak lensing, and the reason we can’t be certain we can do it earlier is to ensure we’re using all the information we can to ensure a clean list of stars to use as inputs to the PSF modeling. If some coadd processing (or even an early multi-fit stage) lets us do a better job of removing galaxies and binaries from the PSF star catalog (or, conversely, it lets us safely include fainter stars with uncertain single-epoch classifications), that may prove important to the quality of the final PSF model.

For what it’s worth, I think it’s likely we’ll be able to get all the information we need in that department at the Relative Calibration stage, where we’ll have access to all of the single-epoch classifications of each star together. And that would be very nice, because it means all subsequent processing could use the same PSF model and hence there’d be no bookkeeping nastiness concerning which of several PSF models we’d used for a particular measurement. But at the meeting we weren’t ready to commit to the joint single-epoch classifications being sufficient.

nidever · November 12, 2015, 5:59am

Thanks. That sounds good.

nidever · November 12, 2015, 6:01am

I think both Dave Mills and Chuck Claver said this and I think I read it in a design document, but don’t remember which one at the moment. It wasn’t set in stone that it was going to be one day, but more along the lines of “the current thinking is that it’ll be synced every 24 hours”.

CStubbs · December 21, 2015, 5:11pm

Or else use the PSF from the guide sensors. It should be available right away, as soon as the image is read out.