Should we have an extendedness like parameter in the Sources table?

I am checking LPM-17 against LSE-163 and trying to write the actual queries to select the samples needed to compute individual visit requirements; it has been a very interesting exercise so far. One of the issues I found is that to compute some requirements, e.g. image quality, we have to select point sources, but the Sources table (Level 2) does not have an ‘extendedness’ parameter, although it is present in the Objects and DIASource tables. The baseline database schema is consistent with that. Is that on purpose? Should I use the Objects table instead? At one point LSE-163 says “Objects (detected on deep coadds as well as individual visits)”; is that expected?

I expect that we will measure many quantities on individual frames that will be used for QA but not published to the database for science users, and the DPDD is really only designed to capture the latter. Our plan is to only report galaxy model fits for Objects (on coadds and in multi-epoch fitting), because the extended light from galaxies doesn’t change from epoch to epoch, and hence any galaxy science should be based on those deep measurements. But those models are likely to be a critical ingredient of any star/galaxy classifier.

Coming up with the list of measurements that need to be made for QA purposes is one of the big planning efforts that hasn’t yet been done; I think it’s probably something the Science Pipelines Working Group needs to get done this cycle. We could certainly consider running the galaxy fitting code (and hence extendedness) on single visits for QA reasons, though the classification there still won’t be as good as it will be for Objects - so if you want the extendedness to enable other tests on the exposure quality, rather than as a direct test, you may get better results by matching to an Object table from a previous data release.

FWIW, I think we will almost certainly be running some sort of measurement on the single epoch frames in Level 1. This comes from my expectation that we need to do astrometric fitting and (most likely) photometric calibration. Right now, those sources aren’t persisted at all. We could keep those measurements around for other purposes, but it would be an increase in scope over what we are planning now.

@jbosch right, if extendedness depends on galaxy model fitting and this is not planned to be done on individual Sources, we should consider a simpler star/galaxy classifier for QA (high-S/N sources). Perhaps the Source size, from the adaptive moments, size = sqrt(mSum)? I see that adaptive moments are measured for Sources and stored in the database.
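As a minimal sketch of the size estimate suggested above, assuming mSum means the sum of the second adaptive moments Ixx + Iyy (the moment values below are made up for illustration):

```python
import numpy as np

# Toy adaptive second moments, in pixels^2 (hypothetical values).
ixx = np.array([2.0, 6.0])  # along x
iyy = np.array([2.1, 5.5])  # along y

# size = sqrt(mSum): a crude per-source size from the moment sum.
size = np.sqrt(ixx + iyy)
```

For high-S/N sources this should track the PSF size for stars and exceed it for extended sources, but it is not deconvolved, so a cut would have to be made relative to the PSF size on that visit.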

I think the plan (LDM-135) is to have QA tables in the Level 2 database (@ktl recently confirmed that), and I am trying to write queries to select the samples needed to compute single visit requirements (they are defined in words in LPM-17 and could be implemented as views in the production database).

Any concentration index measured for Sources?

If you want a secure sample of moderately bright stars in order to analyze the PSF, there are a few flags you can use:

  • calib_psfUsed: set for stars that were actually used when building the PSF model.
  • calib_psfCandidate: set for stars that were selected for use in building the PSF model, but may not have been due to outlier rejection or reservation.
  • calib_psfReserved: set for stars that were selected for use in building the PSF model, but were intentionally not used to build it, allowing us to use these stars to test the PSF model for overfitting/underfitting.
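As a sketch of how one might select these samples from a Source catalog (the rows and flag values below are made up; in practice they would come from a database query or an in-memory catalog carrying the calib_* flag columns named above):

```python
# Hypothetical Source records with the calib_* flags described above.
sources = [
    {"id": 1, "calib_psfUsed": True,  "calib_psfCandidate": True, "calib_psfReserved": False},
    {"id": 2, "calib_psfUsed": False, "calib_psfCandidate": True, "calib_psfReserved": False},
    {"id": 3, "calib_psfUsed": False, "calib_psfCandidate": True, "calib_psfReserved": True},
    {"id": 4, "calib_psfUsed": True,  "calib_psfCandidate": True, "calib_psfReserved": False},
]

# Stars actually used to build the PSF model.
psf_stars = [s for s in sources if s["calib_psfUsed"]]

# Reserved stars: selected but deliberately withheld, so they can be
# used to test the PSF model for over/underfitting.
reserved_stars = [s for s in sources if s["calib_psfReserved"]]
```

The same selections translate directly into WHERE clauses on the corresponding flag columns if the sample is drawn from the Level 2 database instead.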

If you want a more complete sample of stars or a more pure sample of galaxies, I would recommend computing a “shape trace difference” from the SdssShape moments:

trDiff = ((base_SdssShape_xx + base_SdssShape_yy) - (base_SdssShape_psf_xx + base_SdssShape_psf_yy)) / 2

That should be something like the square of the PSF-deconvolved radius; it should be close to zero for point sources and positive for extended sources (significantly negative probably indicates something like a missed cosmic ray; slightly negative is just noise on a point source). To really make sense of it, you’d want to plot it against magnitude; you should see the stars quite clearly at the bright end (and hence what cuts you should make), and if the image is deep enough you’ll see where our ability to distinguish stars from galaxies basically fails.
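A rough sketch of that computation (the moment values below are made up, and the 0.5 pixel² threshold is illustrative only; as noted above, a real cut should be chosen from the trDiff-vs-magnitude diagram):

```python
import numpy as np

def shape_trace_diff(xx, yy, psf_xx, psf_yy):
    """Half the difference between the source and PSF moment traces.

    Roughly the square of the PSF-deconvolved radius: close to zero
    for point sources, positive for extended sources.
    """
    return ((xx + yy) - (psf_xx + psf_yy)) / 2.0

# Toy moments in pixels^2: first source PSF-like, second extended.
xx = np.array([2.0, 6.0])
yy = np.array([2.1, 5.5])
psf_xx = np.array([2.0, 2.0])
psf_yy = np.array([2.0, 2.0])

tr_diff = shape_trace_diff(xx, yy, psf_xx, psf_yy)
# Simple illustrative cut; significantly negative values would suggest
# artifacts such as a missed cosmic ray rather than a point source.
is_point_source = np.abs(tr_diff) < 0.5
```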

What do you have in mind for concentration?

Hi @jbosch, thanks for the suggestions, that helped a lot. I will try the shape trace difference and will also use pixel flags to reject bad sources. For concentration, since we don’t fit galaxy models at this point, I had in mind something more robust, like the ratio of the 3/4-light radius to the 1/4-light radius.

We don’t attempt to measure those radii directly (I think that’s actually very difficult to do without assuming a profile, as you never know how far to go out in radius), but we do measure a set of aperture fluxes at a sequence of fixed radii, and I think you could use those to compute some sort of concentration if you include some other measure of the size.
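One crude way to sketch this from aperture fluxes at fixed radii, assuming the largest aperture captures essentially all of the flux (which sidesteps, rather than solves, the profile problem mentioned above; the radii and curves of growth below are made up):

```python
import numpy as np

def light_radius_ratio(radii, fluxes, lo=0.25, hi=0.75):
    """Ratio of the hi-light to lo-light radius, interpolated from a
    curve of growth built from aperture fluxes at fixed radii.

    Assumes the last aperture contains ~all the flux; no
    profile-dependent aperture correction is applied in this sketch.
    """
    frac = np.asarray(fluxes, dtype=float) / fluxes[-1]
    r_lo = np.interp(lo, frac, radii)
    r_hi = np.interp(hi, frac, radii)
    return r_hi / r_lo

# Toy curves of growth (hypothetical aperture radii in pixels).
radii = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 12.0])
star = np.array([0.50, 0.85, 0.95, 0.98, 0.99, 1.00, 1.00])
galaxy = np.array([0.20, 0.40, 0.55, 0.65, 0.80, 0.92, 1.00])

# A concentrated (star-like) profile gives a smaller ratio than a
# diffuse one.
c_star = light_radius_ratio(radii, star)
c_gal = light_radius_ratio(radii, galaxy)
```

The sensitivity to the outermost aperture is exactly the "how far do you go out in radius" problem above, so this would only be meaningful as a relative statistic within a visit, not as an absolute concentration.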

@jbosch I will try different approaches for S/G classification in single visit processing. @ktl, @mjuric, correct me if I am wrong, but my impression is that the Sources table should also be designed to enable the calculation of the single visit requirements as specified in LPM-17. I imagine a process, like in DES, where visits that did not pass the requirements are flagged and excluded from co-addition; is that described somewhere?

The verification plans for the requirements in the SRD (and LSR and OSS and DMSR) are not yet finalized. While I think that many will rely on simulated data, I would assume that the same computations would continue to operate on real data when available, and that any results or key intermediates from those computations would be persisted.

I think we are hopeful that useful information can be extracted from (nearly) all visits when generating co-adds. But if some filtering is necessary (and filtering has been done in the past on SDSS images, for example), it would be performed by the co-addition task and described in its design.

Angelo, thanks for raising these issues, and thanks to everyone else for comments
and clarifications. We need more work like Angelo’s to uncover shortcomings of our
design before it is too late! Here are a couple of quick comments:

QA and immediate single-visit processing:
Well before doing any co-addition, essentially immediately after an image is acquired,
we must assess its quality (ranging from crude questions such as “did the shutter
actually open?” and “did we read all the pixels?” to more subtle effects such as “is the PSF
slightly elliptical?”, “is the gray opacity too patchy?”, or “is the background unexpectedly
bright?”). Such rapid quality-assessment feedback from DM is expected by the
Scheduler and will play a role in scheduling subsequent visits.

This information should be in the DPDD, but we have not done a serious QA design job
yet. We’ll know better after we focus on the design of QA procedures and pipelines in
due time (after we converge on Level 1/2 and calibration, in a few weeks, hopefully).

S/G separation for Sources in single-visit images:
We will need a way to separate extended from non-extended Sources in single-visit
images, if for no other reason than for QA purposes. Galaxy fits (bulge/disk, presumably)
for DIASources sound like overkill (for Sources, it’s not so obvious). Getting
extendedness for a Source by matching it to an Object from the last DR sounds OK. Another
good option might be to use adaptive moments as a size estimate, which should work
fine for high-SNR sources. Again, we’ll know better after we go through QA pipeline
design later this year.

Is this really for the DPDD? My reading of the DPDD is that it should define all the data products required for science. This will be a subset of all the data we keep around. QA data will also need to be produced, but it seems like that might be a totally separate, maybe less tightly change-controlled, document.

If it is a separate document, I think we should start with a skeleton of it as soon as we can. What actually is persisted in Level 1 is of interest to me.

It is not clear to me yet why we would be computing quantities for QA but not publish
them to users; I have never seen a document, nor do I recall a discussion, that explains
this. That goes especially for quantities based on pixel data (the engineering active-optics
parameters, for example, are a different story). How would one handle such
quantities if they are not well documented and available in a db, with tools to
interpret them? And once we have all that, why not make it available to users? Who
can predict at this time what would be useful and what not? I’d argue that the default
should be “if we compute it, we keep it and serve it”, unless there are good reasons
why not, decided on a per-case basis.

These ambiguities aside, I see good/useful QA quantities as being as important for science as
quantities computed for individual objects (e.g., various per-CCD or per-FOV quantities
might be crucial for robust science, such as the scatter of implied zero points per
calibration star, which might indicate a bad calibration patch). For this reason, I think
we should have a high-level summary of the SDQA pipeline(s) in the DPDD. Once we converge
on the Level 1/Level 2 pipelines, we should start by asking the question “how do we know
their outputs are as expected?” and proceed with the SDQA design, first in the DPDD and
then in LDM-151. Of course, there will be nothing secret about that process: everything
will be recorded on Confluence, in the space that we already have for scipi-wg.

See also "Quality" information - what, where, how