Our survey takes three 2-minute back-to-back exposures at the same pointing before slewing to a different location. At present, the first time the stack coadds these three exposures is during the coaddDriver.py stage. It would save a lot of resources (disk space and processor time) if the three exposures could instead be median-combined post-ISR, with the combined images (and the resulting source datasets) used as the calexp images and datasets for later stages.
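Here is a minimal sketch of the proposed median combine. It assumes the three post-ISR images are already aligned and are available as plain NumPy arrays. These arrays stand in for the stack's exposure objects, and no mask or variance planes are propagated.

```python
import numpy as np


def median_combine(images):
    """Pixel-wise median of equally shaped, already aligned 2-D images."""
    # Stack along a new leading axis: shape (n_exposures, ny, nx).
    stack = np.stack([np.asarray(im, dtype=float) for im in images], axis=0)
    return np.median(stack, axis=0)


# Toy data: three noisy sky frames, with a "cosmic ray" in only one of them.
rng = np.random.default_rng(42)
snaps = [100.0 + rng.normal(0.0, 1.0, size=(16, 16)) for _ in range(3)]
snaps[1][5, 7] += 5000.0

combined = median_combine(snaps)
print(combined[5, 7])  # near 100: the single-exposure spike is rejected
```

With three inputs, any artefact present in only one exposure is rejected. That is what makes the median attractive for cosmic-ray removal.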
Does anyone know if it would be possible to do this? If so, how could I go about implementing this? I’m particularly thinking about how I would ensure the new calexp files were persisted properly for later steps (since there would no longer be a one-to-one relation between raw images and calexp images).
I’m happy to retarget subtasks to my own implementations, should it be necessary.
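On the bookkeeping side, the core of the problem is a many-to-one grouping of raw data IDs. The sketch below assumes each raw data ID is a dict with hypothetical "visit", "ccd" and "snap" keys. The real keys depend on the obs_ package.

```python
from collections import defaultdict


def group_snaps(raw_data_ids):
    """Group raw data IDs so that each (visit, ccd) maps to all of its exposures."""
    groups = defaultdict(list)
    for data_id in raw_data_ids:
        groups[(data_id["visit"], data_id["ccd"])].append(data_id)
    # Sort each group by snap number so the combine order is deterministic.
    return {key: sorted(ids, key=lambda d: d["snap"]) for key, ids in groups.items()}


raw_ids = [
    {"visit": v, "ccd": c, "snap": s}
    for v in (666, 667)
    for c in (1, 2)
    for s in (3, 1, 2)
]
groups = group_snaps(raw_ids)
print(len(groups))  # 4 output (visit, ccd) combinations from 12 raw inputs
```

Each key of the result would then be the data ID of one combined output, which is where the one-to-one raw/calexp relation is replaced by a many-to-one one.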
Do you swear that the pointing and PSF are identical between the different exposures? If the former is violated, I think you’ll need to run the warp code that’s behind coaddDriver. If the latter is violated, you shouldn’t use a median, but you can of course still do the addition.
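A toy 1-D example shows why a median is unsafe when the PSF varies. The numbers are invented, not survey data. Each of the three star profiles has the same total flux of 10 but a different width. Summing preserves the total flux, but the pixel-wise median does not.

```python
import numpy as np

# Three 1-D "star" profiles, each with total flux 10 but a different PSF width.
narrow = np.array([0, 0, 10, 0, 0], dtype=float)
medium = np.array([0, 3, 4, 3, 0], dtype=float)
wide = np.array([2, 2, 2, 2, 2], dtype=float)
stack = np.stack([narrow, medium, wide])

summed = stack.sum(axis=0)         # flux-conserving combination
median = np.median(stack, axis=0)  # per-pixel median

print(summed.sum())  # 30.0, i.e. 3 x 10
print(median.sum())  # 8.0, not 10: the median distorts the profile and loses flux
```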
To answer your question, the snap code is supposed to do this, but I don’t know its current state (or whether it hardcodes the number of snaps and their filenames). Otherwise, it’s probably most easily handled as another step post-ISR and pre-processCcd; if that’s not currently doable, then I think we would be happy to add the minimal support. The task itself would be simple; the most complex part would be deciding how you want to identify the input and output datasets.
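A step "post-ISR and pre-processCcd" might be structured roughly as below. This is a hypothetical sketch, not the stack's snap code or the IsrTask API. The callables read_raw, run_isr and write_output are placeholders for the butler and ISR, and the images are assumed to be already aligned. Given the PSF caveat above, it sums the images rather than taking a median.

```python
from typing import Callable, Dict, List

import numpy as np


def combine_snaps_step(
    snap_ids: List[Dict[str, int]],
    read_raw: Callable[[Dict[str, int]], np.ndarray],
    run_isr: Callable[[np.ndarray], np.ndarray],
    write_output: Callable[[Dict[str, int], np.ndarray], None],
) -> Dict[str, int]:
    """Hypothetical step: ISR each snap, sum them, write one output image."""
    # The output data ID is the input ID minus its per-snap key.
    out_ids = [{k: v for k, v in d.items() if k != "snap"} for d in snap_ids]
    if any(o != out_ids[0] for o in out_ids):
        raise ValueError("all snaps must share the same output data ID")

    corrected = [run_isr(read_raw(d)) for d in snap_ids]
    combined = np.sum(corrected, axis=0)  # flux-conserving sum of aligned images
    write_output(out_ids[0], combined)
    return out_ids[0]


# Usage with toy in-memory "raw" data and a toy "ISR" that just removes a bias.
raw = {s: np.full((4, 4), float(s)) for s in (1, 2, 3)}
written = {}
out_id = combine_snaps_step(
    [{"visit": 666, "ccd": 5, "snap": s} for s in (1, 2, 3)],
    read_raw=lambda d: raw[d["snap"]],
    run_isr=lambda image: image - 0.5,
    write_output=lambda d, image: written.update({(d["visit"], d["ccd"]): image}),
)
print(out_id)  # {'visit': 666, 'ccd': 5}
```

Deriving the output ID by dropping the snap key is one way to "identify the input and output datasets"; the right choice depends on the data ID keys your mapper defines.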
Thanks for the response and the offer of help.
On inspection of our raw images, there is some variation in the PSF and pointing between exposures which would need to be corrected for. I expect the drift could be solved with a zeroth-order x-y translation, and I’d be happy to add rather than median combine (we were hoping to do the latter for cosmic ray removal, but I agree that variation in the PSF puts a spanner in the works for that).
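The thread doesn't say how the drift would be measured. One common technique, used here purely as an illustration, is to take the peak of the FFT cross-correlation between a reference exposure and each of the others. This sketch assumes plain NumPy arrays, treats the image as periodic, and only recovers whole-pixel offsets.

```python
import numpy as np


def estimate_offset(ref, img):
    """Return the integer (dy, dx) such that np.roll(ref, (dy, dx), axis=(0, 1)) ~ img."""
    # Circular cross-correlation c[k] = sum_n ref[n] * img[n + k], computed via FFTs.
    xc = np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(ref))).real
    peak = np.unravel_index(np.argmax(xc), xc.shape)
    # Peaks past the halfway point correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, xc.shape))


rng = np.random.default_rng(1)
ref = rng.normal(size=(32, 32))
img = np.roll(ref, (3, -2), axis=(0, 1))  # simulate a drift of 3 rows, -2 columns
print(estimate_offset(ref, img))  # (3, -2)
```

Shifting each image by minus the measured offset aligns it with the reference. Real data would also need a sub-pixel refinement, which is where a warping kernel comes in.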
Despite all that, I still think there may be benefits to coadding post-ISR and pre-calexp. In particular, it would mean we’d only run image differencing and forced photometry on “meaningful” images (e.g., one probably doesn’t learn much from two difference images separated by 2 minutes). However, perhaps there’s a way we could achieve that after generating calexps from each raw exposure.
As you say, the translation/addition is reasonably straightforward (at least conceptually); it’s getting the bookkeeping right that I feel will be the more difficult aspect.
Even with just an x/y offset you need a full warping kernel (the shift won’t generally be a whole number of pixels), but the kernel is constant across the image, so that should be quicker, as you imply.
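A sub-pixel shift is a resampling, but for a pure translation the interpolation weights are the same at every pixel. In that sense the kernel is constant. The sketch below uses bilinear interpolation for brevity, whereas a warper would typically use a higher-order kernel such as Lanczos. It also treats the image as periodic at the edges.

```python
import numpy as np


def shift_bilinear(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels using bilinear interpolation.

    Positive shifts move content towards higher indices (same convention as
    np.roll). Edges wrap around, which real data would need to handle properly.
    """
    out = np.asarray(image, dtype=float)
    for axis, shift in ((0, dy), (1, dx)):
        whole = int(np.floor(shift))
        frac = shift - whole
        # The same two weights (1 - frac, frac) apply at every pixel:
        # this is the constant kernel.
        out = (1.0 - frac) * np.roll(out, whole, axis=axis) + frac * np.roll(
            out, whole + 1, axis=axis
        )
    return out


img = np.zeros((8, 8))
img[2, 3] = 1.0
shifted = shift_bilinear(img, 0.5, 0.0)
print(shifted[2, 3], shifted[3, 3])  # 0.5 0.5
```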
The way the current LSST code handles 15s “snaps” is to give them different names, so the (two) snaps comprising visit 666 would be e.g. raw-666-1.fits, and then to write the output without the extra suffix (postISRCCD-666.fits); that’s doable via a simple addition to the mapper yaml file. If you can do this then the bookkeeping is pretty trivial, and all you need to do is write a simple wrapper to read the sets of “snaps”, call IsrTask and write out the
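The naming trick can be illustrated in plain Python with %-style templates keyed on the data ID. The template strings below are hypothetical and are not copied from any obs_ package's mapper yaml. Numbering the second snap "2" is also an assumption. The output template simply omits the snap key, so every snap of a visit maps to the same output file.

```python
RAW_TEMPLATE = "raw-%(visit)d-%(snap)d.fits"  # hypothetical: one file per snap
POSTISR_TEMPLATE = "postISRCCD-%(visit)d.fits"  # hypothetical: one file per visit


def raw_filename(data_id):
    return RAW_TEMPLATE % data_id


def output_filename(data_id):
    # %-formatting with a dict ignores unused keys, so "snap" is simply dropped.
    return POSTISR_TEMPLATE % data_id


snap_ids = [{"visit": 666, "snap": s} for s in (1, 2)]
print([raw_filename(d) for d in snap_ids])  # ['raw-666-1.fits', 'raw-666-2.fits']
print({output_filename(d) for d in snap_ids})  # {'postISRCCD-666.fits'}
```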
That’s a very simple, yet elegant, solution (as the best ones tend to be)! I’ve got a good idea of how to proceed now.
Thanks very much for your help. Once I’ve coded it up, I’ll add a link to my solution (in my obs_ package) to the end of this thread in case it’s useful for others in the future.
As promised, here’s a link to my solution, in case it’s useful for anyone else who comes across this thread in the future.
In the end, I wrote my own driver script, as I wanted it to be parallelised. It parallelises across visits (typically 3 back-to-back exposures, but it’s agnostic to the number of exposures) and CCDs. My solution sends a single list of image references to the run function, which I think means that if one image fails, the whole thing stops (even with --noError set). To avoid this, I’ve had to wrap the various functions it calls in try…except blocks. It would be nice to know if there’s a better solution.
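For failure isolation, the usual pattern is to catch exceptions inside the function that is mapped over jobs. Each (visit, ccd) then reports its own success or failure instead of one exception aborting the whole map. The sketch below uses the standard library's ThreadPoolExecutor as a stand-in and does not reproduce ctrl_pool's API. The function fake_process is a hypothetical placeholder for the real per-CCD work.

```python
from concurrent.futures import ThreadPoolExecutor


def safe_process(job, process):
    """Run one job, converting any exception into a recorded failure."""
    try:
        return job, process(job), None
    except Exception as exc:  # broad on purpose: one bad CCD must not stop the rest
        return job, None, repr(exc)


def run_all(jobs, process, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda job: safe_process(job, process), jobs))
    succeeded = {job: value for job, value, err in results if err is None}
    failed = {job: err for job, value, err in results if err is not None}
    return succeeded, failed


def fake_process(job):
    visit, ccd = job
    if ccd == 3:
        raise RuntimeError("bad ccd")
    return visit * 100 + ccd


ok, failed = run_all([(666, ccd) for ccd in range(1, 5)], fake_process)
print(sorted(ok))  # [(666, 1), (666, 2), (666, 4)]
print(failed)  # {(666, 3): "RuntimeError('bad ccd')"}
```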
I tried to get it to process each exposure of a visit in parallel, but this involved having an “exposure” pool contained within a “visit” pool, which it didn’t seem to like (it hung indefinitely). It’s not a major concern, though, as there’ll always be sufficient visit/ccd combinations to fill up our cores.
Thanks again for your help. I learned a whole lot of useful stuff in the process, and have overcome my fear of writing my own tasks.
Wow, well done!
Yeah, ctrl_pool doesn’t support sub-pools, sorry. You have to do each parallelisation scheme separately (e.g., for each exposure: process the exposure and write it to disk; for each visit: read the exposures, coadd them and write the result).
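The two-pass scheme can be sketched with two flat pools run one after the other. Again, ThreadPoolExecutor stands in for ctrl_pool. An in-memory dict plays the role of the files written to disk between stages. The functions process_exposure and coadd are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def two_stage(exposures_by_visit, process_exposure, coadd, max_workers=4):
    store = {}  # stand-in for per-exposure outputs written to disk

    def stage1(exposure):
        # Each exposure is processed independently and "written out".
        store[exposure] = process_exposure(exposure)

    def stage2(visit):
        # Each visit "reads back" its exposures and combines them.
        return visit, coadd([store[e] for e in exposures_by_visit[visit]])

    all_exposures = [e for exps in exposures_by_visit.values() for e in exps]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(stage1, all_exposures))  # list() forces completion and re-raises errors
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(stage2, exposures_by_visit))


result = two_stage({666: [1, 2, 3], 667: [4, 5, 6]}, lambda e: e * 10, sum)
print(result)  # {666: 60, 667: 150}
```

Neither pool is created inside a task of the other, which is the property that sidesteps the sub-pool limitation.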
Haha - thanks!! There was a lot of banging my head against a good few “dataRef has no attribute get”s, but I got there in the end. I’m currently working on adding forced photometry as a final step.
That’s fine, I did suspect that this would be a big ask. Besides, it’s useful to know I can’t do any better in this respect; I won’t look into it further, then.