I’m using v19_0_0 of the pipelines and was hoping to limit the output file size of the “src” tables in cases with high source density using:
processCcd.py DATA --calib DATA/CALIB --rerun processCcdOutputs --id visit=725289 ccdnum=49 --longlog --config isr.doFringe=False --config calibrate.astrometry.matcher.maxRefObjects=3000 --config calibrate.doWriteHeavyFootprintsInSources=False &> processCcd.log &
However, the resulting src-0725289_49.fits file ends up exactly the same size (2+ GB) with and without doWriteHeavyFootprintsInSources=False (holding everything else fixed). Checking DATA/rerun/processCcdOutputs/config/processCcd.py shows that this config setting does seem to be properly recorded:
grep doWriteHeavy DATA/rerun/processCcdOutputs/config/processCcd.py
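For context, the persisted config file is itself Python, so a grep hit for that setting corresponds to a plain assignment like the following (illustrative fragment; only the field path comes from the thread, the surrounding contents of the file are assumed):

```python
# Illustrative excerpt of DATA/rerun/processCcdOutputs/config/processCcd.py.
# Persisted pex_config files record overrides as plain Python assignments
# against a pre-defined `config` object:
config.calibrate.doWriteHeavyFootprintsInSources = False
```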
Thanks very much.
Looks like this feature was removed in “Fix pep8 warnings” (lsst/pipe_tasks@1ff2a9c) without comment or notice.
Actually, that’s unfair. That commit was merely removing a useless setting of a variable that was already unused. The real removal was “Remove ‘flags=sourceWriteFlags’ from dataRef.put in CalibrateTask.” (lsst/pipe_tasks@790c077).
That September 2016 commit gave the rationale “The Butler ‘put’ does not support a ‘flags’ option to pass down to the underlying catalogs.”
Is this a question that should be revisited with the Gen3 Butler?
There are ongoing discussions about the best way to handle this in Gen3. There is also DM-26761 (“Add flags parameter for reading afw tables in gen3”) for dealing with this on read (not write).
Gen3 does not allow people to specify parameters on put. It’s up to the user to strip it ahead of time if they don’t think it will be needed. See also the discussion in DM-6927.
Thanks all for the very helpful responses!
As a mere end user, I would indeed like the ability to specify this type of flag and have its intent be obeyed/propagated, but I can’t claim to know about all the other implications it might bring along.
@timj could you elaborate on “strip it ahead of time” for Gen3 – does that mean I would be manipulating a Butler object so that it discards the things I don’t wish to write out before issuing the write command? Thanks again!
It could mean that you run the code to strip out the heavy footprints before writing the table, but ordinarily that would happen either in the Task.run() method (controlled by configuration) or in the PipelineTask.runQuantum() infrastructure that interacts with the butler. Since you say you are a user and not a Task author, there’s little you can do at the present time.
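To make “strip it ahead of time” concrete, here is a plain-Python sketch of the pattern (no LSST stack involved; `Record`, `heavy_footprint`, and `strip_heavy` are hypothetical stand-ins for the real afw.table types, not the actual API):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Record:
    """Stand-in for a source record; heavy_footprint mimics the large
    per-pixel payload a HeavyFootprint carries."""
    ra: float
    dec: float
    heavy_footprint: Optional[List[float]] = None

def strip_heavy(catalog: List[Record]) -> List[Record]:
    """Drop the heavy payload from every record before the catalog is
    persisted, so only the lightweight measurements get written."""
    for rec in catalog:
        rec.heavy_footprint = None
    return catalog

catalog = [
    Record(ra=10.0, dec=-30.0, heavy_footprint=[0.0] * 10_000),
    Record(ra=11.0, dec=-31.0, heavy_footprint=[0.0] * 10_000),
]
strip_heavy(catalog)
assert all(rec.heavy_footprint is None for rec in catalog)
```

In the real pipelines this stripping would live in a Task’s run() method behind a config option, as described above; the snippet only illustrates the shape of the operation.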
v19 is an extremely old release of the software, and many, many things have changed (including the entire way we run pipelines and interact with data files). In this particular case we haven’t implemented any tasks that strip the heavy footprints out, so updating to a current release would not help you. On the other hand, migrating to the modern infrastructure would at least let you take advantage of any task configuration that does happen in the future.
Thanks, Tim! Yes, I am aware that v19 is quite old. The context is that I’m reducing DECam data, and my understanding has been that DECam support in Gen3 is currently under development, so I’m trying to figure out how to balance using something stable versus not getting ridiculously far behind. At any rate, I am very interested in learning to use Gen3 in the near future.
As far as I’m aware, DECam support in Gen3 is working great unless you need to use the community pipeline calibrations. Even if you wanted to use Gen2, you should be using v23.0.2. We are using DECam data with Gen3 in many of our alert production tests. @lskelvin or @mrawls can comment in case I’m missing something subtle.
See for example:
Thanks! Yes, using Lee’s guide to learn DECam Gen3 processing has been on my to-do list for a while. Thanks also for the specific recommendation about a more recent version of Gen2.
It is convenient to be able to use CP master cals with Gen2, but I’m alright with learning a different way to handle master cals for Gen3.
What is the status of custom reference catalogs in Gen3? Our group wants to, for instance, process DECam data below Dec = -30, where there’s no PS1 coverage. @lskelvin @mrawls
Reference catalogs aren’t really different between gen2/gen3. What reference catalog are you thinking of using? We have instructions for converting external catalogs to our refcat format. See this Community post for a Gaia DR2 refcat (though that doesn’t help with photometry).
@parejkoj Thanks for the info! Our group has had good success with making LSST pipeline formatted reference catalogs for our Gen2 DECam reductions, so it’s nice to hear that reference catalogs don’t really change much between Gen2/Gen3. Examples of southern data sets we’ve used as reference catalogs with Gen2 are DECaPS (DECam Plane Survey), SkyMapper, and NOIRLab Source Catalog. Thanks again.