Is there a reference manual and/or tutorial for using gen3 butler?

pollack · February 15, 2021, 4:00pm

Where can I find the most up-to-date information for first-time users on how to ingest and process a raw FITS source catalog and reference catalog using the gen3 butler? Apparently, the information presented here: https://pipelines.lsst.io/v/DM-11077/getting-started/index.html is outdated.

jbosch · February 15, 2021, 4:54pm

I’m not aware of any good tutorials for this, but the first steps - setting up a data repository and ingesting the raws - involve the butler create, butler register-instrument, and butler ingest-raws commands. All of those have --help options that can provide some reference information.

One particularly important piece of information for these is the name and class for the instrument. There are actually different instrument classes for different simulators for LSSTCam data (ImSim and PhoSim). If you’re using one of those - and the versions of those we regularly work with - then this will be relatively straightforward. If not, there may be a lot of work involved, either in adding a new Instrument class or modifying the simulator or simulations to closely resemble the output of a supported one.

pollack · February 15, 2021, 4:58pm

I am using a Euclid simulation of the LSST e2v sensor following the LsstCam FITS format. This uses a different FITS format than ImSim (which simulates the ITL sensor), and I had issues ingesting and processing with gen2 middleware. I am not sure if I will have issues now with gen3. I am wondering what configurations I might need to set.

jbosch · February 15, 2021, 6:13pm

If the gen2 problems involved understanding/translating the header metadata, I suspect the problems will remain in Gen3, because they use the same tooling for that.

I think the best option is probably to copy the ImSim or LSSTCam Instrument and translator classes on a branch or fork of obs_lsst to make new ones for this simulator, and modify them as needed until they work. There may be a fair amount of trial-and-error.

pollack · February 15, 2021, 6:31pm

Actually, Tim Jenness wrote a new translator for the metadata for LsstCam. I have to use the LsstCam mapper. However, I ran into further problems involving some of the header keywords settings. Kian-Tat gave me a workaround. But, I admit that I don’t quite understand all of the keyword definitions/settings in LSE-400. So, yes it’s been trial and error.

timj · February 15, 2021, 6:40pm

She needs to use lsst.obs.lsst.LsstCam for instrument registration in gen3. The translator seems to be fine. This is discussed elsewhere, in a thread that also shows the error message from processCcd.py.
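For anyone following along, the first steps with that instrument class might look roughly like the sketch below. The repository path and raw-data directory are hypothetical placeholders, and the `run` wrapper is only a dry-run helper that prints each command unless `DRY_RUN=0` is set in an environment where the LSST stack is available.

```shell
#!/bin/sh
# Dry-run sketch of the first Gen3 steps: prints the commands by default.
# Set DRY_RUN=0 (with the LSST stack set up) to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
REPO="${REPO:-./gen3_repo}"     # hypothetical repository location
RAW_DIR="${RAW_DIR:-./raw}"     # hypothetical directory holding the raw FITS files

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# 1. Create an empty data repository.
run butler create "$REPO"
# 2. Register the instrument using its fully qualified class name.
run butler register-instrument "$REPO" lsst.obs.lsst.LsstCam
# 3. Ingest the raw files found under RAW_DIR.
run butler ingest-raws "$REPO" "$RAW_DIR"
```

Each subcommand also accepts `--help`, which is the safest place to check the options for the installed stack version.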

jbosch · February 15, 2021, 6:54pm

Ah, I’m afraid I haven’t caught up yet on the ID mangling and bit-consumption issue; it’s on my to-do list.

pollack · February 15, 2021, 8:49pm

I completed the first steps of ingesting the raw FITS, but what is the command for ingesting the reference catalog? And, what is the command to process ccd?

jbosch · February 15, 2021, 10:02pm

We don’t actually have any tooling for native ingestion of reference catalogs into gen3 yet; all of our reference catalogs are already in gen2 repos, and we have just been using our gen2->gen3 conversion tools to get them into gen3 repos. So I’m afraid there’s more development needed there as well.

You should be able to perform the first steps of processing without a reference catalog - that would be invoking the pipetask tool (from ctrl_mpexec) on just two tasks (“isr” and “characterizeImage”) from our DRP pipelines. I don’t remember the syntax off the top of my head (others following along might), but first I think you’ve got another problem: you’ll need some calibration data - probably at least flats and biases to keep detection and measurement from falling over, though there are many others that are important for getting high-quality results. Do you have raw calibration frames you plan to combine to make masters, or some other way of obtaining calibrations?
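For reference, a rough sketch of what such a pipetask invocation could look like. Everything specific here is an assumption rather than a tested recipe: the pipeline file location under `$PIPE_TASKS_DIR`, the input collection names, the output collection, and the data query. The `run` helper only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run sketch: prints the pipetask command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

REPO="${REPO:-./gen3_repo}"                  # hypothetical repository path
INPUTS="LSSTCam/raw/all,LSSTCam/calib"       # assumed raw and calibration collections
OUTPUT="u/${USER:-someone}/isr-charimage"    # hypothetical output collection
# Assumed pipeline location. The "#isr,characterizeImage" suffix restricts
# the pipeline to those two task labels.
PIPELINE="${PIPE_TASKS_DIR:-/path/to/pipe_tasks}/pipelines/DRP.yaml#isr,characterizeImage"

run pipetask run \
    -b "$REPO" \
    -i "$INPUTS" \
    -o "$OUTPUT" \
    -p "$PIPELINE" \
    -d "instrument='LSSTCam'"
```

The calibration collection in `INPUTS` only helps once calibrations actually exist in the repository, which is the open question above.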

lskelvin · May 28, 2021, 7:51pm

Asking this question here in case it’s useful to others down the line. I’m also in the process of setting up a new Butler on a new machine and ingesting raw data (in my case, DECam data). So far, I’ve successfully managed to run:

butler create $REPO
butler register-instrument $REPO lsst.obs.decam.DarkEnergyCamera
butler ingest-raws $REPO ... (for science, bias and flat frames)

Next, I’ve used pipetask run to build master biases using the cpBias pipeline, and used butler certify-calibrations to validate across a date range. Finally, I’ve used pipetask run with the RunIsrForCrosstalkSources pipeline (a DECam-only required pipeline) to generate crosstalk sources.
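In case it helps others, the bias step can be sketched roughly as follows. This is not the exact set of commands used: the collection names, the pipeline path under `$CP_PIPE_DIR`, the data query, and the validity dates are all assumptions for illustration. The `run` helper prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run sketch of building and certifying a master bias.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

REPO="${REPO:-./gen3_repo}"                 # hypothetical repository path
BIAS_RUN="u/${USER:-someone}/cpBias"        # hypothetical output collection
CALIB="DECam/calib"                         # assumed calibration collection
BEGIN="2015-01-01T00:00:00"                 # hypothetical validity range start
END="2016-01-01T00:00:00"                   # hypothetical validity range end
# Assumed location of the cpBias pipeline definition.
PIPELINE="${CP_PIPE_DIR:-/path/to/cp_pipe}/pipelines/cpBias.yaml"

# Combine the raw bias exposures into a master bias.
run pipetask run -b "$REPO" -i DECam/raw/all -o "$BIAS_RUN" -p "$PIPELINE" \
    -d "instrument='DECam' AND exposure.observation_type='bias'"

# Certify the result into the calibration collection for the date range.
run butler certify-calibrations "$REPO" "$BIAS_RUN" "$CALIB" bias \
    --begin-date "$BEGIN" --end-date "$END"
```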

I’m now at the stage of needing to generate master flats, but when trying to use pipetask run with the cpFlat pipeline I get the error message: KeyError: "Dataset type with name 'camera' not found.".

I’m fairly sure I need to set up a new collection with these dataset types, but I’m not sure of the best way to go about this in gen3.

timj · May 28, 2021, 8:05pm

$ butler write-curated-calibrations $REPO DECam

lskelvin · May 28, 2021, 8:59pm

Thanks @timj, that did the trick. For reference, both

butler write-curated-calibrations $REPO DECam

and

butler write-curated-calibrations $REPO lsst.obs.decam.DarkEnergyCamera

seemed to work for me. I’m not sure if one is preferred over the other, but the -h help text seems to suggest the latter.

timj · May 28, 2021, 9:19pm

They both work so long as the instrument has been registered. The doc string is wrong. Only register-instrument requires the class name (because looking up the name in registry is how butler works out the class from the name). The name is clearly shorter and easier to get right.

There is an issue with pipetask at the moment in that it requires the class and not the name. I think that’s complicated to fix at the minute given that the name->class mapping needs butler access; there is some reason why it wasn’t trivial but I can’t remember exactly what the problem is (since pipetask does get told to use a butler).
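To make the difference concrete, here is a small sketch contrasting the two forms. The butler command takes the short name once the instrument is registered, while the `--instrument` option of pipetask takes the fully qualified class. The collection names and the cpFlat pipeline path are assumptions for illustration, and the `run` helper prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run sketch: short instrument name for butler, class name for pipetask.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

REPO="${REPO:-./gen3_repo}"                      # hypothetical repository path
INSTRUMENT_NAME="DECam"                          # short name, resolved via the registry
INSTRUMENT_CLASS="lsst.obs.decam.DarkEnergyCamera"
OUTPUT="u/${USER:-someone}/cpFlat"               # hypothetical output collection
# Assumed location of the cpFlat pipeline definition.
PIPELINE="${CP_PIPE_DIR:-/path/to/cp_pipe}/pipelines/cpFlat.yaml"

# butler: the short name is enough once the instrument has been registered.
run butler write-curated-calibrations "$REPO" "$INSTRUMENT_NAME"

# pipetask: the instrument must be given as the fully qualified class.
# The input collections below are assumed names.
run pipetask run -b "$REPO" --instrument "$INSTRUMENT_CLASS" \
    -i DECam/raw/all,DECam/calib -o "$OUTPUT" -p "$PIPELINE"
```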