Ingesting individual CCDs from fits files with multiple extensions

raphaelshirley · July 22, 2020, 6:24pm

Hi all,

I am trying to ingest images from the VISTA VIRCAM camera. I have fits images with 16 extensions (one for each CCD). I presumed I could ingest the images directly into the butler and then get processCcd.py to loop over the extensions. However, this seems impossible as ingestImages.py seems to require a ‘ccd’ key for the registry. Do I have to preprocess all the images to make an individual file for each CCD, or is there some way I can either write an individual file to the Butler repo for each CCD during ingestion or ingest the full file and get the processCcd task to loop through the extensions?

Many thanks,

Raphael.

timj · July 22, 2020, 6:31pm

These data sound like DECam data. Therefore you might want to look in obs_decam at the ingestImagesDecam.py.

Gen3 should be fine (but we might need to tweak things in case you discover that we have some decam-specific code in there).

raphaelshirley · July 22, 2020, 7:30pm

Great, thanks. I’ll look at that.

raphaelshirley · July 29, 2020, 8:10am

Hi,

I am trying to implement ingestImagesDecam.py for my own data. It seems to call lsst.obs.decam.ingest.DecamIngestTask, which is described as being for a gen2 Butler. Implementing these pieces of code enables the ingest stage to run, but then processCcd fails with

“RuntimeError: No such FITS file: /Users/rs548/GitHub/lsst-ir-fusion/dmu4/data_test/raw/2018-09-11/111197-VISTA-Y.fit[0]”

That file does not exist but I was hoping the [0] was pointing to an extension of the fits (the first ccd is in extension 1). The other relevant pieces of code seem to be lsst.obs.decam.ingest.DecamRawIngestTask and the code in obs_decam/config/convertRepo.py which appear to be for a gen3 Butler.

I am starting from scratch so presume it is sensible to start with a gen3 Butler. Do I need to run obs_decam/config/convertRepo.py in order to retarget the RawIngestTask called by a typical ingestImages.py run or should I be using the gen2 lsst.obs.decam.ingest.DecamIngestTask called by bin/ingestImagesDecam.py?

I have also set the raw template: raw/%(dateObs)s/%(visit)04d-%(filter)s.fit[%(hdu)d] in my VistaMapper.yaml file.
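For reference, the mapper template is an ordinary Python %-style format string, so its expansion can be checked by hand. This is a minimal sketch, assuming a data ID that carries the keys named in the template; the `data_id` name is only illustrative and the values are taken from the error message above.

```python
# Template as set in VistaMapper.yaml: keys are filled in from the data ID.
template = "raw/%(dateObs)s/%(visit)04d-%(filter)s.fit[%(hdu)d]"

# Illustrative data ID; values copied from the error message above.
data_id = {"dateObs": "2018-09-11", "visit": 111197, "filter": "VISTA-Y", "hdu": 0}

# Standard printf-style substitution with a mapping.
path = template % data_id
print(path)  # raw/2018-09-11/111197-VISTA-Y.fit[0]
```

The `[0]` suffix is part of the expanded string, so whatever reads the file has to strip it and treat it as an HDU index rather than as part of the filename.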

Many thanks for your help; I’m nearly there.

Best,

Raphael.

timj · July 29, 2020, 2:29pm

As the manager of the gen3 development I’m probably a little biased, but if you are starting from scratch and don’t mind tweaking things to keep up with development then gen3 should work great for you. Stop using gen2 and forget it even exists… The one caveat here is that we don’t have a generic ingestCalibs script yet for taking externally produced calibration biases/darks/flats and putting them in a repo. For specific calibrations that can be done by hand crafting yaml but obviously that’s not the end game – at the moment converting from a gen2 repo is the easiest way for most people but that has to be fixed fairly soon.

There are a few things you’d have to sort out before you get to the ingest phase. First you will need a metadata translator plugin that knows how to parse your headers. You can send me a pull request for astro_metadata_translator or you can keep the translator code in your obs package like we do with obs_lsst. Note that the eventual plan is to move those translators from obs_lsst to astro_metadata_translator to make them more generally discoverable but we didn’t want to do that until we have stable headers.

Then you need to write a gen3 Instrument class. You’ll see examples in the obs packages’ _instrument.py files. This class describes the instrument and will enable the butler register-instrument command. We generally name them like lsst.obs.lsst.Latiss and lsst.obs.decam.DarkEnergyCamera. Once you’ve done that you can tackle the ingest side of things. This will require you to define your camera geometry – best to do that using the yamlCamera system – look in obs_lsst for that. You also need to define all your filters (conventionally in a filters.py).

You don’t need to worry about convertRepo if you aren’t converting from gen2 to gen3. You won’t need the makeDataIdTranslatorFactory in Instrument if you don’t have gen2.

An equivalent of DecamRawIngestTask is all you need for ingest – it knows about the multiple extensions.

Your decision really depends on how cutting edge and forward looking you want to be. Gen3 should be up and running in autumn with gen2 retired by the end of the year. If you don’t mind riding the wave and cursing me when I tweak an API then gen3 is for you.

raphaelshirley · July 29, 2020, 2:46pm

Hi,

Thank you for getting back to me. We will go with the gen3 butler and accept things may change. I had managed to define the VISTA camera to the extent it could run processCcd on the first ccd image from the fits file. To be honest I’m so new to working with the stack I’m not sure how I know if I am using gen2 or gen3 except that I installed the 20.0.0 stack recently so presume I’m working with gen3 by default.

The main thing I don’t understand is how DecamRawIngestTask gets called since ingestImagesDecam.py calls the gen2 DecamIngestTask. Since these functions inherit from different base classes (lsst.obs.base.RawIngestTask compared to lsst.pipe.tasks.ingest.IngestTask) they do not function in the same way so I can’t interchange them. How should I be calling DecamRawIngestTask? Note I’m replacing Decam with Vista in every instance for my obs package.

I would happily start work on an astro_metadata_translator module for VISTA. At the moment I am doing all the header translation in the obs package but will try to move it to a local astro_metadata_translator branch and make a pull request when I have it working.

Thanks again,

Raphael.

timj · July 29, 2020, 3:07pm

If you are using Mappers and importing from lsst.daf.persistence then you are using gen2. v20 had both gen2 and gen3 in it. Gen3 is daf_butler, with a completely different set of classes in obs packages. daf_butler provides unified command-line tooling for manipulating repositories, so butler --help will tell you the commands that you can use. One of those will be butler ingest-raws. That command takes an --ingest-task option, which would be the full python name of the ingest task you need. Making the system clever enough to work out that you are trying to ingest MEF data is something that is on the list in the longer term but for now we have to declare it manually.
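As a rough sketch of declaring the ingest task manually, the command line could be assembled like this. The repository path, the raw-data location and the `lsst.obs.vista.VistaRawIngestTask` name are hypothetical placeholders, and the positional-argument layout is an assumption that may differ between releases, so `butler ingest-raws --help` remains the authority.

```python
import shlex


def build_ingest_command(repo, raw_location, ingest_task):
    """Return an argv list for ``butler ingest-raws`` with an explicit ingest task.

    The argument order (repo, then raw location) is an assumption; check
    ``butler ingest-raws --help`` for the installed release.
    """
    return ["butler", "ingest-raws", repo, raw_location, "--ingest-task", ingest_task]


# All three values below are hypothetical placeholders.
cmd = build_ingest_command(
    "/path/to/repo",
    "/path/to/raw",
    "lsst.obs.vista.VistaRawIngestTask",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the stack environment is set up.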

Gen2 and gen3 ingest are completely different and not interchangeable at the task level. DECam doesn’t use astro_metadata_translator in gen2 but does in gen3 whereas obs_lsst uses the same translators for both.

Things like process ccd are different as well since CmdLineTask is going away and is replaced by the pipetask command from ctrl_mpexec and pipeline definitions in YAML.

raphaelshirley · July 29, 2020, 3:23pm

Ok great that explains things. In that case I think the code I am starting with from the template obs_necam is using gen2 mappers. I will try to start from scratch using the butler ingest-raws and gen3 but may work on the gen2 version simultaneously since I had it working to some extent.

Thanks very much for answering my questions. I’ll let you know if I make progress.

Best,

Raphael.

jrmullaney · July 29, 2020, 3:26pm

Yes, obs_necam is still using the gen2 butler. It’s on my summer to-do list to convert obs_goto and then obs_necam to gen3.

raphaelshirley · July 31, 2020, 10:13am

Sorry but I have one last question about this. After ingesting images with code following that in obs_decam for the gen2 ingestion, I can get the ingested ccds from a single MEF with:

butler.queryMetadata('raw', ['visit', 'ccd'], dataId={'filter': 'VISTA-Y'})

Which returns the 16 ccd exposures from the one fits file. I can also get the filename (with the hdu number in square brackets at the end) with:

butler.get('raw_filename', dataId={'filter': 'VISTA-Y', 'visit': 111157, 'hdu': 2})

But when I try to get the exposure it tries to open the filename.fit[2] file which does not exist.

butler.get('raw', dataId={'filter': 'VISTA-Y', 'visit': 111157, 'hdu': 2})

leads to:

RuntimeError: No such FITS file: /Users/rs548/GitHub/lsst-ir-fusion/dmu4/data_test/raw/2018-09-11/111157-VISTA-Y.fit[2]

I have the getDestination function from https://github.com/lsst/obs_decam/blob/master/python/lsst/obs/decam/ingest.py which cuts off the square brackets. However, I get the same file not found error with or without it. I am working on a VistaTranslator for astro_metadata_translator so I can use the gen3 ingestor but would like to get processCcd running with the current data just to run some tests and compare to the gen3 outputs.

Is this an issue with the ingestion or do I need to write some processCcd overrides to read extensions?

This is such a trivial issue but I’m struggling to fix it.

Thanks again,

Raphael.

ktl · July 31, 2020, 1:02pm

Unfortunately, I think you are running into https://github.com/lsst/daf_persistence/blob/master/python/lsst/daf/persistence/posixStorage.py#L595 which hard-codes .fits rather than .fit. There is no good way to override this that I can think of, so I think the alternatives are:

1. Monkey-patch the routine.
2. Rename your files.
3. We patch that line to either not look for any specific extension or accept both fits and fit (or simplify things by combining with the later RE for readMetadata), and you use a weekly release rather than 20.0.0.
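To illustrate the last alternative, here is a sketch of the difference between a pattern that only accepts `.fits` and one that also accepts `.fit`. The regular expressions are illustrative only and are not copied from posixStorage.py, and `split_hdu` is a hypothetical helper.

```python
import re

# Illustrative patterns only; NOT the actual expressions in posixStorage.py.
STRICT = re.compile(r"^(?P<path>.*\.fits)\[(?P<hdu>\d+)\]$")
RELAXED = re.compile(r"^(?P<path>.*\.fits?)\[(?P<hdu>\d+)\]$")


def split_hdu(location, pattern):
    """Split 'file.fit[2]' into ('file.fit', 2).

    Returns (location, None) when the pattern does not match, in which
    case the brackets are left as part of the filename.
    """
    match = pattern.match(location)
    if match is None:
        return location, None
    return match.group("path"), int(match.group("hdu"))


name = "raw/2018-09-11/111157-VISTA-Y.fit[2]"
print(split_hdu(name, STRICT))   # brackets not recognised, name returned whole
print(split_hdu(name, RELAXED))  # path and HDU index separated
```

With the strict pattern the `[2]` stays glued to the filename, which would be consistent with the "No such FITS file" error reported above; the relaxed pattern separates the path from the HDU index for both spellings of the extension.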

ktl · July 31, 2020, 1:15pm

I have created https://jira.lsstcorp.org/browse/DM-26183 to track this.

raphaelshirley · July 31, 2020, 2:20pm

Aha thank you. Simply adding an ‘s’ to the template in policy/VistaMapper.yml

raw:
  template: 'raw/%(dateObs)s/%(visit)04d-%(filter)s.fits[%(hdu)d]'

Seems to have given the links the .fits name even if the actual files don’t and I have at least got past that error. I have moved on to a new error which hopefully isn’t related so I think that has fixed that issue.

Thank you very much. In terms of the best response long term perhaps the code could simply raise an error that fits files must be named with the .fits extension?

Thank you again,

Raphael.