Ingesting other surveys/cameras into a butler repo

Hi (@timj ?)

What’s the best way to start trying to ingest data from another survey/instrument into a butler repo? The data are unlikely to be processed further but just need to be searchable in the repo (e.g. by sky position or filter). Is there some documentation/tutorial on this? Does a new class or package need writing for each camera, or can metadata/header info be translated (using astrometadata)? We have several datasets we’re thinking of ingesting, though I have ready access to WFCAM data locally so would probably start with that.

Assuming you don’t mind using other LSST software, step one is to create an obs package for WFCAM.

I’m assuming you are using MEF files rather than the original HDS files. If you have all the detectors in a single file then that’s like DECam. The instrument package defines how the instrument and physical_filter records in the butler registry will be populated. The formatter tells butler how to read a file.

The big question then becomes which dataset types you want to ingest. We mostly start from raw files: we ingest those and create exposure records from the astrometadata-translated output.
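For instance (assuming the astro_metadata_translator command-line tool is installed, and that a translator class for the instrument exists or has been registered; the filename here is made up), you can check what astrometadata would extract from a raw file:

```shell
# Dump the raw header as-is
astrometadata dump w20100101_00123.fit

# Show the standardized, translated metadata that exposure records
# would be built from (requires a translator for the instrument)
astrometadata translate w20100101_00123.fit
```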

If you are only ingesting coadds that are on a skymap, you can skip a lot of that, and the issue becomes how to define your skymap and whether it matches one of the ones we already have.

You can use butler ingest-files to ingest the coadds once you have the physical_filter, instrument, tract and patch dimension records clarified.
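As a sketch of what that looks like: `butler ingest-files` takes a table with one row per file, a file-path column plus one column per dataId dimension (column names, paths, and values below are illustrative; check `butler ingest-files --help` for the exact expectations):

```
# %ECSV 1.0
# ---
# datatype:
# - {name: filename, datatype: string}
# - {name: instrument, datatype: string}
# - {name: physical_filter, datatype: string}
# - {name: tract, datatype: int64}
# - {name: patch, datatype: int64}
# schema: astropy-2.0
filename instrument physical_filter tract patch
/data/wfcam/coadd_k_t2_p36.fits WFCAM K 2 36
```

which would then be ingested with something like `butler ingest-files REPO deepCoadd wfcam/coadds files.ecsv`, where the dataset type and RUN collection are whatever you have registered.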

Thanks @timj. These would be MEFs and already coadds. This is just one example; generally they’ll be final coadds from various other surveys we’ve been asked to host. But I hadn’t appreciated that everything would need to be on a defined skymap or warped to one.

I thought they could go in butler and be queryable using extracted WCS information.

Maybe butler isn’t the way to go but rather some sort of SIAP service.

Sounds like I need more info on use cases.


There are different types of skymap definitions that might help you. The real trick is working out what the “data ID” should be. Each dataset has to be uniquely accessible by a RUN collection, a dataset type, and a dataId. The dataId provides the “data coordinate” of the dataset. For example, a coadd from LSSTCam might be accessible at coordinate (instrument, physical_filter, tract, patch), where tract and patch define the area on the sky. Our healpix maps will use a healpix ID instead of tract and patch. You can’t have two datasets with the same run/datasetType/dataId combination.
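A toy model of that uniqueness rule (plain Python, not LSST code; names and URIs are made up) might help make it concrete:

```python
# Toy model of the butler registry's addressing scheme: every dataset
# is keyed by (RUN collection, dataset type, dataId), and that triple
# must be unique.
registry = {}

def ingest(run, dataset_type, data_id, uri):
    # Freeze the dataId dict into a hashable, order-independent key.
    key = (run, dataset_type, tuple(sorted(data_id.items())))
    if key in registry:
        raise ValueError(f"dataset already registered: {key}")
    registry[key] = uri

ingest("coadds/run1", "deepCoadd",
       {"instrument": "LSSTCam", "physical_filter": "i_01",
        "tract": 2, "patch": 36},
       "file:///data/coadd_i_t2_p36.fits")

# The same dataId in a *different* RUN collection is fine; repeating
# the same run/datasetType/dataId triple would raise ValueError.
ingest("coadds/run2", "deepCoadd",
       {"instrument": "LSSTCam", "physical_filter": "i_01",
        "tract": 2, "patch": 36},
       "file:///data/coadd_i_t2_p36_v2.fits")
```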

I was specifically answering a WFCAM question, since I had thought you would have coadds that are mapped onto a sky grid somehow. For coadds that are effectively located by HTM or HEALPix ID, that can work if things are set up in a specific way. I’m sure @jbosch can comment further if you give us some explicit examples of your datasets.

The hard part with butler is working out the data coordinate. We can definitely discuss options with you.

Sorry, I probably shouldn’t have led with WFCAM; it was a convenient dataset I’m used to and I thought it would be a useful test. The stacks (coadds) are not on a skymap as such(?), just left in their original “frame”, usually with a ZPN WCS.

We asked the UK community for a list of ancillary datasets they wanted the UK-DAC to host, which produced a hefty list but without much use-case information. I suspect a lot of uses (what does this LSST object look like in survey X?) might well be satisfied by existing image services bundled with the RSP, or maybe by calls to remote services.

It would still be good to know if a HEALPix approach would be a simple way to ingest the frames without any pixel processing.

I worked for UKIRT so it’s a dataset I know all about.

It would be great to see a representative list. It would clearly be convenient for notebook users to be able to butler.get a WFCAM image and LSSTCam image easily.

So these are effectively stacks of unwarped exposures? I recall that we observed WFCAM by stepping to fill in the gaps.

That implies we could probably use visit, detector, physical_filter as the dataId. It’s interesting, of course, that there is no data at the center of the visit. WFCAM data used YYYYMMDD and a sequence number for the exposure (just like LSSTCam is doing). We define a visit as something that is assumed to be on a single night, so that makes it a bit of a tricky concept for multi-night coadds of unwarped exposures.
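As a toy illustration of the (YYYYMMDD, sequence number) scheme, a date and nightly sequence number can be packed into a single integer exposure ID. The packing below is hypothetical, not the actual obs-package convention:

```python
# Hypothetical packing of an observing date plus nightly sequence
# number into one integer exposure ID (real obs packages define their
# own schemes; this just illustrates the idea).
def pack_exposure_id(date_obs: str, seqnum: int) -> int:
    day = int(date_obs.replace("-", ""))  # "2010-01-01" -> 20100101
    return day * 100000 + seqnum          # room for 99999 exposures/night

def unpack_exposure_id(exposure_id: int) -> tuple[int, int]:
    return exposure_id // 100000, exposure_id % 100000
```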

Didn’t WFCAM observing have an implicit skymap? There was a grid of fields on the sky that the scheduler would use. I can’t remember if those fields had names. @jbosch, would a discrete skymap work for that?


:grinning: another reason I led with that, I thought you might have experimented already with WFCAM and butler.

WISE, Spitzer, VST, DES, HSC (though already solved? will be done anyway?), VISTA (also solved, as it is being used and rewarped to the LSST skymap for a separate work package).

Yes, the individual exposures were dithered by a known amount and sometimes microstepped, but the stacks were just formed by offsetting, interleaving and adding (some interpolation?). Four stacks represent a filled-in tile, but tiles were never produced. An individual detector (extension in the MEF) in the pawprint is effectively treated as the observation.

Sounds workable, as most WFCAM surveys were single-night; only DXS and UDS were multi-night. UDS did use SWarp etc.

WFCAM did observe fields based on the scheduler/queue, but I’m not sure that constitutes a skymap. Sometimes this went a little awry (probably wrong inputs) and/or there were guide-star availability issues. And, as you say, the field is only covered by a pawprint. It seems best to treat each extension as a separate observation, so YYYYMMDD, seqnum, extnum? Some pawprints have individual extensions deprecated too, and an extension from another observation is used instead. So I would probably copy the data to single-extension FITS files first and then ingest. There’s not much WFCAM/LSST overlap, so it’s not too big a hit disk-space-wise. Though from a data-usability POV, part of me would ultimately like to see all WFCAM survey data exposed via butler/RSP.
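Splitting each pawprint MEF into single-extension files could be sketched like this (assumes astropy; the filename pattern and the list of copied keywords are illustrative, and real WFCAM files may need more primary-header keywords carried over):

```python
# Sketch: write each image extension of a MEF to its own
# single-extension FITS file ahead of ingestion.
from astropy.io import fits

def split_mef(path, outpattern):
    """Split a MEF into single-extension files.

    outpattern must contain an {ext} placeholder, e.g.
    "w20100101_00123_ext{ext}.fits". Returns the filenames written.
    """
    written = []
    with fits.open(path) as mef:
        primary = mef[0].header
        for extnum, hdu in enumerate(mef[1:], start=1):
            header = hdu.header.copy()
            # WFCAM extensions often rely on primary-header keyword
            # inheritance, so copy over anything the extension lacks
            # (keyword list here is illustrative, not exhaustive).
            for key in ("OBJECT", "FILTER", "DATE-OBS", "MJD-OBS"):
                if key in primary and key not in header:
                    header[key] = primary[key]
            out = outpattern.format(ext=extnum)
            fits.PrimaryHDU(data=hdu.data, header=header).writeto(
                out, overwrite=True)
            written.append(out)
    return written
```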

Jumping in on this –

I’d like to import some of the final processed images from DECaLS and DES. These are final survey products, which have (presumably good) WCS and photometric calibration information in their headers. Does the stack already have parameters for these specific surveys? If not (I haven’t yet defined any custom skymaps or the like), what resources should I be looking at to figure out how to do this? (Specifically: ingest fully processed images for further use, and figure out how to define a skymap that matches the bricks of the survey image release.)

To ingest any file you need to know what its dataId is going to be. How does DES uniquely assign a value to a coadded image on the sky? Presumably they warp to a common skymap, so you need to find out what that skymap might be. Are they using tract/patch or HEALPix grids?

I am not sure exactly what you mean by tract/patch.

What DECaLS makes available is “bricks”. These are patches on the sky of ~1/4 degree (bigger for DES). They have a database with the center RA/Dec that allows you to find the images. The actual (reduced, stacked) images themselves then have a WCS in the header.

The two things I’d like to figure out are, first, how I’d even load these into a butler-managed repository. What parameters would I need to set up so that I could load them in properly? And, second, what to do about a skymap for these templates.

You need to define the dataId of each image. As I alluded to, we do it by defining a skymap, and then the skymap can be addressed by tract/patch tuples. A tract is a large region with a shared projection, and we usually split a tract into 7x7 patches. So a dataId could be tract=2, patch=36, skymap="my_skymap".
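For the 7x7 example above, a sequential patch index can be read as an (x, y) cell within its tract. This is a simplified illustration of the indexing convention, not LSST code:

```python
# Simplified sequential patch indexing in a tract with a 7x7 patch
# grid: index = y * nx + x.
NX = 7  # patches per tract along x

def patch_cell(patch_index: int, nx: int = NX) -> tuple[int, int]:
    """Return the (x, y) grid cell for a sequential patch index."""
    return patch_index % nx, patch_index // nx

# patch=36 sits in column 1, row 5 of its tract
```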

You have to convert your “bricks” into a skymap that butler understands so you can address each dataset by its skymap coordinates.

@jbosch is there a user guide on setting up skymaps?

No user guide for skymaps, I’m afraid, but it sounds like your bricks will map to tracts, and if you don’t already have a useful subdivision below that, you can either have one patch per tract or make up some subdivision that might be useful for smaller derived images, in terms of fitting well within memory. I’m assuming each brick has one WCS and it’s a relatively standard one, like TAN?

If that’s the case, I think there are two options for how to do this:

  • If the centers of the “bricks” are algorithmically generated in a relatively simple way, you could write a subclass of lsst.skymap.BaseSkyMap (or perhaps CachingSkyMap for some extra functionality; the RingsSkyMap class in that package is the one we use most and is a good example of this kind of thing).

  • If the centers are not algorithmically generated, or you’d rather just write them all out explicitly, you can use DiscreteSkyMap, which is configured with lists of (ra, dec, radius), and it makes big square tracts with that radius as width. If that’s not quite flexible enough to represent the brick WCSs and layout exactly, you can make a custom subclass by copying it instead.
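For the DiscreteSkyMap route, a small config file passed to the `butler register-skymap` subcommand is roughly what is needed. The fragment below is a sketch: the skymap name and coordinates are made up, and field names should be checked against the lsst.skymap documentation for your stack version:

```python
# Sketch of a config file for: butler register-skymap REPO -C this_file.py
config.name = "decals_bricks"            # skymap dimension record name
config.skyMap.name = "discrete"
config.skyMap["discrete"].raList = [150.10, 150.35]    # tract centers (deg)
config.skyMap["discrete"].decList = [2.20, 2.20]
config.skyMap["discrete"].radiusList = [0.125, 0.125]  # ~1/4-degree bricks
```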