Coadding DP0 calexps from different detectors for the same deepCoadd

Hi everyone, I want to coadd a set of calexps from different detectors (not all) for the same visit. I have seen examples for HSC data but not for DP0. What should I use for the instrument name in the case of DP0 data? Is there a tutorial for this?

Hi @kalamo, thanks for posting this question.

In order to help you with this query, could you clarify whether I understand correctly that “wanting to coadd calexps from different detectors of the same visit” means “wanting to coadd images of different regions of sky”? I can think of a few science cases for doing that, but it is a little rare, so I want to make sure we’re on the same page.

There is not yet a specific DP0 tutorial for coadding the DP0 images, but the LSST Science Pipelines team has created this tutorial for coadding images which might help you get started: Getting started tutorial part 5: coadding images — LSST Science Pipelines.

Let me know if this is helpful, and please don’t hesitate to reach out for more help!

Hi again, I had a discussion with @alxogm about this issue in the Delegate Assembly this morning, and we think we found the answer to “What should I use for the instrument name in the case of DP0 data?”: LSSTCam-imSim.

We found the instrument name in the datasetRef using the Butler. One place to see this demonstrated is in Section 2.3 of the tutorial notebook 04_Intro_to_Butler.ipynb, which all delegates can find in their notebooks/tutorial-notebooks/ directory. A minimal query sketch is below.
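
For illustration, here is a minimal, untested sketch of pulling one calexp datasetRef from the registry and printing its dataId, which carries the instrument name. The repo URI and collection name are the DP0.1 values discussed later in this thread.

from lsst.daf.butler import Butler

# The shared DP0.1 repository and collection (see discussion below).
repo = 's3://butler-us-central1-dp01'
butler = Butler(repo, collections='2.2i/runs/DP0.1')

# Grab the first calexp datasetRef the registry returns; its dataId
# includes the instrument along with visit, detector, etc.
refs = butler.registry.queryDatasets('calexp', collections='2.2i/runs/DP0.1')
ref = next(iter(refs))
print(ref.dataId)  # expect instrument='LSSTCam-imSim' among the keys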

A side note about terminology – based on my conversation with @alxogm today I now understand that the goal is not to coadd calexps that are from the same visit, but calexps that were used to create the same deepCoadd. I know terms and jargon can be quite survey-specific; in the case of LSST, a visit is an observation of a field in a filter at a certain time (which might consist of 2x15s exposures that are immediately combined, or a single exposure of, e.g., 30s). So I’m going to change ‘visit’ to ‘deepCoadd’ in the title of this topic, and no worries on that!

Anyway, let us know whether that instrument name works in your code!

Hi Melissa, thanks for your answer.

What @alxogm explained is right: we want to build customized deepCoadds (with only a subset of visits). We also want a single image composed of a subset of detectors (let’s say, only 4 contiguous detectors).

We thought it would be good to start by reproducing the available deepCoadd for a certain RA and DEC. With the coordinates we can extract the tract, patch, and visits used to construct that deepCoadd, but we haven’t even been able to make the warp.

This is what I’m trying:

# Imports needed for this snippet; cutout_coadd is the helper function
# defined in the DP0 tutorial notebooks.
import lsst.geom as geom
from lsst.daf.butler import Butler

repo = 's3://butler-us-central1-dp01'
collection = '2.2i/runs/DP0.1'
butler = Butler(repo, collections=collection)

# The DC2 skymap is needed below for the tract/patch lookup.
skymap = butler.get('skyMap', skymap='DC2')

ra, dec = 55.064, -29.783
cutout = cutout_coadd(butler, ra, dec, band='g', datasetType='deepCoadd', cutoutSideLength=1000)

# To get the visits used to construct the deepCoadd
unique_visits = list(set([i['visit'] for i in cutout.getInfo().getCoaddInputs().ccds]))

# To get the tract and patch
radec = geom.SpherePoint(ra, dec, geom.degrees)
tractInfo = skymap.findTract(radec)
patchInfo = tractInfo.findPatch(radec)

# To make the warp to then coadd:

! butler create GEN3_run
! butler register-instrument ./GEN3_run lsst.obs.lsst.LsstCamImSim

! pipetask run -b ./GEN3_run/butler.yaml \
             -i "2.2i/runs/DP0.1" \
             -d "band='g' AND skymap='DC2' AND tract=4638" \
             -p ${PIPE_TASKS_DIR}/pipelines/DRP.yaml#makeWarp \
             --instrument lsst.obs.lsst.LsstCamImSim  \
             -o u/$USER/warps \
             --register-dataset-types

Hi @kalamo, what kind of error messages is this script returning?

lsst.daf.butler.cli.utils ERROR: Caught an exception, details are in traceback:
Traceback (most recent call last):
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ctrl_mpexec/g768b03a34c+18099b7fc4/python/lsst/ctrl/mpexec/cli/cmd/commands.py", line 126, in run
    qgraph = script.qgraph(pipelineObj=pipeline, **kwargs)
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ctrl_mpexec/g768b03a34c+18099b7fc4/python/lsst/ctrl/mpexec/cli/script/qgraph.py", line 183, in qgraph
    qgraph = f.makeGraph(pipelineObj, args)
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ctrl_mpexec/g768b03a34c+18099b7fc4/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 561, in makeGraph
    registry, collections, run = _ButlerFactory.makeRegistryAndCollections(args)
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ctrl_mpexec/g768b03a34c+18099b7fc4/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 350, in makeRegistryAndCollections
    butler, inputs, self = cls._makeReadParts(args)
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ctrl_mpexec/g768b03a34c+18099b7fc4/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 287, in _makeReadParts
    self = cls(butler.registry, args, writeable=False)
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ctrl_mpexec/g768b03a34c+18099b7fc4/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 218, in __init__
    self.inputs = tuple(registry.queryCollections(args.input, flattenChains=True)) if args.input else ()
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_butler/gbc02ca69e9+7f943868d5/python/lsst/daf/butler/registries/sql.py", line 886, in queryCollections
    for record in query.iter(
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_butler/gbc02ca69e9+7f943868d5/python/lsst/daf/butler/registry/wildcards.py", line 673, in iter
    yield from self._search.iter(
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_butler/gbc02ca69e9+7f943868d5/python/lsst/daf/butler/registry/wildcards.py", line 507, in iter
    manager.find(name),
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_butler/gbc02ca69e9+7f943868d5/python/lsst/daf/butler/registry/collections/_base.py", line 448, in find
    raise MissingCollectionError(f"No collection with name '{name}' found.")
lsst.daf.butler.registry._exceptions.MissingCollectionError: No collection with name '2.2i/runs/DP0.1' found.

Hi again,

I don’t yet have the answer, but I’ve been looking into how to coadd DP0 calexps from a notebook. I think at least part of the solution is that there’s no need to create a new, empty butler, and that is probably why the collection cannot be found – it’s not in the newly created GEN3_run.

I think also that, from a notebook, there’s no need to run the command-line task externally; lsst.pipe.tasks can be imported and called directly (lsst.pipe.tasks — LSST Science Pipelines). I think the two tasks we want to use are WarpAndPsfMatchTask and AssembleCoaddTask, but I haven’t quite figured out how to use them yet, still learning. A minimal import-and-configure sketch is below.
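
For anyone who wants to experiment, here is a minimal, untested sketch of importing and instantiating the two tasks named above. The module paths are from lsst.pipe.tasks; all configuration is left at defaults here, which would almost certainly need adjusting for real data.

from lsst.pipe.tasks.warpAndPsfMatch import WarpAndPsfMatchTask
from lsst.pipe.tasks.assembleCoadd import AssembleCoaddTask

# Instantiate with default configs (a sketch only; real use would
# configure these to match the DP0.1 processing).
warp_task = WarpAndPsfMatchTask(config=WarpAndPsfMatchTask.ConfigClass())
coadd_task = AssembleCoaddTask(config=AssembleCoaddTask.ConfigClass())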

If anyone else finds this thread and can help out, that would be great.

Hey, I’m just getting caught up on this now and want to make sure I understand everything, and offer some reading material for background. If you are already familiar with some of what I say, I apologize; I just want to make sure everyone is on the same page.

In Rubin we use a Butler to mediate access to all data, so that the complexities of where data are located, the relations between data, and what is available are abstracted away. The Butler is a system that has recorded all of this information inside of itself and knows how to deliver the appropriate results (be it data, or metadata about the data), but the Butler repository does not itself contain the actual data. There can be multiple Butler repositories, each with its own metadata, processing histories, etc. When you are in Python you instantiate a Butler object that connects to one of these repositories. You can loosely think of it like a mount point on a file system: the mount point may be the same directory, but different storage devices (e.g., different USB thumb drives) might be mounted at that location. The Python Butler class can be pointed at different back-end repositories where data may be found, but each repository is independent, as the short sketch below illustrates.
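
To make the “independent repositories” point concrete, here is a hedged sketch using the two repositories that appear in this thread: the shared DP0.1 repo and the newly created local one.

from lsst.daf.butler import Butler

# The shared DP0.1 repository: contains the 2.2i/runs/DP0.1 collection.
shared = Butler('s3://butler-us-central1-dp01', collections='2.2i/runs/DP0.1')

# The brand-new local repository created by `butler create GEN3_run`:
# it knows nothing about the shared repo's collections or data.
local = Butler('./GEN3_run')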

Data products within a Butler are organized into what are called collections. These are logically similar to directories, but the analogy is not 1:1, and there can be multiple kinds of collection (though that is mostly transparent, and is often an edge case). A collection contains all the inputs that go into processing data (running a pipeline), as well as the outputs of that processing. So say collection “A” contains raws (and all other appropriate inputs) that can be used as input to run a pipeline that produces calexps, which are output in a collection named “B”. Looking in collection “B” you will then see all the calexps as well as all the raws that went into processing those specific calexps. Collection “B” can be used as an input for running further pipelines, with output into collection “C”, etc. Which collections a piece of data is associated with is metadata within the Butler repository; if you were to connect to a different Butler repository, you would see different data and different collections.

In that sense there really is not one Butler; rather, it is a concept that encompasses a system built from all the Butler pieces created at a specific location. In the example you followed, a brand new Butler repository was created at the location ./GEN3_run that contained only the raws and such that were ingested into that specific repository, and only the collections created within it. The Butler repository available to delegates, containing all the already-processed data, can be found at the URL s3://butler-us-central1-dp01, and the collection associated with all the available data is named 2.2i/runs/DP0.1. So if you want to process some data, that collection can be used as an input collection, and you can specify what output collection you want the results to be associated with. The results will then be available within the s3://butler-us-central1-dp01 Butler repository, as in the sketch below.
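
A quick, hedged way to confirm the input collection actually exists before running anything (the wildcard pattern here is illustrative):

from lsst.daf.butler import Butler

butler = Butler('s3://butler-us-central1-dp01')

# List collections matching this glob; 2.2i/runs/DP0.1 should appear.
# Running the same query against ./GEN3_run would return nothing, which
# is exactly the MissingCollectionError seen earlier.
for name in butler.registry.queryCollections('2.2i/runs/*'):
    print(name)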

Let’s break down the command line you wrote above.

  • The -b option specifies which Butler repo should be used in the data processing.
  • -d is like a filter that is applied when the system tries to work out what data to run on. If there is no where filter, then the system will try to run on all combinations of input data it finds in the Butler that match the input types the processing pipeline wants. This specific where statement says: only process data that was taken in band ‘g’, is defined on skymap ‘DC2’ (for now let’s skip over how a skymap comes in; it is just a tessellation of the sky), and spatially overlaps the region defined for tract 4638.
  • -i specifies the input collection to use when deciding what data can be used as an input.
  • -o specifies the name of the output collection to be associated with the processed data.
  • -p defines the data processing pipeline to run; in this case it is only a subset of the pipeline (see here for more information on pipelines).
  • --register-dataset-types is something that needs to be run when a Butler repository is created, and should not be needed or used for the DP0 Butler repository.

Another point of clarification concerns the terms used to describe the data you want to manipulate. A visit is a single pointing of the telescope; it contains all the detectors in the focal plane. Coadds are built from visits, but are defined in purely spatial terms. The sky is divided into a series of tracts, and each tract is further divided into several patches. A given tract, patch combination bounds a certain area of the sky.

When we build coadds, we find all the visits whose spatial extents overlap the specified tract, patch. Note that a complete visit may not fall within a tract, patch, as visits are arbitrary pointings on the sky and can also be arbitrarily rotated. For each of these visits we consider which detectors in that visit are contained in, or overlap, the tract, patch region. Each of these detector images is then resampled onto a common grid, which accounts for rotations, distortion models, etc. This is the makeWarp process. The coaddition process then combines each of these appropriately to form the deeper coadded image.
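
As a hedged illustration of the “find everything overlapping a tract, patch” step, the Butler registry can do the spatial join for you. The patch number here is a made-up placeholder, picked just for the example.

import itertools

from lsst.daf.butler import Butler

butler = Butler('s3://butler-us-central1-dp01', collections='2.2i/runs/DP0.1')

# Ask the registry for every (visit, detector) with a calexp that spatially
# overlaps the given tract/patch; patch=10 is hypothetical.
dataIds = butler.registry.queryDataIds(
    ['visit', 'detector'],
    datasets='calexp',
    collections='2.2i/runs/DP0.1',
    where="band = 'g' AND skymap = 'DC2' AND tract = 4638 AND patch = 10",
)
for dataId in itertools.islice(dataIds, 5):
    print(dataId)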

If you restrict your coadd to only specific detectors within the camera (which is doable, if that is really what you want), you greatly decrease the depth of your coadd, and in fact may not get much more than one visit for any given area within the coadd. This is because these visits, again, can be arbitrarily aligned with the tract, patch, and there is no reason, per se, that a given detector will ever line up with the same detector in a subsequent visit. In general the better strategy is to NOT have the same detector always fall on the same part of the sky: this prevents imprinting detector features into your coadd, and ensures a varied sampling of the sky such that defects and the like can be rejected.

So the question is: when you want an image composed of only a few detectors, what do you mean by that? Do you want an image of a single visit where the detectors are plotted in focal-plane coordinates? Or do you want only a small area of the coadd?

Now, on to your initial question: running a specific task. There are three interfaces for this: using the command-line interface as above, directly using the Python task and handling all the I/O yourself, or using the basic Python pipetask interface. The latter is the one I recommend until you get more familiar with the system.

This interface is designed to be a simplified interface suitable for most tests, one-offs, and notebook use. It does not have all the options that the command-line interface has, and is never intended to; at some point in the future we will be delivering a more feature-rich Python API. That said, this interface is nice, takes care of a lot of the work for you, and is exactly what you want here.

The documentation for SimplePipelineExecutor can be found here. A good example of using it can be found in this community post. For this case you can ignore the quanta variable; after the run command has finished, you can load the resulting coadd from the butler (though it would be easier not to re-use the butler from the linked snippet; just re-instantiate it, pointing at the collection you used for the output). A sketch adapted to the makeWarp step in this thread follows.
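
Here is a minimal, untested sketch of driving the makeWarp step from a notebook with SimplePipelineExecutor, reusing the repo, collection, where clause, and output name from earlier in this thread. Treat the exact handling of the "#makeWarp" pipeline fragment as an assumption to verify against the linked documentation.

import getpass
import os

from lsst.ctrl.mpexec import SimplePipelineExecutor
from lsst.pipe.base import Pipeline

# Set up an output collection chained to the DP0.1 inputs.
butler = SimplePipelineExecutor.prep_butler(
    's3://butler-us-central1-dp01',
    inputs=['2.2i/runs/DP0.1'],
    output=f'u/{getpass.getuser()}/warps',
)

# Load just the makeWarp subset of the DRP pipeline (assumes the
# "path#label" fragment syntax works here as on the command line).
pipeline = Pipeline.fromFile(
    os.path.expandvars('$PIPE_TASKS_DIR/pipelines/DRP.yaml') + '#makeWarp'
)

executor = SimplePipelineExecutor.from_pipeline(
    pipeline,
    where="band = 'g' AND skymap = 'DC2' AND tract = 4638",
    butler=butler,
)

# register_dataset_types is needed the first time a new output type is written.
quanta = executor.run(register_dataset_types=True)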


Hi @kalamo – I know it’s been a long time, I just wanted to get back in touch and say I’m still working towards a full tutorial notebook that demonstrates how to create a custom coadd and then do source detection and measurement on it. It’s been unexpectedly challenging and is still in progress.

Hello again @kalamo, we now have a tutorial for creating custom coadds with the LSST Science Pipelines within a Jupyter Notebook.

All delegates should find new files in their notebooks/tutorial-notebooks/ directory, titled:

  • 09a_Custom_Coadd.ipynb
  • 09b_Custom_Coadd_Sources.ipynb

These are also available via the rubin-dp0 GitHub Org in the tutorial-notebooks repo (prod branch will match the directories in data.lsst.cloud user accounts).