Running MakeDiscreteSkyMapTask as part of a gen3 pipeline

Hi,

I would like to run MakeDiscreteSkyMapTask as part of a gen3 pipeline after creating calexp datasets. I am using the latest stack release. All calexps should be used to produce a single skymap. I understand there is a butler client tool for this, but I want to use it in a pipeline.

Doing this out of the box is presently not possible (I believe) because the task is not written / configured for gen3 pipeline use. I have therefore written the following code:

import lsst.pipe.base as pipeBase
import lsst.pipe.base.connectionTypes as cT
from lsst.pipe.tasks import makeDiscreteSkyMap as taskBase

DIMENSIONS = ("skymap",)
TEMPLATES = {"calexpType": ""}

class MakeDiscreteSkyMapConnections(pipeBase.PipelineTaskConnections,
                                    dimensions=DIMENSIONS,
                                    defaultTemplates=TEMPLATES):

    # Based on lsst.pipe.tasks.makeCoaddTempExp
    calExpList = cT.Input(
        doc="Input exposures to be covered by the output skyMap.",
        name="{calexpType}calexp",
        storageClass="ExposureF",
        dimensions=("instrument", "visit", "detector"),
        multiple=True,
        deferLoad=True,
    )

    # Based on lsst.skymap.baseSkyMap
    skyMap = cT.Output(
        name="skyMap",
        doc="The sky map divided into tracts and patches.",
        dimensions=["skymap"],
        storageClass="SkyMap",
    )


class MakeDiscreteSkyMapConfig(pipeBase.PipelineTaskConfig, taskBase.MakeDiscreteSkyMapConfig,
                               pipelineConnections=MakeDiscreteSkyMapConnections):
    pass


class MakeDiscreteSkyMapTask(taskBase.MakeDiscreteSkyMapTask):

    ConfigClass = MakeDiscreteSkyMapConfig

    def runQuantum(self, butlerQC, inputRefs, outputRefs):
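        """Collect the WCS and metadata of each input calexp, run the
        base task on them and store the resulting skyMap.
        """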
        inputs = butlerQC.get(inputRefs)

        # Organise inputs to what the base task needs
        wcs_md_tuple_list = []
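        for calexp in inputs["calExpList"]:
            # deferLoad=True means each input is a deferred handle, so
            # read only the components we need rather than the full exposure
            wcs = calexp.get(component="wcs")
            md = calexp.get(component="metadata")
            wcs_md_tuple_list.append((wcs, md))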

        # Run the task
        outputs = self.run(wcs_md_tuple_list)

        # Use butler to store the outputs
        butlerQC.put(outputs, outputRefs)

The simplified / problematic quantum graph looks like this:

I run the task in an existing butler repository:

pipetask run -b ${REPO} -t huntsman.drp.lsst.tasks.makeDiscreteSkyMap.MakeDiscreteSkyMapTask --input calexp/20210720T034253Z --output skymap --register-dataset-types

This results in the RuntimeError: QuantumGraph is empty error. I have read the FAQ about this but found no solution.

Running butler query-datasets ${REPO} --collections calexp/20210720T034253Z shows that the calexp datasets do exist.

I do however notice this warning:

SAWarning: SELECT statement has a cartesian product between FROM element(s) "skymap" and FROM element "physical_filter". Apply join condition(s) between each element to resolve.

I do not know how to resolve this issue. A workaround would be to split the pipeline into two parts and manually create the skyMap in between, but this is not ideal and should not be necessary.

Any pointers on this would be much appreciated! Apologies if this has already been asked, I have searched the community forum with no luck.

Have you tried running the butler make-discrete-skymap command? There was a bug in it but the most recent weekly fixed the problem. The most recent weekly no longer has the wcs_md_tuple_list parameter because I changed it to wcs_bbox (I had not realized that anyone was wanting to use the run method directly instead of using the command-line interfaces).

I think you’re going to have problems doing what you want. In Gen3, as I understand it, a skymap is not only a dataset defining tract/patch positions on the sky but also a dimension in the Registry that relates a skymap name to lists of tract and patch numeric identifiers. You are trying to write out the dataset, but its DataId cannot be determined because its value doesn’t yet exist in the Registry dimensions. (In addition, you nowhere seem to specify what the skymap name should be, which seems problematic.)

You may be able to determine the parameters of the DiscreteSkyMap in a pipeline, but you need to execute a non-pipeline Registry operation to use it (as butler make-discrete-skymap does).
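
To illustrate the point about the skymap being a Registry dimension as well as a dataset, here is a toy model in plain Python. It is only a sketch and not how the butler builds a QuantumGraph internally: `build_quanta`, the `(visit, detector)` tuples and the skymap name `"discrete"` are all made up for illustration. The assumption it encodes is that quanta for a task with `dimensions=("skymap",)` are built from the skymap values already present in the Registry, combined with the inputs. Because nothing relates a skymap to a visit or detector, that combination is a plain cartesian product (which may be what the SAWarning is hinting at), and a product with an empty set of skymaps is empty.

```python
from itertools import product


def build_quanta(visit_detectors, skymaps):
    """Toy stand-in for graph building: one quantum per registered skymap.

    Each quantum receives every (visit, detector) input, because no
    relation exists to restrict which inputs belong to which skymap.
    """
    quanta = {}
    for skymap, visit_detector in product(skymaps, visit_detectors):
        quanta.setdefault(skymap, []).append(visit_detector)
    return quanta


# Hypothetical (visit, detector) data IDs of existing calexp datasets.
calexps = [(1, 0), (1, 1), (2, 0)]

# No skymap has been registered yet: nothing to build a quantum for.
empty = build_quanta(calexps, skymaps=[])
print(len(empty))  # 0

# Once a skymap name exists in the Registry, a single quantum appears.
filled = build_quanta(calexps, skymaps=["discrete"])
print(filled)
```

In this picture the calexp datasets being present is not enough: the output data ID needs a skymap value that only exists after something outside the pipeline has registered it.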

Hi @timj , thanks for your reply. Yes, it seems that I hit a bug using that command:

(lsst-scipipe-0.6.0) [lsst@docker-desktop stack]$ butler make-discrete-skymap --collections calexp/20210720T051630Z /opt/lsst/software/stack/br lsst.obs.huntsman.HuntsmanCamera
numexpr.utils INFO: NumExpr defaulting to 4 threads.
Traceback (most recent call last):
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/daf_butler/21.0.0-112-g6e624863+02ffdaf10e/bin/butler", line 28, in <module>
    sys.exit(main())
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/daf_butler/21.0.0-112-g6e624863+02ffdaf10e/python/lsst/daf/butler/cli/butler.py", line 328, in main
    return cli()
  File "/opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-0.6.0/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-0.6.0/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-0.6.0/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-0.6.0/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-0.6.0/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/pipe_tasks/21.0.0-120-g57749b33+77c36da417/python/lsst/pipe/tasks/cli/cmd/commands.py", line 57, in make_discrete_skymap
    script.makeDiscreteSkyMap(*args, **kwargs)
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/pipe_tasks/21.0.0-120-g57749b33+77c36da417/python/lsst/pipe/tasks/script/makeDiscreteSkyMap.py", line 78, in makeDiscreteSkyMap
    wcs_md_tuple_list = [(butler.getDirect('calexp.metadata', ref), butler.getDirect('calexp.wcs', ref))
  File "/opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/pipe_tasks/21.0.0-120-g57749b33+77c36da417/python/lsst/pipe/tasks/script/makeDiscreteSkyMap.py", line 78, in <listcomp>
    wcs_md_tuple_list = [(butler.getDirect('calexp.metadata', ref), butler.getDirect('calexp.wcs', ref))
TypeError: getDirect() takes 2 positional arguments but 3 were given

Which looks like the bug you just fixed.

Hi @ktl , thanks for your reply.

Yes, I had suspected the dataId determination was part of the problem. I could use the butler CLI version, but this seems somewhat contrived / against the ethos of gen3 pipelines.

In the ideal case one would have a single pipeline to go from raw exposures to coadds, but from what you say, that is not currently possible if the skymap is not already present (which it is not in my case).

Would your advice be to split the full pipeline into two parts (with the first producing calexp datasets and the second making coadds from them) and then to run butler make-discrete-skymap in the middle?

Unfortunately, one of the prices to pay for Gen3’s power and simplifications is the fixed dimension Registry system. I can’t see how anything that could affect the QuantumGraph structure can be a data-dependent function of the inputs to that very graph.

I suspect that’s your only alternative for now (aside from using a non-Discrete skymap or pre-determining the Discrete one).
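
For reference, the split workflow discussed above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the helper `staged_commands`, the pipeline files `calexp.yaml` and `coadd.yaml`, and all collection names are hypothetical. The `butler make-discrete-skymap` arguments follow the invocation shown earlier in this thread, and `-p` is assumed to be the `pipetask run` option for a pipeline definition file. Whether the collection holding the skymap dataset must be listed in `--input` for the second stage is also an assumption, so it is left as a parameter. The script only builds and prints the commands; it does not execute them.

```python
import shlex


def staged_commands(repo, instrument, raw_collection, calexp_collection,
                    skymap_collection, coadd_collection,
                    calexp_pipeline="calexp.yaml", coadd_pipeline="coadd.yaml"):
    """Return the three command lines of the split workflow, in order."""
    return [
        # Stage 1: raw exposures -> calexp datasets.
        ["pipetask", "run", "-b", repo, "-p", calexp_pipeline,
         "--input", raw_collection, "--output", calexp_collection,
         "--register-dataset-types"],
        # In between: register the discrete skymap outside of any pipeline.
        ["butler", "make-discrete-skymap",
         "--collections", calexp_collection, repo, instrument],
        # Stage 2: calexp datasets plus the new skymap -> coadds.
        ["pipetask", "run", "-b", repo, "-p", coadd_pipeline,
         "--input", f"{calexp_collection},{skymap_collection}",
         "--output", coadd_collection, "--register-dataset-types"],
    ]


commands = staged_commands(
    repo="/path/to/repo",                       # hypothetical repository path
    instrument="lsst.obs.huntsman.HuntsmanCamera",
    raw_collection="raw",                       # hypothetical collection names
    calexp_collection="calexp",
    skymap_collection="skymaps",
    coadd_collection="coadd",
)
for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` so that a failure in one stage stops the following ones.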

O.K., thank you for the clarification!

Yes, use weekly 29 or newer.

Yes, since we are mostly doing large area surveys an all sky skymap is what we use most of the time.