Change to visit definition for LSSTCam and LATISS

dm-dev

(Tim Jenness) #1

For the next weekly there will be a change in the definition of visit used for LATISS (and later LSSTCam and ComCam).

Previously we have been treating visit as the same thing as the exposure. This is fine if each visit contains only one exposure, but it doesn’t work in the general case where you take multiple exposures that you want to combine. We now report four distinct quantities in the metadata translator:

  • observation_id: a unique string associated with this exposure, read directly from the OBSID header.
  • exposure_group: a string that is the same for every exposure that should be treated as a single processing entity. It is the value of the GROUPID header, is generally set by the script queue, and can be modified in a controlled way by an observing script that wants to separate exposures into distinct groups. This is the new way a visit is defined during observing.
  • exposure_id: an integer derived from observation_id. For LSSTCam and LATISS it will be a combination of the day of observation (YYYYMMDD) and a zero-padded sequence number.
  • visit_id: an integer derived from exposure_group. You won’t be able to predict it easily from the GROUPID header (especially if that header has an unexpected form).
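To make the exposure_id description concrete, here is a small sketch of how an integer could be packed from the day of observation and a zero-padded sequence number. The exact packing used by the metadata translator may differ; treat the function name and the five-digit width as illustrative assumptions.

```python
def make_exposure_id(day_obs: str, seq_num: int) -> int:
    """Hypothetical packing: YYYYMMDD followed by a zero-padded counter.

    day_obs is an ISO date string, e.g. "2020-01-31"; seq_num is the
    per-day sequence number. The five-digit width is an assumption.
    """
    day = int(day_obs.replace("-", ""))  # "2020-01-31" -> 20200131
    if not 0 <= seq_num < 100000:
        raise ValueError(f"sequence number out of range: {seq_num}")
    return day * 100000 + seq_num

print(make_exposure_id("2020-01-31", 307))  # 2020013100307
```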

For the gen2 butler these will be in the registry as obsid, expId, expGroup and visit. If you have previously been writing code that used visit, changing it to expId will be enough. If you are listening to events from the summit observing system, you should be able to use the obsid directly rather than trying to work out how to generate expId from obsid.
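For code being updated, the change amounts to a key rename in the gen2 dataId. A minimal helper like the following (the function name is my own, not part of the stack) captures it:

```python
def migrate_data_id(data_id: dict) -> dict:
    """Rename the gen2 'visit' key to 'expId', leaving other keys intact.

    Illustrative only: existing code can equally just change the key in
    place where the dataId is constructed.
    """
    return {("expId" if key == "visit" else key): value
            for key, value in data_id.items()}

old_data_id = {"visit": 2020013100307, "detector": 0, "filter": "u"}
new_data_id = migrate_data_id(old_data_id)
print(new_data_id)  # {'expId': 2020013100307, 'detector': 0, 'filter': 'u'}
```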

For gen3 butler exposure.group will start to be populated as the visit string. The plan is to generate visit tables from this value but we are intending to support flexible visit definitions in gen3 so that we can easily handle deep drilling fields and calibrations without being locked into a single definition that was forced on us by the observing system.
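As an illustration of generating visit groupings from exposure.group, a simple default definition could collect every exposure sharing a group string into one visit (the values below are invented for the example):

```python
# Sketch of the default "one visit per exposure group" definition.
# Group strings and exposure IDs here are made up.
from collections import defaultdict

exposures = [
    {"exposure_id": 2020013100307, "exposure_group": "2020-01-31T05:12:00.123"},
    {"exposure_id": 2020013100308, "exposure_group": "2020-01-31T05:12:00.123"},
    {"exposure_id": 2020013100309, "exposure_group": "2020-01-31T05:13:40.456"},
]

visits = defaultdict(list)
for exp in exposures:
    visits[exp["exposure_group"]].append(exp["exposure_id"])

for group, members in visits.items():
    print(group, members)
```

A more flexible definition (for deep drilling fields or calibrations) would just swap in a different grouping key or predicate.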


(Robert Lupton) #2

Is the visit definition going to change for instruments that take simple single exposures (e.g. HSC)? If so, I think we need a new name and keep visit as an alias with the old meaning when available.

Also, the project defined the tuple (Telescope, Controller, DayObs, SeqNum) as the unit of data; isn’t that the canonical form that people should be using? I realise that this may be encoded in observation_id, but the components are fundamental. If it’s managed as a string internally I don’t care.


(Tim Jenness) #3

This post was entirely about LSSTCam and LATISS. There has been no change to HSC where visit_id and exposure_id are identical.

You can use the gen2 butler like that if you wish (although gen2 never supports more than one instrument per registry). obsid is more useful if you are reading the events coming from the camera.

Telescope code (not telescope name), controller, day obs and seqnum are not generic concepts, and at present we haven’t worked out how to make gen3 support them for LSSTCam/LATISS without confusing HSC etc.


(Robert Lupton) #4

I don’t understand, “obsid is more useful if you are reading the events coming from the camera”. You mean non-image data? If the tuple doesn’t work for these events, we need to fix that.


(Tim Jenness) #5

I mean all camera image data. The “image name” issued by the camera is the obsid. That’s the only thing the camera issues to tell us what data were written. I don’t see a problem with this, and it’s much easier to grab one string and pass it straight to a dataId.
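For readers who do want the tuple, the components can be unpacked from the string. A sketch, assuming the `AT_O_20200131_000307` obsid form shown in the registry-update script later in this thread:

```python
def parse_obsid(obsid: str):
    """Split an obsid like 'AT_O_20200131_000307' into
    (telescope code, controller, day of observation, sequence number).

    Assumes the four underscore-separated fields shown in this thread;
    other instruments or controllers may use different forms.
    """
    telescope, controller, day_obs, seq = obsid.split("_")
    return telescope, controller, day_obs, int(seq)

print(parse_obsid("AT_O_20200131_000307"))  # ('AT', 'O', '20200131', 307)
```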


(Robert Lupton) #6

I disagree on this one. The usual pattern is something like,

dataId = dict(telescope='LATISS', controller='O', dayObs='2020-01-31')

“Which exposure was that?” “307” “OK”

dataId["seqNum"] = 307
raw = butler.get('raw', dataId)   # or pass seqNum=307 as a kwarg

reusing the same base dataId through the night. But others may use your proposed pattern.


(Tim Jenness) #7

You can do that also, but if you are listening to events from the camera you don’t know the sequence number, you know the image name / obsid.


(K-T Lim) #8

The distinction here is between an observing script (submitted to the Script Queue) that only deals with events (and telemetry) versus an analysis script that has access to decoded and other information. The former case, which includes automated processing by the proposed OCS-Controlled Pipeline System, most simply refers to exposures by the image name/obsid returned by the camera as a string in its events.


(Lauren MacArthur) #9

If you have been previously writing code that used visit, simply changing that to expId will be enough.

So what am I doing wrong:
Using w_2020_04:

In [1]: import lsst.daf.persistence as dafPersist                                                             
In [2]: butlerDir = "/datasets/DC2/repo/rerun/w_2019_50/DM-22665/multi" 
   ...: butler = dafPersist.Butler(butlerDir)                                                                 
CameraMapper INFO: Loading exposure registry from /datasets/DC2/repo/registry.sqlite3
CameraMapper INFO: Loading calib registry from /datasets/DC2/repo/CALIB/calibRegistry.sqlite3
CameraMapper INFO: Loading calib registry from /datasets/DC2/repo/CALIB/calibRegistry.sqlite3
LsstCamMapper WARN: Unable to find valid calib root directory
LsstCamMapper WARN: Unable to find valid calib root directory
In [3]: visitDataId =  {"visit": 179972, "detector": 7, "filter": "u"}                                        
In [4]: exp = butler.get("calexp", visitDataId)                                                               
In [5]: exp                                                                                                   
Out[5]: <lsst.afw.image.exposure.exposure.ExposureF at 0x7f5324aab618>

but using w_2020_05 (note “visit” -> “expId”):

In [1]: import lsst.daf.persistence as dafPersist                                                             
In [2]: butlerDir = "/datasets/DC2/repo/rerun/w_2019_50/DM-22665/multi"   
   ...: butler = dafPersist.Butler(butlerDir)                                                                 
CameraMapper INFO: Loading exposure registry from /datasets/DC2/repo/registry.sqlite3
CameraMapper INFO: Loading calib registry from /datasets/DC2/repo/CALIB/calibRegistry.sqlite3
CameraMapper INFO: Loading calib registry from /datasets/DC2/repo/CALIB/calibRegistry.sqlite3
LsstCamMapper WARN: Unable to find valid calib root directory
LsstCamMapper WARN: Unable to find valid calib root directory
In [3]: visitDataId =  {"expId": 179972, "detector": 7, "filter": "u"}                                        
In [4]: exp = butler.get("calexp", visitDataId) 
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
<ipython-input-4-7192716177fa> in <module>
----> 1 exp = butler.get("calexp", visitDataId)

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_persistence/19.0.0-1-g6fe20d0+5/python/lsst/daf/persistence/butler.py in get(self, datasetType, dataId, immediate, **rest)
   1372         dataId.update(**rest)
   1373 
-> 1374         location = self._locate(datasetType, dataId, write=False)
   1375         if location is None:
   1376             raise NoResults("No locations for get:", datasetType, dataId)

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_persistence/19.0.0-1-g6fe20d0+5/python/lsst/daf/persistence/butler.py in _locate(self, datasetType, dataId, write)
   1291             components = components[1:]
   1292             try:
-> 1293                 location = repoData.repo.map(datasetType, dataId, write=write)
   1294             except NoResults:
   1295                 continue

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_persistence/19.0.0-1-g6fe20d0+5/python/lsst/daf/persistence/repository.py in map(self, *args, **kwargs)
    237         if self._mapper is None:
    238             raise RuntimeError("No mapper assigned to Repository")
--> 239         loc = self._mapper.map(*args, **kwargs)
    240         if not loc:
    241             return None

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_persistence/19.0.0-1-g6fe20d0+5/python/lsst/daf/persistence/mapper.py in map(self, datasetType, dataId, write)
    161         """
    162         func = getattr(self, 'map_' + datasetType)
--> 163         return func(self.validate(dataId), write)
    164 
    165     def canStandardize(self, datasetType):

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/obs_base/19.0.0-20-g6de566f+1/python/lsst/obs/base/cameraMapper.py in mapClosure(dataId, write, mapper, mapping)
    383                     if not hasattr(self, "map_" + datasetType):
    384                         def mapClosure(dataId, write=False, mapper=weakref.proxy(self), mapping=mapping):
--> 385                             return mapping.map(mapper, dataId, write)
    386                         setattr(self, "map_" + datasetType, mapClosure)
    387                     if not hasattr(self, "query_" + datasetType):

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/obs_base/19.0.0-20-g6de566f+1/python/lsst/obs/base/mapping.py in map(self, mapper, dataId, write)
    150             Location of object that was mapped.
    151         """
--> 152         actualId = self.need(iter(self.keyDict.keys()), dataId)
    153         usedDataId = {key: actualId[key] for key in self.keyDict.keys()}
    154         path = mapper._mapActualToPath(self.template, actualId)

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/obs_base/19.0.0-20-g6de566f+1/python/lsst/obs/base/mapping.py in need(self, properties, dataId)
    314             return newId
    315 
--> 316         lookups = self.lookup(newProps, newId)
    317         if len(lookups) != 1:
    318             raise NoResults("No unique lookup for %s from %s: %d matches" %

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/obs_base/19.0.0-20-g6de566f+1/python/lsst/obs/base/mapping.py in lookup(self, properties, dataId)
    259                 # here we transform that to {(lowKey, highKey): value}
    260                 lookupDataId[(self.range[1], self.range[2])] = dataId[self.obsTimeName]
--> 261             result = self.registry.lookup(properties, self.tables, lookupDataId, template=self.template)
    262         if not removed:
    263             return result

/software/lsstsw/stack_20191101/stack/miniconda3-4.5.12-4d7b902/Linux64/daf_persistence/19.0.0-1-g6fe20d0+5/python/lsst/daf/persistence/registries.py in lookup(self, lookupProperties, reference, dataId, **kwargs)
    363             cmd += " WHERE " + " AND ".join(whereList)
    364         cursor = self.conn.cursor()
--> 365         cursor.execute(cmd, valueList)
    366         return [row for row in cursor.fetchall()]
    367 

OperationalError: no such column: expId

I should also note that I get the same error even if I use “visit” as the key string in the above.


(K-T Lim) #10

The change of definition applies to data that has been ingested with the new Stack version. Existing Gen2 registries do not contain the necessary columns in the metadata tables. You should continue to use the same code with those until/unless they have registries updated via either a (faster) manual update or (safer) reingestion of all data. (There may be an issue with pathname templates when using the new obs_lsst with old registries, in which case you should fall back to an older obs_lsst version until the registry has been updated.)

Scripts for update of pre-w_2020_05 DC2 and LATISS registries differ slightly, as may those for other obs_lsst cameras. These scripts should not be applied to registries created using w_2020_05 or later; those registries already contain the necessary columns.

DC2 registry update:

begin;
alter table raw add column controller text;
alter table raw add column obsid text;
alter table raw add column expGroup text;
alter table raw add column expId int;
update raw set controller = 'S', obsid = visit, expGroup = cast(visit as text), expId = visit;
create unique index u_raw on raw (expId, detector, visit);
commit;

LATISS registry update:

begin;
alter table raw add column controller text;
alter table raw add column obsid text;
alter table raw add column expGroup text;
alter table raw add column expId int;
update raw set controller = 'O', obsid = 'AT_O_' || replace(dayObs, '-', '') || '_' || substr(cast(1000000+seqNum as text), 2), expGroup = cast(visit as text), expId = visit;
create unique index u_raw on raw (expId, visit);
commit;
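The obsid expression in the LATISS script can be sanity-checked in memory with Python’s sqlite3 module; the table below is a stand-in containing only the columns the expression touches:

```python
# Quick in-memory check of the LATISS obsid expression from the
# registry-update script above, using a minimal stand-in table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table raw (dayObs text, seqNum int, visit int)")
conn.execute("insert into raw values ('2020-01-31', 307, 2020013100307)")
obsid = conn.execute(
    "select 'AT_O_' || replace(dayObs, '-', '') || '_' || "
    "substr(cast(1000000 + seqNum as text), 2) from raw"
).fetchone()[0]
print(obsid)  # AT_O_20200131_000307
```

The `substr(cast(1000000 + seqNum as text), 2)` trick zero-pads the sequence number to six digits without a printf-style format.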

(Lauren MacArthur) #11

You should continue to use the same code with those until/unless they have registries updated via either a (faster) manual update or (safer) reingestion of all data. (There may be an issue with pathname templates when using the new obs_lsst with old registries, in which case you should fall back to an older obs_lsst version until the registry has been updated.)

Yeah…since the “same code” (i.e. using “visit” as the key) doesn’t work with w_2020_05, I’m stuck having to set up an old obs_lsst for all pre-w_2020_05 processing of DC2 data?


(K-T Lim) #12

If you update the registry, then I think both old visit-based code and new expId-based code should work with either old or new obs_lsst for butler.get(). If you update the registry, you should only use w_2020_05 or later for ingestion.

If you do not update the registry, then I think you’re stuck using pre-w_2020_05 obs_lsst (and visit-based code) for butler.get().

(I have confirmed this behavior in testing on a duplicate LATISS repository.)


(Lauren MacArthur) #13

Ok, I’ll ask the “owners” of those data repositories to update the registries. Thanks again!


(Krzysztof Findeisen) #14

Just to be clear, will processed data (i.e., outputs of ISR + inputs/outputs of later PipelineTasks) still be parametrized by visit, not exposure, in Gen 3?


(Tim Jenness) #15

It all depends on how you want to use it. It makes sense for ISR to function on a per-exposure basis, since ISR doesn’t care about visits. On the other hand, some processing does want to work at the visit level (e.g. combining the snaps prior to template subtraction). You have to decide what your PipelineTask is designed to work on. This change is going to force people to consider the difference between exposure and visit.


(K-T Lim) #16

Note that AP and DRP will have access to the sequence of GROUPIDs/exposure_groups/visit_ids emitted by the Script Queue for standard and alternate science visits.