Problem with megacam dataId

This is the continuation of a discussion started on HipChat.

I have run on some CFHT images and correctly produced files in output/coadd_dir/forced, but I get an error when I try to access the produced catalogs through the butler. For instance, if I do the following:

dataId = {'visit':visit, 'ccd':ccd, 'tract':0, 'filter':filter}
forced = butler.get("forced_src", dataId=dataId)

I get:

/home/boutigny/LSST/new/lsstsw/stack/Linux64/daf_persistence/12.0-1-gc553c11+1/python/lsst/daf/persistence/registries.pyc in lookup(self, lookupProperties, reference, dataId, **kwargs)
315 valueList.append(v)
316 cmd += " WHERE " + " AND ".join(whereList)
--> 317 c = self.conn.execute(cmd, valueList)
318 result = []
319 for row in c:

OperationalError: no such column: tract

The forced_src entry in the MegacamMapper.paf file looks OK:

forced_src: {
template: "forced/%(runId)s/%(object)s/%(date)s/%(filter)s/%(tract)d/FORCEDSRC-%(visit)d-%(ccd)02d.fits"
python: "lsst.afw.table.SourceCatalog"
persistable: "ignored"
storage: "FitsCatalogStorage"
tables: raw
tables: raw_visit
}

but apparently the butler is unable to reconstruct the full dataId from the partial list of provided keywords (visit, ccd, filter, and tract), even though they are sufficient to identify the dataset.

The problem seems to be related to the mixture between ccd-like keywords (visit, ccd, filter) and the coadd-like keyword (tract).

Following @hsinfang suggestion I tried to provide the complete dataId:

dataId = {'runId':'08BO01', 'object':'SCL-2241_P1','date':'2008-09-02', 'visit':1022064, 'ccd':25, 'tract':0, 'filter':'u'}

and it worked without problem.

This is very annoying, as one of the functionalities of the butler is to determine the missing keywords automatically. Doing that by hand is not practical.

I have the feeling that it may be possible to implement a trick to force the system to consider “tract” as a valid keyword for forced_src, but I don’t know how to do this in practice…

That’s right. The keyword “tract” isn’t in the registry, and so it breaks. Maybe we should strip out “tract” and “patch” from queries, and reserve them solely for coadd-like data. What do you think, @ktl and @natepease?

But “tract” is not optional for forced_src. If we have several tracts, there should be as many forced_src datasets.

I think @price is proposing stripping “tract” out somewhere in the butler implementation, not removing it from the forced_src dataset. I think this will require some kind of fix to the butler.
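A fix along those lines might, very roughly, mean stripping coadd-only keys out of the lookup before the registry SQL is built. The sketch below is purely illustrative — none of these names come from the actual daf_persistence internals:

```python
# Hypothetical sketch: drop coadd-only keys before querying the raw registry.
# Names are illustrative, not the real daf_persistence implementation.
COADD_ONLY_KEYS = {"tract", "patch"}

def strip_coadd_keys(lookup_properties, data_id):
    """Restrict a registry lookup to columns the raw registry actually has."""
    props = [p for p in lookup_properties if p not in COADD_ONLY_KEYS]
    filtered = {k: v for k, v in data_id.items() if k not in COADD_ONLY_KEYS}
    return props, filtered

props, regId = strip_coadd_keys(
    ["runId", "object", "date", "tract"],
    {"visit": 1022064, "ccd": 25, "filter": "u", "tract": 0})
# props and regId no longer mention "tract", so the SQL query would succeed;
# "tract" would then be re-applied when filling out the path template.
```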

As a possibly simpler workaround in the meantime, I believe you don’t have to specify a complete data ID to avoid the sort of registry lookups that are causing problems here; you just have to include all of the data ID keys that are needed to fill out the template. Of course, that will still probably require looking up some keys you shouldn’t have to.

Yes, the subset of keys present in the template is enough to make it work.
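For reference, the keys a given dataset requires can be read straight off its path template with a small regex (a generic sketch, not a butler API; the template is the forced_src one quoted above):

```python
import re

# The forced_src template from MegacamMapper.paf, quoted earlier in the thread.
TEMPLATE = ("forced/%(runId)s/%(object)s/%(date)s/%(filter)s/"
            "%(tract)d/FORCEDSRC-%(visit)d-%(ccd)02d.fits")

def template_keys(template):
    """Return the dataId keys referenced by a %-style path template."""
    return sorted(set(re.findall(r"%\((\w+)\)", template)))

print(template_keys(TEMPLATE))
# ['ccd', 'date', 'filter', 'object', 'runId', 'tract', 'visit']
```

A dataId containing exactly those keys (as in the complete dataId above) is enough to fill out the path without a registry lookup.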

Here’s another workaround (as I suggested here):

dataRefList = list(butler.subset("raw", visit=visit, ccd=ccd))
assert(len(dataRefList) == 1)
dataRef = dataRefList.pop()
forced = dataRef.get("forced_src", tract=tract, immediate=True)

The idea is that you’re populating the dataId using the raw product, and using that to get the forced_src.
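That pattern can be wrapped in a small helper (a sketch built only from the butler calls already shown in this thread; the function name is mine):

```python
def get_forced_src(butler, visit, ccd, tract):
    """Fetch forced_src by completing the dataId from the matching raw dataRef."""
    # butler.subset finds the raw dataRef(s) matching the partial dataId.
    refs = list(butler.subset("raw", visit=visit, ccd=ccd))
    if len(refs) != 1:
        raise RuntimeError("Expected exactly one raw dataRef, got %d" % len(refs))
    # The raw dataRef carries the full dataId; add tract on top of it.
    return refs[0].get("forced_src", tract=tract, immediate=True)
```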


I have a question related to this discussion. Given a butler and a catalog name (‘forced_src’ or ‘deepCoadd_forced_src’, for instance), I would like to get the full list of available dataIds. The idea would be to loop over this list instead of knowing the exact set of dataIds in advance. Is there a general way to get this list that does not depend on the input catalog?

I found a way to do it for the ‘forced_src’ catalog, but I cannot do it for the others using the same trick:

catalog = "forced_src"
keys = butler.getKeys(catalog)
keys.pop('tract')  # fails otherwise: "tract" is not in the registry
keys = list(keys)
dataids = [dict(zip(keys, values), tract=0)
           for values in butler.queryMetadata(catalog, format=keys)]
dataids = [dataid for dataid in dataids if butler.datasetExists(catalog, dataId=dataid)]


The butler (actually, the mapper which the butler uses) is often backed by a “registry” of metadata for the raw data, which allows us to quickly identify CCDs to be processed. However, there is (currently) no registry of coadd data, so it’s not possible to identify tracts and patches to be processed through the butler. What you can do instead is retrieve the deepCoadd_skyMap and iterate over that:

skyMap = butler.get("deepCoadd_skyMap")
dataIds = (dict(tract=tract.getId(), patch="%d,%d" % patch.getIndex(), filter="HSC-I") for tract in skyMap for patch in tract)
dataIds = [dataId for dataId in dataIds if butler.datasetExists("deepCoadd_forced_src", dataId)]

Thanks for your help! That helped a lot.


post deleted; @jbosch was not reading the code he was commenting on correctly

To close the loop, this issue was fixed in DM-8230.