I’ve just merged DM-27153 to master, which means it should appear in the upcoming w_2021_03 release. This includes some big changes to the Butler and Registry.
First off, Registry is now aware of the collections and run arguments passed to Butler at construction, so you no longer need to pass those again when using the Registry directly.
For example, here’s what was necessary before:
butler = Butler(root, collections=["a", "b", "c"])
refs = butler.registry.queryDatasets("calexp", collections=["a", "b", "c"])
but now the second line can just be:
refs = butler.registry.queryDatasets("calexp")
Passing collections (or run) to Registry is still valid everywhere it was valid before, and it will override what was passed at construction. It’s still necessary in those places if you don’t pass collections or run at construction.
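The precedence rule above (a per-call argument wins, otherwise the construction-time default is used, otherwise the argument is required) can be illustrated with a small pure-Python model. ToyRegistry, resolve_collections and resolve_run are hypothetical names invented for this sketch; it does not reproduce the real lsst.daf.butler Registry internals, and the TypeError it raises is only an assumption for illustration.

```python
from typing import Optional, Sequence, Tuple


class ToyRegistry:
    """Hypothetical model of construction-time defaults; not the real
    lsst.daf.butler Registry."""

    def __init__(self, collections: Optional[Sequence[str]] = None,
                 run: Optional[str] = None) -> None:
        # Defaults are captured once, when the object is constructed.
        self._default_collections = (
            tuple(collections) if collections is not None else None
        )
        self._default_run = run

    def resolve_collections(
        self, collections: Optional[Sequence[str]] = None
    ) -> Tuple[str, ...]:
        # An argument passed to the method overrides the default.
        if collections is not None:
            return tuple(collections)
        if self._default_collections is not None:
            return self._default_collections
        # No default was given at construction, so the argument is required.
        raise TypeError("no collections passed and no default collections set")

    def resolve_run(self, run: Optional[str] = None) -> str:
        # Same precedence rule for the output run.
        if run is not None:
            return run
        if self._default_run is not None:
            return self._default_run
        raise TypeError("no run passed and no default run set")


# Mirrors Butler(root, collections=["a", "b", "c"]) forwarding its defaults.
registry = ToyRegistry(collections=["a", "b", "c"])
print(registry.resolve_collections())       # ('a', 'b', 'c')
print(registry.resolve_collections(["d"]))  # ('d',)
```

Keeping the defaults on the registry object, rather than re-reading them from every caller, is what lets a call like queryDatasets("calexp") work without repeating the collections.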
The other big improvement is that Butlers and their associated Registry objects can now also be initialized with default values for certain dimensions: specifically, the “governor” dimensions, instrument and skymap. This can be done explicitly, via keyword arguments:
butler = Butler(root, instrument="HSC", collections=["HSC/defaults"])
raw = butler.get("raw", exposure=903334, detector=16) # no instrument necessary here!
or implicitly, if the collections the Butler is initialized with contain datasets that have exactly one value for instrument or skymap, in which case the first line can actually just be (for the usual definition of this particular collection, that is):
butler = Butler(root, collections=["HSC/defaults"])
This should usually Just Work (for Registry methods, too), but it’s important to understand that the inference is based on the collections given at construction: if you initialize a Butler with no collections, you won’t get any defaults for those values, even if the data repository only has one instrument or skymap overall. That may be another case we can handle in the future, but right now it’s not obvious that the additional cost to butler initialization (more cleverness -> more queries) is worth it.
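To make the inference rule concrete, here is a toy sketch: a governor value becomes a default only when exactly one value appears among the datasets in the construction-time collections, so passing no collections yields no defaults. The in-memory repo dictionary, the "DECam/raw" collection and its contents, and the function names are all hypothetical; the real Butler answers this with registry queries, not a Python loop.

```python
from typing import Dict, Hashable, Iterable, List, Mapping

# The two "governor" dimensions named in the post.
GOVERNORS = ("instrument", "skymap")

DataId = Dict[str, Hashable]


def infer_governor_defaults(
    repo: Mapping[str, List[DataId]], collections: Iterable[str]
) -> Dict[str, Hashable]:
    """Return a default for each governor that has exactly one value among
    the datasets in the given collections (toy model, not the real query)."""
    seen: Dict[str, set] = {g: set() for g in GOVERNORS}
    for name in collections:
        for data_id in repo.get(name, []):
            for g in GOVERNORS:
                if g in data_id:
                    seen[g].add(data_id[g])
    # Zero values (e.g. no collections) or several values: no default.
    return {g: next(iter(vals)) for g, vals in seen.items() if len(vals) == 1}


def complete_data_id(data_id: DataId, defaults: Mapping[str, Hashable]) -> DataId:
    """Fill in missing governor values; keys given explicitly always win."""
    return {**defaults, **data_id}


# Hypothetical repository contents, keyed by collection name.
repo = {
    "HSC/defaults": [{"instrument": "HSC", "exposure": 903334, "detector": 16}],
    "DECam/raw": [{"instrument": "DECam", "exposure": 1, "detector": 1}],
}

defaults = infer_governor_defaults(repo, ["HSC/defaults"])
print(defaults)  # {'instrument': 'HSC'}
print(complete_data_id({"exposure": 903334, "detector": 16}, defaults))
# {'instrument': 'HSC', 'exposure': 903334, 'detector': 16}
```

Calling infer_governor_defaults(repo, []) returns an empty dict even though the toy repository holds data for only two instruments, which mirrors the "no collections, no defaults" caveat.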
These improvements come with some other changes that we’re hoping no one would otherwise notice:
- The tags and chains arguments to Butler's constructor have been removed; they were a little-used (i.e. “only in unit tests”) piece of too-magical functionality that wasn’t worth the complexity it added to the implementation. Instead, it’s now guaranteed that you can create a Butler with a TAGGED or CHAINED collection that does not yet exist, and then use butler.registry.registerCollection to create it, as long as you don’t attempt to query for anything first. There is no longer any way to automatically add new datasets to TAGGED collections in Butler.put or Butler.ingest; instead you can just do that manually with butler.registry.associate.
- It is no longer possible to set butler.collections or butler.run after construction, at least not directly. This was never actually intended to be supported; the fact that it worked at all was a temporarily lucky combination of Python’s permissiveness and a simple implementation (and I’m not sure it would always have worked, though it often did). Now, attempting to set either of these directly will raise a clear exception (they’re read-only properties), but it is possible to set them both via a statement like:
  butler.registry.defaults = lsst.daf.butler.registry.RegistryDefaults(...)
  This can be used to set default instrument and skymap values as well, and it will [re-]infer them from collection contents just like Butler construction does if they are not given.