Updates to Gen3 Butler construction and defaulting

I’ve just merged DM-27153 to master, which means it should appear in the upcoming w_2021_03 release. This includes some big changes to the Butler and Registry.

First off, Registry is now aware of the collections and run arguments passed to Butler at construction, so you no longer need to pass those again when using the Registry directly.
For example, here’s what was necessary before:

from lsst.daf.butler import Butler

butler = Butler(root, collections=["a", "b", "c"])  # root: path to the data repository
refs = butler.registry.queryDatasets("calexp", collections=["a", "b", "c"])

but now the second line can just be:

refs = butler.registry.queryDatasets("calexp")

Passing collections (or run) to Registry is still valid everywhere it was valid before, and it will override what was passed at construction. It’s still necessary in those places if you don’t pass collections or run at construction.
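To make the precedence rule concrete, here is a minimal stand-alone sketch. It is a toy model rather than the actual daf_butler code: the function name `resolve_collections` and the choice of `TypeError` are assumptions made only for illustration.

```python
from typing import List, Optional, Sequence


def resolve_collections(
    explicit: Optional[Sequence[str]],
    defaults: Optional[Sequence[str]],
) -> List[str]:
    """Toy model: an explicit argument wins, otherwise fall back to defaults."""
    if explicit is not None:
        # Collections passed to the Registry method override construction-time ones.
        return list(explicit)
    if defaults is not None:
        # Nothing passed to the method: use what was given at construction.
        return list(defaults)
    # Neither source provided collections, so the caller must supply them.
    raise TypeError("No collections passed and no defaults set at construction.")


print(resolve_collections(None, ["a", "b", "c"]))
print(resolve_collections(["d"], ["a", "b", "c"]))
```

The first call models `queryDatasets("calexp")` on a Butler built with collections; the second models passing `collections=` explicitly to override them.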

The other big improvement is that Butlers and their associated Registry objects can now also be initialized with default values for certain dimensions - specifically, the “governor” dimensions, instrument and skymap. This can be done explicitly, via keyword arguments:

butler = Butler(root, instrument="HSC", collections=["HSC/defaults"])
raw = butler.get("raw", exposure=903334, detector=16)  # no instrument necessary here!

or implicitly: if the datasets in the collections the Butler is initialized with have exactly one value for instrument or skymap, that value becomes the default. So, for the usual definition of this particular collection, the first line can actually just be:

butler = Butler(root, collections=["HSC/defaults"])

This should usually Just Work (for Registry methods, too), but it’s important to understand that the inference is based on the collections given at construction: if you initialize a Butler with no collections, you won’t get any defaulting of those values, even if the data repository only has one instrument or skymap overall. That may be another case we can handle in the future, but right now it’s not obvious that the additional cost to butler initialization (more cleverness -> more queries) is worth it.
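The inference rule can be sketched roughly as follows. This is a simplified illustration, not how daf_butler implements it: `infer_governor_default` and the `values_by_collection` mapping are hypothetical, and the example data (including the second instrument name) is made up.

```python
from typing import Dict, Iterable, Optional, Set


def infer_governor_default(
    values_by_collection: Dict[str, Set[str]],
    collections: Iterable[str],
) -> Optional[str]:
    """Toy model: return a default only if the collections agree on one value."""
    seen: Set[str] = set()
    for name in collections:
        # Gather every governor value (e.g. instrument) used in this collection.
        seen |= values_by_collection.get(name, set())
    # Exactly one distinct value means it is unambiguous and can be the default.
    return next(iter(seen)) if len(seen) == 1 else None


# Hypothetical summary of which instruments appear in which collections.
summary = {"HSC/defaults": {"HSC"}, "mixed": {"HSC", "LSSTCam"}}

print(infer_governor_default(summary, ["HSC/defaults"]))
print(infer_governor_default(summary, ["mixed"]))
print(infer_governor_default(summary, []))
```

Note the last call: with no collections there is nothing to infer from, so no default is produced, regardless of what the repository as a whole contains.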

These improvements come with some other changes that we’re hoping no one would otherwise notice:

  • The tags and chains arguments to Butler's constructor have been removed; they were a little-used (i.e. “only in unit tests”) piece of too-magical functionality that wasn’t worth the complexity it added to the implementation. Instead, it’s now guaranteed that you can create a Butler with a TAGGED or CHAINED collection that does not yet exist, and then use butler.registry.registerCollection to create it, as long as you don’t attempt to query for anything first. There is no longer any way to automatically add new datasets to TAGGED collections in Butler.put or Butler.ingest; instead you can just do that manually with butler.registry.associate.

  • It is no longer possible to set butler.collections or butler.run after construction, at least not directly. This actually wasn’t ever intended to be supported - the fact that it worked at all was a temporarily lucky combination of Python’s permissiveness and a simple implementation (and I’m not sure it actually would have always worked, though it often did). Now, attempting to set either of these directly will raise a clear exception (they’re read-only properties), but it is possible to set them both via a statement like:

      butler.registry.defaults = lsst.daf.butler.registry.RegistryDefaults(...)
    

    This can be used to set default instrument and skymap values as well, and it will [re-]infer them from collection contents just like Butler construction does if they are not given.
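For anyone curious how read-only `collections` and `run` can coexist with replaceable defaults, here is a small stand-alone sketch of the pattern. The `Toy*` classes are hypothetical stand-ins, not the real Butler, Registry, or RegistryDefaults, and the real exception type may differ from the `AttributeError` that a setter-less Python property raises here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToyDefaults:
    """Hypothetical stand-in for a bundle of registry defaults."""

    collections: Tuple[str, ...] = ()
    run: Optional[str] = None


class ToyRegistry:
    def __init__(self, defaults: ToyDefaults) -> None:
        # Plain attribute: it can be replaced wholesale after construction.
        self.defaults = defaults


class ToyButler:
    def __init__(self, collections: Sequence[str] = (), run: Optional[str] = None) -> None:
        self.registry = ToyRegistry(ToyDefaults(tuple(collections), run))

    @property
    def collections(self) -> Tuple[str, ...]:
        # No setter is defined, so direct assignment raises AttributeError.
        return self.registry.defaults.collections

    @property
    def run(self) -> Optional[str]:
        # Also read-only; always reflects the registry's current defaults.
        return self.registry.defaults.run


butler = ToyButler(collections=["a", "b"])
try:
    butler.collections = ["c"]
except AttributeError:
    print("butler.collections is read-only")

# Replacing the defaults object is the supported way to change both at once.
butler.registry.defaults = ToyDefaults(collections=("c",), run="c")
print(butler.collections, butler.run)
```

Because the properties only forward to the registry's defaults, there is a single source of truth, and the Butler and Registry cannot drift out of sync.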