I’ve just merged DM-25919, which both renames and modifies some of the Gen3 butler query methods that many of you are already familiar with.
First, the bad news (breaking changes):
Registry.queryDimensions has been renamed to
The expand argument has been removed from
And now the good news:
queryDatasets are now faster by many orders of magnitude for large queries, at least by default (performance is similar to the old speed with
expand=True was the default).
There is a new method, queryDimensionRecords, which returns metadata rows for a dimension directly, and is hence a much more convenient interface for that purpose (compared to the old approach of querying for data IDs, and then accessing
queryDatasets now return custom iterator objects (DatasetQueryResults) with many extra methods, most of which return new result objects (it’s a “method chaining” interface, for those of you familiar with that concept). Those include an expanded method that replaces the old expand=True keyword argument (but without the enormous performance penalty), a findDatasets method to do bulk searches for datasets whose data IDs were identified by the original query, and a materialize context manager that stores the results in a temporary table in the database, allowing follow-up related queries without having to nest (and hence possibly re-execute) the original query as a subquery or round-trip the results through Python objects. These result objects are all still lazy iterators that don’t execute the query until iteration begins; we don’t want to assume users always want to fetch all results and stuff them in a container, even if that’s often the case. They do have toSequence methods that make fetching into Python containers easy when desired.
As documented on DM-24938, these changes make the parts of QuantumGraph generation that they were intended to optimize dramatically faster, but they make what is actually the bottleneck slightly slower, so there’s little overall change in performance. But they also set the stage for optimizing that bottleneck in the same way (on DM-24432, my current project), so I’m optimistic that we’ll soon get QuantumGraph generation down from approximately an hour (per tract, on HSC) to 10-15 minutes.
I’ll add API doc links to the text above once the weekly docs are built. User guide docs for this functionality are not yet written; there’s some more functionality I’d like to add first.