I’ve just merged DM-25919, which both renames and modifies some of the Gen3 butler query methods that many of you are already familiar with.
First, the bad news (breaking changes):
- `Registry.queryDimensions` has been renamed to `queryDataIds`.
- The `expand` argument has been removed from `queryDimensions`/`queryDataIds` and `queryDatasets`.
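For call sites that need updating, the change amounts to a new method name and a dropped keyword. Here is a minimal sketch of the migration; the `visit` dimension, the `instrument` keyword value, and the helper name `visit_data_ids` are illustrative, and `StubRegistry` is a hypothetical stand-in that exists only so the snippet runs without a data repository.

```python
class StubRegistry:
    """Hypothetical stand-in for a butler Registry (not the real class)."""

    def queryDataIds(self, dimensions, **kwargs):
        # Record the call and return an empty result; a real Registry
        # would return a lazy query-results object instead.
        self.last_call = (tuple(dimensions), kwargs)
        return iter(())


def visit_data_ids(registry, instrument):
    # Before DM-25919 (no longer works):
    #     registry.queryDimensions(["visit"], instrument=instrument, expand=False)
    # After: new method name, and no ``expand`` argument at all.
    return registry.queryDataIds(["visit"], instrument=instrument)


registry = StubRegistry()
results = list(visit_data_ids(registry, "HSC"))
```

With a real `Registry` you would pass it in place of the stub; the body of `visit_data_ids` is the only part that matters.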
And now the good news:
- `queryDataIds` and `queryDatasets` are now faster by many orders of magnitude for large queries, at least by default (performance is similar to the old speed with `expand=False`, but `expand=True` was the default).
- There is a new method, `queryDimensionRecords`, which returns metadata rows for a dimension directly, and is hence a much more convenient interface for that purpose (compared to the old approach of querying for data IDs, and then accessing `.records` on those).
- `queryDataIds` and `queryDatasets` now return custom iterator objects (`DataCoordinateQueryResults` and `DatasetQueryResults`) with many extra methods, most of which return new result objects (it’s a “method chaining” interface, for those of you familiar with that concept). Those include an `expanded` method that replaces the old `expand=True` keyword argument (but without the enormous performance penalty), a `findDatasets` method to do bulk searches for datasets whose data IDs were identified by the original query, and a `materialize` context manager that stores the results in a temporary table in the database, allowing follow-up related queries without having to nest (and hence possibly re-execute) the original query as a subquery or round-trip the results through Python objects. These result objects are all still lazy iterators that don’t execute the query until iteration begins; we don’t want to assume users always want to fetch all results and stuff them in a container, even if that’s often the case. They do have `toSet` and `toSequence` methods that make fetching into Python containers easy when desired.
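For anyone unfamiliar with lazy, method-chaining result objects, here is a toy sketch of the pattern. It is not the butler implementation: `LazyResults` and `fake_query` are hypothetical names, the "expansion" just tags each row, and only the method names `expanded`, `toSet`, and `toSequence` are borrowed from the description above (the toy `toSequence` returns a plain list).

```python
class LazyResults:
    """Toy lazy result object; hypothetical, not the butler classes."""

    def __init__(self, fetch, transform=None):
        self._fetch = fetch          # callable that "runs the query"
        self._transform = transform  # optional per-row transformation

    def __iter__(self):
        # The query only executes here, when iteration begins.
        for row in self._fetch():
            yield self._transform(row) if self._transform else row

    def expanded(self):
        # Return a *new* result object; nothing is executed yet.
        def expand(row):
            base = self._transform(row) if self._transform else row
            return base + ("expanded",)
        return LazyResults(self._fetch, expand)

    def toSet(self):
        return set(self)

    def toSequence(self):
        return list(self)


calls = []  # records each time the fake query actually runs


def fake_query():
    calls.append("executed")
    return [("HSC", 1), ("HSC", 2)]


results = LazyResults(fake_query).expanded()
calls_before_iteration = len(calls)  # still 0: chaining did not run the query
rows = results.toSequence()          # the query runs here
```

Chaining builds up a description of what you want, and the work happens only when you iterate or ask for a container.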
As documented on DM-24938, these changes make the parts of QuantumGraph generation that they were intended to optimize dramatically faster, but they make what is actually the bottleneck slightly slower, so there’s little overall change in performance. But they also set the stage for optimizing that bottleneck in the same way (on DM-24432, my current project), so I’m optimistic that we’ll soon get QuantumGraph generation down from approximately an hour (per tract, on HSC) to 10-15 minutes.
I’ll add API doc links to the text above once the weekly docs are built. User guide docs for this functionality are not yet written; there’s some more functionality I’d like to add first.