New Butler query system released

[This functionality is available on the Rubin Science Platform from Weekly 2024_37 onwards]

Today I merged ticket DM-45872, which moves the new query system APIs from experimental mode to fully public mode. We have also begun moving away from butler.registry for queries, toward a simpler interface on the butler itself.

The new query system has been used under the hood by the command-line tools for a while, and now it can also be used directly from Python.

The new query system has been designed to provide a more unified approach to all the different types of queries, and it also provides some key enhancements:

  • You can now query for datasets using an lsst.sphgeom.Region or the new POINT(ra, dec) syntax. We also recently added the ability to construct Regions from simple strings using the IVOA POS definition, via Region.from_ivoa_pos().
  • We have fixed the problem of duplicate results, so you no longer need to pass the results straight into a set().
  • There is now a simplified interface that returns a list and an advanced interface that allows complex queries to be built up by method chaining.
  • Calibration dataset queries are now supported, either in the simplified interface with find_first=False or in the advanced interface, where a temporal dimension can be used. The advanced system also has an experimental interface for obtaining validity ranges.
  • Coarse spatial joins (such as visit and tract) are now supported.
  • The advanced query system now allows for lists of data IDs to be used.
  • All the APIs now support limit and order_by, not just the dimension record queries (and those parameters are now supported on the command line as well).
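A minimal sketch of the spatial and calibration enhancements using the simplified butler.query_datasets() call. The helper function names, the default limit of 1000, and the choice of visit_detector_region.region as the spatial column (which assumes the dataset type has visit and detector dimensions) are illustrative assumptions, as is passing the Region object through bind rather than embedding it in the expression string.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed by type checkers, not at runtime
    from lsst.daf.butler import Butler, DatasetRef


def datasets_in_region(
    butler: Butler,
    dataset_type: str,
    collections: str | Iterable[str],
    region: Any,
    *,
    limit: int = 1000,
) -> list[DatasetRef]:
    """Find datasets whose footprint overlaps ``region`` (hypothetical helper).

    Assumes the dataset type has visit and detector dimensions, so that
    ``visit_detector_region.region`` is a valid column in the expression.
    """
    return butler.query_datasets(
        dataset_type,
        collections=collections,
        # The region object is supplied via ``bind``, not as text.
        where="visit_detector_region.region OVERLAPS search_region",
        bind={"search_region": region},
        order_by="visit",
        limit=limit,
    )


def datasets_at_point(
    butler: Butler,
    dataset_type: str,
    collections: str | Iterable[str],
    ra: float,
    dec: float,
    *,
    limit: int = 1000,
) -> list[DatasetRef]:
    """Same search using the POINT(ra, dec) syntax (degrees assumed)."""
    return butler.query_datasets(
        dataset_type,
        collections=collections,
        where=f"visit_detector_region.region OVERLAPS POINT({ra}, {dec})",
        limit=limit,
    )


def all_calibrations(
    butler: Butler,
    dataset_type: str,
    collections: str | Iterable[str],
    **data_id: Any,
) -> list[DatasetRef]:
    """List every matching calibration, not just one per data ID."""
    # The simplified interface supports calibration queries with find_first=False.
    return butler.query_datasets(
        dataset_type, collections=collections, find_first=False, **data_id
    )
```

With `from lsst.sphgeom import Region`, a cone could be built as `Region.from_ivoa_pos("CIRCLE 150.0 2.5 0.1")` (RA, Dec and radius in degrees) and passed as `region`.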

The new APIs are:

  • butler.query_datasets()
  • butler.query_dimension_records()
  • butler.query_data_ids()

All these APIs return lists. By default the number of results is capped at 20,000, and a warning is issued if you hit that limit.
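A short sketch of the other two simplified calls with order_by and limit, assuming a repository with visit and detector dimensions. The function name recent_visits and the limit of 50 are hypothetical; passing an explicit limit replaces the default cap.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed by type checkers, not at runtime
    from lsst.daf.butler import Butler


def recent_visits(
    butler: Butler, instrument: str, *, limit: int = 50
) -> dict[str, list[Any]]:
    """Return visit records and visit+detector data IDs (hypothetical helper)."""
    # A leading "-" on an order_by term requests descending order.
    records = butler.query_dimension_records(
        "visit", instrument=instrument, order_by="-visit", limit=limit
    )
    # query_data_ids() returns data IDs only; no datasets are looked up.
    data_ids = butler.query_data_ids(
        ["visit", "detector"], instrument=instrument, order_by="visit", limit=limit
    )
    return {"records": records, "data_ids": data_ids}
```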

The advanced query system uses a context manager:

with butler.query() as query:
    ...
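Inside the context manager a query can be refined step by step. This sketch assumes that Query.where() accepts data ID constraints as keyword arguments and that the data ID results support chained order_by() and limit() calls; visits_containing_point is a hypothetical helper, and the coordinates are assumed to be in degrees.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed by type checkers, not at runtime
    from lsst.daf.butler import Butler


def visits_containing_point(
    butler: Butler, instrument: str, ra: float, dec: float, *, max_rows: int = 100
) -> list[Any]:
    """Build a query by method chaining and return matching visit data IDs."""
    with butler.query() as query:
        # where() returns a new, more constrained query object.
        query = query.where(
            f"visit.region OVERLAPS POINT({ra}, {dec})", instrument=instrument
        )
        # Ordering and limiting are chained onto the results object.
        results = query.data_ids(["visit"]).order_by("visit").limit(max_rows)
        # Iterate inside the with block, while the query context is still open.
        return list(results)
```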

Additionally, there is now a butler.collections interface (see lsst.daf.butler.ButlerCollections) to replace the butler.registry collection APIs.

  • collections.query() replaces registry.queryCollections.
  • collections.get_info() replaces getCollectionDocumentation, getCollectionParentChains, getCollectionSummary, getCollectionChain, and getCollectionType.
  • collections.query_info() returns all the information available for all matching collections.

This butler.collections interface already existed for chain manipulation but has been extended to support all the registry collection APIs.
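A sketch of the collection calls, assuming the info objects returned by get_info() and query_info() expose name, type and children attributes. Both helper functions are hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed by type checkers, not at runtime
    from lsst.daf.butler import Butler


def collection_types(butler: Butler, pattern: str) -> dict[str, Any]:
    """Map each collection matching ``pattern`` to its collection type.

    Registry-era equivalent: queryCollections() plus getCollectionType().
    """
    names = butler.collections.query(pattern)
    # get_info() bundles what several registry getters used to return.
    return {name: butler.collections.get_info(name).type for name in names}


def collection_children(butler: Butler, pattern: str) -> dict[str, tuple[str, ...]]:
    """Return the children of every matching collection in a single call."""
    # query_info() yields one info object per matching collection;
    # children is assumed to be empty for non-chained collections.
    return {
        info.name: tuple(info.children)
        for info in butler.collections.query_info(pattern)
    }
```

The second helper makes one call regardless of how many collections match, whereas the first makes one get_info() call per collection.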

The long-term plan is for butler.registry to no longer be used. We are not deprecating the registry interfaces at this time, but we hope that the new features will motivate an eventual migration.
