Querying registry for time-ordered calexps

herjy · June 1, 2022, 11:33am

I would like to do queries that would be ordered by time of observation.
The way to do this (according to these notebooks 1, 2) seems to be to pull all the calexps I’m interested in and then sort them once they are all loaded. Instead, I would like to get an ordered query and be able to access the observation time before having to pull the calexp with the butler. My understanding is that visits correspond to a specific observing time, so this should be possible, but I did not find an obvious way to do it.
I’m using the desc-stack-weekly-latest kernel on DC2 3.1i DDF.

merlin · June 1, 2022, 2:15pm

You should be able to do something like this to get your records sorted by time of observation:

records = butler.registry.queryDimensionRecords("exposure", where="<your-query>")
records = list(records)
records.sort(key=lambda r: r.timespan.begin)

merlin · June 1, 2022, 2:17pm

You can also call .order_by() on the generator itself, which might be more efficient for some cases too:

recordGenerator = butler.registry.queryDimensionRecords("exposure", where="<your-query>")
# order_by() returns the ordered results object, so keep the return value
recordGenerator = recordGenerator.order_by(<some-ordering>)

timj · June 1, 2022, 2:58pm

The lines from notebook one:

# Append MJD (from this src's associated image's header)
visit_info = butler.get('calexp.visitInfo', dataId=did)
mjd_arr.append(visit_info.getDate().get(dafBase.DateTime.MJD))

are not the most efficient way to do this. If you already have the DatasetRef from queryDatasets, you can get the timespan information directly from the visit dimension record attached to the dataId (assuming you’ve called .expanded() on the return value from queryDatasets). The time is an inherent part of the dataId. (cc/ @kadrlica )

herjy · June 1, 2022, 3:48pm

Thanks both! @timj had exactly what I needed, but I’ve learned a bunch of things exploring @merlin 's solution. Thanks!

This is getting me the time catalog I need.
[ref.dataId.timespan.begin for ref in list(datasetRefs.expanded())]

timj · June 1, 2022, 3:51pm

The list() call should not be needed. datasetRefs.expanded() can be iterated over directly as a generator.

herjy · June 1, 2022, 3:54pm

Indeed it can, thanks Tim:

[ref.dataId.timespan.begin for ref in datasetRefs.expanded()]
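
For anyone who wants the refs themselves in time order rather than just the list of times, here is a minimal sketch of how that might look. It only assumes that each item exposes ref.dataId.timespan.begin, as the expanded refs above do, and that the begin values can be compared with each other. The fake_ref helper and the float times are hypothetical stand-ins so the snippet runs without the stack; they are not part of the butler API.

```python
from types import SimpleNamespace


def time_ordered(refs):
    """Return (begin, ref) pairs sorted by the start of each ref's timespan.

    Works on any iterable whose items expose ``ref.dataId.timespan.begin``.
    Nothing here imports or calls the butler.
    """
    pairs = [(ref.dataId.timespan.begin, ref) for ref in refs]
    # Sort on the time only, so the refs never need to be comparable.
    pairs.sort(key=lambda pair: pair[0])
    return pairs


def fake_ref(name, begin):
    """Hypothetical stand-in for an expanded DatasetRef (illustration only)."""
    data_id = SimpleNamespace(timespan=SimpleNamespace(begin=begin))
    return SimpleNamespace(dataId=data_id, name=name)


# Stand-in data: plain floats (MJD-like) play the role of the real begin times.
refs = [fake_ref("c", 59703.4), fake_ref("a", 59701.1), fake_ref("b", 59702.2)]
ordered = time_ordered(refs)
names_in_time_order = [ref.name for _, ref in ordered]
print(names_in_time_order)  # ['a', 'b', 'c']
```

With real refs, passing datasetRefs.expanded() to time_ordered should give the same kind of list, and the calexp only has to be fetched from the butler for the entries that are actually needed.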