As part of Commissioning, we need to write scripts to test things like photometric and astrometric repeatability. This means getting all of the observations of a region of sky and looking at the distributions of the different measurements of individual objects in that region. I have been doing this with code like:
```python
for data_id in list_of_data_id_in_region:
    src = butler.get('src', dataId=data_id)
    calexp = butler.get('calexp_photoCalib', dataId=data_id)
    # ... analysis code ...
```
I am finding that the `butler.get` steps take about 0.1 seconds per `data_id`. This starts to become a problem when you are dealing with a few times 10**4 `data_id`s (as in the HSC UDEEP field). Is there a more efficient way to load data from a large number of visits with the Butler, or do we just need to eat the cost (presumably running a pre-burner to load all of the data we want into a more columnar form before doing any analysis)?
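For concreteness, here is roughly the kind of pre-burner I have in mind, in case that is the recommended route. It assumes the same Gen2-style `butler.get` calls as above, that `src.asAstropy().to_pandas()` is an acceptable way to get a columnar view of each source catalog, and that Parquet (via pandas/pyarrow) is a reasonable on-disk format; the function name, output path, and the idea of tagging rows with the `data_id` keys are just illustrative choices on my part.

```python
import pandas as pd


def preburn_region(butler, data_ids, output_path="region_sources.parquet"):
    """Pay the per-data_id butler.get cost once, then analyze from a columnar file.

    Sketch only: column handling and calibration are placeholders.
    """
    frames = []
    for data_id in data_ids:
        src = butler.get('src', dataId=data_id)
        photo_calib = butler.get('calexp_photoCalib', dataId=data_id)

        # Columnar view of the source catalog (afw SourceCatalog -> astropy -> pandas).
        df = src.asAstropy().to_pandas()

        # Record which visit/ccd each row came from, so the repeatability
        # analysis can group the repeated measurements of each object later.
        for key, value in data_id.items():
            df[key] = value

        # ... apply photo_calib here to store calibrated magnitudes ...

        frames.append(df)

    # One big columnar table; repeated analyses read this instead of
    # going back through the Butler every time.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_parquet(output_path)
    return combined
```

With something like this the ~0.1 s per `data_id` is paid once while building the Parquet file, and every subsequent analysis pass just reads the columnar table, but it would be good to know if the Butler already offers a better way.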
Thanks.