There appears to be a gigantic performance difference between `sourceCat['field']` and `sourceCat.get('field')`, and (I believe) it scales with the width of the table.
It seems as though the square-bracket access might be converting the whole table (though this is just conjecture on my part).
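To make that conjecture concrete, here is a toy model in plain Python. `ToyCatalog` and `copies_per_access` are hypothetical names, and this is not how afw.table is actually implemented. It only counts how many values get copied if bracket access converts every column before picking one, while `.get()` copies just the requested column. Under that assumption the gap grows linearly with the number of columns.

```python
# Toy model (hypothetical, NOT the afw.table implementation) of the conjecture
# that bracket access converts the whole table while .get() touches one column.


class ToyCatalog:
    """Row-oriented table that counts how many values each access copies."""

    def __init__(self, names, n_rows):
        self.names = list(names)
        # Row-major storage: one dict per record, like a list of records.
        self.rows = [{name: 0.0 for name in self.names} for _ in range(n_rows)]
        self.copied = 0

    def get(self, name):
        # Copy only the requested column: n_rows values.
        self.copied += len(self.rows)
        return [row[name] for row in self.rows]

    def __getitem__(self, name):
        # Conjectured behaviour: convert every column, then return one of them.
        # This costs n_rows * n_columns copies.
        table = {n: [row[n] for row in self.rows] for n in self.names}
        self.copied += len(self.rows) * len(self.names)
        return table[name]


def copies_per_access(n_columns, n_rows=10):
    """Return (bracket_copies, get_copies) for one access to column 'c0'."""
    cat = ToyCatalog([f"c{i}" for i in range(n_columns)], n_rows)
    cat["c0"]
    bracket = cat.copied
    cat.copied = 0
    cat.get("c0")
    return bracket, cat.copied
```

For example, `copies_per_access(3)` gives `(30, 10)` and `copies_per_access(100)` gives `(1000, 10)`. The model ignores fixed per-call overhead, so it illustrates only the scaling with width, not absolute timings.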
Here is a timing comparison (in IPython) of accessing a whole column of a tiny three-column table:
```python
import lsst.afw.table as afwTable
import numpy as np

# Build a minimal three-column schema
schema = afwTable.Schema()
schema.addField('col1', type=np.float32, doc="column 1")
schema.addField('col2', type=np.float32, doc="column 2")
schema.addField('col3', type=np.float32, doc="column 3")

testCat = afwTable.BaseCatalog(schema)
nRow = 10
# Reserve space for all rows up front so the records are allocated in one block
testCat.table.preallocate(nRow)
for i in range(nRow):
    rec = testCat.addNew()
    rec['col1'] = i
    rec['col2'] = i*10
    rec['col3'] = i*100

# IPython magics:
%timeit test = testCat['col1']
## 2.21 ms
%timeit test = testCat.get('col1')
## 34.1 us
```
That’s roughly a factor of 65 for this little toy case. When doing a source selection like the following:
```python
gdFlag = np.logical_and.reduce([~sources['flag_pixel_saturated_center'],
                                ~sources['flag_pixel_interpolated_center'],
                                ~sources['flag_pixel_edge'],
                                ~sources['flag_pixel_cr_center'],
                                ~sources['flag_pixel_bad'],
                                ~sources['flag_pixel_interpolated_any'],
                                ~sources['slot_Centroid_flag'],
                                ~sources['slot_ApFlux_flag'],
                                sources['deblend_nchild'] == 0,
                                sources['parent'] == 0,
                                sources['classification_extendedness'] < 0.5])
```

vs

```python
gdFlag = np.logical_and.reduce([~sources.get('flag_pixel_saturated_center'),
                                ~sources.get('flag_pixel_interpolated_center'),
                                ~sources.get('flag_pixel_edge'),
                                ~sources.get('flag_pixel_cr_center'),
                                ~sources.get('flag_pixel_bad'),
                                ~sources.get('flag_pixel_interpolated_any'),
                                ~sources.get('slot_Centroid_flag'),
                                ~sources.get('slot_ApFlux_flag'),
                                sources.get('deblend_nchild') == 0,
                                sources.get('parent') == 0,
                                sources.get('classification_extendedness') < 0.5])
```
the speed difference is roughly a factor of 12,000: on a fully populated source catalog the square-bracket version takes ~10 seconds, while the `.get()` version takes 822 microseconds.
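One way to keep the fast path without eleven near-identical lines is a small helper that fetches each column exactly once with `.get()` and then combines the conditions with `np.logical_and.reduce`, just as above. This is a sketch: `good_source_mask` is a hypothetical name, and it assumes only that `cat.get(name)` returns a numpy array, as in the timing example.

```python
import numpy as np


def good_source_mask(cat, bad_flags, cuts):
    """Vectorized selection that reads each column once via cat.get(name).

    bad_flags: names of boolean flag columns that must all be False.
    cuts: (name, predicate) pairs; each predicate maps a column array
          to a boolean array of the same length.
    """
    # Rows pass a flag test when the flag is NOT set.
    conditions = [~np.asarray(cat.get(name), dtype=bool) for name in bad_flags]
    # Rows pass a cut when its predicate is True.
    conditions += [np.asarray(pred(cat.get(name)), dtype=bool)
                   for name, pred in cuts]
    # Elementwise AND across all conditions, one entry per row.
    return np.logical_and.reduce(conditions)
```

With the selection above, the call would look like `good_source_mask(sources, ['flag_pixel_saturated_center', 'flag_pixel_edge', ...], [('parent', lambda x: x == 0), ('classification_extendedness', lambda x: x < 0.5)])`, with the remaining flags and cuts filled in the same way.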
For reference, doing it the way the sourceSelector does (selecting each row and appending it one at a time) falls somewhere in between: it takes things from intolerable to tolerable, but it is still not as fast as the vectorized selection through numpy.
Notably, for an individual source/row there does not seem to be any performance difference between the `[]` call and the `.get()` call.
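Comparisons like these can also be reproduced outside IPython, where `%timeit` is unavailable, with the standard-library `timeit` module. A minimal sketch; `speed_ratio` is a hypothetical helper name:

```python
import timeit


def speed_ratio(slow, fast, number=100, repeat=5):
    """Return the best-of-`repeat` time of `slow` divided by that of `fast`.

    Both arguments are zero-argument callables; each is run `number` times
    per repeat, and the fastest repeat is used to reduce noise.
    """
    t_slow = min(timeit.repeat(slow, number=number, repeat=repeat))
    t_fast = min(timeit.repeat(fast, number=number, repeat=repeat))
    return t_slow / t_fast
```

With the table from the timing example, `speed_ratio(lambda: testCat['col1'], lambda: testCat.get('col1'))` compares whole-column access, and the same call on a single record (`rec = testCat[0]`, then `lambda: rec['col1']` vs `lambda: rec.get('col1')`) compares per-row access.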
Relatedly, when looping over a source catalog to set values, if you do:

```python
for i in range(nRow):
    # computed_value stands for whatever was computed for row i
    tempCat['field'][i] = computed_value
```

then it will be very slow, but if you do:

```python
for i in range(nRow):
    tempCat[i]['field'] = computed_value
```

then it will be fast, because you use the setter on an individual row (fast) instead of grabbing the full column as a numpy array on every iteration (slow). This is the opposite of what one does with a regular numpy recarray, where `arr['field']` is a cheap view.
Finally, there does not seem to be a corresponding fast table-wide `.set()` call to go with the fast `.get()` call, so at the moment you’re stuck either looping over the full table (as above) or taking the performance hit.
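Until such a call exists, the looping workaround can at least be wrapped in one place. A minimal sketch, assuming only that iterating over a catalog yields its records and that `rec[name] = value` is the fast per-record setter described above; `set_column` is a hypothetical name, not part of afw.table:

```python
def set_column(catalog, name, values):
    """Assign values[i] to field `name` of record i via the per-record setter.

    Hypothetical helper: it relies only on len(catalog), iterating the
    catalog to get its records, and rec[name] = value.
    """
    values = list(values)
    if len(values) != len(catalog):
        raise ValueError("need exactly one value per record")
    # One cheap per-record assignment per row; the column is never materialized.
    for rec, value in zip(catalog, values):
        rec[name] = value
```

Usage would be along the lines of `set_column(tempCat, 'field', computedValues)`, where `computedValues` is a hypothetical sequence with one entry per row.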