jointcal obviously has to read calexp “src” catalogs. In the standard version of the simultaneous astrometry solver,
getting the catalogs into memory takes about 2 CPU seconds per catalog (containing typically 4,000 objects).
So far, in all use cases, getting the catalogs into memory dominates the overall time budget (both clock time and CPU).
We select objects in a python function, doing simple things as checking flags, S/N and chasing NaN’s, explicitly iterating over catalog objects. This takes about 1 CPU s per catalog. Finding the file, reading it and instanciating the afw objects takes about 1 s per catalog as well. Turning the afw data into what the astrometry code expects takes a negligible amount of time. I recently wrote a task that matches a whole exposure to an external catalog and tested it on CFHT/Megacam data (36 chips). It takes about 70 CPU s per exposure (with the python selection function),
and ~32 CPU s when bypassing the python selection function. Combinatorial matching and fitting the 36
WCS’s take about 2 s. The same code reading ASCII catalogs completes the job in ~5 CPU s, wasting half of that converting strings to double’s.
I have two questions:
- 
is there an efficient way to select objects from an afw table in python? 
- 
Is there a way to speed up the loading of a calexp src catalog ? Spying on the process, I found about 700,000 calls to malloc per catalog, and about as many calls to cfitsio internals (e.g. ffgcv). The strace log shows that the same blocks of the data file are read several times. I wonder if the afw::table code actually reads the Footprint extensions of the SRC files, which we happen not to need. We just need to load the catalog itself, i.e. a ~2MB fits table. 
Thanks in advance for any help.