Pythonic views for afw.table

Despite holding the dubious title of World Expert on afw.table, I’m growing increasingly frustrated with how clunky it is for simple analysis and algorithm debugging in Python. I’m also quite aware that there are at least two Python table libraries - astropy.table and pandas - that do very similar things with (apparently) much better interfaces.

I’m also quite certain that the afw.table in-memory data model is almost completely compatible with those other libraries, in the sense that one should be able to create a very lightweight view in astropy.table or pandas that uses memory shared (via numpy) with an afw.table object. And I think it’d be fairly easy for me to write code that creates such views with minimal human intervention, if I knew a bit more about those target Python libraries - I’ve just never had an opportunity to learn them.
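To make the shared-memory idea concrete, here is a minimal sketch using only NumPy. The structured array `records` is a hypothetical stand-in for an afw.table record buffer, not the actual afw.table layout, and the field names are made up for illustration. It shows that selecting a column from row-packed storage yields a strided view rather than a copy; whether a given astropy.table or pandas constructor preserves that sharing is something that would still need to be checked.

```python
import numpy as np

# Hypothetical stand-in for an afw.table record buffer: one packed row per record.
# The field names "id" and "flux" are illustrative only.
records = np.zeros(4, dtype=[("id", np.int64), ("flux", np.float64)])
records["id"] = np.arange(4)

# Selecting a field returns a strided view onto the same buffer, not a copy.
flux = records["flux"]

# Writing through the view changes the underlying records.
flux[:] = [1.0, 2.0, 3.0, 4.0]

shares = np.shares_memory(flux, records)
print(shares, flux.strides, records["flux"])
```

The column's stride equals the full row size, which is the property a lightweight view layer would have to tolerate.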

Does anyone who does have experience with astropy.table or pandas have any recommendations on how to approach this? For instance:

  • Does pandas have enough momentum now that we should focus on it directly?
  • Does the new support for making pandas views from astropy.table mean we should start with astropy.table?
  • Is there any part of our table data model that definitely isn’t compatible with either or both of these libraries? (I’m particularly worried about our packed-bit flag fields).
  • afw.table has description-only fields for units, which we could perhaps translate into a more functional unit system in a view. What are our options here?
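The worry about packed-bit flags can be illustrated with a small NumPy sketch. The layout here (one `uint8` per record, with flags at bit 0 and bit 2) and the helper `unpack_flag` are assumptions for illustration, not a description of how afw.table actually stores flags. The point is only that a per-flag boolean column cannot be a view onto packed bits, because NumPy has no sub-byte strides, so it has to be a copy.

```python
import numpy as np

# Hypothetical layout: one uint8 per record, flags stored at bit 0 and bit 2.
packed = np.array([0b001, 0b100, 0b101, 0b000], dtype=np.uint8)


def unpack_flag(packed_column, bit):
    """Return a boolean array for one flag bit. This allocates new memory."""
    return ((packed_column >> bit) & 1).astype(bool)


flag0 = unpack_flag(packed, 0)
flag2 = unpack_flag(packed, 2)

# The unpacked columns are copies, so writes to them do not reach the packed data.
print(flag0, flag2, np.shares_memory(flag0, packed))
```

A view layer would therefore need either read-only flag columns or some explicit write-back step.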

Perhaps most importantly, leaving aside (for now) the question of how we schedule and Earn Value for this, does anyone have the expertise and inclination to pair-program some view-building code with me?

The packed-bit flag fields pose other problems (e.g. for provenance), so I wouldn’t necessarily be averse to getting rid of them, even at the cost of increased space.

I would also want to know if anyone felt that they had the expertise to do this without Jim’s assistance.

Without replying yet to the technical specifics of how the view layer would be implemented, I’ll remind everyone that this sort of thing is exactly the aim of the SPIE paper that @timj is leading.

My perception is that it would be most useful to start with astropy.table.Table instances backed by afw.table data since the Astropy ecosystem is a specific priority for us to integrate with.

A new software developer will be starting in Princeton in a couple of months. While at first he won’t have any afw experience at all, the intention is that he’ll rapidly ramp up to be able to handle projects like this.

Does AFW support multi-dimensional columns? As I mentioned in my AAS report, one of the reasons astropy has not adopted pandas natively is that astropy has to support such columns, which are very common in FITS tables.

I was thinking of Pim too. The required expertise here is perhaps more in NumPy-internals rather than general C++, but if our plan is for him to take over maintenance of afw.table and/or ndarray from me, this could nevertheless be a good learning project for him.

Yes, multi-dimensional columns are supported, but they’re very infrequently used, so a view that didn’t include them (or included them clunkily) wouldn’t be missing much.

Yes, agreed.

On multidimensional column support:

Ah – I couldn’t figure out how to use them at all. Could you explain? I can see how to make a column contain 1D arrays of arbitrary sizes, but not how to extend that to N dimensions.

Oh, you’re right. You can make fields that are 1-d arrays, which means the column is a 2-d array, but there’s no support for higher dimensions.
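The same relationship between a 1-d array field and a 2-d column can be seen in a plain NumPy structured array. This is a sketch of the analogous NumPy behaviour, with made-up field names, and does not use afw.table itself.

```python
import numpy as np

# Hypothetical schema: a scalar "id" field and a 1-d array field "coeffs" of length 3.
dtype = np.dtype([("id", np.int64), ("coeffs", np.float64, (3,))])
records = np.zeros(5, dtype=dtype)

# Each record holds a 1-d array, so the column as a whole is 2-d: (n_records, 3).
coeffs = records["coeffs"]

# The column is still a view: assigning a row here updates the record buffer.
coeffs[1] = [1.0, 2.0, 3.0]

print(coeffs.shape, np.shares_memory(coeffs, records))
```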