How to update a saved catalog to a new format?

rowen · May 18, 2016, 5:18pm

I modified lsst.afw.table.AmpInfoCatalog by adding a field and changing the type of another field. How can I convert saved catalogs to the new format?

Things I have tried that fail:

Read the old data using the new format. That fails, as one might expect.
Read the old data as a SimpleCatalog. That fails with a complaint about being the wrong data type.
Read the data with astropy.table. That works, and conversion is trivial, but the converted result cannot be read by afw.table, apparently because the flag fields aren’t right.

The only solutions I have thought of are to regenerate all the tables or temporarily add two versions of AmpInfoTable to afw.table. I am hoping there is a more straightforward solution. I could probably manage without backwards compatibility, if that helps.

jbosch · May 18, 2016, 5:56pm

I think reading as SimpleCatalog should have worked; if not, BaseCatalog certainly ought to, and if it doesn’t it’s probably worth creating an issue calling it a bug.

jbosch · May 18, 2016, 6:04pm

Never mind. I think I know what happened. The problem is that there’s a special tag in the FITS header that indicates the table subclass, and that tag actually controls what type is loaded. If you can find that tag (“AFW_TABLE_TYPE”?) and change it with pyfits, reading as SimpleCatalog should work from there. There should be a better approach to this problem eventually, but I think this is expedient.

rowen · May 18, 2016, 6:27pm

Thanks. I created https://jira.lsstcorp.org/browse/DM-6144 to request a way to ignore AFW_TABLE_TYPE. Meanwhile I am going to try recreating all the amp info tables, since that must be done anyway.

rowen · May 18, 2016, 11:34pm

For the record:

Stripping AFW_TYPE fails with lsst::pex::exceptions::RuntimeError: 'Invalid table class for catalog.'
Changing AFW_TYPE from AMPINFO to SIMPLE fails with lsst::pex::exceptions::InvalidParameterError: 'Schema for Simple must contain at least the keys defined by makeMinimalSchema().'

I wonder if there is any way at all to read the old catalogs. I had hoped to avoid the need to do so by generating new ones directly. This worked for all packages except obs_subaru, which no longer has working generating code.

jbosch · May 18, 2016, 11:36pm

Try setting AFW_TYPE to “BASE” instead.
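A minimal sketch of that header change, assuming astropy is installed and that the key is spelled AFW_TYPE as in the error reports above. The function names retag_hdus and retag_file are hypothetical helpers, not part of afw or astropy. The file is edited in place, so it is safer to run this on a copy.

```python
def retag_hdus(hdus, new_type="BASE", key="AFW_TYPE"):
    """Set header[key] = new_type in every HDU that already carries the key.

    `hdus` is any iterable of objects with a dict-like `.header`
    (for example an astropy HDUList). Returns the number of HDUs changed.
    """
    changed = 0
    for hdu in hdus:
        if key in hdu.header:
            hdu.header[key] = new_type
            changed += 1
    return changed


def retag_file(path, new_type="BASE"):
    """Rewrite the AFW_TYPE header of a FITS file in place (hypothetical helper)."""
    # Imported lazily so that retag_hdus can be used and tested without astropy.
    from astropy.io import fits

    # mode="update" writes header changes back to disk when the file is closed.
    with fits.open(path, mode="update") as hdus:
        return retag_hdus(hdus, new_type)
```

Calling retag_file on a copy of the old catalog and checking that it returns at least 1 confirms that an HDU carrying the tag was actually found.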

rowen · May 19, 2016, 6:04pm

With help from @jbosch I have determined the following:

It is necessary to preprocess the FITS files to set the value of the AFW_TYPE header to "BASE" in the HDU containing the binary table (usually the second HDU). astropy.io.fits is useful for this.
Read the old catalog using oldCat = lsst.afw.table.BaseCatalog.readFits(path)
Create a new catalog with the new schema and call newCat.reserve(len(oldCat)) to reserve enough space that the new catalog will be contiguous.
Call newCat.addNew() once for each row in the old table to create the rows.
For each field name in the old table, compute newFieldName = oldFieldName.replace(".", "_"); this is necessary if the old table may be old enough to use old style field names, and safe in any case.
Set each row using newCat.set(newFieldName, oldCat.get(oldFieldName)) for each field. Note that dict-style item assignment (e.g. newCat[newFieldName] = ...) does not work for all field types, either on the whole table or on individual rows.
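The row-copying steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the uploaded converter: copy_rows, new_field_name and the converters argument are hypothetical names, and the catalogs are only assumed to provide reserve(), addNew() and per-record get()/set() by field name.

```python
def new_field_name(old_name):
    """Map an old-style field name ("a.b") to the new style ("a_b")."""
    return old_name.replace(".", "_")


def copy_rows(old_cat, new_cat, field_names, converters=None):
    """Copy every row of old_cat into new_cat, field by field.

    field_names: names of the fields in the OLD catalog to copy.
    converters:  optional dict {old field name: callable}, a hypothetical
                 hook for fields whose type changed between formats.
    """
    converters = converters or {}
    # Reserve space up front so the new catalog stays contiguous.
    new_cat.reserve(len(old_cat))
    for old_rec in old_cat:
        new_rec = new_cat.addNew()  # create one new row per old row
        for old_name in field_names:
            value = old_rec.get(old_name)
            if old_name in converters:
                value = converters[old_name](value)
            new_rec.set(new_field_name(old_name), value)
    return new_cat
```

In real use, old_cat would come from lsst.afw.table.BaseCatalog.readFits(path) after the header has been retagged, new_cat would be an empty catalog built from the new schema, and the field names would be taken from the old catalog's schema. A field added in the new format has no old value to copy and has to be filled in separately.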

A schema mapper might also be used (I am not sure how that is done if one is changing a field type).

I have uploaded the converter that I ended up with. It has some special-case code for obs_subaru but otherwise is a straightforward implementation of the above. (The proper suffix is .py; I had to add the suffix .txt to appease the file type nanny.)

convertAmpInfoCatalog.py.txt (3.7 KB)

timj · May 20, 2016, 5:10pm

One option is to put code like this in a gist at GitHub and then reference the gist (which might get inlined at that point).

There is an example in the thread “Issues building the released version of PhoSim from BitBucket”.

jsick · May 21, 2016, 3:10am

To close the loop on this side issue: I’ve disabled filetype whitelists for uploads (everything is accepted, though there is a file size limit). But as Tim says, Gists are also a great solution here.