FITS and lowercase header keys

jbosch · September 22, 2016, 8:47pm

Somewhere between the CFITSIO version used by the stack (3.36) and the version I’ve been using personally (3.39), fits_write_key (and other header-writing routines) started transforming all header keys (including HIERARCH keys) to upper case. I just checked, and this is consistent with the FITS standard, which explicitly disallows lowercase for header keys.

It’s hard to argue with that change, but it means the strings we use no longer round-trip, and that causes at least one test in the stack to fail. I think the right thing to do is to make that test more permissive, but pushes me further along the track of believing we should consider giving up on FITS as a format we use internally to persist our own objects (we obviously will need to continue to support it as import/export, but round-tripping isn’t as important there); not being able to round-trip the keys in a dict to the most natural FITS version of a dict seems like a big problem.

timj · September 22, 2016, 8:56pm

I’m surprised it ever worked.

Python dict keys and FITS header cards are not compatible. You’d have to do something like:

KEY_1 = ‘first_key'
VAL_1 = ‘First Value’
KEY_2 = ‘second_key’
VAL_2 = 42

jbosch · September 22, 2016, 9:10pm

Yeah, though up until now I thought it was close enough (at least if you allow HIERARCH) that we could live with the incompatibilities in most of cases.

I like your proposal for how to do the mapping safely, and I suppose we should add something like that to all of our routines that map PropertySet / PropertyList to FITS, probably with some kind of extra logic to ensure standard FITS header entries are written directly. As it is, you can put anything in a PropertySet / PropertyList, and there’s no guarantee at all that you’ll get it back out.

Does that make sense? As for that “extra logic”, do you think it’d be better to write anything that’s a valid FITS key directly, or define a global mapping somewhere of special PropertySet / PropertyList keys that need to be mapped to standard FITS header keys? The latter seems both more flexible and more fragile.

timj · September 22, 2016, 9:19pm

I think that would be straightforward. If uppercase and <= 8 characters use a simple header; uppercase and > 8 characters use HIERARCH; else use the KEY_/VAL_ scheme. I think you may be able to repeat the same keyword to make things easy:

LSSTKEY = ‘lowercase_key’
LSSTVAL = ‘Value’
LSSTKEY = ‘second_key’
LSSTVAL = 42

FITS header cards don’t have to be unique if I recall correctly (maybe I mis-remember).

parejkoj · September 22, 2016, 11:18pm

A danger here is that we’re using HIERARCH (as given above) in a non-standard way. HIERARCH is not a “long FITS keyword” system, it’s a system for defining hierarchical keywords (e.g. lsst.afw.key = value). Using it as a way to deal with long keywords could break at any time (and in fact did when I attempted to upgrade to cfitsio 3.38).

I filed DM-6266 about this issue: https://jira.lsstcorp.org/browse/DM-6266

timj · September 23, 2016, 4:01am

CFITSIO is documented to think that it is: http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c_user/node117.html

parejkoj · September 23, 2016, 6:08am

I know. See this Community post where I discuss this in more detail, with quotes from an email exchange with Bill Pence. Summary: using non-namespaced HIERARCH values for long keywords broke in 3.38beta because they were experimenting with some of the other long keywords conventions, and it would likely have to break in order to natively support one of those conventions in the future.

jbosch · September 23, 2016, 3:00pm

@parejkoj, would you say that HIERARCH is sufficiently broken that you think we should use @timj’s KEY_/VAL_ scheme for anything longer than 8 chars, in addition to anything that uses invalid characters? That would prevent anyone else from reading long keys that only use valid characters, but it sounds like it’d be safer.

parejkoj · September 23, 2016, 4:42pm

Trying to round-trip FITS header keywords is a pain, no question. There are a few different competing FITS conventions (or not-yet-conventions) on this topic. I wonder if we should review those and maybe choose one to support? Assuming we’re going to continue with FITS, or at least with saving things in the FITS header. Another option would be to just put all these values into a binary table HDU (which is one of the long keyword ideas under consideration). Unfortunately, I don’t have the list immediately to hand: I think there was a summary email sent to the FITSBITS list in the past couple years, but I can’t find it in my own email archive.

rseaman · September 23, 2016, 5:28pm

The FITS Tech Group have discussed a variety of options. The notion of replacing / supplementing the original simple ASCII FITS headers with multi-column binary tables for hierarchical metadata, including mixed cases (and perhaps, for that matter, UTF-8, not just ASCII), but also data typing, units, timestamps, etc, has come up before. This option is already completely legal FITS. Then the nominal FITS headers could be used solely for structural metadata such as checksums and compression. Table compression, including for these tabular header structures, is also now part of the standard.

Which is to say that you should be considering the requirements for lossless and idempotent round tripping of the keyword values, not just their names.

timj · September 23, 2016, 5:43pm

That is guaranteed to work and does not require any FITS extension support.

parejkoj · September 23, 2016, 7:51pm

It does require some custom work on our part, would not be usable by anyone else, doesn’t support keys or values longer than 80 characters (how would this interact with CONTINUE?), and may not exactly round-trip floats (ASCII conversion), and doesn’t support more complicated data types (e.g. lists, dicts).

timj · September 23, 2016, 7:55pm

Of course.

The VAL_ header can use whatever standard scheme you want to represent long strings. The issue here is to not be lossy with our long keywords and surely the keywords aren’t going to be longer than 68 characters?

I’d ask the question here as to what our actual goals are with storing this information when FITS headers are clearly not suitable.

Don’t know what you mean. It’s just as bad as FITS at representing numbers in headers.

parejkoj · September 24, 2016, 6:47am

This is also the question I have. For internal representation, we should either use binary tables or just dump FITS. We’ll be better off for it in the long run (and probably the short run, too).

jbosch · September 26, 2016, 3:31pm

I’ve been struggling with how to answer this question, and I think it’s because individual PropertySets are frequently heterogeneous in terms of their goals: they’ll often contain internal values that must be round-tripped exactly, informative values that we want humans to be able to read (ideally without using our software), values that can be directly mapped to standard FITS header keys, and usually some values that fit into more than one of those categories. Once we’ve finishing building a PropertySet and want to write it to FITS, we’ve usually lost the context that tells us the categories for those values.

I can think of two ways to get that category information to the persistence layer without assuming FITS before we actually get to persistence:

We replace PropertySet with a custom class that also stores the categories. Unfortunately, that pushes it further away from being directly round-trippable to more flexible (but simpler) forms, like Python dicts or YAML, because we’d lose the categories when we read these from simpler forms.
We create some sort of global registry that maps keyword names to storage categories. That’s unfortunate simply because it introduces a new global list of keywords, and that leads to naming conflicts that might require namespaces to resolve.

parejkoj · September 26, 2016, 6:30pm

Why can’t we just use FITS binary tables for those, if we’re going to use FITS?

jbosch · September 26, 2016, 6:50pm

Because the stuff that probably should go in a FITS binary table is mixed up with stuff that can’t, because it’s a FITS standard header key (or something all astronomers have a strong expectation will be in the header). Take Exposure persistence as an example: Exposure creates a PropertySet, adds a bunch of keys to it for internal stuff, and then merges in the result of a call to Wcs::getFitsMetadata(), which includes a bunch of standard FITS header entries. We then pass the PropertySet down into the FITS I/O code, which now has no idea what can go in a separate binary table HDU and what has to go in the header.

parejkoj · September 26, 2016, 8:25pm

Ooof. Right. Though I’d probably argue that once we’ve made a PropertySet, nothing from it should go back into the header. However, that will break most functionality with existing code, so it’s not really a workable solution.

I guess related to your “global registry” could also be something like a property on every object that says “put this in the FITS header” since this is specifically a FITS problem. I don’t like this solution very much either.