Somewhere between the CFITSIO version used by the stack (3.36) and the version I’ve been using personally (3.39), fits_write_key (and other header-writing routines) started transforming all header keys (including HIERARCH keys) to upper case. I just checked, and this is consistent with the FITS standard, which explicitly disallows lowercase for header keys.
It’s hard to argue with that change, but it means the strings we use no longer round-trip, and that causes at least one test in the stack to fail. I think the right thing to do is to make that test more permissive, but pushes me further along the track of believing we should consider giving up on FITS as a format we use internally to persist our own objects (we obviously will need to continue to support it as import/export, but round-tripping isn’t as important there); not being able to round-trip the keys in a dict to the most natural FITS version of a dict seems like a big problem.
Yeah, though up until now I thought it was close enough (at least if you allow HIERARCH) that we could live with the incompatibilities in most of cases.
I like your proposal for how to do the mapping safely, and I suppose we should add something like that to all of our routines that map PropertySet / PropertyList to FITS, probably with some kind of extra logic to ensure standard FITS header entries are written directly. As it is, you can put anything in a PropertySet / PropertyList, and there’s no guarantee at all that you’ll get it back out.
Does that make sense? As for that “extra logic”, do you think it’d be better to write anything that’s a valid FITS key directly, or define a global mapping somewhere of special PropertySet / PropertyList keys that need to be mapped to standard FITS header keys? The latter seems both more flexible and more fragile.
I think that would be straightforward. If uppercase and <= 8 characters use a simple header; uppercase and > 8 characters use HIERARCH; else use the KEY_/VAL_ scheme. I think you may be able to repeat the same keyword to make things easy:
A danger here is that we’re using HIERARCH (as given above) in a non-standard way. HIERARCH is not a “long FITS keyword” system, it’s a system for defining hierarchical keywords (e.g. lsst.afw.key = value). Using it as a way to deal with long keywords could break at any time (and in fact did when I attempted to upgrade to cfitsio 3.38).
I know. See this Community post where I discuss this in more detail, with quotes from an email exchange with Bill Pence. Summary: using non-namespaced HIERARCH values for long keywords broke in 3.38beta because they were experimenting with some of the other long keywords conventions, and it would likely have to break in order to natively support one of those conventions in the future.
@parejkoj, would you say that HIERARCH is sufficiently broken that you think we should use @timj’s KEY_/VAL_ scheme for anything longer than 8 chars, in addition to anything that uses invalid characters? That would prevent anyone else from reading long keys that only use valid characters, but it sounds like it’d be safer.
Trying to round-trip FITS header keywords is a pain, no question. There are a few different competing FITS conventions (or not-yet-conventions) on this topic. I wonder if we should review those and maybe choose one to support? Assuming we’re going to continue with FITS, or at least with saving things in the FITS header. Another option would be to just put all these values into a binary table HDU (which is one of the long keyword ideas under consideration). Unfortunately, I don’t have the list immediately to hand: I think there was a summary email sent to the FITSBITS list in the past couple years, but I can’t find it in my own email archive.
The FITS Tech Group have discussed a variety of options. The notion of replacing / supplementing the original simple ASCII FITS headers with multi-column binary tables for hierarchical metadata, including mixed cases (and perhaps, for that matter, UTF-8, not just ASCII), but also data typing, units, timestamps, etc, has come up before. This option is already completely legal FITS. Then the nominal FITS headers could be used solely for structural metadata such as checksums and compression. Table compression, including for these tabular header structures, is also now part of the standard.
Which is to say that you should be considering the requirements for lossless and idempotent round tripping of the keyword values, not just their names.
It does require some custom work on our part, would not be usable by anyone else, doesn’t support keys or values longer than 80 characters (how would this interact with CONTINUE?), and may not exactly round-trip floats (ASCII conversion), and doesn’t support more complicated data types (e.g. lists, dicts).
The VAL_ header can use whatever standard scheme you want to represent long strings. The issue here is to not be lossy with our long keywords and surely the keywords aren’t going to be longer than 68 characters?
I’d ask the question here as to what our actual goals are with storing this information when FITS headers are clearly not suitable.
Don’t know what you mean. It’s just as bad as FITS at representing numbers in headers.
This is also the question I have. For internal representation, we should either use binary tables or just dump FITS. We’ll be better off for it in the long run (and probably the short run, too).
I’ve been struggling with how to answer this question, and I think it’s because individual PropertySets are frequently heterogeneous in terms of their goals: they’ll often contain internal values that must be round-tripped exactly, informative values that we want humans to be able to read (ideally without using our software), values that can be directly mapped to standard FITS header keys, and usually some values that fit into more than one of those categories. Once we’ve finishing building a PropertySet and want to write it to FITS, we’ve usually lost the context that tells us the categories for those values.
I can think of two ways to get that category information to the persistence layer without assuming FITS before we actually get to persistence:
We replace PropertySet with a custom class that also stores the categories. Unfortunately, that pushes it further away from being directly round-trippable to more flexible (but simpler) forms, like Python dicts or YAML, because we’d lose the categories when we read these from simpler forms.
We create some sort of global registry that maps keyword names to storage categories. That’s unfortunate simply because it introduces a new global list of keywords, and that leads to naming conflicts that might require namespaces to resolve.
Because the stuff that probably should go in a FITS binary table is mixed up with stuff that can’t, because it’s a FITS standard header key (or something all astronomers have a strong expectation will be in the header). Take Exposure persistence as an example: Exposure creates a PropertySet, adds a bunch of keys to it for internal stuff, and then merges in the result of a call to Wcs::getFitsMetadata(), which includes a bunch of standard FITS header entries. We then pass the PropertySet down into the FITS I/O code, which now has no idea what can go in a separate binary table HDU and what has to go in the header.
Ooof. Right. Though I’d probably argue that once we’ve made a PropertySet, nothing from it should go back into the header. However, that will break most functionality with existing code, so it’s not really a workable solution.
I guess related to your “global registry” could also be something like a property on every object that says “put this in the FITS header” since this is specifically a FITS problem. I don’t like this solution very much either.