This is another blog-like post like my last one, but this one is more relevant (despite being a bit historical), and it involves a problem I still think we have a chance to solve.
Once upon a time, the `afw.image.Exposure` class was just a `MaskedImage` with a `PropertySet` header and a `Wcs`. That made it pretty simple - everything about it had a nice, FITS-standard representation (we only had simple, FITS-standard WCSs in those days) - but a bit inconvenient. If you wanted to use some other object that represented some other aspect of our knowledge about the image, like a `Calib`, you had to pass it around separately. So it was a bit inevitable that we ended up attaching `Calib` to `Exposure`, but that still left things pretty simple - it’s fundamentally just a zero-point, and we could put that in the headers, too.
When we added `Psf` and `Detector` classes to the codebase, the practice of attaching everything we know about an `Exposure` to that `Exposure` was pretty well established (I think; this was a long time ago), but those presented a new problem: there wasn’t a FITS-standard way to save them in the same file. We didn’t save the `Detector`s at all - instead we had the Butler create a full `Camera` object and attach a `Detector` to every `Exposure` on read. For a long time we just wrote the PSFs into separate files using Boost.Serialization, which wrote non-standard, vaguely-human-readable text files that none of us really liked (it could also do XML, which I suppose we liked even less?). That meant it was up to whoever was working with the `Exposure` to load the `Psf` separately and attach it.
That all changed in 2013, when we decided to write `CoaddPsf`, a `Psf` class that was implemented by lazily evaluating all of the `Wcs`s and `Psf`s of all of the input images every time a PSF image was requested at a point on the coadd. That made it a much more complex and much larger data structure than any previous `Psf`, which in turn made persistence very tricky, and I didn’t want to try to solve that tricky problem with Boost.Serialization.
That was, in retrospect, probably a mistake: Boost.Serialization is actually excellent at saving complex data structures, and while there were some real problems with the code we used to interoperate with it, and its output format was non-ideal, I didn’t really consider the possibility of trying to understand and fix those problems. Writing a new persistence framework from scratch sounded much more promising, and having just written `afw::table` the year before, I had a very shiny hammer that made this problem look very nail-like.
But even that wasn’t quite enough for me to decide that inventing a new persistence framework was a good idea. What really sold me on the idea was the thought of addressing three long-standing problems at the same time; in addition to implementing persistence for `CoaddPsf`, we could:
- Start saving coadd provenance (what became the `CoaddInputs` class) - if you think about it, that’s essentially exactly the same information that `CoaddPsf` wants.
- Start saving `Psf`s and other complex objects to the same FITS file as the `Exposure`. This is something @RHL did with his PSF models in SDSS, and we were all looking forward to not having to pass those separate PSF files around with the corresponding `Exposure` FITS files.
And so `afw::table::io::Persistable` was born, a home-brew persistence framework that writes almost-arbitrary data structures (including polymorphic objects and shared-pointer relationships) to a sequence of FITS binary table HDUs. And while the home-brew-ness of it was a mistake, it does have some very nice features. Writing `Exposure` components to the same FITS file is a big win for convenience, and the binary tables themselves are at least a little bit self-describing and language-agnostic; I have already written pure-Python, non-DM code to read some things from them, and I’m sure sufficiently dedicated other people could do so as well (especially with some slightly better high-level docs from me). What I should have done was to adopt a mature serialization library (like Boost.Serialization) and try to write a FITS binary table archive back-end for it. I didn’t realize that until I was pretty far along, however, and by then it felt too late to back out.
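For readers who haven’t lived inside this framework, here’s a self-contained mock-up of its basic shape. The real classes in `afw::table::io` differ in their details (and I’m writing this from memory), so treat the names and signatures below as illustrative rather than the actual API:

```cpp
// Schematic mock-up of the afw::table::io design, not the real headers.
// The two key constraints it imposes: a persistable object must
// (1) inherit from a common C++ base class, and (2) describe itself in
// terms of afw::table catalogs, which the framework then writes out as a
// sequence of FITS binary table HDUs.
#include <string>
#include <vector>

struct Catalog {};  // stand-in for an afw::table catalog (one FITS HDU)

class OutputArchive {
public:
    // Each saved object contributes one or more catalogs; nested objects
    // saved through the same archive are how shared-pointer relationships
    // and polymorphism survive a round-trip.
    void saveCatalog(Catalog const & catalog) { _catalogs.push_back(catalog); }

private:
    std::vector<Catalog> _catalogs;
};

class Persistable {
public:
    virtual ~Persistable() = default;
    // A unique name, used to look up a matching factory at read time.
    virtual std::string getPersistenceName() const = 0;
    // Serialize this object into catalogs owned by the archive.
    virtual void write(OutputArchive & archive) const = 0;
};

// Anything that wants to ride along in an Exposure FITS file has to look
// like this, which is exactly the constraint discussed below: it must be
// C++, and it must sit at or above afw in the package dependency tree.
class MyComponent : public Persistable {
public:
    std::string getPersistenceName() const override { return "MyComponent"; }
    void write(OutputArchive & archive) const override {
        archive.saveCatalog(Catalog{});  // real code fills a real schema
    }
};
```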
Anyhow, along with that new persistence framework came some big improvements to `Exposure`: we split the management of the non-image components off into a separate object, `ExposureInfo` (which also had the new code for reading and writing them with `afw::table::io`). We also added the `ExposureRecord` and `ExposureCatalog` classes to make it possible to pass around `Exposure` components and metadata in table form.
All of that work seemed to have left us in a much better state, and on the surface that was true. But at this point, I think we’d made an enormous mistake in structuring the code (in addition to writing our own persistence framework), and we’ve been paying for it ever since. Consider:
- The new persistence framework (`afw.table.io`) was part of `afw`. It requires persistable objects to inherit from a base class (in C++) and use data structures defined in `afw::table`. That means persistable objects need to be defined in or above `afw` in the package dependency tree, and they need to be written in C++. That’s already pretty bad, but it gets worse.
- The `Exposure` class is also part of `afw`, and it holds its components as regular, strongly-typed data members. That means anything held by `Exposure` needs to be defined in or below `afw` in the package dependency tree (or at least have a polymorphic base class that is - but most of these components are concrete, nonpolymorphic classes). You can probably guess where I’m going here.
- For a class to be convenient (and, at some level, non-surprising) as an `Exposure` component, it also needs to be persistable with `Exposure`, and that means it needs to use `afw::table::io`. And hence any class that represents the properties of an image must be in `afw`, and written in C++. It’s no wonder that package has gotten so big.
But wait, that’s not all:
- `afw.table.SourceRecord` plays a similar (albeit milder) role for individual astronomical objects; any data structure beyond tabular data that we want to associate with those objects needs to be known to it and persisted with it (right now that’s “just” `Footprint`/`HeavyFootprint`/`SpanSet`/`PeakRecord`). So anything that plays that role needs to go in `afw` (and be written in C++), too.
- `afw.table.io` (a package lower bound), and `afw.table.ExposureRecord` and `afw.table.SourceRecord` (package upper bounds), are all not just part of `afw`, but part of `afw.table`. So any class that describes an image or an astronomical object in a persistent way actually needs to go in `afw.table` (or have a circular dependency with `afw.table`, which is what we’ve actually done).
And here’s the kicker:
- Any time we add a new class that describes an image or an astronomical object in a persistent way, we need to modify at least one of our core low-level primitives, `Exposure` and `SourceRecord`, and that includes updating their persistence code (and, unless we have a very good reason not to, maintaining backwards compatibility with files we’ve already written).
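To make that concrete, here’s a deliberately simplified sketch of the shape `Exposure` has ended up with; the members here are illustrative, not a faithful copy of the real class (most of the real components live on `ExposureInfo`):

```cpp
// Deliberately simplified sketch of the problem shape, not the real
// Exposure/ExposureInfo code.  Every component is a concrete, strongly
// typed data member, so adding a new one means editing this class (and
// its afw::table::io persistence code) inside afw itself.
#include <memory>

class Psf;       // must live in or below afw...
class Calib;     // ...and be written in C++...
class Detector;  // ...and use afw::table::io to persist.

class Exposure {
public:
    std::shared_ptr<Psf> getPsf() const { return _psf; }
    std::shared_ptr<Calib> getCalib() const { return _calib; }
    std::shared_ptr<Detector> getDetector() const { return _detector; }
    // A new component class starts here, then ripples out into the FITS
    // read/write code and the backwards-compatibility handling.
private:
    std::shared_ptr<Psf> _psf;
    std::shared_ptr<Calib> _calib;
    std::shared_ptr<Detector> _detector;
};
```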
We got to this point by doing what seemed convenient and trusting our own intuition and judgement as to what made for good interfaces, design, and code organization. The lesson I’ve learned since 2013 is that our judgement and especially our intuition in those areas are often terrible, especially given that most of us (in Science Pipelines, anyway) think like scientists, not software engineers.
Good design principles are often counter-intuitive, at least at first, and that’s especially true of the classic OO design principles, four (of five!) of which we’ve very explicitly violated here (which ones, and how, are left as an exercise for the reader). This is made more difficult by the fact that those design principles often encourage splitting software up into smaller units, and that exacerbates very real problems with packaging tooling and ecosystems that often (and legitimately!) feel much more important in the real world than adherence to abstract design principles (I think the discussion on RFC-539 is a perfect example of this).
Anyhow, I think there are a lot of ways out of this mess, and to first order I think they’re pretty straightforwardly derived from actually applying the dependency inversion principle in particular, but the devil is of course in the details…and the disruption. Given where we’re at now, any changes to these core classes are going to be quite disruptive. But if we ever want to get our package organization in order, or write more stuff in Python, or avoid churn at the bottom of the dependency tree, this is the hurdle we’ve got to overcome.
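To give a flavor of what I mean by dependency inversion here (this is a sketch of the principle only, not a worked design, and all the names are made up):

```cpp
// A sketch of the dependency-inversion direction, not a worked design.
// The abstraction lives at the bottom of the dependency tree; Exposure
// depends only on it, and concrete components (with their persistence)
// can be defined in higher-level packages, or in Python.
#include <map>
#include <memory>
#include <string>
#include <utility>

// Defined low in the tree: the only thing Exposure knows about its
// components.  Persistence hooks would hang off this interface, too.
class ExposureComponent {
public:
    virtual ~ExposureComponent() = default;
};

class Exposure {
public:
    // Components are attached and retrieved by name; Exposure itself
    // never needs to change when a new component type is invented.
    void setComponent(std::string const & name,
                      std::shared_ptr<ExposureComponent> component) {
        _components[name] = std::move(component);
    }
    std::shared_ptr<ExposureComponent> getComponent(std::string const & name) const {
        auto it = _components.find(name);
        return it == _components.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, std::shared_ptr<ExposureComponent>> _components;
};

// Defined in some downstream package, without touching afw:
class MyFancyPsf : public ExposureComponent { /* ... */ };
```

The hard parts - type-safe accessors, persistence registration, and compatibility with all the files we’ve already written - are exactly where the details and the disruption live.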