Calibration product certification in Gen3 Butler

Here’s an API proposal for the low-level calibration certification butler operation in Gen3. I hope it’s clear enough from context how things might work such that even those who aren’t super familiar with Gen3 classes can evaluate it (if not, please ask!):
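In rough outline, I'm imagining something like the sketch below. The signature, types, and docstrings here are illustrative only (pieced together from the discussion of the refs argument, the validity range, and the replace flag), not a final interface:

```python
from typing import Iterable

from lsst.daf.butler import DatasetRef, Timespan


class Registry:
    """Stub for illustration; only the proposed method is shown."""

    def certify(self, collection: str, refs: Iterable[DatasetRef],
                timespan: Timespan, *, replace: bool = True) -> None:
        """Associate one or more datasets with a validity range in a
        CALIBRATION collection.

        Parameters
        ----------
        collection : `str`
            Existing CALIBRATION collection to which the datasets will
            be appended.
        refs : iterable of `DatasetRef`
            Datasets to certify; these must already exist in a RUN
            collection.
        timespan : `Timespan`
            Validity range to associate with these datasets.
        replace : `bool`, optional
            If `True` (default), replace or truncate any existing
            validity ranges that overlap; if `False`, raise on overlap.
        """
        raise NotImplementedError
```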

I imagine a higher-level (and command-line accessible) interface would differ primarily in that it would provide convenient ways to generate the refs argument (e.g. “all datasets in some other collection matching some of our usual query criteria”).

I’d like to get feedback from at least the CPP team and others with experience managing calibrations in other surveys; please take a look and see:

  • if there’s anything you need to do that you can’t do with this;
  • whether there is any extra checking or other behavior you’d want to add;
  • if there’s anything else you’d want to change here.

For PFS, we recently realised that we will likely want to have multiple validity periods for a single calibration to support calibrations taken at the end of the night. Will this proposal support that, e.g., by calling certify multiple times?

I think this seems like a reasonable path forward. A few questions:

  1. Will the historical validity ranges be stored when replace=True? Will there be a way to determine which calibrations were applied to some processed data without this?
  2. Do calibration collections have to already exist to be used (is “collection to which datasets will be appended” a restriction or just shorthand for the usual purpose of appending to the collection of valid calibrations)?
  3. Can calibrations be transferred between collections, or is the assumption that there will only ever be a single certify call moving datasets from the origin RUN to the final calibration collection? Being able to have an intermediate collection would make the validation step much easier (calibrations start in their origin run, are certified into a test collection with ~infinite validity to determine the correct validity ranges, and are then re-certified for use with those ranges into a general use collection).
  4. replace=True seems like a dangerous default, leading me to another question like #1: will there be a way to undo/rollback certification without re-certifying new replacements?

Yes, I just wrote up the initial table schema that will back this, and was careful to define the primary key in such a way that the same dataset can appear multiple times.
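For concreteness, a rough sketch of what I mean (table and column names here are placeholders, not the actual schema): because the validity-range start is part of the primary key, the same dataset can be certified into the same collection with multiple distinct validity ranges.

```python
import sqlalchemy as sa

metadata = sa.MetaData()

# Placeholder schema sketch: since valid_begin is part of the primary
# key, (collection_id, dataset_id) alone is not unique, so the same
# dataset can appear with several different validity ranges.
calibration_association = sa.Table(
    "calibration_association",
    metadata,
    sa.Column("collection_id", sa.BigInteger, nullable=False),
    sa.Column("dataset_id", sa.BigInteger, nullable=False),
    sa.Column("valid_begin", sa.TIMESTAMP, nullable=False),
    sa.Column("valid_end", sa.TIMESTAMP, nullable=False),
    sa.PrimaryKeyConstraint("collection_id", "dataset_id", "valid_begin"),
)
```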

No and yes, respectively - there will be other ways to find the exact provenance, but this method would not attempt to preserve old validity ranges. Of course, if you do want to preserve validity ranges, you can first create a new calibration collection and then just modify that. I haven’t thought about the APIs for copying these collections, but they clearly need to exist.

Yes. They need to be explicitly registered before this certify method is called. It’s actually a little difficult to make that happen implicitly, even though that would be convenient (issues related to transactions and concurrency vs. table creation), so I would like to leave that as a separate step.

Datasets can only be created in a RUN collection, and have to remain in that RUN forever. But they can be added to and removed from any number of CALIBRATION (or TAGGED) collections over their lifetime.

My vision for the workflow (which hasn’t had a lot of deep thought) was:

  • CPP Tasks would put datasets to a RUN collection;
  • [optional] later CPP Tasks could either use that RUN as an input when producing downstream datasets into a different RUN, or add those downstream datasets to the same RUN by using it as both input and output - the only constraint is that there would be no validity-range lookups for those inputs, so this is all confined to one validity range;
  • Operator would call certify with all or most of the contents of one or more of those RUN collections, associating them with a validity range in a new CALIBRATION collection.
  • The above steps would be repeated until the CALIBRATION collection is complete over all time periods and dataset types of interest.
  • The “default calibration collection” for an instrument would actually be a CHAINED collection, so defining or modifying the default calibrations is just a matter of redefining the child collection(s) it references.
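Assuming the certify method lands roughly as proposed, that workflow might look something like the following (collection names are made up for illustration):

```python
import astropy.time
from lsst.daf.butler import Butler, CollectionType, Timespan

butler = Butler("repo", writeable=True)
registry = butler.registry

# The CALIBRATION collection must be registered explicitly first.
registry.registerCollection("instr/calib/v1", CollectionType.CALIBRATION)

# Gather the biases the CPP run produced into its RUN collection...
refs = list(registry.queryDatasets("bias", collections="instr/calib/run1"))

# ...and certify them into the CALIBRATION collection with a validity
# range (certify here is the proposed method, not an existing one).
timespan = Timespan(
    astropy.time.Time("2020-01-01", scale="tai"),
    astropy.time.Time("2020-02-01", scale="tai"),
)
registry.certify("instr/calib/v1", refs, timespan)

# The instrument's default calibrations would be a CHAINED collection,
# so swapping in new calibrations is just redefining that chain.
registry.registerCollection("instr/calib", CollectionType.CHAINED)
registry.setCollectionChain("instr/calib", ["instr/calib/v1"])
```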

I’m certainly happy to change the default; I have no real preference for that.

It would be possible to wrap many certify calls and other registry operations in a `with registry.transaction():` context block, and roll back at any time before that completes, but once it’s done, it would not be something you can roll back. I suspect just copying calibration collections (which are lightweight things anyway) before modifying them (as a matter of practice, not API) is more what you want.
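E.g., something like this hypothetical helper, which certifies several batches as one all-or-nothing operation:

```python
def certify_batches(registry, collection, batches):
    """Certify several (refs, timespan) batches atomically.

    Hypothetical helper: if any certify call raises, the transaction
    context rolls back everything, so nothing is committed. Once the
    block exits cleanly, the changes are permanent.
    """
    with registry.transaction():
        for refs, timespan in batches:
            registry.certify(collection, refs, timespan)
```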

After thinking about @czw’s concerns about replace=True and some implementation constraints, I think I’m going to just remove that option and make the replace=False behavior the only one (so it’d be a hard error and complete rollback to try to certify a calibration that overlaps an existing one). We’ll just provide a separate operation that can decertify calibrations to meet that use case.

That checking should be very straightforward to implement in a concurrency-safe way both in PostgreSQL (via its custom range types and exclusion constraints) and in SQLite (where we can rely on its aggressive locking to make a multi-statement SELECT-then-INSERT approach atomic).
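In PostgreSQL, for example, one could store the validity range as a single range column and let the database reject overlaps atomically. A rough sketch via SQLAlchemy (names are again placeholders; the scalar-equality columns in a GiST index require the btree_gist extension, and data_id_hash stands in for what would really be per-dimension columns):

```python
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import ExcludeConstraint, TSTZRANGE

metadata = sa.MetaData()

# Rough PostgreSQL-only sketch. The exclusion constraint rejects any two
# rows in the same collection, for the same dataset type and data ID,
# whose validity ranges overlap ("&&" is the range-overlap operator),
# even under concurrent inserts.
# Requires: CREATE EXTENSION btree_gist;
calibration_validity = sa.Table(
    "calibration_validity",
    metadata,
    sa.Column("collection_id", sa.BigInteger, nullable=False),
    sa.Column("dataset_type_id", sa.BigInteger, nullable=False),
    sa.Column("data_id_hash", sa.LargeBinary, nullable=False),
    sa.Column("dataset_id", sa.BigInteger, nullable=False),
    sa.Column("validity", TSTZRANGE, nullable=False),
    ExcludeConstraint(
        ("collection_id", "="),
        ("dataset_type_id", "="),
        ("data_id_hash", "="),
        ("validity", "&&"),
        using="gist",
    ),
)
```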

I don’t want to contradict myself, but does this mean that certification needs to include explicit valid end dates for everything, or will there be a recertify method that allows the non-overlapping dates to be set on an existing calibration before a new one is certified? The workflow in my head has been that we certify all new calibrations with a validStart, and let them be valid until “the future”, under the assumption that the V&V system will make it obvious when we need to construct a replacement. My concern with replace=True was that it makes overwriting very easy. I’m fine with it being hard (recertify/decertify followed by a new certify, with errors thrown if the new certify still overlaps), but an update process will need to exist.

For submission of a batch of calibrations, we will want a method that takes just the start dates and works out the end dates automatically when they are not specified explicitly. Or at least we will need a pre-submission method that works all that out generically.
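Something along these lines (a hypothetical pre-submission helper, not a proposed API): sort the start dates and close each range at the next start, leaving the last one open-ended.

```python
def derive_validity_ranges(starts, end_of_time=None):
    """Turn a set of validity start times into half-open (begin, end)
    ranges: each calibration is valid until the next one begins, and
    the last remains open-ended (``end_of_time``, i.e. "the future").

    Hypothetical helper for illustration only.
    """
    starts = sorted(starts)
    return [
        (begin, starts[i + 1] if i + 1 < len(starts) else end_of_time)
        for i, begin in enumerate(starts)
    ]
```

For example, derive_validity_ranges(["2020-01-01", "2020-06-01"]) yields [("2020-01-01", "2020-06-01"), ("2020-06-01", None)].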

Very good point. This was the workflow in my head as well, but I hadn’t noticed the inherent conflict between that and refusing inserts that overlap. That makes the PostgreSQL-range solution I had in mind much less complete (sadly, EXCLUSION constraints can’t be used with INSERT ... ON CONFLICT DO UPDATE).

I think I’m still favoring it, along with an explicit decertify step, because it still provides what seems by far the simplest way of guaranteeing “errors thrown if the new certify still overlaps”, even in the presence of concurrency (which shouldn’t be common, but I don’t want even rare things to have a chance of corrupting the registry). And it should provide the best performance as well, since it comes with custom indexes.

The alternatives I can think of (getting into the weeds for the database experts here) all involve explicit table locking around a multi-statement implementation of what appears to the user as a single action (and that’s what we’ll have to do with SQLite regardless, but it’s going to lock whether we ask it to or not). That would involve some combination of INSERT and SELECT if we still make decertify a separate step, but it would also allow us to support replace=True (or some slightly safer variant, like one that only rewrites the end timestamps if they were previously infinite) by including some UPDATE or DELETE in that implementation.
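For concreteness, the SQLite-style version would look roughly like this (again a sketch against the placeholder schema above; SQLite’s database-level write lock makes the two statements effectively atomic within one transaction):

```python
import sqlalchemy as sa


def certify_row(conn, table, row):
    """SELECT-then-INSERT sketch: refuse the insert if any existing row
    has an overlapping validity range. ``row`` is a dict of the table's
    column values.

    In the real table the match would be on dataset type and data ID,
    not dataset_id; this follows the placeholder schema above, and
    open-ended ranges would need a sentinel far-future end value.
    """
    overlaps = conn.execute(
        sa.select(sa.func.count()).select_from(table).where(
            table.c.collection_id == row["collection_id"],
            table.c.dataset_id == row["dataset_id"],
            table.c.valid_begin < row["valid_end"],
            table.c.valid_end > row["valid_begin"],
        )
    ).scalar()
    if overlaps:
        raise RuntimeError("New validity range overlaps an existing one.")
    conn.execute(table.insert().values(**row))
```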

This also made me wonder again about implementing this with each row having only the begin timestamp, with the end timestamp implied by the presence of another row with the same data ID and a later begin timestamp. That certainly handles the case where the last entry was open-ended and we want to close it by inserting a new one. But I think it makes it harder to check whether the new entry is between existing ones (regardless of whether that should be an error or a command to replace), and we end up with table-locking and a multi-statement implementation anyway.

Where are you imagining these coming from? I don’t think this is terribly hard, but the CPP people have thus far seemed to be unusually clear about not wanting support for making more than one validity range (and often more than one dataset type) at a time. And import/export scenarios seem safer if they include begin and end timestamps explicitly.

I was thinking of this as a convenience function for things like defects and the other curated calibrations. This wouldn’t affect export at all since by that point the end dates would be set. Maybe the convenience function that I’m talking about simply goes in the code that reads the curated calibrations and it automatically defines the end date if absent.