Package version tracking and checking ("stack provenance")

provenance
cmdlinetask

(Paul Price) #1

I’ve just merged DM-3372, which implements RFC-169 (see also this previous discussion on CLO). It adds run-time mechanisms for collecting the list of packages in use and their versions (in the ‘base’ package), and integrates them with CmdLineTask (in the ‘pipe_base’ package) to track and check which versions are being used, so that users don’t unknowingly produce data with multiple versions of the code. This matters because data produced with a mix of versions is difficult to debug: its provenance is uncertain.

This check can be disabled (e.g., for developers who are actively changing package versions and understand the implications) using the --no-versions command-line flag. The persisted set of package versions can be overwritten (e.g., when you’ve changed versions deliberately and understand the implications) using the --clobber-versions command-line flag; note that --clobber-versions is subject to race conditions (just like --clobber-config) if you run it from several processes at once.
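To make the behaviour of the check and the two flags concrete, here is a minimal sketch of the logic. It is not the code in pipe_base: the function name check_versions, the JSON file, and the keyword arguments are all made up for illustration, and it assumes that any difference (including a package present on only one side) counts as a mismatch and that disabling the check also skips writing.

```python
import json
import os


def check_versions(path, current, no_versions=False, clobber_versions=False):
    """Compare `current` ({package: version}) with the set persisted at `path`.

    Hypothetical helper for illustration; not the pipe_base implementation.
    """
    if no_versions:
        # Check disabled: nothing is read or written (an assumption).
        return
    if clobber_versions or not os.path.exists(path):
        # First run, or a deliberate overwrite. Two processes doing this at
        # the same time can race, which is the caveat noted above.
        with open(path, "w") as f:
            json.dump(current, f, sort_keys=True)
        return
    with open(path) as f:
        persisted = json.load(f)
    # Collect every package whose version differs, or that is on one side only.
    names = sorted(set(persisted) | set(current))
    diffs = {n: (persisted.get(n), current.get(n))
             for n in names if persisted.get(n) != current.get(n)}
    if diffs:
        raise RuntimeError("Version mismatch (persisted, current): %r" % diffs)
```

The first call against an empty output location writes the versions; later calls with the same versions pass silently, and a call with a changed version raises unless one of the two flags is set.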

Please let me know if you have any trouble with this new feature.

Some details

There are a few different types of packages, and their versions are collected in different ways:

  1. Run-time libraries (e.g., cfitsio, fftw): we get their version from interrogating the dynamic library. (There’s a new facility in the ‘base’ package that helps with pulling symbols out of dynamic libraries.)
  2. Python modules (e.g., afw, numpy): we get their version from the __version__ module variable, set at build time. Note that this means that we’re only aware of modules that have already been imported. (galsim is also in this group even though we only use it through the library, because the library doesn’t currently provide version information; that’ll change soon.)
  3. Some other packages provide no run-time accessible version information (e.g., astrometry_net): we get their version from interrogating the environment. Currently, that means EUPS; if EUPS is replaced or dropped then we’ll need to consider an alternative means of getting this version information.
  4. Local versions of packages (a non-installed EUPS package, selected with setup -r /path/to/package): we identify these through the environment (EUPS again) and use as a version the path supplemented with the git SHA and, if the git repo isn’t clean, an MD5 of the diff.
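As a rough illustration of items 2 and 4, the sketch below gathers __version__ from modules that have already been imported, and builds a version string for a local checkout from its path, git SHA, and an MD5 of the diff. The function names and the "path@sha+md5" format are assumptions made for this example, not what the ‘base’ package actually does.

```python
import hashlib
import subprocess
import sys


def python_module_versions(modules=None):
    """Return {module name: __version__} for already-imported modules.

    Modules that have not been imported yet are invisible here, as are
    modules that do not define a string __version__.
    """
    modules = sys.modules if modules is None else modules
    versions = {}
    for name, module in list(modules.items()):
        version = getattr(module, "__version__", None)
        if isinstance(version, str):
            versions[name] = version
    return versions


def local_version(path, sha, diff=b""):
    """Build a version string for a local checkout (format is hypothetical)."""
    version = "%s@%s" % (path, sha)
    if diff:
        # The repo is not clean: fingerprint the uncommitted changes.
        version += "+" + hashlib.md5(diff).hexdigest()
    return version


def git_local_version(path):
    """Query git for the SHA and the diff of the checkout at `path`."""
    sha = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=path).decode().strip()
    diff = subprocess.check_output(["git", "diff", "HEAD"], cwd=path)
    return local_version(path, sha, diff)
```

Calling python_module_versions() with no arguments inspects sys.modules, so its result depends on when it is called: anything imported later is not included.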

(Jim Bosch) #2

In practice, when this code is executed, have all of the LSST packages that could affect the processing been imported?


(Paul Price) #3

It’s run after Task creation, so it should have everything of importance. Now, of course, there’s no way it can pick up on anything that gets imported in a function that hasn’t fired yet (at least, not without great complications in our system), but it should pick up the important stuff.


(Jim Bosch) #4

Great, that’s what I was hoping. In particular, it’ll catch all the plugin packages that were imported in the configuration.