Butler changes to allow CameraMapper to work with repositories that are not on the local filesystem

Butler and CameraMapper have been refactored so that CameraMapper can be used with repositories that are not on the local file system. The draft version of LDM-463 has been updated. Major changes are outlined below. As always, if this breaks something for you please reach out to me for help as needed.
##General User Visible Changes
###Butler should init mapper
Butler establishes repository relationships based on its inputs, and some of these relationships affect mapper configuration. When passing a mapper into Butler init, the mapper should be an importable string (e.g. lsst.daf.persistence.CameraMapper) or a class object, NOT a class instance.

At some point this may change from “should” to “must”.
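As a rough illustration of the string-or-class rule, here is a minimal sketch. The helper name `resolve_mapper_class` is hypothetical and is not part of lsst.daf.persistence; it only shows one way an initializer could accept an importable string or a class while rejecting instances.

```python
import importlib
import inspect


def resolve_mapper_class(mapper):
    """Return a mapper class from an importable string or a class object.

    Hypothetical helper for illustration; not the Butler implementation.
    """
    if isinstance(mapper, str):
        # Split "package.module.ClassName" into module path and class name.
        module_name, _, class_name = mapper.rpartition(".")
        if not module_name:
            raise ValueError(f"not a dotted import path: {mapper!r}")
        mapper = getattr(importlib.import_module(module_name), class_name)
    if not inspect.isclass(mapper):
        # An instance has already been constructed, so repository-derived
        # arguments can no longer be passed to its initializer.
        raise TypeError("mapper must be an importable string or a class, not an instance")
    return mapper


# Demo with a standard-library class standing in for a mapper.
from_string = resolve_mapper_class("collections.OrderedDict")
from_class = resolve_mapper_class(dict)
```

Passing an instance, such as `resolve_mapper_class(dict())`, raises `TypeError` in this sketch.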
###Bypass Functions
Bypass functions do not participate in the Butler’s deferred-read mechanism.
See https://ldm-463.lsst.io/v/draft/#bypass-functions
Note: until the draft version of LDM-463 builds, you can replace draft in the URL with DM-8686, e.g. https://ldm-463.lsst.io/v/DM-8686/#bypass-functions. This is true for all the following links to LDM-463.
###Init RepositoryArgs via dict
It can be very verbose to type lsst.daf.persistence.RepositoryArgs (or some abbreviation; dafPersist.RepositoryArgs is common) for every item in inputs and outputs in Butler init. To help with this, a dict can be used to initialize a RepositoryArgs, and can be passed directly as the value of inputs and outputs in the Butler initializer.
https://ldm-463.lsst.io/v/draft/#id1/
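The dict shorthand can be pictured with the sketch below. `RepositoryArgsSketch` is a stand-in written for this example, and its field names (`root`, `mapper`, `mode`) are assumptions; check the real RepositoryArgs docstring for the actual arguments. The `normalize` helper is also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepositoryArgsSketch:
    """Stand-in for RepositoryArgs; the field names are assumptions."""
    root: Optional[str] = None
    mapper: Optional[object] = None
    mode: Optional[str] = None


def to_repository_args(item):
    """Accept an existing args object or a plain dict of its fields."""
    if isinstance(item, RepositoryArgsSketch):
        return item
    if isinstance(item, dict):
        # The dict keys become keyword arguments of the args object.
        return RepositoryArgsSketch(**item)
    raise TypeError(f"cannot build repository args from {type(item).__name__}")


def normalize(items):
    """Accept one item or a list of items; return a list of args objects."""
    if not isinstance(items, (list, tuple)):
        items = [items]
    return [to_repository_args(item) for item in items]


# A dict and an explicit args object end up in the same form.
inputs = normalize([{"root": "/data/parent"}, RepositoryArgsSketch(root="/data/other")])
outputs = normalize({"root": "/data/child", "mode": "rw"})
```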

##Changes that Affect Mapper Authors

###ButlerLocation requires and uses a Storage Interface.
https://ldm-463.lsst.io/v/draft/#butlerlocation-requires-storage-interface
###Mappers Should Provide Access to their Registry
Mappers should provide access to their registry via the method getRegistry(self). This allows mappers in child repositories to use the registry from a parent.
https://ldm-463.lsst.io/v/draft/#mappers-should-provide-access-to-their-registry
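A minimal sketch of how a child mapper could reuse a parent's registry through `getRegistry`. `RegistrySketch`, `MapperSketch`, the `lookup` method, and the fallback rule are all hypothetical stand-ins; only the `getRegistry(self)` method name and the `parentRegistry` argument come from the description above.

```python
class RegistrySketch:
    """Stand-in registry: maps a visit number to a filter name."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def lookup(self, visit):
        return self.rows[visit]


class MapperSketch:
    """Stand-in mapper that exposes its registry via getRegistry."""

    def __init__(self, registry=None, parentRegistry=None):
        # Assumed rule: use the parent's registry when this repository
        # has no registry of its own.
        self.registry = registry if registry is not None else parentRegistry

    def getRegistry(self):
        return self.registry


parent = MapperSketch(registry=RegistrySketch({12345: "r"}))
# The caller hands the parent's registry to the child; the child never
# reaches into the parent repository itself.
child = MapperSketch(parentRegistry=parent.getRegistry())
```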
###CameraMapper input args
The Butler needs to be able to pass arguments to CameraMapper during init. The two that it currently needs to pass are the RepositoryCfg of the associated repository as repositoryCfg, and the parent Registry as parentRegistry. I’ve modified subclass mappers to take these args and pass them on to the CameraMapper superclass.
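For mapper authors, the pattern looks roughly like the sketch below. `CameraMapperSketch` is a stand-in for the real superclass, and `MyCameraMapper` and its `detectorLayout` argument are hypothetical; the argument names `repositoryCfg` and `parentRegistry` are the ones described above.

```python
class CameraMapperSketch:
    """Stand-in for the CameraMapper superclass."""

    def __init__(self, repositoryCfg=None, parentRegistry=None, **kwargs):
        self.repositoryCfg = repositoryCfg
        self.parentRegistry = parentRegistry


class MyCameraMapper(CameraMapperSketch):
    """Hypothetical camera-specific subclass."""

    def __init__(self, detectorLayout=None, **kwargs):
        # The camera-specific argument (hypothetical) is consumed here.
        self.detectorLayout = detectorLayout
        # repositoryCfg, parentRegistry, and anything else the caller
        # passes are forwarded untouched to the superclass.
        super().__init__(**kwargs)


mapper = MyCameraMapper(
    detectorLayout="2x2",
    repositoryCfg={"root": "/data/child"},
    parentRegistry="parent-registry-placeholder",
)
```

Forwarding through `**kwargs` means the subclass does not need to change again if the superclass starts accepting further arguments.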
##Butler Power-User Changes
###Storage and Storage Interface Classes
We wanted to allow Butler (and CameraMapper) to work with files that are not in the local POSIX filesystem (e.g. in Swift storage).
https://ldm-463.lsst.io/v/draft/#storage-layer
In the next story, DM-7468 (newer than DM-8686; it implements a SwiftStorage interface class), StorageInterface is an abstract base class (ABC), and the API is documented in the docstrings of that file.
https://ldm-463.lsst.io/v/DM-7468/#storage-layer

There are a few Posix-only methods in the PosixStorage class to support old-butler repositories:

  • v1RepoExists(root): given a filesystem path see if a butler “version 1 repository” exists.
  • getParentSymlinkPath(root): For Butler V1 Repositories only, if a _parent symlink exists, get the location pointed to by the symlink.
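The symlink lookup described above can be sketched as follows. This is a standalone function written for illustration, not the PosixStorage code; in particular, resolving a relative link target against the repository root and normalizing the result are assumptions.

```python
import os
import tempfile


def get_parent_symlink_path(root):
    """Return the target of root/_parent if it is a symlink, else None."""
    link = os.path.join(root, "_parent")
    if not os.path.islink(link):
        return None
    target = os.readlink(link)
    # A relative link target is relative to the directory holding the link.
    if not os.path.isabs(target):
        target = os.path.join(root, target)
    return os.path.normpath(target)


# Demo in a throwaway directory: child/_parent -> ../parent
with tempfile.TemporaryDirectory() as tmp:
    parent_dir = os.path.join(tmp, "parent")
    child_dir = os.path.join(tmp, "child")
    os.mkdir(parent_dir)
    os.mkdir(child_dir)
    os.symlink(os.path.join("..", "parent"), os.path.join(child_dir, "_parent"))
    found = get_parent_symlink_path(child_dir)
    expected = os.path.normpath(parent_dir)
    missing = get_parent_symlink_path(parent_dir)
```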

The Storage class is a helper for accessing storage interface functions, although I think maybe this is misguided. See the documentation; I’m not sure whether this is good syntactic sugar or whether it makes the class do too much.
###New Butler init Signature Being Used More.
The new Butler initialization API is being used in more places. This is `Butler(inputs=…, outputs=…)`. The arguments describe repositories. More information is in https://ldm-463.lsst.io/v/draft/#id1
https://ldm-463.lsst.io/v/draft/#repositoryargs
###Repositories do not Access their Parents Directly
In New Butler the CameraMapper does not access parents directly (via _parent symlinks). In the cases where child repositories need to access elements of parent repositories, those elements are passed to the children: parent registry, parent mapper, and mapper init.
https://ldm-463.lsst.io/v/draft/#cameramapper-no-longer-accesses-parents-directly

###Relative and Absolute Paths for Repository Root and Parents
Using relative paths can become confusing when trying to identify a specific repository in Butler logic. Internally Butler converts all relative paths to absolute paths.

However, we want to be able to move repositories around as a group (e.g. within a top-level folder) on one computer and from computer to computer. So when it is possible to store a relative path from one repository to its parent, the relative path is kept in the RepositoryCfg.

https://ldm-463.lsst.io/v/draft/#persisted-parent-path-is-relative-when-possible
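To make the relative-versus-absolute handling concrete, here is a minimal sketch for POSIX-style paths. Both function names are hypothetical, and the rule used for "when it is possible" (neither location carries a URI scheme) is an assumption made for this example, not necessarily the rule Butler applies.

```python
import os
from urllib.parse import urlparse


def parent_path_to_store(repo_root, parent_root):
    """Return the parent location to persist in a repository's config."""
    # A URI with a scheme (e.g. swift://...) cannot be expressed relative
    # to a local path, so it is kept unchanged.
    if urlparse(repo_root).scheme or urlparse(parent_root).scheme:
        return parent_root
    return os.path.relpath(parent_root, start=repo_root)


def resolve_parent(repo_root, stored):
    """Turn a stored parent location back into an absolute one."""
    if urlparse(stored).scheme or os.path.isabs(stored):
        return stored
    return os.path.normpath(os.path.join(repo_root, stored))


# Persist a relative path, then resolve it after the group has moved.
stored = parent_path_to_store("/data/group/child", "/data/group/parent")
resolved_after_move = resolve_parent("/mnt/moved/child", stored)
kept_uri = parent_path_to_store("/data/child", "swift://host/container")
```

Because only the relative path is persisted, moving the enclosing folder leaves the child pointing at the right parent once the path is resolved against the child's new root.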

When copying a repository from one Storage type to another (e.g. from a developer to a Swift location) it’s possible the parent URIs will have to be adjusted. When we add Storage locations this should be considered, and it’s possible we should write a helper script to support this.

https://ldm-463.lsst.io/v/draft/#moving-repositories-and-repositorycfgs

How do we use this feature?

processCcd.py remoteNode:/path/to/data --id visit=12345 ccd=67

?

@price I guess that line runs a pipe task? If the value of remoteNode becomes namespace.output or namespace.input in lsst.pipe.base.ArgumentParser then you’d replace /path/to/data with a URI to your repository, which might look something like "swift://nebula.ncsa.illinois.edu:5000/v2.0/LSST/testContainer01". I’m working on this in DM-7468.

Would it make sense for us to demonstrate soon that this works with DAX imgserv URLs? This would definitely seem like another step toward realizing bits of the Science Platform design.

We can discuss it. I should discuss with @brianv0 what kind of data is downloaded from a DAX imgserv URL. And I’d want to understand more about the use case.