The Butler now has the concept of a repository index that people can use to look up a memorable label and get back a URI to the relevant butler repository.
For this to work, the user’s site must define the environment variable DAF_BUTLER_REPOSITORY_INDEX to point to the URI (a file system path, an S3 URI, etc.) of a YAML or JSON file containing a simple dict mapping labels to URIs.
For example:
latiss: "/repo/main"
lsstcam: "/repo/main"
dc22: "/repo/dc2.2i"
You can then do something like:
from lsst.daf.butler import Butler
butler = Butler(Butler.get_repo_uri("latiss"))
and not have to remember where the recommended LATISS butler repository is located. This same code will work wherever someone has defined a default repository location for “latiss”.
You can list all known repos with:
print(Butler.get_known_repos())
None of this works though without us writing those YAML index files.
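To make the site-side work concrete, here is a minimal sketch of what creating an index and pointing the environment variable at it could look like. It uses JSON so that only the Python standard library is needed. The file name repo_index.json and the helper lookup_label are hypothetical, the helper is only a stand-in that handles local paths, and none of this is the actual Butler implementation.

```python
import json
import os
import tempfile

# Label-to-URI mapping, mirroring the example index shown earlier.
index = {
    "latiss": "/repo/main",
    "lsstcam": "/repo/main",
    "dc22": "/repo/dc2.2i",
}

# Write the index as JSON into a scratch directory.
# "repo_index.json" is a hypothetical file name chosen for this sketch.
index_dir = tempfile.mkdtemp()
index_path = os.path.join(index_dir, "repo_index.json")
with open(index_path, "w") as fh:
    json.dump(index, fh, indent=2)

# A site would normally export this variable in the user's login environment.
os.environ["DAF_BUTLER_REPOSITORY_INDEX"] = index_path


def lookup_label(label: str) -> str:
    """Hypothetical stand-in for the label lookup (local JSON files only)."""
    with open(os.environ["DAF_BUTLER_REPOSITORY_INDEX"]) as fh:
        repos = json.load(fh)
    return repos[label]


print(lookup_label("latiss"))
```

In real use the file would live somewhere stable and readable by all users at the site, and the lookup would go through Butler.get_repo_uri rather than a helper like this one.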
Ideally we’d use the same label for the (conceptually) same set of data everywhere.
On IDF you could imagine labels of “dp01”, “dp02”, and “dp0”, with the last one changing from dp0.1 to dp0.2 when dp0.2 comes out.
At NCSA we have /repo/main as a repository with HSC and LSST data in it, but the summit does not have such a thing. It therefore might make more sense for per-instrument labels to point to the same repository, so that if you want LATISS data you’ll always end up in the best repository for LATISS.
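As a rough illustration of the per-instrument idea, the sketch below models two site indexes as plain dicts. The NCSA entries follow the example index above. The summit path is a made-up placeholder, not a real location, and resolve is a hypothetical helper, not part of the Butler API.

```python
# At NCSA several instrument labels can share one repository.
NCSA_INDEX = {
    "latiss": "/repo/main",
    "lsstcam": "/repo/main",
}

# Hypothetical summit index; the path is a placeholder for illustration only.
SUMMIT_INDEX = {
    "latiss": "/path/to/summit/repo",
}


def resolve(label: str, site_index: dict) -> str:
    """Hypothetical helper: return the repository URI for a label at one site."""
    return site_index[label]


# The same label works at both sites, even though the repositories differ.
for site_name, site_index in [("NCSA", NCSA_INDEX), ("summit", SUMMIT_INDEX)]:
    print(site_name, resolve("latiss", site_index))
```

The point is that user code only ever mentions "latiss"; which repository that means is decided by whoever maintains each site's index.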
What people choose for these labels is out of my hands but we can discuss on this ticket.
In particular, I imagine @hsinfang, @yusra, @jbosch, and @merlin will have opinions on whether the same labels should work at NCSA and the summit, and how much IDF should conform.
Once we know the names I can make a ticket for each site for someone to implement the creation of the file and the setting of the environment variable.