Using an external data storage service with the data butler

The CADC is planning to place a copy of the LSST co-add images into our Storage Inventory system, minoc (https://ws-cadc.canfar.net/minoc/).

This will involve turning a request for a file into an HTTPS URL derived from its URI and returning a BytesIO stream, I think?
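Roughly, I imagine something like the sketch below. The URL layout under minoc here is an assumption on my part (the `files/` path and `uri_to_minoc_url` are invented for illustration), but it shows the URI → HTTPS URL → BytesIO flow I mean:

```python
from io import BytesIO
from urllib.parse import urlparse, quote

# Assumed minoc download endpoint; the real Storage Inventory
# path layout may well differ.
MINOC_FILES_ENDPOINT = "https://ws-cadc.canfar.net/minoc/files"

def uri_to_minoc_url(uri: str) -> str:
    """Map a storage URI like 'cadc:LSST/coadd/patch_1_2.fits'
    to an HTTPS download URL (assumed layout, for illustration)."""
    parsed = urlparse(uri)
    # Scheme-specific part, e.g. 'LSST/coadd/patch_1_2.fits'.
    path = parsed.path if not parsed.netloc else f"{parsed.netloc}{parsed.path}"
    return f"{MINOC_FILES_ENDPOINT}/{quote(path)}"

def fetch_as_stream(uri: str) -> BytesIO:
    """Download the artifact and return it as an in-memory stream."""
    from urllib.request import urlopen
    with urlopen(uri_to_minoc_url(uri)) as resp:
        return BytesIO(resp.read())

print(uri_to_minoc_url("cadc:LSST/coadd/patch_1_2.fits"))
```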

Can someone point me in the right direction for implementing such a system? I suspect I just need to implement a custom.

lsst.daf.butler.datastores.file_datastore.get.get_dataset_as_python_object_from_get_info

method and somehow configure the file_datastore to use my method?

If you can implement this by inventing a custom URI scheme for your storage system and writing a subclass of ResourcePath, that would probably be the best approach. It’s not obvious whether that would work here, and it would probably require a patch to the base class to make it aware of your scheme (@timj, maybe we should think about an entry-points hook for this?). But it’s what we’ve done to abstract over file storage backends in all other cases, and there are quite a few other implementations you could use as examples.
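To illustrate the scheme-dispatch pattern being suggested: the base class looks at the URI scheme and picks the matching subclass. This is a plain-Python sketch of that idea, not the actual lsst.resources API; `SketchResourcePath`, `CadcResourcePath`, and the `_SCHEMES` registry are all invented names:

```python
from urllib.parse import urlparse

# Invented registry for illustration; the real lsst.resources base
# class selects a concrete subclass from the URI scheme similarly.
_SCHEMES: dict[str, type] = {}

class SketchResourcePath:
    def __init__(self, uri: str):
        self.uri = uri

    def __init_subclass__(cls, scheme: str, **kwargs):
        super().__init_subclass__(**kwargs)
        _SCHEMES[scheme] = cls  # register this subclass for its scheme

    @classmethod
    def from_uri(cls, uri: str) -> "SketchResourcePath":
        scheme = urlparse(uri).scheme
        if scheme not in _SCHEMES:
            raise ValueError(f"no handler registered for scheme {scheme!r}")
        return _SCHEMES[scheme](uri)

    def read(self) -> bytes:
        raise NotImplementedError

class CadcResourcePath(SketchResourcePath, scheme="cadc"):
    """Would translate cadc: URIs into authenticated minoc HTTPS GETs."""
    def read(self) -> bytes:
        raise NotImplementedError("sketch only")

rp = SketchResourcePath.from_uri("cadc:LSST/coadd/patch.fits")
print(type(rp).__name__)
```

The entry-points hook mentioned above would replace the hard-coded registry with one populated from installed packages, so a third-party scheme would not need a patch to the base class.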

How do people access a file from minoc? If it’s a simple URL then you can populate the butler file datastore with either a path relative to a root URL or the full URL that you need to use. The question then becomes how you handle auth to restrict access to the file to the data-rights community.
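The relative-vs-full-URL distinction works out naturally with standard URL joining: a relative path is resolved against the datastore root, while a full URL passes through untouched. The root URL here is an assumed example:

```python
from urllib.parse import urljoin

# Assumed datastore root URL, for illustration only.
root = "https://ws-cadc.canfar.net/minoc/files/"

# A datastore record may hold a path relative to the root...
relative = "LSST/coadd/patch_1_2.fits"
# ...or a full URL, which urljoin leaves unchanged.
absolute = "https://example.org/mirror/patch_1_2.fits"

print(urljoin(root, relative))
print(urljoin(root, absolute))
```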

We do support X509-type auth in the HTTPS access package used by the butler (see python/lsst/resources/http.py in the lsst/resources repository on GitHub).

Are you using something like that for auth?
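For what it’s worth, the choice usually comes down to whether client-certificate (X509) credentials or a bearer token are configured. This is a toy sketch of that selection logic; the environment variable names are hypothetical (check http.py in lsst/resources for the configuration it actually reads):

```python
def choose_auth(env: dict[str, str]) -> str:
    """Pick an auth mechanism for HTTPS requests to the datastore.

    Variable names below are hypothetical, for illustration only.
    """
    if env.get("DATASTORE_CLIENT_CERT") and env.get("DATASTORE_CLIENT_KEY"):
        return "x509"       # mutual-TLS client certificate
    if env.get("DATASTORE_BEARER_TOKEN"):
        return "bearer"     # token sent in the Authorization header
    return "anonymous"

print(choose_auth({"DATASTORE_CLIENT_CERT": "/tmp/cert.pem",
                   "DATASTORE_CLIENT_KEY": "/tmp/key.pem"}))
```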

You are braver than we are, using direct butler access with a read-only Postgres user. We use a client/server intermediary: a web service that talks to the butler on the client’s behalf. That allows things like rate limiting and URL signing.