URL-oriented HTTP download service for Git LFS repos

I very much like this kind of tutorial. I’m experiencing an issue cloning the test dataset, so I have not followed all of it, but it looks very interesting and very useful. I intend to follow it through once I manage to get my environment working.

As an end-user, I would find it beneficial to be able to quickly download the test data using commands such as curl, in the same spirit as what is done for running the pipelines demo. The goal for me is to stay focused on the purpose of the tutorial, which is to use the LSST pipelines, instead of getting git and all the surrounding stuff configured before being able to do anything. Although this configuration is necessary to go further after completing the tutorial, I would say that being able to quickly download the dataset needed for the tutorial greatly reduces friction and helps keep one focused.

Does anyone know if git-lfs supports remote git archive? If so, it could be relatively easy to generate a hyperlink that produces a downloadable tarball on-the-fly from a git-lfs repo (though it would still be beyond my meager abilities).

If that doesn’t work, I think we really need to consider mirroring many of our git-lfs repos as tarballs somewhere directly downloadable; I very much agree with @FabioHernandez that pipeline users who don’t intend to do much development should not have to deal with git-lfs (at least in its current state w.r.t. authentication).


I really like this idea of providing Git-repo-relative URLs into objects on the Git LFS S3 bucket, so that one could download files via HTTP GET in the same way that GitHub provides https://raw.githubusercontent.com/{org}/{repo}/{branch}/{path} URLs, e.g.

https://raw.githubusercontent.com/lsst/lsst/master/scripts/newinstall.sh
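To make the URL scheme concrete, here is a minimal Python sketch that fills in the GitHub raw-content template quoted above. The function name `raw_url` is hypothetical and only for illustration. An LFS-backed service could presumably expose a similar template, but no such endpoint exists yet.

```python
from urllib.parse import quote

# The template GitHub uses for raw file access, as quoted in this post.
RAW_TEMPLATE = "https://raw.githubusercontent.com/{org}/{repo}/{branch}/{path}"


def raw_url(org, repo, branch, path):
    """Build a repo-relative raw URL (hypothetical helper, not an existing API)."""
    return RAW_TEMPLATE.format(
        org=quote(org, safe=""),
        repo=quote(repo, safe=""),
        # Branch names and paths may legitimately contain slashes.
        branch=quote(branch, safe="/"),
        path=quote(path.lstrip("/"), safe="/"),
    )


print(raw_url("lsst", "lsst", "master", "scripts/newinstall.sh"))
```

Once a URL of this shape exists for a file, fetching it with curl or any other HTTP client needs no git or git-lfs setup at all.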

Pinging @jmatt

This is absolutely possible. But it should be seen as a service built with git which just happens to use git-lfs.

The Git LFS server and the objects stored in S3 have no metadata associating them with specific objects in git. They are identified only by their SHA-256 value and length. Only the combination of git, the git-lfs client and the server has the information required to build a service like this.
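To illustrate where that information lives: for each LFS-tracked file, git itself stores a small text pointer holding the object's SHA-256 and size, and that pointer is the only link between a repo path and an LFS object. Below is a minimal Python sketch of reading one. The helper name `parse_lfs_pointer` is hypothetical, and the example pointer is made up (it uses the SHA-256 of an empty file). A real service would read the pointer out of git for the requested org/repo/branch/path and then look the object up on the LFS server.

```python
def parse_lfs_pointer(text):
    """Return (sha256_hex, size_in_bytes) from the text of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value

    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer file")

    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError("unsupported or malformed oid")

    return digest, int(fields["size"])


# Made-up example pointer: the oid is the SHA-256 of an empty file.
EXAMPLE_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\n"
    "size 0\n"
)

oid, size = parse_lfs_pointer(EXAMPLE_POINTER)
print(oid, size)
```

Note that the lookup only runs in one direction: given a path in git you can find the object, but given an object in S3 there is nothing that says which repo or path it belongs to.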

All of that being said, if there is interest then I think this is a great idea. We’d have to work out the exact scope of an MVP. I could go into more detail but I think it’s best to do that out of band.