# Generation 3 Butler notes using S3

## Butler command-line tasks

### Creating a Gen 3 Butler repository

• Step 1: Create the registry file (reg.yaml)

• First, we need to create an SQLite file for the registry (e.g. vi test.sqlite3); PostgreSQL is often used instead of SQLite for S3.

• Then we have to create an S3 bucket on Echo, which will be the Butler repository. I used rclone (https://rclone.org/docs/) to do this: "rclone mkdir remote:bucket_name"

• Now that we have all of this, create a new file called reg.yaml. Within that file you put the path to the SQLite file. Example:

```yaml
registry:
  db: sqlite:////home/test.sqlite3
```
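The seed file can equally be written from a script. A minimal sketch (the `/home/test.sqlite3` path is the example from these notes, not a required location):

```python
from pathlib import Path

# Write the registry seed config (reg.yaml) from Python. Note the four
# slashes in the sqlite URL: "sqlite:///" plus the absolute path "/home/...".
seed = "registry:\n  db: sqlite:////home/test.sqlite3\n"
Path("reg.yaml").write_text(seed)
print(Path("reg.yaml").read_text())
```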
• Step 2: Configure the Butler repository

• Now that we have the reg.yaml seed file, we can create an empty Gen 3 Butler repository.

• We do this by using "butler create", which is a command-line task (https://pipelines.lsst.io/modules/lsst.daf.butler/scripts/butler.py.html)

• We run:

• » butler create s3://bucket_name --seed-config reg.yaml --override

• where s3://bucket_name is REPO, the URI or path to the new repository

• We have now created a Gen 3 Butler repository. If we check our S3 bucket we will see that a butler.yaml file is now in the repository. We can also inspect the SQLite file and see all the tables that were created, which are used to query the datasets.

### Adding an instrument to the Gen 3 Butler repository

• Step 1: Find the instrument class

• Step 2: Run the "register-instrument" command

• » butler register-instrument s3://bucket_name lsst.obs.subaru.HyperSuprimeCam

• where s3://bucket_name is REPO, the URI or path to the repository, and lsst.obs.subaru.HyperSuprimeCam is the fully-qualified instrument class
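For Step 1, the instrument class is just a dotted Python path. A sketch of how one might resolve it before passing it to the command; `resolve_instrument` is a hypothetical helper, not part of the Butler API:

```python
from importlib import import_module

# Hypothetical helper: resolve the fully-qualified class name that
# "butler register-instrument" expects. With the LSST stack installed,
# resolve_instrument("lsst.obs.subaru.HyperSuprimeCam") would return
# the HyperSuprimeCam instrument class.
def resolve_instrument(fqcn: str):
    module_name, _, class_name = fqcn.rpartition(".")
    return getattr(import_module(module_name), class_name)
```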

### Ingesting raw frames from a directory into the Butler registry

• Step 1: Add an instrument to the Gen 3 Butler repository

• Make sure that an instrument has been added to the Gen 3 Butler repository (see above for instructions on how to add an instrument to the repository)

• Step 2: Run the "ingest-raws" command

• » butler ingest-raws s3://bucket_name /home/lsst_stack/testdata_ci_hsc/raw

• where s3://bucket_name is REPO, the URI or path to the repository, and /home/lsst_stack/testdata_ci_hsc/raw is LOCATIONS, which specifies files to ingest and/or directories to search for files.

### Converting a Gen 2 Butler repository into a Gen 3 repository

• Step 1: Set up a Gen 3 Butler repository (see above for instructions)

• Step 2: Run the "convert" command

• » butler convert s3://bucket_name --gen2root /home/lsst_stack/DATA

• where s3://bucket_name is REPO, the URI or path to the Gen 3 repository, and --gen2root takes the root path of the Gen 2 repository to be converted, here /home/lsst_stack/DATA.

• The tutorial for creating a Gen 2 repository is at https://pipelines.lsst.io/getting-started/data-setup.html

### Importing data across two Gen 3 repositories

• Step 1: Export the data

• First you will have to export the data from the repository that currently contains it (how to export data)

• Step 2: Run the "import" command

• » butler import s3://bucket_name_new s3://bucket_name --export-file exports.yaml

• where s3://bucket_name is the URI or path to the repository that contains the data, and s3://bucket_name_new is the URI or path to the repository where you want to put your data.
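The export step above has no example in these notes. A minimal sketch of how it might look with the Python API, assuming the repository, collection and dataset type names used earlier; this requires the LSST Science Pipelines stack:

```python
from lsst.daf.butler import Butler

# Open the source repository and select the raws to export.
butler = Butler("s3://bucket_name")
refs = butler.registry.queryDatasets("raw", collections="HSC/raw/all")

# Write exports.yaml describing the selected datasets, for "butler import".
with butler.export(filename="exports.yaml") as export:
    export.saveDatasets(refs)
```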

## Butler in a Jupyter notebook

### Accessing the data registry

The registry is a good tool for investigating a repo (more on the registry schema can be found here). For example, we can get a list of all collections, which includes the HSC/raw/all collection that we were using before:
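The notebook cell is missing here; a minimal sketch of what it might have contained, assuming the s3://bucket_name repository from these notes (requires the LSST Science Pipelines stack):

```python
from lsst.daf.butler import Butler

# List every collection known to the repository's registry.
butler = Butler("s3://bucket_name")
for collection in butler.registry.queryCollections():
    print(collection)
```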

Now that we "know" that HSC/raw/all exists, let's create our Butler with this collection:
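A sketch of the missing cell, under the same assumptions:

```python
from lsst.daf.butler import Butler

# Bind the Butler to the HSC/raw/all collection so later get() calls
# do not need the collection spelled out each time.
butler = Butler("s3://bucket_name", collections="HSC/raw/all")
```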

We can also use the registry to get a list of all dataset types:
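Again the cell is missing; a sketch (requires the LSST stack):

```python
from lsst.daf.butler import Butler

# List every dataset type registered in the repository.
butler = Butler("s3://bucket_name")
for dataset_type in butler.registry.queryDatasetTypes():
    print(dataset_type.name)
```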

We suspect that this lists every datasetType that the processing has tried to create. There may be intermediate products that were created during processing but no longer exist.

It is now possible to get all DatasetRefs (including dataIds) for a specific datasetType in a specific collection with a query like the following:
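A sketch of such a query, using the "raw" datasetType and the HSC/raw/all collection from these notes (requires the LSST stack):

```python
from lsst.daf.butler import Butler

# Query all DatasetRefs for "raw" in the HSC/raw/all collection.
butler = Butler("s3://bucket_name")
refs = list(butler.registry.queryDatasets("raw", collections="HSC/raw/all"))
for ref in refs:
    print(ref.dataId)
```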

OK: we now know what collections exist (HSC/raw/all in particular), which datasetTypes are defined for that collection, and the datasetRefs (which contain dataIds) for data products of the requested type. This is all the information that we need to get the dataset of interest.

From the list above, I choose index 16, and with this we will find the dataId:
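A sketch of picking that entry and retrieving the dataset; index 16 is the choice made in the notes, any valid index would do (requires the LSST stack):

```python
from lsst.daf.butler import Butler

butler = Butler("s3://bucket_name", collections="HSC/raw/all")
refs = list(butler.registry.queryDatasets("raw", collections="HSC/raw/all"))

ref = refs[16]         # index 16, as chosen in the notes
print(ref.dataId)      # the dataId: instrument, detector, exposure, ...
raw = butler.get(ref)  # retrieve the dataset itself
```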

### Getting the URIs

• Getting the URI of the raw data
• getURI(datasetRefOrType, …): Return the URI to the Dataset
• getURIs(datasetRefOrType, …): Returns the URIs associated with the dataset
• getURIs is the “proper” interface for retrieving URIs to a single dataset, because Butler supports composite disassembly. This means that you can configure your datastore such that on butler.put() it splits the dataset into its component parts: for an Exposure it would write the image, variance, mask, wcs etc. into separate files. The motivation is that you can then do butler.get("calexp.wcs", ...), and for S3 this is much more efficient when disassembled, since it only downloads the small WCS file rather than downloading the entire file to read a small part of it. Composite disassembly is not the default, but you can enable it by putting the relevant line in the datastore section of your seed yaml.
• getURI is there for the simple case and will break for you as soon as disassembly is turned on. Raws are never disassembled, so for them it is always safe. If you have disassembled, the getURIs component dict will be filled in with keys like wcs mapping to a URI. getURIs returns the same answer as getURI in its first return value.
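A sketch of both calls on a raw, under the same repository and collection assumptions as above (requires the LSST stack):

```python
from lsst.daf.butler import Butler

butler = Butler("s3://bucket_name", collections="HSC/raw/all")
ref = next(iter(butler.registry.queryDatasets("raw",
                                              collections="HSC/raw/all")))

# getURI is safe here because raws are never disassembled.
uri = butler.getURI(ref)
print(uri)  # an s3:// URI to the raw file

# getURIs returns the primary URI plus a dict of component URIs,
# which would only be populated for a disassembled dataset.
primary, components = butler.getURIs(ref)
print(primary, components)
```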

### Importing

• Use either a preexisting Gen 3 Butler repository or a new Gen 3 Butler repository
• Set up the Butler client for the new repository and make sure that writeable=True is set
• As I'm transferring the data from one repository to another I'll set transfer="copy"; the other transfer options are auto, link, symlink, hardlink, copy, move, relsymlink and direct

Check that the files transferred.
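The steps above can be sketched as follows, assuming the two bucket names and the exports.yaml file from the command-line section (requires the LSST stack):

```python
from lsst.daf.butler import Butler

# Writable client for the destination repository.
butler = Butler("s3://bucket_name_new", writeable=True)

# Import the datasets described by exports.yaml from the source
# repository, copying the files rather than linking them.
butler.import_(directory="s3://bucket_name",
               filename="exports.yaml",
               transfer="copy")

# Check that the files transferred: the raws should now be queryable here.
print(list(butler.registry.queryDatasets("raw", collections="HSC/raw/all")))
```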