Step 1: Creating the registry file (reg.yaml)
First, we need to create a SQLite database file for the registry (e.g. vi test.sqlite3). PostgreSQL is often used for the registry when the repository lives on S3.
Then we have to create an S3 bucket on Echo, which will be the butler repository. I used Rclone
(https://rclone.org/docs/) to do this: "rclone mkdir remote:bucket_name".
Now that we have all of this, create a new file called reg.yaml. This file holds the path to the SQLite file, for example:
db: sqlite:////home/test.sqlite3
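The four slashes in the example are not a typo: the URI is the prefix sqlite:/// followed by an absolute path, which itself starts with a slash. A minimal sketch that builds the reg.yaml line from a path is shown here; the helper names sqlite_uri and registry_config_line are hypothetical and only serve the illustration.

```python
from pathlib import PurePosixPath


def sqlite_uri(db_path):
    """Return a SQLAlchemy-style SQLite URI for an absolute POSIX path."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {db_path!r}")
    # "sqlite:///" + "/home/..." gives the four slashes seen in reg.yaml.
    return "sqlite:///" + str(path)


def registry_config_line(db_path):
    """Return the single line that the example reg.yaml contains."""
    return f"db: {sqlite_uri(db_path)}"


print(registry_config_line("/home/test.sqlite3"))
```

A relative path is rejected on purpose, since the example in this section uses an absolute one.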
Step 2: Configuring the butler repository
Now that we have the reg.yaml file, we can create an empty Gen3 Butler repository.
We do this with "butler create", which is a command-line task
(https://pipelines.lsst.io/modules/lsst.daf.butler/scripts/butler.py.html).
We run:
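A minimal sketch of what such an invocation might look like, written as a small Python helper so that it can be checked before anything is executed. It assumes that the registry file from Step 1 is handed over with the --seed-config option of "butler create"; the bucket name and file name are the placeholders used in this section, and the helper name butler_create_command is hypothetical.

```python
import shlex
from typing import List


def butler_create_command(repo: str, seed_config: str) -> List[str]:
    """Build the argument list for creating an empty repository."""
    # Assumption: --seed-config is how reg.yaml is passed to the command.
    return ["butler", "create", repo, "--seed-config", seed_config]


cmd = butler_create_command("s3://bucket_name", "reg.yaml")
print(shlex.join(cmd))
# To execute it for real: subprocess.run(cmd, check=True)
```

Building the argument list first keeps the sketch runnable on a machine without the LSST stack installed.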
Now that we have created a Gen3 Butler repository, if we check the S3 bucket we will see that
butler.yaml is now in the repository. We can also check the SQLite file and see all the tables that have been created, which are used to query the datasets.
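One way to look at the tables is Python's built-in sqlite3 module. The sketch below lists the table names of a SQLite file; it runs against a throwaway database with placeholder tables, so the names printed here are not the ones butler creates. To inspect the real registry, pass the path from reg.yaml to list_tables (a hypothetical helper name).

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in a SQLite database file."""
    # mode=ro opens the file read-only, so a mistyped path is not created.
    uri = f"file:{db_path}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Demonstration on a throwaway database; the table names are placeholders,
# not the tables that butler actually creates.
with tempfile.TemporaryDirectory() as tmp:
    demo_db = Path(tmp) / "demo.sqlite3"
    with closing(sqlite3.connect(demo_db)) as conn:
        conn.execute("CREATE TABLE example_a (id INTEGER)")
        conn.execute("CREATE TABLE example_b (id INTEGER)")
        conn.commit()
    demo_tables = list_tables(demo_db)

print(demo_tables)
```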
Step 1: Find the instrument class
Step 2: Running the "register-instrument" Command
» butler register-instrument s3://bucket_name lsst.obs.subaru.HyperSuprimeCam
where s3://bucket_name is the REPO (the URI or path to the new repository) and lsst.obs.subaru.HyperSuprimeCam is the fully qualified name of the instrument class.
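The instrument class is given as a dotted path: everything before the last dot is the Python module, and the last part is the class name. A minimal sketch of how such a path can be split and checked with ordinary Python import machinery follows. This only illustrates the naming convention and is not necessarily how butler resolves the class internally; the helper names are hypothetical.

```python
import importlib


def split_class_path(class_path):
    """Split 'package.module.ClassName' into (module name, class name)."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name or not class_name:
        raise ValueError(f"not a fully qualified class name: {class_path!r}")
    return module_name, class_name


def load_class(class_path):
    """Import the module and return the class object it contains."""
    module_name, class_name = split_class_path(class_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


print(split_class_path("lsst.obs.subaru.HyperSuprimeCam"))
# Loading the HSC class needs the LSST stack, so a standard-library class
# stands in for it here.
print(load_class("collections.OrderedDict"))
```

With the LSST stack set up, calling load_class on the instrument path is a quick way to confirm the name is spelled correctly before running the command.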
Step 1: Adding an instrument to the Gen3 butler repository
Step 2: Running the "ingest-raws" Command
» butler ingest-raws s3://bucket_name /home/lsst_stack/testdata_ci_hsc/raw
where s3://bucket_name is the REPO (the URI or path to the new repository) and /home/lsst_stack/testdata_ci_hsc/raw is the LOCATIONS argument, which specifies the files to ingest and/or the locations to search for files.
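Before ingesting, it can help to preview which files a location contains. The sketch below walks a directory recursively and returns the matching files. The "*.fits" pattern is an assumption based on the .fits file names that appear later in this document, and the command may apply its own search rules; the helper name find_candidate_files is hypothetical.

```python
import tempfile
from pathlib import Path


def find_candidate_files(location, pattern="*.fits"):
    """Return the files under location that match pattern, sorted."""
    root = Path(location)
    if root.is_file():
        # A single file is its own candidate list.
        return [root]
    return sorted(p for p in root.rglob(pattern) if p.is_file())


# Demonstration on a temporary directory with placeholder file names.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    (raw_dir / "night1").mkdir(parents=True)
    (raw_dir / "night1" / "a.fits").touch()
    (raw_dir / "b.fits").touch()
    (raw_dir / "notes.txt").touch()
    found = [p.relative_to(raw_dir).as_posix() for p in find_candidate_files(raw_dir)]

print(found)
```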
Step 1: Set up a Gen3 butler repository (see above for instructions)
Step 2: Running the "convert" Command
» butler convert s3://bucket_name --gen2root /home/lsst_stack/DATA
where s3://bucket_name is the REPO (the URI or path to the new repository), --gen2root gives the root path of the Gen2 repository to be converted, and /home/lsst_stack/DATA is the path to that Gen2 repository.
The tutorial for creating a Gen2 repository is here: https://pipelines.lsst.io/getting-started/data-setup.html
Step 1: Export the data
Step 2: Importing the data
import lsst.daf.butler as dafButler
import lsst.afw.display as afwDisplay
import pylab as plt
import os,glob
import lsst.geom as geom
butler = dafButler.Butler("s3://joshuakitenge-DATA")
registry = butler.registry
registry.isWriteable()
# We can examine the registry with
#help(registry)
The registry is a good tool for investigating a repo (more on the registry schema can be found here). For example, we can get a list of all collections, which includes the HSC/raw/all collection that we were using before:
for c in registry.queryCollections():
    print(c)

HSC/raw/all
HSC/calib
HSC/calib/unbounded
HSC/calib/curated/1970-01-01T00:00:00
HSC/calib/curated/2013-01-31T00:00:00
HSC/calib/curated/2014-04-03T00:00:00
HSC/calib/curated/2014-06-01T00:00:00
HSC/calib/curated/2015-11-06T00:00:00
HSC/calib/curated/2016-04-01T00:00:00
HSC/calib/curated/2016-11-22T00:00:00
HSC/calib/curated/2016-12-23T00:00:00
Now that we "know" that HSC/raw/all exists, let's create our butler with this collection:
butler = dafButler.Butler("s3://joshuakitenge-DATA", collections='HSC/raw/all')
registry = butler.registry
We can also use the registry to get a list of all dataset types:
for x in registry.queryDatasetTypes():
    print(x)

DatasetType('raw', {band, instrument, detector, physical_filter, exposure}, Exposure)
DatasetType('camera', {instrument}, Camera, isCalibration=True)
DatasetType('defects', {instrument, detector}, Defects, isCalibration=True)
DatasetType('bfKernel', {instrument}, NumpyArray, isCalibration=True)
DatasetType('transmission_optics', {instrument}, TransmissionCurve, isCalibration=True)
DatasetType('transmission_sensor', {instrument, detector}, TransmissionCurve, isCalibration=True)
DatasetType('transmission_filter', {band, instrument, physical_filter}, TransmissionCurve, isCalibration=True)
DatasetType('transmission_atmosphere', {instrument}, TransmissionCurve, isCalibration=True)
We suspect that this is the full list of datasetTypes that processing has tried to create. It may include intermediate products that were created during processing but no longer exist.
It is now possible to get every DatasetRef (including its dataId) for a specific datasetType in a specific collection with a query like the following:
datasetRefs = list(registry.queryDatasets(datasetType='raw', collections=['HSC/raw/all']))
for ref in datasetRefs:
    print(ref.dataId)

{band: r, instrument: HSC, detector: 11, physical_filter: HSC-R, exposure: 903344}
{band: r, instrument: HSC, detector: 5, physical_filter: HSC-R, exposure: 903344}
{band: r, instrument: HSC, detector: 0, physical_filter: HSC-R, exposure: 903344}
{band: i, instrument: HSC, detector: 25, physical_filter: HSC-I, exposure: 903990}
{band: i, instrument: HSC, detector: 18, physical_filter: HSC-I, exposure: 903990}
{band: r, instrument: HSC, detector: 6, physical_filter: HSC-R, exposure: 903346}
{band: r, instrument: HSC, detector: 1, physical_filter: HSC-R, exposure: 903346}
{band: r, instrument: HSC, detector: 12, physical_filter: HSC-R, exposure: 903346}
{band: i, instrument: HSC, detector: 23, physical_filter: HSC-I, exposure: 903986}
{band: i, instrument: HSC, detector: 22, physical_filter: HSC-I, exposure: 903986}
{band: i, instrument: HSC, detector: 100, physical_filter: HSC-I, exposure: 903986}
{band: i, instrument: HSC, detector: 16, physical_filter: HSC-I, exposure: 903986}
{band: i, instrument: HSC, detector: 17, physical_filter: HSC-I, exposure: 903988}
{band: i, instrument: HSC, detector: 23, physical_filter: HSC-I, exposure: 903988}
{band: i, instrument: HSC, detector: 24, physical_filter: HSC-I, exposure: 903988}
{band: i, instrument: HSC, detector: 16, physical_filter: HSC-I, exposure: 903988}
{band: r, instrument: HSC, detector: 10, physical_filter: HSC-R, exposure: 903342}
{band: r, instrument: HSC, detector: 4, physical_filter: HSC-R, exposure: 903342}
{band: r, instrument: HSC, detector: 100, physical_filter: HSC-R, exposure: 903342}
{band: i, instrument: HSC, detector: 4, physical_filter: HSC-I, exposure: 904010}
{band: i, instrument: HSC, detector: 10, physical_filter: HSC-I, exposure: 904010}
{band: i, instrument: HSC, detector: 100, physical_filter: HSC-I, exposure: 904010}
{band: r, instrument: HSC, detector: 23, physical_filter: HSC-R, exposure: 903334}
{band: r, instrument: HSC, detector: 16, physical_filter: HSC-R, exposure: 903334}
{band: r, instrument: HSC, detector: 100, physical_filter: HSC-R, exposure: 903334}
{band: r, instrument: HSC, detector: 22, physical_filter: HSC-R, exposure: 903334}
{band: i, instrument: HSC, detector: 6, physical_filter: HSC-I, exposure: 904014}
{band: i, instrument: HSC, detector: 12, physical_filter: HSC-I, exposure: 904014}
{band: i, instrument: HSC, detector: 1, physical_filter: HSC-I, exposure: 904014}
{band: r, instrument: HSC, detector: 18, physical_filter: HSC-R, exposure: 903338}
{band: r, instrument: HSC, detector: 25, physical_filter: HSC-R, exposure: 903338}
{band: r, instrument: HSC, detector: 24, physical_filter: HSC-R, exposure: 903336}
{band: r, instrument: HSC, detector: 17, physical_filter: HSC-R, exposure: 903336}
OK, we now know what collections exist (HSC/raw/all in particular), the datasetTypes that are defined for that collection, and the datasetRefs (which contain dataIds) for data products of the requested type. This is all the information that we need to get the dataset of interest.
From the list above, I chose index 16; with it we can look up the dataId:
ref = datasetRefs[16]
print(ref.dataId)
{band: r, instrument: HSC, detector: 10, physical_filter: HSC-R, exposure: 903342}
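Picking an entry by position relies on the order in which the registry happens to return results, and that order may differ between repositories. A minimal sketch of selecting by dataId values instead is shown here, using plain dicts as stand-ins for ref.dataId (three entries copied from the listing above). With a real butler one would presumably loop over datasetRefs and compare values such as ref.dataId["detector"]; that part is an assumption and is not shown. The helper name find_matching is hypothetical.

```python
def find_matching(data_ids, **wanted):
    """Return the dataIds whose values equal every requested key/value pair."""
    return [
        data_id
        for data_id in data_ids
        if all(data_id.get(key) == value for key, value in wanted.items())
    ]


# Plain-dict stand-ins for ref.dataId, copied from the listing above.
data_ids = [
    {"band": "i", "instrument": "HSC", "detector": 16,
     "physical_filter": "HSC-I", "exposure": 903988},
    {"band": "r", "instrument": "HSC", "detector": 10,
     "physical_filter": "HSC-R", "exposure": 903342},
    {"band": "r", "instrument": "HSC", "detector": 4,
     "physical_filter": "HSC-R", "exposure": 903342},
]

matches = find_matching(data_ids, exposure=903342, detector=10)
print(matches)
```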
# To get the image, we pass the DatasetRef (which carries the dataId)
test = butler.get(ref)
# And plot!
afwDisplay.setDefaultBackend('matplotlib')
fig = plt.figure(figsize=(10,8))
afw_display = afwDisplay.Display(1)
afw_display.scale('asinh', 'zscale')
afw_display.mtv(test)
plt.gca().axis('off')
(-0.5, 2143.5, -0.5, 4240.5)
Raw_URI = []
Raw_URIs = []
for ref in butler.registry.queryDatasets("raw", collections=['HSC/raw/all']):  # , where="detector = 22"):
    uri = butler.getURI(ref)
    Raw_URI.append(uri)
    print("{}\n".format(uri))
#for ref in butler.registry.queryDatasets("raw", collections=['HSC/raw/all']):#, where="detector = 22"):
#    uri = butler.getURIs(ref)
#    print("{}\n".format(uri))
#print("{}\n".format(Raw_URI))
#print("{}\n".format(Raw_URIs))

s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903344_11_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903344_5_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903344_0_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903990_25_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903990_18_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903346_6_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903346_1_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903346_12_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903986_23_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903986_22_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903986_100_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903986_16_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903988_17_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903988_23_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903988_24_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_903988_16_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903342_10_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903342_4_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903342_100_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_904010_4_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_904010_10_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_904010_100_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903334_23_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903334_16_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903334_100_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903334_22_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_904014_6_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_904014_12_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/i/HSC-I/raw_i_HSC-I_904014_1_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903338_18_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903338_25_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903336_24_HSC_HSC_raw_all.fits
s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/raw_r_HSC-R_903336_17_HSC_HSC_raw_all.fits
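Each URI above is made of a bucket name and an object key. A minimal sketch of splitting one of them with the standard library is given here, which can be handy when the same object has to be looked up with another S3 tool; the helper name split_s3_uri is hypothetical.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri!r}")
    # netloc is the bucket; the key is the path without its leading slash.
    return parts.netloc, parts.path.lstrip("/")


bucket, key = split_s3_uri(
    "s3://joshuakitenge-DATA/HSC/raw/all/raw/r/HSC-R/"
    "raw_r_HSC-R_903342_10_HSC_HSC_raw_all.fits"
)
print(bucket)
print(key)
```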
# Export a description of the raw datasets in HSC/raw/all to a YAML file
with butler.export(filename="/home/vrs42921/lsst_stack/exports.yaml") as export:
    export.saveDatasets(butler.registry.queryDatasets("raw", collections=['HSC/raw/all']))
# Open the destination repository for writing and import the exported datasets
butler_im = dafButler.Butler("s3://kitenge_test_2", writeable=True)
butler_im.import_(directory="s3://joshuakitenge-DATA", filename="exports.yaml", transfer="copy")
Check that the files were transferred:
butler_im_test = dafButler.Butler("s3://kitenge_test_2")
reg = butler_im_test.registry
for c in reg.queryCollections():
    print(c)
HSC/raw/all
datasetRefs_test = list(reg.queryDatasets(datasetType='raw', collections=['HSC/raw/all']))
for ref2 in datasetRefs_test:
    print(ref2.dataId)

{band: i, instrument: HSC, detector: 100, physical_filter: HSC-I, exposure: 903986}
{band: i, instrument: HSC, detector: 16, physical_filter: HSC-I, exposure: 903986}
{band: i, instrument: HSC, detector: 22, physical_filter: HSC-I, exposure: 903986}
{band: i, instrument: HSC, detector: 23, physical_filter: HSC-I, exposure: 903986}
{band: i, instrument: HSC, detector: 16, physical_filter: HSC-I, exposure: 903988}
{band: i, instrument: HSC, detector: 17, physical_filter: HSC-I, exposure: 903988}
{band: i, instrument: HSC, detector: 23, physical_filter: HSC-I, exposure: 903988}
{band: i, instrument: HSC, detector: 24, physical_filter: HSC-I, exposure: 903988}
{band: i, instrument: HSC, detector: 18, physical_filter: HSC-I, exposure: 903990}
{band: i, instrument: HSC, detector: 25, physical_filter: HSC-I, exposure: 903990}
{band: i, instrument: HSC, detector: 100, physical_filter: HSC-I, exposure: 904010}
{band: i, instrument: HSC, detector: 10, physical_filter: HSC-I, exposure: 904010}
{band: i, instrument: HSC, detector: 4, physical_filter: HSC-I, exposure: 904010}
{band: i, instrument: HSC, detector: 12, physical_filter: HSC-I, exposure: 904014}
{band: i, instrument: HSC, detector: 1, physical_filter: HSC-I, exposure: 904014}
{band: i, instrument: HSC, detector: 6, physical_filter: HSC-I, exposure: 904014}
{band: r, instrument: HSC, detector: 100, physical_filter: HSC-R, exposure: 903334}
{band: r, instrument: HSC, detector: 16, physical_filter: HSC-R, exposure: 903334}
{band: r, instrument: HSC, detector: 22, physical_filter: HSC-R, exposure: 903334}
{band: r, instrument: HSC, detector: 23, physical_filter: HSC-R, exposure: 903334}
{band: r, instrument: HSC, detector: 17, physical_filter: HSC-R, exposure: 903336}
{band: r, instrument: HSC, detector: 24, physical_filter: HSC-R, exposure: 903336}
{band: r, instrument: HSC, detector: 18, physical_filter: HSC-R, exposure: 903338}
{band: r, instrument: HSC, detector: 25, physical_filter: HSC-R, exposure: 903338}
{band: r, instrument: HSC, detector: 100, physical_filter: HSC-R, exposure: 903342}
{band: r, instrument: HSC, detector: 10, physical_filter: HSC-R, exposure: 903342}
{band: r, instrument: HSC, detector: 4, physical_filter: HSC-R, exposure: 903342}
{band: r, instrument: HSC, detector: 0, physical_filter: HSC-R, exposure: 903344}
{band: r, instrument: HSC, detector: 11, physical_filter: HSC-R, exposure: 903344}
{band: r, instrument: HSC, detector: 5, physical_filter: HSC-R, exposure: 903344}
{band: r, instrument: HSC, detector: 12, physical_filter: HSC-R, exposure: 903346}
{band: r, instrument: HSC, detector: 1, physical_filter: HSC-R, exposure: 903346}
{band: r, instrument: HSC, detector: 6, physical_filter: HSC-R, exposure: 903346}
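The copied repository lists the same dataIds as the source but in a different order, so comparing the two listings by eye is error-prone. A minimal sketch of an order-independent comparison follows, using plain dicts as stand-ins for the dataIds printed by each repository (three entries copied from the listings). Feeding it the real ref.dataId objects is left as an assumption, and the helper names are hypothetical.

```python
from collections import Counter


def as_key(data_id):
    """Turn a dataId-like dict into a hashable, order-independent key."""
    return tuple(sorted(data_id.items()))


def same_datasets(source_ids, copied_ids):
    """True if both listings hold the same dataIds, ignoring order."""
    # Counter also catches an entry that is duplicated on one side only.
    return Counter(map(as_key, source_ids)) == Counter(map(as_key, copied_ids))


source_ids = [
    {"band": "r", "instrument": "HSC", "detector": 11,
     "physical_filter": "HSC-R", "exposure": 903344},
    {"band": "i", "instrument": "HSC", "detector": 25,
     "physical_filter": "HSC-I", "exposure": 903990},
    {"band": "r", "instrument": "HSC", "detector": 10,
     "physical_filter": "HSC-R", "exposure": 903342},
]
# Same entries, listed in a different order.
copied_ids = [source_ids[1], source_ids[2], source_ids[0]]

print(same_datasets(source_ids, copied_ids))
print(same_datasets(source_ids, copied_ids[:-1]))
```

The second call drops one entry from the copy to show that a missing dataset is detected.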