How to Obtain Local Access to DP0.2 Images

Hello all! I am looking to create a dataset of galaxy images from DP0.2 external to the Jupyter notebooks (so we can use our own GPU architecture for training).

The RSP API Aspect page mentions that the API services include SODA, which, judging from the IVOA documentation, can perform image retrievals. As far as I can tell, though, the RSP has no documentation on how to use it with pyvo. Does anyone know of a working API that can perform image retrievals from the DP0.2 dataset?

It would be ideal if the API services could include the butler. Will this be true in the future if it is not now?

As for the size of each request, something as small as one tract per request would be fine. Any suggestions or help would be greatly appreciated, and please let me know if I can clarify anything!

How much of DP0.2 do you want to download?

Yes. SODA is the cutout service. You can experiment with it from the portal aspect. Do an ObsTAP query and you will see datalinker links that can point to the SODA cutout service and image downloader. You need to go via ObsTAP because the other services need UUIDs to determine what to download. You can use ObsTAP from your pyvo client.

A client/server butler is being worked on (https://dmtn-288.lsst.io) that would, in theory, allow remote access to butler from outside the RSP. We do not yet have agreement on when we would consider making butler visible outside the RSP.

Thank you for the quick response! I will experiment with using ObsTAP from pyvo and report back here on how it goes. I'm not sure yet about the volume of images we're going to download; I was just trying to see whether it is possible.

A bit more followup:

@timj mentioned “Do an ObsTAP query and you will see datalinker links that can point to the SODA cutout service and image downloader.” If you’re not familiar with the DataLink-style data access model to images via ObsTAP, let me expand on that.

When you do an ObsTAP query, for each image in the query result, you get an access_url value. In this data access model, that URL doesn’t point to the image itself, but rather to a service that provides an additional layer of indirection. (It’s way beyond the scope of this answer to explain why that’s desirable, so you’ll have to take it on faith that we consider it essential.) When you access that URL, you get back a short table, which currently has only two rows.

One row is a pointer to the cutout service (SODA) for that image; this row has semantics="#cutout". The other row, with semantics="#this", provides a short-lived “signed URL” for accessing the actual image file. So you do not have to go through the SODA service to download an image, if you in fact need the whole image.
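To make the two-row lookup concrete, here is a minimal sketch of resolving the "#this" row from Python and downloading the whole image. It is an illustration under assumptions, not an official recipe: `pick_link`, `fetch_links` and `download_full_image` are hypothetical helper names, `session` is assumed to be a requests-style session that already carries your RSP token, and I am assuming the signed URL needs no further authentication because it is signed.

```python
def pick_link(rows, semantics):
    """Return the access_url of the first row with the given semantics, else None."""
    for row in rows:
        if row["semantics"] == semantics:
            return row["access_url"]
    return None


def fetch_links(datalink_url, session):
    """Fetch the DataLink table behind an ObsTAP access_url as a list of dicts."""
    # Imported lazily so pick_link can be used without pyvo installed.
    from pyvo.dal.adhoc import DatalinkResults

    results = DatalinkResults.from_result_url(datalink_url, session=session)
    return [
        {"semantics": rec.semantics, "access_url": rec.access_url}
        for rec in results
    ]


def download_full_image(datalink_url, session, out_path):
    """Resolve the '#this' row and stream the full image file to disk."""
    import requests

    image_url = pick_link(fetch_links(datalink_url, session), "#this")
    if image_url is None:
        raise RuntimeError("no '#this' row found in the DataLink table")
    # Assumption: the signed URL is self-authorizing, so no token is sent.
    # It is short-lived, so fetch it right after resolving it.
    with requests.get(image_url, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        with open(out_path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                fh.write(chunk)
    return out_path
```

Because the signed URL expires quickly, it is probably safest to resolve and download one image at a time rather than collecting many signed URLs up front.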

All of the images available in DP0.2 are at either CCD scale (for single-epoch images) or patch scale (for coadds), and are 4k × 4k pixels. There is no access to tract-scale images (which you asked about); they don’t actually exist as such in the system.
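Since each image is a full CCD or patch, a galaxy training set is probably better served by small cutouts than by whole files. The following is a rough sketch of requesting one from the SODA service with pyvo, under several assumptions: the helper names are hypothetical, `session` is a requests-style session carrying your RSP token, the service ID "cutout-sync" is what I believe the RSP DataLink table advertises, and 0.2 arcsec per pixel is assumed as the pixel scale.

```python
def cutout_radius_deg(size_pixels, pixel_scale_arcsec=0.2):
    """Radius in degrees of a circle spanning `size_pixels` across.

    The 0.2 arcsec/pixel default is an assumption; adjust it if needed.
    """
    if size_pixels <= 0:
        raise ValueError("size_pixels must be positive")
    return 0.5 * size_pixels * pixel_scale_arcsec / 3600.0


def fetch_cutout(datalink_url, session, ra_deg, dec_deg, size_pixels, out_path):
    """Request a circular cutout around (ra, dec) and write the bytes to disk."""
    # Imported lazily so cutout_radius_deg works without pyvo installed.
    from pyvo.dal.adhoc import DatalinkResults, SodaQuery

    links = DatalinkResults.from_result_url(datalink_url, session=session)
    # Assumption: the cutout service descriptor is registered as "cutout-sync".
    service = links.get_adhocservice_by_id("cutout-sync")
    query = SodaQuery.from_resource(links, service, session=session)
    # Unitless values are interpreted as degrees: (ra, dec, radius).
    query.circle = (ra_deg, dec_deg, cutout_radius_deg(size_pixels))
    with open(out_path, "wb") as fh:
        fh.write(query.execute_stream().read())
    return out_path
```

A circular request comes back as the image region bounding that circle, so a 64-pixel stamp corresponds to a radius of 32 pixels, or about 0.0018 degrees at the assumed scale.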

As Tim mentioned, you can experiment with all this in the RSP Portal. You can perform an ObsTAP search there. You can start that by working in the UI-assisted mode, but you can then click on the “Populate and edit ADQL” button at the bottom of the screen to see the actual query to be executed. This might look something like this:

SELECT dataproduct_type,dataproduct_subtype,calib_level,lsst_band,em_min,em_max,lsst_tract,lsst_patch,
       lsst_filter,lsst_visit,lsst_detector,lsst_ccdvisitid,t_exptime,t_min,t_max,s_ra,s_dec,s_fov,
       obs_id,obs_collection,o_ucd,facility_name,instrument_name,obs_title,s_region,access_url,
       access_format 
FROM ivoa.ObsCore 
WHERE dataproduct_type = 'image' AND dataproduct_subtype = 'lsst.deepCoadd_calexp' AND CONTAINS(POINT('ICRS', 62, -37), s_region)=1

After you’ve executed the query, in the Portal you’ll see a “Data Product” tab in the interface, usually in the upper left, and in that UI element you’ll see a “More” dropdown menu. If you pick “Show Datalink VO Table for list of products” you’ll get a display of the two-line table I mentioned above.

On your remark that “it would be ideal if the API services could include the butler”: I know what this notion means to me and I think I have a good idea what it means to Tim, but I’d like to be sure that these map to what you actually had in mind. What would this mean to you if you got it? I.e., what is it you would want to do here? Do a Butler.get() call remotely, on some computer of yours (i.e., not in an RSP notebook)?

Thank you for your in-depth answer and clarifying the datalink-style data access model!

This is indeed what I had in mind.

OK, thanks for confirming. That indeed is what @timj was referring to.

This was not in the original design requirements, but it has long been recognized as a feature likely to be of great interest to users.

Because, on other grounds, we subsequently decided to move much of the logic of the Butler from locally-executed Python to a central server, this will soon become more feasible to implement. There are still some remaining technical issues, though, so at the moment we can’t commit to this as a service available at the start of operations.

However, it’s very useful to hear this as a request from an actual user. :)
