Recreating the LSST Science Pipelines tutorial (Gen 2) using only Generation 3 command-line tasks and pipetasks

I have updated the tutorials to make it explicit which version of the Science Pipelines was used to produce them. Sorry for the confusion. The docs will be rebuilt as part of our nightly software build and should be available tomorrow.

Thanks Tim and Simon.
Little by little, I’m grasping more of this pipeline process. I spent a good 1.5 hours on Zoom with Joshua Kitenge at RAL over in the UK and realized it’s amazing how much you can move forward with just a technical conversation. It was great!
Simon, I’ll watch for your official declaration that the pages are re-rendered and start my Gen3 adventure anew.
Getting cooler now in Texas…Fred

You don’t need to wait. The only change is to fix the version number so it doesn’t say v22 but says the weekly as described above. The weekly you have is perfectly fine for driving the tutorial.

Okay, in Step 3 (Run newinstall.sh), it refers to the URL shown below. Is it correct that it refers to /22.0.0/, or is this not relevant to our discussion here?
curl -OL https://raw.githubusercontent.com/lsst/lsst/22.0.0/scripts/newinstall.sh
bash newinstall.sh -ct
Thanks, Tim.

Another detail: I’m just following the new tutorial. I’m not working separately to grab w33, w34, or w35, as I have been in past weeks while trying to get the prototype Gen3 to execute.
Thanks.

You should probably use the master version of the newinstall.sh script, and then install the w_2021_33 tag of the Science Pipelines.

You can use the w34 or w35 you already have. You are not required to download a new stack version to run the tutorial if the tutorial is compatible with the version of the pipelines software you are already using. The documentation does indicate this.

Yes, the tutorial does say that you can use the software you already have.

Safest to use the w.2021.33 tag on the git repo so that it’s guaranteed to have the matching conda environment.

Very good point

In the Part 1 step to set up the Butler data repository, the later step entitled
Creating a Butler object for HSC data
references the os.environ variable as DC2_SUBSET_DIR, which I believe should be RC2_SUBSET_DIR.
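For reference, I’d expect the corrected step to read something like the sketch below (the SMALL_HSC subdirectory name is my assumption from the rc2_subset checkout; the tutorial may differ):

import os
from lsst.daf.butler import Butler

# Corrected variable name; "SMALL_HSC" is an assumed subdirectory
# holding the butler repo inside the rc2_subset checkout.
repo = os.path.join(os.environ["RC2_SUBSET_DIR"], "SMALL_HSC")
butler = Butler(repo)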

…same issue with Part 2. Under Setup, the step reads
cd $GEN3_DC2_SUBSET_DIR
I changed this to the obvious fix and continued; however, I have to ask:
are we working from the correct Gen3 tutorial docs?

You are correct that it is a typo. Sorry about that. This was already found by another user, but I haven’t had time to fix it yet. Hopefully next week I can get to incorporating the very useful feedback I’ve received.

I have merged fixes for several typos that made it through the last round somehow. Thanks for reporting. New docs will be out tomorrow.

Thanks, I’m still successfully working through Part 2 for single-frame processing. I paused to generate an entity-relationship diagram of the SQLite relational DBMS.
I regret not specifying a dozen or so cores to speed this thing up. It’s been running for most of the day.

Tim and James Bosch,
Having fun with the latest Gen3 tutorial, running step 2. I paused to generate an entity-relationship diagram of the SQLite relational DBMS.
I read your 2016 doc regarding the LSST software stack and Astropy.
Can you advise which packages we’re using for defining regions? I don’t see a regions package from Astropy present, so I’m not sure whether we’re using STC-S or something else.
For example, in my SKYMAP table, I see a region described as follows:

cB9JFy+wieu//UFpY+BH4D8i2o1IIt2YP3SJVcbcAOy/1/IT48Pu3j9veYhNIt2YP06DNJe++Ou/DrP+IHzl3j9HTcAwD26rP3/ADwCSgeu/uc81gjxD4D9JIUIrD26rPw==

Can you advise how I can translate this into a geometric shape or a series of hms/dms points, or plot it?
Many thanks. I will also say Gen3 is much more learning-friendly with the YAML-driven executions.
Still learning.
Fred Dallas

Skymaps come from the lsst.skymap package, and regions are defined by the lsst.sphgeom package.
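That string in your SKYMAP table is most likely the base64 encoding of the bytes produced by a region’s encode() method. Here is a minimal sketch of decoding it back into RA/Dec vertices, assuming it is a ConvexPolygon (I haven’t checked your specific row):

import base64
import math
from lsst.sphgeom import ConvexPolygon, Region

encoded = "cB9JFy..."  # paste the full string from your SKYMAP table here
region = Region.decode(base64.b64decode(encoded))

# A ConvexPolygon stores its vertices as unit vectors; convert to degrees.
if isinstance(region, ConvexPolygon):
    for v in region.getVertices():
        ra = math.degrees(math.atan2(v.y(), v.x())) % 360.0
        dec = math.degrees(math.asin(v.z()))
        print(ra, dec)

From there you can format the points with astropy.coordinates or hand them to matplotlib to plot the polygon.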

Thank you, Simon. So far my Gen3 run is going okay. When it works well, I can do a lot of sidebar explorations to learn even more.
You may have received these minor augmentations:

  1. Part 3, section on Getting the source catalog: the collection = statement needs the “f” prefix to make it an f-string (see the sketch below).
  2. Part 3, same section: the detector=41 has an extra “}”.
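A toy illustration of point 1 (the actual collection string in the tutorial will differ; this just shows why the f prefix matters):

import os

user = os.environ["USER"]
plain = "u/{user}/single_frame"        # braces kept literally
collection = f"u/{user}/single_frame"  # interpolated, e.g. "u/fred/single_frame"
print(plain, collection)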
Have a good Wednesday PM.

I’m doing great in my slow/deep tutorial on the Gen3 version. I like the way it is structured; at least I can see the process better.
I’m looking at the source catalog steps in Part 3 and have a general question.
In the FITS files created under the /src/ branch, I see six BinTableHDUs. I explored these weeks ago, but I’m understanding more these days; I can see the cols/rows with astropy.table (sketch below).
My question: is the Butler repository composed of both the set of sqlite3 relational tables AND the BinTableHDUs?
I have the full sqlite3 ER model and so far cannot find the source objects in the relational tables; however, perhaps I’m just missing something here.
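For concreteness, this is roughly how I’ve been peeking at them (the file name is a made-up example, not a real path from my repo):

from astropy.table import Table

# hdu=1 selects the first BinTableHDU; the path is hypothetical.
src = Table.read("src/some_src_catalog.fits", hdu=1)
print(src.colnames)
print(len(src))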

The Butler only worries about datasets and does not worry about the content of those datasets.

There’s an overview paper here:

So the butler registry keeps track of all the datasets it knows about and how they relate to science concepts. The datastore knows how to read and write those files. Catalog data inside those files is opaque to the butler. Those catalogs can be read by clients and analyzed, passed to other tasks, or ingested into databases like Qserv and made accessible via TAP queries.
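In practice the separation looks like this: a client asks the butler for a dataset by name and data ID, and the datastore fetches the file behind the scenes. A rough sketch (the repo path, collection name, and visit number are placeholders, not values from your run):

from lsst.daf.butler import Butler

# Repo path, collection, and visit are placeholders.
butler = Butler("SMALL_HSC", collections="u/fred/single_frame")
src = butler.get("src", instrument="HSC", detector=41, visit=1228)
# src is the catalog the datastore read from a FITS file; the registry
# only recorded that the dataset exists and how to locate it.
print(len(src))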

Okay, I’ll review the paper. In my question I never used the word dataset, so I’m not clear on what you’re saying, other than: just call the butler and it’ll keep track of everything.
My lsst_stack tree has at least two types of data: an sqlite3 db that is updated with each step (new rows, etc.) AND other OS files (e.g., the .fits files I mentioned in my original question, config files, etc.).
If dataset refers to a collection of data with a defined structure, then surely the rows in the EXPOSURE relational table are as much a dataset as the BinTableHDUs in the FITS files? So, again, my question is: are both of these part of the Butler repository? Or am I confusing the question by using the term Butler? Maybe I should use the phrase “pipeline repository”.

And, thanks for your patience. As a 30-year veteran of large-scale databases, I’m trying to imagine the “repository” as a large-scale, modern-day database…which may be incorrect and unfair.