Tracing through the processCcd.py runtime steps

fsklich · July 27, 2021, 5:41pm

I’m repeating the v22 run-through with no problems so far.
I’m digging deeper to understand the underlying steps a bit better this second time.
I see in the log, while running [ processCcd.py DATA --rerun processCcdOutputs --id ], that the process steps through isr, then charImage, then calibrate, and so on.
The master script, processCcd.py, is very simple: it invokes parseAndRun, which then launches these downstream steps.
My question is about seeing control pass between the master script and the downstream steps (isr, etc.). Even with detailed trace logs, I observe nothing that shows the transition from processCcd.py to isr and so on. I must be missing something simple: is there a part of the parseAndRun software that controls these downstream steps and their sequence, and I’m just not seeing it?
Anyway, so far so good. Just trying to dig in a bit more to see “handoffs” from one phase/step to the other.
thanks again,
Fred klich, Dallas

timj · July 27, 2021, 5:46pm

Things might be a bit clearer for you if you work in the new gen3 middleware. In gen3, pipelines are defined using YAML files, and we have a command, pipetask, that knows how to execute them (rather than separate scripts per combination of tasks).

See for example the command line in this unit test: pipelines_check/run_demo.sh at master · lsst/pipelines_check · GitHub

The pipeline for that is at pipe_tasks/DRP.yaml at master · lsst/pipe_tasks · GitHub
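
To give a rough feel for the idea of "a pipeline description plus one generic executor", here is a toy Python sketch. It is not the Gen3 middleware: the dict stands in for the YAML file, and `run_pipeline`, the step functions, and the fixed top-to-bottom ordering are all hypothetical simplifications made up for illustration.

```python
# Hypothetical step functions; each just records that it ran.
def isr(history):
    return history + ["isr"]


def characterize_image(history):
    return history + ["characterizeImage"]


def calibrate(history):
    return history + ["calibrate"]


# Stand-in for a pipeline description (a YAML file in Gen3).
# Labels map to the code that implements each top-level step.
TOY_PIPELINE = {
    "description": "toy single-frame processing",
    "tasks": {
        "isr": isr,
        "characterizeImage": characterize_image,
        "calibrate": calibrate,
    },
}


def run_pipeline(pipeline, history=None):
    """Generic executor: knows nothing about the individual steps.

    Simplification: runs the steps in the order they are listed.
    """
    history = [] if history is None else history
    for label, step in pipeline["tasks"].items():
        print(f"executing {label}")
        history = step(history)
    return history


result = run_pipeline(TOY_PIPELINE)
```

The point of the sketch is only that the sequence is data handed to one executor, instead of being written into a per-combination script.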

ktl · July 27, 2021, 6:05pm

There are documents about how the Gen2 CmdLineTask framework functions, but I don’t want to refer you to them because they will very soon be obsolete.

We’re still working on comprehensive documentation for the Gen3 system, but if you’re interested in the details, you might get a start at PipelineTask — LSST Science Pipelines and the various classes and methods linked from there.

kfindeisen · July 27, 2021, 7:11pm

I agree that switching to Gen 3 is a good idea, but for the record:

In Gen 2 (and technically also in Gen 3, though it’s less prominent there), each task (in this case, ProcessCcdTask) is responsible for deciding when and how to call its subtasks. This is not something parseAndRun does automatically. In this case the logic is fairly simple, and can be found in ProcessCcdTask.runDataRef. In other cases subtasks may not be simply “steps” of their parent task, but behave more like modular functions.
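
As a rough illustration of the paragraph above, here is a toy model in plain Python. It is not the LSST code: only the names parseAndRun and runDataRef come from the real framework, while `ToyTask`, `ToySubtask`, `ToyProcessCcd`, and the `trace` list are hypothetical. As far as I know the real framework also has a runner layer between parseAndRun and runDataRef, which the sketch leaves out.

```python
import argparse


class ToyTask:
    """Minimal stand-in for a task; appends to a shared list instead of a real log."""

    def __init__(self, name, trace):
        self.name = name
        self.trace = trace

    def log(self, message):
        self.trace.append(f"{self.name}: {message}")


class ToySubtask(ToyTask):
    """A subtask that only records that it ran."""

    def run(self, history):
        self.log("run")
        return history + [self.name]


class ToyProcessCcd(ToyTask):
    """Parent task: it owns its subtasks and decides when to call them."""

    def __init__(self, trace):
        super().__init__("processCcd", trace)
        self.isr = ToySubtask("isr", trace)
        self.charImage = ToySubtask("charImage", trace)
        self.calibrate = ToySubtask("calibrate", trace)

    def runDataRef(self, dataId):
        # The sequencing lives here, in the parent task itself.
        self.log(f"runDataRef({dataId})")
        history = self.isr.run([])
        history = self.charImage.run(history)
        return self.calibrate.run(history)

    @classmethod
    def parseAndRun(cls, args, trace):
        # Only parses the command line, builds the task, and hands over.
        parser = argparse.ArgumentParser()
        parser.add_argument("--id", nargs="*", default=[])
        parsed = parser.parse_args(args)
        task = cls(trace)
        task.log("parseAndRun")
        return [task.runDataRef(dataId) for dataId in parsed.id]


trace = []
results = ToyProcessCcd.parseAndRun(["--id", "visit=1"], trace)
print("\n".join(trace))
```

In the toy, the hand-off from parseAndRun to the subtasks is just ordinary method calls inside runDataRef, which is why nothing in a log marks it as a special "transition" unless the code logs it explicitly.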

Note also that ProcessCcdTask does not exist in Gen 3. There, ISR, characterizeImage etc. are top-level tasks, though they still have subtasks of their own. Top-level tasks interact with each other as Tim Jenness described.

fsklich · July 27, 2021, 10:17pm

Thanks, the GitHub links that you shared do “appear” a bit more centralized compared to what I’m trying to understand now.
Okay, I migrated to V22 after my initial tests with V21. I’ll do some more reading on gen3 pipelines; will it be released as a future version, V23 or later?

fsklich · July 27, 2021, 10:19pm

thanks. Will gen3 come out as V23 or later?

fsklich · July 27, 2021, 10:21pm

thanks, from what you and Tim explain…and…from the links you shared, it does appear that Gen3 is a little more centralized and (for me now) easier to understand.

timj · July 27, 2021, 10:31pm

Gen3 is already out. It was available in v21 but has been evolving rapidly, so we recommend that gen3 users install the weekly releases rather than the formal releases. For example, the current weekly is available using tag w_2021_30.

You can find other discussions of gen3 on community. For example Recreating the LSST Science pipeline tutorial (gen 2) only using Generation 3 command line tasks and the pipetasks

fsklich · July 28, 2021, 10:29am

Also, Krzysztof, as a follow-up: I can see the core logic in ProcessCcdTask.runDataRef for each of the major subtasks (as you explained). My difficulty was following how the logic flows from ProcessCcdTask.parseAndRun to this runDataRef method. I was hoping this would be apparent in the full/detailed logging that I generated, but I could not see it.
Many thanks for your explanation.

fsklich · July 28, 2021, 11:13am

Krzysztof, I also discovered I can use “--show config data tasks” on the processCcd.py command line to see a bit more of the processing steps and flow, just FYI. Drinking from the firehose, but getting there.
Fred