Ap_pipe in parallel jobs

mchbib · July 1, 2020, 2:18pm

Hello all, I want to run ap_pipe on different batches of visits from a given night, as 3 separate jobs, just to reduce the run time. What is the best way to do this, given that I would like to end up with one common association.db file that gathers the visits from all 3 jobs? Thank you

kfindeisen · July 1, 2020, 5:24pm

Hello, thank you for your question. ap_pipe.py is designed so that multiple calls can update the same database, so there’s no problem there.

There may be a problem with running the jobs in parallel, however; one possibility is that sources detected in visits processed simultaneously would be assigned to different objects at the same sky position. @cmorrison would be better able to tell you how well the database handles concurrent access.
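To illustrate the concern above, here is a toy sketch of how two jobs that associate sources at the same time could each create their own object at one sky position. This is not the actual ap_pipe association code: the `associate` function, the `MATCH_RADIUS` value, and the flat 2D coordinates are all hypothetical, and the "concurrent" case is simulated by letting both jobs read the same stale snapshot of the database.

```python
import math

# Hypothetical matching radius in arbitrary toy units (not an ap_pipe setting).
MATCH_RADIUS = 1.0


def associate(sources, known_objects):
    """Return the new objects needed for `sources`.

    A source is matched if it lies within MATCH_RADIUS of an object that is
    already known (or was just created in this call); otherwise it becomes
    a new object.
    """
    new_objects = []
    for src in sources:
        candidates = known_objects + new_objects
        if not any(math.dist(src, obj) <= MATCH_RADIUS for obj in candidates):
            new_objects.append(src)
    return new_objects


# The same star, detected in two different visits at nearly the same position.
visit_a = [(10.0, 10.0)]
visit_b = [(10.1, 10.0)]

# Sequential runs: the second visit sees the object created by the first.
db = []
db += associate(visit_a, db)
db += associate(visit_b, db)
sequential_count = len(db)

# Simulated concurrent runs: both jobs read the same (empty) snapshot,
# so neither sees the object the other is about to write.
snapshot = []
new_a = associate(visit_a, snapshot)
new_b = associate(visit_b, snapshot)
concurrent_count = len(snapshot + new_a + new_b)

print(sequential_count, concurrent_count)
```

In this toy model the sequential runs end with one object while the simulated concurrent runs end with two objects at the same position, which is the kind of duplication described above. Whether the real database behaves this way under concurrent access is exactly the open question.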

mchbib · July 2, 2020, 3:36pm

Hello, thank you for your answer. So, there is no guarantee of safe association with parallel jobs. Is there another way to reduce the time needed by ap_pipe? It took almost 16 hours for one visit (103 CCDs). Thank you

kfindeisen · July 2, 2020, 10:50pm

Maybe you could clarify something. When you said “in 3 jobs separately”, did you mean you run ap_pipe.py three times? That is what I assumed at first, and as I said I am not sure what might happen in that case.

On the other hand, we have done parallel runs internal to the program using the --processes/-j command-line arguments, and those function without corruption (with the caveat, as @mrawls mentioned, that sources may be processed in a different order from that in which they were observed). If you are not yet using that feature, it might give you the speed-up you need.
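As a rough sketch of that suggestion, the helper below appends the `--processes` option to an existing ap_pipe.py argument list so that a single call handles all visits with internal parallelism, instead of three separate jobs. The `with_processes` helper and the placeholder arguments are hypothetical; only the `--processes`/`-j` option itself comes from the discussion above, and the rest of the command line should be whatever you already use.

```python
def with_processes(base_args, processes):
    """Return a copy of `base_args` with the --processes option appended.

    `base_args` is the command line you already run, split into a list.
    `processes` is the number of parallel processes to request.
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return list(base_args) + ["--processes", str(processes)]


# Placeholder command line: substitute your real repository and data ID arguments.
base = ["ap_pipe.py", "<your usual arguments>"]
command = with_processes(base, 3)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run`, or simply used as a reminder of what to type in the shell.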

mchbib · July 3, 2020, 3:49am

Hello, yes, that is what I meant by 3 separate jobs (running it three times). Thank you for explaining how you performed parallel runs. I was not using the --processes option before; it is what I need to reduce the ap_pipe processing time! Thank you very much!