DM Boot Camp Proposal

We have had a number of new people joining the team recently, and more will be joining in the next couple of months. A suggestion was made to hold a “DM Boot Camp” to introduce these people to DM processes and the code, particularly the Science Pipelines stack.

My original conception of this was to have a single, unified activity that would be partly local and partly virtual. Given current schedules, it may not be possible to do it this way, but this at least provides a starting point for discussion.

Sites

UW and Princeton (the primary Science Pipelines development locations)

Participants

All UW and Princeton new hires. DavidN, JonathanS, AngeloF, NateP, and Hsin-FangC would benefit from a solid understanding of the Science Pipelines stack. Possibly others.

Curriculum

Each day would involve a few hours of lectures delivered via videoconference by subject experts. Due to time zones, something like 10-1 Pacific (1-4 Eastern) might be a good timeslot. A hands-on tutorial with a dedicated local tutor would occupy much of the rest of the day (afternoon Pacific, morning Eastern); additional time could be spent on personal projects with assistance from the local tutor.

  • Day 1:
    • Presentation: Introduction to LSST and DM science goals, system description, DM’s role, data products, processing flows
    • Presentation: DM organization and people, communication, documents
    • Presentation: DM code structure
    • Tutorial: Installing and using the stack (Python, eups)
  • Day 2:
    • Presentation: Middleware (log, daf_base, pex_exceptions, Butler, pex_config, pipe_base, db)
    • Presentation: afw
    • Presentation: meas_base
    • Tutorial: Modifying existing code (JIRA, git, GitHub, eups, lsstsw, scons, doxygen, Jenkins, code review)
  • Day 3:
    • Presentation: Orchestration and control (ctrl_execute)
    • Presentation: obs_* packages
    • Presentation: meas_* packages
    • Presentation: pipe_tasks (driver scripts, measurement conversion, database ingest)
    • Tutorial: Creating a new package (sconsUtils, .table files, .cfg files, third-party packages, eupspkg, distribution server)

I think we need to find room for at least two more Science Pipelines package presentations: one on pipe_tasks and another on the rest of meas_*.

I have been hoping for something just like this since I started. Additionally, I was wondering whether this should be a longer-term thing as well, to keep up with all the changes that are going on in the stack as they happen.

That was mentioned under “Middleware”, but it’s possible it needs a more extended treatment.

Sorry, meant pipe_tasks, not pipe_base (fixed above).

Added a pipe_tasks presentation to Day 3, and ctrl_execute should be reasonably short, so fitting meas_* in is probably possible.


This is a great idea! I think there might be several people in Tucson who would be interested in joining (e.g., DavidN, JonathanS, AngeloF). What timescale are you thinking of for this? This month? Sooner rather than later would be good.

I was hoping for early October, but I think that presents problems at Princeton, where people will be headed to Japan. I was worried that trying to do it in September left little time for preparation and that some people would not have come on board yet.

Perhaps we should just bite the bullet and set a date. I’m guessing even a half-baked boot camp would be better than none.

Are you planning to allow for remote participation? I think that besides me, some people in the French community may be interested in several of the topics mentioned in your tentative curriculum.

Effective remote participation in the tutorials may be difficult to achieve, but being able to follow at least the lectures via a video link would be really great. In addition, the lectures could be recorded for offline consumption.

Anyway, it is a great idea to have this camp.

Do you have any basic requirements for participants if IPAC wants to send a person or two to attend the tutorials? Also, I would like to participate remotely. Thank you!

Since the lecturers would always be remote from one site or the other or even both, I don’t see how there could be a problem with having others listen in. I don’t want this to be a three-day shutdown of all of DM, though. I would only send people to the sites if they could really benefit from the hands-on work and close contact with Science Pipelines staff.

It would be great if we could have this by the end of September and allow remote participation, even for the “hands-on” part (people could just keep BlueJeans open for many hours and ask the “leaders” questions when they get stuck on something).

Current thinking is to add a third site at Tucson, since there are at least three people there, and perhaps fly in a local tutor.

I’m thinking of aiming for Sep 29, 30, Oct 1.

I’m not sure that an “all-day BlueJeans meeting” model works well enough, even with screen-sharing, although it’s undoubtedly better than nothing.


Just a heads-up, not that you must plan this around me, but I will be away September 24th to October 11th for my wedding and honeymoon. I only mention this as a data point in case there are many other collisions and a different time is being debated. If I miss this, it’s not a big deal; I will catch up in other ways.

I have reorganized the curriculum to emphasize “how-to” on the first day with more internals and programming on the second and third days. I have tentatively assigned speakers. Since this is a rather drastic change from what people have seen before, I’m not going to send it to dm-devel until people have had a chance to react.

Updated Proposal

We will hold a “DM Boot Camp” to introduce new hires and interested users to DM processes and the Science Pipelines stack on October 5-7, 2015.

Sites

UW, Tucson, and Princeton, plus remote participants

Participants

All available UW and Princeton new hires. DavidN, JonathanS, and AngeloF in Tucson, plus other local users if desired. NateP will travel to Seattle. Hsin-FangC may travel to whichever site is most convenient.

JohnS and LaurenM will provide local instruction at Princeton. RussellO and TimJ will do the same in Tucson. YusraA and SimonK will instruct at UW.

Remote participants are welcome, up to the limit of our videoconferencing tools. We will record lectures for future use and reference.

Organization

Each day will involve a few hours of lectures delivered via videoconference by subject experts. Due to time zones, lectures will be from 10-1 Pacific (1-4 Eastern). We plan to record lectures for future reference and usage.

A hands-on tutorial with local instructor(s) will occupy much of the rest of the day (afternoon Pacific, morning Eastern); additional time could be spent on personal projects with assistance from the local tutor(s). We will try to enable inter-site sharing and remote participation in the tutorials, but this may not work well depending on how much preparation we are able to do.

We have not yet selected a videoconferencing tool; stay tuned for more information.

Tentative Agenda

The following agenda and speaker assignments are subject to change but give an idea of the topics we are planning to cover.

  • October 5:
    • Presentation (1 hr): Introduction to LSST and DM: science goals, system description, DM’s mission, data products, processing flows for Level 1 (Alert Production) and Level 2 (Data Release Production), available data processing scripts. MarioJ or KTL
    • Presentation (0.5 hr): Basic afw Concepts: Image, Exposure, afw.table from the end user’s perspective. KTL
    • Presentation (0.5 hr): Using the Butler: repositories, mappers, registries. KTL
    • Presentation (0.5 hr): Using Tasks: arguments, configuration, retarget/subtasks. JohnS
    • Presentation (0.5 hr): Basic eups Concepts: products, versions, tags, dependencies. JohnS
    • Tutorial: Installing the stack, eups practicalities, using the stack to process data
  • October 6:
    • Presentation (0.5 hr): DM Code 1: overall system structure, available third-party packages and middleware. TimJ
    • Presentation (0.5 hr): DM Code 2: structure of a package, scons and sconsUtils, SWIG. KTL
    • Presentation (1 hr): afw: contents, how to use (images, cameraGeom, detections), common pitfalls (angles, XY0). SimonK
    • Presentation (1 hr): Detection and Measurement: meas_base, afw.table, meas_*, writing a new measurement plugin. JimB
    • Tutorial: Modifying existing code: JIRA, git, GitHub, eups, lsstsw, scons, doxygen, Jenkins, code review
  • October 7:
    • Presentation (1 hr): DM Organization: DM org chart and people, communication mechanisms, documents, JIRA. JeffK
    • Presentation (0.5 hr): Orchestration and Control: HTCondor, ctrl_execute. SteveP
    • Presentation (0.5 hr): Writing a Task: writing a Config, using subtasks, CmdLineTask/argument parser/TaskRunner. RussellO or PaulP?
    • Presentation (1 hr): Creating an obs_* Package: camera description, configuration overrides, task customizations, mapper policy and subclass. RussellO
    • Tutorial: Creating a new package: sconsUtils, .table files, .cfg files, third-party packages, eupspkg, distribution server

I’ll miss this (as I’ll be in Japan). It’d be good to see all the presentations well in advance, as the whole thing needs to fit together and it’s all too easy to concentrate on the details not the bigger picture.

I don’t see anything on the Tasks that we assemble pipelines from. I’d like to see a discussion of the available tasks and their functionality. (I’d personally start with processCcd rather than one of the CmdLineTask tasks, as I think it’s easier to see the drill-down to measurement.) I looked at the new meas_base recently, and it seemed more complex than the last time I looked at it.