LSST 2015 Bremerton DM Wrap-Up

ktl · August 26, 2015, 4:57am

Goals

The primary DM goals of the meeting were to attempt to perform an end-to-end execution of code from ISR through coaddition, database ingest, and SUI/T display; to produce measurements suitable for determining the DM Key Performance Metrics (KPMs); to make decisions on a variety of outstanding issues; and in general to exchange information among DM team members, with LSST project staff, and with science collaboration members.

The many side conversations, informal meetings, meals, soccer games, and board gaming sessions were very useful for sharing information, suggesting changes, resolving interpersonal issues, and improving team cohesiveness.

End-to-End and KPMs

Data Release Production

The end-to-end Data Release Production testing went quite well, partly due to a lot of preparatory activity the week prior.

Processed raw HSC engineering frames into calibrated exposures using psfex
Detected and measured sources on calibrated exposures
Used measurement transformation framework to generate calibrated catalogs
Worked on porting meas_mosaic for relative astrometric calibration; the port is not yet functional
Coadded calibrated exposures into per-band deep (non-PSF-matched) coadds (one patch)
Detected blended peaks on each coadd and merged peaks across bands
Deblended and measured (same) peaks on each coadd
Did forced photometry on other coadds based on (per-object) reference band
Ingested calibration, single-frame, and coadd sources into MySQL database
Close to loading same catalogs into Qserv
Fixed some bugs in Qserv that showed up at large scales
Pointed data access web services to MySQL database, Qserv, and images
Close to running Firefly to display images and catalogs (estimate 28-Aug; Firefly can already display S13 SDSS data)
Determined ellipticity (TE1, TE2) Key Performance Metrics
Close to determining photometric and astrometric repeatability KPMs (waiting on DM-3490)
Have timings that can be used to determine DRP TFLOPS KPM
Data, Tasks, configurations, and scripts should be used to automate integration tests and KPM generation in the future
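
To make the idea of a repeatability measurement in the list above concrete, here is a minimal sketch. It is an assumption for illustration only, not the formal KPM definition or the code tracked in DM-3490: it groups repeated magnitude measurements by object, takes the sample standard deviation per object, and reports the median scatter in millimagnitudes. The function name `repeatability_mmag` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import median, stdev


def repeatability_mmag(measurements, min_visits=2):
    """Median per-object scatter of repeated magnitudes, in millimagnitudes.

    measurements: iterable of (object_id, magnitude) pairs, one per visit.
    This is an illustrative statistic, not the official KPM definition.
    """
    if min_visits < 2:
        raise ValueError("min_visits must be at least 2")

    # Collect all magnitudes measured for each object.
    by_object = defaultdict(list)
    for object_id, mag in measurements:
        by_object[object_id].append(mag)

    # Sample standard deviation per object, converted from mag to mmag.
    scatters = [
        1000.0 * stdev(mags)
        for mags in by_object.values()
        if len(mags) >= min_visits
    ]
    if not scatters:
        raise ValueError("no object has enough repeated measurements")
    return median(scatters)
```

Objects seen fewer than `min_visits` times are skipped, since a single measurement carries no information about scatter.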

Alert Production

Some Alert Production work occurred as well:

Used fixed “matchOptimisticB” astrometry matcher to correct for distortion
Ran difference imaging on HSC frames and coadds
Timings used to estimate OTT1 KPM; should be able to estimate AP TFLOPS KPM
Data, Tasks, configurations, and scripts should be used to automate integration tests and KPM generation in the future

Decisions

Science Verification Datasets

An evening discussion led to a decision on a list of datasets appropriate for performing science verification.

DECam data: COSMOS, bulge survey, Solar System objects, HiTS survey, and SMASH survey
CFHT lensing survey
HSC COSMOS data
Pan-STARRS data for database and SUI/T testing
Simulated LSST data (yet to be determined)

Once we have updated the DM System Requirements, we will need to use these datasets as part of writing a verification plan for each requirement.

RFCs

Decisions were reached on many long-open RFCs. See the RFC issues for more detail.

RFC-21 (GitHub branch policy): adopted with some tweaks to the release process and a requirement for an automated process to be developed to turn merged ticket branches into tags.
RFC-45 (Copyright policy): adopted with the possibility of an unchanging pointer to the repo’s COPYRIGHT file in place of the current boilerplate.
RFC-51 (JIRA assignments for components): adopted with the proviso that a T/CAM may remove the assignee for a ticket but must then set the Team of the ticket instead.
RFC-56 (gcc 4.8+): adopted with devtoolset-3 subsets (gcc 4.9) on NCSA cluster machines and the Jenkins CentOS 6 machines; gcc 4.8 on CentOS 7. We will continue to support CentOS 6 until we can retire most of the old LSST cluster machines. All new NCSA machines should be CentOS 7.
RFC-62 (pex_config tree root naming): adopted, with config as the root name. I think Russell may volunteer to implement this change.
RFC-25 (32-bit MaskPixel): adopted. Jim Bosch suggested this change; I’m not sure if he’s volunteering to implement it.
RFC-78 (API error return checking): adopted. Gregory will add his suggested text to the coding standards.
RFC-29 (lsst.log): no decision. We asked Russell to do some initial investigation of whether lsst.log could be “dropped in” in place of pex_logging. We also asked Tim to be the “owner” of lsst.log until NCSA can support it.
RFC-60 (Python 3): adopted. We will move to python3 + 2.7 using future with a few prerequisites.
RFC-81 (Pythonic API): no decision. Some investigation of Jim’s suggested modest changes can occur prior to the HSC merge, but major changes to the code have to wait for that (estimated to be in October).
RFC-39 (Rename 1D makeStatistics): This RFC was withdrawn prior to the meeting, but the Statistics interface was acknowledged to need improvement in the future.
RFC-58 (Refactor Approximate, BackgroundMI, GaussianProcess): This RFC was partially adopted (name, usage, and operations but not constructors) before the meeting.
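
As a rough illustration of the RFC-60 direction (one code base that runs under both Python 2.7 and Python 3), the sketch below combines `__future__` imports with the `builtins` module, which the `future` package supplies on Python 2.7 and which is part of the standard library on Python 3. The helper names and sample values are hypothetical and are not taken from the LSST stack; the prerequisites mentioned in the RFC are not covered here.

```python
from __future__ import absolute_import, division, print_function

# On Python 2.7 this import is supplied by the "future" package;
# on Python 3 it resolves to the standard-library builtins module.
from builtins import range, str


def mean_flux(fluxes):
    """Return the mean of a sequence of fluxes (hypothetical helper)."""
    total = 0.0
    for i in range(len(fluxes)):
        total += fluxes[i]
    # True division on both interpreters because of the __future__ import.
    return total / len(fluxes)


def describe(n_good, n_total):
    """Format a fraction as text; str is a unicode type on both interpreters."""
    return str("{}/{} = {:.2f}").format(n_good, n_total, n_good / n_total)


if __name__ == "__main__":
    print(describe(1, 4))
    print(mean_flux([1.0, 2.0, 3.0]))
```

The point of the pattern is that integer division, `print`, `range`, and `str` behave the same way under both interpreters, so the same file can be tested on each.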

Trac tickets

We triaged 140 afw Trac tickets, deciding that 44 were obsolete, 41 were still valid, and 45 needed more investigation before being declared one or the other. The team will try to similarly categorize the other 391 Trac tickets pertaining to other components.

DM Planning

In the short term, we will focus on updating our requirements and design documents. Where we know what we are building, we need to write that down so that others inside and outside DM can understand it. Where we don’t yet know what we’re building, we need to write down what we are doing first and give some idea of what other possibilities we might try.

Once that is done, it will be easier to determine dependencies and estimate resources.

Our goal is a justifiable resource-loaded plan by November.

The T/CAMs developed reasonable FY16 budget plans, identified the causes of current DM variances and developed recovery plans for them, and identified process improvements to avoid “administrative” variances associated with level-of-effort activities (for all personnel).

Science Pipelines Sprint Reviews

We are working to improve the per-sprint processes for the Science Pipelines teams.

It was decided that K-T will participate in “backlog grooming” (deciding what the highest priority tasks are for the next sprint) with input from Mario. We will try to summarize to Mario what the next sprint will accomplish. It will help to have improved versions of the Data Products Definition Document, better requirements flowdown via the DM System Requirements, and rewritten design documents, especially the Applications Design Document.

Sprint planning sessions should be more for the team and the T/CAM(s), not “management”. The sprint review, backlog grooming, and sprint planning should be scheduled at fixed times in the monthly cycle to ensure that people are available.

Sprint reviews need to teach Mario and Robert what is going on.

Questions are not a critique but are meant to facilitate understanding
Need to resist temptation to go into too much detail
Try to have authors available to present work
Moderator might cut off minor discussions to allow time for major points
Moderator should look out for questions that could be perceived as harsh
Everything can be questioned, but decisions must stick unless new information is available

It was thought that restarting weekly Apps/Science Pipelines calls might now be desirable.

Perhaps move Princeton Monday meetings an hour later
Use RFDs for longer, earlier design discussions

Documentation Tools

A proposed set of documentation tools was presented.

Developers will be given templates and editing assistance for low-level docs.
JonathanS will try writing one “higher-level” document himself.
Algorithm descriptions and design rationales should go into the design documents.
Some of that should also probably go into “pipeline” scientific papers.

Information Exchange

DM Status

We presented the status of the DM effort to the project and to scientists.

New Hire Orientation

K-T gave recent DM hires an overview of DM and a chance to ask questions. A “boot camp” was proposed for new Science Pipelines developers and selected others to learn in a hands-on fashion about our tools and code; we are thinking about scheduling this for October.

Technical Operations Working Group

Margaret presented common ITC (information technology and communications) use cases based on ITIL, which were welcomed. We indicated that these use cases can only set a floor for staffing; experience and judgement are needed to estimate proper staffing, since many of the activities are highly variable (like resolving problems).

Network and Base Center

Ron presented the status of the network fiber paths. He presented summit and base data center network design proposals and discussed them with Mike Huffer, Stuart Marshall, Tony Johnson, and Jason Alt; Cisco ACI seems like an acceptable starting point (a decision is needed by 31-Dec).

Ron also discussed details of Camera Data System network hardware being purchased for development and testing with the Camera team.

SUI/T

Both at a breakout and in the DM Status for Scientists plenary, presentations of Firefly APIs and usage models went over well, with the audience particularly impressed by the possibility of per-collaboration tools. Tony Johnson rebuilt the latest Firefly from source for use by the Camera team; all features asked for in March are now done, with the UIUC team working on utilizing the new capabilities. In the end, there was no time for the proposed coding work on graphics overlays. Discussions about the SUI workspace were fruitful, as were discussions with the EPO team.

Calibration plan and data products

RobertL met with the calibration hardware folks (Stubbs and Ingraham) and made significant progress on the spectrophotometric instrument. The LSST calibration plan was presented. Tucker and Ting talked about the DES experience. Filter scanning was discussed with Steve Ritz and Chuck Claver, although without any definite conclusion.

Other Breakouts

OpSim
Project simulation needs
LSE-68 Camera Data System interface
Telescope and Site Software
Simultaneous Astrometry
Computational Needs for Simulations
Control System requirements during commissioning
Sensor Features

jbosch · August 27, 2015, 7:58pm

From “Science Pipelines Sprint Reviews”:

It was decided that K-T will participate in “backlog grooming” (deciding what the highest priority tasks are for the next sprint) with input from Mario. We will try to summarize to Mario what the next sprint will accomplish.

My recollection was that K-T (and optionally Robert and Mario) would participate in a sprint planning stage after an initial backlog-grooming has been done by institutional technical managers, and that this sprint planning stage would be the more open meeting that included both management and the rest of the teams, with backlog-grooming more of a one-person job.

I do expect that there would be some prioritization done at the more public sprint planning meetings with the larger group in attendance, but my feeling is that there’s a fair amount of initial grunge work in grooming the backlog that would just waste most meeting attendees’ time.

ktl · August 27, 2015, 9:08pm

I don’t think that’s right. The product owner should set the priorities; local T/CAMs should then work with their staff to determine what can get done in a given sprint based on those priorities but also taking into account available time, skills, etc.

There’s grooming in the sense of closing open tickets, fixing up descriptions, logging story points, etc., but that’s not what we’re talking about here.

jbosch · August 27, 2015, 9:17pm

That certainly makes more sense from an ideal agile perspective.

I was expecting our approach to be somewhat different largely because our high-level priorities (and their associated issues) are already set by the long-term planning processes, so those don’t need a lot of prioritization-level attention, and for what remains at the per-sprint backlog grooming stage - a lot of lower-level bugs - it might be a waste of product owner time to have them present for the first pass. I was thinking that a local manager prioritization pass for the first stage would make it pretty quick to do a final round of prioritization at the beginning of the sprint planning meeting with everyone present.

I’m happy to try it either way, of course.