Jenkins maintenance Sunday 2019-01-13 @ ~2000-2100 project time


(josh) #1

Due to a number of known security issues, the Jenkins core and plugins need to be updated. As this would inherently be a downtime event, it will be combined with an “across the board” update, including bumping blueocean by two minor releases (no obvious UI changes) and updating docker, the system kernel, etc.

The downtime for the jenkins master is expected to be fairly brief and the agent nodes will be updated afterwards in a rolling fashion. The beginning/end of the maintenance period will be announced on #dm-jenkins.

(updated subject to remove the requirement for time travel)

(josh) #2

Jenkins maintenance is complete. Please let us know if any strange behavior is noticed post-update.

(josh) #3

Unfortunately, the k8s cluster that was hosting, after successfully completing a routine update from k8s 1.10.x -> 1.11.x, went into some sort of bizarre non-functional state about 20 minutes later, after I had stopped manually checking service function. On top of this, an after-hours monitoring policy was in place that limited outage notifications to once every 2 hours (already removed by DM-17194). As a result, several of the after-hours Jenkins jobs failed, including the nightly release.

A fresh deployment was made this morning, which only takes a few minutes, but it will take a considerable amount of time for the more than 1.5 TiB of files to be copied back from s3. My current guess is that this will take another 4-6 hours to complete. Restarting the nightly release is blocked until the sync from s3 has completed.
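
For a rough sense of scale, the sustained transfer rate implied by that estimate can be worked out as below. This is only a back-of-the-envelope sketch: it assumes the full 1.5 TiB still has to move within the 4-6 hour window, which may overstate the remaining work, and the function name `implied_rate_mib_s` is made up for illustration.

```python
# Back-of-the-envelope sketch: average rate needed to copy a given
# amount of data within a given time window.
# Assumption: the whole 1.5 TiB is copied during the window.

MIB_PER_TIB = 1024 * 1024  # 1 TiB = 1024 GiB = 1024 * 1024 MiB


def implied_rate_mib_s(size_tib: float, hours: float) -> float:
    """Return the average rate in MiB/s to move size_tib in the given hours."""
    seconds = hours * 3600
    return size_tib * MIB_PER_TIB / seconds


if __name__ == "__main__":
    for hours in (4, 6):
        rate = implied_rate_mib_s(1.5, hours)
        print(f"{hours} h -> {rate:.1f} MiB/s")
```

Under those assumptions the sync would need to average roughly 73-109 MiB/s.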