Deployment technologies sharing in Infrastructure meeting

It has come to my attention that several teams within DM are looking into, learning about, and implementing service deployment technologies like Docker, Docker Swarm, Kubernetes, etc. NCSA (@jalt), SUIT (@xiuqin, @gpdf), SQuaRE (@frossie, @josh, @jmatt), and DAX (@fritzm, @fjammes, @brianv0) all seem to have an interest. I would like to propose that we share information and discuss these technologies in the Infrastructure meetings, currently being held alternate Fridays at 1 PM Pacific.

We might start off with each group giving a quick 10-minute presentation on their uses, what they already know, and what they would like to know, but this is of course up to Jason as the chair of the meeting.

I love it. If there’s no objection, let’s start this at the Aug 6th meeting. Tomorrow’s meeting is already double booked with PDAC discussions due to vacation schedules next week.

I’ve been deploying Kubernetes projects at SQuaRE and I’ll be happy to talk about that.

Great idea! SUIT has not done anything with Docker yet. We would love to learn from you all.

At CC-IN2P3 we have been experimenting with packaging single steps of the LSST pipeline (e.g. tasks such as processCcd.py, ingestImages.py, etc.) as separate Docker containers.

Our initial main goal is to understand whether it is effective to exploit the local storage of the compute node where the container executes to store intermediate data* locally (as opposed to using, say, a networked file system or an object store). To reach that goal we need to be able to orchestrate the containers so as to decide on which host to execute the next containerised step of a given pipeline, based on the location of its input data. For container orchestration we intend to explore mechanisms such as Mesos, Swarm or Kubernetes.
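To make the placement decision above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not something tied to Mesos, Swarm or Kubernetes: the function name `pick_host`, the `max_load` limit, the host names and the file names are all hypothetical, and a real orchestrator would express the same preference through its own scheduling mechanisms.

```python
from collections import Counter


def pick_host(input_files, file_locations, host_load, max_load=4):
    """Choose a host for the next pipeline step (illustrative only).

    input_files    -- names of the intermediate files the step will read
    file_locations -- dict: file name -> set of hosts holding a local copy
    host_load      -- dict: host -> number of containers currently running
    max_load       -- assumed per-host limit on running containers
    """
    if not host_load:
        raise ValueError("no hosts available")

    # Count how many of the step's input files each host already has locally.
    local_counts = Counter()
    for name in input_files:
        for host in file_locations.get(name, ()):
            if host in host_load:
                local_counts[host] += 1

    # Prefer hosts that hold input data and still have spare capacity.
    candidates = [h for h in local_counts if host_load[h] < max_load]
    if candidates:
        # Most local files first, then lowest load, then name (deterministic).
        return min(candidates, key=lambda h: (-local_counts[h], host_load[h], h))

    # No suitable host has the data locally: fall back to the least-loaded host.
    return min(host_load, key=lambda h: (host_load[h], h))


# Hypothetical cluster state, for illustration only.
file_locations = {
    "step1-out-001.fits": {"node-a"},
    "step1-out-002.fits": {"node-a", "node-b"},
}
host_load = {"node-a": 1, "node-b": 0, "node-c": 0}
chosen = pick_host(["step1-out-001.fits", "step1-out-002.fits"],
                   file_locations, host_load)
print(chosen)
```

In this toy setup the step goes to whichever host already stores most of its inputs, and only falls back to load-based placement when no such host has room.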

The person leading this work is @YvanCalas. We are interested in learning from others and sharing our admittedly short experience. The slot of Friday 1 PM Pacific (10 PM Lyon time) is not fantastically convenient for us, but we will do our best if that is the best slot for all the intended participants.

* By intermediate data we mean data that is to be consumed by the next step downstream in the pipeline.


I’m completely open to a different time slot. As is, this time slot was difficult to coordinate with all involved. But we’ll entertain suggested new time slots.


As a reminder, we will begin these discussions in the infrastructure call tomorrow (Friday) 1PM PST.

Great meeting today with a lot of information exchanged. A thought as to how to continue to make progress in this area:

  • Teams (particularly DAX and SUIT but also SQuaRE and maybe NCSA L1 and possibly NCSA Data Backbone and L2) can list the service components they will need to deploy in the development, integration, and operational environments, if this is not already in documents.
  • Teams could propose a partitioning of these components into containers/pods for peer review by others in the group.
  • Teams could propose a deployment strategy of those containers/pods for similar peer review.