11:00:41 <oneswig> #startmeeting scientific-sig
11:00:42 <openstack> Meeting started Wed Apr 25 11:00:41 2018 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:45 <openstack> The meeting name has been set to 'scientific_sig'
11:01:01 <oneswig> greetings o/
11:01:10 <martial_> Hello everyone
11:01:18 <daveholland> hi
11:01:26 <oneswig> #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_April_25th_2018
11:02:04 <oneswig> We have a guest speaker, but he may be a little late
11:02:48 <oneswig> How about we cover some of the AOB first
11:03:26 <oneswig> #topic Vancouver summit planning
11:03:35 <oneswig> Two forum topics proposed at present
11:03:48 <oneswig> #link etherpad for forum https://etherpad.openstack.org/p/YVR18-scientific-sig-brainstorming
11:04:45 <oneswig> There was a good deal of interest in blazar's model of reservations at the PTG. I wonder if that would make a good subject.
11:05:48 <qwebirc54714> hey there
11:05:55 <oneswig> martial_: do you remember the date for submissions?
11:05:56 <qwebirc54714> Christian Kniep here
11:06:07 <oneswig> Hi Christian - thanks for coming along
11:06:11 <oneswig> and at short notice too
11:06:17 <verdurin> Afternoon.
11:06:21 <qwebirc54714> are we going to stick with IRC?
11:06:49 <oneswig> I think so - all the meetings are logged and I often get messages from people who read the logs afterwards
11:07:02 <oneswig> It's a low-intensity form of meeting...
11:07:31 <oneswig> Shall we get started?
11:07:41 <oneswig> #topic Docker and HPC environments
11:07:51 <martial_> Which workshop?
11:07:59 <oneswig> So I met Christian and martial_ met Christine around the same time
11:08:32 <oneswig> martial_: Vancouver forum
11:09:04 <oneswig> #link Christian's presentation on Docker and HPC https://docs.google.com/presentation/d/1ol0WHEhzT7dukafGKf7gE06wz2cIBeDYcvSzpbz_Nss/edit#slide=id.g1e0da86092_0_4
11:10:14 <qwebirc54714> I am sorry guys; I was not aware that it would be textual. Am I supposed to walk you through the presentation now?
11:10:34 <oneswig> Q&A works well, is that OK?
11:10:58 <qwebirc54714> ok, cool... I only have like 10min as I have to jump on something that came up.
11:11:06 <qwebirc54714> shoot
11:11:29 <oneswig> Christian - there's two sides to this integration I think.  Stuff that HPC people want to see and stuff that Docker does that HPC doesn't want to see.  Is that true?
11:12:53 <qwebirc54714> kind of, yes. Coming from the cloudy side of things; docker fancies being host-agnostic, while HPC/AI needs to be host-specific in order to mount in certain shared file-systems and devices
11:13:16 <oneswig> Can you talk about what your projects in Docker are aiming to improve?
11:13:23 <oneswig> for HPC use cases that is
11:13:26 <qwebirc54714> furthermore, a user has to be honored no matter what; thus pinning the UID:GID for a given container according to the user executing the docker run is a must
11:14:07 <qwebirc54714> page 31 explains what we need to do in order to fix it.
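A minimal sketch of that UID:GID pinning with a stock Docker CLI (the image name is illustrative):

    # run the container as the submitting user, not root
    docker run --rm \
      --user "$(id -u):$(id -g)" \
      -v /etc/passwd:/etc/passwd:ro \
      -v /etc/group:/etc/group:ro \
      -v "$HOME":"$HOME" -w "$HOME" \
      hpc-app:latest id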
11:15:02 <qwebirc54714> the improvement would be reproducibility, portability (this time for real) and (my favorite) performance-optimization since the container file-system only has to support the solver and not all the services around it (syslog, slurmd, ...)
11:15:16 <oneswig> How do you see workload schedulers working - slurmd talks to dockerd, or something more streamlined?
11:15:41 <martial_> Oneswig: "Deadline for proposing Forum topics. Scheduling committee meeting to make draft agenda" was April 15
11:15:53 <oneswig> ah, oops...
11:16:30 <qwebirc54714> first iteration: the batch script fires up the container, either `create` in a prolog or `run` in the script. Later on we can have a look at more optimised solutions
11:16:57 <qwebirc54714> from my POV: page 25 in the end
11:17:15 <qwebirc54714> the workload schedulers schedule and let the engine execute the binary (within a container of course)
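A minimal sketch of that first iteration as a Slurm batch script, with `docker run` in the script itself (image and paths are illustrative):

    #!/bin/bash
    #SBATCH --job-name=containerised-solver
    #SBATCH --nodes=1
    # the scheduler schedules; the engine executes the solver binary in a container
    docker run --rm \
      --user "$(id -u):$(id -g)" \
      -v "/scratch/$SLURM_JOB_ID:/scratch" \
      solver:latest /opt/solver --input /scratch/input.dat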
11:17:30 <oneswig> Christian - there was reference in the feedback at HPCAC to a desire to have a process group lineage, which dockerd doesn't provide.  Is that something you've seen much?
11:17:57 <qwebirc54714> process group lineage... can you elaborate a bit - not sure I understand
11:18:16 <oneswig> parent, child, process tree
11:18:28 <oneswig> I think that was the issue
11:19:08 <oneswig> The bash script you mentioned doesn't have a descendent which is the active workload
11:19:12 <qwebirc54714> yeah, ok... I guess you are hinting at MPI wire-up. I did it by spawning orted using a fake ssh client so far
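A rough sketch of that fake-ssh trick, assuming Open MPI's rsh launcher and a remotely reachable Docker engine (container name and port are illustrative):

    #!/bin/bash
    # fake-ssh: mpirun invokes this as `fake-ssh <host> orted ...`;
    # redirect the orted spawn into the container running on that host
    host="$1"; shift
    exec docker -H "tcp://${host}:2375" exec mpi-job "$@"
    # used via: mpirun --mca plm_rsh_agent ./fake-ssh -np 4 ./solver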
11:20:04 <oneswig> I'm aware you're tight for time.  Have you got a link to your work?
11:20:06 <qwebirc54714> that would be something to work on, I reckon that the docker-engine could include a PMIx server and we are all fine - but that is a long way to iterate towards
11:20:25 <qwebirc54714> https://github.com/qnib/go-wharfie
11:20:56 <qwebirc54714> but the README needs an update with an example.... :/
11:21:35 <oneswig> Thanks.  You think this kind of HPC-oriented runtime configuration is a way off - any idea how far?
11:22:25 <qwebirc54714> I did a hack last week, forcing the changes upon the docker-engine
11:22:45 <qwebirc54714> https://github.com/qnib/moby/blob/houdini/HOUDINI.md
11:23:08 <qwebirc54714> but that is far from being the solution; it is a hack, but that is the functionality I would like to see
11:23:53 <oneswig> This is for enabling GPU passthrough specifically?
11:24:44 <qwebirc54714> you can map in any device, so IB/OPA should work as well - didn't have access to an IB system
11:25:05 <qwebirc54714> also default mounts for a given cluster, based on how it is mounted on the host
11:25:23 <qwebirc54714> like /A/home vs /B/home has to be mounted to /home
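A minimal sketch of that kind of host-specific run, assuming NVIDIA and InfiniBand device nodes exist on the host (image name is illustrative):

    docker run --rm \
      --device=/dev/nvidia0 --device=/dev/nvidiactl --device=/dev/nvidia-uvm \
      --device=/dev/infiniband/uverbs0 \
      -v /A/home:/home \
      hpc-app:latest ./app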
11:26:20 <oneswig> Seems useful to me... There were some interesting points around IB and network isolation.
11:26:36 <oneswig> Some people do not co-locate running workloads on the same node but others do.
11:26:58 <oneswig> But those that do share an IB resource today and seem happy with it.
11:27:11 <oneswig> Is anything more necessary in terms of isolation?
11:27:14 <qwebirc54714> it is a start and I won't go into the weeds of site-specific solutions. :)
11:27:38 <qwebirc54714> there is no silver bullet for HPC, not even a bronze one. :)
11:28:00 <oneswig> Is this use case getting much interest within Docker inc?
11:28:05 <qwebirc54714> I'd rather provide the building blocks and a flexible solution, so that sites and vendors can tailor it to their needs
11:28:32 <qwebirc54714> The GPU use-case might carry it over the finish line. Much more buzz than in HPC land
11:29:01 <oneswig> Certainly seems that way with AI
11:29:18 <qwebirc54714> but as I showed in page 26ff: the GPU use-case will end up as a tightly coupled, distributed workload relying on RDMA and shared filesystems.
11:29:33 <qwebirc54714> et voila - you have your HPC job covered right there
11:29:53 <qwebirc54714> and I do not care how we get over the finish line.
11:30:03 <oneswig> Looking ahead, do you see the OCI components changing things, around runC for example?
11:30:16 <qwebirc54714> 5min more, then I have to drop
11:30:35 <qwebirc54714> there was a RDMA namespace PR a couple of weeks ago
11:30:56 <qwebirc54714> https://github.com/opencontainers/runtime-spec/pull/942
11:31:33 <qwebirc54714> runC has almost everything we need to make it work; it is the stack above that needs to be enabled: the engine and containerd
11:31:37 <oneswig> I think the one you linked to was RDMA cgroups rather than a namespace - isn't that more about resource management?
11:32:29 <qwebirc54714> Haven't looked at it but my understanding is that it's both.
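For reference, the kernel's RDMA cgroup controller (the cgroups side oneswig mentions) limits HCA handles and objects per group, roughly like this (the device name is illustrative):

    mkdir /sys/fs/cgroup/rdma/job42
    echo "mlx4_0 hca_handle=2 hca_object=2000" > /sys/fs/cgroup/rdma/job42/rdma.max
    echo $$ > /sys/fs/cgroup/rdma/job42/cgroup.procs   # confine this shell's children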
11:32:41 <oneswig> I saw this - assume you're well familiar with it: https://rootlesscontaine.rs/
11:33:24 <b1airo> o/ hello all, sorry for being late! Now what's the most annoying question I can ask that has already been discussed...?
11:33:29 <qwebirc54714> to me, user-land containers are just a drop-in replacement like singularity and shifter that will work with existing workloads.
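As a concrete example of that drop-in property, Singularity can consume an existing Docker image directly and executes it as the calling user:

    singularity exec docker://python:3.6 python --version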
11:33:29 <oneswig> Hi b1airo
11:33:35 <oneswig> #chair martial_ b1airo
11:33:36 <openstack> Current chairs: b1airo martial_ oneswig
11:34:09 <qwebirc54714> it's about the abstraction/orchestration above and the streamlining of the execution no matter what orchestration you are using.
11:34:36 <oneswig> qwebirc54714: the drop-in bit is quite appealing.  Perhaps the user portability between environments is maximised then
11:35:19 <qwebirc54714> I know it is appealing but you need to be OCI compliant in order to hook into kubernetes and swarm (and the future ones).
11:35:57 <qwebirc54714> shifter, singularity and the like (IMHO) cannot be OCI compliant as they would need to become runC + a bit of containerd
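For context, being OCI-compliant at the runtime level means consuming an OCI bundle the way runC does; a minimal sketch:

    mkdir -p bundle/rootfs
    docker export "$(docker create alpine)" | tar -C bundle/rootfs -xf -
    cd bundle && runc spec        # writes a default config.json for the bundle
    sudo runc run testctr         # runs the container from the bundle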
11:36:21 <qwebirc54714> ok guys, have to drop out. sorry for being in a hurry
11:36:34 <oneswig> OK thanks Christian.
11:37:04 <martial_> Thanks Christian
11:37:10 <priteau> Thanks!
11:37:17 <martial_> (too slow on my phone keyboard)
11:37:17 <daveholland> thanks
11:37:30 <oneswig> It's an interesting question on oci compliance and kubernetes integration
11:37:48 <martial_> Is Christine still with us?
11:38:03 <oneswig> I'm not sure they are aiming for that but I do think the commonality of codebase helps with maintainability
11:38:29 <daveholland> do users typically ask for "I want to run containers" or "I want something with OCI compliance"? (the former, IME)
11:39:27 <priteau> I suppose they generally want to run containers [that they already run elsewhere on a non-HPC environment]
11:39:27 <oneswig> I think we have the same - the advantages of a familiar development environment.  I don't think our users are onto defining their applications through kubernetes
11:40:00 <verdurin> I think the question is whether, if you use k8s, that community will help you if you're not using something OCI-compliant
11:40:01 <daveholland> oneswig: we have a few brave adventurers putting kubespray on top of openstack
11:40:40 <oneswig> daveholland: we've used that, it's actually pretty easy to get something going.  There's quite a bit of invention going on around integrating the two
11:40:51 <oneswig> (as in, kubernetes and openstack)
11:41:01 <daveholland> yes
11:41:12 <daveholland> but, it's "just" another bit of learning curve :)
11:41:15 <oneswig> but mostly we like using Magnum, when we can
11:41:59 <oneswig> daveholland: and I'm sure they aren't doing it for the sysadmin skills, this is just in the way of their goal
11:42:55 <daveholland> oneswig: oh exactly, we (sysadmins) are only just on top of openstack admin, before we think about providing container orchestration as-a-service
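For comparison, the Magnum route drives the same thing through the OpenStack CLI; a minimal sketch (template name, image and flavor are illustrative):

    openstack coe cluster template create k8s-template \
      --image fedora-atomic-latest --keypair mykey \
      --external-network public --flavor m1.medium \
      --coe kubernetes
    openstack coe cluster create demo-cluster \
      --cluster-template k8s-template --node-count 2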
11:43:21 <oneswig> The HPC container landscape is pretty interesting right now and OpenStack infrastructure sits especially well beneath it all
11:43:49 <oneswig> OK, shall we move on?
11:43:59 <oneswig> any final thoughts?
11:44:40 <oneswig> It might be interesting to invite someone from one of the dedicated HPC container projects along for another point of view
11:45:08 <martial_> I need to check if both Christine and Christian are going to be at DockerCon
11:45:39 <oneswig> martial_: any comparison with what was discussed when you were at docker federal?
11:47:49 <oneswig> OK let's move on
11:48:17 <oneswig> #topic conferences, AOB
11:48:27 <martial_> Also I started a Supercomputing 18 BoF with our past OpenStack crowd (who will be at SC) and Christian and Christine
11:48:27 <martial_> Today was more of a technical "how do we do it"
11:48:27 <martial_> The other one was "why we are doing it and what that means", with a little bit of how
11:49:08 <b1airo> Sounds good martial_
11:49:16 <martial_> I was able to ask a couple more technical questions of Christine after
11:49:29 <b1airo> I'm not sure where I'll be yet (Europe or US)
11:49:46 <oneswig> I'm pretty sure I'll be in Berlin
11:50:09 <oneswig> Is anyone here going to ISC?
11:50:34 <oneswig> https://www.isc-hpc.com/
11:51:15 <verdurin> oneswig: considering it
11:51:25 <b1airo> I put the two Forum session suggestions that came via the SIG - GPUs and Preemptible Instances - in to the Forum site (albeit a day late)
11:51:32 <b1airo> (for Vancouver)
11:51:33 <oneswig> Alas I am not planning to go - but I'll be at HPC Knowledge Partnership in Barcelona the week before.
11:51:47 <oneswig> b1airo: thanks!  I'd not been keeping on top of that at all.
11:51:53 <oneswig> Glad someone is...
11:51:55 <martial_> I can add you if you want Blair
11:52:19 <verdurin> oneswig: have you a link for the Barcelona event?
11:52:29 <daveholland> both sessions are of interest to us but particularly preemptible
11:52:40 <priteau> About the Forum, the Blazar has submitted a topic to discuss requirements for resource reservation across OpenStack
11:52:55 <priteau> the Blazar *team*
11:53:05 <verdurin> (don't worry, found it)
11:53:11 <b1airo> oneswig: yeah I got invited for the Barcelona thing but can't go unfortunately. Alas, would have loved to return
11:53:12 <oneswig> I don't think the programme's up yet but it's here: http://www.hpckp.org/index.php/annual-meeting/hpckp-18
11:53:48 <oneswig> priteau: excellent - was hoping it would get covered given the interest in Dublin
11:54:00 <priteau> It still has to be selected of course
11:54:12 <priteau> IIRC at the last summit we merged with another topic
11:54:51 <b1airo> priteau: i.e. not just Nova resources?
11:56:21 <priteau> b1airo: indeed, we would like to integrate with other projects as well. Reserving floating IPs from Neutron is a use case for NFV.
11:56:37 <priteau> #link http://forumtopics.openstack.org/cfp/details/111
11:56:57 <oneswig> I think Bob Budden also wanted a mention of the workshop on Container-based systems for Big data, Distributed and Parallel computing - in Turin in August - https://sites.google.com/view/cbdp18/home
11:57:09 <oneswig> I assume that means he's going!
11:58:28 <oneswig> priteau: if you'll be in Vancouver for this session, you should begin it with the demo of Chameleon's system. It's compelling.
11:58:31 <b1airo> Love a call for participation that starts with, "Nowadays..." :-)
11:59:02 <b1airo> +1 to that suggestion
11:59:04 <priteau> oneswig: unfortunately I cannot attend the Vancouver summit. From Blazar the NTT guys will be there.
11:59:13 <b1airo> :-(
11:59:49 <oneswig> Ah, too bad.
12:00:02 <oneswig> (although I am sure they'll do a great job)
12:00:19 <b1airo> Looks like we are on the hour!
12:00:20 <oneswig> Will you get that booking system upstream?
12:00:27 <oneswig> ah, well spotted b1airo
12:00:34 <oneswig> final comments!?
12:00:35 <priteau> oneswig: that's the plan!
12:00:44 <oneswig> +1 to that
12:00:52 <oneswig> #endmeeting