11:00:41 <oneswig> #startmeeting scientific-sig
11:00:42 <openstack> Meeting started Wed Apr 25 11:00:41 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:45 <openstack> The meeting name has been set to 'scientific_sig'
11:01:01 <oneswig> greetings o/
11:01:10 <martial_> Hello everyone
11:01:18 <daveholland> hi
11:01:26 <oneswig> #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_April_25th_2018
11:02:04 <oneswig> We have a guest speaker, but he may be a little late
11:02:48 <oneswig> How about we cover some of the AOB first
11:03:26 <oneswig> #topic Vancouver summit planning
11:03:35 <oneswig> Two forum topics proposed at present
11:03:48 <oneswig> #link etherpad for forum https://etherpad.openstack.org/p/YVR18-scientific-sig-brainstorming
11:04:45 <oneswig> There was a good deal of interest in Blazar's model of reservations at the PTG. I wonder if that would make a good subject.
11:05:48 <qwebirc54714> hey there
11:05:55 <oneswig> martial_: do you remember the date for submissions?
11:05:56 <qwebirc54714> Christian Kniep here
11:06:07 <oneswig> Hi Christian - thanks for coming along
11:06:11 <oneswig> and at short notice too
11:06:17 <verdurin> Afternoon.
11:06:21 <qwebirc54714> are we going to stick with IRC?
11:06:49 <oneswig> I think so - all the meetings are logged and I often get messages from people who read the logs afterwards
11:07:02 <oneswig> It's a low-intensity form of meeting...
11:07:31 <oneswig> Shall we get started?
11:07:41 <oneswig> #topic Docker and HPC environments
11:07:51 <martial_> Which workshop?
11:07:59 <oneswig> So I met Christian and martial_ met Christine around the same time
11:08:32 <oneswig> martial_: Vancouver forum
11:09:04 <oneswig> #link Christian's presentation on Docker and HPC https://docs.google.com/presentation/d/1ol0WHEhzT7dukafGKf7gE06wz2cIBeDYcvSzpbz_Nss/edit#slide=id.g1e0da86092_0_4
11:10:14 <qwebirc54714> I am sorry guys; I was not aware that it would be textual. Am I supposed to walk you through the presentation now?
11:10:34 <oneswig> Q&A works well, is that OK?
11:10:58 <qwebirc54714> ok, cool... I only have like 10min as I have to jump on something that came up.
11:11:06 <qwebirc54714> shoot
11:11:29 <oneswig> Christian - there are two sides to this integration I think. Stuff that HPC people want to see and stuff that Docker does that HPC doesn't want to see. Is that true?
11:12:53 <qwebirc54714> kind of, yes. Coming from the cloudy side of things, Docker fancies being host-agnostic, while HPC/AI needs to be host-specific in order to mount in certain shared file-systems and devices
11:13:16 <oneswig> Can you talk about what your projects in Docker are aiming to improve?
11:13:23 <oneswig> for HPC use cases that is
11:13:26 <qwebirc54714> furthermore, a user has to be honored no matter what; thus pinning the UID:GID for a given container according to the user executing the docker run is a must
11:14:07 <qwebirc54714> page 31 explains what we need to do in order to fix it.
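[To make the UID:GID pinning concrete, a minimal sketch with the stock Docker CLI; the image name and workload are illustrative:]

    # run the container as the submitting user rather than root, so files
    # written to bind-mounted shared storage carry the correct ownership
    docker run --rm \
      --user "$(id -u):$(id -g)" \
      --volume /etc/passwd:/etc/passwd:ro \
      --volume "$HOME:$HOME" \
      my-solver-image ./solve input.dat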
11:15:02 <qwebirc54714> the improvement would be reproducibility, portability (this time for real) and (my favorite) performance optimization, since the container file-system only has to support the solver and not all the services around it (syslog, slurmd, ...)
11:15:16 <oneswig> How do you see workload schedulers working - slurmd talks to dockerd, or something more streamlined?
11:15:41 <martial_> oneswig: "Deadline for proposing Forum topics. Scheduling committee meeting to make draft agenda" was April 15
11:15:53 <oneswig> ah, oops...
11:16:30 <qwebirc54714> first iteration: the batch script fires up the container, either `create` in a prolog or `run` in the script. Later on we can have a look at more optimised solutions
11:16:57 <qwebirc54714> from my POV: page 25 in the end
11:17:15 <qwebirc54714> the workload schedulers schedule and let the engine execute the binary (within a container of course)
11:17:30 <oneswig> Christian - there was reference in the feedback at HPCAC to a desire to have a process group lineage, which dockerd doesn't provide. Is that something you've seen much?
11:17:57 <qwebirc54714> process group lineage... can you elaborate a bit - not sure I understand
11:18:16 <oneswig> parent, child, process tree
11:18:28 <oneswig> I think that was the issue
11:19:08 <oneswig> The bash script you mentioned doesn't have a descendant which is the active workload
11:19:12 <qwebirc54714> yeah, ok... I guess you are hinting at MPI wire-up. I did it by spawning orted using a fake ssh client so far
11:20:04 <oneswig> I'm aware you're tight for time. Have you got a link to your work?
11:20:06 <qwebirc54714> that would be something to work on, I reckon that the docker-engine could include a PMIx server and we are all fine - but that is a long way to iterate towards
11:20:25 <qwebirc54714> https://github.com/qnib/go-wharfie
11:20:56 <qwebirc54714> but the README needs an update with an example.... :/
11:21:35 <oneswig> Thanks. You think this kind of HPC-oriented runtime configuration is a way off - any idea how far?
11:22:25 <qwebirc54714> I did a hack last week, forcing the changes upon the docker-engine
11:22:45 <qwebirc54714> https://github.com/qnib/moby/blob/houdini/HOUDINI.md
11:23:08 <qwebirc54714> but that is far from being the solution; it is a hack, but that is the functionality I would like to see
11:23:53 <oneswig> This is for enabling GPU passthrough specifically?
11:24:44 <qwebirc54714> you can map in any device, so IB/OPA should work as well - didn't have access to an IB system
11:25:05 <qwebirc54714> also default mounts for a given cluster, based on how it is mounted on the host
11:25:23 <qwebirc54714> like /A/home vs /B/home has to be mounted to /home
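[A rough sketch of the per-site wiring being described, expressed with stock `docker run` flags; the device paths, mount points and image name are illustrative:]

    # map the host's InfiniBand verbs devices into the container, and
    # normalize this cluster's home mount (/A/home here) to /home
    docker run --rm \
      --device /dev/infiniband/uverbs0 \
      --device /dev/infiniband/rdma_cm \
      --volume /A/home:/home \
      my-mpi-image ./solver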
11:26:20 <oneswig> Seems useful to me... There were some interesting points around IB and network isolation.
11:26:36 <oneswig> Some people do not co-locate running workloads on the same node but others do.
11:26:58 <oneswig> But those that do share an IB resource today and seem happy with it.
11:27:11 <oneswig> Is anything more necessary in terms of isolation?
11:27:14 <qwebirc54714> it is a start and I won't go into the weeds of site-specific solutions. :)
11:27:38 <qwebirc54714> there is no silver bullet for HPC, not even a bronze one. :)
11:28:00 <oneswig> Is this use case getting much interest within Docker Inc?
11:28:05 <qwebirc54714> I'd rather provide the building blocks and a flexible solution, so that sites and vendors can tailor it to their needs
11:28:32 <qwebirc54714> The GPU use-case might carry it over the finish line. Much more buzz than in HPC land
11:29:01 <oneswig> Certainly seems that way with AI
11:29:18 <qwebirc54714> but as I showed on page 26ff: the GPU use-case will end up as a tightly coupled, distributed workload relying on RDMA and shared filesystems.
11:29:33 <qwebirc54714> et voilà - you have your HPC job covered right there
11:29:53 <qwebirc54714> and I do not care how we get over the finish line.
11:30:03 <oneswig> Looking ahead, do you see the OCI components changing things, around runC for example?
11:30:16 <qwebirc54714> 5min more, then I have to drop
11:30:35 <qwebirc54714> there was an RDMA namespace PR a couple of weeks ago
11:30:56 <qwebirc54714> https://github.com/opencontainers/runtime-spec/pull/942
11:31:33 <qwebirc54714> runC has almost everything we need to make it work; it is the stack above that needs to be enabled: the engine and containerd
11:31:37 <oneswig> I think the one you linked to was RDMA cgroups rather than a namespace - isn't that more about resource management?
11:32:29 <qwebirc54714> Haven't looked at it but my understanding is that it's both.
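[For reference, the kernel's RDMA cgroup controller (Linux 4.11+) is indeed about resource management: it caps how many HCA handles and objects a group of processes may allocate per device. A minimal sketch against the cgroup-v1 interface; the device name, limits and PID are illustrative:]

    # create a cgroup and cap its RDMA resource usage on device mlx5_0
    mkdir /sys/fs/cgroup/rdma/job42
    echo "mlx5_0 hca_handle=16 hca_object=1024" > /sys/fs/cgroup/rdma/job42/rdma.max
    # attach the container's init process to the cgroup
    echo 12345 > /sys/fs/cgroup/rdma/job42/cgroup.procs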
11:32:41 <oneswig> I saw this - assume you're well familiar with it: https://rootlesscontaine.rs/
11:33:24 <b1airo> o/ hello all, sorry for being late! Now what's the most annoying question I can ask that has already been discussed...?
11:33:29 <qwebirc54714> to me, user-land containers are just a drop-in replacement, like Singularity and Shifter, that will work with existing workloads.
11:33:29 <oneswig> Hi b1airo
11:33:35 <oneswig> #chair martial_ b1airo
11:33:36 <openstack> Current chairs: b1airo martial_ oneswig
11:34:09 <qwebirc54714> it's about the abstraction/orchestration above and the streamlining of the execution no matter what orchestration you are using.
11:34:36 <oneswig> qwebirc54714: the drop-in bit is quite appealing. Perhaps the user portability between environments is maximised then
11:35:19 <qwebirc54714> I know it is appealing but you need to be OCI compliant in order to hook into Kubernetes and Swarm (and future ones).
11:35:57 <qwebirc54714> Shifter, Singularity and the like (IMHO) cannot be OCI compliant as they would need to become runC + a bit of containerd
11:36:21 <qwebirc54714> ok guys, have to drop out. sorry for being in a hurry
11:36:34 <oneswig> OK thanks Christian.
11:37:04 <martial_> Thanks Christian
11:37:10 <priteau> Thanks!
11:37:17 <martial_> (too slow on my phone keyboard)
11:37:17 <daveholland> thanks
11:37:30 <oneswig> It's an interesting question on OCI compliance and Kubernetes integration
11:37:48 <martial_> Is Christine still with us?
11:38:03 <oneswig> I'm not sure they are aiming for that but I do think the commonality of codebase helps with maintainability
11:38:29 <daveholland> do users typically ask for "I want to run containers" or "I want something with OCI compliance"? (the former, IME)
11:39:27 <priteau> I suppose they generally want to run containers [that they already run elsewhere on a non-HPC environment]
11:39:27 <oneswig> I think we have the same - the advantages of a familiar development environment. I don't think our users are onto defining their applications through Kubernetes
11:40:00 <verdurin> I think the question is whether, if you use k8s, that community will help you if you're not using something OCI-compliant
11:40:01 <daveholland> oneswig: we have a few brave adventurers putting kubespray on top of openstack
11:40:40 <oneswig> daveholland: we've used that, it's actually pretty easy to get something going. There's quite a bit of invention going on around integrating the two
11:40:51 <oneswig> (as in, Kubernetes and OpenStack)
11:41:01 <daveholland> yes
11:41:12 <daveholland> but, it's "just" another bit of learning curve :)
11:41:15 <oneswig> but mostly we like using Magnum, when we can
11:41:59 <oneswig> daveholland: and I'm sure they aren't doing it for the sysadmin skills, this is just in the way of their goal
11:42:55 <daveholland> oneswig: oh exactly, we (sysadmins) are only just on top of openstack admin, before we think about providing container orchestration as-a-service
11:43:21 <oneswig> The HPC container landscape is pretty interesting right now and OpenStack infrastructure sits especially well beneath it all
11:43:49 <oneswig> OK, shall we move on?
11:43:59 <oneswig> any final thoughts?
11:44:40 <oneswig> It might be interesting to invite someone from one of the dedicated HPC container projects along for another point of view
11:45:08 <martial_> I need to check if both Christine and Christian are going to be at DockerCon
11:45:39 <oneswig> martial_: any comparison with what was discussed when you were at Docker Federal?
11:47:49 <oneswig> OK let's move on
11:48:17 <oneswig> #topic conferences, AOB
11:48:27 <martial_> Also I started a Supercomputing 18 BoF with our past OpenStack crowd (that will be at SC) and Christian and Christine
11:48:27 <martial_> Today was more a technical "how do we do it"
11:48:27 <martial_> The other one was why we are doing it and what that means, with a little bit of how
11:49:08 <b1airo> Sounds good martial_
11:49:16 <martial_> I was able to ask a couple more technical questions of Christine after
11:49:29 <b1airo> I'm not sure where I'll be yet (Europe or US)
11:49:46 <oneswig> I'm pretty sure I'll be in Berlin
11:50:09 <oneswig> Is anyone here going to ISC?
11:50:34 <oneswig> https://www.isc-hpc.com/
11:51:15 <verdurin> oneswig: considering it
11:51:25 <b1airo> I put the two Forum session suggestions that came via the SIG - GPUs and Preemptible Instances - into the Forum site (albeit a day late)
11:51:32 <b1airo> (for Vancouver)
11:51:33 <oneswig> Alas I am not planning to go - but I'll be at HPC Knowledge Partnership in Barcelona the week before.
11:51:47 <oneswig> b1airo: thanks! I'd not been keeping on top of that at all.
11:51:53 <oneswig> Glad someone is...
11:51:55 <martial_> I can add you if you want Blair
11:52:19 <verdurin> oneswig: have you a link for the Barcelona event?
11:52:29 <daveholland> both sessions are of interest to us but particularly preemptible
11:52:40 <priteau> About the Forum, the Blazar team has submitted a topic to discuss requirements for resource reservation across OpenStack
11:53:05 <verdurin> (don't worry, found it)
11:53:11 <b1airo> oneswig: yeah I got invited for the Barcelona thing but can't go unfortunately. Alas, would have loved to return
11:53:12 <oneswig> I don't think the programme's up yet but it's here: http://www.hpckp.org/index.php/annual-meeting/hpckp-18
11:53:48 <oneswig> priteau: excellent - was hoping it would get covered given the interest in Dublin
11:54:00 <priteau> It still has to be selected of course
11:54:12 <priteau> IIRC at the last summit we merged with another topic
11:54:51 <b1airo> priteau: i.e. not just Nova resources?
11:56:21 <priteau> b1airo: indeed, we would like to integrate with other projects as well. Reserving floating IPs from Neutron is a use case for NFV.
11:56:37 <priteau> #link http://forumtopics.openstack.org/cfp/details/111
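[For anyone who hasn't used Blazar, a physical host reservation looks roughly like this with the blazar client; the dates, constraints and lease name are illustrative:]

    # reserve two compute hosts for a fixed window; instances placed in the
    # lease's aggregate run only within the reservation
    blazar lease-create \
      --physical-reservation min=2,max=2,hypervisor_properties='[">=", "$vcpus", "16"]' \
      --start-date "2018-05-21 09:00" \
      --end-date "2018-05-24 17:00" \
      sig-demo-lease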
11:56:57 <oneswig> I think Bob Budden also wanted a mention of the workshop on Container-based systems for Big data, Distributed and Parallel computing - in Turin in August - https://sites.google.com/view/cbdp18/home
11:57:09 <oneswig> I assume that means he's going!
11:58:28 <oneswig> priteau: if you'll be in Vancouver for this session, you should begin it with the demo of Chameleon's system. It's compelling.
11:58:31 <b1airo> Love a call for participation that starts with, "Nowadays..." :-)
11:59:02 <b1airo> +1 to that suggestion
11:59:04 <priteau> oneswig: unfortunately I cannot attend the Vancouver summit. From Blazar the NTT guys will be there.
11:59:13 <b1airo> :-(
11:59:49 <oneswig> Ah, too bad.
12:00:02 <oneswig> (although I am sure they'll do a great job)
12:00:19 <b1airo> Looks like we are on the hour!
12:00:20 <oneswig> Will you get that booking system upstream?
12:00:27 <oneswig> ah, well spotted b1airo
12:00:34 <oneswig> final comments!?
12:00:35 <priteau> oneswig: that's the plan!
12:00:44 <oneswig> +1 to that
12:00:52 <oneswig> #endmeeting