11:00:41 #startmeeting scientific-sig
11:00:42 Meeting started Wed Apr 25 11:00:41 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:45 The meeting name has been set to 'scientific_sig'
11:01:01 greetings o/
11:01:10 Hello everyone
11:01:18 hi
11:01:26 #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_April_25th_2018
11:02:04 We have a guest speaker, but he may be a little late
11:02:48 How about we cover some of the AOB first
11:03:26 #topic Vancouver summit planning
11:03:35 Two forum topics proposed at present
11:03:48 #link etherpad for forum https://etherpad.openstack.org/p/YVR18-scientific-sig-brainstorming
11:04:45 There was a good deal of interest in Blazar's model of reservations at the PTG. I wonder if that would make a good subject.
11:05:48 hey there
11:05:55 martial_: do you remember the date for submissions?
11:05:56 Christian Kniep here
11:06:07 Hi Christian - thanks for coming along
11:06:11 and at short notice too
11:06:17 Afternoon.
11:06:21 are we going to stick with IRC?
11:06:49 I think so - all the meetings are logged and I often get messages from people who read the logs afterwards
11:07:02 It's a low-intensity form of meeting...
11:07:31 Shall we get started?
11:07:41 #topic Docker and HPC environments
11:07:51 Which workshop?
11:07:59 So I met Christian and martial_ met Christine around the same time
11:08:32 martial_: Vancouver forum
11:09:04 #link Christian's presentation on Docker and HPC https://docs.google.com/presentation/d/1ol0WHEhzT7dukafGKf7gE06wz2cIBeDYcvSzpbz_Nss/edit#slide=id.g1e0da86092_0_4
11:10:14 I am sorry guys; I was not aware that it would be textual. Am I supposed to walk you through the presentation now?
11:10:34 Q&A works well, is that OK?
11:10:58 ok, cool... I only have like 10min as I have to jump on something that came up.
11:11:06 shoot
11:11:29 Christian - there are two sides to this integration, I think: stuff that HPC people want to see, and stuff that Docker does that HPC doesn't want to see. Is that true?
11:12:53 kind of, yes. Coming from the cloudy side of things, Docker fancies being host-agnostic, while HPC/AI needs to be host-specific in order to mount in certain shared file-systems and devices
11:13:16 Can you talk about what your projects in Docker are aiming to improve?
11:13:23 for HPC use cases, that is
11:13:26 furthermore, a user has to be honored no matter what; thus pinning the UID:GID for a given container according to the user executing the docker run is a must
11:14:07 page 31 explains what we need to do in order to fix it.
11:15:02 the improvement would be reproducibility, portability (this time for real) and (my favorite) performance optimization, since the container file-system only has to support the solver and not all the services around it (syslog, slurmd, ...)
11:15:16 How do you see workload schedulers working - slurmd talks to dockerd, or something more streamlined?
11:15:41 oneswig: "Deadline for proposing Forum topics. Scheduling committee meeting to make draft agenda" was April 15
11:15:53 ah, oops...
11:16:30 first iteration: the batch script fires up the container, either `create` in a prolog or `run` in the script. Later on we can have a look at more optimised solutions
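Editor's note: a minimal sketch of the "first iteration" pattern described above - a batch script that pins the container to the submitting user's UID:GID and maps in site-specific mounts and devices. The image name, device path, and mount source are illustrative assumptions, not taken from the discussion.

```bash
#!/bin/bash
#SBATCH --job-name=containerised-solver
#SBATCH --nodes=1

# Pin the container to the submitting user's UID:GID so files written
# to the shared filesystem carry the right ownership.
# Normalise a site-specific home mount (e.g. /A/home) to /home and
# pass through an RDMA-capable device; all paths are placeholders.
docker run --rm \
  --user "$(id -u):$(id -g)" \
  --volume "/A/home/${USER}:/home/${USER}" \
  --device /dev/infiniband/uverbs0 \
  example/solver:latest ./run_solver
```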
11:16:57 from my POV: page 25, in the end
11:17:15 the workload schedulers schedule and let the engine execute the binary (within a container, of course)
11:17:30 Christian - there was reference in the feedback at HPCAC to a desire to have a process group lineage, which dockerd doesn't provide. Is that something you've seen much?
11:17:57 process group lineage... can you elaborate a bit - not sure I understand
11:18:16 parent, child, process tree
11:18:28 I think that was the issue
11:19:08 The bash script you mentioned doesn't have a descendant which is the active workload
11:19:12 yeah, ok... I guess you are hinting at MPI wire-up. I did it by spawning orted using a fake ssh client so far
11:20:04 I'm aware you're tight for time. Have you got a link to your work?
11:20:06 that would be something to work on; I reckon the docker-engine could include a PMIx server and we'd all be fine - but that is a long way to iterate towards
11:20:25 https://github.com/qnib/go-wharfie
11:20:56 but the README needs an update with an example.... :/
11:21:35 Thanks. You think this kind of HPC-oriented runtime configuration is a way off - any idea how far?
11:22:25 I did a hack last week, forcing the changes upon the docker-engine
11:22:45 https://github.com/qnib/moby/blob/houdini/HOUDINI.md
11:23:08 but that is far from being the solution; it is a hack, but that is the functionality I would like to see
11:23:53 This is for enabling GPU passthrough specifically?
11:24:44 you can map in any device, so IB/OPA should work as well - I didn't have access to an IB system
11:25:05 also default mounts for a given cluster, based on how it is mounted on the host
11:25:23 like /A/home vs /B/home has to be mounted to /home
11:26:20 Seems useful to me... There were some interesting points around IB and network isolation.
11:26:36 Some people do not co-locate running workloads on the same node, but others do.
11:26:58 But those that do share an IB resource today and seem happy with it.
11:27:11 Is anything more necessary in terms of isolation?
11:27:14 it is a start, and I won't go into the weeds of site-specific solutions. :)
11:27:38 there is no silver bullet for HPC, not even a bronze one. :)
11:28:00 Is this use case getting much interest within Docker Inc?
11:28:05 I'd rather provide the building blocks and a flexible solution, so that sites and vendors can tailor it to their needs
11:28:32 The GPU use-case might carry it over the finish line. Much more buzz than in HPC land
11:29:01 Certainly seems that way with AI
11:29:18 but as I showed on page 26ff: the GPU use-case will end up as a tightly coupled, distributed workload relying on RDMA and shared filesystems.
11:29:33 et voila - you have your HPC job covered right there
11:29:53 and I do not care how we get over the finish line.
11:30:03 Looking ahead, do you see the OCI components changing things, around runC for example?
11:30:16 5min more, then I have to drop
11:30:35 there was an RDMA namespace PR a couple of weeks ago
11:30:56 https://github.com/opencontainers/runtime-spec/pull/942
11:31:33 runC has almost everything we need to make it work; it is the stack above that needs to be enabled: the engine and containerd
11:31:37 I think the one you linked to was RDMA cgroups rather than a namespace - isn't that more about resource management?
11:32:29 Haven't looked at it, but my understanding is that it's both.
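Editor's note: the PR linked above concerns RDMA cgroup support in the OCI runtime spec; the kernel controller it builds on limits how many RDMA resources a cgroup may allocate, rather than providing namespace isolation. A rough sketch of the raw kernel interface (cgroup v1, kernel 4.11+; the device name and limits are illustrative):

```bash
# Create an rdma cgroup and cap the HCA handles/objects that member
# processes may allocate (device name and limits are illustrative).
mkdir -p /sys/fs/cgroup/rdma/hpc-job
echo "mlx4_0 hca_handle=2 hca_object=2000" > /sys/fs/cgroup/rdma/hpc-job/rdma.max

# Move the current shell into the group and inspect current usage.
echo $$ > /sys/fs/cgroup/rdma/hpc-job/cgroup.procs
cat /sys/fs/cgroup/rdma/hpc-job/rdma.current
```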
11:32:41 I saw this - assume you're well familiar with it: https://rootlesscontaine.rs/
11:33:24 o/ hello all, sorry for being late! Now what's the most annoying question I can ask that has already been discussed...?
11:33:29 to me, user-land containers are just a drop-in replacement like Singularity and Shifter that will work with existing workloads.
11:33:29 Hi b1airo
11:33:35 #chair martial_ b1airo
11:33:36 Current chairs: b1airo martial_ oneswig
11:34:09 it's about the abstraction/orchestration above and the streamlining of the execution, no matter what orchestration you are using.
11:34:36 qwebirc54714: the drop-in bit is quite appealing. Perhaps the user portability between environments is maximised then
11:35:19 I know it is appealing, but you need to be OCI compliant in order to hook into Kubernetes and Swarm (and the future ones).
11:35:57 Shifter, Singularity and the like (IMHO) cannot be OCI compliant, as they would need to become runC + a bit of containerd
11:36:21 ok guys, have to drop out. sorry for being in a hurry
11:36:34 OK thanks Christian.
11:37:04 Thanks Christian
11:37:10 Thanks!
11:37:17 (too slow on my phone keyboard)
11:37:17 thanks
11:37:30 It's an interesting question on OCI compliance and Kubernetes integration
11:37:48 Is Christine still with us?
11:38:03 I'm not sure they are aiming for that, but I do think the commonality of codebase helps with maintainability
11:38:29 do users typically ask for "I want to run containers" or "I want something with OCI compliance"? (the former, IME)
11:39:27 I suppose they generally want to run containers [that they already run elsewhere on a non-HPC environment]
11:39:27 I think we have the same - the advantages of a familiar development environment. I don't think our users are onto defining their applications through Kubernetes
11:40:00 I think the question is whether, if you use k8s, that community will help you if you're not using something OCI-compliant
11:40:01 oneswig: we have a few brave adventurers putting kubespray on top of OpenStack
11:40:40 daveholland: we've used that; it's actually pretty easy to get something going. There's quite a bit of invention going on around integrating the two
11:40:51 (as in, Kubernetes and OpenStack)
11:41:01 yes
11:41:12 but, it's "just" another bit of learning curve :)
11:41:15 but mostly we like using Magnum, when we can
11:41:59 daveholland: and I'm sure they aren't doing it for the sysadmin skills; this is just in the way of their goal
11:42:55 oneswig: oh exactly, we (sysadmins) are only just on top of OpenStack admin, before we think about providing container orchestration as-a-service
11:43:21 The HPC container landscape is pretty interesting right now, and OpenStack infrastructure sits especially well beneath it all
11:43:49 OK, shall we move on?
11:43:59 any final thoughts?
11:44:40 It might be interesting to invite someone from one of the dedicated HPC container projects along for another point of view
11:45:08 I need to check if both Christine and Christian are going to be at DockerCon
11:45:39 martial_: any comparison with what was discussed when you were at Docker Federal?
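Editor's note: for readers following the Magnum thread above, a hedged sketch of the usual CLI workflow for standing up a Kubernetes cluster; the template name, image, keypair, network and flavor are site-specific placeholders, not values from the discussion.

```bash
# Define a Kubernetes cluster template; image, keypair, external
# network and flavor are placeholders for site-specific values.
openstack coe cluster template create k8s-template \
  --coe kubernetes \
  --image fedora-atomic-latest \
  --keypair mykey \
  --external-network public \
  --flavor m1.medium \
  --network-driver flannel

# Instantiate a small cluster from the template.
openstack coe cluster create k8s-cluster \
  --cluster-template k8s-template \
  --master-count 1 \
  --node-count 2
```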
11:47:49 OK let's move on
11:48:17 #topic conferences, AOB
11:48:27 Also I started a Supercomputing 18 BoF with our past OpenStack crowd (that will be at SC) and Christian and Christine
11:48:27 Today was more a technical "how do we do it"
11:48:27 The other one was "why we are doing it and what that means", with a little bit of "how"
11:49:08 Sounds good martial_
11:49:16 I was able to ask a couple more technical questions of Christine after
11:49:29 I'm not sure where I'll be yet (Europe or US)
11:49:46 I'm pretty sure I'll be in Berlin
11:50:09 Is anyone here going to ISC?
11:50:34 https://www.isc-hpc.com/
11:51:15 oneswig: considering it
11:51:25 I put the two Forum session suggestions that came via the SIG - GPUs and Preemptible Instances - into the Forum site (albeit a day late)
11:51:32 (for Vancouver)
11:51:33 Alas I am not planning to go - but I'll be at HPC Knowledge Partnership in Barcelona the week before.
11:51:47 b1airo: thanks! I'd not been keeping on top of that at all.
11:51:53 Glad someone is...
11:51:55 I can add you if you want, Blair
11:52:19 oneswig: have you a link for the Barcelona event?
11:52:29 both sessions are of interest to us, but particularly preemptible
11:52:40 About the Forum, the Blazar team has submitted a topic to discuss requirements for resource reservation across OpenStack
11:53:05 (don't worry, found it)
11:53:11 oneswig: yeah I got invited for the Barcelona thing but can't go, unfortunately. Alas, I would have loved to return
11:53:12 I don't think the programme's up yet, but it's here: http://www.hpckp.org/index.php/annual-meeting/hpckp-18
11:53:48 priteau: excellent - was hoping it would get covered, given the interest in Dublin
11:54:00 It still has to be selected, of course
11:54:12 IIRC at the last summit we merged with another topic
11:54:51 priteau: i.e. not just Nova resources?
11:56:21 b1airo: indeed, we would like to integrate with other projects as well. Reserving floating IPs from Neutron is a use case for NFV.
11:56:37 #link http://forumtopics.openstack.org/cfp/details/111
11:56:57 I think Bob Budden also wanted a mention of the workshop on Container-based systems for Big Data, Distributed and Parallel computing - in Turin in August - https://sites.google.com/view/cbdp18/home
11:57:09 I assume that means he's going!
11:58:28 priteau: if you'll be in Vancouver for this session, you should begin it with the demo of Chameleon's system. It's compelling.
11:58:31 Love a call for participation that starts with "Nowadays..." :-)
11:59:02 +1 to that suggestion
11:59:04 oneswig: unfortunately I cannot attend the Vancouver summit. From Blazar, the NTT guys will be there.
11:59:13 :-(
11:59:49 Ah, too bad.
12:00:02 (although I am sure they'll do a great job)
12:00:19 Looks like we are on the hour!
12:00:20 Will you get that booking system upstream?
12:00:27 ah, well spotted b1airo
12:00:34 final comments!?
12:00:35 oneswig: that's the plan!
12:00:44 +1 to that
12:00:52 #endmeeting
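Editor's note: for context on the Blazar reservation discussion above, a hedged sketch of reserving physical hosts with the Blazar client, roughly the workflow behind Chameleon-style leases; dates, the hypervisor constraint and the lease name are illustrative.

```bash
# Reserve one to two hosts matching a hypervisor property for a
# 24-hour window (all values illustrative).
blazar lease-create \
  --physical-reservation min=1,max=2,hypervisor_properties='[">=", "$memory_mb", "4096"]' \
  --start-date "2018-05-21 12:00" \
  --end-date "2018-05-22 12:00" \
  my-lease
```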