21:02:21 #startmeeting scientific-wg
21:02:21 Meeting started Tue Dec 13 21:02:21 2016 UTC and is due to finish in 60 minutes. The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:25 The meeting name has been set to 'scientific_wg'
21:02:28 christx2 blairo: sorry for overrunning
21:02:30 cool, listening - christoph - london, uk
21:02:34 no probs ttx
21:02:46 hi christx2 !
21:02:58 #chair martial
21:02:59 Current chairs: b1airo martial
21:03:02 hello all o/
21:03:11 * ildikov is lurking :)
21:03:14 hi there Time
21:03:22 *Tim (sorry!)
21:03:32 lol, np
21:04:02 agenda is at #link https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_December_13th_2016
21:04:04 Hi all, I am Lizhong, working with Martial from NIST
21:04:18 greetings lizhong, welcome
21:04:34 GPUs and Virtualisation:
21:04:34 - Getting set up for virtualising GPU hardware
21:04:34 - Performance optimisation and further topics
21:04:34 Monitoring round-up:
21:04:34 - Review gathered notes on the wiki at https://wiki.openstack.org/wiki/ScientificWGMonitoringAndTelemetry
21:04:41 evening / morning b1airo et al!
21:04:51 oneswig is an apology for today
21:04:58 howdy powerd
21:05:23 ok let's get into it then
21:05:36 b1airo: asked Lizhong to join as he is the coder of the Dmoni software, so he will be able to comment on/describe it if needed
21:05:36 #topic GPUs and Virtualisation
21:06:04 ok so i've made a start at the GPUs investigation
21:06:20 thanks martial - i take it lizhong has already seen the etherpad from the last couple of weeks?
21:06:33 we put a K80 and a K2 into a system in our lab and have passthrough working now
21:06:52 just put a quick blog post together this afternoon - i'll update the wg wiki with the address
21:06:52 powerd, did you end up starting an etherpad or anything yet?
21:06:56 b1airo: yes, he even contributed
21:07:10 hello
21:07:33 i have just dug up some notes that we had and also some that Joe Topjian from Cybera passed on - i think we could make a pretty good start on some proper docs with these
21:07:44 g'day rbudden o/
21:08:30 powerd, cool - i will take an action to start an etherpad and share
21:08:41 not started an etherpad
21:08:47 #action b1airo to start GPUs on OpenStack etherpad
21:09:21 (that should probably be "GPUs in OpenStack" - oh well)
21:09:36 so powerd, did you have any problems with your setup?
21:10:52 we've been doing the GPU passthrough thing for a fairly long time in prod now - ~2.5 years
21:11:56 from what I remember of our chat, the passthrough works great but there is no way currently to virtualize a PCI switch?
21:12:11 anyone else doing GPUs with OpenStack?
21:12:21 can I ask a naive question: when you use a GPU within an OpenStack instance, I assume you mean it's seen as a HW accelerator like on the compute side, or is this something about the instance's graphics?
21:12:44 I'm at the "attempting to wrestle money free to purchase some GPUs" stage
21:12:59 jmlowe, well not strictly true - you can tell qemu to emulate a particular pci/e topology, but i haven't played with it much yet - lots of manual XML-ing required
21:13:09 I'm aiming for a 1U with 4x P100s
21:13:28 jpr, good question - we should aim to have docs answering this sort of thing up front
21:13:29 The starting point of some instructions has been added to: https://wiki.openstack.org/wiki/ScientificWGGPUs
21:13:48 b1airo: ok, might be possible but certainly not baked in
21:14:07 No - no real problems setting up. But only starting to look at the perf now.
21:14:29 We'll have P100 in the lab by the end of the week so will be giving that a spin too
21:14:38 @powerd: thanks. that's clear
21:14:41 jmlowe, correct - and also, i'm not even sure whether the emulated pci topology is the problem for e.g. CUDA P2P
21:14:51 it could actually be PCIe ACS
21:15:13 In our team, we have someone who did passthrough with OpenStack. The GPU appears in the VM, however it fails when you start a GPU program.
21:15:39 jpr, going back to your question - we use GPU passthrough both for GPU compute and to accelerate graphics, e.g. with VirtualGL
21:17:21 lizhong, we have seen those sorts of problems plenty - the two primary issues that cause that appear to be either hypervisor support (i.e. not new enough qemu/kvm) or a driver in the host already bound to the device
21:17:42 do you know what hypervisor version you were using?
21:19:05 @b1airo: ah, so do you use that under the hood of the instance to make it think it has GPU hardware, or within the context of the instance via the VirtualGL support for graphics at the user level?
21:19:06 I don't know the exact version, but I'll check with my colleague and let you know.
21:19:11 powerd, what hypervisor platform are you using?
21:19:14 thanks
21:19:25 fyi.. there is a talk from Barcelona about GPUs on OpenStack.
21:20:02 #link Barcelona presentation about GPUs: https://www.youtube.com/watch?v=j9DmE_oUo5I
21:20:43 FWIW, we have K80s and P100s currently that I may try and integrate in OpenStack at some point, but no immediate plans since they are under heavy use through interactive/batch/reservation jobs. If there's some interesting research/tests to be done I could snag some time on the debug node
21:20:57 jpr, from inside the instance. particularly for HPC desktops - we have a cluster focusing on characterisation and visualisation, so accelerated rendering is important for many of those use-cases, and with a virtual desktop on the cluster all data on the PFS is local
21:22:03 rbudden, cool - we're about to get a rack full of P100 (PCIe) nodes; so far I haven't had any confirmation of anyone using them in pass-through, so crossing fingers at this point!
21:22:14 lizhong/b1airo - we saw that hang on workload with previous versions of OpenStack/KVM ourselves also. Using CentOS 7 with Liberty, which works fine. Will be updating to Mitaka soon too and will report back.
21:23:01 powerd, i assume you are getting the virt stack from another repo though?
21:23:22 the RHEV equivalent for centos maybe?
21:24:02 rbudden: b1airo: I'd love to hear how it goes with the P100s
21:24:17 leong, thanks for reminding me about that talk - if i recall correctly that was more about Intel's GPU virtualisation specifically?
21:24:39 b1airo: yup
21:25:00 @b1airo: nice! thanks.
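
For anyone following along with the ScientificWGGPUs wiki page mentioned above, the basic passthrough wiring discussed here looks roughly like the sketch below on a Liberty/Mitaka-era deployment. This is a minimal illustration rather than the config of any deployment in the meeting: the device IDs shown are for a Tesla K80 (check your own card with `lspci -nn`) and the flavor name is made up.

    # /etc/nova/nova.conf on the GPU hypervisor (pre-Newton option names):
    # whitelist the GPU for passthrough and give it an alias. Also boot the host
    # with IOMMU enabled (e.g. intel_iommu=on) and make sure no host driver
    # (e.g. nouveau) is bound to the card, per the discussion above.
    [DEFAULT]
    pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "102d"}
    pci_alias = {"vendor_id": "10de", "product_id": "102d", "name": "K80"}

    # On the controller, add PciPassthroughFilter to scheduler_default_filters,
    # then create a flavor that requests one K80 per instance:
    openstack flavor create --ram 65536 --vcpus 8 --disk 40 g1.k80
    openstack flavor set g1.k80 --property "pci_passthrough:alias"="K80:1"
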
21:25:08 virt stack is either EPEL or standard CentOS core - i'd need to go double check (or RDO)
21:26:01 leong, what is the target use-case for that? VM graphics acceleration?
21:28:03 graphics virtualization...
21:28:22 i have seen some work on the libvirt-dev list to add support for virtualisation-able (i.e. device multiplexing) NVIDIA gear too
21:28:34 can be used for media transcoding or hpc use cases as well...
21:29:23 DNNs? (sorry, just playing buzz-word bingo) ;-)
21:30:11 virtualizing the GPUs, this is new to me, I also need the buzzwords to google
21:30:17 so one problem that we have not yet solved with using host-device-based accelerators like GPUs in OpenStack is that it turns nodes into unicorns
21:31:30 e.g. i've got a bunch of hypervisors with 2x K1 or 2x K2, which equates to 4 or 8 usable (passthrough-able) GPUs and thus 4-8 GPU-enabled cloud instances that can run on that node
21:31:50 there is some related work in the dev community
21:31:50 so if we can get the GPU passed through we can share it :) and then nvidia themselves created nvidia-docker to split the GPU into multiple sub-systems https://github.com/NVIDIA/nvidia-docker
21:32:19 this one is about adding support to XenAPI
21:32:35 #link XenAPI/GPU: https://review.openstack.org/#/c/280099/
21:33:00 that's all fine until we realise that those GPU instances are often only lightly utilised and we have a bunch of excess compute+memory capacity on the node
21:33:47 the problem is, i have not yet figured out a way with nova-scheduler to always have room for GPU instances but also let regular CPU-only instances onto the node
21:36:29 i'm guessing there are not that many people who have felt that problem yet though
21:36:48 or just don't care because they only have a few GPU nodes anyway
21:37:01 another thing I plan to look into is bitfusion for virtualising the GPUs - could be useful for this VDI-like requirement
21:37:20 bitfusion? don't think i've seen that
21:37:22 yea, the 2nd one for us - only a few GPU nodes so it's not a huge issue yet
21:37:24 I did some benchmarking for GPUs with baremetal, KVM and Docker. Baremetal and Docker get much better performance than KVM. Having baremetal + GPU on OpenStack would be really nice.
21:37:32 b1airo: you want more CPU overcommit on the GPU nodes, but only when there are GPU instances running?
21:37:48 b1airo: agreed. i believe we tend to ignore the CPUs on the GPU nodes
21:38:20 essentially we schedule no CPU jobs during GPU jobs. the GPU jobs get a portion of the CPU based on the number of GPUs assigned.
21:38:35 www.bitfusion.io - not open source but we'll give it a spin anyway.
21:39:15 looks suspiciously magic
21:39:52 lizhong, that's interesting about your benchmarking - the papers i have read all indicate KVM with passthrough is at, or almost at, BM performance
21:39:53 lizhong: which version of qemu?
21:40:52 of course, you need to have pinning, numa topology, etc., so perhaps that is the difference you saw? we have seen very bad perf degradation for GPU-heavy workloads on KVM without that tuning
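
As a rough illustration of the tuning b1airo refers to: on a KVM/libvirt cloud of that era, CPU pinning and guest NUMA topology (plus optional hugepage backing) are usually expressed as standard nova flavor extra_specs, roughly as below. The flavor name carries over from the earlier sketch and is purely illustrative; the scheduler needs NUMATopologyFilter enabled for the NUMA properties to be honoured.

    # Illustrative flavor properties for CPU pinning, a single guest NUMA node,
    # and hugepage-backed guest memory (requires NUMATopologyFilter):
    openstack flavor set g1.k80 \
      --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=1 \
      --property hw:mem_page_size=large
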
21:42:23 b1airo, actually that's possible, if it's a tuned KVM
21:42:38 also, you can have baremetal GPUs on OpenStack, you just need an Ironic cloud with GPUs in the nodes - i'm not sure what the hardware discovery and scheduling support is like though
21:43:26 powerd, what perf tests are you planning?
21:44:01 b1airo: true, we have our GPUs controlled through Ironic, although we don't do anything fancy on the scheduling side since SLURM handles that
21:44:19 we do have a flavor for the GPUs so we can boot independent images if necessary
21:45:21 So we are planning on measuring some of the host <-> GPU transfer speeds/latencies, Linpack and a couple of others. anything to add?
21:45:48 b1airo: for baremetal you can add extra_specs to your flavor and use that for scheduling on a specific node type
21:46:27 we'll add more GPUs and try node to node / GPUDirect (on a single host, then across the fabric too)
21:46:56 priteau, ah cool
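
For reference, the extra_specs approach priteau describes typically looks something like the sketch below on an Ironic deployment of that era: tag the GPU-carrying nodes with a capability and let ComputeCapabilitiesFilter match a baremetal flavor against it. The 'gpu' capability key, node UUID variable and flavor sizing here are made-up examples, not taken from any of the deployments mentioned.

    # Tag GPU-carrying Ironic nodes with a capability, then create a baremetal
    # flavor whose extra_specs only match those nodes (ComputeCapabilitiesFilter).
    ironic node-update $NODE_UUID add properties/capabilities='gpu:true'
    openstack flavor create --ram 131072 --vcpus 24 --disk 400 bm.gpu
    openstack flavor set bm.gpu --property capabilities:gpu='true'
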
21:47:25 b1airo: 13 minutes to go
21:48:10 and to answer your earlier question about mixing GPU-enabled and CPU-only instances on the same hypervisor: i just always want to be able to launch n GPU instances (where n is the # of GPUs in the node), but otherwise i'm happy to fill available capacity
21:48:42 (this is for general purpose cloud usage, not specifically HPC, so lots of idle dev instances and light desktop acceleration stuff)
21:49:08 powerd, excellent - that's the problem i'm currently stuck on!
21:49:46 what is the wiki page where people can add their notes on the GPU work?
21:49:55 ok, that was useful (for me anyway!). martial, lizhong, did you want to cover off anything about Dmoni?
21:49:56 good - guess i'll be stuck there too soon enough ;)
21:50:06 martial, i will draft an etherpad and share so that everyone can edit
21:50:42 I see a lot of what needs to be known is already in https://wiki.openstack.org/wiki/ScientificWGMonitoringAndTelemetry
21:50:51 #topic Monitoring round-up
21:50:56 lizhong entered this content on the etherpad at the time
21:50:59 b1airo: it sounds a bit tricky. There may be a combination of nova-scheduler filters and their config to do it, but I don't know it!
21:51:20 we have started the discussion on adding the code to github
21:51:35 so that people can see how it works
21:51:54 we have also discussed the release of our prototype BDaaS VM that relies on Dmoni
21:52:06 priteau, i had some ideas i discussed with Joe Topjian (Cybera) a long while back - will add them to the etherpad
21:52:16 would be good to get your thoughts
21:52:35 lizhong is the main engineer on this tool, so I will let him comment
21:52:41 martial, sounds good
21:52:56 BDaaS = big data ?
21:53:43 b1airo: yes
21:54:25 we started the work to provide a means to run a VM with built-in BD tools within our stack on a sequestered set of data
21:54:30 Dmoni is a cluster monitoring tool targeting specific applications running in a cluster
21:54:40 like Hadoop, Spark, etc.
21:55:09 and we wanted to benchmark different algorithms under different paradigms at the "step" level
21:55:10 from now on I will be pronouncing BDaaS as badass
21:57:02 jmlowe, i have a sticker on my laptop that says BADaaS, but it means something else - i don't even know what, it was just fun o_0
21:57:11 so Dmoni was created to give us additional hooks into the benchmarking layer
21:57:22 Dmoni differs from Ganglia and other cluster monitoring tools, which collect all system info
21:57:27 martial, the BDaaS thing might be interesting in the context of the scientific datasets activity area
21:57:42 b1airo: happy to discuss it obviously
21:59:33 sorry - just been distracted by a security incident on my cloud
22:00:06 someone spamming from it, the joys of having a research user-base
22:00:16 b1airo: uncool
22:00:19 time to wrap it up
22:00:26 yep, 5pm
22:00:28 glad to see it happens to everybody
22:00:31 thanks all!!
22:00:32 (well, here :) )
22:00:39 #endmeeting