11:00:27 <oneswig> #startmeeting scientific-sig
11:00:28 <openstack> Meeting started Wed Mar 28 11:00:27 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:31 <openstack> The meeting name has been set to 'scientific_sig'
11:00:38 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_March_28th_2018
11:00:49 <oneswig> Good morning good afternoon good evening
11:01:06 <zhipeng> good evening :)
11:01:16 <oneswig> Hey zhipeng, thanks for joining today
11:01:30 <zhipeng> glad to be here
11:01:43 <priteau> Hello everyone
11:01:44 <verdurin> Hello - I can only join for a while because I only remembered about the timing change this morning...
11:01:47 <daveholland> hi
11:02:03 <oneswig> Then let's get started and talk fast :-)
11:02:08 <martial__> good day :)
11:02:10 <oneswig> #topic Cyborg
11:02:18 <oneswig> Hi martial__, good morning
11:02:23 <oneswig> #chair martial__
11:02:24 <openstack> Current chairs: martial__ oneswig
11:02:44 <oneswig> #link Zhipeng's presentation https://docs.google.com/presentation/d/1tERW4CVhyxNdX50AOPZRa44iPEhico8O_vQ0Ou75L80/edit?usp=sharing
11:03:23 <oneswig> zhipeng: Thanks for sharing the presentation with us. I have plenty of questions and I am sure others do too.
11:03:34 <zhipeng> no problem !
11:03:51 <oneswig> Can you start by describing what is missing in OpenStack's support for (say) GPUs?
11:03:54 <zhipeng> have to apologize for lacking a more up to date one
11:04:49 <zhipeng> ok, so we have discussed very early on with Scientific SIG
11:05:07 <zhipeng> and got valuable input as well
11:05:38 <b1airo> I'm still awake it seems
11:05:47 <zhipeng> the initial feedback we got regarding GPU, is that it is difficult to fully balance between GPU and CPU resource
11:05:54 <oneswig> #chair b1airo
11:05:55 <openstack> Current chairs: b1airo martial__ oneswig
11:06:02 <oneswig> Evening Blair
11:06:07 <zhipeng> hey Blair
11:06:10 <b1airo> O/
11:06:14 <martial__> welcome b1airo :)
11:06:44 <zhipeng> so for example, users typically either have to host aggregate the GPU resource in order to fully utilize it
11:06:53 <oneswig> zhipeng: balance, as in manage the scheduling of workloads that require GPUs without under-utilising the GPU hardware?
11:07:05 <zhipeng> oneswig exactly
11:07:06 <zz9pzza> o/
11:07:39 <b1airo> Sounds like that is more a user problem?
11:07:43 <zhipeng> or users have a mixed CPU-GPU setup but workloads that need GPU are not scheduled onto GPU nodes as planned
11:08:00 <zhipeng> that was the input back in Boston Summit :)
11:08:07 <zhipeng> from Jim Golden I believe :)
11:08:46 <oneswig> I think this is a good example that people can relate to.
11:08:49 <zhipeng> well I think if we look at the latest release, with the Placement and all, this issue is not that severe
11:09:03 <b1airo> I'm not sure if I could be talking at cross purposes here as I came in late...
11:09:11 <oneswig> How does Cyborg help?
11:09:38 <zhipeng> so Cyborg, developed as a general mgmt framework dedicated to the acceleration resources
11:09:48 <zhipeng> like FPGA, GPU, NVMe SSD,...
11:10:21 <zhipeng> Will help treating these types of resources as first class citizens when Nova schedules
11:10:31 <zhipeng> so we have a more recent use case
11:10:37 <zhipeng> from another perspective
11:10:42 <b1airo> But this sounds like an inherent problem with direct attached devices and passthrough - balancing usage of must-have system resources like CPU & RAM versus accelerator resources has to be done at hardware purchase time - i.e. cannot be flexible or optimal for all use-cases/workloads
11:11:49 <zhipeng> b1airo well I think getting a reasonable utilization rate is achievable
11:12:20 <zhipeng> ok, back to the new use case
11:12:26 <zhipeng> provided by kosamara from CERN
11:12:50 <zhipeng> is that for security reasons, users might want to clean up the GPU after usage
11:13:28 <zhipeng> this could also be something Cyborg could help with, via NVIDIA driver
11:13:35 <oneswig> zhipeng: in a virtualised context? Cleaning after pass-through?
11:14:23 <zhipeng> nuh it is for hpc
11:14:28 <zhipeng> no virtualization
11:14:30 <kosamara> Yes, cleaning the GPU memory after passthrough.
11:15:02 <zhipeng> kosamara tho not virtualized GPU right ?
11:15:11 <kosamara> No, only in passthrough config
11:15:19 <b1airo> That's an interesting one - what is the attack / information leak vector there kosamara ?
11:15:50 <kosamara> Events: user 1 uses a gpu, relinquishes it, user 2 claims it
11:16:08 <belmoreira> I think firmware is more important than memory
11:16:18 <kosamara> Then, user 2 can access user 1's data, which is not erased automatically
11:16:21 <b1airo> And how will you clean? Load a custom CUDA kernel that zeros all memory (without using unified memory)?
11:17:13 <kosamara> That's what I think. The problem is that the host can't do that, because it doesn't have the nvidia driver kernel modules loaded.
11:17:32 <kosamara> To allow gpu passthrough, the device must be claimed by vfio.
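The sequence kosamara describes (user 1 uses a GPU, relinquishes it, user 2 claims it and reads what was left behind) can be pictured with a toy model. This is only a minimal sketch in plain Python: FakeGpu, scrub and handover are hypothetical names invented for illustration, no real GPU, driver or Cyborg API is involved, and the scrub step simply stands in for whatever cleaning mechanism (a zeroing kernel, a cleaning image) ends up being used.

```python
class FakeGpu:
    """Toy stand-in for a passthrough GPU whose memory persists across tenants."""

    def __init__(self, size: int) -> None:
        # Device memory is allocated once and never cleared implicitly.
        self.memory = bytearray(size)

    def write(self, offset: int, data: bytes) -> None:
        self.memory[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.memory[offset:offset + length])

    def scrub(self) -> None:
        # Hypothetical cleaning step: overwrite all device memory with zeros.
        self.memory[:] = bytes(len(self.memory))


def handover(gpu: FakeGpu, secret: bytes, scrub_between_tenants: bool) -> bytes:
    """Return what user 2 can read after user 1 has released the device."""
    gpu.write(0, secret)              # user 1 leaves data in device memory
    if scrub_between_tenants:
        gpu.scrub()                   # cleaning happens before the next claim
    return gpu.read(0, len(secret))   # user 2 reads the same region


if __name__ == "__main__":
    print(handover(FakeGpu(64), b"user1-key", scrub_between_tenants=False))
    print(handover(FakeGpu(64), b"user1-key", scrub_between_tenants=True))
```

Without the scrub the second tenant reads the first tenant's bytes back unchanged; with it, only zeros remain. The open question in the discussion is which component is able to run that step when the host has no NVIDIA driver loaded.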
11:17:38 <b1airo> belmoreira: please let me know if you get a different response from elsewhere inside NVIDIA than I did
11:18:02 <belmoreira> b1airo sure
11:18:38 <martial__> (this reminds me there were quite a few announcements during GTC yesterday)
11:18:48 <zhipeng> well on a sidenote, we are really looking forward to working with the NVIDIA team to have a driver ready for Rocky :)
11:18:57 <b1airo> kosamara: would this require a rowhammer alike attack?
11:19:03 <zz9pzza> You could have an image that is run between jobs whose task is to do the clean up
11:19:20 <b1airo> zhipeng: you can hope! :-)
11:19:25 <oneswig> kosamara: Are you simply able to write to the PCI memory regions of the VF without a driver loaded (or is that naive)?
11:19:32 <zhipeng> b1airo lol
11:19:45 <oneswig> Not VF, GPU, ahem
11:20:26 <kosamara> b1airo no. If user 1 in the above example leaves his data in memory, then user 2 can simply read the entire gpu memory and find it.
11:20:29 <zz9pzza> Having a cleaning image per thing is more generic.
11:20:51 <b1airo> Mind you, they are pretty busy making boxes that melt racks... (for those who read about DGX-2 today)
11:21:25 <belmoreira> zhipeng: not sure if I completely understood the goal of cyborg. Can you explain how cyborg differs from the work done in nova to support vGPUs?
11:21:43 <kosamara> oneswig: I'm currently researching that possibility. I don't have low-level pci knowledge yet, so it will take me some time. Perhaps someone else can provide a better answer?
11:22:01 <b1airo> kosamara: is that a verified leak, i.e. between guest instance boots and driver initialisations - I didn't know about this :-/
11:22:43 <zhipeng> belmoreira great question, so on a higher level, cyborg aims to provide a general framework. Per GPU, we are actually discussing with the vGPU folks in Nova on working out a collaboration plan
11:22:57 <kosamara> b1airo: yes, I can link to this paper: https://www.semanticscholar.org/paper/Confidentiality-Issues-on-a-GPU-in-a-Virtualized-E-Maurice-Neumann/693a8b56a9e961052702ff088131eb553e88d9ae
11:23:14 <oneswig> To follow on belmoreira's question, this issue with GPU cleaning between uses, I guess it can generalise to other kinds of acceleration. But does that require a service?
11:23:19 <priteau> kosamara: so it's quite similar to Ironic node cleaning, but with a cleaning VM loaded after each user instance is terminated?
11:23:21 <b1airo> Thanks, will have a look
11:23:24 <zhipeng> the current thinking is that cyborg could provide a more nuanced representation of vGPU resources, for example in a tree structure
11:23:42 <zhipeng> which was originally planned but later ditched in the nova spec, if I remember correctly
11:24:53 <zhipeng> One thing worth mentioning is that Cyborg utilizes and interacts with Placement for resource information aggregation
11:24:56 <priteau> kosamara: which component is responsible for launching this cleaning VM? Is there a cyborg-compute agent?
11:25:00 <belmoreira> oneswig: that's a good point. If nova supports PCI passthrough maybe it should be handled there
11:25:28 <kosamara> priteau: I don't know how ironic node cleaning works. But yes, a cleaning VM loaded after each user. Unless what oneswig suggests above can actually work, through the vfio driver, which is already on the host
11:25:34 <b1airo> oneswig: kosamara: re. direct PCI config space writes - yes I believe you can. Vfio can intercept, but it doesn't currently protect everything that should be protected with GPU BAR0
11:26:23 <belmoreira> zhipeng: but in the end it is the nova scheduler that needs to be aware of these available resources in placement
11:26:41 <zhipeng> yes exactly
11:26:54 <b1airo> I can imagine cyborg coming into its own with a solid network/fabric based accelerator attachment model
11:26:59 <zhipeng> cyborg-conductor will sync with Placement about all the acceleration resources
11:27:11 <zhipeng> b1airo that is definitely something we are looking at
11:27:28 <b1airo> Things like PCIe fabrics
11:27:37 <oneswig> One issue with this approach is that programmed-IO writes to how-many-GB of GPU RAM might be slower than booting a vm to get the GPU to do it itself.
11:27:38 <zhipeng> that model better suits coz the life cycle is independent from the compute
11:27:47 <martial__> zhipeng: what is your timeline for features in cyborg?
11:27:51 <b1airo> Or perhaps NVMeoF
11:28:01 <zhipeng> martial__ which features ?
11:28:24 <oneswig> b1airo: you thinking of RCUDA here?
11:29:13 <zhipeng> oneswig the Huawei Cloud will actually have a RCUDA enabled remote GPU for use this year
11:29:14 <b1airo> Yes, RCUDA is a good example for GPUs and would be cool to have a prototype
11:29:32 <zhipeng> the service end is implemented based upon cyborg
11:29:53 <oneswig> zhipeng: sounds good, better get that cleaning working :-)
11:30:06 <zhipeng> cleaning is more fun :)
11:30:06 <b1airo> Will we hear about that in Berlin zhipeng ? :-)
11:30:20 <martial__> zhipeng: given the abstraction level per hardware, are you prioritizing some components/hardware first or is the model/solution thought as a generic enabler for all hardware?
11:30:29 <zhipeng> b1airo will endeavor to do so :P
11:30:58 <zhipeng> martial__ starting Rocky we will try to establish something like a standardized metadata description
11:31:05 <zhipeng> across FPGA, GPU and other things
11:31:17 <zhipeng> device tree for ARM for example
11:31:41 <zhipeng> we want to make Cyborg talk as general as possible to the accelerators
11:32:14 <martial__> sounds very good
11:32:34 <oneswig> zhipeng: can you talk more on the interaction with nova/placement? Does Cyborg do something with custom resource classes?
11:33:08 <zhipeng> yes oneswig, cyborg implements custom trait and resource class for FPGA resources at the moment
11:33:09 <b1airo> zhipeng: I think the SLURM scheduler already has a similar tree like resource model, you should look into that for inspiration and/or blatant copying
11:33:21 <zhipeng> and will do the same for other types of accelerators as well
11:33:32 <zhipeng> b1airo any pointers ?
11:33:59 <zhipeng> would love to blatantly copy XD
11:34:42 <b1airo> It's called GRES
11:35:08 <oneswig> zhipeng: what extra does Cyborg add to the placement service's handling of custom resource classes for scheduling with accelerators?
11:35:53 <zhipeng> oneswig actually nothing special (beauty of the placement design)
11:36:13 <zhipeng> as long as we define the schema correctly, it could work :)
11:36:30 <zhipeng> our Intel dev team did a PoC to verify that, just a couple of days ago
11:36:48 <oneswig> Ah, OK, so the focus of development effort is for supporting the hardware end, more than the scheduling end?
11:37:06 <oneswig> cleaning and so on?
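The idea of describing accelerators with custom resource classes arranged in a tree can be sketched roughly as follows. This is an illustrative model only, not Cyborg or Placement code: Provider, custom_resource_class and the example names (compute-1, fpga-0, "fpga region") are hypothetical. The one convention taken as given is that custom resource class names in Placement carry a CUSTOM_ prefix and use upper-case letters, digits and underscores.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


def custom_resource_class(name: str) -> str:
    """Normalise a free-form name into the CUSTOM_* naming style."""
    token = re.sub(r"[^A-Z0-9_]", "_", name.upper())
    return "CUSTOM_" + token


@dataclass
class Provider:
    """Hypothetical node in a resource tree (host -> device -> ...)."""

    name: str
    inventory: Dict[str, int] = field(default_factory=dict)
    children: List["Provider"] = field(default_factory=list)

    def total(self, resource_class: str) -> int:
        # Sum this node's own inventory and everything below it.
        own = self.inventory.get(resource_class, 0)
        return own + sum(child.total(resource_class) for child in self.children)


if __name__ == "__main__":
    fpga_rc = custom_resource_class("fpga region")
    node = Provider(
        "compute-1",
        children=[
            Provider("fpga-0", inventory={fpga_rc: 2}),
            Provider("fpga-1", inventory={fpga_rc: 4}),
        ],
    )
    print(fpga_rc, node.total(fpga_rc))
```

Keeping inventory on the child nodes rather than on the host is what allows a scheduler to tell "six regions somewhere on this host" apart from "four regions on one particular device", which appears to be the kind of nuance the tree structure is meant to capture.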
11:37:11 <zhipeng> yes the gaps for example for FPGA, is how to interact with Glance on image mgmt
11:37:18 <zhipeng> and how to attach
11:37:38 <zhipeng> so one of the outcomes of the discussion we had with the nova team in Dublin
11:37:49 <zhipeng> is that they suggest we create an os-acc lib
11:37:57 <zhipeng> similar to os-vif and os-brick, to handle that
11:38:36 <zhipeng> For GPU, I guess it would be the cleaning and attach/detach as well
11:38:45 <oneswig> Do you have support for loading user netlists onto FPGAs and passing-through as a prepared device?
11:40:04 <zhipeng> oneswig I have to double check on that with the driver team :)
11:40:20 <b1airo> Oh right, now I realise at the start of the meeting you were probably talking about dynamic attachment of GPUs etc to instances (as opposed to fixed to instance like we have now)
11:40:42 <oneswig> could be good, but very scary attack potential
11:40:58 <zhipeng> b1airo yep :)
11:41:35 <oneswig> Ah, I have to drop off - I have a visitor - chairs can you take over the meeting bot?
11:42:15 <b1airo> Copy
11:42:47 <martial__> sure
11:43:17 <b1airo> zhipeng: have you looked at potential for cyborg to orchestrate vGPU?
11:44:13 <zhipeng> yes definitely, we actually invite Jianghua to join the discussion on our weekly meeting about 2 hours and 15mins later :P
11:44:19 <zhipeng> on #openstack-cyborg
11:45:12 <martial__> zhipeng: then can you also describe how to best use Cyborg; ie how to deploy and make use of it efficiently ?
11:45:54 <b1airo> I am not sure what actually ended up being implemented for the new vGPU support in Nova, but I'm guessing since there are still no NVIDIA Linux/KVM drivers available for host side yet that there must be gaps
11:46:21 <martial__> (or plan for use, ie best practice with cyborg)
11:47:14 <zhipeng> martial__ well as you know we wrote the project from ground up, so it is still very buggy, but devstack is the best way at the moment to try it out
11:47:55 <zhipeng> b1airo i think I could confirm that with Jianghua later
11:48:24 <b1airo> Thanks, I will follow up to check the logs
11:48:30 <martial__> sounds good, thank you
11:49:00 <b1airo> zhipeng: did you have anything else to report on Cyborg?
11:49:03 <martial__> zhipeng: anything else we need to know about cyborg?
11:49:20 <b1airo> Or for that matter does anyone else have further questions?
11:49:28 <zhipeng> i think we've covered all the important bits
11:49:29 <b1airo> (jinx martial__ )
11:49:43 <martial__> (indeed b1airo :) )
11:49:55 <b1airo> Great!
11:50:02 <martial__> zhipeng: thank you very much for taking the time to come talk to us
11:50:02 <zhipeng> for our rocky priorities, you could check them out in the mailing list archive
11:50:06 <zhipeng> no problem, cyborg could not be born without the great early support from SWG, and your inputs are always welcomed
11:50:16 <zhipeng> :)
11:50:47 <martial__> :)
11:50:49 <b1airo> martial__: do you have the Forum brainstorm etherpad link handy?
11:51:12 <martial__> #link Forum brainstorming (https://etherpad.openstack.org/p/YVR18-scientific-sig-brainstorming
11:51:28 <martial__> #link Forum brainstorming https://etherpad.openstack.org/p/YVR18-scientific-sig-brainstorming
11:52:07 <martial__> so far still only Blair's content
11:52:28 <martial__> we will have more added as we get closer and get confirmation of who will be able to join
11:53:18 <b1airo> kosamara: belmoreira - please throw your ideas in there regarding a session on GPUs
11:53:22 <martial__> but FTI, fellow Scientific SIG participants, the Etherpad is for our collection of ideas for the Forum
11:53:43 <martial__> (FTI -> FYI)
11:54:58 <martial__> And as I see no comments yet
11:55:11 <martial__> moving on to the next topic
11:55:27 <martial__> #topic AOB
11:55:50 <martial__> Well GTC yesterday gave us some things to look ino
11:55:52 <martial__> into
11:56:23 <martial__> "NVIDIA TensorRT™ is a high-performance deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications. With TensorRT, you can optimize neural network models, calibrate for lower precision with high accuracy, and finally deploy the models to hyperscale data centers, embedded, or automotive product platforms. TensorRT-based applications on
11:56:23 <martial__> GPUs perform up to 100x faster than CPU during inference for models trained in all major frameworks." https://developer.nvidia.com/tensorrt
11:56:49 <martial__> and "NVLink is a great advance to enable eight GPUs in a single server, and accelerate performance beyond PCIe. [...] NVIDIA NVSwitch is the first on-node switch architecture to support 16 fully-connected GPUs in a single server node and drive simultaneous communication between all eight GPU pairs" https://www.nvidia.com/en-us/data-center/nvlink/
11:57:11 <martial__> For people interested, the full 2h30 video is at https://www.ustream.tv/gpu-technology-conference
11:57:31 <martial__> and the model 2 ... b1airo ? :)
11:57:53 <b1airo> Yeah, 10kW beast
11:59:05 <martial__> Link for people interested https://www.nvidia.com/en-us/data-center/dgx-2/
11:59:19 <martial__> and with that, we are reaching the end of the hour
11:59:39 <b1airo> I want HGX-2 (for the little people)
12:00:17 <martial__> thanks again to zhipeng for spending some quality time talking to us about Cyborg (reminder on presentation https://docs.google.com/presentation/d/1tERW4CVhyxNdX50AOPZRa44iPEhico8O_vQ0Ou75L80/edit?usp=sharing )
12:00:17 <b1airo> Time's up!
12:00:27 <b1airo> Thanks!!
12:01:04 <martial__> thanks everybody for joining us for another fun session :)
12:01:11 <martial__> #endmeeting