13:59:21 #startmeeting openstack-cyborg
13:59:22 Meeting started Wed Apr 25 13:59:21 2018 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:59:25 The meeting name has been set to 'openstack_cyborg'
13:59:34 #topic Roll Call
13:59:41 #info Howard
13:59:45 #info Mike
13:59:48 #info Sundar
14:00:16 Hi Howard and Mike
14:00:27 Hello Sundar, Howard et al.
14:00:37 #info Helloway
14:00:40 Hello everyone :)
14:00:49 #info Li Liu
14:01:10 Hi guys
14:03:15 #info shaohe
14:03:27 hi all
14:03:48 hi
14:04:45 let's start then
14:05:01 #topic KubeCon EU ResMgmt WG preparation
14:05:09 #link https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU/edit?usp=sharing
14:05:29 Hi all
14:05:36 so I think I mentioned that for this year's planning I want to be able to align what we have done here
14:05:41 with the k8s community
14:06:09 KubeCon EU is around the corner next week, and it would be a great place to start participating
14:06:27 #info edleafe
14:06:28 Sundar could you help share some status about the resource mgmt WG in k8s?
14:06:36 Sure
14:06:49 that's great
14:07:05 We started participating last year, with a document describing the FPGA structure and use cases
14:07:42 The main thing to note is that the FPGA structural model -- with regions, accelerators, local memory, etc. -- is the same independent of the orchestration framework -- OpenStack, K8s, etc.
14:08:21 Also, the use cases defined in the Cyborg/Nova spec stay the same -- FPGA as a Service, Accelerated Function as a Service, etc.
14:08:25 The main difference is in the set of mechanisms available
14:09:05 In OpenStack, we have the notion of nested Resource Providers (nRPs), which provides a natural tree structure that matches many device topologies
14:09:44 The data models and resource handling in K8s are still evolving
14:10:46 What we have now is the device plugin mechanism: there is a standard API by which the kubelet can invoke a plugin for a category of devices
14:11:41 The plugin advertises a resource name, e.g. intel.com/fpga-a10, and lists the devices corresponding to that. There is also a provision to update that list over time, and to report the health of each device
14:12:09 Based on this information, when a pod spec asks for a resource, the standard K8s scheduler picks a node and informs the kubelet on that node
14:12:35 The kubelet then invokes another API on the device plugin to allocate a device of the requested type and prepare it
14:13:22 After that, the kubelet invokes a container runtime (e.g. Docker) through the CRI with an OCI runtime spec
14:14:05 This basic mechanism does not include the nested structure of FPGAs, and we have been discussing how to fit that in
14:14:35 However, there are many options: we can use Custom Resource Definitions (CRDs) https://kubernetes.io/docs/concepts/api-extension/custom-resources/#customresourcedefinitions
14:14:36 and vGPU as well I suppose?
14:15:11 Howard, yes, I think vGPUs, esp. of different types, will also require further consideration
14:15:52 CRDs are essentially custom resource classes: we can instantiate resources for a CRD.
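As a rough illustration of the device plugin flow Sundar describes above, the sketch below shows a pod manifest requesting units of the extended resource name the plugin advertises (intel.com/fpga-a10, from the discussion). It is expressed as a Python dict; the pod name and image are hypothetical placeholders, not anything from the meeting.

```python
# Minimal sketch of a pod spec requesting a device-plugin resource, as a Python dict.
# The resource name intel.com/fpga-a10 comes from the discussion above; the pod and
# image names are hypothetical placeholders.
import json

fpga_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "fpga-demo"},
    "spec": {
        "containers": [{
            "name": "workload",
            "image": "example/fpga-workload:latest",   # hypothetical image
            "resources": {
                # The device plugin advertises this extended resource name;
                # the scheduler matches only on the name and the requested count.
                "limits": {"intel.com/fpga-a10": 1},
            },
        }],
    },
}

print(json.dumps(fpga_pod, indent=2))
```

If desired, a dict like this could be submitted with the official kubernetes Python client via CoreV1Api().create_namespaced_pod(namespace, body), but the point here is only the shape of the resource request.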
14:16:33 There are also two ongoing proposals for including resource classes:
14:16:51 The first one is: https://docs.google.com/document/d/1qKiIVs9AMh2Ua5thhtvWqOqW0MSle_RV3lfriO1Aj6U/edit#
14:17:28 An alternative proposal for resource classes is at: https://docs.google.com/document/d/1666PPUs4Lz56TqKygcy6mXkNazde-vwA7q4e5H92sUc/edit#
14:18:04 Their stated goals and non-goals are not exactly the same.
14:18:05 We don't have access to those 2 google docs..
14:18:38 Li_Liu, these are supposed to be public docs -- please ask for access
14:18:50 ok, just did
14:19:26 Sundar there is a PR on Resource API
14:19:45 does this correlate to jiaying's or vish's?
14:19:45 IMHO, a lot of the discussion comes from a GPU background. For FPGAs, we are trying to get alignment within Intel first
14:20:37 Zhipeng, Jiaying's proposal came first -- but I need to look at the PR before confirming
14:21:34 okey :)
14:21:54 So, in short, our task is to find a way of handling FPGAs, without having the benefit of a tree-structured data model, but handling the same FPGA structures and usage models
14:21:57 is the vGPU vendor specific? i.e. tied to a particular driver implementation?
14:23:03 Sundar, does the k8s community have any plans to add the tree-structure data model in the near future?
14:23:05 NokMikeR, the discussions I have seen have been centered on Nvidia's vGPU types, though not necessarily phrased in vendor-specific terms
14:23:15 in other words, how do you differentiate the features of one vGPU vs another if the underlying features in the real GPU are different -- or are they abstracted somehow?
14:23:20 Li_Liu, None that I am aware of.
14:24:14 Sundar: ok thought so re: nvidia GPUs.
14:24:35 NokMikeR, in OpenStack, the answer is clearer: the device itself exposes different vGPU types as traits, and their capacities as units of a generic accelerator RC
14:25:44 Cyborg needs to handle GPUs and FPGAs of course. But, IMHO, there is enough attention on GPUs :) It is FPGAs that need further thought :)
14:25:57 Sundar Jiaying's proposal, as far as I understand, still tries to modify the k8s core functionality?
14:26:37 Sundar: do we decide to only support one vendor GPU or FPGA in this release without nested providers?
14:26:42 Zhipeng, yes, it requires changes on the controller side and kubelet changes
14:28:15 Shaohe: for Rocky, I was proposing to include only devices of a particular type: one GPU or one FPGA. But, based on feedback, we have to relax it to multiple devices of the same type, i.e.,
14:28:29 you could have 2 GPUs of the same type, 2 FPGAs of the same type, etc.
14:28:51 Sundar if we say, propose a CRD type of kube-cyborg thing, will it make sense to the res mgmt wg people?
14:29:00 meaning that, similar to OpenStack,
14:29:10 we view the accelerator as not part of the general compute infra
14:29:46 and it has its own model and scheduling process if needed
14:30:25 Zhipeng, we can propose CRDs, but the exact workflows will matter.
14:30:49 zhipeng, you are saying, similar to what we did with Cyborg, we cut a piece out from K8s?
14:30:51 I think the main pain point is still at the scheduler extension
14:31:07 which Derek also mentioned at KubeCon last Dec
14:31:31 Li_Liu essentially an out-of-band controller for accelerators
14:31:43 Sundar: will cyborg support nested providers in the Rocky release?
14:32:10 shaohe_feng_ I think Placement won't support it
14:32:29 zhipeng: Got it.
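Sundar's point above -- vGPU types exposed as traits and their capacities as units of a generic accelerator resource class -- can be sketched as the data a Cyborg agent might report against the compute node provider once nested providers are off the table. This is only an illustration under that assumption; the node name, trait name, and totals are hypothetical, and this is not actual Cyborg agent code or an exact Placement payload.

```python
# Rough sketch (not actual Cyborg code) of per-node accelerator reporting when nested
# resource providers are unavailable: capacity as units of a generic accelerator RC on
# the compute-node provider, with the device type exposed as a trait.
# Node name, trait name, and totals are hypothetical.
compute_node_report = {
    "resource_provider": "compute-node-1",     # hypothetical compute node
    "inventories": {
        "CUSTOM_ACCELERATOR": {                # generic RC agreed with Nova, per the spec
            "total": 2,                        # e.g. two vGPUs of the same type
            "min_unit": 1,
            "max_unit": 1,
            "step_size": 1,
        },
    },
    # Traits distinguish what kind of accelerator backs the generic capacity.
    "traits": ["CUSTOM_GPU_NVIDIA_V100"],      # hypothetical trait name
}

print(compute_node_report)
```

Because everything lands on the compute node provider, two different device kinds on one host would mix their traits there -- which is exactly the concern shaohe raises later in the meeting.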
14:32:40 but the way we are modeling it is very close to nRP, correct me if I'm wrong Li_Liu
14:32:43 Zhipeng, I was also advocating a scheduler extension. But apparently it is not popular within the community. There is a proposal to revamp the scheduler itself: https://docs.google.com/document/d/1NskpTHpOBWtIa5XsgB4bwPHz4IdRxL1RNvdjI7RVGio/edit#
14:32:58 So, the scheduler, as well as its extension APIs, may change
14:34:22 well CRDs are generally great for API aggregation, but complex for resource-related functionalities
14:34:32 Here is a possible way to get to a few basic cases without anything fancy (this is not fully agreed upon, please take this as an option, not a plan):
14:34:37 like binding the resource to the pod
14:34:46 since the process is external via CRD
14:35:24 • Publish each region type as a resource. E.g. intel.com/fpga-dcp, intel.com/fpga-vg
14:35:34 • The pod spec asks for a region type as a resource, and also specifies a bitstream ID. That could be a label
14:35:45 • An admission controller inserts an init container on seeing an FPGA resource.
14:35:59 • The scheduler picks a node based on the requested region type (and ignores the bitstream ID).
14:36:10 • The init container pulls the bitstream with that ID from a bitstream repository (mechanism TBD) and programs the selected device.
14:37:20 I have heard that, if we give a higher security context to the init container for programming, it may affect other containers in the same pod. I am still trying to find evidence for that
14:37:46 lol this is just too complicated
14:38:07 thx Sundar I think we have a good understanding of the status quo
14:38:18 Zhipeng, more complicated than other proposals out there? ;)
14:38:25 anyone else got questions regarding k8s?
14:38:54 if you are attending KubeCon we could meet f2f, and give them hell XD
14:39:09 lol
14:39:28 * NokMikeR braces for impact
14:41:04 if you guys have any dial-in-able meeting during kubecon, please loop us in
14:41:36 yes, loop us in
14:42:02 Okey I will give a howler if a bridge is available
14:42:33 Seems like my PC irc client just died
14:43:05 #topic Sub team arrangements
14:46:08 phew
14:46:13 :)
14:46:42 cell phone irc bouncer crashed just now
14:46:57 moving on
14:47:05 #topic subteam arrangements
14:47:35 okey so given recent events, I think it is necessary to reorg the subteams
14:48:23 and also encourage subteams to organize their specific meetings
14:48:30 for specific topics
14:48:45 so I would suggest shaohe to help lead the driver subteam
14:49:10 work with our Xilinx and Lenovo colleagues on FPGA and GPU drivers in Rocky
14:49:29 Ok.
14:49:40 Li Liu help lead the doc team, to work with our CMCC member and others to make documentation as good as your spec :)
14:49:49 sure
14:49:59 I will keep on the release mgmt side
14:50:35 shaohe_feng_ you can sync up with Chuck_ on a meeting time more suited for US west coast
14:50:55 mainly China morning times I guess
14:51:13 zhipeng: what is Chuck_?
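The workflow Sundar outlines above can be illustrated as a single pod manifest: the region type is requested as an extended resource and the bitstream ID rides along as a label that the scheduler ignores. This is only a sketch of that option; the label key, images, bitstream ID, and pod name are hypothetical, and in the proposal the init container would be injected by the admission controller rather than written by hand as it is here.

```python
# Illustrative sketch of the pod shape in the region-type + bitstream-label workflow.
# The init container is shown explicitly only to indicate where the admission
# controller would inject it. Label key, images, and IDs are hypothetical.
import json

fpga_region_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "afu-demo",
        "labels": {
            # Hypothetical label carrying the requested bitstream ID;
            # the scheduler ignores it, the init container consumes it.
            "fpga-bitstream-id": "example-bitstream-id",
        },
    },
    "spec": {
        "initContainers": [{
            "name": "fpga-programmer",                  # would be injected by the admission controller
            "image": "example/fpga-programmer:latest",  # hypothetical programming-tool image
        }],
        "containers": [{
            "name": "workload",
            "image": "example/accelerated-app:latest",  # hypothetical workload image
            "resources": {
                # Region type published as an extended resource, e.g. intel.com/fpga-dcp
                "limits": {"intel.com/fpga-dcp": 1},
            },
        }],
    },
}

print(json.dumps(fpga_region_pod, indent=2))
```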
14:51:28 I am Chuck_ :-)
14:51:37 Hi Shaohe, this is Chuck from Xilinx
14:51:44 lol
14:51:46 Chuck_: hello
14:51:46 I work on US west coast time
14:51:54 I will add Chuck_ into our wechat group as well
14:52:13 talk in Chinese :)
14:52:47 count me in for those driver meetings shaohe_feng_
14:52:52 :)
14:52:53 yes, looking forward to working with you.
14:53:20 Li_Liu: OK.
14:53:21 and subteams please send reports to the mailing list, you can decide whether it is bi-weekly or weekly
14:53:26 or monthly even
14:53:28 up to you
14:53:56 Is it all WeChat in Chinese then? ;) I can join if that helps
14:54:10 it is for the China region devs :P
14:54:20 all in Chinese and crazy emoticons :P
14:54:24 WeChat has a translation feature tho :)
14:54:39 Sundar: we speak Chinese there. :)
14:54:53 Sundar: you can learn Chinese.
14:55:24 OK :) My daughters learnt Mandarin. I should have joined them
14:55:55 haha will learn a lot
14:56:34 :) Do we have alignment on what use cases we will deliver in Rocky?
14:57:11 CERN HPC PoC could be the one for GPU
14:57:12 Can we say we will deliver AFaaS pre-programmed, and FPGAaaS with request-time programming? These are the simplest ones, and many customers want that
14:57:36 Zhipeng, yes, GPU PoC too
14:57:41 Sundar that is something we should be able to deliver
14:57:47 for FPGA
14:57:47 Sundar: oh, will we have the same RC name for vGPU and FPGA, and other accelerators in the Rocky release?
14:58:49 Shaohe, yes our agreement with Nova is to use a generic RC for all accelerators
14:59:04 as in the spec
14:59:21 Do we have kosamara here?
14:59:34 Any input on the spec from GPU perspective?
14:59:58 Sundar: But I'm worried about doing this without nested providers.
15:00:28 Shaohe, without nRP, we will apply the traits etc. to the compute node RP
15:00:46 Do you see problems in doing that?
15:01:07 how do we distinguish vGPU and FPGA in one host?
15:02:02 hello?
15:02:08 welcome back
15:02:12 everyone quit?
15:02:21 net split, a glitch in the matrix.
15:02:44 All traits will get mixed in the compute node. So, a flavor asking for resource:CUSTOM_ACCELERATOR=1 and trait:CUSTOM_GPU_ will come to cyborg which needs to choose a GPU based on its Deployables
15:02:54 Li_Liu, I am still here :)
15:03:24 Sundar: for example, we make an FPGA trait and a GPU trait on the host provider.
15:03:33 okey this is a crazy night
15:03:42 Sundar: and there is one GPU and one FPGA
15:03:48 Sorry, I have another call at this time
15:04:04 firstly we consume one FPGA
15:04:08 Shaohe, can we continue another time?
15:04:14 Sundar: OK.
15:04:34 okey moving on
15:04:45 #topic critical rocky spec update
15:05:09 xinran_ are you still around?
15:05:48 Yes I'm here
15:06:06 could you provide a brief update on the quota spec?
15:09:31 Ok. Like we discussed with xiyuan, I think we should do the usage part first, so I want to separate the spec into two parts.
15:09:45 What do you think
15:11:13 makes sense
15:11:24 have you already updated it as two parts?
15:12:20 Not yet, will update the usage part this week
15:12:40 sounds great :)
15:13:10 #action xinran to update the quota spec into two parts and complete the usage one first
15:13:15 For the limit part, I think it still needs more discussion with xiyuan
15:13:29 no problem
15:13:35 So that we will have 2 specs on this instead of 1, right?
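Sundar's flavor example above (one unit of the generic accelerator RC plus a GPU trait) can be sketched as flavor extra specs in Nova's resources:/trait: syntax. The trait name below is hypothetical, since the one in the log is truncated; treat this as an illustration of the request shape, not a tested configuration.

```python
# Rough sketch of flavor extra specs matching Sundar's example: one unit of the
# generic accelerator resource class plus a required trait selecting the device kind.
# The trait name is hypothetical; only the resources:/trait: syntax is the point.
flavor_extra_specs = {
    "resources:CUSTOM_ACCELERATOR": "1",   # one unit of the generic accelerator RC
    "trait:CUSTOM_GPU_V100": "required",   # hypothetical GPU trait on the compute node RP
}

# Set on a flavor, these keys make Placement return only hosts whose provider has both
# the inventory and the trait; Cyborg then picks a concrete device from its Deployables.
for key, value in flavor_extra_specs.items():
    print(f"{key}={value}")
```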
15:13:42 yep
15:13:53 but the limit one should be rather simple
15:14:08 since we will utilize a lot of things Keystone has already designed
15:14:46 I see
15:15:09 Yes I think so
15:15:53 okey another thing is that the os-acc spec will need more time
15:16:08 so I suggest we relax the deadline for the proposal on that one to the June MS2
15:16:38 if by then we still could not get it landed, then I will block it for Rocky but it could land first thing in Stein :)
15:16:47 zhipeng, let me know if you need some help on that one
15:16:57 sounds reasonable
15:16:58 ?
15:17:06 Li_Liu sure :)
15:17:29 since I think I have some thoughts on it
15:17:34 #agreed os-acc spec extended to MS2 for approval
15:17:53 Li_Liu no problem, feel free to share them
15:18:04 okey moving on
15:18:14 #topic open patches/bugs
15:18:31 I will push up the fix for mutable-config this week
15:20:16 maybe combined with the fix the Lenovo folks have provided, which I blocked as a trivial fix
15:21:07 any other issues on this topic?
15:22:04 okey then
15:22:09 #topic AoB
15:22:32 any other business
15:23:31 what is AoB......
15:23:45 Ah I know!
15:24:11 xinran_ bien :)
15:25:07 ;)
15:26:10 okey if there are no other topics
15:26:17 let's conclude the meeting today
15:26:28 thx for the great conversation :)
15:26:31 #endmeeting