03:04:07 #startmeeting openstack-cyborg
03:04:08 Meeting started Thu Jun 11 03:04:07 2020 UTC and is due to finish in 60 minutes. The chair is Yumeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
03:04:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
03:04:11 The meeting name has been set to 'openstack_cyborg'
03:04:14 #topic Roll call
03:04:27 #info Yumeng
03:04:36 Hi all
03:04:40 #info Sundar
03:04:41 Hi all
03:04:43 #info wangzhh
03:04:50 #info s_shogo
03:04:53 #info xinranwang
03:04:54 #info swp20
03:04:54 #info chenke
03:05:00 #topic Agenda
03:05:42 we will start from SmartNic topic.
03:05:48 #topic SmartNic
03:06:05 Thanks, Yumeng
03:06:26 I'd like to follow up on our PTG discussion
03:06:56 Just to set the background: there are broadly two kinds of 'smart NICs'. A smart NIC may have a single 'device' that combines the accelerator and the NIC, or two (or more) components in a single PCI card, with separate accelerator and NIC components.
03:07:25 We should model the smart NIC as a single RP representing the combined accelerator/NIC for the first case. Agreed, right?
03:08:22 Yumeng, xinranwang, chenke, all : ^
03:09:36 Anybody around?!
03:09:41 Hi Sundar.
03:09:50 I haven't followed this for a long time, so I will just listen at first....
03:10:04 yes, agree. I may have missed some of the PTG. But agree.
03:10:21 Do you mean the first case is only one region, right?
03:11:07 The concept of region is specific to FPGAs. A smart NIC may not have FPGAs. Maybe it is a NIC ASIC + many ARM cores.
03:11:30 As a first step, that is good. agree.
03:11:52 It may have fpga as well. I think we need to consider that. I think subprovider can support this.
03:11:54 Great. For the second case, we could have a hierarchy with separate RPs for the accelerator and the NICs, and a top-level resource-less RP which aggregates all the children RPs and combines their traits
03:12:33 xinranwang: Of course. My point was, the concept of region can be used if it is a FPGA, not otherwise.
03:12:55 Sundar ok. got it.
03:13:35 For example: N3000 may be modeled as two RPs - one for the FPGA and one for the 2 Fortville NICs. Perhaps each Fortville NIC can be a separate RP. That is up to us.
03:13:54 I agree.
03:14:49 Correspondingly, Cyborg will create a separate deployable for each component, and a top-level deployable that contains all component deployables.
03:15:38 The resources and traits of the children RPs are exposed in the top-level RP; similarly, the accelerators and attributes of children deployables are in the top-level deployable.
03:16:03 It corresponds to the device topology. Cyborg can report like this. If we need to interact with other projects like neutron, we should consider more, because neutron reports to placement if the bandwidth feature is enabled.
03:17:00 xinranwang: I am hoping that Cyborg can create RPs for all components, not leave it to neutron even for bandwidth provider. Otherwise, it gets a bit complicated.
03:17:20 Neutron can, of course, use the RP that Cyborg created
03:17:49 Neutron is the one in charge of the physnet, not sure cyborg should take it over.
03:18:10 it a network concept
03:18:16 s/it/it's
03:18:19 Yes. The physnet is best left to the admin or the OpenStack installer
03:18:47 Cyborg shouldn't get in the way.
03:19:23 However, we could model the physnet as a trait. Cyborg creates the NIC's RP but doesn't want to manage the physnet trait on that RP.
03:19:37 what is neutron's bandwidth feature enabled, how will cyborg know that neutron reports to placement as well.
03:19:44 s/what is/what if
03:19:56 So, we can leave it to the admin or installer however they do it. That is how PCI whitelist handles physnet anyway.
03:20:47 I am not sure this will be accepted by the community. I am still thinking about it.
03:21:03 xinranwang: Good point, I have thought about it. Hoping we can reach agreement with Neutron that they need not create RPs in this case.
03:21:51 What won't be accepted: physnet as trait, or Cyborg creating the RPs for NICs?
03:22:32 Anyway, I will propose a spec in the nova community and everyone can discuss there. But I am still thinking about which solution I should propose first, with the others as alternatives.
03:23:19 We should have some internal agreement hopefully before we approach others
03:24:39 Yes, of course
03:24:42 Anyways, what do others think of Cyborg creating RPs for the NIC side too? It will do that for the first type where the accelerator and NIC are combined. To keep it uniform, we should do that for the second case too
03:25:16 Otherwise, we'll have different solutions for different types of smart NICs, depending on whether they have a single component or multiple components
03:26:36 Yumeng, chenke, s_shogo, all: ^
03:27:42 so if the admin uses the Placement CLI to set traits, then the admin should know the "physicalnet" and the RP and RC of this smartNIC. so does that mean the admin needs to GET physicalnet first, then GET RP, RC, then report?
03:27:43 Anyway, I'll throw another idea in. Ideally, the admin should be able to formulate the device profile in the same way, independent of whether it is a single-component or multi-component device.
03:28:36 s_shogo:
03:28:42 Yumeng: the admin needs to get the RP for the NIC and set the trait there. he could do so via the installer
03:28:48 Shogo, are you around?
03:29:35 xinranwang: yes, I'm considering that, while thinking about the operation of N3000..
03:29:59 Good. Thanks, s_shogo
03:30:07 To repeat: Ideally, the admin should be able to formulate the device profile in the same way, independent of whether it is a single-component or multi-component device.
03:30:23 That common device profile would look like this:
03:30:33 Sundar: I am not familiar with this.
03:30:58 s_shogo: lol, np. Just for other things. Haibin cannot access IRC, and he met some problem with this https://review.opendev.org/#/c/698190/ , could you contact him, maybe by email?
03:31:45 chenke: ok, np
03:32:01 { "name": "my-smartnic-dp",
03:32:03 Yes, I agree that from the operator's point of view > same way to treat device profile
03:32:18 { "name": "my-smartnic-dp",
03:32:26 "groups": [{
03:32:35 "resources:FPGA": "1",
03:32:46 "resources:CUSTOM_NIC_X": "1",
03:33:05 "trait:CUSTOM_FPGA_REGION_ID_FOO": "required",
03:33:15 "trait:CUSTOM_NIC_TRAIT_BAR": "required",
03:33:22 "accel:bitstream_id": "3AFE"
03:33:29 xinranwang: OK, I talked with haibin and shaohe yesterday, and that seems to be solved. Of course, if there is another one, I can help with that, too.
03:33:31 }]
03:33:32 }
03:33:51 IOW, the resource, traits and Cyborg properties for both the accelerator and NIC would be presented as a single resource group, which would ensure that a single RP would have to satisfy that. That single RP could be the top-level RP of a hierarchy.
03:34:13 s_shogo: yes, another one, need your help
03:34:35 s_shogo: thanks, please contact him when you have time.
03:34:41 Basically, unless it is a single request group, it is not guaranteed that the resources will come from the same RP
03:35:03 Do you have any comments or questions?
03:35:11 shaohe_feng, OK, I got it! Could you contact me via e-mail like yesterday?
03:35:25 s_shogo: yes.
03:35:46 During ARQ binding, Cyborg would still get a single RP as today. In the case of a multi-component device, Cyborg would translate that to the top-level Deployable object, and figure out what constituent components are present.
03:35:50 we can merge the trait to a single request group, from cyborg, or from neutron.
03:36:48 I am not against you, Sundar. I understand your proposal, it is one of the solutions we presented in the PTG.
03:36:55 Neutron cannot add traits to a RP owned by Cyborg, as per existing agreement among developers.
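The device profile fragments pasted between 03:32:18 and 03:33:32 are interleaved with other messages. Joined in order, they form the single-group profile below. This is a minimal Python sketch: the helper `split_group` is hypothetical and only illustrates how the `resources:`, `trait:` and `accel:` prefixes divide one request group. It is not Cyborg's actual parsing code.

```python
import json

# Device profile exactly as pasted in the log, with the fragments joined in order.
device_profile = {
    "name": "my-smartnic-dp",
    "groups": [{
        "resources:FPGA": "1",
        "resources:CUSTOM_NIC_X": "1",
        "trait:CUSTOM_FPGA_REGION_ID_FOO": "required",
        "trait:CUSTOM_NIC_TRAIT_BAR": "required",
        "accel:bitstream_id": "3AFE",
    }],
}


def split_group(group):
    """Hypothetical helper: bucket the keys of one request group by prefix."""
    buckets = {}
    for key, value in group.items():
        prefix, _, name = key.partition(":")
        buckets.setdefault(prefix, {})[name] = value
    return buckets


# Accelerator and NIC requirements live in ONE group, which is the point
# made in the discussion: a single RP has to satisfy all of them.
parts = split_group(device_profile["groups"][0])
print(json.dumps(device_profile, indent=2))
print(sorted(parts))
```

Keeping both the FPGA and NIC keys in one entry of `groups` is what expresses the co-location requirement; two separate groups would not.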
03:38:24 I don't think we got into the aspects of how the device profile should look, during the PTG. It is important to nail that down, so we get a uniform way of handling all smart NICs, whether they have 1 component or many
03:38:53 If neutron doesn't want to do this, Cyborg can update traits created by neutron. In that case, cyborg should know whether neutron created the RP or not. We should think about it.
03:39:47 Sundar, xinranwang: I just got a question. if the admin knows the "physicalnet", why can't the admin create a device_profile with "physicalnet" directly?
03:39:50 If one OpenStack service starts updating traits on RPs created by another service, that can cause confusion. You are welcome to discuss it with other developers.
03:40:06 We have already had many such discussions during Nova-Cyborg discussion.
03:40:13 What the device_profile looks like is not related to how we report the RP. It's 2 questions.
03:40:36 Yumeng: yes, admin can do that.
03:40:39 so that neither nova nor neutron has to merge the "physicalnet" trait into the request group
03:40:53 Sundar, 'During ARQ binding, Cyborg would still get a single RP as today.' the RP is for FPGA or NIC_X?
03:41:13 My point is, the device profile must have a single RG for both sides. If you create 2 separate RGs for the multi-component case, it won't ensure co-location.
03:41:39 swp20: That would be the top-level RP, which contains both the FPGA and NIC RPs as children.
03:41:40 I think we have mixed 2 questions... one is who reports to placement, one is how nova gets these traits before scheduling.
03:41:49 xinranwang: ok. so that's also one of the solutions.
03:42:42 Yumeng: admin can create device profiles with physnet as a trait. The question is, how is that trait set in the RP, before it is referenced in the device profile.
03:43:17 xinranwang: No, I have addressed both questions separately above.
03:45:06 Sundar: got it, Thanks.
03:45:19 Yumeng: yes, sure. That's one of the solutions
03:46:00 If it is not clear, let me state it again: I think Cyborg should create all the RPs needed for a smart NIC. It populates most traits. Some traits, like physnet, may need to come from the admin. The admin creates device profiles. The user/tenant creates Neutron ports with the device profile name. Nova uses the device profiles in neutron ports +
03:46:00 flavor to do the scheduling.
03:46:50 Yes, I agree. That's one of the solutions
03:47:17 What alternative solution is under consideration?
03:47:35 ok, let me state
03:48:15 1. cyborg creates rp, neutron uses this rp and updates physnet traits.
03:48:16 Sundar: ' Some traits, like physnet', should we report these traits to Placement?
03:48:43 2. neutron creates rp, cyborg uses this rp to update acc related rc and traits.
03:49:51 First, you are assuming it is ok for one service to create an RP and another to update it by adding traits. Secondly, who will create the top-level RP for multi-component NICs?
03:51:28 there is also a solution using provider-config.yaml
03:51:31 swp20: Some traits can be discovered from the device's PCI ID, like the type of NIC. Those can be added as traits by Cyborg. Others, like physnet or external network connectivity, are known only to the admin or the OpenStack installer. They are best left out of Cyborg.
03:51:45 The provider-config.yaml does not create RPs today.
03:52:07 It can be enhanced in the future to do that, but that will be another long discussion and development
03:52:16 cyborg creates rp and others update traits.
03:53:11 I am investigating how neutron reports physnet traits, it seems neutron also creates a logical rp. Not sure, need to verify this.
03:53:22 You are free to bring this up. Look at past IRC discussions before you spend time on this.
03:53:37 Yes, Neutron creates an RP for bandwidth provider feature.
03:54:31 Sundar: so how do we schedule the resources by these traits that are left out of Cyborg?
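The multi-component modeling discussed in this topic (a resource-less top-level RP that exposes the resources and traits of its child RPs, matched against a single request group) can be sketched as follows. This is a toy model in plain Python, not the Placement or Cyborg API. The class `ResourceProvider`, the functions `aggregate` and `satisfies`, the RP names, and the NIC count of 2 are all assumptions made for illustration. The sketch also rejects overlapping resource classes or traits between components, since a merged view would otherwise be ambiguous.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProvider:
    """Toy stand-in for a Placement resource provider (not the real API)."""
    name: str
    inventory: dict = field(default_factory=dict)  # resource class -> count
    traits: set = field(default_factory=set)
    children: list = field(default_factory=list)


def aggregate(rp):
    """Return the (inventory, traits) view this RP exposes, including children."""
    inventory, traits = dict(rp.inventory), set(rp.traits)
    for child in rp.children:
        overlap = (inventory.keys() & child.inventory.keys()) | (traits & child.traits)
        if overlap:
            raise ValueError(f"components overlap on {sorted(overlap)}")
        inventory.update(child.inventory)
        traits |= child.traits
    return inventory, traits


def satisfies(rp, group):
    """Check whether ONE request group can be met by this single RP."""
    inventory, traits = aggregate(rp)
    for key, value in group.items():
        kind, _, name = key.partition(":")
        if kind == "resources" and inventory.get(name, 0) < int(value):
            return False
        if kind == "trait" and value == "required" and name not in traits:
            return False
    return True  # keys with other prefixes (e.g. accel:) are not scheduling inputs here


# Hypothetical N3000-style card: one FPGA RP and one NIC RP under a top-level RP.
fpga = ResourceProvider("n3000-fpga", {"FPGA": 1}, {"CUSTOM_FPGA_REGION_ID_FOO"})
nic = ResourceProvider("n3000-nic", {"CUSTOM_NIC_X": 2}, {"CUSTOM_NIC_TRAIT_BAR"})
card = ResourceProvider("n3000", children=[fpga, nic])

group = {
    "resources:FPGA": "1",
    "resources:CUSTOM_NIC_X": "1",
    "trait:CUSTOM_FPGA_REGION_ID_FOO": "required",
    "trait:CUSTOM_NIC_TRAIT_BAR": "required",
}
print(satisfies(card, group), satisfies(fpga, group))
```

In this sketch only the top-level RP satisfies the combined group; neither child does on its own, which mirrors the co-location argument for a single request group.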
03:55:01 Just one more point before I conclude: when we have multi-component NICs with different RPs, it is important that the resource classes and traits for the accelerator RP and the NIC RP be totally disjoint (no overlapping resource classes or traits). That will usually be the case.
03:55:15 neutron has a create_pci_requests_for_sriov_ports
03:55:35 swp20: The admin can reference such traits in the device profile which he creates in Cyborg.
03:56:15 Ok, this has taken a bit longer than I thought :). I didn't mean to take up the whole meeting.
03:56:39 Yes, every solution has its pros & cons... Need more investigation
03:56:56 Sundar, xinranwang and all: IMHO, if nova and neutron have no objections, I would prefer the solution in which Cyborg creates the top-level RP for multi-component NICs, since Cyborg does the lifecycle management of NICs. So I personally prefer that Cyborg create all the RPs needed for a smart NIC.
03:57:38 it should be able to create the RP. but anyway, whoever creates the RP, cyborg and neutron should do it the same way, that means they know about each other
03:57:43 Yumeng: Sure. We need to discuss with neutron folks on the bandwidth provider case, where they create the RP now
03:57:49 Yes, I prefer this too. RP creation in cyborg seems reasonable.
03:58:35 About the physnet trait. that's what we need to discuss more.
03:58:49 admin can set physnet for both cyborg and neutron. Tenant does not need to know this. Neutron can get the detail and device profile from cyborg, and add physnet when neutron calls create_pci_requests_for_sriov_ports
03:59:25 And we also need to discuss the consistency of RPs created in neutron and cyborg.
03:59:41 agree
04:00:19 I think it's better to let cyborg create RP. And for the traits, can cybog auto sync it from neutron?
04:00:29 *cyborg
04:00:32 Sundar: not familiar with this, i will track it. thanks
04:00:49 Neutron needs to provide an interface in this case. wangzhh
04:01:27 1. if we just let neutron set the physnet, that is convenient for the admin. But that convenience may need more changes, more effort.
04:03:06 actually, cyborg cannot know the physical networks.
04:03:06 It is bridge_mapping conf of neutron, not sure if we can get it from neutron net-show external-net. xinranwang
04:04:31 neutron will report the physnet trait only if the bandwidth feature is enabled, as far as I understand. So I am not sure we can get the info from these apis
04:05:27 I would suggest that we keep networking aspects out of Cyborg. That can only lead to complexity and trouble.
04:06:01 agree.
04:06:39 seems we have run out of time! let's continue the discussion in wechat or ML.
04:07:12 And I will quickly mention the remaining topics.
04:07:51 #topic third-party driver CI
04:09:13 we got three new drivers in the victoria release, do we need to require a CI for new drivers? what should be included in the tests?
04:11:43 In my opinion, third-party driver CI is good. But not sure each vendor can provide it. I think cyborg should at least require one or two main driver vendors to provide this kind of CI tests.
04:11:49 what do you think?
04:12:31 Hmmm, better to have 3rd CI. We can keep the driver as experimental until it has 3rd CI
04:13:05 Yumeng: sounds like a good idea, but how do we force vendors to provide it? We can only say that, if they provide it, we will support it, otherwise it is an unsupported driver.
04:13:30 xinranwang: yes, I think we are aligned.
04:14:07 Sundar: yes. we cannot require any vendor. marking drivers 'experimental' or 'supported' sounds like a good idea!
04:14:51 anyway, we don't have to conclude it today. we can think more. and continue discussion.
04:15:10 Would ZTE provide 3p CI for their smart NIC? Just curious :)
04:15:11 #topic storyboard usage
04:16:10 Sundar: haha not sure. I need to check with the network team
04:18:52 about the storyboard usage, we've got some complaints in the PTG. I looked into it, and found it is very flexible in letting you define your own priorities, worklists and so on. so it is worth using.
04:20:16 so please check links in topic 3-7 https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting#Agenda, and review the usage guide. We need to decide how we want to use the storyboard.
04:20:56 Otherwise, if there are any other good solutions, we can also discuss.
04:21:24 I'm fine with whatever you all decide.
04:22:39 ok. Thanks Sundar.
04:22:41 #topic AoB
04:22:48 Does anybody want to bring up anything else?
04:23:14 seems not.
04:23:47 That's great, we had a very effective meeting today. Thank you all!
04:24:05 Thank you all
04:24:06 Let's meet next week. Have a good day/night!
04:24:12 Bye
04:24:27 #endmeeting