14:00:12 #startmeeting neutron_drivers
14:00:12 Meeting started Fri Jul 9 14:00:12 2021 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:12 The meeting name has been set to 'neutron_drivers'
14:00:18 hi
14:00:20 o/
14:00:51 Good morning / afternoon / evening
14:00:52 hi
14:00:56 hi
14:01:16 o/
14:01:52 Hi
14:02:14 hi
14:02:29 o/
14:03:35 hi
14:03:48 let's see if yamamoto or njohnston show up. We need one of them to discuss the first RFE in our agenda. Let's give them a few minutes
14:04:03 https://review.opendev.org/admin/groups/5b063c96511f090638652067cf0939da1cb6efa7,members
14:04:07 nate is on PTO
14:05:00 let's wait a few minutes. If we don't get either of them, we can discuss the other two RFEs in our agenda since we have quorum for those
14:08:32 fnordahl, dmitriis: are you here to discuss an RFE? I don't want to have you waiting until the end
14:08:46 yes
14:08:58 yes
14:09:16 ok, I see fnordahl filed https://bugs.launchpad.net/neutron/+bug/1932154
14:09:23 so let's start there
14:09:44 #topic RFEs
14:10:08 https://bugs.launchpad.net/neutron/+bug/1932154 is up for discussion now
14:11:03 I have one question about the spec (about the architecture)
14:11:16 sure
14:11:26 in the schema, the "SmartNIC DPU Board" is hosting the OVN processes (controller, vswitchd, etc)
14:11:36 but any Neutron process?
14:12:00 Lajos Katona proposed openstack/networking-bagpipe master: Follow pyroute2 changes https://review.opendev.org/c/openstack/networking-bagpipe/+/800062
14:12:09 ralonsoh: We aim to avoid it in the OVN case.
At least the goal is to not introduce any Neutron agents
14:12:30 at the moment there is a discussion upstream around representor plugging at the SmartNIC DPU side
14:12:38 ok, so it is running the backend, not the orchestrator processes
14:12:44 perfect then
14:13:10 yes, extra comms between Neutron and every SmartNIC DPU wouldn't be good
14:13:42 so far, I'm ok with the RFE (and the spec, although I need to review it again)
14:14:35 we are actively discussing the OVN bits upstream since a lot depends on them
14:14:47 if it is only related to the backend, why do you need to use "binding:profile" (i.e. an API-visible field)? Can't you pass information from neutron (driver) to the backend?
14:15:07 I understand you need to pass information on the host to the backend (ovn).
14:15:31 (I think this is for Nova)
14:15:47 as I understand it, this is part of the current Nova -> Neutron communication, which we continue to use here
14:16:12 the board serial number is needed to get an OVN chassis hostname
14:16:26 in case of port deletion that info needs to be present in the port as well
14:16:53 otherwise we don't know which SmartNIC DPU host should handle representor unplugging
14:18:31 (if that makes sense)
14:19:13 The OVN mechanism driver would most likely need it throughout the lifetime of the port, to know what requested-chassis to put on the LSP
14:19:44 I see. no neutron agent like the sriov agent, so you need to pass host information to OVN via the OVN mech driver running in the API server.
14:19:55 yes
14:20:38 and that is why you have a conversation upstream with ovn, to be able to coordinate that, correct?
14:21:08 yes, we need an entity that would do a job similar to os-vif
14:21:25 how is that conversation going?
14:21:44 Our conversation with OVN is to get code to look up representor ports running alongside / as part of the ovn-controller, and extending the CMS API to specify how to find it.
14:21:51 do you think you will get the necessary support implemented?
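[editor's note] The flow discussed above — Nova passing the SmartNIC DPU board serial in binding:profile, and the OVN mechanism driver using it to derive the requested-chassis for the LSP — could be sketched roughly as below. The key name "card_serial_number", the registry, and the helper are hypothetical illustrations, not the actual Neutron/OVN code or API:

```python
# Hypothetical sketch of the port-binding flow discussed above.
# The binding:profile key name and the serial-to-chassis registry are
# illustrative assumptions, not real Neutron/OVN identifiers.

# Maps a SmartNIC DPU board serial to the hostname of the OVN chassis
# (ovn-controller) running on that DPU board.
SERIAL_TO_CHASSIS = {
    "MT2042X00001": "dpu-host-1.example.org",
}

def requested_chassis_for_port(port):
    """Return the OVN chassis that should plug this port's representor.

    Nova is assumed to place the DPU board serial in the port's
    binding:profile; with no per-host Neutron agent in this model, the
    serial is the only link between the port and the DPU, and it must
    stay on the port so unplugging works on port deletion too.
    """
    profile = port.get("binding:profile") or {}
    serial = profile.get("card_serial_number")
    if serial is None:
        return None  # regular (non-SmartNIC) port, no off-path chassis
    return SERIAL_TO_CHASSIS.get(serial)

port = {
    "id": "a1b2",
    "binding:profile": {"card_serial_number": "MT2042X00001"},
}
print(requested_chassis_for_port(port))  # dpu-host-1.example.org
```

The point of the sketch is the indirection: the mech driver in the API server never talks to the DPU directly, it only records which chassis OVN should use.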
14:22:48 Our target is to get it into OVN 21.09. It has been a conversation since May, and recently we have made progress, with a few compromises suggested from both sides. We don't have a +2/-2 as of now, but I feel confident we will be able to move forward.
14:23:38 * mlavalle shudders at the thought of having to get approvals from ovn upstream, nova and, worst of all, neutron ;-)
14:23:51 add libvirt to the mix
14:23:53 ;)
14:23:58 LOL
14:24:01 mlavalle: yes, that's a tricky feature to coordinate
14:24:36 the libvirt part is just about retrieving PCIe VPD and exposing it
14:24:44 I'm fine with this RFE. I propose that we continue the detailed conversation in the spec
14:24:49 right
14:24:51 we had a conversation about it already on the ML and it was approved
14:25:03 which ML?
14:25:16 ^ I think dmitriis is talking about libvirt?
14:25:18 https://listman.redhat.com/archives/libvir-list/2021-May/msg00873.html
14:25:33 let me find a reply as well
14:25:44 thanks
14:26:26 https://listman.redhat.com/archives/libvir-list/2021-June/msg00037.html
14:26:48 there are several components involved. where can we get the whole picture on this? I think it is important for people to understand how it works.
14:27:55 amotoki: the Nova spec contains a lot of information https://review.opendev.org/c/openstack/nova-specs/+/787458. It originally contained more Neutron and OVN pieces, but we removed them during the review to keep it Nova-related
14:28:09 amotoki: I agree with you. Should we strive to get that whole picture in the spec phase?
14:28:39 The nova spec still has schematics (ascii) of the whole thing, if I don't misremember, dmitriis?
14:28:56 https://review.opendev.org/c/openstack/nova-specs/+/787458/9/specs/xena/approved/integration-with-off-path-network-backends.rst line 532
14:29:14 fnordahl: yes, there's a bit more information in it regarding certain pieces
14:29:29 amotoki: mlavalle: would that schema help?
14:31:35 I might ask for more in the spec, but at first glance it looks like a good start to me
14:31:43 I cannot read through it right now, but it is okay as it looks like it covers the main picture. if we find something missing, we can cover it in the neutron spec (or add smth to the nova spec if needed)
14:31:56 we can look thru the nova spec when reviewing the neutron spec.
14:31:58 ack
14:32:10 mlavalle, amotoki: ack
14:32:42 I am feeling okay with this rfe.
14:33:51 yeah, we will also have to read at least the nova spec
14:34:06 haleyb: what are your thoughts?
14:34:30 i'm +1 on it, just trying to read through the spec too
14:34:45 complicated with all the dependencies
14:35:36 ok, I think we have the votes necessary to approve this RFE today. Thanks fnordahl and dmitriis for this proposal. We'll see you in gerrit in the spec conversation
14:35:46 fnordahl, dmitriis: the subject of the rfe says Off-path SmartNIC Port Binding, but can we add "with OVN"?
14:36:02 sure
14:36:28 done
14:36:36 thank you all for taking the time to review and discuss it with us
14:36:38 dmitriis: thanks. it highlights the scope more :)
14:37:24 Yes, thanks a lot for considering it. The input is much appreciated as there are many pieces to this effort.
14:37:45 good luck herding all these cats!
14:38:04 👍
14:38:11 * mlavalle shudders again
14:38:22 :) I might just add professional cat herder to my CV after this
14:38:52 ok, next up for discussion is https://bugs.launchpad.net/neutron/+bug/1933222
14:40:02 I have a couple of questions (that I'll add to the LP bug)
14:40:23 1) shouldn't it be called metadata aggregation? this is what he is proposing
14:40:37 2) what is the resource (memory) saving we gain?
14:40:58 I saw that a haproxy is using around 2MB
14:41:17 we can have tens of them in one host without any problem
14:41:32 that's all from my side
14:42:36 he also raises reliability concerns in his rfe
14:43:13 RPC chattiness is another concern as I see it
14:43:50 if not the main one
14:44:54 because if I'm not wrong, what he is proposing is NOT having a single metadata agent in one single place
14:45:11 but one agent with one haproxy per host, right?
14:45:32 correct.... there will be an ovs bridge in each compute with flows properly set up
14:45:33 (from c#3)
14:45:57 so I don't see too much benefit in this (at least in memory or CPU usage)
14:46:06 is there a spec? i guess i would like to see a diagram better describing what this META_IP range is and how the mapping is going to work
14:46:09 no
14:47:16 so there is a meta agent and a meta proxy now; the RFE is going to replace the meta agent only, IIUC
14:47:41 same understanding
14:47:45 I think this proposal is in line with the distributed dhcp ( https://bugs.launchpad.net/neutron/+bug/1900934 )
14:48:20 to have as few agents as possible, and that is the ovs-agent
14:48:21 yeah, it is really an overall effort on yulong's part to distribute a lot of the functions
14:49:45 ok, this is what I thought initially, but I didn't read that in the RFE
14:50:12 it is part of a broader picture
14:50:55 my understanding is the same as yours. it tries to replace the metadata agent with a per-host agent-like feature. I am not sure who plays the role of the metadata agent, i.e. who launches the local haproxy.
14:51:04 but perhaps it can be covered by a spec.
14:51:09 Hi guys,
14:51:29 hi, we are discussing https://bugs.launchpad.net/neutron/+bug/1933222
14:51:41 amotoki, it is ovs-agent
14:52:04 The datapath is VM -> br-int -> br-meta -> tap-meta -> host haproxy.
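[editor's note] The two datapaths contrasted in this discussion — the legacy router-namespace path versus the proposed per-host path — can be modeled with a toy sketch. The hop labels come from the log itself; the helper function and the "nova-api" terminal hop are illustrative assumptions, and nothing here talks to real OVS:

```python
# Toy model of the two metadata datapaths discussed above.
# Hop names mirror the log; this is illustration only, not Neutron code.

# Legacy path: the request may leave the compute host to reach the
# router namespace (and the metadata agent behind it).
LEGACY_PATH = ["VM", "router namespace", "haproxy", "metadata agent", "nova-api"]

# Proposed path: every hop stays on the compute host where the VM runs.
DISTRIBUTED_PATH = ["VM", "br-int", "br-meta", "tap-meta", "host haproxy"]

def shared_failure_points(path):
    """Hops that, if down, affect VMs on many hosts (per the log:
    'any of these points down will have effect on all hosts')."""
    shared = {"router namespace", "haproxy", "metadata agent"}
    return [hop for hop in path if hop in shared]

print(shared_failure_points(LEGACY_PATH))       # ['router namespace', 'haproxy', 'metadata agent']
print(shared_failure_points(DISTRIBUTED_PATH))  # []
```

This captures the reliability argument in the RFE: the distributed design removes the centrally shared hops, trading them for a haproxy (~2MB each, per the log) on every compute host.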
14:52:24 and all that path is in the same host
14:52:37 Yep
14:52:46 so you want the OVS agent to control the metadata proxy (I didn't read that in the RFE)
14:53:07 so may it result in more time needed to handle each new port?
14:53:08 ralonsoh, "So, I'd like to request for implementing an agent extension for Neutron openvswitch agent to make the metadata datapath distributed."
14:53:10 I mean overall
14:53:48 currently the load is kind of shared between metadata/dhcp/l3 agents and the ovs agent
14:53:49 obondarev, No, at least for our cloud.
14:54:17 if the ovs agent becomes responsible for all this..
14:54:20 We implemented this about 2 years ago, like the distributed DHCP extension for ovs-agent.
14:55:01 so did you measure port provisioning time?
14:55:45 compared to the metadata agent, this is more reliable and fast.
14:55:46 if you did, it would be useful data in the rfe / spec
14:56:03 The original metadata work procedure is:
14:56:41 VM -> router namespace -> haproxy -> metadata agent
14:57:15 with node hops in the middle
14:57:16 #link https://review.opendev.org/c/openstack/neutron/+/633871
14:58:10 This is the fix for the race condition between VM booting and the L3 agent processing the router ( creating "router namespace + haproxy" ).
14:59:23 one question: that feature will be independent of the L3 deployment (legacy, DVR, HA); now you don't need the router
14:59:26 right?
14:59:52 "router namespace -> haproxy -> metadata agent": any of these points going down will have an effect on VMs on all hosts.
15:00:16 but with your feature that won't be needed
15:00:19 right?
15:00:24 Yes
15:00:28 perfect
15:00:30 ok, we are at the top of the hour. We will start the next meeting, which will take place in two weeks, with this RFE
15:00:47 Have a nice weekend
15:00:51 you too
15:00:54 OK
15:00:56 this proposal replaces the existing metadata-agent, along with the l3-agent and dhcp-agent, with a per-host metadata agent as part of the ovs-agent.
15:00:57 #endmeeting