14:00:18 #startmeeting neutron_drivers
14:00:19 Meeting started Fri Jan 8 14:00:18 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:23 The meeting name has been set to 'neutron_drivers'
14:00:29 o/
14:00:41 \o
14:00:42 hi
14:00:44 hi
14:00:45 o/
14:00:53 hi
14:01:07 Hi
14:01:20 haleyb: njohnston: yamamoto: ping
14:01:32 hi, didn't see the reminder
14:01:49 hello everyone on the first drivers meeting in 2021 :)
14:01:56 first of all, Happy New Year!
14:02:04 Happy New Year
14:02:18 o/
14:02:22 happy new year
14:02:31 hny!
14:02:43 and now let's start, as we have a couple of topics to discuss
14:02:46 #topic RFEs
14:02:55 first one:
14:02:56 https://bugs.launchpad.net/neutron/+bug/1909100
14:02:59 Launchpad bug 1909100 in neutron "[RFE] add new vnic type "cyborg"" [Wishlist,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:03:28 I think xinranwang could explain this RFE a bit
14:03:47 sure
14:04:26 We hope to add a new vnic type for ports to indicate that the port has a backend managed by cyborg
14:04:48 so that nova can trigger the interaction with cyborg according to this vnic type
14:06:04 based on the last comments by gibi and ralonsoh in the LP, I'm not sure we really need to add such a new vnic type
14:06:16 and amotoki's comment
14:06:20 right
14:06:46 this port is almost a "direct" port
14:06:54 actually this is a PCI device
14:07:00 if there is no new vnic_type then nova will either ignore the vnic_type for these ports or neutron should enforce vnic_type=direct
14:07:07 or direct-physical
14:07:25 ignoring the incoming vnic_type seems hackish
14:07:32 agree
14:07:36 yeah, it should behave like this.
14:08:17 vnic types are used to determine how neutron handles the port. my concern is what happens if two or more vnic types are backed by cyborg.
14:08:51 maybe we should limit it so it can't be "normal" in neutron? after all, neutron knows what the network should look like.
14:09:30 for now, direct is supported, and for the future direct-physical is a candidate.
14:10:02 so does it mean you need more vnic types for cyborg-backed ports?
14:10:43 the intent was to have a separate vnic type dedicated to devices that are managed by cyborg, and to not support the device-profile with other vnic types
14:11:30 but this is not needed on the nova side
14:11:34 on the nova side we wanted a clear way to differentiate between hardware-offloaded ovs and ovs with cyborg ports, or similar for the ml2/sriov nic agent
14:11:41 and neutron can limit the device-profile to direct ports
14:12:07 sean, we had to use the sriov agent
14:12:11 this can be done by reading the port definition, with the "device_profile" extension
14:12:17 then we can't have colocation of hardware-offloaded ovs and cyborg on the same compute
14:12:20 right?
14:12:24 we should limit it so that only the new vnic type can have device-profile filled, if we have a new vnic type.
14:12:36 we don't want to assume that any existing ml2 driver that supports ovs will work with cyborg
14:13:08 * not "supports ovs" - "supports vnic type direct"
14:13:21 sean, sure, only the sriov agent is working; ovs-managed VFs are not supported
14:13:51 right, but ml2/ovs supports direct, as does ovn for hardware-offloaded ovs
14:14:14 we did not want those ml2 drivers to bind the port in that case, correct
14:14:17 OVN direct is for external ports (sriov)
14:14:51 ralonsoh: that will take the hardware-offloaded ovs codepath in os-vif
14:15:03 how does ml2/ovs differentiate it from normal sriov direct?
14:15:12 depending on the vif_type that is returned
14:15:17 separate topic i guess
14:15:53 so if there is no new vnic type, neutron should limit it so the backend is not set if it belongs to ovs
14:16:10 based on vif_type
14:16:47 neutron folks?
14:16:48 yes, so there are 2 things: nova would have to treat the port as a normal sriov port
14:17:10 e.g. not attempt to add it to ovs, but if it's bound by the ml2/ovs driver then we would try to add it to ovs
14:17:26 the only thing that would prevent that today would be the check for the switchdev api
14:17:42 presumably the cyborg VF would not have that enabled, but they could in theory
14:17:44 so IIUC the new vnic type will be mostly useful for nova, right? so nova will not need to do various "if's"
14:18:19 and will know that if vnic_type=='cyborg' then device_profile is needed also
14:18:20 slaweq: it's also useful for neutron, so existing ml2 drivers don't have to be modified to filter out cyborg ports if they don't support them
14:18:22 is that correct?
14:18:36 yes
14:18:38 sean-k-mooney: right
14:19:12 we could technically make this work without the new vnic type
14:19:28 and without this new type both nova and neutron would need to do something like: if vnic_type=='direct' and device_profile is not None, then "it's a cyborg port"
14:19:29 but we felt being explicit was simpler
14:19:31 or something like that
14:19:35 correct?
14:19:43 yes
14:19:45 what about a new vif?
14:19:55 just like ml2/ovs does it
14:20:13 thx sean-k-mooney for confirmation
14:20:25 yonglihe: sorry, i'm not following, can you restate that again
14:20:55 use the vif to mark the port as 'cyborg backend' instead of the vnic.
14:21:05 vif_type vs vnic_type
14:21:08 you cannot set the vif-type
14:21:13 ok
14:21:16 that is chosen by the driver
14:21:25 we could add a new one i guess
14:21:28 vif type is an output of the binding process
14:21:42 so that's not working
14:21:45 thanks
14:21:48 so, one more question - about amotoki's concern regarding the name of the new vnic_type
14:21:52 so we could use vnic direct with vif-type cyborg
14:22:20 that would allow us to reuse macvtap or direct-physical if we wanted to
14:22:27 can it be something else, to reflect "corresponding functionality rather than who implements the functionality"?
14:22:40 if we assume the 'direct' vnic type with cyborg support, isn't it better to name it direct-cyborg or direct-<....> with a more functional name?
14:23:03 or "accelerator" maybe?
14:23:09 ok for me
14:23:13 if so, if you want to add cyborg support with another vnic type, we can name it XXX-cyborg/accelerator.
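
A minimal sketch of how the functionality-based names the discussion converges on below ("accelerator-direct", "accelerator-direct-physical") could sit next to the existing vnic type constants in neutron-lib's portbindings.py; the new names, and the exact shape of the surrounding constants, are illustrative assumptions rather than the merged change:

    # Illustrative sketch only: new functionality-based vnic types alongside
    # a few of the existing constants in
    # neutron_lib/api/definitions/portbindings.py.  The "accelerator-*" names
    # are assumptions based on this discussion, not the merged code.
    VNIC_DIRECT = 'direct'                    # existing: SR-IOV VF passthrough
    VNIC_DIRECT_PHYSICAL = 'direct-physical'  # existing: full PF passthrough
    VNIC_SMARTNIC = 'smart-nic'               # existing: already used by ironic

    # Hypothetical additions for Cyborg-managed devices, named after the
    # functionality ("accelerator") rather than the project ("cyborg"):
    VNIC_ACCELERATOR_DIRECT = 'accelerator-direct'
    VNIC_ACCELERATOR_DIRECT_PHYSICAL = 'accelerator-direct-physical'

    VNIC_TYPES = [VNIC_DIRECT, VNIC_DIRECT_PHYSICAL, VNIC_SMARTNIC,
                  VNIC_ACCELERATOR_DIRECT, VNIC_ACCELERATOR_DIRECT_PHYSICAL]

With a dedicated type, the "if vnic_type == 'direct' and device_profile is set" special-casing mentioned above would not be needed on either the nova or the neutron side.
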
14:23:24 amotoki: that came up at the ptg and i believe sylvain had a similar concern, basically suggesting to not include the project name
14:23:48 that's a good idea IMO
14:23:48 accelerator-x might be nice
14:23:53 slaweq: accelerator and device-profile were both suggested before
14:24:03 :)
14:24:04 i have no strong feeling either way
14:24:19 accelerator-direct, accelerator-direct-phy
14:24:21 let's not use the project name
14:24:27 agree
14:24:28 yonglihe: +1
14:24:39 +1 for yonglihe's idea
14:24:41 that is fine for me
14:24:43 yep, accelerator- sounds good to me
14:24:44 +1
14:24:55 nice
14:24:57 yonglihe: +1
14:25:04 +1
14:25:12 +1
14:25:14 I'll amend the patch today
14:25:24 haleyb: njohnston: any thoughts?
14:25:56 +1 from me
14:26:12 ralonsoh, thanks, i'm gonna verify that patch in 3 days
14:26:30 I will mark that rfe as approved
14:26:31 by the way, we are avoiding the existing smart-nic vnic type because that is used by ironic already.
14:26:33 https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/portbindings.py#L119
14:26:35 with a note about the naming change
14:27:13 next RFE now
14:27:15 slaweq ralonsoh thanks
14:27:15 https://bugs.launchpad.net/neutron/+bug/1900934
14:27:19 Launchpad bug 1900934 in neutron "[RFE][DHCP][OVS] flow based DHCP" [Wishlist,New] - Assigned to LIU Yulong (dragon889)
14:27:23 thank You xinranwang for the proposal
14:27:54 regarding LP 1900934 - this was already discussed a few times
14:28:10 liuyulong already proposed a spec: https://review.opendev.org/c/openstack/neutron-specs/+/768588
14:28:20 but the rfe is still not decided
14:28:57 so I think we should decide if we want to go with this solution, and continue the discussion about details in the spec review, or if we don't want it in neutron at all
14:29:03 slaweq: i assume this is doing dhcp via openflow rules in ml2/ovs, similar to ovn?
14:29:11 sean-k-mooney: yes
14:29:15 exactly
14:29:43 cool, that would be nice, especially for routed networks
14:29:56 since each l2 agent could provide dhcp for the segment
14:30:05 I also have to say that my employer might be interested in this
14:30:26 assuming it was done as an l2 agent extension rather than in the dhcp agent
14:30:32 I'm ok with the RFE, just some comments in the spec (we can move the discussion there)
14:30:43 sean-k-mooney: that is the original proposal IIRC
14:30:46 just wondering about the gaps between the DHCP agent and OVS DHCP
14:31:29 ralonsoh: one of the gaps will for sure be that there will be no dns name resolving in that case
14:31:32 only dhcp
14:31:35 yeah
14:31:56 how about extra dhcp options?
14:32:01 also, I'm not sure if all extra-dhcp-options will work
14:32:05 amotoki++
14:32:13 probably some of them may not work, I'm not sure
14:32:25 but IMHO that
14:32:33 anyway it can be covered by documentation on the feature differences between flow-based dhcp and the dhcp agent
14:32:36 that is fine as long as it will be documented
14:32:51 amotoki: You are faster than me again :P
14:32:59 and will serve a lot of "plain vanilla" dhcp cases
14:33:28 I think it is better to call it "flow-based dhcp agent" rather than distributed dhcp agent. slaweq's rfe covers a distributed agent in some way too.
14:34:50 amotoki: technically it's not even an "agent" but a dhcp extension
14:35:14 slaweq: correct. I know it is not an agent.
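
As discussed above, the flow-based DHCP proposal would be delivered as an OVS L2 agent extension rather than a new agent. A rough, hypothetical skeleton of such an extension is sketched below; only the L2AgentExtension base class comes from neutron-lib, while the class name, the hook bodies and the flow-programming idea in the comments are assumptions, not the actual spec or implementation:

    # Hypothetical sketch of a flow-based DHCP extension for the OVS agent.
    # The base class is real neutron-lib API; everything else is illustrative.
    from neutron_lib.agent import l2_extension


    class FlowBasedDHCPExtension(l2_extension.L2AgentExtension):
        """Answer DHCP requests with OpenFlow rules instead of dnsmasq."""

        def initialize(self, connection, driver_type):
            # Called once when the OVS agent loads the extension.
            self.int_br = None  # would be obtained through the agent API

        def consume_api(self, agent_api):
            # The agent hands the extension access to its bridges here.
            self.agent_api = agent_api

        def handle_port(self, context, port):
            # For each bound port, install flows that steer DHCP requests
            # (UDP dst port 67) to the agent, which builds the reply from
            # the port's fixed IPs and subnet options.  Purely illustrative.
            pass

        def delete_port(self, context, port):
            # Remove the per-port DHCP flows when the port goes away.
            pass

Such an extension would be enabled through the OVS agent's existing `extensions` config option, the same mechanism used by the qos and log extensions, and the gaps noted above (no DNS resolution, possibly not all extra-dhcp-options) would be documented as feature differences against the dnsmasq-based DHCP agent.
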
14:35:36 I spelled "dhcp AGENT" too many times :(
14:35:43 :)
14:37:08 so let's approve it and move on with the RFE
14:37:20 +1 from me
14:37:36 +1
14:37:37 i'm fine to approve it
14:37:39 +1 from me
14:37:40 mlavalle: that is also my vote - let's approve the rfe and continue the discussion about details in the spec review
14:37:46 so +1
14:38:12 I will mark this rfe as approved
14:38:14 thx
14:38:27 last rfe for today
14:38:28 https://bugs.launchpad.net/neutron/+bug/1910533
14:38:41 Launchpad bug 1910533 in neutron "[RFE] New dhcp agents scheduler - use all agents for each network" [Wishlist,New] - Assigned to Slawek Kaplonski (slaweq)
14:39:05 I proposed that RFE but ralonsoh may have more details about the use case
14:39:12 I can confirm this is a source of problems in some deployments
14:39:16 as he was recently "impacted" by this limitation :)
14:39:52 if you have several leafs in a deployment and networks across those leafs
14:40:26 if you don't specify the correct number of DHCP agents, some leafs won't have a DHCP agent running
14:40:34 and the VMs won't have an IP
14:40:53 ralonsoh: is the broadcast domain separated?
14:40:59 yes
14:41:55 to overcome it we need to use dhcp-relay or deploy dhcp agents per broadcast domain
14:42:14 this request sounds reasonable to me
14:42:18 ralonsoh: so when you add a leaf/site but don't increase the agents it doesn't get an agent?
14:42:35 exactly, this is the problem
14:42:57 ack, thanks
14:43:05 haleyb: right, as the number of agents per network isn't related to sites at all
14:43:32 so such a new scheduler could simply be a "workaround" for that problem
14:44:11 ralonsoh: technically you could deploy 1 dhcp instance per network segment
14:44:30 yes, that's a possibility
14:44:37 at least for routed networks
14:44:39 but you need to know where each agent is
14:44:57 you kind of already do
14:45:13 you know its hosts and the segment mappings
14:45:43 but again only in the routed networks case
14:47:02 increasing the dhcp agent count would not guarantee it is on the leaf site, right
14:47:16 it could add another instance to the central site in principle
14:47:20 ok, I was looking for the BZ
14:47:21 https://bugzilla.redhat.com/show_bug.cgi?id=1886622
14:47:33 bugzilla.redhat.com bug 1886622 in openstack-neutron "Floating IP assigned to an instance is not accessible in scale up Multistack env with spine& leaf topology" [High,Assigned] - Assigned to ralonsoh
14:47:35 it has public information about this error
14:48:04 we can assume a deployment knows which network node belongs to which segment
14:48:09 sean-k-mooney: right, that's why we propose to add a scheduler which will schedule a network to all dhcp agents
14:48:19 so really, when adding a new leaf site today, you would have to explicitly add an instance to the new agent deployed at that site
14:48:25 amotoki, yes, that's correct
14:48:27 if we deploy a dhcp agent per segment, scheduling a network to all dhcp agents will be a workaround
14:49:10 amotoki: you could do both. all agents if it's not a routed network and per segment if it is. but per segment is just an optimisation really
14:49:53 sean-k-mooney: yes, that's right
14:50:08 TBH Liu's proposal about distributed dhcp would solve this use case also
14:50:45 right (for OVS)
14:50:53 ralonsoh: yep
14:51:37 but it would be desirable to have this dhcp scheduler to avoid the need to set the exact number of DHCP agents needed
14:52:20 yeah, agree. deployments can continue to use the DHCP agent they are familiar with too.
14:52:43 I am not sure we need a new dhcp agent scheduler for this.
14:52:46 Another option is to modify the current dhcp agent scheduler to accept an option like dhcp_agents_per_network=all
14:53:02 agree
14:53:19 amotoki: that may be a good idea
14:54:18 do you prefer to explore this idea? the change in the code will be smaller
14:54:37 ralonsoh: I can
14:54:51 and we will get back to that in the next weeks
14:55:21 anyway, I am okay with the basic idea to assign a network to all agents.
14:55:38 +1 to this idea
14:55:56 so do You want to vote on approving the rfe as an idea today, or wait more for some PoC code?
14:56:16 (I will not vote as I proposed the rfe)
14:56:17 +1
14:56:24 I can wait for a PoC
14:56:35 i am okay with either way
14:56:45 but we can surely approve the RFE
14:56:45 +1 from me
14:57:03 if the PoC is not satisfactory, we can scrap it
14:57:12 I think we can trust slaweq
14:57:17 thx :)
14:57:17 can't we?
14:57:18 maybe...
14:57:20 hehehehe
14:57:23 :P
14:57:28 hehe :)
14:57:35 I don't trust myself so I'm not sure ;)
14:57:41 but thank You
14:57:48 I will mark this one as approved also
14:57:56 this was a really effective meeting
14:58:00 3 rfes approved
14:58:00 sure
14:58:06 thank You
14:58:14 o/
14:58:16 I think we can call it a meeting now
14:58:20 bye!
14:58:29 have a great weekend and see You online
14:58:29 o/
14:58:31 o/
14:58:33 #endmeeting
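
For reference, a minimal sketch of the scheduling change discussed in the last RFE, written against a made-up helper rather than neutron's real scheduler classes (whose hooks and signatures differ); it only illustrates the idea that, instead of sampling a fixed number of agents per network, the scheduler would bind the network to every alive candidate agent:

    # Illustrative only: not neutron's actual scheduler interface.
    import random


    def pick_agents(candidate_agents, dhcp_agents_per_network):
        """Today's behaviour: host the network on a fixed number of agents."""
        return random.sample(candidate_agents,
                             min(dhcp_agents_per_network, len(candidate_agents)))


    def pick_all_agents(candidate_agents):
        """Proposed behaviour (e.g. dhcp_agents_per_network=all): every alive
        candidate agent hosts the network, so each leaf / broadcast domain
        that runs a DHCP agent can serve its local VMs."""
        return list(candidate_agents)

Whether this lands as a new scheduler driver or, as amotoki suggested, as a special value of the existing dhcp_agents_per_network option, the effect for the leaf-and-spine case above is the same: operators no longer have to keep the agent count in sync with the number of sites, because an agent added with a new leaf would also end up hosting the networks it can serve.
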