14:01:29 #startmeeting neutron_drivers
14:01:29 Meeting started Fri Jun 28 14:01:29 2024 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:29 The meeting name has been set to 'neutron_drivers'
14:01:46 hello
14:02:05 \o
14:02:22 o/
14:02:41 o/
14:03:08 i guess that's quorum at 5
14:03:45 we have three items on the agenda. i did see an RFE bug come in yesterday but have not looked yet; if we have time we can do that as well
14:04:03 #link https://bugs.launchpad.net/neutron/+bug/2070376
14:04:10 ralonsoh: yours is first
14:04:18 thanks, in one shot, to speed up
14:04:21 Hi, I would like to talk about https://bugs.launchpad.net/neutron/+bug/2070376
14:04:21 This is related to the community goal "eventlet-deprecation"
14:04:21 I started investigating how the DHCP agent works and I found that we have several threads running at the same time:
14:04:21 1 for the periodic resyncs, 1 for reading the RPC messages, 1 to notify the server about the port reservations,
14:04:22 and multiple threads to process the RPC events and execute the resource updates
14:04:24 There we have several problems: if we switch to a preemptive threading model, we no longer have the "protection" of the cooperative threads and we'll need to add execution locks
14:04:27 But my main concern with these resource update threads is that we don't really gain any speed by having more than one thread
14:04:30 In https://review.opendev.org/c/openstack/neutron/+/626830 we implemented a way to process the ports using a priority queue
14:04:33 But along with this patch we introduced multithreaded processing, which doesn't add any speed gain
14:04:36 So my proposal is to keep the priority queue (which is of course valuable and needed) but remove the multithreading for the event processing
14:04:39 That will (1) not reduce the processing speed and (2) make the event processing more robust when changing to kernel threads
14:05:52 (I have a very simple patch testing that: https://review.opendev.org/c/openstack/neutron/+/922719/)
14:06:24 and this removes the need for locking as well?
14:06:42 oh, you said that
14:06:48 big copy/paste
14:06:49 more or less, because we have the port reservations notification
14:07:06 but at least the single thread processing the events will be unique
14:08:03 i guess i always thought the multiple threads helped when configuring, like creating namespaces, when lots were involved, but you didn't see that?
14:08:39 we can use the rally test to validate that
14:09:03 but I don't see any speed gain with multiple threads
14:09:12 rally CI, I mean
14:09:20 these multiple threads could be most useful when e.g. the agent was restarted
14:09:29 but how?
14:09:30 and it has to go over many networks and ports
14:09:44 there won't be multiple threads working at the same time
14:09:47 this is Python
14:10:01 right
14:10:17 I was more referring to what haleyb wrote
14:10:22 to improve that (and I think that was commented on before) we would need a multiprocess DHCP agent
14:10:54 right, i think it was always restarts when there were issues, like an hour's wait, although the l3-agent was always worse
14:10:57 but for that we need locks
14:11:20 which is ralonsoh's point
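(For context, a minimal sketch of the single-threaded, priority-queue-driven event processing ralonsoh proposes: one worker drains a shared priority queue, so resource updates are serialized and need no locks even under preemptive threading. All class and constant names below are invented for illustration; the real code lives in the DHCP agent and the priority queue added in https://review.opendev.org/c/openstack/neutron/+/626830.)

```python
import itertools
import queue
import threading

# Lower number == higher priority: a full resync beats an individual
# port update, mirroring the ordering idea of the priority-queue patch.
PRIORITY_RESYNC = 0
PRIORITY_PORT_UPDATE = 10

# Tie-breaker so events with equal priority keep FIFO order and the
# queue never has to compare the (uncomparable) payload objects.
_order = itertools.count()


class SingleThreadEventProcessor:
    """One worker drains the queue, so updates are serialized and the
    per-resource state needs no locking, even with kernel threads."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add(self, priority, resource, handler):
        self._queue.put((priority, next(_order), resource, handler))

    def _run(self):
        while True:
            _prio, _seq, resource, handler = self._queue.get()
            handler(resource)


# Usage: lower-numbered events jump ahead of anything still queued.
processor = SingleThreadEventProcessor()
processor.add(PRIORITY_PORT_UPDATE, 'port-1', lambda r: print('update', r))
processor.add(PRIORITY_RESYNC, 'net-1', lambda r: print('resync', r))
```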
14:11:41 and, btw, we use ctypes.PyDLL
14:12:04 # NOTE(ralonsoh): from https://docs.python.org/3.6/library/
14:12:04 # ctypes.html#ctypes.PyDLL: "Instances of this class behave like CDLL
14:12:04 # instances, except that the Python GIL is not released during the
14:12:04 # function call, and after the function execution the Python error
14:12:04 # flag is checked."
14:12:24 so the GIL will be attached to this thread
14:12:54 late o/
14:13:21 I can, of course, do some load testing with multiple VMs on a node and restart the DHCP agent
14:13:37 with different DHCP_PROCESS_GREENLET_MIN/DHCP_PROCESS_GREENLET_MAX values
14:14:02 that would be a good test IMHO
14:14:11 agree
14:14:26 perfect, I'll spawn a single compute node with tons of RAM and try to spawn as many VMs as possible
14:14:37 and then restart the agent with different thread values
14:14:43 I'll update the LP bug
14:14:52 I don't think you really need VMs for that
14:15:03 probably creating networks and ports would be enough
14:15:05 just ports in the network
14:15:07 right
14:15:22 maybe https://github.com/slawqo/neutron-heater can help with that too
14:15:33 ahh yes
14:16:03 so let's wait for my feedback on this, but please consider the problem we have ahead with the eventlet deprecation
14:17:39 (that's all from my side, thanks a lot)
14:17:51 thanks ralonsoh
14:18:13 yes, thanks ralonsoh
14:18:47 ralonsoh: thanks, will keep a lookout on the patches
14:19:55 and i guess we don't need to vote as it's not an RFE, but i agree with doing this work
14:20:17 we've been having many problems with eventlet, especially when running neutron-api with uwsgi, so I'm looking forward to this
14:21:26 slaweq: yours is next
14:21:30 #link https://bugs.launchpad.net/neutron/+bug/2060916
14:22:13 thx
14:22:26 I recently wanted to finally start working on this
14:22:50 it came up when we introduced the 'service' role policies
14:23:25 as it seems that with those new policies trusted_vif can't be set through the 'binding_profile' attribute
14:24:07 so this is a pretty small and easy RFE: a new API extension would be proposed that adds a new attribute to the port
14:24:37 this field would then be set by neutron in the binding_profile to be sent e.g. to nova
14:24:39 as it is now
14:24:55 so that other components would not require changes
14:25:20 and binding_profile would be used (more) as it should be used, i.e. for machine-to-machine communication
14:25:34 that's all from me
14:25:50 +1 to decoupling Neutron configuration parameters written in port.binding_profile, as done before with others
14:26:30 +1, sounds reasonable
14:26:35 +1
14:26:42 +1
14:27:11 +1 from me
14:28:07 thank you, so I assume that the RFE is approved and I can start work on it now, right?
14:28:22 yes
14:28:34 fire away
14:28:40 yes, i will mark it approved, don't think you need a spec as the bug is pretty clear
14:28:41 thank you, that's all from me then :)
14:29:14 haleyb: exactly, that's why I didn't propose any spec until now, as I was hoping it would not be needed :)
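(A rough sketch of what the approved port attribute could look like, following the usual neutron-lib API-definition shape; the extension alias, attribute name, and policy details below are guesses for illustration, not the final design. Per the discussion, Neutron would keep copying the value into binding:profile when binding the port, so consumers like Nova need no changes.)

```python
# Hypothetical neutron-lib style API definition exposing the trusted
# VIF flag as a real port attribute instead of a binding:profile entry.
# Every name and default here is a placeholder, not the final design.
from neutron_lib.api import converters

ALIAS = 'port-trusted-vif'          # invented extension alias
NAME = 'Port trusted VIF extension'
DESCRIPTION = 'Expose the trusted VIF flag as a port attribute'
UPDATED_TIMESTAMP = '2024-06-28T00:00:00-00:00'

TRUSTED = 'trusted'

RESOURCE_ATTRIBUTE_MAP = {
    'ports': {
        TRUSTED: {
            'allow_post': True,
            'allow_put': True,
            'convert_to': converters.convert_to_boolean_if_not_none,
            'default': None,
            # enforce_policy lets a privileged/'service' role set the
            # flag while regular users can only read it back.
            'enforce_policy': True,
            'is_visible': True,
        },
    },
}

SUB_RESOURCE_ATTRIBUTE_MAP = {}
ACTION_MAP = {}
REQUIRED_EXTENSIONS = []
OPTIONAL_EXTENSIONS = []
```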
14:29:51 the next one was added by me (and mlavalle :)
14:29:56 #link https://bugs.launchpad.net/neutron/+bug/2067183
14:30:05 #link https://review.opendev.org/c/openstack/neutron/+/920459
14:31:16 I added it because we have broken things when tweaking dns_domain in the past
14:31:24 again DNS :)
14:31:45 we have gotten this request and even implemented it in the past (Assaf implemented it) and then we reversed it
14:31:46 we broke it so many times that I can't even count them :P
14:32:33 so I think there is a group of users whose use case we are not properly covering
14:33:01 so in https://review.opendev.org/c/openstack/neutron/+/571546 we were directly reading the network dns_domain value in the DHCP agent
14:33:19 and in your proposal you are inheriting this value from the network
14:33:19 while at the same time we are trying to preserve the current behavior, which was specified here https://specs.openstack.org/openstack/neutron-specs/specs/liberty/internal-dns-resolution.html
14:33:25 it's almost the same, right?
14:33:33 it's not my proposal
14:33:46 it's Jay Jahns' proposal
14:34:10 I just thought, while looking at the patch, that we are not addressing a use case
14:35:06 so why don't we make this optional through an extension? The code change is mostly in an ml2 extension: https://review.opendev.org/c/openstack/neutron/+/920459
14:35:35 so why not create a new extension which allows users to have this new behavior?
14:35:57 that won't break current deployments and will allow this network dns inheritance
14:36:06 +1 to this idea
14:36:08 yeap
14:36:18 my (minor) issue with that is that we already have so many dns integration extensions that it may not be easy for users to know which one they should use
14:36:20 like there wouldn't be enough dns extensions already :-/
14:36:29 correct...
14:36:37 and you cannot stack them
14:36:40 and they inherit one from the other
14:36:47 maybe just a bit more descriptive name...
14:37:00 yes, but we have users who seem to need a new behavior
14:37:42 mlavalle: this change (as is) could break things for users not expecting it, i'd guess?
14:37:50 and it keeps coming back at us
14:38:01 agree about the number of DNS related extensions, and the problems configuring them (some of them are incompatible)
14:38:06 haleyb: yes, I think so
14:38:07 but that could be documented
14:38:31 another problem with having two such different extensions is testing them in the CI
14:39:09 do we now have jobs with DNS but without designate?
14:39:17 maybe we should look at it from the other perspective and e.g. propose a new API extension which would fit this 'new' use case?
14:39:43 that's exactly what I'm saying slaweq
14:40:39 mlavalle: so you are talking about an API extension now? I thought that you wanted another ml2 plugin extension for that
14:41:19 lajoskatona: I thought that there are (or were) some tests like that in neutron_tempest_plugin and they were run in every one of our jobs there
14:41:29 slaweq: I meant both. A new API extension that is implemented by an ml2 extension
14:41:29 but maybe I'm wrong and we don't have them anymore
14:42:05 I would be fine with a new API extension for sure
14:43:14 regarding a new ml2 extension related to dns - ok, but maybe we could somehow refactor what we have now and add this new functionality to the existing one? But this can probably also be done as a separate task
14:43:25 slaweq: we have these extensions in zuul for DNS: dns-domain-ports, dns-integration, dns-integration-domain-keywords, so it seems we have some tests
14:43:34 so an API extension to add something to the network?
14:43:51 that could be an option, to add a new field to the network
14:44:14 so this behaviour will apply not globally but per network
14:45:12 ++
14:45:13 and this could be implemented, most probably, in the current dns plugin extensions
14:45:23 exactly
14:45:35 ralonsoh++ for that
14:45:55 that would IMO be even better, if we could do it in the existing ml2 extension(s)
14:46:07 agree
14:47:50 +1 for a new field on the network
14:47:57 +1 to this network API DNS extension
14:48:14 +1
14:48:41 +1
14:48:54 +1
14:49:04 +1
14:49:15 mlavalle: can you write up ^^ and put it in the bug? you are better at dns wording than i am :)
14:49:31 yes, I'll take care of it haleyb
14:49:55 and I'll help Jan with the implementation
14:50:49 mlavalle: great, thanks
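(To make the agreed direction concrete: a minimal sketch of the opt-in, per-network inheritance, assuming an invented attribute name. The real field would be added by the new network API extension and honored by the existing dns ML2 extension rather than by a standalone helper like this.)

```python
# Hypothetical per-network opt-in for dns_domain inheritance; the
# attribute name below is a placeholder, not an agreed API name.
DNS_DOMAIN_INHERIT = 'dns_domain_inherit'


def effective_dns_domain(port, network):
    """Return the dns_domain a port's DNS records should use."""
    # A per-port dns_domain always wins, preserving today's behavior
    # from the Liberty internal-dns-resolution spec.
    if port.get('dns_domain'):
        return port['dns_domain']
    # Inherit only when the network explicitly opted in, so existing
    # deployments see no behavior change from the new extension.
    if network.get(DNS_DOMAIN_INHERIT):
        return network.get('dns_domain')
    return None
```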
14:52:05 i'm not sure we have time to talk about the RFE liushy filed yesterday as it has not been triaged
14:52:43 #link https://bugs.launchpad.net/neutron/+bug/2071323
14:52:52 in case anyone is wondering
14:53:58 but reading it now, it looks like the metering agent did something like that
14:54:45 ovs can send sflow data to some monitoring tool IIRC
14:54:57 wouldn't that be enough?
14:55:11 yes, ovs can do that
14:55:19 I've tested it
14:55:33 for the SG rules accept/deny statistics we have SG logging - maybe that is enough
14:55:45 thx mlavalle for the confirmation
14:56:09 I am not sure what data the neutron agents should collect according to this RFE
14:56:28 I think this would require a more detailed description IMO
14:56:36 I think he is thinking about the OVS agent, but I'm just guessing
14:56:51 yes, probably
14:57:13 slaweq: right, there are some pieces in place, and i'm not sure either, but agree it is probably OVS related based on their deployments
14:57:15 but this agent can already be busy
14:58:53 can we request more info or ask him to participate in this meeting?
14:59:15 I will put a comment in there asking, and yes, it would be better if he was in the meeting
14:59:53 +1
15:00:20 ++
15:00:29 +1
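(For reference on the sFlow point: Open vSwitch exports sFlow per bridge via the standard ovs-vsctl recipe, shown here driven from Python only to keep one language in these notes. The collector address, interface, and sampling values are example placeholders.)

```python
import subprocess

# Standard Open vSwitch sFlow setup: create an sflow record pointing
# at an external collector and attach it to br-int.
subprocess.run(
    [
        'ovs-vsctl',
        '--', '--id=@sflow', 'create', 'sflow',
        'agent=eth0',                # interface providing the agent IP
        'target="192.0.2.10:6343"',  # collector host:port (example)
        'header=128',                # bytes kept from each sampled packet
        'sampling=64',               # sample 1 of every 64 packets
        'polling=10',                # counter polling interval, seconds
        '--', 'set', 'bridge', 'br-int', 'sflow=@sflow',
    ],
    check=True,
)
```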
15:00:47 that said, with it being summer, I will be out the next two Fridays (US holiday-ish, vacation), and again a couple of weeks after
15:01:05 I will be off this coming week
15:01:15 lucky you!
15:01:32 enjoy it :-)
15:01:41 but if liushy can attend on July 12th maybe someone else can lead? assuming quorum
15:01:47 enjoy
15:01:53 have a nice vacation mlavalle!
15:01:56 we can lead the meeting, for sure
15:02:04 thanks
15:02:22 ok, i will ask; i know it's hard with the timezone
15:02:35 ralonsoh can lead the weekly meeting and I can lead the drivers one, or vice versa
15:02:49 whichever he prefers
15:02:59 perfect for me, I can lead the weekly meeting next week
15:03:18 i will be here for next week's neutron meeting, just not drivers
15:03:27 ah perfect
15:03:53 so if there is an RFE you can run drivers, up to you based on that
15:04:10 can I get someone to push this over the edge, please: https://review.opendev.org/c/openstack/neutron/+/918151
15:04:12 we will see if there will be quorum
15:04:15 ?
15:04:17 we have no way to share schedules with each other really
15:04:36 anyways, i will end this meeting, thanks for attending and for the discussion!
15:04:38 we'll check next Friday
15:04:39 #endmeeting