14:01:29 <haleyb> #startmeeting neutron_drivers
14:01:29 <opendevmeet> Meeting started Fri Jun 28 14:01:29 2024 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:29 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:29 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:01:46 <ralonsoh> hello
14:02:05 <mlavalle> \o
14:02:22 <slaweq> o/
14:02:41 <lajoskatona> o/
14:03:08 <haleyb> i guess that's quorum at 5
14:03:45 <haleyb> we had three items on the agenda. i did see an RFE bug come in yesterday but have not looked yet, if we have time we can do that as well
14:04:03 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2070376
14:04:10 <haleyb> ralonsoh: yours is first
14:04:18 <ralonsoh> thanks, in one shot, to speed up
14:04:21 <ralonsoh> Hi, I would like to talk about https://bugs.launchpad.net/neutron/+bug/2070376
14:04:21 <ralonsoh> This is related to the community goal "eventlet-deprecation"
14:04:21 <ralonsoh> I started the investigation of how the DHCP agent works and I found that we have several threads running at the same time.
14:04:21 <ralonsoh> 1 for the periodic resyncs, 1 for reading the RPC messages, 1 to notify the server about the port reservations
14:04:22 <ralonsoh> and multiple threads to process the RPC events and execute the resource updates
14:04:24 <ralonsoh> There we have several problems: if we switch to a preemptive thread module, we will no longer have the "protection" of the collaborative threads and we'll need to add execution locks
14:04:27 <ralonsoh> But my main concern with these resource update threads is that we don't really gain any speed having more than one thread
14:04:30 <ralonsoh> In https://review.opendev.org/c/openstack/neutron/+/626830 we implemented a way to process the ports using a priority queue
14:04:33 <ralonsoh> But along with this patch we introduced the multithreaded processing, which doesn't add any speed gain
14:04:36 <ralonsoh> So my proposal is to keep the priority queue (which is of course valuable and needed) but remove the multithreading for the event processing
14:04:39 <ralonsoh> That will (1) not reduce the processing speed and (2) make the event processing more robust when changing to kernel threads
14:05:52 <ralonsoh> (I have a very simple patch testing that: https://review.opendev.org/c/openstack/neutron/+/922719/)
14:06:24 <haleyb> and this removes the need for locking as well?
14:06:42 <haleyb> oh, you said that
14:06:48 <haleyb> big copy/paste
14:06:49 <ralonsoh> more or less, because we have the port reservations notification
14:07:06 <ralonsoh> but at least the single thread processing the events will be unique
14:08:03 <haleyb> i guess i always thought the multiple threads helped when configuring, like creating namespaces, when lots were involved, but you didn't see that?
14:08:39 <ralonsoh> we can use the rally test to validate that
14:09:03 <ralonsoh> but I don't see a speed gain with multiple threads
14:09:12 <ralonsoh> rally CI, I mean
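For context, a minimal sketch of the pattern ralonsoh proposes above: keep the priority ordering of resource updates but drain the queue from a single worker thread, so the event handlers never run concurrently and need no locks. All names here are invented for illustration; this is not the actual agent code from the patches linked above.

```python
import queue
import threading


class ResourceUpdate:
    """Hypothetical update object; the real agent carries more context."""

    def __init__(self, priority, resource_id, payload):
        self.priority = priority
        self.resource_id = resource_id
        self.payload = payload

    def __lt__(self, other):
        # PriorityQueue orders items with "<"; lower number = higher priority.
        return self.priority < other.priority


class SingleThreadProcessor:
    """Keep the priority queue, but process events from one thread only."""

    def __init__(self, handler):
        self._queue = queue.PriorityQueue()
        self._handler = handler
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def add(self, update):
        # Called from the RPC/resync threads; PriorityQueue is thread-safe.
        self._queue.put(update)

    def _run(self):
        while True:
            update = self._queue.get()  # blocks until an update is queued
            self._handler(update)       # only this thread touches resources
```

With a single consumer, the ordering guarantees of the priority queue are preserved and the per-resource state never sees concurrent writers, which is the property that matters once green threads are replaced by preemptive kernel threads.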
14:09:20 <slaweq> these multiple threads could be most useful when e.g. the agent was restarted
14:09:29 <ralonsoh> but how?
14:09:30 <slaweq> and has to go over many networks and ports
14:09:44 <ralonsoh> there won't be multiple threads working at the same time
14:09:47 <ralonsoh> this is python
14:10:01 <slaweq> right
14:10:17 <slaweq> I was more referring to what haleyb wrote
14:10:22 <ralonsoh> to improve that (and I think that was commented before) we would need a multiprocess DHCP agent
14:10:54 <haleyb> right, i think it was always restart when there were issues, like an hour wait, although the l3-agent was always worse
14:10:57 <lajoskatona> but for that we need locks
14:11:20 <mlavalle> which is ralonsoh's point
14:11:41 <ralonsoh> and, btw, we use ctypes.PyDLL
14:12:04 <ralonsoh> # NOTE(ralonsoh): from https://docs.python.org/3.6/library/
14:12:04 <ralonsoh> # ctypes.html#ctypes.PyDLL: "Instances of this class behave like CDLL
14:12:04 <ralonsoh> # instances, except that the Python GIL is not released during the
14:12:04 <ralonsoh> # function call, and after the function execution the Python error
14:12:04 <ralonsoh> # flag is checked."
14:12:24 <ralonsoh> so the GIL will be attached to this thread
14:12:54 <obondarev_> late o/
14:13:21 <ralonsoh> I can, of course, do some load testing with multiple VMs on a node and restarting the DHCP agent
14:13:37 <ralonsoh> with different DHCP_PROCESS_GREENLET_MIN/DHCP_PROCESS_GREENLET_MAX values
14:14:02 <slaweq> that would be a good test IMHO
14:14:11 <mlavalle> agree
14:14:26 <ralonsoh> perfect, I'll spawn a single compute node with tons of RAM and I'll try to spawn as many VMs as possible
14:14:37 <ralonsoh> and then restart the agent with different thread values
14:14:43 <ralonsoh> I'll update the LP bug
14:14:52 <slaweq> I don't think You really need vms for that
14:15:03 <slaweq> probably creating networks and ports would be enough
14:15:05 <ralonsoh> just ports in the network
14:15:07 <ralonsoh> right
14:15:22 <slaweq> maybe https://github.com/slawqo/neutron-heater can help with that too
14:15:33 <ralonsoh> ahh yes
14:16:03 <ralonsoh> so let's wait for my feedback on this, but please consider the problem we have ahead with the eventlet deprecation
14:17:39 <ralonsoh> (that's all from my side, thanks a lot)
14:17:51 <lajoskatona> thanks ralonsoh
14:18:13 <mlavalle> yes, thanks ralonsoh
14:18:47 <haleyb> ralonsoh: thanks, will keep a lookout on the patches
14:19:55 <haleyb> and i guess we don't need to vote as it's not an rfe, but i agree with doing this work
14:20:17 <seba> we've been having many problems with eventlet, especially when running neutron-api with uwsgi, so I'm looking forward to this
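To illustrate the ctypes point ralonsoh quotes above: ctypes.CDLL releases the GIL around each foreign call, while ctypes.PyDLL keeps holding it and checks the Python error flag afterwards, so no other Python thread can run while the C function executes. A tiny, Linux-only example; the library and function are chosen purely for illustration:

```python
import ctypes

# CDLL drops the GIL for the duration of the foreign call ...
libc_nogil = ctypes.CDLL("libc.so.6")
# ... PyDLL keeps the GIL held while the C function runs.
libc_gil = ctypes.PyDLL("libc.so.6")

print(libc_nogil.getpid(), libc_gil.getpid())
```

Under preemptive kernel threads this distinction matters, because a PyDLL call effectively serializes the callers while a CDLL call lets other Python threads proceed.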
14:21:26 <haleyb> slaweq: yours is next
14:21:30 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2060916
14:22:13 <slaweq> thx
14:22:26 <slaweq> I recently wanted to finally start working on this
14:22:50 <slaweq> it came up when we introduced 'service' role policies
14:23:25 <slaweq> as it seems that with those new policies trusted_vif can't be set through the 'binding_profile' attribute
14:24:07 <slaweq> so this is a pretty small and easy RFE to do, where a new api extension would be proposed and it would add a new attribute to the port
14:24:37 <slaweq> this field would then be set by neutron in the binding_profile to be sent e.g. to nova
14:24:39 <slaweq> as it is now
14:24:55 <slaweq> so that other components would not require changes
14:25:20 <slaweq> and binding_profile would be used (more) as it should be used, so for machine-to-machine communication
14:25:34 <slaweq> that's all from me
14:25:50 <ralonsoh> +1 to decoupling Neutron configuration parameters written in port.binding_profile, as done before with others
14:26:30 <obondarev> +1, sounds reasonable
14:26:35 <lajoskatona> +1
14:26:42 <mlavalle> +1
14:27:11 <haleyb> +1 from me
14:28:07 <slaweq> thank You, so I assume that the RFE is approved and I can start work on it now, right?
14:28:22 <mlavalle> yes
14:28:34 <mlavalle> fire away
14:28:40 <haleyb> yes, i will mark it approved, don't think you need a spec as the bug is pretty clear
14:28:41 <slaweq> thank You, that's all from me then :)
14:29:14 <slaweq> haleyb exactly, that's why I didn't propose any spec until now as I was hoping it would not be needed :)
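As a rough illustration of what the approved RFE implies, the sketch below shows the general shape of a neutron-lib API definition that exposes such a flag as a regular port attribute instead of a key buried in binding_profile. The alias, attribute name, and defaults are assumptions for illustration only, not the design slaweq will actually propose.

```python
# Illustrative sketch only; names and defaults are assumptions.
from neutron_lib.api import converters
from neutron_lib import constants

ALIAS = 'port-trusted-vif'        # hypothetical extension alias
NAME = 'Trusted port'
DESCRIPTION = 'Expose the trusted flag as a first-class port attribute.'
UPDATED_TIMESTAMP = '2024-06-28T00:00:00-00:00'

TRUSTED = 'trusted'               # hypothetical attribute name

RESOURCE_ATTRIBUTE_MAP = {
    'ports': {
        TRUSTED: {
            'allow_post': True,
            'allow_put': True,
            'convert_to': converters.convert_to_boolean_if_not_none,
            'default': constants.ATTR_NOT_SPECIFIED,
            'enforce_policy': True,   # restricted to privileged roles via policy
            'is_visible': True,
        }
    }
}
```

The server would then copy this attribute into binding_profile when binding the port, so consumers such as nova keep reading it from the same place as today.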
14:29:51 <haleyb> the next one was added by me (and mlavalle :)
14:29:56 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2067183
14:30:05 <haleyb> #link https://review.opendev.org/c/openstack/neutron/+/920459
14:31:16 <haleyb> I added it because we have broken things when tweaking dns_domain in the past
14:31:24 <slaweq> again DNS :)
14:31:45 <mlavalle> we have gotten this and even implemented it in the past (Assaf implemented it) and then we reversed it
14:31:46 <slaweq> we broke it so many times that I can't even count them :P
14:32:33 <mlavalle> so I think there is a group of users whose use case we are not properly covering
14:33:01 <ralonsoh> so in https://review.opendev.org/c/openstack/neutron/+/571546 we were directly reading the network dns_domain value in the DHCP agent
14:33:19 <ralonsoh> and in your proposal you are inheriting this value from the network
14:33:19 <mlavalle> while at the same time we are trying to preserve the current behavior, which was specified here https://specs.openstack.org/openstack/neutron-specs/specs/liberty/internal-dns-resolution.html
14:33:25 <ralonsoh> it's almost the same, right?
14:33:33 <mlavalle> it's not my proposal
14:33:46 <ralonsoh> Jay Jahns' proposal
14:34:10 <mlavalle> I just thought, while looking at the patch, that we are not addressing a use case
14:35:06 <mlavalle> so why don't we make this optional through an extension? The code change is mostly in an ml2 extension: https://review.opendev.org/c/openstack/neutron/+/920459
14:35:35 <mlavalle> so why not create a new extension which allows users to have this new behavior?
14:35:57 <ralonsoh> that won't break current deployments and will allow this network dns inheritance
14:36:06 <ralonsoh> +1 to this idea
14:36:08 <mlavalle> yeap
14:36:18 <slaweq> my (minor) issue with that is that we already have so many dns integration extensions that it may not be easy for users to know which one they should use
14:36:20 <frickler> like there wouldn't be enough dns extensions already :-/
14:36:29 <ralonsoh> correct...
14:36:37 <frickler> and you cannot stack them
14:36:40 <slaweq> and they inherit one from the other
14:36:47 <obondarev> maybe just a bit more descriptive name..
14:37:00 <mlavalle> yes, but we have users who seem to need a new behavior
14:37:42 <haleyb> mlavalle: this change (as is) could break things for users not expecting it i'd guess?
14:37:50 <mlavalle> and it keeps coming back at us
14:38:01 <ralonsoh> agree with the number of DNS related extensions, and the problems configuring them (some of them are incompatible)
14:38:06 <mlavalle> haleyb: yes, think so
14:38:07 <ralonsoh> but that could be documented
14:38:31 <slaweq> another problem with having such two different extensions is testing them in the CI
14:39:09 <lajoskatona> do we have jobs now with DNS without designate?
14:39:17 <slaweq> maybe we should look at it from the other perspective and e.g. propose a new API extension which would fit this 'new' use case?
14:39:43 <mlavalle> that's exactly what I'm saying slaweq
14:40:39 <slaweq> mlavalle so you are talking about an api extension now? I thought that You wanted to have another ml2 plugin extension for that
14:41:19 <slaweq> lajoskatona I thought that there are (or were) some tests like that in neutron_tempest_plugin and they were run in every job of ours there
14:41:29 <mlavalle> slaweq: I meant both. A new API extension that is implemented by an ml2 extension
14:41:29 <slaweq> but maybe I'm wrong and we don't have them anymore
14:42:05 <slaweq> I would be fine with a new API extension for sure
14:43:14 <slaweq> regarding a new ml2 extension related to dns - ok, but maybe we could somehow refactor what we have now and add this new functionality to the existing one? But this can maybe also be done as a different task
14:43:25 <lajoskatona> slaweq: we have these extensions in zuul for DNS: dns-domain-ports, dns-integration, dns-integration-domain-keywords so it seems we have some tests
14:43:34 <haleyb> so an API extension to add something to the network?
14:43:51 <ralonsoh> that could be an option, to add a new field to the network
14:44:14 <ralonsoh> so this behaviour will apply not globally but per network
14:45:12 <slaweq> ++
14:45:13 <ralonsoh> and this could be implemented, most probably, in the current dns plugin extensions
14:45:23 <mlavalle> exactly
14:45:35 <slaweq> ralonsoh++ for that
14:45:55 <slaweq> that would IMO be even better if we could do it in the existing ml2 extension(s)
14:46:07 <ralonsoh> agree
14:47:50 <lajoskatona> +1 for a new field for the network
14:47:57 <ralonsoh> +1 to this network API DNS extension
14:48:14 <mlavalle> +1
14:48:41 <haleyb> +1
14:48:54 <obondarev> +1
14:49:04 <slaweq> +1
14:49:15 <haleyb> mlavalle: can you write up ^^ and put it in the bug? you are better at dns wording than i am :)
14:49:31 <mlavalle> yes, I'll take care of it haleyb
14:49:55 <mlavalle> and I'll help Jan with the implementation
14:50:49 <haleyb> mlavalle: great, thanks
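A rough sketch of the semantics agreed above, only to make the decision concrete: a new, opt-in field on the network (its real name is still to be defined in mlavalle's write-up; 'dns_domain_inherit' below is an assumption) makes ports on that network inherit the network's dns_domain, while everyone else keeps the behavior from the Liberty internal-dns-resolution spec.

```python
# Illustrative only: field names are assumptions, not the agreed API.
def pick_dns_domain(network, current_behaviour_domain):
    """Return the dns_domain a port on this network should get."""
    if network.get('dns_domain_inherit') and network.get('dns_domain'):
        # New, per-network opt-in path: inherit the network's dns_domain.
        return network['dns_domain']
    # Default path: existing deployments keep today's behaviour untouched.
    return current_behaviour_domain
```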
14:52:05 <haleyb> i'm not sure we have time to talk about the rfe liushy filed yesterday as it has not been triaged
14:52:43 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2071323
14:52:52 <haleyb> in case anyone is wondering
14:53:58 <haleyb> but now reading it, it looks like the metering agent did something like that
14:54:45 <slaweq> ovs can send sflow data to some monitoring tool IIRC
14:54:57 <slaweq> wouldn't that be enough?
14:55:11 <mlavalle> yes, ovs can do that
14:55:19 <mlavalle> I've tested it
14:55:33 <slaweq> for the SG rules accept/deny statistics we have SG logging - maybe that is enough
14:55:45 <slaweq> thx mlavalle for confirmation
14:56:09 <slaweq> I am not sure what data neutron agents should collect according to this rfe
14:56:28 <slaweq> I think this would require a more detailed description IMO
14:56:36 <ralonsoh> I think he is thinking about the OVS agent, but I'm just guessing
14:56:51 <slaweq> yes, probably
14:57:13 <haleyb> slaweq: right, there are some pieces in place, and i'm not sure either, but agree it is probably OVS related based on their deployments
14:57:15 <slaweq> but this agent can already be busy
14:58:53 <ralonsoh> can we request more info or ask them to participate in this meeting?
14:59:15 <haleyb> I will put a comment in there asking, and yes, it would be better if he was in the meeting
14:59:53 <lajoskatona> +1
15:00:20 <slaweq> ++
15:00:29 <ralonsoh> +1
15:00:47 <haleyb> that said, with it being summer, I will be out the next two Fridays (US holiday-ish, vacation), and again a couple weeks after
15:01:05 <mlavalle> I will be off this coming week
15:01:15 <ralonsoh> lucky you!
15:01:32 <lajoskatona> enjoy it :-)
15:01:41 <haleyb> but if liushy can attend on july 12th maybe someone else can lead? assuming quorum
15:01:47 <slaweq> enjoy
15:01:53 <obondarev> have a nice vacation mlavalle!
15:01:56 <ralonsoh> we can lead the meeting, for sure
15:02:04 <mlavalle> thanks
15:02:22 <haleyb> ok, i will ask, i know it's hard for the timezone
15:02:35 <mlavalle> ralonsoh can lead the weekly meeting and I can lead the drivers meeting, or vice versa
15:02:49 <mlavalle> whichever he prefers
15:02:59 <ralonsoh> perfect for me, I can lead the weekly meeting next week
15:03:18 <haleyb> i will be here for next week's neutron meeting, just not drivers
15:03:27 <ralonsoh> ah perfect
15:03:53 <haleyb> so if there is an rfe you can run drivers, up to you based on that
15:04:10 <mlavalle> can I get someone to push this over the edge, please: https://review.opendev.org/c/openstack/neutron/+/918151
15:04:12 <slaweq> we will see if there will be quorum
15:04:15 <mlavalle> ?
15:04:17 <haleyb> we have no way to share schedules with each other really
15:04:36 <haleyb> anyways, i will end this meeting, thanks for attending and the discussion!
15:04:38 <ralonsoh> we'll check next Friday
15:04:39 <haleyb> #endmeeting