14:01:29 <haleyb> #startmeeting neutron_drivers
14:01:29 <opendevmeet> Meeting started Fri Jun 28 14:01:29 2024 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:29 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:29 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:01:46 <ralonsoh> hello
14:02:05 <mlavalle> \o
14:02:22 <slaweq> o/
14:02:41 <lajoskatona> o/
14:03:08 <haleyb> i guess that's quorum at 5
14:03:45 <haleyb> we have three items on the agenda. i did see an RFE bug come in yesterday but have not looked yet, if we have time we can do that as well
14:04:03 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2070376
14:04:10 <haleyb> ralonsoh: yours is first
14:04:18 <ralonsoh> thanks, in one shot, to speed up
14:04:21 <ralonsoh> Hi, I would like to talk about https://bugs.launchpad.net/neutron/+bug/2070376
14:04:21 <ralonsoh> This is related to the community goal "eventlet-deprecation"
14:04:21 <ralonsoh> I started investigating how the DHCP agent works and found that we have several threads running at the same time.
14:04:21 <ralonsoh> 1 for the periodic resyncs, 1 for reading the RPC messages, 1 to notify the server about the port reservations
14:04:22 <ralonsoh> and multiple threads to process the RPC events and execute the resource updates
14:04:24 <ralonsoh> There we have several problems: if we switch to a preemptive threading model, we will no longer have the "protection" of cooperative threads and we'll need to add execution locks
14:04:27 <ralonsoh> But my main concern with these resource update threads is that we don't really gain any speed by having more than one thread
14:04:30 <ralonsoh> In https://review.opendev.org/c/openstack/neutron/+/626830 we implemented a way to process the ports using a priority queue
14:04:33 <ralonsoh> But along with this patch we introduced multithreaded processing, which doesn't add any speed gain
14:04:36 <ralonsoh> So my proposal is to keep the priority queue (which is of course valuable and needed) but remove the multithreading for the event processing
14:04:39 <ralonsoh> That will (1) not reduce the processing speed and (2) make the event processing more robust when changing to kernel threads
14:05:52 <ralonsoh> (I have a very simple patch testing that: https://review.opendev.org/c/openstack/neutron/+/922719/)
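(A minimal sketch of the idea under discussion, not the actual neutron code: keep the priority queue from the patch above, but drain it with a single worker thread instead of a pool of green threads. All names here are illustrative.)

    # Sketch: a priority queue consumed by a single worker thread. With only
    # one consumer, no locks are needed around the per-resource state because
    # only this thread mutates it.
    import queue
    import threading

    class ResourceEvent:
        def __init__(self, priority, resource):
            self.priority = priority
            self.resource = resource

        def __lt__(self, other):
            # Lower number == higher priority, as usual with PriorityQueue.
            return self.priority < other.priority

    event_queue = queue.PriorityQueue()

    def _process(event):
        print('processing %s with priority %d' % (event.resource, event.priority))

    def worker():
        # Single consumer draining the queue in priority order.
        while True:
            event = event_queue.get()
            try:
                _process(event)
            finally:
                event_queue.task_done()

    threading.Thread(target=worker, daemon=True).start()
    event_queue.put(ResourceEvent(1, 'port-update'))
    event_queue.put(ResourceEvent(0, 'network-create'))
    event_queue.join()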
14:06:24 <haleyb> and this removes the need for locking as well?
14:06:42 <haleyb> oh, you said that
14:06:48 <haleyb> big copy/paste
14:06:49 <ralonsoh> more or less, because we have the port reservations notification
14:07:06 <ralonsoh> but at least there will be only one thread processing the events
14:08:03 <haleyb> i guess i always thought the multiple threads helped when configuring, like creating namespaces, when lots were involved, but you didn't see that?
14:08:39 <ralonsoh> we can use the rally test to validate that
14:09:03 <ralonsoh> but I don't see speed gain with multiple threads
14:09:12 <ralonsoh> rally CI, I mean
14:09:20 <slaweq> these multiple threads could be most useful when e.g. the agent was restarted
14:09:29 <ralonsoh> but how?
14:09:30 <slaweq> and it has to go over many networks and ports
14:09:44 <ralonsoh> there won't be multiple threads working at the same time
14:09:47 <ralonsoh> this is python
14:10:01 <slaweq> right
14:10:17 <slaweq> I was more referring to what haleyb wrote
14:10:22 <ralonsoh> to improve that (and I think that was mentioned before) we would need a multiprocess DHCP agent
14:10:54 <haleyb> right, i think it was always restarts where there were issues, like an hour wait, although the l3-agent was always worse
14:10:57 <lajoskatona> but for that we need locks
14:11:20 <mlavalle> which is ralonsoh's point
14:11:41 <ralonsoh> and, btw, we use ctypes.PyDLL
14:12:04 <ralonsoh> # NOTE(ralonsoh): from https://docs.python.org/3.6/library/
14:12:04 <ralonsoh> # ctypes.html#ctypes.PyDLL: "Instances of this class behave like CDLL
14:12:04 <ralonsoh> # instances, except that the Python GIL is not released during the
14:12:04 <ralonsoh> # function call, and after the function execution the Python error
14:12:04 <ralonsoh> # flag is checked."
14:12:24 <ralonsoh> so the GIL will be attached to this thread
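(A small illustration of the ctypes behaviour quoted above: a library loaded with ctypes.PyDLL does not release the GIL during the foreign call, unlike ctypes.CDLL, so no other Python thread can run while the call executes. The libc path assumes a Linux system with glibc.)

    # Same libc function loaded two ways; the result is identical, the
    # difference is only in GIL handling during the call.
    import ctypes

    libc_cdll = ctypes.CDLL('libc.so.6')    # releases the GIL around each call
    libc_pydll = ctypes.PyDLL('libc.so.6')  # keeps the GIL held during each call

    print(libc_cdll.getpid())
    print(libc_pydll.getpid())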
14:12:54 <obondarev_> late o/
14:13:21 <ralonsoh> I can, of course, do some load testing with multiple VMs on a node and restarting the DHCP agent
14:13:37 <ralonsoh> with different DHCP_PROCESS_GREENLET_MIN/DHCP_PROCESS_GREENLET_MAX values
14:14:02 <slaweq> that would be good test IMHO
14:14:11 <mlavalle> agree
14:14:26 <ralonsoh> perfect, I'll spawn a single compute node with tons of RAM and I'll try to spawn as many VMs as possible
14:14:37 <ralonsoh> and then restart the agent with different thread values
14:14:43 <ralonsoh> I'll update the LP bug
14:14:52 <slaweq> I don't think You really need vms for that
14:15:03 <slaweq> probably creating networks and ports would be enough
14:15:05 <ralonsoh> just ports in the network
14:15:07 <ralonsoh> right
14:15:22 <slaweq> maybe https://github.com/slawqo/neutron-heater can help with that too
14:15:33 <ralonsoh> ahh yes
14:16:03 <ralonsoh> so let's wait for my feedback on this, but please consider the problem we have ahead with the eventlet deprecation
14:17:39 <ralonsoh> (that's all from my side, thanks a lot)
14:17:51 <lajoskatona> thanks ralonsoh
14:18:13 <mlavalle> yes, thanks ralonsoh
14:18:47 <haleyb> ralonsoh: thanks, will keep a lookout on the patches
14:19:55 <haleyb> and i guess we don't need to vote as it's not an rfe, but i agree with doing this work
14:20:17 <seba> we've been having many problems with eventlet, especially when running neutron-api with uwsgi, so I'm looking forward to this
14:21:26 <haleyb> slaweq: yours is next
14:21:30 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2060916
14:22:13 <slaweq> thx
14:22:26 <slaweq> I recently wanted to finally start working on this
14:22:50 <slaweq> it came up when we introduced 'service' role policies
14:23:25 <slaweq> as it seems that with those new policies trusted_vif can't be set through the 'binding_profile' attribute
14:24:07 <slaweq> so this is a pretty small and easy RFE where a new API extension would be proposed to add a new attribute to the port
14:24:37 <slaweq> this field would then be set by neutron in the binding_profile to be sent e.g. to nova
14:24:39 <slaweq> as it is now
14:24:55 <slaweq> so that other components would not require changes
14:25:20 <slaweq> and binding_profile would be used (more) as it should be, i.e. for machine-to-machine communication
14:25:34 <slaweq> that's all from me
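(A hypothetical sketch of what the new port attribute could look like in a neutron-lib style API definition. The extension alias, attribute name 'trusted', and defaults are assumptions for illustration only, not the approved design.)

    # Hypothetical API definition for the new port-level attribute that would
    # replace setting trusted_vif via binding_profile.
    from neutron_lib.api import converters

    ALIAS = 'port-trusted-vif'  # assumed alias, not final
    RESOURCE_ATTRIBUTE_MAP = {
        'ports': {
            'trusted': {
                'allow_post': True,
                'allow_put': True,
                'default': False,
                'convert_to': converters.convert_to_boolean,
                'is_visible': True,
                # Intended to be settable only by privileged/service roles;
                # neutron would copy it into binding_profile for nova.
                'enforce_policy': True,
            },
        },
    }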
14:25:50 <ralonsoh> +1 to decoupling Neutron configuration parameters from port.binding_profile, as done before with others
14:26:30 <obondarev> +1, sounds reasonable
14:26:35 <lajoskatona> +1
14:26:42 <mlavalle> +1
14:27:11 <haleyb> +1 from me
14:28:07 <slaweq> thank You, so I assume that RFE is approved and I can start work on it now, right?
14:28:22 <mlavalle> yes
14:28:34 <mlavalle> fire away
14:28:40 <haleyb> yes, i will mark it approved, don't think you need a spec as the bug is pretty clear
14:28:41 <slaweq> thank You, that's all from me then :)
14:29:14 <slaweq> haleyb exactly, that's why I didn't propose any spec until now as I was hoping it would not be needed :)
14:29:51 <haleyb> the next one was added by me (and mlavalle :)
14:29:56 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2067183
14:30:05 <haleyb> #link https://review.opendev.org/c/openstack/neutron/+/920459
14:31:16 <haleyb> I added it because we have broken things when tweaking dns_domain in the past
14:31:24 <slaweq> again DNS :)
14:31:45 <mlavalle> we have gotten this request and even implemented it in the past (Assaf implemented it) and then we reversed it
14:31:46 <slaweq> we broke it so many times that I can't even count them :P
14:32:33 <mlavalle> so I think there is a group of users whose use case we are not properly covering
14:33:01 <ralonsoh> so in https://review.opendev.org/c/openstack/neutron/+/571546 we were directly reading the network dns_domain value in the DHCP agent
14:33:19 <ralonsoh> and in your proposal you are inheriting this value from the network
14:33:19 <mlavalle> while at the same time we are trying to preserve the current behavior, which was specified here https://specs.openstack.org/openstack/neutron-specs/specs/liberty/internal-dns-resolution.html
14:33:25 <ralonsoh> it's almost the same, right?
14:33:33 <mlavalle> it's not my proposal
14:33:46 <ralonsoh> Jay Jahns proposal
14:34:10 <mlavalle> I just thought, while looking at the patch, that we are not addressing a use case
14:35:06 <mlavalle> so why don't we make this optional through an extension? The code change is mostly in an ml2 extension: https://review.opendev.org/c/openstack/neutron/+/920459
14:35:35 <mlavalle> so why not create a new extension which allows users to have this new behavior?
14:35:57 <ralonsoh> that won't break current deployments and will allow this network dns inheritance
14:36:06 <ralonsoh> +1 to this idea
14:36:08 <mlavalle> yeap
14:36:18 <slaweq> my (minor) issue with that is that we already have so many dns integration extensions that it may not be easy for users to know which they should use
14:36:20 <frickler> like there wouldn't be enough dns extensions already :-/
14:36:29 <ralonsoh> correct...
14:36:37 <frickler> and you cannot stack them
14:36:40 <slaweq> and they inherit one from the other
14:36:47 <obondarev> maybe just a bit more descriptive name..
14:37:00 <mlavalle> yes, but we have users who seem to need a new behavior
14:37:42 <haleyb> mlavalle: this change (as is) could break things for users not expecting it i'd guess?
14:37:50 <mlavalle> and it keeps coming back to us
14:38:01 <ralonsoh> agree with the number of DNS related extensions, and the problems configuring them (some of them are incompatible)
14:38:06 <mlavalle> haleyb: yes, think so
14:38:07 <ralonsoh> but that could be documented
14:38:31 <slaweq> another problem with having two such different extensions is testing them in the CI
14:39:09 <lajoskatona> do we now have jobs with DNS without designate?
14:39:17 <slaweq> maybe we should look at it from the other perspective and e.g. propose new API extension which would fit this 'new' use case?
14:39:43 <mlavalle> that's exactly what I'm saying slaweq
14:40:39 <slaweq> mlavalle so you are talking about an api extension now? I thought that You wanted to have another ml2 plugin extension for that
14:41:19 <slaweq> lajoskatona I thought there are (or were) some tests like that in neutron_tempest_plugin and they were run in every job of ours there
14:41:29 <mlavalle> slaweq: I meant both. A new API extension that is implemented by an ml2 extension
14:41:29 <slaweq> but maybe I'm wrong and we don't have them anymore
14:42:05 <slaweq> I would be fine with new API extension for sure
14:43:14 <slaweq> regarding a new ml2 extension related to dns - ok, but maybe we could somehow refactor what we have now and add this new functionality to the existing one? But this can probably also be done as a separate task
14:43:25 <lajoskatona> slaweq: we have these extensions in zuul for DNS: dns-domain-ports, dns-integration, dns-integration-domain-keywords so it seems we have some tests
14:43:34 <haleyb> so an API extension to add something to the network?
14:43:51 <ralonsoh> that could be an option, to add a new field to the network
14:44:14 <ralonsoh> so this behaviour will apply not globally but per network
14:45:12 <slaweq> ++
14:45:13 <ralonsoh> and this could be implemented, most probably, in the current dns plugin extensions
14:45:23 <mlavalle> exactly
14:45:35 <slaweq> ralonsoh++ for that
14:45:55 <slaweq> it would IMO be even better if we could do it in the existing ml2 extension(s)
14:46:07 <ralonsoh> agree
14:47:50 <lajoskatona> +1 for new field for network
14:47:57 <ralonsoh> +1 to this network API DNS extension
14:48:14 <mlavalle> +1
14:48:41 <haleyb> +1
14:48:54 <obondarev> +1
14:49:04 <slaweq> +1
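(A minimal sketch of the per-network behaviour being +1'd above: if a hypothetical opt-in network attribute is set, the DHCP agent would use the network's dns_domain for port host entries, otherwise it falls back to the existing conf.dns_domain behaviour from the Liberty spec. The attribute name 'inherit_dns_domain' is purely illustrative; the real name would come out of the API extension mlavalle will write up in the bug.)

    # Sketch only: how the per-network opt-in flag could influence the domain
    # picked for a port's dnsmasq host entry.
    def domain_for_port(network, conf_dns_domain):
        """Pick the DNS domain used for a port's host entry."""
        if getattr(network, 'inherit_dns_domain', False) and network.dns_domain:
            # New, opt-in behaviour: inherit the network's dns_domain.
            return network.dns_domain.rstrip('.')
        # Current behaviour per the internal-dns-resolution spec:
        # use the globally configured dns_domain.
        return conf_dns_domain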
14:49:15 <haleyb> mlavalle: can you write-up ^^ and put it in the bug? you are better at dns wording than i am :)
14:49:31 <mlavalle> yes, I'll take care of it haleyb
14:49:55 <mlavalle> and I'll help Jan with the implementation
14:50:49 <haleyb> mlavalle: great, thanks
15:02:05 <haleyb> i'm not sure we have time to talk about the rfe liushy filed yesterday as it has not been triaged
14:52:43 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2071323
14:52:52 <haleyb> in case anyone is wondering
14:53:58 <haleyb> but now reading it, it looks like the metering agent did something like this
14:54:45 <slaweq> ovs can send sflow data to some monitoring tool IIRC
14:54:57 <slaweq> wouldn't that be enough?
14:55:11 <mlavalle> yes, ovs can do that
14:55:19 <mlavalle> I've tested it
14:55:33 <slaweq> for the SG rules accept/deny statistics we have SG logging - maybe that is enough
14:55:45 <slaweq> thx mlavalle for confirmation
14:56:09 <slaweq> I am not sure what data the neutron agents should collect according to this rfe
14:56:28 <slaweq> I think this would require a more detailed description
14:56:36 <ralonsoh> I think he is thinking about the OVS agent, but I'm just guessing
14:56:51 <slaweq> yes, probably
14:57:13 <haleyb> slaweq: right, there are some pieces in place, and i'm not sure either, but agree it is probably OVS related based on their deployments
14:57:15 <slaweq> but this agent can already be busy
14:58:53 <ralonsoh> can we request more info, or ask the submitter to participate in this meeting?
14:59:15 <haleyb> I will put a comment in there asking, and yes, it would be better if he was in the meeting
14:59:53 <lajoskatona> +1
15:00:20 <slaweq> ++
15:00:29 <ralonsoh> +1
15:00:47 <haleyb> that said, with it being summer, I will be out the next two Fridays (US holiday-ish, vacation), and again a couple weeks after
15:01:05 <mlavalle> I will be off this coming week
15:01:15 <ralonsoh> lucky you!
15:01:32 <lajoskatona> enjoy it :-)
15:01:41 <haleyb> but if liushy can attend on july 12th maybe someone else can lead? assuming quorum
15:01:47 <slaweq> enjoy
15:01:53 <obondarev> have a nice vacation mlavalle!
15:01:56 <ralonsoh> we can lead the meeting, for sure
15:02:04 <mlavalle> thanks
15:02:22 <haleyb> ok, i will ask, i know the timezone makes it hard
15:02:35 <mlavalle> ralonsoh can lead the weekly meeting and I can lead the drivers, or vice versa
15:02:49 <mlavalle> whichever he prefers
15:02:59 <ralonsoh> perfect for me, I can lead the weekly meeting next week
15:03:18 <haleyb> i will be here for next week's neutron meeting, just not drivers
15:03:27 <ralonsoh> ah perfect
15:03:53 <haleyb> so if there is an rfe you can run drivers, up to you based on that
15:04:10 <mlavalle> can I get someone to push this over the edge, please: https://review.opendev.org/c/openstack/neutron/+/918151
15:04:12 <slaweq> we will see if there will be quorum
15:04:15 <mlavalle> ?
15:04:17 <haleyb> we have no way to share schedules with each other really
15:04:36 <haleyb> anyways, i will end this meeting, thanks for attending and discussion!
15:04:38 <ralonsoh> we'll check next Friday
15:04:39 <haleyb> #endmeeting