14:05:55 <haleyb> #startmeeting neutron_drivers
14:05:55 <opendevmeet> Meeting started Fri Nov 21 14:05:55 2025 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:05:55 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:05:55 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:05:57 <haleyb> Ping list: ykarel, mlavalle, mtomaska, slaweq, tobias-urdin, lajoskatona, haleyb, ralonsoh
14:06:10 <mlavalle> \o
14:06:15 <cardoe> o/
14:06:19 <haleyb> i know rodolfo was online earlier but don't see him now
14:06:21 <mtomaska> o/
14:06:27 <slaweq> o/
14:06:42 <ralonsoh> hi (late)
14:07:07 <TheJulia> o/
14:07:32 <ralonsoh> sorry, is the drivers meeting ongoing now?
14:07:49 <haleyb> ralonsoh: yes, i was late as well, just started
14:07:56 <ralonsoh> cool
14:08:00 <haleyb> we had a few things on the list
14:08:15 <haleyb> first one was from dsan
14:08:27 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2123836
14:08:31 <haleyb> there is also a patch
14:09:25 <haleyb> i'm not sure they are here, can move to second item
14:09:50 <dsan> o/
14:10:25 <cardoe> lol ran in as the doors closed
14:10:30 <haleyb> dsan: hi, i had commented in the bug and think others had looked at the change
14:10:53 <slaweq> I just looked at it now
14:10:55 <slaweq> and it looks ok for me
14:11:20 <ralonsoh> same for me, given this option is configurable and disabled by default
14:11:57 <ralonsoh> just a question: is it possible to have something else apart from unit tests?
14:12:23 <ralonsoh> something that spawns dnsmasq with this option and another agent making this query?
14:12:30 <haleyb> my only comment (in the bug) is should the dhcp-agent be using this as well, as a liveness check, and restart any dnsmasq that doesn't respond?
14:12:52 <haleyb> but if others are fine with it...
14:13:03 <ralonsoh> haleyb, we never considered that
14:13:24 <ralonsoh> but I think that could be a follow up RFE
14:13:28 <slaweq> haleyb: IMO this could be potential follow up improvement
14:14:22 <haleyb> sure, that would work. it's not something you can put on a dashboard but could help
14:15:50 <haleyb> ralonsoh: regarding your testing ask, that would be good - dsan is that doable?
14:16:33 <slaweq> I'm not sure what you want to test there? Wouldn't it be testing of dnsmasq itself actually?
14:16:39 <slaweq> do we need that really?
14:16:50 <ralonsoh> yes, that will test dnsmasq
14:17:31 <dsan> meaning like adding a dns check ?
14:17:43 <ralonsoh> yes
14:17:54 <dsan> guess it's doable
14:18:13 <dsan> there's not much input validation as is
14:18:19 <haleyb> so more a functional test
14:18:41 <dsan> i was wondering if and how to do that
14:19:06 <dsan> inside dnsmasq it's only string manipulation
14:19:25 <dsan> not really if it's a valid txt record
14:20:00 <dsan> right now if we feed a wrong config value
14:20:14 <dsan> dnsmasq won't start and output an error
14:20:31 <dsan> not sure if it's acceptable
14:20:41 <dsan> or if a preflight check is needed
14:21:38 <slaweq> IMO it is acceptable
14:21:48 <dsan> ok
14:22:00 <haleyb> i'm thinking just the positive check would be necessary, i.e. start dnsmasq and use dig to get the txt message and verify it's what it should be?
14:22:12 <dsan> then I'll look into adding a little dns test
14:22:16 <haleyb> ralonsoh: is that what you were thinking?
14:22:20 <ralonsoh> yes
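The positive check described above (start dnsmasq, query the TXT record with dig, verify the answer) could be sketched roughly as follows. This is a minimal illustration, not the test from the patch: the helper names, the record name and the expected value are hypothetical, and it assumes a dnsmasq instance is already running with the TXT record configured and that `dig` is installed.

```python
import subprocess


def parse_txt_answer(dig_output):
    """Return the TXT strings from `dig +short` output, without quotes."""
    return [
        line.strip().strip('"')
        for line in dig_output.splitlines()
        if line.strip()
    ]


def txt_record_matches(server, name, expected, run=subprocess.run):
    """Query `name` as a TXT record on `server` and compare with `expected`.

    `run` is injectable so the check can be exercised without a real
    dnsmasq process (hypothetical helper, not part of neutron).
    """
    cmd = ["dig", "+short", "+time=1", "+tries=1", f"@{server}", name, "TXT"]
    result = run(cmd, capture_output=True, text=True, check=False)
    return result.returncode == 0 and expected in parse_txt_answer(result.stdout)
```

In a functional test, `server` would be the address dnsmasq listens on inside the DHCP namespace, and the call would be `txt_record_matches(server, record_name, configured_value)`.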
14:23:08 <haleyb> great, so we should vote, +1 from me
14:23:15 <mlavalle> +1
14:23:16 <slaweq> +1
14:23:18 <ralonsoh> +1
14:23:53 <haleyb> ok, i'll mark it approved, and thanks for working on it dsan
14:24:15 <haleyb> ralonsoh: the next topic was yours
14:24:30 <ralonsoh> thanks
14:24:54 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1970944
14:25:21 <ralonsoh> So the goal is to replace the keepalived-state-change script with something not so heavy (in RAM terms)
14:25:26 <ralonsoh> we already discussed that in the PTG
14:25:44 <ralonsoh> initially we were using keepalived notify_scripts
14:26:25 <ralonsoh> but we moved to the python script in https://review.opendev.org/c/openstack/neutron/+/125384 because of the errors that occurred when multiple quick transitions happened
14:26:38 <ralonsoh> however this error was still present in the next releases
14:26:56 <ralonsoh> this is why I proposed https://review.opendev.org/q/I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad (6 years ago)
14:27:13 <opendevreview> Merged openstack/neutron master: Revert "[eventlet-removal] Handle stop DHCP agent"  https://review.opendev.org/c/openstack/neutron/+/967808
14:27:26 <ralonsoh> so the point is going back to the notify_scripts, but this time making it configurable (for a period of time)
14:27:43 <ralonsoh> so a user will be able to choose between python script or notify_scripts
14:28:08 <ralonsoh> if we switch to notify_scripts by default and that works, we can remove the python script implementation
14:28:16 <ralonsoh> POC: https://review.opendev.org/c/openstack/neutron/+/965909
14:28:23 <ralonsoh> 2 questions:
14:28:31 <ralonsoh> 1) Do you agree with this strategy
14:28:57 <ralonsoh> 2) I'm not going to implement it, does someone have time for this? Someone interested in the ML2/OVS backend
14:30:36 <haleyb> I agree we should fix the memory issue, but also cannot implement it
14:31:25 <slaweq> I like this new approach, I think it should be better
14:32:15 <ralonsoh> so if we vote for it, haleyb will you send a mail to ask for volunteers?
14:32:27 <haleyb> do you remember who noticed the memory issue? not to task them but just so they can be involved
14:32:36 <ralonsoh> zigo ^
14:32:57 <haleyb> ralonsoh: yes, i could send something to the ML
14:34:08 <haleyb> he might be offline
14:34:09 * zigo is looking this up
14:35:08 <ralonsoh> from PTG etherpad: (zigo) Some numbers from our public cloud related to that: in a random network node in production, we have 673 instances of this process. That's 65 GB of RAM
14:35:12 <zigo> ralonsoh: It'd be nicer if the neutron/agent/l3/keepalived-state-change.sh file was located in /usr/bin and had "neutron-" at the beginning of its name.
14:35:38 <zigo> Also, you're not removing the older script, are you?
14:35:53 <ralonsoh> this is another alternative
14:36:07 <ralonsoh> we are discussing using notify_scripts (from keepalived) again
14:36:24 <ralonsoh> that will be much easier
14:37:02 <ralonsoh> (ok, you are already talking about https://review.opendev.org/c/openstack/neutron/+/965909/1/neutron/agent/l3/keepalived-state-change.sh)
14:37:13 <lajoskatona> o/
14:37:24 <zigo> Yeah, I was.
14:37:36 <ralonsoh> so yes, that could be changed. The final location of the file depends on the toml file, that moves this file where needed
14:38:58 <zigo> I'm just not a fan of having executable files in private directories. Also, the distro policies say you shouldn't use ".sh" as extension: from a user standpoint, the implementation language has no value, and therefore the extension shouldn't be used.
14:39:15 <zigo> If you don't do all of this, I'll have to carry a Debian-specific patch, which is very annoying.
14:39:39 <zigo> (these are just pieces of advice on the implementation... nothing blocking though)
14:40:14 <haleyb> zigo: well, we are also looking for someone to do the work
14:40:15 <zigo> About using keepalived's own implementation: I have no idea if it made progress over the last 10 years or not.
14:40:41 <zigo> Though it worries me that its call was removed in 2014, and we don't know why.
14:40:46 <lajoskatona> zigo: but the basic idea of replacing python with a shell script is acceptable from a distro maintainer perspective?
14:40:57 <zigo> Yeah, I like the idea.
14:41:10 <lajoskatona> ak
14:41:12 <lajoskatona> ack
14:41:38 <haleyb> or have the choice? i think that's what ralonsoh said above? i.e. a config option
14:41:57 <zigo> An implementation in a better language could be cool too (C / Rust anyone ? :P )
14:42:05 <ralonsoh> initially we can implement both and make it configurable
14:42:22 <zigo> Maybe on a side project to depend on?
14:42:29 <ralonsoh> but there is no need for this
14:42:53 <ralonsoh> keepalived notify_script is expecting a simple script, for example a bash script
14:43:13 <ralonsoh> nothing complex (actually we need to write a file and send a socket message)
14:43:26 <zigo> Well, I still don't like the fact that it doesn't scale. Reducing the memory footprint is cool, but what if I get 6000 instances instead of 600 ?
14:43:54 <zigo> I'd say it's a good and fast approach to fix things quick.
14:44:11 <ralonsoh> sorry, I don't understand
14:44:26 <zigo> (well, to tell the truth: if I had 6000 HA routers on a network node, I'd have other issues too ... :P)
14:44:46 <ralonsoh> if you have 6000 instances you will still need 6K instances of keepalived, not any python script to monitor the state change
14:44:55 <lajoskatona> what if we say: here's a reference implementation in shell, this is the interface in the documentation for how it should behave, do it yourself if you need something more
14:45:51 <slaweq> lajoskatona: you mean to make that notify script configurable?
14:46:08 <slaweq> and give operators a way to bring their own one
14:46:13 <zigo> ralonsoh: The point I'm making is that the current situation where we spawn a long-lived daemon forever isn't ideal, because (not depending on how much memory it takes) it's going to always grow...
14:46:17 <slaweq> if yes, I like that idea
14:46:26 <lajoskatona> slaweq: yes,something like that
14:46:29 <ralonsoh> zigo, what daemon?
14:46:42 <zigo> The current script no?
14:46:45 <ralonsoh> no
14:46:53 <ralonsoh> notify_scripts doesn't work like this
14:46:57 <zigo> Is it spawned, then dies?
14:47:01 <ralonsoh> yes
14:47:19 <zigo> Ok, then I shut up. :P
14:47:21 <ralonsoh> keepalived will call it when an active/backup event happens
14:47:34 <ralonsoh> and the execution should take milliseconds
14:48:10 <zigo> Then maybe, my idea of writing it in something less heavy than bash isn't that stupid.
14:48:19 <zigo> ralonsoh: You'll be surprised how heavy bash is.
14:48:37 <zigo> If you do something in Bash, I'll write an MR to do it at least in dash... :P
14:48:52 <zigo> It will spawn at least 3 times faster.
14:49:15 <zigo> Anyways, yeah, all my support for this !
14:49:24 <ralonsoh> as commented before, we need someone to take the lead on this implementation
14:49:32 <ralonsoh> I won't be able to continue
14:49:45 <slaweq> I think that we can use "sh" maybe instead of bash but I wouldn't go with something like dash there
14:49:49 <ralonsoh> so if you are able to do this or someone else you know, that will be perfect
14:50:08 <zigo> slaweq: /bin/sh in Debian is dash, which is why I'm writing this.
14:50:11 <lajoskatona> yes we have to select something that first works in the CI :-)
14:50:27 <zigo> dash == iso shell, no bashism involved.
14:50:28 <slaweq> ok, I didn't know that
14:50:35 <ralonsoh> ok, let's keep the implementation for the patch
14:50:41 <haleyb> ok, we still have one more item on agenda so should finish up
14:50:42 <ralonsoh> we should vote for this idea
14:50:49 <haleyb> still need someone to own it
14:50:50 <zigo> ralonsoh: Your previous bash implementation was really fine, IMO.
14:50:58 <zigo> At least as a first approach.
14:51:07 <ralonsoh> that is discarded now
14:51:11 <zigo> :/
14:51:19 <haleyb> i vote to go forward with this, pending an owner
14:51:41 <slaweq> +1
14:51:43 <ralonsoh> +1
14:52:10 <lajoskatona> +1, I'll check with my management (I have had long discussions with them in recent weeks anyway....)
14:52:12 <zigo> I'm +1 on anything that fixes the current situation, whatever implementation that is.
14:52:44 <mlavalle> +1
14:52:50 <opendevreview> Bodo Petermann proposed openstack/neutron master: Allow plugins to add periodics to maintenance worker  https://review.opendev.org/c/openstack/neutron/+/939817
14:52:54 <haleyb> ok, so i guess we can change the existing bug to have rfe tags, and i can send an email to list asking for an owner
14:53:30 <lajoskatona> +1
14:53:44 <haleyb> ok, last topic was from cardoe
14:53:51 <haleyb> and sorry for not having a lot of time
14:53:59 <cardoe> Well happy to do what I can.
14:54:24 <cardoe> I can throw a meetpad link but Ironic boxes will likely take longer than 5 minutes to boot all the way up
14:55:02 <ralonsoh> it's ok, no problem
14:55:04 <TheJulia> The ask was for some sort of demo to advance the basic understanding... so maybe why not and just keep going in a separate forum ?
14:55:23 <ralonsoh> we can close the drivers meeting and join the meetpad
14:55:31 <cardoe> https://meetpad.opendev.org/neutron-vxlan-demo
14:55:59 <lajoskatona> I am on mobile-net in a car currently, but I try....
14:56:48 <haleyb> alright i will end this meeting then, i don't have a conflict for 30 minutes
14:56:58 <haleyb> #endmeeting