14:05:55 <haleyb> #startmeeting neutron_drivers
14:05:55 <opendevmeet> Meeting started Fri Nov 21 14:05:55 2025 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:05:55 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:05:55 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:05:57 <haleyb> Ping list: ykarel, mlavalle, mtomaska, slaweq, tobias-urdin, lajoskatona, haleyb, ralonsoh
14:06:10 <mlavalle> \o
14:06:15 <cardoe> o/
14:06:19 <haleyb> i know rodolfo was online earlier but don't see him now
14:06:21 <mtomaska> o/
14:06:27 <slaweq> o/
14:06:42 <ralonsoh> hi (late)
14:07:07 <TheJulia> o/
14:07:32 <ralonsoh> sorry, is the drivers meeting ongoing now?
14:07:49 <haleyb> ralonsoh: yes, i was late as well, just started
14:07:56 <ralonsoh> cool
14:08:00 <haleyb> we had a few things on the list
14:08:15 <haleyb> first one was from dsan
14:08:27 <haleyb> #link bugs.launchpad.net/neutron/+bug/2123836
14:08:31 <haleyb> there is also a patch
14:09:25 <haleyb> i'm not sure they are here, can move to second item
14:09:50 <dsan> o/
14:10:25 <cardoe> lol ran in as the doors closed
14:10:30 <haleyb> dsan: hi, i had commented in the bug and think others had looked at the change
14:10:53 <slaweq> I just looked at it now
14:10:55 <slaweq> and it looks ok to me
14:11:20 <ralonsoh> same for me, given that this option is configurable and disabled by default
14:11:57 <ralonsoh> just a question: is it possible to have something else apart from unit tests?
14:12:23 <ralonsoh> something that spawns dnsmasq with this option and another agent making this query?
14:12:30 <haleyb> my only comment (in the bug) is should the dhcp-agent be using this as well, as a liveness check, and restart any dnsmasq that doesn't respond?
14:12:52 <haleyb> but if others are fine with it...
14:13:03 <ralonsoh> haleyb, we never considered that
14:13:24 <ralonsoh> but I think that could be a follow up RFE
14:13:28 <slaweq> haleyb: IMO this could be a potential follow up improvement
14:14:22 <haleyb> sure, that would work. it's not something you can put on a dashboard but could help
14:15:50 <haleyb> ralonsoh: regarding your testing ask, that would be good - dsan is that doable?
14:16:33 <slaweq> I'm not sure what you want to test there? Wouldn't it be testing of dnsmasq itself actually?
14:16:39 <slaweq> do we need that really?
14:16:50 <ralonsoh> yes, that will test dnsmasq
14:17:31 <dsan> meaning like adding a dns check ?
14:17:43 <ralonsoh> yes
14:17:54 <dsan> guess it's doable
14:18:13 <dsan> there's not much input validation as is
14:18:19 <haleyb> so more a functional test
14:18:41 <dsan> i was wondering if and how to do that
14:19:06 <dsan> inside dnsmasq it's only string manipulation
14:19:25 <dsan> not really if it's a valid txt record
14:20:00 <dsan> right now if we feed a wrong config value
14:20:14 <dsan> dnsmasq won't start and outputs an error
14:20:31 <dsan> not sure if it's acceptable
14:20:41 <dsan> or if a preflight check is needed
14:21:38 <slaweq> IMO it is acceptable
14:21:48 <dsan> ok
14:22:00 <haleyb> i'm thinking just the positive check would be necessary, i.e. start dnsmasq and use dig to get the txt message and verify it's what it should be?
14:22:12 <dsan> then I'll look into adding a little dns test
14:22:16 <haleyb> ralonsoh: is that what you were thinking?
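The positive check haleyb describes (start dnsmasq, query it with dig, compare the TXT answer) might look roughly like the sketch below. It assumes the new option makes dnsmasq answer a TXT query with a configured string; the helper names, record name, and server address are placeholders, not taken from the patch.

```python
import shlex
import subprocess


def parse_dig_txt(output):
    """Turn `dig +short ... TXT` output into a list of record strings.

    dig prints one record per line as one or more double-quoted chunks;
    the chunks of a record are joined into a single string here.
    """
    records = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig comment lines
        records.append("".join(shlex.split(line)))
    return records


def query_txt(name, server, port=53, timeout=2):
    """Ask `server` for the TXT records of `name` using the dig CLI."""
    cmd = ["dig", "+short", "+time=%d" % timeout, "+tries=1",
           "-p", str(port), "@" + server, name, "TXT"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_dig_txt(result.stdout)


# Hypothetical use in a functional test, once dnsmasq is running:
#   assert "expected-text" in query_txt("check.example.", "192.0.2.10")
```

Keeping the parsing separate from the subprocess call means the comparison logic can be unit tested without a running dnsmasq, while the functional test only has to spawn the process and call `query_txt`.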
14:22:20 <ralonsoh> yes
14:23:08 <haleyb> great, so we should vote, +1 from me
14:23:15 <mlavalle> +1
14:23:16 <slaweq> +1
14:23:18 <ralonsoh> +1
14:23:53 <haleyb> ok, i'll mark it approved, and thanks for working on it dsan
14:24:15 <haleyb> ralonsoh: the next topic was yours
14:24:30 <ralonsoh> thanks
14:24:54 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1970944
14:25:21 <ralonsoh> So the goal is to replace the keepalived-state-change script with something not so heavy (in RAM terms)
14:25:26 <ralonsoh> we already discussed that in the PTG
14:25:44 <ralonsoh> initially we were using keepalived notify_scripts
14:26:25 <ralonsoh> but we moved to the python script in https://review.opendev.org/c/openstack/neutron/+/125384 because of the errors present when multiple quick transitions happened
14:26:38 <ralonsoh> however this error was still present in the next releases
14:26:56 <ralonsoh> this is why I proposed https://review.opendev.org/q/I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad (6 years ago)
14:27:13 <opendevreview> Merged openstack/neutron master: Revert "[eventlet-removal] Handle stop DHCP agent" https://review.opendev.org/c/openstack/neutron/+/967808
14:27:26 <ralonsoh> so the point is going back to the notify_scripts, but this time making it configurable (for a period of time)
14:27:43 <ralonsoh> so a user will be able to choose between python script or notify_scripts
14:28:08 <ralonsoh> if we switch to notify_scripts by default and that works, we can remove the python script implementation
14:28:16 <ralonsoh> POC: https://review.opendev.org/c/openstack/neutron/+/965909
14:28:23 <ralonsoh> 2 questions:
14:28:31 <ralonsoh> 1) Do you agree with this strategy
14:28:57 <ralonsoh> 2) I'm not going to implement it, does someone have time for this? Someone interested in the ML2/OVS backend
14:30:36 <haleyb> I agree we should fix the memory issue, but also cannot implement it
14:31:25 <slaweq> I like this new approach, I think it should be better
14:32:15 <ralonsoh> so if we vote for it, haleyb will you send a mail to ask for volunteers?
14:32:27 <haleyb> do you remember who noticed the memory issue? not to task them but just so they can be involved
14:32:36 <ralonsoh> zigo ^
14:32:57 <haleyb> ralonsoh: yes, i could send something to the ML
14:34:08 <haleyb> he might be offline
14:34:09 * zigo is looking this up
14:35:08 <ralonsoh> from PTG etherpad: (zigo) Some numbers from our public cloud related to that: in a random network node in production, we have 673 instances of this process. That's 65 GB of RAM
14:35:12 <zigo> ralonsoh: It'd be nicer if the neutron/agent/l3/keepalived-state-change.sh file was located in /usr/bin and had "neutron-" at the beginning of its name.
14:35:38 <zigo> Also, you're not removing the older script, are you?
14:35:53 <ralonsoh> this is another alternative
14:36:07 <ralonsoh> we are discussing using notify_scripts (from keepalived) again
14:36:24 <ralonsoh> that will be much easier
14:37:02 <ralonsoh> (ok, you are already talking about https://review.opendev.org/c/openstack/neutron/+/965909/1/neutron/agent/l3/keepalived-state-change.sh)
14:37:13 <lajoskatona> o/
14:37:24 <zigo> Yeah, I was.
14:37:36 <ralonsoh> so yes, that could be changed. The final location of the file depends on the toml file, that moves this file where needed
14:38:58 <zigo> I'm just not a fan of having executed files in private directories. Also, the distro policies say you shouldn't use ".sh" as extension: from a user standpoint, the implementation language has no value, and therefore the extension shouldn't be used.
14:39:15 <zigo> If you don't do all of this, I'll have to carry a Debian specific patch, which is very annoying.
14:39:39 <zigo> (these are just pieces of advice on the implementation... nothing blocking though)
14:40:14 <haleyb> zigo: well, we are also looking for someone to do the work
14:40:15 <zigo> About using keepalived own implementation: I have no idea if it made progress over the last 10 years or not.
14:40:41 <zigo> Though it worries me that its call was removed in 2014, and we don't know why.
14:40:46 <lajoskatona> zigo: but the basic idea to replace python with shell script is acceptable from distro maintainer perspective and can be accepted?
14:40:57 <zigo> Yeah, I like the idea.
14:41:10 <lajoskatona> ak
14:41:12 <lajoskatona> ack
14:41:38 <haleyb> or have the choice? i think that's what ralonsoh said above? i.e. a config option
14:41:57 <zigo> An implementation in a better language could be cool too (C / Rust anyone ? :P )
14:42:05 <ralonsoh> initially we can implement both and make it configurable
14:42:22 <zigo> Maybe on a side project to depend on?
14:42:29 <ralonsoh> but there is no need for this
14:42:53 <ralonsoh> keepalived notify_script is expecting a simple script, for example a bash script
14:43:13 <ralonsoh> nothing complex (actually we need to write a file and send a socket message)
14:43:26 <zigo> Well, I still don't like the fact that it doesn't scale. Reducing the memory footprint is cool, but what if I get 6000 instances instead of 600 ?
14:43:54 <zigo> I'd say it's a good and fast approach to fix things quick.
14:44:11 <ralonsoh> sorry, I don't understand
14:44:26 <zigo> (well, to tell the truth: if I had 6000 HA router on a network node, I'd have other issues too ... :P)
14:44:46 <ralonsoh> if you have 6000 instances you will still need 6K instances of keepalived, not any python script to monitor the state change
14:44:55 <lajoskatona> what if we say: here's a reference implementation in shell, this is the interface in the documentation for how it should behave, do it yourself if you need something more
14:45:51 <slaweq> lajoskatona: you mean to make that notify script configurable?
14:46:08 <slaweq> and give operators a way to bring their own one
14:46:13 <zigo> ralonsoh: The point I'm making is that the current situation where we spawn a long-lived daemon forever isn't ideal, because (not depending on how much memory it takes) it's going to always grow...
14:46:17 <slaweq> if yes, I like that idea
14:46:26 <lajoskatona> slaweq: yes, something like that
14:46:29 <ralonsoh> zigo, what daemon?
14:46:42 <zigo> The current script no?
14:46:45 <ralonsoh> no
14:46:53 <ralonsoh> notify_scripts doesn't work like this
14:46:57 <zigo> Is it spawned, then dies?
14:47:01 <ralonsoh> yes
14:47:19 <zigo> Ok, then I shut up. :P
14:47:21 <ralonsoh> keepalived will call it when active/backup event happens
14:47:34 <ralonsoh> and the execution should take milliseconds
14:48:10 <zigo> Then maybe, my idea of writing it in something less heavy than bash isn't that stupid.
14:48:19 <zigo> ralonsoh: You'll be surprised how heavy bash is.
14:48:37 <zigo> If you do something in Bash, I'll write an MR to do it at least in dash... :P
14:48:52 <zigo> It will spawn at least 3 times faster.
14:49:15 <zigo> Anyways, yeah, all my support for this !
14:49:24 <ralonsoh> as commented before, we need someone to take the lead on this implementation
14:49:32 <ralonsoh> I won't be able to continue
14:49:45 <slaweq> I think that we can use "sh" maybe instead of bash but I wouldn't go with something like dash there
14:49:49 <ralonsoh> so if you are able to do this or someone else you know, that will be perfect
14:50:08 <zigo> slaweq: /bin/sh in Debian is dash, which is why I'm writing this.
14:50:11 <lajoskatona> yes we have to select something that first works in the CI :-)
14:50:27 <zigo> dash == iso shell, no bashism involved.
14:50:28 <slaweq> ok, I didn't know that
14:50:35 <ralonsoh> ok, let's keep the implementation for the patch
14:50:41 <haleyb> ok, we still have one more item on agenda so should finish up
14:50:42 <ralonsoh> we should vote for this idea
14:50:49 <haleyb> still need someone to own it
14:50:50 <zigo> ralonsoh: Your previous bash implementation was really fine, IMO.
14:50:58 <zigo> At least as a first approach.
14:51:07 <ralonsoh> that is discarded now
14:51:11 <zigo> :/
14:51:19 <haleyb> i vote to go forward with this, pending an owner
14:51:41 <slaweq> +1
14:51:43 <ralonsoh> +1
14:52:10 <lajoskatona> +1, I'll check with my management (I have anyway long discussions with them in the recent weeks....)
14:52:12 <zigo> I'm +1 on anything that fixes the current situation, whatever implementation that is.
14:52:44 <mlavalle> +1
14:52:50 <opendevreview> Bodo Petermann proposed openstack/neutron master: Allow plugins to add periodics to maintenance worker https://review.opendev.org/c/openstack/neutron/+/939817
14:52:54 <haleyb> ok, so i guess we can change the existing bug to have rfe tags, and i can send an email to list asking for an owner
14:53:30 <lajoskatona> +1
14:53:44 <haleyb> ok, last topic was from cardoe
14:53:51 <haleyb> and sorry for not having a lot of time
14:53:59 <cardoe> Well happy to do what I can.
14:54:24 <cardoe> I can throw a meetpad link but Ironic boxes will likely take longer than 5 minutes to boot all the way up
14:55:02 <ralonsoh> it's ok, no problem
14:55:04 <TheJulia> The ask was for some sort of demo to advance the basic understanding... so maybe why not and just keep going in a separate forum ?
14:55:23 <ralonsoh> we can close the drivers meeting and join the meetpad
14:55:31 <cardoe> https://meetpad.opendev.org/neutron-vxlan-demo
14:55:59 <lajoskatona> I am on mobile-net in a car currently, but I try....
14:56:48 <haleyb> alright i will end this meeting then, i don't have a conflict for 30 minutes
14:56:58 <haleyb> #endmeeting
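For reference, the short-lived notify handler ralonsoh describes under bug 1970944 ("write a file and send a socket message") could be sketched as below. Python is used here only to keep the example testable; the proposal in the meeting is a small shell script. keepalived calls a `notify` script with four arguments (type, name, state, priority); the state file location, socket path, and message format are hypothetical and not taken from the POC patch.

```python
import os
import socket
import tempfile

# States keepalived may pass as the third argument of a `notify` script.
VALID_STATES = {"MASTER", "BACKUP", "FAULT", "STOP"}


def write_state(state_file, state):
    """Atomically replace state_file with the new state string."""
    directory = os.path.dirname(state_file) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(state)
    os.replace(tmp_path, state_file)  # rename is atomic on the same filesystem


def notify_agent(socket_path, name, state):
    """Send a one-line message to a (hypothetical) agent Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(("%s %s\n" % (name, state)).encode())


def handle_transition(argv, state_file, socket_path=None):
    """Handle one keepalived transition: TYPE NAME STATE PRIORITY."""
    _type, name, state, _priority = argv[:4]
    if state not in VALID_STATES:
        raise ValueError("unexpected state: %s" % state)
    new_state = state.lower()
    write_state(state_file, new_state)
    if socket_path:
        notify_agent(socket_path, name, new_state)
    return new_state
```

A wrapper would call `handle_transition(sys.argv[1:], ...)` and exit, so nothing stays resident between transitions, which is the property zigo and ralonsoh settle on above.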