sahid | ralonsoh: morning | 07:38 |
---|---|---|
sahid | sorry to jump on you like that, but you may have a hint, can you have a look at the unit tests? https://24e2a8e2d1a77fa9bdf5-a6c8485b6a6aeffda6bfb9fca2c239bc.ssl.cf2.rackcdn.com/939321/23/check/openstack-tox-py39/7b25c30/testr_results.html | 07:38 |
sahid | a lot are failing but I'm not able at that point to identify the reason | 07:38 |
sahid | locally they work | 07:39 |
sahid | i'm sure it's a tiny thing... | 07:39 |
sahid | locally: tox -epy3 -- neutron.tests.unit.plugins.ml2.drivers.openvswitch.agent.test_ovs_neutron_agent.TestOvsNeutronAgentOSKen | 07:42 |
sahid | Ran: 171 tests in 4.4855 sec. - Passed: 171 - Skipped: 0 - Expected Fail: 0 - Unexpected Success: 0 - Failed: 0 | 07:43 |
sahid | Sum of execute time for each test: 40.2115 sec. | 07:43 |
sahid | hum perhaps i should rm my py3 env | 07:43 |
sahid | that works, even after having reset the env... | 07:44 |
ralonsoh | sahid, so the errors in the test suite are restricted to the worker {3} | 07:51 |
ralonsoh | https://24e2a8e2d1a77fa9bdf5-a6c8485b6a6aeffda6bfb9fca2c239bc.ssl.cf2.rackcdn.com/939321/23/check/openstack-tox-py39/7b25c30/job-output.txt | 07:51 |
ralonsoh | and they start after executing one of your new tests | 07:51 |
ralonsoh | 2025-01-30 15:58:24.643291 | ubuntu-focal | {3} neutron.tests.unit.plugins.ml2.drivers.openvswitch.agent.openflow.native.test_ovs_oskenapp.TestSignalHandling.test_signal_initialization [0.001730s] ... ok | 07:52 |
ralonsoh | this test is doing something wrong | 07:52 |
ralonsoh | try sending this patch without this test (commenting it) | 07:52 |
ralonsoh | the UTs will pass | 07:52 |
ralonsoh | I've reproduced that locally but executing the whole test suite | 07:54 |
ralonsoh | that means, executing test_signal_initialization too | 07:54 |
ralonsoh | I'm trying now without this test | 07:54 |
ralonsoh | sahid, to be honest, I don't see any benefit from this test. This is just testing that some methods are executed. But these methods are not in a branch or condition | 07:58 |
ralonsoh | this test is not useful | 07:58 |
sahid | yes... not useful and it looks like it breaks something | 07:59 |
sahid | i'm trying to see how i can articulate the change | 08:00 |
ralonsoh | this test can check that the signal_handler is called | 08:00 |
ralonsoh | this is passed to ovs_agent main | 08:00 |
ralonsoh | and you spawn it in the hub | 08:00 |
ralonsoh | then the daemon_loop must be called | 08:01 |
ralonsoh | it is important, and I think that's what is breaking the test suite | 08:01 |
ralonsoh | to stop the app | 08:01 |
ralonsoh | this must be defined in a cleanUp method | 08:01 |
sahid | thanks ralonsoh let me see if i can try to follow your reco | 08:03 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling https://review.opendev.org/c/openstack/neutron/+/939321 | 08:26 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765 | 08:26 |
sahid | ok I have removed the uninteresting tests and added a cleanup to stop the hub. let's see if that fixes our problem; in the meantime I will work on a test that goes through ovs-agent to call daemon_loop | 08:27 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: [OVN] Refresh host nodes before notifying https://review.opendev.org/c/openstack/neutron/+/940256 | 09:28 |
opendevreview | Rodolfo Alonso proposed openstack/neutron stable/2024.1: [OVN] Add the network type to the ``Logical_Switch`` register https://review.opendev.org/c/openstack/neutron/+/940453 | 09:43 |
opendevreview | Rodolfo Alonso proposed openstack/neutron stable/2024.1: [OVN] Set reside-on-chassis-redirect also for FLAT networks https://review.opendev.org/c/openstack/neutron/+/940454 | 10:01 |
opendevreview | Rodolfo Alonso proposed openstack/neutron stable/2023.2: [OVN] Add the network type to the ``Logical_Switch`` register https://review.opendev.org/c/openstack/neutron/+/940502 | 10:13 |
opendevreview | Rodolfo Alonso proposed openstack/neutron stable/2023.2: [OVN] Set reside-on-chassis-redirect also for FLAT networks https://review.opendev.org/c/openstack/neutron/+/940503 | 10:14 |
*** bpetermann is now known as Guest7571 | 10:18 | |
opendevreview | Lajos Katona proposed openstack/neutron-tempest-plugin master: Tap Mirror API and scenario tests https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/886004 | 10:22 |
opendevreview | Lajos Katona proposed openstack/networking-bagpipe master: CFG: add help text for OVS dataplane driver cfg options https://review.opendev.org/c/openstack/networking-bagpipe/+/933735 | 10:22 |
opendevreview | Lajos Katona proposed openstack/tap-as-a-service master: Do not set ageing in case of system datapath type https://review.opendev.org/c/openstack/tap-as-a-service/+/922400 | 10:23 |
opendevreview | Merged openstack/neutron stable/2023.2: Clean up state VRRP PID file https://review.opendev.org/c/openstack/neutron/+/940446 | 10:42 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: [OVN] Refactor ``HashRingManager`` sync method https://review.opendev.org/c/openstack/neutron/+/940342 | 10:47 |
opendevreview | Rodolfo Alonso proposed openstack/neutron stable/2023.2: [OVN] Add the network type to the ``Logical_Switch`` register https://review.opendev.org/c/openstack/neutron/+/940502 | 10:58 |
opendevreview | Rodolfo Alonso proposed openstack/neutron stable/2023.2: [OVN] Set reside-on-chassis-redirect also for FLAT networks https://review.opendev.org/c/openstack/neutron/+/940503 | 10:58 |
opendevreview | Merged openstack/neutron master: Check existence of GW port before trying to delete it https://review.opendev.org/c/openstack/neutron/+/939451 | 12:47 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling https://review.opendev.org/c/openstack/neutron/+/939321 | 13:02 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765 | 13:02 |
haleyb | #startmeeting neutron_drivers | 14:00 |
opendevmeet | Meeting started Fri Jan 31 14:00:17 2025 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 14:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 14:00 |
opendevmeet | The meeting name has been set to 'neutron_drivers' | 14:00 |
haleyb | Ping list: ykarel, mlavalle, mtomaska, slaweq, obondarev, tobias-urdin, lajoskatona, amotoki, haleyb, ralonsoh | 14:00 |
mlavalle | \o | 14:00 |
ralonsoh | hello | 14:00 |
obondarev | o/ | 14:00 |
haleyb | will wait another minute for quorum | 14:01 |
lajoskatona | o/ | 14:01 |
haleyb | i had pinged daniel about the one item on our agenda, just noticed he said he might be late | 14:02 |
f0o | o/ I'm here | 14:02 |
haleyb | oh hi | 14:02 |
f0o | Hi! literally just got my laptop open in time hehe | 14:02 |
haleyb | we can get started then | 14:02 |
haleyb | #link https://bugs.launchpad.net/neutron/+bug/2096704 | 14:02 |
haleyb | [RFE] Allow rebalancing of OVN LRPs | 14:02 |
f0o | Shoot, how can I help clarifying the RFE :) | 14:03 |
haleyb | so i think you had talked to ralonsoh last week about this? it does seem like a gap to me | 14:03 |
ralonsoh | well, not a gap | 14:04 |
ralonsoh | but a useful tool when using OVN L3 | 14:04 |
mlavalle | seems a sensible tool | 14:04 |
f0o | :) | 14:04 |
lajoskatona | Tuesday ralonsoh said it would be a tool, not an API, am I right about it? | 14:05 |
ralonsoh | yes, that should be a tool | 14:05 |
ralonsoh | but f0o can explain how it should be and the goal | 14:05 |
f0o | For us it would scratch the itch that Nova already has covered with migrations, we can always rebalance our compute nodes with some maintenance window/s but as for the network nodes they seem destined to just overflow and give in - having little space for horizontal scaling as we cannot reassign the LRP priorities to the new nodes | 14:06 |
ralonsoh | yes, but this is the use case | 14:08 |
ralonsoh | how should this tool work? | 14:08 |
ralonsoh | why is it needed? | 14:08 |
haleyb | are there other options? etc | 14:09 |
haleyb | f0o: are you still here? | 14:10 |
f0o | I can only speak for us, we are happy with a one-shot command that we can Cron or issue on-demand when we see capacity imbalances. But if there's a better option akin to feedback-loops where neutron would check on chassis-events if it needs to rebalance or not, that can also work. However I'm unsure how invasive the changing of LRP priorities is | 14:10 |
f0o | I dug a bit and saw that I could just YOLO change the priorities of the lrp<>chassis but I have no idea how that affects neutron if I were to do that with a bash-loop | 14:11 |
f0o | The situation we are in now is a deadlocked one. The only rectifying way I can see to get the priorities shifted is to remove all routers and readd them which breaks FIPs and causes big outages | 14:11 |
ralonsoh | how did you end in this situation? | 14:12 |
f0o | I'm not even sure how one GN ended up with being the prio for all lrps - but that's out of scope because even if thats fixed in $future, the bad situation wont be fixed retrospectively | 14:12 |
ralonsoh | did you remove GW chassis? | 14:12 |
f0o | we had no notable downtime of GW chassis, we had a few maintenances that made a few failovers but nothing longer than an hour | 14:12 |
f0o | during which we didn't accept public neutron endpoint requests to prevent customers from creating routers | 14:13 |
ralonsoh | but you removed GW chassis from the deployment | 14:13 |
f0o | we did not, we just quit ovs and performed the maintenance and rebooted | 14:13 |
ralonsoh | did you stop GW chassis? | 14:14 |
f0o | yeah via systemctl stop ovs-vswitchd | 14:14 |
ralonsoh | so this is why OVN rebalanced the LRP assigned chassis | 14:15 |
ralonsoh | so you manually made an action that triggered the rebalance of the assigned chassis | 14:15 |
f0o | likely; it was our understanding that the LRP priorities would go back to the previously assigned chassis | 14:15 |
ralonsoh | eventually, when the high prio chassis is restored, the assigned chassis should be changed back, yes | 14:16 |
ralonsoh | but if that didn't happen, then you need to check your OVN chassis list | 14:16 |
ralonsoh | in any case, about this tool, it should be an admin only tool, of course | 14:16 |
f0o | what I saw when we spoke is that the rt1 chassis has prio 2 and the rt1 chassis has prio 1 on all lrps | 14:17 |
f0o | so the chassis is there, but its low prio on all lrps, instead of regaining prio 2 on some of them | 14:17 |
ralonsoh | yes but you didn't tell me that you rebooted systems | 14:17 |
ralonsoh | that was missing in this conversation | 14:17 |
ralonsoh | in any case | 14:17 |
ralonsoh | first of all, you should also open a bug related to the OVN L3 scheduler | 14:18 |
f0o | how should the maintenance have been performed to avoid this in the future? because some actions require reboots | 14:18 |
ralonsoh | if with 2 GW chassis the balance is not correctly done, this could be an algorithm issue | 14:18 |
ralonsoh | f0o, as commented, when you restart a GW chassis, OVN restores the high prio chassis assigned | 14:18 |
f0o | if I were to add a 3rd GW chassis now, how would the existing LRPs be redistributed so that the new GW wouldnt be empty until a new LRP is created? | 14:18 |
ralonsoh | f0o, by default, the underlying architecture is usually static in a deployment | 14:19 |
ralonsoh | you are talking about upgrade/maintenance operations | 14:19 |
ralonsoh | and this is fine but this is not normal operation | 14:19 |
ralonsoh | do not expect the OVN L3 scheduler to re-balance | 14:20 |
f0o | if the L3 Scheduler won't rebalance then I need to be able to do so, no? | 14:20 |
f0o | or at least have the option to do so if I wanted to. Otherwise I will have a dying node permanently and no way to avoid it since it just happens to have more LRPs historically | 14:21 |
ralonsoh | again, I think this tool is something nice to have and useful **after cluster modifications** | 14:22 |
f0o | +1 | 14:22 |
mlavalle | f0o: would you implement it? | 14:22 |
f0o | mlavalle: I have no domain knowledge of how external changes to the OVN LRP prios would affect neutron internals. But I can create a tool that just scans all LRPs and all chassis and just redistributes the prios to have a balanced sheet. If neutron needs to be informed of this and somebody can point me to _how_ I could inform neutron of this change so the DB (I guess?) is in | 14:24 |
f0o | sync then I can do that as well | 14:24 |
f0o | this is where we are now halted as well, if we are able to freely reassign prios without neutron going haywire from it (AKA no notifications needed) then I can make a PoC very quickly | 14:25 |
ralonsoh | I would need to check but this is only OVN NB config: there is a list of gateway_chassis registers per LRP and they are assigned to the LRP.gateway_chassis list | 14:27 |
ralonsoh | Neutron DB doesn't play a role here | 14:27 |
f0o | so then issuing ovn-nbctl lrp-set-gateway-chassis lrp-123 chassis-with-prio-1 900 on the northd hosts should cause no problems, right? | 14:28 |
ralonsoh | hold on, if you are creating a neutron script, you should use the mechanisms provided by Neutron | 14:29 |
ralonsoh | we are not talking about a bash script here | 14:29 |
ralonsoh | check any other Neutron script | 14:29 |
lajoskatona | yeah like remove_duplicated_port_bindings.py | 14:30 |
lajoskatona | https://opendev.org/openstack/neutron/src/branch/master/neutron/cmd/remove_duplicated_port_bindings.py | 14:30 |
ralonsoh | that's a good example | 14:30 |
f0o | ok, I'll have a look at the OVN NB codebase in neutron and see what I can access/use there | 14:31 |
mlavalle | +1 | 14:32 |
ralonsoh | maybe neutron_ovn_db_sync_util, that has access to the OVN DB could be more useful | 14:32 |
ralonsoh | do we need a spec for this RFE? | 14:32 |
ralonsoh | I don't think so | 14:32 |
ralonsoh | but should be properly documented | 14:32 |
haleyb | if it's small in scope i would agree we don't need a spec | 14:33 |
haleyb | i think it should be separate from neutron-ovn-db-sync to be clear | 14:34 |
ralonsoh | yes, this is just an example | 14:34 |
ralonsoh | so should we vote then? | 14:34 |
mlavalle | +1 | 14:35 |
ralonsoh | +1 | 14:35 |
obondarev | +1 | 14:35 |
lajoskatona | +1 with good documentation what the user can expect using the tool | 14:35 |
haleyb | +1 | 14:35 |
f0o | +1 (Not even sure if mine counts ;) | 14:35 |
mlavalle | lol, it doesn't but it is welcome | 14:36 |
lajoskatona | your development will count more :-) | 14:36 |
haleyb | f0o: ok, can you write-up the summary here and add to the bug? | 14:36 |
haleyb | i will approve it | 14:36 |
f0o | haleyb: will do | 14:37 |
haleyb | we can then just comment in any patch | 14:37 |
haleyb | f0o: thanks for bringing up the issue and thanks for working on it | 14:37 |
f0o | Happy to help :) | 14:38 |
haleyb | is there anything else people want to discuss? | 14:38 |
haleyb | ok, then thanks for attending, and have a good weekend! | 14:39 |
haleyb | #endmeeting | 14:39 |
opendevmeet | Meeting ended Fri Jan 31 14:39:19 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 14:39 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/neutron_drivers/2025/neutron_drivers.2025-01-31-14.00.html | 14:39 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/neutron_drivers/2025/neutron_drivers.2025-01-31-14.00.txt | 14:39 |
opendevmeet | Log: https://meetings.opendev.org/meetings/neutron_drivers/2025/neutron_drivers.2025-01-31-14.00.log.html | 14:39 |
ralonsoh | bye | 14:39 |
mlavalle | \o | 14:39 |
f0o | o/ | 14:39 |
* f0o puts on the secretary hat and types away | 14:40 | |
lajoskatona | Bye | 14:40 |
f0o | took a look at neutron.common.ovn and I think it has mostly everything. Might need to reimplement something akin to _sync_ha_chassis_group just for non chassis groups I guess. | 14:48 |
f0o | will do more digging over the weekend hopefully | 14:48 |
opendevreview | Rodolfo Alonso proposed openstack/neutron stable/2024.1: [OVN] Set reside-on-chassis-redirect also for FLAT networks https://review.opendev.org/c/openstack/neutron/+/940454 | 15:04 |
opendevreview | Rodolfo Alonso proposed openstack/neutron stable/2023.2: [OVN] Set reside-on-chassis-redirect also for FLAT networks https://review.opendev.org/c/openstack/neutron/+/940503 | 15:07 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: [OVN] Refresh host nodes before notifying https://review.opendev.org/c/openstack/neutron/+/940256 | 15:15 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: [OVN] Refactor ``HashRingManager`` sync method https://review.opendev.org/c/openstack/neutron/+/940342 | 15:37 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765 | 16:13 |
sahid | o/ if you have a moment, how do you feel about this one? | 16:16 |
sahid | https://review.opendev.org/c/openstack/neutron/+/939321 | 16:16 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ovn-bgp-agent master: Add option to avoid VRF removal https://review.opendev.org/c/openstack/ovn-bgp-agent/+/940537 | 16:32 |
priteau | Hello. Could we get another +2 for this backport? Thanks! https://review.opendev.org/c/openstack/neutron/+/940463 | 16:37 |
bcafarel | priteau: let me check that | 16:39 |
opendevreview | Merged openstack/neutron stable/2024.1: [OVN] Add the network type to the ``Logical_Switch`` register https://review.opendev.org/c/openstack/neutron/+/940453 | 17:48 |
opendevreview | Merged openstack/neutron stable/2023.2: [OVN] Add the network type to the ``Logical_Switch`` register https://review.opendev.org/c/openstack/neutron/+/940502 | 17:49 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add LB sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/925324 | 18:15 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add Listener sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/931250 | 18:15 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add Pool sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/931266 | 18:15 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add Member sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/931267 | 18:15 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add Health monitor sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/931288 | 18:15 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add sync floating IP support https://review.opendev.org/c/openstack/ovn-octavia-provider/+/929039 | 18:15 |
opendevreview | Merged openstack/neutron unmaintained/2023.1: Clean up state VRRP PID file https://review.opendev.org/c/openstack/neutron/+/940463 | 18:37 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add Listener sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/931250 | 18:39 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add Pool sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/931266 | 18:39 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add Member sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/931267 | 18:39 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add Health monitor sync logic https://review.opendev.org/c/openstack/ovn-octavia-provider/+/931288 | 18:40 |
opendevreview | Fernando Royo proposed openstack/ovn-octavia-provider master: Add sync floating IP support https://review.opendev.org/c/openstack/ovn-octavia-provider/+/929039 | 18:40 |
opendevreview | Merged openstack/neutron master: [eventlet-removal] Use non-eventlet metadata proxy in OVN metadata agent https://review.opendev.org/c/openstack/neutron/+/938393 | 19:34 |
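
The rebalancing idea from the neutron_drivers meeting above ("scans all LRPs and all chassis and just redistributes the prios") can be sketched in plain Python. This is only an illustration of one possible balancing rule (round-robin rotation of the chassis list), not the approved tool and not Neutron or OVN code. The function names `balance_priorities` and `primary_load`, and the LRP and chassis names, are hypothetical. Applying the resulting plan to the OVN NB `gateway_chassis` registers is deliberately left out; per the discussion, a real tool should go through the mechanisms Neutron provides.

```python
from collections import Counter


def balance_priorities(lrps, chassis):
    """Return {lrp: {chassis: priority}} with the top priority spread evenly.

    Hypothetical helper: each LRP gets the chassis list rotated by its
    index, so consecutive LRPs prefer different chassis. A higher number
    means a higher priority.
    """
    if not chassis:
        raise ValueError("at least one gateway chassis is required")
    ordered = sorted(chassis)
    count = len(ordered)
    plan = {}
    for index, lrp in enumerate(sorted(lrps)):
        offset = index % count
        rotated = ordered[offset:] + ordered[:offset]
        # The first chassis in the rotation gets the highest priority.
        plan[lrp] = {name: count - rank for rank, name in enumerate(rotated)}
    return plan


def primary_load(plan):
    """Count how many LRPs have each chassis as their top-priority gateway."""
    return Counter(max(prios, key=prios.get) for prios in plan.values())


if __name__ == "__main__":
    # Made-up names, only for illustration.
    lrps = [f"lrp-{n}" for n in range(6)]
    plan = balance_priorities(lrps, ["gw1", "gw2", "gw3"])
    for lrp, prios in plan.items():
        print(lrp, prios)
    print(primary_load(plan))
```

With six LRPs and three gateway chassis, each chassis ends up as the top-priority gateway for two LRPs. Adding a new chassis to the input and recomputing the plan would give it a share of the existing LRPs, which is the case f0o raised about a third gateway node staying empty.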