opendevreview | Lajos Katona proposed openstack/neutron master: If OVS Manager creation failes retry to set values https://review.opendev.org/c/openstack/neutron/+/939117 | 08:14 |
---|---|---|
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: async_process: fix potential race condition with respawn https://review.opendev.org/c/openstack/neutron/+/939627 | 08:20 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: async_process: remove usage of eventlet for AsyncProcess https://review.opendev.org/c/openstack/neutron/+/939348 | 08:20 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: common: fix wait_until_true to support native thread https://review.opendev.org/c/openstack/neutron/+/937843 | 08:20 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling https://review.opendev.org/c/openstack/neutron/+/939321 | 08:20 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765 | 08:20 |
slaweq | ralonsoh ykarel or lajoskatona hi, can one of you check and approve https://review.opendev.org/c/openstack/neutron/+/938135/? It already has 2 x +2 and the parent patch was just merged tonight | 08:27 |
ralonsoh | let me check | 08:27 |
ralonsoh | sure | 08:27 |
slaweq | thx, and also https://review.opendev.org/c/openstack/neutron/+/937887 if you have time | 08:27 |
ralonsoh | one sec | 08:27 |
ralonsoh | ahh you implemented one_or_none() | 08:28 |
slaweq | yes, I used it as you advised me :) | 08:29 |
ralonsoh | if you have time, these short 3 patches | 08:29 |
ralonsoh | https://review.opendev.org/c/openstack/neutron/+/939095/4 | 08:29 |
ralonsoh | https://review.opendev.org/c/openstack/neutron/+/939097/5 | 08:29 |
ralonsoh | https://review.opendev.org/c/openstack/neutron/+/939210/5 | 08:29 |
slaweq | ralonsoh also those backports https://review.opendev.org/c/openstack/neutron/+/937887 :) | 08:29 |
slaweq | and sure, I will look at your patches right now :) | 08:30 |
ralonsoh | sure | 08:30 |
lajoskatona | slaweq, ralonsoh: good morning, I'll check it | 08:30 |
ralonsoh | slaweq, what backports?? | 08:30 |
slaweq | thx lajoskatona | 08:30 |
slaweq | https://review.opendev.org/q/Ic11992ba3ed91980189efbacdc2a54fba64fcf7c | 08:30 |
ralonsoh | ahhh yes | 08:30 |
slaweq | sorry, I copied wrong link before :) | 08:30 |
ralonsoh | btw, yesterday I was talking to Terry and I think we know how to fix the hash ring manager instability with wsgi. I'm pushing today a patch (that will be better than describing it here) | 08:32 |
slaweq | great, I hope it will work and our gate will be more stable :) | 08:33 |
slaweq | I +2'ed your patches already | 08:35 |
ralonsoh | thanks | 08:35 |
slaweq | and approved 2 of them which had other +2 | 08:35 |
slaweq | one of them you just rechecked and it needs another +2 still | 08:35 |
slaweq | so maybe lajoskatona or ykarel will look into it too | 08:36 |
ralonsoh | that would be perfect | 08:36 |
ralonsoh | closing some eventlet-removal bits | 08:36 |
opendevreview | Vasyl Saienko proposed openstack/neutron master: Add link to Octavia and SRIOV limitations from generic OVN Gaps page https://review.opendev.org/c/openstack/neutron/+/939946 | 08:39 |
ralonsoh | haleyb, hello! Please check https://review.opendev.org/c/zuul/zuul-jobs/+/940074 comments. As you know, because of the comment in twine, they jumped from 20.04 to 24.04, with the issues during the installation | 09:15 |
ralonsoh | and the job is still failing in all openstack... | 09:16 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ovn-bgp-agent master: Ensure that ARP/NDP is enabled for vlan devices https://review.opendev.org/c/openstack/ovn-bgp-agent/+/935801 | 10:36 |
opendevreview | Merged openstack/neutron master: Make API policies for tags to be working with resource attributes https://review.opendev.org/c/openstack/neutron/+/938135 | 11:14 |
opendevreview | Anton Kurbatov proposed openstack/neutron master: Fix DHCP agent events throttling malfunction https://review.opendev.org/c/openstack/neutron/+/939970 | 11:20 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: [eventlet-removal][OVN] Require wsgi start-time in the config https://review.opendev.org/c/openstack/neutron/+/940123 | 12:05 |
f0o | ralonsoh: at risk of asking outdated information; I just came across your proposal to implement a new l3 scheduler for OVNGatewayHAChassisGroup while investigating why all HA routers are located on one network node instead of being split among both (least loaded does suggest this split but I guess not..?) - I saw your WIP Patch was abandoned, did it get replaced by | 12:25 |
f0o | something similar? Am I barking up the wrong tree? | 12:25 |
opendevreview | Michel Nederlof proposed openstack/ovn-bgp-agent master: Fix cleanup of rules per evpn device https://review.opendev.org/c/openstack/ovn-bgp-agent/+/927816 | 12:33 |
ralonsoh | f0o, the patch I started 1 year ago to implement a new L3 scheduler using HA_Chassis_Group makes no sense with this new requirement if we are going to migrate the current schedulers to HA_Chassis_Group | 12:33 |
ralonsoh | right now, the HA_Chassis_Group is used for external ports only | 12:34 |
ralonsoh | when the HA_Chassis list is built, no scheduling method is considered; it just retrieves the list and that's all | 12:34 |
ralonsoh | thus, it could be possible to also include some kind of scheduling in the external ports HA_Chassis_Group creation, in order to balance the GW chassis | 12:36 |
opendevreview | yatin proposed openstack/neutron master: [DNM] Check functional failure https://review.opendev.org/c/openstack/neutron/+/940128 | 12:37 |
f0o | I feel like either I misunderstood how L3HA is supposed to work on OVN or my setup is not working correctly. What I see is that all routers are shifted to the other node on BFD failure, which is great. But all routers always remain on one chassis which is a shame and makes it a bit difficult to scale since we can't just add more nodes but need to add _bigger_ nodes to scale | 12:37 |
ralonsoh | f0o, the L3 scheduler default algorithm is leastloaded, that should "share" the GW ports across all GW nodes | 12:41 |
ralonsoh | how many GW chassis do you have? do you have availability zones? | 12:41 |
f0o | 2 GW Nodes (https://paste.opendev.org/show/bQ7LLYGm7OEyEskI9vJv/) - no AZs | 12:43 |
ralonsoh | f0o, yes but are you sure the chassis are GW nodes? please check the ovs local config | 12:44 |
ralonsoh | one sec | 12:44 |
ralonsoh | root@u22ovn:~# ovs-vsctl list open . | grep enable-chassis-as-gw | 12:44 |
ralonsoh | external_ids : {hostname=u22ovn, ovn-bridge=br-int, ovn-bridge-mappings="public:br-ex", ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="192.168.10.100", ovn-encap-type=geneve, ovn-remote="tcp:192.168.10.100:6642", system-id="3c4a168c-ac2a-496d-9c04-1a1ced01052a"} | 12:44 |
ralonsoh | check this in each host | 12:44 |
f0o | I got that set on both rt1 and rt2 | 12:45 |
f0o | or does it need to be set on the compute hosts too? I don't run distributed FIPs because we don't want to drag the public vlan into all hypervisors | 12:45 |
ralonsoh | no if you don't want dvr, that's ok | 12:46 |
ralonsoh | but there is something weird in this distribution | 12:46 |
ralonsoh | 1 lrp in rt2 only | 12:46 |
f0o | (https://paste.opendev.org/show/bzhH9TyDBSMAqy65juDJ/ if you want to see the output) | 12:46 |
ralonsoh | no, that's fine | 12:47 |
f0o | yeah that's what confused me too, it's not 100% on rt1 and 0 on rt2, there's one router that has no special config or range or setup that is on rt2 | 12:47 |
ralonsoh | so what are you doing? just create a router and assign the public network, right? | 12:47 |
f0o | I triple checked that it is set to leastloaded and there was no notable downtime of rt2 that could have skewed the scheduling | 12:47 |
f0o | correct, just create router and add to network | 12:47 |
opendevreview | Merged openstack/neutron stable/2023.2: Make sure that policy enforcer is initialized before use https://review.opendev.org/c/openstack/neutron/+/938913 | 12:48 |
f0o | If there is a way to rebalance these bindings retroactively I could just make a daily cronjob and call it a day haha | 12:48 |
ralonsoh | ok, there could be something broken in the algorithm with 2 nodes. I think the tests are now using 3 or more GW chassis | 12:48 |
ralonsoh | but there could be some weird corner case with 2 GW nodes only | 12:48 |
ralonsoh | no, right now there is no way to do this (in an automated way) | 12:49 |
ralonsoh | you can: | 12:49 |
ralonsoh | 1) propose the implementation of this tool (this is something that has been discussed before). Please open a LP bug | 12:49 |
ralonsoh | 2) open another LP bug for the L3 issue with 2 GW nodes | 12:50 |
ralonsoh | this is something that can be tested with UTs | 12:50 |
f0o | My memory is a bit weak here but I recall an LP bug/feature-request about this tool a while back... need to comb my history | 12:51 |
ralonsoh | f0o, what version are you running? there were many new features/changes during the last year | 12:51 |
f0o | but from my very limited understanding; the tool only needs to change the chassis priorities in OVN right? | 12:51 |
f0o | running openstack-ansible 2024.2 | 12:51 |
f0o | let me dig out the tooling versions | 12:52 |
ralonsoh | f0o, yes, you need to change the HA_Chassis priority in sync with the other records associated with this HA_Chassis_Group | 12:52 |
ralonsoh | but this tool is dangerous because that implies a change in the LRP binding | 12:52 |
ralonsoh | and that implies breaking the active traffic | 12:53 |
ralonsoh | so this is not a trivial tool for a live env | 12:53 |
f0o | so when you mention HA_Chassis_Group; None of my LRPs have that populated; it's all `[]` but the rt* ids show in status field as hosting-chassis | 12:54 |
ralonsoh | sorry | 12:55 |
ralonsoh | not hcg, this is still NOT implemented | 12:55 |
ralonsoh | I mean gateway_chassis | 12:55 |
f0o | https://paste.opendev.org/show/bXH1riYAo1Tzuxw4LqcF/ << gateway chassis seems to only include the Hypervisors with active VMs for me | 12:56 |
ralonsoh | the second one is incorrect | 12:56 |
ralonsoh | by default the gateway_chassis list can have up to 5 elements (that is hardcoded) | 12:57 |
ralonsoh | so if you have 2 GW chassis, that should have 2 elements | 12:57 |
ralonsoh | something didn't work well with the scheduler and 1 of the GW chassis got rejected | 12:57 |
f0o | oh let me go through the other lrp's then, maybe the rest also only has 1 entry | 12:58 |
f0o | also disregard that statement on hypervisor uuids; I grepped all chassis and the UUIDs are the GW nodes just with different Ids over and over again | 12:59 |
f0o | ralonsoh: https://paste.opendev.org/show/b8BqVZnyXqzhFEaYB6NM/ All have 2 entries apart from that one that is on rt2. So somehow rt1 has full priority over rt2 and the only router that's on rt2 is there because it wasn't scheduled on rt1 due to a bug? :D | 13:04 |
ralonsoh | f0o, I would need to test the scheduler algorithm with 2 nodes, that could be done with UTs I think (this is just python code) | 13:06 |
f0o | Interestingly enough, the lrp that's only on rt2 is just a standard router no different than the rest | 13:07 |
f0o | fun :) | 13:07 |
opendevreview | Michel Nederlof proposed openstack/ovn-bgp-agent master: Fix running sync method for every external_ids update. https://review.opendev.org/c/openstack/ovn-bgp-agent/+/940129 | 13:10 |
f0o | so if I wanted to create a tool to migrate the LRPs priorities around I would just issue ovn-nbctl lrp-set-gateway-chassis lrp-123 chassis-with-prio-1 3 && ovn-nbctl lrp-set-gateway-chassis lrp-123 chassis-with-prio-2 1 && ovn-nbctl lrp-set-gateway-chassis lrp-123 chassis-with-prio-1 2 - which would move the chassis with prio 1 to prio 3 (making it active), demote the | 13:11 |
f0o | previously active chassis to prio 1 and ultimately set the new primary chassis to prio 2 to retain numbering standards | 13:11 |
f0o | obviously this would not be 1,2,3 but something a bit bigger like 1,2,...n,n+1 to avoid collision | 13:12 |
opendevreview | Michel Nederlof proposed openstack/ovn-bgp-agent master: Fix running sync method for every external_ids update. https://review.opendev.org/c/openstack/ovn-bgp-agent/+/940129 | 13:12 |
f0o | I understand that this is quite the jackhammer method, I'd just like to know if this would even work since a fix in the assignment algo is not going to backpropagate to these existing lrp's | 13:13 |
f0o | do I need to inform/mess with neutron in any way or is this entire thing confined to OVN? (that would be very nice tbh) | 13:14 |
zigo | ralonsoh: We still continue to have openvswitch-agent turning in loops. The type of logs we're getting: | 13:17 |
zigo | https://paste.opendev.org/show/b6rGYiuIj0QqyzvSAi5X/ | 13:17 |
zigo | It also happens in some compute nodes that are running with no VM, that's weird... | 13:17 |
opendevreview | Michel Nederlof proposed openstack/ovn-bgp-agent master: Expose floating ips attached to virtual ports https://review.opendev.org/c/openstack/ovn-bgp-agent/+/940132 | 13:31 |
opendevreview | Michel Nederlof proposed openstack/ovn-bgp-agent master: Expose floating ips attached to virtual ports https://review.opendev.org/c/openstack/ovn-bgp-agent/+/940132 | 13:36 |
opendevreview | Michel Nederlof proposed openstack/ovn-bgp-agent master: Fix running sync method for every external_ids update. https://review.opendev.org/c/openstack/ovn-bgp-agent/+/940129 | 13:41 |
ralonsoh | f0o, this kind of tool affects Neutron only. We are OVN users and OVN is not responsible for creating the gateway_chassis priorities | 13:48 |
ralonsoh | if you are going to create something like this, I would suggest first creating an LP bug with [RFE] in the title | 13:48 |
ralonsoh | and present it in the Neutron drivers meeting (Fridays at 14UTC) | 13:49 |
ralonsoh | today there is no meeting because there is no agenda | 13:49 |
ralonsoh | the agenda link (to add the topic) | 13:49 |
ralonsoh | https://wiki.openstack.org/wiki/Meetings/NeutronDrivers | 13:49 |
ralonsoh | zigo, let me check | 13:50 |
ralonsoh | zigo, you receive the RPC call, then the agent may or may not act on it | 13:51 |
zigo | Thanks. | 13:51 |
ralonsoh | if there are no ports affected, then no action is done | 13:51 |
ralonsoh | but you'll see the RPC event received | 13:51 |
ralonsoh | l2pop is a bit chatty, but that makes sense if any agent needs to receive all events from all VMs | 13:52 |
ralonsoh | so anytime you create a new VM and spawn a port (or many), you'll receive that in every OVS agent | 13:52 |
zigo | Yeah, but there's some req-IDs that have been ongoing for more than a day. | 13:53 |
zigo | That feels weird ... | 13:54 |
zigo | I now believe we had this before, though what changed with my upgrade is that I'm now in debug True. | 13:54 |
zigo | So maybe I'm just seeing things I didn't before. | 13:54 |
f0o | ralonsoh: thanks gotcha - will probably do so in the near future. unfortunately I'm swamped today (family keeping me on my toes) | 13:55 |
haleyb | ralonsoh: yes, will try to get that updated today | 14:00 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ovn-bgp-agent master: Ensure that ARP/NDP is enabled for vlan devices https://review.opendev.org/c/openstack/ovn-bgp-agent/+/935801 | 14:01 |
ralonsoh | haleyb, thanks! | 14:01 |
ralonsoh | btw, please check this patch | 14:01 |
ralonsoh | https://review.opendev.org/c/openstack/neutron/+/939095 | 14:01 |
opendevreview | Bence Romsics proposed openstack/neutron master: Do not assume the existence of a trunk bridge since os-vif may have deleted it https://review.opendev.org/c/openstack/neutron/+/939786 | 14:01 |
ykarel | froyo_, when you get a chance please check https://bugs.launchpad.net/neutron/+bug/2095807, maybe that's already known and a duplicate bug | 14:02 |
zigo | ralonsoh: It really is the case that it's just because we switched to debug=True, so we didn't know what was going on. Though I wonder why there is that much continuous traffic for L2 stuff, it still doesn't feel right. | 14:02 |
ralonsoh | the point is that you need all the events and then filter them depending on the local ports | 14:03 |
ralonsoh | but you need to receive all of them in the first place | 14:03 |
zigo | Even in a compute with no VMs ? | 14:04 |
ralonsoh | yes, you are subscribed to all events, then the OVS agent will filter the ones needed | 14:05 |
ralonsoh | you can check that code | 14:05 |
zigo | Ok, thanks, understood. | 14:05 |
ralonsoh | that said, OVS agent is not the most scalable architecture | 14:05 |
zigo | I really see in the rabbit monitoring page that it's a broadcast thingy. | 14:05 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ovn-bgp-agent master: Ensure that ARP/NDP is enabled for vlan devices https://review.opendev.org/c/openstack/ovn-bgp-agent/+/935801 | 14:07 |
zigo | ralonsoh: Thanks for the details, this really helps. We were kind of scared, I'm not anymore, and I can continue my upgrades! :) | 14:10 |
ralonsoh | yw | 14:11 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ovn-bgp-agent master: Ensure that ARP/NDP is enabled for vlan devices https://review.opendev.org/c/openstack/ovn-bgp-agent/+/935801 | 14:33 |
opendevreview | Slawek Kaplonski proposed openstack/neutron master: Add limit of tags for every resource https://review.opendev.org/c/openstack/neutron/+/937887 | 14:34 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: [eventlet-removal][OVN] Require wsgi start-time in the config https://review.opendev.org/c/openstack/neutron/+/940123 | 14:41 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ovn-bgp-agent master: Ensure that ARP/NDP is enabled for vlan devices https://review.opendev.org/c/openstack/ovn-bgp-agent/+/935801 | 14:48 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: async_process: fix potential race condition with respawn https://review.opendev.org/c/openstack/neutron/+/939627 | 15:07 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: async_process: remove usage of eventlet for AsyncProcess https://review.opendev.org/c/openstack/neutron/+/939348 | 15:07 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: common: fix wait_until_true to support native thread https://review.opendev.org/c/openstack/neutron/+/937843 | 15:07 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling https://review.opendev.org/c/openstack/neutron/+/939321 | 15:07 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765 | 15:07 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: polling: remove usage of eventlet.sleep() https://review.opendev.org/c/openstack/neutron/+/940136 | 15:07 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: WIP == [eventlet-removal] OVN hash ring manager reimplementation https://review.opendev.org/c/openstack/neutron/+/940140 | 15:20 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: WIP == [eventlet-removal] OVN hash ring manager reimplementation https://review.opendev.org/c/openstack/neutron/+/940140 | 16:03 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI https://review.opendev.org/c/openstack/neutron/+/939977 | 16:06 |
ralonsoh | otherwiseguy, ^^ this patch is on top of the re-implementation of the hash ring. Now we just initialize the hash ring but we don't refresh it | 16:06 |
* otherwiseguy | looks | 16:07 |
ralonsoh | I need to leave now but I'll reconnect in 3 hours | 16:07 |
opendevreview | Brian Haley proposed openstack/neutron master: Optionally configure IPv6 metadata address https://review.opendev.org/c/openstack/neutron/+/926497 | 16:12 |
opendevreview | Brian Haley proposed openstack/neutron master: Optionally configure IPv6 metadata address https://review.opendev.org/c/openstack/neutron/+/926497 | 16:29 |
opendevreview | Merged openstack/neutron master: [eventlet-removal] Reimplement ``common.utils.spawn_n`` https://review.opendev.org/c/openstack/neutron/+/939095 | 17:23 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling https://review.opendev.org/c/openstack/neutron/+/939321 | 18:16 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765 | 18:16 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI https://review.opendev.org/c/openstack/neutron/+/939977 | 18:31 |
opendevreview | Bernard Cafarelli proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI https://review.opendev.org/c/openstack/neutron/+/939977 | 19:30 |
ralonsoh | bcafarel, thanks! | 19:30 |
bcafarel | np, I can still do 1-char typo fix :) | 19:31 |
opendevreview | Jakub Libosvar proposed openstack/neutron master: Update NAT entry on FIP update https://review.opendev.org/c/openstack/neutron/+/939918 | 21:47 |
opendevreview | Jakub Libosvar proposed openstack/ovn-bgp-agent master: Change DVR FIP events to monitor the NAT table https://review.opendev.org/c/openstack/ovn-bgp-agent/+/940174 | 22:46 |
opendevreview | Merged openstack/neutron master: Use is_cidr_host utils to detect if AAP ip is host in l3_dvr_db https://review.opendev.org/c/openstack/neutron/+/939075 | 23:50 |
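
ralonsoh notes at 12:48 and 13:06 that the two-GW-node case could be checked with unit tests, since the scheduler is just Python code. The block below is only a toy model of a least-loaded policy, not Neutron's scheduler: the function name `schedule`, the tie-break by chassis name, and the way load is counted (number of LRPs a chassis leads) are all assumptions made for illustration. The cap of 5 gateway chassis is the one mentioned at 12:57.

```python
from collections import Counter
from typing import Dict, List

MAX_GW_CHASSIS = 5  # cap on the gateway_chassis list, as stated in the log


def schedule(lrps: List[str], chassis: List[str]) -> Dict[str, List[str]]:
    """Toy least-loaded policy (hypothetical, NOT Neutron's implementation).

    For each LRP, chassis are ordered by how many LRPs they already lead;
    the first entry of each returned list is the highest-priority chassis.
    """
    if not chassis:
        raise ValueError("at least one gateway chassis is required")
    lead_count = Counter({c: 0 for c in chassis})
    bindings: Dict[str, List[str]] = {}
    for lrp in lrps:
        # Least-loaded chassis first; the name is only a deterministic tie-break.
        ordered = sorted(chassis, key=lambda c: (lead_count[c], c))
        ordered = ordered[:MAX_GW_CHASSIS]
        bindings[lrp] = ordered
        lead_count[ordered[0]] += 1
    return bindings
```

With two chassis named `rt1` and `rt2` as in f0o's setup, a unit test would expect every LRP to list both chassis and the leading chassis to alternate, which is the balanced outcome f0o is not seeing.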
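
The three-step priority swap f0o describes at 13:11 can be sketched as a small planner that only builds the `ovn-nbctl lrp-set-gateway-chassis` command strings and runs nothing. The function name `swap_plan` and the priority dictionary are hypothetical; `lrp-123`, `rt1` and `rt2` are the names used in the conversation. The temporary priority is one above the current maximum, following f0o's remark about avoiding collisions.

```python
from typing import Dict, List


def swap_plan(lrp: str, priorities: Dict[str, int], new_active: str) -> List[str]:
    """Build (but do not run) the commands that make ``new_active`` lead ``lrp``.

    ``priorities`` maps chassis name -> current gateway_chassis priority,
    where the highest value is the active chassis. Hypothetical helper.
    """
    if new_active not in priorities:
        raise ValueError(f"{new_active} is not a gateway chassis of {lrp}")
    current_active = max(priorities, key=priorities.get)
    if current_active == new_active:
        return []  # already the highest priority, nothing to change
    top = priorities[current_active]
    low = priorities[new_active]
    temp = top + 1  # temporary value above every existing priority
    return [
        # 1) lift the new chassis above everything, making it active
        f"ovn-nbctl lrp-set-gateway-chassis {lrp} {new_active} {temp}",
        # 2) demote the previously active chassis to the vacated priority
        f"ovn-nbctl lrp-set-gateway-chassis {lrp} {current_active} {low}",
        # 3) bring the new chassis back to the original top priority
        f"ovn-nbctl lrp-set-gateway-chassis {lrp} {new_active} {top}",
    ]
```

For example, `swap_plan("lrp-123", {"rt1": 2, "rt2": 1}, "rt2")` yields the same 3, 1, 2 sequence f0o wrote out. As ralonsoh points out, changing the priority changes the LRP binding and interrupts active traffic, and the priorities are owned by Neutron rather than OVN, so this is a sketch of the idea and not a procedure validated for a live environment.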