Wednesday, 2025-01-15

opendevreviewyatin proposed openstack/neutron master: Move neutron rally jobs to wsgi  https://review.opendev.org/c/openstack/neutron/+/93931505:41
opendevreviewyatin proposed openstack/neutron-tempest-plugin master: Always create router interface for ipv6 metadata test  https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/93910406:09
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: common: fix wait_until_true to support native thread  https://review.opendev.org/c/openstack/neutron/+/93784308:09
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent  https://review.opendev.org/c/openstack/neutron/+/93776508:10
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling  https://review.opendev.org/c/openstack/neutron/+/93932108:10
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: dnm: use native os-ken implementation  https://review.opendev.org/c/openstack/neutron/+/93848708:10
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: common: fix wait_until_true to support native thread  https://review.opendev.org/c/openstack/neutron/+/93784308:13
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent  https://review.opendev.org/c/openstack/neutron/+/93776508:13
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: dnm: use greenlet os-ken implementation not monkey-patched  https://review.opendev.org/c/openstack/neutron/+/93905708:13
ralonsohlajoskatona, slaweq hi folks, please check ykarel patch: https://review.opendev.org/c/openstack/neutron/+/93931508:36
ralonsohnew eventlet (surprise!) is breaking the Neutron CI08:36
slaweqralonsoh done08:38
ralonsohthanks!08:38
*** dmellado075539377 is now known as dmellado0755393708:49
ralonsohykarel, hi! I'08:56
ralonsoh(sorry)08:56
ralonsohI'm going to implement the fix for https://bugs.launchpad.net/neutron/+bug/209473608:57
ralonsohusing the port_binding08:57
ralonsohof course, in parallel, we need to figure out why we are missing the LSP events08:57
lajoskatonaralonsoh: thanks for the heads-up, I just got to this patch and the LP bug09:52
ykarelralonsoh, yes sure as mentioned over mail no objection from me on that parallel effort10:07
ykarellajoskatona, ralonsoh can you also check https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/93910411:05
ralonsohsure11:06
opendevreviewSlawek Kaplonski proposed openstack/neutron master: Add limit of tags for every resource  https://review.opendev.org/c/openstack/neutron/+/93788711:26
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling  https://review.opendev.org/c/openstack/neutron/+/93932111:36
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent  https://review.opendev.org/c/openstack/neutron/+/93776511:36
opendevreviewGaudenz Steinlin proposed openstack/neutron master: Fixup conntrackd support  https://review.opendev.org/c/openstack/neutron/+/93880013:29
ihrachysralonsoh: ykarel_ thanks for figuring out what happened with rally. so the fact that it was q-svc WAS important :)13:33
ralonsohihrachys, actually was ykarel_ only, I just didn't migrate the rally jobs... because of this issue13:34
opendevreviewRodolfo Alonso proposed openstack/neutron master: WIP == [OVN] ``PortBindingUpdateUpEvent``  https://review.opendev.org/c/openstack/neutron/+/93934513:37
ralonsohihrachys, ^^ btw13:37
ihrachysralonsoh: re LSP update missing, in pb switch, hiding is a concern. apparently with https://review.opendev.org/c/openstack/devstack/+/937606 we should have more logs for northd to confirm it translates pb up to LSP up.13:37
ralonsohihrachys, but I see the ovn logs that this is happening13:38
* haleyb hides in shame regarding q-svc13:38
ralonsohI've reviewed several jobs13:38
ihrachysyou mean northd translated?13:38
ralonsohand always the LSP.up=true happens13:38
ralonsohbut Neutron never receives it13:38
ihrachysok. so then it's ovsdb-server not notifying / idl layer not receiving for reasons13:38
ralonsohI think it is the IDL layer (I think, of course, with no proof)13:39
ihrachysare notify events logged by ovsdb-server?13:39
ralonsohwhat events?13:39
ralonsohI see the NB and SB events (for the port_binding and the LSP)13:40
ihrachysalso interesting if this can be seen with any versions of ovs/ovn or just the old ones from ubuntu repos13:40
ralonsohwe started hitting this issue since the wsgi migration13:40
ralonsoh(this is why I suspect this could be something related to the hash ring manager)13:40
ralonsohfor example, in the https://bugs.launchpad.net/neutron/+bug/2094736 description13:40
ihrachysralonsoh: AFAIU neutron-server registers a monitor with ovsdb-server. then on matching events, ovsdb-server would notify about them to corresponding monitors. correct? (if so then I'd expect some way to confirm ovsdb-server does it, on ovsdb side)13:41
ralonsohin these logs, all the NB and SB events are correctly defined: the port_binding.up and some usecs later the LSP.up13:41
ralonsohihrachys, ok, so that's extra info, for sure13:41
ralonsohihrachys, let me push a patch on top of yours13:41
ihrachysthe timing with uwsgi change is damning though, I agree.13:42
ralonsohwith repeated CI jobs and (that's important) more API workers13:42
*** ykarel_ is now known as ykarel13:43
opendevreviewRodolfo Alonso proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI  https://review.opendev.org/c/openstack/neutron/+/93934613:43
ralonsohihrachys, ^13:43
ralonsohahhhh no, wrong patch13:44
ykareland just to add we see all these issues with wsgi switch when running with >1 uwsgi workers13:44
opendevreviewRodolfo Alonso proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI  https://review.opendev.org/c/openstack/neutron/+/93934613:45
ralonsohdone ^ this patch is on top of a patch that uses the devstack one13:45
ralonsohlet's wait for the CI results13:45
opendevreviewRodolfo Alonso proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI - BUG 2094736  https://review.opendev.org/c/openstack/neutron/+/93934713:47
ihrachyslet's say hash ring is the culprit - somehow the worker that grabbed it is locked. I'd expect no events to be received at all then. but we see port-binding events happening, just not LSP. are these untangled in regards to locking?13:48
ralonsohihrachys, the events are hashed using the row.uuid13:50
ralonsohthe LSP.uuid and PB.uuid are not the same13:51
ralonsoheven though we are talking about the same Neutron port13:51
ralonsohso, as you said, the worker dealing with any other LSP event could be locked, but not the other one that matches pb.uuid13:52
ihrachysmeaning, one worker handles PB but not LSP. if LSP worker locked / died, we'll see pb updates but not lsps. if so, switching to pb won't do much?13:52
ihrachysI mean, now we will - sometimes - lose pb events :)13:53
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling  https://review.opendev.org/c/openstack/neutron/+/93932113:53
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent  https://review.opendev.org/c/openstack/neutron/+/93776513:53
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: async_process: remove usage of eventlet for AsyncProcess  https://review.opendev.org/c/openstack/neutron/+/93934813:53
ihrachys(that's the theory / rambling; we'll see if WIP changes anything I guess)13:53
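The point above, that events are hashed by row.uuid and that the LSP and PB rows of the same Neutron port have different uuids, can be illustrated with a toy consistent-hash ring. This is only a sketch under assumptions: the class name, the SHA-256 placement, the replica count and both uuids are hypothetical and are not Neutron's actual hash ring code.

```python
import hashlib
from bisect import bisect_right


class ToyHashRing:
    """Toy consistent-hash ring; NOT Neutron's hash ring implementation."""

    def __init__(self, nodes, replicas=32):
        # Place several virtual points per node on the ring.
        self._ring = sorted(
            (self._hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def get_node(self, row_uuid):
        """Return the node owning the first ring point after hash(row_uuid)."""
        if not self._ring:
            raise LookupError("hash ring is empty")
        idx = bisect_right(self._keys, self._hash(row_uuid)) % len(self._ring)
        return self._ring[idx][1]


ring = ToyHashRing(["worker-0", "worker-1"])
lsp_uuid = "11111111-1111-1111-1111-111111111111"  # hypothetical LSP row uuid
pb_uuid = "22222222-2222-2222-2222-222222222222"   # hypothetical PB row uuid
# The two rows are hashed independently, so they may land on different workers.
print(ring.get_node(lsp_uuid), ring.get_node(pb_uuid))
```

Because each row uuid is placed on its own, nothing ties the PB event and the LSP event of one port to the same worker, which is why a single stuck worker could lose one kind of event and not the other.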
opendevreviewLajos Katona proposed openstack/tap-as-a-service master: Doc: add documentation for usage and flow examples for OVS  https://review.opendev.org/c/openstack/tap-as-a-service/+/82838213:55
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: dnm: use native os-ken implementation  https://review.opendev.org/c/openstack/neutron/+/93848713:55
opendevreviewLajos Katona proposed openstack/tap-as-a-service master: Doc: add documentation for usage and driver details for SRIOV driver  https://review.opendev.org/c/openstack/tap-as-a-service/+/88180713:55
opendevreviewSahid Orentino Ferdjaoui proposed openstack/neutron master: dnm: use greenlet os-ken implementation not monkey-patched  https://review.opendev.org/c/openstack/neutron/+/93905713:56
lajoskatonaDear Neutron Cores: please review my documentation patches for tap-as-a-service: https://review.opendev.org/q/topic:%22taas_driver_docs%2213:58
lajoskatonaThanks in advance13:59
ralonsohlajoskatona, sure14:00
ihrachysis it normal that I see "Hash Ring loaded. 2 active nodes. 0 offline nodes" in logs posted every other minute? does it reinitialize idl or what?14:01
ihrachys(sometimes it's also "Hash Ring loaded. 1 active nodes. 0 offline nodes")14:02
ralonsohihrachys, it is reloaded, this is just a debug message14:02
ralonsohihrachys, in the initial transient period, the number can change14:03
ralonsohbut once all workers are loaded (they have updated the Neutron DB register)14:03
ralonsohthe number should be always static14:03
ralonsohand all must be active 14:03
ihrachyswell this is not at the start of the service; it's mid-flight14:05
ralonsohihrachys, so this is a problem14:05
ralonsohI have some patches there under review14:05
ralonsohone sec14:05
ralonsohhttps://review.opendev.org/c/openstack/neutron/+/93735114:06
ralonsohactually I need to check the comments14:06
ralonsohihrachys, the point is that we have a thread (per worker) that is in charge of refreshing the Neutron DB hashring register14:07
ralonsohif the hash ring reload method (other thread) reloads (reads the active nodes) and sees a non-updated one, we'll have this issue14:08
ralonsohso I would go for https://review.opendev.org/c/openstack/neutron/+/937351, at least during the eventlet deprecation14:08
ralonsohwe have issues with the GIL yield and the thread in charge of refreshing the hashring register is not executed on time14:09
ralonsoh(that should NOT happen with kernel/preemptive threads)14:09
opendevreviewyatin proposed openstack/neutron master: [DNM] repro functional failure with test order reversed  https://review.opendev.org/c/openstack/neutron/+/93775714:15
ihrachysin a log, I see for a while all events handled by one node only14:17
opendevreviewRodolfo Alonso proposed openstack/neutron master: [OVN] Reduce the OVN hash ring touch interval  https://review.opendev.org/c/openstack/neutron/+/93735114:17
ihrachysand just before the other one ceased to handle any, I see Hash Ring loaded. 0 active nodes. 2 offline nodes; then HashRing is empty, error: Hash Ring returned empty when hashing "b'22ef8001-b0d9-43fd-956b-0abd72515c54'". All 2 nodes were found offline. This should never happen in a normal situation, please check the status of your cluster:14:18
ralonsohihrachys, exactly, that happens if the hash ring loses one active node (by default, we have 2 API workers in the CI)14:18
ihrachysneutron.common.ovn.exceptions.HashRingIsEmpty: Hash Ring returned empty when hashing "b'22ef8001-b0d9-43fd-956b-0abd72515c54'". All 2 nodes were found offline. This should never happen in a normal situation, please check the status of your cluster14:18
ihrachysand then Hash Ring loaded. 1 active nodes. 1 offline nodes14:18
ralonsohplease check https://review.opendev.org/c/openstack/neutron/+/93735114:18
ihrachysok. so one node falls into offline. the other one should pick up its events, so it should not be the reason for lsp event lost either, right?14:20
ralonsohihrachys, yes, you are right14:20
ihrachysok looking more. so the worker that is actually handling events says "Hash Ring loaded. 2 active nodes. 0 offline nodes" but the worker that doesn't says "Hash Ring loaded. 1 active nodes. 0 offline nodes". that's just before the time when we miss the LSP update14:26
ihrachysthis looks like maybe there's a split reality situation14:27
ihrachysthe node that ceased to handle events expects the other one to pick up the work14:27
ihrachysbut the other one still believes the retired worker will continue handling its events?14:28
ralonsohihrachys, that's the point, each node refreshes its own hashring manager independently, based on the DB status14:28
ralonsohbut this refresh operation is not done at the same time14:29
ralonsohso yes, that could lead to a situation where 2 nodes can discard an event because it doesn't belong to them (according to their own local hashring managers)14:29
ihrachysok so is it then... inherently racy?14:29
ralonsohso far, it is the best implementation we have14:30
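The race described above, where each worker decides ownership from its own local view and both may discard the same event, can be reproduced in a few lines. This is a simplified sketch: plain modulo placement stands in for the real hash ring lookup, and the worker names, the views and the uuid are hypothetical.

```python
import uuid


def owner(view, row_uuid):
    """Pick the owner of a row from ONE worker's local list of active nodes.

    Simplified modulo placement, standing in for the real ring lookup.
    """
    nodes = sorted(view)
    return nodes[uuid.UUID(row_uuid).int % len(nodes)]


def handlers(views, row_uuid):
    """Return the workers that believe the event belongs to them."""
    return [
        worker
        for worker, view in sorted(views.items())
        if view and owner(view, row_uuid) == worker
    ]


row = "00000000-0000-0000-0000-000000000005"  # hypothetical row uuid (odd value)

consistent = {
    "worker-0": ["worker-0", "worker-1"],
    "worker-1": ["worker-0", "worker-1"],
}
split = {
    "worker-0": ["worker-0", "worker-1"],  # fresh view: both nodes active
    "worker-1": ["worker-0"],              # stale view: sees itself as offline
}

print(handlers(consistent, row))  # ['worker-1']
print(handlers(split, row))       # [] -> nobody handles the event
```

In the split case worker-0 expects worker-1 to handle the row, while worker-1 expects worker-0 to do it, so the event is dropped by both even though each worker applied its rule correctly.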
ihrachysok and to confirm I'm not piling up, just trying to make sense :)14:31
ihrachysso bumping timeouts in your patch is in hope that workers fall offline less often?14:31
ralonsohyes14:31
ralonsohand reducing the refresh time14:31
ihrachyssee, I'm slow but I'm getting there!14:32
ihrachysralonsoh: and what's the theory of why wsgi switch made it worse?14:43
ralonsohihrachys, to be honest, I'm not sure. But the main problems we have are related to the hash ring14:46
ralonsohmissing events, nodes offline, etc14:46
ralonsohthese problems are not present in Ml2/OVS14:47
ihrachysthe theory of touching the node more often seems reasonable BUT14:49
ihrachyswe touch nodes in notify()14:49
ihrachysand I see events handled by the worker-about-to-become-offline just second(s) before it goes offline14:49
ihrachysI'd think any ovsdb-monitor event would refresh the timer in db?14:50
ralonsohihrachys, it would be too much to always update a register in the DB14:59
ralonsohhaving said that, this table has no references to other tables (that removes xreferences issues), the table is small and we are not modifying the indexes15:00
ralonsohso this touch should be very fast 15:01
ralonsohihrachys, btw, we have CI results: https://review.opendev.org/c/openstack/neutron/+/93934615:02
ralonsohthis is on top of your change, that uses the devstack patch15:02
ihrachysralonsoh: we already touch; it's not a suggestion, it's in code15:03
ihrachyshttps://github.com/openstack/neutron/blob/585ea689d5d26356e28d8eb47f6d0511d21806cf/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L83515:04
ihrachyswe see in logs the debug messages from line 852 from the worker15:04
ralonsohah right, just after an event15:04
ihrachysI don't see how this message can pop up without node also being touched. (assuming 30 seconds passed). so that's weird.15:06
ralonsohihrachys, sorry, what do you mean?15:06
ihrachysthe linked code, it seems that if ovsdb-monitor event is being handled, and the node is past HASH_RING_CACHE_TIMEOUT then the event handler would also touch the node in db.15:08
ihrachysbut then it's not clear why it would immediately be seen as offline by the same worker15:08
ihrachysexcept there's some caching of timestamps involved, so what is written in db may not necessarily be reflected into cache. maybe it should...15:09
ralonsohnononono15:10
ralonsohhold on15:10
ralonsohself._last_touch is a local variable15:10
ralonsohnot the DB timestamp15:10
ihrachysyeah I know. there's also _node_last_touch inside the hash manager15:10
ihrachysso when it get_node(), it uses the cached timestamps15:11
ihrachysbut when it touches, it touches db; but at the same time, cache is not updated with the new timestamp.15:11
ihrachyswonder if it should?15:11
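The question above (the touch writes to the DB, but the cached timestamps used for node lookup are not updated) can be sketched with a toy manager. Everything here is an assumption for illustration: the class, the dict standing in for the DB table, the `refresh` flag and the timeout value are hypothetical and do not mirror Neutron's real signatures.

```python
class ToyHashRingManager:
    """Toy model of a per-worker hash ring cache; NOT Neutron's code."""

    def __init__(self, db, timeout=60):
        self._db = db            # shared dict: node -> last touch timestamp
        self._timeout = timeout  # seconds before a node counts as offline
        self._cache = {}         # this worker's local copy of the timestamps

    def refresh(self):
        """Reload the local cache from the (fake) DB."""
        self._cache = dict(self._db)

    def touch_node(self, node, now, refresh=False):
        """Write a fresh timestamp to the DB, optionally refreshing the cache."""
        self._db[node] = now
        if refresh:
            self.refresh()

    def active_nodes(self, now):
        """Nodes considered active according to the LOCAL cache only."""
        return sorted(
            node for node, ts in self._cache.items()
            if now - ts <= self._timeout
        )


# Without refresh: the DB is touched at t=50, but the cache still holds t=0.
stale = ToyHashRingManager({"worker-0": 0})
stale.refresh()
stale.touch_node("worker-0", now=50)
print(stale.active_nodes(now=100))  # []

# With refresh on touch: the worker sees its own fresh timestamp.
fresh = ToyHashRingManager({"worker-0": 0})
fresh.refresh()
fresh.touch_node("worker-0", now=50, refresh=True)
print(fresh.active_nodes(now=100))  # ['worker-0']
```

In the first case the worker considers itself offline until some other thread calls refresh(), even though it just wrote a valid timestamp to the DB.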
ralonsohbtw, I'm checking https://fdc1d05a9a337a8993b4-089607d394060d72ce519e30966a0033.ssl.cf2.rackcdn.com/939346/2/check/neutron-ovn-tempest-ipv6-only-ovs-release-wsgi-2/4eb0215/testr_results.html15:12
ralonsohin particular the port for instance 3c49676b-0677-4021-b6e6-4a6ee65f070415:12
ralonsohin https://fdc1d05a9a337a8993b4-089607d394060d72ce519e30966a0033.ssl.cf2.rackcdn.com/939346/2/check/neutron-ovn-tempest-ipv6-only-ovs-release-wsgi-2/4eb0215/controller/logs/ovn/ovn/ovsdb-server-nb_log.txt15:12
ralonsohthat is LSP.uuid=10146369-053d-446e-a5fe-ba09158c3b4515:13
ralonsohwhere is the LSP.up field defined there?15:13
opendevreviewIhar Hrachyshka proposed openstack/neutron master: WIP: refresh hash ring timestamp cache on touch  https://review.opendev.org/c/openstack/neutron/+/93935415:13
ihrachyslike in ^ (sorry too lazy to stash some other changes that I have locally)15:13
ralonsohihrachys, nonono15:14
ralonsohhttps://review.opendev.org/c/openstack/neutron/+/93683815:14
ralonsohI explicitly added this parameter15:15
ralonsohin order to read the init time (passed via WSGI config)15:15
ihrachysignore this. look at where I add refresh() to touch15:15
ihrachys(should have backed these changes out not to confuse you)15:16
ihrachysjust this https://review.opendev.org/c/openstack/neutron/+/939354/1/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#84515:16
ralonsohihrachys, but this is actually the opposite to https://review.opendev.org/c/openstack/neutron/+/93735115:17
ihrachyswithout this, our touch of db does not affect the worker view of hash ring state (including its own state) and if the maint thread doesn't get to refresh() on time, then the touch was for naught (for this current worker)15:17
ralonsohyes but this refresh also reads other hash ring registers15:18
ralonsoh(that should be updated, of course)15:18
ihrachyslet it do that, not sure what the problem with it is (but if that's a concern, we can refresh just ourselves)15:19
ralonsohihrachys, ok, let's push a patch with this line change only15:19
ihrachysthe reduction of interval for cache update may also be good. not sure about cache timeout bump.15:19
ralonsohihrachys, please check the CI logs that I mentioned: https://fdc1d05a9a337a8993b4-089607d394060d72ce519e30966a0033.ssl.cf2.rackcdn.com/939346/2/check/neutron-ovn-tempest-ipv6-only-ovs-release-wsgi-2/4eb0215/testr_results.html15:20
ihrachys(actually I should probably do refresh=True there too)15:20
ihrachyswill check in a sec, let me push the refresh one first15:21
ralonsohI don't see the LSP.UP event15:21
ralonsohsure15:21
ihrachys(scratch above, refresh means refresh=True already)15:21
ralonsohno no, sorry, it is there15:22
opendevreviewIhar Hrachyshka proposed openstack/neutron master: ovn: Refresh hash ring timestamp cache on touch_node  https://review.opendev.org/c/openstack/neutron/+/93935715:26
opendevreviewIhar Hrachyshka proposed openstack/neutron master: WIP: nit: Remove _last_touch attribute for OvnIdlDistributedLock  https://review.opendev.org/c/openstack/neutron/+/93935915:29
ihrachysralonsoh: how can I test the patch with refresh I wonder?.. is there a way to thrash the gate to the point that it would always fail with the problem?15:29
ralonsohihrachys, sure, one sec15:30
ralonsohcherry pick this https://review.opendev.org/c/openstack/neutron/+/939346/. Remove the change-id (to create a new one)15:30
ralonsohpush it on top of your patch15:30
opendevreviewIhar Hrachyshka proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI  https://review.opendev.org/c/openstack/neutron/+/93936015:32
ihrachysralonsoh: to confirm, nothing to check in your logs above?15:33
ralonsohihrachys, well, only to confirm that the problem is happening again15:33
ralonsohwe have the LSP.up event15:33
ralonsoh2025-01-15T14:03:05.984Z|06409|jsonrpc|DBG|ssl:[2607:5300:201:2000::743]:40650: send notification, method="update3", params=["cfcf56ee-d348-11ef-aefb-fa163e340894","00000000-0000-0000-0000-000000000000",{"Logical_Switch_Port":{"b95d0ee3-cb36-4b02-87c9-1d07a051ab36":{"modify":{"up":true}}}}]15:33
ralonsohbut nothing is received in Neutron API15:34
ihrachysok great that we can validate ovsdb-server is doing the right thing anyway15:34
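The check done by eye on the ovsdb-server debug line above could be scripted roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the log line ends with the JSON params array and that the table updates are the last element of the update3 params.

```python
import json


def lsp_up_rows(log_line):
    """Return uuids of Logical_Switch_Port rows modified to up=true.

    Hypothetical helper; assumes the line ends with 'params=<json array>'.
    """
    _, sep, raw = log_line.partition("params=")
    if not sep:
        return []
    params = json.loads(raw.strip())
    table_updates = params[-1]  # assumed: last element holds the table updates
    rows = table_updates.get("Logical_Switch_Port", {})
    return sorted(
        row_uuid for row_uuid, change in rows.items()
        if change.get("modify", {}).get("up") is True
    )


sample = (
    '2025-01-15T14:03:05.984Z|06409|jsonrpc|DBG|'
    'ssl:[2607:5300:201:2000::743]:40650: send notification, '
    'method="update3", params=["cfcf56ee-d348-11ef-aefb-fa163e340894",'
    '"00000000-0000-0000-0000-000000000000",'
    '{"Logical_Switch_Port":{"b95d0ee3-cb36-4b02-87c9-1d07a051ab36":'
    '{"modify":{"up":true}}}}]'
)
print(lsp_up_rows(sample))  # ['b95d0ee3-cb36-4b02-87c9-1d07a051ab36']
```

Running this over the whole ovsdb-server log would list every LSP that the server announced as up, which can then be compared with the events that Neutron actually logged.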
ihrachysI'll have to step down from this hash ring issue for the next few hours. will check results in ci later. thanks ralonsoh for bearing with me stoopid questions :p15:35
ralonsoha pleasure15:36
opendevreviewIhar Hrachyshka proposed openstack/neutron master: nit: Remove unused updated_at argument for touch_node  https://review.opendev.org/c/openstack/neutron/+/93936616:39
lajoskatonaotherwiseguy: Hi, there's an ovsdbapp bug perhaps you can decide if it can be something to check in detail: https://bugs.launchpad.net/ovsdbapp/+bug/2093247 , thanks in advance16:53
-opendevstatus- NOTICE: The paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server.17:08
opendevreviewLajos Katona proposed openstack/tap-as-a-service master: Doc: add documentation for usage and driver details for SRIOV driver  https://review.opendev.org/c/openstack/tap-as-a-service/+/88180717:11
lajoskatonahaleyb: Hi, if you have some free time for doc patches for taas: https://review.opendev.org/q/topic:%22taas_driver_docs%22  ;)17:12
opendevreviewMerged openstack/neutron master: Move neutron rally jobs to wsgi  https://review.opendev.org/c/openstack/neutron/+/93931517:24
ihrachysstill events missed with my refresh() patch, now in log I see:17:38
ihrachysCRITICAL neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance [-] The number of nodes in the Hash Ring (4) is higher than the number of API workers (2) for host "np0039575603". Something is not right and OVSDB events could be missed because of this. Please check the status of the Neutron processes, this can happen when the API workers are killed and restarted.17:38
ihrachysRestarting the service should fix the issue, see LP #2024205 for more information.17:38
ihrachysit's weird though, don't we set processes: 4 in https://review.opendev.org/c/openstack/neutron/+/939360/1/zuul.d/tempest-singlenode.yaml#82117:40
opendevreviewIhar Hrachyshka proposed openstack/neutron master: refactor: Remove OvnIdlDistributedLock._last_touch attribute  https://review.opendev.org/c/openstack/neutron/+/93935917:58
opendevreviewIhar Hrachyshka proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI  https://review.opendev.org/c/openstack/neutron/+/93936018:05
opendevreviewBrian Haley proposed openstack/neutron master: Optionally configure IPv6 metadata address  https://review.opendev.org/c/openstack/neutron/+/92649718:13
opendevreviewMerged openstack/neutron-tempest-plugin master: Always create router interface for ipv6 metadata test  https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/93910418:31
opendevreviewIhar Hrachyshka proposed openstack/neutron master: functional: Handle ovsdb monitor returning inserts in different checks  https://review.opendev.org/c/openstack/neutron/+/93938418:50
haleybihrachys: oh, that seems to make the functional tests happy ^^20:44
ihrachysone can hope, yes20:49
haleybi ran a recheck to double-check20:51
opendevreviewIhar Hrachyshka proposed openstack/neutron master: When storing _last_time_loaded for hash ring, use a lower timestamp  https://review.opendev.org/c/openstack/neutron/+/93939720:52
ihrachysI mean, I haven't even run the tests with the change, it's straight from my brain :p20:53
ihrachyshaleyb: it's interesting that this test started to misbehave, don't you think21:07
ihrachyssince we are dealing with missed ovn events from monitor elsewhere21:08
ihrachysI don't think anyone touched the test case since forever21:08
haleybno, but we did bump to a later OVN/OVS recently i think21:08
ihrachyshaleyb: apparently unit tests now trigger sudo? https://zuul.opendev.org/t/openstack/build/31c980b9fdb5498ca392ec863c6d137021:20
ihrachysbtw I also noticed that my mac asked me for a fingerprint for sudo when I ran tests yesterday; I denied it and thought it was something about macOS, but it looks like maybe something crept into the code base21:21
haleybihrachys: probably a missing mock somewhere, that's my bet21:21
ihrachysreported here https://bugs.launchpad.net/neutron/+bug/209504421:23
ihrachysyeah, probably a mock21:23
ihrachysI also saw some other tests calling e.g. tc (for qos?) that failed on mac, which suggests that some more mocks may be missing (since unit tests should never call into the system)21:23
haleybthere is at least one other bug for a missing mock that i filed, for segment tests https://bugs.launchpad.net/neutron/+bug/203837321:25
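The "missing mock" theory above refers to the usual pattern of patching whatever shells out, so that a unit test never reaches sudo or tc. A generic sketch follows; the helper and the device name are hypothetical and are not taken from the Neutron tree.

```python
import subprocess
from unittest import mock


def set_link_up(device):
    """Hypothetical helper that would shell out with sudo in production."""
    return subprocess.run(
        ["sudo", "ip", "link", "set", device, "up"], check=True
    )


def test_set_link_up_does_not_touch_system():
    # Patch subprocess.run so no real command (and no sudo prompt) is executed.
    with mock.patch.object(subprocess, "run") as fake_run:
        set_link_up("tap0")
    fake_run.assert_called_once_with(
        ["sudo", "ip", "link", "set", "tap0", "up"], check=True
    )


test_set_link_up_does_not_touch_system()
```

If the patch is missing, or targets the wrong name, the real command runs, which would match the sudo prompt seen when running the unit tests locally.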
ihrachysin other news, functional is sometimes busted for a different reason, see https://zuul.opendev.org/t/openstack/build/d8a35748226140408d088f5273a7199921:32
ihrachysovsdbapp.exceptions.TimeoutException: Commands [AddBridgeCommand(_result=None, name=ovs-test-d9a790, may_exist=True, datapath_type=system), DbAddCommand(_result=None, table=Bridge, record=ovs-test-d9a790, column=protocols, values=('OpenFlow13', 'OpenFlow14', 'OpenFlow10')), DbSetCommand(_result=None, table=Bridge, record=ovs-test-d9a790, col_values=(('other_config',21:32
ihrachys{'mac-table-size': '50000'}),), if_exists=True)] exceeded timeout 30 seconds, cause: TXN queue is full21:32
ihrachysoh fun21:32
haleybmy head hurts21:33
ihrachysthis exact error was mentioned in the rally bug report https://bugs.launchpad.net/neutron/+bug/209497021:33
ihrachys(about TXN queue is full)21:33
ihrachyshaleyb: being a certified Debby Downer, I'll say that I think the team gets into a habit of letting failures pile up (bare rechecks etc.) until it becomes untenable... then the heads indeed hurt :p21:35
haleybihrachys: that does happen sometimes and expect $(someone_else) to fix them, it's hard to spend half a day tracking down a single failure and there's been >2 people doing it recently21:42
haleybi do want to know how you managed to get haproxy in that call trace, it's not remotely near that codepath21:45
ihrachysit's not me, it's zuul21:52
ihrachysthat's on a very complex patch, see https://review.opendev.org/c/openstack/neutron/+/939272 :)21:53
ihrachyssomething tells me this is not the patch that broke it21:53
haleyb@mock.patch('eventlet.spawn_n') - i will blame it on eventlet since that's the easy thing to do :)21:53
haleybthat patch! ygbfkm21:54
ihrachys:) i'll blame both mock and eventlet21:56
haleybyahtzee21:57
ihrachysstill unclear what happens with hash ring members. apparently the thread to touch nodes is not running / blocked and they over time fall into offline state, sometimes. apparently no one has a good theory of why the thread is blocked, except a vague notion of GIL issues?21:59
haleybthat is maybe a question for rodolfo22:23
haleybi'll be back again to look tomorrow22:42
opendevreviewMerged openstack/neutron master: functional: Handle ovsdb monitor returning inserts in different checks  https://review.opendev.org/c/openstack/neutron/+/93938422:49
opendevreviewIhar Hrachyshka proposed openstack/neutron master: WIP: ovn: don't attempt to create router port when no fixed-ips are set  https://review.opendev.org/c/openstack/neutron/+/93925322:51
opendevreviewIhar Hrachyshka proposed openstack/neutron master: Add option to configure live migration activation strategy for OVN  https://review.opendev.org/c/openstack/neutron/+/93810622:51
opendevreviewIhar Hrachyshka proposed openstack/neutron master: Remove shorturls from code  https://review.opendev.org/c/openstack/neutron/+/93927222:51
opendevreviewIhar Hrachyshka proposed openstack/neutron master: refactor: Remove OvnIdlDistributedLock._last_touch attribute  https://review.opendev.org/c/openstack/neutron/+/93935922:51
opendevreviewIhar Hrachyshka proposed openstack/neutron master: nit: Remove unused updated_at argument for touch_node  https://review.opendev.org/c/openstack/neutron/+/93936622:52
opendevreviewIhar Hrachyshka proposed openstack/neutron master: Remove linuxbridge driver  https://review.opendev.org/c/openstack/neutron/+/92721622:53
