Wednesday, 2025-01-15

opendevreview	yatin proposed openstack/neutron master: Move neutron rally jobs to wsgi https://review.opendev.org/c/openstack/neutron/+/939315	05:41
opendevreview	yatin proposed openstack/neutron-tempest-plugin master: Always create router interface for ipv6 metadata test https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/939104	06:09
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: common: fix wait_until_true to support native thread https://review.opendev.org/c/openstack/neutron/+/937843	08:09
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765	08:10
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling https://review.opendev.org/c/openstack/neutron/+/939321	08:10
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: dnm: use native os-ken implementation https://review.opendev.org/c/openstack/neutron/+/938487	08:10
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: common: fix wait_until_true to support native thread https://review.opendev.org/c/openstack/neutron/+/937843	08:13
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765	08:13
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: dnm: use greenlet os-ken implementation not monkey-patched https://review.opendev.org/c/openstack/neutron/+/939057	08:13
ralonsoh	lajoskatona, slaweq hi folks, please check ykarel patch: https://review.opendev.org/c/openstack/neutron/+/939315	08:36
ralonsoh	new eventlet (surprise!) is breaking the Neutron CI	08:36
slaweq	ralonsoh done	08:38
ralonsoh	thanks!	08:38
*** dmellado075539377 is now known as dmellado07553937		08:49
ralonsoh	ykarel, hi! I'	08:56
ralonsoh	(sorry)	08:56
ralonsoh	I'm going to implement the fix for https://bugs.launchpad.net/neutron/+bug/2094736	08:57
ralonsoh	using the port_binding	08:57
ralonsoh	of course, in parallel, we need to figure out why we are missing the LSP events	08:57
lajoskatona	ralonsoh: thanks for the heads-up, just got to this patch and the LP bug	09:52
ykarel	ralonsoh, yes sure as mentioned over mail no objection from me on that parallel effort	10:07
ykarel	lajoskatona, ralonsoh can you also check https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/939104	11:05
ralonsoh	sure	11:06
opendevreview	Slawek Kaplonski proposed openstack/neutron master: Add limit of tags for every resource https://review.opendev.org/c/openstack/neutron/+/937887	11:26
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling https://review.opendev.org/c/openstack/neutron/+/939321	11:36
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765	11:36
opendevreview	Gaudenz Steinlin proposed openstack/neutron master: Fixup conntrackd support https://review.opendev.org/c/openstack/neutron/+/938800	13:29
ihrachys	ralonsoh: ykarel_ thanks for figuring out what happened with rally. so the fact that it was q-svc WAS important :)	13:33
ralonsoh	ihrachys, actually was ykarel_ only, I just didn't migrate the rally jobs... because of this issue	13:34
opendevreview	Rodolfo Alonso proposed openstack/neutron master: WIP == [OVN] ``PortBindingUpdateUpEvent`` https://review.opendev.org/c/openstack/neutron/+/939345	13:37
ralonsoh	ihrachys, ^^ btw	13:37
ihrachys	ralonsoh: re LSP update missing, in pb switch, hiding is a concern. apparently with https://review.opendev.org/c/openstack/devstack/+/937606 we should have more logs for northd to confirm it translates pb up to LSP up.	13:37
ralonsoh	ihrachys, but I see the ovn logs that this is happening	13:38
* haleyb hides in shame regarding q-svc		13:38
ralonsoh	I've reviewed several jobs	13:38
ihrachys	you mean northd translated?	13:38
ralonsoh	and always the LSP.up=true happens	13:38
ralonsoh	but Neutron never receives it	13:38
ihrachys	ok. so then it's ovsdb-server not notifying / idl layer not receiving for reasons	13:38
ralonsoh	I think it is the IDL layer (I think, of course, with no proof)	13:39
ihrachys	are notify events logged by ovsdb-server?	13:39
ralonsoh	what events?	13:39
ralonsoh	I see the NB and SB events (for the port_binding and the LSP)	13:40
ihrachys	also interesting if this can be seen with any versions of ovs/ovn or just the old ones from ubuntu repos	13:40
ralonsoh	we started hitting this issue since the wsgi migration	13:40
ralonsoh	(this is why I suspect this could be something related to the hash ring manager)	13:40
ralonsoh	for example, in the https://bugs.launchpad.net/neutron/+bug/2094736 description	13:40
ihrachys	ralonsoh: AFAIU neutron-server registers a monitor with ovsdb-server. then on matching events, ovsdb-server would notify about them to corresponding monitors. correct? (if so then I'd expect some way to confirm ovsdb-server does it, on ovsdb side)	13:41
ralonsoh	in these logs, all the NB and SB events are correctly defined: the port_binding.up and some usecs later the LSP.up	13:41
ralonsoh	ihrachys, ok, so that's extra info, for sure	13:41
ralonsoh	ihrachys, let me push a patch on top of yours	13:41
ihrachys	the timing with uwsgi change is damning though, I agree.	13:42
ralonsoh	with repeated CI jobs and (that's important) more API workers	13:42
*** ykarel_ is now known as ykarel		13:43
opendevreview	Rodolfo Alonso proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI https://review.opendev.org/c/openstack/neutron/+/939346	13:43
ralonsoh	ihrachys, ^	13:43
ralonsoh	ahhhh no, wrong patch	13:44
ykarel	and just to add we see all these issues with wsgi switch when running with >1 uwsgi workers	13:44
opendevreview	Rodolfo Alonso proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI https://review.opendev.org/c/openstack/neutron/+/939346	13:45
ralonsoh	done ^ this patch is on top of a patch that uses the devstack one	13:45
ralonsoh	let's wait for the CI results	13:45
opendevreview	Rodolfo Alonso proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI - BUG 2094736 https://review.opendev.org/c/openstack/neutron/+/939347	13:47
ihrachys	let's say hash ring is the culprit - somehow the worker that grabbed it is locked. I'd expect no events to be received at all then. but we see port-binding events happening, just not LSP. are these untangled in regards to locking?	13:48
ralonsoh	ihrachys, the events are hashed using the row.uuid	13:50
ralonsoh	the LSP.uuid and PB.uuid are not the same	13:51
ralonsoh	even though we are talking about the same Neutron port	13:51
ralonsoh	so, as you said, the worker dealing with any other LSP event could be locked, but not the other one that matches pb.uuid	13:52
ihrachys	meaning, one worker handles PB but not LSP. if LSP worker locked / died, we'll see pb updates but not lsps. if so, switching to pb won't do much?	13:52
ihrachys	I mean, now we will - sometimes - lose pb events :)	13:53
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: reimplement signals handling https://review.opendev.org/c/openstack/neutron/+/939321	13:53
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: ovs: remove the usage of eventlet in the OVS agent https://review.opendev.org/c/openstack/neutron/+/937765	13:53
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: async_process: remove usage of eventlet for AsyncProcess https://review.opendev.org/c/openstack/neutron/+/939348	13:53
ihrachys	(that's the theory / rambling; we'll see if WIP changes anything I guess)	13:53
opendevreview	Lajos Katona proposed openstack/tap-as-a-service master: Doc: add documentation for usage and flow examples for OVS https://review.opendev.org/c/openstack/tap-as-a-service/+/828382	13:55
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: dnm: use native os-ken implementation https://review.opendev.org/c/openstack/neutron/+/938487	13:55
opendevreview	Lajos Katona proposed openstack/tap-as-a-service master: Doc: add documentation for usage and driver details for SRIOV driver https://review.opendev.org/c/openstack/tap-as-a-service/+/881807	13:55
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/neutron master: dnm: use greenlet os-ken implementation not monkey-patched https://review.opendev.org/c/openstack/neutron/+/939057	13:56
lajoskatona	Dear Neutron Cores: please review my documentation patches for tap-as-a-service: https://review.opendev.org/q/topic:%22taas_driver_docs%22	13:58
lajoskatona	Thanks in advance	13:59
ralonsoh	lajoskatona, sure	14:00
ihrachys	is it normal that I see "Hash Ring loaded. 2 active nodes. 0 offline nodes" in logs posted every other minute? does it reinitialize idl or what?	14:01
ihrachys	(sometimes it's also "Hash Ring loaded. 1 active nodes. 0 offline nodes")	14:02
ralonsoh	ihrachys, it is reloaded, this is just a debug message	14:02
ralonsoh	ihrachys, in the initial transient period, the number can change	14:03
ralonsoh	but once all workers are loaded (they have updated the Neutron DB register)	14:03
ralonsoh	the number should be always static	14:03
ralonsoh	and all must be active	14:03
ihrachys	well this is not at the start of the service; it's mid-flight	14:05
ralonsoh	ihrachys, so this is a problem	14:05
ralonsoh	I have some patches there under review	14:05
ralonsoh	one sec	14:05
ralonsoh	https://review.opendev.org/c/openstack/neutron/+/937351	14:06
ralonsoh	actually I need to check the comments	14:06
ralonsoh	ihrachys, the point is that we have a thread (per worker) that is in charge of refreshing the Neutron DB hashring register	14:07
ralonsoh	if the hash ring reload method (other thread) reloads (reads the active nodes) and sees a non-updated one, we'll have this issue	14:08
ralonsoh	so I would go for https://review.opendev.org/c/openstack/neutron/+/937351, at least during the eventlet deprecation	14:08
ralonsoh	we have issues with the GIL yield and the thread in charge of refreshing the hashring register is not executed on time	14:09
ralonsoh	(that should NOT happen with kernel/preemptive threads)	14:09
opendevreview	yatin proposed openstack/neutron master: [DNM] repro functional failure with test order reversed https://review.opendev.org/c/openstack/neutron/+/937757	14:15
ihrachys	in a log, I see for a while all events handled by one node only	14:17
opendevreview	Rodolfo Alonso proposed openstack/neutron master: [OVN] Reduce the OVN hash ring touch interval https://review.opendev.org/c/openstack/neutron/+/937351	14:17
ihrachys	and just before the other one ceased to handle any, I see Hash Ring loaded. 0 active nodes. 2 offline nodes; then HashRing is empty, error: Hash Ring returned empty when hashing "b'22ef8001-b0d9-43fd-956b-0abd72515c54'". All 2 nodes were found offline. This should never happen in a normal situation, please check the status of your cluster:	14:18
ralonsoh	ihrachys, exactly, that happens if the hash ring node loses one active node (by default, we have 2 API workers in the CI)	14:18
ihrachys	neutron.common.ovn.exceptions.HashRingIsEmpty: Hash Ring returned empty when hashing "b'22ef8001-b0d9-43fd-956b-0abd72515c54'". All 2 nodes were found offline. This should never happen in a normal situation, please check the status of your cluster	14:18
ihrachys	and then Hash Ring loaded. 1 active nodes. 1 offline nodes	14:18
ralonsoh	please check https://review.opendev.org/c/openstack/neutron/+/937351	14:18
ihrachys	ok. so one node falls into offline. the other one should pick up its events, so it should not be the reason for lsp event lost either, right?	14:20
ralonsoh	ihrachys, yes, you are right	14:20
ihrachys	ok looking more. so the worker that is actually handling events says "Hash Ring loaded. 2 active nodes. 0 offline nodes" but the worker that doesn't says "Hash Ring loaded. 1 active nodes. 0 offline nodes". that's just before the time when we miss the LSP update	14:26
ihrachys	this looks like maybe there's a split reality situation	14:27
ihrachys	the node that ceased to handle events expects the other one to pick up the work	14:27
ihrachys	but the other one still believes the retired worker will continue handling its events?	14:28
ralonsoh	ihrachys, that's the point, each node refreshes its own hashring manager independently, based on the DB status	14:28
ralonsoh	but this refresh operation is not done at the same time	14:29
ralonsoh	so yes, that could lead to a situation where 2 nodes can discard an event because it doesn't belong to them (according to their own local hashring managers)	14:29
ihrachys	ok so is it then... inherently racy?	14:29
ralonsoh	so far, it is the best implementation we have	14:30
ihrachys	ok and to confirm I'm not piling up, just trying to make sense :)	14:31
ihrachys	so bumping timeouts in your patch is in hope that workers fall offline less often?	14:31
ralonsoh	yes	14:31
ralonsoh	and reducing the refresh time	14:31
ihrachys	see, I'm slow but I'm getting there!	14:32
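The race described above (each worker deciding ownership from its own, independently refreshed view of the ring) can be sketched roughly as follows. This is only an illustration: the `owner` and `handlers` helpers, the worker names, the `row-N` identifiers and the SHA-256 modulo hashing are all assumptions made for the example, not the hashing Neutron actually uses.

```python
import hashlib


def owner(row_uuid, active_nodes):
    """Return the node that a worker holding this view believes owns the row."""
    nodes = sorted(active_nodes)
    digest = hashlib.sha256(row_uuid.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def handlers(row_uuid, views):
    """Return the workers that would process the event, each using its own view."""
    return [w for w, view in sorted(views.items())
            if view and owner(row_uuid, view) == w]


# Hypothetical row identifiers standing in for OVSDB row UUIDs.
events = [f"row-{i}" for i in range(100)]

# Both workers agree on the ring: every event has exactly one handler.
consistent = {"worker-a": {"worker-a", "worker-b"},
              "worker-b": {"worker-a", "worker-b"}}

# worker-a still sees two active nodes, worker-b sees only worker-a.
split = {"worker-a": {"worker-a", "worker-b"},
         "worker-b": {"worker-a"}}

# Events that worker-a assigns to worker-b are discarded by both workers.
dropped = [e for e in events if not handlers(e, split)]
print(f"{len(dropped)} of {len(events)} events have no handler under the split view")
```

Under the split view, every event that worker-a hashes to worker-b is discarded by both sides, which matches the "pb events seen, LSP event lost" symptom when the two rows hash to different workers.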
ihrachys	ralonsoh: and what's the theory of why wsgi switch made it worse?	14:43
ralonsoh	ihrachys, to be honest, I'm not sure. But the main problems we have are related to the hash ring	14:46
ralonsoh	missing events, nodes offline, etc	14:46
ralonsoh	these problems are not present in Ml2/OVS	14:47
ihrachys	the theory of touching the node more often seems reasonable BUT	14:49
ihrachys	we touch nodes in notify()	14:49
ihrachys	and I see events handled by the worker-about-to-become-offline just second(s) before it goes offline	14:49
ihrachys	I'd think any ovsdb-monitor event would refresh the timer in db?	14:50
ralonsoh	ihrachys, it would be too much to always update a register in the DB	14:59
ralonsoh	having said that, this table has no references to other tables (that removes xreferences issues), the table is small and we are not modifying the indexes	15:00
ralonsoh	so this touch should be very fast	15:01
ralonsoh	ihrachys, btw, we have CI results: https://review.opendev.org/c/openstack/neutron/+/939346	15:02
ralonsoh	this is on top of your change, that uses the devstack patch	15:02
ihrachys	ralonsoh: we already touch; it's not a suggestion, it's in code	15:03
ihrachys	https://github.com/openstack/neutron/blob/585ea689d5d26356e28d8eb47f6d0511d21806cf/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L835	15:04
ihrachys	we see in logs the debug messages from line 852 from the worker	15:04
ralonsoh	ah right, just after an event	15:04
ihrachys	I don't see how this message can pop up without node also being touched. (assuming 30 seconds passed). so that's weird.	15:06
ralonsoh	ihrachys, sorry, what do you mean?	15:06
ihrachys	the linked code, it seems that if ovsdb-monitor event is being handled, and the node is past HASH_RING_CACHE_TIMEOUT then the event handler would also touch the node in db.	15:08
ihrachys	but then it's not clear why it would immediately be seen as offline by the same worker	15:08
ihrachys	except there's some caching of timestamps involved, so what is written in db may not necessarily be reflected into cache. maybe it should...	15:09
ralonsoh	nononono	15:10
ralonsoh	hold on	15:10
ralonsoh	self._last_touch is a local variable	15:10
ralonsoh	not the DB timestamp	15:10
ihrachys	yeah I know. there's also _node_last_touch inside the hash manager	15:10
ihrachys	so when it calls get_node(), it uses the cached timestamps	15:11
ihrachys	but when it touches, it touches db; but at the same time, cache is not updated with the new timestamp.	15:11
ihrachys	wonder if it should?	15:11
ralonsoh	btw, I'm checking https://fdc1d05a9a337a8993b4-089607d394060d72ce519e30966a0033.ssl.cf2.rackcdn.com/939346/2/check/neutron-ovn-tempest-ipv6-only-ovs-release-wsgi-2/4eb0215/testr_results.html	15:12
ralonsoh	in particular the port for instance 3c49676b-0677-4021-b6e6-4a6ee65f0704	15:12
ralonsoh	in https://fdc1d05a9a337a8993b4-089607d394060d72ce519e30966a0033.ssl.cf2.rackcdn.com/939346/2/check/neutron-ovn-tempest-ipv6-only-ovs-release-wsgi-2/4eb0215/controller/logs/ovn/ovn/ovsdb-server-nb_log.txt	15:12
ralonsoh	that is LSP.uuid=10146369-053d-446e-a5fe-ba09158c3b45	15:13
ralonsoh	where is the LSP.up field defined there?	15:13
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: WIP: refresh hash ring timestamp cache on touch https://review.opendev.org/c/openstack/neutron/+/939354	15:13
ihrachys	like in ^ (sorry too lazy to stash some other changes that I have locally)	15:13
ralonsoh	ihrachys, nonono	15:14
ralonsoh	https://review.opendev.org/c/openstack/neutron/+/936838	15:14
ralonsoh	I explicitly added this parameter	15:15
ralonsoh	in order to read the init time (passed via WSGI config)	15:15
ihrachys	ignore this. look at where I add refresh() to touch	15:15
ihrachys	(should have backed these changes out not to confuse you)	15:16
ihrachys	just this https://review.opendev.org/c/openstack/neutron/+/939354/1/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#845	15:16
ralonsoh	ihrachys, but this is actually the opposite to https://review.opendev.org/c/openstack/neutron/+/937351	15:17
ihrachys	without this, our touch of db does not affect the worker view of hash ring state (including its own state) and if the maint thread doesn't get to refresh() on time, then the touch was for naught (for this current worker)	15:17
ralonsoh	yes but this refresh also reads other hash ring registers	15:18
ralonsoh	(that should be updated, of course)	15:18
ihrachys	let it do that, not sure what the problem with it is (but if that's a concern, we can refresh just ourselves)	15:19
ralonsoh	ihrachys, ok, let's push a patch with this line change only	15:19
ihrachys	the reduction of interval for cache update may also be good. not sure about cache timeout bump.	15:19
ralonsoh	ihrachys, please check the CI logs that I mentioned: https://fdc1d05a9a337a8993b4-089607d394060d72ce519e30966a0033.ssl.cf2.rackcdn.com/939346/2/check/neutron-ovn-tempest-ipv6-only-ovs-release-wsgi-2/4eb0215/testr_results.html	15:20
ihrachys	(actually I should probably do refresh=True there too)	15:20
ihrachys	will check in a sec, let me push the refresh one first	15:21
ralonsoh	I don't see the LSP.UP event	15:21
ralonsoh	sure	15:21
ihrachys	(scratch above, refresh means refresh=True already)	15:21
ralonsoh	no no, sorry, it is there	15:22
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: ovn: Refresh hash ring timestamp cache on touch_node https://review.opendev.org/c/openstack/neutron/+/939357	15:26
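The idea behind refreshing the cache on touch can be sketched as below. It is a toy model under stated assumptions: `WorkerView`, its methods, the shared `db` dict and the 60-second `TIMEOUT` are hypothetical names and values chosen for the example and do not mirror the Neutron hash ring manager code.

```python
class WorkerView:
    """A worker's cached view of the node timestamps stored in a shared DB."""

    def __init__(self, name, db, timeout):
        self.name = name
        self.db = db              # shared dict: node name -> last touch timestamp
        self.timeout = timeout
        self.cache = dict(db)     # local snapshot, refreshed only on demand

    def refresh(self):
        """Reload the local snapshot from the DB."""
        self.cache = dict(self.db)

    def touch(self, now, refresh_cache):
        """Write this node's timestamp to the DB, optionally reloading the cache."""
        self.db[self.name] = now
        if refresh_cache:
            self.refresh()

    def active_nodes(self, now):
        """Nodes considered alive according to the cached timestamps only."""
        return {n for n, ts in self.cache.items() if now - ts <= self.timeout}


TIMEOUT = 60  # hypothetical value, in seconds

# Touch without refreshing: the DB is updated, the cache is not.
db = {"worker-a": 0, "worker-b": 0}
stale = WorkerView("worker-b", db, TIMEOUT)
stale.touch(now=70, refresh_cache=False)
without_refresh = stale.active_nodes(now=70)

# Touch and refresh: the worker sees its own new timestamp immediately.
db2 = {"worker-a": 0, "worker-b": 0}
fresh = WorkerView("worker-b", db2, TIMEOUT)
fresh.touch(now=70, refresh_cache=True)
with_refresh = fresh.active_nodes(now=70)

print(without_refresh, with_refresh)
```

In the first case the worker has just written a fresh timestamp yet still considers itself offline until some other thread reloads the cache; in the second it sees itself as active right away.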
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: WIP: nit: Remove _last_touch attribute for OvnIdlDistributedLock https://review.opendev.org/c/openstack/neutron/+/939359	15:29
ihrachys	ralonsoh: how can I test the patch with refresh I wonder?.. is there a way to thrash the gate to the point that it would always fail with the problem?	15:29
ralonsoh	ihrachys, sure, one sec	15:30
ralonsoh	cherry pick this https://review.opendev.org/c/openstack/neutron/+/939346/. Remove the change-id (to create a new one)	15:30
ralonsoh	push it on top of your patch	15:30
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI https://review.opendev.org/c/openstack/neutron/+/939360	15:32
ihrachys	ralonsoh: to confirm, nothing to check in your logs above?	15:33
ralonsoh	ihrachys, well, only to confirm that the problem is happening again	15:33
ralonsoh	we have the LSP.up event	15:33
ralonsoh	2025-01-15T14:03:05.984Z|06409|jsonrpc|DBG|ssl:[2607:5300:201:2000::743]:40650: send notification, method="update3", params=["cfcf56ee-d348-11ef-aefb-fa163e340894","00000000-0000-0000-0000-000000000000",{"Logical_Switch_Port":{"b95d0ee3-cb36-4b02-87c9-1d07a051ab36":{"modify":{"up":true}}}}]	15:33
ralonsoh	but nothing is received in Neutron API	15:34
ihrachys	ok great that we can validate ovsdb-server is doing the right thing anyway	15:34
ihrachys	I'll have to step down from this hash ring issue for the next few hours. will check results in ci later. thanks ralonsoh for bearing with me stoopid questions :p	15:35
ralonsoh	a pleasure	15:36
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: nit: Remove unused updated_at argument for touch_node https://review.opendev.org/c/openstack/neutron/+/939366	16:39
lajoskatona	otherwiseguy: Hi, there's an ovsdbapp bug perhaps you can decide if it can be something to check in detail: https://bugs.launchpad.net/ovsdbapp/+bug/2093247 , thanks in advance	16:53
-opendevstatus- NOTICE: The paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server.		17:08
opendevreview	Lajos Katona proposed openstack/tap-as-a-service master: Doc: add documentation for usage and driver details for SRIOV driver https://review.opendev.org/c/openstack/tap-as-a-service/+/881807	17:11
lajoskatona	haleyb: Hi, if you have some free time for doc patches for taas: https://review.opendev.org/q/topic:%22taas_driver_docs%22 ;)	17:12
opendevreview	Merged openstack/neutron master: Move neutron rally jobs to wsgi https://review.opendev.org/c/openstack/neutron/+/939315	17:24
ihrachys	still events missed with my refresh() patch, now in log I see:	17:38
ihrachys	CRITICAL neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance [-] The number of nodes in the Hash Ring (4) is higher than the number of API workers (2) for host "np0039575603". Something is not right and OVSDB events could be missed because of this. Please check the status of the Neutron processes, this can happen when the API workers are killed and restarted.	17:38
ihrachys	Restarting the service should fix the issue, see LP #2024205 for more information.	17:38
ihrachys	it's weird though, don't we set processes: 4 in https://review.opendev.org/c/openstack/neutron/+/939360/1/zuul.d/tempest-singlenode.yaml#821	17:40
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: refactor: Remove OvnIdlDistributedLock._last_touch attribute https://review.opendev.org/c/openstack/neutron/+/939359	17:58
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: DNM - Test "neutron-ovn-tempest-ipv6-only-ovs*" with WSGI https://review.opendev.org/c/openstack/neutron/+/939360	18:05
opendevreview	Brian Haley proposed openstack/neutron master: Optionally configure IPv6 metadata address https://review.opendev.org/c/openstack/neutron/+/926497	18:13
opendevreview	Merged openstack/neutron-tempest-plugin master: Always create router interface for ipv6 metadata test https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/939104	18:31
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: functional: Handle ovsdb monitor returning inserts in different checks https://review.opendev.org/c/openstack/neutron/+/939384	18:50
haleyb	ihrachys: oh, that seems to make the functional tests happy ^^	20:44
ihrachys	one can hope, yes	20:49
haleyb	i ran a recheck to double-check	20:51
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: When storing _last_time_loaded for hash ring, use a lower timestamp https://review.opendev.org/c/openstack/neutron/+/939397	20:52
ihrachys	I mean, I haven't even run the tests with the change, it's straight from my brain :p	20:53
ihrachys	haleyb: it's interesting that this test started to misbehave, don't you think	21:07
ihrachys	since we are dealing with missed ovn events from monitor elsewhere	21:08
ihrachys	I don't think anyone touched the test case since forever	21:08
haleyb	no, but we did bump to a later OVN/OVS recently i think	21:08
ihrachys	haleyb: apparently unit tests now trigger sudo? https://zuul.opendev.org/t/openstack/build/31c980b9fdb5498ca392ec863c6d1370	21:20
ihrachys	btw I also noticed that my mac asked me to fingerprint for sudo when I ran tests yesterday; I denied it, though I thought it was something about macos; but it looks like maybe something crept into the code base	21:21
haleyb	ihrachys: probably a missing mock somewhere, that's my bet	21:21
ihrachys	reported here https://bugs.launchpad.net/neutron/+bug/2095044	21:23
ihrachys	yeah, probably a mock	21:23
ihrachys	I also saw some other tests were calling to e.g. tc (for qos?) and failed on mac. which suggests that some more mocks may be missing (since unit tests should never call to system)	21:23
haleyb	there is at least one other bug for a missing mock that i filed, for segment tests https://bugs.launchpad.net/neutron/+bug/2038373	21:25
ihrachys	in other news, functional is sometimes busted for a different reason, see https://zuul.opendev.org/t/openstack/build/d8a35748226140408d088f5273a71999	21:32
ihrachys	ovsdbapp.exceptions.TimeoutException: Commands [AddBridgeCommand(_result=None, name=ovs-test-d9a790, may_exist=True, datapath_type=system), DbAddCommand(_result=None, table=Bridge, record=ovs-test-d9a790, column=protocols, values=('OpenFlow13', 'OpenFlow14', 'OpenFlow10')), DbSetCommand(_result=None, table=Bridge, record=ovs-test-d9a790, col_values=(('other_config',	21:32
ihrachys	{'mac-table-size': '50000'}),), if_exists=True)] exceeded timeout 30 seconds, cause: TXN queue is full	21:32
ihrachys	oh fun	21:32
haleyb	my head hurts	21:33
ihrachys	this exact error was mentioned in the rally bug report https://bugs.launchpad.net/neutron/+bug/2094970	21:33
ihrachys	(about TXN queue is full)	21:33
ihrachys	haleyb: being a certified Debby Downer, I'll say that I think the team gets into a habit of letting failures pile up (bare rechecks etc.) until it becomes untenable... then the heads indeed hurt :p	21:35
haleyb	ihrachys: that does happen sometimes and expect $(someone_else) to fix them, it's hard to spend half a day tracking down a single failure and there's been >2 people doing it recently	21:42
haleyb	i do want to know how you managed to get haproxy in that call trace, it's not remotely near that codepath	21:45
ihrachys	it's not me, it's zuul	21:52
ihrachys	that's on a very complex patch, see https://review.opendev.org/c/openstack/neutron/+/939272 :)	21:53
ihrachys	something tells me this is not the patch that broke it	21:53
haleyb	@mock.patch('eventlet.spawn_n') - i will blame it on eventlet since that's the easy thing to do :)	21:53
haleyb	that patch! ygbfkm	21:54
ihrachys	:) i'll blame both mock and eventlet	21:56
haleyb	yahtzee	21:57
ihrachys	still unclear what happens with hash ring members. apparently the thread to touch nodes is not running / blocked and they over time fall into offline state, sometimes. apparently no one has a good theory of why the thread is blocked, except a vague notion of GIL issues?	21:59
haleyb	that is maybe a question for rodolfo	22:23
haleyb	i'll be back again to look tomorrow	22:42
opendevreview	Merged openstack/neutron master: functional: Handle ovsdb monitor returning inserts in different checks https://review.opendev.org/c/openstack/neutron/+/939384	22:49
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: WIP: ovn: don't attempt to create router port when no fixed-ips are set https://review.opendev.org/c/openstack/neutron/+/939253	22:51
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: Add option to configure live migration activation strategy for OVN https://review.opendev.org/c/openstack/neutron/+/938106	22:51
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: Remove shorturls from code https://review.opendev.org/c/openstack/neutron/+/939272	22:51
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: refactor: Remove OvnIdlDistributedLock._last_touch attribute https://review.opendev.org/c/openstack/neutron/+/939359	22:51
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: nit: Remove unused updated_at argument for touch_node https://review.opendev.org/c/openstack/neutron/+/939366	22:52
opendevreview	Ihar Hrachyshka proposed openstack/neutron master: Remove linuxbridge driver https://review.opendev.org/c/openstack/neutron/+/927216	22:53
