timburke | frickler, it depends on the test job. probe tests (which were the ones affected by https://github.com/eventlet/eventlet/issues/989) run unconstrained | 00:02 |
---|---|---|
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 11:34 |
opendevreview | Suzan Song proposed opendev/git-review master: Support GIT_SSH_COMMAND https://review.opendev.org/c/opendev/git-review/+/934745 | 11:48 |
opendevreview | Suzan Song proposed opendev/git-review master: Support GIT_SSH_COMMAND https://review.opendev.org/c/opendev/git-review/+/934745 | 12:58 |
dtantsur | FYI folks. Not sure if it's a known issue or not, but the opendev.org v6 address does not look reachable here (Deutsche Telekom): https://paste.opendev.org/show/b5WW19Pmlpwk9d8GoPdP/ | 14:23 |
dtantsur | This may explain why it sometimes takes me many seconds to open any page | 14:23 |
frickler | dtantsur: yes, known issue with vexxhost's connectivity | 14:24 |
dtantsur | So you know, good | 14:24 |
fungi | is it still the same as a year or two ago when the prefix they were announcing into bgp was too small and was dropped by some providers' filters? | 14:29 |
fungi | (prefix too long, technically, network too small) | 14:30 |
frickler | I didn't check yet this time. but a /48 isn't too small, the issue was lack of proper database entries/certificates | 14:32 |
fungi | well, i meant too small for the range. at one point iana decided that different ranges would have different minimum network sizes for bgp announcements, at least back in the early days of v6 allocation, and a number of backbone providers baked those assumptions into their filters. a /48 was considered valid to announce from some parts of the v6 space but not other parts (similar to how parts of | 14:36 |
fungi | the v4 space were intended for smaller networks and others not) | 14:36 |
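(A toy sketch of the filtering fungi is describing, not any provider's real configuration: announcements whose prefix is "too long" — network too small — for the block they fall in get dropped. The per-block maximums below are invented purely for illustration.)

```python
# Toy illustration of per-block maximum-prefix-length filters; the policy
# table here is made up for the example, not real backbone routing policy.
import ipaddress

MAX_PREFIXLEN = {
    ipaddress.ip_network("2001::/16"): 32,  # hypothetical: only large aggregates accepted
    ipaddress.ip_network("2600::/12"): 48,  # hypothetical: /48 announcements accepted
}

def accepted(announcement: str) -> bool:
    """Return True if this toy filter would accept the announcement."""
    net = ipaddress.ip_network(announcement)
    for block, max_len in MAX_PREFIXLEN.items():
        if net.subnet_of(block):
            return net.prefixlen <= max_len
    return False  # no matching policy: dropped in this toy model

print(accepted("2001:db8::/48"))     # False: /48 is "too long" for this block's limit
print(accepted("2604:e100:3::/48"))  # True under the invented 2600::/12 rule
```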
fungi | by "lack of proper certificates" do you mean providers are expecting rpki now? | 14:38 |
fungi | (rfc 6480) | 14:39 |
*** artom_ is now known as artom | 14:42 | |
clarkb | I'm doing a quick pass of meeting agenda updates. Hoping to send that one shortly. Anything important to add/remove/edit? | 15:45 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update backup verifier to handle purged repos https://review.opendev.org/c/opendev/system-config/+/934768 | 16:01 |
frickler | clarkb: something about promote-openstack-manuals-developer jobs failing? | 16:08 |
opendevreview | Joel Capitao proposed opendev/system-config master: Enable mirroring of CentOS Stream 10 contents https://review.opendev.org/c/opendev/system-config/+/934770 | 16:10 |
clarkb | frickler: ack working on it | 16:10 |
clarkb | trying to pull up an example job log and seems like zuul build search might be slow? Either that or I'm not doing search terms properly | 16:12 |
clarkb | corvus: ^ fyi in case there were db updates with the recent upgrade of zuul last weekend | 16:12 |
clarkb | I found https://review.opendev.org/c/openstack/api-site/+/933901 is that the most recent run? Looks like that is still failing on an undefined variable? | 16:15 |
clarkb | anyway it's in the draft on the wiki and I'll send that out as able with meetings this morning | 16:17 |
frickler | clarkb: build searches with just a job as a filter have been non-working for me for weeks, but I've given up complaining about that | 16:19 |
corvus | clarkb: no db changes | 16:20 |
frickler | combined with a project it works fine, so the above was at least the last one for that repo https://zuul.opendev.org/t/openstack/builds?job_name=promote-openstack-manuals-developer&project=openstack/api-site | 16:21 |
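(For anyone following along, the builds page frickler links maps onto Zuul's REST API; a minimal sketch of the same query is below, with the result field names as assumptions to verify against the actual response.)

```python
# Sketch of querying Zuul's builds API the same way the web UI's /builds page does.
import json
import urllib.parse
import urllib.request

API = "https://zuul.opendev.org/api/tenant/openstack/builds"

def recent_builds(job_name, project, limit=10):
    query = urllib.parse.urlencode(
        {"job_name": job_name, "project": project, "limit": limit})
    with urllib.request.urlopen(f"{API}?{query}", timeout=30) as resp:
        return json.load(resp)

for build in recent_builds("promote-openstack-manuals-developer",
                           "openstack/api-site"):
    print(build.get("end_time"), build.get("result"), build.get("log_url"))
```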
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 16:31 |
clarkb | there are new jitsi meet images today. We should automatically upgrade when daily infra-prod jobs run. I don't have any concerns with this as there are no big events and this usually works | 16:36 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 16:41 |
clarkb | sending the meeting agenda has resulted in two service-discuss members having non-zero bounce processing scores | 16:49 |
fungi | sounds like it's working! | 17:10 |
JayF | So I know some folks have reported this some, I saw it *frequently* this weekend (Sunday, mainly), where initial connections to review.opendev.org were failing | 17:13 |
JayF | I can't quite put my finger on what's happening. If I didn't know any better I'd say that sometimes I'm getting a bad DNS result; but it's not consistent enough to really apply meaningful network tools to it. | 17:14 |
JayF | This is v4 only, from Centurylink/(old Qwest network) in .wa.us | 17:14 |
clarkb | we have dnssec enabled, so bad DNS shouldn't be possible; just good dns or no dns | 17:16 |
clarkb | though I guess that depends on client side settings and whether or not you verify | 17:16 |
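(A minimal sketch, assuming dnspython, of the kind of client-side check clarkb means: ask a validating resolver and look for the AD flag; with a signed zone you should see a validated answer or a failure, not forged data. The resolver address is just an example.)

```python
# Sketch: query a validating resolver for review.opendev.org and report whether
# it set the AD (authenticated data) flag. Requires dnspython; whether *your*
# stub resolver actually validates is the real question.
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

RESOLVER = "9.9.9.9"  # any DNSSEC-validating resolver

def check(name="review.opendev.org"):
    query = dns.message.make_query(name, dns.rdatatype.A, want_dnssec=True)
    response = dns.query.udp(query, RESOLVER, timeout=5)
    validated = bool(response.flags & dns.flags.AD)
    addresses = [item.address
                 for rrset in response.answer
                 if rrset.rdtype == dns.rdatatype.A
                 for item in rrset]
    return validated, addresses

print(check())
```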
clarkb | JayF: are you seeing this with ssh or https or both? | 17:18 |
JayF | both | 17:18 |
JayF | well, I'll say, the experience with ssh was weird enough I blamed WSL | 17:18 |
JayF | but I had a patch push that took almost 5 minutes before failing, but it didn't actually fail (the new patchset was in gerrit; just with a delay) | 17:19 |
clarkb | how did the failure manifest? | 17:19 |
JayF | but I saw this a *bunch* of times with firefox on windows. review.opendev.org (failure to connect) multiple times until I shift-refresh | 17:20 |
JayF | once I got any successful connection, it stayed working | 17:20 |
clarkb | fwiw I don't see any ssh connections from you in the logs from saturday, sunday, or monday. You are in there today. (of course I could be doing something wrong or you're using multiple usernames perhaps) | 17:24 |
clarkb | it is worth noting that I sometimes get sad gerrit change pages sometime after they have loaded, because background refreshes or whatever gerrit does to reload things (I think it polls for status updates like new patchset available and new comments) will occur when my local network is sad (for example during system updates that touch networking and shut it down temporarily) | 17:29 |
clarkb | this is distinct from `open a new tab and connect to gerrit there` | 17:29 |
clarkb | it sounds like you're talking about open new tab and see change there, but if it is the former situation instead I think this is expected unfortunately as long as gerrit does polling | 17:29 |
frickler | hmm three release job failures in a row, all POST_FAILURE without logs available | 17:40 |
frickler | all in the last 10 minutes | 17:42 |
clarkb | looking at live streamed logs it appears to be some problem with rax_* uploads | 17:43 |
JayF | clarkb: apparently the slow SSH was yesterday, around 5:10pm, looking at my patch log | 17:43 |
JayF | clarkb: (PST) | 17:43 |
JayF | clarkb: https://review.opendev.org/c/openstack/ironic/+/931055 patchset 6 here | 17:44 |
clarkb | not seeing any tracebacks in ansible or anything like that, just failed results attempting to upload to swift in at least rax dfw and rax iad | 17:45 |
clarkb | we can disable rax_* for uploads if it persists | 17:45 |
JayF | Weird that you don't see that in the logs though | 17:45 |
JayF | oh, other thread | 17:45 |
clarkb | JayF: I think those are today's logs because 5:10 pst is after 00:00 UTC | 17:46 |
frickler | https://zuul.opendev.org/t/openstack/builds?result=POST_FAILURE&skip=100 looks like it might have started around 15 UTC | 17:46 |
JayF | clarkb: ...does something happen at 5pm PST which would cause slowness? Like a backup that's at 00:00 no splay? | 17:46 |
clarkb | JayF: I think gerrit does some internal processes at 00:00 UTC | 17:46 |
JayF | well the timing lines up 100%, and that push took me *three attempts* | 17:47 |
clarkb | but repacking and actual backups don't run around then | 17:47 |
JayF | so maybe something underscaled or a job that's growing linearly or worse with patchset #? /me just guessing | 17:47 |
clarkb | JayF: can you clarify how it failed? You said it succeeded so I'm wondering what the failure case looks like or even just how you know it failed | 17:47 |
clarkb | slow != failed | 17:48 |
clarkb | frickler: or earlier? there are also blocks around 0700-0800? | 17:48 |
clarkb | frickler: but we run a lot more jobs around 1500 UTC than 0800 UTC | 17:49 |
fungi | https://rackspace.service-now.com/system_status?id=service_status&service=af7982f0db6cf200e93ff2e9af96198d | 17:50 |
fungi | it's rackspace keystone-ish again | 17:50 |
JayF | clarkb: 3 total attempts: first time: long hang during "Processing changes " spinner, I waited maybe 90 seconds, ^c, retried. second time: error message after a REALLY LONG TIME. Minutes, easily, but it actually worked because... third time: rejected after around 30+ seconds because it was missing a revision (because the second time worked even if it errored) | 17:50 |
fungi | oh, though that incident today says it ended 16 hours ago | 17:50 |
JayF | clarkb: I'll also note that after the second time, I refreshed, and it wasn't in the web ui yet. So it almost seems like the ssh connection died/err'd before the backend processing of the change completed | 17:51 |
fungi | i wonder if it's picked back up again and their status dashboard hasn't been updated to reflect that yet | 17:51 |
corvus | JayF: possible that it's actually the first that succeeded. i had a similar experience but did not do your step #2. | 17:51 |
clarkb | JayF: in the ssh log I see a killed git-receive-pack. I suspect this corresponds to your ^C. | 17:51 |
clarkb | it ran for ~67seconds supposedly | 17:52 |
JayF | that fits, I am impatient enough I would believe I overestimated the wait | 17:52 |
frickler | fungi: most recent one started 11:27 CST according to that, whatever time that is | 17:53 |
clarkb | JayF: importantly I'm not seeing any evidence of connectivity problems | 17:53 |
clarkb | that doesn't mean http isn't having them but on the ssh side I suspect that patience is simply needed | 17:53 |
JayF | yeah, it "felt" much more like software slowness or I/O slowness | 17:53 |
JayF | in fact I specifically looked for I/O issues because WSL is ... bad at I/O | 17:54 |
JayF | and network transfers were in the hundreds of kb/s | 17:54 |
fungi | frickler: oh, they just updated it after i pulled it up | 17:54 |
fungi | 11:27 cst is 17:27 utc | 17:54 |
fungi | so 28 minutes ago | 17:55 |
clarkb | oh and this occurred at ~0100 UTC today (since DST went away that lines up with ~1700 PST) | 17:55 |
clarkb | fungi: ack so probably keystone again then? Do we want to pull those regions out for now? | 17:55 |
fungi | probably a good idea, seems like they keep having problems and no idea how long it will take to resolve this time or when the next incident will occur | 17:56 |
fungi | two incidents on thursday and now two more (so far) today, just 5 days later | 17:57 |
frickler | +1 | 17:57 |
fungi | though today's looks like it's just ord not other regions | 17:57 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Disable rax job log uploads https://review.opendev.org/c/opendev/base-jobs/+/934819 | 17:58 |
fungi | (according to their incident info) | 17:58 |
fungi | hopefully that change won't be blocked by the same problem it's trying to work around | 17:59 |
clarkb | it probably will be | 17:59 |
fungi | might get lucky if it's really only ord that's broken | 18:00 |
clarkb | fwiw I'm running a gerrit show-caches --show-jvm right now to see if there are any obvious memory issues that may be contributing to JayF's thing. | 18:00 |
frickler | that patch disables ovh, not rax?!? | 18:00 |
clarkb | frickler: yes in base-test | 18:00 |
fungi | frickler: you're looking at base-test | 18:01 |
clarkb | that way we can run jobs against base-test and confirm that rax is working again before we revert | 18:01 |
frickler | ah, right | 18:01 |
clarkb | in addition to memory and potential gerrit daily tasks another idea is that maybe the jgit update that came with the most recent point release could be contributing. We upgraded what 3 weeks ago? | 18:02 |
clarkb | also could be the AI bot army | 18:02 |
fungi | october 30 according to ps | 18:02 |
fungi | (was the last restart) | 18:02 |
clarkb | fungi: that restart would've been for cache changes not the software update | 18:03 |
fungi | oh, right, that was the config update | 18:03 |
clarkb | I suppose caches could also be at fault but we made them bigger which you'd expect would make things faster not slower | 18:03 |
corvus | clarkb: JayF https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2024-11-07.log.html#t2024-11-07T01:09:14 there's the timing for my report of a similar problem | 18:03 |
clarkb | corvus: thats also ~0100 UTC so something about that time may be the thread to pull on | 18:03 |
corvus | ++ | 18:04 |
JayF | if I remember, around that time today I might push something chonky to sandbox | 18:04 |
clarkb | I did check that git repack and borg backups don't run then, so it's probably something internal to Gerrit or some external thing also on a timer | 18:04 |
clarkb | external to opendev I mean | 18:04 |
clarkb | like meta bot going crazy or something | 18:04 |
fungi | clarkb: 934819 failed due to not adding spaces after your hashmarks | 18:06 |
clarkb | one sec | 18:06 |
fungi | which, separately, seems like a silly rule | 18:06 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Disable rax job log uploads https://review.opendev.org/c/opendev/base-jobs/+/934819 | 18:07 |
clarkb | JayF: corvus: I'll try to run a gerrit show-queue at 0100 today and every 30 seconds or so | 18:09 |
clarkb | in theory that would show the tasks that Gerrit might be running internally and their runtimes | 18:13 |
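(Roughly what that polling could look like; a sketch with a placeholder ssh user/host, writing timestamped show-queue snapshots to a file.)

```python
# Sketch of polling `gerrit show-queue` every 30 seconds for a while around
# 01:00 UTC; the ssh user/host are placeholders for whatever admin access is used.
import subprocess
import time
from datetime import datetime, timezone

CMD = ["ssh", "-p", "29418", "admin@review.opendev.org",
       "gerrit", "show-queue", "--wide", "--by-queue"]

def poll(duration_s=1800, interval_s=30, logfile="show-queue.log"):
    deadline = time.monotonic() + duration_s
    with open(logfile, "a") as log:
        while time.monotonic() < deadline:
            stamp = datetime.now(timezone.utc).isoformat()
            result = subprocess.run(CMD, capture_output=True, text=True)
            log.write(f"=== {stamp} ===\n{result.stdout}\n")
            log.flush()
            time.sleep(interval_s)

if __name__ == "__main__":
    poll()
```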
clarkb | Mem: 96.00g total = 57.96g used + 37.65g free + 400.00m buffers | 18:14 |
clarkb | current memory consumption looks good to me | 18:15 |
opendevreview | Merged opendev/base-jobs master: Disable rax job log uploads https://review.opendev.org/c/opendev/base-jobs/+/934819 | 18:17 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Switch logo color in docs pages to dark blue https://review.opendev.org/c/zuul/zuul-jobs/+/934453 | 19:03 |
opendevreview | James E. Blair proposed openstack/project-config master: Fix openstack developer docs promote job https://review.opendev.org/c/openstack/project-config/+/934832 | 19:58 |
clarkb | fungi: when we create lists we apply either private-default or legacy-default as the style depending on whether or not the list is private or public | 19:58 |
clarkb | fungi: I suspect that we simply need to update those styles in mm3 to enable bounce processing by default? Not sure if updating a style changes existing lists or only new ones | 19:59 |
fungi | aha, yeah i was trying to track that down | 20:00 |
clarkb | updating the style will not update existing lists | 20:00 |
clarkb | styles are initial defaults according to the docs | 20:01 |
fungi | yep | 20:01 |
fungi | i was just looking for how to adjust it for new lists, figured we'd still need to update the existing ones directly (which can be done in bulk from the cli if we like) | 20:01 |
clarkb | I'm going to manually do the three opendev lists I manage now | 20:02 |
clarkb | huh service-announce and service-incident are already set | 20:03 |
clarkb | is it possible that the setting is set vhost wide so when I did service-discuss it did all of them? or maybe we just carried over these settings from mm2? | 20:03 |
clarkb | and only some lists had it disabled? | 20:03 |
clarkb | actually I bet that is what happened | 20:03 |
clarkb | but then additionally the legacy-default style must not be enabling it so new lists don't get it? | 20:04 |
timburke | could i get a node hold on https://zuul.opendev.org/t/openstack/stream/1358e40883c64c07b05ecdfd9ff8dcba?logfile=console.log ? seems like another hang, but i suspect it's a little different from the last one i saw... | 20:11 |
fungi | timburke: added | 20:17 |
timburke | thanks | 20:18 |
fungi | clarkb: so based on https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/config/docs/config.html#styles it sounds like we can define our own styles https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/styles/docs/styles.html | 20:22 |
fungi | i'm still hunting for an example of where to define them | 20:22 |
clarkb | fungi: ya the mm3 docs are very sparse on this stuff | 20:33 |
fungi | you could have stopped at "the mm3 docs are very sparse" | 20:34 |
fungi | looks like the default styles are defined in src/mailman/styles/default.py | 20:36 |
fungi | 3.2.0 release notes mention "you can now specify additional styles using the new plugin architecture" | 20:40 |
fungi | so i think we have to create a mailman plugin with our custom style(s) | 20:40 |
fungi | the good news is that mailman plugins are written in python | 20:41 |
fungi | https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/plugins/docs/intro.html#plugins | 20:42 |
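(A minimal sketch of what such a plugin style might look like, not a tested implementation: reuse the stock legacy-default settings, then flip bounce processing on. The LegacyDefaultStyle import and the process_bounces/bounce_score_threshold attribute names are assumptions to verify against the deployed Mailman version.)

```python
# Sketch of a custom list style registered via the plugin architecture; the
# base-class import and the bounce-related attribute names are assumptions.
from public import public
from zope.interface import implementer

from mailman.interfaces.styles import IStyle
from mailman.styles.default import LegacyDefaultStyle


@public
@implementer(IStyle)
class OpenDevDefaultStyle:
    """The stock legacy-default settings plus bounce processing enabled."""

    name = 'opendev-default'

    def apply(self, mailing_list):
        # Start from the stock defaults, then override bounce handling.
        LegacyDefaultStyle().apply(mailing_list)
        mailing_list.process_bounces = True
        mailing_list.bounce_score_threshold = 5
```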
clarkb | fungi: I think you can define them via the rest api too | 20:42 |
clarkb | but it isn't clear how to set all the attributes of that style via the api | 20:42 |
clarkb | https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/styles/docs/styles.html#registering-styles | 20:43 |
fungi | and doing that idempotently is perhaps another challenge | 20:43 |
fungi | but maybe still simpler than maintaining a full-blown python project/package | 20:44 |
clarkb | for lists and domains we just check if they exist first and create if not | 20:44 |
clarkb | works as long as you only have one ansible running at a time | 20:45 |
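(And the bulk update of existing lists fungi mentions could look something like this with mailmanclient; the URL and credentials are placeholders, and the process_bounces setting name is again an assumption.)

```python
# Sketch: enable bounce processing on every existing list via the REST API.
# URL and credentials are placeholders; intended as a one-off manual run.
from mailmanclient import Client

client = Client('http://localhost:8001/3.1', 'restadmin', 'REST_PASSWORD')

for mlist in client.lists:
    settings = mlist.settings
    settings['process_bounces'] = True
    settings.save()
    print(f'enabled bounce processing on {mlist.fqdn_listname}')
```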
gouthamr | o/ fungi: frickler mentioned you were aware of the problem with opendevmeetbot joining #openstack-eventlet-removal .. is there a config change i need to make? or something i can seek help for? | 20:47 |
gouthamr | opendevmeet* | 20:47 |
clarkb | as a heads up I approved https://review.opendev.org/c/zuul/zuul-jobs/+/934243 to trigger rtd with curl | 20:52 |
fungi | gouthamr: i wasn't aware of a problem | 20:53 |
fungi | i see the channel is listed in the supybot.networks.oftc.channels setting of /var/lib/limnoria/limnoria.config on eavesdrop01, which should suffice | 20:54 |
gouthamr | oh, I tried to start a meeting there and nothing happened | 20:54 |
clarkb | gouthamr: is the bot in the room? | 20:54 |
gouthamr | so I thought opendevmeet needed to be in the channel | 20:54 |
gouthamr | Nope | 20:54 |
clarkb | ok I suspect the issue is we don't auto restart that bot because it is disruptive | 20:55 |
gouthamr | i op’ed and invited it too, in case that worked :) | 20:55 |
fungi | it's not in the channel, but yes it needs to be | 20:55 |
clarkb | we probably just need to manually restart the bot | 20:55 |
fungi | oh, right, checking... | 20:55 |
clarkb | we don't auto restart because it impacts running meetings so usually we wait for an empty meeting block of time to restart it | 20:55 |
fungi | yeah, the current process has been running since some time last year | 20:55 |
clarkb | probably a good idea to confirm the configuration updated before restarting but ya I suspect that is all that is needed | 20:56 |
fungi | it's been so long since we added a new channel, i forgot we were doing manual restarts | 20:57 |
fungi | but anyway, yes, i checked the config and it contains the channel, so the config update did deploy successfully | 20:57 |
opendevreview | Merged zuul/zuul-jobs master: Use "curl" to trigger rtd.org webhook with http basic auth https://review.opendev.org/c/zuul/zuul-jobs/+/934243 | 20:57 |
fungi | the only meeting held at 21z on tuesdays according to https://meetings.opendev.org/ is the scientific sig, and i don't see it going on in #openstack-meeting so it should be safe to restart the bot container now | 21:11 |
fungi | i'll do that unless there are objections in the next few minutes | 21:11 |
clarkb | wfm | 21:14 |
*** dhill is now known as Guest9233 | 21:16 | |
opendevstatus | fungi: finished logging | 21:23 |
fungi | gouthamr: ^ | 21:25 |
clarkb | I see the bot in the channel now | 21:27 |
fungi | yeah, i watched it join as well | 21:31 |
timburke | fungi, how's that node doing? looks like the job finally timed out -- hopefully whatever stuck process is still running | 21:44 |
fungi | timburke: it's held, what's your ssh key? | 21:46 |
timburke | https://gist.githubusercontent.com/tipabu/d5f2319c19b2672143b9c153f6a67ebd/raw/0909646bc09ffa2b156d63163a5f4cc506899fd9/gistfile1.txt | 21:46 |
fungi | timburke: ssh root@23.253.166.112 | 21:47 |
timburke | thanks! | 21:47 |
fungi | any time | 21:49 |
timburke | fungi, think i've got what i need -- looks like more motivation to get off eventlet, but i think it's because we were holding the tool wrong this time | 22:03 |
fungi | yeah, make sure you have the right end | 22:03 |
fungi | injury will result | 22:03 |
timburke | mixing tpool with manually-managed forking might be a recipe for a bad time | 22:04 |
timburke | gotta watch them tines | 22:04 |
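(A toy illustration of the tpool-plus-fork hazard timburke describes, not Swift code: tpool's worker threads live in the parent, and a fork()ed child inherits the tpool bookkeeping but not the threads, so the child's first tpool call is where hangs tend to show up.)

```python
# Toy sketch of the hazard: eventlet.tpool starts real OS threads in the parent;
# after os.fork() the child inherits tpool's "already initialized" state but not
# the worker threads, so tpool calls in the child are prone to hanging.
import os
from eventlet import tpool

def demo():
    tpool.execute(sum, range(10))  # parent now has live tpool worker threads
    pid = os.fork()
    if pid == 0:
        # Child: inherited tpool state, no inherited workers -- risky territory.
        tpool.execute(sum, range(10))
        os._exit(0)
    os.waitpid(pid, 0)

if __name__ == "__main__":
    demo()
```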
fungi | i've freed that node to be recycled back into our quota | 22:04 |
opendevreview | Tony Breeds proposed opendev/system-config master: Also include tzdata when installing ARA https://review.opendev.org/c/opendev/system-config/+/923684 | 23:57 |
opendevreview | Tony Breeds proposed opendev/system-config master: Update ansible-devel job to run on a newer bridge https://review.opendev.org/c/opendev/system-config/+/930538 | 23:57 |