Friday, 2025-12-05

01:12 <opendevreview> Merged opendev/zuul-providers master: Instrument dstat in image build jobs  https://review.opendev.org/c/opendev/zuul-providers/+/968386
06:37 *** mrunge_ is now known as mrunge
08:58 *** liuxie is now known as lushy
08:58 *** lushy is now known as liushy
13:00 <opendevreview> Dr. Jens Harbott proposed opendev/zuul-providers master: Add nodesets for debian-trixie  https://review.opendev.org/c/opendev/zuul-providers/+/969912
13:02 <frickler> infra-root: ^^ seems like we missed this, better to have this centrally rather than doing openstack-only nodesets I'd say. also are we in agreement to only do nodesets with explicit memory reqs going forward?
13:43 <fungi> frickler: let me find the earlier discussion, but i think the decision was to stop providing generic (non-ram-specific) nodeset names
13:44 <fungi> yeah, looks like 969912 is erroring because the nodesets for debian-trixie-4GB, debian-trixie-8GB and debian-trixie-16GB while we had some consensus not to create a generic debian-trixie pointed at an arbitrary label
13:44 <fungi> er, erroring because those three nodesets are already defined
13:46 <fungi> https://review.opendev.org/c/opendev/zuul-providers/+/967599 added them a few weeks ago
13:47 <frickler> ah, I was working off an old clone, sorry, that got me confused. so I'll only add arm nodesets now that we have those images
13:48 <fungi> frickler: the discussion relative to the debian-trixie nodeset happened here: https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2025-11-18.log.html#opendev.2025-11-18.log.html%23t2025-11-18T17:56:48
13:49 <fungi> i guess adding a comment about the lack of a generic one and that it's intentional would have been a good idea, in retrospect
13:49 <opendevreview> Dr. Jens Harbott proposed opendev/zuul-providers master: Add nodesets for debian-trixie arm64  https://review.opendev.org/c/opendev/zuul-providers/+/969912
13:50 <frickler> one day we also might find consensus on whether to order distro releases alphabetically or chronologically ...
13:51 <fungi> oh, i guess we did the inverse with comments. all the generic ones say "A legacy nodeset from nodepool"
13:51 <frickler> but good, then I can update 967582 right away without having to wait for the above to merge
13:52 <fungi> not that it'll take long to merge. i single-core approved it since it's consistent with what we added for amd64 already
13:52 <opendevreview> Merged opendev/zuul-providers master: Add nodesets for debian-trixie arm64  https://review.opendev.org/c/opendev/zuul-providers/+/969912
13:53 <fungi> well, except for the lack of debian-trixie-arm64-4GB but we don't seem to do 4gb ram flavors in osuosl anyway
13:53 <fungi> yeah, we have no .*-arm64-4GB nodesets or labels at all
13:56 <frickler> related, can someone take a look why py314 doesn't seem to work? https://zuul.opendev.org/t/openstack/build/b9074d51c5bb468a80a6bb3dd10d6902 maybe we need to run that on trixie instead of noble?
14:06 <fungi> frickler: looks like it's because we're still selecting 3.14t-dev as the "latest" 3.14 version. i thought we had worked around or fixed that previously somewhere else...
14:06 <fungi> https://zuul.opendev.org/t/openstack/build/b9074d51c5bb468a80a6bb3dd10d6902/console#3/0/12/ubuntu-noble
14:12 <fungi> frickler: https://review.opendev.org/c/zuul/zuul-jobs/+/940158 is the workaround i was thinking of, i guess i wrote that one
14:14 <fungi> which points to https://github.com/pyenv/pyenv/issues/3015
14:19 <fungi> frickler: oh! i see it. they used to provide 3.14t so we filtered out any matching 't$' but now they're adding 3.14t-dev which no longer matches that exclusion. i'll get a fix up now
14:21 <frickler> cool, thx
14:22 <opendevreview> Jeremy Stanley proposed zuul/zuul-jobs master: ensure-python: Also filter t-dev suffixes in pyenv  https://review.opendev.org/c/zuul/zuul-jobs/+/969939
14:22 <fungi> frickler: please rebase on that ^ to make sure it's sufficient
14:22 <fungi> or rather, depends-on i mean
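[Editor's note: the selection bug fungi describes above (an exclusion for `t`-suffixed free-threaded builds that stopped matching once `t-dev` snapshots appeared in pyenv's list) can be sketched as follows. This is a minimal Python illustration, not the actual ensure-python role code, which does its filtering in shell; the function and variable names are invented for the example.]

```python
import re

def latest_stable(versions, series):
    """Pick the newest plain CPython release in a series, skipping
    free-threaded builds ("3.14t"), their dev snapshots ("3.14t-dev"),
    plain dev builds, and prereleases."""
    # Matches e.g. "3.14.0" or "3.14.1" but not "3.14t", "3.14t-dev",
    # "3.14-dev" or "3.14.0rc1" -- the naive fix of excluding only 't$'
    # is exactly what broke when 't-dev' names showed up.
    pattern = re.compile(rf"^{re.escape(series)}\.\d+$")
    candidates = [v for v in versions if pattern.match(v)]
    # Compare the patch component numerically so "3.14.10" beats "3.14.9".
    return max(candidates, key=lambda v: int(v.rsplit(".", 1)[1]), default=None)

available = ["3.14.0", "3.14.1", "3.14t", "3.14t-dev", "3.14-dev", "3.14.0rc1"]
print(latest_stable(available, "3.14"))  # -> 3.14.1
```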
14:30 <opendevreview> Michal Nasiadka proposed zuul/zuul-jobs master: Use mirror_info in configure-mirrors role  https://review.opendev.org/c/zuul/zuul-jobs/+/966187
14:36 <frickler> looks like it is installing 3.14.1 now https://zuul.opendev.org/t/openstack/stream/aa6bee524e454204bb38726f0853579e?logfile=console.log
14:38 <fungi> perfect
14:38 <fungi> thanks for noticing the job had started failing
14:42 <mnasiadka> So, the image build jobs still have some arm64 problems, looking at the mask firewalld patch
14:45 <fungi> mnasiadka: yeah, i was just digging into the build that timed out, now that we have dstat graphs
14:45 <fungi> https://zuul.opendev.org/t/opendev/build/b22d87243b224a82aa362afd46345307
14:46 <fungi> looks like the long silence starts at 07:09:59 in the log and continues until END RESULT_TIMED_OUT
14:46 <fungi> comparing to dstat graphs collected from it, i don't see any significant shift in resource consumption on the node during that timespan
14:47 <fungi> though the inode count does climb significantly, and then precipitously drop right after the silent span in the log begins
14:48 <fungi> from 67k to 25k inodes in a couple of minutes
14:49 <fungi> there's definitely some i/o continuing throughout, though bursty, and the vast majority is read not write
14:50 <fungi> the 1m load average is also jumping up to 3+ at times
14:51 <fungi> sadly no smoking gun, i don't think dstat has definitively confirmed any suspicions
14:53 <opendevreview> Michal Nasiadka proposed zuul/zuul-jobs master: Use mirror_info in configure-mirrors role  https://review.opendev.org/c/zuul/zuul-jobs/+/966187
14:53 <fungi> as for the build that ended in post_failure, looks like the "Upload image to swift" task got a http/500 error back from the keystone endpoint: https://zuul.opendev.org/t/opendev/build/5a28a04d11f5489eaf087f2e5d214a10
14:53 <mnasiadka> Maybe we need a retry there in such cases
14:54 <fungi> (that's in rackspace flex, if memory serves)
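[Editor's note: the retry mnasiadka suggests for transient server-side errors like that http/500 is typically done with exponential backoff. A minimal Python sketch of the pattern, with invented names; the actual upload task is an Ansible task in zuul-jobs, not this code:]

```python
import time

def with_retries(operation, attempts=3, base_delay=1.0, retryable=(500, 502, 503)):
    """Retry a callable that returns an HTTP status code, backing off
    exponentially on transient server-side errors; return the last status."""
    for attempt in range(attempts):
        status = operation()
        if status not in retryable:
            return status  # success, or a non-retryable error like 404
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status

# Simulated upload that fails once with a 500 before succeeding.
responses = iter([500, 200])
print(with_retries(lambda: next(responses), base_delay=0))  # -> 200
```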
15:07 <corvus> fungi: it's interesting to compare the dstat graph to others from the same buildset.  it does look like there's generally 10x the iowait (maybe 10% vs 1%).
15:08 <corvus> it may also be interesting to run that job alone and compare the graph in that case.
15:08 <fungi> yeah, could indicate that node was running on a host with a noisy neighbor, i/o wise
15:09 <fungi> so the actual i/o is low because it's fighting for bandwidth, but wait goes up accordingly
15:11 <corvus> the load average is also higher, usually above 1 (and typically above 2 in the second half of the job) on arm, while half that on x86.
15:17 <corvus> there are two peaks in the inode graph on the x86 job.  the second peak is 25% of the job runtime after the first (so it happens relatively quickly).  on the arm graph, the first peak shows up 50% through the job and the second one never shows up.  proportionally, there should have been time for that second peak.
15:17 <corvus> so it seems like whatever is happening between those two peaks runs unusually slowly, even compared to the rest of the job.
15:21 <fungi> theory time... why is inode count accumulating when file count isn't? the only thing i can think of is unlinked files, which could be pending file deletes piling up. then that dropoff in the graph is them actually being cleaned up, which could result in a massive amount of backlogged write operations. if disk write is severely constrained then an implicit fsync gets completed at a trickle, leading to the increased iowait we see?
15:23 <fungi> and blocks the requested explicit write operation (the fstab file copy to that filesystem) while the background sync is in progress
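[Editor's note: fungi's theory rests on POSIX unlink semantics: removing a file's directory entry does not free its inode while any process still holds it open, so deletes against open files accumulate as pending inode frees and are released in a burst when the handles close. A small self-contained Python demonstration:]

```python
import os
import tempfile

# On POSIX filesystems, unlink() removes the directory entry immediately,
# but the inode (and its data blocks) survive until the last open file
# descriptor is closed. Deletes of still-open files therefore pile up as
# "pending" inode frees, released all at once later -- consistent with the
# sharp dropoff seen in the inode graph.
path = os.path.join(tempfile.mkdtemp(), "scratch.img")
f = open(path, "w+b")
f.write(b"still here")
f.flush()

os.unlink(path)              # directory entry is gone immediately...
print(os.path.exists(path))  # -> False
f.seek(0)
print(f.read())              # -> b'still here'  (inode still live)
f.close()                    # only now is the inode actually freed
```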
15:31 <opendevreview> Jeremy Stanley proposed zuul/zuul-jobs master: ensure-python: Also filter t-dev suffixes in pyenv  https://review.opendev.org/c/zuul/zuul-jobs/+/969939
15:49 <clarkb> fungi: theoretically deleting as we go wouldn't be any faster right? you have to pay the cost at some point. But if we can pay it down when we're not doing other io it may help. However, i suspect that is normal build completion type cleanups rather than something we can do over the entire build. Another thought is maybe we can add a no cleanup mode to dib since we know we're going to throw the node away
15:49 <clarkb> basically don't delete individual files when we know we will delete the entire instance later. But that will almost certainly require someone to go learn a dib cleanup process
15:52 <fungi> right, i was more suggesting that this points to something limiting write throughput
15:53 <fungi> likely at the host layer, like a noisy neighbor or bad iscsi network connection
15:53 <fungi> but seems like it's either on specific hosts (i guess we could start comparing host ids) or comes and goes at different times. maybe we're our own noisy neighbor even
15:56 <clarkb> being our own noisy neighbors particularly with io is something that is familiar to us
16:02 <clarkb> EMS responded to my support ticket and says that email addresses are required now for added security (I'm not sure I follow how that helps). There is apparently a new admin ui beta system we can use to add the email without performing verification of it
16:02 <clarkb> so I think our options are to either use the beta admin ui to add a bot with a bogus email address OR use the existing ui to add the bot with a bot specific plus address off of infra-root
16:02 <fungi> which would allow us to add bogus e-mails if we want, i suppose
16:03 <fungi> yeah that
16:03 <fungi> i would add addresses to all of them for consistency. can it at least be the same address for multiple users?
16:03 <clarkb> I think we can probably use plus addresses since those apparently work with our email system. I just have to figure out how to get my local client to view the resulting filtered locations
16:04 <clarkb> fungi: I suspect, but don't actually know, that you can't use duplicates, otherwise what security are we gaining
16:04 <clarkb> the response went to infra-root if you want to see what they wrote back
16:04 *** darmach9 is now known as darmach
16:04 <fungi> yeah, i already read it. i was more asking if the current interface prevents multiple accounts from using the same address
16:05 <clarkb> fungi: I don't know. None of the current accounts have an address so adding a new one would be the first
16:05 <clarkb> side note: when changing requirements for user workflows, updating docs first or in tandem seems like a good idea. I'm sure I/we/everyone fails at this though
16:06 <clarkb> anyway we can think about that for a moment before we make any decisions on how we want to proceed. The bigger question is probably how do we want to generally manage this for all the bot accounts (which I think fungi is getting at with the duplicates question)
16:07 <clarkb> we can probably respond to that ticket with an inquiry about duplicates too
16:08 <clarkb> other than that the major thing on my radar today is Gerrit upgrade prep. I think I've got that in a good place. But please let me know if there is anything you feel needs to be double checked or new items to look into
16:08 <clarkb> I'll probably take it easy today since I expect to be working for a good chunk of sunday
16:11 <fungi> fridays are usually pretty quiet anyway (hope i didn't just jinx us)
16:16 <clarkb> re Thunderbird and new folders. You right click on the server name in the left hand panel, select subscribe..., then optionally hit refresh if new folders aren't in there (they were for me so imap already synced their presence I guess) then check the empty check box next to them and voila you now have visibility into those folders
16:16 <clarkb> why they don't automatically subscribe and sync to new folders I do not know
16:17 <clarkb> but considering that works and is a solvable problem I'm somewhat inclined to use infra-root+matrixstatus and infra-root+matrixlogs and infra-root+matrixfoo etc. Or if we prefer to use duplicates have infra-root+matrixbots shared amongst them
16:17 <clarkb> assuming duplicates are valid (and if we want to do that I can respond to the ticket to ask)
16:21 <fungi> or just use infra-root@ if duplicates are allowed, unless we find that we get too much e-mail to them
16:22 <clarkb> oh ya if duplicates are fine we don't really need plus addressing
16:22 <clarkb> we can filter via source address as necessary in that case?
16:22 <fungi> that's the main reason i asked about duplicates
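[Editor's note: the plus-addressing scheme discussed above is mailbox subaddressing: everything after a separator in the local part is a tag delivered to the same base mailbox, which mail filters can then route on. A minimal Python sketch of the split, using hypothetical addresses modeled on the names mentioned:]

```python
def split_subaddress(address, separator="+"):
    """Split a subaddressed mailbox like infra-root+matrixstatus@example.org
    into (base, tag, domain); tag is None when no separator is present."""
    local, _, domain = address.partition("@")
    base, sep, tag = local.partition(separator)
    return base, (tag if sep else None), domain

# Hypothetical addresses following the scheme discussed above.
print(split_subaddress("infra-root+matrixstatus@example.org"))
# -> ('infra-root', 'matrixstatus', 'example.org')
print(split_subaddress("infra-root@example.org"))
# -> ('infra-root', None, 'example.org')
```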
16:28 <fungi> infra-root: i'm tentatively scheduling 20:00-21:00 UTC Friday 2025-12-12 for the starlingx/app-kubernetes-module-manager repository rename, and have commented accordingly on https://review.opendev.org/968047
16:29 <clarkb> my calendar appears to be clear at that day and time so I should be around
16:29 <fungi> cool, i figured it will be fairly quiet at that time while not being so late in my day as to be inconvenient
16:40 <opendevreview> Merged zuul/zuul-jobs master: ensure-python: Also filter t-dev suffixes in pyenv  https://review.opendev.org/c/zuul/zuul-jobs/+/969939
16:58 <clarkb> I single core approved ^ since it seems like a trivial update to an existing bugfix that users reported fixed their problems
17:29 <fungi> yep, thanks, it got confirmed working via depends-on and has a fairly small blast radius regardless

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!