| opendevreview | Merged opendev/zuul-providers master: Instrument dstat in image build jobs https://review.opendev.org/c/opendev/zuul-providers/+/968386 | 01:12 |
|---|---|---|
| *** mrunge_ is now known as mrunge | 06:37 | |
| *** liuxie is now known as lushy | 08:58 | |
| *** lushy is now known as liushy | 08:58 | |
| opendevreview | Dr. Jens Harbott proposed opendev/zuul-providers master: Add nodesets for debian-trixie https://review.opendev.org/c/opendev/zuul-providers/+/969912 | 13:00 |
| frickler | infra-root: ^^ seems like we missed this, better to have this centrally rather than doing openstack-only nodesets I'd say. also are we in agreement to only do nodesets with explicit memory reqs going forward? | 13:02 |
| fungi | frickler: let me find the earlier discussion, but i think the decision was to stop providing generic (non-ram-specific) nodeset names | 13:43 |
| fungi | yeah, looks like 969912 is erroring because the nodesets for debian-trixie-4GB, debian-trixie-8GB and debian-trixie-16GB already exist, while we had some consensus not to create a generic debian-trixie pointed at an arbitrary label | 13:44 |
| fungi | er, erroring because those three nodesets are already defined | 13:44 |
| fungi | https://review.opendev.org/c/opendev/zuul-providers/+/967599 added them a few weeks ago | 13:46 |
| frickler | ah, I was working off an old clone, sorry, that got me confused. so I'll only add arm nodesets now that we have those images | 13:47 |
| fungi | frickler: the discussion relative to the debian-trixie nodeset happened here: https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2025-11-18.log.html#opendev.2025-11-18.log.html%23t2025-11-18T17:56:48 | 13:48 |
| fungi | i guess adding a comment about the lack of generic one and that it's intentional would have been a good idea, in retrospect | 13:49 |
| opendevreview | Dr. Jens Harbott proposed opendev/zuul-providers master: Add nodesets for debian-trixie arm64 https://review.opendev.org/c/opendev/zuul-providers/+/969912 | 13:49 |
| frickler | one day we also might find consensus on whether to order distro releases alphabetically or chronologically ... | 13:50 |
| fungi | oh, i guess we did the inverse with comments. all the generic ones say "A legacy nodeset from nodepool" | 13:51 |
| frickler | but good, then I can update 967582 right away without having to wait for the above to merge | 13:51 |
| fungi | not that it'll take long to merge. i single-core approved it since it's consistent with what we added for amd64 already | 13:52 |
| opendevreview | Merged opendev/zuul-providers master: Add nodesets for debian-trixie arm64 https://review.opendev.org/c/opendev/zuul-providers/+/969912 | 13:52 |
| fungi | well, except for the lack of debian-trixie-arm64-4GB but we don't seem to do 4gb ram flavors in osuosl anyway | 13:53 |
| fungi | yeah, we have no .*-arm64-4GB nodesets or labels at all | 13:53 |
| frickler | related, can someone take a look why py314 doesn't seem to work? https://zuul.opendev.org/t/openstack/build/b9074d51c5bb468a80a6bb3dd10d6902 maybe we need to run that on trixie instead of noble? | 13:56 |
| fungi | frickler: looks like it's because we're still selecting 3.14t-dev as the "latest" 3.14 version. i thought we had worked around or fixed that previously somewhere else... | 14:06 |
| fungi | https://zuul.opendev.org/t/openstack/build/b9074d51c5bb468a80a6bb3dd10d6902/console#3/0/12/ubuntu-noble | 14:06 |
| fungi | frickler: https://review.opendev.org/c/zuul/zuul-jobs/+/940158 is the workaround i was thinking of, i guess i wrote that one | 14:12 |
| fungi | which points to https://github.com/pyenv/pyenv/issues/3015 | 14:14 |
| fungi | frickler: oh! i see it. they used to provide 3.14t so we filtered out any matching 't$' but now they're adding 3.14t-dev which no longer matches that exclusion. i'll get a fix up now | 14:19 |
| frickler | cool, thx | 14:21 |
| opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: ensure-python: Also filter t-dev suffixes in pyenv https://review.opendev.org/c/zuul/zuul-jobs/+/969939 | 14:22 |
| fungi | frickler: please rebase on that ^ to make sure it's sufficient | 14:22 |
| fungi | or rather, depends-on i mean | 14:22 |
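The exclusion logic behind that fix can be sketched as follows. This is a hypothetical Python reimplementation for illustration, not the actual ensure-python role code (which does its filtering in the role's tasks); the version strings are assumed examples of pyenv's naming, where free-threaded builds carry a "t" suffix and dev builds a "-dev" suffix:

```python
import re

def latest_stable(versions):
    """Pick the newest plain CPython release from a pyenv-style version list.

    The old filter only excluded names matching r"t$" (e.g. "3.14t"), so
    "3.14t-dev" slipped through; widening the pattern to also catch the
    "-dev" suffix excludes both.
    """
    stable = [v for v in versions if not re.search(r"(t|-dev)$", v)]
    # Remaining names are plain dotted releases, so a numeric sort is safe.
    return max(stable, key=lambda v: tuple(int(p) for p in v.split(".")))

print(latest_stable(["3.14.0", "3.14.1", "3.14t", "3.14t-dev"]))  # → 3.14.1
```

With only the old `t$` exclusion, "3.14t-dev" would survive the filter and then break (or win) the "latest" selection, which matches the failure mode described above.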
| opendevreview | Michal Nasiadka proposed zuul/zuul-jobs master: Use mirror_info in configure-mirrors role https://review.opendev.org/c/zuul/zuul-jobs/+/966187 | 14:30 |
| frickler | looks like it is installing 3.14.1 now https://zuul.opendev.org/t/openstack/stream/aa6bee524e454204bb38726f0853579e?logfile=console.log | 14:36 |
| fungi | perfect | 14:38 |
| fungi | thanks for noticing the job had started failing | 14:38 |
| mnasiadka | So, the image build jobs still have some arm64 problems, looking at the mask firewalld patch | 14:42 |
| fungi | mnasiadka: yeah, i was just digging into the build that timed out, now that we have dstat graphs | 14:45 |
| fungi | https://zuul.opendev.org/t/opendev/build/b22d87243b224a82aa362afd46345307 | 14:45 |
| fungi | looks like the long silence starts at 07:09:59 in the log and continues until END RESULT_TIMED_OUT | 14:46 |
| fungi | comparing to dstat graphs collected from it, i don't see any significant shift in resource consumption on the node during that timespan | 14:46 |
| fungi | though the inode count does climb significantly, and then precipitously drop right after the silent span in the log begins | 14:47 |
| fungi | from 67k to 25k inodes in a couple of minutes | 14:48 |
| fungi | there's definitely some i/o continuing throughout, though bursty, and the vast majority is read not write | 14:49 |
| fungi | the 1m load average is also jumping up to 3+ at times | 14:50 |
| fungi | sadly no smoking gun, i don't think dstat has definitively confirmed any suspicions | 14:51 |
| opendevreview | Michal Nasiadka proposed zuul/zuul-jobs master: Use mirror_info in configure-mirrors role https://review.opendev.org/c/zuul/zuul-jobs/+/966187 | 14:53 |
| fungi | as for the build that ended in post_failure, looks like the "Upload image to swift" task got a http/500 error back from the keystone endpoint: https://zuul.opendev.org/t/opendev/build/5a28a04d11f5489eaf087f2e5d214a10 | 14:53 |
| mnasiadka | Maybe we need a retry there in such cases | 14:53 |
| fungi | (that's in rackspace flex, if memory serves) | 14:54 |
| corvus | fungi: it's interesting to compare the dstat graph to others from the same buildset. it does look like there's generally 10x the iowait (maybe 10% vs 1%). | 15:07 |
| corvus | it may also be interesting to run that job alone and compare the graph in that case. | 15:08 |
| fungi | yeah, could indicate that node was running on a host with a noisy neighbor, i/o wise | 15:08 |
| fungi | so the actual i/o is low because it's fighting for bandwidth, but wait goes up accordingly | 15:09 |
| corvus | the load average is also higher, usually above 1 (and typically above 2 in the second half of the job) on arm, while half that on x86. | 15:11 |
| corvus | there are two peaks in the inode graph on the x86 job. the second peak is 25% of the job runtime after the first (so it happens relatively quickly). on the arm graph, the first peak shows up 50% through the job and the second one never shows up. proportionally, there should have been time for that second peak. | 15:17 |
| corvus | so it seems like whatever is happening between those two peaks runs unusually slowly, even compared to the rest of the job. | 15:17 |
| fungi | theory time... why is the inode count accumulating when the file count isn't? the only thing i can think of is unlinked files, which could be pending file deletes piling up. then that dropoff in the graph is them actually being cleaned up, which could result in a massive amount of backlogged write operations. if disk write is severely constrained, then when an implicit fsync is triggered it gets completed at a trickle, leading to the increased iowait we see? | 15:21 |
| fungi | and blocks the requested explicit write operation (the fstab file copy to that filesystem) while the background sync is in progress | 15:23 |
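The unlinked-file part of that theory is observable directly: on POSIX filesystems a file that has been unlinked but is still held open keeps its inode allocated, so inode counts can climb while the visible file count does not. A minimal demonstration (hypothetical illustration, not DIB code; assumes a Linux/POSIX host like the build nodes in question):

```python
import os
import tempfile

# Create a file, keep it open, then unlink it: the directory entry
# disappears, but the inode stays allocated until the last open file
# descriptor is closed.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "scratch.img")
    f = open(path, "wb")
    f.write(b"x" * 4096)
    f.flush()
    os.unlink(path)                # directory entry is gone...
    assert not os.path.exists(path)
    st = os.fstat(f.fileno())      # ...but the inode is still live
    print(f"after unlink: nlink={st.st_nlink}, size={st.st_size}")
    f.close()                      # closing finally releases the inode
```

If a build process accumulates many such pending deletes, the cleanup pass that finally closes them would show up exactly as the sharp inode drop seen in the dstat graph.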
| opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: ensure-python: Also filter t-dev suffixes in pyenv https://review.opendev.org/c/zuul/zuul-jobs/+/969939 | 15:31 |
| clarkb | fungi: theoretically deleting as we go wouldn't be any faster, right? You have to pay the cost at some point. But if we can pay it down when we're not doing other io it may help. However, I suspect that is normal build-completion-type cleanup rather than something we can do over the entire build. Another thought is maybe we can add a no-cleanup mode to dib since we know we're going to throw the node away | 15:49 |
| clarkb | basically don't delete individual files when we know we will delete the entire instance later. But that will almost certainly require someone to go learn a dib cleanup process | 15:49 |
| fungi | right, i was more suggesting that this points to something limiting write throughput | 15:52 |
| fungi | likely at the host layer, like a noisy neighbor or bad iscsi network connection | 15:53 |
| fungi | but seems like it's either on specific hosts (i guess we could start comparing host ids) or comes and goes at different times. maybe we're our own noisy neighbor even | 15:53 |
| clarkb | being our own noisy neighbors particularly with io is something that is familiar to us | 15:56 |
| clarkb | EMS responded to my support ticket and says that email addresses are required now for added security (I'm not sure I follow how that helps). There is apparently a new admin ui beta system we can use to add the email without performing verification of it | 16:02 |
| clarkb | so I think our options are to either use the beta admin ui to add a bot with a bogus email address OR use the existing ui to add the bot with a bot specific plus address off of infra-root | 16:02 |
| fungi | which would allow us to add bogus e-mails if we want, i suppose | 16:02 |
| fungi | yeah that | 16:03 |
| fungi | i would add addresses to all of them for consistency. can it at least be the same address for multiple users? | 16:03 |
| clarkb | I think we can probably use plus addresses since those apparently work with our email system. I just have to figure out how to get my local client to view the resulting filtered locations | 16:03 |
| clarkb | fungi: I suspect, but don't actually know, that you can't use duplicates otherwise what security are we gaining | 16:04 |
| clarkb | the response went to infra-root if you want to see what they wrote back | 16:04 |
| *** darmach9 is now known as darmach | 16:04 | |
| fungi | yeah, i already read it. i was more asking if the current interface prevents multiple accounts from using the same address | 16:04 |
| clarkb | fungi: I don't know. None of the current accounts have an address so adding a new one would be the first | 16:05 |
| clarkb | side note: if changing requirements for user workflows, updating docs first or in tandem seems like a good idea. I'm sure I/we/everyone fails at this though | 16:05 |
| clarkb | anyway we can think about that for a moment before we make any decisions on how we want to proceed. The bigger question is probably how do we want to generally manage this for all the bot accounts (which I think fungi is getting at with the duplicates question) | 16:06 |
| clarkb | we can probably respond to that ticket with an inquiry about duplicates too | 16:07 |
| clarkb | other than that the major thing on my radar today is Gerrit upgrade prep. I think I've got that in a good place. But please let me know if there is anything you feel needs to be double checked or new items to look into | 16:08 |
| clarkb | I'll probably take it easy today since I expect to be working for a good chunk of sunday | 16:08 |
| fungi | fridays are usually pretty quiet anyway (hope i didn't just jinx us) | 16:11 |
| clarkb | re Thunderbird and new folders: you right click on the server name in the left hand panel, select Subscribe..., then optionally hit Refresh if new folders aren't listed (they were for me, so imap had already synced their presence I guess), then check the empty check box next to them and voila, you now have visibility into those folders | 16:16 |
| clarkb | why they don't automatically subscribe and sync to new folders I do not know | 16:16 |
| clarkb | but considering that works and is a solvable problem I'm somewhat inclined to use infra-root+matrixstatus and infra-root+matrixlogs and infra-root+matrixfoo etc. Or if we prefer to use duplicates have infra-root+matrixbots shared amongst them | 16:17 |
| clarkb | assuming duplicates are valid (and if we want to do that I can respond to the ticket to ask) | 16:17 |
| fungi | or just use infra-root@ if duplicates are allowed, unless we find that we get too much e-mail to them | 16:21 |
| clarkb | oh ya if duplicates are fine we don't really need plus addressing | 16:22 |
| clarkb | we can filter via source address as necessary in that case? | 16:22 |
| fungi | that's the main reason i asked about duplicates | 16:22 |
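The plus-address routing being weighed above can be sketched simply: the local part splits at "+" into a base mailbox and a tag, and mail clients or server-side filters can sort on the tag. A minimal illustration (the addresses are assumed examples, not the real account configuration):

```python
def route(address):
    """Return the folder a plus-addressed message would sort into.

    "infra-root+matrixlogs@example.org" -> "matrixlogs"
    A tagless address falls through to the inbox.
    """
    local, _, _domain = address.partition("@")
    _base, _, tag = local.partition("+")
    return tag or "inbox"

print(route("infra-root+matrixlogs@example.org"))  # matrixlogs
print(route("infra-root@example.org"))             # inbox
```

If duplicates turn out to be allowed and a single shared address is used instead, the equivalent sorting would key on the envelope sender rather than the recipient tag, as suggested above.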
| fungi | infra-root: i'm tentatively scheduling 20:00-21:00 UTC Friday 2025-12-12 for the starlingx/app-kubernetes-module-manager repository rename, and have commented accordingly on https://review.opendev.org/968047 | 16:28 |
| clarkb | my calendar appears to be clear at that day and time so I should be around | 16:29 |
| fungi | cool, i figured it will be fairly quiet at that time while not being so late in my day as to be inconvenient | 16:29 |
| opendevreview | Merged zuul/zuul-jobs master: ensure-python: Also filter t-dev suffixes in pyenv https://review.opendev.org/c/zuul/zuul-jobs/+/969939 | 16:40 |
| clarkb | I single core approved ^ since it seems like a trivial update to an existing bugfix that users reported fixed their problems | 16:58 |
| fungi | yep, thanks, it got confirmed working via depends-on and has a fairly small blast radius regardless | 17:29 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!