Wednesday, 2025-05-28

fungi#status log Pruned backups on backup02.ca-ymq-1.vexxhost reducing volume usage from 93% to 57%00:06
fungiprevious prune (2025-05-06) only got it down to 68%00:06
opendevstatusfungi: finished logging00:06
fungiso the review02 backups retirement definitely helped00:07
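
A prune along these lines might look roughly like the following, assuming borg-based backups (the repository path and retention values are illustrative, not the actual OpenDev settings):

    # prune old archives according to a retention policy
    borg prune --stats \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
        /opt/backups/borg-review02    # hypothetical repository path
    # reclaim the space freed by the prune (borg >= 1.2)
    borg compact /opt/backups/borg-review02
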
corvusi went ahead and approved clarkb's straightforward df and rm changes for zuul-providers00:09
*** jroll00 is now known as jroll006:33
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404508:39
mnasiadkaclarkb: ^^ removed centos element testing08:43
mnasiadkaclarkb: and raised https://issues.redhat.com/browse/RHEL-9389109:02
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404509:07
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404511:12
opendevreviewJames E. Blair proposed opendev/project-config master: Add promote-image-build pipeline  https://review.opendev.org/c/opendev/project-config/+/95114814:42
JayFis gerrit down?14:44
JayFPING review.opendev.org (2604:e100:1:0:f816:3eff:fe31:926a) 56 data bytes14:44
JayFFrom 2001:550:2:16::2d:2 icmp_seq=1 Destination unreachable: Address unreachable14:44
claygI came to look for the maintenance notice14:44
corvusit's reachable by me over ipv414:45
claygblip - seems to be up again14:45
fungii'm starting to wonder if vexxhost ca-ymq-1 is having intermittent network outages14:49
fungithough if it's only ipv6, then maybe we're having the rogue ra problem again? i only see one default route at the moment at least14:50
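
A rogue router advertisement typically shows up as an extra IPv6 default route or an unexpected global address, so a quick check is something like this sketch (the interface name ens3 is a guess):

    ip -6 route show default              # more than one default route is suspicious
    ip -6 addr show dev ens3              # look for unexpected global addresses
    sysctl net.ipv6.conf.ens3.accept_ra   # is the kernel accepting RAs at all?
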
corvus64 bytes from review03.opendev.org (162.253.55.233): icmp_seq=108 ttl=42 time=2155 ms14:53
corvus64 bytes from review03.opendev.org (162.253.55.233): icmp_seq=125 ttl=42 time=67.4 ms14:53
corvuslooking a little spotty14:53
Clark[m]If you get onto the mirror, is access from the mirror more reliable?14:54
fungicould also be worth checking cacti network graphs, maybe we're overloading the network interface14:54
Clark[m]Just wondering if we can isolate it to connectivity to the cloud vs within the cloud14:54
Clark[m]fungi: that seems unlikely if this is a recurrence of the weekend issue as that affected the backup server too14:54
corvusi started a ping session there, i'll let you know if it acts up14:55
Clark[m]Re rogue ra: the netplan config for review02 is still in system-config, commented out. If we suspect that, we could update it with the review03 interface details to statically configure the network14:57
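
A minimal sketch of what that static netplan configuration could look like, reusing the addresses seen in the ping output in this log (the interface name, prefix lengths, and gateways are hypothetical, and the routes syntax assumes a recent netplan; older releases use gateway4/gateway6 instead):

    cat > /etc/netplan/50-static.yaml <<'EOF'
    network:
      version: 2
      ethernets:
        ens3:
          accept-ra: false            # ignore router advertisements entirely
          addresses:
            - 162.253.55.233/24
            - "2604:e100:1:0:f816:3eff:fe31:926a/64"
          routes:
            - to: default
              via: 162.253.55.1       # hypothetical IPv4 gateway
            - to: default
              via: "2604:e100:1::1"   # hypothetical IPv6 gateway
    EOF
    netplan apply
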
clarkbfrom syslog: 2025-05-28T14:45:12.635718+00:00 review03 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 27s!15:11
clarkband that repeats a number of times for the various CPUs15:11
clarkbnow I suspect that this is something like VM live migration15:12
clarkbthere is some way to list that info from nova. I'm going to see if I can figure it out again15:12
clarkb`openstack server event list review03.opendev.org` but it only shows the server creation event (so no logged migrations there)15:13
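
For reference, assuming a reasonably current python-openstackclient, the commands involved look like this (the request ID in the second command is a placeholder):

    # list lifecycle events (create, migrate, reboot, ...) for the instance
    openstack server event list --long review03.opendev.org
    # drill into a single event by its request id
    openstack server event show review03.opendev.org req-xxxxxxxx
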
clarkbmnaser: guilhermesp ricolin ^ is that CPU stuck thing something you might know more about?15:15
opendevreviewClark Boylan proposed opendev/system-config master: WIP test mirror update on noble  https://review.opendev.org/c/opendev/system-config/+/95116315:19
opendevreviewClark Boylan proposed opendev/system-config master: WIP test zookeeper on noble  https://review.opendev.org/c/opendev/system-config/+/95116415:21
clarkbmnasiadka: any idea why that issue requires login to view? I think I did have an RH login somewhere so maybe I should just try that15:22
mnasiadkaclarkb: yes, you need an RH account - it's the way it is since they moved to Jira15:23
opendevreviewThierry Carrez proposed opendev/irc-meetings master: Move Large Scale SIG meeting one hour earlier  https://review.opendev.org/c/opendev/irc-meetings/+/95116815:29
clarkbmnasiadka: looks like i don't have permission to view the issue after logging in15:31
clarkbthat's ok, I can wait for updates via IRC :)15:31
mnasiadkaclarkb: tried adding you as a watcher, but you don't have permissions ;-)15:32
tonybmnasiadka: I added myself as a watcher15:33
mnasiadkaYeah, noticed that now - but you surely have bigger permissions than us ;-)15:33
mnasiadkatonyb: while you're online - are you good with having a look in https://review.opendev.org/c/opendev/glean/+/941672 and maybe merging it? ;-)15:34
tonybmnasiadka: I've also added it as a blocker for an internal Jira issue15:35
tonybmnasiadka: I'll look at it but I'm not totally comfortable approving a glean change as I'm not familiar with the history etc15:36
mnasiadkaok, then let's wait for somebody more comfortable ;-)15:37
tonybI will review it though15:37
clarkbtonyb: I think it is the middle of the night so no rush, but I wanted to double check if there was any dib nodepool ci replacement stuff to look at yet15:40
clarkbfungi: and did any concerns pop up with the hashtags plan?15:41
tonybNothing to look at yet15:43
clarkbcool, just wanted to make sure I hadn't missed it. There are a number of pieces to the centos puzzle at this point, and while I've got most of it paged in I'm trying to stay on top of it. I think the major items are close though, which is cool15:43
tonybYeah, close. It came together quickly in the end15:44
opendevreviewClark Boylan proposed opendev/system-config master: WIP test mirror update on noble  https://review.opendev.org/c/opendev/system-config/+/95116315:58
opendevreviewClark Boylan proposed opendev/system-config master: Install afsmon to a virtualenv  https://review.opendev.org/c/opendev/system-config/+/95117315:58
fungiclarkb: not yet, though i'm going to bring it up with the openstack tc and kolla folks shortly16:03
opendevreviewClark Boylan proposed opendev/system-config master: Install afsmon to a virtualenv  https://review.opendev.org/c/opendev/system-config/+/95117316:03
opendevreviewClark Boylan proposed opendev/system-config master: WIP test mirror update on noble  https://review.opendev.org/c/opendev/system-config/+/95116316:03
fungiclarkb: i noticed that the project-config change to add hashtag acls to openstack tc repos a year ago mentioned in its commit message an expectation that opendev would eventually enable it for all registered users, and so far conversation in #openstack-tc suggests they're still cool with the idea16:24
corvusyay for communicating early and often16:25
clarkbyup I saw the messages over in #openstack-tc. Thanks for following up on that. I guess it's just kolla that may object then16:25
fungii've also linked the announcement in #openstack-kolla and will see how it goes16:25
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add promote-artifact role  https://review.opendev.org/c/zuul/zuul-jobs/+/95117416:32
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Promote gate image builds to production  https://review.opendev.org/c/opendev/zuul-providers/+/95117516:33
corvusthat change (which has 2 dependencies) should get images into production very quickly after changes are made to the build jobs16:34
clarkbcorvus: the df and --rm changes failed on an image build timeout so I've rechecked them just now16:35
clarkbcorvus: which makes me wonder if we should be promoting those images before the change lands?16:36
clarkbmain concern there is we could be running images in production that aren't reflected by the git repo state16:36
corvusyeah, that proposal is not to add the image-built reporter to gate, it's to add a new promote pipeline16:36
corvusso they will go into service immediately after the git repo state changes16:37
clarkbaha16:37
corvus(which will be down from "up to 24 hours from the time the git repo state changes" which is today's behavior)16:37
corvusmnasiadka: fyi after https://review.opendev.org/951175 we'll need to add an extra job for each image16:44
opendevreviewClark Boylan proposed opendev/system-config master: WIP test zookeeper on noble  https://review.opendev.org/c/opendev/system-config/+/95116416:48
opendevreviewClark Boylan proposed opendev/system-config master: Switch from netcat to netcat-openbsd package  https://review.opendev.org/c/opendev/system-config/+/95117816:48
opendevreviewClark Boylan proposed opendev/system-config master: WIP test zookeeper on noble  https://review.opendev.org/c/opendev/system-config/+/95116417:36
clarkbhttps://review.opendev.org/c/opendev/system-config/+/951173 passes testing on the current ubuntu version and in the child wip change running noble. Reviews on that welcome. I think if we get that deployed onto the existing mirror-update server and all is well then I can boot a new noble mirror update18:09
clarkbmechanically I think the process there will be to put the old server in the emergency file then disable all of the cron jobs on that server. Then once it settles we can add the new server to the inventory and have it take over the duties18:10
clarkbare there any tasks on mirror-update that are not driven by cron jobs? I don't think so but figured I should ask as we need to ensure that afs doesn't get multiple conflicting vos releases. Maybe after things quiesce we can shut down the old server too just to avoid that from happening. Better we fail to connect than run two vos releases18:11
clarkbI'm still waiting on the similar but different change for zookeeper to complete testing but similar idea there. If we get the fixup landed against the existing servers then I can start deploying new servers as replacements18:15
fungii'm pretty certain that the only things run to update afs from mirror-update are triggered via cron, at least, which is all we need to worry about (otherwise leaving stale locks or inconsistent volumes)18:17
clarkbya even the docs/tarballs/etc release is on cron right?18:29
fungicorrect18:29
clarkbI thought maybe zuul triggered that but I don't think it does18:29
fungizuul does not18:29
fungithere's a cronjob scheduled to fire every 5 minutes to vos release those18:30
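
That kind of entry, with flock added to illustrate guarding against the overlapping vos releases discussed above, might look like this sketch in root's crontab (volume name, lock path, and flags are hypothetical, not the real crontab):

    # release the read-only replicas, skipping the run if one is still in progress
    */5 * * * * flock -n /var/run/vos-release-docs.lock vos release docs -localauth
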
clarkbcool in that case getting the afsmon install sorted out (that's 951173 above) should get us to the point where I can boot a new server18:30
clarkbok the zk tests look like they may pass now as well if we want to land https://review.opendev.org/c/opendev/system-config/+/951178 too18:39
nhicherhello, I'd like to create a bug report for an issue with the openafs-client package from https://tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos9-stream/RPMS/x86_64/ for centos-9-stream, do you know where I can do that?19:45
nhicherhere's the issue -> https://paste.openstack.org/show/bnfVKVLwxCzq9bsLzqYb/19:45
clarkbnhicher: I don't think we have a specific bug tracker for that. Also, I'm not sure we ever intended for those packages to be generally reconsumable. That said, the error you've posted seems like one that would also affect us19:49
funginhicher: for reference, we're not really maintaining any patches for it, that's just upstream openafs built on centos with this script... https://opendev.org/openstack/openstack-zuul-jobs/src/commit/f1140fe/roles/openafs-rpm-package-build/tasks/main.yaml#L28-L7519:52
fungiin summary, download the source tarball from openafs.org, untar it, and then run their makesrpm.pl script19:54
clarkbguessing out loud: dkms/kernel updates to centos have broken the existing package setup19:54
fungiand then run rpmbuild on the resulting srpm19:54
fungilikely19:54
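
The build fungi describes boils down to roughly this sketch (the download URLs and tarball names are approximations, and the version matches the release mentioned below; the linked openafs-rpm-package-build role is the authoritative version):

    VERSION=1.8.13.2
    curl -LO "https://openafs.org/dl/openafs/${VERSION}/openafs-${VERSION}-src.tar.bz2"
    curl -LO "https://openafs.org/dl/openafs/${VERSION}/openafs-${VERSION}-doc.tar.bz2"
    tar xjf "openafs-${VERSION}-src.tar.bz2"
    # generate the srpm with upstream's packaging script
    "openafs-${VERSION}/src/packaging/RedHat/makesrpm.pl" \
        "openafs-${VERSION}-src.tar.bz2" "openafs-${VERSION}-doc.tar.bz2"
    # build binary rpms (including the dkms subpackage) from the srpm
    rpmbuild --rebuild openafs-*.src.rpm
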
clarkbthere is a 1.8.13.2 release now. I can update that script to try and build it and see if that is any happier. But if not we likely need someone to work with upstream to fix it19:55
funginhicher: anyway, if you can reproduce it building the rpm yourself, then the bug report belongs upstream19:55
fungialso i'd strongly advise building it yourself anyway, we can't guarantee that we'll keep that around forever19:57
clarkbremote:   https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/951201 Build openafs 1.8.13.2 rpm packages19:58
clarkbI believe ^ is self testing so if the problem persists we should know19:58
fungiand also if it works it's already approved so will just merge19:59
clarkbI guess it isn't clear from nhicher's paste which platform the install was attempted on20:00
clarkbnhicher: was it on centos 9 stream? or something else? It's useful to know if the intended package target is the problem or not20:00
nhicherclarkb: yes, up-to-date centos-9-stream. I have a mirror on rhel-9.3 (it works), but it fails on rhel-9.4, that's why I tried to reproduce on 9-stream20:01
clarkbso ya best guess is centos/rhel 9 updated dkms and/or the kernel and broke what was a previously valid dkms config20:02
clarkbif 1.8.13.2 does not fix this you should followup with upstream20:02
nhicherthanks clarkb and fungi, I will test =)20:02
clarkbcorvus: I've rechecked https://review.opendev.org/c/opendev/zuul-providers/+/951045 again but I'm beginning to wonder if we need to avoid running all the image builds for updates like that so they can get in reliably?20:03
clarkbfungi: do you want to review https://review.opendev.org/c/opendev/system-config/+/951178 and/or https://review.opendev.org/c/opendev/system-config/+/951173 before I approve them?20:03
corvus2025-05-28 19:24:59.294500 | POST-RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/opendev/zuul-providers/playbooks/opendev-build-diskimage-base/post.yaml@master]20:06
corvusclarkb: hrm... i feel like the builds are generally pretty reliable so i'd like to try to avoid dismissing them at this point20:07
corvusthat issue may be easier to debug with the no_log change merged20:08
corvuswhere are we on that20:08
fungiclarkb: yeah, looking20:08
corvushttps://review.opendev.org/948989 just clarkb and i so far20:08
clarkbcorvus: ya it timed out uploading to swift I think20:09
corvusinfra-root: https://review.opendev.org/948989 is a good change but if something goes wrong, we'll expose a password.  i think the risk is small (there is no known path that would cause that) and we can rotate if needed, but i think it'd be good to get more reviews20:10
corvusclarkb: let's give that a little time, and recheck in the interim?20:11
corvusclarkb: (and with the extra info we'll get from that, we could ask our cloud friends if it indicates a problem)20:11
nhicher[m]clarkb: FYI, it will be the same with 1.8.13.2, there is NO_WEAK_MODULES="true" in ./src/packaging/RedHat/openafs.spec.in. I will report upstream, thanks20:12
fungithanks for the update nhicher[m]!20:12
clarkbnhicher[m]: ack20:12
clarkbcorvus: wfm20:12
clarkbcorvus: I wonder if it's a bit of a thundering herd problem with all the jobs finishing around the same time20:13
fungii approved the no_log change for zuul-providers20:13
corvusyeah i was wondering the same thing20:13
corvusfungi: thx!20:13
fungii guess let's just keep a close eye on the results so we can be on top of any exposure20:13
clarkbyup. I'll be around to keep an eye on the afsmon update too20:14
clarkbfungi: re openafs there shouldn't be any harm in updating the package (it will just fail on centos 9 anyway) so I think we can leave that change to run through ci and possibly merge as is20:15
fungiagreed20:17
opendevreviewMerged opendev/project-config master: Add promote-image-build pipeline  https://review.opendev.org/c/opendev/project-config/+/95114820:38
Clark[m]OVH notified me within 5 minutes of my irc bouncer dying that they are aware of the problem21:12
Clark[m]so anyway I'm on matrix for a bit21:13
fungiexciting!21:24
clarkbok that didn't last as long as I feared it would. Took advantage of the situation to do some updates too21:29
clarkblooks like the netcat package update failed on some docker connection issues. But the afsmon venv looks like it should merge soonish21:29
clarkbjust one more job to go21:30
michelthebeauHi, has anyone noticed gerrit is not responding?21:45
JayFthere have been connectivity issues to it over v6 a couple of times already today; might want to see if using v4 works for now21:45
michelthebeauv6, ok, I'll share that with others21:48
michelthebeauThanks21:48
fungii'm having trouble reaching it over both ipv4 and ipv6 at the moment21:48
clarkbI can get it via ipv421:49
fungii also can't reach mirror.ca-ymq-1.vexxhost.opendev.org in the same region21:49
fungioh, login over ipv4 just got through21:49
clarkband syslog doesn't complain about cpu slowness either21:49
clarkbtimesyncd does complain about timeouts reaching ntp servers over ipv4 from the server21:50
clarkbmy hunch here is that this isn't really anything server specific but on the hosting side of things21:50
clarkband ping tests from a couple of locations seem to indicate it is v6 specific21:51
clarkbip addr doesn't report an extra or different ipv6 address either. So I don't think it is the rogue ra problem21:52
fungiyeah, i guess we just keep hoping ricolin or guilhermesp or mnaser know what's happening with networking in that region21:52
mnaseryes we're on it21:53
clarkback thanks21:53
fungithanks!!!21:54
clarkbthe afsmon change did merge but I think gerritbot missed that event due to the network stuff21:55
fungistatus notice The Gerrit service on review.opendev.org is temporarily unreachable due to an ongoing issue in the hosting provider where it resides22:05
fungiinfra-root: ^ ?22:05
clarkbthat looks reasonable to me based on what we know22:06
fungi#status notice The Gerrit service on review.opendev.org is temporarily unreachable due to an ongoing issue in the hosting provider where it resides22:06
opendevstatusfungi: sending notice22:06
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org is temporarily unreachable due to an ongoing issue in the hosting provider where it resides22:06
fungithat'll get it onto our status page and fosstodon too, since michelthebeau apparently checked those first before asking in here22:06
opendevstatusfungi: finished sending notice22:09
mnaserit should be coming back22:10
opendevreviewClark Boylan proposed opendev/system-config master: Install python3-venv for venv creation on mirror-update  https://review.opendev.org/c/opendev/system-config/+/95121422:14
clarkbmnaser: ^ yup seems to be working for me22:14
clarkbfungi: ^ fyi python3-venv isn't installed and venv creation failed22:14
clarkbI don't think this is really impacting anything except for the ansible runs for now (the cron job is using the old path still which is still valid)22:15
clarkbWe install python3-venv on our test node images because we rely on it for venvs there... that is why this gap exists22:15
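
The fix is small; something along these lines (the venv path here is an assumption, not necessarily what the change uses):

    # python3 -m venv fails on a stock Ubuntu server without this package
    apt-get install -y python3-venv
    python3 -m venv /usr/local/afsmon-venv
    /usr/local/afsmon-venv/bin/pip install afsmon
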
corvusclarkb: +2 to fix, but honestly that seems like something that would be good to migrate to a base role for all systems22:23
clarkbcorvus: ya that seems reasonable22:26
clarkbthat change failed on connectivity to ubuntu mirrors (not ours, the upstream ones)22:36
clarkbI feel like the universe is suggesting I do something else for a bit.22:36
opendevreviewMerged opendev/zuul-providers master: Have zstd remove the source file after compression  https://review.opendev.org/c/opendev/zuul-providers/+/95104422:41
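
For reference, the behavior that change enables is zstd's --rm flag, which deletes the input file once compression succeeds, so the uncompressed image stops taking up disk as soon as the .zst exists (filename illustrative):

    zstd --rm -T0 image.raw    # writes image.raw.zst, then removes image.raw
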
fungimmm, 951214 failed system-config-run-base and system-config-run-mirror-update23:44
fungioh, i see it's already rechecked23:45
opendevreviewMerged opendev/zuul-providers master: Remove no_log for image upload tasks  https://review.opendev.org/c/opendev/zuul-providers/+/94898923:45
