fungi | #status log Pruned backups on backup02.ca-ymq-1.vexxhost reducing volume usage from 93% to 57% | 00:06 |
fungi | previous prune (2025-05-06) only got it down to 68% | 00:06 |
opendevstatus | fungi: finished logging | 00:06 |
fungi | so the review02 backups retirement definitely helped | 00:07 |
corvus | i went ahead and approved clarkb's straightforward df and rm changes for zuul-providers | 00:09 |
*** jroll00 is now known as jroll0 | 06:33 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 08:39 |
mnasiadka | clarkb: ^^ removed centos element testing | 08:43 |
mnasiadka | clarkb: and raised https://issues.redhat.com/browse/RHEL-93891 | 09:02 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 09:07 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 11:12 |
opendevreview | James E. Blair proposed opendev/project-config master: Add promote-image-build pipeline https://review.opendev.org/c/opendev/project-config/+/951148 | 14:42 |
JayF | is gerrit down? | 14:44 |
JayF | PING review.opendev.org (2604:e100:1:0:f816:3eff:fe31:926a) 56 data bytes | 14:44 |
JayF | From 2001:550:2:16::2d:2 icmp_seq=1 Destination unreachable: Address unreachable | 14:44 |
clayg | I came to look for the maintenance notice | 14:44 |
corvus | it's reachable by me over ipv4 | 14:45 |
clayg | blip - seems to be up again | 14:45 |
fungi | i'm starting to wonder if vexxhost ca-ymq-1 is having intermittent network outages | 14:49 |
fungi | though if it's only ipv6, then maybe we're having the rogue ra problem again? i only see one default route at the moment at least | 14:50 |
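(For context, the rogue-RA check fungi alludes to usually boils down to a couple of commands; a sketch follows, with the interface name as an assumption.)

```sh
# Sketch of a rogue-RA check; "ens3" is an assumed interface name.
ip -6 route show default    # more than one default route is suspicious
ip -6 addr show dev ens3    # look for unexpected SLAAC global addresses
# Watch for router advertisements on the wire (ICMPv6 type 134):
sudo tcpdump -i ens3 -v 'icmp6 and ip6[40] == 134'
```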
corvus | 64 bytes from review03.opendev.org (162.253.55.233): icmp_seq=108 ttl=42 time=2155 ms | 14:53 |
corvus | 64 bytes from review03.opendev.org (162.253.55.233): icmp_seq=125 ttl=42 time=67.4 ms | 14:53 |
corvus | looking a little spotty | 14:53 |
Clark[m] | If you get onto the mirror, is access from the mirror more reliable? | 14:54 |
fungi | could also be worth checking cacti network graphs, maybe we're overloading the network interface | 14:54 |
Clark[m] | Just wondering if we can isolate it to connectivity to the cloud vs within the cloud | 14:54 |
Clark[m] | fungi: that seems unlikely if this is a recurrence of the weekend issue, as that affected the backup server too | 14:54 |
corvus | i started a ping session there, i'll let you know if it acts up | 14:55 |
Clark[m] | Re rogue RA: the netplan config for review02 is still in system-config, commented out. If we suspect that, we could update it with review03's interface details to statically configure the network | 14:57 |
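(A static netplan config of the kind Clark describes would look roughly like this; the interface name, prefix lengths, gateway, and v6 address below are placeholders, not review03's actual values.)

```yaml
# Hypothetical netplan sketch: pin addresses statically and ignore RAs
# so a rogue router advertisement can no longer inject routes.
network:
  version: 2
  ethernets:
    ens3:
      dhcp4: false
      dhcp6: false
      accept-ra: false
      addresses:
        - 162.253.55.233/24          # v4 address from the ping output above
        - "2604:e100:1::100/64"      # placeholder v6 address
      routes:
        - to: default
          via: 162.253.55.1          # placeholder gateway
```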
clarkb | from syslog: 2025-05-28T14:45:12.635718+00:00 review03 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 27s! | 15:11 |
clarkb | and that repeats a number of times for the various CPUs | 15:11 |
clarkb | now I suspect that this is something like VM live migration | 15:12 |
clarkb | there is some way to list that info from nova. I'm going to see if I can figure it out again | 15:12 |
clarkb | `openstack server event list review03.opendev.org` but it only shows the server creation event (so no logged migrations there) | 15:13 |
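(For reference, the diagnostic pair clarkb is using here, plus the follow-up command for inspecting a single event; the request ID shown is hypothetical.)

```sh
# Check the guest's own view first, then ask nova what it knows.
grep 'soft lockup' /var/log/syslog
# List instance actions (create, live-migration, reboot, ...):
openstack server event list review03.opendev.org
# Drill into one action; "req-0123abcd" is a hypothetical request ID:
openstack server event show review03.opendev.org req-0123abcd
```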
clarkb | mnaser: guilhermesp ricolin ^ is that CPU stuck thing something you might know more about? | 15:15 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP test mirror update on noble https://review.opendev.org/c/opendev/system-config/+/951163 | 15:19 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP test zookeeper on noble https://review.opendev.org/c/opendev/system-config/+/951164 | 15:21 |
clarkb | mnasiadka: any idea why that issue requires login to view? I think I did have an RH login somewhere so maybe I should just try that | 15:22 |
mnasiadka | clarkb: yes, you need RH account - it's the way it is since they moved to Jira | 15:23 |
opendevreview | Thierry Carrez proposed opendev/irc-meetings master: Move Large Scale SIG meeting one hour earlier https://review.opendev.org/c/opendev/irc-meetings/+/951168 | 15:29 |
clarkb | mnasiadka: looks like i don't have permission to view the issue after logging in | 15:31 |
clarkb | that's ok, I can wait for updates via IRC :) | 15:31 |
mnasiadka | clarkb: tried adding you as watcher, you don't have permissions ;-) | 15:32 |
tonyb | mnasiadka: I added myself a watcher | 15:33 |
mnasiadka | Yeah, noticed that now - but you surely have more permissions than us ;-) | 15:33 |
mnasiadka | tonyb: while you're online - are you good with having a look at https://review.opendev.org/c/opendev/glean/+/941672 and maybe merging it? ;-) | 15:34 |
tonyb | mnasiadka: I've also added it as a blocker for an internal Jira issue | 15:35 |
tonyb | mnasiadka: I'll look at it but I'm not totally comfortable approving a glean change as I'm not familiar with the history etc | 15:36 |
mnasiadka | ok, then let's wait for somebody more comfortable ;-) | 15:37 |
tonyb | I will review it though | 15:37 |
clarkb | tonyb: I think it is the middle of the night so no rush, but I wanted to double check if there was any dib nodepool ci replacement stuff to look at yet | 15:40 |
clarkb | fungi: and did any concerns pop up with the hashtags plan? | 15:41 |
tonyb | Nothing to look at yet | 15:43 |
clarkb | cool, just wanted to make sure I hadn't missed it. there are a number of pieces to the centos puzzle at this point, and while I've got most of it paged in I'm trying to stay on top of it. I think the major items are close though, which is cool | 15:43 |
tonyb | Yeah close. It came together quickly in the end | 15:44 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP test mirror update on noble https://review.opendev.org/c/opendev/system-config/+/951163 | 15:58 |
opendevreview | Clark Boylan proposed opendev/system-config master: Install afsmon to a virtualenv https://review.opendev.org/c/opendev/system-config/+/951173 | 15:58 |
fungi | clarkb: not yet, though i'm going to bring it up with the openstack tc and kolla folks shortly | 16:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: Install afsmon to a virtualenv https://review.opendev.org/c/opendev/system-config/+/951173 | 16:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP test mirror update on noble https://review.opendev.org/c/opendev/system-config/+/951163 | 16:03 |
fungi | clarkb: i noticed that the project-config change to add hashtag acls to openstack tc repos a year ago mentioned in its commit message an expectation that opendev would eventually enable it for all registered users, and so far conversation in #openstack-tc suggests they're still cool with the idea | 16:24 |
corvus | yay for communicating early and often | 16:25 |
clarkb | yup I saw the messages over in #openstack-tc. Thanks for following up on that. I guess it's just kolla that may object then | 16:25 |
fungi | i've also linked the announcement in #openstack-kolla and will see how it goes | 16:25 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add promote-artifact role https://review.opendev.org/c/zuul/zuul-jobs/+/951174 | 16:32 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Promote gate image builds to production https://review.opendev.org/c/opendev/zuul-providers/+/951175 | 16:33 |
corvus | that change (which has 2 dependencies) should get images into production very quickly after changes are made to the build jobs | 16:34 |
clarkb | corvus: the df and --rm changes failed on an image build timeout so I've rechecked them just now | 16:35 |
clarkb | corvus: which makes me wonder if we should be promoting those images before the change lands? | 16:36 |
clarkb | main concern there is we could be running images in production that aren't reflected by the git repo state | 16:36 |
corvus | yeah, that proposal is not to add the image-built reporter to gate, it's to add a new promote pipeline | 16:36 |
corvus | so they will go into service immediately after the git repo state changes | 16:37 |
clarkb | aha | 16:37 |
corvus | (which will be down from "up to 24 hours from the time the git repo state changes" which is today's behavior) | 16:37 |
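(For readers unfamiliar with the pattern: a Zuul promote pipeline runs right after a change merges and republishes artifacts the gate jobs already built. A minimal sketch of such a pipeline definition follows; it is not the actual content of change 951148.)

```yaml
# Sketch of a promote-style Zuul pipeline; 951148 may differ in detail.
- pipeline:
    name: promote-image-build
    description: Publish gate-built images as soon as a change merges.
    manager: supercedent   # only the newest change per project/branch runs
    precedence: high
    post-review: true
    trigger:
      gerrit:
        - event: change-merged
```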
corvus | mnasiadka: fyi after https://review.opendev.org/951175 we'll need to add an extra job for each image | 16:44 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP test zookeeper on noble https://review.opendev.org/c/opendev/system-config/+/951164 | 16:48 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch from netcat to netcat-openbsd package https://review.opendev.org/c/opendev/system-config/+/951178 | 16:48 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP test zookeeper on noble https://review.opendev.org/c/opendev/system-config/+/951164 | 17:36 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/951173 passes testing on the current ubuntu version and in the child wip change running noble. Reviews on that welcome. I think if we get that deployed onto the existing mirror-update server and all is well then I can boot a new noble mirror update | 18:09 |
clarkb | mechanically I think the process there will be to put the old server in the emergency file, then disable all of the cron jobs on that server. Then once it settles we can add the new server to the inventory and have it take over the duties | 18:10 |
clarkb | are there any tasks on mirror-update that are not driven by cron jobs? I don't think so, but figured I should ask since we need to ensure that afs doesn't get multiple conflicting vos releases. Maybe after things quiesce we can shut down the old server too, just to keep that from happening. Better we fail to connect than run two vos releases | 18:11 |
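(Mechanically, the cutover might look like the following sketch; the emergency-file path and hostname are placeholders, not the deployment's real ones.)

```sh
# Hypothetical cutover sketch; path and host names are placeholders.
echo 'mirror-update.old.example' >> /path/to/ansible/emergency
# Comment out every active cron entry on the old server so no new
# vos releases are started:
ssh mirror-update.old.example 'sudo crontab -l | sed "s/^\([^#]\)/#\1/" | sudo crontab -'
# After in-flight releases quiesce, power the old server off so two
# hosts can never run conflicting vos releases:
ssh mirror-update.old.example 'sudo poweroff'
```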
clarkb | I'm still waiting on the similar but different change for zookeeper to complete testing but similar idea there. If we get the fixup landed against the existing servers then I can start deploying new servers as replacements | 18:15 |
fungi | i'm pretty certain that the only things run to update afs from mirror-update are triggered via cron, at least, which is all we need to worry about (otherwise leaving stale locks or inconsistent volumes) | 18:17 |
clarkb | ya even the docs/tarballs/etc release is on cron right? | 18:29 |
fungi | correct | 18:29 |
clarkb | I thought maybe zuul triggered that but I don't think it does | 18:29 |
fungi | zuul does not | 18:29 |
fungi | there's a cronjob scheduled to fire every 5 minutes to vos release those | 18:30 |
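(That cadence corresponds to a root crontab entry along these lines; the volume name and lock file are illustrative, not the actual mirror-update configuration.)

```sh
# Illustrative crontab entry: every 5 minutes, push the read-write AFS
# volume to its read-only replicas, skipping the run if one is already
# in progress. Volume name and lock file are placeholders.
*/5 * * * * flock -n /var/run/vos-release.lock vos release docs -localauth
```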
clarkb | cool, in that case getting the afsmon install sorted out (that's 951173 above) should get us to the point where I can boot a new server | 18:30 |
clarkb | ok, the zk tests look like they may pass now as well if we want to land https://review.opendev.org/c/opendev/system-config/+/951178 too | 18:39 |
nhicher | hello, I'd like to create a bug report for an issue with the openafs-client package from https://tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos9-stream/RPMS/x86_64/ for centos-9-stream, do you know where I can do that? | 19:45 |
nhicher | here's the issue -> https://paste.openstack.org/show/bnfVKVLwxCzq9bsLzqYb/ | 19:45 |
clarkb | nhicher: I don't think we have a specific bug tracker for that. Also, I'm not sure we ever intended for those packages to be generally reconsumable. That said, the error you've posted seems like one that would also affect us | 19:49 |
fungi | nhicher: for reference, we're not really maintaining any patches for it, that's just upstream openafs built on centos with this script... https://opendev.org/openstack/openstack-zuul-jobs/src/commit/f1140fe/roles/openafs-rpm-package-build/tasks/main.yaml#L28-L75 | 19:52 |
fungi | in summary, download the source tarball from openafs.org, untar it, and then run their makesrpm.pl script | 19:54 |
clarkb | guessing out loud: dkms/kernel updates to centos have broken the existing package setup | 19:54 |
fungi | and then run rpmbuild on the resulting srpm | 19:54 |
fungi | likely | 19:54 |
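(Spelled out, the recipe fungi describes is approximately the following; the version is the one clarkb mentions just below, and the exact URL layout and srpm file name may differ from openafs.org's.)

```sh
# Approximate form of the build recipe described above.
VER=1.8.13.2
curl -LO "https://openafs.org/dl/openafs/${VER}/openafs-${VER}-src.tar.bz2"
tar xjf "openafs-${VER}-src.tar.bz2"
# makesrpm.pl ships in the openafs source tree and wraps the tarball
# into a source rpm:
perl "openafs-${VER}/src/packaging/RedHat/makesrpm.pl" "openafs-${VER}-src.tar.bz2"
# Build binary rpms (including the dkms package) from the srpm:
rpmbuild --rebuild openafs-*.src.rpm
```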
clarkb | there is a 1.8.13.2 release now. I can update that script to try and build it and see if that is any happier. But if not we likely need someone to work with upstream to fix it | 19:55 |
fungi | nhicher: anyway, if you can reproduce it building the rpm yourself, then the bug report belongs upstream | 19:55 |
fungi | also i'd strongly advise building it yourself anyway, we can't guarantee that we'll keep that around forever | 19:57 |
clarkb | remote: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/951201 Build openafs 1.8.13.2 rpm packages | 19:58 |
clarkb | I believe ^ is self testing so if the problem persists we should know | 19:58 |
fungi | and also if it works it's already approved so will just merge | 19:59 |
clarkb | I guess it isn't clear from nhicher's paste which platform the install was attempted on | 20:00 |
clarkb | nhicher: was it on centos 9 stream? or something else? It's useful to know if the intended package target is the problem or not | 20:00 |
nhicher | clarkb: yes, up-to-date centos-9-stream. I have a mirror on rhel-9.3 (it works), but it fails on rhel-9.4, that's why I tried to reproduce on 9-stream | 20:01 |
clarkb | so ya best guess is centos/rhel 9 updated dkms and/or the kernel and broke what was a previously valid dkms config | 20:02 |
clarkb | if 1.8.13.2 does not fix this you should followup with upstream | 20:02 |
nhicher | thanks clarkb and fungi, I will test =) | 20:02 |
clarkb | corvus: I've rechecked https://review.opendev.org/c/opendev/zuul-providers/+/951045 again but I'm beginning to wonder if we need to avoid running all the image builds for updates like that so they can get in reliably? | 20:03 |
clarkb | fungi: do you want to review https://review.opendev.org/c/opendev/system-config/+/951178 and/or https://review.opendev.org/c/opendev/system-config/+/951173 before I approve them? | 20:03 |
corvus | 2025-05-28 19:24:59.294500 | POST-RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/opendev/zuul-providers/playbooks/opendev-build-diskimage-base/post.yaml@master] | 20:06 |
corvus | clarkb: hrm... i feel like the builds are generally pretty reliable so i'd like to try to avoid dismissing them at this point | 20:07 |
corvus | that issue may be easier to debug with the no_log change merged | 20:08 |
corvus | where are we on that | 20:08 |
fungi | clarkb: yeah, looking | 20:08 |
corvus | https://review.opendev.org/948989 just clarkb and i so far | 20:08 |
clarkb | corvus: ya it timed out uploading to swift I think | 20:09 |
corvus | infra-root: https://review.opendev.org/948989 is a good change but if something goes wrong, we'll expose a password. i think the risk is small (there is no known path that would cause that) and we can rotate if needed, but i think it'd be good to get more reviews | 20:10 |
corvus | clarkb: let's give that a little time, and recheck in the interim? | 20:11 |
corvus | clarkb: (and with the extra info we'll get from that, we could ask our cloud friends if it indicates a problem) | 20:11 |
nhicher[m] | clarkb: FYI, it will be the same with 1.8.13.2, there is NO_WEAK_MODULES="true" in ./src/packaging/RedHat/openafs.spec.in. I will report upstream, thanks | 20:12 |
fungi | thanks for the update nhicher[m]! | 20:12 |
clarkb | nhicher[m]: ack | 20:12 |
clarkb | corvus: wfm | 20:12 |
clarkb | corvus: I wonder if it's a bit of a thundering herd problem with all the jobs finishing around the same time | 20:13 |
fungi | i approved the no_log change for zuul-providers | 20:13 |
corvus | yeah i was wondering the same thing | 20:13 |
corvus | fungi: thx! | 20:13 |
fungi | i guess let's just keep a close eye on the results so we can be on top of any exposure | 20:13 |
clarkb | yup. I'll be around to keep an eye on the afsmon update too | 20:14 |
clarkb | fungi: re openafs there shouldn't be any harm in updating the package (it will just fail on centos 9 anyway) so I think we can leave that change to run through ci and possibly merge as is | 20:15 |
fungi | agreed | 20:17 |
opendevreview | Merged opendev/project-config master: Add promote-image-build pipeline https://review.opendev.org/c/opendev/project-config/+/951148 | 20:38 |
Clark[m] | OVH notified me within 5 minutes of my irc bouncer dying that they are aware of the problem | 21:12 |
Clark[m] | so anyway I'm on matrix for a bit | 21:13 |
fungi | exciting! | 21:24 |
clarkb | ok that didn't last as long as I feared it would. Took advantage of the situation to do some updates too | 21:29 |
clarkb | looks like the netcat package update failed on some docker connection issues. But the afsmon venv looks like it should merge soonish | 21:29 |
clarkb | just one more job to go | 21:30 |
michelthebeau | Hi, has anyone noticed gerrit is not responding? | 21:45 |
JayF | there have been connectivity issues to it over v6 a couple of times already today; might want to see if using v4 works for now | 21:45 |
michelthebeau | v6, ok, I'll share that with others | 21:48 |
michelthebeau | Thanks | 21:48 |
fungi | i'm having trouble reaching it over both ipv4 and ipv6 at the moment | 21:48 |
clarkb | I can get it via ipv4 | 21:49 |
fungi | i also can't reach mirror.ca-ymq-1.vexxhost.opendev.org in the same region | 21:49 |
fungi | oh, login over ipv4 just got through | 21:49 |
clarkb | and syslog doesn't complain about cpu slowness either | 21:49 |
clarkb | timesyncd does complain about times out reaching ntp servers over ipv4 from the server | 21:50 |
clarkb | my hunch here is that this isn't really anything server specific but on the hosting side of things | 21:50 |
clarkb | and ping tests from a couple of locations seem to indicate it is v6 specific | 21:51 |
clarkb | ip addr doesn't report an extra or different ipv6 address either. So I don't think it is the rogue ra problem | 21:52 |
fungi | yeah, i guess we just keep hoping ricolin or guilhermesp or mnaser know what's happening with networking in that region | 21:52 |
mnaser | yes we're on it | 21:53 |
clarkb | ack thanks | 21:53 |
fungi | thanks!!! | 21:54 |
clarkb | the afsmon change did merge but I think gerritbot missed that event due to the network stuff | 21:55 |
fungi | status notice The Gerrit service on review.opendev.org is temporarily unreachable due to an ongoing issue in the hosting provider where it resides | 22:05 |
fungi | infra-root: ^ ? | 22:05 |
clarkb | that looks reasonable to me based on what we know | 22:06 |
fungi | #status notice The Gerrit service on review.opendev.org is temporarily unreachable due to an ongoing issue in the hosting provider where it resides | 22:06 |
opendevstatus | fungi: sending notice | 22:06 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org is temporarily unreachable due to an ongoing issue in the hosting provider where it resides | 22:06 |
fungi | that'll get it onto our status page and fosstodon too, since michelthebeau apparently checked those first before asking in here | 22:06 |
opendevstatus | fungi: finished sending notice | 22:09 |
mnaser | it should be coming back | 22:10 |
opendevreview | Clark Boylan proposed opendev/system-config master: Install python3-venv for venv creation on mirror-update https://review.opendev.org/c/opendev/system-config/+/951214 | 22:14 |
clarkb | mnaser: ^ yup seems to be working for me | 22:14 |
clarkb | fungi: ^ fyi python3-venv isn't installed and venv creation failed | 22:14 |
clarkb | I don't think this is really impacting anything except for the ansible runs for now (the cron job is using the old path still which is still valid) | 22:15 |
clarkb | We install python3-venv on our test node images because we rely on it for venvs there... that is why this gap exists | 22:15 |
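(The failure mode here is easy to reproduce on a stock Debian/Ubuntu server; a sketch, with the venv path as an assumption.)

```sh
# On Debian/Ubuntu the stdlib venv module needs the python3-venv
# package; without it, creation fails with an ensurepip error.
python3 -m venv /opt/afsmon    # fails: "ensurepip is not available"
sudo apt-get install -y python3-venv
python3 -m venv /opt/afsmon    # succeeds; /opt/afsmon is an assumed path
```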
corvus | clarkb: +2 to fix, but honestly that seems like something that would be good to migrate to a base role for all systems | 22:23 |
clarkb | corvus: ya that seems reasonable | 22:26 |
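(corvus's suggestion would amount to something like the task below in a base role applied to all hosts; a sketch only, since the actual role layout in system-config is not shown here.)

```yaml
# Hypothetical base-role task: make venv support a baseline guarantee
# on every Debian-family host instead of a per-service fix.
- name: Ensure python3-venv is installed
  ansible.builtin.package:
    name: python3-venv
    state: present
  when: ansible_os_family == "Debian"
```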
clarkb | that change failed on connectivity to ubuntu mirrors (not ours, the upstream ones) | 22:36 |
clarkb | I feel like the universe is suggesting I do something else for a bit. | 22:36 |
opendevreview | Merged opendev/zuul-providers master: Have zstd remove the source file after compression https://review.opendev.org/c/opendev/zuul-providers/+/951044 | 22:41 |
fungi | mmm, 951214 failed system-config-run-base and system-config-run-mirror-update | 23:44 |
fungi | oh, i see it's already rechecked | 23:45 |
opendevreview | Merged opendev/zuul-providers master: Remove no_log for image upload tasks https://review.opendev.org/c/opendev/zuul-providers/+/948989 | 23:45 |