fungi | as long as we also build our images from our mirrors, probably fine? though is epel handled similarly? | 00:14 |
Clark[m] | Our mirrors don't get the latest compose though right? | 00:16 |
ianw | this is fairly theoretical because i assume they don't want hundreds of ci machines pointing at the composes directly, so we'd have to figure out how to mirror it | 00:16 |
Clark[m] | I have new feedback for the CentOS folks now though :) they should push bugfixes through more quickly. Oh wait that's old feedback ;) | 00:16 |
ianw | and also if we fall behind, say dib can't build for a few days, then perhaps the compose falls out | 00:16 |
Clark[m] | Really though bugfix updates and feature updates seem to be treated the same here and it severely extends turnaround time on bugfixing | 00:17 |
ianw | yeah that is a good way of putting it | 00:17 |
Clark[m] | ianw: the other issue is if we update newer composes our CentOS won't match CentOS in the wild | 00:17 |
Clark[m] | And generally we try to represent what a user might see if they install $distro at home though don't always succeed at that | 00:18 |
ianw | that's true, although i guess the point of the whole thing (9-stream) is that rhel is probably what the "user" is supposed to see | 00:20 |
ianw | but for CI purposes, 9-stream seems to be what we want. but not a distro that takes several weeks to incorporate bug fixes :/ | 00:21 |
Clark[m] | I would argue we should use rocky if that is the goal. To me CentOS stream (and fedora) are more forward looking run software against new things | 00:21 |
Clark[m] | Because all these intermediate states don't feel super representative of rhel | 00:22 |
Clark[m] | But are good early indicators that something may be broken | 00:23 |
ianw | i think they need to reconsider the "we don't push every day to spare the mirrors" approach | 00:27 |
ianw | i think you can spare the mirrors, but be super duper sure that whatever you push has no bugs. like rhel level sure | 00:28 |
ianw | but if you're purposely being a front-testing distro, then you can't also be on a slow update cycle | 00:28 |
Clark[m] | ++ | 00:29 |
ianw | that seems like a better approach than us (me) tying ourselves (myself) in knots trying to somehow pull ephemeral composes | 00:29 |
ianw | i think i'll respond to that mail saying similar, also with clark's good point that updates != bugfixes | 00:30 |
opendevreview | Merged openstack/diskimage-builder master: Fix backward regex match https://review.opendev.org/c/openstack/diskimage-builder/+/846052 | 00:52 |
ianw | hopefully https://lists.centos.org/pipermail/centos-devel/2022-June/120426.html is a useful response, we'll see. i don't know what we can do to help | 01:18 |
ianw | not sure if everyone sees it, but why that archive page has decided to show the mails in a fixed-width panel that is narrower than a standard 80-column line, making my mail look weirdly wrapped, i'm not sure | 01:19 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Revert "CentOS 9-stream : work around selinux permissions issue" https://review.opendev.org/c/openstack/diskimage-builder/+/846078 | 01:24 |
*** ysandeep|out is now known as ysandeep | 03:45 | |
*** soniya29 is now known as soniya29|ruck | 04:37 | |
*** soniya29 is now known as soniya29|ruck | 05:29 | |
opendevreview | yatin proposed openstack/project-config master: [proposal-updates] Run ensure-python before git-review setup https://review.opendev.org/c/openstack/project-config/+/846100 | 06:09 |
ykarel | frickler, also needs ^ | 06:09 |
ykarel | for https://zuul.openstack.org/build/51b6e06164b3409ab2dd1e1a5573cba0/log/job-output.txt#452 | 06:10 |
ianw | nb03 has stopped working again | 06:17 |
ianw | 2022-06-15 15:22:24.867 | Build completed successfully | 06:18 |
ianw | that's openEuler-20-03-LTS-SP2-arm64-0000000533.log | 06:18 |
ianw | -rw-r--r-- 1 nodepool nodepool 1.6M Jun 15 15:22 openEuler-20-03-LTS-SP2-arm64-0000000533.log | 06:19 |
ianw | -rw-r--r-- 1 nodepool nodepool 1.3M Jun 16 00:51 centos-9-stream-arm64-0000000148.log | 06:19 |
ianw | the 9-stream build failed | 06:19 |
ianw | same error -- losetup: /opt/dib_tmp/dib_image.TYjoPhDC/image0.raw: failed to set up loop device: No such file or directory | 06:20 |
chkumar|rover | ianw: hello | 06:21 |
ianw | so, one possibility is the openEuler build left something behind that broke the next build, the other is that 9-stream is broken and then breaks everything after it | 06:21 |
chkumar|rover | on cs9 content provider jobs, we are seeing resource-agents-4.10.0-17.el9.x86_64: Cannot download, all mirrors were already tried without success mirror issue | 06:22 |
chkumar|rover | https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-content-provider | 06:22 |
chkumar|rover | it all started today | 06:22 |
ianw | chkumar|rover: it's probably no coincidence that someone was pushing 20220613.0 to the mirror today ... | 06:22 |
ianw | see the thread @ https://lists.centos.org/pipermail/centos-devel/2022-June/120426.html | 06:23 |
ianw | this is the other problem with 2-weekly mirror pushes; we get all sorts of problems jammed together | 06:23 |
chkumar|rover | ianw: checking, thanks! | 06:24 |
ianw | i'm just looking at the sync logs and HighAvailability/aarch64/os/Packages/resource-agents-4.10.0-17.el9.aarch64.rpm is there | 06:25 |
ianw | sorry, and x86 anyway | 06:25 |
ianw | https://static.opendev.org/mirror/logs/rsync-mirrors/centos-stream.log | 06:26 |
chkumar|rover | ianw: yes the package is there | 06:26 |
ianw | what log file is the error in? maybe it has something to do with enabling the ha repo? something now depends on it that didn't before? | 06:30 |
chkumar|rover | it is coming while building rabbitmq container https://1ca2ee7583d21788b1d8-42b9b3ca9891e58d539431fcfb5b799d.ssl.cf2.rackcdn.com/841114/2/check/tripleo-ci-centos-9-content-provider/547bca5/logs/undercloud/home/zuul/workspace/logs/container-builds/505f9cce-8572-4172-b2c7-470f0678749f/base/rabbitmq/rabbitmq-build.log | 06:32 |
ianw | resource-agents-4.10.0-17.el9.x86_64.rpm: Downloading successful, but checksum doesn't match. Calculated: b24e6a8a70066918658ffc390d3dc4e3f91eb19eead295db9f14b7af988d8796(sha256) Expected: 3a4a37810d503f5eefb6b114d3b9f65d82ede7b9983dd1b3d62f975f8081e94b(sha256) | 06:33 |
chkumar|rover | here is the ha repo we are using https://1ca2ee7583d21788b1d8-42b9b3ca9891e58d539431fcfb5b799d.ssl.cf2.rackcdn.com/841114/2/check/tripleo-ci-centos-9-content-provider/547bca5/logs/undercloud/etc/yum.repos.d/quickstart-centos-highavailability.repo | 06:33 |
ianw | well that's weird | 06:33 |
ianw | it's getting the file, but it doesn't think it's right | 06:33 |
ianw | $ sha256sum ./resource-agents-4.10.0-17.el9.x86_64.rpm | 06:36 |
ianw | 3a4a37810d503f5eefb6b114d3b9f65d82ede7b9983dd1b3d62f975f8081e94b ./resource-agents-4.10.0-17.el9.x86_64.rpm | 06:36 |
ianw | that's from upstream, so that seems right | 06:36 |
chkumar|rover | yes expected checksum is correct | 06:38 |
chkumar|rover | I can see the same issue in image build job https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ed0/846113/2/check/tripleo-buildimage-overcloud-full-centos-9/ed07c69/build.log | 06:39 |
ianw | curl -s http://mirror.facebook.net/centos-stream/9-stream/HighAvailability/x86_64/os/Packages/resource-agents-4.10.0-17.el9.x86_64.rpm | sha256sum | 06:41 |
ianw | b24e6a8a70066918658ffc390d3dc4e3f91eb19eead295db9f14b7af988d8796 - | 06:41 |
ianw | it seems we have mirrored what the upstream has given us ... | 06:41 |
ianw | $ rpm2cpio resource-agents-4.10.0-17.el9.x86_64.rpm | cpio -ti | 06:43 |
ianw | ... | 06:43 |
ianw | cpio: premature end of file | 06:43 |
ianw | unfortunately that file looks corrupt, but it's corrupt in the upstream mirror we sync from | 06:43 |
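The checks above can be strung together into one quick script; a minimal sketch, reusing the URL and expected sha256 quoted in the log, with everything else illustrative rather than taken from any actual tooling:

```bash
#!/bin/bash
# Verify a mirrored rpm three ways: repo-metadata checksum, rpm header digests,
# and readability of the embedded cpio payload.
url=http://mirror.facebook.net/centos-stream/9-stream/HighAvailability/x86_64/os/Packages/resource-agents-4.10.0-17.el9.x86_64.rpm
expected=3a4a37810d503f5eefb6b114d3b9f65d82ede7b9983dd1b3d62f975f8081e94b

curl -sLO "$url"
pkg=$(basename "$url")

# 1. compare the download against the checksum dnf expects from the repodata
actual=$(sha256sum "$pkg" | awk '{print $1}')
[ "$actual" = "$expected" ] && echo "checksum ok" || echo "checksum MISMATCH: $actual"

# 2. check the digests/signatures recorded in the rpm header itself
rpm -K "$pkg"

# 3. confirm the compressed cpio payload can be listed end to end
rpm2cpio "$pkg" | cpio -ti >/dev/null && echo "payload ok" || echo "payload CORRUPT"
```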
chkumar|rover | oh, let me open a bug on launchpad and on centos infra also | 06:44 |
ianw | well it's not really centos infra, it's facebook infra who provide mirror.facebook.net that we sync from | 06:44 |
ianw | but good luck talking to them :/ | 06:44 |
*** soniya is now known as soniya|ruck | 06:45 | |
chkumar|rover | yes, it seems to be the mirror issue | 06:47 |
chkumar|rover | ianw: Do we want to change the mirror now? | 06:47 |
chkumar|rover | I have no contacts for facebook mirror | 06:47 |
chkumar|rover | let me check other mirror from mirror manager | 06:49 |
ianw | to be clear -> https://paste.opendev.org/show/b68rv7RD3lkuV1RwpuB8/ | 06:49 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/centos-stream-mirror-update | 06:51 |
ianw | is the file that does this | 06:51 |
ianw | i don't think i'm going to have time to monitor a full sync from somewhere else | 06:52 |
ianw | what i could do is manually download that file and release the volume, and take the mirror lock so it doesn't get overwritten, until tomorrow | 06:52 |
ianw | that is assuming it is only that file that is corrupt | 06:52 |
chkumar|rover | there is one more | 06:52 |
chkumar|rover | ianw: sorry that is the only package | 06:53 |
chkumar|rover | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d7e/846081/1/check/tripleo-ci-centos-9-content-provider/d7e6218/logs/undercloud/home/zuul/workspace/logs/container-builds/f4d4c433-3e5c-4187-aa35-036708efd7fa/base/redis/redis-build.log | 06:53 |
chkumar|rover | resource-agents-4.10.0-17.el9.x86_64: Cannot download, all mirrors were already tried without success | 06:53 |
chkumar|rover | ianw: that will unblock our CI | 06:57 |
ianw | -rw-r--r-- 1 10022 root 502874 Jun 10 13:42 resource-agents-4.10.0-17.el9.x86_64.rpm | 06:57 |
ianw | -rw-r--r-- 1 lp root 502874 Jun 10 13:42 resource-agents-4.10.0-17.el9.x86_64.rpm.1 | 06:57 |
ianw | they're the same size | 06:57 |
chkumar|rover | sorry, does that mean we cannot replace the corrupted file? | 07:03 |
ianw | no i've replaced it now. but it's very weird they are the same size | 07:03 |
*** elodilles is now known as elodilles_pto | 07:04 | |
chkumar|rover | ianw: thank you thank you :-) ++++ I hope it clears the CI for today | 07:04 |
ianw | infra-root: tl;dr we found a corrupt file in the 9-stream mirror. i have replaced /afs/.openstack.org/mirror/centos-stream/9-stream/HighAvailability/x86_64/os/Packages/resource-agents-4.10.0-17.el9.x86_64.rpm.corrupt with the correct file from upstream | 07:05 |
ianw | i am holding the 9-stream mirror lock now in a root screen on mirror-update, to prevent it being overwritten. i'm out of time to investigate further. it's possible there's more corrupt files. https://paste.opendev.org/show/b68rv7RD3lkuV1RwpuB8/ is the relevant checksums | 07:06 |
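Roughly, the manual repair amounts to the following. This is a sketch only: the lock file path and AFS volume name are guesses (the real lock lives in the centos-stream-mirror-update tooling), while the upstream URL, AFS path and the .corrupt rename are the ones mentioned above.

```bash
# (in a root screen on mirror-update) hold the mirror update lock so the next
# sync run can't overwrite the fix; lock path here is an assumption
flock /var/run/centos-stream-mirror.lock bash

# then, from that shell: fetch the good rpm from the CentOS primary mirror,
# keep the bad copy aside as .corrupt, and drop the good one into AFS
dst=/afs/.openstack.org/mirror/centos-stream/9-stream/HighAvailability/x86_64/os/Packages
wget -q http://mirror.stream.centos.org/9-stream/HighAvailability/x86_64/os/Packages/resource-agents-4.10.0-17.el9.x86_64.rpm
mv "$dst/resource-agents-4.10.0-17.el9.x86_64.rpm" "$dst/resource-agents-4.10.0-17.el9.x86_64.rpm.corrupt"
cp resource-agents-4.10.0-17.el9.x86_64.rpm "$dst/"

# publish the read-write volume to the read-only replicas; volume name is a guess
vos release mirror.centos-stream
```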
ykarel | i think i have seen this in the past | 07:06 |
ykarel | it's likely related to signed vs unsigned rpms | 07:07 |
ykarel | pushed to mirrors | 07:07 |
ykarel | so #centos-devel atleast will confirm if it's temporary due to mirror push or some actual bad push | 07:08 |
ianw | ok, well the file we synced is http://mirror.iad.rax.opendev.org/centos-stream/9-stream/HighAvailability/x86_64/os/Packages/resource-agents-4.10.0-17.el9.x86_64.rpm.corrupt | 07:08 |
ykarel | rpm -K http://mirror.stream.centos.org/9-stream/HighAvailability/x86_64/os/Packages/resource-agents-4.10.0-17.el9.x86_64.rpm | 07:09 |
ianw | infra-root: i have also copied the rpms in question to /root/resource-agents-rpm, and there's also dump.diff which is a dump between them | 07:10 |
ianw | ykarel: you might be on to something -- https://paste.opendev.org/show/bKxa7j6Yi2lqzBOxc3Q0/ is where they start to differ in the file | 07:10 |
ianw | the left is the "corrupt" one -- it seems to start having what look like a bunch of hashes | 07:11 |
ianw | interesting, so many questions. the digests in the mirror version are not ok? and somehow the facebook mirror has synced a different version of the file (that we then synced)? or i guess somehow the upstream one has been replaced | 07:14 |
ykarel | hmm let me ask on centos-devel | 07:15 |
ianw | ok, i'm going to be afk for a few hours at a kids recital concert. i'll check back in later to see | 07:16 |
*** jpena|off is now known as jpena | 07:42 | |
chkumar|rover | ykarel: ianw https://bugs.launchpad.net/tripleo/+bug/1978929 | 08:08 |
chkumar|rover | ykarel: feel free to add the conversation from centos-devel there | 08:08 |
ykarel | chkumar|rover, ack | 08:37 |
*** rlandy|out is now known as rlandy | 10:32 | |
frickler | I'm confused that this is actually working already http://nl01.opendev.org/dib-image-list , in my local config I always need to define [webapp:port] in order for the launcher to start listening | 10:47 |
*** soniya is now known as soniya29|ruck | 11:00 | |
ianw | frickler: at one stage i had monitoring, and may have setup those end-points. it was a long time ago | 11:02 |
ianw | chkumar|rover: hrm, not sure i really follow the centos-devel conversation. not clear to me why there would be corrupt files out there. but anyway, seems like something is happening | 11:02 |
ianw | i've still got the lock on the centos-stream mirror volume, so it won't update. when we know it's good, any infra-root can just kill the screen session to release it (or i'll check on it tomorrow morning .au time) | 11:03 |
*** ysandeep is now known as ysandeep|afk | 11:08 | |
*** dviroel|afk is now known as dviroel | 11:18 | |
ykarel | ianw, it's fixed now | 11:36 |
*** soniya29 is now known as soniya29|ruck | 11:47 | |
*** soniya is now known as soniya29|ruck | 12:05 | |
fungi | i'll release the lock in that case so we pull the proper file(s) | 12:08 |
fungi | there is no longer a lock being held, so it should go back to refreshing on its usual schedule | 12:11 |
*** ysandeep|afk is now known as ysandeep | 12:18 | |
*** soniya is now known as soniya29|ruck | 12:39 | |
*** rlandy is now known as rlandy|rover | 13:03 | |
*** soniya is now known as soniya|ruck | 13:58 | |
*** ysandeep is now known as ysandeep|afk | 14:08 | |
*** soniya29 is now known as soniya29|ruck | 14:38 | |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add description to IRC channel reg example https://review.opendev.org/c/opendev/system-config/+/846182 | 14:56 |
*** ysandeep|afk is now known as ysandeep | 14:57 | |
ykarel | Clark[m], frickler can you check follow up one too https://review.opendev.org/c/openstack/project-config/+/846100 | 15:03 |
*** soniya29|ruck is now known as soniya29|out | 15:18 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Auto update nodepool launchers https://review.opendev.org/c/opendev/system-config/+/846186 | 15:25 |
clarkb | corvus: ^ fyi I think you were interested in that previously | 15:25 |
opendevreview | Merged openstack/project-config master: [proposal-updates] Run ensure-python before git-review setup https://review.opendev.org/c/openstack/project-config/+/846100 | 15:25 |
clarkb | now to work on a change to automate the zuul cluster reboot and update playbook. fungi were you still going to run that by hand again? | 15:25 |
fungi | yeah, i can start it as soon as the openstack tc meeting wraps up | 15:25 |
fungi | also want to find some time today to resume reconciling our stale snapshot of jitsi-meet configs with current upstream examples/defaults | 15:28 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run zuul cluster reboots and updates automatically https://review.opendev.org/c/opendev/system-config/+/846195 | 15:43 |
clarkb | I marked that WIP until we're happy with another manual run | 15:44 |
corvus | clarkb: thx +2. i'm going to try to check on where we are and look at making releases later | 15:45 |
fungi | i have this ready to run in a root screen session on bridge.o.o: | 15:45 |
fungi | time ansible-playbook -f20 /home/zuul/src/opendev.org/opendev/system-config/playbooks/zuul_reboot.yaml 2>&1 | tee zuul_reboot.log.20220616 | 15:45 |
clarkb | fungi: that lgtm | 15:45 |
fungi | running now | 15:46 |
clarkb | ianw: so I don't forget, do we have WIP changes up to switch the gerrit image (and mariadb image?) yet? | 15:52 |
clarkb | we probably want those to check clean and be ready for approval post upgrade | 15:52 |
*** ysandeep is now known as ysandeep|out | 15:58 | |
opendevreview | yatin proposed zuul/zuul-jobs master: [ensure-pip] Install interpreters in loop https://review.opendev.org/c/zuul/zuul-jobs/+/846201 | 16:38 |
ykarel | clarkb, fungi found ^ missing after https://review.opendev.org/c/openstack/project-config/+/846100 merged | 16:39 |
ykarel | https://zuul.opendev.org/t/openstack/build/6f2ddabee82b42d3928589b46c176c4f/log/job-output.txt#408 | 16:40 |
ykarel | please also merge this, hopefully it will fix those jobs | 16:41 |
*** jpena is now known as jpena|off | 16:44 | |
clarkb | ykarel: fungi when done together none are installed because apt errors in that case, but done in a loop we do them one by one and ignore errors? | 16:51 |
clarkb | why are we trying to install packages that will fail? | 16:51 |
ykarel | clarkb, yes exactly | 16:51 |
ykarel | me not aware about the history of why it's done | 16:51 |
clarkb | ykarel: there should be a linter rule that won't allow you to use {{item}} in a loop because zuul jobs roles are often nested under loops | 16:51 |
clarkb | I think we use zj_somename | 16:52 |
ykarel | clarkb, ok i can fix it | 16:52 |
ykarel | clarkb, tox-py27 broken there :( | 16:54 |
ykarel | https://zuul.opendev.org/t/zuul/builds?job_name=tox-py27&project=zuul/zuul-jobs | 16:54 |
ykarel | multiple platform related jobs too broken | 16:56 |
opendevreview | yatin proposed zuul/zuul-jobs master: [ensure-pip] Install interpreters in loop https://review.opendev.org/c/zuul/zuul-jobs/+/846201 | 16:59 |
clarkb | looks like the azure dep is no longer python27 safe | 16:59 |
ykarel | c7 fails with zuul requires Python '>=3.8' but the running Python is 3.6.8 | 16:59 |
ykarel | c8 too | 17:00 |
ykarel | same with debian-buster | 17:00 |
ykarel | also with opensuse | 17:00 |
ykarel | https://review.opendev.org/c/zuul/zuul/+/840992 broke these jobs except the tox-py27 one | 17:01 |
clarkb | hrm I thought we didn't install zuul anymore in these tests | 17:03 |
clarkb | but zuul doesn't do python3.6 anymore; that change merely makes that more explicit | 17:03 |
clarkb | aha this is specific to the ensure pip tests testing that it can install something using zuul | 17:05 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Fix two testing problems https://review.opendev.org/c/zuul/zuul-jobs/+/846206 | 17:10 |
clarkb | ykarel: ^ that may be sufficient to address these issues | 17:11 |
ykarel | clarkb, Thanks | 17:12 |
ykarel | clarkb, tox-py27 still failed | 17:23 |
ykarel | rest good | 17:23 |
ykarel | triggered by msrest 0.7.0 release https://pypi.org/project/msrest/#history | 17:24 |
ykarel | https://github.com/Azure/msrest-for-python/commit/c97db9df00c7802b1a1af3ab8dcdbf6a5acbbdd7 | 17:25 |
clarkb | oh its not the azure dep itself but a transitive dep. Let me update | 17:26 |
ykarel | yes | 17:26 |
corvus | infra-root: the log has the last nodepool launcher restart as being on 6.0.0, and ps times match. so i think we need a launcher restart before tagging a new release. do we want to land 846186 and let it do it? | 17:28 |
ykarel | k logs say msrest 0.7.0 requires azure-core>=1.24.0, but you'll have azure-core 1.21.1 which is incompatible. | 17:30 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Fix two testing problems https://review.opendev.org/c/zuul/zuul-jobs/+/846206 | 17:30 |
clarkb | corvus: I'm happy either way | 17:31 |
clarkb | corvus: if you want to go ahead and restart things by hand I'm sure there will be plenty of opportunity with future restarts to exercise my change | 17:31 |
clarkb | ykarel: I think the dep solver should address the version needs. As long as we cap off msrest for python2.7 | 17:32 |
opendevreview | yatin proposed zuul/zuul-jobs master: [ensure-pip] Install interpreters in loop https://review.opendev.org/c/zuul/zuul-jobs/+/846201 | 17:32 |
ykarel | clarkb, yes LGTM, i rebased my one on yours | 17:32 |
ykarel | Thanks, /me out now | 17:32 |
corvus | clarkb: i'm not in a hurry -- my slight preference would be for fungi/ianw to okay 846186 within a couple of hours and then let's see what happens. otherwise, i'll restart manually later in my afternoon. sound ok? | 17:36 |
clarkb | wfm | 17:37 |
fungi | sounds good, reviewing it now | 17:53 |
opendevreview | Jay Faulkner proposed openstack/diskimage-builder master: Add support for Python 3.10 https://review.opendev.org/c/openstack/diskimage-builder/+/844613 | 17:57 |
fungi | lgtm, i went ahead and approved it since it's really straightforward | 18:00 |
clarkb | heh I happened to catch the playbook while it was rebooting ze01 so it looked like maybe it was idle. I think it is fine just good/bad timing | 18:30 |
opendevreview | Merged opendev/system-config master: Auto update nodepool launchers https://review.opendev.org/c/opendev/system-config/+/846186 | 18:39 |
fungi | clarkb: you mean you stopped the executor, or you were manually running something and it complained that ze01 was unreachable? | 18:41 |
clarkb | fungi: I was just checking on the progress of the playbook and ze01 wasn't in components but I could ssh in | 18:41 |
clarkb | so then I checked the log and it said it was rebooting and sure enough I couldn't ssh in anymore | 18:42 |
fungi | ahh, yep | 18:42 |
clarkb | looks like the nodepool launchers all restarted while I was eating lunch | 20:08 |
fungi | good deal | 20:11 |
*** dviroel is now known as dviroel|biab | 21:07 | |
ianw | clarkb: https://review.opendev.org/c/opendev/system-config/+/844362 + 3 are the changes to switch images in system-config | 22:02 |
ianw | so the upstream fb mirror still has a borked resource-agents-4.10.0-17.el9.x86_64.rpm but we still have the right one, and fungi dropped the lock right? | 22:06 |
fungi | i did | 22:09 |
fungi | so maybe the rsync is failing, i haven't looked | 22:09 |
ianw | it has rsync'd since then. i guess maybe it's doing timestamp comparison and our .rpm doesn't get overwritten due to whatever rsync options | 22:09 |
fungi | ahh, yeah if it's not using the checksum method then that's probably it | 22:10 |
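For context, a minimal illustration of that rsync behaviour. This is not the command from centos-stream-mirror-update; the rsync module and destination path below are assumptions, and only the hosts/paths echo ones mentioned in the log.

```bash
# Default behaviour: rsync's "quick check" skips any file whose size and mtime
# match the source, so a replacement rpm with identical size and timestamp is
# never re-fetched even though its content changed.
rsync -rltvz rsync://mirror.facebook.net/centos-stream/9-stream/ \
      /afs/.openstack.org/mirror/centos-stream/9-stream/

# Adding --checksum (-c) compares content hashes instead, which would pick up
# the fixed file at the cost of reading every file on both ends;
# --ignore-times would re-transfer everything unconditionally.
rsync -rltvz --checksum rsync://mirror.facebook.net/centos-stream/9-stream/ \
      /afs/.openstack.org/mirror/centos-stream/9-stream/
```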
clarkb | ianw: +3 ? | 22:10 |
clarkb | oh its 844363 the child | 22:10 |
ianw | sorry, changes 844362 and 844363 | 22:11 |
ianw | https://pagure.io/centos-infra/issue/812 is what #centos-devel were talking about | 22:11 |
ianw | that links to a scsi possible corruption issue @ https://bugzilla.redhat.com/show_bug.cgi?id=1989717 that they apparently hit | 22:12 |
ianw | i wonder if some subset of mirrors have pulled randomly corrupt files, and for similar reasons that we're not overwriting, they're not overwriting it either | 22:13 |
ianw | the file *size* is exactly the same, except it has borked data at the end | 22:15 |
clarkb | I'm not seeing the link to the scsi bug? | 22:17 |
clarkb | oh wait its a hyperlink named BZ | 22:17 |
ianw | doing school run, but i should be able to quickly write something to go through the metalink results and poll each mirror and see if it's only fb out of sync, or some subset | 22:22 |
ianw | if it's only fb, then i guess it's "our" problem, in that we have a bad upstream | 22:22 |
ianw | if it's multiple, seems like an infra issue because they've pushed out bad data | 22:22 |
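A sketch of what such a check might look like (the metalink URL and repo name are assumptions, not taken from the paste linked below; the expected sha256 is the known-good value from earlier): pull the mirror list, fetch the same rpm from each mirror, and flag any copy whose digest differs.

```bash
#!/bin/bash
# Compare one package across every mirror returned by MirrorManager's metalink.
good=3a4a37810d503f5eefb6b114d3b9f65d82ede7b9983dd1b3d62f975f8081e94b
pkg=Packages/resource-agents-4.10.0-17.el9.x86_64.rpm

curl -s 'https://mirrors.centos.org/metalink?repo=centos-highavailability-9-stream&arch=x86_64' \
  | grep -oP 'https?://[^"<]+/os/' | sort -u \
  | while read -r base; do
      # hash the mirror's copy and compare against the known-good digest
      sum=$(curl -sfL "${base}${pkg}" | sha256sum | awk '{print $1}')
      [ "$sum" = "$good" ] && echo "ok   $base" || echo "BAD  $base ($sum)"
    done
```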
fungi | a "bad" upstream the centos-connected community members insisted was the best option because... facebook | 22:23 |
fungi | every time someone complains about a mirror inconsistency it seems like it's immediately followed by "shouldn't we switch to mirroring that from facebook?" | 22:24 |
ianw | well, the other one ran out of disk and was just in a broken state for an indeterminate amount of time | 22:58 |
fungi | the other-other-other one at any rate. we seem to go through upstream mirror sources like tissues | 22:58 |
ianw | i don't know if there's any insistence it's the best one. we were using kernel.org and something happened there too | 23:00 |
*** dviroel|biab is now known as dviroel | 23:07 | |
fungi | yeah | 23:10 |
fungi | well, also we're limited by the intersection of mirrors which allow rsync and mirrors which have the content we want | 23:11 |
ianw | it seems like fb is off the hook a bit here, because i see at least one other mirror with a bad file | 23:13 |
*** dviroel is now known as dviroel|out | 23:16 | |
fungi | so probably the primary mirror initially pushed the signed package which everyone mirrored, then later pushed the unsigned one with the same filename and size but generated a new index with the corresponding updated checksum | 23:17 |
fungi | facebook et cetera pulled the new index because it was a different length or different filename, but thought the package was the same and kept the original | 23:17 |
ianw | if anyone can run https://paste.opendev.org/show/bDuyvcSFgWUd6VL2TjeI/ from a different location that might be interesting | 23:18 |
ianw | i'm not sure it's signed v unsigned. the package we mirrored had a corrupt cpio inside it | 23:19 |
fungi | oh, okay for some reason i thought you said something about embedded hashes before | 23:21 |
ianw | i did :) the signed v unsigned thing was a theory | 23:22 |
ianw | https://paste.opendev.org/show/bKxa7j6Yi2lqzBOxc3Q0/ is where the data actually differs on the corrupt file (left) and correct one (right) | 23:23 |
ianw | it looks to me like a bunch of sha hashes separated by nulls | 23:23 |
ianw | https://paste.opendev.org/show/bfACXyFK1sL6CS8E5Ts4/ is the results of me pulling it from everything the metalink gives me | 23:25 |
ianw | a bunch of them show the corrupt file we have | 23:25 |
fungi | but yeah, if lots of mirrors have the old corrupt version, then probably the primary replaced it in such a way that ~nobody knows to pull the fixed one | 23:29 |
ianw | right, since it's the same size, probably the same thing our rsync is doing | 23:30 |
ianw | these tier 1 mirrors use mirrormanager or whatever, i don't know | 23:30 |
*** rlandy|rover is now known as rlandy|out | 23:36 | |
ianw | i've opened https://pagure.io/centos-infra/issue/814. i don't think there's much to do at this point but wait and see what upstream infra advises. this may not be the only corrupt file, so switching upstream mirrors might just give us the same problem on a different file | 23:48 |