Thursday, 2022-06-16

fungias long as we also build our images from our mirrors, probably fine? though is epel handled similarly?00:14
Clark[m]Our mirrors don't get the latest compose though right?00:16
ianwthis is fairly theoretical because i assume they don't want hundreds of ci machines pointing at the composes directly, so we'd have to figure out how to mirror it00:16
Clark[m]I have new feedback for the CentOS folks now though :) they should push bugfixes through more quickly. Oh wait that's old feedback ;)00:16
ianwand also if we fall behind, say dib can't build for a few days, then perhaps the compose falls out00:16
Clark[m]Really though bugfix updates and feature updates seem to be treated the same here and it severely extends turnaround time on bugfixing00:17
ianwyeah that is a good way of putting it00:17
Clark[m]ianw: the other issue is if we update newer composes our CentOS won't match CentOS in the wild00:17
Clark[m]And generally we try to represent what a user might see if they install $distro at home though don't always succeed at that00:18
ianwthat's true, although i guess the point of the whole thing (9-stream) is that rhel is probably what the "user" is supposed to see00:20
ianwbut for CI purposes, 9-stream seems to be what we want.  but not a distro that takes several weeks to incorporate bug fixes :/00:21
Clark[m]I would argue we should use rocky if that is the goal. To me CentOS stream (and fedora) are more forward looking run software against new things00:21
Clark[m]Because all these intermediate states don't feel super representative of rhel00:22
Clark[m]But are good early indicators that something may be broken00:23
ianwi think they need to reconsider the "we don't push every day to spare the mirrors" approach00:27
ianwi think you can spare the mirrors, but be super duper sure that whatever you push has no bugs.  like rhel level sure00:28
ianwbut if you're purposely being a front-testing distro, then you can't also be on a slow update cycle00:28
ianwthat seems like a better approach than us (me) tying ourselves (myself) in knots trying to somehow pull ephemeral composes00:29
ianwi think i'll respond to that mail saying similar, also with clark's good point that updates != bugfixes00:30
opendevreviewMerged openstack/diskimage-builder master: Fix backward regex match
ianwhopefully is a useful response, we'll see.  i don't know what we can do to help 01:18
ianwnot sure if everyone sees it, but why that archive page has decided to show the mails in a fixed-width panel that is < than a standard 80-line, making my mail look weirdly wrapped, i'm not sure01:19
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Revert "CentOS 9-stream : work around selinux permissions issue"
*** ysandeep|out is now known as ysandeep03:45
*** soniya29 is now known as soniya29|ruck04:37
*** soniya29 is now known as soniya29|ruck05:29
opendevreviewyatin proposed openstack/project-config master: [proposal-updates] Run ensure-python before git-review setup
ykarelfrickler, also needs ^06:09
ianwnb03 has stopped working again06:17
ianw2022-06-15 15:22:24.867 | Build completed successfully06:18
ianwthat's openEuler-20-03-LTS-SP2-arm64-0000000533.log06:18
ianw-rw-r--r-- 1 nodepool nodepool 1.6M Jun 15 15:22 openEuler-20-03-LTS-SP2-arm64-0000000533.log06:19
ianw-rw-r--r-- 1 nodepool nodepool 1.3M Jun 16 00:51 centos-9-stream-arm64-0000000148.log06:19
ianwthe 9-stream build failed06:19
ianwsame error -- losetup: /opt/dib_tmp/dib_image.TYjoPhDC/image0.raw: failed to set up loop device: No such file or directory06:20
chkumar|roverianw: hello06:21
ianwso, one possibility is that the openEuler build left something behind that broke the next build, or the other is that 9-stream is broken, and then breaks everything after it06:21
chkumar|roveron cs9 content provider jobs, we are seeing resource-agents-4.10.0-17.el9.x86_64: Cannot download, all mirrors were already tried without success mirror issue06:22
chkumar|roverit all started today06:22
ianwchkumar|rover: it's probably no coincidence that someone was pushing 20220613.0 to the mirror today ...06:22
ianwsee the thread @
ianwthis is the other problem with 2-weekly mirror pushes; we get all sorts of problems jammed together06:23
chkumar|roverianw: checking, thanks!06:24
ianwi'm just looking at the sync logs and HighAvailability/aarch64/os/Packages/resource-agents-4.10.0-17.el9.aarch64.rpm is there06:25
ianwsorry, and x86 anyway06:25
chkumar|roverianw: yes the package is there06:26
ianwwhat log file is the error in?  maybe it has something to do with enabling the ha repo?  something now depends on it that didn't before?06:30
chkumar|roverit is coming while building rabbitmq container
ianwresource-agents-4.10.0-17.el9.x86_64.rpm: Downloading successful, but checksum doesn't match. Calculated: b24e6a8a70066918658ffc390d3dc4e3f91eb19eead295db9f14b7af988d8796(sha256)  Expected: 3a4a37810d503f5eefb6b114d3b9f65d82ede7b9983dd1b3d62f975f8081e94b(sha256) 06:33
chkumar|roverhere is the ha repo we are using
ianwwell that's weird06:33
ianwit's getting the file, but it doesn't think it's right06:33
ianw$ sha256sum ./resource-agents-4.10.0-17.el9.x86_64.rpm 06:36
ianw3a4a37810d503f5eefb6b114d3b9f65d82ede7b9983dd1b3d62f975f8081e94b  ./resource-agents-4.10.0-17.el9.x86_64.rpm06:36
ianwthat's from upstream, so that seems right06:36
chkumar|roveryes expected checksum is correct06:38
chkumar|roverI can see the same issue in image build job
ianwcurl -s  | sha256sum06:41
ianwb24e6a8a70066918658ffc390d3dc4e3f91eb19eead295db9f14b7af988d8796  -06:41
ianwit seems we have mirrored what the upstream has given us ...06:41
ianw$ rpm2cpio  resource-agents-4.10.0-17.el9.x86_64.rpm | cpio -ti06:43
ianwcpio: premature end of file06:43
ianwunfortunately that file looks corrupt, but it's corrupt in the upstream mirror we sync from06:43
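The check above (sha256sum on the downloaded rpm, compared against the digest the repo metadata expects) can be sketched in Python. This is a minimal illustration only; the function names sha256_of_file and matches_expected are made up for this sketch and are not part of any tool mentioned in the channel.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large rpms are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's SHA-256 equals the expected hex digest (hypothetical helper)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

With the values quoted above, matches_expected("./resource-agents-4.10.0-17.el9.x86_64.rpm", "3a4a37810d503f5eefb6b114d3b9f65d82ede7b9983dd1b3d62f975f8081e94b") would be expected to hold for the upstream copy and fail for the mirrored one.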
chkumar|roveroh, let me open a bug on launchpad and on centos infra also06:44
ianwwell it's not really centos infra, it's facebook infra who provide that we sync from06:44
ianwbut good luck talking to them :/06:44
*** soniya is now known as soniya|ruck06:45
chkumar|roveryes, it seems to be the mirror issue06:47
chkumar|roverianw: Do we want to change the mirror now?06:47
chkumar|roverI have no contacts for facebook mirror06:47
chkumar|roverlet me check other mirror from mirror manager06:49
ianwto be clear ->
ianwis the file that does this06:51
ianwi don't think i'm going to have time to monitor a full sync from somewhere else06:52
ianwwhat i could do is manually download that file and release the volume, and take the mirror lock so it doesn't get overwritten, until tomorrow06:52
ianwthat is assuming it is only that file that is corrupt06:52
chkumar|roverthere is one more06:52
chkumar|roverianw: sorry that is the only package06:53
chkumar|rover  resource-agents-4.10.0-17.el9.x86_64: Cannot download, all mirrors were already tried without success06:53
chkumar|roverianw: that will unblock our CI06:57
ianw-rw-r--r-- 1 10022 root   502874 Jun 10 13:42 resource-agents-4.10.0-17.el9.x86_64.rpm06:57
ianw-rw-r--r-- 1 lp    root   502874 Jun 10 13:42 resource-agents-4.10.0-17.el9.x86_64.rpm.106:57
ianwthey're the same size06:57
chkumar|roversorry it means we cannot replace the corrupted file.07:03
ianwno i've replaced it now.  but it's very weird they are the same size07:03
*** elodilles is now known as elodilles_pto07:04
chkumar|roverianw: thank you thank you :-) ++++ I hope it clears the CI for today07:04
ianwinfra-root: tl;dr we found a corrupt file in the 9-stream mirror.  i have replaced /afs/ with the correct file from upstream07:05
ianwi am holding the 9-stream mirror lock now in a root screen on mirror-update, to prevent it being overwritten.  i'm out of time to investigate further.  it's possible there's more corrupt files. is the relevant checksums07:06
ykareli think i have seen this in past07:06
ykarelit's likely related to signed vs unsigned rpms07:07
ykarelpushed to mirrors07:07
ykarelso #centos-devel at least will confirm if it's temporary due to mirror push or some actual bad push07:08
ianwok, well the file we synced is
ykarelrpm -K
ianwinfra-root: i have also copied the rpms in question to /root/resource-agents-rpm, and there's also dump.diff which is a dump between them07:10
ianwykarel: you might be on to something -- is where they start to differ in the file07:10
ianwthe left is the "corrupt" one -- it seems to start having what look like a bunch of hashes 07:11
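Since the two rpms are the same size, the useful question is where they first diverge. A rough Python sketch of that comparison follows; first_difference is a hypothetical helper written for this note, not the tool that produced dump.diff.

```python
def first_difference(path_a, path_b, chunk_size=1 << 16):
    """Return the byte offset where two files first differ, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Walk the mismatching chunk to find the exact byte.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file is a prefix of the other.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(chunk_a)
```

Run against the two copies kept in /root/resource-agents-rpm, this would report the offset at which the "corrupt" file starts to deviate from the upstream one.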
ianwinteresting, so many questions.  the digests in the mirror version are not ok?  and somehow the facebook mirror has synced a different version of the file (that we then synced)?  or i guess somehow the upstream one has been replaced07:14
ykarelhmm let me ask on centos-devel07:15
ianwok, i'm going to be afk for a few hours at a kids recital concert.  i'll check back in later to see07:16
*** jpena|off is now known as jpena07:42
chkumar|roverykarel: ianw
chkumar|roverykarel: feel free to add the conversation from centos-devel there08:08
ykarelchkumar|rover, ack08:37
*** rlandy|out is now known as rlandy10:32
fricklerI'm confused that this is actually working already , in my local config I always need to define [webapp:port] in order for the launcher to start listening10:47
*** soniya is now known as soniya29|ruck11:00
ianwfrickler: at one stage i had monitoring, and may have setup those end-points. it was a long time ago11:02
ianwchkumar|rover: hrm, not sure i really follow the centos-devel conversation.  not clear to me why there would be corrupt files out there.  but anyway, seems like something is happening anyway11:02
ianwi've still got the lock on the centos-stream mirror volume, so it won't update.  when we know it's good, any infra-root can just kill the screen session to release it (or i'll check on it tomorrow morning .au time)11:03
*** ysandeep is now known as ysandeep|afk11:08
*** dviroel|afk is now known as dviroel11:18
ykarelianw, it's fixed now11:36
*** soniya29 is now known as soniya29|ruck11:47
*** soniya is now known as soniya29|ruck12:05
fungii'll release the lock in that case so we pull the proper file(s)12:08
fungithere is no longer a lock being held, so it should go back to refreshing on its usual schedule12:11
*** ysandeep|afk is now known as ysandeep12:18
*** soniya is now known as soniya29|ruck12:39
*** rlandy is now known as rlandy|rover13:03
*** soniya is now known as soniya|ruck13:58
*** ysandeep is now known as ysandeep|afk14:08
*** soniya29 is now known as soniya29|ruck14:38
opendevreviewJeremy Stanley proposed opendev/system-config master: Add description to IRC channel reg example
*** ysandeep|afk is now known as ysandeep14:57
ykarelClark[m], frickler can you check follow up one too
*** soniya29|ruck is now known as soniya29|out15:18
opendevreviewClark Boylan proposed opendev/system-config master: Auto update nodepool launchers
clarkbcorvus: ^ fyi I think you were interested in that previously15:25
opendevreviewMerged openstack/project-config master: [proposal-updates] Run ensure-python before git-review setup
clarkbnow to work on a change to automate the zuul cluster reboot and update playbook. fungi were you still going to run that by hand again?15:25
fungiyeah, i can start it as soon as the openstack tc meeting wraps up15:25
fungialso want to find some time today to resume reconciling our stale snapshot of jitsi-meet configs with current upstream examples/defaults15:28
opendevreviewClark Boylan proposed opendev/system-config master: Run zuul cluster reboots and updates automatically
clarkbI marked that WIP until we're happy with another manual run15:44
corvusclarkb: thx +2.  i'm going to try to check on where we are and look at making releases later15:45
fungii have this ready to run in a root screen session on bridge.o.o:15:45
fungitime ansible-playbook -f20 /home/zuul/src/ 2>&1 | tee zuul_reboot.log.2022061615:45
clarkbfungi: that lgtm15:45
fungirunning now15:46
clarkbianw: so I don't forget, do we have WIP changes up to switch the gerrit image (and mariadb image?) yet?15:52
clarkbwe probably want those to check clean and be ready for approval post upgrade15:52
*** ysandeep is now known as ysandeep|out15:58
opendevreviewyatin proposed zuul/zuul-jobs master: [ensure-pip] Install interpreters in loop
ykarelclarkb, fungi found ^ missing after merged16:39
ykarelplease also merge this, hopefully it will fix those jobs16:41
*** jpena is now known as jpena|off16:44
clarkbykarel: fungi when done together none are installed because apt errors in that case but done in a loop we do them one by one and ignore errors?16:51
clarkbwhy are we trying to install packages that will fail?16:51
ykarelclarkb, yes exactly16:51
ykarelme not aware about the history of why it's done16:51
clarkbykarel: there should be a linter rule that won't allow you to use {{item}} in a loop because zuul jobs roles are often nested under loops16:51
clarkbI think we use zj_somename16:52
ykarelclarkb, ok i can fix it16:52
ykarelclarkb, tox-py27 broken there :(16:54
ykarelmultiple platform related jobs too broken16:56
opendevreviewyatin proposed zuul/zuul-jobs master: [ensure-pip] Install interpreters in loop
clarkblooks like the azure dep is no longer python27 safe16:59
ykarelc7 fails with zuul requires Python '>=3.8' but the running Python is 3.6.816:59
ykarelc8 too17:00
ykarelsame with debian-buster17:00
ykarelalso with opensuse17:00
ykarel broke these jobs except tox-py27 one17:01
clarkbhrm I thought we didn't install zuul anymore in these tests17:03
clarkbbut zuul doesn't do python3.6 anymore that change merely makes that more explicit17:03
clarkbaha this is specific to the ensure pip tests testing that it can install something using zuul17:05
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Fix two testing problems
clarkbykarel: ^ that may be sufficient to address these issues17:11
ykarelclarkb, Thanks17:12
ykarelclarkb, tox-py27 still failed17:23
ykarelrest good17:23
ykareltriggered by msrest 0.7.0 release
clarkboh its not the azure dep itself but a transitive dep. Let me update17:26
corvusinfra-root: the log has the last nodepool launcher restart as being on 6.0.0, and ps times match.  so i think we need a launcher restart before tagging a new release.  do we want to land 846186 and let it do it?17:28
ykarelk logs say it msrest 0.7.0 requires azure-core>=1.24.0, but you'll have azure-core 1.21.1 which is incompatible.17:30
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Fix two testing problems
clarkbcorvus: I'm happy either way17:31
clarkbcorvus: if you want to go ahead and restart things by hand I'm sure there will be plenty of opportunity for future restarts to exercise my change17:31
clarkbykarel: I think the dep solver should address the version needs. As long as we cap off msrest for python2.717:32
opendevreviewyatin proposed zuul/zuul-jobs master: [ensure-pip] Install interpreters in loop
ykarelclarkb, yes LGTM, i rebased my one on yours17:32
ykarelThanks, /me out now17:32
corvusclarkb: i'm not in a hurry -- my slight preference would be for fungi/ianw to okay 846186 within a couple of hours and then let's see what happens.  otherwise, i'll restart manually later my afternoon.  sound ok?17:36
fungisounds good, reviewing it now17:53
opendevreviewJay Faulkner proposed openstack/diskimage-builder master: Add support for Python 3.10
fungilgtm, i went ahead and approved it since it's really straightforward18:00
clarkbheh I happened to catch the playbook while it was rebooting ze01 so it looked like maybe it was idle. I think it is fine just good/bad timing18:30
opendevreviewMerged opendev/system-config master: Auto update nodepool launchers
fungiclarkb: you mean you stopped the executor, or you were manually running something and it complained that ze01 was unreachable?18:41
clarkbfungi: I was just checking on the progress of the playbook and ze01 wasn't in components but I could ssh in18:41
clarkbso then I checked the log and it said it was rebooting and sure enough I couldn't ssh in anymore18:42
fungiahh, yep18:42
clarkblooks like the nodepool launchers all restarted while I was eating lunch20:08
fungigood deal20:11
*** dviroel is now known as dviroel|biab21:07
ianwclarkb: + 3 are the changes to switch images in system-config22:02
ianwso the upstream fb mirror still has a borked resource-agents-4.10.0-17.el9.x86_64.rpm but we still have the right one, and fungi dropped the lock right?22:06
fungii did22:09
fungiso maybe the rsync is failing, i haven't looked22:09
ianwit has rsync'd since then.  i guess maybe it's doing timestamp comparison and our .rpm doesn't get overwritten due to whatever rsync options22:09
fungiahh, yeah if it's not using the checksum method then that's probably it22:10
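For context, rsync's default "quick check" skips a file when size and modification time match on both sides, and only compares content when run with --checksum. The sketch below models that decision in Python to show why a same-size, same-timestamp file would be left alone. It is a simplified model, not rsync's actual code, and would_transfer is a name invented for this illustration.

```python
import hashlib
import os


def _sha256(path):
    """Hex SHA-256 of a whole file (fine for a sketch; reads it in one go)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def would_transfer(src, dst, checksum=False):
    """Model of the skip decision: True means the file would be copied again."""
    if not os.path.exists(dst):
        return True
    src_stat, dst_stat = os.stat(src), os.stat(dst)
    if src_stat.st_size != dst_stat.st_size:
        return True
    if checksum:
        # Checksum mode: content decides, timestamps are ignored.
        return _sha256(src) != _sha256(dst)
    # Quick check: same size and same mtime means "unchanged".
    return int(src_stat.st_mtime) != int(dst_stat.st_mtime)
```

Under this model, two copies of the rpm with identical size and mtime but different bytes are only re-synced when checksum=True.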
clarkbianw: +3 ?22:10
clarkboh its 844363 the child22:10
ianwsorry, changes 844362 and 84436322:11
ianw is what #centos-devel were talking about22:11
ianwthat links to a scsi possible corruption issue @ that they apparently hit22:12
ianwi wonder if some subset of mirrors have pulled randomly corrupt files, and for similar reasons that we're not overwriting, they're not overwriting it either22:13
ianwthe file *size* is exactly the same, except it has borked data at the end22:15
clarkbI'm not seeing the link to the scsi bug?22:17
clarkboh wait its a hyperlink named BZ22:17
ianwdoing school run, but i should be able to quickly write something to go through the metalink results and poll each mirror and see if it's only fb out of sync, or some subset22:22
ianwif it's only fb, then i guess it's "our" problem, in that we have a bad upstream22:22
ianwif it's multiple, seems like an infra issue because they've pushed out bad data22:22
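One way to do that survey is to fetch the same rpm from each mirror URL and group the mirrors by the SHA-256 of what they serve. The following is only a sketch under assumptions: the list of URLs would come from the metalink (parsing it is not shown), and fetch_bytes and group_mirrors_by_digest are hypothetical names, not the script ianw actually wrote.

```python
import hashlib
from collections import defaultdict
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download a URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def group_mirrors_by_digest(urls, fetch=fetch_bytes):
    """Map each SHA-256 digest (or error string) to the mirror URLs that produced it."""
    groups = defaultdict(list)
    for url in urls:
        try:
            digest = hashlib.sha256(fetch(url)).hexdigest()
        except OSError as exc:
            # urllib's URLError subclasses OSError, so unreachable mirrors land here.
            digest = "error: %s" % exc
        groups[digest].append(url)
    return dict(groups)
```

If every URL ends up under one digest the mirrors agree; a second digest shared by several mirrors would point at bad data having been pushed from further upstream rather than at a single mirror.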
fungia "bad" upstream the centos-connected community members insisted was the best option because... facebook22:23
fungievery time someone complains about a mirror inconsistency it seems like it's immediately followed by "shouldn't we switch to mirroring that from facebook?"22:24
ianwwell, the other one ran out of disk and was just in a broken state for an indeterminate amount of time22:58
fungithe other-other-other one at any rate. we seem to go through upstream mirror sources like tissues22:58
ianwi don't know if there's any insistence it's the best one.  we were using and something happened there too23:00
*** dviroel|biab is now known as dviroel23:07
fungiwell, also we're limited by the intersection of mirrors which allow rsync and mirrors which have the content we want23:11
ianwit seems like fb is off the hook a bit here, because i see at least one other mirror with a bad file23:13
*** dviroel is now known as dviroel|out23:16
fungiso probably the primary mirror initially pushed the signed package which everyone mirrored, then later pushed the unsigned one with the same filename and size but generated a new index with the corresponding updated checksum23:17
fungifacebook et cetera pulled the new index because it was a different length or different filename, but thought the package was the same and kept the original23:17
ianwif anyone can run from a different location that might be interesting23:18
ianwi'm not sure it's signed v unsigned.  the package we mirrored had a corrupt cpio inside it23:19
fungioh, okay for some reason i thought you said something about embedded hashes before23:21
ianwi did :)  the signed v unsigned thing was a theory23:22
ianw is where the data actually differs on the corrupt file (left) and correct one (right)23:23
ianwit looks to me like a bunch of sha hashes separated by nulls23:23
ianw is the results of me pulling it from everything the metalink gives me23:25
ianwa bunch of them show the corrupt file we have23:25
fungibut yeah, if lots of mirrors have the old corrupt version, then probably the primary replaced it in such a way that ~nobody knows to pull the fixed one23:29
ianwright, since it's the same size, probably the same thing our rsync is doing23:30
ianwthese tier 1 mirrors use mirrormanager or whatever, i don't know23:30
*** rlandy|rover is now known as rlandy|out23:36
ianwi've opened  i don't think there's much to do at this point but wait and see what upstream infra advises.  this may not be the only corrupt file, so switching upstream mirrors might just give us the same problem on a different file23:48
