Thursday, 2024-05-16

liushy#join openstack-neutron02:40
*** benj_7 is now known as benj_07:55
*** sfinucan is now known as stephenfin11:35
*** sfinucan is now known as stephenfin14:00
clarkbfungi: I've been reminded that we have a contractor out this morning to write up a proposal for deck replacement (not my idea of fun). So not sure I'll be able to watch noble things for a bit. But before that happens I should be able to bump quotas and start a root screen on mirror-update and hold the lock if we think that is appropriate15:05
fungii can handle those if you prefer15:06
fungiwhat quota would you want to start out at?15:06
fungior i can look at jammy and guess15:06
clarkbfungi: well I think the 4 releases are all in the one volume right now so take current usage divide by 4 and then add some buffer?15:07
fungiroot screen session is started on mirror-update now15:07
* tonyb has attached to the screen15:08
fungiwe've got ubuntu and ubuntu-ports in separate volumes, fwiw15:08
clarkblooks like we are right at quota for ubuntu15:08
clarkbfungi: ya this change only touches ubuntu to avoid thrashing afs across multiple volumes updating at the same time15:09
fungiand yeah, we still have content on the mirror.ubuntu volume for bionic, focal, jammy and xenial15:09
clarkbHopefully we haven't broken reprepro already for existing releases. Maybe the process should be bump to 850 + (850/4) + headroom == ~120GB15:10
fungiodds are noble will be bigger than xenial due to overall accumulation15:10
clarkband then do a sync before we run with noble15:10
clarkbthen do noble. Mostly I'm worried about the issues that frickler had to repair semi recently when reprepro hits quota limits15:10
fungii'll check logs15:10
clarkbso want to make sure that is sorted before we add noble to the mix15:10
clarkber my math should've been 1200GB | 1.2TB not 120GB15:11
fungimmm, yeah https://static.opendev.org/mirror/ubuntu/timestamp.txt says 2024-05-13T17:02:49,981032177+00:00 so we probably filled it several days ago15:11
clarkbthats still well under the 2TB limit for openafs15:11
clarkbdo we recall what the problems were with recovering from that? I want to say reprepro basically acted like it had synced stuff even though it hadn't15:12
fungi$ fs listquota /afs/.openstack.org/mirror/ubuntu15:12
fungiVolume Name                    Quota       Used %Used   Partition15:12
fungimirror.ubuntu              850000000  850000977  100%<<       17%    <<WARNING15:12
fungi:(15:12
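Editor's sketch: the `fs listquota` row above can be checked mechanically. This is a minimal parser, assuming the whitespace-separated column layout shown in the paste (volume name, quota, used, then percentage columns) and that the values are in kilobyte blocks; the function name is hypothetical.

```python
def parse_listquota_line(line):
    """Parse one data row of `fs listquota` output into a dict."""
    fields = line.split()
    name, quota_kb, used_kb = fields[0], int(fields[1]), int(fields[2])
    # A quota of 0 means "no limit" in AFS, so guard the division.
    limited = quota_kb > 0
    return {
        "volume": name,
        "quota_kb": quota_kb,
        "used_kb": used_kb,
        "percent_used": 100.0 * used_kb / quota_kb if limited else 0.0,
        "over_quota": limited and used_kb >= quota_kb,
    }


row = "mirror.ubuntu              850000000  850000977  100%<<       17%    <<WARNING"
info = parse_listquota_line(row)
print(info["volume"], info["used_kb"] - info["quota_kb"], info["over_quota"])
```

For the row pasted above this reports the volume as over quota by a few hundred blocks, which matches the `<<WARNING` marker.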
fungii don't recall, but might be able to find it in channel history15:12
clarkbya so step 0 here is likely going to be addressing that. I think we can bump the quota up to what we expect to need for noble too at the same time15:13
clarkbkinda makes me wish deb repos used rsync. Despite its problems, rsync is really smart about redownloading stuff if necessary15:14
clarkblooks like ports is getting close too. So maybe go ahead and bump that one as well. I think ports could go to 850GB15:15
fungihttps://meetings.opendev.org/irclogs/%23opendev/%23opendev.2024-03-11.log.html15:16
fungii think that was it?15:16
fungiyeah, involved telling reprepro to "forget" the truncated files we deleted15:17
clarkbya that looks right and apparently we have docs about this. So ya I would do quota bumps for both volumes. Then sync and see what fails then tell reprepro to forget about those files15:17
clarkbthen when we have reached steady state again we can add noble into the mix15:18
fungiso increase quota for mirror.ubuntu from 850000000 to 120000000015:19
clarkbit isn't clear to me how we identify all those files though it is implied reprepro itself will complain about them15:19
clarkbfungi: ya I think so. That should be sufficient headroom for both catching up and adding noble15:19
fungilooks like ubuntu-ports isn't quite full, just very very close15:20
clarkbyup its a few GB away. But since we're likely to add noble to it soon bump it up too makes sense (I don't think we need to do the "recovery" task on it though)15:20
fungiincrease mirror.ubuntu-ports quota from 600000000 to... 800000000? higher?15:21
fungi5/4 is 750000000 so we still want some breathing room beyond that15:21
clarkbfungi: I suggested 850 above. I think its probably better to overestimate when adding new releases then we can reduce to a saner level once we have a clear picture of actual need15:22
clarkbin order to avoid doing the recovery process multiple times15:22
fungiaha, missed you had come up with a suggestion for that one too. yeah 850000000 then15:23
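Editor's sketch: the quota arithmetic discussed above (current size, plus one more release's share, plus headroom) can be written out as follows. It assumes each volume currently holds four releases of roughly equal size and treats the current quota as the current usage, since both volumes were at or near 100%; the function names are hypothetical.

```python
def baseline_quota_kb(current_kb, current_releases=4, added_releases=1):
    """Scale the current size by (current + added) / current releases."""
    return current_kb * (current_releases + added_releases) // current_releases


def headroom_percent(chosen_kb, baseline_kb):
    """How much slack a chosen quota leaves above the baseline estimate."""
    return 100.0 * (chosen_kb - baseline_kb) / baseline_kb


ubuntu_base = baseline_quota_kb(850000000)   # 5/4 of the old mirror.ubuntu quota
ports_base = baseline_quota_kb(600000000)    # 5/4 of the old mirror.ubuntu-ports quota

print(ubuntu_base, round(headroom_percent(1200000000, ubuntu_base), 1))
print(ports_base, round(headroom_percent(850000000, ports_base), 1))
```

Both chosen values (1200000000 and 850000000) leave a bit over 10% above the 5/4 baseline, which is the "overestimate now, shrink later" approach clarkb describes.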
fungiokay, quotas increased, both volumes are around 70% used now, and i'm rerunning reprepro-mirror-update for that volume15:30
clarkbhopefully with the lock held :)15:34
clarkbsorry I haven't managed to jump into the screen to see yet15:34
tonybYup flock -n ......15:34
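Editor's sketch: `flock -n` takes an exclusive lock without blocking and fails immediately if something else holds it. The same behaviour can be illustrated in Python with `fcntl.flock`; the lock file path below is a throwaway temporary file, not the real lock used on mirror-update, and `try_lock` is a hypothetical helper.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is busy."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes this fail straight away instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
first = try_lock(lock_path)    # acquires the lock
second = try_lock(lock_path)   # refused while `first` is still open
first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # succeeds again
print(first is not None, second is None, third is not None)
third.close()
```

Holding the lock from a manual session in this way is what keeps a scheduled run from starting a second mirror update on the same volume.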
tonybclarkb, fungi FWIW I added a second window in the screen to tail the logfile15:36
fungithanks, if i'd been thinking i would have done `2>&1|tee /var/log/reprepro/ubuntu.log` instead of `>> /var/log/reprepro/ubuntu.log 2>&1`15:36
tonybNo problem.15:37
fungihrm, those errors are unexpected15:40
tonybYes they are15:41
clarkbis it possible those are the corrupted package files?15:42
fungihttps://meetings.opendev.org/irclogs/%23opendev/%23opendev.2022-04-26.log.html#t2022-04-26T11:18:0815:42
clarkbheh ok so maybe we can ignore them for now15:42
clarkband then treat those gkeop packages as corrupt? (we're unlikely to ever use them since we aren't using google cloud)15:43
tonybOkay what about the "Cannot retrieve long description for package" errors?15:43
fungihttps://meetings.opendev.org/irclogs/%23opendev/%23opendev.2022-05-11.log.html#t2022-05-11T07:49:43 is the other hit i found15:43
clarkbtonyb: ya I think those errors are possible due to the debs being corrupt and we need to rm and forget them in reprepro forcing it to resync those files15:44
fungii don't find that we've ever discussed "Cannot retrieve long description for package" errors before now15:44
clarkbfungi: I'm wondering if it is just a different manifestation of a corrupt deb. Like download the .deb and check the description in the package?15:45
fungipossible15:45
clarkbor maybe it is missing from metadata in indexes?15:45
clarkbre installing zstd to make that error go away we do have zstd installed so that isn't the issue with the zstd errors/warnings15:45
clarkbthe zstd errors do show up in older syncs that were otherwise successful if I'm reading the logs correctly15:47
fungihttps://static.opendev.org/mirror/logs/reprepro/ubuntu.log.1 is from before the volume filled up, and if you search for "cannot" you find lots of hits in there15:48
tonybthe .deb seems okay.  I grabbed it and used ar/tar/md5sums to get the contents15:48
tonybfungi: Ahh okay15:48
fungibut i don't see the "Cannot retrieve long description for package" or "Premature end of reading from" errors in old logs15:48
fungiso those very well may represent corrupt packages15:49
clarkbtonyb: did you grab it from the RW volume or RO? The RO volume will have content from before any potential corruption15:50
tonybclarkb: ahhh I did the RO16:00
fungigrab it from /afs/.openstack.org/mirror/...16:01
fungithe leading "." gets you the writeable path16:01
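Editor's sketch: the read-only to read-write path mapping fungi describes is just a prefix swap. A minimal helper, assuming the `openstack.org` cell name from the paths quoted in this log; `to_rw_path` is a hypothetical name.

```python
def to_rw_path(path, cell="openstack.org"):
    """Map /afs/<cell>/... to the writeable /afs/.<cell>/... path."""
    ro_prefix = f"/afs/{cell}/"
    rw_prefix = f"/afs/.{cell}/"
    if path.startswith(ro_prefix):
        return rw_prefix + path[len(ro_prefix):]
    return path  # already read-write, or not under this cell


print(to_rw_path("/afs/openstack.org/mirror/ubuntu"))
```

Paths that already start with the dotted cell name are returned unchanged.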
fungianyway, i guess once this completes we can scrape a list of truncated packages out of the log and clear those out, then run it again16:02
clarkbsounds like a plan16:03
tonybYup.  Testing again with the same and a second .deb16:03
tonyb`ar: ../linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb: file format not recognized`16:04
tonyband it's 0bytes so yeah I think that explains that :)16:05
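Editor's sketch: the check tonyb did by hand (a zero-byte file that `ar` refuses to read) can be approximated in a few lines. This only catches empty files and files missing the `ar` archive header that every .deb starts with; it would not notice a file truncated part-way through. The helper names and the demo directory are hypothetical, and walking a whole mirror tree this way would be slow.

```python
import os
import tempfile

AR_MAGIC = b"!<arch>\n"  # .deb files are ar archives and begin with this


def looks_truncated(path):
    """True for empty files or files without the ar archive header."""
    if os.path.getsize(path) == 0:
        return True
    with open(path, "rb") as handle:
        return handle.read(len(AR_MAGIC)) != AR_MAGIC


def find_suspect_debs(root):
    """Return sorted paths of .deb files under root that look truncated."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if name.endswith(".deb") and looks_truncated(full):
                suspects.append(full)
    return sorted(suspects)


# Demo on a throwaway directory with one empty and one plausible file.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "empty_1.0_all.deb"), "wb"):
    pass
with open(os.path.join(demo_root, "ok_1.0_all.deb"), "wb") as handle:
    handle.write(AR_MAGIC + b"debian-binary")
suspects = find_suspect_debs(demo_root)
print([os.path.basename(p) for p in suspects])
```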
clarkbcool and contractor has arrived so I'm afk for a bit16:05
fungino worries, i'll keep plugging at this16:05
tonybfungi: I'm watching but let me know if you want another set of eyes on anything 16:06
fungiyou bet16:08
fungibut really i'm just going to grep the premature end error lines from the log and then build a unique list of package filenames from it16:09
fungiand then do the steps mentioned in our documentation16:09
tonybOkay16:09
fungiabout halfway down https://docs.opendev.org/opendev/system-config/latest/reprepro.html#advanced-recovery-techniques you'll see the "$REPREPRO _forget ..." example16:12
fungithat's what i'll try16:12
fungithose "WRONG SIZE of '/afs/...deb': expected NNN found 0" errors are also probably a good indicator16:23
fungilooks like it's finding the same files16:23
tonybYeah.16:24
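Editor's sketch: building the unique list of affected files from the log, as fungi describes, might look like this. The exact reprepro log format is an assumption based only on the fragments quoted in this channel ("WRONG SIZE of '...': expected NNN found 0" and "Premature end of reading from"); the sample lines and paths are invented for illustration and the regexes may need adjusting against a real log.

```python
import re

# Patterns assumed from the error fragments quoted in the log above.
WRONG_SIZE = re.compile(r"WRONG SIZE of '([^']+)'")
PREMATURE = re.compile(r"Premature end of reading from '?([^\s']+)")


def truncated_files(log_lines):
    """Return the sorted, de-duplicated file paths named in error lines."""
    found = set()
    for line in log_lines:
        for pattern in (WRONG_SIZE, PREMATURE):
            match = pattern.search(line)
            if match:
                found.add(match.group(1))
    return sorted(found)


# Hypothetical sample lines, not taken from a real run.
sample = [
    "WRONG SIZE of '/afs/.example/pool/a/pkg-a_1.0_all.deb': expected 1234 found 0",
    "Premature end of reading from /afs/.example/pool/a/pkg-a_1.0_all.deb",
    "WRONG SIZE of '/afs/.example/pool/b/pkg-b_2.0_amd64.deb': expected 99 found 0",
    "some unrelated line",
]
result = truncated_files(sample)
print(result)
```

Using a set means a file that is complained about several times, or by both error types, is only listed once.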
tonybHopefully we can skip a full rebuild of the checksum DB16:26
fungiluckily it spotted the errors and didn't vos release16:47
fungilooks like the same 2 packages are resulting in the "Premature end of reading from" and "WRONG SIZE of" errors: pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb and pool/main/l/linux-gkeop/linux-gkeop-headers-5.4.0-1092_5.4.0-1092.96_all.deb16:50
tonybThat makes sense, I expect more than those 2 files though16:59
fungiokay, those two files have been forgotten and deleted. copying the edited db back into afs now17:04
fungithen we can retry17:04
tonybokay17:05
clarkbit ended up only being two debs that it complained about in the end?17:05
clarkbI guess that makes sense if reprepro can detect the fs quota errors quickly and bail out when they happen17:05
fungifor the premature end/wrong size errors yes. just complained about them multiple times was all17:05
fungii'm operating on the assumption that the zstd decoding errors on other files are benign17:06
fungiokay, reprepro is running again, let's see if it succeeds this time17:15
fungimmm, looks like maybe i failed at cleaning up one of those17:27
fungiseems to still be complaining about pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb17:28
tonyb:/17:30
fungiit's possible i didn't correctly "forget" that one from the db, i'll try again once this completes17:30
clarkbreprepro is cosplaying an elephant17:31
fungia pink elephant, i think17:32
fungithe last reprepro run complained:17:57
fungiMissing file pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb17:57
fungibut when i try to forget that file, it says:17:57
fungiUnable to forget unknown filekey 'pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb'.17:57
clarkbfungi: did reprepro treat that as a failure or a warning?17:58
clarkbthe reprepro run not the forget command17:58
funginope, as an error17:58
clarkbok, I wonder if we need to fetch the deb. I seem to recall frickler may have tried something similar at some point?17:58
clarkbfetch it out of band I mean17:58
fungidone, grabbed http://archive.ubuntu.com/ubuntu/pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb into that directory with wget18:01
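Editor's sketch: the out-of-band fetch maps a reprepro filekey (the `pool/...` path) onto both the upstream URL and the local mirror path. The helpers below are hypothetical; the upstream base URL and mirror root are the ones quoted in this log, and `fetch` is shown but not called here.

```python
import os
import posixpath
import urllib.request

UPSTREAM = "http://archive.ubuntu.com/ubuntu/"
MIRROR_ROOT = "/afs/.openstack.org/mirror/ubuntu"  # writeable path


def upstream_url(filekey):
    """URL of a pool file on the upstream archive."""
    return UPSTREAM + filekey


def local_path(filekey, mirror_root=MIRROR_ROOT):
    """Where the same file lives in the local mirror tree."""
    return posixpath.join(mirror_root, filekey)


def fetch(filekey, mirror_root=MIRROR_ROOT):
    """Download to a temporary name first so a failed transfer
    does not leave another truncated file in place."""
    target = local_path(filekey, mirror_root)
    partial = target + ".partial"
    urllib.request.urlretrieve(upstream_url(filekey), partial)
    os.replace(partial, target)
    return target


key = ("pool/main/l/linux-gkeop-5.15/"
       "linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb")
print(upstream_url(key))
```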
fungitrying reprepro again18:02
fungiokay, now it's complaining about the other one missing instead :/18:16
fungiguess i'll do the same there18:17
fungiso this time i pulled in pool/main/l/linux-gkeop/linux-gkeop-headers-5.4.0-1092_5.4.0-1092.96_all.deb and am trying yet again18:37
tonybSo frustrating, Thanks for sticking with this fungi.18:38
fungimeh, it is what it is18:39
tonybTrue18:39
fungiunrelated, how do folks feel about https://review.opendev.org/917278 for glean?18:39
fungilooks like it can merge now if there's consensus18:40
fungioof, fps limiting is disabled in holoiso currently, that's also going to be painful for modded fallout 418:49
fungier, wrong channel! ;)18:49
tonybWhere do we publish our built nodepool images? I wanted to grab one for local 'fake gate' testing18:51
fricklerhttps://nb01.opendev.org/images/ 18:52
fungitonyb: also check nb02 since the most recent could be on either18:52
fungiand nb04 if you want arm18:52
tonybAhhh cool.  I looked at nb0[1-3].opendev.org/ and only saw the logs, and I didn't see '/images/' I guess because it's an alias or similar18:54
fungiyes, exactly18:54
Clark[m]Finishing up an early lunch but will look at glean and the project-config backlog frickler pointed out after18:55
fungitonyb: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/nodepool-builder/templates/vhost.conf.j2#L3618:56
tonybgot it18:56
fungiwhich is precisely why it doesn't show up in the parent autoindex18:57
fricklerit shows up in my browser history though, so was an easy find ;) not sure if we have it actually documented somewhere outside of the vhost config?19:02
fungipretty sure it's just tribal knowledge, yes19:03
tonybYesterday's jammy build was roughly twice the size of the previous 2 days: https://paste.opendev.org/show/bD2c523CqApS7Ab5Mh3Z/19:03
fungionly jammy, or other distros too?19:04
tonybI only checked jammy19:04
fungiif it was all images, could be someone merged something huuuuuuuge to a git repo in opendev, i guess19:04
fungior a change in how we're compressing qcow2 if it's only an increase for that format19:05
Clark[m]The raw images don't look significantly bigger. I wonder if it something to do with qcow compression 19:08
Clark[m]But still probably worth a du to check quickly if anything looks wrong19:09
fricklerthat seems to be a difference between nb01 and nb0219:10
fricklerthe former has the small images, the latter the large ones. pretty consistent for all builds19:11
tonybHmm interesting.19:11
fungicould be a package update which unattended-upgrades has already pulled on nb02 but hasn't upgraded on nb01 yet19:12
fungi/var/log/dpkg.log is the easiest place to see what's been upgraded in the system packages recently19:12
Clark[m]Or something in the dib caches. Maybe to do with git packing ?19:14
fricklerthey're running the same containers from 2 weeks ago, I don't think the host system should matter much?19:15
fricklerthe image size listed inside the build log looks the same at 15G, so one is compressed and the other isn't?19:15
Clark[m]Not for packages. But the dib caches are separate state that can impact what goes into images hence my suspicion 19:16
*** dxld_ is now known as dxld19:18
clarkbfungi: TheJulia re 917278 I can't shake a feeling that change is inappropriate for glean19:23
fungisome sort of parameterization would be better?19:23
clarkbin particular glean's behavior is expected to be predictable without magical flags or switches. This change adds one.19:24
clarkbfungi: at the very least yes. But I think I would also argue that if you don't want the config drive to be mounted then maybe you shouldn't use glean since its entire purpose is to configure things based on the config drive19:24
clarkbit doesn't support meta data service at all so if there is no config drive mounted glean is a noop19:24
fungiso instead, some way to avoid starting glean in that situation?19:25
clarkbya which ironic can do by adjusting their systemd units / init.d etc19:25
clarkbbasically this script exists to be triggered by boot up before configuring anything to ensure the configuration to be configured is available when subsequent steps run.19:26
fungiin ubuntu mirror news, reprepro completed without error this time, and vos release is running19:26
clarkbYou should be able to simply disable that unit (and possibly tell the subsequent steps some other unit satisfies the requirement)19:26
* clarkb will write this on the change19:27
clarkbfungi: https://review.opendev.org/c/openstack/project-config/+/918146 this was one I was hoping you would have a chance to look at when you got back or between hikes. And now is that time :)19:30
clarkbtonyb: https://review.opendev.org/c/openstack/project-config/+/910212 looks like one that you wanted to discuss more but not sure that happened so I'll let you approve when you feel it is time19:31
tonybclarkb: Oh I forgot about that.  I say we approve it and see what happens19:37
clarkbya the rules themselves seem valid. I'm not sure I agree with the approach, but it shouldn't cause problems for gerrit19:38
clarkbinfra-root I dropped a -1 on https://review.opendev.org/c/openstack/project-config/+/917827 might want to double check I was accurate in my statements there19:38
clarkbhttps://review.opendev.org/c/openstack/project-config/+/919417 has a depends on that hasn't merged. But generally all of the depends on for these retirements are done19:40
fungivos release finished, i think we can move forward with the noble addition now?19:43
opendevreviewMerged openstack/project-config master: Add editHashtags permissions to some governance repositories  https://review.opendev.org/c/openstack/project-config/+/91962719:44
clarkbfungi: yes that seems fine. Though a reminder I'm taking tomorrow and monday off19:44
clarkbmostly don't expect others to pick this up (which you've already done thank you!) especially if I'm popping out for a bit. But I won't say no either19:45
opendevreviewMerged openstack/project-config master: Retire puppet-panko completely  https://review.opendev.org/c/openstack/project-config/+/91941419:46
opendevreviewMerged openstack/project-config master: Retire Solum: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91922319:46
opendevreviewMerged openstack/project-config master: Retire Murano: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91937119:46
TheJuliaclarkb: so I think the answer for us is to remove simple-init usage/glean from our matrix. We're building a handle to do the mount, we've already merged it. Would a better possibility be to "if already mounted, don't attempt the mount" ?19:58
clarkbTheJulia: yes, I could see that being a reasonable compromise since at least in that case we know the drive is mounted and glean actions can proceed19:59
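Editor's sketch: the "if already mounted, don't attempt the mount" idea amounts to a lookup in the kernel's mount table. This is not glean's code; `is_mounted`, `read_mounts`, and the `/mnt/config` mount point are hypothetical, and the check assumes the Linux `/proc/mounts` format where the second field is the mount point.

```python
def is_mounted(mount_point, mounts_text):
    """True if mount_point appears as a mount target in mounts_text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, ...
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def read_mounts(path="/proc/mounts"):
    """Read the current mount table (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical mount table contents for illustration.
sample_mounts = (
    "/dev/vda1 / ext4 rw,relatime 0 0\n"
    "/dev/sr0 /mnt/config iso9660 ro,relatime 0 0\n"
)
print(is_mounted("/mnt/config", sample_mounts), is_mounted("/mnt/other", sample_mounts))
```

As the discussion that follows notes, a check like this on its own does not remove the race between whichever component does the mount and the component that checks for it.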
TheJuliaI guess the issue is there is still a race there potentially20:02
TheJuliathis was an attempt to entirely short circuit a race possibility from state/knowledge we could check20:02
clarkbyou should be able to set a strict ordering in systemd to avoid that20:02
clarkb(I think that would've been possible without these changes too)20:02
TheJuliabut not generally when you have multiple different possibilities20:02
clarkbright you'd need to know what you expect to do the mount then ensure everything else comes after that20:03
clarkbTheJulia: another option is to just turn off that unit for glean20:06
TheJuliayeah, that is a possibility I didn't think of20:06
clarkbwhich is what I tried to suggest in my comment on the change. You may need to update order for the subsequent glean stuff though20:06
TheJuliaif simple-init is there, we could do that20:06
TheJuliaeh, not that worried about that, just want to lockout this whole swath of issues20:07
clarkbbut I think all of that is doable post simple-init/glean install just by editing systemd stuff in /etc/systemd20:07
clarkbya I have stuff like /etc/systemd/system/graphical.target.wants/display-manager.service -> /usr/lib/systemd/system/display-manager.service (odd that isn't in /usr/lib instead but maybe it is easier to support multiple graphical display manager this way)20:08
opendevreviewMerged opendev/system-config master: Add ubuntu noble to our package mirrors  https://review.opendev.org/c/opendev/system-config/+/91977720:32
fungii'm set up to manually start the ubuntu noble mirroring in screen once that ^ config addition deploys, and can check in on it again when i wake up tomorrow since i expect it will require many hours to complete regardless20:32
fungithough for the moment it's waiting behind the murano retirement20:37
fungiokay, deploy completed success, starting reprepro-mirror-update in screen now20:46
fungiand seems to be properly including noble20:47
jrosser_can i get some help with understanding whats happening on some of my jobs21:08
jrosser_earlier today this ran, with several jobs https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/898914?tab=change-view-tab-header-zuul-results-summary21:09
jrosser_and just now similar files touched in the same repo only runs the docs job https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/918129?tab=change-view-tab-header-zuul-results-summary21:09
fungithe first one altered files under templates and vars, while the second did not21:11
jrosser_indeed, but i would expect the set of jobs that ran against the first patch to also run against the second one21:12
fungiif you're not using any file filters for those jobs, yes21:12
fungistrangely, https://zuul.opendev.org/t/openstack/job/openstack-ansible-deploy-aio (the parent of those extra jobs that ran earlier) is now not loading any information21:14
jrosser_hmmm21:14
fungijrosser_: https://zuul.opendev.org/t/openstack/config-errors suggests it could be a dependency on murano, which just got retired today21:14
jrosser_i noticed this because i approved a bunch of patches which to my great surprise merged almost instantly, which is very wrong21:15
fungijrosser_: yeah, i'm not in a position to dig into it right now, but it looks like zuul can't run those jobs any more because they require murano which it no longer knows about the existence of21:17
jrosser_it is unfortunate that it fails unsafe rather than safe21:17
jrosser_as i've merged 20+ patches effectively bypassing all testing21:18
jrosser_luckily, they were extremely trivial changes21:18
fungiwell, zuul will prevent you from merging config changes which break its configuration, it's not designed for having projects removed "behind its back" like openstack's retirement process does21:21
clarkb[e: 3a48ab0dbeac4234869f29f0efbe5037] No matching parents for job openstack-ansible-deploy-aio and change <Change 0x7f87357fba10 openstack/openstack-ansible-os_manila 918129,3>21:24
clarkbthis is what the log shows for at least one of the jobs that I guess you expected to run but didn't?21:25
jrosser_i made a patch to remove murano21:26
clarkbhttps://opendev.org/openstack/openstack-ansible/src/branch/master/zuul.d/jobs.yaml#L251 and that does seem to be the job that includes murano21:26
jrosser_that seems to have queued up jobs that i'd expect21:27
clarkbso ya I think that is the source of the problem. Seems jrosser_ was able to test that experimentally as well21:27
jrosser_excellent - good to understand21:27
jrosser_i had a slight panic when a whole bunch of stuff just merged21:28
opendevreviewGhanshyam proposed openstack/project-config master: Retire Senlin: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91935523:29
opendevreviewGhanshyam proposed openstack/project-config master: Retire ec2-api: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91939723:34
opendevreviewGhanshyam proposed openstack/project-config master: Retire Sahara: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91939123:40