liushy | #join openstack-neutron | 02:40 |
---|---|---|
*** benj_7 is now known as benj_ | 07:55 | |
*** sfinucan is now known as stephenfin | 11:35 | |
*** sfinucan is now known as stephenfin | 14:00 | |
clarkb | fungi: I've been reminded that we have a contractor out this morning to write up a proposal for deck replacement (not my idea of fun). So not sure I'll be able to watch noble things for a bit. But before that happens I should be able to bump quotas and start a root screen on mirror-update and hold the lock if we think that is appropriate | 15:05 |
fungi | i can handle those if you prefer | 15:06 |
fungi | what quota would you want to start out at? | 15:06 |
fungi | or i can look at jammy and guess | 15:06 |
clarkb | fungi: well I think the 4 releases are all in the one volume right now so take current usage divide by 4 and then add some buffer? | 15:07 |
fungi | root screen session is started on mirror-update now | 15:07 |
* tonyb has attached to the screen | 15:08 | |
fungi | we've got ubuntu and ubuntu-ports in separate volumes, fwiw | 15:08 |
clarkb | looks like we are right at quota for ubuntu | 15:08 |
clarkb | fungi: ya this change only touches ubuntu to avoid thrashing afs across multiple volumes updating at the same time | 15:09 |
fungi | and yeah, we still have content on the mirror.ubuntu volume for bionic, focal, jammy and xenial | 15:09 |
clarkb | Hopefully we haven't broken reprepro already for existing releases. Maybe the process should be bump to 850 + (850/4) + headroom == ~120GB | 15:10 |
fungi | odds are noble will be bigger than xenial due to overall accumulation | 15:10 |
clarkb | and then do a sync before we run with noble | 15:10 |
clarkb | then do noble. Mostly I'm worried about the issues that frickler had to repair semi recently when reprepro hits quota limits | 15:10 |
fungi | i'll check logs | 15:10 |
clarkb | so want to make sure that is sorted before we add noble to the mix | 15:10 |
clarkb | er my math should've been 1200GB | 1.2TB not 120GB | 15:11 |
fungi | mmm, yeah https://static.opendev.org/mirror/ubuntu/timestamp.txt says 2024-05-13T17:02:49,981032177+00:00 so we probably filled it several days ago | 15:11 |
clarkb | thats still well under the 2TB limit for openafs | 15:11 |
clarkb | do we recall what the problems were with recovering from that? I want to say reprepro basically acted like it had synced stuff even though it hadn't | 15:12 |
fungi | $ fs listquota /afs/.openstack.org/mirror/ubuntu | 15:12 |
fungi | Volume Name Quota Used %Used Partition | 15:12 |
fungi | mirror.ubuntu 850000000 850000977 100%<< 17% <<WARNING | 15:12 |
fungi | :( | 15:12 |
fungi | i don't recall, but might be able to find it in channel history | 15:12 |
clarkb | ya so step 0 here is likely going to be addressing that. I think we can bump the quota up to what we expect to need for noble too at the same time | 15:13 |
clarkb | kinda makes me wish deb repos used rsync. despite its problems, rsync is really smart about redownloading stuff if necessary | 15:14 |
clarkb | looks like ports is getting close too. So maybe go ahead and bump that one as well. I think ports could go to 850GB | 15:15 |
fungi | https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2024-03-11.log.html | 15:16 |
fungi | i think that was it? | 15:16 |
fungi | yeah, involved telling reprepro to "forget" the truncated files we deleted | 15:17 |
clarkb | ya that looks right and apparently we have docs about this. So ya I would do quota bumps for both volumes. Then sync and see what fails then tell reprepro to forget about those files | 15:17 |
clarkb | then when we have reached steady state again we can add noble into the mix | 15:18 |
fungi | so increase quota for mirror.ubuntu from 850000000 to 1200000000 | 15:19 |
clarkb | it isn't clear to me how we identify all those files though it is implied reprepro itself will complain about them | 15:19 |
clarkb | fungi: ya I think so. That should be sufficient headroom for both catching up and adding noble | 15:19 |
fungi | looks like ubuntu-ports isn't quite full, just very very close | 15:20 |
clarkb | yup its a few GB away. But since we're likely to add noble to it soon bump it up too makes sense (I don't think we need to do the "recovery" task on it though) | 15:20 |
fungi | increase mirror.ubuntu-ports quota from 600000000 to... 800000000? higher? | 15:21 |
fungi | 5/4 is 750000000 so we still want some breathing room beyond that | 15:21 |
clarkb | fungi: I suggested 850 above. I think its probably better to overestimate when adding new releases then we can reduce to a saner level once we have a clear picture of actual need | 15:22 |
clarkb | in order to avoid doing the recovery process multiple times | 15:22 |
fungi | aha, missed you had come up with a suggestion for that one too. yeah 850000000 then | 15:23 |
fungi | okay, quotas increased, both volumes are around 70% used now, and i'm rerunning reprepro-mirror-update for that volume | 15:30 |
clarkb | hopefully with the lock held :) | 15:34 |
clarkb | sorry I haven't managed to jump into the screen to see yet | 15:34 |
tonyb | Yup flock -n ...... | 15:34 |
tonyb | clarkb, fungi FWIW I added a second window in the screen to tail the logfile | 15:36 |
fungi | thanks, if i'd been thinking i would have done `2>&1|tee /var/log/reprepro/ubuntu.log` instead of `>> /var/log/reprepro/ubuntu.log 2>&1` | 15:36 |
tonyb | No problem. | 15:37 |
fungi | hrm, those errors are unexpected | 15:40 |
tonyb | Yes they are | 15:41 |
clarkb | is it possible those are the corrupted package files? | 15:42 |
fungi | https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2022-04-26.log.html#t2022-04-26T11:18:08 | 15:42 |
clarkb | heh ok so maybe we can ignore them for now | 15:42 |
clarkb | and then treat those gkeop packages as corrupt? (we're unlikely to ever use them since we aren't using google cloud) | 15:43 |
tonyb | Okay what about the "Cannot retrieve long description for package" errors? | 15:43 |
fungi | https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2022-05-11.log.html#t2022-05-11T07:49:43 is the other hit i found | 15:43 |
clarkb | tonyb: ya I think those errors are possibly due to the debs being corrupt and we need to rm and forget them in reprepro, forcing it to resync those files | 15:44 |
fungi | i don't find that we've ever discussed "Cannot retrieve long description for package" errors before now | 15:44 |
clarkb | fungi: I'm wondering if it is just a different manifestation of a corrupt deb. Like download the .deb and check the description in the package? | 15:45 |
fungi | possible | 15:45 |
clarkb | or maybe it is missing from metadata in indexes? | 15:45 |
clarkb | re installing zstd to make that error go away we do have zstd installed so that isn't the issue with the zstd errors/warnings | 15:45 |
clarkb | the zstd errors do show up in older syncs that were otherwise successful if I'm reading the logs correctly | 15:47 |
fungi | https://static.opendev.org/mirror/logs/reprepro/ubuntu.log.1 is from before the volume filled up, and if you search for "cannot" you find lots of hits in there | 15:48 |
tonyb | the .deb seems okay. I grabbed it and used ar/tar/md5sums to get the contents | 15:48 |
tonyb | fungi: Ahh okay | 15:48 |
fungi | but i don't see the "Cannot retrieve long description for package" or "Premature end of reading from" errors in old logs | 15:48 |
fungi | so those very well may represent corrupt packages | 15:49 |
clarkb | tonyb: did you grab it from the RW volume or RO? The RO volume will have content from before any potential corruption | 15:50 |
tonyb | clarkb: ahhh I did the RO | 16:00 |
fungi | grab it from /afs/.openstack.org/mirror/... | 16:01 |
fungi | the leading "." gets you the writeable path | 16:01 |
fungi | anyway, i guess once this completes we can scrape a list of truncated packages out of the log and clear those out, then run it again | 16:02 |
clarkb | sounds like a plan | 16:03 |
tonyb | Yup. Testing again with the same and a second .deb | 16:03 |
tonyb | `ar: ../linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb: file format not recognized` | 16:04 |
tonyb | and it's 0bytes so yeah I think that explains that :) | 16:05 |
clarkb | cool and contractor has arrived so I'm afk for a bit | 16:05 |
fungi | no worries, i'll keep plugging at this | 16:05 |
tonyb | fungi: I'm watching but let me know if you want another set of eyes on anything | 16:06 |
fungi | you bet | 16:08 |
fungi | but really i'm just going to grep the premature end error lines from the log and then build a unique list of package filenames from it | 16:09 |
fungi | and then do the steps mentioned in our documentation | 16:09 |
tonyb | Okay | 16:09 |
fungi | about halfway down https://docs.opendev.org/opendev/system-config/latest/reprepro.html#advanced-recovery-techniques you'll see the "$REPREPRO _forget ..." example | 16:12 |
fungi | that's what i'll try | 16:12 |
fungi | those "WRONG SIZE of '/afs/...deb': expected NNN found 0" errors are also probably a good indicator | 16:23 |
fungi | looks like it's finding the same files | 16:23 |
tonyb | Yeah. | 16:24 |
tonyb | Hopefully we can skip a full rebuild of the checksum DB | 16:26 |
fungi | luckily it spotted the errors and didn't vos release | 16:47 |
fungi | looks like the same 2 packages are resulting in the "Premature end of reading from" and "WRONG SIZE of" errors: pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb and pool/main/l/linux-gkeop/linux-gkeop-headers-5.4.0-1092_5.4.0-1092.96_all.deb | 16:50 |
tonyb | That makes sense, I expect more than those 2 files though | 16:59 |
fungi | okay, those two files have been forgotten and deleted. copying the edited db back into afs now | 17:04 |
fungi | then we can retry | 17:04 |
tonyb | okay | 17:05 |
clarkb | it ended up only being two debs that it complained about in the end? | 17:05 |
clarkb | I guess that makes sense if reprepro can detect the fs quota errors quickly and bail out when they happen | 17:05 |
fungi | for the premature end/wrong size errors yes. just complained about them multiple times was all | 17:05 |
fungi | i'm operating on the assumption that the zstd decoding errors on other files are benign | 17:06 |
fungi | okay, reprepro is running again, let's see if it succeeds this time | 17:15 |
fungi | mmm, looks like maybe i failed at cleaning up one of those | 17:27 |
fungi | seems to still be complaining about pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb | 17:28 |
tonyb | :/ | 17:30 |
fungi | it's possible i didn't correctly "forget" that one from the db, i'll try again once this completes | 17:30 |
clarkb | reprepro is cosplaying an elephant | 17:31 |
fungi | a pink elephant, i think | 17:32 |
fungi | the last reprepro run complained: | 17:57 |
fungi | Missing file pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb | 17:57 |
fungi | but when i try to forget that file, it says: | 17:57 |
fungi | Unable to forget unknown filekey 'pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb'. | 17:57 |
clarkb | fungi: did reprepro treat that as a failure or a warning? | 17:58 |
clarkb | the reprepro run not the forget command | 17:58 |
fungi | nope, as an error | 17:58 |
clarkb | ok, I wonder if we need to fetch the deb. I seem to recall frickler may have tried something similar at some point? | 17:58 |
clarkb | fetch it out of band I mean | 17:58 |
fungi | done, grabbed http://archive.ubuntu.com/ubuntu/pool/main/l/linux-gkeop-5.15/linux-gkeop-5.15-tools-5.15.0-1045_5.15.0-1045.52~20.04.1_amd64.deb into that directory with wget | 18:01 |
fungi | trying reprepro again | 18:02 |
fungi | okay, now it's complaining about the other one missing instead :/ | 18:16 |
fungi | guess i'll do the same there | 18:17 |
fungi | so this time i pulled in pool/main/l/linux-gkeop/linux-gkeop-headers-5.4.0-1092_5.4.0-1092.96_all.deb and am trying yet again | 18:37 |
tonyb | So frustrating, Thanks for sticking with this fungi. | 18:38 |
fungi | meh, it is what it is | 18:39 |
tonyb | True | 18:39 |
fungi | unrelated, how do folks feel about https://review.opendev.org/917278 for glean? | 18:39 |
fungi | looks like it can merge now if there's consensus | 18:40 |
fungi | oof, fps limiting is disabled in holoiso currently, that's also going to be painful for modded fallout 4 | 18:49 |
fungi | er, wrong channel! ;) | 18:49 |
tonyb | Where do we publish our built nodepool images? I wanted to grab one for local 'fake gate' testing | 18:51 |
frickler | https://nb01.opendev.org/images/ | 18:52 |
fungi | tonyb: also check nb02 since the most recent could be on either | 18:52 |
fungi | and nb04 if you want arm | 18:52 |
tonyb | Ahhh cool. I looked at nb0[1-3].opendev.org/ and only saw the logs, and I didn't see '/images/' I guess because it's an alias or similar | 18:54 |
fungi | yes, exactly | 18:54 |
Clark[m] | Finishing up an early lunch but will look at glean and the project-config backlog frickler pointed out after | 18:55 |
fungi | tonyb: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/nodepool-builder/templates/vhost.conf.j2#L36 | 18:56 |
tonyb | got it | 18:56 |
fungi | which is precisely why it doesn't show up in the parent autoindex | 18:57 |
frickler | it shows up in my browser history though, so was an easy find ;) not sure if we have it actually documented somewhere outside of the vhost config? | 19:02 |
fungi | pretty sure it's just tribal knowledge, yes | 19:03 |
tonyb | Yesterday's jammy build was roughly twice the size of the previous 2 days: https://paste.opendev.org/show/bD2c523CqApS7Ab5Mh3Z/ | 19:03 |
fungi | only jammy, or other distros too? | 19:04 |
tonyb | I only check jammy | 19:04 |
fungi | if it was all images, could be someone merged something huuuuuuuge to a git repo in opendev, i guess | 19:04 |
fungi | or a change in how we're compressing qcow2 if it's only an increase for that format | 19:05 |
Clark[m] | The raw images don't look significantly bigger. I wonder if it is something to do with qcow compression | 19:08 |
Clark[m] | But still probably worth a du to check quickly if anything looks wrong | 19:09 |
frickler | that seems to be a difference between nb01 and nb02 | 19:10 |
frickler | the former has the small images, the latter the large ones. pretty consistent for all builds | 19:11 |
tonyb | Hmm interesting. | 19:11 |
fungi | could be a package update which unattended-upgrades has already pulled on nb02 but hasn't upgraded on nb01 yet | 19:12 |
fungi | /var/log/dpkg.log is the easiest place to see what's been upgraded in the system packages recently | 19:12 |
Clark[m] | Or something in the dib caches. Maybe to do with git packing ? | 19:14 |
frickler | they're running the same containers from 2 weeks ago, I don't think the host system should matter much? | 19:15 |
frickler | the image size listed inside the build log looks the same at 15G, so one is compressed and the other isn't? | 19:15 |
Clark[m] | Not for packages. But the dib caches are separate state that can impact what goes into images hence my suspicion | 19:16 |
*** dxld_ is now known as dxld | 19:18 | |
clarkb | fungi: TheJulia re 917278 I can't shake a feeling that change is inappropriate for glean | 19:23 |
fungi | some sort of parameterization would be better? | 19:23 |
clarkb | in particular glean's behavior is expected to be predictable without magical flags or switches. This change adds one. | 19:24 |
clarkb | fungi: at the very least yes. But I think I would also argue that if you don't want the config drive to be mounted then maybe you shouldn't use glean since its entire purpose is to configure things based on the config drive | 19:24 |
clarkb | it doesn't support meta data service at all so if there is no config drive mounted glean is a noop | 19:24 |
fungi | so instead, some way to avoid starting glean in that situation? | 19:25 |
clarkb | ya which ironic can do by adjusting their systemd units / init.d etc | 19:25 |
clarkb | basically this script exists to be triggered by boot up before configuring anything to ensure the configuration to be configured is available when subsequent steps run. | 19:26 |
fungi | in ubuntu mirror news, reprepro completed without error this time, and vos release is running | 19:26 |
clarkb | You should be able to simply disable that unit (and possibly tell the subsequent steps some other unit satisfies the requirement) | 19:26 |
* clarkb will write this on the change | 19:27 | |
clarkb | fungi: https://review.opendev.org/c/openstack/project-config/+/918146 this was one I was hoping you would have a chance to look at when you got back or between hikes. And now is that time :) | 19:30 |
clarkb | tonyb: https://review.opendev.org/c/openstack/project-config/+/910212 looks like one that you wanted to discuss more but not sure that happened so I'll let you approve when you feel it is time | 19:31 |
tonyb | clarkb: Oh I forgot about that. I say we approve it and see what happens | 19:37 |
clarkb | ya the rules themselves seem valid. I'm not sure I agree with the approach, but it shouldn't cause problems for gerrit | 19:38 |
clarkb | infra-root I dropped a -1 on https://review.opendev.org/c/openstack/project-config/+/917827 might want to double check I was accurate in my statements there | 19:38 |
clarkb | https://review.opendev.org/c/openstack/project-config/+/919417 has a depends on that hasn't merged. But generally all of the depends on for these retirements are done | 19:40 |
fungi | vos release finished, i think we can move forward with the noble addition now? | 19:43 |
opendevreview | Merged openstack/project-config master: Add editHashtags permissions to some governance repositories https://review.opendev.org/c/openstack/project-config/+/919627 | 19:44 |
clarkb | fungi: yes that seems fine. Though a reminder I'm taking tomorrow and monday off | 19:44 |
clarkb | mostly don't expect others to pick this up (which you've already done thank you!) especially if I'm popping out for a bit. But I won't say no either | 19:45 |
opendevreview | Merged openstack/project-config master: Retire puppet-panko completely https://review.opendev.org/c/openstack/project-config/+/919414 | 19:46 |
opendevreview | Merged openstack/project-config master: Retire Solum: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919223 | 19:46 |
opendevreview | Merged openstack/project-config master: Retire Murano: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919371 | 19:46 |
TheJulia | clarkb: so I think the answer for us is to remove simple-init usage/glean from our matrix. We're building a handle to do the mount, we've already merged it. Would a better possibility be to "if already mounted, don't attempt the mount" ? | 19:58 |
clarkb | TheJulia: yes, I could see that being a reasonable compromise since at least in that case we know the drive is mounted and glean actions can proceed | 19:59 |
TheJulia | I guess the issue is there is still a race there potentially | 20:02 |
TheJulia | this was an attempt to entirely short circuit a race possibility from state/knowledge we could check | 20:02 |
clarkb | you should be able to set a strict ordering in systemd to avoid that | 20:02 |
clarkb | (I think that would've been possible without these changes too) | 20:02 |
TheJulia | but not generally when you have multiple different possibilities | 20:02 |
clarkb | right you'd need to know what you expect to do the mount then ensure everything else comes after that | 20:03 |
clarkb | TheJulia: another option is to just turn off that unit for glean | 20:06 |
TheJulia | yeah, that is a possibility I didn't think of | 20:06 |
clarkb | which is what I tried to suggest in my comment on the change. You may need to update order for the subsequent glean stuff though | 20:06 |
TheJulia | if simple-init is there, we could do that | 20:06 |
TheJulia | eh, not that worried about that, just want to lockout this whole swath of issues | 20:07 |
clarkb | but I think all of that is doable post simple-init/glean install just by editing systemd stuff in /etc/systemd | 20:07 |
clarkb | ya I have stuff like /etc/systemd/system/graphical.target.wants/display-manager.service -> /usr/lib/systemd/system/display-manager.service (odd that isn't in /usr/lib instead but maybe it is easier to support multiple graphical display manager this way) | 20:08 |
opendevreview | Merged opendev/system-config master: Add ubuntu noble to our package mirrors https://review.opendev.org/c/opendev/system-config/+/919777 | 20:32 |
fungi | i'm set up to manually start the ubuntu noble mirroring in screen once that ^ config addition deploys, and can check in on it again when i wake up tomorrow since i expect it will require many hours to complete regardless | 20:32 |
fungi | though for the moment it's waiting behind the murano retirement | 20:37 |
fungi | okay, deploy completed success, starting reprepro-mirror-update in screen now | 20:46 |
fungi | and seems to be properly including noble | 20:47 |
jrosser_ | can i get some help with understanding whats happening on some of my jobs | 21:08 |
jrosser_ | earlier today this ran, with several jobs https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/898914?tab=change-view-tab-header-zuul-results-summary | 21:09 |
jrosser_ | and just now similar files touched in the same repo only runs the docs job https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/918129?tab=change-view-tab-header-zuul-results-summary | 21:09 |
fungi | the first one altered files under templates and vars, while the second did not | 21:11 |
jrosser_ | indeed, but i would expect the set of jobs that ran against the first patch to also run against the second one | 21:12 |
fungi | if you're not using any file filters for those jobs, yes | 21:12 |
fungi | strangely, https://zuul.opendev.org/t/openstack/job/openstack-ansible-deploy-aio (the parent of those extra jobs that ran earlier) is now not loading any information | 21:14 |
jrosser_ | hmmm | 21:14 |
fungi | jrosser_: https://zuul.opendev.org/t/openstack/config-errors suggests it could be a dependency on murano, which just got retired today | 21:14 |
jrosser_ | i noticed this because i approved a bunch of patches which to my great surprise merged almost instantly, which is very wrong | 21:15 |
fungi | jrosser_: yeah, i'm not in a position to dig into it right now, but it looks like zuul can't run those jobs any more because they require murano which it no longer knows about the existence of | 21:17 |
jrosser_ | it is unfortunate that it fails unsafe rather than safe | 21:17 |
jrosser_ | as i've merged 20+ patches effectively bypassing all testing | 21:18 |
jrosser_ | luckily, they were extremely trivial changes | 21:18 |
fungi | well, zuul will prevent you from merging config changes which break its configuration, it's not designed for having projects removed "behind its back" like openstack's retirement process does | 21:21 |
clarkb | [e: 3a48ab0dbeac4234869f29f0efbe5037] No matching parents for job openstack-ansible-deploy-aio and change <Change 0x7f87357fba10 openstack/openstack-ansible-os_manila 918129,3> | 21:24 |
clarkb | this is what the log shows for at least one of the jobs that I guess you expected to run but didn't? | 21:25 |
jrosser_ | i made a patch to remove murano | 21:26 |
clarkb | https://opendev.org/openstack/openstack-ansible/src/branch/master/zuul.d/jobs.yaml#L251 and that does seem to be the job that includes murano | 21:26 |
jrosser_ | that seems to have queued up jobs that i'd expect | 21:27 |
clarkb | so ya I think that is the source of the problem. Seems jrosser_ was able to test that experimentally as well | 21:27 |
jrosser_ | excellent - good to understand | 21:27 |
jrosser_ | i had a slight panic when a whole bunch of stuff just merged | 21:28 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire Senlin: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919355 | 23:29 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire ec2-api: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919397 | 23:34 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire Sahara: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919391 | 23:40 |
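
The quota sizing worked out between 15:07 and 15:23 (current usage, plus one more release at roughly a quarter of that, plus headroom) can be sketched as below. This is only an illustration of the arithmetic in the log: the function name, the 10% headroom, and the rounding step of 50000000 are assumptions chosen so the results line up with the values fungi and clarkb settled on, and the numbers are treated as being in the same units that `fs listquota` printed above.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Integer division that rounds up instead of down."""
    return -(-numerator // denominator)


def suggested_quota(used: int, existing_releases: int,
                    headroom_percent: int = 10,
                    step: int = 50_000_000) -> int:
    """Estimate a volume quota after adding one more release.

    used              -- current usage, in the units `fs listquota` reports
    existing_releases -- how many releases currently share the volume
    headroom_percent  -- hypothetical safety margin (not from the log)
    step              -- hypothetical rounding granularity (not from the log)
    """
    # Assume the new release is about as large as the average existing one.
    per_release = ceil_div(used, existing_releases)
    with_new_release = used + per_release
    # Add breathing room so a slightly larger release does not hit the quota.
    with_headroom = ceil_div(with_new_release * (100 + headroom_percent), 100)
    # Round up to a tidy multiple of `step`.
    return ceil_div(with_headroom, step) * step


# mirror.ubuntu: four releases sharing a volume that is full at 850000000
ubuntu_quota = suggested_quota(850_000_000, 4)
# mirror.ubuntu-ports: four releases, quota previously 600000000
ports_quota = suggested_quota(600_000_000, 4)
```

Under these assumed parameters the two calls give 1200000000 and 850000000, matching the figures chosen in the channel. As clarkb notes, erring on the high side and shrinking the quota later is cheaper than repeating the reprepro recovery.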
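
The plan fungi describes at 16:09 (grep the "Premature end of reading from" lines out of the reprepro log and build a unique list of package filenames) could look roughly like the following. It is a sketch, not the procedure actually run: the "WRONG SIZE of '...': expected NNN found 0" shape is taken from the message quoted at 16:23, while the exact layout of the "Premature end" line is an assumption, and `suspect_filekeys` is a hypothetical helper name.

```python
import re

# Shape quoted in the log: WRONG SIZE of '/afs/...deb': expected NNN found 0
WRONG_SIZE = re.compile(r"WRONG SIZE of '(?P<path>[^']+)'")
# Assumed shape: the path follows the phrase, optionally wrapped in quotes.
PREMATURE_END = re.compile(
    r"Premature end of reading from '?(?P<path>[^'\s]+)'?"
)


def suspect_filekeys(lines):
    """Return unique pool-relative paths named in truncation errors.

    Order of first appearance is preserved so the result is stable.
    """
    seen = {}
    for line in lines:
        for pattern in (WRONG_SIZE, PREMATURE_END):
            match = pattern.search(line)
            if match is None:
                continue
            path = match.group("path").rstrip(".:,")
            # Reduce an absolute mirror path to the "pool/..." form that
            # reprepro reported in its "Missing file" message.
            index = path.find("/pool/")
            key = path[index + 1:] if index != -1 else path
            seen.setdefault(key, None)
    return list(seen)
```

Feeding it the lines of the log (for example `suspect_filekeys(open(path_to_log))`, with `path_to_log` supplied by the operator) yields each affected file once, however many times reprepro complained about it. That list is the input for the remove-and-forget steps in the recovery documentation linked at 16:12.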