*** dviroel|afk is now known as dviroel | 00:25 | |
opendevreview | Tony Breeds proposed openstack/project-config master: Move the requirements-constraints job to the periodic-weekly pipeline https://review.opendev.org/c/openstack/project-config/+/853735 | 00:51 |
---|---|---|
*** dviroel is now known as dviroel|out | 01:08 | |
ianw | i've just put a hold on a rocky job and hopefully can poke and rebuild | 01:44 |
ianw | i rebuilt it and it gets a different machine-id in /boot/loader/entries/ each time | 04:24 |
ianw | which suggest to me it's being written out | 04:25 |
ianw | ><fs> cat /etc/default/grub | 04:29 |
ianw | GRUB_DEVICE=LABEL=guest-rootfs | 04:29 |
ianw | the image has this correct | 04:29 |
ianw | ><rescue> chroot /sysroot | 04:39 |
ianw | Fatal glibc error: CPU does not support x86-64-v2 | 04:39 |
ianw | excellent, i can't inspect it | 04:39 |
ianw | 2022-08-19 05:02:54.974 | + cat /etc/machine-id | 05:03 |
ianw | 2022-08-19 05:02:54.976 | + grub2-mkconfig -o /boot/grub2/grub.cfg | 05:03 |
ianw | there is no machine-id before grub2-mkconfig | 05:04 |
ianw | 2022-08-19 05:08:34.477 | + cat /etc/machine-id | 05:09 |
ianw | 2022-08-19 05:08:34.478 | 766780bd5a4943ef8725c4ab6d7cc6db | 05:09 |
ianw | 2022-08-19 05:08:34.844 | + [[ -e /boot/loader/entries ]] | 05:10 |
ianw | 2022-08-19 05:08:34.845 | + grubby --info=ALL | 05:10 |
ianw | 2022-08-19 05:08:34.878 | grep: /boot/grub2/grubenv: No such file or directory | 05:10 |
ianw | 2022-08-19 05:08:34.895 | index=0 | 05:10 |
ianw | 2022-08-19 05:08:34.895 | kernel="/boot/vmlinuz-5.14.0-70.22.1.el9_0.x86_64" | 05:10 |
ianw | 2022-08-19 05:08:34.895 | args="ro console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset gfxpayload=text" | 05:10 |
ianw | 2022-08-19 05:08:34.895 | root="LABEL=guest-rootfs" | 05:10 |
ianw | i.e. if the machine-id is there, it rewrites the bl entry | 05:10 |
*** soniya29 is now known as soniya29|ruck | 05:11 | |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] rocky : create machine-id in 9 https://review.opendev.org/c/openstack/diskimage-builder/+/853575 | 05:14 |
ianw | that seemed to work. i can rework that to a mergable change | 06:35 |
*** soniya29|ruck is now known as soniya29|ruck|afk | 07:03 | |
opendevreview | Merged openstack/project-config master: Move the requirements-constraints job to the periodic-weekly pipeline https://review.opendev.org/c/openstack/project-config/+/853735 | 07:34 |
*** soniya29|ruck|afk is now known as soniya29|ruck | 07:38 | |
*** soniya29|ruck is now known as soniya29|ruck|lunch | 08:00 | |
*** jpena|off is now known as jpena | 08:33 | |
*** soniya29|ruck|lunch is now known as soniya29|ruck | 08:47 | |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: rocky : create machine-id in 9 https://review.opendev.org/c/openstack/diskimage-builder/+/853575 | 08:57 |
*** rlandy_ is now known as rlandy | 10:18 | |
*** dviroel|out is now known as dviroel | 11:26 | |
*** dviroel is now known as dviroel|rover | 11:27 | |
*** frenzy_friday is now known as frenzyfriday|afk | 11:35 | |
*** tosky is now known as Guest532 | 12:51 | |
*** tosky_ is now known as tosky | 12:51 | |
NeilHanlon | ianw: thank you for investigating and finding that for me. I'm going to do some investigation on the back here and see why this is all happening. | 12:58 |
*** frenzyfriday|afk is now known as frenzyfriday | 13:09 | |
*** tbachman_ is now known as tbachman | 13:10 | |
opendevreview | Rafal Lewandowski proposed openstack/diskimage-builder master: - added elrepo element https://review.opendev.org/c/openstack/diskimage-builder/+/853817 | 13:25 |
*** jpena is now known as jpena|off | 13:45 | |
*** dasm|off is now known as dasm | 13:59 | |
*** rcastillo|rover is now known as rcastillo | 14:00 | |
*** dviroel|rover is now known as dviroel|rover|lunch | 15:02 | |
clarkb | fungi: you reviewed 853575 but the rest of https://review.opendev.org/q/topic:testing-rootfs needs to land before that one can | 15:33 |
clarkb | no rush though just wanted to call that out | 15:34 |
*** marios is now known as marios|out | 15:36 | |
clarkb | johnsom: as a quick status check have you noticed any recent ssh issues? | 15:42 |
clarkb | I expect that the expected fix has largely rolled out by this point (but it can take some time to incorporate updates like that in our disk images) | 15:43 |
johnsom | Well, there was an IP swap issue: https://zuul.openstack.org/build/ddc9ebdff70748cdb6cf09fc87e05aaf/log/job-output.txt#1985 | 15:45 |
johnsom | I wonder if that was related too | 15:45 |
clarkb | that's a different issue aiui. TL;DR is openstack will sometimes reuse an IP while a host thinks it still has the IP | 15:46 |
clarkb | then you get a fight over the IP via ARP | 15:46 |
johnsom | This one still had it: https://zuul.openstack.org/build/e4bf1989542c4ca09d73b6d66181124c/log/job-output.txt | 15:46 |
johnsom | https://zuul.openstack.org/build/e4bf1989542c4ca09d73b6d66181124c/log/job-output.txt#25472 | 15:46 |
johnsom | It doesn't seem to be the 50+ occurrences in a day like it was. | 15:47 |
clarkb | johnsom: the rax-ord focal image (which that example ran on) updated about 16 hours ago. That job started about 17 hours ago. I think that was before the fix rolled out in that region | 15:48 |
clarkb | ok that is good news. I'll try to keep an eye out for new occurrences to see if we should tweak the ssh settings further | 15:49 |
johnsom | Ah, ok. | 15:49 |
johnsom | Yeah, I will let you know if I see a bunch again. | 15:49 |
*** dviroel|rover|lunch is now known as dviroel|rover | 16:09 | |
*** rlandy is now known as rlandy|lunch | 16:17 | |
*** rlandy|lunch is now known as rlandy | 16:50 | |
*** rlandy is now known as rlandy|mtg | 17:24 | |
clarkb | fungi: thanks! I'll reapprove the child changes that zuul -1'd due to not sharing queues | 17:36 |
opendevreview | Merged openstack/diskimage-builder master: Allow setting ROOT_LABEL from environment https://review.opendev.org/c/openstack/diskimage-builder/+/853573 | 17:53 |
*** rlandy|mtg is now known as rlandy | 18:00 | |
*** rlandy is now known as rlandy|biab | 18:05 | |
TheJulia | So what controls the contents of mirrors? I ask because https://mirror.mtl01.iweb.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/ has a .treeinfo file, but lacks the entire images folder which is present at http://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/ | 18:16 |
clarkb | TheJulia: a cronjob runs https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/centos-stream-mirror-update every few (4?) hours | 18:40 |
clarkb | we explicitly exclude images because they are huge | 18:40 |
TheJulia | but they are static and not updated like packages | 18:41 |
TheJulia | at least... they are not from what I've been told | 18:41 |
* TheJulia looks | 18:42 | |
clarkb | I think that isn't true for a lot of the images; I don't know about centos 9 | 18:42 |
clarkb | fedora for example added images all the time | 18:42 |
clarkb | but also raw disk is a concern | 18:42 |
clarkb | we also don't mirror source packages for similar reasons | 18:42 |
TheJulia | last updated august 8th | 18:42 |
TheJulia | so I *believe* it is monthly | 18:42 |
clarkb | we're trying to meet the 80% need which is distro packages and things like images and source packages can be fetched from upstream when needed | 18:42 |
TheJulia | the conundrum ultimately is that I can't trust .treeinfo is valid because I can't grab the image assets it refers to because the folder is gone | 18:43 |
TheJulia | so then I have to split requests across public upstream mirror and local mirror | 18:43 |
TheJulia | which is... not great :( | 18:44 |
clarkb | I'm not an expert on rpm repos. I guess .treeinfo reflects index info that the normal http index doesn't? We could probably update that file. But no one has done it for any of the rpm mirroring and we've done this for years | 18:44 |
TheJulia | well, .treeinfo provides extra insight and some checksums for the images | 18:45 |
clarkb | does dnf/yum/rpm even pull images natively? I always assumed those were separate requests anyway | 18:45 |
TheJulia | this is more so I can install a host directly from the repository | 18:45 |
TheJulia | s/repository/mirror | 18:45 |
TheJulia | so dnf/yum/rpm is after the host boots with the assets in /images/ | 18:46 |
TheJulia | clarkb: https://mirror.mtl01.iweb.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/.treeinfo | 18:47 |
clarkb | Looking at that we should also exclude the .treeinfo if we exclude images/ ? seems like that is images specific and doesn't apply to the packages/ dir | 18:48 |
clarkb | oh the very end does refer to packages | 18:48 |
TheJulia | if you guys nuke .treeinfo, I'm going to have to toss quite a bit of work | 18:49 |
TheJulia | or | 18:49 |
TheJulia | just use public mirrors | 18:49 |
clarkb | well I'm just trying to understand I don't know how any of this works other than "mirroring all the extra large content that most of our jobs don't use isn't currently viable" | 18:49 |
TheJulia | Well, I'm trying to make a job that needs images/pxeboot/* and images/install.img | 18:50 |
clarkb | right I think my assumption was that anyone needing images/ or ppc packages for a cross compile or src packages for gdb would fetch them from an upstream mirror | 18:50 |
clarkb | and our mirrors would be used for the vast majority of package installs that jobs perform | 18:50 |
TheJulia | okay | 18:51 |
TheJulia | I'll just... bifurcate the mirror settings for my job and hope that it... works | 18:51 |
clarkb | https://grafana.opendev.org/d/9871b26303/afs?orgId=1 shows disk usage | 18:52 |
clarkb | There are two main concerns (but one is currently a subset of the other). First is that an AFS volume can't have more than 2TB in it. We currently organize each repo into its own volume so that gives us a hard upper limit. The second is we have 4TB total disk space and are consuming 82% ish of that (and that's above our red warn threshold) | 18:55 |
TheJulia | okay | 18:55 |
clarkb | and as a side note, we're currently not mirroring new distros we add (like rocky linux) to see how well that performs | 18:56 |
TheJulia | That seems like... it is going to be a major constraint as time moves forward | 18:57 |
clarkb | I suspect if we suddenly converted all the centos jobs to rocky that we'd mirror it though. It's a question of popularity, which has an impact on reliability and load when talking to upstream mirrors | 18:57 |
clarkb | TheJulia: it is a big reason why I keep arguing for everyone to stop using centos as a stable distro :) | 18:57 |
clarkb | I mean other than the fact that everyone is surprised at how often it breaks. It would allow us to pivot the current tooling to a stable alternative that would make mirrors easier to manage, for example | 18:58 |
TheJulia | And take a higher development maintenance cost on the backend when something has been broken for months if not longer | 18:58 |
clarkb | we definitely can't mirror the world and we've always said that. This is why we added caching proxies for a lot of stuff | 18:58 |
TheJulia | bottom line is, there is a huge tradeoff there | 18:58 |
clarkb | Instead we try to focus on the things with the most impact, and currently ubuntu and centos are that, so they get mirrored | 18:59 |
TheJulia | I did see the caching proxy stuff poking around, that seems... interesting :) | 18:59 |
*** rlandy|biab is now known as rlandy | 19:00 | |
clarkb | re development maintenance I've also said I think there is value in testing master against centos in a limited fashion to find the breakage. But that doesn't need half a terabyte of mirror disk to accomplish | 19:00 |
TheJulia | truth be told, I'm *really* surprised that we're still mirroring centos7 | 19:01 |
clarkb | A lot of jobs still use it iirc | 19:01 |
TheJulia | that is... worrying | 19:01 |
clarkb | for stable branches | 19:02 |
clarkb | slowly those are getting pruned (some pruning happened recently too) and we may need to look and see if that is still needed | 19:02 |
clarkb | one upside to testing broadly with stream is that it is harder to ignore when it does break. We have definitely seen a preference for punting or working around external breakages like that. Finding that balance is not easy | 19:05 |
clarkb | we ripped out tumbleweed because it was always broken and no one was caring for it. | 19:06 |
TheJulia | Yeah, I remember that. :( | 19:09 |
TheJulia | They also had some weird intentional breaks which made no sense | 19:10 |
NeilHanlon | working with stream has been, in a word, frustrating | 19:51 |
clarkb | I think it is getting better. The rtt on getting bug fixes in has gone down. But it is still longer than you'd like for a CI system | 19:52 |
NeilHanlon | i agree. there are some rough edges but it's not unsurprising. it's one of those things that needs work and love to get better, which took me a while to realize. sorta the old 'complaining for the sake of complaining doesn't help anyone' adage | 19:53 |
NeilHanlon | there's just a duality of testing ahead of and behind RHEL. it's a huge effort to constantly maintain what boils down to rolling updates, versus spending a bit of time every once in a while to resolve whatever changes might cause breakages in a minor release. I imagine most will choose testing post-RHEL rather than pre, but my crystal ball doesn't | 19:55 |
NeilHanlon | show everything :) | 19:55 |
TheJulia | At least in my interactions, we see most issues that impact us hit centos far enough ahead of rhel that we can at least make progress on getting things fixed before it detonates downstream.... or that we can fix/route around well in advance of it breaking us | 20:18 |
clarkb | A lot of the problems I've seen people run into really should've been an immediate revert in the stream distro. But I guess that isn't really possible with stream 8 due to the way package updates flow. and it is still difficult with 9? | 20:22 |
clarkb | ping was broken for over a month on stream 8 due to a package update that should've been trivially reverted immediately when it was noticed. Instead there was a fix that didn't fix it, then another fix, and I think it was the third fix that actually fixed it | 20:22 |
TheJulia | ugh :( | 20:28 |
TheJulia | At least on my end, 9 so far has been okay, 8 is where things were funky. | 20:29 |
*** rlandy is now known as rlandy|biab | 20:51 | |
*** rlandy|biab is now known as rlandy | 21:12 | |
*** dviroel|rover is now known as dviroel|rover|biab | 21:14 | |
*** dasm is now known as dasm|off | 21:33 | |
opendevreview | Merged openstack/diskimage-builder master: rocky : create machine-id in 9 https://review.opendev.org/c/openstack/diskimage-builder/+/853575 | 22:21 |
*** dviroel|rover|biab is now known as dviroel|rover | 22:28 | |
*** dviroel|rover is now known as dviroel|out | 22:50 |
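
The debugging from 04:24 to 05:10 turns on how BLS entries under /boot/loader/entries/ are tied to /etc/machine-id. No machine-id existed before grub2-mkconfig ran, so each rebuild produced an entry under a different ID. Below is a minimal post-build sanity check. It assumes that entries follow the `<machine-id>-<kernel-version>.conf` naming convention and that `grubby --info=ALL` prints the `key=value` lines pasted above. `valid_machine_id`, `mismatched_entries` and `parse_grubby_info` are hypothetical helpers, not part of diskimage-builder.

```python
import re
from pathlib import PurePosixPath

# machine-id(5): 32 lowercase hexadecimal characters, newline-terminated.
MACHINE_ID_RE = re.compile(r"[0-9a-f]{32}")


def valid_machine_id(text: str) -> bool:
    """True if text (e.g. the contents of /etc/machine-id) is a well-formed ID."""
    return MACHINE_ID_RE.fullmatch(text.strip()) is not None


def mismatched_entries(machine_id: str, entry_paths: list) -> list:
    """Return BLS entry files whose name does not start with '<machine-id>-'.

    Assumes the '<machine-id>-<kernel-version>.conf' naming convention.
    """
    prefix = machine_id.strip() + "-"
    return [p for p in entry_paths if not PurePosixPath(p).name.startswith(prefix)]


def parse_grubby_info(output: str) -> dict:
    """Parse 'key=value' lines such as those printed by grubby --info=ALL.

    Hypothetical helper: handles a single entry only, splits on the first '='
    and strips surrounding double quotes. Lines without '=' (such as the grep
    warning about grubenv) are skipped.
    """
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        info[key.strip()] = value.strip().strip('"')
    return info
```

Applied to the values in the log, the ID `766780bd5a4943ef8725c4ab6d7cc6db` passes the format check. An entry named with that ID and kernel `5.14.0-70.22.1.el9_0.x86_64` is not flagged. `parse_grubby_info` reports `root` as `LABEL=guest-rootfs`, which matches the `GRUB_DEVICE` setting ianw checked.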
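
TheJulia's problem is that the local mirror carries .treeinfo but not the images/ files it lists. A job therefore has to fetch images from upstream and packages from the local mirror. Below is a minimal sketch of that split. It assumes the productmd-style INI layout of .treeinfo, with `[images-<arch>]`, `[stage2]` and `[checksums]` sections. The sample file contents and digests are made up, and `image_paths` and `pick_sources` are hypothetical helpers. The two base URLs are the ones quoted in the log.

```python
import configparser
from urllib.parse import urljoin

LOCAL = "https://mirror.mtl01.iweb.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/"
UPSTREAM = "http://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/"

# Made-up, trimmed .treeinfo: real files have more sections and real digests.
SAMPLE_TREEINFO = """\
[header]
type = productmd.treeinfo
version = 1.2

[images-x86_64]
kernel = images/pxeboot/vmlinuz
initrd = images/pxeboot/initrd.img

[stage2]
mainimage = images/install.img

[checksums]
images/pxeboot/vmlinuz = sha256:aaaa
images/pxeboot/initrd.img = sha256:bbbb
images/install.img = sha256:cccc
"""


def image_paths(treeinfo_text: str) -> dict:
    """Map each image path listed in .treeinfo to its checksum (or None)."""
    # Split only on '=' because checksum values contain ':'.
    cp = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    cp.optionxform = str  # keep paths exactly as written (no lowercasing)
    cp.read_string(treeinfo_text)
    paths = set()
    for section in cp.sections():
        if section.startswith("images-") or section == "stage2":
            paths.update(cp[section].values())
    checksums = dict(cp["checksums"]) if cp.has_section("checksums") else {}
    return {p: checksums.get(p) for p in sorted(paths)}


def pick_sources(paths, local_files: set,
                 local_base: str = LOCAL, upstream_base: str = UPSTREAM) -> dict:
    """Use the local mirror when it has the file, otherwise fall back upstream."""
    return {p: urljoin(local_base if p in local_files else upstream_base, p)
            for p in paths}
```

Because the OpenDev mirror excludes images/, `local_files` would contain no image paths, and every image resolves to the upstream URL. Checking each download against the digest from the local .treeinfo would catch the case TheJulia raises, where the local .treeinfo and the upstream images no longer agree.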
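
clarkb gives three figures: 4 TB total, about 82% used, and a 2 TB maximum per AFS volume. Rough arithmetic on them shows why free space is currently the binding limit rather than the per-volume cap. The figures are approximate and the unit is taken as quoted.

```latex
\begin{aligned}
\text{used} &\approx 0.82 \times 4\ \text{TB} = 3.28\ \text{TB} \\
\text{free} &\approx 4\ \text{TB} - 3.28\ \text{TB} = 0.72\ \text{TB} < 2\ \text{TB (per-volume cap)}
\end{aligned}
```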