fungi | yeah, it was 64m lines long and over 5gb in size when i deleted it | 00:00 |
---|---|---|
fungi | i'll look at this again tomorrow with a (hopefully) clearer head, but i think we're at the blow-it-away-and-start-over stage | 00:01 |
*** ykarel_ is now known as ykarel | 06:42 | |
frickler | might be some actual loop caused by a symlink? do you remember one of the file paths? | 11:06 |
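One generic way to hunt for such a loop (a sketch, not a command from this conversation): GNU find with `-L` follows symlinks and reports filesystem cycles on stderr. The demo paths below are throwaway, not the real mirror tree.

```shell
# Demonstrate GNU find's loop detection while following symlinks (-L).
set -e
d=$(mktemp -d)
mkdir -p "$d/a"
ln -s "$d/a" "$d/a/loop"            # a/loop points back at a -> a cycle
find -L "$d" >/dev/null 2>"$d/err" || true
grep -i "loop" "$d/err"             # GNU find: "File system loop detected; ..."
rm -rf "$d"
```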
frickler | also regarding the cirros cert, it seems we might soon need to make the warning threshold for expiry configurable anyway, LE plans to offer certs with just 6d validity https://letsencrypt.org/2024/12/11/eoy-letter-2024/ | 11:08 |
frickler | clarkb: sorry about lack of feedback for the held node. I guess we'd need a fresh node anyway if we wanted to continue to debug the ansible module. but that's also not on my priority list | 11:09 |
fungi | frickler: the file paths seemed to be every deb file in the mirror | 12:56 |
fungi | and source packages for them too, e.g. | 12:58 |
fungi | pool/universe/g/golang-github-jacobsa-crypto/golang-github-jacobsa-crypto-dev_0.0~git20161111.0.293ce0c+dfsg1-7_arm64.deb | 12:58 |
fungi | pool/universe/g/golang-github-jacobsa-crypto/golang-github-jacobsa-crypto_0.0~git20161111.0.293ce0c+dfsg1-7.debian.tar.xz | 12:58 |
fungi | pool/universe/g/golang-github-jacobsa-crypto/golang-github-jacobsa-crypto_0.0~git20161111.0.293ce0c+dfsg1-7.dsc | 12:58 |
frickler | fungi: weird, sounds like the "big badaboom" approach would be the next best option to try indeed | 13:44 |
frickler | infra-root: we're down to < 200 zuul config errors now due to various cleanups and there's more pending with the eom-eol transitions. so it would be great to also be able to tackle some of the big non-openstack offenders like https://zuul.opendev.org/t/openstack/config-errors?project=starlingx%2Fzuul-jobs&severity=error&skip=0&limit=50 and x/packstack, anyone willing to help with nagging the relevant | 13:52 |
frickler | folks? | 13:52 |
fungi | frickler: looks like packstack is part of rdo, so maybe we can find someone from their crowd to get it back on track (or retire it if development has moved elsewhere) | 14:07 |
frickler | yes, my hope was that we might have enough redhat people around such that they could handle this kind of thing internally (looking at no infra root in particular ;-D) | 14:14 |
ykarel | jcapitao[m], karolinku[m] if you can check those ^ | 14:50 |
jcapitao[m] | frickler: wrt Packstack you are referring to https://zuul.opendev.org/t/openstack/config-errors?project=x%2Fpackstack&severity=error&skip=0&limit=50 ? | 14:55 |
jcapitao[m] | thanks ykarel for the ping | 14:55 |
fungi | jcapitao[m]: correct. they could be solved through eol of the affected branches or adjustments to job configs on those branches | 15:12 |
fungi | interesting that it has both a stable/yoga and unmaintained/yoga branch | 15:12 |
fungi | looks like it transitioned to unmaintained/yoga but stable/yoga never got removed | 15:13 |
jcapitao[m] | hmm those errors were already fixed | 15:13 |
fungi | in each stable branch or just master? (master isn't reporting errors) | 15:14 |
jcapitao[m] | hmm actually no I misread | 15:16 |
jcapitao[m] | lemme fix that by eol'ing most of them and fixing the active stable branches | 15:16 |
fungi | sounds great, thanks! | 15:16 |
frickler | cool, progress \o/ | 15:27 |
clarkb | Gerrit does appear to be pruning log files after all. The day offset between the two sets of logfiles persists and it seems to be keeping a couple more days than 30 (I think it is only counting compressed files, not the current and yesterday's uncompressed files) | 15:48 |
clarkb | I think that is probably good enough and we can consider that done and land the followup to remove the cron from ansible completely if others agree | 15:49 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/937278 is the change for that | 15:49 |
fungi | i already voted in favor, happy for you to self-approve it if you don't think it's likely to get any additional feedback | 15:52 |
clarkb | ack, thanks. It should be a noop at this point as the cronjob is gone | 15:57 |
clarkb | but I'll triple check that before approving | 15:57 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch mailman role to docker-compose exec https://review.opendev.org/c/opendev/system-config/+/937790 | 15:59 |
fungi | i'm disappearing for a bit to run some pre-travel errands and grab lunch, but should return in an hour or so | 16:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update Gerrit db container to use journald logging https://review.opendev.org/c/opendev/system-config/+/937791 | 16:05 |
opendevreview | Joel Capitao proposed openstack/project-config master: Authorize packstack-core to force push to remove branch https://review.opendev.org/c/openstack/project-config/+/937792 | 16:06 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Run containers on Noble with docker compose and podman https://review.opendev.org/c/opendev/system-config/+/937641 | 16:08 |
clarkb | now to see if lists and review are happy with noble docker compose and podman | 16:08 |
opendevreview | Joel Capitao proposed openstack/project-config master: Authorize packstack-core to force push to remove branch https://review.opendev.org/c/openstack/project-config/+/937792 | 16:17 |
frickler | the config error fix for shade is failing CI as miserably as I expected. if people would review it anyway, I would just go ahead and force-merge it? then we can ignore that repo again until something really serious comes up https://review.opendev.org/937788 (cc gtema) | 16:20 |
clarkb | frickler: isn't shade dead and rolled into openstacksdk? I wonder if we should just remove it from zuul instead? | 16:22 |
frickler | clarkb: I can try that, yes, though I'm not sure how many references old stable branches might still have | 16:34 |
gtema | frickler - +w-ed the change, clarkb - indeed, we could just drop zuul conf from the repo | 16:34 |
clarkb | frickler: oh I meant remove it from the zuul tenant config not cleanup the config in shade itself | 16:47 |
clarkb | though you could do both | 16:47 |
frickler | clarkb: I read that as dropping it from the zuul tenant config, too, that can still trigger issues for other repos that reference it. let me just push a change to test it | 17:00 |
clarkb | oh I see what you mean. I thought you meant needing to clean out .zuul.yaml in all stable branches for the repo | 17:01 |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: DNM: Drop openstack/shade from zuul config https://review.opendev.org/c/openstack/project-config/+/937797 | 17:02 |
frickler | I guess if we want to really proceed with ^^ we'll need to split it and also include a governance update, but let's see what zuul says first | 17:02 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Run containers on Noble with docker compose and podman https://review.opendev.org/c/opendev/system-config/+/937641 | 17:05 |
frickler | a second review on the stack at https://review.opendev.org/c/openstack/project-config/+/935696 would be nice | 17:06 |
clarkb | frickler: not sure what the commit message means there, the project moved from some other zuul tenant to the openstack tenant? | 17:08 |
clarkb | looks like we didn't move tenants we just added it to gerrit and zuul | 17:09 |
clarkb | oh there are two repos, one moved from the vexxhost to the openstack tenant and the other is a new repo, which explains my confusion | 17:10 |
opendevreview | Joel Capitao proposed openstack/project-config master: Authorize packstack-core to force push to remove branch https://review.opendev.org/c/openstack/project-config/+/937792 | 17:11 |
frickler | oh, I missed that the second governance patch is still pending, sorry :-( | 17:22 |
clarkb | it's fine, now we're ready to land things on our side with a quick recheck once governance is sorted | 17:22 |
clarkb | cool I think 937792 shows mailman and gerrit stuff working happily too. Zuul is the last big one that I've been avoiding because I know we rely on the docker exec containername process in those plays/roles quite a bit, and mechanically I know we can convert them to a compatible system but it's still a bit of work to get through | 17:38 |
clarkb | anyway at this point we've probably got enough data to discuss if we like the approach, have any more concerns we want to test through, etc. I'll make sure that is part of tomorrow's meeting agenda | 17:38 |
fungi | i'll be in the middle of crazy holiday highway traffic during the meeting, but i'm advance registering my preference for that future direction | 17:55 |
clarkb | fungi: if you have time before you're driving can you review the changes under topic:podman-prep even if it isn't a full review just check out the shape of things and call out any concerns if you have them | 18:02 |
fungi | you bet | 18:07 |
clarkb | I have confirmed that the cronjob on review02 appears to be gone. I will approve the change to remove it from ansible now | 18:12 |
fungi | thanks! | 18:14 |
clarkb | I'm dropping gerrit 3.10 stuff from the meeting agenda too as ^ was the last thing remaining related to it | 18:25 |
fungi | having heard no objections so far, i'll plan to blow away the contents of the ubuntu-ports volume and let our script bootstrap it from scratch again. at least the impact of it being stale for another ~week will likely continue to go unnoticed | 18:26 |
clarkb | fungi: in theory that won't break running jobs since the ro copy will stay as is for now? | 18:26 |
fungi | if anybody strongly disagrees with that approach and wants to try their hand at troubleshooting the present state of the mirror, feel free | 18:26 |
fungi | clarkb: correct, we'll continue serving the old (stale) state until it's done | 18:27 |
clarkb | that is my only real concern | 18:27 |
fungi | to be fair, nobody brought it up until i happened to notice it after fixing the stale state of our regular ubuntu mirror (presumably because arm64 jobs aren't as closely scrutinized) | 18:28 |
fungi | i'm just hoping to get it back to working before it becomes a job-affecting issue | 18:28 |
clarkb | ++ | 18:29 |
fungi | but this time of year it seems like we can probably afford to wait the near-week that bootstrapping and vos releasing it from zero will require | 18:29 |
clarkb | I'm also going to drop backup server pruning/purging from the agenda. I think that reached a reasonable conclusion last week (though we can continue to apply it to the other backup server when it starts to fill up) | 18:29 |
fungi | sgtm | 18:29 |
clarkb | ok those agenda edits are in. I'll send it out later today after others have a chance to chime in on any other edits | 18:37 |
fungi | i've started recursively deleting all the contents of /afs/.openstack.org/mirror/ubuntu-ports | 18:39 |
clarkb | I guess I should add a note about ubuntu-ports mirroring | 18:40 |
fungi | once it completes, i'll start another run of the usual script which should hopefully redownload everything and repopulate the databases | 18:40 |
fungi | please do, just be aware i won't be around for that discussion | 18:40 |
clarkb | yup mostly a "be aware of this situation" thing more than expecting you to chime in tomorrow | 18:41 |
fungi | that has already finished, starting the script now | 18:52 |
fungi | it's running in a root screen session on mirror-update, and i'll try to check in on it from time to time over the course of the week | 18:56 |
opendevreview | Merged opendev/system-config master: Drop Gerrit log cleanup cron from Ansible https://review.opendev.org/c/opendev/system-config/+/937278 | 19:33 |
clarkb | https://zuul.opendev.org/t/openstack/build/f05e379479794954bab4319521a221a6 zuul reports ^ deployed successfully | 19:37 |
clarkb | (it should be a noop) | 19:37 |
fungi | excellent! | 19:38 |
clarkb | looks like we're approving the prep changes. One thing to note about that is for some services we may restart automatically and others we may not | 19:44 |
fungi | yes | 19:44 |
clarkb | I'll have to take stock of that once things land and work through what doesn't (I know gerrit won't for example) | 19:45 |
fungi | seems like things are pretty quiet this week, so manual restarts are probably doable whenever is convenient | 19:45 |
clarkb | looks like gitea won't either | 19:45 |
clarkb | yup its a good time to work through things like that | 19:45 |
fungi | the sooner the better as far as i'm concerned | 19:46 |
clarkb | I think lodgeit may automatically restart | 19:46 |
fungi | mailman should as well | 19:46 |
fungi | but worth double-checking | 19:46 |
clarkb | in mailman's case we are only changing the config management checks | 19:47 |
fungi | it may not restart if images don't change (which they probably won't) | 19:47 |
clarkb | so I think those should noop | 19:47 |
fungi | agreed | 19:47 |
fungi | since we build our own mailman container images, the only unknown is if the upstream mariadb container images change i guess | 19:48 |
clarkb | ya | 19:49 |
clarkb | the main thing I think to be on the lookout for is syslog -> journald logging change and we apply that to paste, gitea, and gerrit | 19:50 |
clarkb | I think paste may be automatic but not gitea or gerrit | 19:50 |
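For reference, the kind of compose-level change being discussed looks roughly like this (an illustrative fragment, not the actual system-config template; the image name is a placeholder):

```yaml
# Switch a service's container log driver from syslog to journald (illustrative)
services:
  gitea-web:
    image: example.org/opendev/gitea:latest   # placeholder image
    logging:
      driver: journald
      options:
        tag: docker-gitea-web                 # journald driver supports a tag option
```

Because the logging driver is part of the container's configuration, a running container has to be recreated (not just reloaded) to pick up the change, which is why manual restarts come up below.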
clarkb | I can work through manual restarts of gitea later today then see where we are at before doing gerrit. That might happen tomorrow | 19:50 |
opendevreview | Merged opendev/system-config master: Refactor check for new container images https://review.opendev.org/c/opendev/system-config/+/937655 | 19:58 |
clarkb | that change is deploying to gitea09 now which should noop unless there is a new mariadb image | 20:00 |
clarkb | then later when the logging change lands I can manually restart things | 20:00 |
clarkb | I have image and container listings done on gitea09 and gitea10 that I will check when the job is done just to confirm the noop | 20:00 |
clarkb | and then I'll work on lunch | 20:01 |
clarkb | the gitea job completed and as far as I can tell we did not grab new images and did not restart services (the expected behavior) | 20:06 |
clarkb | fungi: I've just realized that the string change in https://review.opendev.org/c/opendev/system-config/+/937717/4/playbooks/roles/gitea/tasks/main.yaml may append a newline to those cron commands | 20:10 |
clarkb | in theory this isn't an issue but I'm not positive of that | 20:10 |
clarkb | so that will need checking after it lands I guess | 20:11 |
clarkb | I think worst case ansible might put a stray newline in the crontab file which is fine | 20:11 |
fungi | i'll check once it deploys | 20:11 |
clarkb | meetpad didn't get new images or restart as expected so that check seems to be working in the no-new-images case | 20:11 |
fungi | i've made a dump of the root crontab on gitea09 for comparison | 20:12 |
fungi | i'll diff it after | 20:13 |
clarkb | cool, based on other command blocks I don't think the command module is bothered by the newline at the end, so it's just the cron entry in there that I'm slightly concerned about | 20:13 |
clarkb | actually we may have examples of that elsewhere /me looks | 20:14 |
clarkb | fungi: the gitea db backup cron uses > and not >- (I think >- eats the newline) and that cron job seems to be fine. Your diff will hopefully confirm | 20:15 |
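For reference, YAML's folded block scalars differ only in trailing-newline handling (a generic illustration, not the actual system-config snippet):

```yaml
# ">" (clip) folds newlines into spaces and keeps ONE trailing newline
# ">-" (strip) folds the same way but removes the trailing newline
keep_newline: >
  docker exec -t gitea-web
  git gc --quiet
no_newline: >-
  docker exec -t gitea-web
  git gc --quiet
# keep_newline == "docker exec -t gitea-web git gc --quiet\n"
# no_newline   == "docker exec -t gitea-web git gc --quiet"
```

In a crontab a trailing newline is harmless, which matches the "worst case" assessment above; Ansible's cron module also normalizes the entry either way.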
clarkb | and now I need to make lunch | 20:15 |
opendevreview | Merged opendev/system-config master: Use docker-compose for container execs in gitea https://review.opendev.org/c/opendev/system-config/+/937717 | 20:22 |
opendevreview | Merged opendev/system-config master: Switch mailman role to docker-compose exec https://review.opendev.org/c/opendev/system-config/+/937790 | 20:22 |
opendevreview | Merged opendev/system-config master: Log with journald and not syslog in lodgeit docker compose https://review.opendev.org/c/opendev/system-config/+/937656 | 20:22 |
clarkb | fungi: I think it did what we want and chomped it in ansible somewhere | 20:31 |
clarkb | however now I notice the old command ran docker exec -t which means with a tty and the new one is docker-compose exec -T which means without a tty | 20:32 |
clarkb | I'm not sure if that matters or not for running git garbage collection | 20:32 |
clarkb | the original cron job used -t so we didn't add it later to make things happy. corvus you wrote that original cron job, not sure if you remember if that was needed or not | 20:34 |
fungi | /var/spool/cron/crontabs/root was last updated 6 minutes ago on gitea09 | 20:34 |
clarkb | we can probably run the new cronjob manually now if we don't want to wait for it to run and possibly email us angrily | 20:34 |
clarkb | fungi: huh I just did crontab -l earlier and it showed the new content then | 20:35 |
clarkb | it still shows the new content so not sure what made it update | 20:35 |
fungi | -55 16 * * */2 docker exec -t gitea-docker_gitea-web_1 find /data/git/repositories/ -maxdepth 2 -name *.git -type d -execdir git --git-dir={} gc --quiet \; | 20:35 |
fungi | +55 16 * * */2 /usr/local/bin/docker-compose -f /etc/gitea-docker/docker-compose.yaml exec -T gitea-web find /data/git/repositories/ -maxdepth 2 -name *.git -type d -execdir git --git-dir={} gc --quiet \; | 20:35 |
clarkb | yup that's what I see from the crontab -l output and I think that all looks good, except maybe we need to drop the -T if the old -t was necessary | 20:36 |
clarkb | fungi: are you maybe in a position to manually run the new cronjob in a screen and see if it works as is? | 20:36 |
fungi | sure, just a sec | 20:36 |
fungi | in progress in a root screen session on gitea09 | 20:37 |
clarkb | probably need to check $? after it returns to see if it was happy or not | 20:38 |
fungi | i planned to, yeah | 20:38 |
fungi | it seems to take a few minutes to complete | 20:38 |
opendevreview | Merged opendev/system-config master: Update gitea containers to use journald logging https://review.opendev.org/c/opendev/system-config/+/937657 | 20:38 |
clarkb | lodgeit did restart and the logging seems to still go to /var/log/containers so I think that is looking good | 20:40 |
clarkb | https://paste.opendev.org/show/br9trC10ppXbuoWBgNaW/ and I made a test paste | 20:41 |
clarkb | the giteas do not appear to have restarted for the syslog -> journald change (expected). The haproxy for opendev.org did restart (expected) | 20:47 |
clarkb | the haproxy for zuul is about to restart if it hasn't yet | 20:48 |
fungi | gc on gitea09 is still going | 20:49 |
clarkb | if you ps -elf | grep gc you can usually catch the repo it is doing | 20:50 |
clarkb | it does seem to be making progress at least | 20:50 |
clarkb | zuul lb looks happy and did restart | 20:50 |
fungi | yeah, it's on openstack/tacker so should finish soon i hope | 20:50 |
clarkb | I think the only unknowns are if git gc is happy (have to check exit code) and restarting services for gitea and gerrit to pick up the new logging config | 20:51 |
fungi | well, openstack/cinder now, so not alpha order | 20:51 |
clarkb | once gc is happy I'll work on restarting the gitea services on the 6 backends to pickup the new logging config | 20:52 |
fungi | well, i don't really know how long the git gc will take | 20:53 |
clarkb | my guess is it will be done by the time I'm done with lunch stuff | 20:54 |
clarkb | another 10 minutes or so? | 20:54 |
clarkb | fungi: but I can't stop the gitea container the gc is running in without stopping the gc | 20:54 |
clarkb | so I have to wait for gitea09 anyway. I could start on 10-14 though | 20:54 |
fungi | okay, it just finished | 20:55 |
fungi | root@gitea09:~# echo $? | 20:56 |
fungi | 0 | 20:56 |
clarkb | perfect I guess the old -t was not needed so having -T is correct | 20:56 |
fungi | yeah. lgtm | 20:56 |
clarkb | fungi: I detached from the screen and you can close it whenever you like | 20:56 |
fungi | done | 20:56 |
clarkb | fungi: you may want to double check the bits of the mailman playbook that I updated too | 20:57 |
corvus | clarkb: sorry i don't know if the "-t" was required. my feeling is: whatever works, works, and sounds like fungi has established that. :) | 20:57 |
clarkb | corvus: ++ | 20:57 |
clarkb | fungi: probably need to look at the logs on bridge for that | 20:57 |
clarkb | I need to relocate back to the office then will work on service restarts to pick up the logging change on the giteas | 20:58 |
opendevreview | Merged opendev/system-config master: Update Gerrit db container to use journald logging https://review.opendev.org/c/opendev/system-config/+/937791 | 21:00 |
clarkb | ok processing gitea14 first and will work backward through the list. I'm pulling servers out of the load balancer before I restart them too | 21:03 |
clarkb | that all looks good (logs still go to /var/log/containers and containers started without complaint as far as I can tell) I'll proceed through the list | 21:05 |
clarkb | that is done | 21:16 |
clarkb | I'm now of two minds A) go ahead and restart gerrit once 937791 applies to pick up the journald logging change (I think hourly jobs are currently running so it hasn't applied yet) or B) wait until tomorrow anyway and just check that the giteas and paste don't have anything unexpected from journald logging | 21:17 |
clarkb | actually looks like the gerrit deploy for journald logging went before the hourly jobs | 21:21 |
clarkb | double checking the disk contents confirms | 21:22 |
clarkb | I think I'm leaning towards doing the gerrit restart later | 21:22 |
clarkb | but happy for others to override me on that and just get it done | 21:23 |
clarkb | from gitea09 /dev/vda1 155G 47G 109G 30% / if we wait on gerrit we can see if that moves significantly | 21:26 |
clarkb | /dev/root 39G 8.6G 31G 23% / is paste | 21:27 |
tonyb | jcapitao[m], karolinku[m]: I'm sorry with PTO and being unwell I let the ball drop on the CentOS-10 issues. care to update me on current state and let me know what help you need from me, if any? | 22:13 |
clarkb | tonyb: my understanding is that rhel 10 (and also centos 10 stream) have decided that x86-64-v3 is the minimal level of hardware support for those distro releases | 22:15 |
clarkb | tonyb: problem is very few of our clouds (if any) currently provide hardware with those capabilities (particularly avx is an issue) | 22:16 |
clarkb | I think of our clouds maybe vexxhost, raxflex, and openmetal can provide that level of cpu capability, but they are far from a majority of available resources and their flavors may not expose those features yet, so another round of updates to those bits may be required | 22:17 |
fungi | as for openmetal, we may need to make our own nova config adjustments to expose that flag, i haven't checked | 22:17 |
clarkb | my personal take on this is that red hat has chosen poorly by getting ahead of the cloud providers and what is testable in the wild, and other distros are taking very different approaches (suse mixes in v3-capable compiled software on top of the normal distro and alma linux is doing a v2-capable rebuild alongside the default of v3) | 22:18 |
fungi | regardless, yes, the bulk of our quota comes from regions in ovh and rackspace classic, neither of which support centos 10 guests | 22:18 |
clarkb | I think that red hat should be working with cloud providers to address this and not using us as a proxy. I don't feel this is a fight we should be involved in | 22:18 |
fungi | (unless red hat backtracks on their choice of compiler options) | 22:19 |
clarkb | fungi: I found a post from alma that made it seem that was unlikely let me see if I can dig that up again | 22:19 |
clarkb | I'm not sure if rockylinux is doing anything different NeilHanlon may know | 22:19 |
tonyb | clarkb: Yeah the minimum bump and implementation thereof was flagged, and it appears ignored. | 22:20 |
fungi | it's possible red hat is assuming that once centos 10 and rhel 10 release, cloud providers will be forced to upgrade their infrastructure. in the meantime though, we can't really assist with testing it | 22:20 |
clarkb | https://almalinux.org/blog/2024-10-22-introducing-almalinux-os-kitten/ has a "AlmaLinux OS Kitten includes an additional build using x86-64-v2" section | 22:20 |
tonyb | I doubt very much that there will be a back-track. | 22:20 |
clarkb | ya I don't expect a backtrack. More that I don't want to be in the middle expected to solve these problems | 22:21 |
fungi | then hopefully red hat and the centos community collectively are prepared for it not to be usable in lots of existing places | 22:21 |
clarkb | if red hat wants to work with clouds to make their distro work in those clouds we'll take advantage of it | 22:21 |
tonyb | So assuming that we have a cloud that can support v3, we could provide a nodepool label, like we do with nested-virt, for some testing of CentOS-10, or is that not-okay | 22:22 |
clarkb | so far I've been operating under the assumption that that isn't ok | 22:23 |
clarkb | the reason is if that cloud goes away we can no longer run centos 10 at all | 22:23 |
fungi | 1. someone needs to figure out where those are, and 2. it's likely to represent a very small proportion of our available quota | 22:23 |
clarkb | whereas with nested virt if that goes away all of our platforms continue to work only specific jobs don't | 22:23 |
clarkb | and even those specific jobs may work just more slowly | 22:23 |
fungi | "support" for centos 10 testing might be on equal (or worse) footing to arm testing | 22:24 |
clarkb | that also means taking a stance that centos 10 gets to run on only our fastest resources | 22:24 |
clarkb | which I think is unfair from a general scheduling perspective | 22:24 |
tonyb | Ah I see the distinction. I admit I wasn't thinking far enough ahead | 22:24 |
clarkb | as far as determining where we can run these things we should be logging cpu flags via /proc/cpuinfo captures in every job now | 22:26 |
tonyb | I was thinking only about the DIB aspect, as in making sure that DIB would work with CentOS-10, not the general "now we have images let's run them" part | 22:26 |
clarkb | so it should be possible to grab those from jobs that run in every cloud region and see what if any of them have the required flags to support v3 | 22:26 |
fungi | i don't personally object to folks working on getting it going, but be aware that from an established public cloud perspective it seems beyond bleeding-edge | 22:26 |
fungi | which is an unusually out-of-character decision for red hat | 22:27 |
fungi | though maybe the shift in how they look at centos explains it (they probably aren't expecting users to want to boot rhel 10 on public clouds any time soon) | 22:27 |
clarkb | oh hrm I thought we merged the change to get cpuinfo in zuul-jobs but now I'm not finding that in our regular job logs /me digs more | 22:29 |
fungi | there was discussion of adding it to the zuul/zuul-jobs role that collects routes and disk utilization | 22:30 |
clarkb | yes I thought that landed | 22:30 |
tonyb | Yeah I thought the cpuinfo stuff merged | 22:31 |
clarkb | https://review.opendev.org/c/zuul/zuul-jobs/+/937376 it did land now to find the info in the logs | 22:31 |
clarkb | https://zuul.opendev.org/t/openstack/build/76b63ed3066146d69c9901ef55427e74/console#0/3/5/debian-bullseye | 22:32 |
clarkb | maybe we don't write it to the host info file but its there in ansible? | 22:32 |
clarkb | we are missing osxsave and lzcnt in ^ which was an openmetal node | 22:34 |
clarkb | apparently intel considers lzcnt part of bmi1, which we do have, but it advertises the feature separately so not sure if we actually have it or not | 22:35 |
fungi | i think there's a filter in the nova/libvirt config that has to include flags we want passed through to guests? | 22:36 |
clarkb | for adding centos 10 stream support to dib we can't even run the functests because they chroot and expect executables in the chroot to run iirc :/ | 22:36 |
fungi | so would need job nodes that are capable of running centos 10 binaries | 22:37 |
clarkb | fungi: the config is a bit more complicated than that. When using kvm (not qemu) you pick either host passthrough or a custom model. You can pick from predefined custom models or define your own in libvirt | 22:37 |
clarkb | fungi: yes, I think having any testing for centos 10 in dib requires us to have test nodes capable of running centos 10 | 22:37 |
fungi | got it, so still possible our cpus in openmetal would work and "just" need config adjustments | 22:37 |
clarkb | also those cpu models are per hypervisor / nova compute setup not a flavor thing iirc | 22:38 |
clarkb | fungi: ya or /proc/cpuinfo doesn't report lzcnt because its part of bmi1 and osxsave is under some other flag and we're good in openmetal already | 22:38 |
fungi | oh, that's a good point | 22:38 |
fungi | i don't know where to find the documentation that would confirm or refute that though | 22:39 |
fungi | and would probably resort to just trying to run something there instead and see if it works | 22:39 |
clarkb | ya problem with that is it works until you find some other piece of software relying on a feature you thought was good but didn't actually check | 22:39 |
clarkb | I wonder if there is a tool in linux to get a report of level and what is missing for other levels | 22:40 |
tonyb | Yeah okay, that matches what I thought WRT nova and what my testing in openmetal+vexxhost indicated. | 22:40 |
tonyb | clarkb: I expect there is but I don't know of one off the top of my head. | 22:41 |
clarkb | `ld.so --help` reports it | 22:43 |
clarkb | but not what is missing from unsupported levels | 22:43 |
clarkb | also we know that some cloud regions don't have consistent levels but we can probably ignore that for now if we establish a baseline | 22:44 |
tonyb | Sort of, it reports the variants it supports and which are detected, so if your libc *only* supports v3 you essentially get nothing useful there | 22:44 |
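A scripted version of that loader self-report might look like this (a sketch; glibc 2.33+ is needed for the hwcaps report, and the loader paths below are common x86-64 locations, assumed rather than taken from this conversation):

```shell
# Print which x86-64 microarchitecture levels glibc will search (glibc >= 2.33).
# Older loaders (e.g. focal's) error out instead, matching the behavior seen above.
ldso=/lib64/ld-linux-x86-64.so.2
[ -x "$ldso" ] || ldso=/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
"$ldso" --help 2>/dev/null | grep -E 'x86-64-v[234]' \
    || echo "loader does not report hwcaps (glibc < 2.33 or non-x86?)"
```

As tonyb notes, the report reflects what this glibc build knows about, so it is a hint rather than a complete hardware inventory.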
clarkb | the mirror on openmetal seems to report v3 is supported and searched | 22:48 |
clarkb | tonyb: are you saying that ld.so is reporting what it was compiled to support, not what the hardware supports? | 22:49 |
tonyb | clarkb: That's my understanding | 22:49 |
clarkb | the jammy mirror in openmetal says v4 is supported but the noble mirror in raxflex does not | 22:50 |
tonyb | Hmm okay. that's confusing :/ | 22:51 |
clarkb | the focal mirror in both vexxhost regions just errors | 22:52 |
clarkb | cannot load shared object etc | 22:52 |
clarkb | from that I do suspect that openmetal and raxflex support it | 22:55 |
tonyb | Oh I thought vexxhost did too. | 22:56 |
clarkb | ovh mirror says searched and supported | 22:58 |
clarkb | so maybe the main issue is rax and/or vexxhost? (just haven't been able to get data from vexxhost yet) | 22:58 |
clarkb | or there is plenty of variance ? | 22:58 |
tonyb | I suspect that RAX for sure has enough variance to be a problem | 23:00 |
clarkb | tonyb: the nested virt label did not work in initial testing with dib support and that includes raxflex, vexxhost, openmetal, and ovh iirc | 23:01 |
clarkb | so at least one of them doesn't work. Which may also mean maybe this method of checking is invalid | 23:01 |
tonyb | clarkb: Yeah, I thought that was for $other reasons though | 23:02 |
clarkb | you mean a problem other than v3 cpu support? | 23:02 |
clarkb | understanding why the nested virt label didn't work is probably a good starting point since that constrains the problem space a bit | 23:05 |
tonyb | Yeah I thought the job did something funky because it had nested support and the $funky failed, but I could easily be wrong | 23:05 |
tonyb | I have a small bash script that should (untested) report which flags are missing | 23:07 |
tonyb | well it's untested in that the laptop I wrote it on has all the tested flags | 23:07 |
clarkb | my local jammy fileserver reports v2 only supported and searched and not v3 | 23:13 |
clarkb | which I think is accurate for that cpu so this detecton method is at least sort of working | 23:13 |
clarkb | ya no avx on that system | 23:13 |
tonyb | Can you run: https://paste.opendev.org/show/b0Zw2AdKxbaInIvhhBLu/ on it ? | 23:14 |
tonyb | note the final flag "cve12" is my bogus flag to verify that it does fail to detect a flag | 23:15 |
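tonyb's paste isn't reproduced in the log, but a minimal sketch of that kind of check might look like the following (the function name is made up, and the flag list is abbreviated; the full x86-64-v3 psABI set also includes lzcnt and osxsave, which /proc/cpuinfo may not report directly, per the bmi1 discussion above):

```shell
# Report which flags from a required set are absent from a CPU flag list.
# Usage: check_missing "REQUIRED_FLAGS" "CPU_FLAGS"
check_missing() {
    required=$1
    have=" $2 "                        # pad so we can match " flag " exactly
    for f in $required; do
        case "$have" in
            *" $f "*) ;;               # present
            *) echo "missing: $f" ;;
        esac
    done
}

# Abbreviated x86-64-v3 requirements (assumption, not the authoritative list)
v3="avx avx2 bmi1 bmi2 f16c fma movbe"
if [ -r /proc/cpuinfo ]; then
    check_missing "$v3" "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2-)"
fi
```

The padded-spaces `case` match is the same trick discussed below for matching flags at the start and end of the list.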
clarkb | tonyb: ya give me a few (I want to understand it before running it). I also notice that qemu emulation of haswell features requires qemu 7.2 or newer | 23:17 |
clarkb | which may be a problem for the dib tests that build an image and check it (I don't know what version of qemu those currently have) | 23:17 |
tonyb | Yeah I don't know about qemu versions either | 23:18 |
clarkb | TIL here strings | 23:21 |
tonyb | clarkb: you're welcome? | 23:24 |
clarkb | tonyb: what is the flags="${flags## }" for? It seems to result in the same string at the end as the previous step for me | 23:25 |
clarkb | tonyb: also I haven't confirmed yet but I think your script may not match flags at the beginning or end of the flags string due to the requirement for spaces on either side? | 23:25 |
tonyb | "just in case the previous line left spaces at the beginning of the flags" | 23:25 |
clarkb | ah ok but I think you do want a space at either end of the list for the case matching? | 23:26 |
clarkb | oh you embed that in the case statement on both sides, nevermind | 23:26 |
tonyb | the case " ${flags} " in ensures they're there | 23:27 |
tonyb | It's probably a little stupid to do it that way but I was rushing | 23:27 |
clarkb | tonyb: local msg is unneeded. I ran it without the v4 detection and got all of v1 and v2 found but most of v3 not found | 23:30 |
clarkb | so I think it is working | 23:30 |
tonyb | Okay cool. Thanks | 23:30 |
clarkb | my cpu is from 2016 fwiw | 23:31 |
clarkb | and my not very old laptop and desktop don't support v4 either | 23:32 |
clarkb | because they are amd and like one generation too old | 23:33 |
tonyb | Okay. I might make a patch for DIB to call that to aid with debugging. | 23:41 |
clarkb | tonyb: maybe capture the qemu version too. Though I suspect we can infer that by checking the packaged version for the distro after the fact | 23:44 |
clarkb | crazy idea time: do everything on arm | 23:46 |
tonyb | LOL | 23:51 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!