fungi | yeah, it was 64m lines long and over 5gb in size when i deleted it | 00:00 |
---|---|---|
fungi | i'll look at this again tomorrow with a (hopefully) clearer head, but i think we're at the blow-it-away-and-start-over stage | 00:01 |
*** ykarel_ is now known as ykarel | 06:42 | |
frickler | might be some actual loop caused by a symlink? do you remember one of the file paths? | 11:06 |
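One generic way to hunt for such a loop (a sketch, not a command from this conversation): GNU find with `-L` follows symlinks and reports filesystem cycles on stderr. The demo paths below are throwaway, not the real mirror tree.

```shell
# Demonstrate GNU find's loop detection while following symlinks (-L).
set -e
d=$(mktemp -d)
mkdir -p "$d/a"
ln -s "$d/a" "$d/a/loop"            # a/loop points back at a -> a cycle
find -L "$d" >/dev/null 2>"$d/err" || true
grep -i "loop" "$d/err"             # GNU find: "File system loop detected; ..."
rm -rf "$d"
```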
frickler | also regarding the cirros cert, it seems we might soon need to make the warning threshold for expiry configurable anyway, LE plans to offer certs with just 6d validity https://letsencrypt.org/2024/12/11/eoy-letter-2024/ | 11:08 |
frickler | clarkb: sorry about lack of feedback for the held node. I guess we'd need a fresh node anyway if we wanted to continue to debug the ansible module. but that's also not on my priority list | 11:09 |
fungi | frickler: the file paths seemed to be every deb file in the mirror | 12:56 |
fungi | and source packages for them too, e.g. | 12:58 |
fungi | pool/universe/g/golang-github-jacobsa-crypto/golang-github-jacobsa-crypto-dev_0.0~git20161111.0.293ce0c+dfsg1-7_arm64.deb | 12:58 |
fungi | pool/universe/g/golang-github-jacobsa-crypto/golang-github-jacobsa-crypto_0.0~git20161111.0.293ce0c+dfsg1-7.debian.tar.xz | 12:58 |
fungi | pool/universe/g/golang-github-jacobsa-crypto/golang-github-jacobsa-crypto_0.0~git20161111.0.293ce0c+dfsg1-7.dsc | 12:58 |
frickler | fungi: weird, sounds like the "big badaboom" approach would be the next best option to try indeed | 13:44 |
frickler | infra-root: we're down to < 200 zuul config errors now due to various cleanups and there's more pending with the eom-eol transitions. so it would be great to also be able to tackle some of the big non-openstack offenders like https://zuul.opendev.org/t/openstack/config-errors?project=starlingx%2Fzuul-jobs&severity=error&skip=0&limit=50 and x/packstack, anyone willing to help with nagging the relevant | 13:52 |
frickler | folks? | 13:52 |
fungi | frickler: looks like packstack is part of rdo, so maybe we can find someone from their crowd to get it back on track (or retire it if development has moved elsewhere) | 14:07 |
frickler | yes, my hope was that we might have enough redhat people around such that they could handle this kind of thing internally (looking at no infra root in particular ;-D) | 14:14 |
ykarel | jcapitao[m], karolinku[m] if you can check those ^ | 14:50 |
jcapitao[m] | frickler: wrt Packstack you are referring to https://zuul.opendev.org/t/openstack/config-errors?project=x%2Fpackstack&severity=error&skip=0&limit=50 ? | 14:55 |
jcapitao[m] | thanks ykarel for the ping | 14:55 |
fungi | jcapitao[m]: correct. they could be solved through eol of the affected branches or adjustments to job configs on those branches | 15:12 |
fungi | interesting that it has both a stable/yoga and unmaintained/yoga branch | 15:12 |
fungi | looks like it transitioned to unmaintained/yoga but stable/yoga never got removed | 15:13 |
jcapitao[m] | hmm those errors were already fixed | 15:13 |
fungi | in each stable branch or just master? (master isn't reporting errors) | 15:14 |
jcapitao[m] | hmm actually no I misread | 15:16 |
jcapitao[m] | lemme fix that by eol'ing most of them and fixing the active stable branches | 15:16 |
fungi | sounds great, thanks! | 15:16 |
frickler | cool, progress \o/ | 15:27 |
clarkb | Gerrit does appear to be pruning log files after all. The day offset between the two sets of logfiles persists and it seems to be keeping a couple more days than 30 (I think it is only counting compressed files, not the current and yesterday's uncompressed files) | 15:48 |
clarkb | I think that is probably good enough and we can consider that done and land the followup to remove the cron from ansible completely if others agree | 15:49 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/937278 is the change for that | 15:49 |
fungi | i already voted in favor, happy for you to self-approve it if you don't think it's likely to get any additional feedback | 15:52 |
clarkb | ack, thanks. It should be a noop at this point as the cronjob is gone | 15:57 |
clarkb | but I'll triple check that before approving | 15:57 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch mailman role to docker-compose exec https://review.opendev.org/c/opendev/system-config/+/937790 | 15:59 |
fungi | i'm disappearing for a bit to run some pre-travel errands and grab lunch, but should return in an hour or so | 16:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update Gerrit db container to use journald logging https://review.opendev.org/c/opendev/system-config/+/937791 | 16:05 |
opendevreview | Joel Capitao proposed openstack/project-config master: Authorize packstack-core to force push to remove branch https://review.opendev.org/c/openstack/project-config/+/937792 | 16:06 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Run containers on Noble with docker compose and podman https://review.opendev.org/c/opendev/system-config/+/937641 | 16:08 |
clarkb | now to see if lists and review are happy with noble docker compose and podman | 16:08 |
opendevreview | Joel Capitao proposed openstack/project-config master: Authorize packstack-core to force push to remove branch https://review.opendev.org/c/openstack/project-config/+/937792 | 16:17 |
frickler | the config error fix for shade is failing CI as miserably as I expected. if people would review it anyway, I would just go ahead and force-merge it? then we can ignore that repo again until something really serious comes up https://review.opendev.org/937788 (cc gtema) | 16:20 |
clarkb | frickler: isn't shade dead and rolled into openstacksdk? I wonder if we should just remove it from zuul instead? | 16:22 |
frickler | clarkb: I can try that, yes, though I'm not sure how many references old stable branches might still have | 16:34 |
gtema | frickler - +w-ed the change, clarkb - indeed, we could just drop zuul conf from the repo | 16:34 |
clarkb | frickler: oh I meant remove it from the zuul tenant config not cleanup the config in shade itself | 16:47 |
clarkb | though you could do both | 16:47 |
frickler | clarkb: I read that as dropping it from the zuul tenant config, too, that can still trigger issues for other repos that reference it. let me just push a change to test it | 17:00 |
clarkb | oh I see what you mean. I thought you meant needing to clean out .zuul.yaml in all stable branches for the repo | 17:01 |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: DNM: Drop openstack/shade from zuul config https://review.opendev.org/c/openstack/project-config/+/937797 | 17:02 |
frickler | I guess if we want to really proceed with ^^ we'll need to split it and also include a governance update, but let's see what zuul says first | 17:02 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Run containers on Noble with docker compose and podman https://review.opendev.org/c/opendev/system-config/+/937641 | 17:05 |
frickler | a second review on the stack at https://review.opendev.org/c/openstack/project-config/+/935696 would be nice | 17:06 |
clarkb | frickler: not sure what the commit message means there, the project moved from some other zuul tenant to the openstack tenant? | 17:08 |
clarkb | looks like we didn't move tenants we just added it to gerrit and zuul | 17:09 |
clarkb | oh there are two repos, one moved from the vexxhost to the openstack tenant and the other is a new repo, which explains my confusion | 17:10 |
opendevreview | Joel Capitao proposed openstack/project-config master: Authorize packstack-core to force push to remove branch https://review.opendev.org/c/openstack/project-config/+/937792 | 17:11 |
frickler | oh, I missed that the second governance patch is still pending, sorry :-( | 17:22 |
clarkb | it's fine, now we're ready to land things on our side with a quick recheck once governance is sorted | 17:22 |
clarkb | cool I think 937792 shows mailman and gerrit stuff working happily too. Zuul is the last big one that I've been avoiding because I know we rely on the docker exec containername process in those plays/roles quite a bit, and mechanically I know we can convert them to a compatible system but it's still a bit of work to get through | 17:38 |
clarkb | anyway at this point we've probably got enough data to discuss if we like the approach, have any more concerns we want to test through, etc. I'll make sure that is part of tomorrow's meeting agenda | 17:38 |
fungi | i'll be in the middle of crazy holiday highway traffic during the meeting, but i'm advance registering my preference for that future direction | 17:55 |
clarkb | fungi: if you have time before you're driving can you review the changes under topic:podman-prep even if it isn't a full review just check out the shape of things and call out any concerns if you have them | 18:02 |
fungi | you bet | 18:07 |
clarkb | I have confirmed that the cronjob on review02 appears to be gone. I will approve the change to remove it from ansible now | 18:12 |
fungi | thanks! | 18:14 |
clarkb | I'm dropping gerrit 3.10 stuff from the meeting agenda too as ^ was the last thing remaining related to it | 18:25 |
fungi | having heard no objections so far, i'll plan to blow away the contents of the ubuntu-ports volume and let our script bootstrap it from scratch again. at least the impact of it being stale for another ~week will likely continue to go unnoticed | 18:26 |
clarkb | fungi: in theory that won't break running jobs since the ro copy will stay as is for now? | 18:26 |
fungi | if anybody strongly disagrees with that approach and wants to try their hand at troubleshooting the present state of the mirror, feel free | 18:26 |
fungi | clarkb: correct, we'll continue serving the old (stale) state until it's done | 18:27 |
clarkb | that is my only real concern | 18:27 |
fungi | to be fair, nobody brought it up until i happened to notice it after fixing the stale state of our regular ubuntu mirror (presumably because arm64 jobs aren't as closely scrutinized) | 18:28 |
fungi | i'm just hoping to get it back to working before it becomes a job-affecting issue | 18:28 |
clarkb | ++ | 18:29 |
fungi | but this time of year it seems like we can probably afford to wait the near-week that bootstrapping and vos releasing it from zero will require | 18:29 |
clarkb | I'm also going to drop backup server pruning/purging from the agenda. I think that reached a reasonable conclusion last week (though we can continue to apply it to the other backup server when it starts to fill up) | 18:29 |
fungi | sgtm | 18:29 |
clarkb | ok those agenda edits are in. I'll send it out later today after others have a chance to chime in on any other edits | 18:37 |
fungi | i've started recursively deleting all the contents of /afs/.openstack.org/mirror/ubuntu-ports | 18:39 |
clarkb | I guess I should add a note about ubuntu-ports mirroring | 18:40 |
fungi | once it completes, i'll start another run of the usual script which should hopefully redownload everything and repopulate the databases | 18:40 |
fungi | please do, just be aware i won't be around for that discussion | 18:40 |
clarkb | yup mostly a "be aware of this situation" thing more than expecting you to chime in tomorrow | 18:41 |
fungi | that has already finished, starting the script now | 18:52 |
fungi | it's running in a root screen session on mirror-update, and i'll try to check in on it from time to time over the course of the week | 18:56 |
opendevreview | Merged opendev/system-config master: Drop Gerrit log cleanup cron from Ansible https://review.opendev.org/c/opendev/system-config/+/937278 | 19:33 |
clarkb | https://zuul.opendev.org/t/openstack/build/f05e379479794954bab4319521a221a6 zuul reports ^ deployed successfully | 19:37 |
clarkb | (it should be a noop) | 19:37 |
fungi | excellent! | 19:38 |
clarkb | looks like we're approving the prep changes. One thing to note about that is for some services we may restart automatically and others we may not | 19:44 |
fungi | yes | 19:44 |
clarkb | I'll have to take stock of that once things land and work through what doesn't (I know gerrit won't for example) | 19:45 |
fungi | seems like things are pretty quiet this week, so manual restarts are probably doable whenever is convenient | 19:45 |
clarkb | looks like gitea won't either | 19:45 |
clarkb | yup its a good time to work through things like that | 19:45 |
fungi | the sooner the better as far as i'm concerned | 19:46 |
clarkb | I think lodgeit may automatically restart | 19:46 |
fungi | mailman should as well | 19:46 |
fungi | but worth double-checking | 19:46 |
clarkb | in mailman's case we are only changing the config management checks | 19:47 |
fungi | it may not restart if images don't change (which they probably won't) | 19:47 |
clarkb | so I think those should noop | 19:47 |
fungi | agreed | 19:47 |
fungi | since we build our own mailman container images, the only unknown is if the upstream mariadb container images change i guess | 19:48 |
clarkb | ya | 19:49 |
clarkb | the main thing I think to be on the lookout for is syslog -> journald logging change and we apply that to paste, gitea, and gerrit | 19:50 |
clarkb | I think paste may be automatic but not gitea or gerrit | 19:50 |
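For reference, the kind of compose-level change being discussed looks roughly like this (an illustrative fragment, not the actual system-config template; the image name is a placeholder):

```yaml
# Switch a service's container log driver from syslog to journald (illustrative)
services:
  gitea-web:
    image: example.org/opendev/gitea:latest   # placeholder image
    logging:
      driver: journald
      options:
        tag: docker-gitea-web                 # journald driver supports a tag option
```

Because the logging driver is part of the container's configuration, a running container has to be recreated (not just reloaded) to pick up the change, which is why manual restarts come up below.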
clarkb | I can work through manual restarts of gitea later today then see where we are at before doing gerrit. That might happen tomorrow | 19:50 |
opendevreview | Merged opendev/system-config master: Refactor check for new container images https://review.opendev.org/c/opendev/system-config/+/937655 | 19:58 |
clarkb | that change is deploying to gitea09 now which should noop unless there is a new mariadb image | 20:00 |
clarkb | then later when the logging change lands I can manually restart things | 20:00 |
clarkb | I have image and container listings done on gitea09 and gitea10 that I will check when the job is done just to confirm the noop | 20:00 |
clarkb | and then I'll work on lunch | 20:01 |
clarkb | the gitea job completed and as far as I can tell we did not grab new images and did not restart services (the expected behavior) | 20:06 |
clarkb | fungi: I've just realized that the string change in https://review.opendev.org/c/opendev/system-config/+/937717/4/playbooks/roles/gitea/tasks/main.yaml may append a newline to those cron commands | 20:10 |
clarkb | in theory this isn't an issue but I'm not positive of that | 20:10 |
clarkb | so that will need checking after it lands I guess | 20:11 |
clarkb | I think worst case ansible might put a stray newline in the crontab file which is fine | 20:11 |
fungi | i'll check once it deploys | 20:11 |
clarkb | meetpad didn't get new images or restart as expected so that check seems to be working in the no-new-images case | 20:11 |
fungi | i've made a dump of the root crontab on gitea09 for comparison | 20:12 |
fungi | i'll diff it after | 20:13 |
clarkb | cool, based on other command blocks I don't think the command module is bothered by the newline at the end, so it's just the cron entry in there that I'm slightly concerned about | 20:13 |
clarkb | actually we may have examples of that elsewhere /me looks | 20:14 |
clarkb | fungi: the gitea db backup cron uses > and not >- (I think >- eats the newline) and that cron job seems to be fine. Your diff will hopefully confirm | 20:15 |
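For reference, YAML's folded block scalars differ only in trailing-newline handling (a generic illustration, not the actual system-config snippet):

```yaml
# ">" (clip) folds newlines into spaces and keeps ONE trailing newline
# ">-" (strip) folds the same way but removes the trailing newline
keep_newline: >
  docker exec -t gitea-web
  git gc --quiet
no_newline: >-
  docker exec -t gitea-web
  git gc --quiet
# keep_newline == "docker exec -t gitea-web git gc --quiet\n"
# no_newline   == "docker exec -t gitea-web git gc --quiet"
```

In a crontab a trailing newline is harmless, which matches the "worst case" assessment above; Ansible's cron module also normalizes the entry either way.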
clarkb | and now I need to make lunch | 20:15 |
opendevreview | Merged opendev/system-config master: Use docker-compose for container execs in gitea https://review.opendev.org/c/opendev/system-config/+/937717 | 20:22 |
opendevreview | Merged opendev/system-config master: Switch mailman role to docker-compose exec https://review.opendev.org/c/opendev/system-config/+/937790 | 20:22 |
opendevreview | Merged opendev/system-config master: Log with journald and not syslog in lodgeit docker compose https://review.opendev.org/c/opendev/system-config/+/937656 | 20:22 |
clarkb | fungi: I think it did what we want and chomped it in ansible somewhere | 20:31 |
clarkb | however now I notice the old command ran docker exec -t which means with a tty and the new one is docker-compose exec -T which means without a tty | 20:32 |
clarkb | I'm not sure if that matters or not for running git garbage collection | 20:32 |
clarkb | the original cron job used -t so we didn't add it later to make things happy. corvus you wrote that original cron job, not sure if you remember if that was needed or not | 20:34 |
fungi | /var/spool/cron/crontabs/root was last updated 6 minutes ago on gitea09 | 20:34 |
clarkb | we can probably run the new cronjob manually now if we don't want to wait for it to run and possibly email us angrily | 20:34 |
clarkb | fungi: huh I just did crontab -l earlier and it showed the new content then | 20:35 |
clarkb | it still shows the new content so not sure what made it update | 20:35 |
fungi | -55 16 * * */2 docker exec -t gitea-docker_gitea-web_1 find /data/git/repositories/ -maxdepth 2 -name *.git -type d -execdir git --git-dir={} gc --quiet \; | 20:35 |
fungi | +55 16 * * */2 /usr/local/bin/docker-compose -f /etc/gitea-docker/docker-compose.yaml exec -T gitea-web find /data/git/repositories/ -maxdepth 2 -name *.git -type d -execdir git --git-dir={} gc --quiet \; | 20:35 |
clarkb | yup that's what I see from the crontab -l output and I think that all looks good, except maybe we need to drop the -T if the old -t was necessary | 20:36 |
clarkb | fungi: are you maybe in a position to manually run the new cronjob in a screen and see if it works as is? | 20:36 |
fungi | sure, just a sec | 20:36 |
fungi | in progress in a root screen session on gitea09 | 20:37 |
clarkb | probably need to check $? after it returns to see if it was happy or not | 20:38 |
fungi | i planned to, yeah | 20:38 |
fungi | it seems to take a few minutes to complete | 20:38 |
opendevreview | Merged opendev/system-config master: Update gitea containers to use journald logging https://review.opendev.org/c/opendev/system-config/+/937657 | 20:38 |
clarkb | lodgeit did restart and the logging seems to still go to /var/log/containers so I think that is looking good | 20:40 |
clarkb | https://paste.opendev.org/show/br9trC10ppXbuoWBgNaW/ and I made a test paste | 20:41 |
clarkb | the giteas do not appear to have restarted for the syslog -> journald change (expected). The haproxy for opendev.org did restart (expected) | 20:47 |
clarkb | the haproxy for zuul is about to restart if it hasn't yet | 20:48 |
fungi | gc on gitea09 is still going | 20:49 |
clarkb | if you ps -elf | grep gc you can usually catch the repo it is doing | 20:50 |
clarkb | it does seem to be making progress at least | 20:50 |
clarkb | zuul lb looks happy and did restart | 20:50 |
fungi | yeah, it's on openstack/tacker so should finish soon i hope | 20:50 |
clarkb | I think the only unknowns are if git gc is happy (have to check exit code) and restarting services for gitea and gerrit to pick up the new logging config | 20:51 |
fungi | well, openstack/cinder now, so not alpha order | 20:51 |
clarkb | once gc is happy I'll work on restarting the gitea services on the 6 backends to pickup the new logging config | 20:52 |
fungi | well, i don't really know how long the git gc will take | 20:53 |
clarkb | my guess is it will be done by the time I'm done with lunch stuff | 20:54 |
clarkb | another 10 minutes or so? | 20:54 |
clarkb | fungi: but I can't stop the gitea container the gc is running in without stopping the gc | 20:54 |
clarkb | so I have to wait for gitea09 anyway. I could start on 10-14 though | 20:54 |
fungi | okay, it just finished | 20:55 |
fungi | root@gitea09:~# echo $? | 20:56 |
fungi | 0 | 20:56 |
clarkb | perfect I guess the old -t was not needed so having -T is correct | 20:56 |
fungi | yeah. lgtm | 20:56 |
clarkb | fungi: I detached from the screen and you can close it whenever you like | 20:56 |
fungi | done | 20:56 |
clarkb | fungi: you may want to double check the bits of the mailman playbook that I updated too | 20:57 |
corvus | clarkb: sorry i don't know if the "-t" was required. my feeling is: whatever works, works, and sounds like fungi has established that. :) | 20:57 |
clarkb | corvus: ++ | 20:57 |
clarkb | fungi: probably need to look at the logs on bridge for that | 20:57 |
clarkb | I need to relocate back to the office then will work on service restarts to pick up the logging change on the giteas | 20:58 |
opendevreview | Merged opendev/system-config master: Update Gerrit db container to use journald logging https://review.opendev.org/c/opendev/system-config/+/937791 | 21:00 |
clarkb | ok processing gitea14 first and will work backward through the list. I'm pulling servers out of the load balancer before I restart them too | 21:03 |
clarkb | that all looks good (logs still go to /var/log/containers and containers started without complaint as far as I can tell) I'll proceed through the list | 21:05 |
clarkb | that is done | 21:16 |
clarkb | I'm now of two minds A) go ahead and restart gerrit once 937791 applies to pick up the journald logging change (I think hourly jobs are currently running so it hasn't applied yet) or B) wait until tomorrow anyway and just check that the giteas and paste don't have anything unexpected from journald logging | 21:17 |
clarkb | actually looks like the gerrit deploy for journald logging went before the hourly jobs | 21:21 |
clarkb | double checking the disk contents confirms | 21:22 |
clarkb | I think I'm leaning towards doing the gerrit restart later | 21:22 |
clarkb | but happy for others to override me on that and just get it done | 21:23 |
clarkb | from gitea09 /dev/vda1 155G 47G 109G 30% / if we wait on gerrit we can see if that moves significantly | 21:26 |
clarkb | /dev/root 39G 8.6G 31G 23% / is paste | 21:27 |
tonyb | jcapitao[m], karolinku[m]: I'm sorry with PTO and being unwell I let the ball drop on the CentOS-10 issues. care to update me on current state and let me know what help you need from me, if any? | 22:13 |
clarkb | tonyb: my understanding is that rhel 10 (and also centos 10 stream) have decided that x86-64-v3 is the minimal level of hardware support for those distro releases | 22:15 |
clarkb | tonyb: problem is very few of our clouds (if any) currently provide hardware with those capabilities (particularly avx is an issue) | 22:16 |
clarkb | I think of our clouds maybe vexxhost, raxflex, and openmetal can provide that level of cpu capability, but they are far from a majority of available resources and their flavors may not expose those features yet, so another round of updates to those bits may be required | 22:17 |
fungi | as for openmetal, we may need to make our own nova config adjustments to expose that flag, i haven't checked | 22:17 |
clarkb | my personal take on this is that red hat has chosen poorly by getting ahead of the cloud providers and what is testable in the wild, and other distros are taking very different approaches (suse mixes in v3-capable compiled software on top of the normal distro and alma linux is doing a v2-capable rebuild alongside the default of v3) | 22:18 |
fungi | regardless, yes, the bulk of our quota comes from regions in ovh and rackspace classic, neither of which support centos 10 guests | 22:18 |
clarkb | I think that red hat should be working with cloud providers to address this and not using us as a proxy. I don't feel this is a fight we should be involved in | 22:18 |
fungi | (unless red hat backtracks on their choice of compiler options) | 22:19 |
clarkb | fungi: I found a post from alma that made it seem that was unlikely let me see if I can dig that up again | 22:19 |
clarkb | I'm not sure if rockylinux is doing anything different NeilHanlon may know | 22:19 |
tonyb | clarkb: Yeah the minimum bump and implementation thereof was flagged, and it appears ignored. | 22:20 |
fungi | it's possible red hat is assuming that once centos 10 and rhel 10 release, cloud providers will be forced to upgrade their infrastructure. in the meantime though, we can't really assist with testing it | 22:20 |
clarkb | https://almalinux.org/blog/2024-10-22-introducing-almalinux-os-kitten/ has a "AlmaLinux OS Kitten includes an additional build using x86-64-v2" section | 22:20 |
tonyb | I doubt very much that there will be a back-track. | 22:20 |
clarkb | ya I don't expect a backtrack. More that I don't want to be in the middle expected to solve these problems | 22:21 |
fungi | then hopefully red hat and the centos community collectively are prepared for it not to be usable in lots of existing places | 22:21 |
clarkb | if red hat wants to work with clouds to make their distro work in those clouds we'll take advantage of it | 22:21 |
tonyb | So assuming that we have a cloud that can support v3, we could provide a nodepool label, like we do with nested-virt, for some testing of CentOS-10, or is that not-okay | 22:22 |
clarkb | so far I've been operating under the assumption that that isn't ok | 22:23 |
clarkb | the reason is if that cloud goes away we can no longer run centos 10 at all | 22:23 |
fungi | 1. someone needs to figure out where those are, and 2. it's likely to represent a very small proportion of our available quota | 22:23 |
clarkb | whereas with nested virt if that goes away all of our platforms continue to work only specific jobs don't | 22:23 |
clarkb | and even those specific jobs may work just more slowly | 22:23 |
fungi | "support" for centos 10 testing might be on equal (or worse) footing to arm testing | 22:24 |
clarkb | that also means taking a stance that centos 10 gets to run on only our fastest resources | 22:24 |
clarkb | which I think is unfair from a general scheduling perspective | 22:24 |
tonyb | Ah I see the distinction. I admit I wasn't thinking far enough ahead | 22:24 |
clarkb | as far as determining where we can run these things we should be logging cpu flags via /proc/cpuinfo captures in every job now | 22:26 |
tonyb | I was thinking only about the DIB aspect, as in making sure that DIB would work with CentOS-10, not the general "now we have images let's run them" part | 22:26 |
clarkb | so it should be possible to grab those from jobs that run in every cloud region and see what if any of them have the required flags to support v3 | 22:26 |
fungi | i don't personally object to folks working on getting it going, but be aware that from an established public cloud perspective it seems beyond bleeding-edge | 22:26 |
fungi | which is an unusually out-of-character decision for red hat | 22:27 |
fungi | though maybe the shift in how they look at centos explains it (they probably aren't expecting users to want to boot rhel 10 on public clouds any time soon) | 22:27 |
clarkb | oh hrm I thought we merged the change to get cpuinfo in zuul-jobs but now I'm not finding that in our regular job logs /me digs more | 22:29 |
fungi | there was discussion of adding it to the zuul/zuul-jobs role that collects routes and disk utilization | 22:30 |
clarkb | yes I thought that landed | 22:30 |
tonyb | Yeah I thought the cpuinfo stuff merged | 22:31 |
clarkb | https://review.opendev.org/c/zuul/zuul-jobs/+/937376 it did land now to find the info in the logs | 22:31 |
clarkb | https://zuul.opendev.org/t/openstack/build/76b63ed3066146d69c9901ef55427e74/console#0/3/5/debian-bullseye | 22:32 |
clarkb | maybe we don't write it to the host info file but its there in ansible? | 22:32 |
clarkb | we are missing osxsave and lzcnt in ^ which was an openmetal node | 22:34 |
clarkb | apparently intel considers lzcnt part of bmi1, which we do have, but it advertises the feature separately so not sure if we actually have it or not | 22:35 |
fungi | i think there's a filter in the nova/libvirt config that has to include flags we want passed through to guests? | 22:36 |
clarkb | for adding centos 10 stream support to dib we can't even run the functests because they chroot and expect executables in the chroot to run iirc :/ | 22:36 |
fungi | so would need job nodes that are capable of running centos 10 binaries | 22:37 |
clarkb | fungi: the config is a bit more complicated than that. When using kvm (not qemu) you pick either host passthrough or a custom model. You can pick from predefined custom models or define your own in libvirt | 22:37 |
clarkb | fungi: yes, I think having any testing for centos 10 in dib requires us to have test nodes capable of running centos 10 | 22:37 |
fungi | got it, so still possible our cpus in openmetal would work and "just" need config adjustments | 22:37 |
clarkb | also those cpu models are per hypervisor / nova compute setup not a flavor thing iirc | 22:38 |
clarkb | fungi: ya or /proc/cpuinfo doesn't report lzcnt because its part of bmi1 and osxsave is under some other flag and we're good in openmetal already | 22:38 |
fungi | oh, that's a good point | 22:38 |
fungi | i don't know where to find the documentation that would confirm or refute that though | 22:39 |
fungi | and would probably resort to just trying to run something there instead and see if it works | 22:39 |
clarkb | ya problem with that is it works until you find some other piece of software relying on a feature you thought was good but didn't actually check | 22:39 |
clarkb | I wonder if there is a tool in linux to get a report of level and what is missing for other levels | 22:40 |
tonyb | Yeah okay, that matches what I thought WRT nova and what my testing in openmetal+vexxhost indicated. | 22:40 |
tonyb | clarkb: I expect there is but I don't know of one off the top of my head. | 22:41 |
clarkb | `ld.so --help` reports it | 22:43 |
clarkb | but not what is missing from unsupported levels | 22:43 |
clarkb | also we know that some cloud regions don't have consistent levels but we can probably ignore that for now if we establish a baseline | 22:44 |
tonyb | Sort of, it reports the variants it supports and which are detected, so if your libc *only* supports v3 you essentially get nothing useful there | 22:44 |
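A scripted version of that loader self-report might look like this (a sketch; glibc 2.33+ is needed for the hwcaps report, and the loader paths below are common x86-64 locations, assumed rather than taken from this conversation):

```shell
# Print which x86-64 microarchitecture levels glibc will search (glibc >= 2.33).
# Older loaders (e.g. focal's) error out instead, matching the behavior seen above.
ldso=/lib64/ld-linux-x86-64.so.2
[ -x "$ldso" ] || ldso=/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
"$ldso" --help 2>/dev/null | grep -E 'x86-64-v[234]' \
    || echo "loader does not report hwcaps (glibc < 2.33 or non-x86?)"
```

As tonyb notes, the report reflects what this glibc build knows about, so it is a hint rather than a complete hardware inventory.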
clarkb | the mirror on openmetal seems to report v3 is supported and searched | 22:48 |
clarkb | tonyb: are you saying that ld.so is reporting what it was compiled to support, not what the hardware supports? | 22:49 |
tonyb | clarkb: That's my understanding | 22:49 |
clarkb | the jammy mirror in openmetal says v4 is supported but the noble mirror in raxflex does not | 22:50 |
tonyb | Hmm okay. that's confusing :/ | 22:51 |
clarkb | the focal mirror in both vexxhost regions just errors | 22:52 |
clarkb | cannot load shared object etc | 22:52 |
clarkb | from that I do suspect that openmetal and raxflex support it | 22:55 |
tonyb | Oh I thought vexxhost did too. | 22:56 |
clarkb | ovh mirror says searched and supported | 22:58 |
clarkb | so maybe the main issue is rax and/or vexxhost? (just haven't been able to get data from vexxhost yet) | 22:58 |
clarkb | or there is plenty of variance ? | 22:58 |
tonyb | I suspect that RAX for sure has enough variance to be a problem | 23:00 |
clarkb | tonyb: the nested virt label did not work in initial testing with dib support and that includes raxflex, vexxhost, openmetal, and ovh iirc | 23:01 |
clarkb | so at least one of them doesn't work. Which may also mean maybe this method of checking is invalid | 23:01 |
tonyb | clarkb: Yeah, I thought that was for $other reasons though | 23:02 |
clarkb | you mean a problem other than v3 cpu support? | 23:02 |
clarkb | understanding why the nested virt label didn't work is probably a good starting point since that constrains the problem space a bit | 23:05 |
tonyb | Yeah I thought the job did something funky because it had nested support and the $funky failed, but I could easily be wrong | 23:05 |
tonyb | I have a small bash script that should (untested) report which flags are missing | 23:07 |
tonyb | well it's untested in that the laptop I wrote it on has all the tested flags | 23:07 |
clarkb | my local jammy fileserver reports v2 only supported and searched and not v3 | 23:13 |
clarkb | which I think is accurate for that cpu so this detecton method is at least sort of working | 23:13 |
clarkb | ya no avx on that system | 23:13 |
tonyb | Can you run: https://paste.opendev.org/show/b0Zw2AdKxbaInIvhhBLu/ on it ? | 23:14 |
tonyb | note the final flag "cve12" is my bogus flag to verify that it does fail to detect a flag | 23:15 |
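tonyb's paste isn't reproduced in the log, but a minimal sketch of that kind of check might look like the following (the function name is made up, and the flag list is abbreviated; the full x86-64-v3 psABI set also includes lzcnt and osxsave, which /proc/cpuinfo may not report directly, per the bmi1 discussion above):

```shell
# Report which flags from a required set are absent from a CPU flag list.
# Usage: check_missing "REQUIRED_FLAGS" "CPU_FLAGS"
check_missing() {
    required=$1
    have=" $2 "                        # pad so we can match " flag " exactly
    for f in $required; do
        case "$have" in
            *" $f "*) ;;               # present
            *) echo "missing: $f" ;;
        esac
    done
}

# Abbreviated x86-64-v3 requirements (assumption, not the authoritative list)
v3="avx avx2 bmi1 bmi2 f16c fma movbe"
if [ -r /proc/cpuinfo ]; then
    check_missing "$v3" "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2-)"
fi
```

The padded-spaces `case` match is the same trick discussed below for matching flags at the start and end of the list.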
clarkb | tonyb: ya give me a few (I want to understand it before running it). I also notice that qemu emulation of haswell features requires qemu 7.2 or newer | 23:17 |
clarkb | which may be a problem for the dib tests that build an image and check it (I don't know what version of qemu those currently have) | 23:17 |
tonyb | Yeah I don't know about qemu versions either | 23:18 |
clarkb | TIL here strings | 23:21 |
tonyb | clarkb: you're welcome? | 23:24 |
clarkb | tonyb: what is the flags="${flags## }" for? It seems to result in the same string at the end as the previous step for me | 23:25 |
clarkb | tonyb: also I haven't confirmed yet but I think your script may not match flags at the beginning or end of the flags string due to the requirement for spaces on either side? | 23:25 |
tonyb | "just in case the previous line left spaces at the beginning of the flags" | 23:25 |
clarkb | ah ok but I think you do want a space at either end of the list for the case matching? | 23:26 |
clarkb | oh you embed that in the case statement on both sides, nevermind | 23:26 |
tonyb | the case " ${flags} " in ensures they're there | 23:27 |
tonyb | It's probably a little stupid to do it that way but I was rushing | 23:27 |
clarkb | tonyb: local msg is unneeded. I ran it without the v4 detection and got all of v1 and v2 found but most of v3 not found | 23:30 |
clarkb | so I think it is working | 23:30 |
tonyb | Okay cool. Thanks | 23:30 |
clarkb | my cpu is from 2016 fwiw | 23:31 |
clarkb | and my not very old laptop and desktop don't support v4 either | 23:32 |
clarkb | because they are amd and like one generation too old | 23:33 |
tonyb | Okay. I might make a patch for DIB to call that to aid with debugging. | 23:41 |
clarkb | tonyb: maybe capture the qemu version too. Though I suspect we can infer that by checking the packaged version for the distro after the fact | 23:44 |
clarkb | crazy idea time: do everything on arm | 23:46 |
tonyb | LOL | 23:51 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!