*** rlandy|bbl is now known as rlandy|out | 00:45 | |
*** ykarel_ is now known as ykarel | 04:51 | |
*** ysandeep|out is now known as ysandeep|rover | 04:52 | |
*** pojadhav|afk is now known as pojadhav | 05:10 | |
*** ysandeep|rover is now known as ysandeep|rover|brb | 05:51 | |
*** ysandeep|rover|brb is now known as ysandeep|rover | 05:56 | |
*** jpena|off is now known as jpena | 07:36 | |
*** ysandeep|rover is now known as ysandeep|rover|lunch | 09:31 | |
*** ysandeep|rover|lunch is now known as ysandeep|rover | 10:09 | |
*** rlandy|out is now known as rlandy | 10:27 | |
*** dviroel|out is now known as dviroel | 11:26 | |
*** rlandy is now known as rlandy|mtg | 11:26 | |
*** ysandeep|rover is now known as ysandeep|rover|afk | 11:33 | |
*** rlandy|mtg is now known as rlandy | 12:02 | |
*** ysandeep|rover|afk is now known as ysandeep|rover | 12:26 | |
frickler | infra-root: cf. https://review.opendev.org/c/zuul/zuul/+/837852/5/doc/source/client.rst how would you currently enqueue a patch into gate, for example? the zuul-client on zuul02 doesn't know the --trigger option; it does work without it, though. do we need to do something before zuul moves on? | 12:27 |
frickler | also for reference I enqueued 842532,1 into gate to speed up unblocking devstack, is that worth a status log? | 12:30 |
fungi | frickler: i do often #status log manual actions like that, just to serve as a clear record | 12:36 |
frickler | #status log enqueued 842532,1 into gate to speed up unblocking devstack | 12:41 |
fungi | frickler: i have this in my command history on zuul02: | 12:44 |
fungi | sudo zuul-client enqueue --tenant=openstack --pipeline=check --project=openstack/placement --change=825849,1 | 12:44 |
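For the gate enqueue frickler asked about, the same command works with only the pipeline swapped; the project and change below are simply reused from fungi's example for illustration:

    sudo zuul-client enqueue --tenant=openstack --pipeline=gate --project=openstack/placement --change=825849,1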
frickler | fungi: yes, that is what I used, but it seems from the above patch that that is deprecated and zuul-admin should be used now? or maybe I read that wrong | 12:47 |
fungi | oh, i thought it was `zuul` being deprecated in favor of `zuul-admin` | 12:49 |
frickler | I just noticed a regression in gerrit: if I hit rebase and start typing a patch ID or text, it shows patches from all projects, not just from the project the patch is against. | 12:50 |
frickler | fungi: there's a note added that says: For operations related to normal workflow like enqueue, dequeue, autohold and promote, the `zuul-client` CLI should be used instead. | 12:51 |
frickler | but later on there are still examples with zuul-admin for enqueue etc. | 12:51 |
frickler | but maybe we should discuss on that patch rather than here | 12:52 |
fungi | that seems like it might be a typo | 12:52 |
fungi | yeah | 12:52 |
fungi | frickler: oh! | 12:55 |
fungi | i get it now | 12:55 |
fungi | the enqueue, dequeue, autohold and promote subcommands are being retained for now for backward compatibility, so the zuul (not zuul-client) documentation about them is being updated to indicate that you need to run zuul-admin instead of just zuul | 12:56 |
fungi | it's not saying "use `zuul-admin enqueue` instead of `zuul-client enqueue`" but rather "...instead of `zuul enqueue`" | 12:57 |
frickler | fungi: ah, o.k., then I really read this the wrong way around and we should be fine with what we are doing | 13:02 |
fungi | i added a recommendation in a review comment to hopefully reduce that point of confusion | 13:03 |
fungi | if you look at the change, it will print a deprecation note "Warning: this command is deprecated with zuul-admin, please use `zuul-client` instead" | 13:05 |
frickler | yes, maybe then the docs should also be updated to not show deprecated examples, I'll add a comment with that on the patch | 13:06 |
fungi | that's basically what my comment suggested | 13:06 |
fungi | er, well i suggested each subcommand's documentation entry mention it's deprecated, but you're right we probably should also drop the examples from those entries | 13:07 |
fungi | i could go either way on keeping examples for deprecated options until they're actually removed | 13:08 |
fungi | as long as we make it clear they're deprecated | 13:08 |
frickler | I suggested grouping them into a "Deprecated" section to make it more obvious | 13:10 |
fungi | oh, yep that could also work | 13:10 |
fungi | looks like the ovh-bhs1 mirror has broken package updates, which has in turn broken our base deploy job, which is the reason ssl certs aren't getting updated | 13:17 |
fungi | i'll get it squared up | 13:17 |
fungi | huh, it wants openafs-build-deps | 13:19 |
fungi | which is going to drag in a slew of other packages | 13:19 |
fungi | specifically for openafs-build-deps 1.8.8.1-2~ppa0~bionic | 13:21 |
fungi | the gra1 mirror is also running bionic and has that version installed with no problem | 13:22 |
fungi | oh, no it doesn't have openafs-build-deps installed, just the dkms package for the openafs lkm | 13:23 |
frickler | fungi: openafs-build-deps shouldn't be installed on mirrors, should they? | 13:24 |
fungi | nope! | 13:24 |
fungi | not sure when/why it was installed there | 13:25 |
fungi | #status log Purged the unneeded openafs-build-deps package from mirror01.bhs1.ovh.opendev.org in order to unblock our base deploy job | 13:25 |
fungi | cleaning it up seems to have solved the problem | 13:25 |
frickler | hmm, doesn't show up in any of the apt logs, so must have been in place for a very long time. or installed manually with dpkg? | 13:27 |
fungi | probably not via dpkg -i since that's a virtual package | 13:27 |
TheJulia | I just observed a bunch of jobs hit this within the last few minutes: E: Failed to fetch https://mirror.bhs1.ovh.opendev.org/ubuntu/dists/focal/universe/binary-amd64/Packages 403 Forbidden [IP: 158.69.73.218 443] | 13:45 |
TheJulia | I don't know if I should expect things to be working or not at the moment | 13:46 |
abhishekk | clarkb,hi, around? | 13:56 |
abhishekk | can anyone help me to get this issue resolved, https://review.opendev.org/c/openstack/glance/+/842400 | 13:57 |
abhishekk | We are stuck and not able to merge anything due to this error | 13:57 |
abhishekk | 2022-05-19 08:31:55.100787 | ubuntu-bionic | The conflict is caused by: | 13:57 |
abhishekk | 2022-05-19 08:31:55.100796 | ubuntu-bionic | The user requested glance-store>=2.3.0 | 13:57 |
abhishekk | 2022-05-19 08:31:55.100804 | ubuntu-bionic | The user requested (constraint) glance-store===4.0.0 | 13:57 |
rosmaita | if we hack upper-constraints to glance-store===3.0.0 , we can build the tox py36 environment locally | 13:58 |
rosmaita | and we tried as a short term thing to put glance-store>=2.3.0,<4.0.0 in requirements.txt, but that just gives us | 13:59 |
rosmaita | ERROR: Cannot install glance-store<4.0.0 and >=2.3.0 because these package versions have conflicting dependencies. | 13:59 |
rosmaita | The conflict is caused by: | 13:59 |
rosmaita | The user requested glance-store<4.0.0 and >=2.3.0 | 13:59 |
rosmaita | The user requested (constraint) glance-store===4.0.0 | 13:59 |
Clark[m] | fungi: is The Julia's error related to your mirror surgery? | 14:00 |
Clark[m] | abhishekk: rosmaita: I'm not sure how we would help with that. Seems like constraints and your requirements are in conflict so you need to change one or the other | 14:01 |
rosmaita | Clark[m]: i misread that as "minor surgery" and was wondering what happened to fungi | 14:01 |
Clark[m] | It is the server you need to worry about :) | 14:02 |
Clark[m] | Anyway, I'm not really here yet and need to do a school run but can help in an hour or so | 14:02 |
abhishekk | Clark[m], if we do change requirement in glance to >= 4.0.0 then also it is failing with same error | 14:03 |
abhishekk | The conflict is caused by: | 14:03 |
abhishekk | The user requested glance-store>=4.0.0 | 14:03 |
abhishekk | The user requested (constraint) glance-store===4.0.0 | 14:03 |
Clark[m] | abhishekk: because 4.0.0 requires python3.8 or newer? | 14:05 |
abhishekk | yes, it does not support py36 and py37 | 14:06 |
Clark[m] | That is still a requirements and constraints conflict. You'll need to use different constraints under python 3.6 likely | 14:06 |
abhishekk | ack | 14:08 |
abhishekk | any example for the same? | 14:08 |
rosmaita | i think the setup.cfg in 4.0.0 tag still says 3.6 | 14:09 |
rosmaita | nope was looking at the wrong branch | 14:10 |
abhishekk | https://pypi.org/project/glance-store/ different here | 14:10 |
fungi | Clark[m]: TheJulia: oh, yes i didn't think about it but the openafs lkm may have been unloaded on the ovh-bhs1 mirror while it was being upgraded | 14:16 |
Clark[m] | fungi it looks to still be broken if you load the root http dir | 14:17 |
fungi | looks like it may still be that way. rebooting the mirror server now | 14:17 |
fungi | lsmod didn't show the module resident at all | 14:18 |
TheJulia | sweet | 14:18 |
fungi | it also must have decided to run a filesystem check, or is otherwise timing out trying to load openafs now | 14:21 |
fungi | A start job is running for OpenAFS client (5min 16s / 8min 3s) | 14:23 |
fungi | i want to say we've seen this before with the ovh mirrors | 14:23 |
fungi | i can't remember if forcing the dkms rebuild was necessary, or if it was just a boot-time race and rebooting usually solved it | 14:24 |
fungi | it did eventually boot and seems afs is working now | 14:26 |
fungi | #status log Distribution package mirrors on mirror01.bhs1.ovh.opendev.org were unavailable 13:25-14:25 UTC due to a package upgrade not removing and not reloading the openafs kernel module; related job errors can be safely rechecked | 14:27 |
fungi | TheJulia: https://mirror.bhs1.ovh.opendev.org/ubuntu/dists/focal/universe/binary-amd64/Packages is returning content again. thanks for the heads up! | 14:27 |
*** pdeore is now known as pdeore|afk | 15:03 | |
TheJulia | fungi: thanks! | 15:04 |
clarkb | fungi: I think it is just slow with iops? it is the arm64 mirrors that have similar trouble | 15:15 |
fungi | ahh, maybe related to the afs cache volume then | 15:16 |
clarkb | abhishekk: rosmaita: ok I'm at a proper keyboard now. The first thing is to determine why you are testing with python3.6 in the first place. Is this on master? if so hasn't openstack dropped master python3.6 support? | 15:16 |
clarkb | fungi: ya I think it prunes it or verifies it or something and that takes time | 15:16 |
fungi | oh, right. we've blown away the cache before rebooting in the past when needed to avoid that | 15:16 |
rosmaita | clarkb: thanks, i think we have it sorted | 15:17 |
clarkb | abhishekk: rosmaita: if you are still testing with 3.6 because master glance-store is expected to work with stable releases of openstack then you need to do something like carrying constraint override files or convincing requirements to carry special 3.6 rules | 15:17 |
rosmaita | apparently the zed template change got stuck in the gate | 15:17 |
clarkb | PBR has examples of the special constraint override files iirc because it installs stuff back to python2.7 | 15:17 |
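A rough sketch of that override-file approach, using a hypothetical py36-constraints.txt and tox environment name (the real PBR and glance layouts will differ):

    [testenv:py36]
    # py36-constraints.txt is a hypothetical override file that would pin
    # glance-store to a release (e.g. ===3.0.0) still compatible with python 3.6
    deps =
      -c{toxinidir}/py36-constraints.txt
      -r{toxinidir}/requirements.txt
      -r{toxinidir}/test-requirements.txt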
abhishekk | yep :/ | 15:17 |
rosmaita | clarkb: there is some kind of weird dependency on a job not defined in the glance repo that was breaking things, i don't understand it really, but abhishekk has a handle on it | 15:19 |
rosmaita | "things" being preventing the zed template merge | 15:19 |
abhishekk | ++ | 15:20 |
*** dviroel is now known as dviroel|lunch | 15:26 | |
*** ysandeep|rover is now known as ysandeep|out | 15:29 | |
clarkb | johnsom: I can see an argument to put it in the zuul job. Devstack probably doesn't need to know when its system is appropriately booted; all it cares about is that the user has triggered it, and the user should be sure that the system is ready | 15:50 |
clarkb | johnsom: that might still be a change to the devstack repo, but in the zuul playbooks or roles that trigger stack.sh | 15:50 |
fungi | yeah, conversely, the fips setup role shouldn't need to care that dns resolution works | 15:51 |
fungi | and it's unclear what or when something in zuul-jobs can generically assure the system is "fully booted" | 15:52 |
johnsom | I am flexible. I just liked the fact that devstack would stop and give a direct error when DNS was broken instead of the current behavior where it runs a while and complains of missing packages. | 15:52 |
clarkb | right I'm beginning to think the best place for the check is in the devstack ansible stuff that runs stack.sh. Before running stack.sh you can do whatever system readiness checks are appropriate for devstack | 15:52 |
johnsom | Yeah, I looked briefly at systemd to get status, all of my systems reported "degraded" even though they are booted ok. | 15:52 |
clarkb | well I think you can do the same wait-until-nslookup-returns-a-result check | 15:53 |
clarkb | but move it into the ansible running stack.sh and devstack proper can assume the user is triggering it on a ready system | 15:53 |
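A minimal form of that pre-stack.sh readiness check, assuming opendev.org as the probe name and an arbitrary timeout:

    timeout 120 bash -c 'until nslookup opendev.org >/dev/null 2>&1; do sleep 2; done'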
fungi | though if systemd deferred allowing logins until unbound was actually started up, this would likely be a non-issue | 15:54 |
fungi | to ianw's point | 15:55 |
clarkb | ya but then you have to modify every third party ci setup's images | 15:55 |
clarkb | and systemd is designed to have these problems somewhat intentionally aiui | 15:56 |
clarkb | it wants to give you access to the system as early as possible so that you can decide if you are ready to do additional work (to reduce total system startup cost) | 15:56 |
fungi | something else i liked about sysvinit | 15:56 |
clarkb | ya I think for a laptop it makes a lot of sense. For servers consistency and stability are desirable and worth a few seconds of startup cost | 15:57 |
fungi | agreed. it seems like systemd optimized for portable devices, at the expense of adding instability and vague lack of startup assurances for servers | 15:59 |
frickler | fungi: ade_lee: taking the reboot issue here, because there is a deeper question hidden I think: should a consumer be able to expect a CI node to be working properly after a reboot. if the answer is "Yes, we as opendev want to support this", then we likely need to set up things like unbound in place and have tests for the images we build that ensure that this works | 16:00 |
frickler | if not, maybe the fips job setup needs a different approach | 16:00 |
fungi | granted, the behavior with afs on the mirror servers is a clear indication that it does block on some things. openssh wouldn't allow me to log into the mirror until afs had started | 16:00 |
clarkb | my perspective on that is if you reboot in your job then you assume all responsibility for making the node happy | 16:00 |
johnsom | There is a nss-lookup.target that in theory should mean the system can accept queries, but I don't know how reliable that is | 16:00 |
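In theory that could be wired up with a systemd drop-in along these lines, assuming the local resolver's unit orders itself Before=nss-lookup.target; this is an untested sketch using the Debian/Ubuntu ssh.service name:

    # /etc/systemd/system/ssh.service.d/wait-for-dns.conf
    [Unit]
    Wants=nss-lookup.target
    After=nss-lookup.target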
clarkb | we already do ensure the node is ready for you when we hand you the node | 16:00 |
fungi | should we rerun the validate-host role after reboots? | 16:01 |
clarkb | rebooting throws a huge wrench in things. It's powerful that you are able to reboot at all (jenkins couldn't do it), but it's also something that jobs need to accept can be problematic and deal with | 16:01 |
clarkb | fungi: that may be an option | 16:01 |
fungi | granted, validate-host doesn't wait for the server to be booted, it just discards the build if the server isn't ready for the things it checks | 16:02 |
clarkb | a "wait for system to settle after reboot" role may be reasonable to add to zuul-jobs | 16:03 |
clarkb | then add that to the fips jobs after the reboot | 16:03 |
clarkb | a wait for network to be up (checked via ssh connectivity?), restart zuul console logger, and validate dns resolution are the three steps I can think of off the top of my head | 16:04 |
johnsom | The bonus with the switch to ansible is you would have access to the mirror FQDN | 16:06 |
ade_lee | sounds like a reasonable plan | 16:08 |
ade_lee | where would the ssh check connect to? | 16:09 |
ade_lee | and whats the ansible parameter that specifies the mirror FQDN? | 16:10 |
clarkb | ade_lee: zuul-executor ansible being able to connect to the node that rebooted post reboot | 16:10 |
clarkb | I'm sure you're already doing that in a wait_for or something. I'm just suggesting we can collect these common post reboot actions into a single role | 16:11 |
*** marios is now known as marios|out | 16:14 | |
ade_lee | clarkb, ack. | 16:18 |
ade_lee | johnsom, clarkb , fungi frickler -- If we agree on this approach, I can start putting such a role together. | 16:18 |
ade_lee | something that does a wait_for, restarts the zuul console, and checks dns by resolving opendev.org | 16:19 |
johnsom | ade_lee zuul_site_mirror_fqdn | 16:20 |
ade_lee | johnsom, sorry yes ^^ that one :) | 16:20 |
fungi | to make the role generic, opendev.org should be at most a default for some rolevar so other sites can supply a record they expect to be resolvable by their nodes | 16:23 |
clarkb | or make it a required value | 16:24 |
fungi | yeah, ideally we'd ask it to resolve $zuul_site_mirror_fqdn in our deployment, but other sites may want to supply a different record to test | 16:25 |
*** dviroel|lunch is now known as dviroel | 16:25 | |
ade_lee | ok - so resolve $zuul_site_mirror_fqdn if set, else some rolevar which defaults to opendev.org ? | 16:26 |
clarkb | ade_lee: I think I would make it a required var input. Maybe suggest that you can use $zuul_site_mirror_fqdn if using mirrors | 16:27 |
clarkb | then people using the role can decide if google.com is more appropriate | 16:27 |
ade_lee | ok | 16:28 |
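A very rough sketch of such a role's tasks, with an illustrative required variable name; the console step would reuse the existing start-zuul-console role rather than anything new:

    # tasks/main.yaml (sketch)
    - name: Wait for the node to come back after the reboot
      wait_for_connection:
        timeout: 600

    - name: Wait for DNS resolution to work
      command: "nslookup {{ post_reboot_check_hostname }}"
      register: dns_check
      until: dns_check.rc == 0
      retries: 30
      delay: 10

    - name: Restart the zuul console streamer
      include_role:
        name: start-zuul-console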
*** jpena is now known as jpena|off | 16:31 | |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Re-sync test-mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/842572 | 16:34 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/842573 | 16:34 |
clarkb | corvus (and the rest of infra-root) only our openstack tenant uses the deprecated zuul queue syntax according to that script as of the last 10 minutes or so | 16:34 |
clarkb | I'm working on an email to the openstack-discuss list now. There are a number of projects so I'm going to do my best to catch the attention of those that need it | 16:35 |
fungi | thanks for checking it! | 16:35 |
johnsom | I just posted patches for octavia and designate repos. | 16:35 |
fungi | awesome! | 16:36 |
fungi | don't forget your stable branches, if you also set it there | 16:36 |
johnsom | Yeah, that will be some work/time to get through. | 16:36 |
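For reference, the shape of the queue syntax migration being patched here, with placeholder names: the deprecated form declares the queue per pipeline inside the project stanza, while the current form declares it once at the project level (optionally alongside a top-level queue definition):

    # deprecated
    - project:
        gate:
          queue: my-shared-queue

    # current
    - queue:
        name: my-shared-queue
    - project:
        queue: my-shared-queue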
corvus | clarkb: re projects that are unmaintained -- maybe consider dropping them from the zuul config if they don't clean up errors after a certain time? | 17:00 |
corvus | they can always be added back later easily enough | 17:00 |
clarkb | ya I think that is a reasonable approach to take | 17:00 |
fungi | i concur | 17:07 |
clarkb | ok email sent | 17:10 |
clarkb | fungi: it is just over the size limit if you can moderate it through | 17:11 |
fungi | gladly | 17:12 |
clarkb | (I attached a file with all the branch and file info for each project which did that) | 17:12 |
fungi | i discarded your message and approved the ones for hydraulics investment opportunities and shipping notices | 17:12 |
fungi | (just kidding, it was the other way around) | 17:13 |
clarkb | I'm always open to good investment opportunities | 17:14 |
*** timburke__ is now known as timburke | 17:28 | |
opendevreview | Merged zuul/zuul-jobs master: Re-sync test-mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/842572 | 18:01 |
opendevreview | Merged zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/842573 | 18:03 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Make test-prepare-workspace-git role https://review.opendev.org/c/zuul/zuul-jobs/+/842598 | 18:11 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Make test-prepare-workspace-git role https://review.opendev.org/c/zuul/zuul-jobs/+/842598 | 18:40 |
opendevreview | Merged zuul/zuul-jobs master: Make test-prepare-workspace-git role https://review.opendev.org/c/zuul/zuul-jobs/+/842598 | 19:29 |
opendevreview | James E. Blair proposed opendev/base-jobs master: Switch base-test to test-prepare-workspace-git https://review.opendev.org/c/opendev/base-jobs/+/842615 | 20:12 |
corvus | infra-root: ^ a base-test change to prepare us for ansible 5 | 20:12 |
*** dviroel is now known as dviroel|out | 20:45 | |
clarkb | I'm trying to push a few of these queue update changes to projects that are likely abandoned (but figure it makes it easy for them to address, and we can then remove them from the zuul projects list if they don't) and am discovering they don't even have valid .gitreview configs on their branches, ugh | 20:59 |
clarkb | I tried to push to stable/xena and it pushed a second patchset to my master change. Trying to push to ussuri-test made a second stable/ussuri change and overriding the branch doesn't seem to do anything due to some .gitreview config they have | 21:00 |
clarkb | https://review.opendev.org/q/topic:fix-queue-config that was fun | 21:11 |
fungi | yeah, i just make a point of always telling git-review what branch to target when doing that sort of thing, for exactly that reason | 21:58 |
clarkb | learned my lesson | 22:22 |
fungi | make no assumptions | 22:24 |
corvus | fungi: got a sec for https://review.opendev.org/842615 ? | 22:24 |
fungi | lookin' | 22:24 |
corvus | would like to keep the base-test cycle moving | 22:24 |
fungi | me too, thanks! | 22:25 |
opendevreview | Merged opendev/base-jobs master: Switch base-test to test-prepare-workspace-git https://review.opendev.org/c/opendev/base-jobs/+/842615 | 22:29 |
opendevreview | Merged openstack/project-config master: update generate constraints to py38,39 https://review.opendev.org/c/openstack/project-config/+/837815 | 22:35 |
ianw | ok, sorry i got distracted yesterday but i've parsed https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f46993d83ff4abb310ef7b4beced56ba96f0d9d now | 23:43 |
clarkb | I was distracted yesterday too :) | 23:43 |
ianw | spec_store_bypass_disable and spectre_v2_user can both be set to "seccomp" or "prctl" | 23:44 |
ianw | if it's seccomp, every seccomp() enabled thing will try to enable these flags for the process. if it's prctl, it becomes an opt-in thing userland needs to set explicitly | 23:45 |
ianw | 0x4 is ssbd from previous investigation. so it is presumably spec_store_bypass_disable that is causing the problems | 23:46 |
ianw | i just need to rejig my test machine back to the standard kernel, but i'll try booting that with spec_store_bypass_disable=prctl and i expect the flood of messages goes away | 23:48 |
ianw | oh, and that change modified the default kernel to turn it to prctl, because, as the changelog goes into, they're basically unhelpful | 23:49 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 2.8 https://review.opendev.org/c/zuul/zuul-jobs/+/842647 | 23:50 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 2.9 https://review.opendev.org/c/zuul/zuul-jobs/+/842648 | 23:50 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 5 https://review.opendev.org/c/zuul/zuul-jobs/+/842649 | 23:50 |
ianw | so hopefully we can get an argument to someone upstream that jammy kernels should do the same and it can be backported. however that doesn't solve the immediate issue of jammy nodes having hundreds of megabytes of logs on OVH | 23:51 |
clarkb | is that something that can be set via sysfs on boot? | 23:51 |
clarkb | or maybe via a kernel flag? | 23:52 |
clarkb | if so we could make dib element modify that? | 23:52 |
ianw | hrm, yes i wonder if sysctl works dynamically. i'm just reinstalling kernels and can test | 23:54 |
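If it does end up needing to be a boot-time flag, one way a DIB element could persist it on Ubuntu is a grub defaults snippet; treat this as an untested sketch:

    cat > /etc/default/grub.d/99-ssbd-prctl.cfg <<'EOF'
    GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT spec_store_bypass_disable=prctl"
    EOF
    update-grub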