Wednesday, 2020-04-15

clarkb	mordred: ok I think that is really close but some of the puppet stuff still needs updating; comments on the change	00:09
mordred	clarkb: responded	00:11
mordred	clarkb: and no - those are remote paths	00:11
clarkb	oh I'm going to need to melt my brain again I guess	00:12
mordred	clarkb: (I had to check myself)	00:12
mordred	clarkb: I actually think we should completely rework the puppet tests to be based on remote_puppet_else	00:12
clarkb	mordred: mgmt_ is bridge? and not mgmt_ is remote?	00:12
mordred	yup	00:13
clarkb	mordred: ok so the way this would work is we just copy from /home/zuul/etc into /opt/system-config/production on the remote and nothing else changes?	00:13
clarkb	I guess that simplifies things for making changes on bridge	00:13
mordred	like - I think it would be nice to get rid of the current puppet jobs completely - make per-service jobs that are essentially "run remote-puppet-else but with only host X" - then we'll be set for each service we transition	00:13
mordred	clarkb: yah	00:13
clarkb	mordred: ++ on the job idea	00:13
mordred	clarkb: because also we need those legacy puppet jobs to die anyway	00:14
mordred	clarkb: I mean - really - we could start making service-foo playbooks for everything too - just with roles: - puppet in them	00:15
mordred	and completely get rid of else	00:15
mordred	corvus: if you have a sec for a re-review of the first patch in the stack: https://review.opendev.org/#/c/719186 - I can land those when I'm watching in the morning	00:17
clarkb	similarly if someone is willing to review those docker-compose upgrade changes I'm happy to babysit those tomorrow as they go in (assuming I get a second +2)	00:18
mordred	infra-root: that's https://review.opendev.org/#/c/720030/ and https://review.opendev.org/#/c/719589/ ^^	00:28
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: Document output variables https://review.opendev.org/719704	00:48
openstackgerrit	Merged zuul/zuul-jobs master: ensure-pip: Add role https://review.opendev.org/717639	01:09
openstackgerrit	Merged opendev/system-config master: Write out db config for root user https://review.opendev.org/719192	01:11
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: Python roles: misc doc updates https://review.opendev.org/720111	01:27
openstackgerrit	Merged openstack/project-config master: Move suse builds to nb04, drop pip-and-virtualenv https://review.opendev.org/718299	01:45
*** ysandeep|away is now known as ysandeep|rover		02:11
openstackgerrit	Ian Wienand proposed openstack/project-config master: AFS Grafana : add mirror release timers https://review.opendev.org/720122	03:11
openstackgerrit	Ian Wienand proposed openstack/project-config master: AFS Grafana : add mirror release timers https://review.opendev.org/720122	03:45
ianw	dirk / clarkb: so suse has built on nb04 now	03:48
ianw	i'd like to, and will be available to, push on anything needed to get things working without pip-and-virtualenv. as i've said, i think the ensure-pip stack is ready	03:49
*** DSpider has joined #opendev		03:51
ianw	cmurphy: ^ might also affect as i saw some things fly by about certs	03:53
cmurphy	ianw: ooh good to know, a new image might help me avoid needing https://review.opendev.org/720053	03:58
ianw	ahh yeah that was what i was thinking of. i'm not going to make a prediction, but maybe? :)	03:59
openstackgerrit	Merged openstack/project-config master: AFS Grafana : add mirror release timers https://review.opendev.org/720122	04:03
*** ysandeep|rover is now known as ysandeep|BRB		04:08
*** ysandeep|BRB is now known as ysandeep|rover		04:23
*** ykarel|away is now known as ykarel		04:25
openstackgerrit	Merged openstack/project-config master: Revert "Revert "Introduce job for granular GitHub mirroring"" https://review.opendev.org/719047	05:30
AJaeger	ianw: reviewed the stack and gave my +2s, I did not approve - wanted you to do the honours yourself when you're around. Thanks!	05:41
*** roman_g has joined #opendev		05:42
ianw	AJaeger: ok, thanks, i'll do that in the morning then to avoid pushing anything before i disappear :)	05:43
AJaeger	ianw: enjoy your evening ;)	05:46
openstackgerrit	OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/720126	06:03
*** roman_g has quit IRC		06:15
prometheanfire	should glean be updated (tox wise) to py36/38?	06:19
*** hashar has joined #opendev		06:26
*** hashar has quit IRC		06:42
*** dpawlik has joined #opendev		06:49
AJaeger	ianw, cmurphy, dirk, keystone is now failing openSUSE tests, see https://review.opendev.org/715688	06:58
AJaeger	RETRY_LIMIT - and no log files ;(	06:59
openstackgerrit	Matthew Thode proposed opendev/glean master: write one resolv config https://review.opendev.org/717339	07:00
*** roman_g has joined #opendev		07:00
prometheanfire	ok, that passes tests locally ^	07:00
*** lpetrut has joined #opendev		07:07
*** hashar has joined #opendev		07:07
openstackgerrit	Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/720126	07:10
*** ralonsoh has joined #opendev		07:14
ianw	AJaeger: ok ... hrm that seems before anything i'd even expect to have changed wrt pip-and-virtualenv	07:14
ianw	AJaeger, dirk, cmurphy: this seems to be the relevant bit -> http://paste.openstack.org/show/792138/	07:18
ianw	"msg": "Data could not be sent to remote host \"149.202.187.58\". Make sure this host can be reached over ssh: Permission denied	07:18
ianw	# cat /etc/dib-builddate.txt	07:21
ianw	2020-04-15 04:38	07:21
ianw	i'm logged into an opensuse host that was built today though ...	07:22
ianw	opensuse-15-rax-dfw-0015944531	07:22
AJaeger	so, you can login but Zuul cannot?	07:23
ianw	hrm, maybe?	07:23
*** tosky has joined #opendev		07:23
ianw	zuul@opensuse-15-inap-mtl01-0015944709:~> cat .ssh/authorized_keys	07:25
ianw	/var/lib/nodepool/.ssh/id_rsa.pub	07:25
ianw	that .. does not look right? like it's a file and not the actual public key?	07:26
ianw	2020-04-15 02:02:16.244 | + /opt/dib_tmp/dib_build.ujqmkwxc/hooks/extra-data.d/60-zuul-user:main:16 : echo /var/lib/nodepool/.ssh/id_rsa.pub	07:26
*** rpittau|afk is now known as rpittau		07:28
AJaeger	shouldn't that be cat?	07:28
ianw	i think so, but ...	07:30
openstackgerrit	Ian Wienand proposed openstack/project-config master: Add ZUUL_USER_SSH_PUBLIC_KEY to opensuse-15 image https://review.opendev.org/720136	07:30
ianw	AJaeger: ^ that should fix it. i'll have to think about the echo/cat thing	07:30
ianw	if we want to merge that, i can come back and kick off a build soon, or maybe frickler could babysit it if around?	07:31
ianw	this is exactly why i did the abstract job/inheritance thing in nodepool config, so wouldn't forget stuff like this. still have to get back to convert the file	07:32
AJaeger	ianw: thanks, approved	07:32
ianw	i feel like opendev-prod-hourly might be stuck	07:43
*** ysandeep|rover is now known as ysandeep|lunch		07:48
openstackgerrit	Merged openstack/project-config master: Add ZUUL_USER_SSH_PUBLIC_KEY to opensuse-15 image https://review.opendev.org/720136	07:52
ianw	AJaeger: ok, i pulled that manually and triggered a build	07:54
*** ykarel is now known as ykarel|lunch		07:54
ianw	https://nb04.opendev.org/opensuse-15-0000086893.log <- this one	07:55
*** hashar has quit IRC		08:02
openstackgerrit	Sorin Sbarnea proposed opendev/gerritlib master: Switch to ensure-docker role https://review.opendev.org/720145	08:14
*** ykarel|lunch is now known as ykarel		08:37
*** ysandeep|lunch is now known as ysandeep|rover		08:40
openstackgerrit	Sorin Sbarnea proposed zuul/zuul-jobs master: Improve 404 error message on download-logs.sh https://review.opendev.org/720035	08:52
openstackgerrit	Merged opendev/irc-meetings master: Update OpenDev meeting location and name https://review.opendev.org/720060	08:58
openstackgerrit	Roman Gorshunov proposed openstack/project-config master: Retire airship-in-a-bottle https://review.opendev.org/720160	09:03
openstackgerrit	Roman Gorshunov proposed openstack/project-config master: Retire airship-in-a-bottle https://review.opendev.org/720160	09:04
*** hashar has joined #opendev		09:14
*** roman_g has quit IRC		09:25
openstackgerrit	Marcin Juszkiewicz proposed openstack/project-config master: Add CentOS 8 AArch64 nodes https://review.opendev.org/720167	09:52
openstackgerrit	Merged zuul/zuul-jobs master: Support ssh-enabled windows hosts in add-build-sshkey https://review.opendev.org/653712	10:00
*** rpittau is now known as rpittau|bbl		10:23
ttx	Test GitHub replication on release-test repository: http://zuul.openstack.org/build/96b02fef3f6345ed89f2f44283d49022/log/job-output.txt	10:33
AJaeger	\o/	10:36
ttx	fungi, corvus, mnaser: please review ^ -- I'm wondering about all those deleted references and created branches	10:36
ttx	I mean those branches definitely correspond to the opendev repo... just wondering why they weren't already up	10:37
ttx	(maybe it's just a log artifact)	10:37
ttx	Like... That list of deleted refs could be quite long in a more active repo	10:38
* ttx is tempted to queue a second test		10:39
AJaeger	yeah, interesting to see	10:40
ttx	ok, sending a new one in	10:40
openstackgerrit	Merged zuul/zuul-jobs master: Improve 404 error message on download-logs.sh https://review.opendev.org/720035	10:46
*** ysandeep|rover is now known as ysandeep|coffee		11:00
ttx	http://zuul.openstack.org/build/8915e9ae33494257b1fb4928c16ec215/log/job-output.txt only has the additional change mentioned	11:03
ttx	so yeah I fear that for large projects we may end up deleting thousands of references, which might or might not be costly	11:05
*** ysandeep|coffee is now known as ysandeep|rover		11:17
openstackgerrit	Merged openstack/project-config master: Add devstack-plugin-ceph notifications to manila channel https://review.opendev.org/720097	11:54
AJaeger	ttx, are those changes all on github? Did you double check?	12:03
ttx	They are, but then since Gerrit-wide replication was not turned off, that does not mean much	12:05
ttx	AJaeger: oh, you mean the refs?	12:05
AJaeger	yes	12:05
ttx	let me do a recent clone	12:05
ttx	AJaeger: on a fresh clone there aren't any refs on GitHub other than refs/remotes/origin/HEAD and refs/heads/master (+ branches)	12:10
ttx	no refs/changes	12:10
*** factor has joined #opendev		12:16
Eighth_Doctor	hey, is it normal that a repo like openstack/nova would have 182116 refs?	12:18
Eighth_Doctor	there's this refs/changes thing and refs/users thing...	12:19
ttx	Eighth_Doctor: where did you clone from?	12:29
Eighth_Doctor	https://opendev.org/openstack/nova	12:30
Eighth_Doctor	I did a `git clone --mirror`	12:30
ttx	from opendev or github?	12:30
ttx	(or both)	12:30
Eighth_Doctor	opendev	12:31
Eighth_Doctor	I only pulled from opendev	12:35
ttx	I suspect opendev has a full mirror of the Gerrit repo, which keeps all the refs/changes	12:38
ttx	while the new job pushes a mirror of a clone, so it does get rid of refs/changes in the process	12:39
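[editor's annotation] The difference ttx describes (a full Gerrit mirror carries refs/changes, a plain clone does not) can be checked by counting refs per namespace. The following is only a sketch: the sample ref names are hypothetical, and in a real mirror the names could instead be taken from the output of git for-each-ref --format='%(refname)'.

```python
from collections import Counter


def count_ref_namespaces(ref_names):
    """Count refs by their first two path components, e.g. 'refs/changes'."""
    counts = Counter()
    for name in ref_names:
        parts = name.strip().split("/")
        # Names such as 'HEAD' have no namespace and are skipped.
        if len(parts) >= 2:
            counts["/".join(parts[:2])] += 1
    return counts


# Hypothetical sample of ref names, for illustration only.
sample = [
    "refs/heads/master",
    "refs/heads/stable/train",
    "refs/tags/21.0.0",
    "refs/changes/86/719186/9",
    "refs/changes/86/719186/8",
    "refs/notes/review",
]

for namespace, total in sorted(count_ref_namespaces(sample).items()):
    print(namespace, total)
```

On a repository like the one Eighth_Doctor mirrored, almost all of the total would be expected to land under refs/changes.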
Eighth_Doctor	ttx, well, it certainly exposed some interesting things about doing a full mirror from there to stg.pagure.io	12:41
Eighth_Doctor	also, wow, `git reflog` does not like this repo on my computer :/	12:41
Eighth_Doctor	I was taking a look at it due to a convo I had with mordred, clarkb, and fungi about using pagure as the source code browser frontend for opendev.org instead of gitea	12:44
Eighth_Doctor	processing all those refs at once was a bit painful on the machine that stg.pagure.io runs on...	12:48
Eighth_Doctor	but at least now it's there: https://stg.pagure.io/openstack/nova	12:48
Eighth_Doctor	this is probably going to turn into a good test case, actually, since I hadn't encountered a repo like this before	12:48
*** rpittau|bbl is now known as rpittau		12:49
*** roman_g has joined #opendev		12:50
mnaser	ttx: i wonder if the reason why it does this is because we don't do a deep mirror clone by zuul into the executor	12:53
mnaser	ttx: and so because we have a shallow clone that doesnt include all the refs (because that would take a long time and probably not needed)	12:54
ttx	yep	12:54
ttx	that's what I meant by "pushes a mirror of a clone"	12:54
*** ykarel is now known as ykarel|afk		12:55
openstackgerrit	Andreas Jaeger proposed openstack/project-config master: Update update_constraints for Py3.8 https://review.opendev.org/720197	12:58
Eighth_Doctor	ttx: well, it took four days to push all those refs	13:08
Eighth_Doctor	and most of the git command line tools seem to be rather unhappy with the repo on my machine because of all the refs	13:09
Eighth_Doctor	but it's a nice test case, so it's not all bad	13:09
ttx	lol... Yeah I expect it will also take days to delete them if we end up mirroring nova with the new per-repo system	13:10
ttx	hence my question up there	13:10
Eighth_Doctor	ttx: if gerrit+zuul was directly managing the pagure git repository, I don't think this would be a problem	13:22
Eighth_Doctor	otherwise, probably should be somehow not sending those refs when pushing, because damn they're expensive	13:22
mordred	ttx, mnaser we _do_ have a full mirror on the executor - however, the refs/changes thing might be a smidge interesting	13:25
mordred	because I'm not sure each executor is always going to fetch refs/changes it doesn't happen to work with - so in any given push we may not get the full story of the refs/changes/*	13:25
mordred	although maybe it's fine that they're not there	13:26
Eighth_Doctor	mordred: my theory at least is that this would only be painful once	13:26
mordred	I'm a little concerned about that origin/stable/train -> origin/stable/train and friends	13:26
mordred	Eighth_Doctor: well for the gitea/pagure case it's a little different - we use those also so that people can browse proposed changes too - so we need all of the refs/changes to be in that system	13:27
mordred	for github mirroring - meh, I don't think it's actually important	13:27
Eighth_Doctor	though gitea looks like it's not happy with me doing a git fetch right now	13:27
frickler	ianw: AJaeger: cmurphy: new opensuse image seems to work better, but now fails with "virtualenv: command not found"	13:27
frickler	https://a454580e587cac547c7e-cfcb5348d0a5bd4d7cf82711ec310965.ssl.cf1.rackcdn.com/715688/6/check/keystone-dsvm-py3-functional-federation-opensuse15/152dd76/	13:28
Eighth_Doctor	mordred: at least with gitea, refs/changes are not visible	13:28
mordred	ttx: I think the created branches are a logic bug	13:28
Eighth_Doctor	it wouldn't be hard to extend pagure to show you the refs/changes stuff, but I'm not sure how useful it would be given that the refs have no context	13:28
Eighth_Doctor	I'm not even sure what the numbering scheme is here	13:28
mordred	Eighth_Doctor: yeah - they're hidden refs - those are how gerrit stores proposed changes	13:29
Eighth_Doctor	yeah	13:29
ttx	mordred: not very concerned with the branches really. Just don't want the script to block executors for one day deleting 182,116 refs every time Nova is synced	13:30
Eighth_Doctor	pagure PRs work similarly, except they're stored in an adjacent repo for pull requests	13:30
ttx	(refs/changes)	13:30
mordred	so - https://review.opendev.org/#/c/719186/9 is going to be in refs/changes/86/719186/9	13:30
Eighth_Doctor	where does `86` come from?	13:31
Eighth_Doctor	is it just the last two digits?	13:31
mordred	the last 2 digits	13:31
mordred	it's a dir hashing scheme	13:31
Eighth_Doctor	okay	13:31
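[editor's annotation] The ref layout mordred describes (last two digits of the change number as a directory shard, then the change number, then the patchset) can be written down as a small helper. This is a sketch; the function name is made up for illustration.

```python
def gerrit_change_ref(change_number, patchset):
    """Build the refs/changes name for a given change and patchset.

    The shard is the change number modulo 100, zero-padded to two digits.
    """
    shard = change_number % 100
    return f"refs/changes/{shard:02d}/{change_number}/{patchset}"


# The change discussed in the log: https://review.opendev.org/#/c/719186/9
print(gerrit_change_ref(719186, 9))  # refs/changes/86/719186/9
```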
mordred	but that ref can be seen in gitea: https://opendev.org/opendev/system-config/commit/c117c1106df8ff30aee7b8a118811bf239f3dcf8	13:31
mordred	so we push them there, but since they aren't branches they don't show up in the branches list	13:32
*** ysandeep|rover is now known as ysandeep|away		13:32
Eighth_Doctor	right	13:33
Eighth_Doctor	that should work the same way with pagure, I think	13:33
mordred	ttx: yeah - I think we might want to come up with a $something to do in git config to control refs/ interactions	13:33
mordred	ttx: or - we could do an offline script to push up refs/changes deletions for all of them	13:34
mordred	ttx: so that we just stop caring about those refs on github completely	13:34
mordred	they're not exactly browseable anyway	13:34
ttx	yeah, it's just tricky to do without freezing mirroring for a bit	13:35
ttx	Like 1/ disable Gerrit-wide replication, 2/ run refs/changes deletion script over a thousand repos and 700,000 refs, 3/ enable per-repo mirroring	13:36
ttx	I have no idea how long 2 will take :)	13:37
Eighth_Doctor	I wonder if we could be clever here in pagure, and make it so that when those refs/changes things show up, they make a link to Gerrit?	13:37
Eighth_Doctor	ttx: four days at least on nova :)	13:37
ttx	Eighth_Doctor: it was to create them, hopefully deleting is faster :)	13:37
Eighth_Doctor	actually, would the Change-Id be a better thing to process and hyperlink than the refs?	13:38
ttx	Damn it's more than 700,000, it's one per patchset	13:38
Eighth_Doctor	ttx: yeah, it's a _lot_	13:39
Eighth_Doctor	Change-Ids are unique to Gerrit and are the way it tracks those things, is there a way to use that to link to the change review?	13:39
ttx	It deleted 80 in 50ms in the script	13:39
ttx	so about 14 hours for a million patchsets	13:41
ttx	assuming 3 revs per change (average from fungi), would take about a day	13:42
ttx	napkin math	13:43
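[editor's annotation] ttx's napkin math can be reproduced under one assumption that is not stated outright in the log: that each ref deletion costs about 50 ms. That reading is the one that matches the "about 14 hours for a million patchsets" figure; if 80 refs took 50 ms in total, the estimate would be minutes rather than hours. The 700,000 changes and 3 revisions per change are the numbers quoted in the conversation.

```python
SECONDS_PER_DELETE = 0.050  # assumption: roughly 50 ms per deleted ref


def deletion_hours(ref_count, seconds_per_delete=SECONDS_PER_DELETE):
    """Estimate wall-clock hours to delete ref_count refs one at a time."""
    return ref_count * seconds_per_delete / 3600


# One million patchsets.
print(round(deletion_hours(1_000_000), 1))

# 700,000 changes at an average of 3 patchsets each.
print(round(deletion_hours(700_000 * 3), 1))
```

The two results come out near 14 hours and a bit over a day, in line with the estimates given above.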
Eighth_Doctor	mordred: so the way that pagure renders commits doesn't seem to make the refs thing useful :(	13:44
Eighth_Doctor	https://stg.pagure.io/openstack/nova/c/9dcc0941f1371c6e6852ad53bc6e6b04e0677d4d	13:44
Eighth_Doctor	that might be worth fixing, not sure	13:44
Eighth_Doctor	the commits list typically has these things, so it might be worth extending that view to support it	13:44
Eighth_Doctor	mordred: what do you think would be more useful? a link via change-id (assuming that's possible) or a population in the commits view of refs/changes/* that link to gerrit?	13:46
mordred	Eighth_Doctor: I think a view of a given refs/change is really only useful as something you might look at if you follow the link _from_ gerrit	13:48
mordred	that said - I do think a link from change-id back to gerrit could be useful for people browsing normal commits	13:48
mordred	Eighth_Doctor: https://review.opendev.org/#/q/I0d3b92506fab8f973bffe082cbfb2ab29cb0b8d0 is how you go to a change via change-id	13:49
Eighth_Doctor	okay, that's neat	13:49
Eighth_Doctor	I'm going to log that as an RFE and take a look at adding a feature for supporting that in pagure	13:50
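[editor's annotation] A link from a commit back to its review, as mordred suggests, could be derived from the Change-Id trailer in the commit message. The sketch below is an illustration only, not pagure code: the function name and the sample commit message are hypothetical, while the query URL form and the Change-Id value are the ones quoted in the log.

```python
import re

# URL form quoted by mordred in the log.
GERRIT_QUERY_BASE = "https://review.opendev.org/#/q/"

# A Gerrit Change-Id is the letter 'I' followed by 40 hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id:\s*(I[0-9a-f]{40})\s*$", re.MULTILINE)


def change_id_url(commit_message):
    """Return the Gerrit query URL for the last Change-Id trailer, or None."""
    found = CHANGE_ID_RE.findall(commit_message)
    if not found:
        return None
    return GERRIT_QUERY_BASE + found[-1]


# Hypothetical commit message using the Change-Id from the log.
message = """Example subject line

Example body text.

Change-Id: I0d3b92506fab8f973bffe082cbfb2ab29cb0b8d0
"""
print(change_id_url(message))
```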
openstackgerrit	Monty Taylor proposed opendev/system-config master: Upgrade to gitea 1.11.4 https://review.opendev.org/720202	13:55
Eighth_Doctor	mordred: https://pagure.io/pagure/issue/4812	14:01
mordred	cool	14:03
mordred	infra-root: I'm landing the patches to run zuul prod patches from zuul checkout - I'll be watching to make sure it all happens properly	14:03
*** ykarel|afk is now known as ykarel		14:03
corvus	ttx: i agree with your analysis; we may be able to reconfigure gerrit not to replicate refs/changes, so if we did that, we could modify your process to: reconfigure gerrit to not replicate refs/changes; delete refs/changes asynchronously; enable zuul replication; disable gerrit. that would avoid a replication outage.	14:06
mordred	++	14:07
corvus	mordred: #zuul -> we're about to need to make a moderately complex change to the zuul deployment in order to support zk tls	14:10
fungi	Eighth_Doctor: still catching up, but the most effective way to link back to gerrit reviews from the git repository is via the git "notes" it stores	14:10
fungi	they used to be displayed by default by cgit, i think we need to configure gitea to do it (they didn't support alternative notes trees until somewhat recently and we haven't had time to revisit it since upgrading)	14:11
corvus	mordred: but we don't have a solution for running the executor in docker yet, so i don't think we can convert everything to docker; should we do the new work in ansible instead of puppet? should we use windmill?	14:11
mordred	corvus: re: gerrit - setting 'push' to +refs/heads/*:refs/heads/* should do the trick	14:11
corvus	ttx: ^	14:11
mordred	corvus: I mean - I've got all the config bits converted - so I think it would be easier to just change the executor to pip install in that instead of trying to use windmill	14:12
fungi	the way i had imagined that replication job was that it would just push the current head or tag when triggered, not try to push a full mirror every time it's invoked	14:12
mordred	also - we'd have to add zk tls support to windmill and I don't know what paul's status for stuff like that is atm	14:12
fungi	so i'm surprised it was deleting anything	14:12
corvus	mordred: what do you mean you've got all the config bits converted? isn't zuul.conf still written by puppet?	14:13
mordred	corvus: (we could also change all of it to run via pip instead of docker)	14:13
mordred	corvus: my zuul patch ... one sec	14:13
fungi	assuming it's in sync already, the job should be triggered for any update to a branch anyway (and a tag once we add it to the right pipelines)	14:13
corvus	mordred: re windmill -- someone needs to, right? doesn't the ansible zuul run via windmill?	14:13
mordred	corvus: https://review.opendev.org/#/c/717620/	14:13
openstackgerrit	Merged opendev/system-config master: Update install-ansible away from /opt/system-config https://review.opendev.org/719186	14:13
openstackgerrit	Merged opendev/system-config master: Run playbooks out of zuul checkout https://review.opendev.org/719190	14:13
mordred	corvus: that's a stab at converting our puppet use to using ansible instead - although it's obviously not going to work for the executor because of docker - but it's a start. but we could also use windmill - I didn't do that in this case because it seemed like a harder step	14:14
corvus	mordred: oh! you sure did write that patch. :)	14:15
corvus	mordred: i agree, updating that to s/docker/pip/ is probably easy and gets us to a place to use zk tls quickest	14:15
mordred	corvus: do you think I should do that everywhere? or just on ze?	14:15
corvus	mordred: well, it seems nodepool -> docker is already in progress, so a mixed env is a given; therefore, maybe just do that on ze	14:17
mordred	corvus: kk. I'll update the patch	14:25
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	14:28
mordred	that's just a rebase	14:28
*** ysandeep|away is now known as ysandeep		14:31
Eighth_Doctor	fungi: git notes?	14:33
fungi	Eighth_Doctor: notes refs	14:33
fungi	https://git-scm.com/docs/git-notes	14:33
fungi	that stuff	14:33
Eighth_Doctor	hmm	14:34
fungi	though gerrit doesn't use the default refs/notes/commits tree in case you're already using that for other purposes	14:34
fungi	it uses a refs/notes/review tree	14:35
fungi	but it stores the numeric vote values/dates/users, review link and related data in there	14:36
ttx	mordred, corvus: would limiting the push not result in refs deletion ? (like what happens during the replication process for refs/changes already on GitHub ?)	14:36
Eighth_Doctor	fungi: interesting	14:36
ttx	or is it just additive	14:36
fungi	ttx: i'm surprised we actually built that job to mirror all refs in the first place, i had thought it was just going to push refs for the branch or tag which triggered the build	14:38
ttx	also where is that setting 'push' to +refs/heads/*:refs/heads/* happening ? replication.config?	14:40
corvus	ttx: oh, that's a good point, it may well do that.	14:40
ttx	answering my last question :yes	14:40
fungi	ttx: yeah, it'll be in the replication config	14:40
ttx	push = +refs/heads/*:refs/heads/*	14:41
ttx	push = +refs/tags/*:refs/tags/*	14:41
ttx	probably both	14:41
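[editor's annotation] The effect of limiting replication to branches and tags can be sketched by testing ref names against the source side of the two push refspecs, taken here as the wildcard patterns refs/heads/* and refs/tags/*. This is an approximation for illustration: it uses Python's fnmatch rather than git's own refspec matching, and the sample ref names are hypothetical.

```python
from fnmatch import fnmatchcase

# Source patterns of the two push refspecs, assumed to be wildcard forms.
PUSH_SOURCES = ["refs/heads/*", "refs/tags/*"]


def is_replicated(ref_name, patterns=PUSH_SOURCES):
    """Return True if ref_name falls under one of the push patterns."""
    return any(fnmatchcase(ref_name, pattern) for pattern in patterns)


# Hypothetical sample of ref names.
for ref in [
    "refs/heads/master",
    "refs/heads/stable/train",
    "refs/tags/21.0.0",
    "refs/changes/86/719186/9",
    "refs/notes/review",
]:
    print(ref, is_replicated(ref))
```

Under these patterns refs/changes and refs/notes are simply outside what gets pushed, which is the property mordred is betting on when he expects no deletes to be sent for them.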
corvus	ttx: we still might want to consider that though; i think we dedicate one thread to github replication; we could increase that to two, which would mean replication is slow, but would allow other work to happen while nova was 'stuck'	14:41
ttx	ok will give it some extra thought	14:42
mordred	I'm not sure it would push deletes	14:42
mordred	I think with the mirror script it's mirroring all refs, so that means it's going to try to mirror in the refs/changes namespace, meaning pushing deletes	14:42
ttx	on mirroring it definitely deletes remote extra refs	14:43
mordred	if we limit the ref namespaces gerrit is pushing	14:43
mordred	then I don't think it would push empty refs/changes to delete things	14:43
mordred	that would be pushing ref information for a namespace we told it not to push	14:43
corvus	yeah, i don't know for sure. that sounds plausible.	14:43
mordred	we can try this out on review-dev	14:43
corvus	++	14:44
mordred	but I'm gonna put my money on it being safe to configure gerrit to just stop replicating them	14:44
mordred	and then being able to run a cleanup script	14:44
*** mlavalle has joined #opendev		14:47
fungi	it's still not clear to me, why have the job replicate everything each time it runs and not just the branch or tag for which the build was triggered?	14:47
fungi	branch and tag updates won't happen outside gerrit typically anyway, so zuul will receive events for those and then run the job	14:48
corvus	fungi: that's a fair question. perhaps to catch up after previous errors? maybe that's low-risk though?	14:49
corvus	or maybe it could be configured not to delete	14:49
fungi	the only one i'd worry about is missed tags, but maybe if the job is triggered by a tag then push all tags, but branches will eventually get new commits	14:49
fungi	just seems unnecessary to have it try to replicate the entire repository when the triggering event was a new commit merging to a single branch, and the job gets run each time that happens	14:50
fungi	(to be honest, i had it in my head that was the design, and didn't realize until now that wasn't how it was working)	14:51
corvus	speaking of replication... i think we may have a gitea backend out of sync; i'm seeing different data pulling zuul updates	14:51
cmurphy	frickler: ianw dirk "virtualenv: command not found" the pip-and-virtualenv element was removed from the image build https://review.opendev.org/718299 ???	14:52
cmurphy	can we put it back? keystone needs this	14:52
fungi	cmurphy: the idea was that the tox parent job would start installing virtualenv, i think	14:53
AJaeger	cmurphy: ianw is working on this stack: https://review.opendev.org/#/c/718224/	14:54
fungi	(and no, a big part of the delay in the suse image updates was so that we didn't have to work out installing pip, virtualenv and tox into the system context, since we're going that direction for the other distros as well, fedora is already like that apparently)	14:54
AJaeger	once that's merged all should be green again	14:54
AJaeger	cmurphy: best discuss with ianw once he's awake. We expected that what we had would work already as is.	14:55
cmurphy	AJaeger: great! can we merge that asap?	14:55
cmurphy	:( ianw won't be awake till the end of my day	14:55
AJaeger	cmurphy: ianw wants to merge once he's around - but corvus just left a -1 on https://review.opendev.org/#/c/717663/24	14:56
corvus	i thought the "plain" image was being used to work through this?	14:57
corvus	i didn't think anything was removed from the main images yet	14:57
AJaeger	corvus: https://review.opendev.org/718299 - we had problems building the opensuse images as well	14:58
AJaeger	So, between a rock and a hard place ;(	14:58
corvus	perhaps we should revert that as cmurphy says? because if we override my -1 we're going to break other zuul installs	14:59
corvus	i haven't looked into how long it would take to fix my -1, it's probably not too hard, but i'm certainly not up to speed	15:00
AJaeger	corvus: 718299 was needed to fix image builds that have been broken for ages ;(	15:00
fungi	we can roll back to months-old images maybe, if we still have them hanging around	15:02
cmurphy	the months-old image was semi-working for me with workarounds	15:03
mordred	corvus: if you have a sec - could you look at https://zuul.opendev.org/t/openstack/build/da3cec0713204f22982e65d5ac420a8c/log/job-output.txt#78 ?	15:03
mordred	corvus: that's trying to use mirror-workspace-git-repos when talking to bridge - it seems to be having a sad but I'm not 100% sure what the issue would be - did I use the wrong role here?	15:04
mordred	corvus: I'm starting to think maybe I was supposed to use prepare-workspace instead	15:04
mordred	corvus: no - I guess prepare-workspace does the synchronize	15:05
corvus	mordred: in a bit...	15:05
*** lpetrut has quit IRC		15:05
corvus	what should we do about the keystone situation?	15:05
corvus	revert and rollback? or are we going to say "sorry it's broken for a day or two"?	15:06
corvus	are there any other options?	15:06
mordred	I think revert and rollback	15:07
mordred	and then the re-revert needs to take this issue into account	15:08
mordred	because I think it was the assumption that this wouldn't break things	15:08
corvus	are we talking about the opensuse-15 image?	15:08
mordred	I believe so?	15:08
yoctozepto	morning folks	15:09
yoctozepto	it's etherpad again :-(	15:09
yoctozepto	https://etherpad.opendev.org/p/KollaWhiteBoard	15:09
yoctozepto	no worky	15:09
yoctozepto	only loady	15:09
corvus	it looks like we have deleted all of the months-old opensuse-15 images	15:09
corvus	oh wait	15:10
yoctozepto	An error occurred	15:10
yoctozepto	The error was reported with the following id: 'LxotxdY5BrhtpIZtbDud'	15:10
corvus	we still have one that's 69 days old on nb02	15:10
mordred	corvus: well, that one won't have the pip-and-virtualenv element removed - maybe that's the most recent?	15:10
corvus	yeah, i think if we revert back to nb02, it'll upload those	15:11
fungi	priteau just mentioned in #openstack-infra that etherpad.opendev.org is unresponsive. i'll take a look	15:12
mordred	fungi: while you're looking, see issue from yoctozepto above about that etherpad too	15:12
openstackgerrit	James E. Blair proposed openstack/project-config master: Revert "Move suse builds to nb04, drop pip-and-virtualenv" https://review.opendev.org/720223	15:12
corvus	mordred: i think the local apache mirrors for gerrit may be out of date	15:13
corvus	mordred: maybe we missed a bind mount?	15:13
yoctozepto	mordred, fungi: oh well, I asked priteau if it worked for him :-)	15:13
mordred	corvus: looking	15:13
corvus	AJaeger, fungi, mordred, cmurphy: see 720223 ^	15:13
yoctozepto	did not think he would crossreport :-)	15:13
fungi	yoctozepto: oh, well thanks, i missed your mention of it in here, sorry, there's been a lot of discussion going on	15:14
yoctozepto	fungi: sure, no problem	15:14
AJaeger	thanks, corvus	15:15
mordred	corvus: yes.	15:15
fungi	the server itself is definitely up and reachable over ssh	15:15
cmurphy	thanks corvus	15:15
fungi	and "node node_modules/ep_etherpad-lite/node/server.js" is running since some time on monday	15:15
fungi	and it's suddenly responding to me again via browser, i didn't change anything	15:16
fungi	load average is low	15:16
openstackgerrit	Monty Taylor proposed opendev/system-config master: Add /opt/lib/git to the volume mounts https://review.opendev.org/720225	15:17
fungi	nothing going haywire with the kernel per dmesg	15:17
fungi	cacti doesn't show anything particularly anomalous either	15:19
openstackgerrit	Monty Taylor proposed opendev/system-config master: Use prepare-workspace-git in production playbook https://review.opendev.org/720227	15:19
mordred	corvus: ^^ I believe the first of those will fix the gerrit local git replica issue	15:20
mordred	corvus: and the second should fix my git repo replication issue	15:21
mordred	actually ... let me change that	15:21
fungi	yoctozepto: so far i'm not finding anything on the server to explain the temporary outage. https://etherpad.opendev.org/p/KollaWhiteBoard is loading now too	15:22
fungi	might have been network-related, but i'm going to dig deeper in logs	15:22
yoctozepto	fungi: yes, thanks; I can only offer this id LxotxdY5BrhtpIZtbDud	15:22
yoctozepto	maybe it's greppable or something :D	15:23
fungi	i'm checking	15:23
yoctozepto	duck, I got another failure	15:24
yoctozepto	hPmlyLrRaK3KZKl6i4OY	15:24
openstackgerrit	Monty Taylor proposed opendev/system-config master: Just use synchronize to sync the repos https://review.opendev.org/720227	15:24
corvus	those are both "Uncaught TypeError: Cannot read property 'setStateIdle' of null"	15:26
mordred	corvus: ^^ I think that's a better approach for our use on bridge	15:26
fungi	i find a couple of recent proxy errors apache logged at 15:06:22z	15:27
fungi	"AH01102: error reading status line from remote server localhost:9001" and "AH00898: Error reading from remote server returned by /socket.io/"	15:27
fungi	those may be unrelated though	15:27
yoctozepto	this must be nodejs looking at that message	15:28
mordred	yup	15:28
fungi	do we use docker-compose to view etherpad's service logs now? are those written to disk in the chroot or spewed on stdout/stderr?	15:28
mordred	fungi: spewed	15:28
mordred	fungi: cd /etc/etherpad-docker ; docker-compose logs	15:28
mordred	fungi: will get you the spew	15:28
mordred	(-f will tail)	15:28
fungi	appreciated	15:29
openstackgerrit	Marcin Juszkiewicz proposed openstack/project-config master: Add CentOS 8 AArch64 nodes https://review.opendev.org/720167	15:29
corvus	fungi: i ran: "docker logs etherpaddocker_etherpad_1|grep -C 8 LxotxdY5BrhtpIZtbDud" to get the error message above	15:29
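The context search corvus describes (find an error id in the container log and show the surrounding lines) can be sketched in Python as below. This is a minimal illustration, not the command that was run: `grep_context` is a hypothetical helper, and unlike `grep -C` it does not print `--` separators between non-adjacent groups of lines.

```python
def grep_context(lines, needle, context=8):
    """Return every line within `context` lines of a line containing `needle`."""
    keep = set()
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    # Emit the selected lines in their original order, without duplicates.
    return [lines[i] for i in sorted(keep)]
```

Given the log output split into a list of lines, `grep_context(log_lines, "LxotxdY5BrhtpIZtbDud")` would return the matching line plus up to eight lines on either side of it.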
*** ykarel is now known as ykarel|away		15:29
corvus	https://github.com/ether/etherpad-lite/issues/3405 is relevant	15:29
mordred	corvus, fungi: I know there's a ton of things going on - but could I get a quick +A on 720227? we're dead in the water on bridge without it, and that means we can't land the nodepool revert that we need for the keystone issue	15:30
mordred	(when it rains it pours)	15:30
corvus	mordred: oh :( well i wanted to really look into that	15:31
corvus	but if we need to just merge it to put out fires sure	15:31
corvus	how did we get into that situation though?	15:31
clarkb	mordred: is there a tl;dr of nodepool issue?	15:31
corvus	oh i guess this only runs in prod	15:31
yoctozepto	17:30:39 <mordred> (when it rains it pours)	15:31
yoctozepto	++	15:31
mordred	corvus: we merged the "run from git" patch - and it failed being unable to sync the git repos to bridge	15:31
mordred	yeah	15:31
*** redrobot has joined #opendev		15:32
clarkb	and nodepool change I assume is related to the opensuse things?	15:32
clarkb	note I think opensuse like fedora3X was not actually building on the old setups	15:32
mordred	corvus: yeah - it's unfortunate timing - when I clicked +A it was quiet	15:32
clarkb	so a revert is unlikely to fix anything	15:32
corvus	clarkb: revert+rollback is the proposal	15:32
mordred	clarkb: there is a 69 day old opensuse image	15:32
clarkb	ah ok if we still have old image then we are good	15:32
mordred	yeah	15:32
mordred	clarkb: but we need https://review.opendev.org/#/c/720227/ to be able to land the revert	15:33
clarkb	in the case of opensuse it isn't building due to python2 changes	15:33
mordred	clarkb: so if you have a quick morning second	15:33
clarkb	so its directly related to the change made to the image build, not to anything in the builder itself	15:33
clarkb	basically you can't have a working opensuse with python2 now or something	15:33
corvus	the main difference i see between our etherpad config and https://github.com/ether/etherpad-lite/issues/2318#issuecomment-63548542 is we don't have a timeout setting	15:34
mordred	corvus: I agree re: timeout	15:35
corvus	mordred: wait i don't understand your comment about "delete: false"	15:35
corvus	mordred: that just means that synchronize won't delete files (which could cause errors)	15:36
clarkb	corvus: oh I was just getting to that :)	15:36
corvus	i mean, i'm still okay with +2 meaning -1 just to try to dig out of this hole	15:37
clarkb	I think in the context of a git repo thats not a good thing to have set to false	15:37
corvus	yeah. it will probably work okay for the next couple of changes we land	15:37
mordred	corvus: oh - want me to put that back in? I was mostly just thinking we don't want to delete and repush project-config over and over	15:37
clarkb	mordred: well I think the best way to handle it is to use git push	15:37
corvus	i don't know why that would "delete and repush"	15:37
clarkb	not rsync	15:37
mordred	corvus: not all jobs have project-config in their required-projects	15:38
corvus	that just means "delete files on the remote side that aren't on this side"	15:38
corvus	oh	15:38
corvus	let's just merge this and replace it with the right role	15:38
mordred	kk	15:38
mordred	yeah - this should work until we can breathe and dig in better	15:38
clarkb	ok I've approved it	15:38
mordred	thx	15:38
clarkb	but ya I think we want a role that does a git push	15:38
clarkb	(and it can skip pushing if the source doesn't exist)	15:39
corvus	do we need to make sure that 720223 lands after that?	15:39
mordred	corvus: yes	15:39
yoctozepto	eh, etherpad does not like me :/	15:39
openstackgerrit	James E. Blair proposed openstack/project-config master: Revert "Move suse builds to nb04, drop pip-and-virtualenv" https://review.opendev.org/720223	15:39
mordred	corvus: +A	15:40
mordred	corvus: thanks - and sorry for timing there	15:40
corvus	mordred: np; what was wrong with prepare-workspace-git?	15:40
mordred	corvus: it might be the right choice - we were using mirror-workspace-git	15:40
corvus	mordred: here's a handy cheat sheet: http://lists.zuul-ci.org/pipermail/zuul-discuss/2020-April/001216.html	15:41
mordred	corvus: I wasn't 100% sure prepare-workspace-git was the right thing to use and figured the simple rsync would _definitely_ work in this case	15:41
corvus	mordred: prepare-workspace-git calls mirror-workspace-git	15:41
corvus	so what went wrong with mirror-workspace-git?	15:41
mordred	corvus: there were no git repos on the remote side to push to	15:42
corvus	mordred: ah, then prepare-workspace-git may well work	15:42
mordred	it tried to git config them and got an error "you can't do that without a git repo"	15:42
corvus	because prepare-workspace-git does the "use a cache if it's there, otherwise git init" step i believe	15:42
mordred	corvus: yeah. I believe that's accurate	15:42
mordred	++	15:43
corvus	mordred: okay, want to push up a prepare-workspace-git change, and we can merge it after the nodepool change lands?	15:43
mordred	fwiw - we could handle the "delete and re-push project-config over and over again" by having the bridge playbook maintain an /opt/git cache of both	15:43
mordred	corvus: ++	15:43
corvus	mordred: i think using this role will avoid the issue; it's not going to delete any repos already in the workspace	15:44
mordred	it will - it'll call mirror-workspace-git at the end which will do the rsync --delete	15:44
openstackgerrit	Tristan Cacqueray proposed zuul/zuul-jobs master: dhall-diff: add new job https://review.opendev.org/718694	15:44
corvus	mordred: mirror-workspace-git-repos doesn't use rsync	15:45
AJaeger	config-core, could you review https://review.opendev.org/720197 - needed for release, please	15:45
mordred	corvus: sigh. so you are right :)	15:46
mordred	corvus: so yay	15:46
openstackgerrit	Monty Taylor proposed opendev/system-config master: Switch to prepare-workspace-git https://review.opendev.org/720231	15:47
mordred	clarkb corvus :^^	15:47
clarkb	AJaeger: I guess we are ok with dealing with cases where 3.6 doesn't imply 3.8 when they come up?	15:48
fungi	tailing etherpad logs, i see quite a few errors related to that KollaWhiteBoard pad	15:49
yoctozepto	hZUCejCWzEctlM9uquL5 - another token of despise from etherpad	15:49
yoctozepto	fungi: it's probably me	15:49
yoctozepto	it fails for me and other folks	15:49
fungi	not just "Uncaught TypeError: Cannot read property \'setStateIdle\' of null" but also some others	15:49
yoctozepto	we are having a kolla meeting, must be the reason	15:49
AJaeger	clarkb: for now yes	15:49
clarkb	fungi: yoctozepto the other day we were theorizing with subline that it is the client	15:49
yoctozepto	chrome 81? :D	15:50
yoctozepto	too fast? too slow?	15:50
mordred	clarkb: that some browsers are doing a bad thing?	15:50
yoctozepto	too awesome?	15:50
clarkb	yoctozepto: not specific versions of browsers but state in your browser	15:50
yoctozepto	lemme try another	15:50
fungi	"Error: Can't apply USER_CHANGES, because Trying to submit changes as another author in changeset ..."	15:50
clarkb	so try another or try private browsing mode	15:50
yoctozepto	yeah, I tried incognito already	15:50
clarkb	fungi found a bug from etherpad that showed etherpad is super sensitive to client activity too :/	15:51
prometheanfire	ian: fungi: have time to look at https://review.opendev.org/717339 ? (glean systemd-resolved thing)	15:51
fungi	"[ERROR] console - (node:1) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'colorId' of null"	15:51
clarkb	and the bug fix proposed was client side too I think	15:51
clarkb	rather than making the server robust	15:51
fungi	"[WARN] client - TypeError: pad.collabClient is null"	15:52
yoctozepto	hmm, it works in firefox, but seems sluggish	15:53
corvus	fwiw, the kolla pad has been working for me without issue in ff for a while, but i haven't been writing	15:53
yoctozepto	(to load)	15:53
clarkb	AJaeger: also that sed seems to do the same replacement?	15:53
clarkb	AJaeger: it replaces python_version==3.8 to python_version==3.8 can you double check that?	15:54
AJaeger	clarkb: that's correct - it should update versions. Let me double check...	15:55
AJaeger	clarkb: we need to use '$VERSION' - that's the difference	15:56
clarkb	AJaeger: got it	15:56
AJaeger	thx	15:58
openstackgerrit	Merged zuul/zuul-jobs master: Adds roles to install and run hashicorp packer https://review.opendev.org/709292	16:01
fungi	saw a similar setStateIdle warning pop up for an unrelaetd pad	16:02
fungi	unrelated	16:02
fungi	i wonder if we're just running into tuning errors and today is the first day we've got the new deployment under typical load	16:03
yoctozepto	yikes, it finally loaded	16:04
fungi	i picked another pad i saw scroll by in the logs and am getting indefinite "loading" from it	16:04
yoctozepto	fungi could be right	16:04
fungi	okay, the one i was trying to load finally loaded	16:05
openstackgerrit	Brian Rosmaita proposed openstack/project-config master: Change gerrit ACLs for cinder-tempest-plugin https://review.opendev.org/720235	16:05
AJaeger	fungi, could you review https://review.opendev.org/720197 , please?	16:05
openstackgerrit	Brian Rosmaita proposed openstack/project-config master: Change gerrit ACLs for cinder-tempest-plugin https://review.opendev.org/720235	16:09
fungi	okay, spotted a "[WARN] client - TypeError: r.dropdowns is undefined" for https://etherpad.opendev.org/p/octavia-priority-reviews which is likely related to https://github.com/ether/etherpad-lite/issues/3464 and the later https://github.com/ether/etherpad-lite/issues/3861	16:14
fungi	checking to see if that pad is broken	16:14
openstackgerrit	Merged openstack/project-config master: Update update_constraints for Py3.8 https://review.opendev.org/720197	16:19
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	16:22
mordred	corvus: ^^ ok - that's now updated to use pip to install the executor	16:23
mordred	YAY the system-config fix patch failed on a puppet module remote repo being unreachable	16:24
corvus	i'll re-enqueue it	16:25
mordred	ok - cool	16:25
corvus	mordred: it doesn't make sense to land 231 right after 227	16:27
corvus	mordred: which do you want?	16:28
mordred	corvus: why don't we just do 231	16:28
corvus	i'm starting to think we should just put all of nodepool in the emergency file and manually fix it	16:28
mordred	corvus: yeah	16:29
corvus	because we're now at 1 hour past our decision to rollback and have made no progress on actually doing it	16:29
mordred	corvus: i'll add nodepool to emergency	16:29
corvus	i'll start logging into the builders	16:30
mordred	corvus: we just need the builders right?	16:31
corvus	mordred: yeah	16:32
mordred	k. done	16:32
*** rpittau is now known as rpittau|afk		16:33
fungi	to follow up on my earlier speculation, https://etherpad.opendev.org/p/octavia-priority-reviews doesn't seem permanently broken (it loaded for me at least) so that "r.dropdowns is undefined" warning is apparently not always accompanied by a broken pad	16:34
johnsom	fungi FYI, we just noticed that pad won't open for some of us anymore. It times out.	16:35
johnsom	Rough time to lose our priority planning etherpad I have to say.	16:35
openstackgerrit	Monty Taylor proposed opendev/system-config master: Switch to prepare-workspace-git https://review.opendev.org/720231	16:35
fungi	johnsom: yeah, i'm starting to suspect tuning issues. we just switched our deployment to use containers so may be hitting different performance limitations, but we also upgraded to a newer etherpad release so could just be hitting new regressions in the software	16:36
corvus	this is nice -- it's easy to revert nb04 since its nodepool.yaml is a git checkout	16:37
corvus	but i have to copy the file on nb01 and nb02	16:37
mordred	corvus: yay for things being nicer in the future	16:38
mordred	corvus: actually - I think the not-yet-landed project-config would make it back in to files - but I think "I want to easily revert a change in an emergency" is a good use case, so I'll make sure we retain the nb04 behavior when we roll that out	16:39
mordred	corvus: nope - nevermind - it'll stay a git repo	16:40
johnsom	https://www.irccloud.com/pastebin/afzjDqgX/	16:42
johnsom	fungi FYI ^^^	16:43
fungi	johnsom: yep, that's another of the "[WARN] client - Uncaught TypeError: Cannot read property 'setStateIdle' of null" events	16:48
clarkb	fungi: does the apache status page show us filling up on connections?	16:49
openstackgerrit	Merged opendev/system-config master: Just use synchronize to sync the repos https://review.opendev.org/720227	16:50
corvus	clarkb, fungi, mordred: okay i think the configs are reverted on nb01, nb02, and nb04	16:50
fungi	clarkb: i was just working on trying to connect to it, we do seem to be flat-lining at 500 established per cacti, lower than typical before the switch as you can see at http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=116&rra_id=all	16:50
corvus	i think i should now delete the opensuse-15-0000086893 and opensuse-15-0000086892 dib images?	16:51
corvus	that should then prompt nb02 to upload the opensuse-15-0000053000 image?	16:51
mordred	corvus: yes - and nb should upload the old one	16:51
mordred	yeah	16:51
clarkb	fungi: I wonder if our old tuning isn't applying because bionic apache uses a new mpm worker system compared to xenial	16:52
clarkb	fungi: but ya that seems like a good thread to pull on	16:52
fungi	i've confirmed that the new deployment at least hasn't broken reaching https://etherpad.opendev.org/server-status from a local shell on the server (and also hasn't inadvertently exposed it to the public)	16:54
fungi	firefox always likes to pick the worst possible times to tell me it needs to restart for an upgrade	16:56
fungi	the scoreboard still has quite a few open slots	16:58
corvus	okay, we have no opensuse-15 images now; i don't see an upload happening yet	16:58
fungi	148 requests currently being processed, 2 idle workers	16:59
fungi	claims we're still using the "event" mpm	16:59
corvus	i wonder if it's because there's still a deleted image in vexxhost for it	16:59
mordred	corvus: I didn't think we blocked on that - but maybe I'm wrong?	17:01
corvus	instance d2d73e84-d988-4605-a596-b0ddef9b2b23 in vexxhost has been deleting for 18 days	17:01
mordred	that didn't block it from uploading the new image last night	17:01
clarkb	corvus: thats "normal" because openstack	17:01
corvus	mordred: right but this is an old image	17:01
mordred	good point	17:01
corvus	it's an image that already has existing zk records because it "exists" because it's "deleting"	17:02
corvus	can anyone try deleting that instance while i try to figure out what nodepool should do in this case?	17:02
clarkb	fungi: taking a quick look at the server we have /var/log/apache2 logs for gerrit vhost (I think that must be copy paste error taking gerrit ansible and adopting it for etherpad) we should clean that up	17:02
mordred	corvus: I'll take a stab at it	17:03
fungi	clarkb: yeah, i saw that too	17:03
clarkb	fungi: I'll take a look at that now while I'm thinking about it	17:03
fungi	interestingly we're logging traffic in those	17:04
clarkb	mordred: corvus ime there are two states. One is where volume is attached to a server that does not exist. That we can clean up by removing the attachment and deleting the volume. The other is server refuses to delete which keeps the whole resource chain alive. That requires cloud intervention	17:04
fungi	oh, but they're etherpad access requests	17:05
clarkb	also I do not think that would affect nodepool's ability to make new images	17:05
corvus	we don't want it to make a new image	17:05
corvus	we want it to upload an old image	17:05
openstackgerrit	Merged openstack/project-config master: Revert "Move suse builds to nb04, drop pip-and-virtualenv" https://review.opendev.org/720223	17:06
mordred	corvus: I don't see that instance in vexxhost - by instance you mean server here right?	17:07
corvus	mordred: yes	17:08
corvus	| 0014437332 | vexxhost-sjc1 | opensuse-15 | d2d73e84-d988-4605-a596-b0ddef9b2b23 | 38.108.68.90 | 2604:e100:3:0:f816:3eff:fe52:b724 | deleting | 18:02:57:09 | locked |	17:08
mordred	corvus: thanks	17:08
openstackgerrit	Clark Boylan proposed opendev/system-config master: Fix etherpad port 80 logging https://review.opendev.org/720245	17:08
clarkb	fungi: ^	17:08
corvus	mordred, clarkb, fungi, AJaeger, cmurphy: i figured out why nodepool isn't uploading the old image	17:08
mordred	yeah?	17:09
corvus	it is no longer on the filesystem of the nodepool image builder	17:09
mordred	oh	17:09
corvus	so we've lost it	17:09
AJaeger	oops ;(	17:09
fungi	ahh, right, we "fixed" that in nodepool	17:09
corvus	fungi: we what?	17:09
clarkb	corvus: fungi the change was once all images were in a deleting state we could delete the image from disk	17:09
fungi	because before, it would pile up local copies of images on the builders until they could be completely deleted from all the providers	17:09
corvus	it's not in a deleting state, it's ready.	17:09
clarkb	rather than waiting for the image to delete from the cloud because what was happening is vexxhost was failing to delete many images and then our disks filled up	17:09
corvus	this was deleted out from under nodepool.	17:09
fungi	oh, then that's different	17:10
corvus	though, also, that's a really unfortunate nodepool behavior	17:10
fungi	what got fixed was to have the builder delete the local copy once it told all providers to delete their copies, whether or not the delete command was successful/completed	17:10
clarkb	| 0000053000 | 0000000002 | vexxhost-sjc1 | opensuse-15 | opensuse-15-1580919581 | c5b3b55a-4c74-4d41-998c-265342ab3afc | deleting | 33:14:40:58 |	17:10
clarkb	is it that image? because it is deleting	17:10
clarkb	fungi: right because otherwise we'd need like 10TB of disk	17:11
corvus	that's the upload not the diskimage	17:11
corvus	| opensuse-15-0000053000 | opensuse-15 | nb02 | qcow2,raw,vhd | ready | 70:01:24:58 |	17:11
corvus	that is the image that nodepool told us is ready to be uploaded	17:11
fungi	because nodepool has no control over whether providers actually follow through on image delete requests, so we were filling up the hard drives when providers failed to be able to process a delete for various reasons	17:11
corvus	yeah, i get it	17:11
corvus	so 1 of 2 things happened here: either one of us deleted the image from disk behind nodepool's back to free up space	17:11
corvus	or, somehow this new behavior change we made to nodepool did apply to this case, in which case, we seem to have programmed our software to lie to us	17:12
clarkb	corvus: I think that may be the case because we can't remove the image record until all uploads are done due to the zk fs hierarchy? and maybe thats a bug where we need to update the state on the dib build as a result?	17:12
corvus	either way, we just blew 2 hours of work	17:12
corvus	because it said "ready" when it wasn't	17:12
corvus	clarkb: if nodepool deleted the diskimage, then there is no excuse for it saying "ready". we have "deleting" for that.	17:13
clarkb	basically the record can't go away until all the uploads go away so we need to update that record state and it may be a bug that we don't (I haven't checked that in the code)	17:13
clarkb	corvus: I get that, but code has bugs	17:13
clarkb	its clearly not intentional if that is the case	17:13
corvus	mordred: can you check the openstack state of the image with id c5b3b55a-4c74-4d41-998c-265342ab3afc ?	17:14
mordred	corvus: it shows active	17:14
mordred	corvus: is that the image we want?	17:14
corvus	clarkb: yes, we agree that if that is the case, then it's a bug	17:15
clarkb	mordred: yes that that the copy of the image in vexxhost sjc1	17:15
mordred	corvus: we can download it from openstack	17:15
clarkb	*that is	17:15
corvus	mordred: yes. so we may be able to convince nodepool to continue to use that	17:15
mordred	corvus: why don't we download it as well, just to be on the safe side	17:15
corvus	mordred: first thing: were you successful in deleting that instance?	17:15
clarkb	I wasn't around when all of this was originally debugged. Did we decide we can't roll forward for some reason (thinking about options here)	17:15
fungi	clarkb: there are issues raised with the next steps job config changes	17:16
corvus	clarkb: see my -1 on https://review.opendev.org/717663	17:16
clarkb	fungi: right but one that is easily fixable	17:16
mordred	corvus: no - I do not see it when I look for it - which is very strange to me	17:16
corvus	mordred: neat, at least we're in stasis	17:16
corvus	mordred: then yes, let's start by downloading that	17:17
mordred	corvus: ok. I'm going to do that now	17:17
fungi	clarkb: seemed mostly a decision as to whether it would be more work/faster/better guaranteed to return to a known state	17:17
fungi	though i think it was also assumed at the time that rolling back to the older image would be relatively easy	17:18
corvus	yep, and that is SOP in situations like this	17:18
clarkb	ya, I think the thing that makes this odd is we haven't been able to build that image for months (very similar to the fedora situation)	17:19
clarkb	normally I would agree	17:19
clarkb	and probably would have this morning. Just wanting to make sure the other options were considered too (and if so what counted against them)	17:19
mordred	corvus: I am downloading the image to /opt/nodepool_dib/opensuse-image.save.raw on nb02	17:21
mordred	~/osc/bin/openstack --os-cloud=vexxhost --os-region-name=sjc1 image save c5b3b55a-4c74-4d41-998c-265342ab3afc --file=/opt/nodepool_dib/opensuse-image.save.raw	17:21
mordred	fwiw	17:21
corvus	mordred: cool, i think when that finishes we probably want to make md5 and sha256 files, then copy that to opensuse-15-0000053000.raw	17:23
corvus	nodepool also expects qcow2 and vhd	17:24
corvus	maybe we can just let it fail those uploads?	17:24
corvus	or maybe we can edit the zk record	17:24
mordred	we could convert them	17:24
corvus	or that	17:24
mordred	we have the conversion tools on the host after all	17:24
clarkb	note that nodepool may try to delete them again if that image does end up deleting (periodic cleanup by provider maybe)	17:24
fungi	checking the etherpad apache server-status periodically, we have 5 of the currently 11 running workers perpetually in "stopping" state due to being on an old config generation, so not accepting connections. though i don't think that's currently causing issues because there are as many open slots for more worker processes too	17:27
fungi	i take that to mean we've updated the apache config since the parent started, and those workers are in a graceful shutdown but still have existing clients who haven't closed out (or where the line has gone dead and apache doesn't know they're never coming back)	17:29
openstackgerrit	Monty Taylor proposed opendev/system-config master: Use project-config from zuul instead of direct clones https://review.opendev.org/719343	17:29
corvus	mordred: not the speediest process is it?	17:31
mordred	corvus: nope	17:32
mordred	corvus, clarkb : ^^ I had to rebase that patch due to merge conflict	17:33
corvus	mordred: do you know how to do those conversions?	17:34
fungi	i spot-checked one of the "Cannot read property 'setStateIdle' of null" hits in the log just now and found it correlated to a request which started for the old domain (determined through correlation with /var/log/apache2/etherpad.openstack.org_access.log since we're logging that redirect vhost separately). will try to see if that is consistent	17:34
mordred	corvus: I can pull it out of the dib source	17:35
corvus	mordred: cool, if we want to do that, now's probably a good time to get that ready	17:35
corvus	mordred: is the d/l finished?	17:36
mordred	corvus: yes - it just finished	17:36
mordred	cp $TMP_IMAGE_PATH $1-intermediate	17:36
mordred	vhd-util convert -s 0 -t 1 -i $1-intermediate -o $1-intermediate	17:36
mordred	vhd-util convert -s 1 -t 2 -i $1-intermediate -o $1-new	17:36
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Check if pip is preinstalled before installing it https://review.opendev.org/720254	17:36
corvus	mordred: cool, i'll do the work for the raw image if you want to convert	17:36
mordred	that's the "convert to vhd" steps	17:36
corvus	mordred: or i can do those too	17:36
mordred	I can do the converts	17:36
corvus	ok	17:36
mordred	you want me to wait until you've renamed?	17:36
mordred	or should we cp and keep this as-is just in case?	17:37
corvus	mordred: i was going to copy, in order to avoid nodepool deleting it :)	17:37
mordred	yeah	17:37
clarkb	I think https://review.opendev.org/720254 addresses corvus and tristanC's concern with ensure-pip	17:37
clarkb	I'm hoping that existing tests will help point me in the right direction as far as testing goes (because so many of our images come preinstalled with pip)	17:37
corvus	i was told there was a 'plain' image to verify this stuff	17:38
clarkb	corvus: oh right that one should be "clean" lets see if it runs tests yet	17:38
mordred	clarkb: we should make sure that the 3pci shows it doesn't try to reinstall	17:38
mordred	corvus: wow. even just copying the image is slow	17:39
corvus	mordred: i'm several minutes into an md5sum	17:39
clarkb	mordred: ya if it clears our initial tests I can rebase into ianw's stack and that should have 3pci run it	17:39
tristanC	clarkb: commented	17:39
clarkb	tristanC: because `pip` should always be present regardless of python version	17:40
clarkb	then we check version specifics based on what is enabled	17:40
tristanC	clarkb: i meant the change assigns shell variable using `if` jinja statement, and then it evaluate content based on `if` shell statement. couldn't the type command be selected by the jinja `if` statement?	17:41
clarkb	tristanC: it could but I thought it was easier to set flags (basically translate yaml truthyness to bash truthyness) then evaluate the results in a bash context	17:42
clarkb	this way you don't have to parse jinjayamlbash	17:42
clarkb	and instead its jinjayaml then bash	17:43
tristanC	clarkb: hmm ok	17:44
tristanC	iirc, a tox user who want python2 needs to set both `tox_prefer_python2: true` and `ensure_pip_from_packages_with_python2` ?	17:47
clarkb	tristanC: or ensure_pip_from_upstream and ensure_pip_from_upstream_interpreters has python2 in it	17:48
*** dpawlik has quit IRC		17:48
clarkb	though maybe what you mean is if you want python packages you need that? since pip_from_upstream doesn't imply python packages	17:49
smcginnis	Hopefully quick and easy question - do we have any nodes with py38 available yet?	17:49
clarkb	smcginnis: I believe the bionic nodes can do that with special packages (the tox-py38 enables them)	17:50
smcginnis	Perfect, thanks clarkb.	17:50
mordred	corvus: qcow2 image convert should be: qemu-img convert -f raw -O qcow2 opensuse-15-0000053000.raw opensuse-15-0000053000.qcow2	17:51
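The conversion commands quoted above can be collected into one place, roughly as in this Python sketch. It only builds the argument lists and runs nothing. The command lines are the ones pasted in this log; the function name, the `.tmp` output suffix, and the use of the `.raw` file as the copy source are assumptions made for illustration. Writing the qcow2 to a temporary name and renaming it afterwards is a design choice so that anything watching the directory never sees a partially written file under its final name.

```python
def conversion_commands(base):
    """Build argv lists for raw -> vhd and raw -> qcow2 conversion.

    `base` is the image path without extension (hypothetical convention).
    """
    raw = base + ".raw"
    intermediate = base + "-intermediate"
    return [
        # vhd conversion, in the two stages quoted from the dib source
        ["cp", raw, intermediate],
        ["vhd-util", "convert", "-s", "0", "-t", "1",
         "-i", intermediate, "-o", intermediate],
        ["vhd-util", "convert", "-s", "1", "-t", "2",
         "-i", intermediate, "-o", base + "-new"],
        # qcow2 conversion, written to a temporary name first (assumption)
        ["qemu-img", "convert", "-f", "raw", "-O", "qcow2",
         raw, base + ".qcow2.tmp"],
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)`, followed by `os.replace` to move the finished files to their final names.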
fungi	smcginnis: yeah, if you just use the py38 jobs they should work magically	17:52
mordred	corvus: I am currently doing the second stage of the vhd convert	17:52
smcginnis	fungi, clarkb: Would that include openstack-tox-functional jobs?	17:52
clarkb	smcginnis: I don't know. You may have to add the package installs that tox-py38 does to tox-functional if it isn't already there	17:53
fungi	smcginnis: yeah, no clue, i've only seen folks using the tox-py38 job so far	17:53
smcginnis	Guess we'll find out.	17:54
fungi	but that installs the python3.8 package on the default image	17:54
fungi	er, default node type	17:54
*** ralonsoh has quit IRC		17:55
corvus	mordred: nodepool is attempting to upload images now (but failing since not all files are in place)	17:56
corvus	so either the md5sum file or the "vhd-new" file is enough for it to think there's an image there	17:56
corvus	anyway, i think that's good, harmless, but chatty :)	17:56
corvus	okay, all 3 raw pieces are in place	17:57
mordred	corvus: cool	17:58
corvus	and it looks like we're really uploading to vexxhost now	17:59
mordred	corvus: I'll do the qcow2 conversion as soon as the vhd conversion is done	17:59
corvus	mordred: cool -- want me to do the checksums and rename for vhd, or you?	17:59
mordred	corvus: if you could do the checksums that would be neat	18:00
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Check if pip is preinstalled before installing it https://review.opendev.org/720254	18:00
corvus	will do; lemme know when it's ready	18:00
mordred	will do	18:00
clarkb	re image conversions for qcow2 you might need to set the compatibility flag. I'm not sure if we ever managed to decide if that was or wasn't needed anymore	18:01
clarkb	the --compare=0.10 or something similar flag	18:01
mordred	clarkb: I believe we stopped doing it	18:01
mordred	corvus: done	18:01
mordred	clarkb: we were only doing that for hp cloud anyway	18:01
clarkb	mordred: ya at this point it would surprise me if there were any qemu-imgs in the wild old enough to trip over that	18:02
mordred	clarkb: I also don't see us setting QEMU_IMG_OPTIONS	18:02
mordred	corvus: I am now doing the qcow conversion	18:03
corvus	ack	18:03
clarkb	mordred: well if some clouds are unhappy with it without the flag we'll learn something :)	18:03
clarkb	(and probably be able to suggest strongly that people upgrade qemu)	18:03
mordred	clarkb: re-review https://review.opendev.org/#/c/719343/ ?	18:11
* mordred would like to get that done since there's a manual transition step and we're sort of in the awkward half-rolled-out stage :)		18:12
clarkb	mordred: is that gonna need a new rebase when the git role changes ?	18:12
clarkb	maybe we should decide on an order there with some depends on?	18:12
mordred	clarkb: or else the git role change will need a rebase	18:13
mordred	clarkb: I'd like to get the zuul one landed first (deleting the extra ansible.cfg is important) - then I'll rebase the other one	18:13
mordred	clarkb: (turns out that ansible.cfg in the root of the repo was a bad idea)	18:14
mordred	corvus: qcow2 is done	18:14
corvus	neat, still waiting on the sha256 from vhd :)	18:15
mordred	cool	18:15
mordred	corvus: have brainspace for a rebase re-review of https://review.opendev.org/#/c/719343/ while we wait? (ok if not)	18:15
openstackgerrit	Merged opendev/system-config master: Fix etherpad port 80 logging https://review.opendev.org/720245	18:16
mordred	hrm. that patch was unhappy in deploy ... why	18:17
openstackgerrit	Monty Taylor proposed opendev/system-config master: Remove infra-prod-update-system-config from etherpad https://review.opendev.org/720261	18:18
mordred	fungi, clarkb : ^^	18:18
corvus	vhd summed and moved into place	18:19
mordred	corvus: wot	18:19
mordred	woot	18:19
corvus	mordred: did you create the qcow2 as the final name?	18:20
mordred	corvus: oh - I did. cause I'm dumb	18:20
corvus	mordred: i think it's been uploading the qcow2 for 15 minutes, but it only finished converting 5 min ago	18:20
corvus	i don't know what that's going to do	18:20
corvus	that's kna1, mtl01, limestone, openedge, ovh	18:21
corvus	maybe it'll just work?	18:21
mordred	corvus: maybe? maybe it'll just be reading from a file that's being appended to	18:22
corvus	i don't know if it does anything with sizes or checksums beforehand though	18:22
clarkb	corvus: it does, but I don't think it checks any of that except for on rax	18:23
clarkb	and there it's just checking the checksum for reuploading purposes?	18:23
mordred	yah	18:23
corvus	okay, checksum files for qcow2 are in place	18:25
corvus	it looks like we're now really uploading everywhere	18:25
mordred	corvus: woot	18:27
mordred	clarkb: heh. your zuul-jobs fix for opensuse failed on there being no opensuse images	18:34
clarkb	mordred: yup, it also failed on -plain and centos ps1 but ps2 looks good	18:35
clarkb	I think that implies our testing has reasonable coverage	18:36
mordred	yeah. I agree	18:36
AJaeger	team, I'm puzzled https://docs.openstack.org/python-cinderclient/latest/ gives me a 404 - but https://docs.openstack.org/python-cinderclient/ussuri/ exists	18:36
AJaeger	looking at the last promote job via https://review.opendev.org/#/c/719080/ - everything looks fine.	18:38
AJaeger	can we run the promote job again?	18:40
openstackgerrit	Donny Davis proposed openstack/project-config master: Adding custom label to OE for airship support https://review.opendev.org/720263	18:41
AJaeger	or has anybody an idea why after the upload there's no content?	18:42
clarkb	AJaeger: I think the job log records what it rsyncs? /me is looking	18:42
clarkb	https://zuul.opendev.org/t/openstack/build/01ec599f1d4b4aa5a8e1297d20f24e3a/log/job-output.txt#137 heh I guess not	18:43
clarkb	AJaeger: is ^ that the job that needs to be rerun?	18:43
AJaeger	yes - and rsync output is https://zuul.opendev.org/t/openstack/build/01ec599f1d4b4aa5a8e1297d20f24e3a/console#1/0/23/localhost	18:44
AJaeger	https://zuul.opendev.org/t/openstack/build/01ec599f1d4b4aa5a8e1297d20f24e3a/console#1/0/23/localhost	18:44
clarkb	AJaeger: hrm that seems to show files being copied to the correct place. We need an index.html for your url to work right?	18:46
clarkb	(and there is an index.html copied)	18:47
clarkb	AJaeger: if you look in afs the files are there	18:48
openstackgerrit	Andreas Jaeger proposed openstack/project-config master: Use TOX_CONSTRAINTS_FILE in release script https://review.opendev.org/720265	18:48
AJaeger	clarkb: they are in afs - but not displayed on docs.o.o?	18:49
AJaeger	clarkb: do you also get a 404 on https://docs.openstack.org/python-cinderclient/latest/ ?	18:49
clarkb	AJaeger: I do	18:49
clarkb	if I try to navigate to /afs/openstack.org/docs/python-cinderclient/latest on static01.o.o that fails	18:50
clarkb	so I think this is an afs issue	18:50
AJaeger	;(	18:50
clarkb	perhaps related to [Wed Apr 15 00:08:56 2020] afs: Waiting for busy volume 536871090 () in cell openstack.org	18:50
clarkb	I'm going to try and invalidate the cache things for that path	18:51
clarkb	I just have to remember how to do that	18:51
fungi	oh, maybe the vos release hasn't completed?	18:54
clarkb	maybe? fwiw `fs flush` on that path fails because it thinks it doesn't exist	18:55
clarkb	flushing the parent dir didn't help	18:55
fungi	/afs/openstack.org/docs/python-cinderclient/latest wouldn't exist, would it? i thought that was a redirect	18:55
clarkb	fungi: it's perfectly navigable on hosts that are not static01.opendev.org	18:56
fungi	oh, i see	18:56
clarkb	fungi: and the log AJaeger shared shows we copy directly into it	18:56
fungi	and also /afs/.openstack.org/docs/python-cinderclient/latest is navigable from static	18:56
clarkb	I don't think it is a redirect	18:56
clarkb	listvldb shows there isn't a release in progress	18:56
fungi	oh, right, we redirect to latest	18:57
fungi	yeah, so this does seem like a cache problem if other clients see it	18:57
fungi	yesterday's kernel upgrade seems to have broken my local openafs lkm	18:57
clarkb	we could try restarting openafs services on static or rebooting it	18:57
clarkb	we could also try a flushvolume	18:58
clarkb	which is the more heavy handed version of flush that applies to the volume entirely	18:58
clarkb	should I try fs flushvolume first? that seems like maybe the least heavy handed thing we can do next	18:58
clarkb	didn't hel	18:59
clarkb	*help	18:59
corvus	AJaeger, clarkb, fungi, mordred, cmurphy: some uploads of the old image have completed, so i think we should be back to where we were yesterday	19:00
fungi	thanks corvus, mordred!	19:00
clarkb	corvus: I can recheck my ensure-pip change to check	19:00
AJaeger	thanks, corvus and mordred !	19:00
clarkb	also looks like systemctl stop openafs-client ; systemctl start openafs-client might be the next thing to try on static?	19:01
clarkb	that will blip everything though	19:01
fungi	less of a blip than a reboot at least	19:01
fungi	but yeah, that's where i'd go next unless corvus has suggestions	19:01
cmurphy	thanks corvus	19:02
fungi	clarkb: is the kernel logging anything	19:02
fungi	ahh, just the "Waiting for busy volume"	19:02
clarkb	fungi: kern.log just shows those waiting for busy volume	19:02
clarkb	ya	19:02
fungi	clarkb: more than one volume though	19:02
clarkb	let me see what volume that id belongs to	19:02
fungi	looks like they were all for volume 536870992 in previous weeks, but 536871090 is the one from earlier today	19:03
*** hashar has quit IRC		19:04
clarkb	project.airship maybe? it's got 536871091 and 536871092 now	19:04
fungi	that's for the https://docs.airshipit.org/ site	19:06
clarkb	ya	19:07
clarkb	which isn't where python-cinderclient docs are stored so could be those warnings are just noise?	19:07
fungi	i'm suspecting they may be unrelated, yes	19:07
fungi	especially since they're occurring infrequently	19:08
fungi	there's only one entry in dmesg from today, and it was around 08:00z if memory serves	19:08
AJaeger	http://zuul.opendev.org/t/zuul/stream/3d52a5bad3f643528d1ab115d12756bc?logfile=console.log is an opensuse-15 log ;)	19:11
clarkb	I'm not coming up with anything better than stop starting openafs-client. Except for maybe use the rw volume for now	19:11
clarkb	(and that will let us debug further)	19:11
AJaeger	and clarkb's change passed now	19:13
*** factor has quit IRC		19:14
*** factor has joined #opendev		19:14
corvus	oy, there's another fire? /me catches up on afs stuff	19:14
clarkb	fwiw I checked lsof against that path and its parent and it says nothing has parent open and child doesn't exist	19:15
clarkb	(just in case there would be clues in the kernel file tables)	19:15
openstackgerrit	Merged opendev/system-config master: Use project-config from zuul instead of direct clones https://review.opendev.org/719343	19:16
openstackgerrit	Merged opendev/system-config master: Remove infra-prod-update-system-config from etherpad https://review.opendev.org/720261	19:16
clarkb	mordred: ^ fyi	19:17
mordred	clarkb: woot	19:17
mordred	I have renamed the zuulcd user and moved the home dir - so that _should_ run without issue	19:18
clarkb	I need to find lunch. On static.o.o's /afs/openstack.org/docs/python-cinderclient/latest issue my only current input is that maybe we need to restart openafs-client there. I can't find anything in logs or vos output saying that it is unhappy. But it definitely doesn't seem to stat	19:20
clarkb	the dir does stat and is navigable on other hosts	19:20
corvus	clarkb: i have run some flush commands as root and they made it better	19:20
clarkb	corvus: hrm I ran fs flush on the cinderclient/ and cinderclient/latest paths as well as flushvolume on cinderclient/ and cinderclient/latest	19:21
corvus	clarkb: as root?	19:21
clarkb	corvus: I ran those from static01. did you do differently?	19:21
clarkb	yes	19:21
corvus	huh. then maybe the 'fs checkvolumes' command helped	19:21
corvus	i ran that as non-root, but initially didn't think it did anything, but i may have been mistaken	19:22
AJaeger	https://docs.openstack.org/python-cinderclient/latest/ is working now - thanks!	19:22
corvus	at any rate, some combination of those 3 commands run as some combination of non-root and root seem to have helped	19:22
clarkb	corvus: http://paste.openstack.org/show/792181/ that is what I ran	19:22
corvus	if it happens again, maybe we can narrow it down more	19:22
AJaeger	let me spider again ;)	19:23
AJaeger	(openstack-manuals merge does some sanity check for indices)	19:23
corvus	clarkb: me too, though i did it from the python-cinderclient directory against '.'	19:23
corvus	mordred: i think we can remove nb from the emergency file now, yeah?	19:24
clarkb	so ya maybe checkvolumes was what we needed. I'll keep that in mind for testing if this comes up again	19:24
clarkb	(basically try that first then test paths I guess)	19:24
mordred	corvus: yes - I agree - I'll do that in just a bit	19:28
mordred	corvus, clarkb : the project-config change did not work - we hit retry limit on it in deploy pipeline - I'm looking on the zuul scheduler to try to figure out why	19:29
clarkb	mordred: k, I'm making a burger but can help after lunch	19:29
dirk	corvus: ajaeger: cmurphy: the original issue is fixed if we'd get a new dib release	19:32
dirk	There is a fix in there that would make pip-and-virtualenv element work again and then we have time to figure out things	19:33
mordred	clarkb: I may need it - I'm not sure what I'm looking for :(	19:33
clarkb	mordred: usually if you grep the job name you find the jobs that ran. They'll have an event id in the logs then you grep that id and do a trace	19:33
clarkb	at least that's been how I've debugged similar in the past. Also you can look in logstash if we are caught up	19:34
clarkb	but it will only have info if there were logs published	19:34
openstackgerrit	Merged openstack/project-config master: Adding custom label to OE for airship support https://review.opendev.org/720263	19:35
mordred	clarkb: I'm dumb	19:36
mordred	clarkb: I missed a rename	19:36
mordred	clarkb: turns out - when you rename a user in /etc/passwd - you ALSO need to rename the user in /etc/shadow :)	19:36
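A minimal sketch of the point above: both /etc/passwd and /etc/shadow key each entry on the login name in the first colon-separated field, so a manual rename has to touch both. The sample file contents and the new name `zuul` are assumptions made for illustration (only the old name `zuulcd` appears in the log), and in practice `usermod -l` would normally handle both files. The home directory field is deliberately left alone here.

```python
def rename_user(file_text: str, old: str, new: str) -> str:
    """Rename a login by rewriting the first colon-separated field.

    Works for passwd-style and shadow-style text alike, since both
    store the login name as field 0 of each line.
    """
    out = []
    for line in file_text.splitlines():
        fields = line.split(":")
        if fields[0] == old:
            fields[0] = new
        out.append(":".join(fields))
    return "\n".join(out) + "\n"


# Hypothetical sample contents, not taken from any real host.
passwd = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "zuulcd:x:1001:1001::/home/zuulcd:/bin/bash\n"
)
shadow = (
    "root:*:18000:0:99999:7:::\n"
    "zuulcd:!:18367:0:99999:7:::\n"
)

new_passwd = rename_user(passwd, "zuulcd", "zuul")
new_shadow = rename_user(shadow, "zuulcd", "zuul")
```

Renaming only `passwd` would leave the `shadow` entry under the old login, which matches the mismatch described in the chat.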
*** factor has quit IRC		19:36
mordred	clarkb: I want enqueue-ref for re-triggering the deploy pipeline right?	19:38
mordred	corvus: ^^ ?	19:39
mordred	does zuul enqueue-ref --pipeline deploy --ref refs/changes/43/719343/19 --trigger gerrit --tenant openstack --project opendev/system-config look reasonable?	19:41
mordred	or I need newrev and oldrev don't I?	19:42
*** factor has joined #opendev		19:43
mordred	it's a change-merged trigger - so I think I don't	19:44
corvus	for change merged you want 'enqueue'	19:44
mordred	ah - cool	19:45
corvus	should be just like a check/gate enqueue	19:45
*** osmanlicilegi has quit IRC		19:45
mordred	corvus: zuul enqueue --pipeline deploy --change719343 --trigger gerrit --tenant openstack --project opendev/system-config	19:45
mordred	so that look ... sigh. with a =	19:45
mordred	zuul enqueue --pipeline deploy --change 719343 --trigger gerrit --tenant openstack --project opendev/system-config	19:46
mordred	that look more sane?	19:46
corvus	--change 719343,19	19:46
mordred	k. hopefully it'll work more better this time	19:47
mordred	thanks	19:47
mordred	corvus: it is at least running - so yay!	19:48
mordred	corvus: I have removed nodepool from emergency - so we should get a nodepool ansible run this time too	19:50
mordred	corvus, clarkb : infra-prod-install-ansible has run successfully from /home/zuul	19:51
mordred	\o/	19:51
corvus	mordred: woot!	19:51
mordred	corvus, clarkb : we should be able to land https://review.opendev.org/#/c/720231/ now	19:52
openstackgerrit	Monty Taylor proposed opendev/system-config master: Add /opt/lib/git to the volume mounts https://review.opendev.org/720225	19:53
mordred	corvus, clarkb also that ^^ which should fix the local mirror issue	19:55
*** jkt has quit IRC		20:09
AJaeger	config-core, please review https://review.opendev.org/720265 - small cleanup for release	20:10
*** jkt has joined #opendev		20:10
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	20:17
corvus	mordred: i'm about to start looking into your question on ^, ok?	20:17
corvus	mordred: i suspect the answer is "no it's not still accurate, and we are using the normal python3 gear install that we get in the ansible envs"	20:19
corvus	mordred: yep, pretty sure that's the case	20:22
mordred	corvus: cool!	20:23
mordred	corvus: that excites me	20:23
mordred	corvus: I'll remove that from the comment in the next iteration then :)	20:23
corvus	mordred, clarkb, fungi: so... i can't remember if i asked this question or not -- i do remember i was typing it into irc right as all the fires exploded. for the zk tls work, we can either (a) run zk-ca.sh manually on bridge and copy the resulting keys into private hostvars (like we did for the gear certs). or (b) we could (all in ansible) run zk-ca.sh on bridge and slurp up the keys to put them on the	20:25
corvus	zuul hosts.	20:25
corvus	i kind of like (b) -- the keys aren't precious, it just means that if we lose bridge, we will end up rekeying zuul. that doesn't sound like a big deal.	20:26
mordred	corvus: because zk-ca.sh won't make new certs if we already have old certs, right?	20:27
corvus	yep, it's idempotent	20:27
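A rough sketch of what "idempotent" means in this exchange: generate a file only when it is missing, so repeated runs leave existing certs alone. This is not how zk-ca.sh is implemented (the log does not show that); `ensure_cert`, the placeholder contents, and the temp directory are all hypothetical names chosen for the example.

```python
import os
import tempfile


def ensure_cert(path, generate):
    """Write generate() to path only if path does not exist yet.

    Returns True when a new file was created and False when an
    existing file was left untouched.
    """
    if os.path.exists(path):
        return False
    with open(path, "w") as handle:
        handle.write(generate())
    return True


# Hypothetical demo: the second call must not overwrite the first result.
workdir = tempfile.mkdtemp()
cert_path = os.path.join(workdir, "cert.pem")
first = ensure_cert(cert_path, lambda: "placeholder cert 1")
second = ensure_cert(cert_path, lambda: "placeholder cert 2")
```

With this shape, losing the directory simply means the next run generates fresh files, which is the "rekey if we lose bridge" trade-off discussed above.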
mordred	yeah. so I like b	20:27
mordred	I don't see any point in managing them in private hostvars if we don't have to	20:27
corvus	cool, i'll start down that path then	20:28
corvus	(i will make followup changes to 717620)	20:28
mordred	cool	20:28
mordred	fwiw - manage-projects did not run well this trigger	20:29
mordred	I am now investigating	20:29
mordred	it failed on synchronize with no logs because no_log	20:31
mordred	I'm going to say "shrug"	20:31
corvus	mordred: was that on the superseded patch?	20:31
corvus	where we replaced synchronize with the git role?	20:31
clarkb	corvus: how does the slurping in b) operate? is it different than putting things in private vars?	20:32
mordred	no - the superseded patch landed - I just re-approved the patch to replace it with the git role	20:32
corvus	mordred: i mean, what 'synchronize' operation failed?	20:33
mordred	the synchronize that we're replacing with the git role	20:33
corvus	ok, that was my question, sorry for being unclear. i agree that shrug is the right answer	20:33
fungi	clarkb: i was assuming copying files	20:33
mordred	yeah - if the other thing fails, I'll debug _that_	20:33
corvus	clarkb: yeah, it would mean a task to copy the file from bridge to the remote zuul/nodepool node	20:33
fungi	corvus: plan b sounds safe, and less hands-on	20:34
clarkb	corvus: gotcha so major difference is not tracking it in git history	20:34
clarkb	ya I think that is fine for this use case	20:34
fungi	not as fantastical as plan 9, but then what is?	20:34
corvus	oh, we're going to need all of nodepool out of puppet for this too	20:34
mordred	plan 💩	20:34
corvus	is there anything preventing rolling nb01/02/03 into containers now?	20:35
corvus	(afaik nb04 is good, with no outstanding issues)	20:35
mordred	corvus: I don't think so - I think ianw was going to start rolling each of them out	20:35
fungi	corvus: yeah, i think that was ianw's plan next, once the pip-and-virtualenv bits are settled	20:35
mordred	corvus: I think we need to do all of zk too, yeah?	20:36
corvus	cool, i'll go ahead and write the skeleton of this change, but clearly we won't be able to land it until that happens	20:36
corvus	mordred: yeah	20:36
mordred	corvus: cool. are you doing that bit in your change? or want me to start working on a change for that. also - for nodepool-launcher	20:36
corvus	mordred: i'll focus on the CA aspects for now, and deploying to zuul; if you want to start on zk and nodepool-launcher, that'd be great; i can pitch in on that when this is done	20:37
corvus	then if all that's done, we can help ianw with the nb rollout :)	20:37
corvus	mordred: oh, i just did a bunch of docker testing for zk, let me grab my docker-compose file	20:38
clarkb	before we start deploying more services with docker compose it might be a good idea to land https://review.opendev.org/#/c/719589/ and its child	20:38
corvus	clarkb: the names are changing?	20:38
mordred	corvus: yeah - isn't that swell?	20:39
clarkb	corvus: yes docker-compose was chomping the - in dir names but now it doesn't	20:39
corvus	nice	20:39
corvus	clarkb: what happens with the upgrade?	20:39
clarkb	https://review.opendev.org/#/c/719682/ is my attempt at testing that upgrade path	20:39
clarkb	corvus: ^ seems to show everything works even with the name change, but reviewing this upgrade change is probably worthwhile too	20:39
clarkb	I was also hoping I could spend a bit more time trying to formalize what that change does into a generic upgrade testing job/tool	20:40
corvus	what does "work" mean? does it restart/recreate containers or does it just recognize old names as its own containers still?	20:40
clarkb	corvus: based on testing it stopped the old containers and started the new containers with no problems despite the name check. When I didn't do the updated test sed's in that change we failed testinfra tests because those old containers did not exist anymore	20:41
clarkb	corvus: that implies to me that it's stopping old name properly, then starting new name properly	20:41
clarkb	(the job runs everything with old version, upgrades docker-compose, runs docker-compose up --force-recreate, then reruns testinfra)	20:42
corvus	clarkb: does --force-recreate cause the restart?	20:42
corvus	we don't normally run that, right?	20:42
clarkb	corvus: ya it's the flag that says stop and start even if container images haven't changed	20:42
corvus	i'm just trying to figure out what happens to gerrit when we land https://review.opendev.org/719589	20:42
clarkb	correct we normally rely on images to have changed in order to trigger the restarts	20:42
corvus	so if that's omitted, and we upgrade docker-compose, do we know what happens?	20:43
clarkb	corvus: I see you're thinking that maybe new docker-compose will restart even without the force	20:43
clarkb	we can test that :) one moment I'll get a patchset up for that case	20:43
corvus	yeah, it might (a) do nothing (yay) (b) restart without any prompting (meh) (c) run a second copy (boo)	20:43
corvus	my guess based on your test so far is (a), but would be good to confirm that, because (c) would be bad.	20:44
mordred	four legs good, c bad	20:45
openstackgerrit	Clark Boylan proposed opendev/system-config master: DNM Test docker-compose upgrade https://review.opendev.org/719682	20:46
clarkb	corvus: mordred ^ that runs docker-compose up -d which is what we normally run. Then it runs testinfra against the old names (we should expect this to pass), then it updates testinfra to check for new names and runs testinfra. This last testinfra run should fail	20:46
clarkb	if the last testinfra run passes it implies we are running both sets of containers	20:47
clarkb	and if the second to last fails it implies a restart happens even though we don't force it to	20:47
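The test design described above reduces to a small decision table over two testinfra results. The sketch below encodes it using the (a)/(b)/(c) outcomes named earlier in the conversation; the function name and the return strings are made up for illustration.

```python
def interpret_upgrade_test(old_names_pass: bool, new_names_pass: bool) -> str:
    """Map the two testinfra runs onto the outcomes discussed in the chat.

    old_names_pass: testinfra run against the pre-upgrade container names
    new_names_pass: testinfra run against the post-upgrade container names
    """
    if old_names_pass and not new_names_pass:
        return "a: nothing happened, old containers still running"
    if not old_names_pass and new_names_pass:
        return "b: containers were restarted under the new names"
    if old_names_pass and new_names_pass:
        return "c: both sets of containers are running"
    return "unexpected: no containers found under either name"


# The expected (hoped-for) result: old names pass, new names fail.
expected = interpret_upgrade_test(old_names_pass=True, new_names_pass=False)
```

The fourth branch is not mentioned in the chat; it is included only so the function is total.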
corvus	mordred: http://paste.openstack.org/show/792186/ zk docker compose and config file from my testing -- for the first pass of containerization, we should drop all the tls stuff obviously	20:48
corvus	mordred: that's based on the upstream documentation for using the container images with docker-compose, so it's shiny and new	20:48
corvus	clarkb: ack sounds good, thx	20:48
corvus	mordred: and we have actual real different hosts, so we don't need to worry about the ports and docker-based hostnames and stuff	20:49
clarkb	also actual different hosts are important for taking advantage of reliability there	20:49
corvus	and we probably have some tuning in our current config we should make sure not to lose	20:50
clarkb	(though I guess we can't guarantee they are on different hypervisors)	20:50
corvus	so all in all, maybe a few lines of that paste will be useful, but it's a good reference :)	20:50
clarkb	corvus: ya we force it to rotate the journal and bump up the write to disk time	20:50
openstackgerrit	Merged opendev/system-config master: Switch to prepare-workspace-git https://review.opendev.org/720231	20:58
mordred	corvus: ++	20:58
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	21:05
clarkb	apparently my new yaml in that test change isn't valid for jinja?	21:12
clarkb	its the ' unbalancing again	21:13
clarkb	I should just start typing without any 's and the issue will go away	21:13
openstackgerrit	Clark Boylan proposed opendev/system-config master: DNM Test docker-compose upgrade https://review.opendev.org/719682	21:13
clarkb	trying again	21:13
corvus	clarkb: be like data; no contractions	21:13
*** DSpider has quit IRC		21:14
corvus	mordred: there seems to be a chunk of puppet in the ansible for the zuul-scheduler role :)	21:15
clarkb	corvus: what about compression?	21:17
mordred	corvus: you're just imagining that	21:17
corvus	clarkb: i believe his upper spinal support is a poly-alloy, designed to withstand extreme stress	21:18
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	21:18
corvus	mordred: left a second comment on that too	21:19
mordred	agree	21:19
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	21:20
openstackgerrit	Merged opendev/system-config master: Add /opt/lib/git to the volume mounts https://review.opendev.org/720225	21:26
mordred	clarkb, corvus : we're going to need to restart the gerrit container to pick that up	21:31
mordred	maybe we should wait until the compose change lands too	21:31
mordred	so that we just do one restart	21:32
corvus	mordred: i'm inclined to restart asap -- i have clones from those urls and they are several days out of date. maybe i'm the only one, but if not, then it's a service impact	21:33
corvus	(also, we're going to need to trigger a full replication of everything to that after restarting)	21:33
clarkb	ya I think the only reason we need to wait is if we are worried about not restarting gerrit gracefully	21:34
clarkb	because the docker-compose stack also addresses ^	21:34
corvus	mordred: and our container is going to take up a lot of extra space -- so maybe we should --recreate it?	21:34
corvus	clarkb: is there a way to gracefully shut it down now? with a plain docker command maybe?	21:35
mordred	we could do a docker-compose exec to send the hup	21:35
mordred	corvus: and yes - let's do recreate for sure	21:35
corvus	and we don't have 'restart: always'?	21:35
clarkb	corvus: ya I'm not sure. What we want to do is hup it then wait long enough for it to stop on its own	21:36
clarkb	which is less than a minute with our version of gerrit iirc	21:36
corvus	right i'm just wondering if we do that will docker restart it	21:37
clarkb	oh mordred ^	21:37
mordred	we do not have restart: always	21:37
mordred	so I think it will not	21:37
corvus	ok	21:37
corvus	i guess we're waiting for that to land on disk	21:38
mordred	oh fun	21:38
mordred	https://zuul.opendev.org/t/openstack/build/90c6d2ddf8204800aab15a26e05952e8	21:38
corvus	how does that not have a default value	21:40
corvus	mordred: i guess just add that to the role invocation?	21:40
clarkb	corvus: mordred maybe because we add host the server	21:40
clarkb	I bet you can set it when you add host?	21:41
openstackgerrit	Monty Taylor proposed opendev/system-config master: Add port and user_dir to add_host in prod playbook https://review.opendev.org/720293	21:41
corvus	ohhh	21:41
corvus	clarkb wins	21:41
mordred	yeah - I think that should do it	21:41
mordred	I added ansible_user_dir - just from looking at the role for other things it wants	21:41
clarkb	I feel like I'm learning a lot about ansible :)	21:41
mordred	clarkb: me too!	21:41
mordred	(we could also add a default(22) to the role there)	21:42
clarkb	mordred: should system-config-run-nodepool have a parent of system-config-run-containers? or does it not matter because it is consuming images from the zuul tenant?	21:43
mordred	clarkb: that's right - that base job is only for jobs where we're depending on containers we're building	21:43
mordred	(that's right- it doesn't matter)	21:44
corvus	mordred: we should run our zuul containers as non-root users	21:44
corvus	10001 is set up as the zuul user in the container	21:47
corvus	er in the image	21:47
mordred	corvus: ++	21:47
mordred	we should run them as that	21:47
corvus	and likewise, same number is the nodepool user in the np images	21:47
mordred	corvus: the images set USER already ... so don't these start as that user absent other intervention?	21:48
corvus	do they?	21:48
corvus	i didn't see that they did	21:48
mordred	oh - I guess not	21:48
clarkb	what does the USER directive do in that case?	21:49
mordred	we don't do one	21:49
corvus	clarkb: the user directive says what user to run as	21:49
corvus	in the image	21:49
mordred	yeah- so if we DID do a USER, it would run as that - but we don't, so we need to set it in the compose	21:49
corvus	the current state of the nodepool/zuul images is that they have a unix user created in the filesystem of the image, but they run as root by default. but we can tell docker to run as that user.	21:49
clarkb	corvus: oh its for build time	21:50
corvus	clarkb: USER affects build and run	21:50
corvus	(you can use it during build to switch users for build activities; and the last USER line also says what it will run as by default)	21:51
clarkb	got it	21:51
corvus	which makes a weird sort of sense when you think of building and running images as the same thing, which docker does	21:51
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	21:52
mordred	user: zuul added	21:52
mordred	good catch	21:52
corvus	we should be able to run as 10001 everywhere except zuul-fingergw, which still probably wants to be run as root since we run in host networking; that way it can grab the port and drop	21:52
mordred	oh. yeah. lemme fix fingergw - I forgot about port drop	21:53
corvus	oh, and i have no idea about nodepool-builder :)	21:53
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	21:53
mordred	corvus: I think just running n-b as root makes sense- otherwise it's just going to be sudoing all over the place anyway	21:53
mordred	oh - hah. we run as nodepool but with privileged: true on	21:54
corvus	huh, we apparently run the builders as the nodepool user	21:54
mordred	yeah	21:54
mordred	so I guess diskimage-builder sudos where necessary?	21:54
mordred	I mean - whatever it's doing is apparently working	21:55
clarkb	ya it should sudo	21:55
clarkb	corvus: mordred it appears to have recreated the containers	22:00
clarkb	that's painful	22:00
clarkb	I'll get links to logs once the buildset reports	22:00
corvus	mordred: we have some files with owner: zuul... we may want to change that to owner: 10001 ?	22:01
corvus	(maybe later we could re-id the zuul / nodepool user as 10001?)	22:02
corvus	mordred: i'm looking at the 'add github key' task in your change	22:02
mordred	clarkb: ok. so we need to emergency review when landing that	22:03
mordred	corvus: the zuul/nodepool user already is	22:03
clarkb	https://zuul.opendev.org/t/openstack/build/521f38be044647e59b4c621749841bbd/log/job-output.txt#17194 its recreating there then we fail when checking the old names at https://zuul.opendev.org/t/openstack/build/521f38be044647e59b4c621749841bbd/log/job-output.txt#17477	22:03
mordred	corvus: we set them as 10001 in the images because that's what they are in opendev :)	22:03
clarkb	mordred: well review doesn't actually docker-compose up ever I think	22:03
clarkb	that's always manual	22:03
corvus	mordred: no way. wow. cool.	22:04
clarkb	but all of the other services we'll need to have a think about?	22:04
corvus	clarkb: yeah, but i suspect they should all be okay. except we'll probably leak something in nb04. but we do anyway.	22:04
clarkb	ya so maybe this is a "land it when there haven't been fires all day and we can pay attention to things as it goes in change"	22:05
openstackgerrit	James E. Blair proposed opendev/system-config master: WIP: add Zookeeper TLS support https://review.opendev.org/720302	22:05
clarkb	I'll WIP the change now	22:05
fungi	clarkb: i'm looking forward to a day with no fires	22:06
corvus	mordred: ^ if you have a quick second to look at 720302 as an early draft, that'd be great	22:06
clarkb	gitea is the one I worry about most since our restart process relies on a new image being available	22:06
clarkb	we might be able to coincide the docker-compose update with a new image somehow and have it run through its normal updates	22:07
corvus	mainly looking for feedback about how i set it up for delegation. the role is heavyweight, so that using it should basically be a one-liner to each of the zuul/nodepool service roles, then updating their config files to point to the locations.	22:07
corvus	(and yeah, i'm thinking of having the nodepool and zookeeper config files point to /etc/zuul/certs/cert.pem)	22:08
corvus	(cause why not)	22:08
clarkb	fungi: ya it might be wishful thinking. I just want to balance "restart all the things" against "we probably need to make this transition at some point so better when all the things is relatively small"	22:09
clarkb	we could do it service by service too fwiw	22:09
clarkb	then only merge docker-compose install into the ensure-docker role once all existing services use new docker-compose	22:10
mordred	corvus: that's the flock incantation that waits for the lock?	22:10
clarkb	infra-root ^ would you prefer I split it up that way and we can iterate through it?	22:10
corvus	mordred: yep, it's exclusive and waits by default	22:10
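For reference, the "exclusive and waits by default" behaviour can be sketched with Python's `fcntl.flock`, which wraps the same kind of advisory lock: `LOCK_EX` blocks until the lock is free unless `LOCK_NB` is added. The change under review is not shown in the log, so the lock path and the `run_locked` helper are hypothetical stand-ins, not the actual role contents.

```python
import fcntl
import os
import tempfile


def run_locked(lock_path, action):
    """Run action() while holding an exclusive advisory lock on lock_path.

    fcntl.flock with LOCK_EX blocks until any other holder releases the
    lock, so concurrent callers are serialized rather than failing.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return action()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Hypothetical lock file location, used only for this demo.
lock_path = os.path.join(tempfile.gettempdir(), "zk-ca-example.lock")
result = run_locked(lock_path, lambda: "generated")
```

Serializing callers this way would let several service roles invoke the same generation step without racing each other.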
mordred	corvus: cool - I think that approach looks good	22:10
fungi	clarkb: what's the list of services we're currently deploying that way?	22:11
fungi	gitea, gerrit, etherpad, one of the nodepool builders...	22:11
clarkb	fungi: https://review.opendev.org/#/c/719589/ the list of services are roughly represented by the playbooks/roles files there	22:11
fungi	just trying to judge possible impact	22:11
mordred	clarkb: honestly - I think I'd go with the bandaid myself - we already serialize gitea, so it should be fine	22:11
mordred	we don't do gerrit by default, so it should be fine	22:11
mordred	so we're really just talking about etherpad and nb04	22:12
clarkb	etherpad, gerrit, gitea, haproxy, jitsi, nodepool-builder, docker registry, zuul-preview	22:12
mordred	(as things where a restart might have a noticeable impact we should worry about)	22:12
fungi	clarkb: yep, basically the set i was thinking of	22:12
fungi	okay	22:12
clarkb	mordred: that's a good point re gitea. We may have to do a replication to everything after but that's relatively low effort	22:13
fungi	and yeah, the current set seems small enough we can probably just juggle them all in one go	22:13
mordred	yeah - mostly seems like the review/land burden of doing them one at a time might actually be more costly on the team	22:13
mordred	but - definitely not today	22:14
clarkb	ya I'll leave the WIP in place for now but if things are calmer tomorrow maybe we give it a go then	22:14
mordred	++	22:15
clarkb	https://review.opendev.org/#/c/720030/ is a related change that is completely safe to land now if anyone wants to look at it (ensures we run jobs when updating dockerfiles)	22:15
mordred	corvus: left one thought on there - it's not important, just a thing we might want to think of as a followup	22:17
*** prometheanfire has quit IRC		22:17
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: ensure-tox: use ensure-pip role https://review.opendev.org/717663	22:18
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Update Fedora to 31 https://review.opendev.org/717657	22:18
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Make ubuntu-plain jobs voting https://review.opendev.org/719701	22:18
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Document output variables https://review.opendev.org/719704	22:18
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Python roles: misc doc updates https://review.opendev.org/720111	22:18
corvus	mordred: cool yeah, i don't like the number in there either. we'll just want to make it work for both zuul and nodepool	22:19
corvus	i'm well past eod now, so i'm going to, well, eod.	22:19
ianw	i just noticed a revert for the suse change ... what's the plan?	22:19
clarkb	ianw: basically get https://review.opendev.org/#/c/720254/ in once sf.io 3pci confirms it works against https://review.opendev.org/717663	22:20
clarkb	ianw: then we can land the zuul-jobs stack you've got (I think this was the only objections that came up) and then we can retry with new images for suse	22:20
clarkb	ianw: as an alternative midway step dirk asserts that a dib release would make existing builds work	22:21
corvus	when we retry, we should keep the gap between image builds and landing that stack small -- keystone broke which is why we rolled back	22:21
fungi	it was specifically keystone's functional test job, yeah?	22:22
fungi	something which expects virtualenv to be present but isn't a typical tox unit test/linter/whatever model	22:22
openstackgerrit	Merged opendev/system-config master: Add port and user_dir to add_host in prod playbook https://review.opendev.org/720293	22:23
clarkb	ianw: fwiw now that the docker-compose thing is on semi hold I'm available to keep pushing on the suse things	22:23
clarkb	at least for a few more hours	22:23
fungi	pizza time is just about over and then i can get back to looking at etherpad/apache logs	22:23
ianw	the only thing with rolling back is that new images won't work because pip-and-virtualenv is broken ... i've been trying to avoid making a dib release with a pip-and-virtualenv that only sort of works by accident	22:24
fungi	so far the handful of spot checks i did showed each of the characteristic etherpad warnings was preceded by a request for that pad at the old domain name roughly a minute prior	22:25
ianw	at the time the ensure-pip stack was fully reviewed, so i'd hoped we could push forward with it, that was my thinking, anyway.	22:25
openstackgerrit	Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620	22:25
clarkb	fungi: fwiw mordred was wondering if we should drop the redirect and just continue to serve at the old name too	22:25
clarkb	fungi: maybe we put it in emergency and try that?	22:25
mordred	yeah - maybe something something cookies something state something sad	22:26
clarkb	ianw: not sure I understand your second to last message	22:26
clarkb	pip and virtualenv is broken but would build proper suse images?	22:26
clarkb	likely broken for a different platform I guess	22:26
mordred	ianw: also - not related to suse or pip - we started working on getting zuul+nodepool+zk all up on the ansible so we can roll out zk auth. just as an fyi	22:26
fungi	clarkb: maybe that would be okay... though could make getting people to use the new domain harder and prolong the problem if it's their existing cookies. still if it clears up the problem that's at least a data point	22:26
corvus	mordred, fungi, clarkb: i'd like to keep the redirect...	22:27
fungi	corvus: as would i	22:27
corvus	maybe we can confirm that's the problem before doing that	22:27
* mordred would also like to keep it		22:27
corvus	maybe by asking people to clear cookies, restart browser, and directly go to the new url... things like that	22:27
ianw	clarkb: https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip#L63	22:27
mordred	knowing how to reproduce the issue at all would be super great	22:27
clarkb	ianw: oh so we need package lists	22:28
ianw	clarkb: like how tumbleweed is a python3 only platform, but _do_py3 is commented out, so it's using the python2 logic to install the python3 path, and making links with tools with "2" in them and stuff	22:28
clarkb	hrm tumbleweed has python2	22:29
ianw	but not python2 packages i think?	22:30
ianw	anyway ... i don't want anyone to invest a lot of time fixing things up, and i don't want to spend a lot of time reviewing it, when we want to get rid of it asap	22:30
clarkb	thats fair	22:31
clarkb	hrm git/gerrit/zuul don't like my ensure-pip change being set as a depends-on	22:32
clarkb	maybe I have to rebase it in properly	22:32
clarkb	working on that now	22:32
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: ensure-pip: export ensure_pip_virtualenv_command https://review.opendev.org/718224	22:34
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: fetch-zuul-cloner: use ensure-pip https://review.opendev.org/717882	22:34
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: fetch-subunit-output test: use ensure-pip https://review.opendev.org/718225	22:34
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: ensure-tox: use ensure-pip role https://review.opendev.org/717663	22:34
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Update Fedora to 31 https://review.opendev.org/717657	22:34
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Make ubuntu-plain jobs voting https://review.opendev.org/719701	22:34
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Document output variables https://review.opendev.org/719704	22:34
openstackgerrit	Clark Boylan proposed zuul/zuul-jobs master: Python roles: misc doc updates https://review.opendev.org/720111	22:34
clarkb	there was a conflict between my change and https://review.opendev.org/718224 so ya needed to be rebased :/	22:34
*** prometheanfire has joined #opendev		22:39
clarkb	ianw: I guess one risk there is we're sort of equating pip to virtualenv there in the followon change?	22:39
clarkb	ianw: do we also need to check for virtualenv and if it isn't present run the install anyway?	22:39
ianw	clarkb: sorry which one is the followon change?	22:40
clarkb	ianw: https://review.opendev.org/718224 that one	22:41
clarkb	ianw: basically with that change we introduce the idea that if pip is present then so is virtualenv (because we're installing them together when installing pip)	22:41
clarkb	I think in the default case everything will work fine, but tristanC's case might get a little odd if they aren't also installing virtualenv	22:42
clarkb	(and maybe that is ok as power users they can deal with that)	22:42
ianw	clarkb: umm, not really ... i've tried to deliberately make it not install virtualenv	22:42
ianw	it seems like we have to on Xenial, because we found that venv doesn't work there with our mirrors	22:43
clarkb	ianw: https://review.opendev.org/#/c/718224/11/roles/ensure-pip/tasks/RedHat.yaml but it is?	22:43
clarkb	gotcha there might be a few exceptions but in general it's relying on python -m venv which should be there if python is there	22:43
ianw	yeah, where it has to, such as the python2 install	22:43
ianw	but i expect that to be hardly used	22:43
clarkb	ianw: I'm mostly wondering if we need to check for python -m venv and/or virtualenv being valid in addition to `pip` in https://review.opendev.org/#/c/720254/2	22:44
clarkb	(or in a followup)	22:44
ianw	i don't think so? https://review.opendev.org/#/c/718224/11/roles/ensure-pip/tasks/main.yaml checks and prefers "-m venv" in all cases it can?	22:45
ianw	that should then be tested by the https://review.opendev.org/#/c/718224/11/test-playbooks/ensure-pip.yaml on all our platforms, to ensure that the ensure_pip_virtualenv_command is something valid	22:46
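[editor's note] The "prefer -m venv, otherwise fall back to virtualenv" selection described above can be sketched roughly as follows. This is not the role's actual logic (that lives in the Ansible tasks linked above); the function name `virtualenv_command` and its injectable parameters are hypothetical, and checking for `ensurepip` is an assumption made here because some distributions ship the `venv` module without it.

```python
import importlib.util
import shutil
import sys


def virtualenv_command(has_venv=None, which=shutil.which):
    """Pick a command for creating virtual environments.

    Prefers "python -m venv" whenever it looks usable and only falls
    back to a standalone virtualenv executable otherwise. Returns None
    when neither is available.
    """
    if has_venv is None:
        # Assumption: treat venv as usable only if ensurepip is also
        # importable, since venv needs it to bootstrap pip.
        has_venv = (
            importlib.util.find_spec("venv") is not None
            and importlib.util.find_spec("ensurepip") is not None
        )
    if has_venv:
        return [sys.executable, "-m", "venv"]
    path = which("virtualenv")
    if path:
        return [path]
    return None


if __name__ == "__main__":
    print(virtualenv_command())
```

The `None` result corresponds to the "weird spot" raised in the next lines: pip may be present while no way of creating a virtual environment exists.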
clarkb	ianw: ya but we are skipping the installs entirely if pip is already present	22:46
clarkb	so if you had pip installed but not venv or virtualenv (depending on platform) you would be in a weird spot	22:46
clarkb	I think for now it's probably fine	22:47
clarkb	because it's a corner case that only power user types like tristanC will run into	22:47
clarkb	thinking about it more I think it's ok to not worry about that too much. Basically what we're saying is if you know better then we'll get out of the way	22:49
clarkb	and if that breaks you it's on you	22:49
ianw	i'm wondering if we should be doing this for the packaged pip case -> https://review.opendev.org/#/c/720254/2/roles/ensure-pip/tasks/main.yaml	22:50
clarkb	ianw: fwiw this all started because ensure-pip broke sf.io 3pci	22:51
clarkb	and it's my understanding that happened because pip was already installed	22:51
clarkb	and this wasn't reconciling that state for some reason	22:51
ianw	well it is already installed on infra images too	22:52
clarkb	but the ensure-* roles are intended to noop if the thing they ensure is already there	22:52
clarkb	which is why corvus -1'd it	22:52
clarkb	(and why people didn't want to roll forward this morning)	22:52
clarkb	the default is to install from packages so skipping the checks when installing from packages doesn't help I don't think?	22:53
clarkb	at least not with the current sf.io testing	22:53
clarkb	I wonder if they are running jobs with a ro fs?	22:53
ianw	just that the package: install should be idempotent (i.e. noop when already installed) anyway	22:54
tristanC	clarkb: not sure what do you mean by power user, but i think that using the tox job with a python container that doesn't have sudo should not be a corner case	22:54
clarkb	tristanC: the corner case is you've preprepped the image. This role is for prepping the image	22:54
clarkb	tristanC: I think the correct way for you to use this would be to not use ensure-* anything if you are using prebuilt images without root	22:55
clarkb	but I'm also happy to try and accommodate the preinstalled case because I think it won't be uncommon	22:55
clarkb	tristanC: the corner case here is that you are using a role that will install things if necessary but you don't let it do that	22:55
ianw	tristanC: so if ensure-pip has a "package:" call with become: yes, that won't work for you, right?	22:55
ianw	even though that is idempotent, as such -- keeping to the rules of ensure-* roles that they don't do anything if the stuff is already there	22:56
tristanC	clarkb: we are not using that role, we just use the tox job provided by the zuul-jobs project.	22:57
clarkb	tristanC: on the root point the whole system has sort of been designed to make using root as safe as possible. because unfortunately a lot of stuff does need root (not necessarily tox though)	22:58
fungi	how exactly did it break for you then?	22:58
clarkb	fungi: it's because sudo rpm -q or whatever it does to check if the package is installed failed	22:58
fungi	oh, right the job not the role	22:58
clarkb	fungi: via ensure-tox consuming ensure-pip	22:58
fungi	the tox job in zuul-jobs tries to install the things it will use, so if you're preinstalling those things the job might still try to sudo even if it'll be a no-op	23:00
fungi	got it	23:00
clarkb	fungi: yup	23:00
ianw	so ... should we make the tox job not call ensure-tox? i thought we decided it wasn't yesterday?	23:00
fungi	so yeah any become would need to be guarded behind whatever conditional ensures it's a no-op	23:01
ianw	playbooks/tox/pre.yaml: - ensure-tox	23:01
clarkb	ianw: well I think there is still value in the check if pip is there without package manager case because it could be installed without the package manager?	23:01
fungi	or else the ensure roles should not be included	23:01
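[editor's note] The rule being agreed here (an ensure-* step must be a true no-op, with no privilege escalation at all, when the tool is already present) can be sketched as below. The function name `plan_ensure_pip`, the step dictionary layout and the package name `python3-pip` are hypothetical illustrations, not the zuul-jobs implementation.

```python
import shutil


def plan_ensure_pip(which=shutil.which):
    """Return the privileged steps needed to make pip available.

    The presence check itself needs no root, so an image that already
    ships pip yields an empty plan and never reaches anything that
    would require sudo.
    """
    if which("pip3") or which("pip"):
        return []  # already there: nothing to do, nothing to escalate
    # Only this branch needs privileges (the "become" part).
    return [{"action": "install-package", "name": "python3-pip", "become": True}]


if __name__ == "__main__":
    print(plan_ensure_pip())
```

Guarding the privileged step behind the unprivileged check is what lets a sudo-less image keep working, while images without pip still get it installed.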
tristanC	clarkb: fungi: iirc we already agreed that the tox job should be usable without sudo access	23:01
fungi	tristanC: yep, makes sense	23:01
clarkb	tristanC: yup I wrote the change to fix it :)	23:02
ianw	tristanC: ++ on tox job not using sudo	23:02
clarkb	but there is a weird side case where the way we pull in pip implies virtualenv (or venv) will be available	23:02
ianw	heh, well we agree on something :)	23:02
clarkb	and if you haven't built the image with virtualenv or venv it will be weird for you	23:02
clarkb	but we can't fix that in any case because there is no sudo so its not worth worrying about I don't think	23:02
ianw	i'm back to why ensure-tox is in the tox role pre.yaml playbook	23:02
clarkb	ianw: how does the job work if it isn't ensuring tox is available?	23:03
clarkb	(I don't think I followed that conversation from before)	23:03
ianw	clarkb: i thought from yesterday, i'll have to go back, we were somewhat of the agreement it was up to you to run "ensure-tox" before running the tox job	23:04
clarkb	ianw: I think the implication was that maybe tristanC should have a different tox job that didn't run any of the roles	23:04
clarkb	any of the ensure- roles	23:04
tristanC	perhaps we could drop the assumption that zuul-jobs are not meant to be usable by custom container, and then we should provide a zuul-container-jobs that provides lightweight versions of the job's play that don't use the ensure-* roles	23:05
clarkb	but I'm not sure	23:05
tristanC	those jobs could even reference public container images that are known to work with the jobs	23:05
clarkb	tristanC: I wouldn't even label them container jobs as the pattern could be useful in other systems too	23:05
clarkb	fwiw I think my change will fix this particular problem	23:06
clarkb	and we never merged the change that would break tristanC ?	23:06
clarkb	so the system is working?	23:06
ianw	http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-04-14.log.html#t2020-04-14T21:30:31 was the comment i was thinking of	23:07
ianw	"that happens to make it so that tristanC can avoid running the ensure role too if he wants to define a new tox job."	23:08
tristanC	yes, the system is working, and i'm happy to keep supporting sudo-less environment. I'm also happy to drop the support, as long as we have an agreement	23:08
clarkb	https://review.opendev.org/#/c/717663/26 is green now	23:09
clarkb	from sf.io I mean	23:09
corvus	ianw: the context for that quote was that tristanC was concerned that we were doing extra work that wasn't necessary for him.	23:09
corvus	that's different than our current understanding, which is that if we merged that change we would have broken a working system	23:10
corvus	(so, to be clear, i support tristanC optionally creating a new job that is more efficient; but at this point i don't think we're saying that should be required in order for the basic thing to work)	23:10
ianw	right, that's ok	23:12
tristanC	corvus: clarkb: it seems like there is value in being able to associate job with prepared runtime known to be working for a specific task. So perhaps we could start designing an extra zuul-jobs project that provides job play using the role from zuul-jobs.	23:12
tristanC	we could even agree on labels name and provides nodeset too	23:12
corvus	tristanC: i'm not sure i'm ready to give up on having a tox job in zuul-jobs that works everywhere	23:13
corvus	it seems like there's a path forward here, so maybe let's see how good we can make that before we fork	23:13
ianw	now i'm starting to wonder if having the virtualenv bits in ensure-pip is a good idea	23:21
openstackgerrit	Merged zuul/zuul-jobs master: Check if pip is preinstalled before installing it https://review.opendev.org/720254	23:26
ianw	looking at the keystone job	23:35
ianw	https://zuul.opendev.org/t/openstack/build/152dd7622d8b404589d09d120986ed25/log/job-output.txt#1662	23:35
*** tosky has quit IRC		23:39
*** mlavalle has quit IRC		23:47
ianw	cmurphy: ^ i can not understand where this is coming from :/	23:55
ianw	https://opendev.org/openstack/devstack/src/branch/master/stackrc#L152 ... it should be using venv ... it must be a branch or something i haven't considered	23:55
fungi	stable/stein, right?	23:56
cmurphy	https://zuul.opendev.org/t/openstack/build/152dd7622d8b404589d09d120986ed25/ is on master not stein	23:57
fungi	yeah, just double-checked	23:57
fungi	https://zuul.opendev.org/t/openstack/build/152dd7622d8b404589d09d120986ed25/log/zuul-info/inventory.yaml#131	23:57
fungi	so codesearch is returning the relevant hits in that case	23:58
fungi	only seems to appear in devstack	23:58
