openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: install python2-pip on SuSE when required https://review.opendev.org/724777 | 01:55 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 01:55 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 01:55 |
ianw | does anyone know how the nb04 config started using ipv6 addresses for the zk hosts? i can't find any discussion on it afaics | 03:29 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: nodepool-base: Quote ipv6 literals for ZK hosts https://review.opendev.org/725160 | 03:48 |
ianw | mordred / infra-root: ^ i see now we're overwriting the zk hosts; either this or https://review.opendev.org/#/c/725157/ or both should get builders connected again | 03:50 |
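[Editor's note: the quoting matters because an unbracketed IPv6 literal is ambiguous once a port gets appended to it. A rough illustration only; the addresses and config path below are assumptions, not taken from the change itself:

    # host:port strings built from bare IPv6 literals are ambiguous:
    #   2001:db8::1:2181      <- is 2181 the port, or part of the address?
    #   [2001:db8::1]:2181    <- the bracketed form is unambiguous
    # quick check that the generated config on nb04 carries the fixed form
    # (config path is the conventional nodepool location, an assumption here):
    grep -A 3 zookeeper-servers /etc/nodepool/nodepool.yaml
]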
*** ykarel|away is now known as ykarel | 04:19 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: install python2-pip when running under Python 2 https://review.opendev.org/724777 | 04:46 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 04:46 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 04:46 |
*** ysandeep|afk is now known as ysandeep | 05:20 | |
*** dpawlik has joined #opendev | 06:04 | |
*** dpawlik has quit IRC | 06:04 | |
ianw | btw virtualenv is broken on centos-7 : see https://github.com/pypa/virtualenv/issues/1810 | 06:06 |
ianw | this makes testing of zuul-jobs related things fail, so for today i give up | 06:07 |
*** dpawlik has joined #opendev | 06:07 | |
*** dpawlik has quit IRC | 06:07 | |
*** dpawlik has joined #opendev | 06:08 | |
*** ykarel is now known as ykarel|afk | 06:21 | |
*** rchurch has quit IRC | 06:31 | |
*** rchurch has joined #opendev | 06:32 | |
*** rpittau|afk is now known as rpittau | 06:33 | |
*** ykarel|afk is now known as ykarel | 06:37 | |
*** DSpider has joined #opendev | 06:53 | |
*** tosky has joined #opendev | 07:32 | |
*** sshnaidm|off is now known as sshnaidm | 07:33 | |
dpawlik | hi. Is everything OK with mirroring centos and fedora? It seems like mirror.centos and mirror.fedora were last updated 6 days ago http://grafana.openstack.org/d/ACtl1JSmz/afs?orgId=1 | 08:01 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Fix fetch-sphinx-tarball fails https://review.opendev.org/725210 | 08:09 |
*** roman_g has joined #opendev | 08:10 | |
*** ysandeep is now known as ysandeep|lunch | 08:35 | |
*** dtantsur|afk is now known as dtantsur | 08:43 | |
*** roman_g has quit IRC | 08:50 | |
jrosser | i am also having trouble with centos jobs where i get conflicting packages trying to install git-daemon http://paste.openstack.org/show/793031/ | 09:01 |
AJaeger | I see el_7_7 and el_7_8 in there- was CentOS 7.8 released and we didn't mirror completely? | 09:03 |
*** roman_g has joined #opendev | 09:03 | |
AJaeger | infra-root, please see jrosser's and dpawlik's comments on centos and fedora mirroring | 09:04 |
jrosser | AJaeger: from my very brief poke at this a couple of days ago it didn't look like the git-daemon package i need was present in the place we mirror from | 09:06 |
jrosser | and yes it seems like an incomplete mix of 7.7 and 7.8 | 09:07 |
dpawlik | ack | 09:08 |
*** ykarel is now known as ykarel|lunch | 09:22 | |
jrosser | oops i mean git-daemon package _wasnt_ present | 09:23 |
*** lpetrut has joined #opendev | 09:25 | |
*** Dmitrii-Sh has joined #opendev | 09:25 | |
*** roman_g has quit IRC | 09:38 | |
*** panda|ruck is now known as panda|pto | 09:40 | |
*** roman_g has joined #opendev | 09:55 | |
*** ralonsoh has joined #opendev | 10:03 | |
*** rpittau is now known as rpittau|bbl | 10:14 | |
*** ykarel|lunch is now known as ykarel | 10:32 | |
*** ysandeep|lunch is now known as ysandeep | 10:47 | |
*** kevinz has quit IRC | 11:04 | |
*** olaph has joined #opendev | 11:05 | |
AJaeger | infra-root, donnyd , any idea what's up with openedge? http://mirror.us-east.openedge.opendev.org/ looks down. See also https://zuul.opendev.org/t/openstack/build/65ac4493a586466781744c093ad63392 | 11:12 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Disable openedge https://review.opendev.org/725234 | 11:15 |
AJaeger | proposal to disable for now ^ | 11:15 |
donnyd | hrm... everything else is working fine | 11:17 |
donnyd | checking now | 11:17 |
donnyd | interesting... the mirror node was magically shut down | 11:18 |
AJaeger | infra-root, today's fires that I'm aware of: 1) http://mirror.us-east.openedge.opendev.org/ down ; 2) virtualenv on CentOS broken, new virtualenv release is out, we need new nodepool images; 3) CentOS 7 and Fedora mirrors are old, CentOS has a partial update to 7.8 and needs fixing | 11:19 |
AJaeger | donnyd: thanks for looking! | 11:19 |
donnyd | infra-root the mirror at OE is fixed. the machine got shut down somehow | 11:19 |
AJaeger | donnyd: thanks! So, that problem was solved quickly | 11:21 |
AJaeger | #status log mirror.us-east.openedge.opendev.org was down, donnyd restarted the node and openedge should be fine again | 11:21 |
donnyd | 9 minutes isn't too bad of a turn around time | 11:21 |
openstackstatus | AJaeger: finished logging | 11:21 |
AJaeger | donnyd: 9 minutes is excellent ;) | 11:22 |
donnyd | AJaeger: yea I logged into the project and the instance was in "shutdown" | 11:22 |
donnyd | idk how.. but anyways its back online now | 11:22 |
*** ysandeep is now known as ysandeep|brb | 11:36 | |
*** ysandeep|brb is now known as ysandeep | 11:52 | |
*** rpittau|bbl is now known as rpittau | 12:22 | |
*** ykarel is now known as ykarel|afk | 12:38 | |
*** hashar has joined #opendev | 12:53 | |
*** sgw has joined #opendev | 13:01 | |
ttx | hey everyone... was Gerrit restarted since we merged | 13:03 |
ttx | https://review.opendev.org/#/c/718478/ | 13:03 |
ttx | (Apr 29 22:52) | 13:03 |
ttx | need to know if I can start moving things around on the GitHub side | 13:04 |
frickler | ttx: I still see replication to github in the log, so I'd assume not | 13:25 |
AJaeger | ttx, it was not | 13:26 |
ttx | ok thanks! Keep me posted when it is :) | 13:26 |
*** hashar has quit IRC | 13:33 | |
*** ralonsoh has quit IRC | 13:37 | |
*** lpetrut has quit IRC | 13:49 | |
*** ralonsoh has joined #opendev | 13:49 | |
corvus | i'm looking into the centos mirror issues | 13:58 |
corvus | it looks like the volume is locked but no actual release transaction is in progress | 13:58 |
corvus | it looks like it was updating afs02.dfw when it stopped | 14:00 |
fungi | so not likely related to the 2020-04-28 afs01.dfw outage | 14:05 |
corvus | fungi: oh, that probably was it actually. the release command is run on afs01.dfw | 14:07 |
corvus | i'm going to start a screen session on afs01.dfw, grab the mirror lock, unlock the afs volume, and start a release | 14:10 |
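[Editor's note: a sketch of the manual recovery corvus describes. The vos commands and the mirror.centos volume name match the rest of this log; the flock path used to keep the periodic mirror-update job out of the way is an assumption:

    screen -S mirror-recovery
    # hold the mirror-update lock while running the release by hand
    flock /var/run/centos-mirror.lock bash -c '
      vos unlock mirror.centos -localauth
      vos release mirror.centos -localauth -verbose
    '
]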
fungi | if memory serves, it died in such a way that it was hanging clients rather than causing them to fail over to the other server, but due to a kernel panic (presumed to be from a host migration problem) it had to be rebooted | 14:10 |
*** ykarel|afk is now known as ykarel | 14:10 | |
fungi | so makes sense that the vos release command may have hung waiting for the server to respond | 14:11 |
corvus | fungi: sorry, the vos release command was *issued* on afs01.dfw; it crashed in mid process. so in this case it's afs02 waiting for afs01 to tell it to finish and unlock. | 14:12 |
corvus | basically the reverse | 14:12 |
fungi | oh, interesting | 14:13 |
fungi | if that happened before/during the afs02 reboot, i wouldn't expect afs02 to think it was waiting on anything there, but maybe it's more stateful than i realize | 14:13 |
corvus | afs02 has been up 167 days | 14:14 |
corvus | This is a completion of a previous release | 14:15 |
corvus | Starting ForwardMulti from 536870962 to 536870962 on afs02.dfw.openstack.org (full release). | 14:15 |
corvus | that's in progress now | 14:16 |
*** ysandeep is now known as ysandeep|brb | 14:18 | |
fungi | oh, wait, it was afs01.dfw which got rebooted, caffeine not connecting this morning i guess | 14:20 |
fungi | yeah, so i guess maybe it was in the middle of that when it died | 14:20 |
corvus | infra-root: i think that of AJaeger's 3 fires: #1 is done; #3 is in progress; that leaves #2 -- centos nodepool images | 14:33 |
corvus | before i just poke nodepool to make new images -- does anyone understand why a new release of virtualenv would cause our existing centos images to break? | 14:38 |
corvus | oh https://github.com/pypa/virtualenv/issues/1810 | 14:40 |
corvus | so it looks like there are 2 new releases at issue | 14:41 |
corvus | .19 bad, and is what is on our current images | 14:41 |
corvus | .18 and .20 good | 14:41 |
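[Editor's note: until rebuilt images land, a hedged local workaround is simply to exclude the bad release (version numbers are the ones named above):

    # avoid the broken 20.0.19 release; 20.0.18 and 20.0.20 are reported good
    pip install 'virtualenv!=20.0.19'
    virtualenv --version
]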
corvus | #status log unlocked centos mirror openafs volume and manually started release | 14:45 |
openstackstatus | corvus: finished logging | 14:45 |
corvus | #status log deleted centos-7-0000124082 image to force rebuild with newer virtualenv | 14:45 |
openstackstatus | corvus: finished logging | 14:45 |
*** hashar has joined #opendev | 14:45 | |
*** panda|pto has quit IRC | 14:46 | |
corvus | infra-root: that should mean that all 3 issues are being addressed | 14:47 |
corvus | centos-7-0000124083 is the replacement dib image | 14:47 |
corvus | building now | 14:47 |
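[Editor's note: for reference, forcing a rebuild this way uses the nodepool CLI roughly as follows (image names taken from the log):

    nodepool dib-image-list | grep centos-7
    nodepool dib-image-delete centos-7-0000124082
    # a replacement build (centos-7-0000124083 here) should show up shortly
    nodepool dib-image-list | grep centos-7
]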
*** mlavalle has joined #opendev | 14:48 | |
*** hashar has quit IRC | 14:50 | |
*** panda has joined #opendev | 14:50 | |
fungi | ykarel: ^ | 14:52 |
*** ysandeep|brb is now known as ysandeep | 14:52 | |
ykarel | fungi, corvus Thanks | 14:53 |
AJaeger | corvus: thanks | 14:53 |
AJaeger | dpawlik, jrosser, FYI, CentOS 7 mirror should be up to date again. | 14:55 |
dpawlik | \o/ AJaeger | 14:56 |
dpawlik | thank you | 14:56 |
AJaeger | dpawlik: corvus did the work, I just passed messages around ;) | 14:57 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: DNM: trigger registry tests https://review.opendev.org/725294 | 14:57 |
*** olaph has quit IRC | 14:58 | |
dpawlik | AJaeger, ah | 14:58 |
dpawlik | so corvus++ | 14:58 |
dpawlik | corvus, AJaeger how often do you refresh data in grafana? Just curious because http://grafana.openstack.org/d/ACtl1JSmz/afs?orgId=1 still shows 6 days | 15:00 |
fungi | i just approved clarkb's 644432 fix which should correctly attempt to apply read-only settings to retired projects which were missing them | 15:01 |
fungi | that may run longer than usual | 15:01 |
clarkb | mordred: ^ I think that won't cause any issues other than timing out the manage-projects job potentially | 15:02 |
clarkb | in which case we can run it again I suppose | 15:02 |
clarkb | dpawlik: the grafana data is generated by a script; I expect corvus manually fixed things and the UDP packets for timing info weren't sent | 15:03 |
mordred | clarkb: ++ | 15:04 |
fungi | dpawlik: it also may not update until the image builds are completed | 15:04 |
dpawlik | clarkb, fungi thanks for explanation | 15:04 |
fungi | dpawlik: ykarel: the centos-7-0000124083 build corvus mentioned is getting logged at https://nb01.openstack.org/centos-7-0000124083.log (self-signed ssl cert, sorry) and once that's done, it still has to get uploaded to our providers which also takes a few minutes | 15:08 |
ykarel | fungi, Thanks | 15:10 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Do not use bare 'item' in build-container-image https://review.opendev.org/725298 | 15:11 |
openstackgerrit | Brian Haley proposed openstack/project-config master: Update Neutron grafana dashboard https://review.opendev.org/725299 | 15:12 |
*** ysandeep is now known as ysandeep|away | 15:13 | |
corvus | AJaeger, dpawlik: the release is still in progress; so i don't think the mirror is up to date yet | 15:23 |
dpawlik | corvus, ack | 15:24 |
clarkb | no lists ooms since robots.txt was updated | 15:27 |
clarkb | zuul scheduler looks like it might need a sigusr2 pair again | 15:28 |
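[Editor's note: the "sigusr2 pair" refers to zuul's SIGUSR2 handler, which dumps thread stack traces and toggles profiling when available, so the signal is sent once to start and once to stop. A sketch only; the container name is an assumption:

    sudo docker kill --signal=USR2 zuul-scheduler_scheduler_1
    sleep 60
    sudo docker kill --signal=USR2 zuul-scheduler_scheduler_1
]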
AJaeger | clarkb: want to restart gerrit some time so that we stop github replication? | 15:30 |
clarkb | AJaeger: maybe? I think the jeepyb thing above is to address some part of that (fungi and ttx would know more than I if we are ready to update gerrit yet) | 15:37 |
fungi | the gerrit restart would merely be to stop github replication, the jeepyb fix above isn't a blocker for that | 15:38 |
fungi | ttx is separately working on tooling to build project-config changes to set which repositories should be replicating via zuul jobs, but the current set they're applied to is fairly comprehensive | 15:39 |
openstackgerrit | Merged opendev/jeepyb master: Inspect all configs in manage-projects https://review.opendev.org/644432 | 15:43 |
*** odyssey4me has joined #opendev | 15:44 | |
*** ykarel is now known as ykarel|away | 15:47 | |
*** diablo_rojo has joined #opendev | 15:47 | |
*** hashar has joined #opendev | 15:51 | |
*** dpawlik has quit IRC | 15:59 | |
*** rpittau is now known as rpittau|afk | 16:08 | |
openstackgerrit | Merged zuul/zuul-jobs master: go: Use 'block: ... always: ...' and failed_when instead of ignore_errors https://review.opendev.org/723643 | 16:15 |
openstackgerrit | Merged zuul/zuul-jobs master: ara-report: use failed_when: false instead of ignore_errors: true https://review.opendev.org/723644 | 16:17 |
clarkb | fungi: mordred ^ it doesn't look like the jeepyb change landing caused infra-prod-manage-projects to run. Maybe that means we can run it manually without a timeout? | 16:18 |
*** smcginnis has quit IRC | 16:19 | |
openstackgerrit | Merged zuul/zuul-jobs master: fetch-subunit-output: use failed_when: instead of ignore_errors: https://review.opendev.org/723653 | 16:20 |
clarkb | also where are we with zuul python3.8 images because if they aren't close maybe we should write an hourly cron to sigusr2 zuul :/ | 16:20 |
fungi | clarkb: good idea, i can do that after my current meeting maybe | 16:22 |
clarkb | fwiw I'll plan to sigusr2 zuul after my meeting | 16:22 |
clarkb | to hopefully reset the current trend | 16:22 |
mordred | clarkb: latest zuul images should be on 3.8 | 16:24 |
mordred | clarkb: https://review.opendev.org/#/c/724908/ landed - so restarting with a pull should have us on 3.8 | 16:24 |
clarkb | cool so maybe this sigusr2 is the last one we need and we can schedule a restart to see if 3.8 is any better | 16:25 |
mordred | yeah | 16:25 |
mordred | clarkb: we also need a gerrit restart to pick up the github repl change | 16:25 |
mordred | so maybe we do them around a similar time | 16:25 |
clarkb | mordred: we should double check that the change that landed wasn't affected by the docker image promotion bugs we were working through recently | 16:25 |
clarkb | https://hub.docker.com/r/zuul/zuul-scheduler/tags I don't know how to map that back to a change | 16:26 |
mordred | clarkb: just click through on the sha: https://hub.docker.com/layers/zuul/zuul-scheduler/latest/images/sha256-80d80631a2ce593db67e3f4827bf3a22bf2152f8a160467e869aa0713b305ccb?context=explore | 16:28 |
mordred | and at least in this case you can see it built with 3.8 | 16:28 |
*** smcginnis has joined #opendev | 16:28 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Do not use bare 'item' in build-container-image https://review.opendev.org/725298 | 16:30 |
mordred | corvus: ooh - maybe we should add a label to the images when we build them to tie them back to a change - docker build has a --label option to add additional ones at build time | 16:30 |
corvus | mordred: ++ | 16:31 |
mordred | corvus: maybe one for the change, and maybe one for the git sha of the change itself (not the merge commit) - and then maybe just one that says "built by zuul.opendev.org" or something | 16:32 |
corvus | sgtm | 16:32 |
corvus | mordred: ooh | 16:32 |
corvus | mordred: could we put in a url to the build page? | 16:32 |
*** panda is now known as panda|pto | 16:35 | |
mordred | corvus: yeah | 16:37 |
mordred | corvus: we can put anything we want to :) | 16:37 |
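[Editor's note: what the idea being discussed would look like at build time. The label keys and values below are illustrative, not the ones the eventual change uses:

    docker build \
      --label "org.zuul-ci.change=725339" \
      --label "org.zuul-ci.change_url=https://review.opendev.org/725339" \
      --label "org.zuul-ci.build_url=https://zuul.opendev.org/t/zuul/build/<uuid>" \
      -t example/image:latest .

    # reading the labels back later:
    docker image inspect --format '{{json .Config.Labels}}' example/image:latest
]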
mordred | corvus: what's the best way to get the build url in a job? | 16:47 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Enable yamllint https://review.opendev.org/725091 | 16:48 |
clarkb | infra-root on closer examination the bulk of the memory use on zuul01 is zuul-web (17GB ish) not zuul-scheduler (3GB ish) | 16:50 |
clarkb | restarting zuul-web is easy compared to the scheduler; can I just restart that one on the new python3.8 image? | 16:51 |
fungi | and sending any signals to zuul-web seems to result in stopping the process, right? | 16:51 |
clarkb | mordred: ^ | 16:51 |
corvus | clarkb: seems like we may have disproven the theory about having a memleak in latest cherrypy | 16:51 |
clarkb | fungi: yes, https://review.opendev.org/#/c/724946/ is related | 16:51 |
fungi | i expect just restarting it should be fine, even on a different python version | 16:51 |
fungi | ooh, you found the handler problem i guess | 16:52 |
clarkb | looks like tobiash has a good suggestion I need to consider | 16:53 |
corvus | mordred: unsure; there's a build.uuid variable; and the artifact promote job does some stuff with the api | 16:53 |
clarkb | corvus: I guess? it could be an interaction with cherrypy and newer python since the sigusr2 seemed to unstick the scheduler | 16:53 |
corvus | clarkb: we're running old-cherrypy on 3.7 though; much like we were before the container restarts | 16:54 |
clarkb | corvus: correct, but before we had old cherrypy + python 3.5 | 16:54 |
clarkb | I' | 16:54 |
clarkb | er | 16:54 |
corvus | oooh | 16:54 |
clarkb | I'm suggesting that python3.7 is the issue here too | 16:54 |
corvus | gotcha | 16:55 |
clarkb | which is why restarting it on 3.8 may be useful | 16:55 |
fungi | seems likely the same presumed gc issue could be impacting multiple daemons | 16:55 |
clarkb | fungi: yup | 16:55 |
corvus | clarkb: agreed; my point is that we have eliminated cherrypy alone as the cause | 16:55 |
clarkb | rgr | 16:55 |
mordred | corvus: download_artifact_api: "https://zuul.opendev.org/api/tenant/{{ zuul.tenant }}" | 16:56 |
mordred | corvus: we seem to just hardcode base api in the docs promote job | 16:56 |
clarkb | looking at docker image ls I think if I do cd /etc/zuul-web ; sudo docker-compose down && sudo docker-compose up -d we'll be running zuul-web on python3.8 | 16:57 |
corvus | mordred: since the build job is opendev specific, we can probably do that there | 16:57 |
corvus | clarkb: ++ | 16:57 |
*** dtantsur is now known as dtantsur|afk | 16:57 | |
corvus | clarkb: can't hurt to do an extra docker-compose pull before starting though | 16:57 |
clarkb | corvus: k | 16:58 |
mordred | corvus: the build-docker-image role isn't - and I think there's several generic things we can do | 16:58 |
mordred | corvus: lemme push up what I've got so far and we can go from there | 16:58 |
clarkb | alright I'll run a pull. down. then up -d in /etc/zuul-web now | 16:58 |
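[Editor's note: the restart sequence, spelled out; the final version check is an extra sanity step and the container name there is an assumption:

    cd /etc/zuul-web
    sudo docker-compose pull      # make sure the freshly promoted image is local
    sudo docker-compose down
    sudo docker-compose up -d
    sudo docker exec zuul-web_web_1 python3 --version   # expect 3.8.x
]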
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 16:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: DNM Check to see if images from intermediate work https://review.opendev.org/724751 | 16:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Write a buildkitd config file pointing to buildset registry https://review.opendev.org/724757 | 16:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Write buildkitd.toml in use-buildset-registry https://review.opendev.org/724837 | 16:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add zuul labels to images and ability to user-define more https://review.opendev.org/725339 | 16:58 |
clarkb | #status log Restarted zuul-web and zuul-fingergw on new container images on zuul01. We were running out of memory due to leak in zuul-web which may be caused by python3.7 and new images provide python3.8 | 17:00 |
openstackstatus | clarkb: finished logging | 17:00 |
*** ralonsoh has quit IRC | 17:07 | |
clarkb | memory use on zuul01 has been steadily climbing since the restart. We'll have to wait and see if we plateau | 17:52 |
clarkb | mordred: fungi: what is the process for running that playbook manually? do we need to lock anything? | 17:54 |
clarkb | or maybe we can trigger the job directly in zuul (that would be subject to timeouts but wouldn't have lock issues) | 17:54 |
*** mrunge_ has joined #opendev | 17:55 | |
*** mrunge has quit IRC | 17:56 | |
fungi | clarkb: for manage-projects? i was expecting to just fire the command locally on review.o.o instead of from ansible | 17:57 |
fungi | using docker exec or however ansible has been calling it | 17:58 |
clarkb | fungi: ya and oh ya that will work | 17:58 |
mordred | clarkb: yeah - what fungi said | 17:58 |
clarkb | I guess my concern is that if we land projects.yaml updates we could have competing processes | 17:58 |
mordred | although you can also touch the lockfile on bridge and run the playbook from the system-config dir | 17:58 |
clarkb | config-core ^ maybe hold off on landing new projects until we run a manage-projects by hand | 17:58 |
mordred | clarkb: /home/zuul/DISABLE-ANSIBLE | 17:59 |
mordred | clarkb: touch that on bridge and it'll prevent jobs from running - they have an hour timeout - so they'll resume once you rm it | 17:59 |
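[Editor's note: so the whole pause/resume dance described here is just:

    # pause: queued deploy jobs wait on this file (for up to an hour each)
    touch /home/zuul/DISABLE-ANSIBLE
    # ... do the manual work ...
    # resume: waiting jobs pick back up once the file is gone
    rm /home/zuul/DISABLE-ANSIBLE
]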
clarkb | gotcha | 17:59 |
fungi | alternatively, we can put review.o.o in the temporary disable list | 18:03 |
clarkb | fungi: that might be simpler? | 18:04 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Write buildkitd.toml in use-buildset-registry https://review.opendev.org/724837 | 18:04 |
clarkb | fwiw I'm not planning on doing it since you volunteered, but did want to have a short think about whether or not we needed to put safety glasses on first | 18:04 |
fungi | yes, absolutely | 18:04 |
fungi | my last scheduled meeting of the day just wrapped up, so catching up on a few urgent conversations and will start on that | 18:05 |
mordred | no - don't put it in the disable list | 18:05 |
mordred | touch the lock file | 18:05 |
mordred | you need to stop zuul from doing the things | 18:05 |
mordred | it's what it's there for :) | 18:06 |
*** sshnaidm is now known as sshnaidm|afk | 18:07 | |
mordred | or - I guess - honestly it's probably fine - ignore me | 18:07 |
clarkb | it should also be fine if we don't approve any new project additions | 18:08 |
mordred | but just saying "touch /home/zuul/DISABLE-ANSIBLE" should cover all the bases and also not cause anything to run in a half-configured state | 18:08 |
clarkb | fungi: also as a sanity check you can run manage-projects against a specific project or three first before doing the whole list | 18:08 |
clarkb | fungi: maybe run it against a retired project that hasn't had an acl update and a non retired project and make sure we get the expected results? | 18:08 |
fungi | yeah, i can do that | 18:11 |
fungi | mordred: `touch /home/zuul/DISABLE-ANSIBLE` on bridge will stop zuul from deploying anything to any server though, right? do we need to worry about it missing events then for other stuff we'd have deployed from unrelated changes? | 18:12 |
clarkb | I never had breakfast. At this point I'll call it early lunch. Back in a bit | 18:13 |
fungi | i scarfed some lentil chips and hummus while in a meeting | 18:13 |
mordred | fungi: it backs up | 18:18 |
mordred | fungi: so - the first job zuul enqueues will wait for up to an hour for the file to go away (and the jobs behind it will just be queued up in zuul) | 18:18 |
mordred | fungi: we might start missing things if it's in place for more than an hour - but at that point we should probably be disabling hosts and stuff | 18:19 |
mordred | it's basically a big pause button | 18:19 |
fungi | oh, okay that helps | 18:20 |
fungi | at the mere push of a single button! | 18:22 |
fungi | the beautiful shiny button | 18:22 |
fungi | the jolly candy-like button | 18:22 |
* fungi can't hold out, no sir-ee | 18:23 | |
fungi | pushing it now | 18:23 |
fungi | #status log temporarily paused ansible deploys from zuul by touching /home/zuul/DISABLE-ANSIBLE on bridge.o.o | 18:24 |
openstackstatus | fungi: finished logging | 18:24 |
clarkb | fungi: mordred you may need to pull a new image too? | 18:24 |
fungi | i'll check | 18:25 |
fungi | we're baking jeepyb into the gerrit image, or installing it into a separate image? i'm supposing the former since some of the gerrit hook scripts call into it | 18:25 |
mordred | it's in the gerrit image | 18:27 |
fungi | and yeah, /usr/local/bin/manage-projects is a wrapper calling docker exec | 18:27 |
mordred | we should maybe also just make a jeepyb image | 18:27 |
mordred | and use that for manage-projects but with a similar set of volume mounts | 18:27 |
mordred | so that we can update jeepyb independent of gerrit | 18:28 |
mordred | fungi, clarkb : I may have missed a thing - we have a jeepyb update that we need for manage-projects? | 18:28 |
fungi | docker says jeepyb==0.0.1.dev467 # git sha 9d733a9 | 18:28 |
clarkb | mordred: yes the fix for updating retired projects | 18:29 |
fungi | er pbr freeze via docker run says that i mean | 18:29 |
mordred | yah - but try it via exec | 18:29 |
mordred | since that'll be what manage-projects does - run will make a new container | 18:29 |
mordred | exec will use the existing gerrit one | 18:29 |
mordred | I think atm we're going to need to restart the gerrit container to pick up that jeepyb change | 18:29 |
clarkb | oh I thought we did run not exec | 18:30 |
mordred | or - you could run the command in /usr/local/bin/manage-projects but replace exec with run (and add an --rm) | 18:30 |
fungi | i was running `exec docker run ... pbr freeze` via a copy of the manage-projects wrapper script yeah | 18:30 |
mordred | oh! we do do run | 18:30 |
mordred | yeah- nevermind me - I for some idiotic reason thought we were execing (and planning to fix that) | 18:31 |
mordred | you should be fine :) | 18:31 |
fungi | anyway, that commit is too old i think | 18:31 |
fungi | so maybe we didn't build a new gerrit image when the jeepyb change merged, or i need to pull it | 18:31 |
mordred | yeah- you likely need to docker pull - and if that doesn't work - then we missed building the image on jeepyb change | 18:32 |
fungi | 9d733a9 is the previous commit before the fix | 18:32 |
fungi | well, i may as well check the zuul builds page | 18:32 |
*** redrobot has joined #opendev | 18:33 | |
fungi | we built system-config-promote-image-gerrit-2.13 after that change merged, so i guess it's just the pull we need | 18:33 |
mordred | yeah. I concur | 18:33 |
fungi | mordred: is it really just `sudo docker pull` on review.o.o then? no additional arguments? or do i need to specify the image name? | 18:34 |
fungi | and that won't restart the running container processes, right? | 18:34 |
mordred | either ... | 18:34 |
mordred | it will not | 18:34 |
mordred | docker pull opendevorg/gerrit:2.13 | 18:34 |
mordred | or | 18:34 |
fungi | ahh, yeah it needs the image name | 18:34 |
mordred | cd /etc/gerrit-compose ; docker-compose pull | 18:35 |
fungi | running the latter now | 18:35 |
fungi | #status log manually pulled updated gerrit image on review.o.o for recent jeepyb fix | 18:35 |
openstackstatus | fungi: finished logging | 18:36 |
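[Editor's note: put together, the pull-and-verify step looks roughly like this. The pbr freeze invocation mirrors what fungi describes; whether the gerrit image's entrypoint passes the command straight through is an assumption:

    cd /etc/gerrit-compose && sudo docker-compose pull   # does not restart the running container
    # confirm the jeepyb commit baked into the refreshed image
    sudo docker run --rm opendevorg/gerrit:2.13 pbr freeze | grep jeepyb
]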
mordred | clarkb, fungi: incidentally: https://review.opendev.org/#/c/725339/ should add metadata to our images that will let us inspect and see what change they were built from | 18:36 |
fungi | jeepyb==0.0.1.dev469 # git sha ab498db | 18:36 |
mordred | fungi: that seems better | 18:36 |
fungi | well, ab498db does not appear in jeepyb's master branch history | 18:37 |
fungi | where did that come from? | 18:37 |
mordred | that'll be the merge commit on the executor | 18:37 |
fungi | yes, metadata would be awesome, especially in cases like this | 18:37 |
mordred | yup | 18:37 |
fungi | oh, right, executor made a different merge commit than gerrit did | 18:37 |
mordred | yah - so yeah, I think the labels are going to be super helpful :) | 18:38 |
fungi | so we can't really expect merge commit shas to match between promoted images and git history | 18:38 |
fungi | at least not until we can have zuul push merge commits into gerrit | 18:38 |
mordred | yah | 18:38 |
fungi | anyway, 0.0.1.dev469 is 0.0.1.dev467 + 2 | 18:39 |
fungi | so the fix plus the merge commit for it | 18:39 |
fungi | okay, so one example of a lagging retirement was https://review.opendev.org/#/admin/projects/openstack/fuel-devops | 18:40 |
fungi | https://opendev.org/openstack/project-config/src/branch/master/gerrit/projects.yaml#L2985-L2987 does say it should have a read-only config | 18:42 |
fungi | so i'll run `sudo manage-projects openstack/fuel-devops` next and see if the status changes | 18:43 |
fungi | clarkb: mordred: sound good? | 18:43 |
fungi | s/status/state/ | 18:43 |
mordred | fungi: yes | 18:44 |
fungi | running | 18:45 |
fungi | and finished | 18:45 |
fungi | state: read only | 18:45 |
fungi | yay!!! | 18:45 |
fungi | so now i'll just run `sudo manage-projects` i guess to do them all? | 18:45 |
fungi | maybe i'll start a root screen session | 18:45 |
mordred | fungi: ++ | 18:46 |
fungi | --verbose is likely to overrun the buffer. should i use it anyway? | 18:46 |
fungi | or 2>&1 | tee something? | 18:46 |
fungi | i'll do that | 18:47 |
fungi | this is staged in a root screen session on review.o.o: | 18:48 |
fungi | manage-projects --verbose 2>&1 | tee manage-projects.2020-05-04.retirements.log | 18:48 |
* mordred joins | 18:48 | |
mordred | fungi: I agree - you have staged that | 18:48 |
* mordred is ready when you are | 18:48 | |
fungi | running | 18:49 |
clarkb | fungi: thanks sorry lunch is distracting me | 18:49 |
fungi | clarkb: summary is we needed new gerrit image pulled as you guessed | 18:50 |
fungi | and that merge commit shas in our containers don't match git history because they're executor/merger-constructed | 18:50 |
fungi | but that your fix works | 18:50 |
mordred | fungi: it at least doesn't look angry | 18:50 |
fungi | and we're in the full run in a root screen session on review.o.o now | 18:51 |
mordred | fungi: why did we just clone deb-cinder? | 18:51 |
mordred | is that the thing we have to do to properly retire things? | 18:51 |
fungi | clarkb's fix is to loop over the full project list and not just the not-retired projects list | 18:52 |
clarkb | mordred: I think we maintain a cache of things and ya to retire we'd have to populate the cache if it were missing | 18:52 |
fungi | though maybe if we don't want to cache we can make it smarter so that it skips any repos with a read-only state | 18:52 |
mordred | maybe - but meh, probably for the best at this point | 18:53 |
clarkb | ya we could probably optimize it more? | 18:53 |
fungi | that would in theory make un-retiring harder, but in reality that needs manual intervention anyway due to a different catch-22 | 18:53 |
fungi | (can't push updated gerrit config to a read-only repo, so need to add an api call to set the desired state out of band) | 18:54 |
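[Editor's note: that out-of-band step would be a call to Gerrit's standard Set Config REST endpoint, roughly as below; the credentials and project name are illustrative:

    # flip a read-only project back to active so config pushes work again
    curl -u admin:SECRET -X PUT \
      -H 'Content-Type: application/json' \
      -d '{"state": "ACTIVE"}' \
      https://review.opendev.org/a/projects/openstack%2Fsome-project/config
]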
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 18:57 |
fungi | after this completes, i think i should run it a second time and time it so we can get a rough baseline for how long a normal triggered run should require | 18:58 |
fungi | after that, assuming no surprises, i can un-pause deployment jobs | 18:58 |
clarkb | fungi: ++ though I don't expect it will be much longer than before. The noop case is much faster | 18:58 |
fungi | right, also retired repositories are something like 10% of our total repo count | 18:59 |
fungi | so skipping them wasn't a huge time savings anyway | 18:59 |
mnaser | mordred: we started the constraints support inside python-builder but never got around to wrapping that up. do you think we can find time to work together on that? | 19:01 |
* mnaser is having problems building openstack images with python-builder due to that | 19:02 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 19:02 |
mordred | mnaser: yes we can | 19:03 |
mordred | mnaser: I think we finally got multi-arch containers sorted | 19:03 |
mnaser | mordred: nice. i've been stuck a bit on finding ways to work around the constraints stuff (for now to unblock) but its not sustainable in the long term | 19:04 |
mnaser | turns out msgpack 1.0.0 breaks a lot of things :) | 19:04 |
mordred | mnaser: hah | 19:04 |
mordred | yeah - turns out constraints are important | 19:04 |
mnaser | and that was a fun one to find too.. | 19:04 |
mordred | mnaser: lemme unfog my brain after all this multi-arch, then I'll start poking at constraints again | 19:05 |
openstackgerrit | Mohammed Naser proposed opendev/system-config master: python-builder: drop # from line https://review.opendev.org/725374 | 19:18 |
mnaser | mordred: ^ something i caught in the midst of all of this too | 19:18 |
openstackgerrit | Merged zuul/zuul-jobs master: Add zuul labels to images and ability to user-define more https://review.opendev.org/725339 | 19:19 |
fungi | i wonder if the depsolver in pip will obsolete the need for constraints lists in situations like this | 19:19 |
mordred | fungi: it's a good question - although constraints still allow for a central single point - whereas without constraints you'd have to make sure every consumer of msgpack had a version pin | 19:22 |
fungi | yep | 19:22 |
fungi | i mean, ideally they should if they're broken by it, but... | 19:22 |
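[Editor's note: the difference mordred is pointing at, in plain pip terms; the package names and versions are illustrative:

    # without constraints, every consumer has to carry its own pin:
    pip install 'msgpack<1.0.0' some-consumer another-consumer
    # with a central constraints file, one line there covers everything installed:
    pip install -c upper-constraints.txt some-consumer another-consumer
]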
clarkb | that was weird, had high packet loss to my irc bouncer for a minute or 5 | 19:22 |
mordred | clarkb: it wanted you to take some time off | 19:23 |
fungi | okay, the manage-projects run completed. i'll rerun it without --verbose and time it | 19:23 |
mordred | mnaser: hah. nice | 19:23 |
clarkb | we are still seeing zuul-web memory use climb but we are nowhere near danger yet | 19:23 |
clarkb | (unfortunately I think that may be pointing to python3.8 not fixing it) | 19:24 |
fungi | or the problem not actually being a regression in the interpreter | 19:24 |
fungi | real 0m5.165s | 19:24 |
fungi | i'd call that fast enough that i don't care how much slower it got | 19:24 |
clarkb | fungi: that's the first or second run? | 19:24 |
fungi | second | 19:25 |
clarkb | and ya that seems quick enough for our purposes | 19:25 |
fungi | anybody want to spot-check anything before i un-pause deployments? | 19:25 |
mordred | clarkb: next thing to try would be disabling jemalloc I think | 19:25 |
clarkb | mordred: ++ | 19:25 |
mordred | clarkb: we should be able to do that just by setting LD_PRELOAD to '' in the docker-compose file | 19:25 |
clarkb | mordred: ok that should be easy enough to do. Also packet loss coming back again :( | 19:28 |
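[Editor's note: once that change lands and zuul-web is restarted, a quick way to confirm jemalloc is actually out of the picture; the container name is an assumption:

    sudo docker exec zuul-web_web_1 sh -c 'env | grep LD_PRELOAD; grep -c jemalloc /proc/1/maps'
    # expect an empty or missing LD_PRELOAD and a jemalloc mapping count of 0
]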
*** roman_g has quit IRC | 19:29 | |
fungi | #status log deployments unpaused by removing /home/zuul/DISABLE-ANSIBLE on bridge.o.o | 19:32 |
openstackstatus | fungi: finished logging | 19:32 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Fix siblings support in python-builder https://review.opendev.org/715717 | 19:32 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add constraints support to python-builder https://review.opendev.org/713972 | 19:32 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Don't pull and retag in buildx workflow https://review.opendev.org/725380 | 19:39 |
mordred | mnaser: do you have any convenient way to verify that ^^ works? | 19:50 |
mordred | mnaser: also - I'm less sure about the siblings patch - we're apparently using siblings support in nodepool jobs, so I want to check in with ianw before we land that one | 19:51 |
mordred | I'm *pretty* sure it's right, and I think we might just be getting lucky in nodepool | 19:51 |
mnaser | mordred: i could build them locally.. i'm not using git right now to build things (for now): https://opendev.org/vexxhost/openstack-operator/src/branch/master/images/keystone/Dockerfile | 19:51 |
mnaser | mordred: ACTUALLY i have something | 19:51 |
mnaser | mordred: https://review.opendev.org/#/c/713975/ | 19:52 |
mordred | mnaser: oh - yeah - with a depends-on that should be a good test case | 19:52 |
mnaser | mordred: feel free to update that, i think we will need to copy upper-constraints from requirements though.. | 19:52 |
mnaser | or wget it in.. | 19:52 |
mordred | kk | 19:52 |
mordred | lemme update that patch | 19:52 |
mnaser | because upper-constraints.txt is not inside the repo | 19:53 |
*** roman_g has joined #opendev | 19:54 | |
mordred | mnaser: ok - updated that patch and added a pre-playbook to copy in the upper-constraints file | 19:56 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Clear LD_PRELOAD variable on zuul-web containers https://review.opendev.org/725384 | 20:02 |
clarkb | mordred: ^ I think that is what you were suggesting | 20:02 |
mordred | clarkb: yes - I think that's a great next thing to try | 20:13 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add constraints support to python-builder https://review.opendev.org/713972 | 20:21 |
mordred | mnaser: I pulled the siblings patch out from under the constraints patch - I think it's distracting for now | 20:22 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Use tempfile in buildx build https://review.opendev.org/725387 | 20:29 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: ansible-lint: use matchplay instead of matchtask https://review.opendev.org/724910 | 20:33 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: use zj_image instead of image as loopvar https://review.opendev.org/725012 | 20:33 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: use zj_log_file instead of item as loop_var https://review.opendev.org/725013 | 20:34 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Check blocks recursively for loops https://review.opendev.org/724967 | 20:34 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Update ansible-lint-rules testsuite to only test with the relevant rule https://review.opendev.org/725014 | 20:34 |
*** hillpd has joined #opendev | 20:52 | |
clarkb | the LD_PRELOAD change should land soonish. I'll restart zuul-web again once that is in place | 20:52 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_ https://review.opendev.org/725392 | 21:04 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_ https://review.opendev.org/725392 | 21:08 |
openstackgerrit | Merged opendev/system-config master: Clear LD_PRELOAD variable on zuul-web containers https://review.opendev.org/725384 | 21:13 |
tosky | it looks like a very basic question, but... what is the API entry point for the zuul instance on zuul.openstack.org? | 21:15 |
clarkb | tosky: https://zuul.openstack.org/api | 21:17 |
tosky | clarkb: thanks! | 21:17 |
*** DSpider has quit IRC | 21:25 | |
clarkb | I'm going to restart zuul-web now without jemalloc LD_PRELOAD set | 21:32 |
clarkb | #status Log restarted zuul-web without LD_PRELOAD var set for jemalloc. | 21:33 |
openstackstatus | clarkb: finished logging | 21:33 |
clarkb | it seems incredibly stable over the last ~12 minutes | 21:45 |
clarkb | maybe the issue is jemalloc after all | 21:46 |
clarkb | (need more data to be confident) | 21:46 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all is what I'm looking at. Specifically that first graph and how level the line is now | 21:55 |
clarkb | I think if that holds overnight then maybe we drop it from our images entirely? | 21:55 |
fungi | tosky: be aware that's a legacy white-labeled api endpoint, the multi-tenant url for it is https://zuul.opendev.org/api so https://zuul.opendev.org/api/tenant/openstack/projects for example | 22:00 |
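[Editor's note: a couple of concrete requests against the endpoints fungi mentions; the projects path is the one quoted above and /api/info is zuul's standard root endpoint:

    curl -s https://zuul.opendev.org/api/info
    curl -s https://zuul.opendev.org/api/tenant/openstack/projects | head -c 300
]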
clarkb | I'm due for a bike ride and zuul-web looks stable for now so going to pop out | 22:00 |
clarkb | back in a bit | 22:00 |
tosky | fungi: thanks; that part is clear in https://zuul-ci.org/docs/zuul/reference/web.html, but I missed the starting point :) | 22:01 |
fungi | tosky: also the zuul dashboard hosts dynamic api docs: https://zuul.opendev.org/openapi | 22:03 |
tosky | that's useful, thanks | 22:03 |
ianw | infra-root: i'm not seeing we merged either fix for nb04 and ipv6 addresses ... can we do either https://review.opendev.org/#/c/725160/ or https://review.opendev.org/#/c/725157/ or both? | 22:13 |
*** tobiash has quit IRC | 22:13 | |
corvus | ianw: i forgot to +2 that after fixing it; +2 on 725157 now | 22:18 |
ianw | corvus: thanks; apropos prior discussion is centos + virtualenv ok now? | 22:19 |
openstackgerrit | Merged zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_ https://review.opendev.org/725392 | 22:20 |
corvus | ianw: i'll check the volume release; you want to check the image build? | 22:23 |
corvus | ianw: Released volume mirror.centos successfully | 22:24 |
corvus | ianw: looks like we should be up to date there; i'll exit out of my manual flock, so mirror updates should continue | 22:24 |
ianw | corvus: yeah, it may be getting held up as nb04 has dropped out of zk due to the ipv6 literals coming back | 22:24 |
ianw | thanks for looking in on mirror | 22:24 |
ianw | 00:06:55:36 for the last centos-7 | 22:25 |
corvus | ianw: that sounds about right | 22:25 |
corvus | 7 hours ago sounds about like when i started my day :) | 22:26 |
ianw | virtualenv 20.0.20 | 22:26 |
ianw | virtualenv almost takes the record from dib for emergency point releases :) | 22:27 |
*** avass has quit IRC | 22:35 | |
*** rchurch has quit IRC | 22:36 | |
*** rchurch has joined #opendev | 22:39 | |
*** hashar has quit IRC | 22:42 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2 https://review.opendev.org/724777 | 22:54 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 22:54 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 22:54 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2 https://review.opendev.org/724777 | 23:10 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 23:10 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 23:10 |
*** mlavalle has quit IRC | 23:25 | |
clarkb | ianw: corvus does https://review.opendev.org/#/c/725160/1 imply the nodepool ansiblification and containering has landed? | 23:27 |
clarkb | I was waiting on that to happen to redo my system-config reorg change | 23:28 |
*** tosky has quit IRC | 23:28 | |
ianw | clarkb: only nb04 is affected afaik atm | 23:28 |
ianw | so in a word, no | 23:28 |
clarkb | got it | 23:30 |
clarkb | ianw: is it still desirable to land that one if the other has been approved? | 23:30 |
clarkb | zuul memory use looks very stable | 23:30 |
ianw | clarkb: i'm ... not sure; zuul may have problems if we reuse that zk writing function? i didn't look into that. maybe we would want a similar check in zuul if it doesn't have one already? | 23:32 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Stop using jemalloc in python base image https://review.opendev.org/725431 | 23:32 |
clarkb | infra-root ^ I'm pushing that now and will WIP it to collect more data from zuul01, but it is looking like jemalloc is a likely source of our problems there. | 23:33 |
clarkb | ianw for now the change is specific to nodepool configs right? | 23:34 |
ianw | clarkb: yes, that's the only place it writes out the zk hosts from the inventory ATM | 23:34 |
corvus | clarkb: awesome. i guess at the end of the day, that's not a shocking conclusion is it? at least, as a hypothesis, "different malloc borks memory usage" passes the sniff test. | 23:34 |
clarkb | corvus: ya | 23:34 |
ianw | clarkb: but i imagine in the final switch we would want to do similar for zuul | 23:35 |
clarkb | corvus: also worth noting the .so version is different between xenial and our docker containers. It could be a bug in jemalloc | 23:35 |
clarkb | or a bug in python using jemalloc | 23:35 |
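[Editor's note: one way to see the version difference clarkb mentions; the library paths and the container base image name are assumptions:

    # on the xenial host:
    /sbin/ldconfig -p | grep jemalloc        # typically libjemalloc.so.1
    # inside the container base image:
    sudo docker run --rm opendevorg/python-base /sbin/ldconfig -p | grep jemalloc   # typically libjemalloc.so.2
]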
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2 https://review.opendev.org/724777 | 23:38 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 23:38 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 23:38 |
*** calcmandan has quit IRC | 23:45 | |
*** calcmandan has joined #opendev | 23:46 | |
ianw | it doesn't seem 725157 deployed itself on nb04 ... looking | 23:46 |
ianw | https://zuul.openstack.org/build/cd7fc0ea5c694631a472ab3d491d346e was the last nodepool hourly run : @ 2020-05-04T23:04:14 | 23:48 |
ianw | ok, promote missed it https://zuul.opendev.org/t/zuul/build/f7b24f73bc4e4318a1cc42488493ee13 | 23:50 |
ianw | 2020-05-04T23:10:49 | 23:50 |