fungi | yeah, i'm good either way | 00:00 |
---|---|---|
ianw | i don't feel i've been paying sufficient attention TBH | 00:19 |
fungi | ianw: so there was a race condition in change cache cleanup where a change could get enqueued while cleanup was underway and wind up with its entry removed if it had previously been present in the cache and was up for expiration in that pass | 00:48 |
fungi | that resulted in the release-approvals pipeline getting blocked over the weekend | 00:48 |
fungi | so we wanted to restart on that fix | 00:49 |
fungi | separately there's a transitional situation where builds which are paused when the scheduler is restarted will end up perpetually running from the executor's perspective, and keep nodes locked indefinitely until the executor is restarted, so a full zuul restart would clean us up from the scheduler restart last week | 00:50 |
*** dviroel|rover|afk is now known as dviroel|out | 00:50 | |
fungi | there isn't a fix per se for that second problem, but it will in theory go away once everything is being tracked in zk | 00:51 |
fungi | there's a third minor problem where project key deletion from zk left empty parent znodes behind, which was causing the key backup cronjob to emit errors after the rename maintenance last week, clarkb has a fix up for that | 00:52 |
fungi | and we'll presumably have to manually clean up the ones we've got, i think | 00:53 |
fungi | the fix there isn't urgent, we'll almost certainly have it in before the next time we need to rename any projects | 00:54 |
Clark[m] | The backups are working for the keys that do exist so that is not very urgent. Mostly just a make scary errors go away thing as they are noise | 00:58 |
fungi | yeah, that too. it's really just noise | 01:00 |
Clark[m] | And then I want to pull the Gerrit image for 3.3.7 build and restart to ensure the gerrit.config cleanups are happy. Also not urgent and can be done tomorrow | 01:02 |
fungi | sounds good, my ptg schedule tomorrow is mildly less jam-packed than it was today | 01:12 |
*** mazzy509 is now known as mazzy50 | 01:23 | |
*** ysandeep|out is now known as ysandeep | 05:02 | |
*** ykarel_ is now known as ykarel | 05:20 | |
opendevreview | Sandeep Yadav proposed zuul/zuul-jobs master: multi-node-bridge: repos to install ovs in C9 https://review.opendev.org/c/zuul/zuul-jobs/+/814516 | 06:12 |
opendevreview | Sandeep Yadav proposed zuul/zuul-jobs master: multi-node-bridge: repos to install ovs in C9 https://review.opendev.org/c/zuul/zuul-jobs/+/814516 | 06:20 |
*** ysandeep is now known as ysandeep|afk | 06:30 | |
*** ysandeep|afk is now known as ysandeep|trng | 06:58 | |
*** jpena|off is now known as jpena | 07:32 | |
*** ykarel is now known as ykarel|lunch | 08:48 | |
*** pojadhav|ruck is now known as pojadhav|lunch | 09:32 | |
*** pojadhav|lunch is now known as pojadhav | 09:59 | |
opendevreview | Pierre Riteau proposed openstack/project-config master: [kolla] Preserve Backport-Candidate and Review-Priority scores https://review.opendev.org/c/openstack/project-config/+/814548 | 10:14 |
*** ykarel|lunch is now known as ykarel | 10:34 | |
*** pojadhav is now known as pojadhav|ruck | 10:40 | |
yoctozepto | has anyone tried creating a gerrit query that lists all changes *without* hashtags? | 11:04 |
yoctozepto | meaning having 0 hashtags | 11:04 |
*** dviroel|out is now known as dviroel|rover | 11:06 | |
*** jpena is now known as jpena|lunch | 11:26 | |
*** jpena|lunch is now known as jpena | 12:15 | |
opendevreview | Merged zuul/zuul-jobs master: ensure-podman: support Debian bullseye https://review.opendev.org/c/zuul/zuul-jobs/+/814088 | 12:54 |
*** ysandeep|trng is now known as ysandeep | 12:56 | |
fungi | yoctozepto: i have not, but i'll readily admit i haven't messed around with hashtags much yet | 13:22 |
yoctozepto | ack, thanks for responding | 13:23 |
fungi | yoctozepto: did you try hashtag:'' (the hashtag parameter seems to suggest that it will only match one) | 13:26 |
fungi | https://review.opendev.org/Documentation/user-search.html#hashtag | 13:26 |
opendevreview | Mark Goddard proposed openstack/project-config master: kolla-cli: end gating for retirement https://review.opendev.org/c/openstack/project-config/+/814580 | 13:27 |
Clark[m] | yoctozepto: fungi: newer Gerrit seems to have the inhashtag search operator. This should allow you to search for -inhashtag:"^.*" But our Gerrit isn't new enough. If you are consistent enough about hashtags you can negate a specific list of them instead. | 13:35 |
fungi | inhashtag is added in 3.4? | 13:36 |
Clark[m] | I'm not sure when it got added. Gerrit upstream documents it currently but our Gerrit reports it isn't valid | 13:37 |
Clark[m] | Might be a 3.5 feature | 13:37 |
fungi | ahh | 13:38 |
yoctozepto | Clark[m]: ah, thanks | 13:50 |
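For reference, the two approaches discussed above can be exercised against Gerrit's REST change-query endpoint; whether `inhashtag` is accepted depends on the Gerrit version (it was not valid on this deployment at the time), so treat these queries as hedged examples to verify against your server's user-search documentation.

```bash
# On a Gerrit new enough to support inhashtag (assumption), negate a regex that
# matches any hashtag to find changes carrying no hashtags at all.
# Note: Gerrit REST responses are JSON prefixed with the )]}' guard line.
curl -s 'https://review.opendev.org/changes/?q=status:open+-inhashtag:%22%5E.*%22'

# On older Gerrit, the fallback mentioned above: negate a known list of hashtags.
curl -s 'https://review.opendev.org/changes/?q=status:open+-hashtag:foo+-hashtag:bar'
```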
*** lbragstad_ is now known as lbragstad | 14:15 | |
*** ykarel_ is now known as ykarel | 14:22 | |
opendevreview | Mark Goddard proposed openstack/project-config master: kolla-cli: enter retirement https://review.opendev.org/c/openstack/project-config/+/814597 | 14:48 |
opendevreview | Mark Goddard proposed openstack/project-config master: kolla-cli: end gating for retirement https://review.opendev.org/c/openstack/project-config/+/814580 | 15:05 |
opendevreview | Mark Goddard proposed openstack/project-config master: kolla-cli: enter retirement https://review.opendev.org/c/openstack/project-config/+/814597 | 15:05 |
opendevreview | Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel https://review.opendev.org/c/openstack/project-config/+/814600 | 15:06 |
opendevreview | Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel https://review.opendev.org/c/openstack/project-config/+/814600 | 15:30 |
opendevreview | Shnaidman Sagi (Sergey) proposed zuul/zuul-jobs master: Print version of installed podman https://review.opendev.org/c/zuul/zuul-jobs/+/814604 | 15:44 |
opendevreview | Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel https://review.opendev.org/c/openstack/project-config/+/814600 | 15:47 |
opendevreview | Alfredo Moralejo proposed openstack/diskimage-builder master: [WIP] Add support for CentOS Stream 9 in DIB https://review.opendev.org/c/openstack/diskimage-builder/+/811392 | 16:01 |
opendevreview | Alfredo Moralejo proposed openstack/diskimage-builder master: [WIP] Add support for CentOS Stream 9 in DIB https://review.opendev.org/c/openstack/diskimage-builder/+/811392 | 16:13 |
clarkb | I'm working on fixing up my zuul change for key deletion, but then I'm free to do zuul + gerrit restarts until the infra meeting and free again after that (though a bike ride this afternoon would be good too before the rain returns) | 16:24 |
fungi | same, the last ptg session in this block ends in half an hour and then i can help with restarts/reviews | 16:24 |
*** marios is now known as marios|out | 16:25 | |
opendevreview | Merged openstack/project-config master: Adding gerritreview messages to starlingx channel https://review.opendev.org/c/openstack/project-config/+/814600 | 16:25 |
*** pojadhav|ruck is now known as pojadhav|out | 16:26 | |
*** jpena is now known as jpena|off | 16:27 | |
*** ysandeep is now known as ysandeep|out | 16:29 | |
corvus | clarkb: i guess we don't need to wait on your change for the restart...? it won't fix the current situation (and we'll need to delete those znodes manually if not done already) | 16:31 |
corvus | fungi, clarkb: are we doing a synchronized zuul+gerrit restart? | 16:32 |
corvus | assuming "yes" to all the above, seems like aiming for 1700utc is the way to go | 16:32 |
fungi | corvus: yes, we have gerrit things we want included in a restart | 16:33 |
fungi | i'm up for a 17:00 utc restart of both together | 16:33 |
corvus | kk see you then | 16:33 |
fungi | thanks! | 16:33 |
fungi | clarkb: you're okay to do that in ~25 minutes? | 16:34 |
fungi | i'll give #openstack-release a heads up | 16:34 |
clarkb | corvus: correct don't need to wait for my change | 16:34 |
clarkb | 17:00 UTC wfm | 16:34 |
clarkb | fungi: should we go ahead and do a docker-compose pull on review? | 16:46 |
fungi | yeah, i can do that now | 16:49 |
fungi | it's running in a root screen session there now | 16:50 |
clarkb | cool docker image list shows the new image now | 16:51 |
fungi | opendevorg/gerrit <none> a7c2687bb510 3 weeks ago 793MB | 16:51 |
fungi | that one? | 16:51 |
clarkb | opendevorg/gerrit 3.3 33d6300c73ad 24 hours ago 795MB that one | 16:53 |
fungi | oh, yep, screen within tmux, i scrolled the wrong one ;) | 16:54 |
fungi | we have more than a terminal's worth of images listed now | 16:54 |
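The pull-and-verify step being described looks roughly like the following; the working directory containing the Gerrit compose file on the server is an assumption here.

```bash
# run from the directory holding gerrit's docker-compose.yaml (path assumed)
docker-compose pull
# confirm the freshly built 3.3 image is present before restarting
docker image list opendevorg/gerrit
```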
clarkb | When we restart we'll have caught up to wikimedia in terms of gerrit version :) | 16:54 |
fungi | until they upgrade again | 16:54 |
fungi | also didn't they replace gerrit with phabricator forever ago? | 16:55 |
clarkb | ha yes. Though I'm hoping to get the 3.4 upgrade done by the end of the year | 16:55 |
clarkb | fungi: I think replacing gerrit has become more difficult than they anticipated (I suspect they have gerrit fans) | 16:55 |
fungi | status notice Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again | 16:59 |
fungi | should i send that? i copied it from the last one we did | 16:59 |
corvus | lgtm | 16:59 |
fungi | #status notice Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again | 16:59 |
opendevstatus | fungi: sending notice | 16:59 |
-opendevstatus- NOTICE: Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again | 16:59 | |
corvus | i ran zuul_pull just in case (but i'm pretty sure all the images were already there) | 16:59 |
fungi | awesome, i'm ready for the gerrit restart once zuul is down | 16:59 |
clarkb | fungi: I'm attached to the screen now too and will follow along | 17:00 |
corvus | shall i stop zuul now? | 17:00 |
clarkb | I'm ready | 17:01 |
fungi | corvus: go for it | 17:01 |
corvus | scheduler is stopped; feel free to proceed | 17:01 |
fungi | gerrit is stopping | 17:01 |
corvus | (waiting for okay to start zuul) | 17:01 |
fungi | starting gerrit again now | 17:02 |
fungi | [2021-10-19T17:02:23.083Z] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.3.7-2-g17936a0b79-dirty ready | 17:02 |
fungi | i can pull up the webui too | 17:02 |
fungi | clarkb: lgty? | 17:03 |
clarkb | web ui is working for me | 17:03 |
clarkb | ya I think its happy. reports 3.3.7 as the version too | 17:03 |
fungi | corvus: go for starting zuul when you're ready | 17:03 |
corvus | starting zuul | 17:03 |
clarkb | re the gerrit.config updates we made the biggest thing I was worried about was change screen and theming since we removed the old theming config and removed the default change screen config | 17:04 |
clarkb | but neither is a thing in gerrit anymore, so it would have to be a weird interaction (and a gerrit bug) for there to be any problems | 17:04 |
clarkb | theme and change screen look as expected to me | 17:04 |
fungi | yeah | 17:04 |
fungi | before the restart i was seeing weird fonts in the unified diff view, looks like it's still happening too | 17:05 |
fungi | same in side-by-side actually | 17:05 |
clarkb | fungi: are you filtering web fonts maybe and then falling back to whatever is in your browser? I don't have this issue but I don't think I'm filtering any web fonts | 17:06 |
fungi | probably something weird with my browser settings, but all the lines of code are rendered double-size but with single size spacing | 17:06 |
clarkb | huh ya that isn't a problem for me. Let me double check in a different browser | 17:06 |
fungi | neither privacy badger nor ddg privacy essentials indicate anything is being blocked | 17:07 |
clarkb | ya I can't reproduce | 17:07 |
corvus | looks fine for me. i tried zooming in and out and font size+leading seem to go hand-in-hand so it looks good at all zooms | 17:08 |
fungi | it might be that newest firefox is assuming a smarter window manager than i'm using | 17:11 |
corvus | i blame rust | 17:12 |
corvus | ;) | 17:12 |
fungi | when i try to use its "take a screenshot" feature it complains about a background error for "getZoomFactor is not defined" | 17:12 |
clarkb | are you on wayland? | 17:12 |
fungi | nope, zorg with ratpoison | 17:13 |
fungi | er, xorg | 17:13 |
clarkb | fractional scaling and all that is a big deal with wayland | 17:13 |
fungi | libwayland is installed, but probably just because some things link it | 17:14 |
corvus | ("zorg here!") | 17:14 |
corvus | tenants loaded | 17:15 |
corvus | starting re-enqueueeueueuing | 17:15 |
clarkb | I'm glad others find that word hard to type | 17:16 |
corvus | perhaps we should have the zuul cli accept "enq[ue]*" | 17:17 |
fungi | we could even just set up a conditional ladder which checks whether each of the available subcommands .contains() sys.argv[1] | 17:18 |
fungi | elif "enqueue".contains(sys.argv[1]): ... | 17:19 |
fungi | basically allow all subcommands to be arbitrarily shortened | 17:19 |
corvus | yeah, i like programs that do that | 17:20 |
fungi | more magic could be added to identify ambiguous abbrevs | 17:20 |
clarkb | | 0026931648 | rax-dfw | ubuntu-focal | c1e73330-d61f-423b-bc32-e4460f15ee56 | 104.130.141.20 | | used | 05:07:20:22 | locked | still shows up in a nodepool listing but the other 59 in-use nodes that got stuck seem to be gone | 17:20 |
clarkb | I suppose ^ may be a different issue? | 17:20 |
fungi | clarkb: held? | 17:21 |
fungi | does --detail give you anything else useful? | 17:21 |
clarkb | fungi: no it is "used". There is a held node from the timeframe too but thats fine | 17:21 |
fungi | oh, yeah, it wouldn't be used in that case | 17:21 |
clarkb | --detail doesn't give any additional useful info | 17:22 |
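The listing being inspected here comes from nodepool's CLI; a minimal way to pull out just the stuck node (node id taken from the output pasted above) would be something like:

```bash
# show full details for the one node still marked used and locked
nodepool list --detail | grep 0026931648
```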
corvus | it doesn't say who locked it? i thought it did | 17:23 |
clarkb | oh is that what nl01.opendev.org-PoolWorker.rax-dfw-main-23ef88ea5474439dac253fa13c63d4f7 is? | 17:24 |
clarkb | maybe we can restart the launcher and the lock will go away then it can retry deleting it? | 17:24 |
clarkb | Launcher is the column id for that value | 17:25 |
corvus | yeah. a sigusr2 thread dump might illustrate why nl01 is sitting on a locked used node and not doing anything with it | 17:25 |
corvus | oh then that may not be the lock holder | 17:26 |
clarkb | ya --detail's column header doesn't seem to identify the lock holder | 17:26 |
corvus | bummer, i thought we had that :/ | 17:27 |
corvus | re-enqueue complete | 17:27 |
corvus | at worst, we can inspect zk, but that's probably something we should expose thru the cli | 17:27 |
clarkb | I need to figure out food as I've somehow neglected to eat anything today. Back in a bit | 17:29 |
fungi | this is a screen capture of what the gerrit diff view has been doing in ff for me: http://fungi.yuggoth.org/tmp/ff.png | 17:36 |
fungi | comparing the same url, chromium looks fine | 17:37 |
fungi | i closed out the screen session on review just now, btw, seems like the restart was a success | 17:48 |
clarkb | fungi: could it be the dark theme? | 17:49 |
fungi | mebbe | 17:50 |
fungi | i'll try switching it up | 17:50 |
fungi | clarkb: aha, it's at least *something* to do with my account prefs, because if i load it in another container tab not authenticated, i get a reasonable looking diff | 17:54 |
fungi | found it! | 17:56 |
fungi | apparently changing "font size" in the diff view doesn't change the space between the lines? it was for some reason set to 24 in my preferences, dropping it to 12 seems to have fixed things | 17:57 |
clarkb | that is similar to the issue in etherpad chopping off the tops of subsequent lines | 17:58 |
clarkb | corvus: I'm looking at the lock contents in zk for that node. Is the uuid looking thing at the front of the path identifying a connection? | 18:01 |
clarkb | hrm no it seems zk identifies connections with a session id which is a different type of value. Any idea how to map the lock to the connection? | 18:05 |
corvus | if the lock doesn't have an id, then you may be able to map it to a session, then find an ephemeral node owned by the same session that identifies it. | 18:06 |
clarkb | oh right there is a way to list ephemeral nodes by sessions iirc and the locks are ephemeral? | 18:07 |
clarkb | dump on the leader | 18:07 |
clarkb | zk05 is the leader | 18:07 |
clarkb | corvus: nl01 is holding the lock. I ran dump then found the session id associated with the lock then listed sessions with `cons` which gave me an ip address that dig -x resolved to nl01 | 18:10 |
clarkb | so now I guess we sigusr2 on nl01 and see what might be holding up the used node deletion | 18:11 |
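A sketch of the lock-tracing procedure clarkb describes above, assuming the ZooKeeper four-letter admin commands (`dump`, `cons`) are enabled (they can be restricted via the 4lw whitelist) and that a plain client port is reachable; the host, port, client IP, and launcher process name are placeholders/assumptions.

```bash
# on the ZK leader: list outstanding sessions and the ephemeral znodes (locks) they own
echo dump | nc localhost 2181

# list client connections with their session ids and source addresses
echo cons | nc localhost 2181

# reverse-resolve the client address that holds the session to a host name
dig -x <client-ip> +short

# then trigger the thread dump corvus suggests on that launcher
# (process name assumed; SIGUSR2 makes the launcher log stack traces)
kill -USR2 $(pgrep -f nodepool-launcher)
```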
fungi | nl01 is also the launcher responsible for that node, so i guess it makes sense | 18:12 |
clarkb | neither 0026931648 nor c1e73330-d61f-423b-bc32-e4460f15ee56 show up in the thread dump. Launcher threads seem to use the nodepool id to name the thread and deleter threads the node uuid | 18:15 |
clarkb | two threads match rax-dfw: a delete thread for a different uuid and the poolworker for rax-dfw | 18:16 |
clarkb | corvus: the last thing logged by the launcher is that it unlocked the node and it is ready | 18:20 |
clarkb | I'm wondering if we somehow lost the lock during the zk problems | 18:21 |
clarkb | like zuul ran the job and set it to used, then nl01 grabs the lock and immediately after the zk problems occur causing nl01 to lose track | 18:21 |
clarkb | and that happens before nl01 can log anything about doing cleanup of the used now | 18:21 |
clarkb | *used node | 18:21 |
clarkb | I suspect that restarting the launcher on nl01 will correct this. But I still can't find an indication for how this started | 18:25 |
kopecmartin | hi all, how can we publish project documentation to docs.opendev.org? we tried opendev-promote-docs and also promote-tox-docs-infra but we can't still see the doc https://docs.opendev.org/opendev/ | 19:20 |
kopecmartin | I made a silly mistake somewhere | 19:20 |
kopecmartin | for reference: https://review.opendev.org/c/openinfra/refstack/+/814635 | 19:21 |
clarkb | kopecmartin: I would start by looking at the job logs for the jobs that ran already | 19:21 |
clarkb | https://static.opendev.org/docs/refstack/latest/ does seem to be updating so it may just be a matter of vhost stuff properly serving it? | 19:22 |
kopecmartin | ah, yeah, that would explain it | 19:23 |
kopecmartin | clarkb: thanks! | 19:23 |
clarkb | kopecmartin: https://docs.opendev.org/openinfra/refstack/latest/ there you go | 19:24 |
kopecmartin | nice, now i wonder whether it was done by promote-tox-docs-infra or opendev-promote-docs | 19:24 |
kopecmartin | i'm gonna check the logs | 19:24 |
fungi | yes, project docs are namespaced on docs.opendev.org since we aim to publish documentation for multiple communities | 19:25 |
fungi | instead of writing to docs/refstack the job should be configured to publish to docs/openinfra/refstack | 19:26 |
clarkb | ianw: I think the whole dib stack should be approved now | 20:13 |
ianw | clarkb: thanks! i'll keep an eye on it all | 20:14 |
opendevreview | Douglas Viroel proposed zuul/zuul-jobs master: Add FIPS enable multinode job definition https://review.opendev.org/c/zuul/zuul-jobs/+/813253 | 20:21 |
clarkb | ianw: the whole stack just failed on a tox py35 failure | 20:29 |
clarkb | fungi: ^ we broke xenial jobs with the bindep release | 20:29 |
clarkb | fungi: we need a packaging pin for python3.5 | 20:29 |
opendevreview | Clark Boylan proposed opendev/bindep master: Add old python packaging pin https://review.opendev.org/c/opendev/bindep/+/814647 | 20:32 |
clarkb | something like ^ then a release? | 20:32 |
clarkb | dib can also probably drop python3.5 testing? | 20:33 |
clarkb | ianw: ^ that might be quicker. | 20:33 |
fungi | aha, we can't install latest packaging on xenial? makes sense | 20:46 |
fungi | though surprised bindep's xenial job didn't catch it | 20:46 |
clarkb | ya I don't understand how it got through but if you look at the error at https://zuul.opendev.org/t/openstack/build/cae47da97cff44c8a855f30378634ee1/log/job-output.txt packaging complains about invalid python and their changelog says 20.9 was the last python3.5 capable release | 20:49 |
clarkb | I'm going to get a bike ride in now before the rain arrives tomorrow but can dig more after if we want to fully understand that | 20:49 |
fungi | and yeah, 21.0 was tagged back at the beginning of july | 20:50 |
fungi | have a good ride, i'll take a closer look after dinner | 20:50 |
ianw | sorry, back, looking | 20:53 |
*** dviroel|rover is now known as dviroel|rover|afk | 20:59 | |
*** avass[m] is now known as AlbinVass[m] | 21:00 | |
ianw | i guess it's just not a path covered by the tox run | 21:03 |
ianw | that's not it. my tox install chose packaging (20.9) | 21:12 |
ianw | (tox py35) | 21:12 |
ianw | seemingly so did the bindep gate tests | 21:12 |
fungi | xenial's default pip version is too old to support python_requires metadata in packages | 21:14 |
fungi | so it probably only failed in jobs which are not using new pip | 21:14 |
ianw | hrm, it looks like the bindep gate uses xenial for 3.5 -> https://zuul.opendev.org/t/opendev/build/e613c1b0042549c59d07825d97b5ff05/logs | 21:19 |
ianw | but that's not "openstack-tox-py35", it's "tox-py35" | 21:20 |
ianw | i wonder if that's doing some pip upgrades in the tox env | 21:20 |
fungi | the tox logs should say | 21:23 |
ianw | oh, i think what has happened here is that the bindep jobs run "ensure-pip" | 21:24 |
ianw | compare | 21:24 |
ianw | bindep -> https://zuul.opendev.org/t/opendev/build/e613c1b0042549c59d07825d97b5ff05/console | 21:25 |
ianw | dib -> https://zuul.opendev.org/t/openstack/build/cae47da97cff44c8a855f30378634ee1/console | 21:25 |
fungi | yup, that'll do it | 21:27 |
fungi | huh, the fix failed tox-py35 on this: https://zuul.opendev.org/t/opendev/build/1f00b3e2c8a749eca74ee50a7cc17d44/console#1/0/16/ubuntu-xenial | 21:37 |
ianw | why does openstack-tox-py35 run zuul-jobs/playbooks/tox/post.yaml but not zuul-jobs/playbooks/tox/pre.yaml ? | 21:37 |
ianw | tox-py35 really should have run playbooks/tox/pre.yaml, right? from https://opendev.org/zuul/zuul-jobs/src/branch/master/zuul.d/python-jobs.yaml#L42 | 21:40 |
fungi | also someone just e-mailed openstack-discuss asking for help logging into their gerrit account, seems like it might be another case of a duplicate resulting from an address change in ubuntuone | 21:44 |
ianw | it really does seem to me that openstack-tox-py35 should ultimately parent to "tox", which should have a pre-run.yaml step that runs ensure-tox, which will run ensure-pip, which will upgrade things | 21:49 |
ianw | is it possible the dib problem and the failing bindep fix both stem from some other root cause relating to pip not upgrading? | 21:50 |
ianw | possibly the zuul restart ~ 5 hours ago ... ? | 21:52 |
fungi | something that's causing some playbooks to no longer be run? | 21:54 |
ianw | it seems unlikely looking at recent changes ... but i am struggling to see why that playbook wouldn't run | 21:55 |
ianw | i mean compare console of | 21:57 |
ianw | https://zuul.opendev.org/t/opendev/build/7966ab9ee15f4f3e8460b23652cfddc5/console (tox-py35, earlier run) | 21:58 |
ianw | https://zuul.opendev.org/t/opendev/build/1f00b3e2c8a749eca74ee50a7cc17d44/console (tox-py35, failing run now) | 21:58 |
ianw | it's actually missing "tox/pre.yaml" and "tox/run.yaml" ... ? | 21:59 |
ianw | ohh, i see: pre/unittests.yaml is the one that's failing. further bits are skipped | 22:01 |
ianw | oohhh, i further see -- the broken bindep has broken the bindep testing | 22:02 |
ianw | ok ... soooo ... | 22:14 |
ianw | we have created the on-image bindep 2.10.0 @ https://nb02.opendev.org/ubuntu-xenial-0000210004.log | 22:14 |
ianw | for some reason, this has created /usr/bindep-env/ with "2021-10-19 14:22:49.157 | You are using pip version 8.1.1, however version 21.3 is available." | 22:15 |
ianw | this has pulled packaging 21 into this venv (incorrectly) | 22:16 |
ianw | bindep uses this to setup the tox environment to run the bindep tests, hence the recent explosion | 22:17 |
opendevreview | Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip https://review.opendev.org/c/openstack/project-config/+/814677 | 22:20 |
ianw | actually we should probably do that in all venvs we prime on the images | 22:22 |
opendevreview | Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip https://review.opendev.org/c/openstack/project-config/+/814677 | 22:22 |
ianw | since that dib stack got -2'd anyway, i could rebase that on a job to remove py35 testing which would be a workaround for dib, for now | 22:35 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: epel: match replacement better https://review.opendev.org/c/openstack/diskimage-builder/+/813922 | 22:39 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Revert "Allowing ubuntu element use local image" https://review.opendev.org/c/openstack/diskimage-builder/+/814094 | 22:39 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: ubuntu-systemd-container: deprecate and remove jobs https://review.opendev.org/c/openstack/diskimage-builder/+/814068 | 22:39 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: ubuntu: add Focal test https://review.opendev.org/c/openstack/diskimage-builder/+/814072 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: functests: drop apt-sources https://review.opendev.org/c/openstack/diskimage-builder/+/814074 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: centos7 : drop functional testing https://review.opendev.org/c/openstack/diskimage-builder/+/814075 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: functests: drop minimal tests in the gate https://review.opendev.org/c/openstack/diskimage-builder/+/814078 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Remove extras job, put gentoo job in gate https://review.opendev.org/c/openstack/diskimage-builder/+/814079 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Simplify functests job https://review.opendev.org/c/openstack/diskimage-builder/+/814080 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Run functional tests on Debian Bullseye https://review.opendev.org/c/openstack/diskimage-builder/+/814081 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Update centos element for 9-stream https://review.opendev.org/c/openstack/diskimage-builder/+/806819 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Remove py35 tox jobs https://review.opendev.org/c/openstack/diskimage-builder/+/814680 | 22:40 |
clarkb | fungi: oh cool you figured it out. I had a eureka moment on my bike ride. I think that when we build our images we do so with old pip but when we run tox we do so with newer pip and that made the bindep jobs pass | 22:40 |
clarkb | fungi: I think my change is correct given that | 22:40 |
clarkb | now why did it retry limit in the gate on tox py35 | 22:41 |
clarkb | its a chicken and egg issue I think | 22:41 |
clarkb | fungi: maybe we manually test it in a docker container and if that works force merge then do a release? | 22:42 |
ianw | clarkb: yeah, its actually the pip in "python -m venv" -- which must be vendored? it's 8 and even the system one is 9 | 22:42 |
ianw | clarkb: i think we just need to rebuild images with https://review.opendev.org/c/openstack/project-config/+/814677 | 22:42 |
clarkb | ianw: iirc we override it in zuul jobs to be 9 out of our ppa but in dib builds we dont do that and get 8 | 22:42 |
clarkb | ah yup I think that will do it too | 22:43 |
clarkb | however it may struggle to update to latest pip for the same reason | 22:43 |
clarkb | we might have to do it in two passes. pip install -U pip<someknowngoodver && pip install -U pip | 22:44 |
clarkb | ianw: ^ I expect that will be necessary | 22:44 |
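A minimal sketch of the two-step upgrade being discussed, assuming a xenial image where the venv-vendored pip is still 8.x; the `python3 -m pip` form is an assumption about how the image-build scripts would invoke it.

```bash
# step 1: pip 8.x ignores Requires-Python metadata, so pin below 21 to avoid
#         pulling in a pip release that has dropped Python 3.5
python3 -m pip install -U 'pip<21'
# step 2: the now-capable pip resolves "latest" correctly; on 3.5 it stays on
#         the 20.x series, on newer interpreters it moves to the real latest
python3 -m pip install -U pip
```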
corvus | clarkb, fungi, ianw, mordred: i'm starting to think that zuul may need larger test nodes to run its unit tests. | 22:44 |
corvus | have we (opendev) thought about expanding the options for test node sizes? | 22:45 |
clarkb | corvus: we do have larger labels available. Maybe give them a go and see if it helps? | 22:45 |
ianw | clarkb: i'm not sure i remember why updating from 8 failed? | 22:45 |
clarkb | corvus: we've already done it a couple of years ago, it's just that the availability of those nodes is more limited | 22:45 |
corvus | are they multi-region? i thought maybe it was just one region for one project | 22:45 |
fungi | and if it does, we can think about how we might roll those out more broadly | 22:45 |
clarkb | corvus: they are multiregion. airship and vexxhost iirc. | 22:45 |
fungi | they're in at least two regions | 22:45 |
corvus | airship=citycloud? | 22:46 |
clarkb | corvus: yes | 22:46 |
clarkb | and when we had the donnyd basement cloud we had them there too | 22:47 |
ianw | clarkb: i think we only upgraded to 9 because of mirror issues, which wouldn't affect the dib build https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-pip/tasks/xenial.yaml#L1 | 22:47 |
ianw | (sorry i know we have two conversations going on :) | 22:47 |
clarkb | ianw: but 8 doesn't do the package metadata which is necessary to install the newest pip that supports python3.5 I think | 22:47 |
corvus | [as an aside, i think 2 things are at play: 1) zuul and openstack have different constraints/goals/etc; 2) even in openstack's case, it's probably reasonable to reconsider whether "standard issue laptop from 12 years ago" is still the right baseline for unit tests :) ] | 22:47 |
clarkb | ianw: ya pip 21 and newer doesn't support python3.5 | 22:48 |
clarkb | corvus: I think the broader issue is that getting these resources to fit into clouds is difficult. The bigger the nodes, the less throughput we can offer | 22:48 |
clarkb | and that has detrimental effects in other ways | 22:48 |
fungi | well, not entirely true if the jobs run faster | 22:49 |
clarkb | fungi: for most of our workload (openstack) we are cpu constrained | 22:49 |
corvus | clarkb: ++ especially if the cloud tenant is designed specifically for one flavor | 22:49 |
clarkb | fungi: and usually we can get more memory but not more cpu | 22:49 |
fungi | clarkb: gotta wonder what percentage of overall node-hours are consumed by dog-slow jobs grinding in swap thrash though | 22:49 |
fungi | those could significantly skew the overall usage | 22:50 |
corvus | i feel like zuul-web is insufficiently responsive | 22:50 |
clarkb | ya thats fair its possible we could see cpu be freed for real work | 22:50 |
corvus | maybe because zuul-scheduler is busy? | 22:51 |
corvus | yep, it's better now. nm. | 22:51 |
clarkb | corvus: ya I agree it loads status but very slowly | 22:51 |
corvus | i think it's building a bunch of layouts | 22:51 |
clarkb | corvus: re zuul specifically are you finding that it is memory constrained or cpu or both? or maybe disk? | 22:51 |
fungi | it would be an interesting experiment to switch to 16gb flavors everywhere, but we'd probably be unable to roll it back since a bunch of projects would unknowingly merge changes which consumed a lot more ram | 22:51 |
clarkb | disk is probably the most difficult of the bunch to address | 22:51 |
corvus | is ubuntu-bionic-32GB what i'm looking for? | 22:52 |
fungi | corvus: we should probably add a focal version of those too | 22:52 |
clarkb | ianw: I just confirmed on an ubuntu xenial container that we'll have to two-step update pip | 22:53 |
ianw | clarkb: yep, me too :) just fixing, a great observation :) | 22:53 |
clarkb | corvus: I think the flavors are called -expanded | 22:53 |
clarkb | for 16gb and then there is a 32gb flavor which is also available | 22:53 |
clarkb | might be good to check against both? | 22:53 |
corvus | yep. -32GB is only 1 region | 22:54 |
corvus | oh wait, ubuntu-bionic-expanded is also only 1 region | 22:54 |
clarkb | hrm when we did the vexxhost stuff did we not add them to the existing pools /me looks | 22:54 |
corvus | there's a ubuntu-bionic-expanded-vexxhost | 22:55 |
clarkb | ya ok so we did split them up like that hrm | 22:55 |
clarkb | I think normalizing that better is a reasonable thing to do | 22:55 |
corvus | but it's also one region. so maybe there's confusion thinking that ubuntu-bionic-expanded is 2x, but really it's ubuntu-bionic-expanded x1 and ubuntu-bionic-expanded-vexxhost x1 | 22:55 |
clarkb | corvus: yup exactly. I think we could do two regions but haven't | 22:55 |
corvus | okay. i'll come up with something. gimme a few mins | 22:55 |
opendevreview | Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip https://review.opendev.org/c/openstack/project-config/+/814677 | 22:56 |
clarkb | ianw: pip install -U pip<21 && pip install -U pip? | 22:56 |
ianw | i can make it more like that if you like | 22:56 |
clarkb | ianw: no your change is fine. I did suggest maybe using <21 in the xenial case but I seriously doubt we'll get a new 20.x release | 22:57 |
corvus | what's the purpose of ubuntu-bionic-vexxhost ? it's just an 8g node; seems the same as ubuntu-bionic | 22:58 |
ianw | corvus: i have some feeling that may be for kvm nesting? | 22:59 |
corvus | oh, like it's just "get me a bionic node on vexxhost because they have kvm"? | 22:59 |
corvus | nested virt | 22:59 |
clarkb | corvus: I think thats actually the big memory flavor with 8vcpus not 8gb memory | 23:00 |
clarkb | corvus: you should double check with the cloud | 23:00 |
clarkb | this really could use some normalizing and maybe comments to explain the different cloud flavors since their choices don't necessarily mimic our choices in naming scheme | 23:01 |
clarkb | nested-virt-ubuntu-focal <- that might actually be a big memory server | 23:01 |
clarkb | ah but it isn't in other clouds so that would be the issue. We want a new label using that flavor name and the -expanded theme on our side | 23:01 |
fungi | yes, part of why we switched vexxhost nodes out of our normal pool is their 8vcpu flavor switched to coming with 32gb ram, and then a zuul change was merged after passing testing on one of those which used more memory causing it to no longer work on 8gb nodes, so we isolated them to nonstandard labels | 23:02 |
corvus | (was zuul-operator, but yeah) | 23:03 |
fungi | ahh, yep sorry | 23:03 |
corvus | and yeah, v3-standard-8 seems to be 8cpu 32gb ram | 23:03 |
clarkb | fungi: running https://review.opendev.org/c/opendev/bindep/+/814647 locally in a xenial container works and the non py35 jobs passed on that change. I think we can force merge and make a release of that | 23:04 |
clarkb | fungi: but I'll defer to you on that since I wrote the change | 23:04 |
fungi | vexxhost's cpu:ram ratio on their hardware is apparently ~1cpu:1gb, so they wanted to align their flavors to better fit the systems | 23:04 |
fungi | clarkb: so you're sure >3.5 is correct and doesn't need to be >=3.6 instead? | 23:05 |
fungi | (per my inline comment on it) | 23:05 |
clarkb | it seems to have worked but that is a good point. Let me update it so that we don't have human confusion at least | 23:06 |
fungi | i thought we'd used >= elsewhere over concerns that >3.5 would still match 3.5.x versions | 23:06 |
fungi | clarkb: your xenial container had distro-supplied pip? (9.something was it?) | 23:07 |
clarkb | packaging ; python_version >= '3.6' and packaging<21.0 ; python_version < '3.6' | 23:07 |
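Written out as separate lines (a hypothetical rendering of the markers clarkb pastes above), the split requirement would look like this in a requirements file:

```bash
# hypothetical requirements.txt fragment, expanded for readability
cat <<'EOF'
packaging ; python_version >= '3.6'
packaging<21.0 ; python_version < '3.6'
EOF
```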
clarkb | fungi: no I had to do the two pass thing I described above before installing tox | 23:07 |
clarkb | apt-get install python3-pip && pip3 install -U 'pip<21' && pip install -U pip && pip install tox | 23:08 |
opendevreview | Clark Boylan proposed opendev/bindep master: Add old python packaging pin https://review.opendev.org/c/opendev/bindep/+/814647 | 23:09 |
clarkb | fungi: ^ like that? | 23:09 |
fungi | ahh, okay. in that case you likely had new enough pip that it wouldn't have downloaded packaging 21.0 anyway, right? | 23:09 |
fungi | i thought the problem was you needed old pip without python_requires metadata support in order to trigger it | 23:10 |
ianw | yeah i'm not sure we need the pin, we just need up-to-date pip's? | 23:10 |
fungi | because newer pip knows not to install a version of packaging which says it won't work with python 3.5 | 23:10 |
clarkb | oh right so in my test I need to downgrade pip back to 8 | 23:11 |
clarkb | the reason I updated was I needed to install tox | 23:11 |
clarkb | I think making bindep work with older pip is a reasonable thing given its position in bootstrapping things | 23:11 |
opendevreview | James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB to vexxhost-specific https://review.opendev.org/c/openstack/project-config/+/814683 | 23:11 |
clarkb | other tools I wouldn't worry too much | 23:11 |
corvus | clarkb, fungi, ianw: ^ that's the minimal change to get a 'large node' on 2 clouds. | 23:12 |
fungi | right, i'm good with the change, just pointing out the hole in the test methodology | 23:12 |
clarkb | infra-root: on a hunch about slowness of things, re corvus' observation that zuul status was slow and my being told I couldn't resolve review.o.o locally, I checked our nsX servers and only ns1 is running nsd | 23:12 |
clarkb | Let me finish up this bindep checking then I can look closer if no one else has addressed that yet | 23:12 |
fungi | i'll look at the nameservers now | 23:13 |
corvus | i confess, i'm still not sure whether that should be in the "main" pool which holds the "nested-virt-*" labels, or the "vexxhost-specific" pool which holds the "vexxhost" labels | 23:13 |
opendevreview | James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB to vexxhost-specific https://review.opendev.org/c/openstack/project-config/+/814683 | 23:13 |
clarkb | corvus: I think the main pool | 23:14 |
fungi | systemctl status says nsd on ns2 crashed on 2021-08-02 at 01:45:31 UTC (2 months 18 days ago), so looks like we probably no longer have a log of why | 23:14 |
corvus | clarkb: why's that? the big ones are in the vexxhost-specific pool | 23:15 |
fungi | uptime for ns2 is 78 days, which looks suspiciously similar | 23:15 |
corvus | "journalctl -fu nsd" says it failed to start but not why | 23:15 |
fungi | i think this means the server was rebooted and nsd crashed during boot? | 23:15 |
fungi | this may have been during vexxhost server migrations? | 23:16 |
fungi | (ns2 is in vexxhost) | 23:16 |
clarkb | corvus: I think vexxhost pool was done when our vexxhost tenant was wanting to try some stuff and didn't care about single region issues | 23:16 |
fungi | i vaguely recall those were going on around that time | 23:17 |
clarkb | corvus: I suspect that now we can fold the vexxhost specific stuff into main since we're doing that normally now | 23:17 |
corvus | clarkb: okay i'll put it in main | 23:17 |
opendevreview | James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB https://review.opendev.org/c/openstack/project-config/+/814683 | 23:17 |
fungi | i manually stopped and started nsd with systemctl and it's running now | 23:17 |
clarkb | if my container hadn't said "I can't resolve this" I wouldn't have thought to check the nsds | 23:18 |
ianw | clarkb: if you can look at https://review.opendev.org/c/openstack/diskimage-builder/+/814680 to remove py35 jobs on dib when all this over that would be good too | 23:18 |
clarkb | fungi: corvus: do we need to force a replication from adns now? | 23:19 |
clarkb | otherwise ns2 might serve old stale data? | 23:19 |
clarkb | ianw: can do | 23:19 |
fungi | ns1 and ns2 are serving the same serial on the opendev.org soa | 23:19 |
fungi | so i think it's all good now? | 23:19 |
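The serial comparison fungi mentions can be spot-checked with something like the following (a hedged sketch; zone and server names match the ones discussed above):

```bash
# compare the SOA serial served by each authoritative nameserver
dig +short opendev.org SOA @ns1.opendev.org
dig +short opendev.org SOA @ns2.opendev.org
```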
fungi | #status log Manually restarted nsd on ns2.opendev.org, which seems to have failed to start at boot | 23:19 |
opendevstatus | fungi: finished logging | 23:20 |
fungi | clarkb: i'll check the logs, but nsd ought to be smart enough to not take requests until after it checks serials on its zones against adns1 and initiates any necessary zone transfers | 23:21 |
clarkb | ya if the serial is the same we should be good | 23:21 |
clarkb | heh and now is paste unhappy? | 23:22 |
clarkb | there it goes maybe my local resolver hasn't figured out ns2 is happy again | 23:22 |
fungi | probably the usual db socket timeout | 23:22 |
corvus | my git review running right now is very slow | 23:25 |
clarkb | I have pretty high packet loss to paste.o.o | 23:25 |
clarkb | I think that explains it | 23:25 |
clarkb | ping to review is fine though so I don't know if that explains a slow git review | 23:26 |
clarkb | fungi: ianw https://gist.github.com/cboylan/a14e3458f187ccd3561c8fe96b82509b that should be a better test of the bindep install with pip 8.1.1 | 23:27 |
corvus | seems better now. :/ | 23:27 |
ianw | clarkb: so pbr is really in the same boat? | 23:28 |
clarkb | ianw: pbr is in a different boat :( pbr is a setup_requires which gets installed by easy_install. easy_install doesn't support SNI on xenial (and maybe bionic? I don't remember how far back that went) and pypi is SNI only on their CDN now | 23:30 |
clarkb | ianw: pip does do SNI even when old like that so you have to install pbr first then install other things that use pbr :( | 23:30 |
clarkb | does anyone else have trouble getting to paste? | 23:30 |
clarkb | I'm wondering if we're going to get a message from rax saying the host it is on had trouble and got rebooted or if this is specific to me | 23:30 |
clarkb | via ipv4 fwiw | 23:30 |
fungi | i'm getting no icmp6 echo replies | 23:31 |
ianw | agree from .au too | 23:31 |
fungi | nor can i ping it over ipv4 | 23:31 |
fungi | oh, intermittent response now | 23:32 |
fungi | 55.5556% packet loss | 23:32 |
fungi | i'll see if i can get to the console for it | 23:32 |
fungi | mmm, i was able to ssh in just now | 23:32 |
fungi | but extremely laggy | 23:32 |
fungi | 23:33:00 up 28 days, 18:47, 1 user, load average: 0.00, 0.00, 0.00 | 23:33 |
fungi | so i don't think the server is being hammered | 23:33 |
ianw | i just pulled up a console and there's nothing but a login prompt that responds on it | 23:33 |
fungi | likely network upstream from the instance has many sads | 23:33 |
fungi | no ticket from rackspace about any issues yet though | 23:34 |
clarkb | I guess we wait it out for a bit then? | 23:36 |
ianw | i'm not sure i was aware of that pbr snafu | 23:38 |
clarkb | I've just responded to a user on openstack-discuss that cannot login to gerrit because they are trying to login with a new openid that is associated with an old gerrit account and gerrit won't create a new account with conflicting ids | 23:39 |
clarkb | Just a heads up here as I've asked them to reach out on IRC as its a bit easier to debug this stuff interactively | 23:39 |
clarkb | but I did make a couple of suggestions for how we can proceed in the email should it come up when I am not around | 23:40 |
clarkb | ianw: I +2'd the dib removal of py35 jobs but left a note | 23:43 |
clarkb | ianw: we very intentionally made pbr continue to support really old stuff because setup_requires also can't effectively pin deps | 23:45 |
clarkb | ianw: which basically means you always get the latest thing even on the oldest system you've got | 23:45 |
clarkb | we really have to be careful with pbr to not add a bunch of fancy new python stuff | 23:45 |
fungi | or to sufficiently guard it so that it only gets called on new enough python and has working fallbacks | 23:46 |
ianw | i forget why we added openstack-python3-wallaby-jobs instead of individual tox jobs ... to git blame! | 23:50 |
clarkb | fungi: note we still expect https://review.opendev.org/c/opendev/bindep/+/814647 to fail in the gate right? Are you just double checking it against the newer pythons? | 23:51 |