*** artom has quit IRC | 00:14 | |
*** artom has joined #opendev | 00:14 | |
clarkb | fungi: it looks like maybe the zuul restart caught the periodic jobs for requirements in a weird spot? they are all retry limits and the ref is 000000 | 00:18 |
clarkb | (just noting it, I expect that the 0600 enqueuing of those jobs will be fine) | 00:18 |
fungi | i meant to follow up on that, i got several tracebacks from reenqueues | 00:20 |
fungi | checking to see if i can tell which ones | 00:20 |
fungi | so this is what i ran for it: zuul enqueue-ref --tenant openstack --pipeline periodic --project openstack/requirements --ref refs/heads/master | 00:21 |
fungi | no errors from that | 00:22 |
fungi | the ones which did throw errors were pyca/cryptography | 00:23 |
fungi | http://paste.openstack.org/show/804064 | 00:24 |
fungi | clarkb: the retry limit may be separate from the reason for the restarts | 00:29 |
fungi | retries | 00:29 |
fungi | er, let me retry | 00:29 |
fungi | the reason for the retries may be unrelated to the 0 ref | 00:30 |
openstackgerrit | Merged opendev/system-config master: review01.openstack.org: add key for gerrit data copying https://review.opendev.org/c/opendev/system-config/+/783778 | 00:30 |
fungi | last zuul scheduler restart we saw a similar situation with an hourly deploy item enqueued with a 0 ref, but it grabbed the branch anyway | 00:30 |
*** hamalq_ has quit IRC | 00:49 | |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Document algorithm var for remove-build-sshkey https://review.opendev.org/c/zuul/zuul-jobs/+/783988 | 00:56 |
*** iurygregory has quit IRC | 01:16 | |
*** iurygregory has joined #opendev | 01:17 | |
*** iurygregory has quit IRC | 01:18 | |
*** iurygregory has joined #opendev | 01:18 | |
*** osmanlicilegi has joined #opendev | 01:20 | |
openstackgerrit | Jeremy Stanley proposed opendev/base-jobs master: Clean up OpenEdge configuration https://review.opendev.org/c/opendev/base-jobs/+/783989 | 01:44 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Clean up OpenEdge configuration https://review.opendev.org/c/openstack/project-config/+/783990 | 01:44 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Clean up OpenEdge configuration https://review.opendev.org/c/opendev/system-config/+/783991 | 01:45 |
*** iurygregory has quit IRC | 02:08 | |
*** iurygregory has joined #opendev | 02:09 | |
ianw | our new review02.opendev.org can't ping review01.openstack.org via ipv6, but the other way (review01 -> review02) *does* work | 02:19 |
ianw | i'm taking suggestions on how i might have messed this up :) | 02:20 |
ianw | i am connecting to review02 via ipv6. i can also ping it locally here. so it's not ipv6 in general | 02:25 |
ianw | i know nobody is around, but dumping some debugging info in #vexxhost channel | 02:38 |
*** diablo_rojo has quit IRC | 02:40 | |
fungi | i can take a look when i wake up too | 02:48 |
ianw | fungi: :) thanks, you'll probably have more cross-over with vexxhost people | 02:49 |
fungi | it does seem on the face to be similar to some of the ipv6 oddness we've seen with rackspace in the past, so i wouldn't assume there's anything to do with how things are set up in vexxhost | 02:50 |
fungi | anyway, passing out now, will sleep on it | 02:51 |
ianw | yeah; i agree. however in that case we usually did see the packets coming *into* the host, which responded, but the packets never found their way back | 02:51 |
fungi | true | 02:51 |
ianw | in this case, a tcpdump doesn't show the ping packets making it to the host | 02:51 |
ianw | also, of the major ipv6 destinations i can think of, it can't seem to ping most of them, but it can ping google | 02:52 |
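A minimal sketch of the capture ianw is describing, run on review02 while review01 pings it (the interface name ens3 is an assumption):

    # watch for inbound ICMPv6 echo requests; if nothing appears here while
    # review01 is pinging, the packets are lost before they reach the host
    sudo tcpdump -ni ens3 'icmp6 and ip6[40] == 128'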
ianw | anyway, i have review02 now syncing via ipv4 | 03:07 |
ianw | ~45MB/s so not too shabby | 03:08 |
*** akahat has quit IRC | 03:08 | |
*** kopecmartin has quit IRC | 03:09 | |
*** fbo has quit IRC | 03:09 | |
*** kopecmartin has joined #opendev | 03:13 | |
*** fbo has joined #opendev | 03:14 | |
*** akahat has joined #opendev | 03:22 | |
*** ykarel|away has joined #opendev | 04:20 | |
*** ykarel|away is now known as ykarel | 04:39 | |
*** marios has joined #opendev | 05:03 | |
*** zbr|rover4 has joined #opendev | 05:04 | |
*** zbr|rover has quit IRC | 05:06 | |
*** zbr|rover4 is now known as zbr|rover | 05:06 | |
*** whoami-rajat has joined #opendev | 05:17 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add upload-logs-azure role https://review.opendev.org/c/zuul/zuul-jobs/+/782004 | 05:27 |
*** auristor has quit IRC | 05:27 | |
ianw | ok, https://review02.opendev.org has some content | 05:41 |
*** ysandeep|away is now known as ysandeep | 05:59 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: AFS documentation : add notes on replication https://review.opendev.org/c/opendev/system-config/+/784002 | 06:01 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: review02 : bump heap limit to 96gb https://review.opendev.org/c/opendev/system-config/+/784003 | 06:01 |
ianw | time docker-compose run shell java -jar /var/gerrit/bin/gerrit.war reindex -d /var/gerrit --threads 32 | 06:04 |
ianw | real 50m8.443s | 06:04 |
*** ralonsoh has joined #opendev | 06:10 | |
*** slaweq has joined #opendev | 06:10 | |
*** sboyron has joined #opendev | 06:21 | |
*** eolivare has joined #opendev | 06:30 | |
*** hashar has joined #opendev | 06:45 | |
openstackgerrit | Hervé Beraud proposed openstack/project-config master: Use publish-to-pypi on barbican ansible roles https://review.opendev.org/c/openstack/project-config/+/784011 | 06:52 |
ianw | 24 threads was "real 52m10.284s" fyi | 07:19 |
ianw | i just rebooted the review02.opendev.org host, and all ipv6 seems to work now | 07:25 |
*** tosky has joined #opendev | 07:33 | |
*** ysandeep is now known as ysandeep|lunch | 07:47 | |
*** ykarel has quit IRC | 08:01 | |
openstackgerrit | Merged opendev/irc-meetings master: Remove Automation SIG meeting https://review.opendev.org/c/opendev/irc-meetings/+/783878 | 08:06 |
*** dpawlik0 is now known as dpawlik | 08:08 | |
*** hrw has joined #opendev | 08:11 | |
hrw | morning | 08:11 |
hrw | can someone help me get centos-8-stream-arm64 node running? | 08:14 |
hrw | project-config has such | 08:14 |
hrw | https://zuul.openstack.org/nodes does not | 08:14 |
openstackgerrit | Merged openstack/project-config master: Use publish-to-pypi on barbican ansible roles https://review.opendev.org/c/openstack/project-config/+/784011 | 08:16 |
ianw | hrw: this build doesn't look good | 08:21 |
ianw | https://nb03.opendev.org/centos-8-stream-arm64-0000001549.log | 08:22 |
hrw | ianw: let me look | 08:22 |
ianw | 2021-03-31 07:44:32.339 | + /usr/sbin/grub2-install '--modules=part_msdos part_gpt lvm' --removable --force /dev/loop6 | 08:22 |
ianw | 2021-03-31 07:44:32.341 | /usr/sbin/grub2-install: error: this utility cannot be used for EFI platforms because it does not support UEFI Secure Boot. | 08:22 |
ianw | it may be possible we fixed this but haven't either done a dib release or included a new dib release in the nodepool container... | 08:22 |
ianw | diskimage-builder version 3.7.0 | 08:22 |
hrw | grub-- | 08:23 |
ianw | looks like https://review.opendev.org/c/openstack/diskimage-builder/+/779106 | 08:24 |
ianw | it seems we are due a release | 08:24 |
hrw | looks like | 08:25 |
ianw | ok, i pushed 3.8.0, but we'll have to pull it into nodepool and then deploy to the builders. sorry, no quick route there :/ | 08:27 |
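Once the updated nodepool image is deployed, a quick sanity check on the builder could look like this (the container name nodepool-builder is an assumption, not confirmed from nb03):

    # confirm the builder container picked up the new diskimage-builder release
    sudo docker exec nodepool-builder pip3 show diskimage-builder | grep Version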
hrw | no problem, happens | 08:28 |
ianw | ok, https://review.opendev.org/c/zuul/nodepool/+/784026 will start the process | 08:29 |
*** ykarel has joined #opendev | 08:43 | |
*** ykarel is now known as ykarel|lunch | 08:43 | |
jrosser | debian-bullseye-updates and debian-bullseye-backports don't seem to be being mirrored, the logs are zero length here https://files.openstack.org/mirror/logs/reprepro/ | 09:05 |
*** ysandeep|lunch is now known as ysandeep | 09:07 | |
*** klonn has joined #opendev | 09:11 | |
*** klonn has quit IRC | 09:50 | |
*** gibi is now known as gibi_away | 09:52 | |
*** hashar has quit IRC | 09:52 | |
hrw | jrosser: are there such repos upstream already? | 09:52 |
*** klonn has joined #opendev | 09:52 | |
jrosser | hrw: they seem to be here http://ftp.uk.debian.org/debian/dists/ | 09:52 |
hrw | o, they are. nice | 09:52 |
hrw | jrosser: note that https://files.openstack.org/mirror/logs/reprepro/debian-buster-updates.log.1 has content so perhaps there was nothing to mirror last time reprepro ran | 09:54 |
hrw | oops, wrong version | 09:55 |
jrosser | there's a patch to build images for bullseye which fails because it tries to add the -updates and -backports repos, and apt update is upset that there's no Release file | 09:56 |
*** ykarel|lunch is now known as ykarel | 10:04 | |
mordred | bullseye is current testing - it's not going to have -updates or -backports yet | 10:32 |
mordred | it won't grow working versions of those until it is actually released | 10:32 |
mordred | hrm. I take that back - I agree that -updates exists for real | 10:33 |
chkumar|ruck | Hello Infra, we are seeing a few retry limits on one patch https://zuul.opendev.org/t/openstack/status#782187 , please have a look, thanks! | 10:34 |
chkumar|ruck | join name : tripleo-ansible-centos-8-molecule-tripleo-modules | 10:35 |
chkumar|ruck | *job | 10:35 |
mordred | same with -backports - how weird (although I gotta say it makes automation nice) | 10:35 |
* mordred goes back to morning caffeine | 10:35 | |
chkumar|ruck | it is the earlier retry_limit job https://zuul.opendev.org/t/openstack/build/d28be58628484f92a36bd8ab87279d6e | 10:35 |
*** klonn has quit IRC | 10:45 | |
*** mugsie__ is now known as mugsie | 11:01 | |
*** dtantsur|afk is now known as dtantsur | 11:34 | |
*** lpetrut has joined #opendev | 11:37 | |
fungi | jrosser: hrw: mordred: catching up, but the problem is that reprepro won't create empty repositories, even if they exist empty at the source end. there is a set of commands we can run to create the empty indices, documented in the reprepro manpage i think, i vaguely recall doing that for buster | 11:41 |
fungi | chkumar|ruck: zbr|rover was also asking about those retries in #openstack-infra, seemed like it could be related to a specific job or node type, i can help get an autohold set up for it in a bit and then we can try to retrigger the failure and investigate the resultant state of the vm after the failure and also try to extract a vm console log from it | 11:43 |
mordred | Ahhh right | 11:45 |
chkumar|ruck | fungi: thanks :-) | 11:50 |
hrw | fungi: maybe the scripts which call reprepro should take care of creating empty RELEASE-{backports,update} ones when a new release gets added? | 11:56 |
hrw | fungi: so in 2 years time we will not get into same discussion again ;) | 11:56 |
fungi | hrw: maybe, but how to add that requires some thought in declarative configuration management. we don't have a lot of tasks which are run-once-on-setup | 11:56 |
*** sshnaidm|off is now known as sshnaidm | 11:57 | |
hrw | fungi: understood | 11:58 |
fungi | though maybe our script which runs reprepro could run it if the suites are missing at the end or something | 11:59 |
fungi | basically "create these empty if they don't exist at completion" | 12:00 |
hrw | ;) | 12:01 |
openstackgerrit | Guillaume Chauvel proposed opendev/gear master: Update SSL exceptions https://review.opendev.org/c/opendev/gear/+/784082 | 12:01 |
openstackgerrit | Guillaume Chauvel proposed opendev/gear master: WIP: Client: use NonBlockingConnection to allow TLSv1.3 https://review.opendev.org/c/opendev/gear/+/784083 | 12:01 |
*** auristor has joined #opendev | 12:05 | |
openstackgerrit | Jeremy Stanley proposed opendev/zone-opendev.org master: Clean up OpenEdge configuration https://review.opendev.org/c/opendev/zone-opendev.org/+/784086 | 12:10 |
openstackgerrit | Guillaume Chauvel proposed opendev/gear master: WIP: Client: use NonBlockingConnection to allow TLSv1.3 https://review.opendev.org/c/opendev/gear/+/784083 | 12:14 |
fungi | zbr|rover: chkumar|ruck: i've set an autohold for the failing job on https://review.opendev.org/782187 so feel free to recheck it and we can take a closer look at the node once it fails again | 12:35 |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: WIP: Make build-sshkey handling windows compatible https://review.opendev.org/c/zuul/zuul-jobs/+/780662 | 12:42 |
fungi | jrosser: hrw: mordred: according to the reprepro manpage, i think something like `reprepro export buster-updates` is what we want, but i'll get some caffeine in me before attempting | 12:42 |
hrw | fungi: s/buster/bullseye/ and also -backports but yes, it looks like it | 12:44 |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: WIP: Make build-sshkey handling windows compatible https://review.opendev.org/c/zuul/zuul-jobs/+/780662 | 12:45 |
jrosser | fungi: thanks for taking a look at that :) | 12:45 |
*** smcginnis has quit IRC | 12:53 | |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: WIP: Make build-sshkey handling windows compatible https://review.opendev.org/c/zuul/zuul-jobs/+/780662 | 13:04 |
fungi | jrosser: hrw: mordred: apparently i did it almost a year ago for focal-backports: http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-04-24.log.html#t2020-04-24T00:36:00 | 13:13 |
hrw | "we just have to remember to do that whenever adding a new release i guess" | 13:14 |
hrw | ;) | 13:14 |
fungi | well, or like i said, maybe if we can detect in our script that no indices were created for a configured dist, we make it run that command at the end | 13:18 |
fungi | i'm going to work on that angle here in a bit | 13:18 |
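A rough sketch of the conditional approach fungi describes, assuming the usual reprepro layout (the confdir, dist names, and mirror path here are placeholders rather than the real mirror-update configuration):

    # after the normal update run, export indices for any configured dist that
    # still has no Release file, i.e. one reprepro skipped because it was empty
    for dist in bullseye-updates bullseye-backports; do
        if [ ! -f "/afs/.openstack.org/mirror/debian/dists/${dist}/Release" ]; then
            reprepro --confdir /etc/reprepro/debian export "${dist}"
        fi
    done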
*** ralonsoh has left #opendev | 13:20 | |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: WIP: Make build-sshkey handling windows compatible https://review.opendev.org/c/zuul/zuul-jobs/+/780662 | 13:30 |
zbr|rover | fungi: it's probably going to happen with https://zuul.opendev.org/t/openstack/stream/5ebc409e1d554b89b5569c6fbbfcc1f7?logfile=console.log too | 13:32 |
zbr|rover | already >12 min without any reply, it's probably stuck. | 13:33 |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: WIP: Make build-sshkey handling windows compatible https://review.opendev.org/c/zuul/zuul-jobs/+/780662 | 13:34 |
zbr|rover | fungi: yep, it did fail too. | 13:36 |
*** darshna has joined #opendev | 14:09 | |
*** ykarel is now known as ykarel|away | 14:40 | |
clarkb | chkumar|ruck: zbr|rover fungi note that if the problem is network connectivity, holding the node may not be helpful | 14:46 |
fungi | clarkb: well, the idea is that i may at least be able to capture a vm console log from the held node, or reboot it with the nova api, or even boot it on a rescue image to get at the logs | 14:48 |
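For reference, the out-of-band options fungi mentions map onto standard openstack client calls along these lines (the server uuid is a placeholder):

    # pull whatever the hypervisor has buffered from the guest console
    openstack console log show <server-uuid> | tail -n 100
    # hard reboot it, or boot it into a rescue image to inspect the disk
    openstack server reboot --hard <server-uuid>
    openstack server rescue <server-uuid>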
fungi | but we've also seen similar cases where something during the job drove system load or disk i/o up so high that sshd ceased responding fast enough to beat ansible's timeout | 14:49 |
clarkb | fair enough. The last time tripleo had network setup issues the console logs did help in that it showed that network manager was undoing our static ip config | 14:49 |
fungi | and after a time the vm recovers and can be reached again | 14:49 |
zbr|rover | i am now trying to manually run the same tests that are happening inside that job, so i may be able to identify whether there is an issue with these tests or not. | 14:56 |
zbr|rover | i will find out soon, already passed 1/6 | 14:57 |
*** mfixtex has joined #opendev | 15:14 | |
*** lpetrut has quit IRC | 15:22 | |
*** Dmitrii-Sh4 has joined #opendev | 15:26 | |
*** Dmitrii-Sh has quit IRC | 15:26 | |
*** Dmitrii-Sh4 is now known as Dmitrii-Sh | 15:26 | |
*** noonedeadpunk has quit IRC | 15:27 | |
*** noonedeadpunk has joined #opendev | 15:28 | |
*** ykarel|away has quit IRC | 15:37 | |
*** hashar has joined #opendev | 15:41 | |
*** ysandeep is now known as ysandeep|away | 15:47 | |
*** spotz has joined #opendev | 15:50 | |
*** diablo_rojo has joined #opendev | 16:06 | |
*** hamalq has joined #opendev | 16:18 | |
*** hamalq_ has joined #opendev | 16:19 | |
*** hamalq has quit IRC | 16:22 | |
*** Dmitrii-Sh has quit IRC | 16:23 | |
*** Dmitrii-Sh has joined #opendev | 16:24 | |
*** hamalq_ has quit IRC | 16:41 | |
*** hamalq has joined #opendev | 16:41 | |
*** eolivare has quit IRC | 16:43 | |
*** marios is now known as marios|out | 16:47 | |
*** marios|out has quit IRC | 16:59 | |
*** dtantsur is now known as dtantsur|afk | 17:04 | |
corvus | i'm getting started on looking into the zuul memory leak now | 17:23 |
zbr|rover | fungi: clarkb: re the stuck job, we are currently making it nv and we have another patch that may fix the root cause, but we are not sure yet. | 17:29 |
clarkb | zbr|rover: it's failing in a loop though, right? ideally we wouldn't just set it to nv in that case | 17:29 |
fungi | zbr|rover: thanks for the update, i also realized there was no reason to limit the autohold to a single change so broadened it to any tripleo-ansible-centos-8-molecule-tripleo-modules failure for openstack/tripleo-ansible | 17:30 |
zbr|rover | i am almost sure it's not a zuul or infra issue here, it's a genuine bug. | 17:30 |
fungi | also, clarkb is right, setting it nv means we'll eat 3x the node count for that job anyway and just throw away the results | 17:30 |
zbr|rover | we do want to see the impact of the real fix first. i do not expect it to stay nv for more than a day. | 17:30 |
fungi | the change with the fix can readd the job to the check and gate pipelines | 17:31 |
fungi | that way you still see the effects of the fix on the fix change and any queued after it | 17:31 |
fungi | just not on the changes where you expect it to fail | 17:31 |
zbr|rover | to be clear it does not always fail. in fact we do not really know what introduced the issue. | 17:32 |
fungi | <100% failure would also be a possible explanation for how the bug itself got merged if it wasn't an outside shift | 17:32 |
clarkb | if it is a similar network issue to last time it had to do with the job forcing dhcp in regions without dhcp | 17:35 |
clarkb | all the regions that used dhcp said ok whatever and kept running, but those that use static IPs immediately broke | 17:35 |
clarkb | corvus: I need to take a break, but let me know if I can help with the memory leak and I can dive into that after | 17:36 |
corvus | clarkb: thx. i'm still at step 1: waiting for the first sigusr2 objgraph most common types report to finish | 17:36 |
corvus | about 15 minutes into that | 17:37 |
zbr|rover | done, updated the patch to disable job | 17:40 |
zbr|rover | fungi: clarkb thanks for helping on that issue. time for me to go offline now. | 17:42 |
fungi | have a good evening zbr|rover! | 17:42 |
*** avass has quit IRC | 17:56 | |
*** yourname has joined #opendev | 17:57 | |
*** yourname is now known as avass | 17:58 | |
*** avass has quit IRC | 18:00 | |
*** yourname has joined #opendev | 18:01 | |
*** yourname is now known as avass | 18:02 | |
*** avass has quit IRC | 18:07 | |
*** yourname has joined #opendev | 18:10 | |
*** yourname is now known as avass | 18:10 | |
fungi | poking at the missing empty debian bullseye dists a bit more, i'm starting to think it may make the most sense to just reprepro export unconditionally after every reprepro update. testing that theory now | 18:13 |
fungi | if it's reasonably quick even on nonempty dists, then the cost is low enough to warrant the one-line fix rather than lots of unnecessary config parsing and conditionals | 18:15 |
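A minimal sketch of the unconditional approach being tested, i.e. always exporting after an update (the confdir path is an assumption):

    # "update" skips writing indices for dists it considers empty; an explicit
    # "export" afterwards writes index files for every configured dist
    reprepro --confdir /etc/reprepro/debian update
    reprepro --confdir /etc/reprepro/debian export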
*** avass has quit IRC | 18:17 | |
*** yourname has joined #opendev | 18:18 | |
*** yourname is now known as avass | 18:18 | |
*** avass has quit IRC | 18:19 | |
*** yourname has joined #opendev | 18:19 | |
*** yourname has quit IRC | 18:21 | |
*** avass has joined #opendev | 18:21 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update Gerrit to 3.2.8 https://review.opendev.org/c/opendev/system-config/+/784152 | 18:22 |
clarkb | that isn't urgent, but i noticed they made a new release so i figure we should try and keep up if we can | 18:22 |
fungi | the dstat effort is not yet to the point where we have enough data to decide on a timeline for 3.3 i guess | 18:22 |
clarkb | ya I think we really want to improve the gatling git stuff for 3.3 | 18:23 |
clarkb | that said, the new bigger server gives us a lot of headroom and I think we can be less cautious (early data says 3.3 uses more memory but is faster) | 18:23 |
johnsom | Hmm, things are queuing in an odd way. I posted a patch fifteen minutes ago and it still isn't listed in the check pipeline | 18:33 |
johnsom | https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/760465 | 18:33 |
clarkb | johnsom: we're (mostly corvus) doing object introspection on zuul to try and root cause this memory leak and that slows stuff down | 18:34 |
clarkb | hopefully just temporary until we get data we need | 18:34 |
johnsom | Ah, ok. Just thought I would give a heads up | 18:34 |
*** DSpider has joined #opendev | 18:39 | |
*** DSpider has quit IRC | 18:39 | |
fungi | okay, so adding an explicit reprepro export on an otherwise noop update added 5.5 minutes for the debian repo (9 dists covering hundreds of thousands of packages) | 19:01 |
fungi | i'll see what it adds for debian-security | 19:01 |
clarkb | not too bad considering the runtime of a full sync | 19:01 |
fungi | so the fix to debian-security is going to have to be different, i think | 19:02 |
fungi | aptmethod error receiving 'http://security.debian.org/dists/bullseye/updates/Release': | 19:02 |
clarkb | because it doesn't exist at all upstream yet or ? | 19:02 |
fungi | '404 Not Found | 19:02 |
clarkb | ya | 19:02 |
mordred | clarkb: *amazing* that upstream released a point release and there is an opendev patch up to maybe run it | 19:03 |
clarkb | mordred: ya we've managed to keep up with point releases | 19:03 |
clarkb | 3.3 scares me a bit simply because a few people on the repo-discuss list reverted | 19:03 |
mordred | yah | 19:04 |
clarkb | but I've been trying to add better testing of it when I can. We added dstat for system level stats and also have a clunky gatling git thing up | 19:04 |
mordred | fungi: deb http://security.debian.org/debian-security bullseye-security main is what's in the bullseye docker image for security - is security of *updates* the thing that doesn't exist until it exists that I was thinking about earlier? | 19:04 |
mordred | oh ... | 19:05 |
mordred | fungi: http://security.debian.org/debian-security/dists/bullseye-security/updates/ | 19:05 |
mordred | fungi: it's security.d.o bullseye-security - not security.d.o bullseye | 19:06 |
mordred | (even though the other releases do not have a -security suffix) | 19:06 |
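To illustrate the naming change mordred is pointing at, the security suite is addressed differently from bullseye onward (generic sources.list lines, not the opendev reprepro config):

    # buster and earlier: a "<codename>/updates" suite on security.debian.org
    deb http://security.debian.org/debian-security buster/updates main
    # bullseye onward: the suite is "<codename>-security" instead
    deb http://security.debian.org/debian-security bullseye-security main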
*** hashar has quit IRC | 19:08 | |
fungi | oh, righto, that's changing in bullseye | 19:09 |
fungi | totally forgot about that announcement | 19:09 |
fungi | so we probably need the fix i'm considering *and* some config change for bullseye's security repo | 19:10 |
fungi | okay, lemme get this pushed up first then, it seems to solve the first issue | 19:12 |
fungi | after that i can meditate on the reprepro config a bit | 19:12 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Explicitly create empty reprepro dists https://review.opendev.org/c/opendev/system-config/+/784158 | 19:27 |
fungi | jrosser: hrw: ^ that's part of the solution | 19:27 |
fungi | now to work out the config change we need for bullseye-security | 19:27 |
*** sboyron has quit IRC | 20:12 | |
*** d34dh0r53 has quit IRC | 20:35 | |
*** d34dh0r53 has joined #opendev | 20:40 | |
*** whoami-rajat has quit IRC | 20:47 | |
*** dhellmann_ has joined #opendev | 21:12 | |
*** dhellmann has quit IRC | 21:12 | |
*** dhellmann_ is now known as dhellmann | 21:14 | |
fungi | okay, so i've worked out the fix for bullseye-security. unfortunately, the bullseye addition had been breaking that reprepro invocation, so we haven't been updating the debian-security volume since it went in, which means we're some days behind on security repo state for stretch/buster and only just now starting to mirror it for bullseye | 21:14 |
fungi | good news is it's a one-line fix | 21:15 |
fungi | though i had to run reprepro clearvanished on it to clean up old incomplete references to the wrong bullseye security repo | 21:15 |
fungi | which was part of what was preventing it from running | 21:15 |
*** dhellmann has quit IRC | 21:16 | |
*** dhellmann has joined #opendev | 21:17 | |
*** dhellmann has quit IRC | 21:23 | |
*** dhellmann has joined #opendev | 21:25 | |
fungi | so one down-side here is i think i need to put the mirror-update server in the emergency disable list until i get the config patch deployed | 21:27 |
fungi | well and merged | 21:27 |
fungi | and uploaded | 21:27 |
fungi | and written | 21:27 |
fungi | one thing at a time ;) | 21:27 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Correct debian-security repo codename for bullseye https://review.opendev.org/c/opendev/system-config/+/784169 | 21:32 |
fungi | infra-root: ^ appreciate an expedited review of that and the parent change since i have the server in emergency disable with the latter fix applied manually to prevent ansible from re-breaking it and requiring additional manual cleanup | 21:33 |
clarkb | looking | 21:34 |
fungi | jrosser: hrw: ^ that's the remaining fix, but at this point i've also applied and run it on the mirror-update server so we should be ready to move forward and recheck the dib job addition now | 21:36 |
fungi | and assuming that passes, approve the nodeset addition too | 21:37 |
ianw | fungi: lgtm, thanks | 21:38 |
hrw | fungi: cool, thanks! | 21:40 |
*** artom has quit IRC | 21:44 | |
*** artom has joined #opendev | 21:45 | |
fungi | i'm confused by the error on https://zuul.opendev.org/t/openstack/build/dbe8af6f6b054f0eb85401a70f74b188 | 22:12 |
fungi | i wonder if that test has bitrotted | 22:12 |
fungi | vos examine exiting 255 | 22:14 |
fungi | famous last words, but i don't think my change is causing that, seems to be arising in a wholly separate script | 22:14 |
fungi | the last time system-config-run-mirror-update succeeded was two days ago | 22:16 |
fungi | but these changes are the first to run it since | 22:16 |
clarkb | fungi: could it just be a fluke related to udp and internets? | 22:19 |
clarkb | and/or did it run on an ipv6-only cloud which might be more sensitive to problems? | 22:19 |
fungi | maybe, but both changes hit the same error a couple of hours apart | 22:20 |
fungi | aha! | 22:21 |
fungi | external cause | 22:21 |
fungi | Volume does not exist on server afs01.ord.openstack.org as indicated by the VLDB | 22:21 |
fungi | just tried it from my workstation | 22:21 |
fungi | i guess that will clear up once ianw's vos releases finish | 22:22 |
fungi | ianw: would it be safe to go ahead and replicate project.zuul.readonly to ord ahead of the others? | 22:22 |
fungi | since we explicitly reference it in that test, it can't pass currently | 22:22 |
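The check and fix being discussed correspond to standard openafs vos commands, roughly as below (whether -localauth is used is an assumption about how it's invoked on the afs servers):

    # list the replication sites the vldb knows about for the volume
    vos examine project.zuul.readonly
    # push the current read-write contents out to the read-only sites, including ord
    vos release project.zuul -localauth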
ianw | oh, i guess alphabetically that came last in the loop | 22:26 |
clarkb | I'm going through johnsom's list of CI issues and seeing if I can provide any help/feedback/fixes | 22:26 |
ianw | i'm releasing it now | 22:26 |
clarkb | https://etherpad.opendev.org/p/wallaby-RC1-ci-challenges <- is the list | 22:27 |
fungi | thanks ianw! lmk when it completes and i'll approve those debian mirror fixes | 22:27 |
ianw | fungi: Released volume project.zuul successfully | 22:29 |
clarkb | fungi: ianw: if you get a chance can you look at my orange-ish notes on item 3 in that etherpad and tell me if that looks like the pip solver to you? | 22:34 |
clarkb | I wonder if it is really slow on amd/vexxhost for some reason | 22:35 |
TheJulia | do we have more general ci grumpiness? a lot of jobs just went to 2nd retry | 22:35 |
TheJulia | At least, looking at https://zuul.opendev.org/t/openstack/status#ironic | 22:36 |
clarkb | TheJulia: there was a zk reconnection about an hour ago? something like that | 22:36 |
clarkb | corvus is actively debugging, which at times has an impact on zuul performance and can trigger that (even though last I checked memory use and thus swap was fine) | 22:36 |
TheJulia | Looks fairly recent-ish :( | 22:36 |
TheJulia | okay | 22:36 |
fungi | cacti claims we're not back into memory pressure on the scheduler yet at least, but maybe the repl work is stalling zk connections out | 22:37 |
clarkb | johnsom: out of curiosity are you enabling nested virt on any of these that have libvirt/cpu trouble? | 22:38 |
clarkb | johnsom: yes I think so as the label being used is explicitly the nested virt label | 22:39 |
johnsom | All of them, but the errors are in nova<->libvirt. The qemu/kvm layer has no errors | 22:39 |
clarkb | johnsom: well the cpu lockup was in the kernel/cpu/etc | 22:40 |
johnsom | It seems related to bionic as well, they are all stable jobs that I have seen | 22:40 |
clarkb | I have a very strong suspicion that that one is related to nested virt | 22:40 |
johnsom | It always goes through the "try CPU type", that is not unusual. The speculation is it is a bug in libvirt/glib combo | 22:43 |
clarkb | johnsom: sure, but all of your examples are vexxhost so far :) | 22:44 |
clarkb | maybe it is a bug with libvirt/glib + amd :) | 22:44 |
clarkb | johnsom: also reading the qemu log I'm not sure the amd nested virt flag is being set properly on those hosts | 22:44 |
fungi | keeping in mind that amd nested virt accel is different than intel nested virt accel too | 22:44 |
clarkb | should be svm but it doesn't seem to be in the opteron_g2 flag list | 22:44 |
clarkb | fungi: yup though I'm not convinced it is properly enabled, but if it is that could be another factor | 22:45 |
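A quick way to test that suspicion from inside one of the affected guests (nothing vexxhost-specific assumed):

    # count cpuinfo lines advertising the amd nested-virt flag; zero would
    # support the theory that svm is not being passed through to the guest
    grep -c -w svm /proc/cpuinfo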
johnsom | Yeah, it always whines about that stuff too. Not unusual | 22:45 |
clarkb | fungi: in the stackviz one that failed on name resolution that you checked against unbound it is running /tmp/stackviz/bin/pip3 install -u file://path/to/stackviz.tar.gz | 22:55 |
clarkb | that then does a python setup.py egg_info somewhere which does the fetch against pypi directly | 22:55 |
clarkb | I suspect that somehow we are tripping over easy install? | 22:55 |
clarkb | I wonder if an explicit install of pbr into the virtualenv first would help | 22:56 |
fungi | yeah, i was more wondering why stackviz install is being done that way | 22:56 |
fungi | we could unpack the stackviz tree and then just pip install /the/path/to/it | 22:57 |
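A sketch of the two alternatives being floated here (all paths are placeholders; the real tarball location is whatever the job stages):

    # fungi's suggestion: unpack the tree and point pip at the directory, so
    # metadata comes from the unpacked source instead of an egg_info fetch
    mkdir -p /tmp/stackviz-src
    tar -xzf /path/to/stackviz.tar.gz -C /tmp/stackviz-src
    /tmp/stackviz/bin/pip3 install /tmp/stackviz-src/*
    # clarkb's idea: seed pbr into the venv first so egg_info has what it needs
    /tmp/stackviz/bin/pip3 install pbr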
clarkb | johnsom: another thing to consider is your jobs are running on a reduced set of clouds due to the nested virt request. Limestone which I guess sometimes has dns failures, vexxhost which may have amd weirdness and also pip SAT solver slowness?, and ovh which I haven't seen any specific issues against yet | 22:57 |
clarkb | calling that out because if those clouds have problems your jobs will notice much more than background | 22:57 |
clarkb | also simply turning off the problematic clouds won't help much if they are the only ones that can run the flavors you want | 22:58 |
johnsom | Ha, well, do we have other clouds? I'm ignoring RAX as it has its own set of Xen problems | 22:58 |
fungi | and internap | 22:58 |
fungi | we have lots of nodes there | 22:58 |
clarkb | johnsom: rax and inap are the other two clouds currently used for x86. Neither does nested virt | 22:58 |
clarkb | but they provide a majority of resources iirc | 22:59 |
fungi | er, right, they're inap now not internap | 22:59 |
johnsom | Hmm, internap did at one point | 22:59 |
clarkb | johnsom: we don't put the nested-virt label there as we don't get the same attention for debugging nested virt problems | 22:59 |
fungi | possible we just don't create a special node type there to add it | 22:59 |
clarkb | so even if it is enabled we won't put the special label there | 22:59 |
johnsom | Admittedly, this sampling is very small. It is all from just one patch and not the normal day-to-day | 22:59 |
fungi | mgagne may be able to suggest someone who can help with more low-level investigation of nested virt issues there, but he's not in here at the moment | 23:00 |
clarkb | I think it is worth investigating further if the amd cpus are having trouble with pip solving and/or nested virt | 23:00 |
johnsom | I am 90% sure we used to just "turn it on" there in the past, before the nodeset existed. | 23:00 |
clarkb | the pip install timing on those is really weird | 23:01 |
fungi | but if we want to consider exposing a nested-virt label for inap i agree that would be a prerequisite | 23:01 |
clarkb | johnsom: right but that isn't how we are exposing the label | 23:01 |
johnsom | Yeah, I know | 23:01 |
clarkb | johnsom: for the label we've gotten those clouds to minimally buy into helping debug things when we can attribute them to nested virt | 23:01 |
fungi | clarkb: oh, speaking of clouds, i did get the openedge cleanup pushed under topic:openedge | 23:02 |
clarkb | fungi: do you know if pip solving slowness looks like https://zuul.opendev.org/t/openstack/build/d35cc616da1744e98c2d5b081866d541/log/job-output.txt#6209-6211 ? | 23:02 |
clarkb | the reason I don't really suspect network slowness is that after the first package everything else is much quicker | 23:03 |
clarkb | I kind of expect an oddity in how pip logs things, where the solving just produces no logging and you jump ahead a minute later when the downloads start, but I'm not sure | 23:03 |
clarkb | fungi: is there an order to those openedge cleanup changes? | 23:04 |
fungi | dns depends on system-config which depends on the others | 23:04 |
fungi | base-jobs and project-config can merge first | 23:05 |
fungi | sudo -H LC_ALL=en_US.UTF-8 SETUPTOOLS_USE_DISTUTILS=stdlib http_proxy= https_proxy= no_proxy= PIP_FIND_LINKS= SETUPTOOLS_SYS_PATH_TECHNIQUE=rewrite python3.8 -m pip install -c /opt/stack/requirements/upper-constraints.txt etcd3gw | 23:05 |
fungi | i guess that would be worth testing | 23:05 |
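If that gets tested on a held vexxhost node, simply timing it should show whether the stall is per-invocation (command reproduced from the paste above, trimmed of the env vars):

    # if this sits ~60s before any download output on a first run but not on
    # a second run, that points at a connection timeout rather than the solver
    time sudo python3.8 -m pip install -c /opt/stack/requirements/upper-constraints.txt etcd3gw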
clarkb | (a gerrit plugin showing depends-on chains in a list of changes would be neat but probably difficult to do in a way that performance isn't terrible since the gerrit db knows nothing about depends-on) | 23:05 |
fungi | clarkb: i agree it could be dep solver slowness if the pip version being used is new enough to have it | 23:06 |
clarkb | fungi: I think devstack upgrades pip very early | 23:06 |
clarkb | but not sure | 23:07 |
fungi | like 4175 claims pip 20.0.2 | 23:08 |
clarkb | dep solver is 21? | 23:09 |
fungi | 20.0.3 i think | 23:10 |
clarkb | I wonder if devstack pinned pre-solver, but that would also rule out that theory | 23:10 |
clarkb | maybe we should boot one of those vexxhost nodes and profile it? | 23:11 |
clarkb | ianw: ^ possibly related to gerrit things | 23:11 |
clarkb | it definitely seems like it just goes out to lunch every time it needs to install something | 23:11 |
clarkb | but then catches up after the first dep is pulled | 23:11 |
ianw | i'm seeing ipv6 weirdness, many sites unavailable. so possibly it gives up on something and falls back? | 23:13 |
fungi | clarkb: sorry, 20.3 | 23:13 |
fungi | but yes, not new enough to be the new solver | 23:13 |
fungi | clarkb: the log looks like it's using distro python version from focal | 23:14 |
clarkb | ianw: oh that is a good theory, ya it could be that | 23:15 |
clarkb | ianw: and then it remembers to use ipv4 for everything subsequent | 23:15 |
fungi | "pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)" | 23:15 |
openstackgerrit | Merged opendev/base-jobs master: Clean up OpenEdge configuration https://review.opendev.org/c/opendev/base-jobs/+/783989 | 23:15 |
* clarkb updates the etherpad | 23:16 | |
fungi | and yes, ipv6 connection timeout could explain the long delay | 23:17 |
clarkb | and why it is so consistent | 23:17 |
clarkb | of about a minute exactly | 23:17 |
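One way to test the fallback theory directly from an affected node (standard curl flags, nothing job-specific assumed):

    # compare connect behaviour to pypi over v6 and v4; a v6 attempt hanging
    # for roughly a minute before giving up would match the gap in the job logs
    time curl -6 --connect-timeout 70 -sI https://pypi.org/simple/ > /dev/null
    time curl -4 --connect-timeout 70 -sI https://pypi.org/simple/ > /dev/null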
fungi | that would definitely make setup take a long time given how many different pip install commands devstack likes to break up into | 23:18 |
ianw | i'm tracking things somewhat between #vexxhost channel and https://etherpad.opendev.org/p/gerrit-upgrade-2021 | 23:19 |
openstackgerrit | Merged openstack/project-config master: Clean up OpenEdge configuration https://review.opendev.org/c/openstack/project-config/+/783990 | 23:22 |
clarkb | fungi: I +2'd ^ the changes in the stack but didn't approve the later two as I can't watch the big inventory change go in | 23:23 |
clarkb | I'm going to need to sort out dinner and enjoy this 70F march day shortly | 23:23 |
fungi | go enjoy it, was a great day here too. had the windows open all day | 23:24 |
*** tosky has quit IRC | 23:30 | |
TheJulia | sigh, 3rd retry on multiple jobs :( | 23:34 |
corvus | TheJulia: i'm sorry :( | 23:53 |
TheJulia | c'est la vie | 23:53 |
TheJulia | All I can do is wait it out | 23:53 |