Friday, 2021-06-04

corvus	ianw: no i'm trying to say it's an incompatible option in zuul	00:03
corvus	ianw: zuul either uses the app auth or the webhook auth	00:03
corvus	ianw: a scheduler restart will bring in significant zuul changes; i was planning one tomorrow; i would not recommend it unless you're prepared to monitor for fallout. also, i would not recommend it because i don't think it'll fix the problem.	00:05
corvus	ianw: i read https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L1116 as meaning it's not going to use the api token as a fallback	00:11
ianw	corvus: yeah, i agree also as i was reworking https://review.opendev.org/c/zuul/zuul/+/794371 that became clearer	00:15
ianw	i think if we're not installed for the project, we should fall back to api authentication	00:17
ianw	i feel like i'm doing this very inefficiently compared to someone like tobiash[m] who might have thought about this a lot more. i might just update the story and give others a chance to weigh in	00:18
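The fallback ianw describes above can be sketched roughly as follows. This is only an illustration of the proposed decision order, not Zuul's actual code: the function name `choose_auth` and its parameters are hypothetical, and the real logic lives in the `githubconnection.py` file linked earlier.

```python
from typing import Optional


def choose_auth(app_configured: bool,
                installed_for_project: bool,
                api_token: Optional[str]) -> str:
    """Pick an authentication method for one project (hypothetical helper).

    Order sketched from the discussion: prefer app auth when the app is
    installed for the project, otherwise fall back to the API token, and
    only then go unauthenticated.
    """
    if app_configured and installed_for_project:
        return "app"
    if api_token:
        # Proposed fallback: the app is not installed for this project.
        return "api_token"
    return "anonymous"
```

With these assumptions, a project where the app is configured but not installed would get `"api_token"` whenever a token is set, which is the behaviour being asked for.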
*** bhagyashris_ has joined #opendev		00:31
*** bhagyashris has quit IRC		00:38
*** ysandeep|away has joined #opendev		01:19
*** ysandeep|away is now known as ysandeep		01:19
*** ysandeep is now known as ysandeep|ruck		01:19
*** boistordu_old has joined #opendev		02:11
*** boistordu has quit IRC		02:17
*** brtknr has quit IRC		03:00
*** brtknr has joined #opendev		03:00
*** ysandeep|ruck is now known as ysandeep|afk		03:49
opendevreview	Sandeep Yadav proposed openstack/diskimage-builder master: [DNM] Lock NetworkManager in DIB https://review.opendev.org/c/openstack/diskimage-builder/+/794704	04:01
*** ykarel|away has joined #opendev		04:37
*** ysandeep|afk has quit IRC		04:38
*** ysandeep|afk has joined #opendev		04:51
*** ysandeep|afk is now known as ysandeep|ruck		05:03
*** ykarel|away is now known as ykarel		05:14
*** marios has joined #opendev		05:18
*** ralonsoh has joined #opendev		05:30
opendevreview	Sandeep Yadav proposed openstack/diskimage-builder master: [DNM] Lock NetworkManager in DIB https://review.opendev.org/c/openstack/diskimage-builder/+/794704	05:31
*** slaweq_ has joined #opendev		06:00
*** slaweq has left #opendev		06:02
*** slaweq_ has quit IRC		06:03
*** slaweq_ has joined #opendev		06:03
*** dklyle has quit IRC		06:16
*** bhagyashris_ is now known as bhagyashris		06:33
*** amoralej|off has joined #opendev		06:35
*** amoralej|off is now known as amoralej		06:38
opendevreview	Slawek Kaplonski proposed opendev/irc-meetings master: Move neutron meetings to the openstack-neutron channel https://review.opendev.org/c/opendev/irc-meetings/+/794711	06:39
frickler	slaweq_: do you want to have some time for neutron ppl to review ^^ or do you think it's enough that it has been discussed in the meeting? then I'd just merge it	06:54
slaweq_	frickler: yes, I want at least Liu and Brian to check it	06:56
frickler	slaweq_: o.k., waiting for that, then	06:58
slaweq_	frickler: thx	07:04
*** rpittau|afk is now known as rpittau		07:05
*** tosky has joined #opendev		07:23
*** andrewbonney has joined #opendev		07:32
*** slaweq has joined #opendev		07:35
*** ysandeep|ruck is now known as ysandeep|lunch		07:35
*** slaweq_ has quit IRC		07:41
*** hashar has joined #opendev		07:46
*** jpena|off is now known as jpena		07:54
*** open10k8s has quit IRC		08:02
*** open10k8s has joined #opendev		08:04
*** lucasagomes has joined #opendev		08:04
*** slaweq_ has joined #opendev		08:05
*** slaweq_ has quit IRC		08:11
*** CeeMac has joined #opendev		08:15
*** lucasagomes has quit IRC		08:19
*** lucasagomes has joined #opendev		08:26
opendevreview	Pierre Riteau proposed opendev/irc-meetings master: Move Blazar meeting to #openstack-blazar https://review.opendev.org/c/opendev/irc-meetings/+/794740	08:32
*** WeSteeve has joined #opendev		08:41
*** lucasagomes has quit IRC		08:48
*** lucasagomes has joined #opendev		08:49
*** WeSteeve has quit IRC		09:00
*** lucasagomes has quit IRC		09:05
*** hashar is now known as Guest833		09:16
*** hashar has joined #opendev		09:16
*** ysandeep|lunch is now known as ysandeep		09:16
*** ysandeep is now known as ysandeep|ruck		09:16
*** Guest833 has quit IRC		09:21
*** lucasagomes has joined #opendev		09:25
*** lucasagomes has quit IRC		09:37
*** lucasagomes has joined #opendev		09:43
opendevreview	Merged opendev/irc-meetings master: Move Blazar meeting to #openstack-blazar https://review.opendev.org/c/opendev/irc-meetings/+/794740	09:48
*** lucasagomes has quit IRC		09:54
*** lucasagomes has joined #opendev		09:59
*** hashar_ has joined #opendev		10:41
*** hashar is now known as Guest844		10:41
*** hashar_ is now known as hashar		10:41
*** Guest844 has quit IRC		10:47
*** lucasagomes has quit IRC		10:53
*** hashar has quit IRC		10:53
*** hashar has joined #opendev		10:54
*** amoralej is now known as amoralej|afk		10:55
*** lucasagomes has joined #opendev		10:58
*** jpena is now known as jpena|lunch		11:32
*** osmanlicilegi has quit IRC		11:38
*** cgoncalves has quit IRC		11:59
*** osmanlicilegi has joined #opendev		12:00
*** cgoncalves has joined #opendev		12:01
*** osmanlicilegi has quit IRC		12:11
*** jpena|lunch is now known as jpena		12:20
*** ysandeep|ruck is now known as ysandeep|mtg		12:33
*** osmanlicilegi has joined #opendev		12:35
*** lucasagomes has quit IRC		12:38
*** kapoios_allos has joined #opendev		12:41
*** kapoios_allos has quit IRC		12:42
*** osmanlicilegi has quit IRC		12:46
*** bhagyashris_ has joined #opendev		12:50
*** osmanlicilegi has joined #opendev		12:55
*** lucasagomes has joined #opendev		12:56
*** bhagyashris has quit IRC		12:56
*** bhagyashris_ is now known as bhagyashris		12:58
*** amoralej|afk is now known as amoralej		13:00
*** fultonj has joined #opendev		13:00
*** cgoncalves has quit IRC		13:14
*** cgoncalves has joined #opendev		13:15
*** hashar has quit IRC		13:24
*** fultonj has quit IRC		13:26
corvus	infra-root: i'd like to restart zuul to pick up some recent changes	13:30
corvus	starting that now	13:33
*** rpittau is now known as rpittau|afk		13:37
corvus	as expected, zk data size is growing significantly (we're caching config data there now)	13:39
corvus	looks like node count increased from 12k -> 35k, and data size increased from 10mib -> 20mib	13:42
corvus	re-enqueing	13:42
corvus	it didn't seem like startup took any more or less time, which is good.	13:43
corvus	jobs are running	13:43
*** ysandeep|mtg is now known as ysandeep		13:44
*** ysandeep is now known as ysandeep|ruck		13:44
corvus	#status log restarted zuul at commit 85e69c8eb04b2e059e4deaa4805978f6c0665c03 which caches unparsed config in zk. observed expected increase in zk usage after restart: 3x zk node count and 2x zk data size	13:47
opendevstatus	corvus: finished logging	13:47
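As a quick check of the "3x node count and 2x data size" summary in the status log, the growth factors can be worked out from the figures corvus quoted (12k -> 35k znodes, 10 MiB -> 20 MiB). The helper name `growth_factor` is just for illustration.

```python
def growth_factor(before: float, after: float) -> float:
    """Ratio of the value after the restart to the value before it."""
    return after / before


# Figures quoted in the channel: ZooKeeper node count and data size (MiB).
node_factor = growth_factor(12_000, 35_000)
size_factor = growth_factor(10, 20)

print(f"znode count: {node_factor:.1f}x, data size: {size_factor:.1f}x")
# prints "znode count: 2.9x, data size: 2.0x"
```

So the node count grew a little under 3x, which the status log rounds up.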
corvus	looks like the final numbers may be a bit bigger as we're still adding in the operational baseline from before now that jobs are starting	13:48
corvus	so far none of the performance metrics look different	13:49
*** lowercase has joined #opendev		13:56
fungi	seems good so far, yeah	14:06
*** lucasagomes has quit IRC		14:18
*** lucasagomes has joined #opendev		14:21
*** lucasagomes has quit IRC		14:34
*** lucasagomes has joined #opendev		14:35
*** gmann is now known as gmann_afk		14:40
*** lucasagomes has quit IRC		14:46
*** dklyle has joined #opendev		14:46
*** lucasagomes has joined #opendev		14:50
corvus	everything still looks nominal	14:53
*** open10k8s has quit IRC		15:08
clarkb	one thing I wonder about is how the zookeeper memory use changes as the data set increases. Currently those servers are relatively small, but also the existing resource utilization is low so lots of headroom to grow into	15:09
*** open10k8s has joined #opendev		15:10
*** open10k8s has quit IRC		15:10
fungi	yeah, we'll probably know more as we get well into monday or tuesday	15:12
*** gmann_afk is now known as gmann		15:13
*** engine_ has joined #opendev		15:22
*** engine_ has quit IRC		15:25
*** ykarel is now known as ykarel|away		15:27
corvus	there's like zero change in memory usage on them despite the 2x change in data size	15:33
corvus	a minuscule amount of additional cpu	15:33
corvus	(like, it's currently at 95% idle)	15:34
corvus	but that could actually just be due to a change in connection distribution	15:35
*** lucasagomes has quit IRC		15:45
*** lucasagomes has joined #opendev		15:48
*** ysandeep|ruck is now known as ysandeep|away		15:49
opendevreview	Merged opendev/base-jobs master: Set a fallback VERSION_ID in the mirror-info role https://review.opendev.org/c/opendev/base-jobs/+/791177	15:53
fungi	jrosser: ^ probably worth rechecking your bullseye addition change now	16:00
fungi	in theory it should work without your workaround at this point	16:01
*** marios is now known as marios|out		16:04
*** lucasagomes has quit IRC		16:04
opendevreview	Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify https://review.opendev.org/c/zuul/zuul-jobs/+/739047	16:10
*** ykarel|away has quit IRC		16:18
*** marios|out has quit IRC		16:18
*** ysandeep|away has quit IRC		16:33
*** jpena is now known as jpena|off		16:38
*** ralonsoh has quit IRC		16:47
clarkb	as a reminder I was planning on dropping out of freenode channels entirely next week. Any reason to not do that given the way the transition has gone?	16:53
clarkb	fungi: should we maybe roll the dice on topic updates first?	16:53
fungi	i'm happy to stick around in there for months watching for stragglers, but sure maybe we update the topic for #opendev (as an example) to "Former channel for the OpenDev Collaboratory, see http://lists.opendev.org/pipermail/service-discuss/2021-May/000249.html"	16:59
*** amoralej is now known as amoralej|off		17:00
clarkb	++ I like that. Clearly points to docs people need without tripping any of the known keywords that are a problem	17:00
mordred	++	17:04
mordred	I have also dropped out of freenode fwiw	17:05
*** amoralej|off has quit IRC		17:42
*** andrewbonney has quit IRC		17:46
mnaser	i am also planning to continue to be there to send people over	18:51
mnaser	but it is getting increasingly quiet :)	18:51
*** lowercase has quit IRC		18:51
*** tinwood has quit IRC		18:55
*** amoralej|off has joined #opendev		18:56
*** tinwood has joined #opendev		18:58
fungi	there have only been a few people speak up in the channels i'm watching who didn't seem to know, the rest are either helpful lurkers like us or (i think the majority) zombie bouncer processes nobody's connected to in months/years	19:07
mordred	fungi: yeah - I did an active disconnect - otherwise I figured I'd become a new zombie bouncer :)	19:08
mordred	felt a little weird	19:08
*** slittle1 has joined #opendev		19:41
slittle1	Please set me (Scott Little) up as first core for new repo starlingx-audit-armada-app. I'll add the rest of the cores	19:42
*** amoralej|off has quit IRC		19:45
fungi	looks like that repo was created on may 8 when https://review.opendev.org/790250 merged and added a starlingx-audit-armada-app-core group	19:46
fungi	and yeah, it has no members yet	19:47
fungi	slittle1: done!	19:47
slittle1	thanks	20:03
fungi	any time!	20:05
noonedeadpunk	fungi: any known activity to centos8 repos? Like dropping them?	20:08
noonedeadpunk	as they started filing weirdly today at ~10am utc	20:09
fungi	filing what?	20:28
fungi	noonedeadpunk: if you meant failing, a link to an example build result would be helpful	20:43
johnsom	I wonder if we have another ansible upgrade going on. Cloning is slow at the moment.	20:49
fungi	johnsom: i've checked resource utilization for all the backends and the load balancer, everything is fairly quiet... can you check the ssl cert for the backend you're hitting to see what hostnames it lists? i'll dive deeper on the one you're hitting	20:54
johnsom	It's a stack (devstack) so let me try in another window. Just odd to see a clone horizon taking minutes	20:55
fungi	and it's the cloning phase specifically, not installing, which is going slowly?	20:57
johnsom	~400 KiB/s	20:57
johnsom	What is the trick to grab the TLS info? Do I need to tcpdump it?	20:58
johnsom	Yeah, even a direct git clone is super slow.	20:58
fungi	i do `echo|openssl s_client -connect opendev.org:https|openssl x509 -text|grep CN` but there are lots of ways	20:59
johnsom	Yeah, ok, a separate s_client.	21:00
johnsom	CN = gitea01.opendev.org	21:00
fungi	thanks	21:00
fungi	you can also get away with just a simple `openssl s_client -connect opendev.org:https` and then scroll up to the beginning of the verification chain info where it mentions the CN	21:01
fungi	but you end up with a lot of output	21:01
johnsom	Yeah, I know s_client all too well. lol	21:01
fungi	heh	21:01
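For anyone who would rather not scroll through s_client output, the same check (which backend's certificate is the load balancer handing back?) can be sketched with Python's standard `ssl` module. The helper names `common_name` and `fetch_peer_cert` are made up for this example; the dictionary layout is the one `SSLSocket.getpeercert()` returns.

```python
import socket
import ssl
from typing import Optional


def common_name(cert: dict) -> Optional[str]:
    """Return the subject commonName from a getpeercert()-style dict."""
    # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Connect over TLS and return the verified peer certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


# Example use (needs network access):
#   print(common_name(fetch_peer_cert("opendev.org")))
```

Each connection may land on a different backend, so repeated runs can report different names.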
fungi	currently testing cloning nova from gitea01 with another server on the same network just to get a baseline	21:06
fungi	Receiving objects: 100% (595113/595113), 155.44 MiB | 14.64 MiB/s, done.	21:06
johnsom	This is the IP it's hitting: 2604:e100:3:0:f816:3eff:fe6b:ad62	21:07
fungi	yep, that's an haproxy load balancer	21:07
fungi	you can test cloning directly from https://gitea01.opendev.org:3081/openstack/nova to bypass it	21:07
fungi	see if you get similar speeds	21:07
johnsom	That looks faster, 2.6MiB/s	21:08
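To put the rates quoted so far in perspective, here is a rough estimate of how long a nova-sized clone (155.44 MiB, from fungi's baseline) would take at each of them. It assumes a constant rate and that the ~400 KiB/s figure, which johnsom saw on a different repository, would apply to nova as well; `clone_seconds` is a hypothetical helper.

```python
def clone_seconds(size_mib: float, rate_mib_per_s: float) -> float:
    """Transfer time in seconds at a constant rate."""
    return size_mib / rate_mib_per_s


SIZE_MIB = 155.44  # nova pack size from the baseline clone above

rates_mib_per_s = {
    "baseline, same network as gitea01": 14.64,
    "direct to backend (johnsom)": 2.6,
    "via load balancer (johnsom)": 400 / 1024,  # ~400 KiB/s
}

for label, rate in rates_mib_per_s.items():
    print(f"{label}: about {clone_seconds(SIZE_MIB, rate):.0f} s")
```

At roughly 400 KiB/s that works out to well over six minutes, which fits the "taking minutes" complaint.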
fungi	so that suggests one of two things, either the lb is slowing things down or (more likely) ipv6 performance for you is worse than ipv4 at the moment	21:09
fungi	maybe try `git clone -4 https://opendev.org/openstack/nova1 to rule out the latter	21:09
fungi	er, `git clone -4 https://opendev.org/openstack/nova`	21:09
johnsom	Yeah, I'm still getting 1gbps to Portland. Let	21:09
johnsom	me try the v4	21:10
fungi	i can confirm cloning via the load balancer's ipv6 address is very slow for me as well	21:10
fungi	even though i'm hitting a different backend entirely	21:10
johnsom	About the same for ipv4	21:11
johnsom	Well, maybe it's just Friday afternoon people streaming stuff. lol	21:11
fungi	heh	21:11
fungi	yep, i'm seeing even worse performance to the lb over ipv4 than over ipv6, yikes	21:12
fungi	network traffic graph for it seems reasonable though: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66621&rra_id=all	21:12
johnsom	Yeah, must be something upstream.	21:13
fungi	cpu is virtually idle, so it's not like it's handling an interrupt storm or anything	21:13
fungi	however, given that ipv4 to the load balancer is slow but ipv6 to the backends isn't, even though they're in the same provider, suggests there could be something else going on	21:14
fungi	er, ipv4 to both i mean	21:14
fungi	ipv4 to backends is fast (they're ipv4-only in fact)	21:15
johnsom	Well, that isn't enough traffic to wake haproxy up from an afternoon nap.	21:15
fungi	indeed	21:16
clarkb	mtr can often point out locations with problems	21:16
fungi	seems like it might be a local network issue impacting the segment the lb is on but not the segment the backends are on	21:16
clarkb	though the local router appears to be shared and the ip addresses are on the same /25	21:19
clarkb	ah and according to the network interfaces that range is part of a larger /24 segment	21:20
fungi	mtr --tcp is showing some pretty substantial packet loss to the lb	21:24
fungi	but not to the backends	21:24
fungi	anyone else observing the same?	21:24
fungi	seems to come and go in bursts	21:25
fungi	now i'm not seeing it	21:25
clarkb	I've got a couple `mtr --tcp` running now but no loss so far	21:25
clarkb	maybe sad lacp link or similar	21:26
johnsom	fungi Are you bouncing through seattle with your mtr?	21:26
fungi	one frame for you, one frame for the bit bucket, one frame for you, ...	21:26
johnsom	I'm seeing some congestion in Seattle that comes and goes	21:27
fungi	looks like my cable provider peers with zayo and then i go through atlanta to dallas to los angeles to san jose	21:27
fungi	though i wouldn't be surprised if the return route is asymmetric. lemme see if i can check the other direction	21:28
clarkb	I'm going over cogent via pdx and sfo	21:28
johnsom	Yeah, I bounce through Seattle, get on zayo straight to San Jose	21:28
fungi	but i would be surprised if my routes to and from the lb differ significantly vs those for a backend server in the same cloud	21:29
clarkb	I have not seen any loss over my path	21:30
fungi	can't even make it through one pass with mtr before it crashes on "address in use"	21:30
fungi	but a traditional traceroute shows the return path to me is actually cogent not zayo	21:31
fungi	so my connections are arriving at vexxhost via zayo but responses go over cogent (san jose straight to atlanta)	21:31
fungi	anyway, for me the routes are the same to/from gitea01.opendev.org as well, yet i can clone from it far faster	21:35
fungi	even the last hop for me is the same in both traceroutes	21:37
fungi	let's see if the first hops in the other direction line up	21:37
fungi	first hops on the return path are also the same for both servers, so maybe it's a layer 2 issue, host level even?	21:39
johnsom	Sorry to derail the end of Friday. I did get my clones finished, so I'm good to go at this point.	21:39
fungi	no, i appreciate the heads up, it's looking like we might want to let mnaser in on the fun	21:39
mnaser	hi	21:40
fungi	mnaser: are you aware of any internet network disruptions in sjc1?	21:41
fungi	er, i mean internal	21:41
mnaser	nothing that i'm aware of	21:42
mnaser	i haven't been able to digest the messages though	21:42
fungi	we're seeing very slow network performance from multiple locations communicating over tcp with (both ipv4 and ipv6) addresses for gitea-lb01.opendev.org	21:42
mnaser	ipv6 is fast but ipv4 is not, or?	21:42
fungi	both slow, v4 is actually slower for me than v6 even	21:42
fungi	however other servers on the same network, like gitea01.opendev.org are fairly snappy	21:43
fungi	resource graphs for gitea01.opendev.org all look basically idle	21:43
mnaser	oh so going direct to gitea backends is ok, but the load balancer is not?	21:43
fungi	mtr --tcp is showing me a lot of packet loss for gitea01.opendev.org as well and not for other hosts in that network	21:43
fungi	correct	21:43
fungi	er. meant to say resource graphs for gitea-lb01.opendev.org all look basically idle	21:44
fungi	wondering if there could be something happening at layer 2 but only impacting gitea-lb01.opendev.org, maybe at the host level?	21:44
fungi	traceroutes to/from both the lb and backends look identical	21:45
fungi	the server instance is showing no obvious signs of distress, not even breaking a sweat	21:46
mnaser	fungi: are you inside gitea's vm?	21:46
fungi	the haproxy lb vm is the one we're seeing weird network performance for, not the backend gitea servers	21:47
fungi	"opendev.org" (a.k.a. gitea-lb01.opendev.org)	21:47
mnaser	right, sorry, i meant gitea-lb01 :p	21:48
mnaser	my 'shortcutting' failed	21:48
mnaser	`curl -s http://169.254.169.254/openstack/latest/meta_data.json | python3 -mjson.tool | grep uuid`	21:48
fungi	mnaser: curl can't seem to reach that url from the server, but server show reports the instance uuid is e65dc9f4-b1d4-4e18-bf26-13af30dc3dd6	21:50
fungi	for the record, the curl response is "curl: (7) Failed to connect to 169.254.169.254 port 80: No route to host"	21:51
*** gmann is now known as gmann_afk		21:51
fungi	so we're probably missing a static route for that	21:51
clarkb	you can get the instance uuid from the api `openstack server show gitea-lb01.opendev.org`	21:52
fungi	yeah, that's where i got the one i pasted above	21:52
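When the metadata service is reachable, the curl pipeline mnaser suggested boils down to reading the `uuid` key from `meta_data.json`. A minimal Python sketch of the same lookup follows; the function names are hypothetical, and the sample document in the comment is trimmed to the one field of interest.

```python
import json
import urllib.request
from typing import Optional

METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def instance_uuid(meta_data_json: str) -> Optional[str]:
    """Extract the instance UUID from a meta_data.json document."""
    return json.loads(meta_data_json).get("uuid")


def fetch_meta_data(url: str = METADATA_URL, timeout: float = 5.0) -> str:
    """Fetch the metadata document; only works from inside the instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example use from inside a VM with a route to the metadata service:
#   print(instance_uuid(fetch_meta_data()))
# Offline, with a trimmed sample document:
#   instance_uuid('{"uuid": "e65dc9f4-b1d4-4e18-bf26-13af30dc3dd6"}')
```

On gitea-lb01 the fetch would fail with the same "No route to host" error, so `openstack server show` remains the way to get the UUID there.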
* mnaser looks		21:53
mnaser	hr	21:56
fungi	performance seems to at times rise as high as 1.5MiB/s and then fall as low as 400KiB/s according to git clone... same sort of cadence i see mtr --tcp report packet loss coming and going for it	21:57
mnaser	im seeing peaks of like	21:57
mnaser	400-500Mbps on the public interface	21:57
mnaser	but i guess that's because it's using the same interface for in/out traffic	21:57
fungi	that doesn't match at all what we're seeing with snmp polls though: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66621&rra_id=all	21:58
fungi	but we're aggregating at 5-minute samples, so maybe it's far more bursty	21:58
fungi	as far as the traffic graphs we have are concerned though, our network utilization on that interface is basically what we always have there, but the network performance we're seeing is not typical	21:59
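fungi's point about 5-minute samples can be illustrated with made-up numbers: a short burst near the peaks mnaser reports mostly disappears once it is averaged over a 300-second polling window. The 20 Mbps baseline and the 10-second burst length are assumptions chosen for the example, not measurements from the load balancer.

```python
def average_mbps(samples: list) -> float:
    """Mean of per-second throughput samples, in Mbps."""
    return sum(samples) / len(samples)


# Hypothetical 5-minute window of one-second samples:
# 290 s at a 20 Mbps baseline plus a 10 s burst at 450 Mbps.
window = [20.0] * 290 + [450.0] * 10

print(f"peak: {max(window):.0f} Mbps, "
      f"5-minute average: {average_mbps(window):.1f} Mbps")
# prints "peak: 450 Mbps, 5-minute average: 34.3 Mbps"
```

A graph built from such averages would look unremarkable even while the interface briefly runs far hotter.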
mnaser	fungi: would it be possible to setup iftop on the lb node and see if you see anything odd	22:01
*** tosky has quit IRC		22:01
fungi	installed, checking the manpage for it now	22:01
clarkb	fwiw I don't see the same issues that fungi sees	22:02
fungi	ahh, reminds me of nftop and pftop on *bsd	22:02
fungi	clarkb: what speeds do you get cloning nova?	22:02
mnaser	fungi: p. much, nice little real time thing to see :)	22:02
clarkb	checking now	22:02
clarkb	just under 2MiB/s	22:03
fungi	and it's steady?	22:03
clarkb	which is typical for me iirc	22:03
fungi	what about to one of the backends?	22:03
clarkb	yup bounces between about 1.75 to 2.00 MiB/s but seems steady	22:03
clarkb	will try a backend when this completes	22:04
clarkb	155.55 MiB | 1.74 MiB/s, done. <- was aggregate	22:04
fungi	mnaser: the averaged rates are lower than i would expect but i do see the 2sec average occasionally around 150Mbps	22:06
clarkb	155.46 MiB | 1.70 MiB/s, done. <- to gitea01 all my data is via ipv4 as I don't have v6 here	22:07
fungi	i just was 2sec average go a hair over 200Mbps	22:07
fungi	er, just saw	22:07
clarkb	that is pretty consistent with what i recall getting via gitea in the past	22:07
fungi	actually now i'm getting fairly poor performance directly to gitea04 so it's possible there is a backend issue	22:11
fungi	mnaser: yeah this may not be as cut and dried as it seemed at first, and if clarkb's not seeing performance issues then it could be just impacting me and johnsom not everyone	22:13
clarkb	04 has plenty of memory available and cpu isn't spinning	22:13
fungi	i'm going to do some clone tests to the other backends as well for comparison	22:13
fungi	my clone from the 04 backend averaged 643.00 KiB/s	22:15
fungi	i'm getting much the same from the 01 backend now... i was seeing far better performance before. may need to test this from somewhere out on the 'net which doesn't share an uplink with lots of tourists watching netflix on a rainy friday evening	22:22
fungi	okay, so if there was a more general problem i'm not able to reproduce it now	22:26
fungi	getting 7.7 MiB/s from poland cloning via ipv6 at both the lb and directly from a backend	22:27
fungi	(in the ovh warsaw pop)	22:27
fungi	think i'm going to blame tourists and call it an afternoon, just as soon as i finish ironing out this nagging negative lookahead regex	22:33
*** whoami-rajat has quit IRC		23:40
