fungi | python 3.10.1 yesterday, 3.11.0a3 today | 00:00 |
fungi | seems like i'm always compiling a new python | 00:00 |
clarkb | Related to new releases this is a fun one. Gerrit's 3.5.0.1 release broke a bunch of plugins because they pulled out elasticsearch support (since it's no longer open source) and the elasticsearch support was pulling in a dep that a number of plugins relied on, which isn't there anymore | 00:01 |
fungi | oh, yeah, transitive deps silently satisfying direct deps is a major risk. it's bitten openstack projects before as well | 00:02 |
corvus | ianw: interested in reviewing https://review.opendev.org/820954 ? it's the other half of a change you +2d | 00:05 |
corvus | re keycloak | 00:05 |
ianw | lgtm | 00:06 |
ianw | sorry i think i meant to +2 that when i looked at the other bit | 00:06 |
corvus | \o/ thx | 00:13 |
opendevreview | Ade Lee proposed zuul/zuul-jobs master: DNM enable_fips role for zuul jobs https://review.opendev.org/c/zuul/zuul-jobs/+/807031 | 00:29 |
opendevreview | Merged opendev/system-config master: Add keycloak auth config to Zuul https://review.opendev.org/c/opendev/system-config/+/820954 | 00:51 |
fungi | yay! https://zuul.opendev.org/t/openstack/build/20fccb043c35459194b1094b28586055/log/lists.openstack.org/exim4/mainlog#47 | 01:13 |
fungi | mailman tried to notify me, exim got the notification and attempted delivery, then got its outbound smtp socket reset | 01:14 |
clarkb | successful failure | 01:14 |
clarkb | the best kind of failure | 01:14 |
fungi | i'll integrate the firewall fix, though the question remains whether we should start the mailman services in testinfra | 01:15 |
clarkb | if it isn't necessary to test this properly I don't know that we need to. Though not starting them probably covered up that python path issue | 01:15 |
fungi | yes | 01:16 |
clarkb | it shouldn't hurt to start them if we've blocked smtp outbound. People can send mail in if they really like and it won't go anywhere | 01:16 |
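The outbound block being referred to is, in rough shape, an egress rule like the one below. This is only an illustrative sketch; the actual change (820900) applies the equivalent through the deployment's firewall configuration for test nodes rather than an ad hoc command.

    # rough sketch of rejecting outbound SMTP from a test node; not the literal rule 820900 adds
    iptables -A OUTPUT -p tcp --dport 25 -j REJECT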
fungi | well, sort of covered it up, actually the initscript does an exit 0 when python isn't found so systemd wouldn't have known the difference | 01:16 |
clarkb | ah | 01:17 |
clarkb | another successful failure :) | 01:17 |
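The init-script behaviour fungi describes is the common Debian guard pattern, roughly the following (a sketch of the pattern, not a verbatim copy of the mailman script):

    # if the interpreter is missing, exit 0, so systemd still reports success
    PYTHON=/usr/bin/python
    [ -x "$PYTHON" ] || exit 0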
opendevreview | Jeremy Stanley proposed opendev/system-config master: Block outbound SMTP connections from test jobs https://review.opendev.org/c/opendev/system-config/+/820900 | 02:05 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Copy Exim logs in system-config-run jobs https://review.opendev.org/c/opendev/system-config/+/820899 | 02:05 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Collect mailman logs in deployment testing https://review.opendev.org/c/opendev/system-config/+/821112 | 02:05 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Make sure /usr/bin/python is present for mailman https://review.opendev.org/c/opendev/system-config/+/821095 | 02:05 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 02:05 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 02:05 |
*** rlandy|ruck|bbl is now known as rlandy|ruck | 02:19 | |
*** rlandy|ruck is now known as rlandy|out | 02:23 | |
ianw | i'm finding it quite hard to get the zuul-client docker image to generate a secret | 02:43 |
ianw | --infile doesn't help | 02:44 |
ianw | so far i haven't figured out how to pipe input into it either | 02:47 |
ianw | ok, running with "-i", but not "-t", makes "cat file | docker run ... zuul-client encrypt ..." work | 02:49 |
*** bhagyashris_ is now known as bhagyashris | 03:02 | |
Clark[m] | ianw fwiw I think there is a python script in the tools dir of zuul to do it as well | 03:07 |
Clark[m] | You don't need auth for it as it grabs a pubkey to do the encryption | 03:07 |
ianw | yeah, that is now giving a deprecation warning | 03:07 |
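For reference, a sketch of the invocation ianw landed on: run the container with -i but without -t so the piped plaintext reaches the encrypt command. The image name, tenant, and project values here are illustrative assumptions, not taken from the log.

    # assumes the zuul/zuul-client image and example tenant/project values
    cat secret-plaintext | docker run --rm -i zuul/zuul-client \
        --zuul-url https://zuul.opendev.org encrypt \
        --tenant openstack --project opendev/system-config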
*** pojadhav|out is now known as pojadhav|rover | 03:18 | |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 03:50 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 03:50 |
fungi | okay, i think topic:mailman-lists is ready to go, finally | 05:00 |
opendevreview | Ian Wienand proposed opendev/system-config master: infra-prod: write a secret to the bastion host https://review.opendev.org/c/opendev/system-config/+/821155 | 05:25 |
*** marios is now known as marios|ruck | 06:12 | |
*** gibi_ is now known as gibi | 07:52 | |
*** ysandeep is now known as ysandeep|lunch | 08:08 | |
*** ysandeep|lunch is now known as ysandeep | 08:38 | |
opendevreview | Merged openstack/project-config master: Add NVidia vGPU plugin charm to OpenStack charms https://review.opendev.org/c/openstack/project-config/+/819818 | 09:02 |
*** pojadhav|rover is now known as pojadhav|lunch | 09:07 | |
*** pojadhav|lunch is now known as pojadhav|rover | 10:03 | |
*** ysandeep is now known as ysandeep|afk | 10:21 | |
*** redrobot6 is now known as redrobot | 10:23 | |
*** jpena|off is now known as jpena | 10:35 | |
*** ysandeep|afk is now known as ysandeep | 10:56 | |
*** rlandy|out is now known as rlandy|ruck | 11:10 | |
*** pojadhav|rover is now known as pojadhav|rover|brb | 11:42 | |
*** pojadhav|rover|brb is now known as pojadhav|rover | 11:51 | |
*** pojadhav|rover is now known as pojadhav|rover|brb | 12:02 | |
*** pojadhav|rover|brb is now known as pojadhav|rover | 12:22 | |
*** ykarel is now known as ykarel|away | 13:21 | |
*** pojadhav|rover is now known as pojadhav|rover|brb | 14:18 | |
*** pojadhav|rover|brb is now known as pojadhav|rover | 15:04 | |
slittle1_ | having intermittent issues with 'git review -s' | 15:37 |
slittle1_ | trying to run a script that sets up the gerrit remote on all starlingx repos | 15:38 |
slittle1_ | seems like every second or third try hangs | 15:39 |
slittle1_ | I'm working around it with a 'timeout' and a retry | 15:39 |
slittle1_ | cat .gitreview | 15:41 |
slittle1_ | [gerrit] | 15:41 |
slittle1_ | host=review.opendev.org | 15:41 |
slittle1_ | port=29418 | 15:41 |
slittle1_ | project=starlingx/distcloud-client.git | 15:41 |
slittle1_ | defaultbranch=master | 15:41 |
slittle1_ | as an example | 15:41 |
corvus | i'm going to restart zuul-web with the new auth config; expect a several-minute outage (of web only; schedulers will continue) | 15:42 |
*** ysandeep is now known as ysandeep|out | 15:45 | |
fungi | slittle1_: going over ipv4 or ipv6? sounds like there could be some intermittent network problems... are you seeing the same behavior from multiple locations? | 15:47 |
slittle1_ | ipv4 | 15:55 |
slittle1_ | single location | 15:56 |
slittle1_ | don't have the means to test from multiple locations at the moment | 15:56 |
slittle1_ | Problem running 'git remote update gerrit' | 15:57 |
slittle1_ | Fetching gerrit | 15:57 |
slittle1_ | ssh_exchange_identification: read: Connection reset by peer | 15:57 |
slittle1_ | fatal: Could not read from remote repository. | 15:57 |
slittle1_ | Please make sure you have the correct access rights | 15:57 |
slittle1_ | and the repository exists. | 15:57 |
slittle1_ | error: Could not fetch gerrit | 15:57 |
fungi | slittle1_: i'll see if i can reproduce from other places on the internet | 15:59 |
fungi | running `git remote update gerrit` in starlingx/distcloud-client in a loop isn't producing errors from my house but i'll try from some virtual machines in various cloud providers as well | 16:01 |
Clark[m] | We limit connections per account. If this is happening concurrently or quickly enough that tcp hasn't closed completely, that may be the cause | 16:01 |
Clark[m] | We also limit by IP, so if you go through NAT you can hit a similar problem | 16:02 |
slittle1_ | ok, so I should try adding a delay between requests? What delay do you recommend ? | 16:02 |
Clark[m] | Well I'm suggesting this could be related but I don't know enough about your situation to be confident it is the cause. | 16:03 |
slittle1_ | How is the connect limit enforced? how many connects over what time period ? | 16:04 |
Clark[m] | There are two methods. The first is by iptables limiting to 100 connections per source IP. The other is Gerrit limiting to 96 per Gerrit account iirc | 16:06 |
Clark[m] | If it were me I'd git review -s on demand and not try to do them in bulk | 16:06 |
opendevreview | Merged openstack/project-config master: Allow Zuul API access from keycloak server https://review.opendev.org/c/openstack/project-config/+/820956 | 16:08 |
slittle1_ | The 'git review -s' requests are serial, not parallel. | 16:09 |
fungi | yeah, unlikely to be either of the concurrent connection count limits in that case | 16:09 |
fungi | (the limit of 96 concurrent ssh connections per account is enforced by the gerrit service, the limit of 100 concurrent ssh connections per source ip address is enforced by iptables/conntrack on the server, for future reference) | 16:10 |
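For illustration, a per-source-IP cap like the one fungi describes is typically expressed with an iptables connlimit rule of roughly this shape; the production rule may differ in chain, mask, and reject behaviour.

    # sketch only: cap concurrent connections to the Gerrit SSH port per source address
    iptables -A INPUT -p tcp --syn --dport 29418 \
        -m connlimit --connlimit-above 100 --connlimit-mask 32 -j REJECT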
Clark[m] | The Gerrit ssh log may have hints. But I'm finishing a school run | 16:11 |
Clark[m] | Similarly trying to reproduce with only ssh client on the client side with -vvv may be helpful | 16:12 |
fungi | i can't reproduce the same problem running `git remote update gerrit` in a tight loop from various places on the internet so far | 16:13 |
slittle1_ | any other anti spam/DOS measure I might be getting caught in? | 16:15 |
slittle1_ | I'd estimate ~200 of those requests over 2-3 minutes | 16:16 |
clarkb | slittle1_: that may be enough that tcp isn't fully closing | 16:16 |
clarkb | and you're hitting the tcp limit | 16:16 |
fungi | i doubt it's the conntrack overflow, since it's set to send icmp-port-unreachable not tcp reset (git's claiming to see the latter) | 16:18 |
*** pojadhav is now known as pojadhav|rover | 16:19 | |
clarkb | slittle1_: do you know approximately what time the last error occurred? I can look at the gerrit sshd log | 16:21 |
slittle1_ | within the last 5 min | 16:23 |
clarkb | ok the sshd log doesn't seem to show any errors in that timeframe, implying it is probably something before gerrit is involved | 16:25 |
clarkb | perhaps a firewall on your end or some sort of asymmetric route causing routers/firewalls to get angry | 16:26 |
slittle1_ | I'll try again now | 16:27 |
clarkb | I've approved https://review.opendev.org/c/opendev/system-config/+/818606 as I indicated I would yesterday (this is the lodgeit user update) | 16:28 |
clarkb | if it has a sad I can manually revert on the host, then push a revert if the fix isn't straightforward | 16:28 |
noonedeadpunk | was that discussion about connection issues to opendev infrastructure?:) | 16:29 |
clarkb | noonedeadpunk: specifically to review.opendev.org over port 29418 with ipv4, yes | 16:29 |
noonedeadpunk | well just for me right now git clone https://opendev.org/openstack/requirements /tmp/req ends with `GnuTLS recv error (-9): Error decoding the received TLS packet.` | 16:30 |
slittle1_ | got a bit further | 16:31 |
clarkb | noonedeadpunk: that is a different system hosted in another part of the world. I doubt they are related, but I suppose it is possible | 16:31 |
slittle1_ | ssh://slittle1@review.opendev.org:29418/starlingx/portieris-armada-app.git did not work. Description: ssh_exchange_identification: read: Connection reset by peer | 16:31 |
slittle1_ | fatal: Could not read from remote repository. | 16:31 |
slittle1_ | Please make sure you have the correct access rights | 16:31 |
slittle1_ | and the repository exists. | 16:31 |
slittle1_ | Could not connect to gerrit. | 16:31 |
slittle1_ | Enter your gerrit username: | 16:31 |
noonedeadpunk | curl actually works, but you know - it's quite a different proto being used | 16:31 |
noonedeadpunk | clarkb: do we actually have some rate limiting there? | 16:31 |
noonedeadpunk | As I was cloning quite a lot of repos at a time.... | 16:32 |
clarkb | noonedeadpunk: we have "if you overload the system you'll break it and cause a fail over to another backend" rate limiting :) | 16:32 |
clarkb | noonedeadpunk: were you running OSA updates in a datacenter? we know that causes it to happen and had to ask osa to not ddos us | 16:32 |
noonedeadpunk | mmm, I see ) | 16:32 |
clarkb | unfortunately git clones are not cheap and need significant amounts of memory. Eventually we run out. | 16:33 |
clarkb | slittle1_: looks like the same error but in a different part of the process? | 16:33 |
noonedeadpunk | While I'm aware about osa issue and we got exact reason why it's happening, and I really do some osa related stuff, it's not related :) | 16:33 |
clarkb | slittle1_: the specific repo there gives me something new to look at in the logs | 16:33 |
noonedeadpunk | I was retrieving HEAD SHAs for openstack services so that shouldn't cause too much load | 16:34 |
clarkb | noonedeadpunk: it's actually the same | 16:35 |
clarkb | git has to load all the data into memory for most operations aiui | 16:35 |
clarkb | the resulting IO can differ but the IO and cpu impact to initiate operations doesn't differ by much | 16:35 |
slittle1_ | clarkb: it's just iterating through our starlingx git repos. It got a bit further this time. | 16:36 |
noonedeadpunk | um, so the issue when osa was ddosing was when it did quite the same but from each compute in the deployment | 16:36 |
noonedeadpunk | ah | 16:36 |
clarkb | er the memory, io and cpu to initiate don't differ much. The delta is the io afterwards | 16:36 |
noonedeadpunk | I see | 16:36 |
clarkb | slittle1_: right this is why I suggested doing it on demand earlier. Fwiw I don't see that request at all here | 16:36 |
noonedeadpunk | but well... we need to update versions and do releases... I'm not sure I know another way to grab the top of stable/xena for example and make it persistent over time | 16:37 |
noonedeadpunk | we can do this slower though.... | 16:37 |
clarkb | noonedeadpunk: well it should be fine if you do them sequentially | 16:37 |
noonedeadpunk | yep, I did one by one | 16:38 |
slittle1_ | ultimately the goal is to create a branch on each repo, and to modify the defaultbranch of the .gitreview files in each repo | 16:38 |
noonedeadpunk | and then the process just got stuck and it's been like 15 minutes already that I can't clone :( | 16:38 |
clarkb | noonedeadpunk: well are you cloning or checking the HEAD? | 16:38 |
noonedeadpunk | so was wondering if there's some automated thing like fail2ban or dunno | 16:38 |
clarkb | because I mentioned cloning and you said you weren't doing that. And no there is no fail2ban, but we have to load balance by source IP (because git), and if you overload your backend this can happen | 16:39 |
clarkb | unfortunately I'm also trying to debug a separate connectivity issue to a separate service in another datacenter in a different country so juggling isn't easy | 16:39 |
noonedeadpunk | clarkb: what exactly the script was doing - `git ls-remote <repo> stable/xena` | 16:41 |
noonedeadpunk | ok, sorry, grab that | 16:41 |
noonedeadpunk | this can wait | 16:41 |
clarkb | noonedeadpunk: ok are you cloning then? | 16:41 |
clarkb | none of the backends indicate memory or system load pressure so likely not that | 16:41 |
noonedeadpunk | as for me - the git connection just hangs whatever I do | 16:42 |
noonedeadpunk | oh, well, no | 16:42 |
noonedeadpunk | git ls-remote just worked | 16:42 |
noonedeadpunk | clone not | 16:43 |
clarkb | noonedeadpunk: can you see which backend you are talking to by inspecting the ssl cert (we put the name of the backend in there too) | 16:44 |
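One way to check that from the client side, since the served certificate names the backend:

    # prints the subject of the certificate actually served, which identifies the gitea backend
    echo | openssl s_client -connect opendev.org:443 -servername opendev.org 2>/dev/null \
        | openssl x509 -noout -subject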
noonedeadpunk | I will probably just try to reboot.... | 16:44 |
clarkb | slittle1_: best I can tell based on lack of info in the logs on our end this is likely to be happening somewhere between you and us. Are you able to try ssh -vvv -p 29418 slittle1@review.opendev.org gerrit ls-projects and see if you can reproduce. Then maybe that gives us a bit more info | 16:45 |
noonedeadpunk | CN=gitea01.opendev.org | 16:45 |
noonedeadpunk | but things just went back to normal | 16:46 |
noonedeadpunk | so I guess I had some stuck connection that wasn't closed properly... | 16:46 |
noonedeadpunk | as I saw like 15% packet loss close to loadbalancer | 16:47 |
clarkb | is it possible that vexxhost is having a widespread ipv4 routing problem? | 16:47 |
clarkb | (thats just a long shot given what slittle1 observes in another datacenter but both are in vexxhost) | 16:47 |
clarkb | to confirm gitea01 seems healthy. The gitea processes have been running for a couple days. Current free memory is good and there are no recent OOMKiller events | 16:48 |
clarkb | slittle1_: are your connections running in parallel? I see 13 connections from your source currently 11 of which are established | 16:50 |
clarkb | that is still only 13% of our limit though so shouldn't be in danger of that. Mostly just curious | 16:51 |
*** pojadhav|rover is now known as pojadhav|out | 16:51 | |
noonedeadpunk | well, connection to vexxhost never was reliable for me at least because of zayo being in the middle.... But packet loss was somewhere on the core router... | 16:51 |
clarkb | noonedeadpunk: also if it wasn't clear running an ls-remote sequentially the way you are doing is the correct method I think. I would expect that to work | 16:52 |
clarkb | noonedeadpunk: doing 200 at the same time might not :) | 16:52 |
noonedeadpunk | it always worked at least before | 16:53 |
noonedeadpunk | and that was exactly problem with osa upgrades | 16:53 |
clarkb | slittle1_: now down to 6. So ya I don't think we're hitting that 100 limit unless it happens very quickly and everything backs off | 16:54 |
noonedeadpunk | we were too tolerant of failovers if things are broken on the deployer side (or they execute the upgrade in the wrong order) | 16:54 |
fungi | noonedeadpunk: ooh, so the cause of osa upgrades overwhelming us was finally identified? that's great news | 16:55 |
noonedeadpunk | but to get this fixed ppl would need to pull in fresh code... | 16:56 |
noonedeadpunk | or follow docs while upgrading | 16:56 |
jrosser | I think that of the people who were causing this we reached out to them all and no-one was able to help reproduce it | 16:56 |
noonedeadpunk | both are kind of unlikely in short term | 16:56 |
jrosser | I would be in favour of adding an assert: to the code to make it just fail when this happens | 16:57 |
jrosser | though it technically is a valid configuration to use no local caching at all | 16:57 |
*** jpena is now known as jpena|off | 16:57 | |
opendevreview | Merged opendev/system-config master: Switch lodgeit to run under a dedicated user https://review.opendev.org/c/opendev/system-config/+/818606 | 16:58 |
jrosser | anyway - what noonedeadpunk is doing is trying to run a script to retrieve the SHA of stable/xena for all the OSA repos, nothing to do with a deployment | 16:59 |
jrosser | it's needed for our release process | 16:59 |
clarkb | yup, and from what I see things are fine on our side. noonedeadpunk indicated packetloss though | 16:59 |
*** marios|ruck is now known as marios|out | 16:59 | |
clarkb | I'm beginning to suspect there may be some "the Internet is having a fit near vexxhost right now" issues | 16:59 |
slittle1_ | I suspect the extra connections relate to my use of 'timeout' to kill hung sessions. | 17:00 |
clarkb | but those are always difficult to debug if you aren't on a client end with the problem and the server side doesn't see the issue because packets don't reach it | 17:00 |
slittle1_ | The hung sessions are probably cases where ssh key exchange failed and it's prompting for user/pass | 17:01 |
slittle1_ | the script doesn't know how to respond to that, and I don't see it as the prompt is routed to /dev/null. It's in a sub function, and the only thing I want coming out of stdout is the string I'm expecting to parse. I'll route stdout to stderr if I need another run | 17:04 |
slittle1_ | Ha, it finally passed | 17:05 |
slittle1_ | I'm afraid the 'git reviews' will hit the same issue | 17:06 |
clarkb | slittle1_: it probably will | 17:06 |
clarkb | noonedeadpunk: fwiw I was just able to clone requirements at least 10 times (it ran in a while loop and I didn't count the exact number) via opendev.org to gitea01 (I am balanced to the same backend) over ipv4 successfully. | 17:09 |
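A throwaway reproduction loop of the sort described might look like this (the clone target and iteration count are arbitrary choices, not taken from the log):

    i=0; while [ $i -lt 10 ]; do
        git clone https://opendev.org/openstack/requirements /tmp/req-test || break
        rm -rf /tmp/req-test
        i=$((i + 1))
    done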
clarkb | fungi: ^ any better ideas on debugging slittle1_'s problem if the connections don't seem to show up in our logs. I suspect something external to us | 17:10 |
clarkb | I guess run an mtr from slittle1_'s IP to review.opendev.org and see if there is packet loss. But it might be port specific etc | 17:10 |
fungi | git review may prompt for account details if an ssh connection attempt fails at the wrong places in the script | 17:12 |
fungi | it's likely just another manifestation of a connection issue | 17:12 |
clarkb | fungi: ya I'm wondering if it is general internet unhappiness, maybe an asymmetric route? Or slittle1_'s local firewall limiting connections to a single endpoint or a firewall cluster not allowing port 29418 out on a specific node etc | 17:13 |
clarkb | something like that would explain why we never see the issue in our logs | 17:13 |
noonedeadpunk | clarkb: it works nicely now as well | 17:13 |
fungi | if it can be minimally reproduced with specific ssh commands, then we may be able to narrow it down with added verbosity to something like problematic route distribution, a pmtud blackhole, et cetera | 17:14 |
clarkb | infra-root: http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026250.html that is probably a meeting we should try and attend. I'll mark it on my todo list but calling it out if others want to attend | 17:14 |
fungi | depending on at what point the connection breaks | 17:14 |
clarkb | fungi: slittle1_: ya so something like ssh -vvv -p 29418 slittle1@review.opendev.org gerrit ls-projects | 17:14 |
clarkb | and see if you can make that fail | 17:14 |
fungi | we've even seen examples of environments doing specific qos/dscp marking on ssh connections, causing them to get treated differently (in bad ways) from other tcp sessions, or particular firewalls with ssh-specific connection tracking features introducing nuanced inconsistencies | 17:16 |
clarkb | paste updated and I've been able to make this test paste just now https://paste.opendev.org/show/bE7I0dBfkoDBsGSDZYNT/ I think that is happy | 17:18 |
opendevreview | Sorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job https://review.opendev.org/c/zuul/zuul-jobs/+/821247 | 17:20 |
clarkb | fungi: can you check my comments on https://review.opendev.org/c/opendev/system-config/+/820900 ? I +2'd as nothing there seemed critical but didn't want to approve in case it was worth updating | 17:22 |
fungi | thanks, replied to them | 17:26 |
clarkb | fungi: I think I have a slight preference to aggregate by chain since each chain's rule behaviors are specific to that chain | 17:28 |
clarkb | maybe in a followup? | 17:28 |
opendevreview | Sorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job https://review.opendev.org/c/zuul/zuul-jobs/+/821247 | 17:29 |
opendevreview | Sorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job https://review.opendev.org/c/zuul/zuul-jobs/+/821247 | 17:30 |
slittle1_ | ran 'ssh -vvv -p 29418 slittle1@review.opendev.org gerrit ls-projects' ten times in rapid succession. No issues | 17:31 |
clarkb | fungi: also left a thought on https://review.opendev.org/c/opendev/system-config/+/821144 to make the test a bit more robust | 17:32 |
fungi | thanks | 17:33 |
slittle1_ | ran it in a tighter loop. failed on the 19th iteration.... | 17:35 |
opendevreview | Sorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job https://review.opendev.org/c/zuul/zuul-jobs/+/821247 | 17:35 |
slittle1_ | debug1: Connecting to review.opendev.org [199.204.45.33] port 29418. | 17:36 |
slittle1_ | debug1: Connection established. | 17:36 |
slittle1_ | debug1: identity file /folk/slittle1/.ssh/openstack type 1 | 17:36 |
slittle1_ | debug1: key_load_public: No such file or directory | 17:36 |
slittle1_ | debug1: identity file /folk/slittle1/.ssh/openstack-cert type -1 | 17:36 |
slittle1_ | debug1: Enabling compatibility mode for protocol 2.0 | 17:36 |
slittle1_ | debug1: Local version string SSH-2.0-OpenSSH_7.4 | 17:36 |
slittle1_ | ssh_exchange_identification: read: Connection reset by peer | 17:36 |
clarkb | ok that indicates it is being killed very early in the protocol establishment. It gets far enough to create the tcp connection but then almost as soon as it starts to negotiate ssh on top of that a peer resets it (which can be a router or firewall in between) | 17:37 |
clarkb | our firewall rules don't do resets | 17:37 |
clarkb | slittle1_: did you just do that in a while loop? I'll run similar locally if so just to see if I can reproduce from here | 17:41 |
slittle1_ | yes | 17:42 |
slittle1_ | i=0; while [ $i -le 100 ]; do echo $i; i=$((i + 1)); ssh -vvv -p 29418 slittle1@review.opendev.org gerrit ls-projects; if [ $? -ne 0 ]; then break; fi; done | 17:42 |
clarkb | ok I just did similar with 30 iterations and had no problems. | 17:43 |
clarkb | and reran again just to be double sure. Definitely seems like something to do with your network connectivity. Whether local or upstream of you | 17:45 |
*** weechat1 is now known as amorin | 17:54 | |
*** weechat1 is now known as amorin | 18:00 | |
clarkb | if you want to debug further the next step is probably a tcpdump to catch the reset and see where it originates from? fungi might have better ideas. That will liekly produce a large amount of data though | 18:10 |
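If anyone does go the tcpdump route, narrowing the capture to resets on the Gerrit SSH port keeps the data volume manageable, something like:

    # capture only RST segments on the gerrit ssh port; ideally run on both client and server sides
    tcpdump -ni any 'tcp port 29418 and (tcp[tcpflags] & tcp-rst) != 0'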
fungi | that *might* help narrow it down, but these days most middleboxes "spoof" tc resets on behalf of the remote address | 18:13 |
fungi | er, tcp resets | 18:13 |
fungi | so all tcpdump will probably show you is that the server sent a tcp/rst packet, and a corresponding tcpdump on the server will show no such packet emitted | 18:13 |
fungi | but it is unlikely to help in narrowing down which system between the client and server actually originated the reset | 18:14 |
fungi | i would say, the majority of the time i've seen those symptoms, it's either because of an overloaded state tracking/address translation table on a router selectively closing connections to keep under its limit, or a cascade effect failure due to running out of bridge table space on an ethernet switch somewhere | 18:16 |
fungi | the intermittency can be further stretched by flow distribution across parallel devices, where one device is struggling but only a random sample of flows are sent through it | 18:18 |
clarkb | yup tl;dr Internet | 18:19 |
fungi | getting your isp to talk to vexxhost and/or their backbone providers might help get eyes on a problem, but usually the network providers are actually aware and are sitting on degraded states awaiting a maintenance window to replace/service something | 18:20 |
fungi | i'm just glad to no longer be one of the people making those decisions ;) | 18:21 |
fungi | possibly of interest to some here, a summary of the recent pypi user feedback survey: https://pyfound.blogspot.com/2021/12/pypi-user-feedback-summary.html | 18:24 |
fungi | surveys | 18:24 |
fungi | decisions include adding paid organization accounts on pypi (free for community projects), and further requirements gathering on package namespacing | 18:27 |
clarkb | fungi: for the lists ansible stuff. Did you want to push up a followup to do the chain move or just update the existing change? I'm thinking we should probably land the iptables update change first before anything else just to be sure it doesn't impact prod (it shouldn't as only the test all group gets rules) | 18:27 |
clarkb | And then we should be able to land the set of lists specific changes in one block pretty safely | 18:28 |
fungi | yeah, i'll revise the iptables change, i'd rather not merge too many different updates to our firewall handling, as each is a separate opportunity for breakage | 18:31 |
clarkb | ++ | 18:32 |
fungi | clarkb: for the debugging, would you prefer to record the ip(6)tables-save output some other way? | 18:38 |
fungi | i stuck the print statement where i did mainly so that it would be logged in close proximity to the assertion failures, but no idea if you had a chance to check whether that seemed too verbose to you | 18:38 |
opendevreview | Sorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job https://review.opendev.org/c/zuul/zuul-jobs/+/821247 | 18:39 |
clarkb | fungi: let me go look at the test logs | 18:39 |
clarkb | fungi: oh huh it looks like pytest captures stdout and doesn't show it unless you fail? In that case I think it is fine as is | 18:42 |
clarkb | I was worried a bunch of tests would be dumping iptables rules to the console log and making that noisy but that doesn't seem to be the case. And if that check fails you want to see the rules | 18:42 |
fungi | yeah, the output format itself also isn't awesome, it's a one-line list representation of all the lines output by the save command, but it was sufficient for me to finally find the normalized for for the rule i was trying to match in my test addition | 18:45 |
fungi | er, normalized form for | 18:46 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Block outbound SMTP connections from test jobs https://review.opendev.org/c/opendev/system-config/+/820900 | 18:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Copy Exim logs in system-config-run jobs https://review.opendev.org/c/opendev/system-config/+/820899 | 18:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Collect mailman logs in deployment testing https://review.opendev.org/c/opendev/system-config/+/821112 | 18:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Make sure /usr/bin/python is present for mailman https://review.opendev.org/c/opendev/system-config/+/821095 | 18:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Restart mailman services when testing https://review.opendev.org/c/opendev/system-config/+/821144 | 18:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use newlist's automate option https://review.opendev.org/c/opendev/system-config/+/820397 | 18:47 |
*** sshnaidm is now known as sshnaidm|afk | 19:05 | |
clarkb | that stack lgtm now. Thanks | 19:13 |
fungi | much obliged | 19:17 |
clarkb | fungi: do you have time for https://review.opendev.org/c/opendev/gerritbot/+/818494 and parent? | 19:22 |
clarkb | I should do an audit of the buster images that need bullseye updates and we can start doing them all | 19:22 |
clarkb | I'll work on putting together this todo list as well as one for the user stuff this afternoon. Then we can work through it and know when we are done | 19:25 |
fungi | reviewed both of those, and thanks | 19:27 |
fungi | interesting run timeout on the mailman log collection change, i wonder if i've added too much to the job: https://zuul.opendev.org/t/openstack/build/d5aab74b18f348f0939f62c6bb116bb6 | 19:30 |
clarkb | or maybe the node was really slow creating lists? | 19:40 |
fungi | maybe | 19:42 |
fungi | that change is earlier in the stack than the one which alters the newlist command invocation | 19:43 |
clarkb | https://etherpad.opendev.org/p/opendev-container-maintenance starting to put the information together there | 19:56 |
clarkb | Need to take a break for lunch, but I'll try to get that etherpad as complete as possible. Then we can start pushing changes in a more organized manner to get through this. Previously it was pretty ad hoc (we've made decent progress though) | 20:22 |
slittle1_ | oops ... I think we missed something in the config of one of our starlingx repos | 21:08 |
slittle1_ | remote: error: branch refs/tags/vr/stx.6.0: | 21:08 |
slittle1_ | remote: You need 'Create Signed Tag' rights to push a signed tag. | 21:08 |
slittle1_ | remote: User: slittle1 | 21:08 |
slittle1_ | remote: Contact an administrator to fix the permissions | 21:08 |
slittle1_ | remote: Processing changes: refs: 1, done | 21:08 |
slittle1_ | To ssh://review.opendev.org:29418/starlingx/metrics-server-armada-app.git | 21:08 |
slittle1_ | ! [remote rejected] vr/stx.6.0 -> vr/stx.6.0 (prohibited by Gerrit: not permitted: create signed tag) | 21:08 |
slittle1_ | error: failed to push some refs to 'ssh://review.opendev.org:29418/starlingx/metrics-server-armada-app.git' | 21:08 |
clarkb | slittle1_: you'll need to push a change to update your acls allowing you to push the signed tags | 21:09 |
clarkb | if the acl is already there then you'll need to be added to the appropriate group | 21:09 |
clarkb | slittle1_: https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/metrics-server-armada-app.config#L11 | 21:10 |
clarkb | https://review.opendev.org/admin/groups/3086a3152fc635addcd00cd4823a1be0352fac1f,members | 21:11 |
slittle1_ | yah, should have included 'starlingx-release' | 21:16 |
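For reference, the relevant stanza in those ACL files looks roughly like this (exact group names vary per repository):

    [access "refs/tags/*"]
        createSignedTag = group starlingx-release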
opendevreview | Scott Little proposed openstack/project-config master: give starlingx-release branch and tag powers in metrics-server-armada-app https://review.opendev.org/c/openstack/project-config/+/821321 | 21:25 |
slittle1_ | https://review.opendev.org/c/openstack/project-config/+/821321 | 21:25 |
clarkb | slittle1_: I'm not sure if you can double up the groups on one line like that | 21:27 |
clarkb | but also should you replace the core group with the release group anyway? | 21:27 |
slittle1_ | yes, that would be mor consisten with our norm | 21:27 |
opendevreview | Scott Little proposed openstack/project-config master: give starlingx-release branch and tag powers in metrics-server-armada-app https://review.opendev.org/c/openstack/project-config/+/821321 | 21:29 |
slittle1_ | gotta get me a new keyboard | 21:30 |
clarkb | ianw: ok left comments on https://review.opendev.org/c/opendev/system-config/+/821155 tl;dr I think it does what it describes and that it is safe and unintrusive but also think we should have a discussion as a group about further plans before we get too far ahead. Happy to dedicate the majority of our next meeting to that if it would be helpful (or use email or do an ad hoc meeting | 22:28 |
clarkb | etc) | 22:28 |
ianw | thank you! yes i agree on discussion | 22:32 |
ianw | as far as i would want to go is having zuul write things in plain text on the bastion. i could write a spec to that, if we like, or just an email | 22:33 |
clarkb | I think either way works. I might have a slight preference for a spec as it helps outline everything in the code where we do that sort of thing | 22:35 |
ianw | i wouldn't mind applying 821155 (not now, when it's quiet and i'm watching) and reverting after a successful run, just to confirm it works as intended | 22:37 |
ianw | i think it does, but i've thought a lot of things about this changeset that haven't quite been true :) | 22:37 |
clarkb | heh ya. I think it's a good way to test the waters as the scope is quite small and we can clean up after it easily when done | 22:38 |
clarkb | ok I think that etherpad is fairly complete and I've sorted the lists by done, not applicable for one reason or another, and needs work | 22:40 |
clarkb | I'm going to start pushing more changes up to bump to bullseye next | 22:41 |
fungi | we seem to have very few builds in progress for the openstack tenant at the moment, most builds seem to be queued | 22:43 |
fungi | thinking this may be all the branch creation events for starlingx repos, we saw something similar when the release team merged a change to add branches to all of the openstackansible repos earlier in the week | 22:45 |
clarkb | exciting | 22:45 |
clarkb | there are a lot of events | 22:45 |
clarkb | I guess we watch that and see if they move? | 22:45 |
fungi | corvus suggested that the scheduler should be collapsing all the reconfigure events for those together, i think? | 22:45 |
fungi | we ended up getting out of the similar pileup from osa by doing a full scheduler restart and zk clear | 22:46 |
clarkb | ya might be worth double checking zuul isn't doing something wrong here too | 22:46 |
fungi | the event queues should burn down on their own, but i don't know how rapidly. https://grafana.opendev.org/d/5Imot6EMk/zuul-status says some events are taking 15-30 minutes to process | 22:49 |
clarkb | ya I think the restart made things go faster because zuul would check all branches at the startup time and somehow that makes it go quicker? | 22:50 |
clarkb | but I hesitate to proceed with a restart because 1) zuul should be able to handle this and 2) I thought we thought zuul would handle this? Probably a good idea to see if corvus has opinions | 22:51 |
fungi | well, it would only read them all once, rather than one for every new branch creation in one of the repos, i guess? | 22:51 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update the accessbot image to bullseye https://review.opendev.org/c/opendev/system-config/+/821328 | 22:52 |
fungi | kevinz: if you're around yet (i'm sure it's still early) we seem to have 19 server instances stuck in a "deleting" state (one i looked at is saying the task_state is deleting but the vm_state is building, with a creation date of 2021-11-19, i expect the others are similar but haven't confirmed) | 22:52 |
fungi | as a result we're not booting any new instances there until they're cleaned up | 22:52 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update the hound image to bullseye https://review.opendev.org/c/opendev/system-config/+/821329 | 22:55 |
clarkb | the queue sizes appear to be getting smaller | 23:00 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update limboria ircbot to bullseye https://review.opendev.org/c/opendev/system-config/+/821330 | 23:08 |
opendevreview | Clark Boylan proposed opendev/system-config master: Install Limnoria from upstream https://review.opendev.org/c/opendev/system-config/+/821331 | 23:08 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update matrix-eavesdrop image to bullseye https://review.opendev.org/c/opendev/system-config/+/821332 | 23:11 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update refstack image to bullseye https://review.opendev.org/c/opendev/system-config/+/821335 | 23:26 |
ianw | clarkb: for 821331 did they make it to the master branch yet? | 23:27 |
*** rlandy|ruck is now known as rlandy|out | 23:30 | |
clarkb | ianw: they appear to have. I cloned and git log showed them in history | 23:30 |
clarkb | ianw: but you should definitely double check | 23:30 |
clarkb | I sort of figured we could get the changes up and then testing will tell us where bullseye is different and stuff will break | 23:31 |
clarkb | but better to get this out there as a list of things we can take action on than a secret todo list :) | 23:31 |
clarkb | uwsgi-base is going to be the complicated one that needs thinking since it is a base image with other consumers. We want to do what we did with python-base and python-builder so I'll have to look at it a bit more closely once the others are moving along | 23:34 |
*** artom__ is now known as artom | 23:39 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Properly build bullseye uwsgi-base docker images https://review.opendev.org/c/opendev/system-config/+/821339 | 23:47 |
clarkb | ok the uwsgi situation is a bit fun. I tried to cover it all in the commit message for ^. Lodgeit isn't actually done and will need an image rebuild once ^ lands | 23:48 |
opendevreview | Clark Boylan proposed opendev/lodgeit master: Rebuild the lodgeit docker image https://review.opendev.org/c/opendev/lodgeit/+/821340 | 23:50 |
clarkb | ok I think that is a fairly complete list of changes needed to bump our images up a debian release. Note I don't think we should approve them all at once and instead take a little time to make sure debian userland updates don't cause unexpected changes | 23:50 |
clarkb | but the vast majority of them should be fine as they don't rely on the userland for much | 23:50 |
fungi | management events list for the openstack tenant is down to 4 now | 23:51 |
clarkb | I guess tomorrow I'll look for any failures and maybe we can land a subset. Then we can also start looking at the uid updates. Hopefully that etherpad lays out the todos around this pretty clearly. I added a few others as well for mariadb and zookeeper that i noticed | 23:53 |
ianw | clarkb: thanks, will double check. i'll review the other bits this afternoon | 23:56 |