*** tosky has quit IRC | 00:01 | |
openstackgerrit | Merged zuul/zuul-jobs master: Clarify tox_environment accepts dictionary not list https://review.opendev.org/c/zuul/zuul-jobs/+/769433 | 00:05 |
openstackgerrit | Merged zuul/zuul-jobs master: Document Python siblings handling for tox role https://review.opendev.org/c/zuul/zuul-jobs/+/768823 | 00:05 |
*** zbr5 has joined #opendev | 00:22 | |
*** zbr has quit IRC | 00:22 | |
*** zbr5 is now known as zbr | 00:22 | |
*** mlavalle has quit IRC | 00:47 | |
openstackgerrit | Merged zuul/zuul-jobs master: Add configuration to make logs public https://review.opendev.org/c/zuul/zuul-jobs/+/764483 | 01:02 |
*** kevinz has joined #opendev | 01:03 | |
*** icey has quit IRC | 01:15 | |
*** icey has joined #opendev | 01:16 | |
openstackgerrit | Merged zuul/zuul-jobs master: Allow to retrieve releasenotes requirements from a dedicated place https://review.opendev.org/c/zuul/zuul-jobs/+/769292 | 01:36 |
kevinz | ianw: Morning! | 01:37 |
ianw | kevinz: hi, happy new year | 01:37 |
kevinz | ianw: Thanks! Happy new year! I saw there is an issue accessing Linaro US, right? But checking with ping, I see both IPv4 and IPv6 work | 01:38 |
ianw | i only got back from pto today, and wasn't aware of anything. fungi: ^ are there current issues? | 01:39 |
kevinz | ianw: OK, np | 01:39 |
kevinz | https://mirror.regionone.linaro-us.opendev.org/debian/dists/buster-backports/InRelease, this does not work, but ping is fine | 01:39 |
ianw | hrm, will check in a little | 01:40 |
*** tkajinam has quit IRC | 01:41 | |
*** tkajinam has joined #opendev | 01:42 | |
ianw | ok, host is up, nothing interesting in dmesg | 01:57 |
ianw | kevinz: do you perhaps mean https://mirror.iad.rax.opendev.org/debian/dists/buster-backports/Release ? | 02:02 |
kevinz | ianw: aha, after checking the IRC log, this issue for linaro-us was fixed after rebooting the instance | 02:07 |
kevinz | http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-01-12.log.html | 02:07 |
kevinz | Thanks | 02:07 |
kevinz | ianw: thanks for helping | 02:07 |
*** tkajinam has quit IRC | 02:09 | |
*** tkajinam has joined #opendev | 02:10 | |
ianw | kevinz: did you manage to find anything on why these nodes shut down suddenly? | 02:10 |
ianw | I don't think we have an InRelease file, just Release because we don't sign our repos | 02:11 |
ianw | but I guess per that link, apt *looks* for InRelease, and if the mirror is down will give that error | 02:11 |
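A quick way to reproduce what apt sees, assuming nothing beyond curl and using the URL kevinz quoted above; apt fetches InRelease first and falls back to Release, and per ianw our mirrors only publish Release:

    # check whether the mirror answers at all and which index files it serves
    curl -sI https://mirror.regionone.linaro-us.opendev.org/debian/dists/buster-backports/InRelease
    curl -sI https://mirror.regionone.linaro-us.opendev.org/debian/dists/buster-backports/Release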
kevinz | req-3a0fe54f-e97f-4ca0-b24f-6bcdabc9be27  Start  Jan. 12, 2021, 10:21 a.m.  0881516836d94a8f890a031f84c985ef  - | 02:13 |
kevinz | req-1e4b24f1-13f3-4c1f-bf19-e4f1e0c8b053  Stop  Jan. 11, 2021, 2:52 p.m.  -  - | 02:13 |
kevinz | req-2e2cc170-c6b4-491a-804c-5af5efd604d0  Start  Dec. 19, 2020, 12:44 a.m.  0881516836d94a8f890a031f84c985ef  - | 02:13 |
kevinz | req-5cf099bb-011a-4e64-902d-40ab2e8795a5  Stop  Dec. 18, 2020, 9:25 p.m.  -  - | 02:13 |
kevinz | req-556cbdab-8639-42ab-b624-30b6b4ade719  Start  Nov. 8, 2020, 10:06 p.m.  0881516836d94a8f890a031f84c985ef  - | 02:13 |
kevinz | req-b04bbf39-2897-4e62-a30d-99d4722c3c70  Stop  Nov. 5, 2020, 7:18 a.m.  - | 02:13 |
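The rows above are a flattened instance action list; a minimal sketch of pulling the same data with the OpenStack CLI, assuming admin credentials and a placeholder server UUID:

    # list the start/stop actions nova has recorded for the mirror instance
    openstack server event list <mirror-instance-uuid>
    # show the details (including the request id) for one of the stop events
    openstack server event show <mirror-instance-uuid> req-1e4b24f1-13f3-4c1f-bf19-e4f1e0c8b053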
ianw | i had a remote netconsole running and didn't get any sort of message out of the host | 02:14 |
kevinz | ianw: http://paste.openstack.org/show/801575/, looks like it has been shut down every month | 02:14 |
ianw | it was like it was just killed | 02:14 |
kevinz | ianw: ran out of resources and got killed? | 02:15 |
kevinz | by host | 02:15 |
ianw | maybe? I'd expect some sort of logs in nova ... | 02:15 |
kevinz | I will check the nova-log for this req number | 02:16 |
ianw | i setup a console on 2020-11-09 | 02:16 |
ianw | doh, i dropped the "-a" from the tee command so i stupidly overwrote the record of when it stopped | 02:20 |
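For the record, the fix is just the append flag on tee; a sketch of a netconsole receiver that keeps history, with the port, listener, and log path being assumptions:

    # -u for UDP, -l to listen; tee -a appends instead of truncating on restart
    nc -u -l 6666 | tee -a /var/log/netconsole-linaro.log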
ianw | Sep 14 09:09:58 <ianw> Alex_Gaynor: thanks for pointing that out. it seems we have a problem with the mirror node in that region. | 02:22 |
ianw | req-4c549e46-760b-4353-b92d-2503e13a96c5  Start  Sept. 13, 2020, 10:39 p.m.  0881516836d94a8f890a031f84c985ef  - | 02:23 |
ianw | probably matches; are those times UTC? | 02:23 |
kevinz | ianw: yes, it is UTC timezone | 02:35 |
kevinz | ianw: checking the log from nova-compute, I just get this: http://paste.openstack.org/show/801576/. Will look for more in nova-api and conductor | 02:45 |
ianw | that definitely seems like nova noticed the vm had already shutdown, then updated the db | 02:48 |
ianw | kevinz: i'd be looking for corresponding oom/kill type messages in syslog for qemu-kvm around the same time ... | 02:50 |
kevinz | ianw: will check | 02:50 |
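A sketch of the syslog check ianw suggests, assuming a Debian/Ubuntu compute node and using the Jan 11 stop time from the action list above as the window:

    # look for the OOM killer reaping a qemu process
    grep -iE 'out of memory|oom-kill|killed process' /var/log/syslog | grep -i qemu
    # or search the kernel messages for the same window via the journal
    journalctl -k --since "2021-01-11 13:00" --until "2021-01-11 16:00" | grep -iE 'oom|killed process'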
*** hamalq has quit IRC | 02:56 | |
ianw | kevinz: do the compute nodes have a little swap space? | 02:57 |
*** ysandeep|away is now known as ysandeep | 03:24 | |
kevinz | ianw: http://paste.openstack.org/show/801579/, yes, 4578M in total | 03:28 |
kevinz | And I see some qemu failures on Jan 11. | 03:28 |
ianw | hrm, so 96gb ram, 4gb swap (approx) right? | 03:31 |
ianw | although there's a lot of free ram now, the swap does seem used, which suggests to me it might have been under memory pressure at some other time | 03:32 |
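For reference, a hedged sketch of the usual commands behind a memory check like the paste above:

    free -h          # total/used RAM and swap
    swapon --show    # swap devices and how much of each is in use
    vmstat 5 5       # non-zero si/so columns indicate active swapping right now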
*** sboyron has quit IRC | 03:41 | |
kevinz | ianw: yes, looks like memory pressure. I will disable scheduling to this node for a while, to see if things get better | 03:58 |
kevinz | ianw: I see quite a lot of instances are scheduled and running on this node; they keep getting scheduled here. Looks like the nova scheduler is not making good decisions... | 04:27 |
kevinz | I have disabled scheduling to this node and I will check what is wrong with the nova-scheduler | 04:28 |
ianw | kevinz: thanks; something like that would explain the very random times it seems to stop i guess. we can go month(s) with nothing but then a few failures in a week it feels like | 04:30 |
kevinz | ianw: yes, definitely. Let's see what will happen recently. | 04:32 |
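A sketch of taking the compute node out of scheduling with the OpenStack CLI, with the hostname and reason as placeholders:

    # stop the scheduler from placing new instances on the suspect node
    openstack compute service set --disable --disable-reason "investigating guest shutdowns" <compute-host> nova-compute
    # re-enable once it looks healthy again
    openstack compute service set --enable <compute-host> nova-compute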
fungi | ianw: kevinz: the mirror and builder instances were both found in a shutdown state again; we managed to boot them, though we ended up needing to delete the afs cache on the mirror as it was seemingly corrupted to the point where afsd would just hang indefinitely | 04:45 |
ianw | fungi: yeah, that seems to be a common issue when it is shutdown unsafely | 04:46 |
fungi | looking at grafana we're still behind on node requests (though on track to catch up by the time daily periodics kick off), and tripleo still has a 10-hour gate backlog | 04:52 |
fungi | so maybe we should postpone the scheduler restart | 04:52 |
ianw | i'm heading out in ~ 30 mins, so won't be able to watch this evening | 04:58 |
ianw | if tomorrow we get reviews on the zuul summary plugin, it might be worth restarting scheduler and gerrit at the same time | 04:58 |
fungi | great point | 05:23 |
*** ykarel has joined #opendev | 05:41 | |
*** marios has joined #opendev | 06:28 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Move snaps ACL to x https://review.opendev.org/c/openstack/project-config/+/770538 | 07:00 |
openstackgerrit | Merged openstack/project-config master: Create microstack-specs project https://review.opendev.org/c/openstack/project-config/+/770460 | 07:11 |
openstackgerrit | Merged zuul/zuul-jobs master: Enable installing nimble siblings https://review.opendev.org/c/zuul/zuul-jobs/+/765672 | 07:13 |
*** ralonsoh has joined #opendev | 07:19 | |
*** eolivare has joined #opendev | 07:35 | |
*** openstackgerrit has quit IRC | 07:47 | |
*** jpena|off is now known as jpena | 07:51 | |
*** JayF has quit IRC | 07:52 | |
*** openstackgerrit has joined #opendev | 07:53 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 07:53 |
*** fressi has joined #opendev | 07:58 | |
*** slaweq has joined #opendev | 07:59 | |
*** diablo_rojo__ has quit IRC | 08:01 | |
*** hashar has joined #opendev | 08:03 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 08:03 |
*** slaweq has quit IRC | 08:04 | |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: Move git-review zuul config in-tree https://review.opendev.org/c/openstack/project-config/+/763808 | 08:05 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 08:06 |
*** slaweq has joined #opendev | 08:10 | |
*** andrewbonney has joined #opendev | 08:13 | |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: Move git-review zuul config in-tree https://review.opendev.org/c/openstack/project-config/+/763808 | 08:21 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 08:23 |
*** rpittau|afk is now known as rpittau | 08:25 | |
*** sboyron has joined #opendev | 08:27 | |
*** tosky has joined #opendev | 08:39 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 09:18 |
*** hrw has left #opendev | 09:22 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 09:23 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Drop support for py27 https://review.opendev.org/c/opendev/git-review/+/770556 | 09:44 |
jrosser | i am seeing a number of "Could not connect to mirror.regionone.limestone.opendev.org:443 (216.245.200.130). - connect (113: No route to host)" errors | 09:49 |
lourot | ^ same for us, e.g. in https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/770297 | 10:05 |
frickler | yet another mirror gone offline, this is getting creepy. /me tries to take a look | 10:09 |
frickler | infra-root: ^^ console log shows a lot of CPU/rcu related issues. trying a restart via the api | 10:12 |
openstackgerrit | Merged openstack/project-config master: Move git-review zuul config in-tree https://review.opendev.org/c/openstack/project-config/+/763808 | 10:24 |
*** hemanth_n has joined #opendev | 10:28 | |
*** hashar has quit IRC | 10:39 | |
*** dtantsur|afk is now known as dtantsur | 10:41 | |
*** hemanth_n has quit IRC | 11:01 | |
*** ysandeep is now known as ysandeep|afk | 11:04 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 11:19 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 11:45 |
sshnaidm|ruck | infra-root: is the problem with retries in limestone known? | 12:08 |
*** DSpider has joined #opendev | 12:10 | |
zbr | sshnaidm|ruck: one mirror went down two hours ago. | 12:22 |
frickler | sshnaidm|ruck: zbr: trying to restart the server via the api hasn't worked. doing a stop/start cycle next | 12:49 |
*** jpena is now known as jpena|lunch | 12:49 | |
frickler | GPF while trying to start the AFS client ... guess we'll need to rebuild the mirror or talk to limestone about a possibly broken hypervisor. disabling that region for now | 12:53 |
frickler | hmm ... actually the node did finish booting but failed with afs. did "rm -rf /var/cache/openafs/*" and another reboot, maybe that'll be enough for now | 12:55 |
frickler | o.k., that seems to have worked for now, maybe the GPF was in fact related to afs cache corruption | 13:03 |
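For reference, the recovery sequence frickler describes, assuming the Debian/Ubuntu openafs-client unit name:

    # make sure the cache is not in use before wiping it, then reboot cleanly
    systemctl stop openafs-client
    rm -rf /var/cache/openafs/*
    reboot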
frickler | #status log stopped and restarted mirror.regionone.limestone.opendev.org after it had become unresponsive. needed afs cache cleanup, too. | 13:04 |
openstackstatus | frickler: finished logging | 13:04 |
frickler | jrosser: lourot: sshnaidm|ruck: zbr: ^^ please let us know if you encounter any further issues, should be safe to recheck now. | 13:05 |
sshnaidm|ruck | frickler, thanks a lot! | 13:05 |
*** brinzhang has quit IRC | 13:11 | |
*** brinzhang has joined #opendev | 13:11 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 13:16 |
*** ysandeep|afk is now known as ysandeep | 13:25 | |
*** whoami-rajat__ has joined #opendev | 13:46 | |
*** jpena|lunch is now known as jpena | 13:51 | |
*** d34dh0r53 has quit IRC | 14:10 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Drop support for py27 https://review.opendev.org/c/opendev/git-review/+/770556 | 14:11 |
*** d34dh0r53 has joined #opendev | 14:13 | |
*** hashar has joined #opendev | 14:47 | |
*** auristor has quit IRC | 14:47 | |
*** fressi has quit IRC | 14:48 | |
*** auristor has joined #opendev | 14:50 | |
fungi | frickler: did afsd also not start completely on that mirror after restarting? | 15:08 |
fungi | ahh, you said yes | 15:09 |
mnaser | diablo_rojo_phon: when you have a second, if you could rebase https://review.opendev.org/c/openstack/project-config/+/767057 | 15:10 |
kopecmartin | hi all, I'd like to update the refstack server (https://refstack.openstack.org) with the latest changes in the refstack repo (https://opendev.org/osf/refstack/), could anyone point me in the right direction on how? Thank you | 15:21 |
clarkb | kopecmartin: currently it is deployed using opendev/system-config and puppet-refstack iirc. But I proposed a change a while back to instead build docker images for it and deploy those from opendev/system-config. the problems there were that a number of changes needed to be made to refstack itself for that to be viable, and I ran out of steam on it | 15:34 |
clarkb | kopecmartin: I think we should pick that back up again if we are going to try and make updates | 15:35 |
fungi | also the current refstack server is running ubuntu 14.04 lts | 15:35 |
fungi | and current master branch says it needs python 3.6 or newer, while that version of ubuntu only has python 3.4 | 15:35 |
kopecmartin | oh, I see .. I'm happy to help .. also we have reformed the refstack group a little, so there are enough core reviewers now if anything needs to be changed on the refstack side | 15:39 |
kopecmartin | in regards to the server OS update, i'm happy to help if you point me in a direction | 15:40 |
clarkb | kopecmartin: please feel free to take over that change in system-config, it should show up if you search for me and refstack in system-config | 15:40 |
clarkb | kopecmartin: that would happen as part of the redeployment with docker. So get the docker stuff working in CI then an infra-root will work with you to do the host migration and all that | 15:40 |
kopecmartin | clarkb: thanks, I'm gonna have a look | 15:42 |
*** hashar is now known as hasharAway | 15:43 | |
openstackgerrit | Merged opendev/git-review master: Bring zuul configuration in-tree https://review.opendev.org/c/opendev/git-review/+/770539 | 15:48 |
ttx | kopecmartin: thanks for picking that up! I was looking into updating instructions when I realized they were already updated but just missing a current deployment | 15:53 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Assure git-review works with py37 and py38 https://review.opendev.org/c/opendev/git-review/+/770641 | 15:53 |
openstackgerrit | Jeremy Stanley proposed opendev/engagement master: Initial commit https://review.opendev.org/c/opendev/engagement/+/729293 | 16:00 |
*** ykarel is now known as ykarel|away | 16:09 | |
*** ysandeep is now known as ysandeep|out | 16:12 | |
*** ykarel|away has quit IRC | 16:16 | |
*** slaweq has quit IRC | 16:34 | |
*** slaweq has joined #opendev | 16:36 | |
*** chrome0 has quit IRC | 16:40 | |
*** tosky has quit IRC | 16:41 | |
*** tosky has joined #opendev | 16:42 | |
*** chrome0 has joined #opendev | 16:45 | |
*** eolivare has quit IRC | 16:52 | |
*** jpena is now known as jpena|off | 17:05 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Adjust the example Etherpad API delete command https://review.opendev.org/c/opendev/system-config/+/770648 | 17:06 |
clarkb | #status log Manually deleted an etherpad at the request of dmsimard. | 17:12 |
openstackstatus | clarkb: finished logging | 17:12 |
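Related to the change above, a hedged sketch of an Etherpad API delete call, with the API key path and pad name as placeholders:

    # deletePad is part of Etherpad's HTTP API v1; the key lives wherever the deployment keeps APIKEY.txt
    curl "https://etherpad.opendev.org/api/1/deletePad?apikey=$(cat /path/to/APIKEY.txt)&padID=<pad-name>"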
fungi | interesting, looks like some of our afs vos release runs began to consistently fail at 04:23 today | 17:14 |
fungi | i'll check the fileservers | 17:14 |
fungi | both are up and there are no recent restarts | 17:14 |
clarkb | fungi: how is disk utilization? | 17:14 |
clarkb | (I think we've been good on that side of things but could potentially explain it?) | 17:15 |
fungi | the /vicepa fs has 392G available on afs01.dfw and 1.1T available on afs02.dfw | 17:15 |
fungi | dmesg indicates io errors talking to a cinder volume on afs02.dfw starting at 03:32:42 | 17:17 |
clarkb | fungi: are there stale locks? thats about the only other thing I can think of that would cause something like that | 17:17 |
clarkb | oh interseting I guess that could do it too | 17:17 |
fungi | [Wed Jan 13 03:32:42 2021] INFO: task jbd2/dm-0-8:484 blocked for more than 120 seconds. | 17:17 |
fungi | [Wed Jan 13 03:35:58 2021] blk_update_request: I/O error, dev xvdk, sector 12328 | 17:17 |
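A sketch of pulling those kernel messages with readable timestamps so they can be lined up with nova and cinder events (xvdk is the affected attachment here):

    # -T prints human-readable timestamps instead of seconds since boot
    dmesg -T | grep -E 'blocked for more than|I/O error|xvdk'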
fungi | i'll reboot the server | 17:18 |
clarkb | ok | 17:18 |
fungi | make sure it reconnects to the volumes correctly | 17:18 |
*** marios is now known as marios|out | 17:18 | |
fungi | okay, server's back up. i'll try to stop/fix any lingering hung vos releases or locks | 17:23 |
clarkb | thanks! | 17:24 |
fungi | may not be any cleanup required, looks like the errors stopped as soon as afs02.dfw was restarted | 17:29 |
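Had cleanup been needed, the usual sequence is roughly the following, with the volume name as an example and assuming localauth on the fileserver:

    # find volumes left locked by interrupted releases
    vos listvldb -locked
    # unlock, then force a fresh release of the affected volume
    vos unlock -id mirror.debian -localauth
    vos release -id mirror.debian -localauth -verbose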
clarkb | I wonder if increasing that kernel timeout would help (and if it is even tunable) | 17:29 |
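For what it's worth, the 120 second figure comes from a sysctl, but it only controls when the warning is printed, not how long the I/O is allowed to stall:

    sysctl kernel.hung_task_timeout_secs          # defaults to 120
    sysctl -w kernel.hung_task_timeout_secs=300   # raises the warning threshold only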
*** sshnaidm|ruck is now known as sshnaidm|afk | 17:30 | |
fungi | some volumes needed "full" releases, so it's taking a bit of time to catch up again | 17:31 |
fungi | #status log rebooted afs02.dfw following hung kernel tasks and apparent disconnect from a cinder volume starting at 03:32:42, volume re-releases are underway but some may be stale for the next hour or more | 17:34 |
openstackstatus | fungi: finished logging | 17:35 |
*** mlavalle has joined #opendev | 17:57 | |
*** hamalq has joined #opendev | 18:03 | |
*** rpittau is now known as rpittau|afk | 18:05 | |
*** marios|out has quit IRC | 18:06 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 18:06 |
clarkb | fungi: doesn't look like zuul is any happier with its backlog today? | 18:09 |
*** cloudnull has quit IRC | 18:12 | |
*** cloudnull has joined #opendev | 18:12 | |
*** andrewbonney has quit IRC | 18:12 | |
fungi | not really no | 18:13 |
clarkb | looking at grafana nothing stands out as being very broken. I guess just backlogs due to demand and potentially made worse by the mirror issue in limestone earlier today | 18:13 |
clarkb | boot times seem consistent and failures are infrequent | 18:13 |
clarkb | I do wonder if we have no quota in vexxhost though as that is a semi common thing due to volume leaks | 18:14 |
clarkb | grafana indicates that, no, we are using our quota there | 18:14 |
clarkb | the number of jobs a single neutron change runs is not small | 18:16 |
fungi | gerrit event volume has started to dip, so looks like the node request backlog is plateauing around 2-2.5k at least | 18:18 |
fungi | we seem to max out at roughly 600 nodes in use | 18:19 |
*** ralonsoh has quit IRC | 18:20 | |
fungi | aha, mystery solved on the afs02.dfw issue. ticket rackspace opened: This message is to inform you that our monitoring systems have detected a problem with the server which hosts your Cloud Block Storage device, afs02.dfw.opendev.org/main04, '9f19fd0d-a33e-4670-817c-93dd1e6c6e6f' at 2021-01-13T03:59:33.166398. | 18:28 |
fungi | if i hadn't been so distracted this morning i might have read the root inbox earlier and noticed/fixed the problem sooner | 18:30 |
*** dtantsur is now known as dtantsur|afk | 18:43 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 18:53 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 19:04 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 19:24 |
*** hamalq has quit IRC | 19:29 | |
*** paladox has quit IRC | 19:34 | |
*** hasharAway has quit IRC | 19:36 | |
*** paladox has joined #opendev | 19:39 | |
*** whoami-rajat__ has quit IRC | 19:55 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 20:08 |
*** sboyron has quit IRC | 20:09 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 20:24 |
*** slaweq has quit IRC | 20:41 | |
fungi | tarballs volume has been releasing for 4 hours now. i probably should have had the foresight to create locks preventing the mirror volumes from getting released until that was done :/ | 21:27 |
fungi | at this point cronjobs have started what are probably full releases which are still in progress for deb-octopus, yum-puppetlabs, debian, opensuse, debian-security, ubuntu-ports, epel, centos, and fedora | 21:29 |
mordred | fungi: "fun" | 21:30 |
fungi | yeah, i expect the tarballs release would have only required two hours based on the transfer volume cacti is clocking on the network interface, but roughly an hour in the package mirrors started also getting releases triggered | 21:31 |
fungi | which likely slowed the second half of the tarballs volume transfer to a crawl | 21:32 |
fungi | i'm hesitant to abruptly terminate any of the mirror volume releases though over worries that will lead to even more cleanup work | 21:33 |
clarkb | we don't typically do global locks but I guess the issue now is it's a full release? | 21:33 |
fungi | yeah, and a full release of a bunch of different volumes at once i expect | 21:33 |
fungi | i guess because these tried to release at some point while the filesystem was hitting write errors | 21:34 |
ianw | fungi: the tarballs one, i don't think that runs with -localauth? the mirror ones should be running via a ssh session and not hit a timeout | 21:41 |
fungi | 2021-01-13 17:28:56,918 release DEBUG Running: ssh -T -i /root/.ssh/id_vos_release vos_release@afs01.dfw.openstack.org -- vos release project.tarballs | 21:47 |
fungi | i'm not worried about auth timeouts, just that it's going to be an age before tarballs.o.o and some other sites (zuul-ci.org, et cetera) are current again | 21:51 |
*** brinzhang_ has joined #opendev | 23:02 | |
*** brinzhang has quit IRC | 23:05 | |
clarkb | ianw: for https://review.opendev.org/c/opendev/system-config/+/767059/ does ansible work with symlinks like that? any reason to not just keep treating the canonical server as .openstack.org until we properly rename it? | 23:09 |
clarkb | (mostly concerned that we'll run against prod without the necessary vars loaded in) | 23:09 |
ianw | clarkb: one sec, context switching back to it :) | 23:10 |
clarkb | it just seems like we're getting ahead of schedule with that one | 23:10 |
fungi | yeah, elsewhere we still use openstack.org in the inventory name for it | 23:10 |
clarkb | yup the server is still named openstack.org canonically in nova too | 23:11 |
clarkb | we just serve review.opendev.org on it too | 23:11 |
ianw | so yeah, i think that started with me looking at the testinfra, which was trying to match against review01.opendev.org and thus not actually running the tests; which i think is only "is this listening" | 23:13 |
fungi | i have a feeling this is also not going to be a good time for a gerrit restart looking at the graphs... wonder if we should shoot for late utc friday, next week is openstack wallaby milestone 2 which likely explains the rush on the gate | 23:13 |
ianw | so iirc my issue was as i expanded the testing, i didn't want to have it in a weird state of testing against review01.openstack.org | 23:15 |
clarkb | ianw: ya I think we should fix the test to look at review.openstack.org. Then when we switch the host over we can update that too? | 23:15 |
clarkb | I don't think that is a weird state if that is reality | 23:16 |
clarkb | but maybe I'm missing something else too | 23:16 |
clarkb | fungi: ya agreed we should probably wait for CI to settle before restarting services like gerrit and zuul | 23:16 |
ianw | just that it's already in a dual state, in that the vhost name is set to review.opendev.org | 23:17 |
fungi | ianw: well, we have two vhosts there (we could redo the vhost config to use an alias instead) | 23:18 |
ianw | i guess what i mean is | 23:19 |
ianw | inventory/service/host_vars/review01.openstack.org.yaml:gerrit_vhost_name: review.opendev.org | 23:19 |
fungi | right, we do that | 23:20 |
clarkb | yes because the server is canonically named review01.openstack.org (that will change when it gets upgraded) | 23:20 |
clarkb | (it is confusing, but I worry that changing CI will make it more confusing because CI will be different than prod) | 23:20 |
clarkb | couple of questions on https://review.opendev.org/c/opendev/system-config/+/767078 too, but I think we can probably land that one as is then make those changes if we want to | 23:21 |
fungi | it wouldn't technically be all that different if we had inventory/service/host_vars/review01.opendev.org.yaml:gerrit_vhost_name: review.opendev.org because the ansible inventory hostname and apache vhost canonical name are still not the same | 23:21 |
clarkb | the more I think about it the more I'm thinking we should keep the status quo with https://review.opendev.org/c/opendev/system-config/+/767059 then update inventory when we update prod. That way we don't have an unexpected delta between prod and testing and weirdness in our host vars | 23:23 |
ianw | yeah, i guess that what i was doing was building extensively on the system-config tests, and found it quite confusing with the openstack.org server in the testing etc. | 23:23 |
clarkb | I don't think it is necessarily wrong, but it makes things different enough to be confusing | 23:24 |
fungi | it'll likely be confusing either way ;) | 23:24 |
ianw | yeah | 23:25 |
clarkb | right but the previous one matched production | 23:25 |
clarkb | so it's the confusion we have to deal with :) | 23:25 |
ianw | the other thing is, we could push for the replacement server to clear this up | 23:25 |
clarkb | what does gerrit init --dev do? | 23:25 |
clarkb | ianw: we can do that as well :) | 23:25 |
clarkb | we will need to be careful turning it on to avoid having it replicate to gitea and such | 23:26 |
clarkb | but ya that's another thing to sort out | 23:26 |
ianw | clarkb: when the auth type is set to DEVELOPMENT_BECOME_ANY_ACCOUNT *and* you run gerrit init --dev, gerrit will create the initial admin user for you | 23:26 |
clarkb | ah both are required | 23:27 |
ianw | yes. it's slightly different from the quickstart stuff, which uses the upstream gerrit container; that includes an LDAP server connected to it, where you have the initial admin | 23:27 |
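A minimal sketch of the dev-mode init against a throwaway site, with the install path as a placeholder and auth.type ending up as DEVELOPMENT_BECOME_ANY_ACCOUNT:

    # initialize a throwaway site in dev mode; the first user to log in can become admin
    java -jar gerrit.war init --batch --dev --no-auto-start -d ~/gerrit_testsite
    # confirm the resulting auth type
    git config -f ~/gerrit_testsite/etc/gerrit.config auth.type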
fungi | another alternative would be to ssh as "Gerrit Code Review" using the host key and create an initial admin account with the cli | 23:28 |
clarkb | this is fine I had just never seen the --dev flag before | 23:29 |
ianw | fungi: i couldn't get that to work. i couldn't get that to make the initial account | 23:29 |
ianw | you can go in with that after you have an account, and suexec, but it can't create the initial account | 23:30 |
fungi | oh, create-user needs to run via suexec as an existing user? | 23:30 |
fungi | yeah, now i somewhat recall that | 23:30 |
ianw | it's been a bit since i tried, but using the "Gerrit Code Review" was my first attempt at doing it | 23:31 |
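Once some admin account exists, further accounts can be created over the ssh CLI; a sketch with placeholder host, names, and key path:

    # create a service account as an existing administrator
    ssh -p 29418 admin@review.example.org gerrit create-account \
        --full-name "ExampleCI" --email ci@example.org --ssh-key - example-ci < /tmp/ci_key.pub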
clarkb | ianw: thank you for https://review.opendev.org/c/opendev/system-config/+/767269/4 I had meant to do that but then things got crazy when we were slimming the jobs down | 23:32 |
*** DSpider has quit IRC | 23:32 | |
ianw | do we know off the top of anyone's head if we have enough headroom to launch another review server in dfw? | 23:37 |
clarkb | I don't. We would if we retired review-test (we can also clean up review-dev but it's much smaller) | 23:37 |
ianw | ok, i'm happy to drive this one, i can give things a go and start on an etherpad of steps | 23:38 |
clarkb | thanks! I imagine the spin up for it would look a lot like review-test with a pruned host vars setup | 23:38 |
clarkb | that way it avoids replicating and stuff until we switch and add more config to it | 23:38 |
ianw | april seems far away but it isn't :) | 23:39 |
clarkb | (if you need examples in recent history for doing the thing) | 23:39 |
clarkb | ianw: I'm into the bazel stuff and it looks like the pure js plugins don't get copied automagically to the war like the java plugins do? | 23:42 |
clarkb | hrm we also have to specify a different bazel target for the plugin. Any idea why the other plugins don't need this? | 23:42 |
ianw | clarkb: i think because they're default plugins? | 23:43 |
clarkb | ah | 23:43 |
clarkb | that makes sense | 23:43 |
fungi | need to do something like the copy i did in ansible for the pg plugin of the opendev theme? | 23:43 |
ianw | don't take anything i say about bazel as true though :) i would love for someone who actually understands it to look at it | 23:43 |
clarkb | fungi: ya and tell bazel to build the plugin explicitly | 23:44 |
clarkb | ianw: I bet that is it | 23:44 |
clarkb | and/or js vs java plugins | 23:44 |
fungi | oh, got it, so there's also a build step for that one | 23:44 |
clarkb | like maybe it can autodiscover java things but not the js | 23:44 |
ianw | it should probably grow to have a java component. what we'd like is for the summary plugin to be able to order the results via config; but the only way to really do that is to write a java plugin that then exposes a REST endpoint | 23:45 |
clarkb | re making room for new review. If we need to we can probably put review-test into cold storage and revive it again after if necessary (basically snapshot the root disk and its cinder volumes then delete the instance) | 23:45 |
clarkb | this new testing stuff also reduces the need for review-test (though testing the migration to 3.3 on review-test with its bigger data set would be nice, hence the cold storage idea) | 23:46 |
clarkb | worst case we just rebuild review-test entirely | 23:46 |
clarkb | ianw: the symlink thing with bazel is a fun one | 23:49 |
ianw | yeah, that's a great intersection of bazel and docker | 23:49 |
ianw | you can not convince bazel to not use the symlinks, and you can not convince docker to follow them | 23:50 |
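One workaround is to dereference the symlinks while staging the docker build context; a sketch with placeholder paths:

    # cp -L follows bazel's convenience symlinks so the build context holds real files
    cp -rL bazel-bin/plugins/<plugin-name> ./docker-context/plugins/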
clarkb | ok the rest of that stack lgtm. I did leave some nits and thoughts in a few places. You may want to double check them to make sure they are fine as is | 23:52 |
ianw | thanks, i'll go through soon. | 23:53 |
clarkb | neutron is running ~36 jobs per change in check and the vast majority look like expensive full integration style tests | 23:54 |
* fungi sighs | 23:55 | |
clarkb | neutron-tempest-with-uwsgi-loki | 23:55 |
clarkb | neutron-ovn-tripleo-ci-centos-8-containers-multinode | 23:55 |
clarkb | those are both failing non voting jobs | 23:56 |
clarkb | I wonder too if we've got a bunch of always failing non voting jobs in there :/ | 23:56 |
clarkb | fungi: I wonder if we need to talk to projects about taking a critical eye to tests like that especially if we're producing a large backlog as a result | 23:56 |
clarkb | https://zuul.opendev.org/t/openstack/builds?job_name=neutron-tempest-with-uwsgi-loki confirmed for at least that first job | 23:59 |