clarkb | gdisk is manually installed into the image | 00:00 |
fungi | clarkb: the no-wheel behavior is consistent with upstream venv/stdlib. i get it even though i compile my own from cpython git source | 00:00 |
clarkb | ubuntu-focal-arm64-0000007568.log is building now and is currently before the point of the earlier sgdisk failure | 00:00 |
clarkb | fungi: ya its consistent on suse too | 00:00 |
clarkb | fungi: I'm going to reread logs and see if it is actually failing or just complaining | 00:01 |
clarkb | ah yup it later logs `running setup.py for X` | 00:01 |
clarkb | that seems like undesirable behavior but explains why this isn't fatal | 00:01 |
clarkb | sorry for the noise | 00:02 |
clarkb | ianw: good call 2020-09-09 00:05:57.481 | DEBUG diskimage_builder.block_device.utils [-] exec_sudo: mkfs: failed to execute mkfs.vfat: No such file or directory exec_sudo /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/utils.py:135 | 00:06 |
ianw | oh yeah, probably need vfat tools because efi partition | 00:06 |
clarkb | https://packages.debian.org/buster/dosfstools <- seems to be the package | 00:07 |
ianw | yeah that would be it | 00:07 |
clarkb | I'll manually install that one too | 00:07 |
*** DSpider has quit IRC | 00:07 | |
clarkb | ianw: want to update the change or should I do that? | 00:07 |
ianw | clarkb: umm, i can, see if that uefi patch ontop works too | 00:08 |
clarkb | ya wasn't sure what was going on there so figured I'd let you push the new ps | 00:09 |
clarkb | (that way I don't get it wrong) | 00:09 |
clarkb | debian-buster-arm64-0000089197.log is building now, which should give us an idea if there are more missing bits | 00:09 |
clarkb | but its going to be dinner soon so I need to step away. | 00:09 |
ianw | clarkb: thanks; tailing that log now | 00:10 |
ianw | at least until my internet cuts out :) | 00:10 |
clarkb | its weird we're all being told to stay home not because of the pandemic today but because the area's emergency services have been on a put fires out marathon | 00:12 |
clarkb | thankfully no additional power blips since the first one yesterday (which lasted just long enough to remind me one of my UPS batteries is bad) | 00:12 |
ianw | ahh fires, yes i remember them :) | 00:14 |
ianw | i guess our turn is coming in a few months | 00:14 |
ianw | looks like it's moving on to image generation stage, so that's good | 00:25 |
ianw | (partitions made and formatted) | 00:25 |
clarkb | ianw: looks like adding dosfstools was enough to get a successful job on the new builder | 00:57 |
clarkb | s/job/build/ | 00:57 |
clarkb | | debian-buster-arm64-0000089197 | debian-buster-arm64 | nb03.opendev.org | qcow2 | ready | 00:00:18:01 | | 00:58 |
ianw | yep, that's good! :) | 00:59 |
clarkb | I think that means the LE thing is the only outstanding issue and I'm happy to wait for that to happen periodically | 01:00 |
* clarkb goes back to enjoying the evening before the smoke returns | 01:00 | |
ianw | clarkb: have fun, i'll keep an eye on LE rollout | 01:13 |
*** diablo_rojo has quit IRC | 02:50 | |
openstackgerrit | Merged opendev/system-config master: run-base-post: fix ARA artifact link https://review.opendev.org/747101 | 03:09 |
*** johnavp1989 has left #opendev | 03:57 | |
openstackgerrit | Oleksandr Kozachenko proposed openstack/project-config master: Add monasca projects in vexxhost tenant https://review.opendev.org/750561 | 05:09 |
*** ysandeep|away is now known as ysandeep | 05:42 | |
*** zer0c00l has joined #opendev | 06:07 | |
zer0c00l | Is there a way i can subscribe to firehose.opendev.org events? | 06:08 |
zer0c00l | mosquitto_sub -h firehose.openstack.org --topic 'gerrit' | 06:08 |
zer0c00l | Connection error: Connection Refused: not authorised. | 06:08 |
zer0c00l | Is there a way to setup username and password so i can 'subscribe' to mqtt events? | 06:08 |
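For reference, a minimal Python subscriber along the lines of what zer0c00l is attempting, using the paho-mqtt client. The host, port 1883, and the gerrit/# topic filter are assumptions taken from the mosquitto_sub invocation above, and it only works once the broker accepts anonymous connections again (the allow_anonymous fix that merges later in this log):

    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        # Each firehose event arrives as a JSON payload on a gerrit/... topic.
        print(msg.topic, msg.payload.decode())

    client = mqtt.Client()          # paho-mqtt 1.x style constructor
    client.on_message = on_message
    client.connect("firehose.openstack.org", 1883)
    client.subscribe("gerrit/#")    # '#' matches all gerrit subtopics
    client.loop_forever()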
*** qchris has quit IRC | 06:20 | |
*** qchris has joined #opendev | 06:34 | |
openstackgerrit | Merged openstack/project-config master: Add monasca projects in vexxhost tenant https://review.opendev.org/750561 | 06:56 |
*** andrewbonney has joined #opendev | 07:06 | |
*** fressi has joined #opendev | 07:17 | |
*** hashar has joined #opendev | 07:17 | |
*** ysandeep is now known as ysandeep|lunch | 07:40 | |
*** tosky has joined #opendev | 07:57 | |
openstackgerrit | Fabien Boucher proposed opendev/gear master: use python3 as context for build-python-release https://review.opendev.org/742165 | 07:57 |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
*** DSpider has joined #opendev | 08:02 | |
*** ysandeep|lunch is now known as ysandeep | 08:55 | |
*** dtantsur|afk is now known as dtantsur | 09:35 | |
*** hashar has quit IRC | 09:57 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Use pytest for queries https://review.opendev.org/750445 | 10:00 |
openstackgerrit | Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Use pytest for queries. Switches queries testing to pytest, which provides: a test generator for each query (parametrize), the ability to run a single query test, and an html report with test results, making it easier to investigate failures https://review.opendev.org/750616 | 10:11 |
*** fressi has quit IRC | 10:13 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Use pytest for queries https://review.opendev.org/750445 | 10:13 |
*** priteau has joined #opendev | 10:42 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Use pytest for queries https://review.opendev.org/750445 | 10:50 |
openstackgerrit | Oleksandr Kozachenko proposed openstack/project-config master: Add monasca-tempest-plugin in vexxhost tenant https://review.opendev.org/750627 | 11:37 |
Open10K8S | Hi team. Please check this PS https://review.opendev.org/750627 . forgot to add tempest plugin. | 11:39 |
Open10K8S | thank you | 11:39 |
openstackgerrit | Tristan Cacqueray proposed openstack/project-config master: pynotedb: end project gating https://review.opendev.org/750634 | 12:17 |
openstackgerrit | Tristan Cacqueray proposed openstack/project-config master: pynotedb: remove project from infrastructure systems https://review.opendev.org/750635 | 12:17 |
*** jhesketh has quit IRC | 12:17 | |
*** jhesketh has joined #opendev | 12:17 | |
*** ykarel has joined #opendev | 12:25 | |
ykarel | Hi, when do the infra mirrors get updated? | 12:25 |
ykarel | example looking at http://mirror.mtl01.inap.opendev.org/centos/8/ | 12:26 |
ykarel | ^ is missing the nfv directory that exists at http://mirror.dal10.us.leaseweb.net/centos/8/nfv/ | 12:27 |
ykarel | nfv repo was added couple of hours back | 12:27 |
ykarel | it was added approx 4 hours ago | 12:28 |
openstackgerrit | Lee Yarwood proposed openstack/project-config master: WIP Add Fedora 32 builds https://review.opendev.org/750642 | 12:36 |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Set neutron-lib stable ACLs https://review.opendev.org/750643 | 12:39 |
fungi | zer0c00l: i agree, something seems broken with connecting to the mqtt broker there. i'm trying to look into it but also have some meetings this morning, so may not get it fixed immediately | 12:46 |
fungi | ykarel: i think we update every 2 hours, but we're pulling from a secondary mirror (the official primary mirror only allows public mirror operators to rsync from it) so when we see new files depends on when the mirror we're copying from gets them | 12:48 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Fix networking-l2gw missed change https://review.opendev.org/750645 | 12:48 |
AJaeger | config-core, please review to fix zuul config ^ | 12:48 |
AJaeger | tristanC: this should fix your change ^ | 12:49 |
*** fressi has joined #opendev | 12:50 | |
ykarel | fungi, how can I check whether the mirror you copy from is updated or not? | 12:50 |
ykarel | i picked the above url from opendev/system-config | 12:50 |
ykarel | okk now i see repo is updated | 12:51 |
ykarel | mirror.mtl01 at least | 12:51 |
ykarel | http://mirror.mtl01.inap.opendev.org/centos/8/nfv/ | 12:51 |
fungi | ykarel: if one of ours has it, all of them should. they're just all frontends to a shared network filesystem | 12:52 |
ykarel | fungi, okk Thanks mirrors are updated now | 12:54 |
ykarel | fungi, btw where to check when mirrors were last updated? | 12:54 |
*** ykarel_ has joined #opendev | 12:56 | |
AJaeger | gmann, infra-root, we currently have 104 errors for openstack tenant - many errors about openstack/networking-l2gw, see https://zuul.opendev.org/t/openstack/config-errors | 12:56 |
*** fressi has joined #opendev | 12:58 | |
*** ykarel has quit IRC | 12:58 | |
gmann | AJaeger: yeah, it's from networking-midonet, for which the fix is up, but their gate is already broken and there is no maintainer - https://review.opendev.org/#/c/738046/ | 13:02 |
gmann | AJaeger: just posted on current ML thread- http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017116.html | 13:04 |
mnaser | infra-root: does someone need to create `/afs/openstack.org/mirror/ceph-deb-octopus` before https://review.opendev.org/#/c/750519/ works? | 13:06 |
mnaser | that seems like the case to be honest, as it says src file not found.. | 13:06 |
*** ykarel_ has quit IRC | 13:09 | |
*** ykarel has joined #opendev | 13:12 | |
fungi | ykarel: we add a timestamp at the root of each tree we mirror, so in that case http://mirror.mtl01.inap.opendev.org/centos/timestamp.txt tells you when the last time was we ran rsync | 13:14 |
fungi | that file should also be the same across all our mirror servers, because it too is just in the shared network filesystem | 13:15 |
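A small scripted version of the check fungi describes, assuming the URL from the discussion above (any mirror frontend returns the same file since they all share the network filesystem):

    from urllib.request import urlopen

    # Timestamp written at the root of the mirrored tree by the rsync job.
    url = "http://mirror.mtl01.inap.opendev.org/centos/timestamp.txt"
    print(urlopen(url).read().decode().strip())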
ykarel | fungi, Thanks for the info | 13:15 |
fungi | yw | 13:19 |
*** jhesketh has quit IRC | 13:31 | |
*** fressi has quit IRC | 13:43 | |
openstackgerrit | Matthew Treinish proposed opendev/puppet-mosquitto master: Update set_anonymous flag to be explicitly true https://review.opendev.org/750659 | 13:49 |
openstackgerrit | Matthew Treinish proposed opendev/puppet-mosquitto master: Update set allow_anonymous flag to be explicitly true https://review.opendev.org/750659 | 13:53 |
AJaeger | gmann: are there specific changes we should force-merge for midonet? I would ask clarkb or fungi if they are willing to do that if you have a list. | 13:57 |
AJaeger | gmann: thanks for checking | 13:57 |
fungi | yeah, i've been following the ml thread, happy to help there if it will get them back on track (like if there are two fixes in different repos so you can't squash them) | 13:58 |
gmann | AJaeger: they are fixing the taas error, let's wait for those and see if it makes it green | 13:59 |
gmann | checking here; if it works then we can squash them https://review.opendev.org/#/c/750633/2 | 14:03 |
AJaeger | gmann: sure, let's wait - and if you need help, feel free to ask here. | 14:14 |
openstackgerrit | Merged opendev/system-config master: Add zuul-jobs-failures list https://review.opendev.org/748688 | 14:34 |
corvus | zbr: ^ | 14:35 |
gmann | AJaeger: sure, thanks | 14:56 |
fungi | infra-root: i've now seen a page allocation failure when attaching a new volume to graphite01.opendev.org (the current production system), and would like to reboot it so that the volume attachment is seen by the guest os, any objections? | 14:58 |
corvus | fungi: no objections | 14:58 |
fungi | in xenwatch again, so like i saw with whichever other server that was (/me checks status log...) | 14:58 |
fungi | aha, mirror01.dfw.rax was the other one where i saw this occur | 14:59 |
fungi | #status log cinder volumes for all six elasticsearch servers have been replaced and cleaned up | 14:59 |
openstackstatus | fungi: finished logging | 14:59 |
fungi | #status log cinder volume for graphite02 (not yet in production) has been replaced and cleaned up | 15:00 |
openstackstatus | fungi: finished logging | 15:00 |
zbr | corvus: thanks, this means next step is https://review.opendev.org/#/c/748706/ | 15:06 |
openstackgerrit | Merged opendev/puppet-mosquitto master: Update set allow_anonymous flag to be explicitly true https://review.opendev.org/750659 | 15:10 |
*** bhagyashris is now known as bhagyashris|ruck | 15:11 | |
*** bhagyashris|ruck is now known as bhagyashris|rove | 15:11 | |
*** lpetrut has joined #opendev | 15:12 | |
*** mlavalle has joined #opendev | 15:17 | |
*** jhesketh has joined #opendev | 15:23 | |
clarkb | slow start here today. I'll be checking on nb03's web server then looking at updating vexxhost mirror netplan configs | 15:24 |
corvus | the weather here is apocalyptic | 15:28 |
corvus | it's super dark outside and very orange | 15:28 |
clarkb | we've avoided that here but about an hour south its really bad like that | 15:29 |
clarkb | zer0c00l: fungi re firehose for gerrit events it may be better to use the ssh event stream? I think firehose is one of the services we've talked about turning off due to lack of use | 15:34 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: synchronize-repos: Remove unecessary git path modifications https://review.opendev.org/747640 | 15:44 |
*** ysandeep is now known as ysandeep|away | 15:51 | |
clarkb | nb03's webserver looks good now. I did have to manually restart the netfilter-persistent unit. I think that was missed because the original handler for the apache2 restart failed previously? That's my best guess | 15:53 |
*** ykarel is now known as ykarel|away | 15:57 | |
clarkb | the rules files were in place on the host but the iptables rules in the kernel had not updated | 15:57 |
clarkb | anyway was an easy fix and now the webserver is accessible and https is working | 15:57 |
*** lpetrut has quit IRC | 15:59 | |
gmann | AJaeger: fungi : networking-midonet still fails on the taas fix. i think we can force merge this, which will solve the config error, and the other fixes can continue at their own pace - https://review.opendev.org/#/c/738046/5 | 16:00 |
clarkb | gmann: that doesn't even pass pep8? | 16:00 |
clarkb | should we force merge that? | 16:00 |
AJaeger | gmann: please double check https://review.opendev.org/#/c/738046/5 - just left a comment | 16:02 |
gmann | clarkb: all those failures are pre-existing failures, and retiring networking-l2gw introduces the config errors which 738046 fixes | 16:02 |
gmann | ok | 16:02 |
gmann | AJaeger: fixed | 16:03 |
clarkb | mnaser: before I go rebooting the vexxhost mirror with a new netplan config, is interactive console access available in the vexxhost cloud? Wondering if I should bother setting a root password or not | 16:16 |
fungi | you should be able to log into horizon there with our tenant creds | 16:24 |
fungi | since there were no objections, i'm rebooting the prod graphite instance now | 16:25 |
clarkb | fungi: vexxhost login says enter an email address. Any idea if there is a direct horizon link that will accept username? | 16:28 |
fungi | #status log rebooted graphite01 to resolve a page allocation failure during volume attach | 16:29 |
openstackstatus | fungi: finished logging | 16:29 |
clarkb | I guess that may be in the catalog /me checks | 16:29 |
fungi | i want to say i just went to the base url of the api hostname last time i did it | 16:29 |
*** ykarel|away has quit IRC | 16:31 | |
fungi | clarkb: oh, try https://dashboard.vexxhost.net/ | 16:31 |
clarkb | aha thanks (it wasn't in the catalog fwiw) | 16:32 |
fungi | i went poking around in my personal account there and that's where the "cloud console" tab directs you | 16:32 |
clarkb | ok I can get to what is claimed to be the console but there is no interaction it seems | 16:36 |
clarkb | just a blinking cursor location | 16:36 |
clarkb | but typing doesn't do anything and there is no text | 16:36 |
clarkb | I wonder if the image itself isn't configured properly to have libvirt hook up to a tty | 16:36 |
clarkb | I'm thinking setting a local passwd won't help much given ^ | 16:37 |
clarkb | other options we've got include using a cron job to flip the config back | 16:40 |
clarkb | (or an at job) | 16:40 |
mnaser | clarkb: the novnc console should work | 16:41 |
mnaser | maybe hit enter once | 16:41 |
clarkb | mnaser: ya I've typed a bit of stuff to try and wake it | 16:41 |
mnaser | and you will probably need to reset root password to make life easy | 16:41 |
clarkb | mnaser: I wonder if it is a mismatch between image and libvirt expectations | 16:41 |
mnaser | that's very likely, the default images work and i think they point to ttyS0.. | 16:41 |
clarkb | this is a default image but it is also focal | 16:41 |
clarkb | I'm actually thinking now I should boot a test instance and test it there (with updated ip and gateways) | 16:43 |
clarkb | that will be the least disruptive thing so doing that really quick | 16:43 |
mnaser | clarkb: i THINK netplan has a dry run option | 16:45 |
clarkb | ah ya netplan try | 16:47 |
clarkb | still I'll do a test instance as that is proper end to end testing | 16:47 |
clarkb | but good to know that is a sanity check | 16:47 |
fungi | clarkb: oh, yep, i should have remembered, there's also a nova api command (accessible via osc too) which will return the url to the instance's novnc session | 16:51 |
fungi | it's `openstack console url show` | 16:52 |
fungi | (followed by the instance id) | 16:52 |
*** dtantsur is now known as dtantsur|afk | 16:55 | |
clarkb | ok on my test instance it seems to work and the new network setup looks similar to the old except we lose the dynamic labels and the expiration times on routes | 16:57 |
clarkb | I'll approve https://review.opendev.org/#/c/750484/ now and when it goes quiet I can apply the configs to the mirror and reboot it | 16:58 |
clarkb | the only things that changed between the two host configs were the mac addr and the ipv6 addr (they share gateways too) | 16:59 |
clarkb | I'm able to ping6 google.com and review.opendev.org from the test host after rebooting too | 17:00 |
fungi | sounds like it worked then | 17:01 |
clarkb | I think that all of our arm64 vm images may be built by nb03.opendev.org now too | 17:05 |
clarkb | I'm guessing the old server is sad for some reason as a result | 17:05 |
fungi | it will cease to be sad when it's dead | 17:06 |
fungi | er, i mean, deleted | 17:06 |
clarkb | ya as a sanity check nb03 seems to be running the updated image from the gdisk and dosfstools commit so we aren't relying on a manually patched image there. Also plenty of free disk space but we've got lvm now too so can change that later if we need to | 17:07 |
clarkb | maybe ianw will be able to quickly confirm that there is no reason to keep nb03.openstack.org later today and we can clean it up | 17:08 |
clarkb | once that is done we're ready for zuul zk tls in our zuul and nodepool installation | 17:08 |
clarkb | *required zk tls | 17:08 |
clarkb | thinking about the wheel "issue" in image builds I wonder if we should install wheel into those venvs to avoid the "ERROR" messages as they catch the eye | 17:11 |
clarkb | also maybe we should see if pip can make that a warning not an error | 17:11 |
*** andrewbonney has quit IRC | 17:14 | |
fungi | not just to avoid the errors, but also to improve efficiency. pip will be able to cache them | 17:21 |
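A rough sketch of that idea: create the venv and put wheel into it before installing anything else, so pip can build and cache wheels instead of falling back to running setup.py install for sdist-only packages. The venv path here is just an example:

    import subprocess
    import venv

    path = "/tmp/example-venv"
    venv.EnvBuilder(with_pip=True).create(path)
    # With wheel available, later sdist installs build a cacheable wheel
    # rather than logging the wheel-related ERROR lines noted above.
    subprocess.check_call([f"{path}/bin/pip", "install", "wheel"])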
clarkb | change to disable vexxhost temporarily should land shortly. I'm going to pop out for a bit while the existing jobs there finish up. Back in a bit | 17:21 |
fungi | once the volume replacement i've got in progress now finishes, we'll be down to just 5 remaining out of the 29 rackspace notified us about in dfw | 17:22 |
fungi | topical, three are for nodepool builders (nb01, 02 and 04) | 17:23 |
fungi | the other two are wiki-dev, and wiki (the latter i know will be more of a challenge) | 17:23 |
openstackgerrit | Merged openstack/project-config master: Disable vexxhost for mirror work https://review.opendev.org/750484 | 17:25 |
clarkb | grafana shows nodes are draining in vexxhost now | 17:46 |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Revert "Disable vexxhost for mirror work" https://review.opendev.org/750765 | 17:52 |
*** hashar has joined #opendev | 18:13 | |
*** hashar has quit IRC | 18:31 | |
clarkb | waiting patiently, now down to 5 test nodes in use | 18:40 |
AJaeger | config-core, please review https://review.opendev.org/750645 - this blocks some other changes to merge like https://review.opendev.org/750634 | 18:49 |
corvus | AJaeger: 750645+3 | 18:51 |
AJaeger | thanks, corvus | 18:51 |
zer0c00l | clarkb, fungi re. Thanks. I thought i was doing something wrong. Does the gerritbot (ircbot) use firehose mqtt or ssh? | 18:51 |
corvus | zer0c00l: ssh | 18:51 |
clarkb | zer0c00l: ssh | 18:51 |
fungi | as does zuul | 18:54 |
*** priteau has quit IRC | 18:57 | |
zer0c00l | Thanks. | 18:57 |
*** hashar has joined #opendev | 18:58 | |
zer0c00l | Here i thought mqtt would be more reliable. I already use gerritlib(ssh), i have unexplained 'hangs'. It just stops receiving events. | 18:58 |
fungi | zer0c00l: have you tried enabling ssh keepalive? | 19:00 |
fungi | it may be extended silent periods are causing a firewall or nat somewhere to discard what it sees as a dead tcp state | 19:00 |
openstackgerrit | Merged openstack/project-config master: Fix networking-l2gw missed change https://review.opendev.org/750645 | 19:00 |
AJaeger | clarkb: could you review https://review.opendev.org/#/c/750634 to start retiring pynotedb, please? | 19:02 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: pynotedb: remove project from infrastructure systems https://review.opendev.org/750635 | 19:03 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: pynotedb: remove project from infrastructure systems https://review.opendev.org/750635 | 19:04 |
zer0c00l | fungi: I am basically running a forked version of https://opendev.org/opendev/gerritbot | 19:06 |
zer0c00l | Wonder if i can enable keepalive there. | 19:06 |
fungi | it's using gerritlib which is using paramiko, so if paramiko supports ssh keepalive, i imagine we'd be fine merging patches to support it in gerritlib and gerritbot | 19:07 |
* fungi checks | 19:07 | |
fungi | http://docs.paramiko.org/en/stable/api/transport.html#paramiko.transport.Transport.set_keepalive | 19:09 |
openstackgerrit | Merged openstack/project-config master: Add monasca-tempest-plugin in vexxhost tenant https://review.opendev.org/750627 | 19:10 |
zer0c00l | fungi: thanks, let me see if i can modify gerritlib to use keepalives | 19:12 |
fungi | yeah, we're instantiating a paramiko.SSHClient() so i'm looking to see if there's a way to set it through there | 19:13 |
fungi | similar to how we set_missing_host_key_policy() | 19:13 |
fungi | zer0c00l: looks like we ought to be able to access it through http://docs.paramiko.org/en/stable/api/client.html#paramiko.client.SSHClient.get_transport | 19:15 |
fungi | so probably something like calling client.get_transport().set_keepalive(60) | 19:16 |
fungi | you might want to locally patch your gerritlib first and see if it solves your issue... if it does, we likely want to have gerritlib.gerrit.GerritConnection() grow a new attribute tied to it defaulting to 0, and then make setting that configurable in gerritbot | 19:18 |
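A minimal sketch of the approach fungi outlines, assuming paramiko's standard API and Gerrit's usual SSH port 29418; the hostname and username are placeholders, and in gerritlib this would live wherever the SSHClient connection is set up:

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("review.opendev.org", port=29418, username="example-user")
    # Send an SSH-level keepalive every 60 seconds so long idle stretches on
    # the event stream are not reaped by a firewall or NAT along the path.
    client.get_transport().set_keepalive(60)

    stdin, stdout, stderr = client.exec_command("gerrit stream-events")
    for line in stdout:
        print(line, end="")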
clarkb | what is the issue? | 19:19 |
fungi | gerritbot ssh sessions to gerrit hanging | 19:19 |
fungi | from initial description, sounds like it could be something as simple as a firewall killing idle tcp connections | 19:20 |
fungi | (not the gerritbot instance we're running, one zer0c00l has) | 19:21 |
fungi | which is why i suggest first just one-liner patching gerritlib on his end to turn on keepalive in paramiko and see if it helps at all | 19:22 |
clarkb | got it | 19:22 |
clarkb | vexxhost shows in use 0 | 19:22 |
clarkb | going to update mirror configs and reboot it now | 19:22 |
clarkb | ssh is responding but telling me non root users have to wait longer | 19:26 |
clarkb | I suppose thats a good sign | 19:26 |
fungi | indeed | 19:26 |
clarkb | and I'm in | 19:27 |
clarkb | routes and addresses look good and ping6 works to google.com and review.opendev.org | 19:27 |
clarkb | https://mirror.ca-ymq-1.vexxhost.opendev.org/ is also accessible | 19:28 |
clarkb | any objections to approving https://review.opendev.org/#/c/750765/ now to reenable the vexxhost usage? | 19:28 |
clarkb | #status log Configured mirror01.ca-ymq-1.vexxhost.opendev.org to configure its ipv6 networking statically with netplan rather than listen to router advertisements. | 19:29 |
openstackstatus | clarkb: finished logging | 19:29 |
*** Goneri has joined #opendev | 19:31 | |
clarkb | I've deleted my test node since the mirror itself is happy | 19:33 |
fungi | no objection | 19:34 |
openstackgerrit | Merged openstack/project-config master: Retire the devstack-plugin-zmq project https://review.opendev.org/748724 | 19:36 |
*** tosky has quit IRC | 19:37 | |
openstackgerrit | Merged openstack/project-config master: pynotedb: end project gating https://review.opendev.org/750634 | 19:43 |
AJaeger | infra-root, https://review.opendev.org/#/c/597402 is the next change to retire pynotedb, please review | 19:49 |
AJaeger | once that is merged, https://review.opendev.org/#/c/750635/ finishes the retirement. | 19:49 |
openstackgerrit | Merged openstack/project-config master: Revert "Disable vexxhost for mirror work" https://review.opendev.org/750765 | 19:51 |
*** openstackgerrit has quit IRC | 20:17 | |
clarkb | vexxhost is in use again | 20:33 |
*** hashar has quit IRC | 21:06 | |
clarkb | corvus: for restarting the scheduler and web to pick up the hold change, is that something we want to get done today? | 21:13 |
clarkb | infra-root I've removed my WIP from https://review.opendev.org/#/c/749853/2 as I think we can probably start cleaning up nb03.opendev.org now | 21:14 |
clarkb | er nb03.openstack.org is the one to cleanup | 21:14 |
corvus | clarkb: yeah, i can do that in a few mins | 21:15 |
clarkb | k I'm around and can help too | 21:16 |
corvus | huh, i just noticed there's a bunch of whitespace at the bottom of the status page; i wonder if that's a pf4 thing | 21:16 |
clarkb | hrm I don't see that | 21:17 |
clarkb | https://zuul.opendev.org/t/openstack/status is what i'm looking at | 21:17 |
corvus | me too. the scroll bar is about 30% down from the top when i've scrolled to the end of the actual content | 21:18 |
corvus | it's the error drawer | 21:18 |
corvus | with 111 errors | 21:19 |
corvus | i guess that's fine then | 21:19 |
corvus | now looks like a good time to restart | 21:19 |
clarkb | the errors are the x/networking-l2gw stuff iirc and AJaeger and gmann have been working through those | 21:20 |
gmann | clarkb: we need to force merge this to get rid of config error - https://review.opendev.org/#/c/738046/ | 21:21 |
gmann | all failure in this patch are existing failure in that repo | 21:21 |
clarkb | corvus: should I go ahead and force merge ^ now or wait for after the restart? | 21:22 |
gmann | I am not sure when those failure will be fixed as networking-midonet is in no maintainer situation | 21:22 |
gmann | http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017116.html | 21:23 |
corvus | clarkb: maybe wait till restart | 21:23 |
clarkb | corvus: k will wait | 21:23 |
corvus | which is imminent | 21:23 |
clarkb | gmann: ya so I guess we're fixing zuul but then the repo itself will need someone to step in and address the issues | 21:23 |
gmann | yeah. | 21:24 |
clarkb | corvus: looks like the most recent run of service-zuul failed | 21:24 |
corvus | clarkb: the image looked up to date so i proceeded | 21:25 |
clarkb | ze09 is unreachable is the issue there | 21:25 |
clarkb | so ya should be fine to proceed | 21:25 |
corvus | this restart will probably take 20+ minutes due to schema update | 21:25 |
clarkb | I'm able to ssh into ze09 so not sure what is going on there | 21:25 |
corvus | weird, ze09 seems to be working. agreed | 21:26 |
corvus | running jobs and everything | 21:26 |
corvus | maybe a fluke? | 21:26 |
clarkb | ya I guess check it in an hour and see if it persists (we run service-zuul hourly) | 21:26 |
corvus | in other news, the light outside is brighter and now looks like dawn | 21:30 |
corvus | turns out we really needed a nuclear winter to cool off after the heat wave | 21:31 |
clarkb | we seem to be just north of the worst of it here. The photos coming out of places to the south of us are crazy though | 21:33 |
corvus | we learned this morning that google removed the manual white balance feature from the android camera app. fortunately we had a better camera handy. | 21:35 |
clarkb | https://i.redd.it/c4d2tvxsryl51.jpg was Salem yesterday | 21:35 |
smcginnis | Dang! | 21:36 |
clarkb | thats about 45-60 minutes south of here depending on traffic | 21:36 |
smcginnis | I better check in on my friends in Eugene. | 21:37 |
clarkb | a bunch of towns up hill from salem burned down yesterday too | 21:37 |
clarkb | its weird because we're all being told to stay home and off roads again but not due to the pandemic | 21:38 |
fungi | that is, like, alien abduction supernatural horror movie red sky | 21:40 |
johnsom | The EPA AQI tops out at 500 which is "respirator recommended". We are currently somewhere between 710 and 551 | 21:40 |
JayF | We're checking a /lot/ of apocalypse boxes for 2020. | 21:41 |
johnsom | We have had the total recall experience since Monday night. | 21:41 |
clarkb | johnsom: I think if you get north of about wilsonville it clears up a bunch | 21:44 |
clarkb | still not great but much better than that | 21:44 |
corvus | wow that's red. we're more orange. this is pretty representative of what we saw this morning: https://s.hdnux.com/photos/01/14/01/75/19930957/5/gallery_xlarge.jpg | 21:44 |
clarkb | and the wind changes in a couple days and it will all be different again too | 21:44 |
johnsom | Yeah, I have checked the road cameras and air quality sites. | 21:44 |
fungi | i'll just stay in my bunker. thanks | 21:45 |
johnsom | Hmm, just got a "Proxy Error" from https://zuul.openstack.org/status "Reason: Error reading from remote server" Apache/2.4.18 (Ubuntu) Server at zuul.openstack.org Port 443 | 21:46 |
corvus | i'm restarting it | 21:46 |
clarkb | and it needs to do a db migration so a bit slower than normal | 21:46 |
corvus | #status log restarted zuul scheduler and web at commit 6a5b97572f73a70f72a0795d5e517cff96740887 to pick up held db attribute | 21:47 |
openstackstatus | corvus: finished logging | 21:47 |
corvus | re-enqueue is in progress | 21:47 |
clarkb | corvus: now I see the extra scroll space | 21:48 |
clarkb | corvus: I think its rendering to the bottom pre restart then remembering that length but when there are fewer changes (because reenqueue is serial) it renders the shorter list in the bigger space | 21:49 |
clarkb | (I didn't open the error drawer but maybe it does the same thing) | 21:49 |
clarkb | corvus: good to force merge that change now? | 21:51 |
corvus | clarkb: i think it's only the error drawer. the drawer is always there but normally invisible. the length of my page is exactly the length required to display it. | 21:52 |
corvus | clarkb: and yes, gtg | 21:52 |
corvus | reenqueue is finished | 21:52 |
fungi | that explains why i saw something similar on a build result page | 21:52 |
clarkb | https://review.opendev.org/#/c/738046/ is merged | 21:53 |
corvus | yeah, i'd expect it on any sufficiently short page in the openstack tenant | 21:53 |
clarkb | gmann: ^ fyi | 21:53 |
clarkb | corvus: ah got it | 21:53 |
fungi | so the amount of blank space presumably could be observed to differ between tenants based on how many errors a given tenant has | 21:53 |
corvus | at least for the next few minutes :) | 21:53 |
corvus | fungi: yep | 21:53 |
gmann | clarkb: thanks | 21:54 |
clarkb | gmann: are there stable branch fixes for networking-midonet too? | 22:00 |
gmann | clarkb: not yet. i can backport it but not sure if their stable branches are active or not | 22:02 |
clarkb | gmann: well the errors are there as well. | 22:02 |
gmann | clarkb: ok. so i need to backport this and then you can force merge? | 22:04 |
gmann | it seems we need to do that back until ocata | 22:04 |
clarkb | yes I can help with that | 22:05 |
gmann | ok. doing | 22:05 |
gmann | give me 5 min, in between of fixing openstacksdk for Focal migration | 22:05 |
*** openstackgerrit has joined #opendev | 22:09 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove nodepool builder puppetry and nb03.openstack.org https://review.opendev.org/749853 | 22:09 |
clarkb | noticed a relatively small groups thing I missed but figured I'd update the change to be more complete | 22:10 |
clarkb | ianw: ^ if you have time to look at the new server and check if it is happy then review ^ that would be great (from what I can see the new server is good) | 22:10 |
clarkb | but then if ^ land I think we can just delete the server and its volume? | 22:10 |
ianw | heh, i thought i just +2'd that :) | 22:10 |
clarkb | oh maybe you did, sorry | 22:10 |
ianw | yeah, lgtm, when i left yesterday everything was built on the new server | 22:11 |
ianw | so if it hasn't exploded overnight, seems good :) | 22:11 |
clarkb | its not super clear to me what the best way is to clean up the images that nb03.openstack.org has built | 22:11 |
clarkb | probably just manually rm those when we are done? | 22:11 |
ianw | hrm, could we put everything on pause in the config there and then delete them using the cli? | 22:13 |
clarkb | ya that may work | 22:14 |
clarkb | we can't remove the images from the image list as that will delete what the new server is building | 22:14 |
clarkb | but pausing then asking nodepool to delete them may be easiest | 22:14 |
* clarkb writes the pause change | 22:14 | |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Pause image builds on nb03.openstack.org https://review.opendev.org/750822 | 22:16 |
clarkb | ianw: ^ something like that | 22:16 |
ianw | clarkb: i *think* it's still unpuppeted? or did we fix that? | 22:17 |
ianw | nb03.openstack.org # ianw 2020-05-20 hand edits applied to dib to build focal on xenial | 22:18 |
ianw | no, we didn't ... so i guess just apply by hand and we can get rid of it asap? | 22:18 |
clarkb | wfm I'll make that change manually | 22:18 |
clarkb | wait it looks like they are all paused already | 22:19 |
clarkb | should I go ahead and ask nodepool to delete the dib images on nb03.openstack.org? | 22:19 |
ianw | ohh, istr fungi doing something with it | 22:20 |
ianw | i think so, let's get rid of it rather than do an archaeological dig into what's going on :) | 22:20 |
clarkb | alright I'll do dib-image-delete on all the arm64 images that don't show nb03.opendev.org as their builder | 22:21 |
clarkb | that should cause nodepool to go and try and clean things up automatically | 22:21 |
clarkb | when that is done we can land https://review.opendev.org/749853 then delete the server and volume | 22:22 |
clarkb | deletes have been requested. Some have deleted, others haven't. I wonder if there will be issues :/ | 22:26 |
clarkb | ianw: the images which are having a hard time deleting are quite old. I wonder if they are just super stale? | 22:29 |
ianw | it's not trying on linaro-london is it? | 22:29 |
clarkb | oh ha I think that is it | 22:30 |
ianw | all those 118 day old ones are there | 22:30 |
clarkb | the london cloud is gone right? so we should edit the zk db to clear those upload records instead? | 22:30 |
ianw | yeah it is, i think it went away ungracefully from what i remember. as in wasn't working | 22:31 |
ianw | do have instructions for hand-edit zk? i've not done but would be ahppy to learn | 22:32 |
ianw | happy even | 22:32 |
clarkb | ianw: I think there are simple instructions somewhere let me see | 22:32 |
clarkb | ianw: https://docs.opendev.org/opendev/system-config/latest/nodepool.html#zookeeper that talks about the client. | 22:33 |
clarkb | ianw: but then it's a simple fs-like navigation system. `help` for commands, `ls` to list things, `cd` to change your context, `get` to get a node's contents iirc | 22:34 |
clarkb | I think what we want to do is delete the upload records for those images, then the rest will be done automatically | 22:34 |
clarkb | I can double check things if you like | 22:34 |
ianw | ok, let me see if i can get something talking | 22:35 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Block port 2181 on zookeeper hosts https://review.opendev.org/750833 | 22:35 |
ianw | i wonder if it's harder with containers and ssl etc | 22:35 |
clarkb | ianw: ssl does make it harder ^ is sort of related | 22:35 |
clarkb | ianw: we keep listening on port 2181 too so if you run this on the zk server itself you can talk insecurely locally | 22:35 |
clarkb | I think that is going to be our keep things simple mode of operation going forward, listen on 2181 and 2281 but only expose 2281 outside the host | 22:36 |
ianw | yeah "connect localhost" seems to have worked | 22:38 |
clarkb | I think you want to rm paths like /nodepool/images/centos-8-arm64/builds/0000000043/providers/linaro-london/images/0000000001 | 22:38 |
clarkb | and if you do those leaf nodes nodepool should cleanup the rest of it for us | 22:39 |
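As a scripted alternative to the interactive zk shell clarkb describes, the same leaf-znode deletion could be done with kazoo (the client library nodepool itself uses). The path below is one of the stale linaro-london upload records from this discussion, and localhost:2181 is the plaintext listener mentioned above:

    from kazoo.client import KazooClient

    zk = KazooClient(hosts="localhost:2181")
    zk.start()

    # Deleting just the provider upload leaf lets nodepool's own cleanup
    # worker remove the rest of the stale build/provider records.
    path = ("/nodepool/images/centos-8-arm64/builds/0000000043"
            "/providers/linaro-london/images/0000000001")
    if zk.exists(path):
        zk.delete(path)

    zk.stop()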
ianw | http://paste.openstack.org/show/797675/ | 22:41 |
ianw | is the list | 22:41 |
ianw | i'll try /nodepool/images/centos-8-arm64/builds/0000000045/providers/linaro-london/images/0000000001 | 22:42 |
clarkb | those paths look like what we want to delete to me | 22:42 |
ianw | ok, the upper paths seem all gone, so i'd say it's cleaned up properly | 22:44 |
ianw | i'll do the rest | 22:44 |
*** auristor has quit IRC | 22:45 | |
gmann | clarkb: https://review.opendev.org/#/q/I9231b2b362a1f2316307908b7e9ad57a709700f6 | 22:45 |
ianw | clarkb: ok, no more linaro london | 22:45 |
clarkb | gmann: thanks I'll work on that shortly | 22:46 |
ianw | what's the deal with the 168 day old gentoo images? | 22:46 |
clarkb | ianw: no idea | 22:47 |
ianw | 2020-05-07 15:24:52.286 | tar (child): xz: Cannot exec: No such file or directory | 22:48 |
*** auristor has joined #opendev | 22:49 | |
clarkb | gmann: done | 22:55 |
gmann | clarkb: thank. | 22:55 |
gmann | clarkb: i will backport networking-odl too and ask lajos to merge once he is online tomorrow - https://review.opendev.org/#/c/738074/ | 22:56 |
gmann | after that networking-l2gw error should disappear completely | 22:57 |
fungi | sorry, post dinner catching up... what was i doing something with? paused image builds on old nb03? | 22:58 |
clarkb | fungi: ya, we decided we're just going to delete it anyway so figuring that out doesn't matter much :) | 22:58 |
fungi | good, because i don't know that my memory can take the strain at this point | 22:59 |
fungi | good riddance | 23:00 |
fungi | #status log cinder volume for graphite01 (current production server) has been replaced and cleaned up | 23:01 |
openstackstatus | fungi: finished logging | 23:01 |
fungi | i'm working on nb01/02/04 now but should be transparent | 23:02 |
clarkb | my next builder question is should we rm nb04? | 23:02 |
clarkb | I'm not sure if we need 3 x86 builders (it doesn't hurt but is it necessary?) | 23:02 |
fungi | how's our disk space? that's the primary indicator, right? | 23:04 |
clarkb | they are each using about half of their disk | 23:07 |
clarkb | so cutting out 04 would push 01 and 02 to 3/4 of their disk or so | 23:07 |
*** Goneri has quit IRC | 23:21 | |
*** cloudnull4 has joined #opendev | 23:26 | |
*** cloudnull has quit IRC | 23:26 | |
*** cloudnull4 is now known as cloudnull | 23:26 | |
fungi | seems okay | 23:39 |