Monday, 2022-04-25

*** rlandy is now known as rlandy\|PTO		00:12
*** ysandeep\|out is now known as ysandeep		03:51
frickler	ianw: can you please revisit https://review.opendev.org/c/zuul/nodepool/+/834152 when you have time? we should find a solution before gtema pushes the button	04:49
frickler	I also added the jammy related patches to the meeting agenda, but wouldn't mind getting reviews earlier ;)	04:52
*** bhagyashris is now known as bhagyashris\|ruck		05:46
*** ysandeep is now known as ysandeep\|afk		06:01
*** ysandeep\|afk is now known as ysandeep		06:48
*** pojadhav is now known as pojadhav\|afk		07:31
*** jpena\|off is now known as jpena		07:35
*** pojadhav\|afk is now known as pojadhav\		08:26
*** pojadhav\ is now known as pojadhav		08:26
*** tkajinam is now known as tkajinam\|away		08:33
*** rlandy\|PTO is now known as rlandy		10:33
*** dviroel_ is now known as dviroel		11:07
*** dviroel is now known as dviroel\|rover		11:07
*** ysandeep is now known as ysandeep\|afk		12:25
*** artom__ is now known as artom		13:17
opendevreview	Cedric Jeanneret proposed opendev/system-config master: Use goto, chain policy and drop REJECT https://review.opendev.org/c/opendev/system-config/+/839210	13:22
opendevreview	Cedric Jeanneret proposed opendev/system-config master: Use goto, chain policy and drop REJECT https://review.opendev.org/c/opendev/system-config/+/839210	13:25
opendevreview	Cedric Jeanneret proposed openstack/project-config master: Use goto, chain policy and drop REJECT https://review.opendev.org/c/openstack/project-config/+/839212	13:29
*** ysandeep\|afk is now known as ysandeep		13:41
*** pojadhav- is now known as pojadhav		14:06
opendevreview	Albin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir https://review.opendev.org/c/zuul/zuul-jobs/+/839225	14:06
opendevreview	Albin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir https://review.opendev.org/c/zuul/zuul-jobs/+/839225	14:09
opendevreview	Albin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir https://review.opendev.org/c/zuul/zuul-jobs/+/839225	14:11
*** pojadhav- is now known as pojadhav		14:22
corvus	i'm going to begin a zuul rolling restart now	14:27
*** tkajinam\|away is now known as tkajinam		14:42
clarkb	once I've caught up on email and system updates I'm going to look at shutting down the ELK servers. Then I'll snapshot subunit-worker01, health.o.o, logstash-worker01, logstash01, elasticsearch02 and then delete them all?	14:49
clarkb	fungi: ^ when you do that do you shutdown within the instance and then snapshot using osc or do you have to snapshot via the web ui?	14:49
clarkb	also if anyone has reason to not do these shutdowns and deletions just yet please let me know	14:51
clarkb	but I haven't seen anything that would prevent it at this point	14:51
*** hrww is now known as hrw		15:10
fungi	clarkb: within the instance i `systemctl poweroff` and then the instance shows down in the nova api once that completes	15:19
clarkb	ah cool. I've just gone and made a bunch of notes about servers and ip addresses and uuids. Proceeding to shutdown instances. Then will sort out snapshots after	15:22
fungi	for xenial, systemctl may not be functional (i can't remember) but just `sudo poweroff` should also do the trick	15:25
clarkb	it's systemd so seems to have worked. Openstack hasn't caught up that they are shutdown yet. But that's ok as I need to snapshot the old health and subunit worker servers first and they are long caught up	15:28
fungi	yeah, i usually give it a few minutes	15:28
fungi	but this way, if you ever have to boot a snapshot of the system it thinks it's just coming back up from a clean reboot of the original	15:29
clarkb	these are the servers I plan to snapshot: health01.openstack.org, subunit-worker01.openstack.org, logstash01.openstack.org logstash-worker01.openstack.org, and elasticsearch02.openstack.org	15:29
clarkb	that last server (es02) has a data volume attached to it which the snapshot should ignore which is what we want	15:29
fungi	sounds good	15:29
clarkb	I don't intend on snapshotting all of the cluster members, are we ok with that?	15:29
clarkb	seems a bit overkill	15:30
fungi	yeah, i don't see any reason to do more than one from each cluster	15:30
*** dviroel\|rover is now known as dviroel\|rover\|lunch		15:30
clarkb	`osc server image create` seems to be the command to snapshot?	15:31
clarkb	I've got that running for health01 and subunit-worker01 now	15:34
*** ysandeep is now known as ysandeep\|out		15:38
fungi	yeah, that works, or the webui	15:39
fungi	speaking of volumes, we got a notification from rackspace that there's going to be a cinder maintenance in ord next week impacting a volume for the old bup backup server. i don't think we need to take any action	15:40
clarkb	api reflects shutdown status for all the servers now. I'll proceed to snapshot the other 3 servers I mentioned now.	15:40
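Editor's sketch of the shutdown-then-snapshot sequence discussed above, not the operators' actual tooling. It only builds the command lines in order and does not run them. The `ssh` invocation, the `openstack` binary name (the log abbreviates it as `osc`), and the image name are assumptions made for illustration.

```python
from typing import List


def snapshot_plan(server: str, image_name: str) -> List[List[str]]:
    """Return the commands, in order, for a clean shutdown followed by a snapshot.

    The ssh wrapper and image_name are hypothetical; only `systemctl poweroff`
    and `server image create` come from the conversation itself.
    """
    return [
        # Power off from inside the guest so a booted snapshot looks like
        # it is coming back from a clean reboot of the original.
        ["ssh", server, "sudo", "systemctl", "poweroff"],
        # Ask the compute API for the instance status; it can lag a few minutes.
        ["openstack", "server", "show", server, "-f", "value", "-c", "status"],
        # Snapshot the instance once the API reports it as shut down.
        ["openstack", "server", "image", "create", "--name", image_name, server],
    ]


if __name__ == "__main__":
    for cmd in snapshot_plan("health01.openstack.org", "health01-final"):
        print(" ".join(cmd))
```

Keeping the plan as data makes it easy to review the exact commands for each server before anything is executed.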
clarkb	now I guess I need to wait a bit for the snapshots	15:43
clarkb	trying to do a volume list I'm reminded that we need a hacked up clouds.yaml to do volume listings?	15:47
clarkb	aha I can override on the command line	15:48
fungi	clarkb: i've been unable to work out how to do it with current osc, so i just use ~fungi/launch-env/bin/cinder --os-volume-api-version=1 list (with the old-school envvars exported in the environment)	15:49
clarkb	ya --os-volume-api 1 worked for me	15:50
clarkb	I don't expect the es volumes to go away automatically but wanted to be sure if they did that I had a record of them before they do go away	15:51
clarkb	I've got that now and can manually delete them if necessary	15:51
clarkb	where we are at in the process is waiting for snapshots to complete. Then I might have fungi or whoever else is interested do a quick look and make sure I haven't forgotten anything then I'll proceed to instance deletions	15:53
fungi	trying to do `openstack --os-volume-api-version 1 volume list` i get "Version 1 is not supported, use supported version 3 instead. Invalid client version '1.0'. Major part should be '3'"	15:53
clarkb	fungi: oh ya I use an older install in my homedir too	15:53
clarkb	because osc removed the old api support	15:53
fungi	right, this is with 5.6.0	15:53
fungi	okay	15:53
clarkb	the first two snapshots (health01 and subunit-worker01) are done. I'm going to find breakfast while I wait on the other 3	15:56
*** dviroel\|rover\|lunch is now known as dviroel\|rover		16:17
clarkb	fungi: ok health01, subunit-worker01/02, logstash01, logstash-worker01-20, elasticsearch02-07 are all shutdown. health01, subunit-worker01, logstash01, logstash-worker01, and elasticsearch02 appear to all have snapshot images now. Any chance you have time to double check me on that and give an all clear to begin the actual deletions?	16:18
opendevreview	Clark Boylan proposed openstack/project-config master: Set noop jobs on ELK puppetry to prep for retirement https://review.opendev.org/c/openstack/project-config/+/839235	16:40
opendevreview	Clark Boylan proposed opendev/puppet-kibana master: Retire this project https://review.opendev.org/c/opendev/puppet-kibana/+/839237	16:43
opendevreview	Clark Boylan proposed opendev/puppet-elasticsearch master: Retire this project https://review.opendev.org/c/opendev/puppet-elasticsearch/+/839238	16:46
*** jpena is now known as jpena\|off		16:46
opendevreview	Clark Boylan proposed opendev/puppet-log_processor master: Retire this project https://review.opendev.org/c/opendev/puppet-log_processor/+/839239	16:48
opendevreview	Clark Boylan proposed opendev/puppet-logstash master: Retire this project https://review.opendev.org/c/opendev/puppet-logstash/+/839240	16:50
opendevreview	Clark Boylan proposed opendev/puppet-subunit2sql master: Retire this project https://review.opendev.org/c/opendev/puppet-subunit2sql/+/839242	16:52
opendevreview	Clark Boylan proposed openstack/project-config master: Finalize ELK puppetry retirement https://review.opendev.org/c/openstack/project-config/+/839243	16:57
clarkb	once the servers are gone I think we're good to land ^	16:58
fungi	clarkb: sorry, was stuffing my face. i'll take a look now	17:11
clarkb	thanks. Not in a huge rush so no biggie. I found time to do other things like construct that stack of changes	17:12
fungi	clarkb: i see all 5 images saved, lgtm	17:14
clarkb	great, I'll proceed with deleting instances now.	17:14
clarkb	no objection to that right?	17:15
fungi	none from me, thanks!	17:17
clarkb	the subunit workers and health have been deleted. Now to work on the ELK servers	17:19
clarkb	alright all the ELK related servers are gone now. It looks like the volumes did not get auto deleted. I'll proceed to delete those manually	17:25
clarkb	volume elasticsearch07.opendev.org/main01 entered error deleting. I'll double check that none of the servers did that now	17:28
clarkb	none of the servers entered an error state. They are gone	17:29
clarkb	Next on my list is dns record cleanup. Then after that the last thing I've got is what to do with the subunit2sql trove instance?	17:29
clarkb	fungi: ^ you may have ideas	17:30
fungi	we can also snapshot that if you think the data is likely to be relevant	17:30
clarkb	I think the issue there is its huge iirc	17:31
clarkb	and ya i'm not sure how relevant it is considering no one noticed the service stopped running for quite a while	17:32
fungi	huge for a trove instance	17:32
fungi	500gb maybe?	17:32
fungi	i personally don't think there's any point in keeping the data	17:32
clarkb	ya I think I'm with you on that.	17:32
clarkb	and we can just delete the trove instance	17:33
fungi	good and bad news on pep 686: the sc has agreed to making utf-8 mode the default, but has scheduled it to not happen until 3.5	17:34
fungi	3.15	17:34
clarkb	wow thats a ways out	17:40
clarkb	fungi: ok all dns records (including the health.o.o and logstash.o.o CNAMEs) have been removed. Except for A records for subunit-worker01 and subunit-worker02. They just don't show up in the web ui so not sure what is going on there	17:41
clarkb	otherwise all the A and AAAA and CNAME records for those servers have been removed	17:41
clarkb	weird I exited the list view and opened the zone again and now I see those records. I'll delete them before they disappear again	17:42
fungi	clarkb: thanks, looks like they're no longer resolving	17:43
fungi	were you scrolling through the entries or just trying to search in the browser?	17:44
clarkb	I was scrolling. It seemed like none of the records starting with s loaded though	17:44
fungi	bizarre	17:44
clarkb	once I reloaded the list and scrolled through they showed up	17:44
clarkb	I'm going to go ahead and status log here, but then can ask about the subunit2sql db in the team meeting tomorrow	17:45
clarkb	looks like it is using about 286GB out of 500GB max	17:45
clarkb	#status log The retired ELK, subunit2sql, and health api services have now been deleted.	17:45
opendevstatus	clarkb: finished logging	17:46
clarkb	alright meeting agenda is getting updated before being sent later today. Please add your content if you have any.	17:54
clarkb	fungi: https://review.opendev.org/q/topic:retire-elk the oldest change there, 839235, is a straightforward change to flip CI to noop jobs for the repos we'll retire, if you have time for that	17:56
clarkb	once that lands I can recheck all the changes to retire content in the repos	17:57
*** rlandy is now known as rlandy\|mtg		18:00
fungi	sounds great, thanks!	18:00
frickler	clarkb: fungi: how about I merge https://review.opendev.org/c/opendev/system-config/+/838923 (jammy mirroring) tomorrow my morning and watch how it goes? seems pretty low risk except possibly filling up its quota	18:06
clarkb	frickler: that wfm. I think the cleanups I did should give it plenty of room	18:07
fungi	i'm also happy to monitor it today if you'd rather have a head start on things	18:08
opendevreview	Merged openstack/project-config master: Set noop jobs on ELK puppetry to prep for retirement https://review.opendev.org/c/openstack/project-config/+/839235	18:08
frickler	fungi: if you have time for that, I won't object, then I could possibly watch an image build instead ;)	18:09
fungi	approved! once it deploys i'll take the lock and run reprepro without the timeout to make sure it completes	18:12
frickler	cool, thx	18:13
fungi	there's currently a reprepro run for ubuntu in progress, but i have a root screen session going on mirror-update.o.o and will grab the lock once it's released	18:14
opendevreview	Clark Boylan proposed opendev/system-config master: Update Gerrit build checkouts https://review.opendev.org/c/opendev/system-config/+/839250	18:14
opendevreview	Clark Boylan proposed opendev/system-config master: Explicitly disable Gerrit tracing.performanceLogging https://review.opendev.org/c/opendev/system-config/+/839251	18:14
clarkb	more gerrit 4.5 prep ^	18:14
clarkb	er 3.5	18:14
opendevreview	Merged opendev/system-config master: Start mirroring jammy https://review.opendev.org/c/opendev/system-config/+/838923	18:38
*** rcastillo_ is now known as rcastillo		19:02
*** rlandy\|mtg is now known as rlandy		19:03
clarkb	fungi: what do you think about landing those project retirement changes now that the noop change is in place?	19:34
fungi	sounds good to me	19:37
fungi	i can review after i finish making dinner	19:37
clarkb	great. I'll probably pop out for a bike ride in an hour or two as well but the impact for those changes should be nil now that the servers are gone	19:38
clarkb	mostly trying to clean everything up so that we don't leave anything behind to confuse us later :)	19:38
clarkb	I've not spent nearly enough time on the bike this year. Trying to correct that.	19:39
clarkb	https://review.opendev.org/c/opendev/system-config/+/839251/1 is interesting because I managed to catch that reading the gerrit mailing list. Basically gerrit 3.5 uses more memory by default because by default it collects tracing info	20:09
clarkb	since we aren't hooked up to a tracing system we can disable it and save some memory hopefully	20:10
clarkb	fungi: looks like you might have the ubuntu mirror update lock now?	20:18
fungi	yes	20:18
clarkb	at least I see a flock for it but no other processes	20:18
clarkb	cool	20:19
fungi	i have the reprepro script readied in a root screen session	20:19
fungi	was just waiting for the deploy results to report	20:19
fungi	which looks like it did at 19:01:07	20:23
fungi	starting it now	20:23
fungi	output is tee'd to the usual log so it can also be seen in the screen buffer	20:23
clarkb	thanks	20:26
fungi	ERROR: Condition '437D05B5\|C0B21F32' not fulfilled for '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release.gpg'.	20:26
fungi	Signatures in '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release.gpg':	20:26
fungi	'871920D1991BC93C' (signed 2022-04-25): missing pubkey	20:26
fungi	Error: Not enough signatures found for remote repository ubuntu-security (http://security.ubuntu.com/ubuntu jammy-security)!	20:27
clarkb	hrm I thought ubuntu used the same key over and over? Maybe not for security?	20:27
fungi	guess we need to add a key	20:27
fungi	gpg --verify /afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release{.gpg,}	20:29
fungi	gpg: Signature made 2022-04-25T18:35:45 UTC using RSA key 0x871920D1991BC93C	20:29
fungi	i can't seem to gpg --recv-keys 0x871920D1991BC93C	20:30
fungi	gpg: key 0x871920D1991BC93C: new key but contains no user ID - skipped	20:30
fungi	that key was created in 2018	20:31
clarkb	they distribute the keys with apt. Is possible they just never put it on the key servers?	20:31
clarkb	https://bugs.launchpad.net/ubuntu/+source/reprepro/+bug/1968198 doesn't seem related but might end up affecting us too	20:32
fungi	gpg --keyserver keyserver.ubuntu.com --receive-keys 0x871920D1991BC93C	20:34
fungi	that worked	20:34
fungi	gpg: key 0x871920D1991BC93C: public key "Ubuntu Archive Automatic Signing Key (2018) <ftpmaster@ubuntu.com>" imported	20:34
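Editor's sketch of the diagnosis above: pull the key IDs that reprepro reports as "missing pubkey" out of its output and build the matching `gpg` fetch command. The function names and the regular expression are hypothetical helpers written for this note; the keyserver and the `--receive-keys` form are the ones shown in the log. The commands are only constructed, not executed.

```python
import re
from typing import List

# Matches lines such as: '871920D1991BC93C' (signed 2022-04-25): missing pubkey
MISSING_KEY = re.compile(r"'([0-9A-Fa-f]{16})' \(signed [0-9-]+\): missing pubkey")


def missing_key_ids(log_text: str) -> List[str]:
    """Return the distinct long key IDs flagged as missing, in first-seen order."""
    seen: List[str] = []
    for match in MISSING_KEY.finditer(log_text):
        key_id = match.group(1).upper()
        if key_id not in seen:
            seen.append(key_id)
    return seen


def recv_command(key_id: str, keyserver: str = "keyserver.ubuntu.com") -> List[str]:
    """Build the gpg command line that fetches one key from the given keyserver."""
    return ["gpg", "--keyserver", keyserver, "--receive-keys", "0x" + key_id]


if __name__ == "__main__":
    sample = "'871920D1991BC93C' (signed 2022-04-25): missing pubkey"
    for key in missing_key_ids(sample):
        print(" ".join(recv_command(key)))
```

Importing the key by hand only unblocks a manual test; as the conversation goes on to note, the lasting fix is to add the key to config management and to the reprepro configuration.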
clarkb	I wonder if that just means we haven't been updating our keys like we did with debian in the reprepro config management. That seems possible	20:36
fungi	yeah, i'm experimenting	20:37
fungi	looks like we use playbooks/roles/import-gpg-key/tasks/main.yaml to import each of the archive keys into the root gnupg keyring?	20:42
clarkb	yes, we keep an ascii armored version of the pubkey in the role and those tasks iterate over them and install them	20:43
fungi	yep, just making sure. so if i really want to test this, i'll end up bypassing that role	20:43
fungi	i'll just propose the change i think it needs	20:44
clarkb	I think you also need to list the key fingerprint in the reprepro configs	20:44
clarkb	I'm going to work on getting out for that exercise now. Will check on that when I get back. I don't think its a big deal if you manually toggle it and also push a change we land next	20:46
opendevreview	Jeremy Stanley proposed opendev/system-config master: Add Ubuntu's 2018 Archive Signing Key to reprepro https://review.opendev.org/c/opendev/system-config/+/839261	20:51
fungi	yeah, i think i got everything in ^	20:52
fungi	i'm around all night, so happy to just wait for that to land and deploy and then try again	20:53
ianw	frickler: thanks for working on jammy things. in answer to your prior question on why we use ntpdate/chrony/systemd-timesync/* the answer is pretty much I don't know and will have to context switch it back in :)	21:21
ianw	i think we kind of make decisions that seem right at the time, but it's always worth revisiting as the world turns	21:22
fungi	in the past we've oscillated between taking whatever the platform provides by default vs overriding platform defaults in order to drive consistency across different node labels	21:28
fungi	and this is yet another of those situations	21:28
opendevreview	Merged opendev/system-config master: Add Ubuntu's 2018 Archive Signing Key to reprepro https://review.opendev.org/c/opendev/system-config/+/839261	21:58
*** rlandy is now known as rlandy\|bbl		22:16
fungi	and it's deployed, so trying again	22:20
fungi	seems to have gotten past the prior error	22:22
*** dviroel\|rover is now known as dviroel\|rover\|afk		22:36
clarkb	fungi: have you run into the error in the lp bug I linked?	22:43
fungi	nope	22:43
clarkb	cool hopefully that got fixed one way or another	22:43
fungi	with 839261 deployed it's well into pulling down packages now	22:43
clarkb	ianw: did you see frickler was asking if you could follow up on https://review.opendev.org/c/zuul/nodepool/+/834152 ? I think his suggestion is that you push a new patchset to make the change you are asking for to avoid any confusion	22:43
fungi	probably will still be going for hours but i'll try to keep an eye on it over the course of my evening	22:44
clarkb	thanks. My bike ride was fun. I went out and it was decent weather. Its sunny now. But for about 45 minutes of my bike ride the skies decided torrential downpour and hail would be appropriate	22:47
clarkb	I'll send out the meeting agenda in a few minutes if there is anything else to add let me know	22:52
clarkb	I guess our afs graphs track the RO volumes and not RW so we won't see progress via disk usage	23:05
fungi	yeah, not until it finishes	23:06
ianw	it might show the rw volume, i don't think it explicitly doesn't at least ...	23:33
clarkb	it's not a big deal, I was just hoping to see a slowly increasing disk utilization graph to estimate progress	23:34
ianw	i think it might pull the r/o into a different stat https://opendev.org/opendev/afsmon/src/branch/master/afsmon/__init__.py#L82	23:34
ianw	yeah the readonly ones are like mirror_fedora_readonly	23:37
ianw	and the stats page shows the r/w volumes. it's interesting because i'm not sure if that naming is a feature or a bug	23:39
clarkb	huh it does show a small bump now to 692GB	23:40
clarkb	also I've realized that ubuntu ports for arm64 is a separate volume so we may not have room to do those just yet. That said I just cleared out 6TB of elasticsearch volumes. Maybe we should allocate 2TB back to AFS	23:40
ianw	it was only individual volumes exceeding 2tb that was the issue, wasn't it? when we had our on-disk pypi mirror	23:42
clarkb	yes I believe so	23:43
clarkb	pretty sure we can go to 3TB total then keep individual volumes under 2TB	23:43
clarkb	in this case I say 2TB to afs because 1TB to each dfw server	23:43
ianw	it would complicate things to have a vicepb i guess, but speaking from experience if this volume needs to fsck it's nerve wracking	23:46
fungi	oh, right, because it's a virtual fs built on files on another fs	23:47
fungi	i always forget it's not just backed by a raw block device	23:47
corvus	as a point of interest -- there is once again a single job running for an excessive amount of time that's holding up the zuul rolling restart. this time it's the nova-live-migration job and it's stuck running the opendev.org/opendev/base-jobs/playbooks/base/cleanup.yaml playbook	23:53
corvus	i think that playbook may not deal with systemic node connection problems well :/	23:54
