*** rlandy is now known as rlandy|PTO | 00:12 | |
*** ysandeep|out is now known as ysandeep | 03:51 | |
frickler | ianw: can you please revisit https://review.opendev.org/c/zuul/nodepool/+/834152 when you have time? we should find a solution before gtema pushes the button | 04:49 |
frickler | I also added the jammy related patches to the meeting agenda, but wouldn't mind getting reviews earlier ;) | 04:52 |
*** bhagyashris is now known as bhagyashris|ruck | 05:46 | |
*** ysandeep is now known as ysandeep|afk | 06:01 | |
*** ysandeep|afk is now known as ysandeep | 06:48 | |
*** pojadhav is now known as pojadhav|afk | 07:31 | |
*** jpena|off is now known as jpena | 07:35 | |
*** pojadhav|afk is now known as pojadhav\ | 08:26 | |
*** pojadhav\ is now known as pojadhav | 08:26 | |
*** tkajinam is now known as tkajinam|away | 08:33 | |
*** rlandy|PTO is now known as rlandy | 10:33 | |
*** dviroel_ is now known as dviroel | 11:07 | |
*** dviroel is now known as dviroel|rover | 11:07 | |
*** ysandeep is now known as ysandeep|afk | 12:25 | |
*** artom__ is now known as artom | 13:17 | |
opendevreview | Cedric Jeanneret proposed opendev/system-config master: Use goto, chain policy and drop REJECT https://review.opendev.org/c/opendev/system-config/+/839210 | 13:22 |
opendevreview | Cedric Jeanneret proposed opendev/system-config master: Use goto, chain policy and drop REJECT https://review.opendev.org/c/opendev/system-config/+/839210 | 13:25 |
opendevreview | Cedric Jeanneret proposed openstack/project-config master: Use goto, chain policy and drop REJECT https://review.opendev.org/c/openstack/project-config/+/839212 | 13:29 |
*** ysandeep|afk is now known as ysandeep | 13:41 | |
*** pojadhav- is now known as pojadhav | 14:06 | |
opendevreview | Albin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir https://review.opendev.org/c/zuul/zuul-jobs/+/839225 | 14:06 |
opendevreview | Albin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir https://review.opendev.org/c/zuul/zuul-jobs/+/839225 | 14:09 |
opendevreview | Albin Vass proposed zuul/zuul-jobs master: mirror-workspace-git: urlencode src_dir https://review.opendev.org/c/zuul/zuul-jobs/+/839225 | 14:11 |
*** pojadhav- is now known as pojadhav | 14:22 | |
corvus | i'm going to begin a zuul rolling restart now | 14:27 |
*** tkajinam|away is now known as tkajinam | 14:42 | |
clarkb | once I've caught up on email and system updates I'm going to look at shutting down the ELK servers. Then I'll snapshot subunit-worker01, health.o.o, logstash-worker01, logstash01, elasticsearch02 and then delete them all? | 14:49 |
clarkb | fungi: ^ when you do that, do you shut down within the instance and then snapshot using osc, or do you have to snapshot via the web UI? | 14:49 |
clarkb | also if anyone has reason to not do these shutdowns and deletions just yet please let me know | 14:51 |
clarkb | but I haven't seen anything that would prevent it at this point | 14:51 |
*** hrww is now known as hrw | 15:10 | |
fungi | clarkb: within the instance i `systemctl poweroff` and then the instance shows down in the nova api once that completes | 15:19 |
clarkb | ah cool. I've just made a bunch of notes about servers, IP addresses and UUIDs. Proceeding to shut down instances, then will sort out snapshots after | 15:22 |
fungi | for xenial, systemctl may not be functional (i can't remember) but just `sudo poweroff` should also do the trick | 15:25 |
clarkb | it's systemd so it seems to have worked. OpenStack hasn't caught up that they are shut down yet, but that's ok as I need to snapshot the old health and subunit worker servers first and they are long caught up | 15:28 |
fungi | yeah, i usually give it a few minutes | 15:28 |
fungi | but this way, if you ever have to boot a snapshot of the system it thinks it's just coming back up from a clean reboot of the original | 15:29 |
clarkb | these are the servers I plan to snapshot: health01.openstack.org, subunit-worker01.openstack.org, logstash01.openstack.org logstash-worker01.openstack.org, and elasticsearch02.openstack.org | 15:29 |
clarkb | that last server (es02) has a data volume attached to it which the snapshot should ignore which is what we want | 15:29 |
fungi | sounds good | 15:29 |
clarkb | I don't intend on snapshotting all of the cluster members, are we ok with that? | 15:29 |
clarkb | seems a bit overkill | 15:30 |
fungi | yeah, i don't see any reason to do more than one from each cluster | 15:30 |
*** dviroel|rover is now known as dviroel|rover|lunch | 15:30 | |
clarkb | `osc server image create` seems to be the command to snapshot? | 15:31 |
clarkb | I've got that running for health01 and subunit-worker01 now | 15:34 |
*** ysandeep is now known as ysandeep|out | 15:38 | |
fungi | yeah, that works, or the webui | 15:39 |
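The shut-down-then-snapshot sequence discussed here can be scripted. The sketch below is a hypothetical helper, not something from opendev/system-config: it assumes a logged-in `openstack` CLI, SSH access with sudo on the guest, a Nova server name equal to the guest's hostname, and a placeholder image name. It powers the guest off from inside (as fungi suggests), polls Nova until the server reports `SHUTOFF`, then snapshots with `openstack server image create --wait`.

```python
import shlex
import subprocess
import time


def poweroff_argv(host: str) -> list[str]:
    # Clean shutdown from inside the guest, so a restored snapshot looks like
    # it is coming back from a normal reboot. The SSH session drops afterwards.
    return ["ssh", host, "sudo", "poweroff"]


def status_argv(server: str) -> list[str]:
    # Ask Nova for just the status field, e.g. "ACTIVE" or "SHUTOFF".
    return ["openstack", "server", "show", "-f", "value", "-c", "status", server]


def snapshot_argv(server: str, image_name: str) -> list[str]:
    # --wait blocks until the image has finished saving.
    return ["openstack", "server", "image", "create",
            "--name", image_name, "--wait", server]


def wait_for_shutoff(server: str, run=subprocess.run,
                     poll_seconds: float = 30, attempts: int = 20) -> bool:
    # The API can lag the guest by a few minutes, so poll rather than
    # snapshotting immediately after the poweroff.
    for _ in range(attempts):
        result = run(status_argv(server), capture_output=True, text=True, check=True)
        if result.stdout.strip() == "SHUTOFF":
            return True
        time.sleep(poll_seconds)
    return False


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    server = "health01.openstack.org"
    print(shlex.join(poweroff_argv(server)))
    print(shlex.join(snapshot_argv(server, f"{server}-final")))  # hypothetical image name
```

Passing `run` as a parameter keeps the polling loop testable without a cloud; in real use it defaults to `subprocess.run`.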
fungi | speaking of volumes, we got a notification from rackspace that there's going to be a cinder maintenance in ord next week impacting a volume for the old bup backup server. i don't think we need to take any action | 15:40 |
clarkb | api reflects shutdown status for all the servers now. I'll proceed to snapshot the other 3 servers I mentioned now. | 15:40 |
clarkb | now I guess I need to wait a bit for the snapshots | 15:43 |
clarkb | trying to do a volume list I'm reminded that we need a hacked up clouds.yaml to do volume listings? | 15:47 |
clarkb | aha I can override on the command line | 15:48 |
fungi | clarkb: i've been unable to work out how to do it with current osc, so i just use ~fungi/launch-env/bin/cinder --os-volume-api-version=1 list (with the old-school envvars exported in the environment) | 15:49 |
clarkb | ya --os-volume-api 1 worked for me | 15:50 |
clarkb | I don't expect the es volumes to go away automatically but wanted to be sure that, if they did, I had a record of them before they went away | 15:51 |
clarkb | I've got that now and can manually delete them if necessary | 15:51 |
clarkb | where we are at in the process is waiting for snapshots to complete. Then I might have fungi or whoever else is interested take a quick look to make sure I haven't forgotten anything, then I'll proceed to instance deletions | 15:53 |
fungi | trying to do `openstack --os-volume-api-version 1 volume list` i get "Version 1 is not supported, use supported version 3 instead. Invalid client version '1.0'. Major part should be '3'" | 15:53 |
clarkb | fungi: oh ya I use an older install in my homedir too | 15:53 |
clarkb | because osc removed the old api support | 15:53 |
fungi | right, this is with 5.6.0 | 15:53 |
fungi | okay | 15:53 |
clarkb | the first two snapshots (health01 and subunit-worker01) are done. I'm going to find breakfast while I wait on the other 3 | 15:56 |
*** dviroel|rover|lunch is now known as dviroel|rover | 16:17 | |
clarkb | fungi: ok health01, subunit-worker01/02, logstash01, logstash-worker01-20, elasticsearch02-07 are all shutdown. health01, subunit-worker01, logstash01, logstash-worker01, and elasticsearch02 appear to all have snapshot images now. Any chance you have time to double check me on that and give an all clear to begin the actual deletions? | 16:18 |
opendevreview | Clark Boylan proposed openstack/project-config master: Set noop jobs on ELK puppetry to prep for retirement https://review.opendev.org/c/openstack/project-config/+/839235 | 16:40 |
opendevreview | Clark Boylan proposed opendev/puppet-kibana master: Retire this project https://review.opendev.org/c/opendev/puppet-kibana/+/839237 | 16:43 |
opendevreview | Clark Boylan proposed opendev/puppet-elasticsearch master: Retire this project https://review.opendev.org/c/opendev/puppet-elasticsearch/+/839238 | 16:46 |
*** jpena is now known as jpena|off | 16:46 | |
opendevreview | Clark Boylan proposed opendev/puppet-log_processor master: Retire this project https://review.opendev.org/c/opendev/puppet-log_processor/+/839239 | 16:48 |
opendevreview | Clark Boylan proposed opendev/puppet-logstash master: Retire this project https://review.opendev.org/c/opendev/puppet-logstash/+/839240 | 16:50 |
opendevreview | Clark Boylan proposed opendev/puppet-subunit2sql master: Retire this project https://review.opendev.org/c/opendev/puppet-subunit2sql/+/839242 | 16:52 |
opendevreview | Clark Boylan proposed openstack/project-config master: Finalize ELK puppetry retirement https://review.opendev.org/c/openstack/project-config/+/839243 | 16:57 |
clarkb | once the servers are gone I think we're good to land ^ | 16:58 |
fungi | clarkb: sorry, was stuffing my face. i'll take a look now | 17:11 |
clarkb | thanks. Not in a huge rush so no biggie. I found time to do other things like construct that stack of changes | 17:12 |
fungi | clarkb: i see all 5 images saved, lgtm | 17:14 |
clarkb | great, I'll proceed with deleting instances now. | 17:14 |
clarkb | no objection to that right? | 17:15 |
fungi | none from me, thanks! | 17:17 |
clarkb | the subunit workers and health have been deleted. Now to work on the ELK servers | 17:19 |
clarkb | alright all the ELK related servers are gone now. It looks like the volumes did not get auto deleted. I'll proceed to delete those manually | 17:25 |
clarkb | volume elasticsearch07.opendev.org/main01 entered error deleting. I'll double check that none of the servers did that now | 17:28 |
clarkb | none of the servers entered an error state. They are gone | 17:29 |
clarkb | Next on my list is dns record cleanup. Then after that the last thing I've got is what to do with the subunit2sql trove instance? | 17:29 |
clarkb | fungi: ^ you may have ideas | 17:30 |
fungi | we can also snapshot that if you think the data is likely to be relevant | 17:30 |
clarkb | I think the issue there is it's huge iirc | 17:31 |
clarkb | and ya i'm not sure how relevant it is considering no one noticed the service stopped running for quite a while | 17:32 |
fungi | huge for a trove instance | 17:32 |
fungi | 500gb maybe? | 17:32 |
fungi | i personally don't think there's any point in keeping the data | 17:32 |
clarkb | ya I think I'm with you on that. | 17:32 |
clarkb | and we can just delete the trove instance | 17:33 |
fungi | good and bad news on pep 686: the sc has agreed to making utf-8 mode the default, but has scheduled it to not happen until 3.5 | 17:34 |
fungi | 3.15 | 17:34 |
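For context on PEP 686: in UTF-8 mode Python uses UTF-8 for `open()` without an explicit `encoding`, regardless of the locale. Until the default flips, code can check whether the mode is on via `sys.flags.utf8_mode` (enabled with `-X utf8` or `PYTHONUTF8=1`), and the robust option is to pass `encoding=` explicitly. A small sketch; the helper names are illustrative only:

```python
import locale
import sys
import tempfile
from pathlib import Path


def utf8_mode_enabled() -> bool:
    # 1 when started with -X utf8 or PYTHONUTF8=1 (and, per PEP 686,
    # scheduled to be the default from 3.15).
    return bool(sys.flags.utf8_mode)


def default_text_encoding() -> str:
    # What open() falls back to when no encoding= is given.
    return locale.getpreferredencoding(False)


def roundtrip(text: str) -> str:
    # An explicit encoding= behaves the same with or without UTF-8 mode.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    print("UTF-8 mode:", utf8_mode_enabled())
    print("default open() encoding:", default_text_encoding())
    print(roundtrip("naïve café"))
```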
clarkb | wow thats a ways out | 17:40 |
clarkb | fungi: ok all dns records (including the health.o.o and logstash.o.o CNAMEs) have been removed, except for the A records for subunit-worker01 and subunit-worker02. They just don't show up in the web UI, so not sure what is going on there | 17:41 |
clarkb | otherwise all the A and AAAA and CNAME records for those servers have been removed | 17:41 |
clarkb | weird I exited the list view and opened the zone again and now I see those records. I'll delete them before they disappear again | 17:42 |
fungi | clarkb: thanks, looks like they're no longer resolving | 17:43 |
fungi | were you scrolling through the entries or just trying to search in the browser? | 17:44 |
clarkb | I was scrolling. It seemed like none of the records starting with s loaded though | 17:44 |
fungi | bizarre | 17:44 |
clarkb | once I reloaded the list and scrolled through they showed up | 17:44 |
clarkb | I'm going to go ahead and status log here, but then we can ask about the subunit2sql db in the team meeting tomorrow | 17:45 |
clarkb | looks like it is using about 286GB out of 500GB max | 17:45 |
clarkb | #status log The retired ELK, subunit2sql, and health api services have now been deleted. | 17:45 |
opendevstatus | clarkb: finished logging | 17:46 |
clarkb | alright, the meeting agenda is getting updated before being sent later today. Please add your content if you have any. | 17:54 |
clarkb | fungi: https://review.opendev.org/q/topic:retire-elk the oldest change there, 839235, is a straightforward change that flips CI to noop jobs for the repos we'll retire, if you have time for that | 17:56 |
clarkb | once that lands I can recheck all the changes to retire content in the repos | 17:57 |
*** rlandy is now known as rlandy|mtg | 18:00 | |
fungi | sounds great, thanks! | 18:00 |
frickler | clarkb: fungi: how about I merge https://review.opendev.org/c/opendev/system-config/+/838923 (jammy mirroring) tomorrow my morning and watch how it goes? seems pretty low risk except possibly filling up its quota | 18:06 |
clarkb | frickler: that wfm. I think the cleanups I did should give it plenty of room | 18:07 |
fungi | i'm also happy to monitor it today if you'd rather have a head start on things | 18:08 |
opendevreview | Merged openstack/project-config master: Set noop jobs on ELK puppetry to prep for retirement https://review.opendev.org/c/openstack/project-config/+/839235 | 18:08 |
frickler | fungi: if you have time for that, I won't object, then I could possibly watch an image build instead ;) | 18:09 |
fungi | approved! once it deploys i'll take the lock and run reprepro without the timeout to make sure it completes | 18:12 |
frickler | cool, thx | 18:13 |
fungi | there's currently a reprepro run for ubuntu in progress, but i have a root screen session going on mirror-update.o.o and will grab the lock once it's released | 18:14 |
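Taking the mirror lock before a manual run presumably keeps the regularly scheduled update from starting a concurrent reprepro run; mechanically that is an exclusive `flock(2)` on a lock file. A minimal Python sketch, assuming a hypothetical lock-file path (the real path on mirror-update.o.o is not shown in this log): `try_lock` returns immediately if the lock is busy, while `wait_for_lock` blocks until the in-progress run releases it.

```python
import fcntl
import os
from typing import Optional


def try_lock(path: str) -> Optional[int]:
    # Non-blocking attempt: returns an fd holding the lock, or None if busy.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def wait_for_lock(path: str) -> int:
    # Blocking variant: sleeps until the current holder releases the lock.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd


def release(fd: int) -> None:
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


if __name__ == "__main__":
    lock_path = "/tmp/example-reprepro-ubuntu.lock"  # hypothetical path
    fd = try_lock(lock_path)
    print("got lock" if fd is not None else "someone else holds it")
    if fd is not None:
        release(fd)
```

flock locks belong to the open file description, so even a second `os.open` of the same file in the same process is refused while the first descriptor holds the lock.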
opendevreview | Clark Boylan proposed opendev/system-config master: Update Gerrit build checkouts https://review.opendev.org/c/opendev/system-config/+/839250 | 18:14 |
opendevreview | Clark Boylan proposed opendev/system-config master: Explicitly disable Gerrit tracing.performanceLogging https://review.opendev.org/c/opendev/system-config/+/839251 | 18:14 |
clarkb | more gerrit 4.5 prep ^ | 18:14 |
clarkb | er, 3.5 | 18:14 |
opendevreview | Merged opendev/system-config master: Start mirroring jammy https://review.opendev.org/c/opendev/system-config/+/838923 | 18:38 |
*** rcastillo_ is now known as rcastillo | 19:02 | |
*** rlandy|mtg is now known as rlandy | 19:03 | |
clarkb | fungi: what do you think about landing those project retirement changes now that the noop change is in place? | 19:34 |
fungi | sounds good to me | 19:37 |
fungi | i can review after i finish making dinner | 19:37 |
clarkb | great. I'll probably pop out for a bike ride in an hour or two as well, but the impact of those changes should be nil now that the servers are gone | 19:38 |
clarkb | mostly trying to clean everything up so that we don't leave anything behind to confuse us later :) | 19:38 |
clarkb | I've not spent nearly enough time on the bike this year. Trying to correct that. | 19:39 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/839251/1 is interesting because I managed to catch it while reading the gerrit mailing list. Basically gerrit 3.5 uses more memory by default because it collects tracing info by default | 20:09 |
clarkb | since we aren't hooked up to a tracing system we can disable it and save some memory hopefully | 20:10 |
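Gerrit reads `gerrit.config` in git-config format, so the setting named in 839251 could also be written by hand with `git config --file`. This is only a sketch: how system-config actually templates the file isn't shown in this log, and the `etc/gerrit.config` path is an assumption.

```python
import shlex


def disable_perf_logging_argv(config_path: str) -> list[str]:
    # Semantically equivalent to having, in gerrit.config:
    #   [tracing]
    #     performanceLogging = false
    return ["git", "config", "--file", config_path,
            "tracing.performanceLogging", "false"]


def read_back_argv(config_path: str) -> list[str]:
    # Print the current value to confirm the change took.
    return ["git", "config", "--file", config_path,
            "--get", "tracing.performanceLogging"]


if __name__ == "__main__":
    path = "etc/gerrit.config"  # hypothetical site path
    print(shlex.join(disable_perf_logging_argv(path)))
    print(shlex.join(read_back_argv(path)))
```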
clarkb | fungi: looks like you might have the ubuntu mirror update lock now? | 20:18 |
fungi | yes | 20:18 |
clarkb | at least I see a flock for it but no other processes | 20:18 |
clarkb | cool | 20:19 |
fungi | i have the reprepro script readied in a root screen session | 20:19 |
fungi | was just waiting for the deploy results to report | 20:19 |
fungi | which looks like it did at 19:01:07 | 20:23 |
fungi | starting it now | 20:23 |
fungi | output is tee'd to the usual log so it can also be seen in the screen buffer | 20:23 |
clarkb | thanks | 20:26 |
fungi | ERROR: Condition '437D05B5|C0B21F32' not fulfilled for '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release.gpg'. | 20:26 |
fungi | Signatures in '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release.gpg': | 20:26 |
fungi | '871920D1991BC93C' (signed 2022-04-25): missing pubkey | 20:26 |
fungi | Error: Not enough signatures found for remote repository ubuntu-security (http://security.ubuntu.com/ubuntu jammy-security)! | 20:27 |
clarkb | hrm I thought ubuntu used the same key over and over? Maybe not for security? | 20:27 |
fungi | guess we need to add a key | 20:27 |
fungi | gpg --verify /afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_jammy-security_Release{.gpg,} | 20:29 |
fungi | gpg: Signature made 2022-04-25T18:35:45 UTC using RSA key 0x871920D1991BC93C | 20:29 |
fungi | i can't seem to gpg --recv-keys 0x871920D1991BC93C | 20:30 |
fungi | gpg: key 0x871920D1991BC93C: new key but contains no user ID - skipped | 20:30 |
fungi | that key was created in 2018 | 20:31 |
clarkb | they distribute the keys with apt. Is it possible they just never put it on the key servers? | 20:31 |
clarkb | https://bugs.launchpad.net/ubuntu/+source/reprepro/+bug/1968198 doesn't seem related but might end up affecting us too | 20:32 |
fungi | gpg --keyserver keyserver.ubuntu.com --receive-keys 0x871920D1991BC93C | 20:34 |
fungi | that worked | 20:34 |
fungi | gpg: key 0x871920D1991BC93C: public key "Ubuntu Archive Automatic Signing Key (2018) <ftpmaster@ubuntu.com>" imported | 20:34 |
clarkb | I wonder if that just means we haven't been updating our keys like we did with debian in the reprepro config management. That seems possible | 20:36 |
fungi | yeah, i'm experimenting | 20:37 |
fungi | looks like we use playbooks/roles/import-gpg-key/tasks/main.yaml to import each of the archive keys into the root gnupg keyring? | 20:42 |
clarkb | yes, we keep an ascii armored version of the pubkey in the role and those tasks iterate over them and install them | 20:43 |
fungi | yep, just making sure. so if i really want to test this, i'll end up bypassing that role | 20:43 |
fungi | i'll just propose the change i think it needs | 20:44 |
clarkb | I think you also need to list the key fingerprint in the reprepro configs | 20:44 |
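The failure above comes from reprepro's `VerifyRelease` check: the condition `437D05B5|C0B21F32` lists acceptable key IDs separated by `|`, and the Release file was signed by `871920D1991BC93C`, which matches none of them. So, as noted, the key has to be both in the keyring and listed in the reprepro config. The Python below is a simplified model of that check, not reprepro's code: it treats each entry as a key-ID suffix and ignores subkey handling, the `!` modifier and `blindtrust`. `991BC93C` is the short form of the 2018 key fungi imported; the exact contents of 839261 are not reproduced here.

```python
def allowed_key_ids(condition: str) -> list[str]:
    # "437D05B5|C0B21F32" -> ["437D05B5", "C0B21F32"]
    return [part.strip().upper() for part in condition.split("|") if part.strip()]


def condition_satisfied(condition: str, signing_key_id: str) -> bool:
    # A signature counts if the signing key ID ends with any listed ID
    # (a short 8-hex-digit ID is the tail of the full key ID).
    signer = signing_key_id.strip().upper()
    if signer.startswith("0X"):
        signer = signer[2:]
    return any(signer.endswith(key_id) for key_id in allowed_key_ids(condition))


if __name__ == "__main__":
    old = "437D05B5|C0B21F32"
    new = old + "|991BC93C"  # add the 2018 archive key's short ID
    signer = "871920D1991BC93C"
    print(old, condition_satisfied(old, signer))  # False
    print(new, condition_satisfied(new, signer))  # True
```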
clarkb | I'm going to work on getting out for that exercise now. Will check on that when I get back. I don't think it's a big deal if you manually toggle it and also push a change we land next | 20:46 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add Ubuntu's 2018 Archive Signing Key to reprepro https://review.opendev.org/c/opendev/system-config/+/839261 | 20:51 |
fungi | yeah, i think i got everything in ^ | 20:52 |
fungi | i'm around all night, so happy to just wait for that to land and deploy and then try again | 20:53 |
ianw | frickler: thanks for working on jammy things. in answer to your prior question on why we use ntpdate/chrony/systemd-timesync/* the answer is pretty much I don't know and will have to context switch it back in :) | 21:21 |
ianw | i think we kind of make decisions that seem right at the time, but it's always worth revisiting as the world turns | 21:22 |
fungi | in the past we've oscillated between taking whatever the platform provides by default vs overriding platform defaults in order to drive consistency across different node labels | 21:28 |
fungi | and this is yet another of those situations | 21:28 |
opendevreview | Merged opendev/system-config master: Add Ubuntu's 2018 Archive Signing Key to reprepro https://review.opendev.org/c/opendev/system-config/+/839261 | 21:58 |
*** rlandy is now known as rlandy|bbl | 22:16 | |
fungi | and it's deployed, so trying again | 22:20 |
fungi | seems to have gotten past the prior error | 22:22 |
*** dviroel|rover is now known as dviroel|rover|afk | 22:36 | |
clarkb | fungi: have you run into the error in the LP bug I linked? | 22:43 |
fungi | nope | 22:43 |
clarkb | cool hopefully that got fixed one way or another | 22:43 |
fungi | with 839261 deployed it's well into pulling down packages now | 22:43 |
clarkb | ianw: did you see frickler was asking if you could follow up on https://review.opendev.org/c/zuul/nodepool/+/834152 ? I think his suggestion is that you push a new patchset making the change you are asking for, to avoid any confusion | 22:43 |
fungi | probably will still be going for hours but i'll try to keep an eye on it over the course of my evening | 22:44 |
clarkb | thanks. My bike ride was fun. I went out and it was decent weather. It's sunny now. But for about 45 minutes of my bike ride the skies decided torrential downpour and hail would be appropriate | 22:47 |
clarkb | I'll send out the meeting agenda in a few minutes if there is anything else to add let me know | 22:52 |
clarkb | I guess our afs graphs track the RO volumes and not RW so we won't see progress via disk usage | 23:05 |
fungi | yeah, not until it finishes | 23:06 |
ianw | it might show the rw volume, i don't think it explicitly doesn't at least ... | 23:33 |
clarkb | it's not a big deal, I was just hoping to see a slowly increasing disk utilization graph to estimate progress | 23:34 |
ianw | i think it might pull the r/o into a different stat https://opendev.org/opendev/afsmon/src/branch/master/afsmon/__init__.py#L82 | 23:34 |
ianw | yeah the readonly ones are like mirror_fedora_readonly | 23:37 |
ianw | and the stats page shows the r/w volumes. it's interesting because i'm not sure if that naming is a feature or a bug | 23:39 |
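In OpenAFS a read-only replica is addressed as `<volume>.readonly`, and ianw notes the read-only stats come out with names like `mirror_fedora_readonly`. The mapping below is a hypothetical reconstruction from that one example, not afsmon's actual code (the linked `afsmon/__init__.py` is authoritative); it illustrates why a graph keyed on the read-only name won't move until the RW volume is released to its replicas.

```python
def stat_key(volume: str, readonly: bool = False) -> str:
    # Hypothetical naming: dots become underscores, and read-only replicas
    # get a "_readonly" suffix, e.g. "mirror.fedora" -> "mirror_fedora_readonly".
    name = volume.removesuffix(".readonly")
    key = name.replace(".", "_")
    if readonly or volume.endswith(".readonly"):
        return f"{key}_readonly"
    return key


if __name__ == "__main__":
    for vol, ro in [("mirror.ubuntu", False), ("mirror.ubuntu", True)]:
        print(vol, "RO" if ro else "RW", "->", stat_key(vol, readonly=ro))
```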
clarkb | huh it does show a small bump now to 692GB | 23:40 |
clarkb | also I've realized that ubuntu ports for arm64 is a separate volume so we may not have room to do those just yet. That said, I just cleared out 6TB of elasticsearch volumes. Maybe we should allocate 2TB back to AFS | 23:40 |
ianw | it was only individual volumes exceeding 2tb that was the issue, wasn't it? when we had our on-disk pypi mirror | 23:42 |
clarkb | yes I believe so | 23:43 |
clarkb | pretty sure we can go to 3TB total then keep individual volumes under 2TB | 23:43 |
clarkb | in this case I say 2TB to afs because 1TB to each dfw server | 23:43 |
ianw | it would complicate things to have a vicepb i guess, but speaking from experience if this volume needs to fsck it's nerve wracking | 23:46 |
fungi | oh, right, because it's a virtual fs built on files on another fs | 23:47 |
fungi | i always forget it's not just backed by a raw block device | 23:47 |
corvus | as a point of interest -- there is once again a single job running for an excessive amount of time that's holding up the zuul rolling restart. this time it's the nova-live-migration job and it's stuck running the opendev.org/opendev/base-jobs/playbooks/base/cleanup.yaml playbook | 23:53 |
corvus | i think that playbook may not deal with systemic node connection problems well :/ | 23:54 |