clarkb | the weekly team meeting will begin momentarily | 18:59 |
---|---|---|
ianw | o/ | 18:59 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Dec 6 19:01:06 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/GCABXQDEGIAYG4T63NXZJGNHACEICKAP/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | The foundation is entering election time for the board. Nominations for individual members close in 10 days on the 16th of December | 19:02 |
clarkb | Then an election is held in January | 19:02 |
clarkb | Any other announcements? | 19:02 |
fungi | board meeting today | 19:03 |
clarkb | oh right an hour after the end of this meeting (21:00 UTC) there will be a board meeting | 19:04 |
fungi | 21:00 utc in zoom | 19:04 |
fungi | yep | 19:04 |
clarkb | tools for openstack translations will be discussed which might interest this crowd | 19:04 |
fungi | https://board.openinfra.dev/meetings/2022-12-06 | 19:04 |
fungi | that 'un | 19:04 |
clarkb | #topic Bastion Host Updates | 19:06 |
clarkb | I think we are getting very close to the end of this thread. | 19:06 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/866542 addresses ansible installation on bridge to actually update to the ansible we are testing with | 19:06 |
clarkb | #link https://review.opendev.org/q/topic:prod-bastion-group parallelized zuul jobs on bridge. Should land when bridge is stable and we can monitor | 19:06 |
clarkb | #link https://review.opendev.org/q/topic:bridge-ansible-venv This group appears to have all of its changes merged or abandoned | 19:07 |
clarkb | ianw: Anything else to say on this topic? I need to rereview 866542 which is on my todo list for today | 19:07 |
ianw | yeah 866542 just got a rebase really since you looked at it yesterday, i removed a change that was updating a removed comment | 19:08 |
ianw | the stack really expanded to | 19:08 |
ianw | #link https://review.opendev.org/q/topic:boostrap-ansible-from-req | 19:08 |
clarkb | oh I see there are a few followons | 19:08 |
ianw | which just moves the same idea to the venv creation, which i noticed when watching the logs | 19:09 |
ianw | the other stack that needs feedback and action, particularly from infra-roots is | 19:09 |
ianw | #link https://review.opendev.org/q/topic:bridge-backups | 19:09 |
clarkb | oh right I had that in my local agenda notes. sorry | 19:09 |
clarkb | and what that does is encrypt things locally so they can be backed up remotely right? | 19:10 |
ianw | essentially yes, with a key split requiring 2 people to recombine | 19:11 |
ianw | this is so nobody needs to feel like they need to setup fort knox to keep the backup | 19:11 |
clarkb | I'll have to read into that more to understand the mechanics of it. Like do we all need to forward gpg agents or something? But that can happen in review or in #opendev | 19:12 |
clarkb | I'll do my best to review those two stacks after the board meeting today | 19:13 |
clarkb | Anything else bastion related? | 19:13 |
ianw | thanks; https://review.opendev.org/c/opendev/system-config/+/866430 should be pretty explanatory for that i think | 19:13 |
ianw | nope, thanks | 19:13 |
clarkb | #topic Upgrading old servers | 19:14 |
clarkb | Nothing new here other than we should find time to do more of this :/ | 19:14 |
clarkb | I guess technically the bastion work is a subset of this so we are pushing that along :) | 19:14 |
fungi | technically we've partially upgraded the listserv too | 19:14 |
clarkb | and the mm3 work isn't directly related but does get us off an old server that has kernel fun | 19:14 |
fungi | yeah that | 19:14 |
clarkb | progress then. I'll take it :) | 19:14 |
fungi | i guess we already upgraded the distro on the old mailman server anyway | 19:15 |
clarkb | #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes | 19:15 |
fungi | just not painlessly | 19:15 |
clarkb | yup | 19:15 |
clarkb | Which is a good lead into the next topic | 19:15 |
clarkb | #topic Mailman 3 | 19:15 |
clarkb | #link https://etherpad.opendev.org/p/mm3migration Server and list migration notes | 19:15 |
clarkb | lists.opendev.org and lists.zuul-ci.org moved to mailman3 on the new server yesterday | 19:15 |
fungi | and within the scheduled window even | 19:16 |
fungi | though in retrospect i should have called it two hours just in case | 19:16 |
fungi | i didn't factor in gate/deploy time for the dns updates | 19:16 |
clarkb | there were/are a couple of issues we found in the process. One was fixed which corrected some url routes. The other is setting a site_owner value which was missed because all the other settings are set by env vars but not this one | 19:16 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/866632 set mailman3 site_owner | 19:16 |
clarkb | fungi: we managed to make the timing work in the end | 19:17 |
clarkb | There is also a change to spin things down on the old server for these two domains | 19:17 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/866630 disable old mm2 domains | 19:17 |
fungi | yeah, the broken archiving is totally my bad. when i re-tested the held node after we updated to the latest mm release, i forgot to double-check that new list messages ended up in the imported archive | 19:17 |
fungi | thanks to corvus and clarkb for figuring it out while i was off stuffing my face | 19:18 |
clarkb | One "exciting" thing is that the upstream mailing list has a thread from today (of course the day after we upgrade) suggesting people not run latest mm3 which we are doing. | 19:18 |
fungi | hah | 19:18 |
ianw | will 866632 require a restart? | 19:18 |
clarkb | The reason for this is a bug when handling more than 10 list sites possibly postfix specific (we run exim) | 19:18 |
fungi | that's so our luck | 19:18 |
clarkb | ianw: yes | 19:18 |
clarkb | #link https://gitlab.com/mailman/mailman/-/issues/1044 bug with more than 10 lists | 19:18 |
clarkb | lists.opendev.org has 10 lists and lists.zuul-ci.org has fewer. I've also grepped for that warning string in our logs. I'm not sure if we are not affected because we use exim or if it is because we have few lists but I haven't found evidence we have a problem yet | 19:19 |
clarkb | something to watch though as lists.openstack.org has many more than 10 lists and we want this to be working before we upgrade | 19:19 |
clarkb | er s/upgrade/migrate lists.openstack.org/ | 19:19 |
clarkb | And please say something if you see the indicated behavior with the migrated lists | 19:20 |
fungi | though lists.openstack.org also has way fewer lists than it used to, and might have fewer still if i can convince them to retire some more unused ones before we move it | 19:20 |
clarkb | Other than that I think this went really well. These two domains are newer but we migrated stuff off of an ancient server and software to something modern and it seems to work | 19:20 |
clarkb | the testing and planning seem to have done their job. Thank you to everyone who helped with that. fungi in particular did a lot of work on that side of things | 19:21 |
fungi | thanks everyone for all the work on that | 19:21 |
fungi | it's been on my wishlist for years and i was never able to find the time to tackle it on my own | 19:21 |
clarkb | anything else to add? | 19:22 |
fungi | nothing on my end | 19:22 |
clarkb | #topic Quo vadis Storyboard | 19:23 |
clarkb | I just realized my old link should be updated to the new hyperkitty archives. Oh well | 19:23 |
clarkb | I did send a followup covering our options and asked for feedback | 19:23 |
clarkb | The one response I got was someone offering to help with the software but unfortunately I think we need to start with the deployment if we are going to adopt it | 19:23 |
clarkb | *if we adopt software maintenance we need to commit to updating the deployment first | 19:24 |
clarkb | I'll leave it open for more feedback as it has only been about a week. I'd be happy to hear from you | 19:24 |
clarkb | and I guess if that doesn't work I can suggest that people provide semi anonymous feedback instead and I can try to collate it if people trust me to do that | 19:24 |
clarkb | But I want to make sure whatever we do here is reasonable and will be accepted | 19:25 |
fungi | yes, the software is already well ahead of what we're running in terms of major bug and performance fixes and new features | 19:25 |
fungi | which is a big part of the problem | 19:25 |
fungi | we had volunteers to develop the software, but nobody keeping our deployment up to date with it | 19:25 |
clarkb | right | 19:26 |
clarkb | anyway, let's see how we do over the next week for feedback and we can take a different approach if this continues to not generate input | 19:26 |
clarkb | I think 2 weeks is a reasonable amount of time for this sort of thing and we are halfway through that right now | 19:26 |
fungi | agreed. thanks! | 19:26 |
clarkb | #topic Vexxhost server rescues | 19:27 |
clarkb | jrosser shared image settings with me | 19:27 |
clarkb | #link https://paste.opendev.org/show/bxxFEEUWeUrkIVlBSGrw/ jrosser's image settings that work in their cloud | 19:27 |
clarkb | I've got a todo item to try and upload an image with those settings set and use it as a rescue image after modifying the root boot label | 19:27 |
clarkb | But I haven't done that yet | 19:27 |
clarkb | They use ceph too so I'm hopeful that this will work | 19:28 |
clarkb | #topic Gerrit 3.6 | 19:28 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.6 | 19:28 |
clarkb | ianw ran copy-approvals on all of our repos. We had a small problem in neutron due to a change with more than 1k patchsets which is our current limit | 19:29 |
clarkb | ianw temporarily bumped that limit and reran which caused things to work except for a corrupt change | 19:29 |
clarkb | even if that didn't work we would've been fine because all of the neutron changes are closed and not open so their votes are largely there for historical accuracy | 19:29 |
clarkb | ianw: looks like you've noted the next steps are holding a node and double checking things a bit more directly | 19:30 |
clarkb | as well as working on a proposal for the upgrade | 19:30 |
clarkb | ianw: is there anything around the gerrit upgrade we can help with? | 19:30 |
clarkb | I will note the openstack release cycle schedule is at https://releases.openstack.org/antelope/schedule.html which we should avoid conflicts with | 19:31 |
ianw | not really, i just want to hold a node and validate downgrade, which i should be able to do very soon | 19:31 |
ianw | if the mm upgrade time worked, we could do it then | 19:32 |
clarkb | ianw: ~2000UTC on a monday you mean? | 19:32 |
fungi | wfm | 19:33 |
ianw | yep, that would open up the 12th or the 19th, i'm around for both, though less time after the 19th | 19:33 |
clarkb | ya its a trade off I guess. Less lead time to test and announce with the 12th and less time to fix/debug if the 19th | 19:33 |
ianw | if we get some held node validation over the next few days, maybe the 12th? | 19:34 |
ianw | i'm fairly confident, there doesn't seem to be much more we could test | 19:34 |
clarkb | ya if that all looks good and doesn't show anything that users should need to worry about I'd be good with the 12th | 19:34 |
clarkb | we can even send an announcement nowish indicating we plan to do it on the 12th and postpone if necessary | 19:34 |
clarkb | I think there is a downgrade process for 3.6 -> 3.5 too so we have that option if necessary | 19:34 |
ianw | and that's an excuse to send a message through the list to keep it ungreylisted too :) | 19:35 |
fungi | heh | 19:35 |
clarkb | if we test and confirm the downgrade process seems to work then I'm extra happy to proceed early | 19:35 |
clarkb | I think 3.7 -> 3.6 has less easy downgrade though so that upgrade will be a funner one | 19:35 |
ianw | ok, i will get onto all that, https://etherpad.opendev.org/p/gerrit-upgrade-3.6 will be updated as things happen | 19:36 |
clarkb | sounds good, thanks! | 19:36 |
clarkb | #topic Open Discussion | 19:36 |
clarkb | that was it for the agenda but this morning I noticed something I had on my back burner has made some progress and is worth calling out | 19:37 |
clarkb | The first bit is nodepool updated to latest openstacksdk which includes ianw's fix for network stuff against older apis | 19:37 |
clarkb | image uploads seem to work (we have recent images) and I haven't seen any launcher issues. But we should skim the grafana dashboard for any evidence of problems | 19:38 |
clarkb | And then that unlocked the path for updating zuul and nodepool images to python3.11 | 19:38 |
clarkb | The zuul change has landed and nodepool is gating | 19:38 |
clarkb | nodepool will restart once that change lands. zuul will normally restart over the weekend. Do we want to manually restart zuul sooner to observe it? | 19:38 |
clarkb | I should be able to do that tomorrow if we think that is a good idea. | 19:39 |
clarkb | In particular one thing I realized is that ansible might not like python3.11? However, we do have zuul testing that exercises ansible so maybe its fine? | 19:39 |
clarkb | cc corvus ^ if you have an opinion | 19:39 |
clarkb | I'm also happy to revert if we think we need more prep | 19:39 |
clarkb | Oh also last week I cleaned up the inmotion cloud's nova and placement records | 19:40 |
clarkb | There were two distinct issues. The first was that placement had leaked a few records for nodes that just didn't exist anymore either on the host or in the nova db | 19:41 |
clarkb | the second was the nova db leaked instances that didn't exist in libvirt on the hosts | 19:41 |
clarkb | cleaning up the first thing is relatively straightforward and placement has docs on the process. | 19:41 |
clarkb | Cleaning up the second thing required manually editing the nova db to associate nodes with the cell they lived in because some nova bug allowed them to be disassociated which broke server deletion. Once those records were updated server delete worked fine | 19:42 |
clarkb | melwitt was a huge help in sorting that out, but now we have more nodes to test with so yay | 19:42 |
clarkb | oh we also had leaked nodes in rax | 19:43 |
clarkb | they didn't have proper nodepool metadata so nodepool refused to clean them up. i manually cleared those out | 19:43 |
corvus | clarkb: i think zuul's own tests should give us a heads up on any ansible/python probs like that. i don't have a strong feeling about whether we need to restart early or just let it happen on the weekend | 19:43 |
clarkb | corvus: ack thanks | 19:44 |
clarkb | as far as team meetings go I think we'll cancel the 27th. Any strong opinions for having meetings on the 13th or january 3? | 19:44 |
fungi | i'll be round on the 13th and 3rd but don't necessarily require a meeting | 19:45 |
clarkb | er sorry the 20th and 3rd | 19:45 |
clarkb | I plan to be around on the 13th and have that meeting | 19:45 |
fungi | i also should be around on the 20th but may be a little distracted | 19:45 |
ianw | i should be around on 20th ... unsure on 3 | 19:46 |
ianw | for sure not 27th | 19:46 |
clarkb | ok we can do a low key meeting on the 20th, then see what the new year looks like when we get there | 19:47 |
fungi | i do expect to have far more work than usual the week of the 3rd so may be distracted then too | 19:47 |
clarkb | ya its the time of year when all the paperwork needs to be done :) | 19:47 |
fungi | so much paperwork | 19:47 |
clarkb | alright then we'll see you here on the 13th and probably the 20th. Then we can enjoy the holidays for a bit (and you should enjoy them earlier too if you are able :) ) | 19:47 |
fungi | thanks clarkb! | 19:48 |
clarkb | anything else? | 19:48 |
corvus | schedule a holiday party for the 20th ;) | 19:48 |
clarkb | good idea. Let me see if I can figure something out for that | 19:49 |
clarkb | board game arena game or something :) | 19:49 |
clarkb | thank you everyone for your time, I'll let you go now. See you next week. | 19:49 |
clarkb | #endmeeting | 19:49 |
opendevmeet | Meeting ended Tue Dec 6 19:49:48 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:49 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-12-06-19.01.html | 19:49 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-12-06-19.01.txt | 19:49 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-12-06-19.01.log.html | 19:49 |
clarkb | and now lunch before the board meeting | 19:51 |