19:01:36 <clarkb> #startmeeting infra
19:01:36 <opendevmeet> Meeting started Tue Aug 30 19:01:36 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:36 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:36 <opendevmeet> The meeting name has been set to 'infra'
19:01:48 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-August/000356.html Our Agenda
19:01:59 <clarkb> #topic Announcements
19:02:06 <ianw> o/
19:02:10 <clarkb> OpenStack's feature freeze begins this week.
19:02:40 <clarkb> Good time to be on the lookout for any code review and CI issues that the stress of feature freeze often brings (though more recently the added load hasn't been too bad)
19:03:12 <clarkb> Also sounds like starlingx is trying to prepare and ship a release
19:03:29 <clarkb> We should avoid landing risky changes to the infrastructure as well
19:03:48 <clarkb> use your judgement etc (I don't think we need to stop the weekly zuul upgrades for example as those have been fairly stable)
19:04:48 <clarkb> #topic Bastion Host Updates
19:05:17 <clarkb> I don't have anything new to add to this. Did anyone else? I think we ended up taking a step back on the zuul console log stuff to reevaluate things
19:05:28 <fungi> i didn't
19:07:09 <ianw> i didn't, still working on the zuul console stuff too in light of last week; we can do the manual cleanup
19:08:06 <clarkb> ok I can pull up the find command I ran previously and rerun it
19:08:19 <clarkb> maybe with a shorter timeframe to keep. I think I used a month last time.
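For context, the cleanup command under discussion would be along these lines (a sketch only: the path, filename pattern, and retention window here are assumptions, not the exact command previously run):

```shell
# List zuul console log files on the bastion older than 30 days:
find /tmp -maxdepth 1 -name 'console-*.log' -mtime +30 -print

# Once the listed set looks right, delete the same files:
find /tmp -maxdepth 1 -name 'console-*.log' -mtime +30 -delete
```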
19:08:23 <clarkb> #topic Upgrading Bionic Servers
19:08:43 <clarkb> I've been distracted by gitea (something that needs upgrades actually) and the mailman3 stuff.
19:09:06 <clarkb> Anyone else look at upgrades yet?
19:10:18 <clarkb> Sounds like no. That's fine, we'll pick this up in the future.
19:10:23 <clarkb> #topic Mailman 3
19:10:44 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/851248 Add a mailman 3 server
19:11:15 <clarkb> This change continues to converge on something deployable. Really at this point we might want to think about deploying a server?
19:11:27 <clarkb> In particular fungi tested the migration of opendev lists and that seemed to go well
19:11:37 <clarkb> configuration expectations made it across the migration
19:11:40 <fungi> yep, seemed to keep the settings we want
19:11:46 <clarkb> #link https://etherpad.opendev.org/p/mm3migration Server and list migration notes
19:12:07 <clarkb> I did add some notes to the bottom of that etherpad for additional things to check. One of them I've updated the change for already.
19:12:13 <fungi> those are the exact commands i'm running, so we can turn it into a migration script as things get closer
19:12:39 <clarkb> I think testing dmarc if possible is a good next step. Unfortunately I'm not super sure about how we should test that
19:13:13 <ianw> that sounds hard without making dns entries?
19:13:15 <clarkb> I guess we'd want to know whether mm2's current behavior of preserving valid dmarc signatures carries over, and if not, whether the mm3 dmarc options are good?
19:13:15 <fungi> it may be easier to test that once we've migrated lists.opendev.org lists to a new prod server
19:13:52 <clarkb> ya I think we can test the "pass through" behavior of service-discuss@lists.opendev.org if we can send signed email to the test server. But doing that without dns is likely painful
19:13:56 <fungi> like, consider dmarc related adjustments part of the adjustment period for the opendev site migration
19:14:08 <clarkb> that makes sense as we'd have dns all sorted for that
19:14:11 <fungi> before we do the other sites
19:15:05 <fungi> alternative option would be to add an unused fqdn to a mm3 site on the held node and set up dns and certs, et cetera
19:15:18 <clarkb> that seems like overkill
19:15:30 <fungi> yes, i'm reluctant to create even more work
19:15:32 <clarkb> I think worst case we'll just end up using new different config for dmarc handling
19:15:46 <clarkb> and if we sort that out on the opendev lists before we migrate the others that is likely fine
19:15:58 <fungi> and at least mm3 has a greater variety of options for dmarc mitigation
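One way to exercise the DKIM "pass through" behavior against the held node without DNS changes might be to replay an already-signed message at it directly, then check the signature on the copy the list delivers. A sketch only, assuming swaks and dkimpy's dkimverify tool are available; the IP address and file names are placeholders:

```shell
# Replay a saved message that carries a valid DKIM signature from its
# original sending domain, pointed straight at the held node's MTA:
swaks --server 203.0.113.10 \
      --to service-discuss@lists.opendev.org \
      --from member@example.org \
      --data signed-message.eml

# Save the copy the list delivers to a subscriber, then check whether
# the original signature still validates after list processing:
dkimverify < delivered-copy.eml
```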
19:17:05 <clarkb> The other thing I had in mind was testing migration of openstack-discuss as there are mm3 issues/posts about people hitting timeouts and similar errors with large list migrations
19:17:33 <clarkb> Maybe we should do that as a last sanity check of mm3 as deployed by this change then if that is happy clean the change up to make it mergeable?
19:17:38 <fungi> should be pretty easy to run through, happy to give that a shot this evening
19:17:55 <clarkb> great. Mostly I think if we are going to have any problems with migrating it will be with that list so that gives good confidence in the process
19:18:00 <fungi> rsync will take a while
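The copy step fungi mentions would look something like this for a typical Mailman 2 layout (hostnames and destination paths are assumptions, not the exact commands in the etherpad):

```shell
# Pull the list configuration and the archive mbox for openstack-discuss
# over to the held node for a trial import:
rsync -avz lists.openstack.org:/var/lib/mailman/lists/openstack-discuss/ \
      /srv/migration/lists/openstack-discuss/
rsync -avz lists.openstack.org:/var/lib/mailman/archives/private/openstack-discuss.mbox/ \
      /srv/migration/archives/openstack-discuss.mbox/
```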
19:18:29 <clarkb> fungi: one thing to take note of is disk consumption needs for the hyperkitty xapian indexes and the old pipermail archives and the new storage for mm3 emails
19:18:41 <clarkb> so that when we boot a new server we can estimate likely disk usage needs and size it properly
19:18:49 * clarkb adds a note to the etherpad
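A quick way to collect those numbers after the trial import; the paths are assumptions for a typical Mailman 3 deployment and may differ on the held node:

```shell
# Size the stores that drive the new server's disk needs:
du -sh /var/lib/mailman/archives          # old pipermail copies to keep serving
du -sh /var/lib/mailman3/fulltext_index   # hyperkitty's xapian index
du -sh /var/lib/mailman3                  # mm3 data overall
df -h                                     # remaining headroom on the held node
```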
19:18:59 <fungi> also we should spend some time thinking about the actual migration logistics steps (beyond just the commands), like when do we shut things down, when do we adjust dns records, how to make sure incoming messages are deferred
19:19:35 <fungi> i have a basic idea of the sequence in my head, i'll put a section for it in that pad
19:20:05 <clarkb> ++ thanks
19:20:49 <clarkb> This is the sort of thing we should be able to safely do for say opendev and zuul in the near future then do openstack and starlingx once their releases are out the door
19:20:56 <fungi> there will necessarily be some unavoidable downtime, but we can at least avoid bouncing or misdelivering during the maintenance
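Sketched as an outline, the cutover sequence fungi describes might run roughly like this; none of it is settled, and the MTA name assumes the existing exim-based list servers:

```shell
# 1. Lower DNS TTLs for lists.opendev.org well before the window.
# 2. Stop the MTA on the old server; sending MTAs treat the refused
#    connections as temporary failures and retry, so incoming mail is
#    deferred rather than bounced:
systemctl stop exim4
# 3. Run the final rsync and mm3 import while nothing new can arrive.
# 4. Repoint DNS (A/AAAA and MX) at the new server.
# 5. Start the MTA on the new server and watch the deferred mail drain in.
```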
19:21:23 <fungi> also there are still some todo comments in that change
19:21:50 <clarkb> I just deleted a few of them with my last update
19:21:52 <fungi> for one, still need to work out the apache rewrite syntax for preserving the old pipermail archive copies
19:22:09 <fungi> oh! i haven't looked at that last revision yet
19:22:15 <clarkb> I didn't address that particular one
19:22:21 <clarkb> I can try to take a look later today at that though
19:22:39 <fungi> i can probably also work it out now that we have an idea of what it would look like
19:22:53 <clarkb> and clean up any other todos https://review.opendev.org/c/opendev/system-config/+/851248/65/playbooks/service-lists3.yaml had high level ones we can clean up now too
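The remaining pipermail TODO is likely just an alias plus a proxy exclusion so the static archive copies keep their old URLs. A sketch of the vhost fragment, assuming typical paths and that mailman-web sits behind a reverse proxy; this is not the actual change:

```shell
# Write an illustrative Apache vhost fragment for review:
cat > lists-vhost-snippet.conf <<'EOF'
# Keep old pipermail archive URLs working from the static copies,
# excluding them from the proxy to the mailman3 web UI:
ProxyPass /pipermail !
Alias /pipermail /var/lib/mailman/archives/public
<Directory /var/lib/mailman/archives/public>
    Options Indexes FollowSymLinks
    Require all granted
</Directory>
EOF
```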
19:23:14 <corvus> i think moving zuul over right after opendev would be great
19:23:50 <clarkb> fungi: if you poke at the migration for service-discuss I can focus on cleaning up the change to make it as mergeable as possible at this point
19:23:58 <clarkb> corvus: good to hear. I suspected you would be interested :)
19:24:23 <clarkb> So ya long story short I think we need to double check a few things and clean the change up to make it landable but we are fast approaching the point where we actually want to deploy a new mailman3 server
19:24:26 <clarkb> exciting
19:25:05 <clarkb> I can also clean the change up to not deploy zuul openstack starlingx etc lists for now. Just have it deploy opendev to start since that will mimic our migration path
19:25:16 <clarkb> Anything else mm3 related?
19:26:48 <clarkb> #topic Gerrit load issues
19:27:07 <clarkb> This is mostly still on here as a sanity check. I don't think we have seen this issue persist?
19:27:20 <clarkb> Additionally the http thread limit increase doesn't seem to have caused any negative effects
19:28:42 <fungi> i haven't seen any new issues
19:28:50 <clarkb> There were other changes I had in mind (like bumping ssh threads and http threads together to keep http above the ssh+http git limit), but considering we seem stable here I think we leave it be unless we observe issues again
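For reference, both knobs live in gerrit.config, so they could be bumped together to keep the http pool above the combined git limit. The numbers and path below are illustrative, not what production runs:

```shell
# gerrit.config is git-config format, so git config can edit it in place:
git config -f /var/gerrit/etc/gerrit.config sshd.threads 100
git config -f /var/gerrit/etc/gerrit.config httpd.maxThreads 120
# A restart is needed for gerrit to pick up the new values.
```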
19:30:37 <clarkb> #topic Jaeger Tracing Server
19:31:52 <clarkb> corvus: I haven't seen any changes for this yet. No rush but wanted to make sure I wasn't missing anything
19:32:06 <clarkb> Mostly just a check in to make sure you didn't need reviews or other input
19:33:29 <corvus> clarkb: nope not yet -- i'm actually working on the zuul tutorial/example version of that today, so expect the opendev change to follow.
19:34:24 <clarkb> sounds good
19:34:30 <clarkb> #topic Fedora 36
19:35:06 <clarkb> ianw: can you fill us in on the plans here? In particular one of the potentially dangerous things for the openstack release is updating the fedora version under them as they try to release
19:36:16 <ianw> just trying to move ahead with this so we don't get too far behind
19:36:46 <ianw> i'm not sure it's in the release path for too much -- we certainly say that "fedora-latest" may change to be the latest as we get to it
19:37:21 <ianw> one sticking point is the openshift related jobs, used by nodepool
19:38:00 <ianw> unfortunately it seems due to go changes, the openshift client is broken on fedora 36
19:38:54 <ianw> tangentially, this is also using centos-7 and a SIG repo to deploy openshift-3 for the other side of the testing.  i spent about a day trying to figure out a way to migrate off that
19:39:20 <clarkb> ianw: the fedora 36 image is up and running now though, so we can discover these problems at least? The next steps are flipping the default nodeset labels?
19:39:54 <ianw> #link https://review.opendev.org/c/zuul/zuul-jobs/+/854047
19:39:59 <ianw> has details about why that doesn't work
19:41:05 <ianw> clarkb: yep, nodes are working.  i've made the changes to nodesets dependent on the zuul-jobs updates (so we know it at least works there), so have to merge them first
19:41:20 <ianw> i think they are now fully reviewed, thanks
19:41:23 <ianw> (the zuul-jobs changes)
19:42:20 <clarkb> ok so mostly a matter of updating base testing and fixing issues that come up
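The label flip itself is small; as a sketch, the nodeset definitions being updated are roughly this shape (illustrative YAML written to a scratch file, not the actual opendev change):

```shell
# Sketch of the kind of nodeset update under discussion:
cat > fedora-latest-nodeset-sketch.yaml <<'EOF'
- nodeset:
    name: fedora-latest
    nodes:
      - name: fedora-latest
        label: fedora-36   # bumped from the previous fedora label
EOF
```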
19:43:49 <clarkb> Anything that needs specific attention?
19:44:49 <ianw> i don't think so
19:45:15 <ianw> thanks, unless somebody finds trying to get openshift things running exciting
19:46:18 <clarkb> I've run away from that one before :)
19:46:35 <clarkb> It might not be a terrible idea to start a thread on the zuul discuss list about whether or not the openshift CI is viable
19:46:48 <fungi> or at least "practical"
19:46:49 <clarkb> and instead start treating it like one of the other nodepool providers that doesn't get an actual deployment
19:47:05 <clarkb> not ideal but would reduce some of the headache for maintaining it
19:48:14 <fungi> related, there was a suggestion about using larger instances to test it
19:48:25 <fungi> there's this change which has been pending for a while:
19:49:02 <fungi> #link https://review.opendev.org/844116 Add 16 vcpu flavors
19:49:13 <fungi> they also come with more ram, as a side effect
19:49:53 <fungi> we could do something similar but combine that with the nested-virt labels we use in some providers to get a larger node with nested virt acceleration
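Combining the two ideas fungi describes would land in a nodepool provider config roughly like this; every name below is illustrative, and the real flavor names would come from change 844116:

```shell
# Sketch of a larger nested-virt label in a nodepool provider:
cat > nodepool-label-sketch.yaml <<'EOF'
labels:
  - name: nested-virt-ubuntu-jammy-16vcpu
providers:
  - name: example-provider
    pools:
      - name: main
        labels:
          - name: nested-virt-ubuntu-jammy-16vcpu
            flavor-name: example-16vcpu-flavor  # a 16 vcpu flavor
            diskimage: ubuntu-jammy
EOF
```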
19:50:30 <ianw> we could ...
19:50:59 <ianw> but it just feels weird that you can't even start the thing with less than 9.6gb of ram
19:51:08 <fungi> i concur
19:51:10 <corvus> it would be interesting to know if there are any users of the openshift drivers (vs the k8s driver, given the overlap in functionality).
19:51:58 <ianw> ++ ... that was my other thought that this may not even be worth testing like this
19:52:15 <corvus> (i am aware of people who use the k8s nodepool driver with openshift)
19:52:40 <ianw> right, but something like minikube might be a better way to do this testing?
19:52:43 <clarkb> I think tristanC may use them. But agreed starting a thread on the zuul mailing list to figure this out is probably a good idea
19:53:07 <ianw> yeah, perhaps that is the best place to start, i can send something out
19:53:55 <clarkb> #topic Open Discussion
19:54:05 <clarkb> We are nearing the end of our hour. Anything else before time is up?
19:54:24 <fungi> i got nothin'
19:54:53 <clarkb> The gitea upgrade appears to have gone smoothly
19:55:03 <clarkb> I wonder if anyone even noticed :)
19:55:49 <clarkb> Monday is technically a holiday here. I'll probably be around but less so (I think I've got a BBQ to go to)
19:56:05 <fungi> oh, good point. i should try to not be around as much on monday
19:56:29 <ianw> time to put your white shoes away
19:57:20 * fungi puts on his red shoes and dances the blues
19:58:15 <clarkb> Sounds like that is everything. Thank you everyone
19:58:20 <clarkb> We'll see you back here next week
19:58:24 <clarkb> #endmeeting