19:01:36 #startmeeting infra
19:01:36 Meeting started Tue Aug 30 19:01:36 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:36 The meeting name has been set to 'infra'
19:01:48 #link https://lists.opendev.org/pipermail/service-discuss/2022-August/000356.html Our Agenda
19:01:59 #topic Announcements
19:02:06 o/
19:02:10 OpenStack's feature freeze begins this week.
19:02:40 Good time to be on the lookout for any code review and CI issues that the stress of feature freeze often brings (though more recently the added load hasn't been too bad)
19:03:12 Also sounds like starlingx is trying to prepare and ship a release
19:03:29 We should avoid landing risky changes to the infrastructure as well
19:03:48 use your judgement etc (I don't think we need to stop the weekly zuul upgrades for example as those have been fairly stable)
19:04:48 #topic Bastion Host Updates
19:05:17 I don't have anything new to add to this. Did anyone else? I think we ended up taking a step back on the zuul console log stuff to reevaluate things
19:05:28 i didn't
19:07:09 i didn't, still working on the zuul console stuff too in light of last week; we can do the manual cleanup
19:08:06 ok I can pull up the find command I ran previously and rerun it
19:08:19 maybe with a shorter timeframe to keep. I think I used a month last time.
19:08:23 #topic Upgrading Bionic Servers
19:08:43 I've been distracted by gitea (something that needs upgrades actually) and the mailman3 stuff.
19:09:06 Anyone else look at upgrades yet?
19:10:18 Sounds like no. That's fine, we'll pick this up in the future.
19:10:23 #topic Mailman 3
19:10:44 #link https://review.opendev.org/c/opendev/system-config/+/851248 Add a mailman 3 server
19:11:15 This change continues to converge towards something that is deployable. And really at this point we might want to think about deploying a server?
19:11:27 In particular fungi tested the migration of opendev lists and that seemed to go well
19:11:37 configuration expectations made it across the migration
19:11:40 yep, seemed to keep the settings we want
19:11:46 #link https://etherpad.opendev.org/p/mm3migration Server and list migration notes
19:12:07 I did add some notes to the bottom of that etherpad for additional things to check. One of them I've updated the change for already.
19:12:13 those are the exact commands i'm running, so we can turn it into a migration script as things get closer
19:12:39 I think testing dmarc if possible is a good next step. Unfortunately I'm not super sure about how we should test that
19:13:13 that sounds hard without making dns entries?
19:13:15 I guess we'd want to know whether our existing mm2 behavior of preserving valid dmarc signatures is kept, and if not, whether the mm3 dmarc options are good?
19:13:15 it may be easier to test that once we've migrated lists.opendev.org lists to a new prod server
19:13:52 ya I think we can test the "pass through" behavior of service-discuss@lists.opendev.org using the test server if we can send signed email to the test server. But doing that without dns is likely painful
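[editor's note: the exact migration commands referenced above are recorded in the mm3migration etherpad; as a rough sketch only (the list name comes from the discussion, paths and django-admin settings flags are placeholders), an mm2-to-mm3 import is typically along these lines:

    # create the list in Mailman 3 core, then import the old Mailman 2.1 list configuration
    mailman create service-discuss@lists.opendev.org
    mailman import21 service-discuss@lists.opendev.org /path/to/mm2/lists/service-discuss/config.pck
    # import the pipermail mbox archive into HyperKitty, then rebuild its search index
    django-admin hyperkitty_import -l service-discuss@lists.opendev.org /path/to/mm2/archives/service-discuss.mbox
    django-admin update_index_one_list service-discuss@lists.opendev.org
]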
19:13:56 like, consider dmarc related adjustments part of the adjustment period for the opendev site migration
19:14:08 that makes sense as we'd have dns all sorted for that
19:14:11 before we do the other sites
19:15:05 alternative option would be to add an unused fqdn to a mm3 site on the held node and set up dns and certs, et cetera
19:15:18 that seems like overkill
19:15:30 yes, i'm reluctant to create even more work
19:15:32 I think worst case we'll just end up using new, different config for dmarc handling
19:15:46 and if we sort that out on the opendev lists before we migrate the others that is likely fine
19:15:58 and at least mm3 has a greater variety of options for dmarc mitigation
19:17:05 The other thing I had in mind was testing migration of openstack-discuss as there are mm3 issues/posts about people hitting timeouts and similar errors with large list migrations
19:17:33 Maybe we should do that as a last sanity check of mm3 as deployed by this change, then if that is happy clean the change up to make it mergeable?
19:17:38 should be pretty easy to run through, happy to give that a shot this evening
19:17:55 great. Mostly I think if we are going to have any problems with migrating it will be with that list, so that gives good confidence in the process
19:18:00 rsync will take a while
19:18:29 fungi: one thing to take note of is disk consumption needs for the hyperkitty xapian indexes, the old pipermail archives, and the new storage for mm3 emails
19:18:41 so that when we boot a new server we can estimate likely disk usage needs and size it properly
19:18:49 * clarkb adds a note to the etherpad
19:18:59 also we should spend some time thinking about the actual migration logistics steps (beyond just the commands), like when do we shut things down, when do we adjust dns records, how to make sure incoming messages are deferred
19:19:35 i have a basic idea of the sequence in my head, i'll put a section for it in that pad
19:20:05 ++ thanks
19:20:49 This is the sort of thing we should be able to safely do for say opendev and zuul in the near future, then do openstack and starlingx once their releases are out the door
19:20:56 there will necessarily be some unavoidable downtime, but we can at least avoid bouncing or misdelivering during the maintenance
19:21:23 also there are still some todo comments in that change
19:21:50 I just deleted a few of them with my last update
19:21:52 for one, still need to work out the apache rewrite syntax for preserving the old pipermail archive copies
19:22:09 oh! i haven't looked at that last revision yet
19:22:15 I didn't address that particular one
19:22:21 I can try to take a look at that later today though
19:22:39 i can probably also work it out now that we have an idea of what it would look like
19:22:53 and clean up any other todos. https://review.opendev.org/c/opendev/system-config/+/851248/65/playbooks/service-lists3.yaml had high level ones we can clean up now too
19:23:14 i think moving zuul over right after opendev would be great
19:23:50 fungi: if you poke at the migration for service-discuss I can focus on cleaning up the change to make it as mergeable as possible at this point
19:23:58 corvus: good to hear. I suspected you would be interested :)
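[editor's note: for the pipermail todo mentioned above, one hedged sketch (paths are placeholders, and the change may end up using mod_rewrite instead) is to keep the copied static Mailman 2 archives readable at their old URL prefix:

    # serve the preserved pipermail archive copies read-only at their old URLs,
    # while HyperKitty handles everything archived after the migration
    Alias /pipermail /var/www/mm2-archives/pipermail
    <Directory /var/www/mm2-archives/pipermail>
        Options Indexes FollowSymLinks
        Require all granted
    </Directory>
]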
19:24:23 So ya, long story short: I think we need to double check a few things and clean the change up to make it landable, but we are fast approaching the point where we actually want to deploy a new mailman3 server
19:24:26 exciting
19:25:05 I can also clean the change up to not deploy zuul, openstack, starlingx, etc lists for now. Just have it deploy opendev to start since that will mimic our migration path
19:25:16 Anything else mm3 related?
19:26:48 #topic Gerrit load issues
19:27:07 This is mostly still on here as a sanity check. I don't think we have seen this issue persist?
19:27:20 Additionally the http thread limit increase doesn't seem to have caused any negative effects
19:28:42 i haven't seen any new issues
19:28:50 There were other changes I had in mind (like bumping ssh threads and http threads together to keep http above the ssh+http git limit), but considering we seem stable here I think we leave it be unless we observe issues again
19:30:37 #topic Jaeger Tracing Server
19:31:52 corvus: I haven't seen any changes for this yet. No rush but wanted to make sure I wasn't missing anything
19:32:06 Mostly just a check in to make sure you didn't need reviews or other input
19:33:29 clarkb: nope not yet -- i'm actually working on the zuul tutorial/example version of that today, so expect the opendev change to follow.
19:34:24 sounds good
19:34:30 #topic Fedora 36
19:35:06 ianw: can you fill us in on the plans here? In particular one of the potentially dangerous things for the openstack release is updating the fedora version under them as they try to release
19:36:16 just trying to move ahead with this so we don't get too far behind
19:36:46 i'm not sure it's in the release path for too much -- we certainly say that "fedora-latest" may change to be the latest as we get to it
19:37:21 one sticking point is the openshift related jobs, used by nodepool
19:38:00 unfortunately it seems due to go changes, the openshift client is broken on fedora 36
19:38:54 tangentially, this is also using centos-7 and a SIG repo to deploy openshift-3 for the other side of the testing. i spent about a day trying to figure out a way to migrate off that
19:39:20 ianw: the fedora 36 image is up and running now though, so we can discover these problems at least? The next steps are flipping the default nodeset labels?
19:39:54 #link https://review.opendev.org/c/zuul/zuul-jobs/+/854047
19:39:59 has details about why that doesn't work
19:41:05 clarkb: yep, nodes are working. i've made the changes to nodesets dependent on the zuul-jobs updates (so we know it at least works there), so have to merge them first
19:41:20 i think they are now fully reviewed, thanks
19:41:23 (the zuul-jobs changes)
19:42:20 ok so mostly a matter of updating base testing and fixing issues that come up
19:43:49 Anything that needs specific attention?
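[editor's note: as a hedged illustration of the nodeset flip discussed above (names here are illustrative; the real definitions live in opendev's base jobs and zuul-jobs), repointing a fedora-latest style nodeset looks roughly like:

    # hypothetical Zuul nodeset update pointing fedora-latest at the new image label
    - nodeset:
        name: fedora-latest
        nodes:
          - name: fedora-36
            label: fedora-36
]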
19:44:49 i don't think so
19:45:15 thanks, unless somebody finds trying to get openshift things running exciting
19:46:18 I've run away from that one before :)
19:46:35 It might not be a terrible idea to start a thread on the zuul discuss list about whether or not the openshift CI is viable
19:46:48 or at least "practical"
19:46:49 and instead start treating it like one of the other nodepool providers that doesn't get an actual deployment
19:47:05 not ideal but would reduce some of the headache for maintaining it
19:48:14 related, there was a suggestion about using larger instances to test it
19:48:25 there's this change which has been pending for a while:
19:49:02 #link https://review.opendev.org/844116 Add 16 vcpu flavors
19:49:13 they also come with more ram, as a side effect
19:49:53 we could do something similar but combine that with the nested-virt labels we use in some providers to get a larger node with nested virt acceleration
19:50:30 we could ...
19:50:59 but it just feels weird that you can't even start the thing with less than 9.6gb of ram
19:51:08 i concur
19:51:10 it would be interesting to know if there are any users of the openshift drivers (vs the k8s driver, given the overlap in functionality).
19:51:58 ++ ... that was my other thought, that this may not even be worth testing like this
19:52:15 (i am aware of people who use the k8s nodepool driver with openshift)
19:52:40 right, but something like minikube might be a better way to do this testing?
19:52:43 I think tristanC may use them. But agreed, starting a thread on the zuul mailing list to figure this out is probably a good idea
19:53:07 yeah, perhaps that is the best place to start, i can send something out
19:53:55 #topic Open Discussion
19:54:05 We are nearing the end of our hour. Anything else before time is up?
19:54:24 i got nothin'
19:54:53 The gitea upgrade appears to have gone smoothly
19:55:03 I wonder if anyone even noticed :)
19:55:49 Monday is technically a holiday here. I'll probably be around but less so (I think I've got a BBQ to go to)
19:56:05 oh, good point. i should try to not be around as much on monday
19:56:29 time to put your white shoes away
19:57:20 * fungi puts on his red shoes and dances the blues
19:58:15 Sounds like that is everything. Thank you everyone
19:58:20 We'll see you back here next week
19:58:24 #endmeeting