19:00:32 <clarkb> #startmeeting infra
19:00:32 <opendevmeet> Meeting started Tue Oct 31 19:00:32 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:32 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:32 <opendevmeet> The meeting name has been set to 'infra'
19:00:39 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/IILSVTDAEDTRTCRSZZ3P2UKY4CIOKUEY/ Our Agenda
19:00:53 <clarkb> #topic Announcements
19:01:13 <clarkb> I announced the gerrit 3.8 upgrade for November 17 and fungi announced a mm3 upgrade for Thursday
19:01:49 <fungi> short notice, i mainly just didn't want ui changes and logouts to surprise anyone
19:02:24 <clarkb> Also worth noting that November 23 is a big US holiday which may mean people have varying numbers of days off around then
19:02:41 <clarkb> other than that I didn't have any announcements
19:02:50 <clarkb> #topic Mailman 3
19:02:58 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/899300 Upgrade to latest mm3 suite of tools
19:03:08 <clarkb> this is the change that corresponds to the mailman3 upgrade fungi announced
19:03:11 <fungi> as of moments ago the remaining cleanup changes merged
19:03:19 <clarkb> there were two other cleanup changes but ya they have both merged now
19:03:41 <fungi> so, yes, and yes, 899300 is the upgrade for thursday
19:04:03 <clarkb> outside of those changes we still need to snapshot and cleanup the old server at some point
19:04:15 <clarkb> and I've left the question of whether or not we add MX records on the agenda
19:04:31 <clarkb> I'm somewhat inclined to leave things alone in DNS since this seems to be working and is simpler to manage
19:04:51 <tonyb> yeah they shouldn't be needed.
19:05:39 <fungi> down the road we can revisit things like spf and dkim signing if they become necessary, but i'd rather avoid them as long as we can get away with
19:05:57 <tonyb> ++
19:06:02 <clarkb> fungi: both google and yahoo have been making statements about requiring that stuff early next year...
19:06:11 <clarkb> but ya lets worry about it when we get more concrete details
19:06:24 <clarkb> meanwhile the vast majority of spam I receive comes from gmail addresses
19:06:36 <clarkb> Anything else mailing list related?
19:07:00 <fungi> nope
19:07:07 <clarkb> #topic Server Upgrades
19:07:23 <clarkb> tonyb started looking at jammy mirror testing
19:07:24 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/899710
19:07:44 <tonyb> it's currently failing due to a change in behaviour in curl
19:07:45 <clarkb> this appears to be failing but I think for a test framework reason, not necessarily because the jammy deployment is failing. I don't fully understand the failure though
19:07:54 <clarkb> oh is it curl that changed? fun
19:07:58 <tonyb> yeah
19:08:24 <tonyb> a command line that works on focal fails on newer curls
19:08:43 <tonyb> I'm looking at what the right fix is.
19:08:54 <clarkb> tonyb: I think what the goal is there is to set SNI stuff properly so that we get the correct responses?
19:09:06 <tonyb> I'll propose a testing fix first and then add the jammy server
19:09:07 <clarkb> otherwise using localhost we don't align with the apache vhost matching
19:09:45 <fungi> i suppose an alternative could be to fiddle with /etc/hosts on the node, but that feels... dirty
19:10:12 <tonyb> yeah. that makes sense. we could probably do something in python itself
19:10:31 <tonyb> it should be a quick fix once I get back to my laptop
19:10:40 <clarkb> looking forward to it
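For context on the curl/SNI point above, a minimal sketch of the kind of invocation being discussed; the mirror hostname, IP, and path are placeholders, not the actual test framework command. curl's --resolve option pins the real hostname to a local address, so the request presents the correct SNI and Host header for Apache's vhost matching instead of "localhost":

  # Hypothetical check against a locally-running Apache vhost.
  # https://localhost/ would send SNI/Host of "localhost" and miss the mirror
  # vhost; --resolve keeps the real name while forcing the connection to
  # 127.0.0.1, which newer curl handles consistently.
  curl --fail --silent --show-error \
       --resolve mirror.example.opendev.org:443:127.0.0.1 \
       https://mirror.example.opendev.org/ubuntu/dists/jammy/Release

An /etc/hosts entry would achieve the same mapping, but --resolve keeps the override scoped to the single command.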
19:11:10 <clarkb> I am not aware of any other updates but wanted to mention we are also in a good period of time to look at meetpad server replacements since the PTG just ended
19:11:22 <tonyb> at some point we should decide if we need testing for 3 Ubuntu releases
19:11:44 <tonyb> meetpad is next
19:11:54 <clarkb> tonyb: in general I think we're trying to align with all the things we're deploying. As we replace servers and reduce the total list of mirrors we can reduce the ubuntu flavors
19:12:11 <clarkb> tonyb: the main concern there is apache version differences (which haven't been a problem in more recent years) and openafs functionality
19:12:17 <tonyb> okay that's pretty much what I thought
19:12:40 <fungi> yeah, basically we want to test that the changes we make to stuff continue to work on the platforms we're currently running on, and then once we're not running on those platforms we can stop testing them
19:12:56 <tonyb> ++
19:13:03 <fungi> on a service-by-service basis
19:13:18 <clarkb> once upon a time we tried to be more generally compatible with people doing similar to us outside of our env but realized it was too much effort and should focus on our set of things
19:13:30 <fungi> so basically, if we upgrade the meetpad servers to jammy, we can then switch to only testing that meetpad deployments work on jammy
19:14:19 <clarkb> #topic Python Container Updates
19:14:27 <clarkb> #link https://review.opendev.org/q/(+topic:bookworm-python3.11+OR+hashtag:bookworm+)status:open
19:14:38 <clarkb> this is very close to the finish line (as much as there is one)
19:14:52 <clarkb> python 3.9 is gone and the current TODOs are to update zuul-operator and OSC to python3.11
19:15:10 <clarkb> OSC should merge its change soon I expect as openstack is voting on python3.11 jobs now which makes switching the image to python3.11 safe
19:15:27 <clarkb> on the zuul-operator side of things the CI jobs there are all unhappy and I'm not quite sure the scope of the necessary fixes yet
19:15:44 <clarkb> I was hoping zuul-operator users would get it sorted soon enough but I may need to help out
19:15:51 <clarkb> once that is done we can drop python3.10 image builds
19:15:58 <tonyb> yay
19:16:29 <clarkb> I've also got a change up to add python 3.12 images but that is failing because uwsgi doesn't support python3.12 yet.
19:16:43 <clarkb> I think we can wait for them to make a release that works (there is upstream effort to support newer python but not yet in a release)
19:16:44 <tonyb> a quick tangent, I think it'd be good to remove old images/tags from the public registry
19:17:04 <tonyb> leaving buster based 3.7 images feels dangerous?
19:17:28 <clarkb> maybe? openshift recently broke zuul's openshift functional testing because they deleted old images
19:17:39 <fungi> sounds like a refcounting challenge
19:17:47 <tonyb> I could generate a list of things we could tag as deprecated and pull later
19:17:54 <clarkb> there is definitely a tradeoff. I think if someone is using an image for testing its fine, but you're right you wouldn't want it in production
19:18:07 <clarkb> maybe retag as foo-deprecated
19:18:17 <tonyb> fair enough.
19:18:19 <fungi> foo-dangerous
19:18:21 <clarkb> then people have an out but it also makes it more apparent if something should not be used
19:18:31 <tonyb> yeah that's sort of what I was thinking
19:18:32 <clarkb> I think that would be my preference over proper deletion
19:18:42 <fungi> foo-if-it-breaks-you-get-to-keep-the-pieces
19:19:09 <tonyb> that's all I had
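As a rough sketch of the retag-rather-than-delete idea discussed above; the image name, registry, and "-deprecated" suffix are illustrative, not an agreed convention:

  # Illustrative only: copy the existing tag to a clearly-marked name so
  # consumers get an obvious signal while the content stays available.
  docker pull docker.io/example/python-builder:3.7-buster
  docker tag docker.io/example/python-builder:3.7-buster \
             docker.io/example/python-builder:3.7-buster-deprecated
  docker push docker.io/example/python-builder:3.7-buster-deprecated
  # The original tag could then be removed later via the registry's UI or API
  # once it is clear nothing still depends on it.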
19:19:18 <clarkb> #topic Gitea 1.21
19:19:25 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/897679
19:19:58 <clarkb> still no proper release and no changelog
19:20:39 <clarkb> I have tried to keep up with their updates though and it generally works for us other than the ssh key size check thing that I disabled in that change
19:20:56 <clarkb> I've left this on the agenda under the assumption we'll have to make decisions soon but upstream hasn't made that the case yet
19:21:08 <clarkb> #topic Gerrit 3.8 Upgrade
19:21:17 <clarkb> This is one with a bit more detail
19:21:19 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.8
19:21:24 <clarkb> #link https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/XT26HFG2FOZL3UHZVLXCCANDZ3TJZM7Q/
19:21:37 <clarkb> I have announced the plan to upgrade Gerrit to 3.8 on November 17 at 15:30-16:30 UTC
19:21:54 <clarkb> I've tested the downgrade path on a held CI node and then re-upgraded it for the experience of it
19:22:32 <clarkb> Yesterday we merged a config update necessary for 3.8 that we'll want to have in place under 3.7 to ensure it is working there as well. My plan is to restart Gerrit later today
19:22:55 <clarkb> this config update shouldn't result in any behavioral differences. It is entirely about maintaining compatibility of acceptable config in gerrit 3.8
19:23:30 <clarkb> fungi: if I want to do that restart around say 22:00 UTC is that a bad time for you?
19:23:46 <clarkb> fungi: maybe better to ask if there is a good time for you later today?
19:24:28 <fungi> sure, i can help at 22:00 utc
19:24:52 <clarkb> cool I think that time should work for me
19:24:56 <fungi> great
19:25:36 <clarkb> the last thing I've noticed is a traceback starting up the plugin manager plugin. Upstream thought they had already fixed it which made me concerned this was a problem with our builds but on closer inspection it seems to be a different problem (tracebacks differ)
19:25:46 <clarkb> also we hit it on 3.7 too so shouldn't impact 3.8
19:25:57 <fungi> so basically not a regression
19:26:07 <clarkb> more of a thing to be aware of as an expected startup traceback that looks scary but is believed to be fine
19:26:07 <fungi> just some continued broken for a feature we're not using
19:26:11 <clarkb> yup
19:26:51 <clarkb> and that was all I had. The etherpad has pointers to the held node if anyone wants to take a look at it
19:27:03 <clarkb> #topic Etherpad 1.9.4
19:27:26 <fungi> in progress
19:27:34 <clarkb> in the time it took us to be ready to upgrade to 1.9.3 they released 1.9.4. Fun fact: 1.9.4 fixes the mysql isn't utf8mb4 encoded bug I filed with them years ago
19:27:35 <fungi> i need to finish diffing the upstream container configs
19:27:53 <clarkb> we worked around that by manually setting the encoding on the db but before that etherpad hard crashed because it couldn't log in this instance
19:28:37 <tonyb> was that the poo emoji crash from like Vancouver?
19:28:44 <fungi> snowman
19:28:53 <fungi> but yes
19:28:56 <tonyb> okay
19:28:58 <clarkb> tonyb: no, this was on the db level not the table level. They had fixed the table level thing prior
19:29:02 <clarkb> it's all related though.
19:29:19 <clarkb> In this case they wanted to log "warning this is probably a problem" but their logging was broken so the whole thing crashed
19:29:28 <clarkb> rather than bad bytes causing the crash later
19:29:36 <tonyb> lol
19:29:37 <fungi> ah, right, that problem
19:30:00 <clarkb> fixing the db level encoding meant it never tried to log and things proceeded :)
19:30:11 <fungi> also related, update to log4js which invalidates some of the config we're carrying, preventing the service from starting, which is why i need to more deeply diff the configs
19:30:11 <clarkb> fungi: I guess once we have an updated change we'll hold a node and do another round of testing
19:30:40 <fungi> correct
19:30:41 <tonyb> sounds good.
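For reference on the encoding discussion above, a minimal sketch of the database-level workaround being described; the database name and collation are assumptions, and etherpad 1.9.4 is expected to make the manual step unnecessary:

  # Hypothetical names; run against the etherpad MySQL/MariaDB instance.
  # The tables were already utf8mb4, but the database-level default was not,
  # which is what etherpad's startup check complained about.
  mysql -e "SHOW CREATE DATABASE etherpad;"
  mysql -e "ALTER DATABASE etherpad CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;"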
19:31:05 <clarkb> #topic Open Discussion
19:31:10 <clarkb> that was it for the emailed agenda
19:31:29 <clarkb> worth noting we just updated nodepool to exclude openstacksdk 2.0.0 as it isn't compatible with rax cinder v2 apis
19:31:44 <clarkb> a fix is in progress in openstacksdk which frickler and I mentioned we could help test
19:32:08 <clarkb> this effectively took rax offline in nodepool for a few days. It also causes nodepool to not mark nodes as node failures when a cloud is failing like that
19:32:31 <clarkb> I kinda want to make nodepool fail the request in that cloud when the cloud is throwing errors rather than try forever
19:33:10 <tonyb> that seems like it'd be more visible
19:33:27 <fungi> the openstack vmt would like a private room on the opendev matrix homeserver to use instead of its current restricted irc channel, since some members are joining from the oftc matrix bridge which doesn't handle nickserv identification very well. i doubt there will be any objections, but... objections? otherwise i'll work on adding it
19:33:29 <frickler> it failed very early, kind of a similar scenario to the expired cert issue, which also could use better handling
19:33:56 <clarkb> frickler: ya I think they have the same underlying failure method for request handling internally with nodepool which is to basically move on and then the request is never completed
19:34:43 <clarkb> fungi: no objections from me. Worth noting private and encrypted are distinct in matrix so you'll have to decide on those two things separately iirc
19:34:55 <fungi> yeah, it'll be both in this case
19:35:00 <clarkb> private is basically invite only and then encrypted is whether or not everyone is doing e2e amongst themselves
19:35:31 <frickler> afaict even then some things are not encrypted like emojis
19:35:54 <frickler> but not worse than IRC likely, so no objection either
19:36:20 <tonyb> thanks fungi
19:36:33 <fungi> the vmt uses its private communication channel only for coordinating things which can't be mentioned in public (and even then it's just things like "i triaged this private bug, please take a look: <url>" so emojis rarely come into it ;)
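On the private-versus-encrypted distinction above, a rough sketch of what such a room looks like at the Matrix client-server API level; the homeserver URL, token, and user ID are placeholders, and in practice the room would likely just be created from a client UI. The preset makes the room invite-only, while the m.room.encryption state event is what actually turns on end-to-end encryption:

  # Placeholders throughout; shown only to illustrate the two separate settings.
  curl -s -X POST "https://matrix.example.org/_matrix/client/v3/createRoom" \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
          "name": "Example private room",
          "preset": "private_chat",
          "invite": ["@someone:example.org"],
          "initial_state": [
            {"type": "m.room.encryption", "state_key": "",
             "content": {"algorithm": "m.megolm.v1.aes-sha2"}}
          ]
        }'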
19:37:39 <clarkb> re Holidays the 10th is also a holiday here and I'm taking advantage for a long weekend. I won't be around on the 10th and 13th
19:38:01 <fungi> i'll try to be around
19:38:07 <frickler> one question regarding the branch deletion I did for kayobe earlier: the github mirror should sync this on the next merged change, right?
19:38:21 <fungi> correct
19:38:42 <fungi> that job only gets triggered by changes merging, so addition/deletion of branches or pushing tags doesn't replicate immediately
19:39:16 <frickler> ok, so we'll wait for that to happen and then can check again
19:39:39 <fungi> it could probably be added to additional pipelines, if that becomes a bigger problem
19:40:22 <frickler> I don't think it is urgent in this case, just wanted to cross that check off my list
19:41:31 <clarkb> last call for anything else. Otherwise we can all have a few minutes back for $meal or sleep
19:43:12 <clarkb> thank you for your time and help everyone! We'll be back here same time and place next week.
19:43:17 <clarkb> #endmeeting