clarkb | I'm going to be a minute or three late to the meeting | 19:00 |
---|---|---|
clarkb | I'd like to finish this train of thought with gerrit | 19:00 |
clarkb | #startmeeting infra | 19:02 |
opendevmeet | Meeting started Tue Sep 26 19:02:07 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:02 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:02 |
opendevmeet | The meeting name has been set to 'infra' | 19:02 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JUUNKEA2W2T4IA64DHBIYKUHBXOH7W3D/ Our Agenda | 19:02 |
clarkb | #topic Announcements | 19:03 |
clarkb | OpenStack is going to be making final release candidates this week and the actual Bobcat release should occur next week | 19:04 |
clarkb | please be aware of that as we make changes | 19:04 |
clarkb | #topic Mailman 3 | 19:06 |
clarkb | fungi: should we jump right into planning for the final mailman3 migration? | 19:06 |
fungi | sure, i had a proposed date on the etherpad... just a sec | 19:06 |
fungi | #link https://etherpad.opendev.org/p/mm3migration | 19:07 |
fungi | line 265 | 19:08 |
fungi | tentative maintenance 15:30-19:30 utc on thursday 2023-10-12 (week after release) | 19:08 |
fungi | that's for lists.openstack.org which is the only remaining site on the old server | 19:08 |
clarkb | that date works for me | 19:09 |
fungi | if folks generally find that acceptable i'll announce it on openstack-discuss this week and start putting together a detailed maintenance plan based on the previous 6 sites we've migrated | 19:09 |
frickler | +1 | 19:09 |
fungi | i've started a handful of notes below there on things we'll want to be mindful of for this specific maintenance, either lessons learned from earlier migrations or due to the size of the data | 19:10 |
fungi | i'll take those into account when drafting the full plan | 19:10 |
fungi | aside from that, last week's maintenance went smoothly | 19:11 |
fungi | we've had a couple of requests for clarification on new urls for starlingx mailing lists, but no problems reported | 19:11 |
fungi | sounds like we can proceed with the date and time indicated, not hearing any objections | 19:12 |
fungi | i didn't have anything else on this topic, but am happy to answer questions | 19:13 |
clarkb | #topic Server upgrades | 19:13 |
clarkb | Nothing new to add here | 19:14 |
clarkb | #topic Nodepool Image Build and Upload Status | 19:14 |
clarkb | has anyone checked if the uploads in various clouds (particularly rax-iad?) are looking good since our config changes (ending with the increase of the upload timeout about a week ago) | 19:15 |
fungi | i have not | 19:15 |
frickler | I just checked rax-iad | 19:15 |
frickler | looking as expected, all uploads succeeded at the first attempt | 19:16 |
fungi | yay! | 19:16 |
fungi | i guess we can drop this from the agenda? | 19:16 |
frickler | yes, we will handle inmotion as a different topic I think | 19:16 |
fungi | agreed | 19:16 |
frickler | except maybe check for leftover images one last time? | 19:17 |
clarkb | feel free :) | 19:17 |
frickler | or did you check last week? | 19:17 |
clarkb | I have not done a pass since I last cleaned up the images, which was more than a week ago | 19:17 |
fungi | i don't remember any more, but i can take a look | 19:17 |
clarkb | thanks | 19:20 |
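A rough sketch of how such a spot check could be scripted, assuming the nodepool CLI is available on the host doing the check and that the provider name appears in the `nodepool image-list` table output (this is a crude substring filter, not a parser):

```python
# Rough sketch: spot-check `nodepool image-list` output for uploads to a
# given provider that are not in the "ready" state. Assumes the nodepool
# CLI is installed; filtering is by substring, not by parsing the table.
import subprocess


def check_uploads(provider="rax-iad"):
    out = subprocess.run(
        ["nodepool", "image-list"],
        capture_output=True, text=True, check=True,
    ).stdout
    suspect = [
        line for line in out.splitlines()
        if provider in line and "ready" not in line
    ]
    if suspect:
        print("uploads needing attention:")
        print("\n".join(suspect))
    else:
        print(f"all {provider} uploads look ready")


if __name__ == "__main__":
    check_uploads()
```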
clarkb | #topic OpenMetal | 19:20 |
clarkb | Haven't heard back from Yuriy since I responded last week. Did anyone else see an email that I overlooked? | 19:20 |
clarkb | I probably need to prod him on that | 19:20 |
frickler | nope | 19:21 |
fungi | i did not | 19:22 |
clarkb | ok ya I'll try to remember to send a followup again then | 19:23 |
clarkb | #topic Zuul PCRE regexes are deprecated | 19:23 |
clarkb | I don't think there is anything new to cover on this | 19:23 |
clarkb | But happy to be wrong :) | 19:23 |
frickler | I saw some teams doing patches in reaction to the announcement | 19:24 |
frickler | so IMO we can give it some weeks maybe and then recheck the remainders | 19:24 |
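When rechecking the remainders, one way to spot patterns that RE2 (the engine Zuul is moving to) will reject is to try compiling them with the google-re2 Python bindings; the patterns below are only illustrative, not taken from any real config:

```python
# Sketch: flag regexes that RE2 cannot compile, e.g. ones using
# lookahead assertions. Assumes the google-re2 bindings are installed
# (pip install google-re2); the patterns are purely illustrative.
import re2

patterns = [
    r"^stable/.*$",       # plain pattern, fine under RE2
    r"^(?!stable/).*$",   # negative lookahead, not supported by RE2
]

for pattern in patterns:
    try:
        re2.compile(pattern)
        print(f"ok:       {pattern}")
    except re2.error as exc:
        print(f"rejected: {pattern} ({exc})")
```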
clarkb | sounds good | 19:25 |
clarkb | #topic Python container image updates | 19:25 |
clarkb | #link https://review.opendev.org/q/(topic:bookworm-python3.11+OR+hashtag:bookworm)+status:open | 19:25 |
clarkb | As noted previously we'll defer Gerrit until after the release. As a result I pushed up more changes | 19:25 |
clarkb | reviews welcome. Be aware that zuul/zuul-registry's image doesn't have a straightforward move to bookworm because it currently relies on openssl 1.1 | 19:26 |
clarkb | we may end up leaving that on bullseye for now and keeping a bullseye-based python3.11 image around longer term? | 19:26 |
clarkb | in any case we can sort that out later. Reviews are welcome on the other changes | 19:29 |
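A quick way to confirm which OpenSSL a candidate python base image links against (relevant to whether zuul-registry can move off bullseye) is the stdlib ssl module; the image tag in the comment is only an example:

```python
# Sketch: print the OpenSSL version the interpreter is linked against.
# Run inside a candidate container image, for example:
#   docker run --rm python:3.11-slim-bookworm \
#       python3 -c "import ssl; print(ssl.OPENSSL_VERSION)"
# (the image tag is just an example) to see whether it provides
# OpenSSL 1.1 or 3.x.
import ssl

print(ssl.OPENSSL_VERSION)       # full version string
print(ssl.OPENSSL_VERSION_INFO)  # tuple form, first element is the major version
```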
clarkb | #topic Etherpad 1.9.3 Upgrade | 19:29 |
fungi | yeah, that's an unfortunate situation | 19:29 |
clarkb | fungi: want to fill us in on where this ended up yesterday? | 19:30 |
fungi | #link https://review.opendev.org/896454 Upgrade Etherpad to 1.9.3 | 19:30 |
fungi | clarkb and i tested a held upgraded node yesterday | 19:30 |
fungi | i didn't observe any problems, though i think you noticed some weirdness with something cached in chrom*? | 19:31 |
clarkb | ya my chrome browser kept reconnecting to the etherpad | 19:32 |
fungi | the changelog is short and doesn't seem to touch anything our deployment should care about | 19:32 |
clarkb | well and before I did a hard refresh it got an error about some unresolved symbol | 19:32 |
clarkb | but switching into incognito mode made the problems go away so almost certainly something was cached badly | 19:32 |
frickler | sounds like a 3rd test might be helpful to decide how reproducible this is | 19:34 |
frickler | I'll try to do that this week, then | 19:34 |
clarkb | frickler: ++ if anyone other than fungi or myself has time for that. the clarkb-test pad is where we were testing and it has content already | 19:34 |
clarkb | #topic Gitea 1.21.0 Upgrade | 19:35 |
clarkb | Gitea 1.21.0-rc0 is out now | 19:35 |
clarkb | There is no changelog yet so this isn't urgent, but in the past I've tried to push up an upgrade testing change once RCs are available in order to start working through problems | 19:36 |
clarkb | It would probably be good for someone other than myself to have a go at it. Any interest in doing a 1.21.0 upgrade? | 19:36 |
clarkb | The typical process I use is 1) check if go and/or nodejs need to be updated 2) update our templates to keep them in sync with upstream changes 3) sort out any other items called out in the changelog | 19:37 |
clarkb | 3) is what usually takes the most time as you need to understand changes that intersect with our use cases and occasionally write tests to cover the updates | 19:38 |
clarkb | but even just having 1 and 2 done can help rule out a bunch of stuff pretty quickly in our test system | 19:38 |
clarkb | if there is interest say something in #opendev so that we don't end up doing duplicate work | 19:39 |
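For step 1 of that process, a hedged sketch of checking which go toolchain a given Gitea tag builds with, by fetching the upstream Dockerfile for that tag; the tag name and URL layout are assumptions based on the go-gitea/gitea GitHub repository, and the nodejs requirement lives elsewhere in the tree:

```python
# Sketch: fetch the upstream Gitea Dockerfile for a given tag and print
# its FROM lines, which name the golang base image the release builds
# with. Tag and URL layout are assumptions based on go-gitea/gitea.
import urllib.request

TAG = "v1.21.0-rc0"  # assumed tag name for the rc mentioned above
URL = f"https://raw.githubusercontent.com/go-gitea/gitea/{TAG}/Dockerfile"

with urllib.request.urlopen(URL) as resp:
    dockerfile = resp.read().decode("utf-8")

for line in dockerfile.splitlines():
    if line.startswith("FROM"):
        print(line)
```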
clarkb | #topic Gerrit Replication Task Leaks Fix | 19:40 |
clarkb | #link https://gerrit-review.googlesource.com/c/plugins/replication/+/387314 | 19:40 |
clarkb | I've written a change that, when tested by hand on a held node, seems to do what we want. Somewhat predictably the gerrit maintainers have asked me to write test cases though. I've spent the morning doing my best to pretend I understand what is going on there and the end result is my latest comment | 19:41 |
clarkb | tl;dr I think I have two test cases that are very close but haven't figured out how to run them locally yet, and a third is basically missing a major piece of testing in the replication plugin (replicating when permissions say no) and I am currently completely lost in Gerrit's internal models for permissions | 19:42 |
clarkb | I'm hopeful we can get that merged by the time we want to restart Gerrit though so that we can fix the problem for good | 19:42 |
clarkb | #topic PTGBot Webserver crashing | 19:45 |
clarkb | seems to be running right now. Did anyone have to restart it since we restarted it yesterday? | 19:45 |
frickler | not me | 19:46 |
fungi | i don't see any indication it's been restarted | 19:46 |
fungi | Sep 25 19:20:43 eavesdrop01 docker-ptgbot[646]: DEBUG:root:Debugging on | 19:46 |
fungi | that was the last thing it logged | 19:46 |
fungi | which is when i started it yesterday | 19:47 |
frickler | ps also says running since yesterday | 19:47 |
fungi | yeah, so i think it has not crashed (yet anyway) | 19:47 |
clarkb | ok so no new information to look at there. Did we figure out if the webserver logging was broken? | 19:48 |
fungi | i think it's just not instrumented for request logging | 19:48 |
fungi | it's logging an explicit line proving that debug logging was enabled at start | 19:48 |
fungi | i have no idea if it's sufficient to log exceptions/tracebacks though | 19:49 |
frickler | so that'll need some further testing once someone has time, but it doesn't seem urgent for now | 19:50 |
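Purely as an illustration, and assuming the web component is WSGI-based (which may or may not match ptgbot's actual implementation), per-request logging can typically be added with a small middleware like the following; none of these names come from ptgbot itself:

```python
# Illustrative only: a minimal WSGI middleware that logs method, path
# and response status for each request. This is NOT ptgbot's actual
# code; it only shows the kind of request logging that could be added
# if the web component speaks WSGI.
import logging

log = logging.getLogger("webserver.access")


class RequestLogger:
    """Wrap a WSGI application and log each request it serves."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def _start_response(status, headers, exc_info=None):
            log.info("%s %s -> %s",
                     environ.get("REQUEST_METHOD"),
                     environ.get("PATH_INFO"),
                     status)
            return start_response(status, headers, exc_info)

        return self.app(environ, _start_response)
```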
clarkb | ok sounds good | 19:50 |
clarkb | #topic Open Discussion | 19:50 |
clarkb | Anything else? | 19:50 |
fungi | i did check for leaked nodes in rax-iad and found 19 | 19:51 |
frickler | nodes or images? | 19:51 |
fungi | not sure how old they are | 19:51 |
fungi | sorry, images | 19:51 |
fungi | one i just looked at was from 2023-09-01 | 19:52 |
fungi | so they might not be very recent | 19:52 |
frickler | so that's before the timeout bump | 19:52 |
clarkb | ~2023-09-18 is when we got the config where we wanted it | 19:52 |
clarkb | I would probably clean anything before the 18th up and anything after can be subject to debugging | 19:52 |
fungi | looping through to get dates now | 19:54 |
fungi | most are from 2023-08-30 and 2023-09-01 | 19:54 |
clarkb | ya so possibly timed out uploads that didn't clean up properly | 19:54 |
fungi | i think 2023-09-13 is the most recent | 19:55 |
fungi | so all from before the change | 19:55 |
fungi | i think that indicates it's been successful, and we can go ahead and mop up these remnants | 19:55 |
fungi | i'll delete them now | 19:55 |
clarkb | ++ | 19:55 |
frickler | ack | 19:56 |
fungi | and now that's done too | 19:57 |
frickler | next tuesday is a bank holiday here, so I might not be around | 19:58 |
clarkb | enjoy the day off | 19:58 |
clarkb | I don't think we have any holidays here until november | 19:58 |
fungi | also corvus and i looked closely at one and they're completely missing metadata because the sdk adds the metadata after the image import tasks complete, so we can't use metadata to indicate they're safe to clean up | 19:58 |
fungi | though maybe we could look for a complete lack of metadata, i dunno | 19:59 |
clarkb | and we are just about at time. Thank you everyone. We'll be back next week | 19:59 |
clarkb | fungi: complete lack of metadata is normal for user uploads which is the risk there | 19:59 |
fungi | right, exactly | 19:59 |
clarkb | you might delete something someone has uploaded iirc | 19:59 |
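A hedged sketch of the kind of cleanup pass discussed above, using openstacksdk; the cloud and region names, the cutoff date, the assumption that finished nodepool uploads carry nodepool_* properties, and the assumption that openstacksdk exposes custom image metadata via image.properties are all placeholders, and per the caveat about user uploads it only reports candidates rather than deleting anything:

```python
# Sketch: list images in a provider/region that look like leaked
# nodepool uploads -- older than the config fix and missing the
# nodepool_* metadata that finished uploads normally carry. Cloud name,
# region, cutoff date and the nodepool_ prefix are assumptions; images
# with no metadata at all could be legitimate user uploads, so this
# only reports candidates for manual review.
from datetime import datetime, timezone

import openstack

CUTOFF = datetime(2023, 9, 18, tzinfo=timezone.utc)  # assumed cutoff

conn = openstack.connect(cloud="rax", region_name="IAD")  # placeholder names

for image in conn.image.images():
    created = datetime.fromisoformat(image.created_at.replace("Z", "+00:00"))
    props = image.properties or {}
    has_nodepool_meta = any(k.startswith("nodepool_") for k in props)
    if created < CUTOFF and not has_nodepool_meta:
        print(f"candidate leak (manual review): {image.name} ({created.date()})")
```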
clarkb | #endmeeting | 19:59 |
opendevmeet | Meeting ended Tue Sep 26 19:59:42 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:59 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-09-26-19.02.html | 19:59 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-09-26-19.02.txt | 19:59 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-09-26-19.02.log.html | 19:59 |
corvus | yeah, works for us manually, but not automatable | 19:59 |