19:00:28 #startmeeting infra
19:00:28 Meeting started Tue Jan 14 19:00:28 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:28 The meeting name has been set to 'infra'
19:00:35 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/3WNI573NTZ5VYPTQ5DBQVKXVJSC2QTGB/ Our Agenda
19:00:41 #topic Announcements
19:01:01 The OpenInfra foundation individual board member election is happening right now. It ends at 1900 UTC Friday
19:01:17 this is a good time to check your email inbox for your ballot if you are a foundation member
19:02:10 and if you did get one please vote. There is a great list of candidates
19:02:25 Anything else to announce?
19:03:12 i'm going to be around at odd hours for the next week, just a heads up
19:03:19 (travelling tomorrow through monday)
19:03:44 i'll still be back in time for our thing starting on tuesday though
19:04:21 thanks for the heads up
19:04:35 #topic Zuul-launcher image builds
19:04:41 #link https://review.opendev.org/q/hashtag:niz+status:open is the next set of work happening in Zuul
19:05:04 I believe this set of changes is next up. And I believe that was hung up on reliability of docker images, which should be improving now that we're mirroring images
19:05:19 corvus: is now a good time to start looking into that or does it still need to settle a bit?
19:05:48 i'm about to refresh the change to switch to using quay images in zuul
19:05:58 so i think things are settled enough now to start merging changes
19:06:03 great!
19:06:09 i'll wait until that stack is fully reviewed before i approve them
19:06:17 (i think that's an all-or-none stack)
19:06:21 I'll try to take a look at them before too long
19:06:38 thanks, i'd love to merge them this week
19:07:01 stack has +2s on half of it, so i think it's plausible
19:07:04 anything else on this topic?
19:07:15 nope
19:07:23 #topic Upgrading old servers
19:07:35 tonyb isn't around today so don't expect any updates on the wiki work
19:07:47 given that, I think we can jump straight into the noble and podman status
19:07:56 #topic Deploying new Noble Servers
19:08:20 Last week I discovered that the Noble image tonyb uploaded everywhere didn't actually make it into rax. I suspect that the ansible playbook using the openstack modules isn't capable of uploading to rax
19:08:47 I manually uploaded the vhd image tonyb created in his bridge homedir using openstacksdk, which is capable of uploading to rax
19:09:14 did you upload it to all regions or just dfw?
19:09:16 During testing of those images I discovered that OSC cannot boot instances with network addresses in rax, but again using the sdk directly worked (launch node uses the sdk)
19:09:18 fungi: all three regions
19:09:21 thanks!
19:09:48 then yesterday I booted a replacement paste server on top of this image. Deployment of this then got hung up on docker rate limits
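
For reference, a rough sketch of the kind of direct-SDK image upload described above; the cloud name, image name, file path, and container format here are placeholders and assumptions rather than the exact values used:

    # Upload a VHD image to each Rackspace region with openstacksdk directly,
    # since the Ansible openstack modules and OSC had trouble with rax here.
    import openstack

    REGIONS = ("DFW", "ORD", "IAD")

    for region in REGIONS:
        # "rax" is a placeholder clouds.yaml entry name
        conn = openstack.connect(cloud="rax", region_name=region)
        conn.create_image(
            "ubuntu-noble",                  # placeholder image name
            filename="/path/to/noble.vhd",   # placeholder path to the converted VHD
            disk_format="vhd",
            container_format="ovf",          # assumption about what rax expects for VHDs
            wait=True,
        )
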
19:10:12 #link https://review.opendev.org/c/opendev/system-config/+/939124 deploy paste02 on noble
19:10:26 the parent change of ^ switches paste to pulling mariadb:10.11 from quay instead of docker hub to work around that
19:11:00 if we're comfortable doing that for production deployments it would be great to get those changes in so we can start on migrating the paste data and get some real world feedback on podman on noble
19:11:14 interestingly, we also needed to use configdrive with the noble image tony built, but not with the jammy image rackspace supplies
19:11:47 oh right. The image is from ubuntu upstream's qcow2 converted to vhd aiui and the cloud-init there needs config drive (I'm guessing rax's metadata service isn't compatible?)
19:12:10 the other thing we discovered is that podman on noble will only start containers on boot if they are marked restart:always
19:12:15 might be down to a difference in cloud-init version, yeah. it's definitely installed at least
19:12:48 this is a behavior change from docker, which would restart under other situations as well (though that restarting wasn't super consistent due to spontaneous reboots putting containers in a failed state)
19:13:17 anyway, long story short: reviews on the quay.io source for mariadb:10.11 would be great so that we can proceed if comfortable with that
19:13:26 and we'll see if we learn more new things afterwards
19:13:36 +2 from me i say +w at will
19:13:36 i'm almost certain we will
19:13:50 excellent thanks
19:13:52 #topic Mirroring Useful Container Images
19:14:14 which takes us to a followup on this. Indications so far are that this will be super helpful. Thank you corvus for pushing this along
19:14:28 nice segue
19:14:37 \o/ whew! i hope it works!
19:14:49 yes, thank you!!!
19:14:55 currently the list of tags is somewhat limited so I had to add 10.11 to mariadb yesterday. If you see or think of other tags we should add then adding them early helps keep things moving along
19:15:02 #link https://review.opendev.org/c/opendev/system-config/+/939169
19:15:08 the change to do that is simple for an existing image ^
19:15:22 i also mentioned during the openstack tc meeting a few minutes ago that we're making some progress on a reusable pattern for this sort of solution
19:15:28 also, protip, if adding a new image+job, don't forget to add the new job to the project pipeline config. ;)
19:15:43 the other thing I noticed this morning in checking that the 10.11 was mirrored properly is that we get different manifests due to different arches
19:16:13 corvus: I don't know if there is a way to have docker or skopeo etc copy all available arch images in the manifest?
19:16:29 that might result in over mirroring but would make checking status simpler and avoid problems with arm for example
19:16:33 yes i think there is -- but isn't everything x86 right now?
19:16:43 corvus: yes everything but nodepool is x86 in opendev
19:17:03 like, this should/will be a problem once we do something with arm...
19:17:08 so I think we're most likely to hit this when building nodepool images that try to fetch the python base image from quay?
19:17:21 since I think we're only mirroring x86 right now
19:17:27 and maybe that is a non-issue due to niz?
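
A minimal sketch of the all-architectures copy being asked about here, using skopeo's --all flag (the flag corvus recalls at 19:19:51 below); the repository names are illustrative rather than the actual mirror configuration:

    # Copy every image referenced by a manifest list (all architectures plus
    # the list itself) instead of only the local platform's image.
    import subprocess

    SOURCE = "docker://docker.io/library/mariadb:10.11"
    DESTINATION = "docker://quay.io/example-mirror/mariadb:10.11"  # placeholder

    subprocess.run(["skopeo", "copy", "--all", SOURCE, DESTINATION], check=True)
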
19:17:32 but for the moment, we're mirroring on x86 nodes and using them on x86 nodes, so it shouldn't be an immediate problem
19:17:32 okay sounds like we're on the same page
19:17:43 it's certainly a low priority for me due to niz :)
19:17:49 yup I just wanted to call it out as something I noticed. It's not impactful yet
19:18:19 if we ever end up containerizing anything on the mirror servers in our cloud providers, i could see it coming up
19:18:30 if someone does want to address that, i would look into existing patterns in zuul-jobs first because i'm pretty sure we have some "all arches" stuff in there somewhere with skopeo or similar.
19:18:31 but yeah, nothing on the horizon
19:18:32 I was able to confirm the 10.11 images matched between quay and docker hub by fetching them locally. A docker image list showed them with the same id and docker inspect collapses them both into the same json output
19:18:57 just in case anyone else needs to do that later ^ seems to work
19:19:24 any other questions, concerns, or feedback with the mirrored images?
19:19:51 i think --all is the arg to skopeo
19:20:10 pipelines... do we run the mirror routine in promote/post in addition to periodic?
19:20:27 fungi: no, currently it is only periodic
19:20:39 or i guess, what i meant to ask is should we?
19:20:40 but it is also primarily for images we don't produce
19:20:45 I think the only exception to that is the base python images
19:20:54 so we could potentially run the mirroring for those as part of promote/post
19:21:01 we won't need to once we switch canonical locations to quay
19:21:06 true
19:21:15 which depends on the noble upgrade. so there's a window where this might be useful
19:21:20 well, more for updates to system-config which add a new image, we might want to merge another change to use it from that location without waiting a day
19:21:41 oh I see
19:21:56 ya I think that could be optimized
19:22:10 mainly pondering how we might be able to iterate more quickly on switching image locations or updating versions
19:22:29 i think we could do both things; should look at file matchers for general images in system-config. can just chain jobs for python-base updates.
19:22:34 i suppose it wouldn't just be on adding a new image but also adding different versions when we're pinning specific ones
19:22:43 corvus: ++
19:22:50 yeah, that should work
19:22:57 fun trick btw:
19:23:08 add file matchers that never match anything, then the job only runs on job config updates.
19:23:14 hacks
19:23:21 heh
19:23:25 that's where i'd start for the system-config general new image problem
19:23:43 will the file matchers get ignored in periodic though?
19:23:49 or will that stop us from running in periodic?
19:23:57 maybe we need a child job for post/promote and that hack
19:24:07 i would only add them in a project-pipeline variant
19:24:12 aha
19:24:18 yeah, that seems cleaner
19:24:20 so not to the job, just to the invocation of it on system-config
19:24:32 in post or deploy or whatever
19:24:36 yup makes sense
19:24:54 that way the funky nonsensical overrides are all in one spot and become more obvious/don't get forgotten
19:25:05 fungi: is that something you might be interested in pushing up?
19:25:26 i want to say yes, but i'm not sure i'd get to it before next week with other stuff currently on my plate
19:25:35 ack, it's not urgent. Let's see where we end up
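
As a reference for the local comparison described at 19:18:32 above, a rough sketch of pulling both copies and checking that docker assigns them the same image ID; the quay organization is a placeholder for wherever the mirror actually publishes:

    # Pull the upstream and mirrored images, then compare the local image IDs.
    # Matching IDs mean the mirrored copy is the same image even when the
    # registry-side manifests differ (e.g. the mirror only carries x86).
    import subprocess

    IMAGES = (
        "docker.io/library/mariadb:10.11",
        "quay.io/example-mirror/mariadb:10.11",  # placeholder mirror location
    )

    def image_id(ref):
        """Pull an image and return the ID docker assigns to it locally."""
        subprocess.run(["docker", "pull", ref], check=True)
        result = subprocess.run(
            ["docker", "image", "inspect", "--format", "{{.Id}}", ref],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()

    ids = {ref: image_id(ref) for ref in IMAGES}
    print("match" if len(set(ids.values())) == 1 else "differ", ids)
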
19:25:36 so happy to if nobody else beats me to it
19:26:29 #topic Gerrit 3.10.4
19:26:36 Gerrit made new bugfix releases
19:26:50 which seems like a good time to also bundle in the h2 compaction timeout increase change
19:26:57 #link https://review.opendev.org/c/opendev/system-config/+/939167 New bugfix releases available
19:27:03 #link https://review.opendev.org/c/opendev/system-config/+/938000
19:27:30 fungi: ^ maybe if you get a quiet time this week during travel we can sit down and land those and do a couple of restarts
19:27:41 sure!
19:28:01 I don't think the bugfixes are super urgent. But they do look like good ones (including an NPE fix)
19:28:38 #topic Running certcheck on bridge
19:29:23 the certcheck tool on cacti uses a script that is a bit old and doesn't support SNI. If we move the check to running on bridge we get two things: the latest version with SNI support, and a better match for how we test this in our CI jobs (since we always have a bridge, it runs there I guess)
19:29:34 i guess in retrospect, the three of us discussing this yesterday are the only ones in today's meeting, so deferring the decision was sorta pointless ;)
19:29:44 fungi wanted to bring this up to check if there were any concerns with adding extra tools like this to the bastion host
19:30:14 I think the concern is valid but the risk in checking a tls connection once per day to a set of services that we (mostly) run seems low
19:30:24 I think I would be ok with moving this to bridge
19:30:28 seems like we established yesterday that none of the three of us had concerns, so i guess let's plan to move it
19:30:35 ++
19:30:45 sounds good!
19:31:03 #topic Service Coordinator Election
19:31:14 i have a wip change to install it from a distro package instead of from git, i'll try to find time to amend that soon to move it off cacti and onto bridge
19:31:48 I haven't seen or heard any objections to the proposal I made last week for election plans
19:32:07 proposal: Nominations Open From February 4, 2025 to February 18, 2025. Voting February 19, 2025 to February 26, 2025. All times are UTC
19:32:22 Considering that, I'll plan to make this official via email to the service-discuss list shortly
19:32:38 you have until then to lodge your objections :)
19:32:39 thanks!
19:32:48 #topic Beginning of the Year (Virtual) Meetup
19:32:54 #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:33:16 I have started gathering discussion topics on this etherpad, which will also serve as the meetpad location for this event
19:34:07 the main thing to decide is timing. Last week I suggested the January 21-23 timeframe, which is next week. No objections to that, and frickler indicated that we could proceed with planning and not worry too much about their ability to attend
19:34:41 yeah, i'll be getting home the afternoon before that, so wfm
19:34:43 I'm thinking spreading it out over multiple days is a good idea so that we can do shorter gatherings rather than one long one. The 21st is also the day of our normal meeting
19:35:56 maybe on the 21st we do something like 2100-2300 UTC. Then on the 22nd and 23rd we do 1800-2000 if frickler can attend, and 2100-2300?
19:36:11 then if we decide we don't need all of that time we can skip the 23rd too or something
19:36:15 i've got a couple of unrelated conference calls on tuesday and wednesday, but will be available most of the time
19:36:30 (basically I'm trying to be flexible and accommodate timezones and the massive block of meetings on Tuesday everyone seems to have)
19:37:02 corvus: fungi: since you're here, are those blocks of time particularly problematic?
19:37:47 1 sec
19:38:12 before 19z on monday and 15-16z on tuesday are my other calls, so your proposed times look fine for me
19:38:49 i don't currently have any blockers there
19:39:06 er, i mean before 19z on tuesday and 15-16z on wednesday are my other calls
19:39:17 anyway, wfm
19:40:04 excellent, I'll get that added to the etherpad and we can be flexible if necessary (like dropping the time on the 23rd if we don't need it)
19:40:35 #topic Open Discussion
19:40:45 I failed to put the Gitea 1.23.1 upgrade on the agenda
19:40:49 but there is also a proposed Gitea upgrade
19:41:02 I guess we could try landing that and see if we hit the mariadb problems and switch over images there too?
19:41:09 not sure if anyone else wants to inspect the held node
19:43:24 I guess see where we get with paste02 and, if that makes progress, look to gitea next
19:43:36 Anything else that wasn't in the agenda that we should cover before calling it a meeting?
19:44:10 there's a bunch of changes for bindep that could use another pair of eyes (many of which are mine)
19:44:36 testing fixes, support updates, packaging modernization
19:44:46 fungi: did we ever end up being able to explain why the old setup stopped working?
19:45:00 which old setup?
19:45:03 That's probably not critical and most likely due to changes in setuptools since the original change was first proposed
19:45:21 fungi: the original change I pushed to have bindep use pyproject.toml worked but then you had to rework it to get it working today
19:45:32 but ya I'll put reviewing that on the todo list
19:45:49 i think you're talking about an earlier iteration of the pyproject.toml changes, i incorporated your feedback around that
19:46:19 i think it's split up fairly logically now
19:47:15 right. I'm just curious if we know why that became necessary
19:47:26 it did work when originally proposed but then that same pyproject.toml stopped working
19:47:44 looking
19:48:20 we're talking about 816741 i guess
19:48:50 yes looks like it
19:49:01 ps9 passed, then it stopped passing and you had to add the py_module stuff
19:49:13 I guess setuptools changed somehow to make that mandatory
19:49:21 it's virtually identical to what you had before, but needed a tweak to account for newer setuptools and editable installs, looks like?
19:49:52 and changes to how setuptools did module searching
19:50:10 ya the nox stuff is a side effect of setuptools and pip not supporting editable wheels on python3.6
19:50:13 but yeah, i think it's all down to setuptools differences
19:50:33 https://review.opendev.org/c/opendev/bindep/+/816741/21/setup.cfg it's this difference that I'm looking at, I guess, and ya, setuptools changed seems to be the answer
19:50:59 right
19:51:23 one of those worked on 3.6 but then setuptools changed the name of the option
19:51:38 I think we can land everything up to https://review.opendev.org/c/opendev/bindep/+/938522 then maybe make a release? then land the pyproject.toml update?
19:52:27 yeah, also this all started because we wanted a candidate to exercise newer pbr when that gets tagged
19:52:34 then we can decide if we do a release with pyproject.toml before dropping python3.6
19:53:20 and I'll try to review the latter half of that stack soon so that I can have an informed opinion on ^
19:53:25 right, the later changes in that series are more like "we can do this, should we?"
19:53:56 in order to settle on a good example for how we might want to update the packaging for our other tools
19:54:52 and also the drop 3.6 change is as much to confirm all the todo comments are correct about when we'll be able to clean bits up
19:55:12 ++
19:56:32 anything else? last call
19:57:12 sounds like that is it. Thanks everyone!
19:57:16 We'll be back next week
19:57:21 #endmeeting