19:01:25 <clarkb> #startmeeting infra
19:01:25 <opendevmeet> Meeting started Tue Aug 31 19:01:25 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:25 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:25 <opendevmeet> The meeting name has been set to 'infra'
19:01:36 <corvus> o/
19:01:40 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-August/000279.html Our Agenda
19:02:10 <clarkb> #topic Announcements
19:02:53 <clarkb> The openstack feature freeze has begun. Keep that in mind as we make changes. In the past we've described it as a good time to be "slushy" eg we don't have to freeze but keep in mind potential impacts and try to avoid impacting the release process if possible
19:03:01 <ianw> o/
19:03:27 <clarkb> Looking at zuul queues they aren't quite as deep as I expected, but I expect that to grow through the week as last minute changes get merged
19:04:11 <clarkb> #topic Actions from last meeting
19:04:17 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-08-24-19.01.txt minutes from last meeting
19:04:35 <clarkb> ianw had an action to look into logo hosting. That has happened and I've added a separate agenda topic just for that to catch up
19:04:42 <clarkb> #topic Specs
19:04:47 <clarkb> #link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement
19:05:02 <clarkb> frickler and tristanC have left some thoughts on ^ if others can take a look and let me know what you think that would be helpful
19:05:21 <clarkb> adding a service like prometheus seems like the sort of safe slushy activity we could do over the next few weeks while openstack puts its release together
19:06:28 <clarkb> #topic Topics
19:06:35 <clarkb> At this point I'm not sure that topic will ever go away :)
19:06:43 <clarkb> #topic Mailman server upgrades
19:07:06 <clarkb> Fungi ran through the lists.katacontainers.io upgrade to Focal last week. It appears to be working as expected
19:07:19 <clarkb> I've cleaned up some of the testing nodes we had held/created for that process today.
19:07:39 <fungi> yeah, upgrade went smoothly for the most part
19:07:44 <clarkb> fungi: you had talked about measuring resource consumption on the upgraded server to try and infer if it would be a problem with the vhosts on lists.o.o
19:08:00 <fungi> i did not find time to restart services on the servers over the weekend
19:08:12 <fungi> but maybe some night this week when things are slow
19:09:03 <clarkb> ok, I think that may be the last thing we are waiting on before we do an upgrade of lists.o.o? Do we want to schedule that upgrade assuming the metrics gathering happens and doesn't show any problems and postpone otherwise?
19:10:09 <fungi> yeah, i think that would be fine
19:10:32 <fungi> the sooner we can get it off xenial and ua the better
19:10:57 <clarkb> fungi: you mentioned doing it on a weekend which works for me as well. Maybe September 12?
19:11:14 <clarkb> That is the Sunday before openstack's RCs go out
19:11:50 <fungi> yeah, the 11th or 12th would work fine for me
19:12:19 <clarkb> do you think that would be enough time for prep? Also how concerned are you about impacts post upgrade? You did the first server so probably have the best grasp of that
19:13:28 <clarkb> fungi: ^
19:13:31 <fungi> not especially concerned, the sysvinit scripts are the only real difference
19:13:43 <fungi> and associated disk layout
19:13:56 <clarkb> ok in that case let's say start at 1500 UTC Sunday?
19:13:57 <fungi> also the age of the server, size of the rootfs, et cetera
19:14:05 <clarkb> er 1500 UTC Sunday September 12
19:14:18 <fungi> sure, that seems good
19:14:33 <clarkb> fungi: do you want to draft an announcement of that or should I?
19:14:34 <fungi> still early enough it probably won't impact apac monday
19:14:42 <fungi> i can write up an announcement later today
19:14:45 <clarkb> thanks!
19:15:13 <clarkb> Anything else on the subject of list servers?
19:15:42 <fungi> not this week i don't think
19:15:51 <clarkb> #topic Improve OpenDev's CD Throughput
19:16:24 <clarkb> This should be quick as I didn't have time last week to do the mapping. Sorry about that. I ended up being very distracted with some family stuff that popped up. Things appear far more normal this week thankfully
19:16:55 <clarkb> Basically everything after tuesday last week is getting transplanted to after tuesday this week. I plan to get started on that in the next day or two
19:17:24 <clarkb> That said I wanted to ask if the small easy improvements we have already made are helping
19:17:59 <clarkb> From what I can see it seems to have helped quite a bit, but also wasn't paying too much attention last week after the inventory file matchers got updated
19:18:34 <fungi> yeah, stuff is deploying faster i think, anecdotally observed anyway
19:19:23 <clarkb> great and no concerns of jobs not running when we expected them?
19:19:53 <fungi> none that i've noticed, but that could easily go unspotted for weeks/months
19:20:10 <clarkb> ya something to watch out for, but not an immediate issue it seems
19:21:04 <clarkb> #topic Gerrit Account Cleanups
19:21:37 <clarkb> Last Tuesday I said I'd do the external id deletion for the last set of retired accounts tomorrow but that didn't happen because I wasn't able to sit at a computer for reliably long enough periods of time
19:21:48 <clarkb> I intend on actually doing it tomorrow and really hope nothing comes up this week :)
19:21:54 <clarkb> Just a heads up
19:22:01 <clarkb> #topic OpenDev Logo Hosting
19:22:28 <clarkb> ianw dug into this after the meeting last week and came up with a few different ideas. Ultimately it seems like maybe we were going to use docker images like a packaging format?
19:22:51 <ianw> yes, i think this works
19:22:53 <fungi> yeah, that seemed like a great solution
19:23:08 <clarkb> ianw: are there changes we should be reviewing for that?
19:23:19 <fungi> we basically build a docker image of our logos and then that can be used in building or deploying our containerized services
19:23:19 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/805932
19:23:40 <ianw> this builds a small container that has nothing but images/logos/whatever at a defined location
19:23:55 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/805933
19:24:04 <ianw> then tries to use it
19:24:39 <ianw> although 805933 failed because we didn't build the container, i'm not sure how to get around that
19:24:44 <clarkb> cool I'll review those after lunch. Are there related changes to update the paste and gerrit themes to pull from gitea hosting? or maybe copy them similarly to what you have there for gitea?
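A minimal sketch of the pattern under discussion, with image names and paths assumed for illustration rather than taken from 805932/805933: one image that contains nothing but static assets, plus a consuming image that copies files out of it at build time.

```dockerfile
# Assets-only image: no runtime, just files at a well-known location
# (the layout here is illustrative, not the actual contents of 805932).
FROM scratch
COPY assets/ /usr/share/opendev/assets/
```

```dockerfile
# Consuming image (gitea, paste, etc.): copy straight from the published
# assets image at build time. With buildkit the same thing can be done with
# RUN --mount=type=bind,from=<image>,... as discussed below.
FROM docker.io/library/debian:bullseye-slim
COPY --from=docker.io/opendevorg/assets:latest \
     /usr/share/opendev/assets/ /custom/public/img/
```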
19:25:01 <ianw> it doesn't touch anything that triggers the container build in the gate, and there isn't one available for download
19:25:31 <clarkb> ianw: you need the requires and provides in the jobs?
19:25:55 <clarkb> I think if you do that it may work?
19:26:10 <ianw> hrm, will that *trigger* the job, or just order it correctly?
19:26:41 <clarkb> I think it's meant to say that job must run (or if it is a soft requirement then it will only run if otherwise triggered)
19:26:54 <clarkb> but I could be wrong about how that interacts with a brand new image.
19:27:12 <ianw> anyway we can discuss in #opendev, the only other thing was it uses buildkit to achieve this
19:27:42 <clarkb> corvus: do you know if we've written this down anywhere for opendev specifically? That might be another good output of the job mapping exercise and reorg so we can be more consistent about those things and write down how to use them properly for our jobs
19:27:44 <ianw> iirc mordred switched zuul to this anyway, and it seems like it will one day be the docker default
19:28:09 <clarkb> ya there was some buildkit feature around mounts that I think zuul wanted to use? seems fine to use buildkit for richer image creation
19:29:20 <ianw> yes, it's the same feature; being able to copy things in from another container
19:30:03 <fungi> ahh, as opposed to having to use the container image as a "layer" i guess
19:30:17 <ianw> #link https://review.opendev.org/c/zuul/zuul/+/712717
19:30:29 <ianw> is the zuul one, it's not merged
19:31:07 <corvus> provides/requires won't cause a job to run, just allow artifacts to pass between queue items
19:32:18 <corvus> which includes docker images
19:32:40 <clarkb> got it so the first change could provide the image to the second change
19:32:54 <clarkb> and that would work even without an image published to docker hub because we have the zuul registry
19:33:13 <corvus> so if change A runs a job that provides docker image foo, and change B runs a job that requires docker image foo, then change B's job will run with foo built for change A
19:33:27 <corvus> clarkb: exactly
19:34:23 <clarkb> ianw: thank you for looking into this.
19:34:24 <corvus> after a really quick look, i think that since 805933 has a "RUN" command that references the assets image, then the gitea building jobs should require the assets image
19:34:49 <ianw> right, but in this case, system-config-build-image-assets has file matchers that mean it only runs if one of the asset files are updated
19:35:05 <ianw> maybe the matchers for that job need to include the Dockerfile of images that use it
19:35:23 <corvus> ianw: but 805932 builds the image, so 933 would get it
19:35:24 <ianw> (that job being ...build-image-assets)
19:35:33 <clarkb> ianw: no, the change that adds the assets image would build and create the image and put it in the zuul registry. Then requiring that image would allow the gitea job to get it from the zuul registry
19:35:42 <clarkb> you don't need the assets job to run when gitea updates
19:36:30 <corvus> provides/requires is specifically for bridging across changes (it doesn't do anything for jobs within a single change). to make both work, you need provides/requires and job dependencies
19:37:05 <fungi> and later updates to the gitea image would just pull it from dockerhub if not provided by a change queued ahead of it?
19:37:08 <corvus> so in this case we do want both things.
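A rough sketch of the wiring corvus describes, with job and artifact names assumed for illustration (the real definitions live in system-config and may differ):

```yaml
- job:
    name: system-config-build-image-assets
    # Advertise the built image so a change behind this one in the queue can
    # consume it from the intermediate registry instead of Docker Hub.
    provides: assets-container-image

- job:
    name: system-config-build-image-gitea
    # Cross-change ordering: use an assets image provided by a change ahead
    # in the queue, if one exists.
    requires: assets-container-image
    # Same-change ordering: if this change also touches the assets image,
    # build that first. A soft dependency means the gitea build still runs
    # when the assets job is not triggered at all.
    dependencies:
      - name: system-config-build-image-assets
        soft: true
```

With both pieces in place, the three cases corvus enumerates next all behave as intended.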
19:37:18 <ianw> oh right, ok the jobs ran in parallel and the image wasn't pushed into the buildset registry
19:37:19 <corvus> fungi: correct
19:37:54 <ianw> zuul didn't wait to start the gitea job because it didn't have the requires/provides. ok, can fix that
19:38:03 <fungi> makes sense
19:38:35 <corvus> 3 cases: 1) no change to an image: pull from hub. 2) change to assets image in same change as gitea image change: job dependencies cause assets image build to run before gitea image build. 3) change to assets image in one change, change to gitea in second: provides/requires causes assets build to run before gitea image build.
19:39:23 <corvus> and provides/requires works whether or not the items are in the queue at the same time.
19:39:43 <corvus> (if change B runs long after change A, it uses the database)
19:41:04 <clarkb> Thanks
19:41:19 <clarkb> That was all I had on the agenda I sent out.
19:41:23 <clarkb> #topic Open Discussion
19:41:30 <clarkb> Is there anything else to bring up?
19:42:22 <ianw> corvus: speaking of all this, https://review.opendev.org/c/zuul/zuul-jobs/+/798969 could probably do with your review. that adds some more info on container builds
19:42:31 <fungi> oh, openstackid is in the process of moving to foundation webdev management
19:42:47 <fungi> i updated dns for it during the meeting and am working on the system-config cleanup change now
19:43:08 <fungi> but expect some failures for, e.g., cert updates until we get things cleared out of system-config
19:43:58 <ianw> i've been working on updating nodepool images to bullseye
19:44:07 <ianw> #link https://review.opendev.org/c/openstack/diskimage-builder/+/806318
19:44:31 <ianw> seems to be the major thing required in a dib release, if someone wants to double-check
19:44:33 <clarkb> ianw: how are you handling arm64 wheels when bumping to bullseye? Do we have bullseye wheels built for openstack?
19:45:41 <ianw> yes we should, the buildx phase has been ok
19:45:43 <ianw> #link https://review.opendev.org/c/zuul/nodepool/+/806312
19:46:03 <clarkb> cool, that would be my only concern since that can be slow without wheels pre built
19:46:05 <ianw> is the change that updates things. i'll put some more documentation in. the release job there will fail until we release a fixed dib
19:46:46 <clarkb> I'll add those to the list of reviews to do this afternoon
19:47:12 <ianw> are we now in a process of updating the other images on a case-by-case basis?
19:47:35 <clarkb> I think so, but I'm not sure if anyone has started yet. Doing the matrix eavesdrop bot should be easy since that one was done previously?
19:47:40 <ianw> (updating them to bullseye)
19:48:14 <ianw> ok, i can look at a few too. while there's no rush, it also seems worth getting the pain over when there isn't a pressing need
19:48:28 <clarkb> thanks!
19:49:59 <clarkb> Sounds like this may be it. Thank you everyone. We'll see you here next week. Feel free to reach out in #opendev or on the service-discuss mailing list
19:50:04 <fungi> thanks clarkb!
19:50:05 <clarkb> #endmeeting