19:00:40 <clarkb> #startmeeting infra
19:00:40 <opendevmeet> Meeting started Tue Sep 24 19:00:40 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:40 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:40 <opendevmeet> The meeting name has been set to 'infra'
19:00:53 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TI4VUGGELZFA23KRIBGVJRNJMNB7VHEK/ Our Agenda
19:00:59 <clarkb> #topic Announcements
19:01:05 <clarkb> #link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 1
19:01:40 <clarkb> I wanted to call this out because this event has a CFP that closes much earlier relative to the event date than a number of other events that have occurred or will occur around openinfra day themes
19:02:13 <clarkb> the event is early March and CFP closes November 1 so ~5 months in advance of the event
19:02:33 <clarkb> er 4.5 months? In any case much sooner than others
19:03:09 <clarkb> anything else to call out before we dive into the agenda?
19:03:32 <frickler> just a note I'll be away the next three weeks
19:03:51 <clarkb> thanks for the heads up. Just saw that in the tc meeting
19:04:38 <clarkb> #topic Rocky Package Mirror Creation
19:04:55 <clarkb> I didn't want NeilHanlon to have to hang around until open discussion every week so put this on the agenda
19:05:07 <clarkb> That said I don't think there is a change to do the rsyncing yet, but please point it out to me if I have missed it
19:06:35 <clarkb> sounds like there isn't anything else to add
19:06:40 <clarkb> #topic Rackspace's Flex Cloud
19:06:53 <clarkb> As noted last week the next step here is to figure out authentication to swift in the new region
19:07:10 <clarkb> I have not had a chance to poke at that. I got nerd sniped by graphviz and peppers (two separate things)
19:07:23 <NeilHanlon> hi :) i'm here, and thanks! No change submitted yet
19:07:45 <clarkb> turns out buying roasted hatch chilies in bulk leads to an afternoon of peeling and chopping and bagging and freezing the peppers
19:08:25 <clarkb> if anyone else beats me to this let me know. Otherwise it's still on my todo list (probably no earlier than tomorrow)
19:08:56 <clarkb> and a reminder that if you do figure it out creating a container for staged dib image builds has been requested. Maybe opendev-zuul-dib-builds or similar for the name
19:09:11 <clarkb> #topic Etherpad 2.2.5
19:09:39 <clarkb> Good news on this one. We have upgraded to v2.2.5 proper and are not running a previous tip of the develop branch commit
19:10:02 <clarkb> Meetpad continues to work as well which was the main motivation behind the dev commit and this 2.2.5 upgrade
19:10:21 <clarkb> I'll drop this from next week's agenda but wanted to catch everyone up on this so there wasn't any confusion as we near the PTG
19:11:28 <clarkb> corvus did mention a possible browser memory leak related to etherpad when we were on the dev commit but we don't have a ton of evidence so something to keep an eye on
19:12:00 <clarkb> I haven't seen it in my local firefox instance. But I also restart it at least once a week or so which may help combat issues like that. Something to be aware of and keep an eye out for if we get more reports
19:12:29 <clarkb> #topic Updating ansible+ansible-lint versions in our repos
19:12:46 <clarkb> Most of the changes related to this have merged. Thank you everyone for the reviews and faith in my ability to not break things
19:12:51 <clarkb> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 is the last current open change related to this work
19:12:58 <clarkb> there is one last change open for this and it is in ozj
19:13:57 <clarkb> reviews very much welcome. I'm really hopeful that we get through this then don't have to bother with it too much for a while
19:14:07 <frickler> I think we'd defer this until after the release? just to be cautious this week?
19:14:19 <clarkb> there was also some question about the utility of the ansible-lint tool and I'll mention what I said previously here since the meetings might make it easier to find
19:14:25 <clarkb> frickler: that's fine too
19:15:06 <clarkb> I think there are two main reasons to run ansible-lint. The first is that they run an ansible syntax checker as part of linting. So it checks if the ansible can run at all at a basic level. Then there are some ansible-lint rules which are actually helpful like quoting your octal permission strings
19:15:20 <clarkb> it can also detect if you miss required module parameters
19:16:40 <clarkb> there are other ways to do those checks without ansible-lint but ansible-lint can mock out modules like zuul_return and zuul_console which makes it useful for checking zuul job playbooks in particular
19:16:52 <clarkb> (otherwise you need to install those modules or find a way to fake them out manually)
19:17:10 <clarkb> anyway as mentioned reviews welcome. If we feel strongly about any specific rule that we change code for feel free to note that in review
19:17:17 <clarkb> #topic Zuul-launcher image builds
19:18:07 <corvus> no significant news from me on this
19:18:17 <clarkb> Last week there were ~3 next steps here. First up was figuring out the staging location for images before they get uploaded to clouds. We were going to create a container in raxflex for this which I haven't done yet
19:18:31 <clarkb> next up was merging features into zuul-launcher itself to do the image uploads from the staging location to the clouds
19:18:38 <clarkb> corvus: ^ has that code merged yet?
19:19:18 <corvus> not yet, but it's slightly more ready to merge than it was last week :)
19:19:28 <clarkb> finally tonyb was going to look into porting the dib builds we do into more zuul jobs (currently only bullseye has a build image job)
19:19:35 <clarkb> corvus: progress!
19:19:49 <clarkb> I haven't seen any changes for new image build jobs yet
19:21:02 <clarkb> #topic OpenStack OpenAPI spec publishing
19:21:06 <clarkb> #link https://review.opendev.org/921934
19:21:17 <clarkb> I'll try to tldr this but fungi can probably fill us in with more details
19:21:35 <clarkb> basically openstacksdk folks are working on openapi specification for openstack apis that can be used to generate client/sdk tooling
19:21:55 <clarkb> they would like a new domain to host these specs under as well as the associated hosting and afs storage
19:22:14 <clarkb> I think the current name in the change above is openapi-specs.openstack.org
19:22:35 <fungi> yeah, there was some debate about the best way to publish those, the sdk team was interested in having a portable url for use in build systems for language bindings and ides
19:23:01 <fungi> in discussing yesterday it sounds like something more generic like api-specs.openstack.org could be more palatable
19:23:16 <clarkb> I think I would personally prefer that we host this stuff under something like docs.openstack.org/openapi-specs to go along with docs.openstack.org/api-refs/ but sounds like there was a lot of pushback and desire for a dedicated domain
19:23:46 <clarkb> and ya I think decoupling the domain from openapi specifically would be better because openapi could go away in the future and get replaced with some new thing. api-specs.openstack.org would be an improvement
19:24:12 <fungi> the idea is to treat the api spec structured data similar to the existing service-types.openstack.org site
19:24:58 <fungi> i think some of the pushback was really more against using docs.o.o for it specifically because it's not documentation
19:25:13 <frickler> I still don't understand what would be different between api-specs.openstack.org vs. docs.openstack.org/api-specs
19:25:21 <frickler> specs isn't docs?
19:25:25 <clarkb> frickler: ya I'm still personally struggling with it too
19:25:40 <clarkb> if you add swagger to it then it really does become human consumable docs too
19:25:48 <fungi> the original original idea raised in the ptg session was to use the specs.openstack.org site but i pointed out that's really a completely different thing with a different kind of information and audience
19:26:19 <corvus> specs.openstack.org is for docs. ;)
19:26:21 <clarkb> but I would also argue that documentation that is meant to be machine readable doesn't stop it from being documentation
19:26:43 <clarkb> but I can at least live with a domain that we're unlikely to need to migrate off of in the future if tooling changes
19:27:01 <fungi> anyway, the reason i raised this in the agenda was that the review had been sitting for months with no feedback, so i wanted to at least check whether there was consensus on what was being proposed there
19:27:25 <fungi> it wasn't one i was comfortable approving without some additional eyes on it
19:27:51 <fungi> sounds like there isn't clear consensus, so my question is answered
19:28:31 <clarkb> I'll try to leave a review summarizing my thoughts that a) if you add swagger then this really does look like docs and then hosting this becomes simpler and b) if we continue to really think that docs.openstack.org/api-specs is inappropriate then dropping the openapi specificity would alleviate my other concerns
19:28:53 <fungi> thanks!
19:29:07 <clarkb> side note you could delete api-refs and replace it with openapi + swagger...
19:30:11 <clarkb> #topic Gitea 1.22.2
19:30:34 <clarkb> turns out there was a gitea release that I missed due to summit travel prep and traveling
19:30:40 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/930217 Change to upgrade to 1.22.2
19:30:58 <clarkb> I've gone ahead and pushed a change to do the update. It is a bugfix release compared to the one we are already running and CI passes
19:31:32 <clarkb> with the openstack release coming up maybe ya'll can take a look at the changelog and change and decide if we should do more testing with a held node or potentially just wait for the openstack release to complete?
19:31:51 <clarkb> or maybe you'll decide to send it? Mostly I'm looking for feedback on how urgent vs risky we think this update is so that we can plan accordingly
19:33:11 <clarkb> but we don't need that feedback to happen in the meeting. Comments on the change are fine
19:33:17 <clarkb> #topic Upgrading old servers
19:33:34 <clarkb> anything new to note here? I don't see new patchsets on the mediawiki stack
19:34:42 <clarkb> I've been starting to think about a gerrit 3.10 upgrade and then a server replacement for that server but with the openstack release it hasn't been very concrete
19:35:04 <clarkb> probably try to aim for doing that around the quieter holiday period if we can manage to get it all lined up for that time frame
19:35:10 <frickler> gerrit server replacement?
19:35:17 <clarkb> frickler: yes to update the base os
19:35:26 <clarkb> and maybe also drop boot from volume if we can
19:35:55 <clarkb> (boot from volume makes rescuing servers far more of an adventure and I'd like to be running something that doesn't provide adventures when things are already broken)
19:36:28 <fungi> yeah, looks like our gerrit server is still on focal (20.04 lts)
19:36:36 <clarkb> there was actually a recent mailing list thread on how to do it though so should be discoverable now vs blazing our own trail
19:36:54 <clarkb> ya it's not urgent but will be in 6 months
19:36:59 <clarkb> so trying to plan ahead a bit
19:37:22 <fungi> if memory serves, the "adventure" is when the original image used to boot the server is no longer present
19:37:23 <clarkb> let me know if you are interested in helping or have thoughts. Like I said it's mostly just me thinking we should do that and what sorts of things might we want to change
19:37:35 <clarkb> fungi: you have to use newer microversions too
19:37:44 <fungi> oh, right
19:38:00 <fungi> older nova api didn't have support for it
19:38:36 <clarkb> #topic DNS over TLS on Test Nodes
19:38:50 <clarkb> this was actually earlier in the agenda and I skimmed right over it in my haste to prep after my errand
19:39:11 <clarkb> OVN MitM's DNS traffic that is sent in the clear
19:39:39 <clarkb> #link https://serverfault.com/questions/1134180/how-to-disable-or-fix-openstack-intercepting-dns-ptr-queries this occasionally breaks workloads in openstack clouds
19:39:58 <clarkb> I don't know for certain that raxflex is using OVN but the MTU that our nodes receive there implies this is the case
19:40:13 <clarkb> #link https://review.opendev.org/c/opendev/base-jobs/+/929960 Adds this to configure-unbound role in base-jobs
19:40:29 <clarkb> I've written this change to add DNS over TLS to unbound's configs in our test nodes to mitigate against this behavior
19:40:46 <corvus> is that the only cloud we've seen this so far?
19:41:04 <clarkb> the change is WIP because it needs to be split into two in order to do the base-test base job testing pattern but I didn't want to do all that effort until we're at least happy to make this change
19:41:20 <clarkb> corvus: yes I think they are the only one running OVN. Though it is possible openmetal is too (we can check that)
19:41:43 <corvus> probably worth including on the feedback list :)
19:42:06 <frickler> actually there are known bugs for this, I can search those tomorrow
19:42:20 <frickler> iiuc it mainly affects edns
19:42:35 <corvus> anyway, much of our setup, including unbound, is about normalizing environments and protecting us from the clouds; so i think this change is consistent with our intentions there
19:42:54 <fungi> seems like dnssec might also spot altered query replies from the mitm
19:42:54 <clarkb> in general I think dns over tls is a good idea regardless of OVN. It stops putting DNS traffic in the clear and puts traffic on TCP which can be handled by firewalls a bit more sanely than trying to make udp stateful
19:42:55 <frickler> using tcp should be an option, too, using tls is too much overhead I'd think
19:43:11 <clarkb> and ya we could just use tcp
19:44:15 <clarkb> frickler: re edns that is where people have noticed unexpected buggy behavior. But I'm generally not comfortable with OVN modifying/intercepting/interpreting any DNS lookup
19:44:16 <fungi> i suspect that the volume of uncached (by the local unbound) queries going over tls would be a drop in the bucket compared to things like package installs
19:44:20 <corvus> i'm okay with the minimal solution, but i do wonder, why would tls be too much?
19:45:03 <frickler> why not run unbound as recursor instead of just forwarder?
19:45:06 <clarkb> I run unbound doing upstream lookups via dns over tls on an ancient amd cpu that doesn't even have active cooling it is so underpowered
19:45:14 <clarkb> at home I mean
19:45:14 <corvus> our unbound is a caching resolver, so it's going to incur some tls overhead at the start, but....
19:45:48 <corvus> are we not running it as a caching recursive resolver? are we running it as a forwarder only?
19:45:58 <clarkb> corvus: the current config is caching forwarder only
19:46:07 <clarkb> for test nodes. I'm not sure if the control plane nodes are recursing
19:46:30 <fungi> a caching forwarder to... google dns and opendns right?
19:46:30 <corvus> oh, then i understand why tls would be considered too much
19:46:37 <clarkb> fungi: google and cloudflare
19:46:47 <frickler> https://review.opendev.org/c/opendev/base-jobs/+/929960/8/roles/configure-unbound/templates/forwarding.conf.j2
19:46:47 <fungi> ah, okay
19:47:21 <clarkb> both google and cloudflare support this fwiw
19:47:26 <corvus> i suspect we started as resolving and changed to forwarding ages ago due to performance reasons, and that may be why i misremembered
19:47:30 <clarkb> so it's not like we will be on their bad side
19:47:54 <clarkb> it is mostly a determination for whether or not we think it will have negative side effects for the jobs I think?
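
For reference on the "tcp or tcp+tls" options being weighed here, the following is a minimal Python sketch of how a DNS query is carried over a stream transport. It is not taken from the base-jobs change: the helper names (`build_query`, `frame_for_stream`, `query`) are hypothetical, and the resolver address in the usage note is only an example. Both plain TCP (port 53) and DNS over TLS (port 853) prefix each message with a two-byte length; TLS simply wraps the same exchange in an encrypted channel.

```python
import socket
import ssl
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: one question, recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record by default) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_stream(message: bytes) -> bytes:
    """Prefix the message with its 2-byte length, as stream transports require."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        data += chunk
    return data


def query(server: str, name: str, use_tls: bool = False,
          timeout: float = 5.0) -> bytes:
    """Send one query over TCP (port 53) or TLS (port 853); return the raw reply."""
    sock = socket.create_connection((server, 853 if use_tls else 53),
                                    timeout=timeout)
    if use_tls:
        # Verifies the server certificate against the system trust store.
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=server)
    with sock:
        sock.sendall(frame_for_stream(build_query(name)))
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)
```

Calling `query("1.1.1.1", "opendev.org")` and then the same call with `use_tls=True` would be one rough way to compare the two transports from a test node; the extra cost of TLS is the handshake on each new connection, which a caching forwarder pays only for uncached lookups.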
19:48:15 <fungi> right, they both want to collect and profit off as much dns query information as they can get their hands on, so they're highly unlikely to object
19:48:49 <clarkb> if we prefer to start with just tcp instead of tls I can do that too
19:49:01 <clarkb> as I think that will defeat the mitm behavior
19:49:33 <frickler> https://bugs.launchpad.net/neutron/+bug/2030294 and https://bugs.launchpad.net/neutron/+bug/2030295 fwiw
19:49:37 <clarkb> mostly I want rough consensus on what we think should work before I go through all the trouble of writing changes to test it
19:49:55 <clarkb> and maybe the answer is nothing or tcp or tls
19:50:22 <frickler> I'd prefer to do nothing until we see actual issues in jobs
19:50:42 <corvus> i'm +1 on tcp and +1 on tcp+tls if you have interest in benchmarking that :)
19:51:53 <frickler> but also we can talk to rackspace maybe and see whether they can get OVN to turn off this DNS mangling
19:52:01 <clarkb> I don't think OVN supports that
19:52:20 <frickler> would be an RFE likely, yes
19:52:21 <clarkb> it's honestly the sort of thing that if I were openstack or neutron I would detel OVN over but I don't make those decisions
19:52:30 <clarkb> s/detel/delete/
19:52:50 <clarkb> it is completely inappropriate behavior from an overlay network layer to intercept dns
19:53:01 <clarkb> and to do so by default without an option to turn it off
19:53:12 <clarkb> but that's a separate discussion
19:53:19 <corvus> ++
19:53:42 <clarkb> fungi: if you have any thoughts on what you'd prefer our jobs to do can you throw them on the change? then based on that I'll see if it's worth modifying to make a change whatever it may be testable
19:54:03 <clarkb> #topic Open Discussion
19:54:12 <clarkb> there were two other items I wanted to bring up before our hour is over
19:54:22 <clarkb> #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 Work by stevenfin to cleanup old jobs in ozj and project-config
19:54:29 <clarkb> #undo
19:54:29 <opendevmeet> Removing item from minutes: #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22
19:54:35 <clarkb> #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 Work by stephenfin to cleanup old jobs in ozj and project-config
19:54:48 <fungi> that's been a long time coming
19:55:09 <clarkb> stephenfin has been pushing changes to clean up old stuff in our zuul configs. I started reviewing them this morning but there are even more. The effort is much appreciated and reviewing them to make that known would be great
19:55:27 <clarkb> and then finally:
19:55:29 <clarkb> #link https://review.opendev.org/c/opendev/base-jobs/+/930082 blockdiag and seqdiag replaced with graphviz
19:55:52 <clarkb> I confused myself into believing that removing blockdiag would be straightforward because I didn't realize seqdiag is basically blockdiag with a different graph style
19:56:12 <clarkb> but I found an example that was close to what we needed with dot and graphviz and managed to get that working last night
19:56:32 <clarkb> the motivation here is that blockdiag and seqdiag are not really maintained and it is creating dependency trouble with python3.12
19:57:00 <clarkb> we can sidestep all of that by using graphviz and sphinx's built in support for graphviz as long as we don't mind far more verbose graph specifications in dot language
19:57:23 <clarkb> if we're happy with ^ that change and its child I can port it to zuul-jobs and zuul and elsewhere we may have the same/similar graphics
19:57:36 <clarkb> that was all from me. Anything else?
19:57:41 <frickler> one other thing from me: any objection to dropping the exim paniclogs that keep creating additional spam after the recent unattended upgrade hiccups?
19:57:47 <fungi> #link https://review.opendev.org/930236 Update Mailman containers to latest versions
19:58:16 <clarkb> frickler: no objection from me. It seemed like that got rootcaused to a race in package install and shouldn't indicate an ongoing issue so should be safe
19:58:17 <corvus> clarkb: thanks for the graphviz work :)
19:58:32 <corvus> frickler: paniclog reset sgtm
19:58:47 <clarkb> fungi: I've added that to my review list
19:58:51 <corvus> (i think they should age out eventually, but deleting will be faster)
19:59:21 <fungi> yeah, i have no objection to clearing those exim paniclogs
20:00:08 <clarkb> and we are at time
20:00:10 <clarkb> thank you everyone
20:00:16 <clarkb> we'll be back same time and location next week
20:00:18 <clarkb> #endmeeting
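
On the ansible-lint point from the meeting about quoting octal permission strings, a small Python sketch may help show why the rule is considered useful. It is an illustration only, not ansible or ansible-lint code, and the `describe` helper is hypothetical. The assumption is the commonly documented failure mode: a file mode written as a bare number without a leading zero (for example `644`) reaches the module as a decimal integer rather than as octal digits.

```python
import stat


def describe(mode: int) -> str:
    """Render permission bits the way `ls -l` would for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


# Quoted "0644": the string is converted from octal, giving the intended bits.
intended = int("0644", 8)

# Unquoted 644 with no leading zero: the parser hands over decimal 644.
misread = 644

print(oct(intended), describe(intended))  # 0o644 -rw-r--r--
print(oct(misread), describe(misread))    # 0o1204 --w----r-T
```

The second result sets the sticky bit, removes owner read access and grants only write to the owner, which is rarely what anyone meant; quoting the value keeps the digits intact so they can be interpreted as octal.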