19:00:40 <clarkb> #startmeeting infra
19:00:40 <opendevmeet> Meeting started Tue Sep 24 19:00:40 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:40 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:40 <opendevmeet> The meeting name has been set to 'infra'
19:00:53 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TI4VUGGELZFA23KRIBGVJRNJMNB7VHEK/ Our Agenda
19:00:59 <clarkb> #topic Announcements
19:01:05 <clarkb> #link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 1
19:01:40 <clarkb> I wanted to call this out because this event has a CFP that closes much earlier relative to the event date than a number of other events that have occurred or will occur around openinfra day themes
19:02:13 <clarkb> the event is early March and CFP closes November 1 so ~5 months in advance of the event
19:02:33 <clarkb> er 4.5 months? In any case much sooner than others
19:03:09 <clarkb> anything else to call out before we dive into the agenda?
19:03:32 <frickler> just a note I'll be away the next three weeks
19:03:51 <clarkb> thanks for the heads up. Just saw that in the tc meeting
19:04:38 <clarkb> #topic Rocky Package Mirror Creation
19:04:55 <clarkb> I didn't want NeilHanlon to have to hang around until open discussion every week so put this on the agenda
19:05:07 <clarkb> That said I don't think there is a change to do the rsyncing yet, but please point it out to me if I have missed it
19:06:35 <clarkb> sounds like there isn't anything else to add
19:06:40 <clarkb> #topic Rackspace's Flex Cloud
19:06:53 <clarkb> As noted last week the next step here is to figure out authentication to swift in the new region
19:07:10 <clarkb> I have not had a chance to poke at that. I got nerd sniped by graphviz and peppers (two separate things)
19:07:23 <NeilHanlon> hi :) i'm here, and thanks!  No change submitted yet
19:07:45 <clarkb> turns out buying roasted hatch chilies in bulk leads to an afternoon of peeling and chopping and bagging and freezing the peppers
19:08:25 <clarkb> if anyone else beats me to this let me know. Otherwise its still on my todo list (probably no earlier than tomorrow)
19:08:56 <clarkb> and a reminder that if you do figure it out creating a container for staged dib image builds has been requested. Maybe opendev-zuul-dib-builds or similar for the name
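For reference, once the swift auth question is sorted out, creating that staging container should be a one-liner with python-openstackclient; the cloud name and container name below are only placeholders matching the tentative suggestion above, not settled decisions:

    # assumes a clouds.yaml entry exists for the new region; container name is tentative
    openstack --os-cloud raxflex container create opendev-zuul-dib-builds
    # confirm it exists
    openstack --os-cloud raxflex container list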
19:09:11 <clarkb> #topic Etherpad 2.2.5
19:09:39 <clarkb> Good news on this one. We have upgraded to v2.2.5 proper and are no longer running a tip of the develop branch commit
19:10:02 <clarkb> Meetpad continues to work as well which was the main motivation behind the dev commit and this 2.2.5 upgrade
19:10:21 <clarkb> I'll drop this from next weeks agenda but wanted to catch everyone up on this so there wasn't any confusion as we near the PTG
19:11:28 <clarkb> corvus did mention a possible browser memory leak related to etherpad when we were on the dev commit but we don't have a ton of evidence so something to keep an eye on
19:12:00 <clarkb> I haven't seen it in my local firefox instance. But I also restart it at least once a week or so which may help combat issues like that. Something to be aware of and keep an eye out for if we get more reports
19:12:29 <clarkb> #topic Updating ansible+ansible-lint versions in our repos
19:12:46 <clarkb> Most of the changes related to this have merged. Thank you everyone for the reviews and faith in my ability to not break things
19:12:51 <clarkb> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 is the last current open change related to this work
19:12:58 <clarkb> there is one last change open for this and it is in ozj
19:13:57 <clarkb> reviews very much welcome. I'm really hopeful that we get through this then don't have to bother with it too much for a while
19:14:07 <frickler> I think we'd defer this until after the release? just to be cautious this week?
19:14:19 <clarkb> there was also some question about the utility of the ansible-lint tool and I'll mention what I said previously here since the meeting logs might make it easier to find
19:14:25 <clarkb> frickler: thats fine too
19:15:06 <clarkb> I think there are two main reasons to run ansible-lint. The first is that they run an ansible syntax checker as part of linting. So it checks if the ansible can run at all at a basic level. Then there are some ansible-lint rules which are actually helpful like quoting your octal permission strings
19:15:20 <clarkb> it can also detect if you miss required module parameters
19:16:40 <clarkb> there are other ways to do those checks without ansible-lint but ansible-lint can mock out modules like zuul_return and zuul_console which makes it useful for checking zuul job playbooks in particular
19:16:52 <clarkb> (otherwise you need to install those modules or find a way to fake them out manually)
19:17:10 <clarkb> anyway as mentioned reviews welcome. If anyone feels strongly about any specific rule we're changing code for, feel free to note that in review
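To make the two points above concrete, here is a minimal .ansible-lint sketch; the module list is just whatever a given repo actually needs, and the mode lines only illustrate the risky-octal rule:

    # .ansible-lint (sketch) - mock the Zuul-provided modules so playbooks
    # that use them still pass the syntax check outside of a Zuul run
    mock_modules:
      - zuul_return
      - zuul_console
    # the risky-octal rule wants file modes quoted so YAML does not
    # reinterpret the leading zero:
    #   mode: "0644"   # ok
    #   mode: 0644     # flagged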
19:17:17 <clarkb> #topic Zuul-launcher image builds
19:18:07 <corvus> no significant news from me on this
19:18:17 <clarkb> Last week there were ~3 next steps here. First up was figuring out the staging location for images before they get uploaded to clouds. We were going to create a container in raxflex for this which I haven't done yet
19:18:31 <clarkb> next up was merging features into zuul-launcher itself to do the image uploads from the staging location to the clouds
19:18:38 <clarkb> corvus: ^ has that code merged yet?
19:19:18 <corvus> not yet, but it's slightly more ready to merge than it was last week :)
19:19:28 <clarkb> finally tonyb was going to look into porting the dib builds we do into more zuul jobs (currently only bullseye has a build image job)
19:19:35 <clarkb> corvus: progress!
19:19:49 <clarkb> I haven't seen any changes for new image build jobs yet
19:21:02 <clarkb> #topic OpenStack OpenAPI spec publishing
19:21:06 <clarkb> #link https://review.opendev.org/921934
19:21:17 <clarkb> I'll try to tldr this but fungi can probably fill us in with more details
19:21:35 <clarkb> basically openstacksdk folks are working on openapi specification for openstack apis that can be used to generate client/sdk tooling
19:21:55 <clarkb> they would like a new domain to host these specs under as well as the associated hosting and afs storage
19:22:14 <clarkb> I think the current name in the change above is openapi-specs.openstack.org
19:22:35 <fungi> yeah, there was some debate about the best way to publish those, the sdk team was interested in having a portable url for use in build systems for language bindings and ides
19:23:01 <fungi> in discussing yesterday it sounds like something more generic like api-specs.openstack.org could be more palatable
19:23:16 <clarkb> I think I would personally prefer that we host this stuff under something like docs.openstack.org/openapi-specs to go along with docs.openstack.org/api-refs/ but sounds like there was a lot of pushback and desire for a dedicated domain
19:23:46 <clarkb> and ya I think decoupling the domain from openapi specifically would be better because openapi could go away in the future and get replaced with some new thing. api-specs.openstack.org would be an improvement
19:24:12 <fungi> the idea is to treat the api spec structured data similar to the existing service-types.openstack.org site
19:24:58 <fungi> i think some of the pushback was really more against using docs.o.o for it specifically because it's not documentation
19:25:13 <frickler> I still don't understand what would be different between api-specs.openstack.org vs. docs.openstack.org/api-specs
19:25:21 <frickler> specs isn't docs?
19:25:25 <clarkb> frickler: ya I'm still personally struggling with it too
19:25:40 <clarkb> if you add swagger to it then it really does become human consumable docs too
19:25:48 <fungi> the original original idea raised in the ptg session was to use the specs.openstack.org site but i pointed out that's really a completely different thing with a different kind of information and audience
19:26:19 <corvus> specs.openstack.org is for docs.  ;)
19:26:21 <clarkb> but I would also argue that documentation that is meant to be machine readable doesn't stop it from being documentation
19:26:43 <clarkb> but I can at least live with a domain that we're unlikely to need to migrate off of in the future if tooling changes
19:27:01 <fungi> anyway, the reason i raised this in the agenda was that the review had been sitting for months with no feedback, so i wanted to at least check whether there was consensus on what was being proposed there
19:27:25 <fungi> it wasn't one i was comfortable approving without some additional eyes on it
19:27:51 <fungi> sounds like there isn't clear consensus, so my question is answered
19:28:31 <clarkb> I'll try to leave a review summarizing my thoughts that a) if you add swagger then this really does look like docs and then hosting this becomes simpler and b) if we continue to really think that docs.openstack.org/api-specs is inappropriate then dropping the openapi specificity would alleviate my other concerns
19:28:53 <fungi> thanks!
19:29:07 <clarkb> side note you could delete api-refs and replace it with openapi + swagger...
19:30:11 <clarkb> #topic Gitea 1.22.2
19:30:34 <clarkb> turns out there was a gitea release that I missed due to summit travel prep and traveling
19:30:40 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/930217 Change to upgrade to 1.22.2
19:30:58 <clarkb> I've gone ahead and pushed a change to do the update. It is a bugfix release compared to the one we are already running and CI passes
19:31:32 <clarkb> with the openstack release coming up maybe ya'll can take a look at the changelog and change and decide if we should do more testing with a held node or potentially just wait for the openstack release to complete?
19:31:51 <clarkb> or maybe you'll decide to send it? Mostly I'm looking for feedback on how urgent vs risky we think this update is so that we can plan accordingly
19:33:11 <clarkb> but we don't need that feedback to happen in the meeting. Comments on the change are fine
19:33:17 <clarkb> #topic Upgrading old servers
19:33:34 <clarkb> anything new to note here? I don't see new patchests on the mediawiki stack
19:34:42 <clarkb> I've been starting to think about a gerrit 3.10 upgrade and then a server replacement for that server but with the openstack release it hasn't been very concrete
19:35:04 <clarkb> probably try to aim for doing that around the quieter holiday period if we can manage to get it all lined up for that time frame
19:35:10 <frickler> gerrit server replacement?
19:35:17 <clarkb> frickler: yes to update the base os
19:35:26 <clarkb> and maybe also drop boot from volume if we can
19:35:55 <clarkb> (boot from volume makes rescuing servers far more of an adventure and I'd like to be running something that doesn't provide adventures when things are already broken)
19:36:28 <fungi> yeah, looks like our gerrit server is still on focal (20.04 lts)
19:36:36 <clarkb> there was actually a recent mailing list thread on how to do it though so should be discoverable now vs blazing our own trail
19:36:54 <clarkb> ya its not urgent but will be in 6 months
19:36:59 <clarkb> so trying to plan ahead a bit
19:37:22 <fungi> if memory serves, the "adventure" is when the original image used to boot the server is no longer present
19:37:23 <clarkb> let me know if you are interested in helping or have thoughts. Like I said its mostly just me thinking we should do that and what sorts of things might we want to change
19:37:35 <clarkb> fungi: you have to use newer microversions too
19:37:44 <fungi> oh, right
19:38:00 <fungi> older nova api didn't have support for it
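If memory serves, the relevant bit is compute API microversion 2.87, which added rescue support for volume-backed servers; something along these lines (server and image names are placeholders):

    # rescue a boot-from-volume server; needs microversion 2.87 or later
    openstack --os-compute-api-version 2.87 server rescue --image <rescue-image> <server>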
19:38:36 <clarkb> #topic DNS over TLS on Test Nodes
19:38:50 <clarkb> this was actually earlier in the agenda and I skimmed right over it in my haste to prep after my errand
19:39:11 <clarkb> OVN MitM's DNS traffic that is sent in the clear
19:39:39 <clarkb> #link https://serverfault.com/questions/1134180/how-to-disable-or-fix-openstack-intercepting-dns-ptr-queries this occasionally breaks workloads in openstack clouds
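A quick way to check from a test node is to ask an outside resolver for something OVN likes to answer itself, such as the node's own PTR record; if the reply comes back with locally synthesized neutron port data instead of the real answer from 8.8.8.8, the traffic is being intercepted (the address below is a placeholder):

    # UDP query: OVN's native DNS responder may answer this locally
    dig -x 10.0.16.5 @8.8.8.8
    # TCP query: should pass through untouched, so comparing the two is telling
    dig -x 10.0.16.5 @8.8.8.8 +tcp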
19:39:58 <clarkb> I don't know for certain that raxflex is using OVN but the MTU that our nodes receive there implies this is the case
19:40:13 <clarkb> #link https://review.opendev.org/c/opendev/base-jobs/+/929960 Adds this to configure-unbound role in base-jobs
19:40:29 <clarkb> I've written this change to add DNS over TLS to unbound's configs in our test nodes to mitigate against this behavior
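Roughly speaking (this is a sketch of unbound's documented DNS-over-TLS options, not necessarily what the change under review ends up doing), the forwarding config would look something like:

    server:
      tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"
    forward-zone:
      name: "."
      forward-tls-upstream: yes
      forward-addr: 1.1.1.1@853#cloudflare-dns.com
      forward-addr: 8.8.8.8@853#dns.google

A TCP-only variant would instead keep the existing plain @53 forwarders and set tcp-upstream: yes in the server section.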
19:40:46 <corvus> is that the only cloud we've seen this so far?
19:41:04 <clarkb> the change is WIP because it needs to be split into two in order to do the base-test base job testing pattern but I didn't want to do all that effort until we're at least happy to make this change
19:41:20 <clarkb> corvus: yes I think they are the only one running OVN. Though it is possible openmetal is too (we can check that)
19:41:43 <corvus> probably worth including on the feedback list :)
19:42:06 <frickler> actually there are known bugs for this, I can search those tomorrow
19:42:20 <frickler> iiuc it mainly affects edns
19:42:35 <corvus> anyway, much of our setup, including unbound, is about normalizing environments and protecting us from the clouds; so i think this change is consistent with our intentions there
19:42:54 <fungi> seems like dnssec might also spot altered query replies from the mitm
19:42:54 <clarkb> in general I think dns over tls is a good idea regardless of OVN. It stops putting DNS traffic in the clear and puts traffic on TCP which can be handled by firewalls a bit more sanely than trying to make udp stateful
19:42:55 <frickler> using tcp should be an option, too, using tls is too much overhead I'd think
19:43:11 <clarkb> and ya we could just use tcp
19:44:15 <clarkb> frickler: re edns that is where people have noticed unexpected buggy behavior. But I'm generally not comfortable with OVN modifying/intercepting/interpreting any DNS lookup
19:44:16 <fungi> i suspect that the volume of uncached (by the local unbound) queries going over tls would be a drop in the bucket compared to things like package installs
19:44:20 <corvus> i'm okay with the minimal solution, but i do wonder, why would tls be too much?
19:45:03 <frickler> why not run unbound as recursor instead of just forwarder?
19:45:06 <clarkb> I run unbound doing upstream lookups via dns over tls on an ancient amd cpu that doesn't even have active cooling it is so underpowered
19:45:14 <clarkb> at home I mean
19:45:14 <corvus> our unbound is a caching resolver, so it's going to incur some tls overhead at the start, but....
19:45:48 <corvus> are we not running it as a caching recursive resolver?  are we running it as a forwarder only?
19:45:58 <clarkb> corvus: the current config is caching forwarder only
19:46:07 <clarkb> for test nodes. I'm not sure if the control plane nodes are recursing
19:46:30 <fungi> a caching forwarder to... google dns and opendns right?
19:46:30 <corvus> oh, then i understand why tls would be considered too much
19:46:37 <clarkb> fungi: google and cloudflare
19:46:47 <frickler> https://review.opendev.org/c/opendev/base-jobs/+/929960/8/roles/configure-unbound/templates/forwarding.conf.j2
19:46:47 <fungi> ah, okay
19:47:21 <clarkb> both google and cloudflare support this fwiw
19:47:26 <corvus> i suspect we started as resolving and changed to forwarding ages ago due to performance reasons, and that may be why i misremembered
19:47:30 <clarkb> so its not like we will be on their bad side
19:47:54 <clarkb> it is mostly a determination for whether or not we think it will have negative side effects for the jobs I think?
19:48:15 <fungi> right, they both want to collect and profit off as much dns query information as they can get their hands on, so they're highly unlikely to object
19:48:49 <clarkb> if we prefer to start with just tcp instead of tls I can do that too
19:49:01 <clarkb> as I think that will defeat the mitm behavior
19:49:33 <frickler> https://bugs.launchpad.net/neutron/+bug/2030294 and https://bugs.launchpad.net/neutron/+bug/2030295 fwiw
19:49:37 <clarkb> mostly I want rough consensus on what we think should work before I go through all the trouble of writing changes to test it
19:49:55 <clarkb> and maybe the answer is nothing or tcp or tls
19:50:22 <frickler> I'd prefer to do nothing until we see actual issues in jobs
19:50:42 <corvus> i'm +1 on tcp and +1 on tcp+tls if you have interest in benchmarking that :)
19:51:53 <frickler> but also we can talk to rackspace maybe and see whether they can get OVN to turn off this DNS mangling
19:52:01 <clarkb> I don't think OVN supports that
19:52:20 <frickler> would be an RFE likely, yes
19:52:21 <clarkb> its honestly the sort of thing that if I were openstack or neutron I would delete OVN over but I don't make those decisions
19:52:50 <clarkb> it is completely inappropriate behavior from an overlay network layer to intercept dns
19:53:01 <clarkb> and to do so by default without an option to turn it off
19:53:12 <clarkb> but thats a separate discussion
19:53:19 <corvus> ++
19:53:42 <clarkb> fungi: if you have any thoughts on what you'd prefer our jobs to do can you throw them on the change? then based on that I'll see if its worth modifying to make a change whatever it may be testable
19:54:03 <clarkb> #topic Open Discussion
19:54:12 <clarkb> there were two other items I wanted to bring up before our hour is over
19:54:22 <clarkb> #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 Work by stevenfin to cleanup old jobs in ozj and project-config
19:54:29 <clarkb> #undo
19:54:29 <opendevmeet> Removing item from minutes: #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22
19:54:35 <clarkb> #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 Work by stephenfin to cleanup old jobs in ozj and project-config
19:54:48 <fungi> that's been a long time coming
19:55:09 <clarkb> stephenfin has been pushing changes to clean up old stuff in our zuul configs. I started reviewing them this morning but there are even more. The effort is much appreciated and reviewing them to make that known would be great
19:55:27 <clarkb> and then finally:
19:55:29 <clarkb> #link https://review.opendev.org/c/opendev/base-jobs/+/930082 blockdiag and seqdiag replaced with graphviz
19:55:52 <clarkb> I confused myself into believing that removing blockdiag would be straightforward because I didn't realize seqdiag is basically blockdiag with a different graph style
19:56:12 <clarkb> but I found an example that was close to what we needed with dot and graphviz and managed to get that working last night
19:56:32 <clarkb> the motivation here is that blockdiag and seqdiag are not really maintained and it is creating dependency trouble with python3.12
19:57:00 <clarkb> we can sidestep all of that by using graphviz and sphinx's built in support for graphviz as long as we don't mind far more verbose graph specifications in dot language
19:57:23 <clarkb> if we're happy with ^ that change and its child I can port it to zuul-jobs and zuul and elsewhere we may have the same/similar graphics
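For anyone who hasn't looked at it, sphinx.ext.graphviz ships with Sphinx itself, so once the extension is enabled in conf.py a simple diagram is just a directive in the rst (the node names here are made up):

    .. graphviz::

       digraph deploy {
           rankdir=LR;
           zuul -> bridge -> "service host";
       }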
19:57:36 <clarkb> that was all from me. Anything else?
19:57:41 <frickler> one other thing from me: any objection to dropping the exim paniclogs that keep creating additional spam after the recent unattended upgrade hiccups?
19:57:47 <fungi> #link https://review.opendev.org/930236 Update Mailman containers to latest versions
19:58:16 <clarkb> frickler: no objection from me. It seemed like that got root-caused to a race in package install and shouldn't indicate an ongoing issue so should be safe
19:58:17 <corvus> clarkb: thanks for the graphviz work :)
19:58:32 <corvus> frickler: paniclog reset sgtm
19:58:47 <clarkb> fungi: I've added that to my review list
19:58:51 <corvus> (i think they should age out eventually, but deleting will be faster)
19:59:21 <fungi> yeah, i have no objection to clearing those exim paniclogs
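For the record, the files in question live at the standard Debian/Ubuntu location, so clearing them is just a truncate on each affected host (exim recreates the file if it ever panics again):

    sudo truncate -s 0 /var/log/exim4/paniclog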
20:00:08 <clarkb> and we are at time
20:00:10 <clarkb> thank you everyone
20:00:16 <clarkb> we'll be back same time and location next week
20:00:18 <clarkb> #endmeeting