Tuesday, 2024-09-24

clarkbMeeting time in less than a minute18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Sep 24 19:00:40 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TI4VUGGELZFA23KRIBGVJRNJMNB7VHEK/ Our Agenda19:00
clarkb#topic Announcements19:00
clarkb#link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 119:01
clarkbI wanted to call this out because this event has a CFP that closes much earlier relative to the event date than a number of other events that have occurred or will occur around openinfra day themes19:01
clarkbthe event is early March and CFP closes November 1 so ~5 months in advance of the event19:02
clarkber 4.5 months? In any case much sooner than others19:02
clarkbanything else to call out before we dive into the agenda?19:03
fricklerjust a note I'll be away the next three weeks19:03
clarkbthanks for the heads up. Just saw that in the tc meeting19:03
clarkb#topic Rocky Package Mirror Creation19:04
clarkbI didn't want NeilHanlon to have to hang around until open discussion every week so put this on the agenda19:04
clarkbThat said I don't think there is a change to do the rsyncing yet, but please point it out to me if I have missed it19:05
clarkbsounds like there isn't anything else to add19:06
clarkb#topic Rackspace's Flex Cloud19:06
clarkbAs noted last week the next step here is to figure out authentication to swift in the new region19:06
clarkbI have not had a chance to poke at that. I got nerd sniped by graphviz and peppers (two separate things)19:07
NeilHanlonhi :) i'm here, and thanks!  No change submitted yet19:07
clarkbturns out buying roasted hatch chilies in bulk leads to an afternoon of peeling and chopping and bagging and freezing the peppers19:07
clarkbif anyone else beats me to this let me know. Otherwise it's still on my todo list (probably no earlier than tomorrow)19:08
clarkband a reminder that if you do figure it out, creating a container for staged dib image builds has been requested. Maybe opendev-zuul-dib-builds or similar for the name19:08
clarkb#topic Etherpad 2.2.519:09
clarkbGood news on this one. We have upgraded to v2.2.5 proper and are not running a previous tip of the develop branch commit19:09
clarkbMeetpad continues to work as well which was the main motivation behind the dev commit and this 2.2.5 upgrade19:10
clarkbI'll drop this from next week's agenda but wanted to catch everyone up on this so there wasn't any confusion as we near the PTG19:10
clarkbcorvus did mention a possible browser memory leak related to etherpad when we were on the dev commit but we don't have a ton of evidence so something to keep an eye on19:11
clarkbI haven't seen it in my local firefox instance. But I also restart it at least once a week or so which may help combat issues like that. Something to be aware of and keep an eye out for if we get more reports19:12
clarkb#topic Updating ansible+ansible-lint versions in our repos19:12
clarkbMost of the changes related to this have merged. Thank you everyone for the reviews and faith in my ability to not break things19:12
clarkb#link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 is the last current open change related to this work19:12
clarkbthere is one last change open for this and it is in ozj19:12
clarkbreviews very much welcome. I'm really hopeful that we get through this then don't have to bother with it too much for a while19:13
fricklerI think we'd defer this until after the release? just to be cautious this week?19:14
clarkbthere was also some question about the utility of the ansible-lint tool and I'll mention what I said previously here since the meetings might make it easier to find19:14
clarkbfrickler: thats fine too19:14
clarkbI think there are two main reasons to run ansible-lint. The first is that they run an ansible syntax checker as part of linting. So it checks if the ansible can run at all at a basic level. Then there are some ansible-lint rules which are actually helpful like quoting your octal permission strings19:15
clarkbit can also detect if you miss required module parameters19:15
clarkbthere are other ways to do those checks without ansible-lint but ansible-lint can mock out modules like zuul_return and zuul_console which makes it useful for checking zuul job playbooks in particular19:16
clarkb(otherwise you need to install those modules or find a way to fake them out manually)19:16
clarkbanyway as mentioned reviews welcome. If we feel strongly about any specific rule that we change code for feel free to note that in review19:17
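The octal permission rule mentioned above guards against a YAML quirk: a bare `0644` is read as an octal integer, while a bare `644` is read as decimal and yields a very different mode. A minimal Python sketch of that difference follows. The helper names `yaml11_int` and `as_mode` are hypothetical and only imitate YAML 1.1 integer resolution for these simple cases; this is not code from ansible-lint or Ansible.

```python
def yaml11_int(scalar: str) -> int:
    """Hypothetical helper: resolve a bare integer scalar the way YAML 1.1 does.

    Only handles plain decimal digits and leading-zero octal; real YAML
    loaders support more forms (hex, underscores, signs).
    """
    is_octal = (
        len(scalar) > 1
        and scalar.startswith("0")
        and all(ch in "01234567" for ch in scalar)
    )
    if is_octal:
        return int(scalar, 8)  # leading zero means octal in YAML 1.1
    return int(scalar, 10)


def as_mode(value: int) -> str:
    """Show the integer as the octal permission bits it would represent."""
    return oct(value)


if __name__ == "__main__":
    # mode: 0644 (unquoted, leading zero) -> the intended rw-r--r--
    print(as_mode(yaml11_int("0644")))
    # mode: 644 (unquoted, no leading zero) -> decimal 644, not the intended mode
    print(as_mode(yaml11_int("644")))
```

Quoting the value (`mode: "0644"`) sidesteps the ambiguity because the module then receives a string and does the octal conversion itself.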
clarkb#topic Zuul-launcher image builds19:17
corvusno significant news from me on this19:18
clarkbLast week there were ~3 next steps here. First up was figuring out the staging location for images before they get uploaded to clouds. We were going to create a container in raxflex for this which I haven't done yet19:18
clarkbnext up was merging features into zuul-launcher itself to do the image uploads from the staging location to the clouds19:18
clarkbcorvus: ^ has that code merged yet?19:18
corvusnot yet, but it's slightly more ready to merge than it was last week :)19:19
clarkbfinally tonyb was going to look into porting the dib builds we do into more zuul jobs (currently only bullseye has a build image job)19:19
clarkbcorvus: progress!19:19
clarkbI haven't seen any changes for new image build jobs yet19:19
clarkb#topic OpenStack OpenAPI spec publishing19:21
clarkb#link https://review.opendev.org/92193419:21
clarkbI'll try to tldr this but fungi can probably fill us in with more details19:21
clarkbbasically openstacksdk folks are working on openapi specification for openstack apis that can be used to generate client/sdk tooling19:21
clarkbthey would like a new domain to host these specs under as well as the associated hosting and afs storage19:21
clarkbI think the current name in the change above is openapi-specs.openstack.org19:22
fungiyeah, there was some debate about the best way to publish those, the sdk team was interested in having a portable url for use in build systems for language bindings and ides19:22
fungiin discussing yesterday it sounds like something more generic like api-specs.openstack.org could be more palatable19:23
clarkbI think I would personally prefer that we host this stuff under something like docs.openstack.org/openapi-specs to go along with docs.openstack.org/api-refs/ but sounds like there was a lot of pushback and desire for a dedicated domain19:23
clarkband ya I think decoupling the domain from openapi specifically would be better because openapi could go away in the future and get replaced with some new thing. api-specs.openstack.org would be an improvement19:23
fungithe idea is to treat the api spec structured data similar to the existing service-types.openstack.org site19:24
fungii think some of the pushback was really more against using docs.o.o for it specifically because it's not documentation19:24
fricklerI still don't understand what would be different between api-specs.openstack.org vs. docs.openstack.org/api-specs19:25
fricklerspecs isn't docs?19:25
clarkbfrickler: ya I'm still personally struggling with it too19:25
clarkbif you add swagger to it then it really does become human consumable docs too19:25
fungithe original original idea raised in the ptg session was to use the specs.openstack.org site but i pointed out that's really a completely different thing with a different kind of information and audience19:25
corvusspecs.openstack.org is for docs.  ;)19:26
clarkbbut I would also argue that documentation that is meant to be machine readable doesn't stop it from being documentation19:26
clarkbbut I can at least live with a domain that we're unlikely to need to migrate off of in the future if tooling changes19:26
fungianyway, the reason i raised this in the agenda was that the review had been sitting for months with no feedback, so i wanted to at least check whether there was consensus on what was being proposed there19:27
fungiit wasn't one i was comfortable approving without some additional eyes on it19:27
fungisounds like there isn't clear consensus, so my question is answered19:27
clarkbI'll try to leave a review summarizing my thoughts that a) if you add swagger then this really does look like docs and then hosting this becomes simpler and b) if we continue to really think that docs.openstack.org/api-specs is inappropriate then dropping the openapi specificity would alleviate my other concerns19:28
fungithanks!19:28
clarkbside note you could delete api-refs and replace it with openapi + swagger...19:29
clarkb#topic Gitea 1.22.219:30
clarkbturns out there was a gitea release that I missed due to summit travel prep and traveling19:30
clarkb#link https://review.opendev.org/c/opendev/system-config/+/930217 Change to upgrade to 1.22.219:30
clarkbI've gone ahead and pushed a change to do the update. It is a bugfix release compared to the one we are already running and CI passes19:30
clarkbwith the openstack release coming up maybe y'all can take a look at the changelog and change and decide if we should do more testing with a held node or potentially just wait for the openstack release to complete?19:31
clarkbor maybe you'll decide to send it? Mostly I'm looking for feedback on how urgent vs risky we think this update is so that we can plan accordingly19:31
clarkbbut we don't need that feedback to happen in the meeting. Comments on the change are fine19:33
clarkb#topic Upgrading old servers19:33
clarkbanything new to note here? I don't see new patchsets on the mediawiki stack19:33
clarkbI've been starting to think about a gerrit 3.10 upgrade and then a server replacement for that server but with the openstack release it hasn't been very concrete19:34
clarkbprobably try to aim for doing that around the quieter holiday period if we can manage to get it all lined up for that time frame19:35
fricklergerrit server replacement?19:35
clarkbfrickler: yes to update the base os19:35
clarkband maybe also drop boot from volume if we can19:35
clarkb(boot from volume makes rescuing servers far more of an adventure and I'd like to be running something that doesn't provide adventures when things are already broken)19:35
fungiyeah, looks like our gerrit server is still on focal (20.04 lts)19:36
clarkbthere was actually a recent mailing list thread on how to do it though so should be discoverable now vs blazing our own trail19:36
clarkbya it's not urgent but will be in 6 months19:36
clarkbso trying to plan ahead a bit19:36
fungiif memory serves, the "adventure" is when the original image used to boot the server is no longer present19:37
clarkblet me know if you are interested in helping or have thoughts. Like I said its mostly just me thinking we should do that and what sorts of things might we want to change19:37
clarkbfungi: you have to use newer microversions too19:37
fungioh, right19:37
fungiolder nova api didn't have support for it19:38
clarkb#topic DNS over TLS on Test Nodes19:38
clarkbthis was actually earlier in the agenda and I skimmed right over it in my haste to prep after my errand19:38
clarkbOVN MitM's DNS traffic that is sent in the clear19:39
clarkb#link https://serverfault.com/questions/1134180/how-to-disable-or-fix-openstack-intercepting-dns-ptr-queries this occasionally breaks workloads in openstack clouds19:39
clarkbI don't know for certain that raxflex is using OVN but the MTU that our nodes receive there implies this is the case19:39
clarkb#link https://review.opendev.org/c/opendev/base-jobs/+/929960 Adds this to configure-unbound role in base-jobs19:40
clarkbI've written this change to add DNS over TLS to unbound's configs in our test nodes to mitigate against this behavior19:40
corvusis that the only cloud we've seen this so far?19:40
clarkbthe change is WIP because it needs to be split into two in order to do the base-test base job testing pattern but I didn't want to do all that effort until we're at least happy to make this change19:41
clarkbcorvus: yes I think they are the only one running OVN. Though it is possible openmetal is too (we can check that)19:41
corvusprobably worth including on the feedback list :)19:41
frickleractually there are known bugs for this, I can search those tomorrow19:42
frickleriiuc it mainly affects edns19:42
corvusanyway, much of our setup, including unbound, is about normalizing environments and protecting us from the clouds; so i think this change is consistent with our intentions there19:42
fungiseems like dnssec might also spot altered query replies from the mitm19:42
clarkbin general I think dns over tls is a good idea regardless of OVN. It stops putting DNS traffic in the clear and puts traffic on TCP which can be handled by firewalls a bit more sanely than trying to make udp stateful19:42
fricklerusing tcp should be an option, too, using tls is too much overhead I'd think19:42
clarkband ya we could just use tcp19:43
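For context on the TCP and TLS options discussed here: DNS over TCP and DNS over TLS carry the same message as UDP DNS, but prefixed with a two-byte length, and DNS over TLS wraps that stream in TLS on port 853. The Python sketch below shows the framing and a bare-bones TLS query. It is an illustration only, not how unbound or the configure-unbound role implements forwarding; the function names are hypothetical, and the default server `1.1.1.1` is an assumption chosen because Cloudflare is one of the forwarders mentioned.

```python
import socket
import ssl
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message for `name` (qtype 1 is an A record)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 is IN
    return header + question


def frame_for_stream(message: bytes) -> bytes:
    """Add the two-byte length prefix used on TCP and TLS transports."""
    return struct.pack("!H", len(message)) + message


def _read_exact(sock, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        data += chunk
    return data


def query_over_tls(name: str, server: str = "1.1.1.1", port: int = 853,
                   timeout: float = 5.0) -> bytes:
    """Send one query over TLS and return the raw DNS response message."""
    context = ssl.create_default_context()
    with socket.create_connection((server, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=server) as tls:
            tls.sendall(frame_for_stream(build_query(name)))
            (length,) = struct.unpack("!H", _read_exact(tls, 2))
            return _read_exact(tls, length)
```

Dropping the `wrap_socket` step and connecting to port 53 instead gives the plain TCP variant with the same framing.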
clarkbfrickler: re edns that is where people have noticed unexpected buggy behavior. But I'm generally not comfortable with OVN modifying/intercepting/interpreting any DNS lookup19:44
fungii suspect that the volume of uncached (by the local unbound) queries going over tls would be a drop in the bucket compared to things like package installs19:44
corvusi'm okay with the minimal solution, but i do wonder, why would tls be too much?19:44
fricklerwhy not run unbound as recursor instead of just forwarder?19:45
clarkbI run unbound doing upstream lookups via dns over tls on an ancient amd cpu that doesn't even have active cooling it is so underpowered19:45
clarkbat home I mean19:45
corvusour unbound is a caching resolver, so it's going to incur some tls overhead at the start, but....19:45
corvusare we not running it as a caching recursive resolver?  are we running it as a forwarder only?19:45
clarkbcorvus: the current config is caching forwarder only19:45
clarkbfor test nodes. I'm not sure if the control plane nodes are recursing19:46
fungia caching forwarder to... google dns and opendns right?19:46
corvusoh, then i understand why tls would be considered too much19:46
clarkbfungi: google and cloudflare19:46
fricklerhttps://review.opendev.org/c/opendev/base-jobs/+/929960/8/roles/configure-unbound/templates/forwarding.conf.j219:46
fungiah, okay19:46
clarkbboth google and cloudflare support this fwiw19:47
corvusi suspect we started as resolving and changed to forwarding ages ago due to performance reasons, and that may be why i misremembered19:47
clarkbso its not like we will be on their bad side19:47
clarkbit is mostly a determination for whether or not we think it will have negative side effects for the jobs I think?19:47
fungiright, they both want to collect and profit off as much dns query information as they can get their hands on, so they're highly unlikely to object19:48
clarkbif we prefer to start with just tcp instead of tls I can do that too19:48
clarkbas I think that will defeat the mitm behavior19:49
fricklerhttps://bugs.launchpad.net/neutron/+bug/2030294 and https://bugs.launchpad.net/neutron/+bug/2030295 fwiw19:49
clarkbmostly I want rough consensus on what we think should work before I go through all the trouble of writing changes to test it19:49
clarkband maybe the answer is nothing or tcp or tls19:49
fricklerI'd prefer to do nothing until we see actual issues in jobs19:50
corvusi'm +1 on tcp and +1 on tcp+tls if you have interest in benchmarking that :)19:50
fricklerbut also we can talk to rackspace maybe and see whether they can get OVN to turn off this DNS mangling19:51
clarkbI don't think OVN supports that19:52
fricklerwould be an RFE likely, yes19:52
clarkbits honestly the sort of thing that if I were openstack or neutron I would detel OVN over but I don't make those decisions19:52
clarkbs/detel/delete/19:52
clarkbit is completely inappropriate behavior from an overlay network layer to intercept dns19:52
clarkband to do so by default without an option to turn it off19:53
clarkbbut thats a separate discussion19:53
corvus++19:53
clarkbfungi: if you have any thoughts on what you'd prefer our jobs to do can you throw them on the change? Then based on that I'll see if it's worth reworking the change, whatever it ends up being, so that it is testable19:53
clarkb#topic Open Discussion19:54
clarkbthere were two other items I wanted to bring up before our hour is over19:54
clarkb#link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 Work by stevenfin to cleanup old jobs in ozj and project-config19:54
clarkb#undo19:54
opendevmeetRemoving item from minutes: #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%2219:54
clarkb#link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 Work by stephenfin to cleanup old jobs in ozj and project-config19:54
fungithat's been a long time coming19:54
clarkbstephenfin has been pushing changes to clean up old stuff in our zuul configs. I started reviewing them this morning but there are even more. The effort is much appreciated and reviewing them to make that known would be great19:55
clarkband then finally:19:55
clarkb#link https://review.opendev.org/c/opendev/base-jobs/+/930082 blockdiag and seqdiag replaced with graphviz19:55
clarkbI confused myself into believing that removing blockdiag would be straightforward because I didn't realize seqdiag is basically blockdiag with a different graph style19:55
clarkbbut I found an example that was close to what we needed with dot and graphviz and managed to get that working last night19:56
clarkbthe motivation here is that blockdiag and seqdiag are not really maintained and it is creating dependency trouble with python3.1219:56
clarkbwe can sidestep all of that by using graphviz and sphinx's built in support for graphviz as long as we don't mind far more verbose graph specifications in dot language19:57
clarkbif we're happy with ^ that change and its child I can port it to zuul-jobs and zuul and elsewhere we may have the same/similar graphics19:57
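To give a feel for the verbosity trade-off mentioned above, here is a small Python sketch that emits DOT text for a numbered message flow. It does not reproduce the layout used in the linked change; `sequence_to_dot` and the actor names are hypothetical, and the output is a plain directed graph with ordered edge labels rather than a true sequence diagram.

```python
def sequence_to_dot(actors, messages):
    """Render (sender, receiver, label) steps as a DOT digraph string."""
    lines = ["digraph sequence {", "  rankdir=LR;", "  node [shape=box];"]
    # Declare every actor up front so isolated actors still appear.
    for actor in actors:
        lines.append(f'  "{actor}";')
    # Number the edges so the reading order survives graphviz's layout.
    for step, (src, dst, label) in enumerate(messages, start=1):
        lines.append(f'  "{src}" -> "{dst}" [label="{step}. {label}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sequence_to_dot(
        ["executor", "node"],
        [("executor", "node", "run playbook"),
         ("node", "executor", "return result")],
    ))
```

The resulting text can be placed under a `.. graphviz::` directive, which is provided by the `sphinx.ext.graphviz` extension.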
clarkbthat was all from me. Anything else?19:57
fricklerone other thing from me: any objection to dropping the exim paniclogs that keep creating additional spam after the recent unattended upgrade hiccups?19:57
fungi#link https://review.opendev.org/930236 Update Mailman containers to latest versions19:57
clarkbfrickler: no objection from me. It seemed like that got root-caused to a race in package install and shouldn't indicate an ongoing issue, so should be safe19:58
corvusclarkb: thanks for the graphviz work :)19:58
corvusfrickler: paniclog reset sgtm19:58
clarkbfungi: I've added that to my review list19:58
corvus(i think they should age out eventually, but deleting will be faster)19:58
fungiyeah, i have no objection to clearing those exim paniclogs19:59
clarkband we are at time20:00
clarkbthank you everyone20:00
clarkbwe'll be back same time and location next week20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Sep 24 20:00:18 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-24-19.00.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-24-19.00.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-24-19.00.log.html20:00
fricklerfancy, that serverfault report was actually from my downstream ;)20:00
