clarkb | Meeting time in less than a minute | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Sep 24 19:00:40 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TI4VUGGELZFA23KRIBGVJRNJMNB7VHEK/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | #link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 1 | 19:01 |
clarkb | I wanted to call this out because this event has a CFP that closes much earlier relative to the event date than a number of other events that have occurred or will occur around openinfra day themes | 19:01 |
clarkb | the event is early March and CFP closes November 1 so ~5 months in advance of the event | 19:02 |
clarkb | er 4.5 months? In any case much sooner than others | 19:02 |
clarkb | anything else to call out before we dive into the agenda? | 19:03 |
frickler | just a note I'll be away the next three weeks | 19:03 |
clarkb | thanks for the heads up. Just saw that in the tc meeting | 19:03 |
clarkb | #topic Rocky Package Mirror Creation | 19:04 |
clarkb | I didn't want NeilHanlon to have to hang around until open discussion every week so put this on the agenda | 19:04 |
clarkb | That said I don't think there is a change to do the rsyncing yet, but please point it out to me if I have missed it | 19:05 |
clarkb | sounds like there isn't anything else to add | 19:06 |
clarkb | #topic Rackspace's Flex Cloud | 19:06 |
clarkb | As noted last week the next step here is to figure out authentication to swift in the new region | 19:06 |
clarkb | I have not had a chance to poke at that. I got nerd sniped by graphviz and peppers (two separate things) | 19:07 |
NeilHanlon | hi :) i'm here, and thanks! No change submitted yet | 19:07 |
clarkb | turns out buying roasted hatch chilies in bulk leads to an afternoon of peeling and chopping and bagging and freezing the peppers | 19:07 |
clarkb | if anyone else beats me to this let me know. Otherwise it's still on my todo list (probably no earlier than tomorrow) | 19:08 |
clarkb | and a reminder that if you do figure it out, creating a container for staged dib image builds has been requested. Maybe opendev-zuul-dib-builds or similar for the name | 19:08 |
clarkb | #topic Etherpad 2.2.5 | 19:09 |
clarkb | Good news on this one. We have upgraded to v2.2.5 proper and are no longer running a previous tip of the develop branch commit | 19:09 |
clarkb | Meetpad continues to work as well which was the main motivation behind the dev commit and this 2.2.5 upgrade | 19:10 |
clarkb | I'll drop this from next week's agenda but wanted to catch everyone up on this so there wasn't any confusion as we near the PTG | 19:10 |
clarkb | corvus did mention a possible browser memory leak related to etherpad when we were on the dev commit but we don't have a ton of evidence so something to keep an eye on | 19:11 |
clarkb | I haven't seen it in my local firefox instance. But I also restart it at least once a week or so which may help combat issues like that. Something to be aware of and keep an eye out for if we get more reports | 19:12 |
clarkb | #topic Updating ansible+ansible-lint versions in our repos | 19:12 |
clarkb | Most of the changes related to this have merged. Thank you everyone for the reviews and faith in my ability to not break things | 19:12 |
clarkb | #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 is the last current open change related to this work | 19:12 |
clarkb | there is one last change open for this and it is in ozj | 19:12 |
clarkb | reviews very much welcome. I'm really hopeful that we get through this then don't have to bother with it too much for a while | 19:13 |
frickler | I think we'd defer this until after the release? just to be cautious this week? | 19:14 |
clarkb | there was also some question about the utility of the ansible-lint tool and I'll mention what I said previously here since the meetings might make it easier to find | 19:14 |
clarkb | frickler: thats fine too | 19:14 |
clarkb | I think there are two main reasons to run ansible-lint. The first is that they run an ansible syntax checker as part of linting. So it checks if the ansible can run at all at a basic level. Then there are some ansible-lint rules which are actually helpful like quoting your octal permission strings | 19:15 |
clarkb | it can also detect if you miss required module parameters | 19:15 |
clarkb | there are other ways to do those checks without ansible-lint but ansible-lint can mock out modules like zuul_return and zuul_console which makes it useful for checking zuul job playbooks in particular | 19:16 |
clarkb | (otherwise you need to install those modules or find a way to fake them out manually) | 19:16 |
clarkb | anyway as mentioned reviews welcome. If we feel strongly about any specific rule that we change code for feel free to note that in review | 19:17 |
clarkb | #topic Zuul-launcher image builds | 19:17 |
corvus | no significant news from me on this | 19:18 |
clarkb | Last week there were ~3 next steps here. First up was figuring out the staging location for images before they get uploaded to clouds. We were going to create a container in raxflex for this which I haven't done yet | 19:18 |
clarkb | next up was merging features into zuul-launcher itself to do the image uploads from the staging location to the clouds | 19:18 |
clarkb | corvus: ^ has that code merged yet? | 19:18 |
corvus | not yet, but it's slightly more ready to merge than it was last week :) | 19:19 |
clarkb | finally tonyb was going to look into porting the dib builds we do into more zuul jobs (currently only bullseye has a build image job) | 19:19 |
clarkb | corvus: progress! | 19:19 |
clarkb | I haven't seen any changes for new image build jobs yet | 19:19 |
clarkb | #topic OpenStack OpenAPI spec publishing | 19:21 |
clarkb | #link https://review.opendev.org/921934 | 19:21 |
clarkb | I'll try to tldr this but fungi can probably fill us in with more details | 19:21 |
clarkb | basically openstacksdk folks are working on openapi specification for openstack apis that can be used to generate client/sdk tooling | 19:21 |
clarkb | they would like a new domain to host these specs under as well as the associated hosting and afs storage | 19:21 |
clarkb | I think the current name in the change above is openapi-specs.openstack.org | 19:22 |
fungi | yeah, there was some debate about the best way to publish those, the sdk team was interested in having a portable url for use in build systems for language bindings and ides | 19:22 |
fungi | in discussing yesterday it sounds like something more generic like api-specs.openstack.org could be more palatable | 19:23 |
clarkb | I think I would personally prefer that we host this stuff under something like docs.openstack.org/openapi-specs to go along with docs.openstack.org/api-refs/ but sounds like there was a lot of pushback and desire for a dedicated domain | 19:23 |
clarkb | and ya I think decoupling the domain from openapi specifically would be better because openapi could go away in the future and get replaced with some new thing. api-specs.openstack.org would be an improvement | 19:23 |
fungi | the idea is to treat the api spec structured data similar to the existing service-types.openstack.org site | 19:24 |
fungi | i think some of the pushback was really more against using docs.o.o for it specifically because it's not documentation | 19:24 |
frickler | I still don't understand what would be different between api-specs.openstack.org vs. docs.openstack.org/api-specs | 19:25 |
frickler | specs isn't docs? | 19:25 |
clarkb | frickler: ya I'm still personally struggling with it too | 19:25 |
clarkb | if you add swagger to it then it really does become human consumable docs too | 19:25 |
fungi | the original original idea raised in the ptg session was to use the specs.openstack.org site but i pointed out that's really a completely different thing with a different kind of information and audience | 19:25 |
corvus | specs.openstack.org is for docs. ;) | 19:26 |
clarkb | but I would also argue that documentation that is meant to be machine readable doesn't stop it from being documentation | 19:26 |
clarkb | but I can at least live with a domain that we're unlikely to need to migrate off of in the future if tooling changes | 19:26 |
fungi | anyway, the reason i raised this in the agenda was that the review had been sitting for months with no feedback, so i wanted to at least check whether there was consensus on what was being proposed there | 19:27 |
fungi | it wasn't one i was comfortable approving without some additional eyes on it | 19:27 |
fungi | sounds like there isn't clear consensus, so my question is answered | 19:27 |
clarkb | I'll try to leave a review summarizing my thoughts that a) if you add swagger then this really does look like docs and then hosting this becomes simpler and b) if we continue to really think that docs.openstack.org/api-specs is inappropriate then dropping the openapi specificity would alleviate my other concerns | 19:28 |
fungi | thanks! | 19:28 |
clarkb | side note you could delete api-refs and replace it with openapi + swagger... | 19:29 |
clarkb | #topic Gitea 1.22.2 | 19:30 |
clarkb | turns out there was a gitea release that I missed due to summit travel prep and traveling | 19:30 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/930217 Change to upgrade to 1.22.2 | 19:30 |
clarkb | I've gone ahead and pushed a change to do the update. It is a bugfix release compared to the one we are already running and CI passes | 19:30 |
clarkb | with the openstack release coming up maybe ya'll can take a look at the changelog and change and decide if we should do more testing with a held node or potentially just wait for the openstack release to complete? | 19:31 |
clarkb | or maybe you'll decide to send it? Mostly I'm looking for feedback on how urgent vs risky we think this update is so that we can plan accordingly | 19:31 |
clarkb | but we don't need that feedback to happen in the meeting. Comments on the change are fine | 19:33 |
clarkb | #topic Upgrading old servers | 19:33 |
clarkb | anything new to note here? I don't see new patchsets on the mediawiki stack | 19:33 |
clarkb | I've been starting to think about a gerrit 3.10 upgrade and then a server replacement for that server but with the openstack release it hasn't been very concrete | 19:34 |
clarkb | probably try to aim for doing that around the quieter holiday period if we can manage to get it all lined up for that time frame | 19:35 |
frickler | gerrit server replacement? | 19:35 |
clarkb | frickler: yes to update the base os | 19:35 |
clarkb | and maybe also drop boot from volume if we can | 19:35 |
clarkb | (boot from volume makes rescuing servers far more of an adventure and I'd like to be running something that doesn't provide adventures when things are already broken) | 19:35 |
fungi | yeah, looks like our gerrit server is still on focal (20.04 lts) | 19:36 |
clarkb | there was actually a recent mailing list thread on how to do it though so should be discoverable now vs blazing our own trail | 19:36 |
clarkb | ya its not urgent but will be in 6 months | 19:36 |
clarkb | so trying to plan ahead a bit | 19:36 |
fungi | if memory serves, the "adventure" is when the original image used to boot the server is no longer present | 19:37 |
clarkb | let me know if you are interested in helping or have thoughts. Like I said its mostly just me thinking we should do that and what sorts of things might we want to change | 19:37 |
clarkb | fungi: you have to use newer microversions too | 19:37 |
fungi | oh, right | 19:37 |
fungi | older nova api didn't have support for it | 19:38 |
clarkb | #topic DNS over TLS on Test Nodes | 19:38 |
clarkb | this was actually earlier in the agenda and I skimmed right over it in my haste to prep after my errand | 19:38 |
clarkb | OVN MitM's DNS traffic that is sent in the clear | 19:39 |
clarkb | #link https://serverfault.com/questions/1134180/how-to-disable-or-fix-openstack-intercepting-dns-ptr-queries this occasionally breaks workloads in openstack clouds | 19:39 |
clarkb | I don't know for certain that raxflex is using OVN but the MTU that our nodes receive there implies this is the case | 19:39 |
clarkb | #link https://review.opendev.org/c/opendev/base-jobs/+/929960 Adds this to configure-unbound role in base-jobs | 19:40 |
clarkb | I've written this change to add DNS over TLS to unbound's configs in our test nodes to mitigate against this behavior | 19:40 |
corvus | is that the only cloud we've seen this so far? | 19:40 |
clarkb | the change is WIP because it needs to be split into two in order to do the base-test base job testing pattern but I didn't want to do all that effort until we're at least happy to make this change | 19:41 |
clarkb | corvus: yes I think they are the only one running OVN. Though it is possible openmetal is too (we can check that) | 19:41 |
corvus | probably worth including on the feedback list :) | 19:41 |
frickler | actually there are known bugs for this, I can search those tomorrow | 19:42 |
frickler | iiuc it mainly affects edns | 19:42 |
corvus | anyway, much of our setup, including unbound, is about normalizing environments and protecting us from the clouds; so i think this change is consistent with our intentions there | 19:42 |
fungi | seems like dnssec might also spot altered query replies from the mitm | 19:42 |
clarkb | in general I think dns over tls is a good idea regardless of OVN. It stops putting DNS traffic in the clear and puts traffic on TCP which can be handled by firewalls a bit more sanely than trying to make udp stateful | 19:42 |
frickler | using tcp should be an option, too, using tls is too much overhead I'd think | 19:42 |
clarkb | and ya we could just use tcp | 19:43 |
clarkb | frickler: re edns that is where people have noticed unexpected buggy behavior. But I'm generally not comfortable with OVN modifying/intercepting/interpreting any DNS lookup | 19:44 |
fungi | i suspect that the volume of uncached (by the local unbound) queries going over tls would be a drop in the bucket compared to things like package installs | 19:44 |
corvus | i'm okay with the minimal solution, but i do wonder, why would tls be too much? | 19:44 |
frickler | why not run unbound as recursor instead of just forwarder? | 19:45 |
clarkb | I run unbound doing upstream lookups via dns over tls on an ancient amd cpu that doesn't even have active cooling it is so underpowered | 19:45 |
clarkb | at home I mean | 19:45 |
corvus | our unbound is a caching resolver, so it's going to incur some tls overhead at the start, but.... | 19:45 |
corvus | are we not running it as a caching recursive resolver? are we running it as a forwarder only? | 19:45 |
clarkb | corvus: the current config is caching forwarder only | 19:45 |
clarkb | for test nodes. I'm not sure if the control plane nodes are recursing | 19:46 |
fungi | a caching forwarder to... google dns and opendns right? | 19:46 |
corvus | oh, then i understand why tls would be considered too much | 19:46 |
clarkb | fungi: google and cloudflare | 19:46 |
frickler | https://review.opendev.org/c/opendev/base-jobs/+/929960/8/roles/configure-unbound/templates/forwarding.conf.j2 | 19:46 |
fungi | ah, okay | 19:46 |
clarkb | both google and cloudflare support this fwiw | 19:47 |
corvus | i suspect we started as resolving and changed to forwarding ages ago due to performance reasons, and that may be why i misremembered | 19:47 |
clarkb | so its not like we will be on their bad side | 19:47 |
clarkb | it is mostly a determination for whether or not we think it will have negative side effects for the jobs I think? | 19:47 |
fungi | right, they both want to collect and profit off as much dns query information as they can get their hands on, so they're highly unlikely to object | 19:48 |
clarkb | if we prefer to start with just tcp instead of tls I can do that too | 19:48 |
clarkb | as I think that will defeat the mitm behavior | 19:49 |
frickler | https://bugs.launchpad.net/neutron/+bug/2030294 and https://bugs.launchpad.net/neutron/+bug/2030295 fwiw | 19:49 |
clarkb | mostly I want rough consensus on what we think should work before I go through all the trouble of writing changes to test it | 19:49 |
clarkb | and maybe the answer is nothing or tcp or tls | 19:49 |
frickler | I'd prefer to do nothing until we see actual issues in jobs | 19:50 |
corvus | i'm +1 on tcp and +1 on tcp+tls if you have interest in benchmarking that :) | 19:50 |
frickler | but also we can talk to rackspace maybe and see whether they can get OVN to turn off this DNS mangling | 19:51 |
clarkb | I don't think OVN supports that | 19:52 |
frickler | would be an RFE likely, yes | 19:52 |
clarkb | its honestly the sort of thing that if I were openstack or neutron I would detel OVN over but I don't make those decisions | 19:52 |
clarkb | s/detel/delete/ | 19:52 |
clarkb | it is completely inappropriate behavior from an overlay network layer to intercept dns | 19:52 |
clarkb | and to do so by default without an option to turn it off | 19:53 |
clarkb | but thats a separate discussion | 19:53 |
corvus | ++ | 19:53 |
clarkb | fungi: if you have any thoughts on what you'd prefer our jobs to do can you throw them on the change? then based on that I'll see if it's worth modifying the change, whatever it may end up being, to make it testable | 19:53 |
clarkb | #topic Open Discussion | 19:54 |
clarkb | there were two other items I wanted to bring up before our hour is over | 19:54 |
clarkb | #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 Work by stevenfin to cleanup old jobs in ozj and project-config | 19:54 |
clarkb | #undo | 19:54 |
opendevmeet | Removing item from minutes: #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 | 19:54 |
clarkb | #link https://review.opendev.org/q/topic:%22drop-legacy-dsvm-jobs%22 Work by stephenfin to cleanup old jobs in ozj and project-config | 19:54 |
fungi | that's been a long time coming | 19:54 |
clarkb | stephenfin has been pushing changes to clean up old stuff in our zuul configs. I started reviewing them this morning but there are even more. The effort is much appreciated and reviewing them to make that known would be great | 19:55 |
clarkb | and then finally: | 19:55 |
clarkb | #link https://review.opendev.org/c/opendev/base-jobs/+/930082 blockdiag and seqdiag replaced with graphviz | 19:55 |
clarkb | I confused myself into believing that removing blockdiag would be straightforward because I didn't realize seqdiag is basically blockdiag with a different graph style | 19:55 |
clarkb | but I found an example that was close to what we needed with dot and graphviz and managed to get that working last night | 19:56 |
clarkb | the motivation here is that blockdiag and seqdiag are not really maintained and it is creating dependency trouble with python3.12 | 19:56 |
clarkb | we can sidestep all of that by using graphviz and sphinx's built in support for graphviz as long as we don't mind far more verbose graph specifications in dot language | 19:57 |
clarkb | if we're happy with ^ that change and its child I can port it to zuul-jobs and zuul and elsewhere we may have the same/similar graphics | 19:57 |
clarkb | that was all from me. Anything else? | 19:57 |
frickler | one other thing from me: any objection to dropping the exim paniclogs that keep creating additional spam after the recent unattended upgrade hiccups? | 19:57 |
fungi | #link https://review.opendev.org/930236 Update Mailman containers to latest versions | 19:57 |
clarkb | frickler: no objection from me. It seemed like that got rootcaused to a race in package install and shouldn't indicate an ongoing issue so should be safe | 19:58 |
corvus | clarkb: thanks for the graphviz work :) | 19:58 |
corvus | frickler: paniclog reset sgtm | 19:58 |
clarkb | fungi: I've added that to my review list | 19:58 |
corvus | (i think they should age out eventually, but deleting will be faster) | 19:58 |
fungi | yeah, i have no objection to clearing those exim paniclogs | 19:59 |
clarkb | and we are at time | 20:00 |
clarkb | thank you everyone | 20:00 |
clarkb | we'll be back same time and location next week | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Sep 24 20:00:18 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-24-19.00.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-24-19.00.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-09-24-19.00.log.html | 20:00 |
frickler | fancy, that serverfault report was actually from my downstream ;) | 20:00 |
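
Note on the 19:15 ansible-lint discussion ("quoting your octal permission strings"): a minimal Python sketch of why the quoting matters. It assumes the common YAML 1.1 behaviour where an unquoted `0644` is read as an octal integer and an unquoted `644` as a decimal integer. The helper name `mode_bits` is hypothetical and is not part of Ansible or ansible-lint.

```python
import stat


def mode_bits(value):
    """Return the permission string a regular file would show for `value`.

    A str is treated as octal digits (what a quoted "0644" means);
    an int is used as-is (what an unquoted YAML scalar turns into).
    """
    bits = int(value, 8) if isinstance(value, str) else value
    # stat.filemode expects a full st_mode; S_IFREG marks a regular file.
    return stat.filemode(stat.S_IFREG | bits)


quoted = mode_bits("0644")         # quoted octal string
unquoted_octal = mode_bits(0o644)  # unquoted 0644, assumed parsed as octal
unquoted_decimal = mode_bits(644)  # unquoted 644, parsed as decimal

print(quoted, unquoted_octal, unquoted_decimal)
```

The first two values agree, while the decimal `644` maps to a different and almost certainly unintended set of permission bits, which is the class of mistake the lint rule is meant to catch.
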
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
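
Note on the 19:38 to 19:53 DNS discussion: the three options that were weighed (do nothing, forward over plain TCP, forward over TLS) can be sketched as a small Python renderer for an unbound forwarder config. This is an illustration under stated assumptions, not the contents of change 929960. The upstream addresses and TLS auth names are assumed (the meeting only says the forwarders are Google and Cloudflare), the CA bundle path is a Debian-style default, and `render_forwarding` is a hypothetical helper. The option names `tcp-upstream`, `tls-cert-bundle`, `forward-tls-upstream` and `forward-addr` are standard unbound.conf settings.

```python
# Assumed upstreams: (address, TLS authentication name). Illustrative only.
UPSTREAMS = [
    ("8.8.8.8", "dns.google"),
    ("1.1.1.1", "cloudflare-dns.com"),
]


def render_forwarding(mode, ca_bundle="/etc/ssl/certs/ca-certificates.crt"):
    """Render an unbound forwarder config for mode 'udp', 'tcp' or 'tls'."""
    if mode not in ("udp", "tcp", "tls"):
        raise ValueError(f"unknown mode: {mode}")

    server_opts = []
    if mode == "tcp":
        # Send upstream queries over TCP instead of UDP.
        server_opts.append("tcp-upstream: yes")
    elif mode == "tls":
        # CA bundle used to verify the upstream certificates.
        server_opts.append(f'tls-cert-bundle: "{ca_bundle}"')

    lines = []
    if server_opts:
        lines.append("server:")
        lines.extend(f"    {opt}" for opt in server_opts)

    lines.append("forward-zone:")
    lines.append('    name: "."')
    if mode == "tls":
        lines.append("    forward-tls-upstream: yes")
    for addr, auth_name in UPSTREAMS:
        if mode == "tls":
            # Port 853 is DNS over TLS; the name after '#' is verified.
            lines.append(f"    forward-addr: {addr}@853#{auth_name}")
        else:
            lines.append(f"    forward-addr: {addr}")
    return "\n".join(lines) + "\n"


print(render_forwarding("tls"))
```

Rendering all three modes side by side makes the size of the proposed change easy to compare: the TCP variant is a single server option, while the TLS variant also changes the port and adds certificate verification for each forwarder.
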