*** kopecmartin_ is now known as kopecmartin | 15:00 | |
clarkb | The OpenDev team meeting begins in a couple of minutes | 19:00 |
---|---|---|
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Feb 28 19:01:08 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | Hello everyone, it's been a couple of weeks since we had one of these | 19:01 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/V2UYFDWIGJPXVEJRLIAF7WUNUMDGCJCI/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | The only Service Coordinator nomination I saw was the one I sent in. I believe that makes me it again. But if there was another and I missed it please call that out soon | 19:02 |
fungi | thank you for your service! | 19:02 |
clarkb | #topic Topics | 19:03 |
clarkb | #topic Bastion Host Updates | 19:03 |
clarkb | #link https://review.opendev.org/q/topic:bridge-backups | 19:03 |
clarkb | This stack of changes got some edits after I did my first pass of reviews. I need to do another pass of reviews. Hopefully this afternoon | 19:03 |
clarkb | ianw: anything specific to call out from that? | 19:04 |
ianw | nope, yeah just wants another look, i think i responded to all comments | 19:04 |
clarkb | any other bridge related activities? | 19:04 |
clarkb | #topic Mailman 3 | 19:06 |
clarkb | The db migration stuff should be addressed now. Thank you ianw for that change | 19:06 |
clarkb | fungi got a response from upstream on how to create domains. Apparently you run some python script and don't do migrations. I still find it confusing, but should be able to sort out on a held node | 19:07 |
fungi | no progress on the domain piece yet, but one of the maintainers did get back to me with clearer explanations for site creation, which doesn't look too complicated. unfortunately they also pointed out that postorius's host associations aren't really api-driven and we'll need to reverse engineer them from the webui code | 19:07 |
fungi | last week was a complete wash, trying to catch back up now that i'm home again | 19:07 |
clarkb | ya I'm in a similar boat as we were both traveling | 19:07 |
clarkb | The good news is that we have better directions now | 19:08 |
clarkb | fungi: other than needing to try out upstream's suggestions is there anything else we need to be thinking about here? anything that needs help? | 19:08 |
fungi | planning additional migrations | 19:09 |
fungi | and the upgrade | 19:09 |
fungi | upgrade change is already proposed, but we probably want to do the host separation deployment fixes first | 19:09 |
clarkb | and did we decide on a preferred order for that? | 19:09 |
clarkb | ack | 19:09 |
clarkb | domain host separation, upgrade, migrations? | 19:09 |
clarkb | er in that order I mean | 19:10 |
fungi | yeah, i think so | 19:10 |
fungi | i'm hoping to knock out airship, openinfra and starlingx in march if possible | 19:10 |
fungi | oh, and katacontainers | 19:10 |
fungi | that might be a bit ambitious, we'll see | 19:10 |
clarkb | sounds good | 19:10 |
fungi | aiming for openstack migration in april or maybe may | 19:10 |
fungi | and then we can clean up the old servers | 19:11 |
clarkb | exciting | 19:11 |
fungi | anyway, that's it for me on this topic | 19:11 |
clarkb | #topic Gerrit Updates | 19:12 |
clarkb | There are two long standing issues related to this. The java 17 switch and the ssh connection channel stuff. Both of which I'll bring up at the gerrit community meeting in 2 days | 19:12 |
clarkb | Hopefully that gets both of those moving for us or at least better direction | 19:12 |
corvus | ssh connection channel stuff? | 19:13 |
clarkb | #link https://github.com/apache/mina-sshd/issues/319 Gerrit SSH issues with flaky networks. | 19:13 |
clarkb | corvus: ^ | 19:13 |
clarkb | it seems to be a minor issue but makes big scary warnings in the logs. ianw has run it down some and likely wrote a bug fix for it but last I checked it hasn't merged | 19:13 |
clarkb | though maybe I didn't set appropriate warning bells on that change to see it merge | 19:13 |
ianw | yeah no comments on it | 19:13 |
clarkb | #link https://gerrit-review.googlesource.com/c/gerrit/+/358314 Possible gerrit ssh channel fix | 19:14 |
clarkb | but ya I'll try to get some movement on that Thursday at 8am pacific in the gerrit community meeting | 19:15 |
ianw | thanks! | 19:15 |
clarkb | Yesterday we had some Gerrit fun too which ended up exposing a couple of things | 19:15 |
clarkb | The first is that after the change of base images I set the java package to the jre-headless package on debian which doesn't include debugging tools like jcmd | 19:15 |
clarkb | jcmd ended up being unnecessary to get a thread dump as I could do kill -3 instead. That said it seems like a good idea to have jdk tools in place since you don't know you'll need them until you need them | 19:16 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/875553 Install full jdk on gerrit images | 19:16 |
clarkb | That change will add the extra tools by installing the full jdk headless package instead | 19:16 |
corvus | fyi the stream-events/"recheck" problem in gerrit 3.7 should be fixed in 3.7.1 (i have not verified this, but the expected fix is in the log for the latest release) | 19:16 |
corvus | #link https://bugs.chromium.org/p/gerrit/issues/detail?id=16475 stream-events issue with zuul expected fix is in gerrit 3.7.1 | 19:16 |
clarkb | ack | 19:17 |
corvus | i like adding the tools to the img | 19:17 |
clarkb | The other thing that came up in my debugging of the issues yesterday is that several gerrit plugins expect to be able to write to review_site/data in a persistent manner | 19:17 |
clarkb | in particular the delete-project plugin "archives" deleted repos to data/ and deletes them from that location after some time. The plugin manager uses it for something I haven't figured out yet. Replication plugin uses it to persist replication tasks across gerrit restarts | 19:18 |
clarkb | it is this last one that is most interesting to us since I'm also working on gitea server replacements | 19:18 |
corvus | i also like the idea of bind-mounting that dir, even if not strictly necessary. our intent really was to use containers for "packaging convenience" and not really rely on volume management, etc. so bind-mounting to achieve the normal installation experience makes sense. | 19:18 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/875570 bind mount gerrit's data dir | 19:19 |
fungi | technically it also highlighted a third thing: that there's still academic research interest in opendev activities! | 19:19 |
clarkb | I'll talk more about replication-specific things when I get to gitea activities, but ya I think this is a good thing | 19:20 |
clarkb | Speaking of gerrit 3.7 one of the things we need to do is update our acls. ianw you sent email about the first change to do that. Did the change get applied yet? I think maybe not as you weren't around last week? | 19:21 |
clarkb | no worries. Just want to get up to speed if that did happen | 19:21 |
corvus | i know the stream-events issue was an upgrade blocker; with that [presumably] resolved, is anything blocking an upgrade to 3.7.1 now? | 19:21 |
fungi | acls ;) | 19:21 |
clarkb | corvus: yes, we need to convert all our acls to 3.7 acceptable formats | 19:22 |
corvus | strictly speaking they shouldn't be a blocker | 19:22 |
clarkb | oh? | 19:22 |
fungi | true, just need to merge the transformation | 19:22 |
corvus | at least, from a tech standpoint. i can get behind us wanting to have them in place though. | 19:22 |
corvus | gerrit has backwards compat for the old stuff -- it's only if you want to change an acl that it comes into play | 19:22 |
clarkb | ya I think gerrit will start refusing to accept acl updates in the affected locations. And that is likely to be confusing for users | 19:22 |
corvus | yes that | 19:22 |
clarkb | best to get things converted upfront and avoid confusion | 19:23 |
corvus | so the "copy pasta" approach that happens so often would not go well | 19:23 |
clarkb | #link https://review.opendev.org/c/openstack/project-config/+/867931 Cleaning up deprecated copy conditions in project ACLs | 19:23 |
clarkb | this is the first step (but not a complete conversion of everything that needs doing) | 19:23 |
fungi | folks will be confused enough when they cargo-cult an old project creation change and it gets rejected | 19:23 |
corvus | fungi: yep | 19:23 |
ianw | sorry -- no that's not done yet | 19:23 |
clarkb | so ya I would suggest we do as much of the converting on 3.6 as we can. Then upgrade to 3.7.1 or newer | 19:24 |
ianw | we probably want to start our gerrit 3.7 upgrade checklist page | 19:24 |
fungi | so while not technically a blocker, setting a correct example with the current acls we have in project-config will hopefully defray some of that | 19:24 |
clarkb | ianw: ++ | 19:24 |
ianw | i can do that, so we start to have a checklist of things we know to work on | 19:24 |
fungi | also probably our gerrit integration testing will break if our test acls don't at least have the correct format | 19:25 |
clarkb | corvus: do you know if java 17 is required for 3.7? | 19:25 |
clarkb | 3.6 was the first release to "support" it | 19:25 |
clarkb | we may want to do that conversion pre 3.7 as well | 19:25 |
clarkb | and hopefully gerrit community can clarify that in thursday's meeting | 19:25 |
corvus | clarkb: i don't know off hand | 19:26 |
clarkb | I'll try to run that down | 19:26 |
clarkb | Ok lets move on to the next topic which has some overlap | 19:27 |
clarkb | #topic Upgrading Old Servers | 19:27 |
clarkb | As mentioned in our last meeting I think we need to prioritize gitea backends, nameservers, and etherpad | 19:28 |
clarkb | I've started with gitea and have made quite a bit of progress. I'll try to summarize what I've done so far and then what still needs to be done | 19:28 |
clarkb | Gitea09 has been booted in vexxhost sjc1 using a modern v3 flavor there with 8 vcpu, 32GB memory and a built-in 120GB disk (no BFV) | 19:28 |
clarkb | This is a bit larger than our old servers and I think we may end up running fewer gitea backends as a result | 19:29 |
fungi | well, also most of our gitea semi-outages have been due to memory exhaustion/swap thrash | 19:29 |
clarkb | I added the gitea09 server to our gitea group in ansible and let ansible deploy a complete gitea server but without git repo content to it. I then transplanted the database from gitea01 to gitea09 to preserve redirects | 19:30 |
fungi | so it might help there anyway | 19:30 |
clarkb | ++ | 19:30 |
clarkb | After transplanting the database I discovered that some old orgs that are no longer in projects.yaml no longer had working logos. I fixed this by manually copying files for them | 19:30 |
clarkb | So far the db transplant and the copying of the ~4 logos are the only manual interventions I've had to do | 19:30 |
clarkb | I then added gitea09 to gerrit's replication config and the gerrit restarts yesterday picked that up. I triggered a full sync to gitea09 which appears to be near completion. | 19:31 |
fungi | i would stick with 8 backends if we can. the reason the recommended flavors changed is that the memory-to-cpu ratio in the underlying hardware is higher and so the provider had lots of ram going unused on the servers anyway | 19:31 |
clarkb | The next steps I've got in mind are to do another full resync (for all 9 giteas) to make sure the gerrit restarts and problems yesterday didn't introduce problems with replication | 19:31 |
clarkb | At that point I think we can add gitea09 to haproxy and have it in production | 19:32 |
clarkb | Then I would like to upgrade gitea to 1.18.5 | 19:32 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/875533 upgrade gitea to 1.18.5 | 19:32 |
clarkb | Concurrent to that I'd also like to update Gerrit to autoreload replication configs | 19:33 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/874340 Gerrit replication autoreloading. | 19:33 |
clarkb | This will allow us to add more giteas and remove old giteas without gerrit restarts. | 19:33 |
clarkb | Previously we had removed autoreloading because we had noticed it would lose replication tasks on reload and the giteas would not all be in sync | 19:34 |
clarkb | However, the data/ dir storage used by the replication plugin should mitigate this and after discussion with nasserg I think if we bind mount that properly we should be good | 19:34 |
fungi | yeah, i'm happy to try it again | 19:34 |
clarkb | Once all of ^ is settled we can remove gitea08 and test this hypothesis. Then I'll probably boot 3 more giteas and build them out assembly line style and we can add them to production in bulk | 19:35 |
fungi | the issues it caused last time were only as disruptive as they were because it took us so long to identify the underlying cause | 19:35 |
fungi | we'll be on the lookout for it this time anyway | 19:35 |
clarkb | ++ | 19:35 |
corvus | (i just want to clarify that a gerrit restart (via docker stop/start/restart) shouldn't delete the data dir -- only a docker-compose down + up would do that) | 19:35 |
clarkb | corvus: correct | 19:35 |
clarkb | this was exercised yesterday with the restarts that didn't pull new images | 19:35 |
clarkb | which confirmed the replication tasks persist across those restarts if data/ content is preserved | 19:36 |
fungi | however, older gerrit versions didn't persist that queue to disk at all, with obvious repercussions | 19:36 |
corvus | okay, i just wanted to get that out there and make sure we still expect that there have been some improvements (eg, replication actually writing to that dir) to make us think it will work better now | 19:36 |
corvus | clarkb, fungi: excellent, that explains it perfectly, thanks | 19:36 |
clarkb | and then if we need/want another 4 I figure I'll do those in another batch together too | 19:37 |
clarkb | The only other consideration is when to move the gitea db backups off of gitea01. I figure I can do that once gitea09 is in production and running happily | 19:37 |
ianw | with the logos, iirc, we walk the list of orgs via the api and copy in the logos. was the problem the old orgs had been setup to an old logo that wasn't copied? | 19:37 |
clarkb | that way we can remove gitea01 at any point | 19:37 |
clarkb | ianw: the problem is the old orgs are not in projects.yaml anymore | 19:37 |
clarkb | ianw: and so they really only exist in the context of our rename history and in the gitea db | 19:37 |
fungi | yeah, it's an issue with renamed orgs | 19:37 |
clarkb | ianw: all I had to do was copy the opendev logo file into a file named after the old org name | 19:38 |
ianw | i'm just wondering if we should codify that in the logo-setting role | 19:38 |
clarkb | ianw: we probably can. We'd need to inspect the rename history files to know what those orgs are I think | 19:39 |
ianw | it doesn't look at projects.yaml, but the gitea api. but it could then also have an extra list of "old" names to also copy | 19:39 |
clarkb | it's also an easy manual workaround so I didn't want to hold things up on it | 19:39 |
clarkb | but happy for the help/improvements | 19:39 |
clarkb | but ya I think we've got a process largely sorted out now except for the gerrit replication config updates. Reviews on those changes much appreciated | 19:39 |
clarkb | and I'll keep working on this for the next bit until we've removed the old gitea servers. Then I can look at the next server type to upgrade. | 19:40 |
clarkb | any other questions/ideas/comments about gitea or server upgrades? | 19:41 |
clarkb | oh I have one actually. There is a request for a project rename. I'd like to request we do not rename anything until the gitea replacements are done | 19:41 |
clarkb | That's just one extra moving part I don't want to have to keep track of as I go through this :) | 19:41 |
clarkb | #topic Handing over x/virtualdpu to OpenStack Ironic PTL | 19:42 |
clarkb | is it dpu or pdu? I may have typoed. fungi want to fill us in? | 19:43 |
fungi | this should hopefully be fairly straightforward, but i'll start with some background | 19:43 |
fungi | once upon a time, developers at internap wrote a virtual pdu project, and the openstack ironic project came to depend on it | 19:43 |
fungi | virtualpdu, as it was called, was never officially added to openstack, but ironic still relies on it | 19:44 |
fungi | the internap developers moved on, and no longer work at internap nor on virtualpdu, so it's basically abandoned | 19:44 |
fungi | the openstack ironic developers would like to adopt it, but reaching people with control of the current acl was... hard | 19:45 |
fungi | rpittau was finally able to get a response from mathieu mitchell, who expressed approval of the ironic team taking control of the repository, but since then has not replied further nor updated the acl | 19:46 |
fungi | no other virtualpdu maintainers replied at all (but all were cc'd) | 19:46 |
fungi | i have copies of the e-mail messages, complete with received headers, in case there's some dispute over things | 19:46 |
clarkb | My suggestion would be to publicly post intent to do the change in ownership to service-announce and openstack-discuss and cc the old maintainers and post a date when we'll make the change. They can concede or fight it in that period and if we hear nothing we make the change | 19:47 |
fungi | probably the next step is to post the intent to hand over access to openstack on a mailing list as well as cc the listed maintainer addresses, and set a date | 19:47 |
clarkb | I don't think anyone has nefarious intent here, it's a useful tool and the old group isn't interested. The new group is, so we should support that | 19:47 |
fungi | if we hear no objections, then move forward granting control | 19:47 |
clarkb | fungi: ++ | 19:48 |
ianw | all seems fine -- if there was negative feedback would be harder | 19:48 |
fungi | exactly | 19:49 |
fungi | so, openstack-discuss? service-announce? suggestions on what mailing list(s) is/are most appropriate to post the notice of intent? | 19:49 |
clarkb | I feel like service-announce at least since its something happening at the opendev level | 19:50 |
fungi | we don't really have a codified process since this is basically the first time it has ever come up | 19:50 |
clarkb | but then openstack-discuss too might help get the email in front of the right eyeballs if someone did want to object | 19:50 |
fungi | yeah, i can post copies to both of those | 19:50 |
fungi | mainly bringing it up in the meeting here to make sure we've got some consensus among opendev sysadmins on a prototype process, since this is potential precedent for future similar cases | 19:51 |
fungi | seems like we have no objections over the proposal anyway | 19:52 |
clarkb | yup. I think starting by contacting maintainers directly and trying to resolve it without opendev involvement is the first step. Then if we don't get objections but also don't resolve it, making a public announcement of the plan with a period to object is a good process | 19:52 |
ianw | ++ | 19:52 |
fungi | i'll try to get something sent out to mailing lists and the current maintainers tomorrow in that case | 19:52 |
fungi | nothing else from me on this topic, unless anyone has questions | 19:53 |
clarkb | #topic Works on ARM feedback | 19:53 |
clarkb | we are almost out of time and I wanted to get to this really quickly | 19:53 |
clarkb | The works on arm folks have asked us to talk about what we've done the last 6 months with their program. | 19:54 |
clarkb | I've started a draft response in an etherpad | 19:54 |
clarkb | #link https://etherpad.opendev.org/p/3DcVXw0PBOknv1bgyZWh | 19:54 |
clarkb | unfortunately we don't actually have 6 months of use, which i try to clarify in the email | 19:54 |
clarkb | ianw: maybe we tighten up the bit fungi had concerns about then one of us can send that soon? | 19:54 |
ianw | ++ | 19:55 |
fungi | also some cleanup as soon as your vhost change deploys | 19:55 |
clarkb | ok cool lets sync on that after the meeting | 19:55 |
clarkb | fungi: yup | 19:56 |
clarkb | #topic Open Discussion | 19:56 |
clarkb | Anything else? | 19:56 |
ianw | not from me, thanks for once again running the meeting! | 19:57 |
clarkb | Sounds like that may be it. Thank you for your time today. We'll be back here same time and location next week | 19:57 |
clarkb | #endmeeting | 19:57 |
opendevmeet | Meeting ended Tue Feb 28 19:57:56 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-02-28-19.01.html | 19:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-02-28-19.01.txt | 19:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-02-28-19.01.log.html | 19:57 |
fungi | thanks clarkb ! | 19:57 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
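During the server upgrades discussion, clarkb fixed missing logos for renamed orgs by copying the opendev logo into a file named after each old org, and ianw suggested codifying that in the logo-setting role with an extra list of "old" names (which, per clarkb, would come from the rename history files). The following is a minimal Python sketch of that idea only. The one-`<org>.png`-per-org directory layout, the `opendev.png` filename and the `copy_logos_for_old_orgs` function are all hypothetical and do not describe the actual OpenDev role.

```python
import shutil
from pathlib import Path


def copy_logos_for_old_orgs(logo_dir, default_logo, old_orgs):
    """Copy a default logo to <org>.png for each old (renamed) org lacking one.

    Hypothetical sketch: assumes logos live as <org>.png files in logo_dir,
    which is an illustration, not the real logo-setting role's layout.
    Returns the names of the orgs that received a copied logo.
    """
    logo_dir = Path(logo_dir)
    source = logo_dir / default_logo
    copied = []
    for org in old_orgs:
        target = logo_dir / f"{org}.png"
        # Skip orgs that already have a logo so repeated runs are idempotent.
        if target.exists():
            continue
        shutil.copyfile(source, target)
        copied.append(org)
    return copied
```

Skipping existing files keeps the step safe to run on every deploy, matching how the manual fix only touched the handful of orgs that were missing logos.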