jakeyip | hi all, anyone around for meeting? | 08:54 |
---|---|---|
jakeyip | was an extra long weekend over here, nothing much changed from last meeting | 08:54 |
dalees | i'm back, but likewise little to share | 08:58 |
jakeyip | ok quick one then. | 09:00 |
jakeyip | #startmeeting magnum | 09:01 |
opendevmeet | Meeting started Wed Apr 3 09:01:03 2024 UTC and is due to finish in 60 minutes. The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot. | 09:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 09:01 |
opendevmeet | The meeting name has been set to 'magnum' | 09:01 |
jakeyip | #link https://etherpad.opendev.org/p/magnum-weekly-meeting | 09:01 |
jakeyip | #topic Roll Call | 09:01 |
jakeyip | o/ | 09:01 |
jakeyip | mnasiadka dalees courtesy ping :) | 09:01 |
dalees | o/ | 09:01 |
jakeyip | #topic Abandon old patches | 09:03 |
jakeyip | I've abandoned many old patches as agreed | 09:03 |
jakeyip | some I've starred for further reviews but haven't had time to action all of them | 09:03 |
dalees | Thanks for doing that, jakeyip | 09:03 |
jakeyip | feel free to abandon any earlier than 2022-10-05 as previously agreed | 09:06 |
jakeyip | anything else on this topic? | 09:06 |
jakeyip | I'll go on to the next one | 09:08 |
jakeyip | next topic | 09:08 |
jakeyip | #topic PTG | 09:08 |
jakeyip | Wed, 10 Apr, 06-08 UTC (next week) | 09:08 |
jakeyip | mark your cal, see you all there :) | 09:09 |
jakeyip | anything on PTG to chat about? | 09:11 |
dalees | nothing from me, but see you there | 09:11 |
mkjpryor | I guess all the Cluster API stuff | 09:11 |
mkjpryor | Just to make sure we all understand the plan, and agree as much as possible | 09:12 |
jakeyip | #topic ClusterAPI | 09:12 |
jakeyip | mkjpryor: go ahead :) | 09:12 |
mkjpryor | So we have moved the Helm driver to opendev now | 09:12 |
mkjpryor | As discussed before, we are still hoping for it to be in-tree in Dalmatian | 09:13 |
mkjpryor | But I think that is one part that definitely needs discussing at the PTG | 09:13 |
mkjpryor | There are a few supporting components that we would also like to contribute to the Magnum project, but are currently quite dependent on GitHub Actions for their CI | 09:14 |
mkjpryor | In particular, the Helm charts themselves | 09:14 |
mkjpryor | But also our addon provider and the janitor that we use to do cleanup of OCCM resources | 09:14 |
mkjpryor | They will take more work to move to opendev | 09:14 |
mkjpryor | So, in summary, our current plan is for the Magnum project to own the Helm driver, the charts themselves, the addon provider and the janitor | 09:15 |
mkjpryor | Whether the Helm driver is in-tree, or is just the Cluster API driver owned by the Magnum project in a separate repo, needs discussion | 09:16 |
mkjpryor | Our preference is for an in-tree Cluster API driver | 09:16 |
jakeyip | I think we should drop the in-tree discussion this round, let's get it working out-of-tree properly first. being out-of-tree gives us good advantage to iterate without the massive change chain I was grappling with previously. | 09:16 |
jakeyip | we need to get the CI working | 09:16 |
mkjpryor | This is all true | 09:16 |
mkjpryor | Although the "change chain" this time would just be an import of the driver as-is right? We wouldn't be doing it in increments like before. | 09:17 |
mkjpryor | I think mnasiadka has made some progress with the CI | 09:18 |
jakeyip | yeah we can discuss if and how to get it in when we clear the current obstacles | 09:19 |
jakeyip | on the same topic, I had some issues running it, so I was wondering if it's being used in the current iteration anyway | 09:19 |
mkjpryor | We should have somewhere to raise and discuss issues that isn't this meeting | 09:20 |
jakeyip | my issue is a version mismatch between autoscaler and kubernetes. I can fix it, but without CI it's not useful. | 09:20 |
jakeyip | so CI first | 09:20 |
mkjpryor | We have seen basically no issues with mismatched autoscaler and kubernetes versions. There may be something more sinister going on. | 09:21 |
jakeyip | mkjpryor: bugs will be opened in Launchpad as per governance I think? | 09:21 |
mkjpryor | Yes | 09:21 |
mkjpryor | But in terms of chatting about general usage? Maybe bugs in launchpad that don't actually turn out to be bugs are fine | 09:22 |
jakeyip | we can chat here or in the Kubernetes Slack. | 09:22 |
dalees | mkjpryor: there is a channel #openstack-magnum on Kubernetes Slack. It's not very active, but that is a good persistent chat location perhaps. | 09:24 |
mkjpryor | Sounds good | 09:24 |
mkjpryor | Slack works for me | 09:24 |
jakeyip | it's early days, not sure where the majority of the chat will end up. both sides have their advantages. feel free to DM me on Slack if you need me, the notification is a bit better there :) | 09:25 |
jakeyip | if you can stay after the meeting, it'll be great so I can post more details about the autoscaler issue | 09:26 |
jakeyip | mkjpryor ^ | 09:26 |
mkjpryor | I can do that | 09:27 |
jakeyip | thanks | 09:27 |
jakeyip | mkjpryor: for your other points, the helm chart already has a repo `openstack/magnum-capi-helm-charts`, but is currently empty | 09:28 |
jakeyip | it's set up for the same governance as `openstack/magnum-capi-helm` | 09:29 |
mkjpryor | Of all the projects, that is the one that is most heavily reliant on GitHub Actions | 09:29 |
mkjpryor | The CI will be difficult to port | 09:29 |
mkjpryor | And will take time, which we do not currently have | 09:29 |
jakeyip | can you elaborate on the difficulties? maybe others can help | 09:30 |
mkjpryor | Just we rely on a lot of actions to do things that would need to be replicated in another way | 09:31 |
jakeyip | ok, anything that's technically not possible in zuul? | 09:32 |
mkjpryor | I don't think so | 09:32 |
mkjpryor | We also use images built for Azimuth in the CI, which might be politically not good :shrugs: | 09:33 |
mkjpryor | I'm not sure whether the Magnum project wants to build and ship images | 09:33 |
mkjpryor | Probably not | 09:33 |
jakeyip | ok, someone has to push that... StackHPC is probably best suited to do it... | 09:34 |
mkjpryor | We are, but we are also small and busy | 09:34 |
jakeyip | don't have to replicate ALL the CI too, a small subset is fine | 09:34 |
mkjpryor | This is the issue with open-source that isn't fully funded by a customer right | 09:34 |
jakeyip | yeah same deal for most of us here... it'll be worse if we can't work together right? :D | 09:35 |
jakeyip | if we can get it working, we can attract more people to help with maintenance. the initial hump is the hard part: getting it far enough along for others to see it as viable. | 09:37 |
mkjpryor | Of course, but there is a significant chunk of initial effort that is required from us | 09:37 |
jakeyip | I'm struggling with this being the PTL. | 09:37 |
mkjpryor | We have to offset this effort against customer work as well. It is tricky. | 09:38 |
mkjpryor | Also, we already have a set of things that work for us so there is inertia there as well. | 09:38 |
mkjpryor | But we do want to get there | 09:39 |
jakeyip | yeah can understand. but this is critical to get us from the OpenStack world to the Kubernetes world. That's why I'm still doing this... | 09:39 |
jakeyip | anyway for the helm chart, I think we need it in governance and CI so that we can strip it down as a minimal product that can be extended by operators. that's what I see is good about this | 09:42 |
mkjpryor | I agree. That is definitely a strength of the approach for sue. | 09:43 |
mkjpryor | sure* | 09:43 |
jakeyip | FYI it's a point of contention with TC if the driver references the helm chart managed in stackhpc github, so I don't think we have much room on this. | 09:43 |
jakeyip | this driver sure got many people's attention now :) | 09:44 |
mkjpryor | So we talked about setting up a GitHub org for Azimuth the project, because we also want that to exist independently of StackHPC, and the charts could live there | 09:44 |
mkjpryor | There is definitely precedent for using components from other open-source projects | 09:44 |
mkjpryor | So maybe that is more palatable as an interim solution | 09:45 |
mkjpryor | After all, it is just an external component that can be easily replaced | 09:45 |
mkjpryor | I do understand that people have an issue with the default coming from a specific vendor | 09:46 |
jakeyip | I don't understand this approach, is it so that the GH actions can be ported more easily from SHPC to the Azimuth org? and is it harder to port them to Zuul? | 09:46 |
mkjpryor | Well - we want somewhere for Azimuth to live that is independent of StackHPC. No offence to Gerrit, but I wouldn't choose to use it unless I really had to. | 09:46 |
mkjpryor | So the new GitHub org is primarily for that | 09:47 |
mkjpryor | But it would have the side-effect that the CAPI Helm charts would now be an Azimuth sub-project rather than a "StackHPC product" (which we don't consider them to be now, but others clearly do) | 09:47 |
mkjpryor | And yes - the CI would "just work" (modulo setting up some CI vars) in the new org | 09:48 |
mkjpryor | So it is easier than porting to Zuul in that sense | 09:48 |
jakeyip | It's a departure from what was originally proposed, I'm not sure if it will be agreeable to the many parties | 09:50 |
mkjpryor | Just a thought on an alternative way to get to a palatable solution for the short term that doesn't rely on a large time investment from us that we might not be able to commit to | 09:50 |
mkjpryor | Long term, we definitely want the charts under Magnum I think | 09:51 |
mkjpryor | If only we could get a customer to fund "moving the CAPI Helm charts under the governance of OpenInfra" | 09:51 |
mkjpryor | :chuckles: | 09:52 |
mkjpryor | Honestly, the time and availability of the right people to do the work is the major blocker here | 09:52 |
mkjpryor | Especially when something exists that works nicely | 09:53 |
mkjpryor | Anyway - I think we are saying the same thing. We want the charts under Magnum governance. | 09:53 |
mkjpryor | But tensioning the time required to do that against the work that pays the bills is difficult. | 09:54 |
jakeyip | I'm looking at the actions now https://github.com/stackhpc/capi-helm-charts/actions , a bunch of them don't have any runs, is there like a minimal thing we need to get it working? keep in mind we don't need ALL the CI | 09:54 |
jakeyip | I think a big issue is StackHPC shifting to use the charts under Magnum governance but that doesn't have to happen | 09:54 |
mkjpryor | So at the moment, because they use our test infra, all the runs require manual approval which is only given after the changes have been reviewed | 09:55 |
mkjpryor | So there are a lot of runs that stay in the pending state and never get executed, yes | 09:56 |
mkjpryor | That would obviously change with Zuul | 09:56 |
jakeyip | I'm seeing like https://github.com/stackhpc/capi-helm-charts/actions/workflows/ensure-capi-images.yaml doesn't have any run | 09:57 |
jakeyip | https://github.com/stackhpc/capi-helm-charts/actions/workflows/lint.yaml | 09:57 |
mkjpryor | Oh | 09:57 |
mkjpryor | A lot of those workflows are reusable workflows that are called from other workflows | 09:57 |
mkjpryor | Not run directly | 09:57 |
mkjpryor | For example, those two you mention are called from main.yaml and pr.yaml | 09:58 |
jakeyip | I see. | 09:58 |
mkjpryor | This is what I mean by it isn't going to be trivial to port | 09:58 |
opendevreview | Merged openstack/magnum master: CI: Use Calico v3.26.4 https://review.opendev.org/c/openstack/magnum/+/911577 | 09:59 |
jakeyip | should I start from main.yaml to get an idea what needs to be ported? | 10:00 |
mkjpryor | Yeah | 10:01 |
mkjpryor | main.yaml runs the most minimal set of tests that we have | 10:01 |
jakeyip | ok I'll take a look | 10:02 |
mkjpryor | Another thing we do is mirror all the required images to https://quay.io/organization/azimuth, and use that as a registry mirror for each of the major registries | 10:02 |
jakeyip | I have a question - how do we get the chart built and hosted on opendev? | 10:02 |
mkjpryor | mnasiadka and I spoke about this before | 10:02 |
mkjpryor | Probably the easiest place to host the chart would be on Artifact Hub | 10:03 |
mkjpryor | https://artifacthub.io/ | 10:03 |
mkjpryor | So I guess there would be a Zuul job that ran the Helm packaging and pushed the resulting chart to there | 10:03 |
mkjpryor | But you don't need to package the chart to test it | 10:04 |
mkjpryor | Probably getting the CI working is more pressing | 10:04 |
jakeyip | yeah | 10:05 |
jakeyip | ok I need to end meeting cos overrun | 10:06 |
jakeyip | dalees / mnasiadka / mkjpryor: anything else we need to capture for meeting? | 10:06 |
jakeyip | ok I'll end it | 10:09 |
jakeyip | #endmeeting | 10:09 |
opendevmeet | Meeting ended Wed Apr 3 10:09:14 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 10:09 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/magnum/2024/magnum.2024-04-03-09.01.html | 10:09 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/magnum/2024/magnum.2024-04-03-09.01.txt | 10:09 |
opendevmeet | Log: https://meetings.opendev.org/meetings/magnum/2024/magnum.2024-04-03-09.01.log.html | 10:09 |
jakeyip | mkjpryor: so my issue with autoscaler was registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0 failing to start on a v1.28.1 cluster | 10:10 |
jakeyip | you tried this combination before? | 10:11 |
mkjpryor | So | 10:11 |
mkjpryor | This isn't actually a version mismatch | 10:11 |
mkjpryor | Just a permission missing from our clusterrole | 10:11 |
mkjpryor | https://github.com/stackhpc/capi-helm-charts/pull/282 | 10:11 |
mkjpryor | There will be a new release of the charts today with the fix in | 10:12 |
jakeyip | cool | 10:13 |
mkjpryor | Basically, the v1.29.0 version of the autoscaler needs permission to access an extra CAPI resource | 10:13 |
mkjpryor | Even though the OpenStack provider doesn't actually support machinepools | 10:14 |
jakeyip | ok. I was going off https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#releases | 10:14 |
jakeyip | > We recommend using Cluster Autoscaler with the Kubernetes control plane (previously referred to as master) version for which it was meant | 10:14 |
mkjpryor | They do say that | 10:14 |
mkjpryor | But in practice we haven't seen any issues with using mismatched versions | 10:15 |
jakeyip | any plan to set the CA tag to kube_tag? PR or WIP? | 10:15 |
jakeyip | yeah it was previously ok for us too, just broke. | 10:16 |
mkjpryor | The Helm driver doesn't use kube_tag, because you can't change the Kubernetes version like that | 10:16 |
mkjpryor | The Kubernetes version is tied to the image you use | 10:16 |
mkjpryor | So we use image properties | 10:16 |
jakeyip | oh yeah right hmm | 10:17 |
mkjpryor | Also, they don't release autoscaler versions for every point release, which makes doing it automatically tricky | 10:17 |
mkjpryor | i.e. if you have a 1.29.3 cluster, how do you know whether you need to use 1.29.0, 1.29.1, 1.29.2 or 1.29.3 for the autoscaler | 10:18 |
jakeyip | well on the webpage it's .X so I assume they do some sort of testing within the dot releases but not across minor versions | 10:19 |
mkjpryor | Because at the moment, the latest is 1.29.0 | 10:19 |
mkjpryor | Yeah, but there isn't a cluster-autoscaler tag of 1.29.3 even though Kubernetes 1.29.3 is out | 10:19 |
mkjpryor | Is what I mean | 10:19 |
mkjpryor | So you can't just use the Kubernetes version as the tag | 10:19 |
mkjpryor | But also, you don't just want to use autoscaler 1.29.0 in case there is a bugfix release | 10:20 |
jakeyip | yeah I understand now | 10:20 |
mkjpryor | For example, there are autoscaler tags for 1.28.{0,1,2} even though there are way more Kubernetes 1.28.X versions than that | 10:21 |
mkjpryor | And I assume that if you are running a 1.28.1 cluster, you want the 1.28.2 autoscaler as it has bugfixes :shrugs: | 10:21 |
jakeyip | I think a cloud operator might carry their helm charts with pinned versions | 10:22 |
mkjpryor | That makes sense | 10:22 |
jakeyip | how would upgrades be done? | 10:22 |
mkjpryor | At the moment, we just use 1.29.0 and it seems to work with 1.26, 1.27 and 1.28 clusters as well | 10:22 |
mkjpryor | Just change the Helm values and the autoscaler deployment would be upgraded | 10:23 |
jakeyip | from Magnum POV ? | 10:23 |
mkjpryor | So in our Magnum deployments with this driver, we are tying each Magnum template to a specific chart version and image | 10:23 |
mkjpryor | So when we roll out new templates, they have a new chart version and image | 10:24 |
jakeyip | ok makes sense | 10:24 |
mkjpryor | Which triggers an upgrade of the various components | 10:24 |
jakeyip | what k8s versions have you upgraded across? | 10:25 |
mkjpryor | In the CAPI Helm chart CI we test 1.27 -> 1.28 -> 1.29 right now | 10:25 |
jakeyip | and how do you version your cluster templates? | 10:26 |
mkjpryor | So we release new templates every month when we build new images | 10:26 |
mkjpryor | We version them with the Kubernetes point version | 10:26 |
mkjpryor | If we need to create a new template mid-cycle, then we add -1, -2 to the end I think | 10:27 |
jakeyip | I'm thinking e.g. how to upgrade autoscaler from 1.29.0 to 1.29.1 if there's a bug in 1.29.0 | 10:27 |
jakeyip | ok maybe that's what I would end up doing | 10:28 |
mkjpryor | I suppose you could version your templates as a combination of the kubernetes and chart versions | 10:28 |
jakeyip | have you tried upgrade -1 to -2? | 10:28 |
mkjpryor | That might work | 10:28 |
mkjpryor | It is just the same as any other template upgrade | 10:28 |
jakeyip | which ideally will only upgrade the helm chart | 10:28 |
mkjpryor | All it ends up doing is a Helm upgrade with the version and values derived from the template and cluster config | 10:29 |
jakeyip | one annoying thing is that magnum cluster templates can't be renamed or deleted if there are clusters using it | 10:29 |
mkjpryor | So that is also the case in Azimuth, but we allow templates to be marked as deprecated, which prevents them being used for new clusters | 10:29 |
mkjpryor | Magnum could do similar? | 10:30 |
jakeyip | I hide them but then the API gets confused if the user uses name instead of template id | 10:30 |
jakeyip | ideally I hope to have template names like 'xxxx-v1.28.6' and whenever the user uses that name it gets the newest CT fitting that name | 10:31 |
mkjpryor | I see what you mean | 10:32 |
jakeyip | thinking thru, maybe we could actually upgrade users using the old 'xxx-v1.28.6' to the newer one and delete the old CT | 10:32 |
jakeyip | since it's all helm anyway | 10:32 |
mkjpryor | Depends if you want to force an upgrade | 10:32 |
mkjpryor | Which could be disruptive | 10:33 |
jakeyip | hmm | 10:33 |
jakeyip | good point I'll solve it when I get to it :) | 10:33 |
mkjpryor | I think a lot of those decisions will be what the operator is happy with | 10:34 |
mkjpryor | Some will be happy with users running out-of-date clusters as long as they are isolated | 10:34 |
mkjpryor | Others will want to make sure their users stay up to date | 10:34 |
jakeyip | anyway if the CA 1.29.0 thing gets solved, I will be able to help you with it | 10:35 |
mkjpryor | Cool | 10:35 |
mkjpryor | There will be a chart release today | 10:35 |
jakeyip | with the capi-helm-charts CI or something along that line | 10:35 |
mkjpryor | Help porting the CAPI Helm charts to opendev would be much appreciated. | 10:35 |
jakeyip | I'm swapping Nectar dev over to use https://opendev.org/openstack/magnum-capi-helm from the in-tree patches | 10:36 |
mkjpryor | Cool | 10:36 |
mkjpryor | More users of that would also be helpful | 10:36 |
jakeyip | when that works I'll look at swapping to use https://opendev.org/openstack/magnum-capi-helm-charts | 10:36 |
jakeyip | looking at the pace of releases, one thing I'm concerned with is when to cut from StackHPC and then start to get CI working | 10:37 |
jakeyip | you have a lot of commits and the two repos will drift as soon as we cut it | 10:38 |
mkjpryor | Well we have automated updates of addon versions | 10:39 |
mkjpryor | So yeah - it will drift quite quickly | 10:39 |
mkjpryor | But in terms of actual releases, we cut them on the first Wednesday of each month | 10:39 |
mkjpryor | Unless we have bugs to fix | 10:40 |
jakeyip | ah good info | 10:40 |
jakeyip | I guess I can start testing with this month's version and maybe I can base the import on this | 10:40 |
jakeyip | (pending CI working for openstack/magnum-capi-helm) | 10:41 |
mkjpryor | For the time being, we could propose each release from stackhpc/capi-helm-charts to openstack/magnum-capi-helm-charts | 10:41 |
mkjpryor | Until we have CI working that we are happy with on the opendev side | 10:41 |
jakeyip | ok. | 10:42 |
mkjpryor | Then they hopefully don't drift too much | 10:42 |
jakeyip | wonder how downstream operators will clone and keep up | 10:42 |
jakeyip | I'll think about it | 10:44 |
jakeyip | thanks for your time today. | 10:45 |
mkjpryor | No worries | 10:45 |
jakeyip | I'll knock off, it's late for me :) | 10:46 |
mkjpryor | Any help you are able to offer would be much appreciated | 10:46 |
mkjpryor | I'd like to see the charts under Magnum governance | 10:46 |
jakeyip | sure will be happy to | 10:46 |
mkjpryor | Yeah - go get some rest! | 10:46 |
jakeyip | ok, if you can get onto the Kubernetes Slack that'll be great too, can find me in #openstack-magnum. | 10:48 |
jakeyip | thanks for coming to the meeting | 10:48 |
jakeyip | seeya! | 10:48 |
mnasiadka | jakeyip: was off today ;) | 15:16 |
*** | gmann_ is now known as gmann | 16:55 |
opendevreview | Merged openstack/magnum master: Replace abc.abstractproperty with property and abc.abstractmethod https://review.opendev.org/c/openstack/magnum/+/852010 | 23:05 |
opendevreview | Travis Holton proposed openstack/magnum-capi-helm master: add label for enabling auto scaling https://review.opendev.org/c/openstack/magnum-capi-helm/+/915031 | 23:30 |
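
The autoscaler tag question from the 10:17 to 10:21 exchange (cluster-autoscaler is not released for every Kubernetes point release, so the Kubernetes version cannot be reused directly as the image tag) could be approached roughly as below. This is only a sketch in Python, under assumptions: `parse_version`, `pick_autoscaler_tag` and the hard-coded `AVAILABLE_TAGS` list are hypothetical names, not part of Magnum or the CAPI Helm charts, and the tag list contains only the tags mentioned in the log rather than anything queried from a registry.

```python
import re


def parse_version(tag):
    """Parse 'v1.29.3' or '1.29.3' into the tuple (1, 29, 3)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def pick_autoscaler_tag(kube_version, available_tags):
    """Return the newest autoscaler tag sharing the cluster's major.minor.

    The patch number of the cluster is ignored on purpose: the newest
    patch release in the same minor series is assumed to carry bugfixes.
    Returns None when no tag exists for that minor series.
    """
    major_minor = parse_version(kube_version)[:2]
    candidates = [
        tag for tag in available_tags if parse_version(tag)[:2] == major_minor
    ]
    if not candidates:
        return None
    return max(candidates, key=parse_version)


# Hypothetical sample: only the tags mentioned in the discussion above.
AVAILABLE_TAGS = ["v1.28.0", "v1.28.1", "v1.28.2", "v1.29.0"]

print(pick_autoscaler_tag("v1.28.1", AVAILABLE_TAGS))  # v1.28.2
print(pick_autoscaler_tag("v1.29.3", AVAILABLE_TAGS))  # v1.29.0
```

Matching on major.minor follows the upstream recommendation quoted at 10:14. An operator who pins versions in their own copy of the charts, as suggested at 10:22, would not need any of this.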
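
The template naming idea from 10:26 to 10:31 (templates versioned by Kubernetes point version, with -1, -2 appended for mid-cycle rebuilds, and a name such as 'xxxx-v1.28.6' resolving to the newest matching template) might look something like the sketch below. It is purely illustrative: Magnum does not resolve names this way today, `revision_of` and `newest_template` are hypothetical helpers, and the suffix convention is an assumption based on the "-1, -2 ... I think" remark.

```python
import re


def revision_of(template_name, base_name):
    """Return the revision of template_name under base_name, or None.

    The bare base name counts as revision 0; 'base-N' counts as revision N.
    Any other name does not belong to this base and yields None.
    """
    if template_name == base_name:
        return 0
    match = re.fullmatch(re.escape(base_name) + r"-(\d+)", template_name)
    return int(match.group(1)) if match else None


def newest_template(base_name, template_names):
    """Pick the highest-revision template for base_name, or None."""
    revisions = {name: revision_of(name, base_name) for name in template_names}
    matching = [name for name, rev in revisions.items() if rev is not None]
    if not matching:
        return None
    return max(matching, key=revisions.get)


# Hypothetical template names following the convention described above.
names = [
    "xxxx-v1.28.6",
    "xxxx-v1.28.6-1",
    "xxxx-v1.28.6-2",
    "xxxx-v1.29.0",
]
print(newest_template("xxxx-v1.28.6", names))  # xxxx-v1.28.6-2
```

Resolving the name only helps new clusters; whether existing clusters get moved to the newer template is the operator decision raised at 10:32.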