08:00:09 <dalees> #startmeeting magnum
08:00:09 <opendevmeet> Meeting started Tue Jun 24 08:00:09 2025 UTC and is due to finish in 60 minutes.  The chair is dalees. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:09 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:09 <opendevmeet> The meeting name has been set to 'magnum'
08:00:14 <dalees> #topic Roll Call
08:00:22 <dalees> o/
08:01:11 <jakeyip> o/
08:01:20 <sd109> o/
08:01:42 <dalees> mnasiadka: ping
08:01:58 <mnasiadka> o/ (but on a different meeting so might not be very responsive)
08:02:14 <dalees> okay
08:02:34 <dalees> Agenda has reviews and one topic, so I moved that first.
08:02:39 <dalees> #topic Upgrade procedure
08:03:04 <dalees> jakeyip: you brought this one up?
08:03:14 <jakeyip> hi that's mine. just want to know, does anyone have an upgrade workflow in mind?
08:04:17 <dalees> for the combination of magnum, helm driver, helm charts capo, capi?
08:05:44 <jakeyip> yes.
08:06:50 <dalees> do you mean versions, or actual processes of performing them?
08:07:15 <jakeyip> we recently looked into upgrading capi/capo but realised there are dependencies on magnum-capi-helm and also the helm charts
08:07:58 <sd109> capi-helm-charts publishes a dependencies.json which specifies which capi version each release of the charts is tested against: https://github.com/azimuth-cloud/capi-helm-charts/blob/affae0544b07c4b2e641b3b5bf990e561c055a91/dependencies.json
08:08:27 <dalees> yeah, we've done capi/capo upgrades, but not to their latest where capo dropped v1alpha7. it will be tricky to make sure all clusters are moved off the old helm charts.
08:10:28 <jakeyip> sd109: are there upgrade tests being run?
08:11:40 <jakeyip> charts will provision the resources at the version specified. upgrading CTs (cluster templates) / charts _should_ upgrade those resources too?
08:13:21 <jakeyip> capi / capo upgrades also upgrade the resources too I believe, but ideally they should be upgraded by charts first?
08:14:28 <dalees> jakeyip: so the way these versions work is that one is the 'stored' version (usually the latest) and the k8s api can translate between that and any other served version (the hub-and-spoke model in the kubebuilder docs).
08:15:18 <dalees> so once you upgrade capo, it'll update the CRDs with the new versions in k8s, and start storing and serving them (eg v1beta1). It doesn't matter whether the charts write the old or new crd version, as long as that version is still served.
08:15:56 <dalees> so you only need to care when a version stops being served; otherwise you can keep using either the old or new crd versions.
08:16:27 <sd109> There are upgrade tests being run, but since we haven't upgraded CAPO past v0.10 (due to some security group changes in v0.11 and the need for the new ORC installation in v0.12) yet, we haven't actually tested the v0.12 upgrade that drops v1alpha7, for example
08:16:43 <dalees> kubebuilder has some good info on this if you want to read more: https://book.kubebuilder.io/multiversion-tutorial/conversion-concepts.html
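(Illustrative sketch of the hub-and-spoke versioning described above: a CRD can serve several API versions while persisting only one "storage" version, and the API server converts between them on read/write. Simplified fragment, not the real OpenStackCluster CRD.)

    # Simplified CRD fragment showing multiple served versions and one storage version.
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: openstackclusters.infrastructure.cluster.x-k8s.io
    spec:
      versions:
      - name: v1alpha7
        served: true     # still accepted from clients (e.g. older helm chart releases)
        storage: false
      - name: v1beta1
        served: true
        storage: true    # the "hub" version actually written to etcd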
08:16:45 <jakeyip> in this case, the chart will be using an older version, what happens when you do a helm upgrade?
08:17:36 <dalees> sd109: ah, that's helpful to know you haven't gotten there either!
08:18:02 <sd109> When you do a helm upgrade, it will upgrade the resources to new v1beta1 thanks to https://github.com/azimuth-cloud/capi-helm-charts/pull/423
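(Illustrative only: the kind of apiVersion bump that chart change implies for the rendered CAPO resources; the manifest below is a hypothetical minimal example, not taken from the charts.)

    # Older chart releases rendered resources at the previous API version:
    #   apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
    # Newer releases write the same objects at v1beta1:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: OpenStackCluster
    metadata:
      name: example-cluster   # hypothetical name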
08:19:16 <jakeyip> no, I mean a `helm upgrade` that changes values but not the chart, which happens when you resize etc
08:20:59 <dalees> if helm tries to talk to capo with an old crd version that isn't served it'll fail.
08:21:24 <dalees> ie. new capo (after v1alpha7 no longer served), and old chart that specifies v1alpha7.
08:21:29 <jakeyip> in the case where it is still served but not latest
08:21:47 <dalees> if it's served, it'll just translate to v1beta1 and store as that.
08:22:06 <jakeyip> ok
08:22:28 <dalees> that's the hub-and-spoke model the kubebuilder docs talk about. (I *think* the stored version in etcd changes on the first write after the controller upgrade)
08:23:16 <jakeyip> I'll read that
08:23:49 <sd109> So I guess we need to make sure all user clusters are using a new enough version of capi-helm-charts to be on v1beta1 before we upgrade CAPO on the management cluster
08:24:19 <jakeyip> yeah and also driver needs to be upgraded to talk v1beta1 first
08:26:07 <jakeyip> I _think_ the sequence is something like - driver -> charts -> cluster templates -> all clusters -> capi+capo ?
08:27:07 <dalees> yeah, that sounds right - for the version of capo that drops v1alpha7 (was it v0.10?).
08:27:21 <sd109> I think the only place which needs updating in the driver is here: https://opendev.org/openstack/magnum-capi-helm/src/commit/60dc96c4dae8628e92c20b1ca594c4cf10eba5e4/magnum_capi_helm/kubernetes.py#L289
08:27:56 <dalees> you do need capo to already be at a version that supports v1beta1 first (but that will be most of the installs)
08:28:02 <sd109> And as far as I can tell that actually only affects the health_status of the cluster because it gets a 404 when trying to fetch the v1alpha7 version of the openstackcluster object from the management cluster
08:28:31 <sd109> It was v0.12 that dropped v1alpha7 in CAPO
08:28:46 <dalees> ah v0.12, thanks.
08:29:41 <jakeyip> I think there were more issues than that but I can't remember now
08:30:21 <dalees> jakeyip: you had a patchset for that - did it merge?
08:30:44 <jakeyip> for magnum?
08:30:55 <jakeyip> sorry for the driver? I didn't merge it  yet I think
08:31:18 <dalees> ah this one - https://review.opendev.org/c/openstack/magnum-capi-helm/+/950806
08:31:36 <dalees> sd109: would you have a look at that soon?
08:31:38 <sd109> Can we move that to v1beta1 now instead of v1alpha7?
08:32:07 <dalees> yeah, I'd prefer that
08:32:24 <jakeyip> can we / should we increment more than two?
08:32:42 <dalees> well, it blocks upgrade to capo 0.12 if we don't
08:33:02 <jakeyip> does capo serve more than 2?
08:34:29 <dalees> It's probably worth noting the version restrictions, but yes they do.
08:35:03 <dalees> (i need to look up these particular versions of capo)
08:35:04 <jakeyip> ok I really need to try it out first then report back
08:35:42 <jakeyip> happy to skip ahead while I take a look at capo, then come back later
08:35:53 <sd109> There are some CAPO docs on API versions here which suggest CAPO does some kind of automatic migration to new API versions for us? https://cluster-api-openstack.sigs.k8s.io/topics/crd-changes/v1alpha7-to-v1beta1#migration
08:36:12 <sd109> I also need to go away and have a closer look so happy to move on for now
08:36:16 <dalees> if you would like a whole lot of yaml to read through, all the info about your capo version is found in: `kubectl get crd openstackclusters.infrastructure.cluster.x-k8s.io -o yaml`
08:36:42 <dalees> look for "served: true", "storage: true" and "storedVersions"
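(Roughly what the relevant parts of that output look like; trimmed and illustrative rather than copied from a real deployment.)

    # kubectl get crd openstackclusters.infrastructure.cluster.x-k8s.io -o yaml (trimmed)
    spec:
      versions:
      - name: v1alpha7
        served: true      # clients can still read/write this version
        storage: false
      - name: v1beta1
        served: true
        storage: true     # new/updated objects are persisted at this version
    status:
      storedVersions:     # versions that objects may still be stored at in etcd;
      - v1alpha7          # a version listed here can't simply be dropped from the CRD
      - v1beta1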
08:37:30 <dalees> okay, we shall move on. thanks for sharing what we each know!
08:37:51 <dalees> #topic Review: Autoscaling min/max defaults
08:38:05 <dalees> link https://review.opendev.org/c/openstack/magnum-capi-helm/+/952061
08:38:28 <dalees> so this is from a customer request to be able to change min and max autoscaling values
08:39:07 <dalees> i think we touched on this last meeting, but needed more thinking time.
08:39:15 <dalees> perhaps it's the same again
08:40:22 <sd109> Yeah sorry I haven't had time to look at that one yet, I'm hoping to get to it this week and will leave any comments I have on the patch itself
08:40:35 <jakeyip> hm I thought I reviewed that but seems like I didn't vote
08:41:26 <dalees> thanks both, we can move on unless there are things to discuss.  please leave review notes in there when you get to it.
08:41:28 <jakeyip> I will give it a go again
08:42:28 <jakeyip> oh it's in draft :P
08:42:59 <dalees> #topic Review: Poll more Clusters for health status updates
08:43:10 <dalees> https://review.opendev.org/c/openstack/magnum/+/948681
08:43:29 <jakeyip> I will review this
08:43:34 <dalees> so this one I understand will add polling load to conductor for a large number of clusters
08:44:32 <dalees> we've been running it for ages, and it enables a few things I'm doing in later patchsets in the helm driver: better health_status, and pulling back node_count from autoscaler.
08:44:44 <dalees> it would be better to use watches for this than polling though.
08:47:46 <jakeyip> hm the current situation is syncing _COMPLETE, but this adds more?
08:49:16 <sd109> Don't think I have anything to add on this one, seems like a nice addition to me but agree that the extra load is worth thinking about
08:49:21 <dalees> ah, that's true - it already is polling CREATE_COMPLETE and UPDATE_COMPLETE. So really that would be most clusters.
08:51:44 <jakeyip> yeah, I didn't understand your comment "Without the _COMPLETE I also wonder if...".
08:53:16 <dalees> huh. I think I had confused myself on what was being added.
08:53:36 <dalees> agree, _COMPLETE is already there.
08:54:10 <jakeyip> I think what you want is adding CREATE_IN_PROGRESS to surface the errors where a cluster gets stuck midway thru creation with things like autoscaler pod errors?
08:54:45 <jakeyip> maybe need to clarify the use case in the commit message, then good to go
08:55:09 <dalees> yeah I think so. thanks
08:56:15 <dalees> there are 3 more reviews noted in the agenda but only 5 min left. Any of those in particular, or others, we might talk about?
08:56:24 <dalees> #topic Open Discussion
08:56:39 <dalees> or other topics, for the last part of the meeting
08:57:00 <jakeyip> I will look at them, maybe discuss next week
08:57:05 <jakeyip> next meeting :P
08:58:21 <sd109> Yeah I haven't had time to look at the two Helm reviews either so I don't think we need to discuss them now
08:59:22 <dalees> all good, those helm ones are from stackhpc. John's update to his one looks good and I want to progress Stig's one sometime as it's hurting us occasionally, but it can wait.
09:00:18 <sd109> Great, thanks. I'm also trying to get someone from our side to progress Stig's one too but proving difficult to find the time at the moment
09:01:04 <dalees> yep, i think it's promising. probably just needs to move more things (like delete) to the same reconciliation loop to avoid the conflicts.
09:01:11 <dalees> but yeah, time!
09:01:27 <dalees> #endmeeting