08:00:09 <dalees> #startmeeting magnum
08:00:09 <opendevmeet> Meeting started Tue Jun 24 08:00:09 2025 UTC and is due to finish in 60 minutes. The chair is dalees. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:09 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:09 <opendevmeet> The meeting name has been set to 'magnum'
08:00:14 <dalees> #topic Roll Call
08:00:22 <dalees> o/
08:01:11 <jakeyip> o/
08:01:20 <sd109> o/
08:01:42 <dalees> mnasiadka: ping
08:01:58 <mnasiadka> o/ (but on a different meeting so might not be very responsive)
08:02:14 <dalees> okay
08:02:34 <dalees> Agenda has reviews and one topic, so I moved that first.
08:02:39 <dalees> #topic Upgrade procedure
08:03:04 <dalees> jakeyip: you brought this one up?
08:03:14 <jakeyip> hi that's mine. just want to know, anyone has an upgrade workflow inmine?
08:03:23 <jakeyip> in mind?
08:04:17 <dalees> for the combination of magnum, helm driver, helm charts capo, capi?
08:05:44 <jakeyip> yes.
08:06:50 <dalees> do you mean versions, or actual processes of performing them?
08:07:15 <jakeyip> we recently looked into upgrading capi/capo but realised there are dependency with magnum-capi-helm and also helm charts
08:07:58 <sd109> capi-helm-charts publishes a dependencies.json which specifies which capi version is tested against for each release of the charts: https://github.com/azimuth-cloud/capi-helm-charts/blob/affae0544b07c4b2e641b3b5bf990e561c055a91/dependencies.json
08:08:27 <dalees> yeah, we've done capi/capo but not to their latest where they dropped capo's v1alpha7. that will be tricky to make sure all clusters are moved off old helm charts.
08:10:28 <jakeyip> sd109: are there upgrade tests being run ?
08:11:40 <jakeyip> charts will provision the resource at the version specified. upgrading CTs / charts _should_ upgrade those resources too?
08:13:21 <jakeyip> capi / capo upgrades also upgrade the resources too I believe, but ideally they should be upgraded by charts first?
08:14:28 <dalees> jakeyip: so the way these versions work is one is the 'stored' version (usually latest) and the k8s api can translate between that and any other served version (wheel-spoke in k8s docs).
08:15:18 <dalees> so once you upgrade capo, it'll write new CRDs for the new versions to k8s, and start storing and serving the new versions (eg v1beta1). It doesn't matter if the charts write in the old or new crd version as long as it's still served.
08:15:56 <dalees> so you need to care when they stop serving, but otherwise can keep using either old or new crd versions.
08:16:27 <sd109> There are upgrade tests being run but since we haven't upgrade CAPO past v0.10 (due to some security group changes in v0.11 and the need for the new ORC installation in v0.12) yet, we haven't actually tested v0.12 upgrade that drops v1alpha7 for example
08:16:43 <dalees> kubebuilder has some good info on this if you want to read more: https://book.kubebuilder.io/multiversion-tutorial/conversion-concepts.html
08:16:45 <jakeyip> in this case, the chart will be using an older version, what happens when you do a helm upgrade?
08:17:36 <dalees> sd109: ah, that's helpful to know you haven't gotten there either!
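A minimal sketch of how the served/stored state dalees describes can be inspected programmatically, assuming the official `kubernetes` Python client and a kubeconfig for the management cluster; it mirrors the `kubectl get crd ... -o yaml` check suggested later in the meeting, and only the CRD name is taken from the discussion, everything else is illustrative:

    # Inspect which API versions of the CAPO OpenStackCluster CRD are served
    # and which one is used for storage in etcd.
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    api = client.ApiextensionsV1Api()

    crd = api.read_custom_resource_definition(
        "openstackclusters.infrastructure.cluster.x-k8s.io")

    for version in crd.spec.versions:
        # `served` means the API server still accepts this version;
        # `storage` marks the single version written to etcd.
        print(f"{version.name}: served={version.served} storage={version.storage}")

    # Versions that still have objects persisted in etcd; these must be
    # migrated before a version can be dropped from the CRD.
    print("storedVersions:", crd.status.stored_versions)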
08:18:02 <sd109> When you do a helm upgrade, it will upgrade the resources to new v1beta1 thanks to https://github.com/azimuth-cloud/capi-helm-charts/pull/423
08:19:16 <jakeyip> no I mean a `helm upgrade` changing values but not chart, which happens when you resize etc
08:20:59 <dalees> if helm tries to talk to capo with an old crd version that isn't served it'll fail.
08:21:24 <dalees> ie. new capo (after v1alpha7 no longer served), and old chart that specifies v1alpha7.
08:21:29 <jakeyip> in the case where it is still served but not latest
08:21:47 <dalees> if it's served, it'll just translate to v1beta1 and store as that.
08:22:06 <jakeyip> ok
08:22:28 <dalees> that's the wheel-spoke model the kubebuilder docs talk about. (I *think* it changes the stored version in etcd on first write after the controller upgrade)
08:23:16 <jakeyip> I'll read that
08:23:49 <sd109> So I guess we need to make sure all user clusters are using a new enough version of capi-helm-charts to be on v1beta1 before we upgrade CAPO on the management cluster
08:24:19 <jakeyip> yeah and also driver needs to be upgraded to talk v1beta1 first
08:26:07 <jakeyip> I _think_ the sequence is something like - driver -> charts -> cluster templates -> all clusters -> capi+capo ?
08:27:07 <dalees> yeah, that sounds right - for the version of capo that drops v1alpha7 (was it v0.10?).
08:27:21 <sd109> I think the only place which needs updated in the driver is here: https://opendev.org/openstack/magnum-capi-helm/src/commit/60dc96c4dae8628e92c20b1ca594c4cf10eba5e4/magnum_capi_helm/kubernetes.py#L289
08:27:56 <dalees> you do need capo at least up to a version that supports v1beta1 first (but that will be most of the installs)
08:28:02 <sd109> And as far as I can tell that actually only affects the health_status of the cluster becuase it gets a 404 when trying to fetch the v1alpha7 version of the openstackcluster object from the management cluster
08:28:31 <sd109> It was v0.12 that dropped v1alpha7 in CAPO
08:28:46 <dalees> ah v0.12, thanks.
08:29:41 <jakeyip> I think there were more issues than that but I can't remember now
08:30:21 <dalees> jakeyip: you had a patchset for that - did it merge?
08:30:44 <jakeyip> for magnum?
08:30:55 <jakeyip> sorry for the driver? I didn't merge it yet I think
08:31:18 <dalees> ah this one - https://review.opendev.org/c/openstack/magnum-capi-helm/+/950806
08:31:36 <dalees> sd109: would you have a look at that soon?
08:31:38 <sd109> Can we move that to v1beta1 now instead of v1alpha7?
08:32:07 <dalees> yeah, I'd prefer that
08:32:24 <jakeyip> can we / should we increment more than two?
08:32:42 <dalees> well, it blocks upgrade to capo 0.12 if we don't
08:33:02 <jakeyip> does capo serve more than 2?
08:34:29 <dalees> It's probably worth noting the version restrictions, but yes they do.
08:35:03 <dalees> (i need to look up these particular versions of capo)
08:35:04 <jakeyip> ok I really need to try it out first then report back
08:35:42 <jakeyip> happy to skip ahead while I take a look at capo, then come back later
08:35:53 <sd109> There's some CAPO docs in API versions here which suggest CAPO does some kind of automatic migration to new API versions for us? https://cluster-api-openstack.sigs.k8s.io/topics/crd-changes/v1alpha7-to-v1beta1#migration
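As a hedged illustration of the API-version bump discussed above (the kubernetes.py line sd109 linked), this is the kind of request the driver has to make against the management cluster; it is not the magnum-capi-helm code itself, just the equivalent call with the official `kubernetes` Python client, and the namespace and cluster name are made up:

    # Fetch an OpenStackCluster object at a given API version; once CAPO
    # v0.12 stops serving v1alpha7, requests for that version return 404,
    # while v1beta1 is converted from whatever version is stored.
    from kubernetes import client, config

    config.load_kube_config()
    custom = client.CustomObjectsApi()

    openstack_cluster = custom.get_namespaced_custom_object(
        group="infrastructure.cluster.x-k8s.io",
        version="v1beta1",              # was "v1alpha7" before the bump
        namespace="magnum-example-ns",  # illustrative
        plural="openstackclusters",
        name="example-cluster",         # illustrative
    )
    # The API server returns the object converted to the requested version.
    print(openstack_cluster["apiVersion"])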
08:36:12 <sd109> I also need to go away and have a closer look so happy to move on for now
08:36:16 <dalees> if you would like a whole lot of yaml to read through, all the info about your capo version is found in: `kubectl get crd openstackclusters.infrastructure.cluster.x-k8s.io -o yaml`
08:36:42 <dalees> look for "served: true", "stored: true" and "storedVersions"
08:37:30 <dalees> okay, we shall move on. thanks for sharing what we each know!
08:37:51 <dalees> #topic Review: Autoscaling min/max defaults
08:38:05 <dalees> link https://review.opendev.org/c/openstack/magnum-capi-helm/+/952061
08:38:28 <dalees> so this is from a customer request to be able to change min and max autoscaling values
08:39:07 <dalees> i think we touched on this last meeting, but needed more thinking time.
08:39:15 <dalees> perhaps it's the same again
08:40:22 <sd109> Yeah sorry I haven't had time to look at that one yet, I'm hoping to get to it this week and will leave any comments I have on the patch itself
08:40:35 <jakeyip> hm I thought I reviewed that but seems like I didn't vote
08:41:26 <dalees> thanks both, we can move on unless there are things to discuss. please leave review notes in there when you get to it.
08:41:28 <jakeyip> I will give it a go again
08:42:28 <jakeyip> oh it's in draft :P
08:42:59 <dalees> #topic Review: Poll more Clusters for health status updates
08:43:10 <dalees> https://review.opendev.org/c/openstack/magnum/+/948681
08:43:29 <jakeyip> I will review this
08:43:34 <dalees> so this one I understand will add polling load to conductor for a large number of clusters
08:44:32 <dalees> we've been running it for ages, and it enables a few things I'm doing in later patchsets in the helm driver: better health_status, and pulling back node_count from autoscaler.
08:44:44 <dalees> it would be better to use watches for this than polling though.
08:47:46 <jakeyip> hm the current situation is syncing _COMPLETE, but this adds more?
08:49:16 <sd109> Don't think I have anything to add on this one, seems like a nice addition to me but agree that the extra load is worth thinking about
08:49:21 <dalees> ah, that's true - it already is polling CREATE_COMPLETE and UPDATE_COMPLETE. So really that would be most clusters.
08:51:44 <jakeyip> yeah I didn't understand you comments "Without the _COMPLETE I also wonder if..." .
08:53:16 <dalees> huh. I think I had confused myself on what was being added.
08:53:36 <dalees> agree, _COMPLETE is already there.
08:54:10 <jakeyip> I think what you want is adding CREATE_IN_PROGRESS to surface the errors where a cluster gets stuck midway thru creation with things like autoscaler pod errors?
08:54:45 <jakeyip> maybe need to clarify the use case in the commit message, then good to go
08:55:09 <dalees> yeah I think so. thanks
08:56:15 <dalees> there are 3 more reviews noted in agenda but only 5 min. Any particular of those or others we might talk about?
08:56:24 <dalees> #topic Open Discussion
08:56:39 <dalees> or other topics, for the last part of the meeting
08:57:00 <jakeyip> I will look at them, maybe discuss next week
08:57:05 <jakeyip> next meeting :P
08:58:21 <sd109> Yeah I haven't had time to look at the two Helm reviews either so I don't think we need to discuss them now
08:59:22 <dalees> all good, those helm ones are from stackhpc. Johns update to his one looks good and I want to progress Stigs one sometime as it's hurting us occasionally, but it can wait.
09:00:18 <sd109> Great, thanks. I'm also trying to get someone from our side to progress Stig's one too but proving difficult to find the time at the moment
09:01:04 <dalees> yep, i think it's promising. probably just needs to move more things (like delete) to the same reconciliation loop to avoid the conflicts.
09:01:11 <dalees> but yeah, time!
09:01:27 <dalees> #endmeeting