08:00:41 <dalees> #startmeeting magnum 08:00:41 <opendevmeet> Meeting started Tue Jun 10 08:00:41 2025 UTC and is due to finish in 60 minutes. The chair is dalees. Information about MeetBot at http://wiki.debian.org/MeetBot. 08:00:41 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 08:00:41 <opendevmeet> The meeting name has been set to 'magnum' 08:00:46 <dalees> #topic Roll Call 08:00:55 <dalees> Hi all o/ 08:01:43 <sd109> Hi Dale 08:03:43 <dalees> just waiting for any others, I'm sure jakeyip is around 08:04:02 <jakeyip> o/ 08:05:53 <dalees> #topic Reviews 08:06:18 <dalees> alright let's get started; a few reviews to discuss in the agenda. Please add any others 08:06:36 <dalees> the first is `Pin OpenStack release version in Devstack script`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/940249 08:08:17 <sd109> I added this one, just wanted to try and get it over the line because myself and all of my StackHPC colleagues have been cherry-picking that patch into every devstack environment so that we can test and develop against specific OpenStack versions rather than being at the whims of (sometimes broken) master branch 08:09:40 <dalees> ok - it appears designed to have no functional changes unless you hardcode a different `OPENSTACK_VERSION` at the top of that file? 08:09:47 <sd109> Yes exactly 08:10:08 <jakeyip> I am fine with that, it's a contrib 08:10:17 <jakeyip> I do the same for my personal devstack anyway 08:10:41 <jakeyip> I was waiting for mnasiadka to chime in as he has better CI knowledge 08:11:00 <mnasiadka> sorry, having another call 08:11:09 <dalees> Sure, no objections to that merging. You might consider having that value at the time default so it could be provided without editing the shell script? 08:11:24 <mnasiadka> Basically I'm going to rewrite a lot of that CI in coming weeks, so fine - update that file, I'm going to remove it soon anyways in favour of good docs 08:11:41 <mnasiadka> :-) 08:12:16 <dalees> we appreciate you, mnasiadka 08:13:20 <mnasiadka> But yes, we should end up with magnum-capi-helm CI jobs soon - so we can work on deleting the Heat driver and some other things 08:13:45 <jakeyip> that will be awesome 08:13:47 <sd109> Okay sounds good, I wasn't aware that you were planning to remove that file when fixing up the CI 08:14:00 <dalees> I've left a comment, but I also don't have much of an opinion; it can merge as is. 08:14:44 <dalees> Shall we move on to discuss the next one? 08:15:03 <jakeyip> ok 08:15:34 <dalees> next is `Allow Magnum operators to provide additional Helm values to all clusters`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/951966 08:16:04 <dalees> I had a read of this one today, and it looks interesting. It might save us some modifications we're making to helm chart values files. 08:16:51 <jakeyip> yeah I agree this is an interesting idea 08:16:53 <sd109> Yeah that was our hope, we have a couple of customers asking for a feature like this and just wanted to make sure that you guys were onboard with adding something like this before we invest much effort into it 08:17:03 <dalees> however, it does apply to every helm chart used. So it's more limiting than publishing a helm chart per k8s version (as we do). It leads you back to global config that can't be changed without affecting old clusters. 08:17:52 <dalees> I dont mind adding it, but I don't think I'll use it for much, after considering this. I want certainty that old clusters on old charts don't accidentally change if we change some config (only on upgrade). 08:18:51 <sd109> Yeah that's a good point, we'll need to think about how we can protect against that e.g. if operator changes the default CNI used in the Helm values that would cause trouble for existing clusters on upgrade I think 08:18:54 <dalees> i can see it being useful for someone with several installations of Magnum and different configs across them, so they don't have to fork and publish their own helm charts. 08:19:44 <sd109> Out of interest, what's your reasoning for using a different chart per k8s version? What kind of things do you change in the charts between versions? 08:20:25 <jakeyip> good point. it'll be a pain maintaining this file, it almost encourages a bad practice? 08:20:58 <dalees> eg. chart version x is used for k8s 1.29. Then we rebase onto chart version y and use it for k8s 1.30. This way we never have any upgrades for a customer who sticks with k8s 1.29 (and may upgrade to the next patch version 1.29.x) 08:21:36 <dalees> and we can make big changes in the chart (pulling in all stackhpc's version bumps) with minor k8s versions. but customers have certainty and stability within patch upgrades. 08:22:23 <dalees> jakeyip: yeah, it's a trap to put any tag versions in there. Only global static config would work. OCI mirror overrides might be a good candidate. 08:23:15 <sd109> Perhaps it would make more sense to allow the value overrides as a field on the cluster template instead? 08:24:01 <jakeyip> without this file, each magnum chart is independent. with this file, you have some values that you need to maintain across all charts used i+n 08:24:27 <sd109> e.g. a something like a `config_overrides` label who's value is a JSON formatted string which the driver merges into the Helm values? 08:24:31 <dalees> isn't that what we use labels for currently? (there are always some config not implemented yet, of course!) 08:25:07 <sd109> Yeah agreed but I was hoping to avoid having to add a label for every single configurable value in the Helm charts if possible 08:25:12 <jakeyip> ... used in all templates. taking into account new creates, upgrades... 08:25:33 <dalees> sd109: what is the main use case of values you'd put in there? 08:25:34 <jakeyip> sd109: I thought about that before, I actually wrote some code for it... 08:26:46 <jakeyip> I feel the json formatted string for setting any values can be quite dangerous though 08:26:48 <dalees> yeah, a rethink of labels given their immutability woud;nt' be the worst idea. This might be another driver for that (or path to deprecating them). 08:27:53 <sd109> We have customers who want to do things like installing qemu-guest agent in all cluster VMs which they could do via the kubeadm config exposed in the helm chart values, or things like setting annotations on the CAPI Machine objects to have more fine grained control over autoscaler behaviour 08:28:30 <sd109> But I agree with Jake that setting values via JSON string could also have it's downsides 08:29:29 <dalees> ah, i see. and these would be okay in your customer case because it's their installation and maybe it does want to be applied for every cluster, including existing ones. 08:30:16 <jakeyip> sd109: what is a 'customer'? is it an op or a end user? I can image for operator, having an ability to override any values without touching `magnum-capi-helm` code can be useful. but for end user, it will be dangerous if those values are not vetted. basically it's an open invite to run anything? 08:30:32 <sd109> Yeah exactly, it saves them either having to rebuild new images for all clusters or forking the Helm charts and setting up chart publishing etc 08:30:42 <sd109> Yeah customer here is operator of a cloud 08:31:15 <dalees> the driver can be made to do this today, via the `magnum.conf`. It's just fairly flat in structure, so you might add these to the chart and flag them on/off or make them simpler. It's not as flexible as this values override though. 08:32:48 <jakeyip> I don't understand this dalees ^ 08:34:56 <dalees> well, now I write out the steps, it's a lot of actual code changes vs editing one values file! 08:35:09 <sd109> I guess the alternative here is that we are happy to continue adding more and more labels for niche use cases like this and then putting together some proper documentation on the full list of supported labels in the driver? 08:35:58 <jakeyip> I feel like this needs a rethink; the current model doesn't really align well weith helm chart values 08:38:03 <sd109> What if we refactored the driver code so that the values overrides were only merged in during the `create_cluster` function but not within the `update_cluster` and other future operations? That might protect backwards compatibility? 08:38:33 <dalees> yeah, that's a point sd109 - it ends with a lot of specialized flag labels in Magnum. That's the current option, so maybe this overrides file needs some guidance around what to put in there (and what not to) and we'll be okay. 08:39:06 <dalees> sd109: the same values must be merged every time. every 'helm upgrade' overwrites the old values 08:39:21 <dalees> so you'd have to store them at create time 08:39:52 <sd109> Helm has a `--reuse-values` flag that we could make use of I think? 08:40:55 <dalees> so it does 08:41:54 <sd109> Anyway, patch is not viable in its current state and you've both raised some very good points on backwards compatibility etc so thanks for that. We'll have a rethink around how we might be able to do this in a safer way. 08:41:56 <dalees> it would solve one problem, i'd have to think a bit more about it - I wonder if it'd cause any other issues. 08:42:04 <jakeyip> but also is storing a value at create time the expected behaviour? say can an operator want to use this file to force a new image because the existing one has a security issue? 08:43:51 <dalees> sd109: thanks to you and John for proposing it and discussing here, it's still really interesting and there's a lot about labels that is not ideal. 08:44:02 <sd109> Yeah agreed, it needs a rethink... I'm happy to move on for now if you'd like and we can pick this up again next time after some thinking time. 08:44:27 <jakeyip> yeah let's take some time to think about it 08:44:41 <dalees> alright; let's move on to `autoscaling min/max defaults`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/952061 08:45:50 <dalees> so I'm working on this currently (but hitting some backwards compatibility with older clusters which I'm working through). The goal is to let customers change autoscale min/max per nodegroup. 08:48:08 <jakeyip> it's a big patch I need some time to digest. I will star it 08:48:46 <dalees> yep fair enough; no need to review live. we can do it async. 08:49:13 <dalees> just wanted to draw attention to it, especially with sd109_ here who worked on a previous change to this. 08:49:48 <dalees> we can move on for this meeting. 08:49:59 <dalees> next to discuss: `Driver OpenStackCluster api version is old`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/950806 08:50:10 <sd109_> Yeah thanks for raising it, it's been a while since I've looked at that section of the code so will need some time to refresh my memory on how it works but I'll try to get to it this week. 08:50:57 <jakeyip> I do have a related question - is it still the case the current number of nodes doesn't reflect in magnum api ? 08:52:07 <dalees> jakeyip: yes with autoscaling that's true. It only reflects what was set in Magnum API. I have a PoC to pull this info back into Magnum but it depends on conductor periodic tasks occurring more frequently. 08:52:51 <jakeyip> thanks 08:55:04 <dalees> for 950806 - changing the CRD to v1beta1 makes sense to me. The compatibility table remains difficult but we should write what we know and add to it. 08:56:16 <jakeyip> ok thanks. is it good enough for now I guess? 08:57:08 <dalees> yeah, reviewed. 08:57:21 <jakeyip> thanks dalees appreciate it 08:57:50 <dalees> one more while we've got a few minutes 08:58:04 <dalees> `FIP`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/933619 08:58:48 <dalees> sd109_ or mnasiadka: this patch comes from stackhpc. Are you/they happy if I continue the work? 08:59:29 <dalees> I think it just needs combining of the two patchsets 08:59:55 <sd109_> Yeah I did ask Piotr to take another look a couple of weeks ago but I don't think he's gotten round to it yet. Feel free to updated it or do whatever is needed on your side. 08:59:56 <jakeyip> for the FIP I think we decided last meeting, if piotr isn't avaiable to progress this, you can modify your change with his stuffs and add him as co-author? 09:00:42 <jakeyip> that way we won't be stuck waiting for him and he can get credits 09:00:46 <dalees> sd109_: thanks, if he has short toes, we're good. dont want to step on them 09:01:14 <dalees> Ok, I'll progress it 09:01:19 <jakeyip> I do have quetions about what the labels actually supposed to mean though, cos I don't htink they are well defined 09:02:44 <dalees> jakeyip: did my review comment help? I tried to add my understanding and link to source. But i'm doing code archeology so might have misunderstood things. 09:03:30 <dalees> my understanding comes from https://review.opendev.org/c/openstack/magnum/+/641547 09:03:31 <jakeyip> sorry I want to correct myself. I think you are right, but I think they were not well understood by the majority of people 09:04:57 <dalees> i have no doubt that documentation is lacking. 09:05:41 <jakeyip> I think your opinion is that `cluster_template.floating_ip_enabled` should be ignred in capi-helm? 09:06:09 <dalees> yes 09:06:33 <jakeyip> +1 09:06:50 <dalees> because it implies FIPs on worker nodes, only because I don't have a use-case for this. Someone else might. 09:07:59 <sd109_> I don't see a use case for it either tbh 09:08:12 <dalees> we're over time. can we come back to upgrades next time jakeyip ? 09:08:16 <jakeyip> yes 09:08:53 <dalees> ok thank you; I won't do open discussion, but any last things before I close? 09:09:10 <jakeyip> nothing from me 09:09:14 <sd109_> likewise 09:09:15 <dalees> maybe I should do meeting topics per review, hmm. 09:09:40 <dalees> thank you all, good discussions! 09:09:43 <dalees> #endmeeting