#openstack-containers log

08:00:41 <dalees> #startmeeting magnum
08:00:41 <opendevmeet> Meeting started Tue Jun 10 08:00:41 2025 UTC and is due to finish in 60 minutes.  The chair is dalees. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:41 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:41 <opendevmeet> The meeting name has been set to 'magnum'
08:00:46 <dalees> #topic Roll Call
08:00:55 <dalees> Hi all o/
08:01:43 <sd109> Hi Dale
08:03:43 <dalees> just waiting for any others, I'm sure jakeyip is around
08:04:02 <jakeyip> o/
08:05:53 <dalees> #topic Reviews
08:06:18 <dalees> alright let's get started; a few reviews to discuss in the agenda. Please add any others
08:06:36 <dalees> the first is `Pin OpenStack release version in Devstack script`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/940249
08:08:17 <sd109> I added this one, just wanted to try and get it over the line because myself and all of my StackHPC colleagues have been cherry-picking that patch into every devstack environment so that we can test and develop against specific OpenStack versions rather than being at the whims of (sometimes broken) master branch
08:09:40 <dalees> ok - it appears designed to have no functional changes unless you hardcode a different `OPENSTACK_VERSION` at the top of that file?
08:09:47 <sd109> Yes exactly
08:10:08 <jakeyip> I am fine with that, it's a contrib
08:10:17 <jakeyip> I do the same for my personal devstack anyway
08:10:41 <jakeyip> I was waiting for mnasiadka to chime in as he has better CI knowledge
08:11:00 <mnasiadka> sorry, having another call
08:11:09 <dalees> Sure, no objections to that merging. You might consider having that value at the time default so it could be provided without editing the shell script?
08:11:24 <mnasiadka> Basically I'm going to rewrite a lot of that CI in coming weeks, so fine - update that file, I'm going to remove it soon anyways in favour of good docs
08:11:41 <mnasiadka> :-)
08:12:16 <dalees> we appreciate you, mnasiadka
08:13:20 <mnasiadka> But yes, we should end up with magnum-capi-helm CI jobs soon - so we can work on deleting the Heat driver and some other things
08:13:45 <jakeyip> that will be awesome
08:13:47 <sd109> Okay sounds good, I wasn't aware that you were planning to remove that file when fixing up the CI
08:14:00 <dalees> I've left a comment, but I also don't have much of an opinion; it can merge as is.
08:14:44 <dalees> Shall we move on to discuss the next one?
08:15:03 <jakeyip> ok
08:15:34 <dalees> next is `Allow Magnum operators to provide additional Helm values to all clusters`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/951966
08:16:04 <dalees> I had a read of this one today, and it looks interesting. It might save us some modifications we're making to helm chart values files.
08:16:51 <jakeyip> yeah I agree this is an interesting idea
08:16:53 <sd109> Yeah that was our hope, we have a couple of customers asking for a feature like this and just wanted to make sure that you guys were onboard with adding something like this before we invest much effort into it
08:17:03 <dalees> however, it does apply to every helm chart used. So it's more limiting than publishing a helm chart per k8s version (as we do). It leads you back to global config that can't be changed without affecting old clusters.
08:17:52 <dalees> I dont mind adding it, but I don't think I'll use it for much, after considering this. I want certainty that old clusters on old charts don't accidentally change if we change some config (only on upgrade).
08:18:51 <sd109> Yeah that's a good point, we'll need to think about how we can protect against that e.g. if operator changes the default CNI used in the Helm values that would cause trouble for existing clusters on upgrade I think
08:18:54 <dalees> i can see it being useful for someone with several installations of Magnum and different configs across them, so they don't have to fork and publish their own helm charts.
08:19:44 <sd109> Out of interest, what's your reasoning for using a different chart per k8s version? What kind of things do you change in the charts between versions?
08:20:25 <jakeyip> good point. it'll be a pain maintaining this file, it almost encourages a bad practice?
08:20:58 <dalees> eg. chart version x is used for k8s 1.29. Then we rebase onto chart version y and use it for k8s 1.30. This way we never have any upgrades for a customer who sticks with k8s 1.29 (and may upgrade to the next patch version 1.29.x)
08:21:36 <dalees> and we can make big changes in the chart (pulling in all stackhpc's version bumps) with minor k8s versions. but customers have certainty and stability within patch upgrades.
08:22:23 <dalees> jakeyip: yeah, it's a trap to put any tag versions in there. Only global static config would work. OCI mirror overrides might be a good candidate.
08:23:15 <sd109> Perhaps it would make more sense to allow the value overrides as a field on the cluster template instead?
08:24:01 <jakeyip> without this file, each magnum chart is independent. with this file, you have some values that you need to maintain across all charts used i+n
08:24:27 <sd109> e.g. a something like a `config_overrides` label who's value is a JSON formatted string which the driver merges into the Helm values?
08:24:31 <dalees> isn't that what we use labels for currently? (there are always some config not implemented yet, of course!)
08:25:07 <sd109> Yeah agreed but I was hoping to avoid having to add a label for every single configurable value in the Helm charts if possible
08:25:12 <jakeyip> ... used in all templates. taking into account new creates, upgrades...
08:25:33 <dalees> sd109: what is the main use case of values you'd put in there?
08:25:34 <jakeyip> sd109: I thought about that before, I actually wrote some code for it...
08:26:46 <jakeyip> I feel the json formatted string for setting any values can be quite dangerous though
08:26:48 <dalees> yeah, a rethink of labels given their immutability woud;nt' be the worst idea. This might be another driver for that (or path to deprecating them).
08:27:53 <sd109> We have customers who want to do things like installing qemu-guest agent in all cluster VMs which they could do via the kubeadm config exposed in the helm chart values, or things like setting annotations on the CAPI Machine objects to have more fine grained control over autoscaler behaviour
08:28:30 <sd109> But I agree with Jake that setting values via JSON string could also have it's downsides
08:29:29 <dalees> ah, i see. and these would be okay in your customer case because it's their installation and maybe it does want to be applied for every cluster, including existing ones.
08:30:16 <jakeyip> sd109: what is a 'customer'? is it an op or a end user? I can image for operator, having an ability to override any values without touching `magnum-capi-helm` code can be useful. but for end user, it will be dangerous if those values are not vetted. basically it's an open invite to run anything?
08:30:32 <sd109> Yeah exactly, it saves them either having to rebuild new images for all clusters or forking the Helm charts and setting up chart publishing etc
08:30:42 <sd109> Yeah customer here is operator of a cloud
08:31:15 <dalees> the driver can be made to do this today, via the `magnum.conf`. It's just fairly flat in structure, so you might add these to the chart and flag them on/off or make them simpler. It's not as flexible as this values override though.
08:32:48 <jakeyip> I don't understand this dalees ^
08:34:56 <dalees> well, now I write out the steps, it's a lot of actual code changes vs editing one values file!
08:35:09 <sd109> I guess the alternative here is that we are happy to continue adding more and more labels for niche use cases like this and then putting together some proper documentation on the full list of supported labels in the driver?
08:35:58 <jakeyip> I feel like this needs a rethink; the current model doesn't really align well weith helm chart values
08:38:03 <sd109> What if we refactored the driver code so that the values overrides were only merged in during the `create_cluster` function but not within the `update_cluster` and other future operations? That might protect backwards compatibility?
08:38:33 <dalees> yeah, that's a point sd109  - it ends with a lot of specialized flag labels in Magnum. That's the current option, so maybe this overrides file needs some guidance around what to put in there (and what not to) and we'll be okay.
08:39:06 <dalees> sd109: the same values must be merged every time. every 'helm upgrade' overwrites the old values
08:39:21 <dalees> so you'd have to store them at create time
08:39:52 <sd109> Helm has a `--reuse-values` flag that we could make use of I think?
08:40:55 <dalees> so it does
08:41:54 <sd109> Anyway, patch is not viable in its current state and you've both raised some very good points on backwards compatibility etc so thanks for that. We'll have a rethink around how we might be able to do this in a safer way.
08:41:56 <dalees> it would solve one problem, i'd have to think a bit more about it - I wonder if it'd cause any other issues.
08:42:04 <jakeyip> but also is storing a value at create time the expected behaviour? say can an operator want to use this file to force a new image because the existing one has a security issue?
08:43:51 <dalees> sd109: thanks to you and John for proposing it and discussing here, it's still really interesting and there's a lot about labels that is not ideal.
08:44:02 <sd109> Yeah agreed, it needs a rethink... I'm happy to move on for now if you'd like and we can pick this up again next time after some thinking time.
08:44:27 <jakeyip> yeah let's take some time to think about it
08:44:41 <dalees> alright; let's move on to `autoscaling min/max defaults`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/952061
08:45:50 <dalees> so I'm working on this currently (but hitting some backwards compatibility with older clusters which I'm working through). The goal is to let customers change autoscale min/max per nodegroup.
08:48:08 <jakeyip> it's a big patch I need some time to digest. I will star it
08:48:46 <dalees> yep fair enough; no need to review live. we can do it async.
08:49:13 <dalees> just wanted to draw attention to it, especially with sd109_  here who worked on a previous change to this.
08:49:48 <dalees> we can move on for this meeting.
08:49:59 <dalees> next to discuss: `Driver OpenStackCluster api version is old`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/950806
08:50:10 <sd109_> Yeah thanks for raising it, it's been a while since I've looked at that section of the code so will need some time to refresh my memory on how it works but I'll try to get to it this week.
08:50:57 <jakeyip> I do have a related question - is it still the case the current number of nodes doesn't reflect in magnum api ?
08:52:07 <dalees> jakeyip: yes with autoscaling that's true. It only reflects what was set in Magnum API. I have a PoC to pull this info back into Magnum but it depends on conductor periodic tasks occurring more frequently.
08:52:51 <jakeyip> thanks
08:55:04 <dalees> for 950806 - changing the CRD to v1beta1 makes sense to me. The compatibility table remains difficult but we should write what we know and add to it.
08:56:16 <jakeyip> ok thanks. is it good enough for now I guess?
08:57:08 <dalees> yeah, reviewed.
08:57:21 <jakeyip> thanks dalees appreciate it
08:57:50 <dalees> one more while we've got a few minutes
08:58:04 <dalees> `FIP`: https://review.opendev.org/c/openstack/magnum-capi-helm/+/933619
08:58:48 <dalees> sd109_ or mnasiadka: this patch comes from stackhpc. Are you/they happy if I continue the work?
08:59:29 <dalees> I think it just needs combining of the two patchsets
08:59:55 <sd109_> Yeah I did ask Piotr to take another look a couple of weeks ago but I don't think he's gotten round to it yet. Feel free to updated it or do whatever is needed on your side.
08:59:56 <jakeyip> for the FIP I think we decided last meeting, if piotr isn't avaiable to progress this, you can modify your change with his stuffs and add him as co-author?
09:00:42 <jakeyip> that way we won't be stuck waiting for him and he can get credits
09:00:46 <dalees> sd109_: thanks, if he has short toes, we're good. dont want to step on them
09:01:14 <dalees> Ok, I'll progress it
09:01:19 <jakeyip> I do have quetions about what the labels actually supposed to mean though, cos I don't htink they are well defined
09:02:44 <dalees> jakeyip: did my review comment help? I tried to add my understanding and link to source. But i'm doing code archeology so might have misunderstood things.
09:03:30 <dalees> my understanding comes from https://review.opendev.org/c/openstack/magnum/+/641547
09:03:31 <jakeyip> sorry I want to correct myself. I think you are right, but I think they were not well understood by the majority of people
09:04:57 <dalees> i have no doubt that documentation is lacking.
09:05:41 <jakeyip> I think your opinion is that `cluster_template.floating_ip_enabled` should be ignred in capi-helm?
09:06:09 <dalees> yes
09:06:33 <jakeyip> +1
09:06:50 <dalees> because it implies FIPs on worker nodes, only because I don't have a use-case for this. Someone else might.
09:07:59 <sd109_> I don't see a use case for it either tbh
09:08:12 <dalees> we're over time. can we come back to upgrades next time jakeyip ?
09:08:16 <jakeyip> yes
09:08:53 <dalees> ok thank you; I won't do open discussion, but any last things before I close?
09:09:10 <jakeyip> nothing from me
09:09:14 <sd109_> likewise
09:09:15 <dalees> maybe I should do meeting topics per review, hmm.
09:09:40 <dalees> thank you all, good discussions!
09:09:43 <dalees> #endmeeting