09:17:49 <jakeyip> #startmeeting magnum
09:17:49 <opendevmeet> Meeting started Wed Sep 13 09:17:49 2023 UTC and is due to finish in 60 minutes.  The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:17:49 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:17:49 <opendevmeet> The meeting name has been set to 'magnum'
09:17:55 <jakeyip> #topic Roll Call
09:17:59 <jakeyip> o/
09:18:08 <dalees> o/
09:18:15 <jakeyip> ping gbialas
09:18:21 <mkjpryor> o/
09:19:00 <gbialas> o/
09:19:04 <jakeyip> ping mnasiadka :)
09:19:44 <jakeyip> Thanks everyone for joining the meeting
09:19:58 <jakeyip> Agenda:
09:20:01 <jakeyip> #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:20:08 <jakeyip> Please add your topics to the Agenda
09:21:06 <jakeyip> #topic Changing container_runtime default
09:21:18 <gbialas> Yes, that is mine :)
09:21:22 <jakeyip> gbialas: let's start with this. sorry I wasn't around last week. I read thru the chat.
09:21:43 <jakeyip> I agree we don't want to keep the old things around forever. In fact, I was hoping ClusterAPI solves my problem
09:21:58 <jakeyip> but unfortunately.... long story... :)
09:22:12 <mnasiadka> It's going to solve it in longer run
09:22:31 <gbialas> Well, Cluster API is still a thing of the future; till then we have to live with it :)
09:22:36 <mnasiadka> But let's try to make the current driver working out of the box without any labels ;)
09:22:58 <jakeyip> yeah. maybe I should provide some context
09:23:52 <jakeyip> so Antelope supports v1.23. It still works with dockershim.
09:24:52 <mnasiadka> Bobcat is going to support 1.23?
09:25:03 <jakeyip> during last PTG we sort of decided not to touch labels anymore. This is to prevent the case of an operator upgrading Magnum and the default labels changing and breaking existing cluster templates that didn't define it
09:26:27 <jakeyip> For example, a site might be using a template that defaults to dockershim, and if we change the runtime, an operator upgrading from Antelope to Bobcat will have it broken because maybe there are some things that expect dockershim and it was the default
09:27:14 <jakeyip> personally, I think the labels are a massive headache. therefore I was hoping CAPI will remove this altogether
09:27:14 <mnasiadka> well, operator is upgrading to Bobcat and still wants to deploy unsupported kubernetes version?
09:27:16 <gbialas> I have spawned cluster in 1.23 version, and then changed default for container_runtime. I have rescaled cluster, and it worked.
09:27:55 <mnasiadka> jakeyip: CAPI will not remove labels, we need to somehow pass additional options to the driver
09:28:23 <mnasiadka> We could properly manage kube_version
09:28:24 <jakeyip> gbialas: the existing clusters will already have the dockershim value. it is new clusters that will have the new value
09:28:31 <mnasiadka> currently we copy kube_tag as kube_version
09:28:46 <mnasiadka> if we would know what is the major version being requested - we could do a conditional on container_runtime
09:28:53 <jakeyip> mnasiadka: yes with CAPI and the CAPI labels we will be very careful not to make this mistake
09:30:21 <jakeyip> mnasiadka: kube_ver is just one, some of them may depend on fcos version, etc
09:31:32 <jakeyip> e.g. use_podman label when Magnum was changing from fedora atomic to fedora coreos
09:31:47 <mnasiadka> just saying that this way we would not break any expectations
09:32:01 <mnasiadka> second thing is - we could have a validator on kubernetes version to the ones we support
09:33:08 <jakeyip> can you explain more?
09:33:18 <jakeyip> gbialas any comments?
09:33:35 <jakeyip> I am not objecting to changing, I am open to ideas
09:33:39 <dalees> it would be nice to be able to have label defaults be "sensible" in each version and able to change. It would probably be a lot of change, but it'd help if templates stored defaults when they were created, so they'd not be affected by code changes to defaults.
09:33:47 <gbialas> We should support some recent version (both k8s and fcos). Right now we are supporting versions which are EOL.
09:34:06 <mnasiadka> dalees: don't we do that today?
09:34:46 <dalees> mnasiadka: can't do, if a template is created with no labels, it'll use defaults on cluster creation. So the behaviour changes if the default in code changes.
09:35:04 <dalees> this is why label defaults are annoying; we may break old templates.
09:35:07 <mnasiadka> ah, so we would need to evaluate all labels defaults and push them to the Heat stack
09:35:19 <jakeyip> gbialas: we do support new versions, if you are making a new CT you have to provide the labels that will work for it. this is actually good because this CT won't follow defaults if magnum changes.
09:35:43 <jakeyip> CT becomes immutable and not dependent on Magnum defaults.
09:36:10 <dalees> the other option is to just say "we will change defaults to match k8s version X". And if the user wants to have a template that still works the same, they specify all the labels in the template (we do this at Catalyst Cloud, and are explicit about most labels)
09:36:30 <mnasiadka> well, still containerd will work with 1.23, I guess even with 1.19
09:36:45 <mnasiadka> most projects just do release notes on changed defaults
09:36:59 <jakeyip> mnasiadka: not sure but we don't test it and I'm not keen to look :)
09:38:02 <mnasiadka> yeah, once we test that, it might be better :)
09:38:19 <jakeyip> I like dalees' idea of getting the defaults set in the CT on creation.
09:38:37 <mnasiadka> so what for now? we change the default or just write in the docs that any non-EOL kubernetes version needs container_runtime set to containerd?
09:39:36 <dalees> I think that is a difficult change, the defaults live in Heat template files not python code - so I don't think you could just iterate over them and store.
09:39:56 <jakeyip> the way decided at the last PTG is to document it. For every version we list the tags that should be used when creating a CT. That way the CT stores all the info
09:40:10 <mnasiadka> ok, so let's continue with that for now
09:40:12 <jakeyip> I also like mnasiadka's idea of eventually not providing defaults lol
09:40:31 <mnasiadka> Yeah, maybe we should have some mandatory labels :)
09:40:51 <dalees> mnasiadka: Jake is working on documenting "known good" labels for k8s versions, containerd is one of them. https://review.opendev.org/c/openstack/magnum/+/894006/2/doc/source/user/index.rst
09:41:01 <jakeyip> if this problem follows us for a while (e.g. CAPI won't save us), dalees may want to explore that idea
09:42:15 <mnasiadka> ok, so that probably concludes
09:42:40 <jakeyip> then we just have one version with a breaking behaviour change, and after that all CTs will suck defaults in at create time and store them. Then we are free to ratchet the labels each version
09:43:19 <mnasiadka> yup
09:43:57 <jakeyip> gbialas: you ok with that? we don't mind help in getting a patch for this idea, then we can update labels each version
09:44:30 <gbialas> Yes, I'm OK with this.
09:44:49 <jakeyip> I've just noticed the time, let's move on quickly. gbialas do message me if you want to discuss more. Sorry to disappoint :)
09:45:06 <jakeyip> #topic Fixing meeting name
09:45:08 <gbialas> It is as it is :)
09:45:10 <jakeyip> mnasiadka: updates?
09:45:53 <mnasiadka> #link https://opendev.org/opendev/irc-meetings/src/branch/master/meetings/containers-team-meeting.yaml
09:45:58 <mnasiadka> haven't raised a patch to rename it
09:46:06 <mnasiadka> I was thinking of leaving the name as is
09:46:16 <mnasiadka> changing id to magnum - so it points to correct logs on eavesdrop
09:47:14 <mnasiadka> does it make sense?
09:47:24 <mnasiadka> or should we also rename OpenStack Containers to OpenStack Magnum?
09:47:56 <jakeyip> I am not sure where the bug is actually.
09:48:03 <jakeyip> is this a community thing or can both of us decide?
09:48:24 <jakeyip> or 3 of us (sorry dalees :D )
09:50:04 <jakeyip> OH I think I understand now. because we start meeting with 'magnum'
09:50:24 <jakeyip> old logs here https://meetings.opendev.org/meetings/containers/
09:50:31 <jakeyip> new logs here https://meetings.opendev.org/meetings/magnum/
09:50:35 <jakeyip> am I right?
09:52:44 <jakeyip> did I lose everyone... ?
09:54:05 * dalees is here
09:54:14 <jakeyip> hm
09:54:30 <jakeyip> nvm let's go on to next topic, nearly time
09:54:34 <jakeyip> #topic ClusterAPI
09:56:18 <jakeyip> Update: We have encountered a bit of a roadblock while at the very last stage of the ClusterAPI driver. It turns out that the current implementation, contributed by StackHPC, may conflict with another driver VEXXHOST developed in-house
09:58:13 <mnasiadka> jakeyip: we can decide - old logs are in containers
09:58:22 <mnasiadka> jakeyip: but when you write startmeeting magnum - the logs get into magnum
09:58:37 <mnasiadka> jakeyip: so either we change only id, or name as well - core reviewers can decide :)
09:58:38 <jakeyip> we have discussed this a fair bit. The biggest consideration is keeping the community happy and engaged so the project is sustainable.
09:59:27 <jakeyip> thanks mnasiadka, we will discuss this offline so as not to waste others' time
09:59:42 <mnasiadka> jakeyip: ack
10:00:13 <jakeyip> so in view of that, the current implementation in Magnum can't be merged as is
10:00:41 <jakeyip> There are a few things we will need to do
10:01:20 <jakeyip> 1. Find a way for both drivers to co-exist
10:02:23 <jakeyip> 2. Refactor the driver interface code so others can contribute out of tree drivers without conflicts
10:03:55 <jakeyip> 3. Possibly merge StackHPC and VEXXHOST drivers so in the future there is 1 good reference CAPI driver
10:04:17 <jakeyip> I think that's all, mnasiadka or dalees, do you have anything to add?
10:04:53 <dalees> No, you covered what I was going to say in #2
10:05:26 <jakeyip> questions anyone?
10:06:13 <mnasiadka> not me
10:07:32 <jakeyip> mkjpryor has been strangely silent. :)
10:07:52 <jakeyip> maybe he is very disappointed
10:09:05 <jakeyip> ok let's move on
10:09:11 <jakeyip> #topic Open Discussion
10:10:47 <jakeyip> Oh I noticed a BU Mentorship agenda item. It doesn't have a name on it, do we have someone who wants to talk about that?
10:12:07 <jakeyip> alright if there's nothing let's end the meeting.
10:12:19 <mkjpryor> Apologies - got distracted by an urgent email
10:12:27 <jakeyip> oh that's ok
10:12:37 <mkjpryor> We are currently extracting our patches into an out-of-tree driver that we will work on for the time being
10:13:37 <mkjpryor> The problem with our driver and VEXXHOST's co-existing in the same installation is that we use the same image properties to identify our drivers
10:13:53 <mkjpryor> So we would need to come up with a way out of that
10:14:14 <jakeyip> yeah we've discussed that, we will find a way around it
10:14:57 <mkjpryor> More generally, we are keen to find a way to share as much code between our driver and VEXXHOST's as possible
10:15:14 <mkjpryor> But that is probably a fair distance off at the moment
10:15:18 <jakeyip> ideal scenario is that a Magnum installation will be able to use any number of drivers. The ClusterTemplates will hint which driver to use.
10:15:27 <mkjpryor> That makes sense I think
10:16:05 <mkjpryor> I don't really understand why it wasn't always like that, but I also wasn't using Magnum when the API was designed
10:16:07 <jakeyip> mkjpryor: yes, I think that is the most helpful. we would really love it if both StackHPC and VEXXHOST could work together to come up with one good reference.
10:16:35 <jakeyip> the API was designed in mesos days so things changed :)
10:16:42 <mkjpryor> We do favour our Helm approach, TBH, as it allows us to reuse the intelligence in the Helm chart in other places, e.g. for proper gitops and in Azimuth
10:17:19 <mkjpryor> I'm not sure whether VEXXHOST's objections are because we are using Helm or because we are not using ClusterClass
10:17:49 <mkjpryor> FWIW, we are probably going to modify the Helm charts to use ClusterClass at some point in the next few months
10:18:04 <mkjpryor> That may bring us a bit closer
10:19:18 <dalees> mkjpryor, question for you (and I'll ask the same of VEXXHOST): how open are you to contributions to your out of tree driver and helm charts? There are lots of changes we've been waiting to submit once it merges, which won't happen for a while. I believe one or two have been created as PRs but have had no attention.
10:20:17 <mkjpryor> Very open, for us
10:22:14 <mkjpryor> I did see a PR to the CAPI Helm charts for Flatcar integration
10:22:56 <jakeyip> Hi all, I think we can continue the CAPI discussion, but let's close the meeting
10:23:05 <mkjpryor> I need to have a proper think about how we support multiple OSs there, because I'm not convinced that PR is the cleanest approach
10:23:14 <jakeyip> #endmeeting