09:03:56 #startmeeting magnum
09:03:56 Meeting started Wed Sep 20 09:03:56 2023 UTC and is due to finish in 60 minutes. The chair is mnasiadka. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:03:56 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:03:56 The meeting name has been set to 'magnum'
09:04:03 jakeyip: couldn't resist ;)
09:04:16 #topic rollcall
09:04:18 o/
09:04:22 o/
09:04:22 o\
09:04:24 o/
09:04:26 o/
09:04:30 o/
09:04:31 o/
09:04:34 o/
09:05:10 what a crowd :)
09:05:31 #topic agenda
09:05:40 BU Mentorship
09:05:47 ClusterAPI
09:05:54 Open discussion
09:05:59 #topic BU Mentorship
09:06:09 So, it was linked last time in the etherpad
09:06:22 #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:06:23 etherpad link: https://etherpad.opendev.org/p/magnum-weekly-meeting
09:06:40 I reached out to diablo_rojo - it seems there are no students that signed up, but it would be good if we could have a list of potential mentors from the Magnum side
09:06:49 #link https://etherpad.opendev.org/p/2023-BU-Magnum
09:07:21 It would be nice if people interested would add their names to the Mentors section of that etherpad
09:07:27 John Garbutt proposed openstack/magnum master: WIP: ClusterAPI: add initial driver implementation https://review.opendev.org/c/openstack/magnum/+/851076
09:07:29 #topic ClusterAPI
09:07:34 jakeyip: giving the meeting back to you ;-)
09:08:28 John Garbutt proposed openstack/magnum master: WIP: ClusterAPI: add initial driver implementation https://review.opendev.org/c/openstack/magnum/+/851076
09:08:29 oh man, the difficult topic
09:08:49 johnthetubaguy: are you around? (I guess you are online from ^)
09:08:57 I can update from our side what we are doing?
09:09:05 yes thanks
09:09:38 So given we didn't get merged this cycle, and we are testing this with a few customers, we have created this repo for now: https://github.com/stackhpc/magnum-capi-helm
09:10:26 I have been rebasing the upstream patches so they can be kept in sync with the above
09:10:42 I certainly would still like a Cluster API driver that uses helm charts in tree, if that is possible
09:11:05 This is the tip of the updated patch set: https://review.opendev.org/c/openstack/magnum/+/851076
09:11:49 thanks for creating that repo btw; it's much easier to install from a repo than carry a stack of Gerrit patches to test against.
09:12:05 in the process of retesting the devstack bits there, to see what I broke in all the refactoring (facepalm!)
09:12:15 thanks. for background and comparison, the VEXXHOST driver is out of tree, and I don't think there's inclination for them to contribute it in-tree.
09:12:19 dalees: cool, glad that helps, certainly seemed easier going
09:13:08 so we spoke about things at the last PTG right, and I think the core difference is we want to use helm charts that can be shared with k8s on OpenStack outside of Magnum
09:13:34 in particular I would like ArgoCD directly as an option (alongside Azimuth, which has been using these charts for 18 months or more)
09:13:59 and really I would love for that to be a community effort, with a tested reference starting point people can use
09:14:37 ... so I don't think that vision has changed from when we approved the spec; granted, we only got our funding approved about two weeks ago, hence the re-animation on our side
09:14:41 yeap I understand. both approaches have their pros and cons, but let's not get too deep into that now, in the interest of time, as that can take a while
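For readers following the packaging discussion above: out-of-tree Magnum drivers such as the one in the stackhpc/magnum-capi-helm repo are picked up through a setuptools entry point in the magnum.drivers namespace, which Magnum loads via stevedore. Below is a minimal sketch of how such a package might register itself; the package, module, and class names are illustrative assumptions, not the actual names used in that repo.

```python
# setup.py -- minimal sketch of packaging an out-of-tree Magnum driver.
# Package/module/class names here are hypothetical; only the entry point
# namespace "magnum.drivers" is how Magnum actually discovers drivers.
from setuptools import find_packages, setup

setup(
    name="example-magnum-capi-driver",
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        "magnum.drivers": [
            # Magnum's conductor loads this class at startup via stevedore.
            "k8s_capi_helm_v1 = example_capi_driver.driver:Driver",
        ],
    },
)
```

Installing such a package on the conductor host is enough for Magnum to see the driver, which is why a standalone repo is easier to test against than a stack of Gerrit patches.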
09:15:03 So there is a new patch in the series
09:15:16 https://review.opendev.org/c/openstack/magnum/+/895828
09:15:17 so we paused the merge because we realised the patches in tree, as they stand, would conflict with VEXXHOST and existing implementations using that driver
09:15:29 It starts some common utils to share between the two drivers
09:15:47 jakeyip: yes, I noticed that late last week in the comments, that is certainly bad
09:15:58 for the moment I went for changing our use of os-distro
09:16:19 "os": "capi-kubeadm-cloudinit"
09:16:40 now that is pretty crazy, but it represents that the current chart defaults depend on the kubeadm cloudinit bootstrapper
09:16:56 (we can add the flatcar one, with the appropriate values tweak too!)
09:17:20 with my Nova hat on, os_distro=ubuntu is technically malformed and not useful to Nova
09:17:45 I think, given both drivers require the same capi-built images (ubuntu for now, flatcar soon), we need some other way of differentiating besides the (vm, ubuntu, kubernetes) tuple. At the moment I prefer the idea of having the Magnum template define the driver preference (and it could default otherwise).
09:17:46 from a more human sense, it's a bit odd though, but it stops us conflicting for now
09:17:52 cool. glad we all agree we shouldn't break existing implementations. certainly it caught us (me) by surprise. It would have helped if we'd had a VEXXHOST representative reviewing those patches.
09:18:11 dalees: yeah, that would be ideal, of course you could only enable one of the drivers via config
09:18:32 jakeyip: I think they have a hard time attending this meeting, due to time difference - but let's see if we can resolve it in the long term :)
09:18:48 jakeyip: 100% we shouldn't break that driver, I am glad that got spotted, I certainly didn't notice it till it was pointed out
09:19:21 mnasiadka: yeah, we can put one of them as a core to review patches, offline from the meeting TZ.
09:19:21 johnthetubaguy: yes, there are config flags to disable certain drivers - this solves one part. jakeyip brought up the point that if someone was migrating between drivers, they'd want both running at once for some duration.
09:19:26 mnasiadka: +1 the overlap is hard
09:19:38 Let's not get deep into technical stuff here, Gerrit is for code reviews (at least that's my opinion) - the situation is that we have Bobcat RC1 right now, so if we merge any in-tree driver then it's going to be Caracal at the earliest
09:19:50 dalees: jakeyip yeah, good point, you would want side by side
09:20:31 FWIW, I kinda like the idea of the template specifying a driver more directly, and the driver validating the image itself
09:21:53 we could look at writing up a spec for that? but general guidance on the best way to implement it is very welcome
09:22:04 a label is tempting, but also nasty
09:22:11 in addition to the config approach: the current config option is 'disabled_drivers'; a new driver in a new cycle will be enabled by default if the operator hasn't updated the config
09:22:36 top level template param to select the driver? if empty, falls back to the legacy image-based selection?
09:23:07 jakeyip: I have your beta driver change in my series to make sure it's opt-in till we are happy
09:23:49 yeah, top level template param seems suitable I think, with empty fallback to the tuple match (or *also* do the tuple match, as well as the driver selection?).
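Some context on the tuple being discussed: each Magnum driver advertises the (server_type, os, coe) combinations it supports via a `provides` property, and Magnum selects a driver by matching a cluster template's server type, image os_distro, and COE against those entries. The sketch below shows how claiming a distinct os value sidesteps the clash with the VEXXHOST driver; it is a simplified illustration rather than the actual patch, and it omits the lifecycle methods a real driver implements.

```python
# Simplified sketch of a driver's tuple declaration; a real driver also
# implements create_cluster/update_cluster/delete_cluster and friends.
from magnum.drivers.common import driver


class Driver(driver.Driver):

    @property
    def provides(self):
        # Driver selection matches on this tuple, so advertising
        # os="capi-kubeadm-cloudinit" instead of os="ubuntu" avoids
        # claiming the same (vm, ubuntu, kubernetes) combination as the
        # out-of-tree VEXXHOST CAPI driver.
        return [{'server_type': 'vm',
                 'os': 'capi-kubeadm-cloudinit',
                 'coe': 'kubernetes'}]
```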
09:24:06 the problem really is the tuple design, it will hamstring us if we keep trying to work with it. plus, if we introduce new tuples, we run the risk of breaking an installation out there
09:24:54 I was thinking a top level template param means we just ignore the tuple, and we call that legacy for when the top level param is empty?
09:25:29 i.e. you opt into the new system, for new templates.
09:25:54 johnthetubaguy: yeap, that is sort of the design we came up with last week after our brainstorming session.
09:26:00 and you just say driver=k8s_capi_helm_v1 or whatever in the template. That does mean we need an API to list the available drivers.
09:26:37 yes, but that might be easier to do when we drop all other drivers
09:26:47 seems cleaner, I honestly hate the tuple thing, I have spent hours debugging that with image permissions, etc.
09:27:05 true; there's a CLI tool for listing drivers, but not an API.
09:27:43 seems like a nice priority for the C cycle
09:27:46 mnasiadka: well, we can't drop the old approach without killing the API, right, so I don't see why that needs to wait? this is all about something for the C release anyway?
09:28:49 is there someone who wants to take this one and write up a spec, I guess?
09:28:50 good point about the API, the old discovery is useful in certain circumstances where a user creates a template, possibly off the information in the Magnum user docs
09:29:08 johnthetubaguy: just saying all other drivers are already deprecated, and it might just be easier to get that implemented when we drop them, for simplicity
09:29:24 so the API would exclude disabled drivers, and include any out-of-tree things you happen to have installed, I presume.
09:29:27 I am not sure if that is a thing anymore when we are talking about out-of-tree drivers nowadays. how does the user know what os_distro to use if they didn't know about the driver(s)?
09:30:08 So personally, I think we should disable users from creating templates by default, and leave that to the admin... but I might be on my own there.
09:30:21 (but that is a whole other RBAC debate really)
09:30:27 what, more API changes?! :)
09:31:04 johnthetubaguy: +1, we create these for users. self-serve is a minefield of labels :)
09:31:24 so many of our customers do that, as the users get themselves in a mess when they create crazy templates, and they just don't have the time to help them with that
09:31:25 FWIW we create templates for users. Users do create their own templates off ours if they want something special.
09:31:32 I think let's not get into the policy battle for now
09:31:41 people can change Magnum policy today and I think that's fine
09:31:48 but let's shelve that, a whole other discussion
09:31:49 instead of enforcing our thinking on users :)
09:32:24 the problem is relevant to who is expected to create a template, and needing to know the available driver list, right? but happy to ignore that for now
09:32:30 this feels like a PTG-like discussion to me
09:32:32 but I think what this leads to is that the API to discover drivers is not strictly necessary?
09:33:03 at least, not initially to support the CAPI driver. It can come later, if someone wants to do it.
09:33:40 I don't think it's strictly necessary for anything, it's nice to have - and given the size of the Magnum community, we might find better use for our time
09:34:10 I don't personally see this as blocking the driver, it only blocks having two Cluster API drivers side by side, assuming we have the same os_distro flag, which now we don't
09:34:29 ... it does however seem useful and worth adding, either way
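To make the "top level template param" idea concrete: the proposal is that a cluster template could name its driver explicitly, with the existing image-tuple matching kept as the legacy fallback for templates that leave it empty. The sketch below is purely illustrative of that selection order; the driver field and both lookup tables are hypothetical, since no such field exists in Magnum today.

```python
# Hypothetical selection order for the proposed template-level driver field.
def select_driver(template, drivers_by_name, drivers_by_tuple):
    """Pick a driver for a cluster template.

    drivers_by_name maps entry-point names (e.g. "k8s_capi_helm_v1") to
    driver objects; drivers_by_tuple maps (server_type, os_distro, coe)
    tuples to driver objects.
    """
    explicit = getattr(template, "driver", None)
    if explicit:
        # New, opt-in path: the template names the driver directly and the
        # driver is then responsible for validating the image itself.
        return drivers_by_name[explicit]
    # Legacy path: fall back to matching the image-derived tuple, so
    # existing templates keep working unchanged.
    key = (template.server_type, template.os_distro, template.coe)
    return drivers_by_tuple[key]
```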
09:34:33 +1 I agree. we will accept patches if someone does the work. needs a spec. let's leave it at that
09:34:57 does anyone want to implement that?
09:35:07 Let's leave that question for the PTG
09:35:09 (I mean does anyone have time, really)
09:35:22 johnthetubaguy: if we continue on the new approach of using the CT to define the driver, it's not a blocker for two CAPI drivers side by side
09:35:23 Is Magnum signed up for the PTG?
09:35:25 from an operator pov you need a k8s running capi somewhere - is this completely independent/decoupled from the particular Magnum capi driver in use?
09:36:10 jrosser: good point. there's a config option to point to the management cluster. I think we have to make sure config options don't clash
09:36:35 not quite, as the CRDs the drivers use might depend on specific operator versions, but in theory they should overlap a lot
09:36:38 at some point, Magnum still needs to assign config sections to prevent conflicts
09:37:01 e.g. each driver will have its section named after the driver name
09:37:22 so the heat driver does that now, so I went for capi_helm in my current patch series
09:38:52 I think the two drivers are doing that today?
09:39:03 as in they both have separate config, or did you mean they should share?
09:39:07 johnthetubaguy: I don't really understand what you mean by 'heat driver does that now'?
09:39:40 jakeyip: I guess it's not really true, I mean there is a [heat] config section that is specific to the heat driver, but I guess it's not really that clean
09:39:59 johnthetubaguy: I mean, currently for both drivers, and also for future drivers, we should have a standard for people implementing out-of-tree drivers to prevent conflicts
09:40:23 imho there needs to be some good thought given to operator experience; I foresee a large attraction of the capi driver approach in general is relieving the operator from needing extremely deep k8s expertise while still being able to run the Magnum service
09:42:06 there are a few points of conflict: (1) tuple (we solve by the CT specifying the driver) (2) config section (3) driver name
09:42:43 I think understanding this is enough so we can help advise future patches / reviews.
09:43:00 jrosser: can you elaborate? is there something we can do better?
09:43:40 I am very interested in the new direction of travel for Magnum, it looks great
09:44:20 as an operator we have been unable to deploy the existing approach as it is too much of a burden
09:45:03 +1 Cluster API is a strong base, regardless of all this, and I am excited by the further traction it's getting inside and outside of OpenStack
09:45:20 ah I see, understood now.
09:46:26 ok, to bring the driver discussion back, I would like to poll the room on our approach and whether there's anything we've missed
09:48:40 jakeyip: my main question is what is needed to help get the capi_helm driver merged in the C cycle? And I suspect that is probably best discussed at the PTG if we can find a slot when lots of us can attend, including VEXXHOST driver representatives. I know I can't usually make this meeting time either.
09:50:06 johnthetubaguy: will you, or someone from StackHPC, be able to work with VEXXHOST to align both your drivers?
09:50:10 Or maybe I should rephrase that to: getting a clear "yes, we aim to merge" (as we said in B), or we decide to not have any drivers in tree.
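On the config-section point raised above: the convention being suggested is that every driver keeps its options in its own group (there is already a [heat]-style section for the heat driver, and the current patch series uses capi_helm), so two CAPI drivers running side by side never fight over option names such as the one pointing at the management cluster. A minimal oslo.config sketch of that pattern follows; the option name and default are assumptions, not the ones from the actual patches.

```python
# Sketch of per-driver config registration; the group is named after the
# driver, so another CAPI driver can define similar options without clashing.
from oslo_config import cfg

capi_helm_group = cfg.OptGroup(
    name="capi_helm",
    title="Options for the (hypothetical) capi_helm driver")

capi_helm_opts = [
    cfg.StrOpt("kubeconfig_file",
               default="",
               help="Path to a kubeconfig for the Cluster API management "
                    "cluster this driver should talk to."),
]


def register_opts(conf):
    conf.register_group(capi_helm_group)
    conf.register_opts(capi_helm_opts, group=capi_helm_group)


register_opts(cfg.CONF)
# Options are then read as cfg.CONF.capi_helm.kubeconfig_file.
```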
09:50:34 jakeyip: we have wanted to do that from the start, but helm is the deal breaker from both sides, based on our previous discussions
09:51:32 I have started some driver common utils where we have a bit of shared code already, it would be great to expand that, so the out-of-tree driver can consume it when it makes sense, or not, as it may choose.
09:52:17 johnthetubaguy: I think VEXXHOST mainly wants to continue on the ClusterClass route.
09:52:51 I want to use ClusterClass too, it's on the roadmap, we are just spending our effort on Magnum integration right now
09:53:08 it wasn't working when we started the helm charts, so we didn't use it from the start
09:53:26 cool. definitely it can be a good collab.
09:53:30 similar to the add-ons, we have our own helm add-on installer, to get around the life-cycle issues with the updates there
09:54:44 right now, our focus is more on how to get both drivers to work together. Once we figure that out, the way to getting your CAPI driver merged will be clear
09:54:50 jakeyip: I would like to collab, but I am not seeing the opportunity right now, beyond some shared utils, which is a shame, my strong preference is for a single in-tree solution we all support :'(
09:55:14 yeap I am keen for that.
09:56:44 I think Magnum needs to ensure both drivers can co-exist (discussed above), and then I'm keen to see one or both merge if there are maintainers for them. Staying out of tree is okay too. We're actively picking up the helm driver and using it now.
09:57:14 I would be cool with both merging in tree too, that would be good, right?
09:57:35 the single in-tree solution doesn't have to be there from day one. as long as we allow for multiple drivers, we can iterate
09:57:41 (much easier to share code and logic when we are both in tree)
09:58:28 jakeyip: I believe the current patches actively allow for both drivers today? at least that was always the intention on my part, although it wasn't the reality due to the problems we found in review.
09:58:56 similar to how Linux can work with different filesystems. Allow multiple to exist, see which one wins out over time.
09:59:02 maybe a different question, based on the current patches, where is the conflict?
09:59:35 we need maintainers for any driver, which is the _hard_ problem.
09:59:53 jakeyip: that is a good comparison here, we use different package managers... I mean add-on providers
10:00:59 johnthetubaguy: I believe we have covered the conflicts so far?
10:01:27 jakeyip: I mean, I think they are all addressed in the current patches, I would love comments on the patches highlighting any that are left please
10:01:31 johnthetubaguy: the actual image supported by both is identical. it's true they don't conflict now in the tuple, but only because of the changed `os_distro` (as well reasoned as it is, to differ from `ubuntu`) - there's no reason the VEXXHOST driver shouldn't launch using an identical image.
10:03:02 dalees: agreed, the same images should work with both
10:03:34 but I don't know of a use case we are blocking that would actively want to support that
10:04:20 I do like the driver selection in the template, it would be a good addition, and avoid this problem
10:05:25 johnthetubaguy: I am not really sure of the question and the basis you are asking from, actually. may need clarification.
10:05:50 are you asking about conflicts on the basis that (1) os_distro has been changed, (2) the CT approach is not implemented?
10:05:55 well that is a point; if I were moving drivers I'd not need both drivers to launch the same image; I'd use a new k8s version.
10:06:41 jakeyip: I am meaning, now we use different os_distro flags, I think, for most people, they don't conflict
10:07:10 dalees: you wouldn't need to wait long to get a new release at least :)
10:07:46 johnthetubaguy: ok I understand. yes, things don't conflict now, with the os_distro change, but there is more about using os_distro that I need to clarify.
10:08:06 it is outside the scope of the meeting though, if you would like to hang around a bit after?
10:08:19 I am just conscious we are over time
10:08:27 ok, do we need some summary?
10:08:41 OK I will summarise this topic
10:08:49 afraid I really should run, I was meant to be with my customer at 9am, and it's gone 11 now.
10:09:13 johnthetubaguy: sure, I may put comments in your PS then. or we find a better time
10:10:59 #agreed we have stopped the CAPI driver in Bobcat due to conflict with VEXXHOST's driver. We will explore solutions that allow multiple drivers to exist in the C cycle.
10:11:09 anyone want to add on?
10:11:21 jakeyip: yes to both, let's discuss in the patch
10:11:32 yes, we need a timeframe for the vPTG and some etherpad to add all those ideas to :)
10:12:14 So I don't really agree there is a conflict, but agree we need to work that out during the C cycle.
10:12:40 mnasiadka: OK, vPTG will be a separate discussion.
10:12:53 let's close ClusterAPI
10:12:55 (I should say, I still don't really understand the conflict right now, let's work that out ASAP)
10:13:06 #topic vPTG
10:13:44 I didn't register for a vPTG slot this cycle because the previous cycle's timing didn't suit us, and we had our own vPTG discussion at this timeslot during the vPTG week
10:14:16 I am not sure what's the best way to go, considering that it'll be valuable to have VEXXHOST attend too.
10:14:34 * dalees is willing to make some ungodly NZ hour, if it means more can attend.
10:14:55 I'm willing to do the same as dalees, but it will be 12 hours later for me ;)
10:15:15 (this is all new to me/us, it happened after vPTG registration had closed)
10:15:27 I think generally 19 UTC is 7AM NZ time and 9PM CEST
10:16:06 7am is quite acceptable, for both myself and travisholton
10:16:09 it is 5AM AEST but sure :)
10:16:19 hah, sorry jakeyip :)
10:16:23 ah right
10:16:24 lol
10:16:32 it's ok
10:17:24 are there two different slots that would work, where many people can make both? (granted it's all relative to massive jet lag)
10:17:29 we have daylight savings from next week as well
10:18:16 we have that soon too, that is a good point, is this where it gets worse again or better? I forget...
10:18:31 https://shorturl.at/fuvAD
10:18:34 travisholton: that might make things better, because our daylight savings time change is 29th Oct
10:18:53 mnasiadka, johnthetubaguy, dalees - given you all may have different vPTGs to attend, how is Wed 25/10 1900 UTC?
10:19:28 it's ok for me
10:19:45 sorry, checking
10:19:55 that is also a slot NOT on the vPTG https://ptg.opendev.org/ptg.html
10:20:40 I think that can work (it's half term here)
10:20:46 they may as well put 24-hour slots for the vPTG
10:21:14 8am NZT, yes - I can make that work.
10:21:20 ptg-bot doesn't need a toilet break
10:21:24 me too
10:22:05 sounds like it's worth trying that time, and asking on the ML for folks that can't make it
10:22:25 ok let's pencil that in for now.
10:22:37 Michal Nasiadka proposed openstack/magnum-tempest-plugin master: WIP: k8s driver CI tests https://review.opendev.org/c/openstack/magnum-tempest-plugin/+/893131
10:23:49 johnthetubaguy: ok I will make a post on the ML to notify others. I am still holding out for mnaser too
10:24:02 #topic Open Discussion
10:24:37 free for all to post
10:25:35 oops, forgot this for the previous topic
10:26:03 #agreed vPTG on 25 Oct 1900 UTC
10:26:30 if there's nothing else, I would like to end the meeting.
10:26:56 thanks everyone for coming!
10:27:23 #endmeeting