09:01:24 #startmeeting magnum
09:01:24 Meeting started Wed Nov 1 09:01:24 2023 UTC and is due to finish in 60 minutes. The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:24 The meeting name has been set to 'magnum'
09:01:25 Agenda:
09:01:27 #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:01:42 #topic Roll Call
09:05:19 o/
09:08:16 hi lpetrut
09:08:22 hi
09:08:45 looks like there isn't anyone else here today :) is there anything you want to talk about?
09:09:17 I work for Cloudbase Solutions and we've been trying out the CAPI drivers
09:09:23 there's something that I wanted to bring up
09:09:34 the need for a management cluster
09:09:41 cool, let's start then
09:09:47 #topic clusterapi
09:10:10 go on
09:10:16 we had a few concerns about the management cluster, not sure if it was already discussed
09:10:37 what's the concern?
09:10:50 for example, the need to keep it around for the lifetime of the workload cluster
09:11:54 and having to provide an existing cluster can be inconvenient in multi-tenant environments
09:13:05 just to clarify, by 'workload cluster' do you mean the magnum clusters created by CAPI?
09:13:10 yes
09:14:02 can you go into more detail on how having a cluster in a multi-tenant env is an issue?
09:14:12 other projects tried a different approach: spinning up a cluster from scratch using kubeadm, then deploying CAPI and having it manage itself, without the need of a separate management cluster. that's something we'd like to experiment with, and I was wondering if it was already considered
09:14:53 by other projects, you mean?
09:15:50 this one specifically wasn't public, but I find their approach interesting
09:16:17 lpetrut: I would like to understand your worry more on that
09:16:33 hi johnthetubaguy
09:16:38 about the multi-tenant env, we weren't sure if it's safe for multiple tenants to use the same management cluster; we would probably need one for each tenant, which would then have to be managed for the lifetime of the magnum clusters
09:16:42 do you mean you want each tenant cluster to have a separate management cluster?
09:17:34 lpetrut: I think that is where magnum's API and quota come in, that gives you a bunch of protection
09:18:01 lpetrut: just to confirm, you are testing StackHPC's contributed driver that's in review?
09:18:05 each cluster gets their own app creds, so there is little crossover, except calling openstack APIs
09:18:17 jakeyip: both drivers do the same thing, AFAIK
09:18:46 yes, I'm working with Stefan Chivu (schivu), who tried out both CAPI drivers and proposed the Flatcar patches
09:18:52 jakeyip: sorry, my calendar dealt with the time change perfectly, my head totally didn't :)
09:20:00 and we had a few ideas about the management clusters, wanted to get some feedback
09:20:45 one of those ideas was the one that I mentioned: completely avoiding a management cluster by having CAPI manage itself. I know people have already tried this, I was wondering if it's something safe and worth considering
09:21:31 if that's not feasible, another idea was to use Magnum (e.g. a different, possibly simplified driver) to deploy the management cluster
09:21:36 lpetrut: but then magnum has to reach into every cluster directly to manage it?
That seems worse (although it does have the creds for that)
09:21:59 yes
09:22:21 lpetrut: FWIW, we use helm wrapped by ansible to deploy the management cluster, using the same helm we use from inside the magnum driver
09:22:55 lpetrut: it's interesting, I hadn't really considered that approach before now
09:23:18 just curious, why would it be a bad idea for magnum to reach the managed cluster directly?
09:23:26 I think you could do that with the helm charts still, and "just" change the kubectl
09:24:15 lpetrut: I like the idea of magnum not getting broken by what users do within their clusters, with the management being separately managed outside, but it's a gut reaction, needs more thought.
09:24:32 I see
09:25:03 I'm afraid I don't understand how CAPI works without a management cluster
09:25:15 it's a trade-off of course, there is something nice about only bootstrapping from the central cluster, with the long-running management inside each cluster
09:25:19 might need some links if you have them handy?
09:25:45 right, so the idea was to deploy CAPI directly against the managed cluster and have it manage itself
09:25:47 jakeyip: it's really about the CAPI controllers being moved inside the workload cluster after the initial bootstrap, at least we have spoken about that for the management cluster itself
09:26:19 lpetrut: you still need a central management cluster to do the initial bootstrap, but then it has less responsibility longer term
09:26:40 (in my head at least, which is probably not the same thing as reality)
09:27:30 right, we'd no longer have to keep the management cluster around
09:27:55 well that isn't quite true, right
09:28:02 ah, wait a second...
09:28:20 ah, you mean a transient cluster for each bootstrap
09:28:27 yes
09:28:33 johnthetubaguy: do you mean, 1. initial management cluster 2. create a workload cluster (not created in Magnum) 3. move it into this workload cluster 4. point Magnum to this cluster?
09:28:42 exactly
09:29:30 honestly that bit sounds like an operational headache, debugging-wise; I prefer a persistent management cluster for the bootstrap, but transfer control into the cluster once it's up
09:29:43 ... but this goes back to what problem we are trying to solve I guess
09:29:46 and I think we might be able to take this even further, avoiding the initial management cluster altogether, using userdata scripts to deploy a minimal cluster using kubeadm, then deploy CAPI
09:29:52 I guess you want magnum to manage all k8s clusters?
09:30:41 lpetrut: I mean you can use k3s for that, which I think we do for our "seed" cluster today: https://github.com/stackhpc/ansible-collection-azimuth-ops/blob/main/playbooks/provision_capi_mgmt.yml
09:30:43 yeah, we were hoping to avoid the need for an external management cluster
09:31:32 ... yeah, there is something nice about that for sure.
09:31:34 right now, we were hoping to get some feedback, see if it makes sense and if there's anyone interested, then we might prepare a POC
09:32:27 lpetrut: in my head I would love to see helm being used to manage all the resources, to keep things consistent, and keep the manifests out of the magnum code base, so it's not linked to your openstack upgrade cycle so strongly (but I would say that!)
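(An illustrative sketch of the transient-bootstrap-then-pivot flow discussed above, assuming clusterctl is on PATH; the kubeconfig paths and the "create the workload cluster" step are placeholders, not code from either driver:)

    # Sketch only: kubeconfig paths are placeholders.
    import subprocess

    def run(*args):
        subprocess.run(args, check=True)

    # Install the Cluster API controllers (incl. the OpenStack provider)
    # into a transient bootstrap cluster, e.g. a k3s VM or kind.
    run("clusterctl", "init", "--kubeconfig", "bootstrap.kubeconfig",
        "--infrastructure", "openstack")

    # ... create the workload cluster from the bootstrap cluster here
    # (e.g. a helm install of the capi-helm charts) and wait for it ...

    # Install the controllers into the new workload cluster, then pivot the
    # Cluster API objects into it so it manages itself from then on.
    run("clusterctl", "init", "--kubeconfig", "workload.kubeconfig",
        "--infrastructure", "openstack")
    run("clusterctl", "move", "--kubeconfig", "bootstrap.kubeconfig",
        "--to-kubeconfig", "workload.kubeconfig")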
09:33:23 one approach would be to extend one of the CAPI drivers and customize the bootstrap phase
09:33:35 I guess my main worry is that it is a lot more complicated in magnum, but hopefully the POC would prove me wrong on that
09:33:54 I think it's an interesting concept
09:34:02 jakeyip: +1
09:34:14 lpetrut: can you describe more what that would look like please?
09:35:36 sure, so the idea would be to deploy a Nova instance, spin up a cluster using kubeadm or k3s, deploy CAPI on top so that it can manage itself, and from then on we could use the standard CAPI driver workflow
09:35:43 at the moment the capi-helm driver "just" does a helm install, after having injected app creds and certs into the management cluster; I think in your case you would first wait to create a bootstrap cluster, then do all that injecting, then bring the cluster up, then wait for that to finish, then migrate into the deployed clusters, including injecting all the secrets into that, etc.
09:36:04 exactly
09:36:21 lpetrut: FWIW, that could "wrap" the existing capi-helm driver, I think, with the correct set of util functions, there is a lot of shared code
09:36:33 exactly, I'd just inherit it
09:36:59 now supporting both I like, let me describe...
09:37:15 if we get the standalone management cluster in first, from a magnum point of view that is simpler
09:37:49 second, I could see replacement of the shared management cluster with a VM running k3s for each cluster
09:38:15 then third, you move from the VM into the main cluster, after the cluster is up, then tear down the VM
09:38:45 then we get feedback from operators on which proves nicer in production; possibly it's both, possibly we pick a winner and deprecate the others
09:38:52 ... you can see a path to migrate between those
09:39:02 lpetrut: is that sort of what you are thinking?
09:39:08 sounds good
09:39:26 one of the things that was said at the PTG is relevant here
09:39:41 I think it was jonathon from the BBC/openstack-ansible
09:39:51 yes, although I was wondering if we could avoid the initial bootstrap VM altogether
09:39:57 magnum can help openstack people not have to understand so much of k8s
09:40:10 lpetrut: your idea here sure helps with that
09:40:51 lpetrut: sounds like magic, but I would love to see it, although I am keen we make things possible with vanilla/mainstream cluster api approaches
09:40:52 before moving further, I'd like to check with the CAPI maintainers to see if there's anything wrong with CAPI managing the cluster that it runs on
09:41:11 lpetrut: I believe that is a supported use case
09:41:26 that would be great
09:41:38 lpetrut: I think the StackHPC implementation is just _one_ CAPI implementation. Magnum can and should support multiple drivers
09:41:55 as long as we can get maintainers haha
09:42:01 thanks a lot for the feedback, we'll probably come back in a few weeks with a PoC :)
09:42:04 you can transfer from k3s into your created HA cluster, so it manages itself... we have a plan to do that for our shared management cluster, but have not got around to it yet (too busy doing magnum code)
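(A rough sketch of the "wrap/inherit the existing capi-helm driver" idea above, under heavy assumptions: the module path, class name, and all the _helper methods are made up for illustration and are not the real magnum-capi-helm API; only the create_cluster() signature follows the magnum driver interface:)

    # Hypothetical sketch only; the helper methods do not exist in the driver.
    from magnum_capi_helm import driver as capi_helm_driver  # assumed path

    class SelfManagedCAPIDriver(capi_helm_driver.Driver):
        """Per-cluster bootstrap variant of the capi-helm driver."""

        def create_cluster(self, context, cluster, cluster_create_timeout):
            # 1. Boot a Nova instance and stand up a minimal kubeadm/k3s
            #    cluster via userdata (hypothetical helper).
            bootstrap_kubeconfig = self._create_bootstrap_cluster(context, cluster)

            # 2. Point the inherited helm/CAPI workflow at that transient
            #    cluster: inject app creds and certs, helm install, wait.
            self._set_management_kubeconfig(bootstrap_kubeconfig)  # hypothetical
            super().create_cluster(context, cluster, cluster_create_timeout)

            # 3. Pivot the Cluster API resources into the new workload
            #    cluster, then tear down the bootstrap VM (hypothetical).
            self._pivot_and_teardown(context, cluster)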
09:42:34 I think what's important from my POV is that all the drivers people want to implement don't clash with one another.
09:42:36 jakeyip: that is my main concern, diluting an already shrinking community
09:42:41 about that I need to chat with you johnthetubaguy
09:42:57 jakeyip: sure thing
09:43:00 about the hardest problem in computer science - naming :D
09:43:05 lol
09:43:09 foobar?
09:43:17 everyone loves foobar :)
09:43:41 which name are you thinking about?
09:44:01 johnthetubaguy: mainly the use of the 'os' tag and config section
09:44:12 if we can have 1 name for all of them, different drivers won't clash
09:44:22 so about the os tag, the ones magnum uses don't work with nova anyway
09:44:36 i.e. "ubuntu" isn't a valid os_distro tag, if my memory is correct on that
09:45:17 in config you can always turn off any in-tree "clashing" driver anyway, but granted it's probably better not to clash out of the box
09:45:36 yeah, is it possible to change them all to 'k8s_capi_helm_v1'? so that the driver name, config section, and os_distro tag are the same
09:45:47 jakeyip: I think I went for capi_helm in the config?
09:45:56 yeah I want to set rules
09:46:02 jakeyip: I thought I did all that already?
09:46:20 I don't 100% remember though, let me check
09:46:58 right now it is driver=k8s_capi_helm_v1, config=capi_helm, os_distro=capi-kubeadm-cloudinit
09:47:26 ah, right
09:47:47 so capi-kubeadm-cloudinit was chosen so as to match what is in the image
09:48:00 and flatcar will be different (it's not cloudinit)
09:48:08 just thinking that if lpetrut wants to develop something, they can choose a name for the driver and use that for the config section and os_distro, and it won't clash
09:48:46 it could well be configuration options in a single driver, to start with
09:49:11 hi, I will submit the flatcar patch on your github repo soon, and for the moment I used capi-kubeadm-ignition
09:49:42 schivu: sounds good, I think dalees was looking at flatcar too
09:49:52 yeah I wasn't sure how flatcar will work with this proposal
09:50:14 a different image will trigger a different bootstrap driver being selected in the helm chart
09:50:25 at least that is the bit I know about :) there might be more?
09:50:49 yep, mainly with CAPI the OS itself is irrelevant, what matters is which bootstrapping format the image uses
09:51:18 schivu: +1
09:51:51 I was trying to capture that in the os-distro value I chose, and operator config can turn the in-tree implementation off if they want a different out-of-tree one?
09:51:59 (i.e. that config already exists today, I believe)
09:52:37 FWIW, different drivers can probably use the same image, so it seems correct that they share the same flags
09:53:06 (I wish we didn't use os_distro though!)
09:54:10 jakeyip: I am not sure if that helped?
09:54:37 what were you thinking for the config and the driver? I was trying to copy the pattern with the heat driver and the [heat] config
09:55:28 to be honest, I am happy with whatever on the naming of the driver and the config, happy to go with what seems normal for Magnum
09:55:58 hm, ok, am I right that for stackhpc the os_distro tag in glance will be e.g. ubuntu=capi-kubeadm-cloudinit and flatcar=capi-kubeadm-ignition (as schivu said)?
09:56:21 I am open to ideas, that is what we seem to be going for right now
09:56:33 it seems semantically useful like that
09:56:58 (we also look for a k8s version property)
09:57:37 https://github.com/stackhpc/magnum-capi-helm/blob/6726c7c46d3cac44990bc66bbad7b3dd44f72c2b/magnum_capi_helm/driver.py#L492
09:58:19 kube_version in the image properties is what we currently look for
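(A sketch of how the driver tuple and the glance os_distro value line up; the class names and server_type value are illustrative, only the 'os' strings come from the discussion, and the other required driver methods are omitted:)

    # Illustrative only; real driver entries also implement create_cluster etc.
    from magnum.drivers.common import driver

    class CAPIHelmCloudInitDriver(driver.Driver):
        @property
        def provides(self):
            # Magnum's driver loader matches the image's os_distro property
            # against the 'os' field of these tuples.
            return [{'server_type': 'vm',
                     'os': 'capi-kubeadm-cloudinit',
                     'coe': 'kubernetes'}]

    class CAPIHelmIgnitionDriver(CAPIHelmCloudInitDriver):
        # Flatcar variant: same workflow, different bootstrap format, so
        # only the 'os' value in the tuple changes.
        @property
        def provides(self):
            return [{'server_type': 'vm',
                     'os': 'capi-kubeadm-ignition',
                     'coe': 'kubernetes'}]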
10:00:29 jakeyip: what was your preference for os_distro?
10:00:30 I was under the impression the glance os_distro tag needs to fit the 'os' part of the driver tuple
10:00:54 so as I mentioned, "ubuntu" is badly formatted for that tag anyway
10:01:10 I would rather not use os_distro at all
10:01:43 "ubuntu22.04" would be the correct value, per the nova spec: https://github.com/openstack/nova/blob/master/nova/virt/osinfo.py
10:02:07 see https://opendev.org/openstack/magnum/src/branch/master/magnum/api/controllers/v1/cluster_template.py#L428
10:02:42 which gets used by https://opendev.org/openstack/magnum/src/branch/master/magnum/drivers/common/driver.py#L142
10:03:25 yep, understood
10:04:03 I am tempted to register the driver as None, which might work
10:04:55 when your driver declares `{"os": "capi-kubeadm-cloudinit"}`, it will only be invoked if the glance os_distro tag is `capi-kubeadm-cloudinit`? it won't load for flatcar `capi-kubeadm-ignition`?
10:05:25 yeah, agreed
10:05:48 I thought the decision was based on values passed to your driver
10:05:54 I think there will be an extra driver entry added for flatcar that just tweaks the helm values, but I haven't seen a patch for that yet
10:07:14 that's what I gathered from https://github.com/stackhpc/capi-helm-charts/blob/main/charts/openstack-cluster/values.yaml#L124
10:08:25 johnthetubaguy: regarding the earlier discussion about deploying the CAPI management k8s cluster - for openstack-ansible I have a POC doing that using an ansible collection, so one management cluster for all workload clusters
10:08:25 capi_bootstrap="cloudinit|ignition" would probably be better, but yeah, I was just trying hard not to clash with the out-of-tree driver
10:10:11 jrosser: cool, that is what we have done here too I guess, reusing the helm charts we use inside the driver: https://github.com/stackhpc/ansible-collection-azimuth-ops/blob/main/playbooks/provision_capi_mgmt.yml and https://github.com/stackhpc/azimuth-config/tree/main/environments/capi-mgmt
10:10:53 will the flatcar patch be something that reads a CT label and sets osDistro=ubuntu / flatcar? is this a question for schivu?
10:10:57 jrosser: the interesting thing about lpetrut's idea is that magnum could manage the management cluster(s) too, which would be a neat trick
10:11:20 johnthetubaguy: I used this https://github.com/vexxhost/ansible-collection-kubernetes
10:12:38 jrosser: ah, cool, part of the atmosphere stuff, makes good sense. I haven't looked at atmosphere (yet).
10:13:15 yeah, though it doesn't need any atmosphere stuff to use the collection standalone, I've used the roles directly in OSA
10:14:04 jrosser: curious, how do you maintain the lifecycle of the cluster deployed with ansible?
10:14:14 jrosser: I don't know what kolla-ansible/kayobe are planning yet, right now we just add in the kubeconfig and keep the CD pipelines separate
10:14:30 I guess I would be worried about making deployment of the management cluster using magnum itself much, much better than the heat driver
10:15:14 jakeyip: the flatcar patch adds a new driver entry; the ignition driver inherits the cloudinit one and provides "capi-kubeadm-ignition" as the os-distro value within the tuple
10:15:18 jakeyip: I would have to get some input from mnaser about that
10:15:51 I wish we were working on this together at the PTG to design a common approach, that was my hope for this effort, but it hasn't worked out that way I guess :(
10:17:21 schivu: thanks. in that case can you use the driver name for os_distro?
which is the question I asked johnthetubaguy initially
10:17:27 jrosser: the key difference with the heat driver is that most of the work is in cluster API, with all of these approaches. In the helm world, we try to keep the manifests in a single place, the helm charts, so the test suite for the helm charts helps across all the different ways we stamp out k8s, be it via magnum, or ansible, etc.
10:17:51 jakeyip: sorry, I misunderstood/misread your question
10:19:19 jakeyip: we need the image properties to tell us cloudinit vs ignition; however that happens is mostly fine with me
10:19:37 having this conversation in gerrit would be my preference
10:21:02 sure, I will also reply to it there
10:22:08 I was more meaning on the flatcar one I guess, it's easier when we see what the code looks like I think
10:22:26 there are a few ways we could do it
10:24:22 jakeyip: I need to discuss how much time I have left now to push this upstream; I am happy for people to run with the patches and update them, I don't want us to be a blocker for what the community wants to do here.
10:24:42 yeah I guess what I wanted to do was quickly check if using os_distro this way is a "possible" or a "hard no"
10:24:47 as I want to make sure drivers don't clash
10:25:23 well I think the current proposed code doesn't clash, right? and you can configure any drivers to be disabled as needed if any out-of-tree driver changes to match?
10:25:39 johnthetubaguy: I am happy to take over your patches too, I have it running now in my dev
10:25:45 open to changing the value to something that feels better
10:26:01 cool, great to have that understanding sorted
10:26:12 jakeyip: that would be cool, although granted that means it's harder for you to +2 them, so swings and roundabouts there :)
10:27:38 johnthetubaguy: well... one step at a time :)
10:27:51 * johnthetubaguy nods
10:28:19 I need to officially end this cos it's over time, but feel free to continue if people have questions
10:28:27 jakeyip: are you aware of the tooling we have for the management cluster, that reuses the helm charts?
10:28:29 jakeyip: +1
10:28:31 #endmeeting