09:12:53 <jakeyip> #startmeeting magnum
09:12:53 <opendevmeet> Meeting started Wed Nov 15 09:12:53 2023 UTC and is due to finish in 60 minutes. The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:12:53 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:12:53 <opendevmeet> The meeting name has been set to 'magnum'
09:13:03 <jakeyip> Agenda:
09:13:04 <jakeyip> #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:13:07 <jakeyip> #topic Roll Call
09:13:14 <jakeyip> o/
09:13:34 <travisholton> o/
09:14:08 <dalees> o/
09:14:44 <jakeyip> mnasiadka: ?
09:14:58 <mnasiadka> o/
09:15:12 <jakeyip> thanks all
09:15:38 <jakeyip> please put discussion items in etherpad
09:15:56 <jakeyip> #topic improve-driver-discovery spec
09:16:22 <jakeyip> I've written up something to capture the discussion. https://review.opendev.org/c/openstack/magnum-specs/+/900410 hopefully that gives us a good base
09:16:47 <jakeyip> need help and eyes on it
09:16:54 <dalees> thanks for writing that down jakeyip, much appreciated.
09:17:32 <jakeyip> no worries, happy to move it on
09:18:45 <jakeyip> so I've been hemming and hawing because I am unsure which alternative is the best, given all the changes that are happening
09:18:56 <jakeyip> I guess we just need to fix something, and make it work
09:19:44 <dalees> I've given my feedback today, but it's fairly minor. We need the driver tiebreak and we can move on with the CAPI drivers
09:20:02 <jakeyip> dalees: thanks for your feedback
09:20:55 <jakeyip> there's no pressure to read that spec at this meeting, happy to have a week to input comments and a discussion about it next meeting
09:20:55 <dalees> yeah - it'll require a little refactor in the driver loading system, but then the driver selection tiebreak can be done. Almost all the options do expose the Magnum user to the driver concept, but it seems unavoidable with multiple drivers for one tuple.
09:21:43 <noonedeadpunk> o/
09:22:04 <jakeyip> my opinion is they already need to know driver-specific details, like os_distro=fedora-coreos
09:22:29 <jakeyip> which isn't actually an os_distro label
09:22:33 <jakeyip> hi noonedeadpunk
09:22:34 <mkjpryor> Using image properties to do it still ties an image to a driver, which mirrors the current situation
09:23:00 <mkjpryor> And if the operator uploads the images and creates the templates, and blocks the creation of new templates, the user still doesn't need to know about drivers
09:23:43 <mkjpryor> Even if they allow custom templates, the operator-uploaded images would select the correct driver
09:23:54 <jakeyip> good point mkjpryor
09:24:14 <mkjpryor> So I guess that is me saying that I quite like it
09:24:48 <jakeyip> :)
09:24:56 <mkjpryor> I don't like the idea so much of co-opting the os_distro label to do explicit driver selection in the tie-break case. It feels like we should have our own label to do that
09:25:34 <mkjpryor> That way it also reduces to the current situation when there are no clashes
09:25:44 <mnasiadka> What about adding another cluster template property that ties to the driver, if unspecified then we fall back to what we have today?
09:26:12 <mkjpryor> Yeah - that is jake's proposal one I think
09:26:22 <mkjpryor> Ah - cluster template property
09:26:57 <jakeyip> mkjpryor: the proposal calls for using another glance image property. ln 83-89
09:27:35 <jakeyip> mnasiadka: cluster template property will require client change. we can also use label
09:27:42 <mnasiadka> glance image property ties that image to a specific driver, in a cloud where users can upload their own image and create their own cluster template - this is going to be a problem
09:27:57 <dalees> mnasiadka: we discussed that, it felt like it necessitates exposing an API to list enabled drivers. But on reflection maybe the image property does as well, in a similar way.
09:28:57 <mnasiadka> well, if the goal is that we try to make it with minimal effort and easy backporting - then sure, we don't want an API impact
09:29:31 <jakeyip> mnasiadka: in the current implementation, the user already has to find out driver-specific information like os_distro=fedora-coreos for setting on their own image. I feel it isn't a big leap to add another image property specifying the driver name
09:30:15 <jakeyip> I noted in the spec that the 'correct' os_distro label should be 'fedora'. 'fedora-coreos' is a magnum thing
09:30:20 <mnasiadka> os_distro is one of the "standard" glance image properties
09:30:27 <mnasiadka> #link https://docs.openstack.org/glance/latest/admin/useful-image-properties.html
09:30:53 <mnasiadka> And people learned that over time
09:30:58 <mnasiadka> But maybe it's enough
09:31:03 <jakeyip> https://gitlab.com/libosinfo/osinfo-db/-/blob/v20231027/data/os/fedoraproject.org/coreos-stable.xml.in?ref_type=tags#L10
09:31:44 <jakeyip> anyway they can't put any image and expect it to work with CAPI driver. they have to build a CAPI image and select a CAPI driver
09:31:53 <jakeyip> it's not like heat
09:32:31 <mnasiadka> ok then, why not
09:32:32 <mkjpryor> The difference for me is that if we use an _image_ property, then users only creating custom templates don't need to worry about drivers - they just specify the image that the operator uploaded with the correct properties
09:32:40 <mnasiadka> In order to allow custom image properties, Glance must be configured with the glance-api.conf setting allow_additional_image_properties set to True. (This is the default setting.)
09:32:41 <mnasiadka> hmm
09:32:58 <mnasiadka> well, I guess nobody disables that today
09:33:28 <mkjpryor> Is there a good reason to disable it?
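Editor's sketch of the tie-break being discussed, assuming a Glance image property names the driver when more than one driver matches the same (server_type, os, coe) tuple. The property name `magnum_driver`, the `Driver` record, the driver names, and the choice to raise an error on an unresolved clash are all assumptions made for illustration; the spec under review decides the real names and behaviour, and this is not Magnum's actual driver loading code.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping

# Hypothetical image property name; not taken from the spec.
DRIVER_PROPERTY = "magnum_driver"


@dataclass(frozen=True)
class Driver:
    """Illustrative stand-in for a loaded driver and the tuple it supports."""
    name: str
    server_type: str
    os: str
    coe: str


class DriverSelectionError(Exception):
    """Raised when no driver, or no unique driver, can be chosen."""


def select_driver(
    drivers: Iterable[Driver],
    server_type: str,
    coe: str,
    image_properties: Mapping[str, str],
) -> Driver:
    os_distro = image_properties.get("os_distro")
    # Step 1: match on the (server_type, os, coe) tuple, as happens today.
    candidates = [
        d for d in drivers
        if (d.server_type, d.os, d.coe) == (server_type, os_distro, coe)
    ]
    if not candidates:
        raise DriverSelectionError(
            f"no driver for ({server_type}, {os_distro}, {coe})"
        )

    # Step 2: an explicit driver name on the image wins, if it is a candidate.
    requested = image_properties.get(DRIVER_PROPERTY)
    if requested is not None:
        for d in candidates:
            if d.name == requested:
                return d
        raise DriverSelectionError(
            f"driver {requested!r} does not match this image and template"
        )

    # Step 3: without the property, behave as before when there is no clash.
    if len(candidates) == 1:
        return candidates[0]

    # Assumption: an unresolved clash is an error rather than "first one wins".
    names = ", ".join(sorted(d.name for d in candidates))
    raise DriverSelectionError(f"ambiguous drivers: {names}")
```

With this shape, a user who builds a template from an operator-uploaded image never has to name a driver: the operator sets the property once on the image, and images without it keep working as long as only one driver matches.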
09:33:49 <mnasiadka> probably not
09:34:32 <mkjpryor> So if we use an extra image property to break the tie, users don't need to know about drivers in order to create custom templates which is nice
09:34:45 <mnasiadka> especially as the operator would lose all custom image properties, e.g. the Nova ones as well
09:35:06 <mnasiadka> (if they would disable it)
09:35:18 <mkjpryor> If a user is uploading a custom image, especially for the CAPI drivers, I would argue that is an advanced enough use case that expecting them to know the driver names and set an extra property is not an unreasonable expectation
09:35:55 <dalees> mkjpryor: yeah, agreed - the image property keeps it confined to one place the operator is likely to manage. it does lock one CAPI image into one driver, but the use case for having two CAPI drivers enabled is to transition, so it's less likely to want the same image for both (and you could just upload twice).
09:36:47 <mkjpryor> dalees: It would be nice if Glance could recognise that the two images have the same checksum and not store them twice though :D
09:37:03 <mkjpryor> But yeah - for transition I think it is fine
09:38:49 <mnasiadka> well, ideally a user should be able to override the driver on each of those three levels (image, cluster template, cluster) - but maybe let's go with the image only (since it's for transition time) - and see if there are any corner cases users would report
09:40:27 <jakeyip> cool
09:40:49 <jakeyip> sorry was having a browse in glance source wondering if os_distro is a built-in or extra property :D
09:41:01 <jakeyip> a grep shows https://github.com/openstack/glance/blob/stable/2023.2/glance/async_/flows/api_image_import.py#L85-L92
09:41:11 <jakeyip> nothing concrete, don't worry about it too much
09:41:28 <mkjpryor> I don't have a problem with also adding a template field to override the driver choice
09:41:47 <mkjpryor> I actually don't think it should be customisable on the cluster level
09:42:05 <jakeyip> mkjpryor: that needs an API update, was hoping to avoid that
09:42:18 <jakeyip> unless we use labels :)
09:42:35 <mkjpryor> I think it can happen in a second patch if we decide we really want it after using the image property for a bit
09:42:51 <jakeyip> yeap
09:43:51 <jakeyip> I think mostly all fine? please leave a review if you have concerns, I'll tidy up and maybe we aim for a spec merge in 1 or 2 weeks
09:44:06 <jakeyip> I'm also going to try to find some time to do the POC and send up code for it
09:44:28 <jakeyip> let's move on
09:44:55 <jakeyip> #topic CI tests
09:45:22 <jakeyip> mnasiadka: are you blocked on anything?
09:45:47 <mnasiadka> I'm blocked on my time, I might have some on Friday to fix the failed deploy log collection
09:45:56 <mnasiadka> because that's the only thing that is not working for now
09:47:35 <jakeyip> mnasiadka: I think if the passing case is OK, you can submit it
09:48:20 <mnasiadka> ok, I'll shape it up in current form and ask for final reviews
09:48:24 <jakeyip> it's still useful, we can use it to test if deprecating the swarm driver and stuff will get a -1
09:48:50 <jakeyip> thanks
09:48:59 <jakeyip> #topic spec resize master count
09:49:00 <jakeyip> dalees:
09:49:37 <dalees> just wanted to briefly mention this - I've tested and have this working in CAPI helm, masters can size between 1, 3 and 5.
09:50:06 <dalees> I'd like to write a spec and implement this in the driver interface, so the existing (heat) driver will not support resize, but CAPI can if it wants to
09:50:41 <dalees> s/masters/control plane/
09:50:45 <jakeyip> dalees: I think that will be very useful for people who started with 1... :D
09:51:01 <jakeyip> qn: can it scale down?
09:51:31 <dalees> yep, any odd number, both up and down.
09:52:36 <jakeyip> ok
09:52:56 <jakeyip> we are almost at time. dalees anything else for this topic?
09:53:15 <dalees> no, just mentioning at this point. I will write the spec
09:53:24 <jakeyip> #action dalees to write spec
09:53:34 <jakeyip> #topic ClusterAPI
09:53:42 <jakeyip> everyone's favorite :)
09:54:18 <dalees> mkjpryor: thanks for the reviews; I was about to ask ;)
09:54:20 <jakeyip> I have a question - we need to fork the Helm chart so it is hosted by Magnum
09:54:45 <jakeyip> can someone help?
09:55:09 <dalees> jakeyip: which helm chart? the stackhpc capi one?
09:55:37 <jakeyip> dalees: yeah
09:56:09 <jakeyip> I mean, this doesn't _need_ to happen until we merge StackHPC in tree. but I'm interested if someone already knows how to do this or is already doing this
09:56:23 <jakeyip> dalees: I think you mentioned you were working on an OCI version of the helm chart?
09:56:49 <dalees> I created a repo for it in opendev, but now I realise that to fork properly (and keep commits) the source git repo should be specified on creation. So we may need to delete and re-create when we're ready to merge the driver
09:57:38 <jakeyip> can a fork in opendev work? I was under the impression it may need the site to be built in GitHub Pages?
09:57:41 <dalees> jakeyip: helm charts can be uploaded to an OCI registry, so this can be too. The OCI support just needed to be in the capi helm driver (and it now is, thanks johnthetubaguy)
09:58:24 <dalees> https://helm.sh/docs/topics/registries/
09:58:33 <mkjpryor> dalees: in the next couple of months I want to look at moving all our Helm charts to OCI, so it will probably happen by default for the CAPI Helm charts at some point
09:59:06 <dalees> mkjpryor: cool - we host stackhpc charts in our OCI and the magnum-capi-helm driver works nicely with it.
09:59:58 <noonedeadpunk> so, those with limited connectivity would need either to source the stackhpc repo or have a clone of the whole OCI registry?
10:00:28 <dalees> noonedeadpunk: yes, but that's true for loads of OCI images that are required to boot a k8s cluster.
10:00:34 <mkjpryor> It becomes exactly like you would do for a Docker image - just mirror the repository into your own registry
10:01:04 <noonedeadpunk> well, just the source is a bit concerning as a requirement for an in-tree driver
10:01:06 <dalees> it ends up being a single asset in an OCI registry
10:01:45 <mnasiadka> dalees: it's not going to work that way - we just need to push all the commits with history (the fork)
10:02:07 <mnasiadka> well, not history even
10:02:09 <noonedeadpunk> yeah, and indeed it needs the repo to be dropped and re-created...
10:02:12 <mnasiadka> one commit
10:02:20 <noonedeadpunk> which kinda sucks if it's already under governance...
10:02:52 <noonedeadpunk> as then it indeed has quite some overhead
10:03:48 <noonedeadpunk> but indeed would be nice if it is moved with whole history
10:04:05 <jakeyip> dalees: can you drive this? since you have the most experience
10:04:09 <mnasiadka> basically easiest would be to push the code in one commit, add Co-Authored-By for all authors
10:04:11 <noonedeadpunk> another tip for moving to opendev - it cannot contain any zuul config files
10:04:31 <noonedeadpunk> (during forking)
10:05:03 <dalees> jakeyip: I have experience in doing it wrong :) But yep I can figure out removing the empty repo, and we can fork the charts properly into governance with history once the driver merges.
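Editor's sketch of the "one commit, Co-Authored-By for all authors" option mentioned above: a small helper that assembles a commit message with one trailer per distinct upstream author. The function name, the source URL, and the author list are placeholders invented for the example; nothing here reflects a decision taken in the meeting.

```python
from typing import Iterable, Tuple


def build_import_commit_message(
    subject: str,
    source_url: str,
    authors: Iterable[Tuple[str, str]],
) -> str:
    """Return a commit message crediting every (name, email) pair once."""
    seen = set()
    trailers = []
    for name, email in authors:
        key = email.strip().lower()
        if key in seen:
            # The same author may appear many times in the upstream history.
            continue
        seen.add(key)
        trailers.append(f"Co-Authored-By: {name.strip()} <{email.strip()}>")

    body = f"Imported from {source_url} as a single commit."
    # Trailers go in their own paragraph at the end of the message.
    return "\n".join([subject, "", body, "", *trailers]) + "\n"


if __name__ == "__main__":
    # Placeholder values only.
    print(build_import_commit_message(
        "Import CAPI Helm charts",
        "https://example.org/upstream/charts.git",
        [("Alice Example", "alice@example.org"),
         ("Bob Example", "bob@example.org"),
         ("Alice Example", "Alice@example.org")],
    ))
```

The author list could be gathered from the upstream clone with something like `git log --format='%aN <%aE>' | sort -u` and parsed into pairs before calling the helper.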
10:05:07 <noonedeadpunk> yeah, that would be another way around, depending how much we wanna preserve history
10:05:52 <jakeyip> mnasiadka: if there is no requirement from OpenStack and StackHPC is fine with us doing this, that is a good idea
10:06:09 <dalees> repo link: https://opendev.org/openstack/magnum-capi-helm-charts
10:06:12 <mnasiadka> I'll leave the "StackHPC is fine with us doing this" to mkjpryor
10:07:03 <jakeyip> that will allow us to start with the bare minimum, instead of forking and backing out some of the StackHPC additions like GPU
10:07:29 <jakeyip> #action dalees to look into hosting helm chart
10:07:48 <jakeyip> we are over time, let's call the meeting. leave something for next meeting
10:08:03 <mkjpryor> The charts are released under the Apache 2.0 licence so whether we are fine with it or not isn't a deal breaker
10:08:04 <mkjpryor> :D
10:08:16 <jakeyip> mkjpryor: hahaha
10:08:40 <jakeyip> well legally correct and morally wrong isn't a path the Magnum Core Team should take ;)
10:08:55 <mkjpryor> But I think the ambition was always that they would become a community resource that represents best practice for deploying CAPI clusters on OpenStack
10:09:21 <jakeyip> #endmeeting
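Editor's sketch for the control plane resize topic: the rule stated in the meeting is that the count can move up or down between odd numbers (1, 3 and 5 were tested), and that resize would be an optional part of the driver interface so the Heat driver can decline it. The class and method names below are hypothetical, and the dictionary stands in for a cluster object; the planned spec defines the real interface.

```python
def validate_control_plane_count(count: int) -> int:
    """Accept only positive odd control plane sizes, e.g. 1, 3 or 5."""
    if isinstance(count, bool) or not isinstance(count, int):
        raise TypeError("control plane count must be an integer")
    if count < 1 or count % 2 == 0:
        raise ValueError(f"control plane count must be a positive odd number, got {count}")
    return count


class BaseDriver:
    """Hypothetical driver interface: resize is unsupported unless overridden."""

    def resize_control_plane(self, cluster: dict, count: int) -> dict:
        raise NotImplementedError("this driver does not support control plane resize")


class ResizableDriver(BaseDriver):
    """Hypothetical CAPI-style driver that opts in to resize."""

    def resize_control_plane(self, cluster: dict, count: int) -> dict:
        # Works in both directions: 1 -> 3 and 5 -> 3 are equally valid.
        cluster["master_count"] = validate_control_plane_count(count)
        return cluster
```

Odd sizes are the usual choice for quorum-based control planes: a majority of n members is n // 2 + 1, so going from 3 to 4 members raises the majority from 2 to 3 without tolerating any additional failure.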