09:02:53 #startmeeting magnum
09:02:53 Meeting started Wed Sep 11 09:02:53 2024 UTC and is due to finish in 60 minutes. The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:02:53 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:02:53 The meeting name has been set to 'magnum'
09:03:01 #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:03:15 o/
09:03:20 #topic Roll Call
09:03:22 o/
09:03:23 o/
09:05:19 #topic Cluster create returns before uuid is valid
09:05:22 #link https://bugs.launchpad.net/magnum/+bug/2078390
09:06:16 dalees: I just read the bug, haven't reviewed it yet, apologies
09:06:31 no worries, I understand you've been away!
09:07:12 yeah just got back Mon. :)
09:08:46 seems valid. I wonder if you referred to other services for the pattern? (synchronous call to init the db entry)
09:10:55 No, I didn't. I also considered adding a new state 'INITIALIZED', but figured that was more change than it was worth. It would help identify a lost or slow async RPC create message, if that ever occurred.
09:14:23 ok, I don't think that's necessary
09:14:57 I find the last two lines of the change log confusing - what can be upgraded first?
09:15:37 oh sorry I misread, ignore me
09:16:03 conductor first - it will handle both the old and new rpc create messages
09:18:08 (and by 'old' I just mean that initialize hasn't been called yet)
09:19:23 on first read it looks ok, I will look more closely after this meeting
09:19:40 well, I will deploy this change and ensure it sorts out our Tempest problems. Maybe there is more to do, but it feels like a reasonable change even though it adds an RPC round-trip.
09:19:46 cheers, all good.
09:21:02 just curious, which tempest tests are you running?
09:22:25 we run ClusterTest.test_create_list_sign_delete_clusters regularly.
09:23:45 yeah same. Do you use a template id or let the tempest test create a template?
09:24:18 template id
09:26:06 ok same
09:28:11 the symptoms we end up seeing in this race condition are that tempest ends up deleting a cluster while Magnum is doing the helm install. So some secrets are wiped from CAPI and the cluster in Magnum is gone, but the helm resources are all created and CAPI keeps trying to reconcile the cluster.
09:28:49 so keep an eye out in your CAPI management cluster for extra tempest clusters that Magnum doesn't know about.
09:30:23 I don't think it would be limited to the CAPI helm driver, as the conductor is doing all of this async. But maybe it's easier to reproduce with helm.
09:30:24 so there will be a k8s cluster but no corresponding magnum cluster?
09:31:21 yes, but it usually fails to create as Magnum deletes the OpenStack secrets, or deletes the app cred.
09:31:46 yeap ok
09:38:21 for the RPC incompatibility, I am looking to see if there are any recommendations for this kind of change. It's not a typical change like adding a new object field. Have you looked into that?
09:41:08 Not at other prior art, no. I understand some services have RPC versioning they can use, which prevents upgrading until all agents are on the new version. I'm not sure if Magnum registers agents in the same way as Nova/Neutron though.
09:46:44 yeah there's an `openstack coe service list`, but no version from what I can tell
09:46:59 getting late, let's do this offline
09:47:00 anything else you want to discuss?
09:47:25 nope, that's all from me
09:50:01 ok let me take some time to review
09:50:33 seeya next week? (no pressure :P )
09:54:34 Yes, see you next week
09:56:05 #endmeeting
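The fix discussed under the first topic (a synchronous RPC call that writes the cluster's DB record before the asynchronous create is cast, plus a conductor that still accepts create messages from an API that never called initialize) can be sketched as below. This is a minimal, hypothetical Python model, not Magnum's code: ClusterStore, Conductor, api_create_new, api_create_old and the status strings are illustrative assumptions, and a queue.Queue stands in for the RPC message bus.

```python
import queue
import threading
import uuid


# Hypothetical in-memory stand-ins; these are NOT Magnum's real classes or RPC API.
class ClusterStore:
    """Thread-safe dict standing in for the clusters table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def create_if_missing(self, cluster_id, name):
        with self._lock:
            self._rows.setdefault(cluster_id, {"name": name, "status": "CREATE_IN_PROGRESS"})

    def get(self, cluster_id):
        with self._lock:
            row = self._rows.get(cluster_id)
            return dict(row) if row else None

    def set_status(self, cluster_id, status):
        with self._lock:
            self._rows[cluster_id]["status"] = status


class Conductor:
    def __init__(self, store):
        self.store = store
        self.casts = queue.Queue()  # stands in for the async RPC bus

    def initialize(self, cluster_id, name):
        # Synchronous "call": the DB row exists before the API returns the uuid.
        self.store.create_if_missing(cluster_id, name)

    def create(self, cluster_id, name):
        # Async "cast" handler. Messages from a not-yet-upgraded API never
        # called initialize(), so create the row here if it is missing.
        self.store.create_if_missing(cluster_id, name)
        # ... slow provisioning (e.g. a helm install) would happen here ...
        self.store.set_status(cluster_id, "CREATE_COMPLETE")

    def serve(self):
        # Worker loop draining queued casts; None is a shutdown sentinel.
        while True:
            msg = self.casts.get()
            try:
                if msg is None:
                    return
                self.create(*msg)
            finally:
                self.casts.task_done()


def api_create_new(conductor, name):
    cluster_id = str(uuid.uuid4())
    conductor.initialize(cluster_id, name)   # extra sync round-trip
    conductor.casts.put((cluster_id, name))  # then the async create
    return cluster_id


def api_create_old(conductor, name):
    cluster_id = str(uuid.uuid4())
    conductor.casts.put((cluster_id, name))  # async only: uuid not valid yet
    return cluster_id


if __name__ == "__main__":
    store = ClusterStore()
    conductor = Conductor(store)
    new_id = api_create_new(conductor, "new-style")
    old_id = api_create_old(conductor, "old-style")
    # Before the worker runs, only the new-style uuid can be looked up.
    print(store.get(new_id) is not None, store.get(old_id) is not None)  # True False
    worker = threading.Thread(target=conductor.serve)
    worker.start()
    conductor.casts.put(None)
    worker.join()
    print(store.get(new_id)["status"], store.get(old_id)["status"])  # CREATE_COMPLETE CREATE_COMPLETE
```

Because the conductor's create handler tolerates a missing row, upgrading the conductor first (as noted at 09:16:03) lets it serve both old- and new-style API nodes during a rolling upgrade.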