16:01:56 <johnsom> #startmeeting Octavia
16:01:56 <opendevmeet> Meeting started Wed Nov 26 16:01:56 2025 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:56 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:56 <opendevmeet> The meeting name has been set to 'octavia'
16:02:28 <gthiemonge> o/
16:02:37 <jovial> o/
16:02:46 <johnsom> Odd, I thought I just started the meeting, but maybe we had a net split. Welcome all!
16:03:31 <gthiemonge> o/
16:03:38 <johnsom> Yeah, ok, I am seeing some IRC issues. So sorry if I'm slow
16:03:48 <jovial> No problemo
16:04:06 <johnsom> #topic Announcements
16:04:28 <johnsom> We are past milestone 1 for Gazpacho.
16:04:44 <johnsom> Otherwise I don't have any other announcements this week. Anyone else?
16:05:07 <gthiemonge> nop
16:05:48 <johnsom> #topic Brief progress reports / bugs needing review
16:06:29 <johnsom> I am mostly focused on reviews at this point. I have had some time off and will be off the rest of the week for a US holiday.
16:07:12 <johnsom> I am still working on the rate limiting RFE, but it is slow progress at the moment due to my need to work on other things downstream
16:07:29 <gthiemonge> I've had some activity in launchpad, answering to questions/comments.
16:09:03 <johnsom> Yeah, there have been some bugs that came through this week. One I bounced to neutron as it was an OVN change in behavior issue that we have no control over.
16:11:25 <johnsom> Ok, moving on
16:11:27 <johnsom> #topic Open Discussion
16:11:33 <johnsom> Any other topics this week?
16:12:09 <jovial> I've been hitting issues with using hard anti affinity and nova scheduling races, that I'd like to ask your opinions about
16:13:01 <gthiemonge> I think it's related to https://bugs.launchpad.net/octavia/+bug/2064600 (see the newer comments)
16:13:02 <jovial> I put some observations in https://bugs.launchpad.net/octavia/+bug/2064600
16:13:53 <jovial> That is the one. Essentially we come up against https://docs.openstack.org/nova/latest/admin/troubleshooting/affinity-policy-violated.html as octavia is launching both instances in parallel
16:14:02 <johnsom> Oh joy. Yes, there are a number of bugs in nova around anti-affinity. Some that have been reported just became documentation instead of fixes.... Sigh
16:15:48 <jovial> I was wondering what you thought about making octavia boot them serially
16:15:55 <jovial> or at least have some delay
16:18:45 <jovial> I've also opened a bug against nova to get their input: https://bugs.launchpad.net/nova/+bug/2132984
16:19:01 <gthiemonge> oh nice
16:19:06 <gthiemonge> +1 for the nova bug ;-)
16:19:13 <johnsom> Well, serial is super easy to implement by just changing the flow to linear, but that seems like a sad user experience.
16:19:46 <gthiemonge> johnsom: I think we can inject some kind of dependencies between the tasks in taskflow, like the creation of the BACKUP VM would wait for the creation of the MASTER VM to complete
16:19:54 <johnsom> I don't like the nova implementation as it should be atomic with the server groups IMO and not have this race.
16:21:34 <gthiemonge> (it needs to be evaluated, it may cause thread-safety issues in taskflow)
16:21:38 <johnsom> Sorry I didn't have these links ahead to read everything. I'm also intrigued by this new "multi-create" API in the linked nova doc. This might also be a good path for us.
16:22:14 <johnsom> Well, the above, wait for primary to boot is basically a linear flow....
16:22:49 <gthiemonge> but when both VMs are active, the rest of the tasks are executed in parallel
16:22:55 <jovial> I did run that by John Garbutt (my colleague in the nova team) and he made a disapproving face :laugh:, but the multi-create api is the direction the docs seem to suggest
16:23:22 <jovial> I can only think that there might be some dragons with using the multi-create api or that it isn't as widely tested
16:23:33 <johnsom> Since nova is super slow, we pushed to use an unordered flow there to parallelize it. It's just sad that nova isn't able to handle that properly.
16:23:59 <johnsom> John is a great guy, I have worked with him in the past...
16:24:15 <gthiemonge> if we want to use the multi-create API, it needs to be called before starting the unordered flows
16:24:38 <johnsom> Agreed, there are sadly a large number of dragons in the nova code. We have a ton of workarounds already.
16:25:07 <johnsom> Yeah, multi-create would replace the unordered flow for at least some of it.
16:26:31 <johnsom> I lean towards becoming the new tempest job for nova that uses the multi-create API to get around this issue. Thoughts?
16:27:50 <jovial> Would it be a big change to switch to creating both at the same time with multi-create? I'm not that familiar with the code, but doesn't each amphora build get added to a build queue? Just wondering if that would cause a substantial reworking
16:29:05 <johnsom> The part that concerns me is that we load a unique certificate per VM for the two way TLS authentication. This might be a problem with multi-create.
16:29:18 <gthiemonge> that would be a huge change in the create LB flow: https://docs.openstack.org/octavia/latest/_images/LoadBalancerFlows-get_create_load_balancer_flow.svg
16:29:35 <johnsom> Each amphora gets a unique certificate so if one is compromised, we can isolate it.
16:29:40 <jovial> ^ thanks - nice to have a diagram
16:30:21 <johnsom> lol, you are welcome.
16:30:23 <johnsom> MASTER-octavia-create-amp-for-lb-subflow-octavia-generate-serverpem
16:30:31 <johnsom> That is the challenging part
16:30:45 <gthiemonge> basically all the tasks between {MASTER,BACKUP}-octavia-create-amp-for-lb-subflow and {MASTER,BACKUP}-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create need to be put before the creation of the 2 subflows
16:30:55 <gthiemonge> johnsom: right
16:31:16 <jovial> And would it make sense to make a new octavia bug? I feel I've kind of hijacked that other one with another related problem; the original issue seemed to be loadbalancers getting stuck in pending_create state. Or do you think it is the same bug?
16:32:00 <jovial> ^ I could ask John to add his opinions on using the multi-create API if that helps
16:32:33 <gthiemonge> stuck in pending_create doesn't look related to a nova issue, that sounds more like https://bugs.launchpad.net/octavia/+bug/2043360 which was a bug in taskflow
16:33:02 <johnsom> In general I am a fan of having more bugs. It's easy to close as duplicate if it is, but having separate bugs to track issues is helpful.
16:33:11 <jovial> I think that was my conclusion in the end :)
16:34:01 <jovial> johnsom, cool, is it better to link the nova bug with octavia or just make a separate one? I can make something after the meeting.
16:34:47 <johnsom> In launchpad you can assign a bug to multiple projects (i.e. both octavia and nova) if you think it spans the projects
16:36:51 <johnsom> Seems like this is the key statement in the nova doc: "Future work is needed to add anti-/affinity support to the placement service in order to eliminate the need for the late affinity check in nova-compute."
16:37:54 <johnsom> We create the server group early, I really don't understand why nova can't lock around that to sequence the compute create requests.
16:39:06 <jovial> I mean, it makes total sense to me. Would it make sense to ask that question in my nova bug?
16:39:27 <johnsom> I think it's a fair question for sure
16:40:42 <johnsom> Yeah, looking at the multi-create API, we can't use it as the config drive information (mostly the certificate) is unique per compute instance.
16:41:02 <johnsom> Their multi-create API doesn't allow this
16:41:32 <johnsom> Plus, as you mentioned, it is probably not well tested
16:42:39 <jovial> I did not know about the config drive restriction, thanks for highlighting it
16:43:56 <johnsom> Yeah, we inject some information at boot into the VM to establish a chain of trust. (like secure boot, but not... lol)
16:45:43 <johnsom> Ok, I think I need to think about this more. How about a path forward of:
16:45:55 <johnsom> 1. Open a new Octavia bug to track what we have learned.
16:46:15 <johnsom> 2. Follow up on the nova bug to see if they will fix this.
16:47:00 <johnsom> 3. I will think about the Octavia bug and see what we can do to handle these situations better in the flow. (yet another nova work around...)
16:47:13 <johnsom> Any other steps?
16:47:27 <gthiemonge> wise words
16:47:38 <jovial> Sounds like a good plan to me. I'll make a new bug after this :)
16:47:57 <johnsom> Thank you for raising this.
16:48:04 <gthiemonge> johnsom: I'll take a look at taskflow, I think adding a dependency between the flows will not block the 2nd flow during a long period
16:49:15 <johnsom> Yeah, I only know of a way to revert and go down another path. I don't know about cross flow coordination.
16:49:57 <johnsom> I mean we could always break that down and make the nova boot part linear. It would be sad to slow that down, but it's not too hard.
16:50:52 <johnsom> I don't think we should add a sleep for the secondary VM boot as clouds have different performance and sleep times are a roll of the dice
16:51:31 <gthiemonge> ack
16:51:38 <johnsom> gthiemonge Thanks for looking at taskflow. Let me know what you find
16:51:59 <gthiemonge> np, thank you guys!
16:52:14 <jovial> For what it is worth, I think there is a similar problem with soft-anti-affinity, but at least that doesn't cause a failure, just both vms come up on the same hypervisor
16:52:51 <johnsom> Yeah, more often than not they come up on the same host
16:53:49 <johnsom> Ok, I think we have a plan. Any other open discussion items today?
16:54:49 <jovial> nothing from me
16:54:52 <gthiemonge> nothing
16:55:01 <johnsom> Thank you all for the great discussion. Have a great week!
16:55:06 <johnsom> #endmeeting
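
Note on the ordering idea raised at 16:19:46 and 16:22:49 (the BACKUP compute create waits for the MASTER compute create, while the remaining tasks stay parallel). The following is only a rough sketch of that ordering in plain Python threading. It is not taskflow and not Octavia code; amphora_subflow, create_pair, and the step labels are hypothetical names chosen for illustration, and whether taskflow can express the same dependency safely is exactly what was left to be evaluated in the meeting.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def amphora_subflow(role, log, log_lock, wait_for=None, signal=None):
    """Hypothetical stand-in for one amphora subflow (not Octavia code)."""

    def record(step):
        # Guard the shared log so the recorded order is unambiguous.
        with log_lock:
            log.append((role, step))

    # Per-amphora work (e.g. the unique certificate) can run in parallel.
    record("generate-cert")

    # Only the compute create is sequenced: the waiter blocks here until
    # the other subflow signals that its compute create has finished.
    # If the timeout expires, this sketch simply proceeds.
    if wait_for is not None:
        wait_for.wait(timeout=30)
    try:
        record("compute-create")
    finally:
        # Always release the waiter, even if the create step raised,
        # so the second subflow cannot hang forever.
        if signal is not None:
            signal.set()

    # Everything after boot runs in parallel again.
    record("post-boot")
    return role


def create_pair():
    """Run MASTER and BACKUP subflows concurrently with one ordering gate."""
    log = []
    log_lock = threading.Lock()
    master_booted = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(amphora_subflow, "MASTER", log, log_lock,
                        signal=master_booted),
            pool.submit(amphora_subflow, "BACKUP", log, log_lock,
                        wait_for=master_booted),
        ]
        roles = [f.result() for f in futures]
    return roles, log
```

Unlike a fixed sleep, the gate opens as soon as the first create finishes, so it does not depend on how fast a given cloud is. It does not by itself guarantee that nova has finished the late affinity check for the first instance; that would depend on what "creation complete" means in the real flow.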