16:01:56 <johnsom> #startmeeting Octavia
16:01:56 <opendevmeet> Meeting started Wed Nov 26 16:01:56 2025 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:56 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:56 <opendevmeet> The meeting name has been set to 'octavia'
16:02:28 <gthiemonge> o/
16:02:37 <jovial> o/
16:02:46 <johnsom> Odd, I thought I just started the meeting, but maybe we had a net split. Welcome all!
16:03:31 <gthiemonge> o/
16:03:38 <johnsom> Yeah, ok, I am seeing some IRC issues. So sorry if I'm slow
16:03:48 <jovial> No problemo
16:04:06 <johnsom> #topic Announcements
16:04:28 <johnsom> We are past milestone 1 for Gazpacho.
16:04:44 <johnsom> Otherwise I don't have any other announcements this week. Anyone else?
16:05:07 <gthiemonge> nop
16:05:48 <johnsom> #topic Brief progress reports / bugs needing review
16:06:29 <johnsom> I am mostly focused on reviews at this point. I have had some time off and will be off the rest of the week for a US holiday.
16:07:12 <johnsom> I am still working on the rate limiting RFE, but it is slow progress at the moment due to my need to work on other things downstream
16:07:29 <gthiemonge> I've had some activity in launchpad, answering questions/comments.
16:09:03 <johnsom> Yeah, there have been some bugs that came through this week. One I bounced to neutron as it was an OVN change in behavior issue that we have no control over.
16:11:25 <johnsom> Ok, moving on
16:11:27 <johnsom> #topic Open Discussion
16:11:33 <johnsom> Any other topics this week?
16:12:09 <jovial> I've been hitting issues with using hard anti affinity and nova scheduling races, that I'd like to ask your opinions about
16:13:01 <gthiemonge> I think it's related to https://bugs.launchpad.net/octavia/+bug/2064600 (see the newer comments)
16:13:02 <jovial> I put some observations in https://bugs.launchpad.net/octavia/+bug/2064600
16:13:53 <jovial> That is the one. Essentially we come up against https://docs.openstack.org/nova/latest/admin/troubleshooting/affinity-policy-violated.html as octavia is launching both instances in parallel
16:14:02 <johnsom> Oh joy. Yes, there are a number of bugs in nova around anti-affinity. Some that have been reported just became documentation instead of fixes.... Sigh
16:15:48 <jovial> I was wondering what you thought about making octavia boot them serially
16:15:55 <jovial> or at least have some delay
16:18:45 <jovial> I've also opened a bug against nova to get their input: https://bugs.launchpad.net/nova/+bug/2132984
16:19:01 <gthiemonge> oh nice
16:19:06 <gthiemonge> +1 for the nova bug ;-)
16:19:13 <johnsom> Well, serial is super easy to implement by just changing the flow to linear, but that seems like a sad user experience.
16:19:46 <gthiemonge> johnsom: I think we can inject some kind of dependencies between the tasks in taskflow, like the creation of the BACKUP VM would wait for the creation of the MASTER VM to complete
16:19:54 <johnsom> I don't like the nova implementation as it should be atomic with the server groups IMO and not have this race.
16:21:34 <gthiemonge> (it needs to be evaluated, it may cause thread-safety issues in taskflow)
16:21:38 <johnsom> Sorry I didn't have these links ahead to read everything. I'm also intrigued by this new "multi-create" API in the linked nova doc. This might also be a good path for us.
16:22:14 <johnsom> Well, the above, wait for primary to boot is basically a linear flow....
16:22:49 <gthiemonge> but when both VMs are active, the rest of the tasks are executed in parallel
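The idea discussed above (only the BACKUP boot waits for the MASTER boot, while every other step stays parallel) can be sketched roughly as follows. This is a stdlib-only illustration, not Octavia or taskflow code: `build_amphora`, `create_pair`, the step names, and the simulated boot delay are all hypothetical, and whether taskflow can express this kind of cross-flow dependency safely is exactly what still needs to be evaluated.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def build_amphora(role, wait_for, booted, events):
    """Hypothetical per-amphora build: prepare, boot, configure."""
    # Steps that do not involve the scheduler run without any ordering.
    events.append((role, "prepare"))

    # Only the boot step is ordered: wait until the other boot has finished.
    if wait_for is not None and not wait_for.wait(timeout=5):
        raise TimeoutError(f"{role}: gave up waiting for the other boot")

    events.append((role, "boot-start"))
    time.sleep(0.05)  # stand-in for the (slow) compute create call
    events.append((role, "boot-done"))
    booted.set()  # release anyone waiting on this boot

    # Post-boot steps run in parallel again.
    events.append((role, "configure"))


def create_pair():
    """Run both builds concurrently; BACKUP's boot waits on MASTER's."""
    events = []
    master_booted = threading.Event()
    backup_booted = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(build_amphora, "MASTER", None, master_booted, events),
            pool.submit(build_amphora, "BACKUP", master_booted, backup_booted, events),
        ]
        for future in futures:
            future.result()  # re-raise any error from the worker threads
    return events


if __name__ == "__main__":
    for role, step in create_pair():
        print(role, step)
```

In this sketch the BACKUP "prepare" step can still overlap with the MASTER boot; only the two boot steps are serialized, so the cost compared with a fully unordered flow is roughly one boot time rather than one full build.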
16:22:55 <jovial> I did run that by John Garbutt (my colleague in the nova team) and he made a disapproving face :laugh:, but the multi-create api is the direction the docs seem to suggest
16:23:22 <jovial> I can only think that there might be some dragons with using the multi-create api or that it isn't as widely tested
16:23:33 <johnsom> Since nova is super slow, we pushed to use an unordered flow there to parallelize it. It's just sad that nova isn't able to handle that properly.
16:23:59 <johnsom> John is a great guy, I have worked with him in the past...
16:24:15 <gthiemonge> if we want to use the multi-create API, it needs to be called before starting the unordered flows
16:24:38 <johnsom> Agreed, there are sadly a large number of dragons in the nova code. We have a ton of workarounds already.
16:25:07 <johnsom> Yeah, multi-create would replace the unordered flow for at least some of it.
16:26:31 <johnsom> I lean towards becoming the new tempest job for nova that uses the multi-create API to get around this issue. Thoughts?
16:27:50 <jovial> Would it be a big change to switch to creating both at the same time with multi-create? I'm not that familiar with the code, but doesn't each amphora build get added to a build queue? Just wondering if that would cause a substantial reworking
16:29:05 <johnsom> The part that concerns me is that we load a unique certificate per VM for the two way TLS authentication. This might be a problem with multi-create.
16:29:18 <gthiemonge> that would be a huge change in the create LB flow: https://docs.openstack.org/octavia/latest/_images/LoadBalancerFlows-get_create_load_balancer_flow.svg
16:29:35 <johnsom> Each amphora gets a unique certificate so if one is compromised, we can isolate it.
16:29:40 <jovial> ^ thanks - nice to have a diagram
16:30:21 <johnsom> lol, you are welcome.
16:30:23 <johnsom> MASTER-octavia-create-amp-for-lb-subflow-octavia-generate-serverpem
16:30:31 <johnsom> That is the challenging part
16:30:45 <gthiemonge> basically all the tasks between {MASTER,BACKUP}-octavia-create-amp-for-lb-subflow and {MASTER,BACKUP}-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create need to be put before the creation of the 2 subflows
16:30:55 <gthiemonge> johnsom: right
16:31:16 <jovial> And would it make sense to make a new octavia bug? I feel I've kind of hijacked that other one with another related problem; the original issue seemed to be loadbalancers getting stuck in pending_create state.  Or do you think it is the same bug?
16:32:00 <jovial> ^ I could ask John to add his opinions on using the multi-create API if that helps
16:32:33 <gthiemonge> stuck in pending_create doesn't look related to a nova issue, that sounds more like https://bugs.launchpad.net/octavia/+bug/2043360 which was a bug in taskflow
16:33:02 <johnsom> In general I am a fan of having more bugs. It's easy to close as duplicate if it is, but having separate bugs to track issues is helpful.
16:33:11 <jovial> I think that was my conclusion in the end :)
16:34:01 <jovial> johnsom, cool, is it better to link the nova bug with octavia or just make a separate one? I can make something after the meeting.
16:34:47 <johnsom> In launchpad you can assign a bug to multiple projects (i.e. both octavia and nova) if you think it spans the projects
16:36:51 <johnsom> Seems like this is the key statement in the nova doc: "Future work is needed to add anti-/affinity support to the placement service in order to eliminate the need for the late affinity check in nova-compute."
16:37:54 <johnsom> We create the server group early, I really don't understand why nova can't lock around that to sequence the compute create requests.
16:39:06 <jovial> I mean, it makes total sense to me. Would it make sense to ask that question in my nova bug?
16:39:27 <johnsom> I think it's a fair question for sure
16:40:42 <johnsom> Yeah, looking at the multi-create API, we can't use it as the config drive information (mostly the certificate) is unique per compute instance.
16:41:02 <johnsom> Their multi-create API doesn't allow this
16:41:32 <johnsom> Plus, as you mentioned, it is probably not well tested
16:42:39 <jovial> I did not know about the config drive restriction, thanks for highlighting it
16:43:56 <johnsom> Yeah, we inject some information at boot into the VM to establish a chain of trust. (like secure boot, but not... lol)
16:45:43 <johnsom> Ok, I think I need to think about this more. How about a path forward of:
16:45:55 <johnsom> 1. Open a new Octavia bug to track what we have learned.
16:46:15 <johnsom> 2. Follow up on the nova bug to see if they will fix this.
16:47:00 <johnsom> 3. I will think about the Octavia bug and see what we can do to handle these situations better in the flow. (yet another nova workaround...)
16:47:13 <johnsom> Any other steps?
16:47:27 <gthiemonge> wise words
16:47:38 <jovial> Sounds like a good plan to me. I'll make a new bug after this :)
16:47:57 <johnsom> Thank you for raising this.
16:48:04 <gthiemonge> johnsom: I'll take a look at taskflow, I think adding a dependency between the flows will not block the 2nd flow for a long period
16:49:15 <johnsom> Yeah, I only know of a way to revert and go down another path. I don't know about cross flow coordination.
16:49:57 <johnsom> I mean we could always break that down and make the nova boot part linear. It would be sad to slow that down, but it's not too hard.
16:50:52 <johnsom> I don't think we should add a sleep for the secondary VM boot as clouds have different performance and sleep times are a roll of the dice
16:51:31 <gthiemonge> ack
16:51:38 <johnsom> gthiemonge Thanks for looking at taskflow. Let me know what you find
16:51:59 <gthiemonge> np, thank you guys!
16:52:14 <jovial> For what it is worth, I think there is a similar problem with soft-anti-affinity, but at least that doesn't cause a failure, just both vms come up on the same hypervisor
16:52:51 <johnsom> Yeah, more often than not they come up on the same host
16:53:49 <johnsom> Ok, I think we have a plan. Any other open discussion items today?
16:54:49 <jovial> nothing from me
16:54:52 <gthiemonge> nothing
16:55:01 <johnsom> Thank you all for the great discussion. Have a great week!
16:55:06 <johnsom> #endmeeting