Wednesday, 2025-11-26

laerlingThx, will do!13:56
johnsom#startmeeting Octavia16:01
opendevmeetMeeting started Wed Nov 26 16:01:56 2025 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.16:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:01
opendevmeetThe meeting name has been set to 'octavia'16:01
gthiemongeo/16:02
jovialo/16:02
johnsomOdd, I thought I just started the meeting, but maybe we had a net split. Welcome all!16:02
gthiemongeo/16:03
johnsomYeah, ok, I am seeing some IRC issues. So sorry if I'm slow16:03
jovialNo problemo16:03
johnsom#topic Announcements16:04
johnsomWe are past milestone 1 for Gazpacho.16:04
johnsomOtherwise I don't have any other announcements this week. Anyone else?16:04
gthiemongenop16:05
johnsom#topic Brief progress reports / bugs needing review16:05
johnsomI am mostly focused on reviews at this point. I have had some time off and will be off the rest of the week for a US holiday.16:06
johnsomI am still working on the rate limiting RFE, but it is slow progress at the moment due to my need to work on other things downstream16:07
gthiemongeI've had some activity in launchpad, answering questions/comments.16:07
johnsomYeah, there have been some bugs that came through this week. One I bounced to neutron as it was an OVN change in behavior issue that we have no control over.16:09
johnsomOk, moving on16:11
johnsom#topic Open Discussion16:11
johnsomAny other topics this week?16:11
jovialI've been hitting issues with using hard anti affinity and nova scheduling races, that I'd like to ask your opinions about16:12
gthiemongeI think it's related to https://bugs.launchpad.net/octavia/+bug/2064600 (see the newer comments)16:13
jovialI put some observations in https://bugs.launchpad.net/octavia/+bug/206460016:13
jovialThat is the one. Essentially we come up against https://docs.openstack.org/nova/latest/admin/troubleshooting/affinity-policy-violated.html as octavia is launching both instances in parallel16:13
johnsomOh joy. Yes, there are a number of bugs in nova around anti-affinity. Some that have been reported just became documentation instead of fixes.... Sigh16:14
jovialI was wondering what you thought about making octavia boot them serially16:15
jovialor at least have some delay16:15
jovialI've also opened a bug against nova to get their input: https://bugs.launchpad.net/nova/+bug/213298416:18
gthiemongeoh nice16:19
gthiemonge+1 for the nova bug ;-)16:19
johnsomWell, serial is super easy to implement by just changing the flow to linear, but that seems like a sad user experience.16:19
gthiemongejohnsom: I think we can inject some kind of dependencies between the tasks in taskflow, like the creation of the BACKUP VM would wait for the creation of the MASTER VM to complete16:19
johnsomI don't like the nova implementation as it should be atomic with the server groups IMO and not have this race.16:19
gthiemonge(it needs to be evaluated, it may cause thread-safety issues in taskflow)16:21
johnsomSorry I didn't have these links ahead to read everything. I'm also intrigued by this new "multi-create" API in the linked nova doc. This might also be a good path for us.16:21
johnsomWell, the above, wait for primary to boot is basically a linear flow....16:22
gthiemongebut when both VMs are active, the rest of the tasks are executed in parallel16:22
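The ordering discussed above (BACKUP compute create waits for MASTER compute create, everything else stays parallel) can be sketched with the Python standard library alone. This is only an illustration of the idea, not Octavia or taskflow code: `build_amphora`, `record`, and the step names are hypothetical, and a real flow would wait for the MASTER VM to become active rather than for a function call to return.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

events = []                      # ordered record of the steps that ran
events_lock = threading.Lock()


def record(message):
    """Append a step to the shared log in a thread-safe way."""
    with events_lock:
        events.append(message)


def build_amphora(role, wait_for=None, booted=None):
    """Hypothetical per-amphora subflow with one gated step."""
    # Steps before the compute create (e.g. certificate generation)
    # are not gated, so both roles can run them concurrently.
    record(f"{role}: pre-boot tasks")
    if wait_for is not None:
        # Gate only the compute create on the other amphora's boot.
        wait_for.wait()
    record(f"{role}: compute boot")
    if booted is not None:
        booted.set()             # release whoever waits on this boot
    # Steps after the boot run in parallel again.
    record(f"{role}: post-boot tasks")


master_booted = threading.Event()
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(build_amphora, "MASTER", booted=master_booted),
        pool.submit(build_amphora, "BACKUP", wait_for=master_booted),
    ]
    for future in futures:
        future.result()          # re-raise any exception from a worker
```

Unlike a fixed sleep, the BACKUP side waits exactly as long as the MASTER boot takes. A production version would also need a timeout and a failure path, since here the BACKUP side would wait forever if the MASTER boot raised before setting the event.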
jovialI did run that by John Garbutt (my colleague in the nova team) and he made a disapproving face :laugh:, but the multi-create api is the direction the docs seem to suggest16:22
jovialI can only think that there might be some dragons with using the multi-create api or that it isn't as widely tested16:23
johnsomSince nova is super slow, we pushed to use an unordered flow there to parallelize it. It's just sad that nova isn't able to handle that properly.16:23
johnsomJohn is a great guy, I have worked with him in the past...16:23
gthiemongeif we want to use the multi-create API, it needs to be called before starting the unordered flows16:24
johnsomAgreed, there are sadly a large number of dragons in the nova code. We have a ton of workarounds already.16:24
johnsomYeah, multi-create would replace the unordered flow for at least some of it.16:25
johnsomI lean towards becoming the new tempest job for nova that uses the multi-create API to get around this issue. Thoughts?16:26
jovialWould it be a big change to switch to creating both at the same time with multi-create? I'm not that familiar with the code, but doesn't each amphora build get added to a build queue? Just wondering if that would cause a substantial reworking16:27
johnsomThe part that concerns me is that we load a unique certificate per VM for the two way TLS authentication. This might be a problem with multi-create.16:29
gthiemongethat would be a huge change in the create LB flow: https://docs.openstack.org/octavia/latest/_images/LoadBalancerFlows-get_create_load_balancer_flow.svg16:29
johnsomEach amphora gets a unique certificate so if one is compromised, we can isolate it.16:29
jovial^ thanks - nice to have a diagram16:29
johnsomlol, you are welcome. 16:30
johnsomMASTER-octavia-create-amp-for-lb-subflow-octavia-generate-serverpem16:30
johnsomThat is the challenging part16:30
gthiemongebasically all the tasks between {MASTER,BACKUP}-octavia-create-amp-for-lb-subflow and {MASTER,BACKUP}-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create need to be put before the creation of the 2 subflows16:30
gthiemongejohnsom: right16:30
jovialAnd would it make sense to make a new octavia bug? I feel I've kind of hijacked that other one with another related problem; the original issue seemed to be loadbalancers getting stuck in pending_create state.  Or do you think it is the same bug?16:31
jovial^ I could ask John to add his opinions on using the multi-create API if that helps16:32
gthiemongestuck in pending_create doesn't look related to a nova issue, that sounds more like https://bugs.launchpad.net/octavia/+bug/2043360 which was a bug in taskflow16:32
johnsomIn general I am a fan of having more bugs. It's easy to close as duplicate if it is, but having separate bugs to track issues is helpful.16:33
jovialI think that was my conclusion in the end :)16:33
jovialjohnsom, cool, is it better to link the nova bug with octavia or just make a separate one? I can make something after the meeting.16:34
johnsomIn launchpad you can assign a bug to multiple projects (i.e. both octavia and nova) if you think it spans the projects16:34
johnsomSeems like this is the key statement in the nova doc: "Future work is needed to add anti-/affinity support to the placement service in order to eliminate the need for the late affinity check in nova-compute."16:36
johnsomWe create the server group early, I really don't understand why nova can't lock around that to sequence the compute create requests.16:37
jovialI mean, it makes total sense to me. Would it make sense to ask that question in my nova bug?16:39
johnsomI think it's a fair question for sure16:39
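The race being discussed can be shown with a toy model. This is a deliberately simplified sketch, not nova code: `HOSTS`, `schedule`, `late_check`, `boot_parallel`, and `boot_serial` are all hypothetical names, and the real scheduler and compute service do considerably more (including possible rescheduling). It only shows why two requests scheduled against the same stale view of a server group can collide, and why sequencing the requests avoids the collision.

```python
HOSTS = ["host-1", "host-2"]


def schedule(group_hosts):
    """Pick the first host that the server group does not already use."""
    for host in HOSTS:
        if host not in group_hosts:
            return host
    raise RuntimeError("no valid host")


def late_check(host, group_hosts):
    """Toy version of a late anti-affinity check on the chosen host."""
    return host not in group_hosts


def boot_parallel():
    """Both requests are scheduled before either one has landed."""
    group = set()
    # Each request sees the same empty snapshot of the group.
    picks = [schedule(set(group)), schedule(set(group))]
    results = []
    for host in picks:
        if late_check(host, group):
            group.add(host)
            results.append((host, "ACTIVE"))
        else:
            results.append((host, "ERROR"))
    return results


def boot_serial():
    """The second request is scheduled only after the first has landed."""
    group = set()
    results = []
    for _ in range(2):
        host = schedule(group)
        status = "ACTIVE" if late_check(host, group) else "ERROR"
        if status == "ACTIVE":
            group.add(host)
        results.append((host, status))
    return results


print(boot_parallel())
print(boot_serial())
```

In this model the parallel path puts both requests on `host-1` and the second one fails the late check, while the serial path places the two instances on different hosts.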
johnsomYeah, looking at the multi-create API, we can't use it as the config drive information (mostly the certificate) is unique per compute instance.16:40
johnsomTheir multi-create API doesn't allow this16:41
johnsomPlus, as you mentioned, it is probably not well tested16:41
jovialI did not know about the config drive restriction, thanks for highlighting it16:42
johnsomYeah, we inject some information at boot into the VM to establish a chain of trust. (like secure boot, but not... lol)16:43
johnsomOk, I think I need to think about this more. How about a path forward of:16:45
johnsom1. Open a new Octavia bug to track what we have learned.16:45
johnsom2. Follow up on the nova bug to see if they will fix this.16:46
johnsom3. I will think about the Octavia bug and see what we can do to handle this situation better in the flow. (yet another nova workaround...)16:47
johnsomAny other steps?16:47
gthiemongewise words16:47
jovialSounds like a good plan to me. I'll make a new bug after this :)16:47
johnsomThank you for raising this.16:47
gthiemongejohnsom: I'll take a look at taskflow, I think adding a dependency between the flows will not block the 2nd flow during a long period16:48
johnsomYeah, I only know of a way to revert and go down another path. I don't know about cross flow coordination.16:49
johnsomI mean we could always break that down and make the nova boot part linear. It would be sad to slow that down, but it's not too hard.16:49
johnsomI don't think we should add a sleep for the secondary VM boot as clouds have different performance and sleep times are a roll of the dice16:50
gthiemongeack16:51
johnsomgthiemonge Thanks for looking at taskflow. Let me know what you find16:51
gthiemongenp, thank you guys!16:51
jovialFor what it is worth, I think there is a similar problem with soft-anti-affinity, but at least that doesn't cause a failure; both VMs just come up on the same hypervisor16:52
johnsomYeah, more often than not they come up on the same host16:52
johnsomOk, I think we have a plan. Any other open discussion items today?16:53
jovialnothing from me16:54
gthiemongenothing16:54
johnsomThank you all for the great discussion. Have a great week!16:55
johnsom#endmeeting16:55
opendevmeetMeeting ended Wed Nov 26 16:55:06 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:55
opendevmeetMinutes:        https://meetings.opendev.org/meetings/octavia/2025/octavia.2025-11-26-16.01.html16:55
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/octavia/2025/octavia.2025-11-26-16.01.txt16:55
opendevmeetLog:            https://meetings.opendev.org/meetings/octavia/2025/octavia.2025-11-26-16.01.log.html16:55
jovialThanks all16:55
gthiemongethanks!16:55
jovialI've created the bug report here: https://bugs.launchpad.net/octavia/+bug/2133042. I also linked to the meeting notes. Please let me know if you need any extra details or logs.17:11
johnsomThank you!17:12
