Wednesday, 2026-04-01

rbachman[m]Hi all, question re. amphora management: Is there a mechanism for running multiple amphora images in parallel? I'm thinking of cases like running a stable image as default, but then making a new image available for people to test before it becomes the new default, or providing an image with some special feature in parallel to the default one. I know we have the amp_image_tag setting in the control plane config, but that just allows for one, right?06:50
gthiemongerbachman[m]: hey, you can use the amp_image_tag attribute in the octavia flavors, see https://github.com/openstack/octavia/blob/master/releasenotes/notes/add-amphora-image-tag-capability-ba2ea034bc01ab48.yaml07:15
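[Editor's sketch] The flavor-based approach gthiemonge points to can be illustrated roughly as follows. This is a minimal sketch, not an official recipe: it only builds the request bodies for an Octavia flavor profile and flavor, assuming the `amp_image_tag` flavor key from the linked release note. The profile name and the `amphora-next` tag are hypothetical, and the test image would still need to be uploaded and tagged in Glance separately.

```python
import json


def flavor_profile_body(name, image_tag):
    """Build a flavor profile request body that pins an amphora image tag.

    `name` and `image_tag` are illustrative values chosen by the operator.
    """
    # Octavia expects flavor_data as a JSON-encoded string, not a nested object.
    flavor_data = json.dumps({"amp_image_tag": image_tag})
    return {
        "flavorprofile": {
            "name": name,
            "provider_name": "amphora",
            "flavor_data": flavor_data,
        }
    }


def flavor_body(name, flavor_profile_id, description=""):
    """Build a flavor request body that exposes the profile to users."""
    return {
        "flavor": {
            "name": name,
            "description": description,
            "flavor_profile_id": flavor_profile_id,
            "enabled": True,
        }
    }


if __name__ == "__main__":
    # Hypothetical names: a test image tagged "amphora-next" next to the default.
    print(json.dumps(flavor_profile_body("amphora-next-profile", "amphora-next"), indent=2))
```

Users who want to try the new image would then create their load balancer with the corresponding flavor, while everyone else keeps getting the default image configured in the control plane.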
opendevreviewGregory Thiemonge proposed openstack/octavia stable/2024.2: DNM/WIP Testing tog offloading on 2024.2  https://review.opendev.org/c/openstack/octavia/+/98301108:06
opendevreviewGregory Thiemonge proposed openstack/octavia stable/2024.2: DNM/WIP Testing tog offloading on 2024.2  https://review.opendev.org/c/openstack/octavia/+/98301108:40
gthiemonge^^ FYI log offloading is broken on 2024.2, traffic ops tests are failing08:40
opendevreviewGregory Thiemonge proposed openstack/octavia stable/2024.2: Move log offload files to /var/log/octavia  https://review.opendev.org/c/openstack/octavia/+/98301609:13
yessou-samiHi all, I have one question regarding the octavia-dashboard: as of today, creating an OVN Octavia LB from the octavia-dashboard is not possible due to this issue https://bugs.launchpad.net/octavia/+bug/2111590 which seems to be stuck as per https://review.opendev.org/c/openstack/octavia-dashboard/+/938133 , so it would make sense to revive this topic10:59
yessou-samias I think the problem lies in making OVN LBs usable from the UI while at the same time avoiding confusion for users of Amphora10:59
yessou-samiI think an option could be to find a way to set a default "provider" for Amphora, and if that setting or variable is set also with OVN then the SOURCE_IP_PORT is shown in the selection11:00
gthiemongeyessou-sami: I think the main problem is that if we allow users to specify a provider (like ovn-provider), most of the settings that can be set in the dashboard won't be supported by the ovn-provider11:05
gthiemongeyessou-sami: and the dashboard doesn't show the cause of the errors when something goes wrong11:05
gthiemongeI opened https://bugs.launchpad.net/octavia/+bug/2013722 a long time ago11:06
rbachman[m]gthiemonge: Ah, I missed that as we haven't used it so far. Should be spot on, thanks!11:07
yessou-samigthiemonge right, but if we allow users to specify that then OpenStack providers could provide documentation around it (as load balancers are usually used by more tech-savvy users)11:10
gthiemongeyessou-sami: yeah I agree, first we need to fix 2013722 or the UX will be awful11:13
gthiemongeI can give it a try11:14
gthiemonge(or Claude code will do it)11:14
yessou-samiOkk perfect gthiemonge i can also help out if you want, i am not a developer but with some AI assistance i can do it11:22
opendevreviewGregory Thiemonge proposed openstack/octavia-dashboard master: Add 'provider' select box in load balancer form  https://review.opendev.org/c/openstack/octavia-dashboard/+/77556112:47
opendevreviewGregory Thiemonge proposed openstack/octavia-dashboard master: Display API error details in toast notifications  https://review.opendev.org/c/openstack/octavia-dashboard/+/98304712:47
gthiemongeyessou-sami: ^ see these 2 patches, you can test them. The UX is not great: when I create an OVN LB with an HTTP listener (which is not supported), the error is not displayed (because the creation of the listener is async in the dashboard)12:49
yessou-samigthiemonge thank you! will test them and let you know (idk if today or in a few days)13:25
-opendevstatus- NOTICE: The opendev.org site is currently experiencing overwhelming load adversely impacting git operations and repository browsing since 12:20 UTC today, mitigation work is in progress14:41
gthiemonge#startmeeting Octavia16:00
opendevmeetMeeting started Wed Apr  1 16:00:17 2026 UTC and is due to finish in 60 minutes.  The chair is gthiemonge. Information about MeetBot at http://wiki.debian.org/MeetBot.16:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:00
opendevmeetThe meeting name has been set to 'octavia'16:00
gthiemongeo/16:00
gthiemonge#topic Announcements16:02
gthiemongein case you missed it:16:03
gthiemonge[openstack-announce] OpenStack 2026.1 "Gazpacho" is officially released!16:03
gthiemongecongrats and thank you everyone!16:03
gthiemongeunfortunately, we have 2 known issues in the release16:04
raineszmo/16:04
rcruiseo/16:04
gthiemonge- first one: a bug with the generated certificates (for the management) with python 3.1316:04
gthiemongebackports are in progress16:04
gthiemonge- and random deadlocks during the first calls to the amphora-agent16:06
gthiemongeit's due to a bug in gunicorn16:06
gthiemongeI have a workaround, but I'm trying to find out whether we really need it16:06
gthiemonge(in case we can skip the buggy release of gunicorn)16:07
rcruiseFWIW I ran a test to restart gunicorn 5 times on the amphora with the new version and it failed to restart once for a completely different reason16:09
raineszmwhich version of gunicorn is the culprit?16:09
rcruiseSo updating might not be an overall gain16:09
gthiemongeI created/deleted LBs in a loop for 30 min with 25.2.0, it worked fine16:09
gthiemongeraineszm: it's 25.1.016:10
gthiemongehttps://github.com/benoitc/gunicorn/discussions/350916:10
raineszmty16:10
gthiemongelaunchpad is almost unresponsive, so I cannot share the links to the octavia bug reports :/16:11
gthiemongeI pinged the requirements team, I would like to know what we can do for gazpacho16:12
rcruiseThere seem to have been a few issues over the past 2 days, I was getting errors yesterday as well16:12
gthiemongeon master the issue will be fixed when the requirements are updated16:12
-opendevstatus- NOTICE: Load on the opendev.org Gitea backends is under control again for now, if any Zuul jobs failed with SSL errors or disconnects reaching the service they can be safely rechecked16:12
gthiemongewe could possibly merge the workaround on master and 2026.1, then remove it if gunicorn is updated16:13
gthiemongeworkaround is:16:13
gthiemongehttps://review.opendev.org/c/openstack/octavia/+/98261516:13
gthiemongefeel free to comment in the patch ^^16:15
gthiemongethen16:16
gthiemongenot related to gazpacho16:16
gthiemongelog offloading is again broken on older stable branches16:16
gthiemongeI had to backport16:16
gthiemongehttps://review.opendev.org/c/openstack/octavia/+/98301616:16
gthiemongercruise: raineszm: ^ please review16:16
raineszmo716:16
gthiemongethanks16:17
gthiemongethat's all with my announcements, do you have anything else folks?16:17
raineszmno announcements. Just stuff for discussion16:18
rcruiseNo announcements from me16:18
gthiemonge#topic Brief progress reports / bugs needing review16:18
raineszmI've been doing a little digging into https://bugs.launchpad.net/octavia/+bug/214401516:19
raineszmand https://review.opendev.org/c/openstack/octavia/+/93463816:19
gthiemongeI noticed that the test_backup_member test is randomly failing in the CI because of timing issues. I proposed a fix that ensures that the operating_status of the members is correct before testing them:16:19
gthiemongehttps://review.opendev.org/c/openstack/octavia-tempest-plugin/+/98274116:20
raineszmwill take a look16:20
rcruiseI'm mostly looking at reviews, I'm getting tagged in a lot of them at the moment 16:21
rcruiseI've been seeing an issue in CentOS 10 where I get GPG issues building CentOS 9 amphora images 16:22
gthiemongeI've also updated this old patch https://review.opendev.org/c/openstack/octavia/+/91984616:22
rcruiseI'm trying to see if it's something broken in my environment or a bigger issue 16:23
gthiemongeit fixes haproxy config with tls 1.3 ciphers16:23
gthiemongeyou're not supposed to build centos 9 images on c10s :D16:23
rcruiseWell that explains the problem :(16:24
gthiemongeI'm not sure the c9s mirrors are still up16:25
rcruiseAlright, I guess the good news is there's no bug there16:25
rcruiseYeah I ended up building the image from source packages, took ages16:25
gthiemonge#topic Open Discussion16:27
gthiemongeanything else for today?16:27
raineszmSo I wanted to get some input on the approach for the bug/ review I linked16:27
gthiemongeok16:28
raineszmIt seems to me there are two key issues. One is trying to reduce how often we give up and land in the error state. 16:28
raineszmAnd the other is to try to automatically failover when recovering from an outage. 16:28
raineszmThe first of these seems like it would be best addressed by e.g. adding a retry with back off to the flow for compute build 16:29
gthiemongeYeah we should not make the situation worse while trying to mitigate the outage16:29
raineszmThe latter I’m a little less sure about but I’m not sure the current review addresses this. 16:29
raineszmAnd it would be nice to not have to add a migration. 16:30
gthiemongesorry i don't understand the second point16:30
raineszmWould we be open to trying to accomplish the above two things as an approach to addressing the ask?16:30
rcruiseI'm wondering whether, in addition to a retry with back-off timer for failing over, we should also have some pre-checks to see if Nova is running properly?16:31
raineszmrcruise: also a good idea16:31
rcruiseIn other words, we should not allow a failover that would just result in an ERROR16:31
raineszmgthiemonge: sorry, which point?16:31
gthiemongercruise: the Nova API might be responsive, but a remote AZ might be down16:32
rcruisegthiemonge: True, and we have to trust Nova to some extent, we can't go doing in depth checks16:33
gthiemongeFYI there's a feature that detects a global outage and blocks failovers: https://review.opendev.org/c/openstack/octavia/+/65681116:33
gthiemongeit's not configured/enabled by default16:33
raineszmNice that’s good to know. 16:35
raineszmWe had a case where someone had an intermittent outage and their load balancers landed in error. 16:35
raineszmAnd they had to go back and manually fail everything over when recovering 16:35
gthiemongeIMHO if we add the backoff thing to the new feature, we should be fine16:35
raineszmSo that’s what I was getting at with the second point. 16:36
gthiemongeraineszm: the key of the failover threshold feature is the configuration16:36
raineszmAs in tuning the threshold?16:38
gthiemongeyeah, what would be a good value? 5, 10, 1000?16:38
rcruisePerhaps we should enable the circuit breaker by default? It seems that the automatic failover can cause as many problems as it solves when there's a wider outage 16:38
gthiemongeeach cloud is different, so it's tricky16:39
gthiemongercruise: yeah perhaps16:39
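[Editor's sketch] The circuit-breaker idea discussed here (stop automatic failovers when too many amphorae look failed at once, since that suggests a wider outage) can be sketched as below. This is a simplified illustration under assumed names, not Octavia's actual implementation: `select_failovers`, `stale_amphora_ids` and `threshold` are hypothetical, and choosing a good threshold is exactly the per-cloud tuning problem raised above.

```python
def select_failovers(stale_amphora_ids, threshold):
    """Return the amphorae to fail over, or nothing if the breaker trips.

    stale_amphora_ids: IDs of amphorae whose heartbeats are overdue.
    threshold: number of simultaneous failures that is treated as a
        global outage; None disables the breaker entirely.
    """
    if threshold is not None and len(stale_amphora_ids) >= threshold:
        # Too many simultaneous failures: assume an outage and skip
        # failovers rather than pushing every load balancer into ERROR.
        return []
    return list(stale_amphora_ids)


if __name__ == "__main__":
    print(select_failovers(["amp-1", "amp-2"], threshold=5))
    print(select_failovers([f"amp-{i}" for i in range(10)], threshold=5))
```

With the breaker disabled (`threshold=None`) every stale amphora is failed over, which corresponds to the behavior described as the default in the discussion.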
gthiemongeraineszm: you mentioned a migration, what are you talking about?16:40
raineszmThe current patch added a column to the database16:41
raineszmAn alembic migration16:41
gthiemongeha ok db migration :D16:41
rcruiseWhat could go wrong? :D16:42
gthiemongei don't see how we can avoid it16:42
gthiemongecould be annoying if you want a downstream backport16:42
raineszmCan’t retry be done in task flow without touching the db? 16:42
gthiemongeyeah like retrying the tasks instead of retrying the full flow16:43
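[Editor's sketch] Retrying an individual task with back-off, without persisting any state in the database, could look roughly like this. It is a generic pattern written in plain Python under stated assumptions, not the TaskFlow mechanism Octavia would actually use: `retry_with_backoff` and its parameters are hypothetical, and a real version would only retry errors known to be transient.

```python
import time


def retry_with_backoff(task, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    The delay doubles after each failure, capped at max_delay. The retry
    state lives only in this call, so no database column is required.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                # Give up and let the caller move the resource to ERROR.
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_compute_build():
        # Hypothetical stand-in for a compute build that fails twice.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("compute build failed")
        return "ACTIVE"

    print(retry_with_backoff(flaky_compute_build, base_delay=0.01))
```

The trade-off mentioned later in the discussion still applies: the longer a task keeps retrying, the longer the load balancer stays in a PENDING_* status instead of reaching ERROR.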
raineszmAs far as I can tell the current patch only counts errors16:43
raineszmRight. My point was that failover on error might often be an xy problem. And actually we want more robust retry behavior on the tasks so we don’t land in error16:44
raineszmWhich would be a different approach16:45
gthiemongeyeah i see16:45
gthiemongeit would be like the k8s approach: do it until it succeeds16:46
rcruiseHmm, I wonder if this could also help the issue we've seen with zombie amphorae after an outage. If the retry was a bit more robust it could ensure old amphorae are deleted?16:46
gthiemongethen people would ask "why is my LB stuck in PENDING_* status" :D16:46
gthiemongercruise: probably16:47
gthiemongei don't have an answer today16:47
raineszmAnyway. That’s what I was thinking about. Thought I’d bring it up so we can think about it16:48
gthiemongelet's think about it, there are probably some old notes in the PTG etherpads16:48
gthiemongeit's an interesting topic16:48
raineszmSounds good. 16:48
gthiemongeanything else guys?16:51
raineszmThat’s it for me 16:52
gthiemongeok16:53
gthiemongethank you! have a good one!16:53
gthiemonge#endmeeting16:53
opendevmeetMeeting ended Wed Apr  1 16:53:24 2026 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:53
opendevmeetMinutes:        https://meetings.opendev.org/meetings/octavia/2026/octavia.2026-04-01-16.00.html16:53
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/octavia/2026/octavia.2026-04-01-16.00.txt16:53
opendevmeetLog:            https://meetings.opendev.org/meetings/octavia/2026/octavia.2026-04-01-16.00.log.html16:53
raineszmHave a good one yall16:54
*** rcruise is now known as rcruise-mobile17:15