| rbachman[m] | Hi all, question re. amphora management: Is there a mechanism for running multiple amphora images in parallel? I'm thinking of cases like running a stable image as default, but then making a new image available for people to test before it becomes the new default, or providing an image with some special feature in parallel to the default one. I know we have the amp_image_tag setting in the control plane config, but that just | 06:50 |
|---|---|---|
| rbachman[m] | allows for one, right? | 06:50 |
| gthiemonge | rbachman[m]: hey, you can use the amp_image_tag attribute in the octavia flavors, see https://github.com/openstack/octavia/blob/master/releasenotes/notes/add-amphora-image-tag-capability-ba2ea034bc01ab48.yaml | 07:15 |
| opendevreview | Gregory Thiemonge proposed openstack/octavia stable/2024.2: DNM/WIP Testing log offloading on 2024.2 https://review.opendev.org/c/openstack/octavia/+/983011 | 08:06 |
| opendevreview | Gregory Thiemonge proposed openstack/octavia stable/2024.2: DNM/WIP Testing log offloading on 2024.2 https://review.opendev.org/c/openstack/octavia/+/983011 | 08:40 |
| gthiemonge | ^^ FYI log offloading is broken on 2024.2, traffic ops tests are failing | 08:40 |
| opendevreview | Gregory Thiemonge proposed openstack/octavia stable/2024.2: Move log offload files to /var/log/octavia https://review.opendev.org/c/openstack/octavia/+/983016 | 09:13 |
| yessou-sami | Hi all, I have a question regarding the octavia-dashboard: as of today, creating an OVN Octavia LB from the octavia-dashboard is not possible due to this issue https://bugs.launchpad.net/octavia/+bug/2111590 , which seems to be stuck as per https://review.opendev.org/c/openstack/octavia-dashboard/+/938133 . It would make sense to revive this topic, | 10:59 |
| yessou-sami | as the problem, I think, lies in making the OVN LB usable from the UI while avoiding confusion for users using Amphora | 10:59 |
| yessou-sami | I think an option could be to find a way to set a default "provider" for Amphora, and if that setting or variable is also set for OVN, then SOURCE_IP_PORT is shown in the selection | 11:00 |
| gthiemonge | yessou-sami: I think the main problem is that if we allow users to specify a provider (like ovn-provider), most of the settings that can be set in the dashboard won't be supported by the ovn-provider | 11:05 |
| gthiemonge | yessou-sami: and the dashboard doesn't show the cause of the errors when something wrong happens | 11:05 |
| gthiemonge | I opened https://bugs.launchpad.net/octavia/+bug/2013722 a long time ago | 11:06 |
| rbachman[m] | gthiemonge: Ah, I missed that as we haven't used it so far. Should be spot on, thanks! | 11:07 |
| yessou-sami | gthiemonge right, but if we allow users to specify that, then OpenStack providers could supply documentation around it (load balancers are usually used by more tech-savvy users) | 11:10 |
| gthiemonge | yessou-sami: yeah I agree, first we need to fix 2013722 or the UX will be awful | 11:13 |
| gthiemonge | I can give it a try | 11:14 |
| gthiemonge | (or Claude code will do it) | 11:14 |
| yessou-sami | Ok, perfect gthiemonge, I can also help out if you want; I am not a developer, but with some AI assistance I can do it | 11:22 |
| opendevreview | Gregory Thiemonge proposed openstack/octavia-dashboard master: Add 'provider' select box in load balancer form https://review.opendev.org/c/openstack/octavia-dashboard/+/775561 | 12:47 |
| opendevreview | Gregory Thiemonge proposed openstack/octavia-dashboard master: Display API error details in toast notifications https://review.opendev.org/c/openstack/octavia-dashboard/+/983047 | 12:47 |
| gthiemonge | yessou-sami: ^ see these 2 patches, you can test them; the UX is not great, when I create an OVN LB with an HTTP listener (which is not supported), the error is not displayed (because the listener creation is async in the dashboard) | 12:49 |
| yessou-sami | gthiemonge thank you! will test them and let you know (idk if today or in a few days) | 13:25 |
| -opendevstatus- NOTICE: The opendev.org site is currently experiencing overwhelming load adversely impacting git operations and repository browsing since 12:20 UTC today, mitigation work is in progress | 14:41 | |
| gthiemonge | #startmeeting Octavia | 16:00 |
| opendevmeet | Meeting started Wed Apr 1 16:00:17 2026 UTC and is due to finish in 60 minutes. The chair is gthiemonge. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
| opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
| opendevmeet | The meeting name has been set to 'octavia' | 16:00 |
| gthiemonge | o/ | 16:00 |
| gthiemonge | #topic Announcements | 16:02 |
| gthiemonge | in case you missed it: | 16:03 |
| gthiemonge | [openstack-announce] OpenStack 2026.1 "Gazpacho" is officially released! | 16:03 |
| gthiemonge | congrats and thank you everyone! | 16:03 |
| gthiemonge | unfortunately, we have 2 known issues in the release | 16:04 |
| raineszm | o/ | 16:04 |
| rcruise | o/ | 16:04 |
| gthiemonge | - first one: a bug with the generated certificates (for the management network) with Python 3.13 | 16:04 |
| gthiemonge | backports are in progress | 16:04 |
| gthiemonge | - and random deadlocks during the first calls to the amphora-agent | 16:06 |
| gthiemonge | it's due to a bug in gunicorn | 16:06 |
| gthiemonge | I have a workaround, but I'm trying to find out if we really need it | 16:06 |
| gthiemonge | (in case we can skip the buggy release of gunicorn) | 16:07 |
| rcruise | FWIW I ran a test to restart gunicorn 5 times on the amphora with the new version, and it failed to restart once for a completely different reason | 16:09 |
| raineszm | which version of gunicorn is the culprit? | 16:09 |
| rcruise | So updating might not be an overall gain | 16:09 |
| gthiemonge | I created/deleted LBs in a loop for 30 min with 25.2.0, it worked fine | 16:09 |
| gthiemonge | raineszm: it's 25.1.0 | 16:10 |
| gthiemonge | https://github.com/benoitc/gunicorn/discussions/3509 | 16:10 |
| raineszm | ty | 16:10 |
| gthiemonge | launchpad is almost unresponsive, so I cannot share the links to the octavia bug reports :/ | 16:11 |
| gthiemonge | I pinged the requirements team, I would like to know what we can do for gazpacho | 16:12 |
| rcruise | There seem to have been a few issues over the past 2 days, I was getting errors yesterday as well | 16:12 |
| gthiemonge | on master the issue will be fixed when the requirements are updated | 16:12 |
| -opendevstatus- NOTICE: Load on the opendev.org Gitea backends is under control again for now, if any Zuul jobs failed with SSL errors or disconnects reaching the service they can be safely rechecked | 16:12 | |
| gthiemonge | we could merge the workaround on master and 2026.1, then remove it once gunicorn is updated | 16:13 |
| gthiemonge | workaround is: | 16:13 |
| gthiemonge | https://review.opendev.org/c/openstack/octavia/+/982615 | 16:13 |
| gthiemonge | feel free to comment in the patch ^^ | 16:15 |
| gthiemonge | then | 16:16 |
| gthiemonge | not related to gazpacho | 16:16 |
| gthiemonge | log offloading is again broken on older stable branches | 16:16 |
| gthiemonge | I had to backport | 16:16 |
| gthiemonge | https://review.opendev.org/c/openstack/octavia/+/983016 | 16:16 |
| gthiemonge | rcruise: raineszm: ^ please review | 16:16 |
| raineszm | o7 | 16:16 |
| gthiemonge | thanks | 16:17 |
| gthiemonge | that's all with my announcements, do you have anything else folks? | 16:17 |
| raineszm | no announcements. Just stuff for discussion | 16:18 |
| rcruise | No announcements from me | 16:18 |
| gthiemonge | #topic Brief progress reports / bugs needing review | 16:18 |
| raineszm | I've been doing a little digging into https://bugs.launchpad.net/octavia/+bug/2144015 | 16:19 |
| raineszm | and https://review.opendev.org/c/openstack/octavia/+/934638 | 16:19 |
| gthiemonge | I noticed that the test_backup_member test is randomly failing in the CI due to timing issues; I proposed a fix that ensures that the operating_status of the members is correct before testing them: | 16:19 |
| gthiemonge | https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/982741 | 16:20 |
| raineszm | will take a look | 16:20 |
| rcruise | I'm mostly looking at reviews, I'm getting tagged in a lot of them at the moment | 16:21 |
| rcruise | I've been seeing an issue on CentOS 10 where I get GPG errors when building CentOS 9 amphora images | 16:22 |
| gthiemonge | I've also updated this old patch https://review.opendev.org/c/openstack/octavia/+/919846 | 16:22 |
| rcruise | I'm trying to see if it's something broken in my environment or a bigger issue | 16:23 |
| gthiemonge | it fixes haproxy config with tls 1.3 ciphers | 16:23 |
| gthiemonge | you're not supposed to build centos 9 images on c10s :D | 16:23 |
| rcruise | Well that explains the problem :( | 16:24 |
| gthiemonge | I'm not sure the c9s mirrors are still up | 16:25 |
| rcruise | Alright, I guess the good news is there's no bug there | 16:25 |
| rcruise | Yeah I ended up building the image from source packages, took ages | 16:25 |
| gthiemonge | #topic Open Discussion | 16:27 |
| gthiemonge | anything else for today? | 16:27 |
| raineszm | So I wanted to get some input on the approach for the bug/ review I linked | 16:27 |
| gthiemonge | ok | 16:28 |
| raineszm | It seems to me there are two key issues. One is trying to reduce how often we give up and land in the error state. | 16:28 |
| raineszm | And the other is to try to automatically failover when recovering from an outage. | 16:28 |
| raineszm | The first of these seems like it would be best addressed by e.g. adding a retry with backoff to the compute build flow | 16:29 |
| gthiemonge | Yeah we should not make the situation worse while trying to mitigate the outage | 16:29 |
| raineszm | I'm a little less sure about the latter, and I don't think the current review addresses it. | 16:29 |
| raineszm | And it would be nice to not have to add a migration. | 16:30 |
| gthiemonge | sorry i don't understand the second point | 16:30 |
| raineszm | Would we be open to trying to accomplish the above two things as an approach to addressing the ask? | 16:30 |
| rcruise | I'm wondering, rather than a retry with a backoff timer for failing over, should we also have some pre-checks to see if Nova is running properly? | 16:31 |
| raineszm | rcruise: also a good idea | 16:31 |
| rcruise | In other words, we should not allow a failover that would just result in an ERROR | 16:31 |
| raineszm | gthiemonge: sorry, which point? | 16:31 |
| gthiemonge | rcruise: the Nova API might be responsive, but a remote AZ might be down | 16:32 |
| rcruise | gthiemonge: True, and we have to trust Nova to some extent, we can't go doing in depth checks | 16:33 |
| gthiemonge | FYI there's a feature that detects a global outage and blocks failovers: https://review.opendev.org/c/openstack/octavia/+/656811 | 16:33 |
| gthiemonge | it's not configured/enabled by default | 16:33 |
| raineszm | Nice that’s good to know. | 16:35 |
| raineszm | We had a case where someone had an intermittent outage and their load balancers landed in error. | 16:35 |
| raineszm | And they had to go back and manually fail everything over when recovering | 16:35 |
| gthiemonge | IMHO if we add the backoff thing to the new feature, we should be fine | 16:35 |
| raineszm | So that’s what I was getting at with the second point. | 16:36 |
| gthiemonge | raineszm: the key to the failover threshold feature is the configuration | 16:36 |
| raineszm | As in tuning the threshold? | 16:38 |
| gthiemonge | yeah, what would be a good value? 5, 10, 1000? | 16:38 |
| rcruise | Perhaps we should enable the circuit breaker by default? It seems that the automatic failover can cause as many problems as it solves when there's a wider outage | 16:38 |
| gthiemonge | each cloud is different, so it's tricky | 16:39 |
| gthiemonge | rcruise: yeah perhaps | 16:39 |
| gthiemonge | raineszm: you mentioned a migration, what are you talking about? | 16:40 |
| raineszm | The current patch added a column to the database | 16:41 |
| raineszm | An alembic migration | 16:41 |
| gthiemonge | ha ok db migration :D | 16:41 |
| rcruise | What could go wrong? :D | 16:42 |
| gthiemonge | i don't see how we can avoid it | 16:42 |
| gthiemonge | could be annoying if you want a downstream backport | 16:42 |
| raineszm | Can’t the retry be done in TaskFlow without touching the DB? | 16:42 |
| gthiemonge | yeah like retrying the tasks instead of retrying the full flow | 16:43 |
| raineszm | As far as I can tell the current patch only counts errors | 16:43 |
| raineszm | Right. My point was that failover on error might often be an XY problem. And actually we want more robust retry behavior on the tasks so we don’t land in error | 16:44 |
| raineszm | Which would be a different approach | 16:45 |
| gthiemonge | yeah i see | 16:45 |
| gthiemonge | it would be like the k8s approach: do it until it succeeds | 16:46 |
| rcruise | Hmm, I wonder if this could also help the issue we've seen with zombie amphorae after an outage. If the retry was a bit more robust, it could ensure old amphorae are deleted? | 16:46 |
| gthiemonge | then people would ask "why is my LB stuck in PENDING_* status" :D | 16:46 |
| gthiemonge | rcruise: probably | 16:47 |
| gthiemonge | i don't have an answer today | 16:47 |
| raineszm | Anyway. That’s what I was thinking about. Thought I’d bring it up so we can think about it | 16:48 |
| gthiemonge | let's think about it, there are probably some old notes in the PTG etherpads | 16:48 |
| gthiemonge | it's an interesting topic | 16:48 |
| raineszm | Sounds good. | 16:48 |
| gthiemonge | anything else guys? | 16:51 |
| raineszm | That’s it for me | 16:52 |
| gthiemonge | ok | 16:53 |
| gthiemonge | thank you! have a good one! | 16:53 |
| gthiemonge | #endmeeting | 16:53 |
| opendevmeet | Meeting ended Wed Apr 1 16:53:24 2026 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:53 |
| opendevmeet | Minutes: https://meetings.opendev.org/meetings/octavia/2026/octavia.2026-04-01-16.00.html | 16:53 |
| opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/octavia/2026/octavia.2026-04-01-16.00.txt | 16:53 |
| opendevmeet | Log: https://meetings.opendev.org/meetings/octavia/2026/octavia.2026-04-01-16.00.log.html | 16:53 |
| raineszm | Have a good one yall | 16:54 |
| *** rcruise is now known as rcruise-mobile | 17:15 | |
| rcruise | IDENTIFY | 17:18 |
Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
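For reference on the amp_image_tag question at 06:50/07:15: per the release note gthiemonge linked, an Octavia flavor can carry an `amp_image_tag` capability that overrides the control-plane default, so a test image can run alongside the stable one. A rough CLI sketch, with all names (`amphora-test`, `fp-test-image`, file names, the subnet placeholder) made up for illustration:

```shell
# Upload a second amphora image with a tag that differs from the default
# (the default comes from [controller_worker] amp_image_tag in octavia.conf).
openstack image create --file amphora-test.qcow2 --disk-format qcow2 \
    --container-format bare --tag amphora-test --private amphora-x64-test

# Flavor profile whose flavor_data overrides the image tag.
openstack loadbalancer flavorprofile create --name fp-test-image \
    --provider amphora --flavor-data '{"amp_image_tag": "amphora-test"}'

# User-visible flavor wrapping that profile.
openstack loadbalancer flavor create --name test-image \
    --flavorprofile fp-test-image --enable

# LBs created with this flavor boot from the test image; everything else
# keeps using the default image tag.
openstack loadbalancer create --name lb-test --vip-subnet-id <subnet> \
    --flavor test-image
```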
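The retry-with-backoff idea raineszm raised at 16:29 (and the "retry the tasks instead of the full flow" variant at 16:43) can be sketched in plain Python. This is a hypothetical helper for illustration only; in Octavia proper this would live in a TaskFlow retry controller, and none of these names exist in the codebase:

```python
import time

def retry_with_backoff(fn, attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff.

    Instead of failing the flow (and putting the LB in ERROR) on the
    first Nova hiccup, keep trying with growing delays; only give up
    after the configured number of attempts.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            sleep(min(base_delay * (2 ** attempt), max_delay))

# Simulated compute build that fails twice (intermittent outage), then works.
calls = {"n": 0}
def flaky_compute_build():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("nova: instance build failed")
    return "amphora-booted"

print(retry_with_backoff(flaky_compute_build, base_delay=0.01))  # → amphora-booted
```

This also illustrates gthiemonge's caveat at 16:46: while retries are in progress the LB sits in a PENDING_* state rather than ERROR, which users will notice.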
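And a toy illustration of the circuit-breaker idea behind review 656811, which rcruise suggested enabling by default at 16:38: when many amphorae look unhealthy at once, it is more likely an infrastructure outage than individual failures, so automatic failovers should pause. The real feature is configuration-driven and disabled by default, and its actual knobs differ; this ratio-based check is only a sketch with invented names:

```python
def failovers_allowed(unhealthy, total, max_unhealthy_ratio=0.3):
    """Block automatic failovers when the share of unhealthy amphorae
    suggests a global outage rather than isolated amphora failures.

    Hypothetical helper; the tricky part, as noted in the meeting, is
    that a good threshold value varies from cloud to cloud.
    """
    if total == 0:
        return True  # nothing to protect
    return (unhealthy / total) < max_unhealthy_ratio
```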