16:00:40 <gthiemonge> #startmeeting Octavia
16:00:40 <opendevmeet> Meeting started Wed Jan 15 16:00:40 2025 UTC and is due to finish in 60 minutes.  The chair is gthiemonge. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:40 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:40 <opendevmeet> The meeting name has been set to 'octavia'
16:00:43 <gthiemonge> hey
16:00:53 <tweining> o/
16:01:08 <johnsom> o/
16:02:29 <gthiemonge> #topic Announcements
16:02:35 <gthiemonge> * 2025.1 Epoxy Release Schedule
16:02:47 <gthiemonge> we passed Epoxy-2 milestone last week
16:02:54 <gthiemonge> the next important milestones are
16:03:01 <gthiemonge> - Final release for non-client libraries (octavia-lib) - Feb 20
16:03:07 <gthiemonge> - Feature freeze/final release for client libraries - Feb 27
16:03:19 <gthiemonge> so, basically in one month
16:03:48 <gthiemonge> and I would like to take the opportunity to request reviews on the "Custom Security Groups on VIP ports" feature
16:03:48 <johnsom> Yeah, feature freeze is coming up quick
16:03:54 <gthiemonge> https://review.opendev.org/q/topic:%22custom_sg%22+is:open
16:04:13 <gthiemonge> note: I'm working on a python-octaviaclient patch, to make it easier for the reviewers to test the feature
16:04:30 <gthiemonge> yeah
16:04:41 <tweining> FYI, I will be on PTO for one week mid-Feb
16:04:46 <gthiemonge> ack
16:06:05 <gthiemonge> * 2025.2 F Release
16:06:22 <gthiemonge> another important update: It's official, 2025.2 will be named Flamingo!
16:06:53 <tweining> not a bad name IMO
16:07:46 <gthiemonge> any other updates/announcements folks?
16:08:41 <johnsom> PTL elections are coming up too
16:09:03 <johnsom> PTL Election from 2025-02-26T23:45 to 2025-03-19T23:45
16:09:03 <gthiemonge> wow
16:09:29 <johnsom> They announced the dates on the mailing list. It's a bit of a longer window I understand
16:09:49 <gthiemonge> yeah
16:10:38 <johnsom> Nominations start 2/5
16:12:24 <gthiemonge> ack
16:12:28 <gthiemonge> thanks johnsom
16:12:38 <gthiemonge> #topic CI Status
16:12:58 <gthiemonge> we've made a lot of progress there
16:13:16 <gthiemonge> we have fixed a great number of issues (pep8 x2, doc, tls/httpx)
16:13:30 <gthiemonge> we migrated the jobs of the master branch to ubuntu noble
16:13:36 <gthiemonge> we updated the jobs for 2025.1
16:13:38 <gthiemonge> etc...
16:13:40 <tweining> very good
16:14:21 <gthiemonge> (and all the disabled jobs have been re-enabled)
16:14:31 <gthiemonge> so yeah.. thanks for your help guys
16:15:28 <gthiemonge> #topic Brief progress reports / bugs needing review
16:16:22 <gthiemonge> I already talked about my patches during the announcements..
16:17:20 <tweining> the only update I have is that rate limiting is no longer realistic for Epoxy; there is still too much work to do
16:17:20 <johnsom> I am finally back to being able to work on the SRIOV for members / tech debt patch
16:19:31 <gthiemonge> cool
16:20:33 <gthiemonge> just a quick note: we have ~20 backports in review in gerrit: https://review.opendev.org/q/(project:openstack/octavia+OR+project:openstack/octavia-dashboard)+status:open+branch:%5Estable/.*
16:22:35 <gthiemonge> #topic Open Discussion
16:23:04 <gthiemonge> any other topics for this meeting?
16:23:12 <danfai> hi, we recently lost an amphora due to a kernel panic. I was wondering whether there has been any consideration of having a watchdog on the amphora, or whether most people just leverage the failover and let the old VM die?
16:24:12 <gthiemonge> danfai: AFAIK no, we never had a plan for it
16:24:40 <tweining> watchdog meaning to reboot the vm when it panics I guess
16:24:48 <gthiemonge> danfai: I think active-standby + failover can be the solution
16:24:49 <johnsom> Yeah, there already is a watchdog that catches kernel panics (though I have not seen this issue). It is the health manager process. If the Amphora doesn't respond in 60 seconds (default config) the health manager will automatically fail over the Amphora
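For reference, the 60-second timeout johnsom mentions maps to the [health_manager] section of octavia.conf. A minimal sketch, assuming the documented defaults (verify the values against your release):

    [health_manager]
    # Seconds without a heartbeat before an amphora is considered failed
    # and the health manager triggers a failover (default: 60)
    heartbeat_timeout = 60
    # How often, in seconds, the amphora agent sends heartbeats (default: 10)
    heartbeat_interval = 10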
16:25:06 <danfai> tweining: correct, with the libvirt process involved to detect it as well
16:25:37 <danfai> gthiemonge: yes, that is what I thought. In our use case we disabled the automatic failover for a few reasons and have been thinking about different solutions
16:25:46 <johnsom> Automatic reboots are not enough; a reboot will lose the cryptography keys
16:26:17 <johnsom> Why would you disable the automatic failovers????? Not a good idea really
16:27:03 <johnsom> It's kind of the last layer of defense against nova/neutron failures in our overall HA strategy
16:27:49 <danfai> johnsom: political and historical reasons, I would say: distrust of automated systems that introduced more downtime before my time. Plus a few other reasons that I cannot easily discuss online
16:28:58 <danfai> if there is interest in such a dib-element, I can propose it upstream, but I am also happy to keep this a downstream patch for now (the only change is to the image anyway)
16:29:05 <johnsom> Well, that is the exact watchdog you are looking for.
16:29:35 <johnsom> What is this dib-element?
16:29:39 <gthiemonge> johnsom: I guess that after a reboot, losing the crypto keys will trigger a failover because the listeners don't start, right?
16:29:52 <johnsom> gthiemonge correct
16:30:01 <danfai> well, the watchdog I mean here lives in the hypervisor and would work even if the whole octavia/nova control plane is down. It is libvirt that triggers the reboot
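For context on the hypervisor-side watchdog danfai describes, libvirt can expose an emulated watchdog device to the guest, and Nova can be told to attach one via an image property. A minimal sketch, not something Octavia configures itself; the guest still needs a driver or daemon servicing /dev/watchdog for the device to ever fire:

    # hypothetical wiring: tag the amphora image so Nova attaches a watchdog
    # device that resets the VM when the guest stops servicing it
    openstack image set --property hw_watchdog_action=reset <amphora-image>

    # which ends up as a libvirt domain device roughly like:
    #   <watchdog model='i6300esb' action='reset'/>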
16:30:08 <johnsom> But, since they have turned that off, it will just be a broken LB
16:30:40 <danfai> or you need to have keys stored persistently on disk
16:30:54 <danfai> which might not be the best idea either
16:31:08 <johnsom> danfai The tenant keys are stored in an encrypted RAM disk
16:31:24 <johnsom> They are never stored on disk
16:31:43 <danfai> johnsom: dib-element = a disk image builder element; DIB is the tool that builds the amphora image, which is then used to spawn amphorae. An element is one of the jobs run during that build
16:32:06 <johnsom> Yeah, I know disk image builder, I wrote all of the code for Octavia image building
16:32:21 <danfai> johnsom: If you use the default image from octavia, yes they would be stored in tmpfs
16:32:32 <johnsom> I was asking what is your proposed change there
16:33:28 <danfai> it would be to have automated restarts, but I see that with the default behavior this would not work: either the keys cannot be loaded anymore, or there needs to be another layer to send them again, which then defeats the purpose
16:33:38 <gthiemonge> yeah, I don't think we need such an element; it would not match most use cases
16:34:09 <danfai> +1, ok, thanks
16:34:22 <johnsom> Inside the amp we already have systemd auto restarts and keepalived failovers for Active/Standby topologies.
16:35:35 <johnsom> But a kernel panic would mean nothing *inside* the amphora can be done beyond having the kernel reboot on panic automatically
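To illustrate the "reboot on panic" part johnsom mentions, the guest kernel can be told to reboot itself a few seconds after a panic via a sysctl. A hypothetical DIB element sketch (reboot-on-panic is a made-up name, not an existing Octavia element):

    # elements/reboot-on-panic/post-install.d/75-reboot-on-panic
    #!/bin/bash
    # Hypothetical element: reboot the kernel 10 seconds after a panic.
    set -eux
    cat > /etc/sysctl.d/90-reboot-on-panic.conf <<EOF
    kernel.panic = 10
    EOF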
16:38:15 <danfai> yeah, I see. If the keys are not there before the panic, you don't have a chance.
16:38:39 <danfai> *not there = not persisted
16:39:58 <gthiemonge> so if you want to propose the feature, I guess it would have to be an optional feature in the disk image create script, so I don't know if it's worth it
16:41:19 <danfai> yes, and it could only work if certs-ramfs is not enabled, which would not be best practice
16:42:48 <danfai> anyway thanks for the feedback. I think this is covered now. also thanks for the comments on the active/active spec
16:43:02 <gthiemonge> np, thank you danfai!
16:43:31 <gthiemonge> anything else folks?
16:43:52 <tweining> nope
16:44:37 <johnsom> Nothing here
16:44:59 <danfai> not from me
16:45:01 <gthiemonge> ok, good discussions!
16:45:05 <gthiemonge> thank you folks!
16:45:12 <gthiemonge> #endmeeting