16:00:40 <gthiemonge> #startmeeting Octavia
16:00:40 <opendevmeet> Meeting started Wed Jan 15 16:00:40 2025 UTC and is due to finish in 60 minutes. The chair is gthiemonge. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:40 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:40 <opendevmeet> The meeting name has been set to 'octavia'
16:00:43 <gthiemonge> hey
16:00:53 <tweining> o/
16:01:08 <johnsom> o/
16:02:29 <gthiemonge> #topic Announcements
16:02:35 <gthiemonge> * 2025.1 Epoxy Release Schedule
16:02:47 <gthiemonge> we passed the Epoxy-2 milestone last week
16:02:54 <gthiemonge> the next important milestones are
16:03:01 <gthiemonge> - Final release for non-client libraries (octavia-lib) - Feb 20
16:03:07 <gthiemonge> - Feature freeze/final release for client libraries - Feb 27
16:03:19 <gthiemonge> so, basically in one month
16:03:48 <gthiemonge> and I would like to take the opportunity to request reviews on the "Custom Security Groups on VIP ports" feature
16:03:48 <johnsom> Yeah, feature freeze is coming up quick
16:03:54 <gthiemonge> https://review.opendev.org/q/topic:%22custom_sg%22+is:open
16:04:13 <gthiemonge> note: I'm working on a python-octaviaclient patch to make it easier for reviewers to test the feature
16:04:30 <gthiemonge> yeah
16:04:41 <tweining> FYI, I will be on PTO for one week mid-Feb
16:04:46 <gthiemonge> ack
16:06:05 <gthiemonge> * 2025.2 F Release
16:06:22 <gthiemonge> another important update: it's official, 2025.2 will be named Flamingo!
16:06:53 <tweining> not a bad name IMO
16:07:46 <gthiemonge> any other updates/announcements folks?
16:08:41 <johnsom> PTL elections are coming up too
16:09:03 <johnsom> PTL Election from 2025-02-26T23:45 to 2025-03-19T23:45
16:09:03 <gthiemonge> wow
16:09:29 <johnsom> They announced the dates on the mailing list. It's a bit of a longer window, I understand
16:09:49 <gthiemonge> yeah
16:10:38 <johnsom> Nominations start 2/5
16:12:24 <gthiemonge> ack
16:12:28 <gthiemonge> thanks johnsom
16:12:38 <gthiemonge> #topic CI Status
16:12:58 <gthiemonge> we've made a lot of progress there
16:13:16 <gthiemonge> we have fixed a great number of issues (pep8 x2, doc, tls/httpx)
16:13:30 <gthiemonge> we migrated the jobs of the master branch to Ubuntu Noble
16:13:36 <gthiemonge> we updated the jobs for 2025.1
16:13:38 <gthiemonge> etc...
16:13:40 <tweining> very good
16:14:21 <gthiemonge> (and all the disabled jobs have been re-enabled)
16:14:31 <gthiemonge> so yeah... thanks for your help, guys
16:15:28 <gthiemonge> #topic Brief progress reports / bugs needing review
16:16:22 <gthiemonge> I already talked about my patches in the announcements...
16:17:20 <tweining> the only update I have is that rate limiting is no longer realistic for Epoxy. It's still too much work to do
16:17:20 <johnsom> I am finally back to being able to work on the SRIOV for members / tech debt patch
16:19:31 <gthiemonge> cool
16:20:33 <gthiemonge> just a quick note: we have ~20 backports in review in gerrit: https://review.opendev.org/q/(project:openstack/octavia+OR+project:openstack/octavia-dashboard)+status:open+branch:%5Estable/.*
16:22:35 <gthiemonge> #topic Open Discussion
16:23:04 <gthiemonge> any other topics for this meeting?
16:23:12 <danfai> hi, we recently lost an amphora due to a kernel panic. I was wondering whether there has been any consideration of adding a watchdog to the amphora, or whether most people just leverage the failover and let the old VM die?
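For context: the hypervisor-level watchdog danfai is asking about is typically libvirt's emulated i6300esb device, declared in the guest's domain XML. A minimal sketch under that assumption; this is a hypervisor-side setting, not something Octavia configures today:

```xml
<!-- Hypothetical fragment of an amphora guest's libvirt domain XML.
     The emulated watchdog resets the guest when the guest stops
     servicing /dev/watchdog, e.g. after a kernel panic. A watchdog
     daemon must run inside the guest to pet the device in normal
     operation. -->
<devices>
  <watchdog model='i6300esb' action='reset'/>
</devices>
```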
16:24:12 <gthiemonge> danfai: AFAIK no, we never had a plan for it
16:24:40 <tweining> watchdog meaning to reboot the VM when it panics, I guess
16:24:48 <gthiemonge> danfai: I think active-standby + failover can be the solution
16:24:49 <johnsom> Yeah, there already is a watchdog that catches kernel panics (though I have not seen this issue). It is the health manager process. If the amphora doesn't respond in 60 seconds (default config), the health manager will automatically fail over the amphora
16:25:06 <danfai> tweining: correct, with the libvirt process involved to detect it as well
16:25:37 <danfai> gthiemonge: yes, this is what I thought. In our use case we disabled the automatic failover for a few reasons and thought about different solutions
16:25:46 <johnsom> Automatic reboots are not enough, a reboot will lose the cryptography keys
16:26:17 <johnsom> Why would you disable the automatic failovers????? Not a good idea really
16:27:03 <johnsom> It's kind of the last layer of defense against nova/neutron failures in our overall HA strategy
16:27:49 <danfai> johnsom: political and historical reasons, I would say: not trusting automated systems that introduced more downtime before my time. Plus a few others that I cannot easily discuss online
16:28:58 <danfai> if there is interest in such a dib-element, I can propose it upstream, but I'm also happy to keep this a downstream patch for now (the only change is to the image anyway)
16:29:05 <johnsom> Well, that is the exact watchdog you are looking for.
16:29:35 <johnsom> What is this dib-element?
16:29:39 <gthiemonge> johnsom: I guess that after a reboot, losing the crypto keys will trigger a failover because the listeners don't start, right?
16:29:52 <johnsom> gthiemonge: correct
16:30:01 <danfai> well, the watchdog I mean here is contained in the hypervisor and would work even if the whole octavia/nova control plane is down. It is libvirt triggering the reboot
16:30:08 <johnsom> But since they have turned that off, it will just be a broken LB
16:30:40 <danfai> or you need to have the keys stored persistently on disk
16:30:54 <danfai> which might not be the best idea either
16:31:08 <johnsom> danfai: The tenant keys are stored in an encrypted RAM disk
16:31:24 <johnsom> They are never stored on disk
16:31:43 <danfai> johnsom: dib-element = disk image builder, the one for building the amphora image that is then used to spawn amphorae. An element would be one of the jobs run there
16:32:06 <johnsom> Yeah, I know disk image builder, I wrote all of the code for Octavia image building
16:32:21 <danfai> johnsom: if you use the default image from Octavia, yes, they would be stored in tmpfs
16:32:32 <johnsom> I was asking what your proposed change there would be
16:33:28 <danfai> it would be to have automated restarts, but I see that with the default behavior this would not work if the keys cannot be loaded anymore, or there would need to be another layer to send them again, which defeats the purpose
16:33:38 <gthiemonge> yeah, I don't think we need such an element, it would not match most use cases
16:34:09 <danfai> +1, ok, thanks
16:34:22 <johnsom> Inside the amp we already have systemd auto-restarts and keepalived failovers for Active/Standby topologies.
16:35:35 <johnsom> But a kernel panic would mean nothing *inside* the amphora can be done, beyond having the kernel reboot on panic automatically
16:38:15 <danfai> yeah, I see, if the keys are not there before the panic, you don't have a chance.
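The 60-second default johnsom mentions is the health manager's heartbeat timeout. A minimal octavia.conf sketch showing the relevant option; the value shown is the documented default:

```ini
[health_manager]
# An amphora that has not sent a heartbeat within this many seconds
# is considered failed and is automatically failed over.
heartbeat_timeout = 60
```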
16:38:39 <danfai> *not there = not persisted
16:39:58 <gthiemonge> so if you want to propose the feature, I guess it would be an optional feature in the disk image create script, so I don't know if it's worth it
16:41:19 <danfai> yes, and it could only work if certs-ramfs is not enabled, which would not be best practice
16:42:48 <danfai> anyway, thanks for the feedback. I think this is covered now. Also thanks for the comments on the active/active spec
16:43:02 <gthiemonge> np, thank you danfai!
16:43:31 <gthiemonge> anything else folks?
16:43:52 <tweining> nope
16:44:37 <johnsom> Nothing here
16:44:59 <danfai> not from me
16:45:01 <gthiemonge> ok, good discussions!
16:45:05 <gthiemonge> thank you folks!
16:45:12 <gthiemonge> #endmeeting
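For completeness, the downstream image change danfai floated (reboot-on-panic baked into the amphora image) would amount to a small disk-image-builder element that installs a sysctl drop-in. A minimal sketch under that assumption; the element name and file paths are hypothetical:

```bash
#!/bin/bash
# Hypothetical DIB element install script, e.g.
# elements/amphora-panic-reboot/install.d/75-panic-reboot
# Makes the amphora kernel reboot 10 seconds after a panic instead of
# hanging. As discussed above, this only helps if the TLS material
# survives the reboot, i.e. certs-ramfs is not in use.
set -eu

cat > /etc/sysctl.d/90-panic-reboot.conf <<'EOF'
kernel.panic = 10
kernel.panic_on_oops = 1
EOF
```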