16:00:02 #startmeeting Octavia
16:00:03 Meeting started Wed Jan 29 16:00:02 2020 UTC and is due to finish in 60 minutes. The chair is rm_work. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:06 The meeting name has been set to 'octavia'
16:00:10 #chair johnsom
16:00:11 Current chairs: johnsom rm_work
16:00:14 #chair cgoncalves
16:00:15 Current chairs: cgoncalves johnsom rm_work
16:00:17 hi
16:00:20 o/
16:00:22 o/
16:00:30 hi
16:00:45 hi
16:02:08 looks like just a few of us today
16:02:13 or maybe this is the norm? anywho
16:02:15 #topic Announcements
16:02:44 The NDSU students working on TLS ciphers and protocols start today!
16:02:50 I was about to say that :D
16:02:58 We are doing an introduction meeting for them later today.
16:03:11 awesome! if you are reading this, welcome!!
16:03:26 #link https://www.openstack.org/foundation/2019-openstack-foundation-annual-report
16:03:41 We are also mentioned in the annual report for this effort.
16:04:23 spotlight on the project!
16:04:45 So that's all I have... Anyone else have anything to announce?
16:07:49 Nothing else from me
16:07:51 going once... going once... going twice... going three times.... going five times...
16:07:59 #topic Brief progress reports / bugs needing review
16:09:45 a talkative crowd today
16:09:54 apparently
16:10:26 I spent some time reviewing stuff, particularly jobboard patches. many merged
16:10:33 So, I've resurrected the patch that allows UDP pool HMs to use other protocols
16:10:35 So, failover flow.... It is working in the lab, I am at the "clean it up" and testing phase.
16:10:36 #link https://review.opendev.org/#/c/589180/
16:10:59 Recently I have been looking at the SINGLE topology testing.
16:11:23 cgoncalves, rm_work thanks a lot for reviews! Just one change left :)
16:11:23 We ran into issues with the UDP healthcheck in our environment (it's ... not a great design, but I guess it's the best we can do generically) so we need to be able to use other types on a UDP LB
16:11:37 For SINGLE LBs, I have dropped the outage time down to a second or two for manual failovers. This is a huge improvement.
16:11:39 ataraday_: so close! :D
16:12:05 johnsom: o/ does it create an amp in parallel before the delete of the old one?
16:12:09 Right now I am working to debug an IPv6 DAD issue triggered by my new code to speed up the SINGLE topology failover.
16:12:24 which is what we were avoiding previously due to "possible resource constraints" but i kinda thought was a BS reason
16:12:37 rm_work, yeah, just the main change :D
16:13:13 rm_work, it does build prior to failover, so yes, if there is a quota/capacity constraint it will now fail. This is what also raised this DAD issue.
16:13:30 Duplicate Address Detection (DAD)
16:13:51 I just figured your kids came back home and kept interrupting you :D
16:14:00 ataraday_, great work on your patches! you asked a question today on Gerrit about whether amphorav2 should be the default in Ussuri. we could discuss it here today if you'd like
16:14:53 I have a couple of patches that I worked on that are good to go I think, could just use more reviews and a push :D
16:14:59 #link https://review.opendev.org/#/c/699521/
16:15:05 ^^ to add more functionality to AZs
16:15:17 #link https://review.opendev.org/#/c/702535/
16:15:21 The rebase is going to be a nightmare I think....
16:15:35 johnsom: ping me if you need help with DAD, if it's failing is there a loop?
16:15:36 ^^ allow configuring whether you want to force one-armed
16:15:48 I have also done some significant refactoring around the amphorae driver and backend to clean up some "issues".
16:16:45 haleyb I have fixed these issues before. It's a sequencing issue with the new accelerated failover. I just found it in testing last night, so will look at it and fix it today.
16:16:57 ack
16:17:59 Last time I tested, SINGLE completely rebuilds the amphora in around 30 seconds, Act/Stdby in around 70 seconds. Outage time is a second or less for both. Switching to VRRP version 3 will drop it even more. I have a followup patch for that started.
16:18:42 you're on fire!
16:19:24 Still fully backport-able. No image roll needed, but would help bring down the SINGLE outage time.
16:20:14 Anyway, that and reviews have been my focus over the last week.
16:21:21 i'd like to ask for some of my py2 removal patches to get some reviews, we're seeing other repos randomly get bitten as third party library support goes away, would be good to get ahead of it
16:21:33 except for the six removal they're all pretty small
16:22:02 johnsom: should i put some on your list?
16:22:25 haleyb I think I have already been bugging you about some of those... grin
16:22:40 haleyb But, yes, please make sure they are on the priority list.
16:23:16 cgoncalves, It can be discussed on gerrit :) I put the question there to highlight this point: should 'amphorav2' be among enabled_provider_drivers by default or not?
16:23:17 johnsom: yes, and i think i've re-spun most, i'll verify and add to list
16:23:39 #link https://etherpad.openstack.org/p/octavia-priority-reviews
16:23:45 Just in case someone doesn't have it
16:25:48 also this one I just rebased:
16:25:50 Adam Harwell proposed openstack/octavia master: Update the lb_id on an amp earlier if we know it https://review.opendev.org/698082
16:26:26 which was a combination of what ataraday_ and I did independently (though she did it first and I was just blind, lol)
16:30:34 ok so I guess it's time for:
16:30:35 #topic Open Discussion
16:32:02 I'd like to get input from the team on enabling KVM instead of QEMU, when possible, in the CI jobs
16:32:21 we've bounced around on this a lot
16:32:22 I am planning to finish up the basic cleanup stuffs and dev testing, then I will probably post failover with broken tests for v1 only. Followup will be with fixed tests.
16:32:33 we do it, and then it works for a bit, and then jobs start breaking, and then we have to disable it
16:32:40 we can try again, but just be aware
16:32:45 this'll be like the third time
16:32:53 context is that there are some nodepool providers that provide nested virtualization but we are not leveraging it because of bugs in the ubuntu kernel
16:33:05 I am good with turning it on again if we seem to have passing tests across the nodepool providers.
16:33:31 We can hope that the kernel bug is now fixed and deployed across the nodepool fleet
16:34:03 although, I think I root caused it to one particular provider (vexxhost) not exposing an exact/best-matching CPU model compared to the actual physical CPU
16:34:08 We ran for a year and a half with it on without any issues, so I'm not worried about it in *general*
16:34:22 #link https://review.opendev.org/#/c/702921/
16:34:53 The last root cause I found, in partnership with OVH, was a kernel KVM bug with certain guest and host kernel versions.
16:34:54 note there's a depends-on for a devstack patch
16:35:16 in testing, seems to work fine at OVH
16:35:27 the problematic one was vexxhost because of the CPU model
16:35:42 Yeah, it's been a long time since we tried it again to see if there is a fix out.
16:35:46 setting cpu model to host-passthrough in libvirt helped
16:37:18 there's more information on the commit message that may better explain the context and proposal
16:38:09 maybe I should give folks some time to digest it. we can talk about it again next week or in Gerrit
16:38:26 its prolly fine to try again
16:39:38 works for me. I'll make sure the devstack patch merges
16:43:32 anything else or should we call it for today?
16:50:22 ok, calling it, thanks folks
16:50:26 #endmeeting
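
A rough illustration of the IPv6 Duplicate Address Detection (DAD) issue discussed around 16:12-16:16: while DAD is still running, the kernel marks the address as "tentative", and anything that binds to or advertises that address too early can fail. The sketch below is not Octavia's failover code; it is a minimal Python example, assuming the iproute2 `ip` CLI is available on the amphora, of waiting for DAD to finish on an interface before continuing. The interface name and timeout values are illustrative.

    import subprocess
    import time


    def wait_for_dad(interface, timeout=30, interval=1):
        """Poll `ip -6 addr` until no address on `interface` is tentative."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            # iproute2 flags addresses still undergoing DAD as "tentative"
            result = subprocess.run(
                ["ip", "-6", "addr", "show", "dev", interface],
                capture_output=True, text=True, check=True)
            if "tentative" not in result.stdout:
                return True
            time.sleep(interval)
        return False


    if __name__ == "__main__":
        # "eth1" is illustrative; in practice this would be the VIP interface
        if not wait_for_dad("eth1"):
            raise RuntimeError("IPv6 DAD did not complete in time")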
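
For the KVM-in-CI discussion around 16:32-16:39, the change linked at 16:34:22 is the actual proposal; the fragment below is only a hedged sketch of the kind of devstack local.conf settings involved, assuming a nodepool provider that exposes nested virtualization. LIBVIRT_TYPE and the nova [libvirt] cpu_mode option are standard devstack/nova settings; the exact values and mechanism used by the CI job may differ.

    # Illustrative devstack local.conf fragment (see the linked review for
    # the real gate change)
    [[local|localrc]]
    # Use KVM instead of plain QEMU (TCG) emulation when nested virt is available
    LIBVIRT_TYPE=kvm

    [[post-config|$NOVA_CONF]]
    [libvirt]
    # Pass the host CPU model straight through to guests to avoid the
    # CPU-model mismatch seen on some providers
    cpu_mode = host-passthrough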