16:02:13 <rm_work> #startmeeting Octavia
16:02:14 <openstack> Meeting started Wed Apr 24 16:02:13 2019 UTC and is due to finish in 60 minutes.  The chair is rm_work. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:17 <openstack> The meeting name has been set to 'octavia'
16:02:29 <rm_work> Hey folks!
16:02:34 <johnsom> o/
16:02:34 <ataraday_> hello everyone
16:02:52 <gthiemonge> Hi
16:02:55 <rm_work> Sorry for the slightly late start, still getting the hang of this
16:03:07 <rm_work> #topic Announcements
16:03:09 <cgoncalves> hi
16:03:55 <rm_work> The summit and PTG is next week!
16:04:04 <rm_work> not sure how to make that an official announcement thing
16:04:13 <johnsom> Are you cancelling any of the weekly IRC meetings?
16:04:23 <johnsom> it is #topic <topic>
16:04:35 <johnsom> Oh, you got it
16:05:14 <rm_work> ah i guess there's no real sub-topic stuff
16:05:22 <rm_work> so anyway, yeah. next week: summit+ptg!
16:05:33 <rm_work> should we have a meeting? what do people think?
16:06:00 <rm_work> #startvote Should we have a meeting next week? Yes No
16:06:01 <openstack> Begin voting on: Should we have a meeting next week? Valid vote options are Yes, No.
16:06:03 <openstack> Vote using '#vote OPTION'. Only your last vote counts.
16:06:03 <johnsom> I vote to cancel next week at least
16:06:12 <johnsom> #vote No
16:06:24 <cgoncalves> #vote No
16:06:25 <rm_work> look at all of this democracy happening right now
16:06:31 <rm_work> it warms my heart
16:06:39 <rm_work> #vote No
16:06:44 <johnsom> Already with the votes... lol
16:06:57 <rm_work> this is what you get when you make me wake up at 9am :D
16:08:09 <rm_work> ok no more votes?
16:08:15 <rm_work> I think that's prolly clear anyway
16:08:27 <rm_work> #endvote
16:08:28 <openstack> Voted on "Should we have a meeting next week?" Results are
16:08:30 <openstack> No (3): rm_work, johnsom, cgoncalves
16:08:50 <rm_work> Ok. So, meeting next week is cancelled!
16:08:51 <johnsom> Shall we count the hanging chads?
16:09:09 <rm_work> Any other announcements?
16:09:19 <johnsom> #link https://opendev.org/explore/repos
16:09:34 <johnsom> If you haven't noticed, infra made some big changes last week.
16:09:40 <rm_work> Oh yes! Everything is officially moved to OpenDev.org
16:09:50 <johnsom> All of the openstack git repos have changed to opendev.org
16:10:08 <johnsom> You may need to update some of your .gitreview files.
16:10:11 <rm_work> Congrats infra for a relatively smooth transfer
16:10:33 <colin-> hello
16:10:33 <johnsom> Also, most important, Depends-On links to the old domain break and may need to be updated on open reviews.
16:11:00 <rm_work> assuming you used a full URL, yes
16:11:06 <rm_work> I don't think I ever did it that way...
16:11:10 <johnsom> Yeah, I am super happy about the gitea move. The old git web was horrible
16:12:35 <johnsom> When are you going to set the schedule for the PTG based on the etherpad?
16:12:55 <rm_work> Hmmm, that is a great question
16:12:59 <johnsom> #link https://etherpad.openstack.org/p/octavia-train-ptg
16:13:08 <rm_work> I think I might delegate the official PTG planner role
16:13:14 <rm_work> Any takers? johnsom? :D
16:13:17 * johnsom forgot how fun it is to not be the PTL
16:13:39 <rm_work> Let's see if I can make you un-forget ^_^
16:14:04 <johnsom> I will work on that today/tomorrow
16:14:09 <rm_work> cool
16:14:22 <rm_work> #action johnsom to make PTG schedule from the planning etherpad
16:14:35 <johnsom> Oh, one more announcement. Stackalytics is in theory fixed.
16:14:43 <rm_work> it was broken?
16:14:58 <johnsom> Yeah, it was giving random results.
16:14:58 <rm_work> missed that
16:15:05 <rm_work> lol
16:15:18 <johnsom> At one point it showed I had contributed less code for Stein than I had in a single patch
16:17:08 <rm_work> Ok, I think that's it for announcements then
16:17:30 <rm_work> #topic Brief progress reports / bugs needing review
16:17:42 <rm_work> Anyone have anything for this?
16:18:03 <johnsom> I do, lol
16:18:11 <rm_work> link away!
16:18:16 <ataraday_> I've got request for review of my change
16:18:38 <cgoncalves> not much from my side. Easter break Friday to Tuesday
16:18:41 <johnsom> I created a feature matrix for provider drivers:
16:18:44 <johnsom> #link http://logs.openstack.org/74/651974/6/check/openstack-tox-docs/015a575/html/user/feature-classification/index.html
16:19:05 <johnsom> The styling will improve once my fixes to sphinx-feature-matrix releases
16:19:14 <rm_work> cool
16:19:20 <colin-> nice thanks
16:19:27 <ataraday_> Just not sure if it is for this section or for open discussion - https://review.opendev.org/#/c/652953/
16:19:39 <johnsom> I also started a patch to help the "non-graceful shutdown" situation
16:19:41 <johnsom> #link https://review.opendev.org/653872
16:19:50 <rm_work> #link https://review.opendev.org/#/c/652953/
16:20:33 <johnsom> I think this is part one of a few patches in this space as interim until we fix flow resumption.
16:20:36 <rm_work> I may agree with Ann there
16:20:48 <rm_work> depending on the timeline we have for real flow resumes
16:21:05 <johnsom> Yeah, let's talk about that patch in open discussion. I have some thoughts there
16:21:08 <rm_work> and whether or not there's a ton of additional setup for that, where this might help in a larger portion of installs
16:21:54 <johnsom> Other than those I have been working on our slides for the summit presentations.
16:22:31 <colin-> we see resources in the state johnsom's patch describes from time to time so will be watching that one to see how discussion goes
16:22:35 <rm_work> it actually looks like the patch ataraday_ linked is an alternate to the "temp solution"
16:22:45 <johnsom> Yes
16:22:57 <rm_work> ie, until we get real task resume, just allow deleting the bad ones
16:23:04 <rm_work> might be less complexity
16:23:21 <johnsom> Well, my patch requires no manual intervention
16:23:24 <rm_work> well, does anyone else have anything for this topic? or can we move directly to open discussion and talk about this?
16:24:11 <johnsom> I have one other quick topic I would like to add to the agenda if I can.
16:24:23 <rm_work> sure, whatsit?
16:24:29 <johnsom> Train features
16:24:53 <rm_work> ah, sure -- though I figured we'd do that major discussion during the PTG
16:25:00 <rm_work> #topic Train Features
16:25:29 <johnsom> I am creating our "project update" slides for the summit.
16:25:53 <johnsom> This deck includes a slide on anticipated features for Train.
16:26:14 <johnsom> Right now all I have is: retire neutron-lbaas, log offloading, and VIP ACLs.
16:26:32 <johnsom> Does anyone else have any features they plan to work on for Train they would like included in this slide?
16:26:53 <rm_work> i'd still like to get "members as a base resource" done
16:27:02 <rm_work> got a good start on that already
16:27:12 <rm_work> not sure if it's really major enough to list on that slide tho
16:27:20 <johnsom> Ok, I can include that if you would like
16:27:28 * rm_work shrugs
16:27:50 <ataraday_> If the work on taskflow will be accepted - I can work on that
16:27:51 <colin-> we're anticipating active/active and support for a container based driver. when is the release date for train?
16:28:01 <johnsom> Ok, I just wanted to ask if there were other features planned for Train
16:28:19 <johnsom> #link https://releases.openstack.org/train/schedule.html
16:28:39 <johnsom> Feature freeze for Train is the week of Sept 9th
16:28:42 <rm_work> I think officially, we'll be deciding the feature goals at the PTG, so maybe just say something to that effect -- "this is a preliminary list, we'll be discussing more at the PTG, please join us"
16:28:53 <colin-> ok
16:29:12 <johnsom> Right, it always has the disclaimer
16:29:35 <johnsom> ataraday_ If you have resource to work on that I will add it. I might also be able to help that effort.
16:29:36 <rm_work> then that's probably fine
16:30:09 <rm_work> ataraday_: yeah, what we prioritize is largely based on what people are able to commit time for -- we'll accept whatever you think you can do :)
16:30:11 <johnsom> colin- Are there lines you would like me to add for your efforts on Train?
16:30:34 <colin-> the etherpad topics cover what we're interested in, was just reviewing that
16:31:05 <rm_work> oooo, neutron-lbaas deprecation is THIS CYCLE, really? for realsies?
16:31:09 <johnsom> This presentation happens before the PTG, so it's a bit of a "guesstimation"
16:31:11 <colin-> we haven't organized around it internally but understand both those bodies of work need sponsors atm
16:31:17 <rm_work> what a time to be alive
16:31:20 <rm_work> and to be PTL :D
16:31:36 <ataraday_> I'm the resource and as I spend some time on this topic, I can do more to make it happen :)
16:32:07 <rm_work> colin-: yes, both are things we'd love to see, but both have had people come, do work, and disappear
16:32:21 <rm_work> so it's really hard to say -- right now we don't have people actively active/active-ing
16:32:28 <rm_work> and containers is ???
16:32:32 <johnsom> Cool, I will add flow resumption as a Train feature goal
16:32:41 <rm_work> whatever you can commit to doing is appreciated
16:33:00 <colin-> understood, that aligns with where we thought those were. and johnsom has helpfully shared the relevant links to what was most recently done on active/active
16:33:04 <johnsom> Ha, we have a working lxd proof of concept (if you turn all of the container security off)
16:33:26 <rm_work> what about with zun?
16:33:37 <colin-> nova-lxd i'm guessing
16:33:44 <colin-> that sounds pretty cool
16:33:48 <johnsom> Yes, with nova-lxd.
16:34:08 <johnsom> #link https://review.opendev.org/636066
16:34:10 <johnsom> and
16:34:17 <johnsom> #link https://review.opendev.org/636069
16:34:56 <colin-> thanks will check those out!
16:35:08 <rm_work> cool
16:35:14 <johnsom> There is no way I would actually use that stuff though.  It is a bit messy
16:35:51 <rm_work> ok, should we move to open discussion then? did we cover this adequately?
16:36:01 <johnsom> +1 thanks for the feedback!
16:36:04 <colin-> we did for my part ty
16:36:20 <rm_work> ok
16:36:24 <rm_work> #topic Open Discussion
16:36:55 <rm_work> I have something for this, but we can resume your thing first
16:38:21 <rm_work> johnsom?
16:38:28 <johnsom> My biggest concern with starting to add --force flags is it is overriding our object locking/ownership system. The only way that command could be used safely is if the operator checks that none of the controllers are actively working on the object before using the --force.
16:38:47 <johnsom> For example, HM failovers will set PENDING_*.
16:39:05 <johnsom> It also still requires operator intervention.
16:39:58 <ataraday_> My concern is that right now the operator goes into the db and manually sets the status to delete, I think we should try to avoid this as much as possible...
16:40:05 <johnsom> Operators also can do this operation against the DB if it is really necessary. It seems like --force makes it too easy to abuse.
16:40:23 <cgoncalves> IIRC it all comes down to not trusting our admin users
16:40:41 <rm_work> well, would --force be allowed for admin only, or anyone?
16:40:45 <rm_work> hopefully admin?
16:40:53 <ataraday_> why not trust them? Admin should know stuff
16:41:03 <johnsom> The cases we know of that can lead to PENDING_ in a stuck state are "kill -9" or loss of a controller mid-flow.
16:41:05 <rm_work> yeah ok just re-read the commit message
16:41:05 <cgoncalves> ataraday_, "should" is the keyword ;)
16:41:13 <colin-> ataraday_: agreed, we cannot sustain a model where our operators are required to update octavia's tables with any regularity
16:41:30 <colin-> it's too risky
16:41:30 <rm_work> colin-: is this happening with regularity?!
16:41:39 <johnsom> That is why I approached it as, have the controller look for things it owns on startup and correct the status for those.
16:41:50 <rm_work> admins are doing `kill -9` to octavia services with regularity? lol
16:42:05 <colin-> no, but that is only one way to induce the symptom the change describes
16:42:13 <rm_work> what are other ways?
16:42:15 <colin-> at least, we're not killing processes that way
16:42:21 <johnsom> Yeah, I think the common case is mis-configured systemd service definitions where systemd gives up and kill -9
16:42:30 <rm_work> hmm
16:42:44 <ataraday_> I added a check for time - how long the loadbalancer has not been updated, isn't this enough to be safe?
16:42:46 <cgoncalves> rm_work, not 'kill -9' per se. I think it's more cloud updates, controller reboots (scheduled or not)
16:43:01 <colin-> agree with cgoncalves
16:43:17 <johnsom> Still a reboot should be graceful shutdown if the systemd service is configured correctly.
16:43:24 <rm_work> personally I think ataraday_'s approach does make sense, but
16:43:39 <rm_work> do we need either of these if we think we can complete real job resumes within the cycle?
16:44:09 <johnsom> No, but that is a lot of work.
16:44:16 <rm_work> wouldn't it be better to just prioritize doing that, and not add a bunch of other extra complexity?
16:44:16 <colin-> certainly the need for it is less (gone?) if we guarantee they can't get into that state ever
16:45:32 <rm_work> of course, I was also a fan of an admin "sync" type command in the past, so
16:45:34 <johnsom> There is also a failure point with oslo messaging/rabbit. If those are not setup correctly and the queue gets lost, we could end up with PENDING_*.  That is another discussion.
16:45:42 <cgoncalves> personally I'd like to have an interim solution that is backportable but I understand if it cannot be done. if we agree it's not, I'd run to document the steps that should be taken to prevent killing flows unnecessarily (pre-maintenance window actions) and corrective actions if too late/unexpected controller reboots
16:45:52 <colin-> oh i hadn't considered that but yeah
16:46:01 <colin-> there'd be instructions octavia thinks are eventually going to complete still
16:46:04 <ataraday_> I think we need --force, as the operator should not go into the db :) I don't think it will be finished in one cycle, at least only in experimental mode
16:46:09 <johnsom> Yeah, neither of these are backportable solutions.
16:47:20 <johnsom> I might be convinced to setup a periodic job as well. That would be backport-able.  We just need to figure out the right conditions for it.
16:47:40 <rm_work> when i brought up sync in the past, my argument had been "we're not going to catch all the cases, why can't we have a way to fix the stuff we can't predict" so I'd feel hypocritical arguing against ataraday_ here
16:48:00 <johnsom> What we really want is a consensus protocol such that all the controllers can say "I'm not working on that  object".
16:48:19 <rm_work> yes, that would work if we could figure out a way to do it
16:48:35 <rm_work> would people be ok making this a topic for the PTG for a larger discussion? it's only a week away
16:48:46 <colin-> yes, it seems complex enough
16:48:51 <rm_work> it's a small delay but we could get a more well researched and agreed consensus there
16:48:56 <johnsom> Sadly ataraday_ Can't join us at the PTG
16:49:01 <rm_work> remotely?
16:49:10 <rm_work> we usually set up video-conf
16:50:27 <johnsom> BTW, I will give my normal caveat, I am not a hard now on the --force thing. I just want us to consider all the ramifications of going down that path before we add it and can't remove it.
16:50:39 <johnsom> s/now/no/g
16:51:22 <rm_work> yes, the "can't remove it" thing is my main worry
16:51:31 <cgoncalves> we could do that, yes. question is if ataraday_ would be available. since there's no agenda schedule fixed yet, we could also consider ataraday_'s timezone
16:51:39 <rm_work> we have to be fully committed to API changes
16:51:48 <rm_work> yep
16:51:53 <johnsom> Yes, I can schedule to people's availability
16:52:06 <rm_work> ataraday_: can you remotely attend part of the PTG?
16:52:33 <rm_work> whelp
16:53:09 <rm_work> i think we say "hopefully" and plan to discuss it at the PTG
16:53:45 <rm_work> or is ataraday present
16:54:02 <johnsom> I think she is in Europe, so I will try to put in an early morning timeslot if we don't hear otherwise.
16:54:11 <cgoncalves> +1
16:54:11 <rm_work> ok, sounds good to me
16:54:35 <rm_work> I had a topic too, though we're a bit short on time
16:55:28 <rm_work> We're discussing internally about adding support (upstream-first) for Athenz authentication for amphora/control-plane communications (to replace the local cert generation)
16:55:45 <rm_work> #link https://www.athenz.io/
16:56:04 <ataraday_> sorry, I got disconnected
16:56:25 <rm_work> I actually don't know enough about it personally yet to know if it'd be as simple as another driver like the Anchor thing, or if we'd need to modify things significantly
16:56:38 <johnsom> rm_work would it replace the local cert capability, or just be another driver option?
16:56:41 <rm_work> but it's on my roadmap to find out, and probably I'll have more for discussion at the PTG
16:57:04 <rm_work> I don't think it could actually replace local-cert-gen
16:57:14 <johnsom> ataraday_ We asked if there was a chance we could video conference you in for the discussion at the PTG? Is there a timeslot that would work for you?
16:57:23 <rm_work> since that is the most basic and we'd want that for simple deployments and testing stuff regardless
16:57:33 <rm_work> wouldn't want to make athenz a hard requirement
16:57:41 <rm_work> but it might be a good optional thing
16:57:46 <cgoncalves> agreed
16:57:55 <johnsom> Yeah, I think as another driver option I don't see any reason why not.
16:58:15 <rm_work> I was curious if anyone else uses any kind of in-house sshca system
16:58:35 <johnsom> It's an Apache license, so not a concern there either
16:58:52 <rm_work> we use this internally at Verizon Media (formerly known as Oath, formerly known as Yahoo), as it was born here
16:59:40 <rm_work> but this is a fairly common thing (authz / authn) and if there are other similar things, it might be worth trying to make it a generic pattern
16:59:41 <johnsom> We still need to "retire" the anchor stuff from the octavia repo
16:59:54 <rm_work> yeah, I can do that around the same time
17:00:04 <rm_work> we can discuss this one more at the PTG also
17:00:18 <rm_work> well, we're out of time
17:00:24 <rm_work> thanks for coming everyone!
17:00:24 <ataraday_> I'm in UTC+4 zone, sure I will try to connect for PTG discussion - just set up time and send me a link :)
17:00:36 <rm_work> cool, thanks ataraday_ :)
17:00:42 <rm_work> #endmeeting