20:00:02 <johnsom> #startmeeting Octavia
20:00:03 <openstack> Meeting started Wed Dec 13 20:00:02 2017 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:06 <openstack> The meeting name has been set to 'octavia'
20:00:13 <johnsom> Hi folks!
20:00:26 <xgerman_> o/
20:00:29 <johnsom> Another fine week in OpenStack land
20:00:29 <nmagnezi> o/
20:00:32 <longstaff> hi
20:00:48 <cgoncalves> hey
20:00:50 <johnsom> (well, ok, zuul had its issues...)
20:00:59 <johnsom> #topic Announcements
20:01:00 <rm_mobile> o/
20:01:09 <johnsom> I have two items
20:01:20 <johnsom> One, queens MS2 did finally release
20:01:31 <johnsom> I also released a new python-octaviaclient
20:01:36 <johnsom> So, good stuff there.
20:02:05 <johnsom> Also I wanted to make you aware of a lively discussion on the mailing list about changing the release cycle of OpenStack to once per year.
20:02:12 <johnsom> #link http://lists.openstack.org/pipermail/openstack-dev/2017-December/125473.html
20:02:47 <xgerman_> guess we are maturing…
20:02:49 <johnsom> If you have any feeling about the PTG meetings or release timing, feel free to jump into the conversation
20:02:56 <nmagnezi> johnsom, what do you think about this?
20:03:10 <johnsom> I have mixed feelings.
20:04:23 <johnsom> I think overall it will slow innovation on the project. I worry about trying to do stable branch bug backports from a master that has up to a year of feature work. I do think we are on a bit of a treadmill with the PTGs and PTL elections, etc. Finally, the community goals do hurt the projects on the six-month cycle since some require a lot of work.
20:05:22 <johnsom> I think it may mean that the PTL needs to take on more management work to keep the project on track.  Maybe more defined sprints or team milestones.
20:05:31 <johnsom> Just some top of head thoughts.
20:05:42 <rm_mobile> Yeah... Some of the stuff is too fast, but... Longer cycles mean bigger releases, which does not help with upgrades
20:05:43 <johnsom> Any other comments?
20:05:43 <nmagnezi> yeah. i wonder how exactly we shift to 1-year planning, feature-wise
20:06:03 <johnsom> Yep
20:06:06 <xgerman_> one PTG per year clearly makes attending the summits more important
20:06:27 <johnsom> Yeah, on my last read of the list the proposal was one PTG, still two summits
20:06:52 <johnsom> Ok, any other announcements?
20:06:55 <nmagnezi> xgerman_, so virtual PTGs? :)
20:07:03 <xgerman_> midcycles?
20:07:04 <johnsom> nmagnezi I thought about that
20:07:21 <nmagnezi> could be tricky, timezone-wise
20:07:31 <nmagnezi> but I'd be in favor of that.
20:07:58 <johnsom> #topic Brief progress reports / bugs needing review
20:08:08 <johnsom> Ok, moving along to try to give time for other topics
20:08:15 <rm_mobile> Do we need to pick up anything from last week?
20:08:25 <johnsom> I have been focused on getting reviews done and updates to the specs in flight.
20:08:33 <johnsom> rm_mobile Yes, I have a long list
20:08:45 <rm_mobile> Lol k
20:08:48 <johnsom> 5 topics including the carry over
20:09:10 <johnsom> I hope we can merge QoS this week.
20:10:02 <johnsom> Provider driver spec is looking good, but we really would like feedback on some topics.  Specifically on my mind: should the Octavia API create the VIP port and pass it to the driver, or should the driver be responsible for creating the VIP port?
20:10:21 <johnsom> Please comment on that topic if you have a driver in your future.
20:10:40 <johnsom> The UDP spec is also coming along nicely.  Feedback welcome.
20:10:56 <johnsom> #link https://review.openstack.org/509957
20:11:03 <nmagnezi> I commented about that (VIP port create). would be happy to elaborate more if needed
20:11:13 <johnsom> #link https://review.openstack.org/503606
20:11:31 <johnsom> Any other progress updates?
20:11:59 <johnsom> #topic (rm_work / dayou) Element Flag for Disabling DHCP
20:12:07 <johnsom> #link https://review.openstack.org/#/c/520590/
20:12:17 <johnsom> We didn't get to this last week, so I put it at the top this week
20:12:20 <rm_mobile> Getting to a real keyboard for this lol
20:12:46 <johnsom> My understanding was that the issue dayou was having can be resolved without this.
20:12:52 <rm_work> o/
20:12:59 <rm_work> hmmm can it?
20:13:13 <rm_work> I mean, the DHCP issue is one that I have as well (just use an internal patch to fix it)
20:13:14 <johnsom> My biggest concern with the patch is issues with cloud-init overriding those changes.
20:13:33 <jniesz> i think the problem with cloud-init is that it doesn't support slaac
20:13:40 <jniesz> expects dhcp or static
20:14:02 <johnsom> jniesz No, it uses slaac.  That is all I have in my lab
20:14:09 <rm_work> my issue is that our cloud HAS NO DHCP (i believe we're not alone in this either), and images stall on the dhcp step for ipv4 even
20:14:31 <rm_work> maybe there is a better way to fix this that I'm missing?
20:14:32 <jniesz> johnsom it wasn't in the source code, i didn't see anything for it to write out an 'inet6 auto' line
20:14:38 <xgerman_> but then cloud-init should set it as static?
20:14:42 <rm_work> but, it seems that it stalls on dhcp even before cloud-init runs to change it
20:14:54 <rm_work> because cloud-init DOES have a static config to send it
20:15:04 <jniesz> static != slaac
20:15:08 <johnsom> rm_work That is a different issue
20:15:12 <rm_work> right
20:15:18 <rm_work> but this patch was designed to fix both
20:15:24 <jniesz> yea
20:16:00 <rm_work> i guess if it's just my issue, I can just stick to my current internal patch that ... well, does exactly this (which works)
20:16:01 <johnsom> It's been a while since I booted a v6 enabled amp, so I'm not sure how cloud init handles it.  I know we have code in for the v6 auto case
20:16:19 <jniesz> johnsom: is that dhcpv6 though?
20:16:30 <johnsom> No, I don't have dhcpv6
20:16:40 <nmagnezi> johnsom, IIRC the jinja templates handle a static ipv6 as well. but I didn't test it
20:17:01 <rm_work> this is what I apply internally: http://paste.openstack.org/show/628898/
20:17:17 <xgerman_> so rm_work says it comes up eventually after dhcp gives up
20:17:26 <rm_work> yes, like 5 minutes >_>
20:17:40 <rm_work> which is way outside the timeout for my cloud
20:17:41 <johnsom> So, I think we should break this into two stories. 1. for the dhcp delay 2. for booting v6 only amps.
20:17:52 <xgerman_> +1
20:18:25 <rm_work> k, just figured it was a workable solution for both, but yeah if his issue needs to be solved in a different way, then this would just be for me
20:18:52 <rm_work> i just pushed this because at the time he was duplicating my work for dhcp disabling
20:18:58 <jniesz> yea this would work for both
20:19:01 <johnsom> The concern is that hacking on the network scripts that cloud-init manages/overwrites is worrisome, especially if we don't explicitly tell cloud-init that we are managing those.
20:19:18 <johnsom> Like, that paste, doesn't a reboot overwrite that?
20:19:19 <xgerman_> +1
20:19:40 <rm_work> johnsom: it doesn't seem to <_< though a reboot causes a failover and recycle for me
20:19:45 <rm_work> so it's hard to tell
20:19:52 <rm_work> but i mean, it doesn't seem to be overwritten by cloudinit
20:19:54 <jniesz> cloud-init assumes wrong things : )
20:19:57 <rm_work> looking at existing amps
20:20:11 <johnsom> I'm also not as familiar with cloud-init under CentOS as I am with Ubuntu
20:20:36 <xgerman_> nmagnezi?
20:20:44 <nmagnezi> would have to check
20:20:49 <johnsom> cloud-init does what neutron and nova tell it.  That was the solution we came up with for your case jniesz
20:21:02 <jniesz> yea, but it doesn't work
20:21:06 <jniesz> for slaac
20:21:21 <jniesz> cloud init needs to be enhanced to add that
20:21:28 <jniesz> and then would have to wait for distros to add new cloud init
20:21:50 <johnsom> jniesz Something doesn't jibe with that.  All I have is slaac here and I had working IPv6, so ...
20:22:32 <jniesz> it looks for dhcp
20:22:38 <jniesz> otherwise assumes static
20:22:44 <johnsom> The other part I didn't like about the script is that it is distro-specific, whereas cloud-init handles the translations for us.
20:22:45 <jniesz> in the network metadata
20:23:12 <rm_work> oh actually yes, looked again, it IS overwritten by cloud-init :( but cloud-init does this AFTER the boot step that tries to DHCP an address
20:23:34 <rm_work> so yeah probably on reboot it would have issues :(
20:23:35 <johnsom> So, my ask, let's open stories that characterize the use cases, then we bind patches to those and work from there.
20:23:44 <rm_work> is there some way for me to send these options during cloud-init boot?
20:23:53 <johnsom> rm_work See, I'm not totally crazy... grin
20:23:59 <rm_work> but ...
20:24:04 <rm_work> see this is a problem still
20:24:08 <rm_work> i still need my patch
20:24:10 <johnsom> Yes, there are much better ways.
20:24:12 <rm_work> just ALSO need to update cloud-init
20:24:23 <rm_work> because again, cloud-init hasn't replaced it by the time it causes a problem
20:24:27 <rm_work> it replaces it AFTER
20:24:32 <rm_work> so i need both fixes
20:24:32 <johnsom> So, let's open stories and work on solutions for those use cases
20:24:35 <rm_work> k
20:25:37 <johnsom> #topic (rm_work / BAR_RH) Specify management IP addresses per amphora
20:25:43 <johnsom> #link https://review.openstack.org/#/c/505158/
20:26:22 <johnsom> So we discussed this last week.  I think there was concern about the approach.  We talked about the update config API extension that we have wanted for a while.
20:26:28 <johnsom> Where did we leave this?
20:26:50 <rm_work> Was this the one where nmagnezi and I needed to talk after?
20:27:04 <nmagnezi> yes. and we didn't catch each other here in the past week
20:27:16 <rm_work> T_T
20:27:30 <johnsom> lol
20:27:46 <johnsom> So should we table this another week?
20:27:58 <rm_work> Summary I think was: Instead of futzing with the flows, we just need to leave the binding as-is on boot, and then on first connect we update the config to point to the single management address
20:27:59 <nmagnezi> I must admit that the agent restart per API call sounds a bit like a hack to me, so I wanted to discuss this a bit more
20:28:16 <rm_work> nmagnezi: not every API call... just the update-agent-config call
20:28:27 <rm_work> which we wanted for a while anyway
20:28:32 <rm_work> for things like updating the HM list
20:28:42 <nmagnezi> why did we want it to begin with?
20:28:49 <johnsom> We can use the oslo reload stuff too, no need for a full restart
20:28:56 <xgerman_> +1
20:29:14 <johnsom> The HM IP:Port list is my #1
20:29:25 <rm_work> same
20:29:30 <nmagnezi> well, I'm open for discussion about this. if there's a nice way to achieve this I'm all ears
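(A minimal sketch of the oslo.config reload approach johnsom mentions, assuming the amphora agent registers the health manager list as a mutable option; the option name and group here are illustrative, not necessarily the actual Octavia config:)

    import signal
    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts([
        # mutable=True lets the value be re-read without restarting the process
        cfg.ListOpt('controller_ip_port_list', default=[], mutable=True,
                    help='Health manager IP:port pairs the agent reports to'),
    ], group='health_manager')

    def _reload(signum, frame):
        # Re-reads the config files and applies changes to mutable options only
        CONF.mutate_config_files()

    # A SIGHUP sent after the controller pushes an updated config file triggers
    # an in-place reload rather than a full agent restart.
    signal.signal(signal.SIGHUP, _reload)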
20:29:44 <johnsom> nmagnezi Why did we want the IP fixed?  It's a very old bug from before we added the namespace
20:30:00 <xgerman_> if you ever hack the system you can then sink the whole fleet by sending bogus configs
20:30:34 <rm_work> ?
20:30:39 <rm_work> how
20:30:39 <nmagnezi> johnsom, because it might be risky to bind to *
20:30:48 <rm_work> nmagnezi: that's how it CURRENTLY works
20:30:50 <nmagnezi> johnsom, the agent runs on the root namespace
20:30:51 <johnsom> nmagnezi Correct
20:31:11 <rm_work> nmagnezi: and what i'm saying is, on the *initial* call, we move it to the correct IP
20:31:18 <johnsom> xgerman_ ??? what?  no
20:31:36 <johnsom> It uses the same two way SSL all of our config work happens over
20:31:43 <rm_work> nmagnezi: and if the initial call fails (someone somehow beat us to it???) we fail to finish amp creation and we trash it anyway
20:31:43 <xgerman_> if we allow configs being changed that is a theoretical possibility
20:31:45 <johnsom> So, if you get that far you are game over anyway
20:32:53 <johnsom> xgerman_ We are talking about the amphora-agent config file inside the amp, not controllers
20:32:54 <nmagnezi> rm_work, let's discuss this further. I'm open for discussion about this, as I said.
20:32:55 <nmagnezi> :)
20:33:02 <rm_work> lolk
20:33:06 <johnsom> Ok
20:33:06 <rm_work> so, next year? :P
20:33:23 <johnsom> #topic (sanfern / rm_work) Changes to DB schema for VRRP ports
20:33:24 <nmagnezi> rm_mobile, ¯\_(ツ)_/¯
20:33:31 <johnsom> #link https://review.openstack.org/#/c/521138/
20:33:44 <johnsom> So next up was the rename patch.
20:33:56 <rm_work> I guess all I wanted to know was, is everyone else OK with us changing field names in a DB schema
20:34:09 <johnsom> I think some work has occurred over the last week to resolve some of the backwards compatibility issues.
20:34:11 <rm_work> as an upgrade it is scary to me, since it absolutely kills zero-downtime
20:34:18 <rm_work> hmm
20:34:32 <jniesz> it doesn't bring the amps down, just the control plane
20:34:42 <rm_work> yes
20:34:44 <jniesz> I know sanfern added a fix for amp agents
20:34:50 <rm_work> just a control-plane outage
20:34:53 <jniesz> so they wouldn't all have to be upgraded
20:34:59 <rm_work> yeah that's good :P
20:35:09 <rm_work> i hadn't even noticed that part yet, that would have been bad
20:35:14 <johnsom> Right, my stance has been we have not asserted any of the upgrade support tags yet. (nor do we have a gate)
20:35:21 <rm_work> i just saw the migration and immediately stopped
20:35:43 <rm_work> so as i keep saying, if everyone else is OK with this, then I am fine
20:36:04 <xgerman_> yeah, I think we can do a rename/update — I also think we are still in the window to change the /octavia/amphora endpoint
20:36:15 <rm_work> yes
20:36:21 <johnsom> Yeah, I agree on that
20:36:53 <nmagnezi> I have to abstain once more. Since we have yet to ship Octavia, this doesn't affect us in any way
20:37:05 <nmagnezi> so.. If it's in I'm fine with it.
20:37:05 <rm_work> so I just wanted a vote on "Can we change field names in our existing DB schema"
20:37:18 <rm_work> I'm concerned about my deployment, but also others
20:37:20 * johnsom gives nmagnezi a glare of shame....  grin
20:37:31 <nmagnezi> johnsom, lol, why? :-)
20:37:40 <johnsom> You haven't shipped
20:37:42 <cgoncalves> :)
20:37:48 <rm_work> since one of the things we get yelled at about during project update sessions every summit is that our upgrades are not clean (though usually it's around the APIs)
20:37:53 <nmagnezi> johnsom, tripleO.. soon enough!
20:38:54 <johnsom> rm_work I hear you. We should be working towards clean upgrades. My question to you is if all of the glue code required to do a smooth transition to reasonably sane names is worth it.
20:39:56 <johnsom> I mean really it's a migration that adds the new column, copies the data, and maintains both for some deprecation cycle.
20:40:14 <rm_work> i absolutely would love to have these column names fixed to be less dumb
20:40:29 <johnsom> Long term I would hate to have one set of terms in the models and a different in the DB
20:40:30 <rm_work> i am just concerned about whether we're supposed to be doing this kind of change
20:40:39 <rm_work> yeah
20:41:10 <johnsom> Well, from an OpenStack perspective, if we have not asserted the upgrade tags (we have not), a downtime upgrade is fair game
20:41:49 <johnsom> As long as we don't require the amps to go down, I think I'm ok with it. (pending reviewing the full patch again)
20:42:03 <rm_work> kk
20:42:12 <nmagnezi> #link https://review.openstack.org/#/c/521138/
20:42:15 <rm_work> yeah that's why i asked we vote not on the patch but on the core concept
20:42:32 <rm_work> "Can we change field names in our existing DB schema"
20:42:39 <rm_work> seems like everyone is thinking "yes"
20:42:40 <johnsom> But you, jniesz and mnaser have this running in some form of production, so your voices matter here
20:43:03 <rm_work> for me it matters a lot, since this upgrade will happen for me just about the moment it merges
20:43:06 <jniesz> when we upgrade openstack regions we have to schedule control plane downtime anyways
20:43:09 <rm_work> since I run master, deployed weekly <_<
20:43:27 <nmagnezi> rm_work, master? really?
20:43:29 <rm_work> yes
20:43:33 <nmagnezi> man..
20:43:41 <johnsom> Yes, see nmagnezi you should ship...  grin
20:43:47 <nmagnezi> lol.
20:43:51 <johnsom> grin
20:43:57 * rm_work laughs maniacally
20:44:08 <jniesz> yea, I would like to get us to the point where we are deploying master
20:44:22 <rm_work> jniesz: it isn't that hard actually IME
20:44:30 <rm_work> just switch what tag you pull :P
20:44:52 <rm_work> hopefully existing CI+tests takes care of issues
20:44:52 <jniesz> yea, I think that's something we will look at after the pike upgrades
20:45:08 <johnsom> Ok, so I hear an operator wanting an upgrade path.  We should try to support them.
20:45:10 <rm_work> yeah since Octavia is cloud-version-agnostic
20:45:18 <johnsom> rm_work are you willing to help with the coding for the patch?
20:45:30 <rm_work> the coding for this patch?
20:45:34 <johnsom> Yep
20:45:39 <rm_work> I mean... how do we even do this
20:45:46 <mnaser> thanks for highlighting me
20:45:48 <rm_work> i thought it was basically "rename or don't"
20:46:03 <mnaser> my comment is: control plane downtime is acceptable, because that sort of thing can be scheduled
20:46:05 <johnsom> mnaser Are you caught up? Do you have an opinion here?
20:46:15 <rm_work> yeah I am leaning towards that as well honestly
20:46:26 <mnaser> also, api downtime in this case doesn't really cause that many issues, it's not as critically hit as a service like nova for example
20:46:27 <rm_work> but I was literally not sure if this was a smart idea politically
20:46:37 <rm_work> but johnsom asserts we have not tagged ourselves with anything that would cause a problem
20:46:38 <rm_work> so
20:46:50 <mnaser> to be honest, it is very rare that you can get downtime-less upgrades, even with big projects
20:46:55 <johnsom> I think we would have to do the migration to have both columns in parallel so the models can continue using the old until upgraded.
20:47:05 <rm_work> ah yes.
20:47:06 <rm_work> well
20:47:18 <mnaser> johnsom: brings a good point, unless you make it very clear that the old services must be shut down when starting a new one
20:47:29 <rm_work> honestly -- i think if we don't have any big political issues with this -- we just do it, and make a big flashy release note
20:47:42 <johnsom> Right, this would have rich release notes for upgrade....
20:47:51 <jniesz> +1
20:47:59 <mnaser> we run 3 replicas of octavia but our upgrade procedure usually is: turn off all replicas except one, upgrade this single replica, when everything is back up and working properly, start up the rest
20:48:42 <mnaser> perhaps add some sort of failsafe to prevent it from starting up with a newer database and failing unpredictably?
20:48:51 <johnsom> mnaser This would be a "shutdown control plane", "run DB migration", "upgrade control plane", "start control plane" type of release
20:49:10 <xgerman_> you forgot backup DB
20:49:14 <mnaser> perhaps if db_migration_version > release_migration_version: refuse_to_start()
20:49:20 <johnsom> As it is written now.
20:49:44 <johnsom> mnaser Hmm, we don't have that today
20:50:03 <xgerman_> yeah, we need that to not corrupt the DB willingly
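(A rough sketch of the failsafe mnaser describes, assuming the check is done against alembic at service startup; since alembic revisions are not numerically ordered, the practical test is whether the DB revision matches the head this release ships with. Names and paths are illustrative:)

    import sqlalchemy as sa
    from alembic import config as alembic_config
    from alembic import script as alembic_script
    from alembic.runtime import migration

    def assert_db_revision_matches(db_url, alembic_ini='alembic.ini'):
        cfg = alembic_config.Config(alembic_ini)
        expected = alembic_script.ScriptDirectory.from_config(cfg).get_current_head()
        engine = sa.create_engine(db_url)
        with engine.connect() as conn:
            actual = migration.MigrationContext.configure(conn).get_current_revision()
        if actual != expected:
            # The schema is newer (or older) than this release understands
            raise RuntimeError('DB revision %s does not match expected head %s; '
                               'refusing to start' % (actual, expected))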
20:50:15 <johnsom> Currently our upgrade is "db migration", then update control plane
20:50:35 <johnsom> This is the first patch that would not work that way
20:51:24 <johnsom> So, let me ask again, are there folks willing to help on this patch to make it a smooth upgrade?
20:52:08 <mnaser> im unable to commit any effort into that so for our side, it will likely be: turn off all replicas, upgrade packages of 1 replica, sync db, start 1 replica, check if all ok, start up other replicas after upgrading
20:52:19 <mnaser> as an operator im perfectly content with that procedure (its what we used for many others)
20:52:30 <rm_work> i'm pretty much booked for this week and then i'm out essentially until January
20:52:31 <rm_work> so
20:52:35 <rm_work> ....
20:52:56 <rm_work> and i'm ok with this upgrade procedure
20:53:01 <rm_work> personally
20:53:33 <rm_work> my concern was that it's the ... like, 4 of us that deploy AND are active enough in the project to be talking in the weekly meetings
20:53:37 <johnsom> Ok, so I think I hear all three stakeholders that are present saying they are ok with this upgrade procedure. Correct?
20:53:39 <rm_work> that are saying it's ok
20:53:52 <rm_work> people who aren't this active but are deploying octavia ... those are who I worry about
20:54:03 <johnsom> Understand.
20:54:04 <rm_work> and i wonder about what kind of support load this will cause
20:54:18 <johnsom> I guess  the last option is to sit on this until someone can work on it
20:54:22 <rm_work> but ... i also don't have a lot of time to devote to this patch, so
20:54:34 <rm_work> we COULD patch just the amphora API
20:54:40 <rm_work> so we get it before it releases
20:54:43 <johnsom> I don't think that is a good option given it's scope
20:55:10 <rm_work> and then fix the actual DB later
20:55:11 <rm_work> the only place it's exposed is that API
20:55:14 <nmagnezi> rm_work, that's a valid point. the way I see it, the best option to ensure we don't break existing deployments is with alembic migrations
20:55:19 <rm_work> so if we're looking like it won't make it to Queens, it's simple to update just those fields
20:55:44 <johnsom> It's the internals that are important for L3 Act/Act work
20:55:48 <rm_work> yeah ...
20:55:57 <rm_work> alright, maybe the best approach IS to "add columns"
20:56:04 <rm_work> and duplicate the writes?
20:56:10 <rm_work> these columns aren't large
20:56:16 <rm_work> so it's extra data but not tooo much
20:56:17 <rm_work> right?
20:56:22 <johnsom> Yeah, that is the work I described.
20:56:28 <jniesz> I think that would be acceptable
20:56:34 <johnsom> Then at some later point drop the duplicates
20:56:38 <rm_work> yeah
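(A hedged sketch of the "add columns, duplicate the writes" approach as an alembic migration; the table and column names here are purely illustrative, not the actual rename in the patch. The application would write to both columns until the old one is dropped in a later migration:)

    from alembic import op
    import sqlalchemy as sa

    def upgrade():
        # Add the new column alongside the old one instead of renaming in place
        op.add_column('amphora',
                      sa.Column('new_name', sa.String(36), nullable=True))
        # Backfill so both columns are readable during the deprecation window
        op.execute('UPDATE amphora SET new_name = old_name')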
20:57:00 <rm_work> when is the last day we could do this
20:57:04 <johnsom> We have four minutes. I would like to close on this
20:57:11 <rm_work> Q-3?
20:57:15 <johnsom> Yes
20:57:16 <rm_work> is it a "feature"?
20:57:25 <rm_work> k and that's ... mid-Jan?
20:57:36 <johnsom> Yes.
20:57:43 <rm_work> k... I just won't have time until January
20:57:50 <rm_work> but THEN maybe I could look at it ... MAYBE.
20:57:52 <rm_work> if it's not done yet
20:57:59 <johnsom> Ok, so it's decided to make it upgrade compatible.  We can figure out how/when it gets done later.
20:58:12 <rm_work> k
20:58:15 <rm_work> what DIDN'T we get to
20:58:31 <johnsom> Interface driver support and Members API Improvements Proposal
20:58:44 <nmagnezi> bar is not around
20:59:01 <nmagnezi> and.. we have like a min for it..
20:59:10 <cgoncalves> yeah...
20:59:10 <rm_work> yeah i was mostly just curious
20:59:18 <johnsom> Right.
20:59:35 <johnsom> Ok, thanks for a lively meeting today.  We got through some of it.
20:59:52 <johnsom> cgoncalves If you can hang around for a minute we can talk about your topic
21:00:03 <rm_work> yeah i'm good for a few min too
21:00:06 <johnsom> #endmeeting