15:00:49 <ihrachys> #startmeeting neutron_upgrades
15:00:50 <openstack> Meeting started Mon Feb 13 15:00:49 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 <openstack> The meeting name has been set to 'neutron_upgrades'
15:01:01 <korzen> hello
15:01:10 <sindhu> hi
15:02:04 <ihrachys> #topic Announcements
15:02:21 <ihrachys> 1. stable/ocata is created, master is open for merges
15:03:00 <ihrachys> 2. PTG is next week, the final agenda is currently being put together at https://etherpad.openstack.org/p/neutron-ptg-pike-final
15:03:04 <electrocucaracha> o/
15:03:12 <ihrachys> there is a section for upgrades at line 27
15:03:47 <ihrachys> kevinbenton was going to check with the foundation about room setup, to see if we may/want to split discussions, or we will need to do everything serialized
15:04:05 <ihrachys> I assume Kevin will give us an update at the start of this week
15:04:38 <electrocucaracha> ihrachys: I added a couple of things to korzen's etherpad
15:04:53 <ihrachys> yeah, I am meant to sync that back into the -final
15:05:12 <ihrachys> #action ihrachys to sync -upgrades PTG etherpad back into -final
15:06:20 <ihrachys> #topic Partial Multinode Grenade
15:06:27 <sindhu> ihrachys: Is there any way people who are not attending the PTG can participate remotely?
15:06:33 <ihrachys> #undo
15:06:35 <openstack> Removing item from minutes: #topic Partial Multinode Grenade
15:07:12 <ihrachys> sindhu: practice shows not really. you may talk to PTL or organizers about your options though.
15:07:17 <manjeets> o/
15:07:51 <sindhu> ihrachys: Okay, thanks :)
15:08:04 <ihrachys> #topic Partial Multinode Grenade
15:08:30 <ihrachys> korzen: any updates about your partial setup of newton vs. ocata on k8s?
15:09:38 <korzen> ihrachys, sad news: fuel-ccp was not playing nice with me :( I was fixing new failures in the setup but was not able to test the whole use case :(
15:10:00 <korzen> I should be able to get you the info by tomorrow
15:10:08 <ihrachys> huh not up to the hype? ;)
15:10:20 <ihrachys> ok cool, thanks for pushing
15:10:25 <electrocucaracha> ihrachys: is our goal to add support for rolling upgrades or zero downtime? we were discussing the other day that the current grenade implementation restarts all the services, which is not what we want for zero downtime
15:10:27 <korzen> k8s is not up to the stability level :P
15:11:01 <ihrachys> electrocucaracha: it's both rolling upgrades and zero API downtime. we have some form of rolling upgrades.
15:11:13 <ihrachys> rolling upgrade only means you can upgrade services one by one
15:11:29 <ihrachys> instead of bringing down the whole cluster
15:11:44 <ihrachys> which we have (you can upgrade server without agents)
15:12:17 <electrocucaracha> but we have to upgrade all the server nodes at the same time right?
15:12:34 <manjeets> upgrading one instance of each service at once while keeping others running
15:12:58 <ihrachys> as for the grenade partial linuxbridge job, it's at this point at the same failure rate as the tempest job for that backend, and it's hard to tell whether the failures are specific to grenade or the general oom-killer/libvirt crashing issue that lingers in our gates
15:13:35 <ihrachys> electrocucaracha: yes, we have to, but that's not part of the definition of rolling upgrades as documented by https://governance.openstack.org/tc/reference/tags/assert_supports-rolling-upgrade.html
15:14:00 <ihrachys> "This does not require complete elimination of downtime during upgrades, but rather reducing the scope from “all services” to “some services at a time.” In other words, “restarting all API services together” is a reasonable restriction."
15:15:12 <electrocucaracha> ihrachys: gotcha, thanks
15:15:15 * manjeets thought something like running multiple copies of neutron-server and updating one at a time while running the others on the old version
15:15:45 <ihrachys> manjeets: that's what https://governance.openstack.org/tc/reference/tags/assert_supports-zero-downtime-upgrade.html#tag-assert-supports-zero-downtime-upgrade is for
15:16:03 <ihrachys> "this tag requires services to completely eliminate API downtime of the control plane during the upgrade. In other words, requiring operators to “restart all API services together” is not reasonable under this tag."
15:16:41 <ihrachys> while we are at it, there is another one on top of it all, https://governance.openstack.org/tc/reference/tags/assert_supports-zero-impact-upgrade.html#tag-assert-supports-zero-impact-upgrade
15:16:55 <ihrachys> that last one forbids even performance degradation during upgrade
15:17:25 <electrocucaracha> which nobody has achieved yet, right?
15:17:42 <manjeets> not even the first one?
15:17:58 <ihrachys> I think even zero-api-downtime is no one's achievement
15:18:02 <ihrachys> because it assumes a CI setup
15:18:23 <ihrachys> and that, I believe, is waiting on dolphm to get the gate framework ready for projects to adopt
15:18:30 <electrocucaracha> well, afaik nova is working on that
15:18:48 <ihrachys> speaking of which, dolphm do you need any help with defining the framework? where is the work tracked?
15:21:35 <korzen> nova is working on CI, but are they changing the nova-conductor approach to support running different versions of the conductor at the same time?
15:21:59 <ihrachys> seems like we don't have dolphm around
15:22:03 <korzen> currently, nova requires you to upgrade all of your conductors at the same time
15:22:37 <electrocucaracha> korzen: even though they already have OVO implemented?
15:23:14 <korzen> electrocucaracha, their ovo approach is also different from what we should consider
15:24:02 <korzen> their ovo approach is online data migration on save/get, but they are also removing the old data, marking it with null
15:24:06 <electrocucaracha> ihrachys: dolphm is not in his desk, not sure if he is going to work remotely
15:24:40 <ihrachys> korzen: how does nullifying work with old services reading from there?
15:24:58 <korzen> ihrachys, it doesn't
15:25:13 <korzen> the assumption is that you are using only the new format
15:25:39 <ihrachys> format of what?
15:26:17 <korzen> the new place of data or new format
15:26:21 <korzen> format of data
15:26:39 <korzen> e.g. a blob instead of an int/bool type
15:27:31 <korzen> I'm working on a draft of how we should approach online upgrades, I will publish it tomorrow
15:27:36 <ihrachys> oh so you upgrade conductors and they all know the new format?
15:27:45 <korzen> yes
15:27:49 <ihrachys> meh
15:27:59 <ihrachys> that's kicking the can down the road :)
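For illustration, here is a rough sketch of the nova-style "migrate on save/get" approach korzen describes above, using oslo.versionedobjects. All class, column, and helper names are hypothetical, not actual nova or neutron code; the point is only the shape of the technique: old-format rows are translated when loaded, and rewritten in the new format (with the old column nulled out) when saved.

# Hypothetical sketch of "migrate on save/get" with oslo.versionedobjects;
# the Widget object and its columns are made up for illustration.
from oslo_versionedobjects import base as ovo_base
from oslo_versionedobjects import fields as ovo_fields


@ovo_base.VersionedObjectRegistry.register
class Widget(ovo_base.VersionedObject):
    VERSION = '1.1'

    fields = {
        'id': ovo_fields.UUIDField(),
        # new format: a structured blob replacing the old int column
        'flavor': ovo_fields.DictOfStringsField(nullable=True),
    }

    @staticmethod
    def _from_db_object(context, obj, db_row):
        obj.id = db_row['id']
        if db_row['flavor_blob'] is not None:
            # row already migrated to the new format
            obj.flavor = db_row['flavor_blob']
        else:
            # old row: translate the legacy column on the fly
            obj.flavor = {'legacy_value': str(db_row['flavor_int'])}
        obj._context = context
        obj.obj_reset_changes()
        return obj

    def save(self):
        updates = self.obj_get_changes()
        # always write the new column and null out the old one, matching the
        # "removing old data, marking it with null" behaviour described above
        updates['flavor_blob'] = self.flavor
        updates['flavor_int'] = None
        # db_api.widget_update(self._context, self.id, updates)  # hypothetical helper
        self.obj_reset_changes()

The catch, as discussed above, is that a service which only knows the old format can no longer read a row once its old column has been nulled, which is why this only works once all readers are upgraded.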
15:28:13 <ihrachys> ok let's move on to
15:28:18 <ihrachys> #topic Object implementation
15:28:30 <ihrachys> korzen: you mentioned you will work on a write-up
15:28:41 <ihrachys> I wonder how electrocucaracha's https://review.openstack.org/#/c/432494/1/neutron/db/migration/alembic_migrations/versions/newton/online/__init__.py fits in
15:29:09 <korzen> yes, electrocucaracha great stuff
15:29:25 <electrocucaracha> I just tried to use the nova approach in our plans
15:29:32 <korzen> this is one part of the mechanism, the CLI tool
15:29:47 <korzen> the second part is how to manage old/new format in OVO
15:30:39 <ihrachys> yeah, I was only thinking that from a code organization perspective, it would make sense to store the actual rules in the object modules themselves.
15:31:12 <ihrachys> and another consideration would be, whether we can reuse code between lazy migration (on object update) and this forced online migration
15:32:00 <korzen> in nova and cinder it is done in sql
15:32:30 <korzen> one of the cinder examples does the online data migration without OVO
15:33:05 <korzen> they provide a CLI to do it, and some simple code in the fetching method to port old data to the new format
15:33:43 <korzen> in that simple example, they decided to add a prefix to some ID
15:35:09 <ihrachys> I guess we can look at code reuse between OVO update path and new migration path later, and stick to direct sql for now. as for code organization, I would think that maybe objects would register online migration functions through let's say stevedore, and then the CLI tool will consume those. that would allow external subprojects to plug into the system.
15:35:38 <korzen> ihrachys, that is also a good idea
15:36:01 <ihrachys> electrocucaracha: thanks for working on the patch, cool stuff
15:36:34 <electrocucaracha> ihrachys: no problem, feel free to modify or completely change it
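As a rough illustration of the code-organization idea floated above (objects registering online migration functions through, say, stevedore, with the CLI tool consuming them), here is a hypothetical sketch; the entry-point namespace, function names, and batching logic are assumptions, not existing neutron code:

# Hypothetical sketch: online data migrations discovered via stevedore.
from stevedore import extension


def migrate_port_bindings(context, max_count):
    """Migrate up to max_count old-format rows; return (found, migrated)."""
    # direct SQL or OVO-based migration of a batch of rows would go here
    return 0, 0


# setup.cfg of neutron or a subproject could then declare something like:
# [entry_points]
# neutron.online_data_migrations =
#     port_bindings = neutron.objects.ports:migrate_port_bindings


def run_online_data_migrations(context, batch_size=50):
    mgr = extension.ExtensionManager(
        namespace='neutron.online_data_migrations',
        invoke_on_load=False)
    while True:
        migrated_this_pass = 0
        remaining = 0
        for ext in mgr:
            found, done = ext.plugin(context, batch_size)
            print('%s: %d rows found, %d migrated' % (ext.name, found, done))
            migrated_this_pass += done
            remaining += found - done
        # stop when everything is migrated, or when no progress is being made
        if remaining == 0 or migrated_this_pass == 0:
            break

Because stevedore loads whatever is registered under the namespace, out-of-tree subprojects could plug their own migrations into the same CLI run.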
15:37:22 <ihrachys> korzen: what's the state of the port binding OVO integration (https://review.openstack.org/#/c/407868/)? I see it passed most jobs, so is it ready?
15:37:53 <korzen> yes, but it still has some problems with tempest
15:38:08 <korzen> and grenade multinode dvr
15:38:21 <korzen> I'm not sure where the issue is there
15:38:37 <korzen> the whole problem seems to be with stale data
15:39:02 <korzen> we are updating the port binding in OVO, but port db data has stale info about the binding
15:39:36 <korzen> I'm wondering whether we should first work on port OVO adoption to fully address the issues with port binding
15:40:30 <korzen> so it is not as trivial as it may have looked
15:40:45 <ihrachys> I see
15:43:00 <ihrachys> ok one other patch that I wanted to discuss is
15:43:01 <ihrachys> https://review.openstack.org/#/c/419152
15:43:17 <ihrachys> it adds support for pattern matching for get_* arguments
15:43:29 <ihrachys> I am specifically looking at https://review.openstack.org/#/c/419152/8/neutron/objects/utils.py@34
15:43:47 <ihrachys> and I am not sure if I get intent right
15:44:37 <ihrachys> I thought that if we expose general pattern matching (%XXX%), then we could just as well match against XXX% as well as %XXX?
15:45:43 <ihrachys> I mean, exposing all three would make sense as long as we stick to the existing code in https://review.openstack.org/#/c/419152/8/neutron/db/common_db_mixin.py
15:45:54 <electrocucaracha> but that only applies for strings :S
15:46:01 <ihrachys> but sqlalchemy exposes the .like operator too, so why not stick to it for all cases?
15:46:35 <ihrachys> tonytan4ever: ^
15:46:49 <ihrachys> electrocucaracha: elaborate
15:47:52 <tonytan4ever> ihrachys: sqlalchemy supports pattern matching (%XXX%), XXX% and %XXX as different query APIs though.
15:47:55 <electrocucaracha> well, basically what tonytan4ever is doing is implementing more filters; in this case he is only covering the string scenario
15:48:28 <electrocucaracha> but I am wondering if we have something for filters on dates or strings
15:49:10 <electrocucaracha> I've seen patches which also use those special filters, and they ended up as classmethods
15:49:30 <ihrachys> tonytan4ever: but we have .like too?
15:50:08 <ihrachys> see e.g. http://docs.sqlalchemy.org/en/latest/core/sqlelement.html?highlight=like#sqlalchemy.sql.expression.ColumnElement.like
15:51:19 <tonytan4ever> ihrachys: .like could work, but it still exposes the sql details to the user though, e.g. you need to say .like("%foobar%")
15:51:33 <tonytan4ever> whereas if you use contains, you just need to say .contains("foobar")
15:52:46 <tonytan4ever> True, ".like" covers the cases for startswith and endswith, but those two provide a shorthand for use cases like:
15:53:08 <tonytan4ever> getting all agents whose name starts with "zoneA-agent".
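For reference, the difference being discussed looks roughly like this in SQLAlchemy; the Agent model and session below are made up for illustration and are not the code under review:

# Toy sketch of the SQLAlchemy string operators being discussed.
from sqlalchemy import Column, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Agent(Base):
    __tablename__ = 'agents'
    id = Column(String(36), primary_key=True)
    name = Column(String(255))


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# .like() exposes the SQL wildcard syntax to the caller...
session.query(Agent).filter(Agent.name.like('%zoneA-agent%')).all()
# ...while contains/startswith/endswith hide it behind plain strings
session.query(Agent).filter(Agent.name.contains('zoneA-agent')).all()
session.query(Agent).filter(Agent.name.startswith('zoneA-agent')).all()
session.query(Agent).filter(Agent.name.endswith('-agent')).all()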
15:54:14 <ihrachys> ok I see where you're coming from
15:55:18 <ihrachys> I don't particularly hate the %XXX% notation but I guess you can argue it suggests sql
15:55:53 <ihrachys> ok I will recheck the patch as it stands now
15:56:25 <tonytan4ever> Thanks.
15:56:49 <ihrachys> ok there is little time, so a quick thing...
15:56:59 <ihrachys> #topic Open discussion
15:57:44 <electrocucaracha> ihrachys: I added the topic of db prune rows for the ptg
15:57:58 <ihrachys> I raised the point of memory consumption going nuts in the gate before; folks are still looking at it, some data has been collected, but we struggle to get it for the spawned workers, which are of more interest
15:58:01 <ihrachys> electrocucaracha: thanks
15:58:19 <ihrachys> so re: memory, tl;dr is we are still waiting for results that we could work with
15:58:34 <korzen> ok
15:58:34 <ihrachys> so far we saw lots of sqlalchemy and oslo.messaging objects
15:59:12 * electrocucaracha is still waiting for johndperkins patch
15:59:30 <ihrachys> electrocucaracha: which exactly?
16:00:12 <electrocucaracha> ihrachys: he has something in mind that could help reduce memory, but most likely he has been busy with other stuff
16:00:12 <ihrachys> ok we need to give the room back to openstack
16:00:19 <electrocucaracha> sure
16:00:24 <ihrachys> let's follow up in the team channel
16:00:26 <ihrachys> #endmeeting