15:00:49 <ihrachys> #startmeeting neutron_upgrades
15:00:50 <openstack> Meeting started Mon Feb 13 15:00:49 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 <openstack> The meeting name has been set to 'neutron_upgrades'
15:01:01 <korzen> hello
15:01:10 <sindhu> hi
15:02:04 <ihrachys> #topic Announcements
15:02:21 <ihrachys> 1. stable/ocata is created, master is open for merges
15:03:00 <ihrachys> 2. PTG is next week, the final agenda is currently in produce at https://etherpad.openstack.org/p/neutron-ptg-pike-final
15:03:04 <electrocucaracha> o/
15:03:12 <ihrachys> there is a section for upgrades at line 27
15:03:47 <ihrachys> kevinbenton was going to check with the foundation about room setup, to see if we may/want to split discussions, or we will need to do everything serialized
15:04:05 <ihrachys> I assume Kevin will update start of this week
15:04:38 <electrocucaracha> ihrachys: I added a couple of things in the korzen 's etherpad
15:04:53 <ihrachys> yeah, I am meant to sync that back into the -final
15:05:12 <ihrachys> #action ihrachys to sync -upgrades PTG etherpad back into -final
15:06:20 <ihrachys> #topic Partial Multinode Grenade
15:06:27 <sindhu> ihrachys: Is there anyway people who are not attending PTG can participate remotely?
15:06:33 <ihrachys> #undo
15:06:35 <openstack> Removing item from minutes: #topic Partial Multinode Grenade
15:07:12 <ihrachys> sindhu: practice shows not really. you may talk to PTL or organizers about your options though.
15:07:17 <manjeets> o/
15:07:51 <sindhu> ihrachys: Okay, thanks :)
15:08:04 <ihrachys> #topic Partial Multinode Grenade
15:08:30 <ihrachys> korzen: any updates about your partial setup of newton vs. ocata on k8s?
15:09:38 <korzen> ihrachys, sad news fuel-ccp was not playing nice with me :( I was fixing new failures in the setup but I was not able to test the whole use case :(
15:10:00 <korzen> I should be able to get you the info bu tomorrow
15:10:02 <korzen> by*
15:10:08 <ihrachys> huh not up to the hype? ;)
15:10:20 <ihrachys> ok cool, thanks for pushing
15:10:25 <electrocucaracha> ihrachys: our goal is add support to rolling upgrades or zero-downtime? we were discussing the other day that the current implementation of grenade restart all the services which is not the desired way in zero-downtime
15:10:27 <korzen> k8s is not up to the stability level :P
15:11:01 <ihrachys> electrocucaracha: it's both rolling upgrades and zero API downtime. we have some form of rolling upgrades.
15:11:13 <ihrachys> rolling upgrade only means you can upgrade services one by one
15:11:29 <ihrachys> instead of bringing down the whole cluster
15:11:44 <ihrachys> which we have (you can upgrade server without agents)
15:12:17 <electrocucaracha> but we have to upgrade all the server nodes at the same time right?
15:12:34 <manjeets> upgrading one instance of each service at once while keeping others running
15:12:58 <ihrachys> as far as grenade partial linuxbridge job, it's at this point at the same failure rate level as tempest one for the backend, and it's hard to understand if it's specific to grenade, or a general oom-killer/libvirt crashing thing that lingers our gates
15:13:35 <ihrachys> electrocucaracha: yes, we have to, but that's not part of the definition of rolling upgrades as documented by https://governance.openstack.org/tc/reference/tags/assert_supports-rolling-upgrade.html
15:14:00 <ihrachys> "This does not require complete elimination of downtime during upgrades, but rather reducing the scope from “all services” to “some services at a time.” In other words, “restarting all API services together” is a reasonable restriction."
15:15:12 <electrocucaracha> ihrachys: gotcha, thanks
15:15:15 * manjeets thought something like running multiple copies of neutron-server and updating one at a time while running other on old version
15:15:45 <ihrachys> manjeets: that's what https://governance.openstack.org/tc/reference/tags/assert_supports-zero-downtime-upgrade.html#tag-assert-supports-zero-downtime-upgrade is for
15:16:03 <ihrachys> "this tag requires services to completely eliminate API downtime of the control plane during the upgrade. In other words, requiring operators to “restart all API services together” is not reasonable under this tag."
15:16:41 <ihrachys> while we are at it, there is another one on top of it all, https://governance.openstack.org/tc/reference/tags/assert_supports-zero-impact-upgrade.html#tag-assert-supports-zero-impact-upgrade
15:16:55 <ihrachys> that last one forbids even performance degradation during upgrade
15:17:25 <electrocucaracha> which nobody has achieved, isn't it?
15:17:42 <manjeets> not even first ?
15:17:58 <ihrachys> I think even zero-api-downtime is no one's achievement
15:18:02 <ihrachys> because it assumes CI setup
15:18:23 <ihrachys> and that I believe is waiting for dolphm to clear the gate framework for projects to adopt
15:18:30 <electrocucaracha> well, afaik nova is working on that
15:18:48 <ihrachys> speaking of which, dolphm do you need any help with defining the framework? where is the work tracked?
15:21:35 <korzen> nova is working on CI, but are they changing the nova-conductor approach to support running different version of conductor at the same time?
15:21:59 <ihrachys> seems like we don't have dolphm around
15:22:03 <korzen> currently, nova is requiring to upgrade all of your conductors in the same time
15:22:37 <electrocucaracha> korzen: even when they have OVO already implemented?
15:23:14 <korzen> electrocucaracha, their ovo approach is also different from what we should cosider
15:23:18 <korzen> consinder*
15:24:02 <korzen> their ovo approach is online data migration while save/get but they are also removing old data, marking it with null
15:24:06 <electrocucaracha> ihrachys: dolphm is not in his desk, not sure if he is going to work remotely
15:24:40 <ihrachys> korzen: how does nullifying work with old services reading from there?
15:24:58 <korzen> ihrachys, it doesn't
15:25:13 <korzen> the assumption is that you are using only few format
15:25:21 <ihrachys> *new you mean?
15:25:37 <korzen> yea ;)
15:25:39 <ihrachys> format of what?
15:26:17 <korzen> the new place of data or new format
15:26:21 <korzen> format of data
15:26:39 <korzen> blob version int/bool type
15:26:52 <korzen> blob instead of int*
15:27:31 <korzen> I'm working on some draft how we should consider online upgrades, I will publish it tomorrow
15:27:36 <ihrachys> oh so you upgrade conductors and they all know the new format?
15:27:45 <korzen> yes
15:27:49 <ihrachys> meh
15:27:59 <ihrachys> that's hitting the can :)
15:28:02 <ihrachys> down the road
15:28:13 <ihrachys> ok let's move on to
15:28:18 <ihrachys> #topic Object implementation
15:28:30 <ihrachys> korzen: you mentioned you woll work on a write-up
15:28:41 <ihrachys> I wonder how electrocucaracha's https://review.openstack.org/#/c/432494/1/neutron/db/migration/alembic_migrations/versions/newton/online/__init__.py fits in
15:29:09 <korzen> yes, electrocucaracha great stuff
15:29:25 <electrocucaracha> I just tried to use the nova approach in our plans
15:29:32 <korzen> this is one part of the mechanism, the CLI tool
15:29:47 <korzen> the second part is how to manage old/new format in OVO
15:30:39 <ihrachys> yeah, I was only thinking that from code organization perspective, it would make sense to store the actual rules in object modules themselves.
15:31:12 <ihrachys> and another consideration would be, whether we can reuse code between lazy migration (on object update) and this forced online migration
15:32:00 <korzen> in nova and cinder it is done in sql
15:32:30 <korzen> one of the cinder example is doing the online data migration without OVO
15:33:05 <korzen> they are providing the CLI to do it, and some simple code in fetching method to port old data to new format
15:33:43 <korzen> in that simple example, they decided to add prefix to some ID
15:35:09 <ihrachys> I guess we can look at code reuse between OVO update path and new migration path later, and stick to direct sql for now. as for code organization, I would think that maybe objects would register online migration functions through let's say stevedore, and then the CLI tool will consume those. that would allow external subprojects to plug into the system.
15:35:38 <korzen> ihrachys, that is also a good idea
15:36:01 <ihrachys> electrocucaracha: thanks for working on the patch, cool stuff
15:36:34 <electrocucaracha> ihrachys: no problem, feel free to modify or completely change it
15:37:22 <ihrachys> korzen: what's the state of port binding OVO integration (https://review.openstack.org/#/c/407868/)? I see it passed most jobs, so it ready?
15:37:53 <korzen> yes, but it still has some problems with tempest
15:38:08 <korzen> and grenade multinode dvr
15:38:21 <korzen> I'm not sure where is the issue there
15:38:37 <korzen> the whole problem seems to be with stale data
15:39:02 <korzen> we are updating the port binding in OVO, but port db data has stale info about the binding
15:39:36 <korzen> I'm thinking if we should first work on port OVO adoption to fully address the issues with port binding
15:40:30 <korzen> so it is not so trivial as it may looked like
15:40:45 <ihrachys> I see
15:43:00 <ihrachys> ok one other patch that I wanted to discuss is
15:43:01 <ihrachys> https://review.openstack.org/#/c/419152
15:43:17 <ihrachys> it adds support for pattern matching for get_* arguments
15:43:29 <ihrachys> I am specifically looking at https://review.openstack.org/#/c/419152/8/neutron/objects/utils.py@34
15:43:47 <ihrachys> and I am not sure if I get intent right
15:44:37 <ihrachys> I thought that if we would expose a general pattern matching (%XXX%), then we could as well match against XXX% as well as %XXX?
15:45:43 <ihrachys> I mean, exposing all three would make sense as long as we stick to the existing code in https://review.openstack.org/#/c/419152/8/neutron/db/common_db_mixin.py
15:45:54 <electrocucaracha> but that only applies for strings :S
15:46:01 <ihrachys> but sqlalchemy exposes .like operator too, so why not sticking to it for all cases?
15:46:35 <ihrachys> tonytan4ever: ^
15:46:49 <ihrachys> electrocucaracha: elaborate
15:47:52 <tonytan4ever> ihrachys: sqlalchemy supports pattern matching (%XXX%), XXX% and %XXX as different query API though.
15:47:55 <electrocucaracha> well, basically the idea that tonytan4ever is doing is the implementation of more filters, in this case he is only covering the scenario for strings
15:48:28 <electrocucaracha> but I wondering if we have something for filters of dates or strings
15:49:10 <electrocucaracha> I've seen patches which also uses those special filters and they ended in classmethod
15:49:30 <ihrachys> tonytan4ever: but we have .like too?
15:50:08 <ihrachys> see e.g. http://docs.sqlalchemy.org/en/latest/core/sqlelement.html?highlight=like#sqlalchemy.sql.expression.ColumnElement.like
15:51:19 <tonytan4ever> ihrachys: .like could work, but it still expose the sql details to the user though. e.g: you need to say .like(%foobar%)
15:51:33 <tonytan4ever> Where if you use contains, you just need to say .contains("foobar")
15:52:46 <tonytan4ever> Also ".like" covers the case for startswith and endswith, True that. startswith and endswith can provide short hand for some use case like:
15:53:08 <tonytan4ever> get all agents name starting with "zoneA-agent" though.
15:54:14 <ihrachys> ok I see where you come from
15:55:18 <ihrachys> I don't particularly hate the %XXX% notation but I guess you can argue it suggests sql
15:55:53 <ihrachys> ok I will recheck the patch as it stands now
15:56:25 <tonytan4ever> Thanks.
15:56:49 <ihrachys> ok there is little time, so a quick thing...
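[Editor's note] The trade-off discussed above, a raw LIKE pattern versus named contains/startswith/endswith filters, can be sketched with the standard-library sqlite3 module. This is only an illustration under stated assumptions: the helper names (escape_like, contains, startswith, endswith, find_agents), the agents table and the sample agent names are hypothetical, and none of this is the code from the patch under review or the SQLAlchemy API itself. It only shows that the three named filters are thin wrappers that build a LIKE pattern, which keeps the % syntax out of the caller's hands.

```python
import sqlite3


def escape_like(text, escape="\\"):
    """Escape LIKE wildcards so that caller text is matched literally."""
    return (text.replace(escape, escape + escape)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))


# Hypothetical helpers: each one builds the LIKE pattern for one match kind.
def contains(text):
    return "%" + escape_like(text) + "%"


def startswith(text):
    return escape_like(text) + "%"


def endswith(text):
    return "%" + escape_like(text)


def find_agents(conn, pattern):
    """Return agent names matching a LIKE pattern, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM agents WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,))
    return [name for (name,) in rows]


# Illustrative in-memory data; the table and names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (name TEXT)")
conn.executemany(
    "INSERT INTO agents VALUES (?)",
    [("zoneA-agent-1",), ("zoneA-agent-2",),
     ("zoneB-agent-1",), ("l3-zoneA-agent",)])

# The caller never writes a % sign; the helper picks the pattern shape.
zone_a_agents = find_agents(conn, startswith("zoneA-agent"))
```

With named helpers the caller passes plain text and the escaping of % and _ happens in one place; with a raw LIKE argument the caller has to know the wildcard syntax, which is the concern raised in the discussion.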
15:56:59 <ihrachys> #topic Open discussion
15:57:44 <electrocucaracha> ihrachys: I added the topic of db prune rows for the ptg
15:57:58 <ihrachys> I raised the point of memory consumption going nuts in gate before; folks are still looking at it, some data is collected, but we struggle to get it for spawned workers that are of more interest
15:58:01 <ihrachys> electrocucaracha: thanks
15:58:19 <ihrachys> so re: memory, tl;dr is we are still waiting for results that we could work with
15:58:34 <korzen> ok
15:58:34 <ihrachys> so far we saw lots of sqlalchemy and oslo.messaging objects
15:59:12 * electrocucaracha is still waiting for johndperkins patch
15:59:30 <ihrachys> electrocucaracha: which exactly?
16:00:12 <electrocucaracha> ihrachys: he has something in mind that can help to reduce memory, more likely he has been busy doing other stuff
16:00:12 <ihrachys> ok we need to give the room back to openstack
16:00:19 <electrocucaracha> sure
16:00:24 <ihrachys> let's follow up in the team channel
16:00:26 <ihrachys> #endmeeting