15:01:23 <ihrachys> #startmeeting neutron_upgrades
15:01:24 <openstack> Meeting started Mon Mar 20 15:01:23 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:28 <openstack> The meeting name has been set to 'neutron_upgrades'
15:01:36 <ihrachys> #link https://wiki.openstack.org/wiki/Meetings/Neutron-Upgrades-Subteam Agenda
15:01:46 <electrocucaracha> o/
15:02:09 <manjeets> o/
15:02:18 <dasanind> o/
15:02:22 <ihrachys> first, let's review action items from prev meeting
15:02:30 <ihrachys> "everyone to review online data migration neutron-db-manage command: https://review.openstack.org/#/c/432494/"
15:02:36 <ihrachys> I don't think anyone posted comments
15:02:42 <ihrachys> though I actually looked at it
15:03:11 <ihrachys> I was wondering, the patch proposes to add a new command
15:03:23 <sindhu> o/
15:03:28 <electrocucaracha> I can add more examples in order to make it clear
15:03:41 <electrocucaracha> if that's the main reason people don't review it
15:03:42 <ihrachys> since we have alembic branches, wouldn't it make sense to have it as a separate alembic branch?
15:03:53 <ihrachys> then we could reuse the same neutron-db-manage upgrade command
15:05:21 <ihrachys> that would mean the upgrade would look like: upgrade --expand; restart controllers one by one; before the next upgrade, run upgrade --migrate-data; then upgrade to the next major version (and repeat the process)
15:06:39 <electrocucaracha> ihrachys: my understanding is that online data migration can be used as a cron task, isn't it?
15:07:10 <electrocucaracha> ihrachys: so it's possible to migrate in batches of 10 or 20 rows without impacting the db too much
15:07:17 <ihrachys> I am not sure there is a point in doing that work as a cron job; it's one operation to execute before the next upgrade
15:08:22 <ihrachys> the only potential issue I see with the alembic approach is that maybe alembic doesn't allow bulk operations
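[editor's note: for illustration, a minimal sketch of what a batched online data migration could look like if expressed as a revision in a hypothetical separate alembic branch; the table, columns, and values below are invented, and this is not code from the patch under review.]

```python
# Hypothetical data-migration revision (SQLAlchemy 1.x style).
from alembic import op
import sqlalchemy as sa

BATCH_SIZE = 20  # migrate a handful of rows at a time to limit db load


def upgrade():
    bind = op.get_bind()
    table = sa.table('example_resources',
                     sa.column('id', sa.String(36)),
                     sa.column('new_field', sa.String(255)))
    while True:
        # pick the next small batch of rows that still need migrating
        ids = [row.id for row in bind.execute(
            sa.select([table.c.id])
            .where(table.c.new_field.is_(None))
            .limit(BATCH_SIZE))]
        if not ids:
            break
        bind.execute(table.update()
                     .where(table.c.id.in_(ids))
                     .values(new_field='migrated'))
```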
15:09:10 <ihrachys> ok, gotta think more. I will post the idea on the patch and we can discuss it there.
15:09:21 <electrocucaracha> ihrachys: +1
15:09:27 <ihrachys> let's repeat the action
15:09:30 <ihrachys> #action everyone to review online data migration neutron-db-manage command: https://review.openstack.org/#/c/432494/
15:09:45 <ihrachys> next was "ihrachys to spec a mechanism to tackle differences in the list of extensions exposed by multiple mixed server nodes"
15:09:59 <ihrachys> I reported the RFE here: https://bugs.launchpad.net/neutron/+bug/1672852
15:09:59 <openstack> Launchpad bug 1672852 in neutron "[RFE] Make controllers with different list of supported API extensions to behave identically" [Wishlist,New] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
15:10:20 <ihrachys> and I started drafting the spec locally, there is nothing on gerrit just yet, but I plan to post in next day or two
15:11:03 <ihrachys> since it's not complete, will also repeat the action
15:11:09 <ihrachys> #action ihrachys to spec a mechanism to tackle differences in the list of extensions exposed by multiple mixed server nodes
15:11:17 <ihrachys> next was "electrocucaracha to explore status of mixed server version gating in nova/infra"
15:11:37 <electrocucaracha> well Dolph didn't come to the office this week
15:11:49 * electrocucaracha blames the spring break for that
15:12:10 <electrocucaracha> now he's at his desk
15:12:28 <electrocucaracha> let me ask him if he had a chance to join this meeting
15:14:13 <dolphm> o/
15:14:21 <ihrachys> dolphm: hey!
15:14:44 <ihrachys> dolphm: we were wondering what's the status of mixed controller version testing in u/s gate for nova, and where we can plug ourselves
15:14:44 <electrocucaracha> ihrachys: I couldn't formulate the question that you had
15:14:53 <ihrachys> we really want to make progress on that front sooner rather than later
15:15:08 * dolphm is reading the meeting log as well
15:15:17 <ihrachys> we had some manual testing for N->O but want more
15:17:54 <dolphm> and yes, i was out because of spring break
15:18:35 <dolphm> i'll also review electrocucaracha's online migrations spec :)
15:18:39 <dolphm> so, status of gating
15:19:18 <dolphm> the status pre-PTG and the direction of QE post-PTG are a bit different, so let me cover both
15:20:09 <dolphm> pre-PTG, each project has been landing multinode grenade jobs and passing some flag to run different devstacks on each node, then run whatever special testing they want to ensure that specific service version intermix works
15:21:28 <dolphm> post-PTG, the QE team agreed to utilize downstream deployment projects to provide feedback on multinode upgrades
15:22:08 <dolphm> so, the current vision is to have a page on http://status.openstack.org/ similar to rechecks that shows upgrade statistics, as performed by various downstream deployment projects
15:22:29 <ihrachys> hmm
15:22:34 <ihrachys> not sure what it means
15:22:44 <ihrachys> which projects do you mean? tripleo?
15:22:49 <dolphm> so, for example, you'll be able to go to the upgrade page, and see openstack-ansible attempting to do a rolling upgrade of neutron, and the results of the smoke tests performed during that upgrade, how long the upgrade took, etc, and how those numbers are changing over time
15:23:17 <dolphm> ihrachys: right, whoever implements rolling upgrades in an upstream project and wants to provide feedback on it
15:23:19 <ihrachys> but how do we control their specific way of upgrade?
15:23:43 <ihrachys> and how do we gate on it (I think that was the crucial part of the governance tag)?
15:24:14 <dolphm> ihrachys: we won't! from a governance perspective, we're asking upstream services to document a supported upgrade path, and expecting deployers to follow that and provide feedback
15:24:53 <ihrachys> so as long as we have some procedure documented, we can get the tag even if the procedure is broken?
15:25:19 <dolphm> ihrachys: realistically, i don't think we'll be able to achieve per-commit upgrade jobs for every project
15:25:56 <dolphm> but, we're still aiming for non-voting check jobs at the very least, and i can see the tag being dependent on a periodic job
15:26:16 <dolphm> ihrachys: if the procedure is broken, i don't think the tag should apply, no :P
15:26:22 <ihrachys> dolphm: so, we are getting back to the point on how to produce such a job
15:27:37 <dolphm> ihrachys: here's an example of a similar job https://review.openstack.org/#/c/446235/
15:27:46 <dolphm> ihrachys: check out "gate-openstack-ansible-os_keystone-ansible-upgrade-ubuntu-xenial"
15:28:31 <ihrachys> dolphm: so you say instead of producing a grenade job for that matter, we should work with deployment tools to produce jobs with their tooling?
15:28:31 <dolphm> the console log includes a benchmark run of a couple smoke tests for the duration of the upgrade http://logs.openstack.org/35/446235/1/check/gate-openstack-ansible-os_keystone-ansible-upgrade-ubuntu-xenial/4942366/console.html#_2017-03-16_00_38_57_740370
15:28:47 <dolphm> we can extract those uptime stats and publish the results
15:29:06 <dolphm> ihrachys: correct
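[editor's note: the stat extraction dolphm mentions could be as simple as the sketch below; the sample format is an assumption for illustration, not the actual job's output.]

```python
# Hypothetical: derive an API uptime percentage from smoke-test probes
# recorded while the rolling upgrade runs; each probe is (timestamp, ok).
def uptime_percentage(probes):
    probes = list(probes)
    if not probes:
        return 0.0
    passed = sum(1 for _, ok in probes if ok)
    return 100.0 * passed / len(probes)


# e.g. three probes during the upgrade window, one failed request
print('%.1f%%' % uptime_percentage([(0.0, True), (1.0, False), (2.0, True)]))
```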
15:29:47 <ihrachys> dolphm: not sure I follow the reason behind that. how is that different from other jobs that use grenade? were there any technical obstacles identified that suggested grenade is not up for the job?
15:31:12 <dolphm> ihrachys: yes, it came down to technicalities of bending grenade+devstack to do orchestration it wasn't really designed to do, whereas the orchestration projects are designed for exactly that (and in OSA's case, it can do it in an AIO to boot)
15:31:22 <dolphm> versus needing multiple nodes from infra
15:31:57 <ihrachys> ok gotcha. so the right path would be contributing to ansible playbooks and/or tripleo modules.
15:32:14 <dolphm> ihrachys: that would be awesome, yes
15:32:25 <ihrachys> and probably ansible is a better case because it's containerized, and tripleo is not yet
15:32:28 <dolphm> ihrachys: i haven't followed kolla too closely, but i'd include them in that list as well
15:32:42 <ihrachys> ok gotta think about it for a while, thanks for the info
15:32:48 <dolphm> ihrachys: ++
15:33:36 <manjeets> kolla is simpler than osa (I used it a couple of times)
15:33:44 <dolphm> ihrachys: i'd suggest dropping into #openstack-ansible if you want to pursue a playbook with them first
15:33:55 <dolphm> they're pretty eager to help, for sure
15:34:36 <ihrachys> ok let's move on
15:35:11 <ihrachys> manjeets: any updates on the grenade linuxbridge job? have you pinpointed log snippets and requests in service logs for failures we see?
15:35:20 <ihrachys> #topic Linuxbridge multinode grenade job
15:35:35 <manjeets> I've used a script to fetch all the error logs
15:35:55 <manjeets> I sent the paste over IRC last week
15:35:58 <ihrachys> yeah, but that didn't give the answer, did it
15:36:10 <manjeets> not really
15:36:21 <manjeets> I tried setting up two nodes locally
15:36:27 <manjeets> which gave me issues
15:37:09 <manjeets> I'll continue debugging that to find something
15:37:56 <ihrachys> I usually just inspect logs
15:38:18 <ihrachys> open files one by one, and read what happens end-to-end near the failure
15:38:28 <ihrachys> ofc you need to understand what SHOULD happen
15:38:41 <ihrachys> like floating ip router update propagated to agents at specific time and such
15:39:02 <ihrachys> but yeah, local reproduction may be worth exploring too
15:39:15 <ihrachys> ok let's move forward
15:39:17 <ihrachys> #topic Object implementation
15:39:25 <ihrachys> https://review.openstack.org/#/q/project:openstack/neutron+branch:master+topic:bp/adopt-oslo-versioned-objects-for-db
15:39:41 <ihrachys> first thing first, I requested a revert of https://review.openstack.org/#/c/360908/
15:39:43 <ihrachys> dasanind: ^
15:39:52 <ihrachys> that's because the unit test is not reliably passing the gate
15:40:11 <dasanind> ihrachys: when I put a recheck yesterday it passed the gate
15:40:23 <ihrachys> probably because sometimes the method that generates random fields produces objects that conflict with db constraints
15:40:32 <ihrachys> dasanind: well you should not do that
15:40:52 <ihrachys> dasanind: there is a reason why https://review.openstack.org/#/c/426829/ exists
15:40:57 <ihrachys> unit tests are very stable
15:41:11 <dasanind> ihrachys: I ran the test locally first before I put a recheck
15:41:16 <electrocucaracha> ihrachys: do you think a more deterministic approach for OVO UTs is needed?
15:41:19 <ihrachys> any failure there, especially in a test that you contribute, is a sign you have it wrong
15:41:21 <dasanind> ihrachys: all the tests passed
15:41:44 <ihrachys> dasanind: they pass, maybe 9 out of 10 times, so what
15:41:55 <ihrachys> we still should fix that 1/10
15:42:19 <ihrachys> electrocucaracha: well we could switch away from the generator I guess, though it would require a lot of work at this point :)
15:43:05 <ihrachys> dasanind: I will have a look at the test, set WIP on the revert for now
15:43:14 <electrocucaracha> ihrachys: sometimes that's needed, especially for fields which accept null values but the generator always provides something
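[editor's note: one way to address the constraint-conflict flakiness discussed above, sketched under the assumption of a helper that produces random field dicts; the names are illustrative, not the actual neutron test API.]

```python
import uuid


def get_non_conflicting_fields(generate_random_fields):
    """Randomize fields, but pin unique-constrained ones to fresh values."""
    fields = generate_random_fields()
    # a per-call uuid ensures two generated objects can never collide on
    # a unique db constraint, which is what broke the gate intermittently
    fields['name'] = 'test-%s' % uuid.uuid4()
    return fields
```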
15:43:25 <dasanind> ihrachys: sure
15:43:48 <electrocucaracha> ihrachys: about the revert patch, wouldn't it be better to only revert the UT instead of the whole patch?
15:44:02 <ihrachys> electrocucaracha: I may just fix the test
15:44:09 <ihrachys> I just noticed that the patch is landed
15:44:30 <ihrachys> so I first requested a revert so that it gets through the check queue and is ready to merge in case we fail to fix the test in time
15:44:40 <ihrachys> I don't want the revert
15:45:09 <ihrachys> manjeets: I believe now that the lock_for_update removal for quotas is in: https://review.openstack.org/442181
15:45:15 <ihrachys> we can move forward with quotas?
15:45:24 <ihrachys> I mean this patch https://review.openstack.org/338625
15:45:28 <manjeets> ihrachys, I updated the quotas OVO patch
15:45:46 <ihrachys> manjeets: does it need the LIKE support patch, or it's independent?
15:46:00 <electrocucaracha> ihrachys: I don't think so
15:46:00 <manjeets> it does not need that AFAIK
15:46:17 <ihrachys> ok cool
15:47:11 <ihrachys> also to update you folks, there was a bug in the tag OVO patch that we landed recently: https://review.openstack.org/356825
15:47:34 <ihrachys> specifically, the bug was in https://review.openstack.org/#/c/356825/39/neutron/services/tag/tag_plugin.py@90
15:47:50 <ihrachys> see that we don't pass standard_attr_id into delete_objects
15:48:05 <ihrachys> which made it drop all matching tags from all resources :-x
15:48:25 <ihrachys> that was fixed by https://review.openstack.org/446005
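[editor's note: the shape of the bug and the fix, simplified from the linked reviews; the module path and surrounding function are assumptions for illustration.]

```python
from neutron.objects import tag as tag_obj


def delete_resource_tag(context, standard_attr_id, tag):
    # The buggy version omitted standard_attr_id from the filter, so it
    # deleted tags with a matching name from *every* resource:
    #     tag_obj.Tag.delete_objects(context, tag=tag)
    # The fix scopes the delete to the one resource being updated:
    tag_obj.Tag.delete_objects(context, tag=tag,
                               standard_attr_id=standard_attr_id)
```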
15:49:47 <ihrachys> electrocucaracha: I saw you respun the NetworkSegment adoption patch and it's all red: https://review.openstack.org/#/c/385178/
15:49:54 <ihrachys> electrocucaracha: are you on top of it?
15:50:43 <electrocucaracha> ihrachys: I'm still getting some issues locally
15:51:03 <electrocucaracha> ihrachys: most likely, I'm gonna bother you about it later
15:51:31 <electrocucaracha> ihrachys: but yes, that patch was rebased and it still has some issues
15:51:32 <ihrachys> ok
15:52:45 <ihrachys> the LIKE patch needs another round of reviewers' attention: https://review.openstack.org/#/c/419152/
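[editor's note: at the query layer, LIKE support boils down to something like the generic sketch below; this is an illustration, not the patch's actual code.]

```python
def apply_string_filter(query, column, matching, value):
    """Translate a string-matching filter into a SQL LIKE clause."""
    if matching == 'contains':
        return query.filter(column.like('%' + value + '%'))
    if matching == 'starts':
        return query.filter(column.like(value + '%'))
    if matching == 'ends':
        return query.filter(column.like('%' + value))
    raise ValueError('unknown matching mode: %s' % matching)
```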
15:53:34 <ihrachys> ok let's move on
15:53:41 <ihrachys> there are no new patches with UpgradeImpact tag
15:53:49 <ihrachys> #topic Open discussion
15:53:58 <ihrachys> there are no items in the wiki page to raise here
15:54:28 <ihrachys> is everyone ok with the time shift for the meeting due to DST?
15:54:28 <electrocucaracha> ihrachys: any high priority patches to take a look at during this week, besides the online data migration?
15:55:09 <electrocucaracha> ihrachys: both times work for me
15:55:24 <manjeets> ihrachys, this time is better
15:55:42 <manjeets> previous was like 7 am and I had hard time getting up sometimes
15:56:32 <ihrachys> electrocucaracha: well we should fix the unit tests broken by router binding object; there will be the spec for running mixed server versions; I think LIKE patch should be ready to move forward; and I want to get to quotas OVO patch since it was close the last time I checked.
15:57:51 <ihrachys> manjeets: yeah. I start at 6am so it was not a bother for me
15:58:01 <ihrachys> ok I guess everyone is fine with the time
15:58:06 <ihrachys> anything else?
15:59:09 <ihrachys> ok thanks folks
15:59:13 <electrocucaracha> nope, thanks ihrachys
15:59:17 <sshank> Thank you.
15:59:19 <manjeets> thanks :)
15:59:20 <ihrachys> #endmeeting