15:01:23 #startmeeting neutron_upgrades
15:01:24 Meeting started Mon Mar 20 15:01:23 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:28 The meeting name has been set to 'neutron_upgrades'
15:01:36 #link https://wiki.openstack.org/wiki/Meetings/Neutron-Upgrades-Subteam Agenda
15:01:46 o/
15:02:09 o/
15:02:18 o/
15:02:22 first, let's review action items from the prev meeting
15:02:30 "everyone to review online data migration neutron-db-manage command: https://review.openstack.org/#/c/432494/"
15:02:36 I don't think anyone posted comments
15:02:42 though I actually looked at it
15:03:11 I was wondering, the patch proposes to add a new command
15:03:23 o/
15:03:28 I can add more examples in order to make it clearer
15:03:41 if that's the main reason people don't review it
15:03:42 since we have alembic branches, wouldn't it make sense to have it as a separate alembic branch?
15:03:53 then we could reuse the same neutron-db-manage upgrade command
15:05:21 that would mean the upgrade would look like: upgrade --expand; restart controllers one by one; before the next upgrade, upgrade --migrate-data; then upgrade to the next major version (and repeat the process)
15:06:39 ihrachys: my understanding is that online data migration can be used as a cron task, isn't it?
15:07:10 ihrachys: so it's possible to migrate in series of 10 or 20 rows without impacting the db too much
15:07:17 I am not sure there is a point in doing that work as cron, it's one operation to execute before the next upgrade
15:08:22 the only potential issue I see with the alembic approach is that maybe alembic doesn't allow bulk operations
15:09:10 ok, gotta think more. I will post the idea on the patch and we can discuss it there.
15:09:21 ihrachys: +1
15:09:27 let's repeat the action
15:09:30 #action everyone to review online data migration neutron-db-manage command: https://review.openstack.org/#/c/432494/
15:09:45 next was "ihrachys to spec a mechanism to tackle differences in the list of extensions exposed by multiple mixed server nodes"
15:09:59 I reported the RFE here: https://bugs.launchpad.net/neutron/+bug/1672852
15:09:59 Launchpad bug 1672852 in neutron "[RFE] Make controllers with different list of supported API extensions to behave identically" [Wishlist,New] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
15:10:20 and I started drafting the spec locally; there is nothing on gerrit just yet, but I plan to post it in the next day or two
15:11:03 since it's not complete, I will also repeat the action
15:11:09 #action ihrachys to spec a mechanism to tackle differences in the list of extensions exposed by multiple mixed server nodes
15:11:17 next was "electrocucaracha to explore status of mixed server version gating in nova/infra"
15:11:37 well, Dolph didn't come to the office this week
15:11:49 * electrocucaracha blames the spring break for that
15:12:10 now he's at his desk
15:12:28 let me ask him if he has a chance to join this meeting
15:14:13 o/
15:14:21 dolphm: hey!
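[A minimal sketch of the batched online data migration idea from 15:07:10 (migrating 10 or 20 rows at a time to avoid long locks on a live database), assuming SQLAlchemy; the table and column names here are hypothetical, and the actual command shape is what's proposed in https://review.openstack.org/#/c/432494/]

    import sqlalchemy as sa

    BATCH_SIZE = 20  # small batches keep row locks short on a live database

    def migrate_in_batches(engine):
        meta = sa.MetaData()
        # hypothetical table whose new column needs backfilling online
        ports = sa.Table('ports', meta, autoload_with=engine)
        while True:
            with engine.begin() as conn:
                # pick the next small batch of not-yet-migrated rows
                ids = [row.id for row in conn.execute(
                    sa.select(ports.c.id)
                    .where(ports.c.new_field.is_(None))
                    .limit(BATCH_SIZE))]
                if not ids:
                    return  # nothing left to migrate; safe to contract later
                conn.execute(
                    ports.update()
                    .where(ports.c.id.in_(ids))
                    .values(new_field='default'))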
15:14:44 dolphm: we were wondering what's the status of mixed controller version testing in the u/s gate for nova, and where we can plug ourselves in
15:14:44 ihrachys: I couldn't formulate the question that you had
15:14:53 we really want to make progress on that matter early
15:15:08 * dolphm is reading the meeting log as well
15:15:17 we had some manual testing for N->O but want more
15:17:54 and yes, i was out because of spring break
15:18:35 i'll also review electrocucaracha's online migrations spec :)
15:18:39 so, status of gating
15:19:18 the status pre-PTG and the direction of QE post-PTG are a bit different, so let me cover both
15:20:09 pre-PTG, each project has been landing multinode grenade jobs and passing some flag to run different devstacks on each node, then running whatever special testing they want to ensure that specific service version intermix works
15:21:28 post-PTG, the QE team agreed to utilize downstream deployment projects to provide feedback on multinode upgrades
15:22:08 so, the current vision is to have a page on http://status.openstack.org/ similar to rechecks that shows upgrade statistics, as performed by various downstream deployment projects
15:22:29 hmm
15:22:34 not sure what that means
15:22:44 which projects do you mean? tripleo?
15:22:49 so, for example, you'll be able to go to the upgrade page and see openstack-ansible attempting to do a rolling upgrade of neutron, and the results of the smoke tests performed during that upgrade, how long the upgrade took, etc, and how those numbers are changing over time
15:23:17 ihrachys: right, whoever implements rolling upgrades in an upstream project and wants to provide feedback on it
15:23:19 but how do we control their specific way of upgrading?
15:23:43 and how do we gate on it (I think that was the crucial part of the governance tag)?
15:24:14 ihrachys: we won't! from a governance perspective, we're asking upstream services to document a supported upgrade path, and expecting deployers to follow that and provide feedback
15:24:53 so as long as we have some procedure documented, we can get the tag even if the procedure is broken?
15:25:19 ihrachys: realistically, i don't think we'll be able to achieve per-commit upgrade jobs for every project
15:25:56 but we're still aiming for non-voting check jobs at the very least, and i can see the tag being dependent on a periodic job
15:26:16 ihrachys: if the procedure is broken, i don't think the tag should apply, no :P
15:26:22 dolphm: so, we are getting back to the point of how to produce such a job
15:27:37 ihrachys: here's an example of a similar job https://review.openstack.org/#/c/446235/
15:27:46 ihrachys: check out "gate-openstack-ansible-os_keystone-ansible-upgrade-ubuntu-xenial"
15:28:31 dolphm: so you're saying that instead of producing a grenade job for that matter, we should work with deployment tools to produce jobs with their tooling?
15:28:31 the console log includes a benchmark run of a couple of smoke tests for the duration of the upgrade http://logs.openstack.org/35/446235/1/check/gate-openstack-ansible-os_keystone-ansible-upgrade-ubuntu-xenial/4942366/console.html#_2017-03-16_00_38_57_740370
15:28:47 we can extract those uptime stats and publish the results
15:29:06 ihrachys: correct
15:29:47 dolphm: not sure I follow the reasoning behind that. how is that different from other jobs that use grenade? were there any technical obstacles identified that suggested grenade is not up for the job?
15:31:12 ihrachys: yes, it came down to the technicalities of bending grenade+devstack to do orchestration it wasn't really designed to do, whereas the orchestration projects are designed for exactly that (and in OSA's case, it can do it in an AIO to boot)
15:31:22 versus needing multiple nodes from infra
15:31:57 ok, gotcha. so the right path would be contributing to ansible playbooks and/or tripleo modules.
15:32:14 ihrachys: that would be awesome, yes
15:32:25 and probably ansible is a better case because it's containerized, and tripleo is not yet
15:32:28 ihrachys: i haven't followed kolla too closely, but i'd include them in that list as well
15:32:42 ok, gotta think about it for a while, thanks for the info
15:32:48 ihrachys: ++
15:33:36 kolla is simpler than osa (i used it a couple of times)
15:33:44 ihrachys: i'd suggest dropping into #openstack-ansible if you want to pursue a playbook with them first
15:33:55 they're pretty eager to help, for sure
15:34:36 ok, let's move on
15:35:11 manjeets: any updates on the grenade linuxbridge job? have you pinpointed log snippets and requests in service logs for the failures we see?
15:35:20 #topic Linuxbridge multinode grenade job
15:35:35 I've used a script to fetch all the Error logs
15:35:55 i sent the paste over irc last week
15:35:58 yeah, but that didn't give the answer, did it
15:36:10 not really
15:36:21 I tried setting up two nodes locally
15:36:27 which gave me issues
15:37:09 I'll continue debugging that to find something
15:37:56 I usually just inspect logs
15:38:18 open files one by one, and read what happens end-to-end near the failure
15:38:28 ofc you need to understand what SHOULD happen
15:38:41 like a floating ip router update propagated to agents at a specific time and such
15:39:02 but yeah, local reproduction may be worth exploring too
15:39:15 ok, let's move forward
15:39:17 #topic Object implementation
15:39:25 https://review.openstack.org/#/q/project:openstack/neutron+branch:master+topic:bp/adopt-oslo-versioned-objects-for-db
15:39:41 first things first, I requested a revert of https://review.openstack.org/#/c/360908/
15:39:43 dasanind: ^
15:39:52 that's because the unit test is not passing the gate reliably
15:40:11 ihrachys: when I put a recheck yesterday it passed the gate
15:40:23 probably because sometimes the method that generates random fields produces objects that conflict with db constraints
15:40:32 dasanind: well, you should not do that
15:40:52 dasanind: there is a reason why https://review.openstack.org/#/c/426829/ exists
15:40:57 unit tests are very stable
15:41:11 ihrachys: I ran the test locally first before I put a recheck
15:41:16 ihrachys: do you think a more deterministic approach for OVO UTs is needed?
15:41:19 any failure there, especially in a test that you contribute, is a sign you have it wrong
15:41:21 ihrachys: all the tests passed
15:41:44 dasanind: they pass, maybe 9 out of 10 times, so what
15:41:55 we should still fix that 1/10
15:42:19 electrocucaracha: well, we could switch from the generator I guess, though it would require a lot of work at this point :)
15:43:05 dasanind: I will have a look at the test, set WIP on the revert for now
15:43:14 ihrachys: sometimes it's needed, especially for those fields which accept null values and the generator provides something
15:43:25 ihrachys: sure
15:43:48 ihrachys: about the revert patch, wouldn't it be better to only revert the UT instead of the whole patch?
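[One possible shape for the "more deterministic approach" to OVO test data raised at 15:41:16: keep a random prefix for coverage, but append a process-wide counter so fields under unique DB constraints can never collide. The helper name is hypothetical, not neutron's actual test API]

    import itertools
    import random
    import string

    _seq = itertools.count()

    def get_random_unique_string(length=10):
        # The random prefix preserves coverage of arbitrary values; the
        # counter suffix guarantees that no two generated values collide
        # on a unique column, so tests cannot fail 1 time out of 10.
        prefix = ''.join(random.choice(string.ascii_lowercase)
                         for _ in range(length))
        return '%s-%d' % (prefix, next(_seq))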
15:44:02 electrocucaracha: I may just fix the test
15:44:09 I just noticed that the patch has landed
15:44:30 so I first requested a revert so that it gets through the check queue and is ready for merge in case we fail to fix the test in time
15:44:40 I don't want the revert
15:45:09 manjeets: I believe now that the lock_for_update removal for quotas is in: https://review.openstack.org/442181
15:45:15 we can move on with quotas?
15:45:24 I mean this patch https://review.openstack.org/338625
15:45:28 ihrachys, I updated the quotas ovo patch
15:45:46 manjeets: does it need the LIKE support patch, or is it independent?
15:46:00 ihrachys: I don't think so
15:46:00 it does not need that AFAIK
15:46:17 ok, cool
15:47:11 also, to update you folks, there was a bug in the tag OVO patch that we landed lately: https://review.openstack.org/356825
15:47:34 specifically, the bug was in https://review.openstack.org/#/c/356825/39/neutron/services/tag/tag_plugin.py@90
15:47:50 see that we don't pass standard_attr_id into delete_objects
15:48:05 which made it drop all matching tags from all resources :-x
15:48:25 that was fixed by https://review.openstack.org/446005
15:49:47 electrocucaracha: I saw you respinned the NetworkSegment adoption patch and it's all red: https://review.openstack.org/#/c/385178/
15:49:54 electrocucaracha: are you on top of it?
15:50:43 ihrachys: I'm still getting some issues locally
15:51:03 ihrachys: most likely, I'm gonna bother you later
15:51:31 ihrachys: but yes, that patch was rebased and it's still having some issues
15:51:32 ok
15:52:45 the LIKE patch needs another round of reviewer attention: https://review.openstack.org/#/c/419152/
15:53:34 ok, let's move on
15:53:41 there are no new patches with the UpgradeImpact tag
15:53:49 #topic Open discussion
15:53:58 there are no items in the wiki page to raise here
15:54:28 is everyone ok with the time shift for the meeting due to DST?
15:54:35 ihrachys: any high priority patches to take a look at during this week, besides the online data migration
15:54:37 ?
15:55:09 ihrachys: both times work for me
15:55:24 ihrachys, this time is better
15:55:42 the previous one was like 7 am and I had a hard time getting up sometimes
15:56:32 electrocucaracha: well, we should fix the unit tests broken by the router binding object; there will be the spec for running mixed server versions; I think the LIKE patch should be ready to move forward; and I want to get to the quotas OVO patch since it was close the last time I checked.
15:57:51 manjeets: yeah. I start at 6am so it was not a bother for me
15:58:01 ok, I guess everyone is fine with the time
15:58:06 anything else?
15:59:09 ok, thanks folks
15:59:13 nope, thanks ihrachys
15:59:17 Thank you.
15:59:19 thanks :)
15:59:20 #endmeeting
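[For reference, a sketch of the tag bug described at 15:47:50-15:48:25, assuming the Tag OVO lives in neutron.objects.tag and that delete_objects accepts filters as keyword arguments; this only illustrates the fix the log describes, not the exact code in tag_plugin.py]

    from neutron.objects import tag as tag_obj

    def delete_tag(context, standard_attr_id, tag_name):
        # Buggy version, as originally landed: standard_attr_id was not
        # part of the filter, so the matching tag was deleted from *all*
        # resources, not just the one being updated:
        #   tag_obj.Tag.delete_objects(context, tag=tag_name)
        #
        # Fixed version (https://review.openstack.org/446005): scope the
        # delete to the single resource's standard attribute row:
        tag_obj.Tag.delete_objects(
            context, standard_attr_id=standard_attr_id, tag=tag_name)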