16:00:44 <ihrachys> #startmeeting neutron_ci
16:00:45 <openstack> Meeting started Tue Apr 11 16:00:44 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:46 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:48 <openstack> The meeting name has been set to 'neutron_ci'
16:00:53 <ihrachys> #link https://wiki.openstack.org/wiki/Meetings/NeutronCI Agenda
16:01:10 <ihrachys> hi everyone
16:01:22 <jlibosva> o/hello
16:01:52 <haleyb> hi
16:02:46 <ihrachys> #topic Action items from prev meeting
16:03:07 <ihrachys> first is still "ihrachys fix e-r bot not reporting in irc channel", and the status is the same - no progress on that one
16:03:44 <ihrachys> Unless someone wants to take it over, I will drop it from the list of actions to chase here, and will report when I actually get there (I track that by other means)
16:04:04 <ihrachys> it never comes to the point when I prioritize it
16:04:27 <jlibosva> I can take a look
16:04:34 <ihrachys> ok nice, thanks
16:05:29 <ihrachys> next was "ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri" and it's still on me, but I will get there for sure, I saw the issue in non-test env
16:05:32 <ihrachys> #action ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri
16:06:10 <ihrachys> next was "haleyb or mlavalle to report back on ha+dvr plan after l3 meeting"
16:06:32 <ihrachys> I remember l3 team should have discussed that right?
16:06:39 <mlavalle> ihrachys: correct
16:06:42 <haleyb> https://review.openstack.org/#/c/455406/
16:06:51 <mlavalle> we discussed it during our meeting
16:07:38 <ihrachys> hm so it just works?
16:08:14 <haleyb> ihrachys: plan is to move that non-voting job into the check queue to replace the dvr-multinode one
16:08:37 <ihrachys> which is also non-voting?
16:08:44 <haleyb> then if we update the grafana page we can watch it for a bit to make it voting
16:08:54 <haleyb> yes, the current one was -nv as well
16:09:34 <haleyb> i didn't want to add a 3-node job without taking something away
16:09:50 <ihrachys> any reason not to include grafana update in the patch?
16:10:28 <ihrachys> but in general, have we seen it passing full run? or we will figure it out after the fact?
16:10:30 <haleyb> i thought that was a different repo, maybe i'm wrong
16:10:44 <ihrachys> haleyb, no, it's the same, see grafana/neutron.yml in project-config
16:10:46 <reedip_> o\ /o
16:11:00 <haleyb> ihrachys: oh, then i'll update that too
16:12:00 <haleyb> ihrachys: unless we run 'check experimental' everywhere we'll never know if the job is good, the graphs over time is a better way (imo)
16:12:23 <ihrachys> over time sure, but have we at least validated that it has a chance to pass?
16:12:44 <ihrachys> last time I checked, it was consistently failing on some scheduler test
16:13:01 <ihrachys> (that may have been fixed by late tempest changes in-tree)
16:13:24 <ihrachys> I mean https://review.openstack.org/#/c/421155/
16:13:25 <haleyb> ihrachys: hmm, let me look
16:14:23 <haleyb> ah, that change, yes it would fix that bug
16:14:38 <ihrachys> ok let's figure it out off-band, but my general take is, we should show that there is a pass for the job at least once somewhere before we move to triggering it for every patch
16:14:51 <haleyb> i will update the grafana page and check the job in an existing patch i have
16:15:10 <ihrachys> so if you have a link to a successful run, it would be nice to see it in the gerrit patch for config
16:15:15 <ihrachys> ++
16:15:51 <ihrachys> thanks for working on it, I am happy we make progress on that longstanding issue (I think we started talking about it ~Newton?)
16:16:12 <ihrachys> ok next was "jlibosva to prepare py3 transition plan for Pike"
16:16:15 <ihrachys> jlibosva, your stage
16:17:20 <jlibosva> so I didn't prepare a plan yet
16:17:25 <jlibosva> but I have done some research
16:17:56 <ihrachys> should we start some etherpad to capture whatever we have on the topic?
16:18:18 <jlibosva> and found that dims (?) already started tracking the job for all projects
16:18:20 <jlibosva> https://etherpad.openstack.org/p/support-python3.5-functional-tests
16:18:27 <ihrachys> :)
16:18:39 <jlibosva> we probably want to add your functional suite there
16:18:52 <jlibosva> as it turned out to catch py3 related errors in the past
16:19:33 <ihrachys> it seems like dims is tracking some tempest job for that
16:19:39 <ihrachys> but we can do more I think
16:19:45 <jlibosva> I plan to also send an rfe bug specific to neutron where we can track down issues
16:19:48 <ihrachys> for one, functional and fullstack jobs
16:20:16 <ihrachys> I am not fully sure what that gate-tempest-dsvm-nova-py35-ubuntu-xenial job mentioned there is
16:20:17 <jlibosva> the links at the etherpad are outdated
16:20:24 <ihrachys> aren't all dsvm jobs nova? :)
16:20:28 <ihrachys> right
16:20:42 <ihrachys> also, some 'issues' are not really issues, like the one about dhcp_release6
16:20:48 <ihrachys> (we have it for py2 too)
16:21:04 <ihrachys> so the neutron section definitely needs some update
16:21:19 <ihrachys> is the document for tracking all py3 progress or just tempest job?
16:21:52 <ihrachys> I think the answer to the question will decide if we need our own document, or we can hijack the existing one for other py3 things we could have for Pike
16:22:06 <jlibosva> I'd rather go with our document
16:22:12 <ihrachys> dims, what's the intent of https://etherpad.openstack.org/p/support-python3.5-functional-tests ? is it all things for py3 pike goal, or just a specific job?
16:22:16 <jlibosva> just for the sake of a better overview
16:22:27 <ihrachys> yeah, we can cross link
16:22:38 <ihrachys> then here you go: https://etherpad.openstack.org/p/py3-neutron-pike
16:22:51 <ihrachys> #link https://etherpad.openstack.org/p/py3-neutron-pike Etherpad to track py3 efforts for Pike goal
16:23:12 <jlibosva> cool, thanks
16:23:16 <ihrachys> let's start capturing what you have there
16:23:27 <ihrachys> and draft some high level bullet points
16:23:28 <jlibosva> I plan to look at it more closely this week
16:24:05 <ihrachys> cool
16:24:19 <ihrachys> #action jlibosva to follow up on py3 plan for pike
16:24:47 <ihrachys> next was "ihrachys to chase infra to review https://review.openstack.org/#/c/439114/"
16:24:59 <ihrachys> it's actually in already, so we can adopt the new dashboard for our needs
16:25:07 <ihrachys> I will update the wiki page with the link to the board.
16:25:18 <ihrachys> #action ihrachys to update wiki with the link to gerrit CI dashboard
16:25:38 <ihrachys> the last one is "jlibosva document current openvswitch requirements for fullstack/functional in TESTING.rst"
16:26:12 <jlibosva> whoa, totally missed that
16:26:25 * jlibosva hides under the rock
16:26:35 <ihrachys> but it's good we track those :) you have the cake for the next week then
16:26:42 <ihrachys> #action jlibosva document current openvswitch requirements for fullstack/functional in TESTING.rst
16:27:04 <ihrachys> and that's about it for the action items
16:28:03 <ihrachys> #topic Patches in review
16:28:22 <ihrachys> now that we have manjeets's change for the neutron gerrit dashboard, we can have a look what's there
16:28:38 <ihrachys> the link to it is at the top of http://status.openstack.org/reviews/ (see Neutron link)
16:28:49 <ihrachys> sadly, the link is autogenerated and is too long to copy paste here
16:29:07 <ihrachys> I see a single patch captured by it
16:29:15 <ihrachys> which puzzles me, we should have some more
16:29:22 <ihrachys> (or I think so)
16:29:28 <ihrachys> I will have a look at what's missing later
16:29:47 <ihrachys> #action ihrachys to figure out why gerrit dashboard seems to not show some gate-failure fixes
16:30:25 <ihrachys> manjeets, I may need your help once/if I find missing patches, I will ping you if I do
16:31:36 <ihrachys> anyhow, we have this patch for a sporadic tempest failure on project_id missing in resource payload on first GET: https://review.openstack.org/#/c/447781/
16:32:17 <ihrachys> I see amotoki had some comments on the approach there, it's not fully clear to me whether it's a concern around the patch, or a future change that may go wrong
16:33:37 <ihrachys> I will personally need some more time to understand the concern of amotoki
16:34:26 * ihrachys looks through the queue to see if any more CI fixes are there
16:34:32 <amotoki> I don't want to stick to my thought on how we can treat project_id and tenant_id equally, but I am not sure we need to treat project_id differently from tenant_id
16:34:52 <amotoki> but I don't want to block this if it blocks the gate
16:35:07 <ihrachys> amotoki, it's not like the issue is too pressing, it shows from time to time
16:35:38 <ihrachys> amotoki, so what would be your suggestion in this particular case to make treatment same?
16:35:47 <amotoki> I am sometimes looking at it but I haven't figured out what is happening
16:35:53 <ihrachys> amotoki, cross-check project_id against tenant_id rules and vice versa?
16:36:43 <amotoki> IIRC in the proposed approach, project_id is checked for both project_id and tenant_id, but tenant_id is checked only for tenant_id
16:37:41 <amotoki> I think we need time to switch project-id and tenant-id and we cannot switch these two at once.
16:37:49 <ihrachys> right. because project_id rules are not there (I was thinking about adding them in https://review.openstack.org/448238), and due to the nature of policy.json being a modifiable file, you can't guarantee them being there
16:38:40 <manjeets> ihrachys, sure let me know
16:38:48 <amotoki> personally I would like to treat both equally to avoid unexpected behavior. that is just my point
16:39:11 <ihrachys> though the modifiable nature is probably not an argument here, we should not pretend to support that for owner definition :)
16:39:27 <ihrachys> amotoki, ok, let's see what we can do, we'll proceed in gerrit
16:39:47 <amotoki> ihrachys: sure
16:39:53 <ihrachys> ok, as for other patches up for review
16:39:57 <manjeets> https://tinyurl.com/ly76lmy tiny url
16:40:27 <ihrachys> I have this https://review.openstack.org/454870 to fix a sporadic func test failure (not actually sure if it's the fix due to lack of data on failure in the branch where I spotted the failure the last time)
16:40:45 <ihrachys> manjeets, but you need to generate every time to keep it fresh
16:41:06 <ihrachys> I was hoping to offload the generation to infra :)
16:41:10 <manjeets> yep i just generated it
16:41:38 <ihrachys> for the func test failure, we will need to land https://review.openstack.org/#/q/Ic5a3b347bea7e5aa8a5caee5035568e5954f58dc,n,z into stable branches to collect more data next time it fails there
16:43:10 <ihrachys> we also had a nasty bug sneak into stable branches where a network delete request could spin indefinitely in a loop, driving CPU up to 100%. That made grenade runs in master fail sometimes with XXXNotFound errors on cleanup of resources.
16:43:15 <ihrachys> it's fixed by https://review.openstack.org/#/q/topic:bug/1672701+message:Revert
16:43:39 <ihrachys> but we will need a new Newton release with the patch since we happily released one with regression :-x
16:44:32 <ihrachys> also the prev week I realized that most stadium projects forked os-testr in their trees and missed some fixes from there: https://review.openstack.org/#/q/topic:remove-subunit-trace-fork
16:44:49 <ihrachys> that made some gates e.g. not fail when all tests were skipped (something that happened in lbaas)
16:44:55 <ihrachys> so the patches should fix the wrong
16:45:52 <ihrachys> I also have this https://review.openstack.org/#/c/453212/ to simplify our api_extensions configuration in tempest.conf
16:46:28 <ihrachys> that's not pressing, but something I figured will make our lives easier since we won't need to maintain two almost identical lists of extensions for gate for DVR and non-DVR cases anymore
16:47:04 <ihrachys> of other pressing issues, there is https://bugs.launchpad.net/neutron/+bug/1679815 open
16:47:07 <openstack> Launchpad bug 1679815 in neutron "test_router_interface_ops_bump_router fails with "AssertionError: 5 not greater than 5"" [Critical,Confirmed] - Assigned to Kevin Benton (kevinbenton)
16:47:36 <ihrachys> it made our unit tests crash randomly, our last bastion of stability in gate :)
16:47:54 <ihrachys> the fix landed it seems: https://review.openstack.org/#/c/452691/
16:48:02 <ihrachys> (or so we think, that it's a fix)
16:48:28 <haleyb> i tripped over this today and rebased to master, so will know soon
16:48:41 <ihrachys> I see reedip_ commented there that it hit him. :-x
16:48:53 <reedip_> hey
16:48:54 <reedip_> yeah
16:48:55 <ihrachys> so maybe it's not a fix in the end
16:49:01 <ihrachys> will need to have another look
16:49:30 <ihrachys> we also have https://bugs.launchpad.net/neutron/+bug/1680136 spooking stable gates
16:49:31 <openstack> Launchpad bug 1680136 in neutron "Stable newton gate is broken" [Critical,Confirmed] - Assigned to Kevin Benton (kevinbenton)
16:49:44 <ihrachys> will be fixed by https://review.openstack.org/#/q/Ieef10eebd93f99404dd2fd87ccbab9b75632945a,n,z
16:50:56 <ihrachys> any other pressing patches we are aware of?
16:51:56 <ihrachys> ok one more thing, we have that pike goal to switch to mod_wsgi for api
16:52:10 <ihrachys> Victor recently respun his patches https://review.openstack.org/#/q/status:open+topic:goal-deploy-api-in-wsgi+owner:%22Victor+Morales+%253Cvictor.morales%2540intel.com%253E%22
16:52:24 <ihrachys> I haven't had a look yet but if someone has cycles, I would appreciate it
16:52:52 <ihrachys> there is also somewhat related effort to switch to new devstack lib in gate: https://review.openstack.org/#/q/status:open+topic:new-neutron-devstack-in-gate
16:53:24 <ihrachys> I was hoping that the latter would go first, and then we would be able to switch to new wsgi execution mode for lib/neutron only
16:53:38 <ihrachys> but with the review pace for the devstack switch patches, I am not sure we will get there
16:53:55 <ihrachys> again, maybe spend some review cycles on that one if you have any
16:54:07 <haleyb> ihrachys: regarding the stable/newton gate, https://review.openstack.org/#/c/453741/ still hasn't merged, seems stuck
16:54:43 <ihrachys> oh right. hm, why
16:54:55 <ihrachys> oh I W+1 again and now it's in merge queue
16:55:06 <ihrachys> good you spotted it's stuck
16:55:11 <ihrachys> we would waste another day :)
16:55:33 <ihrachys> #topic Grafana
16:55:41 <ihrachys> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:55:55 <ihrachys> we see unit tests spike, probably because of that floating_ip pagination thingy
16:56:19 <ihrachys> we will need to figure that out after the meeting
16:56:36 <ihrachys> fullstack is at ~100% failure rate
16:56:47 <ihrachys> jlibosva, what's the reason there still?
16:57:13 <jlibosva> ihrachys: didn't it improve after the ipconntrack patch?
16:57:15 * ihrachys also sees that scenarios are not in shape (almost 100%)
16:57:26 <ihrachys> jlibosva, that's what I thought it would do
16:57:34 <jlibosva> hmm, it was merged almost 24 hrs ago
16:57:43 <jlibosva> I saw a failure in trunk but that was not consistent
16:57:46 <ihrachys> well it's not 100% exactly, more like 80% now
16:58:01 <jlibosva> yeah, the trend is that it goes down
16:58:03 <ihrachys> so maybe that's an improvement in fullstack-speak
16:58:30 <ihrachys> ok gotta figure out why it's still not shiny, as well as for scenarios
16:58:51 <ihrachys> jlibosva, do you want an action item for that?
16:59:02 * ihrachys is not greedy today
16:59:12 <jlibosva> I'd wait when it stabilizes at some value
16:59:17 <ihrachys> ok fair enough
16:59:33 <ihrachys> #action ihrachys to review fullstack and scenario health before next meeting
16:59:41 <ihrachys> and we are at the top of the hour
16:59:56 <ihrachys> thanks all, and thanks for reviews and patches and joining
16:59:58 <ihrachys> ciao
16:59:59 <jlibosva> I found some bugs at dhcp tests in fullstack
17:00:00 <ihrachys> #endmeeting