16:00:55 <ihrachys> #startmeeting neutron_ci
16:00:56 <openstack> Meeting started Tue Mar 14 16:00:55 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:58 <manjeets> hi
16:01:00 <openstack> The meeting name has been set to 'neutron_ci'
16:01:07 <ihrachys> hi everyone
16:01:21 <jlibosva> hi
16:01:43 <dasanind> hi
16:02:10 * ihrachys waves back at manjeets, jlibosva, and dasanind
16:02:11 <ihrachys> #topic Action items from previous meeting
16:02:26 <ihrachys> first was: "ihrachys fix e-r bot not reporting in irc channel"
16:03:15 <ihrachys> I haven't got to that one just yet, need to talk to mtreinish I guess on why it's not reporting, I suspect wrong configuration that is too limiting. lemme repeat the action for the next week.
16:03:21 <ihrachys> #action ihrachys fix e-r bot not reporting in irc channel
16:03:32 <ihrachys> next was "ihrachys to clean up dsvm-scenario flavor handling from gate-hook"
16:03:35 <ihrachys> that happened
16:03:49 <ihrachys> we landed https://review.openstack.org/#/c/442758/
16:05:03 <ihrachys> beyond cleanup, there is some failure in ovs scenario job that started showing up after latest hook rework
16:05:08 <ihrachys> we will discuss later
16:05:22 <ihrachys> next was "ihrachys to walk thru list of open gate failure bugs and give them love"
16:05:30 <ihrachys> I did, closed some bugs that didn't seem relevant
16:05:39 <ihrachys> ok next was "ihrachys to chase down armax on d-g local.conf breakage assessment for stadium"
16:06:31 <ihrachys> armax posted some patches since then: https://review.openstack.org/442884 for client, https://review.openstack.org/442890 for fwaas
16:06:55 <ihrachys> both were backported into stable branches
16:07:32 <ihrachys> there is also ongoing work for sfc gate: https://review.openstack.org/#/c/445037/ and https://review.openstack.org/#/c/442882/
16:07:45 <ihrachys> I assume that's all there is
16:08:10 <ihrachys> ok next was "haleyb and mlavalle to investigate what makes dvr gate job failing with 25% rate"
16:08:14 <clarkb> ihrachys: e-r reports that dhcp lease failures are causing problems
16:08:34 <clarkb> not sure if that is on your radar or not
16:08:53 <ihrachys> clarkb: it wasn't. is there a bug?
16:09:08 <mlavalle> hi
16:09:17 <clarkb> ihrachys: yes its top of e-r list http://status.openstack.org/elastic-recheck/index.html
16:09:29 <ihrachys> mlavalle: hey. we were looking for update on dvr job failure rate that was 25% prev week
16:09:43 <clarkb> ihrachys: the bug is over a year old and "fixed" in nova net, I think we must be rematching errors from syslog? against neutron jobs
16:09:54 <clarkb> ihrachys: so the bug there may not be the most up to date
16:10:13 <mlavalle> ihrachys: I couldn't make progress on that last week :-(
16:10:26 <ihrachys> clarkb: yeah, seems like catching generic message
16:10:39 <ihrachys> mlavalle: ok then I leave the action on you
16:10:47 <ihrachys> #action haleyb and mlavalle to investigate what makes dvr gate job failing with 25% rate
16:10:53 <mlavalle> ihrachys: thanks. I was sick a few days last week
16:11:01 <ihrachys> clarkb: thanks for bringing up, we will have a look
16:11:10 <ihrachys> mlavalle: np, get well
16:11:11 <clarkb> thanks
16:11:45 <ihrachys> #action ihrachys explore why bug 1532809 bubbled into top in e-r
16:11:45 <openstack> bug 1532809 in OpenStack Compute (nova) liberty "Gate failures when DHCP lease cannot be acquired" [High,In progress] https://launchpad.net/bugs/1532809 - Assigned to Sean Dague (sdague)
16:11:59 <ihrachys> next was "ajo to chase down fullstack 100% failure rate due to test_dscp_qos_policy_rule_lifecycle failures"
16:12:06 <ihrachys> also jlibosva was going to help ajo
16:12:11 <ihrachys> jlibosva: any progress?
16:12:15 <jlibosva> ihrachys: yes
16:12:21 <manjeets> I may sound dumb asking that question, can someone explain a bit what is e-r ?
16:12:25 <jlibosva> I started looking at it like hour-ish ago :)
16:12:39 <jlibosva> It got broken by https://review.openstack.org/#/c/380329
16:12:48 <jlibosva> I suspect that delete_dscp doesn't work with native driver
16:12:59 <ihrachys> manjeets: elastic-recheck, it's a tool that captures failure patterns in logs and reports in gerrit, on elastic recheck webui, and in irc
16:13:10 <jlibosva> I'm just trying a local run and I'm about to send a simple patch to verify the fullstack fix
16:13:22 <jlibosva> https://review.openstack.org/445560
16:13:22 <manjeets> ohk thanks ihrachys
16:13:49 <ihrachys> jlibosva: niiice
16:14:03 <jlibosva> I need to make sure that's really it, I didn't give it much love yet
16:14:09 <ihrachys> jlibosva: I wonder why we have different API for drivers
16:14:14 <jlibosva> yeah ...
16:14:30 <ihrachys> still this is a nice step to check if that fixes the issue
16:14:38 <ihrachys> if so, we can look at making them consistent
16:15:18 <jlibosva> there is also a constant failure in securitygroups for linuxbridge
16:15:24 <jlibosva> I didn't get a chance to look at that yet
16:15:59 <ihrachys> gotcha
16:16:24 <jlibosva> can be an AI to me till the next meeting ;)
16:17:12 <ihrachys> #action jlibosva fix delete_dscp for native driver: https://review.openstack.org/445560
16:17:27 <ihrachys> #action jlibosva to fix remaining fullstack failures in securitygroups for linuxbridge
16:18:12 <ihrachys> we are getting closer to stable functional job (it shows normal failure rate now), let's do the same for fullstack
16:18:17 <ihrachys> next AI was "ajo to restore and merge patch raising ovsdb native timeout: https://review.openstack.org/#/c/425623/"
16:18:28 <ihrachys> we restored, but now I am not sure
16:18:47 <ihrachys> as seen on http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=7&fullscreen functional is back to normal
16:18:57 <ihrachys> and not the old normal, but actual normal
16:19:10 <ihrachys> its failure rate is less than -api job that is considered rather stable and is voting
16:19:52 <ihrachys> so I am not sure we should land the timeout bump
16:19:57 <ihrachys> thoughts?
16:20:16 <jlibosva> I would go for it
16:20:31 <jlibosva> as the worst that can happen is that some actions might take longer before failing
16:20:46 <jlibosva> if transaction is successful, it won't be longer
16:22:13 <ihrachys> but doesn't it open door wider for performance regressions
16:24:33 <jlibosva> you mean that if we make a change that would create a longer transaction it would fail for us in gate while with higher timeout it would get pass?
16:24:58 <ihrachys> yeah. we will catch it a lot later when we pile up more regressions on top
16:25:09 <ihrachys> at the point when it may be harder to unravel things
16:26:07 <jlibosva> maybe we can close the bug and discuss this on the review with pros and cons
16:26:27 <ihrachys> yeah
16:26:40 <ihrachys> I also want to get back to the question of voting for the job in several weeks
16:26:45 <ihrachys> assuming we prove it's stable
16:27:41 <ihrachys> ok let's discuss further in gerrit
16:27:45 <ihrachys> next AI was "anilvenkata_afk to track inclusion of HA+DVR patch for devstack-gate"
16:27:55 <jlibosva> ihrachys: you mean putting functional to gate queue?
16:28:02 <ihrachys> jlibosva: yes
16:28:04 <jlibosva> or talking about fullstack
16:28:08 <jlibosva> ok
16:28:09 <ihrachys> we told we want it stable and voting?
16:28:22 <ihrachys> maybe just check, we'll see
16:28:23 <ihrachys> but voting
16:28:39 <ihrachys> ehm, sorry, I mix things :)
16:28:43 <ihrachys> yes, gate queue
16:28:50 <jlibosva> got it :)
16:28:51 <ihrachys> for fullstack, it will be first check voting
16:29:18 <ihrachys> for that we need to show weeks of steady stability
16:29:24 <ihrachys> only time will tell if we can do that
16:29:34 <ihrachys> ok back to HA+DVR job
16:29:43 <ihrachys> the patch is still sitting there: https://review.openstack.org/#/c/383827/
16:30:05 <ihrachys> we may want clarkb and other infra gurus to chime in
16:30:22 <ihrachys> I see clarkb +2d it in the past
16:30:44 <clarkb> I can rereview
16:31:22 <ihrachys> thanks!
16:31:49 <ihrachys> #action anilvenkata to follow up on HA+DVR job patches
16:32:18 <ihrachys> there was a test failure in the new ha-dvr job, probably because of the new topology used
16:32:34 <ihrachys> I assume Anil will look at it once we have infra side done
16:33:50 <ihrachys> though actually, since devstack-gate is part of the same queue, we don't need to wait, we can start fixing the test runs with depends-on patches
16:34:02 <ihrachys> gotta check with Anil on his plans
16:34:31 <ihrachys> and, that was it for action items from the previous week
16:34:36 <ihrachys> let's move on
16:34:44 <ihrachys> #topic Action items from PTG
16:34:53 <ihrachys> the prev meeting, we covered most of them
16:35:12 <ihrachys> the only thing that was left behind is python3 strategy
16:35:32 <ihrachys> during ptg, we decided there should be a bunch of jobs that should transfer to py3
16:35:52 <ihrachys> I believe functional and fullstack were in short list
16:36:03 <ihrachys> also we need some tempest job switched to py3
16:36:16 <ihrachys> jlibosva: what's the plan?
16:37:00 <jlibosva> yes, I believe functional, fullstack and full-tempest makes most sense. I don't think we have any blueprint or bug where we can track the effort
16:37:09 <jlibosva> so I'm gonna create some
16:37:21 <jlibosva> and same for tempest split
16:37:44 <ihrachys> jlibosva: which full-tempest?
16:37:49 <ihrachys> I think we have several
16:38:17 <ihrachys> there is one all-in-one, and two multinodes
16:38:25 <ihrachys> (which are 2nodes really)
16:39:32 <jlibosva> but none of multinode are voting, are they?
16:40:02 <ihrachys> yeah
16:40:06 <ihrachys> and that's a shame
16:40:16 <ihrachys> but consider that the single node one is part of integrated gate
16:40:31 <ihrachys> so it wouldn't be a neutron only decision to switch it
16:40:39 <ihrachys> and I suspect it wouldn't go smooth
16:41:01 <jlibosva> hmm, I need to update myself on that, I see we already have some py35 tempest: http://logs.openstack.org/29/380329/24/check/gate-tempest-dsvm-py35-ubuntu-xenial/4ed1870/logs/testr_results.html.gz
16:41:35 <ihrachys> jlibosva: it's not working
16:41:46 <ihrachys> because swift is not really compatible
16:41:55 <ihrachys> and devstack attempts to execute it as py3
16:42:29 <ihrachys> I think that's going to be tackled with https://review.openstack.org/#/c/439108/
16:42:35 <ihrachys> but so far it doesn't move anywhere
16:43:13 <jlibosva> so what does the job run?
16:44:26 <ihrachys> it runs everything that devstack determines as py3 compat in py3
16:44:38 <ihrachys> and there was a hack in devstack that enforced py3 for swift
16:44:47 <ihrachys> that was honestly totally wrong
16:45:07 <ihrachys> because they have py3 SyntaxErrors in their code, not to mention it was never tried
16:46:16 <ihrachys> anyhoo, let's leave AI on you and move on
16:46:33 <ihrachys> #action jlibosva to figure out the plan for py3 gate transition and report back
16:47:14 <ihrachys> there will also be some CI related work once we get to switch to lib/neutron: https://review.openstack.org/#/q/topic:new-neutron-devstack-in-gate and mod_wsgi
16:47:28 <jlibosva> yep
16:47:44 <ihrachys> one other thing that I wanted to raise is https://review.openstack.org/#/c/439114/
16:47:58 <ihrachys> manjeets: what's the status of the dashboard patch? are you going to respin it?
16:48:10 <manjeets> ihrachys, yes
16:48:25 <ihrachys> I was thinking, maybe we should squash that with existing neutron review dashboard
16:48:35 <ihrachys> having a single tool may make more sense
16:48:42 <manjeets> yes that make sense since some of existing method can be used as well
16:48:42 <ihrachys> thoughts?
16:49:03 <manjeets> just creating a new section for all gate patches in existing dashboard
16:49:05 <manjeets> ??
16:49:08 <ihrachys> yea
16:49:13 <manjeets> ++
16:49:45 <ihrachys> #action manjeets respin https://review.openstack.org/#/c/439114/ to include gate-failure reviews into existing dashboard
16:50:40 <ihrachys> another thing I wanted to note is that we made some progress on the path to index functional and fullstack logs in logstash
16:50:45 <ihrachys> https://review.openstack.org/#/q/topic:index-func-logs
16:51:00 <ihrachys> we now generate a -index.txt file with all INFO+ messages in each gate run
16:51:16 <jlibosva> ihrachys: good job :)
16:51:18 <ihrachys> the only missing bit is actually updating logstash config to index it
16:51:29 <ihrachys> I hope to get infra review it today
16:51:41 <ihrachys> clarkb: maybe you have +2 there too? system-config ^
16:52:35 <ihrachys> speaking of more logs collected, another stalled work is collecting info on mlock-consuming processes: https://review.openstack.org/#/q/topic:collect-mlock-stats-in-gate
16:53:00 <ihrachys> I haven't seen oom-killers lately. is it just my perception?
16:53:28 * jlibosva hasn't checked the status
16:53:42 <clarkb> ihrachys: ya I can look at that too
16:54:56 <ihrachys> thanks
16:56:22 <ihrachys> ok any more topics to raise this time?
16:57:42 <jlibosva> not from me
16:58:07 <manjeets> not from me either atm
16:58:11 <ihrachys> ok let's call it a day
16:58:13 <ihrachys> thanks everyone
16:58:15 <manjeets> thanks
16:58:16 <jlibosva> thanks!
16:58:16 <ihrachys> #endmeeting