16:00:55 #startmeeting neutron_ci
16:00:56 Meeting started Tue Mar 14 16:00:55 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:58 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:58 hi
16:01:00 The meeting name has been set to 'neutron_ci'
16:01:07 hi everyone
16:01:21 hi
16:01:43 hi
16:02:10 * ihrachys waves back at manjeets, jlibosva, and dasanind
16:02:11 #topic Action items from previous meeting
16:02:26 first was: "ihrachys fix e-r bot not reporting in irc channel"
16:03:15 I haven't got to that one just yet, need to talk to mtreinish I guess on why it's not reporting, I suspect wrong configuration that is too limiting. lemme repeat the action for the next week.
16:03:21 #action ihrachys fix e-r bot not reporting in irc channel
16:03:32 next was "ihrachys to clean up dsvm-scenario flavor handling from gate-hook"
16:03:35 that happened
16:03:49 we landed https://review.openstack.org/#/c/442758/
16:05:03 beyond cleanup, there is some failure in ovs scenario job that started showing up after latest hook rework
16:05:08 we will discuss later
16:05:22 next was "ihrachys to walk thru list of open gate failure bugs and give them love"
16:05:30 I did, closed some bugs that didn't seem relevant
16:05:39 ok next was "ihrachys to chase down armax on d-g local.conf breakage assessment for stadium"
16:06:31 armax posted some patches since then: https://review.openstack.org/442884 for client, https://review.openstack.org/442890 for fwaas
16:06:55 both were backported into stable branches
16:07:32 there is also ongoing work for sfc gate: https://review.openstack.org/#/c/445037/ and https://review.openstack.org/#/c/442882/
16:07:45 I assume that's all there is
16:08:10 ok next was "haleyb and mlavalle to investigate what makes dvr gate job failing with 25% rate"
16:08:14 ihrachys: e-r reports that dhcp lease failures are causing problems
16:08:34 not sure if that is on your radar or not
16:08:53 clarkb: it wasn't. is there a bug?
16:09:08 hi
16:09:17 ihrachys: yes it's top of the e-r list http://status.openstack.org/elastic-recheck/index.html
16:09:29 mlavalle: hey. we were looking for an update on dvr job failure rate that was 25% prev week
16:09:43 ihrachys: the bug is over a year old and "fixed" in nova net, I think we must be rematching errors from syslog? against neutron jobs
16:09:54 ihrachys: so the bug there may not be the most up to date
16:10:13 ihrachys: I couldn't make progress on that last week :-(
16:10:26 clarkb: yeah, seems like catching generic message
16:10:39 mlavalle: ok then I leave the action on you
16:10:47 #action haleyb and mlavalle to investigate what makes dvr gate job failing with 25% rate
16:10:53 ihrachys: thanks. I was sick a few days last week
16:11:01 clarkb: thanks for bringing it up, we will have a look
16:11:10 mlavalle: np, get well
16:11:11 thanks
16:11:45 #action ihrachys explore why bug 1532809 bubbled into top in e-r
16:11:45 bug 1532809 in OpenStack Compute (nova) liberty "Gate failures when DHCP lease cannot be acquired" [High,In progress] https://launchpad.net/bugs/1532809 - Assigned to Sean Dague (sdague)
16:11:59 next was "ajo to chase down fullstack 100% failure rate due to test_dscp_qos_policy_rule_lifecycle failures"
16:12:06 also jlibosva was going to help ajo
16:12:11 jlibosva: any progress?
16:12:15 ihrachys: yes
16:12:21 I may sound dumb asking that question, can someone explain a bit what e-r is?
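(For reference, elastic-recheck tracks each known gate bug with a small YAML query file in its queries/ directory; anything in the indexed job logs that matches the query gets the bug reported back on the gerrit change and counted on the e-r status page. A minimal sketch for a DHCP-lease failure like bug 1532809, with the message and tag values assumed rather than copied from the real query, would look roughly like:

    # queries/1532809.yaml (illustrative sketch, not the actual query)
    query: >
      message:"failed to get a DHCP lease" AND
      tags:"console"
)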
16:12:25 I started looking at it like hour-ish ago :)
16:12:39 It got broken by https://review.openstack.org/#/c/380329
16:12:48 I suspect that delete_dscp doesn't work with native driver
16:12:59 manjeets: elastic-recheck, it's a tool that captures failure patterns in logs and reports in gerrit, on elastic recheck webui, and in irc
16:13:10 I'm just trying a local run and I'm about to send a simple patch to verify the fullstack fix
16:13:22 https://review.openstack.org/445560
16:13:22 ohk thanks ihrachys
16:13:49 jlibosva: niiice
16:14:03 I need to make sure that's really it, I didn't give it much love yet
16:14:09 jlibosva: I wonder why we have different API for drivers
16:14:14 yeah ...
16:14:30 still this is a nice step to check if that fixes the issue
16:14:38 if so, we can look at making them consistent
16:15:18 there is also a constant failure in securitygroups for linuxbridge
16:15:24 I didn't get a chance to look at that yet
16:15:59 gotcha
16:16:24 can be an AI to me till the next meeting ;)
16:17:12 #action jlibosva fix delete_dscp for native driver: https://review.openstack.org/445560
16:17:27 #action jlibosva to fix remaining fullstack failures in securitygroups for linuxbridge
16:18:12 we are getting closer to stable functional job (it shows normal failure rate now), let's do the same for fullstack
16:18:17 next AI was "ajo to restore and merge patch raising ovsdb native timeout: https://review.openstack.org/#/c/425623/"
16:18:28 we restored, but now I am not sure
16:18:47 as seen on http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=7&fullscreen functional is back to normal
16:18:57 and not the old normal, but actual normal
16:19:10 its failure rate is less than -api job that is considered rather stable and is voting
16:19:52 so I am not sure we should land the timeout bump
16:19:57 thoughts?
16:20:16 I would go for it
16:20:31 as the worst that can happen is that some actions might take longer before failing
16:20:46 if transaction is successful, it won't be longer
16:22:13 but doesn't it open the door wider for performance regressions
16:24:33 you mean that if we make a change that would create a longer transaction it would fail for us in gate while with higher timeout it would pass?
16:24:58 yeah. we will catch it a lot later when we pile up more regressions on top
16:25:09 at the point when it may be harder to unravel things
16:26:07 maybe we can close the bug and discuss this on the review with pros and cons
16:26:27 yeah
16:26:40 I also want to get back to the question of voting for the job in several weeks
16:26:45 assuming we prove it's stable
16:27:41 ok let's discuss further in gerrit
16:27:45 next AI was "anilvenkata_afk to track inclusion of HA+DVR patch for devstack-gate"
16:27:55 ihrachys: you mean putting functional to gate queue?
16:28:02 jlibosva: yes
16:28:04 or talking about fullstack
16:28:08 ok
16:28:09 we said we want it stable and voting?
16:28:22 maybe just check, we'll see
16:28:23 but voting
16:28:39 ehm, sorry, I mix things :)
16:28:43 yes, gate queue
16:28:50 got it :)
16:28:51 for fullstack, it will be first check voting
16:29:18 for that we need to show weeks of steady stability
16:29:24 only time will tell if we can do that
16:29:34 ok back to HA+DVR job
16:29:43 the patch is still sitting there: https://review.openstack.org/#/c/383827/
16:30:05 we may want clarkb and other infra gurus to chime in
16:30:22 I see clarkb +2d it in the past
16:30:44 I can rereview
16:31:22 thanks!
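(For context, the HA+DVR job exercises routers that are distributed and highly available at the same time; on the neutron server side that combination roughly corresponds to the two options below. This is only a sketch of the relevant neutron.conf settings, not the exact configuration the devstack-gate patch wires up:

    # neutron.conf (illustrative)
    [DEFAULT]
    router_distributed = True   # new routers are created as DVR by default
    l3_ha = True                # new routers are created with HA (VRRP) by default
)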
16:31:49 #action anilvenkata to follow up on HA+DVR job patches
16:32:18 there was a test failure in the new ha-dvr job, probably because of the new topology used
16:32:34 I assume Anil will look at it once we have infra side done
16:33:50 though actually, since devstack-gate is part of the same queue, we don't need to wait, we can start fixing the test runs with depends-on patches
16:34:02 gotta check with Anil on his plans
16:34:31 and, that was it for action items from the previous week
16:34:36 let's move on
16:34:44 #topic Action items from PTG
16:34:53 the prev meeting, we covered most of them
16:35:12 the only thing that was left behind is python3 strategy
16:35:32 during ptg, we decided there should be a bunch of jobs that should transfer to py3
16:35:52 I believe functional and fullstack were in short list
16:36:03 also we need some tempest job switched to py3
16:36:16 jlibosva: what's the plan?
16:37:00 yes, I believe functional, fullstack and full-tempest make most sense. I don't think we have any blueprint or bug where we can track the effort
16:37:09 so I'm gonna create some
16:37:21 and same for tempest split
16:37:44 jlibosva: which full-tempest?
16:37:49 I think we have several
16:38:17 there is one all-in-one, and two multinodes
16:38:25 (which are 2nodes really)
16:39:32 but none of multinode are voting, are they?
16:40:02 yeah
16:40:06 and that's a shame
16:40:16 but consider that the single node one is part of integrated gate
16:40:31 so it wouldn't be a neutron only decision to switch it
16:40:39 and I suspect it wouldn't go smoothly
16:41:01 hmm, I need to update myself on that, I see we already have some py35 tempest: http://logs.openstack.org/29/380329/24/check/gate-tempest-dsvm-py35-ubuntu-xenial/4ed1870/logs/testr_results.html.gz
16:41:35 jlibosva: it's not working
16:41:46 because swift is not really compatible
16:41:55 and devstack attempts to execute it as py3
16:42:29 I think that's going to be tackled with https://review.openstack.org/#/c/439108/
16:42:35 but so far it doesn't move anywhere
16:43:13 so what does the job run?
16:44:26 it runs everything that devstack determines as py3 compat in py3
16:44:38 and there was a hack in devstack that enforced py3 for swift
16:44:47 that was honestly totally wrong
16:45:07 because they have py3 SyntaxErrors in their code, not to mention it was never tried
16:46:16 anyhoo, let's leave AI on you and move on
16:46:33 #action jlibosva to figure out the plan for py3 gate transition and report back
16:47:14 there will also be some CI related work once we get to switch to lib/neutron: https://review.openstack.org/#/q/topic:new-neutron-devstack-in-gate and mod_wsgi
16:47:28 yep
16:47:44 one other thing that I wanted to raise is https://review.openstack.org/#/c/439114/
16:47:58 manjeets: what's the status of the dashboard patch? are you going to respin it?
16:48:10 ihrachys, yes
16:48:25 I was thinking, maybe we should squash that with existing neutron review dashboard
16:48:35 having a single tool may make more sense
16:48:42 yes that makes sense since some of the existing methods can be used as well
16:48:42 thoughts?
16:49:03 just creating a new section for all gate patches in existing dashboard
16:49:05 ??
16:49:08 yea
16:49:13 ++
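(For illustration, if the existing neutron review dashboard is maintained as a gerrit-dash-creator .dash file, folding the gate-failure reviews in would just mean one more section like the one below; the topic in the query is an assumption, not necessarily what https://review.openstack.org/#/c/439114/ proposes:

    [section "Gate failure fixes"]
    query = status:open topic:gate-failure
)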
16:49:45 #action manjeets respin https://review.openstack.org/#/c/439114/ to include gate-failure reviews into existing dashboard
16:50:40 another thing I wanted to note is that we made some progress on the path to index functional and fullstack logs in logstash
16:50:45 https://review.openstack.org/#/q/topic:index-func-logs
16:51:00 we now generate a -index.txt file with all INFO+ messages in each gate run
16:51:16 ihrachys: good job :)
16:51:18 the only missing bit is actually updating logstash config to index it
16:51:29 I hope to get infra to review it today
16:51:41 clarkb: maybe you have +2 there too? system-config ^
16:52:35 speaking of more logs collected, another stalled work is collecting info on mlock-consuming processes: https://review.openstack.org/#/q/topic:collect-mlock-stats-in-gate
16:53:00 I haven't seen oom-killers lately. is it just my perception?
16:53:28 * jlibosva hasn't checked the status
16:53:42 ihrachys: ya I can look at that too
16:54:56 thanks
16:56:22 ok any more topics to raise this time?
16:57:42 not from me
16:58:07 not from me either atm
16:58:11 ok let's call it a day
16:58:13 thanks everyone
16:58:15 thanks
16:58:16 thanks!
16:58:16 #endmeeting