16:02:30 <ihrachys> #startmeeting neutron_ci
16:02:31 <openstack> Meeting started Tue Feb 6 16:02:30 2018 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:35 <openstack> The meeting name has been set to 'neutron_ci'
16:02:38 <mlavalle> o/
16:02:43 <haleyb> o/
16:02:49 <slaweq> hi
16:03:16 <ihrachys> thanks jlibosva for taking over the chair the prev meeting
16:03:23 <ihrachys> afaiu Jakub won't join us today
16:03:23 <mlavalle> ++
16:03:32 <ihrachys> #topic Actions from prev meeting
16:03:37 <ihrachys> "jlibosva to request a new release for ovsdbapp"
16:03:39 <mlavalle> yes, he dropped off about 15 minutes ago
16:04:02 <ihrachys> there is https://review.openstack.org/541056 and https://review.openstack.org/541112 in the pipeline that should roll the new library into gates
16:04:03 <patchbot> patch 541056 - requirements - update constraint for ovsdbapp to new release 0.9.1
16:04:04 <patchbot> patch 541112 - requirements (stable/pike) - update constraint for ovsdbapp to new release 0.4.2
16:05:40 <ihrachys> "mlavalle and haleyb to follow up on how we can move forward floating ip failures in dvr scenario"
16:06:25 <mlavalle> haleyb is setting up an environment to replicate the issue.
16:06:50 <mlavalle> since we marked the test as unstable
16:06:52 <haleyb> ihrachys: my only update is i have an environment but haven't replicated yet
16:09:57 <ihrachys> ok. I guess there is not much sense to roll the AI over and over going forward. I trust you to beat it to death.
:)
16:10:16 <ihrachys> next was "jlibosva to report bug about scenario failure for test_snat_external_ip"
16:11:36 <ihrachys> not sure it happened, can't find anything in gate-failure tagged bugs
16:13:19 <ihrachys> I sent him an email about it, we'll see if he followed up on that
16:13:29 <mlavalle> ++
16:13:36 <ihrachys> ok that's all we had from prev meeting
16:13:44 <ihrachys> #topic Grafana
16:13:48 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:14:12 <ihrachys> before we walk through regular offenders, let's peek at periodics
16:14:39 <ihrachys> mlavalle, afaiu https://review.openstack.org/#/c/539006 would move the jobs to neutron tree, then we would change names for jobs / clean up legacy jobs from infra repos?
16:14:40 <patchbot> patch 539006 - neutron - Move periodic jobs to Neutron repo
16:14:54 <mlavalle> correct
16:15:16 <ihrachys> there seems to be some syntax error or smth.
16:15:28 <ihrachys> ok that's covered then
16:15:29 <mlavalle> I introduced a syntax error yesterday. I'll fix it today
16:16:07 <mlavalle> in the first revision we had a functional periodic job as well
16:16:20 <mlavalle> but it seems it is the same as the normal functional
16:16:32 <mlavalle> so I removed it yesterday
16:16:56 <mlavalle> and when we add the other two to the periodic queue, we will also add the normal functional
16:19:04 <ihrachys> ok.
the -pg- (postgres) job of those periodic jobs shows some level of instability
16:19:20 <ihrachys> but it's not clear if it's totally broken, since some days are fine
16:20:27 <ihrachys> I checked latest logs and it seems something is busted
16:20:27 <ihrachys> http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/legacy-periodic-tempest-dsvm-neutron-pg-full/be7f73e/logs/devstacklog.txt.gz#_2018-02-06_06_24_58_783
16:20:33 <ihrachys> "Rolling upgrades are currently supported only for MySQL and Sqlite"
16:20:41 <ihrachys> and that's from the glance section of devstack
16:21:29 <ihrachys> I will report a bug for that and assign it to glance
16:21:41 <ihrachys> #action ihrachys report bug for -pg- failure to glance
16:22:07 <ihrachys> fullstack is unstable, that's expected
16:22:21 <ihrachys> functional is a tad better and I expect it to become a lot better when new ovsdbapp rolls in
16:22:23 <slaweq> but it's better than last week at least
16:22:25 <slaweq> :)
16:22:33 <ihrachys> (actually, it may also help fullstack)
16:22:44 <ihrachys> slaweq, absolutely!
16:23:25 <ihrachys> one thing of concern esp. close to release is that ovsfw job, though of course non-voting, now at 100%
16:23:40 <ihrachys> anyone aware of the reason for breakage / bug to look at?
16:24:39 <slaweq> e.g. here: http://logs.openstack.org/54/537654/4/check/neutron-tempest-ovsfw/5c90b2b/logs/testr_results.html.gz IMO some error with SSH to vm
16:25:08 <slaweq> I can report a bug and try to check it this week if You want
16:25:16 <slaweq> *in this week
16:25:35 <ihrachys> and based on what guests log, metadata is not accessible
16:26:54 <slaweq> http://logs.openstack.org/54/537654/4/check/neutron-tempest-ovsfw/5c90b2b/logs/screen-q-agt.txt.gz?level=ERROR
16:26:59 <ihrachys> slaweq, well if you have cycles. jlibosva may be of help here.
16:27:07 <ihrachys> yeah was going to post same link to agent log
16:27:12 <ihrachys> it doesn't look healthy
16:27:18 <slaweq> there are some errors related to ovsfw but I don't know if it is the reason
16:27:24 <ihrachys> ofport: -1 for VIF: bdfda7ad-c656-46a3-bd83-a17878346c35 is not a positive integer
16:28:14 <ihrachys> slaweq, it's probably worth marking the bug as blocker for release. mlavalle what do you think?
16:28:44 <ihrachys> this sea of red is not something we should release :)
16:28:51 <mlavalle> yeah, I think it is a good idea
16:29:02 <slaweq> ihrachys: I have some cycles so I will check it ASAP
16:29:15 <ihrachys> slaweq, great. make sure the bug is tagged for queens-rc1
16:29:23 <slaweq> sure
16:29:38 <ihrachys> mlavalle, btw when is stable/queens cutoff / rc1?
16:29:53 <mlavalle> rc1 is this coming Friday
16:30:02 <ihrachys> I wonder if our release liaison started to walk through the pre-release checklist.
16:30:18 <ihrachys> probably not a q for this venue though, but smth to follow up with armax if not already, mlavalle.
16:30:33 <mlavalle> yeap
16:30:42 <mlavalle> but he's been active
16:30:48 <ihrachys> ok so ovsfw is covered too, great
16:31:11 <ihrachys> #action slaweq to report bug for ovsfw job failure / sea of red in ovs agent logs
16:32:25 <ihrachys> as for scenarios, linuxbridge is quite bad, but the dvr one actually seems to mostly match trend spikes and dips of other tempest jobs, just a bit exaggerated
16:33:13 <ihrachys> so there are probably some more issues somewhere there. we'll have a look right now at logs.
16:34:20 <ihrachys> actually, dvr is not that much better, it's just some spike in linuxbridge lately that has not reflected on dvr, but the shape for the older period is the same.
16:34:45 <ihrachys> #topic Scenarios
16:35:37 <ihrachys> linuxbridge: http://logs.openstack.org/54/537654/4/check/neutron-tempest-plugin-scenario-linuxbridge/3272a19/logs/testr_results.html.gz
16:35:45 <ihrachys> ssh to fip failed
16:37:04 <ihrachys> there is some warning in q-agt logged over and over: http://logs.openstack.org/54/537654/4/check/neutron-tempest-plugin-scenario-linuxbridge/3272a19/logs/screen-q-agt.txt.gz?level=WARNING
16:37:18 <ihrachys> haleyb, any idea where that one comes from and whether it's harmful?
16:37:53 <ihrachys> based on table name, neutron-linuxbri-qos-o15fb2c, it's probably smth qos related
16:37:53 <haleyb> ihrachys: i don't know where it's coming from but don't think it's harmful
16:40:03 <haleyb> it could be something as simple as removing the rule with a different string than it was added with
16:40:35 <slaweq> it's strange because AFAIR qos is adding rules to MANGLE table only (but I might be wrong)
16:40:48 <ihrachys> right. but then wouldn't we e.g. potentially leave an old rule behind?
16:41:11 <slaweq> actually it might be MANGLE table and POSTROUTING chain there
16:43:07 <ihrachys> here is another run: http://logs.openstack.org/68/540868/1/check/neutron-tempest-plugin-scenario-linuxbridge/01c4559/logs/testr_results.html.gz
16:43:25 <ihrachys> also failing on ssh to fip, but different tests
16:43:40 <ihrachys> looks like some random issues that are not specific to scenario, just ssh to fip
16:43:44 <haleyb> slaweq: maybe because the chain removal triggered the rule removal?
16:44:10 <slaweq> haleyb: maybe, I really don't know now
16:44:37 <haleyb> it's the postrouting rule in the mangle table
16:44:52 <ihrachys> but a successful run has the same messages: http://logs.openstack.org/57/523257/33/check/neutron-tempest-plugin-scenario-linuxbridge/35864c5/logs/screen-q-agt.txt.gz?level=WARNING so it's probably not the root cause
16:45:38 <haleyb> actually mangle table rule in the postrouting table, that's a mouthful
16:45:54 <ihrachys> we will probably need to track through things like - whether FIP was reused; whether gARP was sent; whether ARP table was updated..
16:46:23 <ihrachys> I will bite that one
16:46:39 <ihrachys> #action ihrachys to look at linuxbridge scenario random failures when sshing to FIP
16:48:20 <ihrachys> also, dvr scenarios fail from time to time
16:48:21 <ihrachys> http://logs.openstack.org/57/523257/33/check/neutron-tempest-plugin-dvr-multinode-scenario/b907462/logs/testr_results.html.gz
16:48:28 <ihrachys> similar symptoms actually
16:49:05 <slaweq> I think that it's the issue which haleyb and mlavalle were talking about at the beginning (in "actions from previous meeting")
16:49:23 <ihrachys> oh the fip one?
16:49:30 <slaweq> but I might be wrong
16:49:37 <ihrachys> I also see this: http://logs.openstack.org/57/523257/33/check/neutron-tempest-plugin-dvr-multinode-scenario/b907462/logs/screen-q-agt.txt.gz?level=WARNING#_Feb_03_18_50_36_283366 that seems to resemble the ovsfw- job failure
16:50:35 <ihrachys> but not sure, could as well be unrelated
16:50:57 <ihrachys> the issue haleyb was trying to reproduce is for specific test cases that are disabled, no?
16:51:18 <mlavalle> yes
16:52:50 <ihrachys> ok I am not sure what to do with this failure. maybe allow it to slip since haleyb is already busy with another issue for the job related to FIPs.
16:54:39 <haleyb> was that just a warning or a failure?
16:54:56 <ihrachys> well it fails test cases with ssh connection issues when used with FIP
16:54:57 <haleyb> oh, ERROR
16:55:03 <ihrachys> http://logs.openstack.org/57/523257/33/check/neutron-tempest-plugin-dvr-multinode-scenario/b907462/logs/testr_results.html.gz
16:55:14 <ihrachys> it broke lots of tests
16:55:18 <haleyb> ah, i saw WARNING in the url
16:55:44 <ihrachys> well yeah WARNINGs are just things we look at in hope they reveal the cause. but it's a legit failure.
16:56:18 <ihrachys> anyhow. I guess we will wait for more progress on that other FIP issue.
16:56:24 <ihrachys> #topic Fullstack
16:56:29 <ihrachys> not much time but let's peek
16:57:08 <slaweq> there is still one issue with security group tests: https://bugs.launchpad.net/neutron/+bug/1744402
16:57:09 <openstack> Launchpad bug 1744402 in neutron "fullstack security groups test fails because ncat process don't starts" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:57:18 <ihrachys> http://logs.openstack.org/14/531414/1/check/neutron-fullstack/55e82ca/logs/testr_results.html.gz
16:57:48 <ihrachys> slaweq, gotcha. fix here: https://review.openstack.org/#/c/541242/
16:57:48 <patchbot> patch 541242 - neutron - [Fullstack] Mark security group test as unstable
16:57:57 <ihrachys> oh it's just mark as unstable
16:58:13 <slaweq> ihrachys: it's not a fix but related to this issue :)
16:58:27 <ihrachys> ok
16:58:28 <slaweq> I didn't have time yet to debug it
16:59:09 <ihrachys> btw, before we wrap up
16:59:25 <ihrachys> I noticed a colleague from ironic posted this revert for ovsfw patch: https://review.openstack.org/#/c/541297/1
16:59:25 <patchbot> patch 541297 - neutron - DNM Test Revert "ovsfw: Don't create rules if upda...
16:59:33 <ihrachys> maybe they are onto something
16:59:51 <ihrachys> it would make sense to check with them what they try to do
17:00:27 <ihrachys> ok time is out
17:00:35 <ihrachys> thanks folks
17:00:37 <ihrachys> #endmeeting