16:00:16 <slaweq> #startmeeting neutron_ci
16:00:17 <openstack> Meeting started Tue Nov 6 16:00:16 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:21 <openstack> The meeting name has been set to 'neutron_ci'
16:00:23 <slaweq> welcome to another meeting :)
16:00:50 <mlavalle> o/
16:01:00 <bcafarel> hi again :)
16:01:03 * mlavalle had to restart his system
16:01:18 <mlavalle> made it back on time :-)
16:01:26 <slaweq> haleyb: are You around for the CI meeting?
16:01:47 <haleyb> slaweq: yes, i'm here, just on the phone at the same time with someone
16:01:55 <slaweq> hongbin: are You around for the CI meeting?
16:01:59 <slaweq> haleyb: sure, no problem :)
16:02:22 <hongbin> o/
16:02:35 <slaweq> welcome hongbin :)
16:02:47 <slaweq> I think we can start, as njohnston_ is not available today
16:02:49 <slaweq> #topic Actions from previous meetings
16:03:00 <slaweq> mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:03:10 <mlavalle> I am working on it
16:03:18 <mlavalle> couldn't reproduce it locally
16:03:33 <mlavalle> at this moment I am comparing logs between a good run and a bad run
16:04:27 <slaweq> ok, if You need any help, ping me :)
16:04:45 <mlavalle> do you want the bug?
16:04:53 <slaweq> yes, please
16:05:01 <mlavalle> take it then
16:06:55 <slaweq> so mlavalle, should I work on it now?
16:07:09 <mlavalle> if you want the bug, go ahead
16:07:16 <mlavalle> and work on it
16:07:24 <mlavalle> I was planning to work on it today and tomorrow
16:07:32 <mlavalle> but then I am leaving for Berlin
16:07:41 <slaweq> so please work on it for those 2 days if You can
16:07:46 <slaweq> and I can continue later
16:07:48 <slaweq> :)
16:07:50 <mlavalle> yes, of course
16:07:54 <slaweq> ok for You?
16:07:57 <mlavalle> yes
16:08:07 <slaweq> great
16:08:17 <slaweq> so let's make an action about that
16:08:34 <slaweq> #action mlavalle/slaweq to continue debugging issue with not reachable FIP in scenario jobs
16:08:46 <slaweq> thx mlavalle :)
16:08:51 <slaweq> ok, let's go to the next one
16:08:51 <mlavalle> thank you
16:08:55 <slaweq> slaweq to check if failing test_ha_router_namespace_has_ipv6_forwarding_disabled is related to bug https://bugs.launchpad.net/neutron/+bug/1798475
16:08:55 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:09:04 <slaweq> so I checked it and it looks like a different bug
16:09:14 <slaweq> I reported it here: https://bugs.launchpad.net/neutron/+bug/1801930
16:09:14 <openstack> Launchpad bug 1801930 in neutron "Functional test test_ha_router_namespace_has_ipv6_forwarding_disabled failing quite often" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:09:30 <slaweq> and I even pushed a patch which (I hope) will fix it: https://review.openstack.org/615893
16:10:04 <slaweq> I couldn't reproduce this issue locally, but I was checking the logs of such a failed test and it's pretty clear to me that it is a race issue
16:10:33 <slaweq> so please add it to Your review list if You can :)
16:10:44 <haleyb> there is another ipv6 issue i'm working on that might be related, probably next on your list :)
16:11:20 <slaweq> haleyb: are You talking about: * haleyb to check issue with failing FIP transition to down state ?
16:11:36 <haleyb> no, that's a different one :o
16:12:07 <haleyb> https://bugs.launchpad.net/neutron/+bug/1787919
16:12:07 <openstack> Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,In progress] - Assigned to Brian Haley (brian-haley)
16:12:11 <slaweq> haleyb: so I don't have others on my list
16:12:24 <slaweq> ahh, this one wasn't discussed in the CI meeting before
16:12:25 <haleyb> don't know whether that's related to yours, but it's the same system
16:12:36 <haleyb> maybe just in the l3 meeting
16:13:03 <haleyb> slaweq: i will look at your change, it might be unrelated
16:13:22 <slaweq> from the description of the bug it looks like this may or may not be related :)
16:13:51 <slaweq> but in the logs which I found, forwarding was set, just about half a second after the test's check was done
16:14:13 <haleyb> right, that's why it rang a bell for me when i saw your bug :)
16:14:46 <slaweq> thx haleyb :)
16:14:59 <slaweq> ok, let's move on
16:15:01 <slaweq> next action
16:15:03 <slaweq> njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:15:31 <slaweq> njohnston_ is not here today so I think I will just reassign it to the next meeting
16:15:37 <slaweq> #action njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:15:52 <slaweq> njohnston make py3 etherpad
16:15:57 <slaweq> that is the next one
16:16:07 <slaweq> do You
16:16:40 <bcafarel> side note: https://review.openstack.org/#/c/577383/ is almost there on that functional job reshuffle
16:17:25 <slaweq> bcafarel: thx for the info
16:18:11 <slaweq> so You are still working on it, right?
16:19:01 <bcafarel> yes, I will send an update with the missing piece (neutron-functional-python27)
16:19:09 <slaweq> thx bcafarel
16:19:21 <slaweq> ok, let's assign the action about the etherpad to njohnston_ again for next week
16:19:25 <slaweq> #action njohnston make py3 etherpad
16:19:33 <slaweq> next one:
16:19:35 <slaweq> njohnston check if grenade is ready for py3
16:19:41 <slaweq> same here
16:19:46 <slaweq> #action njohnston check if grenade is ready for py3
16:19:55 <slaweq> next one was:
16:19:58 <slaweq> slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:19:58 <openstack> bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed] https://launchpad.net/bugs/1798472
16:20:05 <slaweq> and I didn't have time to get to this one yet
16:20:15 <slaweq> I will add it again for myself
16:20:20 <slaweq> #action slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:20:28 <slaweq> next:
16:20:30 <slaweq> mlavalle to check bug 1798475
16:20:30 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:21:07 <slaweq> any info, mlavalle, about this one?
16:21:10 <mlavalle> I was going to try that one after the FIP one
16:21:29 <slaweq> ok
16:21:58 <slaweq> #action mlavalle to check bug 1798475
16:21:58 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:22:11 <slaweq> and the last one was:
16:22:12 <slaweq> slaweq to check issue with openstack-tox-py35-with-oslo-master periodic job
16:22:25 <slaweq> it's now fixed on the oslo.service side with: https://review.openstack.org/#/c/614642/
16:22:50 <slaweq> so we are fine with this periodic job again
16:23:22 <slaweq> anything to add? any other things from the previous week You want to discuss?
16:24:19 <mlavalle> slaweq: just one point
16:24:37 <mlavalle> regarding the previous bug assigned to it
16:24:55 <mlavalle> if at some point it becomes urgent, please let me know and I'll change priorities
16:25:03 <mlavalle> or we can assign it to someone else
16:25:21 <slaweq> sure, I will keep that in mind, thx
16:25:48 <mlavalle> :-)
16:25:58 <slaweq> ok, let's move on then to the next topic
16:26:05 <slaweq> #topic Python 3
16:26:33 <slaweq> I don't think we made much progress on it since last week
16:26:53 <slaweq> we have this patch from bcafarel, who is working on switching functional tests to python3
16:27:06 <slaweq> and we have a few action items assigned to njohnston_
16:27:21 <slaweq> so I don't have anything else to discuss about it today
16:27:29 <slaweq> do You have something to bring up here?
16:28:17 <mlavalle> nothing from me
16:28:42 <slaweq> ok, let's move on then
16:28:44 <slaweq> #topic Grafana
16:28:53 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:30:51 <slaweq> the graphs look quite "normal"
16:31:01 <slaweq> do You want to talk about something specific?
16:31:47 * mlavalle waiting for the graphs to render
16:32:51 <mlavalle> Yeah, they look good
16:33:01 <mlavalle> there was a little spike in functional
16:33:14 <slaweq> in the gate queue?
16:33:26 <mlavalle> yeah, yesterday
16:33:27 <slaweq> it was on the weekend and there were very few runs then
16:33:45 <mlavalle> yeah, I see that
16:33:52 <mlavalle> nothing else grabs my attention
16:33:57 <slaweq> good :)
16:33:58 <mlavalle> so I think we are good
16:34:17 <slaweq> so let's talk about the functional and fullstack jobs now :)
16:34:23 <mlavalle> ok
16:34:24 <slaweq> #topic fullstack/functional
16:34:48 <slaweq> regarding functional tests, we have identified 2 issues:
16:35:08 <slaweq> one is this one with ipv6 forwarding https://bugs.launchpad.net/neutron/+bug/1801930 and I hope this will be fixed soon
16:35:08 <openstack> Launchpad bug 1801930 in neutron "Functional test test_ha_router_namespace_has_ipv6_forwarding_disabled failing quite often" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:35:18 <slaweq> and the second one is related to the db migrations
16:35:28 <slaweq> we had to reopen the bug: https://bugs.launchpad.net/neutron/+bug/1687027
16:35:28 <openstack> Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:35:36 <slaweq> because it still happens
16:35:49 <slaweq> it's not as often as it was
16:36:14 <slaweq> but sometimes even 600 seconds is not enough, and that leads me to think that it's something else besides just a timeout
16:36:33 <mlavalle> so some are caused by timeout
16:36:34 <slaweq> unfortunately there are no logs from those tests in the job logs
16:36:42 <mlavalle> but some by something else
16:36:46 <hongbin> yes, it is possibly something that is hanging
16:37:00 <slaweq> hanging or not running at all, yes :/
16:37:32 <munimeha1> hi
16:38:15 <slaweq> hi munimeha1
16:38:35 <slaweq> but the problem is that if You go to the results of a functional job, like: http://logs.openstack.org/88/555088/29/check/neutron-functional/0b00b31/logs/testr_results.html.gz
16:38:51 <slaweq> there are no logs from those tests in http://logs.openstack.org/88/555088/29/check/neutron-functional/0b00b31/logs/dsvm-functional-logs/
16:39:03 <slaweq> so we don't know anything about what happened there :/
16:39:14 <slaweq> I think that we should first check why those tests aren't logged there
16:39:18 <mlavalle> is that true for other projects?
16:40:02 <slaweq> what are You asking about exactly?
16:40:12 <munimeha1> can we evaluate this patch for ci https://review.openstack.org/#/c/603501/
16:40:30 <munimeha1> Do we need to add any gate or anything
16:40:46 <slaweq> munimeha1: can we discuss it in Open discussion?
16:40:54 <munimeha1> thanks
16:40:56 <mlavalle> munimeha1: probably you are in the wrong meeting. the Neutron meeting ended more than 1 hour ago
16:41:05 <munimeha1> ok
16:41:19 <mlavalle> and I raised your patch in that meeting
16:41:29 <mlavalle> please check the logs, blueprints section
16:42:00 <munimeha1> thanks
16:42:17 <mlavalle> slaweq: I was wondering if the lack of logs is a consequence of the way we set up this test
16:42:25 <slaweq> mlavalle: I don't know
16:42:29 <slaweq> we should check that
16:42:43 <slaweq> but there are logs for some other tests there
16:42:46 <mlavalle> that's why I asked: what do other projects do in terms of this setup?
16:42:56 <slaweq> we should probably check that
16:43:11 <slaweq> is there any volunteer to check that maybe?
16:43:45 <mlavalle> I would volunteer, with the caveat that I won't get to it until after Berlin
16:43:57 <slaweq> if not, I will assign it to me, but I don't know if I will have time for that
16:44:17 <mlavalle> best thing, assign it to you
16:44:22 <slaweq> mlavalle: You already have a lot of things on Your todo list and You have the summit, so I will assign it to myself :)
16:44:33 <slaweq> but thx for volunteering :)
16:44:45 <mlavalle> if by the week after Berlin you haven't gotten to it, we can discuss it again
16:44:54 <slaweq> #action slaweq to check why db_migration functional tests don't have logs
16:45:01 <slaweq> mlavalle: ok, thx
16:45:37 <slaweq> ok, regarding fullstack tests we had two issues which are already assigned to me and mlavalle, so there is nothing else to discuss here I guess
16:45:46 <slaweq> let's move to the next topic
16:45:48 <slaweq> #topic Tempest/Scenario
16:46:20 <slaweq> regarding tempest jobs, I recently started thinking about one thing
16:46:48 <slaweq> I see quite many failures completely unrelated to neutron, like e.g. cinder volume issues
16:46:55 <slaweq> so I have a question for You
16:47:24 <slaweq> what do You think about creating some blacklist of tests, like cinder.volume or maybe some others, which we will not run in our gate?
16:47:48 <mlavalle> mhhhhh
16:47:49 <slaweq> I know that neutron and nova are quite related to each other so we can't do that there
16:48:07 <slaweq> but for example cinder is not related to neutron at all IMO
16:48:10 <hongbin> the problem is how to track the list
16:48:15 <mlavalle> yeah
16:48:24 <mlavalle> the tracking problem worries me
16:49:03 <mlavalle> if we can have a process whereby we can revisit what we blacklist
16:49:15 <mlavalle> I would consider it
16:49:21 <slaweq> I was thinking about something quite generic, like a blacklist of all of tempest.api.volume for example
16:49:51 <slaweq> as those tests are not testing anything related to neutron (maybe except ssh connectivity to an instance, but that is tested in many other tests as well)
16:49:53 <mlavalle> how difficult would it be to put together a list of the specific tests failing?
16:50:17 <mlavalle> is cinder mainly the problem?
16:50:30 <slaweq> mlavalle: TBH I don't know how difficult it would be
16:50:44 <hongbin> for me, it is more convenient to track the list in LP even if it is not related to neutron
16:50:44 <slaweq> I can try to make such a list if I find failed tests
16:50:52 <mlavalle> what I am thinking, before taking the step of blacklisting
16:51:12 <mlavalle> is, if we can put together relatively easily a list and it is mostly cinder
16:51:23 <mlavalle> I can discuss that list with their team
16:51:28 <mlavalle> and see what they think
16:51:43 <mlavalle> they might say, just remove them from your queues
16:51:46 <hongbin> it could be something else, like an error on shelve/unshelve of a vm
16:51:48 <slaweq> mlavalle: and yes, from what I see in our job results, it's most often that we have some cinder related issues (except our own issues of course)
16:52:28 <slaweq> hongbin: yes, but shelve/unshelve is a nova test and I'm not talking about that one here
16:52:40 <hongbin> slaweq: ok
16:52:44 <mlavalle> yes, nova we are not to blacklist
16:52:58 <slaweq> I'm talking about tests which are failing because of a volume in ERROR state, for example
16:52:59 <bcafarel> only that pesky volume attach test and a few friends of it
16:53:26 <mlavalle> slaweq: send me an email with the list and I'll discuss it next week in Berlin
16:53:36 <mlavalle> if it is easy to put together
16:53:42 <hongbin> is there a bug opened in cinder about that?
16:53:45 <slaweq> ok, so I will make a list of such failing tests and will send it to You
16:53:51 <slaweq> hongbin: I don't know TBH
16:54:02 <mlavalle> hongbin: good point. we need to make sure that is the case
16:54:10 <hongbin> slaweq: IMO, we should open bugs whenever we see such a failure
16:54:38 <slaweq> hongbin: I agree, I will do that if I spot it next time
16:54:42 <bcafarel> I know of https://bugs.launchpad.net/cinder/+bug/1796708 at least
16:54:42 <openstack> Launchpad bug 1796708 in Cinder "VolumesExtendTest.test_volume_extend_when_volume_has_snapshot intermittently fails with "Extend volume failed.: VolumeNotDeactivated: Volume volume-5514a6ad-abbb-46b3-a464-d73cc67e55af was not deactivated in time."" [Undecided,New]
16:55:07 <mlavalle> yeah, that is an old friend of ours
16:55:46 <hongbin> slaweq: i will do that as well, then we need a way to track the list of opened bugs in other projects that affect neutron
16:56:00 <slaweq> hongbin: thx
16:56:08 <mlavalle> let's all do that
16:56:22 <slaweq> sounds good
16:56:38 <slaweq> we can create some etherpad to track such failures there
16:56:41 <mlavalle> slaweq: but thanks for bringing the issue up
16:57:41 <slaweq> ok, so that's all from me for today :)
16:57:45 <slaweq> #topic Open discussion
16:58:04 <slaweq> munimeha1: do You want to discuss something related to CI?
16:59:18 <slaweq> ok, I have one more thing which I forgot at the beginning :)
16:59:37 <slaweq> I want to welcome to these meetings our new CI lieutenant: hongbin :)
16:59:48 <slaweq> sorry that I didn't do that at the beginning :)
17:00:05 <mlavalle> yaay! welcome!
17:00:07 <hongbin> slaweq: haha, np, looking forward to working in the CI team
17:00:17 <slaweq> :)
17:00:22 <slaweq> ok, I think we have to finish now
17:00:26 <slaweq> thx for attending
17:00:27 <bcafarel> better a late announcement than never, welcome new lieutenant hongbin :)
17:00:28 <slaweq> o/
17:00:38 <slaweq> #endmeeting
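
Editor's note: the blacklist idea discussed in the Tempest/Scenario topic maps onto tempest's standard test-selection mechanism. This is only a hypothetical sketch of what such a file could look like, not an agreed-on list from the meeting: a file of regexes, one per line, passed to `tempest run --blacklist-file`, skipping cinder-only API tests while leaving nova and neutron tests untouched.

```text
# neutron-tempest-blacklist.txt (hypothetical example, not the team's list)
# Each non-comment line is a regex matched against test IDs;
# matching tests are excluded from the run.

# Skip cinder-only API tests, which don't exercise neutron:
^tempest\.api\.volume\..*
```

It would then be wired into the job as `tempest run --regex '^tempest' --blacklist-file neutron-tempest-blacklist.txt`, which keeps the revisit process mlavalle asked for simple: the list lives in the repo and is reviewed like any other change.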