16:00:16 #startmeeting neutron_ci
16:00:17 Meeting started Tue Nov 6 16:00:16 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:21 The meeting name has been set to 'neutron_ci'
16:00:23 welcome to another meeting :)
16:00:50 o/
16:01:00 hi again :)
16:01:03 * mlavalle had to re-start system
16:01:18 made it back on time :-)
16:01:26 haleyb: are You around for CI meeting?
16:01:47 slaweq: yes, i'm here, just on phone at same time with someone
16:01:55 hongbin: are You around for CI meeting?
16:01:59 haleyb: sure, no problem :)
16:02:22 o/
16:02:35 welcome hongbin :)
16:02:47 I think we can start as njohnston_ is not available today
16:02:49 #topic Actions from previous meetings
16:03:00 mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:03:10 I am working on it
16:03:18 couldn't reproduce locally
16:03:33 at this moment comparing logs between good run and bad run
16:04:27 ok, if You need any help, ping me :)
16:04:45 do you want the bug?
16:04:53 yes, please
16:05:01 take it hten
16:05:07 then
16:06:55 so mlavalle should I work on it now?
16:07:09 if you want the bug, go ahead
16:07:16 and work on it
16:07:24 I was planning to work on it today and tomorrow
16:07:32 but then I am leaving for Berlin
16:07:41 so please work on it for those 2 days if You can
16:07:46 and I can continue later
16:07:48 :)
16:07:50 yes, of course
16:07:54 ok for You?
16:07:57 yes
16:08:07 great
16:08:17 so lets make an action about that
16:08:26 mlavalle/slaweq to continue debugging issue with not reachable FIP in scenario jobs
16:08:34 #action mlavalle/slaweq to continue debugging issue with not reachable FIP in scenario jobs
16:08:46 thx mlavalle :)
16:08:51 ok, lets go to the next one
16:08:51 thank you
16:08:55 slaweq to check if failing test_ha_router_namespace_has_ipv6_forwarding_disabled is related to bug https://bugs.launchpad.net/neutron/+bug/1798475
16:08:55 Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:09:04 so I checked it and it looks like a different bug
16:09:14 I reported it here: https://bugs.launchpad.net/neutron/+bug/1801930
16:09:14 Launchpad bug 1801930 in neutron "Functional test test_ha_router_namespace_has_ipv6_forwarding_disabled failing quite often" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:09:30 and I even pushed a patch which (I hope) will fix that: https://review.openstack.org/615893
16:10:04 I couldn't reproduce this issue locally but I was checking logs of such failed tests and it's pretty clear to me that it is a race issue
16:10:33 so please add it to Your review list if You can :)
16:10:44 there is another ipv6 issue i'm working on that might be related, probably next on your list :)
16:11:20 haleyb: are You talking about: * haleyb to check issue with failing FIP transition to down state ?
16:11:36 no, that's a different one :o
16:12:07 https://bugs.launchpad.net/neutron/+bug/1787919
16:12:07 Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,In progress] - Assigned to Brian Haley (brian-haley)
16:12:11 haleyb: so I don't have others on my list
16:12:24 ahh, this one wasn't discussed on CI meeting before
16:12:25 don't know whether that's related to yours, but same system
16:12:36 maybe just in l3 meeting
16:13:03 slaweq: i will look at your change, might be unrelated
16:13:22 from the description of the bug it looks like this may or may not be related :)
16:13:51 but in the logs which I found, this forwarding was set, but about half a second after the test's check was done
16:14:13 right, that's why it rang a bell for me when i saw your bug :)
16:14:46 thx haleyb :)
16:14:59 ok, lets move on
16:15:01 next action
16:15:03 njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:15:31 njohnston_ is not here today so I think I will just reassign it to the next meeting
16:15:37 #action njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:15:52 njohnston make py3 etherpad
16:15:57 that is next one
16:16:07 do You
16:16:40 side note https://review.openstack.org/#/c/577383/ is almost there on that functional job reshuffle
16:17:25 bcafarel: thx for info
16:18:11 so You are still working on it, right?
16:19:01 yes I will send an update with the missing piece (neutron-functional-python27)
16:19:09 thx bcafarel
16:19:21 ok, lets assign action about etherpad to njohnston_ again for next week
16:19:25 #action njohnston make py3 etherpad
16:19:33 next one:
16:19:35 njohnston check if grenade is ready for py3
16:19:41 same here
16:19:46 #action njohnston check if grenade is ready for py3
16:19:55 next one was:
16:19:58 slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:19:58 bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed] https://launchpad.net/bugs/1798472
16:20:05 and I didn't have time to get to this one yet
16:20:15 I will add it again to myself
16:20:20 #action slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:20:28 next:
16:20:30 mlavalle to check bug 1798475
16:20:30 bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:21:07 any info mlavalle about this one?
16:21:10 I was going to try that one after the FIP one
16:21:29 ok
16:21:58 #action mlavalle to check bug 1798475
16:21:58 bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:22:11 and the last one was:
16:22:12 slaweq to check issue with openstack-tox-py35-with-oslo-master periodic job
16:22:25 it's now fixed on oslo.service side with: https://review.openstack.org/#/c/614642/
16:22:50 so we are fine with this periodic job again
16:23:22 anything to add? any other things from previous week You want to discuss?
16:24:19 slaweq: just one point
16:24:37 regarding the previous bug assigned to it
16:24:55 if at some point it becomes urgent, please let me know and I'll change priorities
16:25:03 or we can assign to someone else
16:25:21 sure, I will keep that in mind, thx
16:25:48 :-)
16:25:58 ok, lets move on then to the next topic
16:26:05 #topic Python 3
16:26:33 I don't think we made much progress on it since last week
16:26:53 we have this patch from bcafarel who is working on switching functional tests to python3
16:27:06 and we have a few action items assigned to njohnston_
16:27:21 so I don't have anything else to discuss about it today
16:27:29 do You have something to bring up here?
16:28:17 nothing from me
16:28:42 ok, lets move on then
16:28:44 #topic Grafana
16:28:53 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:30:51 graphs look quite "normal"
16:31:01 do You want to talk about something specific?
16:31:47 * mlavalle waiting for the graphs to render
16:32:51 Yeah, they look good
16:33:01 there was a little spike in functional
16:33:14 in gate queue?
16:33:26 yeah, yesterday
16:33:27 it was on the weekend and there were very few runs then
16:33:45 yeah, I see that
16:33:52 nothing else grabs my attention
16:33:57 good :)
16:33:58 so I think we are good
16:34:17 so lets talk about these functional and fullstack jobs now :)
16:34:23 ok
16:34:24 #topic fullstack/functional
16:34:48 regarding functional tests, we have identified 2 issues:
16:35:08 one is this with ipv6 forwarding https://bugs.launchpad.net/neutron/+bug/1801930 and I hope this will be fixed soon
16:35:08 Launchpad bug 1801930 in neutron "Functional test test_ha_router_namespace_has_ipv6_forwarding_disabled failing quite often" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:35:18 and second one is related to the db migrations
16:35:28 we had to reopen bug: https://bugs.launchpad.net/neutron/+bug/1687027
16:35:28 Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:35:36 because it still happens
16:35:49 it's not as often as it was
16:36:14 but sometimes even 600 seconds is not enough and that leads me to think that it's something other than just a timeout
16:36:33 so some are caused by timeout
16:36:34 unfortunately there are no logs from those tests in job logs
16:36:42 but some by something else
16:36:46 yes, it is possibly something that is hanging
16:37:00 hanging or not running at all, yes :/
16:37:32 hi
16:38:15 hi munimeha1
16:38:35 but problem is that if You go to results of functional job, like: http://logs.openstack.org/88/555088/29/check/neutron-functional/0b00b31/logs/testr_results.html.gz
16:38:51 there are no logs from those tests in http://logs.openstack.org/88/555088/29/check/neutron-functional/0b00b31/logs/dsvm-functional-logs/
16:39:03 so we don't know anything about what happened there :/
16:39:14 I think that we should first check why those tests aren't logged there
16:39:18 is that true for
toher projects?
16:39:28 other projects^^^
16:40:02 what are You asking for exactly?
16:40:12 can we evaluate this patch for ci https://review.openstack.org/#/c/603501/
16:40:30 Do we need to add any gate or anything
16:40:46 munimeha1: can we discuss it in Open agenda?
16:40:54 thanks
16:40:56 munimeha1: probably you are in the wrong meeting. the Neutron meeting ended more than 1 hour ago
16:41:05 ok
16:41:19 and I raised your patch in that meeting
16:41:29 please check the logs, blueprints section
16:42:00 thanks
16:42:17 slaweq: I was wondering if the lack of logs is a consequence of the way we setup this test
16:42:25 mlavalle: I don't know
16:42:29 we should check that
16:42:43 but there are logs for some other tests there
16:42:46 that's why I said, what do other projects do in terms of this setup?
16:42:56 we should check that probably
16:43:11 is there any volunteer to check that maybe?
16:43:45 I would volunteer with the caveat that I won't get to it until after Berlin
16:43:57 if no, I will assign it to me but I don't know if I will have time for that
16:44:17 best thing, assign it to you
16:44:22 mlavalle: You already have a lot of things on Your todo list and You have summit so I will assign it to myself :)
16:44:33 but thx for volunteering :)
16:44:45 if by the week after Berlin you haven't gotten to it, we can discuss again
16:44:54 #action slaweq to check why db_migration functional tests don't have logs
16:45:01 mlavalle: ok, thx
16:45:37 ok, regarding fullstack tests we had two issues which are already assigned to me and mlavalle so there is nothing else to discuss here I guess
16:45:46 lets move to next topic
16:45:48 #topic Tempest/Scenario
16:46:20 regarding tempest jobs, I recently started thinking about one thing
16:46:48 I see quite many failures completely not related to neutron, like e.g.
cinder volume issues
16:46:55 so I have a question to You
16:47:24 what do You think about creating some blacklist of tests like cinder.volume or maybe some others which we will not run in our gate?
16:47:48 mhhhhh
16:47:49 I know that neutron and nova are quite related to each other so we can't do that
16:48:07 but for example cinder is not related to neutron at all IMO
16:48:10 the problem is how to track the list
16:48:15 yeah
16:48:24 the tracking problem worries me
16:49:03 if we can have a process whereby we can revisit what we blacklist
16:49:15 I would consider it
16:49:21 I was thinking about something quite generic like blacklist of all tempest.api.volume for example
16:49:51 as those tests are not testing anything related to neutron (maybe except ssh connectivity to instance but that is tested in many other tests as well)
16:49:53 how difficult would it be to put together a list of the specific tests failing
16:49:54 ?
16:50:17 is cinder mainly the problem?
16:50:30 mlavalle: TBH I don't know how difficult it would be
16:50:44 for me, it is more convenient to track the list in LB even if it is not related to neutron
16:50:44 I can try to make such list if I find failed tests
16:50:52 what I am thinking, before taking the step of blacklisting
16:51:12 is, if we can put together a list relatively easily and it is mostly cinder
16:51:23 I can discuss with their team that list
16:51:28 and see what they think
16:51:43 they might say, just remove them from your queues
16:51:46 it could be something else, like error on shelve/unshelve a vm
16:51:48 mlavalle: and yes, from what I see in our jobs result, it's most often that we have some cinder related issues (except our own issues of course)
16:52:28 hongbin: yes, but shelve/unshelve is nova test and I'm not talking about that one here
16:52:40 slaweq: ok
16:52:44 yes, nova we are not to blacklist
16:52:58 I'm talking about tests which are failing because of volume in ERROR state for example
16:52:59 only that pesky
volume attach test and a few friends of it
16:53:26 slaweq: send me an email with the list and I'll discuss next week in Berlin
16:53:36 if it is easy to put together
16:53:42 is there a bug opened in cinder about that?
16:53:45 ok, so I will make list of such failing tests and will send it to You
16:53:51 hongbin: I don't know TBH
16:54:02 hongbin: good point. we need to make sure that is the case
16:54:10 slaweq: IMO, we should open bugs whenever we see such a failure
16:54:38 hongbin: I agree, I will do that if I spot it next time
16:54:42 I know of https://bugs.launchpad.net/cinder/+bug/1796708 at least
16:54:42 Launchpad bug 1796708 in Cinder "VolumesExtendTest.test_volume_extend_when_volume_has_snapshot intermittently fails with "Extend volume failed.: VolumeNotDeactivated: Volume volume-5514a6ad-abbb-46b3-a464-d73cc67e55af was not deactivated in time."" [Undecided,New]
16:55:07 yeah, that is an old friend of ours
16:55:46 slaweq: i will do that as well, then we need a way to track the list of opened bugs in other projects that affect neutron
16:56:00 hongbin: thx
16:56:08 let's all do that
16:56:22 sounds good
16:56:38 we can create some etherpad to track such failures there
16:56:41 slaweq: but thanks for bringing the issue up
16:57:41 ok, so that's all from me for today :)
16:57:45 #topic Open discussion
16:58:04 munimeha1: do You want to discuss something related to CI?
16:59:18 ok, I have one more thing which I forgot at the beginning :)
16:59:37 I want to welcome to these meetings our new CI lieutenant: hongbin :)
16:59:48 sorry that I didn't do that at the beginning :)
17:00:05 yaay! welcome!
17:00:07 slaweq: haha, np, look forward to working in the CI team
17:00:17 :)
17:00:22 ok, I think we have to finish now
17:00:26 thx for attending
17:00:27 better late announce than never, welcome new lieutenant hongbin :)
17:00:28 o/
17:00:38 #endmeeting
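
Note on the Tempest/Scenario topic: the "quite generic" blacklist of all tempest.api.volume tests could look roughly like the sketch below. It is only an illustration of the idea in plain Python, not how the neutron gate jobs are actually configured. The blacklist file format (one regex per line, `#` for comments), the helper names `load_patterns` and `filter_tests`, and the test IDs are all hypothetical. Keeping a comment next to each pattern is one possible way to address the tracking concern raised in the meeting, since every entry can then point at the bug it was added for.

```python
import re

# Hypothetical blacklist contents: one regular expression per line.
# Everything after '#' is a comment, e.g. a pointer to the tracking bug.
BLACKLIST = r"""
# volume API tests, assumed here to be unrelated to neutron
^tempest\.api\.volume\.
"""


def load_patterns(text):
    """Compile every non-empty, non-comment line into a regex."""
    patterns = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(re.compile(line))
    return patterns


def filter_tests(test_ids, patterns):
    """Return the test IDs that do not match any blacklist pattern."""
    return [t for t in test_ids if not any(p.search(t) for p in patterns)]


if __name__ == "__main__":
    # Made-up test IDs, for illustration only.
    tests = [
        "tempest.api.volume.test_example.VolumeTest.test_extend",
        "tempest.api.network.test_example.NetworkTest.test_list",
        "tempest.scenario.test_example.ServerTest.test_shelve",
    ]
    for test_id in filter_tests(tests, load_patterns(BLACKLIST)):
        print(test_id)
```

Because the pattern is anchored to the `tempest.api.volume.` prefix, nova-related tests such as shelve/unshelve stay in the run, which matches the point made in the meeting that nova tests should not be blacklisted.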