16:00:35 #startmeeting neutron_ci
16:00:35 Meeting started Tue Mar 26 16:00:35 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:38 hi
16:00:39 The meeting name has been set to 'neutron_ci'
16:00:59 o/
16:01:04 o/
16:01:13 hi
16:02:22 here
16:02:30 ok, let's start then
16:02:32 #topic Actions from previous meetings
16:02:41 bcafarel to switch neutron-grenade multinode jobs to bionic nodes
16:02:59 should be done (I am looking for the link)
16:03:10 I think that it was actually done by my patch https://review.openstack.org/#/c/639361/
16:03:12 in fact it was in the original review to switch to bionic
16:03:34 ^ that is the one :)
16:03:54 yeah, I had many such "DNM" test patches and I somehow missed that in this one someone changed the description and removed "DNM" from the title
16:03:56 :)
16:04:09 :)
16:04:13 thx bcafarel for checking that
16:04:14 just dropping the override was enough in the end
16:04:25 ok, next one
16:04:28 mlavalle to check https://bugs.launchpad.net/neutron/+bug/1820865
16:04:29 Launchpad bug 1820865 in neutron "Fullstack tests are failing because of "OSError: [Errno 22] failed to open netns"" [Critical,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:04:41 I did check on it
16:04:50 created a patch
16:05:06 but it seems ralonsoh proposed a better solution
16:05:33 https://review.openstack.org/#/c/647721
16:05:51 I saw that You tried to downgrade pyroute2 to check if the issue isn't in the newest version of it, right mlavalle?
16:06:02 right
16:06:33 ok, I hope that ralonsoh's solution will fix this problem once and for all :)
16:06:38 I was taking network_namespace_exists as proof that the namespace existed
16:06:54 but it seems that it can exist but might not be ready to be opened
16:08:01 Hi
16:08:01 yes, I was also thinking that if a namespace exists then it should always be ready :)
16:08:11 hi lajoskatona
16:08:18 late o/
16:08:25 hi rubasov
16:08:39 ok, let's review ralonsoh's patch and let's hope it will help
16:08:44 next action
16:08:46 slaweq/mlavalle to check https://bugs.launchpad.net/neutron/+bug/1820870
16:08:48 Launchpad bug 1820870 in neutron "Fullstack tests are failing because async_process is not started properly" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:09:05 I sent a DNM patch with extra logs added
16:09:16 it is for sure not an issue with rabbitmq
16:09:47 a lot of errors related to rabbit were caused by the fact that the rabbitmq user/vhost was already deleted
16:10:08 what happened there is that sometimes AsyncProcess.is_active() was always returning False
16:10:14 and after 1 minute the test was failing
16:10:46 so the test worker cleaned everything up, removed the rabbitmq user/vhost, killed spawned processes and so on
16:11:08 but the process which was "not active" (and caused the failure) was in fact running fine
16:11:18 it wasn't killed as the test runner didn't know its pid
16:11:34 so it kept running and logging that it can't connect to rabbitmq
16:11:51 in one case such an agent's log grew to about 1.7 GB :)
16:12:12 all my findings should be described in the comments on the bug in launchpad
16:12:17 and I did patch https://review.openstack.org/#/c/647605/
16:12:23 please review it
16:12:47 * mlavalle thinks we forgot the grafana url at the beginning of the meeting
16:13:00 mlavalle: right, sorry
16:13:05 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:13:08 please open it now :)
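[Aside on bug 1820865 discussed above: the race where a namespace is already listed but cannot yet be opened is usually handled by retrying the open itself rather than trusting network_namespace_exists. The sketch below only illustrates that idea - it is not ralonsoh's patch from https://review.openstack.org/#/c/647721 - and it assumes pyroute2's NetNS(name, flags=0) opens an existing namespace and raises OSError while the namespace is not yet usable.]

    import time

    from pyroute2 import NetNS


    def wait_until_netns_openable(name, timeout=5.0, sleep=0.2):
        """Poll until the namespace can really be opened, not just listed."""
        deadline = time.time() + timeout
        while True:
            try:
                # flags=0: open an existing namespace, never create one
                ns = NetNS(name, flags=0)
                ns.close()
                return
            except OSError:
                # "[Errno 22] failed to open netns" - not ready yet, retry
                if time.time() >= deadline:
                    raise
                time.sleep(sleep)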
16:13:46 :-)
16:14:17 and that is basically all about this action :)
16:14:23 next one from last week was:
16:14:32 mlavalle to debug reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:14:52 I didn't make much progress there
16:16:03 ok, that's fine :)
16:16:13 we all had more important things last week
16:16:20 I think I said I was going to do this slowly
16:16:31 yes :)
16:16:37 :-)
16:16:45 should I assign it to You for next week to keep it in mind?
16:16:58 but it's a good idea to ask me every week
16:17:11 ok, I will :)
16:17:15 that way the fear of facing the Hulk will push me to make progress
16:17:23 LOL
16:17:32 #action mlavalle to debug reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:17:49 and that was all the actions from last week
16:17:59 anything You want to add/ask here?
16:18:54 nope
16:19:13 ok, so let's move on to the next topic
16:19:15 #topic Python 3
16:19:21 Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:19:26 njohnston: any updates?
16:20:08 I started looking at the vpnaas jobs and tinkering with them
16:20:30 I am trying to learn more from my failures in defining zuul jobs
16:20:43 But zuul is a tough teacher and slaps my hand a lot
16:20:58 Progress is slow but the timeline is long so I think things are OK
16:21:03 if You need any help with zuul I can try to help with it :)
16:21:08 thanks!
16:21:20 just ping me if You need anything
16:21:37 will do
16:21:53 thx
16:22:03 so the next topic is
16:22:04 #topic Ubuntu Bionic in CI jobs
16:22:27 as the grenade jobs are switched to bionic I think we are done with all of the jobs here
16:22:44 most of them are already zuulv3 syntax so they were switched some time ago
16:22:53 and the legacy jobs were switched recently
16:23:22 so all done for the topic?
16:23:23 if there are no objections I think we can remove this topic from the agenda for the next meetings
16:23:37 bcafarel: yes, IMO all is done for us
16:24:08 +1 for being done with it, good job!
16:24:15 also the stadium projects are switched to bionic as infra switched the legacy jobs to bionic too
16:24:31 is midonet still the lone holdout?
16:24:59 yes, let's remove this topic from the agenda
16:25:27 njohnston: yes, for midonet there is patch https://review.openstack.org/#/c/639990/
16:25:34 to switch them too
16:25:39 but the jobs are non-voting already
16:25:50 ok
16:25:56 and the midonet team will work on that I hope
16:26:06 ok mlavalle, thx for the confirmation :)
16:26:19 #topic tempest-plugins migration
16:26:24 any updates?
16:26:30 I didn't have time to work on that
16:26:33 I will work on this this week
16:26:38 tmorin didn't reply to me
16:26:44 I hope to get back to the fwaas migration as soon as my urgent matters clear up
16:27:38 I will also try to do this migration for bgpvpn this or next week
16:27:48 no progress here either (hope to send at least a WIP patch by next meeting)
16:28:09 #topic Grafana
16:28:19 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:28:26 ^^ just as a reminder :)
16:29:59 do You see anything which You want to talk about on grafana?
16:30:15 I don't see anything new and very urgent there
16:30:29 looks much better this week
16:30:40 and I think that we finally more or less made the gates work
16:30:51 \o/
16:31:05 there are still some issues but nothing new, at least I'm not aware of anything new from this week
16:31:38 agree
16:32:40 so let's talk about some specific jobs now
16:32:42 #topic fullstack/functional
16:32:56 we still have those 2 issues with fullstack jobs:
16:33:02 https://bugs.launchpad.net/neutron/+bug/1820865 and
16:33:04 Launchpad bug 1820865 in neutron "Fullstack tests are failing because of "OSError: [Errno 22] failed to open netns"" [Critical,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:33:07 https://bugs.launchpad.net/neutron/+bug/1820870
16:33:08 Launchpad bug 1820870 in neutron "Fullstack tests are failing because async_process is not started properly" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:33:15 but patches for both are already proposed
16:33:39 when both are merged, we can maybe make it voting again next week if it looks better on grafana
16:34:12 anything else You want to talk about regarding functional or fullstack jobs?
16:34:38 agree
16:34:55 I was going to ask about making the job voting again
16:35:02 :)
16:35:07 I remember about that
16:35:28 but let's fix those two issues and then check how it goes for a few days at least
16:35:37 what do You think about such a plan?
16:36:09 +1
16:37:05 ok, next topic
16:37:07 #topic Tempest/Scenario
16:37:26 I want to raise 2 issues here
16:37:34 first is https://bugs.launchpad.net/neutron/+bug/1815585
16:37:35 Launchpad bug 1815585 in neutron "Floating IP status failed to transition to DOWN in neutron-tempest-plugin-scenario-linuxbridge" [High,Confirmed]
16:37:43 I was debugging it a bit
16:38:15 and what is strange for me is the fact that the port is unbound and active - and the test is failing because of that active status
16:38:40 do You know if it should somehow be possible to have an unbound port active?
16:38:48 for me this looks odd
16:39:10 ahh, an example of such a failure is e.g. here: http://logs.openstack.org/93/631793/6/check/neutron-tempest-plugin-scenario-linuxbridge/1c3c083/testr_results.html.gz
16:39:20 yes, it looks od
16:39:22 odd
16:39:45 maybe this is somehow related to https://bugs.launchpad.net/neutron/+bug/1819446
16:39:45 Launchpad bug 1819446 in neutron "After the vm's port name is modified, the port status changes from down to active." [Low,Confirmed]
16:40:03 but in this bug they reported that changing the name of a port will switch it to active
16:40:17 even stranger
16:40:49 in the test there are no such port updates before the place where the test is failing, but maybe it can happen not only on name update - I don't know
16:41:15 I still have it in my backlog but if someone has time, please feel free to take it :)
16:41:56 in the meantime I can mark this test as unstable
16:42:01 what do You think about it?
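[For reference, "mark this test as unstable" refers to tempest's unstable_test decorator, which turns a test failure into a skip while the linked bug stays open. A minimal sketch of what such a change could look like follows; the class name and base class are placeholders rather than the real neutron-tempest-plugin code, and the exact decorator argument form should be checked against tempest.lib.decorators.]

    import testtools

    from tempest.lib import decorators


    class FloatingIPPortDetailsTest(testtools.TestCase):
        # stub class - the real test lives in neutron-tempest-plugin;
        # only the decorator usage matters here

        @decorators.unstable_test("bug 1815585")
        def test_floatingip_port_details(self):
            # with the decorator a failure is reported as a skip instead of
            # breaking the gate while bug 1815585 is investigated
            self.skipTest("illustration only")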
16:42:39 according to logstash it happened 9 times in the last 7 days
16:42:39 yep
16:43:00 ok, thx mlavalle, I will do it today :)
16:43:18 #action slaweq to mark test_floatingip_port_details test as unstable
16:43:54 and the second problem, which we have had for some time already, is intermittent ssh failures in various tempest tests
16:44:03 example: http://logs.openstack.org/46/638646/14/check/neutron-tempest-plugin-scenario-linuxbridge/1ff70f5/testr_results.html.gz
16:44:11 here it is the linuxbridge job
16:44:26 but it may happen in any tempest job in fact
16:44:41 I didn't see any "pattern" in those failures
16:45:02 do we have a logstash query for them?
16:45:15 the only thing which I think is quite common is the fact that those instances can connect to the metadata service but ssh to the FIP is not working
16:45:24 mlavalle: no, I don't have a logstash query for that
16:45:36 and I didn't report a bug for this either
16:45:53 I will report it to be able to track progress there
16:45:57 if we created a logstash query and attached it to a bug report, we might cooperate in trying to isolate the issue
16:46:05 and I will also try to prepare some query
16:46:14 mlavalle++ I will do
16:46:25 slaweq: I'll help
16:46:42 #action slaweq to report a bug related to intermittent ssh failures in various tests
16:46:45 thx mlavalle
16:47:09 and that's all from my side for today
16:47:40 anything else You want to talk about regarding scenario jobs or anything else related to CI?
16:47:47 #topic Open discussion
16:48:29 quick comment on stable branches, the ocata branch gets -1 on recent backport candidates
16:48:37 https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:stable/ocata
16:49:04 I need to check if there's an easy fix, or if it's just getting old
16:49:17 probably the latter
16:50:03 the pike branch needs some rechecks from time to time, but not as bad
16:50:12 bcafarel: looking at one random test failure: http://logs.openstack.org/51/646651/1/check/openstack-tox-cover/2a21a99/job-output.txt.gz#_2019-03-25_10_50_42_620773
16:50:21 first question is: why is it tested on bionic?
16:51:05 and it's not a job defined in our repo I think
16:51:27 so maybe we should ask e.g. gmann about it
16:51:36 or someone else from the qa team
16:51:47 oooh very good point, so that would mostly be fallout from the bionic switch?
16:52:03 I just checked this one job for now :)
16:52:09 I don't know about other failures
16:52:36 ocata should not do bionic. I will check the job
16:52:49 gmann++ thanks
16:52:50 but yeah, looking at 3 other patches, it also failed on the openstack-tox-cover job
16:53:01 hi gmann
16:53:06 thx for the help with it :)
16:54:37 and thx bcafarel for taking care of those stable branches :)
16:55:12 ok, anything else for today?
16:55:24 if not I think we can finish a few minutes earlier :)
16:55:45 sounds good :)
16:55:58 o/
16:56:06 ok, thx for attending the meeting
16:56:06 +1 thanks!
16:56:09 #endmeeting
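[A small follow-up sketch for the ssh-failure action above: before writing the logstash query, one way to find a usable signature is to scan a batch of downloaded job logs for the ssh failure text and count hits per file. The SIGNATURE string below is only an assumption about what the tempest failure message looks like and would need to be replaced with whatever the real traceback contains.]

    import sys
    from pathlib import Path

    # assumed failure text - adjust to the real tempest error message
    SIGNATURE = "Failed to establish authenticated ssh connection"


    def count_hits(log_dir):
        """Return {log file: number of signature hits} under log_dir."""
        hits = {}
        for path in Path(log_dir).rglob("*.txt"):
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            count = text.count(SIGNATURE)
            if count:
                hits[str(path)] = count
        return hits


    if __name__ == "__main__":
        for path, count in sorted(count_hits(sys.argv[1]).items()):
            print("%4d  %s" % (count, path))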