16:00:35 <slaweq> #startmeeting neutron_ci
16:00:35 <openstack> Meeting started Tue Mar 26 16:00:35 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:38 <slaweq> hi
16:00:39 <openstack> The meeting name has been set to 'neutron_ci'
16:00:59 <mlavalle> o/
16:01:04 <bcafarel> o/
16:01:13 <haleyb> hi
16:02:22 <njohnston> here
16:02:30 <slaweq> ok, lets start then
16:02:32 <slaweq> #topic Actions from previous meetings
16:02:41 <slaweq> bcafarel to switch neutron-grenade multinode jobs to bionic nodes
16:02:59 <bcafarel> should be done (I am looking for the link)
16:03:10 <slaweq> I think it was even done by my patch https://review.openstack.org/#/c/639361/
16:03:12 <bcafarel> in fact it was in the original review to switch to bionic
16:03:34 <bcafarel> ^ that is the one :)
16:03:54 <slaweq> yeah, I had many such "DNM" test patches and I somehow missed that in this one someone changed the description and removed "DNM" from the title
16:03:56 <slaweq> :)
16:04:09 <bcafarel> :)
16:04:13 <slaweq> thx bcafarel for checking that
16:04:14 <bcafarel> just dropping the override was enough in the end
16:04:25 <slaweq> ok, next one
16:04:28 <slaweq> mlavalle to check https://bugs.launchpad.net/neutron/+bug/1820865
16:04:29 <openstack> Launchpad bug 1820865 in neutron "Fullstack tests are failing because of "OSError: [Errno 22] failed to open netns"" [Critical,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:04:41 <mlavalle> I did check on it
16:04:50 <mlavalle> created a patch
16:05:06 <mlavalle> but it seems ralonsoh proposed a better solution
16:05:33 <mlavalle> https://review.openstack.org/#/c/647721
16:05:51 <slaweq> I saw that You tried to downgrade pyroute2 to check if the issue isn't in the newest version of it, right mlavalle?
16:06:02 <mlavalle> right
16:06:33 <slaweq> ok, I hope that ralonsoh's solution will fix this problem once and for all :)
16:06:38 <mlavalle> I was taking network_namespace_exists as proof that the namespace existed
16:06:54 <mlavalle> but it seems that it can exist but might not be ready to be opened
16:08:01 <lajoskatona> Hi
16:08:01 <slaweq> yes, I was also thinking that if a namespace exists then it should always be ready :)
16:08:11 <slaweq> hi lajoskatona
16:08:18 <rubasov> late o/
16:08:25 <slaweq> hi rubasov
16:08:39 <slaweq> ok, lets review ralonsoh's patch and lets hope it will help
16:08:44 <slaweq> next action
16:08:46 <slaweq> slaweq/mlavalle to check https://bugs.launchpad.net/neutron/+bug/1820870
16:08:48 <openstack> Launchpad bug 1820870 in neutron "Fullstack tests are failing because async_process is not started properly" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:09:05 <slaweq> I sent a DNM patch with extra logs added
16:09:16 <slaweq> it is for sure not an issue with rabbitmq
16:09:47 <slaweq> a lot of the rabbit related errors were caused by the rabbitmq user/vhost being already deleted
16:10:08 <slaweq> what happened there is that sometimes AsyncProcess.is_active() was always returning False
16:10:14 <slaweq> and after 1 minute the test was failing
16:10:46 <slaweq> so the test worker cleaned everything up, removed the rabbitmq user/vhost, killed spawned processes and so on
16:11:08 <slaweq> but the process which was "not active" (and caused the failure) was in fact running fine
16:11:18 <slaweq> it wasn't killed as the test runner didn't know its pid
16:11:34 <slaweq> so it kept running and logging that it can't connect to rabbitmq
16:11:51 <slaweq> in one case such an agent's log grew to about 1.7 GB :)
16:12:12 <slaweq> all my findings should be described in the comments on the launchpad bug
16:12:17 <slaweq> and I did patch https://review.openstack.org/#/c/647605/
16:12:23 <slaweq> please review it
16:12:47 * mlavalle thinks we forgot the grafana url at the beginning of the meeting
16:13:00 <slaweq> mlavalle: right, sorry
16:13:05 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:13:08 <slaweq> please open it now :)
16:13:46 <njohnston> :-)
16:14:17 <slaweq> and that is basically all about this action :)
16:14:23 <slaweq> next one from last week was:
16:14:32 <slaweq> mlavalle to debug reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:14:52 <mlavalle> I didn't make much progress there
16:16:03 <slaweq> ok, that's fine :)
16:16:13 <slaweq> we all had more important things last week
16:16:20 <mlavalle> I think I said I was going to do this slowly
16:16:31 <slaweq> yes :)
16:16:37 <mlavalle> :-)
16:16:45 <slaweq> should I assign it to You for next week to keep it in mind?
16:16:58 <mlavalle> but it's a good idea to ask me every week
16:17:11 <slaweq> ok, I will :)
16:17:15 <mlavalle> that way the fear of facing the Hulk will push me to make progress
16:17:23 <slaweq> LOL
16:17:32 <slaweq> #action mlavalle to debug reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:17:49 <slaweq> and that was all the actions from last week
16:17:59 <slaweq> anything You want to add/ask here?
16:18:54 <njohnston> nope
16:19:13 <slaweq> ok, so lets move on to the next topic
16:19:15 <slaweq> #topic Python 3
16:19:21 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:19:26 <slaweq> njohnston: any updates?
16:20:08 <njohnston> I started looking at the vpnaas jobs and tinkering with them
16:20:30 <njohnston> I am trying to learn more from my failures in defining zuul jobs
16:20:43 <njohnston> But zuul is a tough teacher and slaps my hand a lot
16:20:58 <njohnston> Progress is slow but the timeline is long so I think things are OK
16:21:03 <slaweq> if You need any help with zuul I can try to help with it :)
16:21:08 <njohnston> thanks!
16:21:20 <slaweq> just ping me if You need anything
16:21:37 <njohnston> will do
16:21:53 <slaweq> thx
16:22:03 <slaweq> so next topic is
16:22:04 <slaweq> #topic Ubuntu Bionic in CI jobs
16:22:27 <slaweq> as the grenade jobs are switched to bionic I think we are done with all of the jobs here
16:22:44 <slaweq> most of them are already in zuulv3 syntax so were switched some time ago
16:22:53 <slaweq> and the legacy jobs were switched recently
16:23:22 <bcafarel> so all done for the topic?
16:23:23 <slaweq> if there are no objections I think we can remove this topic from the agenda for next meetings
16:23:37 <slaweq> bcafarel: yes, IMO all is done for us
16:24:08 <njohnston> +1 for being done with it, good job!
16:24:15 <slaweq> also the stadium projects are switched to bionic as infra switched the legacy jobs to bionic too
16:24:31 <njohnston> is midonet still the lone holdout?
16:24:59 <mlavalle> yes, let's remove this topic from the agenda
16:25:27 <slaweq> njohnston: yes, for midonet there is a patch https://review.openstack.org/#/c/639990/
16:25:34 <slaweq> to switch them too
16:25:39 <slaweq> but those jobs are non-voting already
16:25:50 <njohnston> ok
16:25:56 <slaweq> and the midonet team will work on that I hope
16:26:06 <slaweq> ok mlavalle, thx for confirmation :)
16:26:19 <slaweq> #topic tempest-plugins migration
16:26:24 <slaweq> any updates?
16:26:30 <slaweq> I didn't have time to work on that
16:26:33 <mlavalle> I will work on this this week
16:26:38 <slaweq> tmorin didn't reply to me
16:26:44 <njohnston> I hope to get back to the fwaas migration as soon as my urgent matters clear up
16:27:38 <slaweq> I will also try to do this migration for bgpvpn this or next week
16:27:48 <bcafarel> no progress here either (hope to send at least a WIP patch by next meeting)
16:28:09 <slaweq> #topic Grafana
16:28:19 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:28:26 <slaweq> ^^ just as a reminder :)
16:29:59 <slaweq> do You see anything You want to talk about on grafana?
16:30:15 <slaweq> I don't see anything new and very urgent there
16:30:29 <mlavalle> looks much better this week
16:30:40 <slaweq> and I think that we finally more or less got the gates working
16:30:51 <njohnston> \o/
16:31:05 <slaweq> there are still some issues but nothing new, at least I'm not aware of anything new from this week
16:31:38 <mlavalle> agree
16:32:40 <slaweq> so lets talk about some specific jobs now
16:32:42 <slaweq> #topic fullstack/functional
16:32:56 <slaweq> we still have those 2 issues with fullstack jobs:
16:33:02 <slaweq> https://bugs.launchpad.net/neutron/+bug/1820865 and
16:33:04 <openstack> Launchpad bug 1820865 in neutron "Fullstack tests are failing because of "OSError: [Errno 22] failed to open netns"" [Critical,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:33:07 <slaweq> https://bugs.launchpad.net/neutron/+bug/1820870
16:33:08 <openstack> Launchpad bug 1820870 in neutron "Fullstack tests are failing because async_process is not started properly" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:33:15 <slaweq> but patches for both are already proposed
16:33:39 <slaweq> when both are merged, we can maybe make the job voting again next week if it looks better on grafana
16:34:12 <slaweq> anything else You want to talk about regarding the functional or fullstack jobs?
16:34:38 <mlavalle> agree
16:34:55 <mlavalle> I was going to ask about making the job voting again
16:35:02 <slaweq> :)
16:35:07 <slaweq> I remember about that
16:35:28 <slaweq> but lets fix those two issues and then check how it goes for a few days at least
16:35:37 <slaweq> what do You think about such a plan?
16:36:09 <njohnston> +1
16:37:05 <slaweq> ok, next topic
16:37:07 <slaweq> #topic Tempest/Scenario
16:37:26 <slaweq> I want to raise 2 issues here
16:37:34 <slaweq> first is https://bugs.launchpad.net/neutron/+bug/1815585
16:37:35 <openstack> Launchpad bug 1815585 in neutron "Floating IP status failed to transition to DOWN in neutron-tempest-plugin-scenario-linuxbridge" [High,Confirmed]
16:37:43 <slaweq> I was debugging it a bit
16:38:15 <slaweq> and what is strange for me is the fact that the port is unbound and active - and because of that active status the test is failing
16:38:40 <slaweq> do You know if it should somehow be possible to have an unbound port active?
16:38:48 <slaweq> for me this looks odd
16:39:10 <slaweq> ahh, an example of such a failure is here: http://logs.openstack.org/93/631793/6/check/neutron-tempest-plugin-scenario-linuxbridge/1c3c083/testr_results.html.gz
16:39:20 <mlavalle> yes, it looks od
16:39:22 <mlavalle> odd
16:39:45 <slaweq> maybe this is somehow related to https://bugs.launchpad.net/neutron/+bug/1819446
16:39:45 <openstack> Launchpad bug 1819446 in neutron "After the vm's port name is modified, the port status changes from down to active." [Low,Confirmed]
16:40:03 <slaweq> but in this bug they reported that changing the name of a port will switch it to active
16:40:17 <mlavalle> even stranger
16:40:49 <slaweq> in the test there are no such port updates before the place where it fails, but maybe it can happen not only on a name update - I don't know
16:41:15 <slaweq> I still have it in my backlog but if someone has time, please feel free to take it :)
16:41:56 <slaweq> in the meantime I can mark this test as unstable
16:42:01 <slaweq> what do You think about it?
16:42:39 <slaweq> according to logstash it happened 9 times in the last 7 days
16:42:39 <mlavalle> yep
16:43:00 <slaweq> ok, thx mlavalle, I will do it today :)
16:43:18 <slaweq> #action slaweq to mark test_floatingip_port_details test as unstable
16:43:54 <slaweq> and the second problem which we have had for some time already is the intermittent ssh failures in various tempest tests
16:44:03 <slaweq> example: http://logs.openstack.org/46/638646/14/check/neutron-tempest-plugin-scenario-linuxbridge/1ff70f5/testr_results.html.gz
16:44:11 <slaweq> here it is the linuxbridge job
16:44:26 <slaweq> but it may happen in any tempest job in fact
16:44:41 <slaweq> I didn't see any "pattern" in those failures
16:45:02 <mlavalle> do we have a logstash query for them?
16:45:15 <slaweq> the only thing which I think is quite common is that those instances can connect to the metadata service but ssh to the FIP is not working
16:45:24 <slaweq> mlavalle: no, I don't have a logstash query for that
16:45:36 <slaweq> and I didn't report a bug for this either
16:45:53 <slaweq> I will report it to be able to track progress there
16:45:57 <mlavalle> if we created a logstash query and attached it to a bug report, we might cooperate in trying to isolate the issue
16:46:05 <slaweq> and I will also try to prepare some query
16:46:14 <slaweq> mlavalle++ I will do
16:46:25 <mlavalle> slaweq: I'll help
16:46:42 <slaweq> #action slaweq to report a bug related to intermittent ssh failures in various tests
16:46:45 <slaweq> thx mlavalle
16:47:09 <slaweq> and that's all from my side for today
16:47:40 <slaweq> anything else You want to talk about regarding the scenario jobs or anything else related to CI?
16:47:47 <slaweq> #topic Open discussion
16:48:29 <bcafarel> quick comment on stable branches, the ocata branch gets -1 on recent backport candidates
16:48:37 <bcafarel> https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:stable/ocata
16:49:04 <bcafarel> I need to check if there's an easy fix, or if it's just getting old
16:49:17 <mlavalle> probably the latter
16:50:03 <bcafarel> the pike branch needs some rechecks from time to time, but not as bad
16:50:12 <slaweq> bcafarel: looking at one random test failure: http://logs.openstack.org/51/646651/1/check/openstack-tox-cover/2a21a99/job-output.txt.gz#_2019-03-25_10_50_42_620773
16:50:21 <slaweq> first question is: why is it tested on bionic?
16:51:05 <slaweq> and it's not a job defined in our repo I think
16:51:27 <slaweq> so maybe we should ask e.g. gmann about it
16:51:36 <slaweq> or someone else from the qa team
16:51:47 <bcafarel> oooh very good point, so that would mostly be fallout from the bionic switch?
16:52:03 <slaweq> I just checked this one job for now :)
16:52:09 <slaweq> I don't know about other failures
16:52:36 <gmann> ocata should not do bionic. i will check the job
16:52:49 <bcafarel> gmann++ thanks
16:52:50 <slaweq> but yeah, looking at 3 other patches, they also failed on the openstack-tox-cover job
16:53:01 <slaweq> hi gmann
16:53:06 <slaweq> thx for helping with it :)
16:54:37 <slaweq> and thx bcafarel for taking care of those stable branches :)
16:55:12 <slaweq> ok, anything else for today?
16:55:24 <slaweq> if not I think we can finish a few minutes early :)
16:55:45 <bcafarel> sounds good :)
16:55:58 <mlavalle> o/
16:56:06 <slaweq> ok, thx for attending the meeting
16:56:06 <njohnston> +1 thanks!
16:56:09 <slaweq> #endmeeting