16:00:11 <slaweq> #startmeeting neutron_ci
16:00:11 <openstack> Meeting started Tue Oct 30 16:00:11 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:15 <openstack> The meeting name has been set to 'neutron_ci'
16:00:19 <slaweq> hello everyone
16:00:43 <slaweq> hi mlavalle :)
16:00:50 <mlavalle> o/
16:00:54 <mlavalle> am I late?
16:01:04 <slaweq> no, You are first actually
16:01:14 <mlavalle> if yes, blame frickler and bcafarel
16:01:15 <slaweq> (except me who started this meeting)
16:01:29 <slaweq> I know, it's always bcafarel's fault :P
16:01:56 <slaweq> haleyb, njohnston: hongbin: manjeets: CI meeting - are You around?
16:01:59 <bcafarel> not this time, I even held off on a question I have for mlavalle :p
16:02:11 <njohnston> o/
16:02:11 <bcafarel> and o/ btw
16:02:15 <manjeets> o/
16:02:28 * njohnston was lurking
16:02:42 <slaweq> ok, let's start then
16:02:47 <slaweq> #topic Actions from previous meetings
16:02:56 <slaweq> slaweq to continue checking how jobs will run on Bionic nodes
16:03:10 <slaweq> I've been checking it a bit
16:03:34 <slaweq> I created an etherpad https://etherpad.openstack.org/p/neutron_ci_on_bionic which I want to use to track the progress
16:04:01 * manjeets is in another meeting
16:04:10 <njohnston> +1 for the etherpad
16:04:11 <slaweq> and I also want to have a separate topic about this in today's meeting, so let's talk about it later, ok for You?
16:05:09 <slaweq> I take it as "yes" :)
16:05:14 <slaweq> next action then:
16:05:16 <slaweq> mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:05:33 <mlavalle> you guys are not going to believe this.....
16:05:48 <mlavalle> no hits over the past seven days since yesterday
16:05:49 <manjeets> slaweq, so all the jobs mentioned in the etherpad should ideally use ubuntu bionic?
16:06:02 <njohnston> mlavalle: really? wow.
16:06:27 <mlavalle> This is the query I'm using http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22line%20143,%20in%20test_trunk_subport_lifecycle%5C%22&from=7d
16:06:49 <slaweq> mlavalle: are You sure?
16:06:58 <slaweq> I found something like http://logs.openstack.org/59/596959/6/check/neutron-tempest-plugin-dvr-multinode-scenario/fbc011b/testr_results.html.gz from yesterday for example
16:07:16 <slaweq> isn't that the issue You were looking for?
16:09:02 <mlavalle> yeah, it's the same issue
16:09:14 <mlavalle> the query is not catching it
16:09:41 <mlavalle> ah, you know why?
16:09:46 <mlavalle> it's the line number
16:10:03 <mlavalle> I should remove the line number from the query
16:10:47 <mlavalle> I took the query that slaweq left originally in the bug (that includes the line number) and just added the 7 days
16:10:48 <slaweq> :)
16:11:01 <slaweq> sorry for that then
16:11:16 <mlavalle> in that case....
16:11:21 <mlavalle> I'll go back to it
16:11:27 <mlavalle> LOL
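For reference, the fix mlavalle describes is just dropping the line number from the query slaweq originally left in the bug, i.e. searching logstash for message:"in test_trunk_subport_lifecycle" over the same 7-day window instead of message:"line 143, in test_trunk_subport_lifecycle". That adjusted query is only a sketch based on the URL pasted above; it has not been run here.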
16:11:32 <slaweq> ok, I will assign it to You again :)
16:11:39 <slaweq> mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:11:45 <slaweq> #action mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:12:01 <slaweq> thx mlavalle for taking care of it
16:12:04 <slaweq> next one was:
16:12:07 <slaweq> slaweq to check if failing test_ha_router_namespace_has_ipv6_forwarding_disabled is related to bug https://bugs.launchpad.net/neutron/+bug/1798475
16:12:07 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:12:26 <slaweq> and I totally forgot about this one as I didn't create a card on my trello for it :/
16:12:28 <slaweq> sorry for that
16:12:39 <slaweq> I will assign it to myself for next week then
16:12:46 <slaweq> #action slaweq to check if failing test_ha_router_namespace_has_ipv6_forwarding_disabled is related to bug https://bugs.launchpad.net/neutron/+bug/1798475
16:13:07 <slaweq> ok, next one
16:13:10 <slaweq> slaweq to increase neutron-tempest jobs timeouts
16:13:24 <slaweq> patch is merged already https://review.openstack.org/#/c/612809/
16:14:17 <slaweq> did You see any new failures because of timeouts in those jobs in the last few days?
16:14:51 <njohnston> I haven't seen anything going into TIMED_OUT state at least
16:15:20 <njohnston> in general things look healthier
16:15:45 <slaweq> njohnston: it wasn't TIMED_OUT, it was usually FAILED
16:15:56 <slaweq> but in job-output.txt.gz there was info about a timeout
16:16:21 <njohnston> OK, I haven't seen any of those lately but I haven't tried looking systematically
16:19:12 <slaweq> I don't see anything like that in logstash in the last few days so it should be better IMO
16:19:23 <slaweq> and let's just keep checking, as njohnston said :)
16:19:44 <slaweq> ok, so that's all from last week
16:19:57 <slaweq> #topic Python 3
16:20:12 <slaweq> let's talk about switching CI jobs to python3
16:20:26 <slaweq> njohnston: I think You are most up to date with it
16:20:51 <njohnston> I've been working on the change to the neutron-fullstack job we discussed previously: https://review.openstack.org/604749
16:21:52 <njohnston> my goal for this week is to start going through the jobs and adding python3, either by changing their ancestor zuul template or just by adding USE_PYTHON3 depending
16:21:54 * bcafarel actually reading through it, as zuul seems happy now
16:22:15 <njohnston> I noticed that there are differences in the zuul templates - for example if you base things off of "tempest-full" you
16:22:25 <slaweq> ok, but didn't we agree in Denver that we should just switch all jobs to py3 and leave only UT and functional with py27?
16:23:04 <njohnston> sorry, by "adding python3" I meant adding it to existing jobs to convert them to py3 jobs
16:23:14 <slaweq> ok :)
16:23:25 <slaweq> basically, here's what I think we should do:
16:23:31 <slaweq> 1. etherpad to track progress
16:23:56 <njohnston> So 'tempest-full' will get more tests run than if you base off of 'tempest-full-py3' because the latter has things like object store turned off, which disables object tests - https://git.openstack.org/cgit/openstack/tempest/tree/.zuul.yaml#n114
16:24:01 <slaweq> 2. rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:24:16 <njohnston> slaweq: Will do, I'll get an etherpad together today
16:24:19 <slaweq> 3. Start switching other jobs to py3 as we decided
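For illustration, a minimal sketch of what step 2 could look like in neutron's zuul configuration, assuming the two jobs share a common base and that the USE_PYTHON3 devstack flag njohnston mentioned above is the switch; the real change may plumb this differently:

# Hypothetical sketch only: the parent job and variable plumbing are
# assumptions, not the actual change njohnston will propose.
- job:
    name: neutron-functional-python27
    parent: neutron-functional-base      # assumed common base job
    vars:
      devstack_localrc:
        USE_PYTHON3: false               # keep the current py27 behaviour under the new name

- job:
    name: neutron-functional
    parent: neutron-functional-base      # assumed common base job
    vars:
      devstack_localrc:
        USE_PYTHON3: true                # the existing job name switches to python3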
16:24:29 <slaweq> what do You think about it mlavalle and njohnston ?
16:24:44 <mlavalle> sounds like a good plan
16:25:21 <slaweq> and IMO we shouldn't do python35/36/37 jobs as it may depend on the OS on which the job is running
16:25:47 <slaweq> so we should just have our jobs, like e.g. neutron-fullstack, run using python3
16:26:04 <njohnston> yes
16:26:19 <slaweq> mlavalle: ok for You?
16:26:21 <njohnston> we may have unit tests for the subversions like in Zane's proposal
16:26:27 <njohnston> but not for all the flavors of testing
16:26:41 <mlavalle> yes
16:26:53 <slaweq> yes, UT are fine and we already have different versions but we shouldn't do it for all other jobs
16:27:00 <mlavalle> njohnston: when you say Zane's proposal, you refer to the message in the ML?
16:27:50 <njohnston> it's now a governance change: https://review.openstack.org/#/c/613145/
16:28:15 <njohnston> I encourage everyone to read it and give feedback
16:28:18 <njohnston> "Resolution on keeping up with Python 3 releases"
16:29:22 <mlavalle> yeah, about the same thing we stated in the message
16:29:39 <mlavalle> so, yes, overall, I'm in agreement
16:30:03 <slaweq> great :) njohnston will You do the etherpad to track it?
16:31:46 <njohnston> already started
16:31:49 <slaweq> thx
16:32:14 <slaweq> and do You want to propose this change for functional tests? or do You want me to do it?
16:32:51 <njohnston> I'll do it
16:32:57 <slaweq> thx
16:33:17 <njohnston> #action njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:33:18 <slaweq> #action njohnston to create new neutron-functional-python27 job and switch existing one to python3
16:33:25 <njohnston> #undo
16:33:47 <njohnston> I think that undid mine... we'll have to see :-)
16:33:54 <slaweq> ahh, sorry :)
16:33:56 <njohnston> #action njohnston make py3 etherpad
16:34:07 <slaweq> ok, I will remember only one of them ;)
16:34:47 <slaweq> so next week we will be able to check how it's going and continue this work
16:35:04 <slaweq> do You have anything else related to python3 to talk about?
16:35:23 <njohnston> nope, thanks
16:35:32 <slaweq> ok, so let's move on
16:35:34 <slaweq> next topic
16:35:41 <slaweq> #topic Ubuntu Bionic in CI jobs
16:36:13 <slaweq> as I said, the etherpad for it is created: https://etherpad.openstack.org/p/neutron_ci_on_bionic
16:36:49 <slaweq> today I sent a patch for neutron-tempest-plugin: https://review.openstack.org/#/c/614216/ to switch our jobs to Bionic
16:36:58 <slaweq> it's a DNM patch for now as I want to check how it will go
16:37:06 <slaweq> ahh, and one more important thing
16:37:29 <slaweq> there is a devstack patch https://review.openstack.org/#/c/610977/ which adds Bionic nodesets which we can use
16:38:02 <slaweq> in the etherpad I listed which nodeset should be replaced with which new one (but it may not be perfect yet)
16:38:35 <slaweq> I also did a DNM patch https://review.openstack.org/#/c/610997/
16:39:02 <slaweq> it looks like for many jobs we are fine and they work well on Bionic
16:39:22 <slaweq> there are some small issues with fullstack jobs for example but it's not something really big
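As an illustration of what the per-job part of the Bionic switch might look like, a rough sketch assuming the nodeset names added by the in-review devstack patch above; the job names are just examples of the single-node/multinode cases, not the final list from the etherpad:

# Sketch only: the bionic nodeset names come from the in-review devstack
# patch and may still change before it merges.
- job:
    name: neutron-tempest-plugin-dvr-multinode-scenario
    nodeset: openstack-two-node-bionic      # multinode jobs move to the two-node bionic nodeset

- job:
    name: neutron-tempest-plugin-api
    nodeset: openstack-single-node-bionic   # single-node jobs swap in the single-node bionic nodeset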
16:40:04 <slaweq> and now the question is: how do You want to perform that switch? should we first switch all jobs to python3 and then to Bionic? or do it "in parallel"?
16:40:12 <slaweq> or Bionic first and then python3?
16:40:18 <slaweq> any thoughts?
16:40:51 <mlavalle> I like keeping things simple
16:40:58 <mlavalle> one change at a time
16:41:14 <njohnston> agreed
16:41:33 <mlavalle> if possible, let's convert a job to python3
16:41:37 <njohnston> by the way the problematic part for fullstack - where ovs is compiled - can be discarded when we move to bionic
16:41:45 <mlavalle> and then we move that to bionic
16:42:16 <slaweq> that was also my idea :) let's move to python3 first and then start switching to Bionic
16:42:31 <njohnston> do we know if grenade is ready for py3 yet?
16:42:44 <mlavalle> no clue
16:42:45 <slaweq> I don't know
16:43:14 <njohnston> #action njohnston check if grenade is ready for py3
16:43:21 <slaweq> thx njohnston :)
16:43:41 <njohnston> I mean we'll discover it if our changes fail for grenade jobs, but it'd be nice to find out the plan
16:44:24 <slaweq> njohnston: I agree
16:44:48 <slaweq> ok, so I guess we can move on to the next topic now
16:44:51 <slaweq> #topic Grafana
16:44:59 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:46:06 <slaweq> fullstack is at about a 30% failure rate now
16:46:20 <slaweq> but it is related to this issue with stopping processes IMHO
16:46:38 <slaweq> https://bugs.launchpad.net/neutron/+bug/1798472
16:46:39 <openstack> Launchpad bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed]
16:46:45 <slaweq> and to https://bugs.launchpad.net/neutron/+bug/1798475
16:46:45 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:47:21 <slaweq> and both aren't assigned yet
16:47:41 <slaweq> I will try to check at least one of them if I have some time this week
16:48:28 <slaweq> in functional tests the db_migration test failure is happening less often I think
16:48:28 <mlavalle> I can try to help with the other one
16:48:34 <slaweq> thx mlavalle
16:48:36 <mlavalle> contingent on time availability
16:48:44 <mlavalle> which one do you want slaweq?
16:49:02 <slaweq> so I will take https://bugs.launchpad.net/neutron/+bug/1798472
16:49:02 <openstack> Launchpad bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed]
16:49:28 <slaweq> #action slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:49:44 <mlavalle> ok
16:50:22 <mlavalle> #action mlavalle to check bug 1798475
16:50:22 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:51:10 <slaweq> getting back to functional tests, db migration test failures are less frequent after I increased their timeout to 600 seconds, but unfortunately I saw at least once that it happened even with such a timeout: http://logs.openstack.org/31/613231/1/gate/neutron-functional/441128f/job-output.txt.gz#_2018-10-25_18_54_53_992690
16:51:39 <slaweq> so I think that it may be something other than just a slow node
16:52:28 <slaweq> let's observe it for a few more days and if You spot it again, please add it to the bug report: https://bugs.launchpad.net/neutron/+bug/1687027
16:52:28 <openstack> Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:54:15 <mlavalle> ack
16:54:32 <slaweq> and that's all from fullstack/functional tests
16:54:39 <slaweq> anything else You want to add?
16:54:46 <mlavalle> no, thanks
16:55:05 <slaweq> ok, so let's now talk about
16:55:07 <slaweq> #topic Periodic
16:55:37 <slaweq> from grafana I see that openstack-tox-py35-with-oslo-master has been failing constantly for a few days
16:55:55 <slaweq> example from today: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-oslo-master/7ae4398/testr_results.html.gz
16:57:01 <slaweq> does anyone want to fix that?
16:57:07 <slaweq> or should I take it?
16:57:17 <mlavalle> I don't have time this week
16:57:26 <slaweq> ok, I will check it
16:57:43 <slaweq> #action slaweq to check issue with openstack-tox-py35-with-oslo-master periodic job
16:58:11 <slaweq> ok, so that's all from my side for today
16:58:26 <slaweq> anything else You want to add/ask maybe?
16:58:32 <mlavalle> not from me
16:59:00 <slaweq> ok
16:59:04 <slaweq> thx for attending
16:59:07 <mlavalle> o/
16:59:09 <slaweq> #endmeeting