16:00:11 <slaweq> #startmeeting neutron_ci
16:00:11 <openstack> Meeting started Tue Oct 30 16:00:11 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:15 <openstack> The meeting name has been set to 'neutron_ci'
16:00:19 <slaweq> hello everyone
16:00:43 <slaweq> hi mlavalle :)
16:00:50 <mlavalle> o/
16:00:54 <mlavalle> am I late?
16:01:04 <slaweq> no, You are first actually
16:01:14 <mlavalle> if yes, blame frickler and bcafarel
16:01:15 <slaweq> (except me who started this meeting)
16:01:29 <slaweq> I know, it's always bcafarel's fault :P
16:01:56 <slaweq> haleyb, njohnston: hongbin: manjeets: CI meeting - are You around?
16:01:59 <bcafarel> not this time, I even held off on a question I have for mlavalle :p
16:02:11 <njohnston> o/
16:02:11 <bcafarel> and o/ btw
16:02:15 <manjeets> o/
16:02:28 * njohnston was lurking
16:02:42 <slaweq> ok, let's start then
16:02:47 <slaweq> #topic Actions from previous meetings
16:02:56 <slaweq> slaweq to continue checking how jobs will run on Bionic nodes
16:03:10 <slaweq> I was checking it a bit
16:03:34 <slaweq> I created etherpad https://etherpad.openstack.org/p/neutron_ci_on_bionic which I want to use to track the progress
16:04:01 * manjeets is in other meeting
16:04:10 <njohnston> +1 for the etherpad
16:04:11 <slaweq> and also I want to have a separate topic about this in today's meeting, so let's talk about it later, ok for You?
16:05:09 <slaweq> I take it as "yes" :)
16:05:14 <slaweq> next action then:
16:05:16 <slaweq> mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:05:33 <mlavalle> you guys are not going to believe this.....
16:05:48 <mlavalle> no hits over the past seven days, as of yesterday
16:05:49 <manjeets> slaweq, so all the jobs mentioned in the etherpad should ideally use Ubuntu Bionic?
16:06:02 <njohnston> mlavalle: really?  wow.
16:06:27 <mlavalle> This is the query I'm using http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22line%20143,%20in%20test_trunk_subport_lifecycle%5C%22&from=7d
16:06:49 <slaweq> mlavalle: are You sure?
16:06:58 <slaweq> I found something like http://logs.openstack.org/59/596959/6/check/neutron-tempest-plugin-dvr-multinode-scenario/fbc011b/testr_results.html.gz from yesterday for example
16:07:16 <slaweq> isn't it issue like You were looking for?
16:09:02 <mlavalle> yeah, it's the same issue
16:09:14 <mlavalle> the query is not catching it
16:09:41 <mlavalle> ah, you know why?
16:09:46 <mlavalle> it's the line number
16:10:03 <mlavalle> I should remove the line number from the query
16:10:47 <mlavalle> I took the query that slaweq left originally in the bug (which includes the line number) and just added the 7 days range
16:10:48 <slaweq> :)
16:11:01 <slaweq> sorry for that then
16:11:16 <mlavalle> in that case....
16:11:21 <mlavalle> I'll go back to it
16:11:27 <mlavalle> LOL
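A minimal sketch of the broadened query discussed above, assuming the same logstash message-filter syntax as the original query (only the line number is dropped so the match survives line-number shifts):

    message:"in test_trunk_subport_lifecycle"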
16:11:32 <slaweq> ok, I will assign it to You again :)
16:11:39 <slaweq> mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:11:45 <slaweq> #action mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:12:01 <slaweq> thx mlavalle for taking care of it
16:12:04 <slaweq> next one was:
16:12:07 <slaweq> slaweq to check if failing test_ha_router_namespace_has_ipv6_forwarding_disabled is related to bug https://bugs.launchpad.net/neutron/+bug/1798475
16:12:07 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:12:26 <slaweq> and I totally forgot about this one as I didn't create a card on my Trello for it :/
16:12:28 <slaweq> sorry for that
16:12:39 <slaweq> I will assign it to myself for next week then
16:12:46 <slaweq> #action slaweq to check if failing test_ha_router_namespace_has_ipv6_forwarding_disabled is related to bug https://bugs.launchpad.net/neutron/+bug/1798475
16:13:07 <slaweq> ok, next one
16:13:10 <slaweq> slaweq to increase neutron-tempest jobs timeouts
16:13:24 <slaweq> patch is merged already https://review.openstack.org/#/c/612809/
16:14:17 <slaweq> did You see any new failures because of timeouts in those jobs in the last few days?
16:14:51 <njohnston> I haven't seen anything going into TIMED_OUT state at least
16:15:20 <njohnston> in general things look healthier
16:15:45 <slaweq> njohnston: it wasn't TIMED_OUT, it was usually FAILED
16:15:56 <slaweq> but in job-output.txt.gz there was info about timeout
16:16:21 <njohnston> OK, I haven't seen any of those lately but I haven't tried looking systematically
16:19:12 <slaweq> I don't see anything like that in logstash in the last few days so it should be better IMO
16:19:23 <slaweq> and let's just keep checking, as njohnston said :)
16:19:44 <slaweq> ok, so that's all from last week
16:19:57 <slaweq> #topic Python 3
16:20:12 <slaweq> let's talk about switching CI jobs to python3
16:20:26 <slaweq> njohnston: I think You are most up to date with it
16:20:51 <njohnston> I've been working on the change to the neutron-fullstack job we discussed previously: https://review.openstack.org/604749
16:21:52 <njohnston> my goal for this week is to start going through the jobs and adding python3, either by changing their ancestor zuul template or just by adding USE_PYTHON3, depending on the job
16:21:54 * bcafarel actually reading through it, as zuul seems happy now
16:22:15 <njohnston> I noticed that there are differences in the zuul templates - for example it matters whether you base things off of "tempest-full" or "tempest-full-py3"
16:22:25 <slaweq> ok, but didn't we agree in Denver that we should just switch all jobs to py3 and leave only UT and functional with py27 too?
16:23:04 <njohnston> sorry by "adding python3" I meant adding it to existing jobs to convert them to py3 jobs
16:23:14 <slaweq> ok :)
16:23:25 <slaweq> basically what I think is that we should do:
16:23:31 <slaweq> 1. etherpad to track progress
16:23:56 <njohnston> So 'tempest-full' will get more tests run than if you base off of 'tempest-full-py3' because the latter has things like object store turned off, which disables object tests - https://git.openstack.org/cgit/openstack/tempest/tree/.zuul.yaml#n114
16:24:01 <slaweq> 2. rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:24:16 <njohnston> slaweq: Will do, I'll get an etherpad together today
16:24:19 <slaweq> 3. Start switching other jobs to py3 as we decided
16:24:29 <slaweq> what do You think about it mlavalle and njohnston ?
16:24:44 <mlavalle> sounds like a good plan
16:25:21 <slaweq> and IMO we shouldn't do python35/36/37 jobs as it may depend on the OS on which the job is running
16:25:47 <slaweq> so we should just have our jobs, e.g. neutron-fullstack, run using python3
16:26:04 <njohnston> yes
16:26:19 <slaweq> mlavalle: ok for You?
16:26:21 <njohnston> we may have unit tests for the subversions like in Zane's proposal
16:26:27 <njohnston> but not for all the flavors of testing
16:26:41 <mlavalle> yes
16:26:53 <slaweq> yes, UT are fine and we already have different versions but we shouldn't do it for all other jobs
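A minimal sketch of what switching one existing devstack-based job to python3 could look like in zuul, under the assumption that the job consumes devstack_localrc; the job name is illustrative, not an actual neutron job definition, and the real change depends on each job's ancestor template as njohnston noted:

    - job:
        name: some-neutron-scenario-job        # illustrative name only
        vars:
          devstack_localrc:
            USE_PYTHON3: true                  # run devstack-deployed services under python3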
16:27:00 <mlavalle> njohnston: when you say Zane's proposal, you refer to the message in the ML?
16:27:50 <njohnston> it's now a governance change: https://review.openstack.org/#/c/613145/
16:28:15 <njohnston> I encourage everyone to read it and give feedback
16:28:18 <njohnston> "Resolution on keeping up with Python 3 releases"
16:29:22 <mlavalle> yeah, about the same thing we stated in the message
16:29:39 <mlavalle> so, yes, overall, I'm in agreement
16:30:03 <slaweq> great :) njohnston will You do etherpad to track it?
16:31:46 <njohnston> already started
16:31:49 <slaweq> thx
16:32:14 <slaweq> and do You want to propose this change for functional tests? or do You want me to do it?
16:32:51 <njohnston> I'll do it
16:32:57 <slaweq> thx
16:33:17 <njohnston> #action njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:33:18 <slaweq> #action njohnston to create new neutron-functional-python27 job and switch existing one to python3
16:33:25 <njohnston> #undo
16:33:47 <njohnston> I think that undid mine... we'll have to see :-)
16:33:54 <slaweq> ahh, sorry :)
16:33:56 <njohnston> #action njohnston make py3 etherpad
16:34:07 <slaweq> ok, I will remember only one of them ;)
16:34:47 <slaweq> so next week we will be able to check how it's going and continue this work
16:35:04 <slaweq> do You have anything else related to python3 to talk about?
16:35:23 <njohnston> nope, thanks
16:35:32 <slaweq> ok, so lets move on
16:35:34 <slaweq> next topic
16:35:41 <slaweq> #topic Ubuntu Bionic in CI jobs
16:36:13 <slaweq> as I said, etherpad for it is created https://etherpad.openstack.org/p/neutron_ci_on_bionic
16:36:49 <slaweq> today I sent a patch for neutron-tempest-plugin: https://review.openstack.org/#/c/614216/ to switch our jobs to Bionic
16:36:58 <slaweq> it's a DNM patch for now as I want to check how it will go
16:37:06 <slaweq> ahh, and one most important thing
16:37:29 <slaweq> there is devstack patch https://review.openstack.org/#/c/610977/ which adds Bionic nodesets which we can use
16:38:02 <slaweq> in the etherpad I listed which nodeset should be replaced with which new one (but it may not be perfect yet)
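A minimal sketch of the kind of nodeset swap tracked in the etherpad, assuming a Bionic nodeset name along the lines of what the devstack patch above adds (the exact nodeset names are defined in that patch):

    - job:
        name: some-neutron-job                     # illustrative name only
        nodeset: openstack-single-node-bionic      # assumed Bionic nodeset from the devstack patch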
16:38:35 <slaweq> I also did some DNM patch https://review.openstack.org/#/c/610997/
16:39:02 <slaweq> it looks like for many jobs we are fine and they work well on Bionic
16:39:22 <slaweq> there are some small issues with fullstack jobs for example, but it's nothing really big
16:40:04 <slaweq> and now the question is: how do You want to perform that switch? should we first switch all jobs to python3 and then to Bionic? or do it "in parallel"?
16:40:12 <slaweq> or Bionic first and then python3?
16:40:18 <slaweq> any thoughts?
16:40:51 <mlavalle> I like keeping things simple
16:40:58 <mlavalle> one change at a time
16:41:14 <njohnston> agreed
16:41:33 <mlavalle> if possible, let's convert a job to python3
16:41:37 <njohnston> by the way the problematic part for fullstack - where ovs is compiled - can be discarded when we move to bionic
16:41:45 <mlavalle> and then we move that to bionic
16:42:16 <slaweq> that was also my idea :) let's move to python3 first and then start switching to Bionic
16:42:31 <njohnston> do we know if grenade is ready for py3 yet?
16:42:44 <mlavalle> no clue
16:42:45 <slaweq> I don't know
16:43:14 <njohnston> #action njohnston check if grenade is ready for py3
16:43:21 <slaweq> thx njohnston :)
16:43:41 <njohnston> I mean we'll discover it if our changes fail for grenade jobs, but it'd be nice to find out the plan
16:44:24 <slaweq> njohnston: I agree
16:44:48 <slaweq> ok, so I guess we can move on to next topic now
16:44:51 <slaweq> #topic Grafana
16:44:59 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:46:06 <slaweq> fullstack is at about a 30% failure rate now
16:46:20 <slaweq> but it is related to this issue with stopping processes IMHO
16:46:38 <slaweq> https://bugs.launchpad.net/neutron/+bug/1798472
16:46:39 <openstack> Launchpad bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed]
16:46:45 <slaweq> and to https://bugs.launchpad.net/neutron/+bug/1798475
16:46:45 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:47:21 <slaweq> and both aren't assigned yet
16:47:41 <slaweq> I will try to check at least one of them if I have some time this week
16:48:28 <slaweq> in functional tests the db_migration test failure is happening less often I think
16:48:28 <mlavalle> I can try to help with the other one
16:48:34 <slaweq> thx mlavalle
16:48:36 <mlavalle> contingent on time availability
16:48:44 <mlavalle> which one do you want slaweq?
16:49:02 <slaweq> so I will take https://bugs.launchpad.net/neutron/+bug/1798472
16:49:02 <openstack> Launchpad bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed]
16:49:28 <slaweq> #action slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:49:44 <mlavalle> ok
16:50:22 <mlavalle> #action mlavalle to check bug 1798475
16:50:22 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:51:10 <slaweq> getting back to functional tests, db migration test failures are less frequent after I increased their timeout to 600 seconds, but unfortunately I saw at least once that it happened even with such a timeout:     * http://logs.openstack.org/31/613231/1/gate/neutron-functional/441128f/job-output.txt.gz#_2018-10-25_18_54_53_992690
16:51:39 <slaweq> so I think that it may be something different than only a slow node
16:52:28 <slaweq> let's observe it for a few more days and if You spot it again, please add it to the bug report: https://bugs.launchpad.net/neutron/+bug/1687027
16:52:28 <openstack> Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:54:15 <mlavalle> ack
16:54:32 <slaweq> and that's all from fullstack/functional tests
16:54:39 <slaweq> anything else You want to add?
16:54:46 <mlavalle> no, thanks
16:55:05 <slaweq> ok, so let's now talk about
16:55:07 <slaweq> #topic Periodic
16:55:37 <slaweq> from grafana I see that openstack-tox-py35-with-oslo-master has been failing constantly for a few days
16:55:55 <slaweq> example from today: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-oslo-master/7ae4398/testr_results.html.gz
16:57:01 <slaweq> anyone wants to fix that?
16:57:07 <slaweq> or I should take it?
16:57:17 <mlavalle> I don't have time this week
16:57:26 <slaweq> ok, I will check it
16:57:43 <slaweq> #action slaweq to check issue with openstack-tox-py35-with-oslo-master periodic job
16:58:11 <slaweq> ok, so that's all from my side for today
16:58:26 <slaweq> anything else You want to add/ask maybe?
16:58:32 <mlavalle> not from me
16:59:00 <slaweq> ok
16:59:04 <slaweq> thx for attending
16:59:07 <mlavalle> o/
16:59:09 <slaweq> #endmeeting