16:00:43 <ihrachys> #startmeeting neutron_ci
16:00:43 <openstack> Meeting started Tue Dec  5 16:00:43 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <openstack> The meeting name has been set to 'neutron_ci'
16:00:50 <ihrachys> what's up!
16:00:51 <jlibosva> o
16:00:59 <mlavalle> o/
16:01:00 * jlibosva lost his hand
16:01:03 <ihrachys> jlibosva, you came bare hands?
16:01:07 <ihrachys> :)
16:01:07 <slaweq_> hi
16:01:26 <jlibosva> :]
16:01:43 <ihrachys> #topic Actions from prev meeting
16:01:53 <ihrachys> "ihrachys to disable connectivity fullstack tests while we look for culprit"
16:01:54 <ihrachys> and
16:02:00 <ihrachys> "ihrachys to disable dscp qos fullstack test while we look for culprit"
16:02:18 <ihrachys> both merged
16:02:28 <ihrachys> https://review.openstack.org/523517 and https://review.openstack.org/523518
16:02:46 <ihrachys> I was thinking about backporting but that would require backporting the decorator first. do we want to go this way?
16:02:58 <jlibosva> I don't think it should be a priority
16:03:19 <jlibosva> I'd try to stabilize master and then think about stable branches
16:04:04 <ihrachys> ok
16:04:16 <ihrachys> next is
16:04:25 <ihrachys> "jlibosva to look at test_securitygroup(linuxbridge-iptables) failure in fullstack"
16:04:42 <ihrachys> that was some "test_securitygroup(linuxbridge-iptables) with RuntimeError: Process ['ncat', u'20.0.0.11', '3333', '-w', '20'] hasn't been spawned in 20 seconds" failure
16:04:48 <jlibosva> I need more time for both. There was no obvious failure so I'm trying to reproduce
16:04:56 <ihrachys> ack
16:05:02 <ihrachys> I assume second you mean "jlibosva to look at test_securitygroup(linuxbridge-iptables) failure in fullstack"
16:05:04 <jlibosva> the reason it wasn't spawned might be that it uses tcp connection and that couldn't be established
16:05:16 <ihrachys> eh
16:05:18 <ihrachys> "jlibosva to look at test_l2_agent_restart fullstack failure"
16:05:49 <ihrachys> ok next was "ihrachys to disable legacy jobs for neutron master"
16:05:57 <ihrachys> that's for tempest api / scenario jobs
16:05:58 <jlibosva> I briefly looked at that one too but will still need more time. I suspect that 20 seconds might not be enough to restart all agents and come back
16:06:13 <jlibosva> in previous debugging, we saw it took 30 seconds to just start an agent, so it could be a clue
16:06:42 <ihrachys> ack
16:06:51 <ihrachys> for legacy jobs, we merged this: https://review.openstack.org/523514
16:06:56 <ihrachys> disabling them for master
16:07:13 <ihrachys> and we even landed the patch cleaning up tempest in-tree test classes (we will discuss that later), so it works fine
16:07:28 <ihrachys> those were all action items we had so far
16:07:33 <jlibosva> regarding that
16:07:41 <jlibosva> it seems we still have a legacy api job in gate Q
16:07:53 <ihrachys> hm... where?
16:08:19 <jlibosva> don't we need this as well https://review.openstack.org/#/c/521349/ ?
16:08:45 <ihrachys> jlibosva, that would merge after we move the jobs into stable branches
16:08:49 <jlibosva> at the gate report from removing the in-tree tempest but if you want to talk about that later, we can discuss that later
16:08:50 <ihrachys> Miguel is working on it
16:08:53 <jlibosva> https://review.openstack.org/#/c/506672/
16:09:05 <jlibosva> aah, ok, sorry
16:09:12 <jlibosva> but still it seems the api job is there ^^
16:09:15 <ihrachys> jlibosva, oh that's ouch
16:09:22 <ihrachys> jlibosva, maybe it's still voting in gate
16:09:34 <ihrachys> I may have killed it in check queue only
16:09:37 <ihrachys> I will follow up on it
16:09:40 <jlibosva> ack
16:09:42 <jlibosva> thanks
16:09:50 <jlibosva> I don't really know what is going on :)
16:09:51 <ihrachys> #action ihrachys to make sure legacy tempest jobs are gone in gate queue
16:09:58 <ihrachys> thanks for the notice!
16:10:00 <mlavalle> ahhh, that explains why I saw it last night
16:10:30 <ihrachys> I will figure it out
16:10:30 <ihrachys> #topic Grafana
16:10:33 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:11:41 <ihrachys> I gotta wash my eyes. I can't see scenarios at 100%
16:12:22 <jlibosva> I don't see it at all
16:12:37 <ihrachys> oh I think that's because they are gone
16:12:43 <ihrachys> because of project-config change
16:12:50 <ihrachys> I forgot to update grafana
16:12:57 <ihrachys> meh, I suck
16:13:08 <ihrachys> #action ihrachys to update grafana for new non-legacy job names
16:13:23 <ihrachys> it sounded too good to be true right!
16:13:24 <ihrachys> :)
16:13:29 <mlavalle> yeah
16:13:43 <mlavalle> it's not xmas yet
16:13:52 <jlibosva> but at least you get that good feeling for a short time :)
16:13:55 <jlibosva> lol
16:14:01 <mlavalle> lol
16:14:23 <ihrachys> ok so other than that, there's fullstack, which we'll have a closer look at now
16:14:26 <ihrachys> #topic Fullstack
16:14:48 <ihrachys> fullstack goes sideways in 60-80% failure rate range
16:15:21 <ihrachys> we have quite some fullstack related bugs
16:15:22 <ihrachys> https://bugs.launchpad.net/neutron/+bugs?field.tag=fullstack&orderby=status&start=0
16:15:29 <ihrachys> 14 total
16:15:37 <ihrachys> some are not gate-failures of course
16:15:51 <ihrachys> let's walk through them
16:16:06 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1728948
16:16:06 <openstack> Launchpad bug 1728948 in neutron "fullstack: test_connectivity fails due to dhclient crash" [High,New] - Assigned to Jakub Libosvar (libosvar)
16:16:14 <ihrachys> I believe that's one of a ton of bugs that jlibosva is looking into
16:16:28 <ihrachys> I guess it should be Confirmed since we see it all the time
16:16:32 <jlibosva> I should be looking at it, I haven't yet :)
16:16:41 <ihrachys> it shouldn't affect gate now right?
16:16:43 <jlibosva> yeah, confirmed makes sense
16:17:04 <jlibosva> I don't know, I would expect it affects other tests using dhcp
16:17:29 <jlibosva> it might be an issue with the dhclient script I copied from dhclient repo
16:17:58 <slaweq_> maybe we should set all other tests to not use dhcp?
16:18:00 <jlibosva> I need to dig deeper
16:18:07 <ihrachys> but I mean, we disabled the tests with the decorator
16:18:24 <jlibosva> were those the only tests using dhcp?
16:18:27 <jlibosva> I need to check
16:18:48 <ihrachys> ok. anyway, seems like you will have more after you look closer.
16:18:52 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1733649
16:18:52 <openstack> Launchpad bug 1733649 in neutron "fullstack neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs.test_dscp_marking_packets(openflow-native) failure" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:19:11 <ihrachys> slaweq_, I think you made some progress there. can you brief us on where it stands?
16:19:17 <jlibosva> maybe test_dhcp_agent will also be affected
16:19:19 <slaweq_> yes
16:19:27 <slaweq_> so I couldn't reproduce it locally
16:19:46 <slaweq_> I pushed patch with some additional logs but it was also fine each time when I rechecked
16:20:11 <slaweq_> I was also checking logs from failed jobs and for me everything on L2 agent and server side looks fine
16:20:29 <slaweq_> I suspect that it can be an issue with tcpdump output which doesn't match the regex for some reason
16:20:43 <ihrachys> good. I think you can put it on backburner and we will wait when it triggers your log message.
16:20:46 <slaweq_> so there is already merged patch https://review.openstack.org/#/c/525156/ which adds logging of tcpdump
16:21:02 <slaweq_> sorry but what is backburner?
16:21:12 <mlavalle> leave it aside for a while
16:21:30 <mlavalle> it's an expression
16:21:36 <slaweq_> ah, ok :)
16:21:50 <ihrachys> thanks mlavalle, I should have known not to use it :)
16:21:51 <slaweq_> so yes, I will wait now for such failed test and will check it then
16:22:23 <ihrachys> there was an old https://bugs.launchpad.net/neutron/+bug/1687074 but I am not sure if it is still an issue
16:22:23 <openstack> Launchpad bug 1687074 in neutron "Sometimes ovsdb fails with "tcp:127.0.0.1:6640: error parsing stream"" [High,Confirmed]
16:22:23 <slaweq_> I will remember it now, sorry for my english :)
16:23:10 <ihrachys> jlibosva, you suspect latest ovsdbapp may have it fixed?
16:23:35 <jlibosva> I saw it very recently
16:23:47 <jlibosva> let's check logstash
16:23:49 <ihrachys> with the newest library?
16:24:08 <ihrachys> yeah I see it lately
16:24:08 <jlibosva> that's what I'm not sure
16:24:16 <jlibosva> maybe it was today :)
16:24:16 <ihrachys> in master
16:24:24 <ihrachys> this is q: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22error%20parsing%20stream%5C%22
16:25:09 <ihrachys> most of hits are successes though
16:26:34 <ihrachys> from what I understand, we get random crap from ovsdb socket there
16:26:57 <jlibosva> I can't see one in fullstack though
16:27:01 <ihrachys> it would be nice to see full contents of socket when it happens
16:27:03 <jlibosva> all the failures come from functional tests
16:27:18 <ihrachys> jlibosva, yeah that's true
16:27:20 <ihrachys> hm
16:27:29 <ihrachys> eventlet?
16:27:33 <jlibosva> but I'm 100% I saw it recently in fullstack
16:27:37 <jlibosva> *sure
16:28:51 <ihrachys> though in fullstack, we also patch test runner and agents, and functional test runner should be patched too, so it is probably the same from eventlet perspective
16:29:46 <ihrachys> otherwiseguy, are you around
16:30:43 <ihrachys> ok let's move on but otherwiseguy feel free to reply later
16:31:16 <ihrachys> I think we may need otherwiseguy on that bug since it seems rather close to ovsdbapp
16:31:31 <ihrachys> next is https://bugs.launchpad.net/neutron/+bug/1673531
16:31:31 <openstack> Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [High,Confirmed] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:31:32 <jlibosva> agreed
16:31:46 <ihrachys> I haven't made any progress since the last time here. need more time.
16:32:27 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1721796
16:32:28 <openstack> Launchpad bug 1721796 in neutron "wait_until_true is not rootwrap daemon friendly" [Medium,In progress] - Assigned to Jakub Libosvar (libosvar)
16:32:31 <ihrachys> (I skip Wishlist bugs)
16:32:43 <ihrachys> jlibosva, is that one still valid with new oslo.rootwrap?
16:32:58 * jlibosva ¯\_(ツ)_/¯
16:33:10 <jlibosva> is the new rootwrap already released, bumped and used?
16:33:14 <ihrachys> yes
16:33:20 <jlibosva> ok, I'll keep an eye on this one
16:33:22 <ihrachys> in master for sure.
16:33:45 <ihrachys> jlibosva, ok. I would close and revisit if it pops up again. :)
16:33:47 <jlibosva> problem is that there is no easy string to check
16:33:50 <jlibosva> sounds good
16:34:32 <slaweq_> as I'm checking fullstack results quite often I will also be aware of this one
16:34:59 <jlibosva> thanks
16:36:01 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1487548
16:36:01 <openstack> Launchpad bug 1487548 in neutron "fullstack infrastructure tears down processes via kill -9" [Low,In progress] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:36:08 <ihrachys> I totally forgot about that one
16:36:15 <ihrachys> I had a primitive test patch here: https://review.openstack.org/#/c/499803/
16:36:28 <ihrachys> but there is some work to do there
16:36:45 <ihrachys> making sure that if a process doesn't die gracefully we still kill it forcefully
16:37:11 <ihrachys> honestly though, I may not get on that one in next weeks. shouldn't be a high priority.
16:37:16 <ihrachys> maybe even deserves Wishlist
16:37:21 <jlibosva> agreed
16:37:26 <ihrachys> since nothing is broken
16:37:30 <slaweq_> ihrachys: I can help on this one if You want
16:37:32 <ihrachys> ok I moved to wishlist
16:37:37 <jlibosva> also I think we already have code that sends 15 and if it doesn't exit, it does 9
16:37:57 <ihrachys> slaweq_, nah I think if you have cycles on fullstack it's better to e.g. help jlibosva with two issues he already tracks
16:38:09 <ihrachys> jlibosva, not for all agents I believe
16:38:13 <slaweq_> ok
16:38:25 <jlibosva> I mean that the functionality is implemented :)
16:38:54 <ihrachys> yeah. ok, I will have another look, maybe it already works as needed.
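
The "send 15, then 9 if it doesn't exit" behaviour discussed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actual fullstack code; the helper name stop_process and the grace_period default are assumptions made for the example.

```python
import subprocess


def stop_process(proc, grace_period=5.0):
    """Stop a subprocess.Popen child gracefully, escalating if needed.

    Hypothetical helper for illustration; not taken from the neutron tree.
    """
    # Ask the process to exit: Popen.terminate() sends SIGTERM (15) on POSIX.
    proc.terminate()
    try:
        # Give the process a chance to clean up and exit on its own.
        proc.wait(timeout=grace_period)
    except subprocess.TimeoutExpired:
        # It did not exit in time: Popen.kill() sends SIGKILL (9) on POSIX.
        proc.kill()
        proc.wait()
    return proc.returncode
```

On POSIX the return code is the negated signal number when the child was killed by a signal, so a caller can tell from the result whether the graceful path was enough.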
16:40:27 <ihrachys> I think we have enough work to do not to look through a recent log
16:40:37 <jlibosva> I have one more thing regarding fullstack
16:40:41 <ihrachys> shoot
16:40:57 <jlibosva> so I saw we are getting timeouts .. I think that's cause we have way too many jobs
16:41:16 <jlibosva> one way to simplify testing matrix would be eliminating the ovsdb and openflow interfaces
16:41:42 <jlibosva> we have https://review.openstack.org/#/c/503076/
16:42:18 <ihrachys> oh well. yeah, it fell through cracks... :)
16:42:27 <jlibosva> and we can do similar to https://review.openstack.org/#/c/506713/ with openflow I think
16:42:59 <jlibosva> ihrachys: do you plan to work on that one or maybe we could find somebody to continue the work
16:43:09 <ihrachys> there was some environment setup dependency on cli that I couldn't immediately figure out before I was distracted
16:44:01 <jlibosva> I think we should also prioritize that one to q3
16:44:07 <ihrachys> having someone looking into it instead of me would definitely help :)
16:44:35 <jlibosva> iwamoto asked about status so maybe he would be interested
16:44:41 <jlibosva> also I am interested :)
16:44:59 <jlibosva> another thing
16:45:41 <jlibosva> I thought about improving fullstack by putting the environment building to class level, so env will be built for all tests in a class just once
16:45:44 <ihrachys> jlibosva, ok I commented in gerrit that IWAMOTO can take it over
16:45:50 <jlibosva> ihrachys: thanks
16:45:55 <jlibosva> that can also save some time
16:46:28 <ihrachys> like tempest?
16:46:28 <jlibosva> and last thing - it takes time to create the DB, we can have something like an empty DB dump that would fill the DB with tables instead of using alembic
16:46:34 <jlibosva> right, like tempest
16:47:08 <mlavalle> good ideas jlibosva
16:47:09 <jlibosva> the DB dump would need to be regenerated on each model changes, similarly to updating head hash
16:47:27 <ihrachys> jlibosva, sharing env may affect stability, if there is some cross dependency. but it's worth a try.
16:47:31 <jlibosva> should I file rfes against fullstack or wishlist bugs?
16:47:34 <ihrachys> jlibosva, you suggest to check in dump in git?
16:47:52 <ihrachys> jlibosva, wishlists are fine I believe; rfes are for user visible changes.
16:47:57 <jlibosva> ihrachys: yes, dump with empty tables
16:48:02 <jlibosva> ack, thanks
16:48:07 <jlibosva> I'll try to write something down
16:48:15 <jlibosva> I had this in my head for a while
16:48:42 <ihrachys> #action jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)
16:49:16 <ihrachys> let's briefly look at tempest plugin since we have little time
16:49:17 <ihrachys> #topic Tempest plugin
16:49:29 <ihrachys> https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:49:59 <ihrachys> so as I learned today, I need to clean up gate from legacy before we can land the patch removing tempest code from neutron tree
16:50:20 <ihrachys> assuming it's done, grafana is fixed, and neutron tempest code is dropped; what are next steps.
16:50:26 <ihrachys> chandankumar, ^
16:50:51 <chandankumar> ihrachys: we need to fix the plugins
16:50:56 <ihrachys> I believe one vector is moving those legacy jobs for stable branches and then removing them in project-config / openstack-zuul-jobs / ...
16:51:12 <ihrachys> chandankumar, which plugins. you mean for stadium projects?
16:51:16 <chandankumar> https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:51:21 <chandankumar> ihrachys: sorry
16:51:51 <chandankumar> ihrachys: since we have now working neutron tempest plugin, we need to think about stadium projects having intree tempest plugin
16:52:18 <ihrachys> there are two in the list - midonet and dynamic-routing
16:52:38 <ihrachys> others are not stadium and I would not waste time on fixing them. interested parties can chime in.
16:52:53 <mlavalle> how about networking-sfc?
16:52:59 <chandankumar> what about neutron-vpnaas  and dynamic routing
16:53:06 <ihrachys> vpnaas is not in stadium
16:53:14 <ihrachys> dynamic-routing, I mentioned it already
16:53:27 <ihrachys> mlavalle, it's not in the list. I assume it means it's fine. chandankumar correct?
16:53:35 <chandankumar> ihrachys: yup
16:53:48 <mlavalle> bcafarel was asking about it a few weeks ago
16:54:00 <ihrachys> chandankumar, do you plan to respin for comments, or we should find owners from subprojects to take over the patches?
16:54:20 <chandankumar> ihrachys: yup i need to respin the patches
16:54:34 <ihrachys> ok feel free to ask for help if needed
16:54:49 <ihrachys> mlavalle, as for your patches moving / removing jobs, where are we on those?
16:54:51 <chandankumar> ihrachys: just blocked on https://review.openstack.org/#/c/521346/
16:55:19 <mlavalle> ihrachys: I created this one for master: https://review.openstack.org/#/c/525345/
16:55:20 <chandankumar> if you check the last comment, we are blocked on how to install neutron tempest plugin
16:55:41 <chandankumar> so that we can run it in ci, other reviews are similar
16:55:50 <mlavalle> I am thinking that maybe I should break it in pieces
16:55:50 <chandankumar> some of them do not have devstack plugin
16:56:16 <chandankumar> mlavalle: ihrachys https://review.openstack.org/#/q/topic:switch_to_neutron_tempest_plugin+(status:open+OR+status:merged)
16:56:17 <mlavalle> one for fullstack, one for functional and maybe one for linuxbridge
16:56:22 <ihrachys> chandankumar, having it in a single place (the new tempest plugin repo) would be the best no?
16:56:32 <ihrachys> then you would just include the plugin in the job
16:56:46 <mlavalle> and one for the rest
16:56:56 <mlavalle> otherwise, the chances of a job failing are high
16:57:07 <ihrachys> mlavalle, is it because you have more work to do on some of those? if not, I think a single piece is fine.
16:57:19 <mlavalle> ok, fine
16:57:25 <ihrachys> mlavalle, oh you mean the fact that it blows up the gate size?
16:57:26 <chandankumar> ihrachys: you mean one devstack plugin for neutron tempest plugin?
16:57:36 <ihrachys> chandankumar, yes
16:57:46 <ihrachys> I think that's what yamamoto suggested
16:57:46 <chandankumar> ihrachys: sure i will look into that tomorrow
16:57:49 <mlavalle> well, those jobs tend to fail frequently
16:57:53 <ihrachys> chandankumar, thanks!
16:58:11 <mlavalle> so one single patch might be difficult to merge
16:58:18 <ihrachys> mlavalle, yeah I get your point now. some are not voting like fullstack so probably doesn't matter as much.
16:58:28 <mlavalle> ok, cool
16:58:48 <mlavalle> so next step is to create the dependent patches for the other repos
16:58:53 <ihrachys> let's split indeed. especially api / scenarios since you will need to backport the rest but not those into stable
16:58:56 <mlavalle> to remove the jobs from there
16:59:16 <ihrachys> ok
16:59:29 <mlavalle> and then backport as you mention
16:59:42 <mlavalle> that's the update in this regard
17:00:26 <ihrachys> we don't have time to discuss scenarios in detail
17:00:39 <ihrachys> I will just drop here a link to the patch that should fix linuxbridge flavor: https://review.openstack.org/#/c/523319/
17:00:45 <ihrachys> please review :)
17:00:50 <ihrachys> and... that's it for today
17:00:54 <mlavalle> I will update in channel regarding fip scenario test
17:01:13 <ihrachys> thanks folks for working on the most boring things in neutron project! you are superheroes. :)
17:01:16 <ihrachys> #endmeeting