16:00:43 <ihrachys> #startmeeting neutron_ci
16:00:43 <openstack> Meeting started Tue Dec 5 16:00:43 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <openstack> The meeting name has been set to 'neutron_ci'
16:00:50 <ihrachys> what's up!
16:00:51 <jlibosva> o
16:00:59 <mlavalle> o/
16:01:00 * jlibosva lost his hand
16:01:03 <ihrachys> jlibosva, you came bare hands?
16:01:07 <ihrachys> :)
16:01:07 <slaweq_> hi
16:01:26 <jlibosva> :]
16:01:43 <ihrachys> #topic Actions from prev meeting
16:01:53 <ihrachys> "ihrachys to disable connectivity fullstack tests while we look for culprit"
16:01:54 <ihrachys> and
16:02:00 <ihrachys> "ihrachys to disable dscp qos fullstack test while we look for culprit"
16:02:18 <ihrachys> both merged
16:02:28 <ihrachys> https://review.openstack.org/523517 and https://review.openstack.org/523518
16:02:46 <ihrachys> I was thinking about backporting but that will require to first backport the decorator. do we want to go this way?
16:02:58 <jlibosva> I don't think it should be a priority
16:03:19 <jlibosva> I'd try to stabilize master and then think about stable branches
16:04:04 <ihrachys> ok
16:04:16 <ihrachys> next is
16:04:25 <ihrachys> "jlibosva to look at test_securitygroup(linuxbridge-iptables) failure in fullstack"
16:04:42 <ihrachys> that was some "test_securitygroup(linuxbridge-iptables) with RuntimeError: Process ['ncat', u'20.0.0.11', '3333', '-w', '20'] hasn't been spawned in 20 seconds" failure
16:04:48 <jlibosva> I need more time for both. There was no obvious failure so I'm trying to reproduce
16:04:56 <ihrachys> ack
16:05:02 <ihrachys> I assume second you mean "jlibosva to look at test_securitygroup(linuxbridge-iptables) failure in fullstack"
16:05:04 <jlibosva> the reason it wasn't spawned might be that it uses tcp connection and that couldn't be established
16:05:16 <ihrachys> eh
16:05:18 <ihrachys> "jlibosva to look at test_l2_agent_restart fullstack failure"
16:05:49 <ihrachys> ok next was "ihrachys to disable legacy jobs for neutron master"
16:05:57 <ihrachys> that's for tempest api / scenario jobs
16:05:58 <jlibosva> I briefly looked at that one too but will still need more time. I suspect that 20 seconds might not be enough to restart all agents and come back
16:06:13 <jlibosva> in previous debugging, we saw it took 30 seconds to just start an agent, so it could be a clue
16:06:42 <ihrachys> ack
16:06:51 <ihrachys> for legacy jobs, we merged this: https://review.openstack.org/523514
16:06:56 <ihrachys> disabling them for master
16:07:13 <ihrachys> and we even landed the patch cleaning up tempest in-tree test classes (we will discuss that later), so it works fine
16:07:28 <ihrachys> those were all action items we had so far
16:07:33 <jlibosva> regarding that
16:07:41 <jlibosva> it seems we still have a legacy api job in gate Q
16:07:53 <ihrachys> hm... where?
16:08:19 <jlibosva> don't we need this as well https://review.openstack.org/#/c/521349/ ?
16:08:45 <ihrachys> jlibosva, that would merge after we move the jobs into stable branches
16:08:49 <jlibosva> at the gate report from removing the in-tree tempest but if you want to talk about that later, we can discuss that later
16:08:50 <ihrachys> Miguel is working on it
16:08:53 <jlibosva> https://review.openstack.org/#/c/506672/
16:09:05 <jlibosva> aah, ok, sorry
16:09:12 <jlibosva> but still it seems the api job is there ^^
16:09:15 <ihrachys> jlibosva, oh that's ouch
16:09:22 <ihrachys> jlibosva, maybe it's still voting in gate
16:09:34 <ihrachys> I may have killed it in check queue only
16:09:37 <ihrachys> I will follow up on it
16:09:40 <jlibosva> ack
16:09:42 <jlibosva> thanks
16:09:50 <jlibosva> I don't really know what is going on :)
16:09:51 <ihrachys> #action ihrachys to make sure legacy tempest jobs are gone in gate queue
16:09:58 <ihrachys> thanks for the notice!
16:10:00 <mlavalle> ahhh, that explains why I saw it last night
16:10:30 <ihrachys> I will figure it out
16:10:30 <ihrachys> #topic Grafana
16:10:33 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:11:41 <ihrachys> I gotta wash my eyes. I can't see scenarios at 100%
16:12:22 <jlibosva> I don't see it at all
16:12:37 <ihrachys> oh I think that's because they are gone
16:12:43 <ihrachys> because of project-config change
16:12:50 <ihrachys> I forgot to update grafana
16:12:57 <ihrachys> meh, I suck
16:13:08 <ihrachys> #action ihrachys to update grafana for new non-legacy job names
16:13:23 <ihrachys> it sounded too good to be true right!
16:13:24 <ihrachys> :)
16:13:29 <mlavalle> yeah
16:13:43 <mlavalle> it's not xmas yet
16:13:52 <jlibosva> but at least you get that good feeling for a short time :)
16:13:55 <jlibosva> lol
16:14:01 <mlavalle> lol
16:14:23 <ihrachys> ok so other than that, fullstack that we'll have a closer look now
16:14:26 <ihrachys> #topic Fullstack
16:14:48 <ihrachys> fullstack goes sideways in 60-80% failure rate range
16:15:21 <ihrachys> we have quite some fullstack related bugs
16:15:22 <ihrachys> https://bugs.launchpad.net/neutron/+bugs?field.tag=fullstack&orderby=status&start=0
16:15:29 <ihrachys> 14 total
16:15:37 <ihrachys> some are not gate-failures of course
16:15:51 <ihrachys> let's walk through them
16:16:06 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1728948
16:16:06 <openstack> Launchpad bug 1728948 in neutron "fullstack: test_connectivity fails due to dhclient crash" [High,New] - Assigned to Jakub Libosvar (libosvar)
16:16:14 <ihrachys> I believe that's one of ton of bugs that jlibosva is looking into
16:16:28 <ihrachys> I guess it should be Confirmed since we see it all the time
16:16:32 <jlibosva> I should be looking at it, I haven't yet :)
16:16:41 <ihrachys> it shouldn't affect gate now right?
16:16:43 <jlibosva> yeah, confirmed makes sense
16:17:04 <jlibosva> I don't know, I would expect it affects other tests using dhcp
16:17:29 <jlibosva> it might be an issue with the dhclient script I copied from dhclient repo
16:17:58 <slaweq_> maybe we should set all other tests to not use dhcp?
16:18:00 <jlibosva> I need to dig deeper
16:18:07 <ihrachys> but I mean, we disabled the tests with the decorator
16:18:24 <jlibosva> were those the only tests using dhcp?
16:18:27 <jlibosva> I need to check
16:18:48 <ihrachys> ok. anyway, seems like you will have more after you look closer.
16:18:52 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1733649
16:18:52 <openstack> Launchpad bug 1733649 in neutron "fullstack neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs.test_dscp_marking_packets(openflow-native) failure" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:19:11 <ihrachys> slaweq_, I think you made some progress there. can you brief us on where it stands?
16:19:17 <jlibosva> maybe test_dhcp_agent will also be affected
16:19:19 <slaweq_> yes
16:19:27 <slaweq_> so I couldn't reproduce it locally
16:19:46 <slaweq_> I pushed patch with some additional logs but it was also fine each time when I rechecked
16:20:11 <slaweq_> I was also checking logs from failed jobs and for me everything on L2 agent and server side looks fine
16:20:29 <slaweq_> I suspect that it can be issue with tcpdump output which don't match regex for some reason
16:20:43 <ihrachys> good. I think you can put it on backburner and we will wait when it triggers your log message.
16:20:46 <slaweq_> so there is already merged patch https://review.openstack.org/#/c/525156/ which adds logging of tcpdump
16:21:02 <slaweq_> sorry but what is backburner?
16:21:12 <mlavalle> leave it aside for a while
16:21:30 <mlavalle> it's an expression
16:21:36 <slaweq_> ah, ok :)
16:21:50 <ihrachys> thanks mlavalle, I should have known not to use it :)
16:21:51 <slaweq_> so yes, I will wait now for such failed test and will check it then
16:22:23 <ihrachys> there was an old https://bugs.launchpad.net/neutron/+bug/1687074 but I am not sure if it is still an issue
16:22:23 <openstack> Launchpad bug 1687074 in neutron "Sometimes ovsdb fails with "tcp:127.0.0.1:6640: error parsing stream"" [High,Confirmed]
16:22:23 <slaweq_> I will remember it now, sorry for my english :)
16:23:10 <ihrachys> jlibosva, you suspect latest ovsdbapp may have it fixed?
16:23:35 <jlibosva> I saw it very recently
16:23:47 <jlibosva> let's check logstash
16:23:49 <ihrachys> with the newest library?
16:24:08 <ihrachys> yeah I see it lately
16:24:08 <jlibosva> that's what I'm not sure
16:24:16 <jlibosva> maybe it was today :)
16:24:16 <ihrachys> in master
16:24:24 <ihrachys> this is q: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22error%20parsing%20stream%5C%22
16:25:09 <ihrachys> most of hits are successes though
16:26:34 <ihrachys> for what I understand, we get random crap from ovsdb socket there
16:26:57 <jlibosva> I can't see one in fullstack though
16:27:01 <ihrachys> it would be nice to see full contents of socket when it happens
16:27:03 <jlibosva> all the failures come from functional tests
16:27:18 <ihrachys> jlibosva, yeah that's true
16:27:20 <ihrachys> hm
16:27:29 <ihrachys> eventlet?
16:27:33 <jlibosva> but I'm 100% I saw it recently in fullstack
16:27:37 <jlibosva> *sure
16:28:51 <ihrachys> though in fullstack, we also patch test runner and agents, and functional test runner should be patched too, so it is probably the same from eventlet perspective
16:29:46 <ihrachys> otherwiseguy, are you around
16:30:43 <ihrachys> ok let's move on but otherwiseguy feel free to reply later
16:31:16 <ihrachys> I think we may need otherwiseguy on that bug since it seems rather close to ovsdbapp
16:31:31 <ihrachys> next is https://bugs.launchpad.net/neutron/+bug/1673531
16:31:31 <openstack> Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [High,Confirmed] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:31:32 <jlibosva> agreed
16:31:46 <ihrachys> I haven't made any progress since the last time here. need more time.
16:32:27 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1721796
16:32:28 <openstack> Launchpad bug 1721796 in neutron "wait_until_true is not rootwrap daemon friendly" [Medium,In progress] - Assigned to Jakub Libosvar (libosvar)
16:32:31 <ihrachys> (I skip Wishlist bugs)
16:32:43 <ihrachys> jlibosva, is that one still valid with new oslo.rootwrap?
16:32:58 * jlibosva ¯\_(ツ)_/¯
16:33:10 <jlibosva> is the new rootwrap already released, bumped and used?
16:33:14 <ihrachys> yes
16:33:20 <jlibosva> ok, I'll keep an eye on this one
16:33:22 <ihrachys> in master for sure.
16:33:45 <ihrachys> jlibosva, ok. I would close and revisit if it pops up again. :)
16:33:47 <jlibosva> problem is that there is no easy string to check
16:33:50 <jlibosva> sounds good
16:34:32 <slaweq_> as I'm checking fullstack results quite often I will also be aware of this one
16:34:59 <jlibosva> thanks
16:36:01 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1487548
16:36:01 <openstack> Launchpad bug 1487548 in neutron "fullstack infrastructure tears down processes via kill -9" [Low,In progress] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:36:08 <ihrachys> I totally forgot about that one
16:36:15 <ihrachys> I had a primitive test patch here: https://review.openstack.org/#/c/499803/
16:36:28 <ihrachys> but there is some work to do there
16:36:45 <ihrachys> making sure that if a process doesn't die gracefully we still kill it forcefully
16:37:11 <ihrachys> honestly though, I may not get on that one in next weeks. shouldn't be a high priority.
16:37:16 <ihrachys> maybe even deserves Wishlist
16:37:21 <jlibosva> agreed
16:37:26 <ihrachys> since nothing is broken
16:37:30 <slaweq_> ihrachys: I can help on this one if You want
16:37:32 <ihrachys> ok I moved to wishlist
16:37:37 <jlibosva> also I think we already have a code that sends 15 and if it doesn't exit, it does 9
16:37:57 <ihrachys> slaweq_, nah I think if you have cycles on fullstack it's better to e.g. help jlibosva with two issues he already tracks
16:38:09 <ihrachys> jlibosva, not for all agents I believe
16:38:13 <slaweq_> ok
16:38:25 <jlibosva> I mean that the functionality is implemented :)
16:38:54 <ihrachys> yeah. ok, I will have another look, maybe it already works as needed.
16:40:27 <ihrachys> I think we have enough work to do not to look through a recent log
16:40:37 <jlibosva> I have one more thing regarding fullstack
16:40:41 <ihrachys> shoot
16:40:57 <jlibosva> so I saw we are getting timeouts .. I think that's cause we have way too many jobs
16:41:16 <jlibosva> one way to simplify testing matrix would be eliminating the ovsdb and openflow interfaces
16:41:42 <jlibosva> we have https://review.openstack.org/#/c/503076/
16:42:18 <ihrachys> oh well. yeah, it fell through cracks... :)
16:42:27 <jlibosva> and we can do similar to https://review.openstack.org/#/c/506713/ with openflow I think
16:42:59 <jlibosva> ihrachys: do you plan to work on that one or maybe we could find somebody to continue the work
16:43:09 <ihrachys> there was some environment setup dependency on cli that I couldn't immediately figure out before I was distracted
16:44:01 <jlibosva> I think we should also prioritize that one to q3
16:44:07 <ihrachys> having someone looking into it instead of me would definitely help :)
16:44:35 <jlibosva> iwamoto asked about status so maybe he would be interested
16:44:41 <jlibosva> also I am interested :)
16:44:59 <jlibosva> another thing
16:45:41 <jlibosva> I thought about improving fullstack by putting the environment building to class level, so env will be built for all tests in a class just once
16:45:44 <ihrachys> jlibosva, ok I commented in gerrit that IWAMOTO can take it over
16:45:50 <jlibosva> ihrachys: thanks
16:45:55 <jlibosva> that can also save some time
16:46:28 <ihrachys> like tempest?
16:46:28 <jlibosva> and last thing - it takes time to create the DB, we can have something like an empty DB dump that would fill the DB with tables instead of using alembic
16:46:34 <jlibosva> right, like tempest
16:47:08 <mlavalle> good ideas jlibosva
16:47:09 <jlibosva> the DB dump would need to be regenerated on each model changes, similarly to updating head hash
16:47:27 <ihrachys> jlibosva, sharing env may affect stability, if there is some cross dependency. but it's worth a try.
16:47:31 <jlibosva> should I file rfes against fullstack or wishlist bugs?
16:47:34 <ihrachys> jlibosva, you suggest to check in dump in git?
16:47:52 <ihrachys> jlibosva, wishlists are fine I believe; rfes are for user visible changes.
16:47:57 <jlibosva> ihrachys: yes, dump with empty tables
16:48:02 <jlibosva> ack, thanks
16:48:07 <jlibosva> I'll try to write something down
16:48:15 <jlibosva> I had this in my head for a while
16:48:42 <ihrachys> #action jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)
16:49:16 <ihrachys> let's briefly look at tempest plugin since we have little time
16:49:17 <ihrachys> #topic Tempest plugin
16:49:29 <ihrachys> https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:49:59 <ihrachys> so as I learned today, I need to clean up gate from legacy before we can land the patch removing tempest code from neutron tree
16:50:20 <ihrachys> assuming it's done, grafana is fixed, and neutron tempest code is dropped; what are next steps.
16:50:26 <ihrachys> chandankumar, ^
16:50:51 <chandankumar> ihrachys: we need to fix the plugins
16:50:56 <ihrachys> I believe one vector is moving those legacy jobs for stable branches and then removing them in project-config / openstack-zuul-jobs / ...
16:51:12 <ihrachys> chandankumar, which plugins. you mean for stadium projects?
16:51:16 <chandankumar> https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:51:21 <chandankumar> ihrachys: sorry
16:51:51 <chandankumar> ihrachys: since we have now working neutron tempest plugin, we need to think about stadium projects having intree tempest plugin
16:52:18 <ihrachys> there are two in the list - midonet and dynamic-routing
16:52:38 <ihrachys> others are not stadium and I would not waste time on fixing them. interested parties can chime in.
16:52:53 <mlavalle> how about networking-sfc?
16:52:59 <chandankumar> what about neutron-vpnaas and dynamic routing
16:53:06 <ihrachys> vpnaas is not in stadium
16:53:14 <ihrachys> dynamic-routing, I mentioned it already
16:53:27 <ihrachys> mlavalle, it's not in the list. I assume it means it's fine. chandankumar correct?
16:53:35 <chandankumar> ihrachys: yup
16:53:48 <mlavalle> bcafarel was asking about it a few weeks ago
16:54:00 <ihrachys> chandankumar, do you plan to respin for comments, or we should find owners from subprojects to take over the patches?
16:54:20 <chandankumar> ihrachys: yup i need to respin the patches
16:54:34 <ihrachys> ok feel free to ask for help if needed
16:54:49 <ihrachys> mlavalle, as for your patches moving / removing jobs, where are we on those?
16:54:51 <chandankumar> ihrachys: just blocked on https://review.openstack.org/#/c/521346/
16:55:19 <mlavalle> ihrachys: I created this one for master: https://review.openstack.org/#/c/525345/
16:55:20 <chandankumar> if you check the last comment, we are blocked on how to install neutron tempest plugin
16:55:41 <chandankumar> so that we can it in ci, other reviews are similar
16:55:50 <mlavalle> I am thinking that maybe I should break it in pieces
16:55:50 <chandankumar> some of them does not have devstack plugin
16:56:16 <chandankumar> mlavalle: ihrachys https://review.openstack.org/#/q/topic:switch_to_neutron_tempest_plugin+(status:open+OR+status:merged)
16:56:17 <mlavalle> one for fullstack, one for functional and maybe one for linuxbridge
16:56:22 <ihrachys> chandankumar, having it in a single place (the new tempest plugin repo) would be the best no?
16:56:32 <ihrachys> then you would just include the plugin in the job
16:56:46 <mlavalle> and one for the rest
16:56:56 <mlavalle> otherwise, the chances of a job failing are high
16:57:07 <ihrachys> mlavalle, is it because you have more work to do on some of those? if not, I think a single piece is fine.
16:57:19 <mlavalle> ok, fine
16:57:25 <ihrachys> mlavalle, oh you mean the fact that it blows up the gate size?
16:57:26 <chandankumar> ihrachys: you mean one devstack plugin for neutron tempest plugin?
16:57:36 <ihrachys> chandankumar, yes
16:57:46 <ihrachys> I think that's what yamamoto suggested
16:57:46 <chandankumar> ihrachys: sure i will look into that tomorrow
16:57:49 <mlavalle> well, those jobs tend to fail frequently
16:57:53 <ihrachys> chandankumar, thanks!
16:58:11 <mlavalle> so one single patch might be difficult to merge
16:58:18 <ihrachys> mlavalle, yeah I get your point now. some are not voting like fullstack so probably doesn't matter as much.
16:58:28 <mlavalle> ok, cool
16:58:48 <mlavalle> so next step is to create the dependent patches for the other repos
16:58:53 <ihrachys> let's split indeed. especially api / scenarios since you will need to backport the rest but not those into stable
16:58:56 <mlavalle> to remove the jobs from there
16:59:16 <ihrachys> ok
16:59:29 <mlavalle> and then backport as you mention
16:59:42 <mlavalle> that's the update in this regard
17:00:26 <ihrachys> we don't have time to discuss scenarios in details
17:00:39 <ihrachys> I will just drop here a link to the patch that should fix linuxbridge flavor: https://review.openstack.org/#/c/523319/
17:00:45 <ihrachys> please review :)
17:00:50 <ihrachys> and... that's it for today
17:00:54 <mlavalle> I will update in channel regarding fip scenario test
17:01:13 <ihrachys> thanks folks for working on the most boring things in neutron project! you are superheroes. :)
17:01:16 <ihrachys> #endmeeting