16:00:43 #startmeeting neutron_ci
16:00:43 Meeting started Tue Dec 5 16:00:43 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 The meeting name has been set to 'neutron_ci'
16:00:50 what's up!
16:00:51 o
16:00:59 o/
16:01:00 * jlibosva lost his hand
16:01:03 jlibosva, you came bare hands?
16:01:07 :)
16:01:07 hi
16:01:26 :]
16:01:43 #topic Actions from prev meeting
16:01:53 "ihrachys to disable connectivity fullstack tests while we look for culprit"
16:01:54 and
16:02:00 "ihrachys to disable dscp qos fullstack test while we look for culprit"
16:02:18 both merged
16:02:28 https://review.openstack.org/523517 and https://review.openstack.org/523518
16:02:46 I was thinking about backporting but that will require to first backport the decorator. do we want to go this way?
16:02:58 I don't think it should be a priority
16:03:19 I'd try to stabilize master and then think about stable branches
16:04:04 ok
16:04:16 next is
16:04:25 "jlibosva to look at test_securitygroup(linuxbridge-iptables) failure in fullstack"
16:04:42 that was some "test_securitygroup(linuxbridge-iptables) with RuntimeError: Process ['ncat', u'20.0.0.11', '3333', '-w', '20'] hasn't been spawned in 20 seconds" failure
16:04:48 I need more time for both. There was no obvious failure so I'm trying to reproduce
16:04:56 ack
16:05:02 I assume second you mean "jlibosva to look at test_securitygroup(linuxbridge-iptables) failure in fullstack"
16:05:04 the reason it wasn't spawned might be that it uses tcp connection and that couldn't be established
16:05:16 eh
16:05:18 "jlibosva to look at test_l2_agent_restart fullstack failure"
16:05:49 ok next was "ihrachys to disable legacy jobs for neutron master"
16:05:57 that's for tempest api / scenario jobs
16:05:58 I briefly looked at that one too but will still need more time.
I suspect that 20 seconds might not be enough to restart all agents and come back
16:06:13 in previous debugging, we saw it took 30 seconds to just start an agent, so it could be a clue
16:06:42 ack
16:06:51 for legacy jobs, we merged this: https://review.openstack.org/523514
16:06:56 disabling them for master
16:07:13 and we even landed the patch cleaning up tempest in-tree test classes (we will discuss that later), so it works fine
16:07:28 those were all action items we had so far
16:07:33 regarding that
16:07:41 it seems we still have a legacy api job in gate Q
16:07:53 hm... where?
16:08:19 don't we need this as well https://review.openstack.org/#/c/521349/ ?
16:08:45 jlibosva, that would merge after we move the jobs into stable branches
16:08:49 at the gate report from removing the in-tree tempest but if you want to talk about that later, we can discuss that later
16:08:50 Miguel is working on it
16:08:53 https://review.openstack.org/#/c/506672/
16:09:05 aah, ok, sorry
16:09:12 but still it seems the api job is there ^^
16:09:15 jlibosva, oh that's ouch
16:09:22 jlibosva, maybe it's still voting in gate
16:09:34 I may have killed it in check queue only
16:09:37 I will follow up on it
16:09:40 ack
16:09:42 thanks
16:09:50 I don't really know what is going on :)
16:09:51 #action ihrachys to make sure legacy tempest jobs are gone in gate queue
16:09:58 thanks for the notice!
16:10:00 ahhh, that explains why I saw it last night
16:10:30 I will figure it out
16:10:30 #topic Grafana
16:10:33 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:11:41 I gotta wash my eyes. I can't see scenarios at 100%
16:12:22 I don't see it at all
16:12:37 oh I think that's because they are gone
16:12:43 because of project-config change
16:12:50 I forgot to update grafana
16:12:57 meh, I suck
16:13:08 #action ihrachys to update grafana for new non-legacy job names
16:13:23 it sounded too good to be true right!
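The timeout question raised under the action items (a 20-second limit against the roughly 30 seconds an agent was seen to take to start) follows the usual poll-until-true pattern. The sketch below only illustrates that pattern; `wait_until`, `WaitTimeout` and `make_agent_ready_after` are hypothetical names and this is not neutron's `wait_until_true`.

```python
import time


class WaitTimeout(Exception):
    """Raised when the predicate stays false for the whole timeout."""


def wait_until(predicate, timeout=20.0, sleep=0.5, clock=time.monotonic):
    """Poll predicate() until it returns true or `timeout` seconds pass.

    Hypothetical helper for illustration; not neutron's wait_until_true.
    """
    deadline = clock() + timeout
    while not predicate():
        if clock() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(sleep)


def make_agent_ready_after(seconds, clock=time.monotonic):
    """Return a fake readiness check that turns true after `seconds`."""
    start = clock()
    return lambda: clock() - start >= seconds
```

With these names, `wait_until(make_agent_ready_after(30), timeout=20)` raises `WaitTimeout`, which is the shape of failure being suspected here: the agent does come back, only later than the test is willing to wait.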
16:13:24 :)
16:13:29 yeah
16:13:43 it's not xmas yet
16:13:52 but at least you get that good feeling for a short time :)
16:13:55 lol
16:14:01 lol
16:14:23 ok so other than that, fullstack that we'll have a closer look now
16:14:26 #topic Fullstack
16:14:48 fullstack goes sideways in 60-80% failure rate range
16:15:21 we have quite some fullstack related bugs
16:15:22 https://bugs.launchpad.net/neutron/+bugs?field.tag=fullstack&orderby=status&start=0
16:15:29 14 total
16:15:37 some are not gate-failures of course
16:15:51 let's walk through them
16:16:06 https://bugs.launchpad.net/neutron/+bug/1728948
16:16:06 Launchpad bug 1728948 in neutron "fullstack: test_connectivity fails due to dhclient crash" [High,New] - Assigned to Jakub Libosvar (libosvar)
16:16:14 I believe that's one of ton of bugs that jlibosva is looking into
16:16:28 I guess it should be Confirmed since we see it all the time
16:16:32 I should be looking at it, I haven't yet :)
16:16:41 it shouldn't affect gate now right?
16:16:43 yeah, confirmed makes sense
16:17:04 I don't know, I would expect it affects other tests using dhcp
16:17:29 it might be an issue with the dhclient script I copied from dhclient repo
16:17:58 maybe we should set all other tests to not use dhcp?
16:18:00 I need to dig deeper
16:18:07 but I mean, we disabled the tests with the decorator
16:18:24 were those the only tests using dhcp?
16:18:27 I need to check
16:18:48 ok. anyway, seems like you will have more after you look closer.
16:18:52 https://bugs.launchpad.net/neutron/+bug/1733649
16:18:52 Launchpad bug 1733649 in neutron "fullstack neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs.test_dscp_marking_packets(openflow-native) failure" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:19:11 slaweq_, I think you made some progress there. can you brief us on where it stands?
16:19:17 maybe test_dhcp_agent will also be affected
16:19:19 yes
16:19:27 so I couldn't reproduce it locally
16:19:46 I pushed patch with some additional logs but it was also fine each time when I rechecked
16:20:11 I was also checking logs from failed jobs and for me everything on L2 agent and server side looks fine
16:20:29 I suspect that it can be issue with tcpdump output which don't match regex for some reason
16:20:43 good. I think you can put it on backburner and we will wait when it triggers your log message.
16:20:46 so there is already merged patch https://review.openstack.org/#/c/525156/ which adds logging of tcpdump
16:21:02 sorry but what is backburner?
16:21:12 leave it aside for a while
16:21:30 it's an expression
16:21:36 ah, ok :)
16:21:50 thanks mlavalle, I should have known not to use it :)
16:21:51 so yes, I will wait now for such failed test and will check it then
16:22:23 there was an old https://bugs.launchpad.net/neutron/+bug/1687074 but I am not sure if it is still an issue
16:22:23 Launchpad bug 1687074 in neutron "Sometimes ovsdb fails with "tcp:127.0.0.1:6640: error parsing stream"" [High,Confirmed]
16:22:23 I will remember it now, sorry for my english :)
16:23:10 jlibosva, you suspect latest ovsdbapp may have it fixed?
16:23:35 I saw it very recently
16:23:47 let's check logstash
16:23:49 with the newest library?
16:24:08 yeah I see it lately
16:24:08 that's what I'm not sure
16:24:16 maybe it was today :)
16:24:16 in master
16:24:24 this is q: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22error%20parsing%20stream%5C%22
16:25:09 most of hits are successes though
16:26:34 for what I understand, we get random crap from ovsdb socket there
16:26:57 I can't see one in fullstack though
16:27:01 it would be nice to see full contents of socket when it happens
16:27:03 all the failures come from functional tests
16:27:18 jlibosva, yeah that's true
16:27:20 hm
16:27:29 eventlet?
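For the DSCP bug discussed earlier (bug 1733649), the suspicion was that tcpdump output sometimes fails to match the test's regex, which is why logging of the tcpdump output was added. A minimal sketch of that idea follows. It assumes the capture is made with `tcpdump -v`, where IPv4 packets are printed with a `tos 0x..` field; the regex and the `dscp_marks` helper are hypothetical and are not the ones used by the fullstack test.

```python
import logging
import re

LOG = logging.getLogger(__name__)

# Matches the "tos 0x.." field printed for IPv4 packets by "tcpdump -v".
# Illustrative pattern only; not the regex used by the neutron test.
TOS_RE = re.compile(r"\btos 0x(?P<tos>[0-9a-fA-F]+)")


def dscp_marks(tcpdump_lines):
    """Return the DSCP value of every line that carries a tos field."""
    marks = []
    for line in tcpdump_lines:
        match = TOS_RE.search(line)
        if match is None:
            # Log what did not match, so a failed run leaves evidence.
            LOG.debug("no tos field in tcpdump line: %r", line)
            continue
        # DSCP is the upper six bits of the TOS byte.
        marks.append(int(match.group("tos"), 16) >> 2)
    return marks
```

Logging the unmatched lines rather than silently skipping them is the point of the sketch: when the failure cannot be reproduced locally, the log from the next failed gate run is the only place the unexpected output will show up.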
16:27:33 but I'm 100% I saw it recently in fullstack
16:27:37 *sure
16:28:51 though in fullstack, we also patch test runner and agents, and functional test runner should be patched too, so it is probably the same from eventlet perspective
16:29:46 otherwiseguy, are you around
16:30:43 ok let's move on but otherwiseguy feel free to reply later
16:31:16 I think we may need otherwiseguy on that bug since it seems rather close to ovsdbapp
16:31:31 next is https://bugs.launchpad.net/neutron/+bug/1673531
16:31:31 Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [High,Confirmed] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:31:32 agreed
16:31:46 I haven't made any progress since the last time here. need more time.
16:32:27 https://bugs.launchpad.net/neutron/+bug/1721796
16:32:28 Launchpad bug 1721796 in neutron "wait_until_true is not rootwrap daemon friendly" [Medium,In progress] - Assigned to Jakub Libosvar (libosvar)
16:32:31 (I skip Wishlist bugs)
16:32:43 jlibosva, is that one still valid with new oslo.rootwrap?
16:32:58 * jlibosva ¯\_(ツ)_/¯
16:33:10 is the new rootwrap already released, bumped and used?
16:33:14 yes
16:33:20 ok, I'll keep an eye on this one
16:33:22 in master for sure.
16:33:45 jlibosva, ok. I would close and revisit if it pops up again.
:)
16:33:47 problem is that there is no easy string to check
16:33:50 sounds good
16:34:32 as I'm checking fullstack results quite often I will also be aware of this one
16:34:59 thanks
16:36:01 https://bugs.launchpad.net/neutron/+bug/1487548
16:36:01 Launchpad bug 1487548 in neutron "fullstack infrastructure tears down processes via kill -9" [Low,In progress] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:36:08 I totally forgot about that one
16:36:15 I had a primitive test patch here: https://review.openstack.org/#/c/499803/
16:36:28 but there is some work to do there
16:36:45 making sure that if a process doesn't die gracefully we still kill it forcefully
16:37:11 honestly though, I may not get on that one in next weeks. shouldn't be a high priority.
16:37:16 maybe even deserves Wishlist
16:37:21 agreed
16:37:26 since nothing is broken
16:37:30 ihrachys: I can help on this one if You want
16:37:32 ok I moved to wishlist
16:37:37 also I think we already have a code that sends 15 and if it doesn't exit, it does 9
16:37:57 slaweq_, nah I think if you have cycles on fullstack it's better to e.g. help jlibosva with two issues he already tracks
16:38:09 jlibosva, not for all agents I believe
16:38:13 ok
16:38:25 I mean that the functionality is implemented :)
16:38:54 yeah. ok, I will have another look, maybe it already works as needed.
16:40:27 I think we have enough work to do not to look through a recent log
16:40:37 I have one more thing regarding fullstack
16:40:41 shoot
16:40:57 so I saw we are getting timeouts .. I think that's cause we have way too many jobs
16:41:16 one way to simplify testing matrix would be eliminating the ovsdb and openflow interfaces
16:41:42 we have https://review.openstack.org/#/c/503076/
16:42:18 oh well. yeah, it fell through cracks...
:)
16:42:27 and we can do similar to https://review.openstack.org/#/c/506713/ with openflow I think
16:42:59 ihrachys: do you plan to work on that one or maybe we could find somebody to continue the work
16:43:09 there was some environment setup dependency on cli that I couldn't immediately figure out before I was distracted
16:44:01 I think we should also prioritize that one to q3
16:44:07 having someone looking into it instead of me would definitely help :)
16:44:35 iwamoto asked about status so maybe he would be interested
16:44:41 also I am interested :)
16:44:59 another thing
16:45:41 I thought about improving fullstack by putting the environment building to class level, so env will be built for all tests in a class just once
16:45:44 jlibosva, ok I commented in gerrit that IWAMOTO can take it over
16:45:50 ihrachys: thanks
16:45:55 that can also save some time
16:46:28 like tempest?
16:46:28 and last thing - it takes time to create the DB, we can have something like an empty DB dump that would fill the DB with tables instead of using alembic
16:46:34 right, like tempest
16:47:08 good ideas jlibosva
16:47:09 the DB dump would need to be regenerated on each model changes, similarly to updating head hash
16:47:27 jlibosva, sharing env may affect stability, if there is some cross dependency. but it's worth a try.
16:47:31 should I file rfes against fullstack or wishlist bugs?
16:47:34 jlibosva, you suggest to check in dump in git?
16:47:52 jlibosva, wishlists are fine I believe; rfes are for user visible changes.
16:47:57 ihrachys: yes, dump with empty tables
16:48:02 ack, thanks
16:48:07 I'll try to write something down
16:48:15 I had this in my head for a while
16:48:42 #action jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)
16:49:16 let's briefly look at tempest plugin since we have little time
16:49:17 #topic Tempest plugin
16:49:29 https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:49:59 so as I learned today, I need to clean up gate from legacy before we can land the patch removing tempest code from neutron tree
16:50:20 assuming it's done, grafana is fixed, and neutron tempest code is dropped; what are next steps.
16:50:26 chandankumar, ^
16:50:51 ihrachys: we need to fix the plugins
16:50:56 I believe one vector is moving those legacy jobs for stable branches and then removing them in project-config / openstack-zuul-jobs / ...
16:51:12 chandankumar, which plugins. you mean for stadium projects?
16:51:16 https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:51:21 ihrachys: sorry
16:51:51 ihrachys: since we have now working neutron tempest plugin, we need to think about stadium projects having intree tempest plugin
16:52:18 there are two in the list - midonet and dynamic-routing
16:52:38 others are not stadium and I would not waste time on fixing them. interested parties can chime in.
16:52:53 how about networking-sfc?
16:52:59 what about neutron-vpnaas and dynamic routing
16:53:06 vpnaas is not in stadium
16:53:14 dynamic-routing, I mentioned it already
16:53:27 mlavalle, it's not in the list. I assume it means it's fine. chandankumar correct?
16:53:35 ihrachys: yup
16:53:48 bcafarel was asking about it a few weeks ago
16:54:00 chandankumar, do you plan to respin for comments, or we should find owners from subprojects to take over the patches?
16:54:20 ihrachys: yup i need to respin the patches
16:54:34 ok feel free to ask for help if needed
16:54:49 mlavalle, as for your patches moving / removing jobs, where are we on those?
16:54:51 ihrachys: just blocked on https://review.openstack.org/#/c/521346/
16:55:19 ihrachys: I created this one for master: https://review.openstack.org/#/c/525345/
16:55:20 if you check the last comment, we are blocked on how to install neutron tempest plugin
16:55:41 so that we can it in ci, other reviews are similar
16:55:50 I am thinking that maybe I should break it in pieces
16:55:50 some of them does not have devstack plugin
16:56:16 mlavalle: ihrachys https://review.openstack.org/#/q/topic:switch_to_neutron_tempest_plugin+(status:open+OR+status:merged)
16:56:17 one for fullstack, one for functional and maybe one for linuxbridge
16:56:22 chandankumar, having it in a single place (the new tempest plugin repo) would be the best no?
16:56:32 then you would just include the plugin in the job
16:56:46 and one for the rest
16:56:56 otherwise, the chances of a job failing are high
16:57:07 mlavalle, is it because you have more work to do on some of those? if not, I think a single piece is fine.
16:57:19 ok, fine
16:57:25 mlavalle, oh you mean the fact that it blows up the gate size?
16:57:26 ihrachys: you mean one devstack plugin for neutron tempest plugin?
16:57:36 chandankumar, yes
16:57:46 I think that's what yamamoto suggested
16:57:46 ihrachys: sure i will look into that tomorrow
16:57:49 well, those jobs tend to fail frequently
16:57:53 chandankumar, thanks!
16:58:11 so one single patch might be difficult to merge
16:58:18 mlavalle, yeah I get your point now. some are not voting like fullstack so probably doesn't matter as much.
16:58:28 ok, cool
16:58:48 so next step is to create the dependent patches for the other repos
16:58:53 let's split indeed.
especially api / scenarios since you will need to backport the rest but not those into stable
16:58:56 to remove the jobs from there
16:59:16 ok
16:59:29 and then backport as you mention
16:59:42 that's the update in this regard
17:00:26 we don't have time to discuss scenarios in details
17:00:39 I will just drop here a link to the patch that should fix linuxbridge flavor: https://review.openstack.org/#/c/523319/
17:00:45 please review :)
17:00:50 and... that's it for today
17:00:54 I will update in channel regarding fip scenario test
17:01:13 thanks folks for working on the most boring things in neutron project! you are superheroes. :)
17:01:16 #endmeeting