16:00:19 <slaweq> #startmeeting neutron_ci
16:00:20 <openstack> Meeting started Tue Dec 18 16:00:19 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:21 <slaweq> hi
16:00:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:25 <openstack> The meeting name has been set to 'neutron_ci'
16:00:51 <mlavalle> o/
16:01:25 <hongbin> o/
16:01:54 <slaweq> ok, lets start
16:02:00 <slaweq> #topic Actions from previous meetings
16:02:06 <slaweq> mlavalle will continue debugging trunk tests failures in multinode dvr env
16:02:26 <mlavalle> I did work on this, in fact I was having a chat with haleyb about it
16:02:57 <mlavalle> I am finding that the instance that lands on the controller node cannot access the metadata service
16:03:37 <slaweq> mlavalle: I'm now looking at dvr multinode job results and I see a lot of errors like: http://logs.openstack.org/59/625359/4/check/neutron-tempest-plugin-dvr-multinode-scenario/0fcd454/controller/logs/screen-q-l3.txt.gz?level=ERROR
16:03:48 <slaweq> do You think that those may be related somehow?
16:04:00 * mlavalle looking
16:04:51 <haleyb> slaweq: i had meant to file a bug for that and another traceback i saw, need to find the tab
16:04:55 <mlavalle> I saw those and they definitely need to be fixed
16:05:20 <mlavalle> but I don't necessarily see the connection with the other bug
16:05:27 <slaweq> yes, but I wonder if that may be a reason why trunk tests are still failing, even with Your patch
16:05:33 <mlavalle> I don't rule it out, but I don't see the connection now
16:05:37 <slaweq> ok
16:05:59 <slaweq> haleyb: will You file a bug for this one or should I do it?
16:06:14 <mlavalle> I am also finding that the L3 agent in the controller doesn't seem to be processing the router
16:06:45 <mlavalle> only the agent in the compute node is processing the router
16:06:56 <haleyb> i can file a bug, wonder about the JSON issues as well, could mean a bad message?
16:07:19 <slaweq> what JSON issue exactly?
16:07:25 <haleyb> i think i've seen these with a malformed rpc, but can't remember
16:07:37 <bcafarel> late hi o/
16:07:45 <haleyb> http://logs.openstack.org/59/625359/4/check/neutron-tempest-plugin-dvr-multinode-scenario/0fcd454/controller/logs/screen-q-l3.txt.gz?level=ERROR#_Dec_17_23_29_17_340537
16:07:52 <mlavalle> if that is actually the case, then there is no metadata proxy to process the request from the instance in the controller
16:07:56 <haleyb> slaweq: things like that^^
16:08:33 <mlavalle> so here's what I plan to do:
16:08:42 <mlavalle> 1) Dig further in the logs
16:08:46 <slaweq> hmm, I didn't see this one before
16:09:09 <mlavalle> 2) I will test locally the creation of the router
16:09:26 <mlavalle> 3) add some log debug statements and test
16:09:36 <mlavalle> agree?
16:09:41 <slaweq> mlavalle: You are talking about debugging the trunk issue, right?
16:09:47 <mlavalle> right
16:09:57 <slaweq> ok, that is fine for me
16:10:01 <mlavalle> but at this point, it is not a trunk issue
16:10:07 <slaweq> #action mlavalle will continue debugging trunk tests failures in multinode dvr env
16:10:10 <mlavalle> it is a router / L3 agent issue
16:10:16 <mlavalle> slaweq: ^^^^^
16:10:24 <mlavalle> and that spills over to:
16:10:38 <mlavalle> 1) potentially the messages you are seeing in the logs
16:11:08 <mlavalle> 2) the other ssh timeouts that you gave me as homework last week
16:11:53 <mlavalle> makes sense?
16:12:01 <slaweq> yep
16:12:21 <mlavalle> ok
16:12:28 <slaweq> mlavalle: so do You think that we should report separate bugs for those issues from logs?
16:12:40 <slaweq> I think so but what's Your opinion? :)
16:12:47 <mlavalle> yes
16:12:49 <slaweq> k
16:12:59 <slaweq> haleyb: will You report them?
16:13:02 <mlavalle> and I'll point to them in the bug that I am working on
16:13:16 <haleyb> sure, will report them
16:13:19 <mlavalle> so if I find a relationship, I keep the connection
16:13:21 <slaweq> #action haleyb to report bugs about recent errors in L3 agent logs
16:13:27 <slaweq> thx mlavalle and haleyb
16:14:13 <slaweq> ok, lets move on
16:14:20 <slaweq> next one: slaweq to continue debugging bug 1798475
16:14:21 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,In progress] https://launchpad.net/bugs/1798475 - Assigned to LIU Yulong (dragon889)
16:14:43 <slaweq> I asked L3 experts for help and liuyulong jumped in. Patch proposed: https://review.openstack.org/#/c/625054/
16:15:33 <mlavalle> ok
16:15:36 <mlavalle> great!
16:16:09 <mlavalle> it is still WIP
16:16:16 <slaweq> so as liuyulong is working on it, I think we are in good hands with this one :)
16:16:29 <mlavalle> yeap
16:17:27 <slaweq> ok, lets move on then
16:17:31 <slaweq> slaweq to continue fixing functional-py3 tests
16:17:59 <slaweq> I limited the output from the functional job by disabling warnings and some logging to stdout
16:18:04 <slaweq> patches for that are proposed:
16:18:09 <slaweq> https://review.openstack.org/#/c/625555/
16:18:11 <slaweq> https://review.openstack.org/#/c/625569/
16:18:13 <slaweq> https://review.openstack.org/#/c/625704/
16:18:15 <slaweq> https://review.openstack.org/#/c/625571/
16:18:33 <slaweq> with those patches this functional job running on python3 should be (almost) good
16:18:56 <slaweq> almost because I noticed that also 3 tests related to the SIGHUP signal are failing: http://logs.openstack.org/83/577383/19/check/neutron-functional/6470d68/logs/testr_results.html.gz
16:19:24 <slaweq> bcafarel: can You take a look at them and check if that is related to this issue with handling SIGHUP which we already have reported somewhere?
16:19:30 <slaweq> or maybe it's some different issue
16:21:06 <slaweq> ok, I think that bcafarel is not here now
16:21:15 <slaweq> I will ping him tomorrow about that issue
16:21:32 <slaweq> #action slaweq to talk with bcafarel about SIGHUP issue in functional py3 tests
16:21:44 <slaweq> next one:
16:21:46 <slaweq> hongbin to report and check failing neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_gateway_ip_changed test
16:21:53 <bcafarel> slaweq: o/ sorry was AFK, I can take a look tomorrow (hopefully)
16:22:05 <slaweq> bcafarel: no problem, thx a lot
16:22:26 <hongbin> o/
16:22:39 <hongbin> there is a proposed patch for that
16:22:47 <hongbin> #link https://review.openstack.org/#/c/625359/
16:23:31 <hongbin> IMO, we can merge the patch and see if it is able to resolve the error
16:23:45 <slaweq> thx hongbin, I will take a look at it tomorrow
16:24:00 <hongbin> slaweq: thanks
16:24:34 <slaweq> ok, lets move to the next one then
16:24:35 <slaweq> slaweq to switch neutron-tempest-iptables_hybrid job to non-voting if it keeps failing a lot because of bug 1807949
16:24:36 <openstack> bug 1807949 in os-vif "os_vif error: [Errno 24] Too many open files" [High,Fix released] https://launchpad.net/bugs/1807949 - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:24:53 <slaweq> I did: patch https://review.openstack.org/#/c/624489/ is merged already
16:25:05 <slaweq> and a revert is already proposed: https://review.openstack.org/#/c/625519/ - the proper fix is already in os_vif
16:25:11 <slaweq> so please review this revert :)
16:25:32 <slaweq> and thx ralonsoh for the proper fix in os_vif :)
16:26:20 <slaweq> next one:
16:26:21 <slaweq> slaweq to mark db migration tests as unstable for now
16:26:26 <slaweq> Patch https://review.openstack.org/#/c/624685/ - merged
16:27:25 <slaweq> and I recently found out that there was a mistake in this patch, so there is another one: https://review.openstack.org/#/c/625556/ and this is also merged
16:27:40 <slaweq> I hope the functional tests failure ratio will be better now
16:28:00 <slaweq> and that was all on my list from last week
16:28:23 <slaweq> anything else You want to ask/talk about from the previous week?
16:29:10 <slaweq> ok, lets move to the next topic then
16:29:14 <slaweq> #topic Python 3
16:29:28 <slaweq> I have 2 things related to python 3
16:29:42 <slaweq> 1. Rally job switch to python3: https://review.openstack.org/#/c/624358/ - please review it
16:29:55 <slaweq> it required a fix on the rally side and that is merged already
16:30:06 <slaweq> so we should be good to switch this job to python3
16:30:45 <slaweq> 2. Some info: As per gmann's comment in https://review.openstack.org/#/c/624360/3 - we will not get rid of the tempest-full python 2.7 job for now.
16:32:07 <slaweq> that's all from my side about python3 CI jobs
16:32:17 <slaweq> anything else You want to add?
16:32:48 <mlavalle> nope, thanks for the update
16:32:58 <slaweq> ok
16:33:06 <slaweq> #topic Grafana
16:33:13 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:34:41 <slaweq> do You see anything worth discussing now?
16:35:22 <slaweq> I think that all is more or less under control now - we still have some issues which we are aware of but nothing new and nothing which would cause very high failure rates
16:35:51 <slaweq> the neutron-tempest-iptables_hybrid job is clearly back in good shape after ralonsoh's fix for os_vif
16:35:54 <mlavalle> it looks overall good
16:36:07 <slaweq> and neutron-tempest-plugin-dvr-multinode-scenario is back to 100% failures
16:36:21 <mlavalle> right, I was about to mention that
16:36:22 <slaweq> but this was already discussed before and we know why it happens like that
16:36:46 <mlavalle> but keep in mind that the work I am doing with the trunk tests addresses that
16:36:52 <mlavalle> at least partially
16:37:17 <slaweq> fullstack tests are mostly failing because of those issues which hongbin and liuyulong are working on
16:37:30 <mlavalle> That's what I figured
16:38:23 <slaweq> only the quite high failure rate for functional tests worries me a bit
16:38:32 <slaweq> I'm looking for some recent examples
16:40:15 <slaweq> I found something like this for example: http://logs.openstack.org/00/612400/16/check/neutron-functional/3e8729c/logs/testr_results.html.gz
16:41:10 <slaweq> and I see such an error for the first time
16:43:24 <slaweq> other examples which I found look like they are related to the patches they were running on
16:43:33 <slaweq> so maybe there is no new issue with those tests
16:43:42 <slaweq> lets just monitor it for the next days :)
16:43:48 <slaweq> what do You think?
16:44:43 <mlavalle> yes
16:44:48 <mlavalle> let's monitor it
16:45:01 <slaweq> :)
16:45:05 <slaweq> ok, lets move on
16:45:09 <slaweq> #topic Periodic
16:45:42 <slaweq> I just wanted to mention that thanks to mriedem our neutron-tempest-postgres-full is good again :)
16:45:46 <slaweq> thx mriedem
16:45:59 <slaweq> and except for that all other periodic jobs look good now
16:46:24 <slaweq> ok, so last topic for today
16:46:26 <slaweq> #topic Open discussion
16:46:39 <mriedem> \o/
16:46:59 <slaweq> first of all I want to mention that the next 2 meetings will be cancelled as they fall during Christmas and New Year
16:47:08 <slaweq> I will send an email about that today too
16:47:11 <mlavalle> no meetings on the 25th and the 1st, right?
16:47:17 <slaweq> mlavalle: right
16:47:32 <mlavalle> I don't have anything else
16:47:41 <slaweq> and the second thing which I want to raise here is a patch
16:47:47 <slaweq> #link https://review.openstack.org/573933
16:47:55 <slaweq> it has been waiting for a very long time for review
16:48:06 <slaweq> I have already mentioned it here a few times
16:48:11 <slaweq> please take a look at it
16:48:13 <mlavalle> I'll comment today
16:48:19 <slaweq> thx mlavalle
16:48:25 <mlavalle> as I've said before, I'm not a fan of it
16:48:44 <slaweq> personally I don't think we should merge it but I don't want to block it if others think that it's good
16:49:06 <slaweq> ok, that's all from me for today
16:49:17 <slaweq> do You have anything else You want to talk about?
16:49:21 <mlavalle> nope
16:49:47 <slaweq> if not, then I will give You back 10 minutes
16:49:54 <mlavalle> thanks
16:50:12 <slaweq> have great holidays and a Happy New Year!
16:50:17 <mlavalle> you too!
16:50:18 <bcafarel> the same :)
16:50:25 <slaweq> Happy CI New Year even :)
16:50:29 <mlavalle> although I still expect to see you in the Neutron channel
16:50:37 <slaweq> and see You all in January on meetings
16:50:50 <slaweq> mlavalle: yes, I will be available until this Friday for sure
16:51:04 <slaweq> next week maybe if I will need to rest from family a bit :P
16:51:06 <mlavalle> :-)
16:51:18 <slaweq> #endmeeting