16:00:19 <slaweq> #startmeeting neutron_ci
16:00:20 <openstack> Meeting started Tue Dec 18 16:00:19 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:21 <slaweq> hi
16:00:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:25 <openstack> The meeting name has been set to 'neutron_ci'
16:00:51 <mlavalle> o/
16:01:25 <hongbin> o/
16:01:54 <slaweq> ok, lets start
16:02:00 <slaweq> #topic Actions from previous meetings
16:02:06 <slaweq> mlavalle will continue debugging trunk tests failures in multinode dvr env
16:02:26 <mlavalle> I did work on this, in fact I was having a chat with haleyb about it
16:02:57 <mlavalle> I am finding that the instance that lands on the controller node cannot access the metadata service
16:03:37 <slaweq> mlavalle: I'm now looking at dvr multinode job results and I see a lot of errors like: http://logs.openstack.org/59/625359/4/check/neutron-tempest-plugin-dvr-multinode-scenario/0fcd454/controller/logs/screen-q-l3.txt.gz?level=ERROR
16:03:48 <slaweq> do You think that those may be related somehow?
16:04:00 * mlavalle looking
16:04:51 <haleyb> slaweq: i had meant to file a bug for that and another traceback i saw, need to find the tab
16:04:55 <mlavalle> I saw those and they definitely need to be fixed
16:05:20 <mlavalle> but I don't necessarily see the connection with the other bug
16:05:27 <slaweq> yes, but I wonder if that may be a reason why trunk tests are still failing, even with Your patch
16:05:33 <mlavalle> I don't rule it out, but I don't see the connection now
16:05:37 <slaweq> ok
16:05:59 <slaweq> haleyb: will You file a bug for this one or should I do it?
16:06:14 <mlavalle> I am also finding that the L3 agent in the controller doesn't seem to be processing the router
16:06:45 <mlavalle> only the agent in the compute node is processing the router
16:06:56 <haleyb> i can file a bug, wonder about the JSON issues as well, could mean a bad message?
16:07:19 <slaweq> what JSON issue exactly?
16:07:25 <haleyb> i think i've seen these with a malformed rpc, but can't remember
16:07:37 <bcafarel> late hi o/
16:07:45 <haleyb> http://logs.openstack.org/59/625359/4/check/neutron-tempest-plugin-dvr-multinode-scenario/0fcd454/controller/logs/screen-q-l3.txt.gz?level=ERROR#_Dec_17_23_29_17_340537
16:07:52 <mlavalle> if that is actually the case, then there is no metadata proxy to process the request from the instance in the controller
16:07:56 <haleyb> slaweq: things like that^^
16:08:33 <mlavalle> so here's what I plan to do:
16:08:42 <mlavalle> 1) Dig further in the logs
16:08:46 <slaweq> hmm, I didn't see this one before
16:09:09 <mlavalle> 2) I will test locally the creation of the router
16:09:26 <mlavalle> 3) add some log debug statements and test
16:09:36 <mlavalle> agree?
16:09:41 <slaweq> mlavalle: You are talking about debugging the trunk issue, right?
16:09:47 <mlavalle> right
16:09:57 <slaweq> ok, that is fine for me
16:10:01 <mlavalle> but at this point, it is not a trunk issue
16:10:07 <slaweq> #action mlavalle will continue debugging trunk tests failures in multinode dvr env
16:10:10 <mlavalle> it is a router / L3 agent issue
16:10:16 <mlavalle> slaweq: ^^^^^
16:10:24 <mlavalle> and that spills over to:
16:10:38 <mlavalle> 1) potentially the messages you are seeing in the logs
16:11:08 <mlavalle> 2) the other ssh timeouts that you gave me as homework last week
16:11:53 <mlavalle> makes sense?
16:12:01 <slaweq> yep
16:12:21 <mlavalle> ok
16:12:28 <slaweq> mlavalle: so do You think that we should report separate bugs for those issues from logs?
16:12:40 <slaweq> I think so but what's Your opinion? :)
16:12:47 <mlavalle> yes
16:12:49 <slaweq> k
16:12:59 <slaweq> haleyb: will You report them?
16:13:02 <mlavalle> and I'll point to them in the bug that I am working on
16:13:16 <haleyb> sure, will report them
16:13:19 <mlavalle> so if I find a relationship, I keep the connection
16:13:21 <slaweq> #action haleyb to report bugs about recent errors in L3 agent logs
16:13:27 <slaweq> thx mlavalle and haleyb
16:14:13 <slaweq> ok, lets move on
16:14:20 <slaweq> next one: slaweq to continue debugging bug 1798475
16:14:21 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,In progress] https://launchpad.net/bugs/1798475 - Assigned to LIU Yulong (dragon889)
16:14:43 <slaweq> I asked L3 experts for help and liuyulong jumped in. Patch proposed: https://review.openstack.org/#/c/625054/
16:15:33 <mlavalle> ok
16:15:36 <mlavalle> great!
16:16:09 <mlavalle> it is still WIP
16:16:16 <slaweq> so as liuyulong is working on it, I think we are in good hands with this one :)
16:16:29 <mlavalle> yeap
16:17:27 <slaweq> ok, lets move on then
16:17:31 <slaweq> slaweq to continue fixing functional-py3 tests
16:17:59 <slaweq> I limited the output from the functional job by disabling warnings and some logging to stdout
16:18:04 <slaweq> patches for that are proposed:
16:18:09 <slaweq> https://review.openstack.org/#/c/625555/
16:18:11 <slaweq> https://review.openstack.org/#/c/625569/
16:18:13 <slaweq> https://review.openstack.org/#/c/625704/
16:18:15 <slaweq> https://review.openstack.org/#/c/625571/
16:18:33 <slaweq> with those patches this functional job running on python3 should be (almost) good
16:18:56 <slaweq> almost because I noticed that also 3 tests related to the SIGHUP signal are failing: http://logs.openstack.org/83/577383/19/check/neutron-functional/6470d68/logs/testr_results.html.gz
16:19:24 <slaweq> bcafarel: can You take a look at them and check if that is related to this issue with handling SIGHUP which we already have reported somewhere?
16:19:30 <slaweq> or maybe it's some different issue
16:21:06 <slaweq> ok, I think that bcafarel is not here now
16:21:15 <slaweq> I will ping him tomorrow about that issue
16:21:32 <slaweq> #action slaweq to talk with bcafarel about SIGHUP issue in functional py3 tests
16:21:44 <slaweq> next one:
16:21:46 <slaweq> hongbin to report and check failing neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_gateway_ip_changed test
16:21:53 <bcafarel> slaweq: o/ sorry was AFK, I can take a look tomorrow (hopefully)
16:22:05 <slaweq> bcafarel: no problem, thx a lot
16:22:26 <hongbin> o/
16:22:39 <hongbin> there is a proposed patch for that
16:22:47 <hongbin> #link https://review.openstack.org/#/c/625359/
16:23:31 <hongbin> IMO, we can merge the patch and see if it is able to resolve the error
16:23:45 <slaweq> thx hongbin, I will take a look at it tomorrow
16:24:00 <hongbin> slaweq: thanks
16:24:34 <slaweq> ok, lets move to the next one then
16:24:35 <slaweq> slaweq to switch neutron-tempest-iptables_hybrid job to non-voting if it keeps failing a lot because of bug 1807949
16:24:36 <openstack> bug 1807949 in os-vif "os_vif error: [Errno 24] Too many open files" [High,Fix released] https://launchpad.net/bugs/1807949 - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:24:53 <slaweq> I did: patch https://review.openstack.org/#/c/624489/ is merged already
16:25:05 <slaweq> and a revert is already proposed: https://review.openstack.org/#/c/625519/ - the proper fix is already in os_vif
16:25:11 <slaweq> so please review this revert :)
16:25:32 <slaweq> and thx ralonsoh for the proper fix in os_vif :)
16:26:20 <slaweq> next one:
16:26:21 <slaweq> slaweq to mark db migration tests as unstable for now
16:26:26 <slaweq> Patch https://review.openstack.org/#/c/624685/ - merged
16:27:25 <slaweq> and I recently found out that there was a mistake in this patch, so there is another one: https://review.openstack.org/#/c/625556/ and this is also merged
16:27:40 <slaweq> I hope the functional tests failure ratio will be better now
16:28:00 <slaweq> and that was all on my list from last week
16:28:23 <slaweq> anything else You want to ask/talk about from the previous week?
16:29:10 <slaweq> ok, lets move to the next topic then
16:29:14 <slaweq> #topic Python 3
16:29:28 <slaweq> I have 2 things related to python 3
16:29:42 <slaweq> 1. Rally job switch to python3: https://review.openstack.org/#/c/624358/ - please review it
16:29:55 <slaweq> it required a fix on the rally side and that is merged already
16:30:06 <slaweq> so we should be good to switch this job to python3
16:30:45 <slaweq> 2. Some info: As per gmann's comment in https://review.openstack.org/#/c/624360/3 - we will not get rid of the tempest-full python 2.7 job for now.
16:32:07 <slaweq> that's all from my side about python3 CI jobs
16:32:17 <slaweq> anything else You want to add?
16:32:48 <mlavalle> nope, thanks for the update
16:32:58 <slaweq> ok
16:33:06 <slaweq> #topic Grafana
16:33:13 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:34:41 <slaweq> do You see anything worth discussing now?
16:35:22 <slaweq> I think that all is more or less under control now - we still have some issues which we are aware of but nothing new and nothing which would cause very high failure rates
16:35:51 <slaweq> the neutron-tempest-iptables_hybrid job is clearly back in good shape after ralonsoh's fix for os_vif
16:35:54 <mlavalle> it looks overall good
16:36:07 <slaweq> and neutron-tempest-plugin-dvr-multinode-scenario is back to 100% failures
16:36:21 <mlavalle> right, I was about to mention that
16:36:22 <slaweq> but this was already discussed before and we know why it happens like that
16:36:46 <mlavalle> but keep in mind that the work I am doing with the trunk tests addresses that
16:36:52 <mlavalle> at least partially
16:37:17 <slaweq> fullstack tests are mostly failing because of those issues which hongbin and liuyulong are working on
16:37:30 <mlavalle> That's what I figured
16:38:23 <slaweq> only the quite high failure rate for functional tests worries me a bit
16:38:32 <slaweq> I'm looking for some recent examples
16:40:15 <slaweq> I found something like this for example: http://logs.openstack.org/00/612400/16/check/neutron-functional/3e8729c/logs/testr_results.html.gz
16:41:10 <slaweq> and I see such an error for the first time
16:43:24 <slaweq> other examples which I found look like they are related to the patches they were running on
16:43:33 <slaweq> so maybe there is no new issue with those tests
16:43:42 <slaweq> lets just monitor it for the next days :)
16:43:48 <slaweq> what do You think?
16:44:43 <mlavalle> yes
16:44:48 <mlavalle> let's monitor it
16:45:01 <slaweq> :)
16:45:05 <slaweq> ok, lets move on
16:45:09 <slaweq> #topic Periodic
16:45:42 <slaweq> I just wanted to mention that thanks to mriedem our neutron-tempest-postgres-full is good again :)
16:45:46 <slaweq> thx mriedem
16:45:59 <slaweq> and except for that all other periodic jobs look good now
16:46:24 <slaweq> ok, so last topic for today
16:46:26 <slaweq> #topic Open discussion
16:46:39 <mriedem> \o/
16:46:59 <slaweq> first of all I want to mention that the next 2 meetings will be cancelled as they fall during Christmas and New Year
16:47:08 <slaweq> I will send an email about that today too
16:47:11 <mlavalle> no meetings on the 25th and the 1st, right?
16:47:17 <slaweq> mlavalle: right
16:47:32 <mlavalle> I don't have anything else
16:47:41 <slaweq> and the second thing which I want to raise here is a patch
16:47:47 <slaweq> #link https://review.openstack.org/573933
16:47:55 <slaweq> it has been waiting for a very long time for review
16:48:06 <slaweq> I have already mentioned it here a few times
16:48:11 <slaweq> please take a look at it
16:48:13 <mlavalle> I'll comment today
16:48:19 <slaweq> thx mlavalle
16:48:25 <mlavalle> as I've said before, I'm not a fan of it
16:48:44 <slaweq> personally I don't think we should merge it but I don't want to block it if others think that it's good
16:49:06 <slaweq> ok, that's all from me for today
16:49:17 <slaweq> do You have anything else You want to talk about?
16:49:21 <mlavalle> nope
16:49:47 <slaweq> if not, then I will give You back 10 minutes
16:49:54 <mlavalle> thanks
16:50:12 <slaweq> have great holidays and a Happy New Year!
16:50:17 <mlavalle> you too!
16:50:18 <bcafarel> the same :)
16:50:25 <slaweq> Happy CI New Year even :)
16:50:29 <mlavalle> although I still expect to see you in the Neutron channel
16:50:37 <slaweq> and see You all in January on meetings
16:50:50 <slaweq> mlavalle: yes, I will be available until this Friday for sure
16:51:04 <slaweq> next week maybe if I will need to rest from family a bit :P
16:51:06 <mlavalle> :-)
16:51:18 <slaweq> #endmeeting