15:00:13 <slaweq> #startmeeting neutron_ci
15:00:14 <openstack> Meeting started Wed Feb  5 15:00:13 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:15 <slaweq> hi
15:00:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:18 <openstack> The meeting name has been set to 'neutron_ci'
15:00:19 <njohnston> o/
15:01:13 <slaweq> ralonsoh: bcafarel haleyb: CI meeting, are You around?
15:01:20 <ralonsoh> hi
15:01:24 <bcafarel> o/
15:01:34 <ralonsoh> I was waiting in the wrong channel
15:01:35 <bcafarel> slaweq: thanks for the ping I was looking for correct window :)
15:01:40 <haleyb> slaweq: i'm in another meeting too, have one eye here :)
15:01:46 <slaweq> :)
15:01:49 <slaweq> ok, lets start
15:01:51 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:03 <slaweq> please open it and we can move on
15:02:05 <slaweq> #topic Actions from previous meetings
15:02:10 <slaweq> slaweq to talk with gmann about vpnaas jobs on rocky
15:02:18 <slaweq> tbh I forgot about it :/
15:02:47 <gmann> slaweq: i sent a summary of what we discussed with amotoki to the ML
15:02:49 <slaweq> maybe gmann is around now so we can ask him how to fix this issue with vpnaas rocky tempest jobs
15:03:02 <slaweq> gmann: ok, so I need to find this email then :)
15:03:42 <gmann> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012241.html
15:03:45 <gmann> 4th point
15:04:01 <gmann> slaweq: ping me once you are done then we can discuss further
15:07:00 <slaweq> gmann: so basically we should backport to rocky https://review.opendev.org/#/c/695834/
15:07:10 <slaweq> or at least "partially" backport it
15:07:12 <slaweq> ?
15:07:41 <gmann> slaweq: yeah that will be good for long-term maintenance, given how the py2 EOL things are happening
15:08:27 <slaweq> ok, and this will not be a problem if we keep those tests in-tree but actually run them from the neutron-tempest-plugin repo?
15:08:56 <gmann> i am fixing stable branches to use stable u-c in the tempest tox run, which solves the issue, but backporting is what i will suggest in case of another issue
15:09:03 <gmann> true
15:09:16 <slaweq> ok, thx for explanation
15:09:18 <gmann> fixing current stable branches summary - http://lists.openstack.org/pipermail/openstack-discuss/2020-February/012371.html
15:09:36 <slaweq> #action slaweq to backport https://review.opendev.org/#/c/695834/ to stable branches in neutron-vpnaas
15:09:44 <gmann> FYI, all stable branches back to rocky are broken now
15:10:14 <bcafarel> sigh
15:10:23 <slaweq> :/
15:11:19 <slaweq> ok, lets move on
15:11:24 <slaweq> next action was
15:11:26 <slaweq> slaweq to update grafana dashboard with missing jobs
15:11:30 <slaweq> and I also forgot about it :/
15:11:34 <slaweq> #action slaweq to update grafana dashboard with missing jobs
15:11:41 <slaweq> I will do it this week
15:12:14 <slaweq> any questions/comments on this topic?
15:12:44 <njohnston> nope
15:13:02 <ralonsoh> no
15:13:07 <slaweq> so we can move on to the next topic
15:13:08 <slaweq> #topic Stadium projects
15:13:21 <slaweq> migration to zuulv3
15:13:23 <slaweq> https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:13:40 <slaweq> I was checking this etherpad a few days ago, and I even sent some small patches related to it
15:13:47 <slaweq> (but still needs some work)
15:14:41 <slaweq> and generally we are pretty good there
15:14:57 <slaweq> most of the legacy jobs have got patches already in review
15:14:57 <njohnston> +1
15:15:18 <slaweq> huge thx to bcafarel for sending many related patches :)
15:15:38 <bcafarel> np, some of them are still not working properly
15:16:02 <bcafarel> slaweq: as you know neutron-functional well, could you take a look at https://review.opendev.org/#/c/703601/ if you have some time
15:16:13 <bcafarel> I can't seem to convince it to install/find neutron :(
15:16:24 <bcafarel> (nothing urgent of course)
15:16:39 <slaweq> bcafarel: ok, I will take a look
15:17:39 <bcafarel> thanks :)
15:17:54 <slaweq> np
15:18:00 <slaweq> anything else related to the stadium projects' ci?
15:18:25 <njohnston> nope
15:18:49 <ralonsoh> no
15:19:27 <slaweq> ok, lets move on then
15:19:55 <slaweq> #topic Grafana
15:19:57 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:21:21 <slaweq> first of all, our gate jobs have not been running for a few days
15:21:44 <slaweq> and it's mostly due to the broken neutron-ovn-tempest-ovs-release job
15:22:07 <slaweq> the second issue is that we have a gap in data during last weekend and the beginning of this week
15:22:13 <slaweq> but that's probably infra issue
15:23:21 <njohnston> Is neutron-ovn-tempest-ovs-release one of the missing jobs that needs to be updated in grafana?  I can't find it.
15:23:30 <slaweq> njohnston: yes
15:23:33 <slaweq> sorry for that :/
15:23:33 <njohnston> ok
15:23:44 <njohnston> np :-)
15:24:50 <slaweq> also it seems we had some problem yesterday, as many jobs got high numbers there
15:25:05 <slaweq> but I don't know about any specific issue from yesterday
15:26:55 <slaweq> but this spike can also be due to the low number of jobs run (or data stored) the day before
15:28:04 <slaweq> other than that I don't see anything really wrong
15:29:11 <slaweq> ok, lets move on then
15:29:13 <slaweq> #topic fullstack/functional
15:29:28 <slaweq> I have a couple of issues in fullstack tests for today
15:29:43 <slaweq> Error when connecting to the placement service (same as last week):
15:29:50 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f66/703143/3/check/neutron-fullstack/f667c93/controller/logs/dsvm-fullstack-logs/TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement_NIC-Switch-agent_/neutron-server--2020-02-04--12-09-12-759753_log.txt
15:30:03 <slaweq> maybe lajoskatona or rubasov could take a look at it
15:30:57 <slaweq> I pinged rubasov to join this meeting
15:31:05 <slaweq> maybe he will join soon
15:31:48 <rubasov> hi
15:31:54 <slaweq> hi rubasov
15:32:19 <slaweq> recently we spotted few times issue like in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f66/703143/3/check/neutron-fullstack/f667c93/controller/logs/dsvm-fullstack-logs/TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement_NIC-Switch-agent_/neutron-server--2020-02-04--12-09-12-759753_log.txt
15:32:38 <slaweq> neutron-server can't connect to (fake) placement service
15:32:51 <slaweq> did You maybe see something like that before?
15:33:07 <slaweq> or do You know why it could happen?
15:33:28 <rubasov> did not see this before
15:35:04 <slaweq> rubasov: can You try to take a look into that?
15:35:12 <rubasov> I don't really have ideas right now
15:35:17 <slaweq> not today of course :) but if You have some time
15:35:41 <rubasov> sure we'll look into it with lajoskatona
15:35:55 <rubasov> he wrote that fake placement service originally IIRC
15:36:05 <rubasov> how frequent is this?
15:36:39 <slaweq> it's not very frequent, I saw it once per week or something like that
15:37:07 <slaweq> rubasov: I will open LP to track it
15:37:09 <rubasov> okay, I'll put it on my todo list
15:37:29 <slaweq> and I will send it to You and Lajos - maybe You will have some time to take a look
15:38:08 <rubasov> that's even better, thank you
15:38:15 <slaweq> #action slaweq to open LP related to fullstack placement issue
15:38:19 <slaweq> thx rubasov
15:38:32 <slaweq> ok, another issue
15:38:35 <slaweq> in https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_/neutron-server--2020-02-05--11-00-19-067345_log.txt
15:38:37 <rubasov> thanks
15:38:52 <slaweq> it seems to me like neutron-server just hung and that caused the test timeout
15:39:38 <slaweq> actually no, it was also a connection issue: https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_.txt
15:39:42 <slaweq> this time to neutron-server
15:40:37 <njohnston> well, if the neutron server crashed hard then the log would end abruptly like that, and ECONNREFUSED is what clients would see
15:41:05 <njohnston> if it was a hang then the client would have timeouts
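(Aside, for context: a minimal Python sketch of the distinction njohnston describes, assuming you probe the server's API port from a client; the host and port below are illustrative, not the fullstack test values. A crashed server refuses the connection outright, while a hung one typically accepts it but never answers the request.)

    import http.client
    import socket

    # Hypothetical probe, not part of the fullstack framework.
    def probe(host="127.0.0.1", port=9696, timeout=5):
        """Crashed server -> ECONNREFUSED; hung server -> request times out."""
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("GET", "/")
            conn.getresponse()
            return "server answered: process is up and responsive"
        except ConnectionRefusedError:
            return "ECONNREFUSED: nothing listening, consistent with a crash/stop"
        except socket.timeout:
            return "timed out waiting: consistent with a hung process"
        finally:
            conn.close()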
15:41:55 <slaweq> njohnston: but at the end of the logs You can see
15:41:57 <slaweq> 2020-02-05 11:01:19.086 22341 DEBUG neutron.agent.linux.utils [-] Running command: ['kill', '-15', '3180'] create_process /home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/utils.py:87
15:41:59 <slaweq> 2020-02-05 11:01:19.280 22341 DEBUG neutron.tests.fullstack.resources.process [-] Process stopped: neutron-server stop /home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/resources/process.py:85
15:42:08 <slaweq> so it seems that neutron-server was properly stopped at the end
15:42:21 <slaweq> if it had crashed earlier, wouldn't this be an error?
15:42:51 <ralonsoh> I think so...
15:43:03 <njohnston> which log is that in?  I don't see it in https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_/neutron-server--2020-02-05--11-00-19-067345_log.txt
15:43:40 <slaweq> njohnston: it's in https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_.txt
15:43:50 <slaweq> this is "test log"
15:44:30 <njohnston> ok
15:46:00 <ralonsoh> slaweq, do you have a bug for this one?
15:46:05 <ralonsoh> I can review it later
15:46:14 <slaweq> ralonsoh: nope, I saw it only once so far, so I didn't open a bug for it
15:46:17 <slaweq> but I can
15:46:24 <ralonsoh> (at least this is not a QoS error)
15:47:00 <slaweq> I will ping You when I open the LP for that
15:47:09 <ralonsoh> thanks for the "present"
15:47:22 <slaweq> #action slaweq to open LP related to "hang" neutron-server
15:47:27 <slaweq> ralonsoh: yw :D
15:47:49 <slaweq> and that's all I have for today for functional/fullstack jobs
15:48:00 <slaweq> anything else You have maybe?
15:48:31 <ralonsoh> slaweq, https://review.opendev.org/#/c/705760/ is almost merged
15:48:51 <ralonsoh> I'll abandon https://review.opendev.org/#/c/705903/
15:49:27 <slaweq> ralonsoh: great
15:49:28 <bcafarel> so, recheck time once 705760 is in?
15:49:43 <slaweq> that should unblock our gate hopefully
15:50:12 <njohnston> excellent
15:51:11 <slaweq> ok, so let's move on to scenario/tempest tests now
15:51:24 <slaweq> #topic Tempest/Scenario
15:51:57 <slaweq> we already mentioned the broken neutron-ovn-tempest-ovs-release job, which should be fixed with 705760
15:53:38 <slaweq> among other issues, I have one with test_show_network_segment_range https://9f9aee74b45263b2a9d8-795792c1f104e79962e44448ab55e3f1.ssl.cf1.rackcdn.com/681466/2/check/neutron-tempest-plugin-api/b340c8e/testr_results.html
15:53:49 <slaweq> and I think I saw something similar a couple of times already
15:54:05 <slaweq> I'm not sure if that was always the same test but similar issue for sure
15:54:09 <ralonsoh> again?
15:54:16 <ralonsoh> it is the same one
15:54:28 <bcafarel> KeyError on project_id??
15:54:38 <ralonsoh> I really don't understand why this specific key is not present
15:54:48 <ralonsoh> and this is not some trivial one, it's project_id
15:55:31 <slaweq> yes, I also don't understand it
15:55:56 <slaweq> wait, actually this one was from 21.01
15:56:01 <slaweq> so maybe it's an old issue
15:56:37 <slaweq> sorry for the noise then :)
15:57:12 <ralonsoh> no, but this test error is recurrent
15:57:40 <slaweq> yes, and actually I don't understand it exactly
15:58:46 <slaweq> if You look at the code: https://github.com/openstack/neutron-tempest-plugin/blob/master/neutron_tempest_plugin/api/admin/test_network_segment_range.py#L201
15:58:59 <slaweq> it failed after checking "id", "name" and other attributes
15:59:04 <slaweq> so it's not like the dict is empty
15:59:13 <ralonsoh> one question
15:59:14 <slaweq> there is "only" project_id missing from it
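(Aside: a hypothetical sketch, not the actual neutron-tempest-plugin test, of why the failure surfaces only at project_id: when the expected fields are compared one by one against the API response, every key before the missing one passes, and the lookup of the absent 'project_id' is the first thing to raise KeyError.)

    # Illustrative only; field names mirror the ones mentioned above.
    def check_segment_range(observed, expected):
        for key in ("id", "name", "project_id"):
            # Raises KeyError as soon as a key is missing from the response.
            assert observed[key] == expected[key]

    check_segment_range(
        {"id": "uuid-1", "name": "range-1"},                          # no project_id in the response
        {"id": "uuid-1", "name": "range-1", "project_id": "proj-1"},
    )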
15:59:21 <ralonsoh> this test is using neutron-client
15:59:28 <ralonsoh> not os-client
15:59:32 <ralonsoh> is that correct?
15:59:59 <slaweq> idk
16:00:10 <slaweq> but IMO those tests are using tempest clients, no?
16:01:02 <ralonsoh> Ok, I'll check it
16:01:09 <slaweq> thx ralonsoh
16:01:18 <slaweq> #action ralonsoh  to check missing project_id issue
16:01:20 <ralonsoh> we ran out of time
16:01:26 <slaweq> ok we are out of time today
16:01:31 <slaweq> thx for attending
16:01:33 <slaweq> o/
16:01:34 <ralonsoh> bye
16:01:35 <njohnston> \o
16:01:35 <slaweq> #endmeeting