16:00:37 <slaweq> #startmeeting neutron_ci
16:00:39 <slaweq> hi
16:00:39 <openstack> Meeting started Tue Oct 29 16:00:37 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:40 <ralonsoh> hi
16:00:41 <njohnston> o/
16:00:42 <openstack> The meeting name has been set to 'neutron_ci'
16:01:04 <bcafarel> o/
16:01:50 <slaweq> ok, we already have the usual participants so we can start
16:02:05 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:10 <slaweq> please open it now
16:04:35 <slaweq> sorry, I had to take care of my daughter
16:04:40 <slaweq> but now I'm back
16:04:50 <slaweq> #topic Actions from previous meetings
16:05:02 <slaweq> first one
16:05:04 <slaweq> ralonsoh to check if there is any possibility to do something like ovsdb-monitor for openflows
16:05:19 <ralonsoh> one sec
16:05:26 <ralonsoh> #link https://review.opendev.org/#/c/689150/
16:05:39 <ralonsoh> this is a patch to implement an OF monitor
16:05:49 <ralonsoh> then, if approved, we can use it in UT/FT testing
16:06:18 <slaweq> and You want to use it only in the tests?
16:06:31 <ralonsoh> for now I don't see any other use
16:06:50 <ralonsoh> we raised this bug just for debugging tests
16:06:58 <ralonsoh> but of course, it could be used anywhere
16:07:38 <slaweq> yes, but I wonder if we couldn't simply run ovs-ofctl monitor as an external service and log its output to some file
16:07:51 <slaweq> similarly to how dstat works in CI jobs now
16:07:59 <slaweq> wouldn't that be enough?
16:08:37 <ralonsoh> well, you need a bit of logic there to clean the output
16:08:43 <ralonsoh> but sure, this can be done
16:09:07 <ralonsoh> the goal of this class is to be usable anywhere you want to track the OF changes
16:09:42 <slaweq> ok, this monitor is run "per bridge" so it would probably be better to run it "per test" or "per test class"
16:09:53 <ralonsoh> yes, only per bridge
16:09:58 <ralonsoh> cmd = ['ovs-ofctl', 'monitor', bridge_name, 'watch:', '--monitor']
16:10:02 <slaweq> so python is better for that, right :)
16:10:07 <ralonsoh> I think so
16:10:31 <slaweq> thx ralonsoh, I will add this patch to my review queue
16:10:46 <ralonsoh> thanks
16:11:49 <slaweq> thx for working on this, it may help us a lot with debugging some failed tests in the future
16:12:07 <slaweq> ok, next action
16:12:11 <slaweq> ralonsoh to investigate failed fullstack tests for dhcp agent rescheduling
16:12:20 <ralonsoh> as commented in https://bugs.launchpad.net/neutron/+bug/1799555/comments/23
16:12:20 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Fix committed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:12:33 <ralonsoh> more time spent reviewing logs than implementing the patch
16:12:40 <ralonsoh> just a timeout issue
16:12:46 <ralonsoh> #link https://review.opendev.org/#/c/689550/
16:13:00 <ralonsoh> that's all
16:13:07 <slaweq> yeah, I saw Your comment and patch, which looks fine to me
16:13:11 <ralonsoh> cool
16:13:26 <slaweq> njohnston: bcafarel: please also review it if You have some time :)
16:13:56 <njohnston> will do
16:14:02 <slaweq> njohnston: thx
16:14:05 <slaweq> ok, next one
16:14:07 <slaweq> slaweq to investigate failed neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs
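
For reference, a minimal sketch of the "external service" idea slaweq floated at 16:07: run ovs-ofctl monitor per bridge as a child process (using exactly the command ralonsoh quoted at 16:09:58) and append its output to a log file, the way dstat output is captured in CI jobs today. The bridge name and log path below are illustrative, not from the patch under review.

    import subprocess

    def start_of_monitor(bridge_name, log_path):
        """Start an OpenFlow monitor for one bridge, appending to log_path."""
        log_file = open(log_path, 'a', buffering=1)  # line-buffered text file
        # The exact command quoted in the discussion above.
        cmd = ['ovs-ofctl', 'monitor', bridge_name, 'watch:', '--monitor']
        return subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)

    # Usage: start before exercising the bridge, terminate afterwards.
    proc = start_of_monitor('br-int', '/tmp/br-int-flows.log')
    # ... run tests ...
    proc.terminate()
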
16:14:10 <bcafarel> sure thing, added to the list :)
16:14:23 <slaweq> unfortunately I didn't have time for this
16:14:41 <slaweq> I will try to do it this (or next) week, maybe somewhere at the airport/on the airplane
16:14:46 <slaweq> #action slaweq to investigate failed neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs
16:15:06 <slaweq> bcafarel: also thx :)
16:15:09 <slaweq> ok, next one
16:15:11 <slaweq> slaweq to check strange "EVENT OVSNeutronAgentOSKenApp->ofctl_service GetDatapathRequest send_event" log lines in neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking
16:15:20 <slaweq> for this one I also didn't have time
16:15:26 <slaweq> sorry for that
16:15:30 <slaweq> #action slaweq to check strange "EVENT OVSNeutronAgentOSKenApp->ofctl_service GetDatapathRequest send_event" log lines in neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking
16:16:35 <slaweq> next one
16:16:37 <slaweq> ralonsoh to try to log flows at the end of failed functional tests
16:16:39 <ralonsoh> yes
16:16:45 <ralonsoh> once the previous patch is approved
16:16:55 <ralonsoh> #link https://review.opendev.org/#/c/689150/10
16:17:04 <ralonsoh> then we can use it for the tests
16:17:06 <ralonsoh> one question
16:17:25 <ralonsoh> this can be used during the test, logging the OF changes
16:17:42 <ralonsoh> or, just when the test fails, we can print the OF changes in the log
16:17:47 <ralonsoh> thoughts?
16:18:04 <slaweq> I think that logging during the test is better for debugging
16:18:07 <ralonsoh> perfect
16:18:16 <slaweq> as then You can potentially find some race conditions
16:18:30 <ralonsoh> easier, just make use of the class and, when needed, log the OF changes
16:18:55 <ralonsoh> that's all
16:19:04 <slaweq> ok, thx ralonsoh
16:19:12 <slaweq> so You will take care of this, right?
16:19:15 <ralonsoh> sure
16:19:28 <slaweq> can You maybe open a launchpad bug for it? just for tracking purposes
16:19:33 <ralonsoh> yes, of course
16:19:36 <slaweq> thx
16:20:14 <slaweq> #action ralonsoh to open LP about adding OF monitor to functional tests
16:20:29 <slaweq> ok, I think we can move on to the next topic then
16:20:31 <slaweq> #topic Stadium projects
16:20:47 <slaweq> regarding the tempest-plugin migration I don't have any updates
16:21:06 <slaweq> do You have anything related to stadium projects?
16:21:22 <njohnston> should we start tracking py2 support removal and zuul v3 job conversion goals in etherpads (either new or reused)?
16:21:43 <njohnston> Or should we wait until after Shanghai for those to be finalized?
16:22:12 <slaweq> I think we can start doing that, based on what was announced recently by gmann
16:22:37 <slaweq> but even if we want to start it "now" it will effectively be after the ptg :)
16:22:44 <njohnston> true, true :-)
16:23:07 <njohnston> ok so maybe let's start talking about it in the meeting after the PTG
16:23:07 <bcafarel> unless everything is completed by the time we fly back :)
16:23:10 <slaweq> njohnston: will You maybe prepare such an etherpad to track progress?
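
Returning to the OF monitor thread from 16:17, a sketch of the "per test" usage, assuming a subprocess-based monitor like the one sketched earlier (start_of_monitor is that hypothetical helper). ralonsoh's actual class lives in https://review.opendev.org/#/c/689150/ and will likely differ; this only illustrates the fixture pattern neutron's functional tests already use.

    import fixtures

    class OFMonitorFixture(fixtures.Fixture):
        """Log OpenFlow changes for one bridge for the lifetime of a test."""

        def __init__(self, bridge_name, log_path):
            self.bridge_name = bridge_name
            self.log_path = log_path

        def _setUp(self):
            # start_of_monitor() is the hypothetical helper sketched earlier.
            self.proc = start_of_monitor(self.bridge_name, self.log_path)
            # Stop the monitor when the test (or test class) is torn down.
            self.addCleanup(self.proc.terminate)

    # In a test: self.useFixture(OFMonitorFixture(bridge.br_name, log_path))
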
16:23:38 <slaweq> bcafarel: yeah :)
16:23:46 <njohnston> #action njohnston prepare etherpad to track stadium progress for zuul v3 job definition and py2 support drop
16:23:52 <slaweq> thx njohnston
16:24:54 <slaweq> ok, let's move on then
16:24:56 <slaweq> #topic Grafana
16:25:16 <slaweq> I have only one thing to say about grafana
16:25:26 <slaweq> infra had an issue with disk space for a few days
16:25:35 <slaweq> so we are missing data from the last few days
16:25:44 <slaweq> it started working again yesterday evening
16:26:41 <slaweq> our numbers there look fine now, but in fact we don't have much recent data to check
16:26:43 <njohnston> yeah, tough to make judgements
16:27:06 <slaweq> exactly
16:27:22 <slaweq> I see one thing: we should remove py27 jobs from it as we removed them from the queues
16:27:27 <slaweq> I will do it today
16:27:40 <slaweq> #action slaweq to send patch to remove py27 jobs from grafana
16:27:59 <njohnston> +1
16:28:27 <slaweq> anything else You want to add/ask about grafana?
16:29:00 <njohnston> nope
16:29:08 <ralonsoh> no
16:29:17 <bcafarel> let's just wait for it to fill up again :)
16:29:27 <slaweq> ok, so let's move on then
16:29:33 <slaweq> #topic fullstack/functional
16:29:45 <slaweq> I found a few new issues for You :)
16:30:11 <slaweq> the first one is something which we already saw in the past
16:30:13 <slaweq> the problem with the "AttributeError: 'str' object has no attribute 'content_type'" error again, see
16:30:18 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_12f/690908/1/check/neutron-functional/12f8c37/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_get_filter_id_for_ip.txt.gz
16:30:38 <ralonsoh> this one is very curious
16:30:50 <ralonsoh> because this is happening when the test case result is returned
16:31:08 <njohnston> weird
16:31:22 <ralonsoh> IMO, this is something related to testtools (but I can't confirm)
16:32:02 <ralonsoh> btw, I don't know if the test case failed or not (that is a good indicator of what is in the content)
16:32:16 <slaweq> yeah, it is strange
16:32:21 <slaweq> but if You look at http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'str'%20object%20has%20no%20attribute%20'content_type'%5C%22
16:32:33 <slaweq> it seems that this happens only in neutron functional/fullstack jobs :/
16:32:51 <ralonsoh> ok, so we have a pattern, pointing to us....
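
One way this error can arise (a hypothesis, not a confirmed root cause): testtools expects every test "detail" to be a Content object and reads its content_type attribute while serializing the result, which would match ralonsoh's observation that the failure happens when the result is returned. A bare str sneaking into the details dict fails exactly like the trace above:

    import testtools
    from testtools.content import text_content

    class Example(testtools.TestCase):
        def test_details(self):
            # Wrong: a plain str has no .content_type, so the AttributeError
            # is raised later, while testtools reports the result, not here:
            # self.addDetail('agent-log', 'captured output')

            # Right: wrap strings in a Content object:
            self.addDetail('agent-log', text_content('captured output'))
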
16:33:14 <slaweq> yep
16:33:26 <slaweq> I will open an LP for this one
16:33:32 <slaweq> as it happens from time to time
16:33:40 <slaweq> maybe we will find a fix for it
16:34:02 <slaweq> or some volunteer to work on it then :)
16:34:17 <ralonsoh> I can take a look
16:34:29 <slaweq> #action slaweq to open LP related to "AttributeError: 'str' object has no attribute 'content_type'" error
16:34:45 <slaweq> ralonsoh: thx, I will ping You with a link to the LP when I open it
16:34:49 <ralonsoh> thanks
16:34:59 <slaweq> if You have time, it would be great if You could check it
16:35:01 <slaweq> thx a lot
16:36:11 <slaweq> the next failure which I found is
16:36:13 <slaweq> https://0c68218832dc7cac70c7-9752c849fa19cb3b4ae0f2b2e19d3d65.ssl.cf2.rackcdn.com/691710/2/check/neutron-functional/15b9600/testr_results.html.gz
16:36:33 <slaweq> but in this case I don't see anything obvious in the test's log
16:36:35 <slaweq> https://0c68218832dc7cac70c7-9752c849fa19cb3b4ae0f2b2e19d3d65.ssl.cf2.rackcdn.com/691710/2/check/neutron-functional/15b9600/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.test_dhcp_agent.DHCPAgentOVSTestCase.test_good_address_allocation.txt.gz
16:37:15 <ralonsoh> slaweq, maybe
16:37:21 <ralonsoh> in assert_good_allocation_for_port
16:37:42 <ralonsoh> what we need to do is to retrieve the interface name and run dhclient in the wait loop
16:37:52 <ralonsoh> not only check the ip list
16:38:44 <slaweq> but when You run dhclient it will wait for the dhcp reply and retry a couple of times, no?
16:38:55 <slaweq> so it should already be like a "wait loop"
16:39:33 <ralonsoh> I can take a closer look at those logs tomorrow
16:41:04 <slaweq> thx ralonsoh, but please don't waste too much of Your time on it, I have only seen it in the gate once so far
16:41:11 <ralonsoh> ok
16:41:27 <slaweq> ok, let's move on
16:41:31 <slaweq> fullstack tests now
16:41:46 <slaweq> I found one new bug https://bugs.launchpad.net/neutron/+bug/1850292
16:41:46 <openstack> Launchpad bug 1850292 in neutron "Fullstack test can try to use IP address from outside the subnet's allocation pool" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:41:55 <slaweq> I already have an almost working patch for it
16:43:00 <slaweq> but it doesn't seem like something very urgent to me as it doesn't hit that often
16:43:25 <slaweq> that's all about functional/fullstack on my side
16:43:34 <slaweq> do You have anything else to add/ask?
16:43:59 <ralonsoh> no
16:44:13 <slaweq> ok, let's move on then
16:44:19 <slaweq> #topic Tempest/Scenario
16:44:20 <njohnston> not on this topic
16:44:35 <slaweq> njohnston: not on which topic?
16:44:56 <njohnston> slaweq: was responding to "do You have anything else to add/ask?" just slowly
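
For the DHCP allocation discussion at 16:37, a sketch of the retry idea ralonsoh described: re-trigger the DHCP client inside the wait loop instead of only polling the address list. run_dhclient() is a hypothetical helper and the real assert_good_allocation_for_port differs; wait_until_true and IPDevice are the existing neutron utilities.

    from neutron.agent.linux import ip_lib
    from neutron.common import utils

    def assert_good_allocation_for_port(namespace, port_name, expected_ip):
        device = ip_lib.IPDevice(port_name, namespace=namespace)

        def _address_assigned():
            # Kick the DHCP client again on each poll (hypothetical helper),
            # then check whether the expected address showed up.
            run_dhclient(namespace, port_name)
            cidrs = [addr['cidr'] for addr in device.addr.list()]
            return any(cidr.startswith(expected_ip) for cidr in cidrs)

        utils.wait_until_true(_address_assigned, timeout=60, sleep=5)
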
16:45:02 <slaweq> ahh, ok
16:45:03 <slaweq> :)
16:45:19 <slaweq> ok, going back to scenario tests
16:45:28 <slaweq> I found that the multicast scenario test is failing often
16:45:41 <slaweq> so I reported bug https://bugs.launchpad.net/neutron/+bug/1850288
16:45:41 <openstack> Launchpad bug 1850288 in neutron "scenario test test_multicast_between_vms_on_same_network fails" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:45:54 <slaweq> and I proposed to make it unstable for now https://review.opendev.org/691855
16:46:14 <slaweq> but later I want to investigate why it is failing from time to time
16:47:11 <slaweq> I also saw various connectivity issues, like:
16:47:17 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3ae/682483/1/check/tempest-slow-py3/3ae72c6/testr_results.html.gz
16:47:24 <slaweq> and
16:47:26 <slaweq> https://2a0fd24cd939d3b06641-34d5a3d9e2a9ca67eb62c8365f7602e7.ssl.cf5.rackcdn.com/679813/3/check/tempest-slow-py3/2313102/testr_results.html.gz
16:48:38 <ralonsoh> in both examples, one VM obtains an IP and the other one does not
16:48:59 <slaweq> ralonsoh: in both cases it was IMO the same vm
16:49:07 <slaweq> but it was resized/cold migrated
16:49:13 <slaweq> so initially it got an IP
16:49:15 <ralonsoh> hmmm correct
16:49:20 <slaweq> and after the resize/migration it failed
16:49:32 <slaweq> that's how I see it
16:49:41 <slaweq> but I have only checked it briefly so far
16:50:19 <slaweq> but I think that there is potentially some problem (race) there and we should check that
16:50:27 <slaweq> is there any volunteer maybe?
16:51:23 <ralonsoh> (distracted whistling)
16:51:34 <bcafarel> (looking intently at the ceiling)
16:51:53 <slaweq> ok, I will open an LP for that and we will see
16:51:53 <njohnston> (wanders away aimlessly)
16:52:12 <slaweq> LOL
16:52:27 <slaweq> You guys rock in such moments :P
16:52:27 <njohnston> we're quite a bunch, aren't we
16:52:38 <ralonsoh> hehehehe
16:52:55 <slaweq> njohnston: yeah, indeed :P
16:53:08 <bcafarel> that is some team spirit at least :)
16:53:23 <slaweq> #action slaweq to report LP about connectivity issues after resize/migration
16:53:32 <slaweq> bcafarel: yeah
16:54:13 <slaweq> 👍
16:54:37 <slaweq> ok, now some good news
16:54:53 <slaweq> mlavalle recently finally found the root cause of, and fixed, the issue with router migrations
16:54:57 <slaweq> the patch is merged https://review.opendev.org/#/c/691498/
16:55:01 <njohnston> \o/
16:55:13 <ralonsoh> good one
16:55:15 <slaweq> so I hope we will have better numbers on the dvr multinode scenario job now
16:55:33 <slaweq> ralonsoh: yes, that was an interesting issue
16:55:43 <slaweq> in fact it was a bug in our code, not in the tests
16:56:20 <slaweq> ok, and the last one from me
16:56:29 <slaweq> related more to the grenade jobs
16:56:47 <slaweq> I saw that some jobs are failing due to an issue in the nova scheduler
16:56:52 <slaweq> so I opened bug https://bugs.launchpad.net/nova/+bug/1850291
16:56:52 <openstack> Launchpad bug 1844929 in OpenStack Compute (nova) "duplicate for #1850291 grenade jobs failing due to "Timed out waiting for response from cell" in scheduler" [High,Confirmed]
16:57:00 * bcafarel waits for the https://review.opendev.org/#/c/691498/ backports to appear
16:57:11 <slaweq> but mriedem just marked it as a duplicate of another one, so it is already a known issue for the nova team
16:57:59 <slaweq> and that's all on my side for today
16:58:07 <ralonsoh> just in time
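
For reference, "making it unstable" (the approach slaweq proposed at 16:45:54 in https://review.opendev.org/691855) is typically done with tempest's unstable_test decorator, which converts an intermittent failure into a skip citing the bug; the snippet below is illustrative, not the actual patch.

    import testtools
    from tempest.lib import decorators

    class MulticastExample(testtools.TestCase):

        @decorators.unstable_test("bug 1850288")
        def test_multicast_between_vms_on_same_network(self):
            # The real test body lives in neutron-tempest-plugin; if it
            # fails, the decorator reports a skip pointing at the bug
            # instead of failing the job.
            pass
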
16:58:09 <njohnston> I have one thing. Looking at the numbers for the *-uwsgi jobs, they closely mirror their non-uwsgi versions but are often a bit lower. I'd like to propose we make them voting. I'll propose a change and we can conduct the conversation in Gerrit.
16:58:14 <slaweq> next week I will cancel the ci meeting as we will be at the ptg
16:58:24 <ralonsoh> njohnston, are you sure?
16:58:41 <ralonsoh> ok, let's talk in gerrit
16:59:01 <slaweq> njohnston++
16:59:02 <njohnston> ralonsoh: I set up some custom graphite searches and it looked pretty good to me. The tempest uwsgi job has been quite stable since the fix for it went in in mid-August
16:59:15 <ralonsoh> then perfect for me
16:59:16 <bcafarel> nice
16:59:28 <slaweq> njohnston: I remember that there were quite a lot of timeouts in this job some time ago
16:59:35 <slaweq> but I don't know exactly how it is now
16:59:41 <slaweq> let's talk about it in gerrit
16:59:49 <slaweq> ok, thx for attending
16:59:51 <njohnston> it's much better, as you can see in the current Grafana - the uwsgi lines are low and tight
16:59:54 <njohnston> thanks all
16:59:55 <slaweq> and have a great week
17:00:00 <njohnston> safe travels all
17:00:00 <slaweq> o/
17:00:01 <ralonsoh> bye!
17:00:02 <bcafarel> o/
17:00:03 <njohnston> o/
17:00:03 <slaweq> #endmeeting