16:01:05 #startmeeting neutron_ci
16:01:05 Meeting started Tue Jun 4 16:01:05 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:07 hi
16:01:08 The meeting name has been set to 'neutron_ci'
16:01:21 o/
16:01:59 hi again
16:02:13 I know that mlavalle will not be able to join this meeting today
16:02:21 so I think we can start now
16:02:32 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:54 #topic Actions from previous meetings
16:03:01 slaweq to continue work on fetch-journal-log zuul role
16:03:11 I did patch https://review.opendev.org/#/c/643733/ and https://review.opendev.org/#/c/661915/ tests that it works
16:03:20 but I'm not sure if this will actually be needed
16:03:47 I noticed recently that the full journal log is already dumped in a devstack.journal.xz.gz file in the job's logs
16:04:21 but it's in binary journal format so You need to download it and use journalctl to examine it
16:04:46 good to know!
16:04:57 yeah, I had no idea about that before
16:05:16 maybe we could use a role to translate that file in the job definition?
16:05:51 bcafarel: You mean to a text file?
16:06:15 yes, to skip the download step
16:06:29 for lazy folks like me :)
16:06:51 it's basically what the role I proposed in https://review.opendev.org/#/c/643733/ is doing
16:07:15 we can try to convince infra-root people that it may be useful :)
16:08:08 slaweq: I'm still of the opinion that we should capture the serialized version in its entirety and compress it well as it is a very large file
16:08:12 it ends up taking up a lot of additional disk space in some jobs to keep two copies of the log, and the native version is useful for more flexible local filtering
16:08:23 slaweq: it gives you way more flexibility that way, with a small amount of setup
16:09:02 if we know we need specific service log files we can pull those out along with the openstack service logs
16:09:19 but for the global capture I think what we've got now with the export format works really well
16:09:48 clarkb: yes, but e.g. for neutron it is many different services, like keepalived, haproxy and dnsmasq, which are spawned per router or network
16:10:56 yup which is why the current setup is great. You can do journalctl -u keepalived -u haproxy -u dnsmasq -u devstack@q-agt and get only those logs interleaved
16:11:04 you can also set date ranges to narrow down what you are looking at
16:11:10 you cannot easily do that with the change you have proposed
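For reference, a rough sketch of the local workflow described here. It is only an illustration, not something run in the meeting: the log URL is a placeholder, the devstack.journal.xz.gz file name is taken from the discussion above, and the assumption that the dump is in journal export format (needing systemd-journal-remote before journalctl can read it) may not hold for every job.

  # download the exported journal from the job's log directory (placeholder URL)
  curl -O http://logs.openstack.org/<change>/<patchset>/check/<job>/<build>/devstack.journal.xz.gz
  gunzip devstack.journal.xz.gz && unxz devstack.journal.xz
  # convert the export stream into a native journal file (skip if it already is one; path may vary by distro)
  /lib/systemd/systemd-journal-remote devstack.journal -o local.journal
  # interleave only the units of interest, optionally narrowed to a time window
  journalctl --file=local.journal -u keepalived -u haproxy -u dnsmasq -u devstack@q-agt \
      --since "2019-06-04 15:00" --until "2019-06-04 16:00"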
16:11:28 clarkb: so as I said, I can just abandon my patch now that I know this log is available and how to get it
16:11:51 I think that this will be the best approach, what do You think?
16:11:52 oh I was going to suggest you update the role so that you don't have to use devstack to get that functionality
16:12:03 basically do what devstack does but in a reconsumable role in zuul-jobs
16:12:07 then we can apply it to non devstack jobs
16:12:36 clarkb: tbh I only need it for devstack based jobs so what is there now is enough for us :)
16:12:54 I bet we could easily create a grabjournal script that could fetch the journal for a specific job of a specific change and run that journalctl command on it, if our issue is just that we want to make it more accessible for developers
16:13:06 slaweq: ok
16:13:35 njohnston: initially I wasn't aware that this log already exists in the job's logs, so I proposed the patch
16:14:15 but now IMO the only "issue" is the accessibility of the log and I think that is not something we should spend a lot of time on :)
16:14:32 +1 sounds good
16:15:04 yes, I can survive a download+parse :)
16:15:29 so my proposal is: let's for now use what is already there
16:15:47 and we will see if this will have to be improved somehow :)
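A hypothetical shape for the grabjournal helper njohnston floats above, purely a sketch: the script name, the job-log base URL argument, and the artifact path are assumptions, not an existing tool.

  #!/bin/bash
  # grabjournal <job-log-base-url> [journalctl filters...]  (hypothetical helper)
  set -e
  base_url="$1"; shift
  workdir=$(mktemp -d)
  # assumes the export is published as devstack.journal.xz.gz under the job's log root
  curl -sSf -o "$workdir/devstack.journal.xz.gz" "$base_url/devstack.journal.xz.gz"
  gunzip "$workdir/devstack.journal.xz.gz" && unxz "$workdir/devstack.journal.xz"
  # turn the export stream into a journal file journalctl can read (path may vary by distro)
  /lib/systemd/systemd-journal-remote "$workdir/devstack.journal" -o "$workdir/local.journal"
  # pass the remaining arguments straight through, e.g. -u devstack@q-agt --since "16:00"
  journalctl --file="$workdir/local.journal" "$@"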
16:16:20 ok, so let's move on to the next action
16:16:28 slaweq to remove neutron-tempest-plugin-bgpvpn-bagpipe from "neutron-tempest-plugin-jobs" template
16:16:32 Done: https://review.opendev.org/#/c/661899/
16:16:44 and the last one was:
16:16:46 mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763)
16:16:48 bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:17:04 but as mlavalle is not here now, I think we can assign it to him for next week so we don't forget about it
16:17:09 #action mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763)
16:17:31 do You have anything else to add regarding actions from last week?
16:18:12 all good here
16:18:21 ok, let's move on
16:18:27 #topic Stadium projects
16:18:36 Python 3 migration
16:18:42 Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:18:47 any updates on it?
16:19:06 not from me, sorry, no progress here
16:19:16 me neither
16:19:26 same from my side
16:19:30 so next thing
16:19:37 tempest-plugins migration
16:19:41 Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:20:03 for networking-bgpvpn both main patches are merged
16:20:20 but some follow-up cleanup was needed which I forgot to do
16:20:30 so there is also https://review.opendev.org/#/c/662231/ waiting for review
16:20:35 oh, what needed to be cleaned up?
16:20:51 ah I see
16:20:55 and there was https://review.opendev.org/#/c/662142/ from masayukig but this one is already merged
16:21:05 so please also remember that in Your patches :)
16:21:23 +1
16:21:30 and that's all from my side about this
16:21:36 any updates on Your side?
16:21:47 for sfc https://review.opendev.org/#/c/653747 is close to merge (second patch), pending on some gate fixes
16:22:08 so mostly gerrit work left to do :)
16:22:16 bcafarel: great :)
16:22:33 I've been letting fwaas sit, but I will probably be able to dedicate some time to it about a week from now
16:23:04 njohnston: great, if You need any help, ping me :)
16:24:19 ok, so let's move on to the next topic then
16:24:25 #topic Grafana
16:24:31 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:25:14 still IMO things look relatively good
16:25:28 the biggest problems we have are some failures in functional/fullstack tests
16:25:39 yep
16:25:41 and those ssh failures in tempest jobs
16:25:51 but I think that happens less often recently :)
16:26:58 I did have one general grafana question; in the "Number of Functional/Fullstack job runs (Gate queue)" graph I see that the lines diverge. Does that mean we lost data somewhere? I can't imagine a scenario where neutron-functional runs but not neutron-functional-python27
16:27:24 that's just one example; it happens from time to time elsewhere as well
16:29:05 good question njohnston but I don't know the answer
16:29:09 that is interesting, if it's only +/- 1 i could just see it being that one had finished at that point in time but the other hadn't
16:29:51 IMHO it may be a lack of some data collected by infra - we have some gaps on the graphs from time to time so maybe it's something like that
16:29:56 that's why I picked that one: neutron-functional = 10; neutron-functional-python27 = 7; neutron-fullstack = 6
16:30:22 yeah, it's just weird, something to be aware of
16:31:01 I agree, thx njohnston for pointing this out
16:32:26 anything else regarding grafana in general?
16:32:29 speaking of failure rates, i did update the neutron-lib dashboard last week, not that we talk about it much here, just an FYI
16:32:41 haleyb: thx
16:32:48 it looks similar to this one and the ovn one now
16:32:53 I should take a look at it from time to time too
16:33:04 http://grafana.openstack.org/d/Um5INcSmz/neutron-lib-failure-rate?orgId=1
16:33:41 not too much data there yet :)
16:33:58 nope, not many failures or patches
16:35:23 some periodic jobs are failing constantly
16:35:32 which is maybe worth checking
16:36:11 i think they are known failures, but will look!
16:36:42 haleyb: thx a lot
16:36:46 ok, let's move on then
16:36:51 #topic fullstack/functional
16:37:13 I was looking at results of some failed jobs from the last couple of days
16:37:25 and I found 2 new failed tests in the functional job
16:37:32 http://logs.openstack.org/78/653378/7/check/neutron-functional/c5ac6a3/testr_results.html.gz and
16:37:37 http://logs.openstack.org/82/659882/3/check/neutron-functional-python27/5e30908/testr_results.html.gz
16:38:04 but each of those I saw only once
16:38:16 did You maybe see something like that before?
16:39:01 i haven't seen it, but failed running sysctl in the first?
16:39:51 here are logs from the first failed test: http://logs.openstack.org/78/653378/7/check/neutron-functional/c5ac6a3/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l3.test_dvr_router.TestDvrRouter.test_dvr_ha_router_unbound_from_agents.txt.gz#_2019-06-02_14_15_09_388
16:40:48 oh, rtnetlink error, that could be a bug?
16:41:39 i.e. delete_gateway should deal with it
16:42:00 yes, it's possible
16:42:10 I will try to look deeper into this log this week
16:42:20 maybe I will find something obvious to change :)
16:42:29 for the second one nothing jumps out at me but this one error: http://logs.openstack.org/82/659882/3/check/neutron-functional-python27/5e30908/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.test_l2_ovs_agent.TestOVSAgent.test_assert_br_phys_patch_port_ofports_dont_change.txt.gz#_2019-05-30_08_52_08_873
16:42:55 #action slaweq to check logs with failed test_dvr_ha_router_unbound_from_agents functional test
16:43:25 slaweq: i know rodolfo had a patchset up to change this area to use privsep, so might want to start there
16:43:38 haleyb: good to know, thx
16:43:48 I will ask him when he is back
16:44:27 https://review.opendev.org/#/c/661981/
16:44:49 errors will change after that, into some pyroute2 one perhaps?
16:45:13 ok, I will keep it in mind, thx :)
16:46:49 njohnston: for the second one, this error can maybe be related
16:47:09 or maybe not :)
16:47:37 :)
16:48:13 njohnston: this test is stopping the agent: https://github.com/openstack/neutron/blob/86139658efdc739c6cc330304bdf4455613df78d/neutron/tests/functional/agent/test_l2_ovs_agent.py#L261
16:48:26 and IMO this "error" in the log is related to the agent's stop
16:48:56 so to me it doesn't look like a problem at first glance
16:49:28 I agree
16:50:06 so let's keep this issue in mind; if it happens more often we will investigate it :)
16:50:16 do You agree?
16:51:36 ok, I guess that means yes :)
16:51:51 so that is all I had for today
16:52:05 do You have anything else You want to talk about today?
16:52:28 -1 from me
16:52:35 :)
16:52:38 nothing from me either
16:52:39 i'm hungry
16:52:54 haleyb: I'm tired
16:53:02 so let's finish a bit earlier today
16:53:09 thx for attending :)
16:53:12 o/
16:53:14 o/
16:53:18 #endmeeting