16:01:05 <slaweq> #startmeeting neutron_ci
16:01:05 <openstack> Meeting started Tue Jun 4 16:01:05 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:07 <slaweq> hi
16:01:08 <openstack> The meeting name has been set to 'neutron_ci'
16:01:21 <njohnston> o/
16:01:59 <bcafarel> hi again
16:02:13 <slaweq> I know that mlavalle will not be able to join this meeting today
16:02:21 <slaweq> so I think we can start now
16:02:32 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:54 <slaweq> #topic Actions from previous meetings
16:03:01 <slaweq> slaweq to continue work on fetch-journal-log zuul role
16:03:11 <slaweq> I did patch https://review.opendev.org/#/c/643733/ and it's tested in https://review.opendev.org/#/c/661915/ that it works
16:03:20 <slaweq> but I'm not sure if this will be needed in fact
16:03:47 <slaweq> I noticed recently that the full journal log is already dumped in the devstack.journal.xz.gz file in the job's logs
16:04:21 <slaweq> but it's in binary journal format so You need to download it and use journalctl to examine it
16:04:46 <njohnston> good to know!
16:04:57 <slaweq> yeah, I had no idea about that before
16:05:16 <bcafarel> maybe we could use a role to translate that file in the job definition?
16:05:51 <slaweq> bcafarel: You mean to a text file?
16:06:15 <bcafarel> yes, to skip the download step
16:06:29 <bcafarel> for lazy folks like me :)
16:06:51 <slaweq> it's basically what the role proposed by me in https://review.opendev.org/#/c/643733/ is doing
16:07:15 <slaweq> we can try to convince infra-root people that it may be useful :)
16:08:08 <clarkb> slaweq: I'm still of the opinion that we should capture the serialized version in its entirety and compress it well as it is a very large file
16:08:12 <fungi> it ends up taking up a lot of additional disk space in some jobs to keep two copies of the log, and the native version is useful for more flexible local filtering
16:08:23 <clarkb> slaweq: it gives you way more flexibility that way, with a small amount of setup
16:09:02 <clarkb> if we know we need specific service log files we can pull those out along with the openstack service logs
16:09:19 <clarkb> but for the global capture I think what we've got now with the export format works really well
16:09:48 <slaweq> clarkb: yes, but e.g. for neutron there are many different services, like keepalived, haproxy, dnsmasq which are spawned per router or network
16:10:56 <clarkb> yup which is why the current setup is great. You can do journalctl -u keepalived -u haproxy -u dnsmasq -u devstack@q-agt and get only those logs interleaved
16:11:04 <clarkb> you can also set date ranges to narrow down what you are looking at
16:11:10 <clarkb> you cannot easily do that with the change you have proposed
16:11:28 <slaweq> clarkb: so as I said, I can just abandon my patch as now I know that this log is available and how to get it
16:11:51 <slaweq> I think that this will be the best approach, what do You think?
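(Reference sketch, assuming the devstack.journal.xz.gz file name mentioned above and the journalctl filtering clarkb describes; the download URL is a placeholder and the systemd-journal-remote path varies by distro.)

    # fetch and decompress the exported journal from the job's logs
    wget <job-log-base-url>/controller/logs/devstack.journal.xz.gz
    gunzip devstack.journal.xz.gz && unxz devstack.journal.xz

    # the dump is in journald export format; if journalctl cannot read it
    # directly, convert it to a native journal file first
    /usr/lib/systemd/systemd-journal-remote devstack.journal -o devstack-native.journal

    # interleave only the per-router/per-network services, as clarkb suggests
    journalctl --file devstack-native.journal -u keepalived -u haproxy -u dnsmasq -u devstack@q-agt

    # or narrow down by date range
    journalctl --file devstack-native.journal --since "2019-06-02 14:00" --until "2019-06-02 14:30"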
16:11:52 <clarkb> oh I was going to suggest you update the role so that you don't have to use devstack to get that functionality
16:12:03 <clarkb> basically do what devstack does but in a reconsumable role in zuul-jobs
16:12:07 <clarkb> then we can apply it to non devstack jobs
16:12:36 <slaweq> clarkb: tbh I need it for devstack based jobs so what is there now is enough for us :)
16:12:54 <njohnston> I bet we could easily create a grabjournal script that could fetch the journal for a specific job of a specific change and run that journalctl command on it, if our issue is just that we want to make it more accessible for developers
16:13:06 <clarkb> slaweq: ok
16:13:35 <slaweq> njohnston: initially I wasn't aware that this log already exists in the job's logs, so I proposed the patch
16:14:15 <slaweq> but now IMO the only "issue" is the accessibility of the log and I think that this is not something we should spend a lot of time on :)
16:14:32 <njohnston> +1 sounds good
16:15:04 <bcafarel> yes, I can survive a download+parse :)
16:15:29 <slaweq> so my proposal is: let's for now use what is already there
16:15:47 <slaweq> and we will see if this will have to be improved somehow :)
16:16:20 <slaweq> ok, so let's move on to the next action
16:16:28 <slaweq> slaweq to remove neutron-tempest-plugin-bgpvpn-bagpipe from "neutron-tempest-plugin-jobs" template
16:16:32 <slaweq> Done: https://review.opendev.org/#/c/661899/
16:16:44 <slaweq> and the last one was:
16:16:46 <slaweq> mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763)
16:16:48 <openstack> bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:17:04 <slaweq> but as mlavalle is not here now, I think we can assign it to him for next week to not forget about it
16:17:09 <slaweq> #action mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763)
16:17:31 <slaweq> do You have anything else to add regarding actions from last week?
16:18:12 <bcafarel> all good here
16:18:21 <slaweq> ok, let's move on
16:18:27 <slaweq> #topic Stadium projects
16:18:36 <slaweq> Python 3 migration
16:18:42 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:18:47 <slaweq> any updates on it?
16:19:06 <bcafarel> not from me, sorry, no progress here
16:19:16 <njohnston> me neither
16:19:26 <slaweq> same from my side
16:19:30 <slaweq> so next thing
16:19:37 <slaweq> tempest-plugins migration
16:19:41 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:20:03 <slaweq> for networking-bgpvpn both main patches are merged
16:20:20 <slaweq> but there was a need to do some follow-up cleanup which I forgot to do
16:20:30 <slaweq> so there is also https://review.opendev.org/#/c/662231/ waiting for review
16:20:35 <njohnston> oh, what needed to be cleaned up?
16:20:51 <njohnston> ah I see
16:20:55 <slaweq> and there was https://review.opendev.org/#/c/662142/ from masayukig but this one is already merged
16:21:05 <slaweq> so please also remember that in Your patches :)
16:21:23 <njohnston> +1
16:21:30 <slaweq> and that's all from my side about this
16:21:36 <slaweq> any updates on Your side?
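(The grabjournal helper njohnston mentions is only an idea at this point; a hypothetical sketch of what it could look like, with the log URL layout and file names assumed from the discussion above.)

    #!/bin/bash
    # grabjournal.sh (hypothetical) - fetch a job's exported journal and run
    # journalctl on it with whatever filters are passed on the command line.
    # usage: ./grabjournal.sh <job-log-base-url> [journalctl args...]
    set -e
    BASE_URL="$1"; shift
    wget -q "$BASE_URL/controller/logs/devstack.journal.xz.gz" -O devstack.journal.xz.gz
    gunzip -f devstack.journal.xz.gz && unxz -f devstack.journal.xz
    # convert the export-format dump to a native journal, then filter it
    /usr/lib/systemd/systemd-journal-remote devstack.journal -o job.journal
    journalctl --file job.journal "$@"

(e.g. ./grabjournal.sh <job-log-base-url> -u keepalived -u dnsmasq --no-pager)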
16:21:47 <bcafarel> for sfc https://review.opendev.org/#/c/653747 is close to merging (second patch), pending some gate fixes
16:22:08 <bcafarel> so mostly gerrit work left to do :)
16:22:16 <slaweq> bcafarel: great :)
16:22:33 <njohnston> I've been letting fwaas sit, but I will probably be able to dedicate some time to it about a week from now
16:23:04 <slaweq> njohnston: great, if You need any help, ping me :)
16:24:19 <slaweq> ok, so let's move on to the next topic then
16:24:25 <slaweq> #topic Grafana
16:24:31 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:25:14 <slaweq> still IMO things look relatively good
16:25:28 <slaweq> the biggest problem we have is some failures in functional/fullstack tests
16:25:39 <njohnston> yep
16:25:41 <slaweq> and those ssh failures in tempest jobs
16:25:51 <slaweq> but I think they happen less often recently :)
16:26:58 <njohnston> I did have one general grafana question; in the "Number of Functional/Fullstack job runs (Gate queue)" graph I see that the lines diverge. Does that mean we lost data somewhere? I can't imagine a scenario where neutron-functional runs but not neutron-functional-python27
16:27:24 <njohnston> that's just one example; it happens from time to time elsewhere as well
16:29:05 <slaweq> good question njohnston but I don't know the answer
16:29:09 <haleyb> that is interesting, if it's only +/- 1 i could just see it being that one had finished at that point in time but the other hadn't
16:29:51 <slaweq> IMHO it may be a lack of some data collected by infra - we have some gaps from time to time on graphs so maybe it's also something like that
16:29:56 <njohnston> that's why I picked that one: neutron-functional = 10; neutron-functional-python27 = 7; neutron-fullstack = 6
16:30:22 <njohnston> yeah, it's just weird, something to be aware of
16:31:01 <slaweq> I agree, thx njohnston for pointing this out
16:32:26 <slaweq> anything else regarding grafana in general?
16:32:29 <haleyb> speaking of failure rates, i did update the neutron-lib dashboard last week, not that we talk about it much here, just an FYI
16:32:41 <slaweq> haleyb: thx
16:32:48 <haleyb> it looks similar to this one and ovn now
16:32:53 <slaweq> I should take a look at it from time to time too
16:33:04 <haleyb> http://grafana.openstack.org/d/Um5INcSmz/neutron-lib-failure-rate?orgId=1
16:33:41 <slaweq> not too much data there yet :)
16:33:58 <haleyb> nope, not many failures or patches
16:35:23 <slaweq> some periodic jobs are failing constantly
16:35:32 <slaweq> which is maybe worth checking
16:36:11 <haleyb> i think they are known failures, but will look!
16:36:42 <slaweq> haleyb: thx a lot
16:36:46 <slaweq> ok, let's move on then
16:36:51 <slaweq> #topic fullstack/functional
16:37:13 <slaweq> I was looking at results of some failed jobs from the last couple of days
16:37:25 <slaweq> and I found 2 new failed tests in functional jobs
16:37:32 <slaweq> http://logs.openstack.org/78/653378/7/check/neutron-functional/c5ac6a3/testr_results.html.gz and
16:37:37 <slaweq> http://logs.openstack.org/82/659882/3/check/neutron-functional-python27/5e30908/testr_results.html.gz
16:38:04 <slaweq> but I saw each of those only once
16:38:16 <slaweq> did You maybe see something like that before?
16:39:01 <haleyb> i haven't seen it, but failed running sysctl in the first?
16:39:51 <slaweq> here are the logs from the first failed test: http://logs.openstack.org/78/653378/7/check/neutron-functional/c5ac6a3/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l3.test_dvr_router.TestDvrRouter.test_dvr_ha_router_unbound_from_agents.txt.gz#_2019-06-02_14_15_09_388
16:40:48 <haleyb> oh, rtnetlink error, that could be a bug?
16:41:39 <haleyb> i.e. delete_gateway should deal with it
16:42:00 <slaweq> yes, it's possible
16:42:10 <slaweq> I will try to look deeper into this log this week
16:42:20 <slaweq> maybe I will find something obvious to change :)
16:42:29 <njohnston> for the second one nothing jumps out at me but this one error: http://logs.openstack.org/82/659882/3/check/neutron-functional-python27/5e30908/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.test_l2_ovs_agent.TestOVSAgent.test_assert_br_phys_patch_port_ofports_dont_change.txt.gz#_2019-05-30_08_52_08_873
16:42:55 <slaweq> #action slaweq to check logs with failed test_dvr_ha_router_unbound_from_agents functional test
16:43:25 <haleyb> slaweq: i know rodolfo had a patchset up to change this area to use privsep, so you might want to start there
16:43:38 <slaweq> haleyb: good to know, thx
16:43:48 <slaweq> I will ask him when he is back
16:44:27 <haleyb> https://review.opendev.org/#/c/661981/
16:44:49 <haleyb> errors will change after that, into some pyroute2 one perhaps?
16:45:13 <slaweq> ok, I will keep it in mind, thx :)
16:46:49 <slaweq> njohnston: for the second one, this error can maybe be related
16:47:09 <slaweq> or maybe not :)
16:47:37 <njohnston> :)
16:48:13 <slaweq> njohnston: this test is stopping the agent: https://github.com/openstack/neutron/blob/86139658efdc739c6cc330304bdf4455613df78d/neutron/tests/functional/agent/test_l2_ovs_agent.py#L261
16:48:26 <slaweq> and IMO this "error" in the log is related to the agent's stop
16:48:56 <slaweq> so for me it doesn't look like a possible problem at first glance
16:49:28 <njohnston> I agree
16:50:06 <slaweq> so let's keep this issue in mind, if it happens more often we will investigate it :)
16:50:16 <slaweq> do You agree?
16:51:36 <slaweq> ok, I guess that means yes :)
16:51:51 <slaweq> so that is all I had for today
16:52:05 <slaweq> do You have anything else You want to talk about today?
16:52:28 <haleyb> -1 from me
16:52:35 <bcafarel> :)
16:52:38 <bcafarel> nothing from me either
16:52:39 <haleyb> i'm hungry
16:52:54 <slaweq> haleyb: I'm tired
16:53:02 <slaweq> so let's finish a bit earlier today
16:53:09 <slaweq> thx for attending :)
16:53:12 <slaweq> o/
16:53:14 <bcafarel> o/
16:53:18 <slaweq> #endmeeting