16:00:31 #startmeeting neutron_ci
16:00:32 Meeting started Tue Jul 31 16:00:31 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:37 welcome back
16:00:37 The meeting name has been set to 'neutron_ci'
16:00:39 o/
16:00:43 on the neutron CI meeting this time :)
16:00:59 o/
16:01:14 haleyb: around?
16:01:45 ok, let's start
16:01:48 slaweq: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
16:02:07 you already started :-)
16:02:10 sorry
16:02:15 wrong copy-paste
16:02:17 #topic Actions from previous meetings
16:02:31 mlavalle to check dvr (dvr-ha) env and shelve/unshelve server
16:02:51 I didn't have time. Sorry ;-(
16:03:00 no problem :)
16:03:01 will catch up this week
16:03:23 we have bigger priorities now :)
16:03:35 #action mlavalle to check dvr (dvr-ha) env and shelve/unshelve server
16:03:46 next one was on me
16:03:48 * slaweq to check why there was no network connection in neutron-tempest-dvr-ha-multinode-full test
16:04:14 I briefly checked the logs and didn't find anything wrong there
16:04:33 it was only one such issue, so if it repeats I will report a bug and we will dig into it more then
16:04:41 fine for You?
16:05:32 yeap
16:05:35 no problem
16:05:44 thx
16:05:54 the last thing from last week is
16:05:55 * slaweq to check neutron-tempest-iptables_hybrid ssh connection issue
16:06:13 I was checking that. It doesn't look like an issue with Neutron. This failure happens randomly in many jobs, not only neutron ones.
16:06:36 I proposed a patch for tempest: https://review.openstack.org/#/c/587221/ - if that gets merged, I can check the console log of the instance and maybe that will help me understand what is going on there
16:06:52 the tempest patch is in the gate now, so I hope it will be merged soon
16:07:00 cool
16:07:19 ok, so moving on to
16:07:21 #topic Grafana
16:07:41 thx to njohnston we have dashboards for the stable releases now
16:07:51 * stable/queens: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release
16:08:04 * stable/pike: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release
16:08:10 Pretty cool!
16:08:39 but as I was looking at them today, I think they need some tweaking
16:08:51 as some job names might be different in e.g. the Pike branch
16:09:11 and it would be good to mark the jobs that are non-voting, e.g. fullstack for Pike
16:09:29 njohnston: should I do it or will You check that?
16:09:33 makes sense, I can get that going
16:09:41 thx njohnston
16:10:06 #action njohnston to tweak stable branch dashboards
16:10:26 ok, let's check the dashboard for the master branch:
16:10:28 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:12:07 btw njohnston, did You change something with the dashboard? It now opens with the last 7 days period by default
16:12:08 :)
16:12:15 yes I did :-)
16:12:51 I like it
16:12:51 you get to where you want to go faster
16:13:03 thx a lot njohnston :)
16:13:38 an easy fix :-)
16:14:35 what's with the gaps in the gate queue? This is the second week we see this
16:14:53 yes, and I have no idea why there are such gaps there
16:15:07 maybe we should ask someone from the infra team about that?
16:15:15 yeah
16:16:19 I will try to talk with them about it
16:16:45 #action slaweq to talk with infra about gaps in grafana graphs
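As an aside on the gaps discussed above: a minimal sketch of how one could pull a job's gate failure-rate series from the infra graphite and count the null datapoints that grafana renders as gaps. The render endpoint is real infra; the metric path below is an illustrative assumption, not the exact target the Neutron dashboard uses.

```python
# Sketch: inspect a failure-rate series for gaps (null datapoints).
# The metric path is an assumption for illustration only.
import requests

GRAPHITE = 'http://graphite.openstack.org/render'
TARGET = (
    "asPercent("
    "transformNull(sum(stats.zuul.pipeline.gate.job.neutron-functional.FAILURE)),"
    "transformNull(sum(stats.zuul.pipeline.gate.job.neutron-functional.{SUCCESS,FAILURE})))"
)

resp = requests.get(GRAPHITE,
                    params={'target': TARGET, 'from': '-7d', 'format': 'json'})
resp.raise_for_status()
for series in resp.json():
    # graphite returns [value, timestamp] pairs; value is None wherever no
    # data was recorded -- e.g. a quiet weekend with an empty gate queue.
    nulls = sum(1 for value, _ts in series['datapoints'] if value is None)
    print('%s: %d null datapoints' % (series['target'], nulls))
```

If the series is not wrapped in transformNull(), those None points show up as blank stretches in the graph, which is consistent with the weekend theory raised below.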
16:18:12 speaking of graphs, I found that around the 26th-27th there were a couple of failures due to some problems with installation of packages
16:18:41 which might explain the spike e.g. in the fullstack gate graph
16:19:22 how about the peaks in functional?
16:19:47 can they be explained by the package installation problems?
16:19:49 in the gate queue?
16:20:10 yeah
16:20:14 I don't think so, but it was the weekend
16:20:25 so maybe there weren't many patches in the queue?
16:20:28 yes, weekend
16:20:37 that could be
16:20:42 I was checking most of the patches from the last few days
16:21:24 but if something was rechecked, I will not see the past failures in gerrit, right?
16:21:38 and I based this only on the results visible in gerrit
16:22:03 based on that I found a few different issues which we can discuss now
16:22:09 ok
16:22:19 so let's start with
16:22:21 #topic Functional
16:22:50 I found two different failures which don't look related to the patches they were running on
16:23:07 the first is neutron.tests.functional.db.migrations, like e.g.: * http://logs.openstack.org/50/533850/34/check/neutron-functional/30ee01c/logs/testr_results.html.gz
16:23:19 and that happens from time to time
16:23:36 I should probably open a bug for that
16:26:01 do You think that reporting a bug is a good idea?
16:26:06 yes
16:26:09 k
16:26:35 #action slaweq to report a bug with failing neutron.tests.functional.db.migrations functional tests
16:26:48 and the next one related to functional is:
16:26:54 * neutron.tests.functional.agent.linux.test_netlink_lib.NetlinkLibTestCase.test_list_entries:
16:27:02 http://logs.openstack.org/44/587244/2/check/neutron-functional/ec774d6/logs/testr_results.html.gz
16:27:09 did You see it before maybe?
16:27:39 * mlavalle looking
16:28:09 No, I haven't
16:28:32 let's be on the lookout for this one
16:28:41 yep, I agree
16:28:58 and those were the issues which I found in functional tests
16:29:31 as for other issues, most of what I found was again in tempest-related jobs
16:29:47 fortunately most of the issues are in non-voting jobs :)
16:30:06 so let's talk about those jobs
16:30:08 #topic Scenarios
16:30:20 first, some short info
16:30:48 failures of tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment are probably related to a bug in nova
16:31:12 mriedem made patch https://review.openstack.org/#/c/586292/ to disable this failing part temporarily
16:31:22 so we should be good with this in our jobs now :)
16:32:21 that's good
16:32:34 we were seeing that failure a lot last week
16:32:35 today I checked the results of the 50 most recently updated neutron patches from gerrit and I made a list of the issues which I found there
16:32:50 yes mlavalle, it should be better now :)
16:33:03 as for other jobs, what I found this week is:
16:33:26 in the neutron-tempest-dvr job, one failure of tempest.api.network.test_networks.NetworksTest.test_create_delete_subnet_with_gw_and_allocation_pools
16:33:35 http://logs.openstack.org/01/584601/2/check/neutron-tempest-dvr/65f95f5/logs/testr_results.html.gz
16:34:00 this looks like maybe some issue in the tempest delete_resource method
16:34:11 I will check that this week
16:34:47 #action slaweq to check tempest delete_resource method
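For context on the delete_resource suspicion: tempest cleanup paths are generally written to tolerate a resource that is already gone. A minimal sketch of that pattern follows; it is simplified here, and the real helper for this lives in tempest.lib.common.utils.test_utils (call_and_ignore_notfound_exc).

```python
# Sketch of the 404-tolerant delete pattern tempest relies on during
# cleanup (simplified; not the actual tempest implementation).
from tempest.lib import exceptions as lib_exc

def delete_resource(delete_callable, *args, **kwargs):
    """Call a delete API, treating an already-deleted resource as success."""
    try:
        delete_callable(*args, **kwargs)
    except lib_exc.NotFound:
        # The resource vanished between the test's own delete and cleanup
        # (or another teardown path got there first). Racing deletes like
        # the one suspected in test_create_delete_subnet_with_gw_and_
        # allocation_pools should not fail the run.
        pass
```

Typical use would be something like delete_resource(subnets_client.delete_subnet, subnet['id']) from a cleanup hook; if the helper instead lets a NotFound (or a stale list result) leak through, a test that deletes its own subnet and then cleans up would fail exactly the way seen in the log above.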
16:35:27 in the neutron-tempest-plugin-scenario-linuxbridge job, one issue:
16:35:41 http://logs.openstack.org/53/586253/1/check/neutron-tempest-plugin-scenario-linuxbridge/ec36ec6/testr_results.html.gz
16:36:13 it is some issue with the connection to the FIP
16:36:59 timeouts
16:38:03 yes, looks like that
16:38:05 again
16:38:26 but it looks like it took more than 700 seconds to boot
16:38:32 which is insane IMO
16:38:50 indeed
16:39:42 and now I remember that I found some time ago that in many jobs (maybe all) we have virt_type=qemu
16:39:46 http://logs.openstack.org/53/586253/1/check/neutron-tempest-plugin-scenario-linuxbridge/ec36ec6/controller/logs/etc/nova/nova_conf.txt.gz
16:39:55 do You know if that could maybe be changed?
16:40:00 or why it is like that?
16:40:11 no, I don't know
16:40:22 changing to kvm should speed it up a bit
16:40:30 yeah
16:40:33 I will ask the infra team about that also
16:40:53 #action slaweq to check with infra if virt_type=kvm is possible in gate jobs
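A note on why virt_type=qemu shows up in the gate: the test nodes are themselves cloud VMs, and nested KVM is usually unavailable on them, so devstack falls back to plain qemu emulation, which explains the painfully slow guest boots. A minimal sketch, assuming a Linux host, of the kind of check devstack effectively performs:

```python
# Sketch: pick kvm vs qemu roughly the way devstack does -- by checking
# whether the host exposes a usable /dev/kvm device. Linux-only; on gate
# VMs without nested virtualization this prints 'qemu'.
import os

def pick_virt_type():
    if os.access('/dev/kvm', os.R_OK | os.W_OK):
        return 'kvm'   # hardware acceleration available
    return 'qemu'      # pure emulation: correct, but much slower boots

if __name__ == '__main__':
    print('virt_type = %s' % pick_virt_type())
```

So the infra question is really whether the cloud providers backing the node pools can enable nested virt; without it, forcing virt_type=kvm in nova.conf would just make the guests fail to start.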
16:41:24 the other issues which I found are from non-voting jobs
16:41:31 first is: neutron-tempest-dvr-ha-multinode-full (non-voting)
16:41:51 here the most common issue is * tempest.api.compute.volumes.test_attach_volume.AttachVolumeShelveTestJSON.test_{attach|detach}_volume_shelved_or_offload_server
16:42:32 which I found happened 19 times in the last few days
16:43:07 so mlavalle, we should definitely check that :)
16:43:14 yes
16:43:15 and it's assigned to You as an action
16:43:17 :)
16:43:26 I know
16:43:37 I know that You know :D
16:43:43 LOL
16:43:44 just saying :P
16:44:01 I also found a failure in the tempest.api.compute.admin.test_live_migration.LiveMigrationRemoteConsolesV26Test test
16:44:08 http://logs.openstack.org/59/582659/2/check/neutron-tempest-dvr-ha-multinode-full/38f4cd1/logs/testr_results.html.gz
16:44:20 it's related to the latest changes with multiple port bindings
16:44:35 but it's already fixed in Nova: https://bugs.launchpad.net/nova/+bug/1783917
16:44:35 Launchpad bug 1783917 in OpenStack Compute (nova) "live migration fails with NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'" [High,Fix released] - Assigned to Matt Riedemann (mriedem)
16:44:57 just wanted to mention that there was such an issue :)
16:45:22 and the last one in this job is something new (at least for me)
16:45:29 it's a failure in tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_in_tenant_traffic
16:45:38 and it hit at least 3 times recently:
16:45:47 * http://logs.openstack.org/63/577463/4/check/neutron-tempest-dvr-ha-multinode-full/612982d/logs/testr_results.html.gz
16:45:49 * http://logs.openstack.org/88/555088/21/check/neutron-tempest-dvr-ha-multinode-full/48acc8b/logs/testr_results.html.gz
16:45:51 * http://logs.openstack.org/14/529814/25/check/neutron-tempest-dvr-ha-multinode-full/499a621/logs/testr_results.html.gz
16:46:26 and here there is no console output from the instance again :/
16:47:14 so for now I think I will report this as a bug and try to send a patch to tempest to add the console log in such cases
16:47:38 ok
16:47:43 then maybe we will be able to check something more
16:48:01 #action slaweq to report a bug with tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_in_tenant_traffic
16:48:25 #action slaweq to send patch to add logging of console output in tests like tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_in_tenant_traffic
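A sketch of what such a tempest change might look like: dump the guest's console log when SSH validation times out, so a connectivity failure leaves evidence of how far boot got. The helper below is hypothetical (we cannot see the actual patch content from here); get_console_output() and SSHTimeout are standard tempest compute-client and exception pieces.

```python
# Sketch (hypothetical helper, not the actual proposed patch): capture the
# nova console log when SSH to a guest fails, before re-raising.
from oslo_log import log as logging
from tempest.lib import exceptions as lib_exc

LOG = logging.getLogger(__name__)

def check_ssh_with_console_log(ssh_client, servers_client, server_id):
    try:
        # RemoteClient-style connectivity check against the FIP.
        ssh_client.validate_authentication()
    except lib_exc.SSHTimeout:
        # With no console output, failures like the test_in_tenant_traffic
        # ones above are undebuggable; log it before failing the test.
        output = servers_client.get_console_output(server_id)['output']
        LOG.error('SSH to server %s timed out; console log follows:\n%s',
                  server_id, output)
        raise
```

With something like this in place, the testr_results pages linked above would show whether the instance ever finished booting or never got metadata/network in the first place.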
16:49:15 the issue with test_attach_volume_shelved_or_offload_server also happened a few times in neutron-tempest-multinode-full
16:49:18 like:
16:49:25 * http://logs.openstack.org/17/584217/5/check/neutron-tempest-multinode-full/6da0342/logs/testr_results.html.gz
16:49:26 * http://logs.openstack.org/99/587099/1/check/neutron-tempest-multinode-full/3c85bf1/logs/testr_results.html.gz
16:49:33 so it's not only dvr-related
16:49:45 but in the dvr job it fails more often
16:50:17 and that is basically everything which I have prepared for today
16:50:23 anything else to add?
16:50:32 I have a quick question related to the python3 community goal... we have the tempest-full-py3 job, but what other tempest jobs do we want to have py3 jobs? Anything voting, or are there some that are not as useful? Thinking of neutron-tempest-[dvr,linuxbridge,plugin-api,iptables_hybrid,*scenario*]. I assume the answer is 'everything' but wanted to check.
16:50:32 but what it has in common is that it is always multinode
16:51:16 mlavalle: right, I didn't see it in a singlenode job
16:51:54 I have the same question njohnston asked
16:51:59 njohnston: hmm, I don't know if running all jobs with both py2 and py3 wouldn't be too much
16:52:24 so the goal is to do all tempest jobs with py3 as well?
16:52:48 I think the goal is to replace all py27 jobs with py3 :)
16:52:48 njohnston: yeah, maybe we should check with the TC guys
16:53:19 so here is the goal text: https://governance.openstack.org/tc/goals/pike/python35.html
16:53:21 as a first step, maybe adding such jobs to the experimental queue and checking them on-demand would be enough
16:53:39 For projects with integration tests in any form:
16:53:39 - All of the integration tests must pass when a service is running under python 3.5.
16:53:39 - Voting check and gate jobs are present to run all of the project's integration tests under python 3.5 to avoid regressions.
16:54:24 The "All of the tests" seems pretty clear, but I wanted to make sure that passes the common-sense test, to make sure we don't abuse the testing infrastructure
16:54:30 njohnston: yeah, it seems you are right. I'll read the text slowly and maybe ask some questions to the TC guys. I'll get back to you
16:54:33 all integration tests includes all tempest jobs, I guess?
16:54:48 manjeets: That is my reading of it, yes
16:55:06 I'll get confirmation from the TC
16:55:11 Thank you mlavalle!
16:55:13 thx mlavalle
16:55:54 #action mlavalle will check with the TC what jobs should be running with py3
16:56:02 ++
16:57:05 ok, so I think we are done now
16:57:09 thanks all
16:57:10 thx for attending
16:57:13 o/
16:57:23 have a nice week and see You
16:57:26 #endmeeting
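A closing aside on the python3 goal discussed above: the goal text insists on integration (not just unit) tests under 3.5 because the classic py2-to-py3 breakages are runtime type issues that mocked-out unit tests can hide, such as bytes-vs-text handling on data read from sockets. A contrived illustration of that class of bug, not actual Neutron code:

```python
# Contrived py2-vs-py3 illustration (not Neutron code): under python 2 a
# plain str works here, but under python 3 data read from a socket or
# subprocess arrives as bytes and blows up on string operations unless
# decoded first -- the class of bug only runtime testing under 3.5 catches.
import six  # the 2/3 compat shim OpenStack projects used in this era

def normalize_mac(mac):
    if isinstance(mac, six.binary_type):
        mac = mac.decode('utf-8')
    return mac.lower().replace('-', ':')

print(normalize_mac(b'FA-16-3E-AA-BB-CC'))  # fa:16:3e:aa:bb:cc
```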