16:00:05 #startmeeting neutron_ci
16:00:06 Meeting started Tue Dec 10 16:00:05 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:08 hi
16:00:09 The meeting name has been set to 'neutron_ci'
16:00:52 o/
16:01:46 let's wait a few more minutes for ralonsoh and others
16:01:51 hi
16:02:54 ok, let's start
16:02:56 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:11 please open it now so that it will be ready when needed :)
16:03:39 #topic Actions from previous meetings
16:03:51 first one:
16:03:53 njohnston to check failing NetworkMigrationFromHA in multinode dvr job
16:04:01 I'm not sure if njohnston is around now
16:04:51 #action njohnston to check failing NetworkMigrationFromHA in multinode dvr job
16:04:58 let's keep it for next week then
16:05:02 o/
16:05:06 hi njohnston :)
16:05:12 yeah, keep it for next week, I am debugging it right now
16:05:18 ok
16:05:20 thx
16:05:26 and good luck with debugging
16:05:28 :)
16:05:36 ok, next one:
16:05:37 :)
16:05:43 ralonsoh to check functional tests timeouts https://bugs.launchpad.net/neutron/+bug/1854462
16:05:43 Launchpad bug 1854462 in neutron "[Functional tests] Timeout exception in list_namespace_pids" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:05:56 I wrote a small script for this
16:06:12 http://paste.openstack.org/show/787322/
16:06:27 and I added log messages in pyroute2
16:07:01 I detected that most of the time, the blocking method was https://github.com/svinota/pyroute2/blob/master/pyroute2/netns/__init__.py#L209
16:07:20 so instead of calling it every time we create/delete a namespace
16:07:44 I create the object once (in the root context, see patch https://review.opendev.org/#/c/698039/)
16:07:46 that's all
16:09:01 smart :)
16:09:15 BTW, _CDLL =
ctypes.CDLL(ctypes_util.find_library('c'), use_errno=True)
16:09:25 this MUST not change during the execution
16:09:59 maybe put a comment in to that effect?
16:10:25 njohnston, I mean: the library can't be modified
16:10:32 this is not going to happen
16:10:36 ok
16:11:27 ok
16:11:40 thx ralonsoh for working on this
16:11:51 I hope we will get rid of those timeouts with this patch
16:11:57 next one
16:11:59 slaweq to check reason of grenade jobs failures
16:12:02 looks nice indeed
16:12:02 I checked it
16:12:25 and it seems that all those failures are related to https://bugs.launchpad.net/nova/+bug/1844929
16:12:25 Launchpad bug 1844929 in OpenStack Compute (nova) "grenade jobs failing due to "Timed out waiting for response from cell" in scheduler" [High,Confirmed]
16:13:06 and as I talked with efried and mriedem yesterday, it is probably caused by oversubscribed CI nodes
16:13:27 so we don't have any good solution for that problem now
16:13:54 the only 2 possible options imo are:
16:14:13 1. live with it as it is now
16:14:34 2. make grenade jobs non-voting and non-gating temporarily until this issue is solved
16:14:49 the problem with 2 is that we don't know when it may be fixed
16:15:21 pfff all grenade jobs?
16:15:38 mark them as non-voting?
16:15:41 ralonsoh: we now have only 2 multinode grenade jobs
16:15:42 I wonder if it would be worthwhile to email openstack-discuss and ask if anything can be done about the oversubscription
16:15:50 we removed the single-node jobs
16:16:01 yes, but several tests
16:16:04 I mean tests
16:16:28 njohnston: there are such threads started by mriedem here http://lists.openstack.org/pipermail/openstack-discuss/2019-October/thread.html#10484 and continued here http://lists.openstack.org/pipermail/openstack-discuss/2019-November/thread.html#10502
16:17:00 we are our own noisy neighbors in many cases.
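[editor's note: ralonsoh's fix above amounts to caching the libc ctypes handle once, instead of re-creating it on every namespace create/delete. A minimal sketch of the idea with hypothetical names — the actual change is in the patch linked above, not this code:]

```python
import ctypes
import ctypes.util

# Load libc exactly once at import time; this handle MUST NOT be
# replaced during execution.  Re-running find_library()/CDLL() on
# every namespace create/delete was the blocking call observed in
# the functional test timeouts.  (Illustrative sketch only, not the
# actual neutron/pyroute2 code.)
_CDLL = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def libc():
    """Return the process-wide, cached libc handle."""
    return _CDLL
```

Any caller then reuses the same handle, e.g. `libc().getpid()`, instead of paying the library-lookup cost on every call.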
One way to address oversubscription is to make our software run more efficiently
16:17:19 devstack jobs swap, and I've asked several times that openstack address this
16:17:43 I think fixing swapping will likely have a major impact on performance-related problems
16:19:40 clarkb: so we should focus on optimizing Neutron's memory usage to make this work better, correct?
16:21:04 slaweq, agree with this but we usually optimize the speed, not the memory consumption
16:21:28 yep
16:21:30 most of our efforts are in optimizing the DB access, the parallelism, etc
16:22:19 slaweq: and the rest of openstack/devstack
16:22:27 etcd seems like it isn't used but is always run
16:22:31 cinder backup too
16:22:57 but it all adds up; then you start swapping, which impacts the current job and all other jobs trying to access those disk resources
16:23:33 clarkb: ok, I will check those jobs and maybe will be able to disable some of those unused services there
16:23:54 thx for the tips
16:24:17 #action slaweq to check and try to optimize neutron-grenade-multinode jobs memory consumption
16:24:39 ok, I think we can move on
16:24:44 next topic
16:24:46 #topic Stadium projects
16:24:52 tempest-plugins migration
16:24:57 Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:25:10 the last 2 patches for neutron-vpnaas are ready for review now
16:25:21 Step 1: https://review.openstack.org/#/c/649373
16:25:23 Step 2: https://review.opendev.org/#/c/695834
16:25:29 I just +2'ed the Step 1 patch
16:25:49 and in step 2 we will probably need to switch the centos-based job to be non-voting
16:26:03 as the code isn't compatible with py27 anymore
16:26:58 yes we need centos7+py3, or centos8 (when possible)
16:26:59 agreed
16:27:29 (I can't run devstack with centos8)
16:27:39 some libraries are missing
16:28:18 maybe we should switch this job to be fedora-based?
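[editor's note: following up on the swap/memory discussion above — before disabling services like etcd or cinder backup, it helps to measure which processes actually hold memory on a CI node. A rough Linux-only sketch, purely illustrative and not part of any patch:]

```python
import os
import re


def rss_by_process():
    """Sum resident-set size (kB) per process name by scanning
    /proc/<pid>/status (Linux only)."""
    totals = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/status" % pid) as f:
                status = f.read()
        except OSError:
            continue  # process exited while we were scanning
        name = re.search(r"^Name:\s+(\S+)", status, re.M)
        rss = re.search(r"^VmRSS:\s+(\d+)\s+kB", status, re.M)
        # Kernel threads have no VmRSS entry and are skipped here.
        if name and rss:
            totals[name.group(1)] = totals.get(name.group(1), 0) + int(rss.group(1))
    return totals
```

Sorting the result by value quickly shows the biggest consumers, e.g. `sorted(rss_by_process().items(), key=lambda kv: kv[1], reverse=True)[:10]`.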
16:28:29 but that can IMO be done as a follow-up patch
16:28:32 F29 is working, not F30
16:28:45 sure, in another patch
16:29:28 yes, let's wrap up the tempest plugin migration first
16:30:09 yeah :)
16:30:27 so I hope mlavalle will push one more PS soon and we will finally be done with this
16:30:41 I hope next week we will move it off the meeting agenda
16:31:11 :)
16:31:38 the next stadium-related topic is
16:31:40 Neutron Train - Drop py27 and standardize on zuul v3
16:31:45 Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
16:31:54 njohnston: any updates on this since yesterday? :)
16:32:10 Nope, I thought I saw bcafarel doing something related though
16:32:32 yes I sent a few phase 1 patches for review (links in the etherpad)
16:32:42 also I got feedback on the openstack-python-jobs-neutron jobs
16:33:10 these are in fact a legacy set now and should not be touched; we should move to openstack-python3-ussuri-jobs-neutron
16:34:21 I updated the status for some projects too (reviews merged, so done for them)
16:34:32 thanks very much bcafarel!
16:35:15 bcafarel++ thx a lot
16:35:35 * njohnston sees slaweq switching between channels, always in demand!
16:36:13 everybody is always looking for the PTL
16:36:33 njohnston: yes, I'm trying
16:36:37 but it's hard :)
16:37:03 ok, I think that's all related to stadium projects for today
16:37:23 or maybe You have anything else You want to discuss today?
16:37:33 if not, let's move on to the next topic
16:38:34 ok, let's move on then
16:38:46 #topic Grafana
16:38:54 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:39:10 the first thing which I want to mention is
16:39:22 that I sent a cleanup patch for the grafana dashboard today: https://review.opendev.org/698264
16:40:30 other than that, I don't see any issues on grafana
16:40:39 jobs look pretty good this week
16:40:59 I see the ironic cogating problem as well as tripleo-standalone, and midonet cogating looks like it is getting better
16:42:08 njohnston: yes, I noticed those too
16:42:14 and I forgot about them now :)
16:42:17 sorry
16:42:50 do we know what is up with tripleo-standalone?
16:42:59 nope
16:43:16 is there any volunteer to check both of those jobs?
16:44:14 can we ping yamamoto for midonet?
16:44:20 sorry for being lazy
16:44:34 well midonet is fixed now
16:44:41 oh yes, sorry
16:44:47 it's the other two that are having issues
16:44:53 ralonsoh: midonet is fixed by skipping one failing test
16:45:09 slaweq, give one to me
16:45:15 I'll take a look this week
16:46:13 ralonsoh: ok, pick whichever You want
16:46:17 ironic
16:47:04 ralonsoh: ok :)
16:47:08 so I will check tripleo
16:47:47 ralonsoh: please check on the neutron channel, I just spoke with TheJulia about one issue in the dhcp agent on the ironic job
16:47:54 sure
16:47:55 maybe it's the same issue (idk)
16:48:15 thx
16:48:30 #action ralonsoh to check ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa job
16:48:44 #action slaweq to check tripleo job
16:49:09 ok, let's move on then
16:49:25 I don't have any new issues with scenario/functional/fullstack jobs for today
16:49:28 which is very good
16:49:31 \o/
16:49:39 but I have one issue with periodic jobs
16:49:40 #topic Periodic
16:50:00 recently we added a periodic job which runs on mariadb instead of mysql
16:50:04 and it is failing now:
16:50:10
https://b12f79f00ace923cb903-227be9d6f8442281010ef49b8394f34d.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-mariadb-full/18fecee/job-output.txt
16:50:18 +1
16:50:26 it seems that our new ovn-related code is broken on mariadb
16:50:42 uhmmmm ok
16:50:46 I'll take a look
16:50:52 ralonsoh: thx
16:50:55 (I did the DB migration)
16:51:41 ralonsoh: maybe it's again an issue with mariadb 10.1
16:51:46 and it will work fine on 10.4
16:51:57 do you have a link?
16:51:59 as we already had with one other db migration script some time ago
16:52:05 ralonsoh: link to what?
16:52:19 the problem in mariadb 10.1
16:52:33 give me a sec
16:52:39 np, we can talk tomorrow
16:52:54 https://bugs.launchpad.net/kolla-ansible/+bug/1841907
16:52:54 Launchpad bug 1841907 in neutron "Neutron bootstrap failing on Ubuntu bionic with Cannot change column 'network_id" [Critical,Confirmed]
16:53:04 upsss also mine!!
16:53:10 lol
16:53:14 I did this change too
16:53:26 You are doing many patches so some of them may break things ;)
16:53:41 the point is that it was working in mysql and postgres
16:53:55 yes
16:54:10 that's why I proposed the mariadb periodic job
16:54:20 as there are differences between mysql and mariadb now
16:55:44 ok, so ralonsoh You will check that, right?
16:55:48 yes
16:55:50 thx
16:55:59 #action ralonsoh to check periodic mariadb job failures
16:56:04 ok
16:56:12 so that's all I had for today
16:56:46 overall I think we are now in really good shape with our CI; many patches were merged recently without dozens of rechecks
16:57:03 so thx for working on the CI improvements guys :)
16:57:11 \o/
16:57:13 nice!
16:57:21 fantastic
16:57:31 have a great evening and see You online
16:57:33 o/
16:57:35 #endmeeting
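[editor's note: the "Cannot change column 'network_id'" failure discussed above is MariaDB 10.1 refusing to ALTER a column that participates in a foreign key, a change MySQL and PostgreSQL accepted. The usual workaround is to drop the constraint, alter the column, then recreate the constraint; in an actual neutron migration this would be written with alembic's drop_constraint/alter_column/create_foreign_key. A sketch of the statement sequence, with hypothetical table and constraint names, not the actual neutron migration:]

```python
def fk_safe_alter(table, column, new_type, fk_name, ref_table, ref_column):
    """Return the SQL statements to change a column that is part of a
    foreign key on MariaDB 10.1: drop the FK, alter the column, then
    recreate the FK.  (Hypothetical names; illustration only.)"""
    return [
        "ALTER TABLE %s DROP FOREIGN KEY %s" % (table, fk_name),
        "ALTER TABLE %s MODIFY %s %s" % (table, column, new_type),
        "ALTER TABLE %s ADD CONSTRAINT %s FOREIGN KEY (%s) REFERENCES %s (%s)"
        % (table, fk_name, column, ref_table, ref_column),
    ]


# Example: changing a hypothetical ports.network_id column.
statements = fk_safe_alter("ports", "network_id", "VARCHAR(36) NOT NULL",
                           "ports_ibfk_1", "networks", "id")
```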