16:00:05 <slaweq> #startmeeting neutron_ci
16:00:06 <openstack> Meeting started Tue Dec 10 16:00:05 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:08 <slaweq> hi
16:00:09 <openstack> The meeting name has been set to 'neutron_ci'
16:00:52 <bcafarel> o/
16:01:46 <slaweq> lets wait few more minutes for ralonsoh and others
16:01:51 <ralonsoh> hi
16:02:54 <slaweq> ok, lets start
16:02:56 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:11 <slaweq> please open it now so that it will be ready when needed :)
16:03:39 <slaweq> #topic Actions from previous meetings
16:03:51 <slaweq> first one:
16:03:53 <slaweq> njohnston to check failing NetworkMigrationFromHA in multinode dvr job
16:04:01 <slaweq> I'm not sure if njohnston is around now
16:04:51 <slaweq> #action njohnston to check failing NetworkMigrationFromHA in multinode dvr job
16:04:58 <slaweq> lets keep it for next week then
16:05:02 <njohnston> o/
16:05:06 <slaweq> hi njohnston :)
16:05:12 <njohnston> yeah, keep it for next week, I am debugging it right now
16:05:18 <slaweq> ok
16:05:20 <slaweq> thx
16:05:26 <slaweq> and good luck with debugging
16:05:28 <slaweq> :)
16:05:36 <slaweq> ok, next one:
16:05:37 <njohnston> :)
16:05:43 <slaweq> ralonsoh to check functional tests timeouts https://bugs.launchpad.net/neutron/+bug/1854462
16:05:43 <openstack> Launchpad bug 1854462 in neutron "[Functional tests] Timeout exception in list_namespace_pids" [High,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:05:56 <ralonsoh> I wrote a small script for this
16:06:12 <ralonsoh> http://paste.openstack.org/show/787322/
16:06:27 <ralonsoh> and I added log messages in pyroute2
16:07:01 <ralonsoh> I detected that most of the time, the blocking method was https://github.com/svinota/pyroute2/blob/master/pyroute2/netns/__init__.py#L209
16:07:20 <ralonsoh> so instead of calling it every time we call create/delete namespace
16:07:44 <ralonsoh> I create the object once (in the root context, see patch https://review.opendev.org/#/c/698039/)
16:07:46 <ralonsoh> that's all
16:09:01 <slaweq> smart :)
16:09:15 <ralonsoh> BTW, _CDLL = ctypes.CDLL(ctypes_util.find_library('c'), use_errno=True)
16:09:25 <ralonsoh> this MUST not change during the execution
16:09:59 <njohnston> maybe put a comment in to that effect?
16:10:25 <ralonsoh> njohnston, I mean: the library can't be modified
16:10:32 <ralonsoh> this is not going to happen
16:10:36 <njohnston> ok
16:11:27 <slaweq> ok
16:11:40 <slaweq> thx ralonsoh for working on this
16:11:51 <slaweq> I hope we will get rid of those timeouts with this patch
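A minimal sketch of the caching idea ralonsoh describes above, assuming the slow part is the per-call libc lookup; the helper name is illustrative and this is not the exact code of https://review.opendev.org/#/c/698039/:

    import ctypes
    from ctypes import util as ctypes_util

    # Load libc once, in the root/module context. find_library() is the
    # expensive call, and the C library cannot change while the process is
    # running, so the handle can be cached and reused safely.
    _CDLL = ctypes.CDLL(ctypes_util.find_library('c'), use_errno=True)

    def get_cdll():
        # Hypothetical helper: return the cached handle instead of loading
        # libc again on every namespace create/delete call.
        return _CDLL

The cached handle would then be reused by the namespace create/delete code paths instead of being rebuilt on each invocation, which is where the time was going per the analysis above.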
16:11:57 <slaweq> next one
16:11:59 <slaweq> slaweq to check reason of grenade jobs failures
16:12:02 <bcafarel> looks nice indeed
16:12:02 <slaweq> I checked it
16:12:25 <slaweq> and it seems that all those failures are related to https://bugs.launchpad.net/nova/+bug/1844929
16:12:25 <openstack> Launchpad bug 1844929 in OpenStack Compute (nova) "grenade jobs failing due to "Timed out waiting for response from cell" in scheduler" [High,Confirmed]
16:13:06 <slaweq> and as I talked with efried and mriedem yesterday, it is probably caused by oversubscribed CI nodes
16:13:27 <slaweq> so we don't have any good solution for that problem now
16:13:54 <slaweq> only 2 possible options imo are:
16:14:13 <slaweq> 1. live with it like it is now
16:14:34 <slaweq> 2. make grenade jobs non-voting and non-gating temporarily until this issue is solved
16:14:49 <slaweq> problem with 2 is that we don't know when it may possibly be fixed
16:15:21 <ralonsoh> pfff all grenade jobs?
16:15:38 <ralonsoh> mark them as non-voting?
16:15:41 <slaweq> ralonsoh: we have now only 2 multinode grenade jobs
16:15:42 <njohnston> I wonder if it would be worthwhile to email openstack-discuss and ask if anything can be done about the oversubscription
16:15:50 <slaweq> we removed single node jobs
16:16:01 <ralonsoh> yes, but several tests
16:16:04 <ralonsoh> I mean tests
16:16:28 <slaweq> njohnston: there are such threads started by mriedem here http://lists.openstack.org/pipermail/openstack-discuss/2019-October/thread.html#10484 and continued here http://lists.openstack.org/pipermail/openstack-discuss/2019-November/thread.html#10502
16:17:00 <clarkb> we are our own noisy neighbors in many cases. One way to address oversubscription is to make our software run more efficiently
16:17:19 <clarkb> devstack jobs swap and I've asked several times that openstack address this
16:17:43 <clarkb> I think fixing swapping will likely have a major impact on performance related problems
16:19:40 <slaweq> clarkb: so we should focus on optimizing Neutron's memory usage to make this work better, correct?
16:21:04 <ralonsoh> slaweq, agree with this but we usually optimize the speed, not the memory consumption
16:21:28 <slaweq> yep
16:21:30 <ralonsoh> most of our efforts are in optimizing the DB access, the parallelism, etc
16:22:19 <clarkb> slaweq: and the rest of openstack/devstack
16:22:27 <clarkb> etcd seems like it isn't used but always runs
16:22:31 <clarkb> cinder backup too
16:22:57 <clarkb> but it all adds up, then you start swapping which impacts the current job and all other jobs trying to access those disk resources
16:23:33 <slaweq> clarkb: ok, I will check those jobs and maybe will be able to disable some of those unused services there
16:23:54 <slaweq> thx for the tips
16:24:17 <slaweq> #action slaweq to check and try to optimize neutron-grenade-multinode jobs memory consumption
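To illustrate the kind of trimming clarkb suggests, a hedged sketch of how unused services could be turned off via devstack's local.conf; whether etcd3 and c-bak (the devstack service names for etcd and cinder-backup) can really be dropped from these particular jobs still needs to be verified:

    [[local|localrc]]
    # etcd and cinder-backup are started by default but may not be exercised
    # by the grenade jobs; disabling them frees memory and reduces swapping.
    disable_service etcd3
    disable_service c-bak

Zuul-native devstack jobs expose the same knob through the devstack_services job variable, so the change could also live in the job definition rather than in local.conf.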
16:24:39 <slaweq> ok, I think we can move on
16:24:44 <slaweq> next topic
16:24:46 <slaweq> #topic Stadium projects
16:24:52 <slaweq> tempest-plugins migration
16:24:57 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:25:10 <slaweq> last 2 patches for neutron-vpnaas are ready for review now
16:25:21 <slaweq> Step 1: https://review.openstack.org/#/c/649373
16:25:23 <slaweq> Step 2: https://review.opendev.org/#/c/695834
16:25:29 <slaweq> I just +2'ed Step 1 patch
16:25:49 <slaweq> and in step 2 we will probably need to switch centos based job to be non-voting
16:26:03 <slaweq> as code isn't compatible with py27 anymore
16:26:58 <bcafarel> yes we need centos7+py3, or centos8 (when possible)
16:26:59 <njohnston> agreed
16:27:29 <ralonsoh> (I can't run devstack with centos8)
16:27:39 <ralonsoh> some libraries are missing
16:28:18 <slaweq> maybe we should switch this job to be fedora based?
16:28:29 <slaweq> but that can be IMO done as follow up patch
16:28:32 <ralonsoh> F29 is working, not F30
16:28:45 <ralonsoh> sure, in another patch
16:29:28 <bcafarel> yes, let's wrap up tempest plugin migration first
16:30:09 <slaweq> yeah :)
16:30:27 <slaweq> so I hope mlavalle will push one more PS soon and we will be done with this finally
16:30:41 <slaweq> I hope next week we will move it out from meeting agenda
16:31:11 <bcafarel> :)
16:31:38 <slaweq> next stadium related topic is
16:31:40 <slaweq> Neutron Train - Drop py27 and standardize on zuul v3
16:31:45 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
16:31:54 <slaweq> njohnston: any updates on this since yesterday? :)
16:32:10 <njohnston> Nope, I thought I saw bcafarel doing something related though
16:32:32 <bcafarel> yes I sent a few phase 1 patches for review (links in etherpad)
16:32:42 <bcafarel> also I got feedback on openstack-python-jobs-neutron jobs
16:33:10 <bcafarel> these are in fact now legacy set and should not be touched, we should move to openstack-python3-ussuri-jobs-neutron
16:34:21 <bcafarel> I updated status for some projects too (reviews merged so done for them)
16:34:32 <njohnston> thanks very much bcafarel!
16:35:15 <slaweq> bcafarel++ thx a lot
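As a rough illustration of the template switch bcafarel mentions (the exact template list differs per stadium project, so treat this as a sketch rather than the actual patch), the project stanza in a .zuul.yaml would change along these lines:

    - project:
        templates:
          # the legacy openstack-python-jobs-neutron template is dropped in
          # favour of the per-cycle python3 template
          - openstack-python3-ussuri-jobs-neutron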
16:35:35 * njohnston sees slaweq switching between channels, always in demand!
16:36:13 <bcafarel> everybody always looking for the PTL
16:36:33 <slaweq> njohnston: yes, I'm trying
16:36:37 <slaweq> but it's hard :)
16:37:03 <slaweq> ok, I think that is all related to stadium projects for today
16:37:23 <slaweq> or maybe You have anything else You want to discuss today?
16:37:33 <slaweq> if not, lets move on to the next topic
16:38:34 <slaweq> ok, lets move on then
16:38:46 <slaweq> #topic Grafana
16:38:54 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:39:10 <slaweq> first thing which I want to mention is
16:39:22 <slaweq> that I sent today cleaning patch for grafana dashboard: https://review.opendev.org/698264
16:40:30 <slaweq> other than that, I don't see any issues on grafana
16:40:39 <slaweq> jobs look pretty good this week
16:40:59 <njohnston> I see the ironic cogating problem as well as tripleo-standalone, and midonet cogating looks like it is getting better
16:42:08 <slaweq> njohnston: yes, I noticed those too
16:42:14 <slaweq> and I forgot about them now :)
16:42:17 <slaweq> sorry
16:42:50 <njohnston> do we know what is up with tripleo-standalone?
16:42:59 <slaweq> nope
16:43:16 <slaweq> is there any volunteer to check both of those jobs?
16:44:14 <ralonsoh> can we ping yamamoto for midonet?
16:44:20 <ralonsoh> sorry for being lazy
16:44:34 <njohnston> well midonet is fixed now
16:44:41 <ralonsoh> oh yes, sorry
16:44:47 <njohnston> it's the other two that are having issues
16:44:53 <slaweq> ralonsoh: midonet is fixed by skipping one failing test
16:45:09 <ralonsoh> slaweq, give one to me
16:45:15 <ralonsoh> I'll take a look this week
16:46:13 <slaweq> ralonsoh: ok, pick whichever You want
16:46:17 <ralonsoh> ironic
16:47:04 <slaweq> ralonsoh: ok :)
16:47:08 <slaweq> so I will check tripleo
16:47:47 <slaweq> ralonsoh: please check on neutron-channel, I just spoke with TheJulia about one issue in dhcp agent on ironic job
16:47:54 <ralonsoh> sure
16:47:55 <slaweq> maybe it's the same issue (idk)
16:48:15 <slaweq> thx
16:48:30 <slaweq> #action ralonsoh to check ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa job
16:48:44 <slaweq> #action slaweq to check tripleo job
16:49:09 <slaweq> ok, lets move on then
16:49:25 <slaweq> I don't have any new issues with scenario/functional/fullstack jobs for today
16:49:28 <slaweq> which is very good
16:49:31 <slaweq> \o/
16:49:39 <slaweq> but I have one issue with periodic jobs
16:49:40 <slaweq> #topic Periodic
16:50:00 <slaweq> recently we added a periodic job which runs on mariadb instead of mysql
16:50:04 <slaweq> and it is failing now:
16:50:10 <slaweq> https://b12f79f00ace923cb903-227be9d6f8442281010ef49b8394f34d.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-mariadb-full/18fecee/job-output.txt
16:50:18 <ralonsoh> +1
16:50:26 <slaweq> it seems that our new ovn related code is broken on mariadb
16:50:42 <ralonsoh> uhmmmm ok
16:50:46 <ralonsoh> I'll take a look
16:50:52 <slaweq> ralonsoh: thx
16:50:55 <ralonsoh> (I did the DB migration)
16:51:41 <slaweq> ralonsoh: maybe it's again an issue with mariadb 10.1
16:51:46 <slaweq> and on 10.4 it will work fine
16:51:57 <ralonsoh> do you have a link?
16:51:59 <slaweq> as we had already with one other db migration script some time ago
16:52:05 <slaweq> ralonsoh: link to what?
16:52:19 <ralonsoh> the problem in mariadb 10.1
16:52:33 <slaweq> give me a sec
16:52:39 <ralonsoh> np, we can talk tomorrow
16:52:54 <slaweq> https://bugs.launchpad.net/kolla-ansible/+bug/1841907
16:52:54 <openstack> Launchpad bug 1841907 in neutron "Neutron bootstrap failing on Ubuntu bionic with Cannot change column 'network_id" [Critical,Confirmed]
16:53:04 <ralonsoh> upsss also mine!!
16:53:10 <slaweq> lol
16:53:14 <ralonsoh> I did this change too
16:53:26 <slaweq> You are doing many patches so some of them may break things ;)
16:53:41 <ralonsoh> the point is that in mysql and postgres that was working
16:53:55 <slaweq> yes
16:54:10 <slaweq> that's why I proposed the mariadb periodic job
16:54:20 <slaweq> as there are differences between mysql and mariadb now
16:55:44 <slaweq> ok, so ralonsoh You will check that, right?
16:55:48 <ralonsoh> yes
16:55:50 <slaweq> thx
16:55:59 <slaweq> #action ralonsoh to check periodic mariadb job failures
16:56:04 <slaweq> ok
16:56:12 <slaweq> so that's all I had for today
16:56:46 <slaweq> overall I think that we are now in really good shape with our CI, many patches were merged recently without dozens of rechecks
16:57:03 <slaweq> so thx for working on CI improvements guys :)
16:57:11 <njohnston> \o/
16:57:13 <bcafarel> nice!
16:57:21 <ralonsoh> fantastic
16:57:31 <slaweq> have a great evening and see You online
16:57:33 <slaweq> o/
16:57:35 <slaweq> #endmeeting