15:00:10 <slaweq> #startmeeting neutron_ci
15:00:11 <openstack> Meeting started Tue Nov 3 15:00:10 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:13 <slaweq> welcome back :)
15:00:14 <openstack> The meeting name has been set to 'neutron_ci'
15:00:37 <bcafarel> long time no see :)
15:01:16 <ralonsoh> hi
15:01:18 <slaweq> bcafarel: yeah :D
15:01:22 <obondarev> o/
15:01:56 <mlavalle> o/
15:02:21 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:27 <slaweq> lets open it now and we can start
15:02:35 <slaweq> #topic Actions from previous meetings
15:02:48 <slaweq> slaweq to propose patch to check console log before ssh to instance
15:02:54 <slaweq> Done: https://review.opendev.org/#/c/758968/
15:03:07 <ralonsoh> +1 to this patch
15:03:15 <slaweq> and TBH I didn't see AuthenticationFailure errors in neutron-tempest-plugin jobs in the last few days
15:03:27 <slaweq> so it seems that it really could help
15:03:37 <ralonsoh> until we find/fix the error in paramiko, that will help
15:03:54 <slaweq> I will try to do something similar for tempest also
15:04:17 <slaweq> #action slaweq to propose patch to check console log before ssh to instance in tempest
15:04:35 <slaweq> next one
15:04:37 <slaweq> bcafarel to update grafana dashboard for master branch
15:05:02 <bcafarel> not merged yet, but has a +2 https://review.opendev.org/#/c/758208/
15:05:49 <slaweq> thx
15:06:05 <slaweq> ok, last one from previous meeting
15:06:07 <slaweq> slaweq to check failing neutron-grenade-ovn job
15:06:13 <slaweq> I still didn't have time for that
15:06:18 <slaweq> #action slaweq to check failing neutron-grenade-ovn job
15:06:36 <slaweq> and that's all the actions from last week
15:06:41 <slaweq> lets move on
15:06:43 <slaweq> #topic Stadium projects
15:06:54 <slaweq> lajoskatona: anything regarding stadium projects and ci?
15:07:11 <lajoskatona> Hi
15:07:19 <lajoskatona> nothing new
15:07:36 <lajoskatona> I'm still in the recovery phase after PTG, sorry
15:07:50 <slaweq> AFAICT for stadium projects it is pretty stable, at least I didn't see many failures
15:08:09 <lajoskatona> yeah the problems appear mostly in older branches
15:08:24 <bcafarel> for which I have a PTG action item I think :)
15:08:53 <slaweq> yeah, to check which ones are broken and should be moved to the "unmaintained" phase
15:08:55 <slaweq> :)
15:09:44 <slaweq> btw. there is one thing regarding stadium, mlavalle please check https://review.opendev.org/#/q/topic:neutron-victoria+(status:open+OR+status:merged)
15:09:53 <slaweq> those are patches for stable/victoria
15:10:12 <slaweq> we need to switch there to use the neutron-tempest-plugin-victoria jobs
15:10:59 <slaweq> if there is nothing more related to the stadium, lets move on
15:11:01 <slaweq> next topic
15:11:03 <slaweq> #topic Stable branches
15:11:08 <slaweq> Victoria dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:11:10 <slaweq> Ussuri dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:11:28 <mlavalle> there are patches for master also in that url
15:11:37 <mlavalle> or am I misunderstanding?
15:11:51 <slaweq> mlavalle: no, only for stable/victoria
15:12:14 <slaweq> in the master branch we are still using the base neutron-tempest-plugin jobs
15:12:30 <slaweq> but for stable/victoria we need to run the jobs dedicated to stable/victoria
15:12:44 <mlavalle> ok, I think I clicked it wrong
15:12:57 <slaweq> :)
15:13:52 <slaweq> bcafarel: any new issues with stable branches?
15:14:31 <bcafarel> nothing I spotted this week, hopefully we have rocky/queens back on track now thanks to your patch
15:14:46 <slaweq> yes, this should be better with https://review.opendev.org/#/c/758377/ :)
15:15:38 <slaweq> ok, so lets move on
15:15:46 <slaweq> #topic Grafana
15:15:52 <slaweq> #link http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1
15:16:03 <lajoskatona> I have to leave now, perhaps I can join later (in 30 minutes) if I find wifi to connect....
15:17:41 <slaweq> I still think that the most-failing jobs are the (non-voting) ovn-related jobs
15:18:23 <slaweq> like e.g. http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?viewPanel=18&orgId=1
15:18:39 <slaweq> is there any volunteer who wants to check those failures?
15:18:46 <slaweq> jlibosva ?
15:18:48 <slaweq> :)
15:19:31 <jlibosva> I can put it on my todo list :)
15:20:02 <slaweq> jlibosva: thx
15:20:07 <bcafarel> so we have at least https://bugs.launchpad.net/neutron/+bug/1902512
15:20:09 <openstack> Launchpad bug 1902512 in neutron "neutron-ovn-tripleo-ci-centos-8-containers-multinode fails on private networ creation (mtu size)" [Medium,Triaged]
15:21:17 <slaweq> yes, this one sounds like a serious one because it happens often
15:21:30 <slaweq> but I think that in the other, devstack-based jobs there are other failures
15:22:12 <slaweq> ok, except that I think it's "normal" on grafana
15:22:22 <slaweq> so we can move on to the specific jobs and failures
15:22:24 <slaweq> ok?
15:23:54 <slaweq> #topic fullstack/functional
15:23:58 <bcafarel> sounds good
15:24:10 <slaweq> here I found a couple of new issues
15:24:14 <slaweq> first, functional tests
15:24:31 <slaweq> I (again) saw a job timeout due to the high amount of logs:
15:24:38 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6dc/755752/3/check/neutron-functional-with-uwsgi/6dcca4f/job-output.txt
15:24:49 <slaweq> it's an old issue with stestr iirc
15:25:14 <slaweq> I will open an LP for that today
15:25:31 <slaweq> any volunteer to take a look and maybe try to avoid some logs being sent to stdout?
15:25:52 <ralonsoh> not this week, sorry
15:26:40 <slaweq> ok, I will open the LP and if someone has time, You can take it :)
15:26:57 <slaweq> now fullstack
15:27:06 <slaweq> I reported bug https://bugs.launchpad.net/neutron/+bug/1902678 today
15:27:07 <openstack> Launchpad bug 1902678 in neutron "[Fullstack] Wrong output of the cmdline causes tests timeouts" [Critical,Confirmed]
15:27:16 <slaweq> I saw it at least 3 times recently
15:27:57 <slaweq> basically if that happens, it will fail many tests, as all of them will time out while waiting for the dhcp agent process to be spawned
15:28:40 <slaweq> it looks like in https://zuul.opendev.org/t/openstack/build/40affb0d6e0844369a293b05dea0e42c/log/controller/logs/dsvm-fullstack-logs/TestHAL3Agent.test_gateway_ip_changed.txt
15:29:28 <slaweq> anyone interested in checking that?
15:30:37 <ralonsoh> ok, I'll take a look
15:30:41 <slaweq> thx
15:31:10 <slaweq> #action ralonsoh to check fullstack issue https://bugs.launchpad.net/neutron/+bug/1902678
15:31:11 <openstack> Launchpad bug 1902678 in neutron "[Fullstack] Wrong output of the cmdline causes tests timeouts" [Critical,Confirmed]
15:31:27 <slaweq> ok, lets move on to the scenario jobs now
15:31:29 <slaweq> #topic Tempest/Scenario
15:32:10 <slaweq> the first issue, which I saw a few times, is problems with cinder volumes
15:32:26 <slaweq> like e.g. https://2f507ad644729ed0a17c-1abd4c4163ab8d95786215227f5e857f.ssl.cf5.rackcdn.com/758098/7/check/tempest-slow-py3/b2b284b/testr_results.html or https://zuul.opendev.org/t/openstack/build/eec3c390c2944d0ab56460c75d0383fa/logs
15:32:42 <slaweq> and I was thinking about maybe blacklisting those failing cinder tests in our jobs?
15:32:47 <slaweq> wdyt about it?
15:33:31 <bcafarel> can it be done easily in zuul? this is a global job definition no?
15:33:57 <slaweq> bcafarel: tempest-slow-py3 is defined in the tempest repo
15:33:58 <bcafarel> (if doable definitely +1 with "to restore once cinder is fixed")
15:34:02 <bcafarel> oh nice
15:34:09 <slaweq> but neutron-tempest-multinode-full-py3 is defined in neutron
15:34:35 <slaweq> but for tempest-slow-py3 we can make our own job "neutron-tempest-slow-py3" and blacklist such tests there
15:34:58 <slaweq> I don't know if gmann will be happy with that if he discovers it, but we can try IMO ;)
15:35:55 <gmann> slaweq: you mean do neutron-tempest-slow-py3 like we did for the integrated job?
15:36:13 <slaweq> gmann: yes, something like that
15:36:29 <slaweq> but also without the "volume" tests
15:36:37 <slaweq> which are failing pretty often in our jobs
15:36:49 <gmann> i think that makes sense, I will say we did not do it for the slow/multinode job but we should
15:36:56 <slaweq> and we are trying to make our CI a bit more stable because now it is a nightmare
15:37:00 <gmann> +1
15:37:05 <slaweq> thx :)
15:37:49 <slaweq> ok, so I will do it in our repo
15:38:13 <slaweq> #action slaweq to blacklist some cinder related tests in the neutron-tempest-* jobs
15:38:19 <gmann> either is fine, i think it will be used in neutron so neutron-tempest-slow-py3 in the neutron repo makes sense
15:38:29 <gmann> if it is more than neutron then we can do it in the tempest repo
15:38:44 <slaweq> gmann: ok
15:40:21 <slaweq> ok, lets move to the grenade jobs now
15:40:47 <slaweq> and with grenade jobs I have one "issue", but maybe it's just my misunderstanding of something
15:41:05 <slaweq> it seems that in multinode grenade jobs the services on the compute-1 node aren't upgraded
15:41:23 <slaweq> and that is causing a failure with an unsupported ovo version in e.g. my patch https://9a7a3a32fbdea177beae-de1ec222256e01db8c1f1f4d7a4b9170.ssl.cf5.rackcdn.com/749158/8/check/neutron-grenade-multinode/a45a7ed/compute1/logs/screen-neutron-agent.txt
15:41:54 <slaweq> now the question is - should it be like that, and should we be able to run the compute node with older agents
15:42:02 <slaweq> or should we upgrade those agents too?
15:42:05 <slaweq> do You know?
15:42:55 <bcafarel> hmm newer controller and older compute should work no?
15:43:08 <bcafarel> though in grenade I expected a full upgrade
15:44:01 <slaweq> ok, so I will investigate why it's not working if it should
15:44:28 <slaweq> gmann: also, I found out recently that in grenade jobs on subnodes we are using lib/neutron instead of lib/neutron-legacy
15:44:34 <slaweq> gmann: can You check https://review.opendev.org/#/c/759199/ maybe?
15:44:49 <gmann> slaweq: sure
15:44:58 <slaweq> thx
15:45:21 <slaweq> ok, and that's basically all from me for today
15:45:41 <slaweq> please remember to check failed jobs before rechecking
15:45:53 <mlavalle> slaweq: I think hangyan had a similar issue with one of his patches and grenade
15:45:56 <slaweq> and note the related bug (or open a new one if needed) while rechecking
15:46:03 <slaweq> mlavalle: yes, I know
15:46:25 <slaweq> but I will check on my patch why it's like that, maybe we are doing something wrong there :)
15:46:37 <mlavalle> ok
15:46:43 <mlavalle> I'll mention this to him
15:47:00 <slaweq> thx
15:48:17 <slaweq> ok, if there is nothing else to be discussed today, I will give You a few minutes back
15:48:21 <slaweq> thx for attending the meeting
15:48:27 <slaweq> and see You online
15:48:27 <ralonsoh> bye
15:48:29 <slaweq> o/
15:48:32 <slaweq> #endmeeting