15:00:10 <slaweq> #startmeeting neutron_ci
15:00:11 <openstack> Meeting started Tue Nov  3 15:00:10 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:13 <slaweq> welcome back :)
15:00:14 <openstack> The meeting name has been set to 'neutron_ci'
15:00:37 <bcafarel> long time no see :)
15:01:16 <ralonsoh> hi
15:01:18 <slaweq> bcafarel: yeah :D
15:01:22 <obondarev> o/
15:01:56 <mlavalle> o/
15:02:21 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:27 <slaweq> lets open it now and we can start
15:02:35 <slaweq> #topic Actions from previous meetings
15:02:48 <slaweq> slaweq to propose patch to check console log before ssh to instance
15:02:54 <slaweq> Done: https://review.opendev.org/#/c/758968/
15:03:07 <ralonsoh> +1 to this patch
15:03:15 <slaweq> and TBH I didn't see AuthenticationFailure errors in neutron-tempest-plugin jobs in the last few days
15:03:27 <slaweq> so it seems that it really helps
15:03:37 <ralonsoh> until we find/fix the error in paramiko, that will help
15:03:54 <slaweq> I will try to do something similar to tempest also
15:04:17 <slaweq> #action slaweq to propose patch to check console log before ssh to instance in tempest
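For context on the approach discussed above, the sketch below shows the general idea of checking the console log before trying SSH. It is not the code from review 758968; get_console_output and the "login:" marker are placeholders used only for illustration. Failing here, instead of deep inside paramiko, keeps AuthenticationFailure noise out of the results when the real problem is a guest that never finished booting.

    # Sketch of the "check console log before SSH" idea; not the actual
    # patch. get_console_output is an assumed callable that returns the
    # guest console log, and "login:" is just an example readiness marker.
    import time

    def wait_for_guest_boot(get_console_output, marker="login:",
                            timeout=300, interval=10):
        """Block until the marker shows up in the console log, else raise."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            console = get_console_output() or ""
            if marker in console:
                return console
            time.sleep(interval)
        raise TimeoutError(
            "guest never printed %r to its console within %ss; failing "
            "before attempting SSH" % (marker, timeout))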
15:04:35 <slaweq> next one
15:04:37 <slaweq> bcafarel to update grafana dashboard for master branch
15:05:02 <bcafarel> not merged yet, but has a +2 https://review.opendev.org/#/c/758208/
15:05:49 <slaweq> thx
15:06:05 <slaweq> ok, last one from previous meeting
15:06:07 <slaweq> slaweq to check failing neutron-grenade-ovn job
15:06:13 <slaweq> I still didn't have time for that
15:06:18 <slaweq> #action slaweq to check failing neutron-grenade-ovn job
15:06:36 <slaweq> and that's all actions from last week
15:06:41 <slaweq> lets move on
15:06:43 <slaweq> #topic Stadium projects
15:06:54 <slaweq> lajoskatona: anything regarding stadium projects and ci?
15:07:11 <lajoskatona> Hi
15:07:19 <lajoskatona> nothing new
15:07:36 <lajoskatona> I'm still in the recovery phase after PTG, sorry
15:07:50 <slaweq> AFAICT CI for the stadium projects is pretty stable, at least I didn't see many failures
15:08:09 <lajoskatona> yeah the problems appear mostly in older branches
15:08:24 <bcafarel> for which I have a PTG action item I think :)
15:08:53 <slaweq> yeah, to check which ones are broken and should be moved to "unmaintained" phase
15:08:55 <slaweq> :)
15:09:44 <slaweq> btw. there is one thing regarding stadium, mlavalle please check https://review.opendev.org/#/q/topic:neutron-victoria+(status:open+OR+status:merged)
15:09:53 <slaweq> those are patches for stable/victoria
15:10:12 <slaweq> we need to switch there to use neutron-tempest-plugin-victoria jobs
15:10:59 <slaweq> if there is nothing more related to the stadium, lets move on
15:11:01 <slaweq> next topic
15:11:03 <slaweq> #topic Stable branches
15:11:08 <slaweq> Victoria dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:11:10 <slaweq> Ussuri dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:11:28 <mlavalle> there are patches for master also in that url
15:11:37 <mlavalle> or am I misunderstanding?
15:11:51 <slaweq> mlavalle: no, only for stable/victoria
15:12:14 <slaweq> in master branch we are still using base neutron-tempest-plugin jobs
15:12:30 <slaweq> but for stable/victoria we need to run jobs dedicated for stable/victoria
15:12:44 <mlavalle> ok, I think I clicked it wrong
15:12:57 <slaweq> :)
15:13:52 <slaweq> bcafarel: any new issues with stable branches?
15:14:31 <bcafarel> nothing I spotted this week, hopefully we have rocky/queens back on track now thanks to your patch
15:14:46 <slaweq> yes, this should be better with https://review.opendev.org/#/c/758377/ :)
15:15:38 <slaweq> ok, so lets move on
15:15:46 <slaweq> #topic Grafana
15:15:52 <slaweq> #link http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1
15:16:03 <lajoskatona> I have to leave now, perhaps I can join later (in 30 minutes) if I find wifi to connect to....
15:17:41 <slaweq> still I think that most failing jobs are (non-voting) ovn related jobs
15:18:23 <slaweq> like e.g. http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?viewPanel=18&orgId=1
15:18:39 <slaweq> is there any volunteer who would want to check those failures?
15:18:46 <slaweq> jlibosva ?
15:18:48 <slaweq> :)
15:19:31 <jlibosva> I can put it on my todo list :)
15:20:02 <slaweq> jlibosva: thx
15:20:07 <bcafarel> so we have at least https://bugs.launchpad.net/neutron/+bug/1902512
15:20:09 <openstack> Launchpad bug 1902512 in neutron "neutron-ovn-tripleo-ci-centos-8-containers-multinode fails on private networ creation (mtu size)" [Medium,Triaged]
15:21:17 <slaweq> yes, this one sounds like a serious one because it happens often
15:21:30 <slaweq> but I think that in other, devstack based jobs there are other failures
15:22:12 <slaweq> ok, except that I think it's "normal" on grafana
15:22:22 <slaweq> so we can move on to the specific jobs and failures
15:22:24 <slaweq> ok?
15:23:54 <slaweq> #topic fullstack/functional
15:23:58 <bcafarel> sounds good
15:24:10 <slaweq> here I found couple of new issues
15:24:14 <slaweq> first functional tests
15:24:31 <slaweq> I (again) saw a job timeout due to the high amount of logs:
15:24:38 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6dc/755752/3/check/neutron-functional-with-uwsgi/6dcca4f/job-output.txt
15:24:49 <slaweq> it's an old issue with stestr iirc
15:25:14 <slaweq> I will open LP for that today
15:25:31 <slaweq> any volunteer to take a look and maybe try to avoid some logs being sent to stdout?
15:25:52 <ralonsoh> not this week, sorry
15:26:40 <slaweq> ok, I will open LP and if someone has time, You can take it :)
15:26:57 <slaweq> now fullstack
15:27:06 <slaweq> I reported bug https://bugs.launchpad.net/neutron/+bug/1902678 today
15:27:07 <openstack> Launchpad bug 1902678 in neutron "[Fullstack] Wrong output of the cmdline causes tests timeouts" [Critical,Confirmed]
15:27:16 <slaweq> I saw it at least 3 times recently
15:27:57 <slaweq> basically if that happens, it will fail many tests as all of them will timeout while waiting for the dhcp agent process to be spawned
15:28:40 <slaweq> it looks like in  https://zuul.opendev.org/t/openstack/build/40affb0d6e0844369a293b05dea0e42c/log/controller/logs/dsvm-fullstack-logs/TestHAL3Agent.test_gateway_ip_changed.txt
15:29:28 <slaweq> anyone interested in checking that?
15:30:37 <ralonsoh> ok, I'll take a look
15:30:41 <slaweq> thx
15:31:10 <slaweq> #action ralonsoh to check fullstack issue https://bugs.launchpad.net/neutron/+bug/1902678
15:31:11 <openstack> Launchpad bug 1902678 in neutron "[Fullstack] Wrong output of the cmdline causes tests timeouts" [Critical,Confirmed]
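The bug above boils down to a wait loop whose predicate never becomes true. A minimal illustration of that polling pattern is below; the helper names are hypothetical and only stand in for the real fullstack utilities, which poll until a process with the expected cmdline appears. When the cmdline read back is wrong, the check stays false and every affected test burns its full timeout.

    # Minimal illustration of the wait-until-spawned pattern behind the
    # timeouts in bug 1902678; helper names are hypothetical.
    import time

    def wait_until_true(predicate, timeout=60, sleep=1):
        deadline = time.time() + timeout
        while time.time() < deadline:
            if predicate():
                return
            time.sleep(sleep)
        raise TimeoutError("condition not met within %s seconds" % timeout)

    def dhcp_agent_spawned(list_cmdlines):
        # list_cmdlines is an assumed callable returning the cmdline of each
        # candidate process (e.g. read from /proc/<pid>/cmdline); if it
        # returns a wrong or truncated cmdline, this stays False and the
        # wait above can only end in a timeout.
        return any("dhcp-agent" in cmd for cmd in list_cmdlines())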
15:31:27 <slaweq> ok, lets move on to the scenario jobs now
15:31:29 <slaweq> #topic Tempest/Scenario
15:32:10 <slaweq> first issue which I saw few times are problems with cinder volumes
15:32:26 <slaweq> like e.g. https://2f507ad644729ed0a17c-1abd4c4163ab8d95786215227f5e857f.ssl.cf5.rackcdn.com/758098/7/check/tempest-slow-py3/b2b284b/testr_results.html or https://zuul.opendev.org/t/openstack/build/eec3c390c2944d0ab56460c75d0383fa/logs
15:32:42 <slaweq> and I was thinking about maybe blacklisting those failing cinder tests in our jobs?
15:32:47 <slaweq> wdyt about it?
15:33:31 <bcafarel> can it be done easily in zuul? this is a global job definition no?
15:33:57 <slaweq> bcafarel: tempest-slow-py3 is defined in tempest repo
15:33:58 <bcafarel> (if doable definitely +1 with "to restore once cinder is fixed")
15:34:02 <bcafarel> oh nice
15:34:09 <slaweq> but neutron-tempest-multinode-full-py3 is defined in neutron
15:34:35 <slaweq> but for tempest-slow-py3 we can do our job "neutron-tempest-slow-py3" and blacklist such tests there
15:34:58 <slaweq> I don't know if gmann will be happy with that when he discovers it but we can try IMO ;)
15:35:55 <gmann> slaweq: you mean do neutron-tempest-slow-py3 like we did for integrated job?
15:36:13 <slaweq> gmann: yes, something like that
15:36:29 <slaweq> but also without "volume" tests
15:36:37 <slaweq> which are failing pretty often in our jobs
15:36:49 <gmann> i think that makes sense, I will say we did not do it for the slow/multinode job but we should
15:36:56 <slaweq> and we are trying to make our CI a bit more stable because now it is a nightmare
15:37:00 <gmann> +1
15:37:05 <slaweq> thx :)
15:37:49 <slaweq> ok, so I will do it in our repo
15:38:13 <slaweq> #action slaweq to blacklist some cinder related tests in the neutron-tempest-* jobs
15:38:19 <gmann> either is fine, i think it will be used in neutron so neutron-tempest-slow-py in neutron repo makes sense
15:38:29 <gmann> if it is more than neutron then we can do in tempest repo
15:38:44 <slaweq> gmann: ok
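As a side note on what "blacklisting" amounts to here: the job would exclude tests whose IDs match a regex. The snippet below is only a toy illustration of how such a filter behaves; the pattern and test names are made up, not the actual list to be excluded from the neutron-tempest-* jobs.

    # Toy illustration of filtering test IDs with an exclude regex; the
    # pattern and the test names are invented for the example.
    import re

    EXCLUDE = re.compile(r"tempest\.(api\.volume|scenario\.test_volume)")

    test_ids = [
        "tempest.api.volume.test_volumes_get.VolumesGetTest"
        ".test_volume_create_get_delete",
        "tempest.scenario.test_network_basic_ops.TestNetworkBasicOps"
        ".test_network_basic_ops",
    ]
    kept = [t for t in test_ids if not EXCLUDE.search(t)]
    print(kept)  # only the network scenario test remains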
15:40:21 <slaweq> ok, lets move to the grenade jobs now
15:40:47 <slaweq> and with grenade jobs I have one "issue" but maybe it's just my misunderstanding of something
15:41:05 <slaweq> it seems that in multinode grenade jobs services on compute-1 node aren't upgraded
15:41:23 <slaweq> and that is causing failure with unsupported ovo version in e.g. my patch https://9a7a3a32fbdea177beae-de1ec222256e01db8c1f1f4d7a4b9170.ssl.cf5.rackcdn.com/749158/8/check/neutron-grenade-multinode/a45a7ed/compute1/logs/screen-neutron-agent.txt
15:41:54 <slaweq> now the question is - should it be like that and should we be able to run the compute node with older agents
15:42:02 <slaweq> or should we upgrade those agents too?
15:42:05 <slaweq> do You know?
15:42:55 <bcafarel> hmm newer controller and older compute should work no?
15:43:08 <bcafarel> though in grenade I expected a full upgrade
15:44:01 <slaweq> ok, so I will investigate why it's not working if it should
15:44:28 <slaweq> gmann: also, I found out recently that in grenade jobs on subnodes we are using lib/neutron instead of lib/neutron-legacy
15:44:34 <slaweq> gmann: can You check https://review.opendev.org/#/c/759199/ maybe?
15:44:49 <gmann> slaweq: sure
15:44:58 <slaweq> thx
15:45:21 <slaweq> ok, and that's basically all from me for today
15:45:41 <slaweq> please remember to check failed jobs before recheck
15:45:53 <mlavalle> slaweq: I think hangyan had a similar issue with one of his patches and grenade
15:45:56 <slaweq> and write the related bug number (or open a new one if needed) while rechecking
15:46:03 <slaweq> mlavalle: yes, I know
15:46:25 <slaweq> but I will check on my patch why it's like that, maybe we are doing something wrong there :)
15:46:37 <mlavalle> ok
15:46:43 <mlavalle> I'll mention this to him
15:47:00 <slaweq> thx
15:48:17 <slaweq> ok, if there is nothing else to be discussed today, I will give You a few minutes back
15:48:21 <slaweq> thx for attending the meeting
15:48:27 <slaweq> and see You online
15:48:27 <ralonsoh> bye
15:48:29 <slaweq> o/
15:48:32 <slaweq> #endmeeting