15:00:10 #startmeeting neutron_ci
15:00:11 Meeting started Tue Nov 3 15:00:10 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:13 welcome back :)
15:00:14 The meeting name has been set to 'neutron_ci'
15:00:37 long time no see :)
15:01:16 hi
15:01:18 bcafarel: yeah :D
15:01:22 o/
15:01:56 o/
15:02:21 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:27 let's open it now and we can start
15:02:35 #topic Actions from previous meetings
15:02:48 slaweq to propose patch to check console log before ssh to instance
15:02:54 Done: https://review.opendev.org/#/c/758968/
15:03:07 +1 to this patch
15:03:15 and TBH I haven't seen AuthenticationFailure errors in neutron-tempest-plugin jobs in the last few days
15:03:27 so it seems that it really helps
15:03:37 until we find/fix the error in paramiko, that will help
15:03:54 I will try to do something similar in tempest also
15:04:17 #action slaweq to propose patch to check console log before ssh to instance in tempest
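The idea discussed above, roughly sketched: wait until the guest console log shows a known marker before attempting SSH, so that a paramiko AuthenticationFailure can be told apart from a guest that simply is not ready yet. This is only an illustration, not the code from the patch linked above; it assumes tempest's servers_client.get_console_output() API, and the helper name and marker string are hypothetical.

```python
import time


def wait_for_console_marker(servers_client, server_id, marker,
                            timeout=300, interval=10):
    """Poll the nova console log until `marker` shows up or we give up."""
    start = time.time()
    while time.time() - start < timeout:
        # tempest's compute servers client exposes get_console_output()
        output = servers_client.get_console_output(server_id)['output']
        if marker in output:
            return output
        time.sleep(interval)
    raise RuntimeError('marker %r not found in console log of server %s '
                       'after %s seconds' % (marker, server_id, timeout))


# hypothetical usage inside a tempest-style test:
# wait_for_console_marker(self.servers_client, server['id'], marker='login:')
```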
15:04:35 next one
15:04:37 bcafarel to update grafana dashboard for master branch
15:05:02 not merged yet, but has a +2 https://review.opendev.org/#/c/758208/
15:05:49 thx
15:06:05 ok, last one from previous meeting
15:06:07 slaweq to check failing neutron-grenade-ovn job
15:06:13 I still didn't have time for that
15:06:18 #action slaweq to check failing neutron-grenade-ovn job
15:06:36 and that's all actions from last week
15:06:41 let's move on
15:06:43 #topic Stadium projects
15:06:54 lajoskatona: anything regarding stadium projects and ci?
15:07:11 Hi
15:07:19 nothing new
15:07:36 I'm still in the recovery phase after the PTG, sorry
15:07:50 AFAICT for stadium projects it is pretty stable, at least I haven't seen many failures
15:08:09 yeah the problems appear mostly in older branches
15:08:24 for which I have a PTG action item I think :)
15:08:53 yeah, to check which ones are broken and should be moved to the "unmaintained" phase
15:08:55 :)
15:09:44 btw there is one thing regarding stadium, mlavalle please check https://review.opendev.org/#/q/topic:neutron-victoria+(status:open+OR+status:merged)
15:09:53 those are patches for stable/victoria
15:10:12 we need to switch there to use neutron-tempest-plugin-victoria jobs
15:10:59 if there is nothing more related to the stadium, let's move on
15:11:01 next topic
15:11:03 #topic Stable branches
15:11:08 Victoria dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:11:10 Ussuri dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:11:28 there are patches for master also in that url
15:11:37 or am I misunderstanding?
15:11:51 mlavalle: no, only for stable/victoria
15:12:14 in the master branch we are still using the base neutron-tempest-plugin jobs
15:12:30 but for stable/victoria we need to run the jobs dedicated to stable/victoria
15:12:44 ok, I think I clicked it wrong
15:12:57 :)
15:13:52 bcafarel: any new issues with stable branches?
15:14:31 nothing I spotted this week, hopefully we have rocky/queens back on track now thanks to your patch
15:14:46 yes, this should be better with https://review.opendev.org/#/c/758377/ :)
15:15:38 ok, so let's move on
15:15:46 #topic Grafana
15:15:52 #link http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1
15:16:03 I have to leave now, perhaps I can join later (in 30 minutes) if I find wifi to connect...
15:17:41 still I think that most of the failing jobs are the (non-voting) ovn-related jobs
15:18:23 like e.g. http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?viewPanel=18&orgId=1
15:18:39 is there any volunteer who wants to check those failures?
15:18:46 jlibosva ?
15:18:48 :)
15:19:31 I can put it on my todo list :)
15:20:02 jlibosva: thx
15:20:07 so we have at least https://bugs.launchpad.net/neutron/+bug/1902512
15:20:09 Launchpad bug 1902512 in neutron "neutron-ovn-tripleo-ci-centos-8-containers-multinode fails on private networ creation (mtu size)" [Medium,Triaged]
15:21:17 yes, this one sounds like a serious one because it happens often
15:21:30 but I think that in the other, devstack-based jobs there are other failures
15:22:12 ok, except that I think it's "normal" on grafana
15:22:22 so we can move on to the specific jobs and failures
15:22:24 ok?
15:23:54 #topic fullstack/functional
15:23:58 sounds good
15:24:10 here I found a couple of new issues
15:24:14 first, functional tests
15:24:31 I (again) saw a job timeout due to a high amount of logs:
15:24:38 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6dc/755752/3/check/neutron-functional-with-uwsgi/6dcca4f/job-output.txt
15:24:49 it's an old issue with stestr iirc
15:25:14 I will open an LP for that today
15:25:31 any volunteer to take a look and maybe try to avoid some logs being sent to stdout?
15:25:52 not this week, sorry
15:26:40 ok, I will open the LP and if someone has time, You can take it :)
15:26:57 now fullstack
15:27:06 I reported bug https://bugs.launchpad.net/neutron/+bug/1902678 today
15:27:07 Launchpad bug 1902678 in neutron "[Fullstack] Wrong output of the cmdline causes tests timeouts" [Critical,Confirmed]
15:27:16 I saw it at least 3 times recently
15:27:57 basically if that happens, it will fail many tests, as all of them will time out while waiting for the dhcp agent process to be spawned
15:28:40 it looks like in https://zuul.opendev.org/t/openstack/build/40affb0d6e0844369a293b05dea0e42c/log/controller/logs/dsvm-fullstack-logs/TestHAL3Agent.test_gateway_ip_changed.txt
15:29:28 anyone interested in checking that?
15:30:37 ok, I'll take a look
15:30:41 thx
15:31:10 #action ralonsoh to check fullstack issue https://bugs.launchpad.net/neutron/+bug/1902678
15:31:11 Launchpad bug 1902678 in neutron "[Fullstack] Wrong output of the cmdline causes tests timeouts" [Critical,Confirmed]
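For context on why a wrong cmdline read leads to these timeouts: fullstack-style tests typically wait for a just-started agent by scanning the command lines of running processes, so if the cmdline comes back wrong or empty the wait loop never matches and the test eventually times out. A minimal sketch of that kind of scan, with hypothetical names and not the actual neutron fullstack code:

```python
import os


def find_pids_by_cmdline(expected_fragment):
    """Return PIDs whose /proc/<pid>/cmdline contains expected_fragment."""
    pids = []
    for entry in os.listdir('/proc'):
        if not entry.isdigit():
            continue
        try:
            with open('/proc/%s/cmdline' % entry, 'rb') as f:
                # arguments in /proc/<pid>/cmdline are NUL-separated
                raw = f.read()
        except OSError:
            # the process exited while we were scanning
            continue
        cmdline = raw.replace(b'\x00', b' ').decode('utf-8', errors='replace')
        if expected_fragment in cmdline:
            pids.append(int(entry))
    return pids


# hypothetical wait loop for a freshly spawned agent:
# while not find_pids_by_cmdline('neutron-dhcp-agent'):
#     time.sleep(1)
```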
15:31:27 ok, let's move on to the scenario jobs now
15:31:29 #topic Tempest/Scenario
15:32:10 the first issue which I saw a few times is problems with cinder volumes
15:32:26 like e.g. https://2f507ad644729ed0a17c-1abd4c4163ab8d95786215227f5e857f.ssl.cf5.rackcdn.com/758098/7/check/tempest-slow-py3/b2b284b/testr_results.html or https://zuul.opendev.org/t/openstack/build/eec3c390c2944d0ab56460c75d0383fa/logs
15:32:42 and I was thinking about maybe blacklisting those failing cinder tests in our jobs?
15:32:47 wdyt about it?
15:33:31 can it be done easily in zuul? this is a global job definition, no?
15:33:57 bcafarel: tempest-slow-py3 is defined in the tempest repo
15:33:58 (if doable, definitely +1 with "to restore once cinder is fixed")
15:34:02 oh nice
15:34:09 but neutron-tempest-multinode-full-py3 is defined in neutron
15:34:35 but for tempest-slow-py3 we can do our own job "neutron-tempest-slow-py3" and blacklist such tests there
15:34:58 I don't know if gmann will be happy with that if he discovers it, but we can try IMO ;)
15:35:55 slaweq: you mean do neutron-tempest-slow-py3 like we did for the integrated job?
15:36:13 gmann: yes, something like that
15:36:29 but also without "volume" tests which
15:36:37 which are failing pretty often in our jobs
15:36:49 i think that makes sense, I will say we did not do it for the slow/multinode job but we should
15:36:56 and we are trying to make our CI a bit more stable because now it is a nightmare
15:37:00 +1
15:37:05 thx :)
15:37:49 ok, so I will do it in our repo
15:38:13 #action slaweq to blacklist some cinder related tests in the neutron-tempest-* jobs
15:38:19 either is fine, i think it will be used in neutron so neutron-tempest-slow-py in the neutron repo makes sense
15:38:29 if it is more than neutron then we can do it in the tempest repo
15:38:44 gmann: ok
15:40:21 ok, let's move to the grenade jobs now
15:40:47 and with grenade jobs I have one "issue", but maybe it's just my misunderstanding of something
15:41:05 it seems that in multinode grenade jobs the services on the compute-1 node aren't upgraded
15:41:23 and that is causing a failure with an unsupported ovo version in e.g. my patch https://9a7a3a32fbdea177beae-de1ec222256e01db8c1f1f4d7a4b9170.ssl.cf5.rackcdn.com/749158/8/check/neutron-grenade-multinode/a45a7ed/compute1/logs/screen-neutron-agent.txt
15:41:54 now the question is: should it be like that, and should we be able to run the compute node with older agents
15:42:02 or should we upgrade those agents too?
15:42:05 do You know?
15:42:55 hmm, newer controller and older compute should work, no?
15:43:08 though in grenade I expected a full upgrade
15:44:01 ok, so I will investigate why it's not working if it should
15:44:28 gmann: also, I found out recently that in grenade jobs on subnodes we are using lib/neutron instead of lib/neutron-legacy
15:44:34 gmann: can You check https://review.opendev.org/#/c/759199/ maybe?
15:44:49 slaweq: sure
15:44:58 thx
15:45:21 ok, and that's basically all from me for today
15:45:41 please remember to check failed jobs before rechecking
15:45:53 slaweq: I think hangyan had a similar issue with one of his patches and grenade
15:45:56 and mention the related bug (or open a new one if needed) while rechecking
15:46:03 mlavalle: yes, I know
15:46:25 but I will check on my patch why it's like that, maybe we are doing something wrong there :)
15:46:37 ok
15:46:43 I'll mention this to him
15:47:00 thx
15:48:17 ok, if there is nothing else to be discussed today, I will give You a few minutes back
15:48:21 thx for attending the meeting
15:48:27 and see You online
15:48:27 bye
15:48:29 o/
15:48:32 #endmeeting