15:01:38 <slaweq> #startmeeting neutron_ci
15:01:39 <openstack> Meeting started Tue Nov 17 15:01:38 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:42 <slaweq> and welcome back :)
15:01:43 <openstack> The meeting name has been set to 'neutron_ci'
15:01:47 <lajoskatona> o/
15:01:47 <ralonsoh> hi
15:01:52 <bcafarel> second o/
15:02:13 <slaweq> let's do this one quickly :)
15:02:19 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:10 <slaweq> #topic Actions from previous meetings
15:03:18 <slaweq> first one
15:03:21 <slaweq> slaweq to check failing neutron-grenade-ovn job
15:03:24 <slaweq> I checked it
15:03:40 <slaweq> and I found that it's failing for the same reason as all the other OVN multinode jobs
15:03:44 <slaweq> https://bugs.launchpad.net/neutron/+bug/1904117
15:03:46 <openstack> Launchpad bug 1904117 in neutron "Nodes in the OVN scenario multinode jobs can't talk to each other" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:03:59 <slaweq> I proposed patches https://review.opendev.org/#/c/762650/ and https://review.opendev.org/#/c/762654/
15:04:13 <slaweq> now I'm waiting for the result of its last run
15:04:25 <slaweq> I hope it will work and will be accepted by the zuul people
15:05:58 <slaweq> if You have any other ideas how we can solve that problem (maybe in some easier way) then please speak up :)
15:06:50 <slaweq> next one
15:06:51 <slaweq> slaweq to report bug regarding error 500 in ovn functional tests
15:06:55 <slaweq> https://bugs.launchpad.net/neutron/+bug/1903008
15:06:56 <openstack> Launchpad bug 1903008 in neutron "Create network failed during functional test" [High,Confirmed]
15:07:36 <slaweq> according to ralonsoh's comment we should wait with this until we finish the migration to the new engine facade
15:07:49 <ralonsoh> right
15:08:10 <bcafarel> which should be soon :)
15:08:38 <slaweq> I hope so
15:08:48 <slaweq> ok, next one
15:08:50 <slaweq> ralonsoh to check error 500 in ovn functional tests
15:09:09 <ralonsoh> I didn't find anything relevant, sorry
15:09:19 <ralonsoh> I don't know why it is failing...
15:09:50 <slaweq> ok, I will also ask jlibosva and lucasgomes to take a look
15:10:33 <slaweq> ralonsoh: was there an LP bug reported for that?
15:10:42 <ralonsoh> no
15:11:27 <slaweq> ok, I will report one and ping the ovn folks to check it
15:11:50 <slaweq> #action slaweq to report bug regarding errors 500 in ovn functional tests
15:12:07 <slaweq> ok, and the last one for today
15:12:08 <slaweq> slaweq to report LP regarding functional test timeout on set_link_attribute method
15:12:12 <slaweq> https://bugs.launchpad.net/neutron/+bug/1903985
15:12:12 <openstack> Launchpad bug 1903985 in neutron "[functional] Timeouts during setting link attributes in the namepaces" [High,Confirmed]
15:14:09 <slaweq> ok, let's move on
15:14:12 <slaweq> #topic Stadium projects
15:14:23 <slaweq> lajoskatona: anything urgent/new regarding stadium CI?
15:16:28 <slaweq> ok, I guess that except for this issue with capping neutron in u-c all is fine there
15:16:43 <bcafarel> mostly yes
15:16:48 <lajoskatona> nothing
15:17:16 <bcafarel> some open questions for the stein EM transition in https://review.opendev.org/#/c/762404 but nothing important I think (I left comments for most projects)
15:17:43 <slaweq> I will check those when this fix/revert is ready
15:18:44 <slaweq> ok
15:18:47 <slaweq> so next topic
15:18:50 <slaweq> #topic Stable branches
15:18:54 <slaweq> and the same question :)
15:19:01 <slaweq> bcafarel: anything new/urgent here?
15:19:27 <bcafarel> everything handled already :)
15:19:38 <slaweq> great
15:19:42 <slaweq> so let's move on
15:19:43 <bcafarel> I saw quite a few backports get merged last week, CI looks OK on stable branches
15:19:57 <slaweq> good to hear that
15:20:00 <slaweq> :)
15:20:35 <slaweq> #topic Grafana
15:20:42 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:21:11 <slaweq> the only thing I can say from it is that the numbers there aren't good overall
15:21:22 <slaweq> but I don't see any specific new issue there
15:21:32 <slaweq> just that our jobs are mostly still not stable :/
15:21:44 <slaweq> but it's getting better IMO
15:21:57 <bcafarel> with those connect/ssh fixes I think it has been better in the last few days?
15:22:03 <bcafarel> at least I saw a few +2s pass by :)
15:22:37 <slaweq> yes, and the fix to check the vm's console log before ssh-ing to the instance was merged in tempest just yesterday night
15:22:52 <slaweq> which I hope will help with many tempest jobs
15:25:14 <slaweq> so I think we can move on to some scenario job failures
15:25:22 <slaweq> #topic Tempest/Scenario
15:26:10 <slaweq> today I went through the failures from last week and I found a couple of issues there
15:26:30 <slaweq> there was (or is, idk exactly) some issue with rabbitmq not starting properly
15:26:36 <slaweq> but that's not really on us
15:27:05 <slaweq> except for that, the failure seen most often was a problem with the hostname command, like https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b18/753847/14/gate/neutron-tempest-plugin-scenario-linuxbridge/b18b3b6/testr_results.html
15:27:12 <slaweq> but that should be fixed by ralonsoh
15:27:18 <slaweq> right?
15:27:29 <ralonsoh> I think so
15:27:32 <bcafarel> in rechecks I think yes
15:27:37 <ralonsoh> https://review.opendev.org/#/c/762527/
15:27:56 <bcafarel> do we know why hostname started returning -1 recently btw?
15:28:16 <slaweq> bcafarel: I think it was like that before from time to time
15:28:28 <slaweq> and recently this test was marked as unstable due to other issues
15:28:31 <bcafarel> ok, I prefer that :)
15:28:36 <slaweq> so we probably just didn't see it
15:29:04 <ralonsoh> we can skip that by reading the hostname file directly
15:29:11 <ralonsoh> that's not perfect, but it works for us
15:29:41 <slaweq> if that works I'm ok with it
15:29:46 <slaweq> it's just a test :)
15:31:25 <slaweq> ok, the other issue I saw is a timeout while waiting for an instance
15:31:29 <slaweq> like: https://10c0cf0314a8bd66c0e4-c578cacb39dd1edf606b634ec77d1998.ssl.cf5.rackcdn.com/762654/5/check/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/78884bd/testr_results.html
15:31:35 <slaweq> I think lajoskatona also saw it recently
15:31:48 <slaweq> and IIRC it's always the same test, which is strange to me
15:32:00 <lajoskatona> yeah, once or twice this week perhaps
15:32:10 <slaweq> lajoskatona: did You check with Nova what can be wrong there?
15:32:42 <ralonsoh> this could be because of the advanced image
15:32:46 <lajoskatona> I asked gibi to help me out with the n-cpu logs and he said that one slow thing was the conversion from qcow2 to raw format, at least from the log
15:32:58 <lajoskatona> yeah, I had the same feeling
15:33:06 <slaweq> ok
15:33:29 <lajoskatona> if you check on the canonical page it is slightly bigger than previous images
15:33:36 <slaweq> so we can try to increase this build timeout (but how much would be enough?)
15:33:41 <ralonsoh> if this is because of the conversion, can we point to the raw format image?
15:33:48 <slaweq> or try to look for some smaller image maybe
15:34:18 <slaweq> ralonsoh: but raw will be much bigger, no?
15:34:29 <ralonsoh> maybe...
15:34:37 <lajoskatona> yes, that's bigger I suppose, but no conversion
15:34:53 <ralonsoh> but I don't see it in the repos
15:35:04 <ralonsoh> https://cloud-images.ubuntu.com/bionic/current/
15:35:58 <slaweq> maybe this one https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-root.tar.xz ?
15:36:10 <slaweq> idk exactly what all those images really are :)
15:36:43 <ralonsoh> ok, so the problems are the conversion and the resources (more ram and disk)
15:37:06 <ralonsoh> do you know if we support something else apart from ubuntu and cirros?
15:37:30 <slaweq> we should have a bit more resources after https://review.opendev.org/#/c/762582/ and https://review.opendev.org/#/c/762539/ are merged
15:38:29 <ralonsoh> we can also try to use fewer workers
15:38:40 <slaweq> but we also need https://review.opendev.org/#/c/762622/ to merge all of that
15:38:48 <ralonsoh> that will take more time but with less probability of failure
15:38:52 <slaweq> ralonsoh: test workers or neutron workers?
15:38:57 <ralonsoh> test
15:39:19 <ralonsoh> ncpu - 1, for example
15:39:24 <slaweq> we can try, but I think we already use only 2
15:39:28 <slaweq> IIRC
15:39:30 <ralonsoh> really?
15:39:31 <ralonsoh> ok
15:40:03 <slaweq> sorry, 4
15:40:09 <slaweq> so we can try only 2
15:40:18 <ralonsoh> or 3, yes
15:40:25 <ralonsoh> that will take more time but fewer problems, I think
15:40:25 <slaweq> especially in the neutron-tempest-plugin jobs where there aren't so many tests to run
15:40:41 <slaweq> ralonsoh: will You propose a patch or do You want me to do it?
15:40:44 <ralonsoh> sure
15:40:50 <ralonsoh> I'll do it
15:40:50 <slaweq> thx
15:41:18 <slaweq> #action ralonsoh will decrease number of test workers in scenario jobs
15:41:48 <slaweq> I also found another interesting failure with a server
15:41:54 <slaweq> but only once so far
15:41:56 <slaweq> https://6ad68def19a9c3e3c7f7-a757501b1a7ef7a48e849fadd8ea0086.ssl.cf2.rackcdn.com/759657/1/check/neutron-tempest-plugin-scenario-ovn/3ce85d5/testr_results.html
15:42:06 <slaweq> it seems it was a timeout during server termination
15:42:13 <slaweq> did You see something like that before?
15:43:08 <ralonsoh> only when testing manually
15:43:47 <slaweq> ok, let's see how it goes, and if it repeats more often we will report a bug for it
15:43:51 <slaweq> and we will see :)
15:44:34 <slaweq> and that's all I have for today
15:44:41 <slaweq> periodic jobs have been ok recently
15:44:59 <slaweq> do You have anything else You want to talk about today?
15:45:08 <ralonsoh> no
15:45:09 <slaweq> if not, I will give You 15 minutes back
15:45:36 <bcafarel> nice to be able to wrap up the meeting in less time :)
15:45:56 <slaweq> ok, thx for attending
15:46:00 <ralonsoh> bye!
15:46:05 <slaweq> have a great evening and see You online tomorrow :)
15:46:07 <slaweq> o/
15:46:09 <slaweq> #endmeeting
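
Editor's reference sketches for the three fixes discussed above (these are hedged illustrations, not the actual patches):

1) The hostname workaround (15:29:04): instead of running the `hostname` binary in the guest, which intermittently returned -1, read the hostname file over the existing SSH connection. A minimal sketch in Python, assuming a tempest-style SSH client with an exec_command() method; `ssh_client` and `expected_name` are hypothetical stand-ins, not the contents of https://review.opendev.org/#/c/762527/:

    def get_guest_hostname(ssh_client):
        # cloud-init writes the instance name to /etc/hostname on first
        # boot, so reading the file skips the flaky `hostname` invocation
        return ssh_client.exec_command('cat /etc/hostname').strip()

    def assert_guest_hostname(ssh_client, expected_name):
        assert get_guest_hostname(ssh_client) == expected_name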
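2) The slow image conversion gibi spotted in the n-cpu logs (15:32:46): when the cloud image is qcow2 and the libvirt backend wants raw, nova performs a conversion along the lines of the command below before the guest can boot; pointing the job at an already-raw image would skip this step. A simplified, hypothetical invocation (nova drives qemu-img through its own process utilities, not subprocess directly, and the file names here are illustrative):

    import subprocess

    # Convert the downloaded qcow2 cloud image to the raw format used by
    # the hypervisor; on slow CI nodes this step alone can consume a large
    # part of the instance build timeout.
    subprocess.check_call([
        'qemu-img', 'convert',
        '-f', 'qcow2',                        # source format
        '-O', 'raw',                          # destination format
        'bionic-server-cloudimg-amd64.img',   # downloaded cloud image
        'instance-base.raw',
    ])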
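3) The worker-count arithmetic (15:39:19): stestr defaults to one test worker per CPU, and the proposal is to cap scenario jobs below that so concurrently booting guests contend less for CPU and RAM. The real change belongs in the job definition; the floor of 2 below is an assumption for illustration only:

    import multiprocessing

    # ralonsoh's suggestion: ncpu - 1 workers (or a fixed 2-3 for the
    # neutron-tempest-plugin jobs, which have fewer tests), trading
    # wall-clock time for fewer resource-starvation failures.
    ncpu = multiprocessing.cpu_count()
    workers = max(2, ncpu - 1)
    print(f"{ncpu} CPUs -> {workers} test workers")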