15:01:38 #startmeeting neutron_ci
15:01:39 Meeting started Tue Nov 17 15:01:38 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:42 and welcome back :)
15:01:43 The meeting name has been set to 'neutron_ci'
15:01:47 o/
15:01:47 hi
15:01:52 second o/
15:02:13 let's do that one quick :)
15:02:19 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:10 #topic Actions from previous meetings
15:03:18 first one
15:03:21 slaweq to check failing neutron-grenade-ovn job
15:03:24 I checked it
15:03:40 and I found that it's failing for the same reason as all ovn multinode jobs
15:03:44 https://bugs.launchpad.net/neutron/+bug/1904117
15:03:46 Launchpad bug 1904117 in neutron "Nodes in the OVN scenario multinode jobs can't talk to each other" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:03:59 I proposed patches https://review.opendev.org/#/c/762650/ and https://review.opendev.org/#/c/762654/
15:04:13 now I'm waiting for the result of the last run
15:04:25 I hope it will work and will be accepted by zuul people
15:05:58 if You have any other ideas how we can solve that problem (maybe in some easier way) then please speak up :)
15:06:50 next one
15:06:51 slaweq to report bug regarding error 500 in ovn functional tests
15:06:55 https://bugs.launchpad.net/neutron/+bug/1903008
15:06:56 Launchpad bug 1903008 in neutron "Create network failed during functional test" [High,Confirmed]
15:07:36 according to ralonsoh's comment we should wait with this until we finish the migration to the new engine facade
15:07:49 right
15:08:10 which should be soon :)
15:08:38 I hope so
15:08:48 ok, next one
15:08:50 ralonsoh to check error 500 in ovn functional tests
15:09:09 I didn't find anything relevant, sorry
15:09:19 I don't know why it is failing...
15:09:50 ok, I will also ask jlibosva and lucasgomes to take a look
15:10:33 ralonsoh: was there an LP reported for that?
15:10:42 no
15:11:27 ok, I will report one and ping ovn folks to check it
15:11:50 #action slaweq to report bug regarding errors 500 in ovn functional tests
15:12:07 ok, and the last one for today
15:12:08 slaweq to report LP regarding functional test timeout on set_link_attribute method
15:12:12 https://bugs.launchpad.net/neutron/+bug/1903985
15:12:12 Launchpad bug 1903985 in neutron "[functional] Timeouts during setting link attributes in the namepaces" [High,Confirmed]
15:14:09 ok, let's move on
15:14:12 #topic Stadium projects
15:14:23 lajoskatona: anything urgent/new regarding stadiums CI?
15:16:28 ok, I guess that except for this issue with capping neutron in u-c all is fine there
15:16:43 mostly yes
15:16:48 nothing
15:17:16 some open questions for stein EM transition in https://review.opendev.org/#/c/762404 but nothing important I think (I left comments for most projects)
15:17:43 I will check those when this fix/revert is ready
15:18:44 ok
15:18:47 so next topic
15:18:50 #topic Stable branches
15:18:54 and the same question :)
15:19:01 bcafarel: anything new/urgent here?
15:19:27 everything handled already :)
15:19:38 great
15:19:42 so let's move on
15:19:43 I saw quite a few backports get merged last week, CI looks OK on stable branches
15:19:57 good to hear that
15:20:00 :)
15:20:35 #topic Grafana
15:20:42 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:21:11 the only thing I can say from it is that the numbers there aren't good overall
15:21:22 but I don't see any specific new issue there
15:21:32 just that our jobs are still mostly not stable :/
15:21:44 but it's getting better IMO
15:21:57 with those connect/ssh fixes I think it is better in the last few days?
15:22:03 at least I saw a few +2 pass by :)
15:22:37 yes, and the fix for checking the vm's console log before ssh to the instance was just merged in tempest yesterday night
15:22:52 which I hope will help with many tempest jobs
15:25:14 so I think we can move on to some scenario jobs' failures
15:25:22 #topic Tempest/Scenario
15:26:10 today I went through failures from last week and I found a couple of issues there
15:26:30 there was (or is, idk exactly) some issue with rabbitmq which didn't start properly
15:26:36 but that's not on us really
15:27:05 except for that, the failure seen most often was a problem with the hostname command, like https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b18/753847/14/gate/neutron-tempest-plugin-scenario-linuxbridge/b18b3b6/testr_results.html
15:27:12 but that should be fixed by ralonsoh
15:27:18 right?
15:27:29 I think so
15:27:32 in rechecks I think yes
15:27:37 https://review.opendev.org/#/c/762527/
15:27:56 do we know why hostname started returning -1 recently btw?
15:28:16 bcafarel: I think it was like that before from time to time
15:28:28 and recently this test was marked as unstable due to other issues
15:28:31 ok I prefer that :)
15:28:36 so we probably didn't really see it
15:29:04 we can skip that by reading the hostname file directly
15:29:11 that's not perfect, but works for us
15:29:41 if that works I'm ok with it
15:29:46 it's just a test :)
15:31:25 ok, the other issue which I saw is a timeout waiting for the instance
15:31:29 like: https://10c0cf0314a8bd66c0e4-c578cacb39dd1edf606b634ec77d1998.ssl.cf5.rackcdn.com/762654/5/check/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/78884bd/testr_results.html
15:31:35 I think lajoskatona also saw it recently
15:31:48 and IIRC it's always in the same test, which is strange to me
15:32:00 yeah, once or twice this week perhaps
15:32:10 lajoskatona: did You check with Nova what can be wrong there?
15:32:42 this could be because of the advanced image
15:32:46 I asked gibi to help me out with n-cpu logs and He said that one slow thing was the conversion from qcow2 to raw format, at least from the log
15:32:58 yeah I had the same feeling
15:33:06 ok
15:33:29 if you check on the canonical page it is slightly bigger than previous images
15:33:36 so we can try to increase this build timeout (but how much would be enough)
15:33:41 if this is because of the conversion, can we point to the raw format image?
15:33:48 or try to look for some smaller image maybe
15:34:18 ralonsoh: but raw will be much bigger, no?
15:34:29 maybe...
15:34:37 yes, that's bigger I suppose, but no conversion
15:34:53 but I don't see it in the repos
15:35:04 https://cloud-images.ubuntu.com/bionic/current/
15:35:58 maybe this one https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-root.tar.xz ?
15:36:10 idk exactly what all those images really are :)
15:36:43 ok so the problems are the conversion and the resources (more ram and disk)
15:37:06 do you know if we support something else apart from ubuntu and cirros?
15:37:30 we should have a bit more resources after https://review.opendev.org/#/c/762582/ and https://review.opendev.org/#/c/762539/ are merged
15:38:29 we can also try to use fewer workers
15:38:40 but we also need https://review.opendev.org/#/c/762622/ to merge all of that
15:38:48 that will take more time but with less probability of failure
15:38:52 ralonsoh: test workers or neutron workers?
15:38:57 test
15:39:19 ncpu - 1, for example
15:39:24 we can try but I think we already use only 2
15:39:28 IIRC
15:39:30 really?
15:39:31 ok
15:40:03 sorry, 4
15:40:09 so we can try only 2
15:40:18 or 3 yes
15:40:25 that will take more time but cause fewer problems, I think
15:40:25 especially in neutron-tempest-plugin jobs where there are not so many tests to run
15:40:41 ralonsoh: will You propose a patch or do You want me to do it?
15:40:44 sure
15:40:50 I'll do it
15:40:50 thx
15:41:18 #action ralonsoh will decrease number of test workers in scenario jobs
15:41:48 I also found another interesting failure with a server
15:41:54 but only once so far
15:41:56 https://6ad68def19a9c3e3c7f7-a757501b1a7ef7a48e849fadd8ea0086.ssl.cf2.rackcdn.com/759657/1/check/neutron-tempest-plugin-scenario-ovn/3ce85d5/testr_results.html
15:42:06 it seems it was a timeout during server termination
15:42:13 did You see something like that before?
15:43:08 only testing manually
15:43:47 ok, let's see how it goes and if it repeats more often, we will report a bug for that
15:43:51 and we will see :)
15:44:34 and that's all I have for today
15:44:41 periodic jobs are ok recently
15:44:59 do You have anything else You want to talk about today?
15:45:08 no
15:45:09 if not, I will give You 15 minutes back
15:45:36 nice to be able to wrap the meeting in less time :)
15:45:56 ok, thx for attending
15:46:00 bye!
15:46:05 have a great evening and see You online tomorrow :)
15:46:07 o/
15:46:09 #endmeeting
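
Note on the test worker action item: the "ncpu - 1, for example" idea from the Tempest/Scenario topic, combined with the suggestion to try 2 or 3 workers instead of 4, could be sketched roughly as below. This is only an illustration of the arithmetic, not the patch that was proposed. The function name scenario_test_workers and the default cap of 3 are assumptions made for this sketch and do not come from neutron, tempest or the job definitions.

```python
def scenario_test_workers(ncpu: int, cap: int = 3) -> int:
    """Pick a test-runner worker count for a CI node.

    Hypothetical helper (not part of neutron or tempest): leave one CPU
    free ("ncpu - 1"), never exceed `cap` (assumed default: 3), and
    always keep at least one worker.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be a positive integer")
    return max(1, min(ncpu - 1, cap))


if __name__ == "__main__":
    # Show the chosen worker count for a few node sizes.
    for ncpu in (1, 2, 4, 8):
        print(ncpu, scenario_test_workers(ncpu))
```

Fewer workers means a longer run but less contention for RAM and disk on the node, which is the trade-off discussed in the meeting.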