15:01:38 <slaweq> #startmeeting neutron_ci
15:01:39 <openstack> Meeting started Tue Nov 17 15:01:38 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:42 <slaweq> and welcome back :)
15:01:43 <openstack> The meeting name has been set to 'neutron_ci'
15:01:47 <lajoskatona> o/
15:01:47 <ralonsoh> hi
15:01:52 <bcafarel> second o/
15:02:13 <slaweq> let's do this one quickly :)
15:02:19 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:10 <slaweq> #topic Actions from previous meetings
15:03:18 <slaweq> first one
15:03:21 <slaweq> slaweq to check failing neutron-grenade-ovn job
15:03:24 <slaweq> I checked it
15:03:40 <slaweq> and I found that it's failing for the same reason as all the other OVN multinode jobs
15:03:44 <slaweq> https://bugs.launchpad.net/neutron/+bug/1904117
15:03:46 <openstack> Launchpad bug 1904117 in neutron "Nodes in the OVN scenario multinode jobs can't talk to each other" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:03:59 <slaweq> I proposed patches https://review.opendev.org/#/c/762650/ and https://review.opendev.org/#/c/762654/
15:04:13 <slaweq> now I'm waiting for the result of its last run
15:04:25 <slaweq> I hope it will work and will be accepted by the zuul people
15:05:58 <slaweq> if You have any other ideas how we can solve that problem (maybe in some easier way) then please speak up :)
15:06:50 <slaweq> next one
15:06:51 <slaweq> slaweq to report bug regarding error 500 in ovn functional tests
15:06:55 <slaweq> https://bugs.launchpad.net/neutron/+bug/1903008
15:06:56 <openstack> Launchpad bug 1903008 in neutron "Create network failed during functional test" [High,Confirmed]
15:07:36 <slaweq> according to ralonsoh's comment we should wait with this until we finish the migration to the new engine facade
15:07:49 <ralonsoh> right
15:08:10 <bcafarel> which should be soon :)
15:08:38 <slaweq> I hope so
15:08:48 <slaweq> ok, next one
15:08:50 <slaweq> ralonsoh to check error 500 in ovn functional tests
15:09:09 <ralonsoh> I didn't find anything relevant, sorry
15:09:19 <ralonsoh> I don't know why it is failing...
15:09:50 <slaweq> ok, I will also ask jlibosva and lucasgomes to take a look
15:10:33 <slaweq> ralonsoh: was there an LP bug reported for that?
15:10:42 <ralonsoh> no
15:11:27 <slaweq> ok, I will report one and ping the ovn folks to check it
15:11:50 <slaweq> #action slaweq to report bug regarding errors 500 in ovn functional tests
15:12:07 <slaweq> ok, and the last one for today
15:12:08 <slaweq> slaweq to report LP regarding functional test timeout on set_link_attribute method
15:12:12 <slaweq> https://bugs.launchpad.net/neutron/+bug/1903985
15:12:12 <openstack> Launchpad bug 1903985 in neutron "[functional] Timeouts during setting link attributes in the namepaces" [High,Confirmed]
15:14:09 <slaweq> ok, let's move on
15:14:12 <slaweq> #topic Stadium projects
15:14:23 <slaweq> lajoskatona: anything urgent/new regarding stadium CI?
15:16:28 <slaweq> ok, I guess that except for this issue with capping neutron in u-c all is fine there
15:16:43 <bcafarel> mostly yes
15:16:48 <lajoskatona> nothing
15:17:16 <bcafarel> some open questions for the stein EM transition in https://review.opendev.org/#/c/762404 but nothing important I think (I left comments for most projects)
15:17:43 <slaweq> I will check those when this fix/revert is ready
15:18:44 <slaweq> ok
15:18:47 <slaweq> so next topic
15:18:50 <slaweq> #topic Stable branches
15:18:54 <slaweq> and the same question :)
15:19:01 <slaweq> bcafarel: anything new/urgent here?
15:19:27 <bcafarel> everything handled already :)
15:19:38 <slaweq> great
15:19:42 <slaweq> so let's move on
15:19:43 <bcafarel> I saw quite a few backports get merged last week, CI looks OK on stable branches
15:19:57 <slaweq> good to hear that
15:20:00 <slaweq> :)
15:20:35 <slaweq> #topic Grafana
15:20:42 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:21:11 <slaweq> the only thing I can say from it is that the numbers there aren't good overall
15:21:22 <slaweq> but I don't see any specific new issue there
15:21:32 <slaweq> just that our jobs are mostly still not stable :/
15:21:44 <slaweq> but it's getting better IMO
15:21:57 <bcafarel> with those connect/ssh fixes I think it has been better in the last few days?
15:22:03 <bcafarel> at least I saw a few +2s pass by :)
15:22:37 <slaweq> yes, and the fix to check the vm's console log before ssh-ing to the instance was merged in tempest just yesterday night
15:22:52 <slaweq> which I hope will help with many tempest jobs
15:25:14 <slaweq> so I think we can move on to some scenario job failures
15:25:22 <slaweq> #topic Tempest/Scenario
15:26:10 <slaweq> today I went through the failures from last week and I found a couple of issues there
15:26:30 <slaweq> there was (or is, idk exactly) some issue with rabbitmq not starting properly
15:26:36 <slaweq> but that's not really on us
15:27:05 <slaweq> except for that, the failure seen most often was a problem with the hostname command, like https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b18/753847/14/gate/neutron-tempest-plugin-scenario-linuxbridge/b18b3b6/testr_results.html
15:27:12 <slaweq> but that should be fixed by ralonsoh
15:27:18 <slaweq> right?
15:27:29 <ralonsoh> I think so
15:27:32 <bcafarel> in rechecks I think yes
15:27:37 <ralonsoh> https://review.opendev.org/#/c/762527/
15:27:56 <bcafarel> do we know why hostname started returning -1 recently btw?
15:28:16 <slaweq> bcafarel: I think it was like that before from time to time
15:28:28 <slaweq> and recently this test was marked as unstable due to other issues
15:28:31 <bcafarel> ok, I prefer that :)
15:28:36 <slaweq> so we probably just didn't see it
15:29:04 <ralonsoh> we can skip that by reading the hostname file directly
15:29:11 <ralonsoh> that's not perfect, but it works for us
15:29:41 <slaweq> if that works I'm ok with it
15:29:46 <slaweq> it's just a test :)
15:31:25 <slaweq> ok, the other issue I saw is a timeout while waiting for an instance
15:31:29 <slaweq> like: https://10c0cf0314a8bd66c0e4-c578cacb39dd1edf606b634ec77d1998.ssl.cf5.rackcdn.com/762654/5/check/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/78884bd/testr_results.html
15:31:35 <slaweq> I think lajoskatona also saw it recently
15:31:48 <slaweq> and IIRC it's always the same test, which is strange to me
15:32:00 <lajoskatona> yeah, once or twice this week perhaps
15:32:10 <slaweq> lajoskatona: did You check with Nova what can be wrong there?
15:32:42 <ralonsoh> this could be because of the advanced image
15:32:46 <lajoskatona> I asked gibi to help me out with the n-cpu logs and he said that one slow thing was the conversion from qcow2 to raw format, at least from the log
15:32:58 <lajoskatona> yeah, I had the same feeling
15:33:06 <slaweq> ok
15:33:29 <lajoskatona> if you check on the canonical page it is slightly bigger than previous images
15:33:36 <slaweq> so we can try to increase this build timeout (but how much would be enough?)
15:33:41 <ralonsoh> if this is because of the conversion, can we point to the raw format image?
15:33:48 <slaweq> or try to look for some smaller image maybe
15:34:18 <slaweq> ralonsoh: but raw will be much bigger, no?
15:34:29 <ralonsoh> maybe...
15:34:37 <lajoskatona> yes, that's bigger I suppose, but no conversion
15:34:53 <ralonsoh> but I don't see it in the repos
15:35:04 <ralonsoh> https://cloud-images.ubuntu.com/bionic/current/
15:35:58 <slaweq> maybe this one https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-root.tar.xz ?
15:36:10 <slaweq> idk exactly what all those images really are :)
15:36:43 <ralonsoh> ok, so the problems are the conversion and the resources (more ram and disk)
15:37:06 <ralonsoh> do you know if we support something else apart from ubuntu and cirros?
15:37:30 <slaweq> we should have a bit more resources after https://review.opendev.org/#/c/762582/ and https://review.opendev.org/#/c/762539/ are merged
15:38:29 <ralonsoh> we can also try to use fewer workers
15:38:40 <slaweq> but we also need https://review.opendev.org/#/c/762622/ to merge all of that
15:38:48 <ralonsoh> that will take more time but with less probability of failure
15:38:52 <slaweq> ralonsoh: test workers or neutron workers?
15:38:57 <ralonsoh> test
15:39:19 <ralonsoh> ncpu - 1, for example
15:39:24 <slaweq> we can try, but I think we already use only 2
15:39:28 <slaweq> IIRC
15:39:30 <ralonsoh> really?
15:39:31 <ralonsoh> ok
15:40:03 <slaweq> sorry, 4
15:40:09 <slaweq> so we can try only 2
15:40:18 <ralonsoh> or 3, yes
15:40:25 <ralonsoh> that will take more time but fewer problems, I think
15:40:25 <slaweq> especially in the neutron-tempest-plugin jobs where there aren't so many tests to run
15:40:41 <slaweq> ralonsoh: will You propose a patch or do You want me to do it?
15:40:44 <ralonsoh> sure
15:40:50 <ralonsoh> I'll do it
15:40:50 <slaweq> thx
15:41:18 <slaweq> #action ralonsoh will decrease number of test workers in scenario jobs
15:41:48 <slaweq> I also found another interesting failure with a server
15:41:54 <slaweq> but only once so far
15:41:56 <slaweq> https://6ad68def19a9c3e3c7f7-a757501b1a7ef7a48e849fadd8ea0086.ssl.cf2.rackcdn.com/759657/1/check/neutron-tempest-plugin-scenario-ovn/3ce85d5/testr_results.html
15:42:06 <slaweq> it seems it was a timeout during server termination
15:42:13 <slaweq> did You see something like that before?
15:43:08 <ralonsoh> only when testing manually
15:43:47 <slaweq> ok, let's see how it goes, and if it repeats more often we will report a bug for it
15:43:51 <slaweq> and we will see :)
15:44:34 <slaweq> and that's all I have for today
15:44:41 <slaweq> periodic jobs have been ok recently
15:44:59 <slaweq> do You have anything else You want to talk about today?
15:45:08 <ralonsoh> no
15:45:09 <slaweq> if not, I will give You 15 minutes back
15:45:36 <bcafarel> nice to be able to wrap up the meeting in less time :)
15:45:56 <slaweq> ok, thx for attending
15:46:00 <ralonsoh> bye!
15:46:05 <slaweq> have a great evening and see You online tomorrow :)
15:46:07 <slaweq> o/
15:46:09 <slaweq> #endmeeting
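
Editor's reference sketches for the three fixes discussed above (these are hedged illustrations, not the actual patches):

1) The hostname workaround (15:29:04): instead of running the `hostname` binary in the guest, which intermittently returned -1, read the hostname file over the existing SSH connection. A minimal sketch in Python, assuming a tempest-style SSH client with an exec_command() method; `ssh_client` and `expected_name` are hypothetical stand-ins, not the contents of https://review.opendev.org/#/c/762527/:

    def get_guest_hostname(ssh_client):
        # cloud-init writes the instance name to /etc/hostname on first
        # boot, so reading the file skips the flaky `hostname` invocation
        return ssh_client.exec_command('cat /etc/hostname').strip()

    def assert_guest_hostname(ssh_client, expected_name):
        assert get_guest_hostname(ssh_client) == expected_name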
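2) The slow image conversion gibi spotted in the n-cpu logs (15:32:46): when the cloud image is qcow2 and the libvirt backend wants raw, nova performs a conversion along the lines of the command below before the guest can boot; pointing the job at an already-raw image would skip this step. A simplified, hypothetical invocation (nova drives qemu-img through its own process utilities, not subprocess directly, and the file names here are illustrative):

    import subprocess

    # Convert the downloaded qcow2 cloud image to the raw format used by
    # the hypervisor; on slow CI nodes this step alone can consume a large
    # part of the instance build timeout.
    subprocess.check_call([
        'qemu-img', 'convert',
        '-f', 'qcow2',                        # source format
        '-O', 'raw',                          # destination format
        'bionic-server-cloudimg-amd64.img',   # downloaded cloud image
        'instance-base.raw',
    ])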
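3) The worker-count arithmetic (15:39:19): stestr defaults to one test worker per CPU, and the proposal is to cap scenario jobs below that so concurrently booting guests contend less for CPU and RAM. The real change belongs in the job definition; the floor of 2 below is an assumption for illustration only:

    import multiprocessing

    # ralonsoh's suggestion: ncpu - 1 workers (or a fixed 2-3 for the
    # neutron-tempest-plugin jobs, which have fewer tests), trading
    # wall-clock time for fewer resource-starvation failures.
    ncpu = multiprocessing.cpu_count()
    workers = max(2, ncpu - 1)
    print(f"{ncpu} CPUs -> {workers} test workers")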