Tuesday, 2020-11-03

slaweq#startmeeting networking14:00
slaweqok, let's start14:02
slaweq#topic Announcements14:03
*** openstack changes topic to "Announcements (Meeting topic: networking)"14:03
slaweqI fast-tracked back Oleg to the core team - welcome back in the team obondarev :)14:03
obondarevThanks!!! :)14:03
obondarevthanks folks, I'm happy to join the team14:05
slaweqnext one14:05
slaweqI proposed Rodolfo to be member of the stable core team: http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018480.html14:05
mlavalleobondarev: WELCOME!14:05
slaweqI hope it will be done this week14:05
ralonsohI'll be able to break stable branches too14:06
slaweqnext one14:06
slaweqVirtual PTG summary http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018489.html14:06
slaweqalso on http://kaplonski.pl/blog/virtual_ptg_october_2020_summary/ where there are also team photos14:06
slaweqand regarding ptg agreements14:07
slaweqQoS meeting is now cancelled - https://review.opendev.org/#/c/760902/ - if we will have any QoS related topics, please add them to the On demand agenda for the team meeting14:07
slaweqCI meeting time changed - now it will be on Tuesday, just after this meeting: https://review.opendev.org/#/c/760823/14:09
slaweqnext one14:09
bcafarelcongrats obondarev and ralonsoh  (and watch out ralonsoh I will be watching :) )14:09
slaweqWallaby cycle calendar https://releases.openstack.org/wallaby/schedule.html14:09
bcafarel(sorry for lag, dns issues)14:09
slaweqit is our new most important calendar for now :)14:10
slaweqFirst milestone is in the week of November 30th14:10
slaweqso we have just few weeks to that milestone14:10
slaweqand last one for today14:10
slaweqgerrit outage is planned Nov 20 - 23 - http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html14:11
slaweqjust FYI if You missed this email on openinfra ML14:11
slaweqthat's all announcements for today from my side14:11
slaweqanything else You want to share with the team now?14:12
bcafarelstable-wise, stein is going EM next week14:12
bcafarelwe will tag one last neutron version for it before that, if you want to have a backport in a tag, better hurry14:12
bcafarelof course we will still do backports after that, just that there won't be any new tag releases14:13
slaweqbcafarel: will we do last release this week?14:13
bcafarelhttps://review.opendev.org/#/q/status:open+project:openstack/neutron+branch:stable/stein is almost empty so we can prepare it on Thursday to give some time?14:14
slaweqok, let's move on14:16
slaweqnext topic14:16
slaweq#topic Blueprints14:16
*** openstack changes topic to "Blueprints (Meeting topic: networking)"14:16
slaweqI prepared some list of the BPs for W-1 https://bugs.launchpad.net/neutron/+milestone/wallaby-114:16
slaweqthose are mostly things which we postponed from last cycle14:16
slaweqand which I think will be continued now14:17
slaweqif You want to add some other BP to that list and track it weekly, please tell me that on irc14:17
slaweqalso if You think we should move out something from that list, please tell me that too14:17
slaweqwe need some volunteer for https://blueprints.launchpad.net/neutron/+spec/handle-deadlocks-in-oslo-way - at least someone who will organize work on that14:20
slaweqas this seems to be long one14:20
slaweqsimilar to e.g. new engine facade transition14:20
lajoskatonaparttime I can work on that14:21
*** jlibosva has joined #openstack-meeting-314:21
slaweqlajoskatona: thx a lot14:21
slaweqlajoskatona: also question to You about https://bugs.launchpad.net/bugs/188280414:22
openstackLaunchpad bug 1882804 in neutron "RFE: allow replacing the QoS policy of bound port" [Wishlist,Confirmed] - Assigned to Lajos Katona (lajos-katona)14:22
slaweqwhat do we still need there?14:22
lajoskatonaI think only the tempest test, but let me check14:22
slaweqIIRC there was some bug in neutron code which broke nova's gate14:22
slaweqand that's why we didn't finish it in Victoria14:22
lajoskatonayeah the fix isn't merged: https://review.opendev.org/75689214:23
lajoskatonawe postponed to Wallaby14:24
slaweqand after that what else we will need to finish that?14:25
lajoskatonaTempest: https://review.opendev.org/74369514:25
lajoskatonathe documentation was updated, so I think nothing is missing except the bugfix (https://review.opendev.org/756892 )14:26
slaweqok, so seems like that one is close14:26
slaweqthx for update lajoskatona14:27
obondarevI've just +A bug fix14:27
slaweqthx obondarev14:27
obondarev(it had +1 from me)14:27
obondarevso not just blind +A :D14:28
lajoskatonathanks for the attention and reviews14:28
slaweqok, any other updates about BPs?14:28
slaweqok, so let's move on14:29
slaweq#topic Community Goals14:29
*** openstack changes topic to "Community Goals (Meeting topic: networking)"14:29
slaweqwe have now one community goal:14:30
slaweqMigrate from oslo.rootwrap to oslo.privsep14:30
lajoskatonaand the policy.yaml topic was not accepted for this cycle?14:30
amotokiit will be accepted soon in my understanding.14:30
slaweqlajoskatona: https://governance.openstack.org/tc/goals/selected/wallaby/14:30
slaweqfor now I see only that one14:31
lajoskatonait's still not merged: https://review.opendev.org/759881, so I am not sure if it will be wallaby goal14:31
lajoskatonaAnd the etherpad mentions as goal: https://etherpad.opendev.org/p/policy-popup-wallaby-ptg but not sure14:32
slaweqthx lajoskatona and amotoki for the heads up14:33
amotokithe proposed one is about changing the default policy file name to policy.yaml. Other policy stuffs are not part of the current proposed one.14:33
slaweqI will add it to our agenda14:33
amotokianyway I will take care of it.14:34
slaweqeven if it's not yet accepted officially we can try to work on it14:34
slaweqamotoki: thx a lot14:34
slaweqI think we can move on to the next topic now14:35
slaweq#topic Bugs14:35
*** openstack changes topic to "Bugs (Meeting topic: networking)"14:35
rubasovreport: http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018494.html14:35
slaweqrubasov: thx14:35
slaweq(that was fast :))14:35
rubasovit was quite a slow week14:35
rubasovhowever we have some incomplete bugs14:36
rubasovfirst let me call your attention to this one14:36
openstackLaunchpad bug 1901707 in neutron "race condition on port binding vs instance being resumed for live-migrations" [Undecided,Incomplete]14:36
rubasovwhich may not be incomplete to someone who understands the history of the issue in depth14:37
rubasovbut I myself had to ask a few questions from the reporter, so marked it incomplete14:37
ralonsohwe found this bug too in our CI14:38
ralonsohthe problem is the load of the OVS agent14:38
rubasovoh, that's a piece of information I did not encounter while reading the history14:38
rubasovthe problem seems relevant (failed connectivity after some live migrations)14:39
rubasovbut it would be great if we could simplify the issue14:40
ralonsohbut that should be solved with the new port binding method14:40
ralonsohimplemented in Rocky+14:40
rubasovthe old bug report (https://bugs.launchpad.net/neutron/+bug/1815989) explicitly states the opposite IIRC14:41
openstackLaunchpad bug 1815989 in OpenStack Compute (nova) "OVS drops RARP packets by QEMU upon live-migration causes up to 40s ping pause in Rocky" [Medium,In progress] - Assigned to sean mooney (sean-k-mooney)14:41
ralonsohhe is still working on this14:41
ralonsohI know that for sure14:41
slaweqso should we mark 1901707 as a duplicate of 1815989 ?14:42
slaweqor is it different issue?14:43
rubasovthe reporter's intention was to break out a sub-problem from the old and huge report14:43
rubasovbut I did not fully manage to understand the boundaries between the two parts14:43
ralonsohok, let's keep both for now14:44
ralonsohI'm working with Sean in the original one14:44
slaweqralonsoh: ok, so please keep an eye on that new one too14:44
rubasovokay that tells me practically we have progress on the 2nd report too14:44
rubasovthank you14:44
rubasovanother is this: https://bugs.launchpad.net/neutron/+bug/190221114:45
openstackLaunchpad bug 1902211 in neutron "Router State standby on all l3 agent when create" [Undecided,Incomplete]14:45
rubasovif you have experience with debugging locks not released and vpnaas, the reporter could use your help14:46
rubasovbut it's far from simple, because the bug is not reproducible at will14:46
rubasovthese were the bugs I wanted to mention14:48
rubasovthe rest I believe has progress14:49
*** macz_ has quit IRC14:49
slaweqso seems like some neutron-vpnaas bug?14:50
rubasovcould easily be14:50
slaweqI just wrote a comment there14:52
slaweqthx rubasov for the summary14:52
slaweqthis week our bug deputy is ralonsoh14:52
slaweqand next week will be lucasgomes' turn14:53
ralonsohon my way!14:53
slaweqI also added obondarev to our bug deputy rotation calendar now :)14:53
obondarevnp :)14:53
slaweqany other bugs You would like to quickly discuss today?14:54
slaweqok, I guess that this means "no"14:56
slaweqplease remember - in 4 minutes we have ci meeting in same channel :)14:56
slaweqthx for attending that meeting and see You online14:56
* bcafarel updates his calendar14:58
slaweq#startmeeting neutron_ci15:00
bcafarellong time no see :)15:00
slaweqbcafarel: yeah :D15:01
slaweqGrafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate15:02
slaweqlet's open it now and we can start15:02
slaweq#topic Actions from previous meetings15:02
*** openstack changes topic to "Actions from previous meetings (Meeting topic: neutron_ci)"15:02
slaweqslaweq to propose patch to check console log before ssh to instance15:02
slaweq    Done: https://review.opendev.org/#/c/758968/15:02
ralonsoh+1 to this patch15:03
slaweqand TBH I didn't see AuthenticationFailure errors in neutron-tempest-plugin jobs in last few days15:03
slaweqso it seems that it could really help15:03
ralonsohuntil we find/fix the error in paramiko, that will help15:03
slaweqI will try to do something similar to tempest also15:03
slaweq#action slaweq to propose patch to check console log before ssh to instance in tempest15:04
slaweqnext one15:04
slaweqbcafarel to update grafana dashboard for master branch15:04
bcafarelnot merged yet, but has a +2 https://review.opendev.org/#/c/758208/15:05
slaweqok, last one from previous meeting15:06
slaweqslaweq to check failing neutron-grenade-ovn job15:06
slaweqI still didn't have time for that15:06
slaweq#action slaweq to check failing neutron-grenade-ovn job15:06
slaweqand that's all actions from last week15:06
slaweqlet's move on15:06
slaweq#topic Stadium projects15:06
*** openstack changes topic to "Stadium projects (Meeting topic: neutron_ci)"15:06
slaweqlajoskatona: anything regarding stadium projects and ci?15:06
lajoskatonanothing new15:07
lajoskatonaI'm still in the recovery phase after PTG, sorry15:07
slaweqAFAICT for stadium projects it is pretty stable, at least I didn't see many failures15:07
lajoskatonayeah the problems appear mostly in older branches15:08
bcafarelfor which I have a PTG action item I think :)15:08
slaweqyeah, to check which ones are broken and should be moved to "unmaintained" phase15:08
slaweqbtw. there is one thing regarding stadium, mlavalle please check https://review.opendev.org/#/q/topic:neutron-victoria+(status:open+OR+status:merged)15:09
slaweqthose are patches for stable/victoria15:09
slaweqwe need to switch there to use neutron-tempest-plugin-victoria jobs15:10
slaweqif there is nothing more related to the stadium, let's move on15:10
slaweqnext topic15:11
slaweq#topic Stable branches15:11
*** openstack changes topic to "Stable branches (Meeting topic: neutron_ci)"15:11
slaweqVictoria dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=115:11
slaweqUssuri dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=115:11
mlavallethere are patches for master also in that url15:11
mlavalleor am I misunderstanding?15:11
slaweqmlavalle: no, only for stable/victoria15:11
slaweqin master branch we are still using base neutron-tempest-plugin jobs15:12
slaweqbut for stable/victoria we need to run jobs dedicated for stable/victoria15:12
mlavalleok, I think I clicked it wrong15:12
slaweqbcafarel: any new issues with stable branches?15:13
bcafarelnothing I spotted this week, hopefully we have rocky/queens back on track now thanks to your patch15:14
slaweqyes, this should be better with https://review.opendev.org/#/c/758377/ :)15:14
slaweqok, so let's move on15:15
slaweq#topic Grafana15:15
*** openstack changes topic to "Grafana (Meeting topic: neutron_ci)"15:15
slaweq#link http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=115:15
lajoskatonaI have to leave now, perhaps I can join later (in 30minutes) if I find wifi to connect....15:16
slaweqstill I think that most failing jobs are (non-voting) ovn related jobs15:17
slaweqlike e.g. http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?viewPanel=18&orgId=115:18
slaweqis there any volunteer who will want to check those failures?15:18
slaweqjlibosva ?15:18
jlibosvaI can put it on my todo list :)15:19
slaweqjlibosva: thx15:20
bcafarelso we have at least https://bugs.launchpad.net/neutron/+bug/190251215:20
openstackLaunchpad bug 1902512 in neutron "neutron-ovn-tripleo-ci-centos-8-containers-multinode fails on private networ creation (mtu size)" [Medium,Triaged]15:20
slaweqyes, this one sounds like serious one because it happens often15:21
slaweqbut I think that in other, devstack based jobs there are other failures15:21
slaweqok, except that I think it's "normal" on grafana15:22
slaweqso we can move on to the specific jobs and failures15:22
slaweq#topic fullstack/functional15:23
*** openstack changes topic to "fullstack/functional (Meeting topic: neutron_ci)"15:23
bcafarelsounds good15:23
slaweqhere I found couple of new issues15:24
slaweqfirst functional tests15:24
slaweqI (again) saw a job timeout due to the high amount of logs:15:24
slaweq https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6dc/755752/3/check/neutron-functional-with-uwsgi/6dcca4f/job-output.txt15:24
slaweqit's old issue with stestr iirc15:24
slaweqI will open LP for that today15:25
slaweqany volunteer to take a look and maybe try to avoid some logs being sent to stdout?15:25
ralonsohnot this week, sorry15:25
slaweqok, I will open LP and if someone will have time, You can take it :)15:26
slaweqnow fullstack15:26
slaweqI reported bug https://bugs.launchpad.net/neutron/+bug/1902678 today15:27
openstackLaunchpad bug 1902678 in neutron "[Fullstack] Wrong output of the cmdline causes tests timeouts" [Critical,Confirmed]15:27
slaweqI saw it at least 3 times recently15:27
slaweqbasically if that happens, it will fail many tests as in all of them it will timeout while waiting until dhcp agent process will be spawned15:27
slaweqit looks like in  https://zuul.opendev.org/t/openstack/build/40affb0d6e0844369a293b05dea0e42c/log/controller/logs/dsvm-fullstack-logs/TestHAL3Agent.test_gateway_ip_changed.txt15:28
slaweqanyone interested in checking that?15:29
ralonsohok, I'll take a look15:30
slaweq#action ralonsoh to check fullstack issue https://bugs.launchpad.net/neutron/+bug/190267815:31
openstackLaunchpad bug 1902678 in neutron "[Fullstack] Wrong output of the cmdline causes tests timeouts" [Critical,Confirmed]15:31
slaweqok, let's move on to the scenario jobs now15:31
slaweq#topic Tempest/Scenario15:31
*** openstack changes topic to "Tempest/Scenario (Meeting topic: neutron_ci)"15:31
slaweqfirst issue which I saw few times are problems with cinder volumes15:32
slaweqlike e.g. https://2f507ad644729ed0a17c-1abd4c4163ab8d95786215227f5e857f.ssl.cf5.rackcdn.com/758098/7/check/tempest-slow-py3/b2b284b/testr_results.html or https://zuul.opendev.org/t/openstack/build/eec3c390c2944d0ab56460c75d0383fa/logs15:32
slaweqand I was thinking about maybe blacklisting those failing cinder tests in our jobs?15:32
slaweqwdyt about it?15:32
bcafarelcan it be done easily in zuul? this is a global job definition no?15:33
slaweqbcafarel: tempest-slow-py3 is defined in tempest repo15:33
bcafarel(if doable definitely +1 with "to restore once cinder is fixed")15:33
bcafareloh nice15:34
slaweqbut neutron-tempest-multinode-full-py3 is defined in neutron15:34
slaweqbut for tempest-slow-py3 we can do our job "neutron-tempest-slow-py3" and blacklist such tests there15:34
slaweqI don't know if gmann will be happy with that if he will discover it but we can try IMO ;)15:34
gmannslaweq: you mean do neutron-tempest-slow-py3 like we did for integrated job?15:35
slaweqgmann: yes, something like that15:36
slaweqbut also without "volume" tests which15:36
slaweqwhich are failing pretty often in our jobs15:36
gmanni think that makes sense, I will say we did not do it for slow/multinode  job but we should do15:36
slaweqand we are trying to make our CI a bit more stable because now it is a nightmare15:36
slaweqthx :)15:37
slaweqok, so I will do it in our repo15:37
slaweq#action slaweq to blacklist some cinder related tests in the neutron-tempest-* jobs15:38
gmanneither is fine, i think it will be used in neutron so neutron-tempest-slow-py in neutron repo makes sense15:38
gmannif it is more than neutron then we can do in tempest repo15:38
slaweqgmann: ok15:38
slaweqok, let's move to the grenade jobs now15:40
slaweqand with grenade jobs I have one "issue" but maybe it's just my misunderstanding of something15:40
slaweqit seems that in multinode grenade jobs services on compute-1 node aren't upgraded15:41
slaweqand that is causing failure with unsupported ovo version in e.g. my patch https://9a7a3a32fbdea177beae-de1ec222256e01db8c1f1f4d7a4b9170.ssl.cf5.rackcdn.com/749158/8/check/neutron-grenade-multinode/a45a7ed/compute1/logs/screen-neutron-agent.txt15:41
slaweqnow the question is  - should it be like that and we should be able to run compute node with older agents15:41
slaweqor should we upgrade those agents too?15:42
slaweqdo You know?15:42
bcafarelhmm newer controller and older compute should work no?15:42
bcafarelthough in grenade I expected a full upgrade15:43
*** obondarev has quit IRC15:43
slaweqok, so I will investigate why it's not working if it should15:44
slaweqgmann: also, I found out recently that in grenade jobs on subnodes we are using lib/neutron instead of lib/neutron-legacy15:44
slaweqgmann: can You check https://review.opendev.org/#/c/759199/ maybe?15:44
gmannslaweq: sure15:44
slaweqok, and that's basically all from me for today15:45
slaweqplease remember to check failed jobs before recheck15:45
mlavalleslaweq: I think hangyan had a similar issue with one of his patches and grenade15:45
slaweqand write related bug (or open new one if needed) while rechecking15:45
slaweqmlavalle: yes, I know15:46
slaweqbut I will check on my patch why it's like that, maybe we are doing something wrong there :)15:46
mlavalleI'll mention this to him15:46
slaweqok, if there is nothing else to be discussed today, I will give You few minutes back15:48
slaweqthx for attending the meeting15:48
slaweqand see You online15:48
