14:00:48 <ykarel> #startmeeting RDO meeting - 2021-07-21
14:00:48 <opendevmeet> Meeting started Wed Jul 21 14:00:48 2021 UTC and is due to finish in 60 minutes.  The chair is ykarel. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:48 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:48 <opendevmeet> The meeting name has been set to 'rdo_meeting___2021_07_21'
14:01:10 <ykarel> Please add topic to agenda https://etherpad.opendev.org/p/RDO-Meeting
14:01:16 <ykarel> #topic roll call
14:02:04 <spotz> o/
14:02:10 <ykarel> #chair spotz
14:02:10 <opendevmeet> Current chairs: spotz ykarel
14:02:22 <amoralej> o/
14:02:31 <ykarel> #chair amoralej
14:02:31 <opendevmeet> Current chairs: amoralej spotz ykarel
14:02:31 <jcapitao> o/
14:02:35 <ykarel> #chair jcapitao
14:02:35 <opendevmeet> Current chairs: amoralej jcapitao spotz ykarel
14:03:45 <ykarel> Ok let's start with topics in agenda
14:03:59 <ykarel> #topic C9 Stream Updates
14:05:46 <ykarel> #link https://review.rdoproject.org/r/c/testproject/+/33878
14:06:02 <ykarel> #link https://review.rdoproject.org/r/c/testproject/+/34549
14:06:37 <ykarel> #info some tests were run with devstack + c9
14:07:08 <ykarel> amoralej, any other updates apart from ^?
14:07:18 <amoralej> i think that's the main progress
14:07:28 <amoralej> i tried to debug some issues in devstack runs
14:07:37 <amoralej> but i think i need someone to check
14:07:49 <ykarel> amoralej, related to tempest failures?
14:07:54 <amoralej> yes
14:07:56 <ykarel> i noticed some of those were random
14:08:07 <amoralej> i'm not sure if it's just random tbh
14:08:20 <ykarel> okk
14:08:32 <amoralej> tosky, ^ we are running devstack on Centos9 and got some errors in cinder
14:08:50 <spotz> Wasn't there a post about tempest failures? Might have been last week
14:08:50 <amoralej> may we get some help from some cinder expert?
14:08:54 <ykarel> like in https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/d2b0b81/testr_results.html there was just 1 failure
14:09:35 <ykarel> and in the runs before and after ^ there were more failures
14:09:52 <amoralej> after adding swift we got more errors
14:09:53 <amoralej> https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/793602b/testr_results.html
14:10:01 <amoralej> but i'm not sure if those are related to swift
14:10:07 <amoralej> or maybe performance related
14:10:17 <amoralej> but there seems to be some pattern
14:10:24 <amoralej> and they seem mainly related to cinder
14:10:36 <tosky> amoralej: is it really cinder, or something that looks like cinder but is really, say, nova? No :)
14:11:06 <amoralej> in fact it's nova complaining but seems to be cinder :)
14:13:24 <amoralej> it's unclear to me, tbh
14:13:49 <amoralej> Jul 19 07:59:29.720234 node-0001435902 devstack@c-api.service[107162]: CRITICAL cinder [None req-3be6ea4f-0dfb-4686-9ad3-3ad58f157fc6 tempest-ServerStableDeviceRescueTest-1149582697 tempest-ServerStableDeviceRescueTest-1149582697-project] Unhandled error: OSError: write error
14:13:59 <amoralej> that looks suspicious
14:14:08 <amoralej> https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/793602b/controller/logs/screen-c-api.txt
14:14:59 <amoralej> anyway, we can follow up later
14:15:16 <ykarel> yeap ok let's move to next
14:15:33 <ykarel> #topic Jenkins Migration Updates
14:15:53 <ykarel> #link https://review.rdoproject.org/r/q/topic:jenkins-v2
14:16:01 <ykarel> jcapitao, anything to add in this
14:16:28 <jcapitao> so the patch to migrate promotion jobs for tripleo master is ready
14:16:36 <jcapitao> #link https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/34419
14:17:07 <jcapitao> as well as for the per-release and distro jobs (based on the master patch)
14:17:17 <jcapitao> #link https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/34560
14:17:36 <jcapitao> now we're working on tripleo-quickstart 3rd party job which needs some specific Jenkins configuration
14:18:46 <ykarel> ok Good, Thanks jcapitao
14:19:05 <ykarel> weshay|ruck, rlandy|ruck can u have a look ^
14:19:34 <rlandy|ruck> ykarel: ack - will look after meetings
14:19:44 <rlandy|ruck> weshay|ruck is on PTO
14:19:53 <rlandy|ruck> jcapitao: thanks for setting this up
14:19:56 <ykarel> rlandy|ruck, okk Thanks
14:20:42 <ykarel> Ok let's move to next topic
14:20:57 <ykarel> #topic RDO Trunk repos for centos-[queens|rocky|stein] from last meeting
14:21:13 <ykarel> amoralej, you got some updates on ^?
14:21:21 <amoralej> yes, sorry, i realized i missed sending the mail
14:21:33 <amoralej> i was writing it right before the meeting, i'll send it asap
14:21:49 <ykarel> ok np
14:21:58 <ykarel> can followup next week
14:22:17 <ykarel> #topic chair for next week
14:22:25 <ykarel> any volunteer?
14:22:32 <jcapitao> I can take it
14:22:52 <spotz> Thanks jcapitao
14:22:58 <ykarel> Thanks jcapitao
14:23:05 <ykarel> #action jcapitao to chair next week
14:23:24 <ykarel> #topic Open Floor
14:23:31 <ykarel> Feel free to bring any topic now
14:23:50 <tosky> amoralej: back to that cinder issue (so that it's in the meeting log): that error you reported (OSError) can also be seen in other tests (look for tempest-ServerStableDeviceRescueTest), so it may not be critical
14:23:57 <tosky> there is a proxy error here: https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/793602b/controller/logs/screen-n-cpu.txt
14:24:06 <tosky> look for Jul 19 07:59:29
14:25:26 <amoralej> tosky, iiuc the actual error there is:
14:25:27 <amoralej> ERROR nova.virt.libvirt.driver [None req-bc98f5f5-5bdf-4839-8e92-54e818ebbfca tempest-ServerStableDeviceRescueTest-1149582697 tempest-ServerStableDeviceRescueTest-1149582697-project] Waiting for libvirt event about the detach of device vdb with device alias virtio-disk1 from instance ca12fb3e-0e7c-4460-ab03-c77a593ccc75 is timed out.
14:25:40 <amoralej> that's what led me to check cinder
14:26:00 <amoralej> but i'm not sure who is responsible for that detaching
14:26:07 <amoralej> is that a nova issue?
14:28:48 <tosky> it's on the boundary and that's the part I'm not sure about
14:28:57 <tosky> I'm sure geguileo knows if it's cinder or not
14:29:03 <amoralej> yeah, we are in the same page then :)
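For context on the flow being debated here: nova requests the device detach from libvirt (which is asynchronous) and then waits for a libvirt "device removed" event, retrying if the event does not arrive in time. A minimal sketch of that wait-with-timeout pattern — a simplified illustration, not nova's actual code:

```python
# Simplified illustration of the detach-and-wait pattern discussed above;
# not nova's actual code. The detach request is asynchronous, and the
# caller waits for the hypervisor's "device removed" event with a timeout,
# retrying on expiry -- which is why a single timeout can be transient.
import threading

def detach_with_retry(request_detach, device_removed: threading.Event,
                      timeout: float = 20.0, retries: int = 3) -> bool:
    for attempt in range(1, retries + 1):
        request_detach()                  # ask the hypervisor to detach (async)
        if device_removed.wait(timeout):  # block until the event or the timeout
            return True                   # removal confirmed
        print(f"attempt {attempt}: waiting for detach event timed out")
    return False

if __name__ == "__main__":
    evt = threading.Event()
    calls = []
    def fake_detach():
        # Simulate a hypervisor that only confirms removal on the 2nd attempt.
        calls.append(1)
        if len(calls) == 2:
            evt.set()
    print(detach_with_retry(fake_detach, evt, timeout=0.1))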
14:29:28 <ykarel> ok i just rechecked, and will request a node hold, so we can check on a live env
14:29:35 <amoralej> good
14:29:36 <geguileo> tosky: let me catch up on the conversation...
14:31:27 <rdogerrit> Joel Capitao proposed config master: Update script which sends review based on u-c changes  https://review.rdoproject.org/r/c/config/+/33518
14:32:34 <geguileo> tosky: amoralej  that issue is in Nova's domain, because the timeout is coming from libvirt
14:33:12 <geguileo> it's failing before they even call os-brick (telling the instance to stop using the block device)
14:33:15 <amoralej> ok, i'll look for some nova friend now
14:34:24 <geguileo> amoralej: no, I may be wrong...
14:34:28 <geguileo> let me have another look...
14:35:33 <geguileo> looking at request req-bc98f5f5-5bdf-4839-8e92-54e818ebbfca which is the one with the error pasted above
14:35:48 <geguileo> that error was temporary, because on the next try it succeeded
14:36:16 <geguileo> so, it failed calling cinder  :-(
14:38:05 <ykarel> yes, seems so, both the tests where it was seeing the timeout have actually passed.
14:38:22 <ykarel> failing one is ERROR nova.compute.manager [None req-70164f48-e90d-494a-8ccd-fbcade5428d4 tempest-AttachVolumeMultiAttachTest-1454555876 tempest-AttachVolumeMultiAttachTest-1454555876-project] [instance: b725072c-0ddf-41ba-9ffd-ad61aa4e54fd] Failed to attach deaabc20-29d2-4ec4-a8a0-ba0fc827b0d7 at /dev/vdb: cinderclient.exceptions.ClientException: Proxy Error (HTTP 502)
14:39:10 <geguileo> the scary part is that the actual cinder code returns 200
14:39:25 <geguileo> Jul 19 07:59:34.690149 node-0001435902 devstack@c-api.service[107162]: INFO cinder.api.openstack.wsgi [req-bc98f5f5-5bdf-4839-8e92-54e818ebbfca req-86510417-b70f-4423-83b6-cceb0e70caab tempest-ServerStableDeviceRescueTest-1149582697 tempest-ServerStableDeviceRescueTest-1149582697-project]
14:39:26 <amoralej> it may even be uwsgi itself
14:39:27 <geguileo> https://38.102.83.14/volume/v3/170ba9baa97a46baac4c99ef456e5d84/attachments/beac4203-1e49-472f-8a1e-2bb3d7d8fe3c returned with HTTP 200
14:40:09 <amoralej> geguileo, the log is written before the request result is passed to uwsgi?
14:40:51 <geguileo> it would seem so, because it fails after cinder API code is "done"
14:41:12 <amoralej> https://stackoverflow.com/questions/36156887/uwsgi-raises-oserror-write-error-during-large-request
14:42:25 <geguileo> amoralej: no, I believe it's the wsgi layer that logs the return value
14:43:57 <amoralej> let's see if we can reproduce it and check
14:44:18 <geguileo> amoralej: https://github.com/openstack/cinder/blob/81f2aaeea91bce6455f9b09cc8795855200e75e1/cinder/api/openstack/wsgi.py#L938-L954
14:46:56 <geguileo> that's the INFO logging and what cinder does afterwards...
14:47:09 <geguileo> doesn't seem to do any OS calls
14:47:52 <amoralej> uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 306]
14:48:06 <amoralej> yes, looks like it's uwsgi itself
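To make that failure mode concrete: in a WSGI deployment the application hands its body back to the server, and the actual socket write happens in uwsgi after the application (including cinder's "returned with HTTP 200" INFO line) has finished. A minimal runnable sketch of that ordering, using Python's stdlib wsgiref in place of uwsgi:

```python
# Minimal sketch of WSGI write ordering, using wsgiref instead of uwsgi;
# not cinder's actual code. The app logs success and returns its body, and
# the server writes the bytes to the socket afterwards. If the client or a
# proxy in front has already closed the connection, the write fails in the
# server layer -- matching cinder logging HTTP 200 while uwsgi reports
# "OSError: write error" and the client sees a 502 from the proxy.
import logging
from wsgiref.simple_server import make_server

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("api")

def app(environ, start_response):
    # The app considers the request done here, before any bytes hit the wire.
    LOG.info("%s returned with HTTP 200", environ.get("PATH_INFO"))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"attachment updated"]  # written by the server after app returns

if __name__ == "__main__":
    make_server("127.0.0.1", 8080, app).serve_forever()
```

For the traceback noise itself, uwsgi options such as ignore-sigpipe, ignore-write-errors, and disable-write-exception are commonly used to suppress client-disconnect errors like this one.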
14:50:00 <ykarel> Lock acquired by "attachment_update"
14:50:07 <ykarel> doesn't seem this one was released ^
14:50:26 <ykarel> ^ seems related, geguileo?
14:51:20 <ykarel> following req-70164f48-e90d-494a-8ccd-fbcade5428d4 in cinder api log
14:51:32 <ykarel> https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/d2b0b81/controller/logs/screen-c-api.txt
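One quick way to check the lock question is to pair oslo.concurrency acquire/release log lines per lock name and flag any imbalance. A hedged sketch, assuming the usual 'Lock "<name>" acquired by' / 'released by' message shapes — these vary across releases, so the regexes should be adjusted against the real log:

```python
# Hedged sketch: count oslo.concurrency lock acquire/release lines in a
# devstack screen log (e.g. screen-c-api.txt) to spot locks that were
# taken but never released. The message formats below are assumptions.
import re
import sys
from collections import Counter

ACQUIRED = re.compile(r'Lock "(?P<name>[^"]+)" acquired by')
RELEASED = re.compile(r'Lock "(?P<name>[^"]+)" "?released"? by')

def unreleased_locks(path: str) -> dict:
    held = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            if m := ACQUIRED.search(line):
                held[m.group("name")] += 1
            elif m := RELEASED.search(line):
                held[m.group("name")] -= 1
    # Positive counts mean more acquires than releases for that lock name.
    return {name: n for name, n in held.items() if n > 0}

if __name__ == "__main__":
    # usage: python lock_check.py screen-c-api.txt
    print(unreleased_locks(sys.argv[1]))
```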
14:52:40 <ykarel> Ok let's take this offline, /me ends meeting
14:52:43 <ykarel> Thanks all
14:52:53 <ykarel> #endmeeting