14:00:48 <ykarel> #startmeeting RDO meeting - 2021-07-21
14:00:48 <opendevmeet> Meeting started Wed Jul 21 14:00:48 2021 UTC and is due to finish in 60 minutes. The chair is ykarel. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:48 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:48 <opendevmeet> The meeting name has been set to 'rdo_meeting___2021_07_21'
14:01:10 <ykarel> Please add topics to the agenda https://etherpad.opendev.org/p/RDO-Meeting
14:01:16 <ykarel> #topic roll call
14:02:04 <spotz> o/
14:02:10 <ykarel> #chair spotz
14:02:10 <opendevmeet> Current chairs: spotz ykarel
14:02:22 <amoralej> o/
14:02:31 <ykarel> #chair amoralej
14:02:31 <opendevmeet> Current chairs: amoralej spotz ykarel
14:02:31 <jcapitao> o/
14:02:35 <ykarel> #chair jcapitao
14:02:35 <opendevmeet> Current chairs: amoralej jcapitao spotz ykarel
14:03:45 <ykarel> Ok let's start with the topics in the agenda
14:03:59 <ykarel> #topic C9 Stream Updates
14:05:46 <ykarel> #link https://review.rdoproject.org/r/c/testproject/+/33878
14:06:02 <ykarel> #link https://review.rdoproject.org/r/c/testproject/+/34549
14:06:37 <ykarel> #info some tests were run with devstack + c9
14:07:08 <ykarel> amoralej, any other updates apart from ^?
14:07:18 <amoralej> i think that's the main progress
14:07:28 <amoralej> i tried to debug some issues in devstack runs
14:07:37 <amoralej> but i think i need someone to check
14:07:49 <ykarel> amoralej, related to tempest failures?
14:07:54 <amoralej> yes
14:07:56 <ykarel> i noticed some of those were random
14:08:07 <amoralej> i'm not sure if it's just random tbh
14:08:20 <ykarel> okk
14:08:32 <amoralej> tosky, ^ we are running devstack on CentOS 9 and got some errors in cinder
14:08:50 <spotz> Wasn't there a post about tempest failures? Might have been last week
14:08:50 <amoralej> may we get some help from some cinder expert?
14:08:54 <ykarel> like in https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/d2b0b81/testr_results.html there was just 1 failure
14:09:35 <ykarel> and in the runs before and after ^ there were more failures
14:09:52 <amoralej> after adding swift we got more errors
14:09:53 <amoralej> https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/793602b/testr_results.html
14:10:01 <amoralej> but i'm not sure if those are related to swift
14:10:07 <amoralej> or maybe performance related
14:10:17 <amoralej> but there seems to be some pattern
14:10:24 <amoralej> and it seems to be mainly related to cinder
14:10:36 <tosky> amoralej: is it really cinder, or something that looks like cinder but is really, say, nova?
14:11:06 <amoralej> No :)
14:11:06 <amoralej> in fact it's nova complaining but it seems to be cinder :)
14:13:24 <amoralej> it's unclear to me, tbh
14:13:49 <amoralej> Jul 19 07:59:29.720234 node-0001435902 devstack@c-api.service[107162]: CRITICAL cinder [None req-3be6ea4f-0dfb-4686-9ad3-3ad58f157fc6 tempest-ServerStableDeviceRescueTest-1149582697 tempest-ServerStableDeviceRescueTest-1149582697-project] Unhandled error: OSError: write error
14:13:59 <amoralej> that looks suspicious
14:14:08 <amoralej> https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/793602b/controller/logs/screen-c-api.txt
14:14:59 <amoralej> anyway, we can follow up later
14:15:16 <ykarel> yeap ok let's move to the next one
14:15:33 <ykarel> #topic Jenkins Migration Updates
14:15:53 <ykarel> #link https://review.rdoproject.org/r/q/topic:jenkins-v2
14:16:01 <ykarel> jcapitao, anything to add on this?
14:16:28 <jcapitao> so the patch to migrate promotion jobs for tripleo master is ready
14:16:36 <jcapitao> #link https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/34419
14:17:07 <jcapitao> as well as the ones for the per-release and distro jobs (based on the master patch)
14:17:17 <jcapitao> #link https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/34560
14:17:36 <jcapitao> now we're working on the tripleo-quickstart 3rd party job which needs some specific Jenkins configuration
14:18:46 <ykarel> ok Good, Thanks jcapitao
14:19:05 <ykarel> weshay|ruck, rlandy|ruck can you have a look ^
14:19:34 <rlandy|ruck> ykarel: ack - will look after meetings
14:19:44 <rlandy|ruck> weshay|ruck is on PTO
14:19:53 <rlandy|ruck> jcapitao: thanks for setting this up
14:19:56 <ykarel> rlandy|ruck, okk Thanks
14:20:42 <ykarel> Ok let's move to the next topic
14:20:57 <ykarel> #topic RDO Trunk repos for centos-[queens|rocky|stein] from last meeting
14:21:13 <ykarel> amoralej, you got some updates on ^?
14:21:21 <amoralej> yes, sorry, i realized i missed sending the mail
14:21:33 <amoralej> i was writing it right before the meeting, i'll send it asap
14:21:49 <ykarel> ok np
14:21:58 <ykarel> we can follow up next week
14:22:17 <ykarel> #topic chair for next week
14:22:25 <ykarel> any volunteer?
14:22:32 <jcapitao> I can take it
14:22:52 <spotz> Thanks jcapitao
14:22:58 <ykarel> Thanks jcapitao
14:23:05 <ykarel> #action jcapitao to chair next week
14:23:24 <ykarel> #topic Open Floor
14:23:31 <ykarel> Feel free to bring up any topic now
14:23:50 <tosky> amoralej: back to that cinder issue (so that it's in the log): that error you reported (OSError) can also be seen in different tests (look for tempest-ServerStableDeviceRescueTest), so it may not be critical
14:23:57 <tosky> there is a proxy error here: https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/793602b/controller/logs/screen-n-cpu.txt
14:24:06 <tosky> look for Jul 19 07:59:29
14:25:26 <amoralej> tosky, iiuc the actual error there is:
14:25:27 <amoralej> ERROR nova.virt.libvirt.driver [None req-bc98f5f5-5bdf-4839-8e92-54e818ebbfca tempest-ServerStableDeviceRescueTest-1149582697 tempest-ServerStableDeviceRescueTest-1149582697-project] Waiting for libvirt event about the detach of device vdb with device alias virtio-disk1 from instance ca12fb3e-0e7c-4460-ab03-c77a593ccc75 is timed out.
14:25:40 <amoralej> that's what led me to check cinder
14:26:00 <amoralej> but i'm not sure who is responsible for that detaching
14:26:07 <amoralej> is that a nova issue?
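The timeout amoralej pasted is raised on the nova side while waiting for the guest to confirm the detach. A minimal Python sketch of that wait-for-libvirt-event pattern (illustrative only, not nova's actual code; the class, event name, and timeout value are invented):

    import threading

    # Hypothetical timeout; nova computes its own per-attempt deadline.
    DETACH_TIMEOUT = 20

    class DeviceDetachWaiter:
        """Wait for the guest to acknowledge removal of a device."""

        def __init__(self):
            self._removed = threading.Event()

        def on_libvirt_event(self, event, device_alias):
            # Invoked from the libvirt event loop, e.g. when a
            # device-removed event arrives for alias "virtio-disk1".
            if event == "device-removed":
                self._removed.set()

        def wait(self, device_alias):
            # nova logs "Waiting for libvirt event about the detach ...
            # is timed out" when a wait like this expires; as noted later
            # in the meeting, a retry can still succeed, so a single
            # timeout is not necessarily fatal on its own.
            if not self._removed.wait(timeout=DETACH_TIMEOUT):
                raise TimeoutError(
                    "libvirt never confirmed detach of %s" % device_alias)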
14:28:48 <tosky> it's on the boundary and that's the part I'm not sure about
14:28:57 <tosky> I'm sure geguileo knows if it's cinder or not
14:29:03 <amoralej> yeah, we are on the same page then :)
14:29:28 <ykarel> ok i just rechecked, and will request a node hold, so we can check on a live env
14:29:35 <amoralej> good
14:29:36 <geguileo> tosky: let me catch up on the conversation...
14:31:27 <rdogerrit> Joel Capitao proposed config master: Update script which sends review based on u-c changes https://review.rdoproject.org/r/c/config/+/33518
14:32:34 <geguileo> tosky: amoralej that issue is in Nova's domain, because the timeout is coming from libvirt
14:33:12 <geguileo> it's failing before they even call os-brick (telling the instance to stop using the block device)
14:33:15 <amoralej> ok, i'll look for some nova friend now
14:34:24 <geguileo> amoralej: no, I may be wrong...
14:34:28 <geguileo> let me have another look...
14:35:33 <geguileo> looking at request req-bc98f5f5-5bdf-4839-8e92-54e818ebbfca which is the one with the error pasted above
14:35:48 <geguileo> that error was temporary, because on the next try it succeeded
14:36:16 <geguileo> so, it failed calling cinder :-(
14:38:05 <ykarel> yes seems so, both the tests where it was seeing timeouts have actually passed.
14:38:22 <ykarel> the failing one is ERROR nova.compute.manager [None req-70164f48-e90d-494a-8ccd-fbcade5428d4 tempest-AttachVolumeMultiAttachTest-1454555876 tempest-AttachVolumeMultiAttachTest-1454555876-project] [instance: b725072c-0ddf-41ba-9ffd-ad61aa4e54fd] Failed to attach deaabc20-29d2-4ec4-a8a0-ba0fc827b0d7 at /dev/vdb: cinderclient.exceptions.ClientException: Proxy Error (HTTP 502)
14:39:10 <geguileo> the scary part is that the actual cinder code returns 200
14:39:25 <geguileo> Jul 19 07:59:34.690149 node-0001435902 devstack@c-api.service[107162]: INFO cinder.api.openstack.wsgi [req-bc98f5f5-5bdf-4839-8e92-54e818ebbfca req-86510417-b70f-4423-83b6-cceb0e70caab tempest-ServerStableDeviceRescueTest-1149582697 tempest-ServerStableDeviceRescueTest-1149582697-project]
14:39:26 <amoralej> it may even be uwsgi itself
14:39:27 <geguileo> https://38.102.83.14/volume/v3/170ba9baa97a46baac4c99ef456e5d84/attachments/beac4203-1e49-472f-8a1e-2bb3d7d8fe3c returned with HTTP 200
14:40:09 <amoralej> geguileo, the log is written before the request result is passed to uwsgi?
14:40:51 <geguileo> it would seem so, because it fails after the cinder API code is "done"
14:41:12 <amoralej> https://stackoverflow.com/questions/36156887/uwsgi-raises-oserror-write-error-during-large-request
14:42:25 <geguileo> amoralej: no, I believe it's the wsgi layer that logs the return value
14:43:57 <amoralej> let's see if we can reproduce it and check
14:44:18 <geguileo> amoralej: https://github.com/openstack/cinder/blob/81f2aaeea91bce6455f9b09cc8795855200e75e1/cinder/api/openstack/wsgi.py#L938-L954
14:46:56 <geguileo> that's the INFO logging and what cinder does afterwards...
14:47:09 <geguileo> it doesn't seem to do any OS calls
14:47:52 <amoralej> uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 306]
14:48:06 <amoralej> yes, looks like it's uwsgi itself
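For reference, a toy WSGI application illustrating the ordering geguileo and amoralej converge on: the app hands the status and body to the server and logs success, and uwsgi writes to the socket only afterwards (the names and response body below are illustrative, not cinder's code):

    import logging

    LOG = logging.getLogger(__name__)

    def application(environ, start_response):
        # The app-level work (e.g. the attachment update) has already
        # succeeded by the time we get here.
        body = b'{"attachment": {}}'
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        # cinder's wsgi layer logs "... returned with HTTP 200" at roughly
        # this point, before any bytes have reached the client.
        LOG.info("%s returned with HTTP 200", environ.get("PATH_INFO"))
        # uwsgi only writes the status, headers, and body to the socket
        # after this iterable is returned and consumed; if the peer has
        # already closed the connection, that write fails with the
        # "Broken pipe" / "OSError: write error" seen in screen-c-api.txt.
        return [body]

So a 200 in cinder's log and a 502 at the caller are consistent if the proxy in front of the API gave up on the request while c-api was still finishing it.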
14:50:00 <ykarel> Lock acquired by "attachment_update"
14:50:07 <ykarel> doesn't seem this was released ^
14:50:26 <ykarel> ^ seems related, geguileo?
14:51:20 <ykarel> following req-70164f48-e90d-494a-8ccd-fbcade5428d4 in the cinder api log
14:51:32 <ykarel> https://logserver.rdoproject.org/49/34549/20/check/devstack-platform-centos-9-stream/d2b0b81/controller/logs/screen-c-api.txt
14:52:40 <ykarel> Ok let's take this offline, /me ends meeting
14:52:43 <ykarel> Thanks all
14:52:53 <ykarel> #endmeeting
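On the lock ykarel flagged before closing: the "Lock acquired" line is the acquire-side message of a synchronized-style decorator. A minimal sketch of the pattern using oslo.concurrency (cinder's attachment code may take this lock through its tooz-based coordination layer instead; the lock name and function body here are illustrative, not cinder's real code):

    import logging

    from oslo_concurrency import lockutils

    logging.basicConfig(level=logging.DEBUG)

    @lockutils.synchronized("attachment_update")
    def attachment_update(attachment_id):
        # oslo.concurrency emits a debug line when the lock is acquired
        # here, and a matching "released" line when the function returns.
        # If a call made while holding a lock like this hangs (e.g. on a
        # slow backend), every later request serialized on the same lock
        # queues behind it.
        return attachment_id

    attachment_update("deaabc20-29d2-4ec4-a8a0-ba0fc827b0d7")

If the matching "released" line never appears for a given lock, requests can sit in c-api long enough for the proxy in front of it to answer 502, which would fit what was traced above.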