14:00:40 <rosmaita> #startmeeting cinder
14:00:41 <openstack> Meeting started Wed Mar 3 14:00:40 2021 UTC and is due to finish in 60 minutes. The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:45 <openstack> The meeting name has been set to 'cinder'
14:00:51 <tosky> o/
14:00:53 <enriquetaso> hi
14:00:53 <geguileo> hi! o/
14:00:53 <rosmaita> #topic roll call
14:00:55 <whoami-rajat> Hi
14:00:58 <e0ne> hi
14:01:00 <zoharm> o7
14:01:01 <kinpaa12389> hi
14:01:02 <eharney> hi
14:01:47 <rosmaita> looks like a good turnout, let's get started
14:02:00 <rosmaita> #link https://etherpad.openstack.org/p/cinder-wallaby-meetings
14:02:04 <rosmaita> #topic announcements
14:02:18 <rosmaita> wallaby os-brick release this week (that is, tomorrow)
14:02:28 <rosmaita> cinderclient release next week
14:02:39 <rosmaita> wallaby feature freeze next week also
14:02:48 <rosmaita> so, review priorities are:
14:02:53 <rosmaita> today: os-brick
14:03:02 <rosmaita> then, cinderclient and features
14:03:13 <rosmaita> use launchpad to find features to review
14:03:40 <rosmaita> #link https://blueprints.launchpad.net/cinder/wallaby
14:03:55 <rosmaita> if you have a feature that's not in there, let me know immediately
14:04:25 <rosmaita> note to driver developer/maintainers: even if you are not a core, you can review other people's drivers
14:04:48 <rosmaita> we have helpful info in the cinder docs if you need suggestions on what to look for
14:05:02 <rosmaita> though, you probably know since you are working on your own driver
14:05:43 <jungleboyj> o/
14:05:43 <rosmaita> #topic wallaby os-brick release
14:06:11 <rosmaita> ok, some patches we need to get in (and that obviously need reviews)
14:06:20 <rosmaita> fix nvmeof connector regression
14:06:29 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/777086
14:06:53 <rosmaita> it's got a -1 from me, but just for minor stuff ... it looks otherwise ok to me
14:07:10 <rosmaita> it has passed zuul
14:07:29 <rosmaita> and passed the mellanox SPDK CI, which consumes the connector in a "legacy" way
14:07:42 <rosmaita> and it has passed the kioxia CI, which consumes the connector in the new way
14:09:19 <rosmaita> anyway, that's the #1 priority because it corrects a regression
14:09:38 <rosmaita> other stuff that is uncontroversial and needs review:
14:09:54 <rosmaita> remove six - https://review.opendev.org/c/openstack/os-brick/+/754598/
14:10:12 <rosmaita> the nvmeof change doesn't use six at all, so it's not impacted
14:10:31 <rosmaita> but i seem to have been reviewing this patch for months and would like it out of my life
14:10:43 <rosmaita> change min version of tox - https://review.opendev.org/c/openstack/os-brick/+/771568
14:11:10 <rosmaita> i verified that tox is smart enough to update itself when it builds testenvs, so it's not a problem
14:11:23 <rosmaita> RBD: catch read exceptions prior to modifying offset
14:11:36 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/763867
14:11:57 <rosmaita> would be good, please look, if it requires work, give it a -1
14:12:09 <rosmaita> Avoid unhandled exceptions during connecting to iSCSI portals
14:12:17 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/775545
14:12:44 <rosmaita> there's a question about ^^, namely whether it catches too broad an exception
14:13:14 <rosmaita> so while it would be nice to have, i don't insist on it
14:13:29 <rosmaita> so if you have a particular interest in this, please review
14:13:42 <rosmaita> change hacking to 4.0.0
14:13:57 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/774883
14:14:21 <rosmaita> so, we have already done it for cinder, and the one to cinderclient will probably merge before release
14:15:08 <rosmaita> now that i think about it, we could always merge later and backport to stable/wallaby to keep the code consistent
14:15:41 <rosmaita> so, let's hold off unless someone here has a strong opinion?
14:15:55 <eharney> i think just updating to 4.0.0 is pretty easy? not sure what you mean
14:16:15 <rosmaita> well, i don't know if it will impact any of the other patches
14:16:36 <eharney> ah
14:17:20 <rosmaita> i will rebase it after the critical stuff merges and we can see what happens
14:17:49 <rosmaita> ok, i may put up a patch to update the requirements and lower-constraints
14:18:11 <rosmaita> it may not be necessary because they were updated in january
14:18:48 <rosmaita> anyway, if there is a change, i will individually bug people to look at the patch
14:19:34 <rosmaita> ok, to summarize
14:19:49 <rosmaita> the highest priority for brick is the nvmeof patch
14:20:08 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/777086
14:21:57 <rosmaita> need quick feedback on this issue: https://review.opendev.org/c/openstack/os-brick/+/777086/8/os_brick/initiator/connectors/nvmeof.py#829
14:22:41 <rosmaita> ok, if memory serves, e0ne and hemna volunteered to review this?
14:23:12 <e0ne> rosmaita: will do it. last time I +1'ed on it but there was an issue with naming
14:23:13 <rosmaita> i am right on the verge of +2, but want to make sure we have people who can look at it today
14:23:22 <jungleboyj> Definitely need hemna to look at it.
14:23:24 <rosmaita> e0ne: cool, i think that has been addressed
14:24:03 <rosmaita> i think hemna looked at an earlier version, so shouldn't be too bad to re-review
14:24:14 <jungleboyj> ++
14:24:23 <rosmaita> he may be on the west coast this week, i will look for him later
14:25:12 <rosmaita> #topic cinderclient
14:25:13 <hemna> I'll poke at it
14:25:18 <rosmaita> hemna: ty!
14:25:41 <rosmaita> https://review.opendev.org/q/project:openstack/python-cinderclient+branch:master+status:open
14:25:58 <rosmaita> major things are to get the MV updated
14:26:12 <rosmaita> there's a patch for 3.63
14:26:20 <rosmaita> there will be a patch for 3.64
14:26:35 <rosmaita> but the cinder-side code needs to merge first
14:27:13 <rosmaita> #link https://review.opendev.org/q/project:openstack/python-cinderclient+branch:master+status:open
14:27:54 <rosmaita> bad paste there, sorry
14:28:02 <rosmaita> #link https://review.opendev.org/c/openstack/cinder/+/771081
14:28:29 <rosmaita> ^^ is a wallaby feature so is a priority review ... also, it's nicely written and has good tests, and one +2
14:28:33 * jungleboyj has so many tabs open. :-)
14:28:38 <rosmaita> :D
14:28:57 <rosmaita> ok, i am still working on the v2 support removal patch for cinderclient
14:29:14 <rosmaita> should have it ready in the next day or so
14:29:26 <jungleboyj> ++
14:30:25 <rosmaita> so note to reviewers: there are a bunch of small patches for cinderclient, please address them
14:30:44 <rosmaita> #topic Request from TC to investigate frequent Gate failures
14:30:50 <rosmaita> jungleboyj: that's you
14:30:58 <jungleboyj> Hello. :-)
14:31:31 <jungleboyj> So, as many have hopefully seen, there has been an effort to make our gate jobs more efficient as we have fewer resources for running check/gate.
14:31:52 <jungleboyj> There have also been issues with how long patches are taking to get through.
14:32:08 <jungleboyj> Neutron, Nova, etc have made things more efficient.
14:32:31 <jungleboyj> There is still need for improvement.
14:32:46 <jungleboyj> Now they are looking at how often gate/check runs fail.
14:33:10 <jungleboyj> #link https://termbin.com/4guht
14:33:29 <tosky> interestingly enough, the pastebin you linked refers to tempest-integrated-storage failures, I thought we had many more in the lvm-lio-barbican job
14:33:32 <jungleboyj> dansmith: Had highlighted two examples above where he was seeing frequent failures for Cinder.
14:33:32 <rosmaita> does dansmith have an elasticsearch query up for these?
14:33:55 <eharney> the lio-barbican job is failing a lot, but i think it doesn't show up much where other projects would see it
14:34:02 <jungleboyj> rosmaita: I don't think he had put that together yet. This was just a quick grab.
14:34:03 <dansmith> nope, haven't done that yet
14:34:03 <rosmaita> what eharney said
14:34:14 <rosmaita> dansmith: if you have time, that would be great
14:34:15 <tosky> we have kind of discussed a bit about those in the past, my feeling is that something weird happens in the limited environment
14:34:26 <dansmith> definitely not barbican-related jobs that I've seen
14:34:44 <tosky> and eharney has started a bit of investigation, I don't remember if there were any findings
14:35:33 <eharney> tosky: i think there's actually a bug in the target code hitting the lio-barbican job but i guess the questions today are about other issues
14:35:36 <rosmaita> i think if we can get an elasticsearch query up, that will make it easier to find fresh logs when someone has time to look
14:35:46 <tosky> eharney: I see
14:36:00 <jungleboyj> So, partially, this is a reminder to everyone that we should try to look at failures and not just blindly recheck. Second, it's to see if anyone has been looking at failures.
14:36:21 <rosmaita> the big lvm-lio-barbican job failures i have been seeing are related to backup tests and the test-boot-pattern tests
14:36:25 <jungleboyj> eharney: It is more of a general call. Those were just examples. :-)
14:36:40 <jungleboyj> rosmaita: Hasn't that been the case for like 3 years now?
14:36:47 <tosky> (backup tests which were removed from the main gates because of the failing rates, probably resources)
14:37:49 <jungleboyj> rosmaita: Do we have elasticsearch queries for those?
14:38:02 <rosmaita> i don't think so
14:38:18 <jungleboyj> So, that would probably be good to add.
14:38:30 <rosmaita> someone could be a Hero of Cinder and put them together
14:38:44 <jungleboyj> :-)
14:38:44 <rosmaita> and put the links on the meeting etherpad
14:39:04 * jungleboyj looks at our bug here enriquetaso
14:39:08 <jungleboyj> *hero
14:39:35 <enriquetaso> sure
14:39:46 <rosmaita> maybe a driver maintainer who's waiting for reviews ?
14:39:47 <enriquetaso> not really sure how but sure
14:40:07 <jungleboyj> Add to our weekly bug review an eye on where our gate/check failures are at.
14:40:15 <jungleboyj> rosmaita: That is a good idea too.
14:40:52 <rosmaita> enriquetaso: you can look by hand, but i think the preferred method is to file a bug and check out a repo and push a patch
14:40:56 <rosmaita> makes it more stable
14:41:09 <rosmaita> we can talk offline, i have notes somewhere
14:41:27 <rosmaita> or maybe dansmith can volunteer to walk you through the process?
14:42:05 <rosmaita> ok, so the conclusion is that volume-related gate failures suck and someone needs to do something about it
14:42:08 <dansmith> tbh, I haven't actually done that in a while
14:42:18 <rosmaita> #topic Wallaby R-6 Bug Review
14:42:33 <enriquetaso> hi
14:42:47 <enriquetaso> We have 6 bugs reported this week. I'd like to show 3 today because I'm not sure about the severity of them.
14:42:50 <rosmaita> dansmith: that's fine, just may slow things down a bit, as you can see enriquetaso is already kind of busy
14:43:11 <enriquetaso> #link https://etherpad.opendev.org/p/cinder-wallaby-r6-bug-review
14:43:18 <enriquetaso> bug_1:
14:43:18 <enriquetaso> Backup create failed: RBD volume flatten too long causing mq to timed out.
14:43:19 <jungleboyj> rosmaita: ++ Thank you.
14:43:27 <enriquetaso> #link
14:43:27 <enriquetaso> https://bugs.launchpad.net/cinder/+bug/1916843
14:43:28 <openstack> Launchpad bug 1916843 in Cinder "Backup create failed: RBD volume flatten too long causing mq to timed out." [Medium,New]
14:43:35 <enriquetaso> #link https://bugs.launchpad.net/cinder/+bug/1916843
14:43:40 <enriquetaso> In the process of creating a backup using a snapshot, there is an operation to create a temporary volume from a snapshot, which requires cinder-backup and cinder-volume to perform rpc interaction. The default configuration "rpc_response_timeout" is 60s.
14:44:19 <enriquetaso> whoami-rajat: is this related to any of the flatten work you have done?
14:44:27 <enriquetaso> i can't remember
14:45:03 <whoami-rajat> can't recall anything related, will take a thorough look after the meeting
14:45:25 <enriquetaso> Thanks!
14:45:29 <rosmaita> i think we should ask for more info first
14:45:41 <rosmaita> something about the environment
14:45:47 <eharney> the main point in that bug is that the slow call shouldn't block the service up for RPC calls, AFAIK
14:45:58 <eharney> the fact that it's slow isn't really the interesting part
14:46:42 <rosmaita> yeah, but sounds like we would need a major redesign to do that right
14:46:58 <eharney> i'm not so sure, need to look around in that area again
14:47:01 <dansmith> is it just that the operation takes longer than the RPC timeout?
14:47:18 <dansmith> like, works for small volumes but a large one will always take longer than 60s and the RPC call times out?
14:47:31 <rosmaita> not sure, doesn't say
14:48:00 <eharney> i'm thinking more in the direction of librbd not behaving the way most code does with eventlet
14:48:13 <dansmith> we had a lot of those sorts of things in nova, so I implemented heartbeating long-call support in oslo.messaging a while back which lets you make much longer RPC calls
14:48:14 <eharney> but i don't have enough to make it a useful discussion for now
14:48:35 <dansmith> okay if you're blocked in a C lib call and not answering RPC, that would also be bad
14:48:43 <eharney> right
14:49:29 <enriquetaso> OK looks like there may be something there
14:49:41 <enriquetaso> next
14:49:45 <enriquetaso> bug_2: Cinder sends old db object when delete an attachment
14:49:50 <enriquetaso> #link https://bugs.launchpad.net/cinder/+bug/1916980
14:49:51 <openstack> Launchpad bug 1916980 in Cinder "cinder sends old db object when delete a attachment" [Undecided,Incomplete] - Assigned to wu.chunyang (wuchunyang)
14:50:01 <enriquetaso> When we delete an attachment Cinder sends a notification with the old db object. We should call the refresh() function to refresh the object before sending the notification. After running the refresh function, the status changes to detached.
14:50:01 <enriquetaso> I left a comment on the bug report "Does this problem occur on all backends or on a specific one?"
14:50:20 <geguileo> enriquetaso: I have fixed that one in my local repo
14:50:29 <geguileo> enriquetaso: I will submit the patch later today
14:50:32 <enriquetaso> cool
14:50:34 <enriquetaso> thanks!
14:50:47 <enriquetaso> next
14:50:49 <enriquetaso> bug_3: Scheduling is not even among multiple thin provisioning pools which have different sizes
14:50:57 <enriquetaso> #link https://bugs.launchpad.net/cinder/+bug/1917293
14:50:58 <openstack> Launchpad bug 1917293 in Cinder "Scheduling is not even among multiple thin provisioning pools which have different sizes" [Undecided,New]
14:51:05 <enriquetaso> Scheduling is not even among multiple thin provisioning pools
14:51:05 <enriquetaso> which have different sizes. For example, there are two thin provisioning
14:51:05 <enriquetaso> pools. Pool0 has 10T capacity, Pool1 has 30T capacity, the max_over_subscription_ratio
14:51:05 <enriquetaso> of them are both 20. We assume that the provisioned_capacity_gb of Pool1
14:51:05 <enriquetaso> is 250T and the provisioned_capacity_gb of Pool0 is 0.
14:51:06 <enriquetaso> According to the formula in the cinder source code, the free capacity of Pool0
14:51:10 <enriquetaso> is 10*20-0=200T and the free capacity of Pool1 is 30*20-250=350T.
14:51:12 <enriquetaso> So it is clear for us to see that a newly created volume is
14:51:14 <enriquetaso> scheduled to Pool1 instead of Pool0. However, Pool0 should be
14:51:16 <enriquetaso> scheduled since it has more real free capacity.
14:51:18 <enriquetaso> In a word, the scheduler tends to schedule the pool that has bigger size.
14:51:55 <geguileo> enriquetaso: I haven't looked at the bug, but were they testing on devstack or with a deployment that has 3 schedulers?
14:52:12 <eharney> i'd need to look into this one again, but i'm not sure it's a bug, it may be that the reporter's expectations aren't quite right and he's expecting a scheduling algorithm that's smarter than what we actually try to do
14:52:34 <geguileo> or that the requests are going to different schedulers
14:52:45 <geguileo> and each scheduler has different in-memory data at the time of receiving the request...
14:52:57 <enriquetaso> geguileo, i need to ask because it doesn't say
14:53:48 <geguileo> enriquetaso: anything that happens with multiple schedulers but doesn't with a single one can be attributed to different in-memory data
14:54:30 <enriquetaso> geguileo, good point, so we need more info
14:54:39 <eharney> this will also depend on how you configure the capacity weigher
14:55:27 <enriquetaso> OK
14:55:39 <enriquetaso> that's the last one :)
14:55:50 <rosmaita> thanks sofia
14:55:59 <rosmaita> #topic stable branch update
14:56:07 <whoami-rajat> i will provide a quick update
14:56:16 <whoami-rajat> proposed stable victoria release
14:56:22 <whoami-rajat> #link https://review.opendev.org/c/openstack/releases/+/778231
14:56:31 <whoami-rajat> 3 patches remaining in ussuri (cinder and os-brick)
14:56:42 <whoami-rajat> 4 remaining in train
14:56:54 <whoami-rajat> thanks everyone for the reviews this week
14:57:17 <whoami-rajat> that's all from my side
14:57:31 <rosmaita> thanks Rajat
14:57:44 <rosmaita> #topic conf option workaround
14:57:58 <whoami-rajat> Again me
14:58:17 <whoami-rajat> so hemna asked a question about the default value of the conf option in the rbd clone patch
14:58:31 <whoami-rajat> #link https://review.opendev.org/c/openstack/cinder/+/754397
14:59:10 <eharney> IIRC, this is defaulted off mostly because we don't want to introduce extra risk and complexity, right?
14:59:19 <whoami-rajat> yes
14:59:37 <eharney> that's not really obvious in the patch, but that plus the fact that there are other solutions for people now makes a decent case for defaulting it off i think
14:59:48 <rosmaita> ok, we are out of time ... let's continue this in openstack-cinder, or respond on the patch
14:59:57 <rosmaita> thanks everyone
15:00:02 <rosmaita> #endmeeting
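
Editor's note: the numbers quoted from bug 1917293 above follow the thin-provisioning formula the discussion refers to, virtual free space = total capacity * max_over_subscription_ratio - provisioned_capacity_gb. A minimal sketch of that arithmetic (assuming reserved_percentage is 0 and the default capacity weigher, which prefers the pool with the largest virtual free space; the helper function is illustrative only, not Cinder code):

    def virtual_free_tb(total_tb, provisioned_tb, over_subscription_ratio):
        # virtual free space = raw capacity * over-subscription ratio - already provisioned
        return total_tb * over_subscription_ratio - provisioned_tb

    pool0 = virtual_free_tb(10, 0, 20)    # 200 (TB virtual), even though only 10 TB is real
    pool1 = virtual_free_tb(30, 250, 20)  # 350 (TB virtual)
    # The weigher prefers Pool1 (350 > 200), so the larger pool keeps winning even
    # though Pool0 is empty, which is the behavior the bug reporter is questioning.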