14:00:40 <rosmaita> #startmeeting cinder
14:00:41 <openstack> Meeting started Wed Mar 3 14:00:40 2021 UTC and is due to finish in 60 minutes. The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:45 <openstack> The meeting name has been set to 'cinder'
14:00:51 <tosky> o/
14:00:53 <enriquetaso> hi
14:00:53 <geguileo> hi! o/
14:00:53 <rosmaita> #topic roll call
14:00:55 <whoami-rajat> Hi
14:00:58 <e0ne> hi
14:01:00 <zoharm> o7
14:01:01 <kinpaa12389> hi
14:01:02 <eharney> hi
14:01:47 <rosmaita> looks like a good turnout, let's get started
14:02:00 <rosmaita> #link https://etherpad.openstack.org/p/cinder-wallaby-meetings
14:02:04 <rosmaita> #topic announcements
14:02:18 <rosmaita> wallaby os-brick release this week (that is, tomorrow)
14:02:28 <rosmaita> cinderclient release next week
14:02:39 <rosmaita> wallaby feature freeze next week also
14:02:48 <rosmaita> so, review priorities are:
14:02:53 <rosmaita> today: os-brick
14:03:02 <rosmaita> then, cinderclient and features
14:03:13 <rosmaita> use launchpad to find features to review
14:03:40 <rosmaita> #link https://blueprints.launchpad.net/cinder/wallaby
14:03:55 <rosmaita> if you have a feature that's not in there, let me know immediately
14:04:25 <rosmaita> note to driver developer/maintainers: even if you are not a core, you can review other people's drivers
14:04:48 <rosmaita> we have helpful info in the cinder docs if you need suggestions on what to look for
14:05:02 <rosmaita> though, you probably know since you are working on your own driver
14:05:43 <jungleboyj> o/
14:05:43 <rosmaita> #topic wallaby os-brick release
14:06:11 <rosmaita> ok, some patches we need to get in (and that obviously need reviews)
14:06:20 <rosmaita> fix nvmeof connector regression
14:06:29 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/777086
14:06:53 <rosmaita> it's got a -1 from me, but just for minor stuff ... it looks otherwise ok to me
14:07:10 <rosmaita> it has passed zuul
14:07:29 <rosmaita> and passed the mellanox SPDK CI, which consumes the connector in a "legacy" way
14:07:42 <rosmaita> and it has passed the kioxia CI, which consumes the connector in the new way
14:09:19 <rosmaita> anyway, that's the #1 priority because it corrects a regression
14:09:38 <rosmaita> other stuff that is uncontroversial and needs review:
14:09:54 <rosmaita> remove six - https://review.opendev.org/c/openstack/os-brick/+/754598/
14:10:12 <rosmaita> the nvmeof change doesn't use six at all, so it's not impacted
14:10:31 <rosmaita> but i seem to have been reviewing this patch for months and would like it out of my life
14:10:43 <rosmaita> change min version of tox - https://review.opendev.org/c/openstack/os-brick/+/771568
14:11:10 <rosmaita> i verified that tox is smart enough to update itself when it builds testenvs, so it's not a problem
14:11:23 <rosmaita> RBD: catch read exceptions prior to modifying offset
14:11:36 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/763867
14:11:57 <rosmaita> would be good, please look, if it requires work, give it a -1
14:12:09 <rosmaita> Avoid unhandled exceptions during connecting to iSCSI portals
14:12:17 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/775545
14:12:44 <rosmaita> there's a question about ^^, namely whether it catches too broad an exception
14:13:14 <rosmaita> so while it would be nice to have, i don't insist on it
14:13:29 <rosmaita> so if you have a particular interest in this, please review
14:13:42 <rosmaita> change hacking to 4.0.0
14:13:57 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/774883
14:14:21 <rosmaita> so, we have already done it for cinder, and the one to cinderclient will probably merge before release
14:15:08 <rosmaita> now that i think about it, we could always merge later and backport to stable/wallaby to keep the code consistent
14:15:41 <rosmaita> so, let's hold off unless someone here has a strong opinion?
14:15:55 <eharney> i think just updating to 4.0.0 is pretty easy? not sure what you mean
14:16:15 <rosmaita> well, i don't know if it will impact any of the other patches
14:16:36 <eharney> ah
14:17:20 <rosmaita> i will rebase it after the critical stuff merges and we can see what happens
14:17:49 <rosmaita> ok, i may put up a patch to update the requirements and lower-constraints
14:18:11 <rosmaita> it may not be necessary because they were updated in january
14:18:48 <rosmaita> anyway, if there is a change, i will individually bug people to look at the patch
14:19:34 <rosmaita> ok, to summarize
14:19:49 <rosmaita> the highest priority for brick is the nvmeof patch
14:20:08 <rosmaita> #link https://review.opendev.org/c/openstack/os-brick/+/777086
14:21:57 <rosmaita> need quick feedback on this issue: https://review.opendev.org/c/openstack/os-brick/+/777086/8/os_brick/initiator/connectors/nvmeof.py#829
14:22:41 <rosmaita> ok, if memory serves, e0ne and hemna volunteered to review this?
14:23:12 <e0ne> rosmaita: will do it. last time I +1'ed on it but there was an issue with naming
14:23:13 <rosmaita> i am right on the verge of +2, but want to make sure we have people who can look at it today
14:23:22 <jungleboyj> Definitely need hemna to look at it.
14:23:24 <rosmaita> e0ne: cool, i think that has been addressed
14:24:03 <rosmaita> i think hemna looked at an earlier version, so shouldn't be too bad to re-review
14:24:14 <jungleboyj> ++
14:24:23 <rosmaita> he may be on the west coast this week, i will look for him later
14:25:12 <rosmaita> #topic cinderclient
14:25:13 <hemna> I'll poke at it
14:25:18 <rosmaita> hemna: ty!
14:25:41 <rosmaita> https://review.opendev.org/q/project:openstack/python-cinderclient+branch:master+status:open
14:25:58 <rosmaita> major things are to get the MV updated
14:26:12 <rosmaita> there's a patch for 3.63
14:26:20 <rosmaita> there will be a patch for 3.64
14:26:35 <rosmaita> but the cinder-side code needs to merge first
14:27:13 <rosmaita> #link https://review.opendev.org/q/project:openstack/python-cinderclient+branch:master+status:open
14:27:54 <rosmaita> bad paste there, sorry
14:28:02 <rosmaita> #link https://review.opendev.org/c/openstack/cinder/+/771081
14:28:29 <rosmaita> ^^ is a wallaby feature so is a priority review ... also, it's nicely written and has good tests, and one +2
14:28:33 * jungleboyj has so many tabs open. :-)
14:28:38 <rosmaita> :D
14:28:57 <rosmaita> ok, i am still working on the v2 support removal patch for cinderclient
14:29:14 <rosmaita> should have it ready in the next day or so
14:29:26 <jungleboyj> ++
14:30:25 <rosmaita> so note to reviewers: there are a bunch of small patches for cinderclient, please address them
14:30:44 <rosmaita> #topic Request from TC to investigate frequent Gate failures
14:30:50 <rosmaita> jungleboyj: that's you
14:30:58 <jungleboyj> Hello. :-)
14:31:31 <jungleboyj> So, as many have hopefully seen, there has been an effort to make our gate jobs more efficient as we have fewer resources for running check/gate.
14:31:52 <jungleboyj> There have also been issues with how long patches are taking to get through.
14:32:08 <jungleboyj> Neutron, Nova, etc have made things more efficient.
14:32:31 <jungleboyj> There is still need for improvement.
14:32:46 <jungleboyj> Now they are looking at how often gate/check runs fail.
14:33:10 <jungleboyj> #link https://termbin.com/4guht
14:33:29 <tosky> interestingly enough, the pastebin you linked refers to tempest-integrated-storage failures, I thought we had many more in the lvm-lio-barbican job
14:33:32 <jungleboyj> dansmith: Had highlighted two examples above where he was seeing frequent failures for Cinder.
14:33:32 <rosmaita> does dansmith have an elasticsearch query up for these?
14:33:55 <eharney> the lio-barbican job is failing a lot, but i think it doesn't show up much where other projects would see it
14:34:02 <jungleboyj> rosmaita: I don't think he had put that together yet. This was just a quick grab.
14:34:03 <dansmith> nope, haven't done that yet
14:34:03 <rosmaita> what eharney said
14:34:14 <rosmaita> dansmith: if you have time, that would be great
14:34:15 <tosky> we have kind of discussed a bit about those in the past, my feeling is that something weird happens in the limited environment
14:34:26 <dansmith> definitely not barbican-related jobs that I've seen
14:34:44 <tosky> and eharney has started a bit of investigation, I don't remember if there were any findings
14:35:33 <eharney> tosky: i think there's actually a bug in the target code hitting the lio-barbican job but i guess the questions today are about other issues
14:35:36 <rosmaita> i think if we can get an elasticsearch query up, that will make it easier to find fresh logs when someone has time to look
14:35:46 <tosky> eharney: I see
14:36:00 <jungleboyj> So, partially, this is a reminder to everyone that we should try to look at failures and not just blindly recheck. Second, it's to see if anyone has been looking at failures.
14:36:21 <rosmaita> the big lvm-lio-barbican job failures i have been seeing are related to backup tests and the test-boot-pattern tests
14:36:25 <jungleboyj> eharney: It is more of a general call. Those were just examples. :-)
14:36:40 <jungleboyj> rosmaita: Hasn't that been the case for like 3 years now?
14:36:47 <tosky> (backup tests which were removed from the main gates because of the failing rates, probably resources)
14:37:49 <jungleboyj> rosmaita: Do we have elasticsearch queries for those?
14:38:02 <rosmaita> i don't think so
14:38:18 <jungleboyj> So, that would probably be good to add.
14:38:30 <rosmaita> someone could be a Hero of Cinder and put them together
14:38:44 <jungleboyj> :-)
14:38:44 <rosmaita> and put the links on the meeting etherpad
14:39:04 * jungleboyj looks at our bug here enriquetaso
14:39:08 <jungleboyj> *hero
14:39:35 <enriquetaso> sure
14:39:46 <rosmaita> maybe a driver maintainer who's waiting for reviews ?
14:39:47 <enriquetaso> not really sure how but sure
14:40:07 <jungleboyj> Add to our weekly bug review an eye on where our gate/check failures are at.
14:40:15 <jungleboyj> rosmaita: That is a good idea too.
14:40:52 <rosmaita> enriquetaso: you can look by hand, but i think the preferred method is to file a bug and check out a repo and push a patch
14:40:56 <rosmaita> makes it more stable
14:41:09 <rosmaita> we can talk offline, i have notes somewhere
14:41:27 <rosmaita> or maybe dansmith can volunteer to walk you through the process?
14:42:05 <rosmaita> ok, so the conclusion is that volume-related gate failures suck and someone needs to do something about it
14:42:08 <dansmith> tbh, I haven't actually done that in a while
14:42:18 <rosmaita> #topic Wallaby R-6 Bug Review
14:42:33 <enriquetaso> hi
14:42:47 <enriquetaso> We have 6 bugs reported this week. I'd like to show 3 today because I'm not sure about the severity of them.
14:42:50 <rosmaita> dansmith: that's fine, just may slow things down a bit, as you can see enriquetaso is already kind of busy
14:43:11 <enriquetaso> #link https://etherpad.opendev.org/p/cinder-wallaby-r6-bug-review
14:43:18 <enriquetaso> bug_1:
14:43:18 <enriquetaso> Backup create failed: RBD volume flatten too long causing mq to timed out.
14:43:19 <jungleboyj> rosmaita: ++ Thank you.
14:43:27 <enriquetaso> #link
14:43:27 <enriquetaso> https://bugs.launchpad.net/cinder/+bug/1916843
14:43:28 <openstack> Launchpad bug 1916843 in Cinder "Backup create failed: RBD volume flatten too long causing mq to timed out." [Medium,New]
14:43:35 <enriquetaso> #link https://bugs.launchpad.net/cinder/+bug/1916843
14:43:40 <enriquetaso> In the process of creating a backup using a snapshot, there is an operation to create a temporary volume from a snapshot, which requires cinder-backup and cinder-volume to perform rpc interaction. The default configuration "rpc_response_timeout" is 60s.
14:44:19 <enriquetaso> whoami-rajat: is this related to any of the flatten work you have done?
14:44:27 <enriquetaso> i can't remember
14:45:03 <whoami-rajat> can't recall anything related, will take a thorough look after the meeting
14:45:25 <enriquetaso> Thanks!
14:45:29 <rosmaita> i think we should ask for more info first
14:45:41 <rosmaita> something about the environment
14:45:47 <eharney> the main point in that bug is that the slow call shouldn't block the service up for RPC calls, AFAIK
14:45:58 <eharney> the fact that it's slow isn't really the interesting part
14:46:42 <rosmaita> yeah, but sounds like we would need a major redesign to do that right
14:46:58 <eharney> i'm not so sure, need to look around in that area again
14:47:01 <dansmith> is it just that the operation takes longer than the RPC timeout?
14:47:18 <dansmith> like, works for small volumes but a large one will always take longer than 60s and the RPC call times out?
14:47:31 <rosmaita> not sure, doesn't say
14:48:00 <eharney> i'm thinking more in the direction of librbd not behaving the way most code does with eventlet
14:48:13 <dansmith> we had a lot of those sorts of things in nova, so I implemented heartbeating long-call support in oslo.messaging a while back which lets you make much longer RPC calls
14:48:14 <eharney> but i don't have enough to make it a useful discussion for now
14:48:35 <dansmith> okay if you're blocked in a C lib call and not answering RPC, that would also be bad
14:48:43 <eharney> right
14:49:29 <enriquetaso> OK looks like there may be something there
14:49:41 <enriquetaso> next
14:49:45 <enriquetaso> bug_2: Cinder sends old db object when delete an attachment
14:49:50 <enriquetaso> #link https://bugs.launchpad.net/cinder/+bug/1916980
14:49:51 <openstack> Launchpad bug 1916980 in Cinder "cinder sends old db object when delete a attachment" [Undecided,Incomplete] - Assigned to wu.chunyang (wuchunyang)
14:50:01 <enriquetaso> When we delete an attachment Cinder sends a notification with the old db object. We should call the refresh() function to refresh the object before sending the notification. After running the refresh function, the status changes to detached.
14:50:01 <enriquetaso> I left a comment on the bug report "Does this problem occur on all backends or on a specific one?"
14:50:20 <geguileo> enriquetaso: I have fixed that one in my local repo
14:50:29 <geguileo> enriquetaso: I will submit the patch later today
14:50:32 <enriquetaso> cool
14:50:34 <enriquetaso> thanks!
14:50:47 <enriquetaso> next
14:50:49 <enriquetaso> bug_3: Scheduling is not even among multiple thin provisioning pools which have different sizes
14:50:57 <enriquetaso> #link https://bugs.launchpad.net/cinder/+bug/1917293
14:50:58 <openstack> Launchpad bug 1917293 in Cinder "Scheduling is not even among multiple thin provisioning pools which have different sizes" [Undecided,New]
14:51:05 <enriquetaso> Scheduling is not even among multiple thin provisioning pools
14:51:05 <enriquetaso> which have different sizes. For example, there are two thin provisioning
14:51:05 <enriquetaso> pools. Pool0 has 10T capacity, Pool1 has 30T capacity, the max_over_subscription_ratio
14:51:05 <enriquetaso> of them are both 20. We assume that the provisioned_capacity_gb of Pool1
14:51:05 <enriquetaso> is 250T and the provisioned_capacity_gb of Pool0 is 0.
14:51:06 <enriquetaso> According to the formula in the cinder source code, the free capacity of Pool0
14:51:10 <enriquetaso> is 10*20-0=200T and the free capacity of Pool1 is 30*20-250=350T.
14:51:12 <enriquetaso> So it is clear for us to see that a newly created volume is
14:51:14 <enriquetaso> scheduled to Pool1 instead of Pool0. However, Pool0 should be
14:51:16 <enriquetaso> scheduled since it has more real free capacity.
14:51:18 <enriquetaso> In a word, the scheduler tends to schedule the pool that has bigger size.
14:51:55 <geguileo> enriquetaso: I haven't looked at the bug, but were they testing on devstack or with a deployment that has 3 schedulers?
14:52:12 <eharney> i'd need to look into this one again, but i'm not sure it's a bug, it may be that the reporter's expectations aren't quite right and he's expecting a scheduling algorithm that's smarter than what we actually try to do
14:52:34 <geguileo> or that the requests are going to different schedulers
14:52:45 <geguileo> and each scheduler has different in-memory data at the time of receiving the request...
14:52:57 <enriquetaso> geguileo, i need to ask because it doesn't say
14:53:48 <geguileo> enriquetaso: anything that happens with multiple schedulers but doesn't with a single one can be attributed to different in-memory data
14:54:30 <enriquetaso> geguileo, good point, so we need more info
14:54:39 <eharney> this will also depend on how you configure the capacity weigher
14:55:27 <enriquetaso> OK
14:55:39 <enriquetaso> that's the last one :)
14:55:50 <rosmaita> thanks sofia
14:55:59 <rosmaita> #topic stable branch update
14:56:07 <whoami-rajat> i will provide a quick update
14:56:16 <whoami-rajat> proposed stable victoria release
14:56:22 <whoami-rajat> #link https://review.opendev.org/c/openstack/releases/+/778231
14:56:31 <whoami-rajat> 3 patches remaining in ussuri (cinder and os-brick)
14:56:42 <whoami-rajat> 4 remaining in train
14:56:54 <whoami-rajat> thanks everyone for the reviews this week
14:57:17 <whoami-rajat> that's all from my side
14:57:31 <rosmaita> thanks Rajat
14:57:44 <rosmaita> #topic conf option workaround
14:57:58 <whoami-rajat> Again me
14:58:17 <whoami-rajat> so hemna asked a question about the default value of the conf option in the rbd clone patch
14:58:31 <whoami-rajat> #link https://review.opendev.org/c/openstack/cinder/+/754397
14:59:10 <eharney> IIRC, this is defaulted off mostly because we don't want to introduce extra risk and complexity, right?
14:59:19 <whoami-rajat> yes
14:59:37 <eharney> that's not really obvious in the patch, but that plus the fact that there are other solutions for people now makes a decent case for defaulting it off i think
14:59:48 <rosmaita> ok, we are out of time ... let's continue this in openstack-cinder, or respond on the patch
14:59:57 <rosmaita> thanks everyone
15:00:02 <rosmaita> #endmeeting
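
Editor's note: the numbers quoted from bug 1917293 above follow the thin-provisioning formula the discussion refers to, virtual free space = total capacity * max_over_subscription_ratio - provisioned_capacity_gb. A minimal sketch of that arithmetic (assuming reserved_percentage is 0 and the default capacity weigher, which prefers the pool with the largest virtual free space; the helper function is illustrative only, not Cinder code):

    def virtual_free_tb(total_tb, provisioned_tb, over_subscription_ratio):
        # virtual free space = raw capacity * over-subscription ratio - already provisioned
        return total_tb * over_subscription_ratio - provisioned_tb

    pool0 = virtual_free_tb(10, 0, 20)    # 200 (TB virtual), even though only 10 TB is real
    pool1 = virtual_free_tb(30, 250, 20)  # 350 (TB virtual)
    # The weigher prefers Pool1 (350 > 200), so the larger pool keeps winning even
    # though Pool0 is empty, which is the behavior the bug reporter is questioning.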