14:00:35 <rosmaita> #startmeeting cinder
14:00:36 <openstack> Meeting started Wed Jul 8 14:00:35 2020 UTC and is due to finish in 60 minutes. The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:39 <openstack> The meeting name has been set to 'cinder'
14:00:47 <jungleboyj> o/
14:00:55 <rosmaita> #topic roll call
14:00:55 <e0ne> hi
14:00:57 <whoami-rajat> Hi
14:01:00 <geguileo> hi! o/
14:01:27 <eharney> hi
14:01:29 <tosky> hi
14:02:10 <rosmaita> ok, looks like we have some people
14:02:13 <rosmaita> hello everyone
14:02:31 <rosmaita> #link https://etherpad.openstack.org/p/cinder-victoria-meetings
14:03:05 <rosmaita> i'm at a coffee shop due to a power outage
14:03:15 <smcginnis> o/
14:03:19 <rosmaita> so not using my usual keyboard, as you will notice
14:03:36 <rosmaita> ok, let's get started
14:03:45 <rosmaita> #topic updates
14:03:50 <jungleboyj> rosmaita, You can go to coffee shops?
14:03:51 <jungleboyj> :-)
14:04:08 <rosmaita> i am sitting outside, 15 feet from anyone else
14:04:15 <jungleboyj> ++
14:04:16 <rosmaita> inside is closed, you can only get coffee and leave
14:04:22 <rosmaita> but the wifi is working!
14:04:32 <tosky> what else is needed then
14:04:39 <rosmaita> a better keyboard!
14:05:08 <rosmaita> i the function and control keys are mashed together and i am having cutting & pasting problems
14:05:12 <rosmaita> but enough about that
14:05:24 <rosmaita> ok, the video meeting poll closes tomorrow
14:05:38 <rosmaita> #link https://rosmaita.wufoo.com/forms/monthly-video-meeting-proposal/
14:05:57 <rosmaita> it even has an option for "don't care", so even if you don't care, you can still fill it out
14:06:11 <rosmaita> this week is R-minus-14
14:06:17 <rosmaita> milestone 2 is at R-11
14:06:17 <enriquetaso> o/
14:06:24 <rosmaita> hello sofia
14:06:26 <rosmaita> that is, really soon
14:06:33 <rosmaita> it is also the new driver merge deadline
14:06:53 <rosmaita> i think we have 2 new drivers proposed?
14:06:58 <rosmaita> hitachi is mostly together
14:07:03 <rosmaita> thanks to lseki and smcginnis for reviewing that closely
14:07:09 <rosmaita> and i think dell/emc is proposing a new driver?
14:07:41 <rosmaita> i don't think i've seen any patches, just the launchpad blueprint so far
14:07:52 <rosmaita> and a special note for geguileo
14:07:59 <rosmaita> ussuri cinderlib must be released by R-9
14:08:00 <smcginnis> rosmaita: I don't think we will get that Dell one for Victoria.
14:08:17 <LiangFang> o/
14:08:19 <rosmaita> smcginnis: ok
14:08:24 <geguileo> rosmaita: as soon as we review the patches that are in gerrit (with the exception of the one with the -W) we can release
14:08:43 <rosmaita> "we" meaning "me", at least partially ... OK, will do
14:09:17 <rosmaita> ok, that's all the announcements
14:09:40 <rosmaita> i thought for a minute i deleted lseki's topic by mistake
14:09:53 <rosmaita> but i see that he has moved it lower due to connection problems
14:10:11 <rosmaita> #topic Moving stable/ocata and stable/pike to quick EOL
14:10:20 <sfernand> Lucio is having some issues to join the meeting, he is asking if we could post pone that
14:10:23 <sfernand> ahh ok
14:10:33 <TusharTgite> hi
14:10:48 <rosmaita> ok, so you may have seen on the ML that nova is proposing to put pike and ocata into 'unmaintained'
14:10:59 <rosmaita> hang on while i paste links
14:11:10 <rosmaita> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html
14:11:20 <rosmaita> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html
14:11:46 <rosmaita> you may remember that there was a proposal to do this for ocata before the PTG
14:12:13 <rosmaita> and smcginnis pointed out in that thread that if one of the major projects EOLs a branch, we pretty much all have to do it
14:12:15 <rosmaita> anyway
14:12:24 <rosmaita> i looked at our cinder ocata and pike branches
14:12:36 <rosmaita> and they haven't been committed to in over 6 months
14:13:07 <rosmaita> i mention that because lyarwood was proposing to back-date the nova 'unmaintained' phase to the last commit, which would mean a 3 month head start
14:13:18 <rosmaita> i am not being clear
14:13:35 <rosmaita> the issue is that a branch is supposed to be 'unmaintained' for 6 months, and then can go EOL
14:13:53 <rosmaita> so, if it's ok for nova to back-date the 'unmaintained' period, i think we can too
14:14:03 <rosmaita> just so happens that our back-dating can be 6 months
14:14:15 <rosmaita> so my proposal is to put out a notice on the ML
14:14:43 <rosmaita> that we are putting cinder pike and ocata into 'unmaintained' for 2 weeks, and if no one adopts them, we will EOL them
14:14:52 <smcginnis> ++
14:15:02 <rosmaita> that's what i was waiting for!
14:15:06 <rosmaita> thanks smcginnis
14:15:11 <smcginnis> ;)
14:15:33 <rosmaita> ok, so i will do that this afternoon ... 2 weeks from today is 22 July
14:15:43 <rosmaita> (just to have that on the record)
14:15:58 <LiangFang> ++
14:16:07 <tosky> removing them will simplify a lot the job handling; most "modern" jobs starts from pike, if not rocky
14:16:37 <rosmaita> yeah, ocata has been dead to me for a month now
14:16:42 <rosmaita> and pike is not much better
14:16:54 <rosmaita> hooray for modernization
14:17:14 <rosmaita> that's all, if anyone has second thoughts, we may have some open discussion later, and there is always the ML
14:17:31 <rosmaita> #topic rethink the visibility of __DEFAULT__ type
14:17:36 <rosmaita> whoami-rajat: that's you
14:17:42 <whoami-rajat> rosmaita, thanks!
14:18:16 <rosmaita> #link https://bugs.launchpad.net/cinder/+bug/1886632
14:18:16 <openstack> Launchpad bug 1886632 in Cinder "Cannot delete __DEFAULT__ volume type" [Undecided,New] - Assigned to Rajat Dhasmana (whoami-rajat)
14:18:17 <whoami-rajat> So we've had a recent bug in which the author states that their users are being confused by the __DEFAULT__ name
14:18:19 * lseki sneaks in
14:18:33 <rosmaita> i was skeptical at first, but the last comment on the bug is very revealing
14:18:34 <whoami-rajat> s/name/type
14:19:27 <whoami-rajat> they say they don't want their users to see the __DEFAULT__ type since they've already configured CONF.default_volume_type
14:19:47 <eharney> they don't want to see it when listing types, that is?
14:20:12 <rosmaita> eharney: yes, but maybe even stronger than that
14:20:25 <rosmaita> i think the way to go here is to not display __DEFAULT__ in the GET /types response if there is a default-type configured in cinder.conf
14:20:35 <jungleboyj> *Sigh*
14:20:38 <whoami-rajat> eharney, yes, they say the users gets confused if they should use this one or the other their admin has configured as default
14:20:44 <rosmaita> for type-show, you need to know the UUID of the type, is that right?
14:21:00 <whoami-rajat> rosmaita, id or name
14:21:17 <rosmaita> we take the name in the path?
14:21:28 <whoami-rajat> names are unique for volume types
14:21:28 <jungleboyj> The concern does make sense.
14:21:49 <whoami-rajat> I'm not really sure if this is a problem for a large mass of just this particular case
14:22:08 <rosmaita> i think we will see it more and more
14:22:28 <rosmaita> the problem i see, is that __DEFAULT__ shows up in the api ref
14:22:40 <geguileo> rosmaita: hiding the __DEFAULT__ vol type if we have a default in .conf could lead to a deployment having some volumes with __DEFAULT__ type but not getting it listed
14:22:41 <rosmaita> and if you can do GET /types/__DEFAULT__
14:22:47 <geguileo> if they changed it after creating some volumes
14:22:50 <whoami-rajat> rosmaita, but we allow the __DEFAULT__ to be configurable, that's why it is visible
14:23:07 <rosmaita> yes, but if it is not used at all. what does that matter?
14:23:22 <whoami-rajat> rosmaita, also if a volume gets created with the __DEFAULT__ type, it would confuse users more that their volume is using a type which isn't visible
14:23:24 <geguileo> we could add a config option to hide it?
14:23:37 <rosmaita> no
14:23:59 <rosmaita> you just said they can do GET on the __DEFAULT__, so they can still see it
14:24:25 <rosmaita> i mean, at the time you do a GET /types call, if the operator has one configured, that is what you will get
14:24:34 <rosmaita> so we don't need to display the __DEFAULT__ in that case
14:24:43 <rosmaita> and if the operator removes the config
14:24:46 <rosmaita> then we will
14:24:51 <rosmaita> which makes sense
14:24:53 <geguileo> but if someone created a volume and it used __DEFAULT__
14:24:58 <geguileo> then the .conf was changed
14:25:06 <geguileo> listing types would not return it
14:25:15 <geguileo> and it would be weird not to have the type that some volume has
14:25:35 <geguileo> when listing, I mean
14:25:40 <geguileo> (the type would be there)
14:25:52 <rosmaita> i think it depends on what the types list is supposed to display
14:26:01 <rosmaita> i think the types that are currently available to you
14:26:11 <geguileo> and __DEFAULT__ is available
14:26:13 <eharney> can't a user still manually create a volume w/ type __DEFAULT__ even if we don't list it?
14:26:24 <geguileo> eharney: yup
14:26:33 <eharney> so i'm not sure it's just about visibility in the list
14:26:36 <rosmaita> that seems like a bug
14:26:48 <rosmaita> i mean, __DEFAULT__ is supposed to be for lazy operators
14:26:50 <geguileo> I don't see that as a bug...
14:27:03 <eharney> i think it probably is a bug
14:27:11 <rosmaita> sure, the operator has configured a default type, that's what the default should be
14:27:32 <rosmaita> so, looks like a can of worms has been opened
14:27:53 <eharney> presumably if the operator made a default volume type, they don't want __DEFAULT__ to be used
14:28:17 <rosmaita> yes, that's exactly this bug-filer's issue
14:28:49 <rosmaita> whoami-rajat: i forget, what are the restrictions on modifying __DEFAULT__ type?
14:29:09 <rosmaita> i mean the actual system default
14:29:35 <whoami-rajat> rosmaita, their issue is they don't want their users to see it, they don't use it but it doesn't cause them any problem other than confusion
14:29:41 <whoami-rajat> rosmaita, we can update it, but can't delete it
14:30:04 <rosmaita> so they could update __DEFAULT__ to have exactly the same properties as their preferred default?
14:30:04 <whoami-rajat> what i suggested was, i will document this clearly
14:30:21 <whoami-rajat> rosmaita, yes they can
14:30:54 <rosmaita> but they can't do it while there are any volumes of __DEFAULT__, right?
14:31:21 <whoami-rajat> rosmaita, yep, it shouldn't be in use by any volume
14:31:39 <rosmaita> well, except as eharney pointed out, a user could explicitly ask for it
14:31:58 <rosmaita> given that it's all over the api-ref responses
14:32:42 <whoami-rajat> I've no issues in improving the documentation but what they're suggesting is to remove it which will again allow creating of untyped volumes which i don't prefer
14:33:19 <whoami-rajat> and we also discussed the visibility scenario, that doesn't seem to work either
14:33:31 <rosmaita> i think we have 2 bugs:
14:33:55 <rosmaita> 1) if an operator has configured a default type, users should not be able to create a volume of __DEFAULT__ type
14:34:27 <rosmaita> 2) if an operator has configured a default type, the __DEFAULT__ should not be displayed in the GET /types response (this one is controversial right now)
14:34:57 <rosmaita> i think this is a real problem, because even though it's kind of silly, customer calls are a PITA
14:35:01 <smcginnis> Since __DEFAULT__ was created because we can't handle things right in our code because too many places expected to have a type, I think it should be hidden from end users.
14:35:11 <rosmaita> smcginnis: ++
14:35:35 <whoami-rajat> the configured one already has a priority over the __DEFAULT__ type
14:35:52 <rosmaita> yes, but there' s no way for end users to know that
14:36:12 <geguileo> and the problem is that horizon would present __DEFAULT__
14:36:29 <rosmaita> yeah, and DEFAULT looks more important that default
14:36:37 <smcginnis> Yep.
14:37:13 <jungleboyj> Yeah. I do think that the complaint is relevant.
14:37:28 <geguileo> yeah, it's a reasonable complaint
14:37:46 <rosmaita> ok, let's think about this some more and revisit next week
14:38:02 <whoami-rajat> thanks everyone for their feedback
14:38:09 <jungleboyj> rosmaita, ++
14:38:13 <geguileo> I think we can hide the __DEFAULT__ type from the list if there are no volumes that use them and cinder.conf has a different default
14:38:24 <geguileo> s/them/it
14:38:47 <rosmaita> geguileo: problem is, any deployment since train will definitely have them
14:38:58 <geguileo> rosmaita: not necessarily
14:39:01 <rosmaita> yes
14:39:06 <rosmaita> there was a regression
14:39:10 <geguileo> rosmaita: they could have a default already defined
14:39:14 <geguileo> in the conf
14:39:16 <whoami-rajat> geguileo, but if they comment out the default part in cinder.conf, we should show it ?
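The listing rule geguileo floats at 14:38:13 (hide __DEFAULT__ only when cinder.conf names a different default and no volume uses __DEFAULT__) might be sketched as below. This is a minimal illustration, not Cinder code: the function name `visible_type_names`, its parameters, and the example type name "gold" are all hypothetical, and the meeting did not settle on this rule.

```python
from typing import Iterable, List, Optional

SYSTEM_DEFAULT = "__DEFAULT__"


def visible_type_names(
    all_types: Iterable[str],
    configured_default: Optional[str],
    types_in_use: Iterable[str],
) -> List[str]:
    """Return the type names a hypothetical GET /types listing would show.

    configured_default stands in for the default type set in cinder.conf
    (None when the operator has not set one). types_in_use is the set of
    type names referenced by existing volumes. Both are assumptions made
    for this sketch.
    """
    in_use = set(types_in_use)
    hide_system_default = (
        configured_default is not None
        and configured_default != SYSTEM_DEFAULT
        and SYSTEM_DEFAULT not in in_use
    )
    return [
        name
        for name in all_types
        if not (hide_system_default and name == SYSTEM_DEFAULT)
    ]
```

With this rule, removing the configured default or creating a volume of type __DEFAULT__ makes the type reappear in the listing, which addresses the earlier objection that a volume could carry a type that is never listed.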
14:39:20 <geguileo> and the __DEFAULT__ would not be used
14:39:33 <geguileo> whoami-rajat: that's what I would do
14:39:44 <geguileo> that, or having a config option
14:40:19 <rosmaita> i don't like the config option
14:40:36 <rosmaita> but we can discuss next week, let's move on
14:40:39 <geguileo> rosmaita: but it's the cleanest way, since we pass the responsibility to the admin
14:40:43 <geguileo> rosmaita: ok
14:41:17 <rosmaita> #topic CI issues
14:41:25 <rosmaita> tosky: hopefully this is quick
14:41:32 <tosky> I can just copy the content of the etherpad here
14:41:38 <tosky> or do a summary:
14:42:10 <tosky> - you can see many failures on cinder-tempest-plugin-lvm-lio-barbican fails, especially one test, I don't know why
14:42:37 <tosky> - https://review.opendev.org/#/c/733161/ should temporarily unblock cinder-tempest-plugins gate broken by the ceph updates (but we need to fix them)
14:43:01 <tosky> - please merge https://review.opendev.org/#/c/738978/ and its future train backport to make the ceph job pass again
14:43:23 <tosky> - devstack-plugin-nfs-tempest-full is superbroken for unknown reasons (see https://review.opendev.org/#/c/735959/)
14:43:30 <tosky> that's it - suggestions and help more than welcome
14:43:46 <eharney> the lio-barbican job has been a little flaky for a while, and i occasionally look at it, but the failures are never very actionable/interesting to me
14:44:03 <eharney> (that is, it probably needs a more thorough look)
14:44:30 <rosmaita> superbroken is even worse than usualy
14:44:41 <tosky> I suspect resource issues, the tests which fails most for lio-barbican is tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
14:44:55 <tosky> and it usually fails to connect to the spawned instance
14:44:59 <rosmaita> tosky: i think you are onto something there
14:45:01 <eharney> right
14:45:20 <eharney> are we still chasing any of these things with elastic recheck?
14:45:39 <rosmaita> i personally am not
14:46:03 <tosky> I admit not being too much into that; I was told no need to add recheck <foo> because it should be caught by elasticsearch (maybe after adding some rules)
14:47:02 <smcginnis> I think since Riedeman left, we lost our last elastic recheck champion. :)
14:47:12 <smcginnis> I think we should use it though. It does help.
14:47:24 <rosmaita> ok, let's address that next week too
14:47:31 <rosmaita> thanks, tosky
14:47:50 <rosmaita> #topic Fix for Fail to extend attached volume using generic NFS driver
14:47:58 <rosmaita> lseki: that's you
14:48:03 <lseki> hi
14:48:15 <lseki> I think kaisers is ooo but he can read the logs later
14:48:17 <rosmaita> hopefully your connection will hold for the next 10 min
14:48:32 <lseki> hopefully
14:48:47 <lseki> I talked to openstack-nova folks
14:49:15 <lseki> about https://bugs.launchpad.net/cinder/+bug/1870367
14:49:15 <openstack> Launchpad bug 1870367 in Cinder "Fail to extend attached volume using generic NFS driver" [High,In progress] - Assigned to Lucio Seki (lseki)
14:49:27 <rosmaita> i like the idea of nova doing everything
14:49:39 <lseki> in short, generic nfs driver is failing because it's trying to do an unnecessary `qemu-img resize` operation
14:50:01 <enriquetaso> :o
14:50:11 <lseki> so the fix is to avoid generic nfs driver from doing that
14:50:17 <lseki> and let nova do everything needed
14:50:46 <lseki> I submitted 3 draft patches for nova, cinder, and devstack
14:51:08 <lseki> nova patch to implement a trivial method called upon extend_volume
14:51:30 <lseki> cinder patch to make nfs driver skip the qemu-img resize when volume is attached
14:51:46 <lseki> devstack patch to enable the online extend test for generic nfs driver
14:52:19 <lseki> reviews are welcome!
14:52:36 <eharney> does the volume manager submit a nova event etc for extend after the driver's extend_volume call?
14:53:05 <lseki> soon, I'll submit a similar patch for ONTAP NFS driver; it works on my machine
14:53:23 <geguileo> eharney: we do
14:53:40 <eharney> i suspect this means the extend method may need a lock against create_snapshot and other snapshot calls in the nfs driver
14:53:55 <eharney> this also needs to be tested thoroughly with encrypted volumes
14:54:07 <eharney> but many thanks for working on this
14:54:39 <lseki> :-)
14:54:42 <rosmaita> lseki: looks like your request for corner cases has been satisfied
14:54:59 <rosmaita> lseki: thanks for the comprehensive report
14:55:02 <eharney> to be more clear: performing resize and snapshot operations concurrently may break with your current patch, but i haven't looked too closely
14:55:20 <lseki> kaisers may do something similar to quobyte nfs driver, putting a depends-on to nova patch
14:56:07 <lseki> eharney: hmm we should check that
14:56:34 <rosmaita> four minutes left ...
14:56:38 <lseki> I have another concern: what if nova fails to extend the volume for some reason?
14:57:11 <lseki> cinder will update the DB with the new size, but the actual volume file will remain with the original size
14:57:15 <eharney> hmm
14:57:19 <geguileo> eharney: if the driver needs a lock to prevent snapshots while Nova does the resize we have a problem
14:57:35 <eharney> geguileo: how so?
14:57:38 <geguileo> because the call is async
14:57:45 <smcginnis> We just send an event.
14:57:52 <smcginnis> We don't ever even know if it happens.
14:57:54 <geguileo> exactly :-(
14:58:02 <geguileo> which brings us to lseki's concern
14:58:05 <smcginnis> "Hey nova, if you're listening, you can extend this volume if you feel like it."
14:58:09 <geguileo> what if it fails
14:58:23 <geguileo> so we need to find a way to make it synchronous
14:58:45 <eharney> i suspect there's an issue if you extend the root file while halfway through a create_snapshot operation which is shuffling files around
14:58:59 <geguileo> or implement a similar external events mechanism like Nova so they can let us know the result
14:59:19 <smcginnis> For other drivers it is not an issue since they extend the volume first, then send an event.
14:59:35 <smcginnis> It could be to nova, or it could be to someone else using Cinder for volume services.
14:59:45 <smcginnis> We definitely should not have a hard dependency on a nova API.
14:59:59 <rosmaita> ok, looks like this needs some more thought
15:00:04 <rosmaita> and we are out of time
15:00:08 <rosmaita> #endmeeting
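
The concern lseki raises at 14:56:38 (the recorded size changes while the file does not) and the alternative geguileo suggests (learn the result before committing) can be illustrated with a toy model. Everything below is hypothetical and stands in for neither Cinder nor Nova code: `Volume`, `Resizer`, and both `extend_*` functions are invented for this sketch, and a plain callable plays the role of whoever consumes the extend event.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Volume:
    db_size_gb: int    # size recorded in the database
    file_size_gb: int  # actual size of the backing file


# A Resizer stands in for the consumer of the extend event (for example a
# compute service). It returns True if it resized the file. Hypothetical.
Resizer = Callable[[Volume, int], bool]


def extend_fire_and_forget(vol: Volume, new_size_gb: int, resizer: Resizer) -> None:
    """Record the new size, then notify the consumer and ignore the outcome."""
    vol.db_size_gb = new_size_gb
    resizer(vol, new_size_gb)  # the result is never inspected


def extend_with_result(vol: Volume, new_size_gb: int, resizer: Resizer) -> bool:
    """Record the new size only after the consumer reports success."""
    if resizer(vol, new_size_gb):
        vol.db_size_gb = new_size_gb
        return True
    return False


def failing_resizer(vol: Volume, new_size_gb: int) -> bool:
    """Simulates a consumer that cannot perform the resize."""
    return False


def working_resizer(vol: Volume, new_size_gb: int) -> bool:
    """Simulates a consumer that grows the backing file."""
    vol.file_size_gb = new_size_gb
    return True
```

Calling `extend_fire_and_forget(Volume(1, 1), 2, failing_resizer)` leaves the recorded size at 2 and the file at 1, which is the mismatch described in the log. `extend_with_result` avoids it in this toy model, but it assumes the caller can wait for or be told the outcome, which is exactly the open question about async events and the objection to a hard dependency on a nova API.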