14:00:34 <rosmaita> #startmeeting cinder 14:00:35 <openstack> Meeting started Wed Jun 17 14:00:34 2020 UTC and is due to finish in 60 minutes. The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot. 14:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 14:00:38 <openstack> The meeting name has been set to 'cinder' 14:00:44 <enriquetaso> o/ 14:00:44 <rosmaita> #topic roll call 14:00:48 <whoami-rajat> Hi 14:00:49 <walshh_> Hi 14:00:50 <eharney> hi 14:00:50 <enriquetaso> o/ 14:00:58 <rajinir> hi 14:01:06 <jungleboyj> o/ 14:01:15 <tosky> o/ 14:01:24 <geguileo> hi! o/ 14:01:45 <rosmaita> nice turnout! 14:01:46 <rosmaita> #link https://etherpad.openstack.org/p/cinder-victoria-meetings 14:02:06 <lseki> hi 14:02:20 <rosmaita> #topic announcements 14:02:28 <smcginnis> o/ 14:02:45 <rosmaita> thanks to everyone who filled out the poll to determine a good time for the virtual midcycle 14:02:50 <rosmaita> which is next week! 14:03:00 <rosmaita> the winner is:\ 14:03:11 <rosmaita> Wednesday 24 June 1400-1600 UTC 14:03:26 <rosmaita> so the first hour overlaps with the time of the usual cinder meeting 14:03:38 <rosmaita> so everyone should be able to attend at least the first part 14:03:52 <rosmaita> and we will need topics: 14:04:02 <rosmaita> #link https://etherpad.opendev.org/p/cinder-victoria-mid-cycles 14:04:03 <e0ne_> hi 14:04:42 <rosmaita> since it's the week before spec freeze, anyone working on a spec who has questions or needs some guidance, the virtual midcycle would be a good place to get answers 14:05:02 <rosmaita> anyone working on a spec should check out the virtual PTG etherpad 14:05:19 <rosmaita> each spec was discussed, and there are comments on the etherpad 14:05:34 <rosmaita> #link https://etherpad.opendev.org/p/victoria-ptg-cinder 14:05:50 <rosmaita> and, of course, we can discuss other issues as well 14:05:57 <hemna_> mep 14:06:16 <rosmaita> i know it's coming up fast, but this is a short cycle, and spec freeze is in 2 weeks 14:06:58 <whoami-rajat> rosmaita, we will have only 1 meeting right ? (of 2 hours) 14:07:01 <jungleboyj> Crazy. 14:07:16 <rosmaita> whoami-rajat: well, there will be part 2, but at week R-9 14:07:24 <rosmaita> so yes, just one meeting before spec freeze 14:07:31 <whoami-rajat> rosmaita, yep. ack. 14:07:32 <whoami-rajat> thanks 14:07:47 <rosmaita> #action rosmaita announcement to ML about virtual midcycle 14:08:05 <rosmaita> we'll do bluejeans again as a videoconf technology 14:08:46 <rosmaita> the other topic is that last time i looked, all devstack-based zuul jobs are broken 14:09:00 <rosmaita> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015432.html 14:09:13 <rosmaita> so recheck is futile until that's fixed 14:09:59 <whoami-rajat> they've proposed a patch which is merged in master but when i looked at grenade failure, it shows error on previous code, not sure why 14:10:45 <whoami-rajat> https://review.opendev.org/#/c/577955/27/lib/apache 14:10:55 <tosky> the new uwsgi brought havoc 14:11:16 <rosmaita> but the stable branches don't seem to be doing so well, either 14:11:17 <tosky> whoami-rajat: grenade install the previous branch 14:11:27 <rosmaita> #link https://zuul.opendev.org/t/openstack/builds?job_name=tempest-full# 14:11:34 <smcginnis> I thought at least train was fixed as of yesterday. 14:11:41 <tosky> so they sequence is: make grenade non-voting, make it non-voting also in the backport, fix the issue in the previous branch, reenable it 14:12:01 <tosky> but yeah, until stein things should work, but not rocky 14:12:04 <tosky> and queens 14:12:15 <whoami-rajat> tosky, shouldn't it use the previous uwsgi version then? isn't it constrained for every release? 14:12:29 <rosmaita> ok, so we think stein through master should work now? 14:13:10 <whoami-rajat> i still see failure in my master patch i updated yesterday https://review.opendev.org/#/c/735513/ 14:14:35 <tosky> uhm, but that was yesterday 14:14:58 <tosky> whoami-rajat: this patch has recent results https://review.opendev.org/#/c/712832/ 14:15:00 <rosmaita> hopefully ajaeger will send an update to the ML when everything is back to normal 14:15:29 <rosmaita> #topic python2.7 compatibility issue 14:15:36 <whoami-rajat> oh, thanks tosky 14:16:21 <rosmaita> ok, so we've been talking about backports being more difficult now that master and ussuri are python3-only, and most of the stable branches are not 14:16:41 <rosmaita> so, here's a case in point so we can see what we're up against 14:16:52 <rosmaita> #link https://review.opendev.org/#/c/733100/ 14:17:10 <rosmaita> open that link and take a look at the zuul results 14:17:49 <rosmaita> also, since this is a driver change, there was third party ci run on it manually for stable/train 14:18:04 <rosmaita> but, take alook at this: 14:18:17 <rosmaita> #link https://review.opendev.org/#/c/733100/1/os_brick/privileged/scaleio.py 14:18:36 <rosmaita> look at line 99 and think for a minute before opening the next link 14:18:53 <rosmaita> #link https://launchpad.net/bugs/1883654 14:18:53 <openstack> Launchpad bug 1883654 in os-brick train "os-brick fail to get password from config file" [Undecided,In progress] - Assigned to hamza (alqtaishat) 14:19:27 <rosmaita> so, you may remember hearing me say that we need to have good tests that will break on python3 only features 14:19:41 <hemna_> :( 14:19:42 <rosmaita> and you may also notice that i +2'd the above patch 14:20:02 <rosmaita> anyway, here's a test: 14:20:05 <rosmaita> #link https://review.opendev.org/#/c/735989/6/os_brick/tests/privileged/test_scaleio.py 14:20:27 <rosmaita> which is a 68 line test file to test a function that contains 4 statements 14:20:41 <rosmaita> and i'm not even saying that's a great test 14:20:56 <rosmaita> for instance, there's a pretty good chance that it doesn't work on windows 14:21:34 <rosmaita> but, the point is that we need to up our test game 14:21:40 <rosmaita> or at least i do :D 14:21:48 <hemna_> fwiw, there is no scaleio connector for windows 14:22:02 <rosmaita> cool 14:22:21 <rosmaita> but i guess this unit test would run anyway 14:22:37 <whoami-rajat> i think i left a comment on master version of this patch regarding testing that method 14:22:37 <rosmaita> the problem is using the tempfile 14:22:51 <rosmaita> i believe you did 14:23:09 <rosmaita> so bonus points for whoami-rajat 14:23:28 <whoami-rajat> :D 14:23:48 <whoami-rajat> we were just testing if the method was called but not going inside it 14:24:08 <whoami-rajat> i'm not sure if it would've helped with this situation 14:24:24 <enriquetaso> whoami-rajat++ 14:24:41 <rosmaita> yeah, so the code wasn't completely un-tested 14:24:47 <rosmaita> just not tested thoroughly enough 14:25:11 <rosmaita> my main point is that reviewers should feel empowered to demand more tests 14:25:28 <rosmaita> and reviewees need to be patient when more tests are demanded 14:26:04 <whoami-rajat> rosmaita++ 14:26:08 <rosmaita> any comments or observations before we move on? 14:26:40 <rosmaita> i guess i have one more -- anyone guess why the third-party CI passed? 14:26:49 <enriquetaso> I agree 14:26:56 <rosmaita> because we made everyone move to Python 3!!! 14:27:24 <eharney> :) 14:27:26 <whoami-rajat> rosmaita, on stable branches also the CI running is py3? 14:27:36 <whoami-rajat> i guess it's the same CI 14:27:37 <rosmaita> so even if third-party CI is testing stable branches, we can't count on that to detect problems 14:28:23 <rosmaita> ok, so in the next topic we can discuss why this is a particularly painful bug 14:28:36 <rosmaita> #topic updating OSSN-0086 14:28:54 <rosmaita> ok, so the patch that introduced that bug is one of the patches addressing a security issue 14:29:06 <rosmaita> the security issue had to be fixed in both cinder and os-brick 14:29:17 <rosmaita> luckily, the cinder side of the fix doesn't contain this problem 14:29:28 <rosmaita> (and hopefully doesn't contain a different one) 14:29:58 <rosmaita> so, ussuri and master are python3 only, so we don't need to worry about those 14:30:23 <rosmaita> but we do have T, S, R, and Q (plus some joker has proposed a backport from Q -> P) 14:30:39 <rosmaita> at this point, ocata is dead to me, and pike is getting there 14:30:52 <rosmaita> but we can discuss pike later 14:31:04 <rosmaita> in the ossn, we promised to fix back to queens 14:31:14 <rosmaita> so that's all i'm worried about now 14:31:36 <rosmaita> here's the train patch, which still needs some work: 14:31:41 <rosmaita> #link https://review.opendev.org/#/c/735989/ 14:32:07 <rosmaita> in particular, smcginnis and jungleboyj please look at the release note on that patch 14:32:30 <rosmaita> i tested locally, it will be a clean backport from T to S 14:32:52 <rosmaita> so once we get train fixed, that will be big progress 14:33:05 <jungleboyj> Ok. 14:33:21 <rosmaita> so, the OSSN lists the patches. i will have to update it to include the new patch 14:33:49 <rosmaita> we just released cinder and os-brick from train, i think we will have to do new releases? 14:34:12 <rosmaita> so what we're looking at is: merge when gate is fixed, release new os-brick, wait for u-c change to merge, update cinder requirements, release new cinder 14:34:29 <smcginnis> ++ 14:34:44 <rosmaita> ok, thanks, just wanted to make sure i wasn't being overly dramatic 14:34:45 <jungleboyj> ++ 14:35:30 <rosmaita> i'll include a similar release note on the new cinder release, just saying you only need to upgrade if you are using that particular backend and using python 2.7 14:35:59 <rosmaita> ok, so now when we get to Rocky and Queens, we don't release any more becuase they are in Extended Maintenance mode 14:36:52 <rosmaita> the commit message for the cinder patch refers to the os-brick patch (which is now broken) 14:37:06 <rosmaita> but it also mentions the OSSN, which i will update with the second brick patch 14:37:39 <rosmaita> and i think if the commit message on the second patch says "Fix regression", anyone grabbing the first patch will notice the second in the git log? 14:38:15 <rosmaita> plus, i can send out an embarassing email to the ML giving operators a heads up 14:38:27 <rosmaita> i guess that will take care of Rocky and Queens? 14:38:44 <rosmaita> anyway, if anyone has a better suggestion, let me know 14:38:53 <rosmaita> i am just thinking out loud here 14:39:30 <rosmaita> so basically, what i really need is for stable cores to look for these patches 14:39:39 <rosmaita> i will liberally ping people in IRC 14:39:51 <jungleboyj> ++ 14:39:58 <rosmaita> #topic expected behavior when the default volume type config option specifies a non-existent volume_type 14:40:48 <rosmaita> ok, the context is the work in Train to make sure that every volume has a volume_type 14:41:00 <rosmaita> because we used to allow no type, and that caused problems 14:41:27 <rosmaita> so now we have __DEFAULT__ if an operator doesn't set a default type in the cinder conf 14:42:00 <rosmaita> and we have a migration (i think in ussuri?) that makes any untyped volume __DEFAULT__ 14:42:05 <rosmaita> (is that correct whoami-rajat ?) 14:42:28 <whoami-rajat> rosmaita, yes 14:42:31 <rosmaita> cool 14:42:46 <whoami-rajat> basically it modifies the volumes,snapshots,encryption tables but yes 14:43:22 <jungleboyj> Sounds familiar. :-) 14:43:29 <rosmaita> ok, so there is a slight regression at the REST API layer (nothing to do with the database) 14:43:58 <rosmaita> and in fixing that, i am having an argument with whoami-rajat about expected behavior 14:44:28 <rosmaita> so the issue is that before train 14:44:40 <rosmaita> if the operator mis-configured the default volume type 14:44:44 <rosmaita> with a non-existent type 14:45:02 <rosmaita> we would log a message, and let the build succeed with no volume type 14:45:13 <hemna_> the build ? 14:45:22 <rosmaita> building the volume 14:46:05 <rosmaita> the code in Train changes this so that instead of continuing the volume creation 14:46:14 <rosmaita> an exception is raised, and no volume 14:46:37 <enriquetaso> oh 14:46:38 <rosmaita> i was thinking that the analog to the old behavior would be to allow the create to succeed, but with __DEFAULT__ as the volume type 14:46:46 <eharney> failing the create is the right thing to do 14:46:58 <whoami-rajat> #link https://review.opendev.org/#/c/639180/41/cinder/volume/volume_types.py 14:47:01 <whoami-rajat> L#186 14:47:08 <hemna_> I think failing is the right thing to do 14:47:11 <rosmaita> well, except you get a 202 and only find out the volume create failed later 14:47:12 <hemna_> it's a misconfigured cinder 14:47:43 <eharney> we want to fail the create because it also enables the case where an admin wants to make users select a volume type instead of defaulting to one 14:48:14 <rosmaita> ok, what about this case 14:48:16 <hemna_> so the intention is to set an invalid default on purpose ? 14:48:29 <rosmaita> no, but it's logged for the operator 14:48:46 <rosmaita> all i'm saying is that we have __DEFAULT__, why not use it? 14:48:55 <hemna_> in eharney's case an invalid default is specified with no intention of fixing it ? 14:48:57 <jungleboyj> The problem with failing is that it is an API change. Right? 14:49:05 <eharney> do we check for an invalid volume type in the config at service start up? 14:49:13 <rosmaita> not sure 14:49:18 <rosmaita> the other place this will occur 14:49:18 <whoami-rajat> i honestly don't remember exactly but this made sense as if we configure a wrong default type (the right one points to a specific backend) then the volumes will end up in a totally unexpected backend (with the __DEFAULT__ type) 14:49:29 <hemna_> failing should happen already for a volume type passed in that's invalid 14:49:53 <jungleboyj> At one point in history we were checking bad defaults at start-up. 14:49:54 <smcginnis> eharney: We probably should have a check on start up. 14:50:04 <hemna_> smcginnis +1 14:50:15 <rosmaita> ok, we can follow up with that 14:50:25 <e0ne_> smcginnis: +1. it's a good suggestion 14:50:27 <jungleboyj> ++ 14:50:39 <rosmaita> the other case is if a user creates a volume from an image 14:50:43 <hemna_> the only downside to that is that they have to start cinder w/o a default set, then create the volume type, then change the default in cinder.conf, then bounce cinder 14:50:52 <rosmaita> you can put metadata on the image saying what volume type you want 14:51:11 <hemna_> vs w/o the start check, they can specify the missing type, then start cinder and create it. 14:51:13 <rosmaita> and since that's done by end-users, they are bound to make mistakes 14:51:47 <rosmaita> do we want to fail the build, or give them the default? 14:52:06 <rosmaita> they used to get a successful build with no volume type 14:52:08 <hemna_> create should fail with invalid volume type 14:52:17 <hemna_> as it does today if you pass in an invalid type 14:52:35 <eharney> yeah, i don't think we want to quietly use the default in the image case either 14:52:50 <rosmaita> ok 14:53:17 <hemna_> we could do a check at startup and throw a warning 14:53:22 <smcginnis> I agree. We either need to fail on startup, or we should fail the volume creation. 14:53:37 <rosmaita> well, for the image case, we can't check at startup 14:53:54 <jungleboyj> smcginnis: ++ 14:54:08 <jungleboyj> Ok, then it sounds like we need to fail at creation. 14:54:19 <rosmaita> ok, so the conclusion is: if a volume-type is mis-specified, we do not give you a default, we fail the create 14:54:28 <eharney> yes 14:54:29 <rosmaita> which actually makes sense 14:54:31 <jungleboyj> Yes. 14:54:36 <rosmaita> it's just different from pre-Train 14:55:05 <whoami-rajat> but we already do it so we're fine :) 14:55:27 <rosmaita> well, apparently we don't do it too much, or someone would have caught the bug earlier :) 14:55:44 <eharney> for more justification: defaulting to a different type would be quite bad for someone who thinks they're using encrypted volumes but accidentally isn't 14:56:12 <rosmaita> yeah, silently doing something that sort of works is never a good strategy 14:56:19 <whoami-rajat> rosmaita, yep, that too! 14:56:38 <rosmaita> i'll revise my patch to preserve the train behavior, then 14:56:54 <rosmaita> ok, thanks for the discussion 14:57:02 <rosmaita> #topic open discussion 14:57:12 <rosmaita> LiangFang is having connection problems 14:57:28 <rosmaita> but is asking people to please look at his patches for volume-local-cache 14:57:47 <rosmaita> the brick patch is using six, which i think we don't want any more 14:58:08 <rosmaita> but i forgot to actually leave that comment on the patch 14:58:14 <jungleboyj> ++ 14:58:18 <hemna_> ok I'll take a look 14:58:39 <rosmaita> hemna_ had some questions or observations in the cinder channel just before the meeting started 14:58:49 <hemna_> rosmaita thanks 14:58:53 <enriquetaso> Hi, same request here for the NFS encryption patch https://review.opendev.org/#/c/597148/ 14:59:18 <rosmaita> i had really good intentions to review both, but this unending security bug ate my time 14:59:28 <hemna_> so I was wondering why the volume's existing AZ isn't specified in the request spec during retype by default? 14:59:53 <rosmaita> we will have to continue this in the cinder channel, horizon meets in 20 seconds 14:59:55 <hemna_> this ends up resulting in the scheduler picking a backend existing in a different AZ, which fails 14:59:59 <hemna_> heh ok 15:00:09 <rosmaita> sorry about that 15:00:12 <rosmaita> thanks everyone 15:00:15 <rosmaita> #endmeeting