16:00:01 #startmeeting Cinder
16:00:03 Meeting started Wed Nov 18 16:00:01 2015 UTC and is due to finish in 60 minutes. The chair is smcginnis. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:07 The meeting name has been set to 'cinder'
16:00:16 Hi!
16:00:18 hi
16:00:19 hi
16:00:20 Hi
16:00:21 hey
16:00:22 Hello :)
16:00:27 Hey everyone.
16:00:33 hi
16:00:39 hi
16:00:42 hi
16:01:05 #topic Bug Status
16:01:15 #info 482 Cinder bugs
16:01:23 #info 54 python-cinderclient bugs
16:01:36 #info 13 os-brick bugs
16:02:00 o/
16:02:02 Just a summary of bug counts to make sure there is awareness.
16:02:07 hi
16:02:28 Any kind of bug scrub or other attention would definitely be appreciated.
16:02:53 Any link that shows the list?
16:02:55 scottda also points out there are 12 nova bugs tagged as volume.
16:03:02 hi
16:03:06 #link https://bugs.launchpad.net/nova/+bugs?field.status:list=NEW&field.tag=volumes
16:03:13 Those are just the "new" ones that need triage
16:03:26 scottda: :/
16:03:41 rajinir: Nothing else really except here:
16:03:43 #link https://bugs.launchpad.net/cinder/+bugs
16:04:04 Launchpad is the greatest interface for searching through, but they're all there.
16:04:23 scottda: Do you know of any particularly interesting ones?
16:04:49 s/is the greatest/isn't the greatest/
16:04:54 There's 1 with NetApp in the title :)
16:05:02 :)
16:05:17 hi
16:05:36 I think in general I want to at least spend some time adding tags to the bug reports to make it easy to pull out by vendor or particular feature.
16:05:43 Most Nova volume bugs look familiar... things get out of sync during attach/detach :(
16:06:16 scottda: OK, at least it's consistent.
16:06:30 There's also the spec etherpad from last week: https://etherpad.openstack.org/p/mitaka-cinder-spec-review-tracking
16:06:39 mornin
16:06:41 That, to be honest, I haven't had the time to do a thing with.
16:06:49 scottda: we've got an E-Series driver person looing at that one
16:07:05 looking
16:07:16 But the format is used by nova with some success, so I think it's worth spending more time on it.
16:07:34 I also haven't had a chance to get some good reports up like nova has.
16:07:48 Mostly because I just haven't been able to sit down and hack away at it.
16:07:58 But hopefully some helpful tools are coming soon.
16:08:10 OK, just want to keep awareness here.
16:08:14 Let's move on to business.
16:08:25 OSLO incubator going away (DuncanT)
16:08:32 great news!
16:08:35 Hi
16:08:35 #topic OSLO incubator going away
16:08:59 We still use a few bits from oslo incubator, most noticeably the scheduler and image utils
16:09:21 Image utils is trivial, but I wonder if it is worth doing a last sync on the scheduler
16:09:39 They both need moving into a standard cinder namespace, and the unit tests pulling in from incubator
16:09:41 DuncanT: can we move imageutils to oslo.utils?
16:10:28 e0ne: Maybe, yeah. There's some argument over how it should work though, and it isn't complex, so it might be easier to have our own copy
16:10:46 DuncanT: sounds reasonable
16:11:44 At least it's a start, I suppose.
16:11:59 I've nothing else to say on the subject, it was just a heads up... if I get time I'll put up patches, but volunteers welcome
16:12:09 DuncanT: Thanks!
16:12:23 #topic Adding extra data to event notifications for searchlight to consume
16:12:33 DuncanT: Still all yours.
:)
16:12:34 Me again
16:13:09 So searchlight shoves openstack metadata into an Elasticsearch engine, with just enough extra metadata that it can do RBAC with it
16:13:23 I wrote a cinder plugin for it, modelled off the nova one
16:13:44 Trouble is, like nova, it needs to hit the API after nearly every event to get extra info
16:13:55 :(
16:13:58 This sucks, and makes the API and DB work twice as hard
16:14:06 +1
16:14:16 What additional data does it need?
16:14:19 I propose stuffing more details into the events to avoid this
16:14:36 Usually RBAC related fields, sometimes status
16:14:56 can we add cinder objects to events?
16:15:00 I suggest we just JSON encode the DB record we have in our hand at the time, and attach that to the event
16:15:07 Events are just JSON
16:16:04 That does seem like it would make the event more useful to consumers.
16:16:29 I'd be concerned about passing too much extra, especially for non-searchlight consumers.
16:16:43 But I think the overhead would be fairly low at this point?
16:16:58 In all cases I can find, it is less than a k of data unless people go mad with metadata
16:17:55 Doesn't sound like anybody hates it so far, I'll post patches
16:18:03 Any concern with large and active system deployments?
16:18:33 Yeah, might be worth proposing some patches and seeing if we get any other feedback there.
16:18:55 e0ne: Cinder objects are JSON serializable by definition, we can attach them.
16:19:12 I can run it on a ~hundred node test system, see how that goes re extra load
16:19:18 dulek, that is a lot of extra data floating around though.
16:19:22 is it really worth this?
16:19:22 dulek: sure, it's what I'm proposing
16:19:55 I'll come up with the fields that are actually interesting, but I suspect it ends up being most of them
16:20:03 DuncanT, hemna: That extra stream of data is my only concern.
16:20:04 hemna: It is, but based on my understanding of how searchlight works, it needs that. Or just diffs?
16:20:20 dulek, any urls on searchlight?
16:20:36 And if someone wants to run searchlight - notifications are still better than SL hitting APIs all the time for indexing. ;)
16:20:52 smcginnis, maybe this is CONF enabled but disabled by default?
16:21:10 https://www.youtube.com/watch?v=0jYXsK4j26s http://docs.openstack.org/developer/searchlight/
16:21:11 Definitely better when you're using searchlight. But what about when not?
16:21:14 otherwise we are stuffing lots more data around for no use.
16:21:26 hemna: I don't have anything more than Google shows me.
16:21:32 BTW - TravT_away ^
16:21:38 who is asking for this?
16:21:43 hemna: Yeah, that's what I was thinking. Just a flag to flip.
16:22:22 hemna: Searchlight? Several interested parties, and several past attempts to add e.g. regex filtering to cinder
16:22:56 hemna: http://stackalytics.com/?metric=commits&module=searchlight-group&release=liberty
16:23:21 I'll have to write the searchlight plugin to do the API queries if the events don't contain enough info, since it needs to work against existing systems ideally, so a conf option is fine by me, but I think the overhead should be negligible
16:23:23 Well, there's certainly one company fully behind this. :)
16:23:29 heh
16:23:32 smcginnis: :D
16:23:33 :)
16:23:59 so make it disabled by default and enable it in cinder.conf
16:24:19 DuncanT: I think it would help if you are able to get some metrics. Then we know what we're dealing with.
16:24:22 until we get more performance metrics out of the impact of adding it.
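A minimal sketch of the payload enrichment DuncanT proposes above: serialize the volume record already in hand and attach it to the notification, so a consumer like Searchlight does not have to call back into the API. The notifier wiring, field list, and function name below are illustrative assumptions, not the actual patch.

```python
# Sketch only: attach a JSON-serializable copy of the volume record to the
# notification so consumers like Searchlight don't have to re-query the API.
# Field list and function name are hypothetical, not from the real patch.
from oslo_config import cfg
import oslo_messaging as messaging

CONF = cfg.CONF

# Roughly the RBAC-relevant fields mentioned above; typically well under 1 KB.
_EXTRA_FIELDS = ('id', 'project_id', 'user_id', 'status', 'size',
                 'availability_zone', 'volume_type_id', 'metadata')


def notify_volume_event(context, volume, event_suffix, publisher_id='volume'):
    transport = messaging.get_notification_transport(CONF)
    notifier = messaging.Notifier(transport, publisher_id=publisher_id)

    # Base payload, roughly as it exists today.
    payload = {'volume_id': volume['id'], 'status': volume['status']}

    # Proposed extra data: the record we already hold, attached to the event
    # instead of forcing an extra API/DB round trip per event.
    payload['volume'] = {field: volume.get(field) for field in _EXTRA_FIELDS}

    notifier.info(context.to_dict(), 'volume.%s' % event_suffix, payload)
```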
16:24:26 hemna: Another conf option is a burden though.... I'll benchmark the difference
16:24:37 if it's negligible, then enable it by default.
16:24:42 a burden? meh
16:25:11 Once we have the data it will be easier to say if we should just have it on by default.
16:25:15 hemna: We have loads that aren't tested or documented already
16:25:31 that's not an argument for not creating another one. :)
16:25:38 I think having a switch to turn it off for those that aren't using it and want to optimize things a bit might not be bad.
16:25:51 Or is that optimise? :)
16:26:21 I'll benchmark it, we have some performance guys who're good at that sort of thing
16:26:27 if we are talking about the entire volume object and all of its entries, that's quite a bit of data per event.
16:26:29 #action DuncanT to get some data for overhead with sending object data in notifications.
16:26:36 that's not negligible IMHO.
16:27:55 Well, let's see how things look once we have real data.
16:28:02 DuncanT: all for this?
16:28:04 hemna: It's going via rabbit, so message size isn't actually a big overhead according to a previous benchmark
16:28:09 smcginnis: Yup
16:28:19 coolio.
16:28:23 OK, thanks.
16:28:31 #topic Looping on failed driver init.
16:28:37 It's the DuncanT show today. :)
16:29:20 Ok, this is hopefully an easy one. If you power your array and cinder node on at the same time, and the array comes up second, c-vol is useless until you restart it. This sucks
16:29:36 (at least for drivers that check the array in init, which many do)
16:29:42 DuncanT: +1
16:29:53 I agree.
16:29:57 DuncanT: +1
16:30:02 DuncanT, we had customers asking for this
16:30:14 Ok, I'll post patches. Thanks
16:30:18 kmartin: not only you :)
16:30:33 I think that's a pretty clear answer then. :)
16:30:34 DuncanT, +1
16:30:38 c-vol shows up successful in the service-list even if the driver failed
16:30:41 (Where 'I' quite possibly means 'one of my colleagues')
16:30:48 rajinir: That's not true.
16:30:53 rajinir: At least in Liberty.
16:31:01 Delegation - DuncanT must be a manager now. :)
16:31:05 DuncanT, so are you proposing a retry N times thing?
16:31:06 dulek: It was true until very late in liberty
16:31:29 dulek: that's great, if some work was done to fix that
16:31:37 hemna: Just retry every half minute or something forever - there's nothing else the process can usefully be doing at that point
16:31:39 so in the case of our driver, we puke if we can't hit the array, but we also puke if there are conf settings that don't verify on the array, so retry won't help.
16:31:50 i put a comment in the bug, but i'm concerned about doing this for __init__ if that's what you mean
16:31:57 that basically means the log will show exceptions every half minute :)
16:32:03 continually failing
16:32:03 hemna, DuncanT: we can make the retry interval configurable
16:32:13 hemna: So my idea doesn't make it any worse, just doesn't fix all cases. I'm ok with that
16:32:20 Another config option :)
16:32:21 Or maybe have a scaling backoff.
16:32:23 DuncanT: multiple backends? maybe a configurable bound on the number of retries.
16:32:31 hemna: Currently you get a log full of failing periodic tasks
16:32:40 it should only retry if the driver pukes because it can't talk to the array
16:32:43 DuncanT: I agree. We still spam the logs on failures as it is.
16:32:43 that's a problem that I can understand ....
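A rough sketch of the "retry every half minute, forever" idea DuncanT describes above, with the interval made configurable as e0ne suggests. The interval value, helper name, and the idea of calling the standard driver setup hooks in a loop are assumptions for illustration, not the eventual patch.

```python
# Sketch of retrying driver setup at a fixed interval until the backend is
# reachable. RETRY_INTERVAL would presumably become a cinder.conf option.
import logging
import time

LOG = logging.getLogger(__name__)

RETRY_INTERVAL = 30  # seconds


def init_host_with_retry(driver, context=None):
    while True:
        try:
            driver.do_setup(context)
            driver.check_for_setup_error()
            return
        except Exception:
            # There is nothing else useful c-vol can do while the backend is
            # down, but this does log a failure every RETRY_INTERVAL seconds.
            LOG.exception('Driver setup failed, retrying in %s seconds',
                          RETRY_INTERVAL)
            time.sleep(RETRY_INTERVAL)
```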
+1 ;)
16:32:51 if it pukes because of misconfiguration then it shouldn't retry
16:32:56 as it will fail forever
16:33:03 hemna: So maybe a new exception that everyone should implement.
16:33:05 tbarron: Multi-backend is in different processes by this point
16:33:09 smcginnis, +1
16:33:15 DuncanT: oh right
16:33:18 RetryableCinderException or some such thing.
16:33:23 yup
16:33:37 That makes sense.
16:33:37 hemna: smcginnis: Ok, I'll look at adding a new exception to say 'stop, this can't be retried'
16:33:44 DuncanT, +1
16:33:47 thanks!
16:33:53 DuncanT: +1
16:33:54 DuncanT: what about __init__ failures due to missing requirements, syntax errors, etc?
16:33:57 ComeBackLaterCinderException
16:34:06 smcginnis: +1 ;)
16:34:12 eharney, I think that new exception would cover those
16:34:18 eharney: Retries are still fairly harmless IMO
16:34:22 hemna: it wouldn't
16:34:41 well, not syntax errors
16:34:47 but missing conf entries? it should
16:34:56 i don't think we want to retry if you have broken python, personally
16:34:56 eharney: You don't think so?
16:35:08 that's just a mess in the logs
16:35:11 driver recognizes it doesn't have the needed conf entry and raises the DontRetryException
16:35:33 HavingABeerWillCheckLaterCinderException
16:35:37 syntax errors in python shouldn't make it into the code :)
16:35:53 eharney: Right now the logs are a mess - you get 'driver not initialised' messages and have to scroll back up to find the real problem
16:35:57 Yeah, I think it could be explicit from the driver: "I'm all good but I can't complete init."
16:36:02 if those aren't being covered by py27 and py34 we have bigger problems
16:36:35 Is there a way to propagate the error to the service-list?
16:36:36 hemna: eh... there's a lot more to it than that
16:37:20 rajinir: Not in the first patch
16:37:55 upstream unit tests don't help with people deploying out of tree drivers, broken packages, etc
16:38:31 eharney: I'll keep it in mind. The logs are already really unhelpful in that case though
16:38:45 the service itself would puke out
16:38:55 I'm not sure what you are trying to solve w/ that one.
16:38:57 eharney: So if the driver has to raise an exception saying everything else is good but it can't talk to the array right now, wouldn't that cover it?
16:39:08 eharney: Otherwise it blows up in a different way.
16:39:36 smcginnis, I think he's trying to cover the case where someone installs an out of tree driver that's horribly broken (syntax errors).
16:39:41 smcginnis: yes
16:39:54 But then it wouldn't raise a ComeBackLaterException.
16:40:03 It would be a ThisCodeIsCrapException.
16:40:14 all i'm saying is, no point in retrying indefinitely for "your system is broken" kinds of errors
16:40:15 And then things just fail.
16:40:27 i'm not too concerned with the other details
16:40:31 eharney: Yeah, definitely.
16:41:12 eharney, I thought that's what I had raised earlier though? and why we created the DontRetryException?
16:41:14 I'm confused.
16:41:25 but that assumes you can raise a DontRetryException
16:41:46 then you don't cover other assorted error exceptions that can be raised outside of your intent
16:41:50 ok this is going in circles. I give up.
16:42:14 Oh, I think instead of a DontRetryException it needs to be a DoRetryException.
16:42:34 Is this only for init?
16:42:38 So the driver can let the upper layer know it's a condition that may resolve itself.
16:42:42 Swanson: Yeah.
16:42:45 smcginnis: that was the thought
16:42:50 We can have that discussion on review probably…
16:42:55 I agree.
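For illustration, a driver-side sketch of the exception split being debated: raise a "retry later" exception only when the array is temporarily unreachable, and let permanent problems (missing config, broken code) fail immediately so the manager never loops on them. The class name, config attribute, and checks are made up for the example.

```python
# Hypothetical split between retryable and non-retryable init failures.
# Only the "come back later" case should keep the retry loop going.
class RetryableSetupException(Exception):
    """Backend temporarily unavailable; initialization may succeed later."""


class ExampleDriver(object):
    def __init__(self, configuration):
        self.configuration = configuration

    def check_for_setup_error(self):
        if not getattr(self.configuration, 'san_ip', None):
            # Permanent problem: retrying will never help, fail fast.
            raise ValueError('san_ip is not configured')
        if not self._array_reachable():
            # Transient problem: tell the manager to come back later.
            raise RetryableSetupException('array is not reachable yet')

    def _array_reachable(self):
        # Placeholder for a real management-interface check.
        return False
```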
16:43:02 We don't need to design this here.
16:43:18 Good enough for now I think. Let the details come out in the code and reviews.
16:43:31 dulek: Keeping us on track. :)
16:43:34 Do I know the difference between a bad config and an array being down?
16:43:55 * dulek is doing 2 meetings at once, so time's valuable. ;)
16:44:01 Swanson: You would know the difference between a missing config and not being able to contact the array.
16:44:11 Swanson: If the config is a wrong password, probably not, but retrying is fairly cheap in that case
16:44:16 Like DuncanT was pointing out, not any worse than how it is now.
16:44:21 smcginnis: +1
16:44:32 OK, let's move on. Especially since dulek is next. :)
16:44:39 #topic reno for release notes
16:44:42 hi!
16:44:46 dulek: Thanks for bringing this up by the way.
16:44:52 And this is also just a heads up.
16:45:08 We started to use reno for release notes management.
16:45:28 #link http://lists.openstack.org/pipermail/openstack-dev/2015-November/078301.html
16:45:34 #link http://docs.openstack.org/developer/keystone/developing.html#release-notes
16:45:41 #link https://review.openstack.org/#/c/246455/
16:45:55 These are the resources that explain this stuff.
16:46:25 As you see, Keystone already merged some guidelines on what requires a release note
16:46:31 We probably should do the same.
16:47:01 Thanks, I hadn't seen the keystone notes. That would be good for us to have somewhere in our devref.
16:47:03 dulek: +1
16:47:08 Probably on our wiki too.
16:47:11 I can come up with a proposal (taking the last link into account - so cross-project guidelines) to get reviewed.
16:47:29 * DuncanT would like it if the git commit messages were good enough, but they probably aren't :-(
16:48:02 Having to pull together the release notes for liberty, I can definitely ay they are not good enough.
16:48:07 s/ay/say/
16:48:12 DuncanT: feel free to -1 such commit messages :)
16:48:12 Ah, and BTW - there are two last patches waiting for merging.
16:48:29 One is the unreleased notes page backported to Liberty
16:48:43 And the second one - a job for release notes.
16:48:54 dulek: Shoot, I thought we had gotten those through. Will look it up.
16:48:58 CI job I mean.
16:49:10 smcginnis: https://review.openstack.org/#/c/245431/
16:49:13 That's the one.
16:49:33 I believe project-config changes are out of our scope, but a +1 from smcginnis would be helpful.
16:49:37 dulek: Thanks!
16:49:43 https://review.openstack.org/#/c/244764/
16:49:48 That's the CI job.
16:50:17 So summing up - we all have another thing to -1 reviews for. :D
16:50:25 :/
16:50:32 I hope this pays off.
16:51:01 I hope so too, I'm not totally convinced, but let's see.
16:51:16 It does look promising. I think we will just need to adjust so it's just normal.
16:51:44 dulek: Well, again, thanks for bringing it up.
16:51:46 So if anyone has a question on release notes stuff - I'm here to help. :)
16:51:53 dulek: Anything else?
16:51:53 That's all I've wanted to say. :)
16:52:02 Thank you.
16:52:10 #topic Open Discussion
16:52:20 The floor is open. (for 8 minutes)
16:52:35 Just a note - we're moving in the right direction with the upgrades stuff!
16:53:06 There's a spec for making DB schema migrations in a live manner: https://review.openstack.org/#/c/245976/
16:53:10 If anyone is interested.
16:53:16 dulek, nice
16:53:37 Thanks everyone for reviewing and especially thangp for driving this so well.
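Returning to the reno topic above: a release note is just a small YAML file under releasenotes/notes/, usually created with `reno new <slug>`. The sections below are reno's standard ones; the filename and wording are purely illustrative.

```yaml
# releasenotes/notes/example-note-0123456789abcdef.yaml (illustrative)
---
features:
  - Added an example feature operators should know about.
upgrade:
  - The hypothetical option ``example_option`` now defaults to ``false``;
    deployments relying on the old behaviour must set it explicitly.
fixes:
  - Fixed an example race during volume attach.
```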
16:53:49 +1
16:54:27 dulek: we ran into one issue with versionedobjects conversion
16:54:51 dulek: volume_metadata in the volume DB record is changed to metadata in the volume object
16:55:00 dulek: our CI failed because of that
16:55:06 xyang: I've noticed that in logs recently.
16:55:14 dulek: so other drivers could be affected
16:55:29 xyang: We should probably add an alias to the Volume object.
16:55:52 dulek: so for the VNX driver, the create_volume function needs to use metadata but migrate_volume needs volume_metadata
16:55:55 dulek: +1
16:56:04 xyang: I'll take a look at that. Thanks for reporting.
16:56:17 dulek: thanks
16:57:08 so, FWIW, I created a dumping ground github repo for some of my brick tools I've been using here: https://github.com/WaltHP/hpe-openstack-tools
16:57:23 if anyone wants to use/contribute to em
16:57:25 Cool!
16:57:29 I use those when testing stuff
16:57:37 that Cinder/Nova doesn't call just yet
16:58:04 the brick_volume.py dumps out the existing volume paths on the system for an attached volume.
16:58:28 http://paste.openstack.org/show/479295/
16:58:30 looks like that
16:59:11 OK, pretty much out of time. Thanks everyone.
16:59:19 Thanks
16:59:22 #endmeeting
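Regarding the volume_metadata/metadata rename xyang hit above: one simple shape the compatibility alias dulek mentions could take is a read-only property on the Volume object, so driver code written against the old DB field name keeps working. This is only a sketch of the idea, not the actual Cinder fix.

```python
# Sketch of a backward-compatibility alias for the renamed field: old driver
# code can keep reading volume.volume_metadata while new code uses
# volume.metadata. Simplified plain class, not the real versioned object.
class Volume(object):
    def __init__(self, metadata=None):
        self.metadata = metadata or {}

    @property
    def volume_metadata(self):
        # Old name kept as an alias of the new field.
        return self.metadata
```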