15:00:38 #startmeeting manila
15:00:39 Meeting started Thu Nov 12 15:00:38 2015 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:43 The meeting name has been set to 'manila'
15:00:46 hello all
15:00:52 \o
15:00:53 hello
15:00:54 hi
15:00:57 hello
15:00:57 hello
15:01:00 hello
15:01:10 hi
15:01:19 long agenda today
15:01:33 I saw a lot of agenda updates in the last 4 hours ;-)
15:01:39 Hi
15:01:45 #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:01:52 first topic I submitted. :)
15:01:52 o/
15:02:01 in the last minute
15:02:05 no announcements today so let's get going
15:02:10 #topic Which method is better for Manila QoS
15:02:23 zhongjun: you're up
15:03:01 #link http://paste.openstack.org/show/477677/
15:03:08 link http://paste.openstack.org/show/477677/
15:03:40 wow paste.openstack.org is ultra slow this morning
15:04:17 I'd like to note that at the summit folks were leaning towards #2 but I don't think we all clearly understood the downsides
15:04:19 yes, ameade shows some differences in this link.
15:04:48 yeah #2 is what I proposed
15:05:22 I'm not convinced that it's worth the extra complexity in Manila to make it easy to share a common qos spec between share types
15:05:44 if we do #2, then the complexity for handling share types as an admin is harder... that I think we all understand
15:05:51 but in order for #2 to work
15:06:00 we have no data on the relative number of share types and qos specs deployers actually use
15:06:12 we need a way to have netapp_iops=10 OR huawei_iops=20
15:06:25 and I don't see why qos-related extra specs are more deserving of a reusable wrapper than other extra specs
15:06:49 we would also need to have extra specs that are values and not just bools
15:06:55 hey
15:06:59 for the min_ and max_ stuff
15:07:01 re-usable spec groups sound useful
15:07:38 as an admin, i would love to have a single qos group that means gold and has what that means for all vendors i support
15:07:42 markstur__: if we go down that path, I would prefer something more generic, such as inheritable share types or something
15:08:01 bswartz: +1
15:08:11 bswartz: +1
15:08:17 bswartz: +1
15:08:19 inheritable share types sounds more sympathetic to programmatic discovery
15:08:27 or programmatic in general
15:08:33 i'd prefer having multiple share types over inheritable maybe
15:08:38 or share type bundles
15:08:50 inheritance sucks
15:08:52 okay but can we agree that inheritable share types are a totally separate enhancement on top of basic qos extra spec support?
15:09:08 yeah i agree with that
15:09:10 bswartz: +1
15:09:14 I'd like to get qos working, then come back and focus on the management complexity
15:09:21 we still need a way to do ORs and ranges
15:09:48 ameade: example?
15:10:25 why do you need a range if you can use min/max values?
15:10:40 cknight: i mean, wouldn't that mean the driver reports a range?
15:11:10 ameade: not sure I see the difference
15:11:13 ameade: quite the opposite the way I understand it, the type specifies a range, the driver reports a value
15:11:40 ganso: +1
15:11:44 I think we're starting to discuss something else -- which is more like performance capacity based scheduling
15:12:00 that's an interesting topic, but not the same as QoS (IMO)
15:12:05 both do ranges?
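A minimal sketch of the min/max idea discussed above ("the type specifies a range, the driver reports a value"). The extra spec names (qos_min_iops, qos_max_iops) and the reported capability (iops_capacity) are assumptions for illustration only, not an agreed Manila design:

```python
# Hypothetical illustration only: the extra spec and capability names below
# are assumptions, not settled Manila keys.

def backend_matches_qos_range(share_type_extra_specs, backend_capabilities):
    """Return True if the backend's reported value falls inside the range
    requested by the share type (type specifies a range, driver reports a value)."""
    reported = backend_capabilities.get('iops_capacity')  # assumed capability
    if reported is None:
        return False
    min_iops = float(share_type_extra_specs.get('qos_min_iops', 0))
    max_iops = float(share_type_extra_specs.get('qos_max_iops', float('inf')))
    return min_iops <= reported <= max_iops

# Example: a "gold" type asking for 500-1000 IOPS against a backend reporting 800
print(backend_matches_qos_range({'qos_min_iops': '500', 'qos_max_iops': '1000'},
                                {'iops_capacity': 800}))  # True
```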
15:12:33 bswartz: so the need for OR is so i can specify qos values for 2+ vendors in a single share type
15:12:36 the most basic form of qos is a throttle
15:12:48 for a throttle, the admin just specifies a number and the backend implements it
15:12:48 otherwise even with only 2 vendors the number of share_types explodes
15:13:26 ameade: I'm not sure I see why
15:13:45 your paste shows how to do 2 vendors in one share type -- it seems trivial to extend to 3
15:14:29 that example wouldn't work because it would try to find a backend that matches both netapp_iops and huawei_iops, no?
15:14:41 no
15:14:52 in reality we'd use scoped keys
15:14:57 so the filter scheduler would never see them
15:15:07 I would edit your paste if it were possible
15:15:26 so we still need to know which backends are capable of applying those qos values
15:15:28 we should use a wiki for qos design, not paste
15:15:57 we can agree on one common extra spec -- qos_support = True/False
15:16:06 that's what you would filter on
15:16:10 so it's all or nothing for qos support?
15:16:25 well part of the agreement would be to define EXACTLY what it means
15:16:34 and document that definition
15:16:41 so any vendor who has a unique qos thing they can provide is out of the picture?
15:16:50 if the most common production approach is something similar to "bronze, silver, gold, platinum", then for each type the admin would specify QoS ranges that fit those share type standards
15:16:55 we would have to agree on what the basic requirement is for qos support
15:16:58 ameade: no, you can just use vendor-specific scoped keys
15:17:14 cknight: and report vendor specific qos capabilities?
15:17:19 in order to implement OR, I think we need a QoS spec group... because no backend can match both huawei and netapp QoS extra specs
15:17:48 cknight: I think what ameade is getting at is that if we use unscoped keys for filtering, like netapp_qos_support and huawei_qos_support, then your share type needs to have an OR expression
15:18:15 bswartz: thanks, I get it
15:18:24 that's why I'd be in favor of a common capability for the basic qos_support
15:18:47 but the scoped keys can be vendor specific because the scheduler never sees them anyway
15:19:38 I like the idea of inheritable share types or some way to bundle multiple types -- we should discuss that later on
15:19:40 so my huawei backend could report qos_support = true but my share type has all netapp qos specs
15:19:49 so it ends up on huawei but doesn't apply anything
15:20:13 ameade: I think we could agree on some common extra specs for throttling too
15:20:25 like max_read_bps and max_write_bps
15:20:55 for other things, they would need to be vendor specific and we'd just have to document how to avoid doing the wrong thing
15:21:14 sounds like we need to flesh out the design for option #2 in the paste to think about these corner cases
15:21:16 I really don't see this being a huge problem in practice
15:21:41 which is method one in
15:21:44 #link https://wiki.openstack.org/wiki/Manila/QoS
15:22:22 clouds with multiple storage vendors are rare, and when they exist it's even more rare to have a share type that covers 2 or more vendors -- typically people create different share types for each backend vendor
15:22:49 bswartz: I think that last point is a problem in itself tbh
15:23:05 Zhongjun: I'm sorry we don't seem to be getting closer to a decision here
15:23:16 we might need to schedule a working session to get qos hammered out
15:23:32 +1
15:23:41 I don't want to take up the whole meeting with qos though because we have other business
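A hedged sketch of the share type shape being discussed above: a common qos_support capability the scheduler can filter on, common throttle specs like max_read_bps/max_write_bps, and vendor-specific scoped keys the filter scheduler ignores. The exact key names are taken loosely from the discussion and are assumptions, not an agreed design:

```python
# Assumption: these key names mirror the discussion; they are not a settled
# Manila API.  Vendor-scoped keys (with a ':') are not used for placement,
# so one share type can carry QoS details for several vendors at once.
share_type_extra_specs = {
    'qos_support': '<is> True',          # proposed common capability to filter on
    'max_read_bps': '104857600',         # proposed common throttle spec
    'max_write_bps': '52428800',
    'netapp:qos_policy_group': 'gold',   # hypothetical, applied by the NetApp driver only
    'huawei:maxiops': '20000',           # hypothetical, applied by the Huawei driver only
}

def specs_seen_by_scheduler(extra_specs):
    """Roughly: drop vendor-scoped keys so only the unscoped, common specs
    participate in scheduling."""
    return {k: v for k, v in extra_specs.items() if ':' not in k}

print(specs_seen_by_scheduler(share_type_extra_specs))
# {'qos_support': '<is> True', 'max_read_bps': '104857600', 'max_write_bps': '52428800'}
```

In practice an admin would set these with something like `manila type-key <share_type> set <key>=<value>`, but again the specific keys above are only illustrative.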
15:23:46 and if someone has the bandwidth to design what 'method 1' would need to look like exactly
15:23:47 bswartz: It's ok
15:24:26 I'd like to see a more detailed example of what it should look like in the real world
15:25:01 ameade's paste is a good start, but we need more examples that support a particular design or show why a particular design has problems
15:25:15 +1
15:25:26 and right now I'm less concerned with administrator quality of life and more concerned with whether we can even implement something that makes sense
15:25:57 I think we can come back and solve manageability issues after we have a design that makes sense
15:26:14 Let's use the ML to continue discussing this one
15:26:24 just want to make sure we don't pigeonhole ourselves
15:26:25 and I will try to schedule a specific time next week to discuss qos
15:26:30 +1
15:26:39 #topic Manila DR update
15:26:46 ameade: this one's yours
15:26:48 #link https://review.openstack.org/#/c/238572/
15:27:04 haven't gotten any reviews but i have a couple things to run by folks in the meeting
15:27:19 at the summit we agreed that we need a first party driver implementation of DR to run in the gate
15:27:33 ameade: yes
15:27:48 i want to know if we should do that in the current generic driver or if we were going to have another generic driver soon
15:28:00 and who can do that work and do we need it right away
15:28:05 while the api is experimental
15:28:24 ameade: there are 2 new first party drivers in development -- both should have reviewable code
15:29:08 I honestly don't know if it would be easier to build replication support in the existing generic driver or one of the new ones
15:29:28 at the very least, replication support would create additional dependencies in the driver
15:29:36 so it would need to be optional
15:30:07 the 2 most promising approaches I'm aware of are:
15:30:07 bswartz: I'm very hesitant to continue investing in the current generic driver
15:30:09 I don't know the implementation details of DR, but wouldn't it make more sense to use Cinder to replicate in the current generic driver since it uses Cinder... as soon as Cinder implements DR properly...
15:30:33 1) block layer replication using DRBD, and using a filesystem which supports replication on top of it
15:30:35 is this something we want in one of these drivers right away in Mitaka?
15:30:47 and 2) filesystem layer replication, using ZFS
15:31:23 ganso: cinder's replication semantics are too weak for us to use them, even if it were working today (it's not)
15:31:59 maybe 3) filesystem layer replication, using glusterfs?
15:32:15 the advantage to a DRBD solution is that we could do active/active
15:32:48 the ZFS-based approach would be active/passive for sure, but could probably implement "readable" semantics rather than "dr"
15:32:50 if we do active/active then it can't test promote :P
15:33:09 ameade: maybe we should just do both then
15:33:47 DRBD is also relatively simple to set up
15:33:49 jasonsb: I'm not familiar enough with glusterfs to know what it can do, replication-wise
15:34:42 bswartz: i can check on it and give some more details on how suitable it is
15:34:45 I'd be happy to hear about additional proposals for replication in a first-party driver
15:34:54 especially if it's less work to get it up and running
15:35:17 ok so design aside, is it ok to just require this for when we transition off of experimental?
15:35:24 bswartz: wrt. glusterfs: I'm not familiar with the state of the art either, but can check
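A rough sketch of what approach 2 above (filesystem-layer replication with ZFS, active/passive with a readable replica) could look like in a hypothetical first-party driver. The dataset names, remote host, and periodic-sync loop are assumptions for illustration; this is not the Manila replication API or any agreed design:

```python
# Sketch only: incremental ZFS snapshot replication from the active side to a
# read-only replica on another host.  Names are hypothetical.
import subprocess

def sync_replica(src_dataset, dst_host, dst_dataset, prev_snap, new_snap):
    """Take a new snapshot on the active side and send the incremental delta
    to the passive (readable) replica over ssh."""
    subprocess.check_call(['zfs', 'snapshot', src_dataset + '@' + new_snap])
    send = subprocess.Popen(
        ['zfs', 'send', '-i', src_dataset + '@' + prev_snap,
         src_dataset + '@' + new_snap],
        stdout=subprocess.PIPE)
    # Receive on the remote side; the replica stays read-only until promoted.
    subprocess.check_call(
        ['ssh', dst_host, 'zfs', 'receive', '-F', dst_dataset],
        stdin=send.stdout)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError('zfs send failed')

# Example (hypothetical names):
# sync_replica('tank/share-42', 'backup-host', 'tank/share-42-replica',
#              'rep-0001', 'rep-0002')
```

A DRBD-based approach would instead replicate at the block layer underneath the filesystem, which is what makes active/active possible, as noted in the discussion.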
15:35:24 toabctl: I hope you're right -- do you have any interest in writing a prototype?
15:35:42 or is it something we need right away (obviously sooner is better)?
15:35:55 bswartz: so this could be in *any* of the drivers that run in the gate (gluster, generic, ceph, hdfs, etc.), right?
15:37:13 ameade: I'm of 2 minds about that -- part of me thinks we need a first party implementation before it can merge, but then I can also imagine merging it as experimental and adding support in a first party driver afterwards
15:37:26 bswartz: interest yes, but no time. next SUSE cloud release is on the agenda currently. sorry
15:37:39 I'd like to hear more opinions on that
15:37:49 toabctl: okay I understand
15:37:53 bswartz: I'd say it cannot leave the experimental state until it's tested in the gate.
15:37:54 bswartz: same, it definitely can't be promoted from experimental without it, i think we all agree there
15:38:08 * time check, 23min left *
15:38:25 ameade: you're okay with moving on?
15:38:52 yeah for now, we can revisit later
15:38:54 #topic Manila Driver minimum requirements document update
15:38:57 ganso: you're up
15:39:27 ok so, it seems the last issue remaining in the document is that we are not sure if the manage feature is mandatory or not in DHSS=False mode
15:39:54 ganso: I think we agreed that it's optional
15:39:57 we know it is not mandatory in general, because the driver can implement DHSS=True mode and not have to worry about this
15:40:14 but if the driver operates in DHSS=False, does it have to implement manage?
15:40:25 there's confusion about what not supporting it means, so we should be very clear that drivers don't need to implement anything
15:41:02 also, Valeriy pointed out that if all drivers can implement this, then this should be mandatory for DHSS=False mode...
15:41:12 I agree with him, for interoperability
15:41:34 it's an admin-only feature, so interoperability is less critical
15:41:48 and even for drivers that support it, it's allowed to fail for arbitrary reasons
15:42:19 therefore even if we made it mandatory, a driver could simply always fail and still meet the contract of the driver interface
15:42:33 that sounds optional
15:42:33 thus it's silly to make it mandatory
15:42:54 humm ok, so it is completely optional, even in DHSS=False... I will update the document
15:43:06 yes I think that's what we said in tokyo
15:43:22 ganso: thanks for handling this document!
15:43:32 fwiw this is not a change in thinking, but just a change in how we communicate to driver maintainers
15:43:33 +1 thanks
15:44:07 we've always known that drivers can get away with a noop implementation so it's effectively optional
15:44:20 #topic Manila Data Copy service name
15:44:27 #link https://review.openstack.org/#/c/244286/
15:44:37 so thanks to ganso for starting work on the new data copy service
15:45:17 the proposed name was manila-data_copy, which is gross for 2 reasons (mixing hyphens and underscores, and it's 2 words instead of 1)
15:45:31 I prefer manila-data, or m-dat for short
15:45:43 but I'm soliciting other ideas
15:45:59 we don't need to spend much time on this -- please use the code review to register feedback
15:46:09 I just wanted to raise awareness that we need to choose a name
15:46:13 I already changed it to m-dat and manila-data
15:46:24 if we all agree to this, then it is decided
15:46:30 bswartz: +1 on needing another name. your suggestion seems a good starting point for a service in the control plane that must access backends on the data plane.
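Recapping the driver-requirements conclusion above (manage is optional even in DHSS=False mode), a minimal sketch of how a driver that skips the feature could behave. The method name manage_existing follows the Manila driver interface as I understand it, but the details here should be treated as an illustrative assumption, not the project's documented contract:

```python
# Sketch only: a driver that does not support manage can simply refuse,
# which still satisfies the (optional) contract discussed above.

class ExampleDriver(object):  # stands in for the real Manila share driver base class
    def manage_existing(self, share, driver_options):
        """Bring an existing backend share under Manila management.

        Optional even in DHSS=False mode; unsupported drivers may just raise.
        """
        raise NotImplementedError('manage is not supported by this driver')
```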
15:46:36 ganso: thanks -- I want to know if anyone else has better/different ideas
15:47:05 if everyone is fine with manila-data, then we're done
15:47:14 #topic Upcoming change for Manila CI hooks
15:47:21 vponomaryov: you're up
15:47:53 is he on irc today
15:48:01 oh vponomaryov isn't here
15:48:16 must be connection issues
15:48:22 he added this topic right before the meeting
15:48:28 vponomaryov: hello!
15:48:35 it's time for your topic
15:48:57 sorry, missed the time
15:49:17 so, this topic is a heads-up for driver maintainers
15:49:24 and their third-party CIs
15:49:42 oh yes I remember this
15:49:51 #link https://review.openstack.org/#/c/243233/
15:50:28 yeah this change has the potential to break CI systems, depending on how they're implemented
15:50:32 first reason: we use a fixed version of Tempest and our plugin is updated from time to time, so we store the value in the manila CI hooks
15:50:43 most of them should be fine (as we can see from the votes)
15:51:15 CI maintainers should take a closer look at this one, and after it merges make sure they're not broken
15:51:18 and to ease syncing for all third-party CIs, common parts are being separated into another file
15:51:58 any questions on this?
15:52:00 and now is the right time to update third-party CIs to support both the old and new approaches
15:52:15 the new approach is much better :)
15:52:31 ganso: agreed
15:52:47 #topic Manilaclient enhancement to provide request_id when http_log_debug is True
15:52:59 JayXu: you're up
15:53:17 we are refactoring our component tests
15:53:24 #link https://review.openstack.org/#/c/243233/
15:53:47 and found that there is no way to correlate a failed request with its id
15:54:04 err
15:54:05 wrong link
15:54:06 so I propose adding request_id to the HTTP response body
15:54:13 the agenda got screwed up
15:54:22 http://paste.openstack.org/show/478675/
15:54:56 any comment on that?
15:55:13 there is a cross project effort to add the request id to response data, which may be related to this
15:55:13 bswartz: just refresh the page )
15:55:26 JayXu: was there a cross project discussion about this in tokyo?
15:55:29 let me dig out some info
15:55:45 no
15:55:50 I think it's a good idea -- especially if it can be done cheaply
15:56:07 I just ran into it this week when I tried to update my component test cases
15:56:12 I have a link, give me a sec
15:56:18 when I first heard about this idea I wondered if it would require much code to track the ID everywhere we want to log it
15:56:19 #link https://etherpad.openstack.org/p/Mitaka_Cross_Project_Logging
15:56:25 no spec yet i think
15:57:03 I'm sure QA guys and deployers/troubleshooters would be thrilled to have this though
15:57:42 https://blueprints.launchpad.net/python-cinderclient/+spec/return-request-id-to-caller
15:57:43 it might even enable some cool scripting to tie together log files
15:58:13 there is also an oslo spec
15:58:21 xyang1: thanks
15:58:49 JayXu: if you haven't seen these, I suggest reading them and making sure the proposal for Manila fits in with what others are doing
15:59:00 okay, thx
15:59:01 it sounds like a good idea, but we should be consistent with our approach
15:59:09 #topic open discussion
15:59:13 only 1 minute left
15:59:20 anyone have a last minute topic?
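A sketch of the request-id correlation idea from the manilaclient topic above, assuming the service returns the cross-project 'x-openstack-request-id' response header (the header used by the cross-project logging/request-id work linked in the discussion); the exact manilaclient plumbing and whether the id also lands in the response body are still open questions:

```python
# Illustration only: read the request id from the response and include it in
# client-side logs so a failed call can be matched to server-side log entries.
import logging
import requests

LOG = logging.getLogger(__name__)

def log_failed_request(url, token):
    resp = requests.get(url, headers={'X-Auth-Token': token})
    req_id = resp.headers.get('x-openstack-request-id', 'unknown')
    if resp.status_code >= 400:
        LOG.error('Request %s failed with status %s (request id: %s)',
                  url, resp.status_code, req_id)
    return resp
```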
15:59:43 we need to follow up on both the qos open items and the replication open items
16:00:05 let's use the ML for those
16:00:08 thanks everyone
16:00:22 #endmeeting