15:00:26 #startmeeting manila
15:00:28 Meeting started Thu Dec 3 15:00:26 2015 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:30 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:32 The meeting name has been set to 'manila'
15:00:34 hello all
15:00:37 hi
15:00:41 bswartz: \o
15:00:45 hi
15:00:46 hello
15:00:52 hi
15:00:56 hello
15:01:53 * bswartz continues to have IRC issues, so be patient if he disappears briefly
15:01:58 hello
15:02:07 #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:02:41 #topic midcycle meetup
15:03:02 after 2 weeks of voting I closed the poll
15:03:38 there was an offer to host the meetup in Roseville CA (thanks markstur!)
15:04:02 markstur: no direct flights there :(
15:04:09 and there was an offer to host the meeting in the Principality of Sealand (some joker I guess)
15:04:57 I wish I could say I did that, but it was someone more clever :)
15:05:03 xyang2, no direct flights to Sealand either
15:05:16 lol
15:05:17 markstur_: that's true :)
15:05:26 How many people would prefer Roseville to RTP, assuming you would travel?
15:06:08 hi
15:06:08 historically we've been nearly all virtual, and it would be nice to get a more face-to-face meetup going
15:06:22 bswartz: +1
15:07:10 okay, I'm not hearing any votes for Roseville
15:07:35 assuming we can get people interested in meeting face to face, we will work on locations again next time
15:07:51 I think this time it will be mostly virtual again
15:08:02 so the next important issue is the date
15:08:39 the polling showed a clear preference for 2 weeks before Cinder, followed by 1 week before Cinder, and people did not like the 1-week-after-Cinder option
15:08:51 * bswartz wishes he had an easy way to share the survey data
15:09:43 so I think Jan 12-14 will be the date range (although we'll pick only 2 days, not 3)
15:10:10 is there anyone who absolutely can't attend those dates?
15:10:19 that's 6 weeks from now btw
15:11:32 okay, with no objections we will plan on Jan 12-14 for the midcycle
15:11:59 I will do an ML announcement and we will work on the agenda and the exact times in later meetings
15:12:11 bswartz: What's the venue going to be?
15:12:17 anything else about the midcycle?
15:12:57 dustins: NetApp is willing to host in RTP
15:13:28 dustins: so I will need to get a list of people who will travel or already live close (like you)
15:13:32 bswartz: Cool, that's very kind of them :)
15:13:47 I'll put that info in the ML post
15:14:00 I expect most people will be virtual, as they have been in the past
15:14:30 #topic Add "data loss" flag in DB migrations
15:14:35 u_glide: you're up
15:14:43 thanks
15:14:55 We have a bunch of DB migrations which can lead to data loss (example #link https://review.openstack.org/#/c/245126/10/manila/db/migrations/alembic/versions/344c1ac4747f_remove_access_rules_status.py). I think we should somehow mark these migrations to simplify the upgrade process.
15:15:05 Also, if we want to (someday) implement versioned objects in Manila, we should also mark migrations which couldn't be applied on a live cloud. But that is another story :)
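To make the discussion concrete, here is a minimal sketch of the kind of lossy migration being discussed, in the spirit of the remove_access_rules_status example linked above. The revision IDs, table name, and column name are placeholders chosen for illustration; they are not the contents of that review.

```python
"""Hypothetical lossy alembic migration -- illustration only."""

from alembic import op
import sqlalchemy as sa

# Placeholder revision identifiers, not the real ones from the review.
revision = 'aaaaaaaaaaaa'
down_revision = 'bbbbbbbbbbbb'


def upgrade():
    # Destructive step: once the column is dropped, the values it held
    # are gone for good.
    op.drop_column('example_table', 'status')


def downgrade():
    # The column can be re-created, but its previous contents cannot be
    # restored, so upgrade followed by downgrade is not a round trip.
    op.add_column('example_table',
                  sa.Column('status', sa.String(255), nullable=True))
```

Here the upgrade itself discards data, which is what distinguishes it from the more common case discussed below, where only the downgrade is destructive.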
15:15:23 a bunch of them?
15:15:36 isn't it just 1 in this new feature?
15:15:46 u_glide1: there's work in cinder on this
15:16:06 u_glide1: in some cases we can only migrate across multiple releases
15:16:35 https://review.openstack.org/#/c/245976/
15:16:58 bswartz: no, we have at least 3 existing migrations with possible data loss
15:17:33 but in downgrade methods
15:17:49 the online schema update will be important later -- I think we can wait to look into that until more time is spent on versioned objects however
15:18:05 oh
15:18:26 well I can understand downgrades being destructive, since upgrades commonly add new columns
15:18:29 oh, I forgot we don't have versioned objects in manila yet
15:18:59 xyang2: yes we want them but it's a ton of work
15:19:05 yes
15:19:39 u_glide1: are any upgrades destructive other than the one you propose?
15:20:16 bswartz: I didn't check, but it looks like it's the first one
15:20:31 and btw, I'm completely okay with the proposed upgrade -- the only thing lost would be a status column which doesn't have a ton of value
15:21:37 however for safety, and for future use, I'm also okay with a flag that must be specified for destructive migrations
15:22:18 although we should probably call them something less scary, or provide some help text to help the admin understand what will be lost
15:22:53 yes, it makes sense :)
15:23:29 for example, if db sync is called without --force, an error message should be printed that explains --force is needed and what exactly it will do
15:23:55 and any scripts that currently do unattended db syncs need to be updated to specify --force
15:24:15 anyone disagree or want a different name for this?
15:24:44 Will it be possible to detect that in 'db sync', even if the data loss is several migration steps away from the current one?
15:25:20 cknight: I'm imagining it would sync as far as it could and stop at the lossy step if there was no --force
15:25:22 this check will be performed *before* actual upgrades
15:25:35 perhaps that's not the best behavior
15:25:47 so what do I do if I want to upgrade without data loss?
15:25:59 bswartz: +1 It would be better to check all the steps and warn before doing any of them.
15:26:12 bswartz: +1
15:26:17 u_glide1: do you propose not doing any changes without --force if there is a lossy step that needs to be performed?
15:26:33 bswartz: yes, I do
15:26:35 ameade: the point of this is that it's not possible (or desirable)
15:26:51 bswartz: so the whole point of this is just an extra warning?
15:26:58 ameade: admins who worry about that kind of thing take DB backups before upgrading anyway
15:27:13 can't do that in a live upgrade though
15:27:41 ameade: many projects don't allow downgrades at all, and rely on a DB backup for their "downgrade"
15:28:03 that's basically the position I hear us taking right now
15:28:08 bswartz: or at least they *should* take a backup
15:29:22 no, I think we're proposing that we will allow you to downgrade to reverse the lossy step, but if you upgrade and downgrade your database might be in a different state than it was
15:29:59 downgrades are not intended to be used in production -- they're for testing mostly
15:30:11 ameade: with these extra warnings the admin will know about possible issues *before* the migration, not in the middle
15:30:16 but they're a valuable testing tool, which is why we don't remove them
15:30:36 bswartz: what's an example of its value in testing?
15:31:29 ameade: to be able to put an already-populated DB into a downgraded state so you can test your upgrades on actual data
15:32:22 the alternative is forcing developers to keep backups of their DB at every possible upgrade step
15:33:20 well, another alternative is to start in the downgraded state, get real data, then upgrade
15:34:00 ameade: yes, but if the data you need to "get" is something that won't be added until an intermediate upgrade step, then the "get data" step might have to be run multiple times
15:34:30 ameade: the question of what production deployments do with live upgrade is more interesting
15:34:47 the best practice is to take a backup in case the upgrade goes pear-shaped
15:35:06 however recovering a backup is an obviously lossy step
15:35:37 since we don't do live upgrades (yet), that's not a problem we face though
15:36:20 if you do an offline upgrade, then recovering from a backup is lossless
15:36:28 mhm, I think if we want live upgrades we need to worry about these things now
15:36:30 I think we should move on though
15:36:38 so we don't have crazy iterative design
15:37:01 ameade: live upgrades won't apply to older upgrade steps -- only to steps we add after we have versioned objects
15:37:36 so we solve it one way for now and another way later?
15:37:49 so even once we have versioned objects to support live upgrades, users will have to do 1 last offline upgrade
15:38:44 I don't think this issue will change
15:39:20 even if we had versioned objects and live upgrade support, we could still make decisions to alter the schema to drop data we deem useless or obsolete
15:40:18 I think this is such a minor thing; if we want deployers to be careful we could just always warn that db migrations may muck up your db
15:40:32 ameade: I want to hear more about what you think the problem here is, but I also want to get to the rest of our topics
15:40:47 I just don't think it's a big problem
15:41:11 but yeah, let's carry on
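A minimal sketch of the behavior agreed above: check every pending migration for a data-loss marker before applying any of them, and refuse to run without --force. The data_loss module attribute and the helper names below are assumptions made for illustration; the actual proposal lives in the review linked earlier (https://review.openstack.org/#/c/245976/).

```python
# Sketch only: assumes each destructive alembic migration opts in by
# declaring ``data_loss = True`` at module level.
from alembic import script as alembic_script


def find_lossy_revisions(alembic_config, current_rev, target_rev='head'):
    """Return pending revision IDs whose modules declare data_loss = True."""
    directory = alembic_script.ScriptDirectory.from_config(alembic_config)
    return [
        rev.revision
        for rev in directory.iterate_revisions(target_rev, current_rev)
        # Each alembic Script exposes its loaded module, so the marker is
        # just a module-level attribute on the migration file.
        if getattr(rev.module, 'data_loss', False)
    ]


def db_sync(alembic_config, current_rev, force=False):
    # Check *all* pending steps up front, as agreed, rather than syncing
    # partway and stopping at the first lossy step.
    lossy = find_lossy_revisions(alembic_config, current_rev)
    if lossy and not force:
        raise SystemExit(
            'Refusing to sync: migration(s) %s will discard data. '
            'Back up the database and re-run with --force to proceed.'
            % ', '.join(lossy))
    # ...hand off to the usual alembic upgrade path here...
```

With a check along these lines in db sync, any scripts that do unattended syncs would have to pass --force explicitly, which is the trade-off noted above.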
15:41:13 #topic Reorganization of Manila scheduler
15:41:25 In case you didn't know, the oslo incubator (from which Manila & Cinder get some of their scheduler code) is going away.
15:41:34 So we have to own the code we got from there (which is already in the Manila tree) and also copy the unit tests over from oslo.
15:41:41 we merged a patch from cknight which moved around a lot of scheduler code
15:41:47 I already knew the organization of the scheduler code & unit tests left a lot to be desired, and it made grafting in the oslo tests more complicated.
15:41:57 So I cleaned it all up. That work merged yesterday, and it shouldn't break anyone (except other scheduler patches in flight).
15:42:08 Cinder has the same issues in their scheduler and would benefit from the same cleanup. I don't know if they would agree.
15:42:18 Without oslo incubator, the scheduler code we shared with Cinder is guaranteed to diverge over time.
15:42:27 So the question for us is whether to pursue a scheduler code sharing arrangement with Cinder in the form of a library. I'm less optimistic they would be receptive to that, but we might as well consider the question as the Manila community.
15:42:37 Thoughts?
15:42:47 https://review.openstack.org/#/c/252060/
15:43:04 cknight: do you know why oslo does not want to maintain this anymore?
15:43:07 cknight: ask the Cinder community first
15:43:25 my personal feeling is that we should not propose anything to cinder that we're not happy with ourselves
15:43:26 cknight: do they want to share it with Manila via a lib or not?
15:43:26 xyang2: I don't know the whole story, but there were numerous small bits of code with no owners.
15:43:41 xyang2: The scheduler bits were in that category.
15:43:45 cknight: I would think nova needs it too
15:43:46 merging the above patch was step 1 towards making us happy with the scheduler
15:44:10 vponomaryov: agreed, I wouldn't do anything unless there was interest from all sides
15:44:24 sharing is good, but the problem is where it should reside
15:44:30 have you considered taking the question to the cross-project team?
15:44:43 I'm not sure if there's enough shared code to warrant a whole library -- that's one reason it never got out of oslo-incubator
15:45:07 xyang2: the oslo team's position is that those things belong in a shared library, and now a library must be the first step, which seems more of a barrier to entry.
15:45:20 but if there is interest in a (very small) scheduling library, then we could absolutely pursue it
15:45:24 os-brick is a subproject under cinder that is shared between cinder and nova
15:45:38 so even if it is a library, it should still be under some project
15:45:49 otherwise, who owns it?
15:45:56 xyang2: os-brick is quite a bit larger than the scheduler code we pulled from oslo-incubator, I think
15:46:05 xyang2: exactly, one of the projects can initiate the library and add the others needed as cores
15:46:06 xyang2: so cinder is a dep of nova?
15:46:22 xyang2: the question of ownership is likely to make it more difficult to get cinder and manila to agree
15:46:24 csaba: no
15:46:30 xyang2: would you like to ask the Cinder team about their interest in sharing a scheduler library?
15:46:38 csaba: nova just uses os-brick as a lib
15:46:45 xyang2: +1, someone has to own it
15:46:47 cknight: sure
15:47:05 xyang2: ok, thanks. I'm content to leave it there for now, then.
15:47:09 dims_: you mean make it a standalone lib outside of cinder and manila?
15:47:17 xyang2, cknight: the process to set up a library is not onerous
15:47:27 sure
15:47:57 http://docs.openstack.org/infra/manual/creators.html
15:48:03 dims_: thanks
15:48:21 OK, let's let xyang2 determine interest on the Cinder side and go from there.
15:48:25 dims_: it may not be technically onerous, but politically it's more effort to get everyone on board
15:48:35 The benefit for them would be better code organization and modularity.
15:48:43 bswartz: :) true
15:48:55 I think nova may need to be involved as well
15:49:20 xyang2: I don't think the nova scheduler shared much (or anything) with the cinder scheduler
15:49:36 I thought they diverged considerably after the fork
15:49:45 bswartz: we are reading the nova scheduler docs for cinder :)
15:49:53 oh well, perhaps I'm wrong
15:50:11 bswartz: should be simple to check
15:50:19 yeah
15:50:29 cknight: thanks for raising awareness about this
15:50:40 we have one more topic
15:50:50 #topic CI reliability
15:50:54 xyang2: that's why I was thinking cross-project -- if three projects are involved, yes; if only two, then not.
15:50:59 So last night I was reviewing several of the Gluster patches, when I realized that none had passed their own CI tests.
15:51:07 Ben & I have discussed how to encourage vendors to keep their CI working all the time, instead of fixing it near a release to keep from being thrown out.
15:51:09 tbarron: sure
15:51:19 I think the answer is to politely refuse to merge a driver patch if the CI for that driver has not succeeded for the patch.
15:51:20 csaba, rraja: what's with the glusterfs CI?
15:51:30 CI == *continuous* integration, and it only has value if we use it to gate patches, no different from how Jenkins gates core patches.
15:51:39 I've also noticed that it fails all the time recently
15:51:44 I'm not picking on the Gluster guys, but I think we do everyone - including driver teams - a disservice by merging patches without passing CI reports.
15:52:08 we are doing that in cinder; makes sense to check CI in manila as well
15:52:10 So I'm proposing we not merge driver patches unless CI is good from that driver's CI run.
15:52:12 cknight, bswartz: your concern is fully agreed... we planned to clean it up and fix the CI, it just got delayed due to some urgent stuff
15:52:23 we'll fix it for next week
15:52:49 Any dissent from reviewers?
15:53:08 It's trivial to check the CI result before clicking +1 or +2.
15:53:23 What do we define as having a good CI? A week of good results?
15:53:25 so in addition to glusterfs, I see a lot of red in the CI history from Huawei, Microsoft, and Oracle
15:53:45 dustins: It's success for the specific patch to be merged.
15:54:01 dustins: Otherwise we have no idea if the driver change is good.
15:54:09 cknight: +1, makes sense to check.
15:54:11 dustins: I think the proposal is that your patch needs to pass your own CI
15:54:33 bswartz: +1 Exactly. Otherwise vendor CI doesn't offer much value.
15:54:34 cknight: Makes sense to me, and it's a simple guideline to remember
15:54:44 we wouldn't merge any glusterfs patch that didn't pass glusterfs CI, for example
15:54:46 bswartz, cknight: what about latency in reporting?
15:54:59 bswartz, cknight: some CIs report in a week
15:55:21 vponomaryov: That would be a problem. Waiting for Jenkins +1 is OK, but a week is not.
15:55:28 vponomaryov: vendors typically have ways to manipulate their CI queues, or force their system to check a specific patch
15:55:56 vponomaryov: But the patch author would then be more likely to be stuck in a rebase loop, so there is incentive to run CI faster.
15:56:14 if Oracle's CI system is slow (to pick on someone else) and Oracle wanted to upgrade their driver, then they should make their CI system test their own patch and post the results as fast as it's able to
15:56:32 bswartz, cknight: do we want to set a threshold for latency too?
15:56:53 there are other questions about CI quality and latency we should discuss
15:57:11 I think cknight is trying to raise the bar very slightly here and it sounds reasonable to me
15:57:18 bswartz: +1. Let's walk before we run.
15:57:32 bswartz: +1
15:57:35 +1
15:57:52 +1
15:58:16 okay
15:58:16 Sounds like agreement. Thanks, everyone!
15:58:47 #agreed core reviewers should not +2 driver patches that don't receive a +1 from the CI system that tests that driver
15:59:00 #topic open discussion
15:59:05 1 minute left....
15:59:55 okay, it sounds like we wrapped up everything
16:00:03 oh wait...
16:00:06 Mitaka-1 is today!
16:00:24 I'm working on cutting the release -- working through some new release processes that are making it take longer than it used to
16:00:37 so look for the milestone to be cut this afternoon
16:00:46 okay, thanks all
16:00:59 #endmeeting