16:00:04 <thingee> #startmeeting cinder
16:00:04 <openstack> Meeting started Wed Jul  1 16:00:04 2015 UTC and is due to finish in 60 minutes.  The chair is thingee. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:07 <openstack> The meeting name has been set to 'cinder'
16:00:10 <dulek> hi!
16:00:17 <e0ne> hi!
16:00:19 <cebruns> hello!
16:00:20 <thangp> o/
16:00:21 <tbarron> hi
16:00:21 <eharney> hi
16:00:23 <jseiler> hi
16:00:27 <mriedem> hi
16:00:29 <xyang2> hi
16:00:30 <sgotliv> o/
16:00:31 <liuxg> hi
16:00:31 <patrickeast> hi
16:00:34 <SrikanthPoolla> hello
16:00:35 <dannywilson> hi
16:00:36 <deepakcs> o/
16:00:47 <thingee> hi everyone!
16:00:47 <flip214> .
16:00:48 <Swanson> hello
16:01:02 <zongliang> hi
16:01:11 <thingee> #topic announcements
16:01:40 <thingee> #info Cinder is not accepting any more drivers for liberty
16:01:44 <thingee> #link http://lists.openstack.org/pipermail/openstack-dev/2015-May/064072.html
16:01:50 <jgriffith> o/
16:01:51 <thingee> Please stop harassing me in PMs about it
16:01:53 <erlon> hi
16:01:55 <jgriffith> LOL
16:02:18 <hemna> :)
16:02:30 <thingee> I will be approving RPC and version object work by thangp this week
16:02:43 <thingee> #info spec for rpc compat is planned to be approved this week
16:02:45 <thangp> \o/
16:02:45 <thingee> #link https://review.openstack.org/#/c/192037/5
16:02:53 <thingee> speak now or hold your peace
16:03:01 <e0ne> :)
16:03:21 <thingee> #info Return request id to caller cross project will be approved this week
16:03:23 <thingee> #link https://review.openstack.org/#/c/156508/
16:03:38 <thingee> speak now or hold your peace. Cinder client will be receiving this treatment in Liberty
16:03:51 <winston-d> nice
16:03:57 <hemna> :)
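[Editor's note: for context on the rpc-compat spec above — the goal is rolling upgrades, where an RPC client caps the messages it sends at a version that older services still understand. A minimal sketch using oslo.messaging; the version numbers and the pin option name are illustrative, not Cinder's actual implementation.]

    from oslo_config import cfg
    import oslo_messaging as messaging

    CONF = cfg.CONF

    class VolumeAPI(object):
        """Client-side RPC API, version-capped for rolling upgrades (sketch)."""

        RPC_API_VERSION = '1.30'  # newest version this release can send

        def __init__(self):
            target = messaging.Target(topic='cinder-volume',
                                      version=self.RPC_API_VERSION)
            # operator-set pin, kept until every service is upgraded
            # ('upgrade_levels_volume' is a hypothetical option name)
            self.client = messaging.RPCClient(
                messaging.get_transport(CONF), target,
                version_cap=CONF.upgrade_levels_volume)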
16:04:01 <thingee> #topic Encrypted volumes
16:04:03 <thingee> mriedem: hi!
16:04:11 <mriedem> hello
16:04:15 <thingee> #info mailing list post
16:04:17 <thingee> #link http://lists.openstack.org/pipermail/openstack-dev/2015-June/068117.html
16:04:37 <mriedem> so i just read over the latest on the cinder change https://review.openstack.org/#/c/193673/ to set the encrypted flag globally in connection_info
16:04:47 <mriedem> looks like everyone is in general agreement this is goodness so our heads aren't in the sand
16:04:52 <jungleboyj> o/
16:05:02 <mriedem> the tempest change to add the config option to disable the encrypted volume tests is merged
16:05:07 <hemna> mriedem, does that also mean it will get saved in nova's bdm ?
16:05:10 <mriedem> i pinged some -qa guys for the d-g and devstack changes
16:05:15 <mriedem> hemna: yeah
16:05:19 <hemna> ok cool
16:05:24 <mriedem> tbarron: was good enough to open some nova bugs
16:05:38 <thingee> Is everyone fine with my decision here? http://lists.openstack.org/pipermail/openstack-dev/2015-June/068370.html
16:05:44 <mriedem> so those are on the radar, one looks simple (rootwrap filter update?) and the other i'm not sure about, some in-use race or something
16:06:04 <thingee> we will just enable this and see which CI's surface with this problem and deal with them on a case-by-case basis
16:06:08 <tbarron> mriedem: there may of course be other bugs exposed once e.g. the rootwrap is fixed
16:06:14 <tbarron> thingee: +1
16:06:23 <DuncanT> +1
16:06:26 <mriedem> tbarron: sure - this is not a very well exposed area,
16:06:30 <mriedem> so i expect more bugs
16:06:31 <xyang2> +1
16:06:32 <jgriffith> +2
16:06:36 <patrickeast> +1
16:06:46 <mriedem> tbarron: were you working on any of the nova fixes, or just reporting?
16:06:54 <tbarron> I'd like other 3rd party CIs to check against these two bugs and not just assume their failures are the same
16:06:59 <thingee> #agreed enable tempest, let ci's surface and work with maintainers to resolve issues in their backend/drivers
16:07:02 <thingee> mriedem: thanks :)
16:07:12 <mriedem> np
16:07:12 <thingee> mriedem: and for all your work on tracking this issue down
16:07:16 <xyang2> I've added comments there as our failures are similar to tbarron's
16:07:21 <mriedem> \o/
16:07:24 <tbarron> mriedem: I'd do the rootwrap one right now but I can't submit upstream this week (company is down this week, and I have to get approval)
16:07:34 <mriedem> tbarron: ok, i'll check it out today
16:07:35 <xyang2> FC error is the same as NFS because it uses rootwrap for iscsi
16:07:35 <tbarron> mriedem: the 'in use' bug is tricky I think
16:07:48 <xyang2> iSCSI error is exactly the same as tbarron's
16:07:53 <deepakcs> tbarron: which two bugs, any links ?
16:08:07 <mriedem> https://bugs.launchpad.net/nova/+bug/1470142
16:08:07 <openstack> Launchpad bug 1470142 in OpenStack Compute (nova) "Nova volume encryptors attach volume fails for NFS and FC" [Medium,Triaged]
16:08:09 <mriedem> https://bugs.launchpad.net/nova/+bug/1470562
16:08:10 <openstack> Launchpad bug 1470562 in OpenStack Compute (nova) "'in use' error when Nova volume encryptors format cinder volumes" [Medium,Confirmed]
16:08:20 <xyang2> tbarron: I modified the bug to add FC there
16:08:45 <tbarron> xyang2: yes, I saw, thanks!  would you also add to/modify the in-use bug?
16:09:00 <thingee> mriedem: anything else to add?
16:09:03 <xyang2> tbarron: sure, I didn't see it yesterday
16:09:11 <mriedem> thingee: nope
16:09:12 <tbarron> xyang2: that's because I just now filed it
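[Editor's note: the tempest change mriedem mentions added a feature flag so deployments whose backends cannot attach encrypted volumes can skip those tests. Roughly, the scenario test guards itself as sketched below; the attach_encrypted_volume option under [compute-feature-enabled] matches the discussion, but verify against the merged tempest patch. CIs hitting the two bugs above would set it to False in tempest.conf until fixed.]

    from tempest import config

    CONF = config.CONF

    class TestEncryptedCinderVolumes(manager.ScenarioTest):  # base class assumed

        def setUp(self):
            super(TestEncryptedCinderVolumes, self).setUp()
            if not CONF.compute_feature_enabled.attach_encrypted_volume:
                raise self.skipException('Encrypted volume attach is disabled.')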
16:09:17 <thingee> #topic CI job changes
16:09:19 <thingee> e0ne: hi
16:09:23 <e0ne> hi
16:09:27 <hemna> mriedem, I wonder if my FC nova patch helps out at all ? https://review.openstack.org/#/c/195350/
16:09:31 <thingee> #idea make gate-rally-dsvm-cinder job voting
16:09:41 <e0ne> it's stable enough
16:09:42 <hemna> mriedem, it's been sitting there for a bit w/o attention.
16:09:49 <hemna> mriedem, it also helps live migration for FC
16:10:01 <e0ne> and it's the only job which tests cinder+python-cinderclient
16:10:17 <e0ne> also, it covers cases which are not tested by tempest
16:10:26 <e0ne> e.g. the latest bug with broken backups
16:10:29 <jgriffith> thingee: e0ne I'm not completely sure about stability on that
16:10:30 <thingee> e0ne: are there any other projects that have moved rally to voting?
16:10:41 <asselin> o/
16:10:48 <e0ne> thingee: i don't know;(
16:10:49 <jgriffith> thingee: e0ne but I'm +1 if it is
16:10:57 <mriedem> nova doesn't have a rally job
16:11:00 <mriedem> fwiw
16:11:00 <thingee> e0ne: do you have any stats on the stability?
16:11:18 <thingee> or how you came to the conclusion of it being stable
16:11:21 <e0ne> thingee: we could get it using logstash
16:11:34 <mriedem> e0ne: i think you should check out graphite
16:11:39 <mriedem> http://graphite.openstack.org/
16:11:44 <mriedem> logstash only has 10 days of results
16:11:52 <mriedem> graphite has release cycles worth
16:12:01 <mriedem> that's how we knew the ceph job stabilized
16:12:03 <mriedem> before making it voting
16:12:04 <thingee> For previous decisions on making things voting, like Ceph, we used stats with no problem. I don't see any harm in looking that over before we make a decision.
16:13:03 <thingee> not sure how long this will take, and if we should just circle back
16:13:08 <e0ne> i'm sorry, i didn't get any stats right now:(
16:13:30 <thingee> ok, I'm fine with it. I'm not sure who is opposed, but if it's stable, sure why not?
16:13:51 <DuncanT> Agreed - if it is stable then it sounds like a good idea
16:14:03 <e0ne> tbh, i didn't see any false-negative reports in the last 2 weeks
16:14:13 <mriedem> yfried was saying rally was broken in -infra yesterday
16:14:21 <mriedem> due to some jsonschema/functools32 stuff
16:14:36 <e0ne> i could get some stats and post them to the commit making the job voting
16:14:46 <thingee> e0ne: that sounds like a good idea to move forward
16:14:55 <e0ne> mriedem: it was broken because some projects were broken
16:15:00 <mriedem> e0ne: what's the name of the job?
16:15:01 <thingee> heh
16:15:11 <thingee> "not my project, this other project was broken."
16:15:12 <avishay> hi all, sorry i'm late
16:15:20 <thingee> avishay: you're just in time
16:15:35 <e0ne> thingee: tbh, rally was broken, but not in cinder-related jobs:)
16:15:43 <dannywilson> e0ne: is there a plan/maintainer if the job starts posting false negatives?  point of contact?
16:16:00 * DuncanT can't find jenkins votes in graphite :-(
16:16:03 <e0ne> dannywilson: it could be me
16:16:09 <e0ne> DuncanT: +1
16:16:23 <winston-d> I can see gate-rally-dsvm-designate/ironic/manila/mistral/murano/zaqar and some more in graphite
16:16:25 <mriedem> DuncanT: i can find you stats in graphite
16:16:33 <e0ne> dannywilson: i have +2 for cinder-related patches to rally
16:16:35 <smcginnis> DuncanT: graphite data loss yesterday due to the storage outage.
16:16:55 <thingee> smcginnis: ha
16:17:12 <winston-d> smcginnis: wondering what kind of storage backend they were using
16:17:13 <DuncanT> smcginnis: mriedem: I think I might just be looking in the wrong place... a search in graphite for 'cinder' or 'rally' both fail though
16:17:23 <dannywilson> e0ne: okay, can we post that somewhere like a wiki page so others can find it too?
16:17:30 <smcginnis> winston-d: Not mine! :)
16:17:45 <thingee> dannywilson: we'll discuss this in the next meeting so people can find it
16:17:47 <flip214> DuncanT: you'll need an URL like this: http://graphite.openstack.org/render/?width=600&height=344&_salt=1434709688.361&from=-1days&xFormat=%25b%25d-%25T&title=DRBD%20Cinder%2FDevstack%20stats&colorList=red%2Cgreen%2Cblue&target=stats_counts.zuul.pipeline.check.job.check-tempest-dsvm-full-drbd-devstack-nv.FAILURE&target=stats_counts.zuul.pipeline.check.job.check-tempest-dsvm-full-drbd-devstack-nv.SUCCESS
16:17:54 <thingee> nice URL
16:17:57 <dannywilson> sounds good
16:18:02 <e0ne> dannywilson: https://github.com/openstack/rally/blob/master/doc/source/project_info.rst
16:18:20 <thingee> #agreed get stats, discuss in next week's meeting
16:18:33 <dannywilson> e0ne: thanks
16:18:43 <e0ne> thingee: thanks!
16:18:46 <thingee> #action e0ne to get stats and include in review comments for enabling voting job
16:18:48 <thingee> e0ne: thanks
16:18:49 <flip214> DuncanT: or, even better: go to logstash.openstack.org,
16:18:55 <flip214> and use a search filter like that:  project:"openstack/cinder"  AND build_name:"check-tempest-dsvm-full-drbd-devstack-nv" AND "Detailed logs" AND build_status:"FAILURE"
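[Editor's note: a rough sketch of how the pass-rate stats e0ne needs can be pulled from graphite's JSON API, following the metric path visible in flip214's URL above. The job name, pipeline, and time window are assumptions; Python 2 idioms match the era.]

    import json
    import urllib2

    JOB = 'gate-rally-dsvm-cinder'
    URL = ('http://graphite.openstack.org/render/?format=json&from=-90days'
           '&target=summarize(stats_counts.zuul.pipeline.check.job.%s.%s,'
           '"90days")')

    def count(result):
        # graphite returns [{"target": ..., "datapoints": [[value, ts], ...]}]
        series = json.load(urllib2.urlopen(URL % (JOB, result)))
        return sum(v for v, _ in series[0]['datapoints'] if v)

    success, failure = count('SUCCESS'), count('FAILURE')
    print('%s pass rate: %.1f%%' % (JOB, 100.0 * success / (success + failure)))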
16:19:03 <thingee> #topic Remove LIO iSCSI helper from Cinder
16:19:05 <thingee> avishay: hi
16:19:07 <avishay> thingee: hey
16:19:13 <thingee> #idea remove LIO iscsi helper
16:19:19 <avishay> so maybe the topic is a little too ... enthusiastic
16:19:24 <eharney> could we perhaps start with the _problem_ rather than "remove it"
16:19:25 <e0ne> thingee: what about cinderclient  functional tests job?
16:19:30 <thingee> #info broken in Juno
16:19:33 <avishay> but LIO in juno is currently broken and has no CI
16:19:54 <avishay> other iSCSI targets (except tgt AFAIK) have no CI either
16:19:58 <eharney> i am now taking up the effort for doing CI for LVM+LIO
16:19:59 <thingee> avishay: FYI, we really had no ci's in juno
16:20:03 <avishay> the nfs driver has no CI AFAIK
16:20:14 <jgriffith> eharney: to be fair I raised this last cycle that it's "unknown" how to install and run on Ubuntu and has no CI
16:20:22 <jgriffith> eharney: and isn't well tested.
16:20:32 <jgriffith> eharney: IMO we should either get it fixed up and CI'd or remove it
16:20:39 <eharney> jgriffith: i agree it's a problem.  so let's fix it
16:20:45 <tbarron> eharney: +1
16:20:45 <eharney> i will do CI
16:20:47 <thingee> avishay:  also LIO-iser is exactly what mellanox ci does
16:20:49 <geguileo> eharney: +1
16:20:54 <thingee> however, it's down right now
16:20:56 <thingee> according to stats
16:21:02 <jgriffith> eharney: so I get quite a few calls from EVERY customer that runs RHEL, and it breaks every other iSCSI dev
16:21:11 <jgriffith> eharney: I'm with ya on that
16:21:15 <jgriffith> fixing it
16:21:18 <jgriffith> I'm fine with that
16:21:21 <hemna> can we fix the LIO helper instead of removing it ?
16:21:32 <jgriffith> to avishay 's point, the only way people care is when you threaten removal
16:21:33 <geguileo> hemna: +1
16:21:33 <avishay> anyway, i love CI as a deployer, but setting up a test env with LVM or NFS has been rough
16:21:35 <jgriffith> :)
16:21:37 <jungleboyj> eharney: Thanks for being willing to do that.
16:21:44 <tbarron> hemna: +1
16:21:45 <jgriffith> hemna: keep up hemna !! :)
16:21:51 <thingee> #info Mellanox CI does provide CI for LIO-iser
16:21:51 <jungleboyj> hemna: ++
16:21:54 <thingee> #link https://wiki.openstack.org/wiki/ThirdPartySystems/Mellanox_Cinder_CI
16:22:01 <jgriffith> tbarron: you've +1 that comment 3 times.. shouldn't it be +3 :)
16:22:05 <geguileo> eharney: Thanks for taking the work out of my hands  :)
16:22:10 <tbarron> jgriffith: :-)
16:22:12 <jgriffith> thingee: that doesn't work FWIW
16:22:23 <avishay> i didn't mean to remove it today, but i think drivers (and iscsi targets) that don't have CI should be removed for liberty
16:22:28 <thingee> jgriffith: well it's also marked down :)
16:22:33 <avishay> including nfs, including the iscsi targets
16:22:34 <jgriffith> :)
16:22:53 <jgriffith> thingee: and IMO not really applicable if you're not running mellanox anyway :)
16:23:09 <hemna> avishay, as a side note, we've been slowly getting folks to do CI on os-brick patches.  it's kind of a similar setup
16:23:10 <jungleboyj> avishay: I am concerned by the number of people that this would potentially affect.
16:23:26 <thingee> jgriffith: maybe... I still think there is value in that code path being tested?
16:23:35 <jungleboyj> avishay: I think we need to work on addressing the problems.
16:23:35 <hemna> but I think we need to have CI for each of the target objects.
16:23:45 <jgriffith> thingee: for sure
16:23:58 <avishay> jungleboyj: you're not concerned about the number of people running broken drivers?  i'd rather have it not there than waste days trying to get it working.
16:24:18 <jgriffith> thingee: I'm just saying we shouldn't rely on a vendor's impl of the target for the CI etc
16:24:20 <thingee> hemna: I think we made a decision earlier this release about target driver cis
16:24:21 <avishay> hemna: CI for brick is great as well
16:24:57 <hemna> thingee, ok sorry I don't remember what that was.  sorry for the churn
16:25:01 <thingee> jgriffith: so since things like LIO are open source, should we start working with infra to get a ci job in place?
16:25:02 <DuncanT> avishay: CI takes time to set up... eharney is on the case for LIO
16:25:02 <jgriffith> the duplication factor is something I still question but we can discuss offline
16:25:11 <jungleboyj> avishay: If you are a Red Hat user I believe you have to use LIO. That would alienate a whole customer base.
16:25:12 <avishay> anyway, that's all i have, hope to see CIs soon
16:25:12 <jgriffith> thingee: that's what I was thinking
16:25:14 <jgriffith> yes
16:25:23 <thingee> hemna: that was more of a question, I'm unsure :P ... can't keep track
16:25:25 <avishay> jungleboyj: yes
16:25:27 <hemna> oh :)
16:25:40 <jungleboyj> avishay: Nice.
16:25:42 <avishay> RHEL 7 has deprecated tgt and the cinder package is preconfigured to use LIO
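[Editor's note: for context, the target helper under discussion is selected per backend in cinder.conf. A sketch of what the RHEL 7 packaging avishay describes amounts to; option names are as of the Kilo/Liberty era and should be verified against your release.]

    [DEFAULT]
    # tgtadm (tgt) is the upstream default; RHEL 7 ships LIO instead
    iscsi_helper = lioadm
    iscsi_ip_address = 192.168.1.10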
16:25:57 <jungleboyj> thingee: What was the decision?
16:26:04 <thingee> #info this is the second time LIO has been proposed to be removed I think
16:26:17 <thingee> jungleboyj: I think I said they need CI's, but there was no follow up communication
16:26:18 <avishay> there are also iet and scst, which i don't know much about
16:26:21 <eharney> i'm not too familiar on details from the first time
16:26:24 <thingee> there needs to be ci's*
16:26:45 <thingee> eharney: we had an angry bug about lio...and just setting things up in ubuntu. I'll leave it at that.
16:26:58 <eharney> uh, ok
16:27:06 <avishay> bottom line, this fell through the cracks, it sucks, let's get CIs ASAP please :)
16:27:10 <avishay> for nfs too please
16:27:27 <thingee> nfs and block device driver...
16:27:33 <avishay> yes
16:27:39 <eharney> NFS and block device won't pass currently
16:27:43 <thingee> unfortunately the block device driver as I understand won't pass tempest today
16:27:45 <jordanP> nfs doesn't support snapshots, how could it pass the CI?
16:27:55 <thingee> ^ exactly
16:27:59 <eharney> i and some others have been slowly poking at getting snapshots into the NFS driver
16:28:08 <eharney> current blocking point is a bug in Nova that i'm not clear really has an owner
16:28:12 <DuncanT> block device driver doesn't meet minimum features... it is one of the reasons we have so many ABCs
16:28:14 <avishay> block device driver can probably pass some subset, no?
16:28:18 <jordanP> eharney, which bug in nova ?
16:28:35 <thingee> oh darn we should talk about abc's if we have time jgriffith
16:28:44 <eharney> jordanP: looking
16:28:46 <DuncanT> I'm tempted to say block device driver should just be pulled TBH
16:29:09 <eharney> jordanP: https://bugs.launchpad.net/nova/+bug/1416132/
16:29:09 <openstack> Launchpad bug 1416132 in OpenStack Compute (nova) "_get_instance_disk_info fails to read files from NFS due to permissions" [High,In progress] - Assigned to Eric Harney (eharney)
16:29:15 <thingee> DuncanT: I think that harms projects like sahara though?
16:29:19 <eharney> the block device driver can't support snapshots, right?
16:29:29 <avishay> eharney: no
16:29:35 <eharney> oh, ok
16:29:41 <winston-d> eharney: no
16:29:49 <avishay> eharney: i guess it can 'dd', but that's pretty terrible
16:29:54 <DuncanT> thingee: Not enough that anybody has stepped up to CI it... Infra will host it if somebody can get it even slightly close to working
16:30:00 <smcginnis> DuncanT: I thought Mirantis was using BlockDeviceDriver for something.
16:30:03 <DuncanT> eharney: Correct
16:30:04 <winston-d> e0ne: are you guys still using it? Or Sahara guys?
16:30:05 <smcginnis> Maybe that was Sahara.
16:30:11 <thingee> smcginnis: yup!
16:30:15 <SergeyLukjanov> hey folks
16:30:17 <e0ne> winston-d: yes
16:30:22 <jordanP> eharney, "if processing a qcow2 backing file" -->>> work around for the CI would be to run nfs_use_qcow2 = False
16:30:24 <e0ne> we're going to make CI for it
16:30:28 <SergeyLukjanov> this driver is used by a bunch of sahara users
16:30:35 <e0ne> SergeyLukjanov: confirm, please, about CI
16:30:43 <eharney> jordanP: that isn't sufficient because you still get qcow2 files if you use snapshots
16:30:53 <SergeyLukjanov> it's the only way to get performant storage for big data processing
16:31:06 <SergeyLukjanov> e0ne, yeah, we're working on making CI for it
16:31:13 <e0ne> SergeyLukjanov: great
16:31:19 <jordanP> ok
16:31:21 <DuncanT> SergeyLukjanov: Do you have a benchmark .v. local LVM? I can't get more than a few percent difference
16:31:24 <SergeyLukjanov> our plan now is to add support for it into devstack
16:31:49 <winston-d> we have customers who want to evaluate hadoop with cinder block device driver
16:31:52 <e0ne> also, i'm the contact person for this driver if you need some maintenance, bugfixing or new feature requests
16:32:00 <thingee> SergeyLukjanov: by "we're" do you mean mirantis? And if so, is the contact information here https://wiki.openstack.org/wiki/ThirdPartySystems ?
16:33:14 <SergeyLukjanov> DuncanT, it's a recommendation from the Hadoop community to use JBOD and not LVM
16:33:30 <thingee> eharney: what is the progress with snapshots with nfs? can we expect this for liberty?
16:33:47 <thingee> and if so, can we begin having a non-voting job hosted by infra?
16:33:49 <SergeyLukjanov> thingee, yup, we're now designing how it could be tested
16:34:06 <eharney> thingee: my understanding is that the majority of the Cinder work can merge once we get the above Nova bug fixed
16:34:16 <eharney> thingee: but how that Nova issue gets fixed is not clear at the moment
16:34:21 <DuncanT> SergeyLukjanov: Yeah, but are there any numbers to back it up? I tried to benchmark it and found basically no difference .v. LVM thick, and it is missing major features that complicate cinder somewhat
16:34:27 <thingee> SergeyLukjanov: great and I'm assuming for my second question the contact is e0ne or someone else?
16:34:34 <e0ne> :)
16:34:53 <e0ne> thingee: i posted an answer a few minutes earlier
16:34:55 <thingee> eharney: that's right, I think the nova bug is linked with the blueprint
16:35:01 <eharney> thingee: yes
16:35:06 <thingee> great
16:35:08 <thingee> avishay: there
16:35:12 <avishay> thingee: yup
16:35:15 <SergeyLukjanov> thingee, yes, e0ne will be contact for it
16:35:21 <thingee> SergeyLukjanov: thanks
16:35:58 <jordanP> eharney, we also need https://review.openstack.org/#/c/192736/ to get in I think
16:36:06 <avishay> i know CI takes some time, but these drivers/targets fell through the cracks, and i think we should set some deadline for them too
16:36:10 <SergeyLukjanov> DuncanT, hm, I have no numbers in mind, just everyone who's using Hadoop is asking for directly mapped disks, not lvm
16:36:14 <eharney> jordanP: yes, and volume format tracking
16:36:18 <thingee> #info nfs ci is blocked by not supporting snapshots, which is blocked by a nova bug
16:36:33 <avishay> i obviously don't want LIO removed since that's what we use for our internal tests, but i'd like to know that it works :)
16:36:35 <eharney> thingee: and https://review.openstack.org/#/c/192736/
16:36:37 <thingee> #info block device is in progress by mirantis. e0ne is point of contact
16:36:47 <e0ne> SergeyLukjanov, DuncanT: i believe we can produce some performance results for it
16:37:00 <DuncanT> e0ne: thanks
16:37:05 <winston-d> e0ne: that will be great
16:37:16 <thingee> #info nfs ci is also blocked by https://review.openstack.org/#/c/192736/
16:37:19 <e0ne> note: i didn't promise anything :)
16:37:36 <thingee> eharney: I will make a point to talk to johnthetubaguy about it.
16:37:38 <e0ne> but i'll try to do it...
16:37:51 <thingee> he reached out to me recently on syncing on some issues between nova and cinder
16:37:55 <winston-d> e0ne: we believe what you believe
16:38:03 <thingee> avishay: anything else?
16:38:10 <e0ne> winston-d: :)
16:38:12 <avishay> thingee: nope, thanks for the stage :)
16:38:20 <thingee> #topic volume migration
16:38:28 <thingee> jungleboyj: hi
16:38:49 <thingee> #info current spec in review
16:38:50 <thingee> #link https://review.openstack.org/#/c/186327/
16:39:19 <thingee> so I'm just about fine with this. I did think it was weird the only way to get progress on a migration is through ceilometer
16:39:25 <jungleboyj> thingee: Oh, I didn't know I was on the hook for this.
16:39:41 <thingee> the only way around this is to store the progress in the db and have an api call to get it.
16:39:44 <jungleboyj> I was just looking at that review.
16:39:48 <thingee> jungleboyj: sorry vincent isn't around
16:39:58 <jungleboyj> thingee: No problem.
16:40:19 <jungleboyj> Yeah, I agree we should be able to get the status from within Cinder.
16:40:25 <thingee> does anyone have thoughts or opinions on the migration progress being stored and accessible via the api
16:40:27 <erlon> thingee: having an api for that is not an option?
16:40:30 <jungleboyj> Are you ok with tracking the progress in the DB?
16:40:49 <thingee> erlon: that was my suggestion earlier. vincent just added that in the updated patch of the spec
16:40:56 <avishay> i think the overall spec is very good.  i reviewed it a while ago and it may have changed slightly since, but it's definitely in the right direction.
16:41:06 <erlon> thingee: ok
16:41:38 <thingee> jungleboyj: I'm fine, i wanted other people to raise concerns
16:41:50 <erlon> i'm setting up an environment with all the backends I can to test vincent's patches; my idea is to start with LVM, HNAS NFS, HNAS iSCSI, HUS VM, HUS110
16:42:10 <avishay> i'm not crazy about a progress bar in the DB because it's not persistent.
16:42:14 <thingee> here's the diff between the changes when I raised some comments in the spec that introduces additional documentation for driver maintainers on how to develop this and getting migration status from the db https://review.openstack.org/#/c/186327/25..25/specs/liberty/volume-migration-improvement.rst,cm
16:42:34 <erlon> then have a matrix of what works and the problems of backend integration
16:42:41 <DuncanT> The two risks with progress in the db are stale data and too many updates
16:42:48 <avishay> DuncanT: +1
16:43:02 <erlon> any suggestion of tests? or other scenarios??
16:43:11 <jgriffith> We currently don't do "progress" updates for anything else, do we need to for this?
16:43:25 <avishay> maybe cinder-volume can periodically notify cinder-scheduler via RPC
16:43:32 <jgriffith> If so, it could be "later" work, and def shouldn't be Ceilo dep IMHO
16:43:33 <avishay> or even better
16:43:38 <thingee> DuncanT: yeah think of cases where we have a mass migration happening because we need to switch from one pool to another. That would be a lot of updates happening. This won't be an everyday thing, but it's still something to consider.
16:43:48 <avishay> if the API is called, just ask cinder-volume what the progress is right now
16:43:56 <jgriffith> avishay: +1
16:43:56 <DuncanT> jgriffith: I'd love to add progress to backups - I'd probably do it via an RPC to the backup service though, so the info is fresh
16:43:57 <thingee> jgriffith: +1
16:44:11 <DuncanT> avishay: +1
16:44:15 <e0ne> DuncanT: +1
16:44:15 <thingee> avishay: +1
16:44:22 <thingee> avishay: can you raise a comment with that suggestion?
16:44:29 <avishay> thingee: sure
16:44:36 <thingee> excellent
16:44:59 <thingee> So assuming vincent has that updated, would people be fine with me approving this spec this week?
16:45:18 <smcginnis> thingee: +1
16:45:21 <thingee> assuming no one raises blocking concerns
16:45:34 <erlon> thingee: +1
16:45:38 <thingee> and by this week I mean friday. gives you time to read things now
16:45:42 <jungleboyj> Having more progress info in Cinder would be great.
16:46:01 <jungleboyj> thingee: +1
16:46:14 <xyang2> thingee: +1
16:46:15 <thingee> #idea have progress for volume migration come from api -> c-vol
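[Editor's note: a hypothetical sketch of the approach just recorded — rather than persisting progress, the API service asks the owning cinder-volume service over RPC when a caller requests it. All method and attribute names here are invented for illustration; only the oslo.messaging prepare()/call() pattern is real.]

    # cinder/volume/rpcapi.py (sketch)
    class VolumeAPI(object):
        def get_migration_progress(self, ctxt, volume):
            cctxt = self.client.prepare(server=volume['host'])
            return cctxt.call(ctxt, 'get_migration_progress',
                              volume_id=volume['id'])

    # cinder/volume/manager.py (sketch)
    class VolumeManager(manager.SchedulerDependentManager):
        def get_migration_progress(self, context, volume_id):
            # the copy loop updates this dict in memory; nothing touches the
            # DB, avoiding the stale-data and update-storm risks noted above
            return {'percent_complete':
                    self._migration_progress.get(volume_id, 0)}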
16:46:23 <winston-d> thingee: aren't you guys on holiday this friday?
16:46:30 <thingee> winston-d: I never rest
16:46:31 <DuncanT> jungleboyj: What are the (potentially) slow operations? Backup, image stuff, migration. Maybe snap if you're rackspace?
16:46:40 <smcginnis> thingee: :)
16:46:49 <winston-d> thingee:  you just play
16:46:54 <winston-d> :)
16:47:02 <thingee> #agreed spec will be approved this friday assuming no blocking concerns and vincent updates spec with idea for progress update
16:47:09 <thingee> jungleboyj: thanks!
16:47:09 <jungleboyj> DuncanT: ++
16:47:18 <jungleboyj> thingee: Thank you!
16:47:25 <thingee> #topic HA
16:47:27 <thingee> geguileo: hi
16:47:32 <geguileo> thingee: Hi
16:47:39 <geguileo> Liberty is advancing and I believe right now most HA efforts are waiting on Cinder-nova interactions.
16:47:42 <thingee> #link https://etherpad.openstack.org/p/CinderNovaAPI
16:48:11 <geguileo> We know there's a lot of problems in Nova-Cinder interaction
16:48:16 <geguileo> As can be seen in that list
16:48:18 <dulek> You mean c-vol A/A probably.
16:48:23 <DuncanT> geguileo: The atomic state change in the API is a breaking API change from the client PoV :-(
16:48:36 <thingee> geguileo: I dropped a bunch of work on winston-d to work on the error handling with cinder client. have an update winston-d ?
16:48:47 <thingee> geguileo: the last time we talked about this we said we'd address that first.
16:48:55 <geguileo> thingee: Yes
16:49:07 <geguileo> thingee: But in that list I see a lot of things that should not be blocking HA work
16:49:17 <hemna> https://etherpad.openstack.org/p/CinderNovaAPI
16:49:20 <geguileo> As I understand there are some interactions that need fixing
16:49:29 <geguileo> For HA work to be able to start
16:49:32 <hemna> fwiw, that etherpad has a ton of issues called out between nova -> cinder.
16:49:34 <geguileo> And others that are generic
16:49:39 <winston-d> thingee: sorry, not much progress so far, busy separating the company.
16:49:54 <geguileo> hemna: Are they all blocking Cinder from moving to atomic state changes?
16:50:15 <hemna> geguileo, this is just a note-taking etherpad that calls out all of the issues and outstanding bugs
16:50:25 <hemna> lots of them
16:50:33 <DuncanT> geguileo: See above. atomic state change requires an API contract change with our clients, not just nova
16:50:39 <geguileo> I see the list is big and getting bigger, so
16:51:01 <hemna> I also called out some live migration problems in that etherpad as well.  it's not good.
16:51:07 <geguileo> DuncanT: Really?
16:51:47 <geguileo> DuncanT: But even if we need to update cinderclient that should be fairly easy
16:51:52 <dulek> DuncanT: Any workaround?
16:51:58 <DuncanT> geguileo: We accept certain combinations of commands right now and effectively queue them up on the lock in the volume manager. If we go with atomic state changes, that no longer works
16:52:03 <dulek> geguileo: DuncanT meant clients like client scripts.
16:52:22 <hemna> this is a good topic for the meetup :)
16:52:29 <dulek> As a first version of c-vol A/A we can go with Tooz locks for that.
16:52:30 <DuncanT> geguileo: We can't just change cinder-client behaviour - and python-cinderclient is not the only client out there
16:52:33 <jungleboyj> hemna ++
16:52:42 <thingee> 8 minute warning
16:52:55 <DuncanT> dulek: I can't figure out safe lock expiration in tooz
16:53:03 <hemna> the first stage in fixing the local volume locks was to put 'ing' checks in the API and then report VolumeIsBusy, and Nova has to cope with that.
16:53:11 <dulek> DuncanT: Hm? Locks get dropped when service dies.
16:53:13 <geguileo> DuncanT: There are heartbeats to keep locks alive
16:53:26 <DuncanT> dulek: Which leaves crap lying around... not good
16:53:28 <hemna> so I thought someone was going to look at the Nova side to expect the VolumeIsBusy exception and cope with that as step 1.
16:53:28 <geguileo> With no heartbeats, they die
16:53:36 <hemna> I think that's a good idea regardless.
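[Editor's note: a sketch of the first stage hemna describes — the API rejects operations on volumes in a transitional 'ing' state with VolumeIsBusy instead of queueing on a manager lock, and Nova learns to cope with that exception. The helper and the state list are illustrative; cinder.exception.VolumeIsBusy itself is real.]

    from cinder import exception

    BUSY_STATES = ('creating', 'attaching', 'detaching', 'deleting',
                   'migrating', 'uploading', 'backing-up')

    def check_volume_not_busy(volume):
        """Fail fast instead of waiting on a volume-manager lock (sketch)."""
        if volume['status'] in BUSY_STATES:
            raise exception.VolumeIsBusy(volume_name=volume['name'])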
16:53:40 <dulek> DuncanT: Why do you think that? What kind of "crap"?
16:53:57 <DuncanT> dulek: Half-cloned volumes... half-done snapshots....
16:54:07 <geguileo> Taskflow could revert those operations
16:54:18 <hemna> taskflow...cough..cough
16:54:21 <dulek> DuncanT: By service I meant Cinder services. We have such situations already.
16:54:24 <thingee> hemna: https://review.openstack.org/#/c/186742/
16:54:31 <DuncanT> geguileo: Only with persistence and really good coding
16:54:41 <winston-d> hemna: problem is, currently cinder doesn't raise VolumeIsBusy, right? Not until the lock-free change gets in?
16:54:42 <dulek> DuncanT: right.
16:54:53 <hemna> winston-d, it's a catch 22
16:55:03 <DuncanT> winston-d: Cinder can't raise it until the client handles it
16:55:06 <geguileo> DuncanT: With bad coding we would go nowhere anyway
16:55:10 <hemna> winston-d, we can't put it in Cinder until Nova handles it, or CI will puke
16:55:13 <e0ne> DuncanT: +1 about taskflow
16:55:22 <DuncanT> geguileo: We've got a long way with some very bad code....
16:55:24 <hemna> taskflow -1 for me
16:55:36 <hemna> we haven't decided as a team if we are sticking with it yet or not.
16:55:44 <geguileo> Ok, so we basically are saying that we can forget about HA?
16:55:45 <dulek> Okay, let's get back to using tooz. What's wrong with that besides some performance issues?
16:55:50 <e0ne> aarefiev is working on performance improvements for taskflow persistence
16:55:53 <geguileo> Or wait for API v4?
16:55:54 <hemna> geguileo, no.
16:56:01 <jgriffith> dulek: it's zookeeper :)
16:56:03 <winston-d> hemna: ok. thx for clarification. i can move on with my change on my 'fixbug1458958' nova branch now.
16:56:04 <DuncanT> The persistence we need seems to be at a coarser granularity than tasks...
16:56:04 <dulek> If c-vol dies - we have half-done snapshots
16:56:05 <geguileo> hemna: Good to know
16:56:08 <hemna> geguileo, we have to get Nova to expect VolumeIsBusy exceptions.
16:56:15 <jgriffith> dulek: and the perf is a pretty big issue IMO
16:56:19 <avishay> i think the current locks can be removed if we garbage collect volumes offline rather than delete them immediately
16:56:21 <e0ne> he will be able to show some code/reports this week or next
16:56:23 <geguileo> hemna: Ok, only that?
16:56:36 <dulek> If the tooz backend service dies, the node should get fenced by Pacemaker.
16:56:38 <greghaynes> toox isn't necessarily zk
16:56:41 <greghaynes> er, tooz
16:56:41 <hemna> geguileo, after that, then we put 'ing' checks in the cinder API, and report VolumeIsBusy.
16:56:51 <winston-d> hemna: i'm on it - fixing nova to expect volumeisbusy
16:56:55 <hemna> then we can remove all but 1 of the locks in the volume manager.
16:57:02 <geguileo> hemna: Ok, so the Nova support for VolumeIsBusy
16:57:03 <dulek> jgriffith: ZooKeeper is one of the options. Ceilometer relies on Redis as the default backend - they say it's reliable.
16:57:05 <jgriffith> greghaynes: fair, but it's the version that "works" and is most deployed IIUC
16:57:12 <geguileo> hemna: Are there patches for that already submitted?
16:57:21 <jgriffith> dulek: their problem space is different, but that's fair
16:57:22 <dulek> jgriffith: And performance will probably be better compared to running a single c-vol
16:57:27 <hemna> we still have to deal w/ the lock in taskflow though
16:57:44 <winston-d> geguileo: not yet
16:57:46 <dulek> hemna: lock in taskflow?
16:57:46 <avishay> we need distributed locks for 1 lock in the code?
16:57:50 <hemna> that will be the last one, as far as the volume manager is concerned
16:57:51 <jgriffith> dulek: so what's the real advantage/reason for running multiple c-vol services?
16:57:53 <hemna> dulek, yes
16:58:01 <jgriffith> dulek: I'm not certain I think it's that great
16:58:01 <greghaynes> yes, and that's because zk is the only system that is known to really 'work' for that problem - if you want those guarantees you'll need to use that system regardless of tooz, or if you don't need them then you can use something lighter weight
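[Editor's note: a minimal sketch of the tooz approach being debated, under the assumption that one lock per volume replaces the volume-manager locks. The backend URL, member id, lock name, and do_snapshot() are placeholders; some backends also need periodic coordinator.heartbeat() calls, which is the lock-expiry behaviour in question — if a service dies its lock is released, but the half-done work it leaves behind is not cleaned up.]

    from tooz import coordination

    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181',   # or redis://..., per dulek
        b'cinder-volume-host-1')        # unique member id per service
    coordinator.start()

    lock = coordinator.get_lock(b'volume-<uuid>')
    with lock:             # blocks other c-vol services touching this volume
        do_snapshot()      # placeholder for the manager operation
    coordinator.stop()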
16:58:07 <smcginnis> 2 minutes
16:58:13 <geguileo> winston-d: So the 3 patches that are submitted don't fix our problems for HA?
16:58:20 <hemna> dulek, I'll find it.  it's not as easy to remove....
16:58:22 <dulek> jgriffith: A/A HA and scaling probably.
16:58:26 <jgriffith> dulek: I've also thought we'd be MUCH better off using something like containers with mesos or something else
16:58:30 <jgriffith> dulek: disagree
16:58:30 <geguileo> Force iSCSI disconnect after timeout: https://review.openstack.org/#/c/167815/
16:58:32 <geguileo> Rollback if attach_volume timesout: https://review.openstack.org/#/c/138664/
16:58:33 <jgriffith> dulek: on both counts
16:58:34 <geguileo> Detach and terminate conn if Cinder attach fails: https://review.openstack.org/#/c/186742/
16:58:36 <geguileo> winston-d: ^
16:58:43 <jgriffith> dulek: c-vol is nothing but an API interface
16:58:53 <jgriffith> dulek: if it dies just respawn it
16:59:10 <jgriffith> dulek: and A/A configuration is sort of... mmm... well weird
16:59:14 <dulek> hemna: Point me to it please, I've become an expert on these flows in the last few weeks. ;)
16:59:17 <jgriffith> dulek: when pointing to the same backend device
16:59:20 <winston-d> geguileo: no, i have a wip patch for nova.
16:59:22 <hemna> dulek, I'm looking..
16:59:28 <jordanP> jgriffith, the "name" (the host) of the service is "coded" in every volume
16:59:33 <avishay> distributed locks are fixing the problem with a bulldozer. i'm sure we could figure out how to avoid the lock entirely.
16:59:39 <thingee> geguileo, winston-d lets talk in the #openstack-cinder room after the meeting
16:59:40 <jordanP> jgriffith, so you can't respawn it everywhere
16:59:47 <jgriffith> jordanP: sure you can
16:59:49 <geguileo> thingee: Ok, good idea
16:59:52 <dulek> jgriffith: Hm... So DuncanT was tasked with making it A/A in the first place. What were the motivations?
17:00:01 <jordanP> jgriffith, we need to tweak the "host' config flag then
17:00:03 <thingee> #endmeeting