*** zengyingzhe has quit IRC | 00:48 | |
*** gampel has quit IRC | 01:45 | |
*** wangfeng_yellow has joined #openstack-smaug | 03:20 | |
*** zengyingzhe has joined #openstack-smaug | 06:07 | |
*** zhonghua-lee has quit IRC | 06:29 | |
*** zhonghua-lee has joined #openstack-smaug | 06:29 | |
*** gampel has joined #openstack-smaug | 07:22 | |
*** gampel1 has joined #openstack-smaug | 08:20 | |
yinwei | @gampel on line? | 08:21 |
yinwei | we're thinking about checkpoint lock mechanism | 08:23 |
*** gampel has quit IRC | 08:23 | |
yinwei | scenario like: an operation is under execution by the protection service, and a checkpoint is created. Here we need to lock the checkpoint until execution of the protection finishes. | 08:23 |
yinwei | The lock is to avoid deleting the checkpoint in parallel, and will be used when the service restarts and the workflow of the checkpoint is rolling back: we need to check that no other service is working on this checkpoint in parallel. If we need this distributed lock, I'm wondering whether we should introduce a lock service plugin and its backend, say zookeeper, or, since the lock is not contended frequently and doesn't have high performance requirements, we could implement | 08:25 |
yinwei | a distributed lock based on our bank itself, like S3 or swift. What do you think? The latter approach doesn't need to introduce another component. | 08:25 |
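A minimal sketch of the "put and check" lock idea floated here, written against python-swiftclient. The container layout, the checkpoints/<id>/lock key and the per-service OWNER_ID are illustrative assumptions, and because plain object PUTs are last-one-wins, the read-back check below is only best-effort rather than a true compare-and-set.

```python
import uuid

from swiftclient.exceptions import ClientException

# `conn` below is assumed to be a swiftclient.client.Connection.
OWNER_ID = uuid.uuid4().hex  # generated once per protection-service instance


def try_acquire_checkpoint_lock(conn, container, checkpoint_id):
    """Best-effort lock: write our id, read it back, we hold it if it stuck."""
    lock_obj = 'checkpoints/%s/lock' % checkpoint_id
    try:
        _, body = conn.get_object(container, lock_obj)
        if body.decode() != OWNER_ID:
            return False            # someone else already holds the lock
    except ClientException as e:
        if e.http_status != 404:
            raise
        # No lock object yet: "put" ours ...
        conn.put_object(container, lock_obj, contents=OWNER_ID)
    # ... then "check" by reading back (also makes the lock reentrant for us).
    _, body = conn.get_object(container, lock_obj)
    return body.decode() == OWNER_ID


def release_checkpoint_lock(conn, container, checkpoint_id):
    try:
        conn.delete_object(container, 'checkpoints/%s/lock' % checkpoint_id)
    except ClientException as e:
        if e.http_status != 404:
            raise
```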
*** gampel1 has quit IRC | 08:56 | |
*** gampel has joined #openstack-smaug | 08:57 | |
zengyingzhe | yinwei, if we lock the checkpoint the whole time the protection process is going on, how can we read the status of this checkpoint? | 09:47 |
zengyingzhe | I'm not sure whether S3 or swift is suitable for implementing a distributed lock, but I do know a DB can, because heat implements its lock mechanism with the DB. | 09:50 |
*** zengyingzhe has quit IRC | 09:55 | |
*** zengyingzhe has joined #openstack-smaug | 09:56 | |
*** zengyingzhe has quit IRC | 09:57 | |
*** openstackgerrit has quit IRC | 10:02 | |
*** openstackgerrit has joined #openstack-smaug | 10:02 | |
yinwei | same mechanism as a DB: put and check. But swift, as distributed storage, should scale better than a DB, unless you're talking about a distributed DB like cassandra. | 10:15 |
*** saggi has joined #openstack-smaug | 10:17 | |
yinwei | people are also discussing using zk to synchronize in heat: http://blogs.rdoproject.org/7685/zookeeper-part-1-the-swiss-army-knife-of-the-distributed-system-engineer-2 | 10:17 |
saggi | yinwei: I missed the beginning. What are we all talking about? | 10:18 |
yinwei | if we don't mind introducing another component, zk would be a more reliable way, and the nova service group has already accepted zk as one of its lock backends. | 10:18 |
yinwei | hi, saggi | 10:19 |
saggi | hi :) | 10:19 |
yinwei | we're talking about checkpoint lock | 10:19 |
yinwei | which seems to be a distributed lock | 10:19 |
yinwei | scenario like: an operation is under execution by the protection service, and a checkpoint is created. Here we need to lock the checkpoint until execution of the protection finishes. | 10:20 |
yinwei | The lock is to avoid deleting the checkpoint in parallel, and will be used when the service restarts and the workflow of the checkpoint is rolling back: we need to check that no other service is working on this checkpoint in parallel. | 10:20 |
yinwei | I think I got the scenario from your bank.md :) | 10:21 |
saggi | Yes, how it's implemented depends on the bank. | 10:21 |
saggi | So object store implementations support mechanisms that enable this locking. | 10:22 |
yinwei | so, two options: shall we introduce another lock service plugin and its backend, and make zk its lock | 10:22 |
yinwei | sorry, what semantics do you mean the object store supports? | 10:23 |
yinwei | AFAIK, object storage only ensures atomic put and last-one-wins | 10:23 |
saggi | First of all, the lock is only really important for deletion, since the checkpoint will be marked as "in progress" until it is done. This means we know that we can't restore from that point if it's not in the "done" state. The only problem is deletion. | 10:24 |
saggi | We want to know that: a) the protect operation crashed, so we can delete an "in progress" checkpoint; b) there is no restoration in progress, so we can delete a "done" checkpoint. | 10:24 |
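A tiny sketch of the deletion rule saggi spells out above; the state strings and the two inputs are hypothetical placeholders, not Smaug's actual API.

```python
def may_delete(checkpoint_state, owner_is_alive, restores_in_progress):
    if checkpoint_state == 'in progress':
        # (a) the protect operation crashed, so this checkpoint is abandoned
        return not owner_is_alive
    if checkpoint_state == 'done':
        # (b) no restoration is currently reading from this checkpoint
        return restores_in_progress == 0
    return False
```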
yinwei | IMHO, either we introduce a lock service like zk, or we implement it based on the bank in the bank plugin. | 10:25 |
saggi | I don't think we can use ZK since it needs to work cross site. | 10:25 |
yinwei | shall we deploy swift across sites? | 10:26 |
saggi | When restoring on site B site A needs to know it can't delete this checkpoint. | 10:26 |
saggi | I'd assume you use the target's swift via its northbound API. When you back up for DR you need to save off-site. That target site will need to run the object store. If the object store is in the local site we will lose it in the disaster. | 10:28 |
yinwei | how about this case: we deploy multiple protection services, and one service crashes while the checkpoint is 'in progress'. Later, shall we pick up this checkpoint and continue in another service instance once the crash is detected? | 10:29 |
saggi | You can't continue. You have to create a new checkpoint from scratch since the tree might have changed while you were down. | 10:30 |
yinwei | AFAIK, swift has implemented geo replication | 10:30 |
yinwei | why would one site failure lose the whole swift? | 10:31 |
saggi | If you don't have it replicated of course | 10:31 |
yinwei | here I mean: why not use its geo replication feature, instead of building two swift sites and replicating the data ourselves? | 10:32 |
saggi | how are collisions resolved? | 10:33 |
saggi | gampel suggested that for simplicity we might lock a bank to a single site. That would remove the need for cross-site locking. Each site could put its ID in the root of the bank. If you want write access you will need to use that site or steal access. We are vulnerable only during a forced transition, but we will warn the user when that happens. | 10:33 |
saggi | yinwei, I have to go, will you be here in 45 minutes? | 10:34 |
yinwei | I don't think so, maybe later | 10:34 |
saggi | yinwei, you are asking good questions and I would like to continue | 10:34 |
yinwei | 45 minutes later is my dinner time | 10:34 |
yinwei | thanks, saggi | 10:34 |
saggi | yinwei, ping me when you have time | 10:34 |
yinwei | sure | 10:34 |
yinwei | I still think we need to check all zombie checkpoints (to clean up, yes, delete them again), and we need to redo those operations | 10:38 |
yinwei | to check zombie checkpoints, a distributed lock would be a good semantic. | 10:38 |
yinwei | @zengyingzhe, to answer the read-status question: we build an index for reading checkpoint status. BTW, the lock should be reentrant, i.e. allow repeated entrance by the same lock owner | 10:41 |
*** zengyingzhe has joined #openstack-smaug | 10:51 | |
*** zengyingzhe_ has joined #openstack-smaug | 12:05 | |
*** zengyingzhe has quit IRC | 12:08 | |
*** yinweiphone has joined #openstack-smaug | 12:15 | |
yinweiphone | saggi: hello | 12:16 |
saggi | yinweiphone: Hello | 12:16 |
yinweiphone | happy | 12:16 |
yinweiphone | happy you're here | 12:16 |
saggi | yinweiphone: You are here in 3 different forms :) | 12:18 |
yinweiphone | hmm, I'm trying iPhone app | 12:18 |
yinweiphone | seems hard to type on | 12:18 |
saggi | yinweiphone: wrt locking. We would like to avoid distributed locking. What I thought was to use Swift's auto deletion feature to create lease objects. While they exist the checkpoint is locked. They will auto-delete if the host crashes and no longer extends their lifetime. | 12:22 |
*** yinweiphone has quit IRC | 12:23 | |
saggi | yinweiphone: It still doesn't solve the cross-site use case, as it will take time for geo-replication to copy these objects, so we can't trust them across sites. To solve this I suggest marking a checkpoint as deleted first and only deleting it after enough time has passed that we are sure that all sites are up to date. | 12:24 |
*** wei__ has joined #openstack-smaug | 12:25 | |
wei__ | wow, now i can log in through my mac, cool | 12:26 |
wei__ | much easier to type | 12:27 |
wei__ | saggi, shall we continue? | 12:28 |
*** yinweiphone has joined #openstack-smaug | 12:28 | |
*** yinweiphone has quit IRC | 12:28 | |
saggi | wei__: wrt locking. We would like to avoid distributed locking. What I thought was to use Swift's auto deletion feature to create lease objects. While they exist the checkpoint is locked. They will auto-delete if the host crashes and no longer extends their lifetime. | 12:29 |
saggi | wei__: It still doesn't solve the cross-site use case, as it will take time for geo-replication to copy these objects, so we can't trust them across sites. To solve this I suggest marking a checkpoint as deleted first and only deleting it after enough time has passed that we are sure that all sites are up to date. | 12:29 |
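A minimal sketch of the lease-object idea, assuming Swift's object-expiration header X-Delete-After: each refresh re-PUTs the lease and resets the clock, so the object disappears by itself if the owning service crashes and stops refreshing it. The lease object name is left as a parameter, since at this point in the discussion the lease is still per checkpoint; the TTL value is an assumption.

```python
from swiftclient.exceptions import ClientException

LEASE_TTL = 600  # seconds until Swift expires the lease object (assumption)


def refresh_lease(conn, container, lease_obj):
    # Each PUT resets the expiration clock; if the owning service stops
    # calling this (crash, hang), Swift deletes the lease object by itself.
    conn.put_object(container, lease_obj, contents='',
                    headers={'X-Delete-After': str(LEASE_TTL)})


def lease_exists(conn, container, lease_obj):
    # A missing lease means the owner stopped refreshing it (or never existed).
    try:
        conn.head_object(container, lease_obj)
        return True
    except ClientException as e:
        if e.http_status == 404:
            return False
        raise
```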
wei__ | auto deletion feature? sounds like an ephemeral node in zk | 12:31 |
saggi | wei__: I don't think we should deploy ZK cross site. I don't think it's built for that. I'd rather have the option to have a restore fail because someone deleted it (because it's not that bad) than have to configure and maintain a cross site ZK configuration. | 12:33 |
wei__ | I need to check more about auto deletion. AFAIK in S3, auto deletion only means an object can be deleted some time later, as the user sets, like 3 days later. | 12:33 |
wei__ | I understand your point, the cost of introducing another service. | 12:34 |
wei__ | Just not sure whether swift's auto deletion can do that | 12:34 |
wei__ | as you said, it seems to maintain a heartbeat between the client and the cluster; once the client crashes, the ephemeral lock is released. | 12:36 |
saggi | wei__: The only thing I'm really worried about is detecting abandoned (zombie) checkpoints. | 12:36 |
saggi | Deleting while restoring isn't very important, as the restore would fail and it's the user's problem if they decide to delete the checkpoint. | 12:37 |
wei__ | oh oh oh | 12:37 |
wei__ | I got it | 12:37 |
wei__ | you mean the client will keep updating the lifetime of the checkpoint key again and again | 12:38 |
wei__ | if the client crashes, the checkpoint is auto-deleted | 12:38 |
wei__ | ok | 12:38 |
saggi | wei__: Just the lease file. Since the checkpoint might be many objects and updating them all is too much work. | 12:39 |
*** zengyingzhe_ has quit IRC | 12:39 | |
wei__ | the lease file is kept per service instance? | 12:42 |
saggi | Per checkpoint. | 12:42 |
wei__ | ok. | 12:42 |
wei__ | then what do you mean by an abandoned checkpoint? | 12:43 |
wei__ | checkpoints under execution when the service crashes? | 12:43 |
saggi | The server stopped the checkpoint process spontaneously either by a bug or by a crash | 12:44 |
saggi | So the checkpoint is still "in progress" but it will never finish | 12:44 |
saggi | we need to clean it up | 12:44 |
saggi | wei__: OK? | 12:47 |
wei__ | hmm, do you mean the lock client that updates the lease file will be a separate process from the protection service? Otherwise, I can't see why the lease would still be kept when the service crashes | 12:47 |
wei__ | hmm, if there's a bug where the protection service is still alive but the lease keeps being extended... if that's the case, it's really hard to detect the zombie, since even if you put the lease-updating logic into the task flow, you can't make sure the granularity is fine enough to catch bugs. | 12:52 |
saggi | The lease will expire correctly. But another service wouldn't know which of the server leases are responsible for which checkpoints. | 12:52 |
wei__ | why? | 12:53 |
saggi | How would they know? | 12:53 |
wei__ | I thought the lease should be some key like /container/checkpoint/lease/client_id | 12:54 |
wei__ | client_id is a sha256 or something like that, generated from a timestamp when the service initializes | 12:54 |
saggi | hmm | 12:55 |
saggi | And then we only need to update one lease | 12:55 |
wei__ | a swift get with path /container/checkpoint/lease could tell if there's any lock | 12:55 |
saggi | I would call it owner_service_id | 12:55 |
wei__ | yes, much better | 12:56 |
saggi | and then we check that this owner is alive | 12:56 |
saggi | It's a good idea wei__ | 12:56 |
wei__ | thanks, saggi | 12:56 |
wei__ | I like your auto deletion idea too | 12:56 |
saggi | So we have one per service. | 12:57 |
wei__ | actually, I was thinking of implementing something like this on the client side but forgot about such a magic usage | 12:57 |
wei__ | ok, let me see | 12:57 |
saggi | And we make it long | 12:57 |
saggi | Much longer than the update time. | 12:57 |
saggi | So if the service fails to update for N minutes it abandons all checkpoints. | 12:58 |
saggi | Since we can't attest their integrity | 12:58 |
wei__ | how could we tell which checkpoints belong to one service owner? | 12:58 |
saggi | We don't need that | 12:59 |
saggi | we just go over all the in progress checkpoints and check their owners | 12:59 |
saggi | collect all the abandoned ones | 12:59 |
saggi | and delete them | 12:59 |
wei__ | I mean how could we tell the owner of each checkpoint? | 12:59 |
saggi | wei__: Could you comment on bank.md so I will remember to add it to the file. | 13:00 |
saggi | yes | 13:00 |
saggi | that is what you suggested | 13:00 |
saggi | putting it in the checkpoint MD | 13:00 |
wei__ | metadata of checkpoint object? | 13:00 |
saggi | in the bank | 13:00 |
wei__ | sure, i will comment it in bank.md | 13:01 |
saggi | /checkpoints/<checkpoint id>/owner_id | 13:01 |
saggi | and we will have /leases/clients/<client_id> | 13:01 |
saggi | or maybe /leases/owners/owner_id | 13:01 |
wei__ | you mean making another key under the checkpoint_id prefix | 13:01 |
saggi | yes | 13:02 |
saggi | We create it when we create the checkpoint | 13:02 |
wei__ | ok | 13:02 |
saggi | That way we only maintain one lease | 13:02 |
wei__ | hmm, got the mapping: checkpoint->owner-id->lease | 13:03 |
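A sketch of the key layout just agreed on, together with the scan described earlier (walk the in-progress checkpoints, look up each one's owner_id, and treat a checkpoint as abandoned when its owner's lease object has already expired). The container name and the per-checkpoint status key are assumptions for illustration; the owner_id and leases/owners paths mirror the ones proposed in the discussion.

```python
# Assumed bank layout:
#   /checkpoints/<checkpoint_id>/owner_id  -> id of the creating service
#   /checkpoints/<checkpoint_id>/status    -> "in progress" / "done" / ...
#   /leases/owners/<owner_id>              -> expiring lease object (one per service)

from swiftclient.exceptions import ClientException


def find_abandoned_checkpoints(conn, container):
    abandoned = []
    # List the pseudo-directories under checkpoints/ to enumerate checkpoint ids.
    _, listing = conn.get_container(container, prefix='checkpoints/',
                                    delimiter='/')
    for entry in listing:
        subdir = entry.get('subdir')          # e.g. 'checkpoints/<id>/'
        if not subdir:
            continue
        checkpoint_id = subdir.split('/')[1]
        _, status = conn.get_object(container, subdir + 'status')
        if status.decode() != 'in progress':
            continue
        _, owner_id = conn.get_object(container, subdir + 'owner_id')
        lease_obj = 'leases/owners/%s' % owner_id.decode()
        try:
            conn.head_object(container, lease_obj)   # lease still alive
        except ClientException as e:
            if e.http_status == 404:
                abandoned.append(checkpoint_id)      # owner lease expired
            else:
                raise
    return abandoned
```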
saggi | We would make the leases long. Since I don't foresee a lot of contention | 13:03 |
wei__ | yes, same here | 13:03 |
saggi | yes | 13:03 |
wei__ | another question: you said we actually delete the checkpoint only after some time, long enough for all sites to be updated | 13:04 |
wei__ | do you mean the geo replication of swift is an eventual consistency model? | 13:05 |
saggi | wei__: Just what I wanted to talk about now :) | 13:05 |
wei__ | nice | 13:05 |
wei__ | :) | 13:05 |
saggi | Yes, the problem is that we can't ensure consistency of the leases across sites. So leases don't work. | 13:06 |
wei__ | is there any way in swift to check whether consistency has been achieved? | 13:06 |
saggi | But this is only a problem for the delete while restore case | 13:06 |
wei__ | yes, it only happens when we delete from one site but read from another site | 13:07 |
saggi | wei__: I would prefer if swift wouldn't do anything. Since we also need to synchronize with resources outside of swift. | 13:07 |
saggi | There is also the issue of double delete | 13:08 |
wei__ | hmm, what swift offers is almost what other object storages offer. They all align with S3. | 13:08 |
saggi | So what we suggest is that deleting will only change the state of the checkpoint but won't actually delete it. Then there will be a single process, which we need to make sure only runs in one place, that actually deletes the checkpoints. | 13:09 |
saggi | Like a garbage collector. | 13:09 |
wei__ | yes, could only be that | 13:10 |
wei__ | eventual consistency introduces dirty reads | 13:10 |
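A sketch of the two-phase delete saggi proposes just above: "delete" only flips the checkpoint's state in the bank, and a single GC process performs the physical deletion after a grace period assumed to be long enough for geo-replication to converge. The key names, the grace period and the deleted_at marker are illustrative assumptions.

```python
import time

GRACE_PERIOD = 24 * 3600   # assumed long enough for geo-replication to settle


def request_delete(conn, container, checkpoint_id):
    # Phase 1: any site may "delete", but it only flips the state in the bank.
    prefix = 'checkpoints/%s/' % checkpoint_id
    conn.put_object(container, prefix + 'status', contents='deleted')
    conn.put_object(container, prefix + 'deleted_at',
                    contents=str(time.time()))


def gc_pass(conn, container, delete_checkpoint_resources):
    # Phase 2: the single GC process physically deletes checkpoints that were
    # marked "deleted" more than GRACE_PERIOD seconds ago.
    _, listing = conn.get_container(container, prefix='checkpoints/',
                                    delimiter='/')
    for entry in listing:
        subdir = entry.get('subdir')
        if not subdir:
            continue
        _, status = conn.get_object(container, subdir + 'status')
        if status.decode() != 'deleted':
            continue
        _, marked_at = conn.get_object(container, subdir + 'deleted_at')
        if time.time() - float(marked_at.decode()) > GRACE_PERIOD:
            # Clean up resources outside the bank first, then the bank keys.
            delete_checkpoint_resources(subdir.split('/')[1])
```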
saggi | But how do we ensure that it only runs on one site? | 13:10 |
saggi | That is what we haven't solved yet | 13:10 |
wei__ | sorry, could you describe the problem in more detail? | 13:11 |
saggi | Let's say someone at site A and someone at site B decide to delete the same checkpoint. You will obviously encounter issues. | 13:13 |
saggi | Even with the GC approach we need to somehow make sure only one site runs the GC | 13:13 |
saggi | Or we get issues while cleaning up resources outside the bank | 13:13 |
saggi | wei__: Do you understand the issue? | 13:14 |
wei__ | yeah | 13:14 |
wei__ | "Or we get issues while cleaning up resources outside the bank" - what issue? a delete error, 404 not found? | 13:15 |
wei__ | hmm, could it be an early exit since the key hasn't been replicated there yet? | 13:16 |
saggi | Let's say we also have a volume backed up somewhere else. When we delete the checkpoint we also need to delete this volume from the other storage. If two processes try to do it at once, one of them will fail. | 13:17 |
wei__ | but we wait long enough before deleting, right? we assume the key has already been replicated, then we start the GC | 13:17 |
saggi | wei__: But then two sites can start the GC at once. | 13:17 |
wei__ | why not delete the checkpoint key first? only the one that succeeds in deleting the checkpoint key will do the following steps to clean up the backup resources | 13:19 |
wei__ | swift will only allow one client to succeed in deleting the key, the others should get a 404, won't they? | 13:19 |
saggi | wei__: But that doesn't happen immediately with geo replication | 13:20 |
wei__ | but we wait long enough before deleting, right? we assume the key has already been replicated, then we start the GC | 13:20 |
wei__ | that's the assumption, isn't it? | 13:20 |
* saggi is thinking | 13:21 | |
wei__ | ok, you mean the delete is not immediate with geo replication | 13:21 |
wei__ | thinking | 13:22 |
saggi | What I'm saying is that two servers can decide to act on the deletion at once | 13:22 |
saggi | yes | 13:22 |
wei__ | saggi, the condition for the GC to delete a checkpoint is whether the lease of this checkpoint is still there | 13:25 |
saggi | Yes, if it's missing we can delete | 13:25 |
saggi | Since we know it was abandoned | 13:26 |
saggi | gampel suggested having a root lease that is the only one that can actually delete checkpoints. Everyone can mark deletions but only it can actually delete. | 13:27 |
wei__ | so in which site should the root lease be located? | 13:28 |
wei__ | we need to ensure this root lease won't be lost in any site failure | 13:28 |
saggi | It's in the bank. If it expires someone else will become root. | 13:28 |
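A sketch of the root-lease idea under the same assumptions as the lease sketch above: the GC role is guarded by one expiring lease object in the bank, whoever finds it missing claims it, and the holder keeps refreshing it while it runs the GC. With geo-replicated Swift the claim-then-read-back is only best-effort, which is exactly the caveat being discussed; the object name and TTL are assumptions.

```python
from swiftclient.exceptions import ClientException

ROOT_LEASE = 'leases/root'
ROOT_TTL = 900  # seconds; the holder must refresh well within this window


def try_become_root(conn, container, owner_id):
    try:
        _, current = conn.get_object(container, ROOT_LEASE)
        return current.decode() == owner_id      # already root, or not root
    except ClientException as e:
        if e.http_status != 404:
            raise
    # Root lease expired (or never existed): claim it, then read it back.
    conn.put_object(container, ROOT_LEASE, contents=owner_id,
                    headers={'X-Delete-After': str(ROOT_TTL)})
    _, current = conn.get_object(container, ROOT_LEASE)
    return current.decode() == owner_id
```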
wei__ | actually I was thinking we should have one GC per site to collect the checkpoint garbage of its own owners, so when it fails, others need to take over the ownership | 13:30 |
wei__ | there needs to be some arbitration service to tell who is the root or who is the owner? | 13:31 |
wei__ | what if we just tolerate the delete failure? | 13:32 |
saggi | wei__: We will need to write the plugins around that. Since we don't want to have leftover data outside the bank that is unreachable. | 13:32 |
wei__ | yes, we don't leave garbage | 13:33 |
wei__ | just let the loser among the GC competitors tolerate the delete error; what are the cons here? | 13:33 |
wei__ | sorry, have to leave. shall we continue tomorrow? | 13:34 |
saggi | sure | 13:34 |
*** wei__ has quit IRC | 13:54 | |
*** wei__ has joined #openstack-smaug | 14:27 | |
*** wei__ has quit IRC | 14:30 | |
*** chenying has quit IRC | 14:30 | |
*** chenying has joined #openstack-smaug | 14:30 | |
openstackgerrit | Eran Gampel proposed openstack/smaug: First draft of the API documentation https://review.openstack.org/255211 | 15:49 |
openstackgerrit | Eran Gampel proposed openstack/smaug: Add Smaug spec directory https://review.openstack.org/261913 | 16:00 |
*** smcginnis has joined #openstack-smaug | 16:12 | |
*** gampel has quit IRC | 16:18 | |
openstackgerrit | Merged openstack/smaug: First draft of the API documentation https://review.openstack.org/255211 | 16:43 |
openstackgerrit | Saggi Mizrahi proposed openstack/smaug: Pluggable protection provider doc https://review.openstack.org/262264 | 16:54 |
openstackgerrit | Merged openstack/smaug: Add Smaug spec directory https://review.openstack.org/261913 | 16:58 |
*** openstackgerrit has quit IRC | 18:32 | |
*** openstackgerrit has joined #openstack-smaug | 18:32 | |
*** zhonghua-lee has quit IRC | 22:13 | |
*** zhonghua-lee has joined #openstack-smaug | 22:14 | |
*** saggi has quit IRC | 23:11 | |
*** saggi has joined #openstack-smaug | 23:28 |