openstackgerrit | Kota Tsuyuzaki proposed openstack/storlets master: Set concurrency 2 for less gate failures with 35 mins Timeout https://review.openstack.org/436912 | 07:24 |
*** sagara has joined #openstack-storlets | 07:57 | |
*** akihito has joined #openstack-storlets | 08:01 | |
eranrom | Hi | 08:01 |
sagara | eranrom: Hi ! | 08:01 |
eranrom | I have left comments in https://etherpad.openstack.org/p/storlets-rate-resource-limiting-design | 08:02 |
sagara | Thank you for adding your comments yesterday. | 08:02
eranrom | sure. | 08:02 |
akihito | eranrom: HI! | 08:03 |
eranrom | akihito: Hi | 08:03 |
eranrom | Have also left comments here: https://etherpad.openstack.org/p/storlets-pike-design-summit | 08:03 |
sagara | eranrom: yeah, I confirmed that, and I wrote a reply comment. | 08:03
eranrom | sagara: oh, I missed that. Let me have another look. min. | 08:04 |
akihito | eranrom: Thank you. I am looking at it. | 08:04
sagara | eranrom: Sorry, I was writing until just a little while ago | 08:05 |
eranrom | sagara: no problem. I have added replies. | 08:08 |
sagara | eranrom: thank you, I will check it right away | 08:09 |
eranrom | There is one thing I am not yet sure about: | 08:09
eranrom | What is the relationship between the per-account configuration and the per-node/proxy configuration? | 08:10
eranrom | Per node there is the mechanism defined by reserved_host_cpu_cores, storlets_cpu_cores_fraction (and same for memory) | 08:11 |
eranrom | Per account there is also cores_per_node (and memory). How should the two work together? | 08:12
sagara | 'per account' means how many resources a user can request as the container spec, and 'per node/proxy' is how many resources the admin (operator) reserves for the host. | 08:12
sagara | For 'cpu_cores_per_node', I assume we do the per-account resource limit with a newly created API which is used only by the admin. | 08:14
eranrom | So given both the per-account and the per-node/proxy limits, how does the container end up being executed? | 08:14
eranrom | Can I try and write down my understanding? | 08:15
sagara | I'd be glad if you write it down. | 08:16
eranrom | ok, let me know if I got it right: | 08:16 |
sagara | and I will also write an example case. | 08:16
eranrom | Say a user says my account requires 1.5 CPU cores, and the admin says overall storlets can have 10 cores in a proxy/object node. | 08:17
eranrom | When we need to start a container on a given node, we see if there are 1.5 CPUs available from the total 10 cores. | 08:18 |
eranrom | Is that correct? | 08:18 |
sagara | yes. that's correct | 08:21 |
sagara | but I don't think we need float CPU number requests | 08:22
eranrom | I think that float CPU is supported, but it's not very important for the design. | 08:22
eranrom | So what happens if there are no cores left? Do we retry another node? | 08:23 |
eranrom | Also, what if other containers on the same node are idle? | 08:24 |
sagara | Sure, a float is just a scaling factor. | 08:24
sagara | If there are no cores left, it returns 409 or 5xx. | 08:24 |
sagara | I did not consider idle containers; I assumed Akihito's idle resource collection feature. | 08:25
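A minimal sketch of the per-node accounting described above, in Python. All names here (NodeCpuPool, NODE_TOTAL_CORES) are hypothetical, and mapping a failed allocation to 409 or 5xx is left to the caller, as in the discussion:

```python
# Hypothetical per-node CPU pool; not actual storlets code.
NODE_TOTAL_CORES = 10.0  # cores the admin granted to storlets on this node


class NodeCpuPool(object):
    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.allocated = 0.0

    def try_allocate(self, requested_cores):
        """Reserve cores for an account's container, or refuse."""
        if self.allocated + requested_cores > self.total_cores:
            return False  # caller maps this to a 409 or 5xx response
        self.allocated += requested_cores
        return True


pool = NodeCpuPool(NODE_TOTAL_CORES)
assert pool.try_allocate(1.5)       # the 1.5-core account fits
assert not pool.try_allocate(9.0)   # no cores left -> 409 or 5xx
```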
eranrom | This might work, although I am afraid that even with a reasonable timeout for Akihito's resource collection feature we may end up with either seconds to tens of seconds of idle | 08:28 |
eranrom | or even with not fully utilized CPUs. | 08:28 |
eranrom | The fact that a container is not idle does not mean it uses its CPUs to their full capacity. | 08:28
eranrom | another option would be to use priorities... I agree it's not the same. | 08:30
sagara | I agree with priorities for CPU and disk I/O. | 08:30
sagara | So we don't need a reservation model except for memory. | 08:31
sagara | Is that correct? | 08:31
eranrom | yes, I think that for memory this is different. My understanding for memory is: | 08:32 |
eranrom | when you give 1GB of RAM to a container, it means that it can use up to 1GB of memory, but it does not guarantee that there will be 1GB of memory kept only for the container | 08:33
eranrom | so it actually means the container has <=1GB | 08:33 |
eranrom | I do need to confirm that | 08:34 |
sagara | Sure. The cgroup may be an upper constraint; I also think it is not a reservation. But I will confirm that. | 08:35
eranrom | ok. | 08:37 |
sagara | So we calculate the total storlets memory in a node from reserved_host_memory_mb or storlets_memory_fraction, and set it as the Docker cgroup constraint value. | 08:38
sagara | That can keep the Docker containers' total memory under some upper value, and can keep the host stable. | 08:39
eranrom | that was my thinking. and this can be done by passing the constraints to the docker run command. | 08:39 |
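As a sketch of that idea, the per-node memory cap could be derived from the two config options named above and passed to `docker run --memory`. The helper name, the precedence between the two options, and the image name are assumptions, not storlets code:

```python
import subprocess


def storlets_memory_mb(total_host_memory_mb,
                       reserved_host_memory_mb=None,
                       storlets_memory_fraction=None):
    # Assumption: reserved_host_memory_mb wins if both options are set.
    if reserved_host_memory_mb is not None:
        return total_host_memory_mb - reserved_host_memory_mb
    return int(total_host_memory_mb * storlets_memory_fraction)


limit_mb = storlets_memory_mb(65536, reserved_host_memory_mb=16384)
subprocess.check_call([
    'docker', 'run', '-d',
    '--memory', '%dm' % limit_mb,  # cgroup memory cap for the container
    'storlet_image'])              # hypothetical image name
```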
eranrom | I guess the trickier part is doing the priorities, but IMHO this can be done later... | 08:40
eranrom | unless of course you have some strong requirement here... | 08:40
sagara | Please let me confirm what you are saying; I am writing it down now | 08:41
eranrom | For the per-node limits we calculate the reserved memory / CPU for storlets and enforce it when starting a container using e.g. --memory and --cpuset-cpus | 08:43
eranrom | Make sense? | 08:43 |
sagara | So a user storlet request will create the Docker instance with 'docker run --memory=(total_host_memory - reserved_host_memory)' | 08:44
eranrom | something like that (there is also the fraction option) | 08:45
sagara | sure, I'll implement fraction together. | 08:46 |
sagara | So all Docker containers have a memory cap, and their memory requests are 'first come, first served' | 08:47
eranrom | and similarly docker run --cpuset-cpus=(total_host_set / reserved_host_set) | 08:47
eranrom | where / is minus for sets :-) | 08:47 |
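In code, the "minus for sets" arithmetic is just a set difference; the names and core counts below are made up for illustration:

```python
# Cores 0-9 on the host; the operator keeps 0-3 for Swift itself.
total_host_set = set(range(10))
reserved_host_set = {0, 1, 2, 3}

storlets_set = total_host_set - reserved_host_set
cpuset_arg = ','.join(str(c) for c in sorted(storlets_set))
print(cpuset_arg)  # '4,5,6,7,8,9' -> docker run --cpuset-cpus=4,5,6,7,8,9
```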
sagara | Should we limit CPU cores with just cpu.shares? | 08:48
sagara | I think both ideas are good. | 08:50
eranrom | This is also an option, although swapping has some overhead, so IMHO you get better segregation when using cpuset; I am not sure how different this is though... | 08:50
eranrom | The merit of using shares is that when the host is idle you can make use of it... | 08:51 |
eranrom | perhaps Takashi or Kota have an opinion here. | 08:51 |
kota_ | oops | 08:52 |
kota_ | sorry i was missing | 08:52 |
eranrom | np :-) welcome. | 08:52 |
sagara | Sorry, I cannot understand the relation between 'swapping' and 'cpuset'; is it something like NUMA? | 08:52
sagara | hi! | 08:52 |
eranrom | so we have two options for CPU sharing between host and storlets | 08:53 |
kota_ | k | 08:53 |
eranrom | (1) use cpu share where storlets get e.g. 0.6 of all cores and host gets 0.4 of all cores | 08:53 |
eranrom | (2) use sets, suppose there are 10 cores altogether. give storlets 6 and host 4 | 08:53 |
eranrom | option 1 is better in the sense that if storlets or the host are idle, the other one can use the CPUs | 08:54
eranrom | option 2 has better assurances on what the host ends up getting. | 08:54
eranrom | this is because CPU swapping between processes has some price. | 08:55
eranrom | My guess is that operators would want the assurances given by option 2 (but this is just a guess) | 08:56 |
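Both options map onto real `docker run` flags. A hedged sketch of the contrast, with example values only (614 is roughly 0.6 * 1024, Docker's default share weight; the image name is hypothetical):

```python
# Option 1: relative weighting via --cpu-shares (default weight is 1024).
# Storlets at ~0.6 of the weight can still use idle host CPU time.
shares_cmd = ['docker', 'run', '-d', '--cpu-shares', '614', 'storlet_image']

# Option 2: hard partitioning via --cpuset-cpus. Storlets are pinned to
# cores 4-9, so the host keeps cores 0-3 no matter what storlets do.
cpuset_cmd = ['docker', 'run', '-d', '--cpuset-cpus', '4-9', 'storlet_image']
```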
sagara | I see, maybe (2) is better. We discussed a no-API model, so the current design doesn't control cpu.shares, so we had better control it with cpuset. Is that right? | 08:56
kota_ | i like option (2) for the first step because it makes it easy for deployers to estimate how many cores are used for users and Swift backends | 08:56
sagara | ah, I see, you were talking about CPU pinning | 08:56
eranrom | sagara: right! | 08:57 |
eranrom | regarding the API, I think this can be defined using config files, and 'enforced' using docker run command line params | 08:58 |
eranrom | specifically, --memory and --cpuset-cpus (https://docs.docker.com/engine/admin/resource_constraints/) | 08:59 |
sagara | Yes, I know that. | 09:00 |
eranrom | sagara: right, sorry | 09:01 |
sagara | sorry | 09:01 |
eranrom | sagara: nothing to be sorry about :-) | 09:03 |
sagara | OK, so all my questions are resolved: we have a config file to reserve host CPU cores and memory, | 09:05
sagara | which are storlets_cpu_cores_fraction, reserved_host_cpu_cores, storlets_memory_fraction, reserved_host_memory_mb. | 09:05
sagara | A user makes a storlets request, and if there isn't a Docker container yet, we run 'docker run --cpuset-cpus=xxx --memory=yyy' | 09:07
sagara | where xxx is 'total_host_cpus' minus 'the value calculated from storlets_cpu_cores_fraction or reserved_host_cpu_cores'. | 09:08
sagara | and memory is the same as CPU | 09:09
sagara | therefore, we don't need an API | 09:10
eranrom | sagara: right. There are also the per-account settings, which we may want to use shares for, but I suggest discussing this later - is that ok? | 09:11
sagara | OK | 09:11 |
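Pulling the summary together, a sketch of the whole flow: read the config, compute the storlets share, and start the container. The host totals, helper name, and image name are illustrative; only the config option names come from the discussion above:

```python
import subprocess

conf = {
    'storlets_cpu_cores_fraction': 0.6,
    'reserved_host_memory_mb': 16384,
}

TOTAL_HOST_CORES = 10          # example host size
TOTAL_HOST_MEMORY_MB = 65536


def run_storlet_container(image):
    n_cores = int(TOTAL_HOST_CORES * conf['storlets_cpu_cores_fraction'])
    # Which cores go to storlets is arbitrary; take the highest-numbered.
    cpuset = ','.join(str(c) for c in
                      range(TOTAL_HOST_CORES - n_cores, TOTAL_HOST_CORES))
    memory_mb = TOTAL_HOST_MEMORY_MB - conf['reserved_host_memory_mb']
    subprocess.check_call([
        'docker', 'run', '-d',
        '--cpuset-cpus', cpuset,
        '--memory', '%dm' % memory_mb,
        image])


run_storlet_container('storlet_image')  # hypothetical image name
```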
eranrom | akihito: still here? | 09:12 |
akihito | yep | 09:12 |
sagara | eranrom: thanks a lot! | 09:12 |
eranrom | sagara: sure, my pleasure | 09:12 |
eranrom | akihito: I have commented on the etherpad | 09:13 |
akihito | oh! Thank you! I will look at it and comment too. | 09:14
eranrom | I think that we still need to think more about the long-running storlet problem (stopping the storlet app) | 09:15
eranrom | and how to delete files2 and files3 | 09:15 |
eranrom | just let me know when | 09:16 |
akihito | I see. I will update my opinion on the etherpad. | 09:18
eranrom | akihito: sure. | 09:29 |
akihito | sorry. I haven't assembled my opinion yet. I will update it tonight. | 09:38
*** akihito has quit IRC | 09:39 | |
*** openstackgerrit has quit IRC | 10:18 | |
*** sagara has quit IRC | 10:30 | |
*** akihito has joined #openstack-storlets | 15:50 | |
*** akihito has quit IRC | 15:51 |