openstackgerrit | Kota Tsuyuzaki proposed openstack/storlets master: Set concurrency 2 for less gate failures with 35 mins Timeout https://review.openstack.org/436912 | 07:24 |
*** sagara has joined #openstack-storlets | 07:57 | |
*** akihito has joined #openstack-storlets | 08:01 | |
eranrom | Hi | 08:01 |
sagara | eranrom: Hi ! | 08:01 |
eranrom | I have left comments in https://etherpad.openstack.org/p/storlets-rate-resource-limiting-design | 08:02 |
sagara | Thank you for adding your comments yesterday. | 08:02
eranrom | sure. | 08:02 |
akihito | eranrom: HI! | 08:03 |
eranrom | akihito: Hi | 08:03 |
eranrom | Have also left comments here: https://etherpad.openstack.org/p/storlets-pike-design-summit | 08:03 |
sagara | eranrom: yeah, I confirmed that, and I wrote a reply comment. | 08:03
eranrom | sagara: oh, I missed that. Let me have another look. min. | 08:04 |
akihito | eranrom: Thank you. I am looking at it. | 08:04
sagara | eranrom: Sorry, I was writing until just a little while ago | 08:05 |
eranrom | sagara: no problem. I have added replies. | 08:08 |
sagara | eranrom: thank you, I will check it right away | 08:09 |
eranrom | There is one thing I am not yet sure about: | 08:09
eranrom | What is the relationship between the per-account configuration and the per-node/proxy configuration? | 08:10
eranrom | Per node there is the mechanism defined by reserved_host_cpu_cores, storlets_cpu_cores_fraction (and same for memory) | 08:11 |
eranrom | Per account there is also cores_per_node (and memory). How should the two work together? | 08:12
sagara | 'per account' means how many resources a user can request as the container spec, and 'per node/proxy' is how many resources the admin (operator) reserves for the host. | 08:12
sagara | For 'cpu_cores_per_node', I assume we do the per-account resource limit with a newly created API which is used only by the admin. | 08:14
eranrom | So given both the per-account and the per-node/proxy limits, how does the container end up being executed? | 08:14
eranrom | Can I try and write down my understanding? | 08:15
sagara | I'd be glad if you write it down. | 08:16
eranrom | ok, let me know if I got it right: | 08:16 |
sagara | and I will also write an example case. | 08:16
eranrom | Say a user says my account requires 1.5 CPU cores, and the admin says overall storlets can have 10 cores in a proxy/object node. | 08:17
eranrom | When we need to start a container on a given node, we see if there are 1.5 CPUs available from the total 10 cores. | 08:18 |
eranrom | Is that correct? | 08:18 |
sagara | yes. that's correct | 08:21 |
sagara | but I don't think we need float CPU number requests | 08:22
eranrom | I think that float CPU is supported, but it's not very important for the design. | 08:22
eranrom | So what happens if there are no cores left? Do we retry another node? | 08:23 |
eranrom | Also, what if other containers on the same node are idle? | 08:24 |
sagara | Sure, a float is just a scaling factor. | 08:24
sagara | If there are no cores left, it returns 409 or 5xx. | 08:24 |
sagara | I did not consider idle containers; I assumed Akihito's idle resource collection feature. | 08:25
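A minimal sketch of the per-node accounting described above, in Python. All names here (NodeCpuPool, NODE_TOTAL_CORES) are hypothetical, and mapping a failed allocation to 409 or 5xx is left to the caller, as in the discussion:

```python
# Hypothetical per-node CPU pool; not actual storlets code.
NODE_TOTAL_CORES = 10.0  # cores the admin granted to storlets on this node


class NodeCpuPool(object):
    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.allocated = 0.0

    def try_allocate(self, requested_cores):
        """Reserve cores for an account's container, or refuse."""
        if self.allocated + requested_cores > self.total_cores:
            return False  # caller maps this to a 409 or 5xx response
        self.allocated += requested_cores
        return True


pool = NodeCpuPool(NODE_TOTAL_CORES)
assert pool.try_allocate(1.5)       # the 1.5-core account fits
assert not pool.try_allocate(9.0)   # no cores left -> 409 or 5xx
```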
eranrom | This might work, although I am afraid that even with a reasonable timeout for Akihito's resource collection feature we may end up with either seconds to tens of seconds of idle | 08:28 |
eranrom | or even with not fully utilized CPUs. | 08:28 |
eranrom | The fact that a container is not idle does not mean it uses its CPUs to their full capacity. | 08:28
eranrom | another option would be to use priorities... I agree it's not the same. | 08:30
sagara | I agree with priorities for CPU and disk I/O. | 08:30
sagara | So we don't need a reservation model except for memory. | 08:31
sagara | Is that correct? | 08:31
eranrom | yes, I think that for memory this is different. My understanding for memory is: | 08:32 |
eranrom | when you give 1GB of RAM to a container, it means that it can use up to 1GB of memory, but it does not guarantee that there will be 1GB of memory kept only for the container | 08:33
eranrom | so it actually means the container has <=1GB | 08:33 |
eranrom | I do need to confirm that | 08:34 |
sagara | Sure. The cgroup may be an upper constraint; I also think it is not a reservation. But I will confirm that. | 08:35
eranrom | ok. | 08:37 |
sagara | So we calculate the total storlets memory in a node from reserved_host_memory_mb or storlets_memory_fraction, and set it as the Docker cgroup constraint value. | 08:38
sagara | That can keep the Docker containers' total memory under some upper value, and can keep the host stable. | 08:39
eranrom | that was my thinking. and this can be done by passing the constraints to the docker run command. | 08:39 |
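As a sketch of that idea, the per-node memory cap could be derived from the two config options named above and passed to `docker run --memory`. The helper name, the precedence between the two options, and the image name are assumptions, not storlets code:

```python
import subprocess


def storlets_memory_mb(total_host_memory_mb,
                       reserved_host_memory_mb=None,
                       storlets_memory_fraction=None):
    # Assumption: reserved_host_memory_mb wins if both options are set.
    if reserved_host_memory_mb is not None:
        return total_host_memory_mb - reserved_host_memory_mb
    return int(total_host_memory_mb * storlets_memory_fraction)


limit_mb = storlets_memory_mb(65536, reserved_host_memory_mb=16384)
subprocess.check_call([
    'docker', 'run', '-d',
    '--memory', '%dm' % limit_mb,  # cgroup memory cap for the container
    'storlet_image'])              # hypothetical image name
```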
eranrom | I guess the trickier part is doing the priorities, but IMHO this can be done later... | 08:40
eranrom | unless of course you have some strong requirement here... | 08:40
sagara | Please let me confirm what you are saying; I am writing it down now | 08:41
eranrom | For the per-node limits we calculate the reserved memory / CPU for storlets and enforce it when starting a container using e.g. --memory and --cpuset-cpus | 08:43
eranrom | Make sense? | 08:43 |
sagara | So a user storlet request will create the Docker instance with 'docker run --memory=(total_host_memory - reserved_host_memory)' | 08:44
eranrom | something like that (there is also the fraction option) | 08:45
sagara | sure, I'll implement fraction together. | 08:46 |
sagara | So all Docker containers have a memory cap, and their memory requests are 'first come, first served' | 08:47
eranrom | and similarly docker run --cpuset-cpus=(total_host_set / reserved_host_set) | 08:47
eranrom | where / is minus for sets :-) | 08:47 |
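In code, the "minus for sets" arithmetic is just a set difference; the names and core counts below are made up for illustration:

```python
# Cores 0-9 on the host; the operator keeps 0-3 for Swift itself.
total_host_set = set(range(10))
reserved_host_set = {0, 1, 2, 3}

storlets_set = total_host_set - reserved_host_set
cpuset_arg = ','.join(str(c) for c in sorted(storlets_set))
print(cpuset_arg)  # '4,5,6,7,8,9' -> docker run --cpuset-cpus=4,5,6,7,8,9
```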
sagara | Should we limit CPU cores with just cpu.shares? | 08:48
sagara | I think both ideas are good. | 08:50
eranrom | This is also an option, although swapping has some overhead, so IMHO you get better segregation when using cpuset; I am not sure how different this is though... | 08:50
eranrom | The merit of using shares is that when the host is idle you can make use of it... | 08:51 |
eranrom | perhaps Takashi or Kota have an opinion here. | 08:51 |
kota_ | oops | 08:52 |
kota_ | sorry i was missing | 08:52 |
eranrom | np :-) welcome. | 08:52 |
sagara | Sorry, I cannot understand the relation between 'swapping' and 'cpuset'; is it something like NUMA? | 08:52
sagara | hi! | 08:52 |
eranrom | so we have two options for CPU sharing between host and storlets | 08:53 |
kota_ | k | 08:53 |
eranrom | (1) use cpu share where storlets get e.g. 0.6 of all cores and host gets 0.4 of all cores | 08:53 |
eranrom | (2) use sets, suppose there are 10 cores altogether. give storlets 6 and host 4 | 08:53 |
eranrom | option 1 is better in the sense that if storlets or the host are idle, the other one can use the CPUs | 08:54
eranrom | option 2 has better assurances on what the host ends up getting. | 08:54
eranrom | this is because CPU swapping between processes has some price. | 08:55
eranrom | My guess is that operators would want the assurances given by option 2 (but this is just a guess) | 08:56 |
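Both options map onto real `docker run` flags. A hedged sketch of the contrast, with example values only (614 is roughly 0.6 * 1024, Docker's default share weight; the image name is hypothetical):

```python
# Option 1: relative weighting via --cpu-shares (default weight is 1024).
# Storlets at ~0.6 of the weight can still use idle host CPU time.
shares_cmd = ['docker', 'run', '-d', '--cpu-shares', '614', 'storlet_image']

# Option 2: hard partitioning via --cpuset-cpus. Storlets are pinned to
# cores 4-9, so the host keeps cores 0-3 no matter what storlets do.
cpuset_cmd = ['docker', 'run', '-d', '--cpuset-cpus', '4-9', 'storlet_image']
```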
sagara | I see, maybe (2) is better. We discussed a no-API model, so the current design doesn't control cpu.shares, so we had better control it with cpuset. Is that right? | 08:56
kota_ | i like option (2) for the first step because it makes it easy for deployers to estimate how many cores are used for users and Swift backends | 08:56
sagara | ah, I see, you were talking about CPU pinning | 08:56
eranrom | sagara: right! | 08:57 |
eranrom | regarding the API, I think this can be defined using config files, and 'enforced' using docker run command line params | 08:58 |
eranrom | specifically, --memory and --cpuset-cpus (https://docs.docker.com/engine/admin/resource_constraints/) | 08:59 |
sagara | Yes, I know that. | 09:00 |
eranrom | sagara: right, sorry | 09:01 |
sagara | sorry | 09:01 |
eranrom | sagara: nothing to be sorry about :-) | 09:03 |
sagara | OK, so all my questions are resolved: we have a config file to reserve host CPU cores and memory, | 09:05
sagara | which are storlets_cpu_cores_fraction, reserved_host_cpu_cores, storlets_memory_fraction, reserved_host_memory_mb. | 09:05
sagara | A user makes a storlets request, and if there isn't a Docker container yet, we run 'docker run --cpuset-cpus=xxx --memory=yyy' | 09:07
sagara | where xxx is 'total_host_cpus' minus 'the value calculated from storlets_cpu_cores_fraction or reserved_host_cpu_cores'. | 09:08
sagara | and memory is the same as CPU | 09:09
sagara | therefore, we don't need an API | 09:10
eranrom | sagara: right. There are also the per-account settings, which we may want to use shares for, but I suggest discussing this later - is that ok? | 09:11
sagara | OK | 09:11 |
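Pulling the summary together, a sketch of the whole flow: read the config, compute the storlets share, and start the container. The host totals, helper name, and image name are illustrative; only the config option names come from the discussion above:

```python
import subprocess

conf = {
    'storlets_cpu_cores_fraction': 0.6,
    'reserved_host_memory_mb': 16384,
}

TOTAL_HOST_CORES = 10          # example host size
TOTAL_HOST_MEMORY_MB = 65536


def run_storlet_container(image):
    n_cores = int(TOTAL_HOST_CORES * conf['storlets_cpu_cores_fraction'])
    # Which cores go to storlets is arbitrary; take the highest-numbered.
    cpuset = ','.join(str(c) for c in
                      range(TOTAL_HOST_CORES - n_cores, TOTAL_HOST_CORES))
    memory_mb = TOTAL_HOST_MEMORY_MB - conf['reserved_host_memory_mb']
    subprocess.check_call([
        'docker', 'run', '-d',
        '--cpuset-cpus', cpuset,
        '--memory', '%dm' % memory_mb,
        image])


run_storlet_container('storlet_image')  # hypothetical image name
```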
eranrom | akihito: still here? | 09:12 |
akihito | yep | 09:12 |
sagara | eranrom: thanks a lot! | 09:12 |
eranrom | sagara: sure, my pleasure | 09:12 |
eranrom | akihito: I have commented on the etherpad | 09:13 |
akihito | oh! Thank you! I will look at it and comment too. | 09:14
eranrom | I think that we still need to think more about the long-running storlet problem (stopping the storlet app) | 09:15
eranrom | and how to delete files2 and files3 | 09:15 |
eranrom | just let me know when | 09:16 |
akihito | I see. I will update my opinion on the etherpad. | 09:18
eranrom | akihito: sure. | 09:29 |
akihito | sorry. I haven't assembled my opinion yet. I will update it tonight. | 09:38
*** akihito has quit IRC | 09:39 | |
*** openstackgerrit has quit IRC | 10:18 | |
*** sagara has quit IRC | 10:30 | |
*** akihito has joined #openstack-storlets | 15:50 | |
*** akihito has quit IRC | 15:51 |