16:01:02 #startmeeting containers
16:01:03 Meeting started Tue Oct 18 16:01:02 2016 UTC and is due to finish in 60 minutes. The chair is adrian_otto. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:07 The meeting name has been set to 'containers'
16:01:27 #link https://wiki.openstack.org/wiki/Meetings/Containers#Agenda_for_2016-10-18_1600_UTC Our Agenda
16:01:30 #topic Roll Call
16:01:31 Spyros Trigazis o/
16:01:32 o/
16:01:32 Adrian Otto
16:01:36 o/
16:01:36 o/
16:01:37 Jaycen Grant
16:01:50 Ton Ngo
16:01:57 Fengshengqin
16:02:23 hello strigazi Drago randallburt muralia jvgrant tonanhngo Fengshengqin1
16:02:40 Happy Tuesday everyone
16:03:28 o/
16:03:35 o/
16:03:47 hello hongbin and vijendar1
16:04:08 let's begin
16:04:14 #topic Announcements
16:04:24 1) Reminder: There will be no team meeting next week on 2016-10-25 because that is the week of the OpenStack Summit in Barcelona.
16:04:33 forgot to check in o/
16:04:40 Our next meeting will be on 2016-11-01 at 1600 UTC.
16:04:44 welcome swatson_
16:05:01 2) Reminder: Please review our NodeGroup spec draft to prepare for our Summit discussion on this topic:
16:05:07 #link https://review.openstack.org/352734 [WIP] Add NodeGroup specification
16:05:35 this is a key topic of collaboration for our upcoming summit, and we are planning to share our plans with the community, so we'd like to be well prepared.
16:05:49 any other announcements from team members?
16:06:31 ^ the client-side stuff for the ID blueprint is almost ready
16:06:39 https://review.openstack.org/#/c/383930/
16:07:03 oh, great, swatson_!
16:07:21 adrian_otto: Thanks! More eyes on it would be good :)
16:07:35 #topic Review Action Items
16:07:37 1) adrian_otto follow up with Kuryr PTL to arrange a joint session
16:07:43 Status: Follow-up complete, coordination is in progress
16:07:56 I'll follow up with an ML message with further details
16:08:27 They have allocated their first session on Thursday for Magnum
16:08:31 #action adrian_otto to email Magnum/Kuryr teams with details about joint meetup plans prior to summit.
16:08:44 then my action will be really easy!
16:08:50 thanks tonanhngo
16:08:56 2) adrian_otto to remove BM Blueprint from Essential BP Review on team agenda
16:09:00 Status: COMPLETE
16:09:10 3) adrian_otto to remove Docs Blueprint from Essential BP Review on team agenda
16:09:11 Status: COMPLETE
16:09:18 4) strigazi to create a magnum-specs repo
16:09:44 I was sick from Thursday, I'll do it this week
16:09:58 oh, are you feeling better yet?
16:10:03 yeap
16:10:09 oh, good.
16:10:25 I'll carry that one forward
16:10:30 thanks
16:10:34 #action strigazi to create a magnum-specs repo
16:10:42 #topic Blueprints/Bugs/Reviews/Ideas
16:10:53 Do any team members have remarks for this section?
16:10:57 adrian_otto: Are we going to discuss the NodeGroup spec during this meeting?
16:11:11 Drago: yes, let's cover that now
16:11:20 Thanks
16:11:44 #link https://review.openstack.org/352734 [WIP] Add NodeGroup specification
16:12:06 Add information about how Heat stacks are handled for Clusters and NodeGroups
16:12:23 I think we can discuss this one.
16:12:34 +1, this will be a significant change
16:13:02 +1, my current thought is to have a stack for the Cluster, and 1 stack per NodeGroup
16:13:06 Our plan is to use a series of nested stacks, correct?
16:13:24 I had a private discussion with Drago on that, we were on a good path. Drago, do you want to explain?
16:13:26 Drago: same question ^ as nested stacks or completely separate?
16:13:36 *Nested* stacks are going to be difficult to manage, because nested means they're inside another template
16:13:43 Also
16:13:59 nested stack resources are our main heat bottleneck
16:14:32 iirc, the heat team is working on that. TripleO has similar issues
16:14:36 This part is not parallel and scales linearly
16:14:48 that's a bottleneck when creating a cluster (or creating a node group in the future), correct strigazi?
16:15:18 If we have a Cluster stack and a stack per nodegroup, it would not exacerbate the problem
16:15:38 yes, when heat validates all nested stack *resources* after creating the masters and before creating the workers
16:15:40 I am not sure if the stacks being completely independent would help or not
16:16:18 Drago: How would resources be referenced across the separate stacks?
16:16:23 if you're running heat with a single engine and worker, you won't see much difference either way
16:16:56 randallburt, ^^ what do you mean?
16:17:03 randallburt: what if the heat service has been scaled across numerous hosts?
16:17:29 then there should be no major difference between nested or multiple adrian_otto
16:17:44 the question I think we are after is if magnum can arrange heat's work so that it scales better than linear in terms of work completed by number of heat engines.
16:17:56 the validation itself would take longer with nested though
16:18:11 tonanhngo: The properties set for the current nested stacks (in the top level template, e.g. kubecluster.yaml) would be converted to outputs. Then magnum could pick them up and pass them as parameters to the nodegroup's stacks
16:18:11 exactly
16:18:12 and if there is no meaningful difference might the separate stacks be harder to manage than nested ones?
16:18:15 randallburt: how about stack-update (updating a big stack with nested stacks VS updating a specific stack)
16:18:39 adrian_otto: the logic would certainly be more complex I would imagine
16:19:05 i think the pro of multiple stacks is that we can update a specific stack instead of always updating the big stack
16:19:22 this is common for life-cycle operations i guess
16:19:23 adrian_otto: If the stacks are nested, you have to mess with the top-level template body directly
16:19:31 hongbin: again, the bottleneck will be in the initial "what needs doing" step and if you can do that on a separate stack that would be faster from a magnum standpoint
16:19:39 Drago: ok, got it. I see the desire to federate that clearly now.
16:19:51 the actual orchestration after that isn't much different
16:20:57 so yeah, you get an easier implementation using nested stacks, but you could get faster and more fine-grained control using separate stacks, though you'd have to do the synchronization of those stacks on your end
16:21:09 so our nodegroup resources will all be identified with the related cluster resource, so it will be trivial to act on a full set of them.
16:21:13 potentially, anyway
16:22:06 I think it makes sense to have separate stacks because it means that magnum resources to stacks is 1:1
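A minimal sketch (not from the meeting) of the outputs-to-parameters flow Drago describes at 16:18:11, assuming a python-heatclient-style stacks.get/stacks.create interface; the output keys, parameter names, and NodeGroup fields are illustrative assumptions, not Magnum's real templates or objects:

```python
# Hypothetical sketch: wire a NodeGroup stack to its Cluster stack by reading
# the cluster stack's outputs and passing them in as ordinary parameters.
# The output keys, template, and nodegroup attributes are illustrative only.

def create_nodegroup_stack(heat, cluster_stack_id, nodegroup):
    cluster_stack = heat.stacks.get(cluster_stack_id)

    # Outputs exposed by the cluster stack (networks, master addresses, ...)
    outputs = {o['output_key']: o['output_value']
               for o in cluster_stack.outputs}

    # The NodeGroup stack receives these as plain parameters, so it holds no
    # nested-stack reference back to the cluster stack.
    parameters = {
        'fixed_network': outputs['fixed_network'],
        'fixed_subnet': outputs['fixed_subnet'],
        'api_address': outputs['api_address'],
        'node_count': nodegroup.node_count,
        'flavor': nodegroup.flavor_id,
    }

    return heat.stacks.create(
        stack_name='%s-%s' % (cluster_stack.stack_name, nodegroup.name),
        template=nodegroup.template_body,  # e.g. a kubeminion-style template
        parameters=parameters,
    )
```

Because each NodeGroup stack only consumes parameters, Heat does not have to validate a whole tree of nested stack resources when one group is created or updated, which is the bottleneck strigazi describes above.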
16:22:09 in the current design, I think a NodeGroup can only belong to one cluster at a time, and we don't have plans to make it so you can orphan and adopt them into other clusters, correct?
16:22:10 In the case of AZs I think it makes sense to have different stacks
16:22:24 adrian_otto: Correct
16:22:26 adrian_otto: correct
16:22:30 correct
16:22:35 in that case, let's cover what happens when you delete a cluster
16:22:37 strigazi: you can put nested stacks in different AZs though
16:22:44 there's a resource for that
16:22:48 one option is to automatically delete all associated node groups
16:23:12 another option is to raise an exception unless there are no associated nodegroups.
16:23:31 or some combination thereof, like a flag to delete that indicates you want a recursive deletion of all nodegroups.
16:24:08 The current behavior is (obviously) the former, so we could stick with that
16:24:20 in the spec when creating a NG you must specify the cluster. also the cluster will hold the resources for networks
16:24:28 I would assume that if you are deleting at the cluster level, you want the whole cluster gone. If you want to reduce availability of a particular group then scale it to 0
16:24:39 And if it turns out we want the latter, it can be implemented later
16:24:55 agreed. I'd like us to record that intent in the spec.
16:25:07 currently the idea is a nodegroup can't exist without a cluster it is connected to
16:25:09 adrian_otto: Which intent?
16:25:19 recursive delete
16:25:59 you mean rolling?
16:26:18 NodeGroups will also be manageable, so you can add or delete nodegroups from a cluster
16:26:35 strigazi: no. yes. I was imagining a future state where a nodegroup might be transitioned from one cluster to another.
16:26:46 but I think that's too complex for a first iteration
16:27:15 How about cluster-update? Would the change be applied recursively from the cluster, or separately per node group?
16:27:33 having separate heat stacks per node group opens the door to that possibility though.
16:27:43 tonanhngo: i think it depends on which attribute you want to update
16:28:06 i would think any operation on cluster would be applied recursively as it makes sense for the operation
16:28:57 you could always make it explicit in the api with cluster-wide and group-specific operations, then punt to the driver to sort it
16:29:24 Is there a use-case for recursively updating something in all the nodegroups? There is also nodegroup-update, so we could have cluster-update only apply to cluster attributes and then enable specifying nodegroup attributes to do recursive updates if we need it
16:29:41 ^^ +1
16:29:58 The only thing updatable right now is node-count, and I don't know why you'd want that recursive
16:30:21 true, currently there is nothing that needs this
16:30:27 and it would be easy enough to automate if you did want to do it to all groups in the cluster
16:30:29 it could get complicated for the user to manage, figuring out which attribute to update at the cluster level and which at the nodegroup level.
16:30:31 though that might change with the lifecycle operations
16:30:38 that one should not be handled recursively.
16:31:18 jvgrant: you'd just have cluster-wide ones and group-specific ones in the api I would think
16:32:00 I could see an argument for the convenience of doing a cluster update and being able to specify --master-node-count or --node-count and having it apply correctly IFF there's 1 master NG and 1 minion NG
16:32:01 At this point our use cases are for cluster-wide operations
16:32:22 randallburt: correct, i don't think we have the use case for needing the recursive yet. so we should not worry about it for now
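The consensus above amounts to: deleting a cluster removes every associated NodeGroup, cluster-update touches only cluster-level attributes, and per-group changes such as node count go through nodegroup-update against that one group's stack. A rough sketch of those semantics, assuming a hypothetical heat client object and made-up attribute and parameter names rather than Magnum's real conductor code:

```python
# Hypothetical sketch of the semantics agreed above, not real Magnum code.

def delete_cluster(heat, cluster):
    # Deleting a cluster implies deleting every associated NodeGroup (the
    # "recursive" behavior discussed above); NodeGroups are never orphaned.
    for nodegroup in cluster.nodegroups:
        heat.stacks.delete(nodegroup.stack_id)
    heat.stacks.delete(cluster.stack_id)


def update_cluster(cluster, patch):
    # cluster-update applies only to cluster-level attributes; nothing is
    # pushed recursively into the nodegroups.
    allowed = {'name', 'discovery_url'}  # illustrative attribute set
    for key, value in patch.items():
        if key in allowed:
            setattr(cluster, key, value)
    cluster.save()


def update_nodegroup(heat, nodegroup, node_count):
    # nodegroup-update touches only that group's stack, which is the main
    # payoff of one stack per NodeGroup.
    heat.stacks.update(nodegroup.stack_id,
                       existing=True,
                       parameters={'node_count': node_count})
```

A convenience such as --master-node-count on cluster-update could then simply resolve to the right NodeGroup client-side and call the nodegroup path, as discussed above.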
16:33:05 Drago: that could be handled in the CLI to just find and send the command to the correct nodegroup
16:33:14 jvgrant: Sure sure :)
16:33:38 another way to handle that is to do node-group-update and have an optional cluster_id to use instead of the nodegroup name. In that case it could do what you asked for for all the nodegroups in the cluster.
16:34:03 or allow a list of nodegroups to act on
16:34:11 both are optimizations we can defer.
16:34:44 adrian_otto: Note that cluster_id is optional in the spec currently, because you can specify a NG_id that uniquely identifies the NG
16:35:02 In nodegroup-update and many other nodegroup commands
16:35:17 -1 to orphan NGs
16:35:28 an NG must have a cluster IMO
16:35:49 strigazi: Yes, the optional cluster ID is just because you already have a NG ID, not because it's orphanable
16:36:08 strigazi: +1
16:36:12 strigazi: +1
16:36:19 strigazi +1
16:36:34 +1 agreed.
16:36:50 I think we can agree on the one stack per NG and iterate on that
16:37:07 sounds good to me.
16:37:13 I saw that tonanhngo had some concerns about ClusterTemplates, around how attributes work with them. I know that there was some confusion around this for others as well
16:37:38 agree, one stack per NG is a more elegant approach
16:38:31 Templates are true templates now. They have the same attributes as what they are a template of. Not separate objects with additional attributes
16:39:14 Currently, the intent of v2 ClusterTemplates is to act as a prototype, so you can create a cluster out of them. I don't think it's necessary to set required attributes on it if you don't want to, because it only needs to be enforced upon cluster create. Keypair is a great example of this
16:40:28 Drago: then when you use a template to create a cluster, magnum would validate and complain if a required attribute is missing?
16:40:37 tonanhngo: Correct
16:40:48 +1
16:40:59 ok, I guess that's different from the current model
16:41:07 tonanhngo: It definitely is
16:41:25 but it's reasonable as long as we document it so users are not surprised
16:42:09 You can also just override the attribute set in the template at cluster creation time, right?
16:42:10 And I think that's where part of the confusion is from. Attributes from the CT are copied over and the created Cluster has *no* link to the CT it was created from. I wanted to make that clear too
16:42:19 tonanhngo: Yes
16:42:58 A use case I want to have is: create a cluster with 4 NGs with one command from a public CT and/or NG
16:43:13 tonanhngo: *Yes, unless we have the lockout feature we discussed at the midcycle
16:43:28 2 master NGs and 2 worker NGs
16:44:05 I'll comment on it in the spec
16:44:15 if the CT and NGT are already there then yes you can do that in one command
16:44:25 strigazi: According to the current spec, you can do that. You can add NodeGroupTemplates to a ClusterTemplate, then when you create a cluster from that CT, it creates the NodeGroups as well
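Drago's description of v2 ClusterTemplates above boils down to: the template is a prototype whose attributes are copied at create time, required attributes only need to be satisfied (possibly via overrides) when the cluster is actually created, and the resulting cluster keeps no link back to the template. A small self-contained sketch, with assumed attribute names and a plain-dict representation for illustration only:

```python
# Illustrative only -- not Magnum's real objects or API layer.

REQUIRED_AT_CREATE = ('keypair_id', 'image_id')


def create_cluster(cluster_template, overrides):
    """Create a cluster record from a prototype-style ClusterTemplate."""
    # Copy the template's attributes, then apply per-create overrides.
    attrs = dict(cluster_template)
    attrs.update(overrides)

    # Required attributes are enforced here, at cluster create, not when
    # the template itself is saved.
    missing = [name for name in REQUIRED_AT_CREATE if not attrs.get(name)]
    if missing:
        raise ValueError('missing required attributes: %s' % ', '.join(missing))

    # No cluster_template_id is stored: the created cluster has no link
    # back to the template it was created from.
    return attrs


# A template without a keypair is fine, as long as one is supplied at
# cluster-create time.
template = {'image_id': 'fedora-atomic', 'keypair_id': None, 'node_count': 3}
cluster = create_cluster(template, {'keypair_id': 'my-key', 'node_count': 5})
```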
16:44:40 This is a distinct use case, we should describe it in the use case section: flexibility in using the template
16:44:56 and you can override specific values on each if you need to (example: give each a different keypair other than the one in the template)
16:44:58 I mean for public CT and NGT
16:45:13 strigazi: It should not make any difference that it's public
16:45:21 ok
16:45:24 good
16:45:33 i can add that example to the spec
16:45:46 I want to bring up one more thing before the summit
16:45:51 +1, that would clarify the intention
16:45:57 have we considered a use-case where the user wants the master and worker nodes collapsed onto a single node in a single nodegroup?
16:46:30 heat-agent VS magnum-daemon on cluster nodes
16:46:44 adrian_otto: You could use labels when creating the master nodegroup and the driver could select the template that has the combined architecture
16:47:09 I like that
16:47:41 As the topology becomes complex, i think we could consider supporting a declarative file to create a cluster, like k8s
16:47:44 adrian_otto: A Cluster with only "master" nodes is valid
16:47:53 hongbin: Or heat? :) randallburt
16:48:14 hongbin: at what point then do you just say "use heat if the user is that opinionated"?
16:48:24 lol Drago :D
16:48:32 ok, given our limited time today, are there any major concerns with the nodegroup spec that we should plan to address prior to the summit?
16:48:43 randallburt: I think hongbin means a file that would have a similar form to what you'd specify on the command line
16:48:52 Drago: ah, ok
16:48:55 yes
16:48:56 or are we happy sharing the current direction with the community, in accordance with our discussion today?
16:49:31 it seems like we are in agreement on most of the large points of this change
16:49:37 +1, maybe with some revision based on the discussion today
16:50:05 okay, so let's wrap this up for now, and if something surfaces, do our best to work it out on the ML
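Drago's suggestion at 16:46:44, that a label on the master nodegroup could let the driver pick a combined master+worker template, might look roughly like the sketch below; the label name, template paths, and nodegroup fields are invented for illustration:

```python
# Hypothetical driver sketch -- label name and template paths are invented.

COMBINED_TEMPLATE = 'templates/kubecluster-combined.yaml'
MASTER_TEMPLATE = 'templates/kubemaster.yaml'
WORKER_TEMPLATE = 'templates/kubeminion.yaml'


def select_template(nodegroup):
    """Pick the Heat template for a nodegroup's stack based on role and labels."""
    labels = nodegroup.get('labels', {})
    if nodegroup['role'] == 'master':
        # A label on the master group collapses master and worker onto the
        # same nodes, so no separate worker nodegroup/stack is needed.
        if labels.get('collapsed', 'false').lower() == 'true':
            return COMBINED_TEMPLATE
        return MASTER_TEMPLATE
    return WORKER_TEMPLATE


# Usage: a single "master" nodegroup that also runs workloads.
print(select_template({'role': 'master', 'labels': {'collapsed': 'true'}}))
```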
16:50:37 I had an item on the agenda to discuss the auth plugin
16:50:47 I think that if we were to come up with our own magnum agent, it would turn out to be just like the heat agent
16:51:02 strigazi, hongbin
16:51:05 +1
16:51:23 well, the heat agent is polling based, and magnum agents could be push based
16:51:54 That requires your nodes to be on the same network as magnum. I don't think that's good from a security perspective, and part of the reason the heat agent is polling
16:52:01 I don't know what I prefer yet, I just want to clarify
16:52:21 hongbin: there are options for the heat agent configuration that would allow for a reversal of that
16:52:27 Drago, nodes talk to magnum-api already
16:52:35 i don't like the performance of polling, it just takes a while for a lifecycle operation to get executed
16:52:49 strigazi: Yes, but nodes can be behind NAT, with no way for magnum to contact them first
16:52:50 yeah, assuming Magnum will have direct network access to the cluster nodes isn't always the case
16:52:57 in that case using the zaqar driver could help
16:53:22 and IIRC, os-*-config are generic enough that I think an operator could write a push-based plugin for them if they wanted
16:53:39 Another nice thing about using the heat agent is that it allows the operator to use whatever signal transport they want because it's a heat configuration, not magnum
16:53:59 #topic Open Discussion
16:54:01 randallburt, not sure how this can be done
16:54:15 right, so in all cases, I think the os-*-config agents work and for reference we can use the zaqar plugin that heat already supports
16:54:43 hongbin: I know you had a concern about the whole stack being updated. We can architect around that. Currently, the softwaredeployment resources can be pulled to the top level. When we move to 1 stack per nodegroup, it won't be a problem anymore regardless
16:55:05 strigazi: you could plug into os-refresh-config to be listening rather than polling, I think. It's been a while since I've looked tbh
16:55:14 then, how about extensibility, for example, allowing operators to provide a plugin that takes actions before/after a lifecycle operation
16:55:19 randallburt ok
16:55:48 hongbin: Operators can customize their templates to do so
16:55:50 adrian_otto, hongbin, strigazi, randallburt: Continue this after the meeting so that we can get to adrian_otto's item?
16:55:55 hongbin: the mechanism is the same
16:55:56 Drago: ok, then the stack-update seems fine
16:56:01 ok Drago
16:56:25 sure
16:56:36 randallburt: yes, i would argue a better option is to override an interface instead of customizing the template
16:56:52 hongbin: then you gut Magnum in a way
16:57:19 randallburt: :)
16:57:29 that is my point, i don't feel strongly about it
16:57:36 hongbin: on the other hand, I was going to propose moving all lifecycle and monitoring things into the driver interface so that would work too and you could do it either way depending on operator expertise
16:58:30 adrian_otto?
16:58:33 we can cover the auth plugin discussion after we adjourn as well. What the team needs to know is that we will continue to refine that work, and seek agreement on the best place for the various components to live.
16:58:33 true, for example, in trove, there are hooks for operators to do pre-* and post-* operations
16:59:07 hongbin: yep, and you can do this with resources and things in a Heat template, but I see your point about not *having* to do so
16:59:14 Thanks everyone for attending today. Our next meeting will be on 2016-11-01 at 1600 UTC.
16:59:31 #endmeeting
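One way to read the extensibility thread at the end of the meeting, combining hongbin's Trove-style pre-*/post-* hooks with randallburt's idea of moving lifecycle and monitoring into the driver interface, is a driver base class with optional hooks around each lifecycle operation. This is purely a speculative sketch; none of these class or method names exist in Magnum:

```python
# Speculative sketch only -- not an existing Magnum interface.
import abc


class ClusterDriver(abc.ABC):
    """Driver owning cluster lifecycle operations, with operator hooks."""

    # Operators could subclass and override just the hooks, instead of
    # customizing the Heat templates directly.
    def pre_create(self, context, cluster):
        pass

    def post_create(self, context, cluster):
        pass

    @abc.abstractmethod
    def create_cluster(self, context, cluster):
        """Do the actual create, e.g. launch the per-NodeGroup Heat stacks."""

    def create(self, context, cluster):
        self.pre_create(context, cluster)
        self.create_cluster(context, cluster)
        self.post_create(context, cluster)
```

Similar pre/post pairs could wrap update, delete, and the other lifecycle operations mentioned in the meeting.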