09:08:41 #startmeeting containers
09:08:42 Meeting started Wed Sep 25 09:08:41 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:08:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:08:46 The meeting name has been set to 'containers'
09:08:50 #topic Roll Call
09:08:55 o/
09:09:15 o/
09:10:44 brtknr:
09:12:06 o/
09:13:12 #topic Stories and Tasks
09:13:16 o/
09:13:33 let's quickly discuss the Fedora CoreOS status and reasoning
09:13:44 then nodegroups
09:13:51 brtknr: jakeyip: anything else you want to discuss?
09:14:09 stein backports
09:14:17 nothing from me
09:14:34 ok
09:15:05 also when to cut the train release
09:15:23 So for CoreOS
09:15:48 1. we need to change from Atomic, there is no discussion around it
09:16:14 2. Fedora CoreOS is the "replacement" supported by the same team
09:16:33 I say replacement because it is not a drop-in replacement
09:16:44 I mean replacement in quotes
09:17:08 reasons to use it, at least from my POV
09:17:22 we have good communication with that community
09:18:13 the goal is to run the stock OS and run everything in containers
09:18:21 also they told me yesterday that they would like to support our use case, the transition from `atomic install --system ...`
09:18:43 the transition is probably podman run
09:20:16 any counter argument?
09:20:57 sounds good
09:20:59 at first, my worry was no more `atomic`, but i am reassured by the fact that the intended replacement is podman/docker
09:21:19 the work required is around the heat agent and a replacement for the atomic cli
09:21:37 we should be able to run privileged containers for the kube-* services, right?
09:21:58 atomic is just a python cli that writes a systemd unit which does "runc run"
09:22:24 we could
09:22:43 and podman is containers running under systemd iiuc
09:23:05 I hope at least, because k8s 1.16 is not playing nice in a container
09:23:12 yes
09:23:38 like my comment in https://review.opendev.org/#/c/678458/
09:23:45 >I hope at least, because k8s 1.16 is not playing nice in a container
09:23:48 in what way?
09:24:13 the kubelet container is not propagating the mounts to the host
09:24:36 only kubelet, the others are fine
09:25:11 let's move to nodegroups? we won't solve this here
09:25:22 I mean the 1.16 issue
09:25:40 sounds like a problem with podman?
09:26:09 that was with atomic, not podman
09:26:17 podman, atomic, they all use runc
09:26:48 okay.. anyway i think what we need is to convince ourselves fcos is the best alternative
09:27:49 before we build more momentum
09:28:20 I'm convinced, whenever I ask the ubuntu community for help, I find the door closed
09:28:51 I suppose the community is an important aspect...
09:29:00 brtknr: what are your concerns?
09:29:48 anyway, we will go to their meeting and we will see.
09:30:00 #topic Nodegroups
09:30:03 i am just concerned about the risks as it seems experimental
09:30:23 as with all exciting things XD
09:30:57 compared with centos, openstack and kubernetes are too experimental
09:31:28 or compared with debian, apache server, I can bring more examples :)
09:32:05 i think we need to find out from the fcos community what are the things that are going to stay and things that may be uprooted
09:32:12 but happy to move on to nodegroups
09:35:10 we are fixing an issue with labels, brtknr did you find anything?
09:35:23 did you manage to add nodegroups?
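Since `atomic` is essentially a cli that writes a systemd unit around "runc run", the podman-based transition discussed above could end up looking roughly like the unit below. This is a minimal sketch only: the unit layout, image name, paths and kubelet flags are illustrative assumptions, not what the Fedora CoreOS driver will actually ship.

```
[Unit]
Description=kubelet in a podman container (illustrative sketch)
After=network-online.target
Wants=network-online.target

[Service]
# Clean up any stale container from a previous run; the "-" ignores failures.
ExecStartPre=-/usr/bin/podman rm -f kubelet
# Privileged, host-namespaced container. /var/lib/kubelet is mounted with
# rshared so that volume mounts created by kubelet propagate back to the
# host, which is the mount-propagation problem mentioned for 1.16 above.
ExecStart=/usr/bin/podman run --rm --name kubelet \
    --privileged --net=host --pid=host \
    -v /etc/kubernetes:/etc/kubernetes:ro \
    -v /var/lib/kubelet:/var/lib/kubelet:rshared \
    docker.io/openstackmagnum/kubernetes-kubelet:v1.15.3 \
    kubelet --config=/etc/kubernetes/kubelet-config.yaml
ExecStop=/usr/bin/podman stop kubelet
Restart=always

[Install]
WantedBy=multi-user.target
```

Whether kubelet stays fully containerized like this or moves onto the host is exactly the open question from the 1.16 discussion above.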
09:36:45 strigazi: i just created a cluster but i was using kube_tag=v1.16.0 so it failed
09:36:55 retrying now with v1.15.3
09:38:26 but i have tested the full lifecycle in one of the earlier patchsets
09:38:33 create, update and delete, also scaling
09:38:41 and everything seemed to work for me
09:39:01 also nice work adding the tests ttsiouts
09:39:13 it feels like a complete package now
09:39:42 brtknr: i'll push again today adapting your comments
09:40:08 brtknr: we also identified this issue with labels that strigazi mentioned
09:40:34 brtknr: thanks for testing!
09:43:01 brtknr: perfect
09:43:12 ttsiouts: i will repost my output to ng-6 saying everything is working for me
09:44:25 excellent
09:44:32 oh, one more thing
09:44:58 for nodegroups, we (CERN) need to spawn clusters across projects
09:45:14 eg ng1 in project p1 and ng2 in project p2
09:45:16 so one nodegroup in 1 project, another in a different project?
09:45:21 yes
09:45:28 that sounds messy...
09:45:38 in the db of ngs, we have project_id already
09:45:40 isnt there tenant isolation between networks?
09:45:43 nova is messy
09:46:15 so the mess comes from there
09:46:26 or are you planning to use the public interface?
09:46:31 s/public/external
09:46:33 public interface
09:46:44 hmm interesting
09:46:52 yes, it depends on what you use the cluster for
09:47:00 sorry for being late
09:47:15 hi flwang1 :)
09:47:21 was taking care of sick kids
09:47:26 brtknr: hello
09:47:29 is strigazi around?
09:47:31 for our usage it is not an issue, and it is opt-in anyway
09:47:34 yep
09:47:36 hi flwang1
09:47:40 strigazi: hello
09:47:51 flwang1: hope the kids get better!
09:48:14 brtknr: thanks
09:48:28 my daughter has had a fever since yesterday
09:48:55 strigazi: is multi-project supported in the current ng implementation?
09:48:59 anything i can help with or provide my opinion on?
09:49:17 oh, you're discussing the ng stuff
09:49:26 brtknr: no, but it is a small change
09:49:29 flwang1: i cannot reproduce the issue you commented on in the ng-6 patch
09:49:51 strigazi: is this for admins only?
09:49:58 brtknr: no
09:50:15 brtknr: 100% for users
09:51:02 brtknr: in my testing, after removing the 4 new fields from the nodegroup table, the cluster becomes stable
09:51:04 brtknr: nova doesn't have accounting for gpus, FPGAs, Ironic cpus are == to vcpus
09:51:09 i haven't dug into the root cause
09:51:18 flwang1: what fields?
09:51:28 what are you talking about?
09:51:32 stack_id, status, status_reason, version
09:51:48 you dropped things from the db?
09:51:54 are all migrations done?
09:52:07 flwang1: did you apply `magnum-db-manage upgrade` after checking out ng-9?
09:52:13 i did
09:52:16 for sure
09:52:24 i didn't need to delete anything
09:52:32 i mean
09:52:37 what is the error?
09:52:51 not the VM restarts/rebuilds, that is irrelevant
09:52:54 i have mentioned the error i saw in the ng-6 patch
09:53:09 i also had to check out the change in python-magnumclient then `pip install -e .`
09:53:42 the problem i got is the vm restart/rebuild
09:54:07 bad nova
09:54:11 no resources
09:54:23 when heat sends a req to nova
09:54:27 and nova fails
09:54:31 heat retries
09:54:40 deletes the old vm and tries again
09:54:58 same everything but different uuid
09:55:00 strigazi: so you mean it's because my env (devstack) is lacking resources?
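For anyone reproducing the testing described above, the steps mentioned in the discussion boil down to roughly the following. The cluster and template names are placeholders, and the nodegroup patches plus the matching client change still have to be checked out from Gerrit:

```
# apply the new nodegroup migrations after checking out the ng patches
magnum-db-manage upgrade

# the matching python-magnumclient change also has to be installed from source
cd python-magnumclient && pip install -e .

# retry with a kube_tag that is known to work; v1.16.0 currently fails
openstack coe cluster create test-ng-cluster \
    --cluster-template k8s-fedora-atomic \
    --labels kube_tag=v1.15.3 \
    --master-count 1 --node-count 1
```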
09:55:11 this happens when you don't have resources
09:55:12 yes
09:55:26 or it misbehaves in some other way
09:55:30 strigazi: ok, i will test again tomorrow then
09:55:34 eg can't create ports
09:55:52 try the minimum possible
09:56:28 ok, i don't really worry about the ng work, overall it looks good to me
09:56:57 ok, if I +2 and bharat verifies, are you ok?
09:57:32 we test at cern in three different dev envs, plus bharat's tests
09:58:17 strigazi: i'm ok with that
09:58:33 flwang1: for train?
09:59:03 I'm mostly happy to get things merged after rebase and addressing all the minor comments, now that we also have solid unit tests... i am sure we will find minor issues with it later but it's been hanging around for too long :)
09:59:10 strigazi: for train
09:59:15 just one silly question
09:59:29 what's the 'version' standing for in the ng table?
09:59:38 i can't see a description for that
10:00:04 placeholder for upgrades with node replacement
10:00:40 now it will work as it is implemented
10:00:48 or we can leverage it now
10:01:05 so it's a version like kube_tag?
10:01:44 give me 5', sorry
10:02:36 i have to leave in 30 minutes for our team standup
10:03:17 brtknr: no problem
10:03:38 i will be offline in 15 mins as well
10:03:56 i'm addressing the comments from the heat team for the ignition patch
10:04:07 i'm very happy they're generally OK with that
10:04:18 btw can we start using etherpad for the agenda like other teams, e.g. keystone: https://etherpad.openstack.org/p/keystone-weekly-meeting
10:04:36 and put a link to this in the channel's idle topic
10:04:40 brtknr: we were using the wiki, but i'm ok with etherpad
10:05:15 or a link to the wiki... i prefer the etherpad UI...
10:05:35 https://etherpad.openstack.org/p/magnum-weekly-meeting
10:05:37 there
10:05:39 :)
10:07:21 cool
10:08:15 i just proposed a new patchset for the ignition patch
10:14:52 flwang1: looks pretty solid
10:16:02 after it's done, we still have quite a lot of work on the magnum side to get the fedora coreos driver ready
10:16:04 where were we?
10:16:30 strigazi: the fedora coreos driver
10:16:43 flwang1: for ngs
10:16:50 for ngs
10:17:02 (22:01:04) flwang1: so it's a version like kube_tag?
10:17:04 flwang1: the ngs in different projects, is there an issue?
10:17:15 flwang1: oh, this
10:17:24 flwang1: we can use it now too
10:17:48 so the version is the coe version of the current node group?
10:18:10 the current name 'version' is quite confusing to me
10:19:14 this is an incremental version for the ng
10:19:23 so that we have some tracking
10:19:43 when a user upgrades something
10:20:18 but for now it is a placeholder, to be implemented later
10:20:56 makes sense?
10:21:49 brtknr: before you go, any reason to not have ngs in different projects as an opt-in option?
10:22:22 brtknr: flwang1: still here?
10:22:24 strigazi: i dont have major objections to it but perhaps this can be added on later?
10:22:29 i'm here
10:22:38 i'm thinking and checking the gke api
10:22:40 or is it required imminently
10:22:44 https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1/projects.locations.clusters.nodePools#NodePool
10:22:48 brtknr: why? for us it is
10:23:00 what do you mean by ngs in different projects?
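As a rough sketch of what the nodegroup lifecycle tested above could look like from the CLI once the python-magnumclient change lands: the command names and flags below are assumptions based on the discussion and the patches under review, not the final interface.

```
# list the nodegroups of a cluster (every cluster starts with a default
# master and worker nodegroup)
openstack coe nodegroup list test-ng-cluster

# add a new worker nodegroup, e.g. with a different flavor
openstack coe nodegroup create test-ng-cluster gpu-workers \
    --node-count 2 --role worker --flavor g1.large

# scale it and eventually remove it
openstack coe cluster resize test-ng-cluster 3 --nodegroup gpu-workers
openstack coe nodegroup delete test-ng-cluster gpu-workers
```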
10:23:44 flwang1: because nova doesn't have accounting for GPUs, FPGAs, and ironic cpus are accounted as vcpus
10:23:45 strigazi: i find it slightly unintuitive
10:24:13 brtknr: we won't advertise this
10:24:14 i was under the impression that projects imply complete separation
10:24:39 this doesn't say much ^^
10:24:46 i prefer supporting ngs per region under the same project
10:24:51 we do multicloud applications
10:24:54 strigazi: can you explain 'ngs in different projects'?
10:25:17 flwang1: so ng1 lives in project A and ng2 lives in project B, both part of the same cluster
10:25:22 does that mean cluster 1 in project A can have a NG which belongs to project B?
10:25:32 again, this is opt-in
10:25:44 brtknr: that doesn't sound good to me
10:26:00 i was under the impression that a cluster belongs to a project
10:26:07 if magnum doesn't have it, we will investigate something else
10:26:30 if we want to have it, it needs to be disabled by default
10:26:45 unless the cloud operators enable it
10:26:49 it then seems like a jump in logic to have child nodegroups spanning different projects
10:26:56 that makes 100% sense
10:27:10 that = "if we want to have it, it needs to be disabled by default"
10:27:38 brtknr: how do you do accounting for ironic nodes mixed with vms?
10:28:09 everything starts from there
10:28:14 and nova cells
10:28:50 in the ideal openstack cloud, I understand, it does not make sense.
10:29:20 strigazi: okay i'm happy with disabled by default.
10:30:34 flwang1: ?
10:30:56 policy or config option?
10:31:01 i'm ok, if it's disabled by default
10:31:07 config
10:31:35 i just worry about the security hell
10:31:38 flwang1: brtknr: I'll send you a presentation on why we do it
10:31:46 strigazi: pls do
10:32:07 flwang1: don't worry, because for a cloud with a proper network it won't work anyway
10:32:27 well, it can work
10:32:35 but you need an extra router
10:32:41 vrouter
10:33:06 from a public cloud pov, it doesn't make sense
10:33:06 flwang1: this might also be useful for running the master nodes in the operator's tenant :)
10:33:17 well, see my comment above :)
10:33:22 strigazi: i can see the extra benefit ;)
10:33:40 i have to go, sorry
10:33:43 sorry, I completely forgot about that
10:33:46 it's late here
10:33:50 ok
10:33:57 see you
10:34:02 #endmeeting
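The config-option approach agreed above might end up looking something like the snippet below in magnum.conf. The option name and section are purely hypothetical, since the meeting only settled on "a config option, disabled by default, that operators can enable".

```
[DEFAULT]
# Hypothetical option name: allow nodegroups of a cluster to be created in
# a project other than the cluster's own. Off unless the operator enables it.
allow_cross_project_nodegroups = False
```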