09:08:41 <strigazi> #startmeeting containers
09:08:42 <openstack> Meeting started Wed Sep 25 09:08:41 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:08:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:08:46 <openstack> The meeting name has been set to 'containers'
09:08:50 <strigazi> #topic Roll Call
09:08:55 <strigazi> o/
09:09:15 <jakeyip> o/
09:10:44 <strigazi> brtknr:
09:12:06 <brtknr> o/
09:13:12 <strigazi> #topic Stories and Tasks
09:13:16 <ttsiouts> o/
09:13:33 <strigazi> let's quickly discuss fedora coreos status and reasoning
09:13:44 <strigazi> then nodegroups
09:13:51 <strigazi> brtknr: jakeyip anything else you want to discuss
09:13:52 <strigazi> ?
09:14:09 <brtknr> stein backports
09:14:17 <jakeyip> nothing from me
09:14:34 <strigazi> ok
09:15:05 <brtknr> also when to cut the train release
09:15:23 <strigazi> So for CoreOS
09:15:48 <strigazi> 1. we need to change from Atomic, there is no discussion around it
09:16:14 <strigazi> 2. Fedora CoreOS is the "replacement" supported by the same team
09:16:33 <strigazi> I say replacement because it is not a drop-in replacement
09:16:44 <strigazi> I mean replacement in quotes
09:17:08 <strigazi> reasons to use it, at least from my POV
09:17:22 <strigazi> we have good communication with that community
09:18:13 <strigazi> the goal is to run the stock OS and run everything in containers
09:18:21 <brtknr> also they told me yesterday that they would like to support our use case transition from `atomic install --system ...`
09:18:43 <strigazi> the transition is probably podman run
09:20:16 <strigazi> any counter argument?
09:20:57 <jakeyip> sounds good
09:20:59 <brtknr> at first, my worry was no more `atomic` but i am more reassured by the fact that the intended replacement is podman/docker
09:21:19 <strigazi> the work required is around the heat agent and a replacement for the atomic cli
09:21:37 <brtknr> we should be able to run privileged containers for the kube-* services, right?
09:21:58 <strigazi> atomic is just a python cli that writes a systemd unit which does "runc run"
09:22:24 <strigazi> we could
09:22:43 <brtknr> and podman is containers running under systemd iiuc
09:23:05 <strigazi> I hope so at least, because k8s 1.16 is not playing nice in a container
09:23:12 <strigazi> yes
09:23:38 <strigazi> like my comment in https://review.opendev.org/#/c/678458/
09:23:45 <brtknr> >I hope so at least, because k8s 1.16 is not playing nice in a container
09:23:48 <brtknr> in what way?
09:24:13 <strigazi> the kubelet container is not propagating the mounts to the host
09:24:36 <strigazi> only kubelet, the others are fine
09:25:11 <strigazi> let's move to nodegroups? we won't solve this here
09:25:22 <strigazi> I mean the 1.16 issue
09:25:40 <brtknr> sounds like a problem with podman?
09:26:09 <strigazi> that was with atomic, not podman
09:26:17 <strigazi> podman, atomic, they all use runc
09:26:48 <brtknr> okay.. anyway i think what we need is to convince ourselves fcos is the best alternative
09:27:49 <brtknr> before we build more momentum
09:28:20 <strigazi> I'm convinced, whenever I ask the ubuntu community for help, I find the door closed
09:28:51 <brtknr> I suppose the community is an important aspect...
09:29:00 <strigazi> brtknr: what are your concerns?
09:29:48 <strigazi> anyway, we will go to their meeting and we will see.
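For context, a minimal sketch of what the `atomic install --system` replacement discussed above could look like on Fedora CoreOS: a podman-managed container wrapped in a hand-written systemd unit. This is not the driver's actual implementation; the unit name, image tag, and mount list are assumptions for illustration only.

# Hypothetical sketch: run the heat agent under podman + systemd, roughly
# mirroring the unit that `atomic install --system` used to generate.
# Image tag and bind mounts are assumed, not taken from the real driver.
sudo podman create \
    --name heat-container-agent \
    --privileged --net=host \
    --volume /var/lib/cloud:/var/lib/cloud \
    docker.io/openstackmagnum/heat-container-agent:train-dev

# Write a systemd unit that starts the container at boot (assumed unit content).
sudo tee /etc/systemd/system/heat-container-agent.service <<'EOF'
[Unit]
Description=Heat container agent (podman)
After=network-online.target

[Service]
ExecStart=/usr/bin/podman start -a heat-container-agent
ExecStop=/usr/bin/podman stop -t 10 heat-container-agent
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now heat-container-agent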
09:30:00 <strigazi> #topic Nodegroups
09:30:03 <brtknr> i am just concerned about the risks as it seems experimental
09:30:23 <brtknr> as with all exciting things XD
09:30:57 <strigazi> compared with centos, openstack and kubernetes are too experimental
09:31:28 <strigazi> or compared with debian, apache server, I can bring more :)
09:32:05 <brtknr> i think we need to find out from the fcos community which things are going to stay and which may be uprooted
09:32:12 <brtknr> but happy to move on to nodegroups
09:35:10 <strigazi> we are fixing an issue with labels, brtknr did you find anything?
09:35:19 <strigazi> did manage to add nodegroups?
09:35:23 <strigazi> did you manage to add nodegroups?
09:36:45 <brtknr> strigazi: i just created a cluster but i was using kube_tag=v1.16.0 so it failed
09:36:55 <brtknr> retrying now with v1.15.3
09:38:26 <brtknr> but i have tested the full lifecycle in one of the earlier patchsets
09:38:33 <brtknr> create, update and delete, also scaling
09:38:41 <brtknr> and everything seemed to work for me
09:39:01 <brtknr> also nice work adding the tests ttsiouts
09:39:13 <brtknr> it feels like a complete package now
09:39:42 <ttsiouts> brtknr: i'll push again today addressing your comments
09:40:08 <ttsiouts> brtknr: we also identified this issue with labels that strigazi mentioned
09:40:34 <ttsiouts> brtknr: thanks for testing!
09:43:01 <strigazi> brtknr: perfect
09:43:12 <brtknr> ttsiouts: i will repost my output to ng-6 saying everything is working for me
09:44:25 <strigazi> excellent
09:44:32 <strigazi> oh, one more thing
09:44:51 <strigazi> for nodegroups, we (CERN) need to spawn cluster across projects
09:44:58 <strigazi> for nodegroups, we (CERN) need to spawn clusters across projects
09:45:14 <strigazi> e.g. ng1 in project p1 and ng2 in project p2
09:45:16 <brtknr> so one nodegroup in one project and another in a different project?
09:45:21 <strigazi> yes
09:45:28 <brtknr> that sounds messy...
09:45:38 <strigazi> in the db of ngs, we have project_id already
09:45:40 <brtknr> isn't there tenant isolation between networks?
09:45:43 <strigazi> nova is messy
09:46:12 <strigazi> so the mess comes from tehre
09:46:15 <strigazi> so the mess comes from there
09:46:26 <brtknr> or are you planning to use the public interface?
09:46:31 <brtknr> s/public/external
09:46:33 <strigazi> public interface
09:46:44 <brtknr> hmm interesting
09:46:52 <strigazi> yes, it depends on what you use the cluster for
09:47:00 <flwang1> sorry for being late
09:47:15 <brtknr> hi flwang1 :)
09:47:21 <flwang1> was taking care of sick kids
09:47:26 <flwang1> brtknr: hello
09:47:29 <flwang1> is strigazi around?
09:47:31 <strigazi> for our usage it is not an issue, and it is opt-in anyway
09:47:34 <brtknr> yep
09:47:36 <strigazi> hi flwang1
09:47:40 <flwang1> strigazi: hello
09:47:51 <brtknr> flwang1: hope the kids get better!
09:48:14 <flwang1> brtknr: thanks
09:48:28 <flwang1> my daughter has had a fever since yesterday
09:48:55 <brtknr> strigazi: is multi project supported in the current ng implementation?
09:48:59 <flwang1> anything i can help with or provide my opinion on?
09:49:17 <flwang1> oh, you're discussing the ng stuff
09:49:26 <strigazi> brtknr: no, but it is a small change
09:49:29 <brtknr> flwang1: i cannot reproduce the issue you commented on in the ng-6 patch
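For reference, a rough sketch of the nodegroup lifecycle brtknr describes testing above (create, update/scale, delete), using the CLI syntax proposed in the python-magnumclient nodegroup patches that were still under review at the time; the cluster name, nodegroup name, flavor, and node counts are made up for illustration, and the exact flags may differ from what finally merged.

# Assumed CLI from the in-review python-magnumclient nodegroup support;
# "k8s-cluster", "workers-gpu", flavor and counts are illustrative only.
openstack coe nodegroup list k8s-cluster

openstack coe nodegroup create k8s-cluster workers-gpu \
    --node-count 2 \
    --flavor m1.large \
    --role worker

# Scaling goes through cluster resize, targeting the nodegroup explicitly
# (the --nodegroup option is part of the same in-review work).
openstack coe cluster resize k8s-cluster 4 --nodegroup workers-gpu

openstack coe nodegroup show k8s-cluster workers-gpu

openstack coe nodegroup delete k8s-cluster workers-gpu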
09:49:51 <brtknr> strigazi: is this for admins only?
09:49:58 <strigazi> brtknr: no
09:50:09 <strigazi> brtknr: 100% for uers
09:50:15 <strigazi> brtknr: 100% for users
09:51:02 <flwang1> brtknr: in my testing, after removing the 4 new fields from the nodegroup table, the cluster becomes stable
09:51:04 <strigazi> brtknr: nova doesn't have accounting for gpus, FPGAs; Ironic cpus are == to vcpus
09:51:09 <flwang1> i haven't dug into the root cause
09:51:18 <strigazi> flwang1: what fields?
09:51:28 <strigazi> what are you talking about?
09:51:32 <flwang1> stack_id, status, status_reason, version
09:51:48 <strigazi> you dropped things from the db?
09:51:54 <strigazi> are all migrations done?
09:52:07 <brtknr> flwang1: did you apply `magnum-db-manage upgrade` after checking out ng-9?
09:52:13 <flwang1> i did
09:52:16 <flwang1> for sure
09:52:24 <brtknr> i didn't need to delete anything
09:52:32 <flwang1> i mean
09:52:37 <strigazi> what is the error?
09:52:51 <strigazi> not the VM restarts/rebuilds, that is irrelevant
09:52:54 <flwang1> i have mentioned the error i saw in the ng-6 patch
09:53:09 <brtknr> i also had to check out the change in python-magnumclient and then `pip install -e .`
09:53:42 <flwang1> the problem i got is the vm restart/rebuild
09:54:07 <strigazi> bad nova
09:54:11 <strigazi> no resources
09:54:23 <strigazi> when heat sends a req to nova
09:54:27 <strigazi> and nova fails
09:54:31 <strigazi> heat retries
09:54:40 <strigazi> deletes the old vm and tries again
09:54:58 <strigazi> same everything but different uuid
09:55:00 <flwang1> strigazi: so you mean it's because my env (devstack) is lacking resources?
09:55:11 <strigazi> this happens when you don't have resources
09:55:12 <strigazi> yes
09:55:26 <strigazi> or it misbehaves in some other way
09:55:30 <flwang1> strigazi: ok, i will test again tomorrow then
09:55:34 <strigazi> e.g. it can't create ports
09:55:52 <strigazi> try the minimum possible
09:56:28 <flwang1> ok, i don't really worry about the ng work, overall it looks good to me
09:56:57 <strigazi> ok, if I +2 and bharat verifies, are you ok?
09:57:32 <strigazi> we test at cern in three different dev envs, plus bharat's tests
09:58:17 <flwang1> strigazi: i'm ok with that
09:58:33 <strigazi> flwang1: for train?
09:59:03 <brtknr> I'm mostly happy to get things merged after the rebase and addressing all the minor comments, now that we also have solid unit tests... i am sure we will find minor issues with it later but it's been hanging around for too long :)
09:59:10 <flwang1> strigazi: for train
09:59:15 <flwang1> just one silly question
09:59:29 <flwang1> what does the 'version' stand for in the ng table?
09:59:38 <flwang1> i can't see a description for that
10:00:04 <strigazi> placeholder for upgrades with node replacement
10:00:40 <strigazi> now it will work as it is implemented
10:00:48 <strigazi> or we can leverage it now
10:01:05 <flwang1> so it's a version as kube_tag?
10:01:42 <strigazi> give me 5', sorru
10:01:44 <strigazi> give me 5', sorry
10:02:36 <brtknr> i have to leave in 30 minutes for our team standup
10:03:17 <flwang1> brtknr: no problem
10:03:38 <flwang1> i will be offline in 15 mins as well
10:03:56 <flwang1> i'm addressing the comments from the heat team on the ignition patch
10:04:07 <flwang1> i'm very happy they're generally OK with that
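A short sketch of the verification steps brtknr walks flwang1 through above, assuming a devstack-style environment; the gerrit change number is a placeholder for the nodegroup client patch that was under review, and the service unit names are assumptions that may need adjusting to your deployment.

# Apply the new nodegroup migrations to the magnum database.
magnum-db-manage upgrade

# Check out the matching python-magnumclient change and install it
# (<CHANGE_NUMBER> is a placeholder for the in-review ng client patch).
cd python-magnumclient
git review -d <CHANGE_NUMBER>
pip install -e .

# Restart the magnum services so API and conductor pick up the new code
# (assumed devstack unit names; adjust to your setup).
sudo systemctl restart devstack@magnum-api devstack@magnum-cond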
10:04:18 <brtknr> btw can we start using etherpad for the agenda like other teams, e.g. keystone: https://etherpad.openstack.org/p/keystone-weekly-meeting
10:04:36 <brtknr> and put a link to this in the channel's idle topic
10:04:40 <flwang1> brtknr: we were using the wiki, but i'm ok with etherpad
10:05:15 <brtknr> or a link to the wiki... i prefer the etherpad UI...
10:05:35 <brtknr> https://etherpad.openstack.org/p/magnum-weekly-meeting
10:05:37 <brtknr> there
10:05:39 <brtknr> :)
10:07:21 <flwang1> cool
10:08:15 <flwang1> i just proposed a new patchset for the ignition patch
10:14:52 <brtknr> flwang1: looks pretty solid
10:16:02 <flwang1> after it's done, we still have quite a lot of work on the magnum side to get the fedora coreos driver ready
10:16:04 <strigazi> where were we?
10:16:30 <flwang1> strigazi: the fedora coreos driver
10:16:43 <strigazi> flwang1: for ngs
10:16:50 <flwang1> for ngs
10:17:02 <flwang1> (22:01:04) flwang1: so it's a version as kube_tag?
10:17:04 <strigazi> flwang1: the ngs in different projects, is there an issue?
10:17:15 <strigazi> flwang1: oh, this
10:17:24 <strigazi> flwang1: we can use it now too
10:17:48 <flwang1> so the version is the coe version of the current node group?
10:18:10 <flwang1> the current name 'version' is quite confusing to me
10:19:14 <strigazi> this is an incremental version for the ng
10:19:23 <strigazi> so that we have some tracking
10:19:43 <strigazi> when a user upgrades something
10:20:18 <strigazi> but for now it is a placeholder, to be implemented
10:20:56 <strigazi> makes sense?
10:21:49 <strigazi> brtknr: before you go, any reason not to have ngs in different projects as an opt-in option?
10:22:22 <strigazi> brtknr: flwang1: still here?
10:22:24 <brtknr> strigazi: i don't have major objections to it but perhaps this can be added on later?
10:22:29 <flwang1> i'm
10:22:38 <flwang1> i'm thinking and checking the gke api
10:22:40 <brtknr> or is it required imminently
10:22:44 <flwang1> https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1/projects.locations.clusters.nodePools#NodePool
10:22:48 <strigazi> brtknr: why? for us it is
10:23:00 <flwang1> what do you mean by ngs in different projects?
10:23:44 <strigazi> flwang1: because nova doesn't have accounting for GPUs, FPGAs, and ironic cpus are accounted as vcpus
10:23:45 <brtknr> strigazi: i find it slightly unintuitive
10:24:13 <strigazi> brtknr: we won't advertise this
10:24:14 <brtknr> i was under the impression that projects imply complete separation
10:24:39 <strigazi> this doesn't say much ^^
10:24:46 <brtknr> i prefer supporting ng per region under the same project
10:24:51 <strigazi> we do multicloud applications
10:24:54 <flwang1> strigazi: can you explain 'ngs in different projects'?
10:25:17 <brtknr> flwang1: so ng1 lives in project A and ng2 lives in project B, both part of the same cluster
10:25:22 <flwang1> does that mean cluster 1 in project A can have a NG which belongs to project B?
10:25:32 <strigazi> again, this is opt-in
10:25:44 <flwang1> brtknr: that doesn't sound good to me
10:26:00 <brtknr> i was under the impression that a cluster belongs to a project
10:26:07 <strigazi> if magnum doesn't have it, we will investigate something else
10:26:30 <flwang1> if we want to have it, it needs to be disabled by default
10:26:45 <flwang1> unless the cloud operators enable it
10:26:49 <brtknr> it then seems like a jump in logic to have child nodegroups spanning different projects
10:26:56 <strigazi> that makes 100% sense
10:27:10 <strigazi> "if we want to have it, it needs to be disabled by default" - that
10:27:38 <strigazi> brtknr: how do you do accounting for ironic nodes mixed with vms?
10:28:09 <strigazi> everything starts from there
10:28:14 <strigazi> and nova cells
10:28:50 <strigazi> in the ideal openstack cloud, I understand, it does not make sense.
10:29:20 <brtknr> strigazi: okay i'm happy with disabled by default.
10:30:34 <strigazi> flwang1: ?
10:30:56 <strigazi> policy or config option?
10:31:01 <flwang1> i'm ok, if it's disabled by default
10:31:07 <flwang1> config
10:31:35 <flwang1> i just worry about the security hell
10:31:38 <strigazi> flwang1: brtknr I'll send you a presentation on why we do it
10:31:46 <flwang1> strigazi: pls do
10:32:07 <strigazi> flwang1: don't, because for a cloud with a proper network it won't work anyway
10:32:27 <strigazi> well, it can work
10:32:35 <strigazi> but you need an extra router
10:32:41 <strigazi> a vrouter
10:33:06 <flwang1> from a public cloud pov, it doesn't make sense
10:33:06 <strigazi> flwang1: this might also be useful for running the master nodes in the operator's tenant :)
10:33:17 <strigazi> well, see my comment above :)
10:33:22 <flwang1> strigazi: i can see the extra benefit ;)
10:33:40 <flwang1> i have to go, sorry
10:33:43 <strigazi> sorry, I completely forgot about that
10:33:46 <flwang1> it's late here
10:33:50 <strigazi> ok
10:33:57 <strigazi> see you
10:34:02 <strigazi> #endmeeting