09:08:41 <strigazi> #startmeeting containers
09:08:42 <openstack> Meeting started Wed Sep 25 09:08:41 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:08:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:08:46 <openstack> The meeting name has been set to 'containers'
09:08:50 <strigazi> #topic Roll Call
09:08:55 <strigazi> o/
09:09:15 <jakeyip> o/
09:10:44 <strigazi> brtknr:
09:12:06 <brtknr> o/
09:13:12 <strigazi> #topic Stories and Tasks
09:13:16 <ttsiouts> o/
09:13:33 <strigazi> let's quickly discuss fedora coreos status and reasoning
09:13:44 <strigazi> then nodegroups
09:13:51 <strigazi> brtknr: jakeyip anything else you want to discuss
09:13:52 <strigazi> ?
09:14:09 <brtknr> stein backports
09:14:17 <jakeyip> nothing from me
09:14:34 <strigazi> ok
09:15:05 <brtknr> also when to cut the train release
09:15:23 <strigazi> So for CoreOS
09:15:48 <strigazi> 1. we need to change from Atomic, there is no discussion around it
09:16:14 <strigazi> 2. Fedora CoreOS is the "replacement" supported by the same team
09:16:33 <strigazi> I say replacement because it is not a drop-in replacement
09:16:44 <strigazi> I mean replacement in quotes
09:17:08 <strigazi> reasons to use it, at least from my POV
09:17:22 <strigazi> we have good communication with that community
09:18:13 <strigazi> the goal is to run the stock OS and run everything in containers
09:18:21 <brtknr> also they told me yesterday that they would like to support our use case transition from `atomic install --system ...`
09:18:43 <strigazi> the transition is probably podman run
09:20:16 <strigazi> any counter argument?
09:20:57 <jakeyip> sounds good
09:20:59 <brtknr> at first, my worry was no more `atomic` but i am more reassured by the fact that the intended replacement is podman/docker
09:21:19 <strigazi> the work required is around the heat agent and a replacement for the atomic cli
09:21:37 <brtknr> we should be able to run privileged containers for the kube-* services, right?
09:21:58 <strigazi> atomic is just a python cli that writes a systemd unit which does "runc run"
09:22:24 <strigazi> we could
09:22:43 <brtknr> and podman is containers running under systemd iiuc
09:23:05 <strigazi> I hope so at least, because k8s 1.16 is not playing nice in a container
09:23:12 <strigazi> yes
09:23:38 <strigazi> like my comment in https://review.opendev.org/#/c/678458/
09:23:45 <brtknr> >I hope so at least, because k8s 1.16 is not playing nice in a container
09:23:48 <brtknr> in what way?
09:24:13 <strigazi> the kubelet container is not propagating the mounts to the host
09:24:36 <strigazi> only kubelet, the others are fine
09:25:11 <strigazi> let's move to nodegroups? we won't solve this here
09:25:22 <strigazi> I mean the 1.16 issue
09:25:40 <brtknr> sounds like a problem with podman?
09:26:09 <strigazi> that was with atomic, not podman
09:26:17 <strigazi> podman, atomic, they all use runc
09:26:48 <brtknr> okay.. anyway i think what we need is to convince ourselves fcos is the best alternative
09:27:49 <brtknr> before we build more momentum
09:28:20 <strigazi> I'm convinced, whenever I ask the ubuntu community for help, I find the door closed
09:28:51 <brtknr> I suppose the community is an important aspect...
09:29:00 <strigazi> brtknr: what are your concerns?
09:29:48 <strigazi> anyway, we will go to their meeting and we will see.
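For context, a minimal sketch of what the `atomic install --system` replacement discussed above could look like on Fedora CoreOS: a podman-managed container wrapped in a hand-written systemd unit. This is not the driver's actual implementation; the unit name, image tag, and mount list are assumptions for illustration only.

# Hypothetical sketch: run the heat agent under podman + systemd, roughly
# mirroring the unit that `atomic install --system` used to generate.
# Image tag and bind mounts are assumed, not taken from the real driver.
sudo podman create \
    --name heat-container-agent \
    --privileged --net=host \
    --volume /var/lib/cloud:/var/lib/cloud \
    docker.io/openstackmagnum/heat-container-agent:train-dev

# Write a systemd unit that starts the container at boot (assumed unit content).
sudo tee /etc/systemd/system/heat-container-agent.service <<'EOF'
[Unit]
Description=Heat container agent (podman)
After=network-online.target

[Service]
ExecStart=/usr/bin/podman start -a heat-container-agent
ExecStop=/usr/bin/podman stop -t 10 heat-container-agent
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now heat-container-agent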
09:30:00 <strigazi> #topic Nodegroups
09:30:03 <brtknr> i am just concerned about the risks as it seems experimental
09:30:23 <brtknr> as with all exciting things XD
09:30:57 <strigazi> compared with centos, openstack and kubernetes are too experimental
09:31:28 <strigazi> or compared with debian, apache server, I can bring more :)
09:32:05 <brtknr> i think we need to find out from the fcos community which things are going to stay and which may be uprooted
09:32:12 <brtknr> but happy to move on to nodegroups
09:35:10 <strigazi> we are fixing an issue with labels, brtknr did you find anything?
09:35:19 <strigazi> did manage to add nodegroups?
09:35:23 <strigazi> did you manage to add nodegroups?
09:36:45 <brtknr> strigazi: i just created a cluster but i was using kube_tag=v1.16.0 so it failed
09:36:55 <brtknr> retrying now with v1.15.3
09:38:26 <brtknr> but i have tested the full lifecycle in one of the earlier patchsets
09:38:33 <brtknr> create, update and delete, also scaling
09:38:41 <brtknr> and everything seemed to work for me
09:39:01 <brtknr> also nice work adding the tests ttsiouts
09:39:13 <brtknr> it feels like a complete package now
09:39:42 <ttsiouts> brtknr: i'll push again today addressing your comments
09:40:08 <ttsiouts> brtknr: we also identified this issue with labels that strigazi mentioned
09:40:34 <ttsiouts> brtknr: thanks for testing!
09:43:01 <strigazi> brtknr: perfect
09:43:12 <brtknr> ttsiouts: i will repost my output to ng-6 saying everything is working for me
09:44:25 <strigazi> excellent
09:44:32 <strigazi> oh, one more thing
09:44:51 <strigazi> for nodegroups, we (CERN) need to spawn cluster across projects
09:44:58 <strigazi> for nodegroups, we (CERN) need to spawn clusters across projects
09:45:14 <strigazi> e.g. ng1 in project p1 and ng2 in project p2
09:45:16 <brtknr> so one nodegroup in one project and another in a different project?
09:45:21 <strigazi> yes
09:45:28 <brtknr> that sounds messy...
09:45:38 <strigazi> in the db of ngs, we have project_id already
09:45:40 <brtknr> isn't there tenant isolation between networks?
09:45:43 <strigazi> nova is messy
09:46:12 <strigazi> so the mess comes from tehre
09:46:15 <strigazi> so the mess comes from there
09:46:26 <brtknr> or are you planning to use the public interface?
09:46:31 <brtknr> s/public/external
09:46:33 <strigazi> public interface
09:46:44 <brtknr> hmm interesting
09:46:52 <strigazi> yes, it depends on what you use the cluster for
09:47:00 <flwang1> sorry for being late
09:47:15 <brtknr> hi flwang1 :)
09:47:21 <flwang1> was taking care of sick kids
09:47:26 <flwang1> brtknr: hello
09:47:29 <flwang1> is strigazi around?
09:47:31 <strigazi> for our usage it is not an issue, and it is opt-in anyway
09:47:34 <brtknr> yep
09:47:36 <strigazi> hi flwang1
09:47:40 <flwang1> strigazi: hello
09:47:51 <brtknr> flwang1: hope the kids get better!
09:48:14 <flwang1> brtknr: thanks
09:48:28 <flwang1> my daughter has had a fever since yesterday
09:48:55 <brtknr> strigazi: is multi project supported in the current ng implementation?
09:48:59 <flwang1> anything i can help with or provide my opinion on?
09:49:17 <flwang1> oh, you're discussing the ng stuff
09:49:26 <strigazi> brtknr: no, but it is a small change
09:49:29 <brtknr> flwang1: i cannot reproduce the issue you commented on in the ng-6 patch
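For reference, a rough sketch of the nodegroup lifecycle brtknr describes testing above (create, update/scale, delete), using the CLI syntax proposed in the python-magnumclient nodegroup patches that were still under review at the time; the cluster name, nodegroup name, flavor, and node counts are made up for illustration, and the exact flags may differ from what finally merged.

# Assumed CLI from the in-review python-magnumclient nodegroup support;
# "k8s-cluster", "workers-gpu", flavor and counts are illustrative only.
openstack coe nodegroup list k8s-cluster

openstack coe nodegroup create k8s-cluster workers-gpu \
    --node-count 2 \
    --flavor m1.large \
    --role worker

# Scaling goes through cluster resize, targeting the nodegroup explicitly
# (the --nodegroup option is part of the same in-review work).
openstack coe cluster resize k8s-cluster 4 --nodegroup workers-gpu

openstack coe nodegroup show k8s-cluster workers-gpu

openstack coe nodegroup delete k8s-cluster workers-gpu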
09:49:51 <brtknr> strigazi: is this for admins only?
09:49:58 <strigazi> brtknr: no
09:50:09 <strigazi> brtknr: 100% for uers
09:50:15 <strigazi> brtknr: 100% for users
09:51:02 <flwang1> brtknr: in my testing, after removing the 4 new fields from the nodegroup table, the cluster becomes stable
09:51:04 <strigazi> brtknr: nova doesn't have accounting for gpus, FPGAs; Ironic cpus are == to vcpus
09:51:09 <flwang1> i haven't dug into the root cause
09:51:18 <strigazi> flwang1: what fields?
09:51:28 <strigazi> what are you talking about?
09:51:32 <flwang1> stack_id, status, status_reason, version
09:51:48 <strigazi> you dropped things from the db?
09:51:54 <strigazi> are all migrations done?
09:52:07 <brtknr> flwang1: did you apply `magnum-db-manage upgrade` after checking out ng-9?
09:52:13 <flwang1> i did
09:52:16 <flwang1> for sure
09:52:24 <brtknr> i didn't need to delete anything
09:52:32 <flwang1> i mean
09:52:37 <strigazi> what is the error?
09:52:51 <strigazi> not the VM restarts/rebuilds, that is irrelevant
09:52:54 <flwang1> i have mentioned the error i saw in the ng-6 patch
09:53:09 <brtknr> i also had to check out the change in python-magnumclient and then `pip install -e .`
09:53:42 <flwang1> the problem i got is the vm restart/rebuild
09:54:07 <strigazi> bad nova
09:54:11 <strigazi> no resources
09:54:23 <strigazi> when heat sends a req to nova
09:54:27 <strigazi> and nova fails
09:54:31 <strigazi> heat retries
09:54:40 <strigazi> deletes the old vm and tries again
09:54:58 <strigazi> same everything but different uuid
09:55:00 <flwang1> strigazi: so you mean it's because my env (devstack) is lacking resources?
09:55:11 <strigazi> this happens when you don't have resources
09:55:12 <strigazi> yes
09:55:26 <strigazi> or it misbehaves in some other way
09:55:30 <flwang1> strigazi: ok, i will test again tomorrow then
09:55:34 <strigazi> e.g. it can't create ports
09:55:52 <strigazi> try the minimum possible
09:56:28 <flwang1> ok, i don't really worry about the ng work, overall it looks good to me
09:56:57 <strigazi> ok, if I +2 and bharat verifies, are you ok?
09:57:32 <strigazi> we test at cern in three different dev envs, plus bharat's tests
09:58:17 <flwang1> strigazi: i'm ok with that
09:58:33 <strigazi> flwang1: for train?
09:59:03 <brtknr> I'm mostly happy to get things merged after the rebase and addressing all the minor comments, now that we also have solid unit tests... i am sure we will find minor issues with it later but it's been hanging around for too long :)
09:59:10 <flwang1> strigazi: for train
09:59:15 <flwang1> just one silly question
09:59:29 <flwang1> what does the 'version' stand for in the ng table?
09:59:38 <flwang1> i can't see a description for that
10:00:04 <strigazi> placeholder for upgrades with node replacement
10:00:40 <strigazi> now it will work as it is implemented
10:00:48 <strigazi> or we can leverage it now
10:01:05 <flwang1> so it's a version as kube_tag?
10:01:42 <strigazi> give me 5', sorru
10:01:44 <strigazi> give me 5', sorry
10:02:36 <brtknr> i have to leave in 30 minutes for our team standup
10:03:17 <flwang1> brtknr: no problem
10:03:38 <flwang1> i will be offline in 15 mins as well
10:03:56 <flwang1> i'm addressing the comments from the heat team on the ignition patch
10:04:07 <flwang1> i'm very happy they're generally OK with that
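A short sketch of the verification steps brtknr walks flwang1 through above, assuming a devstack-style environment; the gerrit change number is a placeholder for the nodegroup client patch that was under review, and the service unit names are assumptions that may need adjusting to your deployment.

# Apply the new nodegroup migrations to the magnum database.
magnum-db-manage upgrade

# Check out the matching python-magnumclient change and install it
# (<CHANGE_NUMBER> is a placeholder for the in-review ng client patch).
cd python-magnumclient
git review -d <CHANGE_NUMBER>
pip install -e .

# Restart the magnum services so API and conductor pick up the new code
# (assumed devstack unit names; adjust to your setup).
sudo systemctl restart devstack@magnum-api devstack@magnum-cond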
10:04:18 <brtknr> btw can we start using etherpad for the agenda like other teams, e.g. keystone: https://etherpad.openstack.org/p/keystone-weekly-meeting
10:04:36 <brtknr> and put a link to this in the channel's idle topic
10:04:40 <flwang1> brtknr: we were using the wiki, but i'm ok with etherpad
10:05:15 <brtknr> or a link to the wiki... i prefer the etherpad UI...
10:05:35 <brtknr> https://etherpad.openstack.org/p/magnum-weekly-meeting
10:05:37 <brtknr> there
10:05:39 <brtknr> :)
10:07:21 <flwang1> cool
10:08:15 <flwang1> i just proposed a new patchset for the ignition patch
10:14:52 <brtknr> flwang1: looks pretty solid
10:16:02 <flwang1> after it's done, we still have quite a lot of work on the magnum side to get the fedora coreos driver ready
10:16:04 <strigazi> where were we?
10:16:30 <flwang1> strigazi: the fedora coreos driver
10:16:43 <strigazi> flwang1: for ngs
10:16:50 <flwang1> for ngs
10:17:02 <flwang1> (22:01:04) flwang1: so it's a version as kube_tag?
10:17:04 <strigazi> flwang1: the ngs in different projects, is there an issue?
10:17:15 <strigazi> flwang1: oh, this
10:17:24 <strigazi> flwang1: we can use it now too
10:17:48 <flwang1> so the version is the coe version of the current node group?
10:18:10 <flwang1> the current name 'version' is quite confusing to me
10:19:14 <strigazi> this is an incremental version for the ng
10:19:23 <strigazi> so that we have some tracking
10:19:43 <strigazi> when a user upgrades something
10:20:18 <strigazi> but for now it is a placeholder, to be implemented
10:20:56 <strigazi> makes sense?
10:21:49 <strigazi> brtknr: before you go, any reason not to have ngs in different projects as an opt-in option?
10:22:22 <strigazi> brtknr: flwang1: still here?
10:22:24 <brtknr> strigazi: i don't have major objections to it but perhaps this can be added on later?
10:22:29 <flwang1> i'm
10:22:38 <flwang1> i'm thinking and checking the gke api
10:22:40 <brtknr> or is it required imminently
10:22:44 <flwang1> https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1/projects.locations.clusters.nodePools#NodePool
10:22:48 <strigazi> brtknr: why? for us it is
10:23:00 <flwang1> what do you mean by ngs in different projects?
10:23:44 <strigazi> flwang1: because nova doesn't have accounting for GPUs, FPGAs, and ironic cpus are accounted as vcpus
10:23:45 <brtknr> strigazi: i find it slightly unintuitive
10:24:13 <strigazi> brtknr: we won't advertise this
10:24:14 <brtknr> i was under the impression that projects imply complete separation
10:24:39 <strigazi> this doesn't say much ^^
10:24:46 <brtknr> i prefer supporting ng per region under the same project
10:24:51 <strigazi> we do multicloud applications
10:24:54 <flwang1> strigazi: can you explain 'ngs in different projects'?
10:25:17 <brtknr> flwang1: so ng1 lives in project A and ng2 lives in project B, both part of the same cluster
10:25:22 <flwang1> does that mean cluster 1 in project A can have a NG which belongs to project B?
10:25:32 <strigazi> again, this is opt-in
10:25:44 <flwang1> brtknr: that doesn't sound good to me
10:26:00 <brtknr> i was under the impression that a cluster belongs to a project
10:26:07 <strigazi> if magnum doesn't have it, we will investigate something else
10:26:30 <flwang1> if we want to have it, it needs to be disabled by default
10:26:45 <flwang1> unless the cloud operators enable it
10:26:49 <brtknr> it then seems like a jump in logic to have child nodegroups spanning different projects
10:26:56 <strigazi> that makes 100% sense
10:27:10 <strigazi> "if we want to have it, it needs to be disabled by default" - that
10:27:38 <strigazi> brtknr: how do you do accounting for ironic nodes mixed with vms?
10:28:09 <strigazi> everything starts from there
10:28:14 <strigazi> and nova cells
10:28:50 <strigazi> in the ideal openstack cloud, I understand, it does not make sense.
10:29:20 <brtknr> strigazi: okay i'm happy with disabled by default.
10:30:34 <strigazi> flwang1: ?
10:30:56 <strigazi> policy or config option?
10:31:01 <flwang1> i'm ok, if it's disabled by default
10:31:07 <flwang1> config
10:31:35 <flwang1> i just worry about the security hell
10:31:38 <strigazi> flwang1: brtknr I'll send you a presentation on why we do it
10:31:46 <flwang1> strigazi: pls do
10:32:07 <strigazi> flwang1: don't, because for a cloud with a proper network it won't work anyway
10:32:27 <strigazi> well, it can work
10:32:35 <strigazi> but you need an extra router
10:32:41 <strigazi> a vrouter
10:33:06 <flwang1> from a public cloud pov, it doesn't make sense
10:33:06 <strigazi> flwang1: this might also be useful for running the master nodes in the operator's tenant :)
10:33:17 <strigazi> well, see my comment above :)
10:33:22 <flwang1> strigazi: i can see the extra benefit ;)
10:33:40 <flwang1> i have to go, sorry
10:33:43 <strigazi> sorry, I completely forgot about that
10:33:46 <flwang1> it's late here
10:33:50 <strigazi> ok
10:33:57 <strigazi> see you
10:34:02 <strigazi> #endmeeting