14:01:25 <irenab> #startmeeting kuryr
14:01:26 <openstack> Meeting started Mon Jan 8 14:01:25 2018 UTC and is due to finish in 60 minutes. The chair is irenab. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:29 <openstack> The meeting name has been set to 'kuryr'
14:01:51 <irenab> hi, who is here for kuryr weekly?
14:02:06 <yboaron> +1
14:02:26 <ltomasbo> hi!
14:02:38 <dulek> o/
14:03:00 <irenab> #chair ltomasbo
14:03:01 <openstack> Current chairs: irenab ltomasbo
14:03:15 <irenab> ltomasbo, added you since I will have to leave in about 20 mins
14:03:41 <irenab> lets start with kuryr-kubernetes?
14:03:46 <apuimedo> yes
14:03:48 <irenab> #chair apuimedo
14:03:49 <openstack> Current chairs: apuimedo irenab ltomasbo
14:03:59 <ltomasbo> ok
14:04:03 <irenab> #topic kuryr-kubernetes
14:04:38 <apuimedo> #info an important fix for kubelet retries has been merged 210603
14:04:43 <apuimedo> damn, wrong number
14:04:54 <apuimedo> #link https://review.openstack.org/518404
14:05:30 <apuimedo> #info The new readiness check server has been marked for merging too
14:05:47 <irenab> apuimedo, question on that one
14:05:51 <dulek> I'm not sure if 518404 is so important, but it helps a bit with failures. :)
14:06:37 <irenab> Regarding the readiness check, I suggest making the probe loadable via stevedore to keep the kuryr infra as generic as it used to be
14:06:56 <apuimedo> dulek: it helps prevent a very frustrating issue
14:06:58 <irenab> Can be done as a follow-up, just wanted to check if agreed
14:07:02 <apuimedo> makes it important for UX
14:07:15 <irenab> dulek, link?
14:07:18 <apuimedo> irenab: you meant the stevedorization?
14:07:29 <irenab> apuimedo, yes :-)
14:07:36 <dulek> irenab: I meant https://review.openstack.org/518404 - the retries fix.
14:07:39 <apuimedo> maysamacedos is here too now
14:07:59 <irenab> dulek, important fix
14:08:22 <apuimedo> irenab: well, I do think that kuryr-kubernetes should be able to have methods that are used for readiness and health checks
14:08:47 <apuimedo> so that when they are instantiated they get registered with the readiness/health check servers
14:09:02 <irenab> I kind of like the way we had kuryr as a generic integration framework, so it could be used to have non-Neutron drivers
14:09:11 <apuimedo> irenab: dulek: granted, it only affected baremetal
14:09:17 <irenab> apuimedo, no argument on your point
14:09:19 <apuimedo> so that makes it slightly less damning
14:09:51 <irenab> I just think that it should not be explicitly hard-coded in the kuryr server code, but loadable based on deployment option
14:09:57 <apuimedo> irenab: currently maysamacedos and I were about to discuss the approach for the liveness checks
14:10:31 <apuimedo> I suggested having the liveness check server hold references to the handler threads for checks
14:10:51 <apuimedo> but it would probably be better that the references are used to ask the handler instances themselves
14:11:03 <irenab> apuimedo, not sure I follow
14:11:05 <apuimedo> since it could happen that the thread is alive
14:11:16 <apuimedo> but the handler is in a bad state
14:11:43 <apuimedo> irenab: we want to add liveness checks that monitor the health of the vif/service/etc handlers as well
14:12:03 <dulek> apuimedo: You mean the watcher threads?
14:12:19 <apuimedo> dulek: yes, the different threads on which the handlers run
14:12:49 <irenab> apuimedo, important to have that too.
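A minimal sketch of the liveness point apuimedo makes above: a watcher thread can be alive while its handler is in a bad state, so the probe should ask the handler instance as well as the thread. All names here (DummyHandler, LivenessCheck, is_healthy) are invented for illustration, not the kuryr-kubernetes API:

```python
import threading
import time


class DummyHandler:
    """Stands in for a kuryr event handler."""

    def __init__(self):
        self.failed = False

    def is_healthy(self):
        # The thread may be alive while the handler is wedged,
        # so liveness must also consult the instance's own state.
        return not self.failed


class LivenessCheck:
    def __init__(self):
        self._targets = []

    def register(self, thread, handler):
        self._targets.append((thread, handler))

    def alive(self):
        return all(t.is_alive() and h.is_healthy()
                   for t, h in self._targets)


if __name__ == '__main__':
    handler = DummyHandler()
    thread = threading.Thread(target=time.sleep, args=(5,))
    thread.start()
    check = LivenessCheck()
    check.register(thread, handler)
    print(check.alive())   # True: thread alive, handler healthy
    handler.failed = True
    print(check.alive())   # False: thread still alive, handler unhealthy
```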
My point is that we need to have this not explicit, but based on the drivers used in the deployment, to keep the current pluggability
14:13:04 <dulek> apuimedo: I'm not sure if it's worth it, I think Watcher threads have protection from unexpected exceptions.
14:13:05 <apuimedo> what I'm saying is that probably we should have protocols for instances to register to the readiness/liveness check servers
14:13:13 <apuimedo> that can be implemented by any instance
14:13:23 <dulek> apuimedo: Ah, that might be better.
14:13:29 <apuimedo> irenab: yes, but maybe we don't need stevedore for that
14:13:37 <apuimedo> stevedore handles loading of the instances
14:13:40 <apuimedo> that part we already have
14:13:48 <apuimedo> I'm just saying that as part of the loading
14:13:53 <irenab> apuimedo, not for the readiness probe
14:13:54 <apuimedo> that currently exists
14:14:15 <apuimedo> they could register with readiness/liveness depending on which of those two protocols they implement
14:14:26 <apuimedo> dulek: irenab: maysamacedos: how does that sound?
14:14:39 <irenab> line 53 is bothering me https://review.openstack.org/#/c/529335/17/kuryr_kubernetes/controller/service.py
14:15:08 <apuimedo> irenab: why?
14:15:17 <apuimedo> I think we always want a readiness check server
14:15:27 <apuimedo> what should be different
14:15:43 <irenab> explicit call to the probe that checks for neutron and keystone. What if kuryr is used for non-Neutron drivers?
14:15:53 <apuimedo> is that only checks for components that have been plugged should be performed in it
14:16:03 <apuimedo> irenab: that's what I'm saying
14:16:05 <apuimedo> :-)
14:16:12 <apuimedo> you want the readiness checker running
14:16:16 <irenab> We started to play with adding DF native drivers to kuryr
14:16:19 <apuimedo> the currently not ideal thing
14:16:30 <irenab> so we will check for DF readiness and not neutron
14:16:35 <apuimedo> is that now it is a module that just always performs the same checks
14:16:46 <apuimedo> what I want is that the neutron vif handler
14:16:51 <maysamacedos> apuimedo: not sure I get what you meant with "protocols for instances to register to the readiness/liveness"
14:16:53 <apuimedo> registers its checking method
14:17:01 <apuimedo> which checks neutron
14:17:07 <apuimedo> the neutron service handler
14:17:11 <irenab> apuimedo, sounds reasonable if I got it correctly
14:17:15 <apuimedo> will check lbaasv2 or octavia
14:17:38 <dulek> I'm +1 on that.
14:17:39 <apuimedo> but we do want a single health checker server in the controller
14:18:12 <irenab> apuimedo, agreed, but this will invoke routines of enabled Handlers/drivers
14:18:38 <apuimedo> not necessarily
14:18:57 <apuimedo> they may be external to the instances in some cases
14:18:58 <irenab> so how will this be extendable?
14:19:07 <apuimedo> the instance sets what it needs
14:19:17 <irenab> apuimedo, instance of what?
14:19:22 <dulek> Handler?
14:19:37 <apuimedo> handlers and drivers
14:19:44 <dulek> More like driver.
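A rough sketch of the registration protocol apuimedo describes: during the stevedore-based loading that already exists, any handler or driver instance implementing a readiness or liveness interface gets registered with the single check server in the controller. Class and method names are illustrative only, not the actual kuryr-kubernetes code:

```python
import abc


class ReadinessProbe(abc.ABC):
    """Protocol a driver/handler can implement to join readiness checks."""

    @abc.abstractmethod
    def is_ready(self):
        """Return True once the backing service is reachable."""


class LivenessProbe(abc.ABC):
    @abc.abstractmethod
    def is_alive(self):
        """Return True while the instance is in a good state."""


class CheckServer:
    """One server; loading code registers instances by protocol."""

    def __init__(self):
        self._ready, self._live = [], []

    def register(self, instance):
        # Instances join only the checks whose protocol they implement.
        if isinstance(instance, ReadinessProbe):
            self._ready.append(instance)
        if isinstance(instance, LivenessProbe):
            self._live.append(instance)

    def readiness(self):
        return all(p.is_ready() for p in self._ready)

    def liveness(self):
        return all(p.is_alive() for p in self._live)


class NeutronVIFDriver(ReadinessProbe):
    def is_ready(self):
        return True  # a real driver would ping Neutron/Keystone here
```

With this shape, a DF deployment would simply load drivers whose `is_ready` checks DF instead of Neutron, keeping the server itself generic, which is irenab's pluggability concern.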
:)
14:19:45 <apuimedo> I think both could need specific checks
14:19:57 <apuimedo> handlers need checks for liveness mostly
14:20:03 <apuimedo> in our neutron case
14:20:25 <apuimedo> they may need more for DF if they want to check redis for example, I don't know
14:20:50 <irenab> We can have a sort of dedicated hierarchy for the probes
14:21:13 <apuimedo> irenab: what I want to avoid for sure is to have repeated probes
14:21:19 <irenab> probe routines
14:21:36 <apuimedo> so if two drivers need the same Neutron support (not different exts), I only want one check
14:21:38 <irenab> to be performed by a single health server
14:21:44 <apuimedo> exactly
14:22:08 <apuimedo> but that is a bit of fine tuning
14:22:19 <irenab> routines can be added and for a deployment just 'checked' to be invoked
14:22:30 <apuimedo> and if in the beginning there are multiple neutron extension checks, clocking that at 22ms per check is not too terrible
14:23:16 <irenab> so checking the same one several times will end up calling this routine once
14:23:30 <apuimedo> irenab: that would be the ideal
14:23:40 <apuimedo> but we don't need to get there in a single patch
14:24:04 <irenab> as long as we agree on the longer-term generic approach, I am fine with merging the current patch
14:24:05 <apuimedo> maysamacedos: what do you think of this approach?
14:24:19 <apuimedo> taking the checks to their respective handlers/drivers?
14:25:23 <irenab> apuimedo, may I interrupt with a short update on net policies on behalf of leyal-? I have to leave in a few mins
14:26:07 <apuimedo> irenab: irenab
14:26:11 <apuimedo> sure!
14:26:31 <maysamacedos> apuimedo: Isn't that kind of what I suggested in the email?
14:26:35 <irenab> The spec is up for review https://review.openstack.org/#/c/519239/
14:26:54 <irenab> There are 2 patches WIP, mainly missing unit tests
14:27:07 <irenab> https://review.openstack.org/#/c/530655/
14:27:10 <apuimedo> maysamacedos: in the email about the references to the task?
14:27:23 <irenab> https://review.openstack.org/#/c/526916/
14:27:29 <maysamacedos> apuimedo: eys
14:27:30 <maysamacedos> yes
14:27:44 <apuimedo> maysamacedos: what we're saying is to move the check code out of the server
14:27:49 <irenab> Please review and comment; the most important is the spec patch, to agree on the direction of the implementation
14:28:02 <apuimedo> so each module/class can contribute its own readiness and liveness code
14:28:21 <apuimedo> and the readiness/health check servers just load id
14:28:30 <maysamacedos> hmm I see
14:28:30 <irenab> apuimedo, all, please review the Net policies spec
14:28:33 <apuimedo> (not the k8s readiness check)
14:28:36 <apuimedo> irenab: will do
14:28:52 <irenab> thanks! /me sorry, have to leave
14:28:53 <apuimedo> irenab: have you tried straight-to-DF policy translation?
14:29:22 <irenab> apuimedo, no, not yet
14:29:48 <irenab> first with translation to neutron SGs
14:29:55 <apuimedo> irenab: and for port operations, how much faster is it, bypassing neutron?
14:30:15 <apuimedo> ltomasbo: are you here?
14:30:22 <ltomasbo> I am
14:30:37 <irenab> apuimedo, have to measure, the patch is here but it is very WIP: https://review.openstack.org/#/c/529971/
14:30:51 <apuimedo> how's the node-dependent vif driver?
14:31:19 <apuimedo> irenab: cool!
14:31:25 <apuimedo> ttyl then!
14:31:39 <ltomasbo> apuimedo, you mean this: https://review.openstack.org/#/c/528345?
14:31:41 <dulek> apuimedo: You mean stuff I'm working on?
14:32:01 <ltomasbo> the multi-pool multi-vif thing?
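Picking up the repeated-probes point from just above: if two drivers need the same Neutron check, the health server should run that routine only once. A minimal sketch, de-duplicating on a registration key; the key and probe names are hypothetical:

```python
class HealthServer:
    def __init__(self):
        self._probes = {}

    def register_probe(self, key, probe):
        # Two drivers registering ('neutron', check_neutron)
        # share a single entry, so the check runs once.
        self._probes.setdefault(key, probe)

    def run_checks(self):
        return {key: probe() for key, probe in self._probes.items()}


def check_neutron():
    # A real probe would list Neutron extensions here
    # (~22 ms per call, per the discussion above).
    return True


server = HealthServer()
server.register_probe('neutron', check_neutron)
server.register_probe('neutron', check_neutron)  # de-duplicated
assert len(server.run_checks()) == 1
```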
14:33:46 <maysamacedos> apuimedo: what did you mean by "the check servers load the id"?
14:36:51 <dulek> apuimedo's just sitting there wondering what we'll figure out from his cryptic message. :D
14:36:59 <apuimedo> ltomasbo: right
14:37:01 <apuimedo> that
14:37:12 <apuimedo> maysamacedos: id?
14:37:25 <ltomasbo> so, I tested it and it was working (before Christmas)
14:37:29 <apuimedo> dulek: apuimedo was taking the baby outside of the room :P
14:37:39 <ltomasbo> but I would like to check it in a real mixed environment
14:37:40 <apuimedo> sorry
14:37:42 <apuimedo> :-)
14:37:50 <apuimedo> ltomasbo: ok
14:37:52 <ltomasbo> as I just checked from one or the other environment, with the multi driver
14:38:06 <apuimedo> maysamacedos: not sure about "the id" part
14:38:12 <apuimedo> a typo maybe
14:38:14 <ltomasbo> detecting the pool driver to use from the node vif label
14:38:30 <maysamacedos> apuimedo: that's what you said: "the readiness/health check servers just load id"
14:38:33 <apuimedo> ltomasbo: with multi-node devstack it should be doable
14:38:34 <ltomasbo> I'm deploying the devstack with kubelet in one port to test
14:38:37 <maysamacedos> probably a typo
14:38:41 <apuimedo> maysamacedos: typo then
14:38:48 <apuimedo> meant "just load it"
14:38:55 <ltomasbo> apuimedo, yes, but our current local.conf files are kind of broken
14:39:04 <apuimedo> ltomasbo: that's interesting
14:39:05 <ltomasbo> I got a problem with etcd running on a nested configuration
14:39:06 <apuimedo> how?
14:39:28 <apuimedo> maysamacedos: so I meant that when we load the drivers/handlers, they can contain methods for the health check servers
14:39:30 <ltomasbo> because devstack now forces etcd to be installed on the SERVICE_HOST, instead of on the local one
14:39:41 <apuimedo> and for the readiness one too
14:39:44 <ltomasbo> and I needed local to be nested
14:40:05 <apuimedo> why?
14:40:09 <ltomasbo> and I just created the port on the pods network and deployed on baremetal on the remote server
14:40:22 <ltomasbo> apuimedo: https://github.com/openstack-dev/devstack/commit/146332e349416ac0b3c9653b0ae68d55dbb3f9de
14:42:58 <ltomasbo> and it also fails on the remote baremetal devstack node, I'm going to check what the problem is there
14:43:19 <apuimedo> ltomasbo: crap
14:43:20 <ltomasbo> umm 2018-01-08 14:41:43.053 | ++ /opt/stack/kuryr-kubernetes/devstack/plugin.sh:extract_hyperkube:497 : docker cp :/hyperkube /tmp/hyperkube
14:43:20 <ltomasbo> 2018-01-08 14:41:43.069 | must specify at least one container source
14:43:21 <apuimedo> ok, that's a bug
14:43:23 <apuimedo> :-)
14:43:34 <apuimedo> we should have our local.conf put :: then
14:43:43 <apuimedo> or just '0'
14:44:14 <maysamacedos> apuimedo: got it
14:44:25 <apuimedo> maysamacedos: ;-)
14:44:52 <ltomasbo> apuimedo, but still the neutron services are on the SERVICE_HOST
14:44:56 <ltomasbo> so, we also need that
14:46:19 <apuimedo> ltomasbo: I'm not sure I follow
14:46:51 <apuimedo> devstack with everything (including etcd) and another with just the kubelet running on a nova VM
14:46:52 <ltomasbo> apuimedo, are you suggesting our local.conf just set the server host to ::?
14:47:06 <apuimedo> one uses veth, the other pod-in-VM
14:47:24 <ltomasbo> ahh, ok, now I got what you meant
14:47:33 <apuimedo> ltomasbo: on the devstack with everything, the server host would be 0.0.0.0:2379
14:47:35 <ltomasbo> the other way around from what I was doing...
14:47:37 <apuimedo> for etcd
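Back on ltomasbo's multi-pool patch above, where the pool driver is detected from a vif label on the node: a toy version of that selection could look like the following. The label key, driver names, and node structure are invented for illustration, not what the patch actually uses:

```python
POOL_DRIVERS = {
    'neutron-vif': 'baremetal-pool-driver',
    'nested-vlan': 'nested-pool-driver',
}


def pool_driver_for(node):
    # Pick the pool driver from a (hypothetical) vif label on the node,
    # falling back to the baremetal driver when the label is absent.
    vif_type = node['metadata']['labels'].get('pod_vif', 'neutron-vif')
    return POOL_DRIVERS[vif_type]


node = {'metadata': {'labels': {'pod_vif': 'nested-vlan'}}}
assert pool_driver_for(node) == 'nested-pool-driver'
```

This is what lets a single controller serve a mixed environment, with baremetal and nested nodes each getting VIFs from the appropriate pool.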
14:47:53 <maysamacedos> apuimedo: but what would these methods check then? since they are in each module they could not verify the thread state nor whether an exception occurred.. or could they?
14:48:34 <apuimedo> maysamacedos: it would not be called on the instance
14:48:44 <apuimedo> what you wrote in the email
14:48:55 <apuimedo> but outside of the liveness monitor
14:49:07 <apuimedo> probably on the watcher instance
14:49:27 <apuimedo> ltomasbo: anything else?
14:49:32 <apuimedo> or did I miss something?
14:49:47 <ltomasbo> apuimedo, that's it from my side
14:50:07 <apuimedo> good!
14:50:11 <dulek> Okay, so maybe a quick update from me?
14:50:16 <apuimedo> dulek: please!
14:50:21 <dulek> Simple one: https://review.openstack.org/#/c/531128/
14:50:34 <dulek> I've tried K8s 1.9 on the gate, everything looks fine.
14:50:57 <dulek> I don't know what policy we have on updating it. And I don't have a strong opinion on that.
14:51:40 <apuimedo> the policy is: if it ain't failing, upgrade
14:51:41 <dulek> But in any case, once we decide to move forward we should be fine with that.
14:52:06 <apuimedo> so that we know that the Queens release works well with 1.9
14:52:30 <dulek> I wonder if we want to gate on older versions as well?
14:52:38 <dulek> Do we have a support matrix or sth like that?
14:54:04 <dulek> I guess it's something we need to discuss with irenab as well.
14:54:12 <apuimedo> dulek: indeed
14:54:32 <apuimedo> let's discuss it tomorrow morning when irenab and dmellado will also be able to weigh in
14:54:41 <dulek> Second thing is I've updated https://review.openstack.org/#/c/527243 with apuimedo's remarks.
14:55:09 <dulek> Now pooled VIFs are attached to a dummy namespace when discovered, and on CNI ADD they're only replugged (at least I think they are).
14:55:35 <apuimedo> dulek: for baremetal or for all?
14:55:59 <dulek> apuimedo: Good question, only bare metal for now, I've left nested as a TODO.
14:56:02 <apuimedo> good!
14:56:05 <dulek> I'm trying to time the BM case now to see if there's a benefit from that in a single-node case.
14:56:07 <apuimedo> :-)
14:56:39 <apuimedo> dulek: with port pooling I would seriously hope there's a big improvement
14:56:46 <dulek> First results show a tiny performance improvement, but I'm trying now with a higher number of ports.
14:56:48 <apuimedo> since there's only a veth change of NS
14:57:03 <apuimedo> instead of activation + netlink creation + move
14:57:16 <apuimedo> dulek: first create 50 ports
14:57:26 <apuimedo> wait until the dummy namespace is full
14:57:29 <apuimedo> then create the pods
14:57:31 <dulek> apuimedo: That's how I'm trying to test it.
14:57:32 <apuimedo> that's the scenario
14:57:34 <apuimedo> ok
14:57:51 <dulek> apuimedo: Please take a look at the binding code then. I may be doing too much stuff on the reattachment side.
14:57:58 <apuimedo> dulek: will do!
14:58:07 <dulek> apuimedo: Advice there would help. :)
14:58:12 <apuimedo> anything else anybody?
14:58:14 <dulek> Okay, that's it!
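A rough pyroute2 sketch of the replug idea dulek and apuimedo discuss above: a pooled veth parked in a dummy network namespace is moved into the pod's namespace on CNI ADD, replacing creation + Neutron activation + move with a single netlink change of namespace. Namespace and interface names are made up; the real binding code lives in the patch linked above:

```python
from pyroute2 import NetNS

POOL_NS = 'kuryr-pool-ns'   # hypothetical dummy netns holding pooled veths


def replug(ifname, pod_ns_name):
    """Move a pooled veth into the pod's namespace and bring it up."""
    pool = NetNS(POOL_NS)
    try:
        idx = pool.link_lookup(ifname=ifname)[0]
        # One netlink 'set' moves the already-activated interface.
        pool.link('set', index=idx, net_ns_fd=pod_ns_name)
    finally:
        pool.close()

    pod = NetNS(pod_ns_name)
    try:
        idx = pod.link_lookup(ifname=ifname)[0]
        pod.link('set', index=idx, state='up')
    finally:
        pod.close()
```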
14:58:18 <yboaron> I can update from my side
14:58:24 <apuimedo> yboaron: please do!
14:58:24 <yboaron> Please review https://review.openstack.org/#/c/529966/
14:58:32 <apuimedo> oh fuck
14:58:33 <yboaron> The devref to support an Ingress controller; the OCP-Route support will be based on this common part
14:58:33 <yboaron> (L7 routers)
14:58:51 <yboaron> apuimedo, fuck the ingress-controller???
14:58:53 <apuimedo> I wanted to spend some time discussing your proposal for a separate router/ingress controller component
14:59:06 <apuimedo> but irenab and dmellado left already
14:59:13 <apuimedo> let's discuss that tomorrow morning, please
14:59:14 <yboaron> I almost finished splitting the patch into 3 pieces, but there's an open issue: whether we need a separate controller for that purpose,
14:59:14 <yboaron> something similar to the NGINX/GCE ingress controllers,
14:59:14 <yboaron> or just extend kuryr-controller to support Ingress and OCP-Route
14:59:43 <apuimedo> yboaron: I'll review the patch ;-)
14:59:45 <yboaron> apuimedo, so maybe we can do it tomorrow on the kuryr channel
14:59:51 <apuimedo> dulek: take a look at it too
14:59:56 <apuimedo> yboaron: we'll have to
14:59:58 <apuimedo> :-)
15:00:06 <apuimedo> there are not enough people now to discuss
15:00:10 <apuimedo> and the time just ran out
15:00:16 <apuimedo> Thanks all for joining!
15:00:20 <apuimedo> #endmeeting