14:01:25 #startmeeting kuryr
14:01:26 Meeting started Mon Jan 8 14:01:25 2018 UTC and is due to finish in 60 minutes. The chair is irenab. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:29 The meeting name has been set to 'kuryr'
14:01:51 hi, who is here for kuryr weekly?
14:02:06 +1
14:02:26 hi!
14:02:38 o/
14:03:00 #chair ltomasbo
14:03:01 Current chairs: irenab ltomasbo
14:03:15 ltomasbo, added you since I will have to leave in about 20 mins
14:03:41 let's start with kuryr-kubernetes?
14:03:46 yes
14:03:48 #chair apuimedo
14:03:49 Current chairs: apuimedo irenab ltomasbo
14:03:59 ok
14:04:03 #topic kuryr-kubernetes
14:04:38 #info an important fix for kubelet retries has been merged 210603
14:04:43 damn, wrong number
14:04:54 #link https://review.openstack.org/518404
14:05:30 #info The new readiness check server has been marked for merging too
14:05:47 apuimedo, question on that one
14:05:51 I'm not sure if 518404 is so important, but it helps a bit with failures. :)
14:06:37 Regarding the readiness check, I suggest making the probe loadable via stevedore to keep the kuryr infra as generic as it used to be
14:06:56 dulek: it helps prevent a very frustrating issue
14:06:58 Can be done as a follow-up, just wanted to check if we agree
14:07:02 makes it important for UX
14:07:15 dulek, link?
14:07:18 irenab: you meant the stevedorization?
14:07:29 apuimedo, yes :-)
14:07:36 irenab: I meant https://review.openstack.org/518404 - the retries fix.
14:07:39 maysamacedos is here too now
14:07:59 dulek, important fix
14:08:22 irenab: well, I do think that kuryr-kubernetes should be able to have methods that are used for readiness and health checks
14:08:47 so that when they are instantiated they get registered with the readiness/health check servers
14:09:02 I kind of like the way we had kuryr as a generic integration framework, so it could be used to have non-Neutron drivers
14:09:11 irenab: dulek: granted it only affected baremetal
14:09:17 apuimedo, no argument on your point
14:09:19 so that makes it slightly less damning
14:09:51 I just think that it should not be explicitly hard-coded in the kuryr server code, but loadable based on deployment option
14:09:57 irenab: currently maysamacedos and I were about to discuss the approach for the liveness checks
14:10:31 I suggested having the liveness check server hold references to the handler threads for checks
14:10:51 but it would probably be better if the references were used to ask the handler instances themselves
14:11:03 apuimedo, not sure I follow
14:11:05 since it could happen that the thread is alive
14:11:16 but the handler is in a bad state
14:11:43 irenab: we want to add liveness checks that monitor the health of the vif/service/etc handlers as well
14:12:03 apuimedo: You mean the watcher threads?
14:12:19 dulek: yes, the different threads on which the handlers run
14:12:49 apuimedo, important to have that too. My point is that we need to have this not explicit, but based on the drivers used in the deployment to keep the current pluggability
14:13:04 apuimedo: I'm not sure if it's worth it, I think Watcher threads have protection from unexpected exceptions.
14:13:05 what I'm saying is that probably we should have protocols for instances to register to the readiness/liveness check servers
14:13:13 that can be implemented by any instance
14:13:23 apuimedo: Ah, that might be better.
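[Editor's note: a minimal Python sketch of the thread-liveness vs. handler-health distinction discussed above, i.e. why a probe should ask the handler instance itself rather than only check that its thread is alive. HandlerWrapper, handler.run() and handler.is_ok() are illustrative names, not kuryr-kubernetes API.]

    import threading


    class HandlerWrapper:
        """Runs a handler loop on its own thread and remembers failures."""

        def __init__(self, handler):
            self._handler = handler
            self._healthy = True
            self._thread = threading.Thread(target=self._run, daemon=True)

        def start(self):
            self._thread.start()

        def _run(self):
            try:
                self._handler.run()      # the handler's blocking event loop
            except Exception:
                self._healthy = False    # keep the failure visible to the probe

        def is_alive(self):
            # What a naive liveness check would look at.
            return self._thread.is_alive()

        def is_healthy(self):
            # What the discussion proposes: ask the handler itself as well,
            # because the thread can be alive while the handler is broken.
            return (self._thread.is_alive() and self._healthy
                    and self._handler.is_ok())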
14:13:29 irenab: yes, but maybe we don't need stevedore for that
14:13:37 stevedore handles loading of the instances
14:13:40 that part we already have
14:13:48 I'm just saying that as part of the loading
14:13:53 apuimedo, not for the readiness probe
14:13:54 that currently exists
14:14:15 they could register with readiness/liveness depending on which of those two protocols they implement
14:14:26 dulek: irenab: maysamacedos: how does that sound?
14:14:39 line 53 is bothering me https://review.openstack.org/#/c/529335/17/kuryr_kubernetes/controller/service.py
14:15:08 irenab: why?
14:15:17 I think we always want a readiness check server
14:15:27 what should be different
14:15:43 explicit call to the probe that checks for neutron and keystone. What if kuryr is used for non-Neutron drivers?
14:15:53 is that only checks for components that have been plugged should be performed in it
14:16:03 irenab: that's what I'm saying
14:16:05 :-)
14:16:12 you want the readiness checker running
14:16:16 We started to play with adding DF native drivers to kuryr
14:16:19 the currently not ideal thing
14:16:30 so we will check for DF readiness and not neutron
14:16:35 is that now it is a module that just performs always the same checks
14:16:46 what I want is that the neutron vif handler
14:16:51 apuimedo: not sure I get what you meant by "protocols for instances to register to the readiness/liveness"
14:16:53 registers its checking method
14:17:01 which checks neutron
14:17:07 the neutron service handler
14:17:11 apuimedo, sounds reasonable if I got it correctly
14:17:15 will check lbaasv2 or octavia
14:17:38 I'm +1 on that.
14:17:39 but we do want a single health checker server in the controller
14:18:12 apuimedo, agreed, but this will invoke routines of enabled Handlers/drivers
14:18:38 not necessarily
14:18:57 they may be external to the instances in some cases
14:18:58 so how will this be extendable?
14:19:07 the instance sets what it needs
14:19:17 apuimedo, instance of what?
14:19:22 Handler?
14:19:37 handlers and drivers
14:19:44 More like driver. :)
14:19:45 I think both could need specific checks
14:19:57 handlers need checks for liveness mostly
14:20:03 in our neutron case
14:20:25 they may need more for DF if they want to check redis for example, I don't know
14:20:50 We can have a sort of dedicated hierarchy for the probes
14:21:13 irenab: what I want to avoid for sure is to have repeated probes
14:21:19 probe routines
14:21:36 so if two drivers need the same Neutron support (not different exts), I only want one check
14:21:38 to be performed by a single health server
14:21:44 exactly
14:22:08 but that is a bit of fine tuning
14:22:19 routines can be added and for deployment just 'checked' to be invoked
14:22:30 and if in the beginning there are multiple neutron extension checks, clocking that at 22ms per check is not too terrible
14:23:16 so checking the same one several times will end up calling this routine once
14:23:30 irenab: that will be the ideal
14:23:40 but we don't need to get there in a single patch
14:24:04 as long as we agree on the longer-term generic approach, I am fine with merging the current patch
14:24:05 maysamacedos: what do you think of this approach?
14:24:19 taking the checks to their respective handler/drivers?
14:25:23 apuimedo, may I interrupt with a short update on net policies on behalf of leyal-? I have to leave in a few mins
14:26:07 irenab: irenab
14:26:11 sure!
14:26:31 apuimedo: Isn't that kind of what I suggested in the email?
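[Editor's note: a minimal Python sketch of the registration and de-duplication idea discussed above: every plugged handler/driver contributes its own probe callables to a single health check server, and identical probes (e.g. the same Neutron check needed by two drivers) are registered and run only once. All class, method and probe names are illustrative, not the actual patch.]

    class HealthCheckServer:
        """Single health server; plugged components register their own probes."""

        def __init__(self):
            self._readiness_probes = {}   # probe name -> callable returning bool
            self._liveness_probes = {}

        def register_readiness_probe(self, name, probe):
            # Registering the same name twice keeps a single entry, so two
            # drivers needing the same Neutron check only cause one call.
            self._readiness_probes.setdefault(name, probe)

        def register_liveness_probe(self, name, probe):
            self._liveness_probes.setdefault(name, probe)

        def is_ready(self):
            return all(probe() for probe in self._readiness_probes.values())

        def is_alive(self):
            return all(probe() for probe in self._liveness_probes.values())


    # Illustrative usage: each driver registers checks for what it actually uses.
    # neutron_vif_driver -> server.register_readiness_probe('neutron', check_neutron)
    # lbaasv2_driver     -> server.register_readiness_probe('neutron', check_neutron)
    #                       server.register_readiness_probe('lbaasv2', check_lbaasv2)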
14:26:35 The spec is up for review https://review.openstack.org/#/c/519239/
14:26:54 There are 2 WIP patches, mainly missing unit tests
14:27:07 https://review.openstack.org/#/c/530655/
14:27:10 maysamacedos: in the email about the references to the task?
14:27:23 https://review.openstack.org/#/c/526916/
14:27:29 apuimedo: eys
14:27:30 yes
14:27:44 maysamacedos: what we're saying is to move the check code out of the server
14:27:49 Please review and comment; the most important one is the spec patch, to agree on the direction of the implementation
14:28:02 so each module/class can contribute its own readiness and liveness code
14:28:21 and the readiness/health check servers just load id
14:28:30 hmm I see
14:28:30 apuimedo, all, please review the Net policies spec
14:28:33 (not the k8s readiness check)
14:28:36 irenab: will do
14:28:52 thanks! /me sorry, have to leave
14:28:53 irenab: have you tried straight-to-DF policy translation?
14:29:22 apuimedo, no, not yet
14:29:48 first with translation to neutron SGs
14:29:55 irenab: and for port operations, how much faster is it, bypassing neutron
14:30:09 ?
14:30:15 ltomasbo: are you here?
14:30:22 I am
14:30:37 apuimedo, have to measure, the patch is here but it is very WIP: https://review.openstack.org/#/c/529971/
14:30:51 how's the node-dependent vif driver?
14:31:19 irenab: cool!
14:31:25 ttyl then!
14:31:39 apuimedo, you mean this: https://review.openstack.org/#/c/528345?
14:31:41 apuimedo: You mean stuff I'm working on?
14:32:01 the multi-pool multi-vif thing?
14:33:46 apuimedo: what did you mean by "the check servers load the id"?
14:36:51 apuimedo's just sitting there wondering what we'll figure out from his cryptic message. :D
14:36:59 ltomasbo: right
14:37:01 that
14:37:12 maysamacedos: id?
14:37:25 so, I tested it and it was working (before Christmas)
14:37:29 dulek: apuimedo was taking the baby outside of the room :P
14:37:39 but I would like to check it in a real mixed environment
14:37:40 sorry
14:37:42 :-)
14:37:50 ltomasbo: ok
14:37:52 as I just checked it on one environment or the other, with the multi driver
14:38:06 maysamacedos: not sure about "the id" part
14:38:12 a typo maybe
14:38:14 detecting the pool driver to use from the node vif label
14:38:30 apuimedo: that's what you said "the readiness/health check servers just load id"
14:38:33 ltomasbo: with multi-node devstack it should be doable
14:38:34 I'm deploying the devstack with kubelet in one port to test
14:38:37 probably a typo
14:38:41 maysamacedos: typo then
14:38:48 meant "just load it"
14:38:55 apuimedo, yes, but our current local.conf files are kind of broken
14:39:04 ltomasbo: that's interesting
14:39:05 I got a problem with etcd running on a nested configuration
14:39:06 how?
14:39:28 maysamacedos: so I meant that when we load the drivers/handlers, they can contain methods for the health check servers
14:39:30 because devstack now forces etcd to be installed on the server_host, instead of on the local one
14:39:41 and for the readiness one too
14:39:44 and I needed local to be nested
14:40:05 why?
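[Editor's note: a minimal Python sketch of the node-dependent pool driver selection ltomasbo refers to above ("detecting the pool driver to use from the node vif label"): the pool driver is chosen per pod from a label on the node it was scheduled on. The label key, driver names and default below are made up for illustration and are not the actual patch.]

    # Hypothetical mapping of node label values to ports-pool drivers.
    POOL_DRIVERS = {
        'neutron': 'NeutronVIFPool',   # baremetal-style veth plugging
        'nested': 'NestedVIFPool',     # pods running inside a Nova VM
    }
    DEFAULT_POOL = 'neutron'


    def select_pool_driver(node):
        """Pick the ports-pool driver for the node a pod was scheduled on."""
        labels = node.get('metadata', {}).get('labels', {})
        pool = labels.get('pod_vif', DEFAULT_POOL)
        return POOL_DRIVERS.get(pool, POOL_DRIVERS[DEFAULT_POOL])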
14:40:09 and I just created the port on the pods network and I'm deploying on baremetal on the remote server
14:40:22 apuimedo: https://github.com/openstack-dev/devstack/commit/146332e349416ac0b3c9653b0ae68d55dbb3f9de
14:42:38 apuimedo: https://github.com/openstack-dev/devstack/commit/146332e349416ac0b3c9653b0ae68d55dbb3f9de
14:42:58 and it also fails on the remote baremetal devstack node, I'm going to check what the problem is there
14:43:19 ltomasbo: crap
14:43:20 umm 2018-01-08 14:41:43.053 | ++ /opt/stack/kuryr-kubernetes/devstack/plugin.sh:extract_hyperkube:497 : docker cp :/hyperkube /tmp/hyperkube
14:43:20 2018-01-08 14:41:43.069 | must specify at least one container source
14:43:21 ok, that's a bug
14:43:23 :-)
14:43:34 we should have our local.conf put :: then
14:43:43 or just '0'
14:44:14 apuimedo: got it
14:44:25 maysamacedos: ;-)
14:44:52 apuimedo, but still the neutron services are on the Service_host
14:44:56 so, we also need that
14:46:19 ltomasbo: I'm not sure I follow
14:46:51 devstack with everything (including etcd) and another with just the kubelet running on a nova VM
14:46:52 apuimedo, are you suggesting our local.conf just set server_host to ::?
14:47:06 one uses veth, the other pod in VM
14:47:24 ahh, ok, now I got what you meant
14:47:33 ltomasbo: on the devstack with everything the server host would be 0.0.0.0:2379
14:47:35 I was doing it the other way around...
14:47:37 for etcd
14:47:53 apuimedo: but what would these methods check then? Since they are in each module they could not verify the thread state nor whether an exception occurred... or could they?
14:48:34 maysamacedos: it would not be called on the instance
14:48:44 what you wrote in the email
14:48:55 but outside of the liveness monitor
14:49:07 probably on the watcher instance
14:49:27 ltomasbo: anything else?
14:49:32 or did I miss something?
14:49:47 apuimedo, that's it from my side
14:50:07 good!
14:50:11 Okay, so maybe a quick update from me?
14:50:16 dulek: please!
14:50:21 Simple one: https://review.openstack.org/#/c/531128/
14:50:34 I've tried K8s 1.9 on the gate, everything looks fine.
14:50:57 I don't know what policy we have on updating it. And I don't have a strong opinion on that.
14:51:40 the policy is, if it ain't failing, upgrade
14:51:41 But in any case once we decide to move forward we should be fine with that.
14:52:06 so that we know that the queens release works well with 1.9
14:52:30 I wonder if we want to gate on older versions as well?
14:52:38 Do we have a support matrix or sth like that?
14:54:04 I guess it's something we need to discuss with irenab as well.
14:54:12 dulek: indeed
14:54:32 let's discuss it tomorrow morning when irenab and dmellado will also be able to weigh in
14:54:41 Second thing is I've updated https://review.openstack.org/#/c/527243 with apuimedo's remarks.
14:55:09 Now pooled VIFs are attached to a dummy namespace when discovered and on CNI ADD they're only replugged (at least I think they are).
14:55:35 dulek: for baremetal or for all?
14:55:59 apuimedo: Good question, only bare metal for now, I've left nested as a TODO.
14:56:02 good!
14:56:05 I'm trying to time the BM case now to see if there's a benefit from that in a single-node case.
14:56:07 :-)
14:56:39 dulek: with port pooling I would seriously hope there's a big improvement
14:56:46 First results show a tiny performance improvement, but I'm trying now with a higher number of ports.
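[Editor's note: a minimal pyroute2 sketch of the "replug" step dulek describes above, assuming the pooled veth is already plugged and waiting inside a named dummy namespace; on CNI ADD only the interface's network namespace changes. The function, namespace name and overall flow are illustrative, not the actual patch; apuimedo's next remark about "only a veth change of NS" refers to exactly this move.]

    import os

    from pyroute2 import NetNS


    def replug_pooled_vif(ifname, pod_netns_path, dummy_netns='kuryr-dummy'):
        """Move an already-plugged veth from the dummy netns into the pod's."""
        pod_ns_fd = os.open(pod_netns_path, os.O_RDONLY)
        ns = NetNS(dummy_netns)
        try:
            idx = ns.link_lookup(ifname=ifname)[0]
            # Only the namespace changes; the Neutron port was already bound
            # and activated when the VIF entered the pool, so no new netlink
            # device creation or plugging is needed here.
            ns.link('set', index=idx, net_ns_fd=pod_ns_fd)
        finally:
            ns.close()
            os.close(pod_ns_fd)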
14:56:48 since there's only a veth change of NS
14:57:03 instead of activation + netlink creation + move
14:57:16 dulek: first create 50 ports
14:57:26 wait until the dummy namespace is full
14:57:29 then create the pods
14:57:31 apuimedo: That's how I'm trying to test it.
14:57:32 that's the scenario
14:57:34 ok
14:57:51 apuimedo: Please take a look at the binding code then. I may do too much stuff on the reattachment side.
14:57:58 dulek: will do!
14:58:07 apuimedo: Some advice there would help. :)
14:58:12 anything else anybody?
14:58:14 Okay, that's it!
14:58:18 I can update from my side
14:58:24 yboaron: please do!
14:58:24 Please review https://review.openstack.org/#/c/529966/
14:58:32 oh fuck
14:58:33 The devref to support an Ingress controller; the OCP-Route support will be based on this common part
14:58:33 (L7 routers)
14:58:51 apuimedo, fuck the ingress-controller???
14:58:53 I wanted to spend some time discussing your proposal for a separate router/ingress controller component
14:59:06 but irenab and dmellado left already
14:59:13 let's discuss that tomorrow morning, please
14:59:14 I almost finished splitting the patch into 3 pieces, but there's an open issue whether we need to have a separate controller for that purpose,
14:59:14 something similar to the NGINX/GCE ingress controller,
14:59:14 or just extend kuryr-controller to support ingress and OCP-Route
14:59:43 yboaron: I'll review the patch ;-)
14:59:45 apuimedo, so maybe we can do it tomorrow on the kuryr channel
14:59:51 dulek: take a look at it too
14:59:56 yboaron: we'll have to
14:59:58 :-)
15:00:06 there aren't enough people now to discuss
15:00:10 and the time just ran out
15:00:16 Thanks all for joining
15:00:18 !
15:00:20 #endmeeting