*** salv-orlando has joined #openstack-kuryr | 00:17 | |
*** salv-orlando has quit IRC | 00:22 | |
*** yamamoto has joined #openstack-kuryr | 00:24 | |
*** yamamoto has quit IRC | 00:29 | |
*** aojea has joined #openstack-kuryr | 00:47 | |
*** limao has joined #openstack-kuryr | 00:48 | |
*** aojea has quit IRC | 00:52 | |
*** caowei has joined #openstack-kuryr | 00:53 | |
*** salv-orlando has joined #openstack-kuryr | 01:18 | |
*** kiennt26 has joined #openstack-kuryr | 01:21 | |
*** salv-orlando has quit IRC | 01:22 | |
*** yamamoto has joined #openstack-kuryr | 01:24 | |
*** aojea has joined #openstack-kuryr | 01:48 | |
*** hongbin has joined #openstack-kuryr | 01:49 | |
*** aojea has quit IRC | 01:52 | |
*** wangbo has joined #openstack-kuryr | 02:01 | |
*** hongbin_ has joined #openstack-kuryr | 02:04 | |
*** salv-orlando has joined #openstack-kuryr | 02:18 | |
*** salv-orlando has quit IRC | 02:23 | |
*** aojea has joined #openstack-kuryr | 02:49 | |
*** hongbin_ has quit IRC | 02:51 | |
*** aojea has quit IRC | 02:53 | |
*** wangbo has quit IRC | 03:11 | |
*** wangbo has joined #openstack-kuryr | 03:18 | |
*** salv-orlando has joined #openstack-kuryr | 03:19 | |
*** vikasc has quit IRC | 03:23 | |
*** salv-orlando has quit IRC | 03:24 | |
*** limao has quit IRC | 03:36 | |
*** vikasc has joined #openstack-kuryr | 03:36 | |
*** hongbin has quit IRC | 03:37 | |
*** kiennt26 has quit IRC | 03:45 | |
*** kiennt26 has joined #openstack-kuryr | 03:46 | |
*** limao has joined #openstack-kuryr | 03:46 | |
*** limao has quit IRC | 03:46 | |
*** limao has joined #openstack-kuryr | 03:47 | |
*** aojea has joined #openstack-kuryr | 03:49 | |
*** limao has quit IRC | 03:51 | |
*** aojea has quit IRC | 03:54 | |
*** gouthamr has quit IRC | 03:59 | |
*** salv-orlando has joined #openstack-kuryr | 04:20 | |
*** salv-orlando has quit IRC | 04:25 | |
*** kiennt26 has quit IRC | 04:41 | |
*** wangbo has quit IRC | 04:42 | |
*** vikasc has quit IRC | 04:45 | |
*** vikasc has joined #openstack-kuryr | 04:48 | |
*** aojea has joined #openstack-kuryr | 04:50 | |
*** aojea has quit IRC | 04:55 | |
*** yboaron has joined #openstack-kuryr | 05:01 | |
*** limao has joined #openstack-kuryr | 05:07 | |
*** salv-orlando has joined #openstack-kuryr | 05:21 | |
*** salv-orlando has quit IRC | 05:25 | |
*** wangbo has joined #openstack-kuryr | 05:33 | |
*** salv-orlando has joined #openstack-kuryr | 05:34 | |
*** wangbo has quit IRC | 05:34 | |
*** garyloug has quit IRC | 05:38 | |
*** janki has joined #openstack-kuryr | 05:40 | |
*** wangbo has joined #openstack-kuryr | 05:45 | |
*** aojea has joined #openstack-kuryr | 05:51 | |
*** aojea has quit IRC | 05:56 | |
*** janki has quit IRC | 05:57 | |
*** janki has joined #openstack-kuryr | 05:57 | |
*** kiennt26 has joined #openstack-kuryr | 06:04 | |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Fix ref to ports pool at nested-vlan documentation https://review.openstack.org/512495 | 06:25 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Fix ports pool documentation https://review.openstack.org/512495 | 06:35 |
*** aojea has joined #openstack-kuryr | 06:38 | |
*** pcaruana has joined #openstack-kuryr | 06:44 | |
*** aojea has quit IRC | 06:47 | |
*** wangbo has quit IRC | 06:54 | |
*** wangbo has joined #openstack-kuryr | 06:57 | |
*** yboaron has quit IRC | 07:02 | |
*** kiennt26 has quit IRC | 07:12 | |
*** kiennt26 has joined #openstack-kuryr | 07:13 | |
*** vikasc has quit IRC | 07:31 | |
*** danil has joined #openstack-kuryr | 07:34 | |
*** aojea has joined #openstack-kuryr | 07:44 | |
*** vikasc has joined #openstack-kuryr | 07:44 | |
*** egonzalez has joined #openstack-kuryr | 07:45 | |
*** aojea has quit IRC | 07:48 | |
*** wangbo has quit IRC | 07:51 | |
*** karimb has joined #openstack-kuryr | 07:51 | |
*** karimb has quit IRC | 07:52 | |
*** wangbo has joined #openstack-kuryr | 07:52 | |
*** wangbo has quit IRC | 07:56 | |
*** wangbo has joined #openstack-kuryr | 08:02 | |
*** salv-orlando has quit IRC | 08:05 | |
*** salv-orlando has joined #openstack-kuryr | 08:05 | |
*** wangbo has quit IRC | 08:07 | |
*** yboaron has joined #openstack-kuryr | 08:08 | |
*** wangbo has joined #openstack-kuryr | 08:10 | |
*** wangbo has quit IRC | 08:10 | |
*** wangbo has joined #openstack-kuryr | 08:13 | |
*** karimb has joined #openstack-kuryr | 08:19 | |
*** phuoc_ has quit IRC | 08:24 | |
*** wangbo has quit IRC | 08:34 | |
irenab | dulek, hi | 08:35 |
irenab | I was trying https://review.openstack.org/#/c/480028/, pods stuck at creating status | 08:36 |
irenab | eventually one pod is running, second is reporting error | 08:39 |
*** wangbo has joined #openstack-kuryr | 08:40 | |
*** garyloug has joined #openstack-kuryr | 08:43 | |
*** aojea has joined #openstack-kuryr | 08:44 | |
irenab | dulek, janonymous another issue I see is when I deleted pod and then new pod was started, daemon gets netlink error on binding commit | 08:47 |
*** yamamoto has quit IRC | 08:47 | |
irenab | NetlinkError (17, 'File exists') | 08:47 |
*** garyloug has quit IRC | 08:48 | |
*** aojea has quit IRC | 08:49 | |
dulek | irenab: I'll try to reproduce it. | 08:50 |
irenab | dulek, I tried apuimedo's demo pod. Created one and then scaled to 2 instances | 08:50 |
irenab | without waiting for the first to become running | 08:51 |
dulek | irenab: Okay. | 08:51 |
janonymous | irenab: i thought that was unrelated to that | 08:52 |
*** garyloug has joined #openstack-kuryr | 08:52 | |
irenab | janonymous, I see exception in the kuryr-daemon | 08:52 |
irenab | same worked for me with non-daemon CNI | 08:53 |
janonymous | irenab: ohh..sometime back there was a patch related to this change, i dont recall exactly i will check | 08:54 |
janonymous | irenab: that's why i mentioned it in commit msg of https://review.openstack.org/#/c/480028/ about netlink error .. strange | 08:55 |
*** yamamoto has joined #openstack-kuryr | 08:55 | |
*** yamamoto has quit IRC | 08:55 | |
irenab | janonymous, it didn't fix itself. After several times I removed pod that failed, I got one running | 08:58 |
janonymous | irenab: oh.. | 09:02 |
janonymous | dulek: another way might be to use --replicas=2/more with `kubectl run <image>` command , i will be checking too | 09:04 |
irenab | janonymous, exactly what I did | 09:06 |
janonymous | :) | 09:06 |
janonymous | dulek: exact command `kubectl run hello-node --image=gcr.io/google-samples/hello-app:1.0 --port=8080` | 09:09 |
janonymous | --replicas=2 | 09:09 |
dulek | janonymous: I'm stacking a fresh env to take a look. | 09:09 |
dulek | irenab, janonymous: Okay, reproduced, although I'm getting "KeyError: u'tap77ccc560-59'" | 09:14 |
*** yamamoto has joined #openstack-kuryr | 09:15 | |
*** karimb has quit IRC | 09:28 | |
*** jchhatbar has joined #openstack-kuryr | 09:34 | |
*** janki has quit IRC | 09:37 | |
*** aojea has joined #openstack-kuryr | 09:45 | |
dulek | irenab, janonymous: Oh, I think I know what's wrong… The fix is on patch that adds logging. Let me move it to the base patch while fixing irenab's comments. | 09:47 |
*** aojea has quit IRC | 09:50 | |
*** salv-orlando has quit IRC | 09:50 | |
*** salv-orlando has joined #openstack-kuryr | 09:50 | |
dulek | Oh, it's more complicated than I thought. Back to the code… | 09:54 |
*** karimb has joined #openstack-kuryr | 09:55 | |
*** salv-orlando has quit IRC | 09:55 | |
*** c00281451 has joined #openstack-kuryr | 09:57 | |
dulek | apuimedo: Hi, are you able to tell what https://github.com/openstack/kuryr-kubernetes/blob/6d9e564251853885ba54868fefb09f6741de96dc/kuryr_kubernetes/cni/binding/bridge.py#L35-L37 is doing? | 10:15 |
janonymous | dulek: is it the sys. logger which i used earlier? | 10:15 |
dulek | janonymous: No, it shouldn't be related to logger. | 10:15 |
janonymous | ok | 10:16 |
dulek | janonymous: I've briefly thought it's fault of not doing config.init_config(), but now I doubt it. | 10:16 |
dulek | Do containers on the same pod go into a single netns? | 10:18 |
*** yamamoto has quit IRC | 10:18 | |
*** yamamoto has joined #openstack-kuryr | 10:25 | |
*** wangbo has quit IRC | 10:28 | |
*** yamamoto has quit IRC | 10:30 | |
*** c00281451 has quit IRC | 10:31 | |
*** salv-orlando has joined #openstack-kuryr | 10:32 | |
*** c00281451 has joined #openstack-kuryr | 10:32 | |
*** limao has quit IRC | 10:32 | |
*** openstackgerrit has quit IRC | 10:33 | |
*** limao has joined #openstack-kuryr | 10:33 | |
*** limao_ has joined #openstack-kuryr | 10:36 | |
*** limao_ has quit IRC | 10:36 | |
*** limao_ has joined #openstack-kuryr | 10:37 | |
*** limao has quit IRC | 10:37 | |
*** caowei has quit IRC | 10:39 | |
janonymous | can you do these changes in your env : http://textuploader.com/d4uq6 | 10:40 |
janonymous | and check using grep -inr "kuryr_kubernetes.controller.drivers.additional_subnets" in kuryr-kubernetes package | 10:41 |
janonymous | if any , try to change to `kuryr_kubernetes.controller.drivers.additional_subnet` and restart controller, i think it should work | 10:41 |
janonymous | @danil : ^^ | 10:42 |
danil | janonymous, yeah, thanks, one min | 10:43 |
dulek | ltomasbo, irenab: Maybe you have an idea why we're setting netns of an interface to os.getpid() here: | 10:44 |
dulek | ltomasbo, irenab: https://github.com/openstack/kuryr-kubernetes/blob/6d9e564251853885ba54868fefb09f6741de96dc/kuryr_kubernetes/cni/binding/bridge.py#L35-L37 | 10:44 |
*** aojea has joined #openstack-kuryr | 10:46 | |
irenab | dulek, I do not remember. ivc_ are you around? | 10:49 |
ltomasbo | dulek, I don't know | 10:50 |
*** aojea has quit IRC | 10:50 | |
irenab | dulek, does this method given netns or it is None? | 10:57 |
dulek | irenab: According to traceback it's given netns. | 10:57 |
* dulek baked his env, needs to restack to continue testing… | 10:58 | |
irenab | so this is provided through params. Maybe this is the way kubelet gives the netns | 11:01 |
*** wangbo has joined #openstack-kuryr | 11:09 | |
*** wangbo has quit IRC | 11:10 | |
*** atoth has joined #openstack-kuryr | 11:14 | |
*** wangbo has joined #openstack-kuryr | 11:14 | |
*** yamamoto has joined #openstack-kuryr | 11:26 | |
*** yamamoto has quit IRC | 11:32 | |
*** c00281451 is now known as zengchen | 11:36 | |
*** karimb has quit IRC | 11:38 | |
*** yamamoto has joined #openstack-kuryr | 11:41 | |
*** yamamoto_ has joined #openstack-kuryr | 11:42 | |
*** yamamoto has quit IRC | 11:46 | |
yboaron | ping irenab | 11:51 |
zengchen | apuimedo & irenab: Sorry to interrupt you. Could you have time to review the patches of fuxi-k8s. The patches are ready for review for a long time. From my perspective, I hope the left patches could be merged. Thanks very much! | 11:54 |
zengchen | https://review.openstack.org/#/q/project:openstack/fuxi-kubernetes+status:open | 11:54 |
*** salv-orlando has quit IRC | 11:54 | |
*** yamamoto_ has quit IRC | 11:56 | |
*** c00281451 has joined #openstack-kuryr | 12:03 | |
*** zengchen has quit IRC | 12:06 | |
*** c00281451 is now known as zengchen | 12:16 | |
*** yamamoto has joined #openstack-kuryr | 12:19 | |
apuimedo | zengchen: please, remember to add us as reviewers, as we usually mostly check the patches in which we are listed as reviewers | 12:21 |
apuimedo | I'm sorry that I didn't notice or that they went out of my mind | 12:21 |
irenab | zengchen, same apology from me. Will take a look asap | 12:25 |
irenab | yboaron, hi | 12:25 |
yboaron | Hi, I checked the service access in case devstack env HA-PROXY | 12:25 |
irenab | zengchen, can you please resolve merge conflict? | 12:26 |
irenab | yboaron, any idea how it works with ref. implementation? | 12:26 |
yboaron | it appears that for HA-PROXY The load balancer port will be assigned to the projects default security group | 12:26 |
irenab | so why curl to FIP works? | 12:27 |
yboaron | and in devstack default security group enable all IPV4 ingress traffic | 12:27 |
yboaron | I assume that in your env - the default security doesnt allow all IP V4 right ? | 12:27 |
irenab | yboaron, I do not see it in my environment . I see that VIP port has same SG as pods | 12:27 |
yboaron | take a look at this one : https://github.com/kubernetes/kubernetes/issues/29745 | 12:28 |
irenab | yboaron, I just use kuryr-kubernetes local.conf and deploy devstack, this should not be different | 12:29 |
yboaron | the only difference is the ml2 plugin ? | 12:29 |
irenab | ml2 driver | 12:30 |
irenab | I think we see the same issue as reported in the link you posted | 12:30 |
apuimedo | irenab: maybe this can interest you for DF https://github.com/alibaba/ApsaraCache | 12:30 |
irenab | apuimedo, thanks. Will take a look. | 12:31 |
apuimedo | ;-) | 12:31 |
irenab | yboaron, we assign pods' SG to LB port in kuryr-kubernetes | 12:32 |
zengchen | apuimedo & irenab: thanks for your response. I will add you as the reviewer for each patches. thanks very much! | 12:33 |
yboaron | right , in my devstack all IPV4 ingress are allowed at this SG , see https://pastebin.com/4qdJTLfi | 12:33 |
apuimedo | ;-) | 12:33 |
yboaron | port_security_enabled is True for VIP port , and SG is the default one - right ? | 12:35 |
zengchen | apuimedo:btw, will the vPTG be held this week? | 12:36 |
irenab | yboaron, but ingress ipv4 is enabled only for ones with same SG | 12:36 |
irenab | not anyone | 12:36 |
irenab | yboaron, and for your question, the answer is that port security enabled and SG is the one provided by the kuryr-kubernetes SG driver (the same as set to pod ports) | 12:39 |
irenab | apuimedo, please verify that I am correct | 12:39 |
*** openstackgerrit has joined #openstack-kuryr | 12:39 | |
openstackgerrit | Daniel Mellado proposed openstack/kuryr-tempest-plugin master: [WIP] Add scenario test manager https://review.openstack.org/510896 | 12:39 |
*** salv-orlando has joined #openstack-kuryr | 12:40 | |
apuimedo | zengchen: I still did not hear from limao | 12:41 |
apuimedo | and I wanted to keep the last two sessions together | 12:41 |
apuimedo | I guess we can schedule for next week | 12:41 |
irenab | apuimedo, dulek , janonymous : I think we must have scenario tests before having cni-daemon version as default choice for deployment | 12:41 |
apuimedo | irenab: I can agree with that | 12:41 |
dulek | irenab: Me too, I'm still trying to figure out why the heck is this failing. | 12:42 |
zengchen | apuimedo:got. if you have a schedule, please send an email. thanks! | 12:42 |
apuimedo | zengchen: w\\ | 12:42 |
apuimedo | - | 12:42 |
apuimedo | zengchen: I will | 12:42 |
dulek | apuimedo: Maybe you have an idea what's happening in https://github.com/openstack/kuryr-kubernetes/blob/6d9e564251853885ba54868fefb09f6741de96dc/kuryr_kubernetes/cni/binding/bridge.py#L35-L37 | 12:42 |
dulek | apuimedo: I've tried suppressing the exception I get there, but it looks like then pod gets no network access. | 12:43 |
apuimedo | irenab: what should I verify the correctness on? | 12:44 |
irenab | my answer to yboaron regarding load balancer | 12:44 |
apuimedo | dulek: that's not good for you | 12:45 |
irenab | yboaron, apuimedo I think we will have to add explicit SG creation/addition for VIP port to enable service specific traffic | 12:45 |
apuimedo | :-) | 12:45 |
dulek | apuimedo: I've figured that already. :D | 12:45 |
apuimedo | dulek: does your daemon run with hostnetworking? | 12:46 |
dulek | apuimedo: Yes, we're talking about non-containerized case now. | 12:46 |
apuimedo | oh | 12:46 |
dulek | apuimedo: daemon is running with sudo. | 12:46 |
apuimedo | then it should be okay | 12:46 |
dulek | apuimedo: We've figured out that scaling deployment e.g. from 1 to 3 causes NetlinError: File already exists. | 12:47 |
dulek | s/NetlinErorr/NetlinkError | 12:47 |
apuimedo | dulek: very interesting indeed | 12:47 |
apuimedo | :-) | 12:47 |
*** aojea has joined #openstack-kuryr | 12:47 | |
apuimedo | maybe that's related to the errors we saw in the scale lab :-) | 12:47 |
apuimedo | dulek: you just made me happy | 12:48 |
dulek | apuimedo: I'm guessing that for normal CNI os.getpid() returns different value every time. | 12:48 |
dulek | Well, not the reaction I've expected? :D | 12:48 |
dulek | apuimedo: And when running as daemon - we're getting same value of course. | 12:48 |
dulek | So my question is - why the heck os.getpid()? | 12:48 |
dulek | Why CNI binary process pid matters? | 12:49 |
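[Editor's sketch] dulek's observation about `os.getpid()` can be reproduced with the standard library alone. This is a minimal illustration, not kuryr code: the helper names are hypothetical, a fresh interpreter stands in for a one-shot CNI binary run, and threads stand in for requests served by one long-lived daemon. Whether the PID difference actually explains the failure is still an open question at this point in the log.

```python
import os
import subprocess
import sys
import threading


def pid_of_fresh_process():
    """Start a new Python interpreter and return the PID it reports."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.getpid())"])
    return int(out.strip())


def pids_seen_by_threads(count=2):
    """Return the os.getpid() value observed by each of `count` threads."""
    seen = []
    threads = [threading.Thread(target=lambda: seen.append(os.getpid()))
               for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


# Two one-shot executions (stand-in for two separate CNI binary runs).
one_shot_pids = [pid_of_fresh_process(), pid_of_fresh_process()]
# Two requests handled inside one long-lived process (stand-in for the daemon).
daemon_pids = pids_seen_by_threads(2)
```

Each one-shot run reports its own PID, while every thread inside the long-lived process reports the PID of that single process.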
apuimedo | dulek: did you try what happens if you comment that out? | 12:49 |
dulek | apuimedo: Wrapping it in "except: pass" resulted in some containers getting IP but being unpingable. | 12:50 |
dulek | apuimedo: I guess all of them will get unpingable, but let me try. | 12:51 |
apuimedo | dulek: that's what I'd expect as well | 12:51 |
*** aojea has quit IRC | 12:52 | |
*** garyloug has quit IRC | 12:53 | |
dulek | apuimedo: KeyError: u'tap764255bb-40' - looks like next lines start to fail. | 12:54 |
apuimedo | well, of course | 12:54 |
*** yamamoto has quit IRC | 12:54 | |
dulek | apuimedo: Which starts to make sense - the lines I've commented out move iface into our namespace. | 12:54 |
apuimedo | look closely at line 36 | 12:54 |
dulek | So we can modify it. | 12:54 |
apuimedo | it is creating the veth in the container netns | 12:55 |
apuimedo | line 37 moves the host side to the host networking | 12:55 |
apuimedo | (putting the netns of the current pid) | 12:55 |
*** rwallner has joined #openstack-kuryr | 12:55 | |
apuimedo | if you comment that out | 12:55 |
apuimedo | line 39 | 12:55 |
apuimedo | which uses the host ipdb | 12:55 |
apuimedo | won't find the host side veth | 12:55 |
apuimedo | since it was not moved there | 12:56 |
apuimedo | dulek: did I explain it well? | 12:56 |
dulek | apuimedo: Yup! | 12:56 |
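[Editor's sketch] apuimedo's walk through lines 36, 37 and 39 can be mimicked with a toy model. This deliberately does not call pyroute2 or touch real devices: `ToyNetns`, `create_veth` and `move_link` are hypothetical names, and a namespace is reduced to a plain dict of device names. It only shows why the host-side lookup raises `KeyError` when the move step is skipped.

```python
class ToyNetns:
    """Toy stand-in for a network namespace: a name -> device mapping."""

    def __init__(self, label):
        self.label = label
        self.interfaces = {}


def create_veth(ns, ifname, peer):
    """Create both ends of a veth pair inside `ns` (cf. line 36)."""
    ns.interfaces[ifname] = {"peer": peer}
    ns.interfaces[peer] = {"peer": ifname}


def move_link(src, dst, name):
    """Move device `name` from namespace `src` to `dst` (cf. line 37)."""
    dst.interfaces[name] = src.interfaces.pop(name)


host = ToyNetns("host")
container = ToyNetns("container")

# Both ends start life in the container namespace.
create_veth(container, "eth0", "tap-example")
# The host end is then moved to the namespace of the calling process.
move_link(container, host, "tap-example")
# Only after the move can the host-side lookup succeed (cf. line 39).
host_side = host.interfaces["tap-example"]
```

If `move_link` is commented out, the final lookup raises `KeyError: 'tap-example'`, which mirrors the `KeyError` dulek reported after suppressing that step.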
dulek | apuimedo: So… Why do we get error when moving more than one iface? | 12:57 |
apuimedo | however, the reason why it gets the problem of already existing file... | 12:57 |
apuimedo | let me check the pyroute2 code | 12:57 |
irenab | dulek, apuimedo can you add docstring in the code to help kuryr followers | 12:58 |
dulek | irenab: Where exactly you mean? | 12:58 |
apuimedo | irenab: where? | 12:58 |
irenab | on the binding code | 12:59 |
irenab | once it works | 12:59 |
apuimedo | irenab: I'll quote G&G: "The code is the documentation" xD | 12:59 |
irenab | to solve the mystery of moving netns/pids/... | 12:59 |
apuimedo | no, now seriously, we should put a proper docstring | 12:59 |
dulek | irenab: Okay, I can add comment, that will help me learn what happens in there. :) | 12:59 |
apuimedo | :-) | 12:59 |
irenab | :-) | 12:59 |
*** yboaron_ has joined #openstack-kuryr | 13:01 | |
apuimedo | dulek: are you sure that the host side veth name didn't collide? | 13:02 |
dulek | apuimedo: I'll need to check, but then same should happen for non-daemonized CNI plugin. | 13:02 |
dulek | And it isn't. | 13:03 |
dulek | apuimedo: Let's investigate after the meeting? | 13:03 |
apuimedo | ok | 13:03 |
*** yboaron has quit IRC | 13:04 | |
*** wangbo has quit IRC | 13:08 | |
*** karimb has joined #openstack-kuryr | 13:13 | |
*** garyloug has joined #openstack-kuryr | 13:29 | |
*** gouthamr has joined #openstack-kuryr | 13:43 | |
dulek | apuimedo: Okay, so you want me to check if veth names are not colliding when doing scaling. Let's see… | 13:45 |
apuimedo | in the mean time I check the kernel | 13:47 |
*** aojea has joined #openstack-kuryr | 13:48 | |
dulek | apuimedo: I've got tap86d5e73e-1c and tap7ddf14d1-96, second one fails. :( | 13:49 |
*** aojea has quit IRC | 13:53 | |
apuimedo | okey dokey | 13:53 |
dulek | Hm? | 13:55 |
*** yamamoto has joined #openstack-kuryr | 13:55 | |
yboaron_ | irenab, are u sure that kuryr sets the VIP port SG ? from source code , it seems that SG value not used in lbaas driver | 13:55 |
yboaron_ | https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/lbaasv2.py#L51 | 13:55 |
yboaron_ | irenab, maybe I'm missing something ... | 13:56 |
apuimedo | dulek: I meant... | 13:57 |
apuimedo | good. So I continue checking the kernel side | 13:57 |
apuimedo | dulek: I'm checking get_net_ns_by_pid | 13:58 |
dulek | apuimedo: Hm. This fails more on setting than getting. Want the traceback? | 13:58 |
apuimedo | dulek: no need | 13:58 |
apuimedo | the get is used in do_setlink | 13:59 |
*** yboaron__ has joined #openstack-kuryr | 13:59 | |
apuimedo | in net/core/rtnetlink.c | 13:59 |
apuimedo | which is what pyroute2 uses | 13:59 |
apuimedo | like when you do ip link set | 13:59 |
dulek | Mhm, okay. | 14:00 |
*** yboaron_ has quit IRC | 14:02 | |
*** jchhatbar has quit IRC | 14:03 | |
*** danil has quit IRC | 14:03 | |
*** yamamoto has quit IRC | 14:05 | |
*** limao_ has quit IRC | 14:05 | |
apuimedo | dulek: you got an EEXIST right? | 14:05 |
dulek | NetlinkError: (17, 'File exists') | 14:05 |
dulek | apuimedo: Looks like it matches - EEXIST is 17. | 14:06 |
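[Editor's sketch] The mapping between the numeric code in the `NetlinkError` and the symbolic name can be checked from Python's standard `errno` module. The value 17 is what Linux uses for `EEXIST`; other platforms may differ, so treat the number as an assumption tied to the Linux hosts discussed here.

```python
import errno
import os

code = errno.EEXIST               # numeric value of EEXIST on this platform
name = errno.errorcode[code]      # reverse lookup: number -> symbolic name
message = os.strerror(code)       # human-readable text for the error code
```

On a Linux host, `code` matches the 17 carried by the exception and `message` is the text shown next to it.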
apuimedo | I only see it in the dev_change_net_namespace | 14:08 |
apuimedo | dulek: and it is name related | 14:10 |
dulek | apuimedo: So when it's fired? | 14:11 |
apuimedo | dulek: https://github.com/torvalds/linux/blob/master/net/core/dev.c#L8268-L8275 | 14:11 |
apuimedo | I wonder what dev->name is in that moment | 14:12 |
apuimedo | hey | 14:13 |
apuimedo | it's a race | 14:13 |
openstackgerrit | Yossi Boaron proposed openstack/kuryr-kubernetes master: Closes-Bug: ####### 1714204 - Delete service/deployment causes exception https://review.openstack.org/512636 | 14:14 |
dulek | apuimedo: Hm? We have 2 parallel requests, but with different names and namespaces. | 14:15 |
apuimedo | dulek: https://github.com/svinota/pyroute2/blob/master/pyroute2/ipdb/interfaces.py#L584-L605 | 14:15 |
dulek | apuimedo: Oooow. | 14:15 |
dulek | apuimedo: But I *am* getting the exception, it's not suppressed. | 14:16 |
apuimedo | dulek: how consistently do you get it? | 14:17 |
dulek | apuimedo: Okay, got to admin, it seems a bit random. | 14:18 |
dulek | s/admin/admit | 14:18 |
apuimedo | dulek: I ask because I'm curious if changing things around | 14:18 |
apuimedo | and creating the veth pair in h_ipdb and then moving would behave better | 14:19 |
*** karimb has quit IRC | 14:21 | |
dulek | I can try that, although I don't think this tells us why the error happens. | 14:21 |
apuimedo | yeah | 14:22 |
dulek | I can pretty consistently hit it when scaling a deployment with +2. Doing one-by-one seem fine most of the time. | 14:22 |
dulek | It could be a race, but why there's a conflict. | 14:23 |
dulek | If name is different. | 14:23 |
apuimedo | it is a race between the different c_ipdbs | 14:23 |
apuimedo | why... I don't know | 14:23 |
apuimedo | let's ask svinota | 14:23 |
dulek | Actually…! A difference here is that those will get run in the same process. | 14:24 |
dulek | While in case on non-daemonized CNI we're guaranteed to run in different processes. | 14:24 |
apuimedo | dulek: are you sure that ipdb doesn't spawn its own process? | 14:24 |
* apuimedo does not remember | 14:24 | |
dulek | Is it that pyroute2 is non threadsafe? | 14:24 |
dulek | I'm not sure, haven't checked. | 14:25 |
apuimedo | let's think | 14:25 |
apuimedo | dulek: I suppose you saw https://github.com/svinota/pyroute2/issues/306 | 14:27 |
dulek | apuimedo: I've noticed, but ignored it since it's a different error. | 14:28 |
apuimedo | yeah | 14:32 |
apuimedo | mmm | 14:32 |
*** phuoc_ has joined #openstack-kuryr | 14:33 | |
*** hongbin has joined #openstack-kuryr | 14:34 | |
apuimedo | dulek: are we sure that we don't have a recreation of a pod? | 14:38 |
apuimedo | i.e., when it fails to go running and it tries to create it again? | 14:38 |
apuimedo | couldn't in that case there be the old host side veth with the same tapxxx name? | 14:38 |
dulek | apuimedo: Hm, interesting idea, I sometimes see CNI timeouts and reruns. | 14:40 |
dulek | apuimedo: It would be pretty cool if that's it, as it would solve 2 problems at once. | 14:40 |
apuimedo | that's my worry | 14:40 |
dulek | apuimedo: So scenario would be: | 14:41 |
apuimedo | but why do we take so long time that we get to a rerun | 14:41 |
apuimedo | that I don't know | 14:41 |
*** tonygunk has joined #openstack-kuryr | 14:41 | |
dulek | apuimedo: That would need to be related to file socket, the HTTPServer daemon is spawning and the way CNI is doing the request. | 14:41 |
dulek | 1. CNI sends the request. | 14:41 |
dulek | 2. CNI daemon gets requests, plugs vif, returns. | 14:42 |
dulek | 3. CNI doesn't get reply for some reason - so it retries. | 14:42 |
dulek | 4. CNI daemon gets request again and fails. | 14:43 |
dulek | BTW - it shouldn't really fail, we should probably be idempotent. | 14:43 |
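[Editor's sketch] The four-step retry scenario and the idempotency remark can be illustrated with a toy handler. Nothing here is kuryr code: `ToyPlugger`, its methods and the container ID key are hypothetical, and "plugging" is reduced to remembering a host interface name. The naive handler fails on a repeated request, while the idempotent one returns the result of the first attempt.

```python
import errno


class ToyPlugger:
    """Toy stand-in for a daemon handling ADD requests (not kuryr code)."""

    def __init__(self):
        self.host_ifaces = set()   # device names already present on the host
        self.results = {}          # container id -> result of the first ADD

    def _plug(self, host_ifname):
        """Pretend to plug a VIF; fail if the host name is already taken."""
        if host_ifname in self.host_ifaces:
            raise OSError(errno.EEXIST, "File exists")
        self.host_ifaces.add(host_ifname)
        return {"ifname": host_ifname}

    def add_naive(self, container_id, host_ifname):
        """Step 4 of the scenario: a retried request plugs again and fails."""
        return self._plug(host_ifname)

    def add_idempotent(self, container_id, host_ifname):
        """Return the earlier result when the same request is repeated."""
        if container_id in self.results:
            return self.results[container_id]
        result = self._plug(host_ifname)
        self.results[container_id] = result
        return result
```

Keying the cache on the container ID is a design assumption for this sketch; a real daemon would also need to drop the entry when the container is deleted.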
apuimedo | dulek: that's true | 14:44 |
apuimedo | dulek: can you do something fast | 14:44 |
apuimedo | in case it fails, use h_ipdb to check if it exists and log it | 14:44 |
apuimedo | if it does, then we can see on which namespace its pair is and move it to where it needs to be | 14:44 |
dulek | apuimedo: Okay, let me see if I can code it. | 14:45 |
apuimedo | dulek: forget about the moving part | 14:46 |
apuimedo | for now the logging of the h_ipdb.interfaces[host_ifname] | 14:46 |
apuimedo | is sufficient | 14:47 |
dulek | apuimedo: Just logging? | 14:47 |
apuimedo | yeah | 14:47 |
apuimedo | to see if my hypothesis is correct | 14:47 |
*** yamamoto has joined #openstack-kuryr | 14:47 | |
*** yamamoto has quit IRC | 14:47 | |
dulek | apuimedo: http://paste.openstack.org/show/623848/ | 14:55 |
dulek | apuimedo: 'link_netnsid': 26 | 14:55 |
apuimedo | so it did exist, eh? | 14:56 |
apuimedo | so now it's about to check where the pair is | 14:58 |
apuimedo | s/to check/checking/ | 14:58 |
dulek | apuimedo: What do you mean by "pair" in this context? | 14:58 |
dulek | Bridged interfaces? | 14:58 |
apuimedo | these are veth pairs | 14:58 |
*** janki has joined #openstack-kuryr | 14:58 | |
apuimedo | it means it is a pair of linux virtual device | 14:59 |
apuimedo | one end on the host side | 14:59 |
apuimedo | one on the container side | 14:59 |
apuimedo | so it's about looking for the container side of this already existing host side | 14:59 |
apuimedo | device | 14:59 |
dulek | Uhm. | 15:00 |
*** kiennt26_ has joined #openstack-kuryr | 15:03 | |
apuimedo | dulek: ? | 15:04 |
dulek | apuimedo: I'm trying to figure it out. ;) I guess I shouldn't kill the pod after getting the error. | 15:04 |
apuimedo | probably not :-) | 15:08 |
*** egonzalez has quit IRC | 15:14 | |
dulek | apuimedo: http://paste.openstack.org/show/623852/ | 15:20 |
* apuimedo reconnecting | 15:21 | |
dulek | apuimedo: Not sure if that's what I should look for… | 15:21 |
openstackgerrit | Yossi Boaron proposed openstack/kuryr-kubernetes master: Eliminate wrong ERROR report (in kuryr log file) when service of type LoadBalancer type is deleted https://review.openstack.org/512670 | 15:26 |
*** yboaron__ has quit IRC | 15:31 | |
apuimedo | dulek sudo nsenter -t 2799 -n ip -o -d link show | 15:40 |
dulek | apuimedo: http://paste.openstack.org/show/623857/ | 15:41 |
apuimedo | dulek: which is this namespace? | 15:43 |
apuimedo | the one of the failed pod? | 15:43 |
apuimedo | what about in the host namespace? | 15:43 |
apuimedo | Is the tapf7a... device there? | 15:43 |
apuimedo | which is its index? | 15:43 |
apuimedo | and @ | 15:43 |
dulek | apuimedo: "CNI_NETNS": "/proc/2799/ns/net" - that's failed CNI request. | 15:44 |
dulek | I have the tap interface on the host. | 15:44 |
dulek | apuimedo: 423: tapf7afbf1c-ba@if3 | 15:44 |
apuimedo | if3... | 15:44 |
apuimedo | I don't suppose there is an if3 in the host namespace, is there? | 15:45 |
dulek | apuimedo: Nope. But all the tap interfaces have @if3. | 15:45 |
dulek | Even those that work correctly. | 15:45 |
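[Editor's sketch] The `423: tapf7afbf1c-ba@if3` prefix quoted above can be split into its parts with a small parser. As far as I know, iproute2 prints `@ifN` when the peer device has interface index N in a different network namespace, which would explain why every tap shows `@if3` if each container's `eth0` has index 3; treat that reading as an assumption. The function name and regular expression are hypothetical, and only the leading `index: name@ifN` part of a line is parsed.

```python
import re

# Matches "<index>: <name>" with an optional "@if<peer index>" suffix.
LINK_RE = re.compile(
    r"^(?P<index>\d+):\s+(?P<name>[^@:\s]+)(?:@if(?P<peer>\d+))?")


def parse_link_prefix(line):
    """Return (ifindex, name, peer_ifindex or None) for an `ip link` line."""
    match = LINK_RE.match(line)
    if match is None:
        raise ValueError("not an `ip link` line: %r" % line)
    peer = match.group("peer")
    return (int(match.group("index")),
            match.group("name"),
            int(peer) if peer is not None else None)


parsed = parse_link_prefix("423: tapf7afbf1c-ba@if3")
```

Under that reading, the matching container side would be an interface with index 3 whose own suffix points back at host index 423.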
apuimedo | dulek: yeah... I think it just means "no fucking clue where the pair is" | 15:45 |
dulek | apuimedo: Should it be paired with eth0? | 15:46 |
apuimedo | of some container namespace | 15:46 |
apuimedo | how many infra containers you have on `docker ps`? | 15:46 |
apuimedo | you have to try to find an eth0 inside a container | 15:47 |
apuimedo | with an interface index number 424 (or 422 now I don't remember) | 15:47 |
dulek | apuimedo: infra containers? | 15:47 |
*** yamamoto has joined #openstack-kuryr | 15:48 | |
apuimedo | dulek: when you do `docker ps` | 15:48 |
apuimedo | for each pod | 15:48 |
apuimedo | you have an infra container | 15:48 |
apuimedo | and a container with the actual image that is being run | 15:49 |
dulek | Ah, okay. 4 containers. | 15:49 |
apuimedo | I'm telling you to look at the infra containers only, so that you don't need to look 2x | 15:49 |
apuimedo | :-) | 15:49 |
apuimedo | so do | 15:49 |
apuimedo | docker exec name_of_each_container ip link show | 15:49 |
*** aojea has joined #openstack-kuryr | 15:50 | |
dulek | apuimedo: http://paste.openstack.org/show/623859/ | 15:51 |
dulek | apuimedo: First one is from infra container. | 15:51 |
apuimedo | dulek: right | 15:52 |
apuimedo | is that pod running? | 15:52 |
apuimedo | is it terminating | 15:52 |
apuimedo | is it a troll pod? | 15:52 |
dulek | The one with CNI failure is in "cannot join network of a non running container", previously ContainerCreating. | 15:53 |
dulek | To catch the container namespace I needed to stop kubelet - otherwise kubelet retried constantly while changing it. | 15:54 |
apuimedo | dulek: let me rephrase | 15:54 |
*** aojea has quit IRC | 15:54 | |
apuimedo | if you find the pod for this infra container | 15:54 |
apuimedo | what does it say | 15:54 |
apuimedo | in describe? | 15:54 |
apuimedo | the 'cannot join'? | 15:54 |
*** gouthamr has quit IRC | 15:54 | |
apuimedo | but is it a pod that was created before the one in which we now fail to move the device to the host namespace? | 15:55 |
*** yamamoto has quit IRC | 15:55 | |
*** gouthamr_ has joined #openstack-kuryr | 15:55 | |
dulek | Just a second… | 15:57 |
*** limao has joined #openstack-kuryr | 15:57 | |
*** rwallner has quit IRC | 15:58 | |
dulek | apuimedo: Okay, finally - cannot join. | 15:58 |
dulek | apuimedo: Let's take a step back. This isn't very productive, as I'm having troubles following what we're doing… | 15:59 |
*** limao_ has joined #openstack-kuryr | 15:59 | |
apuimedo | dulek: I've got an important meeting now | 16:00 |
*** rwallner_ has joined #openstack-kuryr | 16:00 | |
apuimedo | I'll try to answer as much as possible | 16:00 |
dulek | apuimedo: Okay, what I'll try first is to get rid of those timeouts on CNI. | 16:00 |
*** pcaruana has quit IRC | 16:01 | |
dulek | apuimedo: Maybe it'll kill 2 birds with one stone. | 16:01 |
*** rwallne__ has joined #openstack-kuryr | 16:02 | |
*** limao has quit IRC | 16:02 | |
*** rwallner_ has quit IRC | 16:04 | |
*** rwallne__ has quit IRC | 16:04 | |
*** rwallner has joined #openstack-kuryr | 16:05 | |
openstackgerrit | Hongbin Lu proposed openstack/kuryr master: Allow multiple binding drivers https://review.openstack.org/508778 | 16:07 |
apuimedo | :-) | 16:07 |
*** kiennt26_ has quit IRC | 16:07 | |
openstackgerrit | Hongbin Lu proposed openstack/kuryr-libnetwork master: Support searching existing port with macaddress https://review.openstack.org/505443 | 16:14 |
*** salv-orl_ has joined #openstack-kuryr | 16:29 | |
*** salv-orlando has quit IRC | 16:32 | |
*** tonygunk has quit IRC | 16:40 | |
*** jchhatbar has joined #openstack-kuryr | 16:44 | |
*** jchhatbar has quit IRC | 16:45 | |
*** jchhatbar has joined #openstack-kuryr | 16:45 | |
*** rwallner has quit IRC | 16:46 | |
*** janki has quit IRC | 16:47 | |
*** aojea has joined #openstack-kuryr | 16:50 | |
*** aojea has quit IRC | 16:55 | |
*** jchhatbar has quit IRC | 16:56 | |
*** garyloug has quit IRC | 16:57 | |
*** karimb has joined #openstack-kuryr | 17:04 | |
*** limao_ has quit IRC | 17:08 | |
*** rwallner has joined #openstack-kuryr | 17:09 | |
*** leyal has quit IRC | 17:17 | |
*** leyal has joined #openstack-kuryr | 17:19 | |
*** rwallner_ has joined #openstack-kuryr | 17:25 | |
*** salv-orlando has joined #openstack-kuryr | 17:28 | |
*** rwallner has quit IRC | 17:28 | |
*** salv-orl_ has quit IRC | 17:32 | |
*** salv-orlando has quit IRC | 17:32 | |
*** aojea has joined #openstack-kuryr | 17:51 | |
*** gouthamr_ is now known as gouthamr | 17:52 | |
*** aojea has quit IRC | 17:56 | |
*** aojea has joined #openstack-kuryr | 18:38 | |
*** aojea has quit IRC | 18:47 | |
*** aojea has joined #openstack-kuryr | 19:43 | |
*** aojea has quit IRC | 19:48 | |
*** rwallner_ has quit IRC | 19:55 | |
*** tonygunk has joined #openstack-kuryr | 20:27 | |
*** aojea has joined #openstack-kuryr | 20:44 | |
*** aojea has quit IRC | 20:49 | |
*** tonygunk has quit IRC | 21:00 | |
*** gouthamr has quit IRC | 21:08 | |
*** salv-orlando has joined #openstack-kuryr | 21:42 | |
*** salv-orlando has quit IRC | 21:44 | |
*** salv-orlando has joined #openstack-kuryr | 21:44 | |
*** aojea has joined #openstack-kuryr | 21:45 | |
*** aojea has quit IRC | 21:49 | |
*** atoth has quit IRC | 22:43 | |
*** salv-orlando has quit IRC | 22:45 | |
*** salv-orlando has joined #openstack-kuryr | 22:46 | |
*** oanson has quit IRC | 22:50 | |
*** salv-orlando has quit IRC | 22:51 | |
*** oanson has joined #openstack-kuryr | 22:52 | |
*** hongbin has quit IRC | 23:32 | |
*** salv-orlando has joined #openstack-kuryr | 23:46 | |
*** salv-orlando has quit IRC | 23:52 |