*** hongbin has joined #openstack-kuryr | 00:57 | |
*** ndesh has joined #openstack-kuryr | 01:47 | |
*** irclogbot_1 has joined #openstack-kuryr | 02:06 | |
*** irclogbot_1 has quit IRC | 02:13 | |
*** irclogbot_0 has joined #openstack-kuryr | 02:16 | |
*** altlogbot_3 has joined #openstack-kuryr | 02:18 | |
*** irclogbot_0 has quit IRC | 02:19 | |
*** altlogbot_3 has quit IRC | 02:34 | |
*** irclogbot_1 has joined #openstack-kuryr | 03:20 | |
*** irclogbot_1 has quit IRC | 03:25 | |
*** hongbin has quit IRC | 03:44 | |
*** altlogbot_3 has joined #openstack-kuryr | 03:54 | |
*** altlogbot_3 has quit IRC | 03:59 | |
*** gcheresh_ has joined #openstack-kuryr | 04:34 | |
*** pcaruana has joined #openstack-kuryr | 04:55 | |
*** ccamposr has joined #openstack-kuryr | 05:18 | |
*** ccamposr__ has quit IRC | 05:20 | |
*** gcheresh_ has quit IRC | 05:34 | |
*** gcheresh_ has joined #openstack-kuryr | 05:40 | |
*** ccamposr__ has joined #openstack-kuryr | 05:57 | |
*** gcheresh_ has quit IRC | 05:58 | |
*** ccamposr has quit IRC | 06:00 | |
*** altlogbot_1 has joined #openstack-kuryr | 06:18 | |
*** altlogbot_1 has quit IRC | 06:23 | |
*** altlogbot_0 has joined #openstack-kuryr | 06:24 | |
*** altlogbot_0 has quit IRC | 06:29 | |
*** openstackgerrit has joined #openstack-kuryr | 06:53 | |
openstackgerrit | Alexey Perevalov proposed openstack/kuryr-kubernetes master: Use CNI_IFNAME environment variable https://review.opendev.org/670141 | 06:53 |
---|---|---|
*** altlogbot_3 has joined #openstack-kuryr | 06:56 | |
*** altlogbot_3 has quit IRC | 07:01 | |
*** gcheresh_ has joined #openstack-kuryr | 07:11 | |
*** korzen has joined #openstack-kuryr | 07:14 | |
*** irclogbot_2 has joined #openstack-kuryr | 07:16 | |
*** irclogbot_2 has quit IRC | 07:20 | |
*** irclogbot_3 has joined #openstack-kuryr | 07:24 | |
*** irclogbot_3 has quit IRC | 07:27 | |
*** altlogbot_3 has joined #openstack-kuryr | 07:48 | |
*** altlogbot_3 has quit IRC | 07:49 | |
*** irclogbot_3 has joined #openstack-kuryr | 07:52 | |
*** irclogbot_3 has quit IRC | 07:55 | |
*** irclogbot_0 has joined #openstack-kuryr | 08:08 | |
*** irclogbot_0 has quit IRC | 08:13 | |
*** maysams has joined #openstack-kuryr | 08:20 | |
*** gkadam has joined #openstack-kuryr | 08:28 | |
*** ccamposr has joined #openstack-kuryr | 08:51 | |
*** ccamposr__ has quit IRC | 08:53 | |
*** FlorianFa has joined #openstack-kuryr | 09:36 | |
*** korzen has quit IRC | 09:37 | |
*** irclogbot_3 has joined #openstack-kuryr | 09:48 | |
*** irclogbot_3 has quit IRC | 09:51 | |
*** altlogbot_3 has joined #openstack-kuryr | 10:26 | |
*** altlogbot_3 has quit IRC | 10:29 | |
*** janki has joined #openstack-kuryr | 10:32 | |
*** altlogbot_0 has joined #openstack-kuryr | 10:34 | |
*** altlogbot_0 has quit IRC | 10:39 | |
*** altlogbot_2 has joined #openstack-kuryr | 10:40 | |
*** irclogbot_0 has joined #openstack-kuryr | 10:40 | |
*** altlogbot_2 has quit IRC | 10:45 | |
*** irclogbot_0 has quit IRC | 10:45 | |
*** altlogbot_3 has joined #openstack-kuryr | 10:56 | |
*** altlogbot_3 has quit IRC | 11:01 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Set the validate CRD enabled flag at tempest.conf https://review.opendev.org/668916 | 11:03 |
openstackgerrit | Merged openstack/kuryr-kubernetes stable/stein: Fix fail to recreate namespace when previous KuryrNet CRD is not deleted https://review.opendev.org/669800 | 11:14 |
*** altlogbot_1 has joined #openstack-kuryr | 11:56 | |
*** rh-jelabarre has joined #openstack-kuryr | 11:59 | |
*** altlogbot_1 has quit IRC | 12:01 | |
*** danil has joined #openstack-kuryr | 12:08 | |
danil | Hello. Does anybody know why we should specify resourceVersion here https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/handlers/vif.py#L212 ? Could we annotate without an exact resourceVersion? Can it cause problems? | 12:10 |
dulek | danil: resourceVersion is there to make sure nobody updated that annotation in between. | 12:15 |
dulek | danil: Basically to avoid a lost update. | 12:15 |
*** janki has quit IRC | 12:21 | |
*** irclogbot_3 has joined #openstack-kuryr | 12:32 | |
*** irclogbot_3 has quit IRC | 12:35 | |
*** janki has joined #openstack-kuryr | 12:43 | |
*** janki has quit IRC | 12:45 | |
danil | dulek, so from the code I can see that if somebody updated the annotations in between, we will get an exception from the k8s client. After that the attempt will be repeated until it succeeds, right? | 12:49 |
*** altlogbot_2 has joined #openstack-kuryr | 12:50 | |
dulek | danil: Yes, but only if that annotation was not modified. | 12:50 |
dulek | Otherwise you'll get an error and you'll need to check if there wasn't a conflict or something. | 12:51 |
danil | Do you mean "annotation was not modified" between attempts, or before? | 12:51 |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Use CNI_IFNAME environment variable https://review.opendev.org/670141 | 12:52 |
*** altlogbot_2 has quit IRC | 12:53 | |
dulek | danil: Okay, so here's how it works: | 13:01 |
dulek | 1. We read pod from K8s API. | 13:01 |
dulek | 2. We do some modifications to annotations based on that information. | 13:02 |
dulek | 3. Then we try to save the annotation. | 13:02 |
dulek | 4. Now if we don't include resourceVersion, the pod might have changed in the meantime, or some other thread/service might have written to that annotation field, and we would not notice. | 13:02 |
dulek | So we use resourceVersion to make sure that if the pod resource was modified between the time we read it and the time we save it, we will reconsider it. | 13:03 |
dulek | According to this: <danil> conflict in resourceVersion happens (in my opinion) because object was updated in binding driver (also resourceVersion was updated), but vif handler tries to annotate pod with old resourceVersion | 13:04 |
dulek | Well then I think resourceVersion serves its purpose - makes sure that VIF handler will notice that pod was updated. | 13:04 |
dulek | danil: So the VIF handler should retry, shouldn't it? | 13:05 |
danil | Yes, it retries, but only after exceptions from the k8s client like this... | 13:07 |
danil | Exception response, headers: {'Date': 'Thu, 04 Jul 2019 09:58:23 GMT', 'Content-Length': '338', 'Content-Type': 'application/json'}, content: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on pods \"nginx-sriov-664456478-cr8kq\": the object has been modified; please apply your changes to | 13:07 |
danil | the latest version and try again","reason":"Conflict","details":{"name":"nginx-sriov-664456478-cr8kq","kind":"pods"},"code":409}, text: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on pods \"nginx-sriov-664456478-cr8kq\": the object has been modified; please apply your changes to the | 13:07 |
danil | latest version and try again","reason":"Conflict","details":{"name":"nginx-sriov-664456478-cr8kq","kind":"pods"},"code":409} | 13:07 |
danil | dulek, ^ sorry, didn't mention you | 13:13 |
dulek | danil: This is saving the VIF annotation? | 13:14 |
danil | dulek, it happens in the VIF handler during _set_pod_state. It tries to save the VIF annotations after all neutron ports have become active. An exception occurs, it retries, and after that the annotations are written. | 13:18 |
danil | https://pastebin.com/bLUXpsNm | 13:18 |
dulek | danil: Wait, at which level does it retry? | 13:18 |
dulek | Okay, I see, on RetryHandler level. | 13:19 |
danil | dulek, yes | 13:19 |
dulek | danil: Do you have more logs? | 13:25 |
dulek | danil: Like including this one: https://github.com/openstack/kuryr-kubernetes/blob/c8d41c0d498d8337d6ec5dcb28bbc425be55ff39/kuryr_kubernetes/k8s_client.py#L198-L199 | 13:25 |
danil | dulek, one moment | 13:25 |
danil | dulek, https://pastebin.com/BQAs08A0 here are logs from controller | 13:35 |
*** irclogbot_1 has joined #openstack-kuryr | 13:40 | |
dulek | danil: Okay, so either 'openstack.org/kuryr-vif' or 'openstack.org/kuryr-pod-label' got updated between the read and write. | 13:43 |
dulek | danil: You need to discover what updated it and why it happens at this moment. | 13:43 |
dulek | danil: And if needed implement retry logic in VIF handler. | 13:44 |
dulek | danil: This stuff is actually quite expected as we started annotating pods in kuryr-daemon as well. | 13:44 |
dulek | It is way easier to implement if there's just one entity doing that. | 13:45 |
dulek | But if only kuryr-daemon knows about SR-IOV, then it might be the best design we have. | 13:45 |
*** irclogbot_1 has quit IRC | 13:45 | |
dulek | danil: I would start with printing values of the annotations in the LOG.debug() I linked above. | 13:45 |
dulek | To compare what actually changes. | 13:45 |
danil | dulek, hm, actually the binding driver did change annotations, but under a different annotation name. I suppose the resourceVersion of the pod object changed after that. So when the controller tries to annotate once again (with the old resourceVersion specified) it gets a conflict. | 13:47 |
dulek | danil: No, this code is supposed to retry if only unrelated annotations were modified: https://github.com/openstack/kuryr-kubernetes/blob/c8d41c0d498d8337d6ec5dcb28bbc425be55ff39/kuryr_kubernetes/k8s_client.py#L215-L229 | 13:48 |
dulek | danil: So unless there's a flaw there (might be, it's not trivial), it should retry if neither 'openstack.org/kuryr-vif' nor 'openstack.org/kuryr-pod-label' got modified. | 13:49 |
dulek | Hm should it? | 13:49 |
dulek | Yes, it should. | 13:50 |
danil | dulek, https://github.com/openstack/kuryr-kubernetes/blob/c8d41c0d498d8337d6ec5dcb28bbc425be55ff39/kuryr_kubernetes/k8s_client.py#L220-L222 , here I got a break because 'openstack.org/kuryr-vif' was modified from the controller side (it changed 'active' to True for the default VIF because it had seen that the related neutron port was active) | 13:58 |
danil | but it is normal behavior | 13:58 |
dulek | danil: Well, then you found the culprit, I assume? | 13:59 |
danil | dulek, no) Isn't this normal behavior? The neutron port becomes active -> the related VIF is marked as active -> the new annotations are written | 14:01 |
dulek | danil: Wait, you have 2 VIF's there now? | 14:06 |
dulek | danil: So it's probably like this: | 14:06 |
dulek | Both become active at nearly the same time; one update wins the race and the second loses it. | 14:07 |
dulek | You can confirm it the way I proposed above - just add the annotation value to the debug statement there, so you can see what is being annotated and what was overwritten, right? | 14:08 |
danil | dulek, yes, we have 2 VIFs there. (Actually the SR-IOV one is made active by hand; it doesn't wait for the related neutron port.) Also it is strange, because this problem occurs only with https://review.opendev.org/#/c/656482/ | 14:15 |
danil | I confirmed | 14:15 |
danil | We try to write "active": True, but we have "active": False. This is for the default VIF | 14:16 |
dulek | danil: Uhm, just a sec, let me think. | 14:17 |
*** gcheresh_ has quit IRC | 14:17 | |
danil | dulek, I've checked one thing: I checked out master for the CNI side, and the problem disappeared. So I think the problem is in the new patch that annotates the pod from the SR-IOV binding driver | 14:19 |
danil | dulek, no, sorry | 14:19 |
dulek | danil: Sure thing, if something modifies the pod in between and the annotation is already there, we get an issue. | 14:19 |
dulek | Now I don't really see if we could use JSON patch to help with this. | 14:23 |
dulek | Maybeee…? | 14:23 |
dulek | danil: You can try to rewrite that logic to use JSON patch - I think it would then fail only when those particular annotations were modified. | 14:24 |
dulek | But I'm not sure here. | 14:24 |
danil | dulek, ok, thanks. I will try to locate the place where the problems start. It seems I have not understood it yet) | 14:25 |
openstackgerrit | Merged openstack/kuryr master: Fix kuryr CI https://review.opendev.org/669553 | 14:33 |
*** gcheresh_ has joined #openstack-kuryr | 14:35 | |
*** gcheresh_ has quit IRC | 14:58 | |
*** ndesh has quit IRC | 15:05 | |
danil | dulek, I've checked. And it looks strange: | 15:07 |
danil | I've updated CNI to the master version, and only the controller side has the new patch now. And it still causes the k8s exception | 15:08 |
openstackgerrit | Alexey Perevalov proposed openstack/kuryr-kubernetes master: Change trace pod/pool drivers are incompatible https://review.opendev.org/669992 | 15:10 |
openstackgerrit | Alexey Perevalov proposed openstack/kuryr-kubernetes master: Change trace pod/pool drivers are incompatible https://review.opendev.org/669992 | 15:11 |
*** pcaruana has quit IRC | 15:11 | |
dulek | danil: IMO it's related to your pod having 2 interfaces. | 15:17 |
dulek | Both want to become active and that's why it happens. | 15:18 |
dulek | It wouldn't happen if both were updated at the same time, but I think that's not the case. | 15:18 |
*** pcaruana has joined #openstack-kuryr | 15:57 | |
*** altlogbot_3 has joined #openstack-kuryr | 16:04 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Make SG modifications for LoadBalancers optional https://review.opendev.org/665227 | 16:05 |
*** altlogbot_3 has quit IRC | 16:07 | |
*** maysams has quit IRC | 16:32 | |
*** altlogbot_2 has joined #openstack-kuryr | 16:44 | |
*** gkadam has quit IRC | 16:46 | |
*** altlogbot_2 has quit IRC | 16:49 | |
*** altlogbot_2 has joined #openstack-kuryr | 17:22 | |
*** altlogbot_2 has quit IRC | 17:25 | |
*** gcheresh_ has joined #openstack-kuryr | 17:28 | |
*** altlogbot_1 has joined #openstack-kuryr | 17:30 | |
*** altlogbot_1 has quit IRC | 17:35 | |
*** altlogbot_2 has joined #openstack-kuryr | 17:36 | |
*** gcheresh_ has quit IRC | 17:40 | |
*** altlogbot_2 has quit IRC | 17:41 | |
*** altlogbot_2 has joined #openstack-kuryr | 17:43 | |
*** altlogbot_2 has quit IRC | 17:45 | |
*** irclogbot_3 has joined #openstack-kuryr | 19:28 | |
*** irclogbot_3 has quit IRC | 19:33 | |
*** mrostecki has joined #openstack-kuryr | 19:37 | |
*** altlogbot_2 has joined #openstack-kuryr | 19:39 | |
*** altlogbot_2 has quit IRC | 19:41 | |
*** jistr has quit IRC | 19:43 | |
*** jistr has joined #openstack-kuryr | 19:45 | |
*** jistr has quit IRC | 20:03 | |
*** jistr has joined #openstack-kuryr | 20:05 | |
*** brault has joined #openstack-kuryr | 20:10 | |
*** irclogbot_3 has joined #openstack-kuryr | 20:12 | |
*** gcheresh_ has joined #openstack-kuryr | 20:13 | |
*** brault has quit IRC | 20:15 | |
*** irclogbot_3 has quit IRC | 20:16 | |
*** brault has joined #openstack-kuryr | 20:16 | |
*** brault has quit IRC | 20:17 | |
*** pcaruana has quit IRC | 20:31 | |
*** gcheresh_ has quit IRC | 20:57 | |
*** altlogbot_3 has joined #openstack-kuryr | 21:00 | |
*** altlogbot_3 has quit IRC | 21:05 | |
*** irclogbot_0 has joined #openstack-kuryr | 21:23 | |
*** irclogbot_0 has quit IRC | 21:26 | |
*** irclogbot_1 has joined #openstack-kuryr | 21:29 | |
*** irclogbot_1 has quit IRC | 21:32 | |
*** danil has quit IRC | 21:57 | |
*** altlogbot_2 has joined #openstack-kuryr | 22:17 | |
*** altlogbot_3 has joined #openstack-kuryr | 22:19 | |
*** irclogbot_0 has joined #openstack-kuryr | 22:19 | |
*** altlogbot_3 has quit IRC | 22:23 | |
*** irclogbot_0 has quit IRC | 22:24 | |
*** rh-jelabarre has quit IRC | 22:27 | |
*** altlogbot_2 has joined #openstack-kuryr | 22:57 | |
*** altlogbot_2 has quit IRC | 22:59 | |
*** irclogbot_0 has joined #openstack-kuryr | 23:01 | |
*** irclogbot_0 has quit IRC | 23:06 | |
*** irclogbot_0 has joined #openstack-kuryr | 23:09 | |
*** irclogbot_0 has quit IRC | 23:16 | |
*** altlogbot_3 has joined #openstack-kuryr | 23:29 | |
*** altlogbot_3 has quit IRC | 23:30 |
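
The read-modify-write flow dulek lists at 13:01-13:03, together with the retry rule discussed at 13:48-13:49 (retry only if the annotations being written were not touched by someone else), can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Kuryr's actual k8s_client code: `FakeApi`, `Conflict`, and `annotate` are hypothetical names, and the in-memory pod is only a stand-in for the real K8s API.

```python
import copy


class Conflict(Exception):
    """Stand-in for the HTTP 409 returned on a stale resourceVersion."""


class FakeApi:
    """Hypothetical in-memory stand-in for the K8s API holding one pod."""

    def __init__(self):
        self.pod = {"resourceVersion": 1, "annotations": {}}

    def read(self):
        # Return a copy so callers cannot mutate the stored pod directly.
        return copy.deepcopy(self.pod)

    def write(self, annotations, resource_version):
        # Reject the write if the caller's view of the pod is out of date.
        if resource_version != self.pod["resourceVersion"]:
            raise Conflict("the object has been modified")
        self.pod["annotations"].update(annotations)
        self.pod["resourceVersion"] += 1


def annotate(api, seen, updates, max_attempts=5):
    """Write `updates`, retrying only while the keys we touch are unchanged."""
    for _ in range(max_attempts):
        try:
            api.write(updates, seen["resourceVersion"])
            return api.read()
        except Conflict:
            latest = api.read()
            for key in updates:
                if latest["annotations"].get(key) != seen["annotations"].get(key):
                    # Someone else changed our annotation: the caller
                    # has to reconsider instead of overwriting it.
                    raise
            # Only unrelated fields changed: retry with the fresh version.
            seen = latest
    raise Conflict("gave up after %d attempts" % max_attempts)
```

In this sketch, a concurrent write to an unrelated annotation leads to one transparent retry, while a concurrent write to 'openstack.org/kuryr-vif' itself surfaces the conflict to the caller, which matches the situation danil reports at 13:58.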