*** jistr has quit IRC | 01:00 | |
*** jistr has joined #openstack-kuryr | 01:01 | |
*** hongbin has joined #openstack-kuryr | 01:03 | |
*** hongbin has quit IRC | 01:37 | |
*** hongbin has joined #openstack-kuryr | 01:56 | |
*** hongbin has quit IRC | 02:25 | |
*** hongbin has joined #openstack-kuryr | 03:46 | |
*** spsurya has joined #openstack-kuryr | 04:28 | |
*** hongbin has quit IRC | 04:55 | |
*** gkadam has joined #openstack-kuryr | 07:22 | |
*** maysams has joined #openstack-kuryr | 07:24 | |
*** gcheresh has joined #openstack-kuryr | 07:53 | |
*** pcaruana has joined #openstack-kuryr | 08:01 | |
dulek | ltomasbo: I have another hit of that SG issue when subnet namespace driver is enabled: http://logs.openstack.org/99/632999/3/check/kuryr-kubernetes-tempest-daemon-openshift-octavia/b0b6707/controller/logs/screen-kuryr-kubernetes.txt.gz | 08:05 |
ltomasbo | dulek, checking | 08:13 |
dulek | ltomasbo: Thanks! I'll be available for a while from the airport, my flight's delayed. | 08:14 |
dulek | ltomasbo: Are you on a train? ;) | 08:14 |
ltomasbo | dulek, yep! | 08:15 |
ltomasbo | dulek, I just switched trains, heading to Barcelona in 15 mins or so | 08:16 |
dulek | ltomasbo: Regarding the issue - to me it seems like some race condition. You can look up the missing SG id's in q-svc logs. | 08:16 |
ltomasbo | dulek, umm, strange | 08:17 |
dulek | ltomasbo: That's why I wasn't able to debug it myself. :) | 08:18 |
ltomasbo | dulek, it happens on openshift gates, right? so related to namespace isolation probably | 08:18 |
dulek | ltomasbo: Yes, yes, I'm pretty sure it's due to namespace subnet driver. | 08:18 |
ltomasbo | dulek, I'm wondering if there was some issue (neutron timing issue) and then the rollback is not fully working... leading to a broken env... | 08:19 |
dulek | ltomasbo: I thought about that SG getting removed in rollback, but if I remember correctly there was no SG deletion on q-svc. Let me double check. | 08:20 |
ltomasbo | ahh, wait | 08:21 |
ltomasbo | dulek, I see there are 2 subsequent calls | 08:21 |
ltomasbo | create namespace, create security group rule | 08:21 |
dulek | Oh, okay, I only see the SG rule creation and that's failing. | 08:21 |
ltomasbo | and the error seems to come from the second one | 08:21 |
ltomasbo | creating the security group rule, while the create_security_group seems to not have finished | 08:22 |
dulek | ltomasbo: Yeah, from creation of SG rule. | 08:22 |
ltomasbo | perhaps some race on the neutron side? | 08:22 |
dulek | ltomasbo: Ooooh. It's 201 that's returned from SG creation. So it's only ACCEPTED. | 08:22 |
* dulek checks if Neutron changed something there recently. | 08:23 | |
ltomasbo | dulek, yep, but I think it is only accepted because they cannot ensure it is applied on the hypervisors | 08:23 |
ltomasbo | dulek, but it should be created on the database at least! | 08:23 |
ltomasbo | it would be really dumb to have to poll in there... | 08:23 |
dulek | ltomasbo: Yup, I agree here! | 08:23 |
ltomasbo | perhaps we can add a retry on NotFound exception for the second... | 08:24 |
ltomasbo | dulek, ^^ | 08:24 |
ltomasbo | to be on the safe side... | 08:24 |
dulek | ltomasbo: That would work, but I think we both find it nasty? :D | 08:24 |
ltomasbo | yes yes! I don't think that should be the way | 08:25 |
ltomasbo | that should be ensured on the neutron side | 08:25 |
ltomasbo | it would just be 'defensive' programming :/ | 08:25 |
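The retry ltomasbo suggests (which both agree is a defensive workaround rather than a real fix) could look roughly like this. It is a minimal Python sketch, assuming the client raises a NotFound-style exception when the freshly created SG is not yet visible to the rule-creation call; `NotFound` and `retry_on_not_found` are hypothetical names defined here, not kuryr-kubernetes or neutronclient APIs.

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for the client's 404 exception."""


def retry_on_not_found(func, *args, attempts=3, delay=0.5, **kwargs):
    """Call func(*args, **kwargs), retrying only when it raises NotFound.

    Intended for the second call (SG rule creation) after the SG
    creation already returned successfully.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except NotFound:
            if attempt == attempts:
                raise  # out of attempts: re-raise the original NotFound
            time.sleep(delay * attempt)  # linear backoff before retrying
```

Only the rule creation would be wrapped, e.g. `retry_on_not_found(create_rule, body)` where `create_rule` and `body` are placeholders; any other exception still propagates immediately.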
dulek | ltomasbo: https://review.openstack.org/#/c/628691/ - might be related? | 08:25 |
dulek | ltomasbo: Got merged 4 days ago. That should be our cause. | 08:26 |
ltomasbo | dulek, yep, makes sense | 08:26 |
ltomasbo | should we ping our neutron folks? | 08:27 |
openstackgerrit | Danil Golov proposed openstack/kuryr-kubernetes master: Fix a misprint in SR-IOV binding driver https://review.openstack.org/633453 | 08:28 |
dulek | ltomasbo: Well, there are 2 RH folks that accepted it. :D | 08:29 |
dulek | ltomasbo: I'll ping slaweq, he'll be willing to help me. :) | 08:29 |
ltomasbo | dulek, great! thanks! | 08:29 |
ltomasbo | dulek, btw, if you are bored waiting at the airport... https://review.openstack.org/#/c/631587/ | 08:31 |
dulek | ltomasbo: Sure, in a moment. | 08:32 |
ltomasbo | dulek, no hurry! | 08:32 |
maysams | dulek: I am having the same issue. Did you try with the default driver? | 08:45 |
maysams | dulek, ltomasbo: I just tried and the problem remains | 08:46 |
ltomasbo | maysams, default as in without namespace or network policy? | 08:47 |
dulek | maysams: With the default SG driver? | 08:47 |
maysams | dulek, ltomasbo: I was trying to create a NP and it was not able to create the sg | 08:47 |
ltomasbo | maysams, I assume it will happen every time we create SGs + SG rules | 08:47 |
maysams | dulek: yes | 08:47 |
dulek | ltomasbo: +1 | 08:47 |
maysams | yup, I think so | 08:47 |
dulek | maysams, ltomasbo: So slaweq told me that nobody else is complaining. | 08:47 |
ltomasbo | maysams, ahh, then, when creating a np, the action is creating a sg + sg_rules, so it will happen too | 08:48 |
maysams | I saw that you guys thought it was only related to namespace subnet driver | 08:48 |
ltomasbo | dulek, if people are creating them manually perhaps there is no problem... | 08:48 |
dulek | And create-SG-then-rules is a pretty common pattern, so either we do it differently or everyone's broken. | 08:48 |
maysams | so, I thought it might be good to point out that this happens with the default driver as well | 08:48 |
dulek | ltomasbo: Don't we create SG and rules in the DevStack plugin as well? | 08:48 |
ltomasbo | dulek, I would be amazed if neutron folks don't have a gate creating security groups and then rules on top of them... | 08:49 |
ltomasbo | dulek, perhaps our tests are creating a few more in parallel than they do | 08:49 |
ltomasbo | dulek, because we will create a few when the kuryr-controller is started and handles all the base openshift namespaces | 08:50 |
maysams | I'll be heading to the office, see you guys later | 08:50 |
ltomasbo | dulek, so, probably other people are not triggering as many concurrent sg+sg_rules creations as we do | 08:50 |
dulek | ltomasbo: Good point. Are you able to point me to the code in namespace subnet driver that the error comes from? | 08:50 |
ltomasbo | dulek, sure! | 08:50 |
ltomasbo | one sec | 08:50 |
dulek | ltomasbo: You can probably tell that my "don't do `raise ex`" patch is due to traceback being lost on exceptions in those logs. :P | 08:51 |
ltomasbo | dulek, https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/namespace_security_groups.py#L99-L121 | 08:51 |
ltomasbo | dulek, yep, your patch will be helpful! xD | 08:52 |
*** maysams has quit IRC | 08:55 | |
dulek | Ah, and just in case, everyone: due to the issue discussed above merges will fail, so you can hold off on rechecking until we figure it out. | 09:02 |
dulek | ltomasbo: Okay, I have a hypothesis. | 09:08 |
dulek | ltomasbo: self._check_security_group(context, remote_group_id, | 09:08 |
dulek | project_id=rule['tenant_id']) | 09:08 |
*** ccamposr has joined #openstack-kuryr | 09:08 | |
dulek | ltomasbo: That is probably failing. That tenant_id is probably None as we don't specify it. | 09:09 |
*** maysams has joined #openstack-kuryr | 09:09 | |
*** maysams has joined #openstack-kuryr | 09:12 | |
ltomasbo | dulek, tenant_id? | 09:12 |
ltomasbo | dulek, wasn't that deprecated in favor of project_id? | 09:13 |
dulek | ltomasbo: Whatever, it's DB code, Neutron still names the field tenant_id internally. | 09:13 |
dulek | ltomasbo: Doesn't matter too much. But I'm pretty sure it fails because SG in DB has tenant_id and that code doesn't fill it automatically with context. | 09:14 |
dulek | ltomasbo: I'm trying to confirm the latter. | 09:14 |
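dulek's hypothesis can be illustrated with a toy model: if the rule check looks up the remote SG filtered by the rule's project, and the request never set tenant_id/project_id, the filter value is None and the lookup matches nothing, which surfaces as NotFound even though the SG exists. This is not Neutron's code; `SECURITY_GROUPS`, `check_security_group` and `create_rule` are hypothetical stand-ins used only to show the failure mode.

```python
class NotFound(Exception):
    """Hypothetical stand-in for Neutron's 'security group not found'."""


# Toy "database": SG id -> owning project (illustrative data only).
SECURITY_GROUPS = {"sg-1": "proj-a"}


def check_security_group(sg_id, project_id):
    """Toy check: the SG must exist AND belong to project_id."""
    if project_id is None or SECURITY_GROUPS.get(sg_id) != project_id:
        raise NotFound(sg_id)


def create_rule(rule):
    # If the caller never filled tenant_id, .get() yields None and the
    # ownership filter can never match, even though "sg-1" exists.
    check_security_group(rule["remote_group_id"], rule.get("tenant_id"))
    return rule
```

With `{"remote_group_id": "sg-1", "tenant_id": "proj-a"}` the toy check passes; with `{"remote_group_id": "sg-1"}` it raises `NotFound`, mirroring the error seen in the gate logs under this assumption.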
ltomasbo | dulek, I see part of the neutron ps you linked is actually moving from tenant_id to project_id | 09:15 |
dulek | ltomasbo: Uh, oh? | 09:15 |
dulek | ltomasbo: Oh crap, it does. xD | 09:16 |
dulek | ltomasbo: Good thinking! | 09:16 |
* dulek checks DB schemas. | 09:16 | |
ltomasbo | dulek, I'm wondering if the rule should have project_id too instead of tenant_id... | 09:16 |
dulek | ltomasbo: Uh, they define it as synonym in DB model. Now it's SQLAlchemy magic… | 09:18 |
ltomasbo | ufff... | 09:18 |
openstackgerrit | Merged openstack/kuryr-kubernetes master: devstack: Create LB objects only if Octavia is enabled https://review.openstack.org/632999 | 09:30 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: DNM, just testing now https://review.openstack.org/633461 | 09:31 |
dulek | Hm, a patch merged? | 09:32 |
dulek | I don't understand a thing now. :D | 09:32 |
dulek | Oh, OpenShift's non-voting? | 09:32 |
dulek | Okay, gotta board my flight! | 09:32 |
*** garyloug has joined #openstack-kuryr | 09:40 | |
*** garyloug has quit IRC | 09:41 | |
*** mrostecki has quit IRC | 10:02 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Ensure host to pod connectivity for NP https://review.openstack.org/632503 | 10:04 |
*** mrostecki has joined #openstack-kuryr | 10:08 | |
*** gcheresh has quit IRC | 11:20 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Avoid doing `raise ex` when only logging https://review.openstack.org/633034 | 11:30 |
*** aperevalov has joined #openstack-kuryr | 11:55 | |
*** pcaruana has quit IRC | 12:36 | |
*** pcaruana has joined #openstack-kuryr | 12:37 | |
*** danil has joined #openstack-kuryr | 12:43 | |
*** rh-jelabarre has joined #openstack-kuryr | 12:48 | |
*** pcaruana has quit IRC | 13:32 | |
*** gcheresh has joined #openstack-kuryr | 13:37 | |
*** pcaruana has joined #openstack-kuryr | 13:50 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Ensure lb sg rules are deleted when no longer allowed https://review.openstack.org/631587 | 14:35 |
*** zul has joined #openstack-kuryr | 15:05 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Ensure NP changes are applied to services https://review.openstack.org/629856 | 15:19 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Add gate for Octavia provider OVN https://review.openstack.org/604036 | 15:27 |
*** openstackgerrit has quit IRC | 15:51 | |
*** gkadam has quit IRC | 16:00 | |
*** openstackgerrit has joined #openstack-kuryr | 16:40 | |
openstackgerrit | Paul Belanger proposed openstack/kuryr-kubernetes master: Remove non-voting job from gate https://review.openstack.org/633551 | 16:40 |
*** pcaruana has quit IRC | 16:47 | |
*** gcheresh has quit IRC | 17:00 | |
*** dims has quit IRC | 17:08 | |
openstackgerrit | Merged openstack/kuryr-tempest-plugin master: Service cleanup should be optional https://review.openstack.org/631459 | 17:17 |
dulek | maysams: config.CONF.neutron_defaults.project | 17:27 |
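One plausible way to use the `config.CONF.neutron_defaults.project` option dulek points to is to set the project explicitly on every SG rule request instead of relying on Neutron to infer it. A minimal sketch, assuming a dict request body in the Neutron security-group-rule API shape (`security_group_id`, `direction`, `ethertype`, `remote_group_id`, `project_id`); `build_sg_rule_body` is a hypothetical helper, and in kuryr the `project_id` argument would presumably come from that config option.

```python
def build_sg_rule_body(sg_id, project_id, remote_group_id=None,
                       direction="ingress", ethertype="IPv4"):
    """Build a security-group-rule request body with an explicit project.

    project_id is passed in by the caller (e.g. read from
    config.CONF.neutron_defaults.project) so the server does not have
    to derive it from the request context.
    """
    rule = {
        "security_group_id": sg_id,
        "direction": direction,
        "ethertype": ethertype,
        "project_id": project_id,
    }
    # Only reference a remote SG when one is actually given.
    if remote_group_id is not None:
        rule["remote_group_id"] = remote_group_id
    return {"security_group_rule": rule}
```

Keeping `remote_group_id` optional avoids sending an explicit null for rules that are not scoped to another security group.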
*** maysams has quit IRC | 17:39 | |
*** ccamposr has quit IRC | 17:51 | |
*** aperevalov has quit IRC | 19:13 | |
*** spsurya has quit IRC | 19:14 | |
*** dims has joined #openstack-kuryr | 19:15 | |
*** aojea has joined #openstack-kuryr | 19:59 | |
*** yboaron has quit IRC | 20:09 | |
*** yboaron has joined #openstack-kuryr | 20:09 | |
*** aojea has quit IRC | 20:12 | |
*** yboaron has quit IRC | 20:15 | |
*** aojea has joined #openstack-kuryr | 20:20 | |
*** aojea has quit IRC | 20:20 | |
*** aojea has joined #openstack-kuryr | 20:20 | |
*** aojea has quit IRC | 22:03 | |
*** premsankar has joined #openstack-kuryr | 22:08 | |
*** aojea has joined #openstack-kuryr | 22:20 | |
*** aojea has quit IRC | 22:24 | |
*** aojea has joined #openstack-kuryr | 22:25 | |
*** aojea has quit IRC | 22:29 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!