*** spotz has quit IRC | 01:12 | |
*** hongbin has joined #openstack-kuryr | 01:41 | |
*** spsurya has joined #openstack-kuryr | 02:12 | |
*** gkadam has joined #openstack-kuryr | 02:37 | |
*** hongbin has quit IRC | 03:00 | |
*** ccamposr has joined #openstack-kuryr | 06:13 | |
*** janki has joined #openstack-kuryr | 06:17 | |
*** gcheresh has joined #openstack-kuryr | 06:25 | |
*** pcaruana has joined #openstack-kuryr | 06:36 | |
*** pcaruana has quit IRC | 06:38 | |
*** pcaruana has joined #openstack-kuryr | 06:38 | |
*** maysams has joined #openstack-kuryr | 06:52 | |
dulek | ltomasbo: I could use some of your judgment here, you were working closer to that stuff. | 07:10 |
---|---|---|
dulek | ltomasbo: So the root issue of the gate breakage is the fact that inside Amphora's amphora-haproxy netns, no default route is set anymore. | 07:11 |
dulek | ltomasbo: Previously it was set to our Neutron router 10.1.0.190. | 07:11 |
dulek | ltomasbo: So life was good, even though Amp was only connected to services subnet (and kubelet OVS bind is on pod subnet). | 07:12 |
dulek | ltomasbo: I don't yet know why that is happening. | 07:12 |
dulek | ltomasbo: Anyway my fix was to add --subnet-id to member creation, which made the API's amphora connect to the pod subnet. | 07:13 |
ltomasbo | dulek, umm | 07:13 |
dulek | ltomasbo: But now the octavia_pod_access SG, which only opens ingress from services subnet is not enough. | 07:13 |
dulek | ltomasbo: Should I just make it open to pod subnet as well? | 07:13 |
ltomasbo | dulek, I think if you add subnet-id, then you are enforcing L2 mode instead of L3 mode for amphora | 07:13 |
dulek | ltomasbo: Crap, it looked like L2 mode is enforced even without that. | 07:14 |
dulek | ltomasbo: Like no default route in amphora's internal netns routes. | 07:14 |
ltomasbo | and that was actually not properly working, so I think we actually made k8s amphora L3, even if the rest of LBs (svc lbs) are L2 | 07:14 |
dulek | ltomasbo: Hm… So what do you think we should do? | 07:16 |
dulek | ltomasbo: What's triggering L2 or L3 modes of amps? | 07:16 |
ltomasbo | dulek, adding or not the subnet-id | 07:22 |
ltomasbo | when attaching the members | 07:22 |
ltomasbo | so, in your patch, by adding the subnet id, you are enforcing it is L2 | 07:22 |
*** jistr is now known as jistr|afk | 07:22 | |
dulek | ltomasbo: OH MY. This might be an openstack-client/openstacksdk regression. | 07:23 |
dulek | The former, we don't use openstacksdk on devstack plugin. | 07:23 |
* dulek digs. | 07:23 | |
ltomasbo | ohh, that could be, yes | 07:24 |
ltomasbo | it is an easy way to check if you are using L2 or L3 | 07:25 |
ltomasbo | you can check if a port on the subnet id is created for the amphora | 07:25 |
ltomasbo | dulek, ^ | 07:25 |
dulek | ltomasbo: Yeah, yeah, but if it implicitly set --subnet-id=service-subnet, I won't see it… | 07:26 |
dulek | ltomasbo: Because Amphora is bound there by default. | 07:26 |
ltomasbo | ahh, ok | 07:27 |
dulek | Damn, nothing striking in both python-openstack and octaviaclients. | 07:27 |
dulek | ltomasbo: This seems to be bigger, as we seem to have same issue with service tests - i.e. no connectivity. So I assume amps are forced into L2 mode as well. | 07:29 |
dulek | If that's true, the L2 job should succeed on my patch. | 07:29 |
ltomasbo | you don't have that set in your local.conf, right? | 07:29 |
*** shachar has quit IRC | 07:30 | |
ltomasbo | dulek, not sure, I think we made K8s API L3 for a reason... (but I don't remember what it was) | 07:30 |
dulek | ltomasbo: Double checking, but I don't think I have. | 07:31 |
dulek | Nope. | 07:31 |
ltomasbo | dulek, if you use an old amphora, is it working? | 07:43 |
ltomasbo | dulek, I remember there were some problems (at some point) with the centos base amphora image and L2 routing, perhaps something similar broke | 07:44 |
dulek | ltomasbo: I *think* so. I'm not sure here, it might be something else not being updated on my older env. | 07:44 |
dulek | ltomasbo: It's the Ubuntu, but I get the point. | 07:44 |
*** snapiri has joined #openstack-kuryr | 07:46 | |
ltomasbo | dulek, sorry for the slow replies.... did the lbaas guys hit the same problem? | 08:10 |
dulek | ltomasbo: cgoncalves was debugging my env and pointed out that Amp is missing a default route that would direct the traffic to the router. | 08:12 |
ltomasbo | dulek, ok! | 08:16 |
*** ccamposr has quit IRC | 09:03 | |
*** ccamposr has joined #openstack-kuryr | 09:04 | |
*** celebdor has joined #openstack-kuryr | 09:07 | |
*** ltomasbo has quit IRC | 10:12 | |
*** ltomasbo has joined #openstack-kuryr | 10:39 | |
dulek | ltomasbo: Hey, so why do we even have L3 mode in the first place? | 11:00 |
dulek | ltomasbo: Octavia folks are rather surprised that this worked before (well, as always). | 11:00 |
dulek | ltomasbo: And they advise to just always specify subnet_id. | 11:00 |
ltomasbo | well, you are wasting an extra port per loadbalancer on the member subnet with L2 | 11:00 |
dulek | ltomasbo: I still don't know what change triggered the error, I'm now testing with one less neutron commit, but that's probably last resort, I don't have any more ideas. | 11:01 |
ltomasbo | plus, it was not working with some sdns at that time, like odl | 11:01 |
dulek | ltomasbo: Suuuure. But looks like we'll need to do it. Did I tell you that my patch + L2 mode enabled works fine? | 11:01 |
dulek | Ah crap. | 11:01 |
ltomasbo | if you did, I missed it (had some problems with IRC today) | 11:02 |
*** jistr|afk is now known as jistr | 11:02 | |
ltomasbo | dulek, also, to be honest... not sure if amphora L2 mode works with network policies.... | 11:02 |
ltomasbo | or better say, the other way around... | 11:02 |
dulek | ltomasbo: Wonderful. <3 | 11:03 |
ltomasbo | :/ | 11:08 |
ltomasbo | I know... | 11:09 |
ltomasbo | dulek, anyway, if that was working before and not anymore, they have a regression, right? | 11:09 |
dulek | ltomasbo: Oh come on, you must know how this works? :P | 11:10 |
ltomasbo | xD | 11:10 |
ltomasbo | I know I know... | 11:10 |
dulek | - "You were abusing some bug that's now fixed."; - "Sure, but what bug?!"; - "Dunno" | 11:10 |
ltomasbo | I hope we have ovn-octavia soooooon | 11:11 |
dulek | ltomasbo: ubuntu-minimal was updated on March 20th. This is a long shot, but maybe the infra image cache wasn't updated until the 29th and that's the issue… | 11:13 |
dulek | It definitely has new version of cloud init… | 11:13 |
dulek | ltomasbo: Let's see what happens with centos Amphora. :D | 11:13 |
ltomasbo | ok! | 11:15 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: DNM: Testing with centos amphora https://review.openstack.org/649582 | 11:19 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: DNM: Testing with centos amphora https://review.openstack.org/649582 | 11:19 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: DNM: Testing with centos amphora https://review.openstack.org/649582 | 11:22 |
*** rh-jelabarre has quit IRC | 12:06 | |
*** rh-jelabarre has joined #openstack-kuryr | 12:06 | |
*** celebdor has quit IRC | 12:06 | |
dulek | ltomasbo, dmellado: Okay, centos amp works fine on my local env. Do we switch our gates now and I'll continue to check if it's cloud-init update or something else? | 12:15 |
dulek | gcheresh: Want the workaround for the DevStack issue? | 12:16 |
dulek | (finally) | 12:16 |
gcheresh | dulek: of course | 12:16 |
dulek | gcheresh: Just set those in local.conf while making sure to do `rm -rf /opt/stack/octavia/diskimage-create/amphora*` | 12:17 |
dulek | OCTAVIA_AMP_BASE_OS=centos | 12:17 |
dulek | OCTAVIA_AMP_DISTRIBUTION_RELEASE_ID=7 | 12:17 |
dulek | OCTAVIA_AMP_IMAGE_SIZE=3 | 12:17 |
dulek | gcheresh: Please note that you need to disable using downloaded amphora, so OCTAVIA_AMP_IMAGE_FILE must be unset. | 12:18 |
gcheresh | dulek: and restart controller after the change? | 12:18 |
dulek | gcheresh: More like restack whole DevStack. | 12:18 |
ltomasbo | dulek, \o/ | 12:18 |
dulek | gcheresh: Or build the CentOS image yourself and reconfigure Octavia, but I don't know how to do that. | 12:18 |
dulek | gcheresh: I mean it's certainly possible, but I wanted DevStack to do that for me. :P | 12:19 |
gcheresh | dulek: ok, will give the first option a try | 12:19 |
dulek | gcheresh: Also note that centos image in https://tarballs.openstack.org/octavia/test-images/ is broken. | 12:19 |
dulek | gcheresh: So you need to make sure your devstack builds a new one from latest octavia code (fix got in around 8 AM). | 12:19 |
dulek | ltomasbo: Question is - what do we do? :P | 12:20 |
ltomasbo | let's move the default to centos (at least until ubuntu image is fixed) | 12:22 |
ltomasbo | I think we used to have it on centos until it broke and we moved it to ubuntu | 12:22 |
dulek | ltomasbo: Hah, nice. :D | 12:24 |
ltomasbo | :) | 12:24 |
dulek | ltomasbo: Okay, so let's see if my commit will work. Current centos tarball is broken, so it won't work with predownloaded amphora until tomorrow morning. | 12:24 |
ltomasbo | dulek, great! thanks for unlocking the gates!!! | 12:26 |
*** janki has quit IRC | 12:38 | |
*** gaoyan has joined #openstack-kuryr | 13:00 | |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Switch to CentOS Amphora https://review.openstack.org/649582 | 13:29 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Restore using infra build of amphora https://review.openstack.org/649614 | 13:29 |
dulek | ltomasbo, dmellado: Okay, my test commit worked in the gate so here is merge-worthy version of it. ^ | 13:30 |
dulek | The second patch will most likely fail and is to be merged tomorrow, when the nightly build includes the fix by cgoncalves. | 13:31 |
*** shachar has joined #openstack-kuryr | 13:53 | |
*** oanson has quit IRC | 13:55 | |
*** snapiri has quit IRC | 13:56 | |
openstackgerrit | Maysa de Macedo Souza proposed openstack/kuryr-kubernetes master: Fix LBaaS SG rules update https://review.openstack.org/649636 | 14:44 |
openstackgerrit | Maysa de Macedo Souza proposed openstack/kuryr-kubernetes master: Fix LBaaS SG rules update https://review.openstack.org/649636 | 14:45 |
*** celebdor has joined #openstack-kuryr | 14:52 | |
*** gcheresh has quit IRC | 14:57 | |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Restore using infra build of amphora https://review.openstack.org/649614 | 15:06 |
*** celebdor has quit IRC | 15:06 | |
*** ccamposr has quit IRC | 15:17 | |
*** gcheresh has joined #openstack-kuryr | 15:24 | |
*** gaoyan has quit IRC | 15:25 | |
dmellado | hi dulek, the feverish me is slightly better | 15:39 |
dmellado | so what happened, did switching to centos-amp sort anything out? | 15:39 |
dulek | dmellado: Yes, it seems to be working. | 15:42 |
dulek | dmellado: So +2 here is advisable: https://review.openstack.org/#/c/649582/ :) | 15:42 |
*** gcheresh has quit IRC | 15:44 | |
openstackgerrit | Maysa de Macedo Souza proposed openstack/kuryr-kubernetes master: Add support for text ports on Network Policy Spec https://review.openstack.org/648905 | 15:46 |
dmellado | dulek: cloud-init? damn... | 15:46 |
dulek | dmellado: That's the only thing that changed between releases and matches. | 15:48 |
dulek | Maybe libdns, but I doubt it. | 15:48 |
dmellado | in any case I wonder how that could affect it, is there any route that is not getting to the amphora? | 15:48 |
dulek | Either way - there's some regression. My only problem is that until we know exactly what it is, we can't ever guarantee it'll get fixed. | 15:48 |
dulek | And CentOS might get updated package one day too. ;) | 15:49 |
dmellado | I'll wait for CI in order to get merged but yeah | 15:49 |
dmellado | what did the octavia folks say? | 15:49 |
dulek | dmellado: Yeah, exactly. I linked that a while ago - new Ubuntu Amphora doesn't get the default route in amphora-haproxy netns. | 15:49 |
dulek | <ltomasbo> dulek, anyway, if that was working before and not anymore, they have a regression, right? | 15:50 |
dulek | <dulek> ltomasbo: Oh come on, you must know how this works? :P | 15:50 |
dulek | <ltomasbo> xD | 15:50 |
dulek | <ltomasbo> I know I know... | 15:50 |
dulek | <dulek> - "You were abusing some bug that's now fixed."; - "Sure, but what bug?!"; - "Dunno" | 15:50 |
dmellado | will they take care of this issue on their side | 15:51 |
dulek | dmellado: I strongly doubt it, they assume everything is working correctly now. | 15:58 |
dulek | And the fact that we had that default route was just by chance. | 15:58 |
dmellado | 'by chance' | 15:58 |
* dmellado sighs | 15:58 | |
dmellado | this is getting me sick again | 15:58 |
dmellado | dulek: in any case please open a bug on octavia if you have the details there | 15:58 |
dmellado | and let me try handling this with carlos | 15:59 |
dulek | dmellado: Well, we can sigh or we can do our job. ;) If we pinpoint the root cause, it's quite easy to convince whoever's fault it is to fix the regression. | 15:59 |
dulek | At this point of my investigation it isn't Octavia's fault, it's something on lower layer. | 15:59 |
dmellado | if it's related to cloud-init | 16:02 |
dmellado | we can fetch larsks | 16:02 |
dmellado | do you know him? | 16:02 |
*** gkadam has quit IRC | 16:42 | |
*** rh-jelabarre has quit IRC | 16:44 | |
*** spsurya has quit IRC | 16:46 | |
*** rh-jelabarre has joined #openstack-kuryr | 16:49 | |
*** spsurya has joined #openstack-kuryr | 17:27 | |
*** gcheresh has joined #openstack-kuryr | 20:37 | |
*** spsurya has quit IRC | 20:59 | |
*** gcheresh has quit IRC | 21:04 | |
*** pcaruana has quit IRC | 21:06 |
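
For reference, the DevStack workaround dulek gives gcheresh between 12:17 and 12:19 can be collected into one place. This is only a sketch of what was said in the channel: the three variable values and the `rm -rf` path are quoted from the log, while placing the variables in the `[[local|localrc]]` section of local.conf is an assumption, since the log does not say which section they belong in.

```shell
# Remove any previously built amphora images so DevStack builds a fresh
# CentOS one from the latest Octavia code (path as quoted in the log).
rm -rf /opt/stack/octavia/diskimage-create/amphora*

# Settings to add to local.conf. Assumption: they go in the
# [[local|localrc]] section; the log only says "set those in local.conf".
OCTAVIA_AMP_BASE_OS=centos
OCTAVIA_AMP_DISTRIBUTION_RELEASE_ID=7
OCTAVIA_AMP_IMAGE_SIZE=3

# OCTAVIA_AMP_IMAGE_FILE must stay unset, otherwise a pre-downloaded
# amphora image is used instead of a locally built one.
```

As noted in the log, the change takes effect only after restacking the whole DevStack, not after restarting the controller alone.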