*** janonymous has quit IRC | 00:26 | |
*** premsankar has quit IRC | 00:58 | |
*** chenyb4 has joined #openstack-kuryr | 01:14 | |
*** salv-orl_ has joined #openstack-kuryr | 01:59 | |
*** salv-orlando has quit IRC | 02:02 | |
*** maysamacedos has quit IRC | 02:03 | |
*** janki has joined #openstack-kuryr | 02:07 | |
*** hongbin_ has joined #openstack-kuryr | 02:10 | |
*** jchhatbar has joined #openstack-kuryr | 02:10 | |
*** salv-orl_ has quit IRC | 02:11 | |
*** janki has quit IRC | 02:12 | |
*** salv-orlando has joined #openstack-kuryr | 02:13 | |
*** salv-orlando has quit IRC | 02:20 | |
*** salv-orlando has joined #openstack-kuryr | 02:22 | |
*** kiennt2609 has joined #openstack-kuryr | 02:33 | |
*** kiennt2609 has quit IRC | 02:34 | |
*** kiennt2637 has joined #openstack-kuryr | 02:34 | |
*** kiennt2637 has quit IRC | 02:35 | |
*** kiennt2609 has joined #openstack-kuryr | 02:35 | |
*** caowei has joined #openstack-kuryr | 02:53 | |
*** kiennt2609 has quit IRC | 03:10 | |
*** kiennt2609 has joined #openstack-kuryr | 04:02 | |
*** kiennt2609 has quit IRC | 04:03 | |
*** hongbin_ has quit IRC | 04:11 | |
*** gcheresh has joined #openstack-kuryr | 04:18 | |
*** jchhatba_ has joined #openstack-kuryr | 04:18 | |
*** jchhatba_ has quit IRC | 04:19 | |
*** jchhatba_ has joined #openstack-kuryr | 04:19 | |
*** jchhatbar has quit IRC | 04:21 | |
*** premsankar has joined #openstack-kuryr | 04:21 | |
*** jchhatbar has joined #openstack-kuryr | 04:50 | |
*** gcheresh has quit IRC | 04:51 | |
*** jchhatba_ has quit IRC | 04:53 | |
*** janonymous has joined #openstack-kuryr | 05:04 | |
*** gcheresh has joined #openstack-kuryr | 05:42 | |
*** pcaruana has joined #openstack-kuryr | 06:21 | |
*** premsankar has quit IRC | 06:49 | |
*** dims has quit IRC | 06:54 | |
*** dims has joined #openstack-kuryr | 06:56 | |
*** dims has quit IRC | 07:01 | |
*** dims has joined #openstack-kuryr | 07:02 | |
*** celebdor1 has joined #openstack-kuryr | 07:21 | |
*** celebdor1 is now known as apuimedo | 07:21 | |
apuimedo | morning | 07:21 |
---|---|---|
*** salv-orlando has quit IRC | 07:23 | |
*** salv-orlando has joined #openstack-kuryr | 07:24 | |
*** kiennt2609 has joined #openstack-kuryr | 07:26 | |
dulek | o/ | 07:26 |
*** salv-orlando has quit IRC | 07:28 | |
apuimedo | dulek: did you see I fixed https://review.openstack.org/#/c/562067/1 ? | 07:29 |
*** pmannidi has quit IRC | 07:29 | |
apuimedo | sorry https://review.openstack.org/#/c/562067/2 | 07:29 |
dulek | !!! | 07:29 |
openstack | dulek: Error: "!!" is not a valid command. | 07:29 |
dulek | I haven't noticed. Wow. | 07:30 |
dulek | Wait, no. | 07:30 |
dulek | apuimedo: We need to rebase the test disabling skip decorator on top of that to test. | 07:30 |
dulek | Otherwise without Service test we cannot prove anything. | 07:30 |
*** dmellado has joined #openstack-kuryr | 07:32 | |
*** salv-orlando has joined #openstack-kuryr | 07:37 | |
apuimedo | dulek: I meant that I fixed the fake router thing xD | 07:39 |
apuimedo | from the patch | 07:39 |
apuimedo | dulek: but yeah, let's rebase the skip test | 07:39 |
dulek | apuimedo: You can do that from the UI. :) | 07:40 |
apuimedo | dulek: I'll just add a depends-on since it's a different repo | 07:41 |
dulek | Oh, right. | 07:42 |
openstackgerrit | Antoni Segura Puimedon proposed openstack/kuryr-tempest-plugin master: Revert "Skip service test" https://review.openstack.org/561364 | 07:42 |
*** pcaruana has quit IRC | 07:45 | |
*** pcaruana has joined #openstack-kuryr | 07:46 | |
*** salv-orlando has quit IRC | 08:00 | |
*** salv-orlando has joined #openstack-kuryr | 08:01 | |
*** salv-orlando has quit IRC | 08:05 | |
*** garyloug has joined #openstack-kuryr | 08:17 | |
apuimedo | ltomasbo: do you have details on this l2 failure? | 08:29 |
ltomasbo | I'm checking | 08:29 |
ltomasbo | apuimedo, I think we need to create (and use) the security group you set for the kubelet port for the octavia l2 mode | 08:30 |
ltomasbo | I'm testing that | 08:30 |
ltomasbo | apuimedo, if it works, I'll let you know | 08:31 |
ltomasbo | dmellado, dulek, apuimedo: and we should test both ovs-firewall and octavia l2 and l3 on gates | 08:31 |
dmellado | yeah | 08:32 |
ltomasbo | otherwise we will break them all the time (as usual) :D | 08:32 |
dmellado | ltomasbo: could you add that to the enhance gates bp? | 08:32 |
dulek | Aaaah, more deployment options ?! | 08:32 |
dmellado | dulek: yeah xD | 08:32 |
dmellado | + containerized / non containerized | 08:32 |
dmellado | apuimedo: what did you say you fixed? | 08:33 |
apuimedo | dmellado: world hunger | 08:33 |
ltomasbo | dulek, perhaps we should just move to ovs-firewall... | 08:33 |
ltomasbo | dulek, nested does not work with ovs-hybrid anyway, and it will be safe to ensure security groups are the right ones | 08:33 |
dmellado | apuimedo: with fuet? | 08:34 |
apuimedo | dmellado: I'd rather it be with llangonissa | 08:35 |
dulek | ltomasbo: I'm not against it. | 08:35 |
apuimedo | ltomasbo: I tend to agree | 08:36 |
dulek | But we need to start thinking about what's Kuryr issue and what's Kuryr DevStack plugin issue. | 08:36 |
apuimedo | dulek: this is devstack | 08:36 |
apuimedo | clearly | 08:36 |
dmellado | dulek: also, we'd need to provide some healthchecks | 08:36 |
dmellado | in devstack | 08:36 |
dmellado | some timeout and check that the containers are ready | 08:36 |
dulek | This one - yes. SGs missing for LBaaS v2 Services was Kuryr's. | 08:36 |
dulek | Because putting too much effort in testing DevStack plugin isn't worth it IMO. | 08:36 |
ltomasbo | dulek, agree, but it is better to have sg enforcement so that we know that they are needed and where. That will also help when deploying with other SDNs/Tools | 08:40 |
*** salv-orlando has joined #openstack-kuryr | 08:58 | |
dulek | ltomasbo: Agreed (sorry, missed this message). | 09:02 |
ltomasbo | xD | 09:02 |
ltomasbo | apuimedo, it (half) worked | 09:02 |
ltomasbo | apuimedo, the cni is now able to connect, but the loadbalancer is actually not working... probably missing some extra SGs | 09:03 |
apuimedo | ltomasbo: do you need help? | 09:03 |
ltomasbo | I'm in a call, but we can share the tmux if you want to dig into it | 09:03 |
apuimedo | ltomasbo: give me the details and I can look into it while you're on the call | 09:04 |
apuimedo | https://gist.github.com/celebdor/77f1130eb8763078a2c997a2ebf91494 | 09:05 |
ltomasbo | apuimedo, stack@38.145.33.129 | 09:06 |
ltomasbo | apuimedo, I have a tmux session there | 09:07 |
ltomasbo | and I applied some changes on top of your patch | 09:07 |
* dmellado sighs | 09:07 | |
dmellado | the issues that we're facing on the octavia gate seem to be related to upstream infra | 09:07 |
dmellado | I just created a dsvm ubuntu based with the qcow amphora and I don't see any issue | 09:08 |
dmellado | but it just doesn't make sense | 09:08 |
apuimedo | ltomasbo: ODL?! | 09:08 |
ltomasbo | apuimedo, it is ml2/ovs... don't worry | 09:08 |
dmellado | ltomasbo: shhh just tell him it's odl so he becomes crazy | 09:09 |
ltomasbo | that is because I used it for odl at some point | 09:09 |
dmellado | xD | 09:09 |
apuimedo | ltomasbo: you scared me | 09:09 |
apuimedo | xD | 09:09 |
ltomasbo | xD | 09:09 |
dmellado | folks any kind of idea of what could be using that ip on the upstream gates? | 09:10 |
dmellado | http://logs.openstack.org/64/561364/2/check/kuryr-kubernetes-tempest-octavia/f028a60/controller/logs/screen-o-api.txt.gz#_Apr_19_08_25_22_230839 | 09:11 |
apuimedo | dmellado: didn't you request access to one of those VMs? | 09:11 |
dmellado | apuimedo: still waiting for it | 09:11 |
dmellado | it just plain works on ubuntu from our side | 09:11 |
dmellado | even on rdo cloud | 09:12 |
dmellado | and that's a lot to say | 09:12 |
dulek | dmellado: If we're debugging this let's not speculate and go straight to frickler to freeze a VM. | 09:17 |
dulek | dmellado: Infra thinks that this is still better than rechecking stuff like crazy. | 09:17 |
dmellado | dulek: I'm rechecking in order to actually GET a hold on that vm | 09:17 |
dmellado | before nodepool drops it | 09:18 |
dmellado | thus the recheck | 09:18 |
dmellado | so don't complain :P | 09:18 |
dmellado | once the results are spit I've been just told they can't do it | 09:18 |
dulek | Okay! | 09:20 |
dulek | dmellado: And no offense, I was just proposing a solution that worked last time. ;) | 09:21 |
apuimedo | ltomasbo: you've been touching the SGs right and left, huh? | 09:21 |
ltomasbo | apuimedo, I finished with the call | 09:21 |
ltomasbo | apuimedo, only one! I added octavia one to the kubelet | 09:21 |
ltomasbo | other than that, just the devstack modifications | 09:21 |
dmellado | dulek: I was thinking that it could be that somehow the ip that it tries to allocate to the 2nd amphora is used by something in infra | 09:22 |
dmellado | or whatever | 09:22 |
dmellado | let's try to check once we get access to the vm | 09:22 |
apuimedo | ltomasbo: :P | 09:22 |
ltomasbo | apuimedo, I only added the df866 one | 09:22 |
ltomasbo | apuimedo, ca503 is added automatically | 09:22 |
apuimedo | ltomasbo: you created and added df866fe7-904a-4340-aa59-3c9047562dee | 09:23 |
ltomasbo | apuimedo, and d7de with the modification to your patch, without that, kuryr-cni is not able to connect to the API | 09:23 |
apuimedo | right? | 09:23 |
ltomasbo | apuimedo, that is created (and needed) for the l2 mode to work | 09:23 |
ltomasbo | so, it is created by devstack/plugin.sh | 09:23 |
apuimedo | ltomasbo: the api lb is missing a port on the pod subnet! | 09:29 |
ltomasbo | apuimedo, really? | 09:29 |
apuimedo | ltomasbo: yup | 09:32 |
apuimedo | I'm fixing it now | 09:32 |
dulek | HA, I've plumbed the K8s API LB for DevStack with OpenShift. \o/ | 09:32 |
dulek | openshift-master was binding to HOST_IP only. :) | 09:33 |
apuimedo | :P | 09:33 |
dulek | Now how do I fix that in DevStack plugin… | 09:33 |
dmellado | and now things break for yolanda and she can't gate the node | 09:33 |
* dmellado sighs | 09:33 | |
* dmellado double sighs | 09:33 | |
dmellado | why everything just breaks all the time | 09:34 |
apuimedo | ltomasbo: do you see now that the LB can access the api? | 09:34 |
apuimedo | now we need to see why I can't curl the VIP | 09:34 |
dmellado | apuimedo: but *why* does it work on our local installations? | 09:34 |
dmellado | it makes no sense | 09:34 |
apuimedo | dmellado: I'm not reading your thread. I'm talking to ltomasbo | 09:35 |
apuimedo | dmellado: what are you talking about? | 09:35 |
dmellado | LOL | 09:35 |
dmellado | nvm I'll just go and deal with my pain alone xD | 09:35 |
dmellado | thought you were tackling the issue on the ip address already allocated | 09:36 |
apuimedo | dmellado: now that I found one of the issues with ltomasbo env | 09:36 |
apuimedo | I have a thread open | 09:37 |
apuimedo | dmellado: that also | 09:37 |
apuimedo | wait, the insurance people are here | 09:37 |
apuimedo | I'll be back | 09:37 |
dmellado | insurance? | 09:37 |
apuimedo | dmellado: broken glass door in the house entrance | 09:42 |
apuimedo | ltomasbo: I summon thee | 09:42 |
*** salv-orl_ has joined #openstack-kuryr | 09:43 | |
ltomasbo | apuimedo, ?? | 09:45 |
ltomasbo | apuimedo, tell me! | 09:45 |
*** salv-orlando has quit IRC | 09:47 | |
apuimedo | ltomasbo: hey | 09:48 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Gate with containerized deployment and OpenShift https://review.openstack.org/557313 | 09:48 |
apuimedo | ltomasbo: do you see those chksum errors? | 09:48 |
dulek | Okay, hopefully that's it. ^ | 09:48 |
ltomasbo | apuimedo, yes | 10:02 |
apuimedo | ltomasbo: any idea? | 10:02 |
ltomasbo | yes | 10:03 |
ltomasbo | I think this got fixed on ubuntu (when we hit it last time) | 10:03 |
ltomasbo | and this is a centos amphora, I bet nobody has tested it with l2 | 10:03 |
ltomasbo | and it is replying from the wrong eth | 10:04 |
ltomasbo | I remember we had to set some kernel flags to fix it | 10:04 |
apuimedo | ltomasbo: can you find out? | 10:04 |
ltomasbo | apuimedo, also, that was working before | 10:05 |
ltomasbo | why it is not working now? | 10:05 |
ltomasbo | did you disable something? | 10:05 |
apuimedo | ltomasbo: what was working? | 10:07 |
ltomasbo | the lb api | 10:08 |
ltomasbo | apuimedo, ^^ | 10:08 |
apuimedo | ltomasbo: nah... it wasn't | 10:08 |
apuimedo | when I sshed into your machine it was even missing an interface | 10:08 |
ltomasbo | apuimedo, did you add the missing interface to what? api lbaas? demo lbaas? or both? | 10:09 |
apuimedo | ltomasbo: I'm only touching api lbaas | 10:10 |
ltomasbo | so, that was actually working (at least the kuryr-cni was able to reach it) | 10:10 |
ltomasbo | my problem was with the default/demo lbaas | 10:11 |
ltomasbo | apuimedo, and the problem is that it reaches the amphora through the wrong nic | 10:12 |
apuimedo | ltomasbo: I tried to curl the API from the host namespace and it was getting EOF | 10:14 |
ltomasbo | that I tested before, and it was working | 10:16 |
ltomasbo | let me restack and start from clean deployment | 10:17 |
*** maysamacedos has joined #openstack-kuryr | 10:23 | |
*** kiennt2609 has quit IRC | 10:24 | |
dmellado | dulek: apuimedo ltomasbo | 10:32 |
dmellado | ready for upstream gate debugging? | 10:32 |
dulek | Ah! | 10:34 |
dulek | "kuryr-kubernetes-tempest-daemon-containerized-openshift-lbaasv2 success (non-voting)" \o/ | 10:34 |
dmellado | heh | 10:35 |
dulek | https://github.com/dulek.keys | 10:35 |
dmellado | dulek: good padawan xD | 10:35 |
dulek | https://github.com/dulek.keys | 10:35 |
dulek | Okay, alias works. :P | 10:35 |
apuimedo | dmellado: more or less | 10:35 |
apuimedo | https://gist.github.com/celebdor/77f1130eb8763078a2c997a2ebf91494 | 10:36 |
dulek | irenab: Can you look again on https://review.openstack.org/#/c/556777 ? I've answered your comment there. | 10:36 |
apuimedo | dulek: it's a public holiday | 10:37 |
dulek | Ah, okay. | 10:37 |
apuimedo | 70th anniversary of the state of Israel IIRC | 10:37 |
dmellado | it'll be all week | 10:37 |
dmellado | yeah | 10:37 |
dulek | Poland's going to have 100th this year. :) | 10:37 |
dmellado | ssh root(at)104.239.135.58 | 10:38 |
dmellado | then su - stack | 10:38 |
dmellado | dulek: apuimedo | 10:38 |
dmellado | ltomasbo and I will be heading for lunch in 15' or so | 10:38 |
dmellado | so maybe you can check you can login | 10:38 |
dmellado | deploy a tmux | 10:38 |
dulek | dmellado: I've logged in. | 10:38 |
dmellado | and we can go after we come back | 10:38 |
dmellado | dulek: awesome | 10:38 |
*** caowei has quit IRC | 10:38 | |
dmellado | I'll install vim and tmux | 10:38 |
dmellado | xD | 10:38 |
dmellado | otherwise we won't be able to work | 10:39 |
* dulek starts preparing lunch then. | 10:39 | |
dmellado | dulek: tmux a -t gate | 10:39 |
dmellado | I've created a tmux session named like that | 10:39 |
dmellado | under 'stack' user | 10:40 |
ltomasbo | dmellado, https://github.com/luis5tb.keys | 10:40 |
dmellado | ltomasbo: added | 10:41 |
ltomasbo | dmellado, I cannot login... | 10:43 |
ltomasbo | apuimedo, btw, I redeployed the stack, and it is getting access to the 10.0.0.129:443 | 10:46 |
apuimedo | let me take a look | 10:47 |
apuimedo | dmellado: is this before or after running tempest? | 10:51 |
dmellado | apuimedo: after | 10:52 |
dmellado | we can trigger the test and pause it if needed | 10:52 |
apuimedo | dmellado: not necessary | 10:52 |
dmellado | so it tries to spin up an amphora with a non-valid ip which is taken by the service? | 10:52 |
*** chenyb4 has quit IRC | 10:54 | |
apuimedo | where the fuck is the 155 port? | 10:55 |
apuimedo | creation request in the API? | 10:55 |
dmellado | maybe we can check neutron? | 10:55 |
*** maysamacedos has quit IRC | 10:56 | |
dmellado | we go for lunch, brb | 10:58 |
apuimedo | ltomasbo: don't you see that in your deployment the api amphora works but is in L3 mode | 11:02 |
apuimedo | it should be in l2 mode, i.e., have an attachment to the k8s-pod-net | 11:02 |
*** maysamacedos has joined #openstack-kuryr | 11:18 | |
*** gcheresh has quit IRC | 11:22 | |
*** atoth has joined #openstack-kuryr | 11:50 | |
*** rh-jelabarre has joined #openstack-kuryr | 11:58 | |
dulek | ltomasbo, apuimedo: I'm debating ltomasbo remark on https://review.openstack.org/#/c/562366 . | 12:03 |
dulek | Basically - which Kuryr version supports which OpenShift? | 12:03 |
*** yamamoto_ has quit IRC | 12:21 | |
dmellado | apuimedo: dulek back, | 12:22 |
dmellado | any discovery? | 12:22 |
ltomasbo | apuimedo, should it? | 12:22 |
ltomasbo | for the kubelet? perhaps that is because we configure the kubelet port ourselves... | 12:23 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Raise OpenShift version to 3.9.0 https://review.openstack.org/562366 | 12:23 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Gate with containerized deployment and OpenShift https://review.openstack.org/557313 | 12:23 |
dulek | dmellado: Haven't been looking there, was eating and waiting for you. | 12:26 |
dulek | But meanwhile, why don't we start mergefest? | 12:27 |
*** yamamoto has joined #openstack-kuryr | 12:27 | |
dmellado | let's go for mergefest while we have the meeting | 12:27 |
dulek | dmellado: https://review.openstack.org/#/c/556777 - this only needs second +2 and I've answered irenab comment about making `mkdir -p` conditional. | 12:28 |
dmellado | in! | 12:29 |
dulek | Whooo, it's going! | 12:30 |
*** chenyb4 has joined #openstack-kuryr | 12:47 | |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Raise OpenShift version to 3.9.0 https://review.openstack.org/562366 | 13:04 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Gate with containerized deployment and OpenShift https://review.openstack.org/557313 | 13:04 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Raise OpenShift version to 3.9.0 https://review.openstack.org/562366 | 13:05 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Gate with containerized deployment and OpenShift https://review.openstack.org/557313 | 13:05 |
ltomasbo | celebdor, I think I found out the reason for the api lb not having the subnet port | 13:08 |
dmellado | dulek: ltomasbo around and seeing my tmux? | 13:11 |
*** chenyb4 has quit IRC | 13:12 | |
dulek | dmellado: Not really. | 13:12 |
*** jistr is now known as jistr|mtg | 13:14 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Create CNI bin dir in OpenShift DevStack plugin https://review.openstack.org/556777 | 13:15 |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Add `privileged` SCC to SA in OpenShift DevStack https://review.openstack.org/556959 | 13:21 |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Add HTTPS support to K8s API healthchecks https://review.openstack.org/556960 | 13:22 |
*** salv-orl_ has quit IRC | 13:27 | |
dmellado | dulek: could you take over? | 13:32 |
dmellado | I've been running the test as otherwise the resources are just destroyed | 13:32 |
dmellado | and it seems that we cannot get to the service ip:port | 13:32 |
dmellado | we were thinking it might be related to it needing a fip, so we added that to the port | 13:33 |
dmellado | but still stuck | 13:33 |
dulek | dmellado: Sorry, I don't get it. I've tried doing `tmux a` and `tmux a -t gate` but there's no session. | 13:33 |
dmellado | dulek: can't you get in? | 13:33 |
dmellado | ssh into there | 13:33 |
dmellado | then su - stack | 13:33 |
dmellado | then tmux a -t gate | 13:33 |
dulek | I'm on the VM. | 13:33 |
dmellado | then I'll show you | 13:33 |
dulek | dmellado: | 13:34 |
dulek | root@ubuntu-xenial-rax-dfw-0003608416:~# tmux a -t gate | 13:34 |
dulek | no sessions | 13:34 |
dmellado | dulek: heh | 13:34 |
dmellado | read ^^ | 13:34 |
dmellado | su - stack | 13:34 |
dmellado | then tmux a -t gate | 13:34 |
dulek | Ah. | 13:34 |
dmellado | let me know when you're in | 13:35 |
dulek | dmellado: I'm in. So what are you checking there. Don't we just need to do `kubectl expose` and see why Octavia crashes? | 13:35 |
dmellado | what I'm doing is running the test | 13:35 |
*** gcheresh has joined #openstack-kuryr | 13:35 | |
dmellado | and seeing the pods and after that the service | 13:36 |
dmellado | i.e. | 13:36 |
dulek | dmellado: watch kubectl get pods -o wide. :P | 13:37 |
dulek | dmellado: That's better. | 13:37 |
dmellado | I thought it was going to be quicker | 13:37 |
dmellado | that's why I thought I'd wait xD | 13:37 |
dulek | dmellado: Heh, this test creates a pod and waits for it before creating next one? | 13:38 |
dulek | dmellado: That can be improved. :P | 13:38 |
dmellado | yeah, totally | 13:38 |
celebdor | dmellado: found anything? | 13:38 |
dmellado | it even uses a for | 13:38 |
dmellado | see that now we got that kuryr-service there at 80 port? | 13:38 |
dulek | Okay, we have a Service. Fine. | 13:38 |
dmellado | so let's check tempest logs | 13:38 |
dulek | dmellado: Can you reach it from the host? | 13:39 |
dulek | dmellado: I've seen that, nothing to see here. We need to figure out why Octavia explodes. | 13:39 |
dulek | dmellado: Take a look on Octavia API log. | 13:39 |
dulek | dmellado: Hey, that's not too bad. Though what's with the restart of the log? | 13:40 |
dmellado | IIRC it'll time out eventually as we can't reach it from the host at all | 13:41 |
dulek | dmellado: Yes, yes, it will. | 13:41 |
dmellado | huh, I lost connectivity to the infra vm | 13:42 |
dmellado | are you still around in the tmux | 13:42 |
dulek | dmellado: Why don't we kubectl run a single pod, expose it and start trying to reach the service? | 13:42 |
dulek | dmellado: I'm in. | 13:42 |
dmellado | ok I can't type there now | 13:42 |
dmellado | weird | 13:42 |
*** gcheresh has quit IRC | 13:42 | |
dmellado | could you try that? | 13:43 |
dmellado | ltomasbo: and we will go into a looong meeting now | 13:43 |
dmellado | and I'll sync with you after we're done | 13:43 |
dulek | dmellado: Yeah, once it gets better connectivity, I have lag as well. | 13:43 |
dmellado | dulek: see? we just hit that issue | 13:44 |
dulek | dmellado: Here we see the Octavia failure, right? | 13:44 |
dmellado | see the 10.1.0. blah already allocated | 13:44 |
dulek | Reproducing it is easy, hard part is why it's happening. | 13:44 |
dulek | WHY CAN'T I TYPE?! xD | 13:45 |
dmellado | you can't either? xD xD xD | 13:45 |
dulek | Hm, waaaait… | 13:45 |
dmellado | dafuq!!!xD | 13:45 |
dmellado | just slow? | 13:45 |
dulek | dmellado: It's tmux fault, we've probably hit something blocking input. | 13:46 |
*** hongbin_ has joined #openstack-kuryr | 13:46 | |
*** jistr|mtg is now known as jistr | 14:10 | |
dmellado | dulek: tmux? | 14:12 |
dulek | dmellado: I'm not on the tmux, investigating from outside as that works. :P | 14:13 |
dmellado | LOOOOL | 14:13 |
dmellado | xD | 14:13 |
*** janonymous has quit IRC | 14:13 | |
dulek | ? | 14:13 |
*** kiennt26_ has joined #openstack-kuryr | 14:16 | |
dmellado | dreaded tmux | 14:16 |
apuimedo | dmellado: dulek: I can type just fine in tmux | 14:17 |
dulek | apuimedo: Maybe it fixed itself. | 14:17 |
apuimedo | maybe | 14:17 |
dulek | Okay, what I see from Octavia code and logs is that it sees that LB has an IP. | 14:17 |
dulek | And tries to allocate a port with that IP. | 14:18 |
dulek | But that port already exists and for some reason Octavia's unable to notice that. | 14:18 |
dulek | According to code for some reason vip.ip_address gets saved but not vip.port_id. | 14:19 |
dulek | Though I'm unable to find first POST for that port. | 14:19 |
apuimedo | which code? | 14:19 |
dmellado | genadi's code | 14:19 |
*** kiennt26 has joined #openstack-kuryr | 14:20 | |
apuimedo | oh | 14:20 |
apuimedo | any idea why it only happens at infra | 14:20 |
dulek | Genadi's? | 14:21 |
dulek | No, Octavia code. | 14:21 |
dulek | Okay, I have first post in Neutron logs… It happens *5 minutes* before the error. Let's see who did that. | 14:22 |
dmellado | dulek: oh, I thought you meant gena's code on the test | 14:22 |
dulek | Okay, seems like we did that from DevStack. Let me make sure. | 14:23 |
dulek | Uhm… Interesting, it's not DevStack? | 14:24 |
dulek | Okay, so it's kuryr-kubernetes who ordered the first conflicting LB… | 14:27 |
dmellado | dulek: so it was our fault? what did you find out? | 14:28 |
dulek | Nothing yet, still looking. | 14:28 |
dulek | (I'm reading the logs on the gate, not on the VM, feel free to investigate VM on your own) | 14:29 |
dulek | apuimedo, dmellado: Okay, here's what happens IMO: | 14:33 |
dulek | 1. Kuryr tries to provision an LB. | 14:33 |
dulek | 2. After a while LB is still not… ACTIVE or whatever, so Kuryr retries. | 14:33 |
dulek | 3. Octavia retries port creation and fails. | 14:33 |
dulek | That's why we see those logs. | 14:33 |
dulek | They're most likely not related to the fact that Octavia's not passing traffic. | 14:34 |
dulek | Anyone tried looking what's happening on Amphorae VM? | 14:34 |
dulek | I'd bet tarball image is malformed or something. | 14:34 |
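The sequence dulek describes in steps 1 to 3 can be sketched with a small in-memory simulation. Everything here (`FakeNetwork`, `FakeLBService`, `provision_with_naive_retry`, the address `192.0.2.10`) is hypothetical and only mimics the behaviour inferred from the logs; it is not Octavia or Kuryr code.

```python
class FakeNetwork:
    """Hypothetical stand-in for a service that allocates one port per IP."""

    def __init__(self):
        self.ports = {}  # ip -> port id

    def create_port(self, ip):
        if ip in self.ports:
            # Mirrors the "already allocated" failure seen in the gate logs.
            raise RuntimeError(f"IP {ip} already allocated")
        port_id = f"port-{len(self.ports) + 1}"
        self.ports[ip] = port_id
        return port_id


class FakeLBService:
    """Hypothetical load balancer API: every create request allocates the VIP port."""

    def __init__(self, network):
        self.network = network
        self.lbs = {}  # name -> provisioning status

    def create_lb(self, name, vip):
        self.network.create_port(vip)
        # The load balancer stays in PENDING_CREATE for a long time.
        self.lbs[name] = "PENDING_CREATE"


def provision_with_naive_retry(service, name, vip, attempts=2):
    """Re-send the create request each time the LB was not ready in time."""
    errors = []
    for _ in range(attempts):
        try:
            service.create_lb(name, vip)
        except RuntimeError as exc:
            errors.append(str(exc))
        # Assumption: the client gives up waiting here and simply retries.
    return errors


network = FakeNetwork()
service = FakeLBService(network)
errors = provision_with_naive_retry(service, "demo", "192.0.2.10")
print(errors)
```

In this toy model the second request fails on the port that the first request already created, while the original load balancer is still provisioning. That would match dulek's reading that the conflict is a side effect of the retry rather than the cause of the missing traffic.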
apuimedo | dulek: shouldn't the retry not happen if the VM and port are created? | 14:40 |
dulek | apuimedo: Let me show the exact log. | 14:40 |
dulek | apuimedo: http://logs.openstack.org/64/561364/2/check/kuryr-kubernetes-tempest-octavia/4a08a59/controller/logs/screen-kuryr-kubernetes.txt.gz#_Apr_19_10_17_49_117020 | 14:41 |
dulek | apuimedo: This is the moment when second POST happens on Octavia LB and Octavia tries to create the port for the second time. | 14:41 |
dulek | Now why is there ResourceNotReady even though LB is created…? | 14:43 |
apuimedo | dulek: my question is whether the LB exists when the second post happends | 14:43 |
apuimedo | *happens | 14:43 |
dulek | apuimedo: It definitely does in Octavia API - we can see gets with 200 answers. | 14:43 |
dulek | Maybe we time out too fast? And Octavia loses track of LB when it gets this duplicated request? Dunno… | 14:45 |
dulek | Yeah, it's still PENDING_CREATE 5 seconds before retry. | 14:46 |
dulek | But in the end we can see it created on the env. | 14:46 |
dulek | Just Kuryr doesn't care about it anymore as test timed out as well and K8s resources are down. | 14:47 |
apuimedo | dulek: mmm | 14:47 |
dulek | Let me create a Service myself and let's wait a bit. | 14:47 |
apuimedo | ok | 14:47 |
dulek | apuimedo: Hm, I should test ClusterIP, right? | 14:49 |
dulek | Okay, so LB is PENDING_CREATE | 14:50 |
*** salv-orlando has joined #openstack-kuryr | 14:51 | |
dulek | I'll wait until it's up and see if Kuryr created all the other resources. | 14:51 |
dulek | Like members. | 14:51 |
apuimedo | very well | 14:52 |
dmellado | dulek: I've just read all this | 14:53 |
dmellado | my laptop crashed | 14:53 |
dmellado | (as happens with everything as of lately) | 14:54 |
dmellado | hmmm we can always try with the non-qcow2 version of the amphora | 14:54 |
dmellado | dunno about the current status of dib, though | 14:54 |
dulek | dmellado: DIB is still failing on stable/queens even though release with the fix was released. | 14:56 |
dulek | dmellado: :( | 14:56 |
dulek | dmellado: So this is another issue. | 14:56 |
apuimedo | dulek: where was it fixed? | 14:56 |
dulek | dmellado: I've looked on the Amphorae VMs and it doesn't look too bad - qemu reports no panic and they answer pings. | 14:56 |
dulek | apuimedo: This was supposed to help: https://review.openstack.org/#/c/561479/ | 14:57 |
dulek | apuimedo: But it isn't. Might be good to ping Octavia folks again. | 14:57 |
dmellado | dulek: maybe they won't reply | 14:58 |
dmellado | xD | 14:58 |
* dmellado hides | 14:58 | |
dmellado | seriously, it might be totally worth pinging them again | 14:58 |
apuimedo | dulek: doesn't help even with your libs_from_git cheat? | 14:58 |
dmellado | apuimedo: dulek at least today's a happy day | 14:58 |
dulek | apuimedo: Even. Now as it's released LIBS_FROM_GIT addition is not needed. | 14:59 |
dmellado | Nicolas Cage has retired | 14:59 |
dmellado | xD | 14:59 |
dmellado | finally xD | 14:59 |
dulek | dmellado: I'm probably the only person in the universe that really liked some of his movies. | 14:59 |
dmellado | dulek: really? which one? If you say Ghost Rider I'll be really sad | 14:59 |
dmellado | xD | 14:59 |
apuimedo | dulek: name one good movie where he leads | 14:59 |
dulek | dmellado: I liked Bad Lieutenant. And Gone In 60 Seconds. | 15:00 |
dulek | I'm not really sophisticated cinema person. | 15:00 |
dmellado | didn't see that first one | 15:00 |
dmellado | but I can't but think on this one https://www.imdb.com/title/tt0117420/ | 15:00 |
dmellado | xD | 15:00 |
dulek | It's quite okay. Probably the only movie where Cage acts not like Cage. :P | 15:01 |
*** kiennt26_ has quit IRC | 15:01 | |
dulek | Oh, Lord of War was good. Not Cage's acting though, but film was nice. | 15:02 |
dulek | Aaaand I have my LB ACTIVE! | 15:02 |
dmellado | man it took a while | 15:02 |
apuimedo | dulek: how long was it? | 15:03 |
dulek | More than 10 minutes it seems. | 15:03 |
dulek | But let me check the connectivity. :P | 15:03 |
apuimedo | good lord | 15:04 |
apuimedo | dulek: you created it via cli? | 15:04 |
dmellado | no wonder it fails | 15:04 |
dulek | apuimedo: I did `kubectl expose` and waited. | 15:05 |
dulek | Heh, there's still no connectivity to that LB, so let's dig it further… | 15:05 |
apuimedo | dulek: and kuryr didn't mash it? | 15:05 |
apuimedo | due to timeout | 15:05 |
dulek | I'm checking. | 15:05 |
dulek | Yep, Kuryr mashed it. No pool created. | 15:05 |
dulek | So… Kuryr-Kubernetes logs! | 15:06 |
dmellado | dulek: does it complain about us developers? xD | 15:08 |
dulek | dmellado: No. I wonder though who's not going to complain when he needs to wait 10 minutes until his Service is exposed… | 15:09 |
dulek | Someone's still working on mainframes, I guess? | 15:09 |
dmellado | I wonder if this is related to nested kvm performance | 15:09 |
dmellado | as it's *not* on the upstream infra | 15:09 |
dulek | dmellado: No nested? Cool. | 15:10 |
dmellado | yeah, 'cool' | 15:10 |
dmellado | there are plans to enable this but not yet | 15:10 |
dulek | dmellado: Might be that, VM was ACTIVE fast, but stuff done on the VM… That's other story. | 15:10 |
dulek | Okay, so how about I increase Kuryr's timeout and retry? | 15:10 |
dmellado | but this was working at some point | 15:10 |
dmellado | dulek: yep | 15:10 |
dulek | Default timeout is pathetic 180 seconds? | 15:13 |
dmellado | dulek: put 600s at least, given what we saw | 15:13 |
*** yamamoto has quit IRC | 15:13 | |
dulek | Sure thing, I just wonder where do I put it. | 15:14 |
dmellado | apuimedo: any hint on that? | 15:14 |
*** yamamoto has joined #openstack-kuryr | 15:14 | |
dulek | I see line number, okay. | 15:14 |
dmellado | you saw it? awesome | 15:14 |
apuimedo | dulek: hey... Now that I think of it | 15:14 |
apuimedo | didn't eunsoo report this issue and send a patch to have the timeout be configurable? | 15:15 |
dmellado | hmm could be, kinda rings a bell | 15:15 |
apuimedo | https://review.openstack.org/#/c/549945/ | 15:15 |
dulek | Currently it's hardcoded on, surprise, surprise… 300 seconds. | 15:15 |
apuimedo | dulek: ^^ | 15:15 |
dulek | Heh, exactly. xD | 15:16 |
apuimedo | so... It's ltomasbo's fault for -1 | 15:16 |
dulek | It's always him! | 15:16 |
dulek | I'll just change the value in the code. | 15:17 |
ltomasbo | :/ | 15:17 |
dmellado | dulek: apuimedo I was just blaming him now xD | 15:17 |
dmellado | he deserves it | 15:17 |
dmellado | xD | 15:17 |
ltomasbo | lol | 15:17 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Add namespace subnet driver for namespace creation https://review.openstack.org/562247 | 15:17 |
dmellado | apuimedo: bring back that picture of him with the new manager | 15:17 |
dmellado | xD | 15:17 |
apuimedo | xD | 15:18 |
dulek | Okay, 1000 seconds timeout, let's restart and try again. | 15:18 |
apuimedo | ltomasbo: ok | 15:18 |
apuimedo | dulek: ok | 15:18 |
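The timeout discussed above bounds a poll loop that waits for the load balancer to become ACTIVE. A minimal sketch of such a loop, with the timeout passed in rather than hardcoded, might look like the following. It is not Kuryr's actual code: `wait_for_active`, `ActivationTimeout`, and the `get_status` callable are hypothetical names, and the status strings are assumed for illustration.

```python
import time


class ActivationTimeout(Exception):
    """Raised when the resource does not become ACTIVE in time."""


def wait_for_active(get_status, timeout, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns 'ACTIVE' or `timeout` seconds pass.

    `clock` and `sleep` are injectable so the loop can be tested
    without actually waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == 'ACTIVE':
            return
        if status == 'ERROR':
            # Provisioning failed outright; waiting longer will not help.
            raise RuntimeError('provisioning failed')
        if clock() >= deadline:
            raise ActivationTimeout(
                'not ACTIVE after %s seconds' % timeout)
        sleep(interval)
```

With this shape, raising the limit from 300 to 1000 seconds is a change to the `timeout` argument at the call site, which is what makes it easy to back with a configuration option.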
*** yamamoto has quit IRC | 15:19 | |
dulek | Okay, it's going. | 15:20 |
dmellado | let's see | 15:20 |
apuimedo | dulek: it was 300 by default, right? | 15:23 |
dulek | BTW - we do have an issue that on timeout we don't clean up the LB. | 15:23 |
dulek | And then get a conflict and fail to process the event. | 15:23 |
apuimedo | dulek: I know | 15:23 |
dulek | Who's going to file a bug? ;) | 15:23 |
dulek | apuimedo: It was 300. | 15:23 |
apuimedo | I wonder how well it works to start deleting something that either didn't finish or errored in provisioning | 15:23 |
apuimedo | dulek: I'd make it related to bug https://review.openstack.org/#/c/549945/6 | 15:24 |
apuimedo | it fits quite well | 15:24 |
dulek | apuimedo: Sure thing. I'll confirm that this is the issue and will handle all the patches and paperwork. | 15:24 |
dulek | Unless I don't make it in 2 hours, as that's a hard stop for me today. :P | 15:25 |
apuimedo | dulek: very well | 15:27 |
apuimedo | let me know if I need to take over from you then | 15:28 |
dulek | apuimedo: Sure. | 15:28 |
dulek | BTW - anyone on fixing stable/queens? We need to figure out what to do with https://review.openstack.org/#/c/561974/ fast… | 15:28 |
dmellado | dulek: remaining lbaas could've been there from interrupted tempest runs | 15:28 |
dulek | dmellado: It is, but still Tempest would let it clean up. | 15:29 |
dulek | dmellado: The issue is Kuryr doesn't clean up on timeout. | 15:29 |
dmellado | nope if you interrupt it | 15:29 |
dmellado | dulek: worst case we'll get to skip the test there too | 15:29 |
dulek | dmellado: Service and Pods were deleted, Kuryr should handle deletion of its resources. | 15:29 |
dulek | dmellado: It's not "skip a test". It's skip whole CI. | 15:30 |
dulek | s/CI/gate | 15:30 |
dmellado | we can make octavia non voting for there until it gets solved | 15:30 |
apuimedo | dulek: we're stuck on dib, right? | 15:30 |
dmellado | not a pretty solution but the best that I can think of | 15:30 |
dulek | apuimedo: Right | 15:30 |
dulek | dmnRight. | 15:30 |
dmellado | yeah | 15:30 |
dulek | dmellado: Right. | 15:30 |
dulek | dmellado: And here's the orphaned resources issue - waiting for LB times out, RetryHandler restarts the LB Handler, Handler sees that LB was not provisioned completely, so it tries to create it. | 15:31 |
dmellado | hmm I see | 15:32 |
dulek | dmellado: It fails due to conflict, HTTP conflict is not on a list for RetryHandler, so event gets lost. | 15:32 |
dulek | That's it. | 15:32 |
dmellado | all awesome and probably due to vm slowness | 15:32 |
dulek | dmellado: On timeout we need to either cleanup existing LB or make sure we're able to detect it and restart *waiting*, not creating. | 15:32 |
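The second option dulek describes (detect the existing LB on retry and resume waiting instead of creating it again) could be sketched roughly as below. Everything here is hypothetical: `FakeLBClient`, `Conflict`, and `ensure_loadbalancer` are illustrative names standing in for the real LBaaS client and HTTP 409 error, not Kuryr or Octavia APIs.

```python
class Conflict(Exception):
    """Stand-in for an HTTP 409 Conflict from the LB API."""


class FakeLBClient:
    """In-memory stand-in for a load balancer API, keyed by name."""

    def __init__(self):
        self.lbs = {}

    def find(self, name):
        return self.lbs.get(name)

    def create(self, name):
        if name in self.lbs:
            raise Conflict(name)
        self.lbs[name] = {'name': name, 'status': 'PENDING_CREATE'}
        return self.lbs[name]


def ensure_loadbalancer(client, name):
    """Return (lb, created), reusing an existing LB instead of re-creating it."""
    lb = client.find(name)
    if lb is not None:
        # A previous attempt already created it (e.g. before a timeout).
        return lb, False
    try:
        return client.create(name), True
    except Conflict:
        # Someone created it between find() and create(); reuse that one.
        return client.find(name), False
```

In both cases the caller would go on to wait for ACTIVE, so a retry after a timeout resumes waiting on the half-provisioned LB rather than hitting the conflict and losing the event.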
ltomasbo | apuimedo, I found out why the k8s-pod-net was not on the amphora, but adding it does not seem to help... | 15:32 |
apuimedo | ltomasbo: tell me you were not on this long | 15:33 |
apuimedo | cause I already solved that in the morning when you first showed it to me | 15:34 |
ltomasbo | I was doing other stuff | 15:34 |
apuimedo | if you are talking about the api lb | 15:34 |
apuimedo | ok | 15:34 |
ltomasbo | well, it is not really solved | 15:34 |
ltomasbo | it does not work for octavia l2 | 15:34 |
apuimedo | ltomasbo: I had it working | 15:34 |
ltomasbo | when? in another setup? | 15:34 |
apuimedo | ah no | 15:34 |
*** yamamoto has joined #openstack-kuryr | 15:34 | |
apuimedo | right | 15:34 |
apuimedo | there's the centos issue | 15:34 |
apuimedo | with checksum | 15:34 |
ltomasbo | in your patch (from yesterday) the problem was that we need to add the member in a different way when using l2 and l3 mode | 15:35 |
apuimedo | you should report a bug on that to Octavia and assign it to cafarelli | 15:35 |
apuimedo | ltomasbo: I know | 15:35 |
apuimedo | xD | 15:35 |
ltomasbo | apuimedo, so I modified devstack to include that | 15:35 |
ltomasbo | and apply the right SGs | 15:35 |
ltomasbo | now that is right, but I see the same problem as before with the demo loadbalancer (when the api was using l3 instead of l2) | 15:36 |
ltomasbo | apuimedo, I believe this is the problem that we hit with the ubuntu amphora long ago | 15:37 |
ltomasbo | that we are hitting it now with the centos one | 15:37 |
openstackgerrit | Daniel Mellado proposed openstack/kuryr-kubernetes master: Remove LIBS_FROM_GIT as a ver https://review.openstack.org/562719 | 15:38 |
dulek | 14 minutes until it works. | 15:38 |
openstackgerrit | Daniel Mellado proposed openstack/kuryr-kubernetes master: Remove LIBS_FROM_GIT as a var in zuul.yaml https://review.openstack.org/562719 | 15:38 |
dulek | I've put 16.6 minutes there, so it's still too close. :P | 15:38 |
dmellado | hmmm that's a LONG time | 15:39 |
dmellado | if this is happening like that maybe we'd need to add a new label like 'slow' tag to these tests | 15:39 |
dmellado | and add a new tox env | 15:39 |
dulek | At least it works. | 15:39 |
apuimedo | ltomasbo: the demo loadbalancer issue is unrelated to that, yes | 15:39 |
apuimedo | I saw that the requests get to the amphora | 15:39 |
apuimedo | and from the amphora you can send requests to the member | 15:39 |
apuimedo | but the haproxy is not taking the requests from what I can see | 15:39 |
*** apuimedo has quit IRC | 15:39 | |
*** celebdor1 has joined #openstack-kuryr | 15:39 | |
*** celebdor1 is now known as apuimedo | 15:39 | |
apuimedo | ltomasbo: which is the last message you saw from me? | 15:40 |
dulek | Okay, I'll clean up the patches and will ping you once ready. | 15:40 |
apuimedo | dulek: I may have missed some message | 15:40 |
dulek | BTW - where should I put the timeout option? neutron_defaults sound wrong… | 15:40 |
apuimedo | my daughter disabled the wifi | 15:40 |
dmellado | apuimedo: lol | 15:40 |
ltomasbo | apuimedo, yes, and that is the same issue we hit for ubuntu | 15:40 |
dmellado | dulek: let me know if you need to take me over lately | 15:40 |
dulek | apuimedo: Put the router on the ceiling! | 15:40 |
dmellado | later | 15:40 |
ltomasbo | apuimedo, https://review.openstack.org/#/c/501915/ | 15:40 |
dulek | dmellado, apuimedo: Where should I put lbaas_activation_timeout option? neutron_defaults section…? Sounds a bit weird to me… | 15:41 |
apuimedo | well, since we manage it all via neutron (auth and such) | 15:41 |
apuimedo | we probably should put it there first | 15:41 |
apuimedo | and then move to a LB section | 15:41 |
apuimedo | (defined in the handler) | 15:41 |
apuimedo | (or driver) | 15:41 |
dulek | Hm, okay, I'll leave it in neutron_defaults and we'll think on a cleanup later on. | 15:42 |
apuimedo | my wife went to some travelperk meetup (I wonder if she'll see devvesa there) | 15:42 |
dulek | Moving options around isn't too bad with oslo.config. | 15:42 |
apuimedo | so I have both little monsters in my care | 15:42 |
apuimedo | dulek: true | 15:42 |
apuimedo | ltomasbo: yes, it is probably something similar to https://review.openstack.org/#/c/501915/ | 15:43 |
dmellado | apuimedo: travelperk | 15:44 |
dmellado | deff she'll meet him | 15:44 |
dmellado | give him regards xD | 15:44 |
apuimedo | I won't be there | 15:46 |
apuimedo | dmellado: dulek: alright, what do we do about dib? | 15:47 |
apuimedo | Do we keep octavia disabled in stable/queens for the merge | 15:47 |
apuimedo | so we can make a release? | 15:47 |
* dulek finishes timeout stuff, but has no ideas so go on. | 15:47 | |
apuimedo | s/disable/move to non-voting/ | 15:48 |
apuimedo | that is the plan, as you said | 15:48 |
apuimedo | then in parallel we have to figure out how long we'll have this octavia breakage | 15:48 |
apuimedo | dmellado: did you talk to the openstack-lbaas guys about this? | 15:49 |
ltomasbo | apuimedo, I'm testing with ubuntu amphora to see if that works with L2 | 15:52 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Make Neutron LBaaS Activation Timeout configurable https://review.openstack.org/549945 | 15:53 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-tempest-plugin master: Revert "Skip service test" https://review.openstack.org/561364 | 15:53 |
dulek | Okay, let's see how those will do. | 15:53 |
dulek | apuimedo: I'm okay with moving it to non-voting. That's better than using master's Amphorae tarball. | 15:54 |
apuimedo | ltomasbo: thanks | 15:58 |
apuimedo | dulek: probably | 15:59 |
*** pcaruana has quit IRC | 16:03 | |
*** jchhatbar has quit IRC | 16:35 | |
dulek | Hm, jobs are queued for 46 minutes now. I guess I'm checking CI results from the pub today. :P | 16:40 |
dmellado | apuimedo: let's go non-voting for now | 16:47 |
dmellado | apuimedo yeah | 16:48 |
dmellado | will try to whack them tomorrow | 16:48 |
dmellado | dulek: go for the pub | 16:48 |
dmellado | let's fetch some beers today | 16:48 |
dmellado | I need those xD | 16:48 |
dulek | dmellado: Each Thursday we're doing PubQuiz with friends. We're now on triple winning streak, so only one beer today to make sure we'll not break it. :D | 16:50 |
dmellado | lol | 16:50 |
dulek | Maybe more after the quiz…? Anyway see you tomorrow! | 16:51 |
dmellado | enjoy dulek | 16:51 |
*** garyloug has quit IRC | 17:00 | |
*** jermz has joined #openstack-kuryr | 17:09 | |
*** jerms has quit IRC | 17:10 | |
*** yamamoto has quit IRC | 17:20 | |
*** mestery has quit IRC | 17:29 | |
*** mestery has joined #openstack-kuryr | 17:31 | |
*** mestery has quit IRC | 18:07 | |
*** yamamoto has joined #openstack-kuryr | 18:20 | |
*** yamamoto has quit IRC | 18:30 | |
openstackgerrit | Maysa de Macedo Souza proposed openstack/kuryr-kubernetes master: Fix LB member creation on Nested environment https://review.openstack.org/562800 | 18:58 |
*** maysamacedos has quit IRC | 19:11 | |
*** dulek_ has joined #openstack-kuryr | 19:22 | |
*** premsankar has joined #openstack-kuryr | 19:35 | |
*** maysamacedos has joined #openstack-kuryr | 19:56 | |
*** dulek_ has quit IRC | 20:08 | |
openstackgerrit | Maysa de Macedo Souza proposed openstack/kuryr-kubernetes master: Fix LB member creation on Nested environment https://review.openstack.org/562800 | 20:16 |
*** maysams has joined #openstack-kuryr | 20:31 | |
*** maysams has quit IRC | 20:36 | |
*** atoth has quit IRC | 20:41 | |
*** maysams has joined #openstack-kuryr | 20:49 | |
*** maysams has quit IRC | 20:53 | |
*** maysams has joined #openstack-kuryr | 21:13 | |
*** maysams has quit IRC | 21:16 | |
*** maysamacedos has quit IRC | 21:31 | |
*** yamamoto has joined #openstack-kuryr | 21:49 | |
*** maysams has joined #openstack-kuryr | 22:09 | |
*** maysams has quit IRC | 22:11 | |
*** maysamacedos has joined #openstack-kuryr | 22:17 | |
*** apuimedo has quit IRC | 22:24 | |
*** hongbin_ has quit IRC | 22:57 | |
*** salv-orlando has quit IRC | 23:13 | |
*** salv-orlando has joined #openstack-kuryr | 23:13 |