*** maysams has quit IRC | 00:05 | |
*** celebdor has joined #openstack-kuryr | 01:06 | |
*** rh-jelabarre has joined #openstack-kuryr | 01:24 | |
*** hongbin has joined #openstack-kuryr | 02:00 | |
*** celebdor has quit IRC | 02:33 | |
*** rh-jelabarre has quit IRC | 04:25 | |
*** janki has joined #openstack-kuryr | 04:44 | |
*** sean-k-mooney has quit IRC | 05:18 | |
*** hongbin has quit IRC | 05:55 | |
*** ccamposr has joined #openstack-kuryr | 05:59 | |
*** janki has quit IRC | 06:38 | |
dulek | bathri-s: Seems like your Docker daemon isn't running for some reason. Check out its logs. | 07:18 |
*** maysams has joined #openstack-kuryr | 07:43 | |
*** pcaruana has joined #openstack-kuryr | 08:19 | |
*** dmellado has quit IRC | 08:44 | |
dulek | ltomasbo: Seems like you're the only one I can consult my idea with… ;) | 08:45 |
ltomasbo | dulek, sure! tell me! | 08:45 |
dulek | ltomasbo: I'm tempted to try switching all jobs to multinode and only run etcd on the second node. | 08:46 |
dulek | ltomasbo: That way we should be able to see if it's some contention or simple lack of resources issue, or something in the software. | 08:46 |
ltomasbo | dulek, umm, with the current status of the gates... | 08:46 |
dulek | ltomasbo: Naaah, just for testing. | 08:46 |
*** dmellado has joined #openstack-kuryr | 08:46 | |
ltomasbo | dulek, are we actually using the multinode gate right now? | 08:46 |
ltomasbo | dulek, perhaps you can test it on that one... | 08:47 |
dulek | ltomasbo: Yes, but etcd only runs on controller node. | 08:47 |
ltomasbo | dulek, ahh, true... | 08:47 |
ltomasbo | dulek, sounds good to me | 08:47 |
dulek | ltomasbo: Well, the etcd issue is hitting us 1 in 10 runs, so it's easier to just switch all. :P | 08:48 |
ltomasbo | dulek, also, note there were some problems with locating etcd on a different node (if I remember correctly) | 08:48 |
ltomasbo | dulek, I remember having to do so for a mix env with nested and baremetal | 08:48 |
ltomasbo | dulek, and I had to tweak devstack to allow that... not sure about the current status | 08:48 |
ltomasbo | dulek, on a (somehow) related note. I didn't hit the int literal problem thing on this: https://review.openstack.org/#/c/626624/1 | 08:49 |
dulek | ltomasbo: Hm, yeah, I'll need to get subnode IP somehow. | 08:49 |
ltomasbo | dulek, but probably because it failed before... | 08:49 |
dulek | ltomasbo: xD | 08:49 |
ltomasbo | dulek, so, I'm rechecking to see if it helps... | 08:49 |
ltomasbo | dulek, also updated https://review.openstack.org/#/c/626363/ | 08:50 |
ltomasbo | dmellado, ^^ | 08:50 |
dulek | dmellado: o/ | 08:50 |
ltomasbo | dulek, this looks good too: https://review.openstack.org/#/c/626638/1 | 08:50 |
dulek | ltomasbo: It does, but K8s 1.13 drops etcd2. xD | 08:51 |
dmellado | hi folks, damn (hug) bouncer | 08:51 |
dulek | ltomasbo: I needed to go back to 1.12 to make this work. | 08:51 |
dmellado | any more hugged findings from the gate? | 08:51 |
dulek | dmellado: http://eavesdrop.openstack.org/irclogs/%23openstack-kuryr/ | 08:51 |
ltomasbo | dulek, yep, but dmellado was moving back to 1.12 too in another patch... | 08:51 |
ltomasbo | better to go with 1.12 until we make 1.13 work, rather than the other way around, right? | 08:51 |
ltomasbo | dulek, dmellado ^^ | 08:51 |
dulek | ltomasbo: It's find for short term of course, but in the long run we need to find a way. | 08:51 |
dulek | Or maybe it's 1.13 bug and we should report it. | 08:52 |
dulek | ltomasbo: But from what I've seen K8s API folks only answer - add resources to your etcd node. | 08:52 |
dulek | ltomasbo: s/find/fine. :D | 08:52 |
ltomasbo | dulek, yep... and that could be actually the case... I think etcd has a problem... | 08:52 |
dmellado | loool so k8s 1.13 to blame | 08:52 |
dmellado | and just 'add resources' | 08:52 |
dmellado | awesome | 08:53 |
dmellado | let's revert to 1.12 for now and investigate with 1.13 on an experimental gate | 08:53 |
ltomasbo | dulek, in my envs sometimes pods are not getting active too due to etcd missing events | 08:53 |
ltomasbo | dulek, and in eminguez env the other day, deletion of resources was taking ages... | 08:53 |
dulek | dmellado: We've seen that occasionally with 1.12 as well. It's probably just 1.13 stretching etcd more. | 08:53 |
ltomasbo | probably due to etcd sync... | 08:53 |
dmellado | in any case it might become better if we go to kubeadm and put etcd on another node... | 08:54 |
dulek | dmellado: That's my idea for a test patch now. Switch all the gates to multinode with etcd on the subnode. | 08:54 |
dulek | dmellado: I just need to get subnode ip somehow. :D | 08:54 |
dulek | dmellado: Another thing to try is to give the freaking etcd higher CPU priority. It hasn't worked with IO, but dstat says that it isn't really an issue with iops. | 08:56 |
*** janki has joined #openstack-kuryr | 08:56 | |
dmellado | even with nice? xD | 08:56 |
dulek | I'll start with analyzing https://review.openstack.org/#/c/626638/ | 08:56 |
dulek | dmellado: https://review.openstack.org/#/c/624731/ | 08:57 |
dmellado | dulek: # Żółć. ? xD | 08:58 |
dmellado | in any case looks promising | 08:59 |
dmellado | if we could get by with this without having to get all our gates to multinode it'd be easier | 08:59 |
dulek | dmellado: Oh, I only want to switch that to test if it's contention/lack of resources issue. | 09:00 |
dmellado | I wouldn't be surprised if that's the case | 09:00 |
dulek | dmellado: żółć means bile. Hard to find more Polish word. :D | 09:00 |
dmellado | as we're installing a lot of things in the node | 09:00 |
dmellado | devstack, hyperkube and so | 09:01 |
dulek | ltomasbo: Hey, and about the failures on your OVS from source patch. | 09:01 |
dulek | ltomasbo: Isn't it due to different kernel versions maybe? | 09:01 |
dulek | ltomasbo: Just a thought but maybe it's only failing on one of the clouds. | 09:01 |
dulek | Okay, so with switch to etcd2 we're still getting some issues on RAX nodes. I'll wait for the recheck, but looks like it's not helping. | 09:03 |
dmellado | dulek: is that with 1.12 as well? | 09:04 |
dulek | dmellado: Yup, I needed to downgrade that - 1.13 drops etcd2 support. | 09:05 |
dmellado | actually it seems that https://review.openstack.org/#/c/624730/ performs better | 09:08 |
ltomasbo | dulek, I checked and some of them are unrelated... others could be due to not rmmod openvswitch before compiling it from source (I guess) | 09:18 |
*** garyloug has joined #openstack-kuryr | 09:19 | |
ltomasbo | dulek, in your patch: HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://158.69.74.203:2379 has no leader\n","code":500} | 09:20 |
dmellado | I've also seen that on some patches | 09:20 |
dulek | ltomasbo: Which of my patches/ | 09:24 |
dulek | ? | 09:24 |
ltomasbo | the one using etcd2 and kubernetes 1.12 | 09:24 |
dulek | ltomasbo: Which gate? | 09:25 |
ltomasbo | http://logs.openstack.org/38/626638/1/check/kuryr-kubernetes-tempest-daemon-containerized-octavia/050a482/testr_results.html.gz | 09:25 |
ltomasbo | dulek, this seems to work actually: https://review.openstack.org/#/c/626624/ | 09:38 |
ltomasbo | dulek, it is failing on ovn because by using that var, I believe both ovn and neutron are trying to install ovs from source | 09:39 |
ltomasbo | dulek, the other gate is failing on the sg_svc_isolation (which is unrelated) | 09:39 |
ltomasbo | dulek, and the last failure is related to a failed kubernetes-scheduler.service... | 09:40 |
ltomasbo | dulek, I'm going to recheck again... | 09:40 |
dulek | ltomasbo: http://logs.openstack.org/24/626624/1/check/kuryr-kubernetes-tempest-daemon-octavia/d5736a8/controller/logs/screen-kubernetes-api.txt.gz#_Dec_21_08_25_21_618551 | 09:40 |
dulek | ltomasbo: It still has failures on K8s API. | 09:41 |
dulek | ltomasbo: So I'd say it's just luck. ;) | 09:41 |
ltomasbo | dulek, I meant for the other error, not for etcd | 09:41 |
dulek | ltomasbo: Ah, but you were aiming to get rid of base 10. :D | 09:41 |
ltomasbo | dulek, I'm referring to the base 10 thing, yes | 09:41 |
dulek | ltomasbo: Those were more rare, so it might need a few rechecks to confirm. | 09:42 |
ltomasbo | yes yes, I'm rechecking... | 09:42 |
ltomasbo | (and waiting for experimental) | 09:42 |
ltomasbo | but I was getting those locally with ovs2.9 | 09:42 |
ltomasbo | so, I believe it may help... fingers crossed | 09:43 |
*** gmann is now known as gmann_pto | 09:43 | |
*** celebdor has joined #openstack-kuryr | 09:53 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Drop Octavia providers supported protocols list https://review.openstack.org/626032 | 09:57 |
*** phuoc_ has quit IRC | 10:06 | |
*** phuoc_ has joined #openstack-kuryr | 10:06 | |
dulek | dmellado: If you want easy Friday refactoring, there's a devstack-minimal job that could serve as our base job. :) | 10:39 |
dmellado | dulek: heh, we could use that | 10:39 |
dmellado | but let me first finish dealing with pagure and fedora | 10:39 |
dmellado | hugged python-openshift | 10:39 |
dmellado | dulek: please remind me to force using request for anyone who needs a client that's not packaged in distro already | 10:40 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Enable debug logs on Kubernetes services https://review.openstack.org/626609 | 10:51 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: DNM: Put etcd on another host https://review.openstack.org/626872 | 10:51 |
dulek | dmellado: ^ that's really bruteforce, but let's see. | 10:51 |
dulek | dmellado: My other idea is to put etcd data directory onto ramdisk. xD | 10:51 |
dmellado | dulek: hmmm looking forward to see the result... | 10:51 |
dmellado | LOL | 10:51 |
dmellado | that might actually not be a bad idea xD | 10:52 |
dmellado | but we're already short on ram xD | 10:52 |
dulek | dmellado: For tests it's totally a viable long-term solution. | 10:52 |
dulek | dmellado: Depends how big it would need to be. If etcd only needs ~500 MB, then I think we can spare that. | 10:52 |
dmellado | well, let's see the outcome of ^^first and then we can get to discuss that | 10:53 |
dmellado | we could even use the another host ramdisk xD xD XD | 10:53 |
dulek | dmellado: Shhh, I hear infra folks walking nearby. ;) | 10:53 |
dmellado | :D | 10:54 |
*** gkadam has joined #openstack-kuryr | 10:56 | |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878 | 11:05 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878 | 11:37 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878 | 11:44 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: DNM Test building ovs from source https://review.openstack.org/626624 | 11:55 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878 | 11:55 |
openstackgerrit | Michał Dulko proposed openstack/kuryr-kubernetes master: Enable debug logs on Kubernetes services https://review.openstack.org/626609 | 12:19 |
dulek | dmellado: ^ this now depends on a patch that creates a ramfs for etcd's data. We'll see… | 12:26 |
dmellado | lol | 12:26 |
dmellado | let's see if that ramdisk isn't really filled out | 12:26 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Handle loadbalancer SGs are created when sg_mode is create https://review.openstack.org/626887 | 12:30 |
dulek | dmellado: I've tried it locally and, well. kubectl works super fast. xD | 12:38 |
dulek | Gotta go now, happy holidays everyone! | 12:39 |
dmellado | dulek: happy holidays! | 12:39 |
dmellado | I'll take a look, thanks!!! | 12:39 |
dmellado | and safe travels!!! | 12:39 |
dulek | I'll check back here in the evening, but probably nobody will be there. ;) | 12:39 |
dmellado | I assume you'll be driving back home for Christmas? | 12:39 |
dulek | dmellado: Naaah, we've decided to fly to the Canary Islands instead. ;) | 12:40 |
dmellado | Oh, enjoy then! | 12:40 |
dmellado | don't forget to try 'mojo picón' xD | 12:40 |
* dulek never tries anything dmellado recommends without checking what's that first. | 12:42 | |
dmellado | lol | 12:42 |
dmellado | dulek: c'mon, I even helped you with your CYD issues xD | 12:43 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878 | 12:53 |
*** ccamposr has quit IRC | 12:54 | |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healthchecks passes without CRDs https://review.openstack.org/626878 | 12:56 |
*** rh-jelabarre has joined #openstack-kuryr | 13:15 | |
openstackgerrit | Antoni Segura Puimedon proposed openstack/kuryr-tempest-plugin master: detect failed curl when streamed from pod https://review.openstack.org/626892 | 13:21 |
celebdor | ltomasbo: dulek: ^^ will at least give a more meaningful error | 13:22 |
celebdor | to reduce headscratching | 13:22 |
*** janki has quit IRC | 14:04 | |
ltomasbo | celebdor, dulek: I'm rechecking once again this one: https://review.openstack.org/#/c/626624/ | 14:21 |
ltomasbo | celebdor, dulek: for now I haven't hit the invalid literal problem. And I fixed the problem for OVN gates and building from source (twice) | 14:21 |
celebdor | ltomasbo: why are you rechecking it? | 14:21 |
celebdor | ltomasbo: did you see the change I made to be more precise on the error? | 14:22 |
ltomasbo | celebdor, as that problem was not happening all the time, to ensure it is actually avoiding the problem (and not just being lucky) | 14:22 |
celebdor | ok | 14:22 |
ltomasbo | celebdor, not yet | 14:22 |
celebdor | ok | 14:25 |
dmellado | let's see if we can get the CI to behave in a more reliable way | 14:54 |
dmellado | also the ramfs will make things faster hopefully | 14:54 |
dmellado | while not getting the node outta ram | 14:54 |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Ensure controller healthchecks passes without CRDs https://review.openstack.org/626878 | 16:04 |
*** pcaruana has quit IRC | 16:18 | |
dmellado | o/ I'm off until Jan the 2nd! Happy New Year kuryrs! Thanks for your help along this year! ;) | 16:21 |
openstackgerrit | Maysa de Macedo Souza proposed openstack/kuryr-kubernetes master: Update CRD when NP has podSelectors https://review.openstack.org/625588 | 16:29 |
openstackgerrit | Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure gates run the latest OVS https://review.openstack.org/626624 | 16:40 |
*** gkadam has quit IRC | 16:58 | |
*** gkadam has joined #openstack-kuryr | 16:58 | |
openstackgerrit | Merged openstack/kuryr-tempest-plugin master: Fixup kuryr_daemon_enabled option description https://review.openstack.org/622932 | 17:15 |
*** celebdor has quit IRC | 19:33 | |
*** maysams has quit IRC | 20:46 | |
openstackgerrit | Merged openstack/kuryr-kubernetes master: Handle loadbalancer SGs are created when sg_mode is create https://review.openstack.org/626887 | 21:26 |
*** aojea has joined #openstack-kuryr | 22:56 | |
*** aojea has quit IRC | 22:56 | |
*** dims has quit IRC | 23:33 |