Friday, 2018-12-21

*** maysams has quit IRC		00:05
*** celebdor has joined #openstack-kuryr		01:06
*** rh-jelabarre has joined #openstack-kuryr		01:24
*** hongbin has joined #openstack-kuryr		02:00
*** celebdor has quit IRC		02:33
*** rh-jelabarre has quit IRC		04:25
*** janki has joined #openstack-kuryr		04:44
*** sean-k-mooney has quit IRC		05:18
*** hongbin has quit IRC		05:55
*** ccamposr has joined #openstack-kuryr		05:59
*** janki has quit IRC		06:38
dulek	bathri-s: Seems like your Docker daemon isn't running for some reason. Check out its logs.	07:18
*** maysams has joined #openstack-kuryr		07:43
*** pcaruana has joined #openstack-kuryr		08:19
*** dmellado has quit IRC		08:44
dulek	ltomasbo: Seems like you're the only one I can consult my idea with… ;)	08:45
ltomasbo	dulek, sure! tell me!	08:45
dulek	ltomasbo: I'm tempted to try switching all jobs to multinode and only run etcd on the second node.	08:46
dulek	ltomasbo: That way we should be able to see if it's some contention or simple lack of resources issue, or something in the software.	08:46
ltomasbo	dulek, umm, with the current status of the gates...	08:46
dulek	ltomasbo: Naaah, just for testing.	08:46
*** dmellado has joined #openstack-kuryr		08:46
ltomasbo	dulek, are we actually using the multinode gate right now?	08:46
ltomasbo	dulek, perhaps you can test it on that one...	08:47
dulek	ltomasbo: Yes, but etcd only runs on controller node.	08:47
ltomasbo	dulek, ahh, true...	08:47
ltomasbo	dulek, sounds good to me	08:47
dulek	ltomasbo: Well, the etcd issue is hitting us 1 in 10 runs, so it's easier to just switch all. :P	08:48
ltomasbo	dulek, also, note there were some problems with locating etcd on a different node (if I remember correctly)	08:48
ltomasbo	dulek, I remember having to do so for a mix env with nested and baremetal	08:48
ltomasbo	dulek, and I had to tweak devstack to allow that... not sure about the current status	08:48
ltomasbo	dulek, on a (somehow) related note. I didn't hit the int literal problem thing on this: https://review.openstack.org/#/c/626624/1	08:49
dulek	ltomasbo: Hm, yeah, I'll need to get subnode IP somehow.	08:49
ltomasbo	dulek, but probably because it failed before...	08:49
dulek	ltomasbo: xD	08:49
ltomasbo	dulek, so, I'm rechecking to see if it helps...	08:49
ltomasbo	dulek, also updated https://review.openstack.org/#/c/626363/	08:50
ltomasbo	dmellado, ^^	08:50
dulek	dmellado: o/	08:50
ltomasbo	dulek, this looks good too: https://review.openstack.org/#/c/626638/1	08:50
dulek	ltomasbo: It does, but K8s 1.13 drops etcd2. xD	08:51
dmellado	hi folks, damn (hug) bouncer	08:51
dulek	ltomasbo: I needed to go back to 1.12 to make this work.	08:51
dmellado	any more hugged findings from the gate?	08:51
dulek	dmellado: http://eavesdrop.openstack.org/irclogs/%23openstack-kuryr/	08:51
ltomasbo	dulek, yep, but dmellado was moving back to 1.12 too in another patch...	08:51
ltomasbo	better to go with 1.12 until we make 1.13 work, rather than the other way around, right?	08:51
ltomasbo	dulek, dmellado ^^	08:51
dulek	ltomasbo: It's find for short term of course, but in the long run we need to find a way.	08:51
dulek	Or maybe it's 1.13 bug and we should report it.	08:52
dulek	ltomasbo: But from what I've seen K8s API folks only answer - add resources to your etcd node.	08:52
dulek	ltomasbo: s/find/fine. :D	08:52
ltomasbo	dulek, yep... and that could be actually the case... I think etcd has a problem...	08:52
dmellado	loool so k8s 1.13 to blame	08:52
dmellado	and just 'add resources'	08:52
dmellado	awesome	08:53
dmellado	let's revert to 1.12 for now and investigate with 1.13 on an experimental gate	08:53
ltomasbo	dulek, in my envs sometimes pods are not getting active either, due to etcd missing events	08:53
ltomasbo	dulek, and in eminguez env the other day, deletion of resources was taking ages...	08:53
dulek	dmellado: We've seen that occasionally with 1.12 as well. It's probably just 1.13 stretching etcd more.	08:53
ltomasbo	probably due to etcd sync...	08:53
dmellado	in any case it might become better if we go to kubeadm and put etcd on another node...	08:54
dulek	dmellado: That's my idea for a test patch now. Switch all the gates to multinode with etcd on the subnode.	08:54
dulek	dmellado: I just need to get subnode ip somehow. :D	08:54
dulek	dmellado: Another thing to try is to give the freaking etcd higher CPU priority. It hasn't worked with IO, but dstat says that it isn't really an issue with iops.	08:56
*** janki has joined #openstack-kuryr		08:56
dmellado	even with nice? xD	08:56
dulek	I'll start with analyzing https://review.openstack.org/#/c/626638/	08:56
dulek	dmellado: https://review.openstack.org/#/c/624731/	08:57
dmellado	dulek: # Żółć. ? xD	08:58
dmellado	in any case looks promising	08:59
dmellado	if we could get by with this without having to get all our gates to multinode it'd be easier	08:59
dulek	dmellado: Oh, I only want to switch that to test if it's contention/lack of resources issue.	09:00
dmellado	I wouldn't be surprised if that's the case	09:00
dulek	dmellado: żółć means bile. Hard to find a more Polish word. :D	09:00
dmellado	as we're installing a lot of things in the node	09:00
dmellado	devstack, hyperkube and so	09:01
dulek	ltomasbo: Hey, and about the failures on your OVS from source patch.	09:01
dulek	ltomasbo: Isn't it due to different kernel versions maybe?	09:01
dulek	ltomasbo: Just a thought but maybe it's only failing on one of the clouds.	09:01
dulek	Okay, so with switch to etcd2 we're still getting some issues on RAX nodes. I'll wait for the recheck, but looks like it's not helping.	09:03
dmellado	dulek: is that also with 1.12?	09:04
dulek	dmellado: Yup, I needed to downgrade that - 1.13 drops etcd2 support.	09:05
dmellado	actually it seems that https://review.openstack.org/#/c/624730/ performs better	09:08
ltomasbo	dulek, I checked and some of them are unrelated... others could be due to not rmmod openvswitch before compiling it from source (I guess)	09:18
*** garyloug has joined #openstack-kuryr		09:19
ltomasbo	dulek, in your patch: HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://158.69.74.203:2379 has no leader\n","code":500}	09:20
dmellado	I've also seen that on some patches	09:20
dulek	ltomasbo: Which of my patches/	09:24
dulek	?	09:24
ltomasbo	the one using etcd2 and kubernetes 1.12	09:24
dulek	ltomasbo: Which gate?	09:25
ltomasbo	http://logs.openstack.org/38/626638/1/check/kuryr-kubernetes-tempest-daemon-containerized-octavia/050a482/testr_results.html.gz	09:25
ltomasbo	dulek, this seems to work actually: https://review.openstack.org/#/c/626624/	09:38
ltomasbo	dulek, it is failing on ovn because by using that var, I believe both ovn and neutron are trying to install ovs from source	09:39
ltomasbo	dulek, the other gate is failing on the sg_svc_isolation (which is unrelated)	09:39
ltomasbo	dulek, and the last failure is related to a failed kubernetes-scheduler.service...	09:40
ltomasbo	dulek, I'm going to recheck again...	09:40
dulek	ltomasbo: http://logs.openstack.org/24/626624/1/check/kuryr-kubernetes-tempest-daemon-octavia/d5736a8/controller/logs/screen-kubernetes-api.txt.gz#_Dec_21_08_25_21_618551	09:40
dulek	ltomasbo: It still has failures on K8s API.	09:41
dulek	ltomasbo: So I'd say it's just luck. ;)	09:41
ltomasbo	dulek, I meant for the other error, not for etcd	09:41
dulek	ltomasbo: Ah, but you were aiming to get rid of base 10. :D	09:41
ltomasbo	dulek, I'm referring to the base 10 thing, yes	09:41
dulek	ltomasbo: Those were more rare, so it might need a few rechecks to confirm.	09:42
ltomasbo	yes yes, I'm rechecking...	09:42
ltomasbo	(and waiting for experimental)	09:42
ltomasbo	but I was getting those locally with ovs2.9	09:42
ltomasbo	so, I believe it may help... fingers crossed	09:43
*** gmann is now known as gmann_pto		09:43
*** celebdor has joined #openstack-kuryr		09:53
openstackgerrit	Merged openstack/kuryr-kubernetes master: Drop Octavia providers supported protocols list https://review.openstack.org/626032	09:57
*** phuoc_ has quit IRC		10:06
*** phuoc_ has joined #openstack-kuryr		10:06
dulek	dmellado: If you want easy Friday refactoring, there's a devstack-minimal job that could serve as our base job. :)	10:39
dmellado	dulek: heh, we could use that	10:39
dmellado	but let me first finish dealing with pagure and fedora	10:39
dmellado	hugged python-openshift	10:39
dmellado	dulek: please remind me to force using request for anyone who needs a client that's not packaged in the distro already	10:40
openstackgerrit	Michał Dulko proposed openstack/kuryr-kubernetes master: Enable debug logs on Kubernetes services https://review.openstack.org/626609	10:51
openstackgerrit	Michał Dulko proposed openstack/kuryr-kubernetes master: DNM: Put etcd on another host https://review.openstack.org/626872	10:51
dulek	dmellado: ^ that's really bruteforce, but let's see.	10:51
dulek	dmellado: Another idea of mine is to put etcd data directory onto ramdisk. xD	10:51
dmellado	dulek: hmmm looking forward to see the result...	10:51
dmellado	LOL	10:51
dmellado	that might actually not be a bad idea xD	10:52
dmellado	but we're already short on ram xD	10:52
dulek	dmellado: For tests it's totally a viable long-term solution.	10:52
dulek	dmellado: Depends how big it would need to be. If etcd only needs ~500 MB, then I think we can spare that.	10:52
dmellado	well, let's see the outcome of ^^first and then we can get to discuss that	10:53
dmellado	we could even use the another host ramdisk xD xD XD	10:53
dulek	dmellado: Shhh, I hear infra folks walking nearby. ;)	10:53
dmellado	:D	10:54
*** gkadam has joined #openstack-kuryr		10:56
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878	11:05
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878	11:37
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878	11:44
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: DNM Test building ovs from source https://review.openstack.org/626624	11:55
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878	11:55
openstackgerrit	Michał Dulko proposed openstack/kuryr-kubernetes master: Enable debug logs on Kubernetes services https://review.openstack.org/626609	12:19
dulek	dmellado: ^ this now depends on a patch that creates a ramfs for etcd's data. We'll see…	12:26
dmellado	lol	12:26
dmellado	let's see if that ramdisk isn't really filled out	12:26
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Handle loadbalancer SGs are created when sg_mode is create https://review.openstack.org/626887	12:30
dulek	dmellado: I've tried it locally and, well. kubectl works super fast. xD	12:38
dulek	Gotta go now, happy holidays everyone!	12:39
dmellado	dulek: happy holidays!	12:39
dmellado	I'll take a look, thanks!!!	12:39
dmellado	and safe travels!!!	12:39
dulek	I'll check back here in the evening, but probably nobody will be there. ;)	12:39
dmellado	I assume you'll be driving back home for Christmas?	12:39
dulek	dmellado: Naaah, we've decided to fly to the Canary Islands instead. ;)	12:40
dmellado	Oh, enjoy then!	12:40
dmellado	don't forget to try 'mojo picón' xD	12:40
* dulek never tries anything dmellado recommends without checking what's that first.		12:42
dmellado	lol	12:42
dmellado	dulek: c'mon, I even helped you with your CYD issues xD	12:43
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healtchecks passes without CRDs https://review.openstack.org/626878	12:53
*** ccamposr has quit IRC		12:54
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure controller healthchecks passes without CRDs https://review.openstack.org/626878	12:56
*** rh-jelabarre has joined #openstack-kuryr		13:15
openstackgerrit	Antoni Segura Puimedon proposed openstack/kuryr-tempest-plugin master: detect failed curl when streamed from pod https://review.openstack.org/626892	13:21
celebdor	ltomasbo: dulek: ^^ will at least give a more meaningful error	13:22
celebdor	to reduce headscratching	13:22
*** janki has quit IRC		14:04
ltomasbo	celebdor, dulek: I'm rechecking once again this one: https://review.openstack.org/#/c/626624/	14:21
ltomasbo	celebdor, dulek: for now I haven't hit the invalid literal problem. And I fixed the problem for OVN gates and building from source (twice)	14:21
celebdor	ltomasbo: why are you rechecking it?	14:21
celebdor	ltomasbo: did you see the change I made to be more precise on the error?	14:22
ltomasbo	celebdor, as that problem was not happening all the time, to ensure it is actually avoiding the problem (and not just being lucky)	14:22
celebdor	ok	14:22
ltomasbo	celebdor, not yet	14:22
celebdor	ok	14:25
dmellado	let's see if we can get the CI to behave in a more reliable way	14:54
dmellado	also the ramfs will make things faster hopefully	14:54
dmellado	while not getting the node outta ram	14:54
openstackgerrit	Merged openstack/kuryr-kubernetes master: Ensure controller healthchecks passes without CRDs https://review.openstack.org/626878	16:04
*** pcaruana has quit IRC		16:18
dmellado	o/ I'm off until Jan the 2nd! Happy New Year kuryrs! Thanks for your help throughout this year! ;)	16:21
openstackgerrit	Maysa de Macedo Souza proposed openstack/kuryr-kubernetes master: Update CRD when NP has podSelectors https://review.openstack.org/625588	16:29
openstackgerrit	Luis Tomas Bolivar proposed openstack/kuryr-kubernetes master: Ensure gates run the latest OVS https://review.openstack.org/626624	16:40
*** gkadam has quit IRC		16:58
*** gkadam has joined #openstack-kuryr		16:58
openstackgerrit	Merged openstack/kuryr-tempest-plugin master: Fixup kuryr_daemon_enabled option description https://review.openstack.org/622932	17:15
*** celebdor has quit IRC		19:33
*** maysams has quit IRC		20:46
openstackgerrit	Merged openstack/kuryr-kubernetes master: Handle loadbalancer SGs are created when sg_mode is create https://review.openstack.org/626887	21:26
*** aojea has joined #openstack-kuryr		22:56
*** aojea has quit IRC		22:56
*** dims has quit IRC		23:33
