Tuesday, 2020-04-07

flwang1	given yours is a new env, pls go for fedora coreos	00:01
cosmicsound	flwang1 , i tried those images in this new setup somehow they all failed to boot	00:04
cosmicsound	maybe my cli for coreos template is not great	00:04
flwang1	cosmicsound: i think it's because you're using old version heat	00:04
flwang1	what's your heat version?	00:04
cosmicsound	i use kolla-ansible to deploy from source on ubuntu	00:05
cosmicsound	so i should not be using old version	00:05
flwang1	cosmicsound: what's your heat version?	00:06
*** k_mouza has joined #openstack-containers		00:08
*** k_mouza has quit IRC		00:10
cosmicsound	not sure how i see it	00:12
cosmicsound	cli shows me the version of the cli client	00:12
cosmicsound	heat --version	00:13
cosmicsound	2.0.0	00:13
flwang1	hmm.. no, not heat cli version	00:31
flwang1	your heat service version	00:31
cosmicsound	not sure how to get that yet	00:42
cosmicsound	must be the train stable version	00:42
openstackgerrit	Merged openstack/magnum master: Support calico v3.3.6 https://review.opendev.org/717116	01:23
cosmicsound	it seems if so i found the issue	02:05
cosmicsound	or not	02:08
*** k_mouza has joined #openstack-containers		02:10
*** k_mouza has quit IRC		02:15
cosmicsound	manifest for kolla/ubuntu-source-magnum-api:9.3.0 not found: manifest unknown: manifest unknown	02:32
openstackgerrit	Lingxian Kong proposed openstack/magnum master: [K8S] Delete all related load balancers before deleting cluster https://review.opendev.org/716930	03:48
*** ykarel|away is now known as ykarel		04:20
cosmicsound	Apr 07 04:02:01 d-pa2hu2ehhcwn-master-0 podman[2332]: Authorization failed: SSL exception connecting to https://cloud.uhlhost.net:5000/v3/auth/tokens: HTTPSConnectionPool(host='cloud.uhlhost.net', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(136, '[X509] no certificate or crl found (_ssl.c:4232)')))	04:23
cosmicsound	this is only in coreos happening	04:24
*** AJaeger has left #openstack-containers		05:04
*** ricolin has joined #openstack-containers		05:30
*** udesale has joined #openstack-containers		05:41
openstackgerrit	Feilong Wang proposed openstack/magnum master: [WIP] Support multi AZ for k8s multi masters https://review.opendev.org/714347	05:44
brtknr	cosmicsound: are you using os_distro=fedora-coreos or coreos?	05:57
cosmicsound	fedora-coreos	06:03
cosmicsound	tried with your scrips	06:03
cosmicsound	scripts*	06:04
cosmicsound	it worked for atomic	06:04
cosmicsound	not for coreos	06:04
cosmicsound	I get this in logs	06:04
cosmicsound	Apr 07 05:45:48 k-necofnexy5va-master-0 podman[2327]: Source [heat_local] Unavailable.	06:04
cosmicsound	Apr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: Source [request] Unavailable.	06:04
cosmicsound	Apr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: /var/lib/os-collect-config/local-data not found. Skipping	06:04
cosmicsound	Apr 07 05:45:50 k-necofnexy5va-master-0 podman[2327]: No auth_url configured.	06:04
cosmicsound	I updated to latest image from coreos	06:04
*** udesale has quit IRC		06:09
*** udesale has joined #openstack-containers		06:13
*** xinliang has joined #openstack-containers		06:26
brtknr	cosmicsound: wait so coreos is booting up but not making the cluster?	06:30
brtknr	If you are using kolla ansible, can you try using the master branch for magnum?	06:31
brtknr	I have seen that before for tls endpoints	06:31
brtknr	eg if your keystone is https	06:32
brtknr	it works when it’s http	06:32
brtknr	please file a bug report	06:32
brtknr	And we shall look at it	06:32
brtknr	But try the master tag for magnum fiest	06:33
brtknr	first	06:33
cosmicsound	yes	06:36
cosmicsound	I will try	06:36
cosmicsound	Now i am on train	06:36
openstackgerrit	Merged openstack/magnum master: Cleanup py27 support https://review.opendev.org/717549	06:41
*** ttsiouts has joined #openstack-containers		06:45
cosmicsound	brtknr , with master branch and your script for atomic it gives me: Create_Failed: Resource CREATE failed: BadRequest: resources.kube_masters.resources[0].resources.docker_volume: Invalid input for field/attribute availability_zone. Value: . '' is too short (HTTP 400) (Request-ID: req-f3846f3b-bd55-4d64-bb1a-33779b5a43ba)	07:16
*** xinliang has quit IRC		07:29
cosmicsound	brtknr , Create Failed	07:29
cosmicsound	Resource Create Failed: Badrequest: Resources.Kube Masters.Resources[0].Resources.Kube Node Volume: Invalid Input For Field/Attribute Availability Zone. Value: . '' Is Too Short (Http 400) (Request-Id: Req-E909dacb-8c96-467b-A859-1ae5ddb40e4a)	07:29
cosmicsound	i added availability_zone=nova and then i got the second error	07:29
cosmicsound	and weirdest is that. on new version i cannot even. remake the cluster. that worked before: Resource Create Failed: Error: Resources.Kube Masters.Resources[0].Resources.Master Config Deployment: Deployment To Server Failed: Deploy Status Code: Deployment Exited With Non-Zero Status Code: 1	07:59
cosmicsound	http://paste.openstack.org/show/791719/	08:00
*** born2bake has joined #openstack-containers		08:02
*** k_mouza has joined #openstack-containers		08:11
*** k_mouza has quit IRC		08:15
*** ykarel is now known as ykarel|lunch		08:48
brtknr	cosmicsound: are you using cinder volume?	08:51
cosmicsound	brtknr , yes	08:51
cosmicsound	cinder backed by ceph	08:51
brtknr	can you disable it and try	08:51
cosmicsound	not sure how you mean that	08:52
cosmicsound	labels | {'heat_container_agent_tag': '689704', 'kube_tag': 'v1.14.8'}	08:52
brtknr	also you need the latest heat also	08:52
brtknr	do you have volume_driver=cinder?	08:52
cosmicsound	let me check	08:52
cosmicsound	latest heat with latest magnum?	08:53
brtknr	yes	08:54
cosmicsound	will give it a try	08:54
cosmicsound	anyhow	08:54
cosmicsound	most occurd i find	08:54
cosmicsound	that the same template that made yesterday a healthy cluster	08:55
cosmicsound	today it fails	08:55
cosmicsound	there is no volume_driver setup	08:56
*** k_mouza has joined #openstack-containers		09:37
*** ykarel|lunch is now known as ykarel		09:41
*** k_mouza has quit IRC		09:49
*** k_mouza has joined #openstack-containers		09:57
*** k_mouza has quit IRC		09:57
*** k_mouza has joined #openstack-containers		09:57
*** ykarel is now known as ykarel|meeting		10:02
cosmicsound	brtknr , it did not help	10:04
cosmicsound	somehow it fails continually	10:04
brtknr	cosmicsound: why are you using heat_container_agent_tag 689704?	10:12
brtknr	ussuri-dev is recommended	10:12
cosmicsound	thats what you had stored in the script	10:13
cosmicsound	and thus was the first working cluster	10:13
cosmicsound	i will update them all to ussuri-dev and try again	10:13
*** ricolin has quit IRC		10:16
cosmicsound	heat_container_agent_tag=ussuri-dev,kube_tag=v1.16.0	10:27
cosmicsound	trying this	10:27
cosmicsound	dockerd-current[1343]: time="2020-04-07T10:30:28.670451562Z" level=error msg="Handler for GET /v1.26/containers/etcd/json returned error: No such container: etcd"	10:30
cosmicsound	Apr 07 10:30:14 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: [2020-04-07 10:30:14,291] (heat-config) [DEBUG] Running /var/lib/heat-config/hooks/script < /var/lib/heat-config/deployed/56955806-b478-4789-b1c6-e5e747713f43.json	10:31
cosmicsound	it will stay here 2 3 mins, then will time out with os-profile not found	10:32
cosmicsound	brtknr , it goes here	10:37
cosmicsound	Apr 07 10:36:02 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: [2020-04-07 10:36:02,657] (os-refresh-config) [INFO] Completed phase migration	10:37
cosmicsound	Apr 07 10:36:02 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: INFO:os-refresh-config:Completed phase migration	10:37
cosmicsound	Apr 07 10:36:04 k1-rkjdrcythq3m-master-0.novalocal runc[2397]: /var/lib/os-collect-config/local-data not found. Skipping	10:37
cosmicsound	and it dies	10:37
cosmicsound	no matter what labels i use	10:37
brtknr	cosmicsound: please check inside /var/log/heat-config as i mentioned before	10:37
*** ykarel|meeting is now known as ykarel		10:37
*** vishalmanchanda has joined #openstack-containers		10:40
cosmicsound	i did	10:41
cosmicsound	this is from there	10:41
*** k_mouza has quit IRC		10:42
*** k_mouza has joined #openstack-containers		10:46
cosmicsound	Apr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: ++ ssh -F /srv/magnum/.ssh/config root@localhost ls /dev/disk/by-id	10:48
cosmicsound	Apr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: ++ grep 'd3212ba7-394d-45f1-9$'	10:48
cosmicsound	Apr 07 08:38:12 magnum-test-cluster-v1-15-11-7lzcw4g3fcpk-master-0.novalocal runc[2364]: + device_name=	10:48
cosmicsound	Here comes the trouble	10:48
cosmicsound	https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/configure-etcd.sh#L25	10:50
cosmicsound	this one cannot be retrieved by. heat-agent	10:50
cosmicsound	and will fail	10:50
*** ttsiouts has quit IRC		10:59
cosmicsound	brtknr , with or without hw_scsi_model=virtio-scsi	11:03
cosmicsound	it seems scsi can cause also issues with above bug we got here	11:03
cosmicsound	il confirm soon	11:03
brtknr	please show me the full log	11:04
brtknr	cosmicsound: well, have you specified etcd_volume_size?	11:05
brtknr	try disabling it	11:05
brtknr	if what you say is true then that is only ever executed if this condition is met: if [ -n "$ETCD_VOLUME_SIZE" ] && [ "$ETCD_VOLUME_SIZE" -gt 0 ]; then	11:06
cosmicsound	brtknr , i had o etcd specified this time	11:07
cosmicsound	also i was o scsi now on virtio	11:07
*** ttsiouts has joined #openstack-containers		11:09
cosmicsound	http://paste.openstack.org/show/791719/ here is full log from heat	11:11
cosmicsound	coreos seems to be stucked at:	11:35
cosmicsound	+ echo 'Trying to label master node with node-role.kubernetes.io/master=""'	11:35
cosmicsound	+ sleep 5s	11:35
cosmicsound	++ curl --silent http://127.0.0.1:8080/healthz	11:35
cosmicsound	+ '[' ok = '' ']'	11:35
ttsiouts	strigazi, brtknr: are you guys around?	12:07
strigazi	o/	12:08
ttsiouts	o/	12:08
ttsiouts	I wanted to talk about the spec	12:08
ttsiouts	I kind of like brtknr's idea.	12:09
ttsiouts	I could start rewriting the spec based on that	12:09
guilhermesp	thanks for sharing the conformance results flwang1 ! Not sure but for both v1.17.4 and v1.18 i'm getting the same tests failing http://paste.openstack.org/show/791730/	12:26
guilhermesp	which is mostly dns tests	12:26
*** ttsiouts has quit IRC		12:34
born2bake	guys is there any up-to-date guide how to use magnum and deploy up-to-date k8s with magnum?	12:41
born2bake	using up-to-date fedora-coreos images	12:42
born2bake	I ve tried already so many different setups :) the only one that works for me: flannel, fedora-coreos, 1 master. (autoscaler, autohealer, cloud manager are crashing at scaling)	12:43
*** ttsiouts has joined #openstack-containers		12:43
born2bake	calico, multi-master neither of them are working for me	12:43
born2bake	openstack: train, kolla	12:43
guilhermesp	born2bake: do you have octavia on your env?	12:47
born2bake	yes	12:47
guilhermesp	no logs?	12:47
born2bake	deleted everything, will try again later on. I managed to have multi-master cluster with fedora-atomic 29....but I cant use fedora-atomic (it takes around 20 min to boot up just one image), worker-nodes were not able to connect either	12:48
guilhermesp	born2bake: https://review.opendev.org/#/c/685875/1 are you aware of?	12:49
born2bake	the main problem, I have no idea how to troubleshoot using heat-config logs...I can see it stopped and failed but I do not know why	12:49
born2bake	guilhermesp nope. even though I have no idea how to update my magnum setup lol	12:50
born2bake	but as far as I know, in kolla ansible train magnum version is 9.2.0	12:50
guilhermesp	and the heat version?	12:51
brtknr	ttsiouts: hello im here	12:52
born2bake	docker exec -it kolla/ubuntu-source-heat-engine:train heat --version - 1.18.0	12:53
brtknr	born2bake: can you upload your logs?	12:56
brtknr	born2bake: ssh core@172.24.4.253 sudo cat /var/log/heat-config/heat-config-script/* | nc seashells.io 1337	12:56
born2bake	surely I will do	12:57
brtknr	cosmicsound: same for you^	12:57
strigazi	ping ttsiouts	12:58
ttsiouts	I'm here	13:01
ttsiouts	brtknr: I really like your idea	13:02
ttsiouts	:)	13:02
ttsiouts	I just wanted to discuss with both of you a bit more	13:02
brtknr	ttsiouts: great to hear =)	13:02
*** udesale_ has joined #openstack-containers		13:03
brtknr	I think someone else has suggested this before but on the client side	13:03
brtknr	e.g. by reading the cluster template labels and applying merge/override based on a flag that the server never sees	13:04
ttsiouts	this solution though would not allow proper tracking of the labels that were provided at creation time.	13:05
*** udesale has quit IRC		13:05
brtknr	ttsiouts: yes i agree	13:06
ttsiouts	so having this option server side is what makes it work for this use case too.	13:06
strigazi	I think someone else has suggested this before but on the client side: HARD NO	13:06
brtknr	I thought it was this but looks like its a different implementation: https://review.opendev.org/#/c/657410	13:07
strigazi	https://review.opendev.org/#/c/657435/ This is the patch	13:07
brtknr	that looks similar	13:08
strigazi	They are duplicate	13:08
brtknr	either way, the merge takes place on the client side	13:08
strigazi	Haven't we rejected this?	13:09
brtknr	yes, i was basically trying to point out that my suggestion is not 100% original :)	13:10
ttsiouts	ok we agree that the merge should be done server side in order to allow proper tracking of client input	13:10
brtknr	+1	13:11
ttsiouts	do you also agree that the labels field (in a cluster or nodegroup) should contain only the labels provided at creation?	13:13
ttsiouts	which means that we have to persist the flag too.	13:14
brtknr	at cluster/nodegroup creation?	13:14
ttsiouts	brtknr: yes	13:14
brtknr	agree	13:14
ttsiouts	strigazi ?	13:14
strigazi	yes	13:15
strigazi	argee	13:15
strigazi	agree	13:16
brtknr	ttsiouts: e.g. after the cluster is created, --merge-label flag cannot be modified you mean right?	13:16
brtknr	via the API	13:16
ttsiouts	brtknr: yes	13:16
ttsiouts	cool	13:17
ttsiouts	sorry for going step by step but I want this to go forward as soon as possible	13:18
ttsiouts	:)	13:18
strigazi	we are picky, so this ^^ is the only way	13:18
ttsiouts	:)	13:19
brtknr	ttsiouts: any more resolutions to pass ? :)	13:20
ttsiouts	should we also agree on the flag and the field name?	13:20
ttsiouts	--merge-labels and a boolean field in DB called merge_labels?	13:21
brtknr	i am happy with merge-labels but open to other suggestions	13:22
brtknr	other ideas: --combine-labels, --smash-labels, --update-labels, --inherit-labels, --override-labels	13:23
strigazi	override is probably what we want	13:23
strigazi	in OOP you override a method	13:23
brtknr	second thing to agree on is whether the current behaviour is override-labels=True or False	13:25
strigazi	animal.get_features() and dog.get_features()	13:25
ttsiouts	override though means not using what's inherited right?	13:25
ttsiouts	brtknr: yes	13:25
ttsiouts	if we go with override then false should mean merge right?	13:26
brtknr	I think the current behaviour is --override-labels=True	13:26
ttsiouts	brtknr: exactly	13:26
brtknr	Since nothing is inherited	13:27
strigazi	actually, the current behaviour is both. Because:	13:27
cosmicsound	brtknr , will upload logs	13:28
brtknr	It would be good not to have to specify --override-labels=False as an opt-in flag	13:28
strigazi	in the API we do https://github.com/openstack/magnum/blob/master/magnum/api/controllers/v1/cluster.py#L475	13:28
brtknr	would prefer to supply "--opt-in-flag" only if True	13:28
strigazi	as brtknr wants	13:29
brtknr	strigazi: I see your point	13:30
ttsiouts	strigazi: indeed	13:30
cosmicsound	brtknr , https://seashells.io/v/VzqW9UYW	13:31
brtknr	--override_labels = False if cluster.label == wtypes.Unset else True	13:31
strigazi	this boolean is strange because, in the new API version we want override always True and in the old API always False.	13:32
cosmicsound	Will update session as it changes i test newer versions	13:32
brtknr	hmm this is more of a rabbit hole than I realised :)	13:33
strigazi	Another option (a bad one) is the default API is not the actual latest which means (use always only the cluster labels), and the new API uses always both (override = True)	13:35
strigazi	The problem is that in the cli we always ask the latest API version.	13:35
strigazi	override=true == (get CT labels and C labels) && (get CT labels and C labels and NG labels)	13:38
strigazi	override=False == (get C labels) && (get NG labels)	13:38
strigazi	for POST cluster and POST NG respectively	13:38
strigazi	default logic override=true	13:39
ttsiouts	IMHO the True boolean option should reflect the new functionality.	13:39
strigazi	+10 ^^	13:40
strigazi	The issue to address is: The old client will send Unset for override and latest for the API microversion.	13:41
strigazi	To solve this: The API can have default logic override=false	13:42
strigazi	and the new client send true by default	13:42
strigazi	for UX experience	13:42
brtknr	or override=True if labels is defined else False?	13:43
brtknr	is override is Unset	13:43
strigazi	(UX includes experience) :)	13:43
brtknr	if override is Unset	13:43
strigazi	-1 to that	13:43
strigazi	The new API should not check if Unset	13:44
strigazi	the old API microversion will do what you just mentioned	13:44
strigazi	because it is not supposed to know about override	13:44
brtknr	strigazi: ok IDK the fine details of how API microversion works atm	13:45
cosmicsound	cloud_provider_tag=v1.15.0 should work for k8s v1.17.4 ?	13:45
strigazi	yes ^^	13:45
brtknr	cosmicsound: yes thats what I use	13:45
guilhermesp	it is the default right for v1.17 right?	13:46
guilhermesp	cloud_provider_tag=v1.15.0.	13:46
strigazi	brtknr: ttsiouts: let's break backwards compatibility?	13:47
strigazi	brtknr: ttsiouts: let's break the API	13:47
brtknr	strigazi: hmm?	13:47
strigazi	we document and users open many tickets, at CERN they open many anyway :)	13:47
brtknr	strigazi: not sure if you are being serious :)	13:49
strigazi	actually desperate	13:49
strigazi	:)	13:49
cosmicsound	Error: Unable to update cluster. when trying to resize cluster	13:49
cosmicsound	isnt this supposed to work?	13:49
cosmicsound	making from 1 node 2 3 nodes	13:50
ttsiouts	strigazi, brtknr: let's think about this. this property is immutable. meaning that it is false for all the existing clusters	13:50
ttsiouts	we need to describe this with one word.	13:51
brtknr	the way I see it, it only makes sense to evaluate this flag if labels is not empty at cluster scope or nodegroup scope	13:52
ttsiouts	brtknr: +1	13:52
brtknr	if labels is empty and this flag is True, the API should return an error	13:52
brtknr	this should be backward compatible too	13:52
ttsiouts	I agree	13:52
brtknr	strigazi: ^	13:53
strigazi	thinking	13:55
brtknr	-\|/-\|/-	13:55
strigazi	And the default is True?	13:56
strigazi	both API and cli	13:57
strigazi	?	13:57
brtknr	i think the "opt-in" flag should signify the merge action	13:57
brtknr	I am starting to actually prefer the sound of combine	13:58
ttsiouts	the default shouldn't be true	13:58
brtknr	--combine-labels	13:58
brtknr	so it should be False	13:58
brtknr	^	13:58
*** dave-mccowan has joined #openstack-containers		13:59
ttsiouts	In the code it will be a dict.update	14:01
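The dict.update remark can be pictured with a minimal sketch. It assumes labels are plain string dictionaries and that each more specific scope (cluster template, then cluster, then nodegroup) is applied on top of the previous one. The helper name layer_labels and the label values are hypothetical and are not taken from the Magnum code base.

```python
def layer_labels(*layers):
    """Merge label dicts left to right; later layers win on key clashes."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Illustrative values only; the label names appear earlier in this log.
template_labels = {"kube_tag": "v1.16.0", "heat_container_agent_tag": "ussuri-dev"}
cluster_labels = {"kube_tag": "v1.17.4"}
nodegroup_labels = {"cloud_provider_tag": "v1.15.0"}

# Cluster scope: template labels with the cluster's own labels on top.
print(layer_labels(template_labels, cluster_labels))

# Nodegroup scope: template, then cluster, then nodegroup labels.
print(layer_labels(template_labels, cluster_labels, nodegroup_labels))
```

Building a fresh dictionary instead of updating the template's dictionary in place keeps the labels supplied at each scope untouched, which matters if the originally provided labels have to be tracked separately.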
strigazi	brtknr: So the new improved logic won't be available by default to users, correct?	14:01
ttsiouts	should we go with update?	14:01
ttsiouts	update-labels	14:01
brtknr	strigazi: not by default	14:02
brtknr	users will need to work for it	14:02
brtknr	earn their keep	14:02
strigazi	Is this what we want?	14:02
brtknr	strigazi: doesnt make sense to break current default behaviour suddenly does it?	14:03
strigazi	I think that what currently is described in the spec is simpler (one param less and more verbosity) for user and more code for us, thoughts?	14:03
strigazi	doesnt make sense to break current default behaviour suddenly does it? it doesn't	14:04
brtknr	i disagree that it is one less param, since we are adding a new field	14:04
brtknr	its the same number of params	14:05
brtknr	ttsiouts: i am also fine with update-labels	14:06
brtknr	ttsiouts: i am also fine with update-labels but it is less obvious	14:07
strigazi	brtknr: true, same number of params.	14:07
brtknr	in fact i would argue that the purpose of dict.update is not entirely clear to a new user	14:07
strigazi	So, to have the same functionality with the SPEC (and not change default behavior). override-labels false by default. And we are covered, correct?	14:10
brtknr	the purpose of this flag is to avoid mutually exclusive population of labels/override_labels(from the current spec)	14:10
*** dave-mccowan has quit IRC		14:10
brtknr	if override-labels==combine-labels, correct :)	14:11
strigazi	exactly	14:11
brtknr	if override-labels==combine-labels==update-labels, correct :)	14:11
strigazi	*-labels	14:11
strigazi	doesn't matter	14:12
strigazi	the new name	14:12
strigazi	whatever it is	14:12
brtknr	we could do a survey on ML?	14:14
brtknr	or send a link to a survey	14:14
strigazi	let's conclude in the rest and leave the name out	14:14
brtknr	ok	14:14
strigazi	So it will be a new bool	14:15
ttsiouts	yes	14:16
brtknr	+1	14:16
strigazi	it will be false by default	14:16
ttsiouts	yes	14:17
brtknr	false by default if labels is defined	14:17
ttsiouts	we need a default value for the flag in the DB	14:17
ttsiouts	and it should be false no matter what for existing cluster	14:18
ttsiouts	the default should be false and it will be evaluated only if labels are provided.	14:19
ttsiouts	does it make sense?	14:19
brtknr	ttsiouts: yes that makes sense to me	14:19
strigazi	brtknr: > false by default if labels is defined | what about when labels == unset?	14:19
strigazi	brtknr: false as well, no?	14:20
brtknr	what ttsiouts said	14:20
brtknr	strigazi: yes	14:20
brtknr	lets keep it simple and leave it as false	14:20
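The rules agreed up to this point can be summarised in a rough sketch. It is not the spec or the Magnum implementation. The assumptions are: the new boolean (whose name is still undecided here, so it is called merge below) defaults to False; it only matters when labels are provided; asking for it without labels is an error; and with the flag off the current behaviour holds, where provided labels replace the template's and unset labels fall back to the template's. UNSET and resolve_cluster_labels are hypothetical names.

```python
UNSET = object()  # hypothetical stand-in for "field not supplied in the request"


def resolve_cluster_labels(template_labels, cluster_labels=UNSET, merge=False):
    """Return the effective labels for a new cluster under the rules above."""
    provided = cluster_labels is not UNSET and len(cluster_labels) > 0

    # The flag is only meaningful together with labels.
    if merge and not provided:
        raise ValueError("merge requested but no labels were provided")

    # No labels in the request: inherit the template's labels.
    if cluster_labels is UNSET:
        return dict(template_labels)

    # Flag off (the default): the provided labels replace the template's.
    if not merge:
        return dict(cluster_labels)

    # Flag on: template labels with the provided labels applied on top.
    resolved = dict(template_labels)
    resolved.update(cluster_labels)
    return resolved
```

Existing clusters would all behave as if merge were False, which matches the migration default discussed here.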
strigazi	solved?	14:22
brtknr	ttsiouts: ?	14:22
brtknr	strigazi: do you guys use https for keystone at CERN?	14:23
strigazi	yes	14:23
brtknr	how did you make this work for fedora coreos?	14:23
strigazi	for everything user facing	14:23
ttsiouts	I'm just thinking about the migration for the existing clusters. we should have something simple and not checking if things are set or not	14:23
strigazi	I think false for all existing clusters is fine. I don't see why we need to distinguish it	14:24
brtknr	strigazi: a user reported this yesterday: Apr 07 04:02:01 d-pa2hu2ehhcwn-master-0 podman[2332]: Authorization failed: SSL exception connecting to	14:24
brtknr	https://cloud.uhlhost.net:5000/v3/auth/tokens: HTTPSConnectionPool(host='cloud.uhlhost.net',	14:25
brtknr	port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(136, '[X509] no	14:25
brtknr	certificate or crl found (_ssl.c:4232)')))	14:25
strigazi	ttsiouts: I think false for all existing clusters is fine. I don't see why we need to distinguish it	14:25
strigazi	brtknr: openstack_ca_file	14:25
strigazi	brtknr: in magnum.conf	14:25
ttsiouts	strigazi: cool for me	14:25
brtknr	strigazi: ok cool thanks	14:26
brtknr	ttsiouts: look forward to the updated spec	14:26
brtknr	cosmicsound: ^^	14:26
strigazi	ttsiouts: brtknr: I hope flwang1 doesn't have another idea	14:26
brtknr	strigazi: me too :P	14:27
ttsiouts	haha	14:27
ttsiouts	strigazi, brtknr: thanks guys! I'll update the spec	14:27
brtknr	.X,	14:27
brtknr	^that is a crossed finger	14:27
strigazi	brtknr: ttsiouts: name:	14:29
strigazi	brtknr: ttsiouts: https://helm.sh/docs/helm/helm_install/ For example, if both myvalues.yaml and override.yaml contained a key called ‘Test’, the value set in override.yaml would take precedence:	14:29
ttsiouts	we go with override? It's ok for me	14:30
strigazi	brtknr: ^^	14:30
brtknr	strigazi: im okay with override	14:30
brtknr	yankcrime: ^^ please read the bit about openstack_ca_file	14:31
ttsiouts	it's the name of the spec and my jira ticket :P	14:31
brtknr	i am getting used to it	14:32
strigazi	ttsiouts: override?	14:32
ttsiouts	strigazi: yes	14:32
strigazi	brtknr: yankcrime: https://docs.openstack.org/magnum/latest/configuration/sample-config.html [drivers] openstack_ca_file Path to the OpenStack CA-bundle file to pass and install in all cluster nodes.	14:33
*** ricolin has joined #openstack-containers		14:39
*** ttsiouts has quit IRC		14:42
yankcrime	brtknr: 👀	14:45
yankcrime	oh is this because fedora coreos doesn't ship a cacert bundle for public / commercial CAs?	14:45
strigazi	yankcrime: no, because pyhton	14:46
*** ttsiouts has joined #openstack-containers		14:46
strigazi	yankcrime: no, because python	14:46
yankcrime	:(	14:47
strigazi	yankcrime: it should work, you have an ok cert	14:48
strigazi	from Sectigo	14:48
*** ttsiouts_ has joined #openstack-containers		14:49
*** ttsiouts has quit IRC		14:49
yankcrime	strigazi: it's a letsencrypt issued cert and we still see that error that brtknr described	14:50
strigazi	yankcrime: not sure why you see it: podman run -it --rm --entrypoint /usr/bin/python docker.io/openstackmagnum/heat-container-agent:train-stable-1 -c "import requests ; print(requests.get('https://cloud.uhlhost.net:5000/v3'))"	14:53
strigazi	<Response [200]>	14:53
strigazi	seems to work	14:53
strigazi	yankcrime: this is how the agent runs: https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_coreos_v1/templates/fcct-config.yaml#L194	14:59
brtknr	strigazi: yankcrime has 9.2.0 release	14:59
brtknr	strigazi: is this patch relevant: https://review.opendev.org/#/c/709777/	15:00
brtknr	this patch is only available in 9.3.0 release	15:01
strigazi	maybe yes, if /etc/pki/ca-trust/source/anchors/openstack-ca.pem has something bad inside	15:01
*** ttsiouts_ has quit IRC		15:02
strigazi	brtknr: if this file doesn't exist the error is: OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/pki/ca-trust/source/anchors/openstack-ca.pem	15:03
strigazi	test with: podman run -it --rm --entrypoint /usr/bin/python3 --env REQUESTS_CA_BUNDLE=/etc/pki/ca-trust/source/anchors/openstack-ca.pem docker.io/openstackmagnum/heat-container-agent:train-stable-1 -c "import requests ; print(requests.get('https://cloud.uhlhost.net:5000/v3'))"	15:03
strigazi	yankcrime: brtknr: ^^	15:04
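To tell a missing bundle apart from one that exists but holds no usable certificate, a small standard-library check can be run on the node. This is only a sketch and not what heat-container-agent itself does: check_ca_bundle is a hypothetical helper, and the path is the one quoted above. The assumption is that loading a file without any PEM certificate makes Python's ssl module raise an SSLError of the "no certificate or crl found" kind reported earlier.

```python
import os
import ssl


def check_ca_bundle(path):
    """Classify a CA bundle file as 'missing', 'invalid' or 'ok'."""
    if not os.path.isfile(path):
        return "missing"
    try:
        # Loads the file as trusted CAs; fails if no certificate can be parsed.
        ssl.create_default_context(cafile=path)
    except ssl.SSLError:
        return "invalid"
    return "ok"


if __name__ == "__main__":
    bundle = "/etc/pki/ca-trust/source/anchors/openstack-ca.pem"
    print(bundle, "->", check_ca_bundle(bundle))
```

An "invalid" result would point at the file contents, while "missing" corresponds to the OSError case described above.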
brtknr	strigazi: you're right	15:06
born2bake	brtknr http://paste.openstack.org/show/791740/ - calico, coreos ; same on flannel. its with having loadbalancers added.	15:08
born2bake	so it failed but load balancers show online http://prntscr.com/rut6b5 and they can ping machines	15:09
brtknr	strigazi: not sure what other explanation there is	15:11
brtknr	i will try running these on yankcrime's compute.sausage.cloud	15:12
*** rcernin has quit IRC		15:13
brtknr	born2bake: please run 9.3.0, there is a patch for TimeoutRestartSec	15:17
born2bake	do I need to add label tag or something when I do 9.3.0?	15:18
brtknr	No label required	15:19
brtknr	TimeoutRestartSec default value is 90 seconds, we have increased this to 600	15:19
*** ttsiouts has joined #openstack-containers		15:27
brtknr	born2bake: in 9.3.0 release	15:30
born2bake	it would take some time cause I ve no idea how to create custom magnum containers in kolla so I can have the latest version :)	15:31
brtknr	born2bake: you dont need to build it, the image should be usable as train tag: https://hub.docker.com/r/kolla/centos-binary-magnum-conductor/tags	15:34
brtknr	although i think the CI is broken	15:35
born2bake	a632c4d94216 kolla/ubuntu-source-magnum-conductor:train "dumb-init --single-…" 3 days ago Up 3 days magnum_conductor	15:35
born2bake	1a200061c45b kolla/ubuntu-source-magnum-api:train "dumb-init --single-…" 3 days ago Up 3 days magnum_api	15:35
born2bake	i have ubuntu-source-train	15:35
brtknr	no wait it finally merged: https://review.opendev.org/#/c/716339/	15:36
born2bake	and then just run reconfigure?	15:36
brtknr	you might have to wait till tomorrow because i think they build the image every 24 hours	15:36
brtknr	strigazi: if run that command you shared as sudo, with --privileged flag, i can reproduce the problem	15:56
brtknr	strigazi: e.g sudo podman run -it --name heat-container-agent-dupe --privileged --volume /etc/:/etc/ --env REQUESTS_CA_BUNDLE=/etc/pki/ca-trust/source/anchors/openstack-ca.pem --net=host --rm docker.io/openstackmagnum/heat-container-agent:ussuri-dev python3 -c "import requests ; print(requests.get('https://compute.sausage.cloud:5000/v3'))"	15:59
brtknr	but with the REQUESTS_CA_BUNDLE patch, no issues	15:59
brtknr	yankcrime: you need this patch in conclusion https://review.opendev.org/#/c/704739/2/magnum/drivers/k8s_fedora_coreos_v1/templates/user_data.json	16:00
brtknr	born2bake: yes reconfigure but as i mentioned in the openstack-kolla channel, i dont think the images have been built yet, according to dockerhub the last train image was built 13 days ago	16:04
brtknr	ah sorry you are using ubuntu-source	16:05
brtknr	its possible 9.3.0 is available in there then	16:06
brtknr	one caveat is that we forgot to merge zincati auto-update disable patch	16:06
brtknr	you might therefore be better off using master branch for magnum	16:06
brtknr	you might therefore be better off using master tag for magnum	16:07
brtknr	the side-effect of zincati is that for fedora coreos, heat-container-agent restarts	16:07
*** udesale_ has quit IRC		16:08
*** ykarel is now known as ykarel|away		16:10
born2bake	brtknr ok I will try binary containers then. Also, as I mentioned previously, just created flannel 1 master 1 node cluster....run kubectl scale deployment test-autoscale --replicas=100 - http://paste.openstack.org/show/791752/ (autoscaler, autohealer, cloud manager crashing, node is created in stack though but not added)	16:12
brtknr	born2bake: not sure why, they work for me	16:14
brtknr	born2bake: ubuntu-source may have the correct version	16:15
brtknr	as it was built 9 hours ago	16:15
born2bake	how do I check magnum version in container?	16:15
born2bake	[magnum@sova magnum-base-source]$ ls	16:19
born2bake	magnum-9.2.0	16:19
born2bake	[heat@sova heat-base-source]$ ls openstack-heat-13.0.0	16:19
born2bake	I will try to use binary master	16:22
born2bake	brtknr which one would you suggest? centos/ubuntu-binary/source-master?	16:22
yankcrime	brtknr: ok will get it applied	16:27
yankcrime	tomorrow at this rate	16:27
*** ttsiouts has quit IRC		16:40
brtknr	born2bake: ubuntu-source master should also be fine	16:48
cosmicsound	born2bake , use virtio instead of scsi if case	16:50
cosmicsound	it helped me on my failed scripts	16:51
cosmicsound	use heat_tag: master magnum_tag: master and reconfigure	16:51
born2bake	as I said, when I use virtio, image doesnt have enough entropy /dev/random and cant generate ssh keys fast. so it takes around 20 minutes for machine to boot :)	16:51
cosmicsound	make sure disks are on virtio	16:51
cosmicsound	hmm	16:51
cosmicsound	did you added the other one i mentioned?	16:52
born2bake	therefore, I am sticking to fedora-coreos images cause they are fine and newer	16:52
born2bake	yes, I ve tried all :)	16:52
cosmicsound	i too work now on coreos	16:52
cosmicsound	and works good for me	16:52
born2bake	have you tried autoscaler?	16:52
born2bake	its crashing for me for some reason	16:53
*** ttsiouts has joined #openstack-containers		17:13
*** ttsiouts has quit IRC		17:18
cosmicsound	born2bake , i tried it	17:22
cosmicsound	it made me scared when it lowered my servers	17:22
cosmicsound	:D	17:22
cosmicsound	I did not try to upscale it yet, it was downscaling itself	17:22
born2bake	none-k8s servers? :)	17:22
cosmicsound	all :D	17:30
cosmicsound	born2bake , used sonobuoy?	17:31
cosmicsound	anyone know how i start it?	17:31
*** k_mouza has quit IRC		17:31
*** ttsiouts has joined #openstack-containers		17:31
born2bake	cosmicsound what version do you have? kolla/ubuntu-source-magnum-conductor:master - [magnum@sova magnum-base-source]$ ls - magnum-9.1.0.dev212	17:36
born2bake	I set master, and it's even lower than what I had	17:37
*** ttsiouts has quit IRC		17:46
*** vishalmanchanda has quit IRC		17:47
cosmicsound	yes born2bake	17:57
cosmicsound	the one with 9.1.0 was working	17:57
cosmicsound	also need the heat master	17:57
cosmicsound	i do not tag versions only master or train .	17:57
cosmicsound	numerical tags do not work	17:57
cosmicsound	If I want to edit just an extra label	17:58
cosmicsound	I need to recreate the cluster?	17:58
born2bake	both flannel and calico failed for me with master tag :/	18:01
born2bake	http://paste.openstack.org/show/791756/	18:01
*** ricolin has quit IRC		18:02
*** k_mouza has joined #openstack-containers		18:13
*** k_mouza has quit IRC		18:14
*** ttsiouts has joined #openstack-containers		18:25
*** ttsiouts has quit IRC		18:30
born2bake	http://paste.openstack.org/show/791757/ - flannel, with lb, 2 masters	18:31
born2bake	calico failing	18:31
brtknr	Use etcd_tag=v3.4.6	18:33
brtknr	With coreos or atomic?	18:33
brtknr	born2bake:	18:33
born2bake	coreos	18:34
brtknr	are you using the terraform script?	18:34
born2bake	http://paste.openstack.org/show/791756/ - calico	18:34
born2bake	yes terraform from github	18:34
brtknr	That is a partial log that doesn’t tell me a lot	19:01
brtknr	born2bake:	19:02
brtknr	born2bake: it doesn’t say why it failed	19:02
brtknr	born2bake: at the end of the log, it says etcd server request timed out	19:04
brtknr	check that etcd is running	19:04
brtknr	born2bake: When copying the logs, use the seashells method I described earlier	19:05
brtknr	it will capture the full log	19:06
born2bake	ssh core@172.24.4.253 sudo cat /var/log/heat-config/heat-config-script/* | nc seashells.io 1337 ?	19:06
born2bake	Okay I will	19:06
brtknr	born2bake: Yes	19:06
born2bake	I will change etcd tag and do clusters again	19:06
brtknr	but it looks like you are using an incompatible etcd version	19:06
brtknr	what is the current version on the terraform script?	19:07
born2bake	flannel http://paste.openstack.org/show/791757/ looks like something with octavia	19:07
brtknr	in master, you now need a v before the tag	19:07
born2bake	branch is up-to-date with 'origin/master'	19:07
born2bake	in vars.tf right?	19:08
brtknr	I saw those, it’s not much use because the log is incomplete	19:08
brtknr	but I saw etcd timing out at the end	19:09
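The advice above is that on master the etcd tag needs a leading "v" (etcd_tag=v3.4.6). A minimal Python sketch of normalising a tag before passing it on as a label is shown below. The function `normalize_etcd_tag` is a hypothetical helper for illustration, not something provided by magnum or the terraform script.

```python
def normalize_etcd_tag(tag):
    """Return the etcd tag with a leading 'v', as master expects per the log."""
    tag = tag.strip()
    if not tag:
        raise ValueError("empty etcd tag")
    # Leave tags that already carry the prefix untouched.
    return tag if tag.startswith("v") else "v" + tag


if __name__ == "__main__":
    print(normalize_etcd_tag("3.4.6"))
    print(normalize_etcd_tag("v3.4.6"))
```

Whether a given release branch expects the prefix is an assumption to verify against the branch in use; the log only states it for master.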
*** ttsiouts has joined #openstack-containers		19:14
born2bake	brtknr flannel, coreos - masters finished successfully https://seashells.io/v/QJcExtc8 ; let me see worker	19:21
born2bake	on the worker node there are not even heat-config logs	19:23
born2bake	now there is. worker node - https://seashells.io/v/nQvQRffK	19:23
born2bake	kubectl get node doesn't work on master either	19:26
brtknr	Can you use tail instead of cat	19:28
brtknr	looks like the container agent is still running on the master	19:29
cosmicsound	born2bake , what hw labels do you have on the image?	19:34
cosmicsound	for libvirt	19:34
born2bake	tail on master - https://seashells.io/v/uASF984Q	19:37
born2bake	cosmicsound all ceph rbd related... can't find it now :)	19:37
brtknr	born2bake: can you try with your lb disabled?	19:47
*** ttsiouts has quit IRC		19:48
brtknr	does that work?	19:48
brtknr	looks like your lb for octavia is not reachable	19:48
born2bake	brtknr flannel: masters tail - https://seashells.io/v/uASF984Q ; workers - https://seashells.io/v/nQvQRffK ; calico: master - https://seashells.io/v/xN4gyXTD	19:48
brtknr	for etcd	19:48
brtknr	are you sure octavia is configured correctly?	19:49
born2bake	as I said, flannel with 1 master and 1 node it works	19:49
born2bake	can't be sure :/	19:50
born2bake	https://ssup2.github.io/record/OpenStack_Stein_%EC%84%A4%EC%B9%98_Kolla-Ansible_Ubuntu_18.04_ODROID-H2_Cluster/ followed that guide for octavia	19:50
brtknr	you can test octavia ingress controller	19:50
born2bake	created certs, added route to docker hosts "route add -net 20.0.0.0/24 gw 192.168.0.225", then when I create lb's they are fine	19:51
brtknr	born2bake: Setting up octavia is complicated, if it works with single master, sounds like problem with your octavia config	19:54
born2bake	calico without lb still didn't work but I think I will focus on flannel just now...and see what's wrong with octavia	19:54
born2bake	even though autoscaler/cloud manager are still crashing for me :(	19:55
brtknr	you can try curling etcd port on the load balancer	19:55
born2bake	curl --insecure https://10.0.15.169:2379 curl: (35) error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate	19:57
born2bake	I noticed my load balancer does not have floating ip assigned	19:57
born2bake	Status: TCP 2379 Online Active Yes	19:57
born2bake	however, I do have master_lb_floating_ip_enabled = "true" set	19:58
born2bake	I think the case might be that my octavia doesn't support tls/ssl	19:59
brtknr	What if you use http instead of https	20:03
born2bake	curl: (52) Empty reply from server	20:03
born2bake	thing is it does not create floating ip - lb for etcd. only for 6443 api	20:04
brtknr	Can you curl the k8s api?	20:09
brtknr	born2bake: Anyway have fun investigating, I’m going to bed, I strongly suspect your lb config	20:10
born2bake	okay, yeah, I would need to do some testing on octavia	20:11
born2bake	thanks a lot!	20:11
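The load balancer checks discussed above (curling the etcd port 2379 and the k8s API port 6443) can be preceded by a plain TCP reachability probe. The Python sketch below is one way to do that, using only the standard library. The helper name `tcp_port_open` and the 3 second default timeout are assumptions for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_port_open("10.0.15.169", 2379)` would probe the etcd listener on the load balancer address pasted earlier. A True result only shows that something accepts TCP connections on that port; it says nothing about TLS or client certificates, which is where the curl attempts in the log failed.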
flwang1	brtknr: ping, are you there?	20:28
*** born2bake has quit IRC		21:36
*** ttsiouts has joined #openstack-containers		21:44
*** rcernin has joined #openstack-containers		22:11
*** ttsiouts has quit IRC		22:18
