*** ttsiouts has joined #openstack-containers | 00:07 | |
*** sdake has joined #openstack-containers | 00:08 | |
*** itlinux has joined #openstack-containers | 00:09 | |
*** sdake has quit IRC | 00:12 | |
*** itlinux_ has joined #openstack-containers | 00:13 | |
*** sdake has joined #openstack-containers | 00:14 | |
*** itlinux has quit IRC | 00:15 | |
*** ttsiouts has quit IRC | 00:29 | |
*** sdake has quit IRC | 00:41 | |
*** sapd1 has quit IRC | 00:42 | |
*** PagliaccisCloud has quit IRC | 00:50 | |
*** PagliaccisCloud has joined #openstack-containers | 00:52 | |
*** openstackgerrit has joined #openstack-containers | 00:57 | |
openstackgerrit | Jake Yip proposed openstack/magnum master: Update min tox version to 2.0 https://review.openstack.org/616412 | 00:57 |
*** ricolin has joined #openstack-containers | 01:03 | |
*** sdake has joined #openstack-containers | 01:08 | |
*** sapd1 has joined #openstack-containers | 01:52 | |
*** sdake has quit IRC | 02:05 | |
*** hongbin has joined #openstack-containers | 02:13 | |
*** sdake has joined #openstack-containers | 02:28 | |
*** sapd1 has quit IRC | 02:37 | |
*** itlinux_ has quit IRC | 03:06 | |
*** itlinux has joined #openstack-containers | 03:12 | |
*** itlinux has quit IRC | 03:21 | |
*** itlinux has joined #openstack-containers | 03:25 | |
*** sdake has quit IRC | 03:51 | |
*** ramishra has joined #openstack-containers | 04:09 | |
*** udesale has joined #openstack-containers | 04:18 | |
*** ykarel|away has joined #openstack-containers | 04:31 | |
*** ykarel|away is now known as ykarel | 04:31 | |
*** janki has joined #openstack-containers | 05:08 | |
*** hongbin has quit IRC | 05:09 | |
*** udesale has quit IRC | 05:22 | |
*** jhesketh has quit IRC | 05:47 | |
*** jhesketh has joined #openstack-containers | 05:48 | |
*** sdake has joined #openstack-containers | 05:48 | |
*** sdake has quit IRC | 05:50 | |
*** pcaruana has joined #openstack-containers | 05:52 | |
*** sdake has joined #openstack-containers | 05:58 | |
*** pcaruana has quit IRC | 06:07 | |
*** dims has quit IRC | 06:24 | |
*** dims has joined #openstack-containers | 06:26 | |
*** itlinux has quit IRC | 06:34 | |
*** dims has quit IRC | 06:36 | |
*** dims has joined #openstack-containers | 06:37 | |
*** mkuf has quit IRC | 06:48 | |
*** mkuf has joined #openstack-containers | 07:01 | |
*** udesale has joined #openstack-containers | 07:51 | |
*** udesale has quit IRC | 08:01 | |
*** flwang1 has joined #openstack-containers | 08:14 | |
*** sapd1 has joined #openstack-containers | 08:16 | |
*** pcaruana has joined #openstack-containers | 08:18 | |
*** pcaruana has quit IRC | 08:25 | |
*** yolanda has joined #openstack-containers | 08:25 | |
flwang1 | strigazi: around? | 08:33 |
*** sdake has quit IRC | 08:36 | |
*** pcaruana has joined #openstack-containers | 08:37 | |
flwang1 | strigazi: do you have time for a catch up? | 08:40 |
*** ykarel is now known as ykarel|lunch | 08:41 | |
*** pcaruana has quit IRC | 08:44 | |
*** ttsiouts has joined #openstack-containers | 08:52 | |
*** alisanhaji has joined #openstack-containers | 09:00 | |
*** pcaruana has joined #openstack-containers | 09:01 | |
*** ttsiouts has quit IRC | 09:05 | |
*** ttsiouts has joined #openstack-containers | 09:06 | |
*** ttsiouts has quit IRC | 09:10 | |
*** ttsiouts has joined #openstack-containers | 09:12 | |
*** ign0tus has joined #openstack-containers | 09:12 | |
*** alisanhaji has quit IRC | 09:30 | |
*** alisanhaji has joined #openstack-containers | 09:32 | |
*** ykarel|lunch is now known as ykarel | 09:39 | |
*** sdake has joined #openstack-containers | 09:55 | |
*** sdake has quit IRC | 10:51 | |
*** sdake has joined #openstack-containers | 10:55 | |
*** ttsiouts has quit IRC | 11:19 | |
*** ttsiouts has joined #openstack-containers | 11:20 | |
*** janki has quit IRC | 11:22 | |
*** ttsiouts has quit IRC | 11:24 | |
*** mkuf has quit IRC | 11:27 | |
*** mkuf has joined #openstack-containers | 11:28 | |
*** sapd1 has quit IRC | 11:29 | |
*** udesale has joined #openstack-containers | 12:00 | |
*** ttsiouts has joined #openstack-containers | 12:01 | |
*** dave-mccowan has joined #openstack-containers | 12:19 | |
*** sdake has quit IRC | 12:39 | |
*** janki has joined #openstack-containers | 13:17 | |
*** sdake has joined #openstack-containers | 13:19 | |
*** ivve has joined #openstack-containers | 13:38 | |
*** andrein has joined #openstack-containers | 13:41 | |
*** sapd1 has joined #openstack-containers | 13:43 | |
andrein | Hi guys, I'm trying to configure magnum on openstack rocky. I can launch the cluster, but the heat stack fails after creating the masters. I've logged in to the masters and every one of them is hanging when starting etcd because it can't find the certificates. I've noticed the make-certs.sh job failed on all of them because they're trying to hit the keystone API over the internal endpoint. How can I change this? | 13:47 |
*** sapd1 has quit IRC | 13:47 | |
*** sdake has quit IRC | 14:11 | |
*** janki has quit IRC | 14:12 | |
*** janki has joined #openstack-containers | 14:12 | |
*** ykarel is now known as ykarel|away | 14:14 | |
*** ykarel|away has quit IRC | 14:18 | |
*** sapd1 has joined #openstack-containers | 14:18 | |
*** ykarel|away has joined #openstack-containers | 14:19 | |
*** ttsiouts has quit IRC | 14:21 | |
*** sdake has joined #openstack-containers | 14:22 | |
*** ttsiouts has joined #openstack-containers | 14:22 | |
*** sdake has quit IRC | 14:23 | |
DimGR | strigazi hii :) | 14:23 |
*** ttsiouts has quit IRC | 14:25 | |
*** ttsiouts has joined #openstack-containers | 14:25 | |
*** sdake has joined #openstack-containers | 14:33 | |
brtknr | andrein: how was your openstack deployed? | 14:37 |
andrein | brtknr, I deployed it using kolla-ansible | 14:47 |
*** hongbin has joined #openstack-containers | 14:47 | |
brtknr | Which version of kolla-ansible? | 14:47 |
andrein | Version 7.0.1 | 14:48 |
*** sdake has quit IRC | 14:48 | |
*** sdake has joined #openstack-containers | 14:50 | |
brtknr | Hmm, can you check your heat-container-agent log in master? | 14:50 |
brtknr | also check /var/log/cloud-init.log | 14:51 |
brtknr | and /var/log/cloud-init-output.log | 14:51 |
brtknr | and grep -i for fail | 14:51 |
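The checks brtknr lists above amount to roughly the following on a master node. This is a minimal sketch; the exact unit names and log paths depend on the image, and make-cert.service is the unit name that appears later in this log:

    # inspect the heat agent and cloud-init output for failures (unit name and paths assumed)
    sudo journalctl -u heat-container-agent --no-pager | grep -i fail
    sudo grep -i fail /var/log/cloud-init.log /var/log/cloud-init-output.log
    sudo systemctl status make-cert.service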
andrein | On the Kubernetes master, right? | 14:51 |
* andrein is spawning another cluster | 14:53 | |
*** pcaruana has quit IRC | 14:57 | |
*** munimeha1 has joined #openstack-containers | 14:58 | |
andrein | brtknr, cloud init log shows make-cert.service failing | 14:59 |
andrein | That's the only error I see in cloud-init logs. I'm using coreos as a base image for this cluster. | 15:04 |
*** sdake has quit IRC | 15:04 | |
andrein | From what I notice in /etc/sysconfig/heat-params, MAGNUM_URL is set to the public endpoint, but AUTH_URL is private. | 15:04 |
andrein | Make-certs.sh is trying to hit the private auth endpoint and times out after a while, that causes etcd to fail etc. | 15:05 |
*** sdake has joined #openstack-containers | 15:07 | |
openstackgerrit | jacky06 proposed openstack/magnum-tempest-plugin master: Update json module to jsonutils https://review.openstack.org/638968 | 15:07 |
andrein | Hmmm, wait a second, in horizon under admin/system information I do have the wrong URL for the public endpoint. Seems something went south in kolla-ansible | 15:08 |
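A hedged sketch of checking, and if needed correcting, a wrong identity public endpoint with the openstack CLI. The endpoint ID and URL below are placeholders; in a kolla-ansible deployment the endpoints are normally managed by a reconfigure run rather than edited by hand:

    openstack endpoint list --service identity
    # point the public interface at the correct URL (placeholder values)
    openstack endpoint set --url https://public.example.com:5000 <endpoint-id>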
*** sapd1 has quit IRC | 15:25 | |
*** openstackgerrit has quit IRC | 15:28 | |
*** alisanhaji has quit IRC | 15:34 | |
*** pcaruana has joined #openstack-containers | 15:42 | |
*** alisanhaji has joined #openstack-containers | 15:44 | |
*** sdake has quit IRC | 15:49 | |
*** udesale has quit IRC | 15:50 | |
*** belmoreira has quit IRC | 16:00 | |
*** sdake has joined #openstack-containers | 16:01 | |
*** ricolin has quit IRC | 16:04 | |
*** Adri2000 has joined #openstack-containers | 16:21 | |
Adri2000 | hello | 16:21 |
Adri2000 | is there any existing discussion somewhere about using 8.8.8.8 as default dns server for magnum-created networks, instead of not specifying any dns server and therefore using neutron dns resolution? | 16:23 |
Adri2000 | at least in the k8s_fedora_atomic_v1 driver | 16:23 |
*** janki has quit IRC | 16:41 | |
*** ramishra has quit IRC | 16:41 | |
*** janki has joined #openstack-containers | 16:41 | |
*** ign0tus has quit IRC | 16:45 | |
*** ivve has quit IRC | 16:58 | |
*** andrein has quit IRC | 17:00 | |
-openstackstatus- NOTICE: Gerrit is being restarted for a configuration change, it will be briefly offline. | 17:09 | |
*** ykarel|away has quit IRC | 17:21 | |
*** itlinux has joined #openstack-containers | 17:35 | |
*** sdake has quit IRC | 17:35 | |
*** ttsiouts has quit IRC | 17:38 | |
flwang1 | Adri2000: we have seen this requirement before | 17:47 |
flwang1 | but no one working on that now | 17:47 |
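Until a default is agreed on, a DNS server can be set explicitly per cluster template. A minimal sketch; the --dns-nameserver flag is from the magnum CLI, the other values are placeholders:

    openstack coe cluster template create k8s-atomic \
        --image fedora-atomic-27 --coe kubernetes \
        --external-network public --flavor m1.small \
        --dns-nameserver 8.8.8.8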
*** itlinux has quit IRC | 18:42 | |
*** itlinux has joined #openstack-containers | 18:47 | |
*** andrein has joined #openstack-containers | 18:56 | |
brtknr | Meeting today? | 19:10 |
*** ivve has joined #openstack-containers | 19:12 | |
brtknr | andrein: if coreos is not essential, try using fedora-atomic driver | 19:12 |
brtknr | not sure many people here are testing coreos environment | 19:12 |
brtknr | although that might change soon with fedora-coreos? | 19:13 |
*** ttsiouts has joined #openstack-containers | 19:31 | |
*** sdake has joined #openstack-containers | 19:32 | |
*** dave-mccowan has quit IRC | 19:55 | |
*** itlinux has quit IRC | 19:56 | |
*** NobodyCam has joined #openstack-containers | 19:57 | |
NobodyCam | morning Magnum folks | 19:57 |
NobodyCam | anyone encountered Authorization failed. or token scope issues with OpenStack-Ansible installed magnum? | 19:58 |
*** dave-mccowan has joined #openstack-containers | 20:01 | |
*** itlinux has joined #openstack-containers | 20:08 | |
*** ttsiouts has quit IRC | 20:11 | |
*** ttsiouts has joined #openstack-containers | 20:11 | |
*** itlinux has quit IRC | 20:15 | |
*** ttsiouts has quit IRC | 20:15 | |
*** sdake has quit IRC | 20:17 | |
andrein | brtknr, I eventually got it working with CoreOS. Had to change the keystone public endpoint manually, no idea why kolla skipped reconfiguring it, the other endpoints were changed. | 20:18 |
*** sdake has joined #openstack-containers | 20:19 | |
*** ttsiouts has joined #openstack-containers | 20:32 | |
flwang1 | strigazi: do we have meeting today? | 20:36 |
flwang1 | brtknr: seems we don't have meeting today, strigazi is not online | 20:42 |
strigazi | We do have a meeting | 20:46 |
brtknr | Woot! | 20:49 |
brtknr | andrein: we have submitted kolla-ansible config to modify keystone endpoint in the past but maybe it wasnt for coreos | 20:49 |
strigazi | Dates for the next three Tuesdays https://wiki.openstack.org/wiki/Meetings/Containers | 20:50 |
colin- | hi | 20:51 |
*** andrein has quit IRC | 20:52 | |
flwang1 | strigazi: cool, good to see you | 20:53 |
*** andrein has joined #openstack-containers | 20:53 | |
*** mkuf has quit IRC | 20:57 | |
strigazi | #startmeeting containers | 21:00 |
openstack | Meeting started Tue Mar 5 21:00:05 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
*** openstack changes topic to " (Meeting topic: containers)" | 21:00 | |
openstack | The meeting name has been set to 'containers' | 21:00 |
strigazi | #topic Roll Call | 21:00 |
*** openstack changes topic to "Roll Call (Meeting topic: containers)" | 21:00 | |
strigazi | o/ | 21:00 |
schaney | o/ | 21:00 |
jakeyip | o/ | 21:00 |
brtknr | o/ | 21:01 |
strigazi | Hello schaney jakeyip brtknr | 21:02 |
strigazi | #topic Stories/Tasks | 21:02 |
*** openstack changes topic to "Stories/Tasks (Meeting topic: containers)" | 21:02 | |
*** imdigitaljim has joined #openstack-containers | 21:02 | |
imdigitaljim | o/ | 21:02 |
strigazi | I want to mention three things quickly. | 21:03 |
strigazi | CI for swarm and kubernetes is not passing | 21:03 |
colin- | hello | 21:03 |
strigazi | Hello colin- imdigitaljim | 21:03 |
strigazi | I'm finding the error | 21:04 |
strigazi | for example for k8s http://logs.openstack.org/73/639873/3/check/magnum-functional-k8s/06f3638/logs/screen-h-eng.txt.gz?level=ERROR | 21:04 |
strigazi | The error is the same for swarm | 21:04 |
strigazi | If someone wants to take a look and then comment in https://review.openstack.org/#/c/640238/ or in a fix :) | 21:06 |
strigazi | 2. | 21:06 |
strigazi | small regression I have found for the etcd_volume_size label (persistent storage for etcd) https://storyboard.openstack.org/#!/story/2005143 | 21:06 |
strigazi | this fix is obvious | 21:07 |
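For reference, the label in question is set at template (or cluster) creation; a minimal sketch with placeholder names and sizing:

    # give etcd its own persistent cinder volume (size in GB)
    openstack coe cluster template create k8s-atomic-etcdvol \
        --image fedora-atomic-27 --coe kubernetes \
        --external-network public \
        --labels etcd_volume_size=10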
strigazi | 3. | 21:07 |
strigazi | imdigitaljim created "Cluster creators that leave WRT Keystone cause major error" https://storyboard.openstack.org/#!/story/2005145 | 21:07 |
imdigitaljim | yeah thats my 1 | 21:07 |
strigazi | it has been discussed many times. the keystone team says there is no fix | 21:07 |
strigazi | in our cloud we manually transfer the trustee user to another account. | 21:08 |
imdigitaljim | could we rework magnum to opt to poll heat based on a service account for 1 part | 21:08 |
imdigitaljim | instead of using trust cred to poll heat | 21:08 |
strigazi | imdigitaljim: some say this is a security issue, it was like this before. | 21:08 |
imdigitaljim | oh? | 21:09 |
strigazi | but this fixes part of the problem | 21:09 |
imdigitaljim | couldnt it be scoped to readonly/gets for heat | 21:09 |
imdigitaljim | the kubernetes side | 21:09 |
imdigitaljim | either might be trust transfer (like you suggest) | 21:09 |
imdigitaljim | or we have been opting for teams to use a bot account type approach for their tenant | 21:09 |
imdigitaljim | that will persist among users leaving | 21:09 |
strigazi | trusts transfer *won't* happen in keystone, ever | 21:10 |
imdigitaljim | yeah | 21:10 |
imdigitaljim | i doubt it would | 21:10 |
jakeyip | does this happen only if the user is deleted from keystone? | 21:10 |
strigazi | they were clear with this in the Dublin PTG | 21:10 |
*** pcaruana has quit IRC | 21:10 | |
strigazi | yes | 21:10 |
imdigitaljim | yeah | 21:10 |
strigazi | the trust powers die when the user is deleted | 21:11 |
strigazi | same for application creds | 21:11 |
imdigitaljim | to be honest even if we fix the "magnum to opt to poll heat based on a service account" | 21:11 |
imdigitaljim | that would be a huge improvement | 21:11 |
imdigitaljim | that would at least enable us to delete the clusters | 21:11 |
imdigitaljim | without db edits | 21:11 |
strigazi | admins can delete the cluster anyway | 21:11 |
imdigitaljim | we could not | 21:12 |
strigazi | ? | 21:12 |
imdigitaljim | with our admin accounts | 21:12 |
imdigitaljim | the codepaths bomb out with heat polling | 21:12 |
imdigitaljim | not sure where | 21:12 |
jakeyip | is this a heat issue instead? | 21:12 |
imdigitaljim | the occurrence was just yesterday | 21:12 |
strigazi | maybe you diverged in the code? | 21:12 |
imdigitaljim | no i had to delete the heat stack underneath with normal heat functionality | 21:12 |
imdigitaljim | and then manually remove the cluster via db | 21:13 |
strigazi | wrong policy? | 21:13 |
imdigitaljim | not with that regard | 21:13 |
colin- | +1 re: service account, fwiw | 21:13 |
imdigitaljim | nope | 21:14 |
imdigitaljim | AuthorizationFailure: unexpected keystone client error occurred: Could not find user: <deleted_user>. (HTTP 404) (Request-ID: req-370b414f-239a-4e13-b00d-a1d87184904b) | 21:15 |
strigazi | ok | 21:15 |
jakeyip | ok so figuring out why admin can't use magnum to delete a cluster but can use heat to delete a stack will be a way forward? | 21:15 |
jakeyip | I wonder what is the workflow for normal resources (e.g. nova instances) in case of people leaving? | 21:15 |
strigazi | the problem is magnum can't check the status of the stack | 21:16 |
brtknr | it would be nice if the trust was owned by a role+domain rather than a user, so anyone with the role+domain can act as that role+domain | 21:16 |
imdigitaljim | ^ | 21:16 |
imdigitaljim | +1 | 21:16 |
imdigitaljim | +1 | 21:16 |
brtknr | guess its too late to refactor things now... | 21:16 |
imdigitaljim | imo not really | 21:16 |
strigazi | it is a bit bad as well | 21:16 |
imdigitaljim | but it can be bad based on the use-case | 21:17 |
imdigitaljim | for us its fine | 21:17 |
strigazi | the trust creds are a leak | 21:17 |
imdigitaljim | yeah | 21:17 |
imdigitaljim | the trust creds on the server | 21:17 |
strigazi | userA takes trust creds from userb that they both own the cluster | 21:17 |
imdigitaljim | and you can get access to other clusters | 21:17 |
strigazi | userA is fired, can still access keystone | 21:17 |
brtknr | oh, because trust is still out in the wild? | 21:18 |
strigazi | the polling issue is different than the trust in the cluster | 21:18 |
imdigitaljim | yeah | 21:18 |
brtknr | change trust password *rolls eyes* | 21:18 |
imdigitaljim | different issues | 21:18 |
strigazi | we can do service account for polling again | 21:18 |
imdigitaljim | but an admin readonly scope | 21:19 |
imdigitaljim | ? | 21:19 |
strigazi | That is possible | 21:19 |
strigazi | since the magnum controller is managed by admins | 21:19 |
imdigitaljim | yeah | 21:19 |
imdigitaljim | i think that would a satisfactory solution | 21:19 |
imdigitaljim | the clusters we can figure out/delete/etc | 21:19 |
imdigitaljim | but magnums behavior is a bit unavoidable | 21:20 |
imdigitaljim | thanks strigazi! | 21:20 |
imdigitaljim | you going to denver? | 21:20 |
strigazi | https://github.com/openstack/magnum/commit/f895b2bd0922f29a9d6b08617cb60258fa101c68#diff-e004adac7f8cb91a28c210e2a8d08ee9 | 21:21 |
strigazi | I'm going yes | 21:21 |
imdigitaljim | lets meet up! | 21:21 |
strigazi | sure thing :) | 21:22 |
strigazi | Is anyone going to work on the polling thing? maybe a longer description first in storyboard? | 21:22 |
flwang1 | strigazi: re https://storyboard.openstack.org/#!/story/2005145 i think you and ricardo proposed this issue before in mailing list | 21:23 |
strigazi | yes, I mentioned this. I discussed it with the keystone team in Dublin | 21:24 |
flwang1 | and IIRC, we need support from keystone side? | 21:24 |
strigazi | there won't be help or change | 21:24 |
strigazi | from the keystone side | 21:24 |
strigazi | 22:11 < strigazi> trusts transfer *won't* happen in keystone, ever | 21:25 |
strigazi | nor for application credentials | 21:25 |
flwang1 | strigazi: so we have to fix it in magnum? | 21:25 |
strigazi | yes | 21:25 |
strigazi | two issues, one is the polling heat issue | 21:25 |
strigazi | 2nd, the cluster inside the cluster must be rotated | 21:26 |
imdigitaljim | creds inside* | 21:26 |
strigazi | we had a design for this in Dublin, but not man power | 21:26 |
strigazi | yes, creds :) | 21:26 |
imdigitaljim | yeah 1) trust on magnum, fixable and 2) trust on cluster, no clear path yet | 21:26 |
strigazi | 2) we have a rotate certificates api with noop | 21:27 |
strigazi | it can rotate the certs and the trust | 21:27 |
strigazi | that was the design | 21:27 |
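The rotate API strigazi refers to is already exposed through the CLI (a no-op in some drivers, as noted); the design from Dublin would extend it to rotate the trust as well as the certificates:

    # rotate the cluster CA/certificates for a cluster; trust rotation is the proposed extension
    openstack coe ca rotate <cluster-name-or-id>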
flwang1 | strigazi: ok, i think we need longer discussion for this one | 21:27 |
imdigitaljim | im more concerned about 1) for the moment which is smaller in scope | 21:27 |
imdigitaljim | 2) might be more challenging and needs more discussion/desing | 21:27 |
imdigitaljim | design | 21:27 |
strigazi | no :) we did it one year ago, someone can implement it :) | 21:27 |
strigazi | I'll bring up the pointer in storyboard | 21:28 |
*** janki has quit IRC | 21:29 | |
strigazi | For the autoscaler, are there any outstanding comments? Can we start pushing the maintainers to accept it? | 21:30 |
flwang1 | strigazi: i'm happy with current status. | 21:30 |
flwang1 | it passed my test | 21:30 |
schaney | strigazi: there are some future enhancements that I am hoping to work with you guys on | 21:31 |
flwang1 | strigazi: so we can/should start to push CA team to merge it | 21:31 |
strigazi | schaney: do you want to leave a comment you are happy with the current state? we can ping the CA team the {'k8s', 'sig', 'openstack'} in some order | 21:32 |
flwang1 | schaney: sure, the /resize api is coming | 21:32 |
schaney | I can leave a comment yeah | 21:34 |
schaney | Are you alright with me including some of the stipulations in the comment? | 21:35 |
schaney | for things like nodegroups, resize, and a couple bugs | 21:35 |
strigazi | schaney: I don't know how it will work for them | 21:35 |
schaney | same, not sure if it's better to get something out there and start iterating | 21:36 |
strigazi | +1 ^^ | 21:36 |
schaney | or try to get it perfect first | 21:36 |
flwang1 | schaney: i would suggest to track them in magnum or open separated issues later, but just my 2c | 21:36 |
imdigitaljim | we'll probably just do PRs against the first iteration | 21:37 |
schaney | track them in magnum vs the autoscaler? | 21:37 |
imdigitaljim | and use issues in autoscaler repo probably | 21:37 |
imdigitaljim | ./shrug | 21:37 |
schaney | yeah, us making PRs to the autoscaler will work for us going forward | 21:38 |
schaney | the current PR has so much going on already | 21:38 |
strigazi | We can focus on the things that work atm, and when it is in, PR in the CA repo are fine | 21:38 |
flwang1 | issues in autoscaler, but don't scare them :) | 21:38 |
flwang1 | strigazi: +1 | 21:39 |
schaney | one question: has tghartland looked into the TemplateNodeInfo interface method implementation? | 21:39 |
strigazi | as long as we agree on the direction | 21:39 |
schaney | I think the current implementation will cause a crash | 21:40 |
imdigitaljim | imho i think we're all heading the same direction | 21:40 |
strigazi | creash on what? | 21:40 |
strigazi | crash on what? why? | 21:40 |
schaney | the autoscaler | 21:40 |
strigazi | is it reproducible? | 21:41 |
schaney | Should be, I am curious as to if you guys have seen it | 21:41 |
strigazi | no | 21:42 |
schaney | I'll double check, but the current implementation should crash 100% of the time when it gets called | 21:42 |
strigazi | it is a specific call that is not implemented? | 21:42 |
schaney | yes | 21:42 |
strigazi | TemplateNodeInfo this > | 21:42 |
schaney | TemplateNodeInfo() | 21:42 |
strigazi | I'll discuss it with him tmr | 21:43 |
schaney | kk sounds good, I think for good faith for the upstream autoscaler guys, we might want to figure that part out | 21:43 |
schaney | before requesting merge | 21:44 |
strigazi | 100% probability of crash should be fixed first | 21:44 |
*** ivve has quit IRC | 21:44 | |
schaney | :) yeah | 21:44 |
strigazi | it is the vm flavor basically? | 21:45 |
schaney | yeah pretty much | 21:45 |
schaney | the autoscaler gets confused when there are no schedulable nodes | 21:46 |
*** alisanhaji has quit IRC | 21:46 | |
schaney | so TemplateNodeInfo() should generate a sample node for a given nodegroup | 21:46 |
strigazi | sounds easy | 21:47 |
schaney | Yeah shouldn't be too bad, just need to fully construct the template node | 21:47 |
strigazi | this however: 'the autoscaler gets confused when there are no schedulable nodes' sounds bad. | 21:48 |
schaney | it tries to run simulations before scaling up | 21:48 |
strigazi | so how it works now? | 21:48 |
schaney | if there are valid nodes, it will use their info in the simulation | 21:49 |
strigazi | it doesn't do any simulations? | 21:49 |
schaney | if there is no valid node, it needs the result of templateNodeInfo | 21:49 |
strigazi | if you can send us a scenario to reproduce, it would help | 21:50 |
schaney | cordon all nodes and put the cluster in a situation to scale up, should show the issue | 21:51 |
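A hedged sketch of the reproduction schaney describes (deployment name and image are arbitrary): cordon every node, then create pending pods so the autoscaler has to simulate a new node from TemplateNodeInfo():

    kubectl get nodes -o name | xargs -n1 kubectl cordon
    kubectl create deployment autoscaler-test --image=nginx
    kubectl scale deployment autoscaler-test --replicas=20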
strigazi | but, won't it create a new node? | 21:51 |
strigazi | I pinged him, he will try tmr | 21:52 |
flwang1 | strigazi: in my testing, it scaled up well | 21:52 |
strigazi | schaney: apart from that, anything else? | 21:52 |
strigazi | to request to merge | 21:52 |
strigazi | flwang1: for me as well | 21:53 |
schaney | I think that was the last crash that I was looking at, everything else will just be tweaking | 21:54 |
strigazi | nice | 21:54 |
schaney | flwang1: to be clear, this issue is only seen when effectively scaling up from 0 | 21:54 |
flwang1 | schaney: i see. i haven't tested that case | 21:55 |
schaney | rare case, but I was just bringing it up since it will cause a crash | 21:55 |
flwang1 | schaney: cool | 21:55 |
strigazi | we can address it | 21:55 |
schaney | awesome | 21:56 |
strigazi | we are almost out of time | 21:58 |
flwang1 | strigazi: rolling upgrade status? | 21:58 |
strigazi | I'll just ask one more time, Can someone look into the CI failures? | 21:58 |
flwang1 | strigazi: i did | 21:59 |
strigazi | flwang1: end the meeting first and then discuss it? | 21:59 |
flwang1 | the current ci failure is related to nested virt | 21:59 |
strigazi | how so? | 21:59 |
flwang1 | strigazi: sure | 21:59 |
flwang1 | i even brought it up in the infra channel | 21:59 |
strigazi | let's end the meeting first | 21:59 |
colin- | see you next time | 21:59 |
strigazi | thanks everyone | 22:00 |
flwang1 | and there is no good way now, seems infra recently upgraded their kernel | 22:00 |
flwang1 | manser may have more inputs | 22:00 |
strigazi | #endmeeting | 22:00 |
*** openstack changes topic to "OpenStack Containers Team" | 22:00 | |
openstack | Meeting ended Tue Mar 5 22:00:33 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 22:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.html | 22:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.txt | 22:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.log.html | 22:00 |
strigazi | this thing? http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-03-03.log.html#t2019-03-03T20:17:29 | 22:02 |
flwang1 | strigazi: yes | 22:03 |
strigazi | that is why I have the CI non voting | 22:03 |
strigazi | feels more like an indication to me all these years. | 22:03 |
flwang1 | strigazi: yep, nest virt is still a pain | 22:04 |
strigazi | no problems with centos here | 22:04 |
strigazi | anyway, | 22:04 |
flwang1 | maybe we should migrate to FA 29 to try? | 22:04 |
flwang1 | did you get any luck on FA 29? it failed in my testing | 22:05 |
strigazi | at cern we use it | 22:05 |
flwang1 | which k8s version? | 22:05 |
strigazi | I didn't have time for testing it in devstack | 22:05 |
flwang1 | from community, no change? | 22:05 |
strigazi | no change | 22:05 |
flwang1 | ok | 22:05 |
strigazi | 1.13.3 and 1.12.4 | 22:06 |
flwang1 | cool | 22:06 |
strigazi | overlay2, no extra volumes | 22:06 |
flwang1 | ok | 22:06 |
flwang1 | btw, i have already proposed the patch for api ref of resize API https://review.openstack.org/639882 | 22:07 |
flwang1 | and the health_status patch in cluster listing patch is here https://review.openstack.org/640222 | 22:07 |
strigazi | i have seen them | 22:08 |
strigazi | missed the api-ref | 22:08 |
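For reference, the resize call documented in that api-ref patch looks roughly like the following; field names should be checked against https://review.openstack.org/639882, and the endpoint, token, and IDs are placeholders:

    curl -X POST "$MAGNUM_ENDPOINT/v1/clusters/<cluster-id>/actions/resize" \
        -H "X-Auth-Token: $OS_TOKEN" \
        -H "Content-Type: application/json" \
        -H "OpenStack-API-Version: container-infra latest" \
        -d '{"node_count": 4}'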
strigazi | For upgrades, I'm working in the driver code. | 22:09 |
strigazi | Do you want to take the api? | 22:09 |
flwang1 | yep, i can help polish the api patch, and the api ref | 22:09 |
strigazi | The only part that needs work in the api is: | 22:09 |
flwang1 | as for api, do you want to use the same way I'm using for resize api? | 22:10 |
strigazi | i think they are the same, no? | 22:10 |
strigazi | last time I checked it was | 22:10 |
flwang1 | they should be same, a little bit diff between your current one and mine | 22:10 |
strigazi | ok | 22:10 |
strigazi | the only part that needs some thought is | 22:11 |
strigazi | clusterA used CT-A to be created | 22:11 |
strigazi | CT-A had labels X Y and Z | 22:11 |
flwang1 | labels merging issue? | 22:11 |
strigazi | yes | 22:12 |
strigazi | I thought of a config option to check if some labels are going to be changed, and in such a case refuse to upgrade or even create a cluster | 22:13 |
flwang1 | can we do simple merge now like ricardo and i discussed? | 22:14 |
jakeyip | is this to do with https://review.openstack.org/#/c/621611/ ? | 22:15 |
strigazi | no | 22:16 |
strigazi | flwang1: were you discussed it? | 22:16 |
strigazi | flwang1: where did you discuss it? | 22:16 |
flwang1 | https://review.openstack.org/#/c/621611/ | 22:16 |
flwang1 | we discussed similar issue in above patch | 22:16 |
strigazi | this is for cluster create | 22:16 |
flwang1 | yes, but similar issue | 22:17 |
strigazi | almost but not | 22:17 |
flwang1 | i mean, we probably want to use same policy for merging to avoid confusing users | 22:17 |
flwang1 | yes i know | 22:17 |
strigazi | shall i explain it or not? | 22:17 |
strigazi | in one loine | 22:18 |
strigazi | in one line | 22:18 |
strigazi | user in cluster creation selected version 5, the new CT to upgrade has version 4, what do you do? | 22:18 |
strigazi | downgrade? | 22:18 |
flwang1 | for a version of an addon? | 22:19 |
strigazi | based on 621611 yes, downgrade | 22:19 |
strigazi | addon or k8s | 22:19 |
jakeyip | cluster creation label should override CT label? | 22:19 |
strigazi | jakeyip: yes, for creation. for upgrade? | 22:20 |
strigazi | as an admin, I don't want to support users that go rogue | 22:20 |
jakeyip | btw I was just reviewing 621611 last night and I felt quite uneasy about it, prob cos there are many ways like this where it's going to be weird | 22:20 |
flwang1 | we should support downgrade, but better not now? | 22:20 |
strigazi | support downgrade from user selected version, to admin suggested? | 22:21 |
strigazi | this is asking for trouble | 22:21 |
strigazi | the matrix of version explodes this way | 22:21 |
flwang1 | yes, so i think we don't have to do it now, maybe even in future | 22:22 |
jakeyip | sorry I am a bit lost where is the change for this functionality? (CT label update and cluster upgrade) | 22:23 |
strigazi | downgrading is bad | 22:23 |
strigazi | jakeyip: there is no change for this yet | 22:24 |
jakeyip | strigazi: I see. are we thinking of using update on CT to update clusters? | 22:24 |
strigazi | the problem is that users can select labels in cluster creation, then with a CT they will try to upgrade and there will be conflicts | 22:25 |
strigazi | jakeyip: yes | 22:25 |
flwang1 | strigazi: yep, labels is a pain, we can only support base image upgrade and k8s upgrade | 22:26 |
flwang1 | we need more discussion about labels, i mean i need more thinking | 22:26 |
strigazi | we need to support upgrading add-ons too. but even for k8s we should discourage users from picking the version in cluster creation | 22:27 |
strigazi | let's see next week about it | 22:27 |
flwang1 | strigazi: that's why i mentioned before, we probably need another attributes for template | 22:27 |
flwang1 | which can indicate the compatibility with new/old versions | 22:28 |
*** sdake has quit IRC | 22:28 | |
strigazi | this week you can pick the API patch and I continue with the driver | 22:28 |
flwang1 | for example, CT-1.11.2 has attribute can_upgrade_to ['1.12.4', '1.13.4'] | 22:28 |
flwang1 | strigazi: no problem | 22:28 |
strigazi | cool | 22:29 |
*** sdake has joined #openstack-containers | 22:29 | |
*** sdake has quit IRC | 22:29 | |
flwang1 | strigazi: thank you, my frined | 22:29 |
flwang1 | friend | 22:29 |
strigazi | jakeyip: do you need help with anything? | 22:29 |
strigazi | flwang1: thank you | 22:29 |
strigazi | flwang1: you do too much for the project | 22:30 |
jakeyip | yes, maybe https://review.openstack.org/#/c/638077/ ? | 22:30 |
jakeyip | Pardon me, I am a newbie, but I feel like the CT to Clusters relationship needs to be defined a bit better? | 22:31 |
flwang1 | strigazi: btw, as for https://review.openstack.org/640211 | 22:31 |
jakeyip | if they are tightly coupled then updating CT is going to be very scary, for both operators and users | 22:31 |
strigazi | jakeyip: ack for 638077 | 22:31 |
strigazi | jakeyip: we can limit the access to CTs for users | 22:32 |
flwang1 | i changed my mind, i would like to show the health_status_reason by default, cause the api has returned everything, we don't have to ask the user to add --detailed again to trigger another api call to see the health_status_reason, thoughts? | 22:32 |
strigazi | jakeyip: and give less freedom in labels in cluster creation | 22:32 |
jakeyip | I feel like having CT just acting like a template is good, it prefills fields that you can override | 22:33 |
strigazi | flwang1: hmm, i'm only concerned for very big clusters | 22:33 |
flwang1 | strigazi: that's rare case, no? | 22:33 |
strigazi | what is the limit of the field in the db? | 22:33 |
flwang1 | maybe common in cern | 22:34 |
strigazi | for us more than 500 nodes is a bit rare | 22:34 |
strigazi | but a few 100s is not | 22:34 |
strigazi | we can take it :) | 22:34 |
flwang1 | strigazi: you can download the patch and give it a try | 22:35 |
strigazi | since the info is in the db anyway | 22:35 |
strigazi | yes | 22:35 |
flwang1 | if it's really a pain, i'm open to support --detailed | 22:35 |
strigazi | cool | 22:35 |
*** sapd1 has joined #openstack-containers | 22:36 | |
jakeyip | strigazi: I, as an operator, will feel uneasy updating a CT that half the clusters in my cloud depend on. So maybe I won't do it and just create new CTs. Negating the whole benefit. | 22:36 |
flwang1 | i'm good, sorry for a lot of pushing | 22:36 |
strigazi | won't be needed probably, I'll add Ricardo to the review too | 22:36 |
strigazi | jakeyip it is not possible to update used CT and it won't be | 22:37 |
flwang1 | jakeyip: i can see your pain, but just like images, making CTs immutable also has good sides | 22:37 |
*** sdake has joined #openstack-containers | 22:37 | |
strigazi | jakeyip: did you think that CTs will become mutable? they will continue to be immutable | 22:38 |
jakeyip | strigazi: sorry I thought we are talking about a updating labels on a CT to trigger upgrade on a cluster? | 22:38 |
jakeyip | ok. phew | 22:38 |
strigazi | jakeyip: selecting a new CT triggers upgrade | 22:38 |
jakeyip | strigazi: thanks for the clarification! | 22:39 |
strigazi | cool, I have to go guys | 22:39 |
strigazi | see you around | 22:40 |
strigazi | thanks flwang1 jakeyip for all the work | 22:40 |
brtknr | Sorry, i enjoyed my observer role, like to stay in the loop! good night | 22:40 |
jakeyip | see you thanks strigazi as always | 22:40 |
flwang1 | strigazi: see you | 22:40 |
*** andrein has quit IRC | 22:40 | |
strigazi | brtknr: cheers | 22:41 |
strigazi | bye | 22:41 |
imdigitaljim | flwang1 we've moved long past having CT blocking issues for upgrades fwiw | 22:42 |
imdigitaljim | id be glad to share more recent updates to centos driver | 22:42 |
imdigitaljim | we want a few last things done and a few of my team is going to scour the code and ready it for upstreaming | 22:42 |
brtknr | imdigitaljim: are you using centos atomic or vanilla centos? | 22:42 |
imdigitaljim | centos 7.6 | 22:42 |
brtknr | atomic or not? | 22:43 |
imdigitaljim | nope | 22:43 |
brtknr | with magnum? | 22:43 |
imdigitaljim | yes | 22:43 |
imdigitaljim | this driver can be fairly easily adapted to ubuntu as well | 22:44 |
imdigitaljim | and the like | 22:44 |
brtknr | thats cool! i often get questions about whether that is possible | 22:44 |
imdigitaljim | oh for sure it is | 22:44 |
imdigitaljim | ill ping you when we start uploading the driver | 22:44 |
imdigitaljim | it works differently than fedoras | 22:44 |
imdigitaljim | but its still executed the same way if that makes sense | 22:44 |
brtknr | do you have kube* services running as containers? | 22:44 |
imdigitaljim | yes | 22:44 |
*** sdake has quit IRC | 22:45 | |
brtknr | cool! | 22:45 |
imdigitaljim | we also rely on github for versioning | 22:45 |
imdigitaljim | so when you have a cluster you know what git revision of the cluster it was in case you need to know how it was bootstrapped | 22:45 |
imdigitaljim | and providing newer versions is ezpz | 22:45 |
imdigitaljim | upgrades in place are done through an api call + heat agent + kubernetes deployment | 22:46 |
brtknr | thats one way of doing it! | 22:46 |
imdigitaljim | so in other words we have a repo for magnum and a repo for the bootstrapping kubernetes content | 22:46 |
imdigitaljim | that we handle separately | 22:46 |
imdigitaljim | we hardly ever update magnum's code | 22:46 |
imdigitaljim | ill be glad to provide documentation on it and how it works for everything | 22:47 |
jakeyip | how do you point k8s to the config repo ? | 22:47 |
brtknr | so you dont go via kubeadm ? | 22:47 |
imdigitaljim | when that time comes | 22:47 |
imdigitaljim | no but we've considered adapting that approach | 22:47 |
jakeyip | that's nice in a way. good for power users. | 22:47 |
imdigitaljim | we do it the hardway since we have more control | 22:48 |
imdigitaljim | (i.e similar to fedora atomics) | 22:48 |
imdigitaljim | btw we could use any git repo | 22:48 |
imdigitaljim | even github.com | 22:48 |
brtknr | can you upgrade docker in place too? inside containers? | 22:48 |
imdigitaljim | youd provide that bootstrapping endpoint in the config file | 22:48 |
brtknr | *inside vms | 22:48 |
imdigitaljim | when we need to upgrade docker | 22:48 |
imdigitaljim | we just do a rolling scale | 22:49 |
imdigitaljim | rolling/update | 22:49 |
imdigitaljim | but we actually have puppet connected | 22:49 |
imdigitaljim | which can do it too | 22:49 |
imdigitaljim | but thats not part of the requirement for upstream content | 22:49 |
imdigitaljim | puppet is not a dependency | 22:49 |
brtknr | do you replace the image with new docker version or update the package itself? | 22:49 |
imdigitaljim | so for example | 22:50 |
imdigitaljim | the image in the template is centos-magnum | 22:50 |
imdigitaljim | we update the image with a new centos-magnum with updated docker | 22:50 |
imdigitaljim | and when nodes are scaled in/out they come up with new version of centos | 22:50 |
imdigitaljim | and/or docker | 22:50 |
imdigitaljim | and any additional software upgrades | 22:50 |
*** sdake has joined #openstack-containers | 22:51 | |
imdigitaljim | based on a little git management and using the git api | 22:51 |
imdigitaljim | we can provide any versions of kubernetes | 22:51 |
imdigitaljim | (that we want to support) | 22:51 |
brtknr | so rolling scale = -1 old instance +1 new instance? | 22:51 |
imdigitaljim | so we support v1.12.1 -> v1.13.4 | 22:51 |
imdigitaljim | and slowly move up from there as people dont have older clusters | 22:52 |
imdigitaljim | yeah | 22:52 |
brtknr | doesnt that just replace the nths node | 22:52 |
brtknr | n_th node* | 22:52 |
*** munimeha1 has quit IRC | 22:52 | |
imdigitaljim | minions_to_remove=1...N | 22:52 |
imdigitaljim | masters_to_remove=1...N | 22:53 |
imdigitaljim | just make that call | 22:53 |
imdigitaljim | until all minions/masters are cycled | 22:53 |
imdigitaljim | thats how we execute the inplace upgrades mostly too | 22:53 |
jakeyip | this is via kubectl or ? | 22:53 |
imdigitaljim | automatic | 22:53 |
imdigitaljim | heat-api call essentially though | 22:53 |
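A hedged sketch of the kind of heat call described here, assuming the parameter names used by the stock magnum templates (minions_to_remove, number_of_minions); imdigitaljim's custom driver may differ:

    # remove one named/indexed minion and shrink the count by one, repeated until all nodes are cycled
    openstack stack update <stack-id> --existing \
        --parameter minions_to_remove=<minion-index> \
        --parameter number_of_minions=<current-count-minus-1>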
imdigitaljim | we have a kubernetes deployment that gets put on the cluster | 22:54 |
imdigitaljim | that cycles them | 22:54 |
imdigitaljim | the cluster manages itself | 22:54 |
imdigitaljim | it does a drain + kill | 22:55 |
imdigitaljim | so no loss of service | 22:55 |
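The drain + kill step on each node presumably looks something like this (flag names per kubectl of that era; the node name is a placeholder):

    kubectl drain <node-name> --ignore-daemonsets --delete-local-data
    # once drained, the node is removed and replaced via the stack update sketched above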
imdigitaljim | we might iterate and if you have enough capacity | 22:55 |
imdigitaljim | you could grow your cluster N -> 2N | 22:55 |
jakeyip | nice. one thing I'm confused when we were talking about a new image, is that a docker image or glance image? | 22:55 |
imdigitaljim | then back down to N canning the old nodes | 22:55 |
imdigitaljim | ah yeah | 22:56 |
imdigitaljim | glance image | 22:56 |
imdigitaljim | centos-magnum is the glance image defined in the template | 22:56 |
imdigitaljim | and we upgrade that | 22:56 |
jakeyip | ok so heat-api to do a rebuild, or is that a new nova instance ? | 22:56 |
imdigitaljim | so new nodes come up with the upgrades | 22:56 |
brtknr | so the glance image name is important | 22:56 |
jakeyip | I guess N -> 2N is a new nova instance | 22:56 |
brtknr | there cannot be duplicate centos-magnum images? | 22:56 |
imdigitaljim | we dont need multiple images | 22:57 |
imdigitaljim | for our case | 22:57 |
imdigitaljim | just 1 | 22:57 |
brtknr | what i mean is, old and new version keep the same name | 22:57 |
imdigitaljim | if youre creating a cluster or upgrading nodes | 22:57 |
brtknr | ? | 22:57 |
imdigitaljim | we delete old | 22:57 |
imdigitaljim | or rename old | 22:57 |
imdigitaljim | and provide new | 22:57 |
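A minimal sketch of that rename-then-replace flow with the openstack CLI (image names and file are placeholders):

    # keep the template's image name stable while rotating the underlying glance image
    openstack image set --name centos-magnum-old centos-magnum
    openstack image create --disk-format qcow2 --container-format bare \
        --file centos-magnum-new.qcow2 centos-magnum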
brtknr | ok cool | 22:57 |
imdigitaljim | theres only ever 1 by that name | 22:58 |
imdigitaljim | so its always okay | 22:58 |
imdigitaljim | :D | 22:58 |
brtknr | =D | 22:58 |
brtknr | i like the way docker solves this problem by using tag | 22:58 |
brtknr | image hash can be different but tag is always the same | 22:59 |
imdigitaljim | yeah | 22:59 |
brtknr | glance could benefit from something similar | 22:59 |
imdigitaljim | basically thats how we treat the glance image | 22:59 |
imdigitaljim | but yeah | 22:59 |
imdigitaljim | its not by default | 22:59 |
imdigitaljim | because when you provide same tag in docker | 22:59 |
imdigitaljim | it detags the old one | 23:00 |
imdigitaljim | which is what we do | 23:00 |
imdigitaljim | heh | 23:00 |
brtknr | are you also doing federations? | 23:00 |
imdigitaljim | we do keep it for a while | 23:00 |
imdigitaljim | not yet but thats to come | 23:00 |
imdigitaljim | well push upstream probably after inplace upgrades is fully completed | 23:00 |
brtknr | with a mix of gpu/non-gpu nodes in the same cluster? | 23:00 |
imdigitaljim | its mostly like a alpha/beta level maturity | 23:00 |
imdigitaljim | ah yeah we'd also need to probably get hte nodegroups in place for upstream since the fedora guys all want it | 23:01 |
imdigitaljim | we dont use it here | 23:01 |
imdigitaljim | but we know its important | 23:01 |
brtknr | hte? | 23:01 |
imdigitaljim | the* | 23:01 |
jakeyip | imdigitaljim: so are you still using magnum? | 23:02 |
imdigitaljim | yup | 23:02 |
brtknr | lol^ | 23:02 |
imdigitaljim | core magnum is almost the same (tiny changes that we'd upstream) | 23:02 |
jakeyip | lol I don't mean it that way | 23:02 |
*** sdake has quit IRC | 23:02 | |
jakeyip | just for 1st provisioning? | 23:02 |
imdigitaljim | and everything else is a driver change | 23:02 |
imdigitaljim | magnum =/= fedora atomic k8s for us | 23:02 |
imdigitaljim | if thats what you mean | 23:02 |
jakeyip | I am thinking everything like node-count / image is going to be out of sync with your approach | 23:03 |
imdigitaljim | magnum is basically just a CRUD service | 23:03 |
imdigitaljim | when flwang fixes the api | 23:03 |
imdigitaljim | it wont be | 23:03 |
imdigitaljim | but atm yes it only reflects create time | 23:03 |
* brtknr googles CRUD | 23:03 | |
brtknr | oh i see | 23:04 |
imdigitaljim | create read update delete | 23:04 |
imdigitaljim | the image doesnt get out of date | 23:04 |
imdigitaljim | the node-count does | 23:04 |
jakeyip | so it might be that magnum thinks node-count is 2 but heat actually says 4? then what happens when flwang's api updates node-count? | 23:04 |
imdigitaljim | we were also thinking of a better way than polling heat | 23:04 |
imdigitaljim | but see if we can interact with a rabbitmq or something | 23:05 |
imdigitaljim | heat <-> magnum relationship is pretty close anyways | 23:05 |
imdigitaljim | jakeyip | 23:05 |
imdigitaljim | when any scaling happens we'd use magnum's api instead of heats | 23:05 |
imdigitaljim | and then magnum will update itself as expected | 23:05 |
imdigitaljim | but mostly magnum just proxies requests to heat anyways | 23:06 |
jakeyip | ok but for current clusters they'll be out of sync and someone needs to fix them up again I think? | 23:06 |
imdigitaljim | how so? | 23:06 |
jakeyip | the upgrade workflow you were mentioning about adds new nodes? | 23:07 |
imdigitaljim | we dont get out-of-sync issues | 23:08 |
imdigitaljim | so maybe ive already solved that and hadnt explained it | 23:08 |
jakeyip | is node-count going to be eventually consistent with what's defined in magnum, after the upgrade ? | 23:09 |
imdigitaljim | oh you mean the N->2N thing | 23:13 |
imdigitaljim | we dont do that now | 23:13 |
imdigitaljim | that was just an idea for a more optimal upgrade | 23:13 |
imdigitaljim | but with the extra resource requirement | 23:13 |
jakeyip | I see. thanks for the clarification! | 23:13 |
jakeyip | would love to see your implementation when it is possible! | 23:14 |
brtknr | Me too | 23:31 |
*** sdake has joined #openstack-containers | 23:33 | |
imdigitaljim | yeah im looking forward to submitting it and would love to have some additional users | 23:56 |