openstackgerrit | Merged openstack/kayobe stable/train: Fix filtering of network names set to an empty string https://review.opendev.org/761331 | 00:09 |
*** k_mouza has joined #openstack-kolla | 00:45 | |
*** k_mouza has quit IRC | 00:49 | |
*** kevko has joined #openstack-kolla | 00:56 | |
*** zzzeek has quit IRC | 01:06 | |
*** zzzeek has joined #openstack-kolla | 01:09 | |
*** xinliang has joined #openstack-kolla | 01:15 | |
*** xinliang has quit IRC | 01:31 | |
openstackgerrit | Merged openstack/kolla-ansible stable/victoria: CI: add missing --fail argument to curl https://review.opendev.org/762745 | 02:38 |
*** skramaja has joined #openstack-kolla | 02:59 | |
*** kevko has quit IRC | 03:05 | |
*** vishalmanchanda has joined #openstack-kolla | 03:52 | |
*** sri_ has quit IRC | 04:07 | |
*** sri_ has joined #openstack-kolla | 04:11 | |
*** vkmc has quit IRC | 04:16 | |
*** vkmc has joined #openstack-kolla | 04:16 | |
*** k_mouza has joined #openstack-kolla | 04:46 | |
*** johnsom has quit IRC | 04:46 | |
*** johnsom has joined #openstack-kolla | 04:49 | |
*** k_mouza has quit IRC | 04:50 | |
*** stackedsax has quit IRC | 05:13 | |
*** stackedsax has joined #openstack-kolla | 05:13 | |
*** johnsom has quit IRC | 05:18 | |
*** johnsom has joined #openstack-kolla | 05:19 | |
*** zzzeek has quit IRC | 05:25 | |
*** zzzeek has joined #openstack-kolla | 05:27 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-kolla | 05:33 | |
*** zzzeek has quit IRC | 05:50 | |
*** zzzeek has joined #openstack-kolla | 05:51 | |
*** JamesBenson has quit IRC | 05:53 | |
*** zzzeek has quit IRC | 06:10 | |
*** zzzeek has joined #openstack-kolla | 06:11 | |
*** johnsom has quit IRC | 06:27 | |
*** johnsom has joined #openstack-kolla | 06:27 | |
*** gfidente|afk is now known as gfidente | 06:42 | |
*** cah_link has joined #openstack-kolla | 06:47 | |
*** zzzeek has quit IRC | 07:14 | |
*** zzzeek has joined #openstack-kolla | 07:16 | |
*** rm_work has quit IRC | 07:26 | |
*** jbadiapa has joined #openstack-kolla | 07:27 | |
*** rm_work has joined #openstack-kolla | 07:28 | |
*** wuchunyang has joined #openstack-kolla | 07:35 | |
*** zzzeek has quit IRC | 07:37 | |
*** zzzeek has joined #openstack-kolla | 07:38 | |
*** nikparasyr has joined #openstack-kolla | 07:46 | |
*** rpittau|afk is now known as rpittau | 08:05 | |
yoctozepto | dcapone2004: one would have to switch to using role-based hostnames to have this working nicely | 08:10 |
mnasiadka | morning | 08:18 |
*** pescobar has quit IRC | 08:21 | |
*** Fl1nt has joined #openstack-kolla | 08:22 | |
Fl1nt | Good morning everyone! | 08:22 |
Fl1nt | sorry for vanishing, but I've been busy at work :) | 08:23 |
mnasiadka | we are all busy at work | 08:25 |
*** zzzeek has quit IRC | 08:26 | |
*** pescobar has joined #openstack-kolla | 08:27 | |
*** bengates has joined #openstack-kolla | 08:28 | |
*** zzzeek has joined #openstack-kolla | 08:30 | |
*** zzzeek has quit IRC | 08:38 | |
*** zzzeek has joined #openstack-kolla | 08:40 | |
*** zzzeek has quit IRC | 08:48 | |
Fl1nt | yeah! fortunately, it's all good for the community, as I'm working to set up an internal advocacy division around OpenStack / Kolla / Ansible etc. at work. | 08:49 |
Fl1nt | BTW, quick question: when we activate external TLS, it creates an appropriate automatic redirect for horizon but not for keystone and the other services, is there a reason? Similarly, when dealing with SSO using a SAML2 IdP endpoint, and especially ADFS | 08:51 |
Fl1nt | this one refuses to send claims back to non-TLS endpoints | 08:51 |
*** zzzeek has joined #openstack-kolla | 08:51 | |
Fl1nt | meaning the returnTo parameter Apache forges ends up non-TLS | 08:51 |
*** mgoddard has joined #openstack-kolla | 08:51 | |
Fl1nt | meaning we're missing an SSLEngine on option in the keystone Apache public virtualhost section | 08:52 |
Fl1nt | I'm currently testing it, but if someone has any clue on this ^^ | 08:52 |
*** bengates has quit IRC | 08:59 | |
*** k_mouza has joined #openstack-kolla | 09:00 | |
*** bengates has joined #openstack-kolla | 09:01 | |
*** k_mouza has quit IRC | 09:05 | |
*** kevko has joined #openstack-kolla | 09:18 | |
*** sean-k-mooney has quit IRC | 09:27 | |
*** sean-k-mooney has joined #openstack-kolla | 09:27 | |
*** slunav has quit IRC | 09:28 | |
*** mgoddard has quit IRC | 09:29 | |
*** sluna has joined #openstack-kolla | 09:31 | |
Fl1nt | aaaah fuuuu** it's actually available on Train+ release. | 09:33 |
Fl1nt | all right. | 09:33 |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible stable/ussuri: [baremetal]: Use $releasever in docker-ce repo https://review.opendev.org/762979 | 09:35 |
Fl1nt | mnasiadka, why would you use docker_yum_baseurl when you can just use the already-templated docker-ce.repo file: https://download.docker.com/linux/centos/docker-ce.repo ? | 09:39 |
mnasiadka | Fl1nt: feel free to propose a change to master, I'm not putting any more cycles into that ;) | 09:40 |
Fl1nt | no, I mean, I'm looking for the reasoning behind that. | 09:41 |
mnasiadka | Fl1nt: reasoning is so it's easy to backport | 09:41 |
Fl1nt | just curious pp | 09:41 |
Fl1nt | ok | 09:41 |
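The $releasever change under review and the upstream docker-ce.repo file Fl1nt links rely on the same yum variable substitution. A sketch of the relevant stanza, mirroring the upstream repo file (treat the exact URLs as illustrative):

```ini
# Sketch of a docker-ce yum repo stanza using $releasever/$basearch,
# as in the upstream docker-ce.repo file. URLs are illustrative.
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
```

On a CentOS 7 host yum expands `$releasever` to `7`, so the same stanza works across releases without hardcoding the version.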
*** bengates has quit IRC | 09:46 | |
*** bengates has joined #openstack-kolla | 09:47 | |
Fl1nt | Still working on SSO using SAML2 within k-a; this will be a big one, the amount of configuration and logic is pretty large. | 09:49 |
Fl1nt | and I've finished the cloudkitty role fix for Elasticsearch and Prometheus, I just need time to push the patch. | 09:50 |
*** bengates has quit IRC | 09:51 | |
*** bengates has joined #openstack-kolla | 09:54 | |
*** zzzeek has quit IRC | 09:57 | |
*** zzzeek has joined #openstack-kolla | 10:02 | |
*** wuchunyang has quit IRC | 10:16 | |
*** mgoddard has joined #openstack-kolla | 10:19 | |
*** jpward has quit IRC | 10:20 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla master: prometheus: Add OVN and OVS exporters https://review.opendev.org/762986 | 10:34 |
mgoddard | Fl1nt: Apache is always non-TLS, unless you are using backend TLS (Ussuri+) | 10:35 |
mgoddard | HAproxy terminates TLS | 10:35 |
mgoddard | Fl1nt: and horizon has a redirect because the port is different for HTTP | 10:36 |
mgoddard | plus it's more of a user/browser facing service | 10:36 |
*** zzzeek has quit IRC | 10:38 | |
*** zzzeek has joined #openstack-kolla | 10:40 | |
Fl1nt | actually, when dealing with ADFS (SAML 2.0) you need keystone TLS backend, as Apache (mod_auth_mellon) otherwise creates a returnTo URL that uses non-TLS, and a non-TLS returnTo doesn't work | 10:40 |
Fl1nt | example: | 10:41 |
Fl1nt | when your user calls /auth/login using WebSSO, it calls https://<fqdn>:5000/v3/auth/OS-FEDERATION/identity-providers/adfs/protocols/saml2/websso?origin=https://<fqdn>/auth/websso | 10:44 |
mnasiadka | Fl1nt: tried just setting ServerName directive in wsgi config? | 10:47 |
Fl1nt | yep, doesn't work either. | 10:47 |
mnasiadka | works for me | 10:48 |
Fl1nt | ServerName <fqdn> at the VirtualHost level, just above the WSGI config? | 10:48 |
Fl1nt | I'll check everything again. | 10:49 |
*** k_mouza has joined #openstack-kolla | 10:51 | |
*** kwazar is now known as quasar_ | 10:55 | |
*** quasar_ is now known as quasar | 10:55 | |
*** quasar is now known as quasar` | 10:56 | |
Fl1nt | ok, so, just to clarify the issue. | 11:04 |
Fl1nt | when using a normal kolla train branch, only enabling kolla_enable_external_tls | 11:05 |
Fl1nt | at some point | 11:05 |
Fl1nt | the SP (keystone/mod_auth_mellon) crafts the relayState URL | 11:05 |
Fl1nt | that URL should be a TLS endpoint using your now-enabled public TLS endpoint for keystone: https://<fqdn>:5000/v3 | 11:06 |
Fl1nt | however | 11:06 |
Fl1nt | and it's where I'm getting lost | 11:06 |
Fl1nt | for some reason | 11:06 |
Fl1nt | apache isn't using our originURL=https://<fqdn>:5000/v3 value | 11:06 |
Fl1nt | but the non-tls equivalent | 11:07 |
Fl1nt | I've tried to add an http-to-https redirect at the haproxy level by adding a keystone_public_redirect section within the keystone services dict for haproxy; it works, but then it creates a loop: the relayState parameter gets redirected, but the new request still carries non-TLS parameters, etc. | 11:08 |
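For reference, the haproxy-level redirect Fl1nt experimented with might look like the fragment below. The frontend name and addresses are placeholders taken from the discussion, not kolla-generated config; the comment notes why it loops:

```
frontend keystone_public_redirect
    bind 192.0.2.250:5000
    mode http
    # Rewrites only the scheme of the request URL; any originURL or
    # relayState query parameter still carries its original http://
    # value, so the SAML flow bounces back to a non-TLS URL and the
    # redirect repeats.
    redirect scheme https code 301 if !{ ssl_fc }
```

This is why a scheme redirect alone can't fix the mellon flow: the non-TLS URL is embedded in the SAML parameters, not just the request line.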
*** e0ne has joined #openstack-kolla | 11:08 | |
Fl1nt | so, for now, my conclusion is: until the Apache virtualhost uses TLS (SSLEngine on), the mod_auth_mellon context can't craft a TLS relayState URL, as its location composition (VirtualHost context) isn't TLS-based. | 11:10 |
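That conclusion can be sketched as a hypothetical keystone public vhost fragment. The FQDN, certificate paths, and the mellon Location block are placeholders, not kolla's rendered template; the point is that with `SSLEngine on` (and/or a scheme-qualified `ServerName`, as mnasiadka suggests), Apache's self-referencing URLs, and hence mellon's relayState/returnTo, come out as https:

```apache
# Hypothetical keystone public vhost; FQDN and cert paths are placeholders.
<VirtualHost *:5000>
    # A scheme-qualified ServerName makes Apache build https
    # self-referencing URLs even behind a proxy.
    ServerName https://keystone.example.org:5000

    # With SSLEngine on, the VirtualHost context is TLS-based, so
    # mod_auth_mellon composes TLS relayState/returnTo URLs.
    SSLEngine on
    SSLCertificateFile    /etc/keystone/certs/keystone.pem
    SSLCertificateKeyFile /etc/keystone/certs/keystone-key.pem

    WSGIScriptAlias / /var/www/cgi-bin/keystone/keystone-wsgi-public

    <Location /v3/auth/OS-FEDERATION/identity-providers/adfs/protocols/saml2/websso>
        MellonEnable auth
        # ... MellonSPMetadataFile, MellonIdPMetadataFile, etc.
    </Location>
</VirtualHost>
```

This is essentially what the Ussuri backend-TLS work provides out of the box.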
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: WIP: Add OVN and OVS exporter deployment https://review.opendev.org/762992 | 11:12 |
*** wuchunyang has joined #openstack-kolla | 11:15 | |
*** zzzeek has quit IRC | 11:17 | |
*** zzzeek has joined #openstack-kolla | 11:20 | |
*** zzzeek has quit IRC | 11:27 | |
*** zzzeek has joined #openstack-kolla | 11:28 | |
*** stingrayza has quit IRC | 11:29 | |
*** stingrayza has joined #openstack-kolla | 11:31 | |
Fl1nt | ok, so someone else is having the same issue: https://github.com/latchset/mod_auth_mellon/issues/27 and happily there is a mellon diagnostics directive to enable more verbose logging. | 11:35 |
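The diagnostics directive mentioned here is, to the best of my knowledge, enabled like this (paths are placeholders, and it only works if the module was built with `--enable-diagnostics`):

```apache
# Hypothetical fragment enabling mod_auth_mellon's diagnostics log.
# Requires a module compiled with --enable-diagnostics; log path is a
# placeholder.
MellonDiagnosticsEnable On
MellonDiagnosticsFile /var/log/httpd/mellon_diagnostics.log
```

As noted later in the discussion, distro packages are often built without diagnostics support, in which case these directives are unavailable.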
openstackgerrit | Mark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build https://review.opendev.org/761928 | 11:42 |
openstackgerrit | Mark Goddard proposed openstack/kolla master: CI: add templated Dockerfiles to build logs https://review.opendev.org/762997 | 11:42 |
*** brinzhang0 has joined #openstack-kolla | 11:50 | |
*** brinzhang_ has quit IRC | 11:53 | |
*** brinzhang_ has joined #openstack-kolla | 11:55 | |
*** brinzhang0 has quit IRC | 11:58 | |
*** mgoddard has quit IRC | 12:01 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla master: prometheus: Add OVN and OVS exporters https://review.opendev.org/762986 | 12:10 |
*** wuchunyang has quit IRC | 12:11 | |
*** JamesBenson has joined #openstack-kolla | 12:12 | |
*** mgoddard has joined #openstack-kolla | 12:15 | |
*** jpward has joined #openstack-kolla | 12:20 | |
*** Luzi has joined #openstack-kolla | 12:36 | |
*** stingrayza has quit IRC | 12:37 | |
*** zzzeek has quit IRC | 12:39 | |
*** zzzeek has joined #openstack-kolla | 12:41 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla master: prometheus: Add OVN exporter https://review.opendev.org/762986 | 12:45 |
Fl1nt | ok, I managed to redeploy our staging environment in order to validate my assertion. | 12:56 |
Fl1nt | so | 12:56 |
Fl1nt | our ADFS refuses to send claims to non-TLS assertion consumer service URLs (postResponse); meanwhile, if I activate kolla_enable_external_tls, then VIP:80 redirects to VIP:443, but VIP:5000 doesn't provide a TLS endpoint until you either declare a keystone_external_redirect in haproxy or enable TLS backend on the keystone Apache vhost. | 12:58 |
Fl1nt | now my question to you mnasiadka is, do you use a Train release or a Ussuri one? | 12:59 |
mnasiadka | Ussuri | 12:59 |
Fl1nt | TBN: the CentOS-provided mod_auth_mellon comes without mellon_diagnostics compiled in, so no chance ^^ | 12:59 |
Fl1nt | ah ok, so that's why it works | 13:00 |
Fl1nt | ok, and finally: has anyone already successfully made keystone SAML 2.0 federated authentication using mod_mellon work on a Train release? | 13:01 |
*** dougsz has joined #openstack-kolla | 13:01 | |
Fl1nt | from my tests, the missing part is all the backend TLS work done in ussuri: when federating against a SAML2 endpoint (mainly ADFS) you need the whole communication channel to be TLS, as your client (web browser, mostly) needs to connect to keystone directly using TLS, but keystone's mod_mellon won't craft a TLS endpoint until your Apache vhost is using TLS. | 13:04 |
Fl1nt | more accurately, mod_mellon won't create an appropriate relayState URL | 13:05 |
Fl1nt | and so leads you to this redirect loop nightmare | 13:06 |
Fl1nt | because haproxy translates the request from HTTP to HTTPS, but the request parameter continues to be a non-TLS originURL/relayState URL, etc. | 13:06 |
*** wuchunyang has joined #openstack-kolla | 13:07 | |
Fl1nt | mgoddard, how much longer is Train supposed to be supported in kolla? | 13:07 |
mgoddard | Fl1nt: it will enter extended maintenance in about 6 months | 13:08 |
yoctozepto | Fl1nt: but we can support it in em stage as long as we care | 13:08 |
yoctozepto | just with no new official releases | 13:08 |
Fl1nt | I need to evaluate whether it's worth the effort to migrate to ussuri right now and natively get TLS backend support, or to patch Train appropriately | 13:09 |
yoctozepto | I believe Train might be late because Ussuri breaks c7 compat | 13:09 |
yoctozepto | I would just go to Ussuri | 13:09 |
yoctozepto | you would have to upgrade anyhow | 13:09 |
Fl1nt | yeah, it's kind of an additional hurdle, but I want to migrate to C8 for the prod release. | 13:09 |
yoctozepto | mgoddard, mnasiadka: I have some cycles for upstream today and tomorrow - any priority stuff to look at? | 13:10 |
mgoddard | yoctozepto: docker pull limits | 13:11 |
mgoddard | yoctozepto: https://bugs.launchpad.net/kolla-ansible/+bug/1904062 | 13:11 |
openstack | Launchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,Triaged] | 13:11 |
yoctozepto | mgoddard: ack, have you talked to infra since then? | 13:11 |
mgoddard | yoctozepto: not yet | 13:11 |
yoctozepto | ok, then I will handle this | 13:12 |
Fl1nt | mgoddard, regarding the external ceph bug: pinning the host isn't actually recommended, and as mentioned there is a command to migrate volumes. | 13:13 |
Fl1nt | but tbh I don't completely get the issue; it's not that clear what the issue is. | 13:14 |
Fl1nt | oooh ok, I see. BTW, the train doc about external ceph is broken. | 13:15 |
mnasiadka | mgoddard: I can take the cinder bug, did an investigation yesterday. | 13:19 |
mgoddard | thanks mnasiadka | 13:20 |
yoctozepto | mgoddard: http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018817.html | 13:30 |
yoctozepto | thanks mnasiadka | 13:31 |
yoctozepto | mgoddard: also pinged infra on irc (#opendev) | 13:31 |
yoctozepto | let's see and I will coordinate this | 13:31 |
yoctozepto | any other urgent matters? | 13:31 |
mgoddard | nice, thanks yoctozepto | 13:31 |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: cinder: start using active-active for rbd https://review.opendev.org/763011 | 13:33 |
mgoddard | yoctozepto: I don't think so. I guess just victoria stabilisation | 13:33 |
mnasiadka | yeah, we could look at bugs targeted at victoria and just start closing them | 13:33 |
mnasiadka | I guess Kolla should be close to a first stable release for Victoria | 13:34 |
yoctozepto | that is what I wanted to do next, so we are aligned | 13:35 |
*** k_mouza has quit IRC | 13:41 | |
openstackgerrit | Merged openstack/kolla master: Bump up openstack exporter to 1.2.0 https://review.opendev.org/761123 | 13:45 |
*** dougsz has quit IRC | 13:48 | |
*** dougsz has joined #openstack-kolla | 14:17 | |
*** dougsz has quit IRC | 14:17 | |
*** dougsz has joined #openstack-kolla | 14:18 | |
*** Luzi has quit IRC | 14:18 | |
openstackgerrit | Merged openstack/kayobe-config-dev master: Sync configs with kayobe @ 074024d63f9cb364ca16a7a7f0ac94d77ee9466b https://review.opendev.org/762826 | 14:19 |
*** k_mouza has joined #openstack-kolla | 14:24 | |
*** k_mouza has quit IRC | 14:24 | |
*** k_mouza has joined #openstack-kolla | 14:24 | |
*** brinzhang_ has quit IRC | 14:26 | |
mnasiadka | yoctozepto: so in order to close a bug, I need to submit a feature with some distributed lock manager? :D | 14:41 |
mnasiadka | I see tripleo is using etcd | 14:41 |
yoctozepto | mnasiadka: not sure about our etcd either, sorry | 14:41 |
yoctozepto | mnasiadka: I mean its ha properties | 14:42 |
openstackgerrit | Mark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build https://review.opendev.org/761928 | 14:42 |
openstackgerrit | Mark Goddard proposed openstack/kolla master: Remove footer block from intermediate images https://review.opendev.org/763027 | 14:42 |
yoctozepto | mnasiadka: redis generally works better because of etcd driver quirks | 14:42 |
mnasiadka | well, we would need to enforce a coordination backend whenever the cinder backend is ceph | 14:42 |
yoctozepto | but really one needs to finally look at that lock mechanism | 14:42 |
yoctozepto | yeah, that too | 14:42 |
yoctozepto | sounds bad | 14:42 |
yoctozepto | the previous generally worked | 14:42 |
yoctozepto | so maybe for now we should just keep it | 14:43 |
mnasiadka | yoctozepto: it led to some duplications, because it was meant for active/passive | 14:45 |
yoctozepto | mnasiadka: could you expand that? | 14:46 |
yoctozepto | mnasiadka: I might want to know :D | 14:46 |
mnasiadka | https://docs.openstack.org/cinder/latest/contributor/high_availability.html#cinder-volume | 14:47 |
mnasiadka | check the attention :) | 14:47 |
yoctozepto | mnasiadka: yeah, that's why we should use *backend_host* and I am pretty sure we always did | 14:50 |
yoctozepto | let me check my deployment | 14:51 |
yoctozepto | mhm | 14:52 |
yoctozepto | though I think I set it | 14:53 |
*** TrevorV has joined #openstack-kolla | 14:53 | |
yoctozepto | mnasiadka: it only has some issues that mgoddard linked to: https://bugs.launchpad.net/cinder/+bug/1837403 | 14:55 |
openstack | Launchpad bug 1837403 in openstack-ansible trunk "CleanableInUse exceptions when doing large parallel operations (like snapshot creates)" [Undecided,New] | 14:55 |
yoctozepto | "large number of parallel Cinder operations" | 14:55 |
yoctozepto | I certainly do not have it | 14:55 |
yoctozepto | from cinder docs I can't tell why backend_host would be "hacky" | 14:56 |
mnasiadka | Well, we can just add it back, and work on coordination | 14:57 |
mnasiadka | It won’t be worse... | 14:57 |
yoctozepto | exactly | 14:59 |
yoctozepto | but we did not have it | 15:00 |
yoctozepto | so it is a problem for those moving from internal to external | 15:00 |
yoctozepto | so I guess this is a general issue against external | 15:00 |
yoctozepto | and not essentially its refactoring | 15:00 |
mgoddard | The most common deployment option for Cinder-Volume is as Active-Passive. This requires a common storage backend, the same Cinder backend configuration in all nodes, having the backend_host set on the backend sections, and using a high-availability cluster resource manager like Pacemaker. | 15:00 |
yoctozepto | yeah, the pacemaker sounds scary | 15:01 |
mgoddard | we deploy active/active | 15:01 |
yoctozepto | but it seems to work nonetheless without it | 15:01 |
yoctozepto | we deploy whatever | 15:01 |
yoctozepto | :D | 15:01 |
Fl1nt | Can someone explain this issue? I'm having a hard time finding what the problem is. | 15:08 |
Fl1nt | because I'm using an external CEPH cluster and everything is active-active from a cinder and nova standpoint. | 15:09 |
*** dougsz has quit IRC | 15:11 | |
Fl1nt | I'm currently transferring my whole kolla-config for our new deployment, which includes a fair amount of downstream patches, so I'll be able to give some examples and samples of how we did it if needed. | 15:11 |
*** cah_link has quit IRC | 15:12 | |
*** cah_link has joined #openstack-kolla | 15:13 | |
mnasiadka | mgoddard, yoctozepto: so what - just add backend_host back as part of the bugfix, and work on cluster/coordination? or do we want to do it properly (according to cinder) | 15:13 |
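For context, "adding backend_host back" refers to a cinder.conf backend section along these lines. The section name and pool are placeholders; `rbd:volumes` is shown as an illustration of the kind of shared value kolla-ansible historically templated:

```ini
# Hypothetical cinder.conf RBD backend with a shared backend_host.
# Section name, pool, and user are placeholders.
[rbd-1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-1
rbd_pool = volumes
rbd_user = cinder
# All cinder-volume instances report as the same host, so volumes are
# not pinned to whichever individual controller created them.
backend_host = rbd:volumes
```

Without this (or `cluster`), each cinder-volume reports under its own hostname, which is exactly how volumes end up stranded on a dead or renamed host.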
yoctozepto | mnasiadka: where does it say that we should use active/active? | 15:15 |
Fl1nt | just TBN: I'm not using backend_host and I have a working deployment. | 15:15 |
yoctozepto | it feels right to use it | 15:15 |
mnasiadka | yoctozepto: well, we deploy active active (multiple cinder-volumes) | 15:15 |
yoctozepto | Fl1nt: then using `cluster` perhaps? | 15:15 |
Fl1nt | neither | 15:15 |
Fl1nt | hold on, I'm redacting/pasting | 15:15 |
mnasiadka | yoctozepto: but the mechanism we use to present them as one, is a bit hacky :) | 15:15 |
yoctozepto | Fl1nt: well, then your HA is not really HA | 15:15 |
yoctozepto | mnasiadka: hmm, could be | 15:15 |
yoctozepto | mnasiadka: I felt like it worked like active/passive anyhow | 15:16 |
yoctozepto | because cinder claims to require coordination for active/active | 15:16 |
mnasiadka | yoctozepto: I guess it did, we could just set active/backup in haproxy | 15:16 |
yoctozepto | and I'm not running one | 15:16 |
Fl1nt | I swear it is, as all hosts use all cinder-volume nodes; hold on for the screenshots and conf | 15:16 |
mnasiadka | yoctozepto: as the bugfix | 15:16 |
yoctozepto | mnasiadka: haproxy does not care about cinder-volume | 15:16 |
mnasiadka | ah right | 15:16 |
yoctozepto | :-) | 15:16 |
mnasiadka | so then we can just go back to old hacky somewhat working solution | 15:17 |
mnasiadka | and close the bug | 15:17 |
*** dougsz has joined #openstack-kolla | 15:17 | |
mnasiadka | because cluster+coordination seems like it needs a fair amount of testing | 15:18 |
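The cluster+coordination alternative being weighed here amounts to two cinder.conf pieces: a shared cluster name plus a tooz DLM backend. A sketch, with the cluster name and the redis URL as placeholders (redis chosen per yoctozepto's remark that it works better than the etcd driver):

```ini
# Hypothetical active/active setup per the cinder HA docs: the same
# cluster name on every cinder-volume, plus a distributed lock manager.
[DEFAULT]
cluster = rbd_cluster

[coordination]
# tooz backend URL; host/port are placeholders.
backend_url = redis://192.0.2.10:6379
```

With this, cinder-volume services group into one cluster and use the DLM for the locking that active/active requires, instead of masquerading as a single host via backend_host.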
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible stable/ussuri: [baremetal]: Use $releasever in docker-ce repo https://review.opendev.org/762979 | 15:20 |
mgoddard | mnasiadka, yoctozepto: let's find out from Cinder what the potential issues with backend_host & active/active are | 15:21 |
mgoddard | mailing list time? | 15:21 |
yoctozepto | mgoddard: ok, good call | 15:23 |
yoctozepto | I wonder as it works just fine for me at the moment :D | 15:24 |
yoctozepto | (and really thought it somehow degraded itself to active/passive due to no coordination) | 15:24 |
yoctozepto | you might want to mention that no coordination is configured | 15:24 |
*** ysirndjuro has joined #openstack-kolla | 15:24 | |
mnasiadka | yoctozepto: if you feel super bad about images created by just downloading a file via curl, then we need to fix one third of the images? :) | 15:27 |
yoctozepto | mnasiadka: feel bad about increasing our debt :D | 15:31 |
mnasiadka | yoctozepto: well, then we need some better approach - any proposals? ;) | 15:31 |
Fl1nt | yoctozepto, ok, which config/screen do you need? | 15:32 |
Fl1nt | I've retrieved cinder-volume.conf | 15:32 |
Fl1nt | ceph.conf | 15:32 |
Fl1nt | and dash screen of the services and volumes distribution | 15:32 |
Fl1nt | http://paste.openstack.org/show/aM7Xsdim8oAHQXcfAUVY/ - cinder-volume.conf | 15:33 |
Fl1nt | http://paste.openstack.org/show/Tw0FnpLhdQN93EP1teSm/ - ceph.conf | 15:33 |
Fl1nt | and here is the volumes distribution: https://imgur.com/a/2CKptab | 15:35 |
Fl1nt | TBN: this is on a staging cluster, but still. | 15:35 |
Fl1nt | sorry for the blacked-out boxes, but I don't have a local distributed cluster to demo live yet, as I'm still waiting for some parts. | 15:36 |
mnasiadka | yoctozepto: we can use ADD, but it will not cache those downloads anyway | 15:38 |
Fl1nt | or you could use the packages from the distributions; I've done it that way as we don't have internet access. | 15:39 |
yoctozepto | Fl1nt: kill one cinder-volume to discover that you can no longer manage volumes linked to it | 15:39 |
yoctozepto | mnasiadka: I mean I would love to have this from real repos | 15:39 |
mnasiadka | yoctozepto: real repos... you want people writing 300 lines of code to go the extra mile and build rpms and debs? :) | 15:40 |
mnasiadka | it would probably take more time than writing the app :) | 15:40 |
yoctozepto | mnasiadka: yes | 15:40 |
yoctozepto | make them feel the pain | 15:40 |
Fl1nt | aaaaaaah THAT! ok, so the issue is about management, not distribution or availability. | 15:40 |
yoctozepto | Fl1nt: well, availability during failure is impacted | 15:41 |
yoctozepto | running vms are happy | 15:41 |
yoctozepto | but otherwise :-) | 15:41 |
mnasiadka | yoctozepto: get back on the ground now ;) | 15:41 |
Fl1nt | it's not; we already tested that: if you lose your storage controller you don't lose the VM workload and attachments, you just need to rebalance the managed volumes. | 15:41 |
yoctozepto | mnasiadka: yes, sir | 15:42 |
*** skramaja has quit IRC | 15:42 | |
yoctozepto | Fl1nt: yeah, it needs "rebalancing" | 15:42 |
yoctozepto | quirky but worky | 15:42 |
Fl1nt | yoctozepto, right, you're too quick to write :p but availability of the API calls isn't something that can't be restored real quick by migrating to another controller, so what's the point? unless your cluster only has one controller, which is far from serious. | 15:43 |
Fl1nt | I mean, rebalancing data is already a daily ops task with swift, so it's not really a biggie to have to do it for a cinder-volume controller. | 15:44 |
openstackgerrit | Mark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build https://review.opendev.org/761928 | 15:45 |
Fl1nt | BTW, from the cinder doc: "Active-Active is not yet supported." | 15:50 |
Fl1nt | so I guess that kind of settles the issue. | 15:51 |
mnasiadka | where is it stated? | 15:54 |
*** rouk has joined #openstack-kolla | 15:54 | |
Fl1nt | https://docs.openstack.org/cinder/ussuri/configuration/block-storage/samples/cinder.conf.html <- at the cluster directive level | 15:54 |
Fl1nt | tho it seems it became available in victoria | 15:55 |
Fl1nt | but as neither the official administration nor installation docs actually refer to it, and only OOO seems to "use it", I would be extra careful introducing the feature. | 15:57 |
mgoddard | we are using active/active | 15:58 |
mgoddard | always have | 15:59 |
Fl1nt | on victoria? | 15:59 |
mgoddard | the question is around how we do it | 15:59 |
mgoddard | if you are running ceph and more than one active cinder-volume, you have active/active | 15:59 |
Fl1nt | starting from which release? | 15:59 |
mgoddard | active/passive would require some fencing mechanism, such as pacemaker | 15:59 |
Fl1nt | mgoddard, I think we're not talking about the same thing. | 16:00 |
mgoddard | possibly | 16:00 |
Fl1nt | cinder-volume services are active/active in terms of requests; even with one down, as long as your request passes through the VIP you're safe | 16:01 |
Fl1nt | BUT | 16:01 |
Fl1nt | as yoctozepto noted, if you've got a cinder-volume agent down, all volumes attached BY that agent, and so referenced by it in the database, won't be manageable until you explicitly attach them to another up-and-running agent | 16:01 |
Fl1nt | using the openstack cli | 16:02 |
rouk | oh hey is this my bug being discussed? | 16:02 |
Fl1nt | or maybe through horizon, I haven't tested it. | 16:02 |
Fl1nt | rouk, depends ^^ | 16:02 |
rouk | https://bugs.launchpad.net/kolla-ansible/+bug/1904062 | 16:02 |
openstack | Launchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,In progress] - Assigned to Michal Nasiadka (mnasiadka) | 16:02 |
Fl1nt | rouk, yep | 16:03 |
rouk | is the backend_host method from before no longer recommended? im a bit out of the loop. | 16:03 |
Fl1nt | to be clear, rouk, your problem is that when you lose a cinder-volume agent, you have to migrate its volumes to another still-running one before being able to manage them again, right? | 16:04 |
rouk | that, and that every pre-ussuri volume is on the old host, which goes away on upgrade forever. | 16:04 |
rouk | so every existing volume needs a manual migration to a random host. | 16:04 |
Fl1nt | yes, using the openstack cli | 16:05 |
mgoddard | rouk: the old ones are assigned to the old backend_host | 16:05 |
mgoddard | when that gets removed, they need to be migrated to a real hostname | 16:05 |
rouk | mgoddard: that is correct, yeah. | 16:05 |
Fl1nt | the old backend_host vanished, if I understand rouk correctly | 16:05 |
rouk | Fl1nt: s/vanished/down/ | 16:05 |
Fl1nt | yes my point | 16:05 |
mgoddard | rouk: did you actively remove backend_host? | 16:06 |
rouk | i moved to the new templates for external_ceph, which involved trusting kolla to not brick my previously-recommended config :p | 16:06 |
mgoddard | right | 16:07 |
rouk | so right now im just overriding backend_host back in. | 16:07 |
*** wuchunyang has quit IRC | 16:07 | |
*** k_mouza has quit IRC | 16:07 | |
rouk | if theres a way to get individual host states without manual migrations, that would be cool though. | 16:07 |
Fl1nt | so you basically just have to cinder migrate <volume> <host> with a for loop | 16:08 |
Fl1nt | replace cinder with the appropriate openstack cmd if you're using the wrapper | 16:08 |
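The for-loop Fl1nt describes might look like the sketch below. `TARGET_HOST` and `VOLUME_IDS` are placeholders (in practice the IDs would come from `cinder list` filtered on the dead host), and the commands are echoed as a dry run so they can be reviewed before actually running them:

```shell
# Hypothetical "cinder migrate in a for loop" sketch; IDs and target
# host are placeholders. Echoed (dry run) rather than executed.
TARGET_HOST="controller02@rbd-1#rbd-1"
VOLUME_IDS="11111111-aaaa 22222222-bbbb"
for vol in $VOLUME_IDS; do
  echo cinder migrate "$vol" "$TARGET_HOST"
done
```

Dropping the `echo` would run the migrations for real; for the backend_host-removal case specifically, a bulk `cinder-manage volume update_host`-style rename may also be worth investigating rather than per-volume migration.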
rouk | yeah, which is <0 fun on routine maintenance, or random node deaths. | 16:08 |
Fl1nt | rouk, no no no | 16:08 |
Fl1nt | if your storage host dies and it is part of a ceph cluster | 16:09 |
Fl1nt | your VMs don't lose the volume | 16:09 |
Fl1nt | if your storage host dies, is part of a ceph cluster AND is hosting a cinder-volume agent | 16:09 |
rouk | if i take down a cinder-volume node for maint, and then someone, out of my hundreds of users, deletes a vm, and i didnt react instantly and migrate volumes as soon as the crash happened, i end up with a volume attached to a deleted instance. | 16:09 |
Fl1nt | then you "just" need to instruct openstack to delegate those volumes' management to another cinder-volume agent | 16:09 |
rouk | thats how i noticed this issue. | 16:10 |
rouk | user deleted vm, ended up with a stuck attached volume, cause cinder-volume didnt respond during the delete. | 16:10 |
Fl1nt | you then just have to put your volume in an available state, get rid of the attachment, and delete it or let it be available again | 16:10 |
Fl1nt | that's the way cinder/openstack works, it's not really a bug. | 16:11 |
rouk | which is manual intervention in hundreds of workflows which are often scripted and will require calls to like 20 people to get them to clean up. | 16:11 |
Fl1nt | well, welcome to openstack's non-management of orphaned resources ^^ | 16:11 |
mgoddard | Fl1nt: well, that's the way it works if you don't use backend_host or clustering | 16:12 |
mgoddard | Fl1nt: but surely with backend_host or clustering, you don't need to do that? | 16:12 |
rouk | backend_host just fixes this, and if clustering can do it too, while maintaining a record for each agent, that would be even better. | 16:12 |
Fl1nt | you can't use clustering until victoria, from my understanding of the doc (but I can be wrong), and using backend_host won't fix that specific orphaned resource issue. | 16:12 |
rouk | mgoddard: correct, the odds of a stuck volume with backend_host are only if a node dies and is sent commands in that 1 minute period before it's timed off rabbit, i think. | 16:13 |
rouk | it still happens, but its better than "till all volumes get migrated" | 16:13 |
rouk | presumably clustering would fix that last possible case. | 16:14 |
mgoddard | Fl1nt: OSA switched to active/active in Stein: https://opendev.org/openstack/openstack-ansible-os_cinder/commit/918b9077c816be5fc056637301265e0be2f245ab | 16:14 |
mgoddard | (after release) | 16:15 |
Fl1nt | rouk, are you running victoria? because until then you can't get cluster (stated in the doc), and even on victoria there's a lack of documentation. | 16:15 |
Fl1nt | mgoddard, yeah, but just because they activate something doesn't mean it necessarily works, and it requires pacemaker. | 16:15 |
rouk | Fl1nt: nah, im slow because of ties to FWaaS that i need to convince said hundreds of people to fix their 0/0 public ip security groups before i can upgrade. | 16:16 |
mgoddard | Fl1nt: no, pacemaker is for active/passive | 16:16 |
rouk | ussuri hitting prod for me friday. | 16:16 |
Fl1nt | mgoddard, ok, noted | 16:16 |
Fl1nt | I can test cluster directive on staging but I doubt it will work like that out of the box. | 16:17 |
rouk | i can get victoria onto my PTE env and start testing it some time after the new year, sadly. | 16:18 |
rouk | so im kinda useless on testing clustering. | 16:18 |
Fl1nt | mgoddard, how can they have a cinder a/a cluster in use with stein when the cinder configuration and the docs state that it is not yet supported even in ussuri? | 16:18 |
Fl1nt | https://docs.openstack.org/cinder/ussuri/configuration/block-storage/samples/cinder.conf.html | 16:18 |
Fl1nt | is there some kind of new "beta" phase for openstack features now, like kubernetes has? | 16:19 |
mnasiadka | well, we need some statement from Cinder team, how it should be done in Victoria and before ;-) | 16:20 |
Fl1nt | rouk, is your cluster hosting sensitive data? Do you have a proper backup solution in place? Because if you don't, I would not advise using the cluster feature until victoria and proper validation from the cinder maintainers on the distribution list. | 16:20 |
Fl1nt | mnasiadka, +10 | 16:20 |
mgoddard | Fl1nt: https://docs.openstack.org/releasenotes/cinder/rocky.html | 16:21 |
rouk | Fl1nt: im not doing anything in prod till i know its good. PTE is worthless to me, its big, but its designed to be nuked. | 16:21 |
rouk | for now, im going to keep using backend_host till theres a better option. | 16:21 |
Fl1nt | are you referring to this note? "Added support for active-active replication to the RBD driver. This allows users to configure multiple volume backends that are all a member of the same cluster participating in replication." | 16:22 |
*** k_mouza has joined #openstack-kolla | 16:23 | |
mgoddard | Fl1nt: yes | 16:24 |
Fl1nt | hmm, this is so vague that I can't tell whether it's referring to a ceph capability itself rather than the cinder-volume agent per se | 16:24 |
rouk | it doesnt make sense as a ceph statement... unless they mean setting up cross-cluster pool replication? but cinder doesnt do pool management, it expects pools to be there already, heh. | 16:26 |
Fl1nt | rouk, you can have multiple backends from the same ceph cluster and then even have additional mirroring on rbd | 16:26 |
Fl1nt | all in all, it needs to be tested and validated by the cinder team. | 16:27 |
Fl1nt | for instance, on our prod cluster, we have three different ceph backends participating in the same cluster, and cinder actually uses those three different backends. | 16:28 |
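A multi-backend layout like the one described, several RBD backends served from one Ceph cluster, looks roughly like this in cinder.conf. The backend and pool names are invented for illustration.

```ini
[DEFAULT]
enabled_backends = rbd-ssd,rbd-hdd,rbd-archive

# Each section is a separate cinder-volume backend, but all three point at
# pools in the same Ceph cluster (same ceph.conf).
[rbd-ssd]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-ssd
rbd_pool = volumes-ssd
rbd_ceph_conf = /etc/ceph/ceph.conf

[rbd-hdd]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-hdd
rbd_pool = volumes-hdd
rbd_ceph_conf = /etc/ceph/ceph.conf

[rbd-archive]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-archive
rbd_pool = volumes-archive
rbd_ceph_conf = /etc/ceph/ceph.conf
```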
mgoddard | here's the patch that added that reno: https://review.opendev.org/#/c/556658/ | 16:29 |
patchbot | patch 556658 - cinder - RBD: add support for active/active replication (MERGED) - 7 patch sets | 16:29 |
mgoddard | I think the word replication is a misnomer | 16:29 |
Fl1nt | thanks for the patch | 16:30 |
Fl1nt | so | 16:30 |
Fl1nt | it's a CEPH level feature, not a cinder-volume one. | 16:31 |
Fl1nt | it uses CEPH RBD mirroring | 16:31 |
*** cah_link has quit IRC | 16:32 | |
Fl1nt | hmm... actually, it's not even that | 16:35 |
Fl1nt | they're cloning images | 16:36 |
rouk | so i have a completely tangential question, which ive tried to get neutron to answer a few times, but never got anywhere and haven't had time to pursue it hard enough. since the train upgrade, ive had issues with routers not getting routes, and when one fails over, it's a dice roll whether the target has its assigned routes. could it just be the fwaas plugin for l3-agent slowly rotting? | 16:36 |
rouk | re-adding routes magically fixes the problem, but it only started happening in train. | 16:37 |
Fl1nt | Never had this issue sorry :( | 16:37 |
rouk | yeah it's nasty, it's in all my clusters, and nobody else's, and has no error, and no reproducible test case. | 16:38 |
rouk | must be fwaas since im apparently the only user left. | 16:38 |
rouk | Fl1nt: i agree, the code, and the commit message, and the merge request are uselessly vague, and they need to weigh in on the "right" solution. | 16:39 |
Fl1nt | mgoddard, look at the _disable_replication function, it's a ceph function to mirror a flattened image (volume in ceph vocabulary) | 16:40 |
Fl1nt | so basically everything replication related is based on this concept | 16:40 |
rouk | then yeah, thats not helpful for this, sadly. | 16:41 |
rouk | must be clustering for >V, backend_host for <V, but it would be nice to have their opinion on it. | 16:41 |
* Fl1nt reading the rbd driver... it's pretty interesting ^^ | 16:42 | |
rouk | its a lot shorter code than i expected, but everything with ceph is pretty smooth, so i guess not that unexpected. | 16:43 |
Fl1nt | it's not even clear from the driver itself, as everything named "volume" comes from the cinder library, but then they transform the volume back into an "image" when dealing with the ceph-related block. it's confusing ^^ | 16:43 |
rouk | yeah, needs a terminology fix, too many opinions on what something means. | 16:43 |
mgoddard | Fl1nt: I think that is unrelated. AFAICT, the cluster option, SUPPORTS_ACTIVE_ACTIVE driver flag etc relate to mapping of volumes to cinder-volume hosts | 16:44 |
mgoddard | volume replication is a ceph cluster concern | 16:44 |
Fl1nt | actually, I'd be able to tell you for sure if I downloaded the cinder source, as the IDE would then follow the appropriate references rather than guessing ^^ | 16:44 |
Fl1nt | with ceph there are, to my knowledge, three different replication features (geo-replication, rbd mirroring, and OSD replication of course), so it isn't that clear. the problem here is that the failover function fails if your volume isn't a replication-enabled RBD image. | 16:47 |
Fl1nt | which doesn't make sense | 16:48 |
Fl1nt | why would cinder-volume need to know about an image-level feature in order to do active/active? | 16:48 |
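The distinction mgoddard draws, volume-to-service mapping versus Ceph-level replication, comes down to RPC routing. The following is a toy sketch, not cinder code (the broker, topics, and class names are all invented for illustration): with per-host topics, requests for a dead host's volumes go nowhere, while a shared routing key (what backend_host or clustering effectively provides) lets any surviving service consume them.

```python
from collections import defaultdict

class Broker:
    """Toy RPC broker: a cast on a topic is handled by one live listener."""
    def __init__(self):
        self.topics = defaultdict(list)

    def subscribe(self, topic, service):
        self.topics[topic].append(service)

    def cast(self, topic, msg):
        listeners = [s for s in self.topics[topic] if s.alive]
        if not listeners:
            raise RuntimeError(f"no live consumer for {topic}")
        return listeners[0].handle(msg)

class VolumeService:
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive

    def handle(self, msg):
        return f"{self.name} handled {msg}"

broker = Broker()
a = VolumeService("node-a", alive=False)  # this node has died
b = VolumeService("node-b")

# Default behaviour: each service listens only on its own host's topic, so a
# volume "owned" by node-a is unreachable (orphaned) until migrated.
broker.subscribe("cinder-volume.node-a@rbd", a)
broker.subscribe("cinder-volume.node-b@rbd", b)

# With a shared identity (backend_host) or a cluster topic, both services
# listen on one key, so the survivor picks up requests for any volume.
broker.subscribe("cinder-volume.shared@rbd", a)
broker.subscribe("cinder-volume.shared@rbd", b)
print(broker.cast("cinder-volume.shared@rbd", "delete volume X"))
# → node-b handled delete volume X
```

Nothing in this routing picture depends on RBD image features, which is why requiring a replication-enabled image for failover reads as a separate, Ceph-level concern.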
*** nikparasyr has left #openstack-kolla | 16:49 | |
mgoddard | https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cinder-volume-active-active-support.html | 16:52 |
*** bengates has quit IRC | 16:56 | |
*** muhaha has joined #openstack-kolla | 16:57 | |
Fl1nt | mgoddard, thanks, gonna dive deeper tomorrow, as from the review patch I can't really tell what they're doing; there are function calls to external modules that I can't follow without having the code. | 17:06 |
*** Fl1nt has quit IRC | 17:11 | |
*** cah_link has joined #openstack-kolla | 17:12 | |
*** cah_link has quit IRC | 17:15 | |
*** e0ne has quit IRC | 17:22 | |
*** kevko has quit IRC | 17:23 | |
*** rpittau is now known as rpittau|afk | 17:27 | |
*** k_mouza has quit IRC | 17:30 | |
*** dougsz has quit IRC | 17:32 | |
*** kevko has joined #openstack-kolla | 17:42 | |
*** k_mouza has joined #openstack-kolla | 17:50 | |
*** gfidente is now known as gfidente|afk | 17:59 | |
*** mgoddard has quit IRC | 18:02 | |
*** k_mouza has quit IRC | 18:10 | |
*** k_mouza has joined #openstack-kolla | 18:19 | |
*** k_mouza has quit IRC | 18:56 | |
*** mgoddard has joined #openstack-kolla | 19:21 | |
*** kevko has quit IRC | 19:44 | |
*** k_mouza has joined #openstack-kolla | 19:57 | |
*** k_mouza has quit IRC | 20:01 | |
*** k_mouza has joined #openstack-kolla | 20:16 | |
*** k_mouza has quit IRC | 20:21 | |
*** mgoddard has quit IRC | 20:31 | |
*** TrevorV has quit IRC | 20:50 | |
*** gfidente|afk is now known as gfidente | 21:04 | |
*** hjensas_ has joined #openstack-kolla | 21:06 | |
*** hjensas has quit IRC | 21:10 | |
*** jovial[m] has quit IRC | 21:41 | |
*** muhaha has quit IRC | 21:47 | |
*** jovial[m] has joined #openstack-kolla | 22:05 | |
openstackgerrit | James Kirsch proposed openstack/kolla master: Add LetsEncrypt images for cert request/renewal https://review.opendev.org/741339 | 22:31 |
*** hjensas__ has joined #openstack-kolla | 22:47 | |
*** hjensas_ has quit IRC | 22:51 | |
*** quasar` is now known as parallax | 23:03 | |
*** parallax has left #openstack-kolla | 23:07 | |
*** parallax has joined #openstack-kolla | 23:15 | |
*** Arador has joined #openstack-kolla | 23:36 | |
Arador | Hello, new here and new to Kolla, but I have been using Openstack for a while. Is anyone here familiar with getting the Adjutant role to work? I am getting an error that says "no filter named 'customise_fluentd'" | 23:40 |
*** mloza has quit IRC | 23:46 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!