openstackgerrit | Merged openstack/kayobe stable/train: Fix filtering of network names set to an empty string https://review.opendev.org/761331 | 00:09 |
*** k_mouza has joined #openstack-kolla | 00:45 | |
*** k_mouza has quit IRC | 00:49 | |
*** kevko has joined #openstack-kolla | 00:56 | |
*** zzzeek has quit IRC | 01:06 | |
*** zzzeek has joined #openstack-kolla | 01:09 | |
*** xinliang has joined #openstack-kolla | 01:15 | |
*** xinliang has quit IRC | 01:31 | |
openstackgerrit | Merged openstack/kolla-ansible stable/victoria: CI: add missing --fail argument to curl https://review.opendev.org/762745 | 02:38 |
*** skramaja has joined #openstack-kolla | 02:59 | |
*** kevko has quit IRC | 03:05 | |
*** vishalmanchanda has joined #openstack-kolla | 03:52 | |
*** sri_ has quit IRC | 04:07 | |
*** sri_ has joined #openstack-kolla | 04:11 | |
*** vkmc has quit IRC | 04:16 | |
*** vkmc has joined #openstack-kolla | 04:16 | |
*** k_mouza has joined #openstack-kolla | 04:46 | |
*** johnsom has quit IRC | 04:46 | |
*** johnsom has joined #openstack-kolla | 04:49 | |
*** k_mouza has quit IRC | 04:50 | |
*** stackedsax has quit IRC | 05:13 | |
*** stackedsax has joined #openstack-kolla | 05:13 | |
*** johnsom has quit IRC | 05:18 | |
*** johnsom has joined #openstack-kolla | 05:19 | |
*** zzzeek has quit IRC | 05:25 | |
*** zzzeek has joined #openstack-kolla | 05:27 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-kolla | 05:33 | |
*** zzzeek has quit IRC | 05:50 | |
*** zzzeek has joined #openstack-kolla | 05:51 | |
*** JamesBenson has quit IRC | 05:53 | |
*** zzzeek has quit IRC | 06:10 | |
*** zzzeek has joined #openstack-kolla | 06:11 | |
*** johnsom has quit IRC | 06:27 | |
*** johnsom has joined #openstack-kolla | 06:27 | |
*** gfidente|afk is now known as gfidente | 06:42 | |
*** cah_link has joined #openstack-kolla | 06:47 | |
*** zzzeek has quit IRC | 07:14 | |
*** zzzeek has joined #openstack-kolla | 07:16 | |
*** rm_work has quit IRC | 07:26 | |
*** jbadiapa has joined #openstack-kolla | 07:27 | |
*** rm_work has joined #openstack-kolla | 07:28 | |
*** wuchunyang has joined #openstack-kolla | 07:35 | |
*** zzzeek has quit IRC | 07:37 | |
*** zzzeek has joined #openstack-kolla | 07:38 | |
*** nikparasyr has joined #openstack-kolla | 07:46 | |
*** rpittau|afk is now known as rpittau | 08:05 | |
yoctozepto | dcapone2004: one would have to switch to using role-based hostnames to have this working nicely | 08:10 |
mnasiadka | morning | 08:18 |
*** pescobar has quit IRC | 08:21 | |
*** Fl1nt has joined #openstack-kolla | 08:22 | |
Fl1nt | Good morning everyone! | 08:22 |
Fl1nt | sorry for vanishing, but I've been busy at work :) | 08:23 |
mnasiadka | we are all busy at work | 08:25 |
*** zzzeek has quit IRC | 08:26 | |
*** pescobar has joined #openstack-kolla | 08:27 | |
*** bengates has joined #openstack-kolla | 08:28 | |
*** zzzeek has joined #openstack-kolla | 08:30 | |
*** zzzeek has quit IRC | 08:38 | |
*** zzzeek has joined #openstack-kolla | 08:40 | |
*** zzzeek has quit IRC | 08:48 | |
Fl1nt | yeah! fortunately, it's all good for the community, as I'm working to set up an internal advocacy division around OpenStack / Kolla / Ansible etc. at work. | 08:49 |
Fl1nt | BTW, quick question: when we activate external TLS, it creates an appropriate automatic redirect for horizon but not for keystone and the other services, is there a reason? Similarly, when dealing with SSO using a SAML2 IdP endpoint, and especially ADFS | 08:51 |
Fl1nt | this one refuses to send claims back to non-TLS endpoints | 08:51 |
*** zzzeek has joined #openstack-kolla | 08:51 | |
Fl1nt | meaning the returnTo parameter Apache forges ends up non-TLS | 08:51 |
*** mgoddard has joined #openstack-kolla | 08:51 | |
Fl1nt | meaning we're missing an SSLEngine on option in the keystone Apache public virtualhost section | 08:52 |
Fl1nt | I'm currently testing it, but if someone has any clue on this ^^ | 08:52 |
*** bengates has quit IRC | 08:59 | |
*** k_mouza has joined #openstack-kolla | 09:00 | |
*** bengates has joined #openstack-kolla | 09:01 | |
*** k_mouza has quit IRC | 09:05 | |
*** kevko has joined #openstack-kolla | 09:18 | |
*** sean-k-mooney has quit IRC | 09:27 | |
*** sean-k-mooney has joined #openstack-kolla | 09:27 | |
*** slunav has quit IRC | 09:28 | |
*** mgoddard has quit IRC | 09:29 | |
*** sluna has joined #openstack-kolla | 09:31 | |
Fl1nt | aaaah fuuuu** it's actually available on Train+ release. | 09:33 |
Fl1nt | all right. | 09:33 |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible stable/ussuri: [baremetal]: Use $releasever in docker-ce repo https://review.opendev.org/762979 | 09:35 |
Fl1nt | mnasiadka, why would you use docker_yum_baseurl when you can just use the already-templated docker-ce.repo file: https://download.docker.com/linux/centos/docker-ce.repo ? | 09:39 |
mnasiadka | Fl1nt: feel free to propose a change to master, I'm not putting any more cycles into that ;) | 09:40 |
Fl1nt | no, I mean, I'm looking for the reasoning behind that. | 09:41 |
mnasiadka | Fl1nt: reasoning is so it's easy to backport | 09:41 |
Fl1nt | just curious pp | 09:41 |
Fl1nt | ok | 09:41 |
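The $releasever change under review and the upstream docker-ce.repo file Fl1nt links rely on the same yum variable substitution. A sketch of the relevant stanza, mirroring the upstream repo file (treat the exact URLs as illustrative):

```ini
# Sketch of a docker-ce yum repo stanza using $releasever/$basearch,
# as in the upstream docker-ce.repo file. URLs are illustrative.
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
```

On a CentOS 7 host yum expands `$releasever` to `7`, so the same stanza works across releases without hardcoding the version.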
*** bengates has quit IRC | 09:46 | |
*** bengates has joined #openstack-kolla | 09:47 | |
Fl1nt | Still working on SSO using SAML2 within k-a; this will be a big one, the amount of configuration and logic is pretty large. | 09:49 |
Fl1nt | and I've finished the cloudkitty role fix for Elasticsearch and Prometheus, I just need time to push the patch. | 09:50 |
*** bengates has quit IRC | 09:51 | |
*** bengates has joined #openstack-kolla | 09:54 | |
*** zzzeek has quit IRC | 09:57 | |
*** zzzeek has joined #openstack-kolla | 10:02 | |
*** wuchunyang has quit IRC | 10:16 | |
*** mgoddard has joined #openstack-kolla | 10:19 | |
*** jpward has quit IRC | 10:20 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla master: prometheus: Add OVN and OVS exporters https://review.opendev.org/762986 | 10:34 |
mgoddard | Fl1nt: Apache is always non-TLS, unless you are using backend TLS (Ussuri+) | 10:35 |
mgoddard | HAproxy terminates TLS | 10:35 |
mgoddard | Fl1nt: and horizon has a redirect because the port is different for HTTP | 10:36 |
mgoddard | plus it's more of a user/browser facing service | 10:36 |
*** zzzeek has quit IRC | 10:38 | |
*** zzzeek has joined #openstack-kolla | 10:40 | |
Fl1nt | actually, when dealing with ADFS (SAML 2.0) you need keystone TLS backend, as Apache (mod_auth_mellon) otherwise creates a returnTo URL that uses non-TLS, and a non-TLS returnTo doesn't work | 10:40 |
Fl1nt | example: | 10:41 |
Fl1nt | when your user calls /auth/login using WebSSO, it calls https://<fqdn>:5000/v3/auth/OS-FEDERATION/identity-providers/adfs/protocols/saml2/websso?origin=https://<fqdn>/auth/websso | 10:44 |
mnasiadka | Fl1nt: tried just setting ServerName directive in wsgi config? | 10:47 |
Fl1nt | yep, doesn't work either. | 10:47 |
mnasiadka | works for me | 10:48 |
Fl1nt | ServerName <fqdn> at the VirtualHost level, just above the WSGI config? | 10:48 |
Fl1nt | I'll check everything again. | 10:49 |
*** k_mouza has joined #openstack-kolla | 10:51 | |
*** kwazar is now known as quasar_ | 10:55 | |
*** quasar_ is now known as quasar | 10:55 | |
*** quasar is now known as quasar` | 10:56 | |
Fl1nt | ok, so, just to clarify the issue. | 11:04 |
Fl1nt | when using a normal kolla train branch, only enabling kolla_enable_external_tls | 11:05 |
Fl1nt | at some point | 11:05 |
Fl1nt | the SP (keystone/mod_auth_mellon) crafts the relayState URL | 11:05 |
Fl1nt | that URL should be a TLS endpoint using your now-enabled public TLS endpoint for keystone: https://<fqdn>:5000/v3 | 11:06 |
Fl1nt | however | 11:06 |
Fl1nt | and it's where I'm getting lost | 11:06 |
Fl1nt | for some reason | 11:06 |
Fl1nt | apache isn't using our originURL=https://<fqdn>:5000/v3 value | 11:06 |
Fl1nt | but the non-tls equivalent | 11:07 |
Fl1nt | I've tried to add an http-to-https redirect at the haproxy level by adding a keystone_public_redirect section within the keystone services dict for haproxy; it works, but then it creates a loop: the relayState parameter gets redirected, but the new request still carries non-TLS parameters, etc. | 11:08 |
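For reference, the haproxy-level redirect Fl1nt experimented with might look like the fragment below. The frontend name and addresses are placeholders taken from the discussion, not kolla-generated config; the comment notes why it loops:

```
frontend keystone_public_redirect
    bind 192.0.2.250:5000
    mode http
    # Rewrites only the scheme of the request URL; any originURL or
    # relayState query parameter still carries its original http://
    # value, so the SAML flow bounces back to a non-TLS URL and the
    # redirect repeats.
    redirect scheme https code 301 if !{ ssl_fc }
```

This is why a scheme redirect alone can't fix the mellon flow: the non-TLS URL is embedded in the SAML parameters, not just the request line.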
*** e0ne has joined #openstack-kolla | 11:08 | |
Fl1nt | so, for now, my conclusion is: until the Apache virtualhost uses TLS (SSLEngine on), the mod_auth_mellon context can't craft a TLS relayState URL, as its location composition (VirtualHost context) isn't TLS-based. | 11:10 |
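That conclusion can be sketched as a hypothetical keystone public vhost fragment. The FQDN, certificate paths, and the mellon Location block are placeholders, not kolla's rendered template; the point is that with `SSLEngine on` (and/or a scheme-qualified `ServerName`, as mnasiadka suggests), Apache's self-referencing URLs, and hence mellon's relayState/returnTo, come out as https:

```apache
# Hypothetical keystone public vhost; FQDN and cert paths are placeholders.
<VirtualHost *:5000>
    # A scheme-qualified ServerName makes Apache build https
    # self-referencing URLs even behind a proxy.
    ServerName https://keystone.example.org:5000

    # With SSLEngine on, the VirtualHost context is TLS-based, so
    # mod_auth_mellon composes TLS relayState/returnTo URLs.
    SSLEngine on
    SSLCertificateFile    /etc/keystone/certs/keystone.pem
    SSLCertificateKeyFile /etc/keystone/certs/keystone-key.pem

    WSGIScriptAlias / /var/www/cgi-bin/keystone/keystone-wsgi-public

    <Location /v3/auth/OS-FEDERATION/identity-providers/adfs/protocols/saml2/websso>
        MellonEnable auth
        # ... MellonSPMetadataFile, MellonIdPMetadataFile, etc.
    </Location>
</VirtualHost>
```

This is essentially what the Ussuri backend-TLS work provides out of the box.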
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: WIP: Add OVN and OVS exporter deployment https://review.opendev.org/762992 | 11:12 |
*** wuchunyang has joined #openstack-kolla | 11:15 | |
*** zzzeek has quit IRC | 11:17 | |
*** zzzeek has joined #openstack-kolla | 11:20 | |
*** zzzeek has quit IRC | 11:27 | |
*** zzzeek has joined #openstack-kolla | 11:28 | |
*** stingrayza has quit IRC | 11:29 | |
*** stingrayza has joined #openstack-kolla | 11:31 | |
Fl1nt | ok, so someone else is having the same issue: https://github.com/latchset/mod_auth_mellon/issues/27 and happily there is a mellon diagnostics directive to enable more verbose logging. | 11:35 |
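The diagnostics directive mentioned here is, to the best of my knowledge, enabled like this (paths are placeholders, and it only works if the module was built with `--enable-diagnostics`):

```apache
# Hypothetical fragment enabling mod_auth_mellon's diagnostics log.
# Requires a module compiled with --enable-diagnostics; log path is a
# placeholder.
MellonDiagnosticsEnable On
MellonDiagnosticsFile /var/log/httpd/mellon_diagnostics.log
```

As noted later in the discussion, distro packages are often built without diagnostics support, in which case these directives are unavailable.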
openstackgerrit | Mark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build https://review.opendev.org/761928 | 11:42 |
openstackgerrit | Mark Goddard proposed openstack/kolla master: CI: add templated Dockerfiles to build logs https://review.opendev.org/762997 | 11:42 |
*** brinzhang0 has joined #openstack-kolla | 11:50 | |
*** brinzhang_ has quit IRC | 11:53 | |
*** brinzhang_ has joined #openstack-kolla | 11:55 | |
*** brinzhang0 has quit IRC | 11:58 | |
*** mgoddard has quit IRC | 12:01 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla master: prometheus: Add OVN and OVS exporters https://review.opendev.org/762986 | 12:10 |
*** wuchunyang has quit IRC | 12:11 | |
*** JamesBenson has joined #openstack-kolla | 12:12 | |
*** mgoddard has joined #openstack-kolla | 12:15 | |
*** jpward has joined #openstack-kolla | 12:20 | |
*** Luzi has joined #openstack-kolla | 12:36 | |
*** stingrayza has quit IRC | 12:37 | |
*** zzzeek has quit IRC | 12:39 | |
*** zzzeek has joined #openstack-kolla | 12:41 | |
openstackgerrit | Michal Nasiadka proposed openstack/kolla master: prometheus: Add OVN exporter https://review.opendev.org/762986 | 12:45 |
Fl1nt | ok, I managed to redeploy our staging environment in order to validate my assertion. | 12:56 |
Fl1nt | so | 12:56 |
Fl1nt | our ADFS refuses to send claims to non-TLS assertion consumer service URLs (postResponse); meanwhile, if I activate kolla_enable_external_tls, then VIP:80 redirects to VIP:443, but VIP:5000 doesn't provide a TLS endpoint until you either declare a keystone_external_redirect in haproxy or enable TLS backend on the keystone Apache vhost. | 12:58 |
Fl1nt | now my question to you mnasiadka is, do you use a Train release or a Ussuri one? | 12:59 |
mnasiadka | Ussuri | 12:59 |
Fl1nt | TBN: the CentOS-provided mod_auth_mellon comes without mellon_diagnostics compiled in, so no chance ^^ | 12:59 |
Fl1nt | ah ok, so that's why it works | 13:00 |
Fl1nt | ok, and finally: has anyone already successfully made keystone SAML 2.0 federated authentication using mod_mellon work on a Train release? | 13:01 |
*** dougsz has joined #openstack-kolla | 13:01 | |
Fl1nt | from my tests, the missing part is all the backend TLS work done in ussuri: when federating against a SAML2 endpoint (mainly ADFS) you need the whole communication channel to be TLS, as your client (web browser, mostly) needs to connect to keystone directly using TLS, but keystone's mod_mellon won't craft a TLS endpoint until your Apache vhost is using TLS. | 13:04 |
Fl1nt | more accurately, mod_mellon won't create an appropriate relayState URL | 13:05 |
Fl1nt | and so leads you to this redirect loop nightmare | 13:06 |
Fl1nt | because haproxy translates the request from HTTP to HTTPS, but the request parameter continues to be a non-TLS originURL/relayState URL, etc. | 13:06 |
*** wuchunyang has joined #openstack-kolla | 13:07 | |
Fl1nt | mgoddard, how much longer is Train supposed to be supported in kolla? | 13:07 |
mgoddard | Fl1nt: it will enter extended maintenance in about 6 months | 13:08 |
yoctozepto | Fl1nt: but we can support it in em stage as long as we care | 13:08 |
yoctozepto | just with no new official releases | 13:08 |
Fl1nt | I need to evaluate whether it's worth the effort to migrate to ussuri right now and natively get TLS backend support, or to patch Train appropriately | 13:09 |
yoctozepto | I believe Train might be late because Ussuri breaks c7 compat | 13:09 |
yoctozepto | I would just go to Ussuri | 13:09 |
yoctozepto | you would have to upgrade anyhow | 13:09 |
Fl1nt | yeah, it's kind of an additional hurdle, but I want to migrate to C8 for the prod release. | 13:09 |
yoctozepto | mgoddard, mnasiadka: I have some cycles for upstream today and tomorrow - any priority stuff to look at? | 13:10 |
mgoddard | yoctozepto: docker pull limits | 13:11 |
mgoddard | yoctozepto: https://bugs.launchpad.net/kolla-ansible/+bug/1904062 | 13:11 |
openstack | Launchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,Triaged] | 13:11 |
yoctozepto | mgoddard: ack, have you talked to infra since then? | 13:11 |
mgoddard | yoctozepto: not yet | 13:11 |
yoctozepto | ok, then I will handle this | 13:12 |
Fl1nt | mgoddard, regarding the external ceph bug: pinning the host isn't actually recommended, and as mentioned there is a command to migrate volumes. | 13:13 |
Fl1nt | but tbh I don't completely get the issue; it's not that clear what the issue is. | 13:14 |
Fl1nt | oooh ok, I see. BTW, the train doc about external ceph is broken. | 13:15 |
mnasiadka | mgoddard: I can take the cinder bug, did an investigation yesterday. | 13:19 |
mgoddard | thanks mnasiadka | 13:20 |
yoctozepto | mgoddard: http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018817.html | 13:30 |
yoctozepto | thanks mnasiadka | 13:31 |
yoctozepto | mgoddard: also pinged infra on irc (#opendev) | 13:31 |
yoctozepto | let's see and I will coordinate this | 13:31 |
yoctozepto | any other urgent matters? | 13:31 |
mgoddard | nice, thanks yoctozepto | 13:31 |
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible master: cinder: start using active-active for rbd https://review.opendev.org/763011 | 13:33 |
mgoddard | yoctozepto: I don't think so. I guess just victoria stabilisation | 13:33 |
mnasiadka | yeah, we could look at bugs targeted at victoria and just start closing them | 13:33 |
mnasiadka | I guess Kolla should be close to a first stable release for Victoria | 13:34 |
yoctozepto | that is what I wanted to do next, so we are aligned | 13:35 |
*** k_mouza has quit IRC | 13:41 | |
openstackgerrit | Merged openstack/kolla master: Bump up openstack exporter to 1.2.0 https://review.opendev.org/761123 | 13:45 |
*** dougsz has quit IRC | 13:48 | |
*** dougsz has joined #openstack-kolla | 14:17 | |
*** dougsz has quit IRC | 14:17 | |
*** dougsz has joined #openstack-kolla | 14:18 | |
*** Luzi has quit IRC | 14:18 | |
openstackgerrit | Merged openstack/kayobe-config-dev master: Sync configs with kayobe @ 074024d63f9cb364ca16a7a7f0ac94d77ee9466b https://review.opendev.org/762826 | 14:19 |
*** k_mouza has joined #openstack-kolla | 14:24 | |
*** k_mouza has quit IRC | 14:24 | |
*** k_mouza has joined #openstack-kolla | 14:24 | |
*** brinzhang_ has quit IRC | 14:26 | |
mnasiadka | yoctozepto: so in order to close a bug, I need to submit a feature with some distributed lock manager? :D | 14:41 |
mnasiadka | I see tripleo is using etcd | 14:41 |
yoctozepto | mnasiadka: not sure about our etcd either, sorry | 14:41 |
yoctozepto | mnasiadka: I mean its ha properties | 14:42 |
openstackgerrit | Mark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build https://review.opendev.org/761928 | 14:42 |
openstackgerrit | Mark Goddard proposed openstack/kolla master: Remove footer block from intermediate images https://review.opendev.org/763027 | 14:42 |
yoctozepto | mnasiadka: redis generally works better because of etcd driver quirks | 14:42 |
mnasiadka | well, we would need to enforce a coordination backend whenever the cinder backend is ceph | 14:42 |
yoctozepto | but really one needs to finally look at that lock mechanism | 14:42 |
yoctozepto | yeah, that too | 14:42 |
yoctozepto | sounds bad | 14:42 |
yoctozepto | the previous generally worked | 14:42 |
yoctozepto | so maybe for now we should just keep it | 14:43 |
mnasiadka | yoctozepto: it led to some duplications, because it was meant for active/passive | 14:45 |
yoctozepto | mnasiadka: could you expand that? | 14:46 |
yoctozepto | mnasiadka: I might want to know :D | 14:46 |
mnasiadka | https://docs.openstack.org/cinder/latest/contributor/high_availability.html#cinder-volume | 14:47 |
mnasiadka | check the attention :) | 14:47 |
yoctozepto | mnasiadka: yeah, that's why we should use *backend_host* and I am pretty sure we always did | 14:50 |
yoctozepto | let me check my deployment | 14:51 |
yoctozepto | mhm | 14:52 |
yoctozepto | though I think I set it | 14:53 |
*** TrevorV has joined #openstack-kolla | 14:53 | |
yoctozepto | mnasiadka: it only has some issues that mgoddard linked to: https://bugs.launchpad.net/cinder/+bug/1837403 | 14:55 |
openstack | Launchpad bug 1837403 in openstack-ansible trunk "CleanableInUse exceptions when doing large parallel operations (like snapshot creates)" [Undecided,New] | 14:55 |
yoctozepto | "large number of parallel Cinder operations" | 14:55 |
yoctozepto | I certainly do not have it | 14:55 |
yoctozepto | from cinder docs I can't tell why backend_host would be "hacky" | 14:56 |
mnasiadka | Well, we can just add it back, and work on coordination | 14:57 |
mnasiadka | It won’t be worse... | 14:57 |
yoctozepto | exactly | 14:59 |
yoctozepto | but we did not have it | 15:00 |
yoctozepto | so it is a problem for those moving from internal to external | 15:00 |
yoctozepto | so I guess this is a general issue against external | 15:00 |
yoctozepto | and not essentially its refactoring | 15:00 |
mgoddard | The most common deployment option for Cinder-Volume is as Active-Passive. This requires a common storage backend, the same Cinder backend configuration in all nodes, having the backend_host set on the backend sections, and using a high-availability cluster resource manager like Pacemaker. | 15:00 |
yoctozepto | yeah, the pacemaker sounds scary | 15:01 |
mgoddard | we deploy active/active | 15:01 |
yoctozepto | but it seems to work nonetheless without it | 15:01 |
yoctozepto | we deploy whatever | 15:01 |
yoctozepto | :D | 15:01 |
Fl1nt | Can someone explain this issue? I'm having a hard time finding what the problem is. | 15:08 |
Fl1nt | because I'm using an external CEPH cluster and everything is active-active from a cinder and nova standpoint. | 15:09 |
*** dougsz has quit IRC | 15:11 | |
Fl1nt | I'm currently transferring my whole kolla-config for our new deployment, which includes a fair amount of downstream patches, so I'll be able to give some examples and samples of how we did it if needed. | 15:11 |
*** cah_link has quit IRC | 15:12 | |
*** cah_link has joined #openstack-kolla | 15:13 | |
mnasiadka | mgoddard, yoctozepto: so what - just add backend_host back as part of the bugfix, and work on cluster/coordination? or do we want to do it properly (according to cinder) | 15:13 |
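For context, "adding backend_host back" refers to a cinder.conf backend section along these lines. The section name and pool are placeholders; `rbd:volumes` is shown as an illustration of the kind of shared value kolla-ansible historically templated:

```ini
# Hypothetical cinder.conf RBD backend with a shared backend_host.
# Section name, pool, and user are placeholders.
[rbd-1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-1
rbd_pool = volumes
rbd_user = cinder
# All cinder-volume instances report as the same host, so volumes are
# not pinned to whichever individual controller created them.
backend_host = rbd:volumes
```

Without this (or `cluster`), each cinder-volume reports under its own hostname, which is exactly how volumes end up stranded on a dead or renamed host.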
yoctozepto | mnasiadka: where does it say that we should use active/active? | 15:15 |
Fl1nt | just TBN: I'm not using backend_host and I have a working deployment. | 15:15 |
yoctozepto | it feels right to use it | 15:15 |
mnasiadka | yoctozepto: well, we deploy active active (multiple cinder-volumes) | 15:15 |
yoctozepto | Fl1nt: then using `cluster` perhaps? | 15:15 |
Fl1nt | neither | 15:15 |
Fl1nt | hold on, I'm redacting/pasting | 15:15 |
mnasiadka | yoctozepto: but the mechanism we use to present them as one, is a bit hacky :) | 15:15 |
yoctozepto | Fl1nt: well, then your HA is not really HA | 15:15 |
yoctozepto | mnasiadka: hmm, could be | 15:15 |
yoctozepto | mnasiadka: I felt like it worked like active/passive anyhow | 15:16 |
yoctozepto | because cinder claims to require coordination for active/active | 15:16 |
mnasiadka | yoctozepto: I guess it did, we could just set active/backup in haproxy | 15:16 |
yoctozepto | and I'm not running one | 15:16 |
Fl1nt | I swear it is, as all hosts use all cinder-volume nodes; hold on for the screenshots and conf | 15:16 |
mnasiadka | yoctozepto: as the bugfix | 15:16 |
yoctozepto | mnasiadka: haproxy does not care about cinder-volume | 15:16 |
mnasiadka | ah right | 15:16 |
yoctozepto | :-) | 15:16 |
mnasiadka | so then we can just go back to old hacky somewhat working solution | 15:17 |
mnasiadka | and close the bug | 15:17 |
*** dougsz has joined #openstack-kolla | 15:17 | |
mnasiadka | because cluster+coordination seems like it needs a fair amount of testing | 15:18 |
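The cluster+coordination alternative being weighed here amounts to two cinder.conf pieces: a shared cluster name plus a tooz DLM backend. A sketch, with the cluster name and the redis URL as placeholders (redis chosen per yoctozepto's remark that it works better than the etcd driver):

```ini
# Hypothetical active/active setup per the cinder HA docs: the same
# cluster name on every cinder-volume, plus a distributed lock manager.
[DEFAULT]
cluster = rbd_cluster

[coordination]
# tooz backend URL; host/port are placeholders.
backend_url = redis://192.0.2.10:6379
```

With this, cinder-volume services group into one cluster and use the DLM for the locking that active/active requires, instead of masquerading as a single host via backend_host.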
openstackgerrit | Michal Nasiadka proposed openstack/kolla-ansible stable/ussuri: [baremetal]: Use $releasever in docker-ce repo https://review.opendev.org/762979 | 15:20 |
mgoddard | mnasiadka, yoctozepto: let's find out from Cinder what the potential issues with backend_host & active/active are | 15:21 |
mgoddard | mailing list time? | 15:21 |
yoctozepto | mgoddard: ok, good call | 15:23 |
yoctozepto | I wonder as it works just fine for me at the moment :D | 15:24 |
yoctozepto | (and really thought it somehow degraded itself to active/passive due to no coordination) | 15:24 |
yoctozepto | you might want to mention that no coordination is configured | 15:24 |
*** ysirndjuro has joined #openstack-kolla | 15:24 | |
mnasiadka | yoctozepto: if you feel super bad about images created by just downloading a file via curl, then we need to fix one third of the images? :) | 15:27 |
yoctozepto | mnasiadka: feel bad about increasing our debt :D | 15:31 |
mnasiadka | yoctozepto: well, then we need some better approach - any proposals? ;) | 15:31 |
Fl1nt | yoctozepto, ok, which config/screen do you need? | 15:32 |
Fl1nt | I've retrieved cinder-volume.conf | 15:32 |
Fl1nt | ceph.conf | 15:32 |
Fl1nt | and dash screen of the services and volumes distribution | 15:32 |
Fl1nt | http://paste.openstack.org/show/aM7Xsdim8oAHQXcfAUVY/ - cinder-volume.conf | 15:33 |
Fl1nt | http://paste.openstack.org/show/Tw0FnpLhdQN93EP1teSm/ - ceph.conf | 15:33 |
Fl1nt | and here is the volumes distribution: https://imgur.com/a/2CKptab | 15:35 |
Fl1nt | TBN: this is on a staging cluster, but still. | 15:35 |
Fl1nt | sorry for the blacked-out boxes, but I don't have a local distributed cluster to demo live yet, as I'm still waiting for some parts. | 15:36 |
mnasiadka | yoctozepto: we can use ADD, but it will not cache those downloads anyway | 15:38 |
Fl1nt | or you could use the packages from the distributions; I've done it that way as we don't have internet access. | 15:39 |
yoctozepto | Fl1nt: kill one cinder-volume to discover that you can no longer manage volumes linked to it | 15:39 |
yoctozepto | mnasiadka: I mean I would love to have this from real repos | 15:39 |
mnasiadka | yoctozepto: real repos... you want people writing 300 lines of code to go the extra mile and build rpms and debs? :) | 15:40 |
mnasiadka | it would probably take more time than writing the app :) | 15:40 |
yoctozepto | mnasiadka: yes | 15:40 |
yoctozepto | make them feel the pain | 15:40 |
Fl1nt | aaaaaaah THAT! ok, so the issue is about management, not distribution or availability. | 15:40 |
yoctozepto | Fl1nt: well, availability during failure is impacted | 15:41 |
yoctozepto | running vms are happy | 15:41 |
yoctozepto | but otherwise :-) | 15:41 |
mnasiadka | yoctozepto: get back on the ground now ;) | 15:41 |
Fl1nt | it's not; we already tested that: if you lose your storage controller you don't lose the VM workload and attachments, you just need to rebalance the managed volumes. | 15:41 |
yoctozepto | mnasiadka: yes, sir | 15:42 |
*** skramaja has quit IRC | 15:42 | |
yoctozepto | Fl1nt: yeah, it needs "rebalancing" | 15:42 |
yoctozepto | quirky but worky | 15:42 |
Fl1nt | yoctozepto, right, you're too quick to write :p but availability of the API calls isn't something that can't be restored real quick by migrating to another controller, so what's the point? unless your cluster only has one controller, which is far from serious. | 15:43 |
Fl1nt | I mean, rebalancing data is already a daily ops task with swift, so it's not really a biggie to have to do it for a cinder-volume controller. | 15:44 |
openstackgerrit | Mark Goddard proposed openstack/kolla master: WIP: CI: revert to public package mirrors after build https://review.opendev.org/761928 | 15:45 |
Fl1nt | BTW, from the cinder doc: "Active-Active is not yet supported." | 15:50 |
Fl1nt | so I guess that kind of settles the issue. | 15:51 |
mnasiadka | where is it stated? | 15:54 |
*** rouk has joined #openstack-kolla | 15:54 | |
Fl1nt | https://docs.openstack.org/cinder/ussuri/configuration/block-storage/samples/cinder.conf.html <- at the cluster directive level | 15:54 |
Fl1nt | tho it seems it became available in victoria | 15:55 |
Fl1nt | but as neither the official administration nor installation docs actually refer to it, and only OOO seems to "use it", I would be extra careful introducing the feature. | 15:57 |
mgoddard | we are using active/active | 15:58 |
mgoddard | always have | 15:59 |
Fl1nt | on victoria? | 15:59 |
mgoddard | the question is around how we do it | 15:59 |
mgoddard | if you are running ceph and more than one active cinder-volume, you have active/active | 15:59 |
Fl1nt | starting from which release? | 15:59 |
mgoddard | active/passive would require some fencing mechanism, such as pacemaker | 15:59 |
Fl1nt | mgoddard, I think we're not talking about the same thing. | 16:00 |
mgoddard | possibly | 16:00 |
Fl1nt | cinder-volume services are active/active in terms of requests; even with one down, as long as your request passes through the VIP you're safe | 16:01 |
Fl1nt | BUT | 16:01 |
Fl1nt | as yoctozepto noted, if you've got a cinder-volume agent down, all volumes attached BY that agent, and so referenced by it in the database, won't be manageable until you explicitly attach them to another up-and-running agent | 16:01 |
Fl1nt | using the openstack cli | 16:02 |
rouk | oh hey is this my bug being discussed? | 16:02 |
Fl1nt | or maybe through horizon, I haven't tested it. | 16:02 |
Fl1nt | rouk, depends ^^ | 16:02 |
rouk | https://bugs.launchpad.net/kolla-ansible/+bug/1904062 | 16:02 |
openstack | Launchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,In progress] - Assigned to Michal Nasiadka (mnasiadka) | 16:02 |
Fl1nt | rouk, yep | 16:03 |
rouk | is the backend_host method from before no longer recommended? im a bit out of the loop. | 16:03 |
Fl1nt | to be clear, rouk, your problem is that when you lose a cinder-volume agent, you have to migrate its volumes to another still-running one before being able to manage them again, right? | 16:04 |
rouk | that, and that every pre-ussuri volume is on the old host, which goes away on upgrade forever. | 16:04 |
rouk | so every existing volume needs a manual migration to a random host. | 16:04 |
Fl1nt | yes, using the openstack cli | 16:05 |
mgoddard | rouk: the old ones are assigned to the old backend_host | 16:05 |
mgoddard | when that gets removed, they need to be migrated to a real hostname | 16:05 |
rouk | mgoddard: that is correct, yeah. | 16:05 |
Fl1nt | the old backend_host vanished, if I understand rouk correctly | 16:05 |
rouk | Fl1nt: s/vanished/down/ | 16:05 |
Fl1nt | yes my point | 16:05 |
mgoddard | rouk: did you actively remove backend_host? | 16:06 |
rouk | i moved to the new templates for external_ceph, which involved trusting kolla to not brick my previously-recommended config :p | 16:06 |
mgoddard | right | 16:07 |
rouk | so right now im just overriding backend_host back in. | 16:07 |
*** wuchunyang has quit IRC | 16:07 | |
*** k_mouza has quit IRC | 16:07 | |
rouk | if theres a way to get individual host states without manual migrations, that would be cool though. | 16:07 |
Fl1nt | so you basically just have to cinder migrate <volume> <host> with a for loop | 16:08 |
Fl1nt | replace cinder with the appropriate openstack cmd if you're using the wrapper | 16:08 |
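The for-loop Fl1nt describes might look like the sketch below. `TARGET_HOST` and `VOLUME_IDS` are placeholders (in practice the IDs would come from `cinder list` filtered on the dead host), and the commands are echoed as a dry run so they can be reviewed before actually running them:

```shell
# Hypothetical "cinder migrate in a for loop" sketch; IDs and target
# host are placeholders. Echoed (dry run) rather than executed.
TARGET_HOST="controller02@rbd-1#rbd-1"
VOLUME_IDS="11111111-aaaa 22222222-bbbb"
for vol in $VOLUME_IDS; do
  echo cinder migrate "$vol" "$TARGET_HOST"
done
```

Dropping the `echo` would run the migrations for real; for the backend_host-removal case specifically, a bulk `cinder-manage volume update_host`-style rename may also be worth investigating rather than per-volume migration.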
rouk | yeah, which is <0 fun on routine maintenance, or random node deaths. | 16:08 |
Fl1nt | rouk, no no no | 16:08 |
Fl1nt | if your storage host dies and it is part of a ceph cluster | 16:09 |
Fl1nt | your VMs don't lose the volume | 16:09 |
Fl1nt | if your storage host dies, is part of a ceph cluster AND is hosting a cinder-volume agent | 16:09 |
rouk | if i take down a cinder-volume node for maint, and then someone, out of my hundreds of users, deletes a vm, and i didnt react instantly and migrate volumes as soon as the crash happened, i end up with a volume attached to a deleted instance. | 16:09 |
Fl1nt | then you "just" need to instruct openstack to delegate those volumes' management to another cinder-volume agent | 16:09 |
rouk | thats how i noticed this issue. | 16:10 |
rouk | user deleted vm, ended up with a stuck attached volume, cause cinder-volume didnt respond during the delete. | 16:10 |
Fl1nt | you then just have to put your volume in an available state, get rid of the attachment, and delete it or let it be available again | 16:10 |
Fl1nt | that's the way cinder/openstack works, it's not really a bug. | 16:11 |
rouk | which is manual intervention in hundreds of workflows which are often scripted and will require calls to like 20 people to get them to clean up. | 16:11 |
Fl1nt | well, welcome to openstack's non-management of orphaned resources ^^ | 16:11 |
mgoddard | Fl1nt: well, that's the way it works if you don't use backend_host or clustering | 16:12 |
mgoddard | Fl1nt: but surely with backend_host or clustering, you don't need to do that? | 16:12 |
rouk | backend_host just fixes this, and if clustering can do it too, while maintaining a record for each agent, that would be even better. | 16:12 |
Fl1nt | you can't use clustering until victoria, from my understanding of the doc (but I can be wrong), and using backend_host won't fix that specific orphaned resource issue. | 16:12 |
rouk | mgoddard: correct, the odds of a stuck volume with backend_host are only if a node dies and is sent commands in that 1 minute period before it's timed off rabbit, i think. | 16:13 |
rouk | it still happens, but its better than "till all volumes get migrated" | 16:13 |
rouk | presumably clustering would fix that last possible case. | 16:14 |
mgoddard | Fl1nt: OSA switched to active/active in Stein: https://opendev.org/openstack/openstack-ansible-os_cinder/commit/918b9077c816be5fc056637301265e0be2f245ab | 16:14 |
mgoddard | (after release) | 16:15 |
Fl1nt | rouk, are you running victoria? because until then you can't get cluster (stated in the doc), and even on victoria there's a lack of documentation. | 16:15 |
Fl1nt | mgoddard, yeah, but just because they activate something doesn't mean it necessarily works, and it requires pacemaker. | 16:15 |
rouk | Fl1nt: nah, im slow because of ties to FWaaS that i need to convince said hundreds of people to fix their 0/0 public ip security groups before i can upgrade. | 16:16 |
mgoddard | Fl1nt: no, pacemaker is for active/passive | 16:16 |
rouk | ussuri hitting prod for me friday. | 16:16 |
Fl1nt | mgoddard, ok, noted | 16:16 |
Fl1nt | I can test cluster directive on staging but I doubt it will work like that out of the box. | 16:17 |
rouk | i can get victoria onto my PTE env and start testing it some time after the new year, sadly. | 16:18 |
rouk | so im kinda useless on testing clustering. | 16:18 |
Fl1nt | mgoddard, how can they have a cinder a/a cluster in use with stein when the cinder configuration and the docs state that it is not yet supported even in ussuri? | 16:18 |
Fl1nt | https://docs.openstack.org/cinder/ussuri/configuration/block-storage/samples/cinder.conf.html | 16:18 |
Fl1nt | is there some kind of new "beta" phase for openstack features now, like kubernetes has? | 16:19 |
mnasiadka | well, we need some statement from Cinder team, how it should be done in Victoria and before ;-) | 16:20 |
Fl1nt | rouk, is your cluster hosting sensitive data? Do you have a proper backup solution in place? Because if you don't, I would not advise using the cluster feature until victoria and proper validation from the cinder maintainers on the distribution list. | 16:20 |
Fl1nt | mnasiadka, +10 | 16:20 |
mgoddard | Fl1nt: https://docs.openstack.org/releasenotes/cinder/rocky.html | 16:21 |
rouk | Fl1nt: im not doing anything in prod till i know its good. PTE is worthless to me, its big, but its designed to be nuked. | 16:21 |
rouk | for now, im going to keep using backend_host till theres a better option. | 16:21 |
Fl1nt | are you referring to this note? "Added support for active-active replication to the RBD driver. This allows users to configure multiple volume backends that are all a member of the same cluster participating in replication." | 16:22 |
*** k_mouza has joined #openstack-kolla | 16:23 | |
mgoddard | Fl1nt: yes | 16:24 |
Fl1nt | hmm, this is so vague that I can't tell whether it's referring to a ceph capability itself rather than the cinder-volume agent per se | 16:24 |
rouk | it doesnt make sense as a ceph statement... unless they mean setting up cross-cluster pool replication? but cinder doesnt do pool management, it expects pools to be there already, heh. | 16:26 |
Fl1nt | rouk, you can have multiple backends from the same ceph cluster and then even have additional mirroring on rbd | 16:26 |
Fl1nt | all in all, it needs to be tested and validated by the cinder team. | 16:27 |
Fl1nt | for instance, on our prod cluster, we have three different ceph backends participating in the same cluster, and cinder actually uses those three different backends. | 16:28 |
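A multi-backend layout like the one described, several RBD backends served from one Ceph cluster, looks roughly like this in cinder.conf. The backend and pool names are invented for illustration.

```ini
[DEFAULT]
enabled_backends = rbd-ssd,rbd-hdd,rbd-archive

# Each section is a separate cinder-volume backend, but all three point at
# pools in the same Ceph cluster (same ceph.conf).
[rbd-ssd]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-ssd
rbd_pool = volumes-ssd
rbd_ceph_conf = /etc/ceph/ceph.conf

[rbd-hdd]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-hdd
rbd_pool = volumes-hdd
rbd_ceph_conf = /etc/ceph/ceph.conf

[rbd-archive]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-archive
rbd_pool = volumes-archive
rbd_ceph_conf = /etc/ceph/ceph.conf
```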
mgoddard | here's the patch that added that reno: https://review.opendev.org/#/c/556658/ | 16:29 |
patchbot | patch 556658 - cinder - RBD: add support for active/active replication (MERGED) - 7 patch sets | 16:29 |
mgoddard | I think the word replication is a misnomer | 16:29 |
Fl1nt | thanks for the patch | 16:30 |
Fl1nt | so | 16:30 |
Fl1nt | it's a CEPH level feature, not a cinder-volume one. | 16:31 |
Fl1nt | it uses CEPH RBD mirroring | 16:31 |
*** cah_link has quit IRC | 16:32 | |
Fl1nt | hmm... actually, it's not even that | 16:35 |
Fl1nt | they're cloning images | 16:36 |
rouk | so i have a completely tangential question, which ive tried to get neutron to answer a few times, but never got anywhere and haven't had time to pursue it hard enough. since the train upgrade, ive had issues with routers not getting routes, and when one fails over, it's a dice roll whether the target has its assigned routes. could it just be the fwaas plugin for l3-agent slowly rotting? | 16:36 |
rouk | re-adding routes magically fixes the problem, but it only started happening in train. | 16:37 |
Fl1nt | Never had this issue sorry :( | 16:37 |
rouk | yeah it's nasty, it's in all my clusters, and nobody else's, and has no error, and no reproducible test case. | 16:38 |
rouk | must be fwaas since im apparently the only user left. | 16:38 |
rouk | Fl1nt: i agree, the code, and the commit message, and the merge request are uselessly vague, and they need to weigh in on the "right" solution. | 16:39 |
Fl1nt | mgoddard, look at the _disable_replication function, it's a ceph function to mirror a flattened image (volume in ceph vocabulary) | 16:40 |
Fl1nt | so basically everything replication related is based on this concept | 16:40 |
rouk | then yeah, thats not helpful for this, sadly. | 16:41 |
rouk | must be clustering for >V, backend_host for <V, but it would be nice to have their opinion on it. | 16:41 |
* Fl1nt reading the rbd driver... it's pretty interesting ^^ | 16:42 | |
rouk | its a lot shorter code than i expected, but everything with ceph is pretty smooth, so i guess not that unexpected. | 16:43 |
Fl1nt | it's not even clear from the driver itself, as everything named "volume" comes from the cinder library, but then they transform the volume back into an "image" when dealing with the ceph-related block. it's confusing ^^ | 16:43 |
rouk | yeah, needs a terminology fix, too many opinions on what something means. | 16:43 |
mgoddard | Fl1nt: I think that is unrelated. AFAICT, the cluster option, SUPPORTS_ACTIVE_ACTIVE driver flag etc relate to mapping of volumes to cinder-volume hosts | 16:44 |
mgoddard | volume replication is a ceph cluster concern | 16:44 |
Fl1nt | actually, I'd be able to tell you for sure if I downloaded the cinder source, as the IDE would then follow the appropriate references rather than guessing ^^ | 16:44 |
Fl1nt | with ceph there are, to my knowledge, three different replication features (geo-replication, rbd mirroring, and OSD replication of course), so it isn't that clear. the problem here is that the failover function fails if your volume isn't a replication-enabled RBD image. | 16:47 |
Fl1nt | which doesn't make sense | 16:48 |
Fl1nt | why would cinder-volume need to know about an image-level feature in order to do active/active? | 16:48 |
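The distinction mgoddard draws, volume-to-service mapping versus Ceph-level replication, comes down to RPC routing. The following is a toy sketch, not cinder code (the broker, topics, and class names are all invented for illustration): with per-host topics, requests for a dead host's volumes go nowhere, while a shared routing key (what backend_host or clustering effectively provides) lets any surviving service consume them.

```python
from collections import defaultdict

class Broker:
    """Toy RPC broker: a cast on a topic is handled by one live listener."""
    def __init__(self):
        self.topics = defaultdict(list)

    def subscribe(self, topic, service):
        self.topics[topic].append(service)

    def cast(self, topic, msg):
        listeners = [s for s in self.topics[topic] if s.alive]
        if not listeners:
            raise RuntimeError(f"no live consumer for {topic}")
        return listeners[0].handle(msg)

class VolumeService:
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive

    def handle(self, msg):
        return f"{self.name} handled {msg}"

broker = Broker()
a = VolumeService("node-a", alive=False)  # this node has died
b = VolumeService("node-b")

# Default behaviour: each service listens only on its own host's topic, so a
# volume "owned" by node-a is unreachable (orphaned) until migrated.
broker.subscribe("cinder-volume.node-a@rbd", a)
broker.subscribe("cinder-volume.node-b@rbd", b)

# With a shared identity (backend_host) or a cluster topic, both services
# listen on one key, so the survivor picks up requests for any volume.
broker.subscribe("cinder-volume.shared@rbd", a)
broker.subscribe("cinder-volume.shared@rbd", b)
print(broker.cast("cinder-volume.shared@rbd", "delete volume X"))
# → node-b handled delete volume X
```

Nothing in this routing picture depends on RBD image features, which is why requiring a replication-enabled image for failover reads as a separate, Ceph-level concern.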
*** nikparasyr has left #openstack-kolla | 16:49 | |
mgoddard | https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cinder-volume-active-active-support.html | 16:52 |
*** bengates has quit IRC | 16:56 | |
*** muhaha has joined #openstack-kolla | 16:57 | |
Fl1nt | mgoddard, thanks, gonna dive deeper tomorrow, as from the review patch I can't really tell what they're doing; there are function calls to external modules that I can't follow without having the code. | 17:06 |
*** Fl1nt has quit IRC | 17:11 | |
*** cah_link has joined #openstack-kolla | 17:12 | |
*** cah_link has quit IRC | 17:15 | |
*** e0ne has quit IRC | 17:22 | |
*** kevko has quit IRC | 17:23 | |
*** rpittau is now known as rpittau|afk | 17:27 | |
*** k_mouza has quit IRC | 17:30 | |
*** dougsz has quit IRC | 17:32 | |
*** kevko has joined #openstack-kolla | 17:42 | |
*** k_mouza has joined #openstack-kolla | 17:50 | |
*** gfidente is now known as gfidente|afk | 17:59 | |
*** mgoddard has quit IRC | 18:02 | |
*** k_mouza has quit IRC | 18:10 | |
*** k_mouza has joined #openstack-kolla | 18:19 | |
*** k_mouza has quit IRC | 18:56 | |
*** mgoddard has joined #openstack-kolla | 19:21 | |
*** kevko has quit IRC | 19:44 | |
*** k_mouza has joined #openstack-kolla | 19:57 | |
*** k_mouza has quit IRC | 20:01 | |
*** k_mouza has joined #openstack-kolla | 20:16 | |
*** k_mouza has quit IRC | 20:21 | |
*** mgoddard has quit IRC | 20:31 | |
*** TrevorV has quit IRC | 20:50 | |
*** gfidente|afk is now known as gfidente | 21:04 | |
*** hjensas_ has joined #openstack-kolla | 21:06 | |
*** hjensas has quit IRC | 21:10 | |
*** jovial[m] has quit IRC | 21:41 | |
*** muhaha has quit IRC | 21:47 | |
*** jovial[m] has joined #openstack-kolla | 22:05 | |
openstackgerrit | James Kirsch proposed openstack/kolla master: Add LetsEncrypt images for cert request/renewal https://review.opendev.org/741339 | 22:31 |
*** hjensas__ has joined #openstack-kolla | 22:47 | |
*** hjensas_ has quit IRC | 22:51 | |
*** quasar` is now known as parallax | 23:03 | |
*** parallax has left #openstack-kolla | 23:07 | |
*** parallax has joined #openstack-kolla | 23:15 | |
*** Arador has joined #openstack-kolla | 23:36 | |
Arador | Hello, new here and new to Kolla, but I have been using Openstack for a while. Is anyone here familiar with getting the Adjutant role to work? I am getting an error that says "no filter named 'customise_fluentd'" | 23:40 |
*** mloza has quit IRC | 23:46 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!