*** sdake_ has joined #openstack-ansible | 00:00 | |
*** sdake has quit IRC | 00:03 | |
*** sdake_ is now known as sdake | 00:06 | |
*** annashen has joined #openstack-ansible | 00:14 | |
*** annashen has quit IRC | 00:19 | |
*** JRobinson__ has joined #openstack-ansible | 00:34 | |
*** sdake_ has joined #openstack-ansible | 00:41 | |
*** abitha has quit IRC | 00:42 | |
*** sdake has quit IRC | 00:44 | |
*** openstack has joined #openstack-ansible | 00:56 | |
*** markvoelker has joined #openstack-ansible | 00:56 | |
*** dkalleg has quit IRC | 01:02 | |
*** markvoelker has quit IRC | 01:02 | |
*** ccrouch has quit IRC | 01:14 | |
*** tlian has joined #openstack-ansible | 01:33 | |
*** markvoelker has joined #openstack-ansible | 01:57 | |
*** britthouser has joined #openstack-ansible | 01:58 | |
*** britthou_ has quit IRC | 02:02 | |
*** markvoelker has quit IRC | 02:02 | |
*** daneyon_ has quit IRC | 02:13 | |
*** gtt116 has quit IRC | 02:20 | |
*** sdake_ has quit IRC | 02:34 | |
*** stevemar has joined #openstack-ansible | 03:01 | |
*** tlian has quit IRC | 03:17 | |
*** annashen has joined #openstack-ansible | 03:17 | |
*** stevemar has quit IRC | 03:22 | |
*** annashen has quit IRC | 03:22 | |
*** stevemar has joined #openstack-ansible | 03:26 | |
*** sdake has joined #openstack-ansible | 03:33 | |
*** markvoelker has joined #openstack-ansible | 03:46 | |
*** markvoelker has quit IRC | 03:51 | |
*** sdake_ has joined #openstack-ansible | 04:07 | |
*** abitha has joined #openstack-ansible | 04:10 | |
*** sdake has quit IRC | 04:10 | |
*** CheKoLyN has quit IRC | 04:10 | |
*** abitha has quit IRC | 04:10 | |
*** abitha has joined #openstack-ansible | 04:11 | |
*** sreejithp has joined #openstack-ansible | 04:28 | |
*** sreejithp has quit IRC | 04:29 | |
*** sreejithp has joined #openstack-ansible | 04:29 | |
*** markvoelker has joined #openstack-ansible | 04:47 | |
*** markvoelker has quit IRC | 04:51 | |
*** abitha has quit IRC | 05:18 | |
*** sdake_ has quit IRC | 05:33 | |
*** sreejithp has quit IRC | 05:48 | |
*** sreejithp has joined #openstack-ansible | 05:48 | |
*** javeriak has joined #openstack-ansible | 05:51 | |
*** radek_ has joined #openstack-ansible | 05:58 | |
*** stevemar has quit IRC | 06:10 | |
*** annashen has joined #openstack-ansible | 06:20 | |
*** annashen has quit IRC | 06:25 | |
*** markvoelker has joined #openstack-ansible | 06:36 | |
*** markvoelker has quit IRC | 06:40 | |
*** javeriak has quit IRC | 06:54 | |
*** JRobinson__ has quit IRC | 07:12 | |
mancdaz | woo gerrit has started working with our non-trunk branches! | 08:14 |
mancdaz | https://bugs.launchpad.net/openstack-ansible/juno/+bug/1466010 | 08:14 |
openstack | Launchpad bug 1466010 in openstack-ansible juno "Tempest requirements not in sync with openstack" [High,Fix committed] - Assigned to Hugh Saunders (hughsaunders) | 08:14 |
mancdaz | it's a little thing, but it added a tag 'in-juno' which makes it easier to track bugs between releases | 08:15 |
mancdaz | if we have a bug targeted to a particular series, it will also update the status as the gerrit review progresses (like it does already for trunk) | 08:15 |
mancdaz | maybe it's just me, but this is cool | 08:16 |
*** markvoelker has joined #openstack-ansible | 08:25 | |
*** markvoelker has quit IRC | 08:29 | |
odyssey4me | mancdaz yeah, I didn't notice the tag but I did notice it changing the series status :) | 08:38 |
odyssey4me | that's a great combo, actually - just in case we haven't done the series targeting just yet :) | 08:39
mancdaz | odyssey4me yeah in theory that tag is useful when you do releases - you mark all the things with that tag as fix released, then remove the tag | 08:40 |
mancdaz | means you don't need to target all your bugs at a milestone in advance | 08:41 |
mancdaz | you just target the ones that have actually gone in by the time you want to release | 08:41 |
mancdaz | by looking for that tag | 08:41 |
odyssey4me | ah, nice :) | 08:41 |
mancdaz | so you'd then only specifically target things that you definitely have to have in a particular release | 08:42 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Fixed empty yaml load https://review.openstack.org/193457 | 08:43 |
openstackgerrit | Andy McCrae proposed stackforge/os-ansible-deployment: Add read/write_affinity settings for Swift https://review.openstack.org/191023 | 09:08 |
*** annashen has joined #openstack-ansible | 09:22 | |
*** annashen has quit IRC | 09:27 | |
openstackgerrit | Merged stackforge/os-ansible-deployment: Split environment file into component parts https://review.openstack.org/192102 | 09:29 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Allow protocol to be set per endpoint-type https://review.openstack.org/192148 | 09:35 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Update rabbitmq to v3.5.3-1 https://review.openstack.org/192899 | 09:35 |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Implement service_region user_group_var https://review.openstack.org/193468 | 09:38 |
*** markvoelker has joined #openstack-ansible | 10:13 | |
*** markvoelker has quit IRC | 10:18 | |
*** sreejithp has quit IRC | 11:07 | |
*** sreejithp has joined #openstack-ansible | 11:08 | |
*** sreejithp has left #openstack-ansible | 11:09 | |
*** sreejithp has joined #openstack-ansible | 11:09 | |
*** sreejithp has left #openstack-ansible | 11:13 | |
*** markvoelker has joined #openstack-ansible | 11:29 | |
openstackgerrit | Andy McCrae proposed stackforge/os-ansible-deployment: Add read/write_affinity settings for Swift https://review.openstack.org/191023 | 11:30 |
*** markvoelker has quit IRC | 11:34 | |
*** sreejithp has joined #openstack-ansible | 11:43 | |
svg | trying to understand the dist_sort module, more specifically as used in cloudnull's example: https://gist.github.com/cloudnull/bf7311fb93bca16c9c3b - Am I right in saying that this will shift the memcached servers until the one that is on the same metal_host as the current container is placed first? | 11:53 |
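The behaviour svg describes can be sketched in a few lines of Python (a hypothetical illustration, not the actual dist_sort source; the server names are made up):

```python
# Hypothetical sketch of the reordering described above: shift a list
# of servers until the entry on the local metal host comes first.
def dist_sort(servers, local_host):
    """Rotate `servers` until the one whose host matches local_host is first."""
    ordered = list(servers)
    for _ in range(len(ordered)):
        if ordered[0].startswith(local_host):
            break
        ordered.append(ordered.pop(0))  # shift: move the head to the tail
    return ordered

print(dist_sort(["node2:11211", "node3:11211", "node1:11211"], "node3"))
# prints ['node3:11211', 'node1:11211', 'node2:11211']
```

If no entry matches, the list comes back in its original order, so every container still gets the full server list.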
*** markvoelker has joined #openstack-ansible | 12:05 | |
openstackgerrit | Jesse Pretorius proposed stackforge/os-ansible-deployment: Allow galera wsrep_provider_options to be customised https://review.openstack.org/191106 | 12:16 |
mancdaz | hughsaunders: https://bugs.launchpad.net/openstack-ansible/+bug/1445421 is this 'done' ? | 12:17 |
openstack | Launchpad bug 1445421 in openstack-ansible trunk "Juno to Kilo upgrades should be tested" [High,Triaged] - Assigned to Hugh Saunders (hughsaunders) | 12:17 |
* hughsaunders looks | 12:18 | |
hughsaunders | mancdaz: no | 12:18 |
hughsaunders | but unlikely to become more-done given the current state of things | 12:18 |
*** tlian has joined #openstack-ansible | 12:20 | |
*** sreejithp has left #openstack-ansible | 12:34 | |
cloudnull | Morning | 12:36 |
*** sreejithp has joined #openstack-ansible | 12:40 | |
odyssey4me | o/ cloudnull | 12:45 |
cloudnull | Hello | 12:45
odyssey4me | I just hit an issue when trying to work out federation configuration - when trying to generate the SP cert it fails, and it turns out that the script uses the hostname for the cert generation. | 12:46 |
odyssey4me | The failure is because the hostname contains underscores (as all our container host names do). | 12:46 |
odyssey4me | I remember hitting issues when working on log analysis with the same thing, and I had to work around it. | 12:46 |
odyssey4me | As I recall I just used the IP address instead, but it's not ideal. | 12:47 |
odyssey4me | It'll be better to replace the underscores with dashes, I would think... perhaps just in the hostname inside the container? | 12:48 |
cloudnull | Yup. We should fix that. It's an issue in the dynamic inventory. We're using the ansible group name in the container hostname and that contains underscores. By ansible convention that's not a requirement, so we should fix it in the next release. | 12:48
odyssey4me | but if we do this, how would we handle upgraded environments? | 12:49 |
cloudnull | It should be easy enough to fix by doing a simple replace where the hostname is written. We could move the hostname resolution stuff into its own general purpose role. Or maybe we add some bits to the upgrade script ? | 12:50
*** jaypipes has quit IRC | 12:51 | |
cloudnull | Or maybe we add a task in the container build to ensure the hostname used is valid ? | 12:51 |
cloudnull | In that case it would simply attach, check, change. And should be fixed on an upgrade. | 12:52 |
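The check-and-change step cloudnull suggests could look roughly like this (an illustrative sketch, not the actual osad task; RFC 952/1035 hostnames allow only letters, digits, and hyphens, which is why the underscores from Ansible group names break cert generation):

```python
import re

# Sketch of the "check, change" idea: replace underscores with dashes
# and verify the result is a valid single-label hostname.
VALID_HOSTNAME = re.compile(r'^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$')

def sanitize_hostname(name):
    fixed = name.replace('_', '-')
    if not VALID_HOSTNAME.match(fixed):
        raise ValueError('cannot produce a valid hostname from %r' % name)
    return fixed

print(sanitize_hostname('infra1_keystone_container-abc123'))
# prints infra1-keystone-container-abc123
```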
odyssey4me | cloudnull I guess. :/ If we had throw-away containers this would be less of an issue. :) | 12:53 |
cloudnull | All of the api containers could be thrown away. | 12:54
cloudnull | They contain no state. | 12:54 |
cloudnull | I mean there are logs, but we have proper log shipping. So who cares. | 12:55 |
cloudnull | The db containers are harder. | 12:55
cloudnull | But we bind mount the bits out to the host. So again , it should be OK. | 12:55 |
cloudnull | So you should feel free to nuke as needed. Just expect downtime while replacing your thrown-away containers. | 12:56
odyssey4me | yeah, the whole idea of deploying from images would completely change how we deal with things | 12:57 |
odyssey4me | ansible would only have to focus on actual hosts - the containers would just be assets | 12:57 |
cloudnull | But then you have two distinct code paths for two distinct deployment processes. | 12:58 |
cloudnull | And you need an image pipeline to do updates within the environment. | 12:59 |
*** KLevenstein has joined #openstack-ansible | 12:59 | |
cloudnull | Also how would one upgrade a container asset by replacing it and maintain active connections in a case like cinder? | 13:02
odyssey4me | cloudnull yeah, I know that's one of the trouble points | 13:02 |
cloudnull | It connects to a block device and replacing the container means you have 100% downtime for that resource. | 13:02 |
cloudnull | The whole immutable container idea is great but in practice it's not practical in complex infrastructure. | 13:03
odyssey4me | cloudnull the downtime only happens if you're using specific kinds of storage | 13:04 |
cloudnull | This is true , anything with a local block device attachment | 13:04 |
cloudnull | If using netapp vnx it should remain online. | 13:05 |
odyssey4me | or ceph | 13:05 |
odyssey4me | etc | 13:05 |
*** sdake has joined #openstack-ansible | 13:05 | |
cloudnull | Basically lvm or nfs, which is what most users are deploying, would be impossible. | 13:06
hughsaunders | odyssey4me: re k-k-fed post, that is a great post, lots of code towards the end... looks like quite a lot of that functionality is available in openstack client | 13:07 |
*** sdake_ has joined #openstack-ansible | 13:07 | |
*** sdake has quit IRC | 13:10 | |
cloudnull | odyssey4me: hughsaunders: in your k-k travels did you need to pull in the following reviews? | 13:12
cloudnull | https://review.openstack.org/#/c/188581/ | 13:12 |
cloudnull | https://review.openstack.org/#/c/188881/ | 13:13 |
cloudnull | based on spec https://blueprints.launchpad.net/python-keystoneclient/+spec/k2k-auth-plugin ? | 13:13 |
hughsaunders | cloudnull: so far my travels have been theoretical | 13:13 |
cloudnull | ok, just curious. | 13:13 |
hughsaunders | but it looks like the post we're reading is from the same guy http://blog.rodrigods.com/it-is-time-to-play-with-keystone-to-keystone-federation-in-kilo/ | 13:14
odyssey4me | cloudnull I'm focused on Keystone as an SP to an ADFS IDP, so neither of those appear required. | 13:15 |
odyssey4me | rumour has it that dstanek may have some tweaks to openstack-ansible to make things work :) | 13:16 |
cloudnull | ah. the land of the windows. | 13:16
cloudnull | ++ dstanek | 13:17 |
dstanek | cloudnull: howdy | 13:17 |
*** sreejithp has left #openstack-ansible | 13:17 | |
odyssey4me | dstanek rumour has it you may have some bits you could share for making federation work - perhaps you could share in the form of a review, or even just a fork with patches in it? | 13:18 |
dstanek | odyssey4me: yes, i'm working on some k2k stuff that i can share with you. it isn't osad, but maybe you'll find it useful? | 13:19 |
odyssey4me | dstanek ah - well, hughsaunders and miguelgrinberg are working on that particular track for including it into osad | 13:20 |
*** sdake has joined #openstack-ansible | 13:20 | |
odyssey4me | if you'd be happy to share with them at least that'd be great | 13:20 |
odyssey4me | I'm having fun and games with ADFS and Shibboleth | 13:21 |
hughsaunders | dstanek: whatcha up to? | 13:21 |
dstanek | i've been working on a little k2k for an upcoming blog post | 13:23 |
*** sdake_ has quit IRC | 13:24 | |
*** jaypipes has joined #openstack-ansible | 13:26 | |
hughsaunders | dstanek: cool, link us up when you're ready please :) | 13:26 |
*** jaypipes is now known as leakypipes | 13:50 | |
*** subscope has quit IRC | 13:56 | |
palendae | mancdaz: RE gerrit working on non-trunk: sigmavirus24_awa's patch to infra for not requiring stable/ in names was accepted | 13:57 |
mancdaz | palendae yarp | 13:57 |
mancdaz | that's why it's working :) | 13:57 |
palendae | Mmmhmm | 13:58 |
mancdaz | it only took about 4 months to get it in | 13:58 |
*** Mudpuppy has joined #openstack-ansible | 14:10 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 14:12 | |
sigmavirus24 | mancdaz: like most teams, there are higher priorities and lower priorities, and that was a low priority for infra | 14:14 |
mancdaz | sigmavirus24 of course | 14:14 |
palendae | sigmavirus24: For https://review.openstack.org/#/c/193277/ - was I not already using section headings? | 14:14 |
palendae | I'm trying a definition list right now, and I lose the anchors to scripts, which is a useful thing to have, IMO | 14:15 |
sigmavirus24 | oh hah | 14:18 |
sigmavirus24 | gertty didn't show that | 14:18 |
sigmavirus24 | bug in gertty | 14:18 |
sigmavirus24 | good job gertty | 14:19 |
sigmavirus24 | Gertty must use a regex to strip out +s in the diff | 14:19 |
sigmavirus24 | And it's probably over-enthusiastic | 14:19 |
palendae | Ok, I'll leave that alone then | 14:20 |
sigmavirus24 | https://imgur.com/a/PVEMr | 14:23 |
b3rnard0 | odyssey4me: cloudnull: and anyone else. meetbot manual information added to the bottom of the meetings page | 14:24 |
*** stevemar has joined #openstack-ansible | 14:24 | |
palendae | sigmavirus24: bug reports welcome? | 14:24 |
sigmavirus24 | Of course | 14:24 |
sigmavirus24 | I'm looking for where to report them now | 14:24 |
palendae | Probably LP | 14:25 |
*** stevemar is now known as stevedoor | 14:29 | |
openstackgerrit | Andy McCrae proposed stackforge/os-ansible-deployment: Add global endpoint_type_proto options https://review.openstack.org/193573 | 14:31 |
*** lkoranda has quit IRC | 14:32 | |
*** sreejithp has joined #openstack-ansible | 14:48 | |
sigmavirus24 | palendae: https://storyboard.openstack.org/#!/story/2000299 | 14:50 |
*** daneyon has joined #openstack-ansible | 14:52 | |
sigmavirus24 | palendae: oh man, my reviews shows up on the wrong lines too | 14:52 |
*** Mudpuppy has quit IRC | 14:57 | |
*** daneyon_ has joined #openstack-ansible | 14:59 | |
svg | keystone config uses the (default?) dogpile.cache.memcached driver for its backend - this is hardcoded and not parametrized in osad | 15:01
svg | according to http://docs.openstack.org/kilo/config-reference/content/section_keystone.conf.html in production it is recommended to use keystone.cache.memcache_pool | 15:01 |
svg | Gonna file a bug, yell at me if there is a good reason for how it is now ;) | 15:02
*** daneyon has quit IRC | 15:02 | |
odyssey4me | svg hang on, I think that is parameterised | 15:04 |
svg | see playbooks/roles/os_keystone/templates/keystone.conf.j2, it's not templated | 15:04 |
odyssey4me | svg ah, you're right - we've simplified things... either cached or not cached... if cached then use dogpile | 15:05 |
svg | actually, what is dogpile? the upstream doc seems to imply that it does not use pooling, but I'm not sure what it does then | 15:06
odyssey4me | svg according to http://docs.openstack.org/developer/keystone/configuration.html keystone.cache.memcache_pool is an eventlet safe implementation of dogpile.cache.memcached | 15:08 |
odyssey4me | we're not using eventlet (we deploy keystone behind apache) | 15:08 |
odyssey4me | but yeah, perhaps we should allow parameterisation | 15:08 |
dstanek | svg: dogpile is just a cache abstraction | 15:09 |
*** Mudpuppy has joined #openstack-ansible | 15:11 | |
*** daneyon_ has quit IRC | 15:12 | |
svg | ok, so not really a difference on how it handles the pooling I guess? | 15:12 |
*** sreejithp has left #openstack-ansible | 15:20 | |
sigmavirus24 | so dstanek dolphm do we need a cronjob for https://review.openstack.org/#/c/189998/ ? | 15:24 |
*** JTen has joined #openstack-ansible | 15:24 | |
*** sreejithp has joined #openstack-ansible | 15:26 | |
*** annashen has joined #openstack-ansible | 15:27 | |
sigmavirus24 | http://lbragstad.com/?p=133 doesn't recommend having a cron job to rotate keys, it simply talks about a calculation for the case where you had a token with a certain lifetime and a rotation timeline | 15:28 |
sigmavirus24 | Also http://docs.openstack.org/developer/keystone/configuration.html#uuid-pki-pkiz-or-fernet says "can" not "should" ;) | 15:30 |
odyssey4me | sigmavirus24 yeah, it's not clear whether key rotation is essential for good security or not | 15:32 |
*** annashen has quit IRC | 15:32 | |
odyssey4me | and it'd be useful to have some sort of guidelines for how often, how many, etc | 15:33 |
sigmavirus24 | odyssey4me: also, how would we set up a cron job for this without generating ssh-keys for keystone containers and distributing them to the other keystone containers as authorized keys? | 15:33 |
stevedoor | sigmavirus24, probably want a cron job for rotating keys in production :( | 15:33 |
odyssey4me | sigmavirus24 perhaps we count on network time being equal and ensure that the cron happens at the same time? :p | 15:34 |
sigmavirus24 | stevedoor: so the problem here is the way we set-up our containers and the fact that keystone-container-a can't ssh into keystone-container-{b,c,d,...} to distribute/synchronize the rotated keys | 15:34 |
sigmavirus24 | stevedoor: will fernet_rotate deterministically generate the new key? | 15:34 |
odyssey4me | sigmavirus24 why can't one container ssh to another? surely we could implement ssh key generation in keystone[0] and distribute? | 15:35 |
stevedoor | sigmavirus24, i don't know about generating a new one? i think thats what the first step does? | 15:35 |
sigmavirus24 | i.e., assuming a keystone service has keys 0 and 1, will using fernet_rotate generate the same new key deterministically for each of those keystones? | 15:35 |
sigmavirus24 | odyssey4me: we could, but we currently don't have that | 15:36 |
sigmavirus24 | odyssey4me: and that's orthogonal to the review that's being blocked by that | 15:36 |
odyssey4me | so if you generate once, and ship them around - when you rotate on each server will the rotation be the same? | 15:37 |
sigmavirus24 | ^ | 15:37 |
sigmavirus24 | that's the better way of phrasing my question since I need more coffee clearly | 15:37 |
*** daneyon has joined #openstack-ansible | 15:46 | |
dstanek | sigmavirus24: it won't generate the same key - you generate on one keystone and sync to the others | 15:50 |
sigmavirus24 | thought so | 15:50 |
*** georgem1 has joined #openstack-ansible | 15:55 | |
*** cbaesema has quit IRC | 16:04 | |
dolphm | sigmavirus24: i don't think you *need* a cronjob, but i suppose it depends on how fast you believe your adversary can break both a 128 bit AES key and a 128 bit HMAC key | 16:17 |
dolphm | sigmavirus24: the job would have to be on the ansible host, though | 16:18 |
dolphm | sigmavirus24: stevedoor: and no, there's no determinism in key generation | 16:19 |
sigmavirus24 | I didn't think it would be, but I wanted to make sure | 16:19 |
sigmavirus24 | dolphm: correct, for now the job would have to be on the ansible host | 16:20 |
sigmavirus24 | Until we generate an ssh key for the keystone_all[0] and distribute it to the rest of the keystone containers | 16:21 |
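The rotation scheme being discussed (as described in lbragstad's post linked earlier) can be modelled in a few lines. This is an illustrative sketch, not keystone-manage itself: key 0 is the staged key, rotation promotes it to the new primary (highest index), writes a fresh random staged key, and prunes the oldest keys beyond max_active_keys.

```python
import os

# Illustrative model of fernet key rotation. keys: dict mapping
# index -> key bytes, with index 0 as the staged key.
def rotate(keys, max_active_keys=3):
    new_primary_index = max(keys) + 1
    keys[new_primary_index] = keys.pop(0)   # promote the staged key to primary
    keys[0] = os.urandom(32)                # new staged key (random, not deterministic)
    while len(keys) > max_active_keys:
        # drop the oldest non-staged key; the staged key 0 is never pruned
        del keys[min(k for k in keys if k != 0)]
    return keys
```

Because the staged key comes from os.urandom, two hosts rotating independently diverge immediately, which is why the keys have to be generated on one node and synced to the rest, as dstanek and dolphm say above.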
*** palendae has quit IRC | 16:27 | |
*** palendae has joined #openstack-ansible | 16:27 | |
*** sdake_ has joined #openstack-ansible | 16:27 | |
*** sdake has quit IRC | 16:31 | |
*** lkoranda has joined #openstack-ansible | 16:38 | |
openstackgerrit | Nolan Brubaker proposed stackforge/os-ansible-deployment: Generate consistent length passwords in AIO https://review.openstack.org/193628 | 16:39 |
palendae | ^ backport for something already merged in master | 16:42 |
*** lkoranda has quit IRC | 16:51 | |
*** sdake has joined #openstack-ansible | 16:52 | |
*** sdake_ has quit IRC | 16:55 | |
*** lkoranda has joined #openstack-ansible | 17:04 | |
*** sdake_ has joined #openstack-ansible | 17:05 | |
*** dkalleg has joined #openstack-ansible | 17:07 | |
*** lkoranda has quit IRC | 17:08 | |
*** sdake has quit IRC | 17:09 | |
*** lkoranda has joined #openstack-ansible | 17:10 | |
*** javeriak has joined #openstack-ansible | 17:11 | |
*** abitha has joined #openstack-ansible | 17:16 | |
*** KLevenstein is now known as KLev-afk | 17:17 | |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Tag prep for 9.0.10 https://review.openstack.org/193650 | 17:22 |
*** annashen has joined #openstack-ansible | 17:27 | |
*** sreejithp has quit IRC | 17:29 | |
*** jwagner_away is now known as jwagner | 17:34 | |
dolphm | sigmavirus24: p.s. i'm maintaining my +1 -- don't see a reason to implement a rotation cronjob in the first pass | 17:48 |
dolphm | sigmavirus24: it's certainly a valuable security hardening exercise though | 17:49 |
*** javeriak has quit IRC | 17:59 | |
sigmavirus24 | dolphm: I agree | 18:06 |
openstackgerrit | Merged stackforge/os-ansible-deployment: Add documentation to user config file https://review.openstack.org/193162 | 18:11 |
*** javeriak has joined #openstack-ansible | 18:14 | |
svg | can someone explain to me the difference between [cache] and [memcache] in keystone, and why the former points to the local memcaches in the same container? | 18:18
sigmavirus24 | svg: that's a good question | 18:30 |
sigmavirus24 | I don't have an answer but I'll see if I can figure it out | 18:30 |
svg | the reason I'm looking into it is that I experience problems with the current setup when memcaches go down, as it is pooled and not load balanced | 18:32
cloudnull | cores https://review.openstack.org/#/c/193650/ <- please make this go for 9.0.10 | 18:32 |
cloudnull | sigmavirus24 palendae mattt odyssey4me hughsaunders d34dh0r53 andymccr ^^ | 18:32 |
sigmavirus24 | cloudnull: do you have any ideas about svg's question? | 18:33 |
* cloudnull looking back | 18:33 | |
cloudnull | dolphm: ^^ | 18:38 |
*** stevedoor has quit IRC | 18:40 | |
*** javeriak has quit IRC | 18:40 | |
cloudnull | svg by default the keystone config for kilo is `keystone.token.persistence.backends.memcache_pool.Token` which will use the local memcached within keystone for token storage, and is hashed to all other keystone containers. | 18:41 |
cloudnull | once sigmavirus24 's fernet changes go through fernet will become the new default. | 18:41 |
svg | is that the [cache] or [memcache] part? | 18:43 |
svg | as you say it uses the 'local' memcache, I assume it's the [memcache] one | 18:43 |
cloudnull | dolphm: correct me if im wrong but im almost positive that is the [memcache] backend | 18:44 |
svg | 'local' being a bad name, as it's the list of memcaches within the keystone container | 18:44 |
cloudnull | yes, local as in the memcache deployed within keystone | 18:44 |
svg | that however does not explain why it is a separate set of memcaches instead of the containerized infra ones | 18:45
cloudnull | that way token storage is not in a global cache. | 18:45 |
*** KLev-afk is now known as KLevenstein | 18:46 | |
svg | can I assume the logic how both sets of pools are connected to is the same? | 18:46 |
cloudnull | and you could flush all tokens without impacting the rest of the global cache | 18:46
svg | ok | 18:46 |
dolphm | svg: cloudnull: sigmavirus24: i'm catching up... | 18:46 |
sigmavirus24 | dolphm: ++ | 18:46 |
cloudnull | svg dolphm == keystone guru | 18:46 |
svg | I assumed that :) | 18:47 |
dolphm | i do believe that's the difference. [cache] is used for caching in front of sql/ldap, and [memcache] configures memcache as a sort-of-persistent kvs backend | 18:47 |
dolphm | ... used for token persistence | 18:48 |
dolphm | so with the transition to fernet as a token provider, you no longer need a token backend, and therefore [memcache] will go unused | 18:49 |
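For reference, the two sections dolphm is contrasting look roughly like this in a Kilo-era keystone.conf (values are illustrative, not osad's shipped template):

```ini
[cache]
# caching in front of the sql/ldap drivers - per application node, no HA needed
enabled = true
backend = dogpile.cache.memcached
backend_argument = url:127.0.0.1:11211

[memcache]
# sort-of-persistent kvs used by the memcache token persistence backend -
# must be identical on every keystone node; unused once fernet is the provider
servers = 172.29.236.100:11211,172.29.236.101:11211,172.29.236.102:11211
```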
svg | ok, thx - I'm into a troubleshoot with kilo though | 18:50 |
svg | do you know how keystone does its logic in how it connects to the pool of servers? | 18:50 |
*** javeriak has joined #openstack-ansible | 18:56 | |
svg | dolphm: ^^ | 19:04 |
dolphm | svg: i assume you mean for memcache? | 19:04 |
svg | yes | 19:05 |
svg | for both types of memcaches | 19:05 |
svg | when say 3 of 5 memcaches go down, all client operations that involve keystone are dead slow b/c long timeouts on memcaches | 19:06 |
dolphm | svg: for [memcache] we use python-memcached, which handles pooling | 19:06 |
dstanek | svg: so the connection attempts don't timeout? | 19:07 |
dolphm | svg: implementation is here, i believe https://github.com/linsomniac/python-memcached/blob/master/memcache.py | 19:07 |
svg | they do, but that takes seconds | 19:07 |
dstanek | that may mean the timeout is seconds :-( | 19:08 |
svg | uhuh | 19:08 |
svg | and it doesn't sound like dead servers are avoided - or the dead ttl is too low, or something like that | 19:08
dstanek | svg: they are not | 19:08 |
dstanek | the pool you are referring to is not a pool of servers, but a pool of connections | 19:09 |
dstanek | the way memcache works is that a key is hashed to 1 of the servers and that is the only one that has the value | 19:09 |
svg | ok, that rings a bell - I tried understanding python-memcached | 19:10
dstanek | so in your case if 3 of the 5 servers were down then roughly 60% of your requests would slow down (assuming an even distribution in the hashing algorithm) | 19:11
svg | sounds like what I experience yes | 19:11
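dstanek's point can be demonstrated with a toy simulation (illustrative only; real python-memcached has its own server-selection hash, crc32 here is just a stand-in for "some even hash"):

```python
import zlib

# Key hashing pins each key to exactly one server, so with 3 of 5
# servers down roughly 3/5 of lookups land on a dead server.
servers = ['mc1', 'mc2', 'mc3', 'mc4', 'mc5']
dead = {'mc1', 'mc2', 'mc3'}

def server_for(key):
    return servers[zlib.crc32(key.encode()) % len(servers)]

keys = ['token-%d' % i for i in range(10000)]
hit_dead = sum(1 for k in keys if server_for(k) in dead)
print('%.0f%% of requests hit a dead server' % (100.0 * hit_dead / len(keys)))
```

The other 2/5 of requests are unaffected, which matches the partial sluggishness svg reports.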
dstanek | oh, i'm reading back in history here - if memcached is being used for tokens persistence then keystone will definitely have issues if any server is down | 19:14 |
svg | ok, so it seems to me that it is a bad idea here to define a non-load-balanced list of memcaches | 19:16
svg | given osad configures a local memcache instance, it should suffice to only point to that local instance, IMHO? | 19:17 |
svg | cloudnull: ^^ | 19:17 |
*** sdake_ has quit IRC | 19:19 | |
*** daneyon has quit IRC | 19:22 | |
svg | dstanek: so that is not really pooling, as dolphm said | 19:25 |
svg | does the same logic apply for [cache]? | 19:25 |
dolphm | svg: it's taking advantage of a distributed cluster using an array of connections ;) | 19:31 |
svg | as long as no nodes go down :) | 19:32 |
dolphm | svg: all keystone instances should share the exact same [memcache] configuration - it's a common backend behind the application layer | 19:32 |
svg | ah, that is important | 19:32 |
dolphm | svg: whereas [cache] can exist per application node, and there's zero reason to HA | 19:32 |
svg | aha again | 19:33 |
svg | so only the [memcache] should be HA | 19:33 |
cloudnull | so our [cache] pooling should be limited to local memcached. | 19:33 |
cloudnull | and not use a connection pool | 19:34 |
cloudnull | ? | 19:34 |
dolphm | cloudnull: how is [cache] setup now? | 19:34 |
svg | dolphm: how can we best handle the case for [memcache] where nodes might go down? | 19:34 |
*** metral has quit IRC | 19:34 | |
*** jwagner is now known as jwagner_away | 19:34 | |
dolphm | scenarios for [cache]: A) the running application is running and has a cache, B) the running application can run without a cache and performance suffers as the backend load increases, or C) the application is down and there's no reason to have a cache | 19:34
svg | sounds like the opposite of how things are now? | 19:34 |
*** annashen has quit IRC | 19:35 | |
svg | dolphm: so all openstack components that use memcache basically should have their own local cache on localhost | 19:35 |
cloudnull | we are using dogpile's ability to connect to multiple caches | 19:36
svg | and only keystone [memcache] should have something distributed | 19:36
dolphm | svg: well, that's a question of memory. in a world with unlimited memory, yes | 19:36 |
cloudnull | for the [cache] section | 19:36 |
dstanek | i don't think that would be very efficient | 19:36 |
svg | agreed | 19:36 |
svg | the problem that I want to solve, is when nodes go down, and timeouts make the stack sluggish | 19:37 |
dolphm | dstanek: i assume if you're passing multiple memcache instances to dogpile.cache, then you're just using a distributed cache... right? | 19:37 |
svg | so we could configure each os component to only use the memcache on the local metal host perhaps | 19:37 |
dstanek | svg: we may need to solve that in the memcache client - | 19:37 |
dolphm | dstanek: and we don't use dogpile to provide a token backend, do we? | 19:38 |
dstanek | dolphm: what do you mean by using a distributed cache? | 19:38 |
dstanek | i think it hashes each object and puts it on the correct server | 19:38 |
dolphm | svg: that would reduce the risk, for sure | 19:38 |
dolphm | svg: if a cache is down, the service using it is likely to have gone with it | 19:39 |
dolphm | dstanek: that's what i mean | 19:39 |
dstanek | svg: you might find that your cache per node will be mostly useless | 19:39 |
dolphm | dstanek: more cache misses? | 19:40 |
svg | but osad deploys a separate container with memcache for all os caches, and configures all components with a list of those containers | 19:40 |
dstanek | yeah, you only have so much memory and i'd expect things to be evicted quickly | 19:40 |
svg | for the keystone [memcache] it does the same, except it uses a separate memcached within the same container as keystone | 19:40 |
svg | in both cases there is a problem if one or more of the metal hosts go down | 19:41 |
svg | still not sure what the solution needs to be | 19:42 |
dstanek | python-memcached should mark the server as dead and not try to use it again - https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1306 | 19:42
dstanek | maybe we need to tune the dead_retry? | 19:43 |
dstanek | i haven't hacked on memcache in a long time, so i'm trying to catch up to where i was | 19:43 |
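The dead-server handling dstanek links can be modelled like this (a simplified sketch of the idea, not the python-memcached implementation; in python-memcached the dead_retry default is 30 seconds, which is the knob dstanek suggests tuning):

```python
import time

# Simplified model of dead-server marking: after a failure the server
# is skipped until `dead_retry` seconds have passed.
class ServerState(object):
    def __init__(self, name, dead_retry=30):
        self.name = name
        self.dead_retry = dead_retry
        self.dead_until = 0.0

    def mark_dead(self, now=None):
        now = time.time() if now is None else now
        self.dead_until = now + self.dead_retry

    def usable(self, now=None):
        now = time.time() if now is None else now
        return now >= self.dead_until

s = ServerState('mc1:11211', dead_retry=5)
s.mark_dead(now=100.0)
print(s.usable(now=102.0), s.usable(now=106.0))
# prints False True
```

A lower dead_retry means dead servers are re-probed sooner; a higher one means fewer slow timeout hits while a server is known-dead.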
*** jwagner_away is now known as jwagner | 19:43 | |
cloudnull | palendae: still around https://review.openstack.org/#/c/193650/ | 19:46 |
svg | as of now there is no tuning | 19:47
svg | so yes that might be it | 19:48 |
cloudnull | sorry I'm only 50% here. | 19:49
cloudnull | but will try some things in a bit | 19:49 |
svg | dstanek: possibly, what I am seeing might be related to the fact that, in a small stack, not that many requests come through, so that a dead server comes back before the next request is made | 19:51
svg | as keystone itself is load balanced... | 19:51
*** annashen has joined #openstack-ansible | 19:53 | |
svg | on a side note, also wondering why the repo-server has its own memcached instance btw | 19:53 |
cloudnull | the repo-servers have memcached there for the distribution of the lsync keys. the repo-servers role can be deployed anywhere, even outside of osad altogether so it needs to be able to do everything on its own. | 19:57
svg | k, I noticed the infra memcaches are also used to distribute the horizon ssl keys | 19:59 |
svg | as in, I broke that trying to configure some custom values for memcached_servers | 20:00 |
svg | I am now wondering if that is a good idea | 20:00 |
*** metral_zzz has joined #openstack-ansible | 20:00 | |
cloudnull | and nova | 20:01 |
*** metral_zzz is now known as metral | 20:01 | |
svg | nova has ssl keys? | 20:01 |
cloudnull | ssh keys. | 20:02 |
cloudnull | used for migrations between compute hosts. | 20:02 |
svg | why does that need to go through memcache? | 20:03 |
cloudnull | so they dont travel from a deployment host to a node. | 20:04 |
cloudnull | we encrypt locally, shard, distribute, and decrypt locally. | 20:04 |
cloudnull | otherwise we'd have to synchronize back to the deployment host and then distribute to the nodes. | 20:05 |
cloudnull | similar to the RHEL key deployment methodology just with memcached and encryption. | 20:06 |
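The "encrypt locally, shard, distribute, decrypt locally" flow cloudnull describes can be sketched roughly like this. Everything here is illustrative: a dict stands in for the memcached cluster, base64 stands in for real encryption, and the `nova_key_*` key names are invented for the example.

```python
import base64

def shard(blob: bytes, size: int = 4):
    """Split an (already encrypted) blob into fixed-size chunks."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]

cache = {}  # stand-in for the memcached cluster

# Sender side: "encrypt" locally, shard, distribute the pieces.
secret = b"ssh-private-key-material"
encoded = base64.b64encode(secret)     # real code would encrypt here

for i, chunk in enumerate(shard(encoded)):
    cache[f"nova_key_{i}"] = chunk     # each shard under its own key

# Receiver side: pull the shards, reassemble, decode/decrypt locally.
parts = [cache[f"nova_key_{i}"] for i in range(len(cache))]
assert base64.b64decode(b"".join(parts)) == secret
```

The point of the design, per the log, is that the key material never transits the deployment host in usable form; only encrypted shards move through the cache.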
*** stevemar has joined #openstack-ansible | 20:07 | |
svg | that synchronize part can happen just in memory with a registered variable | 20:08 |
svg | what I'm thinking of: current setup assumes here all os components (and horizon here) on each node have the same set of memcache servers | 20:09 |
svg | given the info ^^ that this need not be distributed, it stops the user from configuring a different set of memcache servers on each node | 20:10 |
svg | not sure if that is a real problem, I just bumped into it trying to configure each os component with only the memcache server that lives on the same metal host | 20:11 |
cloudnull | so something like caching servers for nova, cinder, glance, etc... ? | 20:12 |
stevelle | cloudnull: I'm thinking this might be a scenario for that filter I wrote: https://github.com/stevelle/rpc-openstack/blob/poc-identity-only/rpcd/playbooks/plugins/filters/select_hostvar.py | 20:13 |
stevelle | svg I think I hear you trying to configure each os component to only talk to the memcache on the physical host it lives on | 20:15 |
stevelle | bbiab, lunch time | 20:15 |
*** sdake has joined #openstack-ansible | 20:16 | |
*** sdake_ has joined #openstack-ansible | 20:17 | |
*** Mudpuppy is now known as Cloudpuppy | 20:18 | |
*** Cloudpuppy is now known as Mudpuppy | 20:19 | |
svg | stevelle: yes, that was one of the options I tried | 20:19 |
svg | which by itself works, but it broke that horizon ssl key deployment | 20:19 |
*** sdake has quit IRC | 20:21 | |
*** tlian2 has joined #openstack-ansible | 20:23 | |
*** annashen has quit IRC | 20:23 | |
*** javeriak has quit IRC | 20:25 | |
*** annashen has joined #openstack-ansible | 20:25 | |
*** tlian has quit IRC | 20:25 | |
*** jwagner is now known as jwagner[away] | 20:32 | |
*** jwagner[away] is now known as jwagner_away | 20:38 | |
*** Mudpuppy_ has joined #openstack-ansible | 20:40 | |
svg | dolphm: dstanek cloudnull stevelle part of the solution might be to implement the extra configuration parameters introduced with this bug report https://review.openstack.org/#/c/119452/ | 20:40 |
svg | basically, I'm seeing the same behaviour as before that patch | 20:40 |
dstanek | svg: so you are seeing every request waiting on the downed memcached instance? | 20:43 |
*** daneyon has joined #openstack-ansible | 20:43 | |
svg | yes | 20:43 |
*** Mudpuppy has quit IRC | 20:43 | |
svg | which mostly comes down to keystone iirc | 20:44 |
*** georgem1 has quit IRC | 20:44 | |
*** Mudpuppy_ has quit IRC | 20:44 | |
dstanek | what's weird is that if the instance is truly gone it should fail really fast | 20:45 |
dstanek | is this for caching or the token backend? | 20:46 |
*** daneyon has quit IRC | 20:48 | |
openstackgerrit | Merged stackforge/os-ansible-deployment: Tag prep for 9.0.10 https://review.openstack.org/193650 | 20:49 |
cloudnull | dstanek: its for both. | 20:49 |
*** britthouser has quit IRC | 20:50 | |
cloudnull | default is keystone.token.providers.uuid.Provider w/ keystone.token.persistence.backends.memcache.Token | 20:50 |
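The defaults cloudnull quotes correspond to a keystone.conf along these lines (the server addresses are illustrative placeholders, not values from the log):

```ini
[token]
provider = keystone.token.providers.uuid.Provider
driver = keystone.token.persistence.backends.memcache.Token

[memcache]
servers = 172.29.236.100:11211,172.29.236.101:11211
```

With this setup both token persistence and caching depend on the listed memcached servers, which is why losing part of the cluster hurts keystone directly.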
stevelle | svg: if you used container_vars to make keystone use the full set of memcaches, but overrode all the rest to use local-only, would that be a good workaround today? | 20:50 |
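A sketch of the kind of per-host override stevelle is suggesting, in openstack_user_config.yml terms. The `memcached_servers` variable name, the host name, and the addresses are all assumptions for illustration, not verified against osad:

```yaml
# Hypothetical: pin one host's services to its local memcached only,
# while keystone (configured elsewhere) keeps the full server list.
compute_hosts:
  infra01:
    ip: 172.29.236.100
    container_vars:
      memcached_servers: "172.29.236.100:11211"
```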
dstanek | using the token backend will just fail if nodes are down - not much you can do | 20:50 |
dstanek | i wouldn't expect the described behavior from the caching though | 20:51 |
*** tlian2 has quit IRC | 20:51 | |
stevelle | (or is it that you want the reverse of that, I got confused :)) | 20:52 |
dolphm | dstanek: i've seen that behavior too. i configured keystone to use memcache and then forgot to install memcache. | 20:55 |
dstanek | dolphm: and you saw slow requests? | 20:55 |
dolphm | dstanek: yes | 20:55 |
dstanek | i would not expect waiting on memcached. slow because things are not in cache and work has to be done, though | 20:56 |
*** radek_ has quit IRC | 20:58 | |
*** jwagner_away is now known as jwagner | 20:59 | |
svg | I assume the slow requests were due to the token backend and/or the caching | 20:59 |
dstanek | dolphm: http://paste.openstack.org/raw/305349/ <- this took milliseconds | 20:59 |
dstanek | svg: was a memcache instance for your token backend down? | 21:00 |
svg | well, the exact case was where we had 5 configured "controllers" with each having a container of memcache, keystone etc | 21:00 |
dolphm | dstanek: it wouldn't behave differently behind dogpile, would it? | 21:00 |
dstanek | dolphm: if the server connects, but doesn't respond there has to be a timeout | 21:00 |
svg | the token backend points to the 5 memcache instances that live in the keystone container | 21:01 |
dstanek | dolphm: i don't think so | 21:01 |
svg | of the 5 controller metal hosts, 2 were down (all containers were down) | 21:01 |
svg | so 2 of the 5 token backends were down too, just as 2 of the 5 memcached containers were down | 21:01 |
dstanek | svg: if any of keystone's memcached servers were down i wouldn't expect keystone to work very well | 21:01 |
svg | well, yes, that is the case | 21:02 |
dstanek | it's like having your database down | 21:02 |
dolphm | dstanek: would keystone return tokens that were never written to memcache? | 21:03 |
svg | dstanek: but that is what happens, as the token backends are not HA | 21:03 |
dstanek | dolphm: i'm not sure how that backend actually works; i think it's just fundamentally broken by design | 21:03 |
sigmavirus24 | dstanek: in openstack? Never | 21:07 |
sigmavirus24 | =P | 21:07 |
cloudnull | sigmavirus24: https://review.openstack.org/#/c/189998/ | 21:09 |
cloudnull | also svg ^ once that goes in memcached token backends can go away | 21:09 |
dstanek | dolphm: lol, i think it will transparently fail to write (that's what memcache is supposed to do) and then return the token like all is good | 21:09 |
svg | still feeling I'm in a catch 22 | 21:10 |
sigmavirus24 | cloudnull: follow-on patch instead? | 21:11 |
cloudnull | svg: good reading on the future of the token backend http://dolphm.com/openstack-keystone-fernet-tokens | 21:11 |
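For reference, moving a Kilo-era keystone to fernet tokens is roughly the following config change, plus generating a key repository with `keystone-manage fernet_setup`; the exact option names and path should be checked against the keystone documentation for the release in use:

```ini
[token]
provider = keystone.token.providers.fernet.Provider

[fernet_tokens]
key_repository = /etc/keystone/fernet-keys/
```

Because fernet tokens are non-persistent (validated cryptographically rather than looked up), the memcached token backend and its failure modes discussed above drop out entirely.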
sigmavirus24 | This adds the needfuls to make fernet work while that would make fernet the default | 21:11 |
cloudnull | maybe ... | 21:11 |
dstanek | cloudnull: does osad use the memcache token backend by default? | 21:11 |
cloudnull | yes | 21:11 |
dstanek | bummer :-( | 21:11 |
cloudnull | juno used sql , kilo is memcached | 21:12 |
cloudnull | by default. | 21:12 |
dstanek | what was the driver behind that change? | 21:12 |
sigmavirus24 | does keystone use the memcached driver by default? I think we standardized our stuff to be 100% upstream defaults in kilo | 21:13 |
sigmavirus24 | see also: kilofication | 21:13 |
cloudnull | it was a recommended change for speed and reliability on high traffic systems. | 21:13 |
*** daneyon has joined #openstack-ansible | 21:14 | |
dstanek | sigmavirus24: i think it's sql, but i'd have to look | 21:14 |
sigmavirus24 | dstanek: nevermind | 21:14 |
cloudnull | dstanek: using sql under heavy load caused the token table to grow uncontrollably. | 21:14 |
cloudnull | even with token clean up on a per hour basis | 21:14 |
dstanek | cloudnull: i'd agree with speed, but disagree with reliability | 21:14 |
dstanek | the failure modes are truly terrible | 21:15 |
cloudnull | we found that heavy periods of load made token clean-up in sql impossible without manual intervention, and deployers needed to deal with token tables that were multiple gigabytes | 21:16 |
dstanek | dolphm: how many people would freak out if i posted a review to delete that backend :-) | 21:16 |
dstanek | cloudnull: yeah, i don't doubt that | 21:16 |
cloudnull | i.e. we had one user that regularly needed to clean up a token table that was > 50GB, thus memcached. | 21:16 |
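The hourly clean-up mentioned above is typically a cron entry around `keystone-manage token_flush`; something like the following (the schedule, user, and path are illustrative):

```
# /etc/cron.d/keystone-token-flush (illustrative)
0 * * * * keystone /usr/bin/keystone-manage token_flush >/dev/null 2>&1
```

As the log notes, under sustained heavy load the flush can fail to keep up with the insert rate, which is what drove the switch away from the sql token backend.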
cloudnull | however with fernet all that goes away | 21:17 |
cloudnull | and there will be much rejoicing. | 21:17 |
cloudnull | sigmavirus24: nevermind | 21:18 |
svg | so that will solve the issue with the token backend, not the general cache | 21:18 |
sigmavirus24 | cloudnull: ? | 21:18 |
cloudnull | +2 | 21:18 |
sigmavirus24 | cloudnull: if you post the review to make fernet the default, I'll +2 to that ;) | 21:18 |
dstanek | sigmavirus24: ++ | 21:19 |
sigmavirus24 | dstanek: you can +1 ;) | 21:20 |
dstanek | svg: i don't know how to solve your issue (x% of memcached nodes down) if you are using the memcached token backend | 21:20 |
dstanek | sigmavirus24: the all powerful +1! | 21:20 |
sigmavirus24 | dstanek: you're an SME | 21:20 |
sigmavirus24 | Your +1 on 189998 would also be powerful | 21:21 |
*** javeriak has joined #openstack-ansible | 21:21 | |
dstanek | sigmavirus24: i'll take a look | 21:21 |
sigmavirus24 | dstanek: be sure to test it out | 21:21 |
sigmavirus24 | I don't know how many times I built an AIO with it but it's always good to have an N+1 run | 21:22 |
sigmavirus24 | ;) | 21:22 |
sigmavirus24 | dstanek: dolphm also, opinions about defaults for osad would be good, but should be added onto cloudnull's review to make it a default ;) | 21:22 |
*** annashen has quit IRC | 21:26 | |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Add support for deploying Keystone with Fernet https://review.openstack.org/189998 | 21:26 |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Updated keystone to use fernet as the default https://review.openstack.org/193729 | 21:26 |
cloudnull | sorry idk why it added a patch set to yours, sigmavirus24, my bad, but nothing changed. | 21:26 |
cloudnull | i made it a dep but it seems to have gotten angry | 21:27 |
dstanek | cloudnull: git review likes to rebase unless you tell it not to | 21:28 |
sigmavirus24 | weird | 21:28 |
sigmavirus24 | oh yeah | 21:28 |
sigmavirus24 | that's probably it | 21:28 |
sigmavirus24 | cloudnull: no need to apologize | 21:28 |
sigmavirus24 | gerrit doesn't wipe out votes any longer | 21:28 |
cloudnull | ++ dstanek, sigmavirus24. i forgot about the rebase happy process. | 21:30 |
cloudnull | but on that note. im going to sleep | 21:30 |
cloudnull | have a good weekend folks. | 21:30 |
*** KLevenstein has quit IRC | 21:31 | |
*** annashen has joined #openstack-ansible | 21:41 | |
sigmavirus24 | later | 21:41 |
*** stevemar has quit IRC | 21:52 | |
*** daneyon has quit IRC | 21:57 | |
*** abitha has quit IRC | 22:03 | |
*** abitha has joined #openstack-ansible | 22:03 | |
*** jwagner is now known as jwagner_away | 22:19 | |
*** sdake_ has quit IRC | 22:25 | |
*** javeriak has quit IRC | 22:47 | |
*** annashen has quit IRC | 22:50 | |
openstackgerrit | Steve Lewis proposed stackforge/os-ansible-deployment: Remove invalid client config option https://review.openstack.org/193747 | 23:00 |
stevelle | sigmavirus24: looks like just a repo-build.yml and os-keystone-install.yml would be necessary to apply the Fernet patch to an existing stack. Do you have any different thoughts? | 23:07 |
*** dkalleg has quit IRC | 23:08 | |
*** dkalleg has joined #openstack-ansible | 23:10 | |
*** dkalleg has quit IRC | 23:14 | |
openstackgerrit | Kevin Carter proposed stackforge/os-ansible-deployment: Updated keystone to use fernet as the default https://review.openstack.org/193729 | 23:16 |
sigmavirus24 | stevelle: sorry? | 23:16 |
sigmavirus24 | oh | 23:16 |
sigmavirus24 | for upgrades, I'm not certain | 23:16 |
sigmavirus24 | I've been building from scratch because you mentioned that you bjorked one of your AIOs by trying to do an upgrade | 23:17 |
stevelle | wham bam | 23:17 |
stevelle | will find out relatively soon | 23:21 |
*** jlvillal has quit IRC | 23:21 | |
sigmavirus24 | cool | 23:22 |
sigmavirus24 | Also, that assumes you updated user_variables, right? | 23:22 |
stevelle | just the one iirc | 23:27 |
sigmavirus24 | Yep | 23:29 |
*** dkalleg has joined #openstack-ansible | 23:32 | |
*** jlvillal has joined #openstack-ansible | 23:33 | |
*** tlian has joined #openstack-ansible | 23:33 | |
*** javeriak has joined #openstack-ansible | 23:37 | |
*** dkalleg has quit IRC | 23:47 | |
openstackgerrit | Steve Lewis proposed stackforge/os-ansible-deployment: Remove invalid client config option https://review.openstack.org/193747 | 23:49 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!