Friday, 2015-06-19

*** sdake_ has joined #openstack-ansible00:00
*** sdake has quit IRC00:03
*** sdake_ is now known as sdake00:06
*** annashen has joined #openstack-ansible00:14
*** annashen has quit IRC00:19
*** JRobinson__ has joined #openstack-ansible00:34
*** sdake_ has joined #openstack-ansible00:41
*** abitha has quit IRC00:42
*** sdake has quit IRC00:44
*** openstack has joined #openstack-ansible00:56
*** markvoelker has joined #openstack-ansible00:56
*** dkalleg has quit IRC01:02
*** markvoelker has quit IRC01:02
*** ccrouch has quit IRC01:14
*** tlian has joined #openstack-ansible01:33
*** markvoelker has joined #openstack-ansible01:57
*** britthouser has joined #openstack-ansible01:58
*** britthou_ has quit IRC02:02
*** markvoelker has quit IRC02:02
*** daneyon_ has quit IRC02:13
*** gtt116 has quit IRC02:20
*** sdake_ has quit IRC02:34
*** stevemar has joined #openstack-ansible03:01
*** tlian has quit IRC03:17
*** annashen has joined #openstack-ansible03:17
*** stevemar has quit IRC03:22
*** annashen has quit IRC03:22
*** stevemar has joined #openstack-ansible03:26
*** sdake has joined #openstack-ansible03:33
*** markvoelker has joined #openstack-ansible03:46
*** markvoelker has quit IRC03:51
*** sdake_ has joined #openstack-ansible04:07
*** abitha has joined #openstack-ansible04:10
*** sdake has quit IRC04:10
*** CheKoLyN has quit IRC04:10
*** abitha has quit IRC04:10
*** abitha has joined #openstack-ansible04:11
*** sreejithp has joined #openstack-ansible04:28
*** sreejithp has quit IRC04:29
*** sreejithp has joined #openstack-ansible04:29
*** markvoelker has joined #openstack-ansible04:47
*** markvoelker has quit IRC04:51
*** abitha has quit IRC05:18
*** sdake_ has quit IRC05:33
*** sreejithp has quit IRC05:48
*** sreejithp has joined #openstack-ansible05:48
*** javeriak has joined #openstack-ansible05:51
*** radek_ has joined #openstack-ansible05:58
*** stevemar has quit IRC06:10
*** annashen has joined #openstack-ansible06:20
*** annashen has quit IRC06:25
*** markvoelker has joined #openstack-ansible06:36
*** markvoelker has quit IRC06:40
*** javeriak has quit IRC06:54
*** JRobinson__ has quit IRC07:12
<mancdaz> woo gerrit has started working with our non-trunk branches!  08:14
<mancdaz> https://bugs.launchpad.net/openstack-ansible/juno/+bug/1466010  08:14
<openstack> Launchpad bug 1466010 in openstack-ansible juno "Tempest requirements not in sync with openstack" [High,Fix committed] - Assigned to Hugh Saunders (hughsaunders)  08:14
<mancdaz> it's a little thing, but it added a tag 'in-juno' which makes it easier to track bugs between releases  08:15
<mancdaz> if we have a bug targeted to a particular series, it will also update the status as the gerrit review progresses (like it does already for trunk)  08:15
<mancdaz> maybe it's just me, but this is cool  08:16
*** markvoelker has joined #openstack-ansible08:25
*** markvoelker has quit IRC08:29
<odyssey4me> mancdaz yeah, I didn't notice the tag but I did notice it changing the series status :)  08:38
<odyssey4me> that's a great combo, actually - just in case we haven't done the series targeting just yet :)  08:39
<mancdaz> odyssey4me yeah in theory that tag is useful when you do releases - you mark all the things with that tag as fix released, then remove the tag  08:40
<mancdaz> means you don't need to target all your bugs at a milestone in advance  08:41
<mancdaz> you just target the ones that have actually gone in by the time you want to release  08:41
<mancdaz> by looking for that tag  08:41
<odyssey4me> ah, nice :)  08:41
<mancdaz> so you'd then only specifically target things that you definitely have to have in a particular release  08:42
<openstackgerrit> Jesse Pretorius proposed stackforge/os-ansible-deployment: Fixed empty yaml load  https://review.openstack.org/193457  08:43
<openstackgerrit> Andy McCrae proposed stackforge/os-ansible-deployment: Add read/write_affinity settings for Swift  https://review.openstack.org/191023  09:08
*** annashen has joined #openstack-ansible09:22
*** annashen has quit IRC09:27
<openstackgerrit> Merged stackforge/os-ansible-deployment: Split environment file into component parts  https://review.openstack.org/192102  09:29
<openstackgerrit> Merged stackforge/os-ansible-deployment: Allow protocol to be set per endpoint-type  https://review.openstack.org/192148  09:35
<openstackgerrit> Merged stackforge/os-ansible-deployment: Update rabbitmq to v3.5.3-1  https://review.openstack.org/192899  09:35
<openstackgerrit> Jesse Pretorius proposed stackforge/os-ansible-deployment: Implement service_region user_group_var  https://review.openstack.org/193468  09:38
*** markvoelker has joined #openstack-ansible10:13
*** markvoelker has quit IRC10:18
*** sreejithp has quit IRC11:07
*** sreejithp has joined #openstack-ansible11:08
*** sreejithp has left #openstack-ansible11:09
*** sreejithp has joined #openstack-ansible11:09
*** sreejithp has left #openstack-ansible11:13
*** markvoelker has joined #openstack-ansible11:29
<openstackgerrit> Andy McCrae proposed stackforge/os-ansible-deployment: Add read/write_affinity settings for Swift  https://review.openstack.org/191023  11:30
*** markvoelker has quit IRC11:34
*** sreejithp has joined #openstack-ansible11:43
<svg> trying to understand the dist_sort module, more specifically as used in cloudnull's example: https://gist.github.com/cloudnull/bf7311fb93bca16c9c3b - am I right that this will shift the memcached servers until the one that is on the same metal_host as the current container is placed first?  11:53
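If that reading is correct, the behaviour svg describes could be sketched roughly like this in Python. It is only an illustration of the reordering idea, not the actual dist_sort code from the gist; the function name and the host/server strings are invented.

    # Illustration only: reorder a list of memcached servers so that any entry
    # hosted on the same metal host as the current container comes first.
    def prefer_local_servers(servers, local_metal_host):
        local = [s for s in servers if local_metal_host in s]
        remote = [s for s in servers if local_metal_host not in s]
        return local + remote

    servers = ['infra2_memcached:11211', 'infra1_memcached:11211', 'infra3_memcached:11211']
    print(prefer_local_servers(servers, 'infra1'))
    # ['infra1_memcached:11211', 'infra2_memcached:11211', 'infra3_memcached:11211']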
*** markvoelker has joined #openstack-ansible12:05
<openstackgerrit> Jesse Pretorius proposed stackforge/os-ansible-deployment: Allow galera wsrep_provider_options to be customised  https://review.openstack.org/191106  12:16
<mancdaz> hughsaunders: https://bugs.launchpad.net/openstack-ansible/+bug/1445421 is this 'done'?  12:17
<openstack> Launchpad bug 1445421 in openstack-ansible trunk "Juno to Kilo upgrades should be tested" [High,Triaged] - Assigned to Hugh Saunders (hughsaunders)  12:17
* hughsaunders looks  12:18
<hughsaunders> mancdaz: no  12:18
<hughsaunders> but unlikely to become more-done given the current state of things  12:18
*** tlian has joined #openstack-ansible12:20
*** sreejithp has left #openstack-ansible12:34
<cloudnull> Morning  12:36
*** sreejithp has joined #openstack-ansible12:40
<odyssey4me> o/ cloudnull  12:45
<cloudnull> Hello  12:45
<odyssey4me> I just hit an issue when trying to work out federation configuration - when trying to generate the SP cert it fails, and it turns out that the script uses the hostname for the cert generation.  12:46
<odyssey4me> The failure is because the hostname contains underscores (as all our container host names do).  12:46
<odyssey4me> I remember hitting issues with the same thing when working on log analysis, and I had to work around it.  12:46
<odyssey4me> As I recall I just used the IP address instead, but it's not ideal.  12:47
<odyssey4me> It'd be better to replace the underscores with dashes, I would think... perhaps just in the hostname inside the container?  12:48
<cloudnull> Yup. We should fix that. It's an issue in the dynamic inventory. We're using the ansible group name in the container hostname and that contains underscores; by ansible convention that's not a requirement, so we should fix that in the next release.  12:48
<odyssey4me> but if we do this, how would we handle upgraded environments?  12:49
<cloudnull> It should be easy enough to fix by doing a simple replace where the hostname is written. We could most the hostname resolution stuff into its own general purpose role. Or maybe we add some bits to the upgrade script?  12:50
<cloudnull> *move  12:50
*** jaypipes has quit IRC12:51
<cloudnull> Or maybe we add a task in the container build to ensure the hostname used is valid?  12:51
<cloudnull> In that case it would simply attach, check, change. And should be fixed on an upgrade.  12:52
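A minimal sketch of that "check and change" step, assuming the fix is simply to swap underscores for dashes so the name satisfies the hostname RFCs; the example names are invented and this is not the actual inventory or upgrade-script code.

    import re

    def sanitize_hostname(name):
        """Replace underscores (e.g. from Ansible group names) with dashes."""
        candidate = name.replace('_', '-')
        # loose validity check: letters, digits and dashes only
        if not re.fullmatch(r'[A-Za-z0-9-]+', candidate):
            raise ValueError('hostname still invalid: %s' % candidate)
        return candidate

    print(sanitize_hostname('aio1_keystone_container-d3f1a2b4'))
    # aio1-keystone-container-d3f1a2b4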
<odyssey4me> cloudnull I guess. :/ If we had throw-away containers this would be less of an issue. :)  12:53
<cloudnull> All of the api containers could be thrown away.  12:54
<cloudnull> They contain no state.  12:54
<cloudnull> I mean there are logs, but we have proper log shipping. So who cares.  12:55
<cloudnull> The db containers are harder.  12:55
<cloudnull> But we bind mount the bits out to the host. So again, it should be OK.  12:55
<cloudnull> So you should feel free to nuke as needed. Just expect downtime while replacing your thrown-away containers.  12:56
<odyssey4me> yeah, the whole idea of deploying from images would completely change how we deal with things  12:57
<odyssey4me> ansible would only have to focus on actual hosts - the containers would just be assets  12:57
<cloudnull> But then you have two distinct code paths for two distinct deployment processes.  12:58
<cloudnull> And you need an image pipeline to do updates within the environment.  12:59
*** KLevenstein has joined #openstack-ansible12:59
<cloudnull> Also how would one upgrade a container asset by replacing it and maintain active connections in a case like cinder?  13:02
<odyssey4me> cloudnull yeah, I know that's one of the trouble points  13:02
<cloudnull> It connects to a block device and replacing the container means you have 100% downtime for that resource.  13:02
<cloudnull> The whole immutable container idea is great but in practice it's not practical in complex infrastructure.  13:03
<odyssey4me> cloudnull the downtime only happens if you're using specific kinds of storage  13:04
<cloudnull> This is true, anything with a local block device attachment  13:04
<cloudnull> If using netapp vnx it should remain online.  13:05
<odyssey4me> or ceph  13:05
<odyssey4me> etc  13:05
*** sdake has joined #openstack-ansible13:05
<cloudnull> Basically lvm or nfs would be impossible which is what most users are deploying.  13:06
<hughsaunders> odyssey4me: re k-k-fed post, that is a great post, lots of code towards the end... looks like quite a lot of that functionality is available in openstack client  13:07
*** sdake_ has joined #openstack-ansible13:07
*** sdake has quit IRC13:10
<cloudnull> odyssey4me: hughsaunders: in your k-k travels did you need to pull in the following reviews?  13:12
<cloudnull> https://review.openstack.org/#/c/188581/  13:12
<cloudnull> https://review.openstack.org/#/c/188881/  13:13
<cloudnull> based on spec https://blueprints.launchpad.net/python-keystoneclient/+spec/k2k-auth-plugin ?  13:13
<hughsaunders> cloudnull: so far my travels have been theoretical  13:13
<cloudnull> ok, just curious.  13:13
<hughsaunders> but it looks like the post we're reading is from the same guy http://blog.rodrigods.com/it-is-time-to-play-with-keystone-to-keystone-federation-in-kilo/  13:14
<odyssey4me> cloudnull I'm focused on Keystone as an SP to an ADFS IDP, so neither of those appear required.  13:15
<odyssey4me> rumour has it that dstanek may have some tweaks to openstack-ansible to make things work :)  13:16
<cloudnull> ah. the land of the windows.  13:16
<cloudnull> ++ dstanek  13:17
<dstanek> cloudnull: howdy  13:17
*** sreejithp has left #openstack-ansible13:17
<odyssey4me> dstanek rumour has it you may have some bits you could share for making federation work - perhaps you could share in the form of a review, or even just a fork with patches in it?  13:18
<dstanek> odyssey4me: yes, i'm working on some k2k stuff that i can share with you. it isn't osad, but maybe you'll find it useful?  13:19
<odyssey4me> dstanek ah - well, hughsaunders and miguelgrinberg are working on that particular track for including it into osad  13:20
*** sdake has joined #openstack-ansible13:20
<odyssey4me> if you'd be happy to share with them at least that'd be great  13:20
<odyssey4me> I'm having fun and games with ADFS and Shibboleth  13:21
<hughsaunders> dstanek: whatcha up to?  13:21
<dstanek> i've been working on a little k2k for an upcoming blog post  13:23
*** sdake_ has quit IRC13:24
*** jaypipes has joined #openstack-ansible13:26
<hughsaunders> dstanek: cool, link us up when you're ready please :)  13:26
*** jaypipes is now known as leakypipes13:50
*** subscope has quit IRC13:56
<palendae> mancdaz: RE gerrit working on non-trunk: sigmavirus24_awa's patch to infra for not requiring stable/ in names was accepted  13:57
<mancdaz> palendae yarp  13:57
<mancdaz> that's why it's working :)  13:57
<palendae> Mmmhmm  13:58
<mancdaz> it only took about 4 months to get it in  13:58
*** Mudpuppy has joined #openstack-ansible14:10
*** sigmavirus24_awa is now known as sigmavirus2414:12
<sigmavirus24> mancdaz: like most teams, there are higher priorities and lower priorities, and that was a low priority for infra  14:14
<mancdaz> sigmavirus24 of course  14:14
<palendae> sigmavirus24: For https://review.openstack.org/#/c/193277/ - was I not already using section headings?  14:14
<palendae> I'm trying a definition list right now, and I lose the anchors to scripts, which is a useful thing to have, IMO  14:15
<sigmavirus24> oh hah  14:18
<sigmavirus24> gertty didn't show that  14:18
<sigmavirus24> bug in gertty  14:18
<sigmavirus24> good job gertty  14:19
<sigmavirus24> Gertty must use a regex to strip out +s in the diff  14:19
<sigmavirus24> And it's probably over-enthusiastic  14:19
<palendae> Ok, I'll leave that alone then  14:20
<sigmavirus24> https://imgur.com/a/PVEMr  14:23
<b3rnard0> odyssey4me: cloudnull: and anyone else. meetbot manual information added to the bottom of the meetings page  14:24
*** stevemar has joined #openstack-ansible14:24
<palendae> sigmavirus24: bug reports welcome?  14:24
<sigmavirus24> Of course  14:24
<sigmavirus24> I'm looking for where to report them now  14:24
<palendae> Probably LP  14:25
*** stevemar is now known as stevedoor14:29
<openstackgerrit> Andy McCrae proposed stackforge/os-ansible-deployment: Add global endpoint_type_proto options  https://review.openstack.org/193573  14:31
*** lkoranda has quit IRC14:32
*** sreejithp has joined #openstack-ansible14:48
<sigmavirus24> palendae: https://storyboard.openstack.org/#!/story/2000299  14:50
*** daneyon has joined #openstack-ansible14:52
<sigmavirus24> palendae: oh man, my reviews show up on the wrong lines too  14:52
*** Mudpuppy has quit IRC14:57
*** daneyon_ has joined #openstack-ansible14:59
<svg> keystone config uses the (default?) dogpile.cache.memcached driver for its backend - this is hardcoded and not parametrized in osad  15:01
<svg> according to http://docs.openstack.org/kilo/config-reference/content/section_keystone.conf.html in production it is recommended to use keystone.cache.memcache_pool  15:01
<svg> Gonna file a big, yell at me if there is a good reason for how it is now ;)  15:02
<svg> bug!  15:02
*** daneyon has quit IRC15:02
<odyssey4me> svg hang on, I think that is parameterised  15:04
<svg> see playbooks/roles/os_keystone/templates/keystone.conf.j2, it's not templated  15:04
<odyssey4me> svg ah, you're right - we've simplified things... either cached or not cached... if cached then use dogpile  15:05
<svg> actually, what is dogpile? the upstream doc seems to imply that it does not use pooling, but I'm not sure what it does then  15:06
<odyssey4me> svg according to http://docs.openstack.org/developer/keystone/configuration.html keystone.cache.memcache_pool is an eventlet-safe implementation of dogpile.cache.memcached  15:08
<odyssey4me> we're not using eventlet (we deploy keystone behind apache)  15:08
<odyssey4me> but yeah, perhaps we should allow parameterisation  15:08
<dstanek> svg: dogpile is just a cache abstraction  15:09
*** Mudpuppy has joined #openstack-ansible15:11
*** daneyon_ has quit IRC15:12
<svg> ok, so not really a difference on how it handles the pooling I guess?  15:12
*** sreejithp has left #openstack-ansible15:20
<sigmavirus24> so dstanek dolphm do we need a cronjob for https://review.openstack.org/#/c/189998/ ?  15:24
*** JTen has joined #openstack-ansible15:24
*** sreejithp has joined #openstack-ansible15:26
*** annashen has joined #openstack-ansible15:27
<sigmavirus24> http://lbragstad.com/?p=133 doesn't recommend having a cron job to rotate keys, it simply talks about a calculation for the case where you had a token with a certain lifetime and a rotation timeline  15:28
<sigmavirus24> Also http://docs.openstack.org/developer/keystone/configuration.html#uuid-pki-pkiz-or-fernet says "can" not "should" ;)  15:30
<odyssey4me> sigmavirus24 yeah, it's not clear whether key rotation is essential for good security or not  15:32
*** annashen has quit IRC15:32
<odyssey4me> and it'd be useful to have some sort of guidelines for how often, how many, etc  15:33
<sigmavirus24> odyssey4me: also, how would we set up a cron job for this without generating ssh-keys for keystone containers and distributing them to the other keystone containers as authorized keys?  15:33
<stevedoor> sigmavirus24, probably want a cron job for rotating keys in production :(  15:33
<odyssey4me> sigmavirus24 perhaps we count on network time being equal and ensure that the cron happens at the same time? :p  15:34
<sigmavirus24> stevedoor: so the problem here is the way we set up our containers and the fact that keystone-container-a can't ssh into keystone-container-{b,c,d,...} to distribute/synchronize the rotated keys  15:34
<sigmavirus24> stevedoor: will fernet_rotate deterministically generate the new key?  15:34
<odyssey4me> sigmavirus24 why can't one container ssh to another? surely we could implement ssh key generation in keystone[0] and distribute?  15:35
<stevedoor> sigmavirus24, i don't know about generating a new one? i think that's what the first step does?  15:35
<sigmavirus24> i.e., assuming a keystone service has keys 0 and 1, will using fernet_rotate generate the same new key deterministically for each of those keystones?  15:35
<sigmavirus24> odyssey4me: we could, but we currently don't have that  15:36
<sigmavirus24> odyssey4me: and that's orthogonal to the review that's being blocked by that  15:36
<odyssey4me> so if you generate once, and ship them around - when you rotate on each server will the rotation be the same?  15:37
<sigmavirus24> ^  15:37
<sigmavirus24> that's the better way of phrasing my question since I clearly need more coffee  15:37
*** daneyon has joined #openstack-ansible15:46
<dstanek> sigmavirus24: it won't generate the same key - you generate on one keystone and sync to the others  15:50
<sigmavirus24> thought so  15:50
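In other words, rotation has to happen on a single node and the resulting key repository then gets copied to the rest. A rough sketch of that flow, assuming the stock Kilo key location and placeholder hostnames (the real osad plays would do this with Ansible rather than a script):

    import subprocess

    KEY_REPO = '/etc/keystone/fernet-keys/'
    OTHER_KEYSTONE_HOSTS = ['keystone2', 'keystone3']   # placeholders

    def rotate_and_sync():
        # Rotate only on the primary node so every node ends up with the same key set.
        subprocess.check_call(['keystone-manage', 'fernet_rotate',
                               '--keystone-user', 'keystone',
                               '--keystone-group', 'keystone'])
        # Push the whole key repository to the remaining keystone containers.
        for host in OTHER_KEYSTONE_HOSTS:
            subprocess.check_call(['rsync', '-a', '--delete',
                                   KEY_REPO, '%s:%s' % (host, KEY_REPO)])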
*** georgem1 has joined #openstack-ansible15:55
*** cbaesema has quit IRC16:04
<dolphm> sigmavirus24: i don't think you *need* a cronjob, but i suppose it depends on how fast you believe your adversary can break both a 128 bit AES key and a 128 bit HMAC key  16:17
<dolphm> sigmavirus24: the job would have to be on the ansible host, though  16:18
<dolphm> sigmavirus24: stevedoor: and no, there's no determinism in key generation  16:19
<sigmavirus24> I didn't think it would be, but I wanted to make sure  16:19
<sigmavirus24> dolphm: correct, for now the job would have to be on the ansible host  16:20
<sigmavirus24> Until we generate an ssh key for the keystone_all[0] and distribute it to the rest of the keystone containers  16:21
*** palendae has quit IRC16:27
*** palendae has joined #openstack-ansible16:27
*** sdake_ has joined #openstack-ansible16:27
*** sdake has quit IRC16:31
*** lkoranda has joined #openstack-ansible16:38
<openstackgerrit> Nolan Brubaker proposed stackforge/os-ansible-deployment: Generate consistent length passwords in AIO  https://review.openstack.org/193628  16:39
<palendae> ^ backport for something already merged in master  16:42
*** lkoranda has quit IRC16:51
*** sdake has joined #openstack-ansible16:52
*** sdake_ has quit IRC16:55
*** lkoranda has joined #openstack-ansible17:04
*** sdake_ has joined #openstack-ansible17:05
*** dkalleg has joined #openstack-ansible17:07
*** lkoranda has quit IRC17:08
*** sdake has quit IRC17:09
*** lkoranda has joined #openstack-ansible17:10
*** javeriak has joined #openstack-ansible17:11
*** abitha has joined #openstack-ansible17:16
*** KLevenstein is now known as KLev-afk17:17
<openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Tag prep for 9.0.10  https://review.openstack.org/193650  17:22
*** annashen has joined #openstack-ansible17:27
*** sreejithp has quit IRC17:29
*** jwagner_away is now known as jwagner17:34
<dolphm> sigmavirus24: p.s. i'm maintaining my +1 -- don't see a reason to implement a rotation cronjob in the first pass  17:48
<dolphm> sigmavirus24: it's certainly a valuable security hardening exercise though  17:49
*** javeriak has quit IRC17:59
<sigmavirus24> dolphm: I agree  18:06
<openstackgerrit> Merged stackforge/os-ansible-deployment: Add documentation to user config file  https://review.openstack.org/193162  18:11
*** javeriak has joined #openstack-ansible18:14
<svg> can someone explain to me the difference between [cache] and [memcache] in keystone, and why the former points to the local memcaches in the same container?  18:18
<sigmavirus24> svg: that's a good question  18:30
<sigmavirus24> I don't have an answer but I'll see if I can figure it out  18:30
<svg> the reason I'm looking into it is that I experience problems with the current setup when memcaches go down, as it is pooled and not load balanced  18:32
<cloudnull> cores https://review.openstack.org/#/c/193650/ <- please make this go for 9.0.10  18:32
<cloudnull> sigmavirus24 palendae mattt odyssey4me hughsaunders d34dh0r53 andymccr ^^  18:32
<sigmavirus24> cloudnull: do you have any ideas about svg's question?  18:33
* cloudnull looking back  18:33
<cloudnull> dolphm: ^^  18:38
*** stevedoor has quit IRC18:40
*** javeriak has quit IRC18:40
<cloudnull> svg by default the keystone config for kilo is `keystone.token.persistence.backends.memcache_pool.Token` which will use the local memcached within keystone for token storage, and is hashed to all other keystone containers.  18:41
<cloudnull> once sigmavirus24's fernet changes go through fernet will become the new default.  18:41
<svg> is that the [cache] or [memcache] part?  18:43
<svg> as you say it uses the 'local' memcache, I assume it's the [memcache] one  18:43
<cloudnull> dolphm: correct me if I'm wrong but I'm almost positive that is the [memcache] backend  18:44
<svg> 'local' being a bad name, as it's the list of memcaches within the keystone container  18:44
<cloudnull> yes, local as in the memcache deployed within keystone  18:44
<svg> that however does not explain why it is a separate set of memcaches instead of the containerized infra ones  18:45
<cloudnull> that way token storage is not in a global cache.  18:45
*** KLev-afk is now known as KLevenstein18:46
svgcan I assume the logic how both sets of pools are connected to is the same?18:46
cloudnulland you could flush all tokens without impacting the rest of the gobal cache18:46
svgok18:46
dolphmsvg: cloudnull: sigmavirus24: i'm catching up...18:46
sigmavirus24dolphm: ++18:46
cloudnullsvg dolphm == keystone guru18:46
svgI assumed that :)18:47
dolphmi do believe that's the difference. [cache] is used for caching in front of sql/ldap, and [memcache] configures memcache as a sort-of-persistent kvs backend18:47
dolphm... used for token persistence18:48
dolphmso with the transition to fernet as a token provider, you no longer need a token backend, and therefore [memcache] will go unused18:49
svgok, thx - I'm into a troubleshoot with kilo though18:50
svgdo you know how keystone does its logic in how it connects to the pool of servers?18:50
*** javeriak has joined #openstack-ansible18:56
<svg> dolphm: ^^  19:04
<dolphm> svg: i assume you mean for memcache?  19:04
<svg> yes  19:05
<svg> for both types of memcaches  19:05
<svg> when, say, 3 of 5 memcaches go down, all client operations that involve keystone are dead slow b/c long timeouts on memcaches  19:06
<dolphm> svg: for [memcache] we use python-memcached, which handles pooling  19:06
<dstanek> svg: so the connection attempts don't time out?  19:07
<dolphm> svg: implementation is here, i believe https://github.com/linsomniac/python-memcached/blob/master/memcache.py  19:07
<svg> they do, but that takes seconds  19:07
<dstanek> that may mean the timeout is seconds :-(  19:08
<svg> uhuh  19:08
<svg> and it doesn't sound like dead servers are avoided - or the dead ttl is too low, or something like that  19:08
<dstanek> svg: they are not  19:08
<dstanek> the pool you are referring to is not a pool of servers, but a pool of connections  19:09
<dstanek> the way memcache works is that a key is hashed to 1 of the servers and that is the only one that has the value  19:09
<svg> ok, that rings a bell - I tried understanding python-memcached  19:10
<dstanek> so in your case if 3 of the 5 servers were down then roughly 60% of your requests would slow down (assuming an even distribution in the hashing algorithm)  19:11
<svg> sounds like what I experience yes  19:11
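A toy illustration of that point: memcached clients hash each key to exactly one server, so with 3 of 5 servers unreachable roughly three fifths of lookups land on a dead node. The server names below are placeholders, and the real python-memcached client hashes with crc32 over weighted buckets, so this only shows the proportion, not the exact mapping.

    import zlib

    servers = ['mc1', 'mc2', 'mc3', 'mc4', 'mc5']
    dead = {'mc1', 'mc2', 'mc3'}

    def server_for(key):
        # each key maps to exactly one server
        return servers[zlib.crc32(key.encode()) % len(servers)]

    keys = ['token-%d' % i for i in range(10000)]
    hits_dead = sum(1 for k in keys if server_for(k) in dead)
    print('%.0f%% of keys map to a dead server' % (100.0 * hits_dead / len(keys)))
    # roughly 60%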
<dstanek> oh, i'm reading back in history here - if memcached is being used for token persistence then keystone will definitely have issues if any server is down  19:14
<svg> ok, so it seems to me that it is a bad idea here to define a non-balanced list of memcaches  19:16
<svg> given osad configures a local memcache instance, it should suffice to only point to that local instance, IMHO?  19:17
<svg> cloudnull: ^^  19:17
*** sdake_ has quit IRC19:19
*** daneyon has quit IRC19:22
<svg> dstanek: so that is not really pooling, as dolphm said  19:25
<svg> does the same logic apply for [cache]?  19:25
<dolphm> svg: it's taking advantage of a distributed cluster using an array of connections ;)  19:31
<svg> as long as no nodes go down :)  19:32
<dolphm> svg: all keystone instances should share the exact same [memcache] configuration - it's a common backend behind the application layer  19:32
<svg> ah, that is important  19:32
<dolphm> svg: whereas [cache] can exist per application node, and there's zero reason to HA  19:32
<svg> aha again  19:33
<svg> so only the [memcache] should be HA  19:33
<cloudnull> so our [cache] pooling should be limited to local memcached.  19:33
<cloudnull> and not use a connection pool  19:34
<cloudnull> ?  19:34
<dolphm> cloudnull: how is [cache] setup now?  19:34
<svg> dolphm: how can we best handle the case for [memcache] where nodes might go down?  19:34
*** metral has quit IRC19:34
*** jwagner is now known as jwagner_away19:34
<dolphm> scenarios for [cache]: A) the running application is running and has a cache, B) the running application can run without a cache and performance suffers as the backend load increases, or C) the application is down and there's no reason to have a cache  19:34
<svg> sounds like the opposite of how things are now?  19:34
*** annashen has quit IRC19:35
<svg> dolphm: so all openstack components that use memcache basically should have their own local cache on localhost  19:35
<cloudnull> we are using dogpile's ability to connect to multiple caches  19:36
<svg> and only keystone [memcache] should have something distributed  19:36
<dolphm> svg: well, that's a question of memory. in a world with unlimited memory, yes  19:36
<cloudnull> for the [cache] section  19:36
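For reference, handing several memcached endpoints to dogpile.cache looks roughly like the sketch below; the addresses are placeholders and this is not lifted from the osad templates. dogpile passes the list down to the memcached client, which hashes each key to one of the servers.

    from dogpile.cache import make_region

    region = make_region().configure(
        'dogpile.cache.memcached',
        expiration_time=300,
        arguments={'url': ['172.29.236.11:11211',
                           '172.29.236.12:11211',
                           '172.29.236.13:11211']},
    )

    region.set('some-cached-value', {'answer': 42})
    print(region.get('some-cached-value'))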
<dstanek> i don't think that would be very efficient  19:36
<svg> agreed  19:36
<svg> the problem that I want to solve is when nodes go down, and timeouts make the stack sluggish  19:37
<dolphm> dstanek: i assume if you're passing multiple memcache instances to dogpile.cache, then you're just using a distributed cache... right?  19:37
<svg> so we could configure each os component to only use the memcache on the local metal host perhaps  19:37
<dstanek> svg: we may need to solve that in the memcache client -  19:37
<dolphm> dstanek: and we don't use dogpile to provide a token backend, do we?  19:38
<dstanek> dolphm: what do you mean by using a distributed cache?  19:38
<dstanek> i think it hashes each object and puts it on the correct server  19:38
<dolphm> svg: that would reduce the risk, for sure  19:38
<dolphm> svg: if a cache is down, the service using it is likely to have gone with it  19:39
<dolphm> dstanek: that's what i mean  19:39
<dstanek> svg: you might find that your cache per node will be mostly useless  19:39
<dolphm> dstanek: more cache misses?  19:40
<svg> but osad deploys a separate container with memcache for all os caches, and configures all components with a list of those containers  19:40
<dstanek> yeah, you only have so much memory and i'd expect things to be evicted quickly  19:40
<svg> for the keystone [memcache] it does the same, except it uses a separate memcached within the same container as keystone  19:40
<svg> in both cases there is a problem if one or more of the metal hosts go down  19:41
<svg> still not sure what the solution needs to be  19:42
<dstanek> python-memcached should mark the server as dead and not try to use it again - https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1306  19:42
<dstanek> maybe we need to tune the dead_retry?  19:43
<dstanek> i haven't hacked on memcache in a long time, so i'm trying to catch up to where i was  19:43
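The knobs dstanek is referring to live on the python-memcached client itself. A small sketch of tuning them directly (the values and addresses are illustrative, not what osad or keystone ship by default): dead_retry controls how long a failed server stays marked dead before it is retried, and socket_timeout bounds how long a single request can hang on an unreachable server.

    import memcache

    client = memcache.Client(
        ['172.29.236.11:11211', '172.29.236.12:11211'],
        dead_retry=30,      # seconds to keep a failed server marked as dead
        socket_timeout=1,   # fail fast instead of stalling the API request
    )
    client.set('probe', 'ok')
    print(client.get('probe'))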
*** jwagner_away is now known as jwagner19:43
<cloudnull> palendae: still around https://review.openstack.org/#/c/193650/  19:46
<svg> as of now there is no tuning  19:47
<svg> so yes that might be it  19:48
<cloudnull> sorry I'm only 50% here.  19:49
<cloudnull> but will try some things in a bit  19:49
<svg> dstanek: possibly, what I am seeing might be related to the fact that, in a small stack, not that many requests come through, so that a dead server comes back before the next request is made  19:51
<svg> as keystone itself is load balanced...  19:51
*** annashen has joined #openstack-ansible19:53
<svg> on a side note, also wondering why the repo-server has its own memcached instance btw  19:53
<cloudnull> the repo-servers have memcached there for the distribution of the lsync keys. the repo-servers role can be deployed anywhere, even outside of osad altogether, so it needs to be able to do everything on its own.  19:57
<svg> k, I noticed the infra memcaches are also used to distribute the horizon ssl keys  19:59
<svg> as in, I broke that trying to configure some custom values for memcached_servers  20:00
<svg> I am now wondering if that is a good idea  20:00
*** metral_zzz has joined #openstack-ansible20:00
<cloudnull> and nova  20:01
*** metral_zzz is now known as metral20:01
<svg> nova has ssl keys?  20:01
<cloudnull> ssh keys.  20:02
<cloudnull> used for migrations between compute hosts.  20:02
<svg> why does that need to go through memcache?  20:03
<cloudnull> so they don't travel from a deployment host to a node.  20:04
<cloudnull> we encrypt locally, shard, distribute, and decrypt locally.  20:04
<cloudnull> otherwise we'd have to synchronize back to the deployment host and then distribute to the nodes.  20:05
<cloudnull> similar to the RHEL key deployment methodology just with memcached and encryption.  20:06
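The general shape of that pattern (encrypt on the source host, park the blob in memcached, fetch and decrypt on the destination) could be sketched like this. It is only an illustration: the real osad roles use their own Ansible modules and cipher handling, and the cache address, cache key name, and example key material below are all invented.

    import memcache
    from cryptography.fernet import Fernet

    shared_secret = Fernet.generate_key()          # agreed out-of-band by both sides
    cache = memcache.Client(['172.29.236.11:11211'])

    # On the host that generated the SSH key:
    public_key = b'ssh-rsa AAAAB3Nza-example-key nova@compute1'
    cache.set('nova_authorized_key', Fernet(shared_secret).encrypt(public_key))

    # On each compute host:
    blob = cache.get('nova_authorized_key')
    authorized_key = Fernet(shared_secret).decrypt(blob)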
*** stevemar has joined #openstack-ansible20:07
<svg> that synchronize part can happen just in memory with a registered variable  20:08
<svg> what I'm thinking of: the current setup assumes all os components (and horizon) on each node have the same set of memcache servers  20:09
<svg> given the info ^^ that this need not be distributed, it stops the user from configuring a different set of memcache servers on each node  20:10
<svg> not sure if that is a real problem, I just bumped into it trying to configure each os component with only the memcache server that lives on the same metal host  20:11
<cloudnull> so something like caching servers for nova, cinder, glance, etc... ?  20:12
<stevelle> cloudnull: I'm thinking this might be a scenario for that filter I wrote: https://github.com/stevelle/rpc-openstack/blob/poc-identity-only/rpcd/playbooks/plugins/filters/select_hostvar.py  20:13
<stevelle> svg I think I hear you trying to configure each os component to only talk to the memcache on the physical host it lives on  20:15
<stevelle> bbiab, lunch time  20:15
*** sdake has joined #openstack-ansible20:16
*** sdake_ has joined #openstack-ansible20:17
*** Mudpuppy is now known as Cloudpuppy20:18
*** Cloudpuppy is now known as Mudpuppy20:19
<svg> stevelle: yes, that was one of the options I tried  20:19
<svg> which by itself works, but it broke that horizon ssl key deployment  20:19
*** sdake has quit IRC20:21
*** tlian2 has joined #openstack-ansible20:23
*** annashen has quit IRC20:23
*** javeriak has quit IRC20:25
*** annashen has joined #openstack-ansible20:25
*** tlian has quit IRC20:25
*** jwagner is now known as jwagner[away]20:32
*** jwagner[away] is now known as jwagner_away20:38
*** Mudpuppy_ has joined #openstack-ansible20:40
<svg> dolphm: dstanek cloudnull stevelle part of the solution might be to implement the extra configuration parameters introduced with this bug report https://review.openstack.org/#/c/119452/  20:40
<svg> basically, I'm seeing the same behaviour as before that patch  20:40
<dstanek> svg: so you are seeing every request waiting on the downed memcached instance?  20:43
*** daneyon has joined #openstack-ansible20:43
<svg> yes  20:43
*** Mudpuppy has quit IRC20:43
<svg> which mostly comes down to keystone iirc  20:44
*** georgem1 has quit IRC20:44
*** Mudpuppy_ has quit IRC20:44
<dstanek> what's weird is that if the instance is truly gone it should fail really fast  20:45
<dstanek> is this for caching or the token backend?  20:46
*** daneyon has quit IRC20:48
<openstackgerrit> Merged stackforge/os-ansible-deployment: Tag prep for 9.0.10  https://review.openstack.org/193650  20:49
<cloudnull> dstanek: it's for both.  20:49
*** britthouser has quit IRC20:50
<cloudnull> default is keystone.token.providers.uuid.Provider w/ keystone.token.persistence.backends.memcache.Token  20:50
<stevelle> svg: if you used container_vars to make keystone use the full set of memcaches, but overrode all the rest to use local-only would that be a good workaround today?  20:50
<dstanek> using the token backend will just fail if nodes are down - not much you can do  20:50
<dstanek> i wouldn't expect the described behavior from the caching though  20:51
*** tlian2 has quit IRC20:51
<stevelle> (or is it that you want the reverse of that, I got confused :))  20:52
<dolphm> dstanek: i've seen that behavior too. i configured keystone to use memcache and then forgot to install memcache.  20:55
<dstanek> dolphm: and you saw slow requests?  20:55
<dolphm> dstanek: yes  20:55
<dstanek> i would not expect waiting on memcached. slow because things are not in cache and work has to be done though  20:56
*** radek_ has quit IRC20:58
*** jwagner_away is now known as jwagner20:59
<svg> I assume the slow requests were due to both the token backend and/or the caching  20:59
<dstanek> dolphm: http://paste.openstack.org/raw/305349/ <- this took milliseconds  20:59
<dstanek> svg: was a memcache instance for your token backend down?  21:00
<svg> well, the exact case was where we had 5 configured "controllers" with each having a container of memcache, keystone etc  21:00
<dolphm> dstanek: it wouldn't behave differently behind dogpile, would it?  21:00
<dstanek> dolphm: if the server connects, but doesn't respond there has to be a timeout  21:00
<svg> the token backend points to the 5 memcache instances that live in the keystone container  21:01
<dstanek> dolphm: i don't think so  21:01
<svg> of the 5 controller metal hosts, 2 were down (all containers were down)  21:01
<svg> so 2 of the 5 token backends were down too, just as 2 of the 5 memcached containers were down  21:01
<dstanek> svg: if any of keystone's memcached servers were down i wouldn't expect keystone to work very well  21:01
<svg> well, yes, that is the case  21:02
<dstanek> it's like having your database down  21:02
<dolphm> dstanek: would keystone return tokens that were never written to memcache?  21:03
<svg> dstanek: but that is what happens, as the token backends are not HA  21:03
<dstanek> dolphm: i'm not sure how that backend actually works; i think it's just fundamentally broken by design  21:03
<sigmavirus24> dstanek: in openstack? Never  21:07
<sigmavirus24> =P  21:07
<cloudnull> sigmavirus24: https://review.openstack.org/#/c/189998/  21:09
<cloudnull> also svg ^ once that goes in memcached token backends can go away  21:09
<dstanek> dolphm: lol, i think it will transparently fail to write (that's what memcache is supposed to do) and then return the token like all is good  21:09
<svg> still feeling I'm in a catch 22  21:10
<sigmavirus24> cloudnull: follow-on patch instead?  21:11
<cloudnull> svg: good reading on the future of the token backend http://dolphm.com/openstack-keystone-fernet-tokens  21:11
<sigmavirus24> This adds the needfuls to make fernet work while that would make fernet the default  21:11
<cloudnull> maybe ...  21:11
<dstanek> cloudnull: does osad use the memcache token backend by default?  21:11
<cloudnull> yes  21:11
<dstanek> bummer :-(  21:11
<cloudnull> juno used sql, kilo is memcached  21:12
<cloudnull> by default.  21:12
<dstanek> what was the driver behind that change?  21:12
<sigmavirus24> does keystone use the memcached driver by default? I think we standardized our stuff to be 100% upstream defaults in kilo  21:13
<sigmavirus24> see also: kilofication  21:13
<cloudnull> it was a recommended change for speed and reliability on high traffic systems.  21:13
*** daneyon has joined #openstack-ansible21:14
<dstanek> sigmavirus24: i think it's sql, but i'd have to look  21:14
<sigmavirus24> dstanek: nevermind  21:14
<cloudnull> dstanek: using sql under heavy load caused the token table to grow uncontrollably.  21:14
<cloudnull> even with token clean up on a per hour basis  21:14
<dstanek> cloudnull: i'd agree with speed, but disagree with reliability  21:14
<dstanek> the failure modes are truly terrible  21:15
<cloudnull> we found that heavy periods of load would make token clean up in sql impossible without manual intervention and deployers were needing to deal with token tables that were multiple gigabytes  21:16
<dstanek> dolphm: how many people would freak out if i posted a review to delete that backend :-)  21:16
<dstanek> cloudnull: yeah, i don't doubt that  21:16
<cloudnull> i.e. we had one user that would regularly need to clean up a token table that was > 50GB, thus memcached.  21:16
<cloudnull> however with fernet all that goes away  21:17
<cloudnull> and there will be much rejoicing.  21:17
<cloudnull> sigmavirus24: nevermind  21:18
<svg> so that will solve the issue with the token backend, not the general cache  21:18
<sigmavirus24> cloudnull: ?  21:18
<cloudnull> +2  21:18
<sigmavirus24> cloudnull: if you post the review to make fernet the default, I'll +2 that ;)  21:18
<dstanek> sigmavirus24: ++  21:19
<sigmavirus24> dstanek: you can +1 ;)  21:20
<dstanek> svg: i don't know how to solve your issue (x% of memcached nodes down) if you are using the memcached token backend  21:20
<dstanek> sigmavirus24: the all powerful +1!  21:20
<sigmavirus24> dstanek: you're an SME  21:20
<sigmavirus24> Your +1 on 189998 would also be powerful  21:21
*** javeriak has joined #openstack-ansible21:21
<dstanek> sigmavirus24: i'll take a look  21:21
<sigmavirus24> dstanek: be sure to test it out  21:21
<sigmavirus24> I don't know how many times I built an AIO with it but it's always good to have an N+1 run  21:22
<sigmavirus24> ;)  21:22
<sigmavirus24> dstanek: dolphm also, opinions about defaults for osad would be good, but should be added onto cloudnull's review to make it a default ;)  21:22
*** annashen has quit IRC21:26
<openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Add support for deploying Keystone with Fernet  https://review.openstack.org/189998  21:26
<openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Updated keystone to use fernet as the default  https://review.openstack.org/193729  21:26
<cloudnull> sorry idk why it added a patch set to yours, sigmavirus24, my bad, but nothing changed.  21:26
<cloudnull> i made it a dep but it seems to have gotten angry  21:27
<dstanek> cloudnull: git review likes to rebase unless you tell it not to  21:28
<sigmavirus24> weird  21:28
<sigmavirus24> oh yeah  21:28
<sigmavirus24> that's probably it  21:28
<sigmavirus24> cloudnull: no need to apologize  21:28
<sigmavirus24> gerrit doesn't wipe out votes any longer  21:28
<cloudnull> ++ dstanek, sigmavirus24. i forgot about the rebase-happy process.  21:30
<cloudnull> but on that note, I'm going to sleep  21:30
<cloudnull> have a good weekend folks.  21:30
*** KLevenstein has quit IRC21:31
*** annashen has joined #openstack-ansible21:41
<sigmavirus24> later  21:41
*** stevemar has quit IRC21:52
*** daneyon has quit IRC21:57
*** abitha has quit IRC22:03
*** abitha has joined #openstack-ansible22:03
*** jwagner is now known as jwagner_away22:19
*** sdake_ has quit IRC22:25
*** javeriak has quit IRC22:47
*** annashen has quit IRC22:50
<openstackgerrit> Steve Lewis proposed stackforge/os-ansible-deployment: Remove invalid client config option  https://review.openstack.org/193747  23:00
<stevelle> sigmavirus24: looks like just a repo-build.yml and os-keystone-install.yml would be necessary to apply the Fernet patch to an existing stack. Do you have any different thoughts?  23:07
*** dkalleg has quit IRC23:08
*** dkalleg has joined #openstack-ansible23:10
*** dkalleg has quit IRC23:14
<openstackgerrit> Kevin Carter proposed stackforge/os-ansible-deployment: Updated keystone to use fernet as the default  https://review.openstack.org/193729  23:16
<sigmavirus24> stevelle: sorry?  23:16
<sigmavirus24> oh  23:16
<sigmavirus24> for upgrades, I'm not certain  23:16
<sigmavirus24> I've been building from scratch because you mentioned that you bjorked one of your AIOs by trying to do an upgrade  23:17
<stevelle> wham bam  23:17
<stevelle> will find out relatively soon  23:21
*** jlvillal has quit IRC23:21
<sigmavirus24> cool  23:22
<sigmavirus24> Also, that assumes you updated user_variables, right?  23:22
<stevelle> just the one iirc  23:27
<sigmavirus24> Yep  23:29
*** dkalleg has joined #openstack-ansible23:32
*** jlvillal has joined #openstack-ansible23:33
*** tlian has joined #openstack-ansible23:33
*** javeriak has joined #openstack-ansible23:37
*** dkalleg has quit IRC23:47
<openstackgerrit> Steve Lewis proposed stackforge/os-ansible-deployment: Remove invalid client config option  https://review.openstack.org/193747  23:49
