17:01:51 <lbragstad> #startmeeting keystone-office-hours
17:01:52 <openstack> Meeting started Tue Jul 17 17:01:51 2018 UTC and is due to finish in 60 minutes.  The chair is lbragstad. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:53 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:55 <openstack> The meeting name has been set to 'keystone_office_hours'
17:06:47 <lbragstad> i have to step away for lunch
17:09:04 * gagehugo goes to grab lunch as well
17:09:46 <openstackgerrit> wangxiyuan proposed openstack/keystoneauth master: Add netloc and version check for version discovery  https://review.openstack.org/583215
17:19:22 * kmalloc just ate
17:19:27 <kmalloc> breakfast.
17:41:29 <cmurphy> so with this one https://review.openstack.org/#/c/578008/ I'm unclear on why this option isn't automatically exposed by keystoneauth and wondering if we should be exposing it there rather than registering it in keystonemiddleware
17:44:24 <openstackgerrit> Merged openstack/ldappool master: Bump to hacking 1.1.x  https://review.openstack.org/583162
18:04:55 <mnaser> does the validate token endpoint speak to the db at all when using fernet?
18:19:19 <kmalloc> mnaser: yes.
18:19:37 <kmalloc> mnaser: the fernet data is very limited and relies on the db to look up the values
18:19:51 <mnaser> ah, i thought they can be validated on their own
18:19:56 <mnaser> using the private key
18:20:21 <kmalloc> nope. that was a feature of PKI tokens, but the token data was so large we exploded HTTP request handling
18:20:30 <mnaser> yeah i remember those times
18:20:45 <kmalloc> Fernet tokens are "live" validated, meaning direct lookup in the db
18:21:17 <kmalloc> it also means if a user's roles change, the validation payload would change; it reflects the current state of the DB plus or minus some delta depending on caching
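A rough illustration of why validation has to hit the database: the Fernet payload itself carries only identifiers and an expiry, so roles, the catalog, and enabled/disabled state are resolved server-side at validation time. The field layout below is illustrative only, not keystone's exact format:

    # Illustrative sketch of a Fernet token's minimal payload (not keystone's code).
    from cryptography.fernet import Fernet
    import msgpack

    key = Fernet.generate_key()            # keystone keeps these in its key repository
    payload = msgpack.packb([
        b"user-id", ["password"], b"project-id", 1531850000.0, b"audit-id",
    ])                                      # only identifiers and expiry, no roles
    token = Fernet(key).encrypt(payload)    # the opaque string clients carry around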
18:21:53 * kmalloc kicks the trust controller ... hard.
18:22:24 <kmalloc> ok ok.. what in the heck... i am getting a non-iso time back... but afaict i'm only emitting iso time into the data struct
18:22:31 <kmalloc> how am i dropping the 'Z'...
18:22:35 <kmalloc> *glare*
18:23:59 <cmurphy> that sounds like a familiar bug
18:24:25 <kmalloc> yeah
18:24:33 <kmalloc> i am not seeing how the Z is being dropped
18:24:43 <kmalloc> it's... weird.
18:31:36 <kmalloc> ahhh found it
18:31:39 <kmalloc> badly named variables
18:31:49 <kmalloc> cmurphy: "trust" vs "new_trust" *eyeroll*
18:32:46 <cmurphy> kmalloc: badly named variables changed the time format?
18:32:53 <kmalloc> yeah
18:33:00 <kmalloc> i was re-normalizing the input data
18:33:08 <kmalloc> not the "after store in the db" data
18:33:17 <cmurphy> ah
18:33:22 <kmalloc> new_trust = providers.trust_api.create_trust()
18:33:30 <kmalloc> then normalize_expires_at(trust)
18:33:32 <kmalloc> whoope
18:33:34 <kmalloc> whoops*
18:33:44 <kmalloc> i renamed "new_trust" to "return_trust"
18:34:05 <kmalloc> just to make it easier to see and behold, normalizing the correct ref makes the difference
18:34:30 <cmurphy> ++
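For readers following along, a toy reconstruction of the bug just described (the helper and refs are illustrative, not keystone's actual code):

    import datetime

    def normalize_expires_at(ref):
        # make expires_at an ISO-8601 string with a trailing 'Z'
        value = ref.get("expires_at")
        if isinstance(value, datetime.datetime):
            ref["expires_at"] = value.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        return ref

    trust = {"expires_at": "2018-07-17T19:00:00.000000Z"}    # caller's input ref
    new_trust = {"expires_at": datetime.datetime.utcnow()}   # ref returned by the backend

    normalize_expires_at(trust)       # bug: normalizes the input, which was already fine
    print(new_trust["expires_at"])    # the returned ref still carries a raw datetime, no 'Z'
    normalize_expires_at(new_trust)   # fix: normalize the ref that actually gets returned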
18:57:53 <lbragstad> mnaser: are you having some issues with fernet tokens?
18:59:09 <mnaser> lbragstad: no, just trying to think of the cleanest way to architect this solution. Our keystone is based in Montreal and we’re opening a region in Silicon Valley
18:59:20 <mnaser> So trying to make sure the latency doesn’t break the world :)
19:01:19 <lbragstad> oh...
19:01:21 <lbragstad> sure
19:02:20 <lbragstad> i assume both are writeable?
19:05:10 <lbragstad> since token validation is read-only, the validation process should be immediate
19:15:48 <kmalloc> lbragstad: damn
19:16:08 <kmalloc> lbragstad: looks like we need a handler that explicitly does a 404 not a 405 when a method is not implemented =/
19:16:21 <kmalloc> lbragstad: since our contract is crappy and 404s in those cases.
19:16:30 <lbragstad> bah
19:16:33 <kmalloc> lbragstad: though... realistically that *isn't* really part of our api
19:16:59 <kmalloc> PATCH /v3/OS-TRUST/trusts/<trust_id> isn't really part of the API.
19:17:07 <kmalloc> but...
19:17:13 <kmalloc> it requires me to "change" a test.
19:17:21 <kmalloc> so... what is your opinion here
19:17:32 <kmalloc> I'm personally ok with moving to a 405 in this case
19:17:42 <kmalloc> we just explicitly test for a 404.
19:18:09 <kmalloc> if someone tries to patch a trust, it's a rando-404
19:18:16 <lbragstad> if we end up going in that direction, i'd like to do it all at once for all 404s like that
19:18:33 <kmalloc> ok i'll add a TODO explicitly in the PATCH implementation
19:18:36 <lbragstad> i assume you're just talking about trusts?
19:18:41 <kmalloc> yeah for now
19:19:06 <kmalloc> since we are migrating apis piece-meal i think a 404->405 for these cases is fine as we go
19:19:20 <kmalloc> ftr: "put" will 405 for trusts
19:19:23 <kmalloc> and we don't check for that
19:19:37 <lbragstad> hmm
19:20:11 <kmalloc> we're highly inconsistent here
19:20:17 <kmalloc> and it's not something that is "API" specific
19:20:29 <kmalloc> it's not like PATCH for trust ever did anything
19:21:17 <kmalloc> it does mean we need to implement a GET/POST/PUT/PATCH/DELETE for every resource that blindly 404s
19:21:49 <kmalloc> unless it is overridden. it feels weird to do that, esp. since we test for some of these cases but not really all/many/consistently any of them
19:22:07 <kmalloc> lbragstad: i'll defer to your call here though.
19:22:51 <kmalloc> so: quick check on options (pick one)
19:22:58 <kmalloc> 1) Implement explicit 404 where we test for it
19:23:01 <lbragstad> the explicit implementation would be nice
19:23:09 <kmalloc> 2) Implement explicit 404 everywhere
19:23:16 <kmalloc> for un-defined methods
19:23:24 <kmalloc> 3) allow 405 to pass through for unimplemented methods
19:23:29 * kmalloc prefers #3
19:23:37 <lbragstad> #2 makes things 405 -> 404
19:23:51 <kmalloc> #2 is closest to what we have now.
19:24:13 <lbragstad> how much harder would it be to do #2 over #3?
19:24:18 <kmalloc> #3 makes some things 404->405, but they aren't part of our API, it just happens to work by magic
19:24:36 <kmalloc> #2 is just defining a base class and if someone doesn't use it, it will 405
19:25:08 <kmalloc> 405 is the MOST correct error to pass through in these cases.
19:25:17 <lbragstad> i agree there
19:25:22 <kmalloc> it mostly was an accident we got 404s because of how our system was implemented
19:25:35 <lbragstad> i'm wondering what a client is going to do when they've been dealing with 404s and now they get a 405
19:25:51 <kmalloc> they've been using an invalid / not-part-of-the-API call already :P
19:26:12 <kmalloc> it could have resulted in any number of things.
19:26:27 <kmalloc> let me check if tempest tries patching trusts.
19:26:36 <kmalloc> i think that will answer my question on "is this part of the api"
19:27:45 <kmalloc> yeah tempest doesn't even try to patch a trust
19:28:37 <kmalloc> so, revised order of preference: #3 -> 405s, #1 -> explicit 404 if we test for it, #2 -> blanket 404
19:29:42 <lbragstad> from an API guidelines perspective, going from 404 -> 405 is allowed?
19:30:04 <kmalloc> i'd contest this isn't part of the API
19:30:30 <kmalloc> PATCH is not implemented for Trusts.
19:30:32 <kmalloc> same with PUT
19:30:55 <kmalloc> if patch was implemented, it wouldn't be allowed
19:31:08 <kmalloc> but since it's an unimplemented method, it isn't part of the API.
19:31:29 <kmalloc> it is the responsibility of the underlying server to handle it.
19:33:20 <lbragstad> ok
19:33:26 <lbragstad> in that case i think i'm fine with #3
19:34:06 <kmalloc> yeah.
19:34:17 <kmalloc> you know me, i'm pretty strict on not breaking the contract
19:34:20 <kmalloc> ;)
19:34:24 <kmalloc> i'm proposing it as 405
19:34:30 <kmalloc> but we can reverse course if needed
19:35:36 <lbragstad> but 405 seems like the most correct thing in this context
19:36:53 <kmalloc> yep
19:37:05 <lbragstad> at least based on my interpretation of the RFC
19:37:12 <kmalloc> exactly
19:37:21 <kmalloc> i'll make sure to add a note in the review for the reviewers
19:37:25 <lbragstad> ++
19:37:39 <lbragstad> ^ kinda would be nice in a separate patch.. but
19:37:47 <lbragstad> er - that'd be a reason for it...
19:37:56 <kmalloc> i have to separate out some patches anyway
19:37:57 <lbragstad> but calling it out in the review might be fine
19:38:06 <kmalloc> i'll split that
19:38:19 <kmalloc> i have a bug in RBACEnforcer, Json_home population, and something else
19:38:27 <kmalloc> so this is being split into 2-3 patches anyway
19:41:33 <lbragstad> ok
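For context, a minimal Flask sketch of option #3 (resource and route names are illustrative, not keystone's real classes): a MethodView only answers the verbs it defines, so an unimplemented PATCH or PUT naturally comes back as 405 Method Not Allowed with no explicit handler:

    from flask import Flask, jsonify
    from flask.views import MethodView

    app = Flask(__name__)

    class TrustResource(MethodView):
        def get(self, trust_id):
            return jsonify({"trust": {"id": trust_id}})

        def delete(self, trust_id):
            return "", 204
        # no patch()/put() defined -> Flask answers PATCH/PUT with 405

    app.add_url_rule("/v3/OS-TRUST/trusts/<trust_id>",
                     view_func=TrustResource.as_view("trust"))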
19:45:33 <kfox1111> question. does keystone support osprofiler and how far back does its support go?
19:48:44 <lbragstad> yes - we've had that support since like newton i think
19:48:54 <kfox1111> ok. cool. thanks.
19:50:51 <lbragstad> 639e36adbfa0f58ce2c3f31856b4343e9197aa0e
19:51:14 <lbragstad> https://review.openstack.org/#/c/103368/
19:52:32 <kfox1111> nice. :)
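For reference, enabling osprofiler in keystone is driven by the [profiler] section of keystone.conf; roughly (the hmac key below is a placeholder and has to match what the calling client sends):

    [profiler]
    enabled = true
    trace_sqlalchemy = true
    hmac_keys = SECRET_KEY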
20:05:24 <kmalloc> lbragstad: bah
20:05:31 <kmalloc> lbragstad: i found a bug in our json_home test...
20:05:50 <kmalloc> i think.
20:07:30 <kmalloc> ah just "observed" vs expected is wonky
20:07:31 <kmalloc> nvm
20:20:13 <lbragstad> i have to relocate quick
20:20:40 <mnaser> lbragstad: i think i am okay with only 1 of the keystones being write-able
20:20:51 <mnaser> so auth happens in that one location, always.
20:20:58 <lbragstad> ah
20:21:13 <lbragstad> but validate should be able to happen in both
20:21:24 <mnaser> yes
20:21:41 <mnaser> and the idea is validation being able to happen at the closer datacenter (this is really to avoid latency in the openstack apis)
20:21:52 <lbragstad> well - let me know if there is anything we can do to improve that upstream
20:22:26 <mnaser> well i'm just wondering how 'bad' it would be if i had a 70ms latency between keystone/openstack (while using memcache anyways)
20:22:32 <mnaser> memcache being local obviously
20:23:08 <mnaser> 75ms rtt that is
20:26:37 <kmalloc> not terrible, but... sub-ideal imo
20:26:49 <kmalloc> like... nothing should break overtly
20:27:02 <kmalloc> a keystone validate is not "fast"
20:27:18 <kmalloc> but i worry about a non-local memcache in general
20:29:01 <mnaser> oh there will be a local memcache
20:29:10 <mnaser> but i suspect not a lot of clients reuse tokens besides openstack services
20:29:15 <kmalloc> ah
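A rough sketch of the per-region keystonemiddleware settings being discussed (hostnames are made up): each region's services cache validations in a local memcached while auth itself still goes to the single writeable keystone:

    [keystone_authtoken]
    auth_url = https://keystone.montreal.example.com:5000
    memcached_servers = 127.0.0.1:11211   # local to the region
    token_cache_time = 300                # cached validations avoid the ~75ms round trip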
20:30:43 <kmalloc> lbragstad[m]: should have trusts pushed up in a moment
20:32:52 <imacdonn> speaking of memcache, I have a topic for discussion, but I don't want to interrupt, so let me know when you guys are done ;)
20:35:22 <errr> I'm having some trouble getting logged into Horizon using keystone shibboleth federation. When I successfully auth with my IDP I get redirected to https://aio.mrice.internal:5000/v3/auth/OS-FEDERATION/websso/saml2?origin=https://aio.mrice.internal/dashboard/auth/websso/ and it tells me 401 The request you have made requires authentication.
20:35:38 <errr> any idea what I may have missed in my setup that's keeping this from working?
20:36:42 <cmurphy> errr: if you turn on insecure_debug in keystone.conf it will tell you specifically what went wrong (remember to turn it off in production)
20:36:55 <errr> cmurphy: thanks
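For reference, the option cmurphy mentions lives in keystone.conf and makes keystone return the actual failure reason instead of a bare 401; it must not stay on in production:

    [DEFAULT]
    debug = true
    insecure_debug = true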
20:39:29 <imacdonn> so my problem has to do with exceeding memcached's maximum connections limit... caused by neutron-server, which uses keystonemiddleware
20:39:31 <errr> wow. that helped a ton. Thanks again cmurphy
20:39:32 <openstackgerrit> Morgan Fainberg proposed openstack/keystone master: Move trusts to flask native dispatching  https://review.openstack.org/583278
20:39:33 <openstackgerrit> Morgan Fainberg proposed openstack/keystone master: Correctly pull input data for enforcement  https://review.openstack.org/583356
20:39:33 <openstackgerrit> Morgan Fainberg proposed openstack/keystone master: Allow for 'extension' rel in json home  https://review.openstack.org/583357
20:39:34 <openstackgerrit> Morgan Fainberg proposed openstack/keystone master: Trusts do not implement patch.  https://review.openstack.org/583358
20:39:44 <kmalloc> imacdonn: ok i can focus now that those patches are pushed
20:40:06 <kmalloc> imacdonn: yep, i've seen that in the past. the correct answer is, unfortunately, to use the memcache-pool
20:40:17 <kmalloc> imacdonn: the issue is eventlet creates a new connection per-green-thread
20:40:24 <kmalloc> imacdonn: and doesn't clean up its connections well
20:40:44 <imacdonn> kmalloc: yes, exactly .. gleaned from comments in https://bugs.launchpad.net/fuel-ccp/+bug/1653071
20:40:44 <openstack> Launchpad bug 1653071 in fuel-ccp "Lack of free connections to memcached cause keystone middleware to stall" [High,Fix released] - Assigned to Fuel CCP Bug Team (fuel-ccp-bugs)
20:40:50 <kmalloc> imacdonn: i'll need to check to make sure memcachepool has been implemented for ksm
20:41:07 <kmalloc> imacdonn: it might be only in oslo_cache, and ksm is not on oslo_cache (if i remember correctly) yet
20:41:30 <imacdonn> kmalloc: there's a config option "memcache_use_advanced_pool", but I've not been able to make much sense of it
20:41:32 <kmalloc> this is one of the major reasons keystone dropped eventlet and all greenlet/greenthread based handling.
20:41:39 <kmalloc> imacdonn: ah that would be the option.
20:41:53 <kmalloc> imacdonn: it... is not a good piece of code. (and i apologize for that)
20:42:20 <kmalloc> imacdonn: python-memcache is sort of a trainwreck on some fronts and we eat it badly because of it.
20:42:40 <kmalloc> our solution(s): migrate to oslo-cache and implement a better backend for dogpile that is not based on python-memcached
20:42:44 <kmalloc> it's been a long term goal.
20:42:49 <imacdonn> kmalloc: Heh. OK, well at least it's good to know I'm not missing something stupid
20:43:01 <kmalloc> nope. that advanced pool is the only real solution
20:43:11 <kmalloc> it basically builds a shared set of memcache connections
20:43:34 <kmalloc> but since python-memcache uses threadlocal natively and we stack on top of that, it is prone to being more fragile than we'd like
20:43:46 <kmalloc> and we have had to reference internal interfaces
20:44:16 <imacdonn> so there are a couple of bugs related to that - https://bugs.launchpad.net/keystonemiddleware/+bug/1748160 and https://bugs.launchpad.net/keystonemiddleware/+bug/1747565
20:44:16 <openstack> Launchpad bug 1748160 in keystonemiddleware "memcache_use_advanced_pool = True doesn't work when use oslo.cache" [Undecided,Fix released] - Assigned to wangxiyuan (wangxiyuan)
20:44:18 <openstack> Launchpad bug 1747565 in keystonemiddleware "AttributeError when use memcache_use_advanced_pool = True in Ocata" [Undecided,Fix released] - Assigned to wangxiyuan (wangxiyuan)
20:44:53 <imacdonn> I tried back-porting the fixes, and also tried updating middleware to 5.x in my Queens environment
20:45:08 <kmalloc> lbragstad[m], knikolla: damn so close. -333, +337, would have been awesome if it was +333/-333
20:45:25 <imacdonn> but I can't get it to work .. it doesn't seem to make any connections to memcached ... and then things start sporadically hanging
20:45:39 <kmalloc> imacdonn: updating middleware beyond the release is a recipe for disaster, since ksm needs to lean on the same libs that nova/neutron/etc do
20:45:55 <imacdonn> kmalloc: yeah, I figured, but had to try it (in a lab env)
20:45:58 <kmalloc> right
20:45:59 <kmalloc> hm.
20:46:27 <aning_> Hi, for fernet keys, are there any ways to generate them other than keystone-manage fernet_setup?
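For reference, the key material keystone-manage fernet_setup writes is a standard Fernet key: a urlsafe-base64-encoded 32-byte secret, stored as numbered files in the key repository (0 is the staged key, the highest number is the primary). keystone-manage fernet_setup/fernet_rotate remain the supported way to manage them; this sketch only shows that the material itself is ordinary Fernet:

    import base64, os
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()                             # one way
    key_manual = base64.urlsafe_b64encode(os.urandom(32))   # equivalent key material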
20:46:41 <imacdonn> so I guess I can try upping the max connections limit, but it's icky :/
20:48:01 <kmalloc> yeah
20:48:18 <kmalloc> imacdonn: you're on Queens?
20:48:24 <imacdonn> kmalloc: yes
20:48:28 <kmalloc> imacdonn: hmm.
20:48:40 <kmalloc> i really want to re-write that
20:48:47 * kmalloc wishes he could write more code faster
20:48:50 <imacdonn> kmalloc: actually, maybe the problem env is Pike
20:49:13 <kmalloc> stil.
20:49:16 <kmalloc> still*
20:49:30 <imacdonn> yeah, it's Pike ... but I haven't found anything that obviously makes it better in Queens
20:50:00 <kmalloc> yeah. i just don't know. i can offer some "here is an alternative solution" but it likely won't be straightforward
20:50:12 <kmalloc> and might require replacing part of ksm's code
20:50:57 <imacdonn> part of my concern is that I don't have a good handle on how the connections are accumulating, so I don't know what I need to set the limit to
20:51:33 <imacdonn> (as a workaround) ... or maybe they'll just keep multiplying? :/
21:04:39 <openstackgerrit> Morgan Fainberg proposed openstack/keystone master: Allow for 'extension' rel in json home  https://review.openstack.org/583357
21:04:40 <openstackgerrit> Morgan Fainberg proposed openstack/keystone master: Move trusts to flask native dispatching  https://review.openstack.org/583278
21:04:40 <openstackgerrit> Morgan Fainberg proposed openstack/keystone master: Use oslo_serialization.jsonutils  https://review.openstack.org/583373
21:04:41 <openstackgerrit> Morgan Fainberg proposed openstack/keystone master: Add pycadf initiator for flask resource  https://review.openstack.org/583374
21:05:33 <kmalloc> imacdonn: it's just because eventlet and its backend suck at this part
21:05:46 <kmalloc> the accumulation is mostly dead connections that haven't been cleaned up
21:06:00 <kmalloc> the answer is ... set it obnoxiously high
21:06:07 <kmalloc> if changing on the memcache side
21:06:28 <imacdonn> yes, I guess that's all I can do .... unless I can shorten the lifetime
21:06:29 <kmalloc> you can take a look at netstat and see what you have
21:06:58 <imacdonn> # (Optional) Number of seconds a connection to memcached is held unused in the
21:06:58 <imacdonn> # pool before it is closed. (integer value)
21:06:58 <imacdonn> #memcache_pool_unused_timeout = 60
21:07:07 <imacdonn> not sure if that comes into play or not
21:07:20 <kmalloc> that only applies when using the advanced pool
21:07:24 <imacdonn> ooh
21:07:49 <imacdonn> is the advanced pool stuff documented somewhere? I haven't found anything that even mentions it, other than config comments
21:09:01 <imacdonn> I'm going to try again to patch the two bugs in Queens... but when I tried that before, it seemed it wasn't making any connections to memcached at all
21:09:13 <imacdonn> side question ... how can I turn on debug logging for this?
21:09:19 <imacdonn> (from a client like neutron)
21:11:07 <kmalloc> in pike the memcachepool is here https://github.com/openstack/keystonemiddleware/blob/stable/pike/keystonemiddleware/auth_token/_memcache_pool.py
21:11:19 <kmalloc> queens moves to oslo_cache
21:11:28 <kmalloc> patching / backporting is going to be really hairy
21:11:43 <kmalloc> totally different code bases
21:12:13 <imacdonn> if I can get the pool to work on Queens, it'd be good incentive to upgrade ... want to get that done anyway
21:12:57 <kmalloc> aye
21:13:06 <kmalloc> so queens is very different
21:14:11 <kmalloc> man.. our docs suck
21:14:15 <kmalloc> i'm so very sorry
21:14:18 <imacdonn> heh
21:18:41 <kmalloc> so, i think the answer is configure memcache, then do the advanced_pool=true option
21:18:56 <kmalloc> you have a number of tunables for ksm in the pool
21:19:01 <kmalloc> most are sane-ish defaults
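A rough example of those tunables, set in the [keystone_authtoken] section of the consuming service's config (neutron.conf here; values shown are roughly the defaults):

    [keystone_authtoken]
    memcached_servers = 127.0.0.1:11211
    memcache_use_advanced_pool = true
    memcache_pool_maxsize = 10            # upper bound on pooled connections
    memcache_pool_unused_timeout = 60     # close idle pooled connections after 60s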
21:19:05 <imacdonn> OK, so I went back to the Queens version of ksm, and encountered the two bugs mentioned above ... so applied the patches .... and now I'm back to no connections, and things are hanging
21:19:13 <kmalloc> weird.
21:19:27 <kmalloc> very weird.
21:20:38 <kmalloc> ipv4 or ipv6?
21:21:01 <imacdonn> the hosts have v6 addresses, but they're not being used
21:21:36 <kmalloc> right
21:21:45 <imacdonn> or, at least, nothing is configured to use them (and there are no DNS references to them)
21:21:49 <kmalloc> memcache has issues in this case with v6
21:21:57 <kmalloc> but as long as you're using v4
21:22:03 <kmalloc> that is a non-issue
21:22:07 <imacdonn> ok
21:22:51 <kmalloc> can you reach memcache server from the neutron server host?
21:23:01 <imacdonn> yes ... they're actually on the same host
21:23:03 <kmalloc> telnet should work fine to test
21:23:28 <imacdonn> # lsof -i TCP:11211 | wc -l
21:23:28 <imacdonn> 495
21:23:28 <imacdonn> # lsof -i TCP:11211 | grep ^neutron
21:23:28 <imacdonn> #
21:23:31 <kmalloc> well, more specifically can you reach memcache via telnet using the ip/port specified in neutron config
21:23:46 <imacdonn> note that it works if the advanced pool is not enabled
21:23:48 <kmalloc> just making sure it's not just something wonky going on
21:23:50 <kmalloc> ah
21:23:51 <kmalloc> ok
21:23:52 <kmalloc> hm
21:24:37 <imacdonn> compare to:
21:24:39 <imacdonn> # lsof -i TCP:11211 | grep -c ^neutron
21:24:39 <imacdonn> 2866
21:24:39 <imacdonn> #
21:24:57 <imacdonn> (Pike env that has the problem)
21:28:35 <kmalloc> hm.
21:28:41 <kmalloc> i don't see how this is not working
21:28:50 <kmalloc> there is nothing wonky in the code base atm
21:28:58 <kmalloc> it should just work with the advanced pool
21:29:06 <imacdonn> yeah. I'm trying to figure out what it's hanging on
21:29:27 <kmalloc> be back in a few
21:29:30 <imacdonn> k
21:29:37 <kmalloc> i need to not look at this for a sec and get some food/another coffee
21:29:38 <kmalloc> :)
21:29:42 <imacdonn> :)
22:50:36 <imacdonn> kmalloc: unsurprisingly, the hang is occurring here (haven't attempted to trace beyond this point yet):  https://github.com/openstack/keystonemiddleware/blob/stable/queens/keystonemiddleware/auth_token/__init__.py#L730
22:51:44 <lbragstad> #endmeeting