17:01:51 #startmeeting keystone-office-hours
17:01:52 Meeting started Tue Jul 17 17:01:51 2018 UTC and is due to finish in 60 minutes. The chair is lbragstad. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:53 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:55 The meeting name has been set to 'keystone_office_hours'
17:06:47 i have to step away for lunch
17:09:04 * gagehugo goes to grab lunch as well
17:09:46 wangxiyuan proposed openstack/keystoneauth master: Add netloc and version check for version discovery https://review.openstack.org/583215
17:19:22 * kmalloc just ate
17:19:27 breakfast.
17:41:29 so with this one https://review.openstack.org/#/c/578008/ I'm unclear on why this option isn't automatically exposed by keystoneauth, and wondering if we should be exposing it there rather than registering it in keystonemiddleware
17:44:24 Merged openstack/ldappool master: Bump to hacking 1.1.x https://review.openstack.org/583162
18:04:55 does the validate token endpoint speak to the db at all when using fernet?
18:19:19 mnaser: yes.
18:19:37 mnaser: the fernet data is very limited and relies on the db to look up the values
18:19:51 ah, i thought they could be validated on their own
18:19:56 using the private key
18:20:21 nope. that was a feature of PKI tokens, but the token data was so large we exploded HTTP request handling
18:20:30 yeah i remember those times
18:20:45 Fernet tokens are "live" validated, meaning a direct lookup in the db
18:21:17 it also means if a user's roles change, the validation payload would change; it reflects the current state of the DB, plus or minus some delta depending on caching
18:21:53 * kmalloc kicks the trust controller ... hard.
18:22:24 ok ok.. what in the heck... i am getting a non-iso time back... but afaict i'm only emitting iso time into the data struct
18:22:31 how am i dropping the 'Z'...
18:22:35 *glare*
18:23:59 that sounds like a familiar bug
18:24:25 yeah
18:24:33 i am not seeing how the Z is being dropped
18:24:43 it's... weird.
18:31:36 ahhh found it
18:31:39 badly named variables
18:31:49 cmurphy: "trust" vs "new_trust" *eyeroll*
18:32:46 kmalloc: badly named variables changed the time format?
18:32:53 yeah
18:33:00 i was re-normalizing the input data
18:33:08 not the "after store in the db" data
18:33:17 ah
18:33:22 new_trust = providers.trust_api.create_trust()
18:33:30 then normalize_expires_at(trust)
18:33:32 whoops
18:33:44 i renamed "new_trust" to "return_trust"
18:34:05 just to make it easier to see, and behold, normalizing the correct ref makes the difference
18:34:30 ++
18:57:53 mnaser: are you having some issues with fernet tokens?
18:59:09 lbragstad: no, just trying to think of the cleanest way to architect this solution. Our keystone is based out in Montreal and we’re opening a region in Silicon Valley
18:59:20 So trying to make sure the latency doesn’t break the world :)
19:01:19 oh...
19:01:21 sure
19:02:20 i assume both are writeable?
19:05:10 since token validation is read-only, the validation process should be immediate
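
The fernet discussion above ("live" validation backed by the database) can be sketched in a few lines. This is a toy illustration using the cryptography library's Fernet class, not keystone's actual payload format, key repository, or validation path, and lookup_current_roles() is a hypothetical stand-in for the database lookup that supplies roles and project state.

import json
from datetime import datetime, timedelta, timezone

from cryptography.fernet import Fernet

# keystone keeps rotating keys in a key repository; one static key is enough
# for this sketch.
key = Fernet.generate_key()
f = Fernet(key)

# Issuance: only a minimal payload is encrypted into the token itself.
payload = {
    "user_id": "abc123",
    "project_id": "def456",
    "expires_at": (datetime.now(timezone.utc) + timedelta(hours=1)).isoformat(),
}
token = f.encrypt(json.dumps(payload).encode())

# Validation: decrypting proves authenticity and recovers the IDs/expiry,
# but roles, catalog, and enabled flags still have to be fetched "live".
decrypted = json.loads(f.decrypt(token))
# roles = lookup_current_roles(decrypted["user_id"], decrypted["project_id"])  # DB hit
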
19:15:48 lbragstad: damn
19:16:08 lbragstad: looks like we need a handler that explicitly does a 404, not a 405, when a method is not implemented =/
19:16:21 lbragstad: since our contract is crappy and 404s in those cases.
19:16:30 bah
19:16:33 lbragstad: though... realistically that *isn't* really part of our api
19:16:59 PATCH /v3/OS-TRUST/trusts/ isn't really part of the API.
19:17:07 but...
19:17:13 it requires me to "change" a test.
19:17:21 so... what is your opinion here
19:17:32 I'm personally ok with moving to a 405 in this case
19:17:42 we just explicitly test for a 404.
19:18:09 if someone tries to patch a trust, it's a rando-404
19:18:16 if we end up going in that direction, i'd like to do it all at once for all 404s like that
19:18:33 ok i'll add a TODO explicitly in the PATCH implementation
19:18:36 i assume you're just talking about trusts?
19:18:41 yeah for now
19:19:06 since we are migrating apis piecemeal i think a 404->405 for these cases is fine as we go
19:19:20 ftr: "put" will 405 for trusts
19:19:23 and we don't check for that
19:19:37 hmm
19:20:11 we're highly inconsistent here
19:20:17 and it's not something that is "API" specific
19:20:29 it's not like PATCH for trust ever did anything
19:21:17 it does mean we need to implement a GET/POST/PUT/PATCH/DELETE for every resource that blindly 404s
19:21:49 unless it is overridden. it feels weird to do that, esp. since we test for some of these cases but not really all/many/consistently any of them
19:22:07 lbragstad: i'll defer to your call here though.
19:22:51 so: quick check on options (pick one)
19:22:58 1) Implement explicit 404 where we test for it
19:23:01 the explicit implementation would be nice
19:23:09 2) Implement explicit 404 everywhere
19:23:16 for undefined methods
19:23:24 3) allow 405 to pass through for unimplemented methods
19:23:29 * kmalloc prefers #3
19:23:37 #2 makes things 405 -> 404
19:23:51 #2 is closest to what we have now.
19:24:13 how much harder would it be to do #2 over #3?
19:24:18 #3 makes some things 404->405, but they aren't part of our API; it's just magic that they happen to respond at all
19:24:36 #2 is just defining a base class, and if someone doesn't use it, it will 405
19:25:08 405 is the MOST correct error to pass through in these cases.
19:25:17 i agree there
19:25:22 it mostly was an accident we got 404s, because of how our system was implemented
19:25:35 i'm wondering what a client is going to do when they've been dealing with 404s and now they get a 405
19:25:51 they've been using something invalid/not part of the API already :P
19:26:12 it could have resulted in any number of things.
19:26:27 let me check if tempest tries patching trusts.
19:26:36 i think that will answer my question on "is this part of the api"
19:27:45 yeah tempest doesn't even try to patch a trust
19:28:37 so, revised order of preference: #3 -> 405s, #1 -> explicit 404 if we test for it, #2 -> blanket 404
19:29:42 from an API guidelines perspective, going from 404 -> 405 is allowed?
19:30:04 i'd contest this isn't part of the API
19:30:30 PATCH is not implemented for Trusts.
19:30:32 same with PUT
19:30:55 if PATCH were implemented, it wouldn't be allowed
19:31:08 but since it's an unimplemented method, it isn't part of the API.
19:31:29 it is the responsibility of the underlying server to handle it.
19:33:20 ok
19:33:26 in that case i think i'm fine with #3
19:34:06 yeah.
19:34:17 you know me, i'm pretty strict on not breaking the contract
19:34:20 ;)
19:34:24 i'm proposing it as 405
19:34:30 but we can reverse course if needed
19:35:36 but 405 seems like the most correct thing in this context
19:36:53 yep
19:37:05 at least based on my interpretation of the RFC
19:37:12 exactly
19:37:21 i'll make sure to add a note in the review for the reviewers
19:37:25 ++
19:37:39 ^ kinda would be nice in a separate patch.. but
19:37:47 er - that'd be a reason for it...
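
A minimal sketch of the behaviour being discussed, assuming plain Flask rather than keystone's real trust resource or its flask-native dispatching code: a MethodView that only defines the methods it implements gets a 405 Method Not Allowed from the framework for everything else (option #3), and a blanket error handler could remap that to 404 if the old contract had to be preserved (option #2).

from flask import Flask, jsonify
from flask.views import MethodView

app = Flask(__name__)


class TrustResource(MethodView):
    # Only GET and DELETE are defined; PATCH/PUT against this URL get a 405
    # from Flask/werkzeug without any extra code (option #3).
    def get(self, trust_id):
        return jsonify({"trust": {"id": trust_id}})

    def delete(self, trust_id):
        return "", 204


app.add_url_rule(
    "/v3/OS-TRUST/trusts/<trust_id>",
    view_func=TrustResource.as_view("trust_resource"),
)

# Option #2 (blanket 404 for undefined methods) could be approximated by
# remapping 405 responses instead:
# @app.errorhandler(405)
# def _method_not_allowed(error):
#     return jsonify({"error": {"code": 404, "message": "Not Found"}}), 404

if __name__ == "__main__":
    # e.g. curl -X PATCH http://127.0.0.1:5000/v3/OS-TRUST/trusts/foo  -> 405
    app.run()
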
19:37:56 i have to separate out some patches anyway
19:37:57 but calling it out in the review might be fine
19:38:06 i'll split that
19:38:19 i have a bug in RBACEnforcer, Json_home population, and something else
19:38:27 so this is being split into 2-3 patches anyway
19:41:33 ok
19:45:33 question. does keystone support osprofiler, and how far back does its support go?
19:48:44 yes - we've had that support since like newton i think
19:48:54 ok. cool. thanks.
19:50:51 639e36adbfa0f58ce2c3f31856b4343e9197aa0e
19:51:14 https://review.openstack.org/#/c/103368/
19:52:32 nice. :)
20:05:24 lbragstad: bah
20:05:31 lbragstad: i found a bug in our json_home test...
20:05:50 i think.
20:07:30 ah, it's just that the "observed" vs. expected comparison is wonky
20:07:31 nvm
20:20:13 i have to relocate quick
20:20:40 lbragstad: i think i am okay with only 1 of the keystones being writeable
20:20:51 so auth happens in that one location, always.
20:20:58 ah
20:21:13 but validate should be able to happen in both
20:21:24 yes
20:21:41 and the idea is validate being able to happen in the closer datacenter (this is really to avoid latency in the openstack apis)
20:21:52 well - let me know if there is anything we can do to improve that upstream
20:22:26 well i'm just wondering how 'bad' it would be if i had a 70ms latency between keystone/openstack (while using memcache anyways)
20:22:32 memcache being local obviously
20:23:08 75ms rtt that is
20:26:37 not terrible, but... sub-ideal imo
20:26:49 like... nothing should break overtly
20:27:02 a keystone validate is not "fast"
20:27:18 but i worry about a non-local memcache in general
20:29:01 oh there will be a local memcache
20:29:10 but i suspect not a lot of clients reuse tokens besides openstack services
20:29:15 ah
20:30:43 lbragstad[m]: should have trusts pushed up in a moment
20:32:52 speaking of memcache, I have a topic for discussion, but I don't want to interrupt, so let me know when you guys are done ;)
20:35:22 I'm having some trouble getting logged into Horizon using keystone shibboleth federation. When I successfully auth with my IDP I get redirected to https://aio.mrice.internal:5000/v3/auth/OS-FEDERATION/websso/saml2?origin=https://aio.mrice.internal/dashboard/auth/websso/ and it tells me "401 The request you have made requires authentication."
20:35:38 any idea what I may have missed in my setup that's keeping this from working?
20:36:42 errr: if you turn on insecure_debug in keystone.conf it will tell you specifically what went wrong (remember to turn it off in production)
20:36:55 cmurphy: thanks
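
For reference, the insecure_debug switch cmurphy points at is a keystone.conf option; a minimal sketch of enabling it (the rest of the file is elided, and the setting should be reverted once the problem is found):

# /etc/keystone/keystone.conf
[DEFAULT]
debug = true
# Return the real reason authentication failed in API error messages. Useful
# while debugging federation/websso problems, but do NOT leave this enabled
# in production, since it leaks information to unauthenticated callers.
insecure_debug = true
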
20:39:29 so my problem has to do with exceeding memcached's maximum connections limit, caused by neutron-server, which uses keystonemiddleware
20:39:31 wow. that helped a ton. Thanks again cmurphy
20:39:32 Morgan Fainberg proposed openstack/keystone master: Move trusts to flask native dispatching https://review.openstack.org/583278
20:39:33 Morgan Fainberg proposed openstack/keystone master: Correctly pull input data for enforcement https://review.openstack.org/583356
20:39:33 Morgan Fainberg proposed openstack/keystone master: Allow for 'extension' rel in json home https://review.openstack.org/583357
20:39:34 Morgan Fainberg proposed openstack/keystone master: Trusts do not implement patch. https://review.openstack.org/583358
20:39:44 imacdonn: ok i can focus now that those patches are pushed
20:40:06 imacdonn: yep, i've seen that in the past. the correct answer is, unfortunately, to use the memcache pool
20:40:17 imacdonn: the issue is eventlet creates a new connection per green thread
20:40:24 imacdonn: and doesn't clean up its connections well
20:40:44 kmalloc: yes, exactly .. gleaned from comments in https://bugs.launchpad.net/fuel-ccp/+bug/1653071
20:40:44 Launchpad bug 1653071 in fuel-ccp "Lack of free connections to memcached cause keystone middleware to stall" [High,Fix released] - Assigned to Fuel CCP Bug Team (fuel-ccp-bugs)
20:40:50 imacdonn: i'll need to check to make sure memcachepool has been implemented for ksm
20:41:07 imacdonn: it might be only in oslo_cache, and ksm is not on oslo_cache yet (if i remember correctly)
20:41:30 kmalloc: there's a config option "memcache_use_advanced_pool", but I've not been able to make much sense of it
20:41:32 this is one of the major reasons keystone dropped eventlet and all greenlet/greenthread based handling.
20:41:39 imacdonn: ah, that would be the option.
20:41:53 imacdonn: it... is not a good piece of code. (and i apologize for that)
20:42:20 imacdonn: python-memcache is sort of a trainwreck on some fronts and we eat it badly because of it.
20:42:40 our solution(s): migrate to oslo.cache and implement a better backend for dogpile that is not based on python-memcached
20:42:44 it's been a long-term goal.
20:42:49 kmalloc: Heh. OK, well at least it's good to know I'm not missing something stupid
20:43:01 nope. that advanced pool is the only real solution
20:43:11 it basically builds a shared set of memcache connections
20:43:34 but since python-memcache uses threadlocal natively and we stack on top of that, it is prone to being more fragile than we'd like
20:43:46 and we have had to reference internal interfaces
20:44:16 so there are a couple of bugs related to that - https://bugs.launchpad.net/keystonemiddleware/+bug/1748160 and https://bugs.launchpad.net/keystonemiddleware/+bug/1747565
20:44:16 Launchpad bug 1748160 in keystonemiddleware "memcache_use_advanced_pool = True doesn't work when use oslo.cache" [Undecided,Fix released] - Assigned to wangxiyuan (wangxiyuan)
20:44:18 Launchpad bug 1747565 in keystonemiddleware "AttributeError when use memcache_use_advanced_pool = True in Ocata" [Undecided,Fix released] - Assigned to wangxiyuan (wangxiyuan)
20:44:53 I tried backporting the fixes, and also tried updating middleware to 5.x in my Queens environment
20:45:08 lbragstad[m], knikolla: damn, so close. -333, +337, would have been awesome if it was +333/-333
20:45:25 but I can't get it to work .. it doesn't seem to make any connections to memcached ... and then things start sporadically hanging
20:45:39 imacdonn: updating middleware beyond the release is a recipe for disaster, since ksm needs to lean on the same libs that nova/neutron/etc do
20:45:55 kmalloc: yeah, I figured, but had to try it (in a lab env)
20:45:58 right
20:45:59 hm.
20:46:27 Hi, for fernet keys, are there any ways to generate them other than keystone-manage fernet_setup?
20:46:41 so I guess I can try upping the max connections limit, but it's icky :/
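
To make the advanced-pool discussion concrete, this is roughly what the relevant options look like in the [keystone_authtoken] section of a consuming service's config (neutron.conf here). The option names are real keystonemiddleware options; the values are only illustrative, with the pool tunables shown at their sample-config defaults rather than as tuned recommendations.

# /etc/neutron/neutron.conf (or any service using keystonemiddleware)
[keystone_authtoken]
memcached_servers = 127.0.0.1:11211
# Use keystonemiddleware's shared connection pool instead of one
# python-memcached connection per eventlet green thread.
memcache_use_advanced_pool = true
# Pool tunables (values shown are the sample-config defaults):
memcache_pool_maxsize = 10
memcache_pool_unused_timeout = 60
memcache_pool_conn_get_timeout = 10
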
20:48:01 yeah
20:48:18 imacdonn: you're on Queens?
20:48:24 kmalloc: yes
20:48:28 imacdonn: hmm.
20:48:40 i really want to re-write that
20:48:47 * kmalloc wishes he could write more code faster
20:48:50 kmalloc: actually, maybe the problem env is Pike
20:49:13 still.
20:49:30 yeah, it's Pike ... but I haven't found anything that obviously makes it better in Queens
20:50:00 yeah. i just don't know... i can offer a "here is an alternative solution", but it likely won't be straightforward
20:50:12 and might require replacing part of ksm's code
20:50:57 part of my concern is that I don't have a good handle on how the connections are accumulating, so I don't know what I need to set the limit to
20:51:33 (as a workaround) ... or maybe they'll just keep multiplying? :/
21:04:39 Morgan Fainberg proposed openstack/keystone master: Allow for 'extension' rel in json home https://review.openstack.org/583357
21:04:40 Morgan Fainberg proposed openstack/keystone master: Move trusts to flask native dispatching https://review.openstack.org/583278
21:04:40 Morgan Fainberg proposed openstack/keystone master: Use oslo_serialization.jsonutils https://review.openstack.org/583373
21:04:41 Morgan Fainberg proposed openstack/keystone master: Add pycadf initiator for flask resource https://review.openstack.org/583374
21:05:33 imacdonn: it's just because eventlet and its backend suck at this part
21:05:46 the accumulation is mostly dead connections that haven't been cleaned up
21:06:00 the answer is ... set it obnoxiously high
21:06:07 if changing on the memcache side
21:06:28 yes, I guess that's all I can do .... unless I can shorten the lifetime
21:06:29 you can take a look at netstat and see what you have
21:06:58 # (Optional) Number of seconds a connection to memcached is held unused in the
21:06:58 # pool before it is closed. (integer value)
21:06:58 #memcache_pool_unused_timeout = 60
21:07:07 not sure if that comes into play or not
21:07:20 that only applies when using the advanced pool
21:07:24 ooh
21:07:49 is the advanced pool stuff documented somewhere? I haven't found anything that even mentions it, other than config comments
21:09:01 I'm going to try again to patch the two bugs in Queens... but when I tried that before, it seemed it wasn't making any connections to memcached at all
21:09:13 side question ... how can I turn on debug logging for this?
21:09:19 (from a client like neutron)
21:11:07 in pike the memcachepool is here: https://github.com/openstack/keystonemiddleware/blob/stable/pike/keystonemiddleware/auth_token/_memcache_pool.py
21:11:19 queens moves to oslo_cache
21:11:28 patching / backporting is going to be really hairy
21:11:43 totally different code bases
21:12:13 if I can get the pool to work on Queens, it'd be a good incentive to upgrade ... want to get that done anyway
21:12:57 aye
21:13:06 so queens is very different
21:14:11 man.. our docs suck
21:14:15 i'm so very sorry
21:14:18 heh
21:18:41 so, i think the answer is: configure memcache, then set the advanced_pool=true option
21:18:56 you have a number of tunables for ksm in the pool
21:19:01 most are sane-ish defaults
21:19:05 OK, so I went back to the Queens version of ksm, and encountered the two bugs mentioned above ... so applied the patches .... and now I'm back to no connections, and things are hanging
21:19:13 weird.
21:19:27 very weird.
21:20:38 ipv4 or ipv6?
21:21:01 the hosts have v6 addresses, but they're not being used
21:21:36 right
21:21:45 or, at least, nothing is configured to use them (and there are no DNS references to them)
21:21:49 memcache has issues in this case with v6
21:21:57 but as long as you're using v4
21:22:03 that is a non-issue
21:22:07 ok
21:22:51 can you reach the memcache server from the neutron server host?
21:23:01 yes ... they're actually on the same host
21:23:03 telnet should work fine to test
21:23:28 # lsof -i TCP:11211 | wc -l
21:23:28 495
21:23:28 # lsof -i TCP:11211 | grep ^neutron
21:23:28 #
21:23:31 well, more specifically can you reach memcache via telnet using the ip/port specified in neutron config
21:23:46 note that it works if the advanced pool is not enabled
21:23:48 just making sure it's not just something wonky going on
21:23:50 ah
21:23:51 ok
21:23:52 hm
21:24:37 compare to:
21:24:39 # lsof -i TCP:11211 | grep -c ^neutron
21:24:39 2866
21:24:39 #
21:24:57 (Pike env that has the problem)
21:28:35 hm.
21:28:41 i don't see how this is not working
21:28:50 there is nothing wonky in the code base atm
21:28:58 it should just work with the advanced pool
21:29:06 yeah. I'm trying to figure out what it's hanging on
21:29:27 be back in a few
21:29:30 k
21:29:37 i need to not look at this for a sec and get some food/another coffee
21:29:38 :)
21:29:42 :)
22:50:36 kmalloc: unsurprisingly, the hang is occurring here (haven't attempted to trace beyond this point yet): https://github.com/openstack/keystonemiddleware/blob/stable/queens/keystonemiddleware/auth_token/__init__.py#L730
22:51:44 #endmeeting