18:01:43 <dolphm> #startmeeting keystone
18:01:44 <openstack> Meeting started Tue Jan 21 18:01:43 2014 UTC and is due to finish in 60 minutes. The chair is dolphm. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:47 <openstack> The meeting name has been set to 'keystone'
18:01:50 <dolphm> #topic Meeting pings
18:02:01 <dolphm> so i just had a dumb idea to make 10 seconds of my Tuesday easier
18:02:12 <gyee> \o
18:02:22 <dolphm> i'm going to put the above list (just keystone-core at the moment) onto https://wiki.openstack.org/wiki/Meetings/KeystoneMeeting
18:02:37 <dolphm> if you'd like to be pinged prior to our meetings, add your IRC name to the list and i'll just copy/paste it :)
18:02:40 <morganfainberg> dolphm, copy/paste? ;)
18:02:58 <bknudson> 10 seconds will really add up over time.
18:03:00 <topol> o/
18:03:22 <morganfainberg> dolphm, you missed ayoung in that list
18:03:36 <topol> dolphm can you add topol
18:03:42 <stevemar> hehe
18:03:48 <ayoung> I'm here
18:03:53 <marekd> hello.
18:04:01 <fabiog> hello
18:04:28 <dolphm> list is there now
18:04:37 <jamielennox> hi
18:04:49 <dolphm> #topic icehouse-2 freeze
18:05:12 <dolphm> so, ignoring gate woes for the moment, i bumped revocation-events and kds to i3, although i'd really rather not
18:05:24 * ayoung frantically trying to keep up with bknudson reviews on revocation-events
18:05:32 <ayoung> tis close
18:05:39 <dolphm> we have 3 hours to get things *gating* to call them icehouse-2
18:05:44 <bknudson> frantically trying to keep up with updates to review
18:06:02 <ayoung> bknudson, any stop ships in that latest round?
18:06:19 <bknudson> ayoung: in revocation events?
18:06:23 <ayoung> yeah
18:06:56 <topol> dolphm, anything that is priority to review now?
18:07:01 <henrynash> i'll try and get list limiting fixed up - it was passing - only question is whether the 'next' pointer method is acceptable
18:07:07 <dolphm> i haven't kept up with reviews there at all -- could the patchset be broken down into something that can land today, with trickier bits landing in i3?
18:07:14 <bknudson> ayoung: I didn't actually review it again. I just looked through my previous comments that weren't addressed in the latest patch
18:07:24 <bknudson> ayoung: so I'll have to go through and do an actual review to know
18:07:24 <dolphm> topol: revocation-events, and mapping i'd say
18:07:29 <morganfainberg> henrynash, did we determine 203 or in-json indicator?
18:07:30 <ayoung> bknudson, OK
18:07:40 <dolphm> #link https://review.openstack.org/#/c/55908/
18:07:41 <stevemar> fyi mapping: https://review.openstack.org/#/c/60424/
18:07:46 <dolphm> #link https://review.openstack.org/#/c/60424/
18:07:54 <jamielennox> dolphm: i'd like this one in i2
18:07:56 * stevemar pokes bknudson to review mapping :P
18:07:57 <jamielennox> #link https://review.openstack.org/#/c/67785/
18:08:09 <henrynash> morganfainberg: the 203 looks dodgy….I think we were misinterpreting the spec
18:08:12 <morganfainberg> once the meeting is done, i'll jump on reviewing those links.
18:08:22 <morganfainberg> henrynash, ok fair enough.
18:08:25 <henrynash> morganfainber: so in-json seems to be the easiest
18:08:32 <bknudson> stevemar: https://review.openstack.org/#/c/60424/ depends on a change that's outdated.
18:08:45 <dolphm> jamielennox: why is that citing a bug and not a blueprint :(
18:08:48 <dolphm> i haven't tracked that at all
18:08:58 <stevemar> bknudson, yeah, idp change just got pushed, so i'm rebasing
18:09:30 <marekd> stevemar: i'd rather wait :P
18:09:35 <jamielennox> dolphm: does it warrant a blueprint?
18:10:07 <dolphm> jamielennox: it warrants milestone tracking of some kind, and it has none
18:12:10 <jamielennox> dolphm: sorry, bug targeted to i2
18:14:01 <dolphm> if anyone is interested, i carried over our hackathon whiteboard, sort of, here: https://gist.github.com/dolph/8522191
18:14:21 <jamielennox> gyee: thanks
18:14:30 <ayoung> bknudson, only you could have 90 combined comments on a review and claim "I haven't reviewed it yet."
18:15:00 <bknudson> ayoung: barely scratched the surface.
18:15:10 <dolphm> ayoung: it's a giant patchset :(
18:16:23 <gyee> love the name Kite!
18:16:27 <topol> I'm not buying anymore beers
18:16:40 <ayoung> topol, oh yes you are
18:16:54 <gyee> topol was buying beer?
18:17:02 <morganfainberg> gyee, ++
18:17:06 <ayoung> Growlers
18:18:44 <dolphm> hmm
18:19:09 <dolphm> so, since transient gate failures are a hot topic
18:19:12 <dolphm> #topic Default to listen on 127.0.0.1 instead of 0.0.0.0
18:19:25 <dolphm> at bknudson's request, i restored https://review.openstack.org/#/c/59528/
18:19:41 <bknudson> let's be part of the solution for gate problems and not part of the problem.
18:19:51 <lbragstad> bknudson: ++
18:19:59 <ayoung> So...does 127 make things better?
18:20:00 <morganfainberg> I really would rather this not be the default. i'd _rather_ this change go into devstack.
18:20:13 <dolphm> morganfainberg: did we determine that this would fix the gate? or that the actual fix must be in devstack and this would just set the precedent
18:20:20 <dolphm> morganfainberg: downvote / -2 then
18:20:21 <shardy> I still don't understand why things need to be defaulted in the config, and then again in eventlet_server.py
18:20:35 <morganfainberg> dolphm, i wont block it if we really want to go forward with it
18:20:44 <morganfainberg> dolphm, i
18:20:46 <dolphm> morganfainberg: i'm VERY torn on this :(
18:21:12 <jamielennox> dolphm: 0.0.0.0 seems like a better default there
18:21:15 <dolphm> hence i wanted to be the one to propose a solution rather than code review it :P
18:21:16 <ayoung> 0.0.0.0 means it can be called from off system, 127 does not
18:21:22 <morganfainberg> dolphm, if we made devstack able to do in single-node mode 127.0.0.1 (and this is the "right fix"), and we still default to 0.0.0.0 for default
18:21:35 <morganfainberg> it doesn't present an insane default that every single deployer needs to change
18:21:42 <morganfainberg> alternatively... make that default explode keystone
18:21:43 <bknudson> 0.0.0.0 also means that it will prevent an ephemeral port at 35357
18:21:44 <jamielennox> exactly - in almost every deployment call from anywhere is correct, this would mean deployers have to change this
18:21:45 <morganfainberg> no default listen
18:21:56 <topol> so it saves a very few amount of gate rechecks. what was the downside again?
18:21:58 <morganfainberg> you must pick a listen, i'm ok with that as well.
18:22:06 <dolphm> ayoung: can you remove your approval on that, pending discussion
18:22:25 <ayoung> done
18:22:36 <jamielennox> morganfainberg: i think you're right - surely this can be set in the devstack gate config
18:23:05 <ayoung> what is the right behavior, devstack notwithstanding?
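(For reference, the review under discussion changes keystone's default bind address. The knobs involved in keystone.conf looked roughly like this in the icehouse timeframe; option names are illustrative and may differ slightly between releases, so check your keystone.conf.sample:

    [DEFAULT]
    # 0.0.0.0 = listen on every interface (the existing default);
    # 127.0.0.1 = loopback only, as proposed in https://review.openstack.org/#/c/59528/
    public_bind_host = 0.0.0.0
    admin_bind_host = 0.0.0.0
    # 35357 is the IANA-registered admin port that triggers the
    # ephemeral-port collision discussed below
    public_port = 5000
    admin_port = 35357
)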
18:23:10 <topol> I thought the impact of this was small enough that fixing it was picking nits
18:23:21 <topol> (on gate rechecks)
18:23:48 <ayoung> If you spin up a Keystone instance, with or without SSL, should you be able to reach it from a remote system by default?
18:23:49 <topol> jamielennox ++
18:23:50 <bknudson> gate rechecks are very painful to the infra team.
18:24:07 <bknudson> they affect all of openstack
18:24:13 <dolphm> topol: it's the highest priority transient error logged against keystone :)
18:24:25 <topol> dolphm and the only one
18:24:29 <bknudson> and are the reason why things are taking 3 days to merge
18:24:52 <dolphm> topol: shh (but GOOD WORK EVERYONE!)
18:25:07 <topol> bknudson, dolphm said this one does not happen that often.
18:25:19 <bknudson> if it happens at all it's too often
18:25:26 <bknudson> because of the number of times the gate tests are run
18:25:29 <topol> OK bknudson, you win!
18:25:43 <topol> lets fix this
18:26:04 <gyee> I am OK with the localhost fix
18:26:18 * topol topol owes morganfainberg another beer
18:26:27 <dolphm> topol: fwiw, rechecks are fairly low-cost... it's gate failure & gate resets that are incredibly expensive
18:26:42 <ayoung> We should not be using IP addresses anyway. Should be hostnames....
18:26:46 <topol> dolphm, K
18:26:53 <bknudson> if this should be changed in some test config, then make the change there.
18:27:14 <morganfainberg> bknudson, if devstack will accept the change, i'd much rather get it there.
18:27:17 <bknudson> We've got this change and we can approve it right now to prevent us from being part of the gate problem.
18:27:29 <bknudson> we can revert it later if there's another solution out there.
18:27:57 <dolphm> does any other project default to listening on localhost?
18:28:02 <morganfainberg> dolphm, afaik no
18:28:12 <topol> bknudson is being very pragmatic on this
18:28:13 <dolphm> i'd rather not be a surprise in that regard :-/
18:28:23 <lbragstad> We should update the commit message to state that it should be reverted if a devstack fix goes in
18:28:58 <jamielennox> why is this just our problem and not suffered by the other services?
18:29:17 <jamielennox> is it just that the admin port is in the ephemeral range?
18:29:19 <ayoung> why are we changing localhost to 127 in the doc? 127 should be localhost
18:29:20 <bknudson> jamielennox: do they use ports in the ephemeral range?
18:29:24 <topol> isnt there enough runway before m3 that we would know if setting the new value will cause chaos?
18:29:42 <dolphm> jamielennox: that's doc'd in the bug, but it's that we're using an IANA-assigned port, which falls in linux's ephemeral range (but not in the IANA-defined ephemeral range)
18:31:26 <jamielennox> isn't the solution then just to pick another port for devstack?
18:32:17 <ayoung> how about 443?
18:32:19 <dolphm> so, there's a lot of possible solutions with upsides/downsides
18:32:23 <dolphm> ayoung: that's one.
18:32:29 <gyee> jamielennox, not really, unless you want to change the service catalog
18:32:41 <dolphm> 35357 can be an exception to the ephemeral range in linux
18:33:02 <dolphm> 35357 can be changed to something else in devstack, but that would be very odd and cause documentation / UX issues
18:33:13 <dolphm> the ephemeral range can be reduced, but that's just nasty
18:33:22 <dolphm> especially as a fix for something like this
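(For reference, the collision dolphm describes can be checked on a stock Linux host; the commands below are illustrative and assume typical distro defaults, where the exact upper bound of the range varies by kernel:

    # Linux's default ephemeral (local) port range typically starts at 32768,
    # so the IANA-assigned keystone admin port 35357 falls inside it
    $ cat /proc/sys/net/ipv4/ip_local_port_range
    32768   61000
    # the IANA-defined dynamic/ephemeral range, by contrast, is 49152-65535
    # if an outbound connection happens to grab 35357 as its source port,
    # keystone cannot bind; something like this shows the squatter:
    $ sudo ss -tnp | grep 35357
)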
18:34:01 <ayoung> OK...forgetting everything else, should Keystone listen on 127.0.0.1 (localhost) by default? It is not a production level default. What is our stance? Is Keystone ready for production out the gate, or do you need to customize? I know we said we needed to customize the values in auth_token middleware in deployment. Is this comparable?
18:34:12 <jamielennox> gyee: well devstack will provision the service catalog based on the port you give it
18:34:25 <bknudson> ayoung: 0.0.0.0 isn't a production level default either.
18:34:37 <bknudson> they'll need to configure the system for the interfaces they want to listen on.
18:34:40 <dolphm> ayoung: the rest of our keystone.conf is generally geared for minimally production-friendly defaults
18:35:08 <ayoung> bknudson, its the only IP address based default that we can rely on.
18:35:29 <dolphm> bknudson: keyword "minimally" ;)
18:35:29 <jamielennox> it just seems that having devstack use a different port is a way less surprising fix
18:36:15 <dolphm> jamielennox: i think that would be *very* surprising to openstack manuals, all the blog authors out there, curl examples, our own docs, etc
18:36:26 <ayoung> jamielennox, nah, that will mess people up, too, as the AUTH_URL is usually pre-canned with the Keystone port.
18:37:14 <jamielennox> bknudson: I also see no problem with 0.0.0.0 in production if you are running the services on a controller machine
18:37:40 <jamielennox> dolphm: it's set as an environment variable only in the gate - but i do get what you mean
18:37:54 <dolphm> jamielennox: link?
18:39:04 <ayoung> So, if Keystone were to listen on a known public IP address for the machine, would that still have the ephemeral problem? I'm thinking it would
18:39:10 <ayoung> problem is that we have port 35357
18:39:50 <ayoung> We'd effectively break devstack's multi-node capability. And the same would be true for anything taking its cue from Devstack
18:39:51 <bknudson> ayoung: apparently the only time it has a problem with ephemeral ports (outbound) is when it's listening on 0.0.0.0
18:39:51 <jamielennox> dolphm: https://github.com/openstack-dev/devstack/blob/master/lib/keystone#L65
18:40:19 <dolphm> jamielennox: what the hell is the next line? 35358?
18:40:34 <jamielennox> lol, i have no idea
18:40:50 * dolphm runs off to git blame
18:40:55 <jamielennox> oh, it's for tls-proxy
18:41:30 <jamielennox> if you enable tls-proxy it runs on 35357 and then redirects to a keystone running on 35358
18:41:35 <dolphm> ah
18:41:40 <dolphm> 5000 and 5001 behave the same way
18:41:51 <ayoung> can't we "claim" an ephemeral port?
18:41:58 <morganfainberg> as a deployer (and I also polled my coworkers) it's dumb to listen on 127.0.0.1 by default. default should be minimally functional for general usecase
18:42:23 <morganfainberg> they also said that they'd not mind if keystone exploded if you didn't set a bind (e.g. must pick in all cases)
18:42:50 <dolphm> ayoung: ?
18:42:52 <morganfainberg> but changing to 127.0.0.1 would open the door for subtle broken behavior (can't access from another node by default)
18:42:57 <jamielennox> anyway so exporting KEYSTONE_AUTH_PORT=9999 (random value) in gate runs would solve this
18:43:02 <dolphm> morganfainberg: subtle for newbies is no fun
18:43:07 <morganfainberg> dolphm, exactly
18:43:18 <jamielennox> anyone _relying_ on 35357 is wrong anyway
18:43:29 <dolphm> jamielennox: explain?
18:43:31 <morganfainberg> figured i'd ask guys who run keystone every day their opinion
18:43:47 <ayoung> jamielennox, yeah, but we'll still break the gate, which is not what we want to accomplish
18:44:09 <ayoung> morganfainberg, wouldn't that be the public IP of the Keystone server?
18:44:10 <dolphm> the gate fix is really on the devstack side; i saw https://review.openstack.org/#/c/59528/ as just a first step
18:44:13 <gyee> morganfainberg, we fronted Keystone with Apigee, LG, reverse proxies, etc in production :)
18:44:18 <jamielennox> well everything does a first touch of keystone via 5000 to retrieve the auth_url, all we should need to do is set the admin url down into the non-ephemeral range
18:44:18 <gyee> LB
18:44:20 <dolphm> going to abandon https://review.openstack.org/#/c/59528/ unless anyone is really in favor of it
18:44:34 <dolphm> gyee: as you should
18:44:45 <morganfainberg> gyee, right, but that doesn't change 35357 issue
18:44:46 <jamielennox> ayoung: why would it break the gate?
18:44:48 <ayoung> is the problem that 0.0.0.0 somehow blocks the outgoing ports? All Outgoing and incoming 0.0.0.0 come from the same pool?
18:44:57 <dolphm> jamielennox: true for newer tools, for sure
18:45:00 <ayoung> jamielennox, cuz someone somewhere is hard coding 35357
18:45:07 <morganfainberg> ayoung, basically... if something else is using 35357 as ephemeral, we don't start
18:45:14 <jamielennox> ayoung: i'm only worried about the gate here
18:45:22 <morganfainberg> ayoung, it happens ~1 time a day in infra
18:45:22 <gyee> morganfainberg, dolphm, we run Keystone in dedicated boxes, I would imagine everyone does in production
18:45:22 <ayoung> jamielennox, so am I
18:45:37 <ayoung> can we retry if 35357 is not available?
18:45:38 <gyee> so this is really a devstack gate fix
18:45:38 <morganfainberg> gyee, no, we use keystone on shared boxes, not hypervisors
18:45:39 <jamielennox> if someone within the gate has hardcoded to 35357 then that's a bug to fix
18:45:45 <dolphm> gyee: dedicated macbook pros all the way
18:45:45 <morganfainberg> gyee, but shared resources
18:46:02 <ayoung> sleep 1; retry ; sleep 5; retry; sleep 10; retry, give up?
18:46:19 <gyee> ayoung, find the process using that port, kill it :)
18:46:22 <gyee> then retry
18:46:30 <morganfainberg> ayoung, depends on how long lived the use of the port is, but it should reduce the scope by some doing that
18:46:36 <ayoung> gyee, I don't want to give Keystone the power to kill other processes
18:47:00 <topol> 1 time a day. For one time a day cant we wait till devstack fixes it?
18:47:12 <dolphm> topol: let's contribute the fix to devstack!
18:47:14 <morganfainberg> i think the best bet is to make devstack force a listen on 127.0.0.1 in single node
18:47:30 <gyee> ayoung, like stop squatting on my port!
18:47:44 <topol> dolphm, agreed
18:47:48 <ayoung> morganfainberg, nope
18:47:49 <dolphm> gyee: according to linux, it's not your port ;)
18:47:57 <morganfainberg> ayoung, no?
18:47:59 <ayoung> devstack doesn't know it is going to be single node
18:48:00 <dolphm> ayoung: why not?
18:48:11 <morganfainberg> ayoung, i think it does
18:48:12 <ayoung> you run an additional devstack on a second machine, and link it to the first
18:48:46 <morganfainberg> ayoung, really? i thought it had more smarts than that *admittedly, i haven't tried*
18:49:21 <ayoung> morganfainberg, its one of the ways it can be run, and what I would expect most people to do: set up a minimal install, make sure it runs, then add additional machines
18:49:43 <morganfainberg> this sounds like something that needs to be changed in the devstack-gate config then
18:49:52 <morganfainberg> and explicitly there.
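(For reference, the devstack-side override jamielennox mentions would look roughly like this; the port value is arbitrary, and KEYSTONE_AUTH_PORT is the variable from the lib/keystone line linked above, so treat the rest as an illustrative sketch rather than the agreed fix:

    # devstack localrc / devstack-gate environment (illustrative)
    # pick any port below the start of the ephemeral range (32768 on most Linux kernels)
    export KEYSTONE_AUTH_PORT=9999
    # note: with tls-proxy enabled, devstack also uses the next port up
    # for the internal keystone listener, so keep that one clear too
)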
18:50:02 <ayoung> so, if my machine has a "public" ip of 10.10.2.12 can I listen on 10.10.2.12:35357 without conflicting on the ephemeral port?
18:50:22 <morganfainberg> ayoung, if nothing else is using it yes. it doesn't matter
18:50:29 <gyee> according to bknudson, yes
18:50:35 <dolphm> ayoung: i think you'd be okay there unless something else explicitly was listening on the same interface + port
18:50:46 <morganfainberg> dolphm, i believe that is how it works
18:50:50 <ayoung> what is it that we are tripping on that has the ephemeral port open in practice?
18:51:02 <morganfainberg> ayoung, anything.
18:51:06 <ayoung> in practice
18:51:10 <bknudson> ayoung: it's just some random application opens a connection to something
18:51:15 <morganfainberg> ayoung, it's random, not consistent
18:51:16 <ayoung> in devstack runs that fail?
18:51:33 <ayoung> these are dedicated machines, we should be able to tell
18:51:35 <morganfainberg> ayoung, could be apt, could be git, could be... uhm, http?
18:51:45 <morganfainberg> ayoung, could be any request to an external place
18:52:00 <gyee> I blame pip
18:52:01 <morganfainberg> ayoung, and i think it isn't consistent.
18:52:03 <ayoung> outgoing requests should be separate from incoming, I thought
18:52:22 <morganfainberg> ayoung, bi-directional, ephemeral ports are used for that
18:52:28 <lbragstad> morganfainberg: in which case wouldn't it be up to the administrator to decide how to solve best?
18:52:53 <morganfainberg> lbragstad, the easiest way is to change the ephemeral port range for the box to match the IANA numbers published
18:53:02 <morganfainberg> lbragstad, the "most correct way" that is
18:53:12 <morganfainberg> lbragstad, linux doesn't adhere to that RFC by default
18:53:18 <bknudson> anyone using the default config will wind up with keystone failing to start every once in a while because of this
18:53:31 <ayoung> sudo echo "49152 65535" > /proc/sys/net/ipv4/ip_local_port_range
18:53:46 <morganfainberg> ayoung, doesn't solve anything running before devstack starts
18:53:53 <morganfainberg> ayoung, which is why devstack ditched that
18:54:05 <ayoung> they can do it on the gate machines in rc.d
18:54:09 <ayoung> rc.local
18:54:11 <morganfainberg> that could.
18:54:13 <dolphm> ayoung: you can also just register a single exception (35357)
18:54:14 <morganfainberg> they could
18:54:22 <morganfainberg> but i think they were resistant to that change
18:54:29 <dolphm> ayoung: but, what morganfainberg said
18:54:40 <morganfainberg> requires custom images
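(For reference, the two host-level fixes just discussed look roughly like this; illustrative commands only. Note that ayoung's one-liner above won't work as written, because the redirect is performed by the unprivileged shell rather than by sudo:

    # shrink the ephemeral range to the IANA-defined one (49152-65535),
    # keeping 35357 out of it -- use sysctl (or tee) so the write runs as root
    sudo sysctl -w net.ipv4.ip_local_port_range="49152 65535"
    # or, dolphm's "single exception": reserve just 35357 so the kernel
    # never hands it out as an ephemeral source port
    sudo sysctl -w net.ipv4.ip_local_reserved_ports=35357
    # to survive reboots, the same settings can go in /etc/sysctl.conf (or rc.local)
)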
18:54:50 <ayoung> "we fear change:"
18:55:11 <morganfainberg> ayoung, "we fear changes we have to make every single time we update other things"
18:55:11 <ayoung> lets push for Devstack to run Keystone from HTTP using 443
18:55:25 <ayoung> HTTPD
18:55:25 <dolphm> ayoung: it's not so much the change, so much as it becomes a hacky fix. devstack should run on *any* supported image
18:55:33 <gyee> 443 is https
18:55:52 <ayoung> gyee, that is why I didn't say 80
18:55:58 <ayoung> 5 minutes left
18:56:03 <dolphm> ayoung: but you said http
18:56:23 <ayoung> twas a typo I corrected immediately after.
18:56:25 <topol> doesnt devstack have an apache running with swift and possibly using the http ports?
18:56:26 <gyee> lets run devstack on windows :)
18:56:37 <ayoung> topol, that will work anyways
18:56:44 <bknudson> ok, so we're not going to fix the gate problem we're causing?
18:56:47 <ayoung> http://wiki.openstack.org/URLs
18:56:51 <topol> ayoung, no conflict?
18:56:57 <ayoung> nope
18:57:15 <ayoung> So long as the WSGI apps do something like /keystone vs /swift
18:57:35 <jamielennox> topol: it has apache for keystone too but it defaults to running on 35357
18:57:59 <topol> jamielennox, OK
18:58:24 <ayoung> which is why we can't just change to listening on 443, we need to use Apache to manage between the WSGI apps
18:59:25 * topol wonders what sdague thinks is the proper fix. He'll get to make the final decision anyway
18:59:48 * topol let him decide
18:59:57 <ayoung> I think the retry
19:00:18 <bknudson> retry will mean it fails less often
19:00:45 <morganfainberg> bknudson, short of abandoning the ephemeral port, there aren't good options here.
19:00:46 <ayoung> fewest things changing, and it would be possible to layer another, more draconian fix on it later
19:00:53 <topol> the transient gate fail will not have been officially removed
19:01:03 <jamielennox> that's time guys
19:01:29 <lbragstad> continue this in -dev?
19:01:55 <gyee> ++
19:02:49 <pleia2> dolphm: can you #endmeeting ?
19:03:06 <clarkb> pleia2: you can do it after 60 minutes from start of meeting
19:03:12 <dolphm> #endmeeting