18:01:43 <dolphm> #startmeeting keystone
18:01:44 <openstack> Meeting started Tue Jan 21 18:01:43 2014 UTC and is due to finish in 60 minutes.  The chair is dolphm. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:47 <openstack> The meeting name has been set to 'keystone'
18:01:50 <dolphm> #topic Meeting pings
18:02:01 <dolphm> so i just had a dumb idea to make 10 seconds of my Tuesday easier
18:02:12 <gyee> \o
18:02:22 <dolphm> i'm going to put the above list (just keystone-core at the moment) onto https://wiki.openstack.org/wiki/Meetings/KeystoneMeeting
18:02:37 <dolphm> if you'd like to be pinged prior to our meetings, add your IRC name to the list and i'll just copy/paste it :)
18:02:40 <morganfainberg> dolphm, copy/paste? ;)
18:02:58 <bknudson> 10 seconds will really add up over time.
18:03:00 <topol> o/
18:03:22 <morganfainberg> dolphm, you missed ayoung  in that list
18:03:36 <topol> dolphm can you add topol
18:03:42 <stevemar> hehe
18:03:48 <ayoung> I'm here
18:03:53 <marekd> hello.
18:04:01 <fabiog> hello
18:04:28 <dolphm> list is there now
18:04:37 <jamielennox> hi
18:04:49 <dolphm> #topic icehouse-2 freeze
18:05:12 <dolphm> so, ignoring gate woes for the moment, i bumped revocation-events and kds to i3, although i'd really rather not
18:05:24 * ayoung frantically trying to keep up with bknudson reviews on revocation-events
18:05:32 <ayoung> tis close
18:05:39 <dolphm> we have 3 hours to get things *gating* to call them icehouse-2
18:05:44 <bknudson> frantically trying to keep up with updates to review
18:06:02 <ayoung> bknudson, any stop ships in that latest round?
18:06:19 <bknudson> ayoung: in revocation events?
18:06:23 <ayoung> yeah
18:06:56 <topol> dolphm, anything that is priority to review now?
18:07:01 <henrynash> i'll try and get list limiting fixed up - it was passing - only question is whether the 'next' pointer method is acceptable
18:07:07 <dolphm> i haven't kept up with reviews there at all -- could the patchset be broken down into something that can land today, with trickier bits landing in i3?
18:07:14 <bknudson> ayoung: I didn't actually review it again. I just looked through my previous comments that weren't addressed in the latest patch
18:07:24 <bknudson> ayoung: so I'll have to go through and do an actual review to know
18:07:24 <dolphm> topol: revocation-events, and mapping i'd say
18:07:29 <morganfainberg> henrynash, did we determine 203 or in-json indicator?
18:07:30 <ayoung> bknudson, OK
18:07:40 <dolphm> #link https://review.openstack.org/#/c/55908/
18:07:41 <stevemar> fyi mapping: https://review.openstack.org/#/c/60424/
18:07:46 <dolphm> #link https://review.openstack.org/#/c/60424/
18:07:54 <jamielennox> dolphm: i'd like this one in i2
18:07:56 * stevemar pokes bknudson to review mapping :P
18:07:57 <jamielennox> #link https://review.openstack.org/#/c/67785/
18:08:09 <henrynash> morganfainberg: the 203 looks dodgy….I think we were misinterpreting the spec
18:08:12 <morganfainberg> once the meeting is done, i'll jump on reviewing those links.
18:08:22 <morganfainberg> henrynash, ok fair enough.
18:08:25 <henrynash> morganfainberg: so in-json seems to be the easiest
18:08:32 <bknudson> stevemar: https://review.openstack.org/#/c/60424/ depends on a change that's outdated.
18:08:45 <dolphm> jamielennox: why is that citing a bug and not a blueprint :(
18:08:48 <dolphm> i haven't tracked that at all
18:08:58 <stevemar> bknudson, yeah, idp change just got pushed, so i'm rebasing
18:09:30 <marekd> stevemar: i'd rather wait :P
18:09:35 <jamielennox> dolphm: does it warrant a blueprint?
18:10:07 <dolphm> jamielennox: it warrants milestone tracking of some kind, and it has none
18:12:10 <jamielennox> dolphm: sorry, bug targeted to i2
18:14:01 <dolphm> if anyone is interested, i carried over our hackathon whiteboard, sort of, here: https://gist.github.com/dolph/8522191
18:14:21 <jamielennox> gyee: thanks
18:14:30 <ayoung> bknudson, only you could have 90 combined comments on a review and claim "I haven't reviewed it yet."
18:15:00 <bknudson> ayoung: barely scratched the surface.
18:15:10 <dolphm> ayoung: it's a giant patchset :(
18:16:23 <gyee> love the name Kite!
18:16:27 <topol> I'm not buying anymore beers
18:16:40 <ayoung> topol, oh yes you are
18:16:54 <gyee> topol was buying beer?
18:17:02 <morganfainberg> gyee, ++
18:17:06 <ayoung> Growlers
18:18:44 <dolphm> hmm
18:19:09 <dolphm> so, since transient gate failures are a hot topic
18:19:12 <dolphm> #topic Default to listen on 127.0.0.1 instead of 0.0.0.0
18:19:25 <dolphm> at bknudson's request, i restored https://review.openstack.org/#/c/59528/
18:19:41 <bknudson> let's be part of the solution for gate problems and not part of the problem.
18:19:51 <lbragstad> bknudson: ++
18:19:59 <ayoung> So...does 127 make things better?
18:20:00 <morganfainberg> I really would rather this not be the default.  i'd _rather_ this change go into devstack.
18:20:13 <dolphm> morganfainberg: did we determine that this would fix the gate? or that the actual fix must be in devstack and this would just set the precedent
18:20:20 <dolphm> morganfainberg: downvote / -2 then
18:20:21 <shardy> I still don't understand why things need to be defaulted in the config, and then again in eventlet_server.py
18:20:35 <morganfainberg> dolphm, i wont block it if we really want to go forward with it
18:20:44 <morganfainberg> dolphm, i
18:20:46 <dolphm> morganfainberg: i'm VERY torn on this :(
18:21:12 <jamielennox> dolphm: 0.0.0.0 seems like a better default there
18:21:15 <dolphm> hence i wanted to be the one to propose a solution rather than code review it :P
18:21:16 <ayoung> 0.0.0.0 means it can be called from off system, 127 does not
18:21:22 <morganfainberg> dolphm, if we made devstack able to do 127.0.0.1 in single-node mode (and this is the "right fix"), and we still default to 0.0.0.0
18:21:35 <morganfainberg> it doesn't present an insane default that every single deployer needs to change
18:21:42 <morganfainberg> alternatively... make that default explode keystone
18:21:43 <bknudson> 0.0.0.0 also means that it will prevent an ephemeral port at 35357
18:21:44 <jamielennox> exactly - in almost every deployment call from anywhere is correct, this would mean deployers have to change this
18:21:45 <morganfainberg> no default listen
18:21:56 <topol> so it saves only a very few gate rechecks.  what was the downside again?
18:21:58 <morganfainberg> you must pick a listen, i'm ok with that as well.
18:22:06 <dolphm> ayoung: can you remove your approval on that, pending discussion
18:22:25 <ayoung> done
18:22:36 <jamielennox> morganfainberg: i think you're right - surely this can be set in the devstack gate config
18:23:05 <ayoung> what is the right behavior, devstack not withstanding?
18:23:10 <topol> I thought the impact of this was small enough that fixing it was picking nits
18:23:21 <topol> (on gate rechecks)
18:23:48 <ayoung> If you spin up a Keystone instance, with or without SSL, should you be able to reach it from a remote system by default?
18:23:49 <topol> jamielennox ++
18:23:50 <bknudson> gate rechecks are very painful to the infra team.
18:24:07 <bknudson> they affect all of openstack
18:24:13 <dolphm> topol: it's the highest priority transient error logged against keystone :)
18:24:25 <topol> dolphm and the only one
18:24:29 <bknudson> and are the reason why things are taking 3 days to merge
18:24:52 <dolphm> topol: shh (but GOOD WORK EVERYONE!)
18:25:07 <topol> bknudson, dolphm said this one does not happen that often.
18:25:19 <bknudson> if it happens at all it's too often
18:25:26 <bknudson> because of the number of times the gate tests are run
18:25:29 <topol> OK bknudson, you win!
18:25:43 <topol> lets fix this
18:26:04 <gyee> I am OK with the localhost fix
18:26:18 * topol owes morganfainberg another beer
18:26:27 <dolphm> topol: fwiw, rechecks are fairly low-cost... it's gate failure & gate resets that are incredibly expensive
18:26:42 <ayoung> We should not be using IP addresses anyway.  Should be hostnames....
18:26:46 <topol> dolphm, K
18:26:53 <bknudson> if this should be changed in some test config, then make the change there.
18:27:14 <morganfainberg> bknudson, if devstack will accept the change, i'd much rather get it there.
18:27:17 <bknudson> We've got this change and we can approve it right now to prevent us from being part of the gate problem.
18:27:29 <bknudson> we can revert it later if there's another solution out there.
18:27:57 <dolphm> does any other project default to listening on localhost?
18:28:02 <morganfainberg> dolphm, afaik no
18:28:12 <topol> bknudson is being very pragmatic on this
18:28:13 <dolphm> i'd rather not be a surprise in that regard :-/
18:28:23 <lbragstad> We should update the commit message to state that it should be reverted if a devstack fix goes in
18:28:58 <jamielennox> why is this just our problem and not suffered by the other services?
18:29:17 <jamielennox> is it just that the admin port is in the ephemeral range?
18:29:19 <ayoung> why are we changing localhost to 127 in the doc?  127 should be localhost
18:29:20 <bknudson> jamielennox: do they use ports in the ephemeral range?
18:29:24 <topol> isnt there enough runway before m3 that we would know if setting the new value will cause chaos?
18:29:42 <dolphm> jamielennox: that's doc'd in the bug, but it's that we're using an IANA-assigned port, which falls in linux's ephemeral range (but not in the IANA-defined ephemeral range)
18:31:26 <jamielennox> isn't the solution then just to pick another port for devstack?
18:32:17 <ayoung> how about 443?
18:32:19 <dolphm> so, there's a lot of possible solutions with upsides/downsides
18:32:23 <dolphm> ayoung: that's one.
18:32:29 <gyee> jamielennox, not really, unless you want to change the service catalog
18:32:41 <dolphm> 35357 can be an exception to the ephemeral range in linux
18:33:02 <dolphm> 35357 can be changed to something else in devstack, but that would be very odd and cause documentation / UX issues
18:33:13 <dolphm> the ephemeral range can be reduced, but that's just nasty
18:33:22 <dolphm> especially as a fix for something like this
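For context on the range conflict dolphm describes, a minimal sketch (assuming a Linux host) that reads the kernel's ephemeral port range and checks whether 35357 falls inside it; most distros default to roughly 32768-60999, which includes 35357, while the IANA-recommended range of 49152-65535 does not:

    # Minimal sketch (assumes Linux): is 35357 eligible to be handed out
    # as an ephemeral port on this box?
    KEYSTONE_ADMIN_PORT = 35357

    with open('/proc/sys/net/ipv4/ip_local_port_range') as f:
        low, high = (int(x) for x in f.read().split())

    print('ephemeral range: %d-%d' % (low, high))
    if low <= KEYSTONE_ADMIN_PORT <= high:
        # Any outbound connection made before Keystone starts can grab
        # 35357, and a later bind on 0.0.0.0:35357 will then fail.
        print('35357 can be handed out as an ephemeral port')
    else:
        print('35357 is outside the ephemeral range')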
18:34:01 <ayoung> OK...forgetting everything else, should Keystone listen on 127.0.0.1 (localhost) by default?  It is not a production level default.  What is our stance?  Is Keystone ready for production out the gate, or do you need to customize?  I know we said we needed to customize the values in auth_token middleware in deployment.  Is this comparable?
18:34:12 <jamielennox> gyee: well devstack will provision the service catalog based on the port you give it
18:34:25 <bknudson> ayoung: 0.0.0.0 isn't a production level default either.
18:34:37 <bknudson> they'll need to configure the system for the interfaces they want to listen on.
18:34:40 <dolphm> ayoung: the rest of our keystone.conf is generally geared for minimally production-friendly defaults
18:35:08 <ayoung> bknudson, its the only IP address based default that we can rely on.
18:35:29 <dolphm> bknudson: keyword "minimally" ;)
18:35:29 <jamielennox> it just seems that having devstack use a different port is a way less surprising fix
18:36:15 <dolphm> jamielennox: i think that would be *very* surprising to openstack manuals, all the blog authors out there, curl examples, our own docs, etc
18:36:26 <ayoung> jamielennox, nah, that will mess people up, too, as the AUTH_URL is usually pre-canned with the Keystone port.
18:37:14 <jamielennox> bknudson: I also see no problem with 0.0.0.0 in production if you are running the services on a controller machine
18:37:40 <jamielennox> dolphm: it's set as an environment variable only in the gate - but i do get what you mean
18:37:54 <dolphm> jamielennox: link?
18:39:04 <ayoung> So, if Keystone were to listen on a known public IP address for the machine, would that still have the ephemeral problem?  I'm thinking it would
18:39:10 <ayoung> problem is that we have port 35357
18:39:50 <ayoung> We'd effectively break devstack's multi-node capability.  And the same would be true for anything taking its cue from Devstack
18:39:51 <bknudson> ayoung: apparently the only time it has a problem with ephemeral ports (outbound) is when it's listening on 0.0.0.0
18:39:51 <jamielennox> dolphm: https://github.com/openstack-dev/devstack/blob/master/lib/keystone#L65
18:40:19 <dolphm> jamielennox: what the hell is the next line? 35358?
18:40:34 <jamielennox> lol, i have no idea
18:40:50 * dolphm runs off to git blame
18:40:55 <jamielennox> oh, it's for tls-proxy
18:41:30 <jamielennox> if you enable tls-proxy it runs on 35357 and then redirects to a keystone running on 35358
18:41:35 <dolphm> ah
18:41:40 <dolphm> 5000 and 5001 behave the same way
18:41:51 <ayoung> can't we "claim" an ephemeral port?
18:41:58 <morganfainberg> as a deployer (and I also polled my coworkers) it's dumb to listen on 127.0.0.1 by default.  default should be minimally functional for general usecase
18:42:23 <morganfainberg> they also said that they'd not mind if keystone exploded if you didn't set a bind (e.g. must pick in all cases)
18:42:50 <dolphm> ayoung: ?
18:42:52 <morganfainberg> but changing to 127.0.0.1 would open the door for subtle broken behavior (can't access from another node by default)
18:42:57 <jamielennox> anyway so exporting KEYSTONE_AUTH_PORT=9999 (random value) in gate runs would solve this
18:43:02 <dolphm> morganfainberg: subtle for newbies is no fun
18:43:07 <morganfainberg> dolphm, exactly
18:43:18 <jamielennox> anyone _relying_ on 35357 is wrong anyway
18:43:29 <dolphm> jamielennox: explain?
18:43:31 <morganfainberg> figured i'd ask guys who run keystone every day their opinion
18:43:47 <ayoung> jamielennox, yeah, but we'll still break the gate, which is not what we want to accomplish
18:44:09 <ayoung> morganfainberg, wouldn't that be the public IP of the Keystone server?
18:44:10 <dolphm> the gate fix is really on the devstack side; i saw https://review.openstack.org/#/c/59528/ as just a first step
18:44:13 <gyee> morganfainberg, we fronted Keystone with Apigee, LG, reverse proxies, etc in production :)
18:44:18 <jamielennox> well everything does a first touch of keystone via 5000 to retrieve the auth_url, all we should need to do is set the admin url down into the non-ephemeral range
18:44:18 <gyee> LB
18:44:20 <dolphm> going to abandon https://review.openstack.org/#/c/59528/ unless anyone is really in favor of it
18:44:34 <dolphm> gyee: as you should
18:44:45 <morganfainberg> gyee, right, but that doesn't change 35357 issue
18:44:46 <jamielennox> ayoung: why would it break the gate?
18:44:48 <ayoung> is the problem that 0.0.0.0 somehow blocks the outgoing ports?  Do all outgoing and incoming ports on 0.0.0.0 come from the same pool?
18:44:57 <dolphm> jamielennox: true for newer tools, for sure
18:45:00 <ayoung> jamielennox, cuz someone somewhere is hard coding 35357
18:45:07 <morganfainberg> ayoung, basically... if something else is using 35357 as ephemeral, we don't start
18:45:14 <jamielennox> ayoung: i'm only worried about the gate here
18:45:22 <morganfainberg> ayoung, it happens ~1 time a day in infra
18:45:22 <gyee> morganfainberg, dolphm, we run Keystone in dedicated boxes, I would imagine everyone does in production
18:45:22 <ayoung> jamielennox, so am I
18:45:37 <ayoung> can we retry if 35357 is not available?
18:45:38 <gyee> so this is really a devstack gate fix
18:45:38 <morganfainberg> gyee, no, we use keystone on shared boxes, not hypervisors
18:45:39 <jamielennox> if someone within the gate has hardcoded to 35357 then that's a bug to fix
18:45:45 <dolphm> gyee: dedicated macbook pros all the way
18:45:45 <morganfainberg> gyee, but shared resources
18:46:02 <ayoung> sleep 1; retry ;  sleep 5; retry; sleep 10; retry, give up?
18:46:19 <gyee> ayoung, find the process using that port, kill it :)
18:46:22 <gyee> then retry
18:46:30 <morganfainberg> ayoung, depends on how long-lived the use of the port is, but doing that should reduce the scope somewhat
18:46:36 <ayoung> gyee, I don't want to give Keystone the power to kill other processes
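A minimal sketch of the retry-with-backoff idea ayoung floats above; the helper name and delays are illustrative, not Keystone's actual startup code:

    import socket
    import time

    def bind_with_retry(host, port, delays=(1, 5, 10)):
        """Try to bind; if the port is transiently held (e.g. as the local
        end of someone else's outbound connection), back off and retry."""
        for delay in delays + (None,):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                sock.bind((host, port))
                return sock
            except socket.error:
                sock.close()
                if delay is None:
                    raise  # give up after the last attempt
                time.sleep(delay)

    # e.g. server_sock = bind_with_retry('0.0.0.0', 35357)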
18:47:00 <topol> 1 time a day.  For one time a day cant we wait till devstack fixes it?
18:47:12 <dolphm> topol: let's contribute the fix to devstack!
18:47:14 <morganfainberg> i think the best bet is to make devstack force a listen on 127.0.0.1 in single node
18:47:30 <gyee> ayoung, like stop squatting on my port!
18:47:44 <topol> dolphm, agreed
18:47:48 <ayoung> morganfainberg, nope
18:47:49 <dolphm> gyee: according to linux, it's not your port ;)
18:47:57 <morganfainberg> ayoung, no?
18:47:59 <ayoung> devstack doesn't know it is going to be single node
18:48:00 <dolphm> ayoung: why not?
18:48:11 <morganfainberg> ayoung, i think it does
18:48:12 <ayoung> you run an additional devstack on a second machine, and link it to the first
18:48:46 <morganfainberg> ayoung, really? i thought it had more smarts than that *admittedly, i haven't tried*
18:49:21 <ayoung> morganfainberg, its one of the ways it can be run, and what I would expect most people to do:  set up a minimal install, make sure it runs, then add additional machines
18:49:43 <morganfainberg> this sounds like something that needs to be changed in the devstack-gate config then
18:49:52 <morganfainberg> and explicitly there.
18:50:02 <ayoung> so, if my machine has a "public" ip of 10.10.2.12 can I listen on 10.10.2.12:35357 without conflicting on the ephemeral port?
18:50:22 <morganfainberg> ayoung, if nothing else is using it yes.  it doesn't matter
18:50:29 <gyee> according to bknudson, yes
18:50:35 <dolphm> ayoung: i think you'd be okay there unless something else explicitly was listening on the same interface + port
18:50:46 <morganfainberg> dolphm, i believe that is how it works
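A minimal sketch of the bind semantics behind ayoung's question; here 127.0.0.2 stands in for the box's own public address (his 10.10.2.12), and the "holder" socket stands in for whatever other process happens to be using 35357:

    import socket

    PORT = 35357

    # Simulate another process holding 35357 on one address (for a real
    # ephemeral port this would be the local end of an outbound connection).
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(('127.0.0.1', PORT))

    wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        wildcard.bind(('0.0.0.0', PORT))  # wildcard overlaps every address: fails
    except socket.error as exc:
        print('0.0.0.0:%d failed: %s' % (PORT, exc))

    specific = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    specific.bind(('127.0.0.2', PORT))  # a specific address only collides with
    print('specific bind succeeded')    # something using 35357 on that address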
18:50:50 <ayoung> what is it that we are tripping on that has the ephemeral port open in practice?
18:51:02 <morganfainberg> ayoung, anything.
18:51:06 <ayoung> in practice
18:51:10 <bknudson> ayoung: it's just some random application opens a connection to something
18:51:15 <morganfainberg> ayoung, it's random, not consistent
18:51:16 <ayoung> in devstack runs that fail?
18:51:33 <ayoung> these are dedicated machines,  we should be able to tell
18:51:35 <morganfainberg> ayoung, could be apt, could be git, could be... uhm, http?
18:51:45 <morganfainberg> ayoung, could be any request to an external place
18:52:00 <gyee> I blame pip
18:52:01 <morganfainberg> ayoung, and i think it isn't consistent.
18:52:03 <ayoung> outgoing requests should be separate from incoming, I thought
18:52:22 <morganfainberg> ayoung, bi-directional, ephemeral ports are used for that
18:52:28 <lbragstad> morganfainberg: in which case wouldn't it be up to the administrator to decide how best to solve it?
18:52:53 <morganfainberg> lbragstad, the easiest way is to change the ephemeral port range for the box to match the IANA numbers published
18:53:02 <morganfainberg> lbragstad, the "most correct way" that is
18:53:12 <morganfainberg> lbragstad, linux doesn't adhere to that RFC by default
18:53:18 <bknudson> anyone using the default config will wind up with keystone failing to start every once in a while because of this
18:53:31 <ayoung> echo "49152 65535" | sudo tee /proc/sys/net/ipv4/ip_local_port_range
18:53:46 <morganfainberg> ayoung, doesn't solve anything running before devstack starts
18:53:53 <morganfainberg> ayoung, which is why devstack ditched that
18:54:05 <ayoung> they can do it on the gate machines in rc.d
18:54:09 <ayoung> rc.local
18:54:11 <morganfainberg> that could.
18:54:13 <dolphm> ayoung: you can also just register a single exception (35357)
18:54:14 <morganfainberg> they could
18:54:22 <morganfainberg> but i think they were resistant to that change
18:54:29 <dolphm> ayoung: but, what morganfainberg said
18:54:40 <morganfainberg> requires custom images
18:54:50 <ayoung> "we fear change"
18:55:11 <morganfainberg> ayoung, "we fear changes we have to make every single time we update other things"
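For reference, the "single exception" dolphm mentions maps to the kernel's ip_local_reserved_ports knob; a minimal sketch, assuming Linux and root, and like the port-range tweak it does not persist across reboots (hence the rc.local / custom image discussion above):

    # Reserve 35357 so the kernel never hands it out as an ephemeral port.
    # Equivalent to: sysctl net.ipv4.ip_local_reserved_ports=35357
    RESERVED = '/proc/sys/net/ipv4/ip_local_reserved_ports'

    with open(RESERVED) as f:
        current = f.read().strip()

    if '35357' not in current.split(','):
        with open(RESERVED, 'w') as f:
            f.write(current + ',35357' if current else '35357')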
18:55:11 <ayoung> lets push for Devstack to run Keystone from HTTP using 443
18:55:25 <ayoung> HTTPD
18:55:25 <dolphm> ayoung: it's not so much the change, so much as it becomes a hacky fix. devstack should run on *any* supported image
18:55:33 <gyee> 443 is https
18:55:52 <ayoung> gyee, that is why I didn't say 80
18:55:58 <ayoung> 5 minutes left
18:56:03 <dolphm> ayoung: but you said http
18:56:23 <ayoung> twas a typo I corrected immediately after.
18:56:25 <topol> doesnt devstack have an apache running with swift and possibly using the http ports?
18:56:26 <gyee> lets run devstack on windows :)
18:56:37 <ayoung> topol, that will work anyways
18:56:44 <bknudson> ok, so we're not going to fix the gate problem we're causing?
18:56:47 <ayoung> http://wiki.openstack.org/URLs
18:56:51 <topol> ayoung, no conflict?
18:56:57 <ayoung> nope
18:57:15 <ayoung> So long as the WSGI apps do something like /keystone vs /swift
18:57:35 <jamielennox> topol: it has apache for keystone too but it defaults to running on 35357
18:57:59 <topol> jamielennox, OK
18:58:24 <ayoung> which is why we can't just change to listening on 443, we need to use Apache to manage between the WSGI apps
18:59:25 * topol wonders what sdague thinks is the proper fix.  He'll get to make the final decision anyway
18:59:48 * topol let him decide
18:59:57 <ayoung> I think the retry
19:00:18 <bknudson> retry will mean it fails less often
19:00:45 <morganfainberg> bknudson, short of abandoning the ephemeral port, there aren't good options here.
19:00:46 <ayoung> fewest things changing, and it would be possible to layer another, more draconian fix on it later
19:00:53 <topol> the transient gate fail will not have been officially removed
19:01:03 <jamielennox> that's time guys
19:01:29 <lbragstad> continue this in -dev?
19:01:55 <gyee> ++
19:02:49 <pleia2> dolphm: can you #endmeeting ?
19:03:06 <clarkb> pleia2: you can do it after 60 minutes from start of meeting
19:03:12 <dolphm> #endmeeting