18:01:43 #startmeeting keystone
18:01:44 Meeting started Tue Jan 21 18:01:43 2014 UTC and is due to finish in 60 minutes. The chair is dolphm. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:45 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:47 The meeting name has been set to 'keystone'
18:01:50 #topic Meeting pings
18:02:01 so i just had a dumb idea to make 10 seconds of my Tuesday easier
18:02:12 \o
18:02:22 i'm going to put the above list (just keystone-core at the moment) onto https://wiki.openstack.org/wiki/Meetings/KeystoneMeeting
18:02:37 if you'd like to be pinged prior to our meetings, add your IRC name to the list and i'll just copy/paste it :)
18:02:40 dolphm, copy/paste? ;)
18:02:58 10 seconds will really add up over time.
18:03:00 o/
18:03:22 dolphm, you missed ayoung in that list
18:03:36 dolphm can you add topol
18:03:42 hehe
18:03:48 I'm here
18:03:53 hello.
18:04:01 hello
18:04:28 list is there now
18:04:37 hi
18:04:49 #topic icehouse-2 freeze
18:05:12 so, ignoring gate woes for the moment, i bumped revocation-events and kds to i3, although i'd really rather not
18:05:24 * ayoung frantically trying to keep up with bknudson reviews on revocation-events
18:05:32 tis close
18:05:39 we have 3 hours to get things *gating* to call them icehouse-2
18:05:44 frantically trying to keep up with updates to review
18:06:02 bknudson, any stop-ships in that latest round?
18:06:19 ayoung: in revocation events?
18:06:23 yeah
18:06:56 dolphm, anything that is priority to review now?
18:07:01 i'll try and get list limiting fixed up - it was passing - only question is whether the 'next' pointer method is acceptable
18:07:07 i haven't kept up with reviews there at all -- could the patchset be broken down into something that can land today, with trickier bits landing in i3?
18:07:14 ayoung: I didn't actually review it again. I just looked through my previous comments that weren't addressed in the latest patch
18:07:24 ayoung: so I'll have to go through and do an actual review to know
18:07:24 topol: revocation-events, and mapping i'd say
18:07:29 henrynash, did we determine 203 or in-json indicator?
18:07:30 bknudson, OK
18:07:40 #link https://review.openstack.org/#/c/55908/
18:07:41 fyi mapping: https://review.openstack.org/#/c/60424/
18:07:46 #link https://review.openstack.org/#/c/60424/
18:07:54 dolphm: i'd like this one in i2
18:07:56 * stevemar pokes bknudson to review mapping :P
18:07:57 #link https://review.openstack.org/#/c/67785/
18:08:09 morganfainberg: the 203 looks dodgy... I think we were misinterpreting the spec
18:08:12 once the meeting is done, i'll jump on reviewing those links.
18:08:22 henrynash, ok fair enough.
18:08:25 morganfainberg: so in-json seems to be the easiest
18:08:32 stevemar: https://review.openstack.org/#/c/60424/ depends on a change that's outdated.
18:08:45 jamielennox: why is that citing a bug and not a blueprint :(
18:08:48 i haven't tracked that at all
18:08:58 bknudson, yeah, idp change just got pushed, so i'm rebasing
18:09:30 stevemar: i'd rather wait :P
18:09:35 dolphm: does it warrant a blueprint?
18:10:07 jamielennox: it warrants milestone tracking of some kind, and it has none
18:12:10 dolphm: sorry, bug targeted to i2
18:14:01 if anyone is interested, i carried over our hackathon whiteboard, sort of, here: https://gist.github.com/dolph/8522191
18:14:21 gyee: thanks
18:14:30 bknudson, only you could have 90 combined comments on a review and claim "I haven't reviewed it yet."
18:15:00 ayoung: barely scratched the surface.
18:15:10 ayoung: it's a giant patchset :(
18:16:23 love the name Kite!
18:16:27 I'm not buying any more beers
18:16:40 topol, oh yes you are
18:16:54 topol was buying beer?
18:17:02 gyee, ++
18:17:06 Growlers
18:18:44 hmm
18:19:09 so, since transient gate failures are a hot topic
18:19:12 #topic Default to listen on 127.0.0.1 instead of 0.0.0.0
18:19:25 at bknudson's request, i restored https://review.openstack.org/#/c/59528/
18:19:41 let's be part of the solution for gate problems and not part of the problem.
18:19:51 bknudson: ++
18:19:59 So... does 127 make things better?
18:20:00 I really would rather this not be the default. i'd _rather_ this change go into devstack.
18:20:13 morganfainberg: did we determine that this would fix the gate? or that the actual fix must be in devstack and this would just set the precedent
18:20:20 morganfainberg: downvote / -2 then
18:20:21 I still don't understand why things need to be defaulted in the config, and then again in eventlet_server.py
18:20:35 dolphm, i won't block it if we really want to go forward with it
18:20:44 dolphm, i
18:20:46 morganfainberg: i'm VERY torn on this :(
18:21:12 dolphm: 0.0.0.0 seems like a better default there
18:21:15 hence i wanted to be the one to propose a solution rather than code review it :P
18:21:16 0.0.0.0 means it can be called from off system, 127 does not
18:21:22 dolphm, if we made devstack able to do 127.0.0.1 in single-node mode (and this is the "right fix"), and we still default to 0.0.0.0
18:21:35 it doesn't present an insane default that every single deployer needs to change
18:21:42 alternatively... make that default explode keystone
18:21:43 0.0.0.0 also means that it will prevent an ephemeral port at 35357
18:21:44 exactly - in almost every deployment call from anywhere is correct, this would mean deployers have to change this
18:21:45 no default listen
18:21:56 so it saves a very small number of gate rechecks. what was the downside again?
18:21:58 you must pick a listen, i'm ok with that as well.
18:22:06 ayoung: can you remove your approval on that, pending discussion
18:22:25 done
18:22:36 morganfainberg: i think you're right - surely this can be set in the devstack gate config
18:23:05 what is the right behavior, devstack notwithstanding?
18:23:10 I thought the impact of this was small enough that fixing it was picking nits
18:23:21 (on gate rechecks)
18:23:48 If you spin up a Keystone instance, with or without SSL, should you be able to reach it from a remote system by default?
18:23:49 jamielennox ++
18:23:50 gate rechecks are very painful to the infra team.
18:24:07 they affect all of openstack
18:24:13 topol: it's the highest priority transient error logged against keystone :)
18:24:25 dolphm and the only one
18:24:29 and are the reason why things are taking 3 days to merge
18:24:52 topol: shh (but GOOD WORK EVERYONE!)
18:25:07 bknudson, dolphm said this one does not happen that often.
18:25:19 if it happens at all it's too often
18:25:26 because of the number of times the gate tests are run
18:25:29 OK bknudson, you win!
18:25:43 let's fix this
18:26:04 I am OK with the localhost fix
18:26:18 * topol owes morganfainberg another beer
18:26:27 topol: fwiw, rechecks are fairly low-cost... it's gate failure & gate resets that are incredibly expensive
18:26:42 We should not be using IP addresses anyway. Should be hostnames....
18:26:46 dolphm, K
18:26:53 if this should be changed in some test config, then make the change there.
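(For reference, a minimal sketch of what the change under review would mean for keystone.conf. The option names are recalled from the icehouse-era sample config and are assumptions, not quoted from the meeting:)

    [DEFAULT]
    # current default: listen on every interface
    #bind_host = 0.0.0.0
    # proposed default in https://review.openstack.org/#/c/59528/ : loopback only,
    # so multi-node deployments would have to override this explicitly
    bind_host = 127.0.0.1
    public_port = 5000
    admin_port = 35357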
18:27:14 bknudson, if devstack will accept the change, i'd much rather get it there.
18:27:17 We've got this change and we can approve it right now to prevent us from being part of the gate problem.
18:27:29 we can revert it later if there's another solution out there.
18:27:57 does any other project default to listening on localhost?
18:28:02 dolphm, afaik no
18:28:12 bknudson is being very pragmatic on this
18:28:13 i'd rather not be a surprise in that regard :-/
18:28:23 We should update the commit message to state that it should be reverted if a devstack fix goes in
18:28:58 why is this just our problem and not suffered by the other services?
18:29:17 is it just that the admin port is in the ephemeral range?
18:29:19 why are we changing localhost to 127 in the doc? 127 should be localhost
18:29:20 jamielennox: do they use ports in the ephemeral range?
18:29:24 isn't there enough runway before m3 that we would know if setting the new value will cause chaos?
18:29:42 jamielennox: that's doc'd in the bug, but it's that we're using an IANA-assigned port, which falls in linux's ephemeral range (but not in the IANA-defined ephemeral range)
18:31:26 isn't the solution then just to pick another port for devstack?
18:32:17 how about 443?
18:32:19 so, there are a lot of possible solutions with upsides/downsides
18:32:23 ayoung: that's one.
18:32:29 jamielennox, not really, unless you want to change the service catalog
18:32:41 35357 can be an exception to the ephemeral range in linux
18:33:02 35357 can be changed to something else in devstack, but that would be very odd and cause documentation / UX issues
18:33:13 the ephemeral range can be reduced, but that's just nasty
18:33:22 especially as a fix for something like this
18:34:01 OK... forgetting everything else, should Keystone listen on 127.0.0.1 (localhost) by default? It is not a production level default. What is our stance? Is Keystone ready for production out the gate, or do you need to customize? I know we said we needed to customize the values in auth_token middleware in deployment. Is this comparable?
18:34:12 gyee: well devstack will provision the service catalog based on the port you give it
18:34:25 ayoung: 0.0.0.0 isn't a production level default either.
18:34:37 they'll need to configure the system for the interfaces they want to listen on.
18:34:40 ayoung: the rest of our keystone.conf is generally geared for minimally production-friendly defaults
18:35:08 bknudson, it's the only IP address based default that we can rely on.
18:35:29 bknudson: keyword "minimally" ;)
18:35:29 it just seems that having devstack use a different port is a way less surprising fix
18:36:15 jamielennox: i think that would be *very* surprising to openstack manuals, all the blog authors out there, curl examples, our own docs, etc
18:36:26 jamielennox, nah, that will mess people up, too, as the AUTH_URL is usually pre-canned with the Keystone port.
18:37:14 bknudson: I also see no problem with 0.0.0.0 in production if you are running the services on a controller machine
18:37:40 dolphm: it's set as an environment variable only in the gate - but i do get what you mean
18:37:54 jamielennox: link?
18:39:04 So, if Keystone were to listen on a known public IP address for the machine, would that still have the ephemeral problem? I'm thinking it would
18:39:10 problem is that we have port 35357
18:39:50 We'd effectively break devstack's multi-node capability. And the same would be true for anything taking its cue from Devstack
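(For reference, the ephemeral-range collision dolphm describes above can be checked directly on a Linux box. The values shown are typical kernel defaults of that era, not output quoted from the meeting:)

    # Linux's default local (ephemeral) port range starts well below the
    # IANA-defined 49152, so keystone's admin port sits inside it:
    $ cat /proc/sys/net/ipv4/ip_local_port_range
    32768	61000
    # 35357 falls in 32768-61000, so any outbound connection may grab it first;
    # the public port 5000 is below the range and is never handed out.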
18:39:51 ayoung: apparently the only time it has a problem with ephemeral ports (outbound) is when it's listening on 0.0.0.0
18:39:51 dolphm: https://github.com/openstack-dev/devstack/blob/master/lib/keystone#L65
18:40:19 jamielennox: what the hell is the next line? 35358?
18:40:34 lol, i have no idea
18:40:50 * dolphm runs off to git blame
18:40:55 oh, it's for tls-proxy
18:41:30 if you enable tls-proxy it runs on 35357 and then redirects to a keystone running on 35358
18:41:35 ah
18:41:40 5000 and 5001 behave the same way
18:41:51 can't we "claim" an ephemeral port?
18:41:58 as a deployer (and I also polled my coworkers) it's dumb to listen on 127.0.0.1 by default. default should be minimally functional for the general use case
18:42:23 they also said that they'd not mind if keystone exploded if you didn't set a bind (e.g. must pick in all cases)
18:42:50 ayoung: ?
18:42:52 but changing to 127.0.0.1 would open the door for subtle broken behavior (can't access from another node by default)
18:42:57 anyway so exporting KEYSTONE_AUTH_PORT=9999 (random value) in gate runs would solve this
18:43:02 morganfainberg: subtle for newbies is no fun
18:43:07 dolphm, exactly
18:43:18 anyone _relying_ on 35357 is wrong anyway
18:43:29 jamielennox: explain?
18:43:31 figured i'd ask guys who run keystone every day their opinion
18:43:47 jamielennox, yeah, but we'll still break the gate, which is not what we want to accomplish
18:44:09 morganfainberg, wouldn't that be the public IP of the Keystone server?
18:44:10 the gate fix is really on the devstack side; i saw https://review.openstack.org/#/c/59528/ as just a first step
18:44:13 morganfainberg, we fronted Keystone with Apigee, LG, reverse proxies, etc in production :)
18:44:18 well everything does a first touch of keystone via 5000 to retrieve the auth_url, all we should need to do is set the admin url down into the non-ephemeral range
18:44:18 LB
18:44:20 going to abandon https://review.openstack.org/#/c/59528/ unless anyone is really in favor of it
18:44:34 gyee: as you should
18:44:45 gyee, right, but that doesn't change the 35357 issue
18:44:46 ayoung: why would it break the gate?
18:44:48 is the problem that 0.0.0.0 somehow blocks the outgoing ports? All outgoing and incoming 0.0.0.0 come from the same pool?
18:44:57 jamielennox: true for newer tools, for sure
18:45:00 jamielennox, cuz someone somewhere is hard coding 35357
18:45:07 ayoung, basically... if something else is using 35357 as ephemeral, we don't start
18:45:14 ayoung: i'm only worried about the gate here
18:45:22 ayoung, it happens ~1 time a day in infra
18:45:22 morganfainberg, dolphm, we run Keystone in dedicated boxes, I would imagine everyone does in production
18:45:22 jamielennox, so am I
18:45:37 can we retry if 35357 is not available?
18:45:38 so this is really a devstack gate fix
18:45:38 gyee, no, we use keystone on shared boxes, not hypervisors
18:45:39 if someone within the gate has hardcoded to 35357 then that's a bug to fix
18:45:45 gyee: dedicated macbook pros all the way
18:45:45 gyee, but shared resources
18:46:02 sleep 1; retry ; sleep 5; retry; sleep 10; retry, give up?
18:46:19 ayoung, find the process using that port, kill it :)
18:46:22 then retry
18:46:30 ayoung, depends on how long lived the use of the port is, but it should reduce the scope by some doing that
18:46:36 gyee, I don't want to give Keystone the power to kill other processes
18:47:00 1 time a day. For one time a day can't we wait till devstack fixes it?
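(For reference, a rough shell sketch of the backoff-and-retry idea ayoung floats above. This is a hypothetical wrapper, not anything proposed in a review; keystone-all is the era's eventlet launcher and the delays are the ones he suggested:)

    # hypothetical wrapper: if the bind on 35357 fails because the port happens
    # to be in use as an ephemeral source port, wait and try again
    for delay in 1 5 10; do
        if keystone-all; then
            break                      # started (and later exited) cleanly
        fi
        echo "keystone failed to start; retrying in ${delay}s"
        sleep "${delay}"
    done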
18:47:12 topol: let's contribute the fix to devstack!
18:47:14 i think the best bet is to make devstack force a listen on 127.0.0.1 in single node
18:47:30 ayoung, like stop squatting on my port!
18:47:44 dolphm, agreed
18:47:48 morganfainberg, nope
18:47:49 gyee: according to linux, it's not your port ;)
18:47:57 ayoung, no?
18:47:59 devstack doesn't know it is going to be single node
18:48:00 ayoung: why not?
18:48:11 ayoung, i think it does
18:48:12 you run an additional devstack on a second machine, and link it to the first
18:48:46 ayoung, really? i thought it had more smarts than that *admittedly, i haven't tried*
18:49:21 morganfainberg, it's one of the ways it can be run, and what I would expect most people to do: set up a minimal install, make sure it runs, then add additional machines
18:49:43 this sounds like something that needs to be changed in the devstack-gate config then
18:49:52 and explicitly there.
18:50:02 so, if my machine has a "public" ip of 10.10.2.12 can I listen on 10.10.2.12:35357 without conflicting on the ephemeral port?
18:50:22 ayoung, if nothing else is using it yes. it doesn't matter
18:50:29 according to bknudson, yes
18:50:35 ayoung: i think you'd be okay there unless something else explicitly was listening on the same interface + port
18:50:46 dolphm, i believe that is how it works
18:50:50 what is it that we are tripping on that has the ephemeral port open in practice?
18:51:02 ayoung, anything.
18:51:06 in practice
18:51:10 ayoung: it's just some random application opens a connection to something
18:51:15 ayoung, it's random, not consistent
18:51:16 in devstack runs that fail?
18:51:33 these are dedicated machines, we should be able to tell
18:51:35 ayoung, could be apt, could be git, could be... uhm, http?
18:51:45 ayoung, could be any request to an external place
18:52:00 I blame pip
18:52:01 ayoung, and i think it isn't consistent.
18:52:03 outgoing requests should be separate from incoming, I thought
18:52:22 ayoung, bi-directional, ephemeral ports are used for that
18:52:28 morganfainberg: in which case wouldn't it be up to the administrator to decide how best to solve it?
18:52:53 lbragstad, the easiest way is to change the ephemeral port range for the box to match the IANA numbers published
18:53:02 lbragstad, the "most correct way" that is
18:53:12 lbragstad, linux doesn't adhere to that RFC by default
18:53:18 anyone using the default config will wind up with keystone failing to start every once in a while because of this
18:53:31 sudo echo "49152 65535" > /proc/sys/net/ipv4/ip_local_port_range
18:53:46 ayoung, doesn't solve anything running before devstack starts
18:53:53 ayoung, which is why devstack ditched that
18:54:05 they can do it on the gate machines in rc.d
18:54:09 rc.local
18:54:11 that could.
18:54:13 ayoung: you can also just register a single exception (35357)
18:54:14 they could
18:54:22 but i think they were resistant to that change
18:54:29 ayoung: but, what morganfainberg said
18:54:40 requires custom images
18:54:50 "we fear change"
18:55:11 ayoung, "we fear changes we have to make every single time we update other things"
18:55:11 let's push for Devstack to run Keystone from HTTP using 443
18:55:25 HTTPD
18:55:25 ayoung: it's not so much the change, so much as it becomes a hacky fix. devstack should run on *any* supported image
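(For reference, the two kernel-level knobs discussed above, as they would actually be set. Note that the `sudo echo ... >` form quoted at 18:53:31 doesn't work as written, since the redirect isn't performed as root; the sysctls below are standard Linux tunables, not commands from the meeting:)

    # shrink the ephemeral range to the IANA-defined one, as ayoung suggested:
    sudo sysctl -w net.ipv4.ip_local_port_range="49152 65535"

    # or leave the range alone and reserve keystone's admin port so the kernel
    # never hands it out as an ephemeral source port:
    sudo sysctl -w net.ipv4.ip_local_reserved_ports=35357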
18:55:33 443 is https
18:55:52 gyee, that is why I didn't say 80
18:55:58 5 minutes left
18:56:03 ayoung: but you said http
18:56:23 twas a typo I corrected immediately after.
18:56:25 doesn't devstack have an apache running with swift and possibly using the http ports?
18:56:26 let's run devstack on windows :)
18:56:37 topol, that will work anyways
18:56:44 ok, so we're not going to fix the gate problem we're causing?
18:56:47 http://wiki.openstack.org/URLs
18:56:51 ayoung, no conflict?
18:56:57 nope
18:57:15 So long as the WSGI apps do something like /keystone vs /swift
18:57:35 topol: it has apache for keystone too but it defaults to running on 35357
18:57:59 jamielennox, OK
18:58:24 which is why we can't just change to listening on 443, we need to use Apache to manage between the WSGI apps
18:59:25 * topol wonders what sdague thinks is the proper fix. He'll get to make the final decision anyway
18:59:48 * topol let him decide
18:59:57 I think the retry
19:00:18 retry will mean it fails less often
19:00:45 bknudson, short of abandoning the ephemeral port, there aren't good options here.
19:00:46 fewest things changing, and it would be possible to layer another, more draconian fix on it later
19:00:53 the transient gate fail will not have been officially removed
19:01:03 that's time guys
19:01:29 continue this in -dev?
19:01:55 ++
19:02:49 dolphm: can you #endmeeting ?
19:03:06 pleia2: you can do it after 60 minutes from start of meeting
19:03:12 #endmeeting