Thursday, 2015-02-05

*** SimonChung1 has quit IRC00:01
*** SimonChung has joined #openstack-operators00:07
*** SimonChung1 has joined #openstack-operators00:12
*** SimonChung has quit IRC00:12
*** dmsimard is now known as dmsimard_away00:17
*** blair has joined #openstack-operators00:19
*** Marga_ has quit IRC00:22
*** furlongm_ has joined #openstack-operators00:22
*** Marga_ has joined #openstack-operators00:23
*** furlongm has quit IRC00:24
*** Marga_ has quit IRC00:27
*** furlongm_ has quit IRC00:29
*** furlongm has joined #openstack-operators00:30
*** Marga_ has joined #openstack-operators00:52
*** Marga_ has quit IRC00:52
*** Marga_ has joined #openstack-operators00:53
*** SimonChung1 has quit IRC00:57
*** VW_ has quit IRC01:02
*** SimonChung has joined #openstack-operators01:15
*** alop has quit IRC01:16
*** david-lyle is now known as david-lyle_afk01:18
klindgren: Does anyone here use the metadata feature of keystone?  01:19
klindgren: IE the ability to attach arbitrary key=value stuff to keystone projects and the like?  01:19
*** mdorman has quit IRC  01:20
klindgren: Example use case: marking a project as "production" without having to make sure that the project name is "PROD-<name>"  01:20
klindgren: or "test" or "dev" or something of that nature  01:20
dvorak: pretty sure we don't, but that actually explains something I saw in a designate video I was watching earlier today  01:21
dvorak: jlk: I assume you guys are building from internal repos using giftwrap?  01:22
dvorak: klindgren: we use the description of the project to distinguish between internal projects and our (internal) customers' projects  01:23
dvorak: seems like the metadata would be a lot better way to do that though  01:23
klindgren: Asking because we wanted to use metadata - to, you know, set metadata stuff about a project.  01:24
klindgren: but it's in the process of being removed in kilo with no replacement  01:24
klindgren: Trying to see if anyone wants to or is currently using that feature.  01:25
klindgren: Thinking that we might try to do something like the description as a json blob to support that  01:33
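If the metadata API does go away, klindgren's description-as-JSON-blob idea could be sketched like this (helper names are hypothetical, not a keystone API; the client call that would actually read or write the project description is out of scope):

```python
import json

def pack_description(text, metadata):
    """Combine a human-readable description and a metadata dict
    into a single JSON blob suitable for a project description field."""
    return json.dumps({"text": text, "metadata": metadata})

def unpack_description(description):
    """Recover (text, metadata); tolerate plain, non-JSON descriptions
    by returning them unchanged with empty metadata."""
    try:
        blob = json.loads(description)
    except (ValueError, TypeError):
        return description, {}
    if not isinstance(blob, dict):
        return description, {}
    return blob.get("text", ""), blob.get("metadata", {})
```

The fallback path matters because existing projects would still carry plain-text descriptions.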
*** dmsimard_away is now known as dmsimard01:34
*** SimonChung has quit IRC01:38
*** markvoelker has joined #openstack-operators01:54
*** dmsimard is now known as dmsimard_away02:03
*** signed8bit has joined #openstack-operators02:22
dvorak: well, the description shows up in horizon, so that'd be kind of ugly, but I imagine it'd work ok  02:41
*** harlowja is now known as harlowja_away03:56
*** VW_ has joined #openstack-operators03:56
*** signed8bit has quit IRC04:08
jlk: dvorak: mostly from upstream repos, but we have forked a few  04:21
*** VW_ has quit IRC04:26
*** blairo has joined #openstack-operators04:26
*** VW_ has joined #openstack-operators04:27
*** blair has quit IRC04:29
*** blairo has quit IRC04:31
*** VW_ has quit IRC04:39
*** SimonChung has joined #openstack-operators05:19
*** SimonChung1 has joined #openstack-operators05:21
*** SimonChung has quit IRC05:23
*** Gala-G has joined #openstack-operators05:25
*** blair has joined #openstack-operators05:25
klindgren: dvorak, in our particular case - we don't use horizon at all.  We have our own frontend that we expose to end users.  05:34
jlk: A common story ^  05:36
*** sanjayu has joined #openstack-operators05:50
*** Marga_ has quit IRC05:54
*** markvoelker has quit IRC07:06
*** blair has quit IRC07:09
*** racedo has quit IRC07:13
*** harlowja_away has quit IRC07:23
*** subscope has quit IRC07:29
*** zerda has joined #openstack-operators07:29
*** subscope has joined #openstack-operators07:44
*** belmoreira has joined #openstack-operators08:02
*** blair has joined #openstack-operators08:27
*** zz_avozza is now known as avozza08:28
*** subscope has quit IRC08:29
beddari: ah .. too much is wrong about this -> http://cloudscaling.com/blog/openstack/vanilla-openstack-doesnt-exist-and-never-will/  08:30
*** matrohon has joined #openstack-operators08:38
*** subscope has joined #openstack-operators08:45
*** bvandenh has joined #openstack-operators08:55
*** matrohon has quit IRC08:59
*** derekh has joined #openstack-operators09:24
*** Marga_ has joined #openstack-operators11:14
*** markvoelker has joined #openstack-operators11:40
*** markvoelker has quit IRC11:47
*** Marga_ has quit IRC12:02
*** Marga_ has joined #openstack-operators12:04
*** reed has joined #openstack-operators12:06
*** todin has joined #openstack-operators12:10
*** zerda has quit IRC12:14
*** markvoelker has joined #openstack-operators12:43
*** markvoelker has quit IRC12:47
*** VW_ has joined #openstack-operators12:50
*** Marga_ has quit IRC12:53
*** subscope has quit IRC12:55
*** Marga_ has joined #openstack-operators12:56
*** markvoelker has joined #openstack-operators13:01
*** Marga_ has quit IRC13:08
*** VW_ has quit IRC13:10
*** subscope has joined #openstack-operators13:11
*** matrohon has joined #openstack-operators13:11
*** Ctina has joined #openstack-operators13:23
*** pboros has joined #openstack-operators13:28
*** sanjayu has quit IRC13:29
*** matrohon has quit IRC13:40
*** subscope has quit IRC13:55
*** VW_ has joined #openstack-operators13:57
*** subscope has joined #openstack-operators14:10
*** signed8bit has joined #openstack-operators14:42
*** signed8b_ has joined #openstack-operators14:44
*** signed8bit has quit IRC14:47
*** Gala-G has quit IRC14:58
*** VW__ has joined #openstack-operators15:07
*** VW_ has quit IRC15:09
*** VW__ has quit IRC15:19
*** david-lyle_afk is now known as david-lyle15:23
*** VW_ has joined #openstack-operators15:24
*** Marga_ has joined #openstack-operators15:28
*** VW_ has quit IRC15:32
*** VW_ has joined #openstack-operators15:33
*** jaypipes has joined #openstack-operators15:43
*** VW_ has quit IRC15:46
*** Ctina_ has joined #openstack-operators15:48
*** Ctina has quit IRC15:50
*** VW_ has joined #openstack-operators15:51
*** signed8b_ is now known as signed8bit_ZZZzz15:51
*** signed8bit_ZZZzz is now known as signed8b_15:53
*** VW_ has quit IRC15:55
jlk: beddari: I tend to agree with it.  15:58
*** VW_ has joined #openstack-operators16:01
*** Marga_ has quit IRC16:02
*** Marga_ has joined #openstack-operators16:04
*** avozza is now known as zz_avozza16:07
klindgren: jlk - also agreed  16:40
klindgren: besides, most people I know already have some investment in either networking/san storage that they are happy with  16:41
klindgren: and aren't really looking to replace what they know  16:41
klindgren: they just want to be able to hook up what they already have/know and put an API/UI in front of it  16:41
klindgren: The one thing that openstack is sorely missing is any sort of HA around the vms that it spins up and trying its best to keep a vm from error state  16:43
klindgren: it seems like any transient error immediately throws a vm into "ERROR"  16:43
*** mdorman has joined #openstack-operators16:48
jlk: That's Cloud  16:50
jlk: autoscale can help there, by keeping a minimum # of roles running  16:51
*** alop has joined #openstack-operators  16:51
jlk: so if one goes down, the scaler brings up a replacement.  16:51
jlk: I'd rather see more effort around HA for control services.  16:52
jlk: Too many single points of failure  16:52
klindgren: >_>  17:05
klindgren: *cough* rabbitmq *cough*  17:06
klindgren: <_<  17:06
mgagne: jaypipes: As an operator (or casual contributor), how can I make sure my change/blueprint gets reviewed before the feature freeze? Example: https://review.openstack.org/#/c/115409/ Should I poke cores directly until one takes care of it? I wish to avoid this bad experience in the future.  17:17
jaypipes: mgagne: one moment... on a call.  17:18
mgagne: jaypipes: sure, np  17:18
*** VW__ has joined #openstack-operators17:19
*** VW_ has quit IRC17:21
*** signed8b_ has quit IRC17:23
*** VW__ has quit IRC17:35
jaypipes: mgagne: alrighty... so lemme take a looksie at the above patch.  17:36
jaypipes: mgagne: generally if it's a smallish patch that is well-defined and contains unit tests, it shouldn't be a problem to ask a couple cores on IRC for a review.  17:37
jaypipes: mgagne: in this case, it looks like you got some feedback from a number of the nova drivers team (sdague, jogo, johnthetubaguy, and mriedem) on the blueprint back in November and December  17:38
mgagne: jaypipes: blueprint which was created months after the initial patch, after someone commented: doesn't it need a blueprint?  17:39
jaypipes: mgagne: one sec, still reading back through the comments :)  17:39
mgagne: jaypipes: IIRC, I poked a core about that one, which then required a blueprint  17:39
*** belmoreira has quit IRC17:41
jaypipes: mgagne: and, I agree with you that it's not fair that this was sitting for a long time with an approved BP and has now been blocked. it's just that it's not on the priority list of reviews, I'm afraid, and there's only a certain number of core reviewers :( That said, you may certainly apply for a feature freeze exception for this. You'll need 2 cores to sponsor the patch. Gimme a little while to review it and I will let you know if I can sponsor it, ok?  17:42
*** Marga_ has quit IRC17:43
jlk: klindgren: ugh, don't remind me. So many problems with rabbit.  17:43
mgagne: jaypipes: I'm fine going down the exception route. I however wish to learn the way to avoid this situation again, because I (honestly) think I did all I had to do to make it work.  17:43
*** VW_ has joined #openstack-operators17:44
*** VW_ has quit IRC17:44
jaypipes: mgagne: the best way to avoid the situation is to pester cores early and often, I'm afraid.  17:44
*** VW_ has joined #openstack-operators  17:44
mgagne: jaypipes: and I feel befriending cores looks to be the only way to fast-forward stuff  17:44
mgagne: jaypipes: alright then  17:44
jaypipes: It's not about befriending :) it's about being a salesperson for your BP/patch  17:44
jaypipes: and just being persistent.  17:45
jaypipes: remember, at any given time, there are more than 600 patches in the review queue...  17:45
jaypipes: so it's easy for cores to lose track of a patch. so it behooves you to gently remind them ;)  17:45
mgagne: jaypipes: I understand/have lived this unfortunate reality ^^'  17:45
jaypipes: :)  17:45
mgagne: jaypipes: thanks for your help  17:46
jaypipes: any time!  17:46
mfisch: klindgren: you here?  17:48
klindgren: mfisch, I am  17:48
mfisch: klindgren: wanted to talk more about rabbit and how you guys use it  17:48
mfisch: we're currently using the "list all nodes in the config file" method  17:48
klindgren: Sure - though mdorman did a lot of the config on that  17:49
klindgren: we are currently on the clustered rabbitmq + LB  17:49
klindgren: and it *SUCKS*  17:49
mfisch: talking to some guys who worked for Pivotal on rabbit, they recommend LB, but they have NFC about openstack  17:49
mfisch: I was looking into switching to haproxy but everyone hates it  17:49
klindgren: yea, suggestion to not  17:49
mfisch: so I was toying with haproxy + primary/backup  17:49
klindgren: we have haproxy in dev/test  17:49
klindgren: and we are going to pull it  17:50
klindgren: and go back to all servers listed  17:50
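The "all servers listed" configuration klindgren and dvorak are comparing notes on looks roughly like this in a juno-era nova.conf (option names from oslo.messaging of that vintage; verify against the release you actually run):

```ini
[DEFAULT]
# every rabbit node listed directly; clients walk the list on failure
rabbit_hosts = rabbit1:5672,rabbit2:5672,rabbit3:5672
# mirror queues across the cluster (only meaningful with clustered rabbit)
rabbit_ha_queues = true
# seconds between reconnect attempts, and the backoff added per retry
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
```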
mfisch: the biggest issue I have, haproxy or not, is how to restart rabbitmq and not break all services  17:50
klindgren: issue comes with haproxy long connection timeout stuff  17:50
mfisch: openstack services go full retard when rabbit goes away  17:50
klindgren: yea  17:50
*** Marga_ has joined #openstack-operators17:50
mfisch: that's an issue haproxy or no haproxy  17:50
klindgren: full retard + sometimes no error messages about not being correctly connected to rabbitmq  17:50
mfisch: yes  17:51
mfisch: that's an openstack issue  17:51
_nick: i've found rabbitmq to be a lot flakier with haproxy added into the mix  17:51
mfisch: IMHO  17:51
klindgren: yea  17:51
klindgren: waiting on oslo.messaging  17:51
_nick: but yeah, openstack goes batshit mental regardless if rabbitmq goes awol  17:51
mfisch: so the basic idea I had before I thought about it more was that if I had haproxy I could bleed connections off before restarting a node  17:51
mfisch: however the openstack connections last like forever  17:51
mfisch: so that's not going to work  17:51
dvorak: it does if you restart all the services :)  17:52
mgagne: mfisch: no actual exp. with haproxy+rabbitmq, but operators suggest to not do it. Rabbit knows better about the state of the cluster and queue replication than haproxy alone.  17:52
mfisch: yeah that was your idea  17:52
klindgren: 1.5.2(?) to get committed  17:52
mfisch: the fundamental issue here imho is openstack's reaction to rabbit going away  17:52
dvorak: if you list everything in the config file, how do you take down a node then?  17:52
jlk: ooh rabbit talk  17:52
jlk: yes  17:52
dvorak: we do that now, and restarting rabbit is miserable  17:52
mfisch: rabbit is on my time machine list  17:53
jlk: We tried rabbit as a list in configs  17:53
jlk: but when we failed over to another rabbit server, the services just sat there looking dumb  17:53
klindgren: yea  17:53
mfisch: I don't think haproxy solves that issue  17:53
_nick: jlk: exactly what we've experienced as well  17:53
jlk: we've gone back to rabbit running on two systems  17:53
klindgren: it's supposed to have a heartbeat (coming soon)  17:53
jlk: but not as a cluster  17:53
klindgren: that should figure out it's dead pretty fast and move to another server  17:53
mgagne: dvorak: we are able to take down any node without problem. However if you restart the 3 of them one after the other, AFAIK, you could have problems  17:53
jlk: when failover happens, queues get re-created on the waiting rabbit server  17:53
jlk: and services seem to notice pretty quickly  17:53
dvorak: mgagne: well, and that's usually what we want to do :)  17:53
jlk: there is a tiny chance of lost messages  17:53
mfisch: it seems that no matter what you do there's a chance of lost messages with rabbit  17:54
mgagne: dvorak: then you take out your ansible toolbox and restart the openstack world ^^'  17:54
dvorak: mgagne: that's kind of what I've been suggesting we do anyway :)  17:54
mfisch: mgagne: 3 control nodes and 3 rabbit servers = 27 restarts to restart all of rabbit though  17:54
dvorak: I don't want to have to schedule an API outage every time we need to reconfigure or upgrade rabbit  17:55
jlk: so far, we've had good success with just a single rabbit server running at a time  17:55
jlk: with services pointing at a floating IP for it  17:55
mfisch: jlk: what's your 2nd server exactly in that environment?  17:55
mgagne: mfisch: yep, I think we are all in the same boat regarding flaky support for heartbeat/failover  17:55
jlk: at least in our limited juno testing  17:55
*** Marga_ has quit IRC17:55
jlk: mfisch: it's just a rabbit server, not configured for HA or clustering or anything  17:55
mfisch: we're almost to juno, I'm sure all problems will be solved  17:56
jlk: just sitting there unused  17:56
mfisch: jlk: so how do you fail over exactly?  17:56
jlk: ucarp floating IP  17:56
klindgren: basically, as it sits right now, our process if doing anything with rabbitmq is:  17:56
dvorak: klindgren: is there a specific patch in review for heartbeats?  17:56
klindgren: make sure rabbitmq cluster is not fubar  17:56
jlk: when the IP moves, services timeout on their old connection and reconnect, which re-creates all the necessary queues  17:56
klindgren: restart world  17:56
klindgren: and life continues  17:56
mgagne: that never-ending hunt for stability and bug fixes. You will find new bugs in juno and wish you could upgrade to kilo asap  17:56
klindgren: dvorak, yes  17:56
mfisch: mgagne: it pays the bills at least  17:56
klindgren: https://review.openstack.org/#/c/146047/  17:57
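For reference, the heartbeat support in the review klindgren links eventually surfaced as a pair of oslo.messaging options; a hedged sketch (option names as merged upstream, not available in the 1.4.x versions the channel is running; check your release):

```ini
[oslo_messaging_rabbit]
# consider the connection dead after this many seconds without a heartbeat
heartbeat_timeout_threshold = 60
# how many heartbeats are attempted per timeout interval
heartbeat_rate = 2
```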
mgagne: mfisch: can't argue with that haha  17:57
klindgren: supposedly other people were saying oslo.messaging 1.5.1 had some fixes in it to make rabbit stuff better  17:57
klindgren: and that guy should make things at least recover from full retard  17:57
jlk: when we tried doing rabbit clustering with juno, it was not a good story  17:58
jlk: when one of our rabbit nodes goes down, the services never seem to jump to the other one  17:58
jlk: so we'd have to restart any service that uses rabbit  17:58
dvorak: we're running 1.4.1 :(  17:58
mgagne: but then you have your distro that is stuck on an archaic version of oslo.messaging and you wish you had venvs instead.  17:58
jlk: and wait for half of their worker threads to timeout to the downed rabbit server  17:58
klindgren: yea, from what someone else was saying, the switch from whatever messaging to oslo.messaging in icehouse  17:58
dvorak: mfisch: we need to ditch all these UCA packages.  17:58
mgagne: dvorak: hahaha  17:58
klindgren: was a huge setback in terms of stability  17:58
*** Ctina_ has quit IRC17:59
jlk: Mirantis seems to have found a bunch of stuff wrong with oslo rabbit  17:59
klindgren: dvorak, we are running 1.4.1 as well  17:59
jlk: and a bunch of PRs are up  17:59
dvorak: yeah, 1.4.1 is the latest available from canonical  17:59
mgagne: but unfortunately you developed stockholm syndrome with UCA  17:59
jlk: I'm hoping they get merged in Kilo, and we'll try clustered rabbit then again  17:59
mfisch: I don't feel like I've got a good sense that I should bother to change anything right now  18:00
mfisch: more like stay where we're at and wait  18:00
dvorak: mgagne: I really don't want to be using UCA, but I don't want to package all this crap myself either.  Looking into giftwrap currently.  18:00
mgagne: dvorak: the idea is floating here too. let me know how it goes.  18:00
mfisch: hey dvorak, if we don't want to use UCA we could go work for godaddy, problem solved!  18:00
klindgren: personally - we are just going to roll a new version of oslo.messaging under juno  18:01
klindgren: and see if things aren't instantly better  18:01
dvorak: I'd argue that RPM packaging is significantly less stupid than debian packaging.  18:01
klindgren: I can't take rabbitmq issues anymore  18:01
klindgren: "reliable" messaging my ass  18:01
dvorak: klindgren: well, I think at least half the problem is the applications  18:01
mfisch: I'd say 90%  18:01
mgagne: ^  18:01
dvorak: if you don't at least do basic things like turning on heartbeats...  18:01
mfisch: I think that rabbit would say "use rabbit like this" and openstack said "whatever"  18:02
dvorak: yes, that's what I suspect.  18:02
*** Piet has joined #openstack-operators18:02
mfisch: if rabbit was this bad why would anyone use it? plenty of apps run without issues  18:02
*** Gala-G has joined #openstack-operators18:02
mgagne: I wonder if any devs are actually running openstack in a production env.  18:03
* mgagne takes cover  18:03
mfisch: we all know the answer to that one  18:03
mfisch: a fresh-built devstack every morning is prod to most people  18:03
*** derekh has quit IRC18:03
jlk: mfisch: there are few alternatives to rabbit  18:03
jlk: in the amqp space  18:03
klindgren: pretty sure if you upgrade oslo.messaging and everything uses oslo.messaging and you add the heartbeats and stuff to the configs - I think it should get better, no?  Not exactly sure how much of the rabbitmq implementation is in oslo.messaging  18:03
dvorak: I know RDO used to use qpid, but I don't know if they still are.  18:04
klindgren: qpid and zeromq?  18:04
dvorak: klindgren: I'd expect that all of it is in oslo.messaging  18:04
jlk: zeromq is a worse story actually  18:04
jlk: not enough testing  18:04
mgagne: "Have you tried to ./unstack.sh and ./stack.sh it back up?"  18:04
klindgren: I dunno, anyone running qpid?  18:04
mfisch: are rabbitmq complaints on the mid-cycle meetup list?  18:04
mfisch: mgagne: lol  18:04
jlk: qpid is a huge pile of java, which we'd rather avoid  18:04
klindgren: having run activemq before  18:04
klindgren: not only no - but hell no on that one  18:05
klindgren: activemq made another application switch to "unreliable messaging"  18:05
klindgren: which imho is a great way to do it  18:05
klindgren: if something hasn't picked up your request in x amount of time - send it again  18:05
klindgren: I would be fine with a session on messaging at the mid-cycle  18:06
jlk: yeah, a mutual gripe session  18:06
mfisch: we can make cars drive themselves but not reliable IPC in 2015...  18:06
klindgren: I would be ok talking about what we have and how it's working/pain points  18:06
mfisch: I hope self-driving cars aren't using rabbit now that I think about it  18:07
jlk: same.  18:07
dvorak: mfisch and I spent most of two days with some pivotal guys and picked up a lot.  I might even remember some of it  18:07
klindgren: honestly though - I don't know anyone who is happy with their rabbitmq/messaging setup  18:07
mfisch: I'll add it if it's not here  18:07
mfisch: https://etherpad.openstack.org/p/PHL-ops-meetup  18:07
klindgren: at least under openstack  18:07
dvorak: klindgren: I've certainly talked to people that are really happy using rabbit for things that aren't openstack  18:08
mfisch: added to the list, please +1  18:08
dvorak: but most of those people have developers on staff that wrote the apps that talk to rabbit and know what they're doing  18:08
mfisch: dvorak: solution, we should go work on those apps  18:08
dvorak: I've actually done development against rabbitmq, but it was a long time ago  18:09
*** VW__ has joined #openstack-operators18:09
klindgren: dvorak, we use rabbitmq in our logstash setup  18:09
klindgren: and honestly - it just works  18:09
klindgren: outside of standard rabbitmq clustering upgrade issues  18:10
klindgren: eg: new version of erlang?  18:10
*** VW_ has quit IRC18:10
dvorak: yeah, that's the thing that's biting us right now  18:11
mfisch: new version of erlang is going to cause restarts  18:11
dvorak: there is a new version of erlang available from ubuntu and we're afraid to upgrade  18:11
dvorak: it's really sad when rabbitmq makes galera look like an easy-to-manage application  18:12
*** VW__ has quit IRC18:12
jlk: the problem is that most development focuses on things working under optimal conditions  18:13
mfisch: I'd marry galera after dating rabbit  18:13
jlk: not focusing on things working when rabbit fails underneath them  18:13
jlk: or fails over to another rabbit server.  18:13
*** VW_ has joined #openstack-operators  18:13
mfisch: I don't think devs ever see the rabbit issues we have, at least not often  18:13
mfisch: you can't simulate them well on a laptop  18:13
jlk: you can if you launch two instances for rabbit  18:14
jlk: and kill one of them :)  18:14
jlk: need multi-node devstack  18:14
mfisch: I wonder how often launching 2 rabbits and killing one is tested  18:14
jlk: Rackspace just took a different approach. They only have one rabbit server and a hot spare. If one dies, well, it's restart-the-world time.  18:15
jlk: mfisch: approaching 0  18:15
*** Gala-G has quit IRC18:15
mfisch: that's the way I was headed with a primary/backup behind haproxy  18:15
mfisch: until I heard the issues here  18:15
jlk: nod  18:15
jlk: I like our current approach  18:15
jlk: "like"  18:15
jlk: as in prefer it over the alternatives. But I don't really "like" it  18:16
klindgren: jlk - honestly - looking to move to something like that  18:16
*** Marga_ has joined #openstack-operators  18:16
jlk: my main motivation is to not have to have somebody restart a bunch of things if there is a network failover  18:16
mfisch: klindgren: have you tried packaging up the newer oslo.messaging?  18:16
*** signed8bit has joined #openstack-operators18:18
mgagne: jlk: and it's done by having an active/standby setup?  18:19
*** signed8bit is now known as signed8bit_ZZZzz  18:19
jlk: mgagne: kinda, more like solo active and solo active. Neither rabbit server knows about the other one  18:19
mgagne: jlk: any other issue related to firewall session timeout or such?  18:19
jlk: no, our firewall just allows the incoming rabbit port. It's a local firewall on the host, not a device.  18:20
jlk: our clouds are very small so everything is fairly localized  18:20
mgagne: right  18:20
mfisch: so if rabbit gets restarted on a box, what happens, jlk?  18:20
mfisch: or the node reboots  18:20
jlk: if the node reboots, ucarp moves the floating IP over. All hosts are pointed at the floating IP  18:21
jlk: on the non-active rabbit it'll suddenly start getting connections, and queues get re-created by all the producers/consumers  18:21
mfisch: so messages already in queues are lost?  18:21
jlk: anything unconsumed, yes  18:21
mfisch: unconsumed messages  18:21
mfisch: ok  18:21
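jlk's single-active setup can be approximated with ucarp roughly like this (the IPs, vhid, password, and script paths are all made up for illustration; vip-up/vip-down would add or remove the floating address on the interface):

```shell
# run on both rabbit hosts with the same vhid/password;
# the elected master holds 10.0.0.10, the other host waits
ucarp --interface=eth0 --srcip=10.0.0.11 --vhid=42 --pass=s3cret \
      --addr=10.0.0.10 \
      --upscript=/etc/ucarp/vip-up.sh --downscript=/etc/ucarp/vip-down.sh
```

As described in the log, failover then relies on clients timing out their old connections and re-declaring queues on the newly active host.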
klindgren: eh jlk - from xp - pretty sure that's not going to work out very well  18:22
jlk: it's up to the producers to know that their message wasn't consumed  18:22
jlk: either they don't care, or they do and handle it  18:22
klindgren: we have the same thing happen with haproxy where it hits an idle timeout and a connection gets moved from one server to another  18:22
mfisch: I agree in theory, not sure that's how openstack works  18:22
*** signed8bit_ZZZzz is now known as signed8bit  18:22
klindgren: and the client doesn't really realize that it's been disconnected  18:22
jlk: at least in juno, some of the clients do  18:23
klindgren: and we're running clustered rabbitmq there, so it should have access to the same queues on the new stuff as the old stuff  18:23
jlk: like nova-conductor and nova-compute  18:23
jlk: klindgren: we aren't running clustered, that may be the key  18:23
klindgren: kk  18:23
jlk: nor do we have HA queues turned on in the openstack configs  18:23
*** radez_g0n3 is now known as radez18:23
mfisch: we have ha queues enabled  18:24
jlk: IIRC you're only supposed to use that with clustered rabbit  18:24
mfisch: do any of you guys have TTL set on messages to help clean up the random queues with no consumers?  18:24
klindgren: we don't afaik  18:24
mdorman: hey guys, just getting back here.  18:24
jlk: no, we haven't had any problems with unconsumed messages  18:24
jlk: we have a sensu check for unconsumed messages even, and it never fires unless something really weird happens.  18:24
mfisch: we occasionally will have queues that have say 900 messages with no producers and no consumers  18:25
klindgren: mdorman knows more about our rabbitmq setup  18:25
jlk: (like a network failover without a server actually restarting, a couple queues could get left stale on the now no-longer active rabbit)  18:25
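One possible answer to mfisch's TTL question is a rabbitmq policy rather than per-message TTLs set by the applications; a sketch (the policy names and the 30-minute value are arbitrary, and only one policy applies to a given queue):

```shell
# auto-delete queues that have gone 30 minutes without any consumer
rabbitmqctl set_policy expiry ".*" '{"expires": 1800000}' --apply-to queues

# or, alternatively, expire individual messages instead of whole queues
rabbitmqctl set_policy msg-ttl ".*" '{"message-ttl": 1800000}' --apply-to queues
```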
mfisch: mdorman: drive on up to FTC and I'll buy you a lemonade  18:25
klindgren: mfisch, I haven't tried packaging a newer oslo.messaging - however when that heartbeat stuff makes it in - it's on the list to do  18:25
klindgren: I have a story every sprint that I am moving, so I keep track of it  18:26
mdorman: actually i'm in arizona right now, so maybe in a couple weeks :)  18:26
mfisch: klindgren: same here  18:26
mdorman: anyways, got another meeting in 30 min so i gotta go get some lunch  18:26
mfisch: jlk: what's your procedure for a "nice" failover in your environment?  18:27
mfisch: jlk: like a scheduled maintenance  18:27
mdorman: mfisch: i'll tell you about RMQ if you tell me about federated keystone :)  18:27
mfisch: ours is more like in-quotes "federated"  18:28
mdorman: haha  18:28
jlk: mfisch: we would initiate a floating IP failover  18:28
jlk: well, we'd do maint on the non-active one first, then do a failover and do maint on the other one  18:28
dvorak: other than the galera crashes, it's kind of like federated keystone on "easy" setting  18:28
mfisch: jlk: do you restart any services? shutdown rabbit etc?  18:28
jlk: depends on what we're touching  18:28
jlk: typically rabbit itself doesn't need to get messed with  18:28
mfisch: let's assume rabbit has to restart so connections are lost  18:28
jlk: only openstack services  18:28
jlk: if you restart rabbit itself, but don't move the floating IP, I think clients notice  18:29
mgagne: jlk: have you changed workplace recently? (looking at past openstack summit presentation)  18:29
jlk: but I haven't tested that in a bit  18:29
jlk: mgagne: yes, I was at Rackspace previously, I'm at Blue Box now  18:29
jlk: in experience, we haven't had to do anything specific to rabbit in like... ever  18:29
mfisch: jlk: I'm trying to figure out how your setup would differ from ours in that respect. If I assume connection to rabbit is lost, I know we have to sometimes restart openstack services  18:29
jlk: outside of needing to reboot the system rabbit runs on  18:30
*** VW_ has quit IRC18:30
mgagne: jlk: this explains the discrepancy between your workplace and profile text on sched ;)  18:30
jlk: heh  18:30
jlk: mfisch: yeah there are some scenarios where a restart may be necessary  18:30
*** harlowja has joined #openstack-operators18:30
mfisch: apt-get install --upgrade erlang is on our radar  18:31
jlk: mfisch: at Rackspace we've run into that, when the rabbit server itself hiccuped or the network between the two hiccuped  18:31
jlk: I designed an ansible playbook there that would restart any "bus" service  18:31
mgagne: jlk: sounds like we use the same workaround ^^'  18:31
jlk: so things like nova-scheduler, nova-conductor, nova-compute. But not nova-api  18:31
jlk: wasn't pretty  18:32
jlk: I was hoping clustered rabbit would resolve that  18:32
jlk: alas....  18:32
klindgren: nova-cells?  18:32
jlk: oh yeah, nova-cells  18:32
jlk: I forget about that, we don't use cells at BB  18:32
mgagne: jlk: any issue related to firewall? due to missing rabbit heartbeat, sessions often get dropped if we don't bump tcp keepalive in kernel =(  18:32
klindgren: I thought that nova-api would stuff messages on the queue?  18:32
jlk: the nice thing about all those services is that customers don't directly hit them, so they can be restarted at will  18:33
klindgren: eh  18:33
klindgren: not really  18:33
klindgren: if nova-compute is downloading a backing image  18:33
jlk: mgagne: at Rackspace there was a firewall rule that allowed long-lived connections to rabbit  18:33
klindgren: and you reboot it - boom, dead vm  18:33
jlk: klindgren: graceful restart  18:33
jlk: that's actually a thing in nova  18:33
klindgren: shit  18:34
klindgren: need to figure out how to do that with systemd  18:34
mgagne: jlk: haha, our netadmin would go berserk if we asked him to do the same ^^'  18:34
klindgren: I got 99 problems and systemd is one  18:34
jlk: klindgren: nova accepts a TERM signal  18:34
jlk: when it gets that, it'll go into cleanup mode. Do all its running stuff, don't take any new on  18:34
jlk: it'll run indefinitely in that mode until either all running stuff completes, or it gets a kill signal  18:35
jlk: on debian platforms, start-stop-daemon was nice for that, since you could feed it a set of signals and a timer between them  18:35
jlk: so we baked that into our init script  18:36
jlk: graceful shutdowns + nova-conductor: kinda awesome for doing low-impact sweeping restarts  18:37
klindgren: ah - jlk, do you have anything for systemd? :-)  18:38
jlk: no. I was unsuccessful in convincing RAX to move to anything newer than debian squeeze when I was there.  18:39
jlk: and BB is still on Ubuntu Precise  18:39
jlk: I'm quietly making a play to move to CentOS7, but I haven't spent much time on that  18:39
jlk: first up was getting from havana to juno  18:39
klindgren: moved to cent7 and got chaos monkey for free (systemd)  18:39
jlk: I'm not as angered by systemd, but I was a RHT employee for many years and was part of the transition  18:40
*** mdorman is now known as mdorman_away  18:40
jlk: I'd take systemd over upstart, but mostly I just want consistency.  18:41
jlk: with precise, some things are upstart, some things are traditional init.d stuff, and it's difficult to manage  18:42
klindgren: eh - I am not as angry about systemd.  But immediately ran into problems  18:42
klindgren: like systemd nuking cgroups created by libvirt  18:42
jlk: that would be a bummer  18:42
klindgren: logstash not starting on bootup randomly  18:42
klindgren: systemctl saying that everything is happy and fine when the service really == dead  18:42
klindgren: jlk, yea - it's been fixed now  18:43
klindgren: but that was a fun one :-)  18:43
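A sketch of an answer to klindgren's "how to do that with systemd" question: a unit drop-in that mimics the start-stop-daemon signal schedule jlk describes - send TERM, give nova time to drain, then escalate to KILL (the path and the 5-minute grace period are illustrative):

```ini
# /etc/systemd/system/nova-compute.service.d/graceful.conf (hypothetical path)
[Service]
# TERM puts nova into cleanup mode: finish running work, accept nothing new
KillSignal=SIGTERM
# wait up to 5 minutes for in-flight work before escalating
TimeoutStopSec=300
# escalate to SIGKILL if the service still hasn't exited by then
SendSIGKILL=yes
```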
*** bvandenh has quit IRC18:45
klindgren: Is anyone from mirantis going to be at the ops-midcycle?  18:49
*** signed8bit has quit IRC  18:50
klindgren: Since we see a bunch of patches re: oslo.messaging, and pretty much everyone who talked here had issues with openstack + rabbitmq, wondering if they have some secret sauce figured out for making it less of a pain  18:51
*** signed8bit has joined #openstack-operators18:51
*** zz_avozza is now known as avozza18:52
mgagne: klindgren: +1 on that one  18:53
*** signed8bit is now known as signed8bit_ZZZzz19:01
*** avozza is now known as zz_avozza19:03
*** signed8bit_ZZZzz is now known as signed8bit19:04
*** bradm has quit IRC19:12
*** VW_ has joined #openstack-operators19:14
*** signed8b_ has joined #openstack-operators19:27
*** signed8bit has quit IRC19:31
jaypipesmgagne: btw, I haven't forgotten about you. still going through a review.19:40
mgagnejaypipes: sure, I know you are busy, we are all busy =)19:40
mgagneguys, how about we document somewhere the common pitfall with rabbit and openstack? If there are known solutions (or absence of solution), lets write them down too.19:46
jlkit should go in the HA guide19:46
jlkwhich right now sets people up for failure (doing a rabbit cluster with HA queues)19:46
mgagneI don't consult the guide much; lots of good ideas, but I often think I'm better or different and implement a different solution. But the pitfalls stay the same (and undocumented).19:48
jaypipesmgagne: ++ on your suggestion about rabbit+openstack issues.19:49
klindgrenyea - there was an ops mailing list thing that went around a few weeks ago - that's what turned me on to the heartbeat stuff19:52
mgagneSame with MySQL/MariaDB/Galera I would say. I'm still trying to understand the (not so clear) pitfalls of Galera (reading the topic on openstack-dev and jaypipes' blog post). I'm still on the fence about using Galera instead of a traditional DRBD/heartbeat setup19:52
klindgrenI am responding to that to see if someone from mirantis who has been doing work on their HA stuff for fuel was going to be at the meetup - would like them to talk about their setup and the fixes they committed19:53
jaypipesmgagne: we ran 12 availability zones at AT&T in a multi-writer Galera cluster across the WAN and never had issues with it. For the internal AZ clusters (for Nova, Neutron, Cinder dbs), we used a 4-node Galera multi-writer cluster, load balanced equally across all nodes, and again never had issues.19:53
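The load-balanced multi-writer setup jaypipes describes is commonly fronted by haproxy. A sketch of such a frontend, with hostnames and addresses entirely hypothetical; the exact health-check strategy varies by deployment:

```
# haproxy frontend for a 3-node Galera cluster (addresses hypothetical).
# 'option mysql-check' does a lightweight MySQL-protocol health check;
# many deployments instead use an httpchk against a clustercheck script
# so that donor/desynced nodes are pulled out of rotation, not just dead ones.
listen galera
    bind 10.0.0.10:3306
    mode tcp
    balance leastconn
    option mysql-check user haproxy
    server db1 10.0.0.11:3306 check
    server db2 10.0.0.12:3306 check
    server db3 10.0.0.13:3306 check
```

Equal balancing across all writers matches what jaypipes ran; sites worried about write-conflict retries sometimes direct all writes to one node with the others as `backup`.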
mgagneI just don't know how to reconcile the "ops way" of doing things (casual wiki) vs the openstack manual (formal docbook)19:53
jaypipesmgagne: now... is it possible to run a stress test that deliberately tries to swamp the database with concurrent requests for hotspot data and get a bunch of retries due to deadlocks? yes. does it really happen in production sites? not really.19:54
klindgrenfor us at least rabbitmq seems to be the source of the majority of our issues.  IE issues where consumers aren't picking up messages - openstack breaks after something with a rabbitmq node happens or an LB failover or something like that, and a "restart world" solves the problem19:55
mgagnejaypipes: that's the kind of stories I wish to hear. I don't care about endless discussion of theoretical situations which statistically never happen, just for the sake of arguing about something.19:55
jaypipesexactly.19:55
jaypipesklindgren: at AT&T, I would say keeping RabbitMQ up and happy was our #1 issue from an ops perspective as well.19:56
jaypipeswe ran it in cluster mode, active/active spread round robin with sticky sessions and mnesia persistence.19:56
klindgrenjaypipes, and did that work for you?  We have tried the clustered rabbit behind a LB and clustered rabbit + listing all the rabbitmq nodes in the configuration20:00
klindgrenand none of them seem to failover correctly20:01
jaypipeson phone one sec20:01
klindgrenthough from XP - the loadbalancer config seemed to be worse than listing servers in config.20:01
klindgrenMainly because of connections getting transferred to other nodes due to reasons, and the client being totally oblivious to the fact something changed20:02
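The "list all the rabbit nodes in the configuration" alternative klindgren mentions is a Juno-era oslo.messaging setting. A hedged sketch of the relevant nova.conf fragment, with hostnames and values purely illustrative:

```ini
# Juno-era nova.conf fragment (hostnames/values illustrative).
# Listing the brokers directly, instead of hiding them behind a load
# balancer, lets the client library itself reconnect to a surviving node.
[DEFAULT]
rabbit_hosts = rabbit1:5672,rabbit2:5672,rabbit3:5672
rabbit_ha_queues = true          # mirror queues across the cluster
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
```

The "client totally oblivious" failure after an LB failover is a half-open TCP connection; the heartbeat support klindgren alludes to (added to oslo.messaging in later releases as heartbeat_timeout_threshold) exists precisely so clients notice dead connections and reconnect.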
*** Marga_ has quit IRC20:03
*** Marga_ has joined #openstack-operators20:03
*** mdorman_away is now known as mdorman20:04
*** Marga_ has quit IRC20:08
alopDoes anyone have any experience 'inspecting' images in glance?20:08
alopI'm thinking this would be something worth writing a blueprint for20:08
alopwe've been talking about it at work, like, we allow users to upload whatever they want, and can't impose a naming scheme20:09
alopand we've gotten ourselves into a situation where we've agreed to report to Microsoft on the number of instances running windows20:09
alopeven if we don't provide it20:09
alop^Which I thought was a huge fail on the lawyers' part20:09
alopLike, if it was *my* image, that I was providing users, then I could do something with meta-data, naming scheme, etc to make it easy to report on usage20:10
klindgrenlike people doing byoi and running windows with their own keys and the like?20:10
alopya20:10
alopsomehow, that' *my* problem20:11
klindgrenerm20:11
klindgrensuggestion to go back to microsoft and "ask" someone else20:11
alopso, I'm looking at libguestfs, inspect-os20:11
alophaha20:11
alopI envision Monty Python and the Holy Grail20:11
alopwhere King Arthur is talking to the French soldiers20:11
alop"Is there someone else we can talk to?"20:12
klindgrenhonestly - if you dont provide windows as an offering on your cloud20:12
klindgrenthen people running images is not your problem20:12
klindgrenmicrosoft only cares about you using spla lics and providing windows to customers20:12
alopExactly, either the business/legal people totally misunderstand,20:12
klindgrenif you do that then you have all sorts of bullshit to deal with20:12
alopexactly20:13
alopbut talking to the Business/Product people here is like talking to a Horse20:13
alopj05h: ^^^20:13
klindgrenwell - I am pretty sure that if you report people running their own windows images + their own licensing - microsoft is going to come after you to pay for licensing of those instances.20:14
alopthat's exactly what I said!20:14
aloplike, the users would have already paid for their own windows20:14
alopand we're charging them again?!?!20:15
alopLike, with Redhat, it's pretty easy. You don't register the instance, you get no package updates/repos/support20:15
klindgrenAnyway - its been a while since I did anything with microsoft.  I try to avoid windows as much as possible these days.  But some general googling seems to indicate that customers can only license servers that are on premise.20:25
klindgrenThough unsure whose problem that really becomes unless you want to provide the license and also take on the support burden of microsoft stuff.  Including the crazy licensing requirements20:26
alopyeah, I'm going to suggest that we get to see what the actual agreement was; it's possible that the business people are misinterpreting it20:27
klindgrenthe other problem - from XP 4+ years ago20:27
klindgrenis the answer changes depending on who you talk to20:27
klindgren:-)20:27
klindgreneven in the same division @ Microsoft20:28
dvorakmgagne: mfisch and I have both spent a lot of time on galera care and feeding, upgrades etc.  We'd be glad to help on anything20:29
dvorakwe run a local 3 node galera cluster in each region (2 regions) and a 6 node + arbitrator cluster across the two regions for keystone + horizon sessions20:29
mfischI am missing context all I see is MSFT talk20:30
alopoh, someone is telling me that I need to figure out a way to determine which instances are running windows20:31
dvoraksorry, it was way back in scrollback20:31
alopand not just from "my" images20:31
alopbut from images they might add to glance20:31
alopso, I'm looking at libguestfs "inspect-os"20:31
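libguestfs's inspect-os is also exposed as the virt-inspector CLI, which emits an XML report per image. A sketch of the reporting side alop describes, parsing that XML to flag Windows images; the sample document here is abbreviated and illustrative, not a full virt-inspector dump:

```python
# Parse (abbreviated) virt-inspector XML output to flag Windows images.
# The <operatingsystems>/<operatingsystem>/<name> layout matches what
# virt-inspector emits; in practice you'd feed it the tool's real output.
import xml.etree.ElementTree as ET

SAMPLE = """\
<operatingsystems>
  <operatingsystem>
    <name>windows</name>
    <distro>windows</distro>
    <product_name>Windows Server 2012</product_name>
  </operatingsystem>
</operatingsystems>
"""

def os_names(inspector_xml):
    """Return the <name> of each detected OS (e.g. 'windows', 'linux')."""
    root = ET.fromstring(inspector_xml)
    return [os.findtext("name") for os in root.findall("operatingsystem")]

def is_windows(inspector_xml):
    return "windows" in os_names(inspector_xml)

if __name__ == "__main__":
    print(os_names(SAMPLE))    # ['windows']
    print(is_windows(SAMPLE))  # True
```

Running this over every glance image (download or mount, then inspect) is roughly what a "disk inspector for glance" blueprint would automate server-side.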
dvorakalop: I'm not sure if we actually have any windows instances, but my understanding is that if you buy datacenter licenses for a compute node, the instances hosted on it are automatically covered20:32
mfischyeah I think our plan is to pay per host20:32
dvorakI believe the plan has been to license a specific number of compute hosts and use aggregates to schedule windows instances only on those compute hosts20:32
alopI'm being told that there's some specific german law at play here20:32
alopit just seems like a fools errand20:33
dvorakah, ok. I couldn't begin to cover that :)20:33
*** zz_avozza is now known as avozza20:33
alopI'd love to write a blueprint for a disk inspector for glance20:33
alopbut the reasoning is upsetting20:33
mgagnedvorak: that's the solution we went with: aggregates, image props + KMS20:33
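The aggregates + image-properties approach mgagne and dvorak describe maps onto nova's AggregateImagePropertiesIsolation scheduler filter. A sketch of the moving parts, with aggregate/host/property names hypothetical:

```
# Sketch only -- names hypothetical. In nova.conf, enable the filter that
# matches image properties against aggregate metadata:
#   scheduler_default_filters = AggregateImagePropertiesIsolation,RetryFilter,...

# Create an aggregate of the licensed compute hosts and tag it:
nova aggregate-create windows-hosts
nova aggregate-add-host windows-hosts compute01
nova aggregate-set-metadata windows-hosts os_type=windows

# Tag Windows images with the matching property:
glance image-update --property os_type=windows <image-id>
```

With that in place, instances booted from tagged images land only on the datacenter-licensed hosts, which is what makes the per-host licensing plan dvorak mentions workable.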
*** Marga_ has joined #openstack-operators20:34
alopone thing I could think of is to change all the root disk sizes to something much smaller, which would not be suitable for windows, then make a flavor named "m1.windows" or something, which has the large root disk windows likes20:36
alopbut that wouldn't cover all instances20:36
mgagneI can only imagine a troll face: problem? :D20:38
*** avozza is now known as zz_avozza20:38
*** Marga_ has quit IRC20:38
*** matrohon has joined #openstack-operators20:43
*** signed8b_ is now known as signed8bit_ZZZzz20:55
*** Marga_ has joined #openstack-operators20:56
*** signed8bit_ZZZzz is now known as signed8b_20:59
*** bvandenh has joined #openstack-operators21:04
*** zz_avozza is now known as avozza21:06
*** Marga_ has quit IRC21:10
*** Marga_ has joined #openstack-operators21:11
*** Marga__ has joined #openstack-operators21:15
*** Marga_ has quit IRC21:15
*** matrohon has quit IRC21:16
*** matrohon has joined #openstack-operators21:16
*** Marga__ has quit IRC21:17
*** Marga_ has joined #openstack-operators21:18
*** matrohon has quit IRC21:40
*** Piet has quit IRC21:46
*** blair has quit IRC21:50
*** bvandenh has quit IRC21:51
*** Rockyg has joined #openstack-operators21:54
*** VW__ has joined #openstack-operators21:58
*** Marga_ has quit IRC21:59
*** Marga_ has joined #openstack-operators21:59
*** VW_ has quit IRC22:01
*** Marga_ has quit IRC22:04
*** jaypipes has quit IRC22:04
*** VW_ has joined #openstack-operators22:05
*** avozza is now known as zz_avozza22:05
*** VW__ has quit IRC22:08
*** Gala-G has joined #openstack-operators22:12
*** Marga_ has joined #openstack-operators22:16
*** Marga_ has quit IRC22:16
*** Marga_ has joined #openstack-operators22:17
*** VW_ has quit IRC22:21
*** blair has joined #openstack-operators22:27
*** VW_ has joined #openstack-operators22:34
*** signed8b_ has quit IRC22:44
*** radez is now known as radez_g0n322:46
*** pboros has quit IRC22:57
*** VW__ has joined #openstack-operators23:01
*** VW_ has quit IRC23:03
*** VW__ has quit IRC23:10
*** Rockyg has quit IRC23:16
*** VW_ has joined #openstack-operators23:22
*** j05h1 has joined #openstack-operators23:23
*** j05h has quit IRC23:26
*** VW_ has quit IRC23:39
*** VW_ has joined #openstack-operators23:40
*** VW_ has quit IRC23:41
*** david-lyle is now known as david-lyle_afk23:41
*** Piet has joined #openstack-operators23:50
*** openstack has joined #openstack-operators23:57

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!