20:00:04 <johnsom> #startmeeting Octavia
20:00:06 <openstack> Meeting started Wed Apr 6 20:00:04 2016 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:09 <openstack> The meeting name has been set to 'octavia'
20:00:12 <minwang2> o
20:00:15 <bana_k> hi
20:00:15 <minwang2> o/
20:00:21 <johnsom> Hi everyone
20:00:38 <johnsom> This will probably be a short meeting.
20:00:40 <dougwig> o/
20:00:47 <dougwig> is that a challenge?
20:00:51 <johnsom> #topic Announcements
20:00:54 <johnsom> HA
20:01:14 <johnsom> I would not do such a thing....
20:01:21 <bharathm> o/
20:01:23 <johnsom> We will end up talking about endpoints again
20:01:41 <johnsom> Ok, I don't really have any announcements this week. Anyone else?
20:01:41 <TrevorV> o/
20:02:06 <johnsom> #topic Docu-geddon
20:02:11 <TrevorV> I've got single-create "functioning" off the "/graphs" endpoint, but testing has been broken by my changes... I'm hunting it down now
20:02:13 <johnsom> #link https://bugs.launchpad.net/octavia/+bugs?field.tag=docs
20:02:19 <xgerman> O/
20:02:20 <TrevorV> Ha ha, my bad.
20:02:27 <TrevorV> I thought we were doin progress reports :D
20:02:38 <johnsom> FYI, we are still looking for folks to sign up for documentation bugs.
20:03:16 <johnsom> We had some folks asking about documentation again today, so well worth our time
20:03:22 <johnsom> #topic Brief progress reports
20:03:32 <johnsom> TrevorV Ok, now... grin
20:03:37 <TrevorV> I've got single-create "functioning" off the "/graphs" endpoint, but testing has been broken by my changes... I'm hunting it down now
20:03:39 <TrevorV> :D
20:03:45 <minwang2> the following patches needs review:#link https://review.openstack.org/#/c/300689/
20:03:45 <minwang2> #link https://review.openstack.org/#/c/299687/
20:03:46 <minwang2> #link https://review.openstack.org/#/c/288208/
20:04:01 <TrevorV> #link https://review.openstack.org/#/c/300689/
20:04:11 <TrevorV> minwang2 missed a newline in there :D
20:04:23 <minwang2> ah, thanks for pointing out
20:04:33 <johnsom> I have been working on moving haproxy into a namespace in the amphora. I think I have it functioning now, still have some testing to do and clean up the unit tests I recently broke figuring this stuff out.
20:05:17 <johnsom> Otherwise it looks like a bunch of bug fixes going on.
20:05:20 <johnsom> Good stuff.
20:05:42 <johnsom> Any other progress reports to discuss?
20:05:56 <johnsom> #topic Open Discussion
20:06:02 <TrevorV> I have a topic
20:06:06 <johnsom> Ok, other topics
20:06:26 <TrevorV> Multiple controller workers. Are we anywhere near ready to have the conversation on "how to do this"?
20:06:37 <johnsom> Sure.
20:06:41 <johnsom> In theory it works
20:06:51 <dougwig> it just needs a new endpoint.
20:06:59 <dougwig> jk, put the damn torches down.
20:07:21 * johnsom considers a vote on worker endpoints
20:07:22 <TrevorV> dougwig I think I'll give you a papercut during the summit... just to say I cut you
20:07:27 <xgerman> Do I need a paint brush?
20:08:00 <johnsom> TrevorV Was there a particular concern about running multiple workers?
20:08:10 <xgerman> I think we run multiple workers...
20:08:10 <blogan> sorry im late
20:08:19 <johnsom> Yeah, we do
20:08:46 <xgerman> But not sure if it works at scale
20:09:04 <TrevorV> johnsom not specifically so much as we were talking about scaling issues coming up for Rackspace deployment and stuff, and didn't know if we had this figured out
20:10:29 <TrevorV> If it "works" then I guess we at Rackspace will just have to make sure it actually does when we have the chance :D
20:10:34 <johnsom> Yeah, to my knowledge multiple workers functions just fine. We still want to move to job board for HA reasons, but I think multiple workers is functional.
20:10:36 <TrevorV> I do have one more topic
20:11:10 <johnsom> Then, on the scale side, there is still the multiple tenant/multiple management LAN work
20:11:34 <johnsom> Which may/may not be an issue in your deployment
20:11:52 <johnsom> TrevorV Ok, what is next?
20:12:55 <TrevorV> It may be johnsom
20:13:01 <TrevorV> The next topic I have is "billing"
20:13:14 <TrevorV> So at Rackspace we talked about billing on a few things, and were discussing how to collect that information
20:13:22 * johnsom thinks "there goes the short meeting"
20:14:02 <TrevorV> crc32_znc (Carlos) mentioned the heartbeat mechanism already collects certain stats and sends them along, but we ignore them in favor of the "time" and the "id"
20:14:09 <johnsom> Today we have bytes in/out and connections in the octavia DB
20:14:18 <TrevorV> Oh, we do?
20:14:24 <TrevorV> Nice!
20:14:50 <johnsom> I'm not convinced there aren't bugs around failover and those stats, but yes, it's being collected
20:14:56 <TrevorV> So what about "uptime"? Haproxy *should* be able to give that to us as well, right? Would adding that to the heartbeat and storing it afterwards be a "problem"?
20:14:59 <xgerman> Yeah. Though probably brittle since we take it from haproxy
20:15:13 <dougwig> do we have a defined set of metrics we want, and an api, and if we're talking to ceilometer? if we make a ref amphora only solution, it'll be a pain to shoehorn in other drivers later, if we're standalone.
20:15:49 <TrevorV> I'm okay with it being modular, as much of our other driver-related implementations are, dougwig
20:15:54 <johnsom> And dougwig gets to the point....
20:16:27 <xgerman> I think I set it up modular with the Nixon
20:16:33 <xgerman> Mixin
20:16:54 <johnsom> I thought the mixin got removed...
20:17:14 <TrevorV> I'm fairly certain it was removed as well
20:17:18 <xgerman> We can resurrect that...
20:17:31 <blogan> im pretty sure its still driverized
20:17:51 <TrevorV> Right, so the thing I was getting at was the "size" of the heartbeat
20:17:59 <johnsom> So, for Octavia, we have this API defined: http://www.octavia.io/review/master/api/octaviaapi.html#list-listener-statistics
20:18:02 <TrevorV> If we append a bit more information to it, will it be problematic?
20:18:35 <johnsom> TrevorV Help me with "uptime" as it is a highly abused word.
20:18:37 <neelashah> sorry a bit late in joining - are we talking about ceilometer support for lbaas v2?
20:19:03 <xgerman> Yeah, wouldn't nova know that?
20:19:05 <johnsom> neelashah Kind of. We are talking about billing topics.
20:19:07 <TrevorV> neelashah its just a "metrics/billing" conversation in general
20:19:27 <TrevorV> xgerman if you're talking about instance uptime, that's NOT the same as haproxy-service uptime.
20:19:29 <neelashah> in mitaka, ceilometer did add support for metrics for lbaas v2
20:19:54 <johnsom> TrevorV To me, uptime is a metric around availability, typically collected outside the system.
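For context on the heartbeat stats mentioned above (per-listener bytes in/out and connection counts carried alongside the amphora id and sequence/time fields), a rough picture of what such a message and its consumption could look like follows. This is an illustrative sketch only; the field names are hypothetical and are not Octavia's actual wire format.

```python
# Hypothetical heartbeat payload; field names are illustrative only and
# are NOT Octavia's actual wire format.
heartbeat = {
    "id": "amphora-uuid",        # which amphora sent this
    "seq": 42,                   # sequence number used for liveness checks
    "listeners": {
        "listener-uuid": {
            "status": "OPEN",
            "stats": {
                "bytes_in": 123456,
                "bytes_out": 654321,
                "active_connections": 7,
                "total_connections": 9001,
            },
        },
    },
}


def listener_stats(msg):
    """Yield (listener_id, stats) pairs a health manager could persist."""
    for listener_id, data in msg.get("listeners", {}).items():
        yield listener_id, data.get("stats", {})


for lid, stats in listener_stats(heartbeat):
    print(lid, stats["bytes_in"], stats["bytes_out"], stats["total_connections"])
```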
20:20:29 <TrevorV> neelashah Right, but some deployers may not use ceilometer, so we'll need to help that be consumed another way
20:20:39 <xgerman> TrevorV: might be good enough for billing
20:20:46 <johnsom> If you are talking about how log a load balancer has been provisioned, that is probably something that would come from the database
20:21:20 <TrevorV> johnsom except what if their LB dies for 6 hours, should we bill them?
20:21:27 <blogan> yes
20:21:29 <blogan> :)
20:21:40 <xgerman> +1
20:21:42 <TrevorV> dammit brandon
20:21:44 <johnsom> Ok, so you are talking about traditional uptime
20:22:01 <TrevorV> Sure, mostly because I didn't know another way to have that defined :D
20:22:16 <blogan> they're still taking up resources and its up to the cloud company to be proactive about refunds or the customer
20:22:18 <TrevorV> I *think* haproxy has that information somewhere, right?
20:22:49 <johnsom> TrevorV I don't think so. If it's down it's not going to tell you much
20:23:12 <TrevorV> johnsom sure, but does it have a stat that's something like, "I've been active this long" or something?
20:23:13 <johnsom> We could also bill time and a half for LBs with no members attached.
20:23:23 <TrevorV> We can calculate certain levels of down-time based on that number being reset and such
20:23:35 <xgerman> Yep, and the thing might be working fine in our world but some network is hosed making it not work for the customer
20:24:05 <johnsom> I don't think so. Here is the list: http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1
20:24:06 <TrevorV> xgerman which means we'd still bill them because its out of our control. That's like saying "we won't bill you, mr customer, if you delete your app"
20:24:14 <johnsom> Again, I think you would have problems with failover, etc.
20:25:03 <xgerman> How do you bill vm?
20:26:11 <blogan> i think traditional uptime is a reasonable goal at first
20:26:17 <TrevorV> xgerman I'm not sure, I'm just trying to get a feel for things, you know?
20:26:33 <TrevorV> Ooops, left on accident
20:26:45 <xgerman> Never paid for mine:-)
20:26:57 <TrevorV> So that was an interesting find, haproxy has a "downtime" value
20:27:27 <TrevorV> I wonder if we should talk about billing the difference between instance up-time and the haproxy reported "downtime"...
20:27:29 <TrevorV> Hmmm
20:27:34 <johnsom> downtime as in all members were down
20:28:01 <xgerman> That's the customers fault
20:28:06 <johnsom> Right
20:28:22 <TrevorV> Oooh I follow then
20:28:42 <TrevorV> Then I'm not even sure how to collect that data for uptime
20:28:59 <blogan> the traditional way
20:29:03 <blogan> :)
20:29:05 <xgerman> Well, you can always count heartbeats
20:29:17 <johnsom> Pingdom?
20:29:27 <xgerman> :-)
20:29:46 <TrevorV> Are heartbeats specific enough to include an instance is errored?
20:29:56 <johnsom> Sorry, we probably aren't being super helpful right now.
20:29:58 <blogan> heartbeats overwrite each other
20:30:00 <TrevorV> Like, process is "running" but its not actually loadbalancing?
20:30:11 <blogan> so you wouldn't be able to see a history of heartbeats
20:30:14 <blogan> except through logs maybe
20:30:21 <xgerman> It check on the stats socket
20:30:47 <xgerman> Counting heartbeat would mean work
20:30:47 <johnsom> Yeah, heartbeats check stats and make sure the lister count matches expected.
20:30:57 <TrevorV> xgerman so you're saying in getting stats that haproxy is loadbalancing? There will never be a time that haproxy will report stats when its not actually loadbalancing?
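For reference on the haproxy statistics discussed above: the fields listed in section 9.1 of the haproxy manual (including bin, bout, scur, status, and the per-server/backend "downtime" counter TrevorV spotted) are exposed as CSV over haproxy's admin stats socket. A minimal sketch of reading them follows, assuming a socket path of /var/lib/haproxy/stats; the real path depends on the "stats socket" line in the haproxy configuration.

```python
# Minimal sketch: query haproxy's admin stats socket and parse the CSV output.
# The socket path is an assumption; it is set by "stats socket" in haproxy.cfg.
import csv
import socket

STATS_SOCKET = "/var/lib/haproxy/stats"


def show_stat(path=STATS_SOCKET):
    """Send 'show stat' to the haproxy admin socket and return parsed CSV rows."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    sock.sendall(b"show stat\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    sock.close()
    raw = b"".join(chunks).decode("ascii")
    # The first line is the header and is prefixed with "# ".
    lines = raw.lstrip("# ").splitlines()
    return list(csv.DictReader(lines))


for row in show_stat():
    # "downtime" is only populated for servers and backends, not frontends.
    print(row["pxname"], row["svname"], row["status"],
          row.get("bin"), row.get("bout"), row.get("downtime"))
```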
20:31:11 <blogan> if the network in front of it is down
20:31:15 <blogan> the vip network
20:31:22 <blogan> or something north of that
20:31:23 <xgerman> +1
20:31:31 <TrevorV> blogan is that in our control or no?
20:31:38 <blogan> yes and no
20:32:02 <TrevorV> ?
20:32:05 <blogan> octavia's? no, company infrastructure? yes
20:32:22 <TrevorV> In other words, they should not be billed for it.
20:33:05 <blogan> i think thats better handled by auditing after an event like that happens
20:33:29 <TrevorV> So, in my opinion, we (maybe just Rackspace) will need a mechanism to capture/calculate up-time, and have it reported somehow to the CW to put it in the DB
20:33:32 <johnsom> Right now, for health monitoring, we consider it healthy if we get stats back and the listeners are all present.
20:33:33 <johnsom> https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/update_db.py#L66
20:34:29 <TrevorV> Right johnsom but we just identified a scenario where that's "inaccurate"
20:34:34 <TrevorV> When the VIP network is having issues/
20:34:35 <blogan> TrevorV: the simple calculation of uptime by when the lb was created doesn't satisfy that?
20:34:44 <TrevorV> blogan that's overbilling
20:34:58 <blogan> TrevorV: thats subjective
20:35:04 <xgerman> Makes more money...
20:35:10 <TrevorV> Yep, and jorge would say we can't do that
20:35:31 <xgerman> Him!
20:35:32 <blogan> TrevorV: would he?
20:35:35 <TrevorV> blogan I can say that billing off "amphora" uptime wouldn't necessarily be a problem
20:35:49 <johnsom> I think it is really a policy thing to define your SLA and how it's measured.
20:35:57 <blogan> johnsom: agreed
20:36:01 <TrevorV> Since we capture bandwidth and (other thing I can't remember) already, we'll still be billing for resources consumed
20:36:03 <blogan> TrevorV: lets talk about if offline
20:36:05 <blogan> with jorge
20:36:10 <blogan> bc i think thats a lofty requirement
20:36:18 <TrevorV> kk blogan
20:36:43 <TrevorV> Looks like we're tired of the discussion gents, but knowing that we already store those other 2 fields in the DB, we (at rackspace) should be able to move forward when we're ready, thanks guys!
20:37:13 <blogan> TrevorV: we store it by amp, still need an aggregator
20:37:28 <ptoohill> Yea we discussed a good portion of this yesterday
20:37:32 <TrevorV> blogan I know, but we can make that happen in the "driver" we write for it
20:37:40 <blogan> which maybe the metering service handles, i dotn know
20:37:47 <blogan> ah okay
20:37:48 <TrevorV> ptoohill my objective was to identify what parts "weren't" done yet.
20:37:50 <johnsom> Yeah, to a point I think as long as it's provisioned, it's billed, and it's our responsibility to keep it healthy within the SLA. I.e. Act/Stndby, etc
20:38:09 <blogan> man i already feel out of the loop with internal workings
20:38:20 <TrevorV> Yeah, I agree johnsom I was just looking for the easiest way to get that information
20:38:31 <TrevorV> Make things easier, etc etc
20:38:36 <TrevorV> For the billing peeps I mean
20:38:41 <TrevorV> Alright, I'm good.
20:39:05 <johnsom> Ok. Any other topics for this week?
20:39:47 <johnsom> Ok, thanks folks!
20:39:52 <TrevorV> :D
20:39:53 <johnsom> #endmeeting
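As a footnote to the "count heartbeats" idea raised during the uptime discussion: because heartbeats overwrite each other in the Octavia DB, any calculation like this would need a separate history of heartbeat arrival times (logs, a metering service, or similar). The sketch below estimates downtime within a billing window as the sum of gaps between recorded heartbeats that exceed a miss threshold; the interval and threshold values are assumptions for illustration, not Octavia settings.

```python
# Rough sketch of the "count heartbeats" idea: estimate downtime in a billing
# window from a history of heartbeat arrival times. Assumes such a history is
# kept outside Octavia, since the DB only holds the latest heartbeat per amphora.
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(seconds=10)  # assumed reporting interval
MISS_THRESHOLD = 3                          # gaps longer than 3 intervals count as downtime


def estimated_downtime(heartbeats, window_start, window_end):
    """Sum gaps between heartbeats that exceed the miss threshold."""
    points = sorted(t for t in heartbeats if window_start <= t <= window_end)
    # Treat the window edges as virtual heartbeats so leading/trailing silence counts.
    points = [window_start] + points + [window_end]
    down = timedelta()
    for prev, cur in zip(points, points[1:]):
        gap = cur - prev
        if gap > MISS_THRESHOLD * HEARTBEAT_INTERVAL:
            down += gap
    return down


# Example: a one-hour window with roughly 20 minutes of silence in the middle.
start = datetime(2016, 4, 6, 20, 0)
beats = [start + i * HEARTBEAT_INTERVAL for i in range(60)] + \
        [start + timedelta(minutes=30) + i * HEARTBEAT_INTERVAL for i in range(180)]
print(estimated_downtime(beats, start, start + timedelta(hours=1)))
```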