20:00:04 <johnsom> #startmeeting Octavia
20:00:06 <openstack> Meeting started Wed Apr 6 20:00:04 2016 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:09 <openstack> The meeting name has been set to 'octavia'
20:00:12 <minwang2> o
20:00:15 <bana_k> hi
20:00:15 <minwang2> o/
20:00:21 <johnsom> Hi everyone
20:00:38 <johnsom> This will probably be a short meeting.
20:00:40 <dougwig> o/
20:00:47 <dougwig> is that a challenge?
20:00:51 <johnsom> #topic Announcements
20:00:54 <johnsom> HA
20:01:14 <johnsom> I would not do such a thing....
20:01:21 <bharathm> o/
20:01:23 <johnsom> We will end up talking about endpoints again
20:01:41 <johnsom> Ok, I don't really have any announcements this week. Anyone else?
20:01:41 <TrevorV> o/
20:02:06 <johnsom> #topic Docu-geddon
20:02:11 <TrevorV> I've got single-create "functioning" off the "/graphs" endpoint, but testing has been broken by my changes... I'm hunting it down now
20:02:13 <johnsom> #link https://bugs.launchpad.net/octavia/+bugs?field.tag=docs
20:02:19 <xgerman> O/
20:02:20 <TrevorV> Ha ha, my bad.
20:02:27 <TrevorV> I thought we were doin progress reports :D
20:02:38 <johnsom> FYI, we are still looking for folks to sign up for documentation bugs.
20:03:16 <johnsom> We had some folks asking about documentation again today, so well worth our time
20:03:22 <johnsom> #topic Brief progress reports
20:03:32 <johnsom> TrevorV Ok, now... grin
20:03:37 <TrevorV> I've got single-create "functioning" off the "/graphs" endpoint, but testing has been broken by my changes... I'm hunting it down now
20:03:39 <TrevorV> :D
20:03:45 <minwang2> the following patches needs review:#link https://review.openstack.org/#/c/300689/
20:03:45 <minwang2> #link https://review.openstack.org/#/c/299687/
20:03:46 <minwang2> #link https://review.openstack.org/#/c/288208/
20:04:01 <TrevorV> #link https://review.openstack.org/#/c/300689/
20:04:11 <TrevorV> minwang2 missed a newline in there :D
20:04:23 <minwang2> ah, thanks for pointing out
20:04:33 <johnsom> I have been working on moving haproxy into a namespace in the amphora. I think I have it functioning now, still have some testing to do and clean up the unit tests I recently broke figuring this stuff out.
20:05:17 <johnsom> Otherwise it looks like a bunch of bug fixes going on.
20:05:20 <johnsom> Good stuff.
20:05:42 <johnsom> Any other progress reports to discuss?
20:05:56 <johnsom> #topic Open Discussion
20:06:02 <TrevorV> I have a topic
20:06:06 <johnsom> Ok, other topics
20:06:26 <TrevorV> Multiple controller workers. Are we anywhere near ready to have the conversation on "how to do this"?
20:06:37 <johnsom> Sure.
20:06:41 <johnsom> In theory it works
20:06:51 <dougwig> it just needs a new endpoint.
20:06:59 <dougwig> jk, put the damn torches down.
20:07:21 * johnsom considers a vote on worker endpoints
20:07:22 <TrevorV> dougwig I think I'll give you a papercut during the summit... just to say I cut you
20:07:27 <xgerman> Do I need a paint brush?
20:08:00 <johnsom> TrevorV Was there a particular concern about running multiple workers?
20:08:10 <xgerman> I think we run multiple workers...
20:08:10 <blogan> sorry im late
20:08:19 <johnsom> Yeah, we do
20:08:46 <xgerman> But not sure if it works at scale
20:09:04 <TrevorV> johnsom not specifically so much as we were talking about scaling issues coming up for Rackspace deployment and stuff, and didn't know if we had this figured out
20:10:29 <TrevorV> If it "works" then I guess we at Rackspace will just have to make sure it actually does when we have the chance :D
20:10:34 <johnsom> Yeah, to my knowledge multiple workers functions just fine. We still want to move to job board for HA reasons, but I think multiple workers is functional.
20:10:36 <TrevorV> I do have one more topic
20:11:10 <johnsom> Then, on the scale side, there is still the multiple tenant/multiple management LAN work
20:11:34 <johnsom> Which may/may not be an issue in your deployment
20:11:52 <johnsom> TrevorV Ok, what is next?
20:12:55 <TrevorV> It may be johnsom
20:13:01 <TrevorV> The next topic I have is "billing"
20:13:14 <TrevorV> So at Rackspace we talked about billing on a few things, and were discussing how to collect that information
20:13:22 * johnsom thinks "there goes the short meeting"
20:14:02 <TrevorV> crc32_znc (Carlos) mentioned the heartbeat mechanism already collects certain stats and sends them along, but we ignore them in favor of the "time" and the "id"
20:14:09 <johnsom> Today we have bytes in/out and connections in the octavia DB
20:14:18 <TrevorV> Oh, we do?
20:14:24 <TrevorV> Nice!
20:14:50 <johnsom> I'm not convinced there aren't bugs around failover and those stats, but yes, it's being collected
20:14:56 <TrevorV> So what about "uptime"? Haproxy *should* be able to give that to us as well, right? Would adding that to the heartbeat and storing it afterwards be a "problem"?
20:14:59 <xgerman> Yeah. Though probably brittle since we take it from haproxy
20:15:13 <dougwig> do we have a defined set of metrics we want, and an api, and if we're talking to ceilometer? if we make a ref amphora only solution, it'll be a pain to shoehorn in other drivers later, if we're standalone.
20:15:49 <TrevorV> I'm okay with it being modular, as much of our other driver-related implementations are, dougwig
20:15:54 <johnsom> And dougwig gets to the point....
20:16:27 <xgerman> I think I set it up modular with the Nixon
20:16:33 <xgerman> Mixin
20:16:54 <johnsom> I thought the mixin got removed...
20:17:14 <TrevorV> I'm fairly certain it was removed as well
20:17:18 <xgerman> We can resurrect that...
20:17:31 <blogan> im pretty sure its still driverized
20:17:51 <TrevorV> Right, so the thing I was getting at was the "size" of the heartbeat
20:17:59 <johnsom> So, for Octavia, we have this API defined: http://www.octavia.io/review/master/api/octaviaapi.html#list-listener-statistics
20:18:02 <TrevorV> If we append a bit more information to it, will it be problematic?
20:18:35 <johnsom> TrevorV Help me with "uptime" as it is a highly abused word.
20:18:37 <neelashah> sorry a bit late in joining - are we talking about ceilometer support for lbaas v2?
20:19:03 <xgerman> Yeah, wouldn't nova know that?
20:19:05 <johnsom> neelashah Kind of. We are talking about billing topics.
20:19:07 <TrevorV> neelashah its just a "metrics/billing" conversation in general
20:19:27 <TrevorV> xgerman if you're talking about instance uptime, that's NOT the same as haproxy-service uptime.
20:19:29 <neelashah> in mitaka, ceilometer did add support for metrics for lbaas v2
20:19:54 <johnsom> TrevorV To me, uptime is a metric around availability, typically collected outside the system.
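For context on the heartbeat stats mentioned above (per-listener bytes in/out and connection counts carried alongside the amphora id and sequence/time fields), a rough picture of what such a message and its consumption could look like follows. This is an illustrative sketch only; the field names are hypothetical and are not Octavia's actual wire format.

```python
# Hypothetical heartbeat payload; field names are illustrative only and
# are NOT Octavia's actual wire format.
heartbeat = {
    "id": "amphora-uuid",        # which amphora sent this
    "seq": 42,                   # sequence number used for liveness checks
    "listeners": {
        "listener-uuid": {
            "status": "OPEN",
            "stats": {
                "bytes_in": 123456,
                "bytes_out": 654321,
                "active_connections": 7,
                "total_connections": 9001,
            },
        },
    },
}


def listener_stats(msg):
    """Yield (listener_id, stats) pairs a health manager could persist."""
    for listener_id, data in msg.get("listeners", {}).items():
        yield listener_id, data.get("stats", {})


for lid, stats in listener_stats(heartbeat):
    print(lid, stats["bytes_in"], stats["bytes_out"], stats["total_connections"])
```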
20:20:29 <TrevorV> neelashah Right, but some deployers may not use ceilometer, so we'll need to help that be consumed another way
20:20:39 <xgerman> TrevorV: might be good enough for billing
20:20:46 <johnsom> If you are talking about how log a load balancer has been provisioned, that is probably something that would come from the database
20:21:20 <TrevorV> johnsom except what if their LB dies for 6 hours, should we bill them?
20:21:27 <blogan> yes
20:21:29 <blogan> :)
20:21:40 <xgerman> +1
20:21:42 <TrevorV> dammit brandon
20:21:44 <johnsom> Ok, so you are talking about traditional uptime
20:22:01 <TrevorV> Sure, mostly because I didn't know another way to have that defined :D
20:22:16 <blogan> they're still taking up resources and its up to the cloud company to be proactive about refunds or the customer
20:22:18 <TrevorV> I *think* haproxy has that information somewhere, right?
20:22:49 <johnsom> TrevorV I don't think so. If it's down it's not going to tell you much
20:23:12 <TrevorV> johnsom sure, but does it have a stat that's something like, "I've been active this long" or something?
20:23:13 <johnsom> We could also bill time and a half for LBs with no members attached.
20:23:23 <TrevorV> We can calculate certain levels of down-time based on that number being reset and such
20:23:35 <xgerman> Yep, and the thing might be working fine in our world but some network is hosed making it not work for the customer
20:24:05 <johnsom> I don't think so. Here is the list: http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1
20:24:06 <TrevorV> xgerman which means we'd still bill them because its out of our control. That's like saying "we won't bill you, mr customer, if you delete your app"
20:24:14 <johnsom> Again, I think you would have problems with failover, etc.
20:25:03 <xgerman> How do you bill vm?
20:26:11 <blogan> i think traditional uptime is a reasonable goal at first
20:26:17 <TrevorV> xgerman I'm not sure, I'm just trying to get a feel for things, you know?
20:26:33 <TrevorV> Ooops, left on accident
20:26:45 <xgerman> Never paid for mine:-)
20:26:57 <TrevorV> So that was an interesting find, haproxy has a "downtime" value
20:27:27 <TrevorV> I wonder if we should talk about billing the difference between instance up-time and the haproxy reported "downtime"...
20:27:29 <TrevorV> Hmmm
20:27:34 <johnsom> downtime as in all members were down
20:28:01 <xgerman> That's the customers fault
20:28:06 <johnsom> Right
20:28:22 <TrevorV> Oooh I follow then
20:28:42 <TrevorV> Then I'm not even sure how to collect that data for uptime
20:28:59 <blogan> the traditional way
20:29:03 <blogan> :)
20:29:05 <xgerman> Well, you can always count heartbeats
20:29:17 <johnsom> Pingdom?
20:29:27 <xgerman> :-)
20:29:46 <TrevorV> Are heartbeats specific enough to include an instance is errored?
20:29:56 <johnsom> Sorry, we probably aren't being super helpful right now.
20:29:58 <blogan> heartbeats overwrite each other
20:30:00 <TrevorV> Like, process is "running" but its not actually loadbalancing?
20:30:11 <blogan> so you wouldn't be able to see a history of heartbeats
20:30:14 <blogan> except through logs maybe
20:30:21 <xgerman> It check on the stats socket
20:30:47 <xgerman> Counting heartbeat would mean work
20:30:47 <johnsom> Yeah, heartbeats check stats and make sure the lister count matches expected.
20:30:57 <TrevorV> xgerman so you're saying in getting stats that haproxy is loadbalancing? There will never be a time that haproxy will report stats when its not actually loadbalancing?
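For reference on the haproxy statistics discussed above: the fields listed in section 9.1 of the haproxy manual (including bin, bout, scur, status, and the per-server/backend "downtime" counter TrevorV spotted) are exposed as CSV over haproxy's admin stats socket. A minimal sketch of reading them follows, assuming a socket path of /var/lib/haproxy/stats; the real path depends on the "stats socket" line in the haproxy configuration.

```python
# Minimal sketch: query haproxy's admin stats socket and parse the CSV output.
# The socket path is an assumption; it is set by "stats socket" in haproxy.cfg.
import csv
import socket

STATS_SOCKET = "/var/lib/haproxy/stats"


def show_stat(path=STATS_SOCKET):
    """Send 'show stat' to the haproxy admin socket and return parsed CSV rows."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    sock.sendall(b"show stat\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    sock.close()
    raw = b"".join(chunks).decode("ascii")
    # The first line is the header and is prefixed with "# ".
    lines = raw.lstrip("# ").splitlines()
    return list(csv.DictReader(lines))


for row in show_stat():
    # "downtime" is only populated for servers and backends, not frontends.
    print(row["pxname"], row["svname"], row["status"],
          row.get("bin"), row.get("bout"), row.get("downtime"))
```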
20:31:11 <blogan> if the network in front of it is down
20:31:15 <blogan> the vip network
20:31:22 <blogan> or something north of that
20:31:23 <xgerman> +1
20:31:31 <TrevorV> blogan is that in our control or no?
20:31:38 <blogan> yes and no
20:32:02 <TrevorV> ?
20:32:05 <blogan> octavia's? no, company infrastructure? yes
20:32:22 <TrevorV> In other words, they should not be billed for it.
20:33:05 <blogan> i think thats better handled by auditing after an event like that happens
20:33:29 <TrevorV> So, in my opinion, we (maybe just Rackspace) will need a mechanism to capture/calculate up-time, and have it reported somehow to the CW to put it in the DB
20:33:32 <johnsom> Right now, for health monitoring, we consider it healthy if we get stats back and the listeners are all present.
20:33:33 <johnsom> https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/update_db.py#L66
20:34:29 <TrevorV> Right johnsom but we just identified a scenario where that's "inaccurate"
20:34:34 <TrevorV> When the VIP network is having issues/
20:34:35 <blogan> TrevorV: the simple calculation of uptime by when the lb was created doesn't satisfy that?
20:34:44 <TrevorV> blogan that's overbilling
20:34:58 <blogan> TrevorV: thats subjective
20:35:04 <xgerman> Makes more money...
20:35:10 <TrevorV> Yep, and jorge would say we can't do that
20:35:31 <xgerman> Him!
20:35:32 <blogan> TrevorV: would he?
20:35:35 <TrevorV> blogan I can say that billing off "amphora" uptime wouldn't necessarily be a problem
20:35:49 <johnsom> I think it is really a policy thing to define your SLA and how it's measured.
20:35:57 <blogan> johnsom: agreed
20:36:01 <TrevorV> Since we capture bandwidth and (other thing I can't remember) already, we'll still be billing for resources consumed
20:36:03 <blogan> TrevorV: lets talk about if offline
20:36:05 <blogan> with jorge
20:36:10 <blogan> bc i think thats a lofty requirement
20:36:18 <TrevorV> kk blogan
20:36:43 <TrevorV> Looks like we're tired of the discussion gents, but knowing that we already store those other 2 fields in the DB, we (at rackspace) should be able to move forward when we're ready, thanks guys!
20:37:13 <blogan> TrevorV: we store it by amp, still need an aggregator
20:37:28 <ptoohill> Yea we discussed a good portion of this yesterday
20:37:32 <TrevorV> blogan I know, but we can make that happen in the "driver" we write for it
20:37:40 <blogan> which maybe the metering service handles, i dotn know
20:37:47 <blogan> ah okay
20:37:48 <TrevorV> ptoohill my objective was to identify what parts "weren't" done yet.
20:37:50 <johnsom> Yeah, to a point I think as long as it's provisioned, it's billed, and it's our responsibility to keep it healthy within the SLA. I.e. Act/Stndby, etc
20:38:09 <blogan> man i already feel out of the loop with internal workings
20:38:20 <TrevorV> Yeah, I agree johnsom I was just looking for the easiest way to get that information
20:38:31 <TrevorV> Make things easier, etc etc
20:38:36 <TrevorV> For the billing peeps I mean
20:38:41 <TrevorV> Alright, I'm good.
20:39:05 <johnsom> Ok. Any other topics for this week?
20:39:47 <johnsom> Ok, thanks folks!
20:39:52 <TrevorV> :D
20:39:53 <johnsom> #endmeeting
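As a footnote to the "count heartbeats" idea raised during the uptime discussion: because heartbeats overwrite each other in the Octavia DB, any calculation like this would need a separate history of heartbeat arrival times (logs, a metering service, or similar). The sketch below estimates downtime within a billing window as the sum of gaps between recorded heartbeats that exceed a miss threshold; the interval and threshold values are assumptions for illustration, not Octavia settings.

```python
# Rough sketch of the "count heartbeats" idea: estimate downtime in a billing
# window from a history of heartbeat arrival times. Assumes such a history is
# kept outside Octavia, since the DB only holds the latest heartbeat per amphora.
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(seconds=10)  # assumed reporting interval
MISS_THRESHOLD = 3                          # gaps longer than 3 intervals count as downtime


def estimated_downtime(heartbeats, window_start, window_end):
    """Sum gaps between heartbeats that exceed the miss threshold."""
    points = sorted(t for t in heartbeats if window_start <= t <= window_end)
    # Treat the window edges as virtual heartbeats so leading/trailing silence counts.
    points = [window_start] + points + [window_end]
    down = timedelta()
    for prev, cur in zip(points, points[1:]):
        gap = cur - prev
        if gap > MISS_THRESHOLD * HEARTBEAT_INTERVAL:
            down += gap
    return down


# Example: a one-hour window with roughly 20 minutes of silence in the middle.
start = datetime(2016, 4, 6, 20, 0)
beats = [start + i * HEARTBEAT_INTERVAL for i in range(60)] + \
        [start + timedelta(minutes=30) + i * HEARTBEAT_INTERVAL for i in range(180)]
print(estimated_downtime(beats, start, start + timedelta(hours=1)))
```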