16:00:14 <johnsom> #startmeeting Octavia
16:00:15 <openstack> Meeting started Wed Jul 1 16:00:14 2020 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:18 <openstack> The meeting name has been set to 'octavia'
16:00:25 <rm_work> o/
16:00:26 <johnsom> Hi everyone
16:00:28 <ataraday_> hi
16:00:29 <cgoncalves> hi
16:00:56 <johnsom> #topic Announcements
16:01:06 <johnsom> I don't really have any announcements this week.
16:01:35 <johnsom> We finally got a stable/train release out the door. I think it had been nine months since the last release.
16:01:40 * johnsom shames himself
16:01:56 <johnsom> Any other announcements this week?
16:02:56 <johnsom> #topic Brief progress reports / bugs needing review
16:03:43 <ataraday_> I created a patch with an experimental amphorav2 job, https://review.opendev.org/#/c/737993/, and found out that several things are/got broken.
16:03:52 <ataraday_> #link https://review.opendev.org/#/c/738609/
16:04:03 <johnsom> Aside from reviews, rebases, and the occasional small bug fix, I have been focusing on failover for amphorav2. It's a bit slow going, but making progress.
16:04:42 <ataraday_> there is also an issue with Barbican TLS, still looking into it..
16:04:54 <johnsom> I think after we have that landed there are going to be some patches to simplify and clean up stuff. I'm trying to not go crazy doing that now, as I don't want another monster patch and want to move faster on this patch.
16:05:08 <johnsom> Nice, thank you.
16:05:12 <ataraday_> johnsom, Thanks for proposing failover for amphorav2!
16:05:23 <johnsom> I'm also looking into why the IPv6 job is timing out. sigh
16:05:37 <johnsom> ataraday_ Still a lot to be done, but work in progress
16:06:36 <cgoncalves> I extended the amphora flavor capabilities to add amp_image_tag. This is useful for multi-architecture clouds and for testing amphora images in staging environments, for example. As part of that work I also removed some deprecated options and created an image driver interface (noop and glance drivers).
16:07:05 <johnsom> Yeah, nice! I think I have reviewed about half of that now
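
A minimal openstacksdk sketch of how the amp_image_tag flavor capability described above might be consumed once the patch lands. The cloud name, tag value, and exact SDK call are illustrative assumptions rather than confirmed API; the CLI equivalent would go through "openstack loadbalancer flavorprofile create".

    import json

    import openstack

    # Placeholder cloud name from clouds.yaml.
    conn = openstack.connect(cloud='devstack-admin')

    # A flavor profile pins a provider plus provider-specific data; with
    # amp_image_tag, the amphora driver would boot amphorae from Glance
    # images tagged "arm64" instead of the globally configured tag.
    profile = conn.load_balancer.create_flavor_profile(
        name='amphora-arm64',
        provider_name='amphora',
        flavor_data=json.dumps({'amp_image_tag': 'arm64'}))
    print('Created flavor profile %s' % profile.id)
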
16:07:19 <rm_work> I'm revisiting the failover threshold thingy
16:07:31 <rm_work> #link https://review.opendev.org/#/c/656811/
16:07:45 <johnsom> Nice, also helpful
16:07:46 <cgoncalves> I successfully deployed Octavia via devstack on an arm64/aarch64 system, although the amphora agent is not coming up in Zuul CI. This is a side project, thus low priority for me.
16:08:10 <rm_work> isn't TrevorV working on that stuff? aarch64
16:08:26 <johnsom> Trevor is working on power
16:08:32 <rm_work> ahh he's on ppc, ok
16:09:04 <johnsom> There was a mailing list post about the ppc work and Octavia I mentioned last week. I think I was the only one to reply
16:09:10 <cgoncalves> OpenDev has aarch64 nodepool nodes, thanks to Linaro!
16:09:16 <rm_work> what's a "mailing list"
16:09:21 <johnsom> lol
16:09:32 <rm_work> I forgot what email was after Google killed Inbox
16:09:33 <rm_work> RIP Inbox
16:09:34 <johnsom> You sign up for coupons with one
16:09:43 <rm_work> RIP email
16:09:44 <cgoncalves> IT'S A TRAP!
16:10:18 <rm_work> ooo nice, aarch64 testing resources are good :) do we also have ppc?
16:10:47 <cgoncalves> I found a clever way to run our CI jobs on nested-virt enabled nodes, cutting job time from close to 2 hours down to as low as 38 minutes
16:10:53 <cgoncalves> #link https://review.opendev.org/#/c/738246/
16:10:58 <johnsom> I don't think so, not in Zuul at least. Red Hat has some ppc stuffs
16:11:01 <rm_work> yeah I just +2'd that, seems workable
16:11:35 <cgoncalves> yeah, we also have ppc systems internally but, as you said, TrevorV has been working on that
16:12:07 <johnsom> Yeah, I don't have cycles to put into that at the moment
16:12:09 * rm_work is just enjoying causing TrevorV's IRC client to possibly ping
16:12:12 <TrevorV> Woah woah woah! I haven't touched arm
16:12:18 <rm_work> haha there we go
16:12:26 <TrevorV> What'd I miss now?
16:12:28 <rm_work> yes, johnsom pointed out it was ppc :D
16:12:34 <johnsom> Arm either, though I have run our code on a Raspberry Pi 4 with very good results.
16:12:38 <rm_work> I just remembered you were on some alternate arch
16:12:58 <cgoncalves> TrevorV, we were saying you've been working on ppc, not arm
16:13:52 <rm_work> ok so moving on
16:13:58 <johnsom> Yep
16:14:09 <johnsom> #topic Open Discussion
16:14:17 <johnsom> Someone seems anxious....
16:14:44 <johnsom> Any other topics this week?
16:15:16 <cgoncalves> I asked a question about amphorav2 just before the meeting started. Would now be a good time to discuss it?
16:16:01 <johnsom> Sure
16:16:13 <cgoncalves> ok, I'll paste the question again:
16:16:16 <cgoncalves> have we considered either renaming the amphorav2 provider to amphora, or aliasing amphora->amphorav2, once we switch the default to amphorav2?
16:16:21 <johnsom> I think it is a logical choice to make that change at some point, but given the extra infrastructure I'm not sure aliasing amphorav2 to amphora is a good idea.
16:16:29 * johnsom pastes his answer again
16:16:50 <cgoncalves> ok, so you're not for aliasing. are you then for renaming, or?
16:17:07 <johnsom> That said, I might look at whether we can make the v2 path's jobboard/extra requirements optional. It seems like it should be do-able
16:18:05 <johnsom> Interested in what people think....
16:18:15 <rm_work> Yeah I had some concerns here
16:18:51 <rm_work> If we deprecate/remove the amphora (v1) driver, we are explicitly choosing to make our default deployment more complex than before (introducing a second required service)
16:19:11 <rm_work> if we DON'T, we will forever have two code paths, and that sucks, so I don't think it's really an option T_T
16:19:51 <johnsom> Yeah, we have to deprecate the v1 path. It's just not workable long term IMO
16:19:53 <rm_work> SO going from that: if we deprecate the old version and force people over to the new version, given the new complexity/requirements, they MUST explicitly switch over for it to work
16:20:49 <rm_work> so for people who do upgrades without reading release notes, if they upgrade from, say, release X to release Y and v1 becomes v2, it's just going to explode, but not "cleanly"/"clearly"
16:21:19 <rm_work> rather than "the provider does not exist", it'll be some error nestled down inside the worker where it tries and fails to connect to an unconfigured Redis
16:22:18 <johnsom> We can put some tooling in to warn people or make it obvious early. I think Ann has already updated the upgrade check tool.
16:22:33 * johnsom notes, upgrade check tool nobody runs....
16:22:35 <cgoncalves> ataraday_ has a pre-upgrade check for the amphorav2 provider that may be handy
16:22:37 <cgoncalves> #link https://review.opendev.org/#/c/735556/
16:22:41 <rm_work> yeah
16:23:26 <johnsom> We can set it up to check on startup and fail to start if it's not present as well.
16:23:52 <rm_work> fail to start if a flavor is using a provider that doesn't exist?
16:24:38 <johnsom> Or the default is v2, yeah, basically. It's ugly, but possible
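
A minimal sketch of what the pre-upgrade check linked above might look like, following the usual oslo.upgradecheck pattern behind "octavia-status upgrade check"; the check logic and helper below are hypothetical, not the contents of the actual patch.

    from oslo_upgradecheck import upgradecheck


    def _jobboard_backend_reachable():
        # Hypothetical helper: return False if Redis/Zookeeper is not
        # reachable with the configured [task_flow] settings.
        return True


    class Checks(upgradecheck.UpgradeCommands):

        def _check_amphorav2(self):
            # Warn operators before the default provider flips to
            # amphorav2 and its extra infrastructure becomes required.
            if not _jobboard_backend_reachable():
                return upgradecheck.Result(
                    upgradecheck.Code.FAILURE,
                    'amphorav2 jobboard backend is unreachable')
            return upgradecheck.Result(upgradecheck.Code.SUCCESS)

        _upgrade_checks = (('AmphoraV2 readiness', _check_amphorav2),)
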
16:25:59 <rm_work> anyway, yeah, if we rename/alias automatically then it really hides stuff from the operator that I don't think we should be hiding
16:26:41 <johnsom> So I think the original question was about the name. I assume for upgrade reasons. Is there a need to rename it from amphorav2?
16:26:54 <cgoncalves> amphorav2 should be able to manage amphorav1-created resources, correct? when we remove the amphorav1 code, we could either do a DB migration to change the provider driver of existing LBs to amphorav2, or rename amphorav2 to amphora (with the bonus that we don't potentially break things for the user, as the provider name would not change)
16:26:56 <rm_work> i mean, it does look kinda weird
16:27:52 <rm_work> I guess so, with the caveat that we prevent the service from starting if everything for amphorav2 isn't configured properly?
16:28:05 <rm_work> which kinda handles that in a more up-front way
16:28:13 <cgoncalves> my question comes from the deployment side, as I've started work to support amphorav2 in TripleO
16:28:47 <rm_work> yeah ok i'm coming around, maybe we do alias it, and deal with my concerns via the service startup checking that its config is valid
16:29:11 <rm_work> same as "can I connect to SQL" and "can I connect to RMQ", add "can I connect to Redis/Zookeeper"
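
A rough illustration of the fail-fast startup check rm_work describes, assuming the redis-py client; the host, port, and exit message are placeholders.

    import sys

    import redis


    def check_jobboard_backend(host='127.0.0.1', port=6379):
        # Ping the jobboard backend at service startup so a missing or
        # unconfigured Redis surfaces as a clear error, rather than a
        # failure buried deep inside a worker flow later on.
        try:
            redis.StrictRedis(host=host, port=port,
                              socket_connect_timeout=2).ping()
        except redis.exceptions.ConnectionError as e:
            sys.exit('Jobboard backend (Redis) unreachable: %s' % e)


    if __name__ == '__main__':
        check_jobboard_backend()
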
16:29:15 <johnsom> I think we should also look at making the "jobboard" part of the v2 driver optional.
16:29:24 <rm_work> hmm that is the other possibility
16:29:38 <rm_work> I actually would prefer that, but I didn't know if THAT was feasible, as it's pretty baked in
16:29:42 <johnsom> We should be able to run the new flows just like we have, without the need for Redis, and have a bit-flip for jobboard
16:30:06 <rm_work> yeah ok I think that is my #1 preferred option -- and then yes, alias amphora to amphorav2
16:30:09 <johnsom> Really it's about how we launch the flows.
16:30:21 <rm_work> and keep the default set to False for "use_jobboard"
16:31:00 <ataraday_> without jobboard we won't be able to resume jobs
16:31:18 <rm_work> right
16:31:19 <ataraday_> so why would we need that?
16:31:27 <johnsom> Right, that would be the trade-off of that setting
16:31:35 <rm_work> so Octavia is still possible to run without Redis
16:31:41 <ataraday_> in this case we may as well just keep amphora (v1)
16:31:45 <rm_work> and using no additional upgrade requirements
16:32:12 <openstackgerrit> Pierre Riteau proposed openstack/octavia master: Add debootstrap installation instructions for CentOS https://review.opendev.org/738885
16:32:15 <rm_work> basically that allows for what I wanted as far as "keeping v1 and v2 around", except we don't ACTUALLY have to keep v1 around, and we consolidate code paths
16:32:29 <rm_work> because the ONLY advantage of v1 was not requiring Redis
16:34:38 <johnsom> Yeah, I really don't want to keep the v1 code around. That is asking for mistakes to be made and doubles the work effort (I'm feeling the pain now, lol)
16:35:16 <rm_work> yep
16:35:31 <rm_work> i absolutely do not want to keep it around, for the record
16:35:56 <cgoncalves> +1. amphorav2++
16:35:59 <rm_work> I just hated the idea of having no way to run without jobboard (if you want a simpler install, with the possibility of stuff stuck in PENDING)
16:36:09 <johnsom> So, maybe in parallel someone can look at making that config setting? Or I could look at it after failover is done.
16:37:09 <johnsom> I think it's just a matter of making a method that runs the flows either like the v1 driver does or like the v2 driver does, depending on the config setting.
16:38:03 <cgoncalves> I have little to no (more like none TBH, lol) understanding of amphorav2/jobboard, but I could maybe take a look
16:38:31 <ataraday_> this may make the code really twisted
16:38:46 <johnsom> cgoncalves Ok cool. I can give you a few pointers to what I'm thinking.
16:38:52 * cgoncalves l
16:38:56 <cgoncalves> oops!
16:39:24 <rm_work> yeah, my concern (and why I didn't ask for this originally) was that it might not even be possible, or would make things incredibly more complex within v2
16:39:33 <johnsom> ataraday_ Yeah, I may be overlooking something, but I think we should give it a shot.
16:40:21 <aannuusshhkkaa> we needed feedback on metric selection for the amphoras.. can we take that up next?
16:40:28 <rm_work> ok, so the decision was: yes to alias, and try to make v2 have a "jobboardless" option?
16:40:45 <cgoncalves> if feasible, I'd advocate for use_jobboard=true as the default. devstack can set Redis up, and it is our recommendation for production environments, right?
16:40:53 <johnsom> Or was it "alias if we can make the jobboard part optional"?
16:40:58 <rm_work> hmm maybe that
16:41:20 <rm_work> yes, let's plan to take that topic up next aannuusshhkkaa!
16:41:21 <johnsom> Yeah, we should push for it as the default
16:41:33 <rm_work> I am unsure I agree
16:42:12 <rm_work> but I guess it is as simple as "oh, my service won't start due to an error that says I don't have Redis configured and might need to turn off use_jobboard", and then they do that
16:42:21 <johnsom> I think it is a question of timing.
16:42:24 <cgoncalves> rm_work, before this conversation, amphorav2 was already set to become the default in Victoria or a later release, so either way Redis would be required
16:43:21 <rm_work> yeah, which I never liked
16:43:24 <johnsom> We need to make a call by the end of ms2 because we need to send an e-mail out about the need for Redis so the deployment tools have time to add it.
16:43:36 <rm_work> ok
16:43:38 <cgoncalves> you get the bonus now that there may be a chance for jobboard/Redis not to be mandatory :)
16:43:47 <rm_work> let's do some research on this and see if it's even possible to make it optional?
16:44:02 <rm_work> then revisit before ms2
16:44:05 <johnsom> Ok, yeah, I agree
16:44:12 <johnsom> MS2 is coming up fast though
16:44:33 <rm_work> kk
16:44:35 <johnsom> Week of July 27th
16:44:39 <rm_work> can we move on to metrics then?
16:44:44 <aannuusshhkkaa> yes!
16:44:47 <johnsom> #link https://releases.openstack.org/victoria/schedule.html
16:44:47 <aannuusshhkkaa> :D
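
A sketch of the "bit-flip" discussed above, assuming a hypothetical [task_flow] use_jobboard option; the taskflow engine and jobboard calls follow its documented APIs, but this is not the actual Octavia code path.

    from oslo_config import cfg
    from taskflow import engines as tf_engines
    from taskflow.jobs import backends as job_backends

    CONF = cfg.CONF
    CONF.register_opts(
        # Note: the default (True vs False) is exactly what was debated
        # in the meeting above.
        [cfg.BoolOpt('use_jobboard', default=True,
                     help='Persist flow state to a jobboard so '
                          'interrupted jobs can be resumed.')],
        group='task_flow')


    def run_flow(flow_factory, store):
        if not CONF.task_flow.use_jobboard:
            # v1-style: run the flow in-process. No Redis needed, but a
            # crash mid-flow can leave resources stuck in PENDING_*.
            tf_engines.load(flow_factory(), store=store).run()
            return
        # v2-style: post the job to a Redis-backed jobboard; a separate
        # conductor process claims it and can resume it after a crash.
        jobboard_conf = {'board': 'redis', 'host': '127.0.0.1'}
        with job_backends.backend('octavia_jobboard',
                                  jobboard_conf) as board:
            board.post('octavia-flow', details={'store': store})
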
16:45:09 <cgoncalves> metrics, that's why you were anxious earlier :D
16:45:14 <johnsom> I think so. What is up with metrics? I know there is a patch that needs some reviews
16:45:17 <rm_work> heh
16:45:22 <aannuusshhkkaa> here is the list of metrics we are thinking of implementing:
16:45:22 <aannuusshhkkaa> Must Haves:
16:45:22 <aannuusshhkkaa> CPU Usage (Current %)
16:45:22 <aannuusshhkkaa> Load Averages
16:45:22 <aannuusshhkkaa> RAM Usage (some combo of: total / free / available / cached / etc)
16:45:23 <aannuusshhkkaa> Nice to Haves:
16:45:23 <aannuusshhkkaa> Disk usage (used/free? or used %?)
16:45:24 <aannuusshhkkaa> Random extra HAProxy fields (taking suggestions)
16:45:25 <aannuusshhkkaa> are we good on the ones we have selected? are we missing something? have we included something that isn't plausible?
16:45:26 <rm_work> yes, we have one patch up that changes the interfaces around slightly
16:45:53 <rm_work> ack, in the future you should paste multi-line stuff into something like http://paste.openstack.org/
16:46:18 <johnsom> Do we need load averages? Personally I would like to keep the data minimal, so just %'s
16:47:01 <rm_work> I feel like it's generally more useful than "point in time" CPU usage
16:47:03 <johnsom> Yeah, you can get booted off the server for too many multi-line pastes.
16:47:22 <aannuusshhkkaa> rm_work gotcha
16:48:16 <aannuusshhkkaa> johnsom, haha okay! thanks for the correction..
16:48:32 <johnsom> Ok, if you have a use for it.
16:48:48 <rm_work> Well, it's super easy for a single point-in-time CPU % to be totally off
16:48:53 <johnsom> aannuusshhkkaa The server can consider you a spam bot, basically.
16:49:09 <aannuusshhkkaa> yeap that makes sense.. will keep that in mind..
16:49:43 <johnsom> Yeah, but we will get samples every 10 seconds or so.
16:49:48 <rm_work> whereas load averages are very nice locally generated averages that we can keep collecting at a much slower interval and still have them be useful
16:50:09 <rm_work> yeah, if we were collecting every second, doing our own averages might make more sense, but...
16:50:40 <johnsom> Yeah, maybe. You would have to then get the number of cores to calculate a percent
16:51:12 <rm_work> hmm yes that is true
16:51:30 <rm_work> possibly do THAT locally? and return load average %s
16:51:41 <rm_work> rather than have to ship that info up and calculate later
16:52:05 <johnsom> That could work.
16:52:17 <rm_work> anyway, we will look into options for that -- any comments about the others? RAM numbers that are actually useful?
16:52:26 <rm_work> Linux "free memory" is basically a useless metric
16:52:29 <rm_work> AFAICT
16:52:47 <rm_work> but I don't know exactly which combination of RAM metrics is *actually* most useful
16:53:02 <johnsom> Yeah, really I'm looking for % of available memory
16:53:13 <johnsom> available/total
16:53:28 <johnsom> cache and free, not so useful to me
16:55:16 <aannuusshhkkaa> okay.. what about disk usage? used %s?
16:56:20 <rm_work> I would assume used % is possibly better? though I also wonder if, without context, that could be not so useful: if it says 50%, you then wonder "at what rate will that fill?"
16:56:46 <johnsom> Rate is why you have time series
16:57:08 <johnsom> IMO
16:57:25 <johnsom> timeseries/deltas
16:58:12 <johnsom> A simple scaling driver could have just thresholds and ignore rate. A fancy driver could build a rate and make decisions.
16:58:59 <johnsom> Then at some point someone can add AI and ML, then build a model of what works and doesn't. Then we retire
16:59:14 <aannuusshhkkaa> lol
16:59:29 <rm_work> so that means... percentage? or used/total? lol
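
A standard-library sketch of collecting the metrics settled on above -- load average normalized to a percentage of cores, available/total RAM, and disk used % -- with no claim that it matches the actual patch under review.

    import os
    import shutil


    def get_metrics(path='/'):
        # 1/5/15-minute load averages divided by core count, so the
        # amphora ships a ready-made percentage and the consumer does
        # not need to know how many cores it has.
        cores = os.cpu_count() or 1
        load_pct = [avg / cores * 100 for avg in os.getloadavg()]

        # MemAvailable/MemTotal from /proc/meminfo; plain "free" is
        # misleading on Linux because the page cache consumes it.
        meminfo = {}
        with open('/proc/meminfo') as f:
            for line in f:
                key, value = line.split(':', 1)
                meminfo[key] = int(value.split()[0])  # kB
        ram_available_pct = (meminfo['MemAvailable'] /
                             meminfo['MemTotal'] * 100)

        # Disk used %; rate-of-fill is left to the time-series consumer.
        usage = shutil.disk_usage(path)
        disk_used_pct = usage.used / usage.total * 100

        return {'load_pct_1m': load_pct[0],
                'load_pct_5m': load_pct[1],
                'load_pct_15m': load_pct[2],
                'ram_available_pct': ram_available_pct,
                'disk_used_pct': disk_used_pct}


    if __name__ == '__main__':
        print(get_metrics())
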
16:59:39 <johnsom> Oh, just about out of time for the meeting. We can continue after if you still have questions.
16:59:47 <johnsom> Or defer to next week.
16:59:52 <aannuusshhkkaa> yes we do..
17:00:04 <aannuusshhkkaa> next week would probably be a little too late..
17:00:10 <johnsom> #endmeeting