09:00:01 <ttx> #startmeeting large_scale_sig
09:00:02 <openstack> Meeting started Wed Dec 18 09:00:01 2019 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:06 <openstack> The meeting name has been set to 'large_scale_sig'
09:00:09 <ttx> #topic Rollcall
09:00:11 <oneswig> hi
09:00:24 <ttx> who is around?
09:00:26 <amorin> hey all
09:00:32 <ttx> oneswig, amorin o/
09:00:46 <oneswig> greetings \o
09:01:00 <masahito> o/
09:01:28 <ttx> While we wait for more, here is our agenda:
09:01:31 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-meeting
09:01:37 <ttx> Feel free to add other topics !
09:02:15 <ttx> jiaopengju, Dinesh_Bhor: around?
09:02:28 <mdelavergne> hi
09:02:30 <ttx> etp: too
09:02:48 <etp> o/
09:03:12 <ttx> ok let's get started
09:03:18 <ttx> #topic Last meeting actions
09:03:28 <ttx> We have minutes for our last meeting at:
09:03:31 <ttx> #link http://eavesdrop.openstack.org/meetings/large_scale_sig/2019/large_scale_sig.2019-11-27-09.00.html
09:03:39 <ttx> We had the following actions:
09:03:48 <ttx> - ttx to propose large scale SIG creation changes to openstack-sigs repository
09:03:57 <ttx> That's done, our SIG is now official at https://governance.openstack.org/sigs/ !
09:04:13 <ttx> We also have a basic information page on the wiki at https://wiki.openstack.org/wiki/Large_Scale_SIG
09:04:28 <ttx> - ttx to send meeting summary and create two etherpads to further refine the two initial goals
09:04:28 <amorin> yay!
09:04:29 <masahito> Thanks for the patch and organization :-)
09:04:39 <ttx> I did that too, and we'll dive into the etherpads for the rest of the meeting
09:05:05 <ttx> #topic Objective: Scaling within one cluster, and instrumentation of the bottlenecks
09:05:28 <ttx> with the following volunteers: masahito, oneswig, YusukeTatsumi, jiaopengju
09:05:40 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-cluster-scaling
09:05:52 <ttx> To kick off the discussion I suggested two subgoals on that one
09:05:57 <ttx> - Document "average" single-cluster scaling limits today, and what usually breaks first
09:06:02 <ttx> - Measurement of MQ behavior through oslo.metrics
09:06:16 <ttx> Last I looked, nobody jumped on the first one. Does it sound like a valuable goal ? Or should we focus on the second for now?
09:06:37 <ttx> My thought was that describing what currently happens when we scale a cluster up, and what tends to fail first and at which limit, gives us a good base.
09:07:14 <ttx> (I personally have no idea because I don't have that experience, only second-hand accounts)
09:07:19 <oneswig> I think it is a valuable goal.  The difficulty is finding someone who does this regularly enough.
09:07:24 <amorin> what we can do without much effort is collect the experiences from everybody in the group
09:07:25 <ttx> And with your collective experience, you're well positioned to describe that.
09:07:42 <ttx> yes... Does not have to be a formal documentation, we could set it up as a mailing-list thread where each would contribute their experience...
09:07:46 <ttx> Or an etherpad with a call for participation...
09:08:14 <amorin> yes, and maybe add a link to this etherpad or the mail on a wiki page
09:08:18 <amorin> so we can find it easily
09:08:21 <masahito> Is the first goal like a user story?
09:09:04 <ttx> masahito: yes... the goal is to describe what currently breaks first when we scale up. I bet each case is a bit different... but
09:09:20 <ttx> I also bet there are general issues
09:09:39 <masahito> If so, I could quickly write the problem we hit before. I thought it targeted some technical document, like actual configuration for scaling.
09:09:45 <ttx> Like we all know rabbit is (one of the) first to break
09:10:18 <ttx> No, I was thinking more like "tell us your story when you scaled up, what broke"
09:10:33 <ttx> (and at which numbers, if possible)
09:11:09 <ttx> do you prefer etherpad or an email thread as a way to collect that information?
09:11:19 <amorin> +1 for etherpad
09:11:27 <ttx> In both cases we should be ready to seed the discussion, to encourage others to talk
09:11:38 <amorin> we can do both actually
09:11:42 <ttx> and by we I mean you, since I don't have experience to contribute in that area
09:11:51 <ttx> amorin: not a bad idea
09:11:57 <amorin> an etherpad and a mail to make everybody aware
09:12:09 <ttx> post email, and collect the results in an etherpad
09:12:29 <masahito> +1
09:12:50 <amorin> +1
09:12:56 <etp> +1
09:13:23 <ttx> OK, so how about... you all prepare a short description of what usually happens to you when you scale a single cluster up to a point. We wait until after new year. I create an etherpad and send a call for stories early January
09:13:47 <ttx> and then you reply to that thread with your prepared stories, hopefully encouraging others to chime in
09:13:59 <ttx> we collect all the stories in a single etherpad
09:14:09 <ttx> then reduce that etherpad to common trends
09:14:27 <amorin> good plan
09:14:42 <etp> works for me
09:14:46 <jiaopengju1> +1, sorry for being late
09:14:47 <masahito> I'm okay
09:15:10 <ttx> #action all prepare a short description of what happens (what breaks first) when scaling up a single cluster
09:15:37 <ttx> #action ttx to prepare etherpad and send ML thread asking for user scaling stories, to be posted after the end-of-year holidays
09:15:53 <ttx> ok, we have a plan
09:15:56 <ttx> <--- happy
09:16:03 <ttx> - Measurement of MQ behavior through oslo.metrics...
09:16:14 <ttx> we have some early steps documented
09:16:26 <ttx> Who is interested in writing that oslo.metrics blueprint?
09:16:36 <ttx> (or collaborate on one)
09:16:47 <ttx> I think LINE is well advanced on that
09:17:12 <masahito> yes, I wrote it on the etherpad
09:17:25 <ttx> maybe you can post the first draft and we can help you review it?
09:17:49 <masahito> Yes.
09:18:19 <ttx> #action masahito to produce first draft for the oslo.metrics blueprint
09:18:46 <ttx> ok.. anything more to do immediately on that subgoal?
09:19:25 <masahito> I don't have much time to work on the draft right now. I will be able to push the bp at the beginning of next year.
09:19:32 <ttx> sure, no hurry :)
09:19:39 <ttx> Just documenting next steps
09:20:46 <ttx> oneswig added another subgoal: Instrumentation for "golden signals"
09:21:16 <ttx> I think that would be very valuable. But it feels more like a second-stage goal
09:21:25 <oneswig> I did... I think the terminology in this book might be useful to adopt.
09:21:37 <oneswig> ttx: it's more abstract than specific cases such as rabbit
09:21:38 <ttx> Once we identify what the current state is
09:21:54 <ttx> I think golden signals will emerge
09:22:21 <ttx> unless you think we can start documenting that already?
09:22:34 <amorin> what do you mean by golden signals?
09:22:40 <oneswig> Not sure.  I can start in early 2020.
09:22:43 <ttx> or maybe as a first step, learn about that concept
09:23:11 <oneswig> golden signals are tell-tales, strong indicators for symptoms of trouble.
09:23:32 <etp> do you mean like nova api response time increasing suddenly/steadily?
09:23:41 <ttx> I know! it's when my phone vibrates
09:24:01 <oneswig> etp: yes, exactly that kind of thing - that would be a latency measurement
09:24:12 <amorin> ok thanks
09:24:56 <etp> those do tend to reveal interesting things :)
09:25:20 <oneswig> apologies I must leave in a few minutes - diary conflict
09:25:48 <ttx> ok moving on quickly
09:26:08 <ttx> I propose as action that we read a bit more about golden signals between now and next meeting
09:26:37 <ttx> #action all learn more about golden signals concept as described in https://landing.google.com/sre/book.html
09:26:53 <ttx> OK, let's move to the other goal
09:27:04 <ttx> #topic Objective: Document large scale configuration and tips & tricks
09:27:10 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-documentation
09:27:16 <ttx> Again I did suggest two subgoals based on our discussion last week
09:27:22 <ttx> err... last meeting
09:27:29 <ttx> The first one is probably a group exercise... If you find doc or blogposts that can help, add them here. There may not be that many.
09:27:40 <ttx> Searching planet.openstack.org may help.
09:28:34 <ttx> #action all add links to relevant articles around large scale openstack (if you find any) to the etherpad
09:28:37 <oneswig> With a Scientific SIG hat on, it might be a useful group for gathering data from.  I can follow up on that.
09:28:48 <oneswig> have to go, sorry but do set an action if you agree.
09:28:55 <ttx> yes, scientific communities are generally more used to sharing
09:29:26 <ttx> #action oneswig to follow up with Scientific community to find such articles
09:30:02 <ttx> The second subgoal is around documenting configuration values
09:30:07 <ttx> amorin suggested several tactics there
09:30:15 <ttx> Patching existing docs, producing a new doc, or a mixture of both...
09:30:28 <amorin> yes
09:30:36 <ttx> What do others think?
09:30:58 <ttx> My gut feeling is that this should appear in regular doc, if it's good and current enough to support that. If not good/current, a new doc might be preferable
09:30:58 <amorin> I think the best is my last proposition
09:31:24 <amorin> a mix of patching the current doc and eventually having an external page with details
09:31:36 <ttx> yeah that would probably work best
09:31:49 <ttx> it lets people discover settings, without making the original doc too complex
09:32:23 <amorin> yup
09:32:32 <amorin> I have the feeling that it would help
09:32:38 <amorin> at least, help me :p
09:32:50 <etp> +1, external page would be useful linked to golden metrics
09:33:42 <ttx> OK, so what would be the next step for this?
09:34:18 <amorin> I don't know how we should proceed
09:34:29 <amorin> but maybe we should ask the PTL of each project
09:34:30 <ttx> #agree we should patch existing doc to point to a separate new doc
09:34:47 <amorin> what they think about our plan?
09:34:58 <amorin> maybe they will have ideas
09:35:41 <ttx> OK, so the simplest would be to post a ML thread with [largescale-sig] and [ptl]
09:35:57 <ttx> If that does not work, I can help with engaging more directly
09:36:11 <amorin> ok
09:36:11 <ttx> but having the thread as a base we can link to will be useful in all cases
09:36:29 <amorin> I can start this thread if you want
09:36:52 <ttx> that would be perfect. Again I think we should avoid the end-of-year, because that's when threads go ignored
09:37:07 <amorin> ok, I can start it in january
09:37:14 <ttx> A lot of people just declare ML bankruptcy after the holidays :)
09:37:58 <ttx> #action amorin to start a thread on documenting configuration defaults for large scale, introduce the "mixture of both" tactic
09:38:34 <ttx> Alright, we have next steps here too... anything else on that topic?
09:39:24 <ttx> Any other topic you'd like to discuss?
09:40:14 <ttx> #topic Any other business
09:40:35 <ttx> I noticed we have a new group member, mdelavergne
09:40:42 <ttx> maybe you can introduce yourself?
09:40:44 <mdelavergne> Yep :)
09:41:06 <ttx> What would you like to achieve in this group, and do you find our early goals in line with that
09:42:16 <mdelavergne> Hi everyone, I'm a PhD student at Inria, I work with the discovery initiative which tried to evaluate OpenStack at the edge (so large scale and WAN-wide), and I'm trying to help with pointers to what we did, and trying to learn from your experiences mostly !
09:42:38 <ttx> perfect! welcome
09:42:47 <mdelavergne> thanks :)
09:42:48 <amorin> welcome!
09:43:01 <etp> welcome!
09:43:07 <masahito> welcome :)
09:43:18 <ttx> Time to discuss next meeting date...
09:43:25 <ttx> I think it's safe to skip the next two weeks. Do you prefer January 8 or January 15 for next meeting?
09:43:57 <masahito> I'm not available January 8.
09:44:01 <ttx> I feel like we won't have made much progress by January 8... Maybe the 15th would be better? Just before Chinese new year
09:44:21 <ttx> As long as we start those threads before we should be good
09:44:23 <jiaopengju1> 15th is ok for me
09:44:31 <masahito> 15th works for me.
09:44:35 <etp> +1 for 15th
09:44:43 <mdelavergne> 15th is fine by me
09:44:45 <amorin> 15th is ok for me
09:45:02 <ttx> Alright looks like a winner. Hopefully belmiro will be able to join us on that date
09:45:19 <ttx> #agreed next meeting: January 15, 9utc #openstack-meeting
09:45:43 <ttx> I'll send the meeting summary and the list of actions.
09:46:03 <ttx> We have a bunch of threads to start, and the week of January 6 is probably the best for that
09:46:37 <ttx> Alright... anything else before we close?
09:47:29 <ttx> Thanks again to everyone for your participation in the group, and have a great end of year
09:47:38 <ttx> #endmeeting