09:00:01 #startmeeting large_scale_sig
09:00:02 Meeting started Wed Dec 18 09:00:01 2019 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:06 The meeting name has been set to 'large_scale_sig'
09:00:09 #topic Rollcall
09:00:11 hi
09:00:24 who is around?
09:00:26 hey all
09:00:32 oneswig, amorin o/
09:00:46 greetings \o
09:01:00 o/
09:01:28 While we wait for more, here is our agenda:
09:01:31 #link https://etherpad.openstack.org/p/large-scale-sig-meeting
09:01:37 Feel free to add other topics !
09:02:15 jiaopengju, Dinesh_Bhor: around?
09:02:28 hi
09:02:30 etp: too
09:02:48 o/
09:03:12 ok let's get started
09:03:18 #topic Last meeting actions
09:03:28 We have minutes for our last meeting at:
09:03:31 #link http://eavesdrop.openstack.org/meetings/large_scale_sig/2019/large_scale_sig.2019-11-27-09.00.html
09:03:39 We had the following actions:
09:03:48 - ttx to propose large scale SIG creation changes to openstack-sigs repository
09:03:57 That's done, our SIG is now official at https://governance.openstack.org/sigs/ !
09:04:13 We also have a basic information page on the wiki at https://wiki.openstack.org/wiki/Large_Scale_SIG
09:04:28 - ttx to send meeting summary and create two etherpads to further refine the two initial goals
09:04:28 yay!
09:04:29 Thanks for the patch and organization :-)
09:04:39 I did that too, and we'll dive in the etherpads for the rest of the meeting
09:05:05 #topic Objective: Scaling within one cluster, and instrumentation of the bottlenecks
09:05:28 with the following volunteers: masahito, oneswig, YusukeTatsumi, jiaopengju
09:05:40 #link https://etherpad.openstack.org/p/large-scale-sig-cluster-scaling
09:05:52 To kick off the discussion I suggested two subgoals on that one
09:05:57 - Document "average" single-cluster scaling limits today, and what usually breaks first
09:06:02 - Measurement of MQ behavior through oslo.metrics
09:06:16 Last I looked, nobody jumped on the first one. Does it sound like a valuable goal ? Or should we focus on the second for now?
09:06:37 My thought was that describing what currently happens when we scale a cluster up, and what tends to fail first and at which limit, gives us a good base.
09:07:14 (I personally have no idea because I don't have that experience, only second-hand accounts)
09:07:19 I think it is a valuable goal. The difficulty is finding someone who does this regularly enough.
09:07:24 what we can do without much effort is collect the experiences from everybody in the group
09:07:25 And with your collective experience, you're well positioned to describe that.
09:07:42 yes... Does not have to be a formal documentation, we could set it up as a mailing-list thread where each would contribute their experience...
09:07:46 Or an etherpad with a call for participation...
09:08:14 yes, and maybe add a link to this etherpad or the mail on a wiki page
09:08:18 so we can find it easily
09:08:21 Is the first goal like a user story?
09:09:04 masahito: yes... the goal is to describe what currently breaks first when we scale up. I bet each case is a bit different... but
09:09:20 I also bet there are general issues
09:09:39 If so, I could quickly write the problem we hit before.
I thought it targeted some technical document, like actual configuration for scaling.
09:09:45 Like we all know rabbit is (one of the) first to break
09:10:18 No, I was thinking more like "tell us your story when you scaled up, what broke"
09:10:33 (and at which numbers, if possible)
09:11:09 do you prefer etherpad or an email thread as a way to collect that information?
09:11:19 +1 for etherpad
09:11:27 In both cases we should be ready to seed the discussion, to encourage others to talk
09:11:38 we can do both actually
09:11:42 and by we I mean you, since I don't have experience to contribute in that area
09:11:51 amorin: not a bad idea
09:11:57 an etherpad and a mail to make everybody aware
09:12:09 post email, and collect the results in an etherpad
09:12:29 +1
09:12:50 +1
09:12:56 +1
09:13:23 OK, so how about... you all prepare a short description of what usually happens to you when you scale up a single cluster up to a point. We wait until after new year. I create an etherpad and send a call for stories early January
09:13:47 and then you reply to that thread with your prepared stories, hopefully encouraging others to chime in
09:13:59 we collect all the stories in a single etherpad
09:14:09 then reduce that etherpad to common trends
09:14:27 good plan
09:14:42 works for me
09:14:46 +1, sorry for being late
09:14:47 I'm okay
09:15:10 #action all prepare a short description of what happens (what breaks first) when scaling up a single cluster
09:15:37 #action ttx to prepare etherpad and send ML thread asking for user scaling stories, to be posted after the end-of-year holidays
09:15:53 ok, we have a plan
09:15:56 <--- happy
09:16:03 - Measurement of MQ behavior through oslo.metrics...
09:16:14 we have some early steps documented
09:16:26 Who is interested in writing that oslo.metrics blueprint?
09:16:36 (or collaborate on one)
09:16:47 I think LINE is well advanced on that
09:17:12 yes, I wrote it on the etherpad
09:17:25 maybe you can post the first draft and we can help you review it?
09:17:49 Yes.
09:18:19 #action masahito to produce first draft for the oslo.metrics blueprint
09:18:46 ok.. anything more to do immediately on that subgoal?
09:19:25 I don't have much time to work on writing the draft right now. I will be able to push the bp at the beginning of next year.
09:19:32 sure, no hurry :)
09:19:39 Just documenting next steps
09:20:46 oneswig added another subgoal: Instrumentation for "golden signals"
09:21:16 I think that would be very valuable. But it feels more like a second-stage goal
09:21:25 I did... I think the terminology in this book might be useful to adopt.
09:21:37 ttx: it's more abstract than specific cases such as rabbit
09:21:38 Once we identify what the current state is
09:21:54 I think golden signals will emerge
09:22:21 unless you think we can start documenting that already?
09:22:34 what do you mean by golden signals?
09:22:40 Not sure. I can start in early 2020.
09:22:43 or maybe as a first step, learn about that concept
09:23:11 golden signals are tell-tales, strong indicators for symptoms of trouble.
09:23:32 do you mean like nova api response time increasing suddenly/steadily?
09:23:41 I know!
it's when my phone vibrates
09:24:01 etp: yes, exactly that kind of thing - that would be a latency measurement
09:24:12 ok thanks
09:24:56 those do tend to reveal interesting things :)
09:25:20 apologies I must leave in a few minutes - diary conflict
09:25:48 ok moving on quickly
09:26:08 I propose as action that we read a bit more about golden signals between now and next meeting
09:26:37 #action all learn more about golden signals concept as described in https://landing.google.com/sre/book.html
09:26:53 OK, let's move to the other goal
09:27:04 #topic Objective: Document large scale configuration and tips & tricks
09:27:10 #link https://etherpad.openstack.org/p/large-scale-sig-documentation
09:27:16 Again I did suggest two subgoals based on our discussion last week
09:27:22 err... last meeting
09:27:29 The first one is probably a group exercise... If you find docs or blog posts that can help, add them here. There may not be that many.
09:27:40 Searching planet.openstack.org may help.
09:28:34 #action all add links to relevant articles around large scale openstack (if you find any) to the etherpad
09:28:37 With a Scientific SIG hat on, it might be a useful group for gathering data from. I can follow up on that.
09:28:48 have to go, sorry but do set an action if you agree.
09:28:55 yes, scientific communities are generally more used to sharing
09:29:26 #action oneswig to follow up with Scientific community to find such articles
09:30:02 The second subgoal is around documenting configuration values
09:30:07 amorin suggested several tactics there
09:30:15 Patching existing docs, producing a new doc, or a mixture of both...
09:30:28 yes
09:30:36 What do others think?
09:30:58 My gut feeling is that this should appear in regular doc, if it's good and current enough to support that.
If not good/current, a new doc might be preferable
09:30:58 I think the best is my last proposition
09:31:24 a mix of patching the current doc and eventually having external pages with details
09:31:36 yeah that would probably work best
09:31:49 it allows discovering settings, without making the original doc too complex
09:32:23 yup
09:32:32 I have the feeling that it would help
09:32:38 at least, help me :p
09:32:50 +1, external page would be useful linked to golden metrics
09:33:42 OK, so what would be the next step for this?
09:34:18 I don't know how we should proceed
09:34:29 but maybe we should ask the PTL of each project
09:34:30 #agree we should patch existing doc to point to a separate new doc
09:34:47 what they think about our plan?
09:34:58 maybe they will have ideas
09:35:41 OK, so the simplest would be to post a ML thread with [largescale-sig] and [ptl]
09:35:57 If that does not work, I can help with engaging more directly
09:36:11 ok
09:36:11 but having the thread as a base we can link to will be useful in all cases
09:36:29 I can start this thread if you want
09:36:52 that would be perfect. Again I think we should avoid the end-of-year, because that's when threads go ignored
09:37:07 ok, I can start it in january
09:37:14 A lot of people just declare ML bankruptcy after the holidays :)
09:37:58 #action amorin to start a thread on documenting configuration defaults for large scale, introduce the "mixture of both" tactic
09:38:34 Alright, we have next steps here too... anything else on that topic?
09:39:24 Any other topic you'd like to discuss?
09:40:14 #topic Any other business
09:40:35 I noticed we have a new group member, mdelavergne
09:40:42 maybe you can introduce yourself?
09:40:44 Yep :)
09:41:06 What would you like to achieve in this group, and do you find our early goals in line with that?
09:42:16 Hi everyone, I'm a PhD student at Inria, I work with the discovery initiative which tried to evaluate OpenStack at the edge (so large scale and WAN-wide), and I'm trying to help with pointers to what we did, and trying to learn from your experiences mostly !
09:42:38 perfect! welcome
09:42:47 thanks :)
09:42:48 welcome!
09:43:01 welcome!
09:43:07 welcome :)
09:43:18 Time to discuss next meeting date...
09:43:25 I think it's safe to skip the next two weeks. Do you prefer January 8 or January 15 for next meeting?
09:43:57 I'm not available January 8.
09:44:01 I feel like we won't have made much progress by January 8... Maybe the 15th would be better? Just before Chinese new year
09:44:21 As long as we start those threads before we should be good
09:44:23 15th is ok for me
09:44:31 15th works for me.
09:44:35 +1 for 15th
09:44:43 15th is fine by me
09:44:45 15th is ok for me
09:45:02 Alright looks like a winner. Hopefully belmiro will be able to join us on that date
09:45:19 #agreed next meeting: January 15, 9utc #openstack-meeting
09:45:43 I'll send the meeting summary and the list of actions.
09:46:03 We have a bunch of threads to start, and the week of January 6 is probably the best for that
09:46:37 Alright... anything else before we close?
09:47:29 Thanks again to everyone for your participation in the group, and have a great end of year
09:47:38 #endmeeting