09:00:34 <ttx> #startmeeting large_scale_sig
09:00:35 <openstack> Meeting started Wed Nov 27 09:00:34 2019 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:38 <openstack> The meeting name has been set to 'large_scale_sig'
09:00:44 <ttx> #topic Rollcall
09:00:49 <oneswig> hi
09:00:51 <YusukeTatsumi18> hi
09:00:54 <belmoreira> o/
09:01:00 <amorin> hello
09:01:00 <etp> hi
09:01:00 <jiaopengju> hi
09:01:02 <Dinesh_Bhor> Hello all
09:01:04 <ttx> Welcome to the first of what I hope will be a long series of meetings of this SIG!
09:01:06 <masahito> o/
09:01:20 <ttx> I'd like to start by doing a quick round of introductions, I'll start
09:01:28 <ttx> My name is Thierry Carrez, I manage the engineering team at the OpenStack Foundation
09:01:40 <oneswig> I'm Stig Telfer, CTO, StackHPC
09:01:40 <ttx> My goal here is to facilitate a discussion between OpenStack users
09:01:50 <ttx> and get them engaged to drive common improvements that will make everyone's life better
09:02:43 <etp> I'm Erkki Peura, architect for Nokia private cloud
09:02:53 <YusukeTatsumi18> Hi I'm Yusuke Tatsumi from Yahoo! JAPAN
09:03:10 <jiaopengju> I'm Pengju Jiao from China Mobile
09:03:11 <amorin> I am Arnaud Morin, working for OVH in the team in charge of deploying and operating the Public Cloud infrastructure (based on openstack of course)
09:03:18 <belmoreira> I'm Belmiro. I work at CERN deploying and maintaining our multi cell cloud
09:03:34 <masahito> I'm Masahito Muroi, working for LINE as software engineer.
09:03:54 <Dinesh_Bhor> I'm Dinesh Bhor, I am from LINE Corp. I work as an Infrastructure Engineer.
09:04:51 <ttx> Looks like we lost Yusuke
09:05:11 <YusukeTatsumi> I'm re-logined.
09:05:18 <ttx> Ah great!
09:05:29 <ttx> I think we heard from everyone
09:05:32 <ttx> #topic Agree on SIG name
09:05:43 <ttx> We are currently using "large scale SIG" to describe this group
09:05:57 <ttx> Before I formally file the paperwork to create the SIG I'd like to see if that name works
09:06:09 <ttx> On one hand it's a bit vague and with a bit of a wide potential scope
09:06:21 <ttx> On the other we already started to communicate with that name, so maybe it's simpler to continue
09:06:26 <ttx> What is your opinion on that?
09:06:38 <amorin> for me this name is correct
09:06:51 <oneswig> I'm here for the substance, whatever the name :-)
09:06:51 <jiaopengju> agree +1
09:06:55 <belmoreira> +1
09:06:56 <Dinesh_Bhor> I think let's keep it same as now.
09:06:57 <etp> +1
09:07:02 <ttx> Personally I'm ok with that name, as long as we set smaller-scope objectives and don't go in every direction
09:07:02 <masahito> +1
09:07:12 <amorin> agree
09:07:21 <ttx> Like if we are clear on what we want to do, the name doesn't matter much
09:07:22 <YusukeTatsumi> +1
09:07:46 <ttx> #agreed Keep "large scale SIG" as the group name
09:07:54 <ttx> #topic Volunteers for SIG chairing
09:08:06 <ttx> As I said earlier my goal here is to facilitate this discussion, and I'm happy to help chairing the group at the beginning
09:08:16 <ttx> As I said earlier my goal here is to facilitate this discussion, and I'm happy to help chairing the group at the beginning
09:08:18 <ttx> err
09:08:28 <ttx> But I'm not running a large scale deployment of openstack myself, so I'll gladly let anyone else interested take over
09:08:41 <ttx> For now we'd need at least one person that can take over organizing the meeting when I won't be available
09:08:50 <ttx> Is there any volunteer?
09:09:34 <oneswig> We are at the measuring phase of a large-scale deployment, we don't have any length of operational experience to draw from either.
09:09:59 <belmoreira> I can help
09:10:30 <amorin> we run large scale but I am not sure I can run the group for now, I'd prefer if someone else can take the lead
09:10:35 <ttx> belmoreira: thanks! I'll list you as co-chair. I expect to take the bulk of the chairing work, but it's always good to have two names for continuity
09:11:27 <ttx> #info Belmiro will co-chair with Thierry for now
09:11:38 <ttx> unless there are other volunteers :)
09:12:02 <ttx> We could have three chairs, especially if someone from the APAC timezones can help cover there
09:12:39 <ttx> and we don't have to decide today. Two is good for now
09:12:56 <jiaopengju> Maybe I can help. We run large scale openstack cluster in public cloud
09:13:34 <ttx> jiaopengju: OK, I'll list you as co-chair as well. I like the idea of having geographic distribution for those
09:14:29 <ttx> #info Pengju Jiao will co-chair with Belmiro and Thierry
09:14:39 <ttx> #topic Meetings
09:14:49 <ttx> Now we need to decide how we should make progress in this SIG
09:14:57 <ttx> Do we need synchronous meetings like this one?
09:15:00 <ttx> And if yes, how often should we have them? Is IRC fine?
09:15:10 <ttx> Should we have a permanent IRC channel ?
09:15:25 <ttx> (like #openstack-large-scale)
09:15:46 <belmoreira> I like the idea to have a meeting to sync.
09:15:57 <amorin> +1
09:15:58 <ttx> Personally I feel like we'll need regular meetings, at least at the start, to get it off the ground
09:16:09 <jiaopengju> +1
09:16:13 <Dinesh_Bhor> +1
09:16:16 <YusukeTatsumi> +1
09:16:37 <oneswig> makes sense to me, but how often - every 2 weeks?
09:16:44 <ttx> Should we make those weekly for now? Or every two weeks ?
09:16:46 <ttx> hah
09:16:57 <ttx> Weekly might just be too often
09:17:07 <YusukeTatsumi> prefer every 2 weeks
09:17:20 <jiaopengju> two weeks is ok for me :)
09:17:25 <amorin> we can start with every 2 weeks
09:17:30 <etp> +1
09:17:32 <masahito> Bi-weekly makes sense to me
09:17:46 <ttx> #agreed IRC meeting every 2 weeks
09:17:46 <belmoreira> every 2 weeks I think is a good compromise to start with
09:18:12 <oneswig> ttx: should it cover different time zones? I'm happy with this time but could go up to +12 hours from now too
09:18:34 <ttx> oneswig: good question. The group is currently only Europe and APAC
09:18:44 <ttx> which is why this time makes the most sense
09:18:58 <ttx> If we had people from the US interested, we should probably find a way to rotate
09:19:24 <ttx> but it's not the case yet... so maybe a problem for another time?
09:20:18 <oneswig> True. Meeting every 2 weeks on this time leaves the option of an interleaved meeting at a different time.
09:20:22 <ttx> (the trick being, there is just no convenient time for China/Japan + western Europe + US east + US west)
09:20:59 <ttx> If we keep that day and time every two weeks, does that work for everyone (for now) ?
09:21:12 <amorin> works for me
09:21:16 <belmoreira> good for me
09:21:17 <etp> works for me
09:21:21 <masahito> works for me
09:21:21 <jiaopengju> works for me
09:21:26 <ttx> Fun fact, I won't be able to run the meeting two weeks from now at this time, being at a conf
09:21:30 <YusukeTatsumi> good for me (from APAC/Japan)
09:22:10 <ttx> Do you think a permanent IRC channel would help?
09:22:21 <ttx> Or should we push as much comm as possible to the ML ?
09:22:22 <belmoreira> for IRC we already have the openstack-operators. I think we shouldn't create a different group ("the large scale operators") but expose everything that we discuss to all operators
09:22:54 <masahito> If not, what's the candidates for IRC channels? the openstack-operators?
09:23:09 <ttx> I feel like leaving communication traces to the mailing-list is a great way to be transparent and encourage others to join
09:23:54 <ttx> We can definitely use #openstack-operators for one-off discussions
09:24:02 <oneswig> agreed - the scientific-sig has a separate IRC channel but it is not used
09:24:47 <amorin> Agree with ML, for the trace and being able to catch back some topics
09:25:04 <masahito> Actually, I'm not available on IRC at night. ML is good to me as first contact points.
09:25:06 <ttx> But I'd rather not force everyone to monitor a IRC channel all the time...
09:25:16 <amorin> I have no strong opinion on IRC channel. openstack-operators is good for me
09:25:16 <ttx> masahito: yes
09:25:18 <masahito> ah, I'm living Japan.
09:26:06 <jiaopengju> irc channel and ML are all good for me
09:26:07 <masahito> For interactive communication, #openstack-operators sounds good to me.
09:26:24 <belmoreira> I agree with ML as the main communication channel. And we can use the openstack-operators for one-off discussions as ttx suggested
09:26:42 <ttx> OK so let's use the mailing-list as our main means of communication... with prefix [large-scale] or [largescale-sig]
09:27:07 <ttx> maybe the latter, so that it's clear it's about the SIG
09:28:34 <ttx> Also we'll likely use a lot of etherpads as we draft goals and create documentation
09:28:34 <ttx> that is all asynchronous and will work better across all of our timezones
09:28:38 <ttx> Does that work?
09:28:48 <YusukeTatsumi> +1
09:28:51 <jiaopengju> +1
09:28:53 <etp> +1
09:28:56 <masahito> +1
09:28:58 <amorin> +1
09:29:06 <belmoreira> +1
09:29:29 <Dinesh_Bhor> +1
09:29:54 <ttx> #agreed Use openstack-discuss with [largescale-sig] for SIG topics. Prefer etherpads and other asynchronous methods of communication. One-off synchronous discussions in #openstack-operators
09:30:54 <ttx> ok, is there any other logistics questions we need to solve before discussing what we'll actually do?
09:31:33 <ttx> #action ttx to propose large scale SIG creation changes to openstack-sigs repository
09:32:14 <ttx> I'll take that as a "no"
09:32:18 <ttx> #topic Discuss initial SIG objectives
09:32:36 <ttx> So first of all I think it is important to set reasonable objectives
09:32:59 <ttx> In my long experience of such groups in OpenStack history, we always start with a lot of energy
09:33:24 <ttx> but then if we set large goals and go in every direction, that initial energy dissipates fast
09:33:40 <ttx> especially when real world commitments start to disrupt progress
09:34:18 <ttx> It's a lot better to set a small goal and make steady progress toward it
09:34:37 <ttx> rather than set a large goal and abandon it because nobody has enough time
09:35:02 <ttx> But the group should definitely end up producing *something*
09:35:23 <ttx> otherwise without a focal point the energy also dissipates fast :)
09:35:52 <ttx> We had several ideas raised in the discussion we had in Shanghai
09:36:00 <ttx> Notes at:
09:36:04 <ttx> #link https://etherpad.openstack.org/p/PVG-large-scale-SIG
09:36:43 <ttx> amorin mentioned wanting to create or modify existing doc for sensible larger-scale config defaults
09:37:20 <ttx> masahito has work within Oslo to instrument bottlenecks
09:37:21 <belmoreira> the ML thread also adds some high level information
09:37:52 <ttx> does anyone want to propose a topic for the group to initially focus on?
09:38:37 <oneswig> instrumentation is my primary focus at present.
09:38:53 <amorin> what do you mean by instrumentation?
09:39:04 <YusukeTatsumi> what is "large scale" definition? I think about 1k compute-node on one cluster.
09:39:18 <belmoreira> How about to gather the existing information how operators are managing large deployments. During the summits we have a lot of presentations that discuss several aspects: cells, rabbit, ...
09:39:38 <oneswig> amorin: I'm thinking of how do I detect the bottlenecks as the system grows
09:39:50 <amorin> ok
09:40:00 <ttx> YusukeTatsumi: yes one issue was the difference between scale within one cluster (which was my original focus) and more generally large size deployments (lots of clusters)
09:40:33 <ttx> Personally I think if we focus on scaling within one cluster, it's already a large enough scope
09:40:44 <belmoreira> signal relevant presentations in a document and maybe create a summary would help to avoid rethink a solution that maybe was solved by someone but didn't get a lot of exposure
09:40:45 <ttx> and would raise very interesting questions
09:40:56 <amorin> belmoreira: agree with that, and I think we can share good practices on config params within this topic
09:41:33 <amorin> ^ was about how operators are managing large scale
09:41:58 <ttx> Maybe that's two different axis we can work on. (1) Scaling within one cluster, and instrumentation of the bottlenecks there
09:42:26 <ttx> (2) Document large scale configuration and tips & tricks
09:42:50 <amorin> +1
09:43:14 <masahito> That makes sense.
09:43:56 <jiaopengju> I think we should give all the users confidence in large clusters, so at first we should tell them how large about the clusters that running in product, and then show them how to do this
09:43:57 <YusukeTatsumi> +1
09:43:58 <masahito> My work ttx mentioned above is related to (1) with the direction.
09:44:32 <ttx> A reasonable goal for (1) would be to identify the most obvious bottlenecks and start implementing instrumentation to actually be able to measure it
09:44:53 <ttx> Reasonable goal for (2) is to produce some documentation
09:45:27 <amorin> yup
09:45:40 <ttx> I count masahito, oneswig interested in (1), amorin, belmoreira in (2)
09:45:59 <YusukeTatsumi> I can join to (1)
09:46:05 <belmoreira> honestly I think we shouldn't limit ourselves explicitly to one cell. Different workloads, use cases may require small but multiple cells. I would prefer to consider the architecture choice/bottlenecks in terms of use case
09:46:21 <jiaopengju> join (1)
09:46:36 <oneswig> belmoreira: it's more one deployment than one cell isn't it?
09:47:17 <ttx> belmoreira: I agree we should not limit the scope of the SIG to one cell. But raising scaling limits within a single cell/cluster is useful for everyone imho
09:47:31 <ttx> so it can be one of the SIG's lines of work
09:48:15 <belmoreira> ttx of course I agree with that
09:48:45 <belmoreira> oneswig yes, deployment considering the use-case
09:48:45 <ttx> There will always be a point where you have to do multiple cells and clusters... and I agree we should also discuss that within this SIG
09:49:06 <etp> there is also growth point of view, yep
09:49:50 <ttx> Personally I can help both subgroups with their logistics and interactions with openstack project teams
09:50:06 <ttx> Like to set up a repo and jobs to publish docs to a website etc
09:50:27 <ttx> or grease the wheels with Oslo reviews etc
09:50:59 <belmoreira> ttx that's great
09:51:38 <masahito> Thanks.
09:51:43 <ttx> OK, so in terms of immediate actions, and to make progress between now and next meeting... maybe we can start two threads on the ML, one of each subject, to further refine plans
09:52:25 <ttx> Goal generally being to have a more detailed plan to discuss at next meeting for both areas
09:53:30 <ttx> Or should we brainstorm on etherpads first before dropping it on the ML?
09:54:02 <oneswig> I recall there was an interesting discussion in Shanghai about prometheus endpoints being exposed by OpenStack services. I haven't seen any follow-up go by on that but it would be one interesting place to start.
09:55:04 <ttx> One issue of discussing very early steps on the ML (compared to doing it on an etherpad) is that you'll get people outside the SIG starting to shoot down crazy ideas
09:55:05 <oneswig> I think it was in the context of a billing forum
09:55:13 <ttx> and therefore that limits the discussion
09:55:58 <ttx> What's your preference to further refine those two topics?
09:55:58 <belmoreira> ttx yes :) the etherpad may be better to start the discussion
09:56:18 <amorin> I trust your experience, etherpad is good
09:56:30 <ttx> ok, so I'll post a summary of this meeting, and create two etherpads to refine those two topics
09:56:41 <amorin> ok
09:57:00 <ttx> #topic Next meeting
09:57:19 <ttx> #action ttx to send meeting summary and create two etherpads to further refine the two initial goals
09:57:32 <ttx> So as I said earlier, I won't be around at that time in two weeks
09:58:08 <ttx> also some of us will have the end-of-year holidays after that
09:58:22 <ttx> So I'm wondering if we should not set the next meeting to Dec 18
09:58:23 <amorin> december 11 I am not available neither
09:58:38 <ttx> and then January 8
09:58:45 <oneswig> either date works for me
09:59:04 <ttx> then we can go back to every 2 week
09:59:12 <oneswig> +1
09:59:13 <amorin> dec 18 works for me
09:59:15 <etp> both work for me
09:59:22 <jiaopengju> dec 18 is fine
09:59:34 <oneswig> have to go - thanks ttx & all - see you next time
09:59:41 <masahito> 18 dec works for me
09:59:48 <ttx> Alright! Thanks everyone for attending
09:59:48 <YusukeTatsumi> either days works
09:59:48 <belmoreira> I will not be available in the 18
10:00:49 <belmoreira> i would propose to work in the scope and then we meet next year
10:01:02 <ttx> belmoreira: will you be available to sync with amorin ahead of the meeting on Dec 18?
10:01:08 <ttx> It's ok to miss a meeting
10:01:46 <belmoreira> I will be off all week
10:01:56 <amorin> arf
10:02:14 <ttx> OK, let's continue that discussion on the ML, and free up people
10:02:25 <ttx> #info Next meeting date to be confirmed
10:02:29 <ttx> #endmeeting