#openstack-meeting log

09:00:34 <ttx> #startmeeting large_scale_sig
09:00:35 <openstack> Meeting started Wed Nov 27 09:00:34 2019 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:38 <openstack> The meeting name has been set to 'large_scale_sig'
09:00:44 <ttx> #topic Rollcall
09:00:49 <oneswig> hi
09:00:51 <YusukeTatsumi18> hi
09:00:54 <belmoreira> o/
09:01:00 <amorin> hello
09:01:00 <etp> hi
09:01:00 <jiaopengju> hi
09:01:02 <Dinesh_Bhor> Hello all
09:01:04 <ttx> Welcome to the first of what I hope will be a long series of meetings of this SIG!
09:01:06 <masahito> o/
09:01:20 <ttx> I'd like to start by doing a quick round of introductions, I'll start
09:01:28 <ttx> My name is Thierry Carrez, I manage the engineering team at the OpenStack Foundation
09:01:40 <oneswig> I'm Stig Telfer, CTO, StackHPC
09:01:40 <ttx> My goal here is to facilitate a discussion between OpenStack users
09:01:50 <ttx> and get them engaged to drive common improvements that will make everyone's life better
09:02:43 <etp> I'm Erkki Peura, architect for Nokia private cloud
09:02:53 <YusukeTatsumi18> Hi I'm Yusuke Tatsumi from Yahoo! JAPAN
09:03:10 <jiaopengju> I'm Pengju Jiao from China Mobile
09:03:11 <amorin> I am Arnaud Morin, working for OVH in the team in charge of deploying and operating the Public Cloud infrastructure (based on openstack of course)
09:03:18 <belmoreira> I'm Belmiro. I work at CERN deploying and maintaining our multi cell cloud
09:03:34 <masahito> I'm Masahito Muroi, working for LINE as software engineer.
09:03:54 <Dinesh_Bhor> I'm Dinesh Bhor, I am from LINE Corp. I work as an Infrastructure Engineer.
09:04:51 <ttx> Looks like we lost Yusuke
09:05:11 <YusukeTatsumi> I've logged back in.
09:05:18 <ttx> Ah great!
09:05:29 <ttx> I think we heard from everyone
09:05:32 <ttx> #topic Agree on SIG name
09:05:43 <ttx> We are currently using "large scale SIG" to describe this group
09:05:57 <ttx> Before I formally file the paperwork to create the SIG I'd like to see if that name works
09:06:09 <ttx> On one hand it's a bit vague and with a bit of a wide potential scope
09:06:21 <ttx> On the other we already started to communicate with that name, so maybe it's simpler to continue
09:06:26 <ttx> What is your opinion on that?
09:06:38 <amorin> for me this name is correct
09:06:51 <oneswig> I'm here for the substance, whatever the name :-)
09:06:51 <jiaopengju> agree +1
09:06:55 <belmoreira> +1
09:06:56 <Dinesh_Bhor> I think let's keep it the same as now.
09:06:57 <etp> +1
09:07:02 <ttx> Personally I'm ok with that name, as long as we set smaller-scope objectives and don't go in every direction
09:07:02 <masahito> +1
09:07:12 <amorin> agree
09:07:21 <ttx> Like if we are clear on what we want to do, the name doesn't matter much
09:07:22 <YusukeTatsumi> +1
09:07:46 <ttx> #agreed Keep "large scale SIG" as the group name
09:07:54 <ttx> #topic Volunteers for SIG chairing
09:08:06 <ttx> As I said earlier my goal here is to facilitate this discussion, and I'm happy to help chairing the group at the beginning
09:08:16 <ttx> As I said earlier my goal here is to facilitate this discussion, and I'm happy to help chairing the group at the beginning
09:08:18 <ttx> err
09:08:28 <ttx> But I'm not running a large scale deployment of openstack myself, so I'll gladly let anyone else interested take over
09:08:41 <ttx> For now we'd need at least one person that can take over organizing the meeting when I won't be available
09:08:50 <ttx> Is there any volunteer?
09:09:34 <oneswig> We are at the measuring phase of a large-scale deployment, we don't have any length of operational experience to draw from either.
09:09:59 <belmoreira> I can help
09:10:30 <amorin> we run large scale but I am not sure I can run the group for now, I'd prefer if someone else can take the lead
09:10:35 <ttx> belmoreira: thanks! I'll list you as co-chair. I expect to take the bulk of the chairing work, but it's always good to have two names for continuity
09:11:27 <ttx> #info Belmiro will co-chair with Thierry for now
09:11:38 <ttx> unless there are other volunteers :)
09:12:02 <ttx> We could have three chairs, especially if someone from the APAC timezones can help cover there
09:12:39 <ttx> and we don't have to decide today. Two is good for now
09:12:56 <jiaopengju> Maybe I can help. We run a large scale openstack cluster in public cloud
09:13:34 <ttx> jiaopengju: OK, I'll list you as co-chair as well. I like the idea of having geographic distribution for those
09:14:29 <ttx> #info Pengju Jiao will co-chair with Belmiro and Thierry
09:14:39 <ttx> #topic Meetings
09:14:49 <ttx> Now we need to decide how we should make progress in this SIG
09:14:57 <ttx> Do we need synchronous meetings like this one?
09:15:00 <ttx> And if yes, how often should we have them? Is IRC fine?
09:15:10 <ttx> Should we have a permanent IRC channel ?
09:15:25 <ttx> (like #openstack-large-scale)
09:15:46 <belmoreira> I like the idea to have a meeting to sync.
09:15:57 <amorin> +1
09:15:58 <ttx> Personally I feel like we'll need regular meetings, at least at the start, to get it off the ground
09:16:09 <jiaopengju> +1
09:16:13 <Dinesh_Bhor> +1
09:16:16 <YusukeTatsumi> +1
09:16:37 <oneswig> makes sense to me, but how often - every 2 weeks?
09:16:44 <ttx> Should we make those weekly for now? Or every two weeks ?
09:16:46 <ttx> hah
09:16:57 <ttx> Weekly might just be too often
09:17:07 <YusukeTatsumi> prefer every 2 weeks
09:17:20 <jiaopengju> two weeks is ok for me :)
09:17:25 <amorin> we can start with every 2 weeks
09:17:30 <etp> +1
09:17:32 <masahito> Bi-weekly makes sense to me
09:17:46 <ttx> #agreed IRC meeting every 2 weeks
09:17:46 <belmoreira> every 2 weeks I think is a good compromise to start with
09:18:12 <oneswig> ttx: should it cover different time zones?  I'm happy with this time but could go up to +12 hours from now too
09:18:34 <ttx> oneswig: good question. The group is currently only Europe and APAC
09:18:44 <ttx> which is why this time makes the most sense
09:18:58 <ttx> If we had people from the US interested, we should probably find a way to rotate
09:19:24 <ttx> but it's not the case yet... so maybe a problem for another time?
09:20:18 <oneswig> True.  Meeting every 2 weeks on this time leaves the option of an interleaved meeting at a different time.
09:20:22 <ttx> (the trick being, there is just no convenient time for China/Japan + western Europe + US east + US west)
09:20:59 <ttx> If we keep that day and time every two weeks, does that work for everyone (for now) ?
09:21:12 <amorin> works for me
09:21:16 <belmoreira> good for me
09:21:17 <etp> works for me
09:21:21 <masahito> works for me
09:21:21 <jiaopengju> works for me
09:21:26 <ttx> Fun fact, I won't be able to run the meeting two weeks from now at this time, being at a conf
09:21:30 <YusukeTatsumi> good for me (from APAC/Japan)
09:22:10 <ttx> Do you think a permanent IRC channel would help?
09:22:21 <ttx> Or should we push as much comm as possible to the ML ?
09:22:22 <belmoreira> for IRC we already have the openstack-operators. I think we shouldn't create a different group ("the large scale operators") but expose everything that we discuss to all operators
09:22:54 <masahito> If not, what are the candidates for IRC channels? openstack-operators?
09:23:09 <ttx> I feel like leaving communication traces to the mailing-list is a great way to be transparent and encourage others to join
09:23:54 <ttx> We can definitely use #openstack-operators for one-off discussions
09:24:02 <oneswig> agreed - the scientific-sig has a separate IRC channel but it is not used
09:24:47 <amorin> Agree with ML, for the trace and for being able to catch up on some topics
09:25:04 <masahito> Actually, I'm not available on IRC at night. ML is good for me as a first point of contact.
09:25:06 <ttx> But I'd rather not force everyone to monitor an IRC channel all the time...
09:25:16 <amorin> I have no strong opinion on IRC channel. openstack-operators is good for me
09:25:16 <ttx> masahito: yes
09:25:18 <masahito> ah, I live in Japan.
09:26:06 <jiaopengju> irc channel and ML are all good for me
09:26:07 <masahito> For interactive communication, #openstack-operators sounds good to me.
09:26:24 <belmoreira> I agree with ML as the main communication channel. And we can use the openstack-operators for one-off discussions as ttx suggested
09:26:42 <ttx> OK so let's use the mailing-list as our main means of communication... with prefix [large-scale] or [largescale-sig]
09:27:07 <ttx> maybe the latter, so that it's clear it's about the SIG
09:28:34 <ttx> Also we'll likely use a lot of etherpads as we draft goals and create documentation
09:28:34 <ttx> that is all asynchronous and will work better across all of our timezones
09:28:38 <ttx> Does that work?
09:28:48 <YusukeTatsumi> +1
09:28:51 <jiaopengju> +1
09:28:53 <etp> +1
09:28:56 <masahito> +1
09:28:58 <amorin> +1
09:29:06 <belmoreira> +1
09:29:29 <Dinesh_Bhor> +1
09:29:54 <ttx> #agreed Use openstack-discuss with [largescale-sig] for SIG topics. Prefer etherpads and other asynchronous methods of communication. One-off synchronous discussions in #openstack-operators
09:30:54 <ttx> ok, are there any other logistics questions we need to solve before discussing what we'll actually do?
09:31:33 <ttx> #action ttx to propose large scale SIG creation changes to openstack-sigs repository
09:32:14 <ttx> I'll take that as a "no"
09:32:18 <ttx> #topic Discuss initial SIG objectives
09:32:36 <ttx> So first of all I think it is important to set reasonable objectives
09:32:59 <ttx> In my long experience of such groups in OpenStack history, we always start with a lot of energy
09:33:24 <ttx> but then if we set large goals and go in every direction, that initial energy dissipates fast
09:33:40 <ttx> especially when real world commitments start to disrupt progress
09:34:18 <ttx> It's a lot better to set a small goal and make steady progress toward it
09:34:37 <ttx> rather than set a large goal and abandon it because nobody has enough time
09:35:02 <ttx> But the group should definitely end up producing *something*
09:35:23 <ttx> otherwise without a focal point the energy also dissipates fast :)
09:35:52 <ttx> We had several ideas raised in the discussion we had in Shanghai
09:36:00 <ttx> Notes at:
09:36:04 <ttx> #link https://etherpad.openstack.org/p/PVG-large-scale-SIG
09:36:43 <ttx> amorin mentioned wanting to create or modify existing doc for sensible larger-scale config defaults
09:37:20 <ttx> masahito has work within Oslo to instrument bottlenecks
09:37:21 <belmoreira> the ML thread also adds some high level information
09:37:52 <ttx> does anyone want to propose a topic for the group to initially focus on?
09:38:37 <oneswig> instrumentation is my primary focus at present.
09:38:53 <amorin> what do you mean by instrumentation?
09:39:04 <YusukeTatsumi> what is "large scale" definition? I think about 1k compute-node on one cluster.
09:39:18 <belmoreira> How about gathering the existing information on how operators are managing large deployments? During the summits we have a lot of presentations that discuss several aspects: cells, rabbit, ...
09:39:38 <oneswig> amorin: I'm thinking of how do I detect the bottlenecks as the system grows
09:39:50 <amorin> ok
09:40:00 <ttx> YusukeTatsumi: yes one issue was the difference between scale within one cluster (which was my original focus) and more generally large size deployments (lots of clusters)
09:40:33 <ttx> Personally I think if we focus on scaling within one cluster, it's already a large enough scope
09:40:44 <belmoreira> signaling relevant presentations in a document and maybe creating a summary would help avoid rethinking a solution that was maybe already found by someone but didn't get a lot of exposure
09:40:45 <ttx> and would raise very interesting questions
09:40:56 <amorin> belmoreira: agree with that, and I think we can share good practices on config params within this topic
09:41:33 <amorin> ^ was about how operators are managing large scale
09:41:58 <ttx> Maybe that's two different axes we can work on. (1) Scaling within one cluster, and instrumentation of the bottlenecks there
09:42:26 <ttx> (2) Document large scale configuration and tips & tricks
09:42:50 <amorin> +1
09:43:14 <masahito> That makes sense.
09:43:56 <jiaopengju> I think we should give all the users confidence in large clusters, so at first we should tell them how large the clusters running in production are, and then show them how to do this
09:43:57 <YusukeTatsumi> +1
09:43:58 <masahito> The work of mine that ttx mentioned above goes in the direction of (1).
09:44:32 <ttx> A reasonable goal for (1) would be to identify the most obvious bottlenecks and start implementing instrumentation to actually be able to measure it
09:44:53 <ttx> Reasonable goal for (2) is to produce some documentation
09:45:27 <amorin> yup
09:45:40 <ttx> I count masahito, oneswig interested in (1), amorin, belmoreira in (2)
09:45:59 <YusukeTatsumi> I can join (1)
09:46:05 <belmoreira> honestly I think we shouldn't limit ourselves explicitly to one cell. Different workloads, use cases may require small but multiple cells. I would prefer to consider the architecture choice/bottlenecks in terms of use case
09:46:21 <jiaopengju> join (1)
09:46:36 <oneswig> belmoreira: it's more one deployment than one cell isn't it?
09:47:17 <ttx> belmoreira: I agree we should not limit the scope of the SIG to one cell. But raising scaling limits within a single cell/cluster is useful for everyone imho
09:47:31 <ttx> so it can be one of the SIG's lines of work
09:48:15 <belmoreira> ttx of course I agree with that
09:48:45 <belmoreira> oneswig yes, deployment considering the use-case
09:48:45 <ttx> There will always be a point where you have to do multiple cells and clusters... and I agree we should also discuss that within this SIG
09:49:06 <etp> there is also the growth point of view, yep
09:49:50 <ttx> Personally I can help both subgroups with their logistics and interactions with openstack project teams
09:50:06 <ttx> Like to set up a repo and jobs to publish docs to a website etc
09:50:27 <ttx> or grease the wheels with Oslo reviews etc
09:50:59 <belmoreira> ttx that's great
09:51:38 <masahito> Thanks.
09:51:43 <ttx> OK, so in terms of immediate actions, and to make progress between now and next meeting... maybe we can start two threads on the ML, one for each subject, to further refine plans
09:52:25 <ttx> Goal generally being to have a more detailed plan to discuss at next meeting for both areas
09:53:30 <ttx> Or should we brainstorm on etherpads first before dropping it on the ML?
09:54:02 <oneswig> I recall there was an interesting discussion in Shanghai about prometheus endpoints being exposed by OpenStack services.  I haven't seen any follow-up go by on that but it would be one interesting place to start.
09:55:04 <ttx> One issue of discussing very early steps on the ML (compared to doing it on an etherpad) is that you'll get people outside the SIG starting to shoot down crazy ideas
09:55:05 <oneswig> I think it was in the context of a billing forum
09:55:13 <ttx> and therefore that limits the discussion
09:55:58 <ttx> What's your preference to further refine those two topics?
09:55:58 <belmoreira> ttx yes :) the etherpad may be better to start the discussion
09:56:18 <amorin> I trust your experience, etherpad is good
09:56:30 <ttx> ok, so I'll post a summary of this meeting, and create two etherpads to refine those two topics
09:56:41 <amorin> ok
09:57:00 <ttx> #topic Next meeting
09:57:19 <ttx> #action ttx to send meeting summary and create two etherpads to further refine the two initial goals
09:57:32 <ttx> So as I said earlier, I won't be around at that time in two weeks
09:58:08 <ttx> also some of us will have the end-of-year holidays after that
09:58:22 <ttx> So I'm wondering if we should not set the next meeting to Dec 18
09:58:23 <amorin> december 11 I am not available either
09:58:38 <ttx> and then January 8
09:58:45 <oneswig> either date works for me
09:59:04 <ttx> then we can go back to every 2 weeks
09:59:12 <oneswig> +1
09:59:13 <amorin> dec 18 works for me
09:59:15 <etp> both work for me
09:59:22 <jiaopengju> dec 18 is fine
09:59:34 <oneswig> have to go - thanks ttx & all - see you next time
09:59:41 <masahito> 18 dec works for me
09:59:48 <ttx> Alright! Thanks everyone for attending
09:59:48 <YusukeTatsumi> either day works
09:59:48 <belmoreira> I will not be available on the 18th
10:00:49 <belmoreira> I would propose to work on the scope and then meet next year
10:01:02 <ttx> belmoreira: will you be available to sync with amorin ahead of the meeting on Dec 18?
10:01:08 <ttx> It's ok to miss a meeting
10:01:46 <belmoreira> I will be off all week
10:01:56 <amorin> arf
10:02:14 <ttx> OK, let's continue that discussion on the ML, and free up people
10:02:25 <ttx> #info Next meeting date to be confirmed
10:02:29 <ttx> #endmeeting