Wednesday, 2020-12-16

00:04 *** tosky has quit IRC
00:58 *** macz_ has quit IRC
01:30 *** mlavalle has quit IRC
01:30 *** _mlavalle_1 has joined #openstack-meeting-3
02:12 *** artom has quit IRC
02:26 *** hemanth_n has joined #openstack-meeting-3
02:35 *** benj_- has joined #openstack-meeting-3
02:35 *** benj_ has quit IRC
02:35 *** benj_- is now known as benj_
02:55 *** macz_ has joined #openstack-meeting-3
03:00 *** macz_ has quit IRC
03:46 *** macz_ has joined #openstack-meeting-3
03:50 *** macz_ has quit IRC
05:59 *** ricolin has joined #openstack-meeting-3
06:48 *** yamamoto has quit IRC
07:25 *** lkoranda has joined #openstack-meeting-3
07:29 *** yamamoto has joined #openstack-meeting-3
07:35 *** lkoranda has quit IRC
07:35 *** eolivare has joined #openstack-meeting-3
07:39 *** yamamoto has quit IRC
08:00 *** slaweq has joined #openstack-meeting-3
08:33 *** tosky has joined #openstack-meeting-3
08:54 *** e0ne has joined #openstack-meeting-3
09:24 *** aarents has quit IRC
09:47 *** tosky_ has joined #openstack-meeting-3
09:49 *** tosky is now known as Guest24372
09:49 *** tosky_ is now known as tosky
09:50 *** Guest24372 has quit IRC
09:57 *** lpetrut has joined #openstack-meeting-3
10:12 *** yamamoto has joined #openstack-meeting-3
10:18 *** baojg has quit IRC
10:20 *** baojg has joined #openstack-meeting-3
11:13 *** artom has joined #openstack-meeting-3
11:18 *** macz_ has joined #openstack-meeting-3
11:23 *** macz_ has quit IRC
11:32 *** yamamoto has quit IRC
11:53 *** raildo has joined #openstack-meeting-3
11:55 *** yamamoto has joined #openstack-meeting-3
12:00 *** yamamoto has quit IRC
12:03 *** eolivare_ has joined #openstack-meeting-3
12:05 *** eolivare has quit IRC
12:09 *** baojg has quit IRC
12:09 *** baojg has joined #openstack-meeting-3
12:20 *** yamamoto has joined #openstack-meeting-3
12:25 *** eolivare_ has quit IRC
12:42 *** baojg has quit IRC
12:43 *** eolivare_ has joined #openstack-meeting-3
12:44 *** baojg has joined #openstack-meeting-3
13:03 *** hemanth_n has quit IRC
13:59 *** liuyulong has joined #openstack-meeting-3
14:51 *** mdelavergne has joined #openstack-meeting-3
15:00 <ttx> #startmeeting large_scale_sig
15:00 <openstack> Meeting started Wed Dec 16 15:00:06 2020 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00 *** genekuo has joined #openstack-meeting-3
15:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00 *** openstack changes topic to " (Meeting topic: large_scale_sig)"
15:00 <openstack> The meeting name has been set to 'large_scale_sig'
15:00 <ttx> #topic Rollcall
15:00 *** openstack changes topic to "Rollcall (Meeting topic: large_scale_sig)"
15:00 <ttx> Who is here for the Large Scale SIG meeting?
15:00 <mdelavergne> Hi!
15:00 <genekuo> o/
15:00 <ttx> mdelavergne: hi!
15:00 <jpward> o/
15:00 <liuyulong> Hi
15:01 <ttx> pinging amorin
15:01 <ttx> I don't see belmiro in channel
15:01 *** belmoreira has joined #openstack-meeting-3
15:01 <ttx> pinging imtiazc too
15:02 <imtiazc> I'm here
15:02 <belmoreira> o/
15:02 <ttx> if we say belmoreira 3 times he appears
15:02 <ttx> Our agenda for today is at:
15:02 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-meeting
15:02 <ttx> #topic Review previous meetings action items
15:02 *** openstack changes topic to "Review previous meetings action items (Meeting topic: large_scale_sig)"
15:03 <ttx> "ttx to add 5th stage around upgrade and maintain scaled out systems in operation"
15:03 <ttx> that's done at:
15:03 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain
15:03 <ttx> we can have a look when we review those pages later
15:03 <ttx> "ttx to make sure oslo.metrics 0.1 is released"
15:03 <ttx> That was done through https://review.opendev.org/c/openstack/releases/+/764631 and now oslo.metrics is available at:
15:03 <ttx> #link https://pypi.org/project/oslo.metrics/
15:03 <ttx> It was also added to OpenStack global requirements by genekuo:
15:04 <ttx> #link https://review.opendev.org/c/openstack/requirements/+/766662
15:04 <ttx> So it's now ready to consume and will be included in OpenStack Wallaby.
15:04 <genekuo> ttx, can you ping me when CI is fixed?
15:04 <ttx> It's now up to us to better explain how to enable and use it
15:04 <ttx> I'll ask around on what is still blocked, yes
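
For context on "how to enable and use it": oslo.metrics works by having oslo.messaging emit metrics data that a separate oslo-metrics process collects and exposes for Prometheus to scrape. A minimal sketch, assuming the [oslo_messaging_metrics] section and metrics_enabled option described in the oslo.metrics project documentation (verify the option names against the release you deploy):

    # nova.conf (or any service whose RPC traffic you want to measure)
    # Assumed option name from the oslo.metrics docs -- verify per release.
    [oslo_messaging_metrics]
    metrics_enabled = True

The oslo-metrics process then runs alongside the service and exposes the collected data on an HTTP endpoint for a Prometheus scrape job.
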
ttx"all to help in filling out https://etherpad.opendev.org/p/large-scale-sig-scaling-videos"15:04
ttxThanks everyone for the help there!15:05
imtiazcHere15:05
ttxAs a reminder we did look up those videos for two reasons:15:05
ttx- we can link to them on wiki pages as a good resource to watch (if relevant to any stage)15:05
ttx- we could reach out to specific presenters so that they share a bit more about their scaling story15:05
ttxSo if you watch them and find them very relevant for any of our stages, please add them to the wiki pages15:05
ttxAnd if a specific use case looks very interesting but lacks details, we could reach our to the presenters with more questions15:05
ttxEspecially from presenters who are not already on the SIG, like China Mobile, Los Alamos, ATT, Reliance Jio...15:06
ttxQuestions on that?15:06
mdelavergneseems straightforward15:07
ttx"ttx to check out Ops meetups future plans"15:07
ttxI did ask and there is no event planned yet, so we can't piggyback on that for now for our "scaling story collection" work15:07
ttxwe'll see what event(s) are being organized in 2021. should see more clearly in January15:08
ttx"all to review pages under https://wiki.openstack.org/wiki/Large_Scale_SIG in preparation for next meeting"15:08
ttxWe'll discuss that now in more details in the next topic15:08
genekuoI've put some short answer on some of the question listed there15:08
ttxAny question or comment on those action items? Anything to add?15:08
genekuoat least the thing I know15:08
ttxoh yes I saw it15:08
ttxthat's the idea, feel free to add things to those pages. We'll review them now and see if there is anything we should prioritize adding15:09
15:09 <ttx> #topic Reviewing all scaling stages, and identifying simple tasks to do a first pass at improving those pages
15:09 *** openstack changes topic to "Reviewing all scaling stages, and identifying simple tasks to do a first pass at improving those pages (Meeting topic: large_scale_sig)"
15:09 <ttx> So.. the first one is...
15:10 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Configure
15:10 <ttx> At this stage I don't think there are any easy tasks...
15:10 <ttx> amorin did lead the curation of at-scale configuration defaults
15:10 <ttx> But it's still work in progress,
15:11 <ttx> so I don't think we have a final answer for the "Which parameters should I adjust before tackling scale?" question
15:11 <ttx> Are there other common questions that we should list for that stage?
15:12 <ttx> maybe something around choosing the right drivers/backends at install time
15:12 <mdelavergne> maybe "how not"?
15:12 <ttx> like which are the ones that actually CAN scale?
15:12 <imtiazc> RabbitMQ configuration. We had to tweak a few things there.
15:13 <ttx> maybe we can split the question into OpenStack parameters and RabbitMQ parameters
15:13 <ttx> I'll do that now
15:14 <genekuo> I agree with listing out the drivers and backends people are using at large scale
15:15 *** ralonsoh has quit IRC
15:15 <ttx> ok, I added those two as questions
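
For reference on the "OpenStack parameters" side of that split, the options that typically come up are API/RPC worker counts and database pool sizing. A hedged sketch of the kind of knobs involved (the option names exist in nova, neutron, and oslo.db; the values are illustrative assumptions, not recommendations):

    # nova.conf -- values are placeholders, size them to your hardware
    [DEFAULT]
    osapi_compute_workers = 8    # nova-api worker processes
    metadata_workers = 8         # metadata API workers

    [database]
    max_pool_size = 50           # oslo.db connection pool size
    max_overflow = 100           # extra connections allowed under burst

    # neutron.conf
    [DEFAULT]
    api_workers = 8              # neutron API workers
    rpc_workers = 8              # RPC workers consuming RabbitMQ queues
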
15:15 *** ralonsoh has joined #openstack-meeting-3
15:15 *** ricolin_ has joined #openstack-meeting-3
15:16 <ttx> Any other easy things to add at that stage?
15:16 <imtiazc> Apart from DB and RMQ tuning, we had to add memcached. Memcached used to be an optional deployment component, but it makes a big difference in performance
15:17 <ttx> imtiazc: how about we add a "should I use memcached?" question
15:17 <ttx> then you can answer it
15:17 <ttx> :)
15:17 <imtiazc> Sure
15:18 <ttx> I like it, that's a good one
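
For context, "should I use memcached?" mostly amounts to a configuration change in the services; a minimal sketch using oslo.cache (section and backend names as documented for oslo.cache; the server addresses are placeholders):

    # keystone.conf -- other services expose the same [cache] section via oslo.cache
    [cache]
    enabled = True
    backend = oslo_cache.memcache_pool                    # pooled memcached backend
    memcache_servers = 192.0.2.10:11211,192.0.2.11:11211  # placeholder hosts
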
15:18 <ttx> OK, moving on to next stage...
15:18 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Monitor
15:18 <jpward> I don't know exactly how to ask the question, but what about determining the number of controller nodes and associated services?
15:18 <ttx> jpward: that would be for step 3
15:18 <ttx> we'll be back to it
15:19 <ttx> For the "Monitor" stage I feel like we should redirect people more aggressively to oslo.metrics
15:19 <ttx> oh I see that genekuo already added those
15:20 <genekuo> yeah
15:20 <ttx> genekuo: the next step will be to write good docs for oslo.metrics
15:20 <mdelavergne> yep, seems that oslo.metrics is currently everywhere :D
15:20 <ttx> so that we can redirect people to it and there they will find all answers
15:20 <genekuo> I think there is some other stuff worth monitoring, like queued messages in RabbitMQ
15:21 <genekuo> I will try to add some docs once the oslo.messaging code is done
15:22 <ttx> Anything else to add? I was tempted to add questions around "how do I track latency issues", "how do I track traffic issues", "how do I track error rates", "how do I track saturation issues"
15:22 <ttx> but I'm not sure we would have good answers for those anytime soon
15:22 <imtiazc> Is oslo.metrics supposed to help with distributed tracing?
15:23 *** lpetrut has quit IRC
15:23 <genekuo> I'm not sure how the question should be phrased, but we do monitor queued messages in RabbitMQ
15:23 <genekuo> if they keep piling up, it may indicate that there aren't enough workers
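
For illustration, queue depths can be pulled from the RabbitMQ management API; a minimal sketch (the /api/queues endpoint and "messages" field come from the management plugin's HTTP API; the host, credentials, and alert threshold are assumptions):

    # Minimal sketch: flag RabbitMQ queues whose backlog exceeds a threshold.
    # Assumes the management plugin is enabled; host/credentials/threshold
    # are placeholders for illustration.
    import requests

    RABBIT_API = "http://rabbit.example.com:15672/api/queues"
    THRESHOLD = 1000  # arbitrary example backlog

    def queues_over_threshold(user="guest", password="guest"):
        resp = requests.get(RABBIT_API, auth=(user, password), timeout=10)
        resp.raise_for_status()
        # Each entry includes "name" and "messages" (ready + unacknowledged)
        return [(q["name"], q["messages"])
                for q in resp.json()
                if q.get("messages", 0) > THRESHOLD]

    if __name__ == "__main__":
        for name, depth in queues_over_threshold():
            print(f"{name}: {depth} queued messages -- consider adding workers")
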
15:24 <ttx> imtiazc: I'd say that oslo.metrics is more about targeted monitoring of external data sources (database, queue) from the OpenStack perspective
15:24 <genekuo> imtiazc, what do you mean by distributed tracing? can you give an example of that?
15:24 <ttx> like tracing a user call through all components?
15:24 <genekuo> thanks ttx for the explanation
15:24 <imtiazc> ttx: Those are good questions. I think the answers will vary from one operator to another.
15:25 <genekuo> I agree it would be good to add those questions
15:25 <ttx> ok, I'll add them now
15:26 <imtiazc> genekuo: An example would be how much time each component of OpenStack takes to create a VM. It can be traced using a common request ID
15:27 <ttx> imtiazc: OSProfiler is supposed to help there
15:28 <ttx> example: https://docs.openstack.org/ironic/pike/_images/sample_trace.svg
15:28 <imtiazc> Thanks, haven't tried that out yet. We were considering hooking up with OpenTracing or something like Jaeger
15:29 <ttx> I haven't looked at it in a while, so not sure how usable it is
15:29 <genekuo> for oslo.metrics, I think what you can get is how much time scheduling RPC calls takes over a certain period,
15:29 <genekuo> but not for a specific request
15:29 <ttx> right, it's a different goal
15:30 <mdelavergne> OSProfiler worked fine when we used it
15:30 <ttx> imtiazc: if you take a fresh look at it, I'm sure the group will be interested in hearing what you thought of it
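
For reference, OSProfiler is enabled per service through a [profiler] config section; a minimal sketch (option names follow the OSProfiler documentation; the HMAC key and Redis store are placeholder assumptions):

    # nova.conf -- repeat in each service you want included in the trace
    [profiler]
    enabled = True
    trace_sqlalchemy = True                     # include DB calls in traces
    hmac_keys = SECRET_KEY                      # placeholder shared secret
    connection_string = redis://127.0.0.1:6379  # placeholder trace store

A request is then traced by passing the same key on the client side (e.g. openstack --os-profile SECRET_KEY server list), which prints a trace ID that osprofiler trace show --html <trace-id> can render into a report like the Ironic sample linked above.
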
15:30 <ttx> ok, anything else to add to the Monitoring stage at this point?
15:31 <genekuo> LGTM
15:31 <imtiazc> Is there a plan for the community to develop all the monitoring checks (e.g. Prometheus checks)?
15:32 <ttx> imtiazc: there has been a Technical Committee discussion on how to develop something for monitoring that's more sustainable than Ceilometer
15:32 <ttx> including building something around Prometheus
15:33 <ttx> discussion died down as people did not take immediate interest in working on it
15:33 <ttx> that does not mean it's not important
15:33 <ttx> we might need to revive that discussion after the holidays in one way or another
15:34 <ttx> moving on to stage 3
15:34 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleUp
15:35 <ttx> so this is where we should give guidance on number of nodes
15:35 <ttx> jpward: ^
15:35 <ttx> some of the resources listed here might be a better fit in the Configure stage
15:36 <genekuo> For the third question, "RabbitMQ is clearly my bottleneck, is there a way to reconfigure that part to handle more scale?"
15:36 <ttx> Like it's a bit late in the journey to select a Neutron backend
15:36 <genekuo> should we put this in step 1?
15:36 <imtiazc> That's a good topic :) The answer, however, depends a lot on the network provider selection.
15:36 <imtiazc> We often wondered about what tools other operators use. For example, what network provider are they using, what is used for monitoring and logging, how do others provision their hosts (before even deploying OpenStack), and also the deployment tool - Puppet, Ansible, etc. Do you think we can come up with a matrix/table for this?
15:37 <ttx> genekuo: yeah, I think we should delete that question from this stage. I already added a question on RabbitMQ configuration
15:37 <ttx> done
15:37 <jpward> imtiazc, I have wondered the same thing, I would like to see that as well
15:38 <ttx> I'll move the Neutron backends comparison to stage 1 too
15:39 <ttx> ok, done
15:40 <genekuo> imtiazc, I think there's a lot of feedback about what tools ops use in the ops forums during summits
15:40 <jpward> should there also be a planning stage? Like determining the type of hardware, networking configurations, etc?
15:40 <ttx> yeah, the trick is to reduce all that feedback into common best practices
15:40 <ttx> jpward: currently we use stage 1 (Configure) for that
15:41 <ttx> It's like initial decisions (stage 1) and later decisions (stage 3)
15:41 <jpward> ok
15:41 <ttx> Picking a Neutron backend would be an initial decision
15:42 <ttx> deciding on a control plane / data plane mix of nodes is more stage 3
15:42 *** liuyulong has quit IRC
15:42 <ttx> (bad example, it's the kind of question where the answer is the most "it depends")
15:42 <ttx> maybe we should rename to the "It Depends SIG"
15:43 <jpward> lol
15:43 <genekuo> lol
15:43 <ttx> Seriously though, there is a reason why there is no "Scaling guide" yet... It's just hard to extract common guidance
15:43 <genekuo> we determine the number of control plane processes by looking at the RabbitMQ queues
15:44 <ttx> yet we need to, because this journey is super scary
15:44 <genekuo> if the number of messages keeps growing, it probably means that you need to add more workers
15:44 <ttx> So any answer or reassurance we can give, we should.
15:45 <ttx> genekuo: would you mind adding a question around that? Like "how do you decide to add a new node for the control plane", maybe
15:45 <imtiazc> Yes, the guidance is somewhat dependent on monitoring your queues and other services. But I think we can vouch for a max number of computes given our architecture.
15:45 <genekuo> ttx, let me add it
15:46 <ttx> Frankly, we should set the bar pretty low. Any information is better than the current void
15:46 <ttx> which is why I see this as a no-pressure exercise
15:46 *** _mlavalle_1 has quit IRC
15:46 <ttx> It is a complex system and every use case is different
15:47 <ttx> If optimizing was easy, we'd encode it in the software
15:47 <ttx> So even if the answer is always "it depends", at least we can say "it depends on..."
15:47 <genekuo> done
15:47 <ttx> and provide tools to help determine the best path
15:47 <ttx> genekuo: thx
15:48 <imtiazc> We had some rough ideas on how much we could scale based on feedback from other operators like CERN, Salesforce, PayPal etc.
15:48 <ttx> Anything else to add to ScaleUp?
15:48 <ttx> imtiazc: the best way is, indeed, to listen and discuss with others and mentally apply what they say to your use case
15:49 <ttx> Maybe one pro tip we should give is to attend events, watch presentations, and engage with fellow operators
15:50 <ttx> once it's possible to socialize again :)
15:50 <genekuo> sounds good
15:50 <ttx> ok, moving on to the next stage
15:50 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut
15:50 <ttx> So here I think it would be great to have a few models
15:51 <ttx> I can't lead that, as I don't have practical experience doing it
15:51 <genekuo> me neither, we currently only split regions for DR purposes
15:51 <ttx> If someone is interested in listing the various ways you can scale out to multiple clusters/zones/regions/cells...
15:52 <ttx> genekuo: independent clusters is still one model
15:52 <ttx> So we won't solve that one today, but if you're interested in helping there, let me know
15:53 <ttx> The last stage is the one I just added
15:53 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain
15:53 <ttx> (based on input from last meeting)
15:53 <imtiazc> We are also following a cookie-cutter model. Once we have determined a max size we are comfortable with, we just replicate. I do like what CERN has done there
15:54 <ttx> imtiazc: that's good input. If you can formalize it as a question/answer, I think it would be a great addition
15:54 <ttx> So again, I don't think there is easy low-hanging fruit in this stage we could pick up
15:55 <ttx> Also wondering how much that stage depends on the distribution you picked at stage 1
15:56 <ttx> could be an interesting question to add -- which OpenStack distribution model is well-suited for large scale
15:56 <ttx> (stage 1 probably)
15:56 <ttx> I'll add it
15:57 <ttx> Any last comment before we switch to discussing the next meeting date?
15:57 <genekuo> nope :)
15:57 <imtiazc> By distribution, do you mean Ubuntu, RedHat, SuSE etc?
15:58 <ttx> or openstack-ansible etc
15:58 <ttx> Like how you install OpenStack
15:58 <imtiazc> ok, thanks. I don't have anything else for today
15:59 <ttx> So not really Ubuntu, but Ubuntu debs vs. Juju vs...
15:59 <ttx> #topic Next meeting
15:59 *** openstack changes topic to "Next meeting (Meeting topic: large_scale_sig)"
15:59 <ttx> As discussed last meeting, we'll skip the meeting over the end-of-year holidays
15:59 <ttx> So our next meeting will be January 13.
15:59 <ttx> I don't think we'll have a specific item to discuss in depth, we'll just focus on restarting the Large Scale SIG engine in the new year
15:59 <imtiazc> Happy holidays everyone!
15:59 <ttx> Super, we made it to the end of the meeting without logging any TODOs! We'll be able to take a clean break over the holidays
15:59 <ttx> Thanks everyone
16:00 <ttx> #endmeeting
16:00 *** openstack changes topic to "OpenStack Meetings || https://wiki.openstack.org/wiki/Meetings/"
16:00 <openstack> Meeting ended Wed Dec 16 16:00:03 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
16:00 <mdelavergne> Happy holidays, see you next year, and thanks :)
16:00 <openstack> Minutes:        http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-12-16-15.00.html
16:00 <openstack> Minutes (text): http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-12-16-15.00.txt
16:00 <openstack> Log:            http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-12-16-15.00.log.html
16:00 <ttx> right on time
16:00 <ttx> that was close
16:00 <genekuo> thanks all, see you next year
16:00 <ttx> genekuo: thanks!
16:00 *** imtiazc has left #openstack-meeting-3
16:00 *** mdelavergne has quit IRC
16:04 *** macz_ has joined #openstack-meeting-3
16:16 *** eolivare_ has quit IRC
16:16 *** mlavalle has joined #openstack-meeting-3
16:31 *** ricolin_ has quit IRC
17:00 *** ralonsoh is now known as ralonsoh|afk
17:34 *** belmoreira has quit IRC
19:52 *** e0ne has quit IRC
21:15 *** baojg has quit IRC
21:17 *** baojg has joined #openstack-meeting-3
21:19 *** baojg has quit IRC
21:21 *** baojg has joined #openstack-meeting-3
22:09 *** ralonsoh|afk has quit IRC
22:30 *** raildo has quit IRC
22:42 *** slaweq has quit IRC
23:18 *** haleyb is now known as haleyb|away
23:25 *** baojg has quit IRC
23:26 *** baojg has joined #openstack-meeting-3
