09:00:09 <ttx> #startmeeting large_scale_sig
09:00:09 <openstack> Meeting started Wed Feb 26 09:00:09 2020 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:12 <openstack> The meeting name has been set to 'large_scale_sig'
09:00:14 <ttx> Hi!
09:00:20 <ttx> #topic Rollcall
09:00:26 <amorin> LO
09:00:32 <ttx> amorin: how are things?
09:00:44 <oneswig> hello
09:00:51 <amorin> pretty well
09:01:01 <ttx> Agenda for today at:
09:01:06 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-meeting
09:02:10 <ttx> Let's wait a couple minutes for our friends in Asia
09:02:14 <masahito> o/
09:02:16 <amorin> ack
09:03:09 <ttx> jiaopengju1: around?
09:03:37 <ttx> Ok, let's get started
09:03:46 <ttx> #topic Progress on "Documenting large scale operations" goal
09:03:51 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-documentation
09:03:58 <ttx> amorin: any progress on documenting configuration defaults for large scale ?
09:04:09 <ttx> any blocker / help needed?
09:04:13 <amorin> yes
09:04:24 <amorin> I have been working on that in the past few days
09:04:37 <amorin> as you can see in the link you pasted
09:04:59 <amorin> I collected some params that need to be tweaked on nova / neutron
09:05:07 <amorin> some of them, it's just a beginning
09:05:18 <amorin> but I'd like to go further with that now
09:05:29 <amorin> so I have some questions for all of you :p
09:06:00 <ttx> re: oslo.privsep on Neutron, last time I looked it was not there yet
09:06:02 <amorin> first is, should we create a wiki page somewhere?
09:06:17 <amorin> ah yes, thanks
09:06:33 <ttx> It's mostly cinder/nova at this point
09:06:45 <ttx> (and os-brick)
09:06:51 <amorin> ok
09:07:03 <amorin> so neutron and others are still using rootwrap daemon
09:07:20 <amorin> that's the kind of info I want to share on the wiki page
09:07:28 <amorin> do you know how I can create that?
09:07:30 <ttx> and even there, most of the time it's a thin wrapper rather than a reimplementation of functions for higher security
09:07:35 <amorin> (I havn't checked at all)
09:07:56 <ttx> so from a performance perspective, equivalent to rootwrap-daemon
09:08:16 <amorin> ack
09:08:16 <ttx> only Nova did serious usage of privsep
09:08:33 <ttx> resulting in stability, security and performance benefits
09:08:33 <amorin> that's what came out from talk with matt
09:09:13 <amorin> I dont want to disperse, but maybe the large scale group could also help on pushing privsep in other projects
09:09:30 <ttx> re:wiki, I think it would be great to create a page under https://wiki.openstack.org/wiki/Large_Scale_SIG
09:10:05 <ttx> amorin: yeah, could be a collaboration with the Security SIG... or some pop-up team to push for privsep in a specific direction
09:10:17 <ttx> I have a long-standing TODO to work on privsepping osbrick
09:10:24 <amorin> ok :p
09:10:29 <oneswig> ttx: where do the performance benefits come in with privsep?
09:10:31 <ttx> since it sounded like a fun project
09:11:34 <ttx> oneswig: mostly in using narrow python functions rather than calling on IPC... gain depends on function called though
09:11:58 <ttx> in some rare cases I could see it being actually slower, but overall there should be a small gain.
09:12:03 <ttx> But the real gain is security
09:12:23 <oneswig> OK, that makes sense thanks
09:12:57 <ttx> amorin: so +1 on creating a wiki page
09:13:10 <amorin> I will do then
09:13:16 <amorin> and register the link on https://wiki.openstack.org/wiki/Large_Scale_SIG
09:13:22 <ttx> ++
09:13:26 <amorin> that was the first point,
09:13:33 <amorin> then, after the wiki page is done
09:13:44 <amorin> I'd like to contribute to documentation, on nova for example
09:13:54 <amorin> with a small change on some parameters docs
09:14:02 <amorin> to show a link to this page
09:14:13 <amorin> what do you think?
09:14:29 <amorin> #action amorin do a wiki page for large scale documentation
09:15:02 <ttx> it sounds like a good idea. To be discussed with Nova team I guess
09:15:34 <amorin> yes, I will do a proposal that could be used to start talking
09:15:51 <ttx> maybe they would prefer making it direct part of doc, but at least at a first stage the content of that wiki page are likely to be a moving target
09:16:19 <ttx> so until it stabilizes it really makes sense to expose it through a small doc patch
09:16:27 <amorin> ok
09:16:53 <ttx> amorin: anything else on that topic?
09:17:02 <amorin> #action amorin start a patch on documentation for nova
09:17:24 <amorin> last is, if you have some params on your side that could be identified in that documentation
09:17:28 <amorin> feel free to add them :p
09:17:34 <amorin> and that's all from me!
09:17:51 <amorin> the scaling story here:
09:17:56 <amorin> #link https://etherpad.openstack.org/p/scaling-stories
09:18:01 <ttx> #idea pop up team between Large Scale SIG / Security SIG to push privsep in specific areas
09:18:04 <amorin> is an amazing souce of info
09:18:14 <ttx> yes, I wish there were more, really
09:18:17 <amorin> I'd love to have more like that
09:18:18 <slaweq> amorin: ttx: we are moving with many things to e.g. pyroute2 library to use privsep
09:18:20 <amorin> +1
09:18:30 <amorin> slaweq, nice!
09:18:33 <slaweq> but it's still not finished and in some places we are still using rootwrap
09:18:35 <ttx> slaweq: great to see progress there !
09:19:11 <ttx> OK... Looking at the etherpad I don't think we had any new doc links added, so let's skip that
09:19:27 <ttx> #topic Progress on "Scaling within one cluster" goal
09:19:48 <ttx> Re: https://etherpad.openstack.org/p/scaling-stories , no recent story posted, so we should carry that over
09:20:01 <ttx> I was thinking of sending a periodic reminder
09:20:18 <amorin> +1
09:20:31 <ttx> but ideally we'd have scaling stories from this group, to make it look more like a living resource
09:20:44 <amorin> at OVH we can add some, I'll try to gather the info internally
09:20:55 <ttx> Also I was thinking of doing a live exercise on this at the Large Scale usage of infrastructure track at OpenDev
09:20:59 <ttx> (more on that later)
09:21:16 <ttx> amorin: can be anecdotes, does not have to be as comprehensive as the first one posted
09:21:21 <amorin> I am pretty sure CERN can also add some very interesting things
09:21:52 <ttx> Like "I remember when we did this and had to do this to get it to work"
09:22:17 <ttx> We got a couple more reviews on oslo.metrics blueprint at https://review.opendev.org/#/c/704733/
09:22:28 <amorin> yes
09:22:35 <ttx> masahito: Based on bnemec's comment, I'd say we should not overthink it... Maybe post a new patchset to take comments into account and run for approval ?
09:22:36 <masahito> Thanks for the comments.
09:23:07 <masahito> ah, yes. I plan to update this in tomorrow.
09:23:11 <ttx> Nice!
09:23:32 <oneswig> sounds great
09:23:34 <amorin> great, for me it's quite ok, I think we must first focus on simplicity
09:24:01 <amorin> one of my concern was about the fact that oslo metrics was sending the data somewhere
09:24:07 <amorin> are we sure we want to do that?
09:24:16 <ttx> yes and getting the code out there. If there was a fatal flaw in the idea, we'd have had more comments on that spec :)
09:24:57 <ttx> amorin: you mean central collection? what would the alternative be?
09:25:30 <masahito> And as Ben commented it's better to public our codes for a first look. But I need to check which doesn't have sensitive information in our company.
09:25:43 <amorin> IMHO, oslo.metrics should collect the metrics and leave it locally on an unix socket
09:25:45 <ttx> masahito: of course, take your time :)
09:26:08 <amorin> the central collection could be done by antoher tool
09:26:35 <amorin> but if your implementation is already doing it, then it's ok :p
09:26:37 <masahito> amorin: Good point. In our cases, oslo.metrics sends data. But if that's not good for other company, I don't want to push our cases.
09:27:01 <amorin> in my company, we already have some kind of collectors on hosts,
09:27:07 <amorin> so I'd like to reuse those
09:27:14 <ttx> amorin: could that be one config option?
09:27:20 <amorin> of course
09:27:47 <amorin> I wasnt aware that an implemetnation was already done
09:28:01 <ttx> maybe we can propose that once the original code is posted
09:28:11 <amorin> make sense
09:28:23 <ttx> amorin: it's what masahito presented in Shanghai, running at LINE
09:28:24 <masahito> amorin: Not a much implementation. We just started.
09:28:36 <amorin> ah ok!
09:28:40 <amorin> perfect then ;p
09:29:18 <ttx> If there is one thing I learned about telemetry and metrics collection, it's that you need to be flexible to adapt to every case
09:29:32 <ttx> and that simple is better than complex for that reason
09:30:08 <ttx> ok, anything else to discuss re: "Scaling within one cluster" goal ?
09:30:21 <oneswig> I added something
09:30:23 <ttx> #action masahito to post new patchset of spec
09:30:32 <oneswig> on bare metal scaling.  Is it time for that?
09:31:00 <ttx> yes, in a sec
09:31:08 <ttx> #topic Other topics
09:31:15 <ttx> oneswig: go for it
09:31:46 <oneswig> ah right
09:31:58 <oneswig> so I was thinking this might be another front for discussion
09:32:18 <oneswig> how big can bare metal clusters go, and how are they managed in extremis?
09:32:44 <oneswig> We have experience with working on the rate at which nodes can be deployed concurrently
09:33:07 <ttx> I feel like that's definitely aligned with the "scaling within one cluster" goal, for a specific definition of cluster
09:33:33 <oneswig> One limitation can be the processing rate of networking-generic-switch, if we aren't reverting to flat networking
09:33:46 <oneswig> ttx: think so too :-)
09:34:05 <ttx> oneswig: it could start as a scaling story
09:34:23 <oneswig> I don't have much to contribute on this yet but I'm happy to write and maintain a scaling story for this based on our experiences.
09:34:25 <ttx> I don;t think we have enough data points to start making separate categories
09:34:34 <amorin> that could be nice
09:34:50 <amorin> you eventually found some tunings to be done for your story, no?
09:34:50 <oneswig> cool, sign me up.
09:35:02 <ttx> Once we have plenty of stories we'll have to think on how to best present them, but for now the big etherpad is enough
09:35:11 <oneswig> Well, the most recent work has been a combination of tunables and refactoring of n-g-s
09:35:24 <ttx> #action oneswig to contribute a scaling story on bare metal cluster scaling
09:35:39 <oneswig> nb, might take a couple of months
09:35:59 <ttx> oneswig: Oath / Verizon Media has a massive footprint of bare metal clusters, would be nice to hear from them
09:36:08 <ttx> which brings us to our next point
09:36:10 <oneswig> for sure!
09:36:19 <ttx> belmoreira and masahito volunteered to join the programming committee for the "Large-scale Usage of Open Source Infrastructure Software" track
09:36:23 <ttx> at OpenDev event in Vancouver, June 8-11
09:36:31 <ttx> It's our opportunity to shape the discussion so that it aligns with the group's objectives
09:36:37 <ttx> And recruit new members
09:36:39 <masahito> We also have tons of baremetal but not managed by OpenStack ;-<
09:36:53 <ttx> I'm already involved with another track, so Allison Price will be the OSF contact to organize that track.
09:37:05 <ttx> I'll follow closely still, and we can discuss it in this meeting
09:37:10 <amorin> unfortunately I wont be able to attend
09:37:22 <ttx> amorin: that's unfortunate
09:37:31 <ttx> A track has several subthemes...
09:37:36 <oneswig> I expect to be there and will participate provided it doesn't conflict with the scientific sig
09:37:49 <ttx> So we could have one around collecting scaling stories from the participants
09:37:54 <ttx> oneswig: it does not
09:37:58 <masahito> I plan to be there as ttx said
09:38:12 <ttx> Opendev tracks are in the mornings and PTG in the afternoons (+ last day)
09:38:33 <ttx> We could also have a specific theme around scaling bare metal clusters
09:38:57 <ttx> The format is pretty open, the idea is to prefer direct engagement over formal talks
09:39:13 <oneswig> ttx: sounds good to me.
09:39:47 <ttx> so as you are contacted and start discussing the contents, please push our SIG's agenda !
09:40:06 <oneswig> +1
09:40:11 <ttx> ideally we'd recruit new members, but at the very least we should collect information
09:41:01 <ttx> I'd really like a scaling-stories session where people can informally talk about scaling anecdotes, and then we'd take notes and follow up with them to submit something more detailed
09:41:34 <ttx> #topic Next meeting
09:41:40 <ttx> everyone still OK with a biweekly frequency?
09:41:50 <amorin> it's ok for me
09:41:50 <oneswig> yes
09:41:55 <masahito> yes
09:42:35 <ttx> #info next meeting: March 11
09:42:47 <ttx> Anything else to mention before we close ?
09:43:32 <oneswig> nothing here
09:43:36 <ttx> #idea have a scaling-stories session where people can informally talk about scaling anecdotes, and then we'd take notes and follow up with them to submit something more detailed
09:43:57 <ttx> ok I think I captured everything
09:44:01 <ttx> Thanks everyone
09:44:07 <ttx> #endmeeting