09:00:09 #startmeeting large_scale_sig
09:00:09 Meeting started Wed Feb 26 09:00:09 2020 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:12 The meeting name has been set to 'large_scale_sig'
09:00:14 Hi!
09:00:20 #topic Rollcall
09:00:26 LO
09:00:32 amorin: how are things?
09:00:44 hello
09:00:51 pretty well
09:01:01 Agenda for today at:
09:01:06 #link https://etherpad.openstack.org/p/large-scale-sig-meeting
09:02:10 Let's wait a couple minutes for our friends in Asia
09:02:14 o/
09:02:16 ack
09:03:09 jiaopengju1: around?
09:03:37 Ok, let's get started
09:03:46 #topic Progress on "Documenting large scale operations" goal
09:03:51 #link https://etherpad.openstack.org/p/large-scale-sig-documentation
09:03:58 amorin: any progress on documenting configuration defaults for large scale?
09:04:09 any blocker / help needed?
09:04:13 yes
09:04:24 I have been working on that in the past few days
09:04:37 as you can see in the link you pasted
09:04:59 I collected some params that need to be tweaked on nova / neutron
09:05:07 some of them, it's just a beginning
09:05:18 but I'd like to go further with that now
09:05:29 so I have some questions for all of you :p
09:06:00 re: oslo.privsep on Neutron, last time I looked it was not there yet
09:06:02 first is, should we create a wiki page somewhere?
09:06:17 ah yes, thanks
09:06:33 It's mostly cinder/nova at this point
09:06:45 (and os-brick)
09:06:51 ok
09:07:03 so neutron and others are still using the rootwrap daemon
09:07:20 that's the kind of info I want to share on the wiki page
09:07:28 do you know how I can create that?
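[Editor's note: to make the discussion above concrete, here is a hypothetical sketch of the kind of "parameters to tweak at large scale" list amorin describes collecting for nova. The option names are real oslo.config options, but the values are placeholders for illustration only, not recommendations from the SIG.]

```ini
# Illustrative nova.conf fragment -- values are placeholders, not advice.
[DEFAULT]
# oslo.messaging RPC timeout; large deployments often raise this
rpc_response_timeout = 180
# throttle concurrent instance builds per compute service
max_concurrent_builds = 10

[database]
# larger SQLAlchemy connection pool for busy control planes
max_pool_size = 50
```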
09:07:30 and even there, most of the time it's a thin wrapper rather than a reimplementation of functions for higher security
09:07:35 (I haven't checked at all)
09:07:56 so from a performance perspective, equivalent to rootwrap-daemon
09:08:16 ack
09:08:16 only Nova made serious use of privsep
09:08:33 resulting in stability, security and performance benefits
09:08:33 that's what came out of the talk with matt
09:09:13 I don't want to disperse, but maybe the large scale group could also help with pushing privsep in other projects
09:09:30 re: wiki, I think it would be great to create a page under https://wiki.openstack.org/wiki/Large_Scale_SIG
09:10:05 amorin: yeah, could be a collaboration with the Security SIG... or some pop-up team to push for privsep in a specific direction
09:10:17 I have a long-standing TODO to work on privsepping os-brick
09:10:24 ok :p
09:10:29 ttx: where do the performance benefits come in with privsep?
09:10:31 since it sounded like a fun project
09:11:34 oneswig: mostly in using narrow python functions rather than calling on IPC... the gain depends on the function called though
09:11:58 in some rare cases I could see it being actually slower, but overall there should be a small gain.
09:12:03 But the real gain is security
09:12:23 OK, that makes sense, thanks
09:12:57 amorin: so +1 on creating a wiki page
09:13:10 I will do then
09:13:16 and register the link on https://wiki.openstack.org/wiki/Large_Scale_SIG
09:13:22 ++
09:13:26 that was the first point,
09:13:33 then, after the wiki page is done
09:13:44 I'd like to contribute to documentation, on nova for example
09:13:54 with a small change to some parameters' docs
09:14:02 to show a link to this page
09:14:13 what do you think?
09:14:29 #action amorin do a wiki page for large scale documentation
09:15:02 it sounds like a good idea.
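[Editor's note: a back-of-the-envelope sketch of the overhead ttx describes above: a privileged operation done as a narrow in-process Python function (the privsep model) versus forking an external command for every call (the classic non-daemon rootwrap model). This is an illustration of the shape of the difference only, not oslo.privsep or rootwrap code; function names are made up, and absolute numbers are machine-dependent.]

```python
# Compare calling a narrow Python function in-process vs. spawning an
# external process for the same trivial task (reading a small file).
import subprocess
import tempfile
import time


def read_value_in_process(path):
    # privsep-style: do the work directly in a small Python function
    with open(path) as f:
        return f.read().strip()


def read_value_via_subprocess(path):
    # rootwrap-style: fork/exec an external binary for the same result
    return subprocess.run(
        ["cat", path], capture_output=True, text=True
    ).stdout.strip()


def time_calls(fn, path, n=100):
    # Total wall-clock time for n calls of fn(path).
    start = time.perf_counter()
    for _ in range(n):
        fn(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("hello\n")
    print("in-process :", time_calls(read_value_in_process, f.name))
    print("subprocess :", time_calls(read_value_via_subprocess, f.name))
```

On a typical Linux box the fork/exec path is orders of magnitude slower per call, which matches ttx's point that the gain depends on the function but in-process calls generally win; as noted above, the bigger motivation for privsep remains security, not speed.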
To be discussed with the Nova team I guess
09:15:34 yes, I will do a proposal that could be used to start talking
09:15:51 maybe they would prefer making it a direct part of the docs, but at least at a first stage the contents of that wiki page are likely to be a moving target
09:16:19 so until it stabilizes it really makes sense to expose it through a small doc patch
09:16:27 ok
09:16:53 amorin: anything else on that topic?
09:17:02 #action amorin start a patch on documentation for nova
09:17:24 last is, if you have some params on your side that could be identified in that documentation
09:17:28 feel free to add them :p
09:17:34 and that's all from me!
09:17:51 the scaling story here:
09:17:56 #link https://etherpad.openstack.org/p/scaling-stories
09:18:01 #idea pop-up team between Large Scale SIG / Security SIG to push privsep in specific areas
09:18:04 is an amazing source of info
09:18:14 yes, I wish there were more, really
09:18:17 I'd love to have more like that
09:18:18 amorin: ttx: we are moving many things to e.g. the pyroute2 library to use privsep
09:18:20 +1
09:18:30 slaweq, nice!
09:18:33 but it's still not finished and in some places we are still using rootwrap
09:18:35 slaweq: great to see progress there!
09:19:11 OK...
Looking at the etherpad I don't think we had any new doc links added, so let's skip that
09:19:27 #topic Progress on "Scaling within one cluster" goal
09:19:48 Re: https://etherpad.openstack.org/p/scaling-stories , no recent story posted, so we should carry that over
09:20:01 I was thinking of sending a periodic reminder
09:20:18 +1
09:20:31 but ideally we'd have scaling stories from this group, to make it look more like a living resource
09:20:44 at OVH we can add some, I'll try to gather the info internally
09:20:55 Also I was thinking of doing a live exercise on this at the Large Scale usage of infrastructure track at OpenDev
09:20:59 (more on that later)
09:21:16 amorin: they can be anecdotes, they don't have to be as comprehensive as the first one posted
09:21:21 I am pretty sure CERN can also add some very interesting things
09:21:52 Like "I remember when we did this and had to do this to get it to work"
09:22:17 We got a couple more reviews on the oslo.metrics blueprint at https://review.opendev.org/#/c/704733/
09:22:28 yes
09:22:35 masahito: Based on bnemec's comment, I'd say we should not overthink it... Maybe post a new patchset to take comments into account and run for approval?
09:22:36 Thanks for the comments.
09:23:07 ah, yes. I plan to update this tomorrow.
09:23:11 Nice!
09:23:32 sounds great
09:23:34 great, for me it's quite ok, I think we must first focus on simplicity
09:24:01 one of my concerns was about the fact that oslo.metrics was sending the data somewhere
09:24:07 are we sure we want to do that?
09:24:16 yes and getting the code out there. If there was a fatal flaw in the idea, we'd have had more comments on that spec :)
09:24:57 amorin: you mean central collection? what would the alternative be?
09:25:30 And as Ben commented it's better to publish our code for a first look. But I need to check that it doesn't have sensitive information from our company.
09:25:43 IMHO, oslo.metrics should collect the metrics and leave them locally on a unix socket
09:25:45 masahito: of course, take your time :)
09:26:08 the central collection could be done by another tool
09:26:35 but if your implementation is already doing it, then it's ok :p
09:26:37 amorin: Good point. In our case, oslo.metrics sends data. But if that's not good for other companies, I don't want to push our approach.
09:27:01 in my company, we already have some kind of collectors on hosts,
09:27:07 so I'd like to reuse those
09:27:14 amorin: could that be a config option?
09:27:20 of course
09:27:47 I wasn't aware that an implementation was already done
09:28:01 maybe we can propose that once the original code is posted
09:28:11 makes sense
09:28:23 amorin: it's what masahito presented in Shanghai, running at LINE
09:28:24 amorin: Not much of an implementation. We just started.
09:28:36 ah ok!
09:28:40 perfect then ;p
09:29:18 If there is one thing I learned about telemetry and metrics collection, it's that you need to be flexible to adapt to every case
09:29:32 and that simple is better than complex for that reason
09:30:08 ok, anything else to discuss re: "Scaling within one cluster" goal?
09:30:21 I added something
09:30:23 #action masahito to post new patchset of spec
09:30:32 on bare metal scaling. Is it time for that?
09:31:00 yes, in a sec
09:31:08 #topic Other topics
09:31:15 oneswig: go for it
09:31:46 ah right
09:31:58 so I was thinking this might be another front for discussion
09:32:18 how big can bare metal clusters go, and how are they managed in extremis?
09:32:44 We have experience working on the rate at which nodes can be deployed concurrently
09:33:07 I feel like that's definitely aligned with the "scaling within one cluster" goal, for a specific definition of cluster
09:33:33 One limitation can be the processing rate of networking-generic-switch, if we aren't reverting to flat networking
09:33:46 ttx: think so too :-)
09:34:05 oneswig: it could start as a scaling story
09:34:23 I don't have much to contribute on this yet but I'm happy to write and maintain a scaling story for this based on our experiences.
09:34:25 I don't think we have enough data points to start making separate categories
09:34:34 that could be nice
09:34:50 you eventually found some tunings to be done for your story, no?
09:34:50 cool, sign me up.
09:35:02 Once we have plenty of stories we'll have to think about how to best present them, but for now the big etherpad is enough
09:35:11 Well, the most recent work has been a combination of tunables and refactoring of n-g-s
09:35:24 #action oneswig to contribute a scaling story on bare metal cluster scaling
09:35:39 nb, might take a couple of months
09:35:59 oneswig: Oath / Verizon Media has a massive footprint of bare metal clusters, would be nice to hear from them
09:36:08 which brings us to our next point
09:36:10 for sure!
09:36:19 belmoreira and masahito volunteered to join the programming committee for the "Large-scale Usage of Open Source Infrastructure Software" track
09:36:23 at the OpenDev event in Vancouver, June 8-11
09:36:31 It's our opportunity to shape the discussion so that it aligns with the group's objectives
09:36:37 And recruit new members
09:36:39 We also have tons of bare metal but not managed by OpenStack ;-<
09:36:53 I'm already involved with another track, so Allison Price will be the OSF contact to organize that track.
09:37:05 I'll still follow closely, and we can discuss it in this meeting
09:37:10 unfortunately I won't be able to attend
09:37:22 amorin: that's unfortunate
09:37:31 A track has several subthemes...
09:37:36 I expect to be there and will participate provided it doesn't conflict with the Scientific SIG
09:37:49 So we could have one around collecting scaling stories from the participants
09:37:54 oneswig: it does not
09:37:58 I plan to be there as ttx said
09:38:12 OpenDev tracks are in the mornings and the PTG in the afternoons (+ last day)
09:38:33 We could also have a specific theme around scaling bare metal clusters
09:38:57 The format is pretty open, the idea is to prefer direct engagement over formal talks
09:39:13 ttx: sounds good to me.
09:39:47 so as you are contacted and start discussing the contents, please push our SIG's agenda!
09:40:06 +1
09:40:11 ideally we'd recruit new members, but at the very least we should collect information
09:41:01 I'd really like a scaling-stories session where people can informally talk about scaling anecdotes, and then we'd take notes and follow up with them to submit something more detailed
09:41:34 #topic Next meeting
09:41:40 everyone still OK with a biweekly frequency?
09:41:50 it's ok for me
09:41:50 yes
09:41:55 yes
09:42:35 #info next meeting: March 11
09:42:47 Anything else to mention before we close?
09:43:32 nothing here
09:43:36 #idea have a scaling-stories session where people can informally talk about scaling anecdotes, and then we'd take notes and follow up with them to submit something more detailed
09:43:57 ok I think I captured everything
09:44:01 Thanks everyone
09:44:07 #endmeeting