15:00:32 #startmeeting large_scale_sig
15:00:33 Meeting started Wed Dec 2 15:00:32 2020 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:37 The meeting name has been set to 'large_scale_sig'
15:00:39 #topic Rollcall
15:00:44 Hi o/
15:00:44 Who is here for the Large Scale SIG meeting?
15:00:49 Hi o/
15:01:13 amorin is probably busy. If not, he should be
15:01:45 belmoreira: maybe around?
15:02:02 o/
15:02:07 thanks for the ping
15:02:38 hoping to see imtiaz soon
15:02:51 Alright, let's get started
15:02:56 Our agenda for today is at:
15:02:58 #link https://etherpad.openstack.org/p/large-scale-sig-meeting
15:03:23 checking for late-added agenda items
15:03:37 #topic Review previous meetings action items
15:03:43 "ttx to refactor wiki and other etherpads into that "journey" FAQ view"
15:03:47 So... I did reorganize all pages under:
15:03:51 #link https://wiki.openstack.org/wiki/Large_Scale_SIG
15:04:04 If you have a look... you can see, I split everything into 4 subpages, one for each stage of the journey
15:04:14 For each subpage there is a FAQ, a few resource links
15:04:22 + a section on other SIG work pertaining to that stage
15:04:39 o/
15:04:44 Yeah I saw that, I'll be adding something to those pages before the next meeting
15:04:49 jpward: hi! welcome
15:05:01 Hi
15:05:05 yes, please feel free to add questions, answers or links relevant to the stage
15:05:11 liuyulong: hi!
15:05:14 Hi, seeing new people
15:05:37 I'll finish reviewing previous point and we'll do an introduction round
15:05:50 At our next meeting we'll review each stage's contents, so please spend some time reviewing those in the next two weeks!
15:06:00 #action all to review pages under https://wiki.openstack.org/wiki/Large_Scale_SIG in preparation for next meeting
15:06:07 Also let me know if you have trouble logging into the wiki.
15:06:22 The goal being, anyone in the SIG should feel free to update those pages, not just me
15:06:46 comments on that point?
15:07:04 The structure looks pretty good to me
15:07:19 The 4 categories look good. Could we add an "upgrade" story as well?
15:07:55 imtiazc: hi! I feel like upgrade is an orthogonal concern at each stage
15:08:01 unless we add it as a 5th stage
15:08:19 like, once you have scaled out, how the hell do you upgrade?
15:08:38 exactly. It does become quite challenging.
15:08:54 I probably can add that maybe next year, we're planning on doing that
15:09:04 Should be a lot of pain
15:09:31 I like the idea of a 5th stage. does that make sense to you mdelavergne, belmoreira ?
15:10:00 I mean, if upgrading at high scale has unique constraints, it makes sense for us to document it
15:10:30 mmh yep !
15:10:43 upgrade/maintain ?
15:10:47 OK, I'll create a skeleton page
15:10:55 "rinse, repeat"
15:11:23 #action ttx to add 5th stage around upgrade and maintain scaled out systems in operation
15:11:28 configuration tuning means you need to restart/reload/respawn the processes/services/agents and so on, so it looks like an upgrade already. : )
15:11:44 ok, next action item from last meeting was...
15:11:49 "genekuo to review/approve https://review.opendev.org/#/c/755069/"
15:11:52 That's done
15:11:58 then... "ttx to set up release jobs and request a 0.1 release"
15:12:05 (for oslo.metrics)
15:12:08 By upgrade, I meant moving from one OpenStack release to another.
15:12:08 I did that:
15:12:14 #link https://review.opendev.org/c/openstack/project-config/+/763986
15:12:16 #link https://review.opendev.org/c/openstack/releases/+/764631
15:12:21 I've checked the PRs, seems one of them is blocked by CI
15:12:27 release should be processed very soon now, maybe today
15:12:56 err, failed again. Will have a look :)
15:13:17 #action ttx to make sure oslo.metrics 0.1 is released
15:13:21 "ttx to reschedule meeting to be biweekly on Wednesdays, 15utc"
15:13:31 That's done: http://eavesdrop.openstack.org/#Large_Scale_SIG_Meeting
15:13:31 I'll start working on oslo.messaging code once 0.1 is released
15:13:37 and finally... "all to think about how to improve how we collect feedbackPTG/Summit postmortem"
15:13:51 we'll discuss that in this meeting after the round of intros
15:14:07 #topic Introducing new SIG members
15:14:29 I see two new faces, would be good to introduce ourselves and our interest in the Large Scale SIG
15:14:33 I'll start
15:14:59 I'm Thierry Carrez, VP Engineering at the now Open Infrastructure Foundation. I'm helping drive this group because I have an interest in getting large users to contribute their experience running openstack, and receive lots of questions from users that worry about the scaling journey and would very much like that we have great answers to that
15:15:09 (yes, I copied last week's intro)
15:15:15 Hi, my name is LIU Yulong, I'm a core of the Neutron project. So I just want to see how many scale issues/pains you guys see on Neutron. : )
15:15:22 Hi, I'm Gene Kuo, working at LINE as Infrastructure engineer. Our team has been developing and operating OpenStack based private clouds to run our services.
15:15:37 liuyulong: neutron is definitely a hot topic around here
15:15:51 especially since rabbitMQ started to behave a bit more sanely lately
15:15:51 I also copied last week's intro.
15:16:12 I probably cannot give a lot of feedback regarding neutron as we implemented our own plugins
15:16:14 Hi, I'm Marie Delavergne, PhD student working on large scale Openstacks :)
15:16:19 Neutron is often the first scaling pain point, so we appreciate you visiting!
15:16:41 Yes, I know that.
15:17:17 I'm working at China Unicom now. We have some large deployments for public cloud.
15:17:27 jpward: care to introduce yourself and tell us what you're interested in?
15:18:07 I'm John Ward, and I work for Global InfoTek. We have a number of different openstack deployments; the largest one that I am working on is 15k cores, but we are looking to continue scaling.
15:18:48 nice! I suspect it's already made of multiple clusters?
15:18:56 or cells or..
15:19:23 glad to be here, I bring some experience working on an even larger cloud; Rackspace public cloud is my previous experience
15:19:41 currently we don't have cells implemented, but that is on the road map
15:20:16 Nice, very interested in hearing how your scaling went so far !
15:20:43 Cool, nice to meet you
15:20:46 So, for today's meeting we planned to discuss how to best collect feedback from experienced operators
15:20:51 #topic How to best collect feedback from experienced operators?
15:21:05 as I said a couple of weeks ago, during Victoria cycle we tried to use etherpads to collect scaling stories, then curate them onto a wiki page
15:21:11 That was not very successful.
15:21:19 (understatement of 2020)
15:21:29 In contrast, we had several people sharing at our Opendev and Forum sessions around scaling
15:21:41 So I was wondering if we should not change our strategy there
15:21:59 Rather than run opendev and forum sessions about scaling, in hope that people will join the SIG and share more...
15:22:06 I am Imtiaz Chowdhury. I am the Cloud Architect for Workday. At Workday, we have 45 clusters running over 9K hypervisors and now close to 500K cores. The deployment size is expected to double next year.
15:22:19 Maybe we should run events specifically to collect those experiences, and not expect people to join the SIG afterwards
15:22:28 or fill an etherpad
15:22:34 What's your view on that?
15:22:34 Yes, that's what I also think
15:22:54 It's hard to get people to fill out an etherpad after work
15:23:13 (i mean, if people join the SIG as regular members, that's awesome, but we should collect their scaling story without expecting them to join first)
15:23:32 I think an event within OpenInfra summit and OpsMeetup is the best place to gather information
15:23:43 Also it's hard to write and easier to just discuss
15:23:44 Etherpad or any tool that allows collaborative editing: Google docs, Wiki
15:24:32 So how about...
15:24:38 People tend to get more active when there's an event or deadline
15:24:38 We build a schedule of regular Large Scale SIG events (think ~ every 2 months)
15:24:47 piggybacking on existing events (forum, opendev, ops meetup) if available, or running our own if not
15:25:03 and use that to ask specific questions and collect output
15:25:07 I will suggest piggybacking at first
15:25:36 And advertise it to some event you can join even if you are not currently running large scale
15:25:37 genekuo_: yes but there isn't much planned in the coming months. I have to doublecheck what the OpsMeetup has planned
15:26:04 Other suggestions included:
15:26:07 - Leverage superuser nominations to extract knowledge
15:26:08 but planning to scale in the future
15:26:13 - Reach out to past speakers that spoke on scaling
15:26:22 - Engage with Chinese users
15:26:40 so, more direct or narrow outreach
15:26:49 Neutron team has a mechanism that will enable a deputy each week to collect/filter the bugs.
15:27:04 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018782.html
15:27:10 For instance ^
15:27:11 I like those suggestions.
15:27:23 So maybe this SIG can add such a routine for collecting scale-related information.
15:27:47 And feed back a mail to the community.
15:27:50 liuyulong: that sounds good
15:27:58 Superuser nominations, next round is a bit far away
15:28:09 Is there a way we could facilitate connecting different large scale operators?
15:28:15 but we could try to identify past summit talks on scaling
15:28:23 and reach out to speakers
15:28:31 Previous superuser nominations also will work
15:28:35 (in addition to actually extracting info from the video content)
15:29:10 I can help out reaching out directly to those users if manpower is needed.
15:29:41 Re: old scaling presentations, I'll create an etherpad where we can dump our findings, organized per event
15:30:02 I think that's a good resource to link to in our various stages page anyway
15:30:51 #link https://etherpad.opendev.org/p/large-scale-sig-scaling-videos
15:31:51 I can pick up some of those videos once the list is complete
15:33:21 I'll pick up listing for Shanghai
15:33:50 Will do others if I have additional time
15:34:19 If you have a few cycles, please assign yourself one of the summits and do a quick search for scale-related presentations
15:34:36 If you watch any, feel free to drop notes and remarks on the etherpad too
15:34:50 #action all to help in filling out https://etherpad.opendev.org/p/large-scale-sig-scaling-videos
15:35:13 I'll check out the Ops meetups future plans
15:35:23 #action ttx to check out Ops meetups future plans
15:35:40 I shall look at Virtual summit and Denver. I shall also add the past scaling presentation we did from Workday
15:35:53 imtiazc: great, thanks
15:36:30 as far as Chinese users go, I'll defer to Chinese contributors. It feels like we get a lot of engagement when we use China-specific social media
15:36:56 so I was wondering if we could use that to ask simple questions from large scale deployments in China
15:37:25 am open to suggestions on how to best proceed theer
15:37:29 there*
15:38:31 I am not sure about that. Could we get some help from Jonathan Bryce or Mark C here? They seem to at least have contacts of the large operators and sponsors from China
15:38:43 hmm, I can ask Rico if he can help
15:38:55 ricolin
15:39:29 That suggestion was actually from Rico :)
15:39:38 yeah I know
15:40:07 OK, that sounds like great first steps. Any other suggestions?
15:41:17 Alright then, moving on to next topic
15:41:19 #topic Next meeting
15:41:27 Our next meeting will be December 16.
15:41:35 (Then we'll skip, and have the one after that on January 13)
15:41:51 The main topic for that next meeting will be to review all stages, and identify simple tasks to do a first pass at improving those pages
15:41:53 same time on the 16th?
15:41:57 yes
15:42:11 I'm ok
15:42:24 ok
15:42:43 So between now and then, please check out the base content at https://wiki.openstack.org/wiki/Large_Scale_SIG and think a bit on how we can do a first pass at improving that
15:43:05 I bet there are a few easy question/answers we could add
15:43:38 yep
15:43:39 Like, put yourself back into that stage of your own scaling story, and answer one of your own early questions you had
15:44:04 #topic Open discussion
15:44:20 That is all we had on the agenda... Is there anything else you would like to discuss?
15:44:35 Nope :)
15:44:49 nice meeting everyone, nothing else from me
15:44:51 I have a question on deployment story
15:44:58 If not, I'll wrap up now and post the summary. We have a bunch of actions to work on between now and next meeting
15:45:07 imtiazc: yes?
15:45:53 What deployment tools work best for large scale deployment? We are aware of limitations of TripleO but not so sure about Kolla.
15:47:03 We currently write our own Ansible script and separate compute nodes to different host groups
15:47:03 I heard good things about OpenStack-Ansible, but I don't run a deployment myself. What do you all use, if anything?
15:48:02 jpward, belmoreira, liuyulong: any specific tooling?
15:48:20 We are using salt for our deployment currently, I have used OSA and TripleO in the past
15:48:28 We are currently using community forked Chef based tools with some home grown tools.
15:48:39 wow lots of homegrown tools
15:50:19 I thought there was more convergence toward community deployment tools, but maybe I imagined things
15:50:52 ttx: Do you think "deployment" story could be added as stage zero to the list of categories?
15:50:57 I personally used kolla-ansible for my own cluster before and had a good experience with it, but the scale was very small
15:51:01 jpward: I had an unrelated question, how did you learn about the SIG?
15:52:50 I ran across the wiki site one day when searching for something else
15:53:11 ah? funny
15:53:30 unexpected
15:54:11 yeah usually people can't find anything in the wiki
15:54:40 alright, if nothing else...
15:54:52 Let's continue the discussion at our next meeting
15:54:52 I like the idea of deployment as stage zero!
15:55:16 sorry... couldn't follow the meeting... fighting some fires
15:55:17 mdelavergne: we'll end up writing a complete guide to openstack :)
15:55:26 ahah
15:55:51 belmoreira: it's ok, you can catch up with the logs
15:56:18 belmoreira: I should know that, but do you use a specific deployment tooling to handle the CERRN deployment?
15:56:27 CERN*
15:56:50 for configuration management we use puppet
15:57:03 we deploy OpenStack using puppet
15:57:17 using the openstack-puppet upstream stuff?
15:57:17 Our operators use ansible, but the templates are written by them, not the openstack ansible.
15:57:31 yes, openstack-puppet
15:57:47 alright, thanks everyone, time to move to.. another meeting
15:57:55 #endmeeting