14:01:26 #startmeeting large_scale_sig
14:01:26 Meeting started Wed Nov 18 14:01:26 2020 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:30 The meeting name has been set to 'large_scale_sig'
14:01:31 #topic Rollcall
14:01:35 Who is here for the Large Scale SIG meeting ?
14:01:39 o/
14:01:41 o/
14:01:51 amorin: ?
14:01:55 mdelavergne: ?
14:02:07 Hi!
14:02:25 hey!
14:02:28 I am here!
14:02:34 thanks for ping :)
14:02:36 perfect, let's get started
14:02:41 Our agenda for today is at:
14:02:44 #link https://etherpad.openstack.org/p/large-scale-sig-meeting
14:02:51 #topic PTG/Summit postmortem
14:03:04 A few weeks ago we had our Forum and PTG activities, and I'd like to discuss how successful that was
14:03:11 My view on it is that the Forum session was a success with new people sharing experience
14:03:22 The two PTG meetings were not as successful with only a couple of people showing up
14:03:28 What do people think?
14:04:19 Personally, I think that means we need to separate SIG activities (the inner circle) from the sharing of experience (outer circle)
14:04:42 Yeah, I think the summit has more operators attending
14:04:48 as it's a lot easier to get people to show up when you ask a specific question, than asking people to join a SIG meeting
14:05:00 agree
14:05:07 For reference here are the links to the Forum and PTG etherpads:
14:05:11 #link https://etherpad.opendev.org/p/vSummit2020_OpenStackScalingStory
14:05:16 #link https://etherpad.opendev.org/p/wallaby-ptg-largescale-sig
14:05:34 ttx if I remember well I didn't attend the PTG sessions because of the timeslot
14:05:43 amorin, mdelavergne what did you think
14:05:59 belmoreira: our hope was that some of the people from the week before would show up
14:06:07 I really enjoyed the summit part, with a lot of operators giving feedback
14:06:07 but i think that was misguided
14:06:19 unfortunately I was not available for PTG, so I missed that
14:06:28 nothing much, sorry I was really occupied these last few weeks, I missed a lot :(
14:06:52 anyway, I have the feeling that operators have things to say
14:07:02 I don't know how we can involve them more in our meetings
14:07:11 I've been giving it some thought
14:07:20 you want to split the group in 2 parts?
14:07:23 and we discussed that with genekuo at the PTG sessions
14:07:38 ttx, true. But for some reason we see Europeans participating more in these meetings and if I recall correctly it was at the beginning and end of the day.
For example, for me it was impossible to attend
14:07:38 I think we've discussed collaborating with OpsMeetup during PTG
14:07:45 amorin: I think expecting people to join the SIG meeting and share their experience is not going to happen
14:08:03 We need to use other types of exceptional events to raise the questions we care about and get the feedback we need
14:08:23 But that will be the topic for the next meeting
14:08:27 as we have lots to cover today
14:08:51 ack
14:08:56 but yes, homework before next meeting will be to think about how we can more efficiently collect feedback
14:08:59 ttx I'm just trying to understand why, maybe, the PTG sessions didn't have a wider participation
14:09:37 belmoreira: yes that explains why the regulars did not show up, but not why the expected new people did not show up either :)
14:09:53 #topic Wallaby setup
14:09:57 So as we start a new cycle I'd like to discuss three things:
14:10:07 meeting frequency, update co-chairs, and organization of workstreams
14:10:17 Regarding meeting frequency, during the PTG we discussed moving back to a single biweekly meeting
14:10:27 It seems our experience with rotating meeting times resulted in spreading out the team without getting regular new members
14:10:42 So I ran a poll and we arrived at that 14utc slot
14:10:52 We might push it back one hour to 15utc to be more friendly to imtiaz (15utc was a popular time in the survey too)
14:11:06 ok for me
14:11:12 (he wanted to join today but said 6am was a little early)
14:11:22 I'm OK with both slots
14:11:23 I'm ok, even a bit later at 16 utc
14:11:36 same here, 16utc is my last slot
14:11:48 uh... 16 UTC can be more difficult for me
14:11:53 Let's try 15utc. I don't want to over-adapt around someone that has not attended any meeting yet
14:12:00 no problem
14:12:09 15 or 16 is fine by me
14:12:13 Regarding co-chairs, I'd like to update the list since we have not seen Pengju Jiao recently
14:12:26 Anyone interested?
The role is about helping keep track of workstream status and (when I'm not available) helping chair meetings
14:12:30 belmoreira: interested in continuing?
14:12:46 anyone else ?
14:12:50 sure
14:12:52 I'm ok
14:13:11 to help out
14:13:11 genekuo: ok to be chair or ok not to be chair? :)
14:13:26 ok to be chair, that works
14:13:28 ok to be co-chiar
14:13:31 the more the merrier
14:13:32 *chair
14:13:47 I'll update and replace Pengju Jiao with Gene Kuo
14:13:57 Now... Regarding organization of workstreams
14:14:01 #topic Reorganize work into a scaling journey
14:14:19 During PTG with genekuo we discussed reorganizing our goals into more of a scaling journey
14:14:25 Like as you scale up, you:
14:14:31 1. configure/optimize your single cluster to handle additional load
14:14:39 2. monitor your cluster to detect strain/limits
14:14:46 3. scale up until you reach limits
14:14:55 4. scale out to multiple clusters/regions/cells once you reach limits
14:15:06 And our work is essentially about helping people throughout that journey
14:15:20 So for (1) we work on documenting configuration options for large scale
14:15:30 For (2) we work on oslo.metrics and other meaningful monitoring solutions
14:15:41 For (3) we identify and push back low-hanging-fruit limits to scaling within one cluster
14:15:52 (That can include documenting how to properly shard a RabbitMQ install, for example)
14:16:02 and finally for (4) we explain the various options and how to do it
14:16:22 I feel like that makes our purpose a lot clearer. It's just about helping people at various stages of this journey.
14:16:36 And it does not have to be complete or anything. Every bit helps.
14:16:44 Does that make sense to you?
14:16:49 yes
14:16:53 yes, I like the idea
14:16:57 yep
14:17:14 it's a lot less abstract than setting artificial "goals" imho, and also less intimidating
14:17:32 In terms of output format, I was thinking we could use a FAQ format.
That way we can list common questions, and provide/improve answers when we have them
14:17:45 FAQs have two interesting properties:
14:17:53 - Listing questions is as important as listing answers
14:18:01 - It's OK if the FAQ is constantly a work in progress
14:18:17 so again, less intimidating than a doc or a white paper that has to be "finished"
14:18:23 How does that sound?
14:18:26 yep, good idea
14:18:39 faq on wiki for example?
14:18:44 or directly in documentation?
14:18:55 Yes, it's easier to answer questions than to write a document from scratch
14:18:56 +1
14:18:58 yes, I was thinking of refactoring the wiki to describe the journey
14:18:59 I'd like to have it in doc directly
14:19:10 Agree with doc
14:19:12 it can still point to doc artifacts
14:19:17 usually, operators are looking at config options, deployments etc
14:19:23 I rarely refer to wiki for OpenStack stuff
14:19:33 if we have a page over there, it could be nice IMHO
14:19:42 genekuo same here
14:19:43 Like obviously, oslo.metrics will not live in a FAQ, it will be a proper software library
14:19:51 same for documenting options
14:19:51 yup
14:20:13 But the main artifact of our group should be this set of wiki pages (at least for now)
14:20:21 that describe the journey and answer questions
14:20:25 wherever this is put, we can still link it from the other anyway ?
14:20:27 pointing to other materials
14:20:42 ok for me
14:20:50 sounds good
14:21:03 Like one question could be "my database is getting quite large with stale entries, what should I do about it" and point to OSarchiver
14:21:39 :)
14:21:55 personally I have a lot more questions than answers, so I like a format that lets me ask them, even if for now we don't have all the answers
14:22:17 OK if that sounds good, I can work on refactoring our wiki docs
14:22:23 good starting point :)
14:22:24 #action ttx to refactor wiki and other etherpads into that "journey" FAQ view
14:22:26 yep, at some point somebody will have the answer and add it, hopefully
14:23:02 I'll replace all our etherpads with a single one that tracks what the SIG is actively working on
14:23:11 (rather than one per stream)
14:23:45 agreed, we have a lot of scattered pages, better reorganize or link them in one page
14:23:49 Any other suggestions on how we should reorganize how we work?
14:24:19 :)
14:24:45 ok moving on to next topic
14:24:49 #topic The road to oslo.metrics 1.0
14:24:59 So one objective we have for the Wallaby cycle (so between now and April 2021) is to get to a proper oslo.metrics release
14:25:18 Before our first 0.1 release we should have:
14:25:21 - Basic tests (https://review.opendev.org/#/c/755069/)
14:25:26 - Latest metrics code (?)
14:25:35 Then before the end of the Wallaby cycle we should have:
14:25:41 - oslo-messaging metrics code (https://review.opendev.org/#/c/761848/)
14:25:48 - Enable bandit (issue to fix with predictable path for metrics socket ?)
14:25:52 - Improve tests to get closer to 100% coverage
14:25:56 genekuo: anything I missed?
14:26:00 ttx: I basically have the latest main code merged in the last commit
14:26:20 The rest of them are mostly additional metrics which I would like to add after 0.1 release
14:26:22 genekuo: with the socket patch?
14:26:26 Yep
14:26:28 ah ok
14:26:42 So we could merge the basic tests
14:27:00 and then do a 0.1 release that you can start using in oslo.messaging
14:27:39 Ah, I should have another patch handling SIGTERM, but it shouldn't affect the test
14:27:47 Let me update it later this week
14:27:56 Yes, thanks!
14:28:05 genekuo: I'll add you to oslo-metrics-core so you can +2 my patch
14:28:17 OK
14:28:24 ok you should be able to +2a now
14:28:34 if you reload the page
14:28:59 then I'll check what it takes to do a 0.1 release
14:29:13 thank you
14:29:27 #action genekuo to review/approve https://review.opendev.org/#/c/755069/
14:29:43 #action ttx to set up release jobs and request a 0.1 release
14:30:05 That should keep us busy between now and next meeting
14:30:15 #topic Next meeting
14:30:23 So our next meeting should be December 2.
14:30:45 We'll do 15utc, one hour later compared to today, in this channel if it's available
14:30:51 #action ttx to reschedule meeting to be biweekly on Wednesdays, 15utc
14:31:01 there is a chance that I will not be available
14:31:47 I'd rather keep that meeting date; that way we can do the next one on Dec 16
14:31:53 then skip for holidays
14:32:13 and be back on January 13
14:32:40 I'm ok with the timeline
14:33:02 is there anything else we can make progress on between now and next meeting?
14:33:19 amorin: was there any progress on the OSops/OSarchiver side?
14:34:20 nop, was not able to find time for this
14:34:38 I suspect recent weeks at OVHCloud have been kind of busy
14:34:49 yes :)
14:35:13 OK anything else, anyone?
14:35:47 oh, I should write an action item for the homework
14:36:11 #action all to think about how to improve how we collect feedback
14:36:19 since that was not a stellar success during the Victoria cycle...
14:36:46 Since we have a few more minutes, I'll talk about what we discussed in the PTG sessions
14:37:12 The suggestion there was to make those feedback sessions more "exceptional" than a standing etherpad open for comments
14:37:43 Like we've had those etherpads like the scaling-stories etherpad open, with calls on the mailing-list for people to add experience to it
14:37:47 that's just not working
14:38:15 But when we create a session about "sharing your scaling story" at an ops meetup or a Forum, people come
14:38:54 so basically, making it an event, and focusing it on one question
14:39:13 agree
14:39:31 should it be written or audio ?
14:39:33 Could be just piggybacking on existing events, every time they happen... or maybe create our own "event"
14:39:53 like a monthly "question to operators" session on Zoom or whatever
14:40:08 audio/video is really nice in my opinion
14:40:45 I think IRC is very efficient for work group meetings as its minutes are recorded and you can easily take #actions and such
14:40:56 But not for ops sharing experiences
14:41:11 for that audio/video seems to be a plus, or in-person
14:41:27 More work for us to extract learnings from that, but that's better than no feedback at all
14:41:39 if we keep this session (maybe more than one slot) in the forum it's a good start
14:42:04 also, an ops event... (not sure if the ops team is organizing something virtual)
14:42:11 Yes, obviously we should make sure we leverage future forums.
But once every 6 months might not be enough :)
14:42:46 So that's the general idea I had, but you can noodle on it for two weeks and we'll discuss it again on Dec 2
14:43:01 I also vote for collaborating with ops event
14:43:21 Also, make the SIG group more comfortable for existing attendees, rather than jump through hoops for hypothetical new members
14:43:49 that means obviously creating an inner circle and an outer circle, but that's what we have anyway, the regulars and the others
14:44:27 Yes we should track what the ops meetups group is going to do next
14:44:44 Alright, that is all I had for today. Anything else we should be doing ?
14:45:10 ttx: does the ops meetup have a regular IRC meeting?
14:45:16 let me see
14:45:18 would like to join if possible
14:45:23 Sorry for joining a bit late despite my best intentions to wake up earlier. Do we have a Zoom session as well for this meeting?
14:45:35 http://eavesdrop.openstack.org/#OpenStack_Ops_Meetup_Team
14:45:53 imtiazc: no only IRC, makes it easier to write minutes :)
14:46:03 thanks ttx
14:46:13 imtiazc: thanks for joining, we agreed to move one hour later for the next meeting
14:46:21 So December 2, 15utc
14:46:31 Thank you @ttx
14:46:32 I'll post the summary and logs for this meeting
14:46:53 imtiazc: we have a few minutes left, maybe you can introduce yourself?
14:46:57 Then we will
14:47:12 :)
14:48:21 or maybe we can all introduce ourselves at the same time
14:48:26 Sure, this is Imtiaz Chowdhury, Cloud Architect from Workday. At Workday, we have been deploying OpenStack based private clouds for 7 years.
14:48:59 Hi!
14:50:05 I'm Thierry Carrez, VP Engineering at what is now the Open Infrastructure Foundation.
I'm helping drive this group because I have an interest in getting large users to contribute their experience running OpenStack, and I receive lots of questions from users that worry about the scaling journey, and would very much like us to have great answers to those
14:50:32 So even if I don't have any answers, I help grease the wheels of the group
14:50:34 I'm Gene Kuo, working at LINE as Infrastructure engineer. Our team has been developing and operating OpenStack based private clouds to run our services.
14:50:56 I am Arnaud Morin, from OVHCloud, we are running OpenStack to provide a public cloud
14:51:08 I am involved in the team which deploys and manages the infrastructure
14:51:38 Hi, I'm Marie Delavergne, PhD student at Inria and I work on scaling OpenStack in different ways :)
14:51:53 I'm Belmiro Moreira. I work at CERN as Cloud Engineer
14:53:00 imtiazc: I invite you to read the logs of this meeting, should give you a lot of insights on how we plan to reorganize work of the group for the Wallaby cycle
14:53:12 and we'll talk again in two weeks
14:53:29 Anything else before we close, anyone?
14:54:08 Nope :)
14:54:11 Alright then... thanks!
14:54:15 #endmeeting