15:00:01 #startmeeting large_scale_sig
15:00:02 Meeting started Wed Feb 10 15:00:01 2021 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:05 The meeting name has been set to 'large_scale_sig'
15:00:07 #topic Rollcall
15:00:13 Who is here for the Large Scale SIG meeting?
15:00:29 o/
15:00:31 pinging regulars: amorin belmoreira
15:00:45 o/
15:00:48 thanks ttx
15:00:50 hello!
15:00:53 thanks
15:01:02 @alistarle @leducflorian and @loan from Société Générale
15:01:10 o/
15:01:12 welcome guys :P
15:01:25 Pretty sure genekuo will not be joining us as it's Chinese New Year
15:01:35 We could almost do the meeting in French
15:01:53 :)
15:01:54 :) don't make fun of me
15:02:08 Hello guys
15:02:09 belmoreira: I suspect your French is better than my English
15:02:15 Our agenda for today is at:
15:02:19 #link https://etherpad.openstack.org/p/large-scale-sig-meeting
15:02:40 First let's quickly review our action items from last meeting...
15:02:42 ttx I won't bet on it
15:02:43 #topic Action items review
15:02:57 - ttx to revive the OSarchiver upstreaming effort
15:03:02 Discussed at http://lists.openstack.org/pipermail/openstack-discuss/2021-January/020116.html
15:03:10 I'm going to gauge interest from the OpenStack TC for option (3) vs. option (2)
15:03:13 o/
15:03:18 reedip: hi!
15:03:27 #action ttx to contact OpenStack TC re: OSarchiver and see which option sounds best
15:03:43 - ttx to start a "how many compute nodes in your typical cluster" discussion on the ML
15:03:57 Done @ http://lists.openstack.org/pipermail/openstack-discuss/2021-January/020084.html
15:04:00 ttx ack
15:04:07 That triggered a lot of good insights, which I'll work on to summarize
15:04:19 very good indeed
15:04:21 #action ttx to summarize "how many compute nodes in your typical cluster" thread into the wiki
15:04:35 That's pretty concise ...
15:04:46 amorin: yes, it did encourage me to start other such discussions on the ML
15:04:53 - belmoreira to post first draft of the ScaleOut FAQ
15:05:07 belmoreira: I saw some updates passing by?
15:05:25 yes, updated the scaling out and upgrades sections
15:05:36 #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut
15:05:43 #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain
15:05:44 let me know if you have any comments
15:05:59 I only reviewed the first one, but it looks great
15:06:05 and if you have any other questions that we can add
15:06:21 Like good enough for us to point people asking questions to it now
15:06:37 - all to think about 5-10min presentations to use in a video version of our SIG meeting
15:06:45 Any ideas on that yet?
15:06:51 I have a few
15:07:16 for example:
15:07:27 Regions vs Cells
15:07:27 Inception - Run your OpenStack control plane on top of the infrastructure that it manages
15:07:27 (to summarize for newcomers, the idea was to once in a while do lightning talks in a video meeting version of this meeting, to reach a wider audience)
15:07:28 Scaling out with Nova Cells
15:07:28 RabbitMQ clusters for large OpenStack infrastructures
15:07:30 Upgrades in a Large Scale infrastructure
15:07:30 Fine-grained scheduling in a Large Scale infrastructure
15:07:32 OpenStack control plane DBs maintenance in a Large Scale infrastructure
15:07:32 How to scale Glance, Cinder and other core projects in a Large Scale infrastructure
15:07:34 Operations in a Large Scale infrastructure
15:07:34 AVZs with multiple Cells
15:08:22 sounds like we have plenty of subjects for the next few months :)
15:08:23 That's a great list! I bet a lot of people would be interested in that.
15:08:55 at OVH we could do something about operations
15:09:00 We should rename the meeting "Fireside chat with Belmiro, tips from searching for the Higgs Boson"
15:09:02 this could be a 10 min presentation, to trigger discussion
15:09:13 we have Mistral doing self-healing of OpenStack
15:09:24 Those are great topics @belmoreira
15:09:26 ttx haha
15:09:39 OK, let's discuss at the end of the meeting when to schedule the first one, and which topic to pick
15:09:39 Sure, I think a few of them could interest us at SG
15:10:23 Back to the agenda...
15:10:24 please suggest other topics as well, these are just a few that we can talk about
15:10:31 Does operations of a large scale infrastructure cover monitoring, i.e. getting logs and metrics?
15:10:48 yes, maybe I should create a wiki page or etherpad to collect topics
15:10:56 so that people can propose them asynchronously
15:11:28 imtiazc I was not thinking of monitoring. But we should definitely have a session about monitoring
15:12:19 That was something I asked about in the last meeting as well
15:12:25 https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:12:27 For monitoring
15:12:35 belmoreira: please dump your list in there
15:12:48 sure
15:13:39 ok, moving on to...
15:13:41 #topic Discussing Ceph in a future forum setting
15:13:57 So we have several new folks joining today from Société Générale
15:14:09 They had questions around Ceph scaling, like:
15:14:15 - Single cluster vs. multiple smaller clusters?
15:14:21 - Ceph cluster optimization (number of nodes, enabled features...) in a large scale cluster
15:14:27 - Performance optimization for SSD/HDD use cases
15:14:33 - Improving resilience through erasure coding
15:14:41 We have not really discussed Ceph yet
15:15:15 But I know CERN runs it extensively, so we should be able to tap into the group's experience running that as well
15:15:23 Thanks for the summary, ttx
15:15:32 Should we discuss those here today?
Or schedule a Ceph-specific session in the near future and try to make some noise to get more people to show up?
15:16:13 I guess it boils down to who feels comfortable talking about their Ceph scaling
15:16:19 at CERN, Ceph is not managed by the Cloud team. We have a storage team that looks after all the storage solutions available in the Organization
15:16:37 Ah, ok, did not know that :)
15:16:55 Workday's Ceph deployment footprint is growing as well. So far, it is loosely integrated with OpenStack.
15:16:55 belmoreira: do you think some of them might be interested in sharing their experience there?
15:17:48 probably we should point first to the presentations they have already given at OpenStack Summits and Ceph Days
15:17:55 I was thinking we could set up a specific Ceph/OpenStack session and try to get Ceph community people + operators versed in Ceph+OpenStack to discuss how they do things
15:17:58 let me find some references
15:18:19 since we probably don't have a critical mass of Ceph experience right here today
15:18:44 at OVH, we have a specific team for that as well
15:18:48 but maybe they would be open to a short presentation in one of our future sessions
15:18:54 Yes, it seems that in large deployments the Ceph part is likely separated from the OpenStack one
15:18:55 I can ask if one of the guys could join and talk about it
15:19:02 Which seems to be a good idea actually
15:19:10 maybe in the meantime we can start a few threads on the mailing-list to start collecting solutions, but also names of people to involve in that specific session
15:19:55 today at Société Générale the cloud team manages almost all the stack (except for the Cinder API + iSCSI over Pure Storage solutions).
15:20:59 FWIW we are interested in facilitating integration, so I'm happy to have the OpenStack Large Scale SIG host a Ceph-oriented discussion
15:21:38 but if the teams are often separate, we'll likely need to do some prep work to get the right people
15:22:26 So my proposal would be:
15:23:25 1- The SG people start a thread on a specific Ceph integration question they have on the OpenStack mailing-list, and see if we get the Ceph experts out of the woods and participating
15:23:41 There is another topic, not related to scaling, but being discussed a lot: what operating system should we continue on, for those who have been using CentOS?
15:24:26 2- Meanwhile, we plan to have in the near future a Ceph/OpenStack scaling session of the Large Scale SIG video meetings, and start gathering names of people to invite to that
15:24:58 Does the Large Scale SIG already have video meetings, or will this be an exceptional session?
15:25:29 We are just getting started with video meetings, and the first one should probably not be a Ceph-specific one
15:25:51 I noticed that CERN is coming up with a strategy. I would imagine all large scale operators will have to come up with a plan on how to update CentOS. Is there enough interest in this?
15:25:57 But... at the Foundation we are also working on a format for a short/recurring virtual event
15:26:02 3- Maybe it could be interesting to post a wiki about Ceph scaling (like the Large Scale SIG wiki pages you created recently)?
15:26:29 so maybe we could also fit into that and get the Foundation promotion machine to help us assemble a crowd
15:26:30 ttx I like 2, but only once these meetings have some traction and attendees...
to avoid disappointment when we have other teams presenting
15:26:53 imtiazc ++
15:26:57 belmoreira: yes, we would need to have a number of presenters to seed the discussion
15:27:22 loan: yes, once we start collecting answers we could totally document them on the wiki
15:27:40 you all could actually help with that :)
15:28:02 ttx: for the wiki, maybe we can begin by filling in questions without answers, to launch the discussion?
15:29:08 maybe we misunderstood the purpose of these wikis
15:29:20 imtiazc: I added that suggestion to https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:29:32 in the sublist of things we would like to discuss but need key presenters for
15:30:01 For reference, some links about the Ceph setup at CERN:
15:30:02 https://www.youtube.com/watch?v=OopRMUYiY5E
15:30:02 https://www.youtube.com/watch?v=21LF2LC58MM
15:30:02 https://www.youtube.com/watch?v=0i7ew3XXb7Q
15:30:05 alistarle: yes, that's how we've been using them. To list the questions and the answers
15:30:17 sometimes just the questions
15:31:05 alistarle: I can create a page for Ceph Q&A, seeding it with your set of questions
15:31:43 LGTM, and we can complete them with a little more detail if needed
15:31:49 exactly
15:32:09 OK, let me summarize all the ideas for the meeting logs
15:32:35 #info The SG people start a thread on a specific Ceph integration question they have on the OpenStack mailing-list, see if we get the Ceph experts out of the woods and participating
15:33:07 #info we plan to have in the near future a Ceph/OpenStack scaling session of the Large Scale SIG video meetings, and start gathering names of people to invite to that.
need key presenters first
15:33:33 #action ttx to create a wiki page for Ceph-related questions and seed it with SG questions
15:34:07 #info In the list of "sessions we would like to have but need to find the right people first", we also have CentOS upgrading
15:35:01 #info ttx should see if that sort of discussion could fit in the "regular short virtual event" format that the Foundation wants to support
15:35:02 thx for all those useful references belmoreira
15:35:24 Alright, I think I captured everything
15:36:19 #info If you have ideas for lightning talks that could seed a discussion, or ideas for sessions we should have but lack presenters for, please add them to https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:36:59 Anything else on that topic?
15:37:34 Business summary: We don't have answers, but we have a plan to gather them :)
15:38:09 And yes, it would be good to dig into the trove of past summit presentations and see if anything useful turns up
15:38:15 CERN but also elsewhere
15:38:42 #link https://www.openstack.org/videos/search?search=Ceph
15:39:08 I did have a look, but those sounded pretty specific and not general enough
15:39:10 Everything seems good. We are starting to gather information on our side about that because it's quite urgent for us (thinking about redesigning our Ceph infrastructure).
15:39:40 So maybe we will also have some results / presentations to share in the near future.
15:39:51 https://www.openstack.org/videos/summits/sydney-2017/the-dos-and-donts-for-ceph-and-openstack looks promising
15:39:59 And Florian is such a great speaker
15:40:53 Alright, if nothing else, let's move on to the next topic
15:40:54 yeah, it seems very interesting :)
15:41:04 #topic Progress/Blockers
15:41:10 * Stage 1 - Configure (amorin)
15:41:14 #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Configure
15:41:22 Any progress/blockers to report, amorin?
15:41:31 I haven't been able to move this topic forward
15:41:53 * Stage 2 - Monitor (genekuo)
15:41:59 #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Monitor
15:42:36 gene is not around; I can report a few patches were posted for oslo.metrics, with slow progress on the oslo.messaging integration front
15:42:43 * Stage 3 - Scale Up (ttx)
15:42:47 #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleUp
15:43:10 Started that thread on the ML and will be digesting answers on the page soon
15:43:20 * Stage 4/5 - Scale Out, upgrade & maintain (belmoreira)
15:43:23 #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut
15:43:25 #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain
15:43:40 we already saw progress; any blockers? help needed?
15:43:59 as mentioned before, I updated the pages. Please have a look.
15:44:19 ok then...
15:44:21 #topic Next meeting
15:44:43 Our next meeting should be in two weeks, Feb 24, but I'll likely not be available on Feb 24 as we have a Foundation all-hands meeting planned
15:44:54 We can push back, or someone else can chair
15:45:17 plenty of information to gather, thanks all
15:45:47 we could also schedule our video meeting instead
15:45:49 since you are the main driver of the group I would suggest postponing, especially if you start the Zoom
15:46:49 belmoreira: Is there a topic in your list you would feel comfortable running for our first video meeting the week after?
15:46:53 like... March 3rd?
15:47:19 Regions vs Cells would probably be a hit
15:47:35 +1
15:47:50 looks good
15:48:01 that gives us plenty of time to make some noise about it
15:48:15 also we'll be past the Chinese New Year holidays
15:49:07 good for me
15:49:12 #info Next meeting will be a Zoom meeting (usual time, link to be posted a few days before) with topic: "Regions vs. Cells" and Belmiro doing the intro talk
15:49:13 we will need a place to put a Zoom link. Also, ttx, do you know if we can use an account from the Foundation?
15:49:29 I should be able to provide a link
15:49:42 #info March 3rd, 15UTC
15:50:21 Does that work for everyone?
15:50:34 +1
15:50:44 oops, sorry I missed the meeting
15:50:52 I will catch up with the logs
15:50:56 genekuo: you missed all the fun!
15:51:16 #topic Open discussion
15:51:35 genekuo: anything specific to report? I mentioned your in-flight patches
15:51:53 yeah, I'm almost done with the functional tests on the oslo.messaging side
15:52:08 awaiting the new release of oslo.metrics so that the bug can be fixed
15:52:17 genekuo: do you run Ceph? We've been discussing collecting specific Ceph+OpenStack answers
15:52:47 We do run Ceph, but it's handled by other teams along with Cinder
15:52:59 We have a dedicated storage team
15:53:06 hah, common pattern it seems
15:53:44 I can probably make the oslo.messaging patch ready to review next week
15:53:53 genekuo: do you think they could be interested in joining a specific Ceph+OpenStack ops discussion in the near future?
15:54:07 genekuo: great! But you also need to enjoy the holidays
15:54:13 hmm, I can ask them
15:54:29 it seems that Société Générale needs to let the dedicated Cinder team also handle Ceph :')
15:54:44 loan: problem solved!
15:55:14 before we close, anything else, anyone?
15:55:48 Ah, I would like to mention I've reviewed the Google doc and it LGTM
15:56:20 the Google doc?
15:56:26 you mean Belmiro's? he posted the result in the wiki
15:56:32 yep
15:56:49 Thanks for the work :)
15:56:55 loan: Belmiro had a Google doc with the proposed responses to the #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut questions
15:56:56 thanks genekuo, it's now in the wiki
15:57:16 Alright, thanks everyone for participating and sharing your experience. Have a great week and we'll talk again in 3 weeks!
15:57:34 thank you all, sorry for being late
15:57:40 Keep an eye on the mailing list for the video meeting announcement
15:57:52 thanks for all your input!
15:57:58 \o
15:58:03 #endmeeting