15:00:01 <ttx> #startmeeting large_scale_sig
15:00:02 <openstack> Meeting started Wed Feb 10 15:00:01 2021 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:05 <openstack> The meeting name has been set to 'large_scale_sig'
15:00:07 <ttx> #topic Rollcall
15:00:13 <ttx> Who is here for the Large Scale SIG meeting?
15:00:29 <alistarle> o/
15:00:31 <ttx> pinging regulars: amorin belmoreira
15:00:45 <belmoreira> o/
15:00:48 <belmoreira> thanks ttx
15:00:50 <amorin> hello!
15:00:53 <amorin> thanks
15:01:02 <alistarle> @alistarle @leducflorian and @loan from Société Générale
15:01:10 <Loan> o/
15:01:12 <amorin> welcome guys :P
15:01:25 <ttx> Pretty sure genekuo will not be joining us as it's Chinese New Year
15:01:35 <ttx> We could almost do the meeting in French
15:01:53 <amorin> :)
15:01:54 <belmoreira> :) don't make fun of me
15:02:08 <leducflorian> Hello guys
15:02:09 <ttx> belmoreira: I suspect your French is better than my English
15:02:15 <ttx> Our agenda for today is at:
15:02:19 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-meeting
15:02:40 <ttx> First let's quickly review our action items from last meeting...
15:02:42 <belmoreira> ttx I won't bet on it
15:02:43 <ttx> #topic Action items review
15:02:57 <ttx> - ttx to revive the OSarchiver upstreaming effort
15:03:02 <ttx> Discussed at http://lists.openstack.org/pipermail/openstack-discuss/2021-January/020116.html
15:03:10 <ttx> I'm going to gauge interest from the OpenStack TC for option (3) vs. option (2)
15:03:13 <reedip> o/
15:03:18 <ttx> reedip: hi!
15:03:27 <ttx> #action ttx to contact OpenStack TC re: OSarchiver and see which option sounds best
15:03:43 <ttx> - ttx to start a "how many compute nodes in your typical cluster" discussion on the ML
15:03:57 <ttx> Done @ http://lists.openstack.org/pipermail/openstack-discuss/2021-January/020084.html
15:04:00 <amorin> ttx ack
15:04:07 <ttx> That triggered a lot of good insights, which I'll work on summarizing
15:04:19 <amorin> very good indeed
15:04:21 <ttx> #action ttx to summarize "how many compute nodes in your typical cluster" thread into the wiki
15:04:35 <reedip> That's pretty concise ...
15:04:46 <ttx> amorin: yes it did encourage me to start other such discussions on the ML
15:04:53 <ttx> - belmoreira to post first draft of the ScaleOut FAQ
15:05:07 <ttx> belmoreira: I saw some updates passing by?
15:05:25 <belmoreira> yes, updated the scaling out and upgrades sections
15:05:36 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut
15:05:43 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain
15:05:44 <belmoreira> let me know if you have any comments
15:05:59 <ttx> I only reviewed the first one, but it looks great
15:06:05 <belmoreira> and if you have any other questions that we can add
15:06:21 <ttx> Like good enough for us to point people asking questions to it now
15:06:37 <ttx> - all to think about 5-10min presentations to use in a video version of our SIG meeting
15:06:45 <ttx> Any idea on that yet?
15:06:51 <belmoreira> I have a few
15:07:16 <belmoreira> for example:
15:07:27 <belmoreira> Regions vs Cells
15:07:27 <belmoreira> Inception - Run your OpenStack control plane on top of the infrastructure that it manages
15:07:27 <ttx> (to summarize for newcomers, the idea was to occasionally do lightning talks in a video version of this meeting, to reach a wider audience)
15:07:28 <belmoreira> Scaling out with Nova Cells
15:07:28 <belmoreira> RabbitMQ clusters for large OpenStack infrastructures
15:07:30 <belmoreira> Upgrades in a Large Scale infrastructure
15:07:30 <belmoreira> Fine-grained scheduling in a Large Scale infrastructure
15:07:32 <belmoreira> OpenStack control plane DB maintenance in a Large Scale infrastructure
15:07:32 <belmoreira> How to scale Glance, Cinder and other core projects in a Large Scale infrastructure
15:07:34 <belmoreira> Operations in a Large Scale infrastructure
15:07:34 <belmoreira> AVZs with multiple Cells
15:08:22 <amorin> sounds like we have plenty of subjects for the next few months :)
15:08:23 <ttx> That's a great list! I bet a lot of people would be interested in that.
15:08:55 <amorin> we can do something at OVH about operations
15:09:00 <ttx> We should rename the meeting "Fireside chat with Belmiro, tips from searching for the Higgs Boson"
15:09:02 <belmoreira> this can be a 10 min presentation, to trigger discussion
15:09:13 <amorin> we have Mistral doing self-healing of OpenStack
15:09:24 <imtiazc> Those are great topics @belmoreira
15:09:26 <belmoreira> ttx haha
15:09:39 <ttx> OK, let's discuss at the end of the meeting when to schedule the first one, and which topic to pick
15:09:39 <alistarle> Sure, I think a few of them would interest us at SG
15:10:23 <ttx> Back to the agenda...
15:10:24 <belmoreira> please also suggest other topics, these are just a few that we can talk about
15:10:31 <imtiazc> Does operations of a large scale infrastructure cover monitoring, i.e. getting logs and metrics?
15:10:48 <ttx> yes, maybe I should create a wiki page or etherpad to collect topics
15:10:56 <ttx> so that people can propose them asynchronously
15:11:28 <belmoreira> imtiazc I was not thinking of monitoring. But we should definitely have a session about monitoring
15:12:19 <reedip> That was something I asked about in the last meeting as well
15:12:25 <ttx> https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:12:27 <reedip> For monitoring
15:12:35 <ttx> belmoreira: please dump your list in there
15:12:48 <belmoreira> sure
15:13:39 <ttx> ok, moving on to...
15:13:41 <ttx> #topic Discussing Ceph in a future forum setting
15:13:57 <ttx> So we have several new folks joining today from Société Générale
15:14:09 <ttx> They had questions around Ceph scaling, like:
15:14:15 <ttx> - Single cluster vs. multiple smaller clusters?
15:14:21 <ttx> - Ceph cluster optimization (number of nodes, enabled features...) in a large scale cluster
15:14:27 <ttx> - Performance optimization for SSD/HDD use cases
15:14:33 <ttx> - Improving resilience through erasure coding
15:14:41 <ttx> We did not really discuss Ceph yet
15:15:15 <ttx> But I know CERN is running it extensively, so we should be able to tap into the group's experience running that as well
15:15:23 <leducflorian> Thanks for the summary ttx
15:15:32 <ttx> Should we discuss those here today? Or schedule some Ceph-specific session in the near future and try to make some noise to get more people to show up?
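For context on the erasure coding question above, here is a minimal sketch of how an erasure-coded pool is typically set up in Ceph; the profile name, k/m values, pool name and PG counts are illustrative assumptions, not something discussed in the meeting:

    # define a 4+2 erasure-code profile (4 data chunks + 2 coding chunks, chunks spread across hosts)
    ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
    # create a pool using that profile (hypothetical pool name and PG counts)
    ceph osd pool create volumes-ec 128 128 erasure ec-4-2
    # allow partial overwrites so RBD (Cinder/Glance/Nova) can use the pool as a data pool
    ceph osd pool set volumes-ec allow_ec_overwrites true
    ceph osd pool application enable volumes-ec rbd

A 4+2 profile tolerates the loss of any two hosts at roughly 1.5x raw-to-usable overhead, versus 3x for the default triple replication, which is the resilience/efficiency trade-off behind the question.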
15:16:13 <ttx> I guess it boils down to who feels comfortable talking about their Ceph scaling
15:16:19 <belmoreira> at CERN, Ceph is not managed by the Cloud team. We have a storage team that looks after all the storage solutions available in the Organization
15:16:37 <ttx> Ah, ok, did not know that :)
15:16:55 <imtiazc> Workday's Ceph deployment footprint is growing as well. So far, it is loosely integrated with OpenStack.
15:16:55 <ttx> belmoreira: do you think some of them might be interested in sharing their experience there?
15:17:48 <belmoreira> probably we should point first to the presentations they already gave at OpenStack Summits and Ceph Days
15:17:55 <ttx> I was thinking we could set up some specific Ceph/OpenStack session and try to get Ceph community people + operators versed in Ceph+OpenStack to discuss how they do things
15:17:58 <belmoreira> let me find some references
15:18:19 <ttx> since we probably don't have a critical mass of Ceph experience right here today
15:18:44 <amorin> at OVH, we have a specific team for that as well
15:18:48 <belmoreira> but maybe they will be open to a short presentation in one of our future sessions
15:18:54 <alistarle> Yes, it seems in large deployments the Ceph part is usually separated from the OpenStack one
15:18:55 <amorin> I can ask if one of the guys could join and talk about it
15:19:02 <alistarle> Which seems to be a good idea actually
15:19:10 <ttx> maybe in the meantime we can start a few threads on the mailing-list to start collecting solutions, but also names of people to involve in that specific session
15:19:55 <leducflorian> today at Société Générale the cloud team manages almost all the stack (except for the Cinder API + iSCSI over Pure Storage solutions).
15:20:59 <ttx> FWIW we are interested in facilitating integration, so I'm happy to have the OpenStack Large Scale SIG host a Ceph-oriented discussion
15:21:38 <ttx> but if the teams are often separated, we'll likely need to do some prep work to get the right people
15:22:26 <ttx> So my proposal would be:
15:23:25 <ttx> 1- The SG people start a thread on a specific Ceph integration question they have on the OpenStack mailing-list, and we see if we get the Ceph experts out of the woods and participating
15:23:41 <imtiazc> There is another topic, not related to scaling, but which is being discussed a lot: what operating system should we continue on? For those who have been using CentOS?
15:24:26 <ttx> 2- Meanwhile, we plan a Ceph/OpenStack scaling session of the Large Scale SIG video meetings in the near future, and start gathering names of people to invite to that
15:24:58 <alistarle> Does the Large Scale SIG already have video meetings, or will it be an exceptional session?
15:25:29 <ttx> We are just getting started with video meetings, and the first one should probably not be a Ceph-specific one
15:25:51 <imtiazc> I noticed that CERN is coming up with a strategy. I would imagine all large scale operators will have to come up with a plan on how to update CentOS. Is there enough interest in this?
15:25:57 <ttx> But... at the Foundation we are also working on a format for a short/recurring virtual event
15:26:02 <loan> 3- Maybe it could be interesting to post a wiki about Ceph scaling (like the Large Scale SIG wiki that you created recently)?
15:26:29 <ttx> so maybe we could also fit into that and get the Foundation promotion machine to help us assemble a crowd
15:26:30 <belmoreira> ttx I like 2, but only when these meetings have some traction and attendees... to avoid some disappointment when having other teams presenting
15:26:53 <belmoreira> imtiazc ++
15:26:57 <ttx> belmoreira: yes, we would need to have a number of presenters to seed the discussion
15:27:22 <ttx> loan: yes, once we start collecting answers we could totally document them on the wiki
15:27:40 <ttx> you all could actually help with that :)
15:28:02 <alistarle> ttx: for the wiki, maybe we can begin by filling in questions without answers, to launch the discussion?
15:29:08 <loan> maybe we misunderstood the purpose of these wikis
15:29:20 <ttx> imtiazc: I added that suggestion to https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:29:32 <ttx> in the sublist of things we would like to discuss but need key presenters for
15:30:01 <belmoreira> For reference, some links about the Ceph setup at CERN:
15:30:02 <belmoreira> https://www.youtube.com/watch?v=OopRMUYiY5E
15:30:02 <belmoreira> https://www.youtube.com/watch?v=21LF2LC58MM
15:30:02 <belmoreira> https://www.youtube.com/watch?v=0i7ew3XXb7Q
15:30:05 <ttx> alistarle: yes, that's how we've been using them. To list the questions and the answers
15:30:17 <ttx> sometimes just the questions
15:31:05 <ttx> alistarle: I can create a page for Ceph Q&A, seeding it with your set of questions
15:31:43 <alistarle> LGTM, and we can complete them by detailing things a little more if needed
15:31:49 <ttx> exactly
15:32:09 <ttx> OK, let me summarize all the ideas for the meeting logs
15:32:35 <ttx> #info The SG people start a thread on a specific Ceph integration question they have on the OpenStack mailing-list, see if we get the Ceph experts out of the woods and participating
15:33:07 <ttx> #info we plan to have in the near future a Ceph/OpenStack scaling session of the Large Scale SIG video meetings, and start gathering names of people to invite to that. Need key presenters first
15:33:33 <ttx> #action ttx to create a wiki page for Ceph-related questions and seed it with SG questions
15:34:07 <ttx> #info In the list of "sessions we would like to have but need to find the right people first", we also have CentOS upgrading
15:35:01 <ttx> #info ttx should see if that sort of discussion could fit in the "regular short virtual event" format that the Foundation wants to support
15:35:02 <leducflorian> thx for all those useful references belmoreira
15:35:24 <ttx> Alright, I think I captured everything
15:36:19 <ttx> #info If you have ideas of lightning talks that could seed a discussion, or ideas of sessions we should have but lack presenters for, please add them to https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:36:59 <ttx> Anything else on that topic?
15:37:34 <ttx> Business summary: We don't have answers, but we have a plan to gather them :)
15:38:09 <ttx> And yes, it would be good to dig into the trove of past summit presentations to see if anything useful turns up
15:38:15 <ttx> CERN but also elsewhere
15:38:42 <ttx> #link https://www.openstack.org/videos/search?search=Ceph
15:39:08 <ttx> I did have a look, but those sounded pretty specific and not general enough
15:39:10 <loan> Everything seems good. We are starting to gather information on our side about that because it's quite urgent for us (we are thinking about redesigning our Ceph infrastructure).
15:39:40 <loan> So maybe we will also have some results / presentations to share in the near future.
15:39:51 <ttx> https://www.openstack.org/videos/summits/sydney-2017/the-dos-and-donts-for-ceph-and-openstack looks promising
15:39:59 <ttx> And Florian is such a great speaker
15:40:53 <ttx> Alright, if nothing else, let's move on to the next topic
15:40:54 <loan> yeah, it seems very interesting :)
15:41:04 <ttx> #topic Progress/Blockers
15:41:10 <ttx> * Stage 1 - Configure (amorin)
15:41:14 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Configure
15:41:22 <ttx> Any progress/blocker to report, amorin?
15:41:31 <amorin> I haven't been able to move this topic forward
15:41:53 <ttx> * Stage 2 - Monitor (genekuo)
15:41:59 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Monitor
15:42:36 <ttx> gene is not around, I can report a few patches were posted for oslo.metrics, slow progress on the oslo.messaging integration front
15:42:43 <ttx> * Stage 3 - Scale Up (ttx)
15:42:47 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleUp
15:43:10 <ttx> Started that thread on the ML and will be digesting answers on the page soon
15:43:20 <ttx> * Stage 4/5 - Scale Out, upgrade & maintain (belmoreira)
15:43:23 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut
15:43:25 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain
15:43:40 <ttx> we saw progress already, any blocker? help needed?
15:43:59 <belmoreira> as mentioned before, updated the pages. Please have a look.
15:44:19 <ttx> ok then...
15:44:21 <ttx> #topic Next meeting
15:44:43 <ttx> Our next meeting should be in two weeks, Feb 24, but I'll likely not be available on Feb 24 as we have a Foundation all-hands meeting planned
15:44:54 <ttx> We can push back, or someone else can chair
15:45:17 <leducflorian> plenty of information to gather, thanks all
15:45:47 <ttx> we could also schedule our video meeting instead
15:45:49 <belmoreira> since you are the main driver of the group I would suggest postponing, especially if you start the zoom
15:46:49 <ttx> belmoreira: Is there a topic in your list you would feel comfortable running for our first video meeting the week after?
15:46:53 <ttx> like... March 3rd?
15:47:19 <ttx> Regions vs Cells would probably be a hit
15:47:35 <imtiazc> +1
15:47:50 <belmoreira> looks good
15:48:01 <ttx> that gives us plenty of time to make some noise about it
15:48:15 <ttx> also we'll be past the Chinese New Year holidays
15:49:07 <amorin> good for me
15:49:12 <ttx> #info Next meeting will be a Zoom meeting (usual time, link to be posted a few days before) with topic: "Regions vs. Cells" and Belmiro doing the intro talk
15:49:13 <belmoreira> we will need a place to have a zoom link. Also, ttx do you know if we can use an account from the foundation?
15:49:29 <ttx> I should be able to provide a link
15:49:42 <ttx> #info March 3rd, 15UTC
15:50:21 <ttx> That works for everyone?
15:50:34 <belmoreira> +1
15:50:44 <genekuo> oops sorry I missed the meeting
15:50:52 <genekuo> I will catch up with the logs
15:50:56 <ttx> genekuo: you missed all the fun!
15:51:16 <ttx> #topic Open discussion
15:51:35 <ttx> genekuo: anything specific to report? I mentioned your in-flight patches
15:51:53 <genekuo> yeah, I'm almost done with the functional tests on the oslo.messaging side
15:52:08 <genekuo> awaiting the new oslo.metrics release so that the bug can be fixed
15:52:17 <ttx> genekuo: do you run Ceph? We've been discussing collecting specific Ceph+OpenStack answers
15:52:47 <genekuo> We do run Ceph, but it's handled by other teams along with Cinder
15:52:59 <genekuo> We have a dedicated storage team
15:53:06 <ttx> hah, common pattern it seems
15:53:44 <genekuo> I can probably make the oslo.messaging patch ready for review next week
15:53:53 <ttx> genekuo: do you think they could be interested in joining a specific Ceph+OpenStack ops discussion in the near future?
15:54:07 <ttx> genekuo: great! But you also need to enjoy the holidays
15:54:13 <genekuo> hmm, I can ask them
15:54:29 <loan> it seems that Société Générale needs to let the dedicated Cinder team also handle Ceph :')
15:54:44 <ttx> loan: problem solved!
15:55:14 <ttx> before we close, anything else, anyone?
15:55:48 <genekuo> Ah, I would like to mention I've reviewed the Google doc and it LGTM
15:56:20 <loan> the Google doc?
15:56:26 <ttx> you mean Belmiro's? he posted the result in the wiki
15:56:32 <genekuo> yep
15:56:49 <genekuo> Thanks for the work :)
15:56:55 <ttx> loan: Belmiro had a Google doc with the proposed responses to the questions at #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut
15:56:56 <belmoreira> thanks genekuo, it's now in the wiki
15:57:16 <ttx> Alright, thanks everyone for participating and sharing your experience. Have a great week and we'll talk again in 3 weeks!
15:57:34 <genekuo> thank you all, sorry for being late
15:57:40 <ttx> Keep an eye on the mailing list for the video meeting announcement
15:57:52 <alistarle> thanks for all your inputs!
15:57:58 <alistarle> \o
15:58:03 <ttx> #endmeeting