15:00:01 <ttx> #startmeeting large_scale_sig
15:00:02 <openstack> Meeting started Wed Feb 10 15:00:01 2021 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:05 <openstack> The meeting name has been set to 'large_scale_sig'
15:00:07 <ttx> #topic Rollcall
15:00:13 <ttx> Who is here for the Large Scale SIG meeting?
15:00:29 <alistarle> o/
15:00:31 <ttx> pinging regulars: amorin belmoreira
15:00:45 <belmoreira> o/
15:00:48 <belmoreira> thanks ttx
15:00:50 <amorin> hello!
15:00:53 <amorin> thanks
15:01:02 <alistarle> @alistarle @leducflorian and @loan from Société Générale
15:01:10 <Loan> o/
15:01:12 <amorin> welcome guys :P
15:01:25 <ttx> Pretty sure genekuo will not be joining us as it's Chinese New Year
15:01:35 <ttx> We could almost do the meeting in French
15:01:53 <amorin> :)
15:01:54 <belmoreira> :) don't make fun of me
15:02:08 <leducflorian> Hello guys
15:02:09 <ttx> belmoreira: I suspect your french is better than my English
15:02:15 <ttx> Our agenda for today is at:
15:02:19 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-meeting
15:02:40 <ttx> First let's quickly review our action items from last meeting...
15:02:42 <belmoreira> ttx I won't bet on it
15:02:43 <ttx> #topic Action items review
15:02:57 <ttx> - ttx to revive the OSarchiver upstreaming effort
15:03:02 <ttx> Discussed at http://lists.openstack.org/pipermail/openstack-discuss/2021-January/020116.html
15:03:10 <ttx> I'm going to gauge interest from the OpenStack TC for option (3) vs. option (2)
15:03:13 <reedip> o/
15:03:18 <ttx> reedip: hi!
15:03:27 <ttx> #action ttx to contact OpenStack TC re: OSarchiver and see which option sounds best
15:03:43 <ttx> - ttx to start a "how many compute nodes in your typical cluster" discussion on the ML
15:03:57 <ttx> Done @ http://lists.openstack.org/pipermail/openstack-discuss/2021-January/020084.html
15:04:00 <amorin> ttx ack
15:04:07 <ttx> That triggered a lot of good insights, which I'll work on to summarize
15:04:19 <amorin> very good indeed
15:04:21 <ttx> #action ttx to summarize "how many compute nodes in your typical cluster" thread into the wiki
15:04:35 <reedip> That's pretty concise ...
15:04:46 <ttx> amorin: yes it did encourage me to start other such discussions on the ML
15:04:53 <ttx> - belmoreira to post first draft of the ScaleOut FAQ
15:05:07 <ttx> belmoreira: I saw some updates passing by?
15:05:25 <belmoreira> yes, updated the scaling out and upgrades sections
15:05:36 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut
15:05:43 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain
15:05:44 <belmoreira> let me know if you have any comments
15:05:59 <ttx> I only reviewed the first one, but it looks great
15:06:05 <belmoreira> and if you have any other questions that we can add
15:06:21 <ttx> Like good enough for us to point people asking questions to it now
15:06:37 <ttx> - all to think about 5-10min presentations to use in a video version of our SIG meeting
15:06:45 <ttx> Any idea on that yet?
15:06:51 <belmoreira> I have a few
15:07:16 <belmoreira> for example:
15:07:27 <belmoreira> Regions vs Cells
15:07:27 <belmoreira> Inception - Run your OpenStack control plane on top of the infrastructure that it manages
15:07:27 <ttx> (to summarize for newcomers, the idea was to once in a while do lightning talks in a video meeting version of this meeting, to touch a wider audience)
15:07:28 <belmoreira> Scaling out with Nova Cells
15:07:28 <belmoreira> RabbitMQ clusters for large OpenStack Infrastructures
15:07:30 <belmoreira> Upgrades in a Large Scale infrastructure
15:07:30 <belmoreira> Fine-grained scheduling in a Large Scale infrastructure
15:07:32 <belmoreira> OpenStack control plane DBs maintenance in a Large Scale infrastructure
15:07:32 <belmoreira> How to scale Glance, Cinder and other core projects in a Large Scale infrastructure
15:07:34 <belmoreira> Operations in a Large Scale infrastructure
15:07:34 <belmoreira> AVZs with multiple Cells
15:08:22 <amorin> sounds like we have plenty of subjects for the next few months :)
15:08:23 <ttx> That's a great list! I bet a lot of people would be interested in that.
15:08:55 <amorin> we can do at OVH something about operations
15:09:00 <ttx> We should rename the meeting "Fireside chat with Belmiro, tips from searching for the Higgs Boson"
15:09:02 <belmoreira> this can be a 10 min presentation, to trigger discussion
15:09:13 <amorin> we have mistral that is doing a self-healing of openstack
15:09:24 <imtiazc> Those are great topics @belmoreira
15:09:26 <belmoreira> ttx haha
15:09:39 <ttx> OK, let's discuss at the end of the meeting when to schedule the first one, and which topic to pick
15:09:39 <alistarle> Sure, I think few of them can interest us at SG
15:10:23 <ttx> Back to the agenda...
15:10:24 <belmoreira> please suggest also other topics, these are just few that we can talk about
15:10:31 <imtiazc> Does operations of a large scale infrastructure cover monitoring, i.e. getting logs and metrics?
15:10:48 <ttx> yes maybe I should create a wiki page or etherpad to collect topics
15:10:56 <ttx> so that people can asynchronously propose those
15:11:28 <belmoreira> imtiazc I was not thinking of monitoring. But definitely we should have a session about monitoring
15:12:19 <reedip> That was something I asked in the last meeting as well
15:12:25 <ttx> https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:12:27 <reedip> For monitoring
15:12:35 <ttx> belmoreira: please dump your list in there
15:12:48 <belmoreira> sure
15:13:39 <ttx> ok, moving on to...
15:13:41 <ttx> #topic Discussing Ceph in a future forum setting
15:13:57 <ttx> So we have several new folks joining today from Societe Generale
15:14:09 <ttx> They had questions around Ceph scaling, like:
15:14:15 <ttx> - Single cluster vs. multiple smaller clusters ?
15:14:21 <ttx> - Ceph cluster optimization (number of nodes, enabled features...) in a large scale cluster
15:14:27 <ttx> - Performance optimization for SSD/HDD use cases
15:14:33 <ttx> - Improve resilience through erasure coding implementation
15:14:41 <ttx> We did not really discuss Ceph yet
15:15:15 <ttx> But I know CERN is running it extensively, so we should be able to tap into the group's experience running that as well
15:15:23 <leducflorian> Thanks for the summary ttx
15:15:32 <ttx> Should we discuss those here today? Or schedule some Ceph-specific session in the near future and try to make some noise to get more people to show up?
15:16:13 <ttx> I guess it boils down to who feels comfortable talking about their Ceph scaling
15:16:19 <belmoreira> at CERN, ceph is not managed by the Cloud team. We have a storage team that looks after all the storage solutions available in the Organization
15:16:37 <ttx> Ah, ok, did not know that :)
15:16:55 <imtiazc> Workday's CEPH deployment footprint is growing as well. So far, it is loosely integrated with OpenStack.
15:16:55 <ttx> belmoreira: do you think some of them might be interested in sharing their experience there?
15:17:48 <belmoreira> probably we should point first to the presentations that they already gave in OpenStack Summits and Ceph Days
15:17:55 <ttx> I was thinking we could set up some specific Ceph/OpenStack session and try to get Ceph community people + operators versed in Ceph+OpenStack to discuss how they do things
15:17:58 <belmoreira> let me find some references
15:18:19 <ttx> since we probably don't have a critical mass of Ceph experience right here today
15:18:44 <amorin> at OVH, we have a specific team for that as well
15:18:48 <belmoreira> but maybe they will be open for a short presentation in one of our future sessions
15:18:54 <alistarle> Yes it seems in large deployments the Ceph part is usually separated from the OpenStack one
15:18:55 <amorin> I can ask if one of the guys could join and talk about it
15:19:02 <alistarle> Which seems to be a good idea actually
15:19:10 <ttx> maybe in the mean time we can start a few threads on the mailing-list to start collecting solutions, but also names of people to involve in that specific session
15:19:55 <leducflorian> today at Societe Generale the cloud team manages almost all of the stack (except for the Cinder API + iSCSI over Pure Storage solutions).
15:20:59 <ttx> FWIW we are interested in facilitating integration, so I'm happy to have the OpenStack large scale SIG host a Ceph-oriented discussion
15:21:38 <ttx> but if the teams are often separated, we'll likely need to do some prep work to get the right people
15:22:26 <ttx> So my proposal would be:
15:23:25 <ttx> 1- The SG people start a thread on a specific Ceph integration question they have on the OpenStack mailing-list, see if we get the Ceph experts out of the woods and participating
15:23:41 <imtiazc> There is another topic, not related to scaling, but being discussed a lot: what operating system should we continue with, for those of us who have been using CentOS?
15:24:26 <ttx> 2- Meanwhile, we plan to have in the near future a Ceph/OpenStack scaling session of the Large Scale SIG video meetings, and start gathering names of people to invite to that
15:24:58 <alistarle> Does the Large Scale SIG already have video meetings, or will it be an exceptional session?
15:25:29 <ttx> We are just getting started with video meetings, and the first one should probably not be a Ceph specific one
15:25:51 <imtiazc> I noticed that CERN is coming up with a strategy. I would imagine all large scale operators will have to come up with a plan on how to update CentOS. Is there enough interest in this?
15:25:57 <ttx> But... at the Foundation we are also working on a format for a short/recurring virtual event
15:26:02 <loan> 3- Maybe it could be interesting to post a wiki about Ceph scaling (as the Large Scale SIG wiki that you created recently)?
15:26:29 <ttx> so maybe we could also fit into that and get the Foundation promotion machine to help us assemble a crowd
15:26:30 <belmoreira> ttx I like 2, but only once these meetings have some traction and attendees... to avoid some disappointment when having other teams presenting
15:26:53 <belmoreira> imtiazc ++
15:26:57 <ttx> belmoreira: yes we would need to have a number of presenters to seed the discussion
15:27:22 <ttx> loan: yes, once we start collecting answers we could totally document them on the wiki
15:27:40 <ttx> you all could actually help with that :)
15:28:02 <alistarle> ttx: for the wiki, maybe we can begin by filling in questions without answers, to launch the discussion?
15:29:08 <loan> maybe we misunderstood the purpose of these wikis
15:29:20 <ttx> imtiazc: I added that suggestion to https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:29:32 <ttx> in the sublist of things we would like to discuss but need key presenters for
15:30:01 <belmoreira> For reference, some links about the ceph setup at CERN:
15:30:02 <belmoreira> https://www.youtube.com/watch?v=OopRMUYiY5E
15:30:02 <belmoreira> https://www.youtube.com/watch?v=21LF2LC58MM
15:30:02 <belmoreira> https://www.youtube.com/watch?v=0i7ew3XXb7Q
15:30:05 <ttx> alistarle: yes, that's how we've been using them. To list the questions and the answers
15:30:17 <ttx> sometimes just the questions
15:31:05 <ttx> alistarle: I can create a page for Ceph Q&A, seeding it with your set of questions
15:31:43 <alistarle> LGTM, and we can complete them by detailing it a little more if needed
15:31:49 <ttx> exactly
15:32:09 <ttx> OK, let me summarize all the ideas for the meeting logs
15:32:35 <ttx> #info The SG people start a thread on a specific Ceph integration question they have on the OpenStack mailing-list, see if we get the Ceph experts out of the woods and participating
15:33:07 <ttx> #info we plan to have in the near future a Ceph/OpenStack scaling session of the Large Scale SIG video meetings, and start gathering names of people to invite to that. need key presenters first
15:33:33 <ttx> #action ttx to create a wiki page for Ceph-related questions and seed it with SG questions
15:34:07 <ttx> #info In the list of "sessions we would like to have but need to find the right people first", we also have CentOS upgrading
15:35:01 <ttx> #info ttx should see if that sort of discussion could fit in the "regular short virtual event" format that the Foundation wants to support
15:35:02 <leducflorian> thx for all those useful references belmoreira
15:35:24 <ttx> Alright, I think I captured everything
15:36:19 <ttx> #info If you have ideas of lightning talks that could seed a discussion, or ideas of sessions we should have but lack presenters for, please add them to https://etherpad.opendev.org/p/large-scale-sig-lightning-talks
15:36:59 <ttx> Anything else on that topic?
15:37:34 <ttx> Business summary: We don't have answers, but we have a plan to gather them :)
15:38:09 <ttx> And yes, it would be good to dig into the trove of past summit presentations to see if anything useful turns up
15:38:15 <ttx> CERN but also elsewhere
15:38:42 <ttx> #link https://www.openstack.org/videos/search?search=Ceph
15:39:08 <ttx> I did have a look, but those sounded pretty specific and not general enough
15:39:10 <loan> Everything seems good. We are starting to gather information on our side because it's quite urgent for us (thinking about redesigning our Ceph infrastructure).
15:39:40 <loan> So maybe we will also have some results / presentations to share in the near future.
15:39:51 <ttx> https://www.openstack.org/videos/summits/sydney-2017/the-dos-and-donts-for-ceph-and-openstack looks promising
15:39:59 <ttx> And Florian is such a great speaker
15:40:53 <ttx> Alright, if nothing else, let's move on to next topic
15:40:54 <loan> yeah, it seems very interesting :)
15:41:04 <ttx> #topic Progress/Blockers
15:41:10 <ttx> * Stage 1 - Configure (amorin)
15:41:14 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Configure
15:41:22 <ttx> Any progress/blocker to report, amorin ?
15:41:31 <amorin> I havent been able to move this topic forward
15:41:53 <ttx> * Stage 2 - Monitor (genekuo)
15:41:59 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Monitor
15:42:36 <ttx> gene is not around, I can report a few patches were posted for oslo.metrics, slow progress on the oslo.messaging integration front
15:42:43 <ttx> * Stage 3 - Scale Up (ttx)
15:42:47 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleUp
15:43:10 <ttx> Started that thread on the ML and will be digesting answers on the page soon
15:43:20 <ttx> * Stage 4/5 - Scale Out, upgrade & maintain (belmoreira)
15:43:23 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut
15:43:25 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain
15:43:40 <ttx> we saw progress already, any blocker? help needed?
15:43:59 <belmoreira> as mentioned before, updated the pages. Please have a look.
15:44:19 <ttx> ok then...
15:44:21 <ttx> #topic Next meeting
15:44:43 <ttx> Our next meeting should be in two weeks, Feb 24, but I'll likely not be available then as we have a Foundation all-hands meeting planned
15:44:54 <ttx> We can push back, or someone else can chair
15:45:17 <leducflorian> plenty of information to gather thanks all
15:45:47 <ttx> we could also schedule our video meeting instead
15:45:49 <belmoreira> since you are the main driver of the group I would suggest postponing, especially if you start the zoom
15:46:49 <ttx> belmoreira: Is there a topic in your list you would feel comfortable running for our first video meeting the week after?
15:46:53 <ttx> like... March 3rd?
15:47:19 <ttx> Regions vs Cells would probably be a hit
15:47:35 <imtiazc> +1
15:47:50 <belmoreira> looks good
15:48:01 <ttx> that gives us plenty of time to make some noise about it
15:48:15 <ttx> also we'll be past the Chinese new year holidays
15:49:07 <amorin> good for me
15:49:12 <ttx> #info Next meeting will be a Zoom meeting (usual time, link to be posted a few days before) with topic: "Regions vs. Cells" and Belmiro doing the intro talk
15:49:13 <belmoreira> we will need a place to have a zoom link. Also, ttx do you know if we can use an account from the foundation?
15:49:29 <ttx> I should be able to provide a link
15:49:42 <ttx> #info March 3rd, 15UTC
15:50:21 <ttx> That works for everyone?
15:50:34 <belmoreira> +1
15:50:44 <genekuo> oops sorry I missed the meeting
15:50:52 <genekuo> I will catch up with the logs
15:50:56 <ttx> genekuo: you missed all the fun!
15:51:16 <ttx> #topic Open discussion
15:51:35 <ttx> genekuo: anything specific to report? I mentioned your in-flight patches
15:51:53 <genekuo> yeah, I'm almost done with the functional tests on the oslo.messaging side
15:52:08 <genekuo> awaiting the new release of oslo.metrics so that the bug can be fixed
15:52:17 <ttx> genekuo: do you run Ceph? We've been discussing collecting specific Ceph+OpenStack answers
15:52:47 <genekuo> We do run Ceph, but it's handled by another team along with cinder
15:52:59 <genekuo> We have a dedicated storage team
15:53:06 <ttx> hah, common pattern it seems
15:53:44 <genekuo> I can probably make the oslo.messaging patch ready to review in next week
15:53:53 <ttx> genekuo: do you think they could be interested in joining a specific Ceph+OpenStack ops discussion in the near future?
15:54:07 <ttx> genekuo: great! But you also need to enjoy the holidays
15:54:13 <genekuo> hmm, I can ask them
15:54:29 <loan> it seems that Societe Generale needs to let the dedicated Cinder team also handle Ceph :')
15:54:44 <ttx> loan: problem solved!
15:55:14 <ttx> before we close, anything else, anyone?
15:55:48 <genekuo> Ah, I would like to mention I've reviewed the Google docs and it LGTM
15:56:20 <loan> the Google docs?
15:56:26 <ttx> you mean Belmiro's? he posted the result in the wiki
15:56:32 <genekuo> yep
15:56:49 <genekuo> Thanks for the work :)
15:56:55 <ttx> loan: belmiro had a google doc with the proposed responses to the #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut questions
15:56:56 <belmoreira> thanks genekuo, it's now in the wiki
15:57:16 <ttx> Alright, thanks everyone for participating and sharing your experience. Have a great week and we'll talk again in 3 weeks!
15:57:34 <genekuo> thank you all, sorry for being late
15:57:40 <ttx> Keep an eye on the mailing list for the video meeting announcement
15:57:52 <alistarle> thanks for all your inputs !
15:57:58 <alistarle> \o
15:58:03 <ttx> #endmeeting