09:00:01 <ttx> #startmeeting large_scale_sig
09:00:01 <opendevmeet> Meeting started Wed Jun 19 09:00:01 2024 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:01 <opendevmeet> The meeting name has been set to 'large_scale_sig'
09:00:20 <ttx> Hi everyone, welcome to our monthly Large Scale SIG meeting!
09:00:24 <amorin> o/
09:00:26 <ttx> #topic Rollcall
09:00:48 <ttx> ping felix.huettner songwenping
09:01:23 <ttx> Our agenda is at:
09:01:27 <ttx> #link https://etherpad.opendev.org/p/large-scale-sig-meeting
09:02:05 <ttx> Waiting a few minutes in case other participants join late
09:02:06 <songwenping> \o/
09:02:18 <songwenping> hi ttx
09:02:18 <ttx> songwenping: hi! Glad you could make it
09:03:03 <ttx> OK, let's get started
09:03:11 <ttx> #topic Brainstorm OpenInfra Live next episode ideas
09:03:46 <ttx> In previous meetings we discussed a potential new episode, after unsuccessfully trying to crowdsource one from the rest of the community
09:04:14 <ttx> amorin: we were considering one around infrastructure for GPUs, did you manage to convince anyone at OVH around that?
09:04:36 <amorin> I completely forgot to talk about it unfortunately, sorry for this
09:04:49 <amorin> so I will ask, adding in my local todo right now
09:05:07 <ttx> I was wondering if we could get https://www.nexgencloud.com/ to talk
09:05:34 <ttx> They are one of the biggest buyers of GPUs recently and run an openstack cloud
09:05:40 <amorin> what is the idea in your mind?
09:05:49 <amorin> how openstack and gpu can work together?
09:05:57 <amorin> or how is it consumed by customers?
09:06:06 <amorin> or the usage of GPU in cloud?
09:06:29 <ttx> Specific challenges in providing a large scale GPU cloud, I guess
09:06:42 <ttx> identifying any gap
09:06:56 <songwenping> GPU management? our product adapt many kinds of GPUs, like A
09:07:02 <ttx> trying to anticipate questions the next GPU cloud deployer may have
09:07:05 <amorin> so, so more related to infrastructure than customer use cases
09:07:31 <ttx> yeah... Would not mind some shiny workload example too, but that's a bit orthogonal to our SIG purpose
09:07:37 <songwenping> A100, A40, V100, P100 and so on.
09:07:53 <ttx> Could be more of a panel thing
09:08:08 <amorin> ok, I have a guy for this in the team, will ask if he is willing to join/talk about it
09:08:25 <ttx> Experience operating an OpenStack GPU cloud those days
09:08:48 <ttx> cool. We'll reach out to Nexgen see if they are interested
09:08:57 <ttx> and then open it up to others
09:09:11 <ttx> probably something we'd do in ~October
09:09:31 <ttx> September we'll be busy at OpenInfra Summit Asia
09:09:33 <amorin> ack, so we have time to refine this, that's good
09:09:39 <ttx> and July-August will be tricky
09:10:02 <ttx> #agreed let's try to do a panel episode around Experience operating an OpenStack GPU cloud
09:10:21 <ttx> #action amorin to confirm an OVHCloud speaker
09:10:37 <amorin> ack
09:10:38 <ttx> #action ttx to see if someone from nexgen would be interested
09:10:52 <ttx> #info targeting October timeframe
09:11:21 <amorin> maybe have sylvain bauza in the talk as well? he is involved in GPU and openstack a lot
09:11:30 <ttx> yeah that's a good idea...
09:11:47 <ttx> #info Sylvain Bauza could bring the development angle
09:12:11 <ttx> I'll give it some extra thought and pull Allison in for extra ideas
09:12:22 <ttx> moving on to next topic
09:12:23 <ttx> #topic Large scale doc
09:12:39 <ttx> songwenping sent a great report to the mailing-list
09:12:51 <ttx> #link https://etherpad.opendev.org/p/large-scale-inspur
09:12:59 <ttx> There were some open followup questions
09:13:03 <amorin> yes, that's great, thanks!
09:13:16 <ttx> mnaser asked "How did you adjust the max number of conns for RabbitMQ and for the relay I assume you used https://docs.ovn.org/en/latest/tutorials/ovn-ovsdb-relay.html ?"
09:13:49 <ttx> than amorin had questions too
09:13:52 <ttx> then*
09:14:44 <amorin> yup, I am eager to learn more about what you wanted to achieve and what you exactly did to fix your deployment
09:16:02 <songwenping> amorin, good question. we want to manage more nodes as there are big requirement for customer.
09:16:35 <ttx> songwenping: did you see those questions on the mailing-list? ideally you would respond there so that everyone benefits
09:16:43 <songwenping> we use k8s infrastructure to deploy openstack
09:17:13 <songwenping> sorry, maybe i miss the mail
09:17:20 <amorin> e.g. you mentioned booting 3k instances and having scheduler / placement issue. Is it because you ask those 3k instance in one shot?
09:19:42 <ttx> songwenping: still here?
09:20:00 <songwenping> yeah
09:20:13 <songwenping> i am finding the mail.
09:20:26 <songwenping> but still not find :(
09:20:36 <ttx> ah, let me link
09:21:00 <ttx> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/ISIG5TG4DYCTDTP4ZJNJFYCSUVYMX5BT/
09:21:19 <ttx> you will see both questions there ^
09:21:33 <songwenping> amorin, yes, we send requests to create 3k instances in one shot.
09:22:07 <amorin> that makes sense then, that's an unusual use case, amazing!
09:22:31 <ttx> ideally you would reply by email to the mailing-list again, addressing mnaser's and amorin's questions
09:22:42 <ttx> that way everyone else can see the answers
09:22:45 <amorin> yes, sounds good to me also
09:23:11 <ttx> songwenping: would that work for you?
09:23:50 <songwenping> ttx, could you please forward the mail to me?
09:24:10 <amorin> it's weird you did not receive it, maybe check your spam box?
09:24:19 <ttx> can you see them at the link I just posted?
09:24:29 <ttx> https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/ISIG5TG4DYCTDTP4ZJNJFYCSUVYMX5BT/
09:24:49 <songwenping> i can see at the link
09:24:55 <ttx> ok perfect
09:25:08 <ttx> #action songwenping to reply to the questions on the mailing-list
09:25:33 <ttx> amorin: is there anything new in the report that could be documented in the large-scale sig doc?
09:26:16 <songwenping> but i cannot reply at the link.
09:26:38 <amorin> I believe yes, we can have something new to add to the doc
09:27:19 <ttx> I'll forward you both emails now
09:27:25 <amorin> however, we need to explain your use-case correctly also, because, e.g. max_connections = 100000 is unusual and maybe counterproductive
09:27:33 <songwenping> ttx, thanks.
09:28:03 <amorin> the rabbit config you did also, I need to understand the details of it
09:28:23 <amorin> maybe your situation could also be improved if you switch to quorum queues
09:28:33 <amorin> I don't know for now to be honest
09:28:43 <amorin> let's continue the mail thread
09:29:35 <ttx> OK emails forwarded... let me know if you receive them :)
09:29:38 <songwenping> amorin, we do not use quorum queues.
09:31:08 <ttx> OK let's continue the discussion on the mailing-list and we'll see if we can extract a few things from the story to add to the doc
09:31:14 <amorin> yup
09:31:16 <ttx> #topic Next meeting(s)
09:31:16 <songwenping> ttx, exactly not yet receive.
09:31:52 <ttx> Normally the next meeting would be on July 17, but I won't be around. Should we skip for summer and do next one September 18?
09:32:18 <ttx> songwenping: sent to the inspur.com address you used to post
09:32:21 <songwenping> amorin, i will complete the rabbit detail optimization on the etherpad.
09:32:31 <ttx> great!
09:32:59 <amorin> thanks
09:33:07 <songwenping> ttx, received just now, thanks.
09:33:12 <amorin> july 17 I will also be off
09:33:38 <ttx> OK so that one is a skip for sure
09:33:53 <amorin> we can maybe skip meetings this summer, agree
09:34:23 <ttx> We could keep the August 21 one if you are around
09:35:10 <amorin> I should be there
09:35:19 <ttx> OK let's keep it on the agenda
09:35:35 <ttx> #info next meeting, August 21 on IRC
09:35:49 <ttx> #topic Open discussion
09:36:03 <ttx> Anything else we should cover today?
09:36:14 <amorin> maybe stan you were there to talk about something?
09:38:22 <ttx> stan: still around?
09:39:31 <amorin> nothing more on my side
09:39:40 <ttx> alright then
09:39:44 <ttx> #endmeeting