09:00:01 #startmeeting large_scale_sig
09:00:01 Meeting started Wed Jun 19 09:00:01 2024 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:01 The meeting name has been set to 'large_scale_sig'
09:00:20 Hi everyone, welcome to our monthly Large Scale SIG meeting!
09:00:24 o/
09:00:26 #topic Rollcall
09:00:48 ping felix.huettner songwenping
09:01:23 Our agenda is at:
09:01:27 #link https://etherpad.opendev.org/p/large-scale-sig-meeting
09:02:05 Waiting a few minutes in case other participants join late
09:02:06 \o/
09:02:18 hi ttx
09:02:18 songwenping: hi! Glad you could make it
09:03:03 OK, let's get started
09:03:11 #topic Brainstorm OpenInfra Live next episode ideas
09:03:46 In previous meetings we discussed a potential new episode, after unsuccessfully trying to crowdsource one from the rest of the community
09:04:14 amorin: we were considering one around infrastructure for GPUs, did you manage to convince anyone at OVH around that?
09:04:36 I completely forgot to talk about it unfortunately, sorry for this
09:04:49 so I will ask, adding in my local todo right now
09:05:07 I was wondering if we could get https://www.nexgencloud.com/ to talk
09:05:34 They are one of the biggest buyers of GPUs recently and run an openstack cloud
09:05:40 what is the idea in your mind?
09:05:49 how openstack and gpu can work together?
09:05:57 or how is it consumed by customers?
09:06:06 or the usage of GPU in cloud?
09:06:29 Specific challenges in providing a large scale GPU cloud, I guess
09:06:42 identifying any gap
09:06:56 GPU management? our product adapt many kinds of GPUs, like A
09:07:02 trying to anticipate questions the next GPU cloud deployer may have
09:07:05 so, so more related to infrastructure than customer use cases
09:07:31 yeah... Would not mind some shiny workload example too, but that's a bit orthogonal to our SIG purpose
09:07:37 A100, A40, V100, P100 and so on.
09:07:53 Could be more of a panel thing
09:08:08 ok, I have a guy for this in the team, will ask if he is willing to join/talk about it
09:08:25 Experience operating an OpenStack GPU cloud those days
09:08:48 cool. We'll reach out to Nexgen see if they are interested
09:08:57 and then open it up to others
09:09:11 probably something we'd do in ~October
09:09:31 September we'll be busy at OpenInfra Summit Asia
09:09:33 ack, so we have time to refine this, that's good
09:09:39 and July-August will be tricky
09:10:02 #agreed let's try to do a panel episode around Experience operating an OpenStack GPU cloud
09:10:21 #action amorin to confirm an OVHCloud speaker
09:10:37 ack
09:10:38 #action ttx to see if someone from nexgen would be interested
09:10:52 #info targeting October timeframe
09:11:21 maybe have sylvain bauza in the talk as well? he is involved in GPU and openstack a lot
09:11:30 yeah that's a good idea...
09:11:47 #info Sylvain Bauza could bring the development angle
09:12:11 I'll give it some extra thought and pull Allison in for extra ideas
09:12:22 moving on to next topic
09:12:23 #topic Large scale doc
09:12:39 songwenping sent a great report to the mailing-list
09:12:51 #link https://etherpad.opendev.org/p/large-scale-inspur
09:12:59 There were some open followup questions
09:13:03 yes, that's great, thanks!
09:13:16 mnaser asked "How did you adjust the max number of conns for RabbitMQ and for the relay I assume you used https://docs.ovn.org/en/latest/tutorials/ovn-ovsdb-relay.html ?"
09:13:49 than amorin had questions too
09:13:52 then*
09:14:44 yup, I am eager to learn more about what you wanted to achieve and what you exactly did to fix your deployment
09:16:02 amorin, good question. we want to manage more nodes as there are big requirement for customer.
09:16:35 songwenping: did you see those questions on the mailing-list? ideally you would respond there so that everyone benefits
09:16:43 we use k8s infrastructure to deploy openstack
09:17:13 sorry, maybe i miss the mail
09:17:20 e.g. you mentioned booting 3k instances and having scheduler / placement issue. Is it because you ask those 3k instance in one shot?
09:19:42 songwenping: still here?
09:20:00 yeah
09:20:13 i am finding the mail.
09:20:26 but still not find :(
09:20:36 ah, let me link
09:21:00 #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/ISIG5TG4DYCTDTP4ZJNJFYCSUVYMX5BT/
09:21:19 you will see both questions there ^
09:21:33 amorin, yes, we send requests to create 3k instances in one shot.
09:22:07 that makes sense then, that's an unusual use case, amazing!
09:22:31 ideally you would reply by email to the mailing-list again, addressing mnaser's and amorin's questions
09:22:42 that way everyone else can see the answers
09:22:45 yes, sounds good to me also
09:23:11 songwenping: would that work for you?
09:23:50 ttx, could you please forward the mail to me?
09:24:10 it's weird you did not receive it, maybe check your spam box?
09:24:19 can you see them at the link I just posted?
09:24:29 https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/ISIG5TG4DYCTDTP4ZJNJFYCSUVYMX5BT/
09:24:49 i can see at the link
09:24:55 ok perfect
09:25:08 #action songwenping to reply to the questions on the mailing-list
09:25:33 amorin: is there anything new in the report that could be documented in the large-scale sig doc?
09:26:16 but i cannot reply at the link.
09:26:38 I believe yes, we can have something new to add to the doc
09:27:19 I'll forward you both emails now
09:27:25 however, we need to explain your use-case correctly also, because, e.g. max_connections = 100000 is unusual and maybe counter productive
09:27:33 ttx, thanks.
09:28:03 the rabbit config you did also, I need to understand the details of it
09:28:23 maybe your situation could also be improved if you switch to quorum queues
09:28:33 I don't know for now to be honest
09:28:43 let's continue the mail thread
09:29:35 OK emails forwarded... let me know if you receive them :)
09:29:38 amorin, we do not use quorum queues.
09:31:08 OK let's continue the discussion on the mailing-list and we'll see if we can extract a few things from the story to add to the doc
09:31:14 yup
09:31:16 #topic Next meeting(s)
09:31:16 ttx, exactly not yet receive.
09:31:52 Normally the next meeting would be on July 17, but I won't be around. Should we skip for summer and do next one September 18?
09:32:18 songwenping: sent to the inspur.com address you used to post
09:32:21 amorin, i will complete the rabbit detail optimization on the etherpad.
09:32:31 great!
09:32:59 thanks
09:33:07 ttx, received just now, thanks.
09:33:12 july 17 I will also be off
09:33:38 OK so that one is a skip for sure
09:33:53 we can maybe skip meetings this summer, agree
09:34:23 We could keep the August 21 one if you are around
09:35:10 I should be there
09:35:19 OK let's keep it on the agenda
09:35:35 #info next meeting, August 21 on IRC
09:35:49 #topic Open discussion
09:36:03 Anything else we should cover today?
09:36:14 maybe stan you were there to talk about something?
09:38:22 stan: still around?
09:39:31 nothing more on my side
09:39:40 alright then
09:39:44 #endmeeting