09:00:01 <ttx> #startmeeting large_scale_sig
09:00:01 <opendevmeet> Meeting started Wed Jun 19 09:00:01 2024 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:01 <opendevmeet> The meeting name has been set to 'large_scale_sig'
09:00:20 <ttx> Hi everyone, welcome to our monthly Large Scale SIG meeting!
09:00:24 <amorin> o/
09:00:26 <ttx> #topic Rollcall
09:00:48 <ttx> ping felix.huettner songwenping
09:01:23 <ttx> Our agenda is at:
09:01:27 <ttx> #link  https://etherpad.opendev.org/p/large-scale-sig-meeting
09:02:05 <ttx> Waiting a few minutes in case other participants join late
09:02:06 <songwenping> \o/
09:02:18 <songwenping> hi ttx
09:02:18 <ttx> songwenping: hi! Glad you could make it
09:03:03 <ttx> OK, let's get started
09:03:11 <ttx> #topic Brainstorm OpenInfra Live next episode ideas
09:03:46 <ttx> In previous meetings we discussed a potential new episode, after unsuccessfully trying to crowdsource one from the rest of the community
09:04:14 <ttx> amorin: we were considering one around infrastructure for GPUs, did you manage to convince anyone at OVH around that?
09:04:36 <amorin> I completely forgot to talk about it unfortunately, sorry for this
09:04:49 <amorin> so I will ask, adding in my local todo right now
09:05:07 <ttx> I was wondering if we could get https://www.nexgencloud.com/ to talk
09:05:34 <ttx> They are one of the biggest buyers of GPUs recently and run an openstack cloud
09:05:40 <amorin> what is the idea in your mind?
09:05:49 <amorin> how openstack and gpu can work together?
09:05:57 <amorin> or how is it consumed by customers?
09:06:06 <amorin> or the usage of GPU in cloud?
09:06:29 <ttx> Specific challenges in providing a large scale GPU cloud, I guess
09:06:42 <ttx> identifying any gap
09:06:56 <songwenping> GPU management? our product supports many kinds of GPUs, like
09:07:02 <ttx> trying to anticipate questions the next GPU cloud deployer may have
09:07:05 <amorin> so, more related to infrastructure than customer use cases
09:07:31 <ttx> yeah... Would not mind some shiny workload example too, but that's a bit orthogonal to our SIG purpose
09:07:37 <songwenping> A100, A40, V100, P100 and so on.
09:07:53 <ttx> Could be more of a panel thing
09:08:08 <amorin> ok, I have a guy for this in the team, will ask if he is willing to join/talk about it
09:08:25 <ttx> Experience operating an OpenStack GPU cloud these days
09:08:48 <ttx> cool. We'll reach out to Nexgen to see if they are interested
09:08:57 <ttx> and then open it up to others
09:09:11 <ttx> probably something we'd do in ~October
09:09:31 <ttx> September we'll be busy at OpenInfra Summit Asia
09:09:33 <amorin> ack, so we have time to refine this, that's good
09:09:39 <ttx> and July-August will be tricky
09:10:02 <ttx> #agreed let's try to do a panel episode around Experience operating an OpenStack GPU cloud
09:10:21 <ttx> #action amorin to confirm an OVHCloud speaker
09:10:37 <amorin> ack
09:10:38 <ttx> #action ttx to see if someone from nexgen would be interested
09:10:52 <ttx> #info targeting October timeframe
09:11:21 <amorin> maybe have sylvain bauza in the talk as well? he is involved in GPU and openstack a lot
09:11:30 <ttx> yeah that's a good idea...
09:11:47 <ttx> #info Sylvain Bauza could bring the development angle
09:12:11 <ttx> I'll give it some extra thought and pull Allison in for extra ideas
09:12:22 <ttx> moving on to next topic
09:12:23 <ttx> #topic Large scale doc
09:12:39 <ttx> songwenping sent a great report to the mailing-list
09:12:51 <ttx> #link https://etherpad.opendev.org/p/large-scale-inspur
09:12:59 <ttx> There were some open followup questions
09:13:03 <amorin> yes, that's great, thanks!
09:13:16 <ttx> mnaser asked "How did you adjust the max number of conns for RabbitMQ and for the relay I assume you used https://docs.ovn.org/en/latest/tutorials/ovn-ovsdb-relay.html ?"
09:13:49 <ttx> then amorin had questions too
09:14:44 <amorin> yup, I am eager to learn more about what you wanted to achieve and what you exactly did to fix your deployment
09:16:02 <songwenping> amorin, good question. we want to manage more nodes, as there is a big requirement from customers.
09:16:35 <ttx> songwenping: did you see those questions on the mailing-list? ideally you would respond there so that everyone benefits
09:16:43 <songwenping> we use k8s infrastructure to deploy openstack
09:17:13 <songwenping> sorry, maybe i missed the mail
09:17:20 <amorin> e.g. you mentioned booting 3k instances and having scheduler / placement issues. Is it because you request those 3k instances in one shot?
09:19:42 <ttx> songwenping: still here?
09:20:00 <songwenping> yeah
09:20:13 <songwenping> i am looking for the mail.
09:20:26 <songwenping> but still cannot find it :(
09:20:36 <ttx> ah, let me link
09:21:00 <ttx> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/ISIG5TG4DYCTDTP4ZJNJFYCSUVYMX5BT/
09:21:19 <ttx> you will see both questions there ^
09:21:33 <songwenping> amorin, yes, we send requests to create 3k instances in one shot.
09:22:07 <amorin> that makes sense then, that's an unusual use case, amazing!
09:22:31 <ttx> ideally you would reply by email to the mailing-list again, addressing mnaser's and amorin's questions
09:22:42 <ttx> that way everyone else can see the answers
09:22:45 <amorin> yes, sounds good to me also
09:23:11 <ttx> songwenping: would that work for you?
09:23:50 <songwenping> ttx, could you please forward the mail to me?
09:24:10 <amorin> it's weird you did not receive it, maybe check your spam box?
09:24:19 <ttx> can you see them at the link I just posted?
09:24:29 <ttx> https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/ISIG5TG4DYCTDTP4ZJNJFYCSUVYMX5BT/
09:24:49 <songwenping> i can see at the link
09:24:55 <ttx> ok perfect
09:25:08 <ttx> #action songwenping to reply to the questions on the mailing-list
09:25:33 <ttx> amorin: is there anything new in the report that could be documented in the large-scale sig doc?
09:26:16 <songwenping> but i cannot reply at the link.
09:26:38 <amorin> I believe yes, we can have something new to add to the doc
09:27:19 <ttx> I'll forward you both emails now
09:27:25 <amorin> however, we need to explain your use-case correctly also, because, e.g. max_connections = 100000 is unusual and maybe counterproductive
09:27:33 <songwenping> ttx, thanks.
09:28:03 <amorin> the rabbit config you did also, I need to understand the details of it
09:28:23 <amorin> maybe your situation could also be improved if you switch to quorum queues
09:28:33 <amorin> I don't know for now to be honest
09:28:43 <amorin> let's continue the mail thread
09:29:35 <ttx> OK emails forwarded... let me know if you receive them :)
09:29:38 <songwenping> amorin, we do not use quorum queues.
09:31:08 <ttx> OK let's continue the discussion on the mailing-list and we'll see if we can extract a few things from the story to add to the doc
09:31:14 <amorin> yup
09:31:16 <ttx> #topic Next meeting(s)
09:31:16 <songwenping> ttx, not received yet.
09:31:52 <ttx> Normally the next meeting would be on July 17, but I won't be around. Should we skip for summer and do next one September 18?
09:32:18 <ttx> songwenping: sent to the inspur.com address you used to post
09:32:21 <songwenping> amorin, i will add the details of the rabbit optimization to the etherpad.
09:32:31 <ttx> great!
09:32:59 <amorin> thanks
09:33:07 <songwenping> ttx, received just now, thanks.
09:33:12 <amorin> july 17 I will also be off
09:33:38 <ttx> OK so that one is a skip for sure
09:33:53 <amorin> we can maybe skip meetings this summer, agreed
09:34:23 <ttx> We could keep the August 21 one if you are around
09:35:10 <amorin> I should be there
09:35:19 <ttx> OK let's keep it on the agenda
09:35:35 <ttx> #info next meeting, August 21 on IRC
09:35:49 <ttx> #topic Open discussion
09:36:03 <ttx> Anything else we should cover today?
09:36:14 <amorin> maybe stan you were there to talk about something?
09:38:22 <ttx> stan: still around?
09:39:31 <amorin> nothing more on my side
09:39:40 <ttx> alright then
09:39:44 <ttx> #endmeeting