09:00:20 #startmeeting large_scale_sig
09:00:21 Meeting started Wed Jan 15 09:00:20 2020 UTC and is due to finish in 60 minutes. The chair is belmoreira. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:25 The meeting name has been set to 'large_scale_sig'
09:00:34 hello!
09:00:36 Hello everyone. Welcome to the Large Scale SIG meeting.
09:00:47 hello!
09:00:47 #topic Rollcall
09:00:52 o/
09:01:01 o/
09:01:03 o/
09:01:30 o/
09:02:23 great. let's start with our first topic
09:02:30 #topic "Scaling within one cluster" goal
09:02:38 #link https://etherpad.openstack.org/p/large-scale-sig-cluster-scaling
09:03:30 There's no much progress in the etherpad
09:03:59 Afraid not. Over here we are making slow progress towards improving instrumentation.
09:04:43 oslo.metrics blueprint is a great idea
09:05:36 let's wait for masahito feedback in the draft blueprint
09:05:38 I think it ties in well with a fishbowl session from Shanghai, advocating exposing metrics from each service.
09:06:49 But this only focus in collecting metrics
09:06:58 I'm sure we can expand this more
09:07:41 I'm thinking about interaction with other services. Like Placement and how it's deployed
09:07:55 does it make sense?
09:08:28 belmoreira: can you elaborate?
09:10:05 in my opinion scaling one nova cell is more than just rabbit and DB. For example, Nova interacts with placement and neutron and in my view we should focus in those scaling limits as well.
09:11:27 also, being only one cell, how the different components are deployed (shared rabbit or not - for example) will affect the scalability of the deployment
09:11:39 Sorry to be late. My IRC client doesn't work well :-(
09:11:48 hi masahito
09:11:57 welcome masahito
09:12:03 hello o/
09:12:26 oneswig does it make sense?
09:12:41 belmoreira: is there any study on the growth rate of placement as a function of number of hypervisors?
It could be polynomial (ie really bad)...
09:13:01 belmoreira: makes sense to me.
09:13:11 (hi, sorry for being late) I think it is definitely relevant to consider the placement and network
09:14:15 for example: placement is linear, but the number of requests that it can handle will change considering how it's deployed and options
09:14:38 #link discussion from Shanghai on common monitoring via oslo https://etherpad.openstack.org/p/PVG-bring-your-crazy-idea
09:15:21 unfortunately there are no actions or follow-up in that etherpad
09:16:13 it's not that crazy to me. It would be very cool.
09:17:08 masahito do you have any update regarding the oslo.metric blueprint
09:17:34 Not much update.
09:17:47 because of some outage in our cluster :-(
09:18:27 But I want to finish to write the first draft by end of January.
09:18:41 was thinking of your talk about LINE yesterday - had a rabbitmq issue on a system here
09:19:23 Thanks masahito. Let us know.
09:19:49 oneswig related with scalability?
09:20:11 not this time, only moderate scale
09:20:21 would have been timely otherwise :-)
09:20:38 anyone as something else related to this topic?
09:21:53 Let's continue to update the etherpad with our experiences in scaling one nova cell.
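[Editor's note: the oslo.metrics blueprint discussed above was only a draft at the time of this meeting, so its real API is not shown here. The following is a hypothetical, self-contained Python sketch of the underlying idea — each service wraps its RPC client so it can expose per-method call counts and latencies for a monitoring agent to scrape. All class and method names are illustrative.]

```python
# Hypothetical sketch of per-service RPC instrumentation, in the spirit of
# the oslo.metrics blueprint discussed above. Not the real oslo.metrics API.
import time
from collections import defaultdict


class RpcMetrics:
    """Accumulates call counts and cumulative latency per RPC method."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def record(self, method, seconds):
        self.calls[method] += 1
        self.total_seconds[method] += seconds

    def snapshot(self):
        # The kind of data a monitoring agent could scrape from the service.
        return {m: {"count": self.calls[m],
                    "avg_s": self.total_seconds[m] / self.calls[m]}
                for m in self.calls}


class InstrumentedClient:
    """Wraps any RPC-client-like object and times every call it makes."""

    def __init__(self, client, metrics):
        self._client = client
        self._metrics = metrics

    def call(self, method, *args, **kwargs):
        start = time.monotonic()
        try:
            return getattr(self._client, method)(*args, **kwargs)
        finally:
            self._metrics.record(method, time.monotonic() - start)
```

The wrapper records latency even when the underlying call raises, which matters at scale where failure paths are exactly what operators need visibility into.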
09:22:06 moving to the next topic in the agenda
09:22:16 #topic Progress on "Documenting large scale operations" goal
09:22:31 amorin started a thread in the mailing list to document the particular configuration that makes important for large scale deployments
09:22:34 thanks
09:22:40 yes
09:22:40 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011820.html
09:22:44 #link https://etherpad.openstack.org/p/large-scale-sig-documentation
09:23:02 I got an answer from slawek (neutron PTL)
09:23:19 here created a neutron bug: https://bugs.launchpad.net/neutron/+bug/1858419
09:23:19 Launchpad bug 1858419 in neutron "Docs needed for tunables at large scale" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
09:23:29 just like mriedman did for nova
09:23:42 so we can collect the tunings over there
09:23:56 they also notify us about rabbit and DB params
09:24:04 which I believe are important also
09:24:19 amorin: hi, yes and we have some input from liuyulong who is neutron core and has a lot of large scale deployment experience also
09:24:36 so we need to start working on some patches based on those comments
09:24:55 yup
09:24:58 and also if You have anything to add there, feel free to write comments or send patches related to this bug :)
09:25:56 sounds great!
09:26:00 I am pretty sure I can find some tunings based on OVH experience, I will try to do that
09:26:25 This is great. Thank you amorin slaweq
09:26:30 sounds good
09:26:42 Also, I proposed to do some documentation change to identify parameters which could affect large scale
09:26:49 amorin: I know You will :)
09:27:01 what do you think about this?
09:27:24 btw has anyone compared different generally available neutron plugins ovs/ovn/...?
09:28:07 I havnt, on our side we use custom plugins based on OVS
09:28:30 maybe slaweq know if OVN is able to scale?
09:28:32 Only for performance.
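[Editor's note: the neutron tunables thread above (RPC workers, agent liveness, rabbit and DB params) might collect entries of the following shape in `neutron.conf`. These are real option names, but the values are purely illustrative guesses for a large deployment, not recommendations made in this meeting — always verify defaults and semantics against the configuration reference for your release.]

```ini
[DEFAULT]
# Scale out RPC and API workers so agent state reports and API load
# don't starve each other on busy neutron-server nodes.
rpc_workers = 8
api_workers = 8
# Give agents more headroom before being marked dead under heavy load
# (default is 75 seconds; must stay > 2x the agents' report_interval).
agent_down_time = 150

[database]
# Larger SQLAlchemy connection pool for high-concurrency servers.
max_pool_size = 30
max_overflow = 60
```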
09:28:55 amorin: I know there were some tests of ovn on scale done by networking-ovn team
09:28:56 etp definitely that is interesting. It would be great to know more
09:29:07 I can try to search for them if You want to check them
09:29:15 i noted that there's spec in Ussuri to move reference from ovs to ovn
09:29:44 etp: no, for Ussuri are only moving networking-ovn code to be in-tree neutron driver
09:30:10 ah, maybe I misread it :)
09:30:19 but in the future we will probably want to switch our "default" backend to be ovn instead of ovs-agent
09:30:25 but not in Ussuri for sure
09:31:01 I just added in the etherpad a presentation about CERN network deployment and the configuration options that we use. Will cross check to what is proposed. In our case we use linux bridge
09:31:21 ack
09:31:33 slaweq: we are also looking in to same direction, when it happens remains to be seen
09:32:18 etp: yes
09:32:35 belmoreira: tnx
09:32:51 If I will have link to any comparison between ovn and other backends I will place it in etherpad https://etherpad.openstack.org/p/large-scale-sig-cluster-scaling
09:32:53 ok?
09:33:14 please do
09:33:32 yes, thanks
09:34:00 should we have the same approach with other projects? like keystone, placement, glance?
09:34:16 have a bug open for relevant options for large deployments?
09:35:09 I think it could be nice, but it should come from them, we can force them
09:35:49 we cant*
09:35:50 sorry
09:36:01 I had an action item to poll the Scientific SIG for data points. I'll add it to today's meeting (at 1100 UTC)
09:37:03 amorin: ok, let's try to signal issues first in the etherpad
09:37:14 oneswig thanks
09:37:34 yes
09:37:38 anything else related to this topic?
09:38:48 moving on
09:38:49 #topic AOB
09:39:16 Is there something else that you would like to discuss?
09:40:51 not on my side
09:40:53 not for me
09:40:57 #topic Next Meeting
09:41:03 not from my side
09:41:18 If we follow the 2 weeks rule the next meeting will be on January 29.
Is this OK?
09:41:34 +1
09:41:47 +1
09:41:59 +1
09:41:59 ls
09:42:09 +1
09:42:18 #agreed next meeting: January 29, 9utc #openstack-meeting
09:42:28 Anything else before we close the meeting?
09:42:40 have to go, thanks everyone
09:42:57 Thanks everyone
09:43:05 ps it's on the agenda now https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_15th_2020
09:43:06 thanks
09:43:06 thanks, all
09:43:20 thanks all
09:43:21 #endmeeting