21:00:18 <oneswig> #startmeeting scientific-sig
21:00:19 <openstack> Meeting started Tue Jan 21 21:00:18 2020 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:23 <openstack> The meeting name has been set to 'scientific_sig'
21:00:30 <oneswig> up up and away!
21:00:35 <rbudden> hello
21:00:38 <oneswig> #topic who's here
21:00:48 <trandles> Hi all
21:00:50 <oneswig> hi all
21:00:54 <jmlowe> hey!
21:01:06 <martial> that's a topic!
21:01:33 <oneswig> The new informal SIG. I'm not even wearing a tie tonight :-)
21:02:37 <oneswig> OK, I forgot to post an agenda on the wiki, apologies.
21:02:48 <oneswig> #chair martial
21:02:49 <openstack> Current chairs: martial oneswig
21:03:18 <oneswig> #topic Conferences and CFPs for 2020
21:03:31 <oneswig> So what's coming up?
21:04:50 <oneswig> If you're in London in early March, this free 1-day conference at the Francis Crick Institute is excellent: https://cloud.ac.uk/
21:04:50 <martial> GTC in March
21:05:06 <martial> OpenStack Summit in June (CFP coming out soon I was told)
21:05:11 <martial> PEARC ?
21:05:15 <oneswig> martial: where and when for GTC? You going?
21:05:27 <martial> GTC, LA, March
21:05:32 <martial> planning to
21:06:17 <jmlowe> ISC, SuperCompCloud, working on the CFP and program committee (nominations taken) https://sites.google.com/view/supercompcloud
21:06:28 <oneswig> Once you're done with GTC, have some downtime lakeside in Ticino, Switzerland: https://www.hpcadvisorycouncil.com/events/2020/swiss-workshop/
21:06:34 <jmlowe> PEARC in Portland at the end of July
21:07:03 <martial> SC20 in Atlanta ... November?
21:07:08 <jmlowe> yep
21:07:10 <oneswig> jmlowe: you planning a presentation to submit for PEARC this year?
21:07:36 <jmlowe> Possibly, definitely if JS2 gets funded
21:10:53 <oneswig> I haven't checked yet when the deadline is for OpenStack Vancouver - anyone know?
21:11:32 <trandles> CFP hasn't even opened has it?
21:11:37 <martial> don't think it is out yet
21:11:49 <martial> Ildiko mentioned it should be soon
21:12:02 <trandles> There's nothing but dates and location on the normal summit page.
21:12:55 <oneswig> It's pretty minimal right now - https://www.openstack.org/summit/vancouver-2020/
21:13:06 <trandles> Oh, looks like an eventbrite registration page is open now. Totally new format by the look of it?
21:13:12 <trandles> https://www.eventbrite.com/e/opendev-ptg-vancouver-2020-tickets-88923270897?aff=opendevptg2020&_ga=2.64326319.100008772.1579641137-574136464.1557423154
21:15:32 <oneswig> It actually looks great - quite a grassroots feel to the way it is presented.
21:16:44 <martial> looks very different to what we are used to indeed
21:16:46 <trandles> Yeah, much less like a conference and more like a big integrated PTG?
21:17:02 <martial> sure does
21:17:40 <oneswig> This plenary opener - "Short kickoff with all attendees to set the goals for the day or discuss the outcomes of the previous day." - sounds a lot more "intimate", for sure.
21:18:08 <rbudden> indeed, it looks like the format has drastically changed
21:18:18 <martial> I think there will still be an upstream academy
21:19:20 <trandles> Wonder how the various SIGs fit into the structure? Will they still support SIG meetings as part of the program?
21:19:33 <trandles> Or maybe we'll have to self-organize
21:19:45 <oneswig> trandles: hopefully in a different way than in Shanghai, where it didn't work out at all
21:20:00 <martial> not sure yet, once the call for participation is open we can ask
21:20:23 <martial> yes that sounds like it was less than what was expected at the time
21:20:25 <trandles> "PTG: Afternoon working sessions for project teams and SIGs to continue the morning's discussions."
21:20:44 <martial> wait, is it only one day?
21:21:05 <trandles> 4 days
21:21:26 <martial> okay 8-11
21:21:30 <martial> makes more sense
21:21:32 <jmlowe> They were saying it would be different, I wasn't sure I believed them until now
21:22:01 <trandles> It's 4 days and they call out 4 focus areas, I wonder if each day will be dedicated to each area
21:22:14 <oneswig> I like it. I think we can get stuff done here.
21:22:36 <trandles> gah - "if each day will be dedicated to one area"
21:22:59 <trandles> It's certainly a lot cheaper than past summits!
21:23:11 <martial> well I look forward to learning new things
21:23:34 <martial> although the question remains about "is it a long PTG"
21:24:28 <martial> anything else?
21:24:32 <oneswig> definitely scope for an evening social!
21:24:40 <oneswig> OK, move on?
21:24:56 <martial> go
21:25:12 <jmlowe> I appreciate social events closer to sea level
21:25:17 <martial> :)
21:25:37 <oneswig> #topic large-scale SIG data
21:25:58 <oneswig> I've been taking part in the large-scale SIG meetings
21:26:04 <martial> expand on the topic please?
21:26:10 <oneswig> They are looking for operator data on pain points
21:26:28 <martial> oh right, that is a lot of our SIG users :)
21:26:29 <oneswig> #link mailing list post http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011820.html
21:27:10 <oneswig> #link etherpad for information on scaling https://etherpad.openstack.org/p/large-scale-sig-documentation
21:27:21 <oneswig> Could actually become a handy resource that one.
21:27:28 <jmlowe> tldr; did they define large-scale?
21:28:32 <oneswig> Not precisely. But approaching 1000 hypervisors is definitely in it. 500+ will be showing a lot of the early signs of scaling trouble
21:30:14 <oneswig> Our contribution so far has been around "golden signals" for scaling problems - rates, latencies, errors, and all that.
21:30:50 <oneswig> to which end, the most useful thing I've seen to date is telemetry from HAProxy
21:31:40 <trandles> HAProxy at that scale? The OSA documents explicitly suggest you don't want to use haproxy in production at all.
21:32:10 <oneswig> It will give response latencies by endpoint for example. Very cool. And a graph of 500 response codes. Highly useful.
21:32:57 <oneswig> trandles: that's surprising. We build upon it as standard. Multiple in a deploy, sometimes.
21:33:44 <oneswig> As someone who appreciates a tight piece of C code, what's the objection, and indeed the alternative?
21:33:48 <jmlowe> um what!? no HAProxy, everything I have would fall over without it
21:33:53 <trandles> Looking for the reference now, but IIRC, it says something like "HAProxy is appropriate for testing but you should use a real hardware director in production"
21:34:21 <martial> what are the alternatives?
21:34:35 <jmlowe> sure, I've got a spare couple of million for an enterprise load balancer
21:35:07 <oneswig> trandles: did you heed this guidance? :-)
21:35:38 <jmlowe> I certainly didn't
21:36:04 <trandles> oneswig, I can't even get a deployment to finish. I spend most of my time googling to figure out what configuration variable I need to set that is left out of the OSA guide.
21:36:06 <oneswig> I don't think I've seen our OpenStack APIs generate enough traffic or connections to be a concern for it. Famous last words perhaps.
21:36:49 <trandles> Maybe Vancouver should have one topic: decent documentation
21:36:52 <oneswig> trandles: feel your pain. My own notes are the same. I'm like, "thanks, former self"...
21:37:17 <trandles> oneswig, I have started a new notebook of nothing but "gotchas and solutions" to deploying OpenStack
21:37:53 <oneswig> That's awesome. Next time you do it, it'll sail through!
21:39:05 <rbudden> +1 on docs. I feel like the HA guides are lacking, and there’s an overall sense in the community that if you are doing production it’s assumed you are doing OSA or something similar, which doesn’t really fit all deployment models
21:39:06 <martial> I feel that there is a book in the making here
21:39:47 <trandles> # load balancer
21:39:47 <trandles> # Ideally the load balancer should not use the Infrastructure hosts.
21:39:47 <trandles> # Dedicated hardware is best for improved performance and security.
21:39:53 <jmlowe> you're doing something seriously wrong if the openstack APIs knock over haproxy for anything other than storing glance images
21:40:01 <trandles> In the OSA config file
21:40:25 <oneswig> Anyway, for haproxy users, I highly recommend the telemetry. There's a prometheus endpoint, which we scrape directly or poll and store in Monasca.
21:40:26 <trandles> But that's not exactly the comment I'm remembering
21:40:42 <oneswig> trandles: I might read that comment as "dedicated hardware for running haproxy"
21:40:52 <trandles> oneswig, I agree
21:41:13 <rbudden> indeed. running haproxy on control plane nodes isn’t really a great idea
21:41:33 <rbudden> @trandles i’d be curious to chat sometime about your experiences with OSA
21:41:50 <martial> +1 on the dedicated hardware ... but that was a long time ago when we did our first deployment
21:42:13 <trandles> oneswig, you said prometheus...did you replace gnocchi with it?
21:42:25 <trandles> rbudden, any time
21:43:08 <jmlowe> just from a security standpoint, you greatly reduce your attack surface by only letting haproxy touch the outside and keeping your controllers inside
21:43:20 <oneswig> I saw it in two configs today - one with Prometheus (and for metric storage), the other with Monasca scraping the same endpoint (ending up in influx).
21:44:04 <rbudden> jmlowe: +1 this is rather important for our setups
21:44:42 <trandles> rbudden, the reason I'm asking about prometheus/gnocchi is because gnocchi wouldn't install via OSA without serious fiddling and someone in the OSA project pointed me to a third party's ansible for prometheus installation
21:44:54 <oneswig> Interesting aside, the Red Hat monitoring framework SAF apparently standardises on collectd instead.
21:45:36 <rbudden> the future of Gnocchi and/or its replacement is concerning
21:46:22 <rbudden> being knee deep in a large scale Train deploy at the moment i’m curious if/when there will be a consensus in the community as to the direction of Telemetry
21:46:51 <trandles> rbudden, I'm deploying train via OSA, we definitely need to have a call very soon
21:47:12 <rbudden> sounds like a plan
21:47:38 <oneswig> prometheus is getting wide adoption but the back-end storage has limitations, such as control over retention period. We've done a lot with influx (but never paid for the enterprise HA version)
21:47:46 <rbudden> we currently have a custom hybrid Puppet/Ansible model with the intent of moving it entirely to Ansible as time allows
21:48:57 <oneswig> Does anyone have good pointers for telemetry that they monitor?
21:49:32 <martial> for us it is heavy Prometheus and heavy Ansible (but we do only have six racks)
21:50:11 <rbudden> my understanding was Prometheus with Ceilometer was not appropriate for billing data
21:50:45 <trandles> 10 minute warning btw - in case we're in the weeds ;)
21:51:30 <oneswig> As a peddler of scalar telemetry, Prometheus doesn't get notifications, which can be useful for fine-grained timing.
21:51:42 <martial> I think the next topic is AOB ... so "in the weeds" works :)
21:51:53 <oneswig> #topic AOB
21:51:57 <oneswig> seamless transition
21:52:03 <rbudden> :)
21:52:18 <oneswig> Wanted to know if anyone has been using WekaIO?
21:52:25 <martial> I want my emojis to high give Stig :)
21:52:35 <martial> -g+f
21:52:43 <trandles> oneswig, took me a few months to get WekaIO to stop cold calling me. :P
21:53:02 <oneswig> trandles: surely after the first call they knew who you were? :-)
21:53:55 <trandles> Artful Dodger kept answering my phone
21:54:22 <oneswig> I've been working on getting it well integrated with OpenStack clients, in a mutually beneficial way.
21:55:16 <oneswig> With help we got a client mounting Weka in "fast mode", i.e. SR-IOV and DPDK
21:57:07 <oneswig> Don't have benchmark data yet, got held up building images with very specific versions (CentOS 7.6, MLNX-OFED 4.6). It can be hard to go back a few versions on cloud software :-)
21:58:32 <trandles> I did see this today: https://www.hpcwire.com/off-the-wire/openstack-software-adds-native-upstream-support-for-hdr-200-gigabit-infiniband/
21:58:53 <martial> oneswig: seems like a pain to be honest
21:58:57 <oneswig> Ah, saw that too. What's the big deal, surely one version of IB is like another?
21:58:58 <martial> trandles: that is cool
21:59:31 <trandles> oneswig, no idea
22:00:03 <oneswig> janders would probably know, but I think he's been doing this for ages anyway!
22:00:10 <trandles> I don't know enough about HDR 200 to know if it has to do with virtual functions, verbs-type stuff...??
22:00:39 <oneswig> Don't know, will keep an eye out.
22:00:48 <oneswig> We are on the hour, time to close
22:00:58 <oneswig> thanks all
22:01:02 <oneswig> #endmeeting