21:00:18 <oneswig> #startmeeting scientific-sig
21:00:19 <openstack> Meeting started Tue Jan 21 21:00:18 2020 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:23 <openstack> The meeting name has been set to 'scientific_sig'
21:00:30 <oneswig> up up and away!
21:00:35 <rbudden> hello
21:00:38 <oneswig> #topic who's here
21:00:48 <trandles> Hi all
21:00:50 <oneswig> hi all
21:00:54 <jmlowe> hey!
21:01:06 <martial> that's a topic!
21:01:33 <oneswig> The new informal SIG.  I'm not even wearing a tie tonight :-)
21:02:37 <oneswig> OK, I forgot to post an agenda on the wiki, apologies.
21:02:48 <oneswig> #chair martial
21:02:49 <openstack> Current chairs: martial oneswig
21:03:18 <oneswig> #topic Conferences and CFPs for 2020
21:03:31 <oneswig> So what's coming up?
21:04:50 <oneswig> If you're in London in early March, this free 1-day conference at the Francis Crick Institute is excellent: https://cloud.ac.uk/
21:04:50 <martial> GTC in March
21:05:06 <martial> OpenStack Summit in June (CFP coming out soon I was told)
21:05:11 <martial> PEARC ?
21:05:15 <oneswig> martial: where and when for GTC?  You going?
21:05:27 <martial> GTC, LA, March
21:05:32 <martial> planning to
21:06:17 <jmlowe> ISC, SuperCompCloud, working on the CFP and program committee (nominations taken) https://sites.google.com/view/supercompcloud
21:06:28 <oneswig> Once you're done with GTC, have some downtime lakeside in Ticino, Switzerland: https://www.hpcadvisorycouncil.com/events/2020/swiss-workshop/
21:06:34 <jmlowe> PEARC in Portland at the end of July
21:07:03 <martial> SC20 in Atlanta ... November?
21:07:08 <jmlowe> yep
21:07:10 <oneswig> jmlowe: you planning a presentation to submit for PEARC this year?
21:07:36 <jmlowe> Possibly, definitely if JS2 gets funded
21:10:53 <oneswig> I haven't checked yet when the deadline is for OpenStack Vancouver - anyone know?
21:11:32 <trandles> CFP hasn't even opened has it?
21:11:37 <martial> don't think it is out yet
21:11:49 <martial> Ildiko mentioned it should be soon
21:12:02 <trandles> There's nothing but dates and location on the normal summit page.
21:12:55 <oneswig> It's pretty minimal right now - https://www.openstack.org/summit/vancouver-2020/
21:13:06 <trandles> Oh, looks like an eventbrite registration page is open now. Totally new format by the look of it?
21:13:12 <trandles> https://www.eventbrite.com/e/opendev-ptg-vancouver-2020-tickets-88923270897?aff=opendevptg2020&_ga=2.64326319.100008772.1579641137-574136464.1557423154
21:15:32 <oneswig> It actually looks great - quite a grassroots feel to the way it is presented.
21:16:44 <martial> looks very different to what we are used to indeed
21:16:46 <trandles> Yeah, much less like a conference and more like a big integrated PTG?
21:17:02 <martial> sure does
21:17:40 <oneswig> This plenary opener - "Short kickoff with all attendees to set the goals for the day or discuss the outcomes of the previous day." - sounds a lot more "intimate", for sure.
21:18:08 <rbudden> indeed, it looks like the format has drastically changed
21:18:18 <martial> I think there will still be an upstream academy
21:19:20 <trandles> Wonder how the various SIGs fit into the structure? Will they still support SIG meetings as part of the program?
21:19:33 <trandles> Or maybe we'll have to self-organize
21:19:45 <oneswig> trandles: hopefully in a different way than in Shanghai, where it didn't work out at all
21:20:00 <martial> not sure yet, once the call for participation is open we can ask
21:20:23 <martial> yes, that sounds like it was less than what was expected at the time
21:20:25 <trandles> "PTG: Afternoon working sessions for project teams and SIGs to continue the morning's discussions."
21:20:44 <martial> wait it's only one day?
21:21:05 <trandles> 4 days
21:21:26 <martial> okay 8-11
21:21:30 <martial> makes more sense
21:21:32 <jmlowe> They were saying it would be different, I wasn't sure I believed them until now
21:22:01 <trandles> It's 4 days and they call out 4 focus areas, I wonder if each day will be dedicated to one area
21:22:14 <oneswig> I like it.  I think we can get stuff done here.
21:22:59 <trandles> It's certainly a lot cheaper than past summits!
21:23:11 <martial> well I look forward to learning new things
21:23:34 <martial> although the question remains about "is it a long PTG"
21:24:28 <martial> anything else?
21:24:32 <oneswig> definitely scope for an evening social!
21:24:40 <oneswig> OK, move on?
21:24:56 <martial> go
21:25:12 <jmlowe> I appreciate social events closer to sea level
21:25:17 <martial> :)
21:25:37 <oneswig> #topic large-scale SIG data
21:25:58 <oneswig> I've been taking part in the large-scale SIG meetings
21:26:04 <martial> expand on the topic please?
21:26:10 <oneswig> They are looking for operator data on pain points
21:26:28 <martial> oh right, that is a lot of our SIG users :)
21:26:29 <oneswig> #link mailing list post http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011820.html
21:27:10 <oneswig> #link etherpad for information on scaling https://etherpad.openstack.org/p/large-scale-sig-documentation
21:27:21 <oneswig> Could actually become a handy resource that one.
21:27:28 <jmlowe> tldr; did they define large-scale?
21:28:32 <oneswig> Not precisely.  But approaching 1000 hypervisors is definitely in it.  500+ will be showing a lot of the early signs of scaling trouble
21:30:14 <oneswig> Our contribution so far has been around "golden signals" for scaling problems - rates, latencies, errors, and all that.
21:30:50 <oneswig> to which end, the most useful thing I've seen to date is telemetry from HAProxy
21:31:40 <trandles> HAProxy at that scale? The OSA documents explicitly suggest you don't want to use haproxy in production at all.
21:32:10 <oneswig> It will give response latencies by endpoint for example.  Very cool.  And a graph of 500 response codes.  Highly useful.
21:32:57 <oneswig> trandles: that's surprising.  We build upon it as standard.  Multiple in a deploy, sometimes.
21:33:44 <oneswig> As someone who appreciates a tight piece of C code, what's the objection, and indeed the alternative?
21:33:48 <jmlowe> um what!? no HAProxy, everything I have would fall over without it
21:33:53 <trandles> Looking for the reference now, but IIRC, it says something like "HAProxy is appropriate for testing but you should use a real hardware director in production"
21:34:21 <martial> what are the alternatives?
21:34:35 <jmlowe> sure, I've got a spare couple of million for an enterprise load balancer
21:35:07 <oneswig> trandles: did you heed this guidance? :-)
21:35:38 <jmlowe> I certainly didn't
21:36:04 <trandles> oneswig, I can't even get a deployment to finish. I spend most of my time googling to figure out what configuration variable I need to set that is left out of the OSA guide.
21:36:06 <oneswig> I don't think I've seen our OpenStack APIs generate enough traffic or connections to be a concern for it.  Famous last words perhaps.
21:36:49 <trandles> Maybe Vancouver should have one topic: decent documentation
21:36:52 <oneswig> trandles: feel your pain.  My own notes are the same.  I'm like, "thanks, former self"...
21:37:17 <trandles> oneswig, I have started a new notebook of nothing but "gotchas and solutions" to deploying OpenStack
21:37:53 <oneswig> That's awesome.  Next time you do it, it'll sail through!
21:39:05 <rbudden> +1 on docs. I feel like the HA guides are lacking, and there’s an overall sense in the community that if you’re doing production, it’s assumed you’re using OSA or something similar, which doesn’t really fit all deployment models
21:39:06 <martial> I feel that there is a book in the making here
21:39:47 <trandles> # load balancer
21:39:47 <trandles> # Ideally the load balancer should not use the Infrastructure hosts.
21:39:47 <trandles> # Dedicated hardware is best for improved performance and security.
21:39:53 <jmlowe> you're doing something seriously wrong if the OpenStack APIs knock over haproxy for anything other than storing glance images
21:40:01 <trandles> In the OSA config file
21:40:25 <oneswig> Anyway, for haproxy users, I highly recommend the telemetry.  There's a prometheus endpoint, which we scrape directly or poll and store in Monasca.
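A minimal sketch of the scraping oneswig describes, assuming HAProxy 2.x with its built-in Prometheus exporter enabled and reachable at a hypothetical host/port; the exact metric names vary by HAProxy version, so treat them as assumptions to verify against your deployment:

```python
# Minimal sketch: scrape HAProxy's built-in Prometheus exporter and pull out
# two "golden signals" -- 5xx error counts and average backend response time.
# Assumptions: HAProxy 2.x exporter reachable at EXPORTER_URL (hypothetical
# host/port), and metric names as shipped in recent HAProxy releases.
import requests

EXPORTER_URL = "http://haproxy.example.com:8405/metrics"

def scrape_golden_signals(url: str = EXPORTER_URL) -> None:
    body = requests.get(url, timeout=5).text
    for line in body.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE exposition comments
        # Exposition format per line: metric_name{label="..."} value
        if line.startswith("haproxy_backend_http_responses_total") and 'code="5xx"' in line:
            print("5xx responses:", line)
        elif line.startswith("haproxy_backend_http_response_time_average_seconds"):
            print("avg latency:  ", line)

if __name__ == "__main__":
    scrape_golden_signals()
```

In practice you would point Prometheus or a Monasca agent at the same endpoint rather than polling by hand; this just shows what the exposed series look like.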
21:40:26 <trandles> But that's not exactly the comment I'm remembering
21:40:42 <oneswig> trandles: I might read that comment as "dedicated hardware for running haproxy"
21:40:52 <trandles> oneswig, I agree
21:41:13 <rbudden> indeed. running haproxy on control plane nodes isn’t really a great idea
21:41:33 <rbudden> @trandles i’d be curious to chat sometime about your experiences with OSA
21:41:50 <martial> +1 on the dedicated hardware ... but that was a long time ago when we did our first deployment
21:42:13 <trandles> oneswig, you said prometheus...did you replace gnocchi with it?
21:42:25 <trandles> rbudden, any time
21:43:08 <jmlowe> just from a security stand point, you greatly reduce your attack surface by only letting haproxy touch the outside and keeping your controllers inside
21:43:20 <oneswig> I saw it in two configs today - one with Prometheus (also used for metric storage), the other with Monasca scraping the same endpoint (ending up in InfluxDB).
21:44:04 <rbudden> jmlowe: +1 this is rather important for our setups
21:44:42 <trandles> rbudden, the reason I'm asking about prometheus/gnocchi is because gnocchi wouldn't install via OSA without serious fiddling, and someone in the OSA project pointed me to a third party's Ansible for installing Prometheus
21:44:54 <oneswig> Interesting aside, the Red Hat monitoring framework SAF apparently standardises on collectd instead.
21:45:36 <rbudden> the future of Gnocchi and/or its replacement is concerning
21:46:22 <rbudden> being knee-deep in a large-scale Train deploy at the moment, I’m curious if/when there will be a consensus in the community as to the direction of Telemetry
21:46:51 <trandles> rbudden, I'm deploying train via OSA, we definitely need to have a call very soon
21:47:12 <rbudden> sounds like a plan
21:47:38 <oneswig> Prometheus is getting wide adoption but the back-end storage has limitations, such as control over the retention period.  We've done a lot with InfluxDB (but never paid for the enterprise HA version)
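One hedged way around the retention limits mentioned here is to periodically copy an aggregated series out of Prometheus into InfluxDB for long-term storage, roughly as these deployments do; the query, hosts, and database name below are assumptions, and the sketch uses the legacy influxdb-python client:

```python
# Sketch: query Prometheus's HTTP API for an aggregated 5xx rate and write
# the samples into InfluxDB for retention beyond Prometheus's local window.
# Assumptions: Prometheus at PROM_URL (hypothetical), InfluxDB on localhost
# with a pre-created "longterm" database, legacy influxdb-python client.
import requests
from influxdb import InfluxDBClient

PROM_URL = "http://prometheus.example.com:9090"
QUERY = 'sum(rate(haproxy_backend_http_responses_total{code="5xx"}[5m]))'

def copy_to_influx() -> None:
    resp = requests.get(f"{PROM_URL}/api/v1/query",
                        params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    points = []
    for result in resp.json()["data"]["result"]:
        ts, value = result["value"]  # [unix_timestamp, "value as string"]
        points.append({
            "measurement": "haproxy_5xx_rate",
            "time": int(float(ts)) * 10**9,  # line protocol defaults to ns
            "fields": {"value": float(value)},
        })
    if points:
        InfluxDBClient(host="localhost", database="longterm").write_points(points)

if __name__ == "__main__":
    copy_to_influx()
```

For reference, Prometheus's own local retention is a coarse, process-wide setting (the --storage.tsdb.retention.time flag), which is the limited control being alluded to.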
21:47:46 <rbudden> we currently have a custom hybrid Puppet/Ansible model with the intent of moving it entirely to Ansible as time allows
21:48:57 <oneswig> Does anyone have good pointers for telemetry that they monitor?
21:49:32 <martial> for us it is heavy Prometheus and heavy Ansible (but we do only have six racks)
21:50:11 <rbudden> my understanding was Prometheus with Ceilometer was not appropriate for billing data
21:50:45 <trandles> 10 minute warning btw - in case we're in the weeds ;)
21:51:30 <oneswig> As a peddler of scalar telemetry, Prometheus doesn't capture event notifications, which can be useful for fine-grained timing.
21:51:42 <martial> I think the next topic is AOB ... so "in the weeds" works :)
21:51:53 <oneswig> #topic AOB
21:51:57 <oneswig> seamless transition
21:52:03 <rbudden> :)
21:52:18 <oneswig> Wanted to know if anyone has been using WekaIO?
21:52:25 <martial> I want my emojis to high five Stig :)
21:52:43 <trandles> oneswig, took me a few months to get WekaIO to stop cold calling me. :P
21:53:02 <oneswig> trandles: surely after the first call they knew who you were? :-)
21:53:55 <trandles> Artful Dodger kept answering my phone
21:54:22 <oneswig> I've been working on getting it well integrated with OpenStack clients, in a mutually beneficial way.
21:55:16 <oneswig> With help we got a client mounting Weka in "fast mode", ie SR-IOV and DPDK
21:57:07 <oneswig> Don't have benchmark data yet, got held up building images with very specific versions (CentOS 7.6, MLNX-OFED 4.6). It can be hard to go back a few versions on cloud software :-)
21:58:32 <trandles> I did see this today: https://www.hpcwire.com/off-the-wire/openstack-software-adds-native-upstream-support-for-hdr-200-gigabit-infiniband/
21:58:53 <martial> oneswig: seems like a pain to be honest
21:58:57 <oneswig> Ah, saw that too.  What's the big deal, surely one version of IB is like another?
21:58:58 <martial> trandles: that is cool
21:59:31 <trandles> oneswig, no idea
22:00:03 <oneswig> janders would probably know, but I think he's been doing this for ages anyway!
22:00:10 <trandles> I don't know enough about HDR 200 to know if it has to do with virtual functions, verbs-type stuff...??
22:00:39 <oneswig> Don't know, will keep an eye out.
22:00:48 <oneswig> We are on the hour, time to close
22:00:58 <oneswig> thanks all
22:01:02 <oneswig> #endmeeting