11:00:15 <oneswig> #startmeeting scientific-sig
11:00:16 <openstack> Meeting started Wed Jan 15 11:00:15 2020 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:19 <openstack> The meeting name has been set to 'scientific_sig'
11:00:26 <oneswig> SIG time!
11:00:35 <oneswig> #link Today's agenda https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_15th_2020
11:00:56 <oneswig> greetings all
11:00:59 <janders> g'day all
11:01:03 <dh3> morning!
11:01:05 <janders> Happy New Year 2020
11:01:08 <oneswig> janders: saw your presentation from kubecon last week, nice work!
11:01:18 <janders> oneswig: thank you! :)
11:01:25 <janders> it was a great conference
11:01:29 <oneswig> clarification - kubecon last november, saw it last week
11:01:40 <oneswig> did you get many questions in follow-up?
11:02:31 <janders> just a few... now I'm wondering if I replied :) it was a pretty hectic end of the year
11:02:42 <oneswig> #link Janders at Kubecon https://www.youtube.com/watch?v=sPfZGHWnKNM
11:03:13 <oneswig> So many moving parts though! Does it hold together?
11:03:52 <mattia> hello everyone
11:03:59 <oneswig> Hi mattia, welcome
11:04:26 <janders> it does - and it's getting better :) waiting for some kit to arrive and hope to test 100GE and HDR200 versions of RDMA-K8s
11:04:41 <janders> it's still early days though
11:04:56 <oneswig> Have you looked at options for PMIX so MPI jobs can sit on top?
11:05:16 <janders> no, not yet
11:06:07 <janders> I was mostly thinking about connecting to non-k8s storage and GPU workloads. I'm sure MPI workloads will bubble up too though
11:07:10 <oneswig> It seems a lot less straightforward than putting containers into a Slurm environment. But with the energy going into it I'm sure a workable solution will appear
11:07:49 <oneswig> #link PMIX and kubernetes hinted at here https://pmix.org/wp-content/uploads/2019/04/PMIxSUG2019.pdf
11:08:32 <oneswig> janders: big numbers for your storage bandwidth at the end of the presentation
11:08:34 <janders> it's still work in progress but in a scenario where a fair chunk of workloads is cloud-native and MPI is a small fraction I think this approach makes good sense
11:09:21 <oneswig> yes - that looks like the use case
11:09:30 <oneswig> ... I'm sure we had an agenda somewhere ... oops
11:09:31 <janders> that was actually from our OpenStack system, but when I get more kit I hope to see these in k8s containers as well
11:10:04 <oneswig> be interesting to know where you take it next
11:10:07 <janders> we're getting close to going live with the cyber system so my bosses aren't too keen on me testing on the same kit
11:10:31 <janders> I have some interesting (I hope) questions on that - but let's leave that for AOB
11:10:31 <oneswig> curse those production obligations!
11:10:40 <janders> haha! :)
11:10:46 <oneswig> OK, time to start
11:10:50 <janders> I keep saying the ops-focused guys are no fun
11:11:09 <oneswig> #topic OpenStack board elections
11:11:20 <oneswig> Just a quick reminder to vote for individual directors this week
11:12:21 <oneswig> I'd share a link but I think they are all somehow individual
11:12:31 <oneswig> That's all on that :-)
11:12:48 <oneswig> #topic Large-Scale SIG survey
11:13:16 <oneswig> Earlier this morning there was a Large Scale SIG meeting
11:13:55 <oneswig> There don't appear to be minutes for it on Eavesdrop...
11:14:10 <oneswig> aha http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-01-15-09.00.html
11:14:47 <oneswig> #link request for data on large-scale configuration http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011820.html
11:15:48 <dh3> Thanks, I missed that (feeling I'm spread a bit thin at the moment) - not that we are *so* large as some
11:16:22 <oneswig> Hi dh3, everything is relative I guess!
11:17:11 <oneswig> The Large Scale SIG are hoping for input on scaling issues faced in real-world experience.
11:17:29 <oneswig> (and what tools to tackle them)
11:17:35 <oneswig> #link Large Scale SIG etherpad https://etherpad.openstack.org/p/large-scale-sig-documentation
11:19:05 <janders> DB, AMQP, Neutron... guessed 3/3 and the order, too :)
11:20:07 <oneswig> janders: low-hanging fruit that :-)
11:20:55 <janders> with other APIs it's probably easier to just deploy more API nodes than to optimise... to a degree that is - and that assumes DB scaling is sorted out separately :)
11:21:06 <oneswig> One interesting starting point was the idea to create an Oslo module for standardising telemetry from OpenStack services, and making it available as prometheus or statsd
11:21:37 <janders> very prom(eth)ising
11:21:56 <oneswig> oh, good :-)
11:22:26 <oneswig> I don't think there's a spec for review yet but I think it's underway
11:23:06 <janders> I think scalable telemetry was one of the PITA/WTF areas of OpenStack for quite a while so this sounds awesome
11:23:42 <janders> I got quite far with just switching it off but that's not good either
11:24:18 <dh3> is the intention that this data is for developers or operators? (or both) and if devs, has anyone thought about "phone home" reporting like Ceph added recently?
11:24:31 <oneswig> janders: indeed.
11:25:15 <oneswig> dh3: I think it's for operators principally, but I expect there's a symbiosis where operators have the pain points and use this as evidence for developers, perhaps.
11:26:26 <oneswig> If people are interested they can encourage this effort by adding their support to the etherpad!
11:26:52 <janders> RHATs are pumping a fair bit of effort into SAF
11:26:54 <dh3> OK. For operators there is a downstream process where - having received some data or telemetry trend or errors - they need to decide what to do about it (not necessarily being an expert in tuning MySQL or RabbitMQ or ...)
11:27:07 <janders> ("Service Assurance Framework")
11:27:08 <dh3> I will try to get some coherent thoughts onto the etherpad.
11:27:18 <janders> I wonder if the two directions could be combined
11:27:23 <oneswig> janders: what's that?
11:27:40 <oneswig> The Launchpad tickets on gathering config parameters that are important for scale may be worth a look
11:27:43 <janders> OpenShift/K8S-powered, Prometheus-based OpenStack monitoring
11:27:46 <dh3> embarrassed to say I had not met SAF and we have RHOSP
11:28:15 <oneswig> Nova https://bugs.launchpad.net/nova/+bug/1838819 Neutron https://bugs.launchpad.net/neutron/+bug/1858419
11:28:15 <openstack> Launchpad bug 1838819 in OpenStack Compute (nova) "Docs needed for tunables at large scale" [Undecided,Confirmed]
11:28:16 <openstack> Launchpad bug 1858419 in neutron "Docs needed for tunables at large scale" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
11:28:23 <janders> it's on my TODO list but due to resourcing constraints our first monitoring solution for the cyber system is gonna be Nagios
11:28:41 <janders> SAF looks quite good though
11:28:51 <oneswig> got a link?
11:29:03 <janders> and it's getting a lot of good press from the Brisbane support guys who are real OpenStack gurus, hats off to them
11:29:11 <janders> if they say it's good I'm totally gonna try it
11:29:16 <janders> (looking it up)
11:29:36 <dh3> I found https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/service_assurance_framework/introduction-to-service-assurance-framework_assembly (not sure if there's a free/OSS side?)
11:29:42 <oneswig> looks like it's purely a RHOSP thing
11:30:03 <janders> yes it is but I'd be surprised if it doesn't have a good upstream
11:30:14 <oneswig> The data at source does sound like it would be the same though
11:30:33 <janders> I can't find a public link - would you like the doco from the KB over email?
11:31:28 <dh3> janders: anything you can share will be interesting (though we are likely to move off RH at some point so wary of lock-in)
11:31:30 <oneswig> Thanks janders, one to watch
11:31:30 <witek> https://github.com/redhat-service-assurance/telemetry-framework
11:31:39 <oneswig> Hi witek, welcome!
11:31:48 <oneswig> and thanks for the link
11:31:53 <witek> they presented it for the first time at the summit in Berlin
11:32:05 <janders> dh3: what's the best email to send it to?
11:32:17 <dh3> janders: dh3@sanger.ac.uk please
11:32:24 <dh3> witek: thanks, I will look up the talk
11:33:55 <janders> oneswig: dh3: done
11:35:47 <dh3> thanks, on first skim it looks the same as the access.redhat.com docs, I will find a quiet corner to read it this afternoon
11:36:03 <janders> yeah that's the source
11:36:34 <janders> I reviewed this with my monitoring expert and we figured it's worth a go, might be a good solution going forward
11:36:40 <oneswig> If more telemetry data is made available via standard protocols, everyone wins
11:36:58 <janders> but it's too complex to be deployed as the initial monitoring solution for our cyber system
11:38:00 <witek> it follows a similar architecture to Monasca
11:38:21 <oneswig> I should perhaps rtfm but I wonder what SAF are doing about the limitations around e.g. retention policies in prometheus
11:39:36 <oneswig> janders: do they have a story for logs and notifications?
11:39:38 <janders> speaking of OSP, 16 is out in beta
11:39:52 <janders> notifications - yes, I believe so
11:39:57 <janders> logs - in what sense?
11:40:05 <oneswig> you know, like syslog
11:40:10 <janders> logstash/elasticsearch kind of stuff?
11:40:30 <oneswig> yep
11:40:32 <janders> I think they do this via collectd somehow but this part I'm less familiar with
11:41:17 <oneswig> sounds like when they get there they will have reinvented much of what monasca does today ;-)
11:41:31 <janders> :)
11:41:52 <janders> as long as the wheel is still round, a little reinvention is tolerable even if not optimal
11:42:07 <oneswig> anyway, should move on I guess. Final comments on large scale?
11:42:26 <janders> nothing specific, just a big +1 for the initiative
11:42:32 <dh3> likewise
11:42:45 <janders> I think there was a need for this for a very long time
11:43:21 <oneswig> sure, I think so too
11:43:29 <oneswig> #topic events for 2020
11:43:48 <janders> on that... do you guys know what's the CFP timeframe for Vancouver?
11:43:48 <oneswig> OK y'all, let's get some links out for events coming up...
11:43:59 <oneswig> janders: not heard, nor checked
11:44:11 <janders> I looked around but haven't managed to find any info
11:44:22 <dh3> we were talking about conferences in the office this morning. wondering when the European summit dates/location might be announced. and a bit sad there are no upcoming OpenStack Days listed
11:44:28 <janders> it feels like that's typically Jan-Feb for the May conferences
11:44:47 <janders> this year's NA event is a bit later so perhaps so is the CFP
11:44:47 <oneswig> #link UKRI Cloud Workshop (London) https://cloud.ac.uk/
11:45:13 <oneswig> CFP closing date for that is end of Jan I think
11:46:13 <janders> is there anything interesting coming up for ISC2020? I was asked by my boss if I wanna go
11:46:55 <oneswig> I know the group who did SuperCompCloud at SC2019 have a similar workshop planned (CSCS in Switzerland, Indiana University, maybe others)
11:47:17 <dh3> we had the Arista conference ("Innovate") and Lustre LUG/LAD on our radar
11:47:54 <janders> at least OIS Vancouver and ISC20 don't overlap (but are close enough to make it tricky for us Aussies to attend both)
11:50:02 <oneswig> janders: I'm expecting to go to ISC 2020, hopefully see you there.
11:50:13 <janders> that would be great! :)
11:50:36 <janders> are you likely headed to Vancouver as well?
11:51:19 <oneswig> I expect so, provided they aren't the same week :-)
11:51:39 <janders> looks like they are not :)
11:51:43 <oneswig> Not planning on Cephalocon though - Seoul in March https://ceph.io/cephalocon/
11:52:53 <oneswig> #link HPCAC Swiss Workshop in April is also good https://www.hpcadvisorycouncil.com/events/2020/swiss-workshop/
11:53:45 <oneswig> dh3: where do cloud-oriented bioinformaticians go for conferences?
11:55:04 <dh3> oneswig: I'm not sure that the cloud emphasis affects the choice of conference but I will ask around
11:56:05 <janders> as we're nearing the end of the hour I have some AOB
11:56:13 <oneswig> #topic AOB
11:56:14 <dh3> other than that some of the platform-oriented things like Kubecon maybe have a bit more of an application focus than, say, the OpenStack summit
11:56:16 <oneswig> well spotted :-)
11:56:32 <janders> do you have much experience with cinder volume transfers?
11:56:55 <janders> I'm thinking of using this as a tool for the devsecops guys to make malware samples available to cyber researchers
11:57:19 <dh3> we tried it several releases back and it "just worked" but we didn't have a use case/user story for it
11:57:58 <dh3> for small scale (malware samples are small, right?) I might look at image sharing instead?
11:58:15 <janders> 1) people upload encrypted malware to a secure dropbox 2) the devsecops guys have a VM that sits on the same provider network as the dropbox and they create a volume, attach it and copy the encrypted malware in 3) they give the volume to the researcher
11:58:27 <janders> I need a fancier backend setup than glance can offer
11:58:47 <janders> with cinder I can have separate pools for normal and disco volumes
11:59:00 <janders> I tried transfers today and it *just worked*
11:59:16 <oneswig> I've used rbd export to effectively dd out a running volume before
11:59:20 <janders> if you guys have no horror stories about how it breaks terribly in some corner cases I think I will endorse this solution
11:59:46 <oneswig> not used a cinder method before though
12:00:18 <janders> ok! thanks oneswig
12:00:18 <dh3> rbd export works for us (in "trying circumstances" too)
12:00:25 <oneswig> good luck with it janders!
12:00:33 <janders> thanks and I will report back
12:00:33 <oneswig> ok, time to close, thanks everyone
12:00:38 <janders> thank you all
12:00:40 <oneswig> #endmeeting
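For reference, the handover janders describes maps onto Cinder's volume-transfer API roughly as in the sketch below. This is a minimal illustration using python-cinderclient; the auth URL, project names, credentials, volume ID and transfer name are all placeholder assumptions, and the volume must be detached (status "available") before it can be transferred.

```python
# Sketch of the cinder volume-transfer handover discussed above.
# All credentials, names and IDs below are illustrative placeholders.
from keystoneauth1.identity import v3
from keystoneauth1 import session
from cinderclient import client as cinder_client


def cinder_for(project_name, username, password,
               auth_url="https://keystone.example.org:5000/v3"):
    """Build a Cinder client scoped to the given project."""
    auth = v3.Password(auth_url=auth_url,
                       username=username,
                       password=password,
                       project_name=project_name,
                       user_domain_name="Default",
                       project_domain_name="Default")
    return cinder_client.Client("3", session=session.Session(auth=auth))


# devsecops side: the volume holding the encrypted sample has been detached
# from the staging VM, so it is "available" and can be offered for transfer.
devsecops = cinder_for("devsecops", "devsecops-user", "secret")
volume_id = "00000000-0000-0000-0000-000000000000"  # placeholder volume UUID
transfer = devsecops.transfers.create(volume_id, name="malware-sample-042")
# transfer.id and transfer.auth_key are passed to the researcher out of band.
print(transfer.id, transfer.auth_key)

# researcher side: accepting the transfer moves ownership of the volume
# (and its quota usage) into the researcher's project.
research = cinder_for("cyber-research", "researcher-user", "secret")
research.transfers.accept(transfer.id, transfer.auth_key)
```

The rbd export route oneswig and dh3 mention works at the Ceph level instead: `rbd export <pool>/volume-<uuid> sample.raw` dumps the backing image to a local file (assuming the RBD backend's default volume-<uuid> naming), without changing Cinder-side ownership of the volume.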