11:00:15 <oneswig> #startmeeting scientific-sig
11:00:16 <openstack> Meeting started Wed Jan 15 11:00:15 2020 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:19 <openstack> The meeting name has been set to 'scientific_sig'
11:00:26 <oneswig> SIG time!
11:00:35 <oneswig> #link Today's agenda https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_15th_2020
11:00:56 <oneswig> greetings all
11:00:59 <janders> g'day all
11:01:03 <dh3> morning!
11:01:05 <janders> Happy New Year 2020
11:01:08 <oneswig> janders: saw your presentation from kubecon last week, nice work!
11:01:18 <janders> oneswig: thank you! :)
11:01:25 <janders> it was a great conference
11:01:29 <oneswig> clarification - kubecon last november, saw it last week
11:01:40 <oneswig> did you get many questions in follow-up?
11:02:31 <janders> just a few... now I'm wondering if I replied :) it was a pretty hectic end of the year
11:02:42 <oneswig> #link Janders at Kubecon https://www.youtube.com/watch?v=sPfZGHWnKNM
11:03:13 <oneswig> So many moving parts though! Does it hold together?
11:03:52 <mattia> hello everyone
11:03:59 <oneswig> Hi mattia, welcome
11:04:26 <janders> it does - and it's getting better :) waiting for some kit to arrive and hope to test 100GE and HDR200 versions of RDMA-K8s
11:04:41 <janders> it's still early days though
11:04:56 <oneswig> Have you looked at options for PMIX so MPI jobs can sit on top?
11:05:16 <janders> no, not yet
11:06:07 <janders> I was mostly thinking about connecting to non-k8s storage and GPU workloads. I'm sure MPI workloads will bubble up too though
11:07:10 <oneswig> It seems a lot less straightforward than putting containers into a Slurm environment. But with the energy going into it I'm sure a workable solution will appear
11:07:49 <oneswig> #link PMIX and kubernetes hinted at here https://pmix.org/wp-content/uploads/2019/04/PMIxSUG2019.pdf
11:08:32 <oneswig> janders: big numbers for your storage bandwidth at the end of the presentation
11:08:34 <janders> it's still work in progress but in a scenario where a fair chunk of workloads is cloud-native and MPI is a small fraction I think this approach makes good sense
11:09:21 <oneswig> yes - that looks like the use case
11:09:30 <oneswig> ... I'm sure we had an agenda somewhere ... oops
11:09:31 <janders> that was actually from our OpenStack system, but when I get more kit I hope to see these in k8s containers as well
11:10:04 <oneswig> be interesting to know where you take it next
11:10:07 <janders> we're getting close to going live with the cyber system so my bosses aren't too keen on me testing on the same kit
11:10:31 <janders> I have some interesting (I hope) questions on that - but let's leave that for AOB
11:10:31 <oneswig> curse those production obligations!
11:10:40 <janders> haha! :)
11:10:46 <oneswig> OK, time to start
11:10:50 <janders> I keep saying the ops-focused guys are no fun
11:11:09 <oneswig> #topic OpenStack board elections
11:11:20 <oneswig> Just a quick reminder to vote for individual directors this week
11:12:21 <oneswig> I'd share a link but I think they are all somehow individual
11:12:31 <oneswig> That's all on that :-)
11:12:48 <oneswig> #topic Large-Scale SIG survey
11:13:16 <oneswig> Earlier this morning there was a Large Scale SIG meeting
11:13:55 <oneswig> There don't appear to be minutes for it on Eavesdrop...
11:14:10 <oneswig> aha http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-01-15-09.00.html
11:14:47 <oneswig> #link request for data on large-scale configuration http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011820.html
11:15:48 <dh3> Thanks, I missed that (feeling I'm spread a bit thin at the moment) - not that we are *so* large as some
11:16:22 <oneswig> Hi dh3, everything is relative I guess!
11:17:11 <oneswig> The Large Scale SIG are hoping for input on scaling issues faced in real-world experience.
11:17:29 <oneswig> (and what tools to tackle them)
11:17:35 <oneswig> #link Large Scale SIG etherpad https://etherpad.openstack.org/p/large-scale-sig-documentation
11:19:05 <janders> DB, AMQP, Neutron... guessed 3/3 and the order, too :)
11:20:07 <oneswig> janders: low-hanging fruit that :-)
11:20:55 <janders> with other APIs it's probably easier to just deploy more API nodes than to optimise... to a degree that is - and that assumes DB scaling is sorted out separately :)
11:21:06 <oneswig> One interesting starting point was the idea to create an Oslo module for standardising telemetry from OpenStack services, and making it available as prometheus or statsd
11:21:37 <janders> very prom(eth)ising
11:21:56 <oneswig> oh, good :-)
11:22:26 <oneswig> I don't think there's a spec for review yet but I think it's underway
11:23:06 <janders> I think scalable telemetry was one of the PITA/WTF areas of OpenStack for quite a while so this sounds awesome
11:23:42 <janders> I got quite far with just switching it off but that's not good either
11:24:18 <dh3> is the intention that this data is for developers or operators? (or both) and if devs, has anyone thought about "phone home" reporting like Ceph added recently?
11:24:31 <oneswig> janders: indeed.
11:25:15 <oneswig> dh3: I think it's for operators principally, but I expect there's a symbiosis where operators have the pain points and use this as evidence for developers, perhaps.
11:26:26 <oneswig> If people are interested they can encourage this effort by adding their support to the etherpad!
11:26:52 <janders> RHATs are pumping a fair bit of effort into SAF
11:26:54 <dh3> OK. For operators there is a downstream process where - having received some data or telemetry trend or errors - they need to decide what to do about it (not necessarily being an expert in tuning MySQL or RabbitMQ or ...)
11:27:07 <janders> ("Service Assurance Framework")
11:27:08 <dh3> I will try to get some coherent thoughts onto the etherpad.
11:27:18 <janders> I wonder if the two directions could be combined
11:27:23 <oneswig> janders: what's that?
11:27:40 <oneswig> The Launchpad tickets on gathering config parameters that are important for scale may be worth a look
11:27:43 <janders> OpenShift/K8S-powered, Prometheus-based OpenStack monitoring
11:27:46 <dh3> embarrassed to say I had not met SAF and we have RHOSP
11:28:15 <oneswig> Nova https://bugs.launchpad.net/nova/+bug/1838819 Neutron https://bugs.launchpad.net/neutron/+bug/1858419
11:28:15 <openstack> Launchpad bug 1838819 in OpenStack Compute (nova) "Docs needed for tunables at large scale" [Undecided,Confirmed]
11:28:16 <openstack> Launchpad bug 1858419 in neutron "Docs needed for tunables at large scale" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
11:28:23 <janders> it's on my TODO list but due to resourcing constraints our first monitoring solution for the cyber system is gonna be Nagios
11:28:41 <janders> SAF looks quite good though
11:28:51 <oneswig> got a link?
11:29:03 <janders> and it's getting a lot of good press from the Brisbane support guys who are real OpenStack gurus, hats off to them
11:29:11 <janders> if they say it's good I'm totally gonna try it
11:29:16 <janders> (looking it up)
11:29:36 <dh3> I found https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/service_assurance_framework/introduction-to-service-assurance-framework_assembly (not sure if there's a free/OSS side?)
11:29:42 <oneswig> looks like it's purely a RHOSP thing
11:30:03 <janders> yes it is but I'd be surprised if it doesn't have a good upstream
11:30:14 <oneswig> The data at source does sound like it would be the same though
11:30:33 <janders> I can't find a public link - would you like the doco from the KB over email?
11:31:28 <dh3> janders: anything you can share will be interesting (though we are likely to move off RH at some point so wary of lock-in)
11:31:30 <oneswig> Thanks janders, one to watch
11:31:30 <witek> https://github.com/redhat-service-assurance/telemetry-framework
11:31:39 <oneswig> Hi witek, welcome!
11:31:48 <oneswig> and thanks for the link
11:31:53 <witek> they presented it for the first time at the summit in Berlin
11:32:05 <janders> dh3: what's the best email to send it to?
11:32:17 <dh3> janders: dh3@sanger.ac.uk please
11:32:24 <dh3> witek: thanks, I will look up the talk
11:33:55 <janders> oneswig: dh3: done
11:35:47 <dh3> thanks, on first skim it looks the same as the access.redhat.com docs, I will find a quiet corner to read it this afternoon
11:36:03 <janders> yeah that's the source
11:36:34 <janders> I reviewed this with my monitoring expert and we figured it's worth a go, might be a good solution going forward
11:36:40 <oneswig> If more telemetry data is made available via standard protocols, everyone wins
11:36:58 <janders> but it's too complex to be deployed as the initial monitoring solution for our cyber system
11:38:00 <witek> it follows a similar architecture to Monasca
11:38:21 <oneswig> I should perhaps rtfm but I wonder what SAF are doing about the limitations around e.g. retention policies in prometheus
11:39:36 <oneswig> janders: do they have a story for logs and notifications?
11:39:38 <janders> speaking of OSP, 16 is out in beta
11:39:52 <janders> notifications - yes, I believe so
11:39:57 <janders> logs - in what sense?
11:40:05 <oneswig> you know, like syslog
11:40:10 <janders> logstash/elasticsearch kind of stuff?
11:40:30 <oneswig> yep
11:40:32 <janders> I think they do this via collectd somehow but this part I'm less familiar with
11:41:17 <oneswig> sounds like when they get there they will have reinvented much of what monasca does today ;-)
11:41:31 <janders> :)
11:41:52 <janders> as long as the wheel is still round, a little reinvention is tolerable even if not optimal
11:42:07 <oneswig> anyway, should move on I guess. Final comments on large scale?
11:42:26 <janders> nothing specific, just a big +1 for the initiative
11:42:32 <dh3> likewise
11:42:45 <janders> I think there was a need for this for a very long time
11:43:21 <oneswig> sure, I think so too
11:43:29 <oneswig> #topic events for 2020
11:43:48 <janders> on that... do you guys know what's the CFP timeframe for Vancouver?
11:43:48 <oneswig> OK y'all, let's get some links out for events coming up...
11:43:59 <oneswig> janders: not heard, nor checked
11:44:11 <janders> I looked around but haven't managed to find any info
11:44:22 <dh3> we were talking about conferences in the office this morning. wondering when the European summit dates/location might be announced. and a bit sad there are no upcoming OpenStack Days listed
11:44:28 <janders> it feels like that's typically Jan-Feb for the May conferences
11:44:47 <janders> this year's NA event is a bit later so perhaps so is the CFP
11:44:47 <oneswig> #link UKRI Cloud Workshop (London) https://cloud.ac.uk/
11:45:13 <oneswig> CFP closing date for that is end of Jan I think
11:46:13 <janders> is there anything interesting coming up for ISC2020? I was asked by my boss if I wanna go
11:46:55 <oneswig> I know the group who did SuperCompCloud at SC2019 have a similar workshop planned (CSCS in Switzerland, Indiana University, maybe others)
11:47:17 <dh3> we had the Arista conference ("Innovate") and Lustre LUG/LAD on our radar
11:47:54 <janders> at least OIS Vancouver and ISC20 don't overlap (but are close enough to make it tricky for us Aussies to attend both)
11:50:02 <oneswig> janders: I'm expecting to go to ISC 2020, hopefully see you there.
11:50:13 <janders> that would be great! :)
11:50:36 <janders> are you likely headed to Vancouver as well?
11:51:19 <oneswig> I expect so, provided they aren't the same week :-)
11:51:39 <janders> looks like they are not :)
11:51:43 <oneswig> Not planning on Cephalocon though - Seoul in March https://ceph.io/cephalocon/
11:52:53 <oneswig> #link HPCAC Swiss Workshop in April is also good https://www.hpcadvisorycouncil.com/events/2020/swiss-workshop/
11:53:45 <oneswig> dh3: where do cloud-oriented bioinformaticians go for conferences?
11:55:04 <dh3> oneswig: I'm not sure that the cloud emphasis affects the choice of conference but I will ask around
11:56:05 <janders> as we're nearing the end of the hour I have some AOB
11:56:13 <oneswig> #topic AOB
11:56:14 <dh3> other than that some of the platform-oriented things like Kubecon maybe have a bit more of an application focus than, say, the OpenStack summit
11:56:16 <oneswig> well spotted :-)
11:56:32 <janders> do you have much experience with cinder volume transfers?
11:56:55 <janders> I'm thinking of using this as a tool for the devsecops guys to make malware samples available to cyber researchers
11:57:19 <dh3> we tried it several releases back and it "just worked" but we didn't have a use case/user story for it
11:57:58 <dh3> for small scale (malware samples are small, right?) I might look at image sharing instead?
11:58:15 <janders> 1) people upload encrypted malware to a secure dropbox 2) the devsecops guys have a VM that sits on the same provider network as the dropbox and they create a volume, attach it and copy the encrypted malware in 3) they give the volume to the researcher
11:58:27 <janders> I need a fancier backend setup than glance can offer
11:58:47 <janders> with cinder I can have separate pools for normal and disco volumes
11:59:00 <janders> I tried transfers today and it *just worked*
11:59:16 <oneswig> I've used rbd export to effectively dd out a running volume before
11:59:20 <janders> if you guys have no horror stories about how it breaks terribly in some corner cases I think I will endorse this solution
11:59:46 <oneswig> not used a cinder method before though
12:00:18 <janders> ok! thanks oneswig
12:00:18 <dh3> rbd export works for us (in "trying circumstances" too)
12:00:25 <oneswig> good luck with it janders!
12:00:33 <janders> thanks and I will report back
12:00:33 <oneswig> ok, time to close, thanks everyone
12:00:38 <janders> thank you all
12:00:40 <oneswig> #endmeeting
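For reference, the handover janders describes maps onto Cinder's volume-transfer API roughly as in the sketch below. This is a minimal illustration using python-cinderclient; the auth URL, project names, credentials, volume ID and transfer name are all placeholder assumptions, and the volume must be detached (status "available") before it can be transferred.

```python
# Sketch of the cinder volume-transfer handover discussed above.
# All credentials, names and IDs below are illustrative placeholders.
from keystoneauth1.identity import v3
from keystoneauth1 import session
from cinderclient import client as cinder_client


def cinder_for(project_name, username, password,
               auth_url="https://keystone.example.org:5000/v3"):
    """Build a Cinder client scoped to the given project."""
    auth = v3.Password(auth_url=auth_url,
                       username=username,
                       password=password,
                       project_name=project_name,
                       user_domain_name="Default",
                       project_domain_name="Default")
    return cinder_client.Client("3", session=session.Session(auth=auth))


# devsecops side: the volume holding the encrypted sample has been detached
# from the staging VM, so it is "available" and can be offered for transfer.
devsecops = cinder_for("devsecops", "devsecops-user", "secret")
volume_id = "00000000-0000-0000-0000-000000000000"  # placeholder volume UUID
transfer = devsecops.transfers.create(volume_id, name="malware-sample-042")
# transfer.id and transfer.auth_key are passed to the researcher out of band.
print(transfer.id, transfer.auth_key)

# researcher side: accepting the transfer moves ownership of the volume
# (and its quota usage) into the researcher's project.
research = cinder_for("cyber-research", "researcher-user", "secret")
research.transfers.accept(transfer.id, transfer.auth_key)
```

The rbd export route oneswig and dh3 mention works at the Ceph level instead: `rbd export <pool>/volume-<uuid> sample.raw` dumps the backing image to a local file (assuming the RBD backend's default volume-<uuid> naming), without changing Cinder-side ownership of the volume.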