11:00:15 #startmeeting scientific-sig
11:00:16 Meeting started Wed Jan 15 11:00:15 2020 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:19 The meeting name has been set to 'scientific_sig'
11:00:26 SIG time!
11:00:35 #link Today's agenda https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_15th_2020
11:00:56 greetings all
11:00:59 g'day all
11:01:03 morning!
11:01:05 Happy New Year 2020
11:01:08 janders: saw your presentation from Kubecon last week, nice work!
11:01:18 oneswig: thank you! :)
11:01:25 it was a great conference
11:01:29 clarification - Kubecon was last November, I saw it last week
11:01:40 did you get many questions in follow-up?
11:02:31 just a few... now I'm wondering if I replied :) it was a pretty hectic end of the year
11:02:42 #link Janders at Kubecon https://www.youtube.com/watch?v=sPfZGHWnKNM
11:03:13 So many moving parts though! Does it hold together?
11:03:52 hello everyone
11:03:59 Hi mattia, welcome
11:04:26 it does - and it's getting better :) waiting for some kit to arrive and hoping to test the 100GE and HDR200 versions of RDMA-K8s
11:04:41 it's still early days though
11:04:56 Have you looked at options for PMIX so MPI jobs can sit on top?
11:05:16 no, not yet
11:06:07 I was mostly thinking about connecting to non-k8s storage and GPU workloads. I'm sure MPI workloads will bubble up too though
11:07:10 It seems a lot less straightforward than putting containers into a Slurm environment. But with the energy going into it I'm sure a workable solution will appear
11:07:49 #link PMIX and Kubernetes hinted at here https://pmix.org/wp-content/uploads/2019/04/PMIxSUG2019.pdf
11:08:32 janders: big numbers for your storage bandwidth at the end of the presentation
11:08:34 it's still a work in progress, but in a scenario where a fair chunk of workloads is cloud-native and MPI is a small fraction I think this approach makes good sense
11:09:21 yes - that looks like the use case
11:09:30 ... I'm sure we had an agenda somewhere ... oops
11:09:31 that was actually from our OpenStack system, but when I get more kit I hope to see these numbers from k8s containers as well
11:10:04 it'll be interesting to know where you take it next
11:10:07 we're getting close to going live with the cyber system so my bosses aren't too keen on me testing on the same kit
11:10:31 I have some interesting (I hope) questions on that - but let's leave that for AOB
11:10:31 curse those production obligations!
11:10:40 haha! :)
11:10:46 OK, time to start
11:10:50 I keep telling the ops-focused guys they are no fun
11:11:09 #topic OpenStack board elections
11:11:20 Just a quick reminder to vote for individual directors this week
11:12:21 I'd share a link but I think the ballot links are all individual
11:12:31 That's all on that :-)
11:12:48 #topic Large-Scale SIG survey
11:13:16 Earlier this morning there was a Large Scale SIG meeting
11:13:55 There don't appear to be minutes for it on Eavesdrop...
11:14:10 aha http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-01-15-09.00.html
11:14:47 #link request for data on large-scale configuration http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011820.html
11:15:48 Thanks, I missed that (feeling I'm spread a bit thin at the moment) - not that we are as large as some
11:16:22 Hi dh3, everything is relative I guess!
11:17:11 The Large Scale SIG are hoping for input on scaling issues faced in real-world deployments.
11:17:29 (and on what tools are used to tackle them)
11:17:35 #link Large Scale SIG etherpad https://etherpad.openstack.org/p/large-scale-sig-documentation
11:19:05 DB, AMQP, Neutron... guessed 3/3 and the order, too :)
11:20:07 janders: low-hanging fruit that :-)
11:20:55 with other APIs it's probably easier to just deploy more API nodes than to optimise... to a degree, that is - and that assumes DB scaling is sorted out separately :)
11:21:06 One interesting starting point was the idea to create an Oslo module for standardising telemetry from OpenStack services, and making it available as Prometheus or statsd metrics
11:21:37 very prom(eth)ising
11:21:56 oh, good :-)
11:22:26 I don't think there's a spec for review yet but I think it's underway
11:23:06 I think scalable telemetry was one of the PITA/WTF areas of OpenStack for quite a while, so this sounds awesome
11:23:42 I got quite far with just switching it off, but that's not good either
11:24:18 is the intention that this data is for developers or operators? (or both) and if devs, has anyone thought about "phone home" reporting like Ceph added recently?
11:24:31 janders: indeed.
11:25:15 dh3: I think it's for operators principally, but I expect there's a symbiosis where operators have the pain points and use this as evidence for developers, perhaps.
11:26:26 If people are interested they can encourage this effort by adding their support to the etherpad!
11:26:52 The RHATs are pumping a fair bit of effort into SAF
11:26:54 OK. For operators there is a downstream process where - having received some data or a telemetry trend or errors - they need to decide what to do about it (not necessarily being an expert in tuning MySQL or RabbitMQ or ...)
11:27:07 ("Service Assurance Framework")
11:27:08 I will try to get some coherent thoughts onto the etherpad.
11:27:18 I wonder if the two directions could be combined
11:27:23 janders: what's that?
11:27:40 The Launchpad tickets on gathering config parameters that are important for scale may be worth a look
11:27:43 OpenShift/K8s-powered, Prometheus-based OpenStack monitoring
11:27:46 I'm embarrassed to say I had not come across SAF, and we have RHOSP
11:28:15 Nova https://bugs.launchpad.net/nova/+bug/1838819 Neutron https://bugs.launchpad.net/neutron/+bug/1858419
11:28:15 Launchpad bug 1838819 in OpenStack Compute (nova) "Docs needed for tunables at large scale" [Undecided,Confirmed]
11:28:16 Launchpad bug 1858419 in neutron "Docs needed for tunables at large scale" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
11:28:23 it's on my TODO list, but due to resourcing constraints our first monitoring solution for the cyber system is gonna be Nagios
11:28:41 SAF looks quite good though
11:28:51 got a link?
11:29:03 and it's getting a lot of good press from the Brisbane support guys who are real OpenStack gurus, hats off to them
11:29:11 if they say it's good I'm totally gonna try it
11:29:16 (looking it up)
11:29:36 I found https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/service_assurance_framework/introduction-to-service-assurance-framework_assembly (not sure if there's a free/OSS side?)
11:29:42 looks like it's purely a RHOSP thing
11:30:03 yes it is, but I'd be surprised if it doesn't have a good upstream
11:30:14 The data at source does sound like it would be the same though
11:30:33 I can't find a public link - would you like the doco from the KB over email?
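A side note on the "Docs needed for tunables at large scale" bugs linked above: the kind of settings operators usually mean look roughly like the excerpt below. This is a sketch for illustration only - the option names are standard Nova/Neutron/oslo settings, but the values are placeholders and were not discussed in the meeting or taken from the bug reports.

    # Illustrative nova.conf excerpt - values are placeholders, not recommendations
    [DEFAULT]
    osapi_compute_workers = 8      # API worker processes
    metadata_workers = 8
    rpc_response_timeout = 180     # tolerate slower RPC round-trips under load

    [conductor]
    workers = 8

    [database]
    max_pool_size = 10             # oslo.db connection pool sizing
    max_overflow = 20

    # Illustrative neutron.conf excerpt
    [DEFAULT]
    api_workers = 8
    rpc_workers = 8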
11:31:28 janders: anything you can share will be interesting (though we are likely to move off RH at some point, so we're wary of lock-in)
11:31:30 Thanks janders, one to watch
11:31:30 https://github.com/redhat-service-assurance/telemetry-framework
11:31:39 Hi witek, welcome!
11:31:48 and thanks for the link
11:31:53 they presented it for the first time at the Summit in Berlin
11:32:05 dh3: what's the best email to send it to?
11:32:17 janders: dh3@sanger.ac.uk please
11:32:24 witek: thanks, I will look up the talk
11:33:55 oneswig: dh3: done
11:35:47 thanks, on first skim it looks the same as the access.redhat.com docs, I will find a quiet corner to read it this afternoon
11:36:03 yeah, that's the source
11:36:34 I reviewed this with my monitoring expert and we figured it's worth a go, might be a good solution going forward
11:36:40 If more telemetry data is made available via standard protocols, everyone wins
11:36:58 but it's too complex to be deployed as the initial monitoring solution for our cyber system
11:38:00 it follows a similar architecture to Monasca
11:38:21 I should perhaps RTFM, but I wonder what SAF are doing about the limitations around e.g. retention policies in Prometheus
11:39:36 janders: do they have a story for logs and notifications?
11:39:38 speaking of OSP, 16 is out in beta
11:39:52 notifications - yes, I believe so
11:39:57 logs - in what sense?
11:40:05 you know, like syslog
11:40:10 logstash/elasticsearch kind of stuff?
11:40:30 yep
11:40:32 I think they do this via collectd somehow, but I'm less familiar with this part
11:41:17 sounds like when they get there they will have reinvented much of what Monasca does today ;-)
11:41:31 :)
11:41:52 as long as the wheel is still round, a little reinvention is tolerable even if not optimal
11:42:07 anyway, we should move on I guess. Final comments on large scale?
11:42:26 nothing specific, just a big +1 for the initiative
11:42:32 likewise
11:42:45 I think there has been a need for this for a very long time
11:43:21 sure, I think so too
11:43:29 #topic events for 2020
11:43:48 on that... do you guys know what the CFP timeframe is for Vancouver?
11:43:48 OK y'all, let's get some links out for events coming up...
11:43:59 janders: not heard, nor checked
11:44:11 I looked around but haven't managed to find any info
11:44:22 We were talking about conferences in the office this morning, wondering when the European summit dates/location might be announced, and a bit sad there are no upcoming OpenStack Days listed
11:44:28 it feels like that's typically Jan-Feb for the May conferences
11:44:47 this year's NA event is a bit later, so perhaps the CFP is too
11:44:47 #link UKRI Cloud Workshop (London) https://cloud.ac.uk/
11:45:13 The CFP closing date for that is the end of Jan, I think
11:46:13 is there anything interesting coming up for ISC2020? I was asked by my boss if I wanna go
11:46:55 I know the group who did SuperCompCloud at SC2019 have a similar workshop planned (CSCS in Switzerland, Indiana University, maybe others)
11:47:17 we had the Arista conference ("Innovate") and Lustre LUG/LAD on our radar
11:47:54 at least OIS Vancouver and ISC20 don't overlap (but are close enough to make it tricky for us Aussies to attend both)
11:50:02 janders: I'm expecting to go to ISC 2020, hopefully see you there.
11:50:13 that would be great! :)
11:50:36 are you likely headed to Vancouver as well?
11:51:19 I expect so, provided they aren't the same week :-)
11:51:39 looks like they are not :)
11:51:43 Not planning on Cephalocon though - Seoul in March https://ceph.io/cephalocon/
11:52:53 #link HPCAC Swiss Workshop in April is also good https://www.hpcadvisorycouncil.com/events/2020/swiss-workshop/
11:53:45 dh3: where do cloud-oriented bioinformaticians go for conferences?
11:55:04 oneswig: I'm not sure that the cloud emphasis affects the choice of conference, but I will ask around
11:56:05 as we're nearing the end of the hour, I have some AOB
11:56:13 #topic AOB
11:56:14 other than that, some of the platform-oriented things like Kubecon maybe have a bit more of an application focus than, say, the OpenStack summit
11:56:16 well spotted :-)
11:56:32 do you have much experience with cinder volume transfers?
11:56:55 I'm thinking of using this as a tool for the devsecops guys to make malware samples available to cyber researchers
11:57:19 we tried it several releases back and it "just worked", but we didn't have a use case/user story for it
11:57:58 for small scale (malware samples are small, right?) I might look at image sharing instead?
11:58:15 1) people upload encrypted malware to a secure dropbox 2) the devsecops guys have a VM that sits on the same provider network as the dropbox; they create a volume, attach it and copy the encrypted malware in 3) they give the volume to the researcher
11:58:27 I need a fancier backend setup than Glance can offer
11:58:47 with cinder I can have separate pools for normal and disco volumes
11:59:00 I tried transfers today and it *just worked*
11:59:16 I've used rbd export to effectively dd out a running volume before
11:59:20 if you guys have no horror stories about how it breaks terribly in some corner cases, I think I will endorse this solution
11:59:46 not used a cinder method before though
12:00:18 ok! thanks oneswig
12:00:18 rbd export works for us (in "trying circumstances" too)
12:00:25 good luck with it janders!
12:00:33 thanks, and I will report back
12:00:33 ok, time to close, thanks everyone
12:00:38 thank you all
12:00:40 #endmeeting
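For reference on the AOB discussion above, a minimal sketch of the cinder volume-transfer handover using python-cinderclient might look like the following. All endpoints, credentials, project names and IDs are placeholders (nothing here comes from the meeting); this assumes the standard transfer create/accept workflow, which is also available via "openstack volume transfer request create" and "openstack volume transfer request accept".

    # Minimal sketch, assuming python-cinderclient and keystoneauth1 are installed.
    # All URLs, credentials, project names and IDs below are placeholders.
    from keystoneauth1 import loading, session
    from cinderclient import client as cinder_client

    def make_cinder(project_name, username, password,
                    auth_url='https://keystone.example.org:5000/v3'):
        """Build a Cinder v3 client scoped to one project."""
        loader = loading.get_plugin_loader('password')
        auth = loader.load_from_options(
            auth_url=auth_url, username=username, password=password,
            project_name=project_name,
            user_domain_name='Default', project_domain_name='Default')
        return cinder_client.Client('3', session=session.Session(auth=auth))

    # devsecops project: offer the volume holding the encrypted sample
    devsecops = make_cinder('devsecops', 'devsecops-user', 'secret')
    transfer = devsecops.transfers.create('VOLUME_UUID', name='malware-sample-01')
    print('transfer id:', transfer.id, 'auth key:', transfer.auth_key)

    # researcher project: accept the transfer (id and auth key passed out of band);
    # ownership of the volume moves to the accepting project
    research = make_cinder('cyber-research', 'researcher-user', 'secret')
    research.transfers.accept(transfer.id, transfer.auth_key)

The auth key has to reach the receiving project over a side channel, which fits the secure-dropbox handover described in the meeting.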