11:00:25 <oneswig> #startmeeting scientific-sig
11:00:25 <openstack> Meeting started Wed Jun 5 11:00:25 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:28 <openstack> The meeting name has been set to 'scientific_sig'
11:00:34 <oneswig> yarr \o/
11:00:34 <janders> hi all
11:00:49 <martial_> Good morning/day/evening :)
11:00:51 <janders> oneswig: would you mind if we start with the SDN discussion? I need to pop out for half an hour soon
11:00:51 <oneswig> hi janders, g'day
11:00:59 <verdurin> Afternoon.
11:01:01 <oneswig> certainly, let's do that.
11:01:03 <oneswig> hi verdurin
11:01:11 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_5th_2019
11:01:25 <oneswig> OK, let's start by ignoring the running order...
11:01:27 <dh3> hi all
11:01:31 <mgoddard> o/
11:01:36 <oneswig> #topic Coherent SDN fabrics
11:01:44 <b1airo> 'ello
11:01:45 <oneswig> janders: you raised this at the PTG
11:01:47 <janders> just a quick update from my side: I've had a brief chat with the Mellanox guys. Overall they don't see an issue with this, but I'd like to get more detailed feedback from them
11:01:59 <oneswig> #chair b1airo martial_
11:02:00 <openstack> Current chairs: b1airo martial_ oneswig
11:02:07 <janders> last week's neutron-driver meeting got cancelled, so I haven't spoken to the Neutron guys in detail either
11:02:26 <janders> indeed - this is a follow-up to the SDN consistency issues that we raised in the SIG
11:02:31 <oneswig> janders: ah, that's too bad about the cancelled meeting
11:02:50 <oneswig> hi mgoddard, thanks for coming
11:02:57 <oneswig> did you see what this is about?
11:03:18 <mgoddard> hi oneswig, yes I did, and thought I'd pop along
11:03:28 <janders> hey Mark, thanks for coming
11:03:30 <oneswig> #link Spec in review https://review.opendev.org/#/c/565463/
11:03:33 <mgoddard> I haven't got as far through the neutron spec as I'd like
11:04:30 <oneswig> Me neither. Still a little concerned it's tied to one implementation
11:05:18 <oneswig> We could come back to this in 15 minutes to allow some reading time?
11:05:40 <hberaud> o/
11:05:41 <janders> my understanding is they're modelling it on ODL but trying to make the consistency-check mechanism generic
11:06:20 <janders> what we can also do is: I'll give you guys some time to go through this in more detail, and if you could make comments in the bug, that would be great
11:06:21 <oneswig> Hi hberaud, welcome
11:06:36 <janders> doesn't have to be during the meeting
11:06:43 <hberaud> oneswig: thx
11:06:46 <oneswig> janders: +1, makes sense to get the feedback out there
11:07:00 <mgoddard> one key difference between mlnx and OVS is the async update
11:07:03 <janders> but I think if we could provide some feedback to the Neutron guys before Friday's driver meeting, that would be great
11:07:12 <mgoddard> the Mellanox driver updates NEO in an async thread
11:07:57 <mgoddard> so not only have we committed the update to the Neutron DB, we've actually finished processing the original update request by the time NEO is updated
11:08:48 <janders> right! that matches my experience/observations
11:09:35 <oneswig> mgoddard: in extreme cases could this cause a provisioned instance to be transiently connected to a different tenant network? Is there any means of blocking on completion of an operation?
11:10:08 <b1airo> o_0 !
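
A minimal sketch of the async-update pattern mgoddard describes here: the API call returns once the Neutron DB commit is done, while the fabric controller catches up later from a background thread. All names below are invented for illustration - this is not the actual networking-mlnx code.

```python
# Illustrative only: mimics a driver that commits to the Neutron DB, then
# pushes the change to an external SDN controller (e.g. NEO) asynchronously.
import queue
import threading
import time

class AsyncFabricDriver:
    def __init__(self):
        self._updates = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def update_port_postcommit(self, port_id, network_id):
        # Called after the DB transaction has committed. We only enqueue the
        # fabric update, so the caller returns immediately - Neutron can
        # report the port as ready before the fabric agrees.
        self._updates.put((port_id, network_id))

    def _worker(self):
        while True:
            port_id, network_id = self._updates.get()
            time.sleep(1)  # stand-in for a slow REST call to the controller
            print(f"fabric now carries {port_id} on {network_id}")

driver = AsyncFabricDriver()
driver.update_port_postcommit("port-1", "tenant-net-a")
print("API call already returned; fabric update still in flight")
time.sleep(2)  # give the worker time to finish before the script exits
```
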
11:10:16 <mgoddard> oneswig: potentially
11:10:31 <mgoddard> oneswig: Neutron does have a blocking mechanism, but I don't think mlnx uses it
11:10:38 <janders> which part of this would need to go into neutron-core, and which part would be mlnx-specific changes?
11:11:09 <mgoddard> there is a thing called provisioning blocks which can be used for this
11:11:41 <mgoddard> agents are inherently async, so it's nothing new. You just have to know how to use it
11:11:55 <janders> convincing mlnx to use that definitely sounds like a good idea
11:11:56 <oneswig> Ah OK. Feature not bug
11:12:36 <janders> but how about detecting state divergence and addressing it? what we're talking about doesn't address that, right?
11:12:40 <janders> or would it?
11:12:51 <mgoddard> Probably bug not feature :)
11:13:32 <b1airo> state divergence is always a potential problem if the controller can be changed/updated independently of Neutron
11:13:41 <oneswig> janders: yes, this doesn't relate to the missing port updates
11:14:04 <janders> my idea behind the bug was implementing a mechanism that brings SDN-based approaches on par with OVS - so that state divergence can be detected/addressed
11:14:26 <janders> https://review.opendev.org/#/c/565463/12/specs/stein/sbi-database-consistency.rst@332 looks like it might be able to address that
11:14:53 <janders> however it would be great to have your feedback
11:15:23 <janders> in general I think it would be worthwhile to split the discussion into two parts - the generic one (which the Neutron team needs to address) and the specific ones (which I'm happy to bring up with Mellanox)
11:16:00 <janders> I totally see the value of improvements around provisioning blocks as suggested by mgoddard
11:17:01 <oneswig> Makes sense. The first part may be gated, for example, on the difficulty of associating a revision number with applied config in arbitrary SDN implementations. That's reviewing the spec, I guess
11:17:02 <mgoddard> I think provisioning blocks could be an easy win
11:17:23 <mgoddard> then the spec might be able to prevent subsequent drift
11:17:43 <oneswig> janders: think you can entice your friendly mlnx engineer here for part 2?
11:17:52 <janders> I have to disappear for about half an hour now - can I please ask you to make comments on what you like/dislike/any suggestions around the generic part in the bug report? It would be great to see some activity there; it would help get attention from the devs
11:18:03 <janders> oneswig: yes, I'm happy to! :)
11:18:08 <oneswig> +1
11:18:19 <janders> excellent - thanks guys
11:18:32 <janders> hopefully I'll be back before the end of the meeting
11:18:35 <oneswig> #action janders Involve Mellanox tech team on the issue
11:18:49 <oneswig> #action oneswig mgoddard add comments to the bug report and/or spec
11:19:08 <oneswig> OK, thanks janders and all, next
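
On mgoddard's suggested easy win: add_provisioning_component() and provisioning_complete() are real Neutron APIs from neutron.db.provisioning_blocks, but the driver class, entity name and ack callback below are invented - a sketch of how a mechanism driver might wire them in, not the Mellanox driver's actual code.

```python
# Sketch of a mechanism driver using Neutron's provisioning blocks so a
# port is not reported ACTIVE until the SDN controller has acked the change.
from neutron.db import provisioning_blocks
from neutron_lib.callbacks import resources
from neutron_lib.plugins.ml2 import api

FABRIC_ENTITY = 'FABRIC'  # hypothetical entity name for the fabric backend

class FabricMechanismDriver(api.MechanismDriver):
    def initialize(self):
        pass

    def update_port_precommit(self, context):
        # Register the block inside the DB transaction; the port stays DOWN
        # until every registered entity reports completion.
        provisioning_blocks.add_provisioning_component(
            context._plugin_context, context.current['id'],
            resources.PORT, FABRIC_ENTITY)

    def _on_controller_ack(self, plugin_context, port_id):
        # Called from the async worker once the controller confirms the
        # update; clearing the block lets Neutron mark the port ACTIVE.
        provisioning_blocks.provisioning_complete(
            plugin_context, port_id, resources.PORT, FABRIC_ENTITY)
```
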
11:19:18 <oneswig> #topic CERN OpenStack roundup
11:19:36 <oneswig> that was fun, the CERN team laid on a great day
11:19:41 <oneswig> dh3: great talk
11:19:48 <dh3> you are too kind *blush*
11:19:55 <oneswig> How's the eagle recovering from being swabbed? :-)
11:20:17 <dh3> I hear there are some interesting stories from field expeditions ("Real Science")
11:20:58 <oneswig> I once had a support call that ended "can we get this fixed because there's a plane flying over Antarctica right now that needs the service back online..."
11:21:15 <oneswig> In a previous job
11:21:18 <b1airo> looked like a good time on Slack and Twitter
11:21:58 <oneswig> I enjoyed it. In the afternoon, the talk on OpenStack control plane tracing was especially interesting.
11:22:25 <dh3> Yes, I think osprofiler could be educational as well as useful
11:22:44 <oneswig> #link Ilya's presentation https://indico.cern.ch/event/776411/contributions/3402240/attachments/1851629/3039923/08_-_Distributed_Tracing_in_OpenStack.pptx.pdf
11:23:20 <oneswig> I think we'll be doing plenty of that in due course
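
For anyone wanting to try the tracing Ilya presented: osprofiler is normally enabled via the [profiler] section of each service's config plus --os-profile <key> on the client, but the library can also be driven directly. A minimal sketch, with the key and span names purely illustrative:

```python
# Minimal osprofiler usage: init with an hmac key, then wrap work in spans.
# Without a configured notifier the spans go nowhere, but the API is the same.
from osprofiler import profiler

profiler.init(hmac_key='SECRET_KEY')  # must match the services' hmac_keys

@profiler.trace('resolve-image')
def resolve_image(name):
    # Each decorated call becomes a timed span in the trace tree.
    return 'image-uuid-for-' + name

with profiler.Trace('boot-request', info={'flavor': 'm1.small'}):
    resolve_image('cirros')

# In a real deployment the collected trace is rendered with, e.g.:
#   osprofiler trace show --html <trace-id> --connection-string redis://...
print(profiler.get().get_base_id())  # the trace id to look up later
```
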
11:24:02 <martial_> nice, thanks
11:25:24 <oneswig> Erwan Gallen's talk on face recognition was good too - nice to cover more on the algorithms.
11:25:58 <oneswig> OK, we should move on I think
11:26:12 <oneswig> #topic Secure Computing Environments
11:26:44 <oneswig> One question from the audience at CERN last week related to how to find best practice for controlled-access data processing
11:27:23 <oneswig> This is something we've attempted to grapple with previously but not got very far towards common ground.
11:27:31 <oneswig> Once more unto the breach...
11:28:03 <oneswig> Jani from Basel University and I presented on a project underway there
11:28:23 <oneswig> #link BioMedIT project in Basel https://indico.cern.ch/event/776411/contributions/3345191/attachments/1851634/3039929/05_-_Secure_IT_framework_for_Personalised_Health_data.pdf
11:28:45 <oneswig> This was a lightning talk though, so no time to cover anything in depth
11:28:54 <martial_> #link https://indico.cern.ch/event/776411/contributions/
11:29:04 <martial_> this has all the talks
11:29:25 <oneswig> cool, thanks martial_
11:30:11 <dh3> We don't have a best practice as such - we are still offering all the various compute/storage facilities to data owners/principal investigators and trying to help them choose the right one for their data access and analysis workload
11:30:29 <dh3> buffet table vs a la carte, I guess
11:31:07 <verdurin> On our side, the first meeting to discuss a paper about our work with restricted data is coming up. That will be science as well as infrastructure.
11:32:16 <oneswig> I saw a useful paper recently on classification of security levels across various domains, but I'm struggling to find it, alas
11:32:38 <b1airo> interesting set of slides
11:34:10 <b1airo> looks like a decent architecture, though the thing often missing is a mapping of the various controls to a particular reference framework
11:34:40 <oneswig> Something I'd like to see is authoritative research of the form "to achieve this level of security / standard, you must implement this..."
11:35:24 <dh3> oneswig: do you have something like ISO 27001/2 in mind?
11:36:12 <oneswig> Yes, that kind of thing, but also there are different classification systems in different sectors - government, health data, etc.
11:38:02 <b1airo> yes, agreed oneswig... starting with a basic multi-tenant cloud implementation, what bits of OpenStack or other standard tooling can you sprinkle in to achieve sufficient confidence in the architecture and visibility of drift or abnormalities? Then you need to layer in incident response requirements, and so on and so forth
11:39:42 <oneswig> b1airo: infrastructure configuration is certainly not the only requirement, true.
11:40:15 <verdurin> Yes, in my experience, lots of policies, and possibly even the setting up of new administrative bodies.
11:40:15 <oneswig> I'm drawing a blank on this paper, may follow up to the Slack workspace if I find it.
11:40:51 <b1airo> to achieve compliance with any standard there's necessarily a layer of abstraction down to specific controls and procedures (and sometimes the organisational units & heads that are responsible for them)
11:41:42 <dh3> I've never had to work with that sort of published standard, but experience says that it will reach from infra/admins through to application devs too
11:42:30 <b1airo> in OpenStack today I don't think we even have a reference list of controls, let alone reference architectures
11:42:31 <oneswig> dh3: do you need specific standards to host e.g. UK Biobank data? (Or was that just animals and plants?)
11:43:19 <verdurin> oneswig: it's actually more complicated than that, there are many varieties of UKB data, each with their own special requirements.
11:43:22 <dh3> oneswig: it's not really my area, but I know that for some data (human identifiable) we have had to implement separate networks, virt platforms, storage....
11:44:50 <oneswig> dh3: does Sanger have policy guidelines for this that are public?
11:44:54 <dh3> our UKB vanguard data is in our production OpenStack environment, but in a separate host aggregate (for perf isolation, though it brings security too) and on separate storage/networking (via a provider network)
11:45:06 <dh3> oneswig: not that I have seen! but I can ask
11:45:16 <heikkine> dh3: Jani from Basel here, I agree with the requirement for separate networks, etc.
11:45:26 <oneswig> Might be really useful!
11:45:34 <oneswig> hi heikkine, thanks for coming
11:45:51 <heikkine> I may drop off at any moment but I'm trying to follow what I can
11:46:27 <oneswig> dh3: do you go as far as separate physical storage?
11:47:15 <dh3> oneswig: yes, in some cases; "identifiable human data" is the usual trigger but not always
11:47:40 <heikkine> dh3: what counts as separate physical storage in your case? Separate from OpenStack?
11:48:31 <dh3> heikkine: taking UKB as an example, we have it on a separate Lustre system; it's only accessible from that tenant's provider network.
11:49:02 <oneswig> so separate from OpenStack and separate from other projects too
11:49:03 <janders> would software-defined storage help in this sense?
11:49:26 <heikkine> I see. So you do tenant isolation that way. We are starting with NFS but I have done tests with BeeGFS
11:49:33 <janders> as in - having a pool of JBOFs and then spinning up a "private" storage cluster of a certain scale on demand
11:49:38 <verdurin> We are looking at Lustre to cover this kind of case in the future, given it's more amenable than GPFS.
11:49:59 <oneswig> A substantial discussion in its own right
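
The isolation pattern dh3 describes - a dedicated host aggregate plus a project-specific provider network - might be sketched with openstacksdk roughly as below. Admin credentials are assumed, and the cloud name, hostnames, physnet and VLAN ID are all illustrative, not Sanger's actual configuration.

```python
# Sketch of per-project isolation: pin a sensitive project's instances to
# their own hypervisors and give them a dedicated provider network.
import openstack

conn = openstack.connect(cloud='mycloud')  # admin credentials assumed

# Host aggregate: perf isolation that also helps security, as dh3 notes.
agg = conn.compute.create_aggregate(name='ukb-secure')
for host in ('compute-101', 'compute-102'):  # illustrative hostnames
    conn.compute.add_host_to_aggregate(agg, host)

# Provider network reserved for the project, mapped to separate switching.
conn.network.create_network(
    name='ukb-secure-net',
    provider_network_type='vlan',
    provider_physical_network='physnet-secure',  # illustrative physnet
    provider_segmentation_id=1201,               # illustrative VLAN
    project_id='REPLACE_WITH_PROJECT_ID',
)
```
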
11:50:38 <b1airo> there are some US standards that actually specify isolated physical storage
11:50:39 <dh3> janders: it depends - again, not entirely my area, but I understand that some data access agreements are restrictive about where data can be kept (physically and geographically) - and SDS makes it harder to show an auditor "it's here"
11:51:08 <oneswig> Over the summer it may be good to gather some of these pieces together into a joint study. Does anyone object to being involved?
11:51:09 <b1airo> e.g. think ITAR data cannot be mixed with non-ITAR data on the same physical storage
11:51:34 <martial_> I think this is correct, Blair
11:51:36 <b1airo> lol, love the phrasing of that question oneswig !
11:51:37 <janders> right! whether it works is one thing, whether it ticks the compliance box might be another. Very useful feedback, thank you.
11:51:37 <verdurin> janders: yes, to echo dh3's comments, I'm dealing with a project at the moment in which the form would not cope well with such a setup.
11:52:04 <dh3> SDS also makes proving secure deletion more difficult (you can't even identify which HDDs to take out and shred...)
11:52:22 <oneswig> b1airo: opt-out from being volunteered :-)
11:52:24 <janders> (I suppose node cleaning doesn't cut it :)
11:52:42 <dh3> node cleaning + DBAN maybe :)
11:53:01 <b1airo> janders: actually, if you are using self-encrypting disks then it might...
11:53:04 <verdurin> Don't deprive the workshop people of their longed-for chances to deploy the angle grinder
11:53:28 <oneswig> verdurin: --yes-i-really-mean-it ...
11:53:34 <janders> :D
11:53:40 <b1airo> because we know hardware-level security is so good... *cough*
11:54:15 <oneswig> One more topic to cover today - can we follow up on this via the Slack channel?
11:54:26 <b1airo> sounds good
11:54:29 <oneswig> dh3: are you on there?
11:54:47 <oneswig> I can send an invite - unless you're too secure for Slack, of course :-)
11:55:11 <oneswig> #topic Justification whitepaper for on-premises cloud for research and scientific computing
11:55:16 <dh3> oneswig: no (not yet - do invite me please)
11:55:22 <oneswig> dh3: will do
11:55:58 <oneswig> just a quick mention: some of the US folks are very interested in pooling together on a paper that lays out the private cloud case
11:56:17 <oneswig> This was raised at the SIG session at the Denver PTG
11:56:18 <b1airo> there's a great NASA tech paper I came across today that looks at this from the HPC perspective: https://www.nas.nasa.gov/assets/pdf/papers/NAS_Technical_Report_NAS-2018-01.pdf
11:56:32 <oneswig> Has anyone in this session recently made this case themselves?
11:56:41 <oneswig> thanks b1airo
11:57:17 <oneswig> During the session somebody mentioned the talk by Andrew Jones of NAG at the UKRI Cloud Workshop in January
11:57:25 <oneswig> #link Andrew Jones presentation https://www.linkedin.com/pulse/cloud-hpc-dose-reality-andrew-jones/
11:58:30 <oneswig> The hope is that this material might make a useful resource to draw upon when writing procurement proposals.
11:58:42 <janders> lots of follow-up reading after today's meeting - thanks for some great links, guys
11:59:02 <b1airo> a key thing to establish early on here would be the kinds of applications that are being supported on the private cloud
11:59:09 <dh3> hadn't seen that, thanks (but we are "strategically" going cloud-wards anyway)
11:59:17 <oneswig> b1airo: very true
11:59:20 <b1airo> and a valid answer might be "all of them"
11:59:36 <oneswig> OK, nearly out of time - final comments on this?
11:59:55 <b1airo> raise it again for next week's timezone, I guess
12:00:07 <oneswig> b1airo: +1, will do.
12:00:30 <oneswig> OK all, time to stop - thanks everyone
12:00:31 <janders> thanks guys! again - any comments in the SDN bug will be very welcome by the Neutron team
12:00:37 <martial_> Thanks ;)
12:00:39 <oneswig> janders: will do
12:00:46 <oneswig> #endmeeting