09:00:40 <oneswig> #startmeeting scientific_wg
09:00:41 <openstack> Meeting started Wed Sep 14 09:00:40 2016 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:44 <openstack> The meeting name has been set to 'scientific_wg'
09:00:52 <oneswig> Hello hello hello
09:01:12 <oneswig> #link agenda for today is here https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_September_14th_2016
09:01:48 <priteau> Good morning oneswig
09:01:59 <oneswig> Hi priteau
09:02:00 <simon-AS559> Hello everybody
09:02:04 <oneswig> hello
09:02:32 <oneswig> Just checking for blairo
09:02:35 <priteau> Hi simon-AS559
09:02:55 <b1airo> allo allo
09:03:01 <oneswig> Evening b1airo
09:03:09 <b1airo> how goes it?
09:03:13 <priteau> Hey b1airo
09:03:15 <oneswig> Just getting started
09:03:39 <b1airo> hey priteau !
09:03:47 <oneswig> #topic Barcelonaaaaaaa
09:03:53 <b1airo> :-)
09:04:05 <oneswig> So, we have plans taking shape
09:04:29 <oneswig> BoF double-session requested Wednesday morning (time, venue, doubleness all TBC)
09:04:46 <oneswig> WG meeting also requested Wednesday morning (ideally not concurrently)
09:05:25 <oneswig> I am hoping those will be confirmed in the schedule soon - there was a closing date for submissions tomorrow
09:05:30 <oneswig> so it could be as early as that
09:06:04 <oneswig> Yet more progress on this front however:
09:06:05 <b1airo> (fyi, apologies in advance if i disappear - monitoring our major ceph cluster whilst a network change is underway)
09:06:18 <simon-AS559> Good luck b1airo
09:06:24 <oneswig> Holy cow - live-blogging the outage!
09:06:35 <b1airo> don't say the O word!!!
09:06:59 <oneswig> keep us informed :-)
09:07:15 <b1airo> thanks simon-AS559, are you lurking for mentions of ceph?!
09:07:49 <simon-AS559> (Not really in this case, more lurking for mentions of federation)
09:08:06 <oneswig> #link we also have this arranged: https://www.eventbrite.co.uk/e/openstack-scientific-working-group-barcelona-social-tickets-27567156106
09:08:59 <oneswig> Currently room for 30, might expand if we get further sponsorship
09:09:26 <b1airo> ah great, are you one of the folks on the end of Khalil's broadcasts?
09:09:30 <oneswig> I haven't circulated the event details yet
09:10:27 <b1airo> oneswig, are you wanting us all to register that way or are we going to hold a few spots for known attendees ?
09:10:54 <oneswig> I can hold a few back but perhaps best to register, I'll only forget otherwise
09:11:48 <priteau> I have just registered
09:11:48 <b1airo> on it!
09:12:01 <oneswig> Grand, thanks guys, now we are having a party
09:12:23 <oneswig> Final item for Barcelona from me was this request from a couple of weeks back
09:12:38 <oneswig> #link WG scientific openstack summit picks https://etherpad.openstack.org/p/Scientific-WG-summit-picks
09:12:49 <b1airo> just occurred to me to search eavesdrop irc logs for "eventbrite" in the weeks leading up to summits, probably a good way to uncover all the private (but free and open) events!
09:13:02 <oneswig> b1airo: genius
09:13:59 <oneswig> I think we've got a good mix there now on the etherpad, it sounds like it's going to get converted into something on SuperUser - last chance to add or amend...
09:14:47 <oneswig> OK, anything else for Barcelona today?
09:15:27 <oneswig> #topic Lustre, SRIOV, high-performance data
09:15:51 <b1airo> yeah i figured it would end up on superuser or some such, hence the commentary
09:16:01 <oneswig> This came up because we've been having some issues with Lustre clients on our Cambridge system
09:16:21 <oneswig> connection drops and such like
09:16:45 <oneswig> Still getting to the bottom of it, we had our vendors in yesterday and I think the issue's getting some focus
09:17:04 <oneswig> I was wondering, what's the best we can expect to see?
09:17:48 <b1airo> from the vendors? ... ;-)
09:18:10 <oneswig> b1airo: lol, not this time
09:18:27 <oneswig> What performance do you get on Monarch/M3?
09:18:37 <oneswig> as a proportion of bare metal ?
09:19:46 <b1airo> you know i actually don't have those numbers readily available :-(
09:19:54 <b1airo> Gin has some from M3
09:20:17 <oneswig> Thanks b1airo I'll ask in that thread with Gin
09:20:30 <oneswig> I assume it's working and everything's good though?
09:20:31 <b1airo> and I never personally performed comparative tests on MonARCH, though the main admins did
09:21:13 <b1airo> yes, when the SRIOV VFs stay put and the drivers behave, it works well
09:21:32 <oneswig> Is that unusual?
09:22:32 <oneswig> In a separate but possibly related problem I'm getting TCP retransmits on SR-IOV VFs in my VMs. Possibly due to packet reordering...
09:22:40 <b1airo> i say that because we've found that e.g. doing a NIC firmware upgrade can cause Nova to delete the PCI device from its tables and thus the compute driver removes it
09:23:28 <oneswig> b1airo: I'm surprised you could do a NIC firmware upgrade without rebooting the node. Most of the ones I've done with these NICs seem to end up requiring that level of reset
09:24:09 <b1airo> oh yes they do, but then we shutdown the guest and sometimes find it no longer has a PCI device on reboot
09:24:46 <oneswig> b1airo: uh-oh. I'll bear that in mind. Did you figure out why?
09:26:57 <oneswig> priteau: do people ever virtualise Chameleon hardware with SR-IOV ethernet?
09:27:13 <b1airo> no i guess we probably need to ensure nova is stopped before doing the upgrade work so that it won't report the PCI dev gone
09:27:16 <priteau> unfortunately our Ethernet cards don't support SR-IOV
09:27:25 <priteau> only our Infiniband cards do
09:28:05 <b1airo> and regarding driver issues, we have had a few guests where the MOFED driver stack is not fully loaded after boot
09:28:52 <b1airo> just a handful of the kernel modules but not the core ones needed to see the IB ports, so e.g. ibdevinfo and friends don't return anything
09:29:15 <oneswig> OFED in the guest? I'll look for that. We don't have many users booting those images currently
09:29:55 <b1airo> seems to be the mlx4_core module, as modprobing that (or openibd restart) causes a ~1min hang and then everything starts working
09:30:30 <oneswig> wrt NIC firmware, I've been wondering about a ramdisk element for Ironic which might be a good place to perform Mellanox FW updates
09:31:19 <oneswig> b1airo: brrr... and not something you can reproduce on tap
09:32:24 <oneswig> OK, I'll write to Gin to see what counts as good performance, thanks for those details
09:32:30 <b1airo> no not reliably reproducible :-(
09:33:32 <oneswig> Move on?
09:33:55 <b1airo> just pinged Gin on Slack, she was doing IOR tests just the other day
09:34:29 <b1airo> oneswig, re. the reordering issue...
09:34:33 <oneswig> OK, thanks b1airo, as it happens so was I (for cinder volumes)
09:34:51 <b1airo> have you narrowed it down any more?
09:35:24 <oneswig> Not yet. I'm busy preparing for a presentation tomorrow - OpenStack Day UK - hoping to pick it up on Friday
09:35:33 <b1airo> can't recall if it is possible to tcpdump/tap the VF from the host... seems unlikely
09:35:56 <oneswig> I don't think so, but I can mirror the switch port
09:36:06 <b1airo> yeah that's the next option
09:36:27 <oneswig> I think that'll be my strategy, be great to have proof
09:37:29 <oneswig> #topic inter-cloud federation
09:37:39 <oneswig> OK, lets move on
09:38:20 <oneswig> So there is a discussion forming about the best ways to tackle scientific compute on shared federated clouds
09:38:38 <oneswig> I think there's already a good deal achieved here in different areas
09:38:55 <oneswig> So a discussion on common cause is perhaps overdue
09:39:38 <b1airo> yes, did you join last night's (melbourne time) chat?
09:40:37 <oneswig> Yes, it was a useful discussion and the consensus was that the most productive path for this discussion would be to focus on resolving the policy issues between federated sites
09:40:56 <oneswig> Accounting and (possibly) chargeback are also gaping holes from what I can see
09:41:37 <oneswig> There was some discussion on European projects - EGI, HN Sci Cloud, Indigo-DC
09:41:44 <b1airo> indeed, but is there a minimal commitment option that we could start with?
09:41:45 <oneswig> I took an action item from the discussion to seek European TZ WG members who would be interested in taking part
09:41:49 <simon-AS559> I'm very interested in these aspects, though from a different perspective.
09:42:36 <oneswig> simon-AS559: great, can I put you in touch with Khalil who is organising the group?
09:42:43 <simon-AS559> Sure
09:43:31 <oneswig> You're at SWITCH, right?
09:43:35 <simon-AS559> Right.
09:43:51 <oneswig> OK, thanks I'll follow up. What's different about your perspective?
09:43:58 <simon-AS559> I'll try to explain:
09:44:13 <simon-AS559> I'm interested in giving academic *institutions* (which are complex in themselves)
09:44:20 <simon-AS559> access to our community cloud
09:44:30 <simon-AS559> using Federated Identity Management systems that are already in place
09:44:33 <simon-AS559> (SAML-based)
09:44:40 <simon-AS559> So it's less "federation between clouds"
09:44:51 <simon-AS559> and more "using identity federations"
09:45:07 <simon-AS559> (in a "traditional" B2B context, not "sharing" like in the Grid/EGI community etc.)
09:45:45 <simon-AS559> Actually it's in the academic context with a shared service provider (such as an NREN or a national compute center)
09:45:51 <simon-AS559> so not quite "traditional B2B".
09:46:16 <b1airo> simon-AS559, i can point you to the code we use for this in the Nectar cloud
09:46:19 <oneswig> Right, that makes sense - so people from University A know that they are accessing your system. It isn't that they log on to some local portal and their workload happens to launch somewhere else.
09:46:24 <simon-AS559> I guess this is partly similar to what projects like Indigo Datacloud work on, but different.
09:46:38 <simon-AS559> oneswig: right.
09:46:43 <b1airo> we bootstrap users onto the cloud through AAF (Australian Access Federation), which is a Shibboleth federation
09:47:00 <simon-AS559> Same here, but our bootstrapping method is super simple and woefully inadequate.
09:47:11 <simon-AS559> And we still need to build showback/reporting for institutions.
09:47:19 <simon-AS559> We already have billing though!
09:47:31 <simon-AS559> (That's why the institutions are so interested in reporting, surprise surprise!)
09:47:59 <simon-AS559> Also, delegated administration (letting institutional IT managers on- and offboard users etc.)
09:48:11 <simon-AS559> (…set up projects and quotas...)
09:48:32 <b1airo> very similar to CERN's requirements
09:48:51 <simon-AS559> Again, similar but different (CERN doesn't send bills)
09:49:05 <simon-AS559> (and everybody has a CERN account :-)
09:49:11 <b1airo> what are the gaps in your bootstrapping?
09:49:26 <simon-AS559> Putting users into the right project(s)
09:49:39 <simon-AS559> Authorizing request for access via the "responsible person" at the site
09:49:54 <simon-AS559> Getting rid of users/their resources
09:50:13 <simon-AS559> - Some of our customer institutions have told us they want auto-expiring accounts
09:50:38 <simon-AS559> Also, a "bulk mode" for onboarding many users, e.g. as part of a course
09:50:46 <simon-AS559> (and offboarding them when the semester is over...)
09:51:20 <simon-AS559> Anyway, this is more "academic WG" than "scientific WG", but I'd be happy to talk about these issues in Barcelona if there's space.
09:51:31 <b1airo> so the authz part sounds like what you really need is an allocation process for a project associated with reasonable quota?
09:51:37 <simon-AS559> There should be a couple people from other European providers like us there.
09:52:05 <simon-AS559> b1airo: Maybe, where "reasonable" should be defined by someone from the "home" institution.
09:52:19 <oneswig> simon-AS559: sounds great. I'll forward you the details of this group and hopefully you can join in the discussion right away, and keep it going at Barcelona
09:52:26 <simon-AS559> Thanks.
09:52:31 <b1airo> does your current bootstrap process sit in-front of your horizon?
09:52:37 <simon-AS559> Interesting work in this area includes the Hexaa/RegSite project.
09:52:58 <simon-AS559> b1airo: Our current bootstrap process is a separate Rails app that can administrate "vouchers" etc.
09:53:42 <simon-AS559> Anyway, thanks for listening! :-)
09:54:21 <oneswig> Thanks simon-AS559! Really useful to know. Is this all openly available from SWITCH?
09:55:10 <simon-AS559> Unfortunately no, but anyway it's very limited and tailored to other things here.
09:55:21 <simon-AS559> The Hexaa/RegSite stuff is on GitHub and more powerful.
09:55:30 <simon-AS559> Probably a better start
09:55:36 <oneswig> OK thanks.
09:55:39 <simon-AS559> (It wasn't there when we needed something.)
09:56:08 <oneswig> #topic Any other business
09:56:26 <oneswig> Anything to share?
09:57:24 <oneswig> #link I'm speaking tomorrow at https://openstackday.uk/
09:57:31 <b1airo> simon-AS559, i think the nectar bootstrap approach could work for you if it's a shib federation you want to leverage
09:57:53 <b1airo> that would solve your user account creation issues at least
09:57:55 <simon-AS559> Thanks, that would fit. I'll have a look!
09:58:13 <simon-AS559> Is that on Github or somewhere?
09:59:18 <b1airo> i think it's in our internal git though :-/, but let me invite you to our Slack and i'll introduce you to Sam (you probably know him already?)
09:59:33 <priteau> oneswig: I will be at the OpenStack Day tomorrow, looking forward to meet you!
09:59:51 <simon-AS559> b1airo: Yes, we met. Thanks!
09:59:55 <oneswig> priteau: Great! Looking forward to seeing you
10:00:28 <oneswig> OK we should close the meeting - final comments?
10:00:58 <oneswig> Thanks simon-AS559 b1airo priteau
10:01:06 <oneswig> #endmeeting
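Editor's note on the federation topic: the bootstrapping gaps simon-AS559 listed (bulk onboarding for a course, auto-expiring accounts, offboarding at the end of the semester) can be illustrated with a small sketch. This is not the SWITCH Rails app, the Nectar bootstrap code, or Hexaa/RegSite, none of which appear in the log. Every name below (Membership, Registry, bulk_onboard, offboard_expired) is hypothetical, and the sketch makes no OpenStack or Keystone calls; it only models the bookkeeping.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Membership:
    """One user's time-limited access to one project (hypothetical model)."""
    user: str
    project: str
    expires: date


@dataclass
class Registry:
    """In-memory stand-in for whatever store a real bootstrap service would use."""
    members: list = field(default_factory=list)

    def bulk_onboard(self, users, project, start, weeks):
        """Add many users to one project with a shared expiry date."""
        expires = start + timedelta(weeks=weeks)
        for user in users:
            self.members.append(Membership(user, project, expires))
        return expires

    def offboard_expired(self, today):
        """Remove and return memberships whose expiry date has been reached."""
        expired = [m for m in self.members if m.expires <= today]
        self.members = [m for m in self.members if m.expires > today]
        return expired


if __name__ == "__main__":
    reg = Registry()
    end = reg.bulk_onboard(["alice", "bob"], "course-101", date(2016, 9, 14), weeks=14)
    print("course access ends on", end)
    print("offboarded:", [m.user for m in reg.offboard_expired(end)])
```

In a real deployment the expiry sweep would presumably also have to clean up the users' resources and go through the "responsible person" approval step mentioned in the discussion; both are left out here.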