09:00:40 #startmeeting scientific_wg
09:00:41 Meeting started Wed Sep 14 09:00:40 2016 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:44 The meeting name has been set to 'scientific_wg'
09:00:52 Hello hello hello
09:01:12 #link agenda for today is here https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_September_14th_2016
09:01:48 Good morning oneswig
09:01:59 Hi priteau
09:02:00 Hello everybody
09:02:04 hello
09:02:32 Just checking for blairo
09:02:35 Hi simon-AS559
09:02:55 allo allo
09:03:01 Evening b1airo
09:03:09 how goes it?
09:03:13 Hey b1airo
09:03:15 Just getting started
09:03:39 hey priteau !
09:03:47 #topic Barcelonaaaaaaa
09:03:53 :-)
09:04:05 So, we have plans taking shape
09:04:29 BoF double-session requested Wednesday morning (time, venue, doubleness all TBC)
09:04:46 WG meeting also requested Wednesday morning (ideally not concurrently)
09:05:25 I am hoping those will be confirmed in the schedule soon - there was a closing date for submissions tomorrow
09:05:30 so it could be as early as that
09:06:04 Yet more progress on this front however:
09:06:05 (fyi, apologies in advance if i disappear - monitoring our major ceph cluster whilst a network change is underway)
09:06:18 Good luck b1airo
09:06:24 Holy cow - live-blogging the outage!
09:06:35 don't say the O word!!!
09:06:59 keep us informed :-)
09:07:15 thanks simon-AS559, are you lurking for mentions of ceph?!
09:07:49 (Not really in this case, more lurking for mentions of federation)
09:08:06 #link we also have this arranged: https://www.eventbrite.co.uk/e/openstack-scientific-working-group-barcelona-social-tickets-27567156106
09:08:59 Currently room for 30, might expand if we get further sponsorship
09:09:26 ah great, are you one of the folks on the end of Khalil's broadcasts?
09:09:30 I haven't circulated the event details yet
09:10:27 oneswig, are you wanting us all to register that way or are we going to hold a few spots for known attendees ?
09:10:54 I can hold a few back but perhaps best to register, I'll only forget otherwise
09:11:48 I have just registered
09:11:48 on it!
09:12:01 Grand, thanks guys, now we are having a party
09:12:23 Final item for Barcelona from me was this request from a couple of weeks back
09:12:38 #link WG scientific openstack summit picks https://etherpad.openstack.org/p/Scientific-WG-summit-picks
09:12:49 just occurred to me to search eavesdrop irc logs for "eventbrite" in the weeks leading up to summits, probably a good way to uncover all the private (but free and open) events!
09:13:02 b1airo: genius
09:13:59 I think we've got a good mix there now on the etherpad, it sounds like it's going to get converted into something on SuperUser - last chance to add or amend...
09:14:47 OK, anything else for Barcelona today?
09:15:27 #topic Lustre, SRIOV, high-performance data
09:15:51 yeah i figured it would end up on superuser or some such, hence the commentary
09:16:01 This came up because we've been having some issues with Lustre clients on our Cambridge system
09:16:21 connection drops and such like
09:16:45 Still getting to the bottom of it, we had our vendors in yesterday and I think the issue's getting some focus
09:17:04 I was wondering, what's the best we can expect to see?
09:17:48 from the vendors? ... ;-)
09:18:10 b1airo: lol, not this time
09:18:27 What performance do you get on Monarch/M3?
09:18:37 as a proportion of bare metal ?
09:19:46 you know i actually don't have those numbers readily available :-(
09:19:54 Gin has some from M3
09:20:17 Thanks b1airo I'll ask in that thread with Gin
09:20:30 I assume it's working and everything's good though?
09:20:31 and I never personally performed comparative tests on MonARCH, though the main admins did
09:21:13 yes, when the SRIOV VFs stay put and the drivers behave, it works well
09:21:32 Is that unusual?
09:22:32 In a separate but possibly related problem I'm getting TCP retransmits on SR-IOV VFs in my VMs. Possibly due to packet reordering...
09:22:40 i say that because we've found that e.g. doing a NIC firmware upgrade can cause Nova to delete the PCI device from its tables and thus the compute driver removes it
09:23:28 b1airo: I'm surprised you could do a NIC firmware upgrade without rebooting the node. Most of the ones I've done with these NICs seem to end up requiring that level of reset
09:24:09 oh yes they do, but then we shutdown the guest and sometimes find it no longer has a PCI device on reboot
09:24:46 b1airo: uh-oh. I'll bear that in mind. Did you figure out why?
09:26:57 priteau: do people ever virtualise Chameleon hardware with SR-IOV ethernet?
09:27:13 no i guess we probably need to ensure nova is stopped before doing the upgrade work so that it won't report the PCI dev gone
09:27:16 unfortunately our Ethernet cards don't support SR-IOV
09:27:25 only our Infiniband cards do
09:28:05 and regarding driver issues, we have had a few guests where the MOFED driver stack is not fully loaded after boot
09:28:52 just a handful of the kernel modules but not the core ones needed to see the IB ports, so e.g. ibdevinfo and friends don't return anything
09:29:15 OFED in the guest? I'll look for that. We don't have many users booting those images currently
09:29:55 seems to be the mlx4_core module, as modprobing that (or openibd restart) causes a ~1min hang and then everything starts working
09:30:30 wrt NIC firmware, I've been wondering about a ramdisk element for Ironic which might be a good place to perform Mellanox FW updates
09:31:19 b1airo: brrr... and not something you can reproduce on tap
09:32:24 OK, I'll write to Gin to see what counts as good performance, thanks for those details
09:32:30 no not reliably reproducible :-(
09:33:32 Move on?
09:33:55 just pinged Gin on Slack, she was doing IOR tests just the other day
09:34:29 oneswig, re. the reordering issue...
09:34:33 OK, thanks b1airo, as it happens so was I (for cinder volumes)
09:34:51 have you narrowed it down any more?
09:35:24 Not yet. I'm busy preparing for a presentation tomorrow - OpenStack Day UK - hoping to pick it up on Friday
09:35:33 can't recall if it is possible to tcpdump/tap the VF from the host... seems unlikely
09:35:56 I don't think so, but I can mirror the switch port
09:36:06 yeah that's the next option
09:36:27 I think that'll be my strategy, be great to have proof
09:37:29 #topic inter-cloud federation
09:37:39 OK, lets move on
09:38:20 So there is a discussion forming about the best ways to tackle scientific compute on shared federated clouds
09:38:38 I think there's already a good deal achieved here in different areas
09:38:55 So a discussion on common cause is perhaps overdue
09:39:38 yes, did you join last night's (melbourne time) chat?
09:40:37 Yes, it was a useful discussion and the consensus was that the most productive path for this discussion would be to focus on resolving the policy issues between federated sites
09:40:56 Accounting and (possibly) chargeback are also gaping holes from what I can see
09:41:37 There was some discussion on European projects - EGI, HN Sci Cloud, Indigo-DC
09:41:44 indeed, but is there a minimal commitment option that we could start with?
09:41:45 I took an action item from the discussion to seek European TZ WG members who would be interested in taking part
09:41:49 I'm very interested in these aspects, though from a different perspective.
09:42:36 simon-AS559: great, can I put you in touch with Khalil who is organising the group?
09:42:43 Sure
09:43:31 You're at SWITCH, right?
09:43:35 Right.
09:43:51 OK, thanks I'll follow up. What's different about your perspective?
09:43:58 I'll try to explain:
09:44:13 I'm interested in giving academic *institutions* (which are complex in themselves)
09:44:20 access to our community cloud
09:44:30 using Federated Identity Management systems that are already in place
09:44:33 (SAML-based)
09:44:40 So it's less "federation between clouds"
09:44:51 and more "using identity federations"
09:45:07 (in a "traditional" B2B context, not "sharing" like in the Grid/EGI community etc.)
09:45:45 Actually it's in the academic context with a shared service provider (such as an NREN or a national compute center)
09:45:51 so not quite "traditional B2B".
09:46:16 simon-AS559, i can point you to the code we use for this in the Nectar cloud
09:46:19 Right, that makes sense - so people from University A know that they are accessing your system. It isn't that they log on to some local portal and their workload happens to launch somewhere else.
09:46:24 I guess this is partly similar to what projects like Indigo Datacloud work on, but different.
09:46:38 oneswig: right.
09:46:43 we bootstrap users onto the cloud through AAF (Australian Access Federation), which is a Shibboleth federation
09:47:00 Same here, but our bootstrapping method is super simple and woefully inadequate.
09:47:11 And we still need to build showback/reporting for institutions.
09:47:19 We already have billing though!
09:47:31 (That's why the institutions are so interested in reporting, surprise surprise!)
09:47:59 Also, delegated administration (letting institutional IT managers on- and offboard users etc.)
09:48:11 (…set up projects and quotas...)
09:48:32 very similar to CERN's requirements
09:48:51 Again, similar but different (CERN doesn't send bills)
09:49:05 (and everybody has a CERN account :-)
09:49:11 what are the gaps in your bootstrapping?
09:49:26 Putting users into the right project(s)
09:49:39 Authorizing request for access via the "responsible person" at the site
09:49:54 Getting rid of users/their resources
09:50:13 - Some of our customer institutions have told us they want auto-expiring accounts
09:50:38 Also, a "bulk mode" for onboarding many users, e.g. as part of a course
09:50:46 (and offboarding them when the semester is over...)
09:51:20 Anyway, this is more "academic WG" than "scientific WG", but I'd be happy to talk about these issues in Barcelona if there's space.
09:51:31 so the authz part sounds like what you really need is an allocation process for a project associated with reasonable quota?
09:51:37 There should be a couple people from other European providers like us there.
09:52:05 b1airo: Maybe, where "reasonable" should be defined by someone from the "home" institution.
09:52:19 simon-AS559: sounds great. I'll forward you the details of this group and hopefully you can join in the discussion right away, and keep it going at Barcelona
09:52:26 Thanks.
09:52:31 does your current bootstrap process sit in-front of your horizon?
09:52:37 Interesting work in this area includes the Hexaa/RegSite project.
09:52:58 b1airo: Our current bootstrap process is a separate Rails app that can administrate "vouchers" etc.
09:53:42 Anyway, thanks for listening! :-)
09:54:21 Thanks simon-AS559! Really useful to know. Is this all openly available from SWITCH?
09:55:10 Unfortunately no, but anyway it's very limited and tailored to other things here.
09:55:21 The Hexaa/RegSite stuff is on GitHub and more powerful.
09:55:30 Probably a better start
09:55:36 OK thanks.
09:55:39 (It wasn't there when we needed something.)
09:56:08 #topic Any other business
09:56:26 Anything to share?
09:57:24 #link I'm speaking tomorrow at https://openstackday.uk/
09:57:31 simon-AS559, i think the nectar bootstrap approach could work for you if it's a shib federation you want to leverage
09:57:53 that would solve your user account creation issues at least
09:57:55 Thanks, that would fit. I'll have a look!
09:58:13 Is that on Github or somewhere?
09:59:18 i think it's in our internal git though :-/, but let me invite you to our Slack and i'll introduce you to Sam (you probably know him already?)
09:59:33 oneswig: I will be at the OpenStack Day tomorrow, looking forward to meet you!
09:59:51 b1airo: Yes, we met. Thanks!
09:59:55 priteau: Great! Looking forward to seeing you
10:00:28 OK we should close the meeting - final comments?
10:00:58 Thanks simon-AS559 b1airo priteau
10:01:06 #endmeeting