11:00:31 #startmeeting scientific-sig
11:00:32 Meeting started Wed Jan 17 11:00:31 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:33 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:35 The meeting name has been set to 'scientific_sig'
11:00:40 ahoy there
11:00:42 morning
11:00:52 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_17th_2018
11:00:56 Hi daveholland, morning
11:01:04 o/
11:01:10 Hi Nick!
11:01:15 o/
11:01:17 * ildikov is lurking :)
11:01:20 And John!
11:01:22 Hello
11:01:22 hi
11:01:39 and indeed, hi everyone
11:01:54 * ildikov is looking at johnthetubaguy with cat eyes from Shrek :)
11:02:22 Good morning!
11:02:37 While people gather, we have Spyros with us today, Magnum PTL. Thanks for coming Spyros
11:02:41 Hi priteau
11:02:51 hi
11:03:14 Ready to get stared?
11:03:18 started...
11:03:22 (hi belmoreira)
11:03:41 #topic Magnum for research computing use cases
11:04:17 Spyros, thanks for coming. We have a ton of questions and I'm sure others do too.
11:04:50 Can you set the ball rolling by describing what's happening with Magnum and bare metal? It appears to have really improved in the last year
11:05:08 Sure, thanks
11:05:41 Since the Newton cycle in 2016, Magnum has had the concept of cluster drivers
11:06:28 The goal of this change is to focus each cluster deployment on a combination of server type (vm|bm), operating system and COE
11:07:11 A single driver for each 3-tuple of these?
11:07:16 Following this pattern we ended up having a driver for vm-based deployments and one for ironic
11:07:19 yes
11:07:36 ed fedora-atomic, virtual machine, kubernetes
11:07:43 s/ed/eg
11:08:36 Ironic wasn't playing nice with neutron at that time
11:09:19 And the implementation was assuming that all nodes are in a pre-existing network
11:09:46 plus we didn't have a way to make the usage of cinder optional.
11:10:07 All good points...
11:10:34 So what happened?
11:10:34 We used to have a cinder volume mounted in each node for container storage, without the option to opt out
11:11:30 So, we just made all these configurable and optional
11:11:56 * johnthetubaguy noodles about manila mounts for storage needs
11:12:02 We can now use the same setup for both VMs and physical servers
11:12:12 So Ironic is supported through a set of options. But the COE is still a different driver, right?
11:13:07 strigazi: You mentioned issues between Ironic and Neutron. Do you require multi-tenant networking to be configured?
11:13:20 yes, we have a patch in flight to consolidate, and each COE is tied to the operating system
11:14:22 priteau: the usual use case is to have a private network where all nodes will be running. This network can be shared among tenants
11:15:11 strigazi: any plan to offer an unusual case? ingest bandwidth is an issue for us
11:15:38 oneswig: what do you mean? What network configuration do you need?
11:16:01 anything without a neutron router process between the data source and the container cluster, ideally
11:17:06 For example, a simplified setup where the nodes are deployed bound to a pre-existing (provider) network. Would work for us but we may be niche
11:17:37 that layout would be of interest to us too
11:17:38 I was thinking of using additional ips rather than floating ips, for example
11:18:02 Will you be able to create ports to that network?
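To make the cluster-driver options discussed above concrete, here is a minimal sketch of a Magnum cluster template bound to a pre-existing network, with floating IPs and the dedicated Cinder volume both disabled. All resource names (image, flavor, networks) are placeholders, and the exact flag set depends on the Magnum release in use (Pike for VMs, Queens for Ironic per the discussion), so treat this as illustrative rather than definitive:

    # Template for a bare metal Kubernetes cluster on an existing network;
    # omitting --docker-volume-size skips the per-node Cinder volume
    openstack coe cluster template create k8s-baremetal \
        --coe kubernetes \
        --server-type bm \
        --image fedora-atomic-27 \
        --flavor baremetal-flavor \
        --master-flavor baremetal-flavor \
        --external-network public \
        --fixed-network provider-net \
        --fixed-subnet provider-subnet \
        --network-driver flannel \
        --floating-ip-disabled

    # Deploy a cluster from the template
    openstack coe cluster create my-cluster \
        --cluster-template k8s-baremetal \
        --master-count 1 --node-count 4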
11:18:18 strigazi: yes
11:19:00 * johnthetubaguy thinks you can pass ports into nova for ironic instances now, not 100% sure
11:19:24 I think that can be done today already without patching magnum
11:19:46 oneswig: Do you have a write-up of how you use these existing provider networks to bypass a centralized Neutron router? We would be interested in Chameleon to perform high-bandwidth experiments
11:20:38 strigazi: so we can configure Magnum not to create the network and router?
11:21:22 priteau: I'm not sure there's any rocket science to it - but our system is not as multi-tenant as yours. Let's talk after.
11:21:22 oneswig: yes, that is already in pike for vms, ironic machines are expected to work in the same way in queens
11:21:39 oneswig: Thanks, I don't want to go off-topic
11:22:05 strigazi: sounds good to me.
11:22:19 strigazi: we are a little off mainline but we may have problems with deploying using Atomic Host cloud images - they lack drivers for some of the krazy hardware in our nodes (IB, etc). Is it possible to build our own images and bake this stuff in?
11:22:25 Getting the network architecture set correctly is the first thing to do
11:24:05 We used to have a diskimage-builder script for custom atomic hosts but we stopped using it and went for the upstream fedora releases. We can update it
11:24:39 strigazi: If it's on git, we'd be happy to refresh it.
11:24:44 The best option would be if those drivers could be installed on a running instance
11:25:03 Would that be possible?
11:25:25 Can be problematic for RAID devices, otherwise it should be
11:26:36 another option would be to use the standard images built with DIB and add the ostree storage for system containers
11:26:58 image builder is really good for ironic though, given ironic doesn't do snapshots
11:27:42 It's already part of our workflow in many other areas, we are immune to the pain
11:27:53 or perhaps numbed
11:27:58 heh
11:28:04 both probably
11:29:06 so we can have a work item after this meeting: run system containers on non-atomic hosts OR build custom atomic images
11:29:30 always with diskimage-builder
11:29:43 you use diskimage-builder right?
11:29:55 strigazi: I haven't checked if DIB supports atomic as a target, but if it does, that sounds like a good plan to me.
11:31:26 those network requirements, is it really just the need to define the ports before building the instance (to get the IP address)?
11:31:41 johnthetubaguy: yes
11:32:07 oneswig: I don't see atomic support in the DIB repo, but there is http://teknoarticles.blogspot.co.uk/2016/04/generate-fedora-atomic-images-using.html
11:32:32 Thanks priteau, I'll bookmark that
11:33:16 OK, did we have other questions on Magnum for now?
11:33:27 it is also here https://github.com/openstack/magnum/tree/master/magnum/drivers/common/image/fedora-atomic
11:33:36 It kind of ties in to the next topic which relates to the PTG activities
11:33:42 oneswig: I have a question towards users :)
11:34:05 strigazi: good link - I know that guy :-)
11:34:47 Is your main use case short- or long-living clusters?
11:35:25 For us on the SKA project, it's long-lived, by which we mean weeks
11:35:34 That seems to be the pattern so far.
11:35:56 not sure about the virtual lab case, depends if they run on the cluster, or they are clusters
11:36:02 at Sanger: possibly both/either (long-lived for replacing the current compute clusters; short-lived for burst-y adding capacity to those)
11:36:02 How short is short?
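A rough sketch of the custom-image route discussed above, based on the fedora-atomic diskimage-builder element linked from the magnum repo. The element path and environment variables are taken from that repo and the linked blog post; they may change between releases, and additional ostree-related variables may be required, so this is illustrative only:

    # Build a customised Fedora Atomic image with diskimage-builder,
    # using the elements shipped in the magnum source tree
    git clone https://github.com/openstack/magnum
    export ELEMENTS_PATH=$PWD/magnum/magnum/drivers/common/image
    export DIB_RELEASE=27          # Fedora release to base the image on
    disk-image-create -o fedora-atomic-27 fedora-atomic

    # Upload to Glance; Magnum matches the driver on the os_distro property
    openstack image create fedora-atomic-27 \
        --disk-format qcow2 --container-format bare \
        --file fedora-atomic-27.qcow2 \
        --property os_distro=fedora-atomic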
11:36:30 Let's say less than two months
11:36:56 One thing we'd love to see is dynamic ramp up/down in a demand-driven cycle. We have a use case for that
11:38:00 #link would be excellent to see a Magnum equivalent of this http://www.informaticslab.co.uk/dask/2017/07/21/adaptive-dask-clusters-on-kubernetes-and-aws.html
11:38:00 true, add/remove node is more likely than add/remove cluster
11:38:35 We can work on a kubernetes autoscaler for magnum
11:38:49 strigazi: do you know of anything underway?
11:39:16 I agree, I think most of us have this need to autoscale
11:39:29 Hi martial__, morning
11:39:32 #chair martial__
11:39:33 Current chairs: martial__ oneswig
11:39:36 Hi Stig
11:39:38 Nothing at the moment, but since kube 1.8 or 1.9 it is possible to write an autoscaler with user-defined metrics
11:40:09 I believe there are public cloud plugins for this but nothing for OpenStack as of ~6 months ago
11:40:40 Well, in OpenStack the resources usually are not infinite
11:41:15 The autoscaling feature is not the same in a private cloud with quotas
11:41:31 although credit cards have limits too
11:41:42 but granted, it's not identical
11:42:14 strigazi: http://www.stackhpc.com/baremetal-cloud-capacity.html - we do want to use the capacity we have to the fullest extent
11:43:02 We should move on
11:43:10 When we have preemptibles :)
11:43:11 Final items for Magnum?
11:43:21 One last thing from me.
11:43:23 strigazi: excellent. To be continued...
11:44:20 Do you support server rebuilds on ironic nodes? We can continue another time if you want
11:44:34 strigazi: sometimes it helps to rebuild a node
11:44:53 I've used it. Not in a Magnum context. But people like the IPs they know...
11:45:38 We can continue in ~5 weeks at the PTG perhaps?
11:45:40 oneswig: sounds good to me, we want to base the upgrade capability on it (rebuild)
11:45:46 sure
11:46:13 strigazi: I've seen Nova people not liking rebuilds - johnthetubaguy can you comment?
11:47:24 Perhaps we should take that offline too
11:47:32 Time is pressing
11:47:34 #topic SIG representation at PTG
11:47:56 Aha, so we have some time during the cross-project phase to advocate use cases.
11:48:06 should we have a second session on this topic?
11:48:29 Currently this has some CERN/SKA discussions and the Ironic configurable deploy steps
11:48:43 martial__: perhaps request input and follow up?
11:48:58 oneswig: thinking still, should work in theory.
11:49:35 #link deployment steps spec https://review.openstack.org/#/c/412523/
11:50:21 Use cases I've seen have requested kexec and boot-to-ramdisk, for example
11:50:29 All a bit unusual in a cloud mindset
11:50:37 but very useful for SIG members
11:50:48 priteau: are you still following this spec?
11:50:55 getting the use cases and context accurate really helps get the right design
11:52:00 johnthetubaguy: +1
11:52:20 oneswig: I have one item regarding the PTG too
11:52:32 ildikov: go for it
11:52:33 oneswig: for Chameleon I think many users may want to use the new "Boot from Volume" functionality when we move to Pike or later, but I am still interested in a more configurable Ironic -- I haven't fully reviewed the spec yet though.
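While there is no OpenStack autoscaler plugin yet per the discussion above, the add/remove-node operation such an autoscaler would drive already exists in the Magnum API. A minimal sketch with a placeholder cluster name; an autoscaler watching Kubernetes metrics would simply issue the same call:

    # Grow the cluster by raising node_count
    openstack coe cluster update my-cluster replace node_count=8

    # Shrink back when demand drops (Magnum chooses which nodes to remove)
    openstack coe cluster update my-cluster replace node_count=4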
11:52:54 oneswig: I'm dedicated to making sure we will have multi-attach in Queens
11:53:10 You certainly are dedicated :-)
11:53:44 oneswig: well I hope johnthetubaguy has a half day to review the Nova patches, like today or tomorrow at the latest :)
11:54:00 oneswig: back to the topic, it's a first version and we're planning a cross-project session with Cinder and Nova on improvements for the PTG
11:54:18 it's honestly looking like Friday :(
11:54:33 I can expect this to be extremely useful if we can do the all-read-only, cached mode that alas seems to be beyond this version
11:54:40 oneswig: and would love to have use cases and some input on how people are planning to use it
11:55:05 johnthetubaguy: I will ask melwitt if she might have a little time for it
11:55:35 it's permissive, so you can do it with read/write volumes I think
11:55:38 johnthetubaguy: please do it on Friday then as the gate is blowing up all the time, so if anything needs to be fixed that's a bloodbath next week to get it done :(
11:56:27 johnthetubaguy: we're turning cache off which I think is the issue with the case oneswig mentioned
11:57:09 correct - would need a model where clients can cache or the fan-in of load will be bad
11:57:51 I didn't think it was that good of a cache we turned off, but that is a needed optimization
11:58:02 I know v1 is libvirt only too, I guess we might need ironic support
11:58:15 oneswig: Do you have more details about when the SIG session would happen during the PTG? The Blazar meetings are getting moved to Monday and Tuesday to remove conflict with Nova sessions.
11:58:18 oneswig: it's definitely an interesting case as those settings happen at attach time and I'm not aware of a way to change that easily later
11:58:18 We should continue this - and will - in following meetings. Thanks ildikov for raising it
11:58:36 oneswig: thanks for the opportunity :)
11:58:40 priteau: half day on either Monday or Tuesday AFAIK.
11:58:44 johnthetubaguy: +1 for Ironic
11:59:01 sounds like a plan, lots of things to continue in a follow-up meeting
11:59:07 would be great to get phase 2 planned for multi-attach
11:59:49 johnthetubaguy: +1 on that
12:00:02 johnthetubaguy: I just wish to get phase one in finally first :)
12:00:03 It has huge potential, I think
12:00:11 ildikov: +100
12:00:11 We are out of time, alas
12:00:31 And johnthetubaguy has just used up every plus-sign in the country
12:00:32 lots of good things to follow up on
12:00:42 :)
12:00:43 * johnthetubaguy takes a bow
12:00:44 Thanks everyone
12:00:47 #endmeeting
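As a follow-up to the multi-attach discussion above, a sketch of how the Queens-era workflow is expected to look. The volume-type property and the two-step attach reflect the in-progress Cinder/Nova work and may change before release; all resource names are placeholders:

    # Multi-attach is opted into via a volume type property
    openstack volume type create multiattach \
        --property multiattach="<is> True"
    openstack volume create shared-vol --size 20 --type multiattach

    # Attach the same volume to two servers; read/write by default
    # ("permissive" as noted above), and libvirt-only in the first version
    openstack server add volume server-a shared-vol
    openstack server add volume server-b shared-vol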