Wednesday, 2017-09-27

04:36 *** goldenfri has quit IRC
10:10 *** b1airo has joined #scientific-wg
10:54 *** rbudden has joined #scientific-wg
10:58 *** oneswig has joined #scientific-wg
11:01 *** martial_ has joined #scientific-wg
12:03 <martial_> hello?
12:04 <martial_> so https://etherpad.openstack.org/p/Scientific-Sydney17-Forum
12:04 <oneswig> aha
12:04 <martial_> b1airo: the BoF and lightning talks are on the schedule now
12:04 <b1airo> i'll have 4x SN2700s, 80x CX-4 50GbE DP, 60x C6420s, 20x R740s...
12:04 <martial_> ? I did not receive any notice of this from the Foundation ... strange
12:05 <oneswig> b1airo: nice, that's pretty much the entire Cambridge deployment, updated
12:05 <b1airo> martial_, ah ok, i did get a message from Jimmy saying they were up - will forward it...
12:06 <martial_> thanks b1airo
12:09 <oneswig> b1airo: so you're looking at leading the scientific openstack pain points suggestion? I have a suspicion these two sessions might be merged. If you're meeting other HPC Ops people, what else have you got to talk about?
12:09 <martial_> I see that they accepted the forum sessions I submitted a while back
12:09 <b1airo> oneswig, i'll also have 2x SN2100s, 20x CX-4 25GbE DP and 20x R740XDs - 3.5PB of extra Ceph :-)
12:09 <oneswig> I wonder if jmlowe might be interested in the Ceph BoF, if he's going
12:10 <b1airo> i would say so oneswig - lots of people interested in OpenStack will also be interested in Ceph
12:10 <oneswig> b1airo: great to hear it, sounds awesome. All part of the same cloud, or a new project?
12:10 <martial_> I need to reach out to them: it is not an SWG meeting anymore but an SWG Lightning Talk, my affiliation is wrong (likely why I did not get the email ... again), and Stig is missing from the list :P
12:11 <oneswig> martial_: I'm missing, I think, because I'm already doing 3 other things
12:11 <oneswig> I couldn't get proposed for a fourth
12:11 <oneswig> (or fifth)
12:11 <martial_> oneswig: I remember this conversation :)
12:12 <martial_> oneswig: popular guy
12:12 <martial_> :)
12:12 <oneswig> don't know when to stop, that's my problem...
12:12 <martial_> well, who needs sleep anyhow, right?
12:12 <oneswig> OK - better go, tea to drink etc.
12:12 <oneswig> catch you later
12:13 *** oneswig has quit IRC
12:13 <martial_> b1airo: you okay on the "Top pain points for scientific openstack"?
12:13 <martial_> I will email Mike about the "Ceph BoF"
12:14 <martial_> Maybe invite Robert Budden and Tim Randles to be moderators on the "HPC Ops Unite"
12:15 <b1airo> martial_, recall that we specifically decided to have a BoF and Lightning Talks as our two WG sessions and forgo a "meeting" - a BoF is already similar
12:16 <martial_> b1airo: I understand, but if you look at https://www.openstack.org/summit/sydney-2017/summit-schedule/global-search?t=scientific the wrong items are listed
12:19 <martial_> b1airo: are you concerned the "HPC Ops Unite" is a repeat of the "SWG BoF", is that what you mean?
12:20 <martial_> I think the BoF is specific to current members and the other is to welcome people who are not yet members
12:39 <jmlowe> The Ceph BoF from MSI?
12:40 <jmlowe> Oh, wait, you mean Sydney not SC'17
12:40 <jmlowe> I couldn't swing Sydney
12:48 *** jmlowe has quit IRC
13:34 *** jmlowe has joined #scientific-wg
13:35 *** hogepodge has joined #scientific-wg
13:35 <hogepodge> Sorry b1airo, I fell back asleep
13:36 <hogepodge> I didn't understand the question either - access to the sections? Also, what were the terms offered for the images? Attribution?
13:40 <martial_> jmlowe: do you know if Tim and Robert are coming?
13:40 <jmlowe> Tim yes, Robert no
13:40 <martial_> I think Tim is a good candidate for the "HPC Ops Unite" moderator, will reach out to him
13:41 <martial_> I was hoping to be able to give Robert a slot as moderator for a forum session, I know he was interested in the experience
13:42 <martial_> I know Mike may be a good candidate for the Ceph one, who else would you recommend?
13:43 <jmlowe> Anybody from MSI if they are going - they're the only people I know who have been using ceph in production longer than me
13:45 <jmlowe> Trying to remember who was at the SC ceph bofs and might be going to Sydney
13:46 <martial_> if you can email me that info, I can follow up with them. I would appreciate that
13:46 <martial_> thank you very much
13:47 <jmlowe> bollig and masber? are from MSI
13:54 <martial_> (okay, looking for their emails :) )
13:54 <martial_> thanks Mike
13:57 <martial_> found Evan, will reach out and ask him to follow up with "masber" (real name?)
14:07 <rbudden> martial_: sorry, I couldn't swing Sydney either unfortunately :(
14:07 <martial_> Robert, I am sorry to hear that, I was looking forward to seeing you guys again ... Vancouver hopefully
14:07 <rbudden> yes, I think I should be able to swing Vancouver
14:08 <rbudden> international travel is rather difficult at times
14:08 <rbudden> I need to write more papers/presentations ;)
14:08 <martial_> I understand, trust me :)
14:31 *** jmlowe has quit IRC
14:37 *** jmlowe has joined #scientific-wg
14:48 *** jmlowe has quit IRC
14:48 *** jmlowe has joined #scientific-wg
15:03 *** jmlowe has quit IRC
15:04 *** jmlowe has joined #scientific-wg
15:13 *** martial_ has quit IRC
15:32 *** jmlowe has quit IRC
16:20 *** b1airo has quit IRC
16:35 *** jmlowe has joined #scientific-wg
16:36 <jmlowe> Come hell or high water I'm going to Vancouver - Berlin is an easier sell than Sydney
16:37 *** b1airo has joined #scientific-wg
16:43 <masber> jmlowe, MSI?
16:44 <jmlowe> Minnesota Supercomputing Institute - and clearly I was wrong, you are from AZ right?
16:45 <masber> I'm going to the sydney summit
16:46 <masber> and I also have a talk I have to prepare
16:47 <masber> jmlowe, sorry, I'm not from MSI but from the Garvan Institute (Sydney)
16:47 <jmlowe> I need more coffee, the country code for Australia is AU not AZ
16:48 <jmlowe> masber: yeah, getting all kinds of things wrong today
16:48 <masber> jmlowe, np
16:48 <masber> b1airo, are you from Monash University?
16:50 <masber> could I ask, now that most of you are here: how do you deploy openstack?
16:51 <masber> I used kolla-ansible but I am thinking of moving to either openstack-ansible or tripleO - any thoughts?
16:51 <jmlowe> b1airo is from Monash
16:52 <jmlowe> I rolled my own using saltstack https://github.com/jetstream-cloud/Jetstream-Salt-States
16:52 <masber> wow
16:53 <jmlowe> rbudden was using packstack for a while, not sure what now
16:53 <masber> packstack is not production ready as far as I know
16:53 <jmlowe> Do I remember that trandles was giving kolla a try?
16:54 <jmlowe> rbudden has a special case, they might have the world's largest ironic cluster
16:54 <jmlowe> https://www.psc.edu/bridges is their machine
16:56 <masber> apparently they do HPC and big data, I wonder how they deploy the hadoop ecosystem
16:59 <jmlowe> he's active, I'll poke him in a minute
16:59 <jmlowe> https://www.stackhpc.com/blog.html
16:59 <jmlowe> Stig is apparently using Kolla
17:00 <rbudden> masber: we have a hadoop guy who converts portions of Bridges into hadoop sections on demand
17:00 <rbudden> we actually don't do a lot of hadoop
17:01 <rbudden> less than we expected
17:01 <rbudden> but when we do, the user(s) request a reservation in Slurm for the # of nodes they want, then our hadoop guy scripts everything up
17:01 <rbudden> I'd like to play with Sahara eventually and just have OpenStack handle it
17:01 <rbudden> maybe one day ;)
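A minimal sketch of the on-demand workflow rbudden describes above: reserve nodes through Slurm, then hand the reserved node list to a provisioning script. The reservation name, user, sizes, and provisioning hook are illustrative assumptions, not Bridges' actual tooling.

```python
# Hypothetical sketch: carve out a Slurm reservation, then discover which
# nodes it holds so a Hadoop provisioning script can take over.
import subprocess

def reserve_nodes(user: str, node_count: int, minutes: int) -> str:
    """Create a Slurm reservation and return its name (illustrative only)."""
    name = f"hadoop_{user}"
    subprocess.run(
        ["scontrol", "create", "reservation",
         f"ReservationName={name}", f"Users={user}",
         f"NodeCnt={node_count}", "StartTime=now", f"Duration={minutes}"],
        check=True,
    )
    return name

def nodes_in_reservation(name: str) -> str:
    """Parse the Nodes= field out of 'scontrol show reservation'."""
    out = subprocess.run(
        ["scontrol", "show", "reservation", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return next(tok.split("=", 1)[1]
                for tok in out.split() if tok.startswith("Nodes="))

if __name__ == "__main__":
    res = reserve_nodes("alice", node_count=8, minutes=240)
    # A site-specific script (like the one rbudden mentions) would now
    # install and start Hadoop on these nodes.
    print("provision hadoop on:", nodes_in_reservation(res))
```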
17:02 <masber> rbudden, why don't you use ambari? it would give you a better-supported version of hadoop, plus monitoring and version control of the configuration files
17:03 <rbudden> I can't answer that as I don't handle any of our hadoop infra, but I'll check it out
17:03 <rbudden> I'm largely our OpenStack guy and do some filesystems work and other dev work
17:04 <masber> has anyone used zun + kuryr?
17:05 <masber> rbudden, so you use packstack for provisioning openstack?
17:05 <rbudden> yes, modified packstack
17:05 <rbudden> looking to move to kolla
17:05 <masber> why not tripleO?
17:06 <masber> kolla-k8s or kolla-ansible?
17:06 <rbudden> tripleO is on the radar to take a look at
17:07 <rbudden> i largely have ansible to handle things, so OSA seemed interesting as well
17:07 <rbudden> was thinking Kolla just based on Stig's experience with it
17:07 <masber> yeah, OSA and tripleO are the ones I want to move to
17:08 <masber> they both provide quite a lot of features like SR-IOV and ironic
17:08 <rbudden> i'm open to anything at this point that would simplify things and containerize the setup
17:08 <masber> tripleO does containers based on docker, and OSA is based on linux containers
17:09 <masber> I like tripleO because they release updates quite fast
17:09 <masber> and I want to use pike so I can deploy containers using kuryr and zun
17:10 <masber> and I am also excited about ceph-bluestore
17:10 <masber> I think tripleO/packstack will be the first ones to support it
17:11 <jmlowe> is zun usable now?
17:11 <rbudden> i'm very interested in Zun as well
17:11 <masber> apparently yes, with pike
17:11 <rbudden> I was sitting in on the IRC meetings, but it's 11pm EST so it conflicts with getting sleep while my baby sleeps ;)
17:12 <jmlowe> I last checked in on zun during the Boston summit, it looked a bit like vaporware
17:12 <masber> egonzalez from kolla-ansible is quite active helping the community and giving support, he told me they were testing zun and kuryr
17:13 <rbudden> jmlowe: it's more than vapor, it's largely spun off nova-docker so a lot of the work was already done/started
17:13 <rbudden> i was planning on trying it on Bridges shortly
17:13 <rbudden> in a test setup
17:13 <masber> jmlowe, why is that? apparently running containers using nova is deprecated
17:14 <jmlowe> it looked like they were just getting the project organized back in May
17:14 <rbudden> masber: correct, nova-docker is effectively dead. it was just one dev, 'dimms', maintaining it
17:14 <rbudden> he's now part of the Zun team
17:14 <jmlowe> I reserve the right to be completely wrong
17:15 <masber> I am wondering how kuryr works... does it provide service discovery?
17:18 <masber> jmlowe, I remember you told me you achieve 97% efficiency in your cluster, which tool do you use to measure that?
17:18 <jmlowe> that was linpack
17:19 <jmlowe> HPL: 0.696 GFLOPS bare metal, 0.678 GFLOPS vm
17:20 <jmlowe> worst was STREAM: 88 GB/s bare metal, 68 GB/s vm
17:20 *** goldenfri has joined #scientific-wg
17:20 <jmlowe> I take that back, worst was FFTE: 13.75 vs 9.235 GFLOPS
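For reference, the VM-to-bare-metal ratios implied by the numbers jmlowe quotes; this is just arithmetic on the figures above, nothing more:

```python
# Efficiency of VM vs bare metal from the quoted HPCC figures.
benchmarks = {
    "HPL (GFLOPS)":  (0.696, 0.678),   # bare metal, VM
    "STREAM (GB/s)": (88.0, 68.0),
    "FFTE (GFLOPS)": (13.75, 9.235),
}
for name, (bare, vm) in benchmarks.items():
    print(f"{name}: {vm / bare:.1%} of bare metal")
# -> HPL ~97.4% (the ~97% figure masber recalled), STREAM ~77%, FFTE ~67%
```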
17:21 <masber> jmlowe, why didn't you use YCSB?
17:22 <jmlowe> We ran HPCC because we are funded by an agency that buys HPC and runs HPCC on all their machines
17:23 <jmlowe> YCSB would have been nice, but we also had a 6 month delay in hardware delivery and a 2 week delay in the production date
17:24 <masber> jmlowe, why such a difference between bare metal and vm in STREAM? NUMA without cpu pinning?
17:24 <jmlowe> Jetstream went from os install on nodes 2015-11-09 to first vm started 2015-12-15, with early operations and users in 2016-02
17:25 <jmlowe> probably, they are numa nodes
17:25 <jmlowe> we don't do pinning
17:26 <jmlowe> 17 business days to do bare metal benchmarks, burn in, and get openstack functional
17:27 <masber> numa is a pain and it is hard to fully utilize hardware resources with cpu pinning, but apparently it gives better performance on big VMs
17:28 <masber> jmlowe, enabling cpu pinning is quite easy, you just need to set the attributes when you create the image
17:28 <jmlowe> one of those things I would have liked to fully map out
17:28 <jmlowe> it also prevents live migration, and that is not something I'm willing to give up for any price
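For context on the knob masber is referring to: in Nova, pinning is requested with the hw:cpu_policy=dedicated extra spec on a flavor (or hw_cpu_policy on an image). A minimal sketch with python-novaclient follows; the flavor name/sizes and auth details are placeholders, not any site's actual configuration.

```python
# Hypothetical sketch: create a flavor whose instances get dedicated
# (pinned) host CPUs. Auth values below are placeholders.
from keystoneauth1 import session
from keystoneauth1.identity import v3
from novaclient import client

auth = v3.Password(auth_url="http://controller:5000/v3",
                   username="admin", password="secret",
                   project_name="admin",
                   user_domain_id="default", project_domain_id="default")
nova = client.Client("2", session=session.Session(auth=auth))

flavor = nova.flavors.create(name="pinned.8", ram=32768, vcpus=8, disk=40)
flavor.set_keys({"hw:cpu_policy": "dedicated"})  # pin vCPUs to pCPUs
# Caveat jmlowe raises above: pinned instances constrain live migration,
# which Jetstream was not willing to trade away.
```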
17:29 <masber> so let me ask you, do you do any type of optimization on openstack?
17:29 <jmlowe> our mandate isn't hpc, it is to get the 97% of users eligible to use NSF funded resources who don't; getting an extra 10% performance boost won't do it
17:31 <jmlowe> Any time there was a choice between live migration/always-up and speed, we chose live migration: no sriov, no pinning, 10gige w/ vxlan
17:32 <jmlowe> again, if users had to squeeze that last 10% out of a node they would already be using hpc systems
17:32 <masber> yes
17:32 <jmlowe> also I think those runs were done with gcc, not intel
17:37 <masber> ok, I'm going to bed, it is 3:37am
18:26 *** goldenfri has quit IRC
19:25 *** jmlowe has quit IRC
19:47 *** b1airo has quit IRC
19:48 *** b1airo has joined #scientific-wg
19:56 *** martial has joined #scientific-wg
20:20 *** martial has quit IRC
21:26 <trandles> yes, I've been attempting to use kolla this week
21:26 <trandles> I've been totally unimpressed
21:27 <trandles> I think it's largely a lack of coherent documentation though
21:28 <trandles> the quickstart guide didn't work
21:29 <trandles> the "how to build an image" documentation makes a load of assumptions that you have to trip over and discover on your own before you can make progress
21:29 <trandles> if you start here (which doesn't seem unreasonable given the title of the page): https://docs.openstack.org/kolla/latest/index.html
21:30 <trandles> you are sort of persuaded to choose between kolla-ansible and kolla-kubernetes
21:31 <trandles> I went down the kolla-ansible path and followed each link in the User Guides in succession... but only got to "Quick Start" before it failed
21:32 <trandles> when googling for help with the failure I came across a bug report that basically said "we never said everything would work, here's documentation to prove it", and the issue was closed
21:34 <trandles> I think if you were trying to use the bootstrap-servers ansible play, but not for the most recent release (i.e. Pike in my case), you might be ok
21:36 <trandles> I'm very interested in talking to Stig about how they managed an upgrade using kolla and ansible. One hint may be that they
21:39 <trandles> I fell back to using the kolla stable/pike branch on github. In there I used the various scripts in tools/ but only had partial success. Several of the images didn't build. I have been busy with other things today and haven't had a chance to go back and debug yet.
21:40 <trandles> Overall though, I don't think I'm going to try to untangle the kolla mess
21:41 *** goldenfri has joined #scientific-wg
21:44 <trandles> This however looks really good: https://docs.openstack.org/install-guide/
21:45 <trandles> Following it for a clean Pike install has been painless. Plus I know docker well, so working from this guide to make LANL-specific dockerfiles, for only the things we care about right now, is the path I'm pursuing.
22:11 *** b1airo has quit IRC
22:24 *** rbudden has quit IRC
22:38 *** rbudden has joined #scientific-wg
22:47 *** b1airo has joined #scientific-wg
23:10 *** b1airo has quit IRC
23:32 *** jmlowe has joined #scientific-wg
23:35 *** b1airo has joined #scientific-wg
23:52 *** jmlowe has quit IRC
23:53 *** jmlowe has joined #scientific-wg
