11:01:17 <oneswig> hello
11:01:29 <witek> hi
11:02:06 <oneswig> Hi witek, how are you?
11:02:14 <oneswig> #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_April_22nd_2020
11:02:22 <oneswig> small agenda.  Must try harder :-)
11:04:05 <janders> g'day
11:04:11 <oneswig> #topic SIG at virtual PTG
11:04:15 <witek> I'm fine, thanks
11:04:16 <oneswig> Hi janders, evening
11:05:19 <oneswig> janders: had a question for you about how you were using vxlan - on the IB NIC was that right?
11:05:46 <janders> I've done this some time back
11:07:16 <janders> I used this method for the fully virtualised part of the cluster running on IB kit
11:07:16 <oneswig> #link virtual PTG planning https://ethercalc.openstack.org/126u8ek25noy
11:09:26 <janders> I wouldn't recommend vxlan over IB for large scale, performance sensitive cases
11:09:32 <janders> but for small-medium it works pretty well
11:11:44 <oneswig> I put in for 1500UTC-1700UTC and 2100UTC-2300UTC for virtual PTG, hopefully the split session will work for more people.
11:12:12 <janders> comms issues sorry
11:12:26 <witek> the afternoon session should work for me
11:13:31 <oneswig> apologies, had a call
11:14:02 <oneswig> afternoon for me too, I'll see how much I've got left in the tank for the evening.
11:15:40 <oneswig> We should think about how to advocate scientific sig use cases
11:15:58 <oneswig> (and who to?)
11:17:46 <oneswig> #topic AOB
11:18:23 <oneswig> Just sitting in on a presentation on OpenStack changes for supporting CentOS8
11:18:45 <oneswig> plenty of changes to accommodate
11:20:48 <oneswig> Quite a big piece of work but CentOS 7 will rapidly go off, likely
11:20:50 <janders> top #3 challenges?
11:21:15 <janders> which release will support both el7 and el8?
11:21:31 <oneswig> I joined late, but DNF, nftables, Python 3, and network scripts being deprecated all have to be resolved.
11:21:56 <oneswig> In the Kolla world, Train spans both.
11:22:09 <oneswig> Assume it's the same for TripleO
11:22:25 <janders> how well is el7->el8 migration going for people?
11:22:34 <janders> or is it irrelevant for OS?
11:22:49 <oneswig> It's a reinstall
11:23:01 <janders> wasn't that supposed to be an in-place?
11:23:27 <oneswig> I'm not sure for RHEL but for CentOS it's reinstall
11:23:42 <janders> I've done interesting el6->el7 migrations on Icehouse I think it was
11:23:46 <janders> back in the day
11:24:21 <janders> SRIOV cluster so no live migration
11:24:42 <janders> but /var/lib/nova/instances as a separate LV worked a treat
11:25:03 <oneswig> Right.  I occasionally hear things about support for SRIOV live migration...
11:25:30 <janders> in the "maybe one day" context or ready for testing context?
11:27:21 <oneswig> Trying to remember where I saw it recently.
11:33:29 <oneswig> It's quiet today, anyone with AOB to add?
11:33:56 <janders> playing around with tripleo updates on OSP13
11:34:06 <janders> seems updates are heaps better than installs :)
11:34:18 <janders> mostly ansible
11:34:28 <oneswig> Good to hear it, although usually it's the other way around
11:34:31 <janders> as next versions come out, who knows, maybe tripleo will get usable
11:34:48 <janders> promise is less puppet more ansible, should be all ansible soon
11:35:06 <janders> another interesting find - CX6 drivers are in overcloud images, but not in IPA
11:35:10 <oneswig> Seems like a good path and one that converges with OS-A and Kolla.
11:35:15 <janders> we were rebuilding IPA with CX6 drivers this week
11:35:42 <oneswig> From our team Doug did our first production deploy with the Ironic SW RAID driver
11:36:15 <janders> nice!
11:36:26 <janders> I may move to that before I get VROC working in IPA :P
11:37:03 <oneswig> Those things are harder than they ought to be, I've heard.
11:37:12 <janders> confirmed :/
11:37:30 <oneswig> The price paid for always having the shiny toys!
11:37:46 <janders> indeed
11:38:47 <oneswig> janders: are you still deploying supercloud nodes with only one NIC?
11:39:17 <janders> for the cyber project I'm doing right now I have two
11:39:31 <janders> 100/200 HDR for storage only
11:39:35 <janders> and 100GE for everything else
11:39:55 <janders> but this cluster is a little HPC and more security focused
11:41:05 <oneswig> Are you using the m-key stuff on the IB, or is that outside of the user-facing world?
11:42:00 <janders> m-key?
11:43:21 <oneswig> There was a fix for bare metal multi-tenant IB security, I believe it related to preventing IB NICs from sending management datagrams.
11:43:45 <janders> right!
11:44:01 <janders> for older cards it was special locked-down firmware
11:44:19 <janders> unsure for the latest ones - in this cluster, IB fabric is not user facing (at least for now)
11:44:35 <oneswig> ok, makes sense.
11:44:35 <janders> so it's just the standard FW
11:45:07 <oneswig> Are your tenants untrusted in this system janders?
11:45:29 <janders> I'm less worried about the tenants and more about the stuff they will be working on
11:45:33 <janders> malware analysis
11:45:50 <janders> so IB is mostly used to connect volumes with their data to hypervisors at decent performance
11:46:13 <oneswig> NVMe-oF?
11:46:50 <janders24> GPFS
11:47:07 <janders24> though NVMe-oF is a good idea actually :)
11:47:27 <janders24> I've gone with GPFS because it can do everything (glance/cinder backend + parallel FS)
11:47:57 <janders24> but I'm starting to lean towards splitting openstack backends off the parallel FS
11:47:57 <oneswig> What's the state of the OpenStack support?
11:48:08 <janders24> for OSP13 it *just works*
11:48:16 <oneswig> Nice, good to hear it.
11:48:29 <janders24> for OSP16 it's uncertain, sometimes I'm thinking about forward-porting it myself :)
11:48:31 <oneswig> Assuming you didn't mean it *just* works :-)
11:48:53 <janders24> yeah... there is some fiddling to get it to work
11:49:00 <janders24> and then it *just works* :)
11:49:25 <janders24> containerisation isn't helping to make things simple
11:49:58 <oneswig> how so?
11:50:25 <janders> passing through extra subtrees into a number of containers via tripleo isn't trivial
11:53:20 <oneswig> Something like /var/lib/nova?
11:53:57 <janders> spot on
11:53:59 <janders> and glance
11:54:20 <janders> cinder is even more interesting because it's pacemaker powered in OSP
11:55:22 <oneswig> Back in the day it was a huge problem upgrading a hypervisor or controller with OFED packages installed.  How's that going now?
11:55:31 <janders24> hahaha
11:55:35 <janders24> it was killing us just this morning
11:55:50 <janders24> I fixed this some time back with elaborate repo management tactics
11:55:57 <janders24> but these days more and more stuff is in kernel
11:56:05 <janders24> so I usually get away with running without OFED
11:56:14 <janders24> I probably wouldn't do that on GPFS servers
11:56:20 <janders24> but GPFS clients are running fine
11:56:41 <janders24> and as mentioned earlier recent OSP13 overcloud images come with CX6 drivers included
11:56:49 <oneswig> Interestingly, it does seem possible to upgrade in-place, but not easy: https://www.centlinux.com/2020/01/how-to-upgrade-centos-7-to-8-server.html
11:57:14 <janders24> IPA may need injection of bits of OFED but that doesn't need to be upgradable, it's rebuild-as-needed
11:57:50 <oneswig> I remember benchmarking the iSER cinder driver, it was really good (but also a significant SPOF)
11:58:18 <janders24> that's the beauty of GPFS
11:58:20 <janders24> no SPOFs
11:58:32 <oneswig> yes, very nice.
11:58:45 <janders24> yeah with el7->el8 I was expecting to see a Fedora-like in-place upgrade mechanism
11:59:03 <janders24> unsure to what degree it's applicable to OpenStack (or how worthwhile this would be)
11:59:15 <janders24> the likes of tripleo will probably rebuild node after node
11:59:24 <janders24> but having said that I haven't explored that path so not sure... yet
11:59:32 <janders24> OSP13 has a fair bit of life left on it and works well
12:00:07 <oneswig> Ah, we are out of time.
12:00:22 <janders24> thanks guys
12:00:23 <oneswig> Thanks janders24 witek good talking with you both
12:00:24 <janders24> stay safe!
12:00:28 <oneswig> and you
