21:00:20 <oneswig> #startmeeting scientific-sig
21:00:20 <openstack> Meeting started Tue Feb  2 21:00:20 2021 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:23 <openstack> The meeting name has been set to 'scientific_sig'
21:00:30 <oneswig> hey martial
21:00:33 <trandles> hi oneswig martial
21:00:35 <oneswig> #chair martial
21:00:35 <openstack> Current chairs: martial oneswig
21:00:45 <oneswig> greetings trandles
21:01:14 <martial> hello Stig, Tim
21:01:30 <oneswig> Is everyone well?
21:01:51 <trandles> fine here, thanks
21:02:16 <oneswig> The exciting news here is that the South African coronavirus mutation has arrived in our town.  Only 11 cases but obviously a concern...
21:02:33 <trandles> :(
21:03:02 <oneswig> I will watch with great interest how things develop
21:03:36 <trandles> it's not been too bad here, but we're still not going out except exercise in the neighborhood and to shop
21:03:54 <martial> same here
21:04:23 <oneswig> same here, I don't get out much nowadays.
21:05:03 <oneswig> In theory this gives plenty of time for OpenStack tinkering though, eh..
21:06:16 <trandles> I've been doing a lot with Ironic lately. Friday I go into the office to do some network cabling on what will be our first production RHOSP deployment.
21:06:17 <martial> any discoveries?
21:07:26 <oneswig> trandles: nice!  Is it multi-tenant networking?
21:07:51 <oneswig> ... and will you try out that ramdisk deployment driver?
21:07:58 <trandles> multi-tenant in what way?
21:08:09 <trandles> we've been using ramdisk deployment a lot
21:09:07 <trandles> as can be expected, Ironic just works for the most part... building, configuring, and maintaining your images is the real work
21:09:29 <oneswig> multi-tenant in the use of vlans for tenant network isolation
21:09:41 <oneswig> Good to hear about the ramdisk driver in use.  That's fantastic.
21:09:49 <trandles> ah, that multi-tenant networking...yes
21:10:52 <trandles> TBH the last deployment I did prior to switching to RHOSP was OVS/OVN and I'm not entirely opposed to that. If you don't care about the performance hit, it works fairly well.
21:11:04 <trandles> OVS/OVN is a LOT to get your head around though.
21:12:24 <trandles> Ironic ramdisk deployment was, almost literally, enroll->config->deploy->nothing to see here. It just works.
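A minimal sketch of that enroll -> config -> deploy flow using openstacksdk; the node name, BMC details, and kernel/ramdisk URLs are illustrative assumptions, not the actual site configuration:

    # Sketch of Ironic's enroll -> config -> deploy flow for the ramdisk
    # deploy interface, via openstacksdk. Names, addresses, and URLs are
    # illustrative assumptions.
    import openstack

    conn = openstack.connect(cloud="mycloud")   # assumed clouds.yaml entry

    # Enroll: register the node and select the ramdisk deploy interface.
    node = conn.baremetal.create_node(
        name="compute-001",                     # hypothetical node
        driver="ipmi",
        driver_info={
            "ipmi_address": "10.0.0.10",        # hypothetical BMC
            "ipmi_username": "admin",
            "ipmi_password": "secret",
        },
        deploy_interface="ramdisk",             # boot from RAM, no disk write
    )

    # Config: point the node at the kernel/ramdisk pair it should boot.
    conn.baremetal.update_node(node, instance_info={
        "kernel": "http://images.example/vmlinuz",      # assumed URLs
        "ramdisk": "http://images.example/initramfs",
    })

    # Deploy: walk the node through manageable/available to active.
    for target in ("manage", "provide", "active"):
        conn.baremetal.set_node_provision_state(node, target, wait=True)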
21:12:25 <oneswig> We've had some fun with it here this week, for Manila with the CephFS native driver and converged Ceph.  A tricky combination that appears to expose some curious problems with OVN network reachability.
21:13:28 <oneswig> trandles: with the ramdisk, how much RAM do you recommend for a "typical" Linux deploy?  Does the RAM requirement grow or does it reach a steady-state?
21:15:08 <trandles> So that's a hot topic internally. It depends entirely on what you're doing with your image and what other services you have to support your deployment (e.g., do you keep any logs locally?).
21:16:10 <trandles> I think we're going to end up with Ironic ramdisk deploy and NFSROOT
21:16:17 <oneswig> Does your deployment "descend" to disk for filesystem components like /var?
21:16:38 <oneswig> ah, right, nfsroot.  A pivot root during bootup?
21:16:42 <trandles> Luckily we have a long history of running all our clusters without any local disks.
21:17:11 <oneswig> That's a big simplifier, I'm sure.  Apart from the network-attached storage.
21:17:21 <trandles> So we already have configurations and tooling (cfengine, but moving to ansible) so that we make little or very little use of things like /var for PID files, etc.
21:18:03 <trandles> we don't have a single compute node with any local storage right now, all persistent storage is remotely mounted
21:18:39 <oneswig> Very neat.
21:19:23 <oneswig> Having been working on hypervisors with ~36 aging disks in each recently, I see your green grass.
21:21:05 <trandles> we treat our compute like expensive cattle, not pets
21:21:18 <oneswig> ... you barbecue them? :-)
21:21:39 <trandles> expensive in the sense that as long as each node is an exact clone of the rest, performance-wise, we're happy
21:21:53 <trandles> if you're a laggard you're not going to last long
21:22:43 <trandles> anyone have experience with Ironic's ansible deployment interface?
21:22:49 <oneswig> trandles: hardware-cardiff is an interesting project for that.  We've used it on Ironic inspection data to identify nodes that are (for whatever reason) 10% slower than the herd on a simple benchmark.
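The gist of that hardware-cardiff-style check, sketched against ironic-inspector data via openstacksdk. The benchmark field name here is an assumption; real inspection data layout depends on which collectors you run:

    # Pull inspection data for every node and flag outliers that are
    # >10% slower than the mean on some benchmark figure.
    import statistics
    import openstack

    conn = openstack.connect(cloud="mycloud")   # assumed cloud entry

    scores = {}
    for node in conn.baremetal.nodes():
        data = conn.baremetal_introspection.get_introspection_data(node.id)
        # Hypothetical field; adjust to your inspection collector output.
        scores[node.name] = data.get("cpu_benchmark_score", 0)

    mean = statistics.mean(scores.values())
    for name, score in scores.items():
        if score < 0.9 * mean:
            print(f"{name} is a laggard: {score} vs mean {mean:.1f}")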
21:22:57 <trandles> is it more than just an automatic way to launch a playbook?
21:23:12 <martial> what does it do?
21:23:16 <oneswig> haven't used the ansible deployment interface, alas.
21:24:03 <trandles> https://docs.openstack.org/ironic/victoria/admin/drivers/ansible.html
21:25:44 <oneswig> We tried its big sister, the Ansible networking driver, but it was *significantly* slower than the networking-generic-switch driver.
21:26:16 <oneswig> Are you using the ansible deployment interface, or interested in trying it?
21:28:10 <trandles> I'm just looking for anyone's experiences with it.
21:28:43 <trandles> we're going to ansiblize the whole damn place and I wonder if it's worth spending time looking into
21:28:58 <trandles> my problem is I have next to no "free" time to experiment right now :(
21:29:25 <oneswig> We've had some fun recently running Ansible playbooks inside diskimage-builder builds.  If you want to Ansiblize all the things, how about that :-)
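One way that "Ansible playbooks inside diskimage-builder" pattern can look: a custom DIB element whose install.d phase runs ansible-playbook inside the image chroot, invoked here from Python. The element name and paths are hypothetical:

    import os
    import subprocess

    env = dict(
        os.environ,
        ELEMENTS_PATH="/opt/dib-elements",   # assumed custom elements dir
    )
    # /opt/dib-elements/ansible-site/install.d/50-run-ansible would contain
    # something like:  ansible-playbook -i localhost, -c local site.yaml
    subprocess.run(
        ["disk-image-create", "-o", "my-image",
         "centos", "vm", "ansible-site"],    # 'ansible-site' is hypothetical
        env=env,
        check=True,
    )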
21:30:41 <trandles> another option, and one we're probably going to use
21:31:15 <trandles> image question: has anyone tried to export a VM from something like RHV and then deploy the qcow2 using Ironic?
21:31:48 <trandles> it appears the image is broken in some way (partitioning?) that causes ironic to fail deploying
21:32:39 <trandles> I haven't actually looked at it, but a colleague has been struggling with it.
21:35:01 <oneswig> ooh, well that sounds like a proper tasty problem.
21:36:07 <oneswig> I wonder if you get out a partition image that gets treated as a whole-disk image, or vice versa.  The second avenue to explore would be whether the virt image has had all those pesky hardware drivers purged from it.
21:36:58 <jmlowe> that sounds a bit odd, the image should be a bitwise copy, I wonder if he just needs to convert it to raw or something
21:37:12 <trandles> something like "deploy_step: write-image: unable to find a valid partition table" yet if he uses parted to print the disk in VM form he gets a table with a partition for /boot and another for LVM
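A triage sketch along the lines jmlowe suggests: inspect the exported image with qemu-img, convert it to raw, and check the partition table directly. Paths are illustrative:

    import json
    import subprocess

    src = "rhv-export.qcow2"   # hypothetical exported image

    # qemu-img reports the real container format; a partition image posing
    # as a whole-disk image (or vice versa) starts to show up here.
    info = json.loads(subprocess.check_output(
        ["qemu-img", "info", "--output=json", src]))
    print(info["format"], info["virtual-size"])

    # Convert to raw, then dump the partition table with sfdisk.
    subprocess.run(["qemu-img", "convert", "-f", "qcow2", "-O", "raw",
                    src, "rhv-export.raw"], check=True)
    subprocess.run(["sfdisk", "--dump", "rhv-export.raw"], check=True)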
21:37:13 <oneswig> I'm quite amused by the "cold migration" of Ironic nodes
21:38:00 <oneswig> Hi jmlowe :-)
21:38:01 <jmlowe> also, could be some efi boot shenanigans
21:38:14 <jmlowe> hi
21:38:20 <trandles> jmlowe: I suggested he switch the hardware from UEFI to BIOS but that didn't fix it
21:38:33 <oneswig> if it's failing to write the data to the disk, that's probably next week's problem...
21:38:54 <jmlowe> I'm pretty sure it wouldn't ever boot in UEFI from a stock cloud image
21:40:18 <oneswig> IIRC we build an IPA deploy image with some extra packages to get it working.  But the error above seems more fundamental than firmware variety
21:42:16 <trandles> we can build and deploy all kinds of images using DIB, but we're trying to deploy RHV hypervisors with Ironic and all we have is an RHV iso
21:42:32 <trandles> doesn't look like RHAT provides a qcow2 for RHV hypervisors like they do for normal RHEL
21:43:00 <trandles> *RHV install ISO that is, not a bootable runnable ISO virt media image
21:43:25 <oneswig> trandles: interesting... the SuperCloud-style Ironic-deployed hypervisors is something we've been spending a bit of time with recently too.
21:43:45 <trandles> what flavor hypervisors?
21:44:36 <oneswig> Well, it's standard CentOS images with KVM deployed using Kolla-Ansible, to operate as Nova compute resources.
21:44:52 <trandles> ah
21:45:15 <oneswig> In this model, we have this kind of mezzanine between undercloud and overcloud, where Nova's hypervisors live.
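A loose sketch of that mezzanine step, assuming an undercloud cloud entry plus a bare-metal flavor, image, and network (all names here are hypothetical): boot the would-be hypervisor as a bare-metal instance, then hand it to Kolla-Ansible:

    import openstack

    under = openstack.connect(cloud="undercloud")   # assumed cloud entry

    # Boot the hypervisor-to-be as a bare-metal instance on the undercloud.
    server = under.compute.create_server(
        name="hv-007",
        flavor_id=under.compute.find_flavor("baremetal").id,
        image_id=under.image.find_image("centos-kvm").id,
        networks=[{"uuid": under.network.find_network("provisioning").id}],
    )
    server = under.compute.wait_for_server(server)

    # From here, add the node to the Kolla-Ansible inventory and run e.g.
    #   kolla-ansible deploy --limit hv-007
    print(f"{server.name} is {server.status}; ready for kolla-ansible")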
21:45:38 <oneswig> It all gets very Escher
21:45:56 <martial> seems scary honestly at this point :)
21:46:51 <oneswig> I have only been involved at arm's length, but from what I've heard it's simpler than other ways of partitioning virt and bare metal in a dynamic fashion.
21:47:45 <trandles> I'll be asking questions about that eventually. I already have customers who want us to implement something like SuperCloud
21:50:23 <oneswig> Where's janders when you need him :-)
21:50:39 <trandles> sleeping...I bug him in #openstack-ironic all the time ;)
21:52:16 <trandles> Anyone have any info about upcoming summits? The openstack.org/summit page has nothing for the future
21:52:18 <janders> hmm... my ears are burning
21:52:31 <trandles> \o/
21:52:44 <oneswig> Hey janders, g'day!
21:52:49 <janders> g'day !
21:53:29 <oneswig> We were reminiscing on SuperCloud - actually on the nova-on-ironic stuff you were doing.
21:53:40 * janders is reading through history
21:54:53 <janders> very cool! :)
21:56:00 <oneswig> This machine is built that way: https://www.top500.org/system/179908/
21:56:23 <janders> excellent
21:56:39 <janders> what is the lay of the land around this these days? any interesting catches?
21:56:47 <oneswig> It's early days but the virt component has been up to ~275 nodes and down to 6
21:57:00 <janders> that is a great result!
21:57:15 <martial> nice
21:57:23 <oneswig> The main catch I think has been the bit you left as an exercise to the reader - ethernet
21:57:23 <janders> I remember having to do a bit of hacking with cleaning up repurposed hypervisors, otherwise there were some side effects
21:58:09 <janders> but it was relatively easy (some CLI commands in the hypervisor un-provisioning playbook)
21:58:17 <janders> hmm... what's the challenge with ethernet?
21:58:38 <oneswig> The hypervisor's richer networking config is applied via Ansible.  All the trunks, VLANs, LAGs, etc.
21:58:49 <oneswig> You had it so easy with IB :-)
21:58:53 <janders> haha!
21:59:04 <janders> so - no NEO-like SDN in the Ethernet approach?
21:59:28 <oneswig> Alas nothing like that.
21:59:49 <janders> given we're running out of time, I propose "SuperClouds in 2021" as a topic for discussion in two weeks' time
22:00:00 <janders> (I have a meeting clash with the other SIG time slot)
22:00:11 <janders> would that work?
22:00:12 <oneswig> Sounds good to me.
22:00:28 <janders> excellent! I'm curious what you guys are up to in this area
22:00:30 <oneswig> Tell those other people you have a priority janders :-)
22:00:52 <oneswig> OK, better wrap up.  Any final comments?
22:01:24 <oneswig> #endmeeting