21:00:20 #startmeeting scientific-sig
21:00:20 Meeting started Tue Feb 2 21:00:20 2021 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:23 The meeting name has been set to 'scientific_sig'
21:00:30 hey martial
21:00:33 hi oneswig martial
21:00:35 #chair martial
21:00:35 Current chairs: martial oneswig
21:00:45 greetings trandles
21:01:14 hello Stig, Tim
21:01:30 Is everyone well?
21:01:51 fine here, thanks
21:02:16 The exciting news here is that the South African coronavirus mutation has arrived in our town. Only 11 cases but obviously a concern...
21:02:33 :(
21:03:02 I will watch with great interest how things develop
21:03:36 it's not been too bad here, but we're still not going out except exercise in the neighborhood and to shop
21:03:54 same here
21:04:23 same here, I don't get out much nowadays.
21:05:03 In theory this gives plenty of time for OpenStack tinkering though, eh..
21:06:16 I've been doing a lot with Ironic lately. Friday I go into the office to do some network cabling on what will be our first production RHOSP deployment.
21:06:17 any discoveries?
21:07:26 trandles: nice! Is it multi-tenant networking?
21:07:51 ... and will you try out that ramdisk deployment driver?
21:07:58 multi-tenant in what way?
21:08:09 we've been using ramdisk deployment a lot
21:09:07 as can be expected, ironic just works for the most part...building, configuring, maintaining your images is the real work
21:09:29 multi-tenant in the use of vlans for tenant network isolation
21:09:41 Good to hear about the ramdisk driver in use. That's fantastic.
21:09:49 ah, that multi-tenant networking...yes
21:10:52 TBH the last deployment I did prior to switching to RHOSP was OVS/OVN and I'm not entirely opposed to that. If you don't care about the performance hit, it works fairly well.
21:11:04 OVS/OVN is a LOT to get your head around though.
21:12:24 Ironic ramdisk deployment was, almost literally, enroll->config->deploy->nothing to see here. It just works.
21:12:25 We've had some fun with it here this week, for Manila with the CephFS native driver and converged Ceph. A tricky combination that appears to expose some curious problems with OVN network reachability.
21:13:28 trandles: with the ramdisk, how much RAM do you recommend for a "typical" Linux deploy? Does the RAM requirement grow or does it reach a steady-state?
21:15:08 So that's a hot topic internally. It depends entirely on what you're doing with your image and what other services you have to support your deployment (e.g., do you keep any logs locally?).
21:16:10 I think we're going to end up with Ironic ramdisk deploy and NFSROOT
21:16:17 Does your deployment "descend" to disk for filesystem components like /var?
21:16:38 ah, right, nfsroot. A pivot root during bootup?
21:16:42 Luckily we have a long history of running all our clusters without any local disks.
21:17:11 That's a big simplifier, I'm sure. Apart from for the network-attached storage
21:17:21 So we already have configurations and tooling (cfengine but moving to ansible) so that we can have little to very little use of things like /var for PID files, etc.
21:18:03 we don't have a single compute node with any local storage right now, all persistent storage is remotely mounted
21:18:39 Very neat.
21:19:23 Having been working on hypervisors with ~36 aging disks in each recently, I see your green grass.
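
[Note on the ramdisk deploy interface discussed above: a minimal sketch of how a node is pointed at a network-served kernel/initramfs pair and booted disklessly. The node name and image URLs are hypothetical; "ramdisk" must also appear in enabled_deploy_interfaces in ironic.conf.]

    # Select the ramdisk deploy interface and supply the boot artifacts.
    openstack baremetal node set node-01 \
        --deploy-interface ramdisk \
        --instance-info kernel=http://images.example.com/vmlinuz \
        --instance-info ramdisk=http://images.example.com/initramfs
    # Provision: the node network-boots the ramdisk and never writes a local disk.
    openstack baremetal node deploy node-01
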
21:21:05 we treat our compute like expensive cattle, not pets
21:21:18 ... you barbecue them? :-)
21:21:39 expensive in that as much as each node can be an exact clone of the rest we're happy...performance-wise
21:21:53 if you're a laggard you're not going to last long
21:22:43 anyone have experience with Ironic's ansible deployment interface?
21:22:49 trandles: hardware-cardiff is an interesting project for that. We've used it on Ironic inspection data to identify nodes that are (for whatever reason) 10% slower than the herd on a simple benchmark.
21:22:57 is it more than just an automatic way to launch a playbook?
21:23:12 what does it do?
21:23:16 haven't used the ansible deployment interface, alas.
21:24:03 https://docs.openstack.org/ironic/victoria/admin/drivers/ansible.html
21:25:44 We tried its big sister the Ansible networking driver, but it was *significantly* slower than the networking-generic-switch driver.
21:26:16 Are you using the ansible deployment interface, or interested in trying it?
21:28:10 I'm just looking for anyone's experiences with it.
21:28:43 we're going to ansiblize the whole damn place and I wonder if it's worth spending time looking into
21:28:58 my problem is I have next to no "free" time to experiment right now :(
21:29:25 We've had some fun recently running Ansible playbooks inside diskimage-builder builds. If you want to Ansiblize all the things, how about that :-)
21:30:41 another option, and one we're probably going to use
21:31:15 image question: has anyone tried to export a VM from something like RHV and then deploy the qcow2 using Ironic?
21:31:48 it appears the image is broken in some way (partitioning?) that causes ironic to fail deploying
21:32:39 I haven't actually looked at it, but a colleague has been struggling with it.
21:35:01 ooh, well that sounds like a proper tasty problem.
21:36:07 I wonder if you get out a partition image that gets treated as a whole-disk image, or vice versa. The second avenue to explore would be whether the virt image has had all those pesky hardware drivers purged from it.
21:36:58 that sounds a bit odd, the image should be a bitwise copy, I wonder if he just needs to convert it to raw or something
21:37:12 something like "deploy_step: write-image: unable to find a valid partition table" yet if he uses parted to print the disk in VM form he gets a table with a partition for /boot and another for LVM
21:37:13 I'm quite amused by the "cold migration" of Ironic nodes
21:38:00 Hi jmlowe :-)
21:38:01 also, could be some efi boot shenanigans
21:38:14 hi
21:38:20 jmlowe: I suggested he switch the hardware from UEFI to BIOS but that didn't fix it
21:38:33 if it's failing to write the data to the disk, that's probably next week's problem...
21:38:54 I'm pretty sure it wouldn't ever boot in UEFI from a stock cloud image
21:40:18 IIRC we build an IPA deploy image with some extra packages to get it working. But the error above seems more fundamental than firmware variety
21:42:16 we can build and deploy all kinds of images using DIB, but we're trying to deploy RHV hypervisors with Ironic and all we have is an RHV iso
21:42:32 doesn't look like RHAT provides a qcow2 for RHV hypervisors like they do for normal RHEL
21:43:00 *RHV install ISO that is, not a bootable runnable ISO virt media image
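
[Note on the failing qcow2 above: a first-pass diagnostic sketch along the lines jmlowe suggests. The filename is hypothetical; the aim is to check whether Ironic is being handed a partition image as a whole-disk image (or vice versa), and whether a valid partition table survives conversion to raw.]

    # Confirm the export really is qcow2, and note its virtual size and any backing file.
    qemu-img info exported.qcow2
    # Flatten to raw, collapsing any backing chain left over from the RHV export.
    qemu-img convert -O raw exported.qcow2 exported.raw
    # A whole-disk image should show a disk label and partitions here;
    # if parted sees nothing, Ironic's write-image step won't either.
    parted exported.raw unit s print
    # Second opinion on the label type (gpt vs msdos).
    fdisk -l exported.raw
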
21:43:25 trandles: interesting... SuperCloud-style Ironic-deployed hypervisors are something we've been spending a bit of time with recently too.
21:43:45 what flavor hypervisors?
21:44:36 Well, it's standard CentOS images with KVM, deployed using Kolla-Ansible, to operate as Nova compute resources.
21:44:52 ah
21:45:15 In this model, we have this kind of mezzanine between undercloud and overcloud, where Nova's hypervisors live.
21:45:38 It all gets very Escher
21:45:56 seems scary honestly at this point :)
21:46:51 I have only been involved at arm's length but from what I've heard it's simpler than other ways of partitioning virt and bare metal in a dynamic fashion.
21:47:45 I'll be asking questions about that eventually. I already have customers who want us to implement something like SuperCloud
21:50:23 Where's janders when you need him :-)
21:50:39 sleeping...I bug him in #openstack-ironic all the time ;)
21:52:16 Anyone have any info about upcoming summits? The openstack.org/summit page has nothing for the future
21:52:18 hmm... my ears are burning
21:52:31 \o/
21:52:44 Hey janders, g'day!
21:52:49 g'day !
21:53:29 We were reminiscing on SuperCloud - actually on the nova-on-ironic stuff you were doing.
21:53:40 * janders is reading through history
21:54:53 very cool! :)
21:56:00 This machine is built that way: https://www.top500.org/system/179908/
21:56:23 excellent
21:56:39 what is the lay of the land around this these days? any interesting catches?
21:56:47 It's early days but the virt component has been up to ~275 nodes and down to 6
21:57:00 that is a great result!
21:57:15 nice
21:57:23 The main catch I think has been the bit you left as an exercise to the reader - ethernet
21:57:23 I remember having to do a bit of hacking with cleaning up repurposed hypervisors, otherwise there were some side effects
21:58:09 but it was relatively easy (some cli commands in the hypervisor un-provisioning playbook)
21:58:17 hmm... what's the challenge with ethernet?
21:58:38 The hypervisor's richer networking config is applied via Ansible. All the trunks, VLANs, LAGs, etc.
21:58:49 You had it so easy with IB :-)
21:58:53 haha!
21:59:04 so - no NEO-like SDN in the Ethernet approach?
21:59:28 Alas nothing like that.
21:59:49 given we're running out of time, I propose "SuperClouds in 2021" as a topic for discussion in two weeks' time
22:00:00 (I have a meeting clash with the other SIG time slot)
22:00:11 would that work?
22:00:12 Sounds good to me.
22:00:28 excellent! I'm curious what you guys are up to in this area
22:00:30 Tell those other people you have a priority janders :-)
22:00:52 OK, better wrap up. Any final comments?
22:01:24 #endmeeting
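
[Post-meeting note on "the catch has been ethernet": a sketch of the kind of host-side config the Ansible roles would apply to an Ironic-deployed hypervisor, in place of an IB fabric's SDN. Interface names, VLAN ID, and bond mode are hypothetical.]

    # LACP bond across two NICs (members must be down before enslaving).
    ip link add bond0 type bond mode 802.3ad
    ip link set eno1 down && ip link set eno1 master bond0
    ip link set eno2 down && ip link set eno2 master bond0
    # Tagged VLAN on the bond, e.g. for tenant traffic trunked from the switch.
    ip link add link bond0 name bond0.120 type vlan id 120
    ip link set bond0 up && ip link set bond0.120 up
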