Friday, 2025-01-31

[02:24] <opendevreview> Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/allocations https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937213
[02:24] <opendevreview> Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937214
[06:02] <opendevreview> Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937214
[06:30] <opendevreview> Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937214
[06:59] <rpittau> good morning ironic! happy friday! o/
[09:35] <janders> happy friday rpittau o/
[09:36] <rpittau> tnx janders :)
[10:20] <opendevreview> Jacob Anders proposed openstack/bifrost master: Update `uuid` to `id` in node_info to match module change https://review.opendev.org/c/openstack/bifrost/+/940505
[10:26] <opendevreview> Jacob Anders proposed openstack/bifrost master: Fix typo in CLI parameter spefifying config drive. https://review.opendev.org/c/openstack/bifrost/+/940507
[10:27] <janders> ^^ some quick and easy reviews w/r/t my bifrost deploy playbook fault-finding from yesterday, if anyone has time
[10:28] <janders> the latter is literally a single-character change
[10:28] <janders> (not like the former is much bigger, that would be two chars... :) )
[10:28] <janders> now back to what I was actually working on when I ran into these and started debugging
[11:34] <janders> I had a look at that CI issue and reproduced it locally; it seems to be complaining about a missing ansible module: https://paste.opendev.org/show/buTiKaYnSqhLBx1Bny0B/
[11:34] <janders> why this is needed for a single-char doc change, I do not know
[12:22] <opendevreview> cid proposed openstack/ironic master: Log secure boot access failures at INFO level https://review.opendev.org/c/openstack/ironic/+/940433
[12:25] <opendevreview> Merged openstack/ironic master: doc/source/admin fixes part-3 https://review.opendev.org/c/openstack/ironic/+/939003
[13:19] <frickler> janders: your local error looks different, you need to clone the collection repo beside the bifrost repo
[13:20] <frickler> but I also have no idea what this is trying to tell us: ANSIBLE_COLLECTIONS_PATHS was detected, replace it with ANSIBLE_COLLECTIONS_PATH to continue.
[13:20] <frickler> oh, wait, it is PATHS vs. PATH
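The PATHS vs. PATH fix can be sketched as a small Python helper that renames the plural variable before a playbook is launched. This is a hypothetical helper (`normalize_collections_env` is not part of bifrost or Ansible); it assumes only the two variable names quoted in the error message.

```python
from typing import Dict, Mapping

# Variable names taken verbatim from the error message quoted above.
DEPRECATED_VAR = "ANSIBLE_COLLECTIONS_PATHS"
CURRENT_VAR = "ANSIBLE_COLLECTIONS_PATH"


def normalize_collections_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with ANSIBLE_COLLECTIONS_PATHS renamed to ANSIBLE_COLLECTIONS_PATH.

    Hypothetical helper: if both variables are already set, the singular one
    is kept and the plural one is simply dropped.
    """
    fixed = dict(env)
    if DEPRECATED_VAR in fixed:
        value = fixed.pop(DEPRECATED_VAR)
        fixed.setdefault(CURRENT_VAR, value)
    return fixed
```

A caller would pass the result as the `env=` argument of `subprocess.run` when starting `ansible-playbook`, leaving the rest of the environment untouched.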
[13:38] <opendevreview> Kaifeng Wang proposed openstack/ironic master: Support querying node history with sort_key and sort_dir https://review.opendev.org/c/openstack/ironic/+/940522
[14:17] <opendevreview> Riccardo Pittau proposed openstack/bifrost master: Fix ansible linters https://review.opendev.org/c/openstack/bifrost/+/940527
[14:17] <rpittau> janders: ^
[14:56] <TheJulia> Hey, did we ever hear from keekz whether my patch fixed up their deployment?
[14:56] <TheJulia> well, cleaning, that is
[15:05] <TheJulia> Hey folks, any chance I can get another set of eyes on https://review.opendev.org/c/openstack/ironic/+/940072 ?
[15:09] <jizaymes> Hello, and good day. I've been struggling with getting Ironic to provision a machine. I'll ask a few questions, but if there are any consultants for hire on this topic, I'd be interested too. Story: an existing OpenStack Dalmatian environment that I'm now trying to add bare metal to. I have a node added and inspected with Redfish. When using the 'openstack server create' command, it starts building and the machine gets powered on, but it never DHCPs. Neutron can't bind a VIF, the dhcp hostsdir file is always set to ,ignore, and it never DHCPs as a result. I'm trying with what I hope is a pretty remedial setup, with a flat cleaning/provisioning network set up, and then an Internet / external network for the server's 'production' traffic (more questions to come on ironic networking once I get the basics down). Virtual machines can bind to the external network but the bare metal VIFs fail. Suggestions on how to diagnose further?
[15:13] <TheJulia> Greetings jizaymes, is Neutron set up to use OVN or dhcp-agent?
[15:17] <jizaymes> I believe neutron is using OVN. Forgive my noobness. Using kolla-ansible, with configs such as enable_ironic_neutron_agent: "yes" and neutron_enable_ovn_agent: "yes". I see baremetal,ovn in my mechanism_drivers
[15:17] <TheJulia> ... okay
[15:17] <TheJulia> so... hmm
[15:17] <TheJulia> IPv4 or IPv6?
[15:19] <TheJulia> You configure which one ironic defaults to in ironic.conf, and that uses the settings provided in the cleaning/provisioning networks to boot the machine
[15:20] <TheJulia> The root of the question is attempting to figure out where things might be going sideways
[15:20] <TheJulia> (and no, not available as a consultant, sorry!)
[15:20] <jizaymes> IPv4 is all that's in use here, so it should default to that.
[15:20] <jizaymes> Unassuming error message:
[15:20] <jizaymes> 2025-01-31 09:20:28.715 793 ERROR neutron.plugins.ml2.managers [req-1f22b0dc-3e25-487e-86f5-65af2bf23781 req-46b50e33-97bb-4c3f-9a6c-fcd3ace1ab36 ccbc3570d6f54abb8bcecf7c32b0dee0 0bc4baf75fc74a199b027c37d2cf0c7b - - default default] Failed to bind port 1d0da9b6-840e-4e42-8cbe-c3cfff2d0f7c on host e516893e-e9fe-4cbe-a5a8-db2af9ca019c for vnic_type baremetal using segments [{'id': '5222c1eb-47e7-410f-ac52-2f543ef16095', 'network_type': 'flat', 'physical_network': 'physnet-external', 'segmentation_id': None, 'network_id': '4df54151-7aea-4321-84e4-0fb99d2a24f5'}]
[15:20] <TheJulia> (But, happy to help)
[15:21] <TheJulia> so, the agent might not be running, but that *shouldn't* be fatal
[15:21] <TheJulia> on the physical machine, can you check the boot settings to make sure it is set to IPv4 PXE boot for network booting? That is, unless you're doing HTTPBoot explicitly
[15:22] <TheJulia> Some servers are only going to issue requests for one type and they might disregard responses which don't match, so it looks like it doesn't get an IP at all.
[15:23] <jizaymes> I have both neutron_dhcp_agent and neutron_ovn_agent containers, along with ironic_neutron_agent, on all 3 of my controller systems -- are the dhcp and ovn agents conflicting?
[15:23] <TheJulia> ... There is a specific flag for baremetal in ovn
[15:23] <TheJulia> uhhh
[15:23] <TheJulia> one moment
[15:23] <jizaymes> I see the thing trying to PXE boot, and tcpdump sees it coming in. DHCP logs just say that it sees the request but ignores it
[15:24] <TheJulia> lovely
[15:24] <TheJulia> is that the logs from the neutron_dhcp_agent?
[15:24] <TheJulia> or another dnsmasq's logs?
[15:25] <jizaymes> the log I pasted earlier is from neutron-server.log
[15:26] <jizaymes> ok, I'm seeing some other errors in neutron-dhcp-agent.log that I will look into now.
[15:26] <jizaymes> 2025-01-29 17:05:46.074 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d542f66-4778-436a-8bc4-6ecedb0b8c37] AMQP server on 10.1.0.83:5672 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.: amqp.exceptions.ConnectionForced: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
[15:27] <jizaymes> oh nvm, that was 2 days ago. ignoring
[15:30] <TheJulia> So the way it works is basically a request comes into neutron
[15:30] <TheJulia> That message gets put on the message bus and is executed upon by plugins in parallel
[15:30] <TheJulia> so you get DHCP configuration being disjointed from physical port binding, or the actual perception of attachment
[15:30] <TheJulia> because you have a flat network, port binding doesn't matter
[15:31] <TheJulia> For the most part, it is informational
[15:31] <TheJulia> as a result of the use of the message bus, the dhcp stuff gets logged in different log files than neutron-server.log
[15:32] <TheJulia> The networking-baremetal ml2 plugin is supposed to basically remedy the situation one perceives via port binding ("oh, no binding failure is okay") so people don't freak out.
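The fan-out TheJulia describes can be pictured with a toy Python model. This is not Neutron code: the real delivery goes over the RPC message bus, and the handler names here are made up. It only shows why a binding failure in neutron-server.log can coexist with a DHCP side that reports success in a different log.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def bind_port(port_id: str) -> str:
    # Toy stand-in for an ML2 mechanism driver that cannot bind a baremetal port.
    raise RuntimeError(f"Failed to bind port {port_id}")


def configure_dhcp(port_id: str) -> str:
    # Toy stand-in for the DHCP side finishing its own work for the same port.
    return f"DHCP configuration for port {port_id} is completed"


def publish(port_id: str, handlers: List[Callable[[str], str]]) -> Dict[str, str]:
    """Deliver one port event to every handler in parallel.

    A failure in one handler is recorded but does not stop the others,
    which is the "disjointed" behaviour described above.
    """
    outcomes: Dict[str, str] = {}
    with ThreadPoolExecutor() as pool:
        futures = {handler.__name__: pool.submit(handler, port_id) for handler in handlers}
        for name, future in futures.items():
            try:
                outcomes[name] = future.result()
            except Exception as exc:  # record the failure and keep going
                outcomes[name] = f"error: {exc}"
    return outcomes
```

Calling `publish("3b6f51c5", [bind_port, configure_dhcp])` yields an error entry for the binding handler next to a "completed" entry for the DHCP handler.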
[15:34] <TheJulia> in your neutron config, see if https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ovn.disable_ovn_dhcp_for_baremetal_ports is set to True or False
[15:34] <TheJulia> if False, I'd expect something in the neutron-dhcp-agent logs
[15:35] <TheJulia> any *other* dnsmasq log files are most likely for introspection activities, not deploy-time activities
[15:36] <jizaymes> 'tenant_network_types = geneve' --- do I need something different here to accommodate bare metal / flat? I do not have the disable_ovn_dhcp_for_baremetal_ports setting, so it should be the default.
[15:38] <TheJulia> I suspect you need to add flat to that list
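The settings discussed here live in ml2_conf.ini. A small Python sketch using the standard-library `configparser` can report them; the section and option names follow the Neutron configuration reference linked above, while `ml2_summary` itself is a hypothetical helper. When `disable_ovn_dhcp_for_baremetal_ports` is unset, the sketch falls back to False, which is the documented default (worth verifying against the reference for your release).

```python
import configparser
from typing import Dict, List, Union


def ml2_summary(ini_text: str) -> Dict[str, Union[List[str], bool]]:
    """Report the ML2/OVN options relevant to baremetal DHCP from ml2_conf.ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    def as_list(section: str, option: str) -> List[str]:
        # Comma-separated list options; a missing section or option gives [].
        raw = parser.get(section, option, fallback="")
        return [item.strip() for item in raw.split(",") if item.strip()]

    return {
        "mechanism_drivers": as_list("ml2", "mechanism_drivers"),
        "tenant_network_types": as_list("ml2", "tenant_network_types"),
        # Assumed default of False when the option is absent, per the reference.
        "disable_ovn_dhcp_for_baremetal_ports": parser.getboolean(
            "ovn", "disable_ovn_dhcp_for_baremetal_ports", fallback=False
        ),
    }
```

Feeding it the file from a controller shows at a glance whether `baremetal` and `ovn` are both listed and whether OVN's built-in DHCP has been turned off for baremetal ports.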
[15:42] <jizaymes> ok, adding that setting, then will retry & keep an eye on the dhcp-agent log
[15:43] <opendevreview> Verification of a change to openstack/ironic-python-agent stable/2024.2 failed: Fix errors in the function erase_devices_express https://review.opendev.org/c/openstack/ironic-python-agent/+/939938
[15:43] <TheJulia> Most neutron services don't accept HUPs on configuration changes (why, I have no idea, it makes me want to table flip), so you'll need to restart neutron
[15:47] <jizaymes> is ironic-conductor involved at the time of provisioning? I do see some redfish messages in the conductor log but I'm not sure if they are related
[15:47] <TheJulia> ironic-conductor drives the entire workflow
[15:47] <TheJulia> expect it to check power state, set settings remotely
[15:48] <TheJulia> if you were configured for virtual media, it would master an ISO image and attach it
[15:49] <jizaymes> ok then, recentering there. TypeError: float() argument must be a string or a real number, not 'NoneType'. AI says "it appears some temperature sensors (e.g., sensor #8 and #17) are reporting as 'Absent' with None readings"
[15:52] <rpittau> bye everyone, have a great weekend! o/
[15:53] <jizaymes> this conductor message comes every 30 seconds, so I (perhaps poorly) assumed it was just some continual polling it does for stats, not necessarily a blocker for the provisioning.
[15:53] <TheJulia> jizaymes: wow, sensor data collection is on. If you have a backtrace for that, it should be easy to fix, but it should be unrelated if it is from the sensor data collection thread
[16:08] <TheJulia> or AI is bogus
[16:08] <TheJulia> the only place where we would get a float is from the sensor data, though
[16:18] <jizaymes> yeah, most likely. Here is the payload and traceback if that serves any purpose with that error: https://gist.github.com/jizaymes/23fc1aac31b189c8ea955e4cd68f5c31 Continuing to poke around with ironic now that I added the flat tenant network, to see if there is any improvement there (so far no)
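If the AI's reading of the traceback is right, the crash is a plain `float(None)`. A guard of the following shape avoids it. This Python sketch is not Ironic's sensor-collection code; it assumes Redfish Thermal-style temperature entries with `Name`, `ReadingCelsius`, and `Status.State` fields, and the actual payload in the linked gist was not checked against it.

```python
from typing import Any, Dict, List, Optional


def reading_or_none(sensor: Dict[str, Any]) -> Optional[float]:
    """Return a temperature reading as float, or None when the sensor is absent or empty."""
    state = (sensor.get("Status") or {}).get("State")
    if state == "Absent":
        return None
    value = sensor.get("ReadingCelsius")
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        # Non-numeric readings are treated like missing ones.
        return None


def collect_temperatures(sensors: List[Dict[str, Any]]) -> Dict[str, float]:
    """Keep only sensors that produced a usable reading, keyed by sensor name."""
    readings: Dict[str, float] = {}
    for sensor in sensors:
        value = reading_or_none(sensor)
        if value is not None:
            readings[sensor.get("Name", "unknown")] = value
    return readings
```

Skipping the absent sensors rather than failing the whole poll matches TheJulia's point that this thread should not affect provisioning either way.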
[16:19] <TheJulia> I'd be looking very closely at aspects like the MAC address, the neutron-dhcp-service logs, and the ovn-agent logs
[16:19] <TheJulia> or even the ovn logs
[16:19] <TheJulia> anyway, I need to go. I have to take my wife to the doctor
[16:20] <jizaymes> thanks for your help
[16:20] <TheJulia> I'll take a look at your sensor data thing when I get back, don't know how many hours that will be
[16:21] <jizaymes> I am not very concerned with the sensors since you suggested it's likely not blocking provisioning, but overall I'm just trying to give good feedback if it's a problem that needs fixing.
[16:22] <TheJulia> Yeah, it shouldn't be blocking at all
[16:26] <cardoe> TheJulia: keekz is having issues with openstack-helm's image building to test the patch. We really need to do our own thing. I'll make sure we report back once it's tested.
[16:30] <cardoe> So I suspect you don't have an ml2 plugin for handling the baremetal binding. You probably need ovn for ml2.
[16:33] <jizaymes> 2025-01-31 10:33:17.977 800 INFO neutron.plugins.ml2.plugin [None req-6aa40f93-7551-4c8f-bda5-453069581fc4 - - - - - -] Attempt 10 to provision port 3b6f51c5-92c5-4484-8dfc-b1a4c64922a8
[16:33] <jizaymes> they all just hit max attempts
[16:42] <jizaymes> meanwhile, neutron-dhcp-agent.log says it's completed, but the dhcp hostsdir file still says ,ignore: DHCP configuration for ports {'3b6f51c5-92c5-4484-8dfc-b1a4c64922a8'} is completed
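dnsmasq reads each file in a `--dhcp-hostsdir` directory using the same syntax as the text to the right of `--dhcp-host=`, and the keyword `ignore` tells it never to offer a lease to that host. A hypothetical Python helper can show which MACs a hostsdir file is currently refusing; it only handles MAC-keyed entries and does not say which service wrote the file.

```python
import re
from typing import Dict

MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def classify_hosts(file_text: str) -> Dict[str, str]:
    """Map each MAC in a dhcp-hostsdir file to 'ignored' or 'allowed'."""
    result: Dict[str, str] = {}
    for raw in file_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' lines defensively.
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split(",")]
        mac = fields[0].lower()
        if not MAC_RE.fullmatch(mac):
            continue  # entries keyed by client-id, tags, etc. are out of scope here
        result[mac] = "ignored" if "ignore" in fields[1:] else "allowed"
    return result
```

Running it over the files in the hostsdir and comparing the result with the node's PXE MAC shows directly whether the `,ignore` entry belongs to the port being deployed.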
[16:50] <cardoe> do you have ovn enabled as an ml2 plugin?
[16:52] <jizaymes> yes, I believe I do. In my ml2_conf.ini I do have mechanism_drivers = baremetal,ovn
[16:55] <cardoe> do you have networking-baremetal installed?
[16:56] <cardoe> Something's gotta handle the port binding for the baremetal port. In the VLAN case, for example, networking-generic-switch would talk to your switch and put the server on that VLAN.
[16:56] <jizaymes> yes, on my 3 controllers, networking-baremetal 6.4.0
[16:56] <cardoe> I've never used flat so I dunno what's supposed to handle that.
[16:56] <cardoe> Probably peeking at the metal3 config would make that more clear.
[16:58] <jizaymes> ok, currently I'm just using kolla-ansible with ironic, not metal3 specifically. I will look there to try to gain some insight, though I wasn't intending to introduce kubernetes. My hope was to implement this incrementally using dumb flat networking before introducing other complexities like configuring the switches, but I am not sure if that approach is now working to my detriment
[16:59] <cardoe> Well, I just mention metal3 because they use ironic and I believe they configure it for flat networking.
[16:59] <jizaymes> oh ok, that makes sense, will look at that.
[17:00] <cardoe> I've always had to deal with driving switches, so it's just not a familiar model to me.
[17:00] <jizaymes> in the end, and I don't know what is even available with ironic/neutron/ovn, the ultimate hope would be to have bare metal work exactly like virtual machines, in that they can use external networks or virtual internal networks. This is in a cloud service provider environment where the clients are not trusted, so I assumed the OVN stuff was out of the picture for security reasons.
[17:02] <cardoe> It should work. However, be aware that something needs to put those bare metal ports on the overlay network of VMs if you're looking for them to be able to interact together.
[17:04] <cardoe> So think of each hypervisor as really being a rack with a switch in it. You mentioned geneve, so each of your tenant networks travels over the physical network encapsulated with geneve to the other hypervisors. It hits the virtual switch inside the hypervisor, which then decides which VM port to go to based on the encapsulation.
[17:04] <TheJulia> cardoe: ok
[17:08] <jizaymes> Perhaps it's not safe to assume I added the bare metal node and its ports correctly. Overall I have a cleaning network set in my ironic.conf, but I also added these two ports.
[17:08] <jizaymes> openstack baremetal port create ab:42:1a:1c:0f:5e --node uuid --physical-network physnet-cleaning --pxe-enabled true
[17:08] <jizaymes> openstack baremetal port create ab:96:91:c1:47:40 --node uuid --physical-network physnet-external --pxe-enabled false
[17:09] <jizaymes> and then with the openstack server create command, I specify only the neutron network name (Internet), assuming I don't need to specify cleaning since that's not customer facing
[17:15] <TheJulia> Greetings from the ER! Do the MAC addresses match the MAC addresses on the physical machine? Typically we see sequential ones as well….
[17:16] <TheJulia> So physical networks are unrelated to cleaning; that likely could have been empty
[17:16] <TheJulia> That might actually be the source of some of it
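TheJulia's MAC question can be checked mechanically. In this Python sketch, `registered` would hold the addresses given to `openstack baremetal port create` and `reported` would hold what the BMC exposes (for example the `MACAddress` values of the Redfish EthernetInterface resources); how those lists are gathered is left out, and both function names are hypothetical.

```python
import re
from typing import Iterable, List

SEPARATORS = re.compile(r"[:.\-]")


def normalize_mac(mac: str) -> str:
    """Return a MAC as lowercase colon-separated octets, e.g. ab:42:1a:1c:0f:5e."""
    digits = SEPARATORS.sub("", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def unmatched_ports(registered: Iterable[str], reported: Iterable[str]) -> List[str]:
    """Registered Ironic port MACs that the hardware does not report."""
    seen = {normalize_mac(mac) for mac in reported}
    return [mac for mac in registered if normalize_mac(mac) not in seen]
```

Normalizing first matters because BMCs, switches, and the Ironic CLI do not always agree on case or separators, so a naive string comparison can report a mismatch that is not real.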
[17:18] <cardoe> ER does not sound like fun. :(
[17:19] <cardoe> jizaymes: does your neutron network for cleaning have provider:physical_network set to "physnet-cleaning"?
[17:19] <TheJulia> Since physnet mappings help relate and drive that. Physnets are needed and used when your ironic node network interface is set to neutron and not flat. I guess we need to check the ironic node network_interface driver setting, because that also helps govern how it aligns
[17:21] <TheJulia> I think the kolla stuff is all modeled only on flat networking models. Clearly we likely need to write some new theory docs on how this all plugs together, or clean them up
[17:21] <TheJulia> Been too long since I've looked at those docs
[17:21] <TheJulia> cardoe: no… it is not.
[17:22] <cardoe> yeah, like another "reference use case"
[18:04] <JayF> I'm presuming https://docs.openstack.org/ironic/latest/admin/networking.html#vif-attachment-flow is accurate and we still have no way to ask nova for "N" networks and somehow map them to ports/portgroups properly?
[18:13] <jizaymes> Re: Do the MAC addresses match. The one with PXE enabled is the MAC that is displayed on the PXE booting screen. There are several NICs in the system, but only 2 of them are enabled for this initial test. One is 1g for PXE, intended to be private, and the other is a 100g for production, ergo different MAC address vendors. I will note that the redfish output does display other MAC addresses for these ports. I have tried interchanging them in the ports at various points in my testing (though not today in this chat). It correlates to the 1g port that's up as displayed in the BMC management web interface. Are you saying I should omit the --physical-network argument?
[18:31] <cardoe> Does your neutron network have the same value?
[18:31] <jizaymes> saw this in the ironic-conductor log.
[18:31] <jizaymes> WARNING sushy.resources.system.system [None req-a18... ccbc... 0bc... - - default default] Could not figure out the allowed values for the reset system action for System Self
[18:31] <jizaymes> ERROR ironic_prometheus_exporter.messaging [None req-a18ccf9c-7e18-465a-b935-961c2e470afa ccbc3570d6f54abb8bcecf7c32b0dee0 0bc4baf75fc74a199b027c37d2cf0c7b - - default default] 'node_uuid'
[18:31] <jizaymes> <traceback>
[18:31] <jizaymes> ERROR oslo_messaging.notify.notifier KeyError: 'node_uuid'
[18:32] <cardoe> “In the case of the flat network_interface`, the requested VIF(s) utilized for all binding configurations in all states.” From the docs. I assume "all states" here means provisioning, cleaning, deployed, and service. I might make a patch to expand that.
[18:39] <jizaymes> Problem: 'node_uuid' when attempting to send to the notification system. Looking at the payload dictionary, there is uuid and instance_uuid, but not node_uuid.
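The KeyError is the difference between indexing `payload["node_uuid"]` and a tolerant lookup. This Python sketch is not the ironic-prometheus-exporter code, and whether falling back to `uuid` is the right fix is a question for the bug report; it only shows a lookup that would not raise on the payload described here (`uuid` and `instance_uuid` present, `node_uuid` absent). `node_id_from_payload` is a hypothetical name.

```python
import logging
from typing import Any, Mapping, Optional

LOG = logging.getLogger(__name__)


def node_id_from_payload(payload: Mapping[str, Any]) -> Optional[str]:
    """Return the node identifier from a notification payload, tolerating either key."""
    node_id = payload.get("node_uuid") or payload.get("uuid")
    if node_id is None:
        # instance_uuid is deliberately not used as a fallback: it identifies
        # the Nova instance, not the bare metal node.
        LOG.warning("payload has no node identifier; keys: %s", sorted(payload))
    return node_id
```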
[18:43] <JayF> jizaymes: can you file that as a bug against ironic-prometheus-exporter?
[18:46] * JayF wonders if there's a way to use capabilities/traits to dictate VIF mappings
[18:47] <JayF> using flavors to work around the inability to set network<>port mappings via nova
[18:47] <JayF> that's likely not clean enough to upstream, but we may be able to make it work; I have to take a deeper look at what happens in neutron
[18:51] <jizaymes> I am not at all married to this design and am open to alternatives; my problem was finding out how to implement them
[18:51] <jizaymes> and yes, I am working to file a bug.
[18:52] <JayF> jizaymes: oh, I'm talking about a different problem in the same ballpark; except I'm pretty sure my story is going to end with "yeah that's a major feature request for nova, neutron, and ironic all at the same time" 😅
[18:52] <JayF> I'm not much of an OVN expert so I've just been reading along with your issues
[18:52] <JayF> if your IPA breaks I'm your guy though :P
[18:58] <cardoe> I’m curious to explore the capabilities / traits thing you mentioned.
[19:05] <jizaymes> I've never filed a bug for openstack so I hope I did it right. Should've probably read some sort of guide first
[19:06] <JayF> cardoe: so basically I'm exploring if it's possible to have "N" portgroups mapping to "N" different networks, with the mapping being controllable by the nova booter
[19:06] <JayF> cardoe: our docs explicitly say this is impossible, and the more I trace the data model in nova, I'm not sure we even get all the information we'd need at the ironic virt driver layer
[19:16] <jizaymes> Let me just confirm some other assumptions. I have 3 openstack controllers running 2024.2 / Ubuntu 22.04. They have 2x100Gbps in bond0, then bond0.70 is added to br-cleaning, with an IP applied on that bridge on all 3 hosts. In my kolla-ansible globals.yaml I have br-cleaning as my ironic_dnsmasq_interface, and there is a 'cleaning-net' that was added with 'openstack network create' that is listed in the ironic_cleaning_network. That whole thing seems confusing, as it sort of exists in two places: on the Linux host, but then the subnet was also added in the openstack network with DHCP enabled. I have a physnet-cleaning (and -external), both set with their respective bridge names (br-cleaning and br-ex). As I'm writing this, I'm thinking I shouldn't have any IPs set at the Linux system level and that is probably conflicting.
[19:20] <JayF> > As I'm writing this, I'm thinking I shouldnt have any IPs set on the linux system level and that is probably conflicting.
[19:20] <JayF> what does that mean exactly?
[19:21] <JayF> I'll also note: openstack's kolla channel might be helpful too; we know a lot about ironic-the-service but your question crosses a lot of service borders
[19:22] <jizaymes> well, for example, for my external interfaces I have no IP addresses configured in my Linux netplan configs, and I don't explicitly make the br-ex bridge. OVN does that, I guess. Conversely, for my management network, I do make br-mgmt and have an IP on all of the hosts. I'm assuming OVN manages the external one (and theorizing that it also manages the cleaning network in the same manner)
[19:22] <jizaymes> thanks. I am there too but didn't speak yet.
[19:23] <JayF> Yeah, based on your example you're likely ahead of my knowledge in this space generally :)
[19:23] <JayF> Also openstack-discuss@lists.openstack.org might be a good idea given it crosses so many borders
[19:37] <cardoe> I suspect the issue is that it’s not really flat networking. I see the bond0.70, which to me says this is tagging traffic with VLAN 70 on this interface.
[19:37] <cardoe> But those baremetal ports aren’t gonna have anything tagging that for them. It’s gotta be the switch.
[19:38] <cardoe> jizaymes: your network is likely some overlay network on your physical side, like I was mentioning before about the virtual switch.
[19:38] <JayF> oh, wait, that's being tried all with flat networking!?
[19:39] <cardoe> Yeah. I was trying to explain that if he’s blending networks that work with hypervisors, then there’s an overlay going on and it's not actually flat networking
[19:39] <JayF> well, in ironic parlance, flat doesn't mean he can't have a VLAN on top of a bond
[19:39] <JayF> it just means that has to be statically configured
[19:39] <cardoe> Your port group idea sounds +1 to me for making things map with nova, btw. I’ll work with ya on it.
[19:40] <JayF> and he'll likely need to use a switch which will happily fall back to one-headed for PXE booting, OR static networking w/IPA
[19:40] <JayF> cardoe: well, it's not an "idea", I'm trying to gather knowledge still
[19:40] <JayF> cardoe: just trying to figure out the shape of the problem and how much of this can already be done
[19:40] <cardoe> Yeah. Agreed. That’s the question: how are these things physically plugged in?
[19:40] <JayF> as you've picked up on, I'm sure, I know a lot about the IPA side, a lot about disk imaging and working with servers, and even the nova<>ironic driver. Neutron ... not so much
[19:41] <cardoe> I’m learning the neutron side. A lot lately.
[19:49] <opendevreview> Merged openstack/ironic master: CI: Remove IPv6 testing https://review.opendev.org/c/openstack/ironic/+/940072
[19:51] <jizaymes> the kolla-ansible instructions don't really mention much about networking, so I just mimicked the external internet network I have, which is just a flat network with the gateway elsewhere on my routers. From OVN's perspective it's flat, in that bond0.70 is presented to it with OVN having no need to insert a VLAN tag (the OS handles that). This is the same for my existing bond0.80, which is the external network in question; my VM instances have no problem attaching to it, and it is also marked as flat.
[19:51] <jizaymes> Up until a few minutes ago, I had br-cleaning configured in my netplan config and an IP set on my 3 controllers; I would see the DHCP requests come in from my bare metal server, but they would be ignored. I just reconfigured to remove br-cleaning and the IP on each of the 3 controllers in hopes OVN would handle that. The issue does not appear to be physical networking related; it's more that whatever mechanism is supposed to update the dhcp hostsdir file so it stops ignoring my server is not doing so. I had assumed that was a result of the VIF binding problems mentioned earlier.
[20:17] <cardoe> jizaymes: so what neutron network are your VMs attaching to?
[20:18] <jizaymes> my external network "Internet", which maps to physnet-external, which is bond0.80
[20:18] <opendevreview> Doug Goldstein proposed openstack/sushy master: migrate sushy_oem_idrac to sushy https://review.opendev.org/c/openstack/sushy/+/940557
[20:20] <cardoe> so can you do openstack network show <external-name>
[20:21] <cardoe> Actually nvm.
[20:21] <cardoe> You answered it
[20:21] <cardoe> The OS is handling the VLAN tag.
[20:21] <jizaymes> yes, a bit large to paste, but yes, it's successful: state is UP, network-type is flat, physical_network is physnet-external, and status is ACTIVE
[20:22] <cardoe> What's handling the VLAN tag for the physical ports for the ironic nodes?
[20:23] <cardoe> You said br-cleaning is on bond0.70. As you stated, your OS is handling the VLAN tag, aka you have a virtual switch in your OS.
[20:23] <jizaymes> there should be nothing setting any VLANs. I have it hard set as an access port in the external VLAN 80. Eventually I would like to wrap in ml2 to reconfigure my switch, but I do not have anything configured for that presently within openstack or ml2
[20:23] <cardoe> okay, so you set the physical ports to access VLAN 80, but cleaning (and PXE by extension) is on 70.
[20:24] <jizaymes> and the internal cleaning port is also hard set to VLAN 70.
[20:24] <jizaymes> (on its respective port)
[20:25] <jizaymes> the goal was to use 1g NIC 1 for all PXE / cleaning / deployment activities, and 100g NIC 2 as just (initially) a dumb member of that separate Internet network
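The layout described above (br-cleaning on bond0.70, the Internet network on bond0.80, switch ports hard-set to access VLANs 70 and 80) can be written down and checked. In this Python sketch the NIC names are hypothetical labels, the mapping goes straight from each physnet to the VLAN sub-interface under its bridge, and the check is simply that every node NIC's access VLAN equals the VLAN the controllers tag for that NIC's physnet.

```python
from typing import Dict, List, Optional, Tuple


def parse_vlan_iface(name: str) -> Tuple[str, Optional[int]]:
    """Split a Linux VLAN sub-interface like 'bond0.70' into ('bond0', 70)."""
    parent, sep, vid = name.rpartition(".")
    if sep and vid.isdigit():
        return parent, int(vid)
    return name, None  # untagged interface


def vlan_mismatches(
    physnet_iface: Dict[str, str],  # physnet -> controller VLAN sub-interface
    nic_physnet: Dict[str, str],    # node NIC -> physnet of its Ironic port
    access_vlan: Dict[str, int],    # node NIC -> access VLAN on the switch
) -> List[str]:
    """List node NICs whose switch access VLAN differs from their physnet's VLAN."""
    problems: List[str] = []
    for nic, physnet in nic_physnet.items():
        _, expected = parse_vlan_iface(physnet_iface[physnet])
        actual = access_vlan.get(nic)
        if expected != actual:
            problems.append(f"{nic}: switch access VLAN {actual}, {physnet} expects {expected}")
    return problems
```

With both access ports matching, the check comes back empty, which points away from VLAN tagging and back toward the binding and DHCP side.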
[20:26] <JayF> What's the value in having those activities split?
[20:26] <JayF> It doesn't mitigate the risk of running the flat network interface, since your provisioned nodes still have an interface in it
[20:26] <JayF> (it == cleaning/provisioning/rescue/servicing network)
[20:27] <jizaymes> 1) not having as much ml2 switch reconfiguration stuff 2) pseudo-security, in that I can turn off that port completely after it's provisioned to limit access inward.
[20:27] <JayF> okay, you're thinking more about the agent->server threat model
[20:27] <jizaymes> it's likely I just don't understand how this is "supposed" to work
[20:27] <JayF> we usually worry more about tenants interfering with cleaning or deployment
[20:27] <JayF> well, I say we, I do at least :D
[20:28] <JayF> your logic makes sense
[20:28] <JayF> just asking about use cases because it's easy to get down an "XY problem" style hole in openstack
[20:28] <jizaymes> well, that was another reason to keep it as a totally separate network, so that customers would have limited ability to muck with it in that volatile state
[20:28] <JayF> well, unless you have some network automation flipping the port off on the switch side, there'd be nothing preventing a tenant from turning up that port from a provisioned node, yeah?
[20:29] <JayF> so do you have sidecar network automation you'd expect to be running
[20:29] <JayF> or am I just in a multitenant mindset and you're not multitenant at all
[20:29] <jizaymes> well, enter ml2. There's not a lot of documentation on that; I was hoping it could be configured to do that afterwards.
[20:29] <JayF> so the normal/expected/happy path is Ironic w/neutron network interface
[20:30] <jizaymes> I'm in a public cloud service provider scenario, so I want as much separation from the backend as possible, and I'm assuming the customers are all untrustworthy a-holes ;)
[20:30] <JayF> and we flip the ports to bond+vlan for tenants, and flip them to a different network for cleaning/provisioning/etc (you can even define different ones for this)
[20:30] <JayF> for a public cloud model, you 100% ^^^ need that
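The happy path JayF describes, with Ironic's neutron network interface moving the same switch port between networks as the node changes state, can be pictured with a toy Python model. The phase names are simplified, the network names are examples from this log, and in a real deployment the switch changes are made by an ML2 driver, not by anything like this function. Ironic can point cleaning and provisioning at separate networks through `[neutron]` options such as `cleaning_network` and `provisioning_network`; here they share one.

```python
from typing import Dict, List, Sequence, Tuple

# Simplified lifecycle of one node: cleaned, deployed, in use, cleaned again.
LIFECYCLE: Sequence[str] = ("cleaning", "provisioning", "active", "cleaning")


def switch_plan(
    networks: Dict[str, str], phases: Sequence[str] = LIFECYCLE
) -> Tuple[List[Tuple[str, str]], int]:
    """Return the network each phase puts the switch port on, plus the number of switch changes."""
    plan = [(phase, networks[phase]) for phase in phases]
    # A switch reconfiguration is needed whenever consecutive phases use different networks.
    changes = sum(1 for (_, prev), (_, cur) in zip(plan, plan[1:]) if prev != cur)
    return plan, changes
```

With cleaning and provisioning on `cleaning-net` and the tenant on `Internet`, the port is moved twice per cycle: onto the tenant network at deploy time and back off it for cleaning. That second move is what keeps a tenant from reaching the provisioning network afterwards.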
[20:30] <cardoe> Ultimately I think there needs to be some more logging to see why you're getting a binding error.
[20:31] <JayF> the other thing to warn you about, jizaymes, as someone who's done multitenant cloud before (I was one of the original engineers for Rackspace OnMetal like 10y ago): be wary of hardware hacks
[20:31] <cardoe> The separate NIC approach is the same convo I've had as a public provider as well.
[20:31] <JayF> in our case, we did lots of coordination with hardware vendors to ensure we were able to keep it secure
[20:31] <JayF> but a root user has a lot of hardware access and can wreak havoc on consumer-level gear
[20:31] <cardoe> But like JayF mentioned, you then have the issue that the tenant can just enable the port. So then you need something to shut off the port. And if you're already doing port manipulation, you might as well just run them both over the same physical ports.
[20:31] <JayF> (in that public cloud case, we explicitly read out and validated that the firmware code was unchanged during each cleaning cycle)
[20:32] <JayF> cardoe: exactly my logic
[20:32] <jizaymes> ok, so ml2 can change VLANs but can't enable/disable ports?
[20:33] <cardoe> You can
[20:33] <jizaymes> I guess I was figuring ml2 would shut down the port on the switchport side so the customer would have no ability to bring it up
[20:33] <jizaymes> I didn't even really get to learning ml2 yet, as I can't even get a basic OS on this thing in a single VLAN hehe
[20:34] <cardoe> Ultimately it comes down to a cost thing.
[20:35] <cardoe> People will end up deciding to collapse the provisioning switch, since it's running at 1g/10g, onto the same backplane that the 40g/100g TOR lives on.
[20:35] <cardoe> Then redundancy convos happen
[20:36] <cardoe> Anyway, turn up the logs to see why the bind is failing.
[20:37] <jizaymes> will do. Thank you both for the help!
[20:38] <cardoe> Happy to speak from an operator perspective. JayF and I have both touched that area a bunch.
[20:39] <opendevreview> Doug Goldstein proposed openstack/ironic master: doc: fix typo and slight wording order for networking https://review.opendev.org/c/openstack/ironic/+/940558
[20:39] <JayF> well, yes-ish cardoe :) Most places I've worked (before $curJob) that ran this kinda setup were so forked that much of my knowledge doesn't apply
[20:39] <JayF> networking and inspector are the two places I know the least about, and I'm having to rapidly get up to speed in both cases
[20:40] <cardoe> Well, I meant generically we can speak to it.
[20:40] <cardoe> I'm trying hard to kill forks.
[20:40] <JayF> you all still have Nicira?
[20:40] <JayF> or nsx or whatever it was called
[20:41] <jizaymes> how about this question, since I have not yet been able to get on the console of an ironic-installed bare-metal server: is it just a normal OS that the customer has full access to, or is it like a VM abstracted from the end user but with more hardware passthrough or something? I always assumed it was the former, which is why we couldn't have deeper OVN networking there (not trusted)
[20:43] <JayF> so when you provision a node with Ironic, we put an OS on it, set up the configdrive so cloud-init/glean/ignition can set up networking and a user, set up the actual switch side of the network, and flip it on
[20:43] <JayF> at that point, when it's on, it's indistinguishable from a random machine installed via an installer
[20:45] <jizaymes> ok, thanks, got it. Maybe it's easier just to sell big single VMs on dedicated hardware using normal nova, keeping all of the built-in network fanciness that exists, and just do hardware passthrough for GPU or storage. Though... I can't get this to work; who's to say I could get that to work :)
[20:48] <JayF> You'll be able to get the network stuff working, I suspect; you're on the right path, just doing bond+vlan makes it extra tough
[20:48] <JayF> but just make sure to think about the security implications
[20:48] <JayF> I will tell you in our testing (10 years ago) we found about a 5-10% performance hit doing "giant VM on dedicated hypervisor" vs "actual bare metal"
[20:49] <JayF> but that was Haswell era in 2014 so who knows
[20:51] <jizaymes> too bad my fancy Intel 100gbps cards can't terminate the neutron / ovn stuff out of band. Isn't that what some of the Mellanox stuff does?
[20:52] <cardoe> "Smart NICs"
[20:53] <cardoe> Though I'm using a VXLAN fabric now and am almost at the point of having OVN join it via BGP EVPN and participate as part of the fabric.
[20:56] <JayF> We have absolutely dreamed of an Ironic network interface that was just like
[20:56] <JayF> "hey smartnic, expose a different set of vlans now"
[20:57] <JayF> that does not exist as of now afaik
[20:57] <JayF> I think the closest we have to magic network operation is something like the OVN VTEP switches, but I've never seen that in practice, just in theory, and the theory is cool
[20:58] <jizaymes> yeah, I saw something like that with my Arista switches but it seemed outdated.
[20:58] <JayF> https://docs.openstack.org/ironic/latest/admin/ovn-networking.html#vtep-switch-support was adapted from some patches by a Juniper dev
[20:58] <JayF> but afaik outside of that dev's patched setup, nobody has used this in production yet
[20:59] <JayF> and for all the cool hardware I have, switches that cost as much as my car are not one of them ;)
[21:03] <jizaymes> so as you said earlier, the common design pattern is just either a single NIC or 2 NICs in a bond, then ml2 changes the VLAN between provisioning stages -- cleaning/deploy/PXE VLAN first, then the production VLAN? And the delta is that right now I'm trying to do the same, but without any VLAN changes, using 2 interfaces instead of 1.
[21:08] <JayF> You are the first person, I think, who has tried to do that (or at least who has mentioned it to us), and I'm not certain it's possible.
[21:08] <JayF> We should document it if you figure out the path :D
[21:09] <jizaymes> man, does this always happen
[21:09] <jizaymes> ugh. it feels remedial to me, yet of course it's not haha
[21:11] <jizaymes> ok, I will pivot to figure out the ml2 part I guess. more dependencies..
[21:13] <jizaymes> do I have to make a script for it to use? or does it magically know how to configure my switch? I dabbled for like 2 minutes trying to get the networking-arista module installed but didn't want to go too far off the ranch without any of this other stuff working
[21:17] <cardoe> So that vtep-switch setup, btw, is what I've kinda got going.
[21:18] <cardoe> Gimme a few more months and it'll flesh out better.
[21:18] <cardoe> First inspection
[21:18] <cardoe> jizaymes: you'll likely want to look at networking-generic-switch
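networking-generic-switch is configured with one `[genericswitch:<name>]` section per switch in the ML2 configuration, and is enabled by adding `genericswitch` to `mechanism_drivers`. The Python sketch below only renders such a section with `configparser`; the switch name, address, and credentials are placeholders, and `netmiko_arista_eos` is the device type the networking-generic-switch docs list for Arista EOS. Check those docs for your release for the full option list (for example `key_file` or `ngs_mac_address`).

```python
import configparser
import io


def genericswitch_section(name: str, device_type: str, ip: str, username: str, password: str) -> str:
    """Render one networking-generic-switch switch section as INI text."""
    # interpolation=None so characters like '%' in passwords are written verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[f"genericswitch:{name}"] = {
        "device_type": device_type,
        "ip": ip,
        "username": username,
        "password": password,
    }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values only; 192.0.2.10 is a documentation address.
    print(genericswitch_section("tor-1", "netmiko_arista_eos", "192.0.2.10", "admin", "changeme"))
```

Generating the section rather than hand-editing it is just a convenience for templating many switches; the driver itself reads the resulting INI text.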
[21:23] <jizaymes> ok wow. this is me coming out of the kolla bubble -- there's no mention of this in their docs or configuration examples that I can see.
[21:23] <jizaymes> is it common for bare metal servers to be directly on the external Internet VLAN like I'm trying, or is it usually somehow brought back into OVN via a VLAN to be routed out from there?
[21:24] <jizaymes> (in the openstack world; I know that in the general scope of the world, yes, it is common)
[21:53] <jizaymes> also throwing this out there for anyone just jumping in -- I'm happy to engage an independent consultant to provide paid guidance for this initiative.
[22:45] <JayF> I have absolutely seen public addresses given out to machines. It's the way that OnMetal works (or at least did when I left)
[22:45] <JayF> jizaymes: most ironic contributors either work for a deployer like Rackspace (cardo e) or G-Research (me), or for Red Hat, who may offer some kind of professional services, I have no idea
[22:46] <JayF> jizaymes: a group that I know does consulting services and knows a lot about this area is StackHPC
[22:46] <JayF> They've been in the community for a long time and I personally know of at least one setup similar to what you want to do that they've worked on
[22:46] <JayF> I have no idea what their incoming sales process looks like, but you can Google for them
[22:47] <JayF> I'm sure there are plenty of others too; those are just the folks I've worked with the most
