opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/allocations https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937213 | 02:24 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937214 | 02:24 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937214 | 06:02 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937214 | 06:30 |
rpittau | good morning ironic! happy friday! o/ | 06:59 |
janders | happy friday rpittau o/ | 09:35 |
rpittau | tnx janders :) | 09:36 |
opendevreview | Jacob Anders proposed openstack/bifrost master: Update `uuid` to `id` in node_info to match module change https://review.opendev.org/c/openstack/bifrost/+/940505 | 10:20 |
opendevreview | Jacob Anders proposed openstack/bifrost master: Fix typo in CLI parameter spefifying config drive. https://review.opendev.org/c/openstack/bifrost/+/940507 | 10:26 |
janders | ^^ some quick and easy reviews w/r/t yesterday's bifrost deploy playbook fault-finding, if anyone has time | 10:27 |
janders | the latter is literally a single-character change | 10:28 |
janders | (not like the former is much bigger, that would be two chars... :) ) | 10:28 |
janders | now back to what I was actually working on when I ran into these and started debugging | 10:28 |
janders | I had a look at that CI issue and reproduced it locally, it seems to be complaining about a missing ansible module: https://paste.opendev.org/show/buTiKaYnSqhLBx1Bny0B/ | 11:34 |
janders | why this is needed for a single-char doc change, I do not know | 11:34 |
opendevreview | cid proposed openstack/ironic master: Log secure boot access failures at INFO level https://review.opendev.org/c/openstack/ironic/+/940433 | 12:22 |
opendevreview | Merged openstack/ironic master: doc/source/admin fixes part-3 https://review.opendev.org/c/openstack/ironic/+/939003 | 12:25 |
frickler | janders: your local error looks different, you need to clone the collection repo beside the bifrost repo | 13:19 |
frickler | but I also have no idea what this is trying to tell us: ANSIBLE_COLLECTIONS_PATHS was detected, replace it with ANSIBLE_COLLECTIONS_PATH to continue. | 13:20 |
frickler | oh, wait, it is PATHS vs. PATH | 13:20 |
opendevreview | Kaifeng Wang proposed openstack/ironic master: Support querying node history with sort_key and sort_dir https://review.opendev.org/c/openstack/ironic/+/940522 | 13:38 |
opendevreview | Riccardo Pittau proposed openstack/bifrost master: Fix ansible linters https://review.opendev.org/c/openstack/bifrost/+/940527 | 14:17 |
rpittau | janders: ^ | 14:17 |
TheJulia | Hey, did we ever hear from keekz if my patch fixed their deployment up? | 14:56 |
TheJulia | well, cleaning, that is | 14:56 |
TheJulia | Hey folks, any chance I can get another set of eyes on https://review.opendev.org/c/openstack/ironic/+/940072 ? | 15:05 |
jizaymes | Hello, and good day. I've been struggling with getting Ironic to provision a machine. I'll ask a few questions, but if there are any consultants for hire on this topic, I'd be interested too. Story: an existing OpenStack Dalmatian environment that I'm now trying to add bare metal to. I have a node added and inspected with redfish. When using the 'openstack server create' command, it starts building and the machine gets powered on, but it never DHCPs. Neutron can't bind a VIF, the dhcp hostsdir file is always set to ,ignore, and it never DHCPs as a result. I'm trying with what I hope is a pretty remedial setup: a flat cleaning/provisioning network, and then an Internet / external network for the server's 'production' traffic (more questions to come on ironic networking once I get the basics down). Virtual machines can bind to the external network but the bare metal VIFs fail. Suggestions on how to diagnose further? | 15:09 |
TheJulia | Greetings jizaymes, is Neutron setup to use OVN or dhcp-agent ? | 15:13 |
jizaymes | I believe neutron is using OVN. forgive my noobness. Using kolla-ansible, with configs such as enable_ironic_neutron_agent: "yes" and neutron_enable_ovn_agent: "yes". I see baremetal,ovn in my mechanism_drivers | 15:17 |
TheJulia | ... okay | 15:17 |
TheJulia | so... hmm | 15:17 |
TheJulia | IPv4 or IPv6? | 15:17 |
TheJulia | You configure ironic which to default to in ironic.conf && that uses the settings provided in the cleaning/provisioning networks to boot the machine | 15:19 |
TheJulia | The root of the question is attempting to figure out where things might be going sideways | 15:20 |
TheJulia | (and no, not available as a consultant, sorry!) | 15:20 |
jizaymes | IPv4 is all that's in use here, so it should default to that. | 15:20 |
jizaymes | Unassuming error message: | 15:20 |
jizaymes | 2025-01-31 09:20:28.715 793 ERROR neutron.plugins.ml2.managers [req-1f22b0dc-3e25-487e-86f5-65af2bf23781 req-46b50e33-97bb-4c3f-9a6c-fcd3ace1ab36 ccbc3570d6f54abb8bcecf7c32b0dee0 0bc4baf75fc74a199b027c37d2cf0c7b - - default default] Failed to bind port 1d0da9b6-840e-4e42-8cbe-c3cfff2d0f7c on host e516893e-e9fe-4cbe-a5a8-db2af9ca019c for vnic_type baremetal using segments [{'id': '5222c1eb-47e7-410f-ac52-2f543ef16095', 'network_type': 'flat', 'physical_network': 'physnet-external', 'segmentation_id': None, 'network_id': '4df54151-7aea-4321-84e4-0fb99d2a24f5'}] | 15:20 |
TheJulia | (But, happy to help) | 15:20 |
TheJulia | so, the agent might not be running, but that *shouldn't* be fatal | 15:21 |
TheJulia | on the physical machine, can you check the boot settings to make sure it is set to IPv4 PXE boot for network booting? That is, unless you're doing HTTPBoot explicitly | 15:21 |
TheJulia | Some servers are only going to issue requests for one type and they might disregard responses which don't match, so it looks like it doesn't get an IP at all. | 15:22 |
jizaymes | i have both neutron_dhcp_agent and neutron_ovn_agent containers, along with ironic_neutron_agent, on all 3 of my controller systems -- are the dhcp and ovn agents conflicting? | 15:23 |
TheJulia | ... There is a specific flag for baremetal in ovn | 15:23 |
TheJulia | uhhh | 15:23 |
TheJulia | one moment | 15:23 |
jizaymes | I see the thing trying to PXE boot, and tcpdump sees it coming in. DHCP logs just say that it sees the request but ignores it | 15:23 |
TheJulia | lovely | 15:24 |
TheJulia | is that the logs from the neutron_dhcp_agent ? | 15:24 |
TheJulia | or another dnsmasq's logs? | 15:24 |
jizaymes | the log I pasted earlier is from neutron-server.log | 15:25 |
jizaymes | ok, I'm seeing some other errors in neutron-dhcp-agent.log that I will look into now. | 15:26 |
jizaymes | 2025-01-29 17:05:46.074 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [9d542f66-4778-436a-8bc4-6ecedb0b8c37] AMQP server on 10.1.0.83:5672 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.: amqp.exceptions.ConnectionForced: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown' | 15:26 |
jizaymes | oh nvm that was 2 days ago. ignoring | 15:27 |
TheJulia | So the way it works is basically that a request comes into neutron | 15:30 |
TheJulia | That message gets put on the message bus and is executed upon by plugins in parallel | 15:30 |
TheJulia | so you get DHCP configuration being disjointed from physical port binding, or the actual perception of attachment | 15:30 |
TheJulia | because you have a flat network, port binding doesn't matter | 15:30 |
TheJulia | For the most part, it is informational | 15:31 |
TheJulia | as a result of the use of the message bus, the dhcp stuff gets logged in different log files than neutron-server.log | 15:31 |
TheJulia | The networking-baremetal ml2 plugin is supposed to remedy the situation one perceives via port binding ("oh, the binding failure is okay") so people don't freak out. | 15:32 |
TheJulia | in your neutron config, see if https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ovn.disable_ovn_dhcp_for_baremetal_ports is set to True or False | 15:34 |
TheJulia | if False, I'd expect something in the neutron-dhcp-agent logs | 15:34 |
TheJulia | any *other* dnsmasq log files are for introspection activities most likely, not deploy-time activities | 15:35 |
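For anyone following along, the option TheJulia references lives in neutron's ml2_conf.ini; a quick check might look like this (the kolla-ansible config path is an assumption):

```shell
# Check whether OVN's built-in DHCP is disabled for baremetal ports. If the
# option is unset it defaults to False, meaning OVN answers DHCP itself and
# the neutron-dhcp-agent's dnsmasq is not expected to serve those ports.
grep -n 'disable_ovn_dhcp_for_baremetal_ports' /etc/kolla/neutron-server/ml2_conf.ini
```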
jizaymes | 'tenant_network_types = geneve' --- do I need something different here to accommodate bare metal / flat? I do not have the disable_ovn_dhcp_for_baremetal_ports setting, so it should be default. | 15:36 |
TheJulia | I suspect you need to add flat to that list | 15:38 |
jizaymes | ok, adding that setting then will retry & keep an eye on the dhcp-agent log | 15:42 |
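A minimal sketch of the change being discussed, assuming kolla-ansible's usual config layout and container names:

```shell
# Extend the ML2 tenant network types so flat networks are allowed alongside
# geneve; in ml2_conf.ini the edited line would read:
#   [ml2]
#   tenant_network_types = geneve,flat
# Then restart neutron-server, since (as noted below) most neutron services
# do not reload configuration on SIGHUP.
docker restart neutron_server
```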
opendevreview | Verification of a change to openstack/ironic-python-agent stable/2024.2 failed: Fix errors in the function erase_devices_express https://review.opendev.org/c/openstack/ironic-python-agent/+/939938 | 15:43 |
TheJulia | Most neutron services don't accept HUPs on configuration changes (why, I have no idea, it makes me want to table flip), so you'll need to restart neutron | 15:43 |
jizaymes | is ironic-conductor involved at the time of provisioning? I do see some redfish messages in conductor log but not sure if they are related | 15:47 |
TheJulia | ironic-conductor drives the entire workflow | 15:47 |
TheJulia | expect it to check power state, set settings remotely | 15:47 |
TheJulia | if you were configured for virtual media, it would master an iso image and attach it | 15:48 |
jizaymes | ok, recentering there then. TypeError: float() argument must be a string or a real number, not 'NoneType'. AI says "it appears some temperature sensors (e.g., sensor #8 and #17) are reporting as 'Absent' with None readings" | 15:49 |
rpittau | bye everyone have a great weekend! o/ | 15:52 |
jizaymes | this conductor message comes every 30 seconds, so I (perhaps poorly) assumed it was just some continual polling it does for stats, but not necessarily a blocker for the provisioning. | 15:53 |
TheJulia | jizaymes: wow, sensor data collection is on. if you have a backtrace for that, it should be easy to fix, but it should be unrelated if it is from the sensor data collection thread | 15:53 |
TheJulia | or AI is bogus | 16:08 |
TheJulia | only place where we would get a float is from the sensor data, though | 16:08 |
jizaymes | yeah most likely. Here is the payload and traceback if that serves any purpose with that error : https://gist.github.com/jizaymes/23fc1aac31b189c8ea955e4cd68f5c31 Continuing to try to poke around with ironic now that I added the flat tenant network to see if there is any improvement there (so far no) | 16:18 |
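Not the collector's actual code, but a minimal sketch of the kind of guard that would avoid the float() TypeError when a sensor reports as Absent (all names here are illustrative):

```python
def reading_to_float(reading):
    """Convert a sensor reading to float, tolerating absent sensors.

    Redfish marks some sensors (e.g. #8 and #17 above) as Absent and
    returns None for their reading; float(None) raises TypeError,
    so skip those instead of crashing the collection thread.
    """
    if reading is None:
        return None
    return float(reading)

# Filter out absent sensors before building a metrics payload.
readings = [12.0, None, 47.5, None]
values = [v for v in map(reading_to_float, readings) if v is not None]
assert values == [12.0, 47.5]
```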
TheJulia | I'd be looking very closely at aspects like the mac address and the neutron-dhcp-service logs the ovn-agent logs | 16:19 |
TheJulia | or even the ovn logs | 16:19 |
TheJulia | anyway, I need to go. I have to take my wife to the doctor | 16:19 |
jizaymes | thanks for your help | 16:20 |
TheJulia | I'll take a look at your sensor data thing when I get back, don't know how many hours that will be | 16:20 |
jizaymes | i am not very concerned with the sensors since you suggested it's likely not blocking provisioning, but overall just trying to give good feedback if it's a problem that needs fixing. | 16:21 |
TheJulia | Yeah, it shouldn't be blocking at all | 16:22 |
cardoe | TheJulia: keekz is having issues with openstack-helm's image building to test the patch. We really need to do our own thing. I'll make sure we report back once it's tested. | 16:26 |
cardoe | So I suspect you don't have an ml2 plugin for handling the baremetal binding. You probably need ovn for ml2. | 16:30 |
jizaymes | 2025-01-31 10:33:17.977 800 INFO neutron.plugins.ml2.plugin [None req-6aa40f93-7551-4c8f-bda5-453069581fc4 - - - - - -] Attempt 10 to provision port 3b6f51c5-92c5-4484-8dfc-b1a4c64922a8 | 16:33 |
jizaymes | they all just hit max attempts | 16:33 |
jizaymes | meanwhile, neutron-dhcp-agent.log says it's completed ("DHCP configuration for ports {'3b6f51c5-92c5-4484-8dfc-b1a4c64922a8'} is completed") but the dhcp_hostsdir file still says ,ignore | 16:42 |
cardoe | do you have ovn enabled as a ml2 plugin? | 16:50 |
jizaymes | yes I believe I do. in my ml2_conf.ini I do have mechanism_drivers = baremetal,ovn | 16:52 |
cardoe | do you have networking-baremetal installed? | 16:55 |
cardoe | Something's gotta handle the port binding for the baremetal port. In the VLAN case for example networking-generic-switch would talk to your switch and put the server on that VLAN. | 16:56 |
jizaymes | yes on my 3 controllers, networking-baremetal 6.4.0 | 16:56 |
cardoe | I've never used flat so I dunno what's supposed to handle that. | 16:56 |
cardoe | Probably peeking at the metal3 config would make that more clear. | 16:56 |
jizaymes | ok, currently I'm just using kolla-ansible with ironic, not metal3 specifically. I'll look there to try to gain some insight, though I wasn't intending to introduce kubernetes. My hope was to implement this incrementally using dumb flat networking before introducing other complexities like configuring the switches, but I am not sure if that approach is now working to my detriment | 16:58 |
cardoe | Well I just mention metal3 because they use ironic and I believe they configure it for flat networking. | 16:59 |
jizaymes | oh ok that makes sense will look at that. | 16:59 |
cardoe | I've always had to deal with driving switches so it's just not a familiar model to me. | 17:00 |
jizaymes | in the end, and I don't know what is even available with ironic/neutron/ovn, the ultimate hope would be to have bare metal work exactly like virtual machines in that they can use external networks, or virtual internal networks. This is in a cloud service provider environment where the clients are not trusted, so I assumed the OVN stuff was out of the picture for security reasons. | 17:00 |
cardoe | It should work. However be aware that something needs to put those bare metal ports on the overlay network of VMs if you're looking for them to be able to interact together. | 17:02 |
cardoe | So think of each hypervisor really being a rack and having a switch in it, you mentioned geneve so each of your tenant networks are traveling over the physical network encapsulated with geneve to the other hypervisors. It hits the virtual switch inside the hypervisor and then decides which VM port to go to based on the encapsulation. | 17:04 |
TheJulia | cardoe: ok | 17:04 |
jizaymes | Perhaps it's not safe to assume I added the bare metal node and its ports correctly. Overall I have a cleaning network set in my ironic.conf, but I also added these two ports. | 17:08 |
jizaymes | openstack baremetal port create ab:42:1a:1c:0f:5e --node uuid --physical-network physnet-cleaning --pxe-enabled true | 17:08 |
jizaymes | openstack baremetal port create ab:96:91:c1:47:40 --node uuid --physical-network physnet-external --pxe-enabled false | 17:08 |
jizaymes | and then with the openstack server create command, I specify the neutron network name (Internet) only (assuming I don't need to specify cleaning since that's not customer facing) | 17:09 |
TheJulia | Greetings from the ER! Do the MAC addresses match the MAC addresses on the physical machine? Typically we see sequential MACs as well…. | 17:15 |
TheJulia | So physical networks are unrelated to cleaning, that likely could have been empty | 17:16 |
TheJulia | That might actually be source of some of it | 17:16 |
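To make that concrete: the node's network interface driver can be checked, and with the flat interface the physnet hint can simply be omitted. A hedged sketch with placeholder UUIDs:

```shell
# Which network_interface is the node actually using? ("flat" vs "neutron")
openstack baremetal node show <node-uuid> -f value -c network_interface

# With the flat interface, recreate the PXE port without a physical network
# (sketch only; the port UUID is a placeholder)
openstack baremetal port delete <old-port-uuid>
openstack baremetal port create ab:42:1a:1c:0f:5e --node <node-uuid> --pxe-enabled true
```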
cardoe | ER does not sound like fun. :( | 17:18 |
cardoe | jizaymes: your neutron network for cleaning has the provider:physical_network set to "physnet-cleaning"? | 17:19 |
TheJulia | Since physnet mappings help relate and drive that. Physnets are needed and used when your ironic node network interface is set to neutron and not flat. I guess we need to check the ironic node network_interface driver setting, because that also helps govern how it aligns | 17:19 |
TheJulia | I think the kolla stuff is all modeled only on flat networking. Clearly we likely need to write some new theory docs on how this all plugs together, or clean it up | 17:21 |
TheJulia | Been too long since I’ve looked at those docs | 17:21 |
TheJulia | cardoe: no… it is not. | 17:21 |
cardoe | yeah like another "reference use case" | 17:22 |
JayF | I'm presuming https://docs.openstack.org/ironic/latest/admin/networking.html#vif-attachment-flow is accurate and we still have no way to ask nova for "N" networks and somehow map them to ports/portgroups properly? | 18:04 |
jizaymes | Re: Do the MAC addresses match. The one with PXE enabled is the MAC that is displayed on the PXE booting screen. There are several NICs in the system; however, only 2 of them are enabled for this initial test: one is 1g for PXE, intended to be private, and another is a 100g for production, ergo different MAC address vendors. I will note that looking in the redfish output does display other MAC addresses for these ports. I have tried interchanging them in the ports at various points in my testing (though not today in this chat). It correlates to the 1g port that's up as displayed in the BMC management web interface. Are you saying I should omit the --physical-network argument? | 18:13 |
cardoe | Does your neutron network have the same value? | 18:31 |
jizaymes | saw this in ironic-conductor log. | 18:31 |
jizaymes | WARNING sushy.resources.system.system [None req-a18... ccbc... 0bc... - - default default] Could not figure out the allowed values for the reset system action for System Self | 18:31 |
jizaymes | ERROR ironic_prometheus_exporter.messaging [None req-a18ccf9c-7e18-465a-b935-961c2e470afa ccbc3570d6f54abb8bcecf7c32b0dee0 0bc4baf75fc74a199b027c37d2cf0c7b - - default default] 'node_uuid' | 18:31 |
jizaymes | <traceback> | 18:31 |
jizaymes | ERROR oslo_messaging.notify.notifier KeyError: 'node_uuid' | 18:31 |
cardoe | "In the case of the flat network_interface, the requested VIF(s) utilized for all binding configurations in all states." From the docs. I assume all states here mean provisioning, cleaning, deployed and service. I might make a patch to expand that. | 18:32 |
jizaymes | Problem ''node_uuid'' attempting to send to notification system. In looking at the payload dictionary, there is uuid and instance_uuid, but not node_uuid. | 18:39 |
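Purely for illustration (this is not the exporter's actual code), a defensive lookup along these lines would sidestep the KeyError, given that the payload carries uuid rather than node_uuid:

```python
def get_node_uuid(payload):
    # Fall back to the 'uuid' key jizaymes observed in the payload when
    # 'node_uuid' is missing; the payload shape here is an assumption.
    node_uuid = payload.get('node_uuid') or payload.get('uuid')
    if node_uuid is None:
        raise KeyError('node_uuid')
    return node_uuid
```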
JayF | jizaymes: can you file that as a bug against ironic-prometheus-exporter? | 18:43 |
* JayF wonders if there's a way to use capabilities/traits to dictate VIF-mappings | 18:46 |
JayF | using flavors to workaround the inability to set network<>port mappings via nova | 18:47 |
JayF | that's likely not clean enough to upstream, but I may be able to make it work; I have to take a deeper look at what happens in neutron | 18:47 |
jizaymes | I am not at all married to this design and am open to alternatives; my problem was finding out how to implement them | 18:51 |
jizaymes | and yes I am working to file a bug. | 18:51 |
JayF | jizaymes: oh, I'm talking about a different problem in the same ballpark; except I'm pretty sure my story is going to end with "yeah that's a major feature request for nova, neutron, and ironic all at the same time" 😅 | 18:52 |
JayF | I'm not much of an OVN expert so just been reading along with your issues | 18:52 |
JayF | if your IPA breaks I'm your guy though :P | 18:52 |
cardoe | I’m curious to explore the capabilities / traits thing you mentioned. | 18:58 |
jizaymes | I've never filed a bug for openstack so I hope I did it right. Should've probably read some sort of guide first | 19:05 |
JayF | cardoe: so basically I'm exploring if it's possible to have "N" portgroups mapping to "N" different networks, with the mapping being controllable by the nova booter | 19:06 |
JayF | cardoe: our docs explicitly say this is impossible, and the more I trace the data model in nova, I'm not sure we even get all the information we'd need at the ironic virt driver layer | 19:06 |
jizaymes | Let me just confirm some other assumptions. I have 3 OpenStack controllers running 2024.2 / Ubuntu 22.04. They have 2x100Gbps in bond0, then bond0.70 is added to br-cleaning, and an IP applied on that bridge on all 3 hosts. In my kolla-ansible globals.yml I have br-cleaning as my ironic_dnsmasq_interface, and there is a 'cleaning-net' that was added with 'openstack network create' that is listed in the ironic_cleaning_network. That whole thing seems confusing as it sort of exists in two places: on the Linux host, but then also the subnet was added in the openstack network with DHCP enabled. I have a physnet-cleaning (and -external) both set with their respective bridge name (br-cleaning and br-ex). As I'm writing this, I'm thinking I shouldn't have any IPs set at the Linux system level and that is probably conflicting. | 19:16 |
JayF | > As I'm writing this, I'm thinking I shouldn't have any IPs set at the Linux system level and that is probably conflicting. | 19:20 |
JayF | what does that mean exactly? | 19:20 |
JayF | I'll also note: openstack's kolla channel might be helpful too; we know a lot about ironic-the-service but your question crosses a lot of service borders | 19:21 |
jizaymes | well for example, my external interfaces: in my linux netplan configs I have no IP addresses configured, and I don't explicitly make the br-ex bridge. OVN does that, I guess. Conversely, for my management network I do make br-mgmt and have an IP on all of the hosts. Assuming OVN manages the external one (theorizing that it also manages the cleaning network in the same manner) | 19:22 |
jizaymes | thanks. I am there too but didn't speak up yet. | 19:22 |
JayF | Yeah based on your example you're likely ahead of my knowledge in this space generally :) | 19:23 |
JayF | Also openstack-discuss@lists.openstack.org might be a good idea given it crosses so many borders | 19:23 |
cardoe | I suspect the issue is that it’s not really flat networking. I see the bond0.70 which to me says this is tagging traffic with VLAN 70 on this interface. | 19:37 |
cardoe | But those baremetal ports aren’t gonna have anything tagging that for them. It’s gotta be the switch. | 19:37 |
cardoe | jizaymes: your network is likely some overlay network on your physical side like I was mentioning before about the virtual switch. | 19:38 |
JayF | oh, wait, that's being tried all with flat networking!? | 19:38 |
cardoe | Yeah. I was trying to explain that if he’s blending networks that work with hypervisors then there’s an overlay going on and not actually flat networking | 19:39 |
JayF | well, in ironic parlance, flat doesn't mean he can't have vlan on top of bond | 19:39 |
JayF | it just means that has to be statically configured | 19:39 |
cardoe | Your port group idea sounds +1 to me for making things map with nova btw. I’ll work with ya on it. | 19:39 |
JayF | and will likely need to use a switch which will happily fall back to one-headed for pxe booting OR static networking w/ IPA | 19:40 |
JayF | cardoe: well, it's not an "idea", I'm trying to gather knowledge still | 19:40 |
JayF | cardoe: just trying to figure out the shape of the problem and how much of this can already be done | 19:40 |
cardoe | Yeah. Agreed. That’s the question. How are these things physically plugged in. | 19:40 |
JayF | as you've picked up on I'm sure; I know a lot about the IPA side, a lot about disk imaging and working with servers, and even the nova<>ironic driver. Neutron ... not so much | 19:40 |
cardoe | I’m learning the neutron side. A lot lately. | 19:41 |
opendevreview | Merged openstack/ironic master: CI: Remove IPv6 testing https://review.opendev.org/c/openstack/ironic/+/940072 | 19:49 |
jizaymes | the kolla-ansible instructions don't even really mention much for networking, so I just mimicked the external Internet network I have, which is just a flat network with the gateway elsewhere on my routers. From OVN's perspective it's flat in that bond0.70 is presented to it with OVN having no need to insert a VLAN tag (the OS handles that). This is the same for my existing bond0.80, which is the external network in question, to which my VM instances have no problem attaching and which is also marked as flat. | 19:51 |
jizaymes | Up until a few minutes ago, I had br-cleaning configured in my netplan config and an IP set on my 3 controllers; I would see the DHCP requests come in from my bare metal server but they would be ignored. I just reconfigured to remove br-cleaning and the IP on each of the 3 controllers in hopes OVN would handle that. The issue does not appear to be physical networking related; more so that whatever mechanism is supposed to update the dhcp hostsdir file, to get it to not ignore my server, is not happening. I had assumed that was a result of the earlier mentioned VIF binding problems. | 19:51 |
cardoe | jizaymes: so what neutron network are your VMs attaching to? | 20:17 |
jizaymes | my external network "Internet" which maps to physnet-external which is bond0.80 | 20:18 |
opendevreview | Doug Goldstein proposed openstack/sushy master: migrate sushy_oem_idrac to sushy https://review.opendev.org/c/openstack/sushy/+/940557 | 20:18 |
cardoe | so can you do openstack network show <external-name> | 20:20 |
cardoe | Actually nvm. | 20:21 |
cardoe | You answered it | 20:21 |
cardoe | The OS is handling the VLAN tag. | 20:21 |
jizaymes | yes, a bit large to paste, but yes it's successful: state is UP, network_type is flat, physical_network is physnet-external, and status is ACTIVE | 20:21 |
cardoe | What's handling the VLAN tag for the physical ports for the ironic nodes? | 20:22 |
cardoe | You said br-cleaning is on bond0.70. As you stated your OS is handling the VLAN tag. aka you have a virtual switch in your OS. | 20:23 |
jizaymes | should be nothing setting any vlans. I have it hard set as an access port in the external vlan 80. eventually would like to wrap in ml2 to reconfigure my switch but do not have anything configured for that presently within openstack or ml2 | 20:23 |
cardoe | okay so you set the physical ports to access VLAN 80 but cleaning (and PXE by extension) is on 70. | 20:23 |
jizaymes | and the internal cleaning port vlan 70 is also hard set to that port. | 20:24 |
jizaymes | (to its respective port) | 20:24 |
jizaymes | goal was to use 1g nic 1 for all pxe / cleaning / deployment activities, and 100g nic 2 as just (initially) dumb member of that separate Internet network | 20:25 |
JayF | What's the value in having those activities split? | 20:26 |
JayF | It doesn't mitigate the risk of running flat network interface since your provisioned nodes still have an interface in it | 20:26 |
JayF | (it == cleaning/provisioning/rescue/servicing network) | 20:26 |
jizaymes | 1) not having as much ml2 switch reconfiguration stuff 2) pseudo security in that I can turn off that port completely after it's provisioned to limit access inward. | 20:27 |
JayF | okay, you're thinking more about agent->server threat model | 20:27 |
jizaymes | it's likely I just don't understand how this is "supposed" to work | 20:27 |
JayF | we usually worry more about tenants interfering with cleaning or deployment | 20:27 |
JayF | well, I say we, I do at least :D | 20:27 |
JayF | your logic makes sense | 20:28 |
JayF | just asking about use cases because it's easy to get down an "XY problem" style hole in openstack | 20:28 |
jizaymes | well that was another reason to keep it as a totally separate network so that customers would have limited ability to muck with it in that volatile state | 20:28 |
JayF | well, unless you have some network automation flipping the port off on the switch side; there'd be nothing preventing a tenant from turning up that port from a provisioned node, yeah? | 20:28 |
JayF | so do you have sidecar network automation you'd expect to be running | 20:29 |
JayF | or am I just in multitenant mindset and you're not multitenant at all | 20:29 |
jizaymes | well enter ml2. not a lot of documentation on that, was hoping that could be configured to do that afterwards. | 20:29 |
JayF | so the normal/expected/happy path is Ironic w/neutron network interface | 20:29 |
jizaymes | I'm in a public cloud service provider scenario so I want as much separation from backend as possible, and assuming the customers are all untrustworthy a-holes ;) | 20:30 |
JayF | and we flip the ports to bond+vlan for tenants, flip it to a different network for cleaning/provisioning/etc (you can even define different ones for this) | 20:30 |
JayF | for a public cloud model, you 100% ^^^ need that | 20:30 |
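For reference, moving a node onto that happy path is a one-line change, assuming the neutron interface is listed in enabled_network_interfaces in ironic.conf:

```shell
# Sketch: switch the node from flat to neutron-managed (multi-tenant) networking
openstack baremetal node set <node-uuid> --network-interface neutron
```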
cardoe | Ultimately I think there needs to be some more logging to see why you're getting a binding error. | 20:30 |
JayF | the other thing to warn you about jizaymes, as someone who's done multitenant cloud before (I was one of the original engineers for rackspace onmetal like 10y ago): be wary of hardware hacks | 20:31 |
cardoe | The separate NIC approach is the same convo I've had as a public provider as well. | 20:31 |
JayF | in our case, we did lots of coordination with hardware vendors to ensure we were able to keep it secure | 20:31 |
JayF | but a root user has a lot of hardware access and can wreak havoc on consumer-level gear | 20:31 |
cardoe | But like JayF mentioned you then have the issue that the tenant can just enable the port. So then you need something to shut off the port. And if you're already doing port manipulation, might as well just run them both over the same physical ports. | 20:31 |
JayF | (in that public cloud case; we explicitly read out and validated the firmware code was unchanged during each cleaning cycle) | 20:31 |
JayF | cardoe: exactly my logic | 20:32 |
jizaymes | ok so ml2 can change vlans but cant enable/disable ports ? | 20:32 |
cardoe | You can | 20:33 |
jizaymes | i guess I was figuring ml2 would shut down the port on the switchport side so the customer would have no ability to bring it up | 20:33 |
jizaymes | i didn't even really get to learning ml2 yet as I can't even get a basic OS on this thing in a single vlan hehe | 20:33 |
cardoe | Ultimately it comes down to a cost thing. | 20:34 |
cardoe | People will end up deciding to collapse the provisioning switch since it's running as 1g/10g onto the same back plane that the 40g/100g TOR lives on. | 20:35 |
cardoe | Then redundancy convos happen | 20:35 |
cardoe | Anyway, turn up the logs to see why the bind is failing. | 20:36 |
jizaymes | will do. thank you both for the help! | 20:37 |
cardoe | Happy to speak from an operator perspective. JayF and I both have touched that area a bunch. | 20:38 |
opendevreview | Doug Goldstein proposed openstack/ironic master: doc: fix typo and slight wording order for networking https://review.opendev.org/c/openstack/ironic/+/940558 | 20:39 |
JayF | well; yes-ish cardoe :) Most places I've worked (before $curJob) that ran this kinda setup were so forked that much of my knowledge doesn't apply | 20:39 |
JayF | networking and inspector are the two places I know the least about and I'm having to rapidly get up to speed in both cases | 20:39 |
cardoe | Well I meant generically we can speak to it. | 20:40 |
cardoe | I'm trying hard to kill forks. | 20:40 |
JayF | you all still have nicera? | 20:40 |
JayF | or nsx or whatever it was called | 20:40 |
jizaymes | how about this question, since I have not yet been able to get on the console of an ironic-installed bare-metal server -- is it just a normal OS that the customer has full access to, or is it like a VM abstracted from the end user but with more hardware passthrough or something? I always assumed it was the former, which is why we couldn't have more deep OVN networking there (not trusted) | 20:41 |
JayF | so when you provision a node with Ironic we put an OS on it, setup the configdrive so cloud-init/glean/ignition can setup networking and a user, setup the actual switch side of the network, and flip it on | 20:43 |
JayF | at that point, when it's on, it's indistinguishable from a random machine installed via an installer | 20:43 |
jizaymes | ok thanks, got it. maybe it's easier just to sell big single-VMs on dedicated hardware using normal nova, but keeping all of the built-in network fanciness that exists... and just do hardware passthrough for GPU or storage. though... I can't get this to work; who's to say I could get that to work :) | 20:45 |
JayF | You'll be able to get the network stuff working I suspect, you're on the right path just doing bond+vlan makes it extra tough | 20:48 |
JayF | but just make sure to think about the security implications | 20:48 |
JayF | I will tell you in our testing (10 years ago) we found about a 5-10% performance hit doing "giant VM on dedicated hypervisor" vs "actual bare metal" | 20:48 |
JayF | but that was haswell era in 2014 so who knows | 20:49 |
jizaymes | too bad my fancy Intel 100gbps cards can't terminate the neutron / ovn stuff out of band. isn't that what some of the mellanox stuff does? | 20:51 |
cardoe | "Smart NICs" | 20:52 |
cardoe | Though I'm using a VXLAN fabric now, and almost there to having OVN join it via BGP EVPN and participate as part of the fabric. | 20:53 |
JayF | We have absolutely dreamed of an Ironic network interface that was just like | 20:56 |
JayF | "hey smartnic, expose a different set of vlans now" | 20:56 |
JayF | that does not exist as of now afaik | 20:57 |
JayF | I think the closest we have to magic network operation is something like the OVN VTEP switches, but I've never seen that in practice just in theory, but the theory is cool | 20:57 |
jizaymes | yeah I saw something like that with my arista switches but it seemed outdated. | 20:58 |
JayF | https://docs.openstack.org/ironic/latest/admin/ovn-networking.html#vtep-switch-support was adapted from some patches by a juniper dev | 20:58 |
JayF | but afaik outside of that dev's patched setup, nobody has used this in production yet | 20:58 |
JayF | and for all the cool hardware I have, switches that cost as much as my car are not one of them ;) | 20:59 |
jizaymes | so as you said earlier, the common design pattern is just either a single nic or 2 nics in a bond, then ml2 changes the vlan between provisioning stages -- cleaning/deploy/pxe vlan first, then changes to the production vlan with ml2? and the delta is that right now I'm trying to do the same but without any vlan changes, using 2 interfaces instead of 1. | 21:03 |
JayF | You are the first person I think who has tried to do that (at least and mentioned it to us), and I'm not certain it's possible. | 21:08 |
JayF | We should document it if you figure out the path :D | 21:08 |
jizaymes | man does this always happen | 21:09 |
jizaymes | ugh. it feels remedial to me, yet of course its not haha | 21:09 |
jizaymes | ok I will pivot to figure out the ml2 part I guess. more dependencies.. | 21:11 |
jizaymes | do I have to make a script for it to use? or does it magically know how to configure my switch? I dabbled for like 2 minutes trying to get the networking-arista module installed but didn't want to go too far off the ranch without any of this other stuff working | 21:13 |
cardoe | So that vtep-switch setup btw is what I've kinda got going. | 21:17 |
cardoe | Gimme a few more months and it'll flesh out better. | 21:18 |
cardoe | First inspection | 21:18 |
cardoe | jizaymes: you'll likely want to look at networking-generic-switch | 21:18 |
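For reference, networking-generic-switch is driven entirely by config; a hypothetical stanza for an Arista TOR (device name, address, and credentials are made up) could be appended to ml2_conf.ini like so:

```shell
# Sketch only: NGS device stanzas live alongside the ML2 config.
cat >> /etc/neutron/plugins/ml2/ml2_conf.ini <<'EOF'
[genericswitch:arista-tor-1]
device_type = netmiko_arista_eos
ip = 192.0.2.10
username = neutron
password = secret
EOF
# "genericswitch" would also need to be added to mechanism_drivers,
# followed by a neutron-server restart.
```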
jizaymes | ok wow. this is me coming out of the kolla bubble -- there's no mention of this in their docs or configuration examples that I can see. | 21:23 |
jizaymes | is it common for bare metal servers to be directly on the external Internet vlan like I'm trying, or is it usually somehow brought back into ovn via a vlan to be routed out from there? | 21:23 |
jizaymes | (in the openstack world, I know in the general scope of the world that yes it is common) | 21:24 |
jizaymes | also throwing this out there for anyone just jumping in -- I'm happy to engage an independent consultant to provide paid guidance for this initiative. | 21:53 |
JayF | I have absolutely seen public addresses given out to machines. It's the way that onmetal works (or at least did when I left) | 22:45 |
JayF | jizaymes: most ironic contributors either work for a deployer like Rackspace (cardoe) or G-Research (me), or for Red Hat, who may offer some kind of professional services, I have no idea | 22:45 |
JayF | jizaymes: a group that I know does consulting services and knows a lot about this area is stackHPC | 22:46 |
JayF | They've been in the community for a long time and I personally know of at least one setup similar to what you want to do that they've worked on | 22:46 |
JayF | I have no idea what their incoming sales process looks like, but you can Google for them | 22:46 |
JayF | I'm sure there are plenty of others too, those are just the folks I've worked with the most | 22:47 |