Friday, 2025-02-21

opendevreviewVerification of a change to openstack/ironic master failed: ci: focus ironic-tempest-bios-ipmi-direct-tinyipa  https://review.opendev.org/c/openstack/ironic/+/94220401:49
TheJuliacardoe: huh? not sure I understand02:18
TheJuliacardoe: might help to have a higher bandwidth discussion02:18
cardoeSure thing. Tomorrow? The more I think about it, our network automation must have done something wrong. skrobul and I were doing manual testing on that box and I bet it still had some override. But 100% Ironic and Anaconda are busted.02:20
opendevreviewVasyl Saienko proposed openstack/ironic master: Enable atop on jobs  https://review.opendev.org/c/openstack/ironic/+/94234006:45
opendevreviewVasyl Saienko proposed openstack/ironic-tempest-plugin master: Make sure fixed IPs are different for multitenancy tests  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/94241407:20
fricklercardoe: sorry my day ended faster than expected yesterday. we still need to clean up the terms I think, like somehow I'm unsure what the difference between VXLAN and VNI is for you. also do you want to see encapsulated VXLAN packets on the baremetal server port?08:13
opendevreviewcid proposed openstack/ironic-python-agent-builder master: More reliable TinyIPA builds with network retries  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/94236908:21
vsaienkoironic community, please review a simple patch that enables atop installation on devstack environments https://review.opendev.org/c/openstack/ironic/+/942340 should help to troubleshoot performance issues and unstable CI09:08
opendevreviewcid proposed openstack/ironic-python-agent-builder master: More reliable TinyIPA builds with network retries  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/94236911:51
cardoefrickler: there is no difference between VXLAN and VNI for me. I’d like to be able to reuse the same VNI number in the same neutron. And yes I would like to see VXLAN encapsulated traffic to the bare metal port. Because I could have OVS VTEP running on that machine participating.12:28
fricklercardoe: so the vxlan port would be a trunk subinterface on that bare metal port? or how would the bare metal port get its l3 connectivity? so far in neutron, the L3 connectivity for vxlan is part of the global server setup and not port- or tenant-specific afaict12:58
cardoeSo let's ignore that piece for now cause I feel like that's where my conversation gets hung up each time.13:08
cardoeI've got two separate VXLAN fabrics. Both can have VNI 2000 running on them. How can I allow for that allocation in Neutron today?13:09
cardoeWith VLAN, you'd define separate physical_network values and then there's no conflict.13:09
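A minimal openstacksdk sketch of that VLAN-side separation, with hypothetical cloud, fabric, and network names:

    import openstack

    # Hypothetical cloud name for the undercloud's neutron.
    conn = openstack.connect(cloud='undercloud')

    # Two VLAN networks can reuse segmentation ID 2000 because each one is
    # scoped to its own physical_network.
    conn.network.create_network(
        name='storage-net',
        provider_network_type='vlan',
        provider_physical_network='storage-fabric',
        provider_segmentation_id=2000)
    conn.network.create_network(
        name='tenant-net',
        provider_network_type='vlan',
        provider_physical_network='tenant-fabric',
        provider_segmentation_id=2000)

The vxlan type takes no physical_network, and vni_ranges in ml2_conf.ini is a single deployment-wide pool, which is exactly the gap being discussed here.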
fricklercardoe: in neutron, you cannot, because as I said, vnis are bound to the whole server and thus you cannot have multiple domains for them. you would first introduce a concept of VRFs in order to have different L3 domains and then you could maybe tie VNI ranges to those13:11
cardoeBut that's still taking an OVS view and not a bare metal view.13:13
cardoeBecause port binding happens according to the physical_network field. So if my server has value X for some ports and value Y for other ports, I've already got that separation.13:14
cardoeNo VRF involved because it's physically separated.13:15
fricklercardoe: yes, I'm talking only about the hypervisor/instance-focused view that neutron currently has (or as I interpret it)13:15
fricklercardoe: but you still have a single routing domain on your baremetal node? how does OVS distinguish vxlan packets from those two ports?13:16
cardoeAbsolutely. And I agree with you about the VRFs in that case.13:16
cardoeThe VXLAN traffic isn't coming up into a single OVS.13:18
cardoeIn one case the traffic is coming up into a device doing NVMe over TCP so that Cinder can plumb stuff over the top.13:18
cardoeToday Cinder really expects your networks to be set up already for all your storage stuff.13:19
cardoeSo maybe it'll help to use some Red Hat terms (well, really TripleO), but I'm interested in the "undercloud" case. The hypervisor/instance focus you speak of would be the "overcloud".13:20
cardoeSo I'm having my cinder backend actually talk to neutron to setup that physical fabric.13:21
cardoeAnd then the regular traffic side is nova talking to neutron.13:21
cardoeI could run two different neutrons. One for nova and one for cinder. But then when it comes to Ironic, I cannot toggle back and forth. That's the crux of my problem.13:25
fricklercardoe: I see. maybe treating these sides as different availability_zones might work? not sure if that would solve the "vni domains" issue right away, but it might be a concept this could be added to13:25
cardoeOh. Now we're gonna peel back some other gross hacks I did. :D13:26
cardoeSo right now since I cannot get the neutron network node to join the VTEP proper. I'm creating VLAN segments for each leaf switch on that VXLAN network. It's an idea I copied from the Juniper and Cisco ML2 mechanism.13:28
cardoeAnd in that case, since the regular traffic to a bare metal server is really exposed out as a VLAN, those segments store the actual VLAN that box is plugged into. Well, we're trunking ports, so there are many VLANs with the trunking plugin. But that's another aside.13:30
cardoeBut the result is I've spread neutron network nodes running OVN into different leaf switches. Their AZ is named after their leaf as well. The OVN bridge mapping to the trunk port is the name of the leaf switch.13:31
cardoeAnd OVN will happily use a VLAN and tag the traffic.13:32
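A rough openstacksdk sketch of the segment-per-leaf arrangement cardoe describes, assuming the segments service plugin is enabled and using made-up leaf names, VNI, and VLAN IDs:

    import openstack

    conn = openstack.connect(cloud='undercloud')  # hypothetical cloud name

    # One logical VXLAN network for the customer...
    net = conn.network.create_network(
        name='customer-net',
        provider_network_type='vxlan',
        provider_segmentation_id=200000)

    # ...plus a VLAN segment per leaf switch. The physical_network of each
    # segment is the leaf's name, matching the OVN bridge mapping on the
    # network node that sits in that cabinet.
    for leaf, vlan in [('f20-1', 301), ('f20-5', 302)]:
        conn.network.create_segment(
            network_id=net.id,
            network_type='vlan',
            physical_network=leaf,
            segmentation_id=vlan)

Each network then burns one VLAN ID per leaf, which is where the 4096-minus-overhead ceiling mentioned below comes from.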
cardoeI can't have HA cause <insert bug here that jamesdenton_ will probably remember>13:33
fricklercardoe: so "neutron network node" is the overcloud node provisioned by the undercloud ironic? or are you talking about neutron in the undercloud? because in the latter case I'd fail to see the ironic connection13:35
cardoeAnd a VXLAN network cannot ever have two segments that live in cabinets with two different OVN nodes. So for example, if I have OVN nodes in F20-1 and F20-5, and the rest of those cabinets have usable servers, I cannot build a server (with ironic/nova) into F20-5 and have the network use OVN from F20-1.13:35
cardoeThis is neutron in the undercloud exclusively.13:36
cardoeSo think of any OpenStack public facing provider like Vexxhost or OVH or Rackspace.13:37
cardoeTheir public facing OpenStack would be the overcloud.13:37
cardoeLet's pick Rackspace's OpenStack Flex for example, which OpenDev is consuming nowadays.13:39
cardoeThey need bare metal for their hypervisors and all those OpenStack services.13:39
cardoeThey come to my undercloud. They build machines with Nova using their hypervisor OS image. They define networks in my neutron that carry their VXLAN or Geneve or whatever traffic along with whatever management they need.13:41
cardoeThey want their Cinder to be able to have their hypervisors hook up to storage and so they come to me and I configure those network paths as well.13:42
cardoeMy neutron is exclusively doing those bare metal switch configurations.13:43
cardoeBut I've started running OVN to provide some ease-of-use comforts. Like when the box is first coming up, its default gateway might be an OVN router and therefore it can communicate with their configuration management system and fetch whatever it needs.13:45
cardoeThe bigger footprint though is that my neutron runs neutron-fwaas and actually drives bare metal devices that ultimately are what their overcloud uses as their infrastructure.13:46
fricklerso you are running OVN on the baremetal node and integrating it into the neutron-controlled OVN?13:46
cardoeNo.13:47
cardoeThey are running OVN on the baremetal node and it's controlled by their neutron.13:47
cardoeThey're oblivious to how I set it up. To them it's just some boxes that are plugged into the same switch.13:48
cardoeI'm running an OVN on my control plane machines.13:49
fricklerso maybe let's move back one step, why do you need/want vxlan to the baremetal node, what's bad about terminating it on the leaf switch?13:50
cardoeNothing bad about terminating it on the leaf switch.13:50
cardoebtw I'm happy to have the convo in a higher bandwidth medium as well.13:51
cardoeI'm terminating it on the leaf switch today.13:51
cardoeBUT there's bugs with neutron and OVN there.13:52
cardoeSo like I said. I make VLAN segments on the VXLAN network with the physical_network field set to the leaf switch name for each VLAN segment.13:52
cardoeThat limits my OVN node which is providing that router for example (or DHCP) to 4096 - $OVERHEAD networks it can service.13:53
fricklerah, I'd have expected the vlan ids to be specific only per switch port, but in this case things might become limited pretty soon indeed13:56
cardoeI've got well in excess of that many networks running on my fabric today.13:56
cardoeI had tried to poke ovn-bgp-agent without success. https://bugs.launchpad.net/ovn-bgp-agent/+bug/2017890 exists for really adding that support. A person on that bug created https://github.com/toreanderson/evpn_agent/ which I haven't tried yet.13:57
fricklerthat's tore in #openstack-neutron I assume13:58
cardoeI've brought it up during their calls.13:59
fricklerone day I'll need to look into ovn-bgp-agent myself. so far I've been fine with just using n-d-r14:00
cardoeAnyway. I hope the use case kind of makes sense now. I'm trying to see what the best way is to present it to the community at large so that when I make RFEs and such there's a way to tie back to the "why".14:06
cardoeThe biggest rub right now is the "I'm wanting VNI 200,000 on the storage fabric AND on the network fabric."14:06
fricklerI was just about to write that reasoning why you need to have overlap there might be a good start ;)14:07
jamesdenton_cardoe seems like we are attempting to overload the existing vxlan type to describe physical vxlan fabrics when its original intention was to be used for the single (software) overlay fabric managed by neutron? Therefore, the concept of a 'physical_network' equivalent doesn't really apply/exist? (FWIW, i don't know if that was its original intention, or not).14:15
cardoejamesdenton_: yep. I think that's the crux of the problem.14:17
frickler+1, well summarized14:18
fricklerand indeed one possible solution path might involve creating a new provider-network-type14:20
jamesdenton_having *tenant provisioned* baremetal nodes act as a VTEP isn't a requirement, either, AFAIK. But rather, the Neutron nodes handling L3, DHCP, etc would benefit as VTEPs against the *physical fabric* - since the VNI describes a network across the fabric while the VLAN is only locally significant to a particular leaf. I guess today we have to fumble with segments to make sure a network node gets plumbed14:22
jamesdenton_ correctly? But then i seem to recall your issue with the actual implementation (re: the first segment not actually having something to bind against)14:22
cardoefrickler: I thought about that... "vxlan_evpn" or something. I'm bad at naming stuff.14:23
jamesdenton_obviously, rackvxlan™14:23
opendevreviewHarald Jensås proposed openstack/sushy-tools master: nova driver - get_secure_boot volume boot  https://review.opendev.org/c/openstack/sushy-tools/+/94245614:26
opendevreviewHarald Jensås proposed openstack/sushy-tools master: nova driver - get_secure_boot volume boot  https://review.opendev.org/c/openstack/sushy-tools/+/94245614:29
mnasiadkacardoe: Have you seen that ovn-core is working on a BGP implementation? Maybe they are solving your issues? (And I assume once that BGP code lands - ovn-bgp-agent might be sort of redundant?)15:27
cardoeMaybe? We'll have to see cause ovn-bgp-agent doesn't do the needful today. That's why there's evpn_agent.15:29
cardoeThere's no configuration option by which I can somehow have the anaconda deploy interface do a "switch to provisioning network" step and then later a "switch to tenant network"?15:38
cardoeAs far as I can tell, the anaconda deploy interface never creates the provisioning network port and never puts the machine on the provisioning network.15:41
cardoeBut it then grabs the glance data and puts it on the conductor HTTP server, which is accessible from the provisioning network. It also requires the box to poke the heartbeat_url.15:42
cardoeBoth of which aren't accessible from the tenant network. But it's putting the box on the tenant network and asking it to PXE boot and load that stuff.15:43
jamesdenton_cardoe if the expectation is to hit the conductor api then i would expect that node to have been placed on the provisioning network and not the tenant network. But if the original test case was a "flat" provisioning/tenant network, then i could see how this might've worked. Just speculating.16:25
cardoeImma just wait until TheJulia tells me what to do. :D16:33
jamesdenton_that is usually best16:34
TheJuliawait, what did I do?!17:13
TheJuliasorry, day off and checking on neighbors danes and other tasks have kept me busy17:14
TheJuliavsaienko: as an fyi, we have a hashtag you can apply to changes you're desiring reviews of17:15
TheJuliaironic-week-prio17:15
TheJuliaJayF: so, I think the ideal primitive of physical network should still apply because there are crazy folks who do things like A/B fabrics17:20
TheJuliacardoe: so.. the switch to a provisioning network should be part of the higher level flow before the driver really gets into driving17:22
cardoeSo the only place I see "add_provisioning_network" being called in a prepare() method on a DeployInterface is for AnsibleDeploy and CustomAgentDeploy. Neither of which AnacondaDeploy inherits from.17:26
cardoeSo that's why I'm wondering if that's something I can configure? or do I literally need to make my own AnacondaDeploy that inherits from CustomAgentDeploy?17:26
* TheJulia raises an eyebrow17:29
TheJuliagive me a couple minutes to take some notes down and I'll look at the code17:29
cardoeThis is btw after I patched the glance_service code to stop doing what it's doing. Which I think you wanted to sync on TheJulia 17:30
TheJuliadistracted, a wild nobodycam is talking to me about switch configuration in person17:36
TheJuliaokay, so its based upon the pxe interface17:41
TheJuliahttps://github.com/openstack/ironic/blob/master/ironic/drivers/modules/pxe.py#L72-L7417:42
TheJuliabecause of the modeling, there is not a separate provisioning network with that interface to deploy, it drops it on the tenant network and expects it to boot/deploy from that point17:43
TheJuliaif you need to isolate and switch, I'd put a flag in.. I guess17:43
cardoeokay so custom class it is. :D17:46
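A very rough sketch of what such a custom class might look like, assuming the anaconda deploy interface is the PXEAnacondaDeploy class in ironic/drivers/modules/pxe.py (as linked above) and reusing the standard network-interface hooks; not the actual ironic implementation, and it would still need to be registered as a deploy-interface entry point:

    from ironic.common import states
    from ironic.drivers.modules import pxe


    class IsolatedAnacondaDeploy(pxe.PXEAnacondaDeploy):
        """Anaconda deploy that stages the node on the provisioning network."""

        def prepare(self, task):
            # Hypothetical: mirror the agent-based deploy interfaces and move
            # the node onto the provisioning network before the install,
            # instead of leaving it on the tenant network the whole time.
            if task.node.provision_state == states.DEPLOYING:
                task.driver.network.unconfigure_tenant_networks(task)
                task.driver.network.add_provisioning_network(task)
            super().prepare(task)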
TheJuliacardoe: oh, the permission check, yeah. I chatted with Jay about that last night17:53
cardoeNot the permissions check17:58
TheJuliathe bug jayf created yesterday?18:05
JayFI think cardoe's bug was different18:05
JayFjust in the same neighborhood18:05
JayFsomething about the structure of a dict changing 18:06
JayFand our tests (which work) mocking the original glance structure, while the code in production gets a slightly modified structure18:06
TheJuliaoh, that18:09
TheJuliayeah, that has me confused18:09
TheJuliabut I don't have enough details to be actionable on it18:09
TheJuliaat least, personally actionable18:09
JayFI think cardoe seemed to have his head wrapped around it well enough that I think (hope?) he might take a swing at it18:13
TheJulia++18:37
cardoeokay back at my desk18:41
cardoeActually no. I need some advice.18:42
TheJuliawelcome back!18:42
cardoeThat's what I was hoping to bug ya for TheJulia 18:42
TheJuliarutro!18:42
cardoewhy is golangci-lint destroying my CPU...18:43
cardoewhy am I touching golang code... these yaks need to go away.18:43
cardoehttps://etherpad.opendev.org/p/cardoe-ironic-anaconda-img-props so that's the details I took about the issue.18:43
cardoeBut I'm happy to jump on higher bandwidth convo to explain the issue and get advice on how to proceed.18:44
TheJuliahypothetical question: If I can't use libvirt to create a VM, and I need to get it an IP to use on devstack, should I just bind it out to virbr0?18:45
TheJuliahmm, not sure I'm grokking18:46
TheJulialet me go to my RV for a call18:46
opendevreviewSatoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager  https://review.opendev.org/c/openstack/ironic-python-agent/+/94171418:50
TheJuliacardoe: o/ https://meet.google.com/dks-gcwx-bga18:55
JayFdevstack test vms for ironic usage don't have internet access in provisioning network, seemingly, I assume this is expected? cc: satoshi 19:12
JayFand assuming so, anyone have experience making it work?19:12
TheJuliayeah, that is expected19:30
JayFYeah, I think I'm going to have to figure out how to change that, even if just as a oneoff19:52
JayFsince literally the point of the incoming hardware manager is to run things from a container registry19:52
JayFI guess option b might be spinning up a local registry on devstack19:52
JayFsatoshi: ^ that is actually the route I'd go for now; there should be some howtos on setting up a local container registry and you could access that by-ip19:54
vsaienkoTheJulia thanks, I've applied this hashtag for the vlan-aware VMs patches20:40
JayFif I haven't said so, it's nice to see you around contributing o/ thanks Vasyl20:52
opendevreviewDoug Goldstein proposed openstack/ironic master: fix glance metadata layout  https://review.opendev.org/c/openstack/ironic/+/94249621:19
cardoeYou know what we need? Two copies of get_image_properties() with different call interfaces.21:20
cardoeAnd just for good luck, we'll inline another copy in another function.21:20
shermanmeven something like this works (replace docker/quay/whatever)21:29
shermanmdocker run -d -p 5000:5000 \21:29
shermanm    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \21:29
shermanm    --restart always \21:29
shermanm    --name registry-docker.io registry:221:29
shermanmproxies all requests to e.g. 127.0.0.1:5000/namespace/containername:tag to registry-1.docker.io/namespace/containername:tag21:30
cardoeTheJulia: so other random discoveries in the anaconda deploy... it already requires the heartbeat_url to be hit with the "end" message. When it gets that, it attempts to teardown the provisioning network and switch to the tenant network.22:03
TheJuliaThat is what I would have expected, fwiw22:06
opendevreviewHarald Jensås proposed openstack/sushy-tools master: OS vmedia: Update device on eject_image  https://review.opendev.org/c/openstack/sushy-tools/+/94249822:21
opendevreviewHarald Jensås proposed openstack/sushy-tools master: Openstack vmedia - refactor to pre-defined volumes  https://review.opendev.org/c/openstack/sushy-tools/+/94249923:00
opendevreviewHarald Jensås proposed openstack/sushy-tools master: Openstack vmedia - refactor to pre-defined volumes  https://review.opendev.org/c/openstack/sushy-tools/+/94249923:07
