fungi | mnaser: okay so that's connecting to afs01.dfw.openstack.org from vexxhost ca-ymq-1? | 00:24 |
mnaser | fungi: that’s correct | 00:25 |
mnaser | well, from the same site but not from ca-ymq-1 directly :) | 00:25 |
fungi | i can successfully `ls /afs/openstack.org/` from mirror.ca-ymq-1.vexxhost.opendev.org | 00:26 |
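A quick client-side sanity check at this point could look like the following (standard OpenAFS cache-manager commands; output format varies by version):

```
fs wscell          # confirm which cell this workstation belongs to
fs checkservers    # probe the fileservers and list any the cache manager thinks are down
```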
mnaser | fungi: hmm, the only thing that changed is the SDN is now OVN.. but I’m seeing traffic go back and forth so I’m a bit clueless | 00:28 |
mnaser | I wonder if it’s something to do with UDP traffic | 00:29 |
fungi | interesting, i think mirror.ca-ymq-1.vexxhost.opendev.org is hitting afs01.ord.openstack.org instead of dfw | 00:35 |
fungi | maybe it's a problem reaching rax-dfw? | 00:35 |
fungi | ping works fine though | 00:35 |
mnaser | fungi: it does ping fine indeed, i'm even seeing traffic back and forth.. but just seems like the ls times out | 00:37 |
fungi | from home i'm hitting afs01.dfw.openstack.org when i browse though | 00:38 |
fungi | so if it's a problem with dfw and not ord, it's also not broken from some parts of the internet either | 00:39 |
mnaser | yeah.. i wish i had more info, maybe what's happening is some mtu thing | 00:39 |
fungi | pmtud blackhole maybe | 00:39 |
fungi | i'll see if i can figure out how to get the mirror to hit dfw | 00:41 |
mnaser | im curious what a successful tcpdump looks like | 00:42 |
* mnaser | hmm, interesting, looks like the afsd process is hung | 00:44 |
fungi | mnaser: https://paste.opendev.org/show/bDrIsFdnfpLUUqqbn7KL/ | 00:44 |
fungi | that's from my home workstation which seems to be hitting afs01.dfw | 00:45 |
mnaser | hrm | 00:46 |
mnaser | yeah definitely dont see that | 00:46 |
mnaser | https://paste.opendev.org/show/bB7O2Jye8Hug5NPFL2iB/ | 00:48 |
mnaser | this is what im seeing on start up | 00:49 |
fungi | what openafs-client version? (or are you using kafs?) | 00:50 |
mnaser | openafs-client-1.8.8.1-1.el7.x86_64 | 00:51 |
fungi | we're using 1.8.8.1-3ubuntu2~22.04.2 on our zuul executors, so should be fine | 00:51 |
fungi | any chance you see similar issues from sjc1? | 00:52 |
mnaser | does openafs use bidirectional traffic? | 00:52 |
mnaser | i have allow out all, but nothing inbound (other than ssh) | 00:52 |
mnaser | and i can test.. but this is not from the public cloud but from another cloud running in the same site (so not vexxhost cloud per se) | 00:53 |
mnaser | i wonder if this is because ovn is using stateless rules | 00:54 |
mnaser | so i need to add the openafs client ports | 00:54 |
JayF | I am pretty sure UDP needs incoming ports opened, and it was a range iirc (it's been a very long while since a ran openafs) | 00:54 |
fungi | yeah, i believe it opens return data channels | 00:55 |
mnaser | yeah and ovn uses stateless fw rules as opposed to ml2/ovs stateful ones | 00:55 |
mnaser | so i wonder since this system only has port 22 open.. | 00:55 |
JayF | well, I think it's the sorta thing you'd need special handling for, like ftp or sip do to mark return traffic as related | 00:55 |
JayF | it's not so much stateful as I think two-way RPC comms | 00:56 |
mnaser | ill try to enable 7000 to 7007 and see what happens | 00:56 |
JayF | (again my knowledge is at least a decade old so please validate for yourself) | 00:56 |
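A rough sketch of opening that UDP range with the openstack CLI, assuming the instance's security group is named "afs-client" (a placeholder); the exact range worth opening should be confirmed against the OpenAFS port documentation, but the client's callback port is 7001/udp:

```
openstack security group rule create --ingress --protocol udp \
    --dst-port 7000:7007 afs-client
```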
fungi | we have iptables set for "-m state --state ESTABLISHED,RELATED -j ACCEPT" but nothing special for inbound communications, fwiw | 00:57 |
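In other words roughly this shape of ruleset, as a minimal sketch rather than the servers' actual configuration; conntrack treats replies to outbound UDP as ESTABLISHED, so nothing AFS-specific needs to be opened inbound:

```
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -P INPUT DROP
```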
fungi | maybe ovn is less smart about identifying afs responses | 00:57 |
fungi | we have our security groups wide open in both directions though, and are relying on iptables on the server instances instead | 01:00 |
fungi | at home it's working fine through an openbsd nat, for that matter, with no inbound mappings at all | 01:03 |
mnaser | yeah but i think ovn is full stateless when it comes to its security groups | 01:05 |
fungi | ah, okay, so doesn't attempt to do udp state tracking guesswork? | 01:07 |
mnaser | it might not be, well, i just tried to enable all incoming tcp and udp and still no bueno | 01:08 |
mnaser | `$ ls /afs/openstack.org` just hangs | 01:08 |
mnaser | im deffo seeing 'action' in tcpdump happening.. with pings and pongs, | 01:09 |
fungi | our mirrors in vexxhost ca-ymq-1 and sjc1 are hitting afs01.ord instead of dfw, so it may still be something going missing on some routes and not others | 01:09 |
mnaser | the odd thing is im actually seeing traffic go and come back | 01:09 |
mnaser | https://paste.opendev.org/show/bd223OuxBuM5GS0zup5n/ things are moving here | 01:11 |
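For reference, a capture filter along these lines is a plausible way to isolate the AFS/Rx exchange being discussed (interface name is a placeholder; 7000/udp is the fileserver port, 7001/udp the client callback port):

```
tcpdump -nni eth0 'udp and (port 7000 or port 7001)'
```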
frickler | mnaser: so iiuc the issue is when a vm is in a tenant network behind OVN? as opposed to our mirror which is in a provider network, so no OVN SNAT involved? | 07:44 |
fungi | alternatively it could be some packets being modified or going missing between there and rackspace's dfw network but not their ord network | 11:36 |
fungi | i didn't try forcing mirror.ca-ymq-1.vexxhost.opendev.org to prefer afs01.dfw.openstack.org instead of afs01.ord.openstack.org | 11:37 |
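If forcing the preference becomes necessary, something along these lines with the OpenAFS fs tool should work, though the exact setserverprefs syntax is worth double-checking (lower rank means more preferred):

```
fs getserverprefs                                      # show current fileserver ranks
fs setserverprefs -servers afs01.dfw.openstack.org 1   # strongly prefer the dfw fileserver
```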
mnaser | frickler: so it seems with ovn + distributed floating ips.. if the MTU of the tenant network is smaller than the provider network's, the instance ends up receiving packets that are too big for its interface and they get dropped | 13:51 |
frickler | mnaser: ah, yes, that would be a plausible explanation. one of the reasons I always make sure tenant networks as well as provider networks have MTU 1500 | 13:55 |
fungi | mnaser: can't it do proper pmtud or allow fragments worst case? | 13:55 |
JayF | mnaser: frickler: That's literally the thing we just figured out was breaking Julia's attempt to get Ironic+OVN working :) | 13:56 |
frickler | JayF: that was the issue with tftp stalling, right? | 13:56 |
JayF | well, more than just tftp, we switched to http and it was still stalling | 13:57 |
fungi | maybe i'm just too used to bsd based routers, but mtu blackholes are, like, a flashback to the 1990s for me | 13:57 |
JayF | yeah when I saw the shape of Julia's failure MTU is the first thing I said | 13:58 |
fungi | it's hard to believe a modern routing platform wouldn't just quietly solve them (either by telling the peers to negotiate packet sizes down or fragmentation and reassembly on the fly) | 13:59 |
frickler | I wouldn't be surprised if OVN would silently drop too big packets instead of generating proper ICMP responses, but also I'm biased about that piece of software | 14:00 |
fungi | the other possibility is that something else is zealously discarding the icmp error replies | 14:01 |
fungi | i used to see that back in the bad old days where people thought "block all icmp" was something expected for firewalling | 14:01 |
frickler | guess I'll need to do some devstack test setup. I need to do a big update on OVN gaps anyhow | 14:05 |
frickler | or maybe I can create a test network in our vexxhost tenant? | 14:05 |
fungi | as long as you don't attach any of our production servers to it, that's probably safe | 14:06 |
mnaser | frickler: the vexxhost public cloud is not running ovn yet. | 14:06 |
fungi | and also without access to the underlying infrastructure it might be hard to troubleshoot even if it were | 14:07 |
mnaser | for context, tenant network is 1450, provider network is 1500, distributed floating ip set to on .. boom | 14:08 |
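A minimal way to demonstrate that blackhole from outside, assuming a Linux ping and a placeholder floating IP (1472 bytes of payload plus 28 bytes of IP/ICMP headers gives a 1500-byte packet on the wire):

```
ping -M do -s 1472 203.0.113.10   # 1500 on the wire: stalls if oversized packets are silently dropped
ping -M do -s 1422 203.0.113.10   # 1450 on the wire: should get replies
```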
frickler | mnaser: so without distributed fip it is working? | 14:09 |
frickler | please create a bug report for neutron if possible. also just for completeness, which ovn version? | 14:11 |
frickler | thinking about it, JayF: do you have a bug report for your ironic issue? | 14:15 |
JayF | I don't know if we ever considered it a bug; I think we consider it a devstack configuration issue | 14:17 |
JayF | while building ironic support for ovn | 14:17 |
mnaser | frickler: i did not try to have it disabled, but i think it might be more complicated since i end up with many messages like this in dmesg: `tapdf9341b0-6d: dropped over-mtu packet: 1472 > 1450` | 14:24 |
frickler | mnaser: oh, so that looks like OVN does forward the packet, but the kernel then drops it | 14:27 |
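On the hypervisor the mismatch should be visible by comparing the tap device's MTU against the provider network; a sketch, reusing the tap name from the dmesg line above and assuming iproute2:

```
ip -o link show tapdf9341b0-6d | grep -o 'mtu [0-9]*'   # expect "mtu 1450"
dmesg | grep 'dropped over-mtu'                          # watch the drops accumulate
```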
frickler | ralonsoh: lajoskatona: ^^ is that something you've maybe seen before? | 14:27 |
clarkb | one issue with neutron and mtus historically is that there are multiple l2 (not l3) devices in a row which prevents them from generating proper icmp responses to the source to shorten their packets | 14:27 |
clarkb | because icmp operates at l3 and requires an ip address iirc | 14:28 |
ralonsoh | let me read the conversation | 14:28 |
fungi | oh, yes connecting l2 broadcast domains without an actual routing node is a recipe for packet loss and black holes | 14:28 |
clarkb | this problem is what drove me to push neutron to manage mtus far more aggressively along the entire pathway for VM connections because neutron is the only thing that knows the complete info | 14:28 |
clarkb | essentially neutron should (and did/does) set the mtu to the lowest value on the pathway across all nodes | 14:28 |
clarkb | to address this sort of problem. But for a long time neutron did not do this and things broke a lot particularly when we first started doing multinode testing | 14:29 |
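For reference, the MTU neutron computed (and advertises to instances via DHCP) can be checked per network; "private" is a placeholder name, and the inputs to the calculation are global_physnet_mtu (neutron.conf) and path_mtu (ml2_conf.ini), minus any tunnel encapsulation overhead:

```
openstack network show private -c mtu
```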
fungi | the former network engineer in me shudders at that thought | 14:29 |
frickler | ralonsoh: tl;dr: UDP from the outside towards a FIP with size 1500 gets dropped when the inside tenant network has smaller MTU | 14:29 |
clarkb | my initial impression is that neutron should/must set the mtu on the fip to the lowest size of the path behind it | 14:29 |
ralonsoh | frickler, without packet fragmentation? | 14:30 |
clarkb | because the path behind it is almost always >3 interfaces only the last of which actually has an ip address and can send an icmp | 14:30 |
frickler | ralonsoh: it seems so. I'll set up a test in devstack for myself now | 14:30 |
ralonsoh | frickler, I would need to check with core OVN folks if that is possible. If not, the inner tenant network should increase the MTU | 14:31 |
ralonsoh | is that a problem to increase the MTU? | 14:31 |
fungi | does whatever neutron's using to interconnect the bridges actually have the ability to do on-the-fly transparent fragmentation and reassembly? | 14:31 |
clarkb | you can't increase the inner tenant mtu if it is running on a 1500 mtu host network | 14:31 |
fungi | normally i'd expect to connect the bridges by a router which can supply 'fragmentation needed' responses to the sender at least | 14:32 |
fungi | or df and do pmtud | 14:32 |
ralonsoh | sorry what bridges? the issue is between networks, not bridges | 14:32 |
mnaser | ralonsoh: in my case i can increase it with no issues, but this seems like an undocumented ml2/ovs vs ml2/ovn issue | 14:32 |
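In that case the deployer-side fix can be as simple as the following, assuming the underlay can actually carry the larger encapsulated frames (network name is a placeholder):

```
openstack network set --mtu 1500 my-tenant-net
```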
mnaser | fungi: i think you're thinking in ml2/ovs terms | 14:33 |
mnaser | tap/qbr/qvo/etc .. not a thing in an ovn deployment | 14:33 |
ralonsoh | mnaser, for sure. Please open a LP bug because I need to redirect that to some core OVN folks | 14:33 |
mnaser | i am writing one up right now :) | 14:33 |
ralonsoh | and please specify the network types used | 14:33 |
fungi | clarkb was saying earlier that in some architectures neutron is directly connecting layer 2 broadcast domains with potentially differing mtus | 14:33 |
clarkb | mnaser: fwiw the example you gave is for a tap so thats still a thing here | 14:33 |
ralonsoh | mnaser, are you using VLAN? | 14:33 |
mnaser | yea, its a tap directly attached to br-int | 14:33 |
mnaser | yes vlan provider network ralonsoh | 14:34 |
ralonsoh | yes, ok, that could be a problem with FIPs and DVR | 14:34 |
clarkb | to be clear I don't know what neutron is doing with ovn. I do know this exact sort of problem was a major issue in neutron when we set up multinode testing and you couldn't rely on fragmentation due to the lack of ability for devices to send icmp responses | 14:34 |
ralonsoh | actually we are going to limit that for now | 14:34 |
ralonsoh | reason: FIP with port forwarding implies sending traffic through the SNAT node (central node) | 14:35 |
ralonsoh | and that implies traffic for any other FIP in this network should also be sent to the central node | 14:35 |
ralonsoh | but please, open the LP bug documenting the issue. I'll mark it as priority=high | 14:36 |
mnaser | ralonsoh: https://bugs.launchpad.net/neutron/+bug/2032817 | 14:37 |
ralonsoh | thanks | 14:37 |
clarkb | and the tap device doesn't have an associated l3 address to generate an icmp response from I guess? | 14:39 |
clarkb | fwiw it wouldn't surprise me if OVN says this isn't a bug because the remote side is supposed to complain not the source right? | 14:39 |
dpawlik | fungi: hey, tl;dr it was a network issue. AFS is connecting to openstack well :) | 15:52 |
dpawlik | thanks for yesterday's help | 15:52 |
fungi | dpawlik: good to know, thanks for following up! | 16:58 |
frickler | dpawlik: just being curious, was that the same thing that mnaser was talking about earlier or a different one? | 17:21 |
frickler | mnaser: ralonsoh: just fyi found this by accident: "MTU handling (fragmentation on output)" https://github.com/ovn-org/ovn/blob/main/TODO.rst | 18:45 |
mnaser | frickler: if that’s the case.. ouch :( | 18:48 |
fungi | just don't stub your toe on all the "under construction" signs | 18:50 |
frickler | at least I could trivially reproduce this by simply doing a large ping to an instance in devstack. so also not related to vlan, same thing with geneve | 19:08 |
frickler | fungi: or "mind the gap(s)", cf. also https://review.opendev.org/c/openstack/neutron/+/892578 | 19:18 |
JayF | frickler: I passed that on to Julia, as well | 19:27 |
JayF | [screams in MTU but nobody can hear it because the scream is too large] | 19:28 |
JayF | she literally has been trying to get OVN working with Ironic jobs off and on for a month+ and that's at least one giant reason why not :( | 19:28 |
clarkb | why would OVN spoof your dns servers? that seems like a massive flaw | 21:15 |
clarkb | we (opendev) intentionally configure DNS servers external to our clouds because the cloud-run dns servers are often deficient | 21:15 |
fungi | s/deficient/utterly broken/ | 21:18 |
frickler | clarkb: they do that because they replaced the dnsmasq setup that Neutron used earlier with packet mangling rules. those rules don't have any IP address they could use as source other than the one the original request was sent to | 21:49 |
clarkb | frickler: but neutron dnsmasq is/was only for dhcp right? it isn't the actual dns server? | 21:50 |
clarkb | I guess I don't understand the nuance there | 21:50 |
frickler | clarkb: it does both, depending on the deployment scenario. https://docs.openstack.org/neutron/latest/admin/config-dns-res.html#case-2-dhcp-agents-forward-dns-queries-from-instances | 21:51 |
frickler | actually that doc is incomplete, it doesn't mention that queries for local host entries are not forwarded | 21:52 |
frickler | so if you have vm1 and vm2 in a subnet, and vm1 asks "A? vm2", dnsmasq will answer that query from a hosts file generated by neutron | 21:53 |
frickler | the difference is that in the OVS scenario, the vm will explicitly have configured the dhcp server IP as resolver address to query | 21:55 |
frickler | with OVN, you ask 1.1.1.1 or similar, and OVN will intercept that packet and generate the answer "vm2 A 10.1.2.3" | 21:56 |
frickler | and yes, in some jurisdictions that could be seen as data fraud | 21:56 |
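The records OVN answers from live in a DNS table in its northbound database, so on a deployment with access to the controller, something like this shows what would get spoofed (a sketch assuming ovn-nbctl is available there):

```
ovn-nbctl list DNS    # each row maps hostnames to addresses for a logical switch
```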
clarkb | huh I never knew that neutron would resolve that way. I thought you had to do designate integration and query the actual zone | 22:11 |
frickler | I think this is somehow mimicking what nova-network did, though I never worked with the latter | 22:18 |
clarkb | and I guess dns over tls or dns over https would completely break that too. Seems that is the direction we're headed too | 22:25 |
frickler | yes. dnssec in particular won't work with that. so maybe in due time this will all be dropped and only designate integration remains. if designate ever learns to deal with dnssec, that is | 22:29 |