*** salv-orlando has quit IRC | 00:24 | |
*** salv-orlando has joined #openstack-neutron-ovn | 02:28 | |
*** salv-orlando has quit IRC | 02:32 | |
*** arosen has quit IRC | 03:44 | |
*** armax has quit IRC | 03:55 | |
*** salv-orlando has joined #openstack-neutron-ovn | 05:26 | |
*** salv-orlando has quit IRC | 05:33 | |
*** subscope has quit IRC | 07:03 | |
*** salv-orlando has joined #openstack-neutron-ovn | 07:03 | |
*** salv-orlando has quit IRC | 09:56 | |
*** salv-orlando has joined #openstack-neutron-ovn | 10:47 | |
*** regXboi has joined #openstack-neutron-ovn | 12:46 | |
*** salv-orlando has quit IRC | 13:18 | |
*** shettyg has joined #openstack-neutron-ovn | 14:11 | |
*** shettyg has quit IRC | 14:50 | |
*** armax has joined #openstack-neutron-ovn | 15:07 | |
*** shettyg has joined #openstack-neutron-ovn | 15:10 | |
russellb | https://twitter.com/russellbryant/status/646318059737718784 | 15:12 |
---|---|---|
russellb | that demo with sflow is really cool | 15:15 |
mestery | russellb: +1000! | 15:52 |
russellb | love the potential it demonstrates, at least | 15:52 |
russellb | now to replicate it using all open source software :) | 15:52 |
russellb | annnnnd go | 15:52 |
mestery | :) | 15:54 |
*** arosen has joined #openstack-neutron-ovn | 16:45 | |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test remove oslo_db_api.wrap_db_retry https://review.openstack.org/226415 | 16:47 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test remove oslo_db_api.wrap_db_retry https://review.openstack.org/226416 | 16:47 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test remove oslo_db_api.wrap_db_retry https://review.openstack.org/226417 | 16:48 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test remove oslo_db_api.wrap_db_retry https://review.openstack.org/226418 | 16:48 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test: normal ci run https://review.openstack.org/226419 | 16:48 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test: normal ci run https://review.openstack.org/226420 | 16:48 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test: normal ci run https://review.openstack.org/226421 | 16:49 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test: normal ci run https://review.openstack.org/226422 | 16:49 |
russellb | arosen: ha, parallel CI runs? :) | 16:50 |
russellb | hax | 16:50 |
arosen | yup, :) | 16:50 |
arosen | i wanna get to the end of this SAVE POINT mysql bug | 16:50 |
russellb | yesssss | 16:51 |
russellb | i want to get security groups working | 16:51 |
russellb | seems to be close ... current issue seems to be an ovn issue, not plugin | 16:51 |
arosen | It seems like if I remove the retry decorator I don't see any failures and the CI passes each time. | 16:52 |
russellb | arosen: well that's frustrating | 16:52 |
arosen | At least it did https://review.openstack.org/#/c/224364/2 | 16:52 |
arosen | ^there | 16:52 |
russellb | seems odd to have to do the retry decorator down in our code anyway ... wish that could be hidden | 16:52 |
* russellb dives into plugin.py to try to add provider network support | 17:14 | |
arosen | russellb: coolio, I just gave a quick review on your security group patch. | 17:24 |
russellb | arosen: thanks!! | 17:25 |
arosen | you bet, nice job getting the security group support into ovn and the plugin :) | 17:26 |
russellb | i just did the neutron side, heh | 17:30 |
russellb | quite the team effort on this stuff through several layers | 17:30 |
arosen | yea i'm sure. I bet it's fun to dig into all layers of it. | 17:31 |
russellb | it also hurts my head | 17:31 |
russellb | but yeah :) | 17:31 |
russellb | arosen: good feedback, much appreciated | 17:34 |
arosen | sure thing, thanks. | 17:34 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test retry on request not deadlock https://review.openstack.org/226494 | 18:49 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test retry on request not deadlock https://review.openstack.org/226495 | 18:49 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test retry on request not deadlock https://review.openstack.org/226496 | 18:49 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Test retry on request not deadlock https://review.openstack.org/226497 | 18:49 |
*** jimchou has joined #openstack-neutron-ovn | 18:49 | |
*** asuvvari has joined #openstack-neutron-ovn | 19:22 | |
switchcade | russellb: hoping to look at the ACL conntrack issue a little more this afternoon | 20:15 |
russellb | switchcade: oh, hi! cool, thanks | 20:15 |
russellb | switchcade: i'm sorta stuck :/ | 20:16 |
switchcade | one alarm bell I see is that ingress pipeline ct(commit) flows aren't hit | 20:16 |
switchcade | it would be super helpful if we could get per-packet traces through the openflow pipeline | 20:16 |
russellb | is that possible? | 20:16 |
russellb | i've used the trace thing before, will that work here? | 20:17 |
switchcade | unfortunately since it's coming from the local stack, it has additional conntrack information and I'm not sure if ofproto/trace supports specifying that information or not | 20:17 |
russellb | ah, makes sense | 20:17 |
switchcade | I'm not super familiar with trace either. | 20:17 |
switchcade | I have a couple of directions to start though, like writing some system-traffic kmod unit tests that look at the behaviour in this case and just see what the behaviour is | 20:18 |
russellb | ok | 20:18 |
russellb | i also didn't see anything special in between from-lport and to-lport for a packet that stayed on the same host | 20:18 |
russellb | so not sure how that's supposed to work | 20:19 |
switchcade | for instance, I don't know exactly what OVS will report for a packet that comes from the local stack: Is it still "new", since this is the first packet for the connection? Or is it "established", because part of the linux stack processing involves committing the conntrack entry in zone=0 | 20:19 |
switchcade | two thoughts I have are: there's a flow which does ct(zone_reg=NXM_NX_REG5). Either that is not hit for the ping traffic, or the zone is not what we expect | 20:21 |
switchcade | that's in ingress, I mean | 20:21 |
russellb | don't discount the possibility that the output is misleading | 20:21 |
switchcade | cookie=0x0, duration=18.690s, table=17, n_packets=2, n_bytes=196, priority=100,ip,metadata=0x1 actions=ct(recirc,next_table=18,zone_reg=NXM_NX_REG5[]) | 20:21 |
russellb | because i'd restart ovn-controller (which had the side effect of resetting the counts in the flows) | 20:21 |
russellb | and then i'd ping again | 20:21 |
russellb | and there may have been some pre-existing state in conntract at that point | 20:22 |
russellb | conntrack... | 20:22 |
switchcade | might be good to do a "conntrack -F" in between testing | 20:22 |
switchcade | to flush all conntrack state | 20:22 |
russellb | ok, that was my next thought, "todo - look up how to clear conntrack" heh | 20:22 |
russellb | i could also probably set this up for you if that's useful | 20:23 |
switchcade | so, one thing that might help is if we could get a fresh openflow dump that only has the ping traffic through it (ideally no arp or anything else) | 20:23 |
russellb | yeah, that's what i was trying to get | 20:23 |
switchcade | if we know there should only be 2 packets: forward and reply, it might be easier to step through flow-by-flow | 20:23 |
russellb | hard to prevent other random crap | 20:23 |
switchcade | yeah, I noticed there was some IPv6 stuff in there | 20:24 |
switchcade | and I wasn't sure if ARP was too | 20:24 |
russellb | probably | 20:24 |
russellb | what's the better way to clear the counters | 20:24 |
switchcade | if we manually add flows using ovs-ofctl, will ovn-controller override them? | 20:24 |
russellb | i'm sure "restarting ovn-controller" isn't it | 20:24 |
russellb | yes | 20:24 |
russellb | ovn-controller is aggressive about ensuring the flows are what it thinks they should be | 20:24 |
switchcade | I see. | 20:25 |
switchcade | I don't think there's an explicit way to clear counters. | 20:25 |
russellb | ok | 20:25 |
russellb | i guess restarting ovn-controller is just making it reprogram all of them then | 20:25 |
switchcade | yeah. I think that adding the existing flow will reset its count | 20:26 |
switchcade | is it possible to mock out ovn-controller at all? Say we run ovn-controller, dump the flows, then kill ovn-controller, then send the traffic | 20:26 |
switchcade | we could then manually manipulate the flows to remove uninteresting traffic | 20:27 |
switchcade | maybe this is overthinking it:) | 20:27 |
russellb | it's possible | 20:27 |
russellb | just stop ovn-controller | 20:27 |
switchcade | how about stateless ACLs to drop IPv6 traffic? | 20:27 |
russellb | and it'll stay how it is | 20:27 |
russellb | i could do that, sure | 20:27 |
switchcade | unfortunately I think these suggestions are all more of a "shotgun approach" to debugging | 20:28 |
switchcade | but if we can get a clearer picture, maybe it's what we need. | 20:29 |
russellb | arp should at least be mostly separated, as all the ACL flows i have match ip4, ip6, or both | 20:30 |
russellb | but dropping all ipv6 is a good start, because i don't know what that junk is :) | 20:30 |
russellb | (i know what ipv6 is, i'm not sure what the packets were) | 20:30 |
* russellb also distracted ... in an IRC meeting | 20:36 | |
switchcade | I think the dump you posted previously was fairly clean, but I had a little trouble resolving the flow byte counts | 20:37 |
switchcade | seemed like some of the byte counts just suddenly entered a few tables in, so maybe the stats snapshot was not atomic | 20:38 |
switchcade | I think what you need to do is clear the flows, set up the flows, send traffic, wait 1-2 seconds, then dump the flows | 20:38 |
switchcade | make sure that the openflow layer statistics are up to date | 20:39 |
russellb | ok | 20:39 |
russellb | could have been, i was trying to grab it quickly | 20:39 |
switchcade | it shouldn't take more than 500ms I think to sync the statistics, but better safe | 20:39 |
*** asuvvari has quit IRC | 20:56 | |
*** asuvvari has joined #openstack-neutron-ovn | 20:56 | |
*** asuvvari has joined #openstack-neutron-ovn | 20:57 | |
switchcade | in my test environment, the initial ping packet coming from local stack has zone=0,ct_state=+trk+new | 20:59 |
switchcade | slowly extending its pipeline to make it more elaborate | 20:59 |
switchcade | haven't tried zones yet, could be a bug there | 21:00 |
switchcade | (although the latest zone code is different from what's present on justin's OVN ACL branch) | 21:00 |
russellb | switchcade: yeah, i'm using that branch .. | 21:01 |
russellb | meeting over yay | 21:01 |
*** asuvvari has quit IRC | 21:02 | |
switchcade | I'll spend a little longer on this, but my main priority is to get the next version of the patches out so we can merge it and be sure we're not encountering bugs that were already fixed:) | 21:05 |
*** asuvvari has joined #openstack-neutron-ovn | 21:06 | |
*** fzdarsky has joined #openstack-neutron-ovn | 21:20 | |
russellb | switchcade: makes sense! | 21:26 |
russellb | switchcade: i could also just shelve this and wait for your patches to be merged and justin to rebase the OVN ACL code | 21:27 |
russellb | since it sounds like that's imminent anyway | 21:27 |
russellb | i've got other stuff i can work on | 21:27 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Change oslo_db_api decorator to retry on request https://review.openstack.org/226497 | 21:42 |
*** salv-orlando has joined #openstack-neutron-ovn | 21:48 | |
switchcade | russellb: might be a direction to take. | 21:51 |
switchcade | russellb: for what it's worth, I've boiled down the OVN pipeline into the basic pieces that interact with conntrack and it seems to work on my WIP development branch (based on the v2 series) | 21:51 |
russellb | switchcade: if you think that branch could be missing important fixes, it's not worth wasting anyone's time | 21:52 |
switchcade | Flows are here: https://gist.github.com/joestringer/ffc6154c548ba0578643 | 21:52 |
switchcade | russellb: I don't know of any specific related fixes, but I'm also not sure how justin did the zone and table stuff | 21:52 |
russellb | ok | 21:52 |
switchcade | so there's some degree of possibility that it's just a bug in the ct interface with that particular branch | 21:53 |
switchcade | my test uses a network namespace as a substitute for a VM, but the main point is the connection with the local stack, I think | 21:53 |
russellb | yeah | 21:53 |
switchcade | syntax is a little different there, but it may help in future. | 21:54 |
switchcade | anyways, this case works and I see three relevant entries in conntrack: One for local port, one for zone=0 and one for the namespace port | 21:55 |
russellb | and it seems in my case, the state is missing for a zone | 21:56 |
switchcade | so the fundamentals appear to be there at least with the latest code. I'll switch to trying to push this closer to upstream and we can revisit this issue when it gets merged | 21:56 |
* russellb nods | 21:56 | |
russellb | yeah, revisiting sounds fine | 21:56 |
russellb | this branch has the awesome benefit of making it easy to test without building a new kernel :) | 21:58 |
russellb | how much of a pain would it be to rebase the kernel changes once your userspace patches merge to master? | 21:59 |
russellb | I can even have it running in OpenStack CI, which is cool. | 21:59 |
switchcade | I have a branch for it, minus IP fragment support | 22:00 |
switchcade | but I figured that it's so close to merging upstream that I won't push a fresh version with out-of-tree module until it gets merged | 22:00 |
switchcade | when it goes upstream, I plan to push a version like this again with out-of-tree, but no IP fragment support, then I'll start working on the IP frag backport | 22:01 |
switchcade | then hopefully in the next few weeks we can get that merged too | 22:01 |
switchcade | main thing is that there'll be a few OpenFlow interface changes and I wanted to get them solidified before I hand another build out. | 22:02 |
*** jerrygb has joined #openstack-neutron-ovn | 22:08 | |
*** markmcclain has quit IRC | 22:08 | |
*** markmcclain has joined #openstack-neutron-ovn | 22:16 | |
russellb | switchcade: sure, makes sense. thanks for all of your hard work. this is great stuff :) | 22:26 |
*** asuvvari has quit IRC | 22:27 | |
*** asuvvari has joined #openstack-neutron-ovn | 22:28 | |
*** asuvvari has quit IRC | 22:32 | |
russellb | switchcade: also thanks for hanging out in here :) | 22:33 |
*** jimchou has quit IRC | 22:33 | |
*** fzdarsky has quit IRC | 22:34 | |
*** fzdarsky has joined #openstack-neutron-ovn | 22:37 | |
*** asuvvari has joined #openstack-neutron-ovn | 22:41 | |
*** regXboi has quit IRC | 22:45 | |
*** asuvvari has quit IRC | 22:46 | |
*** shettyg has quit IRC | 22:48 | |
switchcade | russellb: always happy to lurk ;) | 23:01 |
*** fzdarsky has quit IRC | 23:08 | |
*** arosen has quit IRC | 23:21 | |
*** salv-orlando has quit IRC | 23:25 |
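
A note on the flow-by-flow debugging discussed around 20:21 to 20:39: one way to step through a dump is to keep only the flows whose packet counters are non-zero and list them in table order. The Python sketch below is a minimal illustration of that idea, not a tool anyone in the channel used. The function names `parse_flow` and `hit_flows` are hypothetical, the first sample line is the flow pasted at 20:21, and the second sample line is invented purely to show a flow that was never hit.

```python
import re

# First line: the flow pasted in the log at 20:21.
# Second line: a made-up, never-hit flow added only for illustration.
SAMPLE_DUMP = """\
cookie=0x0, duration=18.690s, table=17, n_packets=2, n_bytes=196, priority=100,ip,metadata=0x1 actions=ct(recirc,next_table=18,zone_reg=NXM_NX_REG5[])
cookie=0x0, duration=18.690s, table=18, n_packets=0, n_bytes=0, priority=0,metadata=0x1 actions=drop
"""

# Numeric fields of interest in the match part of a flow line.
FIELD_RE = re.compile(r"\b(table|n_packets|n_bytes)=(\d+)")
REQUIRED = ("table", "n_packets", "n_bytes")


def parse_flow(line):
    """Return a dict with table, n_packets, n_bytes and the raw line, or None."""
    # Only look before " actions=" so action arguments are never mistaken
    # for the flow's own fields.
    match_part = line.split(" actions=", 1)[0]
    fields = {key: int(value) for key, value in FIELD_RE.findall(match_part)}
    if not all(key in fields for key in REQUIRED):
        return None
    fields["raw"] = line.strip()
    return fields


def hit_flows(dump_text):
    """Return the flows with n_packets > 0, ordered by table number."""
    parsed = (parse_flow(line) for line in dump_text.splitlines())
    hit = [flow for flow in parsed if flow is not None and flow["n_packets"] > 0]
    return sorted(hit, key=lambda flow: flow["table"])


if __name__ == "__main__":
    for flow in hit_flows(SAMPLE_DUMP):
        print(flow["table"], flow["n_packets"], flow["n_bytes"], flow["raw"])
```

In practice the text would come from a fresh flow dump taken after flushing conntrack state with "conntrack -F", sending only the ping, and waiting a second or two for the statistics to sync, as suggested in the log. A table that shows fewer packets than the table before it is then a natural place to start looking.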