Wednesday, 2016-01-20

*** yamamoto_ has joined #openstack-neutron-ovn00:00
*** numans has joined #openstack-neutron-ovn00:02
*** roeyc has joined #openstack-neutron-ovn00:13
*** roeyc has quit IRC00:35
*** yamamoto_ has quit IRC00:35
*** roeyc has joined #openstack-neutron-ovn00:37
*** dlundquist has quit IRC00:40
*** dlundquist has joined #openstack-neutron-ovn00:40
*** chandrav has quit IRC00:47
openstackgerritDustin Lundquist proposed openstack/networking-ovn: Devstack: cleanup datapath  https://review.openstack.org/26993800:55
*** roeyc has quit IRC00:57
*** dlundquist has quit IRC01:03
*** roeyc has joined #openstack-neutron-ovn01:03
*** numans has quit IRC01:08
*** salv-orlando has quit IRC01:12
*** yamamoto_ has joined #openstack-neutron-ovn01:24
*** dlundquist has joined #openstack-neutron-ovn01:32
*** dlundquist has quit IRC01:32
*** sidk has joined #openstack-neutron-ovn01:41
*** armax has quit IRC01:51
*** armax has joined #openstack-neutron-ovn01:55
*** sidk has quit IRC01:59
*** arosen has quit IRC02:04
*** roeyc has quit IRC02:07
*** armax has quit IRC02:12
*** fzdarsky|afk has quit IRC02:20
*** armax has joined #openstack-neutron-ovn02:22
*** fzdarsky has joined #openstack-neutron-ovn02:26
*** gongysh has joined #openstack-neutron-ovn02:47
*** gongysh has quit IRC03:16
*** armax has quit IRC05:09
*** armax has joined #openstack-neutron-ovn05:17
*** chandrav has joined #openstack-neutron-ovn05:19
openstackgerritDongcan Ye proposed openstack/networking-ovn: Fix in create_router and update_router  https://review.openstack.org/26872205:39
*** salv-orlando has joined #openstack-neutron-ovn05:49
*** gongysh has joined #openstack-neutron-ovn05:54
*** otherwiseguy has quit IRC05:58
*** otherwiseguy has joined #openstack-neutron-ovn05:58
*** gongysh has quit IRC06:13
openstackgerritBabu Shanmugam proposed openstack/networking-ovn: Enabling qos support through Logical_Port.options  https://review.openstack.org/26579806:18
*** numans has joined #openstack-neutron-ovn06:45
*** chandrav has quit IRC07:16
*** salv-orlando has quit IRC07:49
*** fzdarsky has quit IRC08:02
*** gongysh has joined #openstack-neutron-ovn08:13
*** gongysh has quit IRC08:21
*** gongysh has joined #openstack-neutron-ovn08:31
*** gongysh has quit IRC08:36
*** fzdarsky has joined #openstack-neutron-ovn10:25
*** yamamoto_ has quit IRC10:49
*** yamamoto has joined #openstack-neutron-ovn10:50
*** gongysh has joined #openstack-neutron-ovn11:04
*** yamamoto has quit IRC11:04
*** openstackgerrit has quit IRC11:43
-openstackstatus- NOTICE: review.openstack.org is being restarted to apply patches11:43
*** ChanServ changes topic to "review.openstack.org is being restarted to apply patches"11:43
*** openstackgerrit has joined #openstack-neutron-ovn11:44
*** gongysh has quit IRC11:46
*** ChanServ changes topic to "http://docs.openstack.org/developer/networking-ovn/ -=- OVN meeting Thursdays 10:15am Pacific / 1:15pm Eastern #openvswitch -=- Tempest health: http://goo.gl/9MaBJC"11:54
-openstackstatus- NOTICE: Restart done, review.openstack.org is available11:54
*** gongysh has joined #openstack-neutron-ovn11:55
*** gongysh has quit IRC12:07
*** yamamoto has joined #openstack-neutron-ovn12:28
*** yamamoto has quit IRC12:29
*** rtheis has joined #openstack-neutron-ovn12:29
*** yamamoto_ has joined #openstack-neutron-ovn12:29
*** palexster has quit IRC12:30
*** yamamoto has joined #openstack-neutron-ovn12:31
*** flaviof has quit IRC12:32
*** yamamoto_ has quit IRC12:34
*** palexster has joined #openstack-neutron-ovn12:43
*** SongmingYan has joined #openstack-neutron-ovn13:29
*** dslev has joined #openstack-neutron-ovn13:34
*** dslev has quit IRC13:43
*** toor has joined #openstack-neutron-ovn13:47
*** leifmadsen_ has joined #openstack-neutron-ovn13:55
*** Sam-I-Am has quit IRC13:56
*** leifmadsen has quit IRC13:56
*** toor_ has quit IRC13:56
*** leifmadsen_ is now known as leifmadsen13:56
*** yamamoto has quit IRC13:59
*** yamamoto has joined #openstack-neutron-ovn14:03
*** Sam-I-Am has joined #openstack-neutron-ovn14:05
*** flaviof has joined #openstack-neutron-ovn14:11
*** dslevin has joined #openstack-neutron-ovn14:15
*** dslevin has quit IRC14:16
*** dslevin has joined #openstack-neutron-ovn14:16
*** yamamoto has quit IRC14:33
*** yamamoto has joined #openstack-neutron-ovn14:39
*** SongmingYan has quit IRC15:31
*** numans has quit IRC15:38
*** flaviof has quit IRC15:41
*** yamamoto has quit IRC15:42
*** yamamoto has joined #openstack-neutron-ovn15:42
*** armax has left #openstack-neutron-ovn15:55
*** salv-orlando has joined #openstack-neutron-ovn16:30
*** chandrav has joined #openstack-neutron-ovn16:31
*** chandrav has quit IRC16:31
*** regXboi has joined #openstack-neutron-ovn16:33
*** yamamoto has quit IRC16:33
regXboirussellb: ping16:38
regXboirussellb: got a question for you16:38
russellbk16:38
regXboiI'm chasing some scaling issues with adding ports (we've talked about this in the past)16:39
regXboiand I'm trying to segment the trouble to pieces of code16:39
regXboiso I can submit patches :)16:39
Sam-I-Amrussellb: mornings16:40
regXboiand I'm trying to figure out where the do_commit method of the IDL transaction gets tripped from the networking-ovn code, because I'm not seeing the path right now16:40
russellbit's magic16:47
regXboirussellb: thanks - I think :)16:48
russellbi'm getting you a better answer ..16:48
russellbin networking_ovn/plugin.py you either have a call to execute() directly16:49
russellbor in some cases, a transaction to build up a list of commands16:49
russellband then that list gets executed all at once when the transaction context manager exits16:49
russellbthe commit is called down in base code we use from neutron16:50
russellbi'd suggest following a simpler one, like create_router16:50
russellbself._ovn.create_lrouter(...).execute(...)16:50
russellband dig through what all that does16:51
russellbit'll take you back into neutron code eventually16:51
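The pattern russellb describes above (a direct execute() call versus a transaction that queues commands and runs them when the context manager exits) can be sketched roughly as follows. This is a minimal illustration, not the networking-ovn or neutron code: the Transaction and Command classes, the shared log list, and the command names are all hypothetical stand-ins.

```python
class Transaction:
    """Hypothetical transaction: queues commands, runs them all on exit."""

    def __init__(self, log):
        self.log = log          # shared list standing in for the database
        self.commands = []

    def add(self, command):
        self.commands.append(command)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Commit only if the body of the "with" block raised no exception.
        if exc_type is None:
            self.commit()
        return False  # never swallow exceptions

    def commit(self):
        # All queued commands are applied together, then the commit is recorded.
        for command in self.commands:
            self.log.append(command.name)
        self.log.append("commit")


class Command:
    """Hypothetical command object returned by an API call."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def execute(self):
        # Direct path: a one-command transaction that commits immediately.
        with Transaction(self.log) as txn:
            txn.add(self)


log = []

# Simple case, like the create_router path: one command, executed directly.
Command("create_lrouter", log).execute()

# Multi-command case: nothing is committed until the block exits.
with Transaction(log) as txn:
    txn.add(Command("create_lport", log))
    txn.add(Command("add_acl", log))
```

In this sketch the commit is never called by the plugin-level code itself, which may be why the call path is hard to spot when reading the caller.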
regXboirussellb: unfortunately, the scaling problem I need to chase is in _create_port_in_ovn :/16:51
russellbok, that one is just more complicated because it uses transactions and multiple commands16:52
russellbto answer your first question about how the commit is done and how code hooks together, a simpler one would be easier to trace first i think16:52
regXboirussellb: yeah I know - I never pick the simple things :)16:52
regXboianyway, no worries, I think I've got enough from that to continue to make progress16:53
russellbok16:53
* russellb gets lunch16:55
mesteryrussellb: Some new info on https://bugs.launchpad.net/networking-ovn/+bug/1536003, take a peek when back16:56
openstackLaunchpad bug 1536003 in networking-ovn "high cpu usage on ovn-controller process" [High,Confirmed]16:56
mesteryAdditional debugging by folks16:56
russellbgood info16:57
mesteryyup16:57
russellblflow_run comment not surprising16:58
mesteryYeah16:58
*** armax has joined #openstack-neutron-ovn17:01
*** shettyg has joined #openstack-neutron-ovn17:23
*** chandrav has joined #openstack-neutron-ovn18:07
*** flaviof has joined #openstack-neutron-ovn18:12
*** gizmoguy_ has joined #openstack-neutron-ovn18:17
*** mamulsow has joined #openstack-neutron-ovn18:17
mamulsowHi Russell18:17
mesterymamulsow: Howd :)18:18
mesteryhowdy18:18
russellbmamulsow: hello!18:18
mesteryrussellb: Meet mamulsow18:18
russellbthanks for your work so far18:18
russellbi'm playing with code, will ping if i have something for you to test18:18
mamulsowokay, sounds good18:18
russellbmamulsow: was curious how willing you are to throw experimental patches on a node to see what happens :)18:18
mamulsowdefinitely open to that, this is just a test environment at this point18:19
russellbok18:19
*** stac- has joined #openstack-neutron-ovn18:19
*** ajo_ has joined #openstack-neutron-ovn18:21
*** ajo has quit IRC18:22
*** stac has quit IRC18:22
*** gizmoguy has quit IRC18:22
*** arosen has joined #openstack-neutron-ovn18:22
russellbmamulsow: https://github.com/russellb/ovs/commits/ovn-controller-perf18:41
russellbi have 1 test commit, but i put it in a branch with some other localnet port fixes i think you're using already18:42
russellbthis is the ovn-controller perf related change ... https://github.com/russellb/ovs/commit/0434cba6b0b5c925f7effd6c629e663f9acda2fd18:42
mamulsowokay, building..18:42
russellbit did what i expected in a local trivial test, but i haven't done any load testing of it yet18:42
russellbwe'll see!18:42
russellbthe only changes are to ovn-controller18:46
russellbto be clear..18:46
mamulsowis it reasonable to test on just one of the nodes or do I need to push this out to every node running ovn-controller?18:46
russellbjust one should be fine18:46
mamulsowk18:46
russellband see if that has a noticeable CPU impact for you18:46
russellbit should cut way down on what lflow_run does though18:46
russellb... this may be too aggressive of an optimization actually, but still interesting to know the CPU impact18:49
mamulsowit definitely helped, all of the other ovn-controllers are 95%+ CPU, this one is hovering around 40-50%18:53
mamulsow36.518:54
mamulsow27%18:54
mamulsow38%18:54
russellbnot bad..18:56
russellbi wouldn't deploy it elsewhere, i'm not sure it's totally legit yet18:57
mamulsowsure18:57
mamulsowif there's anything you want me to test on it while I have it running let me know18:57
russellbnope, sounds like it did what i hoped18:57
*** flaviof has quit IRC18:58
*** flaviof has joined #openstack-neutron-ovn18:58
russellbmamulsow: so it's hovering at 95+% even when not creating/destroying anything?18:59
*** ajo_ has quit IRC18:59
*** ajo_ has joined #openstack-neutron-ovn18:59
mamulsowwell, hold on, I happened to be creating networks at that time19:00
mamulsowstopping that19:00
russellbok, when changes happen, it's going to recalculate a new desired state, but then it should go idle again if no changes are happening19:01
mamulsowokay, now that nothing is being created/deleted the others are hovering in the 40-50% range and the updated one is hovering around 9%19:02
russellbok19:02
mamulsowI had cleaned up some stuff earlier and was trying to get back to a state where we're closer to 100% on the others at idle19:02
mamulsowbut yeah, with nothing happening now, the other ovn-controller nodes are around 40-70% and this one is pretty consistently in the 9-10% range19:04
russellbon one of the 40-70% nodes, can you turn on debug logging again, just for a minute or so19:07
russellbit *might* be very verbose, but might not be ....19:07
russellbovs-appctl -t ovn-controller vlog/set dbg19:07
russellb... wait a while ...19:07
russellbovs-appctl -t ovn-controller vlog/set info19:07
russellb(or whatever level you want to set it back to)19:07
russellbi have ... ANOTHER IDEA19:07
mamulsow!!19:08
openstackmamulsow: Error: "!" is not a valid command.19:08
mamulsow:)19:08
mamulsowanything in particular I should be looking for in the debug log?19:11
russellbnot exactly, was hoping you could share19:17
* regXboi makes note of how to change controller logs :)19:18
russellbregXboi: works for all ovs/ovn daemons, i think19:18
regXboicool - I was just starting to look up how to do that19:19
russellbdetails in ovs-appctl man page19:19
regXboifor ovn-controller, for one of the next sets of tests I'm planning19:19
regXboiack19:19
mamulsowhttp://pastebin.com/thRmPPWZ19:21
russellbok19:22
russellbthank you19:22
russellbi think what's happening is that idle isn't so idle because ovn-controller is re-calculating the full state every time it gets woken up for any reason19:23
russellband that includes these keepalive exchanges, both via openflow to the local switch, and via ovsdb19:23
russellbso we should be able to tell ovn-controller to calm down and not recalculate the world if nothing actually changed19:23
russellbcurrent idea/theory anyway19:24
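The idea in the lines above (skip the full recalculation when a wakeup, such as a keepalive, did not actually change anything) can be sketched with a change counter. This is only an illustration of the theory, not russellb's patch and not ovn-controller code, which is written in C; FakeIdl, Controller, and their attributes are hypothetical names.

```python
class FakeIdl:
    """Hypothetical database connection that bumps a counter on real changes."""

    def __init__(self):
        self.seqno = 0

    def apply_change(self):
        self.seqno += 1


class Controller:
    """Hypothetical controller that recomputes only when the counter moved."""

    def __init__(self, idl):
        self.idl = idl
        self.last_seqno = None  # None so the very first wakeup always recomputes
        self.recomputes = 0

    def wakeup(self):
        """Called on every wakeup, including keepalive traffic."""
        current = self.idl.seqno
        if current == self.last_seqno:
            return False        # nothing changed: stay idle
        self.last_seqno = current
        self.recomputes += 1    # stands in for the expensive full recalculation
        return True


idl = FakeIdl()
controller = Controller(idl)

results = [controller.wakeup(), controller.wakeup()]  # first run, then a keepalive
idl.apply_change()                                    # a real database change
results.append(controller.wakeup())
results.append(controller.wakeup())                   # another keepalive
```

Under these assumptions only two of the four wakeups trigger a recalculation; a real implementation would need to track every input that can change the desired state, not just one counter.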
mesteryrussellb: that makes sense to me19:24
russellbnow if i can turn that into code!19:25
mestery:)19:25
mesterymagic!19:25
russellbmamulsow: can you update the patched node with the current code in that branch?  it's not going to help as much, but i think the patch is a valid optimization now ...19:29
russellbthe last revision was too aggressive sadly19:29
Sam-I-Amwell, i just fired off a change to zuul so networking-ovn doesnt run expensive jobs for docs and stuff19:34
russellbSam-I-Am: thanks!19:34
Sam-I-Amrussellb: https://review.openstack.org/#/c/270444/19:34
Sam-I-Ammight check to see if i missed anything19:34
Sam-I-Amor was too greedy19:34
russellbSam-I-Am: that says skip dsvm jobs if all files match at least one pattern in that list?19:36
Sam-I-Amyeah19:36
russellbok thanks19:36
Sam-I-Ammore or less19:36
Sam-I-Amthat file is scary19:36
russellbmestery: need infra liaison on ^^^19:37
Sam-I-Amthe rally job was named weirdly, so i had to add something to the job filter line19:39
arosenrussellb: This is weird..  Any idea where this LOG statement is coming from? http://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/026b9af/logs/screen-q-svc.txt.gz#_2016-01-19_23_29_38_07719:47
arosen*** FAILED TO DELETE NETWORK STILL IN USE ***19:47
arosenthat's from this patch: https://review.openstack.org/#/c/269897/19:48
arosenwhich doesn't add that line of logging.19:48
mamulsowrussellb: got the new build in and seems like it's hovering around 20% now, while the other nodes are still in the 40-70% range19:51
russellbmamulsow: ok, still a reasonable improvement19:51
russellbarosen: that patch is on top of the other19:52
russellbmamulsow: are you interested in getting credit for testing the patch in the commit message?19:53
openstackgerritAaron Rosen proposed openstack/networking-ovn: tempest debugging, ignore...  https://review.openstack.org/27045319:53
russellbwe sometimes add "Tested-by: Name <email>" headers for that19:53
arosenrussellb:  ah... sorry the new gerrit ui threw me off ;)19:54
mamulsowsure19:54
russellbarosen: though the first tempest run didn't actually include my ovn patch, i forgot to actually commit it >_<19:54
russellbarosen: the recheck should have it19:54
mamulsowrussellb: so at this point do you think I should push this updated ovn-controller out to all the nodes and run some more scale tests20:00
russellbmamulsow: sure, i think that patch should be safe20:10
russellbi'm not sure how long it will take me to get the next patch done20:10
russellbmamulsow: what's your email addr?20:11
mesteryrussellb: Done on the infra liaison +120:11
mesteryThanks Sam-I-Am20:11
russellbmestery: ack thanks20:11
Sam-I-Amlets unblock those docs (i hope)20:14
Sam-I-Ami might have terribly broken something20:14
russellbnow let's unblock tempest!20:14
russellb:)20:15
russellbi might have to turn to bribes soon20:15
*** salv-orlando has quit IRC20:18
*** gongysh has joined #openstack-neutron-ovn20:18
mesteryYes to that!20:18
*** gongysh has quit IRC20:23
mamulsowrussellb: mamulsow@us.ibm.com20:26
russellbthanks20:27
russellbmamulsow: i have another patch for you to try on 1 node when you're ready20:32
russellbit's in https://github.com/russellb/ovs/commits/ovn-controller-perf20:32
russellbthe new patch is https://github.com/russellb/ovs/commit/88f15971f1d66d41b0560f1735fb837655eb80fc20:33
* russellb on a roll20:33
mamulsowcool, building now20:35
* mestery has a tear in his eye from the teamwork on display in this channel20:56
russellbhehe20:58
russellbcorporate barriers be damned, we have collaboration to do here!20:59
mesteryrussellb: You sir, are a shining example of open source done right.21:00
* mestery more tears in his eyes21:00
mestery:)21:00
russellb<321:00
mamulsow+121:00
mesteryyou too mamulsow :)21:00
mamulsowso... not to break up the good times, but it looks like this latest patch has actually made things worse21:02
* Sam-I-Am appears21:03
* regXboi wanders in 21:03
Sam-I-Amrussellb: look out.21:03
mesterylol21:03
Sam-I-Amthe cavalry has arrived21:03
mesteryhehehehehe21:03
Sam-I-Ami rode my horse in backward21:03
regXboior "more godd***mned cowmen"21:03
russellbmamulsow: well that's unfortunate.21:03
Sam-I-Amand i'm the headless horseman now21:04
regXboidepending on your point of view21:04
mesterylol21:04
regXboianyway - russellb - where are we w.r.t to the dsvm job?21:04
russellbmamulsow: can you confirm that if you drop just the latest patch, it gets better again?21:04
russellbregXboi: still very broken21:04
mamulsowso the modified one is at 100% and the others are still in the 40-70% range21:04
russellbi've mostly been looking into these new bug reports today21:04
regXboiok, let me start with my patch21:04
regXboiand work out from there21:04
russellbmamulsow: heh, well, that's .... not intentional!21:05
Sam-I-Ammestery: this looks like the six job that broke neutron too?21:07
Sam-I-Amor, six thing21:07
regXboiSam-I-Am: where are you looking?21:07
russellbyes the six thing broke networking-ovn as well21:07
Sam-I-Amhttp://logs.openstack.org/38/269938/1/check/gate-tempest-dsvm-networking-ovn/84b836c/logs/devstacklog.txt.gz21:07
mesterySam-I-Am: Yes, all dsvm jobs are broken due to that21:08
regXboiSam-I-Am: isn't there a workaround/fix for this already in train?21:08
Sam-I-Amthats what i thought21:08
mesteryBut the ovn dsvm job wasn't happy for a while before that21:08
russellbyes21:08
mestery:(21:08
russellbyeah ovn job broken before that21:08
russellbon failures we haven't debugged21:08
Sam-I-Amoh, so... this needs fixin' first21:08
russellbor successfully debugged yet21:08
russellbyaks everywhere21:09
regXboiSam-I-Am: yes21:09
regXboitake a look at https://review.openstack.org/#/c/269121/21:09
regXboifor an example21:09
flaviofrussellb wrt https://github.com/russellb/ovs/commit/88f15971f1d66d41b0560f1735fb837655eb80fc#diff-fe69598e03bed8fbd3aa3f970b12b83bR27621:10
Sam-I-AmregXboi: lookin21:10
flaviofshouldn't init with ovs_seqno = ovsdb_idl_get_seqno(ovs_idl_loop.idl) ?21:10
regXboiSam-I-Am: it looks like something isn't getting torn down21:10
russellbflaviof: apparently that made things much worse :)21:10
regXboiI wonder21:10
Sam-I-Amyeah21:11
russellbflaviof: well, no, because i want to make sure it runs the first time21:11
Sam-I-Amdid a patch cause this?21:11
russellbor that was my thinking21:11
flaviofrussellb ic.21:11
regXboiSam-I-Am: that's my thought21:11
russellbflaviof: actually i think you're right, but i don't think that's why the patch sucks21:12
Sam-I-Amtrying to find the pattern of fail here21:12
flaviofack; no biggie; just trying to catch up... you guys move fast and i'm still working off the odl chains ;)21:13
russellbheh21:13
russellbi tweaked the commit with your suggestion, thanks21:13
mesteryflaviof: We'll get you integrated soon enough :)21:14
russellbno shortage of work to do right now21:14
flaviofmestery +121:15
*** rtheis has quit IRC21:15
Sam-I-AmregXboi: things seem to break consistently after this - https://review.openstack.org/#/c/178826/21:16
Sam-I-Amits an low-numbered patch too21:16
Sam-I-Aminterestingly, it passed the gate21:17
Sam-I-Amthe contents of the patch sort of relate to the problem21:17
regXboiwell... what if we try reverting it?21:17
russellbwe have a revert posted already21:17
russellbwhich fails as consistently as everything else21:17
Sam-I-Amwith the same error?21:17
russellbafaict so far21:17
russellbyes21:17
Sam-I-Amcrap. has anyone replicated this locally?21:17
Sam-I-Amessentially run whatever tempest is doing by hanf21:18
mesteryrussellb: Just to confirm (there was a lot going on here), the code mamulsow is now running also has your providernet fix as well?21:18
Sam-I-Amhand21:18
*** salv-orlando has joined #openstack-neutron-ovn21:18
*** salv-orlando has quit IRC21:19
russellbmestery: yesa21:19
mesterymamulsow: I think we can safely look to change the deployment topology to create provider networks as well, lets sync on that tomorrow before the standup21:19
*** salv-orlando has joined #openstack-neutron-ovn21:19
mesteryrussellb: Thanks sir!21:19
russellbmestery: yeah, switching to provider net testing sounds like a good move if that's the target topology anyway21:20
* mestery nods in agreement21:20
Sam-I-Amrussellb: i wanted to do that in the gate21:20
mesteryThough I'd still like to keep going with private network scale testing21:20
russellbmakes sense21:20
mesteryBetter to get an idea of things there and work to solve in parallel21:20
regXboiso ... I've got a failure on teardown of a test class21:20
regXboiwhich implies the class itself isn't cleaning up properly21:20
mesteryregXboi: Nice find!21:21
regXboiactually three of the four errors I see are teardown of classes21:21
regXboiwhich makes me think either the test class is wrong OR there is automatic port type being created and not cleaned up21:21
Sam-I-AmregXboi: are we talking about the same thing?21:21
regXboiSam-I-Am: possibly21:22
Sam-I-Amthe gate fizz21:22
regXboiyes, I'm looking at a patch set that failed before the latest six issue21:22
regXboiit was rechecked once, so there are four test failures to look at21:23
regXboithree are on teardown and I'm looking at the fourth now21:23
openstackgerritAaron Rosen proposed openstack/networking-ovn: tempest debugging, ignore...  https://review.openstack.org/27045321:23
mesteryregXboi: Looks like arosen is working this issue too21:23
mestery^^^21:23
russellbarosen: ah, yeah, that's cleaner than what i did, heh21:24
russellbregXboi: mestery yeah we were trying to get more debug about what ports hadn't been deleted, since we're seeing network delete failures regularly because of ports still being around21:24
arosenyea i wanna get a better look at what this hanging port looks like.21:25
russellbthat's been the most common error i've seen21:25
mesteryCool21:25
mesteryLooks like regXboi is looking at different issues21:25
arosenit's weird that it's not consistent.21:25
Sam-I-Amarosen: the port thing?21:25
regXboiarosen is looking at the same thing I'm coming at21:25
arosenthe tempest test fails saying cannot delete network because of port.21:26
Sam-I-Amoh, i just got to what regXboi found21:26
arosenlet me read up.21:26
regXboiyou know ... looking at the logging, I almost think it's something related to DHCP21:27
arosenregXboi: the tear down class in tempest or?21:27
regXboiarosen: I've got three tear down classes yes, but I have one which is in deleting after setting up multiple NICs21:28
Sam-I-Amarosen: http://logs.openstack.org/21/269121/3/check/gate-tempest-dsvm-networking-ovn/7f8b60d/logs/tempest.txt.gz#_2016-01-19_20_36_09_10221:28
arosenhttps://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_v2.py#L369 <-- it will delete the dhcp port for you so it's not that one.21:28
* arosen loading....21:29
regXboiinteresting - *THAT* is a network with an IPv6 subnet21:29
arosenyup, that's from the client side.21:30
regXboilet's see if the client port that was created gets cleaned up21:30
openstackgerritRussell Bryant proposed openstack/networking-ovn: Test patch, ignore.  https://review.openstack.org/26989721:31
russellb^^^ that's for debugging a different thing21:31
russellban error in ovn-controller i can't easily reproduce locally21:31
regXboioh crap21:31
regXboiI think I know what this was21:31
regXboinote *was*21:32
regXboirussellb: the -dsvm- job runs which tests from neutron?21:33
russellbnone21:33
russellbtempest only21:33
regXboioh21:33
regXboin/m then - I'm looking in the wrong place21:33
Sam-I-AmregXboi: what were you thinking?21:34
regXboiSam-I-Am: I was thinking the router delete optimization that I pushed in a while back and the partial revert that addresses potential loss of events at the L3 agent21:34
regXboibecause it looks like what's hanging the network delete is the router interface port21:34
Sam-I-Amthis is using the l3 agent21:35
regXboiyes... it is21:35
regXboiand the partial revert merged earlier this week21:35
Sam-I-Amhowever i would think something that breaks the l3 agent here would also impact neutron21:35
flaviofoff the wall question: is neutron l3_ha==True ? I've seen issues in 'other sdn controller (aka odl)' where it is very confused about the 'special network' as well as the tenant-id-less that it has.21:35
flaviofplease ignore if this is not applicable!21:36
regXboiwell - no - that was the thing - the neutron tests were modified to work with the optimized code21:36
russellbflaviof: i don't think we're setting it at all, so whatever that does21:36
regXboiand there was the potential for a race condition that carl_baldwin pointed out21:36
Sam-I-AmregXboi: this looks like a standard tempest test though, no?21:36
regXboiSam-I-Am: per russellb, it is, so that is why I think I'm barking up the wrong tree21:36
regXboiI will be rechecking a patch set after the six fix merges ... just to see21:37
regXboispeaking of which, do we know the patchset with the six fix?21:37
Sam-I-Amtrying to remember where that even went21:38
*** shettyg has quit IRC21:39
russellbmamulsow: I think we'll need to get to where we can generate a profile of ovn-controller so i don't just keep guessing at things.21:39
*** shettyg has joined #openstack-neutron-ovn21:39
russellbmamulsow: i haven't done profiling of a C app in a long time, but i used oprofile long ago...21:40
* regXboi channels Geisel for a moment21:40
mamulsowI was using poor-man's profiling by adding print statements so I could see which things were taking longest to run21:40
*** SpamapS has joined #openstack-neutron-ovn21:40
russellbmamulsow: :)21:40
* regXboi is looking for the six fix that let's tox rok21:40
mamulsowI can build debug and run a profiler against it if you want21:40
mamulsowI'm also happy just adding print statements to track it down21:41
russellbmamulsow: sure, that'd be great.  i don't have instructions yet though21:41
russellbbut if you know how, have at it!21:41
*** roeyc has joined #openstack-neutron-ovn21:41
arosenregXboi:  from the logs i  see the delete port call for the left over port come in after the delete network call which seems odd. I wonder if this is a possible race in tempest. Looking at how that works now.21:41
russellbarosen: weird21:42
russellbor does the delete come from nova?21:42
Sam-I-AmregXboi: https://review.openstack.org/#/c/269954/21:42
russellbthat could explain the race potential21:42
regXboiSam-I-Am: that says merged, so let's run a recheck21:43
regXboiand away we go21:44
Sam-I-Amlooks like pip revealed something we really shouldnt be doing21:44
Sam-I-Amhowever, this issue breaks things long before the port problem21:45
Sam-I-Amjust would be nice to get past this one21:45
regXboirussellb: if I recall correctly, nova is used in some tempest tests rather than neutron21:47
regXboiI've argued against that in the past21:47
russellbi mean, if port delete is the result of deleting a nova VM21:47
regXboiespecially for neutron jobs21:47
russellbso if tempest does delete_vm() then delete_network()21:47
russellbwe could have this race, right?21:47
* russellb just making guesses21:47
regXboiwell - the port I think I'm seeing is for the router interface. not a compute instance21:48
regXboiso I'm not so sure21:48
Sam-I-Ami'm looking in the l3 agent log21:48
regXboibut my memory says you are correct about nova21:48
regXboioh my21:49
regXboioh my oh my oh my21:49
*** chandrav has quit IRC21:49
regXboicould this be a sideeffect of the keystone middleware problem?21:49
mesterydum dum dum21:50
mestery:)21:50
Sam-I-AmregXboi: you mean things being really slow?21:50
regXboiif we are slowing things down, then that would exacerbate any race condition, wouldn't it?21:50
Sam-I-Ami think so21:50
Sam-I-Ammaybe we need a neutron-slow job to check for races :)21:50
regXboiI don't know how to answer that21:51
mesteryregXboi: Look for notmorgan's thread on that, he has a devstack patch21:51
mesteryOne way to verify would be to make one of our patches dependent on his21:51
mesteryTo see if that slowdown is the culprit21:51
regXboimestery: I'm re-checking 269121 now that the six fix has merged21:51
mesteryAck21:52
regXboiand if it fails, I'll look at the failures and take it from there21:52
Sam-I-Ami thought the middleware patch merged21:55
regXboiSam-I-Am: then this should pass ;)21:56
*** shettyg has quit IRC21:58
Sam-I-Aminterestingly i dont see anything obvious in the keystonemiddleware repo21:58
regXboiwell, the dsvm job is now running here: https://jenkins02.openstack.org/job/gate-tempest-dsvm-networking-ovn/88/console21:58
regXboiinterestingly, job 84 three days ago passed21:59
regXboiit was for 26871722:00
regXboiso the middleware cap failed on grenade-dsvm-multinode22:00
mamulsowrussellb: sorry, I found out that there was other testing going on in the environment at that time22:00
mamulsowso I think the cause of the patched one being pegged at 100% was from that testing not your updates22:01
russellbmamulsow: no worries22:01
russellbok, thanks22:01
russellbi haven't trashed the patch just yet :)22:01
russellblet me know if you get another chance to try and compare22:01
*** roeyc has quit IRC22:02
mesteryrussellb: Love it: "Grudgingly-Acked-by:" :)22:04
regXboimestery: reference?22:04
russellb:-D22:04
* regXboi wants to see that one22:04
russellbhttp://openvswitch.org/pipermail/dev/2016-January/064745.html22:05
mesteryrussellb: Faster than me! :)22:05
regXboirotflmao22:05
*** roeyc has joined #openstack-neutron-ovn22:08
SpamapSmamulsow: on an ovn related note.. testing 500 fake hypervisors with no neutron l2 agents results in a not-terribly-busy rabbitMQ. ;)22:09
SpamapSnow.. does anybody know how to run more than 500 docker containers on a single box without exploding the networking stack? ;-)22:09
russellbheh, yes, not using rabbitmq tends to make rabbitmq less busy22:10
russellb(•​_•)   ...   ( •_​•)>⌐■-■   ...   (⌐■_■)22:11
mesterySpamapS: I've heard this OVN thing may help with that ...22:11
SpamapSmestery: lies22:11
mesterylol22:11
russellbhttps://github.com/openvswitch/ovs/blob/master/INSTALL.Docker.md22:11
russellbnot sure what you mean by "exploding the networking stack" though :)22:12
* mestery pictures pieces of IP addresses laying all over SpamapS's office22:12
SpamapSrussellb: Oh I'm just using the default bridge networking in docker. At around 540 dockers running, all calls to socket() begin to fail22:12
SpamapSYes there's netmasks and node numbers littered throughout22:13
russellb¯\_(ツ)_/¯22:13
SpamapSEach docker running a nova-compute configured for fakevirt22:14
mesterynice22:14
SpamapSso yeah, at some point socket() just says NOPE22:15
SpamapSping, from outside the containers, for instance, does 'sendmsg: Invalid argument'22:15
*** chandrav has joined #openstack-neutron-ovn22:15
SpamapSbecause ping() just assumes socket() always works. ;)22:15
openstackgerritAaron Rosen proposed openstack/networking-ovn: random test  https://review.openstack.org/27050222:17
russellbooh random test22:17
arosenrussellb: matching up the tempest logs I noticed that your query isn't returning any ports. Which seems weird.22:19
arosenDetails: {u'detail': u'', u'message': u'Unable to complete operation on network b26280b9-6eff-4d1d-a2b1-ea2c1f59b0fd. There are one or more ports still in use on the network.', u'type': u'NetworkInUse'}22:19
arosenhttp://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/025fc59/logs/screen-q-svc.txt.gz#_2016-01-20_19_52_42_28322:19
arosensee how the log statement: " *** DUMPING DETAILS OF PORT IN USE ***" doesn't come after that trace?22:20
russellbright..22:20
russellb:-/22:20
regXboiwell well well22:20
russellbrace?22:20
russellbsounds like port was gone by the time we queried again?22:20
arosenmaybe.22:20
regXboimestery: *most* of the linearity is in ovsdb, but there is a little bit in row by value 222:20
russellbarosen: just put a loop around it, lolz22:21
arosenthus ml2 was born.22:21
russellbthat was my thought yes22:21
arosenborn again*22:21
arosen:)22:21
russellbarosen: is nova deleting the port in question?  or do you not know?22:22
aroseni need to look closer at that.22:22
russellbk, i'll keep looking too (tomorrow though, i need to leave in a few minutes)22:23
*** gangil has joined #openstack-neutron-ovn22:25
*** gangil has joined #openstack-neutron-ovn22:25
aroseni did find this in nova but not completely sure if it's related: http://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/025fc59/logs/screen-n-cpu.txt.gz#_2016-01-20_19_51_49_28622:27
aroseni don't think it is though.22:28
regXboiok, this recheck is still going to fail22:28
regXboion the multiple_nics_order test22:28
regXboibut ... the middleware patch is hung up as well22:28
regXboiso I will see about a depends on for it tomorrow morning22:29
Sam-I-AmregXboi: what was the middleware patch #?22:29
regXboi27041722:30
Sam-I-Amyeah, thats stuck22:31
openstackgerritAaron Rosen proposed openstack/networking-ovn: Add missing call to self._process_l3_delete()  https://review.openstack.org/27050922:34
Sam-I-AmregXboi: wonder if its worth rechecking22:35
regXboiSam-I-Am: go ahead - I'm walking away shortly22:35
regXboias in as soon as dinner beeps at me22:35
russellbthanks for the help everyone!  i really appreciate it!22:35
russellbi'm out for now ... need to go home / eat / see family22:35
Sam-I-Ampffffft22:35
mesteryhave a good night regXboi22:35
mesteryyou too regXboi22:35
Sam-I-Ammy family is going to do family things....22:36
arosenenjoy, catch you guys later!22:36
russellbarosen: i see the cause of ovn-controller log spam, but i don't think it would cause any test failures.  i think the log spam is the worst it's doing22:36
Sam-I-Ami'm not going anywhere!22:37
arosenk22:37
aroseni have a multi node setup locally of ovn running the latest code and it seems to be working fine.22:37
arosenwith the latest code.22:37
russellbfigures22:37
regXboimestery: that was .... odd :)22:37
arosenor at least booting vms testing acls are working fine :)22:38
mesteryarosen: We have a 125 node system doing the same and it's fine :)22:38
aroseni ran into a really annoying corner case yesterday that was very self inflicted.22:38
aroseni cloned a vm to add as a slave.22:38
arosenand that OVN_UUID that gets set in ovsdb as the system-id was the same on both systems.22:39
arosenthat caused things not to work..22:39
arosenit was funny the output of ovn-sbctl would only show one node and kept switching22:39
mesteryarosen: Heh :)22:40
russellbha, yes, ovn-controller on each node were fighting each other22:40
mesteryThere can be only one!22:40
arosenI pinged ben about the output and he gave me that hint22:40
arosenyup :)22:40
russellbi bet ovn-controller.log had "wtf?!" all in it22:40
arosenI bet others will hit this though.22:40
mesteryrofl22:40
russellbhopefully the log had a hint though22:40
arosennope it didn't have anything in the log.22:40
russellboh, well that's lame22:40
arosenor at least nothing i saw.22:41
russellbyou can file a bug if you'd like against networking-ovn22:41
russellband i'll look eventually22:41
russellbor someone can22:41
aroseni think that would be against ovn though not networking-ovn?22:41
russellbyes22:41
arosenare we filing ovn bugs there too?22:41
russellbbut ovn doesn't have a tracker ....22:41
arosenk22:41
russellbi'm using networking-ovn, or my own private trello ...22:41
arosensounds good i can file a bug there.22:41
russellbk22:42
russellbalright i'm out for real22:42
russellbttyl22:42
arosenlater!22:42
mamulsowrussellb: thanks for your help!22:42
* regXboi also heads exit stage right22:42
*** regXboi has quit IRC22:42
*** roeyc has quit IRC22:47
*** palexster has quit IRC23:03
*** palexster has joined #openstack-neutron-ovn23:03
openstackgerritAaron Rosen proposed openstack/networking-ovn: Add missing call to self._process_l3_update/delete()  https://review.openstack.org/27050923:14
*** roeyc has joined #openstack-neutron-ovn23:42
*** chandrav has quit IRC23:51
