*** yamamoto_ has joined #openstack-neutron-ovn | 00:00 | |
*** numans has joined #openstack-neutron-ovn | 00:02 | |
*** roeyc has joined #openstack-neutron-ovn | 00:13 | |
*** roeyc has quit IRC | 00:35 | |
*** yamamoto_ has quit IRC | 00:35 | |
*** roeyc has joined #openstack-neutron-ovn | 00:37 | |
*** dlundquist has quit IRC | 00:40 | |
*** dlundquist has joined #openstack-neutron-ovn | 00:40 | |
*** chandrav has quit IRC | 00:47 | |
openstackgerrit | Dustin Lundquist proposed openstack/networking-ovn: Devstack: cleanup datapath https://review.openstack.org/269938 | 00:55 |
---|---|---|
*** roeyc has quit IRC | 00:57 | |
*** dlundquist has quit IRC | 01:03 | |
*** roeyc has joined #openstack-neutron-ovn | 01:03 | |
*** numans has quit IRC | 01:08 | |
*** salv-orlando has quit IRC | 01:12 | |
*** yamamoto_ has joined #openstack-neutron-ovn | 01:24 | |
*** dlundquist has joined #openstack-neutron-ovn | 01:32 | |
*** dlundquist has quit IRC | 01:32 | |
*** sidk has joined #openstack-neutron-ovn | 01:41 | |
*** armax has quit IRC | 01:51 | |
*** armax has joined #openstack-neutron-ovn | 01:55 | |
*** sidk has quit IRC | 01:59 | |
*** arosen has quit IRC | 02:04 | |
*** roeyc has quit IRC | 02:07 | |
*** armax has quit IRC | 02:12 | |
*** fzdarsky|afk has quit IRC | 02:20 | |
*** armax has joined #openstack-neutron-ovn | 02:22 | |
*** fzdarsky has joined #openstack-neutron-ovn | 02:26 | |
*** gongysh has joined #openstack-neutron-ovn | 02:47 | |
*** gongysh has quit IRC | 03:16 | |
*** armax has quit IRC | 05:09 | |
*** armax has joined #openstack-neutron-ovn | 05:17 | |
*** chandrav has joined #openstack-neutron-ovn | 05:19 | |
openstackgerrit | Dongcan Ye proposed openstack/networking-ovn: Fix in create_router and update_router https://review.openstack.org/268722 | 05:39 |
*** salv-orlando has joined #openstack-neutron-ovn | 05:49 | |
*** gongysh has joined #openstack-neutron-ovn | 05:54 | |
*** otherwiseguy has quit IRC | 05:58 | |
*** otherwiseguy has joined #openstack-neutron-ovn | 05:58 | |
*** gongysh has quit IRC | 06:13 | |
openstackgerrit | Babu Shanmugam proposed openstack/networking-ovn: Enabling qos support through Logical_Port.options https://review.openstack.org/265798 | 06:18 |
*** numans has joined #openstack-neutron-ovn | 06:45 | |
*** chandrav has quit IRC | 07:16 | |
*** salv-orlando has quit IRC | 07:49 | |
*** fzdarsky has quit IRC | 08:02 | |
*** gongysh has joined #openstack-neutron-ovn | 08:13 | |
*** gongysh has quit IRC | 08:21 | |
*** gongysh has joined #openstack-neutron-ovn | 08:31 | |
*** gongysh has quit IRC | 08:36 | |
*** fzdarsky has joined #openstack-neutron-ovn | 10:25 | |
*** yamamoto_ has quit IRC | 10:49 | |
*** yamamoto has joined #openstack-neutron-ovn | 10:50 | |
*** gongysh has joined #openstack-neutron-ovn | 11:04 | |
*** yamamoto has quit IRC | 11:04 | |
*** openstackgerrit has quit IRC | 11:43 | |
-openstackstatus- NOTICE: review.openstack.org is being restarted to apply patches | 11:43 | |
*** ChanServ changes topic to "review.openstack.org is being restarted to apply patches" | 11:43 | |
*** openstackgerrit has joined #openstack-neutron-ovn | 11:44 | |
*** gongysh has quit IRC | 11:46 | |
*** ChanServ changes topic to "http://docs.openstack.org/developer/networking-ovn/ -=- OVN meeting Thursdays 10:15am Pacific / 1:15pm Eastern #openvswitch -=- Tempest health: http://goo.gl/9MaBJC" | 11:54 | |
-openstackstatus- NOTICE: Restart done, review.openstack.org is available | 11:54 | |
*** gongysh has joined #openstack-neutron-ovn | 11:55 | |
*** gongysh has quit IRC | 12:07 | |
*** yamamoto has joined #openstack-neutron-ovn | 12:28 | |
*** yamamoto has quit IRC | 12:29 | |
*** rtheis has joined #openstack-neutron-ovn | 12:29 | |
*** yamamoto_ has joined #openstack-neutron-ovn | 12:29 | |
*** palexster has quit IRC | 12:30 | |
*** yamamoto has joined #openstack-neutron-ovn | 12:31 | |
*** flaviof has quit IRC | 12:32 | |
*** yamamoto_ has quit IRC | 12:34 | |
*** palexster has joined #openstack-neutron-ovn | 12:43 | |
*** SongmingYan has joined #openstack-neutron-ovn | 13:29 | |
*** dslev has joined #openstack-neutron-ovn | 13:34 | |
*** dslev has quit IRC | 13:43 | |
*** toor has joined #openstack-neutron-ovn | 13:47 | |
*** leifmadsen_ has joined #openstack-neutron-ovn | 13:55 | |
*** Sam-I-Am has quit IRC | 13:56 | |
*** leifmadsen has quit IRC | 13:56 | |
*** toor_ has quit IRC | 13:56 | |
*** leifmadsen_ is now known as leifmadsen | 13:56 | |
*** yamamoto has quit IRC | 13:59 | |
*** yamamoto has joined #openstack-neutron-ovn | 14:03 | |
*** Sam-I-Am has joined #openstack-neutron-ovn | 14:05 | |
*** flaviof has joined #openstack-neutron-ovn | 14:11 | |
*** dslevin has joined #openstack-neutron-ovn | 14:15 | |
*** dslevin has quit IRC | 14:16 | |
*** dslevin has joined #openstack-neutron-ovn | 14:16 | |
*** yamamoto has quit IRC | 14:33 | |
*** yamamoto has joined #openstack-neutron-ovn | 14:39 | |
*** SongmingYan has quit IRC | 15:31 | |
*** numans has quit IRC | 15:38 | |
*** flaviof has quit IRC | 15:41 | |
*** yamamoto has quit IRC | 15:42 | |
*** yamamoto has joined #openstack-neutron-ovn | 15:42 | |
*** armax has left #openstack-neutron-ovn | 15:55 | |
*** salv-orlando has joined #openstack-neutron-ovn | 16:30 | |
*** chandrav has joined #openstack-neutron-ovn | 16:31 | |
*** chandrav has quit IRC | 16:31 | |
*** regXboi has joined #openstack-neutron-ovn | 16:33 | |
*** yamamoto has quit IRC | 16:33 | |
regXboi | russellb: ping | 16:38 |
regXboi | russellb: got a question for you | 16:38 |
russellb | k | 16:38 |
regXboi | I'm chasing some scaling issues with adding ports (we've talked about this in the past) | 16:39 |
regXboi | and I'm trying to segment the trouble to pieces of code | 16:39 |
regXboi | so I can submit patches :) | 16:39 |
Sam-I-Am | russellb: mornings | 16:40 |
regXboi | and I'm trying to figure out where the do_commit method of the IDL transaction gets tripped from the networking-ovn code, because I'm not seeing the path right now | 16:40 |
russellb | it's magic | 16:47 |
regXboi | russellb: thanks - I think :) | 16:48 |
russellb | i'm getting you a better answer .. | 16:48 |
russellb | in networking_ovn/plugin.py you either have a call to execute() directly | 16:49 |
russellb | or in some cases, a transaction to build up a list of commands | 16:49 |
russellb | and then that list gets executed all at once when the transaction context manager exits | 16:49 |
russellb | the commit is called down in base code we use from neutron | 16:50 |
russellb | i'd suggest following a simpler one, like create_router | 16:50 |
russellb | self._ovn.create_lrouter(...).execute(...) | 16:50 |
russellb | and dig through what all that does | 16:51 |
russellb | it'll take you back into neutron code eventually | 16:51 |
regXboi | russellb: unfortunately, the scaling problem I need to chase is in _create_port_in_ovn :/ | 16:51 |
russellb | ok, that one is just more complicated because it uses transactions and multiple commands | 16:52 |
russellb | to answer your first question about how the commit is done and how code hooks together, a simpler one would be easier to trace first i think | 16:52 |
regXboi | russellb: yeah I know - I never pick the simple things :) | 16:52 |
regXboi | anyway, no worries, I think I've got enough from that to continue to make progress | 16:53 |
russellb | ok | 16:53 |
* russellb gets lunch | 16:55 | |
mestery | russellb: Some new info on https://bugs.launchpad.net/networking-ovn/+bug/1536003, take a peek when back | 16:56 |
openstack | Launchpad bug 1536003 in networking-ovn "high cpu usage on ovn-controller process" [High,Confirmed] | 16:56 |
mestery | Additional debugging by folks | 16:56 |
russellb | good info | 16:57 |
mestery | yup | 16:57 |
russellb | lflow_run comment not surprising | 16:58 |
mestery | Yeah | 16:58 |
*** armax has joined #openstack-neutron-ovn | 17:01 | |
*** shettyg has joined #openstack-neutron-ovn | 17:23 | |
*** chandrav has joined #openstack-neutron-ovn | 18:07 | |
*** flaviof has joined #openstack-neutron-ovn | 18:12 | |
*** gizmoguy_ has joined #openstack-neutron-ovn | 18:17 | |
*** mamulsow has joined #openstack-neutron-ovn | 18:17 | |
mamulsow | Hi Russell | 18:17 |
mestery | mamulsow: Howd :) | 18:18 |
mestery | howdy | 18:18 |
russellb | mamulsow: hello! | 18:18 |
mestery | russellb: Meet mamulsow | 18:18 |
russellb | thanks for your work so far | 18:18 |
russellb | i'm playing with code, will ping if i have something for you to test | 18:18 |
mamulsow | okay, sounds good | 18:18 |
russellb | mamulsow: was curious how willing you are to throw experimental patches on a node to see what happens :) | 18:18 |
mamulsow | definitely open to that, this is just a test environment at this point | 18:19 |
russellb | ok | 18:19 |
*** stac- has joined #openstack-neutron-ovn | 18:19 | |
*** ajo_ has joined #openstack-neutron-ovn | 18:21 | |
*** ajo has quit IRC | 18:22 | |
*** stac has quit IRC | 18:22 | |
*** gizmoguy has quit IRC | 18:22 | |
*** arosen has joined #openstack-neutron-ovn | 18:22 | |
russellb | mamulsow: https://github.com/russellb/ovs/commits/ovn-controller-perf | 18:41 |
russellb | i have 1 test commit, but i put it in a branch with some other localnet port fixes i think you're using already | 18:42 |
russellb | this is the ovn-controller perf related change ... https://github.com/russellb/ovs/commit/0434cba6b0b5c925f7effd6c629e663f9acda2fd | 18:42 |
mamulsow | okay, building.. | 18:42 |
russellb | it did what i expected in a local trivial test, but i haven't done any load testing of it yet | 18:42 |
russellb | we'll see! | 18:42 |
russellb | the only changes are to ovn-controller | 18:46 |
russellb | to be clear.. | 18:46 |
mamulsow | is it reasonable to test on just one of the nodes or do I need to push this out to every node running ovn-controller? | 18:46 |
russellb | just one should be fine | 18:46 |
mamulsow | k | 18:46 |
russellb | and see if that has a noticeable CPU impact for you | 18:46 |
russellb | it should cut way down on what lflow_run does though | 18:46 |
russellb | ... this may be too aggressive of an optimization actually, but still interesting to know the CPU impact | 18:49 |
mamulsow | it definitely helped, all of the other ovn-controllers are 95%+ CPU, this one is hovering around 40-50% | 18:53 |
mamulsow | 36.5 | 18:54 |
mamulsow | 27% | 18:54 |
mamulsow | 38% | 18:54 |
russellb | not bad.. | 18:56 |
russellb | i wouldn't deploy it elsewhere, i'm not sure it's totally legit yet | 18:57 |
mamulsow | sure | 18:57 |
mamulsow | if there's anything you want me to test on it while I have it running let me know | 18:57 |
russellb | nope, sounds like it did what i hoped | 18:57 |
*** flaviof has quit IRC | 18:58 | |
*** flaviof has joined #openstack-neutron-ovn | 18:58 | |
russellb | mamulsow: so it's hovering at 95+% even when not creating/destroying anything? | 18:59 |
*** ajo_ has quit IRC | 18:59 | |
*** ajo_ has joined #openstack-neutron-ovn | 18:59 | |
mamulsow | well, hold on, I happened to be creating networks at that time | 19:00 |
mamulsow | stopping that | 19:00 |
russellb | ok, when changes happen, it's going to recalculate a new desired state, but then it should go idle again if no changes are happening | 19:01 |
mamulsow | okay, now that nothing is being created/deleted the others are hovering in the 40-50% range and the updated one is hovering around 9% | 19:02 |
russellb | ok | 19:02 |
mamulsow | I had cleaned up some stuff earlier and was trying to get back to a state where we're closer to 100% on the others at idle | 19:02 |
mamulsow | but yeah, with nothing happening now, the other ovn-controller nodes are around 40-70% and this one is pretty consistently in the 9-10% range | 19:04 |
russellb | on one of the 40-70% nodes, can you turn on debug logging again, just for a minute or so | 19:07 |
russellb | it *might* be very verbose, but might not be .... | 19:07 |
russellb | ovs-appctl -t ovn-controller vlog/set dbg | 19:07 |
russellb | ... wait a while ... | 19:07 |
russellb | ovs-appctl -t ovn-controller vlog/set info | 19:07 |
russellb | (or whatever level you want to set it back to) | 19:07 |
russellb | i have ... ANOTHER IDEA | 19:07 |
mamulsow | !! | 19:08 |
openstack | mamulsow: Error: "!" is not a valid command. | 19:08 |
mamulsow | :) | 19:08 |
mamulsow | anything in particular I should be looking for in the debug log? | 19:11 |
russellb | not exactly, was hoping you could share | 19:17 |
* regXboi makes note of how to change controller logs :) | 19:18 | |
russellb | regXboi: works for all ovs/ovn daemons, i think | 19:18 |
regXboi | cool - I was just starting to look up how to do that | 19:19 |
russellb | details in ovs-appctl man page | 19:19 |
regXboi | for ovn-controller, for one of the next sets of tests I'm planning | 19:19 |
regXboi | ack | 19:19 |
mamulsow | http://pastebin.com/thRmPPWZ | 19:21 |
russellb | ok | 19:22 |
russellb | thank you | 19:22 |
russellb | i think what's happening is that idle isn't so idle because ovn-controller is re-calculating the full state every time it gets woken up for any reason | 19:23 |
russellb | and that includes these keepalive exchanges, both via openflow to the local switch, and via ovsdb | 19:23 |
russellb | so we should be able to tell ovn-controller to calm down and not recalculate the world if nothing actually changed | 19:23 |
russellb | current idea/theory anyway | 19:24 |
mestery | russellb: that makes sense to me | 19:24 |
russellb | now if i can turn that into code! | 19:25 |
mestery | :) | 19:25 |
mestery | magic! | 19:25 |
russellb | mamulsow: can you update the patched node with the current code in that branch? it's not going to help as much, but i think the patch is a valid optimization now ... | 19:29 |
russellb | the last revision was too aggressive sadly | 19:29 |
Sam-I-Am | well, i just fired off a change to zuul so networking-ovn doesnt run expensive jobs for docs and stuff | 19:34 |
russellb | Sam-I-Am: thanks! | 19:34 |
Sam-I-Am | russellb: https://review.openstack.org/#/c/270444/ | 19:34 |
Sam-I-Am | might check to see if i missed anything | 19:34 |
Sam-I-Am | or was too greedy | 19:34 |
russellb | Sam-I-Am: that says skip dsvm jobs if all files match at least one pattern in that list? | 19:36 |
Sam-I-Am | yeah | 19:36 |
russellb | ok thanks | 19:36 |
Sam-I-Am | more or less | 19:36 |
Sam-I-Am | that file is scary | 19:36 |
russellb | mestery: need infra liaison on ^^^ | 19:37 |
Sam-I-Am | the rally job was named weirdly, so i had to add something to the job filter line | 19:39 |
arosen | russellb: This is weird.. Any idea where this LOG statement is coming from? http://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/026b9af/logs/screen-q-svc.txt.gz#_2016-01-19_23_29_38_077 | 19:47 |
arosen | *** FAILED TO DELETE NETWORK STILL IN USE *** | 19:47 |
arosen | that's from this patch: https://review.openstack.org/#/c/269897/ | 19:48 |
arosen | which doesn't add that line of logging. | 19:48 |
mamulsow | russellb: got the new build in and seems like it's hovering around 20% now, while the other nodes are still in the 40-70% range | 19:51 |
russellb | mamulsow: ok, still a reasonable improvement | 19:51 |
russellb | arosen: that patch is on top of the other | 19:52 |
russellb | mamulsow: are you interested in getting credit for testing the patch in the commit message? | 19:53 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: tempest debugging, ignore... https://review.openstack.org/270453 | 19:53 |
russellb | we sometimes add "Tested-by: Name <email>" headers for that | 19:53 |
arosen | russellb: ah... sorry the new gerrit ui threw me off ;) | 19:54 |
mamulsow | sure | 19:54 |
russellb | arosen: though the first tempest run didn't actually include my ovn patch, i forgot to actually commit it >_< | 19:54 |
russellb | arosen: the recheck should have it | 19:54 |
mamulsow | russellb: so at this point do you think I should push this updated ovn-controller out to all the nodes and run some more scale tests | 20:00 |
russellb | mamulsow: sure, i think that patch should be safe | 20:10 |
russellb | i'm not sure how long it will take me to get the next patch done | 20:10 |
russellb | mamulsow: what's your email addr? | 20:11 |
mestery | russellb: Done on the infra liaison +1 | 20:11 |
mestery | Thanks Sam-I-Am | 20:11 |
russellb | mestery: ack thanks | 20:11 |
Sam-I-Am | lets unblock those docs (i hope) | 20:14 |
Sam-I-Am | i might have terribly broken something | 20:14 |
russellb | now let's unblock tempest! | 20:14 |
russellb | :) | 20:15 |
russellb | i might have to turn to bribes soon | 20:15 |
*** salv-orlando has quit IRC | 20:18 | |
*** gongysh has joined #openstack-neutron-ovn | 20:18 | |
mestery | Yes to that! | 20:18 |
*** gongysh has quit IRC | 20:23 | |
mamulsow | russellb: mamulsow@us.ibm.com | 20:26 |
russellb | thanks | 20:27 |
russellb | mamulsow: i have another patch for you to try on 1 node when you're ready | 20:32 |
russellb | it's in https://github.com/russellb/ovs/commits/ovn-controller-perf | 20:32 |
russellb | the new patch is https://github.com/russellb/ovs/commit/88f15971f1d66d41b0560f1735fb837655eb80fc | 20:33 |
* russellb on a roll | 20:33 | |
mamulsow | cool, building now | 20:35 |
* mestery has a tear in his eye from the teamwork on display in this channel | 20:56 | |
russellb | hehe | 20:58 |
russellb | corporate barriers be damned, we have collaboration to do here! | 20:59 |
mestery | russellb: You sir, are a shining example of open source done right. | 21:00 |
* mestery more tears in his eyes | 21:00 | |
mestery | :) | 21:00 |
russellb | <3 | 21:00 |
mamulsow | +1 | 21:00 |
mestery | you too mamulsow :) | 21:00 |
mamulsow | so... not to break up the good times, but it looks like this latest patch has actually made things worse | 21:02 |
* Sam-I-Am appears | 21:03 | |
* regXboi wanders in | 21:03 | |
Sam-I-Am | russellb: look out. | 21:03 |
mestery | lol | 21:03 |
Sam-I-Am | the cavalry has arrived | 21:03 |
mestery | hehehehehe | 21:03 |
Sam-I-Am | i rode my horse in backward | 21:03 |
regXboi | or "more godd***mned cowmen" | 21:03 |
russellb | mamulsow: well that's unfortunate. | 21:03 |
Sam-I-Am | and i'm the headless horseman now | 21:04 |
regXboi | depending on your point of view | 21:04 |
mestery | lol | 21:04 |
regXboi | anyway - russellb - where are we w.r.t. the dsvm job? | 21:04 |
russellb | mamulsow: can you confirm that if you drop just the latest patch, it gets better again? | 21:04 |
russellb | regXboi: still very broken | 21:04 |
mamulsow | so the modified one is at 100% and the others are still in the 40-70% range | 21:04 |
russellb | i've mostly been looking into these new bug reports today | 21:04 |
regXboi | ok, let me start with my patch | 21:04 |
regXboi | and work out from there | 21:04 |
russellb | mamulsow: heh, well, that's .... not intentional! | 21:05 |
Sam-I-Am | mestery: this looks like the six job that broke neutron too? | 21:07 |
Sam-I-Am | or, six thing | 21:07 |
regXboi | Sam-I-Am: where are you looking? | 21:07 |
russellb | yes the six thing broke networking-ovn as well | 21:07 |
Sam-I-Am | http://logs.openstack.org/38/269938/1/check/gate-tempest-dsvm-networking-ovn/84b836c/logs/devstacklog.txt.gz | 21:07 |
mestery | Sam-I-Am: Yes, all dsvm jobs are broken due to that | 21:08 |
regXboi | Sam-I-Am: isn't there a workaround/fix for this already in train? | 21:08 |
Sam-I-Am | thats what i thought | 21:08 |
mestery | But the ovn dsvm job wasn't happy for a while before that | 21:08 |
russellb | yes | 21:08 |
mestery | :( | 21:08 |
russellb | yeah ovn job broken before that | 21:08 |
russellb | on failures we haven't debugged | 21:08 |
Sam-I-Am | oh, so... this needs fixin' first | 21:08 |
russellb | or successfully debugged yet | 21:08 |
russellb | yaks everywhere | 21:09 |
regXboi | Sam-I-Am: yes | 21:09 |
regXboi | take a look at https://review.openstack.org/#/c/269121/ | 21:09 |
regXboi | for an example | 21:09 |
flaviof | russellb wrt https://github.com/russellb/ovs/commit/88f15971f1d66d41b0560f1735fb837655eb80fc#diff-fe69598e03bed8fbd3aa3f970b12b83bR276 | 21:10 |
Sam-I-Am | regXboi: lookin | 21:10 |
flaviof | shouldn't init with ovs_seqno = ovsdb_idl_get_seqno(ovs_idl_loop.idl) ? | 21:10 |
regXboi | Sam-I-Am: it looks like something isn't getting torn down | 21:10 |
russellb | flaviof: apparently that made things much worse :) | 21:10 |
regXboi | I wonder | 21:10 |
Sam-I-Am | yeah | 21:11 |
russellb | flaviof: well, no, because i want to make sure it runs the first time | 21:11 |
Sam-I-Am | did a patch cause this? | 21:11 |
russellb | or that was my thinking | 21:11 |
flaviof | russellb ic. | 21:11 |
regXboi | Sam-I-Am: that's my thought | 21:11 |
russellb | flaviof: actually i think you're right, but i don't think that's why the patch sucks | 21:12 |
Sam-I-Am | trying to find the pattern of fail here | 21:12 |
flaviof | ack; no biggie; just trying to catch up... you guys move fast and i'm still working off the odl chains ;) | 21:13 |
russellb | heh | 21:13 |
russellb | i tweaked the commit with your suggestion, thanks | 21:13 |
mestery | flaviof: We'll get you integrated soon enough :) | 21:14 |
russellb | no shortage of work to do right now | 21:14 |
flaviof | mestery +1 | 21:15 |
*** rtheis has quit IRC | 21:15 | |
Sam-I-Am | regXboi: things seem to break consistently after this - https://review.openstack.org/#/c/178826/ | 21:16 |
Sam-I-Am | it's a low-numbered patch too | 21:16 |
Sam-I-Am | interestingly, it passed the gate | 21:17 |
Sam-I-Am | the contents of the patch sort of relate to the problem | 21:17 |
regXboi | well... what if we try reverting it? | 21:17 |
russellb | we have a revert posted already | 21:17 |
russellb | which fails as consistently as everything else | 21:17 |
Sam-I-Am | with the same error? | 21:17 |
russellb | afaict so far | 21:17 |
russellb | yes | 21:17 |
Sam-I-Am | crap. has anyone replicated this locally? | 21:17 |
Sam-I-Am | essentially run whatever tempest is doing by hanf | 21:18 |
mestery | russellb: Just to confirm (there was a lot going on here), the code mamulsow is now running also has your providernet fix as well? | 21:18 |
Sam-I-Am | hand | 21:18 |
*** salv-orlando has joined #openstack-neutron-ovn | 21:18 | |
*** salv-orlando has quit IRC | 21:19 | |
russellb | mestery: yesa | 21:19 |
mestery | mamulsow: I think we can safely look to change the deployment topology to create provider networks as well, lets sync on that tomorrow before the standup | 21:19 |
*** salv-orlando has joined #openstack-neutron-ovn | 21:19 | |
mestery | russellb: Thanks sir! | 21:19 |
russellb | mestery: yeah, switching to provider net testing sounds like a good move if that's the target topology anyway | 21:20 |
* mestery nods in agreement | 21:20 | |
Sam-I-Am | russellb: i wanted to do that in the gate | 21:20 |
mestery | Though I'd still like to keep going with private network scale testing | 21:20 |
russellb | makes sense | 21:20 |
mestery | Better to get an idea of things there and work to solve in parallel | 21:20 |
regXboi | so ... I've got a failure on teardown of a test class | 21:20 |
regXboi | which implies the class itself isn't cleaning up properly | 21:20 |
mestery | regXboi: Nice find! | 21:21 |
regXboi | actually three of the four errors I see are teardown of classes | 21:21 |
regXboi | which makes me think either the test class is wrong OR there is an automatic port type being created and not cleaned up | 21:21 |
Sam-I-Am | regXboi: are we talking about the same thing? | 21:21 |
regXboi | Sam-I-Am: possibly | 21:22 |
Sam-I-Am | the gate fizz | 21:22 |
regXboi | yes, I'm looking at a patch set that failed before the latest six issue | 21:22 |
regXboi | it was rechecked once, so there are four test failures to look at | 21:23 |
regXboi | three are on teardown and I'm looking at the fourth now | 21:23 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: tempest debugging, ignore... https://review.openstack.org/270453 | 21:23 |
mestery | regXboi: Looks like arosen is working this issue too | 21:23 |
mestery | ^^^ | 21:23 |
russellb | arosen: ah, yeah, that's cleaner than what i did, heh | 21:24 |
russellb | regXboi: mestery yeah we were trying to get more debug about what ports hadn't been deleted, since we're seeing network delete failures regularly because of ports still being around | 21:24 |
arosen | yea i wanna get a better look at what this hanging port looks like. | 21:25 |
russellb | that's been the most common error i've seen | 21:25 |
mestery | Cool | 21:25 |
mestery | Looks like regXboi is looking at different issues | 21:25 |
arosen | it's weird that it's not consistent. | 21:25 |
Sam-I-Am | arosen: the port thing? | 21:25 |
regXboi | arosen is looking at the same thing I'm coming at | 21:25 |
arosen | the tempest test fails saying cannot delete network because of port. | 21:26 |
Sam-I-Am | oh, i just got to what regXboi found | 21:26 |
arosen | let me read up. | 21:26 |
regXboi | you know ... looking at the logging, I almost think it's something related to DHCP | 21:27 |
arosen | regXboi: the tear down class in tempest or? | 21:27 |
regXboi | arosen: I've got three tear down classes yes, but I have one which is in deleting after setting up multiple NICs | 21:28 |
Sam-I-Am | arosen: http://logs.openstack.org/21/269121/3/check/gate-tempest-dsvm-networking-ovn/7f8b60d/logs/tempest.txt.gz#_2016-01-19_20_36_09_102 | 21:28 |
arosen | https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_v2.py#L369 <-- it will delete the dhcp port for you so it's not that one. | 21:28 |
* arosen loading.... | 21:29 | |
regXboi | interesting - *THAT* is a network with an IPv6 subnet | 21:29 |
arosen | yup, that's from the client side. | 21:30 |
regXboi | let's see if the client port that was created gets cleaned up | 21:30 |
openstackgerrit | Russell Bryant proposed openstack/networking-ovn: Test patch, ignore. https://review.openstack.org/269897 | 21:31 |
russellb | ^^^ that's for debugging a different thing | 21:31 |
russellb | an error in ovn-controller i can't easily reproduce locally | 21:31 |
regXboi | oh crap | 21:31 |
regXboi | I think I know what this was | 21:31 |
regXboi | note *was* | 21:32 |
regXboi | russellb: the -dsvm- job runs which tests from neutron? | 21:33 |
russellb | none | 21:33 |
russellb | tempest only | 21:33 |
regXboi | oh | 21:33 |
regXboi | n/m then - I'm looking in the wrong place | 21:33 |
Sam-I-Am | regXboi: what were you thinking? | 21:34 |
regXboi | Sam-I-Am: I was thinking the router delete optimization that I pushed in a while back and the partial revert that addresses potential loss of events at the L3 agent | 21:34 |
regXboi | because it looks like what's hanging the network delete is the router interface port | 21:34 |
Sam-I-Am | this is using the l3 agent | 21:35 |
regXboi | yes... it is | 21:35 |
regXboi | and the partial revert merged earlier this week | 21:35 |
Sam-I-Am | however i would think something that breaks the l3 agent here would also impact neutron | 21:35 |
flaviof | off the wall question: is neutron l3_ha==True ? I've seen issues in 'other sdn controller (aka odl)' where it is very confused about the 'special network' as well as the tenant-id-less that it has. | 21:35 |
flaviof | please ignore if this is not applicable! | 21:36 |
regXboi | well - no - that was the thing - the neutron tests were modified to work with the optimized code | 21:36 |
russellb | flaviof: i don't think we're setting it at all, so whatever that does | 21:36 |
regXboi | and there was the potential for a race condition that carl_baldwin pointed out | 21:36 |
Sam-I-Am | regXboi: this looks like a standard tempest test though, no? | 21:36 |
regXboi | Sam-I-Am: per russellb, it is, so that is why I think I'm barking up the wrong tree | 21:36 |
regXboi | I will be rechecking a patch set after the six fix merges ... just to see | 21:37 |
regXboi | speaking of which, do we know the patchset with the six fix? | 21:37 |
Sam-I-Am | trying to remember where that even went | 21:38 |
*** shettyg has quit IRC | 21:39 | |
russellb | mamulsow: I think we'll need to get to where we can generate a profile of ovn-controller so i don't just keep guessing at things. | 21:39 |
*** shettyg has joined #openstack-neutron-ovn | 21:39 | |
russellb | mamulsow: i haven't done profiling of a C app in a long time, but i used oprofile long ago... | 21:40 |
* regXboi channels Geisel for a moment | 21:40 | |
mamulsow | I was using poor-man's profiling by adding print statements so I could see which things were taking longest to run | 21:40 |
*** SpamapS has joined #openstack-neutron-ovn | 21:40 | |
russellb | mamulsow: :) | 21:40 |
* regXboi is looking for the six fix that let's tox rok | 21:40 | |
mamulsow | I can build debug and run a profiler against it if you want | 21:40 |
mamulsow | I'm also happy just adding print statements to track it down | 21:41 |
russellb | mamulsow: sure, that'd be great. i don't have instructions yet though | 21:41 |
russellb | but if you know how, have at it! | 21:41 |
*** roeyc has joined #openstack-neutron-ovn | 21:41 | |
arosen | regXboi: from the logs i see the delete port call for the left over port come in after the delete network call which seems odd. I wonder if this is a possible race in tempest. Looking at how that works now. | 21:41 |
russellb | arosen: weird | 21:42 |
russellb | or does the delete come from nova? | 21:42 |
Sam-I-Am | regXboi: https://review.openstack.org/#/c/269954/ | 21:42 |
russellb | that could explain the race potential | 21:42 |
regXboi | Sam-I-Am: that says merged, so let's run a recheck | 21:43 |
regXboi | and away we go | 21:44 |
Sam-I-Am | looks like pip revealed something we really shouldnt be doing | 21:44 |
Sam-I-Am | however, this issue breaks things long before the port problem | 21:45 |
Sam-I-Am | just would be nice to get past this one | 21:45 |
regXboi | russellb: if I recall correctly, nova is used in some tempest tests rather than neutron | 21:47 |
regXboi | I've argued against that in the past | 21:47 |
russellb | i mean, if port delete is the result of deleting a nova VM | 21:47 |
regXboi | especially for neutron jobs | 21:47 |
russellb | so if tempest does delete_vm() then delete_network() | 21:47 |
russellb | we could have this race, right? | 21:47 |
* russellb just making guesses | 21:47 | |
regXboi | well - the port I think I'm seeing is for the router interface. not a compute instance | 21:48 |
regXboi | so I'm not so sure | 21:48 |
Sam-I-Am | i'm looking in the l3 agent log | 21:48 |
regXboi | but my memory says you are correct about nova | 21:48 |
regXboi | oh my | 21:49 |
regXboi | oh my oh my oh my | 21:49 |
*** chandrav has quit IRC | 21:49 | |
regXboi | could this be a side effect of the keystone middleware problem? | 21:49 |
mestery | dum dum dum | 21:50 |
mestery | :) | 21:50 |
Sam-I-Am | regXboi: you mean things being really slow? | 21:50 |
regXboi | if we are slowing things down, then that would exacerbate any race condition, wouldn't it? | 21:50 |
Sam-I-Am | i think so | 21:50 |
Sam-I-Am | maybe we need a neutron-slow job to check for races :) | 21:50 |
regXboi | I don't know how to answer that | 21:51 |
mestery | regXboi: Look for notmorgan's thread on that, he has a devstack patch | 21:51 |
mestery | One way to verify would be to make one of our patches dependent on his | 21:51 |
mestery | To see if that slowdown is the culprit | 21:51 |
regXboi | mestery: I'm re-checking 269121 now that the six fix has merged | 21:51 |
mestery | Ack | 21:52 |
regXboi | and if it fails, I'll look at the failures and take it from there | 21:52 |
Sam-I-Am | i thought the middleware patch merged | 21:55 |
regXboi | Sam-I-Am: then this should pass ;) | 21:56 |
*** shettyg has quit IRC | 21:58 | |
Sam-I-Am | interestingly i dont see anything obvious in the keystonemiddleware repo | 21:58 |
regXboi | well, the dsvm job is now running here: https://jenkins02.openstack.org/job/gate-tempest-dsvm-networking-ovn/88/console | 21:58 |
regXboi | interestingly, job 84 three days ago passed | 21:59 |
regXboi | it was for 268717 | 22:00 |
regXboi | so the middleware cap failed on grenade-dsvm-multinode | 22:00 |
mamulsow | russellb: sorry, I found out that there was other testing going on in the environment at that time | 22:00 |
mamulsow | so I think the cause of the patched one being pegged at 100% was from that testing not your updates | 22:01 |
russellb | mamulsow: no worries | 22:01 |
russellb | ok, thanks | 22:01 |
russellb | i haven't trashed the patch just yet :) | 22:01 |
russellb | let me know if you get another chance to try and compare | 22:01 |
*** roeyc has quit IRC | 22:02 | |
mestery | russellb: Love it: "Grudgingly-Acked-by:" :) | 22:04 |
regXboi | mestery: reference? | 22:04 |
russellb | :-D | 22:04 |
* regXboi wants to see that one | 22:04 | |
russellb | http://openvswitch.org/pipermail/dev/2016-January/064745.html | 22:05 |
mestery | russellb: Faster than me! :) | 22:05 |
regXboi | rotflmao | 22:05 |
*** roeyc has joined #openstack-neutron-ovn | 22:08 | |
SpamapS | mamulsow: on an ovn related note.. testing 500 fake hypervisors with no neutron l2 agents results in a not-terribly-busy rabbitMQ. ;) | 22:09 |
SpamapS | now.. does anybody know how to run more than 500 docker containers on a single box without exploding the networking stack? ;-) | 22:09 |
russellb | heh, yes, not using rabbitmq tends to make rabbitmq less busy | 22:10 |
russellb | (•_•) ... ( •_•)>⌐■-■ ... (⌐■_■) | 22:11 |
mestery | SpamapS: I've heard this OVN thing may help with that ... | 22:11 |
SpamapS | mestery: lies | 22:11 |
mestery | lol | 22:11 |
russellb | https://github.com/openvswitch/ovs/blob/master/INSTALL.Docker.md | 22:11 |
russellb | not sure what you mean by "exploding the networking stack" though :) | 22:12 |
* mestery pictures pieces of IP addresses laying all over SpamapS's office | 22:12 | |
SpamapS | russellb: Oh I'm just using the default bridge networking in docker. At around 540 dockers running, all calls to socket() begin to fail | 22:12 |
SpamapS | Yes there's netmasks and node numbers littered throughout | 22:13 |
russellb | ¯\_(ツ)_/¯ | 22:13 |
SpamapS | Each docker running a nova-compute configured for fakevirt | 22:14 |
mestery | nice | 22:14 |
SpamapS | so yeah, at some point socket() just says NOPE | 22:15 |
SpamapS | ping, from outside the containers, for instance, does 'sendmsg: Invalid argument' | 22:15 |
*** chandrav has joined #openstack-neutron-ovn | 22:15 | |
SpamapS | because ping() just assumes socket() always works. ;) | 22:15 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: random test https://review.openstack.org/270502 | 22:17 |
russellb | ooh random test | 22:17 |
arosen | russellb: matching up the tempest logs I noticed that your query isn't returning any ports. Which seems weird. | 22:19 |
arosen | Details: {u'detail': u'', u'message': u'Unable to complete operation on network b26280b9-6eff-4d1d-a2b1-ea2c1f59b0fd. There are one or more ports still in use on the network.', u'type': u'NetworkInUse'} | 22:19 |
arosen | http://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/025fc59/logs/screen-q-svc.txt.gz#_2016-01-20_19_52_42_283 | 22:19 |
arosen | see how the log statement: " *** DUMPING DETAILS OF PORT IN USE ***" doesn't come after that trace? | 22:20 |
russellb | right.. | 22:20 |
russellb | :-/ | 22:20 |
regXboi | well well well | 22:20 |
russellb | race? | 22:20 |
russellb | sounds like port was gone by the time we queried again? | 22:20 |
arosen | maybe. | 22:20 |
regXboi | mestery: *most* of the linearity is in ovsdb, but there is a little bit in row by value 2 | 22:20 |
russellb | arosen: just put a loop around it, lolz | 22:21 |
arosen | thus ml2 was born. | 22:21 |
russellb | that was my thought yes | 22:21 |
arosen | born again* | 22:21 |
arosen | :) | 22:21 |
russellb | arosen: is nova deleting the port in question? or do you not know? | 22:22 |
arosen | i need to look closer at that. | 22:22 |
russellb | k, i'll keep looking too (tomorrow though, i need to leave in a few minutes) | 22:23 |
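
Editor's sketch of the "just put a loop around it" idea from 22:21, for the case where a network delete fails with NetworkInUse because a port delete has not finished yet. Everything here is an assumption for illustration: `NetworkInUse` is a local stand-in class rather than the real Neutron exception, and `delete_network` is whatever callable the caller supplies. As the channel hints, a retry like this hides the race instead of explaining it.

```python
import time


class NetworkInUse(Exception):
    """Hypothetical stand-in for the NetworkInUse error seen in the log."""


def delete_with_retry(delete_network, network_id, attempts=5, delay=0.5,
                      sleep=time.sleep):
    """Call delete_network(network_id), retrying while ports are still in use.

    Re-raises NetworkInUse if the last attempt still fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return delete_network(network_id)
        except NetworkInUse:
            if attempt == attempts:
                raise
            # Give the in-flight port delete a moment to complete.
            sleep(delay)
```

The `sleep` parameter exists only so the delay can be swapped out when exercising the function.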
*** gangil has joined #openstack-neutron-ovn | 22:25 | |
arosen | i did find this in nova but not completely sure if it's related: http://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/025fc59/logs/screen-n-cpu.txt.gz#_2016-01-20_19_51_49_286 | 22:27 |
arosen | i don't think it is though. | 22:28 |
regXboi | ok, this recheck is still going to fail | 22:28 |
regXboi | on the multiple_nics_order test | 22:28 |
regXboi | but ... the middleware patch is hung up as well | 22:28 |
regXboi | so I will see about a depends on for it tomorrow morning | 22:29 |
Sam-I-Am | regXboi: what was the middleware patch #? | 22:29 |
regXboi | 270417 | 22:30 |
Sam-I-Am | yeah, thats stuck | 22:31 |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Add missing call to self._process_l3_delete() https://review.openstack.org/270509 | 22:34 |
Sam-I-Am | regXboi: wonder if its worth rechecking | 22:35 |
regXboi | Sam-I-Am: go ahead - I'm walking away shortly | 22:35 |
regXboi | as in as soon as dinner beeps at me | 22:35 |
russellb | thanks for the help everyone! i really appreciate it! | 22:35 |
russellb | i'm out for now ... need to go home / eat / see family | 22:35 |
Sam-I-Am | pffffft | 22:35 |
mestery | have a good night regXboi | 22:35 |
mestery | you too regXboi | 22:35 |
Sam-I-Am | my family is going to do family things.... | 22:36 |
arosen | enjoy, catch you guys later! | 22:36 |
russellb | arosen: i see the cause of ovn-controller log spam, but i don't think it would cause any test failures. i think the log spam is the worst it's doing | 22:36 |
Sam-I-Am | i'm not going anywhere! | 22:37 |
arosen | k | 22:37 |
arosen | i have a multi node setup locally of ovn running the latest code and it seems to be working fine. | 22:37 |
arosen | with the latest code. | 22:37 |
russellb | figures | 22:37 |
regXboi | mestery: that was .... odd :) | 22:37 |
arosen | or at least booting vms testing acls are working fine :) | 22:38 |
mestery | arosen: We have a 125 node system doing the same and it's fine :) | 22:38 |
arosen | i ran into a really annoying corner case yesterday that was very self inflicted. | 22:38 |
arosen | i cloned a vm to add as a slave. | 22:38 |
arosen | and that OVN_UUID that gets set in ovsdb as the system-id was the same on both systems. | 22:39 |
arosen | that caused things not to work.. | 22:39 |
arosen | it was funny the output of ovn-sbctl would only show one node and kept switching | 22:39 |
mestery | arosen: Heh :) | 22:40 |
russellb | ha, yes, ovn-controller on each node were fighting each other | 22:40 |
mestery | There can be only one! | 22:40 |
arosen | I pinged ben about the output and he gave me that hint | 22:40 |
arosen | yup :) | 22:40 |
russellb | i bet ovn-controller.log had "wtf?!" all in it | 22:40 |
arosen | I bet others will hit this though. | 22:40 |
mestery | rofl | 22:40 |
russellb | hopefully the log had a hint though | 22:40 |
arosen | nope it didn't have anything in the log. | 22:40 |
russellb | oh, well that's lame | 22:40 |
arosen | or at least nothing i saw. | 22:41 |
russellb | you can file a bug if you'd like against networking-ovn | 22:41 |
russellb | and i'll look eventually | 22:41 |
russellb | or someone can | 22:41 |
arosen | i think that would be against ovn though not networking-ovn? | 22:41 |
russellb | yes | 22:41 |
arosen | are we filing ovn bugs there too? | 22:41 |
russellb | but ovn doesn't have a tracker .... | 22:41 |
arosen | k | 22:41 |
russellb | i'm using networking-ovn, or my own private trello ... | 22:41 |
arosen | sounds good i can file a bug there. | 22:41 |
russellb | k | 22:42 |
russellb | alright i'm out for real | 22:42 |
russellb | ttyl | 22:42 |
arosen | later! | 22:42 |
mamulsow | russellb: thanks for your help! | 22:42 |
* regXboi also heads exit stage right | 22:42 | |
*** regXboi has quit IRC | 22:42 | |
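
Editor's sketch for the cloned-VM case arosen described at 22:39, where two nodes shared one system-id and nothing in the logs pointed at it. It assumes the operator has already collected each host's system-id (for example with `ovs-vsctl get Open_vSwitch . external_ids:system-id` on every node) into a dict; the function name and input shape are hypothetical, not part of OVN or networking-ovn.

```python
from collections import defaultdict


def find_duplicate_system_ids(system_ids_by_host):
    """Return {system_id: [hosts]} for every system-id claimed by 2+ hosts."""
    hosts_by_id = defaultdict(list)
    for host, system_id in system_ids_by_host.items():
        hosts_by_id[system_id].append(host)
    # Keep only the IDs that more than one host reported.
    return {
        system_id: sorted(hosts)
        for system_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }
```

An empty result means every host reported a distinct system-id; any entry names the hosts whose ovn-controller instances would be contending for the same chassis record.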
*** roeyc has quit IRC | 22:47 | |
*** palexster has quit IRC | 23:03 | |
*** palexster has joined #openstack-neutron-ovn | 23:03 | |
openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Add missing call to self._process_l3_update/delete() https://review.openstack.org/270509 | 23:14 |
*** roeyc has joined #openstack-neutron-ovn | 23:42 | |
*** chandrav has quit IRC | 23:51 |