| *** yamamoto_ has joined #openstack-neutron-ovn | 00:00 | |
| *** numans has joined #openstack-neutron-ovn | 00:02 | |
| *** roeyc has joined #openstack-neutron-ovn | 00:13 | |
| *** roeyc has quit IRC | 00:35 | |
| *** yamamoto_ has quit IRC | 00:35 | |
| *** roeyc has joined #openstack-neutron-ovn | 00:37 | |
| *** dlundquist has quit IRC | 00:40 | |
| *** dlundquist has joined #openstack-neutron-ovn | 00:40 | |
| *** chandrav has quit IRC | 00:47 | |
| openstackgerrit | Dustin Lundquist proposed openstack/networking-ovn: Devstack: cleanup datapath https://review.openstack.org/269938 | 00:55 |
|---|---|---|
| *** roeyc has quit IRC | 00:57 | |
| *** dlundquist has quit IRC | 01:03 | |
| *** roeyc has joined #openstack-neutron-ovn | 01:03 | |
| *** numans has quit IRC | 01:08 | |
| *** salv-orlando has quit IRC | 01:12 | |
| *** yamamoto_ has joined #openstack-neutron-ovn | 01:24 | |
| *** dlundquist has joined #openstack-neutron-ovn | 01:32 | |
| *** dlundquist has quit IRC | 01:32 | |
| *** sidk has joined #openstack-neutron-ovn | 01:41 | |
| *** armax has quit IRC | 01:51 | |
| *** armax has joined #openstack-neutron-ovn | 01:55 | |
| *** sidk has quit IRC | 01:59 | |
| *** arosen has quit IRC | 02:04 | |
| *** roeyc has quit IRC | 02:07 | |
| *** armax has quit IRC | 02:12 | |
| *** fzdarsky|afk has quit IRC | 02:20 | |
| *** armax has joined #openstack-neutron-ovn | 02:22 | |
| *** fzdarsky has joined #openstack-neutron-ovn | 02:26 | |
| *** gongysh has joined #openstack-neutron-ovn | 02:47 | |
| *** gongysh has quit IRC | 03:16 | |
| *** armax has quit IRC | 05:09 | |
| *** armax has joined #openstack-neutron-ovn | 05:17 | |
| *** chandrav has joined #openstack-neutron-ovn | 05:19 | |
| openstackgerrit | Dongcan Ye proposed openstack/networking-ovn: Fix in create_router and update_router https://review.openstack.org/268722 | 05:39 |
| *** salv-orlando has joined #openstack-neutron-ovn | 05:49 | |
| *** gongysh has joined #openstack-neutron-ovn | 05:54 | |
| *** otherwiseguy has quit IRC | 05:58 | |
| *** otherwiseguy has joined #openstack-neutron-ovn | 05:58 | |
| *** gongysh has quit IRC | 06:13 | |
| openstackgerrit | Babu Shanmugam proposed openstack/networking-ovn: Enabling qos support through Logical_Port.options https://review.openstack.org/265798 | 06:18 |
| *** numans has joined #openstack-neutron-ovn | 06:45 | |
| *** chandrav has quit IRC | 07:16 | |
| *** salv-orlando has quit IRC | 07:49 | |
| *** fzdarsky has quit IRC | 08:02 | |
| *** gongysh has joined #openstack-neutron-ovn | 08:13 | |
| *** gongysh has quit IRC | 08:21 | |
| *** gongysh has joined #openstack-neutron-ovn | 08:31 | |
| *** gongysh has quit IRC | 08:36 | |
| *** fzdarsky has joined #openstack-neutron-ovn | 10:25 | |
| *** yamamoto_ has quit IRC | 10:49 | |
| *** yamamoto has joined #openstack-neutron-ovn | 10:50 | |
| *** gongysh has joined #openstack-neutron-ovn | 11:04 | |
| *** yamamoto has quit IRC | 11:04 | |
| *** openstackgerrit has quit IRC | 11:43 | |
| -openstackstatus- NOTICE: review.openstack.org is being restarted to apply patches | 11:43 | |
| *** ChanServ changes topic to "review.openstack.org is being restarted to apply patches" | 11:43 | |
| *** openstackgerrit has joined #openstack-neutron-ovn | 11:44 | |
| *** gongysh has quit IRC | 11:46 | |
| *** ChanServ changes topic to "http://docs.openstack.org/developer/networking-ovn/ -=- OVN meeting Thursdays 10:15am Pacific / 1:15pm Eastern #openvswitch -=- Tempest health: http://goo.gl/9MaBJC" | 11:54 | |
| -openstackstatus- NOTICE: Restart done, review.openstack.org is available | 11:54 | |
| *** gongysh has joined #openstack-neutron-ovn | 11:55 | |
| *** gongysh has quit IRC | 12:07 | |
| *** yamamoto has joined #openstack-neutron-ovn | 12:28 | |
| *** yamamoto has quit IRC | 12:29 | |
| *** rtheis has joined #openstack-neutron-ovn | 12:29 | |
| *** yamamoto_ has joined #openstack-neutron-ovn | 12:29 | |
| *** palexster has quit IRC | 12:30 | |
| *** yamamoto has joined #openstack-neutron-ovn | 12:31 | |
| *** flaviof has quit IRC | 12:32 | |
| *** yamamoto_ has quit IRC | 12:34 | |
| *** palexster has joined #openstack-neutron-ovn | 12:43 | |
| *** SongmingYan has joined #openstack-neutron-ovn | 13:29 | |
| *** dslev has joined #openstack-neutron-ovn | 13:34 | |
| *** dslev has quit IRC | 13:43 | |
| *** toor has joined #openstack-neutron-ovn | 13:47 | |
| *** leifmadsen_ has joined #openstack-neutron-ovn | 13:55 | |
| *** Sam-I-Am has quit IRC | 13:56 | |
| *** leifmadsen has quit IRC | 13:56 | |
| *** toor_ has quit IRC | 13:56 | |
| *** leifmadsen_ is now known as leifmadsen | 13:56 | |
| *** yamamoto has quit IRC | 13:59 | |
| *** yamamoto has joined #openstack-neutron-ovn | 14:03 | |
| *** Sam-I-Am has joined #openstack-neutron-ovn | 14:05 | |
| *** flaviof has joined #openstack-neutron-ovn | 14:11 | |
| *** dslevin has joined #openstack-neutron-ovn | 14:15 | |
| *** dslevin has quit IRC | 14:16 | |
| *** dslevin has joined #openstack-neutron-ovn | 14:16 | |
| *** yamamoto has quit IRC | 14:33 | |
| *** yamamoto has joined #openstack-neutron-ovn | 14:39 | |
| *** SongmingYan has quit IRC | 15:31 | |
| *** numans has quit IRC | 15:38 | |
| *** flaviof has quit IRC | 15:41 | |
| *** yamamoto has quit IRC | 15:42 | |
| *** yamamoto has joined #openstack-neutron-ovn | 15:42 | |
| *** armax has left #openstack-neutron-ovn | 15:55 | |
| *** salv-orlando has joined #openstack-neutron-ovn | 16:30 | |
| *** chandrav has joined #openstack-neutron-ovn | 16:31 | |
| *** chandrav has quit IRC | 16:31 | |
| *** regXboi has joined #openstack-neutron-ovn | 16:33 | |
| *** yamamoto has quit IRC | 16:33 | |
| regXboi | russellb: ping | 16:38 |
| regXboi | russellb: got a question for you | 16:38 |
| russellb | k | 16:38 |
| regXboi | I'm chasing some scaling issues with adding ports (we've talked about this in the past) | 16:39 |
| regXboi | and I'm trying to segment the trouble to pieces of code | 16:39 |
| regXboi | so I can submit patches :) | 16:39 |
| Sam-I-Am | russellb: mornings | 16:40 |
| regXboi | and I'm trying to figure out where the do_commit method of the IDL transaction gets tripped from the networking-ovn code, because I'm not seeing the path right now | 16:40 |
| russellb | it's magic | 16:47 |
| regXboi | russellb: thanks - I think :) | 16:48 |
| russellb | i'm getting you a better answer .. | 16:48 |
| russellb | in networking_ovn/plugin.py you either have a call to execute() directly | 16:49 |
| russellb | or in some cases, a transaction to build up a list of commands | 16:49 |
| russellb | and then that list gets executed all at once when the transaction context manager exits | 16:49 |
| russellb | the commit is called down in base code we use from neutron | 16:50 |
| russellb | i'd suggest following a simpler one, like create_router | 16:50 |
| russellb | self._ovn.create_lrouter(...).execute(...) | 16:50 |
| russellb | and dig through what all that does | 16:51 |
| russellb | it'll take you back into neutron code eventually | 16:51 |
| regXboi | russellb: unfortunately, the scaling problem I need to chase is in _create_port_in_ovn :/ | 16:51 |
| russellb | ok, that one is just more complicated because it uses transactions and multiple commands | 16:52 |
| russellb | to answer your first question about how the commit is done and how code hooks together, a simpler one would be easier to trace first i think | 16:52 |
| regXboi | russellb: yeah I know - I never pick the simple things :) | 16:52 |
| regXboi | anyway, no worries, I think I've got enough from that to continue to make progress | 16:53 |
| russellb | ok | 16:53 |
| * russellb gets lunch | 16:55 | |
| mestery | russellb: Some new info on https://bugs.launchpad.net/networking-ovn/+bug/1536003, take a peek when back | 16:56 |
| openstack | Launchpad bug 1536003 in networking-ovn "high cpu usage on ovn-controller process" [High,Confirmed] | 16:56 |
| mestery | Additional debugging by folks | 16:56 |
| russellb | good info | 16:57 |
| mestery | yup | 16:57 |
| russellb | lflow_run comment not surprising | 16:58 |
| mestery | Yeah | 16:58 |
| *** armax has joined #openstack-neutron-ovn | 17:01 | |
| *** shettyg has joined #openstack-neutron-ovn | 17:23 | |
| *** chandrav has joined #openstack-neutron-ovn | 18:07 | |
| *** flaviof has joined #openstack-neutron-ovn | 18:12 | |
| *** gizmoguy_ has joined #openstack-neutron-ovn | 18:17 | |
| *** mamulsow has joined #openstack-neutron-ovn | 18:17 | |
| mamulsow | Hi Russell | 18:17 |
| mestery | mamulsow: Howd :) | 18:18 |
| mestery | howdy | 18:18 |
| russellb | mamulsow: hello! | 18:18 |
| mestery | russellb: Meet mamulsow | 18:18 |
| russellb | thanks for your work so far | 18:18 |
| russellb | i'm playing with code, will ping if i have something for you to test | 18:18 |
| mamulsow | okay, sounds good | 18:18 |
| russellb | mamulsow: was curious how willing you are to throw experimental patches on a node to see what happens :) | 18:18 |
| mamulsow | definitely open to that, this is just a test environment at this point | 18:19 |
| russellb | ok | 18:19 |
| *** stac- has joined #openstack-neutron-ovn | 18:19 | |
| *** ajo_ has joined #openstack-neutron-ovn | 18:21 | |
| *** ajo has quit IRC | 18:22 | |
| *** stac has quit IRC | 18:22 | |
| *** gizmoguy has quit IRC | 18:22 | |
| *** arosen has joined #openstack-neutron-ovn | 18:22 | |
| russellb | mamulsow: https://github.com/russellb/ovs/commits/ovn-controller-perf | 18:41 |
| russellb | i have 1 test commit, but i put it in a branch with some other localnet port fixes i think you're using already | 18:42 |
| russellb | this is the ovn-controller perf related change ... https://github.com/russellb/ovs/commit/0434cba6b0b5c925f7effd6c629e663f9acda2fd | 18:42 |
| mamulsow | okay, building.. | 18:42 |
| russellb | it did what i expected in a local trivial test, but i haven't done any load testing of it yet | 18:42 |
| russellb | we'll see! | 18:42 |
| russellb | the only changes are to ovn-controller | 18:46 |
| russellb | to be clear.. | 18:46 |
| mamulsow | is it reasonable to test on just one of the nodes or do I need to push this out to every node running ovn-controller? | 18:46 |
| russellb | just one should be fine | 18:46 |
| mamulsow | k | 18:46 |
| russellb | and see if that has a noticeable CPU impact for you | 18:46 |
| russellb | it should cut way down on what lflow_run does though | 18:46 |
| russellb | ... this may be too aggressive of an optimization actually, but still interesting to know the CPU impact | 18:49 |
| mamulsow | it definitely helped, all of the other ovn-controllers are 95%+ CPU, this one is hovering around 40-50% | 18:53 |
| mamulsow | 36.5 | 18:54 |
| mamulsow | 27% | 18:54 |
| mamulsow | 38% | 18:54 |
| russellb | not bad.. | 18:56 |
| russellb | i wouldn't deploy it elsewhere, i'm not sure it's totally legit yet | 18:57 |
| mamulsow | sure | 18:57 |
| mamulsow | if there's anything you want me to test on it while I have it running let me know | 18:57 |
| russellb | nope, sounds like it did what i hoped | 18:57 |
| *** flaviof has quit IRC | 18:58 | |
| *** flaviof has joined #openstack-neutron-ovn | 18:58 | |
| russellb | mamulsow: so it's hovering at 95+% even when not creating/destroying anything? | 18:59 |
| *** ajo_ has quit IRC | 18:59 | |
| *** ajo_ has joined #openstack-neutron-ovn | 18:59 | |
| mamulsow | well, hold on, I happened to be creating networks at that time | 19:00 |
| mamulsow | stopping that | 19:00 |
| russellb | ok, when changes happen, it's going to recalculate a new desired state, but then it should go idle again if no changes are happening | 19:01 |
| mamulsow | okay, now that nothing is being created/deleted the others are hovering in the 40-50% range and the updated one is hovering around 9% | 19:02 |
| russellb | ok | 19:02 |
| mamulsow | I had cleaned up some stuff earlier and was trying to get back to a state where we're closer to 100% on the others at idle | 19:02 |
| mamulsow | but yeah, with nothing happening now, the other ovn-controller nodes are around 40-70% and this one is pretty consistently in the 9-10% range | 19:04 |
| russellb | on one of the 40-70% nodes, can you turn on debug logging again, just for a minute or so | 19:07 |
| russellb | it *might* be very verbose, but might not be .... | 19:07 |
| russellb | ovs-appctl -t ovn-controller vlog/set dbg | 19:07 |
| russellb | ... wait a while ... | 19:07 |
| russellb | ovs-appctl -t ovn-controller vlog/set info | 19:07 |
| russellb | (or whatever level you want to set it back to) | 19:07 |
| russellb | i have ... ANOTHER IDEA | 19:07 |
| mamulsow | !! | 19:08 |
| openstack | mamulsow: Error: "!" is not a valid command. | 19:08 |
| mamulsow | :) | 19:08 |
| mamulsow | anything in particular I should be looking for in the debug log? | 19:11 |
| russellb | not exactly, was hoping you could share | 19:17 |
| * regXboi makes note of how to change controller logs :) | 19:18 | |
| russellb | regXboi: works for all ovs/ovn daemons, i think | 19:18 |
| regXboi | cool - I was just starting to look up how to do that | 19:19 |
| russellb | details in ovs-appctl man page | 19:19 |
| regXboi | for ovn-controller, for one of the next sets of tests I'm planning | 19:19 |
| regXboi | ack | 19:19 |
| mamulsow | http://pastebin.com/thRmPPWZ | 19:21 |
| russellb | ok | 19:22 |
| russellb | thank you | 19:22 |
| russellb | i think what's happening is that idle isn't so idle because ovn-controller is re-calculating the full state every time it gets woken up for any reason | 19:23 |
| russellb | and that includes these keepalive exchanges, both via openflow to the local switch, and via ovsdb | 19:23 |
| russellb | so we should be able to tell ovn-controller to calm down and not recalculate the world if nothing actually changed | 19:23 |
| russellb | current idea/theory anyway | 19:24 |
| mestery | russellb: that makes sense to me | 19:24 |
| russellb | now if i can turn that into code! | 19:25 |
| mestery | :) | 19:25 |
| mestery | magic! | 19:25 |
| russellb | mamulsow: can you update the patched node with the current code in that branch? it's not going to help as much, but i think the patch is a valid optimization now ... | 19:29 |
| russellb | the last revision was too aggressive sadly | 19:29 |
| Sam-I-Am | well, i just fired off a change to zuul so networking-ovn doesnt run expensive jobs for docs and stuff | 19:34 |
| russellb | Sam-I-Am: thanks! | 19:34 |
| Sam-I-Am | russellb: https://review.openstack.org/#/c/270444/ | 19:34 |
| Sam-I-Am | might check to see if i missed anything | 19:34 |
| Sam-I-Am | or was too greedy | 19:34 |
| russellb | Sam-I-Am: that says skip dsvm jobs if all files match at least one pattern in that list? | 19:36 |
| Sam-I-Am | yeah | 19:36 |
| russellb | ok thanks | 19:36 |
| Sam-I-Am | more or less | 19:36 |
| Sam-I-Am | that file is scary | 19:36 |
| russellb | mestery: need infra liaison on ^^^ | 19:37 |
| Sam-I-Am | the rally job was named weirdly, so i had to add something to the job filter line | 19:39 |
| arosen | russellb: This is weird.. Any idea where this LOG statement is coming from? http://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/026b9af/logs/screen-q-svc.txt.gz#_2016-01-19_23_29_38_077 | 19:47 |
| arosen | *** FAILED TO DELETE NETWORK STILL IN USE *** | 19:47 |
| arosen | that's from this patch: https://review.openstack.org/#/c/269897/ | 19:48 |
| arosen | which doesn't add that line of logging. | 19:48 |
| mamulsow | russellb: got the new build in and seems like it's hovering around 20% now, while the other nodes are still in the 40-70% range | 19:51 |
| russellb | mamulsow: ok, still a reasonable improvement | 19:51 |
| russellb | arosen: that patch is on top of the other | 19:52 |
| russellb | mamulsow: are you interested in getting credit for testing the patch in the commit message? | 19:53 |
| openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: tempest debugging, ignore... https://review.openstack.org/270453 | 19:53 |
| russellb | we sometimes add "Tested-by: Name <email>" headers for that | 19:53 |
| arosen | russellb: ah... sorry the new gerrit ui threw me off ;) | 19:54 |
| mamulsow | sure | 19:54 |
| russellb | arosen: though the first tempest run didn't actually include my ovn patch, i forgot to actually commit it >_< | 19:54 |
| russellb | arosen: the recheck should have it | 19:54 |
| mamulsow | russellb: so at this point do you think I should push this updated ovn-controller out to all the nodes and run some more scale tests | 20:00 |
| russellb | mamulsow: sure, i think that patch should be safe | 20:10 |
| russellb | i'm not sure how long it will take me to get the next patch done | 20:10 |
| russellb | mamulsow: what's your email addr? | 20:11 |
| mestery | russellb: Done on the infra liaison +1 | 20:11 |
| mestery | Thanks Sam-I-Am | 20:11 |
| russellb | mestery: ack thanks | 20:11 |
| Sam-I-Am | lets unblock those docs (i hope) | 20:14 |
| Sam-I-Am | i might have terribly broken something | 20:14 |
| russellb | now let's unblock tempest! | 20:14 |
| russellb | :) | 20:15 |
| russellb | i might have to turn to bribes soon | 20:15 |
| *** salv-orlando has quit IRC | 20:18 | |
| *** gongysh has joined #openstack-neutron-ovn | 20:18 | |
| mestery | Yes to that! | 20:18 |
| *** gongysh has quit IRC | 20:23 | |
| mamulsow | russellb: mamulsow@us.ibm.com | 20:26 |
| russellb | thanks | 20:27 |
| russellb | mamulsow: i have another patch for you to try on 1 node when you're ready | 20:32 |
| russellb | it's in https://github.com/russellb/ovs/commits/ovn-controller-perf | 20:32 |
| russellb | the new patch is https://github.com/russellb/ovs/commit/88f15971f1d66d41b0560f1735fb837655eb80fc | 20:33 |
| * russellb on a roll | 20:33 | |
| mamulsow | cool, building now | 20:35 |
| * mestery has a tear in his eye from the teamwork on display in this channel | 20:56 | |
| russellb | hehe | 20:58 |
| russellb | corporate barriers be damned, we have collaboration to do here! | 20:59 |
| mestery | russellb: You sir, are a shining example of open source done right. | 21:00 |
| * mestery more tears in his eyes | 21:00 | |
| mestery | :) | 21:00 |
| russellb | <3 | 21:00 |
| mamulsow | +1 | 21:00 |
| mestery | you too mamulsow :) | 21:00 |
| mamulsow | so... not to break up the good times, but it looks like this latest patch has actually made things worse | 21:02 |
| * Sam-I-Am appears | 21:03 | |
| * regXboi wanders in | 21:03 | |
| Sam-I-Am | russellb: look out. | 21:03 |
| mestery | lol | 21:03 |
| Sam-I-Am | the cavalry has arrived | 21:03 |
| mestery | hehehehehe | 21:03 |
| Sam-I-Am | i rode my horse in backward | 21:03 |
| regXboi | or "more godd***mned cowmen" | 21:03 |
| russellb | mamulsow: well that's unfortunate. | 21:03 |
| Sam-I-Am | and i'm the headless horseman now | 21:04 |
| regXboi | depending on your point of view | 21:04 |
| mestery | lol | 21:04 |
| regXboi | anyway - russellb - where are we w.r.t. the dsvm job? | 21:04 |
| russellb | mamulsow: can you confirm that if you drop just the latest patch, it gets better again? | 21:04 |
| russellb | regXboi: still very broken | 21:04 |
| mamulsow | so the modified one is at 100% and the others are still in the 40-70% range | 21:04 |
| russellb | i've mostly been looking into these new bug reports today | 21:04 |
| regXboi | ok, let me start with my patch | 21:04 |
| regXboi | and work out from there | 21:04 |
| russellb | mamulsow: heh, well, that's .... not intentional! | 21:05 |
| Sam-I-Am | mestery: this looks like the six job that broke neutron too? | 21:07 |
| Sam-I-Am | or, six thing | 21:07 |
| regXboi | Sam-I-Am: where are you looking? | 21:07 |
| russellb | yes the six thing broke networking-ovn as well | 21:07 |
| Sam-I-Am | http://logs.openstack.org/38/269938/1/check/gate-tempest-dsvm-networking-ovn/84b836c/logs/devstacklog.txt.gz | 21:07 |
| mestery | Sam-I-Am: Yes, all dsvm jobs are broken due to that | 21:08 |
| regXboi | Sam-I-Am: isn't there a workaround/fix for this already in train? | 21:08 |
| Sam-I-Am | thats what i thought | 21:08 |
| mestery | But the ovn dsvm job wasn't happy for a while before that | 21:08 |
| russellb | yes | 21:08 |
| mestery | :( | 21:08 |
| russellb | yeah ovn job broken before that | 21:08 |
| russellb | on failures we haven't debugged | 21:08 |
| Sam-I-Am | oh, so... this needs fixin' first | 21:08 |
| russellb | or successfully debugged yet | 21:08 |
| russellb | yaks everywhere | 21:09 |
| regXboi | Sam-I-Am: yes | 21:09 |
| regXboi | take a look at https://review.openstack.org/#/c/269121/ | 21:09 |
| regXboi | for an example | 21:09 |
| flaviof | russellb wrt https://github.com/russellb/ovs/commit/88f15971f1d66d41b0560f1735fb837655eb80fc#diff-fe69598e03bed8fbd3aa3f970b12b83bR276 | 21:10 |
| Sam-I-Am | regXboi: lookin | 21:10 |
| flaviof | shouldn't init with ovs_seqno = ovsdb_idl_get_seqno(ovs_idl_loop.idl) ? | 21:10 |
| regXboi | Sam-I-Am: it looks like something isn't getting torn down | 21:10 |
| russellb | flaviof: apparently that made things much worse :) | 21:10 |
| regXboi | I wonder | 21:10 |
| Sam-I-Am | yeah | 21:11 |
| russellb | flaviof: well, no, because i want to make sure it runs the first time | 21:11 |
| Sam-I-Am | did a patch cause this? | 21:11 |
| russellb | or that was my thinking | 21:11 |
| flaviof | russellb ic. | 21:11 |
| regXboi | Sam-I-Am: that's my thought | 21:11 |
| russellb | flaviof: actually i think you're right, but i don't think that's why the patch sucks | 21:12 |
| Sam-I-Am | trying to find the pattern of fail here | 21:12 |
| flaviof | ack; no biggie; just trying to catch up... you guys move fast and i'm still working off the odl chains ;) | 21:13 |
| russellb | heh | 21:13 |
| russellb | i tweaked the commit with your suggestion, thanks | 21:13 |
| mestery | flaviof: We'll get you integrated soon enough :) | 21:14 |
| russellb | no shortage of work to do right now | 21:14 |
| flaviof | mestery +1 | 21:15 |
| *** rtheis has quit IRC | 21:15 | |
| Sam-I-Am | regXboi: things seem to break consistently after this - https://review.openstack.org/#/c/178826/ | 21:16 |
| Sam-I-Am | it's a low-numbered patch too | 21:16 |
| Sam-I-Am | interestingly, it passed the gate | 21:17 |
| Sam-I-Am | the contents of the patch sort of relate to the problem | 21:17 |
| regXboi | well... what if we try reverting it? | 21:17 |
| russellb | we have a revert posted already | 21:17 |
| russellb | which fails as consistently as everything else | 21:17 |
| Sam-I-Am | with the same error? | 21:17 |
| russellb | afaict so far | 21:17 |
| russellb | yes | 21:17 |
| Sam-I-Am | crap. has anyone replicated this locally? | 21:17 |
| Sam-I-Am | essentially run whatever tempest is doing by hanf | 21:18 |
| mestery | russellb: Just to confirm (there was a lot going on here), the code mamulsow is now running also has your providernet fix as well? | 21:18 |
| Sam-I-Am | hand | 21:18 |
| *** salv-orlando has joined #openstack-neutron-ovn | 21:18 | |
| *** salv-orlando has quit IRC | 21:19 | |
| russellb | mestery: yesa | 21:19 |
| mestery | mamulsow: I think we can safely look to change the deployment topology to create provider networks as well, lets sync on that tomorrow before the standup | 21:19 |
| *** salv-orlando has joined #openstack-neutron-ovn | 21:19 | |
| mestery | russellb: Thanks sir! | 21:19 |
| russellb | mestery: yeah, switching to provider net testing sounds like a good move if that's the target topology anyway | 21:20 |
| * mestery nods in agreement | 21:20 | |
| Sam-I-Am | russellb: i wanted to do that in the gate | 21:20 |
| mestery | Though I'd still like to keep going with private network scale testing | 21:20 |
| russellb | makes sense | 21:20 |
| mestery | Better to get an idea of things there and work to solve in parallel | 21:20 |
| regXboi | so ... I've got a failure on teardown of a test class | 21:20 |
| regXboi | which implies the class itself isn't cleaning up properly | 21:20 |
| mestery | regXboi: Nice find! | 21:21 |
| regXboi | actually three of the four errors I see are teardown of classes | 21:21 |
| regXboi | which makes me think either the test class is wrong OR there is automatic port type being created and not cleaned up | 21:21 |
| Sam-I-Am | regXboi: are we talking about the same thing? | 21:21 |
| regXboi | Sam-I-Am: possibly | 21:22 |
| Sam-I-Am | the gate fizz | 21:22 |
| regXboi | yes, I'm looking at a patch set that failed before the latest six issue | 21:22 |
| regXboi | it was rechecked once, so there are four test failures to look at | 21:23 |
| regXboi | three are on teardown and I'm looking at the fourth now | 21:23 |
| openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: tempest debugging, ignore... https://review.openstack.org/270453 | 21:23 |
| mestery | regXboi: Looks like arosen is working this issue too | 21:23 |
| mestery | ^^^ | 21:23 |
| russellb | arosen: ah, yeah, that's cleaner than what i did, heh | 21:24 |
| russellb | regXboi: mestery yeah we were trying to get more debug about what ports hadn't been deleted, since we're seeing network delete failures regularly because of ports still being around | 21:24 |
| arosen | yea i wanna get a better look at what this hanging port looks like. | 21:25 |
| russellb | that's been the most common error i've seen | 21:25 |
| mestery | Cool | 21:25 |
| mestery | Looks like regXboi is looking at different issues | 21:25 |
| arosen | it's weird that it's not consistent. | 21:25 |
| Sam-I-Am | arosen: the port thing? | 21:25 |
| regXboi | arosen is looking at the same thing I'm coming at | 21:25 |
| arosen | the tempest test fails saying cannot delete network because of port. | 21:26 |
| Sam-I-Am | oh, i just got to what regXboi found | 21:26 |
| arosen | let me read up. | 21:26 |
| regXboi | you know ... looking at the logging, I almost think it's something related to DHCP | 21:27 |
| arosen | regXboi: the tear down class in tempest or? | 21:27 |
| regXboi | arosen: I've got three tear down classes yes, but I have one which is in deleting after setting up multiple NICs | 21:28 |
| Sam-I-Am | arosen: http://logs.openstack.org/21/269121/3/check/gate-tempest-dsvm-networking-ovn/7f8b60d/logs/tempest.txt.gz#_2016-01-19_20_36_09_102 | 21:28 |
| arosen | https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_v2.py#L369 <-- it will delete the dhcp port for you so it's not that one. | 21:28 |
| * arosen loading.... | 21:29 | |
| regXboi | interesting - *THAT* is a network with an IPv6 subnet | 21:29 |
| arosen | yup, that's from the client side. | 21:30 |
| regXboi | let's see if the client port that was created gets cleaned up | 21:30 |
| openstackgerrit | Russell Bryant proposed openstack/networking-ovn: Test patch, ignore. https://review.openstack.org/269897 | 21:31 |
| russellb | ^^^ that's for debugging a different thing | 21:31 |
| russellb | an error in ovn-controller i can't easily reproduce locally | 21:31 |
| regXboi | oh crap | 21:31 |
| regXboi | I think I know what this was | 21:31 |
| regXboi | note *was* | 21:32 |
| regXboi | russellb: the -dsvm- job runs which tests from neutron? | 21:33 |
| russellb | none | 21:33 |
| russellb | tempest only | 21:33 |
| regXboi | oh | 21:33 |
| regXboi | n/m then - I'm looking in the wrong place | 21:33 |
| Sam-I-Am | regXboi: what were you thinking? | 21:34 |
| regXboi | Sam-I-Am: I was thinking the router delete optimization that I pushed in a while back and the partial revert that addresses potential loss of events at the L3 agent | 21:34 |
| regXboi | because it looks like what's hanging the network delete is the router interface port | 21:34 |
| Sam-I-Am | this is using the l3 agent | 21:35 |
| regXboi | yes... it is | 21:35 |
| regXboi | and the partial revert merged earlier this week | 21:35 |
| Sam-I-Am | however i would think something that breaks the l3 agent here would also impact neutron | 21:35 |
| flaviof | off the wall question: is neutron l3_ha==True ? I've seen issues in 'other sdn controller (aka odl)' where it is very confused about the 'special network' as well as the tenant-id-less that it has. | 21:35 |
| flaviof | please ignore if this is not applicable! | 21:36 |
| regXboi | well - no - that was the thing - the neutron tests were modified to work with the optimized code | 21:36 |
| russellb | flaviof: i don't think we're setting it at all, so whatever that does | 21:36 |
| regXboi | and there was the potential for a race condition that carl_baldwin pointed out | 21:36 |
| Sam-I-Am | regXboi: this looks like a standard tempest test though, no? | 21:36 |
| regXboi | Sam-I-Am: per russellb, it is, so that is why I think I'm barking up the wrong tree | 21:36 |
| regXboi | I will be rechecking a patch set after the six fix merges ... just to see | 21:37 |
| regXboi | speaking of which, do we know the patchset with the six fix? | 21:37 |
| Sam-I-Am | trying to remember where that even went | 21:38 |
| *** shettyg has quit IRC | 21:39 | |
| russellb | mamulsow: I think we'll need to get to where we can generate a profile of ovn-controller so i don't just keep guessing at things. | 21:39 |
| *** shettyg has joined #openstack-neutron-ovn | 21:39 | |
| russellb | mamulsow: i haven't done profiling of a C app in a long time, but i used oprofile long ago... | 21:40 |
| * regXboi channels Geisel for a moment | 21:40 | |
| mamulsow | I was using poor-man's profiling by adding print statements so I could see which things were taking longest to run | 21:40 |
| *** SpamapS has joined #openstack-neutron-ovn | 21:40 | |
| russellb | mamulsow: :) | 21:40 |
| * regXboi is looking for the six fix that let's tox rok | 21:40 | |
| mamulsow | I can build debug and run a profiler against it if you want | 21:40 |
| mamulsow | I'm also happy just adding print statements to track it down | 21:41 |
| russellb | mamulsow: sure, that'd be great. i don't have instructions yet though | 21:41 |
| russellb | but if you know how, have at it! | 21:41 |
| *** roeyc has joined #openstack-neutron-ovn | 21:41 | |
| arosen | regXboi: from the logs i see the delete port call for the left over port come in after the delete network call which seems odd. I wonder if this is a possible race in tempest. Looking at how that works now. | 21:41 |
| russellb | arosen: weird | 21:42 |
| russellb | or does the delete come from nova? | 21:42 |
| Sam-I-Am | regXboi: https://review.openstack.org/#/c/269954/ | 21:42 |
| russellb | that could explain the race potential | 21:42 |
| regXboi | Sam-I-Am: that says merged, so let's run a recheck | 21:43 |
| regXboi | and away we go | 21:44 |
| Sam-I-Am | looks like pip revealed something we really shouldnt be doing | 21:44 |
| Sam-I-Am | however, this issue breaks things long before the port problem | 21:45 |
| Sam-I-Am | just would be nice to get past this one | 21:45 |
| regXboi | russellb: if I recall correctly, nova is used in some tempest tests rather than neutron | 21:47 |
| regXboi | I've argued against that in the past | 21:47 |
| russellb | i mean, if port delete is the result of deleting a nova VM | 21:47 |
| regXboi | especially for neutron jobs | 21:47 |
| russellb | so if tempest does delete_vm() then delete_network() | 21:47 |
| russellb | we could have this race, right? | 21:47 |
| * russellb just making guesses | 21:47 | |
| regXboi | well - the port I think I'm seeing is for the router interface. not a compute instance | 21:48 |
| regXboi | so I'm not so sure | 21:48 |
| Sam-I-Am | i'm looking in the l3 agent log | 21:48 |
| regXboi | but my memory says you are correct about nova | 21:48 |
| regXboi | oh my | 21:49 |
| regXboi | oh my oh my oh my | 21:49 |
| *** chandrav has quit IRC | 21:49 | |
| regXboi | could this be a side effect of the keystone middleware problem? | 21:49 |
| mestery | dum dum dum | 21:50 |
| mestery | :) | 21:50 |
| Sam-I-Am | regXboi: you mean things being really slow? | 21:50 |
| regXboi | if we are slowing things down, then that would exacerbate any race condition, wouldn't it? | 21:50 |
| Sam-I-Am | i think so | 21:50 |
| Sam-I-Am | maybe we need a neutron-slow job to check for races :) | 21:50 |
| regXboi | I don't know how to answer that | 21:51 |
| mestery | regXboi: Look for notmorgan's thread on that, he has a devstack patch | 21:51 |
| mestery | One way to verify would be to make one of our patches dependent on his | 21:51 |
| mestery | To see if that slowdown is the culprit | 21:51 |
| regXboi | mestery: I'm re-checking 269121 now that the six fix has merged | 21:51 |
| mestery | Ack | 21:52 |
| regXboi | and if it fails, I'll look at the failures and take it from there | 21:52 |
| Sam-I-Am | i thought the middleware patch merged | 21:55 |
| regXboi | Sam-I-Am: then this should pass ;) | 21:56 |
| *** shettyg has quit IRC | 21:58 | |
| Sam-I-Am | interestingly i dont see anything obvious in the keystonemiddleware repo | 21:58 |
| regXboi | well, the dsvm job is now running here: https://jenkins02.openstack.org/job/gate-tempest-dsvm-networking-ovn/88/console | 21:58 |
| regXboi | interestingly, job 84 three days ago passed | 21:59 |
| regXboi | it was for 268717 | 22:00 |
| regXboi | so the middleware cap failed on grenade-dsvm-multinode | 22:00 |
| mamulsow | russellb: sorry, I found out that there was other testing going on in the environment at that time | 22:00 |
| mamulsow | so I think the cause of the patched one being pegged at 100% was from that testing not your updates | 22:01 |
| russellb | mamulsow: no worries | 22:01 |
| russellb | ok, thanks | 22:01 |
| russellb | i haven't trashed the patch just yet :) | 22:01 |
| russellb | let me know if you get another chance to try and compare | 22:01 |
| *** roeyc has quit IRC | 22:02 | |
| mestery | russellb: Love it: "Grudgingly-Acked-by:" :) | 22:04 |
| regXboi | mestery: reference? | 22:04 |
| russellb | :-D | 22:04 |
| * regXboi wants to see that one | 22:04 | |
| russellb | http://openvswitch.org/pipermail/dev/2016-January/064745.html | 22:05 |
| mestery | russellb: Faster than me! :) | 22:05 |
| regXboi | rotflmao | 22:05 |
| *** roeyc has joined #openstack-neutron-ovn | 22:08 | |
| SpamapS | mamulsow: on an ovn related note.. testing 500 fake hypervisors with no neutron l2 agents results in a not-terribly-busy rabbitMQ. ;) | 22:09 |
| SpamapS | now.. does anybody know how to run more than 500 docker containers on a single box without exploding the networking stack? ;-) | 22:09 |
| russellb | heh, yes, not using rabbitmq tends to make rabbitmq less busy | 22:10 |
| russellb | (•_•) ... ( •_•)>⌐■-■ ... (⌐■_■) | 22:11 |
| mestery | SpamapS: I've heard this OVN thing may help with that ... | 22:11 |
| SpamapS | mestery: lies | 22:11 |
| mestery | lol | 22:11 |
| russellb | https://github.com/openvswitch/ovs/blob/master/INSTALL.Docker.md | 22:11 |
| russellb | not sure what you mean by "exploding the networking stack" though :) | 22:12 |
| * mestery pictures pieces of IP addresses laying all over SpamapS's office | 22:12 | |
| SpamapS | russellb: Oh I'm just using the default bridge networking in docker. At around 540 dockers running, all calls to socket() begin to fail | 22:12 |
| SpamapS | Yes there's netmasks and node numbers littered throughout | 22:13 |
| russellb | ¯\_(ツ)_/¯ | 22:13 |
| SpamapS | Each docker running a nova-compute configured for fakevirt | 22:14 |
| mestery | nice | 22:14 |
| SpamapS | so yeah, at some point socket() just says NOPE | 22:15 |
| SpamapS | ping, from outside the containers, for instance, does 'sendmsg: Invalid argument' | 22:15 |
| *** chandrav has joined #openstack-neutron-ovn | 22:15 | |
| SpamapS | because ping() just assumes socket() always works. ;) | 22:15 |
| openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: random test https://review.openstack.org/270502 | 22:17 |
| russellb | ooh random test | 22:17 |
| arosen | russellb: matching up the tempest logs I noticed that your query isn't returning any ports. Which seems weird. | 22:19 |
| arosen | Details: {u'detail': u'', u'message': u'Unable to complete operation on network b26280b9-6eff-4d1d-a2b1-ea2c1f59b0fd. There are one or more ports still in use on the network.', u'type': u'NetworkInUse'} | 22:19 |
| arosen | http://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/025fc59/logs/screen-q-svc.txt.gz#_2016-01-20_19_52_42_283 | 22:19 |
| arosen | see how the log statement: " *** DUMPING DETAILS OF PORT IN USE ***" doesn't come after that trace? | 22:20 |
| russellb | right.. | 22:20 |
| russellb | :-/ | 22:20 |
| regXboi | well well well | 22:20 |
| russellb | race? | 22:20 |
| russellb | sounds like port was gone by the time we queried again? | 22:20 |
| arosen | maybe. | 22:20 |
| regXboi | mestery: *most* of the linearity is in ovsdb, but there is a little bit in row by value 2 | 22:20 |
| russellb | arosen: just put a loop around it, lolz | 22:21 |
| arosen | thus ml2 was born. | 22:21 |
| russellb | that was my thought yes | 22:21 |
| arosen | born again* | 22:21 |
| arosen | :) | 22:21 |
| russellb | arosen: is nova deleting the port in question? or do you not know? | 22:22 |
| arosen | i need to look closer at that. | 22:22 |
| russellb | k, i'll keep looking too (tomorrow though, i need to leave in a few minutes) | 22:23 |
| *** gangil has joined #openstack-neutron-ovn | 22:25 | |
| arosen | i did find this in nova but not completely sure if it's related: http://logs.openstack.org/97/269897/1/check/gate-tempest-dsvm-networking-ovn/025fc59/logs/screen-n-cpu.txt.gz#_2016-01-20_19_51_49_286 | 22:27 |
| arosen | i don't think it is though. | 22:28 |
| regXboi | ok, this recheck is still going to fail | 22:28 |
| regXboi | on the multiple_nics_order test | 22:28 |
| regXboi | but ... the middleware patch is hung up as well | 22:28 |
| regXboi | so I will see about a depends on for it tomorrow morning | 22:29 |
| Sam-I-Am | regXboi: what was the middleware patch #? | 22:29 |
| regXboi | 270417 | 22:30 |
| Sam-I-Am | yeah, thats stuck | 22:31 |
| openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Add missing call to self._process_l3_delete() https://review.openstack.org/270509 | 22:34 |
| Sam-I-Am | regXboi: wonder if its worth rechecking | 22:35 |
| regXboi | Sam-I-Am: go ahead - I'm walking away shortly | 22:35 |
| regXboi | as in as soon as dinner beeps at me | 22:35 |
| russellb | thanks for the help everyone! i really appreciate it! | 22:35 |
| russellb | i'm out for now ... need to go home / eat / see family | 22:35 |
| Sam-I-Am | pffffft | 22:35 |
| mestery | have a good night regXboi | 22:35 |
| mestery | you too regXboi | 22:35 |
| Sam-I-Am | my family is going to do family things.... | 22:36 |
| arosen | enjoy, catch you guys later! | 22:36 |
| russellb | arosen: i see the cause of ovn-controller log spam, but i don't think it would cause any test failures. i think the log spam is the worst it's doing | 22:36 |
| Sam-I-Am | i'm not going anywhere! | 22:37 |
| arosen | k | 22:37 |
| arosen | i have a multi node setup locally of ovn running the latest code and it seems to be working fine. | 22:37 |
| arosen | with the latest code. | 22:37 |
| russellb | figures | 22:37 |
| regXboi | mestery: that was .... odd :) | 22:37 |
| arosen | or at least booting vms testing acls are working fine :) | 22:38 |
| mestery | arosen: We have a 125 node system doing the same and it's fine :) | 22:38 |
| arosen | i ran into a really annoying corner case yesterday that was very self inflicted. | 22:38 |
| arosen | i cloned a vm to add as a slave. | 22:38 |
| arosen | and that OVN_UUID that gets set in ovsdb as the system-id was the same on both systems. | 22:39 |
| arosen | that caused things not to work.. | 22:39 |
| arosen | it was funny the output of ovn-sbctl would only show one node and kept switching | 22:39 |
| mestery | arosen: Heh :) | 22:40 |
| russellb | ha, yes, ovn-controller on each node were fighting each other | 22:40 |
| mestery | There can be only one! | 22:40 |
| arosen | I pinged ben about the output and he gave me that hint | 22:40 |
| arosen | yup :) | 22:40 |
| russellb | i bet ovn-controller.log had "wtf?!" all in it | 22:40 |
| arosen | I bet others will hit this though. | 22:40 |
| mestery | rofl | 22:40 |
| russellb | hopefully the log had a hint though | 22:40 |
| arosen | nope it didn't have anything in the log. | 22:40 |
| russellb | oh, well that's lame | 22:40 |
| arosen | or at least nothing i saw. | 22:41 |
| russellb | you can file a bug if you'd like against networking-ovn | 22:41 |
| russellb | and i'll look eventually | 22:41 |
| russellb | or someone can | 22:41 |
| arosen | i think that would be against ovn though not networking-ovn? | 22:41 |
| russellb | yes | 22:41 |
| arosen | are we filing ovn bugs there too? | 22:41 |
| russellb | but ovn doesn't have a tracker .... | 22:41 |
| arosen | k | 22:41 |
| russellb | i'm using networking-ovn, or my own private trello ... | 22:41 |
| arosen | sounds good i can file a bug there. | 22:41 |
| russellb | k | 22:42 |
| russellb | alright i'm out for real | 22:42 |
| russellb | ttyl | 22:42 |
| arosen | later! | 22:42 |
| mamulsow | russellb: thanks for your help! | 22:42 |
| * regXboi also heads exit stage right | 22:42 | |
| *** regXboi has quit IRC | 22:42 | |
| *** roeyc has quit IRC | 22:47 | |
| *** palexster has quit IRC | 23:03 | |
| *** palexster has joined #openstack-neutron-ovn | 23:03 | |
| openstackgerrit | Aaron Rosen proposed openstack/networking-ovn: Add missing call to self._process_l3_update/delete() https://review.openstack.org/270509 | 23:14 |
| *** roeyc has joined #openstack-neutron-ovn | 23:42 | |
| *** chandrav has quit IRC | 23:51 | |
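
Editor's sketch of russellb's half-joking "just put a loop around it" idea from 22:21: if the leftover port is only deleted slightly after the delete-network call arrives, retrying the network delete a few times could paper over the race. This is a hedged illustration only, not networking-ovn or tempest code. `NetworkInUse` here is a hypothetical stand-in for the error quoted at 22:19, and `delete_network_with_retry`, `attempts`, `delay`, and `sleep` are names invented for this example.

```python
import time


class NetworkInUse(Exception):
    """Hypothetical stand-in for the NetworkInUse error quoted in the log."""


def delete_network_with_retry(delete_network, network_id,
                              attempts=5, delay=0.5, sleep=time.sleep):
    """Call delete_network(network_id), retrying while ports are still in use.

    Returns the number of the attempt that succeeded. Re-raises NetworkInUse
    if the last attempt still fails, so a genuinely stuck port is not hidden.
    """
    for attempt in range(1, attempts + 1):
        try:
            delete_network(network_id)
            return attempt
        except NetworkInUse:
            if attempt == attempts:
                raise
            # Give the in-flight port delete a moment to finish.
            sleep(delay)
```

A retry like this only hides the ordering problem; finding out who issues the late port delete (nova, tempest, or the router interface cleanup) is still the real fix being discussed above.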