Friday, 2016-06-24

openstackgerrit OpenStack Proposal Bot proposed openstack/networking-ovn: Updated from global requirements
openstackgerrit li,chen proposed openstack/networking-ovn: Remove 'origin/' in OVN_BRANCH
rtheisFYI: gate-tempest-dsvm-networking-ovn is failing consistently, I'm looking into it now.12:24
regXboiugh, it looks like the n-ovn gate is ill13:04
regXboigate-tempest-dsvm-networking-ovn appears to be failing on test_rebuild_server_with_personality13:05
rtheisindeed it is13:13
rtheisregXboi: any ideas on what may be causing this?13:15
regXboirtheis: no clue13:21
rtheisme either13:21
regXboirtheis: I can't believe that rearranging comment blocks would trip it suddenly13:21
regXboirtheis: so I'm wondering if it's something upstream or something that merged and we missed13:21
rtheisI see the failure on another patch set too13:21
regXboilet's pull out logstash13:22
rtheisregXboi: I didn't see anything obvious in neutron commits13:22
regXboiyeah, that's what bothers me13:22
regXboilogstash says it's only showing up in our jobs13:24
regXboiin the past 7 days13:24
regXboiso that says something merged that broke things13:24
russellbseems to be every run against networking-ovn right now13:29
regXboiyep... let's see if it's geographic13:30
rtheisregXboi: how do you check logstash to find this information?13:31
regXboinope ... it's spread around the clouds13:31
regXboirtheis: what I did was to search on the following:13:31
regXboimessage:"Current status: REBUILD. Current task state: rebuild_spawning"13:31
regXboiand then set the time to 7 days to 5 minutes ago13:31
regXboiand then start looking at the micro-analysis13:31
*** numans has quit IRC14:48
mesteryrtheis: Thanks for looking into this!14:50
rtheismestery: yw, I reverted nova commits to a couple days ago and now it works14:54
mesteryrtheis: Well, there you go :)14:54
* rtheis thinks it is related to neutron provisioning blocks for port activation16:08
mesteryrtheis: Still thinking it's port activation?16:55
rtheisI do16:55
rtheisbut trying to figure out why ovs doesn't have the same problem since it also uses provisioning blocks16:56
regXboicould the order of events be different in the two scenarios?16:57
* regXboi just guessing at this point16:57
rtheisI hacked something that worked with OVN by removing the provisioning block and immediately marking port as active on binding16:57
russellbwe can do that temporarily if needed to unblock the gate16:58
russellbthough in theory the 'up' state is supposed to give us what we need, right?16:58
rtheisI think so16:58
rtheisI'm studying an ovs environment now16:59
*** salv-orl_ has joined #openstack-neutron-ovn16:59
rtheisregXboi: I think the ordering may be part of it...ovs seems to activate faster and may let nova clear vif binding phase faster17:02
*** salv-orlando has quit IRC17:02
regXboirtheis: I can see that17:02
*** brad_behle has quit IRC17:02
rtheisI'm going to try delaying ovs port activation to see if I can recreate there17:03
azbiswasOVN should be able to mark the port as "up/active" within the timeframe of the timeout.17:16
rtheisIt does17:17
azbiswasso the problem is that neutron doesn't see it?17:17
rtheisneutron does see it but nova doesn't get the event17:19
russellbis always up?  or never changes to up?  or?19:00
*** yamamoto has quit IRC19:00
rtheislaunch instances will go from down to up19:00
*** yamamoto has joined #openstack-neutron-ovn19:00
rtheisrebuild instance doesn't trigger port going down19:00
rtheisthus no up event19:01
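The reasoning in the messages above can be modeled with a small sketch. It assumes, purely for illustration, that a notification is sent only when a port's reported state changes from down to up. The function name `up_events` and the state lists are hypothetical and are not taken from nova, neutron, or OVN code.

```python
def up_events(initial_state, states):
    """Return the indices in `states` where a down -> up transition occurs.

    Assumption for illustration: a notification fires only on a
    down -> up transition, never for a port that simply stays "up".
    """
    events = []
    previous = initial_state
    for index, state in enumerate(states):
        if previous == "down" and state == "up":
            events.append(index)
        previous = state
    return events


# Launch: the port starts down and then comes up, so one event fires.
launch = up_events("down", ["down", "up"])

# Rebuild with the bug: the port never reports down, so no event fires.
rebuild_broken = up_events("up", ["up", "up"])

# Rebuild as expected: the port goes down and comes back up.
rebuild_expected = up_events("up", ["down", "up"])

print(launch, rebuild_broken, rebuild_expected)
```

In this model the broken rebuild produces no event at all, which matches the symptom described: the waiting side never hears that the port came up.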
*** regXboi has joined #openstack-neutron-ovn19:01
russellbif rebuild means deleting a port from br-int, and then re-adding it, seeing down then up is what i would expect19:02
russellbunless the system is so overloaded it's recreated before OVN notices and reports all of it happening19:02
russellbmaybe changing to down is broken in general ... i can check19:02
rtheisdown seems to be broken19:03
russellbyay ovn bug19:03
russellbyou just saved me from working on a document19:03
rtheisI just shut down an instance and the port still reports up19:03
russellbovn-controller isn't clearing the chassis column on Port_Binding when a port gets deleted19:04
russellbthis would be handled in ovn/controller/binding.c for those following along at home19:04
rtheisrecent commit changed this?19:04
regXboiok, so I found something real - I guess that was worth something :)19:05
regXboium yeah19:05
regXboiso I've likely broken it ;)19:05
regXboimestery: I'll have to take a pass on that performance stuff19:05
regXboibecause ovn/controller/binding.c - it *has* changed19:05
regXboiand recently too19:05
russellbyep, probably regXboi's patch19:06
regXboiok, so this is on my place to go look19:06
mesteryNice work regXboi, we let you get a commit into OVS and you break everything19:07
regXboimestery: it's that anti-midas touch19:07
mesteryI think it is19:07
regXboiso, russellb, I may need to ask some questions here as to what triggers the down event19:09
* rtheis just glad I don't have to fix neutron port provisioning in networking-ovn again 19:09
russellbPort_Binding chassis column being set to empty19:09
russellbovn-controller isn't properly handling the ovs port going away19:09
russellband then updating Port_Binding to clear chassis as a result19:09
regXboirussellb: ok, let me look at the code some more - I thought I caught that correctly19:10
russellbi'll add a test case for what's broken too19:12
regXboiactually, to ovs unit tests?19:12
russellbto ovs, yes19:12
regXboiack - that will help19:12
regXboiI'll base the fix on that patch19:13
regXboirussellb: you are talking about this piece of code from before the patch, yes?19:15
*** salv-orlando has joined #openstack-neutron-ovn19:16
regXboi            if (ctx->ovnsb_idl_txn) {19:16
regXboi                VLOG_INFO("Releasing lport %s from this chassis.",19:16
regXboi                          binding_rec->logical_port);19:16
regXboi                sbrec_port_binding_set_chassis(binding_rec, NULL);19:16
regXboi            }19:16
russellbyes, that's what would trigger "up" being set to false in Logical_Switch_Port of OVN_Northbound19:16
russellbwhen ovn-northd sees that change happen19:17
regXboiah - I see what's wrong - it's in the wrong place19:17
regXboimy bad19:17
regXboior I should say, it isn't in enough places19:18
*** a_ta has joined #openstack-neutron-ovn19:20
*** azbiswas has quit IRC19:20
regXboiso this is going to be tricky19:25
regXboithe issue is that the list of ports coming out of the iface_ids has changed19:25
regXboiand I wasn't looking to check that19:25
*** a_ta has quit IRC19:26
regXboirussellb: would it be ok to persist the list of local iface ids and use that to handle this case?19:26
russellbi guess so19:27
russellbi'm not deeply familiar with your changes19:27
regXboilet me put together the change and make sure it passes your test and then let's find somebody to help push it in19:28
regXboirussellb: the problem is that this code is looking for changes in the port binding table - it's not correctly handling changes in the local port list19:29
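The idea of persisting the local iface ids can be sketched as a toy model. This is not the patch that was posted (the log does not show it); it only illustrates comparing the previous local port list with the current one and releasing bindings for ports that disappeared. All names here (`released_ports`, `run_iteration`, `hv1`, the dictionary layout) are hypothetical.

```python
def released_ports(previous_iface_ids, current_iface_ids):
    """Ports that were local on the last pass but are gone now."""
    return set(previous_iface_ids) - set(current_iface_ids)


def run_iteration(previous_iface_ids, current_iface_ids, port_bindings, chassis):
    """Clear the chassis for ports that vanished from the local port list.

    `port_bindings` maps a logical port name to a chassis name or None,
    standing in for the chassis column of Port_Binding.
    Returns the set to persist for the next pass.
    """
    for lport in sorted(released_ports(previous_iface_ids, current_iface_ids)):
        # Only release bindings that this chassis actually owns.
        if port_bindings.get(lport) == chassis:
            port_bindings[lport] = None
    return set(current_iface_ids)


bindings = {"lport1": "hv1", "lport2": "hv1"}
persisted = run_iteration({"lport1", "lport2"}, {"lport2"}, bindings, "hv1")
print(bindings, persisted)
```

The point of the persisted set is that a deleted port no longer shows up in the current list, so without remembering the previous list there is nothing left to trigger the release.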
*** yamamoto has quit IRC19:29
regXboiok russellb, I have a patch to test - do you have tests for it?19:48
russellbregXboi: i do19:48
regXboihas it merged already or is it just in the queue?19:48
russellblocal, i was about to post to the list19:48
regXboiok, post it to the list with the comment that it will fail if merged19:49
russellbone sec and i'll just post ... i have a test case and the reverts necessary to make it pass (not that we need to merge the reverts, just showing the issue)19:49
regXboiack - hold the reverts in case my idea doesn't work19:49
regXboior post the reverts with the comment that I'm working the patch to fix19:50
russellbposted reverts with comment that you're working on fix19:52
russellb+ patch with a test case19:52
russellbyou can also test manually by doing something like ....19:52
russellb$ make sandbox SANDBOXFLAGS="--ovn"19:52
russellb$ ovn/env1/setup.sh19:52
russellb$ ovn-nbctl lsp-get-up sw0-port119:53
russellb$ ovs-vsctl del-port br-int lport119:53
russellb$ ovn-nbctl lsp-get-up sw0-port119:53
russellb^^^^^ will be "down" when fixed19:53
russellb"up" while still broken19:53
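The expected outcome of the manual check above can be written down as a small helper. It assumes the two arguments are the text printed by the two `ovn-nbctl lsp-get-up sw0-port1` calls, before and after the `del-port`, and that the port reports "up" before the deletion. The function name `classify_manual_check` is hypothetical.

```python
def classify_manual_check(before_delete, after_delete):
    """Interpret the two lsp-get-up outputs from the manual test.

    Per the description in the log: "down" after the delete means fixed,
    "up" after the delete means still broken.
    """
    before = before_delete.strip()
    after = after_delete.strip()
    if before != "up":
        # The check only makes sense if the port was up to begin with.
        return "inconclusive"
    if after == "down":
        return "fixed"
    if after == "up":
        return "broken"
    return "inconclusive"


print(classify_manual_check("up\n", "up\n"))
print(classify_manual_check("up\n", "down\n"))
```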
regXboiok, cool - I'm running compile and unit tests to make sure I have everything else fixed19:53
regXboier I didn't break anything else19:53
regXboiand then I'll try that and if good, I'll post19:54
* russellb goes back to doc hacking....19:54
*** a_ta has joined #openstack-neutron-ovn19:55
regXboime goes to clean up mess he made19:55
russellball good19:55
russellbour CI did its job!19:55
mesteryYay to the CI!19:56
regXboiyeah, but an OVS test would have been better19:57
regXboiso we'll get that in too19:57
regXboiok, existing unit tests running19:59
regXboirussellb: it passes the above test :)20:00
regXboinow to make sure I didn't break anything else and I'll send it up20:01
russellbpasses the new test i added?20:01
regXboiI mean the by-hand test you gave20:01
russellbohhh, got it20:01
russellbthat's a good sign20:01
regXboiI'm running it through the rest of the unit tests to make sure I didn't break something else20:01
regXboiand then I'll put it on top of your test and try it again20:01
russellbi double checked that the manual test i gave here does fail without changes20:03
russellbso yeah, you probably got it20:03
regXboiwell, I'm being paranoid20:03
regXboihaving broken it once :)20:04
regXboiugh... the Flow IPFIX sanity check test is just sitting here20:08
regXboiso let's take the chance and put your patch in20:09
russellbmake check TESTSUITEFLAGS="-k ovn"20:14
russellbthat's exhaustive enough :-)20:14
regXboithe new test is in the e2e space?20:15
* regXboi hates raceful tests20:16
regXboi2065: ovn -- port state up and down                   ok20:16
regXboinot so fast - had an e2e case fail20:16
regXboineed to make sure it's a race case20:16
russellbyeah, the issue is ovn-controller, but the test is the state reflected up through the northbound db20:16
russellbthe same way networking-ovn sees the failure20:16
regXboiaw come on... you can't be really failing on me, can you?20:17
russellbi suggest a hammer20:18
regXboiyou can, but I think I've broken 3 HVs, 3 LS, 3 lports/LS, 1 LR20:19
*** fzdarsky has joined #openstack-neutron-ovn20:31
lrichardmeanwhile, the last 3 travis-ci builds failed with:20:33
lrichard1051: ofproto-dpif - Flow IPFIX sanity check20:33
lrichardNo output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.20:33
lrichardThe build has been terminated20:33
regXboithat's where I was failing, so that's not on me :)20:34
russellblrichard: hm, i think an ipfix related patch just merged?20:34
lrichardrussellb: yes, the first of these was an ipfix commit20:34
lrichardtrying to reproduce now20:34
russellbyay Friday!20:34
regXboino kissinf20:34
regXboier kidding20:35
lrichardipfix is unhappy 1052: ofproto-dpif - Flow IPFIX sanity check - tunnel set FAILED (
regXboi2056: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR           ok20:37
regXboigit status20:37
regXboiOn branch test20:37
regXboinothing to commit, working directory clean20:37
regXboiok sending the patch, russellb20:37
regXboiyou are
regXboiack sent20:39
regXboiand it's landed
mesteryNice work regXboi regXboi lrichard :)20:42
*** yamamoto has quit IRC20:42
mesteryTeamwork #ftw!20:42
russellbregXboi got double credit!!20:42
* mestery gives regXboi a gold star20:42
regXboimestery: not me twice: russellb and rtheis20:42
mesteryOh yes!20:42
* mestery hands rtheis two gold stars20:43
mesteryHe was the one who found the regression :)20:43
regXboibut #teamwork_rocks20:43
rtheisyes #teamwork_rocks20:43
regXboinow can I get an acked by :)20:47
*** fzdarsky has quit IRC20:48
* flaviof re-reading buffer20:49
flaviofregXboi: i see the race too. I normally run the racy test and see it pass: make check TESTSUITEFLAGS="$TESTNUMBER"20:50
regXboiflaviof: yes, but this one took a *LOT* of re-runs - I was into looking for what I broke20:51
regXboiwhich made no sense to me20:51
* regXboi suspects there may be other races I'll be uncovering .... :-/20:52
* flaviof never run ofproto*.at .... normally stay on "-k ovn"20:52
regXboiflaviof: I tend to run everything - just in case20:54
russellbregXboi: i'll let ben review your fix since it isn't a trivial fix and he reviewed the original code20:55
russellbthanks for working on it quickly :)20:55
regXboirussellb: sure20:58
regXboirussellb: besides, as the committer that pushed it in, it does fall to him if I hadn't had a patch ready20:59
regXboibut "I broke it, I fix it" is how I roll20:59
flaviofregXboi: if you are not breaking anything, then you are not doing anything21:03
regXboiwell, that's always a risk21:04
regXboiand I can say that it did pass unit tests, but that's cold comfort21:04
rtheisrussellb: OpenStack mailing list has a python 3 discussion, and I noticed networking-ovn doesn't have a py34 job.  Is it time to add one?21:05
russellbyes, we should21:05
russellbthere's a version of the ovs python lib on pypi that works with py3 now21:05
russellbso we should be able to21:05
rtheiscool, I'll open a bug to track it21:06
russellbhave a nice weekend, everyone21:11
rtheisyou too21:13
regXboiwell.. he's gone, but the patch to fix things landed at OVS21:13
regXboirtheis, shall we try a recheck to see what we see?21:14
rtheisrechecking now21:14
regXboiI'm running a recheck on 33075121:14
*** yamamoto has quit IRC21:15
regXboiand have them queued up on the status page21:15
*** banix has quit IRC21:35
regXboijobs just got re-queued21:35
* rtheis runs manual tests now21:41
rtheisoh yeah...port status is now down when shutting down the instance21:41
regXboiso hopefully this will pass21:42
rtheismanual recreate scenario using rebuild also passes21:44
*** a_ta has quit IRC21:44
rtheisI think we are good to go21:44
rtheisregXboi: have a good weekend21:44
