21:02:06 #startmeeting Networking
21:02:07 Meeting started Mon Dec 23 21:02:06 2013 UTC and is due to finish in 60 minutes. The chair is markmcclain. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:10 The meeting name has been set to 'networking'
21:02:27 #link https://wiki.openstack.org/wiki/Network/Meetings
21:02:35 #topic Announcements
21:02:48 #info No meeting Jan 30th
21:03:05 Dec 30th?
21:03:10 oops :)
21:03:18 #info No meeting Dec 30th
21:04:47 Many folks will be on vacation, so pinging the mailing list is probably the best bet between now and the New Year
21:04:57 #link https://launchpad.net/neutron/+milestone/icehouse-2
21:05:23 We're 4 short weeks from I-2
21:05:48 We're in pretty decent shape blueprint-wise
21:05:58 the few that are not started should be started soon
21:06:03 #topic Bugs
21:06:21 anteaya has gone through and updated the gate-critical bugs
21:06:54 the current #1 bug is this one
21:06:55 https://bugs.launchpad.net/neutron/+bug/1253896
21:06:58 Launchpad bug 1253896 in neutron "Attempts to verify guests are running via SSH fails. SSH connection to guest does not work." [Critical,In progress]
21:07:48 salv-orlando has a devstack change which addresses this
21:07:53 https://review.openstack.org/#/c/63641/
21:08:18 this devstack change is a workaround for a potential issue in the ml2 plugin, for which I will file a bug shortly.
21:08:32 'potential' -> ''
21:08:50 fixed that for you ;)
21:09:02 I say potential because I'd like to have some feedback from ml2 developers. It won't be the first time I've found a red herring
21:09:12 or whatever the US English idiom for that is
21:09:29 rkukura: ^^^
21:09:35 salv-orlando: that's the correct idiom
21:10:01 well, via rkukura it seems that agent liveness is critical to port binding, so yeah, we have a problem
21:10:04 markmcclain: What's the question?
21:10:26 salv-orlando's proposed fix to work around an ml2 issue
21:10:44 rkukura: the ml2 issue we were just discussing - that port binding depends on agent liveness
21:11:01 Was just discussing this with marun. The agent-based mechanism drivers need to know whether or not the associated agent is on the node, and need to know if they have mappings for the physical networks.
21:11:02 rkukura: some details are in a comment in the bug report, but I will file a new bug report with more details.
21:11:22 rkukura: cool. it seems there is no bug in ml2 then?
21:11:35 salv-orlando: it's a bug, it just may be intentional
21:11:37 It currently uses the "liveness" notion, but maybe we need something more relaxed.
21:11:56 then we probably need to fail a little more explicitly if something is not live enough
21:12:05 i'd disagree
21:12:11 eventual consistency should be the goal
21:12:27 marun: agreed if we can actually achieve it
21:12:34 if we can't we need to fail faster
21:12:42 * salv-orlando thinks the current discussion is more suitable for openstack-neutron rather than openstack-meeting
21:12:46 ++
21:12:53 we need to be able to perform binding async and explicitly track success/failure
21:12:57 I will be around after the meeting
21:13:03 marun: We can't bind ports until we know something about the agent. I think the same thing applies with l3 agents - we can't schedule routers until we know what external network(s) they have
21:13:13 Any other bugs we need to discuss?
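For context on the binding discussion above, here is a minimal, purely illustrative Python sketch of the liveness-gated binding that rkukura describes; the names (try_bind_port, agents_db, the OVS agent type) are stand-ins, not the actual ml2 code.

```python
# Illustrative sketch only -- not the real ml2 driver. It shows the kind of
# liveness-gated port binding discussed above: an agent-based mechanism
# driver refuses to bind a port unless a live agent with a mapping for the
# segment's physical network has reported in from the target host.

AGENT_TYPE = 'Open vSwitch agent'  # assumed agent type for this example


def try_bind_port(context, agents_db):
    """Attempt to bind the port on its host; return True on success."""
    for agent in agents_db.get_agents(host=context.host,
                                      agent_type=AGENT_TYPE):
        if not agent['alive']:
            # This is the coupling under discussion: a dead (or not yet
            # reported) agent blocks binding entirely.
            continue
        mappings = agent['configurations'].get('bridge_mappings', {})
        for segment in context.segments_to_bind:
            if segment['physical_network'] in mappings:
                context.set_binding(segment['id'], vif_type='ovs')
                return True
    # With strict liveness the failure here is implicit; marun's point is
    # that binding should instead be async with success/failure tracked.
    return False
```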
21:13:35 #topic Nova Parity
21:13:38 beagles: hi
21:14:31 beagles: want to update us?
21:14:39 I was chatting with beagles a few minutes ago
21:14:41 hi.. so in parity-related efforts - we met and affirmed the drivers on the basic categories last week
21:14:42 he should be around
21:14:50 (outlined on wiki)
21:15:00 great
21:15:20 glad there are assigned drivers for the different areas
21:15:38 Anything else to add?
21:16:01 and we focused a great deal on getting a handle on the situation with the full tempest tests... which rossella_s will report on eventually as part of the neutron tempest topic
21:16:16 beagles: can you provide a link, sorry
21:16:24 we still need a driver/plan-of-attack for the multi-host question
21:16:37 I think it's this wiki page: https://wiki.openstack.org/wiki/NovaNetNeutronParity
21:16:38 dkehn: https://wiki.openstack.org/wiki/Network/Meetings
21:16:52 beagles: for routing, EmilienM and Sylvain are looking into it
21:16:58 beagles: sorry, guess that was obvious
21:17:04 I just summarized prior to the meeting :)
21:17:15 markmcclain: cool
21:17:32 I'll sync up with them and see what's going on.
21:17:37 sounds good
21:17:38 I think that's pretty much it for now
21:17:55 Thanks for updating; happy to see the progress the team is making
21:17:58 #topic Tempest
21:18:01 mlavalle: around?
21:18:06 yes
21:18:20 looks like we have lots of items in review
21:18:49 we will start with rossella_s giving an update on the progress with the neutron tempest full job
21:19:26 rossella_s: want to update?
21:19:38 Yes! beagles, mlavalle and I have been meeting daily to sync up and coordinate the efforts. We ran an analysis of the current test failures for the full neutron job. The situation is not bad. Few tests are failing; most of the failures are due to the timeout of the floating IP propagation. Since other people are working on it, we will follow up on their efforts and monitor their impact on the tests. On the nova parity side we will provide a list of the tests currently skipped for Neutron and start triaging them.
21:20:11 awesome
21:20:35 are you planning on putting the list in an etherpad?
21:20:56 markmcclain: yes
21:21:09 we will provide the link
21:21:13 great
21:21:21 the timeout of floating IP propagation is basically the well-known bug that nova takes a while to display neutron's floating IPs?
21:21:56 basically the thing where you need to refresh horizon a few times before you can see the floating IP in the instance list
21:22:30 salv-orlando: not sure it's the same
21:22:47 I think it is the same issue.
21:23:02 yeah, the cache in nova is updated at a fixed interval
21:23:21 and that is probably triggering the timeouts in tempest
21:23:49 salv-orlando: that is actually part of it.. yfried found that IP information was failing to propagate within the timeouts set during his cross-tenant connectivity check, hence the motivation for removing the check
21:24:10 I have a question about nova's network APIs which are currently not proxied into neutron, such as the ones for adding IPs. What should we do about them?
21:25:04 do you mean the ones like the bulk floating IP ones?
21:26:19 seems like we should expand tests to include the non-proxied versions of the API
21:26:52 Or is it better that tempest calls the neutron API to check a floating IP is available if neutron is enabled?
21:26:54 markmcclain: meaning that we should implement the proxy, I think.
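The suggestion at 21:26:52 — have tempest ask neutron directly instead of relying on nova's periodically refreshed network-info cache — could look roughly like the sketch below. The endpoint and credentials are placeholders; the calls used are python-neutronclient's v2.0 client API.

```python
# Sketch only: poll neutron for a floating IP's association rather than
# waiting for nova's fixed-interval cache refresh to surface it.
import time

from neutronclient.v2_0 import client as neutron_client


def wait_for_floating_ip(port_id, timeout=60, interval=2):
    neutron = neutron_client.Client(
        username='demo', password='secret',      # placeholder credentials
        tenant_name='demo',
        auth_url='http://127.0.0.1:5000/v2.0')   # placeholder endpoint
    deadline = time.time() + timeout
    while time.time() < deadline:
        # Ask the source of truth directly, filtered by the instance's port.
        fips = neutron.list_floatingips(port_id=port_id)['floatingips']
        if fips and fips[0]['floating_ip_address']:
            return fips[0]['floating_ip_address']
        time.sleep(interval)
    raise RuntimeError('floating IP not associated within %ss' % timeout)
```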
21:27:10 salv-orlando: good point
21:27:54 which leads me to the follow-up question that I never understood: is compatibility with the nova-network API a requirement for nova-network deprecation?
21:29:22 yes, tools that work with the nova API should work when neutron is the provider
21:29:31 I think it is during the deprecation period, at the least.
21:29:52 in the situations where we can't provide compatibility we need to document the workaround or how to solve the use case
21:29:52 ok, so we should proxy 100% of calls.
21:30:21 or at least aim for that. This should create a plethora of low-hanging-fruit tasks.
21:30:56 yes it should
21:31:04 yup, there is an overlap with the triaging of skipped tests that rossella_s reported earlier
21:31:36 rossella_s: ^^^^ this is key to the work we are doing with the tempest job and the skipped tests
21:32:36 yes!
21:32:57 I think another key task is parallel testing, for which there are a few patches under review. Let me know if it's ok to give an update now or if it's better to wait for open discussion
21:33:18 parallel testing falls into this topic
21:33:24 so go ahead and update
21:33:48 ok, so basically we have 5 patches under review: 4 neutron, 1 tempest
21:34:03 we're getting a good deal of reviews, but beyond me and marun, no core devs.
21:34:24 do the tempest patch or the neutron patches need core attention?
21:34:33 all patches
21:34:54 but all these patches do not guarantee a success rate anywhere close to 100%
21:35:02 ok
21:35:15 what is the rate?
21:35:29 I have about 80% on my dev machine
21:35:51 ok
21:35:58 The problem is that ovs commands slow down under load, so wiring takes longer and happens after the vm sends the first DHCPDISCOVER
21:36:15 then the vm sends the second DHCPDISCOVER 60 seconds later, and that's too late
21:36:21 I have a few options for you:
21:36:28 salv-orlando: are you using multiple api workers?
21:36:33 - change cirros to send a DHCPDISCOVER more frequently
21:36:36 marun: it's the agent.
21:36:39 not the api
21:37:00 multiple api workers won't make ovs-vsctl and ovs-ofctl faster
21:37:42 salv-orlando: sorry for diverting, just want to point out that if multiple api workers are used the number of potential races increases.
21:37:51 what are the other options?
21:37:53 another option is not to bother trying the ssh connections until all the ports are ACTIVE, which might be tricky in scenarios like test_snapshot_pattern which apply to both nova-net and neutron
21:38:15 marun: the gate does not yet use multiple api workers, unless I missed that change
21:38:30 and finally the last option is what I'm experimenting with now: ovs 2.0
21:38:35 salv-orlando: correct, i wasn't sure if you were using it locally is all.
21:38:39 with multithreaded vswitchd
21:39:04 salv-orlando: how does it look?
21:39:14 it looks like devstack does not start so far :(
21:39:21 :(
21:39:24 so I don't know yet
21:39:52 salv-orlando: what about the alternative - managing ovs state through a monitor to minimize ovs invocations?
21:39:58 but I just wanted to ask you whether you see any of the above options as totally unfeasible
21:40:29 it's not about monitoring ovs, but acting on it. The slow commands are the ones which destroy/setup flows and configure the data plane
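marun's async-monitor alternative could, as a rough sketch, look like the following: one long-lived ovsdb-client monitor subprocess instead of repeated ovs-vsctl polling. The column list and stream parsing are simplified assumptions, a direction rather than a worked design.

```python
# Rough sketch of the "async monitor" idea: keep one ovsdb-client monitor
# process open and react to Interface rows as ovsdb reports them, instead
# of shelling out to ovs-vsctl in a loop.
import json
import subprocess


def watch_interfaces(on_change):
    proc = subprocess.Popen(
        ['ovsdb-client', 'monitor', 'Interface', 'name,ofport',
         '--format=json'],
        stdout=subprocess.PIPE)
    for line in iter(proc.stdout.readline, b''):
        try:
            update = json.loads(line)
        except ValueError:
            continue  # skip partial or non-JSON lines in the stream
        on_change(update)  # e.g. mark a port wired once its ofport appears


if __name__ == '__main__':
    watch_interfaces(lambda update: print(update))
```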
21:40:49 the only one that might be problematic is waiting for the ports to become active
21:40:51 but if there's a way to use an async monitor for it
21:41:43 increasing the DHCPDISCOVER frequency in cirros might not be so easy either
21:41:49 so let's see how it goes with ovs 2.0
21:42:11 otherwise… what about the linux bridge agent :)
21:42:30 or the HDN agent?
21:42:49 yeah, let's work towards 2.0 and see what happens
21:43:03 and if not, let's kick off a ML discussion since we're missing so many folks today
21:43:15 mlavalle: Any other tempest items to discuss?
21:43:18 on the api test gap analysis front, I had a conversation earlier today with sdague. We are not developing negative api tests manually anymore. There will soon be a generative tool that does this automatically. This greatly simplifies the gap analysis and the tests we need to implement. I will update the etherpad accordingly
21:43:41 good to know
21:43:50 finally, I am keeping tabs on people developing tests, as you can see in today's agenda
21:43:58 when the pad is updated, can you ping the list to let everyone know of the change?
21:44:06 I will
21:44:24 that's all I have
21:44:29 thanks for updating
21:45:32 Looks like all of the subteam reports are unchanged from last week
21:45:37 #topic Open Discussion
21:46:17 i'm making good progress on a design doc for an eventually consistent dhcp agent
21:46:29 will you allow me to be the usual pedantic annoying axxxxxe even if it's christmas?
21:46:29 marun: great news
21:46:41 salv-orlando: yes
21:46:56 devstack migration failures and the merging of https://review.openstack.org/#/c/61663/
21:46:57 can we just stop letting third-party CIs vote if they're still WIPs?
21:47:06 yes
21:47:20 need some guidance on backporting a fix to havana. Can somebody help, please?
21:47:28 #action markmcclain to reach out to misbehaving 3rd-party systems
21:48:06 irenab: do you need help with the process?
21:48:15 yes
21:48:19 irenab: https://wiki.openstack.org/wiki/StableBranch#Workflow
21:48:39 markmcclain: thanks
21:48:51 dkehn: the patch needs to fix trunk vs revising released migrations
21:49:09 thx
21:49:12 revising released migrations causes problems for deployers
21:50:28 markmcclain: all I know is that you can't pull from trunk and get devstack to work
21:50:52 dkehn: you sure? we have items passing the gate each day
21:50:59 and that's exactly what they do
21:50:59 markmcclain: i have that same problem
21:51:11 same
21:51:28 markmcclain: whatever the gate is doing, it's not replicating developer use well enough to prevent breakage
21:51:36 markmcclain: this is when you run locally
21:52:08 bingo
21:52:10 oh, i know what it is
21:52:12 damnit
21:52:14 it's lbaas
21:52:29 marun: ???
21:52:31 the gate works because it enables lbaas by default
21:52:34 marun: explain
21:52:42 ok
21:52:50 enabling lbaas ensures that a migration that fixes the problem is run
21:53:10 makes sense
21:53:12 for those of us who haven't configured lbaas, devstack is broken at present
21:53:20 I test with lbaas running locally
21:53:43 so, at least we have a workaround. but a fix is still needed.
21:53:48 right
21:53:49 marun is correct, I encountered the same problem. There is a specific migration which so far is triggered only if the lbaas service plugin is enabled
21:55:48 well, that should make it easier to craft a good set of changes to fix the bad migrations
21:56:38 Any other items for open discussion?
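To illustrate why enabling lbaas "fixes" devstack: neutron migrations of this era were gated per plugin, so a migration whose plugin list names only the lbaas plugin is skipped entirely when lbaas is not configured. The revision ids and column change below are made up; only the should_run() guard pattern is the point.

```python
# Hedged sketch of a plugin-gated neutron migration (fake revision ids).
revision = 'abc123'        # placeholder
down_revision = 'def456'   # placeholder

migration_for_plugins = [
    'neutron.services.loadbalancer.plugin.LoadBalancerPlugin',
]

from alembic import op
import sqlalchemy as sa

from neutron.db import migration


def upgrade(active_plugins=None, options=None):
    if not migration.should_run(active_plugins, migration_for_plugins):
        return  # skipped when lbaas is off -- hence the local breakage
    # Example schema change; the real migration's contents would go here.
    op.add_column('sometable', sa.Column('newcol', sa.String(255)))
```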
21:56:48 so basically lbaas MUST be enabled for devstack to work locally
21:57:46 dkehn: right, let's discuss in -neutron after the meeting
21:57:54 ok
21:58:36 I think this also affects production deployments, not just devstack.
21:58:45 salv-orlando: right
21:59:18 We're out of time this week, so we'll jump into -neutron to continue this discussion
21:59:19 and I have no idea how to backport without creating a branch in alembic migrations
21:59:26 k, see you there
21:59:51 Expect the community to be pretty quiet over the next 10 days, so email might be the best way to communicate since many will be on holiday
22:00:58 For those traveling for the holidays.. safe travels, and our next meeting will be Jan 6th
22:01:04 #endmeeting
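On the backport concern raised at 21:59:19, a small illustration with fake revision ids: alembic chains migrations through down_revision, so a stable-only revision forks the history and leaves two heads.

```python
# Why a naive stable backport branches the migration history:
#
#   master:          havana_release -> icehouse_1 -> icehouse_2
#   naive backport:  havana_release -> havana_fix        <-- second head!
#
# `alembic upgrade head` then fails because two heads exist.

# stable/havana/versions/havana_fix.py (illustrative skeleton)
revision = 'havana_fix'          # fake id
down_revision = 'havana_release' # fake id; points at a released revision


def upgrade():
    pass  # the backported schema change would go here
```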