#openstack-meeting log

21:02:06 <markmcclain> #startmeeting Networking
21:02:07 <openstack> Meeting started Mon Dec 23 21:02:06 2013 UTC and is due to finish in 60 minutes.  The chair is markmcclain. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:10 <openstack> The meeting name has been set to 'networking'
21:02:27 <markmcclain> #link https://wiki.openstack.org/wiki/Network/Meetings
21:02:35 <markmcclain> #topic Announcements
21:02:48 <markmcclain> #info No meeting Jan 30th
21:03:05 <beagles> Dec 30th?
21:03:10 <markmcclain> oops :)
21:03:18 <markmcclain> #info No meeting Dec 30th
21:04:47 <markmcclain> Many folks will be on vacation so pinging the mailing list is probably the best bet between now and the New Year
21:04:57 <markmcclain> #link https://launchpad.net/neutron/+milestone/icehouse-2
21:05:23 <markmcclain> We're 4 short weeks from I-2
21:05:48 <markmcclain> We're in pretty decent shape blueprint wise
21:05:58 <markmcclain> the few that are not started should be started soon
21:06:03 <markmcclain> #topic bugs
21:06:21 <markmcclain> anteaya has gone through and updated the gate critical bugs
21:06:54 <markmcclain> the current #1 bug is this one
21:06:55 <markmcclain> https://bugs.launchpad.net/neutron/+bug/1253896
21:06:58 <uvirtbot> Launchpad bug 1253896 in neutron "Attempts to verify guests are running via SSH fails. SSH connection to guest does not work." [Critical,In progress]
21:07:48 <markmcclain> salv-orlando has a devstack change which addresses this
21:07:53 <markmcclain> https://review.openstack.org/#/c/63641/
21:08:18 <salv-orlando> this devstack change is a workaround on a potential issue in the ml2 plugin for which I will file a bug shortly.
21:08:32 <marun> 'potential' -> ''
21:08:50 <marun> fixed that for you ;)
21:09:02 <salv-orlando> I say potential because i'd like to have some feedback from l2 developers. It won't be the first time I've found a red herring
21:09:12 <salv-orlando> or whatever the US english idiom for that is
21:09:29 <markmcclain> rkukura: ^^^
21:09:35 <marun> salv-orlando: that's the correct idiom
21:10:01 <marun> well, via rkukura it seems that agent liveness is critical to port binding so yeah, we have a problem
21:10:04 <rkukura> markmcclain: What's the question?
21:10:26 <markmcclain> salv-orlando's proposed fix to workaround a ml2 issue
21:10:44 <marun> rkukura: the ml2 issue we were just disussing - that port binding depends on agent liveness
21:10:53 <marun> discussing
21:11:01 <rkukura> Was just discussing this with marun. The agent-based mechanism drivers need to know whether or not the associated agent is on the node, and need to know if they have mappings for the physical networks.
21:11:02 <salv-orlando> rkukura: some details are in a comment in the bug reports, but I will file a new bug report with more details.
21:11:22 <salv-orlando> rkukura: cool. it seems there is no bug in ml2 then?
21:11:35 <marun> salv-orlando: it's a bug, it just may be intentional
21:11:37 <rkukura> It currently uses the "liveness" notion, but maybe we need something more relaxed.
21:11:56 <markmcclain> then we probably need to fail a little more explicitly if something is not live enough
21:12:05 <marun> i'd disagree
21:12:11 <marun> eventual consistency should be the goal
21:12:27 <markmcclain> marun: agreed if we can actually achieve it
21:12:34 <markmcclain> if we can we need to fail faster
21:12:40 <markmcclain> s/can/can't/
21:12:42 * salv-orlando thinks the current discussion is more suitable for openstack-neutron rather than openstack-meeting
21:12:46 <markmcclain> ++
21:12:53 <marun> we need to be able to perform binding async and explicitly track success/failure
21:12:57 <salv-orlando> I will be around after the meeting
21:13:03 <rkukura> marun: We can't bind ports until we know something about the agent. I think the same thing applies with l3 agents - we can't schedule routers until we know what external network(s) they have
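A minimal sketch of the binding precondition discussed above, assuming a simplified model in which the server stores a last-heartbeat timestamp and a set of physical network mappings per agent. `AgentState`, `can_bind`, and the 75-second `down_time` are hypothetical names and values chosen for illustration, not the ml2 implementation. The `require_liveness` flag shows what a "more relaxed" check could look like.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical record of what the server knows about an L2 agent."""
    host: str
    last_heartbeat: float  # seconds, on the same clock as `now`
    physnet_mappings: set = field(default_factory=set)


def is_alive(agent, now, down_time=75.0):
    """Treat the agent as live if it reported within `down_time` seconds."""
    return (now - agent.last_heartbeat) <= down_time


def can_bind(agent, physnet, now, require_liveness=True, down_time=75.0):
    """Decide whether a port on `physnet` may be bound via `agent`.

    The agent must exist on the node and have a mapping for the physical
    network. Liveness is only enforced when `require_liveness` is True.
    """
    if agent is None:
        return False
    if physnet not in agent.physnet_mappings:
        return False
    if require_liveness and not is_alive(agent, now, down_time):
        return False
    return True
```

With `require_liveness=False` a stale heartbeat no longer blocks binding, but the mapping check still applies, which matches the point that something must be known about the agent before a port can be bound.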
21:13:13 <markmcclain> Any other bugs we need to discuss?
21:13:35 <markmcclain> #topic Nova Parity
21:13:38 <markmcclain> beagles: hi
21:14:31 <markmcclain> beagles: want to update us?
21:14:39 <salv-orlando> I was chatting with beagles a few minutes ago
21:14:41 <beagles> hi.. so in parity related efforts - we met and affirmed the drivers on the basic categories last week
21:14:42 <salv-orlando> he should be around
21:14:50 <beagles> (outlined on wiki)
21:15:00 <markmcclain> great
21:15:20 <markmcclain> glad there are assigned drivers for the different areas
21:15:38 <markmcclain> Anything else to add?
21:16:01 <beagles> and we focused a great deal on getting a handle on the situation with the full tempest tests... which rossella_s will report on eventually as part of the neutron tempest thing
21:16:16 <dkehn> beagles: can you provide link, sorry
21:16:24 <beagles> we still need a driver/plan-of-attack for the multi-host question
21:16:37 <salv-orlando> I think it's this wiki page: https://wiki.openstack.org/wiki/NovaNetNeutronParity
21:16:38 <beagles> dkehn: https://wiki.openstack.org/wiki/Network/Meetings
21:16:52 <markmcclain> beagles: for routing EmilienM and Sylvain are looking into it
21:16:58 <dkehn> beagles: sorry guess that was obvious
21:17:04 <beagles> I just summarized prior to the meeting :)
21:17:15 <beagles> markmcclain: cool
21:17:32 <beagles> I'll sync up with them and see what's going on.
21:17:37 <markmcclain> sounds good
21:17:38 <beagles> I think that's pretty much it for now
21:17:55 <markmcclain> Thanks for updating happy to see the progress the team is making
21:17:58 <markmcclain> #topic Tempest
21:18:01 <markmcclain> mlavalle: around?
21:18:06 <mlavalle> yes
21:18:20 <markmcclain> looks like we have lots of items in review
21:18:49 <mlavalle> we will start with rossella_s giving an update on the progress with the neutron tempest full job
21:19:26 <markmcclain> rossella_s: want to update?
21:19:38 <rossella_s> Yes! beagles, mlavalle and I have been meeting daily to sync up and coordinate the efforts. We ran an analysis of the current test failures for the full neutron job. The situation is not bad. Few tests are failing, most of the failures are due to the timeout of the floating IP propagation. Since other people are working on it, we will follow up their efforts and monitor the impact of those on the tests. On the nova parity side we will provide
21:19:38 <rossella_s> a list of the tests currently skipped for Neutron and start triaging them.
21:20:11 <markmcclain> awesome
21:20:35 <markmcclain> are you planning on putting the list in an etherpad?
21:20:56 <rossella_s> markmcclain: yes
21:21:09 <rossella_s> we will provide the link
21:21:13 <markmcclain> great
21:21:21 <salv-orlando> timeout of floating ip propagation is basically the well known bug that nova takes a while to display neutron's floating ips?
21:21:56 <salv-orlando> basically the thing that you need to refresh horizon a few times before you can see the floating ip in instance list
21:22:30 <rossella_s> salv-orlando: not sure it's the same
21:22:47 <amotoki> I think it is the same issue.
21:23:02 <markmcclain> yeah the cache in nova is updated at a fixed interval
21:23:21 <salv-orlando> and that is probably triggering the timeouts on tempest
21:23:49 <beagles> salv-orlando: that is actually part of it.. yfried found that IP information was failing to propagate in the timeouts set during his cross tenant connectivity check, hence the motivation for removing the check
21:24:10 <salv-orlando> I have a question about nova's network apis which are currently not proxied into neutron, such as the ones for adding ips. What should we do about it?
21:25:04 <beagles> do you mean the ones like the bulk floating IP ones?
21:26:19 <markmcclain> seems like we should expand tests to include the non-proxied versions of the API
21:26:52 <amotoki> Or is it better tempest calls neutron API to check floating IP is available if neutron is enabled?
21:26:54 <salv-orlando> markmcclain: meaning that we should implement the proxy, I think.
21:27:10 <markmcclain> salv-orlando: good point
21:27:54 <salv-orlando> which leads me to the follow up question that I never understood if compatibility with nova-network api is a requirement for nova-network deprecation
21:29:22 <markmcclain> yes tools that work with nova api should work when neutron is the provider
21:29:31 <beagles> I think it is during the deprecation period at the least.
21:29:52 <markmcclain> in the situations where we can't provide compatibility we need to document the workaround or how to solve the use case
21:29:52 <salv-orlando> ok, so we should proxy 100% of calls.
21:30:21 <salv-orlando> or at least aim for that. This should create a plethora of low hanging fruit tasks.
21:30:56 <markmcclain> yes it should
21:31:04 <beagles> yup, there is an overlap with the triaging of skipped tests that rossella_s reported earlier
21:31:36 <mlavalle> rossella_s: ^^^^ this is key to the work we are doing with the tempest job and the skipped tests
21:32:36 <rossella_s> yes!
21:32:57 <salv-orlando> I think another key task is parallel testing, for which there are a few patches under review. Let me know if it's ok to give an update now or if it's better to wait for open discussion
21:33:18 <markmcclain> parallel testing falls into this topic
21:33:24 <markmcclain> so go ahead and update
21:33:48 <salv-orlando> ok so basically we have 5 patches under review: 4 neutron, 1 tempest
21:34:03 <salv-orlando> we're getting a good deal of reviews, but beyond me and marun, no core devs.
21:34:24 <markmcclain> the tempest patch or neutron patches need core attention?
21:34:33 <salv-orlando> all patches
21:34:54 <salv-orlando> but all these patches do not guarantee a success rate anywhere close to 100%
21:35:02 <markmcclain> ok
21:35:15 <markmcclain> what is the rate?
21:35:29 <salv-orlando> I have about 80% on my dev machine
21:35:51 <markmcclain> ok
21:35:58 <salv-orlando> The problem is that ovs commands slow down under load, wiring takes longer, and happens after the vm sends the first DHCPDISCOVER
21:36:15 <salv-orlando> then the vm sends the second DHCPDISCOVER 60 seconds later, and that's too late
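The timing problem described above can be illustrated with a small calculation. This is a sketch under a simplifying assumption: the guest sends DHCPDISCOVER at boot (t = 0) and then at a fixed retry interval, and only a request sent at or after the port is wired gets an answer. The function name and the fixed-interval model are illustrative assumptions, not the behavior of any particular DHCP client.

```python
import math


def first_answered_discover(wiring_done_at, retry_interval):
    """Return the send time (seconds after boot) of the first DHCPDISCOVER
    that can be answered, i.e. the first one sent at or after the moment
    the port wiring completes.

    Assumes requests are sent at 0, retry_interval, 2 * retry_interval, ...
    """
    if wiring_done_at <= 0:
        return 0.0
    return float(math.ceil(wiring_done_at / retry_interval) * retry_interval)
```

Under this model, wiring that completes 5 seconds after boot delays the lease until t = 60 with a 60-second retry, but only until t = 10 with a 10-second retry, which is the motivation for a more frequent DHCPDISCOVER.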
21:36:21 <salv-orlando> I have a few options for you:
21:36:28 <marun> salv-orlando: are you using multiple api workers?
21:36:33 <salv-orlando> - change cirros to send a DHCPDISCOVER more frequently
21:36:36 <salv-orlando> marun: it's the agent.
21:36:39 <salv-orlando> not the api
21:37:00 <salv-orlando> multiple api workers won't make ovs-vsctl and ovs-ofctl faster
21:37:42 <marun> salv-orlando: sorry for diverting, just want to point out that if multiple api workers are used the number of potential races increases.
21:37:51 <markmcclain> what are the other options?
21:37:53 <salv-orlando> another option is to not bother trying the ssh connections until all the ports are ACTIVE, which might be tricky in scenarios like test_snapshot_pattern which apply to both nova-net and neutron
21:38:15 <salv-orlando> marun: the gate does not yet use multiple api workers, unless I missed that change
21:38:30 <salv-orlando> and finally the last option is what I'm experimenting now: ovs 2.0
21:38:35 <marun> salv-orlando: correct, i wasn't sure if  you were using it locally is all.
21:38:39 <salv-orlando> with multithreaded vswitchd
21:39:04 <marun> salv-orlando: how does it look?
21:39:14 <salv-orlando> it looks like devstack does not start so far :(
21:39:21 <marun> :(
21:39:24 <salv-orlando> so I don't know yet
21:39:52 <marun> salv-orlando: what about the alternative - managing ovs state through a monitor to minimize ovs invocations?
21:39:58 <salv-orlando> but I just wanted to ask you whether you see any of the above options as totally unfeasible
21:40:29 <salv-orlando> it's not about monitoring ovs, but acting on it. The slow commands are the ones which destroy/set up flows and configure the data plane
21:40:49 <markmcclain> the only one that might be problematic is waiting for the ports to become active
21:40:51 <salv-orlando> but if there's a way to use an async monitor for it
21:41:43 <salv-orlando> increasing DHCPDISCOVER frequency in cirros might not be so easy as well
21:41:49 <salv-orlando> so let's see how it goes with ovs 2.0
21:42:11 <salv-orlando> otherwise… what about the linux bridge agent :)
21:42:30 <salv-orlando> or the HDN agent?
21:42:49 <markmcclain> yeah let's work towards 2.0 and see what happens
21:43:03 <markmcclain> and if not let's kick off a ML discussion since we're missing so many folks today
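The option raised above of holding off SSH attempts until every port is ACTIVE could look roughly like the following polling helper. It is a sketch only: `get_port_statuses` is a hypothetical callable standing in for whatever the test would use to list the port status strings for a server, and the timeout and interval values are arbitrary.

```python
import time


def wait_for_ports_active(get_port_statuses, timeout=60.0, interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll until every reported port status is "ACTIVE".

    `get_port_statuses` is a hypothetical callable returning a list of
    status strings. Returns True once all ports are ACTIVE, or False if
    `timeout` seconds pass first. An empty list counts as "not ready",
    since a server with no ports cannot be reached over SSH anyway.
    """
    deadline = clock() + timeout
    while True:
        statuses = get_port_statuses()
        if statuses and all(status == "ACTIVE" for status in statuses):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test would call this before opening the SSH connection and fail early with a clear message if it returns False, rather than timing out inside the SSH client.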
21:43:15 <markmcclain> mlavalle: Any other tempest items to discuss?
21:43:18 <mlavalle> on the api test gap analysis front, I had a conversation earlier today with sdague. We are not developing negative api tests manually anymore. There will soon be a generative tool that will do this automatically. This simplifies a lot the gap analysis and the tests we need to implement. I will update the etherpad accordingly
21:43:41 <markmcclain> good to know
21:43:50 <mlavalle> finally, I am keeping tabs on people developing tests, as you can see in today's agenda
21:43:58 <markmcclain> when the pad is updated can you ping the list to let everyone know of the change
21:44:06 <mlavalle> I will
21:44:24 <mlavalle> that's all I have
21:44:29 <markmcclain> thanks for updating
21:45:32 <markmcclain> Looks like all of the subteam reports are unchanged from last week
21:45:37 <markmcclain> #topic Open Discussion
21:46:17 <marun> i'm making good progress on a design doc for an eventually consistent dhcp agent
21:46:29 <salv-orlando> will you allow me to be the usual pedant annoying axxxxxe even if it's christmas?
21:46:29 <markmcclain> marun: great news
21:46:41 <markmcclain> salv-orlando: yes
21:46:56 <dkehn> devstack migrations failures and the merging of https://review.openstack.org/#/c/61663/
21:46:57 <salv-orlando> can we just stop letting third-party CIs vote if they're still WIPs?
21:47:06 <markmcclain> yes
21:47:20 <irenab> need some guidance on fix backporting to havana. Can somebody help, please?
21:47:28 <markmcclain> #action markmcclain to reach out to mis-behaving 3rd party systems
21:48:06 <markmcclain> irenab: do you need help on the process?
21:48:15 <irenab> yes
21:48:19 <markmcclain> irenab: https://wiki.openstack.org/wiki/StableBranch#Workflow
21:48:39 <irenab> markmclain: thanks
21:48:51 <markmcclain> dkehn: the patch needs to fix trunk vs revising released migrations
21:49:09 <dkehn> thx
21:49:12 <markmcclain> revising released migration causes problems for deployers
21:50:28 <dkehn> markmcclain: all I know is that you can't pull from trunk and get devstack to work
21:50:52 <markmcclain> dkehn: you sure? we have items passing the gate each day
21:50:59 <markmcclain> and that's exactly what they do
21:50:59 <marun> markmcclain: i have that same problem
21:51:11 <beagles> same
21:51:28 <marun> markmcclain: whatever the gate is doing, it's not replicating developer use well enough to prevent breakage
21:51:36 <dkehn> markmcclain: this when you run locally
21:52:08 <dkehn> bingo
21:52:10 <marun> oh, i know what it is
21:52:12 <marun> damnit
21:52:14 <marun> it's lbaas
21:52:29 <markmcclain> marun: ???
21:52:31 <marun> the gate works because it enables lbaas by default
21:52:34 <dkehn> marun: explain
21:52:42 <dkehn> ok
21:52:50 <marun> enabling lbaas ensures that a migration that fixes the problem is run
21:53:10 <markmcclain> makes sense
21:53:12 <marun> for those of us that haven't configured lbaas, devstack is broken at present
21:53:20 <markmcclain> I test with lbaas running locally
21:53:43 <marun> so, at least we have a workaround.  but a fix is still needed.
21:53:48 <markmcclain> right
21:53:49 <salv-orlando> marun is correct, I encountered the same problem. There is a specific migration which so far is triggered only if the lbaas service plugin is enabled
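The failure mode described above can be sketched with a toy model of plugin-gated migrations. Everything here is hypothetical: the migration names, the plugin names, and the `should_run` helper are made up for illustration and are not taken from the Neutron migration scripts. The point is only that a fix which is gated on one service plugin is silently skipped when that plugin is not enabled.

```python
def should_run(active_plugins, migration_for_plugins):
    """Run a migration if it targets every plugin ("*") or at least one
    of the currently active plugins."""
    if "*" in migration_for_plugins:
        return True
    return bool(set(active_plugins) & set(migration_for_plugins))


def applied_migrations(migrations, active_plugins):
    """Return the names of the migrations that would actually execute."""
    return [name for name, plugins in migrations
            if should_run(active_plugins, plugins)]


# Hypothetical migration chain: the middle step repairs a table that the
# last step depends on, but it is gated on the lbaas plugin only.
MIGRATIONS = [
    ("create_core_tables", ["*"]),
    ("repair_shared_table", ["lbaas_plugin"]),
    ("use_shared_table", ["*"]),
]
```

With `["core_plugin", "lbaas_plugin"]` active all three steps run; with only `["core_plugin"]` the repair step is skipped and the final step runs against an unrepaired schema, which is consistent with the gate passing while local setups without lbaas break.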
21:55:48 <markmcclain> well that should make it easier to craft a good set of changes to fix the bad migrations
21:56:38 <markmcclain> Any other items for open discussion?
21:56:48 <dkehn> so basically lbaas MUST be enabled to work locally
21:57:46 <markmcclain> dkehn: right let's discuss in -neutron after the meeting
21:57:54 <dkehn> ok
21:58:36 <salv-orlando> I think this affects also production deployments, not just devstack.
21:58:45 <markmcclain> salv-orlando: right
21:59:18 <markmcclain> We're out of time this week, so we'll jump into -neutron to continue this discussion
21:59:19 <salv-orlando> and I have no idea about how to back port without creating a branch in alembic migrations
21:59:26 <salv-orlando> k see you there
21:59:51 <markmcclain> Expect the community to be pretty quiet over the next 10 days, so email might be best way to communicate since many will be on holiday
22:00:58 <markmcclain> For those traveling for the holidays.. safe travels and our next meeting will be Jan 6th
22:01:04 <markmcclain> #endmeeting