21:02:16 <markmcclain> #startmeeting Networking
21:02:17 <openstack> Meeting started Mon Jan  6 21:02:16 2014 UTC and is due to finish in 60 minutes.  The chair is markmcclain. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:21 <openstack> The meeting name has been set to 'networking'
21:02:23 <mestery> Ha!
21:02:33 * mestery can see his breath as he types in this meeting.
21:02:49 <markmcclain> #link https://wiki.openstack.org/wiki/Network/Meetings
21:02:51 <enikanorov> cold, heh?
21:02:55 <dkehn> I'm guessing anything south of the Mason-Dixon line
21:03:52 <markmcclain> so it's been two weeks since our last meeting and many took time off during the holidays
21:03:53 <mestery> Almost -30 air temperature this morning, at least -50 with wind chill, all temps F. As cold as I can remember it.
21:04:40 <markmcclain> mestery: that just sounds painful
21:04:47 <mlavalle> mestery: where do you live?
21:04:49 <Sukhdev> Wow!!
21:04:55 <salv-orlando> mestery: if next week in Montreal is anything like that, I think I'll just die. That is way below my operating temperature. Seriously, I have a label which says "operate strictly between 0C and 40C"
21:05:11 <mestery> mlavalle: Minnesota
21:05:12 <dkehn> welcome to the polar vortex
21:05:26 <markmcclain> Icehouse-2 is Jan 24th
21:05:30 <markmcclain> oops 23rd
21:05:57 <markmcclain> The tempest/Neutron sprint is next week
21:06:01 * mestery hopes someone brings salv-orlando a toque: http://en.wikipedia.org/wiki/Toque#Canadian_usage
21:06:40 <markmcclain> #topic Bugs
21:06:54 <salv-orlando> mestery: believe it or not I have a Toronto Maple Leafs toque, somewhere
21:07:11 * mestery loves the Maple Leafs. :)
21:07:12 <dkehn> mestery: it's a balmy 28F in Montreal
21:07:26 <mestery> dkehn: Psssh. That's for amateurs. :)
21:07:40 <salv-orlando> dkehn: oh yeah? I'm so packing my swimsuit and sunglasses then
21:07:54 <markmcclain> so we're still tracking the same critical bugs as before Christmas
21:07:55 <dkehn> http://www.weather.com/weather/extended/CAXX0301?par=yahoo&site=www.yahoo.com&promo=extendedforecast&cm_ven=Yahoo&cm_cat=www.yahoo.com&cm_pla=forecastpage&cm_ite=CityPage
21:08:10 <dkehn> salv-orlando: I'll meet you by the pool
21:08:22 <salv-orlando> markmcclain: I have updated the status of bug 1253896
21:08:22 <enikanorov> markmcclain: at least one should go away
21:08:31 <markmcclain> salv-orlando: ok
21:08:37 <markmcclain> enikanorov: which one?
21:08:50 <enikanorov> "timeout waiting for thing" one
21:09:19 <markmcclain> great
21:09:23 <enikanorov> the fix has been committed to nova
21:10:43 <markmcclain> enikanorov: do you have a link for the fix?
21:10:49 <markmcclain> I want to link it into https://bugs.launchpad.net/neutron/+bug/1250168
21:10:56 <enikanorov> let me find
21:11:14 <markmcclain> nati_ueno: looks like we have reviews ready for https://bugs.launchpad.net/neutron/+bug/1112912
21:11:43 <markmcclain> nati_ueno: still seems to be failing jenkins
21:11:53 <markmcclain> have you been able to triage it?
21:12:02 <nati_ueno> markmcclain: I got it. I'll fix it this afternoon
21:12:16 <markmcclain> great
21:12:38 <nati_ueno> markmcclain: Jenkins looks to be working for me (Dec 16) but maybe a rebase is needed
21:12:45 <enikanorov> markmcclain: https://review.openstack.org/#/c/64383/
21:12:56 <pcm_> I've been getting Jenkins failures on my code review (only docstring changes in the latest version).
21:13:03 <markmcclain> nati_ueno: please rebase
21:13:06 <markmcclain> enikanorov: thanks
21:13:09 <pcm_> Can someone help me offline to debug?
21:13:09 <nati_ueno> markmcclain: gotcha
21:13:38 <markmcclain> pcm_: ask around in the channel after the meeting
21:13:45 <pcm_> or do I need to rebase (was done last week).
21:13:50 <markmcclain> if I didn't have to run I'd hang around and help
21:13:50 <pcm_> markmcclain: OK
21:14:06 <markmcclain> pcm_: I'd try that
21:14:13 <pcm_> markmcclain: OK.
21:14:40 <markmcclain> marun: this bug is still open
21:14:40 <markmcclain> https://bugs.launchpad.net/neutron/+bug/1192381
21:14:45 <markmcclain> can we consider it closed?
21:14:54 <marun> markmcclain: I think so, yes.
21:15:10 <markmcclain> ok
21:15:12 <marun> markmcclain: there is a follow-on blueprint that will ensure eventual consistency: https://blueprints.launchpad.net/neutron/+spec/eventually-consistent-dhcp-agent
21:15:25 <marun> markmcclain: but the best we can do without a refactor has already been committed
21:15:33 <markmcclain> ok
21:15:45 <markmcclain> what milestone should I target for the bp?
21:16:06 <marun> markmcclain: probably icehouse-3
21:16:10 <markmcclain> ok
21:16:24 <marun> markmcclain: hopefully progress can come sooner but i don't want to rush it
21:17:09 <markmcclain> yeah we definitely want to make sure we maintain stability
21:17:27 <markmcclain> Any other critical bugs the team needs to discuss?
21:17:42 <salv-orlando> do we skip our favourite bug?
21:17:46 <salv-orlando> bug 1253896?
21:18:15 <markmcclain> we did
21:18:24 <markmcclain> https://bugs.launchpad.net/neutron/+bug/1253896
21:18:42 <markmcclain> looks like you did some research on it late last week
21:18:44 <salv-orlando> come on, we're actually not fixing this bug just because we love talking about it.
21:19:05 <salv-orlando> seriously, failures in non-isolated jobs are down to 0% which is good for the gate
21:19:12 <markmcclain> yeah
21:19:46 <salv-orlando> however we want to make isolated and parallel jobs the default solution, so it's not so good that the isolated jobs have a failure rate of about 6%
21:20:02 <salv-orlando> and that's up from about 2.5% before christmas
21:20:07 <markmcclain> ugh
21:20:22 <marun> do we want to get isolated/parallel working as currently defined?
21:20:36 <marun> it's currently a really beastly stress test that is unrepresentative of real-world usage for the most part
21:20:50 <salv-orlando> marun: yes, that is correct, but I would like to move this discussion in the tempest part
21:20:51 <marun> (because of running on a single node that is cpu/io bound like crazy)
21:21:05 <marun> salv-orlando: fair enough
21:21:32 <markmcclain> We'll return to parallel tests in tempest section
21:21:43 <salv-orlando> for the current situation, we need to look at the logs. If we conclude that the reasons for the failure are the same as those causing failures in the parallel jobs, then we should just wait for the patches to merge; but I doubt that.
21:22:05 <salv-orlando> It would be great if we can get some fresh eyes to look at the logs, as I won't have much time during this week.
21:22:08 <markmcclain> yeah I am interested to know what caused the failure rate to triple
21:22:31 <markmcclain> Any other bugs the team needs to discuss?
21:22:35 <salv-orlando> I am too. Note that I've been taking 24-hour samples - the first on Dec 23 and the second on Jan 2
21:23:14 <salv-orlando> that is all. I would be happier if somebody else volunteers to look at the logs and provide feedback
21:23:22 <salv-orlando> especially if that somebody is coming to Montreal next week.
21:24:01 <markmcclain> I've got training tomorrow, but I'll try to spend time digging. If anyone has spare cycles before then, feel free to update the bug
21:24:05 * salv-orlando is sure people are now rushing to logs.openstack.org to check the logs
21:24:50 <markmcclain> #topic Docs
21:25:04 <markmcclain> emagana is out today but has filled in the report
21:25:15 <markmcclain> #topic Tempest
21:25:24 <annegentle_> markmcclain: can I bring up a docs item?
21:25:28 <markmcclain> #undo
21:25:29 <openstack> Removing item from minutes: <ircmeeting.items.Topic object at 0x3072fd0>
21:25:36 <markmcclain> annegentle_: yes!
21:25:39 <annegentle_> markmcclain: we're having a discussion on the mailing list about a new networking-only guide
21:25:45 <annegentle_> #link http://lists.openstack.org/pipermail/openstack-docs/2014-January/003582.html
21:25:54 <annegentle_> that just goes to a mid-thread discussion
21:26:04 <annegentle_> but since Edgar has been out I haven't been able to ping him about it
21:26:33 <annegentle_> so, just wanted to put it on your radar... I'm hesitant to add another guide what with all the reorg we've been doing (augggh) but wanted to see what you all think, would it be useful, better, worse?
21:27:13 <markmcclain> I missed this thread, so thanks for raising it
21:27:30 <sc68cal> +1
21:27:55 <annegentle_> I agree with Tim Bell that "if the image management gets its own book, why not networking"
21:28:14 <annegentle_> but, it's a pile of work, with real underlying teaching needs, so we need to find a good "owner"
21:28:18 <sc68cal> subscribed to the BP - i'll pitch in from a lot of doc that I wrote up internally while I was learning Neutron
21:28:24 <mestery> I'm all in favor of a guide for networking.
21:28:37 <annegentle_> anyway, feel free to discuss amongst yourselves, comment on the blueprint, etc. Ohh thanks sc68cal
21:28:54 <annegentle_> markmcclain: thanks for letting me pop in :)
21:29:03 <annegentle_> carry on
21:29:05 <salv-orlando> my 2p at first glance is that the Neutron community has struggled a bit to handle the current workload
21:29:37 <annegentle_> salv-orlando: yep really it'd be best to have a tech writer take it
21:29:51 <annegentle_> salv-orlando: if you know of anyone, I could even ask for a contract
21:30:48 <salv-orlando> annegentle_: exactly my point is that we don't have a doc guy so far; just people taking turns in playing that role. I'll let you know if I hear of somebody interested
21:31:00 <annegentle_> k thanks
21:31:19 <markmcclain> ok.. everyone feel free to catch up on the thread and chime in on the mailing list
21:31:25 <markmcclain> annegentle_: thanks for pointing this out
21:31:38 <markmcclain> annegentle_: anything else since edgar is out this week?
21:31:40 <amotoki> note that it is a thread on the docs ML
21:31:52 <markmcclain> amotoki: good reminder
21:32:19 <markmcclain> #topic tempest
21:32:19 * mestery didn't know there was a docs mailer and wonders why that is in fact ... :(
21:32:49 <markmcclain> Let's circle back to parallel testing before we dive into Tempest
21:32:58 <markmcclain> salv-orlando: want to update on parallel testing?
21:33:06 <salv-orlando> sure markmcclain
21:33:39 <salv-orlando> http://lists.openstack.org/pipermail/openstack-dev/2013-December/023109.html
21:34:02 <salv-orlando> We have a bunch of patches which are aimed at solving the structural problems we found in the OVS agent.
21:34:15 <salv-orlando> They are all listed in the email linked above
21:34:37 <markmcclain> http://lists.openstack.org/pipermail/openstack-dev/2014-January/023289.html
21:34:38 <salv-orlando> While running parallel tests we noticed a set of new issues.
21:35:24 * salv-orlando is looking for a link, sorry
21:35:52 <salv-orlando> these issues have all been tagged with neutron-parallel: https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
21:36:20 <salv-orlando> a single issue is causing 90% of tests to fail and has to do with an error on port quota check.
21:36:24 <salv-orlando> I have a fix for it.
21:36:39 <markmcclain> ok
21:36:55 <salv-orlando> But sdague rightly asked me to make sure that I'm not just gaming the test and hiding what would be a fundamental issue in neutron
21:37:06 <salv-orlando> but I'm not doing that anyway
21:37:35 <markmcclain> https://review.openstack.org/#/c/64217/
21:37:38 <salv-orlando> on the other issues, I think most of them are just because the tests are not parallel-safe, or do not take into account that things might work differently with neutron
21:37:47 <salv-orlando> except for one error: the ssh protocol banner error
21:38:06 <salv-orlando> I think Nachi had a good hint that it might depend on the metadata server being slow or failing
21:38:12 <salv-orlando> nati-ueno: ^^^
21:38:36 <salv-orlando> because that would explain why ping works and ssh does not, even if iptables rules are correctly configured on the l3-agent
21:39:03 <salv-orlando> So that's the current situation. What I would love to see is people picking up all the bugs tagged with neutron-parallel
21:39:09 <salv-orlando> and squash all of them by next week
21:39:20 <salv-orlando> that's all from me. Questions?
21:39:51 <markmcclain> it would be really nice to have the parallel bugs solved before we all meetup
21:40:16 * marun gets with the squashing
21:40:24 <markmcclain> marun:  thanks
21:40:43 <markmcclain> salv-orlando: thanks for working through the parallel issue
21:41:31 <salv-orlando> I think it's now time to move to what marun said: the load imposed by tempest
21:41:40 <salv-orlando> happy to talk about that?
21:41:48 <markmcclain> yeah we can discuss
21:42:06 <nati_ueno> salv-orlando: Yes, I faced the errors when the metadata server wasn't working well
21:42:14 <salv-orlando> I think if I interpret marun correctly, tempest with isolation and parallelism brings the cpu on the gate close to 100%
21:42:19 <nati_ueno> salv-orlando: the problem is the ssl certificate configuration via the metadata server
21:42:22 <marun> i think it's reasonable to do stress testing, but watching neutron fall over in a scenario that no self-respecting operator will allow is crazy.
21:42:47 * mestery agrees with marun.
21:42:48 <salv-orlando> I think this is because every test creates and wires a network with dhcp enabled, and attaches it to a router (which is wired as well)
21:43:11 <salv-orlando> now some history, deriving from conversations with mtreinish
21:43:25 <salv-orlando> this has been deemed good, because it stressed neutron a bit
21:43:59 <salv-orlando> since there is no other form of stress testing (the large ops job uses a fake driver, so it stresses just the api server)
21:44:18 <markmcclain> I would agree the stress has been good
21:44:20 <salv-orlando> it is arguable whether stress testing should be part of the current test suite, whose aim is functional
21:44:27 <salv-orlando> and that decision is not mine to make.
21:44:43 <markmcclain> also agree that functional and stress should be different conversations
21:45:21 <salv-orlando> From my side, I've noticed that creating and wiring a router for every test in particular put such a load on the l3 agent that it caused long delays and potential timeouts; so I had a patch for making router creation optional
21:46:07 <markmcclain> salv-orlando: so it's our agent not handling the load?
21:46:44 <salv-orlando> markmcclain: I would say that if you ask to wire 50 routers in a minute, it's understandable that the agent might take more than 60 seconds to process the load
21:47:18 <markmcclain> yeah agreed
21:47:24 <marun> is that a reasonable usage scenario?
21:47:45 <markmcclain> in a large cloud maybe
21:48:06 <salv-orlando> anyway, I would just like to get a consensus between neutron and qa teams
21:48:24 <salv-orlando> on how to behave wrt network resource creation for each test
21:48:46 <salv-orlando> I am happy with anything :)
21:49:15 <marun> salv-orlando: i think providing stable behavior under parallel execution is the goal.
21:49:21 <markmcclain> ++
21:49:27 <rkukura> Seems to me that the goal for functional (tempest) tests should be to run as quickly as possible, and with minimal resources (i.e. in single VM). Stress testing is different.
21:49:29 <salv-orlando> But I guess we can't do anything better than moving the discussion to the mailing list
21:49:38 <marun> rkukura: +1
21:49:43 <marun> salv-orlando: agreed
21:50:01 <salv-orlando> ok marun, can we count on you to start the thread?
21:50:01 <markmcclain> yeah I'd like to give folks an opportunity to weigh in
21:50:12 <marun> salv-orlando: can do
21:50:18 <markmcclain> marun: thanks
21:50:19 <salv-orlando> marun: thanks
21:50:29 <marun> np
21:50:36 <markmcclain> ok 10 minutes left
21:51:00 <markmcclain> mlavalle: Any items that you included on the agenda that need to be highlighted?
21:51:20 <mlavalle> nope, everything is in the agenda, let's move on
21:51:34 <markmcclain> mlavalle: thanks for providing the update
21:51:47 <mlavalle> np
21:52:03 <markmcclain> Any other Tempest items to discuss?
21:52:19 <markmcclain> #topic IPv6
21:52:25 <sc68cal> hello
21:52:38 <markmcclain> I'm still working on the tail-f failure
21:52:51 <markmcclain> I've been trying to track down the system maintainer
21:53:08 <markmcclain> I found one person, but he was the point of contact
21:53:12 <markmcclain> *was not
21:53:21 <sc68cal> Thanks - that's pretty much blocking all of our progress, since it seems to scare off reviewers with the big scary -1
21:53:30 <salv-orlando> markmcclain: I was wondering if it's ok to ask the infra team to temporarily revoke voting rights to the service user
21:53:50 <markmcclain> salv-orlando: that's an interesting proposition
21:53:55 <markmcclain> I'll discuss it with them
21:54:21 <sc68cal> We're also looking more deeply into what would need to be done for VIF attributes for hairpinning
21:54:22 <markmcclain> sc68cal: yeah I've changed my gerrit view so that I can see who gave -1s
21:54:29 <markmcclain> ok
21:54:34 <sc68cal> we're finding that it's a libvirt only behavior
21:54:50 <markmcclain> ok
21:55:06 <markmcclain> Anything else?
21:55:13 <sc68cal> So is it worth having a VIF attribute if it's really only one compute driver using it?
21:55:33 <sc68cal> otherwise that's it for me
21:55:44 <markmcclain> that's probably best asked on ML, so that we give everyone a chance to weigh in
21:55:51 <markmcclain> Thanks for the update
21:55:53 <markmcclain> #topic ML2
21:56:00 <markmcclain> rkukura or mestery:?
21:56:23 <mestery> We plan to drive the port delete issue to closure this week at the meeting; there has been a long-standing thread on that since before the holidays.
21:56:36 <mestery> Also, the plan for this week is to refocus on the TypeDriver enhancements as well.
21:56:45 <markmcclain> some closure on that will be good
21:56:54 <mestery> We're seeing a few more ML2 MechanismDrivers pop up, which is also pretty cool!
21:57:00 <markmcclain> awesome
21:57:18 <mestery> That's about it for ML2, more detail in the Wednesday meeting on IRC.
21:57:35 <markmcclain> thanks for updating
21:57:36 <rkukura> nothing to add
21:57:43 <markmcclain> #topic CLI
21:58:09 <markmcclain> On Friday, a version of a dependent lib was released that broke the neutronclient
21:58:21 <markmcclain> we rushed out a temporary fix
21:58:34 <markmcclain> and released 3.3.3 which is compatible with the newer version of cliff
21:58:50 <markmcclain> #action markmcclain to send out email on long term client fix
21:58:55 * anteaya notes verification voting is turned off for tail-f, see ml
21:59:22 <anteaya> also new accounts have verification voting off by default
21:59:42 <markmcclain> anteaya: thanks for clearing that up
21:59:50 <anteaya> np, sorry I'm late
21:59:50 <markmcclain> #topic Open Discussion
22:00:00 <anteaya> montreal weather: http://www.weather.com/weather/tenday/Montreal+Canada+CAXX0301
22:00:05 <pcm_> dumb question... there have been some meeting date/time changes. Is there a wiki with all meetings listed (versus scanning ML)?
22:00:18 <sc68cal> pcm_: https://wiki.openstack.org/wiki/Meetings
22:00:25 <pcm_> sc68cal: Thanks!
22:00:31 <anteaya> we are doing waves of cold and warm, so far looking warm for code sprint
22:00:42 <Swami> Hi Folks, I have posted an updated version of the "Distributed Virtual Router" blueprint for review. Please review it and provide your feedback.
22:00:43 <marun> oh, on the issue of ml2....
22:00:48 <enikanorov> markmcclain: neutron multihost: is it planned? I guess it should be on the nova-parity list?
22:00:52 <Swami> link: https://docs.google.com/document/d/1iXMAyVMf42FTahExmGdYNGOBFyeA4e74sAO3pvr_RjA/edit
22:01:03 <marun> requiring the agent to be up to bind ports synchronously.  good? bad? discuss!
22:01:11 <enikanorov> i also remember a patch from gongysh implementing multihost
22:01:15 <markmcclain> enikanorov: yes there are multiple groups working on solutions to tackle it
22:01:42 <rkukura> marun: Sounds like a question for openstack-dev
22:01:42 <enikanorov> markmcclain: do you know who I can contact to discuss it?
22:01:42 <mestery> marun: Is that with OVS and/or LB? I would think it has to be up for a synchronous bind, no?
22:01:47 <markmcclain> enikanorov: the original patch from gongysh I don't think would be accepted right now
22:01:57 <enikanorov> that's for sure
22:02:05 <markmcclain> I'll encourage the teams working on solutions to post updates to the ML
22:02:15 <enikanorov> ok, thanks
22:02:42 <marun> rkukura: fair enough
22:02:47 <rkukura> marun, mestery: I think this is the "fail to bind if agent isn't currently alive" vs. "there has been an agent on that node sometime in the past so it will eventually work, so let's just bind now"
22:02:58 <markmcclain> marun: that's definitely a ML discussion since we're over time
22:03:08 <gongysh> enikanorov: I think the DVR is a direction to replace multihost.
22:03:15 <marun> mestery: neutron is a distributed app. we need to act accordingly
22:03:44 <mestery> marun: Lets discuss on ML :)
22:03:55 <markmcclain> Thanks to everyone for stopping in this week… remember we have the IRC channel and ML to discuss items between meetings
22:03:55 <marun> mestery: :)
22:04:00 <markmcclain> #endmeeting