21:02:00 #startmeeting networking
21:02:01 Meeting started Mon May 19 21:02:00 2014 UTC and is due to finish in 60 minutes. The chair is mestery. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:05 The meeting name has been set to 'networking'
21:02:12 #link https://wiki.openstack.org/wiki/Network/Meetings Agenda
21:02:20 I hope everyone made it back from the Summit ok.
21:02:26 Who else needs a week of vacation to recover? :)
21:02:46 I wish :)
21:02:47 o/
21:02:48 mestery: :)
21:02:55 no rest for the wicked
21:02:57 +2
21:02:59 #topic Announcements
21:03:00 Hello everybody
21:03:19 The main items I wanted to highlight are the two mid-cycle meetups/sprints we have planned now.
21:03:24 Please see the agenda for details.
21:03:37 For the LBaaS mid-cycle, it would be good to get at least one more core to attend to back markmcclain and me up.
21:04:03 For the nova-network parity sprint in MN, the hotel block should be available now.
21:04:30 I encourage people to sign up for one or both sprints!
21:04:33 Any questions on these?
21:04:44 hi!
21:05:07 mestery: dates for each?
21:05:18 pcm__: Dates are in the etherpad, I'll highlight them here:
21:05:34 #info nova-network parity sprint is July 9-11
21:05:44 #info LBaaS sprint is June 17-19
21:05:58 mestery: thanks
21:06:07 OK, moving on.
21:06:10 #topic Bugs
21:06:15 jaypipes: Here?
21:06:31 mestery: yup.
21:06:33 #link http://lists.openstack.org/pipermail/openstack-dev/2014-May/035264.html Galera issue highlighted during DB OPs sessions
21:06:51 jaypipes has highlighted an issue we should discuss here found during the OPs sessions on DB last week.
21:07:00 jaypipes: The floor is yours!
21:07:04 mestery: thx.
21:07:22 mestery: so, the problem is that MySQL Galera does not support SELECT ... FOR UPDATE.
21:07:32 jaypipes: for clustering mode, right?
21:07:34 mestery: behavior is not deterministic
21:07:57 ouch
21:08:01 nati_ueno: no, there isn't any mode per se... it's just MySQL Galera does not have an idea of multi-node read locks
21:08:26 this just pretty much means at the moment mysql galera can't be used with neutron.
21:08:33 salv-orlando: Yes, agreed.
21:08:33 I am afraid I do not see short term solutions
21:08:34 jaypipes: oh, so even if the user is using master-slave, select for update can't be used?
21:08:37 and since Galera is by far the most popular deployment option for database at this point, we need to figure out a solution for nova and neutron...
21:08:49 nati_ueno: no, this does not have to do with standard mysql master-slave...
21:08:53 long term tasks should solve this a bit bitter
21:08:57 s/bitter/better
21:09:03 markmcclain: yes, definitely.
21:09:13 markmcclain: as would an external quota management piece...
21:09:19 we lock the tables now because our retries are inconsistent
21:09:19 i vote to remove all SQL … FOR UPDATE locks :-) they are the source of the wonderful eventlet/mysql deadlock
21:09:19 but that is quite far in future
21:09:20 is this something that we need a fix for short term?
21:09:26 jaypipes: I got it
21:09:34 kevinbenton: uh, it's not free
21:09:38 marun: well, here's the deal..
21:09:39 we still have to ensure consistency
21:09:40 jaypipes: can we commit to fixing this in Juno?
21:09:45 marun: I think a short term fix is needed, yes.
21:09:48 and release note it for Icehouse?
21:10:08 mestery: I don't think we can come up with a proper short term solution to the locking
21:10:10 marun: the problem manifests itself as a deadlock right now, so there isn't actually an issue with inconsistency, AFAICT.
21:10:12 markmcclain: Can we even fix this in Icehouse now? Seems like a large change for stable any way you cut it.
21:10:20 mestery: my concern is whether a short-term fix is possible, not that we need one.
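For readers following the locking discussion above, here is a minimal sketch of what dropping `SELECT ... FOR UPDATE` while still ensuring consistency could look like: read the row without a lock, then issue an `UPDATE` that only succeeds if the row is unchanged (compare-and-swap), and retry on a lost race. The table `pools`, its columns, and the function `reserve_next_index` are hypothetical, and sqlite3 is used only to keep the example self-contained; this is not Neutron's schema or code. Whether the pattern behaves well on a particular Galera deployment is an assumption that would need testing.

```python
import sqlite3


def reserve_next_index(conn, subnet_id, max_attempts=5):
    """Hand out the next counter value without SELECT ... FOR UPDATE.

    Hypothetical example: `pools(subnet_id, next_index)` is an invented table.
    """
    for _ in range(max_attempts):
        # Plain read, no row lock is taken.
        row = conn.execute(
            "SELECT next_index FROM pools WHERE subnet_id = ?", (subnet_id,)
        ).fetchone()
        if row is None:
            raise LookupError(subnet_id)
        current = row[0]

        # Compare-and-swap: the WHERE clause matches only if nobody else
        # changed next_index since we read it.
        cur = conn.execute(
            "UPDATE pools SET next_index = ? "
            "WHERE subnet_id = ? AND next_index = ?",
            (current + 1, subnet_id, current),
        )
        conn.commit()
        if cur.rowcount == 1:
            return current  # we won the race
        # rowcount == 0: another writer got there first; re-read and retry.
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

The design choice here is to move conflict detection into the write itself, so a lost race shows up as "zero rows updated" and is handled by the caller's retry loop instead of by a lock held across statements.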
21:10:23 without even more coarse locks
21:10:27 markmcclain: Was afraid of that.
21:10:30 mestery: this stuff isn't simple afaik
21:10:35 the problem is in handling the "deadlock" (which is really just a timeout on galera's failure to certify an update request)
21:10:50 jaypipes: active clusters also tend to suffer from deadlock in neutron as there are no retry_on_deadlock mechanisms
21:10:55 marun markmcclain: So do we move forward faster with taskflow then?
21:11:18 mestery: yes
21:11:18 jaypipes: we discussed internally at Comcast, and Sridrar Basam replied on the list
21:11:43 I think short-term, the "solution" is to note the issue in the operations guide and just have some awareness of this spread in the community.
21:11:47 perhaps we can engage him further, since he has a bit of experience running galera & neutron
21:11:49 so basically is taskflow the solution to db locking problems?
21:11:55 salv-orlando: so would retry-on-deadlock be an alternative to taskflow that could be implemented short-term?
21:12:05 the medium-term solution is to work on taskflow in neutron and a better, lock-free quota driver implementation in nova
21:12:08 taskflow -> taskflow/lock_for_update
21:12:34 jaypipes: I agree with that plan.
21:12:39 markmcclain: I’d really like to see a short writeup describing how taskflow will be used to eliminate the lock_for_updates ASAP
21:12:50 and the long-term solution is to identify a distributed lock manager that may be used at varying degrees of granularity
21:12:54 jaypipes +1
21:13:05 ahem
21:13:13 salv-orlando: ahem or amen?
21:13:23 * mestery wasn't sure what salv-orlando meant either.
21:13:28 I don't think introducing distributed coordination is a great idea. I mean distributed coordination *IS GREAT*
21:13:51 * jaypipes lost.
21:13:53 but we should strive to avoid it
21:13:54 salv-orlando: long-term we'll all be dead, so no worries ;)
21:14:04 marun: ha ha ha
21:14:09 marun: That's a rather grim view of this particular locking problem.
:)
21:14:16 salv-orlando: well, you're the one with a post-grad degree in distributed systems.
21:14:19 armax: your thoughts?
21:14:22 I stole it
21:14:31 marun: my thoughts are that even though I feel the urgency of this matter I wonder whether this is quickly eating too much time out of the meeting
21:14:32 :)
21:14:35 salv-orlando: so, I agree that distributed lock management should be avoided if possible, but there are certainly cases where we need it.
21:14:40 Agreed armax.
21:14:41 hah
21:14:44 Let's move on from this one.
21:14:46 I agree with jaypipes
21:14:47 I think we need salv-orlando to do a lecture :)
21:14:50 optimistic locking!
21:14:56 we can't just keep ignoring the problem
21:15:07 sc68cal: I will refer you to the right people who can give you a lecture.
21:15:16 jaypipes: Thanks for bringing this to our attention! I'll take an action item to track this and do the short term legwork of notifying the community and docs.
21:15:18 rkukura: will post one
21:15:25 mestery: cheers.
21:15:32 markmcclain: thanks!
21:15:33 I feel not entitled to do this. I am not even able to synchronize two threads
21:15:48 salv-orlando: not possible with greenthreads, so don't take it hard.
21:15:49 #action mestery to notify docs and the broader community about the mysql galera issue
21:15:58 salv-orlando: you always say those who don't know how to do, teach
21:16:00 right?
21:16:37 mestery: you can assign the doc bug to me
21:16:46 emaganap: Thanks!
21:16:57 OK, there are a few other bugs listed there, including the "ssh bug".
21:17:05 salv-orlando: Should we just file a new one for that at this point?
21:17:14 salv-orlando: We both seemed to agree on that in the bug itself.
21:17:20 yes the fingerprint has been removed from E-R
21:17:27 markmcclain: Awesome, thanks!
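The retry-on-deadlock idea raised earlier in the Galera discussion can be sketched as a small decorator that re-runs a whole database operation when the backend reports a deadlock (on Galera, per the discussion, really a certification failure). This is only an illustration under stated assumptions: `DeadlockError` is a hypothetical stand-in for whatever exception the database driver raises, and the name `retry_on_deadlock` is borrowed from the conversation, not from any existing Neutron helper.

```python
import functools
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock exception."""


def retry_on_deadlock(max_retries=3, delay=0.0):
    """Re-run the wrapped function if it raises DeadlockError.

    The wrapped function must be safe to repeat from the start, i.e. it
    should begin and end its own transaction.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_retries:
                        raise  # out of retries, surface the error
                    # Exponential backoff between attempts.
                    time.sleep(delay * (2 ** attempt))
        return wrapper
    return decorator
```

A retry wrapper of this kind only helps if the wrapped operation is idempotent or fully transactional; it does not remove the underlying contention.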
21:17:53 markmcclain: So, I'll close that one and indicate we should be opening new ones for "ssh connection issues" going forward, hopefully debugged a bit further than just "ssh connection failure"
21:17:53 mestery: thank jogo_ he pushed the change up
21:17:59 I am looking at the logs for connectivity problems; first - most of them are in grenade jobs
21:18:00 * mestery thanks jogo_
21:18:32 salv-orlando: Interesting, let me know if you need a second set of eyes at any point.
21:18:55 Any other bugs the team should be aware of at this point?
21:19:05 mestery: I got one
21:19:13 https://bugs.launchpad.net/neutron/+bug/1314850
21:19:14 Launchpad bug 1314850 in neutron "'module' object has no attribute 'get_engine'" [Medium,Fix committed]
21:19:14 I am focusing on the ones on voting jobs at the moment; unfortunately it's hard to write e-r queries for them. Anyway, I will update on the ML when I have more details.
21:19:30 Thanks salv-orlando.
21:19:53 mestery: I have seen this hit a few times on stable/Icehouse - is there any chance to backport it?
21:19:55 Sukhdev: Can you elaborate on this bug? It's marked Medium and committed.
21:19:59 Sukhdev: Got it.
21:20:30 Sukhdev: Seems like it would be possible. kevinbenton, any chance you can propose a backport to icehouse?
21:20:37 mestery: yes
21:20:52 #action kevinbenton to propose a backport to icehouse of https://bugs.launchpad.net/neutron/+bug/1314850
21:20:53 Launchpad bug 1314850 in neutron "'module' object has no attribute 'get_engine'" [Medium,Fix committed]
21:20:53 kevinbenton mestery: Thanks this will really help
21:21:01 Thanks kevinbenton!
21:21:13 OK, moving on.
21:21:18 #topic Docs
21:21:26 emaganap: How are things going in doc-land, post Summit?
21:21:41 mestery: hi there!
21:22:11 mestery: things are very slow.. no new updates from my side
21:22:19 emaganap: OK, thanks!
21:22:22 mestery: I will review where we are and update wiki
21:22:30 emaganap: Thank you!
21:22:39 #topic Nova Parity
21:22:47 markmcclain: Hi there!
21:22:59 I wanted to highlight this section today given its importance for Juno.
21:23:36 #link https://wiki.openstack.org/wiki/Governance/TechnicalCommittee/Neutron_Gap_Coverage Nova-Network Gap Plan
21:23:49 Most of these have BPs or bugs, I'm going to make sure by end of week they all do. :)
21:23:54 That way I can track them for Juno.
21:24:14 And this is also a chance to highlight again the mid-cycle where we'll hopefully close on most of these before Juno-2.
21:24:40 sorry flaky connection.. the good thing is that most are well underway
21:24:54 markmcclain: Yup, exactly. And thanks for driving this one!
21:25:18 happy to
21:25:19 #topic Tempest
21:25:34 mlavalle: Any updates for this week on Tempest?
21:25:52 mestery: we have merged a couple more api tests
21:26:06 I got a couple of neutron folks to test this one: https://review.openstack.org/#/c/92436/
21:26:07 only 4 more to go to finish that
21:26:08 Cool! I saw that! We're very close to complete coverage I believe, right?
21:26:11 Awesome!
21:26:14 which is not api related
21:26:35 this week we will start implementing the plan we reviewed in Atlanta
21:26:39 armax: Thanks for the pointer!
21:26:43 but I thought it was worth drawing attention to because the way it was designed initially me a litle
21:26:48 and that's really all I have
21:26:51 s/litle/little :)
21:27:13 OK, thanks mlavalle and armax!
21:27:29 armax: I will keep track of this test
21:27:43 mlavalle: thanks, if we can get tempest cores to look at it it'd be good
21:27:56 armax: will do
21:28:14 #topic Summit Post Mortem
21:28:17 mlavalle: and since I have your attention this one too: https://review.openstack.org/#/c/90427/
21:28:21 ;)
21:28:28 There are other updates from sub-teams on the wiki, I encourage people to read those.
21:28:34 armax: :-)
21:28:49 But I wanted to take a few minutes to talk about some Summit items if anyone wants to here.
21:29:09 A lot of energy is ready there, and a lot of ideas are floated, I thought we could talk about some of those here.
21:29:31 One thing which comes to mind is something marun and I spoke about Wednesday night around coding sprints at the summit.
21:29:51 marun: This was your idea about having sprints in the afternoon and sessions in the morning. Something to consider, even for the broader project.
21:30:37 mestery: you mean during the design summit?
21:30:37 which afternoon, and which morning?
21:30:37 mestery: +1
21:30:38 UTC?
21:30:40 the foundation sent out a survey
21:30:41 ;)
21:31:01 or local?
21:31:04 markmcclain: Thanks for pointing the survey out!
21:31:10 make sure to express that desire for coding sprint time
21:31:19 It would be local, and would try to capture the energy people have and make use of it while we're all there.
21:31:32 we won't be able to change things for paris but I hope we could start sprinting next spring
21:31:46 mestery: seems fair, and that's definitely the way to start I guess
21:31:49 marun: +1 to that! This was a good idea.
21:31:50 pycon does something very similar
21:32:02 markmcclain: that was definitely the inspiration
21:32:09 markmcclain: Exactly! Anyways, this is something we should all propose back to the foundation in the survey replies.
21:32:52 mestery, marun: Does this mean fewer design sessions, or more days of both?
21:33:00 +1 to coding sprint at the summit - would love to have fast review cycles since in person
21:33:04 rkukura: hopefully the latter
21:33:22 rkukura marun: +1
21:33:43 let's have the code sprint next week in paris, so we can enjoy the weekend there :P
21:33:56 nati_ueno: Ha!
21:33:57 nati_ueno: if only
21:34:09 OK, one other topic was all the procedural things we as a team discussed.
21:34:25 Please note I'm going to sift through that and slowly start implementing it.
21:35:02 Lots of good ideas in there, but it will take me a bit to get to all of it. So please bear with me.
:)
21:35:30 Anything else Summit related anyone wants to bring up?
21:35:48 mestery: nothing that wouldn't require asbestos pants ;)
21:36:11 * mestery gets out his sunglasses at least.
21:36:12 marun: ;)
21:36:25 #topic Open Discussion
21:36:40 nati_ueno: maybe you can talk about your prototype?
21:36:47 One item I'd like to point out here is the new NFV sub-team/SIG/working group/project started by cdub and russellb last week.
21:36:58 marun: ya. I'll send it to the dev mailing list when it gets ready
21:37:04 nati_ueno: +1
21:37:06 marun: I'm merging it to the reviewstats
21:37:15 marun: you can also see loc now
21:37:17 This new NFV team is taking input from many other projects, and Neutron is one of them.
21:37:21 mestery: I wonder if we should start adding a section to the meeting to talk about 3rd party CI compliance
21:37:28 armax: +1
21:37:37 armax: +1 to that. I was in the 3rd party meeting this morning in fact.
21:37:48 today while I was doing my review chores...
21:37:51 armax: We need a representative for that weekly meeting, I can do that, but perhaps we should rotate that duty?
21:38:11 I noticed a number of 3rd party CI service accounts were not lively...
21:38:14 oh, here's a procedural question. I have a blueprint that I'm kicking around in my head, about doing scenario tests for the provider networking extension - does that go in Neutron specs or Tempest ?
21:38:27 it's sort of kind of related to both
21:38:30 mestery: rotating makes sense
21:38:31 armax: I think we should move most of the 3rd party CI stuff to the project-wide meeting on IRC. Thoughts?
21:38:32 armax: +1
21:38:55 sc68cal: I think tempest makes the most sense there.
21:39:02 +1 - since there was a 3rd party ci meeting earlier today, so it looks like there's a plan to do it project-wide
21:39:15 mestery: agreed, but having a slot during the neutron irc is good
21:39:21 sc68cal: we should be including a requirement for test planning in spec docs, though.
21:39:24 armax: The reason for moving the discussion there is because there are a lot of ideas and requirements which are cross-project wide with regards to 3rd party testing.
21:39:24 because then we can talk about the neutron specific ci's
21:39:31 features aren't done until they are tested
21:39:36 armax: Good point, I'll add it going forward.
21:39:47 mestery: I am talking more about what to do in terms of who is complying and what to do in this case
21:40:03 armax: Ah, got it.
21:40:49 I have a bad memory… but I recall back in Hong Kong announcing that by Juno release it was either comply or out of tree
21:40:49 this might help us prioritize reviews as well
21:40:57 armax: tangential, but I think we should be enforcing 3rd party account naming consistency so the bookmarklet doesn't have to be maintained in the face of new jobs.
21:41:01 I mean not me announcing but markmcclain
21:41:07 cuz if patches are pushed without CI backing them
21:41:32 then reviewers can give them lower priority
21:41:38 and also - somebody should define what "complying" means.
21:41:42 armax: +1
21:41:46 this is something I forgot to bring up at the summit
21:41:53 marun: I was thinking about your session
21:41:54 armax salv-orlando: I will document this and send out for review.
21:42:11 marun: but I was too tired/ill to be participating very actively ;)
21:42:13 armax: nati_ueno's work on providing more review info could also track whether the correct 3rd party ci is reporting
21:42:22 #action mestery to document on the wiki 3rd party compliance terms and implications of not complying
21:42:53 marun: Agreed!
21:43:02 marun: armax: mestery: yes
21:43:06 salv-orlando: I think the terms of complying were stated
21:43:25 but we should have been more specific on tests required to execute
21:43:53 markmcclain: Yes, and also on exactly the steps on what happens with not complying as well.
21:44:04 e.g. removal from tree, docs, etc. And a timeline for that, etc.
21:44:33 mestery: that would be git rm -r ./neutron/plugins//* && git commit -a && git review
21:44:51 salv-orlando: if it only were that simple
21:44:54 salv-orlando: :P
21:44:58 :P
21:45:13 I suspect a timeline for the offender would be a good idea
21:45:16 how is it not that simple? ;)
21:45:21 salv-orlando: Like I said, I have an action item to go through the existing documentation on this and edit as needed, and feedback will be great of course!
21:45:31 marun: removing the offending plugin as salv-orlando descrived
21:45:35 marun: Remember, you're wearing the fire resistant suit, it's easier with that on.
21:45:36 sure, I think we can stop it here
21:45:38 *described
21:46:21 the overarching goal, of course, is accountability
21:46:27 under the heading of new things (when we get there)
21:46:39 I also wanted to note I found it uncanny how closely salv-orlando was able to follow discussion on the etherpad remotely.
21:46:46 marun: "we need to account all the things"
21:46:47 is some form of notification back NB
21:46:57 armax: +1!
21:47:00 or at least back NB to neutron that is
21:47:15 yeah salv-orlando's ability was impressive
21:47:37 salv-orlando: he fooled us, but he secretly told me he wrote a bot that did that
21:47:49 regXboi: Notifications for what? sorry if I'm not following a conversation we've had previously. :)
21:47:50 * armax is kidding
21:47:58 I just paid armax to put notes on the etherpad with my name
21:48:07 salv-orlando: yup…I am bot
21:48:08 salv-orlando: Now it all makes sense.
21:48:17 markmcclain: the tempest api tests section in https://etherpad.openstack.org/p/neutron-nova-parity is outdated. Is it ok if I update it?
21:48:36 mestery: there are cases where southbound *things* may need to allocate from the allocation pool. Right now (afaik) the only way to do this is to either
21:48:39 salv-orlando: btw the check didn't clear yet, what's going on?
21:48:44 (i) go around and come in the front door
21:48:48 mlavalle: Please do!
I actually updated part of that this morning with ones which have already merged.
21:48:56 (ii) put the allocation logic into the plugin
21:48:56 mlavalle: yes please
21:49:12 remember that conversation now?
21:49:17 will do :-)
21:49:32 regXboi: Yes, now I remember.
21:49:41 I found salv-orlando's BP for advanced networking scenarios and linked my blueprint to it. https://blueprints.launchpad.net/tempest/+spec/neutron-provider-networking
21:49:50 regXboi: I don't have a solution off the top of my head now, but at least I'm up to speed.
21:50:02 I'd really like to come up with a (iii) allow the SB thing to let the plugin know what it needs
21:50:14 so that the logic doesn't have to be in the plugin (as in ii)
21:50:42 because (i) is just frankly *hideous*
21:51:06 If I'm not mistaken, the OpenContrail plugin solves this by not using any Neutron DB operations, but maybe I'm just remembering this all wrong.
21:51:23 opencontrail I think it's a passthrough plugin
21:51:51 passthrough == maps neutron calls onto backend calls & does not store anything in neutron
21:52:08 yes it is
21:52:17 yes, I'm looking for a third option
21:52:31 it pretty much uses neutron as a shim, which might sound bad, but as long as it works it's fine
21:52:37 speaking of third options, also brings us to overlapping_ips :)
21:52:40 the reason we've traditionally made folk either pursue i or ii is security
21:52:55 right now we don't have another way to provide a secure interface
21:53:10 when signed messaging is possible we could look at alternatives for RPC
21:53:16 I'd be interested in seeing how successful contrail is with that approach. iirc everyone who has gone down that road has found they needed to use neutron as more than a shim.
21:53:34 there is also the fourth option…. promoted by the fifth column...
21:53:34 I know when I looked at neutron as a shim, I gave up pretty quickly
21:53:47 destroy neutron and do everything from scratch!
21:54:08 speaking of third options...
brings me to allow_overlapping_ips
21:54:08 * mestery hands salv-orlando another scotch. :)
21:54:22 I'm thinking that really should be a tri-state
21:54:42 salv-orlando: maybe if our process improvements prove successful that would be an option. as of now, though, I worry that we'd end up in the same state.
21:55:29 scope subnet search to network, tenant, all tenants
21:55:35 that middle value is missing today
21:57:01 We're getting close to the end ... of this meeting at least.
21:57:13 It was great seeing everyone last week!
21:57:35 And now we begin the March to Juno!
21:57:51 mestery: thanks for organizing the sessions last week
21:58:22 markmcclain: +1, thanks mestery!
21:58:24 markmcclain: NP, thanks for the guidance on that front!
21:58:32 OK, see you all next week! Thanks!
21:58:37 #endmeeting
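The tri-state suggestion made near the end of the meeting (scope the overlapping-subnet search to network, tenant, or all tenants) might be sketched as follows. Everything here is an assumption for illustration: the `OverlapScope` enum, the `conflicting_subnets` helper, and the dictionary keys are hypothetical names, and this is not how Neutron implements `allow_overlapping_ips`.

```python
import enum
import ipaddress


class OverlapScope(enum.Enum):
    """Hypothetical tri-state replacing a boolean overlap setting."""
    NETWORK = "network"          # only check subnets on the same network
    TENANT = "tenant"            # check all subnets owned by the same tenant
    ALL_TENANTS = "all_tenants"  # check every subnet in the deployment


def conflicting_subnets(new, existing, scope):
    """Return the existing subnets whose CIDR overlaps `new` within `scope`.

    Subnets are plain dicts with 'network_id', 'tenant_id' and 'cidr' keys.
    """
    new_net = ipaddress.ip_network(new["cidr"])

    def in_scope(subnet):
        if scope is OverlapScope.NETWORK:
            return subnet["network_id"] == new["network_id"]
        if scope is OverlapScope.TENANT:
            return subnet["tenant_id"] == new["tenant_id"]
        return True  # ALL_TENANTS

    return [
        s for s in existing
        if in_scope(s) and ipaddress.ip_network(s["cidr"]).overlaps(new_net)
    ]
```

With three explicit values, the tenant-wide check described in the log as the missing middle value becomes a configuration choice instead of something a plugin has to implement on its own.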