22:00:21 #startmeeting neutron_drivers
22:00:22 Meeting started Thu Jul 21 22:00:21 2016 UTC and is due to finish in 60 minutes. The chair is armax. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:00:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:00:25 The meeting name has been set to 'neutron_drivers'
22:00:35 hi
22:00:44 o/
22:00:59 hello everyone, thanks for joining
22:01:20 :)
22:01:25 I don’t have a special reminder for drivers folks so let’s dive in
22:01:27 o/
22:01:30 #link https://bugs.launchpad.net/neutron/+bugs?field.status%3Alist=Triaged&field.tag=rfe&orderby=datecreated&start=0
22:01:55 bug #1575146
22:01:55 bug 1575146 in neutron "[RFE] ovs port status should the same as physnet." [Wishlist,Triaged] https://launchpad.net/bugs/1575146
22:02:02 anyone had a chance to navigate it?
22:02:35 o/
22:02:38 is this going to affect the scalability of the agents?
22:03:08 i don't think scalability should be a big issue
22:03:15 since it's local to agents
22:03:23 and it's just watching the status of one interface
22:03:23 it depends how it’s implemented :)
22:03:31 is it talking about setting ports as down when a network is configured as down by the admin?
22:03:34 or about monitoring the network?
22:03:36 because we're just scanning physicals? hmm. i'd hate to make a claim that vxlan was UP, when it has way more moving parts.
22:03:43 monitoring status of physicals
22:03:52 ajo: not a logical model update from what i can see
22:04:02 this is about monitoring link status on the host
22:04:09 but better than the random DOWN nonsense we have today
22:04:35 dougwig: well that field is about what is wired
22:04:40 I guess it's doable, tunnel networks being the complicated part
22:04:49 and if the affected interface ends up being used by the agent, reflect the state on the affected logical ports
22:04:53 dougwig: config state vs dataplane
22:05:16 armax: well i don't think this is even updating the logical port on the server
22:05:18 is the current neutron api enough to build a monitoring tool? we expose the physnet relationship per network, so that should be enough for an external tool to indicate which ports are affected by a failure on infra.
22:05:26 armax: it's just setting the tap device to down
22:05:28 isn't it?
22:05:39 ip link set tap238947ac down
22:05:55 no interaction with neutron server from what i can tell
22:06:02 kevinbenton: and what then?
22:06:13 kevinbenton: when you do it from the hypervisor, how does it affect the guest?
22:06:14 armax: the VM sees its interface state change
22:06:34 hm, that'd be nice,
22:06:34 armax: so the VM can have a failover internally to another interface
22:06:49 I haven't tested this, but it's what I understand the request is for
22:07:00 kevinbenton: right, but wouldn’t we want to reflect that all the way to the neutron logical port?
22:07:03 but shouldn't we, in that case, report back to neutron-server to set the ports as down?
22:07:17 exactly what armax said :P
22:08:00 that's possible
22:08:08 would make sense for API visibility
22:08:16 but I don't think it's the core component of the RFE
22:08:18 kevinbenton: otherwise we’d still have physical != logical
22:08:26 armax: right
22:08:27 as far as state goes
22:08:41 if we limit this to a host local thing
22:09:10 then I could see it being a neat enhancement
22:09:18 the trick is in how to reliably detect the link failure
22:09:28 do we have up/down indication on the port model? I guess we're not talking about admin-state-up
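
The host-local behaviour under discussion here is roughly: watch a physnet uplink and, when it loses link, set the corresponding tap devices down so the guest sees the failure and can fail over to another interface. The snippet below is a minimal sketch of that idea only, not the agent code from the RFE; the interface name, the physnet-to-tap mapping, and the polling approach are all assumptions.

    # Sketch: mirror the link state of a physnet uplink onto the tap devices
    # of the ports that depend on it (requires root for "ip link set").
    # PHYSNET_UPLINK and TAPS_ON_PHYSNET are illustrative; a real agent would
    # derive them from its bridge mappings and port bindings.
    import subprocess
    import time

    PHYSNET_UPLINK = 'eth1'
    TAPS_ON_PHYSNET = ['tap238947ac']

    def link_is_up(dev):
        # operstate reads 'up' when the interface has carrier
        with open('/sys/class/net/%s/operstate' % dev) as f:
            return f.read().strip() == 'up'

    def set_tap_state(tap, up):
        subprocess.check_call(['ip', 'link', 'set', tap, 'up' if up else 'down'])

    def monitor(poll_interval=2):
        last = None
        while True:
            current = link_is_up(PHYSNET_UPLINK)
            if current != last:
                for tap in TAPS_ON_PHYSNET:
                    set_tap_state(tap, current)
                last = current
            time.sleep(poll_interval)

    if __name__ == '__main__':
        monitor()

As the discussion below notes, a check like this says nothing about bridges with multiple uplinks or failures deeper in the fabric, which is part of why the group leans towards external tooling.
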
22:09:35 btw, let's say you have the physnet broken, but you still have connectivity between instances on the same node; is it fair to shut down the port completely?
22:10:07 ihrachys: that's a good concern
22:10:11 it'll be funny if the taps reflect the physical, but ovs always has the bridges marked down (which started happening in liberty for some reason)
22:10:30 another issue is that the physnet bridge may have multiple physical interfaces plugged into it
22:10:39 what do we do if one fails?
22:10:43 but the other is active
22:10:55 good concerns
22:11:03 in the physical server case, we don't get a port down when an uplink of a switch is down.
22:11:13 kevinbenton: right, this may become too deployment dependent
22:11:18 amotoki: ++
22:11:24 this wouldn't detect other topology failures
22:11:24 amotoki++
22:11:46 I honestly think it's a job for some external tool that would be able to 1) monitor physnet state; 2) talk to guests to do orchestration
22:12:05 yeah, i'm starting to agree
22:12:08 ihrachys: actually, now that you brought this up
22:12:16 because there are too many different things that an operator might want to watch for
22:12:21 there’s also this other bug which is relevant to this discussion
22:12:26 ihrachys, or related to the debugging scenarios hynek was proposing
22:12:31 armax: :-)
22:12:35 bug #1598081
22:12:35 bug 1598081 in neutron "[RFE] Port status update" [Wishlist,Triaged] https://launchpad.net/bugs/1598081
22:13:03 ahaa
22:13:12 if we assume that out of band tools can indeed cooperate with Neutron to manage a port state
22:13:33 perhaps we do need to relax the existing API and allow a third party to set the status of a logical port
22:14:06 either that or, as one alternative being proposed, introduce a new state for this specific need
22:14:10 yeah, i'm more inclined to approve this one
22:14:19 1598081?
22:14:20 because it would allow tooling to do this
22:14:22 yeah
22:14:25 +
22:14:31 kevinbenton: you mean flipping the allow_put on port status?
22:14:32 maybe even extend the port status beyond a single constant (providing more details of the status issue)
22:14:47 armax: yeah, or some API mechanism
22:14:56 maybe a new field
22:15:01 for dataplane status
22:15:10 well, I thought it won't be rest exposed; only from inside ml2 drivers?
22:15:20 kevinbenton: like force-down as nova now has for hosts?
22:15:36 there is a lot of logic tied to that status field now, i'm not sure allowing arbitrary changes of STATUS will play well with ML2
22:15:42 kevinbenton: a new field might be better in case some plugins would not tolerate the change in allow_put to True?
22:15:50 armax: yes
22:16:05 cgoncalves: what does force-down do?
22:16:24 dataplane-status ?
22:16:33 ajo: yeah, i'm thinking something like that
22:16:42 kevinbenton: overwrites 'status'
22:17:05 cgoncalves: ah, yeah i'm not sure forcing status changes will work well with ML2
22:17:07 i'd think the ml2 driver or core plugin should get to decide whether a 3rd party gets to muck with your port state.
22:17:21 cgoncalves: it's likely to come along and undo the status on an agent sync
22:17:30 ok let’s report back on 1575146 based on this discussion and see if that would solve their need
22:17:41 in the meantime we can figure out 1598081 on a spec
22:17:45 kevinbenton: I mean, not the 'status' db value but the REST API
22:18:17 cgoncalves: it’s probably safer to start putting something into a spec format
22:18:20 cgoncalves: even that is tied to nova notifications
22:18:33 cgoncalves: would a new field not work for your use case?
22:18:42 any other opinion on bug 1598081?
22:18:42 bug 1598081 in neutron "[RFE] Port status update" [Wishlist,Triaged] https://launchpad.net/bugs/1598081
22:19:08 armax: yes. if the second goes in, then we can partially address this by having the tap status reflect the dataplane status
22:19:20 armax: so then it's just up to a tool to set that status
22:19:27 kevinbenton: right, let’s circle back on the former RFE and take it from there
22:19:30 kevinbenton: it would address half of the issue, yes
22:19:46 cgoncalves: what's the other half that it leaves out?
22:20:02 armax: spec sounds good. question is which approach we should propose first
22:20:40 cgoncalves: I think the approach that explores a new status field is probably the one with the best chances
22:20:42 kevinbenton: SDN controllers reporting through their existing APIs up to the mech driver
22:20:48 armax: ok
22:20:56 cgoncalves: oh, that's not a big deal if we have a new field
22:21:07 cgoncalves: they could just use the regular update_port core plugin api at that point
22:21:24 kevinbenton: sure
22:21:25 armax: no PUT for the start?
22:21:33 +1 to no PUT
22:21:47 and we can even have this new status force the old status to DOWN as well
22:21:48 yeah, + for no PUT. we can reiterate later.
22:21:49 ihrachys: changing the semantics of PUT may not go down well for all plugins
22:22:00 so we are saying a new field for out of band, while in band would not need additional work from neutron core, right?
22:22:12 but during the spec review we can find out potentially
22:22:31 cgoncalves: yes
22:22:36 it makes me think... isn't it in a way duplicating the /health api that Hynek is working on?
22:22:46 armax: sounds good
22:22:58 ihrachys: well this is setting a state from the API
22:23:05 ihrachys: the /health would look at this i assume as well
22:23:08 ihrachys: that’s addressing a different use case
22:23:13 ihrachys: do you have a pointer at hand?
22:23:21 I see /health as resource status on steroids
22:23:33 cgoncalves: https://review.openstack.org/308973
22:23:41 ihrachys: thanks
22:23:56 i see /health as reading the status of everything and this new field as a way for plugins to say something is broken
22:24:02 at the dataplane somewhere
22:24:06 ihrachys: now the diagnostics framework could potentially include link status checking
22:24:31 ihrachys: but we’d never go for that in-tree
22:24:46 yeah, they can be complementary, or build on each other
22:25:05 ihrachys: now as for the level of pluggability of the diagnostics framework, that is still TBD
22:25:26 questions? notes? shall we move on?
22:25:40 not sure I understand why we would not go with at least a check model for link status, but ok. we can probably move on.
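
The direction that emerges from this exchange is a new, separately settable field for dataplane state rather than opening up PUT on the existing status attribute. Below is a minimal sketch of what an out-of-band tool could do once such a field exists; the field name data_plane_status is taken from the conversation and should be read as hypothetical at this point, and the credentials and endpoint are placeholders.

    # Sketch: an out-of-band monitoring tool reporting a dataplane failure on
    # a port through a dedicated field, leaving the plugin-owned 'status'
    # attribute alone. The 'data_plane_status' attribute is an assumption
    # taken from the discussion above, not an existing API field.
    from keystoneauth1 import identity, session
    from neutronclient.v2_0 import client

    auth = identity.Password(auth_url='http://controller:5000/v3',
                             username='admin', password='secret',
                             project_name='admin',
                             user_domain_id='default',
                             project_domain_id='default')
    neutron = client.Client(session=session.Session(auth=auth))

    def report_dataplane_state(port_id, up):
        neutron.update_port(
            port_id,
            {'port': {'data_plane_status': 'ACTIVE' if up else 'DOWN'}})

An SDN controller's mechanism driver could set the same field server-side through the core plugin's update_port, which is the "in band" path mentioned above that needs no extra REST work.
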
22:27:00 ihrachys: please do capture your thoughts on the relevant bug reports
22:27:16 armax: will do
22:27:18 ihrachys: I suppose anything is possible
22:27:57 (time check 30min, 7 RFEs left)
22:28:00 :-)
22:28:07 bug 1580880
22:28:07 bug 1580880 in neutron "[RFE] Distributed Portbinding for all port types" [Wishlist,Triaged] https://launchpad.net/bugs/1580880 - Assigned to Andreas Scheuring (andreas-scheuring)
22:28:18 carl_baldwin: ping
22:28:24 o/
22:28:48 We talked about this at the Nova mid-cycle. johnthetubaguy is taking some interest from the Nova side.
22:28:48 carl_baldwin: anything worth sharing about this?
22:29:20 I personally think that this ought to be driven from the Nova side for live migration. We should prioritize it to match theirs.
22:29:36 carl_baldwin: so we need to figure out shape and scope, but it’s something that’s in Nova’s hands?
22:29:58 carl_baldwin: any other Nova developer willing to sponsor?
22:30:38 No one spoke up willing to sponsor but there was general interest like it was something that they'd like to fix.
22:30:56 They have a similar issue with Cinder and they'd like to see what similarities there are.
22:31:04 at this point we have the option of marking this postponed and tackling it as best effort
22:31:12 the Newton window is shut for them anyway
22:31:23 The current goal is for John, Paul Murray, Andreas, and me to get a plan ready for the summit.
22:31:36 so we can take the time to iterate on the spec and revisit as soon as Ocata opens up?
22:31:42 Yes.
22:31:44 carl_baldwin: ok
22:31:51 I did look at the spec already
22:31:59 let’s continue the tango
22:32:06 I read through it to. I think it is getting better.
22:32:09 moving on?
22:32:10 *too
22:32:15 Yes, move on.
22:32:18 bug 1583694
22:32:18 bug 1583694 in neutron "[RFE] DVR support for Allowed_address_pair port that are bound to multiple ACTIVE VM ports" [Wishlist,Triaged] https://launchpad.net/bugs/1583694 - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
22:33:22 As for this one, last week we agreed we wanted to explore more formal ways to describe the particular nature of the Floating IP for the use case in which multiple ports are involved
22:33:40 I thought about this a little bit too. So far, I can't convince myself that a new top-level resource is needed but I don't feel strongly.
22:33:47 we’ll keep this on the backburner until we have a new proposal to look at; at this point I feel this probably has to come in the form of a spec?
22:33:58 ++
22:33:59 carl_baldwin: right, I tend to agree to
22:34:11 too*
22:34:34 * carl_baldwin 's and armando's double-o keys don't seem to be working today.
22:34:35 but the existing model/API experience can be streamlined
22:35:21 carl_baldwin: would you still agree with this last statement?
22:35:26 yes
22:35:47 ok
22:36:00 bug 1586056
22:36:00 bug 1586056 in neutron "[RFE] Improved validation mechanism for QoS rules with port types" [Wishlist,Triaged] https://launchpad.net/bugs/1586056 - Assigned to Slawek Kaplonski (slaweq)
22:36:25 \o/
22:36:35 ajo, ihrachys are you saying that this turns out to be a simple bug fix?
22:36:48 ‘simple’?
22:37:17 well, not a simple bugfix, the current behaviour could have been considered a bug, maybe
22:37:35 ajo: is there a patch in the works?
22:37:37 I prefer we actually track it by RFE, we even have a short spec describing the work to be done
22:37:43 armax, yes, 1 sec
22:38:01 I think it's a rather straightforward fix, though building it cleanly will require some thought. It changes behaviour for the supported rule types API, so it's an RFE.
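
The validation being discussed boils down to rejecting QoS rule types that the driver or VIF type binding a port cannot enforce (for instance an incompatible rule on an SR-IOV port), and reporting a conflict instead of silently doing nothing. The following is an illustrative sketch of that idea only; the capability table, exception class and helper are made up for the example, and the real logic lives in the patches and spec linked below, driven via callbacks and only active when QoS is enabled.

    # Sketch: refuse QoS rules that the binding driver cannot enforce, and
    # surface the problem as a Conflict. SUPPORTED_RULE_TYPES and the helper
    # are illustrative, not the implementation in the linked patches.
    from neutron_lib import exceptions

    SUPPORTED_RULE_TYPES = {
        'ovs': {'bandwidth_limit', 'dscp_marking'},
        'sriovnicswitch': {'bandwidth_limit'},
    }

    class QosRuleNotSupported(exceptions.Conflict):
        message = ("QoS rule type %(rule_type)s is not supported for ports "
                   "bound by %(driver)s")

    def validate_policy_for_port(policy_rules, bound_driver):
        supported = SUPPORTED_RULE_TYPES.get(bound_driver, set())
        for rule in policy_rules:
            if rule.rule_type not in supported:
                raise QosRuleNotSupported(rule_type=rule.rule_type,
                                          driver=bound_driver)
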
22:38:03 https://review.openstack.org/#/c/328655/?
22:38:14 https://review.openstack.org/#/c/319694/
22:38:33 oh boy you got a -1 from garyk
22:38:39 :P :)
22:38:45 :D
22:39:00 ok +794,-35
22:39:17 it's contained in a way that it's only activated via callbacks if qos is enabled
22:39:18 whooo, it deletes 35 lines! :)
22:39:23 cleanup! ;)
22:39:49 ajo: and you still want a spec?
22:40:20 the spec seems to be fine, we used it to agree on the high level details of the implementation
22:40:32 ajo: is there a pending spec too?
22:40:38 1 sec
22:40:38 https://review.openstack.org/#/c/323474/
22:40:43 the spec ^
22:40:50 correct
22:41:02 it's fine. I don't insist on having one, but since it's already there...
22:41:25 ihrachys: ok, it seems most of the legwork is done
22:41:28 basically we're trying to reconcile the heterogeneity of a deployment (different vnic types, different port bindings... with different capabilities)
22:41:41 ajo: but you did it with no api changes?
22:41:44 and tell the admin when it's going to do something that does not work
22:41:54 armax, correct, we will only forbid things that don't work
22:41:56 * armax must read it to learn how they pulled that off
22:42:06 ok
22:42:17 like trying to set a policy not compatible with an SR-IOV port
22:42:35 no api changes, that was the original concern; now it's properly isolated in scope.
22:42:44 or trying to change a policy in a way that it becomes incompatible with a bound port
22:42:55 i think error conditions in the API will be changed. correct?
22:43:13 amotoki, we will provide more error conditions (conflict probably)
22:43:18 and document them
22:43:30 but no parameters changed, or REST methods added
22:44:02 ok, I can’t see why we can’t proceed with this one, I’ll look at the outstanding patches
22:44:18 I assume that both of you can take care of this in time for Newton?
22:44:46 I will trade reviews for that for some of my pending patches... :)
22:44:57 thanks, yes, I hope we can get it done for newton
22:45:03 ihrachys: it doesn’t work like that, but nice try
22:45:03 hehe, that will be welcomed :P :)
22:45:04 :)
22:45:19 damn!
22:45:22 I thought the trade offer was to me :P
22:45:23 let's move on
22:45:26 ok
22:45:27 ajo: it was
22:45:29 bug 1592000
22:45:29 bug 1592000 in neutron "[RFE] Admin customized default security-group" [Wishlist,Triaged] https://launchpad.net/bugs/1592000 - Assigned to Roey Chen (roeyc)
22:46:08 I think it inherently makes openstack less compatible
22:46:19 however we implement or expose the feature.
22:46:20 I suppose ihrachys’ comment was the nail in the coffin
22:46:39 since the contents of the group are a contract for a long time.
22:46:41 I have been pushing back on this myself
22:47:00 nova allowed this right?
22:47:17 kevinbenton: allegedly
22:47:30 kevinbenton: though I am not sure if they ever removed the mechanism after juno
22:48:00 so right now we are 2 -2
22:48:16 well, in a way, one -2 and one -1
22:48:25 is anyone willing to argue against the unfavorable votes?
22:48:30 ihrachys: same difference
22:48:43 well, the incompatibility is a matter of people getting used to properly setting up the default SG
22:48:48 what do other folks reckon?
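
For context on the debate that follows: the "this is somewhat already possible" angle refers to per-project onboarding scripts that rewrite the stock default security group, as opposed to a cloud-wide admin template. A rough sketch of such a script is below; the cloud name and the rule set are illustrative, and the openstacksdk calls are just one possible way to do it.

    # Sketch: tenant-onboarding step that replaces the rules in a new
    # project's 'default' security group with the operator's own defaults.
    # Cloud name, rules and error handling are illustrative.
    import openstack

    conn = openstack.connect(cloud='mycloud')  # assumes a clouds.yaml entry

    def customize_default_sg(project_id):
        sg = next(conn.network.security_groups(name='default',
                                               project_id=project_id))
        # Drop whatever the stock default group ships with...
        for rule in conn.network.security_group_rules(security_group_id=sg.id):
            conn.network.delete_security_group_rule(rule)
        # ...and install the operator's own defaults, e.g. inbound SSH + ICMP.
        conn.network.create_security_group_rule(
            security_group_id=sg.id, direction='ingress',
            ethertype='IPv4', protocol='tcp',
            port_range_min=22, port_range_max=22)
        conn.network.create_security_group_rule(
            security_group_id=sg.id, direction='ingress',
            ethertype='IPv4', protocol='icmp')
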
22:49:06 I believe there's a good use case when admins want to set up a higher level of security
22:49:29 it's not the first time I heard that from an operator, but they didn't insist too much, they had more pressing things
22:49:31 ajo: bear in mind that this is somewhat already possible
22:49:41 armax: It is?
22:49:44 armax, what do you mean?
22:49:46 the current nonsense is bullshit that leads to literally every new tenant sending a request, "i can't ping my instances!", so everyone scripts a tenant create with their own default anyway.
22:49:51 tenant onboarding
22:49:59 you create a default security group with your junk in it
22:50:10 that's true
22:50:26 but do we want to give the admin the rope?
22:50:29 I’d rather not
22:50:46 I’d rather send the opposite signal
22:51:00 dougwig: those tenants should RTFM
22:51:15 armax: unrealistic for end users.
22:51:20 dougwig: nonsense
22:51:26 end users who?
22:51:33 my grandpa?
22:51:37 come on!
22:51:38 :)
22:51:44 and this is why openstack sucks for public clouds.
22:51:58 i am not 100% sure this leads to incompatibility. this sounds like a possible use case and API consumers can know what rules are provisioned.
22:52:00 dougwig: aws is default closed, no?
22:52:01 that’s EC2 behavior too
22:52:08 no, it sucks because we change its behaviour every second cycle. oh wait.
22:52:38 nah, ec2 walks you through the SG as part of launch, so it hits you in the face.
22:52:41 amotoki: it would be discoverable but every cloud could have a different default
22:52:46 dougwig: so that has nothing to do with neutron
22:52:49 we've talked about this before
22:52:51 dougwig: and so is horizon
22:52:51 a question is whether we need to ensure the default rules or we can tell users to check the default rules through the API.
22:52:52 amotoki: existing apps could not retroactively know that neutron will decide to screw them.
22:52:54 it sounds like you want a horizon feature
22:53:08 kevinbenton: i want unicorns and cake, too.
22:53:26 ok, let’s assume that this is not going anywhere anytime soon
22:53:27 If people really want this, I think the FWaaS v2 spec covers this use case.
22:53:33 perhaps we can involve the nova folks just to stir the pot
22:53:40 let’s move on
22:53:50 bug 1596611
22:53:50 bug 1596611 in neutron "[RFE] Create floating-ips with qos" [Wishlist,Triaged] https://launchpad.net/bugs/1596611 - Assigned to LiuYong (liu-yong8)
22:53:50 dougwig: but this would suck more for public clouds if we left it default open
22:54:07 kevinbenton: it's really hurt digital ocean a lot. not.
22:54:20 that one, I don't believe it's achievable with the current state of traffic classification in neutron (which is non-existent)
22:54:32 ihrachys: way ahead of you
22:54:48 ihrachys: we do have a mechanism to postpone
22:54:56 dougwig: digital ocean beating amazon? :)
22:55:20 I believe we can't tackle that yet
22:55:24 then let's do it. I would mark the TC effort as a dep for that RFE, but that's probably something not supported for bugs but for bps only. good riddance.
22:55:24 kevinbenton: bah, we pick and choose using AWS as our PRD, depending on our biases.
22:55:25 ihrachys: your comment sums it up pretty well
22:55:25 eventually we will be able to
22:55:32 bug 1603833
22:55:32 bug 1603833 in neutron "If we need a host filter in neutron ?" [Wishlist,Triaged] https://launchpad.net/bugs/1603833
22:55:42 this one I think can be tackled by nova’s scheduler filter mechanism
22:55:49 anyone can comment?
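
As a hypothetical illustration of the suggestion just made, the "host filter" need could live as a custom Nova scheduler filter that consults Neutron before placing an instance, rather than as new scheduling logic inside Neutron. Everything Neutron-specific in this sketch is a placeholder: the filter class is not an existing Nova or Neutron component, and the helper stands in for whatever check an operator cares about (segment reachability, IP availability, or bandwidth once the QoS groundwork exists).

    # Sketch: an out-of-tree Nova scheduler filter that asks Neutron whether a
    # candidate host is acceptable for the requested networks. The helper is a
    # placeholder; real code would query Neutron with credentials from
    # nova.conf and cache the answers.
    from nova.scheduler import filters


    def _host_ok_for_networks(host_name, spec_obj):
        # Placeholder for a Neutron-side check (e.g. segment/host mappings or
        # the network-ip-availabilities API).
        return True


    class NeutronAwareFilter(filters.BaseHostFilter):
        """Illustrative filter, not shipped by Nova or Neutron."""

        def host_passes(self, host_state, spec_obj):
            return _host_ok_for_networks(host_state.host, spec_obj)

Such a filter would be enabled through Nova's scheduler filter configuration like any other custom filter; the IP availability API mentioned next is one example of data it could draw on.
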
22:55:50 dougwig: it's a useful data point is all
22:55:54 dougwig: non-binding
22:56:05 armax: does the nova scheduler have the neutron net when it runs?
22:56:17 dougwig: I suppose they must
22:56:35 that's related to the nova generic resource pool integration
22:56:35 armax: I am not 100% sure that's the only goal, but if it's about bandwidth oversubscribing, then I think it's a dup for another bug I mentioned there.
22:56:37 the godaddy guys developed the IP availability API for a similar use case
22:56:48 and, also related to strict min bandwidth limit (when they talk about bandwidth)
22:57:08 we have a QoS RFE for that, but we need to wait on nova to be ready before jumping in
22:57:12 ok, let’s continue the chat on the bug to further scope it
22:57:28 I suppose this is a first, have we ever managed to finish the entire list in one meeting?
22:57:53 yes
22:57:54 I don't think we have
22:58:27 ok
22:58:34 let’s get 2 mins back
22:58:39 #endmeeting