09:04:05 <oanson> #startmeeting Dragonflow
09:04:06 <openstack> Meeting started Mon Apr 11 09:04:05 2016 UTC and is due to finish in 60 minutes. The chair is oanson. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:04:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:04:09 <openstack> The meeting name has been set to 'dragonflow'
09:04:12 <oanson> gampel, thanks
09:04:19 <oanson> #info scsnow gampel Shlomo_N raofei nick-ma oanson yuli_s gsagie in meeting
09:04:36 <oanson> #topic Security Groups
09:04:50 <oanson> #link https://review.openstack.org/#/c/280538/
09:05:13 <gampel> From the testing so far it does not break the code
09:05:23 <oanson> We are in feature freeze, but due to the importance of this feature and seeing how close it is to completion, we want to try and merge it.
09:05:24 <gampel> but does not work properly
09:05:59 <oanson> yuli_s, dingboopt you were testing this feature. Any comments?
09:06:54 <yuli_s> I found a bug with double security group fields, so basically SG does not work for me
09:07:18 <gampel> oanson: I think we can merge it if it does not break the flow when set to work without SG
09:07:28 <oanson> gampel, we can do that
09:07:37 <oanson> yuli_s, do you think it can be made to work by Monday?
09:07:49 <oanson> (i.e. next Monday, the 18th?)
09:07:51 <gampel> I see that there is a new patch from today, what does it include?
09:07:52 <yuli_s> probably yes
09:08:03 <gsagie> why do we need to merge it if it doesn't work?
09:08:05 <yuli_s> I have not tested the last patch
09:08:07 <nick-ma> yes, I suggest disabling it by default in devstack.
09:08:07 <gsagie> this doesn't make sense
09:08:28 <oanson> gsagie, if it can be made to work by the code freeze, merging it should expedite matters
09:08:30 <gsagie> if it's not enabled then it's not in the feature freeze
09:08:41 <gsagie> that's something that needs to be decided now
09:08:41 <oanson> gsagie, not enabled by default
09:09:02 <oanson> still available for anyone who wants to use it
09:09:30 <gsagie> who will want to use it if it's not working?
09:09:43 <nick-ma> it depends. if someone is willing to use it, but it doesn't work, it hurts.
09:09:48 <dingboopt> oanson: I will take a look at the bugs after work.
09:09:48 <gampel> we will make it work this week
09:09:56 <gsagie> so let's merge it when it works
09:10:02 <oanson> The plan is to get it to work by the code freeze
09:10:04 <yuli_s> gsagie, I suppose it will be fixed ASAP
09:10:21 <oanson> Then it will be available, working, but not enabled by default.
09:10:34 <oanson> We could enable it by default if we are sure of its stability
09:10:37 <yuli_s> gsagie, agree
09:10:44 <gsagie> so if that's the plan, let's get it to work; why do we need to merge code that is not working?
09:10:47 <gampel> is yuanwei here?
09:11:11 <gsagie> if security groups is in the version then it needs to work, and we are past the deadline
09:11:12 <nick-ma> agree.
09:11:20 <oanson> Yes
09:11:23 <gsagie> so we need to agree whether it's an important enough feature or not
09:11:32 <gsagie> if it is, we extend the deadline
09:11:39 <oanson> I understood that we agreed that it was.
09:11:45 <gsagie> dingboopt: can you make security group work in the following 2 days?
09:11:46 <nick-ma> i think it's important.
09:11:49 <gampel> yes it is and we need to extend the deadline only for SG
09:11:55 <gsagie> Duan Kebo: can yuanwei do it?
09:12:02 <Frank_Duan> If anyone finds an SG bug, would you please forward it to me?
09:12:05 <gsagie> we need fast iterations on it
09:12:23 <Frank_Duan> I will make sure there is someone working on it.
09:12:31 <gampel> not for other features that did not make the feature freeze
09:12:49 <gampel> yuli can you please share the bug here?
09:12:53 <gsagie> Frank_Duan: ok thanks, yuli_s and oanson will report bugs to you
09:13:14 <oanson> In addition to launchpad. It is helpful that the list is maintained
09:13:27 <oanson> SG bugs can be marked critical so that we won't miss them.
09:13:49 <raofei> It's better to decide who should be responsible for this feature. It's better for yuanwei or dingbo to complete it fully.
09:13:51 <yuli_s> gampel, in the lport table we have 2 fields to save: sgids and security_groups. In my test they have different values.
09:13:53 <gsagie> yuli_s: please share the bug you found, also i noticed that IPv6 is not supported, need to add a bug for it as well
09:14:20 <nick-ma> yes. ipv6 is not working.
09:14:23 <gsagie> the problem yuli_s showed me is that we have 2 fields for security groups
09:14:33 <gsagie> need to remove one
09:14:37 <Frank_Duan> I have talked about it with dingboopt
09:15:01 <Frank_Duan> He is busy these days, so yuanwei and I will fix the SG bugs
09:15:13 <gsagie> Frank_Duan: ok great, please let me know if you need help
09:15:24 <oanson> I am also available to help
09:15:24 <gampel> Ok thx, we need to make it work by the end of the week
09:15:30 <yuli_s> ok, great
09:15:55 <Frank_Duan> thank you omer!
09:15:55 <oanson> #action Frank_Duan yuanwei To finish SG app by end of week. oanson gsagie yuli_s to help where needed
09:16:06 <oanson> Anything else on this topic?
09:16:16 <hshan> the reliability feature
09:16:22 <oanson> #topic Reliability
09:16:28 <oanson> hshan: The stage is yours.
09:16:34 <hshan> thx
09:16:40 <gampel> I think that we should delay it until after the Mitaka tag
09:17:47 <gsagie> what is missing for that feature?
09:17:51 <gsagie> hshan: ?
09:17:55 <hshan> we need to move the 'switch_features_handler' from openflow handler to ovsdb_monitor, the reliability feature relies on that
09:18:39 <hshan> this is the last dependency of the reliability feature
09:18:49 <oanson> hshan: There is also the use of mod_flow and OFPFlowMod
09:19:05 <hshan> oanson: yes
09:19:09 <gsagie> ok, so this is a quick fix
09:19:13 <gampel> you need to change all the apps to use mod_flow
09:19:14 <gsagie> moving the switch features handler
09:19:26 <gampel> this is a big change
09:19:33 <hshan> gsagie: I'll add a new patch to do that
09:19:35 <oanson> gsagie: we should consider if we keep this event
09:19:49 <gampel> I think that we should delay the reliability feature until after Mitaka
09:19:50 <oanson> the sync_started and sync_finished should cover all cases.
09:20:02 <oanson> But this is definitely a change for after Mitaka, yes
09:20:41 <hshan> maybe we should merge it first, but not enable it
09:20:49 <hshan> what do you think?
09:20:53 <gampel> hshan: I know that you worked very hard on this one, but I feel that we need to merge it to master next week
09:20:56 <gsagie> hshan: so change mod_flow and change switch_features_handler, what about the L3 app?
09:20:58 <oanson> hshan: I am afraid it may make the code unstable
09:21:23 <gampel> I think that it is problematic because it changes some infrastructure code
09:21:29 <gampel> like mod_flow
09:21:34 <oanson> Even disabled, it has some changes within the code that have some effect
09:21:42 <oanson> like what gampel said :)
09:21:45 <gsagie> gampel: but this we can disable
09:21:51 <gsagie> very easily
09:22:22 <gampel> I feel that this feature is not high priority like SG and it is too risky
09:22:36 <hshan> currently, the reliability feature's change list is too long, i plan to divide it into several patches
09:22:52 <oanson> Actually, looking at the patch again, it looks like most changes are in new files
09:23:02 <oanson> (I remembered an older version (: )
09:23:10 <hshan> huh
09:23:13 <gampel> But if you all think it is not, we can consider adding it with a disable flag
09:23:17 <gsagie> hshan: ok, let's decide that it's going to be in our next version, it doesn't mean it's not going to be merged
09:23:27 <hshan> yes, this reliability feature itself is simple
09:23:27 <nick-ma> mark it as experimental in the option. if it can be disabled and not affect the pipeline, i suggest we do it and continue to test by enabling it in devstack.
09:23:28 <gsagie> it's just going to be merged to master in the next version
09:23:40 <oanson> The only problem is if it's disabled in the code in df_base_app. It has to be conditional anyway. So if we add that, it can also be merged.
09:23:55 <gsagie> nick-ma: i agree, i personally don't see why it's not going to be easy to disable this feature
09:24:29 <nick-ma> need to make sure it is working by code freeze.
09:24:29 <gampel> Ok so hshan, add a patch so that we can disable everything, and we could merge it this week
09:24:34 <gsagie> hshan: the question is, why is it important to get ir merged?
09:24:36 <gsagie> it
09:24:45 <gsagie> and is it working
09:24:50 <gampel> I agree with the question
09:24:52 <hshan> of course
09:25:06 <nick-ma> if it is not working by end of this week, let it go to next release. even if it is experimental, we need to make sure it is working then.
09:25:10 <gsagie> hshan: did you change the L3 application?
09:25:23 <gsagie> nick-ma: agree
09:25:28 <hshan> I changed mod_flow
09:25:44 <hshan> all other apps use mod_flow
09:26:03 <gsagie> hshan: ahh i see
09:26:11 <oanson> hshan: This is in the latest patchset?
09:26:11 <hshan> ok
09:26:12 <gsagie> hshan: so any app that uses the cookie will work
09:26:32 <Frank_Duan> The change is mainly in controller reliability itself. Modifications to other files are minor.
09:26:55 <gsagie> yes i can see that, i will test this patch today
09:27:10 <hshan> oanson: I think it is, all other work will be done in a new patch
09:27:10 <gsagie> and report by tomorrow and let's decide then
09:27:24 <gampel> ok so we agree to add it if it works and that it could be disabled and marked as experimental
09:27:26 <nick-ma> ok
09:27:36 <Frank_Duan> So I agree we mark it as an experimental feature.
09:27:43 <hshan> gsagie: you must wait for the new patches a little, I think I can finish it today
09:27:54 <gsagie> hshan: please do and email me when it's done
09:28:00 <gsagie> and tell me which patch to use for testing
09:28:05 <hshan> gsagie: ok
09:28:12 <oanson> #action hshan finish reliability and add option to disable reliability from configuration
09:28:17 <hshan> sure, I will
09:28:18 <Frank_Duan> It's also an important feature if someone wants to use df in any production
09:28:21 <gsagie> i will also look at how we can safely disable it if it turns out not to work
09:28:28 <oanson> #action gsagie test reliability feature
09:28:51 <gsagie> Frank_Duan: we all agree it's important, the question is whether it's stable enough right now or needs to be merged in a week
09:28:57 <gsagie> it's really not that big a delay either way
09:29:04 <gampel> i agree
09:29:11 <gsagie> but we want a "semi" stable version for more testing
09:29:18 <gsagie> and not to deal with logic bugs
09:29:21 <oanson> Though I guess that if anyone uses it in production, they'll want a tagged version
09:29:23 <gsagie> before the summit, that's all
09:29:28 <oanson> rather than the ongoing master.
09:29:43 <gsagie> oanson: yes but we can always add tags
09:29:57 <gampel> We can mark it as RC_1
09:30:00 <oanson> gsagie: Not if we're in the middle of feature development and adding new bugs :)
09:30:21 <oanson> All right, I think we reached an agreement.
09:30:25 <gampel> and cherry-pick it to RC_2 when it is done
09:30:38 <Frank_Duan> This feature needs some time to mature.
09:31:02 <oanson> All features need to be heavily tested in the coming week
09:31:05 <Frank_Duan> I do think it will be perfect in a few weeks.
09:31:14 <nick-ma> oanson: agree.
09:31:19 <oanson> We need to find and iron out as many bugs as possible before the summit
09:31:56 <oanson> Any more issues on this topic?
09:32:14 <oanson> #topic Bugs
09:32:15 <gsagie> Frank_Duan: the big question, it also depends on your schedule
09:32:47 <Frank_Duan> Yes
09:32:56 <oanson> #link https://bugs.launchpad.net/dragonflow
09:33:02 <oanson> yuli_s, anything to report?
09:33:05 <Frank_Duan> We will focus on fixing bugs in these features.
09:33:09 <yuli_s> oanson, most of the bugs are being taken care of
09:33:47 <yuli_s> when debugging, I am getting strange exceptions, I just sent you one
09:34:08 <gampel> oanson: there is the unreported one about the publisher update frequency
09:34:28 <oanson> gampel, I am on that one.
09:34:55 <oanson> yuli_s, please report these exceptions
09:35:04 <oanson> They may hide a larger problem beneath
09:35:06 <gampel> Frank_Duan: are you testing the selective proactive on a multi-node setup?
09:35:07 <yuli_s> I hope we will close all open issues
09:35:12 <yuli_s> oanson, yup
09:35:14 <oanson> And it allows others to investigate and fix them
09:35:25 <Frank_Duan> Not yet.
09:35:35 <Frank_Duan> But hujie has
09:35:46 <gampel> Frank_Duan: I think it is very important to test both redis and zmq
09:36:04 <oanson> I see there are also 2 unassigned sg bugs.
09:36:07 <yuli_s> for example this one happens during debugging
09:36:10 <yuli_s> https://bugs.launchpad.net/dragonflow/+bug/1568506
09:36:11 <openstack> Launchpad bug 1568506 in DragonFlow "sg test bug 2" [High,New]
09:36:17 <hujie> I tested all features for basic functionality in the middle of March, based on redis in a multi-node environment
09:36:18 <oanson> gampel, testing is next topic
09:36:28 <oanson> Let's focus on bugs, and then we can get to it :)
09:36:30 <gampel> Oops
09:36:30 <Frank_Duan> Gampel, we only tested redis
09:36:38 <Frank_Duan> and will test zmq later
09:36:45 <gsagie> okie let's go to testing
09:37:10 <gampel> who is taking the SG bug
09:37:20 <gsagie> go to testing
09:37:34 <Frank_Duan> You are assign the SG bug to Yuanwei
09:37:45 <Frank_Duan> You can
09:37:57 <oanson> #action yuanwei to review SG bugs
09:38:02 <oanson> yuli_s, please assign the bugs
09:38:07 <yuli_s> ok
09:38:26 <oanson> Any other bug-related topics?
09:38:30 <yuli_s> nope
09:38:38 <oanson> #topic Testing
09:38:47 <oanson> We have to make sure all features are tested
09:38:58 <oanson> including multinode, using redis and ZMQ
09:39:10 <oanson> and east-west traffic on L2 and L3 networks
09:39:24 <gampel> and Distributed DNAT
09:39:29 <oanson> Yes, including dnat
09:39:38 <oanson> And SG
09:39:49 <oanson> And if we can, also reliability
09:40:28 <oanson> It is important that any bugs found are uploaded to launchpad, possibly with a mail to yuli_s who is our bug-master
09:40:37 <Shlomo_N> sure
09:41:03 <raofei> sure
09:41:15 <gampel> Shlomo_N: Can you share the status of your performance testing so far
09:41:44 <Shlomo_N> yes, sure
09:42:33 <Shlomo_N> Last week I tested 4 scenarios: E-W (L2 and L3), N-S and SNAT
09:43:26 <Shlomo_N> After tweaking the linux kernel, I have managed to get near 90% line utilization
09:43:41 <Shlomo_N> Here are the results:
09:43:42 <Shlomo_N> L2 VM VM 7.39Gb/s
09:43:45 <Shlomo_N> oops
09:44:00 <Shlomo_N> L2: 7.39Gb/s
09:44:11 <Shlomo_N> L3: 7.24Gb/s
09:44:21 <Shlomo_N> North-South: 6.98Gb/s
09:44:29 <gampel> is it cross node
09:44:32 <gampel> ?
09:44:34 <Shlomo_N> SNAT: 8.38Gb/s
09:44:43 <Shlomo_N> Yes, all tests are cross node only
09:44:52 <oanson> On a single 10G line?
09:44:59 <nick-ma> geneve or vxlan?
09:45:17 <Shlomo_N> Yep, the lab I'm using is based on 10Gb/s and vxlan
09:45:20 <gampel> did you get the line rate with multi VM traffic
09:45:21 <gampel> ?
09:45:30 <Frank_Duan> Shlomo, do you have the performance data of neutron + ovs agent?
09:46:07 <Shlomo_N> I got near line rate with multi vm traffic
09:46:29 <nick-ma> why does snat get more bandwidth/sec? i don't understand.
09:46:31 <Shlomo_N> Frank_Duan: I have, but it's based on a 1Gb/s lab
09:47:00 <gampel> Frank_Duan: I think next will be to test with SG and then DVR
09:47:08 <oanson> Shlomo_N: Do you also have CPU usage statistics for these tests? Is the bottleneck now the line width?
09:47:28 <Shlomo_N> nick-ma: probably because we have only a single VMSwitch in the way
09:47:59 <gampel> Shlomo_N: i agree with nick-ma: why are the SNAT and DNAT not the same, it is all centralized, right?
09:48:07 <Shlomo_N> oanson: the bottleneck is not the line bandwidth
09:48:57 <Frank_Duan> Yes, gampel. we also need to compare them with performance data of neutron + ovs agent
09:48:59 <Shlomo_N> gampel: yes. DNAT wasn't tested
09:49:21 <gampel> you said North-South: 6.98Gb/s and SNAT: 8.38Gb/s
09:49:47 <Shlomo_N> yes
09:49:56 <gampel> Frank_Duan: yes next we will do reference implementation with DVR
09:50:18 <Shlomo_N> probably because we have only a single VMSwitch in the SNAT scenario
09:50:22 <Frank_Duan> We can do this in Beijing
09:50:36 <gampel> That will be a great help
09:50:55 <gampel> shlomo is working on automating the testing, you could use his work
09:51:24 <gampel> Shlomo_N: will you be able to upload it to the DF repository
09:51:28 <Shlomo_N> Frank_Duan: how many servers do you have there?
09:51:40 <Shlomo_N> Sure, I will
09:51:57 <Frank_Duan> we have 3 servers with 10GE interfaces
09:52:15 <Frank_Duan> other servers only have 1GE ports.
09:52:28 <Shlomo_N> So we have a bigger lab here
09:52:35 <Shlomo_N> :-)
09:53:42 <Shlomo_N> Does anyone have a 40Gb/s lab anywhere?
09:53:57 <oanson> #action Shlomo_N upload performance tests to repository
09:54:19 <Shlomo_N> 10x oanson
09:54:22 <gampel> Shlomo_N: maybe they can do the DVR OVS test and you could focus on the SG automation
09:54:32 <oanson> Shlomo_N, no problem :)
09:54:44 <Shlomo_N> gampel: ok
09:55:07 <oanson> All right. Anything else on testing?
09:55:40 <oanson> #topic open discussion
09:56:21 <nick-ma> i suggest we vote on py34 gate.
09:56:50 <scsnow__> gampel: did you have a chance to look for a bug I could work on?
09:56:50 <oanson> nick-ma: second
09:57:06 <nick-ma> it was working and broken and fixed and working then.
09:57:12 <gampel> please look at the DF Austin topics and add suggestions https://etherpad.openstack.org/p/dragonflow-design-summit
09:57:14 <gampel> +1
09:57:23 <nick-ma> i fixed it two times.
09:57:29 <gampel> you mean to make it voting
09:57:38 <nick-ma> yes.
09:58:20 <gampel> scsnow__: please contact yuli, he is the bug master, but if you do some testing you will find bugs, we are in integration time
09:58:24 <oanson> I think it's important. We should be able to automatically move over to py34
09:58:39 <gampel> is it working now
09:58:50 <nick-ma> of course it is.
09:58:53 <oanson> It appears stable. I looked back a few reviews.
09:58:59 <oanson> Thanks to nick-ma :)
09:58:59 <gampel> scsnow__: will you be in Austin
09:59:11 <gampel> thx yes +1 on making it voting
09:59:28 <scsnow__> gampel: no :(
09:59:34 <oanson> So it's agreed?
09:59:43 <nick-ma> :-)
10:00:08 <oanson> gampel, nick-ma: Who gets to do it?
10:00:12 <gampel> yuli: can you help scsnow__ find an easy bug to work on
10:00:29 <gampel> I will do it no problem
10:00:44 <oanson> #action gampel make gate test py34 voting
10:00:50 <yuli_s> gampel, sure
10:00:57 <nick-ma> scsnow yuli_s: there are lots of simple tasks in the wishlist.
10:00:59 <oanson> The fullstack tests also seem to be getting there, but are not ready yet.
10:01:15 <oanson> The following error repeats: Exception: VM is not deleted
10:01:27 <oanson> I will open a bug, and if I have time I will look into it.
10:01:52 <gampel> Yes i think making the fullstack stable is very important!
10:01:54 <yuli_s> nick-ma, yup, you are right
10:02:07 <oanson> #action oanson make fullstack more stable
10:02:16 <gampel> time is up, thank you everyone
10:02:25 <nick-ma> thanks all.
10:02:32 <scsnow__> bb
10:02:35 <oanson> Thank you.
10:02:42 <oanson> #endmeeting