15:00:47 #startmeeting massively-distributed-clouds
15:00:48 Meeting started Wed Dec 6 15:00:47 2017 UTC and is due to finish in 60 minutes. The chair is ad_rien_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:49 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:51 The meeting name has been set to 'massively_distributed_clouds'
15:01:04 #chair ad_rien_
15:01:04 Current chairs: ad_rien_
15:01:10 #topic roll call
15:01:17 o/
15:01:24 o/
15:01:29 o/
15:01:30 hi all o/
15:01:42 yo!
15:01:45 o/
15:02:05 o/
15:02:05 #info agenda
15:02:05 #link https://etherpad.openstack.org/p/massivfbely_distributed_ircmeetings_2017 1547
15:02:25 May I ask you to please add your name on the etherpad
15:02:46 line 1549 (in particular for DragonFlow folks :-O)
15:02:53 thanks
15:03:01 The link is: https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017
15:03:19 yes, we are line 1556
15:03:48 so before diving into details, today we have the usual ongoing action reviews and
15:04:00 I hope a discussion around DragonFlow design choices.
15:04:36 oanson: we were discussing a possible presentation. I hope we can discuss it
15:04:55 ad_rien_, yes. I've added a link to the etherpad
15:04:55 Last but not least, is parus there?
15:05:29 oanson: thanks
15:05:38 ok so let's start
15:05:49 #topic Announcements
15:05:59 so only a few pieces of information to share from my side.
15:06:29 the OpenStack foundation is trying to support edge actions
15:06:49 they have created a new ML and they are trying to launch new actions
15:07:12 I had a phone call with Jonathan last week to clarify the overlap between these actions and the FEMDC SIG
15:07:22 The goal is to try to move forward faster.
15:07:37 So the first etherpad link is the general one (i.e. opened by the foundation)
15:07:45 The second one is an overview of our activities
15:07:55 To be honest, right now there is only a little interaction.
15:08:18 I don't know what we can expect, but at least it is great that the foundation is trying to push everything forward
15:08:48 Maybe the second piece of information I can share is related to the possibility of having a dedicated day during the next PTG in Dublin
15:09:15 I added that point in the open discussion section (if you are interested in taking part in such sessions, please add your name)
15:09:31 Depending on the number of people, I will confirm our interest to the foundation.
15:09:34 That's all from my side
15:09:43 so any other news to share?
15:09:59 any folks from FBK?
15:10:12 ok
15:10:19 So if not let's move to the next topic
15:10:31 ad_rien_: no, for sure we will not be in Dublin
15:10:36 sorry
15:10:43 dancn: I just saw you now
15:10:54 but we keep an eye on it!
15:10:55 dancn: ack
15:10:57 no problem!
15:11:01 ok so let's move forward
15:11:08 #topic ongoing actions - Use-cases
15:11:15 we disappeared for a while, now we are back!
15:11:33 welcome back dancn :)
15:11:35 Paul-Andre is not present (I didn't see him) but maybe dancn you can give a short update regarding the presentation you did at
15:11:42 the Fog Congress
15:11:43 ?
15:12:03 Yes, sure, I just added a few notes to the etherpad
15:12:33 thanks
15:12:36 any other information?
15:12:39 we will do an extended presentation at CloudCom; we can redo the presentation for you people at FEMDC
15:12:47 Did you get valuable feedback from the Fog Congress?
15:13:04 it may make sense, thanks
15:13:21 basically it is a demo that shows some problems in edge applications with constraints, and some "workarounds"
15:13:43 still including OpenStack and Kubernetes?
15:14:20 about 20 people showed up, interested in seeing something more than a research prototype
15:14:37 yes, both k8s and OS are involved
15:14:43 ok
15:15:15 and did you see any prototypes that can supervise/operate edge infrastructures during the Fog Congress event?
15:15:26 or was/is your proposal the most mature?
15:15:39 I was not there
15:15:42 ok
15:15:55 it would be great to have information about that
15:15:56 ok
15:16:04 any other points/aspects to share?
15:16:06 we have not had an internal sync meeting yet, but as soon as we do I will share
15:16:32 ok
15:16:41 questions/remarks?
15:16:53 if not I propose to continue
15:17:10 ok so next point
15:17:21 #topic ongoing-action-openstack++
15:17:44 no news from our side, we need time to make progress on that part. I hope that a subgroup will take on this action soon
15:17:56 #topic ongoing-action-P2P-openstack
15:18:15 regarding that point, we had a meeting last week with Ericsson folks. The meeting was fruitful from my viewpoint.
15:18:25 We presented the P2P architecture using CockroachDB
15:18:36 and identified some pros and cons. We will try to summarize that in a white paper
15:18:51 Could you please refresh us on what P2P is?
15:19:18 I think what Adrien means is a collaborative OpenStack
15:19:22 Meanwhile, Ericsson is progressing on an architecture that can enable several operators to share their resources
15:19:25 msimonin: thanks ;)
15:19:33 A BitTorrent-like OpenStack
15:19:45 P2P = Peer to Peer. sorry oanson
15:19:54 Many OpenStack instances that collaborate
15:20:01 Ah. I see. Thanks!
15:20:16 in an edge infrastructure you will have several geo-distributed DCs; our goal is to be able to make several OpenStack instances collaborate
15:20:22 ok that's all
15:20:30 we have another meeting scheduled for next Friday
15:20:35 I hope we will progress
15:20:40 s/hope/I'm sure
15:20:42 ;)
15:20:51 any question?
15:21:03 #topic ongoing-action-AMQP
15:21:21 so ansmith kgiusti msimonin avankemp
15:21:26 the floor is yours
15:21:37 I defer to msimonin and avankemp as they are doing the heavy lifting...
15:22:10 ...but we have been engaging in a weekly meeting re: the messaging test plan
15:22:35 details in the etherpad, line 1583
15:22:47 the review has been merged (https://review.openstack.org/#/c/491818/)
15:23:16 So we are now running initial experiments regarding this test plan
15:23:29 on Grid'5000
15:23:57 We now have some results to analyse
15:24:39 looks great
15:24:40 ;)
15:24:41 As kgiusti said, we are iterating with kgiusti and ansmith on the code / experiment design
15:25:24 initial tests involve about a thousand RPC clients and RPC servers
15:25:38 we target 10^4 in the mid-term
15:26:29 I'm not sure if I need to dive more into the details
15:26:46 that's ok from my side
15:27:00 any other comments from redhat folks?
15:27:07 if not we can move to the next point.
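For context, a minimal sketch of the kind of RPC client/server pair such a messaging test plan exercises with oslo.messaging. The transport URL, topic name, and `echo` method are illustrative assumptions, not the actual test-plan code (see the merged review above for the real plan):

    import oslo_messaging as messaging
    from oslo_config import cfg

    class BenchEndpoint(object):
        def echo(self, ctxt, payload):
            # Trivial RPC method: the server simply returns the payload.
            return payload

    def run_server(url):
        # url is assumed, e.g. 'rabbit://guest:guest@localhost:5672/'
        transport = messaging.get_rpc_transport(cfg.CONF, url=url)
        target = messaging.Target(topic='bench_topic', server='server-1')
        server = messaging.get_rpc_server(transport, target, [BenchEndpoint()],
                                          executor='threading')
        server.start()
        server.wait()

    def run_client(url):
        transport = messaging.get_rpc_transport(cfg.CONF, url=url)
        target = messaging.Target(topic='bench_topic')
        client = messaging.RPCClient(transport, target)
        # Blocking call: returns once one of the RPC servers replies.
        return client.call({}, 'echo', payload='ping')

Scaling to thousands of clients and servers is then a matter of launching many such processes against the same transport (e.g. a RabbitMQ broker or an AMQP 1.0 router) and measuring latency and throughput.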
15:27:36 #topic ongoing-action-cockroach
15:27:41 we're good, thanks
15:27:56 the floor is yours rcherrueau
15:28:00 ;-)
15:28:12 lemme keep it short
15:28:51 ad_rien_ spoke about a collaborative OpenStack using CockroachDB
15:29:35 but you need to tweak the code to get the desired behaviour
15:30:04 In particular, there are Nova queries that don't work as-is on CockroachDB
15:30:51 So I wrote a tool that extracts all SQL queries performed during unit/functional tests
15:32:13 The tool leverages the subunit streams produced by ostestr, so it should work for keystone, glance, ... every unit/functional test suite that uses ostestr.
15:32:52 the results of the extraction are available online
15:33:08 #link http://enos.irisa.fr/nova-sql/pike-nova-tests-unit.sql.json
15:33:26 #link http://enos.irisa.fr/nova-sql/pike-nova-tests-functional.sql.json
15:33:35 be careful, files are huge
15:33:42 be careful these are huge files
15:33:46 (too slow)
15:33:56 Length: 233394340 (223M) [application/json]
15:33:59 :-)
15:35:07 The next step is to run an analysis of the SQL queries to easily identify where we have to change the Nova code for the CockroachDB investigation
15:35:39 actually I already have an analysis that shows all correlated subqueries
15:35:46 Maybe you can also add that by conducting this study a few of our students discovered that there are a lot of SELECT 1 requests. While these requests are not so expensive in the context of a LAN deployment, they may become critical in a WAN context.
15:35:52 Lemme polish the code and then I will share it with you
15:36:02 that's all for me
15:36:07 thanks
15:36:09 questions?
15:36:11 comments?
15:37:13 ok
15:37:20 so I propose to move to the open discussion
15:37:33 and more precisely to the Dragonflow aspect
15:37:41 #topic open-discussions
15:37:54 regarding the PTG, I already mentioned that if you are interested please add your name
15:38:11 so oanson the floor is yours
15:38:16 ;-)
15:38:17 Thanks
15:38:22 How do you want to proceed?
15:38:30 we have a webconference tool
15:38:32 We can go over the short presentation I prepared
15:38:35 The link is here: https://docs.google.com/presentation/d/1HdpS5FYyBnxNkarI356B7I8vwMrC4x76FMyHmU3Ml-c/edit?usp=sharing
15:38:37 or we can continue by IRC
15:38:58 Let's go with IRC. I think just setting it up won't be worth the time
15:39:08 ok
15:39:10 let's go
15:39:24 Basically up to slide 5 is a very general introduction
15:39:33 Slide 5 basically says:
15:39:50 Cloud network solution: the general idea is a Neutron backend
15:40:12 including taking care of connecting compute elements (VMs, containers, etc.), also across compute nodes.
15:40:33 Fully distributed in that we try to avoid any bottlenecks, centralized elements, etc.
15:40:38 ok
15:41:11 Fully open source in that everything is upstream, including design, implementation, strategies, etc. Being an OpenStack project, everything is on gerrit.
15:41:18 ok
15:41:56 Slides 6-7 go a bit more in depth into what networking we do. But I would like to skip ahead to the architecture (unless anyone has any preferences). We can always come back to it later
15:42:16 please go ahead
15:42:23 I think this is the right way to progress
15:42:25 ;-)
15:42:36 So slide 9 shows the bird's-eye view architecture. The topmost node is the API layer - currently a Neutron server with Dragonflow plugins
15:42:54 The Dragonflow elements here are ML2 plugins and service drivers
15:43:20 These plugins only translate the Neutron objects to Dragonflow objects, and push them to the distributed database.
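A hypothetical sketch of that translate-and-push step, just to make the flow concrete: roughly what an ML2 mechanism driver's post-commit hook would do. The names here (NbApi, the 'lport' table, the key layout) are invented for illustration and do not match the actual Dragonflow code:

    import json

    class NbApi(object):
        # A toy north-bound API wrapping the pluggable distributed DB.
        def __init__(self, db_driver, publisher):
            self.db = db_driver    # e.g. an etcd or Redis driver
            self.pub = publisher   # e.g. a ZMQ or Redis pub/sub driver

        def create(self, table, obj):
            key = '{}/{}'.format(table, obj['id'])
            self.db.put(key, json.dumps(obj))  # the DB holds the gold standard
            self.pub.publish(table, obj)       # notify subscribed compute nodes

    def handle_port_create(nb_api, neutron_port):
        # Translate the Neutron port into a minimal Dragonflow-style object
        # and push it; no packet processing happens at the API layer.
        df_port = {
            'id': neutron_port['id'],
            'network_id': neutron_port['network_id'],
            'mac': neutron_port['mac_address'],
            'ips': [f['ip_address'] for f in neutron_port['fixed_ips']],
        }
        nb_api.create('lport', df_port)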
15:43:47 The distributed database is in the centre in this diagram, but in general it is distributed across the datacenter/cloud/deployment
15:44:19 Each compute node (the satellites) pulls the information from the distributed database, and applies the policy locally.
15:44:46 Additionally, there's also a pub/sub mechanism where the Neutron server publishes the policy, and the compute nodes subscribe to it.
15:45:17 The compilation of the policy translates high-level objects (e.g., ports, networks, subnets, security groups) to OpenFlow rules.
15:45:42 These rules are then installed in the (currently) OVS switch, which is the forwarding element for packets between compute elements
15:45:54 That's it for this slide :)
15:45:57 oanson: is oslo.messaging used for pub/sub, or a separate library?
15:46:06 ansmith, a separate library.
15:46:18 oanson: what message settlement (at least/most/exactly once) is offered by the publish/subscribe?
15:46:41 We have implemented pub/sub ourselves in a pluggable way, so we can utilize either distributed db-based pub/sub (as provided by Redis or etcd) or use a broker-based one (currently ZMQ)
15:46:50 msimonin, sorry?
15:47:19 What happens if messages cannot reach the satellite for a small period of time?
15:47:29 to all: may I ask you to add your questions to the etherpad
15:47:33 so we can keep track of them
15:47:34 can it lead to inconsistency in the satellite configuration?
15:47:44 and go through each of them
15:47:52 The pub/sub mechanism is considered only an optimization. The 'gold standard' is what's in the database.
15:47:59 (just to facilitate the discussion, I think we all have a couple of questions)
15:48:18 We rely on the distributed database to (eventually) contain the correct configuration, and therefore the correct configuration will eventually reach the satellites.
15:48:34 oanson: you mean that at any time an agent can get its whole configuration from the database?
15:48:47 msimonin, yes. Exactly.
15:48:54 oanson: great, thanks
15:49:21 So slides 10 and 11 show the architecture a bit more in depth. Slide 10 stresses DB and pub/sub pluggability.
15:49:33 oanson: would it be possible to go through slides 15 and after?
15:49:42 Sure.
15:49:42 I think this is also an important aspect for the SIG
15:49:44 thanks
15:49:52 Let me just say 2 words on slide 10:
15:49:58 sure
15:50:16 Everything is pluggable, in that different db stores and pub/sub mechanisms can be used for each deployment, according to the deployment needs.
15:50:41 e.g. in some cases Redis performs better than etcd (or vice versa), and in some cases you have to use ZMQ for pub/sub (which is db agnostic)
15:51:09 Slides 15-16 show the two options we had in mind for fog/edge/massively distributed architectures.
15:51:46 The first option is to treat each datacentre (small cloud in the diagrams) as a standalone deployment
15:52:10 Then use cross-cloud connectivity solutions for the datacentres to communicate
15:52:36 like the Tricircle proposal, somehow
15:52:44 Yes
15:52:47 The second option (slide 16) is to have a single large Dragonflow deployment spread across multiple clouds, and use a tunneling solution to communicate
15:53:31 The first option should be much easier to implement than the second. The only pro of the second (in my opinion) is that you have a single entity and then, once it works, management should be easier.
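Since pub/sub is only an optimization and the database is the gold standard, a natural agent design is a fast path driven by notifications plus a periodic full resync from the DB. A minimal sketch under that assumption, with hypothetical kv and subscriber clients (not the actual Dragonflow agent code):

    import json
    import time

    def compile_policy(desired):
        # Placeholder: this is where high-level objects would be compiled
        # down to OpenFlow rules and installed into the local OVS switch.
        pass

    def agent_loop(kv, subscriber, resync_every=60):
        desired = {}
        last_sync = 0.0
        while True:
            # Fast path: apply incremental updates received over pub/sub.
            for key, value in subscriber.poll(timeout=1):
                desired[key] = json.loads(value)
            # Slow path: periodically re-read everything from the database,
            # so missed or reordered notifications are eventually corrected.
            if time.time() - last_sync > resync_every:
                desired = {k: json.loads(v) for k, v in kv.get_all('lport/')}
                last_sync = time.time()
            compile_policy(desired)

This matches oanson's answer above: at any time an agent can rebuild its whole configuration from the database, so a lost notification causes at most a temporary inconsistency.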
15:53:51 just one point which is not clear
15:53:58 Additionally, with a single entity more information is available to each controller, and perhaps better decision making is possible
15:54:03 why do you need to tunnel all traffic in the second possibility?
15:54:22 (sorry if my question looks naive but I'm not sure I'm understanding well)
15:54:34 in the context of a network operator
15:54:47 ad_rien_, I'm guessing we would have to have an IPsec-like solution, since the traffic (probably) has to be encrypted.
15:54:47 you may consider having mandatory ports opened
15:54:53 ok
15:55:01 so for data privacy/security issues
15:55:17 ad_rien_, the tunnels exist because I assumed you want mesh connectivity - everyone can communicate with everyone
15:55:21 but this problem is already present in a traditional cloud system
15:55:27 yes
15:55:38 And if you don't *have* to go through the main cloud, it's better to connect ad-hoc.
15:55:52 this is indeed an assumption we make for the P2P OpenStack scenario (at least right now, in order to make the complex story a bit easier)
15:56:03 ad-hoc - in a mesh topology
15:56:04 ok
15:56:09 so 4 min left
15:56:20 may I propose to put all the questions you would like to discuss on the pad
15:56:25 I think I went over the main points of the presentation.
15:56:31 thanks oanson !
15:56:33 and then we can discuss them during the next point
15:56:48 I'll try to answer them on the pad as well
15:56:56 oanson: moreover, do you think it makes sense for you to take part in the brainstorming discussions related to the P2P OpenStack?
15:56:58 oanson: Thanks, any chance to implement Dragonflow in kolla-ansible?
15:57:08 rcherrueau: +1
15:57:10 :)
15:57:21 rcherrueau, I think we already have a kolla image.
15:57:23 to be honest we are not yet at the level of the network (i.e. enabling communication between 2 VMs, each one being deployed on a dedicated site)
15:57:28 ad_rien_, sure.
15:57:37 but if you think it is relevant from your side, it would be great to have you on board
15:57:48 ok
15:57:53 3 min before ending the meeting
15:57:58 ad_rien_, at the very least, it's important for us to know what problems exist to be solved
15:58:00 oanson: And in kolla-ansible, something like `enable_dragonflow: yes` or `neutron_plugin: dragonflow`?
15:58:08 should we discuss any particular point?
15:58:18 rcherrueau, I'll have to check. It was written by an external contributor
15:58:21 rcherrueau: +1000 :)
15:58:30 oanson: OK, thanks.
15:58:34 ok
15:58:43 parus:
15:58:49 Hello!
15:58:51 Hi
15:58:59 Unfortunately we are going to end the meeting
15:59:18 but I'm available by skype if you want to discuss a bit
15:59:24 notes are on the etherpad too
15:59:28 so thanks all
15:59:32 and CU at the next meeting
15:59:33 Sorry about that.
15:59:34 Thanks!
15:59:39 #endmeeting