14:01:04 #startmeeting interop_challenge
14:01:05 Meeting started Wed May 3 14:01:04 2017 UTC and is due to finish in 60 minutes. The chair is tongli. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:08 The meeting name has been set to 'interop_challenge'
14:01:10 o/
14:01:13 hello, everyone.
14:01:17 o/
14:01:21 hello
14:01:23 hi folks
14:01:25 hello
14:01:25 hello
14:01:37 hello
14:01:44 Hi
14:02:18 o/
14:02:39 The agenda for today is to just run the test and join the big cockroachdb cluster.
14:02:45 and post results to the mailing list.
14:03:06 #link https://etherpad.openstack.org/p/interop-challenge-meeting-2017-05-03
14:03:07 cool
14:03:07 o/
14:03:32 We do have a few items to talk about before we dive into the actual run.
14:03:45 yep, I ran earlier today and also merged the commit, so we don't have to rely on an in-review patchset
14:03:45 Please see the etherpad at the agenda section.
14:04:05 @dmellado, thanks.
14:04:13 awesome
14:04:13 zhipeng: I'd love to +2 your commit ASAP too, so I'll do that as soon as you address my comments there ;)
14:04:17 @dmellado, one more review though.
14:04:33 dmellado no problemo
14:04:40 tongli: the one for the tags? I assume that we'll need those for phase 2
14:04:52 all cores, please also look at Mark's patch for destroying the preallocated floating IPs.
14:05:23 @dmellado, not absolutely needed, but it is nice to have when you test run it.
14:05:40 I'll take a look
14:05:47 https://review.openstack.org/#/c/461591/
14:06:00 less than a 10 character change
14:06:20 anyway, please take a look at that. we've also merged the nfv patch, yeah!!!
14:06:37 now today's stuff.
14:07:07 I've put two commands on the etherpad, these two commands are very important.
14:07:09 tongli: for the NFV one, not yet!
14:07:11 pls review also this one
14:07:13 https://review.openstack.org/#/c/462018/
14:07:40 Hi all
14:07:54 o/
14:07:54 @dmellado, u r right.
14:07:56 hey!
14:08:02 hi vkmc o/
14:08:06 ok, please review that one as well.
14:08:19 let me continue with today's stuff.
14:08:27 the two commands, one for each phase.
14:09:04 is everybody clear on that? how to run and what to expect?
14:09:31 just a question, the phase 1 acts with 'localhost' as a master
14:09:45 what is the --skip-tags="apps" option ?
14:09:49 @dmellado, not localhost, master node.
14:09:53 and then phase 2 tag would change that to use the master from the main cluster
14:10:01 yep, that's what I meant tongli ;)
14:10:16 ok.
14:10:38 tongli: I mean, for the first tag
14:10:40 @GregWaines, --skip-tags means do not run any tasks marked "apps"
14:10:46 should it act with own_master: True?
14:11:55 @dmellado, if you only do phase #1, own_cluster is set to False.
14:12:00 @tongli, will it still start k8s and cockroach ?
14:12:08 that way, you are not dealing with cockroachdb at all.
14:12:25 if you set own_cluster == True, then you end up with a one-node cockroachdb cluster.
14:12:30 which does not hurt anything.
14:12:31 dmellado: hi, the task "setup first cockroachdb node" is skipped. I've configured the cockroachdb link in vars/ubuntu.yml. What should I do next to let it run?
14:12:59 tongli: yep, but that should be it for phase1-only participants, at least AFAIU
14:13:03 maybe we should make that clear
14:13:13 ok.
14:13:22 for phase 1 runners, here is the configuration.
14:13:32 own_cluster set to False.
14:13:37 and run this command:
14:13:52 ansible-playbook -e "action=apply env=xxxx password=xxxx" site.yml --skip-tags="apps"
14:14:17 doing that will only set up a k8s cluster for you, it has nothing to do with cockroachdb.
14:14:26 now for phase 2 runners.
14:14:40 own_cluster set to False.
14:14:47 public_node set to an IP
14:15:08 first run the phase 1 command:
14:15:15 ansible-playbook -e "action=apply env=xxxx password=xxxx" site.yml --skip-tags="apps"
14:15:28 then run this command to join the first cockroachdb cluster
14:15:40 ansible-playbook -i run/runhosts -e "action=apply env=xxxx password=xxxx" site.yml --tags="info,apps"
14:15:49 notice the runhosts and the tags
14:16:09 the second command is for phase #2, which should take like 10 seconds.
14:16:30 basically it creates a pod on each k8s node.
14:16:35 clear?
14:16:39 yep ;)
14:17:00 everybody else?
14:17:08 very clear.
14:17:08 yep
14:17:11 clear
14:17:12 Thanks very much
14:17:18 great.
14:18:17 if you run the test using a cached container image, this is important for phase #2 runners, please make sure that you are caching the latest cockroachdb container image.
14:18:45 if not, the cockroachdb dashboard will show a warning, not the end of the world but we do not want to see that.
14:18:51 it will be a distraction.
14:19:04 regarding the cached container image, I'd much prefer if we picked a specific tag for everyone to use than for everyone to just grab the "latest" image at different times
14:19:32 alexrobinson, that means another patch.
14:20:01 sure, but if the "latest" versions that people have grabbed vary by enough, the versions might not play well together
14:20:20 we'll see today, I guess, but it's a non-trivial risk
14:20:59 ok.
14:21:10 it's always a best practice when using docker to use specific version tags rather than ":latest" for this reason
14:21:19 (not just for cockroachdb, but for all container images)
14:21:43 anyways, sorry to interrupt
14:22:10 if it's an issue, everyone should switch to the "cockroachdb/cockroach:v1.0-rc.1" image
14:22:24 wxy|: you should be ok just with k8s after the latest changes ;)
14:22:26 point #2, right after you run the test, do not destroy your cluster just yet, especially if you are joining the big cluster.
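[Editor's note: the two playbook invocations discussed above can be sketched as follows. The "mycloud" and "secret" values are placeholders, not real credentials; adjust them for your own environment.]

```shell
# Sketch of the two ansible-playbook invocations from the meeting.
# env/password values below are placeholders.

# Phase 1: stand up a k8s cluster only; every task tagged "apps" is
# skipped, so cockroachdb is not touched at all.
PHASE1='ansible-playbook -e "action=apply env=mycloud password=secret" site.yml --skip-tags="apps"'

# Phase 2: reuse the inventory written by phase 1 (run/runhosts) and run
# only the "info" and "apps" tasks to join the shared cockroachdb cluster.
PHASE2='ansible-playbook -i run/runhosts -e "action=apply env=mycloud password=secret" site.yml --tags="info,apps"'

echo "$PHASE1"
echo "$PHASE2"
```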
14:23:14 ok, back to point #1, the container image, this is for phase #2 runners only.
14:23:37 do you all want to change to that tag for the cockroachdb image?
14:23:45 dmellado: I think so. Just checking now. tongli printed a very clear flow. Thanks.
14:23:50 I have no issue with that change, just another small patch.
14:23:56 np wxy| !
14:24:46 any one?
14:25:03 on using the specific tag rather than the latest by default?
14:26:02 or have I lost everybody?
14:26:24 tongli: heh, no, I'm still around
14:26:32 tbh I don't really care, I don't think that it'd affect too much
14:27:02 ok, if you refresh your cached images, then I think you are ok.
14:27:20 so #agreed, not to make a new patch for a specific image
14:27:49 point #3, stack size, need some input from Mark, Alex,
14:28:06 when other clouds join in, how many nodes should we set?
14:28:22 should each cloud have a different number of nodes join or the same number of nodes join?
14:28:39 from the cockroachdb point of view, it should not matter, but to make the demo more dynamic,
14:28:46 should the number vary a bit?
14:29:06 I guess that'd also depend on the size of the cloud
14:29:07 if not, I suggest we set it to 5, so each cloud has 4 nodes join the big cluster.
14:29:07 and the tenant
14:30:07 anyone else?
14:30:24 any cloud should allow creating 5 nodes, no?
14:31:03 hope so!
14:31:21 #agreed, stack_size to be set to 5.
14:31:59 @alexrobinson, do you have an IP we can use to join your cluster?
14:32:42 shouldn't be a problem with 5 nodes
14:33:38 @alexrobinson, u still there?
14:33:48 did you prepare a cluster so that we can join?
14:34:11 @tongli, no I don't have a cluster
14:34:54 @tongli, I think there's some confusion here. We (including Alex Polvi from CoreOS) were under the impression that we'd be using one of your clouds
14:35:38 @alexrobinson, @SpencerKimball, hmmm, I do not think so.
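[Editor's note: the tag-pinning advice above can be illustrated with a tiny guard that refuses unpinned image references before caching them. `pinned` is a hypothetical helper name for illustration, not part of the playbooks, and it assumes image references without a registry port (no extra colon).]

```shell
# Return success only for an image reference with an explicit, non-latest tag.
pinned() {
  case "$1" in
    *:latest) return 1 ;;  # explicit :latest is still effectively unpinned
    *:*)      return 0 ;;  # has a specific tag, e.g. v1.0-rc.1
    *)        return 1 ;;  # no tag at all defaults to :latest
  esac
}

pinned "cockroachdb/cockroach:v1.0-rc.1" && echo "ok to cache"
pinned "cockroachdb/cockroach" || echo "refusing unpinned image"
```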
14:36:16 mark sent out an email right after the Austin rehearsal, that you guys will set up the first cluster at the session before our show.
14:36:40 that has been communicated a few times.
14:37:07 let's hash that out offline.
14:37:28 anyway, I have set up the ibm cloud as the first cluster.
14:37:40 can we all start running phase 1?
14:37:51 and post your run time and results to the mailing list?
14:38:07 the first cluster information is on the etherpad.
14:38:30 my screen is shared here #link https://apps.na.collabserv.com/meetings/join?id=1947-0912
14:40:18 tongli: password for the meeting?
14:40:23 tongli, what's the passcode?
14:40:33 meettong
14:41:17 works fine, I can see your screen
14:41:21 yep
14:41:28 yep
14:41:33 great. it shows the cockroachdb cluster.
14:41:44 currently it has 11 nodes, however 2 are dead.
14:41:58 that should have my cluster around too
14:42:02 emailed my/WindRiver results to interop-wg@lists.openstack.org
14:42:31 tongli: which kind of results would you need? I thought we were tracking this on the wiki page too
14:42:49 https://wiki.openstack.org/wiki/Interop_Challenge#Boston_Summit_On_Stage_Keynote_K8S_Demo_Commited_Parties
14:42:54 the wiki page has an example, at the very end.
14:43:22 I just cut & pasted the timing and PLAY RECAP messages at the end
14:43:23 it was made for the lampstack but should work for the k8s workload as well.
14:43:58 ... sent email previously using the template from the wiki
14:44:04 you can access the cockroachdb dashboard by pointing a browser at your node as well.
14:44:18 @GregWaines, thanks.
14:44:27 will send it after the meeting, then
14:45:00 from the screen, you can see that the interop database was created, and the size of the table demo will change
14:45:31 it does that because each pod also has a data generator container running.
14:45:40 which inserts new values into the table.
14:46:39 the two nodes at 162.2.44.37 and 162.2.44.22 are dead nodes.
14:47:41 for phase #1, you should see something like this on the k8s dashboard.
14:47:49 it's mine. :(
14:49:06 @wxyl, ok, make sure everything is ok.
14:49:28 if you switch the namespace to kube-system on the k8s dashboard, you should see more stuff on the k8s dashboard.
14:49:50 get yourself familiar with the k8s dashboard, especially the phase #1 only runners.
14:50:11 you should be able to just do this against your own cluster.
14:50:31 ok
14:51:00 if you switch to the Pods section, you should see the Status to be Running and Restarts to be 0
14:51:17 if you see Restarts greater than 0, normally something is not right.
14:51:27 that will be true for your cockroachdb pods as well.
14:52:08 my screen shows that my cockroachdb pods restarted more than 0 times. which is not a good sign.
14:52:15 pods for k8s run well. But there are no cockroachdb pods
14:52:17 tongli: heh xD
14:52:29 wxy|: that's normal if you're running only phase 1
14:52:43 I ran phase 2 as well.
14:52:45 @wxyl, if you only run phase #1, that will be correct
14:53:17 cool I saw it.
14:53:20 if you run phase 2 as well, that won't be correct. switch the namespace.
14:53:29 :-)
14:53:34 but the result is 2. :(
14:53:57 @wxyl, that is correct. since you are not the first cluster, my cloud is.
14:54:21 one of your nodes is used as the k8s master node, which has no cockroachdb node running on it.
14:54:28 your stack_size must be 3.
14:54:35 which is the default if you did not change it.
14:54:46 ok, only a few minutes left.
14:54:49 Does it mean that I'm ok now?
14:55:00 we need a count of who will be running phase #2
14:55:07 o/
14:55:30 @wxyl, no, it seems your cockroachdb nodes are dead.
14:56:10 everybody, please indicate on the etherpad, who runs phase #2?
14:56:14 you can go in there and do kubectl get pods and check
14:56:17 wxy|:
14:56:18 How can I debug it?
14:56:29 o/
14:56:32 I'm a new guy for k8s. Sorry
14:57:08 Anybody else for phase #2?
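[Editor's note: a quick way to act on the Restarts advice above from the command line. The filter is shown against canned sample output (pod names and counts are illustrative) so it can be read on its own; against a live cluster you would pipe `kubectl get pods` instead of the here-doc.]

```shell
# Canned stand-in for "kubectl get pods" output; RESTARTS is column 4.
sample_output() {
cat <<'EOF'
NAME            READY   STATUS    RESTARTS   AGE
cockroachdb-0   1/1     Running   0          5m
cockroachdb-1   1/1     Running   3          5m
EOF
}

# Print only pods whose restart count is nonzero (skip the header row).
# Live equivalent: kubectl get pods | awk 'NR > 1 && $4 > 0'
sample_output | awk 'NR > 1 && $4 > 0 {print $1, "restarted", $4, "times"}'
```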
14:57:13 wxy|: using kubectl, but you could first try even with the dashboard
14:57:14 I plan to run phase #2, but I still have an issue with the latest patches
14:57:30 daniela_ebert: what's going on with those for you?
14:57:38 Not me .... Wind River / Greg Waines / Brent Rowsell ... will be doing only Phase 1
14:57:56 tongli: I'm assuming that anyone who said 'public cloud' on the etherpad with the participant info
14:57:59 would be running it
14:58:02 even if not around today
14:58:07 we need confirmation.
14:58:18 the foundation asked me yesterday.
14:58:41 I can go with that if no objections.
14:58:53 tongli, if I do not join phase #2, should I config own_cluster as true and ignore --skip-tags="apps" in the command, right?
14:59:09 last meeting we said we were going to do the demo today, running phase 2 so we make sure that with the latest code we are all achieving the results we epect
14:59:21 @Wei_Liu, please see the conversation earlier.
14:59:54 expect*
14:59:59 yes, I was here all the time.
15:00:12 that's why I left the nodes connected to the DB, basically
15:00:19 ok. one minute left.
15:00:29 @dmellado: I am investigating in parallel. may I contact you later?
15:00:35 contact me if you still have problems.
15:00:46 daniela_ebert: sure, feel free to reach out to me (and tongli too ;) )
15:00:54 #endmeeting