14:01:04 <tongli> #startmeeting interop_challenge
14:01:05 <openstack> Meeting started Wed May  3 14:01:04 2017 UTC and is due to finish in 60 minutes.  The chair is tongli. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:08 <openstack> The meeting name has been set to 'interop_challenge'
14:01:10 <rarcea> o/
14:01:13 <tongli> hello, everyone.
14:01:17 <topol> o/
14:01:21 <JASON_SHI> hello
14:01:23 <dmellado> hi folks
14:01:25 <GregWaines> hello
14:01:25 <wxy|> hello
14:01:37 <alexrobinson> hello
14:01:44 <jb4free> Hi
14:02:18 <ksumit> o/
14:02:39 <tongli> The agenda for today is just to run the test and join the big cockroachdb cluster,
14:02:45 <tongli> and post results to the mailing list.
14:03:06 <tongli> #link https://etherpad.openstack.org/p/interop-challenge-meeting-2017-05-03
14:03:07 <JASON_SHI> cool
14:03:07 <zhipeng> o/
14:03:32 <tongli> We do have a few items to talk about before we dive into the actual run.
14:03:45 <dmellado> yep, I ran earlier today and also merged the commit, so we don't have to rely on in-review patchset
14:03:45 <tongli> Please see the etherpad at the agenda section.
14:04:05 <tongli> @dmellado, thanks.
14:04:13 <topol> awesome
14:04:13 <dmellado> zhipeng: I'd love to +2 your commit ASAP too, so I'll do so as soon as you address my comments there ;)
14:04:17 <tongli> @dmellado, one more review though.
14:04:33 <zhipeng> dmellado no problemo
14:04:40 <dmellado> tongli: the one for the tags? I assume that we'll need those for phase2
14:04:52 <tongli> all cores, please also look at Mark's patch for destroying the preallocated floating IPs.
14:05:23 <tongli> @dmellado, not needed absolutely but it is nice to have when you test run it.
14:05:40 <dmellado> I'll take a look
14:05:47 <tongli> https://review.openstack.org/#/c/461591/
14:06:00 <tongli> less than 10 character change
14:06:20 <tongli> anyway, please take a look at that. we've also merged the nfv patch, yeah!!!
14:06:37 <tongli> now today's stuff.
14:07:07 <tongli> I've put two commands on the etherpad, these two commands are very important.
14:07:09 <dmellado> tongli: for the NFV one, not yet!
14:07:11 <dmellado> pls review also this one
14:07:13 <dmellado> https://review.openstack.org/#/c/462018/
14:07:40 <fredli> Hi all
14:07:54 <vkmc> o/
14:07:54 <tongli> @dmellado, u r right.
14:07:56 <vkmc> hey!
14:08:02 <dmellado> hi vkmc o/
14:08:06 <tongli> ok, please review that one as well.
14:08:19 <tongli> let me continue for today's stuff.
14:08:27 <tongli> the two commands, one for each phase.
14:09:04 <tongli> is everybody clear on that? how to run and what to expect?
14:09:31 <dmellado> just a question, the phase 1 acts with 'localhost' as a master
14:09:45 <GregWaines> what is the --skip-tags="apps"  option ?
14:09:49 <tongli> @dmellado, not localhost, master node.
14:09:53 <dmellado> and then phase 2 tag would change that to use the master from the main cluster
14:10:01 <dmellado> yep, that's what I meant tongli ;)
14:10:16 <tongli> ok.
14:10:38 <dmellado> tongli: I mean, for the first tag
14:10:40 <tongli> @GregWaines, --skip-tags means do not run any tasks marked "apps"
14:10:46 <dmellado> should it act with own_master: True?
14:11:55 <tongli> @dmellado, if you only do phase #1, own_cluster set to False.
14:12:00 <GregWaines> @tongli,  will it still start k8s and cockroach ?
14:12:08 <tongli> that way, you are not dealing with cockroachdb at all.
14:12:25 <tongli> if you set own_cluster == True, then you end up with one node cockroachdb cluster.
14:12:30 <tongli> which does not hurt anything.
14:12:31 <wxy|> dmellado:  hi, the task: setup first cockroachdb node is skipped. I've configured the cockroachdb link in vars/ubuntu.yml. What should I do next to let it run?
14:12:59 <dmellado> tongli: yep, but that should be it for phase1-only participants, at least AFAIU
14:13:03 <dmellado> maybe we should make that clear
14:13:13 <tongli> ok.
14:13:22 <tongli> for phase 1 runners, here is the configuration.
14:13:32 <tongli> own_cluster set to False.
14:13:37 <tongli> and run this command:
14:13:52 <tongli> ansible-playbook -e "action=apply env=xxxx password=xxxx" site.yml --skip-tags="apps"
14:14:17 <tongli> doing that will only set you up a k8s cluster, it has nothing to do with cockroachdb.
14:14:26 <tongli> now for phase2 runners.
14:14:40 <tongli> own_cluster set to False.
14:14:47 <tongli> public_node set to an IP
14:15:08 <tongli> first run phase 1 command:
14:15:15 <tongli> ansible-playbook -e "action=apply env=xxxx password=xxxx" site.yml --skip-tags="apps"
14:15:28 <tongli> then run this command to join the first cockroachdb cluster
14:15:40 <tongli> ansible-playbook -i run/runhosts -e "action=apply env=xxxx password=xxxx" site.yml --tags="info,apps"
14:15:49 <tongli> notice the runhosts and tags
14:16:09 <tongli> the second command is for phase #2, which should take like 10 seconds.
14:16:30 <tongli> basically it creates a pod on each k8s node.
14:16:35 <tongli> clear?
14:16:39 <dmellado> yep ;)
14:17:00 <tongli> everybody else?
14:17:08 <wxy|> very clear.
14:17:08 <GregWaines> yep
14:17:11 <daniela_ebert> clear
14:17:12 <wxy|> Thanks very much
14:17:18 <tongli> great.
14:18:17 <tongli> if you run the test using cached container image, this is important for phase #2 runners, please make sure that you are caching the latest cockroachdb container image.
14:18:45 <tongli> if not, the cockroachdb dashboard will show a warning, not end of the world but we do not want to see that.
14:18:51 <tongli> it will be a distraction.
14:19:04 <alexrobinson> regarding the cached container image, I'd much prefer if we picked a specific tag for everyone to use than for everyone to just grab the "latest" image at different times
14:19:32 <tongli> alexrobinson, that means another patch.
14:20:01 <alexrobinson> sure, but if the "latest" versions that people have grabbed vary by enough, the versions might not play well together
14:20:20 <alexrobinson> we'll see today, I guess, but it's a non-trivial risk
14:20:59 <tongli> ok.
14:21:10 <alexrobinson> it's always a best practice when using docker to use specific version tags rather than ":latest" for this reason
14:21:19 <alexrobinson> (not just for cockroachdb, but for all container images)
14:21:43 <alexrobinson> anyways, sorry to interrupt
14:22:10 <alexrobinson> if it's an issue, everyone should switch to the "cockroachdb/cockroach:v1.0-rc.1" image
14:22:24 <dmellado> wxy|: you should be ok just with k8s after latest changes ;)
14:22:26 <tongli> point #2, right after you run the test, do not destroy your cluster just yet, especially if you are joining the big cluster.
14:23:14 <tongli> ok, back to point #1, container image, this is for phase #2 runners only.
14:23:37 <tongli> do you all want to change to that tag for the cockroachdb image?
14:23:45 <wxy|> dmellado: I think so. Just checking now. tongli printed a very clear flow. Thanks.
14:23:50 <tongli> I have no issue with that change, just another small patch.
14:23:56 <dmellado> np wxy| !
14:24:46 <tongli> any one?
14:25:03 <tongli> on using the specific tag rather than the latest by default?
14:26:02 <tongli> or I have lost everybody?
14:26:24 <dmellado> tongli: heh, no, I'm still around
14:26:32 <dmellado> tbh I don't really care, I don't think that it'd affect too much
14:27:02 <tongli> ok, if you refresh your cached images, then I think you are ok.
14:27:20 <tongli> so #agreed, not to make a new patch for specific image
14:27:49 <tongli> point #3, stack size, need some input from Mark, Alex,
14:28:06 <tongli> when other clouds join in, how many nodes should we set?
14:28:22 <tongli> should each cloud have a different number of nodes join or the same number of nodes?
14:28:39 <tongli> from cockroachdb point of view, it should not matter, but to make the demo more dynamic,
14:28:46 <tongli> should the number vary a bit?
14:29:06 <dmellado> I guess that'd also depend on the size of the cloud
14:29:07 <tongli> if not, I suggest we set it to 5, so each cloud has 4 nodes join the big cluster.
14:29:07 <dmellado> and the tenant
14:30:07 <tongli> any one else?
14:30:24 <tongli> any cloud should allow creating 5 nodes, no?
14:31:03 <dmellado> hope so!
14:31:21 <tongli> #agreed, stack_size to be set to 5.
14:31:59 <tongli> @alexrobinson, do you have an IP we can use to join your cluster?
14:32:42 <vkmc> shouldn't be a problem with 5 nodes
14:33:38 <tongli> @alexrobinson, u still there?
14:33:48 <tongli> did you prepare a cluster so that we can join
14:34:11 <alexrobinson> @tongli, no I don't have a cluster
14:34:54 <SpencerKimball> @tongli, I think there's some confusion here. We (including Alex Polvi from CoreOS) were under the impression that we'd be using one of your clouds
14:35:38 <tongli> @alexrobinson, @SpencerKimball, hmmm, I do not think so.
14:36:16 <tongli> Mark sent out an email right after the Austin rehearsal, saying that you guys will set up the first cluster at the session before our show.
14:36:40 <tongli> that has been communicated a few times.
14:37:07 <tongli> let's hash that out offline.
14:37:28 <tongli> anyway, I have set up ibm cloud as the first cluster.
14:37:40 <tongli> can we all start running phase 1?
14:37:51 <tongli> and post your run time and results to the mailing list?
14:38:07 <tongli> the first cluster information is on the etherpad.
14:38:30 <tongli> my screen is shared here #link https://apps.na.collabserv.com/meetings/join?id=1947-0912
14:40:18 <dmellado> tongli: password for the meeting?
14:40:23 <Wei_Liu> tongli, what's passcode?
14:40:33 <tongli> meettong
14:41:17 <daniela_ebert> works fine, I can see your screen
14:41:21 <dmellado> yep
14:41:28 <Wei_Liu> yep
14:41:33 <tongli> great. it shows the cockroachdb cluster.
14:41:44 <tongli> currently has 11 nodes, however 2 dead.
14:41:58 <dmellado> that should have my cluster around too
14:42:02 <GregWaines> emailed my/WindRiver results to interop-wg@lists.openstack.org
14:42:31 <dmellado> tongli: which kind of results would you need? I thought we were tracking this on the wiki page too
14:42:49 <dmellado> https://wiki.openstack.org/wiki/Interop_Challenge#Boston_Summit_On_Stage_Keynote_K8S_Demo_Commited_Parties
14:42:54 <tongli> wiki page has an example , at the very end.
14:43:22 <GregWaines> I just cut & pasted the timing and PLAY RECAP messages at the end
14:43:23 <tongli> it was made for the lampstack but should work for the k8s workload as well.
14:43:58 <GregWaines> ... sent email previously using template from wiki
14:44:04 <tongli> you can access the cockroachdb dashboard by pointing a browser to your node as well.
14:44:18 <tongli> @GregWaines, thanks.
14:44:27 <dmellado> will send it after the meeting, then
14:45:00 <tongli> from the screen, you can see that the interop database was created, and the size of the demo table will change
14:45:31 <tongli> it does that because each pod also has a data generator container running,
14:45:40 <tongli> which inserts new values into the table.
14:46:39 <tongli> the two nodes at 162.2.44.37 and 162.2.44.22 are dead nodes.
14:47:41 <tongli> for phase #1, you should see something like this from k8s dashboard.
14:47:49 <wxy|> it's mine. :(
14:49:06 <tongli> @wxy|, ok, make sure everything is ok.
14:49:28 <tongli> if you switch namespace to kube-system from k8s dashboard, you should see more stuff on the k8s dashboard.
14:49:50 <tongli> get yourself familiar with the k8s dashboard especially for the phase #1 only runners.
14:50:11 <tongli> you should be able to just do this against your own cluster.
14:50:31 <Wei_Liu> ok
14:51:00 <tongli> if you switch to Pods section, you should see the Status to be running and Restarts to be 0
14:51:17 <tongli> if you see Restarts to be greater than 0, normally something is not right.
14:51:27 <tongli> it will be true for your cockroachdb pods  as well.
14:52:08 <tongli> my screen shows that my cockroachdb pods restarted more than 0. which is not a good sign.
14:52:15 <wxy|> pods for k8s run well. But there is no cockroachdb pods
14:52:17 <dmellado> tongli: heh xD
14:52:29 <dmellado> wxy|: that's normal if you're running only phase 1
14:52:43 <wxy|> I run phase 2 as well.
14:52:45 <tongli> @wxy|, if you only run the phase #1, that will be correct
14:53:17 <wxy|> cool I saw it.
14:53:20 <tongli> if you run phase 2 as well, that won't be correct. switch the namespace.
14:53:29 <topol> :-)
14:53:34 <wxy|> but result is 2.:(
14:53:57 <tongli> @wxy|, that is correct. since you are not the first cluster, my cloud is.
14:54:21 <tongli> one of your nodes is used as the k8s master node, which has no cockroachdb nodes running on it.
14:54:28 <tongli> your stack_size must be 3.
14:54:35 <tongli> which is the default if you did not change it.
14:54:46 <tongli> ok, only a few minutes left.
14:54:49 <wxy|> Does it mean that I'm ok now?
14:55:00 <tongli> we need a count for who will be running phase #2
14:55:07 <dmellado> o/
14:55:30 <tongli> @wxy|, no, seems your cockroachdb nodes are dead.
14:56:10 <tongli> everybody, please indicate on the etherpad, who runs phase #2?
14:56:14 <dmellado> you can go in there and do kubectl get pods and check
14:56:17 <dmellado> wxy|:
14:56:18 <wxy|> How can I debug it?
14:56:29 <kongwei> o/
14:56:32 <wxy|> I'm a new guy for k8S. Sorry
14:57:08 <tongli> Anybody else for phase #2?
14:57:13 <dmellado> wxy|: using kubectl, but you could first try even with the dashboard
14:57:14 <daniela_ebert> I plan to run phase #2, but I still have an issue with the latest patches
14:57:30 <dmellado> daniela_ebert: what's going on with those for you?
14:57:38 <GregWaines> Not me .... Wind River / Greg Waines / Brent Rowsell ... will be doing only Phase 1
14:57:56 <dmellado> tongli: I'm assuming that anyone who said 'public cloud' on the etherpad with the participant info
14:57:59 <dmellado> would be running it
14:58:02 <dmellado> even if not around today
14:58:07 <tongli> we need confirmation.
14:58:18 <tongli> the foundation has asked me yesterday.
14:58:41 <tongli> I can go with that if no objections.
14:58:53 <Wei_Liu> tongli, if I do not join phase #2, so I should config own_cluster as true and ignore --skip-tags="apps" in the command, right?
14:59:09 <vkmc> last meeting we said we were going to do the demo today, running the phase 2 so we make sure that with the last code we are all achieving the results we epect
14:59:21 <tongli> @Wei_Liu, please see the conversation earlier.
14:59:54 <vkmc> expect*
14:59:59 <Wei_Liu> yes, I was here all the time.
15:00:12 <dmellado> that's why I left the nodes connected to the DB, basically
15:00:19 <tongli> ok. one minute left.
15:00:29 <daniela_ebert> @dmellado: I am investigating in parallel. may I contact you later?
15:00:35 <tongli> contact me if you still have problems.
15:00:46 <dmellado> daniela_ebert: sure, feel free to reach out to me (and tongli too ;) )
15:00:54 <tongli> #endmeeting