10:31:43 #startmeeting training_labs
10:31:43 Meeting started Wed Dec 16 10:31:43 2015 UTC and is due to finish in 60 minutes. The chair is rluethi. Information about MeetBot at http://wiki.debian.org/MeetBot.
10:31:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
10:31:47 The meeting name has been set to 'training_labs'
10:32:14 #topic liberty patch
10:32:57 Anybody else?
10:33:27 berndbausch: I think the rest of the team is travelling this week :)
10:33:55 OK great. It's all for me :)
10:34:06 We have a few issues to discuss wrt the liberty patch.
10:34:13 Some minor, some rather major.
10:34:17 Good.
10:34:32 The first big one is the network reorg.
10:34:36 I did some testing as you know, fixed some typos and such.
10:34:43 That's the problem I am running into now.
10:34:53 liberty drops some networks and adds "manual" as a new type of network.
10:35:28 I guess I need to do some reading. I don't know what "manual" is.
10:35:32 and the install-guide now has two versions just for liberty.
10:36:04 basically, it's a network interface reserved for one network but without a specific IP address preassigned to it.
10:36:44 In the install guide there is a "provider" and "self-service" network option.
10:36:51 Is that what you mean?
10:36:55 yup.
10:37:03 OK.
10:37:27 I think the self-service is more relevant, so let's implement that.
10:37:38 I agree.
10:37:44 preferably, we implement both.
10:38:16 I was asking if the environment is flexible enough to provide such options. Yes, I would do that too.
10:38:37 the answer to your question is that currently, it's tricky.
10:38:49 Grin
10:39:06 you can easily have several configurations as long as you only change scripts.ubuntu_cluster or somesuch.
10:39:39 for instance, we could have scripts.ubuntu_cluster_provider.
10:39:50 Yes, and I think that is good enough.
10:40:17 Just tell users that the default is selfservice, and provider can be implemented by doing this and that
10:41:10 Right now I am trying to test provider but hit a problem. Is it the right moment to raise this?
10:41:12 works for me. I am looking at how difficult it is to provide more options for customization.
10:41:18 sure
10:41:39 It's because I am not sure how to change the script so that it works with two nodes instead of three.
10:42:10 which script exactly?
10:42:17 Here is what I see. In scripts.ubuntu_cluster, there is a line "cmd queue config_external_network.sh"
10:42:30 This is where the error occurs
10:42:52 "no network 'external'" (paraphrased)
10:43:19 In fact, the external network is set up in a different line....
10:43:36 cmd queue ubuntu/setup_neutron_network.sh
10:43:43 which is later in the script.
10:43:55 And currently, it sets up the network node.
10:43:59 I think the install-guide moved the creation of networks from the middle to the end of the guide.
10:44:05 it is part of the testing part now.
10:44:09 But my question is, why is this line later.
10:44:50 So, in order for config_external_network to succeed, the setup_neutron_network script must be executed first
10:44:54 the answer is probably that the install-guide used to do it this way.
10:45:02 and currently it is not. Thus error.
10:45:34 OK, I will therefore have to redo that script. I wondered if some of those commands are running in parallel.
10:45:46 yes. roughly, the content of config_external_network would now go into launch_instance.sh
10:45:51 But they can't of course. There are snapshots.
10:46:06 That would be one of the last steps I guess
10:46:50 we need to go through the install-guide systematically and reorder our scripts accordingly.
10:47:39 Yes, that's right. So basically each install guide page is more or less represented by a shell script, and the shell scripts are sequenced in scripts.ubuntu_cluster and so on.
10:47:45 I seem to remember that before the RST conversion, the order was a bit more obvious than it is now.
10:47:53 right.
10:48:19 that is the concept in a nutshell.
10:48:20 There are some jumps in the install guide which make you go somewhere else then return to the same place. Like a subroutine.
10:48:39 I don't like it so much, but that's life.
10:48:42 yes, and the sidebar does not indicate that.
10:49:07 so you have to read the whole page to make sure you didn't miss a "subroutine".
10:49:35 Sometimes I feel lost in the pages.
10:49:49 So we have to recreate that flow in the scripts.something files.
10:50:00 yes.
10:50:20 Since no more testing is possible without reordering the scripts.something, I will give it a try tomorrow.
10:50:35 excellent.
10:51:19 note that the install-guide changed some other things, too, including the private network range.
10:51:37 used to be 192.168..., now 172...
10:52:42 So far I just went through the install guide pages rather mechanically and adapted the corresponding shell scripts or fixed the problems I saw. That is not so difficult; it just requires a certain level of precision.
10:52:46 I suggest we change that after we have dealt with the network refactoring. that way it should be easy to rebase.
10:52:56 ok
10:53:49 usually, the tricky part is detecting and fixing the races. there are so many of them.
10:55:56 I noticed that :)
10:56:21 My system is a good testbed for race conditions. It's rather slow.
10:56:55 Some races seem to turn of only on fast systems. It is good to have a mix.
10:57:08 s/of/up/
10:57:32 Strangely it is a rather powerful server, perhaps VirtualBox is missing something there.
10:57:49 define "slow", then.
10:57:55 ihrachys, I will look into failure of test_model_sync now
10:58:17 akamyshnikova_: I am already looking
10:58:20 The old Open vSwitch race turned up on my system.
It took 4-5 seconds to get OVS going
10:58:22 akamyshnikova: there was a bug for that
10:58:40 meh, sorry folks, bad channel
10:58:46 np
10:59:10 berndbausch: that's odd.
10:59:14 Also, starting a cluster took some 30-50 minutes rather than the 20 it's supposed to be.
10:59:40 including basedisk build?
11:00:18 Yes the whole thing I believe. Excluding the ISO download I *think*. Need to test this again if that interests you.
11:00:57 okay. 20 minutes is pretty fast for building basedisk and cluster.
11:01:33 building the basedisk depends on the internet (and its servers).
11:01:54 building the cluster takes about 10 minutes on a pretty fast machine.
11:02:24 I know there are opportunities to improve speed.
11:02:34 I have a ProLiant ML330 with a four-core Xeon and 20 GB memory
11:02:42 It just never was a big priority.
11:02:56 We could build the nodes in parallel to an extent.
11:02:57 SSD?
11:03:03 normal disk.
11:03:13 single disk, no RAID.
11:03:20 The build is often disk-bound.
11:03:44 that would explain it. It's true, the CPU seemed to be bored most of the time.
11:03:52 We can start building in parallel once we are bored because we ran out of races to fix :).
11:04:13 yes it would provide us a few more races.
11:04:19 there are plenty of dependencies between the nodes.
11:04:41 yes. The one I am running into, for example.
11:05:21 I opened a ticket for the one in cinder_volumes. The sleep 20 is ugly and not good enough.
11:06:21 Do I have to remember that bug? Did I raise it :-$
11:07:20 Nah. We sort of patched it up for Juno (I think), but according to my testing it fails maybe once in 100 tests.
11:07:27 Ah ok.
11:07:44 It shouldn't be too hard to replace all sleeps by a proper test.
11:07:59 testing the condition we are waiting for.
11:08:25 for rare races, it becomes difficult to test the fix.
11:09:06 by provoking somehow perhaps?
11:09:30 sure, if you know how.
11:09:43 :)
11:10:24 can you take a look at the network refactoring?
I realize the patch is too big for a thorough review. Maybe I need to break it into smaller pieces after all. But it's pretty easy to test, and we kinda need those changes for liberty.
11:10:42 Is there a patch already?
11:11:14 Refactoring needs first to restructure the scripts.ubuntu_cluster and friends, right?
11:11:19 https://review.openstack.org/#/c/257063/
11:11:21 That's what I wanted to do anyway
11:11:24 OK good.
11:11:59 Actually, this refactoring only changes how the network configuration gets parsed and applied.
11:12:41 the patch is for kilo, in preparation for liberty.
11:13:01 Good to know!
11:13:51 there is another patch in the queue, but that just improves KVM disk handling (somewhat faster and much less space usage). so that one is not urgent.
11:14:03 looking at it now. It's too late to do much today, I will dive into this tomorrow.
11:14:39 sure, no rush.
11:15:18 feedback/questions welcome.
11:15:49 sure!
11:17:01 So I have two pieces of work, checking the network patch and creating a new scripts.ubuntu for Liberty. I feel satisfied.
11:17:24 :)
11:18:05 and we will go with selfservice for now.
11:18:26 I will look for a way to provide the alternative, too.
11:18:27 agreed
11:19:12 any other issues?
11:19:38 I don't know enough about it yet to have more issues.
11:19:53 So, thanks. Good for now.
11:20:32 okay. thanks for coming, see you next week. (of course, there is always email, too)
11:20:46 Yep, thanks.
11:20:51 #endmeeting
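
Note on the 11:07:44 remark about replacing sleeps with a test of the condition being waited for: a minimal sketch of that idea follows. The function name wait_for and its arguments are hypothetical and not taken from the training-labs scripts; it only shows the general polling pattern, assuming the awaited condition can be expressed as a command that exits 0 once it holds.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of training-labs):
# poll a condition instead of sleeping for a fixed time.
# Usage: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
wait_for() {
    timeout=$1
    shift
    elapsed=0
    # Re-run the command until it succeeds or the timeout expires.
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A fixed "sleep 20" could then become something like "wait_for 60 test -e /path/to/marker", where the path stands in for whatever the script is really waiting on. On a slow machine the loop keeps polling up to the timeout, and on a fast one it returns as soon as the condition holds.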