15:00:11 #startmeeting XenAPI
15:00:12 Meeting started Wed Sep 4 15:00:11 2013 UTC and is due to finish in 60 minutes. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 The meeting name has been set to 'xenapi'
15:00:23 hello all
15:00:30 who is around for today's meeting?
15:01:16 I have a few things for the open discussion
15:01:25 Excellent
15:01:28 other people got stuff for later in the meeting?
15:01:38 just the stuff we were talking about with safe_copy_vdi earlier
15:02:12 OK
15:02:21 #topic Actions from last meeting
15:02:38 BobBall: time for updates on VDI.copy workaround
15:02:57 you have a plan that involves taking snapshots?
15:03:25 So - for Mate and the logs - I realised why we use safe_copy_vdi - the get_cached_vdi function can have two independent calls to it - if one call finds the VDI not there, and starts to copy, the second call might see it and then try and copy it at the same time
15:03:43 Can't do two copies of the same VDI at the same time
15:03:50 so we added this hack to do the copy outside of XAPI
15:04:04 Ah, so it's some kind of locking.
15:04:13 but the right fix is to snapshot the VDI, copy it, then delete the snapshot
15:04:26 the copy of the snapshot creates a full copy - not the differencing disk
15:04:36 and you can therefore do them in parallel
15:04:51 well, it would be nice for VDI.copy to do two copies at once, or block somehow, but snapshot seems like a good solution
15:04:52 That makes sense.
15:05:01 that leaves the only other possible race (which I assume is dealt with already) being in the glance plugin, where it downloads it to a known UUID
15:05:08 Bob, is there a bug for that?
15:05:12 VDI.copy can't do two at once - we have to mount the VDI and then it's in use
15:05:23 Gimme an action and I'll raise a bug
15:05:25 glance doesn't download to a known uuid
15:05:38 it's always re-generating uuids.
15:05:41 it doesn't? Then how will the caching ever work?
15:06:04 oh hang on
15:06:06 of course - sorry
15:06:12 it matches the VDI based on image_id, not the uuid
15:06:36 so what we need to do is once we find the image_id then we can use it - snapshot, copy, delete snapshot, done.
15:06:41 yep
15:06:51 uses image tag
15:06:59 uuids are auto-generated on each call
15:07:03 even across retries
15:07:24 BobBall: got the bug id? we might have one already
15:07:59 not yet
15:08:03 I haven't created it
15:08:10 I'll do it after the meeting
15:08:21 https://bugs.launchpad.net/nova/+bug/1215383
15:08:23 Launchpad bug 1215383 in nova "XenAPI: Consider removing safe_copy_vdi" [Low,Triaged]
15:08:24 there we go
15:08:29 #link https://bugs.launchpad.net/nova/+bug/1215383
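
[Editor's note: a minimal sketch of the snapshot-copy-destroy approach described above, using the standard XenAPI Python bindings. The function name, the pre-existing session/VDI/SR references and the omitted error handling are illustrative assumptions, not the actual nova driver code.]

    import XenAPI

    def copy_cached_vdi(session, cached_vdi_ref, sr_ref):
        # Snapshotting is safe while the cached VDI is in use, so two
        # concurrent callers no longer collide on the same VDI.
        snapshot_ref = session.xenapi.VDI.snapshot(cached_vdi_ref, {})
        try:
            # VDI.copy of the snapshot produces a full (non-differencing)
            # copy, so several copies can run in parallel.
            new_vdi_ref = session.xenapi.VDI.copy(snapshot_ref, sr_ref)
        finally:
            # Drop the temporary snapshot once the copy has finished.
            session.xenapi.VDI.destroy(snapshot_ref)
        return new_vdi_ref
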
15:08:51 #topic Blueprints
15:08:52 oh right
15:08:56 I'll update that bug report
15:08:59 it's freeze day
15:09:08 mate, I see your patches
15:09:15 I think everything else is in?
15:09:20 yes
15:09:49 Mine's in
15:09:51 woohoo
15:10:08 There clearly isn't time to make the change to Mate's code before the end of freeze day
15:10:26 Can we get it in and refactor as a bug during H-4?
15:11:21 maybe, can't decide
15:11:28 would like another opinion
15:11:32 Could we involve russelb?
15:11:48 could do, I would rather be more general
15:11:50 russellb even - I assume that russelb doesn't trigger his highlight :D
15:12:02 i have russelb highlighted too actually :-)
15:12:16 oh...
ok :)
15:12:20 whatcha need
15:12:31 So, the scenario is that we've got a change that could do with some refactoring to make it more generic
15:12:48 We don't have time today to do that refactor
15:13:05 so I was wondering what your thoughts were on using a bug to track the refactoring and us committing to fix it during H-4
15:13:10 and getting the fix in today?
15:13:23 which patch
15:13:29 https://review.openstack.org/#/c/40909/
15:13:36 https://review.openstack.org/#/c/41651/
15:13:48 how much time do you need?
15:13:48 russelb: those are the patches
15:13:58 Only a few days
15:14:10 like, could it be merged this week with a refactor?
15:14:21 sure
15:14:55 i'd rather "do it right" in general
15:15:17 so if doing it right means you need an extra day or two, and it's limited to the xenapi driver
15:15:22 and you guys are comfortable with it going in
15:15:32 and johnthetubaguy commits to reviewing it immediately
15:15:47 I am good with +2 if we commit to refactor next week
15:15:49 then i think we can grant an exception for extra time
15:15:55 can it be this week? :-)
15:16:01 oh i see what you're saying
15:16:01 hm
15:16:07 well, it could be called a bug
15:16:12 yeah
15:16:22 basically, it's some extension to the glance download/upload
15:16:22 would you be upset if this went out in havana without a refactor?
15:16:30 I would be OK with that I guess
15:16:41 just using that as a test for whether it should go in now or not
15:16:45 just it's extra driver-specific stuff, where it doesn't have to be I guess
15:16:46 not saying we should do it
15:16:50 ah i see
15:17:02 well if you're OK with it as is, i say go for it
15:17:04 it does targz raw images download/upload
15:17:19 and then we'll evaluate the refactor when ready
15:17:23 hard to call it a bug though
15:17:24 cool, OK
15:17:35 yeah, that's fair
15:17:46 but maybe we'll look at it and decide it's worth it
15:17:49 hard to say
15:17:54 i'm speaking high level, haven't been able to look at code
15:18:07 but if you want it in havana, sounds like you should merge what you have as long as you're OK with it as is
15:18:18 cool
15:18:20 if it was just another day, i'd grant an extension
15:18:45 but if we're looking at next week ... i'd go with what you have
15:18:49 good luck guys :)
15:18:54 thanks.
15:18:57 russellb: thanks
15:19:04 OK, so I am +2 on those now
15:19:10 we should move on
15:19:17 thanks
15:19:19 Okay, thanks.
15:19:29 #topic Docs
15:19:33 any updates this week?
15:19:40 * BobBall looks at matel
15:19:43 I saw some activity bug-wise that could be related?
15:19:44 that was you too :)
15:19:47 So install guide
15:20:05 I was thinking that we should move all xenserver setup stuff to the install guide
15:20:12 +1
15:20:30 So that it would be part of both RH, Ubuntu, etc docs
15:20:39 would be nice to add some specific nova-compute steps too
15:21:20 would be good to not make it look like we just cut and paste some ideas to the front of the doc, they are quite step-by-step structured from what I remember?
15:22:30 Sure.
15:23:53 OK,
15:24:00 so any more on that?
15:24:07 or shall we move on to QA?
15:24:35 Okay.
15:24:47 There were going to be more updates on docs next week - but if Mate has to do the rebasing... ;)
15:25:03 refactoring*
15:25:06 anyway
15:25:10 move on to QA yeah
15:25:20 #topic QA and Bugs
15:25:39 how's the gating stuff going?
15:25:53 Mate will be on holiday for 10 days starting next week Monday
15:25:55 https://bugs.launchpad.net/bugs/1218528 is an interesting one
15:25:57 Launchpad bug 1218528 in nova "openvswitch-nova in XenServer doesn't work with bonding" [Medium,Confirmed]
15:25:57 oh okay - gating
15:26:13 SS is now automatically commenting with the XS bugs
15:26:19 let's come back to that one
15:26:21 after we've parsed out the puppet and build failures
15:26:33 so that's on the road to getting -2 privs for SS
15:26:35 which is great
15:26:41 BUT the infra team are too good
15:26:42 sweet
15:26:52 yeah, yet are very impressive
15:26:52 or, more importantly, a combination of -infra and tempest
15:26:56 they^
15:27:09 now the nodes are running tempest in parallel, test times have almost halved
15:27:20 tempest would be a much better set of tests to run, because devs can run them
15:27:29 making the plan of SS commenting before tempest completes less useful
15:27:39 It's now much more of a race condition than it was before
15:27:56 so I'm not happy for SS but of course faster gate checks are better
15:28:05 in terms of where that leaves us... well... it's an interesting one
15:28:10 well SS still tests XenServer, and that is golden
15:28:24 without that, we will get pulled out of nova
15:28:27 I've also been working with RS cloud to get an XS virtualised - which works now
15:28:35 so I'm now thinking about how to get that into HP cloud
15:28:37 cool, that's a good thing to have
15:28:50 at which point we could run a subset of tempest (I'd rather keep it to a subset than the full thing)
15:29:02 does tempest work in the RAX cloud now?
15:29:05 and have that integrated in -infra fully
15:29:12 KVM + HVM should work right?
15:29:26 can't use RAX because it uses xenstore to pass the IP, which we can't get at because we're running Xen in the HVM, which would intercept the hypercalls
15:29:46 so until RAX uses configdrive we can't do nested tempest there
15:29:55 that should be very very soon
15:30:10 in terms of HP, they use DHCP, so if I can get an instance there we can use it
15:30:21 except it doesn't have the IP address in there for some reason, I need to work on that then
15:30:24 getting an instance there is more fun because HP don't support PXE or iso or image upload...
15:30:30 *grin*
15:30:44 hmm, enjoy
15:30:58 plan is to do something extremely ugly
15:31:00 but quite fun
15:31:20 create centos image - new partition to dump the XS iso on, replace bootloader, kernel, initrd, reboot and pray
15:31:21 what about testing xenserver-core?
15:31:41 hmm, well that could work
15:31:46 xenserver-core has a number of things we're working on fixing ATM. XS is more "certain" than xenserver-core in terms of how confident we are that it'd work
15:32:08 OK, well keep in touch about the RAX issues, we have a few things in the pipe to fix that soon(ish) at least
15:32:29 well I'd quite like to create that hacky swizzle script
15:32:39 that way we can replace anything with a XS with the right IP etc
15:32:41 would be very cool
15:33:12 Although... the next problem would be that infra expect everything to run in the host that you ssh to...
so maybe would need redirection to the domU and some weird setups
15:33:29 yeah, that's the bit I think is worth trying to fix
15:33:59 hmm, so you could use Rescue mode on RAX to hack around the IP issue
15:34:07 I'm hoping it'll be quite simple - but unless we can get the hard bits done I'm not going to look at that
15:34:15 mount the disk and inject the IP address, for the moment
15:34:40 indeed
15:34:47 but I think it would be better to fix that issue of integration with infra stuff, rather than getting XenServer running, that should fix itself in a few days
15:34:55 but that's just my 2 cents
15:34:58 but if I can get the swizzle working I'd use that to install XS on the RS cloud too
15:35:15 the way it works at the moment (hidden images, PXE boot, custom URL...) is faffy too
15:35:22 yeah, for sure
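
[Editor's note: a hedged sketch of the config-drive alternative mentioned above, i.e. letting a nested-XenServer instance read its IP and metadata from a config drive rather than the xenstore agent (which the nested Xen would intercept). The mount point, the use of subprocess and the lack of error handling are illustrative assumptions; only the config-2 volume label and the openstack/latest/meta_data.json path follow the standard config-drive layout.]

    import json
    import subprocess

    MOUNT_POINT = '/mnt/configdrive'

    # Config drives are exposed as a small volume labelled config-2.
    subprocess.check_call(['mkdir', '-p', MOUNT_POINT])
    subprocess.check_call(['mount', '-o', 'ro',
                           '/dev/disk/by-label/config-2', MOUNT_POINT])
    try:
        # Instance metadata (including any configured network info) lives
        # under openstack/latest/ on the drive.
        with open(MOUNT_POINT + '/openstack/latest/meta_data.json') as f:
            print(json.load(f))
    finally:
        subprocess.check_call(['umount', MOUNT_POINT])
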
15:36:05 anyways, bugs you want to mention
15:36:13 would be good to set a goal for H-4 too
15:36:21 https://bugs.launchpad.net/bugs/1218528
15:36:22 Launchpad bug 1218528 in nova "openvswitch-nova in XenServer doesn't work with bonding" [Medium,Confirmed]
15:36:37 yeah, I didn't quite get what was going on there, do tell
15:36:45 some init script running in Dom0??
15:37:13 indeed
15:37:17 to set up the firewall rules
15:37:40 MAC and IP address stuff?
15:37:48 anti-spoof?
15:37:48 https://github.com/openstack/nova/tree/master/plugins/xenserver/networking/etc/init.d
15:37:51 yeah
15:38:28 hmm, that was written for bridge and nova-network right?
15:38:45 oh, maybe not
15:38:46 yes
15:38:48 no?
15:38:50 says OVS
15:39:20 but it is nova-network
15:39:22 I think
15:39:29 yes, of course it is
15:39:30 sure, nova-network flatDHCP with OVS
15:40:04 oh
15:40:09 I've got another fun one
15:40:13 let's talk about it here!
15:40:15 woohoo
15:40:17 TrustedFilter
15:40:22 yeah, that stuff is a mystery to me
15:40:28 looks at nova's interpretation of a host
15:40:35 oh right
15:40:38 each service has a host and a node
15:40:50 with host being set to what the compute reports and node being set to hypervisor_hostname
15:40:56 I think that's plain wrong.
15:41:02 I think it should be the other way round
15:41:15 but changing it probably breaks the world
15:41:39 not sure, that stuff was added for baremetal
15:42:09 host is definitely nova-compute though
15:42:17 bug do go on
15:42:22 you got a link?
15:42:28 but^
15:42:33 just getting it
15:43:07 https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L436
15:43:27 My issue is that "host" should be the hypervisor and "node" should be the compute node
15:43:31 but this gets them the wrong way round
15:43:49 not so sure, the host has always been nova-compute
15:43:49 There's one of two bugs here
15:43:52 either nova has got it wrong
15:44:03 or XenAPI needs to report the DomU name as hypervisor_hostname
15:44:08 which is completely screwy
15:44:33 the node has always been the nova-compute too :)
15:44:38 except for XenAPI
15:44:45 yeah, given the current model, I thought it would be both DomU address
15:44:48 so surely we get to choose which way round it is? ;)
15:44:53 it's not
15:45:00 CONF.host is the name of the nova-compute
15:45:01 service['host'] is the DomU name
15:45:14 'hypervisor_hostname' is the hostname
15:45:20 hmm
15:45:29 it's fine for libvirt
15:45:40 but all other hypervisors have probably got it the other way round
15:45:46 so the service host has to be what the RPC message goes to
15:46:02 ATM yes
15:46:18 the node, in baremetal, is the thing the VM is (or in our case, runs on)
15:46:38 anyway, we are getting distracted
15:46:44 Is this another question that Mr RB would be good to call in on? just to understand which way round it should be?
15:46:47 what is the issue with trusted filter?
15:46:51 The problem is the TrustedFilter uses .host
15:46:54 which is the DomU
15:47:00 right
15:47:02 and .node "feels" wrong
15:47:12 well node is the DomU always
15:47:16 I mean
15:47:19 but if .host really is the compute node and .node really is the host for the VMs then TrustedFilter needs changing
15:47:22 node is what is running nova-compute
15:47:31 not according to that code :D
15:47:37 .node is the machine
15:47:40 oh man
15:47:40 .host is nova-compute
15:47:44 I typed it wrong
15:47:54 yep, host is nova-compute
15:48:09 which is wrong. Almost as wrong as calling VMs servers...
15:48:10 node is the specific hypervisor, or baremetal node
15:48:40 well depends what you are looking at
15:48:40 So TrustedFilter has it the wrong way round
15:48:43 nope
15:48:54 yeah - TrustedFilter _MUST_ check the host
15:48:59 trusted filter is meant to confirm the nova-compute code, I thought?
15:49:05 no
15:49:09 the hypervisor that'll run the VMs
15:49:10 as well as the KVM hypervisor right?
15:49:32 well, in KVM land it would apply to the whole host right?
15:49:38 https://github.com/openstack/nova/blob/master/nova/scheduler/filters/trusted_filter.py#L291 is what I mean
15:49:52 sure - but the thing that can register as trusted is the hypervisor only
15:50:00 even in KVM land there is no trustworthiness of the nova code
15:50:01 that's never checked
15:50:16 only the hypervisor (I think in KVM it's just kernel and qemu even)
15:50:28 OK
15:50:38 so it's what the attestation server is confirming
15:50:56 The attestation service only knows about the hypervisors
15:50:59 well I think the trusted filter should be checking the other value
15:51:08 it has no knowledge at all about the nova-computes
15:51:10 that should work for both right?
15:51:19 who knows :)
15:51:25 maybe
15:51:37 if we're asserting that in KVM land the two are always the same value
15:51:42 yep, I am
15:51:56 I thought that was true for us too, but it makes sense we use the second value
15:52:18 for a cluster you have nova-compute, hypervisor-x
15:52:51 yeah
15:53:04 but bare metal has nova-compute, random-server-for-VM-on-bare-metal-thingy
15:53:09 anyways
15:53:16 running out of time
15:53:19 any other bugs?
15:53:20 yeah
15:53:24 probably
15:53:29 but let's save some of the fun for next week
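
[Editor's note: a small illustration of the host/node split discussed above for the XenAPI driver, and why a filter that attests host_state.host ends up asking about the DomU rather than the hypervisor. The FakeHostState class and the example names are made up for illustration; the real fields are HostState.host and HostState.nodename in nova's scheduler, and this is not a proposed patch.]

    class FakeHostState(object):
        """Stand-in for nova's scheduler HostState, fields only."""
        def __init__(self, host, nodename):
            self.host = host          # CONF.host, i.e. the nova-compute (DomU) name
            self.nodename = nodename  # hypervisor_hostname, i.e. the XenServer host

    state = FakeHostState(host='compute-domu-01', nodename='xenserver-host-01')

    # TrustedFilter today queries the attestation service about state.host,
    # but the attestation service only knows hypervisors, so for XenAPI the
    # query would have to use state.nodename to match anything.
    print('attestation query would need:', state.nodename)
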
15:53:35 What is our goal for H-4?
15:53:52 I would like to see all medium bugs squashed, is this a crazy idea?
15:54:02 Probably - depending on what they are
15:54:08 We want to seriously improve the docs
15:54:15 well, medium as defined by the priority
15:54:27 Yes, docs would be a good thing too
15:54:34 I meant depending on what the specific bugs are
15:54:39 Are you Citrix guys concentrating on docs then?
15:54:52 I am talking about all bugs that impact key features
15:55:02 for the next few weeks yes
15:55:06 but we'll see how that goes
15:55:17 I don't think most of the bugs are as bad as the docs ATM
15:55:23 so that is our priority
15:55:30 OK, that sounds good
15:55:38 I will concentrate on bugs, if you guys are on Docs
15:55:49 Great
15:55:57 I will try to rope in some help from some other Rax guys too
15:56:06 we only have a few weeks for bug fixing
15:56:17 before we get more cautious towards release
15:56:23 One more thing...
15:56:33 oh yeah
15:56:46 testing release candidates, and things
15:56:58 we should really help out with that, make sure H has good XenServer support
15:57:09 I mean some testing beyond the CI
15:57:27 I'd much rather get more stuff tested in the CI
15:57:35 stuff like full tempest (do you run that all the time now?) plus manual testing through the GUI to make sure it all hangs together OK
15:57:39 which is a background thing we've been working on
15:57:51 yes - full tempest is running in the CI and passing
15:57:58 OK that's good
15:58:11 I'm just wondering, could we get an account, and report the results?
15:58:17 I am only talking about half a day playing with the builds manually at RC?
15:58:21 Voting with +1 ?
15:58:25 only once we've got the logs auto-collected matel
15:58:28 matel: it's open to anyone, so go for it
15:58:39 BobBall: you can just use paste.openstack.org for the logs
15:58:48 I know - but we need the CI to collect them
15:58:52 which it doesn't yet
15:58:57 It requires a captcha for big files
15:59:14 hmm, that sucks
15:59:26 besides which - we've got space we can put it - so no need to use paste
15:59:32 we'll see
16:00:41 Can we call time?
16:00:48 yep
16:00:48 and johnthetubaguy
16:00:50 can I call it?
16:00:52 #endmeeting