14:06:51 #startmeeting nova
14:06:53 Meeting started Thu Nov 26 14:06:51 2015 UTC and is due to finish in 60 minutes. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:06:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:06:56 The meeting name has been set to 'nova'
14:06:59 o/
14:06:59 o/
14:07:02 o/
14:07:04 o/
14:07:06 hi
14:07:09 o/
14:07:10 \o
14:07:16 hi
14:07:17 o/
14:07:28 o/
14:07:29 o/
14:07:30 o/
14:07:51 so happy Thanksgiving to everyone who isn't here
14:07:55 let's get started
14:08:12 #link https://wiki.openstack.org/wiki/Meetings/Nova
14:08:18 #topic Release status
14:08:41 #info Mitaka-1 is likely to be tagged on Tuesday, 1st December
14:08:57 #info Spec freeze is on Thursday, 3rd December
14:09:09 so it's a good time to look out for major bugs that would stop us tagging
14:09:43 although better still, make sure everything you care about is well tested in our CI systems
14:09:55 so... specs
14:10:03 we have a few specless BPs in the etherpad
14:10:10 #link https://etherpad.openstack.org/p/mitaka-nova-spec-review-tracking
14:10:51 OK, seems like no objections there
14:11:07 will approve those on Monday, if there are no objections
14:11:20 #topic Regular Reminders
14:11:26 let's focus on these code reviews:
14:11:35 #link https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
14:11:37 #topic Bugs
14:11:43 markus_z: how are we doing?
14:11:54 lots to triage, I believe?
14:12:06 yeah, that's the short version.
14:12:17 I'll do a first skimming Monday morning
14:12:34 a lot of ancient open bugs, I don't yet know how to tackle them
14:12:57 we would need a bug contact for network and db
14:13:16 I guess we just have network, and not nova-network vs neutron?
14:13:24 markus_z: well, bugs that are 2 cycles old seem pretty hard to work on
14:13:35 because of all the changes
14:13:42 we do have the bug expiry idea/thingy?
14:13:50 for Invalid only AFAIK
14:13:51 not sure when that kicks in
14:13:53 ah
14:13:54 bauzas: true
14:14:05 incomplete without assignee for 60 days +
14:14:05 so we could ask for reproducing again
14:14:15 by putting them incomplete
14:14:22 but that sounds a little harsh
14:14:30 honestly, most folks will have moved on by then, but yeah, we could do with something
14:14:37 but we need to be sure that the problem is still present on a supported release
14:14:47 anyways, thinking caps on to solve that
14:15:00 it seems the third-party CI watcher is down
14:15:05 so no news there
14:15:24 like we could gently close the very old bugs and say "if that's still present on master/liberty, please reopen the bug"
14:15:36 yeah
14:15:40 bauzas, +1
14:15:43 bauzas: yeah, I think that's a fair approach
14:15:48 it feels bad, but it's better than a pile we just leave alone
14:15:58 that's my point
14:16:01 #topic stuck reviews
14:16:08 so we have some links added here
14:16:11 we need to get more insight from ops about what really doesn't work
14:16:18 #link https://review.openstack.org/#/c/239798/
14:16:22 but yeah moving on
14:16:26 spec: Encryption support for rbd-backed volumes
14:16:31 I can sum that up if you want me to
14:17:02 in order to encrypt RBD, we have to go via the hypervisor's kernel (just as we do for iSCSI/FC)
14:17:14 but the current non-encrypted rbd case is not touching the hypervisor, as qemu has native rbd support
14:17:16 unfortunately, mriedem should be eating some food now
14:17:40 danpb said that he is working on native qemu encryption expected to hit around Nxxx in qemu trunk
14:18:06 everybody agreed that that's the way to go forward, but it will take 6-12 months for that trunk to trickle down to distros
14:18:20 the core issue is that there are 2 RBD clients - one in QEMU and one in the kernel
14:18:22 the code has been implemented before the liberty release and was shown to make the encrypted rbd test cases work
14:18:35 Nova uses the one in QEMU because it is better in essentially every way
14:18:52 there are a lot of unanswered questions from mriedem in the spec as well
14:19:16 johnthetubaguy, I think the latest spec answers every question that we had in the comments
14:19:26 I really don't want to see Nova silently switching to a completely different RBD storage driver when an encrypted=true flag is set
14:19:51 we could make it something that needs to be explicitly enabled and add the warning to the documentation
14:19:52 it wouldn't be silent
14:19:53 nagyz: ah, OK, I didn't see anything in the comments, my bad
14:19:54 as that is going to impose a support burden on nova maintainers, openstack vendors and cloud administrators alike
14:20:29 and currently the encryption support is just not there for rbd at all; we had to disable the tests in gate even for these scenarios
14:20:49 and we've already seen from the ongoing race conditions & bugs we have dealing with lvm & iscsi that working with the in-kernel clients is a serious maintenance burden
14:20:52 nagyz: agreed, we don't support encryption + rbd
14:21:27 rbd mapping is a lot more simple than iscsi/fc
14:21:30 *simpler
14:21:36 danpb: agreed, that sounds really confusing that we could call 2 different drivers based on a conf flag which is not about that
14:21:38 I fully agree that we need to have encryption for *all* storage drivers, but I really question why it is urgent to solve in Mitaka when the solution has a bad technical architecture
14:21:46 +1
14:21:59 and we have a clear path towards a desired good technical architecture which is viable in the next release
14:22:10 that will work for *all* storage drivers, not merely RBD
14:22:12 so it's better to not have encryption for ceph at all until Oxxx or Pxxx?
14:22:30 as the in-qemu encryption will start hitting distros around that
14:22:30 nagyz: the next release is Nxxx
14:22:34 yes, I'm aware
14:22:53 nagyz: the QEMU support will hit community distros before Nxxx is even released
14:23:05 nagyz: creating technical debt for something which would be changed in 2 cycles sounds really wrong to me
14:23:07 but just because qemu trunk has the code during the Nxxx release, the nova/openstack part won't be there suddenly
14:23:17 nagyz: enterprise distros can choose to backport it if they so desire the feature urgently
14:23:21 bauzas: +1
14:23:47 bauzas, it's a simple extension to how iSCSI/FC currently works... are those technical debt as well then?
14:24:00 nagyz: it is *not* a simple extension of how iSCSI currently works
14:24:07 I agree that having in-qemu down the line makes a lot of sense, no argument there
14:24:17 but it's simply not going to happen for a long while
14:24:18 I like danpb's approach of having a single abstraction for all our storage drivers, so yes
14:24:19 nagyz: you are changing from the in-QEMU RBD to the in-kernel RBD driver, which is a bad plan
14:24:42 for a case that's currently not even supported, and you just get nice exceptions all around as operators...
14:24:45 as we've seen how unstable it is to deal with the in-kernel drivers for iSCSI/FC
14:25:37 so I think it's better to have a reliable feature we can maintain; it seems like there is a route to make that happen during N, and right now we already have more blueprints than we can possibly merge in M, so it seems better to focus on other things for M
14:26:10 * danpb doesn't really have anything more to add - i'm -1 on anything that switches nova to use the in-kernel RBD driver - if other nova-cores feel strongly to approve it regardless, go ahead
14:26:10 just for the record: this won't be ready and supported in N beyond trunk in any distros... I have a beer to bet on that.
14:26:27 * nagyz has nothing more to add as well
14:27:11 so I think we can move on?
14:27:17 nagyz: I guess we are saying it's easier to backport QEMU to get the feature than fix all the bugs the other route has
14:27:20 yeah
14:27:56 so no encryption for 60%+ of the userbase.
14:27:56 roger
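To make the "two RBD clients" point above concrete, the following libvirt disk XML sketch contrasts the two attachment paths being argued over; the monitor hostname, pool/volume names and device paths are invented for illustration only.

```xml
<!-- Illustration only: hostnames, pool/volume names and device paths are invented. -->

<!-- Today's unencrypted path: QEMU's built-in RBD client talks to Ceph
     directly, so no block device ever appears on the hypervisor host. -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='volumes/volume-1234'>
    <host name='ceph-mon.example.org' port='6789'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>

<!-- The spec's encrypted path: the volume is first mapped on the host via
     the in-kernel RBD client, dm-crypt is layered on top, and the resulting
     device-mapper node is handed to QEMU as an ordinary block device. -->
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/mapper/crypt-volume-1234'/>
  <target dev='vdb' bus='virtio'/>
</disk>
```

The native QEMU encryption work danpb mentions would keep the first form and add the encryption layer inside QEMU itself, which is why switching RBD drivers based on an encrypted flag is seen as a dead end.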
14:28:30 #link https://review.openstack.org/#/c/184537/
14:28:38 that's mine
14:28:53 is it nova-manage calling the RPC API?
14:29:10 I think this is where I am a bit worried about the long-term maintenance
14:29:24 didn't see your reply on that till just now
14:29:24 bauzas: yes
14:29:38 I am not fully convinced of changing the nova code
14:29:38 code: Nova-manage command for cleaning volume attachment
14:29:53 adding a new param which is used just by nova-manage
14:29:56 mmm, was there a spec?
14:29:59 that is my only concern
14:30:09 bauzas: it was a very long spec
14:30:27 my concern is also having nova-manage calling the message queue
14:30:40 I thought it was not something accepted
14:30:44 so I see compute/api.py as the facade to trigger all compute actions; its primary user is the REST API code
14:31:06 that I agree with
14:31:11 I am really suggesting a new def force_detach_volume method in compute/api.py
14:31:27 who would be the caller? nova-manage?
14:31:44 bauzas: at the moment, yes, nova-manage
14:32:21 okay, if it has been agreed to use nova-manage, then I'm +1 with johnthetubaguy
14:32:30 because the facade should be the compute API
14:32:33 johnthetubaguy: ok, I can implement a new method. It would be good to have the other 2 cores agree with that idea
14:32:45 but they are not around.
14:33:05 andrearosa: yep, they are only in the same timezone for the late meeting I fear
14:33:11 to gain a +2 I don't want to lose 2 +2s :-p
14:33:12 andrearosa, what happened to making detach just do it?
14:33:54 ndipanov: do you mean by the original code?
14:34:22 IIRC the trade-off was to not touch and change the nova code path too much
14:34:33 okay...
14:34:48 as this feature is a kind of operators-only tool
14:34:56 nova-manage is a compromise I guess
14:35:05 as the alternative at the moment is manually amending the DB
14:35:11 ndipanov: yeap
14:35:13 yeah, I mean we might end up wanting to do both, I guess
14:35:17 why not make nova-manage do that
14:35:24 it does it in a ton of other places
14:35:25 but this seems like a good force operation, anyways
14:35:38 OK, we should move on to the next bits
14:35:45 #topic Open Discussion
14:35:52 so we have a few items here
14:35:52 ok, I'll submit a new patch, thanks
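For readers following along, here is a rough sketch of the facade being suggested above: a hypothetical force_detach_volume method on nova.compute.api.API that nova-manage could call instead of talking to the message queue directly. The helper names (self.volume_api, self.compute_rpcapi, objects.BlockDeviceMapping) follow the Mitaka-era layout, but the exact signature and cleanup ordering here are assumptions, not the patch under review.

```python
# Hypothetical sketch only - not the patch under review. It illustrates the
# idea that nova-manage should call a compute API facade rather than the
# RPC API directly; helper names follow the Mitaka-era nova tree but the
# details are assumptions.
from nova import objects


class API(object):  # excerpt of nova.compute.api.API

    def force_detach_volume(self, context, instance, volume):
        """Clean up a volume attachment regardless of instance state.

        Intended as an operators-only escape hatch (driven by nova-manage)
        for attachments left behind by a failed detach, instead of editing
        the database by hand.
        """
        # Deliberately skip the usual @check_instance_lock /
        # @check_instance_state guards that detach_volume() applies.
        bdm = objects.BlockDeviceMapping.get_by_volume_id(
            context, volume['id'], instance.uuid)

        # Tell Cinder the attachment is gone so the volume becomes usable.
        self.volume_api.detach(context, volume['id'])

        # Ask the compute host to tear down the connection, then drop the
        # block device mapping so the instance no longer references it.
        self.compute_rpcapi.detach_volume(context, instance=instance,
                                          volume_id=volume['id'])
        bdm.destroy()
```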
14:36:09 #link https://review.openstack.org/#/c/247372/
14:36:19 Copy encryptors from Nova to os-brick
14:36:35 Hi, I put this one up
14:36:46 i'm not seeing any nova spec for this yet unless i missed it
14:37:06 this is more a discussion about what goes into os-brick, as I understand it
14:37:34 The short story being that cinder is going ahead and copying the encryptor code into brick, since we need it to create encrypted volumes from images and to retype to encrypted volumes
14:38:02 It seems an obvious source of bugs to have 2 copies of the same code that can get out of sync
14:38:06 ah, so you don't need decrypt?
14:38:09 DuncanT: agreed
14:38:25 my concern is that it is not clear exactly what code is being moved
14:38:33 There's ongoing discussion about whether we need decrypt
14:38:45 But encrypt is the priority
14:38:52 and whether it impacts on the nova volume setup code in virt drivers
14:39:29 it seems to match the use case of everything else in os-brick, unless I am missing something massive?
14:39:31 we don't want to hand setup of encrypted volumes off to os-brick, because we want the flexibility to choose how we deal with encrypted volumes
14:39:32 danpb: Would you like a spec? A patch? We can work with process if somebody will tell us what that process is
14:40:07 e.g. as discussed earlier, our long-term goal is to not use dm-crypt at all and just configure encryption in QEMU
14:40:09 danpb: Brick won't be taking over, just acting as a repository of the helper methods so that they work the same in nova and cinder
14:40:20 so I am not understanding the question here?
14:40:54 is the question: if we move the code to os-brick, will nova delete the old code and use the code in os-brick?
14:41:00 danpb: I don't see that this affects your long-term plans, just stop calling the code in brick (unless you want to test back-compat I guess, up to you)
14:41:22 johnthetubaguy: Yes. That's the question - is nova interested in patches to use the brick code
14:41:24 ok, that sounds ok - it just wasn't clear from the description provided
14:41:40 yeah, it seems to make total sense to me
14:41:54 though I could suggest we just create a standalone python dm-crypt library
14:41:54 needs to be shared, let's share it through os-brick
14:42:03 rather than just stuffing it into os-brick
14:42:38 danpb: Brick exists as a place to stuff block-related code. It isn't very big, and having yet another library is a management pain IMO
14:43:02 it does seem easier to put it in there, we could split it out later if that turns out to be a bad idea
14:43:16 but I don't really know that code too well
14:43:40 so feels like we can create a blueprint to track that work, a specless one?
14:44:14 Sure, blueprint in the next few days
14:44:15 the details are really down to what os-brick does I feel, which should be discussed in the os-brick context
14:44:47 DuncanT: cool, ping me when that's up, and I can do the needful, although we do have a blueprint freeze a week today
14:45:01 We'll try to make the initial move line-for-line exact, and I'll add some nova folks to the reviews - please call us out on any changes initially
14:45:12 I'll go write a blueprint now then :-)
14:45:30 DuncanT: sounds good, thanks for reaching out
14:45:41 DuncanT: nb, one other caveat is that os-brick will need to be careful about not changing the API once it owns it
14:46:15 so as to avoid us having to do lock-step updates of nova & cinder & os-brick for API-incompatible changes
14:46:16 danpb: Acknowledged and understood.
14:46:19 so there is a performance team note about issues with n-api-meta causing high CPU load
14:46:50 it's referring to large ops tests
14:47:04 I think there is an action for mriedem, so will leave that on the agenda for next time
14:47:18 yeah, doesn't seem like there's really anything to discuss for that
14:47:31 #link https://bugs.launchpad.net/nova/+bug/1515768
14:47:31 Launchpad bug 1515768 in tacker "Instance creation fails with libvirtError: Unable to create tap device: Device or resource busy" [Undecided,New]
14:47:34 there is a new bug someone has linked here
14:47:40 hi: i put this one up
14:48:43 there is a duplicate set of ports in the nova cache for the instance when it is trying to plug 3 vnics to the instance
14:49:05 sripriya: what would you like to discuss? is this requesting help to debug the problem?
14:49:21 johnthetubaguy: yes
14:49:38 looks like this is one of those things that causes the intermittent generic ssh failure in the tests
14:49:47 but it's nice we have a specific failure mode
14:49:49 not sure
14:50:01 sripriya: have you created an ER query for this?
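Since the question comes up straight after this, a quick aside on what an ER (elastic-recheck) query is: a small YAML file, one per Launchpad bug, holding an Elasticsearch query that is matched against gate logs to count how often a failure signature is hit. A hypothetical query for this bug might look like the sketch below; the message and tag strings are illustrative guesses, not signatures confirmed from the actual logs.

```yaml
# Hypothetical elastic-recheck query for bug 1515768; it would live in the
# elastic-recheck repo as queries/1515768.yaml. The message/tag values are
# illustrative, not confirmed log signatures.
query: >
  message:"Unable to create tap device" AND
  message:"Device or resource busy" AND
  tags:"screen-n-cpu.txt"
```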
14:50:08 it's a bit different from the setup we have in the gate AFAIK
14:50:09 johnthetubaguy: it fails 1 out of 3 times
14:50:26 johnthetubaguy: sorry, what's an ER query?
14:50:27 not sure it's a gate failure
14:50:32 checking now
14:50:36 sripriya: so the bug claims it's a gate failure
14:50:37 yeah, this just sounds like a nova network cache problem
14:50:44 bauzas: yes, happens one out of 3 times
14:50:51 on your env
14:50:55 it's corrupting itself and getting duplicate nic entries
14:51:01 here, I'm wondering if the CI is impacted
14:51:17 so the ER query is basically how we track gate bugs, to see how often we hit them
14:51:20 danpb: yeah, that's my understanding
14:51:20 johnthetubaguy: yes, reproducible locally too, in fact on a physical system, fails 1 out of 3 times
14:51:21 i've seen a very similar bug reported downstream in RHEL openstack, but i could have sworn we identified it as already fixed upstream
14:51:43 OK, let's catch up on that in #openstack-nova after the meeting
14:51:46 oh man, launchpad why are you trampling the \n ?
14:51:52 agreed
14:52:02 johnthetubaguy: thank you
14:52:16 https://review.openstack.org/#/c/248780 FK between BandwidthUsageCache.uuid and Instance, how do we add this safely?
14:52:58 lxsli: I think this is your question?
14:53:24 he's coming
14:53:27 here
14:53:43 mriedem suggested I add this to the meeting
14:54:14 my patch cleans up orphaned data now and turns off FK constraints for MySQL, do we need to add other DB-specific protections or do we think it's safe now?
14:54:41 so the problem we have here is we have not test to check "online"-ness of a migration yet
14:54:56 have no test to^
14:55:16 which kinda sucks, but I think that's the best we can do right now really
14:56:20 I'm all good to continue then? No arguments about adding an FK?
14:56:34 seems like a sane thing to add, from where I am say
14:56:38 am sat
14:56:54 great, I'm happy then
14:57:01 I know we started off with zero FKs in the DB, but I think that was a premature optimisation really
14:57:10 well, the ML covered it quite well I think
14:57:20 yeah, Mike was very helpful
14:57:41 cool
14:57:45 any more for any more?
14:57:48 have a question about spec https://review.openstack.org/#/c/180049/ , the main purpose is to add a new nova-manage cmd to list the compute node metrics stored in the nova DB
14:58:16 oh, yes, fire away
14:58:23 someone said it could go specless, should I do that or keep waiting for spec approval?
14:59:25 it already got one +2 and several +1s, don't know which way is better
14:59:41 llu-laptop: we have a spec for it now, I think this needs a spec really, just to capture all the conversation we had about that
14:59:53 OK, so the clock says we are done
15:00:00 thanks all
15:00:01 #endmeeting