14:06:51 <johnthetubaguy> #startmeeting nova
14:06:53 <openstack> Meeting started Thu Nov 26 14:06:51 2015 UTC and is due to finish in 60 minutes.  The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:06:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:06:56 <openstack> The meeting name has been set to 'nova'
14:06:59 <markus_z> o/
14:06:59 <gcb> o/
14:07:02 <nagyz> o/
14:07:04 <dguitarbite> o/
14:07:06 <kaisers> hi
14:07:09 <PaulMurray> o/
14:07:10 <bauzas> \o
14:07:16 <andrearosa> hi
14:07:17 <alex_xu> o/
14:07:28 <gibi> o/
14:07:29 <llu> o/
14:07:30 <jichen> o/
14:07:51 <johnthetubaguy> so happy thanksgiving to everyone who isn't here
14:07:55 <johnthetubaguy> lets get started
14:08:12 <johnthetubaguy> #link https://wiki.openstack.org/wiki/Meetings/Nova
14:08:18 <johnthetubaguy> #topic Release status
14:08:41 <johnthetubaguy> #info Mitaka-1 is likely to be tagged on Tuesday, 1st December
14:08:57 <johnthetubaguy> #info Spec freeze is on Thursday, 3rd December
14:09:09 <johnthetubaguy> so it's a good time to look out for major bugs that would stop us tagging
14:09:43 <johnthetubaguy> although better still, make sure everything you care about is well tested in our CI systems
14:09:55 <johnthetubaguy> so... specs
14:10:03 <johnthetubaguy> we have a few specless BPs in the etherpad
14:10:10 <johnthetubaguy> #link https://etherpad.openstack.org/p/mitaka-nova-spec-review-tracking
14:10:51 <johnthetubaguy> Ok, seems like no objections there
14:11:07 <johnthetubaguy> will approve those on Monday, if there are no objections
14:11:20 <johnthetubaguy> #topic Regular Reminders
14:11:26 <johnthetubaguy> lets focus on these code reviews:
14:11:35 <johnthetubaguy> #link https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
14:11:37 <johnthetubaguy> #topic Bugs
14:11:43 <johnthetubaguy> markus_z: how are we doing?
14:11:54 <johnthetubaguy> lots to triage, I believe?
14:12:06 <markus_z> yeah, that's the short version.
14:12:17 <markus_z> I'll do a first skimming Monday morning
14:12:34 <markus_z> A lot of ancient open bugs, I don't yet know how to tackle them
14:12:57 <markus_z> we would need a bug contact for network and db
14:13:16 <johnthetubaguy> I guess we just have network, and not nova-network vs neutron?
14:13:24 <bauzas> markus_z: well, bugs that are 2 cycles old seem pretty hard to work on
14:13:35 <bauzas> because of all the changes
14:13:42 <johnthetubaguy> we do have the bug expiry idea/thingy?
14:13:50 <bauzas> for Invalid only AFAIK
14:13:51 <johnthetubaguy> not sure when that kicks in
14:13:53 <johnthetubaguy> ah
14:13:54 <markus_z> bauzas: true
14:14:05 <markus_z> incomplete without assignee for 60 days +
14:14:05 <bauzas> so we could ask for reproducing again
14:14:15 <bauzas> by putting them incomplete
14:14:22 <bauzas> but that sounds a little harsh
14:14:30 <johnthetubaguy> honestly, most folks will have moved on by then, but yeah, we could do with something
14:14:37 <bauzas> but we need to be sure that the problem is still present on a supported release
14:14:47 <johnthetubaguy> anyways, thinking caps to solve that
14:15:00 <johnthetubaguy> it seems the third party CI watcher is down
14:15:05 <johnthetubaguy> so no news there
14:15:24 <bauzas> like we could gently close the very old bugs and say "if that's still present on master/liberty, please reopen the bug"
14:15:36 <johnthetubaguy> yeah
14:15:40 <PaulMurray> bauzas, +1
14:15:43 <markus_z> bauzas: yeah, I think that's a fair approach
14:15:48 <johnthetubaguy> it feels bad, but it's better than a pile we just leave alone
14:15:58 <bauzas> that's my point
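For reference, a rough sketch of how that sweep could be scripted with launchpadlib — the cutoff, the message text, and the assumption that the nova project has Launchpad's bug-expiry feature enabled are all illustrative, not settled policy:

```python
# Illustrative sketch only: flip ancient, unassigned bugs to Incomplete so
# Launchpad's 60-days-without-assignee expiry (mentioned above) kicks in.
from datetime import datetime, timedelta

from launchpadlib.launchpad import Launchpad

lp = Launchpad.login_with('nova-bug-sweep', 'production')
nova = lp.projects['nova']
cutoff = datetime.utcnow() - timedelta(days=365)  # ~2 cycles, an assumption

for task in nova.searchTasks(status=['New', 'Confirmed', 'Triaged']):
    # Launchpad returns tz-aware datetimes; strip tzinfo for the comparison.
    if task.date_created.replace(tzinfo=None) < cutoff and not task.assignee:
        task.status = 'Incomplete'
        task.lp_save()
        task.bug.newMessage(
            content="If this is still present on master/liberty, "
                    "please reopen the bug.")
```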
14:16:01 <johnthetubaguy> #topic stuck reviews
14:16:08 <johnthetubaguy> so we have some links added here
14:16:11 <bauzas> we need to get more insight from ops about what really doesn't work
14:16:18 <johnthetubaguy> #link https://review.openstack.org/#/c/239798/
14:16:22 <bauzas> but yeah moving on
14:16:26 <johnthetubaguy> spec: Encryption support for rbd-backed volumes
14:16:31 <nagyz> I can sum that up if you want me to
14:17:02 <nagyz> in order to encrypt RBD, we have to go via the hypervisor's kernel (just as we do for iSCSI/FC)
14:17:14 <nagyz> but the current non-encrypted rbd case is not touching the hypervisor, as qemu has native rbd support
14:17:16 <bauzas> unfortunately, mriedem should be eating some food now
14:17:40 <nagyz> danpb said that he is working on native qemu encryption expected to hit around Nxxx in qemu trunk
14:18:06 <nagyz> everybody agreed that that's the way to go forward, but it will take 6-12 months for that trunk to trickle down to distros
14:18:20 <danpb> the core issue is that there are 2 RBD clients - one in QEMU and one in the kernel
14:18:22 <nagyz> the code has been implemented before the liberty release and was shown to make the encrypted rbd test cases work
14:18:35 <danpb> Nova uses the one in QEMU because it is better in essentially every way
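For context, a rough sketch of the two client paths danpb describes — function names and return shapes are assumptions for illustration, not code from the patch under review:

```python
# Illustrative only: QEMU's built-in librbd client needs no host-side block
# device, while the in-kernel client must "rbd map" the image first.
import subprocess


def attach_qemu_native(image, monitors):
    # in-QEMU path: nothing is set up on the host; the disk is described to
    # libvirt as <disk type="network"><source protocol="rbd" .../></disk>
    return {'type': 'network', 'protocol': 'rbd',
            'name': image, 'hosts': monitors}


def attach_in_kernel(pool, image):
    # in-kernel path: map the image through the host kernel's RBD client,
    # yielding a /dev/rbdN device that dm-crypt (and hence the existing
    # volume encryptors) can wrap; assumes a Ceph where "rbd map" prints
    # the mapped device path
    dev = subprocess.check_output(['rbd', 'map', '%s/%s' % (pool, image)])
    return {'type': 'block', 'device': dev.strip()}
```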
14:18:52 <johnthetubaguy> there are a lot of unanswered questions from mriedem in the spec as well
14:19:16 <nagyz> johnthetubaguy, I think the latest spec answers every question that we had in the comments
14:19:26 <danpb> I really don't want to see Nova silently switching to a completely different RBD storage driver when an encrypted=true flag is set
14:19:51 <nagyz> we could make it something that needs to be explicitly enabled and add the warning to the documentation
14:19:52 <nagyz> it wouldn't be silent
14:19:53 <johnthetubaguy> nagyz: ah, OK, I didn't see anything in the comments, my bad
14:19:54 <danpb> as that is going to impose a support burden on nova maintainers, openstack vendors and cloud administrators alike
14:20:29 <nagyz> and currently the encryption support is just not there for rbd at all. we had to disable the tests in gate even for these scenarios
14:20:49 <danpb> and we've already seen from the ongoing race conditions & bugs we have dealing with lvm & iscsi that working with the in-kernel clients is a serious maintenance burden
14:20:52 <johnthetubaguy> nagyz: agreed, we don't support encryption + rbd
14:21:27 <nagyz> rbd mapping is a lot simpler than iscsi/fc
14:21:36 <bauzas> danpb: agreed, it sounds really confusing that we could call 2 different drivers based on a conf flag which is not about driver selection
14:21:38 <danpb> I fully agree that we need to have encryption for *all* storage drivers, but I really question why it is urgent to solve in Mitaka when the solution  has a bad technical architecture
14:21:46 <bauzas> +1
14:21:59 <danpb> and we have a clear path towards a desired good technical architecture which is viable in the next release
14:22:10 <danpb> that will work for *all* storage drivers, not merely RBD
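Rendered as code, the objection reads roughly like this — LibvirtNetVolumeDriver is Nova's existing in-QEMU network-disk driver, while the kernel-side class and the dispatch function are invented here for illustration:

```python
# Hypothetical sketch of the concern: the same volume type silently gets a
# completely different client stack once encryption is requested.
class LibvirtNetVolumeDriver(object):        # existing in-QEMU librbd path
    pass


class LibvirtKernelRBDVolumeDriver(object):  # invented name: "rbd map" path
    pass


def pick_rbd_driver(connection_info):
    if connection_info['data'].get('encrypted'):
        # encrypted=true would route through the host kernel so dm-crypt
        # has a block device to wrap -- the switch danpb calls out
        return LibvirtKernelRBDVolumeDriver()
    return LibvirtNetVolumeDriver()
```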
14:22:12 <nagyz> so it's better to not have encryption for ceph at all until Oxxx or Pxxxx?
14:22:30 <nagyz> as the in-qemu encryption will start hitting distros around that
14:22:30 <danpb> nagyz:  the next release is Nxxx
14:22:34 <nagyz> yes, I'm aware
14:22:53 <danpb> nagyz: the QEMU support will hit community distros before Nxxx is even released
14:23:05 <bauzas> nagyz: creating technical debt for something which would be changed in 2 cycles sounds really wrong to me
14:23:07 <nagyz> but just because qemu trunk has the code during the Nxxx release doesn't mean the nova/openstack part will suddenly be there
14:23:17 <danpb> nagyz: enterprise distros can choose to backport it if they so desire the feature urgently
14:23:21 <alex_xu> bauzas: +1
14:23:47 <nagyz> bauzas, it's a simple extension to how currently iSCSI/FC works... are those technical debts as well then?
14:24:00 <danpb> nagyz: it is *not* a simple extension of how iSCSI currently works
14:24:07 <nagyz> I agree that having in-qemu down the line makes a lot of sense, no argument there
14:24:17 <nagyz> but it's simply not going to happen for a long while
14:24:18 <bauzas> I like danpb's approach of having a single abstraction for all our storage drivers, so yes
14:24:19 <danpb> nagyz: you are changing from the in-QEMU RBD to the in-kernel RBD driver which is a bad plan
14:24:42 <nagyz> for a case that's currently not even supported, where operators just get nice exceptions all around...
14:24:45 <danpb> as we've seen how unstable it is to deal with the in-kernel drivers for iSCSI/FC
14:25:37 <johnthetubaguy> so I think it's better to have a reliable feature we can maintain; it seems like there is a route to make that happen during N, and right now we already have more blueprints than we can possibly merge in M, so it seems better to focus on other things for M
14:26:10 * danpb doesn't really have anything more to add - i'm -1 on anything that switches nova to use in-kernel RBD driver  - if other nova-cores feel strongly to approve it regardless go ahead
14:26:10 <nagyz> just for the record: this won't be ready and supported in N in any distros besides trunk... I have a beer to bet on that.
14:26:27 * nagyz has nothing more to add as well
14:27:11 <bauzas> so I think we can move on ?
14:27:17 <johnthetubaguy> nagyz: I guess we are saying its easier to backport QEMU to get the feature than fix all the bugs the other route has
14:27:20 <johnthetubaguy> yeah
14:27:56 <nagyz> so no encryption for 60%+ userbase.
14:27:56 <nagyz> roger
14:28:30 <johnthetubaguy> #link https://review.openstack.org/#/c/184537/
14:28:38 <andrearosa> that's mine
14:28:53 <bauzas> is it nova-manage calling the RPC API ?
14:29:10 <johnthetubaguy> I think this is where I am a bit worried about the long-term maintenance
14:29:24 <johnthetubaguy> didn't see your reply on that till just now
14:29:24 <andrearosa> bauzas: yes
14:29:38 <andrearosa> I am not fully convinced of changing the nova code
14:29:38 <johnthetubaguy> code: Nova-manage command for cleaning volume attachment
14:29:53 <andrearosa> adding a new param which is used just by nova-manage
14:29:56 <bauzas> mmm, was there a spec ?
14:29:59 <andrearosa> that is my only concern
14:30:09 <andrearosa> bauzas: it was a very long spec
14:30:27 <bauzas> my concern is also having nova-manage calling the message queue
14:30:40 <bauzas> I thought it was not something accepted
14:30:44 <johnthetubaguy> so compute/api.py I see as the facade to trigger all compute actions; its primary user is the REST API code
14:31:06 <bauzas> that I agree
14:31:11 <johnthetubaguy> I am really suggesting a new def force_detach_volume method in compute/api.py
14:31:27 <bauzas> who would be the caller ? nova-manage?
14:31:44 <andrearosa> bauzas: at the moment yes nova-manage
14:32:21 <bauzas> okay, if it has been agreed to use nova-manage, then I'm +1 with johnthetubaguy
14:32:30 <bauzas> because the facade should be the compute API
14:32:33 <andrearosa> johnthetubaguy: ok I can implement a new method. It would be good to have the other 2 cores agree with that idea
14:32:45 <andrearosa> but they are not around.
14:33:05 <johnthetubaguy> andrearosa: yep, they are only in the same timezone for the late meeting I fear
14:33:11 <andrearosa> to gain a +2 I don't want to lose 2 +2 :-p
14:33:12 <ndipanov> andrearosa, what happened to making detach just do it
14:33:54 <andrearosa> ndipanov: do you mean by the original code?
14:34:22 <andrearosa> IIRC the trade-off was to not touch and change the nova code path too much
14:34:33 <ndipanov> okay...
14:34:48 <andrearosa> as this feature is a kind of operators-only tool
14:34:56 <ndipanov> nova manage is a compromise I guess
14:35:05 <andrearosa> as the alternative at the moment is manually amending the DB
14:35:11 <andrearosa> ndipanov: yeap
14:35:13 <johnthetubaguy> yeah, I mean we might end up wanting to do both, I guess
14:35:17 <ndipanov> why not make nova-manage do that
14:35:24 <ndipanov> it does it in a ton of other places
14:35:25 <johnthetubaguy> but this seems like a good force operation, anyways
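A minimal sketch of the shape being agreed here — force_detach_volume is the method name johnthetubaguy suggested, but the signatures and helper calls are assumptions, not taken from the posted patch:

```python
# nova/compute/api.py -- compute.api.API stays the facade that triggers all
# compute actions, so nova-manage calls it rather than owning RPC plumbing.
from nova.compute import api as compute_api
from nova import context
from nova import objects


class API(object):
    def force_detach_volume(self, ctxt, instance, volume_id):
        # skip the state checks a stuck attachment can never pass, then
        # cast to the compute host like a normal detach would
        self.compute_rpcapi.detach_volume(ctxt, instance=instance,
                                          volume_id=volume_id)


# nova/cmd/manage.py -- the operators-only entry point discussed above
class VolumeAttachmentCommands(object):
    def clean(self, instance_uuid, volume_id):
        ctxt = context.get_admin_context()
        instance = objects.Instance.get_by_uuid(ctxt, instance_uuid)
        compute_api.API().force_detach_volume(ctxt, instance, volume_id)
```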
14:35:38 <johnthetubaguy> OK, we should move on to the next bits
14:35:45 <johnthetubaguy> #topic Open Discussion
14:35:52 <johnthetubaguy> so we have a few items here
14:35:52 <andrearosa> ok I'll submit a new patch, thanks
14:36:09 <johnthetubaguy> #link https://review.openstack.org/#/c/247372/
14:36:19 <johnthetubaguy> Copy encryptors from Nova to os-brick
14:36:35 <DuncanT> Hi, I put this one up
14:36:46 <danpb> i'm not seeing any nova spec for this yet unless i missed it
14:37:06 <johnthetubaguy> this is more a discussion about what goes into os-brick, as I understand it
14:37:34 <DuncanT> The short story being that cinder is going ahead and copying the encryptor code into brick, since we need it to create encrypted volumes from images and to retype to encrypted volumes
14:38:02 <DuncanT> It seems an obvious source of bugs to have 2 copies of the same code that can get out-of-sync
14:38:06 <johnthetubaguy> ah, so you don't need decrypt?
14:38:09 <johnthetubaguy> DuncanT: agreed
14:38:25 <danpb> my concern is that it is not clear exactly what code is being moved
14:38:33 <DuncanT> There's ongoing discussion about whether we need decrypt
14:38:45 <DuncanT> But encrypt is the priority
14:38:52 <danpb> and whether it impacts on the nova volume setup code in virt drivers
14:39:29 <johnthetubaguy> it seems to match the use case of everything else in os-brick, unless I am missing something massive?
14:39:31 <danpb> we don't want to hand setup of encrypted volumes off to os-brick, because we want the flexibility to choose how we deal with encrypted volumes
14:39:32 <DuncanT> danpb: Would you like a spec? A patch? We can work with process if somebody will tell us what that process is
14:40:07 <danpb> eg as discussed earlier, our long term goal is to not use dm-crypt at all and just configure encryption in QEMU
14:40:09 <DuncanT> danpb: Brick won't be taking over, just acting as a repository of the helper methods so that they work the same in nova and cinder
14:40:20 <johnthetubaguy> so I am not understanding the question here?
14:40:54 <johnthetubaguy> is the question, if we move the code to os-brick, will nova delete the old code and use the code in os-brick?
14:41:00 <DuncanT> danpb: I don't see that this affects your long term plans, just stop calling the code in brick (unless you want to test back-compat I guess, up to you)
14:41:22 <DuncanT> johnthetubaguy: Yes. That's the question - is nova interested in patches to use the brick code
14:41:24 <danpb> ok, that sounds ok - it just wasn't clear from the description provided
14:41:40 <johnthetubaguy> yeah, it seems to make total sense to me
14:41:54 <danpb> though I would suggest we could just create a standalone python dm-crypt library
14:41:54 <johnthetubaguy> needs to be shared, lets share it through os-brick
14:42:03 <danpb> rather than just stuffing it into os-brick
14:42:38 <DuncanT> danpb: Brick exists as a place to stuff block related code. It isn't very big, and having yet another library is a management pain IMO
14:43:02 <johnthetubaguy> it does seem easier to put it in there, we could split it out later if that turns out to be a bad idea
14:43:16 <johnthetubaguy> but I don't really know that code too well
14:43:40 <johnthetubaguy> so feels like we can create a blueprint to track that work, a specless one?
14:44:14 <DuncanT> Sure, blueprint in the next few days
14:44:15 <johnthetubaguy> the details are really down to what os-brick does I feel, which should be discussed in the os-brick context
14:44:47 <johnthetubaguy> DuncanT: cool, ping me when thats up, and I can do the needful, although we do have a blueprint freeze a week today
14:45:01 <DuncanT> We'll try to make the initial move line-for-line exact, and I'll add some nova folks to the reviews - please call us out on any changes initially
14:45:12 <DuncanT> I'll go write a blueprint now then :-)
14:45:30 <johnthetubaguy> DuncanT: sounds good, thanks for reaching out
14:45:41 <danpb> DuncanT: nb, one other caveat is that os-brick will need to be careful about not changing API once it owns it
14:46:15 <danpb> so as to avoid us having to do lock-step updates of nova & cinder & os-brick for api incompatible changes
14:46:16 <DuncanT> danpb: Acknowledged and understood.
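A hypothetical before/after of what the Nova side could look like once the copy lands — the os_brick.encryptors module name and factory signature are assumptions about the pending os-brick patches, not an existing API:

```python
# today: Nova's private copy
# from nova.volume import encryptors

# after the move: Nova and Cinder import the one shared implementation
from os_brick import encryptors  # assumed module name


def attach_encrypted(connection_info, keymgr, root_helper):
    encryptor = encryptors.get_volume_encryptor(
        root_helper=root_helper,
        connection_info=connection_info,
        keymgr=keymgr)
    # per danpb's caveat, this call signature becomes a public contract:
    # changing it would force lock-step nova/cinder/os-brick updates
    encryptor.attach_volume(context=None)
```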
14:46:19 <johnthetubaguy> so there is a performance team note about issues with n-api-meta causing high CPU load
14:46:50 <johnthetubaguy> it's referring to the large ops tests
14:47:04 <johnthetubaguy> I think there is an action for mriedem, so will leave that on the agenda for next time
14:47:18 <danpb> yeah doesn't seem like there's really anything to discuss for that
14:47:31 <johnthetubaguy> #link https://bugs.launchpad.net/nova/+bug/1515768
14:47:31 <openstack> Launchpad bug 1515768 in tacker "Instance creation fails with libvirtError: Unable to create tap device: Device or resource busy " [Undecided,New]
14:47:34 <johnthetubaguy> there is a new bug someone has linked here
14:47:40 <sripriya> hi: i put this one up
14:48:43 <sripriya> there is a duplicate set of ports in the nova cache when it is trying to plug 3 vnics to the instance
14:49:05 <johnthetubaguy> sripriya: what would you like to discuss? is this requesting help to debug the problem?
14:49:21 <sripriya> johnthetubaguy: yes
14:49:38 <johnthetubaguy> looks like this is one of those things that causes the intermittent generic ssh failure in the tests
14:49:47 <johnthetubaguy> but its nice we have a specific failure mode
14:49:49 <bauzas> not sure
14:50:01 <johnthetubaguy> sripriya: have you created an ER query for this?
14:50:08 <bauzas> it's a bit different from the setup we have in the gate AFAIK
14:50:09 <sripriya> johnthetubaguy: it fails 1 out of 3 times
14:50:26 <sripriya> johnthetubaguy: sorry what's ER query?
14:50:27 <bauzas> not sure it's a gate failure
14:50:32 <bauzas> checking now
14:50:36 <johnthetubaguy> sripriya: so the bug claims its a gate failure
14:50:37 <danpb> yeah this just sounds like a nova network cache problem
14:50:44 <sripriya> bauzas: yes, happens one out of 3 times
14:50:51 <bauzas> on your env
14:50:55 <danpb> it's corrupting itself and getting duplicate nic entries
14:51:01 <bauzas> here, I'm wondering if the CI is impacted
14:51:17 <johnthetubaguy> so the ER query is basically how we track gate bugs, to see how often we hit them
14:51:20 <bauzas> danpb: yeah that's my understanding
14:51:20 <sripriya> johnthetubaguy: yes, reproducible locally too, in fact on a physical system; fails 1 out of 3 times
14:51:21 <danpb> i've seen a very similar bug reported downstream in RHEL openstack, but i could have sworn we identified it as already fixed upstream
14:51:43 <johnthetubaguy> OK, lets catch up on that in #openstack-nova after the meeting
14:51:46 <bauzas> oh man, launchpad why are you trampling the \n ?
14:51:52 <bauzas> agreed
14:52:02 <sripriya> johnthetubaguy: thank you
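For reference, picking up sripriya's question above: an "ER query" is an elastic-recheck signature — a small YAML file, named after the bug, whose query matches the failure in gate logs so the hit rate can be tracked. A sketch for this bug, with the exact query string an assumption:

```yaml
# elastic-recheck/queries/1515768.yaml (illustrative)
query: >
  message:"Unable to create tap device: Device or resource busy"
  AND tags:"screen-n-cpu.txt"
```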
14:52:16 <johnthetubaguy> https://review.openstack.org/#/c/248780 FK between BandwidthUsageCache.uuid and Instance, how do we add this safely?
14:52:58 <johnthetubaguy> lxsli I think this is your question?
14:53:24 <PaulMurray> he's coming
14:53:27 <lxsli> here
14:53:43 <lxsli> mriedem suggested I add this to the meeting
14:54:14 <lxsli> my patch cleans up orphaned data now and turns off FK constraints for MySQL, do we need to add other DB-specific protections or do we think it's safe now?
14:54:41 <johnthetubaguy> so the problem we have here is we have no test to check "online"-ness of a migration yet
14:55:16 <johnthetubaguy> which kinda sucks, but I think that's the best we can do right now really
14:56:20 <lxsli> I'm all good to continue then? No arguments about adding an FK?
14:56:34 <johnthetubaguy> seems like a sane thing to add, from where I am sat
14:56:54 <lxsli> great, I'm happy then
14:57:01 <johnthetubaguy> I know we started off with zero FKs in the DB, but I think that was a premature optimisation really
14:57:10 <johnthetubaguy> well, the ML covered it quite well I think
14:57:20 <lxsli> yeah Mike was very helpful
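A sketch of the migration shape being discussed — the bw_usage_cache and instances table names are real, but this sqlalchemy-migrate code is assumed, not copied from the review:

```python
# Illustrative only: purge orphans first, then add the FK, so the constraint
# can be created without violations (the cleanup lxsli describes above).
from migrate import ForeignKeyConstraint
from sqlalchemy import MetaData, Table, select


def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    bw_usage = Table('bw_usage_cache', meta, autoload=True)
    instances = Table('instances', meta, autoload=True)

    # rows whose uuid no longer matches any instance are the orphans
    migrate_engine.execute(
        bw_usage.delete().where(
            ~bw_usage.c.uuid.in_(select([instances.c.uuid]))))

    ForeignKeyConstraint(columns=[bw_usage.c.uuid],
                         refcolumns=[instances.c.uuid]).create()
```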
14:57:41 <johnthetubaguy> cool
14:57:45 <johnthetubaguy> any more for any more?
14:57:48 <llu-laptop> have a question about spec https://review.openstack.org/#/c/180049/ , the main purpose is to add a new nova-manage cmd to list the compute node metrics stored in the nova DB
14:58:16 <johnthetubaguy> oh, yes, fire away
14:58:23 <llu-laptop> someone said it could go specless, should I do that or keep waiting for spec approval?
14:59:25 <llu-laptop> it already got 1 +2 and several +1s, don't know which way is better
14:59:41 <johnthetubaguy> llu-laptop: we have a spec for it now, I think this needs a spec really, just to capture all the conversation we had about that
14:59:53 <johnthetubaguy> OK, so the clock says we are done
15:00:00 <johnthetubaguy> thanks all
15:00:01 <johnthetubaguy> #endmeeting