21:00:15 <mriedem> #startmeeting nova
21:00:16 <openstack> Meeting started Thu Jul 20 21:00:15 2017 UTC and is due to finish in 60 minutes.  The chair is mriedem. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:19 <openstack> The meeting name has been set to 'nova'
21:00:25 <dansmith> ohai
21:00:35 <takashin> o/
21:00:49 <efried> \o
21:00:50 <melwitt> o/
21:01:19 <TheJulia> o/
21:01:28 <nic> \m/
21:01:50 <mriedem> ok let's get started
21:01:53 <mriedem> #link agenda https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
21:01:56 <jaypipes> o/
21:01:59 <mriedem> #topic release news
21:02:07 <mriedem> so i heard there is a release around the corner
21:02:12 <mriedem> #link Pike release schedule: https://wiki.openstack.org/wiki/Nova/Pike_Release_Schedule
21:02:22 <mriedem> #info July 27 is feature freeze (1 week away)
21:02:32 <mriedem> #info July 20 (today) is the release freeze for non-client libraries (oslo, os-vif, os-traits, os-brick): https://releases.openstack.org/pike/schedule.html#p-final-lib
21:02:39 * bauzas waves late
21:02:41 <mriedem> we've done the final os-vif and os-traits releases already
21:02:59 <mriedem> if you needed something from oslo or keystoneauth, you'd better get cracking on it before dims passes out
21:03:18 * dims waves
21:03:21 <mriedem> #info Blueprints: 65 targeted, 64 approved, 33 completed (+4 from last week)
21:03:38 <mriedem> yay we officially made it over the 50% mark
21:03:43 <mriedem> which i think is a passing grade in the US now
21:04:01 <cburgess> lol
21:04:04 <cburgess> and... sad....
21:04:04 <mriedem> thank you
21:04:10 <mriedem> i'll be here all week
21:04:17 <mriedem> #link Pike feature freeze blueprint tracker etherpad: https://etherpad.openstack.org/p/nova-pike-feature-freeze-status
21:04:35 <mriedem> i'm keeping ^ up to date throughout the day
21:04:41 <mriedem> trying to focus reviews a bit
21:04:49 <mriedem> any questions about the release?
21:05:05 <mriedem> ok moving on
21:05:07 <mriedem> #topic bugs
21:05:16 <mriedem> nothing is listed as critical
21:05:22 <mriedem> #help Need help with bug triage; there are 123 (+8 from last week) new untriaged bugs as of today (July 20).
21:05:35 <mriedem> just a note, after the FF, we have 2 weeks until the rc1
21:05:58 <mriedem> so we'll have to start going through those untriaged bugs here at some point to see if there are stop ship rc candidates
21:06:07 <mriedem> "rc candidates" is redundant
21:06:07 <mriedem> i know
21:06:20 <mriedem> speaking of
21:06:20 <mriedem> #info Starting to tag bugs for pike-rc-potential: https://bugs.launchpad.net/nova/+bugs?field.tag=pike-rc-potential
21:06:43 <mriedem> so if you spot a bug that looks like a major regression or upgrade issue, please tag it
21:06:57 <mriedem> gate status
21:06:58 <mriedem> #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
21:07:07 <mriedem> well, we know the functional tests have been wonky
21:07:21 <mriedem> something to do with lock timeouts
21:07:36 <mriedem> i started playing with that here but it went badly https://review.openstack.org/#/c/485335/
21:07:43 <melwitt> heh
21:07:54 <mriedem> turns out that if you don't set the external lock fixture,
21:08:12 <mriedem> we're still starting services, like the network service, which has code that uses external locks,
21:08:19 <mriedem> and if you don't have the lock_path configured, it blows up
21:08:29 <mriedem> so maybe we can stub that out, i haven't had time to dig into it
21:08:36 <mriedem> oslo.concurrency does have some fixtures for this though
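[editor's note: a minimal sketch of the failure mode and the fixture idea discussed above. The names here (`external_lock`, `ExternalLockError`, `LockPathFixture`) are hypothetical illustrations, not nova's or oslo.concurrency's actual API — the point is just that file-based external locks need a configured lock_path, and a test fixture can stub one in with a temp directory, similar in spirit to the fixtures oslo.concurrency ships.]

```python
import os
import tempfile


class ExternalLockError(Exception):
    pass


def external_lock(name, lock_path=None):
    """Mimic an external (file-based) lock: blows up without a lock_path."""
    if not lock_path:
        raise ExternalLockError(
            "external lock %r requires a configured lock_path" % name)
    # A real implementation would flock() this file; here we just touch it.
    path = os.path.join(lock_path, name)
    open(path, "w").close()
    return path


class LockPathFixture:
    """Test fixture that points lock_path at a throwaway temp dir."""

    def __enter__(self):
        self.tmpdir = tempfile.mkdtemp()
        return self.tmpdir

    def __exit__(self, *exc):
        for f in os.listdir(self.tmpdir):
            os.unlink(os.path.join(self.tmpdir, f))
        os.rmdir(self.tmpdir)
```

A test that starts a service using external locks would wrap it in the fixture (`with LockPathFixture() as lock_path: ...`) instead of relying on a globally configured lock_path.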
21:09:03 <mriedem> #link 3rd party CI status http://ci-watch.tintri.com/project?project=nova&time=7+days
21:09:15 <mriedem> i noticed the xenserver ci was busted the other night
21:09:22 <mriedem> i don't know if that's been fixed
21:09:34 <mriedem> questions about bugs?
21:09:45 <mriedem> ok moving on
21:09:47 <mriedem> #topic reminders
21:09:50 <mriedem> #link Pike Review Priorities etherpad: https://etherpad.openstack.org/p/pike-nova-priorities-tracking
21:10:02 <mriedem> ^ is probably stale, and i'm personally focusing on the FF etherpad above
21:10:09 <mriedem> #link Consider signing up for the Queens PTG which is the week of September 11 in Denver, CO, USA: https://www.openstack.org/ptg
21:10:30 <mriedem> #topic stable branch status
21:10:41 <mriedem> there isn't really any news here
21:10:57 <mriedem> we have some things to review,
21:11:06 <mriedem> and i'll probably do stable branch releases around rc1
21:11:20 <mriedem> #topic subteam highlights
21:11:29 <mriedem> dansmith: things for cells v2?
21:11:35 <dansmith> no meeting this week,
21:11:46 <dansmith> made some progress on the quotas set, which is our highest priority
21:11:50 <dansmith> need more tho
21:11:52 <dansmith> that is all.
21:12:18 <mriedem> right so quotas are now at https://review.openstack.org/#/c/410945/
21:12:23 <mriedem> which is moving quotas into the api db
21:12:26 <mriedem> it's 2 changes
21:12:31 <mriedem> the counting quotas stuff is all approved though
21:12:35 <dansmith> aye
21:12:45 <mriedem> and the external-server-events thing is approved
21:12:52 <mriedem> it just doesn't want to merge
21:12:59 <dansmith> nor do lots of things
21:13:02 <dansmith> it's in the gate right no though
21:13:04 <dansmith> *now
21:13:05 <mriedem> beyond that we have some docs and bugs to fix
21:13:19 <mriedem> tracking those here https://etherpad.openstack.org/p/nova-pike-cells-v2-todos
21:13:23 * dansmith nods
21:13:30 <mriedem> #help cells v2 TODOs https://etherpad.openstack.org/p/nova-pike-cells-v2-todos
21:13:41 <mriedem> ok scheduler stuff
21:13:42 <mriedem> edleafe: ?
21:13:48 <jaypipes> I'll fill in for ed
21:13:52 <jaypipes> he's out
21:14:09 <mriedem> ok
21:14:31 <jaypipes> mriedem: basically, thanks to cdent and gibi we identified a nasty little bug in the placement shared resource providers SQL code.
21:15:18 <mriedem> and...
21:15:32 <dansmith> jaypipes is a master of suspense
21:15:38 * jaypipes looking for link
21:15:55 <bauzas> The secret of the universe is...
21:16:02 * dansmith is on the edge of his seat
21:16:04 <cburgess> 42
21:16:13 <jaypipes> https://review.openstack.org/#/c/485088/
21:16:15 <jaypipes> #link https://review.openstack.org/#/c/485088/
21:16:36 <jaypipes> that is the cause of the transient NoValidHosts functional test failures we were seeing
21:16:48 <mriedem> oh nice
21:16:56 <jaypipes> my bad for not notifying you and dansmith sooner.
21:16:58 <efried> mriedem is clearly spoiled by edleafe's prepared copy/paste reports
21:17:03 <mriedem> i saw the talk
21:17:05 <jaypipes> for some reason I thought I did but no.
21:17:15 <mriedem> i mean,
21:17:17 <jaypipes> efried: no, I'm just an idiot.
21:17:17 <mriedem> people been talkin'
21:17:24 <mriedem> talkin' about patches
21:17:26 <dansmith> mriedem: talkin bout talkin?
21:17:48 <mriedem> i like to do the bonnie raitt joke once per year
21:17:51 <jaypipes> mriedem: outside of that major bug fix, there is the placement-claims work
21:17:54 <melwitt> I got it :)
21:18:01 <jaypipes> which has gotten reviews and is slowly being merged.
21:18:04 <mriedem> yeah so claims https://review.openstack.org/#/c/483566/
21:18:08 <mriedem> jaypipes: you see the bug in there?
21:18:40 <mriedem> like, you don't have to understand it right now
21:18:45 <mriedem> but there is a bug in there, found by gib
21:18:47 <mriedem> *gibi
21:19:01 <mriedem> #info gibi is finding lots of placement bugs - yay gibi
21:19:18 <jaypipes> mriedem: yeah, sorry, I dropped the ball on that. will address shortly after meeting.
21:19:19 <mriedem> we also need to talk about alternatives at some point
21:19:25 <mriedem> but can do that later
21:19:38 <mriedem> i'd like to rap about contingency plans, mkay?
21:19:46 * mriedem turns his chair around
21:19:48 <jaypipes> go for it.
21:19:51 <mriedem> well, later
21:20:05 <mriedem> well, or now
21:20:20 <mriedem> main question is, if we don't get alternatives sorted out going to the computes, what breaks or do we lose?
21:20:20 <jaypipes> there's some outstanding traits-related things that are also in-flight from alex_xu
21:20:32 <jaypipes> including a bug fix that is important. I need to finish the review on that one.
21:20:34 <jaypipes> re-review..
21:20:35 <mriedem> we lose the separate conductors
21:20:59 <dansmith> we lose separate conductors for retries
21:21:04 <mriedem> right
21:21:10 <mriedem> asking from another pov,
21:21:11 <jaypipes> well, yeah, we can't fulfill our promise of retaining retries in cellsv2
21:21:12 <dansmith> could be worse,
21:21:15 <bauzas> What Dan said
21:21:21 <mriedem> if we land the claims in the scheduler, but not alternatives going to computes, is that ok?
21:21:25 <dansmith> one of the first people that should be doing things with cellsv2 and pike doesn't care about retries anyway
21:21:54 <dansmith> mriedem: if we haven't removed the regular retry mechanism it should still work the same way it does now, right?
21:21:56 <melwitt> who is that? CERN?
21:21:59 <dansmith> melwitt: yeah
21:22:07 <mriedem> dansmith: yes,
21:22:12 <mriedem> i just want to make sure we're not missing something,
21:22:22 <mriedem> or that once we claim in the scheduler we lose something or break something else
21:22:34 <mriedem> but i think that if you're running a single-level conductor it's business as usual,
21:22:43 <dansmith> yep
21:22:48 <mriedem> but you get the up-front claim protection which should mitigate some of the claim race failures in the computes
21:22:54 <mriedem> so we're making progress still
21:22:56 <bauzas> I don't think we would have other problems
21:23:16 <mriedem> and we'll still retry alternatives if the allocation request in the scheduler fails
21:23:18 <mriedem> due to a race
21:23:34 <bauzas> Correct
21:24:04 <jaypipes> right
21:24:22 <mriedem> is the retry happening in https://review.openstack.org/#/c/483566/ yet?
21:24:35 <jaypipes> mriedem: oh yes.
21:24:40 <bauzas> Yup
21:24:56 <jaypipes> mriedem: line 189 in filter_scheduler.py
21:25:06 <mriedem> ok i was looking at https://review.openstack.org/#/c/483566/4/nova/scheduler/filter_scheduler.py@203
21:25:31 <jaypipes> mriedem: that's after all retries.
21:25:33 <mriedem> ok and the number of hosts is limited by the number of retries?
21:25:55 <jaypipes> mriedem: no
21:26:09 <dansmith> there's no retries in this code right?
21:26:14 <mriedem> so we could retry 1000 times?
21:26:30 <mriedem> the retry is the for loop over hosts on L189
21:26:30 <jaypipes> dansmith: yes, there is. line 189-200
21:26:44 <jaypipes> mriedem: yes, we can retry 1000 times.
21:26:45 <mriedem> but we don't limit that by the configurable retry option
21:26:55 <dansmith> uh
21:27:08 <dansmith> so wait, this is a loop over all the hosts we got back from placement,
21:27:14 <jaypipes> dansmith: correct.
21:27:19 <mriedem> and filtered/weighed
21:27:20 <dansmith> and we try to claim one until we run out of hosts or get one?
21:27:25 <jaypipes> yes
21:28:06 <dansmith> okay, I hadn't noticed that the first time around, but I see now
21:28:12 <mriedem> so is there a reason we don't limit by the config?
21:28:15 <dansmith> to be clear, this is just retries in the claiming sense,
21:28:24 <dansmith> not the *reschedule* that won't work without the alternatives stuff
21:28:24 <jaypipes> mriedem: no reason really...
21:28:31 <mriedem> dansmith: yeah i get that
21:28:49 <bauzas> Honestly it's just for a race
21:28:50 <mriedem> so retrying here 1000 times is better than retrying compute > conductor > scheduler * 1000
21:28:59 <dansmith> I feel like we should limit here, but I'm not sure why to be honest
21:29:14 <mriedem> i feel like we should limit too,
21:29:27 <dansmith> because as bauzas says, this should only be a race between placement saying something is good and us trying it
21:29:28 <bauzas> In case two concurrent requests get the same destination
21:29:33 <mriedem> but only because if we goofed something up and are going to retry 1000 times on a set of hosts that won't ever match
21:29:37 <dansmith> you'd need a lot of concurrent schedulers to race through 1000 of them
21:29:49 <bauzas> Yup
21:30:08 <dansmith> so, I'm tempted to leave it as is and backport an emergency fix if we find people are hitting it in some way
21:30:15 <bauzas> Looks like a minor issue to me IMHO
21:30:17 <jaypipes> and a lot of luck. remember there's the randomization of the host subset as well
21:30:18 <mriedem> release note?
21:30:22 <dansmith> in reality, we probably have to fix whatever is causing them to hit it and not just the limit
21:30:26 <dansmith> jaypipes: aye
21:30:29 <mriedem> dansmith: yeah agreed
21:30:34 <mriedem> the retry is papering over a bigger problem
21:30:38 <dansmith> mriedem: sure, reno for the warning perhaps
21:30:44 <mriedem> ok
21:30:48 <jaypipes> happy to do that.
21:30:49 <mriedem> let's continue that fun in the review
21:30:52 <jaypipes> k
21:30:53 <bauzas> Reno FTW
21:31:15 <mriedem> remember, reno is the way we can always say "we told you so" :)
21:31:26 <mriedem> "we told you it was going to break!"
21:31:31 <dansmith> we're making a lot of people watch a small conversation, fwiw
21:31:38 <mriedem> good
21:31:41 <mriedem> we're a family here
21:31:52 <mriedem> ok moving on then
21:31:58 <mriedem> thanks for the discussion
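[editor's note: the claim-retry loop discussed above can be sketched as below. This is an illustration of the pattern, not the actual filter_scheduler.py code; `claim_first_available` and its parameters are hypothetical names. The scheduler walks the filtered/weighed hosts and tries to claim resources in placement for each, stopping at the first host whose claim succeeds — and, as noted, without a cap a pathological case could iterate over every candidate host.]

```python
def claim_first_available(hosts, claim_resources, max_attempts=None):
    """Return the first host we successfully claim against, else None.

    hosts: filtered/weighed candidates, best first.
    claim_resources: callable(host) -> bool; False means we lost the
        race (e.g. a concurrent scheduler claimed that host first).
    max_attempts: optional cap, like the configurable retry option
        mentioned in the discussion.
    """
    for attempt, host in enumerate(hosts, start=1):
        if max_attempts is not None and attempt > max_attempts:
            break  # give up rather than walking the whole list
        if claim_resources(host):
            return host  # claim succeeded; schedule here
        # Claim failed: someone raced us to this host, try the next one.
    return None
```

Note this only covers retries in the claiming sense (races against placement), not the reschedule from the compute that needs the alternatives work.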
21:32:10 <mriedem> the api meeting, i don't think i was there
21:32:19 <mriedem> sdague: were you?
21:32:36 <mriedem> the no more api extensions stuff is working its way through https://review.openstack.org/#/q/topic:bp/api-no-more-extensions-pike+status:open
21:32:44 <mriedem> the service/hypervisor uuids apis landed for 2.53
21:32:53 <mriedem> i'm working on the novaclient changes for that one now
21:33:08 <mriedem> notification meeting was relatively short,
21:33:24 <mriedem> gibi and i talked about the last remaining change for the searchlight notifications series
21:33:30 <mriedem> which is https://review.openstack.org/#/c/459493/
21:33:51 <mriedem> and we talked about the updated_at bug fix from takashin
21:34:05 <mriedem> this https://review.openstack.org/#/c/475276/
21:34:27 <mriedem> there is something weird going on in there where the updated_at field isn't set on the instance record even after we've updated the vm/task state in there
21:34:42 <mriedem> and updated_at should be set anytime there is an ONUPDATE triggered by a db record update
21:34:44 <mriedem> so it's weird
21:35:00 <mriedem> that's the only thing i'm holding my +2 for
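[editor's note: a toy model of the updated_at behavior under discussion, with illustrative names rather than nova's actual model code. In SQLAlchemy, a column-level onupdate fires only when an UPDATE statement is actually emitted; if a save turns out to be a no-op because the values written match what is already stored, no UPDATE happens and updated_at keeps its old value. That is one plausible way an instance can look "not updated" even after code paths that set vm_state/task_state — this is offered as an assumption about the bug, not a confirmed diagnosis.]

```python
import datetime


class Record:
    """Toy row: updated_at is bumped only when an UPDATE is emitted."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self.updated_at = None

    def save(self, **changes):
        """Apply changes; bump updated_at only if something changed."""
        dirty = {k: v for k, v in changes.items()
                 if self._fields.get(k) != v}
        if not dirty:
            return False  # no UPDATE emitted, so onupdate never fires
        self._fields.update(dirty)
        self.updated_at = datetime.datetime.utcnow()
        return True
```

If the vm/task state transitions really do change values, updated_at should move; checking whether the failing path issues a no-op save would be one way to narrow the bug down.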
21:35:10 <mriedem> ok cinder stuff
21:35:21 <mriedem> making progress, the swap volume change merged
21:35:24 <mriedem> open reviews are now https://review.openstack.org/#/q/status:open+project:openstack/nova+topic:bp/cinder-new-attach-apis+branch:master
21:35:31 <mriedem> the check_detach one is close
21:35:43 <mriedem> stvnoyes found a problem in the live migration one and is fixing it up
21:35:46 <mriedem> and we need a cinderclient release
21:35:54 <mriedem> but it's really down to like mainly 2 changes
21:36:02 <mriedem> so any help there on reviews would be appreciated
21:36:19 <mriedem> #topic stuck reviews
21:36:32 <mriedem> there was nothing in the agenda, so does anyone have something to mention here?
21:36:52 <mriedem> alright then
21:36:55 <mriedem> #topic open discussion
21:37:01 <nic> So I made a thing: https://blueprints.launchpad.net/nova/+spec/libvirt-virtio-set-queue-sizes
21:37:08 <nic> I can't see this landing in Pike, seeing as it exposes functionality in libvirt that hasn't even merged yet, and therefore technically doesn't exist
21:37:13 <nic> But it's been a while since I've done any Nova work, and I could use some expert eyes on it to make sure I'm not crazy, and that my approach makes some sense
21:37:18 <nic> And I could use some help getting it prioritized somewhere, obviously
21:37:43 <mriedem> nic: i'll target the bp to queens so we can discuss it after the pike release
21:37:50 <nic> \o/
21:38:06 <cburgess> I will be in Denever and nic might be so we could discuss it there as well?
21:38:11 <cburgess> Denver even...
21:38:11 <mriedem> sure
21:38:27 <dansmith> I like how that was "de never"
21:38:31 <cburgess> In the meantime I think what nic was asking is.. are we even headed in the right direction or is it omg abort?
21:38:43 <mriedem> talk to sean-k-mooney
21:38:57 <mriedem> or vladikr maybe?
21:39:00 <jangutter> nic: I'm also interested. For the NFV case with low drops, you sometimes need larger queues.
21:39:32 <cburgess> jangutter Thats what its for...
21:39:37 <mriedem> i've added people to the review
21:39:52 <nic> Excellent
21:40:00 <cburgess> jangutter We hit it with NFV workloads under NFV. We carry a local set of patches for now to work around this but obviously want to get it into upstream.
21:40:09 <cburgess> NFV workloads under VPP even
21:40:15 <cburgess> I can't type today... or ever.
21:40:21 <jangutter> cburgess: Yep, and partly QEMU's also to blame :-(
21:40:30 <cburgess> jangutter yeah
21:40:50 <mriedem> alright, anything else?
21:41:15 <mriedem> i'll take that as a no
21:41:19 <mriedem> 1 week left
21:41:21 <mriedem> thanks everyone
21:41:23 <mriedem> #endmeeting