21:00:15 <mriedem> #startmeeting nova
21:00:16 <openstack> Meeting started Thu Jul 20 21:00:15 2017 UTC and is due to finish in 60 minutes. The chair is mriedem. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:19 <openstack> The meeting name has been set to 'nova'
21:00:25 <dansmith> ohai
21:00:35 <takashin> o/
21:00:49 <efried> \o
21:00:50 <melwitt> o/
21:01:19 <TheJulia> o/
21:01:28 <nic> \m/
21:01:50 <mriedem> ok let's get started
21:01:53 <mriedem> #link agenda https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
21:01:56 <jaypipes> o/
21:01:59 <mriedem> #topic release news
21:02:07 <mriedem> so i heard there is a release around the corner
21:02:12 <mriedem> #link Pike release schedule: https://wiki.openstack.org/wiki/Nova/Pike_Release_Schedule
21:02:22 <mriedem> #info July 27 is feature freeze (1 week away)
21:02:32 <mriedem> #info July 20 (today) is the release freeze for non-client libraries (oslo, os-vif, os-traits, os-brick): https://releases.openstack.org/pike/schedule.html#p-final-lib
21:02:39 * bauzas waves late
21:02:41 <mriedem> we've done the final os-vif and os-traits releases already
21:02:59 <mriedem> if you needed something from oslo or keystoneauth, you'd better get cracking on it before dims passes out
21:03:18 * dims waves
21:03:21 <mriedem> #info Blueprints: 65 targeted, 64 approved, 33 completed (+4 from last week)
21:03:38 <mriedem> yay we officially made it over the 50% mark
21:03:43 <mriedem> which i think is a passing grade in the US now
21:04:01 <cburgess> lol
21:04:04 <cburgess> and... sad....
21:04:04 <mriedem> thank you
21:04:10 <mriedem> i'll be here all week
21:04:17 <mriedem> #link Pike feature freeze blueprint tracker etherpad: https://etherpad.openstack.org/p/nova-pike-feature-freeze-status
21:04:35 <mriedem> i'm keeping ^ up to date throughout the day
21:04:41 <mriedem> trying to focus reviews a bit
21:04:49 <mriedem> any questions about the release?
21:05:05 <mriedem> ok moving on
21:05:07 <mriedem> #topic bugs
21:05:16 <mriedem> nothing is listed as critical
21:05:22 <mriedem> #help Need help with bug triage; there are 123 (+8 from last week) new untriaged bugs as of today (July 20).
21:05:35 <mriedem> just a note, after the FF, we have 2 weeks until the rc1
21:05:58 <mriedem> so we'll have to start going through those untriaged bugs here at some point to see if there are stop ship rc candidates
21:06:07 <mriedem> "rc candidates" is redundant
21:06:07 <mriedem> i know
21:06:20 <mriedem> speaking of
21:06:20 <mriedem> #info Starting to tag bugs for pike-rc-potential: https://bugs.launchpad.net/nova/+bugs?field.tag=pike-rc-potential
21:06:43 <mriedem> so if you spot a bug that looks like a major regression or upgrade issue, please tag it
21:06:57 <mriedem> gate status
21:06:58 <mriedem> #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
21:07:07 <mriedem> well, we know the functional tests have been wonky
21:07:21 <mriedem> something to do with lock timeouts
21:07:36 <mriedem> i started playing with that here but it went badly https://review.openstack.org/#/c/485335/
21:07:43 <melwitt> heh
21:07:54 <mriedem> turns out that if you don't set the external lock fixture,
21:08:12 <mriedem> we're still starting services, like the network service, which has code that uses external locks,
21:08:19 <mriedem> and if you don't have the lock_path configured, it blows up
21:08:29 <mriedem> so maybe we can stub that out, i haven't had time to dig into it
21:08:36 <mriedem> oslo.concurrency does have some fixtures for this though
21:09:03 <mriedem> #link 3rd party CI status http://ci-watch.tintri.com/project?project=nova&time=7+days
21:09:15 <mriedem> i noticed the xenserver ci was busted the other night
21:09:22 <mriedem> i don't know if that's been fixed
21:09:34 <mriedem> questions about bugs?
21:09:45 <mriedem> ok moving on
21:09:47 <mriedem> #topic reminders
21:09:50 <mriedem> #link Pike Review Priorities etherpad: https://etherpad.openstack.org/p/pike-nova-priorities-tracking
21:10:02 <mriedem> ^ is probably stale, and i'm personally focusing on the FF etherpad above
21:10:09 <mriedem> #link Consider signing up for the Queens PTG which is the week of September 11 in Denver, CO, USA: https://www.openstack.org/ptg
21:10:30 <mriedem> #topic stable branch status
21:10:41 <mriedem> there isn't really any news here
21:10:57 <mriedem> we have some things to review,
21:11:06 <mriedem> and i'll probably do stable branch releases around rc1
21:11:20 <mriedem> #topic subteam highlights
21:11:29 <mriedem> dansmith: things for cells v2?
21:11:35 <dansmith> no meeting this week,
21:11:46 <dansmith> made some progress on the quotas set, which is our highest priority
21:11:50 <dansmith> need more tho
21:11:52 <dansmith> that is all.
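[Editor's note: a minimal sketch of the external-lock failure mode discussed under gate status above. This is not nova's or oslo.concurrency's actual code; `external_lock` and its behavior are a hypothetical illustration of the idea that an "external" lock is an OS-level file lock and therefore needs a configured lock_path directory to put the lock file in.]

```python
# Hypothetical sketch (not the real fixture code): an external lock is
# a cross-process file lock, so it needs a lock_path directory; with no
# lock_path configured there is nowhere to create the lock file, which
# is roughly why the functional tests blow up.
import fcntl
import os
import tempfile


def external_lock(name, lock_path):
    """Acquire a cross-process file lock named `name` under `lock_path`."""
    if not lock_path:
        # Mirrors the failure mode discussed above: external locking
        # cannot work without a configured lock_path.
        raise RuntimeError("external lock %r requires lock_path" % name)
    fd = os.open(os.path.join(lock_path, name), os.O_CREAT | os.O_RDWR)
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd


# With a lock_path configured (here a temp dir), the lock works:
lock_dir = tempfile.mkdtemp()
fd = external_lock("nova-network", lock_dir)
fcntl.flock(fd, fcntl.LOCK_UN)
os.close(fd)

# Without one, it fails the same way the tests did:
err = None
try:
    external_lock("nova-network", None)
except RuntimeError as exc:
    err = exc
print(err)
```

A test fixture could stub `external_lock` out, or simply point lock_path at a per-test temp directory, which is the kind of thing the oslo.concurrency fixtures mentioned above provide.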
21:12:18 <mriedem> right so quotas are now at https://review.openstack.org/#/c/410945/
21:12:23 <mriedem> which is moving quotas into the api db
21:12:26 <mriedem> it's 2 changes
21:12:31 <mriedem> the counting quotas stuff is all approved though
21:12:35 <dansmith> aye
21:12:45 <mriedem> and the external-server-events thing is approved
21:12:52 <mriedem> it just doesn't want to merge
21:12:59 <dansmith> nor do lots of things
21:13:02 <dansmith> it's in the gate right no though
21:13:04 <dansmith> *now
21:13:05 <mriedem> beyond that we have some docs and bugs to fix
21:13:19 <mriedem> tracking those here https://etherpad.openstack.org/p/nova-pike-cells-v2-todos
21:13:23 * dansmith nods
21:13:30 <mriedem> #help cells v2 TODOs https://etherpad.openstack.org/p/nova-pike-cells-v2-todos
21:13:41 <mriedem> ok scheduler stuff
21:13:42 <mriedem> edleafe: ?
21:13:48 <jaypipes> I'll fill in for ed
21:13:52 <jaypipes> he's out
21:14:09 <mriedem> ok
21:14:31 <jaypipes> mriedem: basically, thanks to cdent and gibi we identified a nasty little bug in the placement shared resource providers SQL code.
21:15:18 <mriedem> and...
21:15:32 <dansmith> jaypipes is a master of suspense
21:15:38 * jaypipes looking for link
21:15:55 <bauzas> The secret of the universe is...
21:16:02 * dansmith is on the edge of his seat
21:16:04 <cburgess> 42
21:16:13 <jaypipes> https://review.openstack.org/#/c/485088/
21:16:15 <jaypipes> #link https://review.openstack.org/#/c/485088/
21:16:36 <jaypipes> that is the cause of the transient NoValidHosts functional test failures we were seeing
21:16:48 <mriedem> oh nice
21:16:56 <jaypipes> my bad for not notifying you and dansmith sooner.
21:16:58 <efried> mriedem is clearly spoiled by edleafe's prepared copy/paste reports
21:17:03 <mriedem> i saw the talk
21:17:05 <jaypipes> for some reason I thought I did but no.
21:17:15 <mriedem> i mean,
21:17:17 <jaypipes> efried: no, I'm just an idiot.
21:17:17 <mriedem> people been talkin'
21:17:24 <mriedem> talkin' about patches
21:17:26 <dansmith> mriedem: talkin bout talkin?
21:17:48 <mriedem> i like to do the bonnie raitt joke once per year
21:17:51 <jaypipes> mriedem: outside of that major bug fix, there is the placement-claims work
21:17:54 <melwitt> I got it :)
21:18:01 <jaypipes> which has gotten reviews and is slowly being merged.
21:18:04 <mriedem> yeah so claims https://review.openstack.org/#/c/483566/
21:18:08 <mriedem> jaypipes: you see the bug in there?
21:18:40 <mriedem> like, you don't have to understand it right now
21:18:45 <mriedem> but there is a bug in there, found by gib
21:18:47 <mriedem> *gibi
21:19:01 <mriedem> #info gibi is finding lots of placement bugs - yay gibi
21:19:18 <jaypipes> mriedem: yeah, sorry, I dropped the ball on that. will address shortly after meeting.
21:19:19 <mriedem> we also need to talk about alternatives at some point
21:19:25 <mriedem> but can do that later
21:19:38 <mriedem> i'd like to rap about contingency plans, mkay?
21:19:46 * mriedem turns his chair around
21:19:48 <jaypipes> go for it.
21:19:51 <mriedem> well, later
21:20:05 <mriedem> well, or now
21:20:20 <mriedem> main question is, if we don't get alternatives sorted out going to the computes, what breaks or what do we lose?
21:20:20 <jaypipes> there's some outstanding traits-related things that are also in-flight from alex_xu
21:20:32 <jaypipes> including a bug fix that is important. I need to finish the review on that one.
21:20:34 <jaypipes> re-review..
21:20:35 <mriedem> we lose the separate conductors
21:20:59 <dansmith> we lose separate conductors for retries
21:21:04 <mriedem> right
21:21:10 <mriedem> asking from another pov,
21:21:11 <jaypipes> well, yeah, we can't fulfill our promise of retaining retries in cellsv2
21:21:12 <dansmith> could be worse,
21:21:15 <bauzas> What Dan said
21:21:21 <mriedem> if we land the claims in the scheduler, but not alternatives going to computes, is that ok?
21:21:25 <dansmith> one of the first people that should be doing things with cellsv2 and pike don't care about retries anyway
21:21:54 <dansmith> mriedem: if we haven't removed the regular retry mechanism it should still work the same way it does now, right?
21:21:56 <melwitt> who is that? CERN?
21:21:59 <dansmith> melwitt: yeah
21:22:07 <mriedem> dansmith: yes,
21:22:12 <mriedem> i just want to make sure we're not missing something,
21:22:22 <mriedem> or that once we claim in the scheduler we lose something or break something else
21:22:34 <mriedem> but i think that if you're single level conductor it's business as usual,
21:22:43 <dansmith> yep
21:22:48 <mriedem> but you get the up-front claim protection which should mitigate some of the claim race failures in the computes
21:22:54 <mriedem> so we're making progress still
21:22:56 <bauzas> I don't think we would have other problems
21:23:16 <mriedem> and we'll still retry alternatives if the allocation request in the scheduler fails
21:23:18 <mriedem> due to a race
21:23:34 <bauzas> Correct
21:24:04 <jaypipes> right
21:24:22 <mriedem> is the retry happening in https://review.openstack.org/#/c/483566/ yet?
21:24:35 <jaypipes> mriedem: oh yes.
21:24:40 <bauzas> Yup
21:24:56 <jaypipes> mriedem: line 189 in filter_scheduler.py
21:25:06 <mriedem> ok i was looking at https://review.openstack.org/#/c/483566/4/nova/scheduler/filter_scheduler.py@203
21:25:31 <jaypipes> mriedem: that's after all retries.
21:25:33 <mriedem> ok and the number of hosts is limited by the number of retries?
21:25:55 <jaypipes> mriedem: no
21:26:09 <dansmith> there's no retries in this code right?
21:26:14 <mriedem> so we could retry 1000 times?
21:26:30 <mriedem> the retry is the for loop over hosts on L189
21:26:30 <jaypipes> dansmith: yes, there is. line 189-200
21:26:44 <jaypipes> mriedem: yes, we can retry 1000 times.
21:26:45 <mriedem> but we don't limit that by the configurable retry option
21:26:55 <dansmith> uh
21:27:08 <dansmith> so wait, this is a loop over all the hosts we got back from placement,
21:27:14 <jaypipes> dansmith: correct.
21:27:19 <mriedem> and filtered/weighed
21:27:20 <dansmith> and we try to claim one until we run out of hosts or get one?
21:27:25 <jaypipes> yes
21:28:06 <dansmith> okay, I hadn't noticed that the first time around, but I see now
21:28:12 <mriedem> so is there a reason we don't limit by the config?
21:28:15 <dansmith> to be clear, this is just retries in the claiming sense,
21:28:24 <dansmith> not the *reschedule* that won't work without the alternatives stuff
21:28:24 <jaypipes> mriedem: no reason really...
21:28:31 <mriedem> dansmith: yeah i get that
21:28:49 <bauzas> Honestly its just for a race
21:28:50 <mriedem> so retrying here 1000 times is better than retrying compute > conductor > scheduler * 1000
21:28:59 <dansmith> I feel like we should limit here, but I'm not sure why to be honest
21:29:14 <mriedem> i feel like we should limit too,
21:29:27 <dansmith> because as bauzas says, this should only be a race between placement saying something is good and us trying it
21:29:28 <bauzas> In case two concurrent requests get the same destination
21:29:33 <mriedem> but only because if we goofed something up and are going to retry 1000 times on a set of hosts that won't ever match
21:29:37 <dansmith> you'd need a lot of concurrent schedulers to race through 1000 of them
21:29:49 <bauzas> Yup
21:30:08 <dansmith> so, I'm tempted to leave it as is and backport an emergency fix if we find people are hitting it in some way
21:30:15 <bauzas> Looks a minor issue for me IMHO
21:30:17 <jaypipes> and a lot of luck. remember there's the randomization of the host subset as well
21:30:18 <mriedem> release note?
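[Editor's note: a minimal sketch of the claim loop being discussed, under assumptions. This is not the real filter_scheduler.py code; `select_host`, `claim_resources`, and `max_claim_attempts` are hypothetical stand-ins for the loop over filtered/weighed hosts that tries a placement claim on each until one succeeds, plus the optional cap by a config option that the conversation considers.]

```python
# Hypothetical sketch of the claim loop: walk the filtered/weighed
# hosts and try to claim resources against placement on each candidate
# until one claim succeeds (claims can fail if a concurrent scheduler
# raced us to the same host).

def select_host(hosts, claim_resources, max_claim_attempts=None):
    """Return the first host whose placement claim succeeds, else None.

    Without max_claim_attempts, a race-heavy system could in principle
    try every candidate (the "retry 1000 times" concern); capping the
    list by a config option would bound that.
    """
    if max_claim_attempts is not None:
        hosts = hosts[:max_claim_attempts]
    for host in hosts:
        if claim_resources(host):
            return host  # claim succeeded, stop here
        # claim raced and failed -> fall through to the next host
    return None


# Example: a claim that fails for the first two (already-claimed) hosts:
taken = {"host1", "host2"}
winner = select_host(
    ["host1", "host2", "host3"],
    claim_resources=lambda h: h not in taken,
)
print(winner)  # -> host3
```

As noted above, these are retries in the claiming sense only, not the reschedules that depend on the alternatives work.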
21:30:22 <dansmith> in reality, we probably have to fix whatever is causing them to hit it and not just the limit
21:30:26 <dansmith> jaypipes: aye
21:30:29 <mriedem> dansmith: yeah agreed
21:30:34 <mriedem> the retry is papering over a bigger problem
21:30:38 <dansmith> mriedem: sure, reno for the warning perhaps
21:30:44 <mriedem> ok
21:30:48 <jaypipes> happy to do that.
21:30:49 <mriedem> let's continue that fun in the review
21:30:52 <jaypipes> k
21:30:53 <bauzas> Reno FTW
21:31:15 <mriedem> remember, reno is the way we can always say "we told you so" :)
21:31:26 <mriedem> "we told you it was going to break!"
21:31:31 <dansmith> we're making a lot of people watch a small conversation, fwiw
21:31:38 <mriedem> good
21:31:41 <mriedem> we're a family here
21:31:52 <mriedem> ok moving on then
21:31:58 <mriedem> thanks for the discussion
21:32:10 <mriedem> the api meeting, i don't think i was there
21:32:19 <mriedem> sdague: were you?
21:32:36 <mriedem> the no more api extensions stuff is working its way through https://review.openstack.org/#/q/topic:bp/api-no-more-extensions-pike+status:open
21:32:44 <mriedem> the service/hypervisor uuids apis landed for 2.53
21:32:53 <mriedem> i'm working on the novaclient changes for that one now
21:33:08 <mriedem> notification meeting was relatively short,
21:33:24 <mriedem> gibi and i talked about the last remaining change for the searchlight notifications series
21:33:30 <mriedem> which is https://review.openstack.org/#/c/459493/
21:33:51 <mriedem> and we talked about the updated_at bug fix from takashin
21:34:05 <mriedem> this https://review.openstack.org/#/c/475276/
21:34:27 <mriedem> there is something weird going on in there where the updated_at field isn't set on the instance record even after we've updated the vm/task state in there
21:34:42 <mriedem> and updated_at should be set anytime there is an ONUPDATE triggered by a db record update
21:34:44 <mriedem> so it's weird
21:35:00 <mriedem> that's the only thing i'm holding my +2 for
21:35:10 <mriedem> ok cinder stuff
21:35:21 <mriedem> making progress, the swap volume change merged
21:35:24 <mriedem> open reviews are now https://review.openstack.org/#/q/status:open+project:openstack/nova+topic:bp/cinder-new-attach-apis+branch:master
21:35:31 <mriedem> the check_detach one is close
21:35:43 <mriedem> stvnoyes found a problem in the live migration one and is fixing it up
21:35:46 <mriedem> and we need a cinderclient release
21:35:54 <mriedem> but it's really down to like mainly 2 changes
21:36:02 <mriedem> so any help there on reviews would be appreciated
21:36:19 <mriedem> #topic stuck reviews
21:36:32 <mriedem> there was nothing in the agenda, so does anyone have something to mention here?
21:36:52 <mriedem> alright then
21:36:55 <mriedem> #topic open discussion
21:37:01 <nic> So I made a thing: https://blueprints.launchpad.net/nova/+spec/libvirt-virtio-set-queue-sizes
21:37:08 <nic> I can't see this landing in Pike, seeing as it exposes functionality in libvirt that hasn't even merged yet, and therefore technically doesn't exist
21:37:13 <nic> But it's been a while since I've done any Nova work, and I could use some expert eyes on it to make sure I'm not crazy, and that my approach makes some sense
21:37:18 <nic> And I could use some help getting it prioritized somewhere, obviously
21:37:43 <mriedem> nic: i'll target the bp to queens so we can discuss it after the pike release
21:37:50 <nic> \o/
21:38:06 <cburgess> I will be in Denever and nic might be so we could discuss it there as well?
21:38:11 <cburgess> Denver even...
21:38:11 <mriedem> sure
21:38:27 <dansmith> I like how that was "de never"
21:38:31 <cburgess> In the meantime I think what nic was asking is.. are we even headed in the right direction or is it omg abort?
21:38:43 <mriedem> talk to sean-k-mooney
21:38:57 <mriedem> or vladikr maybe?
21:39:00 <jangutter> nic: I'm also interested. For the NFV case with low drops, you sometimes need larger queues.
21:39:32 <cburgess> jangutter Thats what its for...
21:39:37 <mriedem> i've added people to the review
21:39:52 <nic> Excellent
21:40:00 <cburgess> jangutter We hit it with NFV work loads under NFV. We carry a local set of patches for now to work around this but obviously want to get it into upstream.
21:40:09 <cburgess> NFV workloads under VPP even
21:40:15 <cburgess> I can't type today... or ever.
21:40:21 <jangutter> cburgess: Yep, and partly QEMU's also to blame :-(
21:40:30 <cburgess> jangutter yeah
21:40:50 <mriedem> alright, anything else?
21:41:15 <mriedem> i'll take that as a no
21:41:19 <mriedem> 1 week left
21:41:21 <mriedem> thanks everyone
21:41:23 <mriedem> #endmeeting
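[Editor's footnote on the updated_at discussion under subteam highlights above: the expected behavior is that any UPDATE to a row also sets updated_at. Nova gets this from an ORM-level on-update hook; the sketch below illustrates the same semantics with a plain sqlite3 trigger. The table, column names, and trigger are illustrative only, not nova's schema.]

```python
# Illustration of the ONUPDATE semantics discussed above: updating any
# column of a row should stamp updated_at. Here a sqlite trigger plays
# the role of the ORM's on-update hook (recursive triggers are off by
# default, so the trigger's own UPDATE does not re-fire it).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instances (
    id INTEGER PRIMARY KEY,
    task_state TEXT,
    updated_at TEXT
);
CREATE TRIGGER set_updated_at AFTER UPDATE ON instances
BEGIN
    UPDATE instances
       SET updated_at = CURRENT_TIMESTAMP
     WHERE id = NEW.id;
END;
""")
conn.execute("INSERT INTO instances (id, task_state) VALUES (1, NULL)")

before = conn.execute(
    "SELECT updated_at FROM instances WHERE id = 1").fetchone()[0]
conn.execute("UPDATE instances SET task_state = 'rebooting' WHERE id = 1")
after = conn.execute(
    "SELECT updated_at FROM instances WHERE id = 1").fetchone()[0]

print(before, after)  # before is None, after is a timestamp
```

If vm/task state really is being updated on the instance record while updated_at stays unset, something is bypassing this mechanism, which is why the behavior in the bug is surprising.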