14:00:10 #startmeeting nova
14:00:11 Meeting started Thu Aug 22 14:00:10 2019 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:15 The meeting name has been set to 'nova'
14:00:23 o/
14:00:24 o/
14:00:25 o/
14:00:29 o/
14:00:49 o/
14:00:51 o/
14:01:33 \o
14:02:18 o/
14:02:26 #link agenda https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
14:03:27 #topic Last meeting
14:03:27 #link Minutes from last (*2) meeting: http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-08-08-14.01.html
14:03:57 Last week it was just me & takashin, and we didn't have anything to discuss, so we didn't bother going through the agenda. Stats below will be for 2 weeks worth of bugs.
14:04:07 anything to bring up from 2 weeks ago?
14:05:15 i was younger and happier
14:05:28 full of vim and vigor
14:05:28 there there
14:05:51 #topic Release News
14:05:59 three weaks to feature freeze
14:06:02 heh
14:06:09 sean-k-mooney moment there
14:06:37 #link Train release schedule https://wiki.openstack.org/wiki/Nova/Train_Release_Schedule
14:07:00 As if you didn't know, #action everyone do lots of reviews
14:07:05 What are the chances of SEV landing before the freeze? It's currently 2nd in the runway queue
14:07:29 so I had been tempted to compose a list of blueprints I think are "close" and could just stand a quick final look to be pushed through
14:07:40 but thought that would potentially subvert the runways process
14:07:47 how do others feel?
14:08:39 I assume that if something is in the runway queue it is ready so you actualy has a list already
14:09:01 efried: or you want to filter that list down a bit?
14:09:03 right, but there are things that are "really close" that are waaay down in the queue
14:09:08 there are at least 3 api changes that are all conflicting for the next microversion,
14:09:22 so somehow serializing those would be nice, and i think they are all the same owner (brinzhang)
14:09:35 yonglihe has one very close
14:09:37 i think the unshelve + az one is going to be next given it's in a runway right now and it's had active review
14:09:42 oh right that's 4
14:10:02 @alex_xu, thank, i just thing how to attract force.
14:10:12 thing/ think
14:10:23 efried: really close aka forbidden aggregates?
14:10:43 yeah, that's one that was on top of brain
14:11:04 some stuff has been reviewed outside of the runway slot, like pmu, but i had looked for that one as well since i knew the spec was pretty simple
14:11:19 yeah, not being in a runway slot doesn't mean it can't be reviewed of course
14:11:26 I think we have several blueprints that are counting on that ^ :P
14:11:27 i think for the really close stuff,
14:11:30 people just need prodding
14:11:47 "ad hoc prod" vs "efried writes up something more formal"
14:11:50 so if for forbidden aggregates that means poking dansmith, i guess poke on
14:12:26 efried: writing something up / etherpad doesn't mean people will look w/o being poked
14:12:29 from experience,
14:12:41 it's good for project management either way so if it clears your head go nuts
14:12:44 true story.
14:12:50 i did it in the past for my own mental health
14:12:50 public shaming can be effective, though
14:13:46 😂
14:14:01 okay, I'll see whether "digital pillory" floats to the top of my to-do list this week.
14:14:18 #topic Bugs (stuck/critical)
14:14:18 No Critical bugs
14:14:18 #link 69 new untriaged bugs (+2 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:14:18 #link 3 untagged untriaged bugs (+2 since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:14:49 i haven't really done anything with this one https://bugs.launchpad.net/nova/+bug/1839800
14:14:50 Launchpad bug 1839800 in OpenStack Compute (nova) "instance need to trigger onupdate when nwinfo update" [Undecided,New]
14:14:52 since it's kind of opinion,
14:14:56 but i can see the logic
14:14:59 in case others care to weigh in
14:16:10 feels similar to this https://bugs.launchpad.net/nova/+bug/1704928
14:16:11 Launchpad bug 1704928 in OpenStack Compute (nova) "updated_at field is set on the instance only after it is scheduled" [Medium,In progress] - Assigned to Balazs Gibizer (balazs-gibizer)
14:16:39 gibi: that's a deep cut
14:17:04 I know, this is why it is not progressing
14:18:16 im not sure if the updated at field should be updated for this
14:18:23 you never replied to my nack :)
14:18:27 anyway,
14:18:33 let's talk on reviews and in bugs or -nova later
14:18:38 mriedem: I had no idea what to do with it :)
14:18:49 gibi: but you have the cape man!
14:19:13 * gibi needs to think about who to pass that cape
14:20:00 Totally.
14:20:00 moving on...
14:20:00 #topic Gate status
14:20:00 #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
14:20:00 #link 3rd party CI status (seems to be back in action) http://ciwatch.mmedvede.net/project?project=nova
14:20:17 seeing a lot of that innodb thing lately :(
14:20:39 also grenade things have seemed quite brittle for the past week or two.
14:20:40 yeah, that's one a single provider so i'm not sure what's up with that
14:20:58 grenade as in ssh fails?
14:21:00 on the old side?
14:21:05 if so, there is a devstack patch to stein for that
14:21:16 I'm not sure, I haven't dug into a lot of them
14:21:25 https://review.opendev.org/#/c/676760/
14:21:31 mriedem: the patch to enable memcache?
14:21:49 yes
14:21:51 After you mentioned a bug number a few days ago, I started trying to find that same issue in the subsequent failures, but either it was something different or I was looking in the wrong place.
14:21:54 so I basically gave up.
14:22:02 and just started blind rechecking.
14:22:30 mriedem: does that need a devstack core or a stable core?
14:22:40 devstack core
14:22:43 or is devstack-stable its own thing?
14:22:45 so gmann
14:22:47 no
14:22:51 okay
14:22:53 devstack is special, there is no stable team
14:23:04 I knew there was something unusual there
14:24:37 Could frickler approve it?
14:24:48 oh i'm sure, or ianw
14:24:52 i pinged gmann in -qa
14:24:57 or clarkb
14:25:00 or sdague!
14:25:13 I was about to say, let's resurrect that guy ^
14:26:01 As for the innodb thing, that's on limestone-regionone? Who owns that?
14:26:01 moving on?
14:26:12 you'd have to ask in -infra
14:26:14 i forget the name
14:26:48 okay, pinging in -infra
14:26:49 moving on.
14:27:00 #topic Reminders
14:27:01 any?
14:27:26 wear your seatbelts, kids
14:27:42 my car dings if i don't so i do
14:27:55 #action kids to wear seatbelts
14:27:55 #topic Stable branch status
14:27:55 #link stable/stein: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/stein
14:27:55 #link stable/rocky: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/rocky
14:27:55 #link stable/queens: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/queens
14:27:56 big gubment safety standards
14:28:19 mriedem: stable nooz?
14:28:21 stable reviews are piling up again,
14:28:24 * alex_xu can keep the body float on the seat
14:28:43 we have a regression on stable that needs reviews, sec
14:29:05 https://review.opendev.org/#/q/topic:bug/1839560+(status:open+OR+status:merged)
14:29:11 lyarwood must be on vacation?
14:29:20 he is back
14:29:26 maybe dansmith can hit those stein ones
14:29:41 my dance card is getting pretty full today
14:29:49 but remind me
14:30:20 mriedem: did TheJulia comment on if the backport fixed there issue
14:30:23 https://review.opendev.org/#/q/topic:bug/1839560+branch:stable/stein
14:30:27 sean-k-mooney: haven't heard
14:30:42 mriedem: zuul won't let us trigger the job, just says merge error
14:30:49 so... *shrug*
14:31:01 TheJulia: link me the ironic patch in -nova
14:31:48 anything else stable?
14:31:54 no
14:32:00 #topic Sub/related team Highlights
14:32:00 Placement (cdent)
14:32:00 #link latest pupdate http://lists.openstack.org/pipermail/openstack-discuss/2019-August/008537.html
14:32:06 cdent is on vacation this week
14:32:16 summary:
14:32:16 Consumer Types: "nice to have" but not critical for Train.
14:32:16 same_subtree discoveries - docs needed
14:32:16 osc-placement needs attention
14:32:38 tetsuro and i have been reviewing mel's osc-placement series
14:32:42 for the aggregate inventory thing
14:32:43 should land soon
14:32:46 ++
14:33:16 API (gmann)
14:33:16 This week updates: http://lists.openstack.org/pipermail/openstack-discuss/2019-August/008669.html
14:34:18 comments, questions, concerns?
14:34:47 #topic Stuck Reviews
14:34:47 any?
14:35:33 #topic Review status page
14:35:33 #link http://status.openstack.org/reviews/#nova
14:35:33 Count: 459 (+2); Top score: 1415 (+42)
14:35:33 #help Pick a patch near the top, shepherd it to closure
14:35:44 #topic Open discussion
14:35:55 one item on the agenda
14:35:57 #link generic resource management for VPMEM, VGPU, and beyond http://lists.openstack.org/pipermail/openstack-discuss/2019-August/008625.html
14:36:06 alex_xu: care to take the mic?
14:36:23 i'll say i know i haven't replied yet, still need to digest that
14:36:37 and I'm recusing myself
14:36:53 yea, I summary the xml way and db way's pros/cons for the resource claim
14:37:10 most of my comment in the mail about future use. but i like the direction
14:37:53 here is the new plan https://etherpad.openstack.org/p/vpmems-non-virt-driver-specific-new
14:38:06 TL;DR:
14:38:06 - generic `resources` object stored on the Instance and (old/new) on MigrationContext
14:38:06 - virt driver update_provider_tree responsible for populating it if/as necessary
14:38:06 - RT uses it (without introspecting the hyp-specific bits) to claim individual resources on the platform.
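
[Editor's note: a minimal Python sketch of the TL;DR above, using plain dataclasses to stand in for what would really be oslo versioned objects (hence the "json blob ... under version control" point later in the discussion). Every name here is invented for illustration; the authoritative design is the etherpad and ML thread linked above.]

```python
# Hypothetical sketch only -- not nova's actual object model.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """One claimable unit of a platform resource (e.g. a vpmem namespace)."""
    provider_uuid: str    # placement resource provider the unit is counted under
    resource_class: str   # e.g. "CUSTOM_PMEM_NAMESPACE_4GB"
    identifier: str       # hypervisor-specific handle, opaque to the RT
    # Hypervisor-specific details: stored and round-tripped with the
    # instance, but never introspected outside the virt driver.
    metadata: Optional[dict] = None


@dataclass
class InstanceResourcesSketch:
    """The generic `resources` list stored on the Instance."""
    resources: List[Resource] = field(default_factory=list)


@dataclass
class MigrationContextSketch:
    """Old/new resource lists mirrored across a move operation."""
    old_resources: List[Resource] = field(default_factory=list)
    new_resources: List[Resource] = field(default_factory=list)
```

[The point of the split is that everything inside `metadata` stays private to the virt driver (which populates these units from update_provider_tree), giving the "clean divide at the virt driver boundary" discussed below.]
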
14:38:15 efried: thanks
14:38:43 sounds like ERT
14:38:50 amiright folks
14:38:53 yup
14:39:05 I don't know what that is
14:39:11 the previous dumpster fire
14:39:22 also like pcimanager, but without host side persistent
14:39:22 json blob for out of tree hp public cloud things pre-placement
14:39:27 that killed about 3 years of effort
14:39:38 and squashed many souls
14:39:49 phil day hasn't been seen since
14:40:01 ERT (extensible resouce tracker)
14:40:23 oh, at least, we aren't going to make any extension
14:40:26 https://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/extensible-resource-tracking.html
14:40:59 alex_xu: right your proposal does not leave the RT/virt driver
14:41:41 i thought we've wanted to move away from the pci manager for placement but in the case of vgpu and vpmem we're just using placement inventory to tell us how many of a device we have on a host, but not which devices are allocated, which is what pci manager does for pci devices (inventory and allocation tracking)
14:42:05 right, this would (eventually) allow us to do away with the PCI manager
14:42:24 effectively, move the "select which device" logic into the virt driver and get rid of the rest
14:42:43 scratch that, "select which device" still in RT.
14:42:50 so yeah, get rid of PCI manager.
14:43:06 yes. but there are other gaps to fill first before we can
14:43:26 i know i mentioned to alex_xu that i thought it could be possible to get the vpmem allocation info from the hypervisor and only use persistence with the migration context during a move operation (resize) to ease that issue with same host resize, but i still need to read through the ML thread
14:43:29 e.g. moding pci device in placment. and passing allocation candiates to the weighers
14:43:40 this is a step on that path
14:44:24 sean-k-mooney: yes. It provides a clean divide at the virt driver boundary, which IMO is the biggest architectural hurdle we were going to need to overcome.
14:44:29 anyway, not going to solve this here
14:44:45 yea, I think the goal is total different
14:44:45 ERT has plugin for RT, and plugin for host manager, that is extension for the whole nova-scheduler
14:44:45 that isn't what we want
14:44:46 we want something to manage the resource assignment of compute node. since placement only know how many resource we have, but don't know which resource can be assigned
14:45:00 oops, my network slow, just bump a lot of messages...
14:45:03 Fair enough. Sounds like dansmith is abstaining, so I think basically we're asking for mriedem's buy-in on this direction.
14:45:30 alex_xu: to be clear, my ERT comment was mostly about storing generic json blobs for resource tracking purposes
14:45:38 not that anyone is proposing plugins
14:46:13 what if i defer to dansmith? then we hit an infinite loop?
14:46:16 mriedem: that json-blob is version object dump, so under the version control, and only read/write by virt driver.
14:46:28 heh
14:46:43 :)
14:46:47 alex_xu: for now, until someone wants to add a weigher
14:46:50 but i digress
14:47:08 we can continue on ml i think
14:47:28 alex_xu: Did you say code was forthcoming soon?
14:47:59 but i think plamcnet aware (allocation candiate aware) weigher would mitigate the need to look at the json blob
14:48:09 yes, luyao already verfied the code, and she is working on unittest, and refine the code. I think we can bring the code up in two or three days
14:48:32 mriedem: would that help ^ or is it really just a matter of carving out time to read the ML?
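
[Editor's note: continuing the sketch above — a hedged illustration of the "select which device" step staying in the RT while placement only tracks how many units exist, which is the gap pci manager fills today for PCI devices. Function and variable names are invented; this is not nova's actual ResourceTracker code. It reuses the hypothetical `Resource` class from the previous sketch.]

```python
# Hypothetical sketch only -- not nova's actual ResourceTracker code.
from typing import Dict, List


def claim_resources(
        needed: Dict[str, int],           # resource_class -> count, from the
                                          # placement allocation
        driver_reported: List["Resource"],  # all units on the host, reported
                                            # by the virt driver
        already_assigned: List["Resource"],  # units held by other instances
) -> List["Resource"]:
    """Pick concrete units to satisfy a claim, treating each one opaquely.

    Placement guarantees the *counts* fit; this step decides *which*
    specific units the instance gets, without looking inside the
    hypervisor-specific metadata.
    """
    free = [r for r in driver_reported if r not in already_assigned]
    claimed = []
    for rc, count in needed.items():
        candidates = [r for r in free if r.resource_class == rc]
        if len(candidates) < count:
            # Shouldn't happen if placement inventory matches reality,
            # but the claim must be able to fail safely.
            raise RuntimeError("not enough %s units free on host" % rc)
        picked = candidates[:count]
        claimed.extend(picked)
        for r in picked:
            free.remove(r)
    return claimed
```

[On a move operation, the claimed units would land in the new-side list of the MigrationContext sketch above, which is presumably how the same-host-resize wrinkle mriedem mentions would be kept double-accounted during the resize.]
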
14:48:50 most of code she already have, since our initial proposal is about the db
14:48:53 reading wip code isn't going to help me, i'll procrastinate more on that
14:49:00 ack
14:49:04 i just need to read the ML
14:49:11 Okay, /me adds to pester list
14:49:13 i also know that CI will never happen for any of this
14:49:33 so i'm on the fence about just saying, "do whatever you want, it won't be tested anyway"
14:49:50 You mean CI jobs with real vpmems behind them?
14:50:00 i.e. if anyone ever uses it and it doesn't work, i guess we hope they report it and someone fixes it
14:50:01 efried: yes
14:50:10 we promised to have ci for vpmem, and rui is working on it
14:50:11 even a periodic job
14:50:24 and good progress
14:50:53 that's good to hear
14:50:57 okay, let's move on.
14:51:00 any other open topics?
14:51:15 there is an open question about a regression in stein for ironic rebalance,
14:51:17 and lost allocations,
14:51:32 so i need to find someone to test and verify that, but it's probably better to just ask on the ML
14:51:38 dtantsur might be able to do it
14:52:08 so action for me to ask about testing that on the ML
14:52:27 no idea if ironic has a CI job that does a rebalance with an allocated node,
14:52:30 if so we could just check there
14:54:45 Okay.
14:54:45 Thanks all.
14:54:45 o/
14:54:45 #endmeeting