14:00:07 #startmeeting nova
14:00:08 Meeting started Thu Jun 13 14:00:07 2019 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:11 The meeting name has been set to 'nova'
14:00:19 o/
14:00:20 o/
14:00:22 o/
14:00:23 o/
14:00:24 o/
14:00:29 ~o~
14:01:22 _o
14:02:40 \o
14:02:57 \/o
14:03:04 #link agenda (updated seconds ago): https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
14:03:23 #topic Last meeting
14:03:23 #link Minutes (such as they are) from last meeting: http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-06-21.00.html
14:03:35 we skipped last meeting due to low attendance and interest
14:04:02 #topic Release News
14:04:02 #link nova 19.0.1 (stein) released https://review.opendev.org/661376
14:04:02 #link m1 release for os-vif (merged) https://review.opendev.org/#/c/663642/
14:04:40 #topic Bugs (stuck/critical)
14:04:40 No Critical bugs
14:04:40 #link 89 new untriaged bugs (up 7! since the last meeting (2w ago)): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:04:40 #link 12 untagged untriaged bugs (up 2 since the last meeting (2w ago)): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:04:57 stop me any time if you have something under one of these topics
14:05:01 lots of bugs going up
14:05:04 few people doing triage
14:05:13 we used to average in the low 70s for open bugs
14:05:23 #link bug triage how-to: https://wiki.openstack.org/wiki/Nova/BugTriage#Tags
14:05:23 #help need help with bug triage
14:05:39 should we have a bug smash day?
14:05:52 idk
14:05:59 this isn't even reviewing patches, it's triaging reported stuff
14:06:00 I can raise the point internally FWIW
14:06:12 yeah, so like a "bug triage day"
14:06:17 thanks bauzas
14:06:29 efried: send an email and I'll forward
14:06:57 idk, if we just had more cores triaging 1-2 bugs per day / week we'd keep it under control
14:07:25 tell me that
14:07:26 #action efried to (delegate? or) send email complaining about climbing bug numbers
14:07:40 emphasize fixing the shit we've already broken before putting in new broken shit
14:07:49 mriedem, is there like a checklist or something for triaging bugs?
14:07:52 * gibi feels the pressure
14:07:57 #link bug triage how-to: https://wiki.openstack.org/wiki/Nova/BugTriage#Tags
14:08:00 clearly this is all gibi's fault :P
14:08:02 As in, what status to set when, affected releases, etc
14:08:07 Thanks efried
14:08:20 tia for helping, artom :)
14:08:43 and if you see holes in ^, please let us know, or edit
14:08:48 efried, I'll try, but no promises :( I'll be blunt, unless we internally prioritize that stuff, it's not very exciting work
14:08:56 no question
14:09:21 I've made half-hearted attempts at having "upstream" days in my week
14:09:38 Starting those with a bug triage or two might be a good beginning
14:09:43 artom: that's my point, I'll try to make it a priority
14:09:53 but let's not discuss it here
14:10:09 Yeah, we all know it's a problem, and the obvious solution
14:10:21 Thanks for reminding us, mriedem
14:10:28 Gate status
14:10:28 #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
14:10:53 had a doozy a couple days ago when the gate nodes upgraded their py27 with a breaking backport in the tokenizer
14:11:25 stephenfin came to the rescue: https://review.opendev.org/664940
14:11:27 we likely need to backport that skip patch to stein http://status.openstack.org/elastic-recheck/#1804062
14:11:37 or that
14:11:41 we can backport the real fix ... yeah
14:11:43 stein is using bionic nodes as well
14:11:57 I thought someone already proposed the backport of the skip patch
14:12:30 lyarwood did https://review.opendev.org/#/c/664841/
14:12:31 ok
14:12:41 but yes, we should backport the fix.
14:12:53 I can do that now
14:13:27 gross, merge conflict. Thanks stephenfin
14:13:48 3rd party CI
14:13:48 #link 3rd party CI status http://ciwatch.mmedvede.net/project?project=nova&time=7+days
14:13:54 gotta cherry pick it on top of the skip change
14:14:01 mm
14:14:46 I've noticed a resurgence in activity in PowerVM (still pay a little bit of attention over there) - they're on the path to getting their CI green again.
14:15:05 The new people are themselves very green, in a different way. But coming along.
14:15:15 #topic Reminders
14:15:17 any?
14:15:58 #topic Stable branch status
14:15:58 #link Stein regressions: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005637.html
14:15:58 bug 1824435 still pending
14:15:59 #link stable/stein: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/stein
14:15:59 stein release 19.0.1 happened
14:15:59 #link stable/rocky: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/rocky
14:15:59 #link stable/queens: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/queens
14:16:00 bug 1824435 in OpenStack Compute (nova) stein "fill_virtual_interface_list migration fails on second attempt" [Medium,Triaged] https://launchpad.net/bugs/1824435
14:16:26 i'm working on flushing rocky so we can do a release
14:16:26 I heard rumblings that mriedem wants to get a rocky release going soon
14:16:45 you can probably remove that stein regressions line from the agenda now
14:17:06 just read scrollback, i was planning to do an os-vif release today/tomorrow so its unfortunate that a release was just done without the ovs-dpdk feature :( but anyway ill propose the new release when it does.
14:17:34 that doesn't apply to stable right?
14:18:08 the dpdk feature does not but there is a stable stein backport i want to land
14:18:25 to fix tox/the mock_open bug
14:18:57 a stable branch release for a unit test fix is likely not super important
14:19:30 the dpdk feature on master would get released at t-2 unless we do an early release for some reason (if something depends on it)
14:19:31 ya we can decide after the backport
14:19:53 moving on?
14:19:54 mriedem: we need the release to merge the nova use of it
14:20:02 efried: yes we can move on
14:20:10 #topic Sub/related team Highlights
14:20:11 Placement (cdent)
14:20:11 #link latest pupdate http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006994.html
14:20:11 #link Spec for can_split is dead/deferred unless someone speaks up https://review.opendev.org/#/c/658510/
14:20:11 #link Spec for the rest of nested magic is undergoing deep discussion on one last (?) edge case https://review.opendev.org/#/c/662191/
14:20:11 #link spec for rg/rp mappings is merged https://docs.openstack.org/placement/latest/specs/train/approved/placement-resource-provider-request-group-mapping-in-allocation-candidates.html
14:20:11 #link WIP for rg/rp mappings https://review.opendev.org/#/c/662245/
14:20:44 We might be getting close on the nested magic 1 discussion, see review
14:21:13 melwitt expressed concern yesterday about how such discussions impact the timeline of the work, so this is worth mentioning to the nova team:
14:21:33 We (placement team) are not strictly observing merge-spec-before-code rules.
14:21:49 We have already cut a new microversion with one of the features in the spec.
14:21:59 and can continue to do so with the components that are not contentious.
14:22:34 and the one that is contentious right now is pretty corner-y and not in the critical path for things like NUMA affinity
14:22:56 any questions etc about placement-as-relates-to-nova?
14:22:57 that being can_split
14:22:59 no
14:23:03 can_split is dead
14:23:10 RIP
14:23:18 right i thought that was the contentious one
14:23:21 FWIW, I tried to get our PMs to react - nothing came of it
14:23:26 Maybe it's because they're both French
14:23:27 we found a new one
14:23:31 As in, holidays
14:23:42 ah ok ill try to catch up outside the meeting
14:23:52 Ah. I did some internal poking as well, and also asked the starlingx ML. Radio silence.
14:24:02 (on can_split ^ )
14:24:06 as far as i know nothing *active* on the nova side is being blocked by the placement nested magic stuff
14:24:10 yeah, sean-k-mooney lmk if you want me to tldr it for you.
14:24:22 (after the meeting)
14:24:50 mriedem: well that depends, there are a few numa features like vgpu numa affinity
14:24:50 mriedem: The one melwitt brought up yesterday was effectively: "Do we need the VGPU affinity spec at all"
14:25:00 efried, sean-k-mooney, so can_split's death means nova will need to uncouple NUMA topologies from CPU pinning and stuff? Can discuss after the meeting
14:25:01 dammit, I'm just a half second late all day today
14:25:02 sean-k-mooney: that spec isn't dependent on placement
14:25:02 that we are proposing workarounds for to not need to depend on it
14:25:19 this ^
14:25:34 it isnt but we went out of our way to make that be the case
14:25:57 mriedem: correct, the spec does not depend on placement, that's the point. It's a workaround assuming we don't have placement support. I would prefer to get rid of it and use the placement support.
14:26:10 so let me ask,
14:26:36 of the people that care about that nova spec (bauzas, sean-k-mooney, melwitt, et al) - how many are reviewing the placement nested magic spec?
14:26:46 efried: right but we dont want to block that feature or others again waiting for that support
14:26:47 melwitt is
14:26:51 i am too
14:27:16 i think bauzas has skimmed it and is aware of it
14:28:01 melwitt also asked who was on the impl and if we wanted help. There's code proposed for some of it, in various stages of WIP.
14:28:51 #link rp/rg mappings microversion series (of 2 patches) starting at https://review.opendev.org/#/c/662785/
14:28:56 ^ has one +2 already
14:29:36 am i the only one gerrit is broken for currently by the way
14:30:01 #link WIP resourceless request groups https://review.opendev.org/657510
14:30:02 #link also WIP resourceless request groups https://review.opendev.org/663009
14:30:07 sean-k-mooney: working for me
14:30:15 :( ok
14:30:44 Much of the feature work isn't especially hard. It's the requirements/design that has been hard.
14:30:46 so
14:30:53 volunteers would be welcomed.
14:31:07 but probably not critical for closure.
14:31:13 moving on?
14:31:58 API (gmann)
14:31:58 Updates on ML- #link http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006958.html
14:32:08 ^ from week before last, I don't see one this week.
14:32:14 gmann: anything to add?
14:33:11 #topic Stuck Reviews
14:33:13 any?
14:34:03 #topic Review status page
14:34:03 http://status.openstack.org/reviews/#nova
14:34:03 #help Pick a patch near the top, shepherd it to closure
14:34:38 that reminds me,
14:34:47 Presumably only cores can do this? For instance, https://review.opendev.org/#/c/522106 has had no response to a -1 in 1.5 years, can it be force-abandoned?
14:34:48 we can probably drop the remove-cells-v1 series from runways now can't we?
14:35:07 artom: i've looked at that one,
14:35:10 mriedem: If someone can hit this one patch then yes
14:35:14 and rather than abandon i'd like someone to just take it over
14:35:36 but yes artom, cores can force-abandon in general
14:35:37 https://review.opendev.org/#/c/660950/
14:35:50 actually, it's all cleanup. Yeah, remove it
14:35:51 artom: most of the review comments in that change are nits from what i can tell
14:36:06 stephenfin: will do
14:36:10 mriedem, I mean, since I brought it up I guess I have to now?
14:36:11 (remove from runways)
14:36:21 artom: no, you can just be a bad person
14:36:23 it's fine
14:36:29 it's the world we live in now :)
14:36:38 There's a place in hell with my name on it
14:37:29 we've discussed doing a force-abandon-anything-older-than-X. So far we haven't done it. I guess I prefer the mindful looks we've been doing, despite that being much slower.
14:37:56 same - a lot of the time there are just nits or a missing test or something on an otherwise obvious/reasonable fix
14:38:03 efried, in that case we might want to maintain a list "plz someone else own this" somewhere
14:38:18 Anyway, I think I'm going to start tracking this like our bug queue, listing the total (+/- from last time) and the highest score on the list. Not that I expect any action to come of it, but it'll be fun to see the numbers go down...
14:38:30 artom: That's a good idea. Someone should make an etherpad...
14:38:42 Oh god I signed up for another thing didn't I?
14:38:50 yep
14:38:55 * stephenfin politely disagrees
14:39:03 ^ with what stephenfin?
14:39:12 The "plz someone else own this" list 14:39:28 if we think its still relevent then i think it makes sense 14:39:29 We all have enough work to do. If it's a significant issue, someone will stumble upon it again in the future 14:39:50 if its just old stuff that no longer relevent we can abandon 14:39:52 stephenfin, I don't see the harm in having a list - no one has to look at it 14:40:10 I assumed the list was more for folks outside the usual folks who want to join in? maybe I am too optimistic? 14:40:25 Fair. I just didn't want you wasting your time :) 14:40:27 Counter-argument is that people (a.k.a. unicorns) with free time might be able to use the list to pick up stuff 14:40:44 stephenfin, I can't really populate it - except that one patch I mentioned 14:40:44 If others think it'll be useful, go for it. Just noting _I_ won't personally use it 14:40:49 But I can create it 14:40:51 yup. Add near "low-hanging fruit" link on our wiki page 14:40:54 Which is, like, 5 seconds 14:41:07 What's the bot thing for action item? 14:41:15 artom: #action 14:41:22 I think... 14:41:25 yup 14:41:32 like other things (e.g. themes, cf ptg discussion) it's useful for people who find it useful; others can ignore, no harm no foul 14:41:47 #action artom to create etherpad for "plz someone else own this" and link to low-hanging fruit section of wiki 14:41:59 artom: i think efried has to do it 14:42:00 s/to/from/ 14:42:00 thanks artom 14:42:03 eh? 14:42:13 meeting chairs add actions 14:42:16 oh, the #action thing. 14:42:16 i didnt think the bot listened to everyone 14:42:17 okay. 14:42:31 * artom storms off in a huff and pouts in a corner 14:42:33 #action artom to create etherpad for "plz someone else own this" and link from near low-hanging fruit section of wiki 14:42:42 we'll see if it shows up twice 14:42:49 #topic Open discussion 14:42:49 did we ever do anything with that review priorities in Gerrit thing 14:42:50 stephenfin: ironically the patch used as an example here is for docs 14:42:54 which you're a big fan of 14:42:58 and pushed fora release theme 14:42:58 oh 14:43:06 well then artom should just ask me :) 14:43:11 stephenfin: no, nothing with review priorities in gerrit 14:43:21 stephenfin, or. OR! you should just listen ;) 14:43:34 I think we concluded (at least for now) that it's too heavyweight and we don't have a crisp view of what it should look like anyway. 14:43:40 We have three open discussion topics on the agenda 14:43:42 if we still want to do it its fairly simple to update the project-config repo to add teh lables 14:43:45 and ~15 minutes left 14:43:48 artom: Sorry, I just saw something fluffy and am going chasing it 14:43:52 where were we? 14:43:53 (mriedem): Any interest in adding a rally job to the experimental queue? http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006769.html 14:44:09 i didn't add that to the agenda so i'm not speaking to it :) 14:44:15 Didn't we talk last week about dwindling gate resources? 14:44:36 experimental queue is on-demand 14:44:38 experimenal is manually triggered but it also has a lot of stuff in it 14:44:44 * artom stands corrected 14:44:45 oh. Who added the agenda item then? 
14:44:55 if nobody speaks up Ima skip it
14:45:02 efried: i should say i don't remember adding it
14:45:08 maybe i did in a blackout, idk
14:45:18 personally if we add it i would probably make it a periodic job
14:45:29 the interesting part from that thread is neutron has a rally job with visualizations for the time spent in a given flow
14:45:30 but it sounds like we dont really see the need
14:45:34 at least for now
14:45:47 Okay, let's move on. Someone re-add it next week if there's interest and further discussion needed.
14:45:47 (shilpasd) Forbidden Aggregate Filters: automatic (un)traiting of host providers (http://lists.openstack.org/pipermail/openstack-discuss/2019-June/thread.html#6950)
14:45:53 i for one assume we have lots of poorly performing code in nova but nobody profiles so nobody knows
14:46:04 mriedem, we run performance/scale tests internally
14:46:07 So there's that
14:46:17 artom: like 12 months after we release upstream yeah?
14:46:46 mriedem, you're being optimistic
14:46:47 :) yes and i dont know where the data actually goes from those tests either
14:47:01 sean-k-mooney, I've seen a couple of reports on rhos-dev ML
14:47:04 right - i don't hear about that scale testing resulting in fixes coming back upstream
14:47:16 CERN is usually the one saying scaling sucks
14:47:29 mriedem, to be honest a lot of it is finding misconfigurations :/
14:47:47 I think last time we discovered we were using the wrong hash algorithms in keystone tokens or something
14:47:47 so back to the forbidden aggregate filters topic
14:47:54 (shilpasd) Forbidden Aggregate Filters: automatic (un)traiting of host providers (http://lists.openstack.org/pipermail/openstack-discuss/2019-June/thread.html#6950)
14:47:54 tldr the "strict isolation using forbidden aggregates" work in nova has a question on the ML, linked above, about automatic traiting (and especially UNtraiting) of providers to enable this feature.
14:48:03 Just saying, it's *something*. Not the best, but something
14:48:47 efried: were we settling on just extending the osc-placement plugin to add a command to add/remove traits
14:48:48 What we *could* do is, when you add "required traits" metadata to an aggregate, or when you add a host to such an aggregate, we *could* automatically add those traits to the RPs. So it would be one step to set up your "strict isolation".
14:48:59 sean-k-mooney: the osc plugin already does that
14:49:00 for the root RPs in the aggregate?
14:49:12 oh traits per aggregate...
14:49:23 i can't even remember now what that spec said
14:49:25 But then what do you do when you remove a host from such an aggregate, or remove a trait from such an aggregate? Do you remove it from the hosts?
14:49:45 The spec said automatically add traits. But it doesn't mention what happens on removal type ops.
14:49:51 which is where the thread comes in.
14:50:00 Removing traits is really hairy
14:50:06 knowing how/when/which to remove
14:50:10 so IMO we shouldn't do it.
14:50:13 the host could be in multiple aggregates yes?
14:50:17 But then there's asymmetry between the + and -
14:50:21 mriedem: yes
14:50:23 yes, among other tygers
14:50:44 as in pan tang tygers?
14:50:58 so what at least sean-k-mooney and I have agreed on in the thread is: don't muck with RP traits at all, on the + or - side. Make the admin do that part manually.
14:51:10 i described how to do it in ~ nlogn time but i also liked efried's suggestion of just doing it via the command line
14:52:05 fwiw there is a reason that the sync_aggregates CLI we have doesn't deal with removals
14:52:24 ok
14:52:39 it *could* but requires a lot of thought
14:52:49 ya its possible to do semi-efficiently via the info in the nova database but its not trivial to write correctly
14:53:14 o/
14:53:15 so can we agree with this path forward: aggregate ops won't muck with RP traits
14:53:18 which is why i was leaning toward not doing it automatically
14:53:56 aggregate ops as in when you add a host to an aggregate *in nova*?
14:53:59 efried: im +1 on that. if someone writes the code later to do it automatically and proves it works right i would not be against it either
14:54:15 mriedem: yes
14:54:20 or when you add a required trait to an aggregate
14:54:30 can aggregates have traits?
14:54:39 nova host aggregates can
14:54:40 mriedem: as part of the "forbidden aggs isolation" spec, yes.
14:54:43 placement ones no
14:54:57 oh i see, the orchestration funbag we have to sync that all up
14:55:09 guess i should read that spec again
14:55:26 yes. The proposal is: syncing to host RPs should be done manually, not automatically via agg ops
14:55:27 actually i just thought of another way to do this but ill bring that up on #openstack-nova afterwards
14:55:42 Okay. Five minutes, let's hit the last topic please.
14:55:47 Ironic "resize" http://lists.openstack.org/pipermail/openstack-discuss/2019-June/thread.html#6809
14:55:51 mgoddard: thanks for being here
14:55:58 efried: np
14:56:33 The proposal in the thread is basically: Implement a limited 'resize' for ironic. It's always "same host" (really same-node) and it's very limited in what you can change
14:56:38 basically the "deploy template"
14:57:02 mgoddard: did I get that more or less right?
14:57:26 why not pick up and polish off the old fujitsu spec for actually supporting resize for baremetal instances?
14:57:35 mriedem: link?
14:57:38 which i think was pretty close last i reviewed
14:57:47 I think so. Resize is like rebuild but keeps changes to the server, correct?
14:57:57 no
14:57:59 mriedem: I think that was about redundancy, failing instances over to hardware with shared storage.
14:58:08 resize changes the flavor, rebuild changes the image
14:58:23 https://review.opendev.org/#/c/449155/
14:58:37 so resize keeps your data and may or may not schedule you to another host in the vm case
14:58:49 rebuild is always on the same host in the vm case
14:59:00 I see
14:59:03 i guess rebuild would be the same for ironic
14:59:11 yes
14:59:16 e.g. same host, just reprovisions the root disk
14:59:22 so the overlap here is: we want to change flavors, keep "data", but stay same host
14:59:28 (same node)
14:59:29 ironic driver in nova already supports rebuild, but we're diverging
14:59:59 i'd say spec it up
15:00:12 in principle for BFV ironic nodes there is no reason not to support resize in general
15:00:18 we can obviously restrict a resize to the same compute service host and node via request spec
15:00:18 possibly to a different host
15:00:19 is this logic handled within the virt driver, or will we need logic elsewhere to handle an ironic resize?
15:00:25 Okay. We're close to spec-ish in the ML, but getting something formal in the specs repo...
15:00:32 we're at time. Can we continue in -nova?
15:00:44 Thanks all o/
15:00:45 #endmeeting