14:00:13 #startmeeting nova
14:00:14 Meeting started Thu May 30 14:00:13 2019 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:17 The meeting name has been set to 'nova'
14:00:24 o/
14:00:28 o/
14:00:33 o/
14:00:35 \o
14:00:37 o/
14:00:43 o/
14:01:06 o/
14:01:33 #link agenda https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
14:01:35 o/
14:01:42 o/
14:02:26 #topic Last meeting
14:02:26 #link Minutes from last meeting: http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-05-23-21.00.html
14:03:00 I think the only update from here is to mention that https://blueprints.launchpad.net/nova/+spec/remove-consoleauth got the necessary federal background check and has now been approved.
14:03:14 stephenfin: just noticed this isn't tagged against a release - assume you're planning to do this in Train?
14:03:22 (y)
14:03:30 k, fixed
14:03:35 anything else from last meeting?
14:04:26 #topic Release News
14:04:55 I'm thinking it might be a good idea to do a spec scrub soon
14:05:06 Ooh, yes please
14:05:10 * johnthetubaguy nods
14:05:13 ya
14:05:23 How does next Tuesday grab y'all?
14:05:39 wfm
14:05:53 mostly wfm
14:06:10 +1
14:06:16 #action efried to post on ML re spec review sprint Tuesday June 4th
14:06:23 anything else release newsy?
14:06:51 #topic Bugs (stuck/critical)
14:06:51 No Critical bugs
14:06:51 #link 82 new untriaged bugs (down 3 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:06:51 #link 10 untagged untriaged bugs (down 1 since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:07:10 any bugs in the spotlight?
14:07:24 Gate status
14:07:24 #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
14:07:24 3rd party CI
14:07:24 #link 3rd party CI status http://ciwatch.mmedvede.net/project?project=nova&time=7+days
14:07:38 zuul has seemed ungodly slow lately
14:07:40 were we going to put up patches to log a warning on start of drivers that don't have 3rd party ci?
14:07:51 at least xenapi,
14:07:55 sounds like powervm is working on fixing their ci
14:08:21 I saw them put up a fix this morning, yah. Haven't caught up on the ML yet today.
14:08:23 gee, guess what, vmware's ci was dead again and the keepers took days to notice :(
14:08:47 (but it's been kicked now)
14:08:50 mriedem: I remember we agreed to do that, yes. Who's volunteering to put up that patch?
14:08:57 i suppose i can
14:09:05 give me an action item
14:09:20 #action mriedem to propose patch logging warning that xenapi is not supported because no 3pCI
14:09:32 thanks mriedem
14:09:55 warning or deprecating outright?
14:10:10 I guess where's the line between no ci and half-maintained ci, but i'll leave that for the review
14:10:32 stephenfin: i was going to start with a warning
14:10:37 I spoke to bob ball about that, some neutron "bug" is blocking things working
14:10:52 I imagine that depends on whether there's anyone in the wild still using it. Might be worth bringing up on openstack-discuss
14:10:56 in any case we should be sure to add all the xen folks we know about to that review.
14:10:57 Ah, okay. That's fair
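For context on the #action above, here is a minimal sketch of what a startup warning for a driver without third-party CI might look like, assuming nova's usual oslo_log pattern; the class name, constructor signature, and wording are illustrative assumptions, not the actual patch mriedem volunteered to propose.

```python
# Illustrative sketch only -- not the actual patch discussed above.
# Assumes the usual oslo_log setup in nova; the driver class and its
# constructor signature here are hypothetical stand-ins.
from oslo_log import log as logging

LOG = logging.getLogger(__name__)


class XenAPIDriver(object):
    """Hypothetical stand-in for the real xenapi virt driver class."""

    def __init__(self, virtapi):
        super(XenAPIDriver, self).__init__()
        LOG.warning(
            'The xenapi driver is not tested by the OpenStack project '
            'because it has no running third-party CI; it should be '
            'considered unsupported and may be deprecated or removed '
            'in a future release.')
```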
14:10:59 but, yeah, +1 the log message warning about removal
14:11:00 artom: i don't know what you're asking, the xenserver ci has been dead for months
14:11:31 yeah, it's a dead ci at this point, we just merge the log message ASAP
14:11:45 if it gets removed as the CI becomes stable again, then whoop whoop, go for it
14:11:56 ya
14:12:08 cool, moving on.
14:12:13 mriedem, just saying that some 3pcis are more reactive than others
14:12:31 yeah, we're not deprecating every driver that goes red for two weeks
14:12:40 ....
14:12:40 Anyone know anything about zuul slowitude?
14:12:44 let's move on, I'll nitpick the review ;)
14:12:53 efried: you'd have to ask infra, but i'm guessing fewer nodes
14:13:00 http://grafana.openstack.org/d/T6vSHcSik/zuul-status?
14:13:29 http://grafana.openstack.org/d/T6vSHcSik/zuul-status?orgId=1&fullscreen&panelId=20
14:14:19 those are very pretty
14:14:31 'fraid I don't really know what I'm looking at though
14:15:11 I guess what I really want is for somebody to say, "Yes, it's a known issue, and we're working on it, and it'll all be fixed very very soon, so your jobs will wait in queue for five minutes tops."
14:15:25 clarkb: fungi: ^ if there are known infra issues can we get a summary in the ML?
14:15:43 i.e. fewer nodes, lots of gate failures, etc
14:15:48 efried: I think to get that you're going to need to have your employer pony up a cloud :(
14:16:00 http://status.openstack.org/elastic-recheck/data/integrated_gate.html is pretty good as far as classification rates
14:16:01 I've brought it up
14:16:11 me too. no luck.
14:16:25 the only known issue I know of is 1 of 12 zuul executors was misconfigured on its mirror node for about 12 hours
14:16:27 yeah, I don't feel like the spurious failure rate is overly high. It's just taking forever to get a build through.
14:16:32 10 hours i guess
14:16:56 clarkb: So is it mainly just a matter of too many jobs, not enough hardware?
14:17:07 otherwise if you want more throughput we need to run fewer jobs, or make jobs faster, or add resources
14:17:13 if we're for real losing gate nodes, do we need to start a conversation about removing some tests?
14:17:47 Sounds like a good idea to me.
14:17:59 here are some ideas
14:18:00 https://review.opendev.org/#/c/650300/
14:18:03 https://review.opendev.org/#/c/651865/
14:18:21 i'm not cool with dropping tests, but being smarter about the jobs we use can be done (and i've tried to do that)
14:18:30 we discussed decreasing the test runs at the QA PTG.
14:18:33 #link http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005871.html
14:18:57 for nova it is just removing the swift test, but for other projects it will be a good amount of improvement
14:19:56 forgive my ignorance, but when I see the zuul report in a patch, does each line represent stuff having run on a separate node?
14:20:10 yes
14:20:15 or sometimes more than one
14:20:17 also I imagine that every round of cpu vulnerability fixes decreases our throughput by some amount
14:20:23 a lot of our unit tests are test_python_works, as edleafe put it
14:20:32 efried: possibly more than one per line too
14:20:41 but improving that is a massive thankless job
14:20:45 so it seems like we could e.g. run all the pyXX unit tests on the same node, what harm?
14:20:57 what gain?
14:21:08 efried: that isn't really much faster because nova already uses all cpus for the tests
14:21:14 For those short-running tests, doesn't node setup & teardown consume most of the time?
14:21:23 not for unit tests
14:21:23 efried: not really
14:21:30 okay, never mind
14:21:31 particularly not for nova
14:21:33 you're thinking devstack + tempest
14:21:44 counter-intuitive. most of our node capacity is consumed by long-running jobs
14:22:02 ...and for those we don't want to share nodes because of weird interaction potentials and/or different configs...
14:22:04 optimizing the overhead on fast jobs only gets us a few percent, bar-napkin-math
14:22:10 okay, I'll stop
14:22:22 so one other thing we could consider
14:22:32 is changing the way we do patch series
14:22:42 aspiers and I were talking about this yesterday or the day before
14:23:16 tempest-full runs a lot and runs long. with the optimized integrated-gate idea, it will be faster for many projects
14:23:36 efried: how, by using more fixup patches at the end? if so that would break the "master is always deployable" goal
14:23:48 the way I've always seen it done, at least in nova, is if you have a series of changes, and you need to fix something on the bottom, you download the whole series, restack, fix the bottom, rebase the rest, and push the whole lot. The top N-1 patches are just rebased with no conflict, but they get to run again in the gate.
14:24:18 aspiers was being particularly gate-conservative by only fixing the one bottom patch and leaving the rest with the green tildes on them
14:24:28 yeah, i frequently avoid updating lower patches in a series and just comment that a new change addresses it
14:24:32 right, that works in some cases
14:24:39 the others can't merge unless you push the new commit though
14:24:55 depends on what you need to fix
14:24:56 because their parent will be outdated
14:24:56 clarkb: or you rely on gerrit to create a merge commit
14:25:06 sean-k-mooney: doesn't work if it is a stack
14:25:12 my cross-cell resize gate enablement is at the end of a 40+ patch series, and without rebasing the whole thing i can't get a gate run on the live code
14:25:16 so not going to work for me in that case
14:25:28 yeah, gerrit is extra picky about parent changes being in their final state in the dependency relationship
14:25:36 clarkb: i think it does as long as it does not create a merge conflict
14:25:43 nope, gerrit refuses
14:25:48 clarkb: ok
14:25:50 try it sometime ;)
14:26:17 gerrit considers an explicit parent which is out of date as a blocker
14:26:17 feels like this is tangential to the nova meeting...
14:26:18 fungi: i generally collect comments on the series and address and rebase them all in one go
14:27:31 point is, if we as a team try to adopt a more gate-conservative aspiers-ish approach to pushing patches, in or out of series, where possible and practical, it could help gate throughput.
14:27:33 moving on
14:27:52 #topic Reminders
14:27:55 any?
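To put rough numbers on the "bar-napkin math" and the gate-conservative restacking point from the gate discussion above, here is a back-of-the-envelope sketch; every figure in it (jobs per patch, job runtimes, series length) is an assumption for illustration, not a measurement of the actual gate.

```python
# Back-of-the-envelope illustration; every number below is an assumption.
long_jobs, long_job_hours = 6, 2.0     # devstack/tempest-style jobs per pushed revision
short_jobs, short_job_hours = 4, 0.2   # unit test / pep8-style jobs, setup included

node_hours_per_patch = (long_jobs * long_job_hours +
                        short_jobs * short_job_hours)

# Dropping the short jobs entirely only recovers a small slice of capacity.
short_share = short_jobs * short_job_hours / node_hours_per_patch
print(f"node-hours per pushed revision: {node_hours_per_patch:.1f}, "
      f"short-job share: {short_share:.0%}")

# Re-pushing a 40-patch series to fix only the bottom patch re-runs check
# jobs on every revision; pushing just the bottom patch costs 1/40th of that.
series_len = 40
print(f"whole-series repush: {series_len * node_hours_per_patch:.0f} node-hours, "
      f"bottom-patch-only: {node_hours_per_patch:.0f} node-hours")
```

With these assumed numbers the short jobs account for only about 6% of node time, while restacking a long series multiplies the cost by the series length, which is the point being made about gate-conservative pushes.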
14:28:47 #topic Stable branch status
14:28:48 #link Stein regressions: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005637.html
14:28:48 bug 1824435 still pending
14:28:48 #link stable/stein: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/stein
14:28:48 #link stable/rocky: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/rocky
14:28:48 #link stable/queens: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/queens
14:28:49 bug 1824435 in OpenStack Compute (nova) stein "fill_virtual_interface_list migration fails on second attempt" [Medium,Triaged] https://launchpad.net/bugs/1824435
14:29:03 someone else should probably work on ^
14:29:05 i'm not
14:29:21 i've got a stein release request up as well https://review.opendev.org/#/c/661376/
14:29:29 just waiting on the release gods
14:31:07 this is related to the force refresh change, correct?
14:31:13 mriedem: we could bump that hash at this point, had another couple patches merge, including the unversioning of reno docs
14:31:14 huh?
14:31:30 mriedem: the force refresh of the instance info cache?
14:31:39 i'm not sure what you're asking me
14:31:47 we added a migration to populate the virtual interface table for old instances
14:31:47 that upgrade ga bug?
14:31:50 yes
14:31:54 there were a couple of upgrade issues from that
14:32:02 some fixed and waiting to release in that stein release patch,
14:32:06 that other one is still outstanding
14:32:30 efried: i could rev, but i don't really care to and lose the +2
14:32:32 for a docs fix
14:32:48 mriedem: nod, I +1d fwiw
14:33:18 anything else stable-y?
14:33:43 #topic Sub/related team Highlights
14:33:44 Placement (cdent)
14:33:44 #link latest pupdate http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006601.html
14:33:44 #link Spec for can_split https://review.opendev.org/#/c/658510/
14:33:44 #link Spec for the rest of nested magic https://review.opendev.org/#/c/662191/
14:33:44 #link spec for rg/rp mapping https://review.opendev.org/#/c/657582/
14:33:48 cdent: your mic
14:33:55 thanks
14:34:00 two main things:
14:34:15 one is that there won't be a placement meeting any more, office hours are being arranged (see email)
14:34:41 #link placement office hours http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006728.html
14:34:53 the other is that can_split has proven to be problematic, an email has been sent out http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006726.html to ask ops if they'll live if we don't do it
14:35:19 people like artom or stephenfin: if there are people in RH who should see that, please pass it on
14:35:27 that's all
14:35:40 cdent, I can get our NFV PM's eyes on that I guess
14:36:04 thanks
14:36:07 Per links above, note that can_split is now in its own spec. The other, uncontroversial, bits of nested magic are in https://review.opendev.org/#/c/662191/
14:36:16 (I thought NUMA stuff in placement kinda required can_split to work at all, but I'll read the email)
14:36:20 ^ this should be pretty close to approvable, if we could get some nova eyes to make sure it satisfies the use case.
14:36:44 artom: I'll be happy to TLDR the issue for you in -nova after this, if the email doesn't suffice.
14:36:52 (I haven't read it yet)
14:36:53 efried, ack, thanks :)
14:36:58 (but probably can't see the forest for the trees anyway)
14:37:11 And the squirrels. Don't forget the squirrels.
14:37:25 freakin' everywhere
14:37:36 (why must they play chicken with my tires?)
14:37:38 moving on
14:37:50 API (gmann)
14:37:50 Updates on ML - #link http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006699.html
14:37:53 gmann: anything to add?
14:38:01 other than that, policy improvement spec is under review #link https://review.opendev.org/#/c/547850/19
14:38:25 replied to johnthetubaguy's comments, we will put more details about the exact check_str etc in the spec
14:38:57 thanks gmann
14:38:59 #topic Stuck Reviews
14:39:03 any?
14:39:31 https://review.opendev.org/#/c/579897/ came up yesterday
14:39:47 not sure if cern or ovh are going to pick it up again
14:40:42 sounds like mel wants a bp for it
14:40:55 and should remove a lot of the nvidia-specific language from the code comment,
14:40:59 but otherwise jay dropped his -1
14:41:00 it might be a helpful signal if ^ were represented as a -1
14:41:34 idk that it's really a feature,
14:41:43 it's existing support that only worked for linux guests
14:41:47 this makes it work for windows guests
14:42:00 and we could pester tssurya to see if cern is still interested in driving it
14:42:48 so i was previously +1 on that patch but stopped reviewing when it became political
14:43:36 it looks like it's not in merge conflict, so if we clean up the paperwork i think it's fine to add
14:43:45 assuming they still want it
14:43:47 anyway, we can move on, the action is to find an owner
14:43:56 i'm sure cern and ovh forgot about it b/c they are already using it in production
14:45:22 we can move on...
14:45:36 ..okay..
14:45:42 #topic Review status page (still NEW)
14:45:42 http://status.openstack.org/reviews/#nova
14:45:42 #help Ideas for how we should use this?
14:46:09 As of last time, someone (sean-k-mooney?) figured out that you can hover over things to figure out how the scores are calculated
14:46:13 i keep looking at https://review.opendev.org/#/c/361438 when you bring up that link,
14:46:21 the dependent neutron patch is abandoned,
14:46:37 therefore i'd like to abandon it unless someone (adrianc from mellanox?) is going to fix it
14:46:57 mriedem: ++, go ahead and abandon it please, it can always be restored if they care.
14:47:05 i'll ask adrianc_ about it, or cfriesen
14:47:13 you left that last comment over a month ago.
14:47:19 fair warning
14:48:38 the dependent patch is over 2 years old as well
14:49:11 and was abandoned in feb 2017
14:49:17 ok i abandoned it
14:49:26 thanks mriedem
14:49:29 So I'm going to throw this out as a suggestion: It would be lovely if each nova team member would grab a patch from the top of that list - something with a 4-digit score - and figure out what needs to be done to make it go away (merge, abandon, whatever) and drive that.
14:49:36 mriedem: you're done for the week :)
14:50:01 ten minute warning
14:50:05 heh, i asked about https://review.opendev.org/#/c/489484 weeks ago too and never got a reply
14:50:16 from bauzas
14:50:43 why not force abandon any over N days stale and let other people sort it out?
14:50:49 I don't have a problem core-abandoning patches that have .... yeah, like that ^
14:50:52 if it was important, it will come back around
14:50:53 mriedem: bauzas is on PTO today as it's a holiday in france
14:51:11 let's discuss the specifics of core-abandoning old stuff, separately.
14:51:19 so we can move on to
14:51:20 #topic Open discussion
14:51:31 any?
14:51:35 oh i guess it's the 3-month EU summer holiday season
14:51:46 yep
14:51:57 and no items from me
14:52:42 Thanks all.
14:52:42 o/
14:52:42 #endmeeting