14:00:13 <efried> #startmeeting nova
14:00:14 <openstack> Meeting started Thu May 30 14:00:13 2019 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:17 <openstack> The meeting name has been set to 'nova'
14:00:24 <mriedem> o/
14:00:28 <takashin> o/
14:00:33 <gmann> o/
14:00:35 <edleafe> \o
14:00:37 <gibi> o/
14:00:43 <artom> o/
14:01:06 <johnthetubaguy> o/
14:01:33 <efried> #link agenda https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
14:01:35 <stephenfin> o/
14:01:42 <cdent> o/
14:02:26 <efried> #topic Last meeting
14:02:26 <efried> #link Minutes from last meeting: http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-05-23-21.00.html
14:03:00 <efried> I think the only update from here is to mention that https://blueprints.launchpad.net/nova/+spec/remove-consoleauth got the necessary federal background check and has now been approved.
14:03:14 <efried> stephenfin: just noticed this isn't tagged against a release - assume you're planning to do this in Train?
14:03:22 <stephenfin> (y)
14:03:30 <efried> k, fixed
14:03:35 <efried> anything else from last meeting?
14:04:26 <efried> #topic Release News
14:04:55 <efried> I'm thinking it might be a good idea to do a spec scrub soon
14:05:06 <stephenfin> Ooh, yes please
14:05:10 * johnthetubaguy nods
14:05:13 <cdent> ya
14:05:23 <efried> How does next Tuesday grab y'all?
14:05:39 <mriedem> wfm
14:05:53 <gibi> mostly wfm
14:06:10 <gmann> +1
14:06:16 <efried> #action efried to post on ML re spec review sprint Tuesday June 4th
14:06:23 <efried> anything else release newsy?
14:06:51 <efried> #topic Bugs (stuck/critical)
14:06:51 <efried> No Critical bugs
14:06:51 <efried> #link 82 new untriaged bugs (down 3 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:06:51 <efried> #link 10 untagged untriaged bugs (down 1 since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:07:10 <efried> any bugs in the spotlight?
14:07:24 <efried> Gate status
14:07:24 <efried> #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
14:07:24 <efried> 3rd party CI
14:07:24 <efried> #link 3rd party CI status http://ciwatch.mmedvede.net/project?project=nova&time=7+days
14:07:38 <efried> zuul has seemed ungodly slow lately
14:07:40 <mriedem> were we going to put up patches to log a warning on start of drivers that don't have 3rd party ci?
14:07:51 <mriedem> at least xenapi,
14:07:55 <mriedem> sounds like powervm is working on fixing their ci
14:08:21 <efried> I saw them put up a fix this morning, yah. Haven't caught up on the ML yet today.
14:08:23 <cdent> gee, guess what, vmware's ci was dead again and the keepers took days to notice :(
14:08:47 <cdent> (but it's been kicked now)
14:08:50 <efried> mriedem: I remember we agreed to do that, yes. Who's volunteering to put up that patch?
14:08:57 <mriedem> i suppose i can
14:09:05 <mriedem> give me an action item
14:09:20 <efried> #action mriedem to propose patch logging warning that xenapi is not supported because no 3pCI
14:09:32 <efried> thanks mriedem
14:09:55 <stephenfin> warning or deprecating outright?
14:10:10 <artom> I guess where's the line between no ci and half-maintained ci, but I'll leave that for the review
14:10:32 <mriedem> stephenfin: i was going to start with a warning
14:10:37 <johnthetubaguy> I spoke to bob ball about that, some neutron "bug" is blocking things working
14:10:52 <stephenfin> I imagine that depends on whether there's anyone in the wild still using it. Might be worth bringing up on openstack-discuss
14:10:56 <efried> in any case we should be sure to add all the xen folks we know about to that review.
14:10:57 <stephenfin> Ah, okay. That's fair
14:10:59 <johnthetubaguy> but, yeah, +1 the log message warning about removal
14:11:00 <mriedem> artom: i don't know what you're asking, the xenserver ci has been dead for months
14:11:31 <johnthetubaguy> yeah, it's a dead ci at this point, we just merge the log message ASAP
14:11:45 <johnthetubaguy> if it gets removed as the CI becomes stable again, then whoop whoop, go for it
14:11:56 <cdent> ya
14:12:08 <efried> cool, moving on.
14:12:13 <artom> mriedem, just saying that some 3pcis are more reactive than others
14:12:31 <efried> yeah, we're not deprecating every driver that goes red for two weeks
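[editor's note: a minimal sketch of the kind of start-up warning taken as an action item above. The placement (the XenAPI driver's init_host()) and the wording are hypothetical illustrations, not mriedem's actual patch:]

    # Hypothetical sketch only: log a warning at driver start-up when the
    # backing third-party CI is dead and the driver is effectively untested.
    from oslo_log import log as logging

    from nova.virt import driver

    LOG = logging.getLogger(__name__)

    class XenAPIDriver(driver.ComputeDriver):
        def init_host(self, host):
            LOG.warning('The xenapi driver is not tested by any third-party '
                        'CI and should be considered unsupported; it may be '
                        'deprecated and removed in a future release.')
            # ... existing init_host logic continues here ...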
14:12:40 <efried> ....
14:12:40 <efried> Anyone know anything about zuul slowitude?
14:12:44 <artom> let's move on, I'll nitpick the review ;)
14:12:53 <mriedem> efried: you'd have to ask infra, but i'm guessing fewer nodes
14:13:00 <mriedem> http://grafana.openstack.org/d/T6vSHcSik/zuul-status?
14:13:29 <mriedem> http://grafana.openstack.org/d/T6vSHcSik/zuul-status?orgId=1&fullscreen&panelId=20
14:14:19 <efried> those are very pretty
14:14:31 <efried> fraid I don't really know what I'm looking at though
14:15:11 <efried> I guess what I really want is for somebody to say, "Yes, it's a known issue, and we're working on it, and it'll all be fixed very very soon, so your jobs will wait in queue for five minutes tops."
14:15:25 <mriedem> clarkb: fungi: ^ if there are known infra issues can we get a summary in the ML?
14:15:43 <mriedem> i.e. fewer nodes, lots of gate failures, etc
14:15:48 <cdent> efried: I think to get that you're going to need to have your employer pony up a cloud :(
14:16:00 <mriedem> http://status.openstack.org/elastic-recheck/data/integrated_gate.html is pretty good as far as classification rates
14:16:01 <efried> I've brought it up
14:16:11 <cdent> me too. no luck.
14:16:25 <clarkb> the only known issue I know of is 1 of 12 zuul executors was misconfigured on its mirror node for about 12 hours
14:16:27 <efried> yeah, I don't feel like the spurious failure rate is overly high. It's just taking forever to get a build through.
14:16:32 <clarkb> 10 hours i guess
14:16:56 <efried> clarkb: So is it mainly just a matter of too many jobs, not enough hardware?
14:17:07 <clarkb> otherwise if you want more throughput we need to run fewer jobs, or make jobs faster, or add resources
14:17:13 <artom> if we're for real losing gate nodes, do we need to start a conversation about removing some tests?
14:17:47 <efried> Sounds like a good idea to me.
14:17:59 <mriedem> here are some ideas
14:18:00 <mriedem> https://review.opendev.org/#/c/650300/
14:18:03 <mriedem> https://review.opendev.org/#/c/651865/
14:18:21 <mriedem> i'm not cool with dropping tests, but being smarter about the jobs we use can be done (and i've tried to)
14:18:30 <gmann> we discussed decreasing the test run at the QA PTG.
14:18:33 <gmann> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005871.html
14:18:57 <gmann> for nova it is just removing the swift test, but for other projects it will be a good amount of improvement
14:19:56 <efried> forgive my ignorance, but when I see the zuul report in a patch, does each line represent stuff having run on a separate node?
14:20:10 <mriedem> yes
14:20:15 <fungi> or sometimes more than one
14:20:17 <clarkb> also I imagine that every round of cpu vulnerability fixes decreases our throughput by some amount
14:20:23 <artom> a lot of our unit tests are test_python_works, as edleafe put it
14:20:32 <clarkb> efried: possibly more than one per line too
14:20:41 <artom> but improving that is a massive thankless job
14:20:45 <efried> so it seems like we could e.g. run all the pyXX unit tests on the same node, what harm?
14:20:57 <fungi> what gain?
14:21:08 <clarkb> efried: that isn't really much faster because nova already uses all cpus for the tests
14:21:14 <efried> For those short-running tests, doesn't node setup & teardown consume most of the time?
14:21:23 <mriedem> not for unit tests
14:21:23 <sean-k-mooney> efried: not really
14:21:30 <efried> okay, never mind
14:21:31 <clarkb> particularly not for nova
14:21:33 <mriedem> you're thinking devstack + tempest
14:21:44 <fungi> counter-intuitive. most of our node capacity is consumed by long-running jobs
14:22:02 <efried> ...and for those we don't want to share nodes because of weird interaction potentials and/or different configs...
14:22:04 <fungi> optimizing the overhead on fast jobs only gets us a few percent, bar-napkin-math
14:22:10 <efried> okay, I'll stop
14:22:22 <efried> so one other thing we could consider
14:22:32 <efried> is changing the way we do patch series
14:22:42 <efried> aspiers and I were talking about this yesterday or the day before
14:23:16 <gmann> tempest-full runs a lot and runs long. the optimized integrated-gate idea will make it faster for many projects
14:23:36 <sean-k-mooney> efried: how, by using more fixup patches at the end? if so that would break the "master is always deployable" goal
14:23:48 <efried> the way I've always seen it done, at least in nova, is if you have a series of changes, and you need to fix something on the bottom, you download the whole series, restack, fix the bottom, rebase the rest, and push the whole lot. The top N-1 patches are just a rebase with no conflict, but they get to run again in the gate.
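[editor's note: a rough illustration of the restack workflow just described, using git-review. The change number and the stack depth are made up for the example; only the command sequence is the point:]

    # download the top change of the series (its ancestor commits come with it)
    git review -d 123456
    # fix something in the bottom patch of, say, a 5-patch stack
    git rebase -i HEAD~5          # mark the bottom commit as "edit"
    # ... make the fix, then: git add <files>; git commit --amend; git rebase --continue
    # push the restacked series: every patch in it gets a new revision, and each
    # one re-enters the check (and later gate) queue, even though the top N-1
    # are trivial rebases
    git review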
14:24:18 <efried> aspiers was being particularly gate-conservative by only fixing the one bottom patch and leaving the rest with the green tildes on them
14:24:28 <fungi> yeah, i frequently avoid updating lower patches in a series and just comment that new change <whatever> addresses it
14:24:32 <sean-k-mooney> right, that works in some cases
14:24:39 <clarkb> the others can't merge unless you push the new commit though
14:24:55 <fungi> depends on what you need to fix
14:24:56 <clarkb> because their parent will be outdated
14:24:56 <sean-k-mooney> clarkb: or you rely on gerrit to create a merge commit
14:25:06 <clarkb> sean-k-mooney: doesn't work if it is a stack
14:25:12 <mriedem> my cross-cell resize gate enablement is at the end of a 40+ patch series, and without rebasing the whole thing i can't get a gate run on the live code
14:25:16 <mriedem> so not going to work for me in that case
14:25:28 <fungi> yeah, gerrit is extra picky about parent changes being in their final state in the dependency relationship
14:25:36 <sean-k-mooney> clarkb: i think it does as long as it does not create a merge conflict
14:25:43 <clarkb> nope gerrit refuses
14:25:48 <sean-k-mooney> clarkb: ok
14:25:50 <fungi> try it sometime ;)
14:26:17 <fungi> gerrit considers an explicit parent which is out of date as a blocker
14:26:17 <mriedem> feels like this is tangential to the nova meeting...
14:26:18 <sean-k-mooney> fungi: i generally collect comments on the series and address and rebase them all in one go
14:27:31 <efried> point is, if we as a team try to adopt a more gate-conservative aspiers-ish approach to pushing patches, in or out of series, where possible and practical, it could help gate throughput.
14:27:33 <efried> moving on
14:27:52 <efried> #topic Reminders
14:27:55 <efried> any?
14:28:47 <efried> #topic Stable branch status
14:28:48 <efried> #link Stein regressions: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005637.html
14:28:48 <efried> bug 1824435 still pending
14:28:48 <efried> #link stable/stein: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/stein
14:28:48 <efried> #link stable/rocky: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/rocky
14:28:48 <efried> #link stable/queens: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/queens
14:28:49 <openstack> bug 1824435 in OpenStack Compute (nova) stein "fill_virtual_interface_list migration fails on second attempt" [Medium,Triaged] https://launchpad.net/bugs/1824435
14:29:03 <mriedem> someone else should probably work on ^
14:29:05 <mriedem> i'm not
14:29:21 <mriedem> i've got a stein release request up as well https://review.opendev.org/#/c/661376/
14:29:29 <mriedem> just waiting on the release gods
14:31:07 <sean-k-mooney> this is related to the force refresh change, correct?
14:31:13 <efried> mriedem: we could bump that hash at this point, had another couple patches merge, including the unversioning of reno docs
14:31:14 <mriedem> huh?
14:31:30 <sean-k-mooney> mriedem: the force refresh of the instance info cache?
14:31:39 <mriedem> i'm not sure what you're asking me
14:31:47 <sean-k-mooney> we added a migration to populate the virtual interface table for old instances
14:31:47 <mriedem> that upgrade ga bug?
14:31:50 <mriedem> yes
14:31:54 <mriedem> there were a couple of upgrade issues from that
14:32:02 <mriedem> some fixed and waiting to release in that stein release patch,
14:32:06 <mriedem> that other one is still outstanding
14:32:30 <mriedem> efried: i could rev, but i don't really care to and lose the +2
14:32:32 <mriedem> for a docs fix
14:32:48 <efried> mriedem: nod, I +1d fwiw
14:33:18 <efried> anything else stable-y?
14:33:43 <efried> #topic Sub/related team Highlights
14:33:44 <efried> Placement (cdent)
14:33:44 <efried> #link latest pupdate http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006601.html
14:33:44 <efried> #link Spec for can_split https://review.opendev.org/#/c/658510/
14:33:44 <efried> #link Spec for the rest of nested magic https://review.opendev.org/#/c/662191/
14:33:44 <efried> #link spec for rg/rp mapping https://review.opendev.org/#/c/657582/
14:33:48 <efried> cdent: your mic
14:33:55 <cdent> thanks
14:34:00 <cdent> two main things:
14:34:15 <cdent> one is that there won't be a placement meeting any more, office hours are being arranged (see email)
14:34:41 <efried> #link placement office hours http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006728.html
14:34:53 <cdent> the other is that can_split has proven to be problematic, an email has been sent out http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006726.html to ask ops if they'll live if we don't do it
14:35:19 <cdent> people like artom or stephenfin: if there are people in RH who should see that, please pass it on
14:35:27 <cdent> that's all
14:35:40 <artom> cdent, I can get our NFV PM's eyes on that I guess
14:36:04 <cdent> thanks
14:36:07 <efried> Per links above, note that can_split is now in its own spec. The other, uncontroversial, bits of nested magic are in https://review.opendev.org/#/c/662191/
14:36:16 <artom> (I thought NUMA stuff in placement kinda required can_split to work at all, but I'll read the email)
14:36:20 <efried> ^ this should be pretty close to approvable, if we could get some nova eyes to make sure it satisfies the use case.
14:36:44 <efried> artom: I'll be happy to TLDR the issue for you in -nova after this, if the email doesn't suffice.
14:36:52 <efried> (I haven't read it yet)
14:36:53 <artom> efried, ack, thanks :)
14:36:58 <efried> (but probably can't see the forest for the trees anyway)
14:37:11 <artom> And the squirrels. Don't forget the squirrels.
14:37:25 <cdent> freakin' everywhere
14:37:36 <efried> (why must they play chicken with my tires?)
14:37:38 <efried> moving on
14:37:50 <efried> API (gmann)
14:37:50 <efried> Updates on ML- #link http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006699.html
14:37:53 <efried> gmann: anything to add?
14:38:01 <gmann> other than that, policy improvement spec is under review #link https://review.opendev.org/#/c/547850/19
14:38:25 <gmann> replied to johnthetubaguy's comments, we will put more details about the exact check_str etc in the spec
14:38:57 <efried> thanks gmann
14:38:59 <efried> #topic Stuck Reviews
14:39:03 <efried> any?
14:39:31 <mriedem> https://review.opendev.org/#/c/579897/ came up yesterday
14:39:47 <mriedem> not sure if cern or ovh are going to pick it up again
14:40:42 <mriedem> sounds like mel wants a bp for it
14:40:55 <mriedem> and should remove a lot of the nvidia specific language from the code comment,
14:40:59 <mriedem> but otherwise jay dropped his -1
14:41:00 <efried> it might be a helpful signal if ^ were represented as a -1
14:41:34 <mriedem> idk that it's really a feature,
14:41:43 <mriedem> it's existing support that only worked for linux guests
14:41:47 <mriedem> this makes it work for windows guests
14:42:00 <efried> and we could pester tssurya to see if cern is still interested in driving it
14:42:48 <sean-k-mooney> so i was previously +1 on that patch but stopped reviewing when it became political
14:43:36 <sean-k-mooney> it looks like it's not in merge conflict so if we clean up the paperwork i think it's fine to add
14:43:45 <sean-k-mooney> assuming they still want it
14:43:47 <mriedem> anyway, we can move on, the action is to find an owner
14:43:56 <mriedem> i'm sure cern and ovh forgot about it b/c they are already using it in production
14:45:22 <mriedem> we can move on...
14:45:36 <efried> ..okay..
14:45:42 <efried> #topic Review status page (still NEW)
14:45:42 <efried> http://status.openstack.org/reviews/#nova
14:45:42 <efried> #help Ideas for how we should use this?
14:46:09 <efried> As of last time, someone (sean-k-mooney?) figured out that you can hover over things to figure out how the scores are calculated
14:46:13 <mriedem> i keep looking at https://review.opendev.org/#/c/361438 when you bring up that link,
14:46:21 <mriedem> the dependent neutron patch is abandoned,
14:46:37 <mriedem> therefore i'd like to abandon it unless someone (adrianc from mellanox?) is going to fix it
14:46:57 <efried> mriedem: ++, go ahead and abandon it please, it can always be restored if they care.
14:47:05 <mriedem> i'll ask adrianc_ about it, or cfriesen
14:47:13 <efried> you left that last comment over a month ago.
14:47:19 <efried> fair warning
14:48:38 <sean-k-mooney> the dependent patch is over 2 years old as well
14:49:11 <sean-k-mooney> and was abandoned in feb 2017
14:49:17 <mriedem> ok i abandoned it
14:49:26 <efried> thanks mriedem
14:49:29 <efried> So I'm going to throw this out as a suggestion: It would be lovely if each nova team member would grab a patch from the top of that list - something with a 4-digit score - and figure out what needs to be done to make it go away (merge, abandon, whatever) and drive that.
14:49:36 <efried> mriedem: you're done for the week :)
14:50:01 <cdent> ten minute warning
14:50:05 <mriedem> heh, i asked about https://review.opendev.org/#/c/489484 weeks ago too and never got a reply
14:50:16 <mriedem> from bauzas
14:50:43 <cdent> why not force abandon any over N days stale and let other people sort it out?
14:50:49 <efried> I don't have a problem core-abandoning patches that have .... yeah, like that ^
14:50:52 <cdent> if it was important, it will come back around
14:50:53 <sean-k-mooney> mriedem: bauzas is on PTO today as it's a holiday in france
14:51:11 <efried> let's discuss the specifics of core-abandoning old stuff, separately.
14:51:19 <efried> so we can move on to
14:51:20 <efried> #topic Open discussion
14:51:31 <efried> any?
14:51:35 <mriedem> oh i guess it's the 3 month EU summer holiday season
14:51:46 <sean-k-mooney> yep
14:51:57 <sean-k-mooney> and no items from me
14:52:42 <efried> Thanks all.
14:52:42 <efried> o/
14:52:42 <efried> #endmeeting