21:00:04 <melwitt> #startmeeting nova
21:00:05 <openstack> Meeting started Thu May 10 21:00:04 2018 UTC and is due to finish in 60 minutes. The chair is melwitt. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:08 <openstack> The meeting name has been set to 'nova'
21:00:11 <mriedem> o/
21:00:12 <tssurya> o/
21:00:13 <melwitt> hi everybody
21:00:15 <takashin> o/
21:00:19 <dansmith> o/
21:00:27 <edleafe> \o
21:00:47 <melwitt> let's get started
21:00:49 <efried> ō/
21:00:52 <melwitt> #topic Release News
21:00:58 <melwitt> #link Rocky release schedule: https://wiki.openstack.org/wiki/Nova/Rocky_Release_Schedule
21:01:13 <melwitt> we've got the summit coming up soon, the week after next
21:01:27 <cdent> hmm. I totally shouldn't be here. Oh well.
21:01:30 * cdent settles in
21:01:34 <melwitt> r-2 is June 7 which is also the spec freeze
21:01:53 <melwitt> current runway status:
21:01:57 <melwitt> #link Rocky review runways: https://etherpad.openstack.org/p/nova-runways-rocky
21:02:03 <melwitt> #link runway #1: XenAPI: Support a new image handler for non-FS based SRs [END DATE: 2018-05-11] series starting at https://review.openstack.org/#/c/497201
21:02:11 <melwitt> #link runway #2: Add z/VM driver [END DATE: 2018-05-15] spec amendment needed at https://review.openstack.org/562154 and implementation starting at https://review.openstack.org/523387
21:02:17 <melwitt> #link runway #3: Local disk serial numbers [END DATE: 2018-05-16] series starting at https://review.openstack.org/526346
21:03:10 <melwitt> anyone have anything to add for release news or runways?
21:03:48 <melwitt> #topic Bugs (stuck/critical)
21:03:59 <melwitt> no critical bugs
21:04:07 <melwitt> #link 34 new untriaged bugs (up 3 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
21:04:15 <melwitt> #link 9 untagged untriaged bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
21:04:24 <mriedem> damn was down to 28 last night
21:04:42 <melwitt> wow, 6 new overnight then. that's a lot
21:04:52 <mriedem> https://bugs.launchpad.net/nova/+bug/1718439 isn't new but...
21:04:53 <openstack> Launchpad bug 1718439 in OpenStack Compute (nova) "Apparent lack of locking in conductor logs" [Undecided,New]
21:05:47 <melwitt> oh, okay got added to nova today
21:05:57 <mriedem> added back, i tried to pawn it off on oslo a year ago
21:06:02 <mriedem> anyway
21:06:21 <melwitt> okay
21:06:43 <melwitt> #link bug triage how-to: https://wiki.openstack.org/wiki/Nova/BugTriage#Tags
21:07:00 <melwitt> please lend a helping hand with bug triage if you can ^ and thanks to all who have been helping
21:07:28 <melwitt> Gate status:
21:07:32 <melwitt> #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
21:07:45 <melwitt> there have been a lot of job timeouts lately, I've noticed
21:08:26 <melwitt> one cool thing is that e-r (elastic recheck) is working again and commenting on reviews if a known gate bug is hit
21:08:49 <melwitt> 3rd party CI:
21:08:55 <melwitt> #link 3rd party CI status http://ci-watch.tintri.com/project?project=nova&time=7+days
21:09:28 <melwitt> the zKVM third party job is currently broken because of a consoles-related devstack change of mine
21:10:02 <melwitt> ML thread http://lists.openstack.org/pipermail/openstack-dev/2018-May/130331.html and a fix is proposed https://review.openstack.org/567298
21:10:24 <melwitt> does anyone have anything else on bugs, gate status, or third party CI?
21:11:04 <melwitt> #topic Reminders
21:11:14 <melwitt> #link Forum session moderators, create your etherpads and link them at https://wiki.openstack.org/wiki/Forum/Vancouver2018
21:11:20 <melwitt> #link ML post http://lists.openstack.org/pipermail/openstack-dev/2018-May/130316.html
21:11:45 <melwitt> friendly reminder to get your etherpads created and link them on the wiki so people can find 'em
21:11:59 <melwitt> #link Rocky Review Priorities https://etherpad.openstack.org/p/rocky-nova-priorities-tracking
21:12:26 <melwitt> reminder that the subteam and bugs etherpad is there for finding patches
21:12:33 <melwitt> anything else for reminders?
21:12:57 <melwitt> #topic Stable branch status
21:13:04 <melwitt> #link stable/queens: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens,n,z
21:13:15 <melwitt> #link stable/pike: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/pike,n,z
21:13:20 <melwitt> #link stable/ocata: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/ocata,n,z
21:13:35 <melwitt> #link We're going to do some stable releases to get some regression fixes out: https://etherpad.openstack.org/p/nova-stable-branch-status
21:14:15 <melwitt> looks like "libvirt: check image type before removing snapshots in _cleanup_resize" is the last thing we need before we do the releases
21:14:33 <melwitt> anything else on stable branch status?
21:14:56 <melwitt> #topic Subteam Highlights
21:15:16 <melwitt> we skipped the cells v2 meeting again because we didn't need a meeting. anything interesting to mention, dansmith?
21:15:23 <dansmith> negative ghostrider
21:15:28 <melwitt> k
21:15:38 <melwitt> edleafe, have some scheduler subteam news?
21:15:42 <edleafe> Gave an update on Nested Resource Providers progress to a Cyborg team rep
21:15:45 <edleafe> Bauzas remembered that he owes me a beer, to be paid in Vancouver
21:15:47 <edleafe> Confirmed the importance of merging the conversion of libvirt's get_inventory() method to update_provider_tree()
21:15:51 <edleafe> #link https://review.openstack.org/#/c/560444
21:15:53 <edleafe> This will be my last update. After running the Scheduler meeting for over two years, I'm stepping down.
21:15:56 <edleafe> EOM
21:16:58 <melwitt> glad to hear it sounds like the collab with the cyborg team is going smoothly
21:17:22 <edleafe> yeah, I want to keep them in the loop
21:17:34 <melwitt> and sorry to hear that, edleafe. thanks for chairing the meeting all these years
21:18:17 <melwitt> gibi has left some notes about the notifications meeting
21:18:26 <melwitt> "We talked about #link https://review.openstack.org/#/c/563269 Add notification support for trusted_certs and agreed that it is going in a good direction"
21:18:45 <melwitt> "We agreed that the notification requirement in #link https://review.openstack.org/#/c/554212 Add PENDING vm state can be fulfilled with existing notifications. But now I see that additional requirements are popping up in the spec, so I have to go back and re-review it."
21:19:19 <melwitt> anything else for subteams?
21:19:45 <melwitt> #topic Stuck Reviews
21:19:57 <melwitt> nothing in the agenda. anyone have anything for stuck reviews?
21:20:21 <melwitt> #topic Open discussion
21:20:27 <melwitt> couple of agenda items in here,
21:20:33 <melwitt> first one:
21:20:41 <melwitt> (wznoinsk): I wanted to raise a concern about not running some advanced tempest tests (they're part of the 'slow' type and hence are skipped): https://github.com/openstack/tempest/search?utf8=%E2%9C%93&q=%22type%3D%27slow%27%22&type=
21:21:05 <melwitt> are you around, wznoinsk?
21:21:14 <mriedem> we do run the slow tests in a job in the experimental queue
21:21:19 <mriedem> i always have to look up which one though
21:21:39 <melwitt> so they only run on-demand then? I wonder if that should be a daily periodic
21:21:48 <mriedem> for example, the encrypted volume tests are marked as slow
21:21:52 <mriedem> well,
21:22:01 <mriedem> i think i've also floated the idea of a job that only runs slow tests
21:22:14 <mriedem> since there are relatively few of them
21:22:35 <wznoinsk> melwitt, yes
21:22:37 <mriedem> they were removed from the main tempest-full job because of the time it was taking to run api, scenario, and slow tests
21:22:45 <melwitt> to run on every change, the slow tests you mean?
21:22:48 <mriedem> yes
21:22:57 <melwitt> ah, okay. so they used to be in tempest-full
21:23:09 <mriedem> i believe so, until sdague went on a rampage
21:23:13 <melwitt> yeah, I think they should be run once a day at the very least
21:23:22 <mriedem> we could even have a nova-slow job in-tree that just runs compute api and all slow scenario tests
21:23:36 <mriedem> i could put up something to see what that looks like and how long it takes
21:23:39 <melwitt> having a separate slow-only job on every change sounds fine too, given that we used to run them anyway as part of tempest-full
21:23:48 <mriedem> if it's only like 45 minutes, that's about half of a normal tempest-full run
21:23:50 <wznoinsk> melwitt, mriedem I don't think some of them are that slow... I can compare times but IIRC there wasn't anything time-exploding
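For context on the 'slow' type being discussed here: tempest tags its long-running tests with a test attribute, and jobs then include or exclude tests by that attribute. Below is a minimal sketch of how such a test is tagged, assuming tempest's tempest.lib.decorators API; the class, test body, and UUID are illustrative, not actual tempest code:

```python
# Illustrative tempest-style scenario test; only the decorator usage
# reflects the real tempest API, the test itself is hypothetical.
from tempest.lib import decorators
from tempest.scenario import manager


class EncryptedVolumeScenarioTest(manager.ScenarioTest):

    @decorators.attr(type='slow')  # excluded from tempest-full runs
    @decorators.idempotent_id('11111111-2222-3333-4444-555555555555')
    def test_boot_from_encrypted_volume(self):
        # A slow-only job like the proposed nova-slow would opt in to
        # tests carrying the 'slow' attribute instead of skipping them.
        pass
```

Selecting on the attribute rather than on test names means a slow-only job would pick up newly tagged tests automatically, which is presumably why a single nova-slow (or generic tempest-slow) job is attractive.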
21:23:53 <melwitt> okay, I think that would be a good idea
21:24:24 <mriedem> i think it's this one: legacy-tempest-dsvm-neutron-scenario-multinode-lvm-multibackend
21:24:39 <wznoinsk> melwitt, mriedem basically I found that because we don't run these slow tests upstream at all (or I couldn't find any check/gate job with them), some of them are basically broken
21:24:41 <mriedem> in nova experimental i mean
21:24:56 <mriedem> yeah, and we could be regressing encrypted volume support w/o knowing it
21:25:03 <melwitt> wznoinsk: gotcha. thanks for bringing it up, I didn't know about this
21:26:05 <melwitt> okay, so let's try out a nova-slow in-tree job and see if it's reasonable to run on all changes to guard against regressions
21:26:30 <melwitt> anything else on that topic?
21:26:33 <mriedem> #action mriedem to add nova-slow job
21:27:01 <melwitt> do I have to type that to make it go on the minutes?
21:27:02 <wznoinsk> mriedem, there are a few networking-related slow tests too
21:27:15 <mriedem> wznoinsk: sure, but nova changes don't care about slow provider network tests
21:28:01 <mriedem> we can argue in the review
21:28:03 <wznoinsk> mriedem, should I bring up this topic with the networking guys too then? or can it be somehow discussed between teams internally?
21:28:20 <mriedem> cinder would also likely want this
21:28:28 <mriedem> or we just do a generic tempest-slow job
21:28:33 <mriedem> should talk to gmann
21:28:41 <wznoinsk> mriedem++
21:28:55 <mriedem> wznoinsk: how about you talk to gmann :)
21:29:09 <wznoinsk> mriedem, I don't mind, will do
21:29:45 <melwitt> cool. let's move to the next agenda item:
21:30:00 <melwitt> (takashin): Abort Cold Migration #link https://review.openstack.org/#/c/334732/
21:30:22 <takashin> About the "Abort Cold Migration" function (spec), I got feedback from operators on the openstack-operators mailing list.
21:30:36 <takashin> #link http://lists.openstack.org/pipermail/openstack-operators/2018-May/015209.html
21:30:44 <takashin> #link http://lists.openstack.org/pipermail/openstack-operators/2018-May/015237.html
21:30:53 <takashin> I would like to proceed with the "List/show all server migration types" implementation that the function depends on.
21:31:00 <takashin> #link https://review.openstack.org/#/c/430608/
21:31:16 <mriedem> takashin: you got replies from some people from your company,
21:31:21 <mriedem> and i asked them some follow-up questions
21:31:47 <takashin> Yes.
21:32:27 <mriedem> so i personally would like to at least wait to hear those responses
21:32:58 <takashin> okay.
21:33:08 <melwitt> IIRC, this has been discussed long ago and the usual use case is cold migrating an instance with a large disk that will take forever, and wanting the ability to cancel it
21:33:30 <mriedem> sure, on non-shared storage
21:33:45 <mriedem> which was the one other reply in the ops list: they are rolling out a new deployment and, before investing in it, they use cheap local storage
21:33:45 <takashin> or it is stalled out
21:33:48 <mriedem> not ceph
21:34:03 <melwitt> right
21:34:21 <mriedem> i'd like more details in the ops list reply about the stall-out case
21:35:03 <mriedem> i don't really want to put a bunch of time into new apis and plumbing for one operator that hits some stall issue without details
21:36:19 <melwitt> fwiw, I've thought the use case sounds reasonable but that the implementation would be really error-prone and racy. I don't know if that could be solved with a simple task_state lockout
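For context on the "task_state lockout" melwitt mentions: nova guards instance state transitions with a compare-and-swap when saving, so an abort would have to win a race against the migration task itself. Here is a rough sketch of that idiom applied to a hypothetical abort call; abort_cold_migration and the 'aborting' marker state are invented for illustration, while save(expected_task_state=...) raising UnexpectedTaskStateError on a lost race is the existing nova pattern:

```python
# Hypothetical sketch of a task_state "lockout" for aborting a cold
# migration; the function and the 'aborting' state are invented.
from nova import exception
from nova.compute import task_states


def abort_cold_migration(instance):
    """Hypothetical abort entry point, not an existing nova API."""
    instance.task_state = 'aborting'  # invented marker state
    try:
        # Compare-and-swap: this save only succeeds if task_state was
        # still RESIZE_MIGRATING when we read the instance; if the
        # migration task moved on first, UnexpectedTaskStateError is
        # raised and nothing is persisted.
        instance.save(expected_task_state=task_states.RESIZE_MIGRATING)
    except exception.UnexpectedTaskStateError:
        # Lost the race: the disk copy already finished or failed, so
        # there is no longer anything to abort cleanly.
        raise
```

Even with this guard, the hard part the meeting raises remains: cleaning up a half-copied disk on the destination after a successful abort, which the compare-and-swap alone does not solve.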
21:37:04 <mriedem> we also can't reliably test this
21:37:12 <mriedem> without stubs
21:37:24 <mriedem> by test i'm talking tempest
21:37:41 <dansmith> cleanup from stuff like this is a pita
21:37:46 <dansmith> and if we can't test it (agree we can't), then it's going to be ugly
21:37:59 <dansmith> I know it sounds useful in principle,
21:38:13 <dansmith> but I think most people that care about this kind of thing use shared storage,
21:38:38 <dansmith> because allowing your users to move terabytes of data around your network and disks every time they want to resize is kinda ugly
21:38:55 <dansmith> the cancel and cleanup case for ceph, by the way,
21:39:20 <dansmith> is potentially even worse than the "stop a copy and roll back" case of copying the disk manually
21:39:49 <dansmith> so without some really compelling reason I'd love to not introduce this complication
21:39:50 <melwitt> you mean non-shared ceph?
21:40:22 <dansmith> no
21:41:10 <melwitt> okay, I thought in the shared storage case, you don't have to copy anything
21:41:20 <dansmith> right,
21:41:54 <dansmith> you've got a vastly smaller window of time where it makes sense to abort
21:42:38 <mriedem> i would expect anyone using ceph to not need this
21:42:40 <mriedem> so they won't use it
21:42:53 <dansmith> right, but, it'll be there.. a button to push
21:43:18 <melwitt> yeah ... I think that's a good summary of why we haven't gone forward with this in the past. really complex and anyone with large storage isn't going to be doing local storage
21:43:25 <melwitt> (if they want to migrate things)
21:44:31 <mriedem> so (1) wait for replies, (2) talk some more
21:44:40 <melwitt> yeah
21:44:41 <mriedem> (3) we're <2 weeks from the forum with real live operators in the room to harass
21:44:57 <mriedem> there will be a public cloud forum session,
21:44:58 <takashin> okay
21:45:01 <mriedem> where they talk about gaps and such,
21:45:07 <mriedem> we should get some feedback in that session for sure
21:45:16 <mriedem> i'll find their etherpad and add this
21:45:32 <melwitt> cool, thank you
21:45:39 <takashin> Thanks.
21:46:16 <mriedem> https://etherpad.openstack.org/p/YVR-publiccloud-wg-brainstorming btw
21:46:40 <melwitt> okay, so we'll await the replies to the ML thread and gain more understanding of what the "stall out" problem is, whether large disk local storage is being used, and why shared storage or boot-from-volume isn't being used
21:47:52 <melwitt> anything else on that topic? anyone have any other topics for open discussion?
21:48:14 <dansmith> END IT
21:48:27 <melwitt> :)
21:48:34 <melwitt> going once
21:48:44 <melwitt> going twice
21:48:56 <melwitt> ...
21:49:13 <melwitt> #endmeeting