Thursday, 2018-11-29

gibi#startmeeting nova14:00
openstackMeeting started Thu Nov 29 14:00:04 2018 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.14:00
*** openstack changes topic to " (Meeting topic: nova)"14:00
openstackThe meeting name has been set to 'nova'14:00
*** abhishekk has left #openstack-meeting14:00
*** georgh has joined #openstack-meeting14:01
*** longkb has quit IRC14:01
gibilet's get started14:01
gibi#topic Release News14:01
*** openstack changes topic to "Release News (Meeting topic: nova)"14:01
* bauzas waves late14:01
gibi#link Stein release schedule:
gibinext milestone is 10th of January14:02
gibi#link Stein runway etherpad:
gibi#link runway #1: (jackding) [END 2018-12-04] (approved Nov 27, currently in the gate queue)14:02
gibi#link runway #2: (bauzas/naichuan) [END 2018-12-04]14:02
gibi libvirt: implement reshaper for vgpu (bauzas)14:02
gibi xenapi(N-R-P): support compute node resource provider update (naichuan)14:02
gibi#link runway #3 (yikun) [END 2018-12-10] starts here
gibihopefully the io semaphore item goes through CI soon and then one slot can be freed14:03
gibiany comments about release or runways?14:03
gibi#topic Bugs14:04
*** openstack changes topic to "Bugs (Meeting topic: nova)"14:04
gibiNo critical bugs14:04
gibi#link 54 new untriaged bugs (down 4 since the last meeting):
gibi#link 9 untagged untriaged bugs (up 2 since the last meeting):*&field.status%3Alist=NEW14:04
*** jamesmcarthur has quit IRC14:04
gibi#link bug triage how-to:
gibi#help need help with bug triage14:04
gibiany comments on the bug situation?14:05
*** arne_wiebalck_ has quit IRC14:05
cdentI'm trying to stir up some bug triage help internally, but everyone is busy :(14:05
gibithanks cdent14:05
cdentseems same story everywhere14:06
gibiseems so14:06
gibi#topic Gate status14:06
*** openstack changes topic to "Gate status (Meeting topic: nova)"14:06
gibi#link check queue gate status
gibithere are some mirror issues as far as I see14:06
gibiit seems is a heavy hitter14:07
gibi#link 3rd party CI status
gibithe link ^^ does not open for me14:07
gibiany comments about the gate or 3rd party CI?14:08
bauzasthat depends14:08
bauzasit can take long to open14:08
gibibauzas: it times out for me today14:08
bauzashmm, right14:09
efriedyeah, I heard whoever was maintaining it, wasn't anymore.14:09
efriedI (or someone) started a conversation in -infra, possibly with the thought of getting someone to take over / replace that with something equivalent14:09
*** bobh has joined #openstack-meeting14:09
efriedbut I'm not sure if that panned out.14:09
efriedThat was several weeks ago, before chaos.14:09
bauzasI have a very bad DSL connection at home, so I'm not really the right guy to tell whether it's a problem or not14:10
gibiefried: thanks for the info14:10
*** mjturek has joined #openstack-meeting14:10
gibi#topic Reminders14:10
*** openstack changes topic to "Reminders (Meeting topic: nova)"14:10
gibi#link Stein Subteam Patches n Bugs:
gibiany other reminders?14:10
gibi#topic Stable branch status14:11
*** openstack changes topic to "Stable branch status (Meeting topic: nova)"14:11
gibi#link stable/rocky:,n,z14:11
gibi#link stable/queens:,n,z14:11
gibi#link stable/pike:,n,z14:11
gibithere is a list of patches waiting for a second stable core14:11
*** sean-k-mooney has joined #openstack-meeting14:11
gibiany comment on stable?14:12
sean-k-mooneyhas there been any progress on the oslo.service issue?14:13
gmannFYI, nova-next does not run on queens. i have pushed the backport14:13
gmannand matt backported  the devstack patch14:13
gibisean-k-mooney: this is the last mail in the ML
gibisean-k-mooney: I did not follow the issue closely14:15
efriedI'm happy with any of the proposed solutions, so I'm sitting back and letting others sort it out.14:16
gibigmann: thanks14:16
sean-k-mooneyit looks like a new version is proposed for oslo.service to fix it
efriedbut I do feel kinda guilty for causing the whole debacle.14:16
sean-k-mooneyok i think we can assume that it is progressing14:18
gibiOK, moving on14:18
gibi#topic Subteam Highlights14:18
*** openstack changes topic to "Subteam Highlights (Meeting topic: nova)"14:18
gibiScheduler (efried)14:18
efried#link n-sch meeting minutes
efriedLots of specs need TLC from authors14:18
efriedExtraction is proceeding well. The14:18
efried#link devstack change
efriedhas merged since the meeting14:18
efried#link data migrations fix from tetsuro #link
efriedLots of small/easy placement changes that could use a look to clean things up.14:19
efried#link placement open patches
efried#link integrated template
efriedwhich has since merged14:19
efriedReshaper patches still need review14:19
efried#link libvirt reshaper
efried#link xen reshaper (middle of series)
efriedFFU framework for reshaper: we will consciously continue kicking this can down the road.14:19
efriedEducated tetsuro on flamethrowers14:19
gibiefried: thanks14:19
edleafeImportant education14:19
gibiAPI (gmann)14:20
*** longkb has joined #openstack-meeting14:20
gmannNo office hour this week14:20
gmannTriaged 5 bugs during that time.14:20
gmannUpdated the subteam tracking etherpad #link
gmannDetail status of this week #link
gibigmann: thanks14:20
gibiany other subteam report?14:20
gibi#topic Stuck Reviews14:21
*** openstack changes topic to "Stuck Reviews (Meeting topic: nova)"14:21
gibiwe have one on the agenda14:21
gibi(mriedem): Need to figure out what to do about the fix for gate bug 1798688: - do we whack the mole in the compute code or detect and retry in the scheduler?14:21
openstackbug 1798688 in OpenStack Compute (nova) "AllocationUpdateFailed_Remote: Failed to update allocations for consumer. Error: another process changed the consumer after the report client read the consumer state during the claim" [High,In progress] - Assigned to Matt Riedemann (mriedem)14:21
gibiThis is basically a race condition between shelve offload and unshelve14:22
gibidetected by the consumer generation14:23
gibiin placement14:23
gibiefried: had -1 on the code and I also left an alternative proposal this morning there14:23
gibithe currently proposed solution is a retry in the report client14:23
cdentI think I prefer option 2 as well. Without matt or dan here, it's hard to move along14:23
bauzasI don't have context14:24
gibicdent: yeah, without dansmith and mriedem it is not easy to discuss this item14:24
bauzashah, reading the commit msg14:24
georghi need a final approve for this change:
gibigeorgh: we can get back to that in Open Discussion14:25
gibiso I think I will leave this review in the agenda in the stuck reviews14:26
gibiand if nothing changes by next week then we can try to discuss it again14:26
bauzasI thought we said retries are fine when we have a generation issue ?14:26
bauzasif so, why not for an unshelve ?14:27
edleafebauzas: I was wondering the same thing14:27
efriedbauzas: that's too broad a statement.14:27
gibibauzas: blind retry felt like ignoring the consumer generation feature itself14:27
efriedyes, exactly.14:27
bauzasso we should check the generation bit, that's your concern ?14:27
gibito be correct the proposed solution is not totally blind14:27
gibibauzas: I personally would like to avoid the unshelve race if possible14:28
edleafeRetry with a GET to update the current state of the allocations14:28
edleafethat's what the idea of consumer gens was for14:28
*** rambo_li_ has joined #openstack-meeting14:28
efriedThe statement needs to be more like, "When you get a generation conflict, you need to re-GET the relevant pieces and re-execute the relevant logic, which may or may not amount to an exact retry."14:28
efriedYeah, what edleafe said.14:28
edleafeSo why is this case different?14:29
efriededleafe: I think it's the scope of the retry14:29
efriededleafe: I think the point is that some of the logic from the caller of this method would need to be included in the retry.14:29
*** njohnston has left #openstack-meeting14:29
*** mriedem has joined #openstack-meeting14:29
efriedI don't know that for sure in this case btw14:30
edleafeThe retry would be better in the code itself, not the reportclient, which should be generic14:30
cdentIs there anything wrong with gibi's proposal? because fixing a race is much nicer than handling a race14:30
*** ttsiouts has joined #openstack-meeting14:30
sean-k-mooneythis is the basis of the compare and swap idiom for concurrent modification of a shared data structure. the unshelve action would need to read the state but it should be able to retry14:30
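[Editor's sketch] The compare-and-swap idiom mentioned above, combined with the "re-GET and re-execute the relevant logic" rule efried stated, could look roughly like the following. This is a minimal in-memory illustration only: `AllocationStore`, `GenerationConflict` and `update_with_retry` are hypothetical names invented for this example and are not the placement API or nova's report client.

```python
import threading


class GenerationConflict(Exception):
    """Raised when the caller's generation no longer matches the store."""


class AllocationStore:
    """Hypothetical in-memory stand-in for a service that versions consumers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # consumer_id -> (generation, allocations)

    def get(self, consumer_id):
        """Return (generation, copy of allocations) for the consumer."""
        with self._lock:
            generation, allocations = self._data.get(consumer_id, (0, {}))
            return generation, dict(allocations)

    def put(self, consumer_id, allocations, generation):
        """Write only if the generation the caller read is still current."""
        with self._lock:
            current_gen, _ = self._data.get(consumer_id, (0, {}))
            if generation != current_gen:
                raise GenerationConflict(consumer_id)
            self._data[consumer_id] = (current_gen + 1, dict(allocations))
            return current_gen + 1


def update_with_retry(store, consumer_id, compute_new, max_attempts=3):
    """On conflict, re-read the state and re-run the caller's logic.

    compute_new is called again with the fresh allocations on every attempt,
    so this is not a blind replay of the original request.
    """
    for _ in range(max_attempts):
        generation, current = store.get(consumer_id)
        desired = compute_new(current)
        try:
            return store.put(consumer_id, desired, generation)
        except GenerationConflict:
            continue
    raise GenerationConflict(consumer_id)
```

The design question raised in the meeting is where the `compute_new` logic should live: if part of the caller's decision depends on the state that changed, the retry loop has to wrap that logic too, which is why a generic retry inside a shared client may be the wrong scope.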
edleafecdent: I'm not familiar enough with unshelve to know whether that would break other assumptions14:31
mdboothIs there any reason why you wouldn't make both changes?14:31
bauzasedleafe: unshelve is just a crazy scheduling call which literally recreates the instance14:32
sean-k-mooneymdbooth: if we have a retry mechanism too, it might hide it if we reintroduce the race after fixing it14:32
edleafebauzas: yeah, I get that. It's the 'crazy' part that I'm nervous about14:32
efriedmriedem: are you following / catching up?14:32
mriedemdid dst mess me up?14:33
*** eharney has joined #openstack-meeting14:33
mriedemthen no14:33
mriedemyou're talking about my stuck patch14:34
efriedmriedem: we're discussing the delete-consumer-race-on-shelving14:34
bauzasedleafe: for more details
mriedemas i said in the change, i could do the spot fix in the shelve code or the more generic fix in the scheduler, i opted for the latter14:34
mriedemit seems to me the scheduler code, before consumer aggregates, was already retrying on conflict14:34
mriedemso i followed that14:34
mriedemfixing where shelve removes the allocations will fix *this* race but i worry about others14:35
gibimriedem: I would fix the race in the shelve code as that would be a specific fix. the current proposal is a generic fix for more than just shelve - unshelve race14:35
efriedAgree with ^14:35
mriedemso people want both?14:36
gibimriedem: I'm afraid the retry would hide things we eventually want to fix14:36
mriedemok, i guess if there is majority agreement on at least the shelve fix i can do that and leave the generic fix to rot14:36
gibianybody against ^^ ?14:37
edleafenot I14:37
efriedDon't know about "rot". But it should be evaluated separately whether we can identify a proper scope for "always retry".14:37
efriedI'm just not convinced that that scope == this method.14:37
mriedemi just likely won't have the energy to do that evaluation14:37
efriedThat's fine.14:38
mriedemi already spent the better part of a day identifying this race14:38
gibiyeah, fix the known race14:38
efriedSeems like something gibi would have the energy for, nudge nudge :P14:38
mriedemgibi has bigger fish to fry14:38
gibiefried: :) I don't feel that way14:38
gibiOK so we agreed that mriedem will propose a fix for the shelve race14:39
gibimoving forward14:39
gibi#topic Open discussion14:39
*** openstack changes topic to "Open discussion (Meeting topic: nova)"14:39
gibi(mriedem): Looking for approval on this specless blueprint: - this was discussed in the Oct 25 meeting and there was agreement for no spec, but efried wanted to know the name of the config option which is in the blueprint now. So are we good to go?14:39
gibiefried: ^^ ?14:39
rambo_li_Hi all, I have some questions. One: the operator recorded in panko differs from the actual operator. For example, for the delete action, if we create the VM as user1 and delete it as user2, the panko event records user1 as the operator who deleted the VM, not the actual operator, user2.14:40
gibirambo_li_: lets take that after georgh14:40
efriedgibi, mriedem: Yup, good to go, thanks for the update.14:40
gibiefried: cool14:41
mriedemok so should i approve my own blueprint then?14:41
gibimriedem: I agree to approve it14:41
mriedemit's on the record14:41
gibi(gmann): Regarding migrating the gate jobs to Bionic- . I have tested existing zuulv3 nova jobs on bionic and all working fine. I have marked OK for nova in
gibigmann: thanks for the effort14:41
gibiis there anything we have to discuss about the bionic jobs?14:42
gmannnote- legacy jobs are not planned to migrate to bionic so they keep running on xenial until move to zuulv3 native14:42
mriedemwhen will infra drop xenial?14:42
mriedemwe have a lot of legacy jobs14:42
gmannit will not be dropped, as a few jobs still depend on it, like the keystone federation job etc14:43
gmannmriedem: i am planning to start the migration to zuulv3 one by one next week.14:43
mriedemok, i guess it can be a community wide goal in the future when it's a real problem14:43
mriedemor that14:43
gibiOK. georgh it is your turn14:44
georghthx, I'm looking for a final approval of
georghmelwitt asked Kashyap Chamarthy for help but his review didn't move the issue forward14:45
mriedemand sahid is gone from red hat,14:46
sean-k-mooneyyes i remember this patch. for what it's worth, i still think it's good14:46
georghIt's been stuck since then and from my point of view the change is finished.14:46
mriedemthere just aren't that many people familiar with this code14:46
efriedI pinged kashyap to have another look.14:46
georghI know that interactive serial consoles are an exotic topic14:46
sean-k-mooneymriedem: sahid is now at canonical but im not sure if he is working on openstack anymore14:46
*** kashyap has joined #openstack-meeting14:47
*** ttsiouts has quit IRC14:47
*** ttsiouts has joined #openstack-meeting14:48
efriedkashyap is looking14:48
gibiefried: thanks14:48
gibiI hope kashyap's reply will unblock this patch14:48
kashyapefried: Oh, this one.  Yes, need to load all the "console-related context", can do it tomm or early next week14:48
kashyapThanks for the ping14:48
georghok, thank you14:48
kashyapMy bad, the reply has been sitting since 02-Nov14:48
gibimoving forward14:48
*** ttsiouts has quit IRC14:48
efriedWe'll need a couple of cores once that happens. I'm afraid I have zero background in this area. Anyone?14:48
*** ttsiouts has joined #openstack-meeting14:49
sean-k-mooneyefried: stephenfin: should be back monday14:49
gibiI'm happy to read the patch and learn some of it14:49
mriedemmine is very thin14:49
efriedIf not, I wouldn't feel *too* bad about leaning on the expertise of sean-k-mooney and kashyap14:49
mriedemsomeone get markus zoeller out of retirement14:49
efriedif stephenfin is familiar, that would be good.14:49
gibireally moving forward14:49
gibirambo_li_: your turn14:50
efriedI wonder if rambo_li_'s issue is better for the main channel, outside of the meeting.14:50
efriedor even the ML14:51
gibirambo_li_: would it be OK for you to bring this up on #openstack-nova or on the mailing list?14:51
rambo_li_  ok,thank you,let's go to the next one14:52
gibirambo_li_: thanks14:52
rambo_li_Another one: when we resize/migrate an instance, if an error occurs on the source compute node, the instance state can currently roll back to active. But if an error occurs in the "finish_resize" function on the destination compute node, the instance state does not roll back to active. Is this a bug, or does anyone plan to change this?14:52
bauzasreset-state ?14:53
mriedemlikely because there isn't cleanup on finish_resize if an error occurs14:53
gibirambo_li_: if finish resize already destroyed something on the source then it is hard to roll back14:53
*** awaugama has joined #openstack-meeting14:53
mriedemdepending on where it happens, by the time you're in finish_resize there is a guest on the dest14:53
mriedemso putting the instance as ACTIVE isn't really valid in that case14:53
gibimriedem: agree14:53
efriedTL;DR: working as designed14:53
mriedemif not designed,14:54
mriedemnot a bug really,14:54
mriedemit would really be an RFE i think to make it more robust for rolling back14:54
mriedemif we cared to do that14:54
mriedemb/c rollback is hard14:54
gibiyeah it would be good to know why finish resize failed in the first place14:54
gibimaybe we can avoid that failure14:54
bauzaswhat exactly does finish_resize do?14:55
bauzassorry for the dumb question, I can read code but I'm old and lazy14:55
mriedemit finishes the resize14:55
gibirambo_li_: could you provide information about the failure during finish resize?14:56
mriedemsets up the disk and starts the guest14:56
mriedemon the dest host14:56
sean-k-mooneybauzas: it cleans up networks on the source node and a few other things14:56
rambo_li_oh,thank you,I will reconsider it14:56
bauzasok, I was asking this, because if that's just clean-up, you can still resurrect your instance ?14:56
*** longkb has quit IRC14:56
mriedemit's not cleanup14:57
rambo_li_when we finish the resize, the instance can't start on the dest node14:57
mriedemit's the thing right before the status goes to VERIFY_RESIZE14:57
bauzasok, nevermind, I'll look14:57
mriedemprep_resize (claim on dest) -> resize_instance (source, transfer disk to dest) -> finish_resize (setup disk on dest, start guest, do db magic)14:57
sean-k-mooneyoh ok my bad i thought it was the confirm step after that14:57
mriedemby the time you get to the finish_resize and it fails, you're in trouble14:58
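[Editor's sketch] The three-step flow mriedem outlined above can be written down as a small table plus a walker. This is a simplified model of what was said in the meeting, not nova code: the step names and the VERIFY_RESIZE status come from the discussion, while `run_resize`, `FAILURE_OUTCOME` and the `NOT_ROLLED_BACK` label are hypothetical, and reschedules are deliberately left out.

```python
# (step name, host it runs on, what it does), as described in the meeting.
RESIZE_STEPS = [
    ("prep_resize", "dest", "claim on the destination"),
    ("resize_instance", "source", "transfer disk to the destination"),
    ("finish_resize", "dest", "set up disk, start guest, update the DB"),
]

# Only the two failure cases discussed in the meeting are modelled.
# "NOT_ROLLED_BACK" is an illustrative label, not a real instance status.
FAILURE_OUTCOME = {
    "resize_instance": "ACTIVE",          # source-side error: rolled back
    "finish_resize": "NOT_ROLLED_BACK",   # guest may already exist on dest
}


def run_resize(fail_at=None):
    """Walk the steps; return (resulting status, steps that completed)."""
    if fail_at is not None and fail_at not in FAILURE_OUTCOME:
        raise ValueError("failure at %r was not discussed" % fail_at)
    completed = []
    for name, _host, _what in RESIZE_STEPS:
        if name == fail_at:
            return FAILURE_OUTCOME[name], completed
        completed.append(name)
    # All steps succeeded: the instance waits for confirm or revert.
    return "VERIFY_RESIZE", completed
```

For example, `run_resize("finish_resize")` shows that both earlier steps have already completed by the time the failure happens, which is the reason given in the meeting for why rolling back at that point is hard.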
efriedSomeone should do a flow diagram for that.14:58
*** lpetrut has quit IRC14:58
*** vishalmanchanda has quit IRC14:58
mriedemefried: i've got a dime with your name on it14:58
efriedI require a napkin sketch, like last time.14:58
mriedemoh and don't forget about reschedules in the flow diagram :)14:58
gibiwe have less than 2 minutes. I think robustifying finish_resize could be a valid feature request14:58
bauzasmriedem: oh shit, I also confused myself with confirm resize14:58
efriedUpdate on ci-watch:14:59
efriedThe maintainer of is gone and unreachable.14:59
efriedBut mmedvede has redeployed the code to:
efriedI have updated the Nova meeting agenda accordingly.14:59
mriedemsee i don't know shit about tcp consoles, but i know how the resize flow works14:59
gibiefried: thanks14:59
gibiwe have to close the meeting in seconds.14:59
rambo_li_ok, last one: I find live-resizing an instance important in a production environment. We have talked about it for many years and agreed on it at the Rocky PTG; then the author moved the spec to Stein, but there is no further information about this spec. Is there anyone to push the spec and implement it?  The link:
gibilet's continue on #openstack-nova14:59
*** openstack changes topic to "OpenStack Meetings ||"15:00
openstackMeeting ended Thu Nov 29 15:00:05 2018 UTC.  Information about MeetBot at . (v 0.1.4)15:00
openstackMinutes (text):
gmannthanks gibi15:00
*** redrobot has joined #openstack-meeting15:00
mriedemrambo_li_: please ask in #openstack-nova15:00
mriedemrambo_li_: btw, are you ji lie from unitedstack?15:01
cdentthanks gibi15:01
mriedem*li jie15:01
rambo_li_thank you for the rebuild spec15:02
mriedemi was looking for you on wechat but didn't know your nickname15:02
rambo_li_or you can send email to lijie@unitedstack.com15:03
