16:00:24 <gibi> #startmeeting nova
16:00:26 <openstack> Meeting started Thu Apr 15 16:00:24 2021 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:29 <openstack> The meeting name has been set to 'nova'
16:00:34 <gibi> o/
16:01:13 <ganso> o/
16:01:22 <lyarwood> o/
16:02:05 <elod> o/
16:02:08 <gibi> #topic Bugs (stuck/critical)
16:02:13 <gibi> No Critical bugs
16:02:16 <bauzas> \o
16:02:18 <gibi> #link 18 new untriaged bugs (+4 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:02:25 <gibi> It seems that the gerrit - launchpad integration started to work so bug status expected to be updated automatically from gerrit again.
16:02:29 <gibi> Details: #link http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2021-04-15.log.html#t2021-04-15T14:28:21
16:02:44 <dansmith> o/
16:02:59 <stephenfin> o/
16:03:05 <gibi> is there any specific bug we need to talk about?
16:03:24 <stephenfin> There's that gate bug, but I guess we'll get to it shortly
16:03:45 <gibi> yupp
16:03:57 <gibi> anything else on bug side?
16:04:37 <gibi> #topic Gate status
16:04:42 <gibi> We have a high failure rate in the live migration job due to #link https://bugs.launchpad.net/tempest/+bug/1924258
16:04:44 <openstack> Launchpad bug 1924258 in tempest "test_live_migration_with_trunk fails intermittently" [Undecided,In progress] - Assigned to Lajos Katona (lajos-katona)
16:05:01 <gibi> stephenfin: Is this you wanted to add? ^^
16:05:22 <gibi> it just got a WIP patch #link https://review.opendev.org/c/openstack/tempest/+/786465
16:05:23 <stephenfin> that's the one
16:06:06 <gibi> any other gate issue we should know about?
16:06:50 <gmann> o/
16:07:01 <gibi> #topic Release Planning
16:07:06 <gibi> Wallaby has been released
16:07:16 <gibi> Thank you all who made it happen! \o/
16:07:20 <gibi> Wallaby project update call happened today, the recording is available here: #link https://www.youtube.com/watch?v=tZ2bfdF0fOg
16:07:45 <gibi> We need two patches to land to fully open the master to Xena
16:07:48 <gibi> #link https://review.opendev.org/c/openstack/nova/+/782171
16:07:52 <gibi> #link https://review.opendev.org/c/openstack/nova/+/778923
16:08:03 <gibi> needs some eyes from cores ^^
16:08:18 <gibi> any other release info?
16:09:55 <stephenfin> I'm +2 on both of those now
16:10:02 <gibi> stephenfin: thanks!
16:10:05 <stephenfin> needs another +2 for https://review.opendev.org/c/openstack/nova/+/782171
16:10:05 <gibi> #topic PTG planning
16:10:16 <gibi> PTG is next week!
16:10:24 <gibi> topics: #link https://etherpad.opendev.org/p/nova-xena-ptg
16:10:26 <bauzas> I miss you all folks
16:10:32 <gibi> me too
16:11:00 * bauzas needs to prepare the PTG... by filling up his keg
16:11:15 <gibi> recent updates in the ptg schedule:
16:11:15 <gibi> A small neutron - nova cross project is booked for Friday 15:00 UTC, we have only one topic.
16:11:21 <gibi> Also there was a request for an interop session from Arkady and it is booked to Wednesday 14:00 UTC
16:11:35 <gibi> I did a minimal reorg on the nova topics but the basic rule that we well go from top to bottom and potentially defer topics if the author / expert is not available.
16:12:03 <gibi> if anybody has a topic that needs special timing then let me know and I will note it
16:12:08 <gibi> and try to schedule it
16:12:38 <gibi> ptg bot is up to date #link http://ptg.openstack.org/ptg.html
16:13:29 <gibi> next week we will skip the weekly meeting due to PTG
16:14:29 <gibi> anything else about next week and the PTG?
16:15:40 <gibi> #topic Stable Branches
16:15:45 <gibi> stable/wallaby is open for bug fix backports
16:15:49 <gibi> stable gates should be OK from Wallaby till Pike (stackviz post-task workarounds are merged)
16:15:52 <gibi> EOM
16:15:55 <gibi> thanks elod for the update
16:15:59 <elod> np
16:16:03 <gibi> anything else on stable?
16:16:18 <elod> nothing else from me
16:17:08 <gmann> +1
16:17:28 <bauzas> how are the stable branch jobs ?
16:18:00 <gmann> it should be green now after stackviz workaround and grenade stable/train fixes
16:19:17 <gmann> this on pike merged 5 days ago, so should be all green https://review.opendev.org/c/openstack/nova/+/723055
16:19:24 <elod> yes, as I saw they are OK, thanks for the fixes
16:20:04 <bauzas> thanks
16:20:33 <gibi> #topic Sub/related team Highlights
16:20:37 <gibi> Libvirt (bauzas)
16:21:01 <bauzas> well, mnaser filed a complaint about how bad I write libvirt features
16:21:52 <bauzas> so, now, in order to avoid jail, I have to work on https://bugs.launchpad.net/nova/+bug/1900800 and deliver appropriate backports
16:21:53 <openstack> Launchpad bug 1900800 in OpenStack Compute (nova) "VGPUs is not recreated on host reboot" [Low,Confirmed] - Assigned to Sylvain Bauza (sylvain-bauza)
16:21:54 <bauzas> that's it.
16:21:59 <gibi> I saw you found a way forward about it
16:22:15 <bauzas> (this was a joke, to be 100% clear)
16:22:22 <bauzas> moving on
16:22:59 <gibi> #topic Open discussion
16:23:02 <lassimus87> I'd like to discuss adding support for guests with arch != host arch. I'm tracking @stephenfin and others topics in ptg etherpad. I have a working concept here: https://review.opendev.org/c/openstack/nova/+/772156. I've been afk for the last few weeks waiting out the wallaby release period, so I have some new merge conflicts to resolve. I'm
16:23:03 <lassimus87> bringing it up here because there seems to be differing opinions on the direction of nova regarding emulation support.
16:23:45 <gibi> lassimus87: could you open up what are the differing options?
16:24:10 <bauzas> this rings me a bell.
16:24:18 <lassimus87> "Only conflict is if we ever wanted to support non-host guests using these architectures but that seems to be a non-goal of nova?(maybe we shoudl rephsase this as dropping supprot for 32bit hosts) that might be a valid cross project goal." --@stephenfin from the xena ptg notes
16:24:21 <bauzas> belmoreira was interested in this, if I recall correctly
16:24:49 <lassimus> okay I got my nick back :)
16:25:20 <stephenfin> I think this would be worth discussing at the PTG, if possible
16:25:50 <stephenfin> My point there is that we tend to conflate host and guest architectures, since we've haven't support host != guest architecture for some time now
16:26:05 <lassimus> I'm fine waiting to discuss until the PTG. I'm new to the nova dev community, so I didn't want to add discussion directly to etherpad
16:26:28 <gibi> lassimus: feel free to add discussion to the ptg etherpad
16:26:42 <stephenfin> We have checks for things like MIPS, which no new hardware is being made for, which means we could drop the host architecture support, but I'm not sure if we can drop the guest architecture support
16:26:50 <stephenfin> so yeah, a good PTG topic
16:27:08 <gibi> lassimus: will you be able to join us during the PTG next week?
16:27:17 <lassimus> awesome. I'll add some thoughts on etherpad, and I look forward to hashing it out next week
16:27:18 <lassimus> yes
16:27:43 <gibi> lassimus: I will make sure to ping you when we reach this topic next week
16:28:00 <lassimus> perfect
16:28:12 <gibi> I do feel that we had guest != host request from CERN so you are not alone
16:28:30 <bauzas> right, hence my courtesy ping to belmoreira :)
16:28:36 <gibi> bauzas: ++
16:28:40 <lassimus> my customer is the Georgia Cyber Center, and some other minor interested parties
16:29:05 <gibi> lassimus: cool
16:29:12 <bauzas> that's a reasonable ask, but we need to discuss the design
16:29:12 * artom thought there was a massive emulation performance penalty on that, last time he checked
16:29:19 <artom> Though I guess it depends on the specific arch's
16:29:27 <bauzas> artom: yup, from my recollection
16:29:38 <lassimus> performance isn't always the goal
16:29:39 <bauzas> but there are good reasons now to mix them up
16:29:47 <artom> So I am a bit curious what use cases don't mind the perf hit
16:29:55 <bauzas> spec up !
16:29:56 <bauzas> :p
16:30:14 <bauzas> unkidding, sounds a good PTG discussion
16:30:18 <gibi> artom: I can imagine a CI system functional testing arch specific app in a cheap way
16:30:34 <gibi> ppc tend to be expensive
16:30:39 <lassimus> yeah, I'm happy to brain dump here, but it seems like a better fit for the PTG
16:30:51 <gibi> sure, lets do the braindumping next week
16:31:01 <gibi> any other topic for today?
16:31:09 <ganso> gibi: o/ my topic is on the agenda
16:31:26 <gibi> ganso: ohh, I missed that, please tell us
16:31:38 <ganso> topic: update on bug 1821755 (ganso)
16:31:39 <openstack> bug 1821755 in OpenStack Compute (nova) "live migration break the anti-affinity policy of server group simultaneously" [Medium,In progress] https://launchpad.net/bugs/1821755 - Assigned to Boxiang Zhu (bxzhu-5355)
16:31:57 <ganso> so, I've brought this up in a meeting previously about addressing this bug
16:32:17 <ganso> but I leaned towards a redesign of the (anti-)affinity functionality in placement
16:32:56 <ganso> I spent a significant amount of effort on that and hit several struggles. It can be done, but the amount of work and complexity has increased far beyond what I initially estimated
16:33:08 <bauzas> affinity in placement is a can of worms
16:33:26 <ganso> bauzas: yea, looks like I hit some of those worms xD
16:33:29 <ganso> therefore I decided to take a step back and try a simpler alternative
16:33:30 <artom> Angry worms, with teeth and spikes and venomous stingers
16:33:33 <bauzas> I thought we said we should model the affinity between RPs as a distance between them
16:33:39 <ganso> voi-la https://review.opendev.org/c/openstack/nova/+/784166
16:34:10 <ganso> basically I took inspiration from a previous attempt on solving the bug (https://review.openstack.org/651969)
16:34:16 <ganso> and did some things differently
16:34:30 <bauzas> ganso: I discover the bug, what's the problem ?
16:34:34 <ganso> in my testing I was not able to reproduce the issue for anti-affinity any longer
16:35:08 <ganso> bauzas: sorry I didn't understand your question?
16:35:10 <bauzas> don't we have the late affinity check ?
16:35:23 <bauzas> on the compute service
16:35:27 <gibi> bauzas: I think this patch now adds the late affinity check for live migration
16:35:35 <bauzas> or have we removed it?
16:35:35 <ganso> bauzas: oh ok, so the problem is that there are race conditions that violate the policy when doing concurrent migrations (live or cold)
16:35:36 <artom> bauzas, yeah, we only have that check for boot, and no other move operation
16:35:52 <bauzas> artom: really? I'm surprised
16:36:01 <artom> I'm not :P
16:36:15 <ganso> bauzas: the existing check worked only for when creating instances, and it doesn't account for instances are being migrated to that host
16:36:34 <bauzas> actually reading mriedem's comment
16:36:58 <bauzas> okay, this sounds a decent review request then
16:37:12 <gibi> yeah, I queued it up to my review list
16:37:12 <artom> I mean, I agree the "correct" way to do it would be placement
16:37:26 <artom> Anything else is hax.
16:37:36 <artom> OTOH, the former is hard, and the latter is much quicker and easier
16:37:38 <gibi> and I agree that affinity and placement is a hard topic so I have no problem having a hax in the meantime
16:37:47 <ganso> there is a lengthy discussion on the gerrit page, I tried to address all comments with as much detail as I can to move this forward. There is also a summary of my work in the redesign in one of the comments (the lenghiest one)
16:38:26 <ganso> gibi, artom: so, the patch I'm proposing only addresses anti-affinity, not affinity
16:38:45 <gibi> ganso: thanks for picking this work up I will try to get to it tomorrow
16:38:56 <ganso> those are 2 different can of worms, and I found the affinity ones to be the most venomous ones :P
16:38:56 <bauzas> ganso: which is what the late-affinity check is doing
16:39:27 <bauzas> violating the affinity policy isn't a race for a single compute
16:39:42 <ganso> doing both through placement is a huge amount of work. Doing just one leaves things hanging and incomplete, possibly requiring another redesign for implementing the other one
16:39:46 <bauzas> you just don't see the race as both instances are spreaded
16:40:15 <bauzas> ganso: I totally agree and I fundamentally disagree with sean-k-mooney's objection :)
16:40:30 <bauzas> sad he isn't here :)
16:40:33 <ganso> bauzas: yea I use 5 computes in my lab, makes it much easier to reproduce and visualize the violations
16:41:10 <ganso> that's all I had. Thanks all and looking forward to your reviews :)
16:41:21 <gibi> ganso: thanks for working on the bug
16:41:23 <bauzas> yup
16:41:32 <gibi> anything else for today?
16:42:53 <gibi> if not then I thank all of you to join today
16:43:14 <gibi> I will have glass of vine for the wallaby release. thanks again to make that happen
16:43:20 <gibi> not the wine, the release :D
16:43:56 <gibi> #endmeeting