Thursday, 2019-09-12

*** adriant has quit IRC00:11
*** lbragstad_ has joined #openstack-nova00:12
*** lbragstad has quit IRC00:14
*** igordc has quit IRC00:23
*** markvoelker has quit IRC00:29
*** markvoelker has joined #openstack-nova00:29
*** hemna has quit IRC00:36
*** adriant has joined #openstack-nova00:42
*** henriqueof has joined #openstack-nova00:45
*** henriqueof1 has quit IRC00:46
*** ivve has quit IRC00:47
*** ricolin has joined #openstack-nova00:48
*** gyee has quit IRC00:56
*** henriqueof1 has joined #openstack-nova00:56
*** henriqueof has quit IRC00:57
dolpheralex_xu: good evening01:03
*** markvoelker has quit IRC01:03
*** markvoelker has joined #openstack-nova01:04
*** henriqueof1 has quit IRC01:05
*** henriqueof has joined #openstack-nova01:05
*** hemna has joined #openstack-nova01:09
*** pcaruana has quit IRC01:09
*** slaweq has joined #openstack-nova01:11
*** brinzhang_ has joined #openstack-nova01:13
*** slaweq has quit IRC01:16
*** brinzhang has quit IRC01:17
openstackgerritweibin proposed openstack/nova master: Add support for using ceph RBD ereasure code  https://review.opendev.org/68118801:20
yaawanggood morning01:21
*** Tianhao_Hu has joined #openstack-nova01:22
*** Tianhao_Hu has quit IRC01:22
openstackgerritBrin Zhang proposed openstack/nova master: Add user_id and project_id colume to Migration  https://review.opendev.org/67399001:32
openstackgerritBrin Zhang proposed openstack/nova master: Add user_id and project_id colume to Migration  https://review.opendev.org/67399001:35
*** dklyle has quit IRC01:42
*** david-lyle has joined #openstack-nova01:42
*** hemna has quit IRC01:43
*** TxGirlGeek has quit IRC01:45
*** david-lyle has quit IRC01:46
*** dklyle has joined #openstack-nova01:47
alex_xudolpher: yaawang good evening and good morning01:54
*** yedongcan has joined #openstack-nova02:09
*** slaweq has joined #openstack-nova02:11
openstackgerritBrin Zhang proposed openstack/nova master: Set user_id/project_id from context when creating a Migration  https://review.opendev.org/67941302:11
*** slaweq has quit IRC02:16
*** dannins has joined #openstack-nova02:20
*** hemna has joined #openstack-nova02:43
*** tinwood has quit IRC02:50
*** tinwood has joined #openstack-nova02:52
dolpherERROR: No matching distribution found for configparser===4.0.103:01
dolpherLooks like configparser 4.0.1 is not available from PyPI03:02
openstackgerritLuyao Zhong proposed openstack/nova master: db: Add resources column in instance_extra table  https://review.opendev.org/67844703:12
openstackgerritLuyao Zhong proposed openstack/nova master: object: Introduce Resource and ResourceList objs  https://review.opendev.org/67844803:12
openstackgerritLuyao Zhong proposed openstack/nova master: Add resources dict into _Provider  https://review.opendev.org/67844903:12
openstackgerritLuyao Zhong proposed openstack/nova master: Retrieve the allocations early  https://review.opendev.org/67845003:12
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845203:12
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845303:12
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845403:12
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845503:12
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845603:12
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964003:12
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847003:12
*** hemna has quit IRC03:16
*** Garyx_ has joined #openstack-nova03:33
*** Garyx has quit IRC03:33
*** mkrai has joined #openstack-nova03:41
*** mkrai has quit IRC03:46
*** hemna has joined #openstack-nova03:48
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964004:03
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847004:03
*** KeithMnemonic has quit IRC04:04
*** slaweq has joined #openstack-nova04:11
*** slaweq has quit IRC04:16
openstackgerritMerged openstack/nova stable/stein: doc: Fix a broken reference link  https://review.opendev.org/68114104:17
openstackgerritMerged openstack/nova stable/rocky: doc: Fix a broken reference link  https://review.opendev.org/68114204:17
openstackgerritMerged openstack/nova stable/queens: doc: Fix a broken reference link  https://review.opendev.org/68114304:17
*** spatel has joined #openstack-nova04:17
*** hemna has quit IRC04:23
*** sapd1_x has joined #openstack-nova04:23
*** ricolin has quit IRC04:27
*** udesale has joined #openstack-nova04:31
*** pcaruana has joined #openstack-nova04:32
*** Luzi has joined #openstack-nova04:46
*** markvoelker has quit IRC04:48
*** dave-mccowan has quit IRC04:57
*** pcaruana has quit IRC04:58
*** spatel has quit IRC04:58
*** yedongcan has quit IRC05:03
*** mkrai has joined #openstack-nova05:09
*** pots has joined #openstack-nova05:10
*** sapd1_x has quit IRC05:13
*** ash2307 has quit IRC05:13
*** ash2307 has joined #openstack-nova05:15
*** hemna has joined #openstack-nova05:15
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847005:19
*** jaosorior has joined #openstack-nova05:20
*** ratailor has joined #openstack-nova05:30
*** ccamacho has quit IRC05:33
*** efried_zzz has quit IRC05:34
*** hamzy_ has quit IRC05:35
*** brinzhang has joined #openstack-nova05:48
*** brinzhang_ has quit IRC05:51
*** slaweq has joined #openstack-nova05:53
*** slaweq has quit IRC05:59
*** tkajinam has quit IRC06:00
*** mkrai has quit IRC06:01
*** aloga has quit IRC06:03
*** markvoelker has joined #openstack-nova06:06
*** tkajinam has joined #openstack-nova06:06
*** pcaruana has joined #openstack-nova06:08
*** markvoelker has quit IRC06:11
*** slaweq has joined #openstack-nova06:13
*** TxGirlGeek has joined #openstack-nova06:15
*** TxGirlGeek has quit IRC06:17
*** hemna has quit IRC06:18
openstackgerritLuyao Zhong proposed openstack/nova master: Retrieve the allocations early  https://review.opendev.org/67845006:28
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845206:28
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845306:28
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845406:28
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845506:28
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845606:28
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964006:28
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847006:28
*** ccamacho has joined #openstack-nova06:35
*** rpittau|afk is now known as rpittau06:36
*** mkrai has joined #openstack-nova06:39
*** pcaruana has quit IRC06:41
*** ricolin has joined #openstack-nova06:41
alex_xustephenfin: sean-k-mooney the fallback case I understand, https://etherpad.openstack.org/p/pcpu_fallback06:41
*** maciejjozefczyk has joined #openstack-nova06:43
gibigood morning nova06:45
alex_xugibi: good morning06:46
*** tkajinam_ has joined #openstack-nova06:53
*** tkajinam has quit IRC06:56
*** ricolin has quit IRC06:59
*** damien_r has joined #openstack-nova07:00
*** ralonsoh has joined #openstack-nova07:00
*** brault has joined #openstack-nova07:04
openstackgerrithutianhao27 proposed openstack/nova master: Fix bug directory left after cold migration  https://review.opendev.org/68165207:05
*** ivve has joined #openstack-nova07:07
*** spsurya has joined #openstack-nova07:07
*** pcaruana has joined #openstack-nova07:11
bauzasgood morning nova07:13
openstackgerritLuyao Zhong proposed openstack/nova master: Retrieve the allocations early  https://review.opendev.org/67845007:15
*** pierreprinetti has quit IRC07:15
*** trident has quit IRC07:15
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845207:15
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845307:15
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845407:15
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845507:15
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845607:15
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964007:15
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847007:15
*** awalende has joined #openstack-nova07:17
*** ricolin has joined #openstack-nova07:18
*** ralonsoh has quit IRC07:23
*** ralonsoh has joined #openstack-nova07:24
*** ralonsoh has quit IRC07:24
*** trident has joined #openstack-nova07:24
*** ralonsoh has joined #openstack-nova07:24
*** trident has quit IRC07:28
openstackgerritBrin Zhang proposed openstack/nova master: Filter migrations by user_id/project_id  https://review.opendev.org/67424307:30
*** ralonsoh has quit IRC07:32
*** ralonsoh has joined #openstack-nova07:32
*** avolkov has joined #openstack-nova07:36
*** ttsiouts has joined #openstack-nova07:37
*** trident has joined #openstack-nova07:37
*** dolpher has quit IRC07:41
*** hamzy has joined #openstack-nova07:42
gibido we have a solution for the gate failure: ERROR: Could not find a version that satisfies the requirement configparser===4.0.107:50
gibi?07:50
alex_xugibi: I saw this https://review.opendev.org/#/c/681630/07:50
alex_xuso it's supposed to work now?07:51
gibialex_xu: thanks, then I'll go and recheck07:52
*** takashin has left #openstack-nova08:02
*** shilpasd has joined #openstack-nova08:07
*** mkrai has quit IRC08:08
*** tkajinam_ has quit IRC08:09
*** mkrai has joined #openstack-nova08:10
*** cdent has joined #openstack-nova08:11
*** shilpasd has quit IRC08:11
*** hemna has joined #openstack-nova08:14
*** markvoelker has joined #openstack-nova08:16
*** threestrands has quit IRC08:16
lyarwoodHave we cut M3 yet? I've been holding off for a while on posting bugfixes to avoid clogging up the gate.08:17
lyarwoodwait, the deadline is today, ignore me.08:18
gibilyarwood: yepp, today is still crazy FF day08:18
lyarwoodack thanks I'll hold off in that case08:19
*** markvoelker has quit IRC08:20
*** mdbooth has joined #openstack-nova08:20
openstackgerritBalazs Gibizer proposed openstack/nova master: Make SRIOV computes non symmetric in func test  https://review.opendev.org/68166708:24
*** derekh has joined #openstack-nova08:28
stephenfinalex_xu: Looking08:34
stephenfinalex_xu: That looks correct. I started work on the fix for that yesterday but haven't finished. It only happens if you have two compute nodes though, right?08:35
alex_xustephenfin: it can happen with 100 nodes, if all 100 nodes are returned as placement allocation candidates but fail in later filtering08:36
stephenfinboth cases?08:37
alex_xustephenfin: yes, see line 2508:37
alex_xustephenfin: I think the root cause is that08:37
stephenfinbut it'll only happen if we attempt to use the same host08:38
alex_xuI don't think I figured out the root cause yesterday; it isn't just about resize08:38
alex_xuwhat about case 108:38
alex_xustephenfin: I think it's any case where placement can return candidates but the nova scheduler filters refuse the request.08:39
alex_xuthen we won't fall back08:39
stephenfintrue08:41
stephenfinhmm08:41
stephenfinI'm still tempted to think it's not a huge issue. You'd need an awful lot to go against you08:42
alex_xustephenfin: but we need to think about whether we can hit such a worst case, like 100 nodes failing at the filters rather than in placement08:42
stephenfinNamely that you're in the middle of an upgrade and you've got a lot of failures in the compute node08:42
alex_xuI guess it will be bad at the very beginning of the upgrade, like when you only have 5 or 10 nodes on T08:42
stephenfinI'd be tempted to just add a workaround option08:42
stephenfinThat, when enabled, will trigger "request VCPU by default"08:43
stephenfinso if someone ran into this in the middle of an upgrade, they'd set the config option and we wouldn't request PCPU until they've a few more compute nodes upgraded08:43
stephenfinIt might be overkill though08:43
alex_xuok, the operator can switch that when he's confident he has enough PCPU nodes08:43
stephenfinyeah08:43
*** sapd1 has quit IRC08:44
alex_xustephenfin: I guess someone may say we're adding more work for the operator08:44
alex_xustephenfin: how difficult would it be to fall back across the whole placement+filtering08:44
stephenfinI'm looking now08:44
*** jaosorior has quit IRC08:45
alex_xuI'm thinking we don't want to fall back all the time; how do we check that condition when the filters fail?08:46
*** hemna has quit IRC08:47
*** IvensZambrano has joined #openstack-nova08:48
stephenfinalex_xu: Okay, I think it should be possible. Leave it with me :)08:48
alex_xustephenfin: so cool :)08:49
bauzasalex_xu: stephenfin: I looked at your convo, can you please summarize for me why the filters would differ from placement?08:50
alex_xubauzas: here are two cases I found https://etherpad.openstack.org/p/pcpu_fallback08:50
alex_xumaybe more08:50
stephenfinbauzas: We try to get PCPU from placement, and if that returns nothing we try for VCPU08:51
stephenfinIt's to minimize the upgrade impact08:51
stephenfinHowever, if the first request does return some stuff, but those things fail the filters, we don't do the fallback for VCPU08:51
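[Editor's sketch] The gap stephenfin describes here can be illustrated with a few lines of Python. This is not nova's scheduler code: `schedule`, `query_placement`, the filter callback, and the candidate dicts are hypothetical stand-ins chosen only to show why a fallback that triggers on an empty placement response never fires when the filters are what reject the hosts.

```python
def schedule(query_placement, passes_filters):
    """Retry with VCPU only when the PCPU query comes back empty."""
    candidates = query_placement("PCPU")
    if not candidates:
        # The fallback happens only here, when placement returns nothing.
        candidates = query_placement("VCPU")
    # Filters run afterwards; if they reject everything, there is no second try.
    return [c for c in candidates if passes_filters(c)]


# Hypothetical cloud: one upgraded host reporting PCPU, one older host reporting VCPU.
inventory = {
    "PCPU": [{"host": "new-host", "resource": "PCPU"}],
    "VCPU": [{"host": "old-host", "resource": "VCPU"}],
}


def query_placement(resource_class):
    return list(inventory[resource_class])


def rejects_new_host(candidate):
    # Stand-in for a scheduler filter that refuses the upgraded host.
    return candidate["host"] != "new-host"


# Placement returned a PCPU candidate, the filter rejected it, and the usable
# VCPU host was never queried, so the result is empty.
result = schedule(query_placement, rejects_new_host)
```

In this toy setup `old-host` would have passed the filter, but it is never considered because the PCPU query was not empty.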
bauzaswell, I still don't get it08:51
bauzasthat's surely me, but if the operator said "okay, let's go with PCPU", that's because he opted-in, right?08:52
stephenfinthey've opted in by setting '[compute] cpu_dedicated_set' on some compute nodes, yes08:52
bauzasbut we said that it wouldn't provide PCPU inventories until all computes are opted in, right?08:53
* bauzas looks back at the spec08:53
alex_xubauzas: no, the upgrade plan is changed08:53
bauzasactually, we send new inventories after opting in with the conf opt, you're right08:53
bauzasI remember +2ing this08:53
bauzasalex_xu: stephenfin: honestly, I feel it's hard to manage both at the same time, and I voiced it a couple of times in the spec review08:54
alex_xubauzas: here is the thing that manages both at the same time https://review.opendev.org/#/c/671801/46/nova/scheduler/manager.py@17408:55
*** priteau has joined #openstack-nova08:56
openstackgerritLuyao Zhong proposed openstack/nova master: Retrieve the allocations early  https://review.opendev.org/67845008:57
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845208:57
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845308:57
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845408:57
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845508:57
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845608:57
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964008:57
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847008:57
stephenfinbauzas: I'm not sure what else there is to do though. We have to do this, because no one has an alternative08:57
stephenfinand we need to get to a future with PCPU. We can't just keep kicking the can down the road :(08:57
bauzasI don't disagree with you08:57
bauzasI'm just trying to make sure we don't shoot ourselves in the feet08:58
bauzasthat whole upgrade plan gave me brainaches when I was reviewing the spec08:58
stephenfinoh, I've been working plenty hard to make sure we don't do that :)08:58
stephenfinchanging the world (as far as placement is concerned) is hard08:58
stephenfinas is getting things through the gate right now, it seems :D08:59
stephenfinso many false negatives...08:59
*** efried has joined #openstack-nova09:00
efriedo/ nova09:00
*** ociuhandu has joined #openstack-nova09:00
bauzas...09:00
efriedThe gate appears to be well and truly f'ed.09:00
stephenfinYes. Yes it does.09:00
efriedThere was a configparser thing, which may be fixed?09:01
efriedBut I haven't seen any successful runs even since that happened09:01
stephenfinI'm not sure. Takashi-san (ignorance time: should you use the first name or second?) has been furiously rechecking some Mox -> mock and it's failed every time with a different error09:02
bauzaslemme look at logstash09:02
bauzasany bug tracking this ?09:02
bauzasany exception snippet I could use ?09:02
stephenfinbauzas: Not really. It's a simple dependency issue https://zuul.opendev.org/t/openstack/build/f4c728cbb77e48b7a149ea952d2ca2ec/log/job-output.txt#86209:03
bauzasthat's enough for looking at occurrences09:03
*** udesale has quit IRC09:04
* bauzas is a bit rusty with logstash but this should be doable09:04
stephenfinit was hitting anything using Python 2.7. Here's the functional test from the same change (the other one was the py27 unit test) https://zuul.opendev.org/t/openstack/build/f862fb763ee54e96b0a844eaf6f00b57/log/job-output.txt#94409:04
stephenfinI guess they dropped support for Python 2 too \o/09:04
*** ociuhandu has quit IRC09:04
*** udesale has joined #openstack-nova09:07
luyaostephenfin: I see you may be busy with your own patch; I'd very much appreciate it if you could look at the vpmems series again when you get time. The NUMA issue is fixed and CI for vpmems works well; the tests passed on the last two patches.09:08
stephenfinluyao: Yup, don't worry, I'll get to it before the end of my day09:09
stephenfinShould be a straightforward +2 at this point, I'd imagine09:09
luyaostephenfin: many thanks! :D09:09
bauzashttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ERROR%3A%20Could%20not%20find%20a%20version%20that%20satisfies%20the%20requirement%20configparser%5C%2209:09
bauzaslooks like it's solved ^09:09
efriedstephenfin: meanwhile, what's the status of cpu-resources? Ready for a re-look?09:09
stephenfinefried: one patch, yes, but alex_xu has some concerns around the other patch09:10
efriedbauzas: are you savvy on the quota issue or do we need to wait for Dan?09:10
alex_xuefried: yes, the only https://review.opendev.org/#/c/671801/, I +2 all other patches09:10
stephenfinefried: In short, the way we're doing the retry is to only retry if placement doesn't provide any matches09:10
stephenfinHowever, placement could return some matches and the NUMATopologyFilter (or other filters) could reject those09:11
efriedugh09:11
stephenfinSo alex_xu is suggesting that we should put both the request to placement and the request to the filter scheduler inside the try-try again logic09:11
bauzasefried: well, I'm not a quota specialist, but looks like melwitt's concerns were addressed09:11
bauzasefried: I can take a look on it09:11
alex_xuefried: may help you understand the problem https://etherpad.openstack.org/p/pcpu_fallback09:12
efriedstephenfin: or what we could do is just do both GET /a_c queries, cat the vcpu ones after the pcpu ones, and run the filter once.09:12
bauzaswe are technically sharding the capacity, right?09:12
stephenfinefried: I thought of that but doesn't the filter scheduler shuffle the allocation candidates?09:13
*** ralonsoh has quit IRC09:13
efriedplacement does the shuffle09:13
stephenfinohhhh09:13
efriedI don't know if the filter scheduler also does09:13
*** FlorianFa has quit IRC09:13
efriedstephenfin: this scenario only applies if PCPUs were requested, right?09:14
efriedMeaning if you have the granny switch flipped *and* you're on a flavor that's requesting dedicated?09:14
efriedDoing a double GET /a_c would be expensive, so it would be nice if we were only doing it in a corner case.09:15
*** hemna has joined #openstack-nova09:15
*** lpetrut has joined #openstack-nova09:16
*** rcernin has quit IRC09:16
gibiefried: if there is more than one candidate for a single host then the scheduler will only look for the first candidate (per host)09:18
*** ociuhandu has joined #openstack-nova09:19
openstackgerritEric Fried proposed openstack/nova master: Support reverting migration / resize with bandwidth  https://review.opendev.org/67614009:19
gibiefried: that will complicate my work a bit ^^09:20
*** shilpasd has joined #openstack-nova09:20
efriedgibi: just a rebase, shouldn't affect anything09:21
*** ociuhandu has quit IRC09:21
*** ociuhandu has joined #openstack-nova09:21
efriedgibi: only one candidate per host -- yeah, that kills it (<== stephenfin)09:21
gibiefried: I have ongoing work in that series locally, so I have to restack on top of the rebase09:23
gibiefried: I will manage09:23
efriedoh shit, I'm sorry gibi09:23
luyaosean-k-mooney: Hi Sean, are you around.09:23
gibiefried: no worries, I think I can do some interactive rebase magic09:23
efriedgibi: I didn't rebase anything except the first patch09:23
*** tbachman has quit IRC09:23
efriedso you may just be able to proceed as you were09:23
*** lpetrut has quit IRC09:23
gibiefried: ack. I will rebase my work on top of that09:24
efriedI was just trying to do what little I could to get the gate moving09:24
gibiefried: sure, I understand your motives09:24
efriedthat patch had already failed py37 on the f'in innodb thing09:24
stephenfinefried, gibi: That shouldn't matter, should it?09:24
*** lpetrut has joined #openstack-nova09:24
stephenfinI mean, if a host is reporting PCPU then by definition we shouldn't try using it for VCPU09:25
efriedstephenfin: It makes it a very imperfect solution, but I guess it still narrows the window09:25
efriedoh09:25
gibiefried: no worries. I guess I overreacted09:25
*** tbachman has joined #openstack-nova09:25
efriedthat's a good point stephenfin... I think09:25
stephenfinthe crucial thing is does the filter scheduler shuffle hosts?09:25
stephenfin*shuffle allocation candidates09:25
gibistephenfin: I see. So either there will be more than one candidate, and then we are OK with the first, or there will only be a candidate with VCPU, and then we fall back to that09:25
*** mkrai has quit IRC09:26
gibistephenfin: there is some processing involved let me find a link09:26
efriedgibi: actually, I'm not sure if you had hoped to merge more in that series than is already +Wed, but it might be best to wait until after FF to do *anything* else. Esp since gate resources are going to be really scarce for the foreseeable future.09:26
gibiefried: Matt and I have hopes for the series. He said he might be able to +2 the rest today. But I accept the bad news if the gate is sad09:27
*** shilpasd has quit IRC09:27
stephenfingibi: I think so. Either we'll return a candidate for host from the PCPU query (host has 'cpu_dedicated_set' configured)...09:27
stephenfinor we'll return a candidate for a host from the VCPU query (host has 'vcpu_pin_set' or 'cpu_shared_set' configured or nothing)...09:28
gibistephenfin: https://github.com/openstack/nova/blob/5fa49cd0b8b6015aa61b4312b2ce1ae780c42c64/nova/scheduler/manager.py#L17609:28
stephenfinor we'll return the same candidate for both (host has 'cpu_dedicated_set' *and* 'cpu_shared_set') but ignore the one from the VCPU request09:28
gibistephenfin: sound OK to me09:28
stephenfinbecause the PCPU query was done first09:28
stephenfinschweet09:28
gibistephenfin: but look at the link. We build dicts :/09:28
* stephenfin clicks09:28
bauzasstephenfin: gibi: efried: alex_xu: seriously, I'm still continuing to consider we overthink the problem09:29
gibistephenfin: hm, that dict is not a problem; the allocation requests are still in a list09:29
efriedgibi: but only per host09:30
bauzasI mean, once the operator opts in, he literally shards his cloud in two09:30
gibiefried: but the fallback is needed per host, isn't it?09:30
bauzasif we say the opt-in is per compute, that means the sharding can be long09:30
efriedgibi: actually, maybe so.09:30
bauzasthat's why the more I see our convos, the more I think it would be best to just consider a global option that would dictate all computes sending PCPU inventories09:31
gibistephenfin: we take the first ar per host https://github.com/openstack/nova/blob/5fa49cd0b8b6015aa61b4312b2ce1ae780c42c64/nova/scheduler/filter_scheduler.py#L23809:31
efriedgibi, stephenfin: as long as it's okay if we try has-only-VCPU candidates before we try a has-PCPU candidate09:31
stephenfinefried: It's not an issue. As noted, the NUMATopologyFilter or virt driver will fail that request09:31
efried...which should still be fine, because the ntf will remove... yeah09:31
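[Editor's sketch] The alternative discussed above (run both GET /a_c queries, put the VCPU results after the PCPU results, then keep only the first candidate per host) might look roughly like the following. The function name and the candidate dict shape are hypothetical and are not taken from nova; the sketch only shows why a host that appears in both result sets ends up represented by its PCPU candidate.

```python
def merge_candidates(pcpu_candidates, vcpu_candidates):
    """Concatenate PCPU results ahead of VCPU results, keeping the first entry per host."""
    first_per_host = {}
    for candidate in pcpu_candidates + vcpu_candidates:
        # setdefault keeps the earliest entry, so a PCPU candidate wins over a
        # VCPU candidate for the same host. Dicts preserve insertion order.
        first_per_host.setdefault(candidate["host"], candidate)
    return list(first_per_host.values())


# host-b reports both resource classes; host-c has not been reconfigured yet.
pcpu = [{"host": "host-a", "resource": "PCPU"}, {"host": "host-b", "resource": "PCPU"}]
vcpu = [{"host": "host-b", "resource": "VCPU"}, {"host": "host-c", "resource": "VCPU"}]

merged = merge_candidates(pcpu, vcpu)
```

With this ordering the filters still see `host-c`, which the PCPU query alone would never have returned.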
efriedso again, stephenfin, under what circumstances are we worried about this?09:32
stephenfinbauzas: We sort of have that - the compute nodes won't send PCPU inventory until 'cpu_dedicated_set' is set09:32
bauzasstephenfin: I know, my opinion is just "doc it, dude"09:32
efriedIf it's only in the "sharded" case as bauzas says, then I think it may be acceptable to take the double GET /a_c hit.09:33
stephenfinefried: The current solution (where we only try the second 'GET /a_c' if the first doesn't return anything) or the modified one we're suggesting here (where we always make two requests and concat them) ?09:33
efriedbut if it can happen in a fully-pre or fully-post cloud, we should think it through more carefully.09:33
cdentin train two a_c hits is pretty cheap09:34
bauzasadding more complexity into an abstract API that would add special-cases in order to fix a very temporary situation is IMHO like using a big hammer09:34
efriedI'm talking about doing double GET /a_c preemptively.09:34
efriedcdent: even for CERN?09:34
cdentyeah09:34
cdent<1 sec per09:34
stephenfinoh, then I can't think there are huge issues outside of always making an extra request to placement09:34
efriedand it's not just the API calls; we'd be doubling the amount of memory we're using to store the candidates09:34
stephenfinwhich cdent is saying isn't a huge issue09:35
efriedI suppose we could cut the ?limit in half... but I don't know if that's a good idea.09:35
stephenfinand only happens if we're using pinning09:35
stephenfinefried: alex_xu had suggested a config option to disable the double lookup09:35
efriedthat's also a good idea09:35
stephenfinor rather, disable the fallback09:35
efriedworkaround09:35
stephenfinworkaround option, killed in the next release09:35
stephenfinyeah09:36
cdent(the perf issue for a double a_c is the nova-side processing of the results, not the placement-side generation)09:36
stephenfinjust for CERN and other people with huge clouds who might run into this issue09:36
efriedso: workaround disabled, you might get nvh when you're out of "real" PCPUs; workaround enabled, you're taking an extra perf/mem hit (small though it may be) any time we... what?09:36
efried...detect that there are any PCPUs in the cloud?09:37
efriedstill trying to understand when we would do the double thing09:37
*** ricolin has quit IRC09:37
stephenfinanytime you try to create or move a pinned instance09:37
efriedmm09:37
stephenfinfor one cycle09:38
efriedmm09:38
bauzasyeah, that's what I'm trying to say : dudes, if you really want to support sharded clouds, it will come with a performance penalty09:39
bauzasin this case, make it super temporary and super clear that this is a workaround09:39
efriedbauzas: problem is, it's not just sharded that you get hit on09:40
efriedin this case if you fully upgrade, you still get hit09:40
efriedbut09:40
efriedin that case you should disable the workaround09:40
efriedbecause you're done09:40
efriedand don't need it.09:40
efriedso09:40
bauzasthat will leave the operator a choice between a quick rolling upgrade (and no performance hit) or a slow rolling upgrade with some performance degradation he could estimate09:40
efriedI think this is acceptable.09:40
bauzasefried: not really, you opt-in to a sharded world once you use a conf opt09:40
bauzasefried: I'm talking of the opt usage rolling update (sorry indeed)09:41
bauzasie. upgrade your cloud to Train09:41
bauzastake your time09:41
bauzasonce all computes are done, you have two choices09:41
bauzasA/ you have some puppetry that can change options on the fly09:41
bauzasin this case, you don't really need to use some workaround that gives you performance hit09:42
efriedstephenfin: propose double query config on or off by default?09:42
bauzasdo the option update at some time09:42
bauzasB/ you don't like CMSes or you prefer updating it smoothly for business reasons09:42
bauzasthen, you sign-off for a performance hit09:43
bauzasbut you'll be sure everything will continue to work09:43
bauzasthis sounds reasonable to me09:43
stephenfinefried: the previous approach was a boolean - request PCPU or request VCPU - and it defaulted to requesting VCPU to not break upgrades09:43
bauzasthat said, I trust cdent09:43
stephenfinI think the same thing applies here09:43
* cdent does his best evil laugh09:44
bauzasif querying a_c isn't really a performance issue, it's just a matter of providing a temporary codepath09:44
efriedstephenfin: meaning we want to err on the side of getting hosts, even if it hurts, so default to double query?09:44
cdentas I said, the performance concern is nova-side processing, which can be tested09:44
stephenfinwe need to default to double requests otherwise everyone will upgrade, not realize they need to set 'cpu_dedicated_set' on compute nodes, try to boot a pinned instance and fail miserably09:44
stephenfinefried: yup09:44
efriedack09:44
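[Editor's sketch] The default agreed on here can be summarised as a small decision function. The flag name `disable_fallback` is hypothetical (the participants had not settled on a name at this point in the log), and the assumption that unpinned instances only ever ask for VCPU is the editor's reading of "only happens if we're using pinning".

```python
def resource_classes_to_query(pinned, disable_fallback=False):
    """Return the resource classes to request from placement, in order."""
    if not pinned:
        # Assumption: unpinned instances request shared CPUs only.
        return ["VCPU"]
    if disable_fallback:
        # Workaround option set: fully reconfigured cloud, skip the extra query.
        return ["PCPU"]
    # Default: query twice so pinned instances keep booting during the upgrade.
    return ["PCPU", "VCPU"]


default_for_pinned = resource_classes_to_query(pinned=True)
fallback_disabled = resource_classes_to_query(pinned=True, disable_fallback=True)
```

Defaulting to the double query trades an extra placement request for not breaking deployments that have not yet set 'cpu_dedicated_set'.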
stephenfinI'll work with TripleO at a minimum to make sure that workaround option is enabled on new deployments09:45
stephenfinCan do the same from OSA and kolla too09:45
stephenfin(though I'm not sure Kolla supports configuring hosts for pinned instances etc.)09:45
* efried suspects dan is going to shit a brick09:45
stephenfinwe've got three hours or so before he's up - should be loads of time to shove everything through09:46
* stephenfin cackles09:46
efriedyeah, except the part where the gate delay is already past 3h09:46
*** ralonsoh has joined #openstack-nova09:46
stephenfin:(09:46
efried3h13m09:46
efriedand counting09:46
stephenfinI saw stuff in the queue for 9 hours last night09:46
stephenfinthough I may have been misreading things09:46
bauzasstephenfin: wait a sec09:47
bauzasthe workaround necessarily has to be enabled *by default* if you don't wanna break users09:47
efriedbauzas: "enabled" meaning "double" yah?09:47
efriedI think that's what we're suggesting.09:47
efriedstephenfin said he's going to work with the deployment projects to make sure they disable it explicitly for *new* deployments09:48
stephenfinbauzas: what efried said09:49
*** udesale has quit IRC09:49
stephenfinfor anyone that knows TripleO, that's what we were planning to do with the previous config option too https://review.opendev.org/#/c/681207/09:50
stephenfinso it can be done09:50
*** udesale has joined #openstack-nova09:50
*** hemna has quit IRC09:50
efriedare we ready to unblock the bottom of cpu-resources and/or vpmem at this point? Get them queued up...09:50
efried...does this issue warrant continuing to hold cpu-resources until ready?09:51
stephenfinI'm biased, but I _think_ we're good09:56
luyaoefried: for vpmems, I think I'm ready, but not sure about stephenfin and sean-k-mooney; still need sean-k-mooney to confirm the xml again09:57
stephenfinThe quota issue and the switch over to the try-try again logic were the blockers. The former's resolved and the latter just needs a tweak09:57
stephenfinI've promised luyao I'll sign off on vPMEM today too, so that's pretty much good to go too09:57
efriedOkay. I'll unblock cpu-resources and start +Aing. Plenty of time to yank them out of the gate if something goes pear-shaped.09:58
efriedAnd I'll unblock vpmem, ready for your +A stephenfin09:58
stephenfinCool. Lemme finish reworking this patch and I'll hit that09:59
stephenfinefried, alex_xu, bauzas: bikeshed time - what should the workaround option be called?10:01
stephenfinit's disabling the second request for VCPUs if the instance is pinned10:02
stephenfindisable_second_request_for_pinned_instances sux10:02
efried[workarounds]try_really_hard_to_get_allocation_candidates_for_pinned_instances_but_suffer_a_performance_hit10:02
bauzassorry folks, was AFK10:02
efriedoh, right, reverse of that10:02
stephenfin*do_not_try_...10:02
stephenfinyup10:02
efriedyeah10:02
* bauzas suddenly remembered he was a dad who left his daughters at school10:03
efriedhah, I did that the other day10:03
bauzasyou don't imagine how a 9yo can yell at you :p10:03
efriedmy kid: "why are you so late, dad?"10:03
efriedme: "because I suck"10:03
bauzasyeah, life of a remotee10:04
bauzasanyway10:04
bauzasfor the workaround name, meh10:04
bauzasjust provide a clear doc message, that's it10:04
stephenfindisable_fallback_pcpu_query10:04
stephenfinI'm going with that10:04
efried++10:04
bauzascool10:04
bauzasit could be "brexit"10:05
* bauzas runs off10:05
cdentpcpu_backstop10:06
bauzasthis ^10:06
efriedgibi, bauzas, sean-k-mooney: if one of you could ack https://review.opendev.org/#/c/671800/34 at some point soon...10:06
alex_xustephenfin: +1 disable_fallback_pcpu_query10:06
bauzasefried: I was previously +210:07
bauzasefried: but lemme look10:07
efriedbauzas: I see a +1 at PS3310:07
bauzasmy bad10:07
efriedbut yes, please +2+W if you're satisfied10:07
efriedthis is one I wasn't comfortable +2ing myself.10:08
* bauzas just does at the moment https://review.opendev.org/#/c/671800/33..3410:08
efriedhah, don't do that :P10:08
*** dtantsur|afk is now known as dtantsur10:08
bauzasoh heh, just discovered https://en.wiktionary.org/wiki/Bauza#English10:09
bauzas\o/10:09
*** mkrai has joined #openstack-nova10:11
bauzasefried: https://sbauza.wordpress.com/2014/11/14/how-to-compare-2-patchsets-in-gerrit/10:12
bauzas(FWIW)10:12
efriedYou're 26887th in the US10:12
*** ttsiouts has quit IRC10:13
* bauzas doesn't know he lives in the US 10:13
bauzasdid the president already pay for France?10:13
efriedYeah, if you moved here there would be 90*4* of you10:13
stephenfinI never learned how to use vimdiff10:13
*** ttsiouts has joined #openstack-nova10:13
efriedafaict the diff from PS33 is negligible, just that one reinstated assertion in two tests10:13
bauzasthat's what I saw10:14
bauzas+Wipped10:14
bauzasstephenfin: you don't really need vimdiff10:14
efriedthanks bauzas10:14
efriednow, who's going to +A the quota one?10:14
alex_xuby finger-guessing game?10:15
*** lpetrut has quit IRC10:17
*** ttsiouts has quit IRC10:17
*** hemna has joined #openstack-nova10:19
*** lpetrut has joined #openstack-nova10:19
*** tbachman has quit IRC10:21
*** markvoelker has joined #openstack-nova10:26
*** panda is now known as panda|ruck10:28
*** mkrai has quit IRC10:28
*** markvoelker has quit IRC10:32
*** jaosorior has joined #openstack-nova10:32
openstackgerritLuyao Zhong proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139710:36
openstackgerritBalazs Gibizer proposed openstack/nova master: Func test for migrate re-schedule with bandwidth  https://review.opendev.org/67697210:45
efriedgibi: https://review.opendev.org/#/c/680810/ is failing, can I pull it out of the gate?10:46
openstackgerritBalazs Gibizer proposed openstack/nova master: Make SRIOV computes non symmetric in func test  https://review.opendev.org/68166710:48
openstackgerritBalazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth  https://review.opendev.org/67698010:50
gibiefried: sure10:52
*** hemna has quit IRC10:52
gibiefried: what is the way to get it out of the gate?10:53
gibiefried: rebase? That patch can be rebased freely as it is not part of the series.10:54
efriedyes. cool, thanks.10:54
openstackgerritBalazs Gibizer proposed openstack/nova master: Follow up for I220fa02ee916728e241503084b14984bab4b0c3b  https://review.opendev.org/68081010:54
efriedrebase stops the current run and shoves it to the back of the queue10:54
gibirebased.10:54
gibiis there any other way that does not involve changing the review itself?10:55
efriedthanks, re+W'd10:55
efriedgibi: no10:55
efriednot that I'm aware of10:55
gibiwe will run out of rebase targets if the master is not moving forward :)10:55
efriedyeah, I know :(10:55
efriedbut for now this is the best we can do.10:55
gibiack10:55
*** sapd1_x has joined #openstack-nova10:56
aspierskashyap, efried: seems hw_firmware_type is missing from glance/etc/metadefs10:56
aspiersand so is hw_mem_encryption10:56
efriedwhat does that mean?10:57
aspiersit means they are not shown as nice user-friendly options in Horizon, for a start10:57
aspiersnot sure if there are other implications10:57
efriedto refine the question: Is this going to affect a) gate, b) vpmem/cpu-resources series?10:57
aspiersI don't think it affects nova at all10:58
aspiersWe probably would have noticed by now if it did10:58
efriedthen I humbly request the issue be raised *after* this week10:58
aspiersI think it's just for registering metadata as "officially blessed" rather than some arbitrary key/value pair10:58
gibibauzas, efried: lost a +W in a rebase https://review.opendev.org/#/c/676972 could you please add it back?10:58
aspiersefried: haha sure, it can't be urgent :)10:58
aspiersefried: but probably trivial to fix too, and anyway it's a matter of submitting reviews to glance not nova10:59
efriedgibi: done, but you should feel free to re+W in these circumstances10:59
efriedaspiers: cool10:59
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request  https://review.opendev.org/67149710:59
gibiefried: ack, will do11:00
sean-k-mooneyefried: i see bauzas ack'd https://review.opendev.org/#/c/671800/34 but ill take a look now too11:03
efriedsean-k-mooney: if you like. Though actually I would rather you re-looked at that one vpmem patch...11:04
efriedsean-k-mooney: this un https://review.opendev.org/#/c/678455/3611:04
sean-k-mooneyefried: sure11:04
efriedthank you11:04
sean-k-mooneylooking at the cpu series, have they all been approved?11:04
efriedmostly11:04
efriedwe're still working on a couple at the top11:04
efriedremember that thing about trying a second GET /a_c if the first one returned no results?11:05
sean-k-mooneyok so we are waiting on the gate for most of them and fixing the last few patches11:05
sean-k-mooneyefried: ya11:05
sean-k-mooneyefried: is it causing issues11:05
efriedI'll save you reading the backscroll, but we're doing something different now: we're going to do both GET /a_c calls up front (with a workaround conf opt)11:05
efriedStephen is working on that now11:06
sean-k-mooneyok11:06
sean-k-mooneywhat is the advantage of that11:06
sean-k-mooneyare we doing them in parallel or something11:06
efriedwith the cascaded approach, if we did the first one, got results, then sent them down to the filter and the filter rejected all of them, we'd be hosed.11:06
efriedSo one option was to make that retry "wider"11:07
efriedwhich would be tough to code11:07
artomefried, has that configparser thing been fixed? I see you rechecked the NUMA LM patches11:07
sean-k-mooneyok i see11:07
efriedartom:  yes11:07
artomSweet11:07
efriedartom: carefully curating the gate atm11:07
artomYep, thanks for that11:08
efriedsean-k-mooney: no, we have to do them sequentially, because we always want to try the pcpu ones first for a given host.11:08
*** tbachman has joined #openstack-nova11:08
efriedbut we do one right after the other11:08
efriedand collect all the candidates11:08
efriedthen send them all down to the filter.11:08
sean-k-mooneyright11:08
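The sequential double query described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `collect_candidates` and `fake_get_candidates` are hypothetical names, the callable stands in for GET /allocation_candidates, and the data shapes are made up. Only the `disable_fallback_pcpu_query` option name comes from the discussion.

```python
def collect_candidates(get_candidates, disable_fallback_pcpu_query=False):
    """Query PCPU candidates first, then (unless disabled) VCPU ones.

    get_candidates is a hypothetical callable standing in for
    GET /allocation_candidates; it returns (alloc_reqs, provider_summaries).
    """
    alloc_reqs, summaries = get_candidates("PCPU")
    alloc_reqs = list(alloc_reqs)
    summaries = dict(summaries)
    if not disable_fallback_pcpu_query:
        fallback_reqs, fallback_summaries = get_candidates("VCPU")
        # PCPU results stay in front; fallback results are appended.
        alloc_reqs.extend(fallback_reqs)
        summaries.update(fallback_summaries)
    return alloc_reqs, summaries


def fake_get_candidates(resource_class):
    # Made-up data: host-a has both inventories, host-b only VCPU.
    hosts = {"PCPU": ["host-a"], "VCPU": ["host-a", "host-b"]}[resource_class]
    alloc_reqs = [
        {"allocations": {host: {"resources": {resource_class: 2}}}}
        for host in hosts
    ]
    summaries = {host: {"resources": {}} for host in hosts}
    return alloc_reqs, summaries


alloc_reqs, summaries = collect_candidates(fake_get_candidates)
```

With the fallback enabled, all candidates from both queries go to the filters in one list, with the PCPU ones first.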
sean-k-mooneyi assume we only add a host from the second query if it's not in the first?11:09
sean-k-mooneyor do we just let the filter handle that11:09
efriedI don't think we're being that careful about it11:09
sean-k-mooneyactually it does not matter11:09
efriedright11:09
sean-k-mooneythe filters work on hosts, not allocation candidates11:10
sean-k-mooneyso it will pass the host if it fits and fail it if it does not11:11
sean-k-mooneyand the host will only support one of the two with pinning11:11
sean-k-mooneya host can have VCPU and PCPUs but if it returns pcpu allocation then that is what we will need to use11:12
sean-k-mooneyefried: by the way this https://review.opendev.org/#/c/679656/ is the numa live migration job. it's not a priority today but i would like to merge that before RC1 to get it running nightly once artom's code is fully merged.11:14
sean-k-mooneyit adds the nova-nfv-multi-numa-multinode job11:15
efriedsean-k-mooney: ack, please ask about it next week11:15
sean-k-mooneyyep will do11:15
efriedsean-k-mooney: https://review.opendev.org/#/c/681358/ os-vif release ack pls?11:16
openstackgerritBalazs Gibizer proposed openstack/nova master: Do not query allocations twice in finish_revert_resize  https://review.opendev.org/67882711:17
sean-k-mooneymels patch ya ill ack that now11:17
*** ttsiouts has joined #openstack-nova11:19
openstackgerritStephen Finucane proposed openstack/nova master: fixup! Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/68171911:20
stephenfinefried, sean-k-mooney, alex_xu, bauzas: Before I squash that back and fix the tests, want to give that the once over ^ ?11:20
alex_xuyup, checking now11:20
efried...11:21
*** ttsiouts has quit IRC11:23
*** ttsiouts has joined #openstack-nova11:23
alex_xuah, that is smart11:25
*** hemna has joined #openstack-nova11:25
alex_xustephenfin: works for me11:26
sean-k-mooneychecking11:26
efriedstephenfin: looks good, couple of comments in there.11:27
luyaosean-k-mooney: Hi, if you get time, could you help look at the vpmem xml again? thanks (we discussed the numa requirement and alignment before, and reached a consensus on them)11:29
luyaohttps://review.opendev.org/#/c/678455/11:30
*** jaosorior has quit IRC11:31
sean-k-mooneyefried: also commented. we should not use update as i noted inline11:31
stephenfinefried: I shouldn't have pushed those mox patches through last night either. Sorry :(11:32
efriedno biggie stephenfin11:32
efriedthey're all going to get kicked out here in a minute11:33
sean-k-mooneyluyao: yep i have it open efried pinged me earlier11:33
sean-k-mooneyim just going to grab coffee then ill review the vpmem series in full starting with the xml patch11:33
stephenfincool. I switched to -2 on the bottom one to make it clear11:33
luyaosean-k-mooney: hah, thanks11:33
stephenfinOn the other hand, once that lands mox is basically gone. It's only been four years \o/11:35
efriedsean-k-mooney, stephenfin: It's not important enough to spend a bunch of time on, but am I crazy about .update()?11:35
stephenfinin relation to...?11:35
efrieda psum is a psum11:35
efriedthe way you're doing it I guess you're not replacing one that's already there, whereas .update would (redundantly) replace11:36
stephenfinI need a link11:36
efriedI guess python probably isn't smart enough to skip11:36
sean-k-mooneyoh maybe i misunderstood11:36
efriedso yeah, I guess the way you're doing it is going to be more efficient11:36
efriedbut the result will be the same11:36
sean-k-mooneyi thought stephen was filtering the allocation candidates but he is not11:37
efriedjust question of whether you want fewer LOC I guess11:37
efriedstephenfin: https://review.opendev.org/#/c/681719/1/nova/scheduler/manager.py@19711:37
sean-k-mooneyupdate is fine for the summaries11:37
sean-k-mooneyi would prefer if we filtered this however11:37
sean-k-mooneyDraft11:37
sean-k-mooneyalloc_reqs.extend(alloc_reqs_fallback)11:38
efriedsean-k-mooney: filter it how?11:38
stephenfinefried: tomato tomato. As you say, it doesn't matter11:38
stephenfinI'm happy to go with '.update()'11:39
sean-k-mooneyis that the set of allocation candidates11:39
efriedwe explicitly can't skip the ones we already have PCPUs for11:39
efriedthat's the whole point11:39
efried... I think11:39
sean-k-mooneyit would be better not to pass 2 allocation candidates for any given host11:39
efriedbecause the filter might kick out the PCPU ones11:39
sean-k-mooneyefried: we need to make sure if the filter passes we use the PCPU allocation candidate not the VCPU one11:40
sean-k-mooneyif we have two we need extra code to handle that11:40
efriedThat's already covered I thought11:40
efriedbecause the candidates are in order per host11:40
sean-k-mooneywhere11:40
*** ratailor has quit IRC11:40
efried.extend is O(1) (I think) so we're just taking a mem hit by not filtering. Whereas if we loop through and try to filter, we're O(N).11:41
sean-k-mooneyif they are ordered and we take the first one then ok11:41
efriedy11:41
efriedwe noodled this out with gibi earlier I think11:41
stephenfinTo do that, we'd need to inspect each entry in alloc_reqs_fallback, pull out the resource provider ID, and check if it exists in alloc_reqs11:41
stephenfinand I think they're ordered so it won't matter11:41
*** pots has quit IRC11:41
sean-k-mooneyefried: extend is fine if we have ordering and can rely on it11:41
sean-k-mooneyi just was not aware that was a thing11:42
stephenfinI could do this (I think)11:42
*** pots has joined #openstack-nova11:42
efriedstephenfin: I'd rather leave it I think11:42
sean-k-mooneyya i think its fine11:42
efriedunless some good reason to do otherwise11:42
stephenfinalloc_reqs.extend([a for a in alloc_reqs_fallback if a['allocations'][path to rp UUID] not in provider_summaries])11:42
efriedstill O(N)11:43
sean-k-mooneyim just trying to make sure we dont pick the wrong allocation candidate and end up using VCPUs when we have pinned cores on a host that has both11:43
stephenfinNah, it's simpler as-is and avoids us doing extra work on what could be a very big list11:43
stephenfinYup11:43
stephenfinsean-k-mooney: Remember, the NUMATopologyFilter will prevent that11:43
efried++11:43
sean-k-mooneystephenfin: no the numa topology filter will only pass or fail a host11:43
sean-k-mooneystephenfin: it has no say over the allocation candidates11:44
sean-k-mooneyso it will pass or fail both11:44
sean-k-mooneyfilters today work on hosts not allocation candidates11:44
stephenfinah, right11:44
sean-k-mooneythat is why the ordering is important11:44
stephenfinso we'd be using the VCPU-based allocation candidate blob11:44
sean-k-mooneyand that we take the first one11:44
stephenfinbut the host has both 'NUMATopology.cpuset' and 'NUMATopology.pcpuset' set to something11:45
stephenfinand the NUMATopologyFilter would use the latter and pass the request11:45
sean-k-mooneyyes11:45
stephenfinOkay, I'll triple check that ordering is preserved11:46
sean-k-mooneyits an issue only if we have free inventory of both VCPUs and PCPUs11:46
sean-k-mooneyand have allocation candidates for both11:46
stephenfinagreed11:46
sean-k-mooneyin which case we need to make sure we use the PCPU inventory11:46
sean-k-mooney*allocation_candidate11:46
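A rough sketch of why ordering matters when a host has both PCPU and VCPU candidates. This assumes the scheduler groups allocation requests by provider and takes the first one for a host that passed the filters; `group_by_provider` and `first_candidate` are hypothetical helper names, and the request shape is made up for illustration.

```python
from collections import defaultdict


def group_by_provider(alloc_reqs):
    """Group allocation requests by resource provider, keeping input order."""
    alloc_reqs_by_rp_uuid = defaultdict(list)
    for alloc_req in alloc_reqs:
        for rp_uuid in alloc_req["allocations"]:
            alloc_reqs_by_rp_uuid[rp_uuid].append(alloc_req)
    return alloc_reqs_by_rp_uuid


def first_candidate(alloc_reqs_by_rp_uuid, rp_uuid):
    """Return the first candidate seen for a host that passed the filters."""
    return alloc_reqs_by_rp_uuid[rp_uuid][0]


# Made-up data: PCPU results first, VCPU fallback results appended after.
pcpu_req = {"allocations": {"host-a": {"resources": {"PCPU": 2}}}}
vcpu_req = {"allocations": {"host-a": {"resources": {"VCPU": 2}}}}
grouped = group_by_provider([pcpu_req, vcpu_req])
chosen = first_candidate(grouped, "host-a")
```

Because the filters only pass or fail the host, the PCPU candidate is claimed here only as long as the PCPU results stay ahead of the VCPU ones in the combined list.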
stephenfina corner case, but there's no reason to give someone that loaded gun11:47
*** ttsiouts has quit IRC11:47
sean-k-mooneyi prefer not giving people firearms at all11:47
donnydsean-k-mooney: something is wrong with the mirror at FN11:48
sean-k-mooneyswords at noon is much more civilised than pistols at dawn11:48
* stephenfin was going to make a joke about hurleys, but your choice of firearms instead of weapons precluded that :(11:48
donnydI am working on getting it fixed11:48
stephenfinhttps://www.youtube.com/watch?v=vtnS3_bmtm811:48
sean-k-mooneyhurleys are a perfectly suitable alternative11:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Do not query allocations twice in finish_revert_resize  https://review.opendev.org/67882711:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow resizing server with port resource request  https://review.opendev.org/67901911:49
*** mkrai has joined #openstack-nova11:50
sean-k-mooneystephenfin: i havent seen that in a while11:51
*** elod has quit IRC11:51
*** elod has joined #openstack-nova11:52
openstackgerritBalazs Gibizer proposed openstack/nova master: Extract pf$N literals as constants from func test  https://review.opendev.org/68099111:54
openstackgerritBalazs Gibizer proposed openstack/nova master: Improve dest service level func tests  https://review.opendev.org/68099811:57
openstackgerritBalazs Gibizer proposed openstack/nova master: Follow up for Ib50b6b02208f5bd2972de8a6f8f685c19745514c  https://review.opendev.org/68149011:57
alex_xustephenfin: for the case sean-k-mooney pointed out, where a host has both VCPU and PCPU, if we only keep the PCPU allocation_candidates and let them override the VCPU allocation_candidates in alloc_reqs, then it will be ok.11:57
alex_xuif both PCPU and VCPU on the same host, that VCPU must be for shared cpu11:57
sean-k-mooneyright11:58
sean-k-mooneybut i did not see code to do that in stephens code11:58
sean-k-mooneythat said i only quickly looked at it11:59
alex_xuyea, nice spot11:59
stephenfinalex_xu: Yeah, we don't have that. Instead we're relying on ordering preventing that from happening11:59
stephenfinI think I should add the filtering, to be safe11:59
*** hemna has quit IRC12:00
sean-k-mooneyif we were building a dict of host to allocation candidate we could use setdefault12:00
stephenfinScenario: we only have one host that provides both PCPU and VCPU12:00
stephenfinactually, I was going to say the NUMATopologyFilter fails the request with PCPUs but passes the VCPU request12:01
stephenfinwould that happen?12:01
stephenfin...though?12:01
*** markvoelker has joined #openstack-nova12:01
sean-k-mooneyno12:01
stephenfinI guess something could have changed in the time between the two tests12:01
sean-k-mooneythe filter does not operate on requests12:01
sean-k-mooneythey operate on hosts12:01
stephenfinyeah, correct12:01
stephenfinso if our order is preserved, we'll have hit that host already due to the allocation request for PCPUs12:02
stephenfinif it failed once, it will fail if we hit it a second time due to the allocation request with VCPUs12:02
stephenfinright?12:02
*** tbachman has quit IRC12:02
alex_xuno, the scheduler filters the hosts by allocation candidates first, then iterates over the filtered hosts12:03
efried"something could have changed" disregard this, already a window between GET /a_c and PUT /allocs12:03
*** ttsiouts has joined #openstack-nova12:03
*** mkrai has quit IRC12:03
stephenfinefried: I mean we take the allocation request with PCPUs and try passing that through the NUMATopologyFilter or something and it fails12:04
sean-k-mooneyi thought we got all the hosts from all allocation candidates and looped over them once12:04
stephenfinwe then try a load of other hosts, which all fail12:04
sean-k-mooneyapplying all filters12:04
sean-k-mooneythen passed that to the weighers12:04
stephenfinthen we come back to that same original host only using the allocation candidate with VCPUs12:04
*** mkrai has joined #openstack-nova12:04
sean-k-mooneyand finally select a host then find the corresponding allocation candidate12:04
alex_xustephenfin: maybe we ignore the allocation_request if the rp_uuid already in alloc_reqs_by_rp_uuid https://review.opendev.org/#/c/681719/1/nova/scheduler/manager.py@21212:04
stephenfinonly in the time inbetween, something has changed on the host so the NUMATopologyFilter or whatever passes the request12:05
stephenfinit's going to be such a tiny window but I wonder if it's worth worrying about?12:05
sean-k-mooneystephenfin: if something changes the claim will fail in placement12:05
alex_xuah, no, I'm wrong, we can't ignore directly, since we have same rp for child rp12:05
sean-k-mooneythen we will move on to the next host12:05
efriedI wouldn't have thought it was any different than what we're doing now for multiple candidates on one host.12:06
stephenfinno, it's not really actually12:06
efriedsean-k-mooney: I think he's talking about if the second one spuriously succeeds12:06
efriedbut yeah, I don't think this is worth worrying about.12:07
sean-k-mooneymy point is i dont think the numa toplogy filter will ever run twice for the same host12:07
sean-k-mooneyit will only run once12:07
efriedand i think that ^ is what we want12:07
alex_xu^ agree12:07
alex_xuagree to sean-k-mooney... :)12:07
sean-k-mooneybecause filters today have no knowledge of allocation candidates12:07
efriedbecause if there are available PCPUs for a host, we want to check that candidate only. But if there aren't any, there will only be (at most) VCPU candidates in the list.12:08
sean-k-mooneyif there are free pcpus and vcpus we will pass the host and have two candidates12:08
sean-k-mooneyand hopefully rely on order to pick the PCPU one12:09
efriedYes exactly. Whole point of this is to make sure the first thing we look at is *either* a PCPU candidate *or* there are no PCPU candidates.12:09
sean-k-mooneyya12:09
sean-k-mooneystephenfin: rather then guessing12:09
sean-k-mooneycan you add a test12:09
stephenfinsure. What should the test do?12:10
sean-k-mooneycreate a host that has inventories for both12:10
sean-k-mooneyand assert the pcpu inventory was claimed12:10
sean-k-mooneyas a functional test12:10
stephenfinI think I have that already12:10
* stephenfin checks12:10
sean-k-mooneyif it is we are all good12:10
efriedIf ordering was random, that would be nondeterministic12:10
stephenfinsean-k-mooney: Yeah, look at https://review.opendev.org/#/c/671801/46/nova/tests/functional/libvirt/test_numa_servers.py@36412:11
alex_xuIf a host has both vcpu and pcpu, then we will get two allocation_requests in alloc_reqs_by_rp_uuid https://review.opendev.org/#/c/681719/1/nova/scheduler/manager.py@21212:11
sean-k-mooneywell we could do it in a loop12:11
sean-k-mooneybut fair point12:11
stephenfinIt's a resize test but it does exactly this12:11
alex_xuthen we only claim the PCPU one12:11
stephenfinLemme run that in a loop12:11
efriedalex_xu: exactly12:11
alex_xuso there will be a check to find out that there are two allocation reqs, one for vcpu and another for pcpu, and then only claim for pcpu12:12
sean-k-mooneyok i think ye have that in hand so ill go back to the vpmem reviews12:14
*** mkrai has quit IRC12:15
*** tbachman has joined #openstack-nova12:16
openstackgerritIvaylo Mitev proposed openstack/nova master: VMware: Update flavor-related metadata on resize  https://review.opendev.org/68100412:17
stephenfinefried: What was the decision on '.update()' vs. the for-loop again?12:18
sean-k-mooneystephenfin: update is fine12:21
sean-k-mooneythe summaries will be the same12:21
sean-k-mooneyefried: the vpmem xml generation patch looks sane to me12:22
sean-k-mooneyim starting from the bottom of the series now and working my way up12:22
openstackgerritMerged openstack/nova master: Fix race in _test_live_migration_force_complete  https://review.opendev.org/68154012:25
sean-k-mooneydonnyd: thanks for the heads up on the mirror issue12:25
*** lbragstad_ has quit IRC12:25
openstackgerritIvaylo Mitev proposed openstack/nova master: VMware: Update flavor-related metadata on resize  https://review.opendev.org/68100412:26
*** owalsh is now known as owalsh_brb12:27
donnydsean-k-mooney: I think we have it narrowed down. should be back up in the next couple hours12:28
sean-k-mooneyit looks like its the same issue i was having from your conversation on infra12:28
sean-k-mooneyyou are hitting an MTU issue because of the tunnel12:28
sean-k-mooneywell similar12:28
sean-k-mooneyi needed to clamp it on HEs end12:29
sean-k-mooneyin your case you need to clamp the mtu in the neutron subnet12:29
sean-k-mooneybut same general symptoms of retransmits of large packets12:29
sean-k-mooneyby the way i also sit in the infra channel so you can always ping me there if there are issues12:30
sean-k-mooneyif you want to notify nova in general then here is fine too12:30
*** elod has quit IRC12:32
*** elod has joined #openstack-nova12:33
*** eharney has joined #openstack-nova12:36
openstackgerritBalazs Gibizer proposed openstack/nova master: Skip querying resource request in revert_resize if no qos port  https://review.opendev.org/68151312:37
gibibauzas, mriedem: I've finished respinning the bandwidth series12:38
openstackgerritThomas Bechtold proposed openstack/nova master: Add nova-status to man-pages list  https://review.opendev.org/68173312:41
*** derekh has quit IRC12:42
*** mriedem has joined #openstack-nova12:43
*** owalsh_brb is now known as owalsh12:43
*** larainema has quit IRC12:45
*** ociuhandu has quit IRC12:50
*** mriedem has quit IRC12:54
*** BjoernT has joined #openstack-nova12:56
*** derekh has joined #openstack-nova12:56
*** ociuhandu has joined #openstack-nova12:58
*** derekh has quit IRC12:59
*** mriedem has joined #openstack-nova13:00
*** derekh has joined #openstack-nova13:00
*** _erlon_ has quit IRC13:01
*** _erlon_ has joined #openstack-nova13:02
*** jmlowe has quit IRC13:03
*** BjoernT has quit IRC13:04
bauzasgibi: ack, will look13:05
openstackgerritThomas Bechtold proposed openstack/nova master: Add nova-status to man-pages list  https://review.opendev.org/68173313:06
sean-k-mooneyluyao: efried stephenfin https://review.opendev.org/#/c/678448/21/nova/objects/base.py i think that is a problem that should be fixed. it could be a follow up but all_things_equal is not a valid comparison implementation of ==13:10
efriedsean-k-mooney: say wha?13:13
sean-k-mooneyits not symmetric so a == b will not always be the same as b == a13:13
sean-k-mooneyand it does not check type13:13
sean-k-mooneyso two ovos with the same fields will compare equal even if they are completely unrelated types13:14
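To illustrate the symmetry and type concern raised here, a minimal sketch follows. It is not nova's `all_things_equal`: `FakeObj`, `Cell`, `Device`, `loose_equal` and `strict_equal` are hypothetical stand-ins, assuming only that the comparison walks the first object's declared fields and never checks types.

```python
class FakeObj:
    """Hypothetical stand-in for a versioned object; not nova code."""
    fields = ()

    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)


class Cell(FakeObj):
    fields = ("id",)


class Device(FakeObj):
    fields = ("id", "label")


def loose_equal(obj_a, obj_b):
    # Only obj_a's fields are walked and types are never compared.
    if obj_b is None:
        return False
    return all(
        getattr(obj_a, name, None) == getattr(obj_b, name, None)
        for name in obj_a.fields
    )


def strict_equal(obj_a, obj_b):
    # Require the same type first, then compare every declared field.
    if type(obj_a) is not type(obj_b):
        return False
    return loose_equal(obj_a, obj_b)


cell = Cell(id=1)
device = Device(id=1, label="pmem0")
```

In this sketch `loose_equal(cell, device)` and `loose_equal(device, cell)` disagree, because each call only looks at the fields of its first argument; adding the type check makes the result the same in both directions.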
efriedsean-k-mooney: This code is copied in from numa.py13:14
efrieda is not None because this is invoked with `self`13:14
efriedand I don't care if they're different types.13:14
mriedemlyarwood: have you seen this? https://bugs.launchpad.net/nova/+bug/184364313:15
openstackLaunchpad bug 1843643 in OpenStack Compute (nova) "VM on encrypted boot volume fails to start after compute host reboot" [Undecided,New]13:15
sean-k-mooneyefried: well it's a free function so you can only rely on it being invoked with self when it is used to implement _equal_13:16
sean-k-mooney* __eq__13:16
efriedright, but it's not called "two_objects_identical"13:16
efriedit's called "all_things_equal"13:16
sean-k-mooneyyes and equal is a stronger guarantee than equivalent13:17
sean-k-mooneyanyway13:17
lyarwoodmriedem: no if it's new13:17
sean-k-mooneythe way its used it will work13:17
* lyarwood looks13:17
efriedsean-k-mooney: the only reason this code is added to base is so it can be removed from numa and reused13:18
efriedthe code isn't changed.13:18
sean-k-mooneyso ill change to a +1 i guess but i still think that is an incorrect implementation of __eq__13:18
*** alex_xu has quit IRC13:18
sean-k-mooneyright i think the code in numa was wrong13:18
sean-k-mooneyagaing it probably works for the case we use it in13:18
sean-k-mooney*again13:19
sean-k-mooneybut its not generally correct13:19
*** alex_xu has joined #openstack-nova13:19
openstackgerritMartin Midolesov proposed openstack/nova master: Implementing graceful shutdown.  https://review.opendev.org/66624513:20
alex_xusean-k-mooney: I guess just quite return, since self can't be None, if the other is None, then just return False13:20
lyarwoodmriedem: that smells like more of an issue with resume_guests_state_on_host_boot tbh13:20
*** lbragstad has joined #openstack-nova13:20
* lyarwood adds notes in the bug13:20
efriedsean-k-mooney: more tech debt we shouldn't be fixing here.13:20
efriedsee https://review.opendev.org/#/c/681397/813:21
sean-k-mooneyefried: ya i said it could be fixed in a follow up13:21
sean-k-mooneyas a free function its incorrect13:21
sean-k-mooneybut if only invoked in __eq__ where the first argument is self13:21
sean-k-mooneythen some of the assumptions make sense13:22
*** jmlowe has joined #openstack-nova13:24
*** dolpher has joined #openstack-nova13:24
*** jmlowe has quit IRC13:25
efriedsean-k-mooney: are you un -1 ing?13:26
sean-k-mooneyill +1 but https://www.python.org/dev/peps/pep-0207/#proposed-resolutions section 4 allows the interpreter to swap the arguments to == which is why i raised this13:26
bauzasfolks, I'm being dragged for some internal bug duty, ping me for urgent reviews13:27
efriedack, thanks bauzas13:27
bauzaslive my life13:27
sean-k-mooneyefried: if == is not reflexive/symmetric it can break on non-cpython implementations13:27
sean-k-mooneyon cpython this will work fine as it wont swap them13:27
*** Luzi has quit IRC13:28
*** nweinber__ has joined #openstack-nova13:28
openstackgerritMatt Riedemann proposed openstack/nova stable/stein: Log notifications if assertion in _test_live_migration_force_complete fails  https://review.opendev.org/68174313:28
openstackgerritMatt Riedemann proposed openstack/nova stable/stein: Fix race in _test_live_migration_force_complete  https://review.opendev.org/68174413:28
efriedsean-k-mooney: open a bug?13:28
sean-k-mooneysure13:29
*** cdent has quit IRC13:30
*** pcaruana has quit IRC13:30
mriedemgibi: sorry about all of that _reschedule_resize_or_reraise noise13:32
mriedemi had the logic obviously wrong13:32
*** cdent has joined #openstack-nova13:34
efriedI'm going to get a nap while the gate cranks. Back in... say, 3.5h.13:35
dansmithgate looks constipated13:35
dansmithheh13:35
efrieddansmith: there was a dep problem overnight, resolved now, but everything is backed up, yes.13:36
*** efried is now known as efried_afk13:36
*** jmlowe has joined #openstack-nova13:36
bauzasdansmith: we checked whether we should provide the gate a pill, but looks like the issue is solved13:36
gibimriedem: no worries. I needed my fresh mind the morning to figure out what is going on.13:37
gibimriedem, bauzas: I will be on and off in the coming 2-3 hours. But I will be back for the night to finish up things if needed13:37
bauzashttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ERROR%3A%20Could%20not%20find%20a%20version%20that%20satisfies%20the%20requirement%20configparser%5C%2213:38
dansmithbauzas: go natural: https://www.youtube.com/watch?v=Ku42Iszh9KM13:38
bauzasgibi: like I said, I'm also dragged due to some internal bug scrub duty13:38
bauzas:(13:38
mriedemgibi: ok, i did spot a race in the new functional test in the sriov patch, so i'm going to -1 for that, but i can probably just fix and rebase the series myself13:38
mriedembauzas: you can't tell your management / team lead that today is f'ing upstream feature freeze?13:39
mriedemand get a 1 day break?13:39
bauzasdansmith: I wish you knew some french folks named "Les Nuls" so I could share you some video13:39
*** ricolin has joined #openstack-nova13:39
dansmithheh13:39
*** markvoelker has quit IRC13:40
mriedemif bauzas can't review the rest of the bw provider migrate series i'll see if i can get melwitt to help out13:40
bauzasmriedem: this duty should be around 1 hour or 213:40
bauzasbut then I'll have meeting13:40
bauzasyay13:40
mriedemis there a manager standing next to you with a gun to your head saying you have to do bug duty over FF?13:40
mriedemcan you tell them you're a nova core?13:40
mriedemone of the remaining few?13:41
bauzasmy dog is barking at me13:41
mriedemwhatever, this is why i have no hope for "we'll do it in U"13:41
*** ociuhandu has quit IRC13:41
bauzasmriedem: that's why you'll never see me saying "Ussuri"13:42
bauzasmriedem: "Unicorn" sounds a way better release name13:42
*** markvoelker has joined #openstack-nova13:42
bauzas(more appropriate at least)13:42
openstackgerritStephen Finucane proposed openstack/nova master: Include both VCPU and PCPU in core quota count  https://review.opendev.org/68137413:43
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180113:43
openstackgerritStephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique  https://review.opendev.org/68106013:43
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly  https://review.opendev.org/68106113:43
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489513:43
openstackgerritStephen Finucane proposed openstack/nova master: tests: Additional functional tests for pinned instances  https://review.opendev.org/68175013:44
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Remove single-use classmethod  https://review.opendev.org/68175113:44
alex_xustephenfin: need you to kick the first one https://review.opendev.org/#/c/678447/1413:44
stephenfindansmith: In case you didn't see the scrollback, we modified the solution in  https://review.opendev.org/671801 slightly13:44
stephenfinalex_xu: just done13:44
alex_xustephenfin: thanks13:44
stephenfinthough I realize I missed some of your comments13:45
* stephenfin quickly fixes up13:45
artombauzas, want me to do the bugs thing?13:45
artombauzas, so you can concentrate on w/e?13:45
stephenfinalex_xu: However, https://review.opendev.org/#/c/671801/ should be ready for your viewing pleasure too13:45
bauzasartom: how is your series ?13:45
alex_xustephenfin: got it13:45
artombauzas, merged (well, merging, because gate)13:45
stephenfinbauzas: Want to look at https://review.opendev.org/#/c/681750/, pretty please?13:46
artom(Otherwise I wouldn't have offered)13:46
stephenfin(I split it out to make https://review.opendev.org/#/c/671801/ smaller)13:46
*** BjoernT has joined #openstack-nova13:46
bauzasartom: cool then okay, we could switch13:46
bauzasstephenfin: I can13:46
artombauzas, ack, I'll handle the bugs, do your nova core job13:46
bauzasstephenfin: did you reorganise your series ?13:47
bauzasoh wait no13:47
stephenfinnope, just broke some things out13:47
artomThe "out" is ket13:47
artom*key13:47
dansmithstephenfin: I didn't but I'll try to circle back in a bit13:48
alex_xustephenfin: I didn't find the part sean-k-mooney mentioned, where a host has both pcpu and vcpu13:49
stephenfinalex_xu: We test it implicitly here https://review.opendev.org/#/c/671801/47/nova/tests/functional/libvirt/test_numa_servers.py@36413:50
stephenfinBoth of those hosts have both PCPU and VCPU inventory13:50
stephenfinand we still get PCPU as expected https://review.opendev.org/#/c/671801/47/nova/tests/functional/libvirt/test_numa_servers.py@46013:51
stephenfinand https://review.opendev.org/#/c/671801/47/nova/tests/functional/libvirt/test_numa_servers.py@48913:51
*** KeithMnemonic has joined #openstack-nova13:53
mriedemgibi: i've gone through the updates, i'll rebase and fix the small things and then i'll be +2 up the stack13:53
alex_xuemm...let me check13:53
bauzasmriedem: cool, I'll look at the series once i'm done with a couple of changes from stephenfin13:54
*** dave-mccowan has joined #openstack-nova13:54
mriedemefried_afk: so we're pushing PCPU in before the scheduler and quotas stuff is approved? https://review.opendev.org/#/c/671793/13:55
*** brinzhang_ has joined #openstack-nova13:55
*** tkajinam has joined #openstack-nova13:55
*** hemna has joined #openstack-nova13:55
mriedemi think for train the quotas one is pretty straight-forward, counting from placement will work the same as counting from nova,13:56
mriedembut the scheduler patch looks like it's still in flux13:56
stephenfinthe quota stuff has got a tentative ack from melwitt, and the scheduler stuff has the same from efried and alex_xu13:56
stephenfinsee here13:56
stephenfinhttps://review.opendev.org/#/c/681719/13:56
* stephenfin abandons same now that it's outlived its purpose13:56
dansmitheven still, if we were to land that first patch and get stuck on the rest, we've got a config option that claims to enable something that doesn't, which is uncool13:57
stephenfinit'll do exactly what it says - you just won't be able to use the things it provides (PCPUs)13:58
gibimriedem: thanks a lot!13:58
*** hamzy has quit IRC13:58
bauzasstephenfin: can you please tell me why we need this https://review.opendev.org/#/c/681750/1/nova/tests/unit/virt/libvirt/fake_imagebackend.py ?13:58
mriedemi hope there is going to be some "so you want to use PCPUs" admin doc at some point b/c there are a ton of moving parts13:58
bauzasmriedem: I feel there are some docs that absolutely need to be written13:59
stephenfinbauzas: There's a check deep in the libvirt code for that. If it's not done, it complains with "function doesn't have attribute SUPPORTS_CLONE"13:59
bauzasmriedem: in particular given the upgrade plan we took a while to agree on this morning13:59
stephenfinbauzas: I'm not the only one who found that. This is from the vPMEM series https://review.opendev.org/#/c/678470/35/nova/tests/unit/virt/libvirt/fake_imagebackend.py13:59
mriedemstephenfin: there should probably be a comment on that in the fake imagebackend code then13:59
bauzasstephenfin: what mriedem said14:00
*** ociuhandu has joined #openstack-nova14:00
bauzasbecause I have no idea why you need this14:00
mriedempretty gross that the virt agnostic image backend fixture has to have libvirt specific stuff in it14:00
*** belmoreira has joined #openstack-nova14:00
mriedembauzas: stephenfin loves to doc so i'm sure he doesn't have a problem documenting all the things14:00
bauzasmriedem: the upgrade strategy is a bit tricky, I'm just saying we need to carefully doc it14:01
bauzasbased on the fact we'll provide some options14:01
stephenfinmriedem: It's a libvirt-specific image backend fixture - look at the filename14:01
stephenfinit's for mocking out nova/virt/libvirt/imagebackend.py14:01
* stephenfin does love to doc14:02
mriedemstephenfin: you're right, sorry - i was thinking of https://review.opendev.org/#/c/678470/35/nova/tests/unit/image/fake.py14:02
stephenfinall g14:03
stephenfinI wonder what the comment should be though14:03
stephenfinI mean, it's an attribute of the Image base class in nova/virt/libvirt/imagebackend.py14:04
stephenfinand this function is a stub for that class (or some subclass)14:04
stephenfinso it's not really doing anything special but rather just ensuring the stub looks like the thing it's stubbing14:04
mriedem# Set the SUPPORTS_CLONE member variable to satisfy the Image base class.14:05
stephenfinSounds good. Done14:08
openstackgerritStephen Finucane proposed openstack/nova master: tests: Additional functional tests for pinned instances  https://review.opendev.org/68175014:08
openstackgerritStephen Finucane proposed openstack/nova master: Include both VCPU and PCPU in core quota count  https://review.opendev.org/68137414:08
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180114:08
openstackgerritStephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique  https://review.opendev.org/68106014:08
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly  https://review.opendev.org/68106114:08
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489514:08
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Remove single-use classmethod  https://review.opendev.org/68175114:08
*** pcaruana has joined #openstack-nova14:08
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Remove single-use classmethod  https://review.opendev.org/68175114:09
dansmithstephenfin: what job logs can I look at to see this running?14:13
stephenfindansmith: Look at the functional logs for https://review.opendev.org/#/c/67180114:14
dansmithno I mean a devstack/tempest job14:14
stephenfinNone. The Intel NFV CI has been missing for quite some time. It's been alex_xu, sean-k-mooney, Bhagyashri Shewale (forgot his IRC nick) and I testing it locally14:16
dansmithstephenfin: is there some reason this has to be tested in special ci?14:16
sean-k-mooneyyes it needs nested virt14:16
stephenfinPinned CPUs. That needs nested virt14:16
stephenfinwhich we can't rely on14:16
sean-k-mooneybut i can test it in FN14:17
dansmithwe can't get a one-off?14:17
dansmithsean-k-mooney: that would be good14:17
artomstephenfin, wait, I thought this used the <vcpus> style pinning, no?14:17
artomWhich *does* work in the vanilla gate14:17
dansmithI'm shocked that artom and sean-k-mooney were able to get numa live migration tested for us, so surely this is possible14:17
sean-k-mooneythat was also testing pinning with the resize flavors14:18
sean-k-mooneyso yes14:18
artomdansmith, let's be honest, *sean-k-mooney* was able to get NUMA live migration tested14:18
dansmithartom: I just meant that effort, but sure14:18
stephenfinsean-k-mooney: can we just point the same job at the top-most change of this so?14:18
sean-k-mooneywe can also technically test non-numa stuff in vexxhost and limestone14:18
sean-k-mooneystephenfin: yes but i'll create a second patch14:19
dansmithI'm disappointed that nobody has asked this question yet :/14:19
sean-k-mooneyi want to actually merge that job14:19
*** lbragstad has quit IRC14:19
stephenfinbetween manual tests and the functional tests, I figure our coverage is pretty much spot on as-is14:20
dansmithstephenfin: what testing are you doing locally? Just hand booting some things or running tempest?14:20
artomdansmith, presumably the crucial thing would be testing the upgrade path - so something grenade-y?14:20
stephenfindansmith: https://etherpad.openstack.org/p/nova-cpu-resources14:20
stephenfinalex_xu has his own set of tests that he's been working towards too14:21
sean-k-mooneywhat do people want me to actually test for this in the ci job?14:21
dansmithartom: I'm definitely concerned about that yeah, but more concerned about at least seeing a tempest run first14:21
sean-k-mooneyfor the live migration job i ran the live migration tests; should i just run the server and network basic ops?14:21
dansmithstephenfin: that's cool, but it's far from comprehensive14:21
alex_xumore on upgrade case, one node old, one node new, then boot, resize between nodes14:22
sean-k-mooneyalso do you want a single node test or multi14:22
dansmithsean-k-mooney: if you're going to run the grenade job, multi would be good14:22
*** mriedem61 has joined #openstack-nova14:23
sean-k-mooneyi'm not sure how easy grenade will be. i could try the non-legacy grenade job however14:23
*** mriedem has quit IRC14:23
*** mriedem61 is now known as mriedem14:23
sean-k-mooneyok i can give grenade a try but i'll do a non-grenade run first14:23
artomHrm, so actually grenade is not what I meant by upgrade14:23
artomI mean it's good too, but...14:23
artomI'd be most concerned about the in-place upgrade scenario14:24
dansmithartom: how is that not grenade?14:24
artomdansmith, wait, does grenade do that?14:24
sean-k-mooneythat's all grenade does14:24
artomI thought it just deployed a mix of old and new computes14:24
dansmithartom: that is all it does14:24
artomIt upgrades in place?14:24
sean-k-mooneyyes14:24
dansmithartom: it can leave one back14:24
artomOh, hah, ignore me then :P14:25
dansmithartom: but in general it's single node in-place14:25
sean-k-mooneyif it's multinode it does the controller first, then you do the live migrate back-and-forth test14:25
mriedemthe grenade multinode jobs leave one nova-compute back14:25
artomOh, and the multinode version of it does the live migration back and forth business14:25
dansmithsean-k-mooney: I think a full basic tempest run would be good just to make sure we're good on things like resizes, etc14:25
mriedemartom: yeah14:25
mriedemmy wifi dropped (power outage) - talking about PCPU CI testing?14:26
sean-k-mooneyok so tempest full single node with pinned flavor14:26
dansmithI'm concerned that maybe nobody has even run tempest against this yet to the point that there may be some gotchas we don't know about such that it can't run :/14:26
artomSo actually wouldn't we be able to just resurrect https://review.opendev.org/#/c/680806/ and make it depend on stephenfin's series?14:26
dansmithmriedem: yes14:26
stephenfinartom: I could run whitebox locally?14:27
mriedemstephenfin: alex_xu: have you done functional testing with the PCPU series where things aren't stubbed out?14:27
*** henriqueof1 has joined #openstack-nova14:27
*** hamzy has joined #openstack-nova14:27
artomstephenfin, whitebox doesn't have any code to test your stuff14:27
sean-k-mooneydansmith: well tempest has run without pinning so if i do a run with only pinning that covers it, right?14:27
dansmithmriedem: they have an etherpad of hand-run test cases14:27
alex_xumriedem: I've done manual tests of the upgrade case14:27
dansmithsean-k-mooney: yeah, isn't that what we're talking about here?14:27
artomstephenfin, but... in your series, for instances using the PCPU resource class, what's the XML that pins them?14:27
stephenfinartom: It tests that we can boot pinned instances, resize them, etc.14:27
artom<vcpu cpuset=> or <cputune> ?14:28
sean-k-mooneyi'll kick off 3 runs: one with vcpu_pin_set, one with only cpu_dedicated_set, and one with cpu_dedicated_set and cpu_shared_set14:28
stephenfinartom: The same thing that was used for pinning with 'hw:cpu_policy'14:28
dansmithsean-k-mooney: sounds good14:28
stephenfinnone of the libvirt XML stuff changes. This is all higher level accounting stuff14:28
dansmithstephenfin: he knows, he's asking from a where-can-we-test perspective I think14:28
sean-k-mooneyFN is currently offline since donnyd is doing a router upgrade so i'll use vexxhost instead for now14:29
stephenfinYeah, I get that. The answer is the same place we can test 'hw:cpu_policy=dedicated' currently14:29
artomstephenfin, well I was asking because the allocation-style <vcpu> pinning is what's done if the instance doesn't have a NUMA topology and vcpu_pin_set is set (we talked about this, remember?)14:29
donnydI am ready to bring FN back online14:29
dansmithstephenfin: unless one knows where that is...14:29
artomstephenfin, and *that* can be tested in the gate14:29
*** hemna has quit IRC14:29
artomstephenfin, but from what you're saying that's not what your series uses14:29
sean-k-mooneydonnyd: well either works for this test i just need nested virt14:29
stephenfinartom: yeah, no, the instance will have a NUMA topology because of the pinning14:29
sean-k-mooneybut i do need to go write a patch to add a label that will use all the nested virt providers rather than just one specifically14:30
*** henriqueof has quit IRC14:30
donnydsean-k-mooney: I think maybe vexxhost can also run this job. I don't want to speak on behalf of mnaser14:31
dansmithdonnyd: I think he's said several times that it can :)14:31
mnaseryes should be ok14:31
* mnaser goes back to hiding14:31
*** mriedem has quit IRC14:31
*** mriedem has joined #openstack-nova14:32
donnydmnaser: do you have a flavor already built?14:32
sean-k-mooneyi'll do all the nodepool label stuff next week to make this more resilient14:32
donnydkk sean-k-mooney14:33
sean-k-mooneydonnyd: this does not need the multi numa node stuff14:33
sean-k-mooneyjust a single numa node is fine14:33
donnydoh just nested virt14:33
sean-k-mooneyyep14:33
sean-k-mooneywe just want to test pinning14:33
sean-k-mooneysince that is what we are changing14:33
sean-k-mooneyonce artom's stuff is actually merged i can retarget the multi numa job to also test stephen's code14:34
sean-k-mooneyand get test coverage for both14:34
dansmithmigration with pinning "works" today but just blind copies everything to the other side, right?14:34
sean-k-mooneyya14:34
dansmithso these tests should also support migration but with smarts yeah?14:34
sean-k-mooneyi can run the migration tests if you like14:35
dansmithyes, definitely14:35
sean-k-mooneybut it will do the wrong thing until artoms code lands14:35
mnaserno i didn't get a chance to setup the flavor because it didn't seem pressing at the time :)14:35
dansmithwe need to at least make sure we don't regress the stupid behavior14:35
mnaserill wait until "we need it"14:35
sean-k-mooneybut if i use concurrency 1 it won't break anything14:35
dansmithsean-k-mooney: or depends-on artoms?14:35
sean-k-mooneyim not sure if there will be a merge conflict14:36
sean-k-mooneyill test that locally14:36
dansmithuh14:36
openstackgerritMartin Midolesov proposed openstack/nova master: Implementing graceful shutdown.  https://review.opendev.org/66624514:36
dansmithwell, if there is we should rebase the pcpu stuff now, right?14:36
sean-k-mooneyam i guess so14:36
dansmithmaybe check locally first14:36
openstackgerritMatt Riedemann proposed openstack/nova master: Support migrating SRIOV port with bandwidth  https://review.opendev.org/67698014:36
openstackgerritMatt Riedemann proposed openstack/nova master: Allow migrating server with port resource request  https://review.opendev.org/67149714:36
openstackgerritMatt Riedemann proposed openstack/nova master: Allow resizing server with port resource request  https://review.opendev.org/67901914:36
openstackgerritMatt Riedemann proposed openstack/nova master: Extract pf$N literals as constants from func test  https://review.opendev.org/68099114:36
openstackgerritMatt Riedemann proposed openstack/nova master: Improve dest service level func tests  https://review.opendev.org/68099814:36
openstackgerritMatt Riedemann proposed openstack/nova master: Follow up for Ib50b6b02208f5bd2972de8a6f8f685c19745514c  https://review.opendev.org/68149014:36
openstackgerritMatt Riedemann proposed openstack/nova master: Skip querying resource request in revert_resize if no qos port  https://review.opendev.org/68151314:36
sean-k-mooneydansmith: artom's functional tests still need to be rebased, but if i merge the cpu series with artom's series without that it works locally14:40
sean-k-mooneyso ill do that quickly14:40
dansmithsean-k-mooney: ack14:40
sean-k-mooneythe functional patch we are leaving so we don't need to rebase all the rest14:40
dansmithsean-k-mooney: that means you can depends-on it in your gate jobs yeah? or are you going to rebase all of pcpu and push it up?14:40
sean-k-mooneyso that will be rebased when the stuff is merged14:41
mriedemgibi: bauzas: melwitt: i'm +2 up the bw provider migrate/resize series now https://review.opendev.org/#/q/topic:bp/support-move-ops-with-qos-ports+status:open14:41
mriedemjust a few changes need a +W but they aren't too hard14:41
*** dave-mccowan has quit IRC14:41
sean-k-mooneyya i just need to depend on the second-from-last patch in artom's series14:41
sean-k-mooneye.g. the one before the functional test14:41
sean-k-mooneybut i can have the job that is testing artoms stuff depend on stephens change too14:42
sean-k-mooneyand it should all merge fine14:42
sean-k-mooneydid i mention that zuul is awesome recently14:42
artomFamous last words ;)14:42
sean-k-mooneyi did the merge locally so it should be fine14:43
sean-k-mooneyit will be a 4 way merge however (master+numa+cpu+ci) patches14:44
dansmithnothing has even entered the gate yet14:45
dansmithI mean, nothing at all14:45
mriedemcheck queue is ~200 deep so nothing approved today is going to merge today14:45
sean-k-mooneyyep14:45
mriedemfor nova anyway14:45
* mriedem lights a cigarette in the queue for the mainframe job14:45
dansmithmriedem: don't burn your cards14:46
mriedemhey, i'm a professional14:46
mriedemoh wait,14:46
mriedemcan i switch cigarette for flavored vape?14:47
mriedemi want to be young and hip14:47
dansmithmriedem: vaping kills, haven't you heard?14:48
dansmithapparently kills you faster than cigarettes, which is pretty impressive14:48
donnydmriedem: the juul virginia tobacco is pretty good14:48
*** hamzy has quit IRC14:48
mriedemi did hear a pretty funny quote from trump on his ban on flavored vaping,14:48
*** lbragstad has joined #openstack-nova14:49
*** hamzy has joined #openstack-nova14:49
mriedemsomething like, "the kids, they're coming home, and they're saying to their mom, 'hey, i want to vape!', and it's bad"14:49
mriedemlike, what kid tells their parents they want to smoke?14:49
mriedemthey just do it and then tell their parents they were at their friend's house, the friend whose parents smoke14:49
mriedemduh14:49
stephenfinI see you've done this before14:50
mriedemi was at jared's house if my wife asks14:50
dansmithI like how mriedem has to fly under his wife's radar just like he had to with his parents14:53
openstackgerritsean mooney proposed openstack/nova master: [DNM] numa + pcpus in placment live migration tests  https://review.opendev.org/68177114:54
sean-k-mooneydonnyd: is FN currently active?14:54
donnydno14:55
donnydhttps://review.opendev.org/#/c/681731/14:55
sean-k-mooneyok then ^ will fail14:55
sean-k-mooneythat is the tweak to the multi numa job14:55
sean-k-mooneyi'll start on the vexxhost single numa version now14:55
*** hemna has joined #openstack-nova14:57
*** pcaruana has quit IRC14:57
*** jmlowe has quit IRC15:01
bauzasmriedem: ack, reviewing those now15:02
dansmithsean-k-mooney: thanks15:02
donnydsean-k-mooney: https://review.opendev.org/#/c/681773/15:02
*** weshay is now known as weshay_passport15:05
*** jmlowe has joined #openstack-nova15:05
*** tkajinam has quit IRC15:06
stephenfinalex_xu: If you're still about, could you look at https://review.opendev.org/#/c/681750/ ?15:07
luyaostephenfin: he's away, but will be back some time(he said, but not sure)15:09
aspierskashyap: is the reference to vexpress-a15 on https://docs.openstack.org/glance/latest/admin/useful-image-properties.html really correct?15:10
aspiersfeels like that doc is at best incomplete15:11
*** rouk has joined #openstack-nova15:11
*** hamzy has quit IRC15:14
*** hamzy has joined #openstack-nova15:14
*** spsurya has quit IRC15:16
bauzasgibi: +Wd with comment https://review.opendev.org/#/c/676980/22/nova/compute/manager.py@443915:20
bauzasgibi: I'm not opposed to using a deprecated method as it's an easy path15:20
bauzasgibi: but I feel this old API is broken and we should bump the version very soon15:21
gibibauzas: the comment about the usage explains that it is there until RPC version 6.015:21
bauzasI know15:22
gibibauzas: but until that, I _have to_ use the deprecated conversion methos15:22
gibid15:22
bauzasI'm just saying this RPC bump is quite important15:22
gibibauzas: I agree we should make a bump15:22
bauzasgibi: yeah15:22
bauzasgibi: you could technically use another helper method15:23
bauzasbut this one is convenient, I don't disagree15:23
*** gyee has joined #openstack-nova15:23
*** brinzhang_ has quit IRC15:25
*** lpetrut has quit IRC15:25
gibibauzas: ack, I made a TODO for myself about RPC bump for Ussuri but I'm sure I will need help doing that properly15:25
*** damien_r has quit IRC15:28
dansmithgibi: we have a doc about how to do it actually15:29
dansmithit's been a while15:29
dansmithgibi: https://wiki.openstack.org/wiki/RpcMajorVersionUpdates15:30
dansmithnot sure we've done it since we've had auto version pinning so we'd need to think through that a bit15:31
bauzasgibi: yeah follow dansmith's point, I made it for the scheduler and I can give you a ton of code snippets15:31
bauzasdansmith: oh good point15:31
*** hemna has quit IRC15:31
*** JamesBenson has joined #openstack-nova15:31
bauzasit's been a while since I've seen a major bump15:31
bauzas(but it's been a while since I followed changes like I was before)15:31
gibidansmith: thanks. I've annotated my TODO with the wiki link15:32
sean-k-mooneywhats the the synatax for resouce request again "resouces:PCPU=2"15:33
sean-k-mooneythat is correct yes?15:34
sean-k-mooneyin the flavor15:34
dansmithsean-k-mooney: with proper spelling, I think so15:35
sean-k-mooneyill copy it form functional test to get it right15:35
bauzassean-k-mooney: yeah, if you don't want more15:35
gibimriedem: thanks for fixing the race and fixing up the series and the reviews!15:36
bauzassean-k-mooney: example for VGPU : https://docs.openstack.org/nova/latest/admin/virtual-gpu.html#configure-a-flavor-controller15:36
sean-k-mooneyya i found an example15:36
*** boxiang has quit IRC15:37
mriedemgibi: np15:37
bauzasFWIW, pidgin makes it a joke15:37
bauzasor a smiley rather15:37
*** boxiang has joined #openstack-nova15:37
mriedemdansmith: i think you did the last compute rpc api bump to 5.0 in queens15:38
mriedemwhich i'm pretty sure was after the auto stuff15:39
dansmithyeah I was looking.. I think so too15:39
mriedemsean-k-mooney: i said this the other day, but i'd think you'd want a flavor with PCPU=1 (i guess if you test resize you need an alternate flavor, but that could also be PCPU=1 and a different ram/disk) and you'll want to tell tempest to run the tests in serial or at least --concurrency=215:40
mriedemi think, because otherwise i'd expect novalidhost errors while concurrent tests are consuming all the PCPU inventory on the single node15:40
bauzasgibi: just a nit with +Wd https://review.opendev.org/#/c/679019/14/releasenotes/notes/support-cold-migrating-neutron-ports-with-resource-request-6d23be654a253625.yaml15:41
mriedemi'd probably also just restrict tempest to tempest.api.compute tests to start15:41
*** sapd1_x has quit IRC15:41
*** macz has joined #openstack-nova15:42
donnydsean-k-mooney: You should be good to go now15:42
*** rpittau is now known as rpittau|afk15:44
gibibauzas: ack, thanks15:45
*** maciejjozefczyk has quit IRC15:45
bauzasgibi: mriedem: efried_afk: I'm glad we can say support-move-ops-with-qos-ports as approved15:47
bauzaslet's now wait for the gate to claim it as implemented15:47
mriedemcool, thanks for hitting those15:47
bauzasmriedem: you did the most15:48
bauzasefried_afk: sean-k-mooney: what's the status with vpmem ?15:48
bauzascan i help ?15:49
bauzasstephenfin: FWIW, reviewing back https://review.opendev.org/#/c/671801/4815:49
stephenfinbauzas: I'm going through vPMEM while my local hosts are deploying for manual upgrade testing of PCPU15:50
stephenfinSo it _should_ be fine15:50
stephenfinWe're going to need to rebase this whole thing into one giant chain once it's done though15:50
stephenfinbecause vPMEM, VCPU and NUMA-aware LM series all conflict with each other15:51
bauzas\o/15:51
stephenfinso much fun :)15:51
stephenfinI can do that once everything is approved, I guess15:51
stephenfinhopefully the conflicts are trivial (I suspect they will be)15:52
*** hamzy_ has joined #openstack-nova15:52
*** ivve has quit IRC15:52
*** hamzy has quit IRC15:52
gibidddf15:52
gibisorry15:54
gibibauzas: thanks!15:54
gibibauzas: to be precise the cold migrate and resize part of support-move-ops-with-qos-ports is now approved. I have to continue working on evacuate, live migrate, and unshelve in Ussuri15:55
bauzasoh correct of course15:55
bauzasI was short in explanation :)15:55
gibi:)15:55
bauzasbut we stick with the plan you wrote15:55
gibibauzas: yeah, there was no way to fit everything15:55
bauzasFWIW, I wish I could have had time for this with VGPU support...15:55
gibibauzas: let's try that again in U15:56
stephenfinmelwitt: ta for https://review.opendev.org/681374 (y)15:56
bauzasevery cycle, I make myself a promise15:56
bauzasI stopped doing it15:56
luyaostephenfin:  reminder for vpmems series. :)15:57
stephenfinluyao: https://review.opendev.org/#/c/678452/32 :)15:57
bauzasstephenfin: I'm weighing in my head the need to default disable_fallback_pcpu_query to True15:58
melwittstephenfin: np, good call on keeping the quota tests with it and moving the other unrelated func test to the next patch15:58
stephenfinbauzas: It's the same discussion we had with the previous manual configuration option15:58
bauzasyeah I now15:58
bauzasknow*15:58
stephenfinmelwitt: Yeah, thank mriedem for that (he complained (fairly) that the next one was way too big)15:59
luyaostephenfin: thanks, I hadn't noticed that. :)15:59
openstackgerritBalazs Gibizer proposed openstack/nova master: Follow up for Ib50b6b02208f5bd2972de8a6f8f685c19745514c  https://review.opendev.org/68149015:59
*** ttsiouts has quit IRC16:01
gibibauzas, mriedem: could you re-apply your love there? ^^16:01
bauzasgibi: a safe love but yeah16:02
openstackgerritBalazs Gibizer proposed openstack/nova master: Skip querying resource request in revert_resize if no qos port  https://review.opendev.org/68151316:02
*** ttsiouts has joined #openstack-nova16:02
gibibauzas: :)16:02
*** dave-mccowan has joined #openstack-nova16:02
*** brault has quit IRC16:03
*** ricolin has quit IRC16:03
*** TxGirlGeek has joined #openstack-nova16:03
mriedemconsider my love applied16:03
*** elod has quit IRC16:04
*** ricolin has joined #openstack-nova16:04
gibimriedem: thanks16:04
artomEww16:04
mriedemhey,16:05
*** ricolin has quit IRC16:05
mriedemnothing eww about a grown man loving up on some patches16:05
*** elod has joined #openstack-nova16:05
*** ttsiouts has quit IRC16:06
bauzasstephenfin: https://review.opendev.org/#/c/671801/48/releasenotes/notes/cpu-resources-d4e6a0c12681fa87.yaml@35 worth considering a nova-status upgrade check that would verify before you upgrade to Ussuri that you no longer require the workaround16:06
melwittmnaser: fyi os-vif 1.15.2 has been released https://review.opendev.org/68135816:06
bauzasanyway, dropping now, but I'll be back later (doing a bit of trail running)16:07
stephenfinbauzas: Yup, good call. That's what we did for the consoleauth workaround16:07
* bauzas back in a few hours16:08
stephenfindansmith: Can we squash migrations in the future?16:08
dansmithstephenfin: what migrations?16:08
stephenfini.e. "INFO migrate.versioning.api [-] 254 -> 255..."16:08
stephenfinour DB migrations16:08
dansmithwe've compacted them before, but not in a long time, as you know16:09
dansmithso.. yes?16:09
*** dtantsur is now known as dtantsur|afk16:09
stephenfinI did not know that16:09
stephenfinor rather, I'd forgotten it16:09
* stephenfin notes to investigate that for ones from Queens or so and back in Ussuri (Ussuri?)16:10
*** artom has quit IRC16:10
mriedemstephenfin: my recollection with that is sdague heard from a bunch of operators at one point that they actually didn't like the compacting because it f'ed up things like FFUs16:12
mriedemthis was long before FFU was even a term16:12
mriedemso "skip level upgrades" at the time16:12
dansmithyeah it's a giant pain16:12
mriedemso we stopped doing it,16:12
mriedemand,16:12
stephenfinunderstandable. That's why I was thinking Queens16:12
melwittmriedem: I'm +1 on the top patch of brinzhang's set and I see you are +1 on the middle patch. I'm wondering if I should fix up the bottom patch since I doubt brin is around the rest of today16:13
*** ociuhandu has quit IRC16:13
stephenfinrather than Train or something more recent16:13
mriedemthe amount of new schema migrations we do compared to the old days is nowhere close16:13
dansmithalso true16:13
dansmithwe chewed through a ton early on, but it's rare today16:13
stephenfinI bring it up because I'm deploying on a machine locally and they're taking forever16:13
mriedemdan prince had a schema diff tool thing since he was the one that used to do it but i'm not sure if that works anymore16:13
stephenfinbut that machine is using a HDD so that could be a limiting factor16:14
mriedemmelwitt: i've got an appointment this afternoon and haven't been back on the api change itself yet, so i'm not sure i'm going to be able to get that done today,16:14
mriedemmelwitt: you could fix up the bottom change though yeah it's just simple testing16:15
*** jmlowe has quit IRC16:15
mriedemi also have a patch i've been meaning to update and get into placement this release that i need to spend some time on16:15
*** ccamacho has quit IRC16:16
melwittmriedem: ack. I'll do it just in case but won't expect you'll be able to be back to it today16:16
* stephenfin also notes to create empty migrations once we cut stable/train16:16
openstackgerritsean mooney proposed openstack/nova master: [DNM] cpu pinning testing  https://review.opendev.org/68180716:17
sean-k-mooney ok i think ^ is correct16:19
sean-k-mooneystephenfin: can you check https://review.opendev.org/#/c/681807/1/playbooks/nfv/pinning.yaml16:19
stephenfinsure16:19
sean-k-mooneythose flavors are correct right?16:19
sean-k-mooneyi need to start making the other 2 versions: vcpu_pin_set only, where i will have to drop the explicit resource request, and cpu_dedicated_set only, which should just work16:20
sean-k-mooneye.g. no flavor changes needed16:21
stephenfinsean-k-mooney: the first one has vcpus=1 but '--property hw:cpu_threads=2'16:21
stephenfinthat'll conflict, surely16:21
sean-k-mooneyactually ya it might16:21
sean-k-mooneyit previously had 2 vcpus16:21
sean-k-mooneyill fix that16:21
sean-k-mooneyanything else16:21
stephenfinPersonally I'd just drop the CPU topology stuff since it's not relevant to this16:21
stephenfinlooking16:21
sean-k-mooneyi just did16:22
stephenfin'--property hw:cpu_policy=dedicated --property resources:PCPU=2'16:22
*** brault has joined #openstack-nova16:22
stephenfinthat'll fail16:22
sean-k-mooneythat should not16:22
stephenfinyou've to do one or the other. We enforce that at the API16:22
sean-k-mooneyoh ok16:22
sean-k-mooneybut wait no that does not make sense16:22
*** brault has quit IRC16:22
sean-k-mooneyif you just did --property resources:PCPU=216:23
sean-k-mooneythen you would not get cpu pinning16:23
sean-k-mooneyso why would we ever allow that16:23
stephenfinyou will - PCPU == a pinned CPU16:23
sean-k-mooneywe should not allow that16:23
stephenfin'hw:cpu_policy=dedicated' is syntactic sugar for 'resources:PCPU=$(flavor.vcpus)'16:23
sean-k-mooneydidn't we say we would only do pinning if you had dedicated16:23
stephenfinA PCPU is a resource for dedicated CPUs16:24
sean-k-mooneyyes but we don't want to support people using --property resources:PCPU=2 long term16:24
dansmithsean-k-mooney: why?16:24
dansmiththat's exactly what I want16:24
sean-k-mooneybecause you have to change your flavors if we change how we model it in placement16:24
dansmithI'm fine with that16:24
sean-k-mooneyalso it will break if you enable multiple numa nodes in the image16:24
stephenfinbreak how?16:25
stephenfinor when?16:25
sean-k-mooneywhen numa is modeled in placement. it will add the request to the unnumbered group16:25
sean-k-mooneybut if you had hw_numa_nodes=2 in the image and resources:PCPU=2 in the flavor it would fail16:26
sean-k-mooneyyou would have to use the numbered resource request syntax16:26
sean-k-mooneyin the flavor instead16:26
stephenfinin a future where NUMA is in placement, yes, you would need to use a different syntax16:27
sean-k-mooneyso we don't want to encourage resources:PCPU as it leaks implementation details of placement via the nova api16:27
stephenfinbut that's the same for requesting resource:VCPU if you use NUMA without placement16:27
stephenfinand we already support that16:27
stephenfin(requesting resources:VCPU)16:27
sean-k-mooneyyes that is also bad16:27
stephenfinmainly for ironic but anyone can use it16:27
sean-k-mooneyi get why this raw syntax exists16:27
sean-k-mooneybut its fragile16:27
sean-k-mooneyand leaking placement detail via nova api16:28
stephenfinthat's a fair opinion16:28
stephenfinbut going back to the original point16:28
stephenfin"--property hw:cpu_policy=dedicated --property resources:PCPU=2" is a no-no16:28
stephenfinas is "--property resources:PCPU=2 --property hw:cpu_thread_policy=prefer"16:29
sean-k-mooneyi expect  that to work if the PCPUs match the flavor.vcpus16:29
melwittfwiw when we have unified limits users are going to know all about placement resources, that's how they set limits16:29
stephenfinyou want either16:29
dansmithwhich is a good thing, IMHO16:29
sean-k-mooneystephenfin: that should definitely work16:29
melwittyeah, just saying16:29
sean-k-mooneystephenfin: prefer is the most relaxed policy16:29
stephenfinit's not the prefer bit that's the issue16:30
*** DinaBelova has quit IRC16:30
sean-k-mooneyif you say nothing its a stricter requirement16:30
stephenfinit's the fact that you're mixing the old way of doing this and the new way16:30
sean-k-mooneythis is not what i understood from the spec16:30
stephenfinI want to be very clear and say you either do things with resources:PCPU and traits:HW_CPU_HYPERTHREADING16:30
stephenfinor with hw:cpu_policy and hw:cpu_thread_policy16:31
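A minimal sketch of the rule described above: a flavor uses either the placement-style extra specs (resources:PCPU plus the HW_CPU_HYPERTHREADING trait) or the hw:cpu_policy / hw:cpu_thread_policy pair, never a mix, and hw:cpu_policy=dedicated stands in for resources:PCPU=$(flavor.vcpus). The helper names are hypothetical and this is not nova's actual API validation code; it only illustrates the check as stated in the channel.

```python
# Hypothetical helpers illustrating the rule from the discussion; this is
# not nova's real validation code.
LEGACY_KEYS = {"hw:cpu_policy", "hw:cpu_thread_policy"}


def uses_legacy_style(extra_specs):
    """True if the flavor uses the hw:cpu_* extra specs."""
    return any(key in LEGACY_KEYS for key in extra_specs)


def uses_placement_style(extra_specs):
    """True if the flavor asks for PCPUs or the hyperthreading trait directly."""
    return any(
        key == "resources:PCPU" or key.endswith(":HW_CPU_HYPERTHREADING")
        for key in extra_specs
    )


def check_flavor(extra_specs):
    """Reject flavors that mix the two styles."""
    if uses_legacy_style(extra_specs) and uses_placement_style(extra_specs):
        raise ValueError("use hw:cpu_* or resources:PCPU/traits, not both")


def expand_dedicated(vcpus, extra_specs):
    """Return the CPU resources a flavor would request under this rule."""
    check_flavor(extra_specs)
    if extra_specs.get("hw:cpu_policy") == "dedicated":
        # The "syntactic sugar" case: dedicated means one PCPU per flavor vCPU.
        return {"PCPU": vcpus}
    if "resources:PCPU" in extra_specs:
        return {"PCPU": int(extra_specs["resources:PCPU"])}
    return {"VCPU": vcpus}
```

Under these assumptions, the two combinations called a "no-no" in the channel (hw:cpu_policy=dedicated with resources:PCPU=2, and resources:PCPU=2 with hw:cpu_thread_policy=prefer) both raise ValueError, while either style on its own yields a PCPU request.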
mriedemwe should probably start an etherpad for post-FF release todos huh....like documenting PCPUs16:31
sean-k-mooneyi'll change the test to work but i think this is a problem16:31
dansmithso,16:32
dansmithwe're not sure how to use it16:32
dansmithand there are some concerns about how to use it16:32
dansmithso we should merge it, figure it out later and then document? :)16:32
stephenfinthat's not very fair16:32
mriedemi've started https://etherpad.openstack.org/p/nova-train-release-todo16:32
*** udesale has quit IRC16:32
stephenfinI don't know how to use BW-aware scheduling16:32
dansmithstephenfin: I think you know how to use it, which is why I said we :)16:33
mriedemit's documented16:33
stephenfinI'm not blocking that because I don't personally understand it16:33
mriedemthe bw stuff was documented in stein16:33
stephenfinSo's this. There's a not insignificant spec for the thing16:33
stephenfinand I don't think me not documenting things is a concern16:34
*** dave-mccowan has quit IRC16:34
sean-k-mooneywhen i get the jobs running ill review the api checks16:34
dansmithstephenfin: sean-k-mooney is one of the people you included in the "are testing it manually" crew, so I think being a little concerned is not unreasonable16:34
openstackgerritsean mooney proposed openstack/nova master: [DNM] cpu pinning testing  https://review.opendev.org/68180716:37
sean-k-mooneystephenfin: is that more to your liking16:38
stephenfinperfect16:38
sean-k-mooneyi think the second flavor should result in no cpus pinning and an api error personally but that should work as you suggest16:38
*** DinaBelova has joined #openstack-nova16:39
*** gbarros has joined #openstack-nova16:43
openstackgerritsean mooney proposed openstack/nova master: [DNM] test with dedicated cpus only  https://review.opendev.org/68182716:43
stephenfinsean-k-mooney: I'll try not to rathole on it, but what do you think requesting 'resources:PCPU=N' should imply, out of curiosity?16:45
stephenfinIt sounds like your objections to that would apply equally to 'resources:VCPU=N'16:46
*** igordc has joined #openstack-nova16:46
sean-k-mooneyit should be an error if hw:cpu_policy is not set to dedicated and if it is set to dedicated it should be compared to the flavor.vcpus16:46
sean-k-mooneyi don't think operators should use either16:47
stephenfinbut I was told we wanted to get away from those request specs to the more generic 'resources' syntax16:47
sean-k-mooneyby who16:47
stephenfinjaypipes, efried, dansmith (above)16:47
sean-k-mooneybecause i remember talking about this with alex_xu and efried_afk16:47
* dansmith owns it16:47
sean-k-mooneyi had thought that i convinced both efried_afk and alex_xu that we should prefer the abstract form16:48
sean-k-mooneye.g. hw:cpu_policy16:48
stephenfinprefer, yes, but not limit to16:49
sean-k-mooneyi have not spoken to dansmith or jaypipes about it16:49
stephenfinhw:cpu_policy is syntactic sugar16:49
sean-k-mooneyi don't think it should be just syntactic sugar16:49
mriedemefried_afk: in case you didn't see i've created https://etherpad.openstack.org/p/nova-train-release-todo and added it to the meeting agenda16:49
sean-k-mooneyi was pretty sure we agreed in the review to explicitly not support resources:PCPU16:49
sean-k-mooneythis came up in the pmem series too16:50
sean-k-mooneywe are not inferring pmem usage from pmem resources:... extra specs16:51
openstackgerritmelanie witt proposed openstack/nova master: Add user_id and project_id column to Migration  https://review.opendev.org/67399016:53
openstackgerritmelanie witt proposed openstack/nova master: Set user_id/project_id from context when creating a Migration  https://review.opendev.org/67941316:53
openstackgerritmelanie witt proposed openstack/nova master: Filter migrations by user_id/project_id  https://review.opendev.org/67424316:53
*** efried_afk is now known as efried16:56
*** weshay_passport is now known as weshay16:57
sean-k-mooneyanyway i have more or less said my piece, but the final point i want to make is that if you use resources:* and that resource moves in placement, you will either have to resize to a flavor without it before upgrading, or we will have to do an online data migration of all instance embedded flavors, or the vm will not be able to cold/live migrate after an upgrade from one version of nova to another16:58
sean-k-mooneywell or a resize to the new flavor with the updated extra specs16:58
sean-k-mooneyreflecting the topology change16:58
stephenfinaye, that'll be the same for vGPU too, unfortunately16:59
stephenfinI'd rather like to provide a flavor migration tool in the future16:59
stephenfinMigrate all flavors, including embedded ones, as a way of deprecating old extra specs16:59
stephenfinbut I haven't really thought it through16:59
sean-k-mooneywe could but it's still not a good thing to encourage people to use16:59
mriedemmelwitt: i'm +2 on those bottom 2 changes now16:59
stephenfinWe'll do it in U, eh :)16:59
mriedemif the api change doesn't make train at least the data will start flowing17:00
sean-k-mooneyi hope not17:00
sean-k-mooneyi don't think we should be deprecating those extra specs17:00
mriedemstephenfin: heh https://review.opendev.org/#/c/637217/17:00
*** derekh has quit IRC17:00
mriedem"This code can be removed in Queens"17:01
melwittmriedem: ack thanks17:01
sean-k-mooneyanyway that's not something we are doing today17:02
stephenfin\o/17:02
* stephenfin points to the removal of cells v1, consoleauth and ec2 crud as proof he's serious about burning down tech debt17:02
stephenfinI have the nova-network series ready to go once U opens up17:03
sean-k-mooneystephenfin: what you currently have is in line with the spec https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html#example-flavor-configurations so i guess that is what we are stuck with.17:05
stephenfinyeah, I haven't pulled the rug out from anyone here17:06
stephenfinjaypipes wanted the resources-style syntax17:06
stephenfinyou wanted the hw:cpu_policy-style17:06
sean-k-mooneyyes we have both. i would have hoped we could use both together if they don't conflict17:07
openstackgerritsean mooney proposed openstack/nova master: [DNM] legacy vcpu_pin_set pinning with shared emulator threads  https://review.opendev.org/68184017:08
*** hamzy_ has quit IRC17:09
*** hamzy_ has joined #openstack-nova17:11
*** ociuhandu has joined #openstack-nova17:11
*** artom has joined #openstack-nova17:13
sean-k-mooneyhttps://review.opendev.org/#/c/681773/ is now merged so FN should be back in rotation so ill also kick off the multi numa migration run17:13
sean-k-mooneywith stephens changes17:13
sean-k-mooneye.g. https://review.opendev.org/#/c/681771/117:13
*** ociuhandu has quit IRC17:16
sean-k-mooneyi seem to be getting node failures from the vexxhost labels so if those all fail i'll swap it over to FN since that seems to be working properly again.17:20
*** ralonsoh has quit IRC17:20
*** awalende has quit IRC17:22
*** awalende has joined #openstack-nova17:23
*** nweinber__ has quit IRC17:23
*** awalende has quit IRC17:27
*** hemna has joined #openstack-nova17:27
*** brault has joined #openstack-nova17:30
*** ivve has joined #openstack-nova17:31
*** xek has joined #openstack-nova17:34
*** brault has quit IRC17:34
*** itlinux has joined #openstack-nova17:35
openstackgerritsean mooney proposed openstack/nova master: [DNM] cpu pinning testing  https://review.opendev.org/68180717:39
openstackgerritsean mooney proposed openstack/nova master: [DNM] test with dedicated cpus only  https://review.opendev.org/68182717:39
openstackgerritsean mooney proposed openstack/nova master: [DNM] legacy vcpu_pin_set pinning with shared emulator threads  https://review.opendev.org/68184017:39
sean-k-mooneyok they are all queued to run on FN now17:41
*** priteau has quit IRC17:42
*** nweinber__ has joined #openstack-nova17:43
*** markvoelker has quit IRC17:45
efriedstephenfin: I think I'm caught up since nap. Going to look at https://review.opendev.org/#/c/681750/ (additional func tests) now. Anything else I need to hit?17:46
*** markvoelker has joined #openstack-nova17:46
stephenfinefried: I don't think so17:46
efriedstephenfin: you still running up the vpmem series?17:46
sean-k-mooneywe should get the results from FN in about an hour by the way17:47
efriednoyce17:47
stephenfinI'm working through manual tests at the moment but once that's done, I'm probably going to string together NUMA-aware live migration, vPMEM, and PCPU into one giant chain17:47
sean-k-mooneythe result for combining the numa migration series with PCPU will report back first, then the 3 topologies for the PCPU series only17:47
stephenfinon account of the few (trivial, but not auto-resolvable) merge conflicts between the three17:47
sean-k-mooneyi'm going to go eat something. be back in a while17:48
*** bbowen__ has quit IRC17:55
*** boxiang has quit IRC17:55
*** boxiang has joined #openstack-nova17:56
*** pcaruana has joined #openstack-nova17:58
sean-k-mooneystephenfin: can you wait till http://zuul.openstack.org/stream/c1f6ccc551c649bab6f60911fd3485ff?logfile=console.log completes before you push that chain17:59
*** brault has joined #openstack-nova17:59
stephenfinsean-k-mooney: sure, I won't be doing it for a while18:00
sean-k-mooneyok it just started running tempest tests18:00
*** hemna has quit IRC18:00
sean-k-mooneythat is numa migration + pcpu in placement18:01
*** oomichi_ has joined #openstack-nova18:01
sean-k-mooneyit would be nice to wait for the 3 other jobs too but if that one passes that's a good start18:01
sean-k-mooneyoh the first live migration test passed18:01
efried\o/18:04
*** brault has quit IRC18:04
sean-k-mooneyone of the tests failed so far but the rest are all passing. any idea what http://paste.openstack.org/show/775434/ is18:06
*** lbragstad has quit IRC18:07
sean-k-mooneyit looks like a delete failed but we will need the full logs to know why18:07
sean-k-mooneyit might be unrelated18:07
aspiersefried: I fixed the glance metadata and it makes SEV quite nice to set up from Horizon https://review.opendev.org/#/c/681866/18:12
sean-k-mooneydid you also update the useful image properties. apparently that is a thing you are meant to do18:13
aspierssean-k-mooney: click the link ;-)18:13
*** dolpher has quit IRC18:13
sean-k-mooneyyep just did and you did + added the release note18:13
aspierssean-k-mooney: yes I extended your release note18:13
sean-k-mooneyso that should be all in order18:13
sean-k-mooneyactually i extended someone else's18:14
aspiersand I used git blame to find a good commit to copy and it was yours ;)18:14
sean-k-mooneythey use 1 release note per release apparently for all the metadefs18:14
aspiersweird18:14
aspiersbut better to stay consistent18:14
aspiersefried, sean-k-mooney: look how pretty it is! https://photos.app.goo.gl/MTuTzPnz165bVVaC618:15
sean-k-mooneyif it works for them. why not. the only issue would be backporting could be weird but you would not backport them anyway18:15
sean-k-mooneyha you with your suse theming18:15
aspiers:-D18:16
aspiersbelieve it or not that is devstack18:16
sean-k-mooneyit says suse openstack cloud18:16
aspiersyup18:16
aspiersit's still devstack ;)18:16
aspierswith our theme18:16
sean-k-mooneyoh you have enabled the plugin18:16
*** hamzy_ has quit IRC18:16
sean-k-mooneyya18:16
sean-k-mooneyit does look nice18:16
sean-k-mooneyaspiers: also it's nice to have the docs programmatically available18:17
*** hamzy_ has joined #openstack-nova18:17
aspiersyes18:17
sean-k-mooneyyou could in principle hit the metadef api for the cli or heat too18:17
aspiersI didn't really know it worked this way until today18:17
sean-k-mooneyheat does actually use it18:17
aspiersYeah I guessed it would18:17
sean-k-mooneyso the heat ui for flavors will pull that info too18:18
aspiersexactly18:18
sean-k-mooneyi have a todo to go through all the image properties we support in nova and figure out where the gaps are18:19
sean-k-mooneyi would like to get them all up to date18:19
stephenfinso I have never done boot from volume before18:19
stephenfinand our docs don't work18:19
stephenfinhurrah!18:19
stephenfin:D18:19
sean-k-mooneydo it from horizon18:19
stephenfinlooking at https://docs.openstack.org/nova/latest/user/launch-instance-from-volume.html18:19
sean-k-mooneyit's the default if you have cinder installed18:19
stephenfinI got it working with novaclient18:19
stephenfinthanks to a bug which we closed as not having enough info /o\ https://bugzilla.redhat.com/show_bug.cgi?id=150546518:20
openstackbugzilla.redhat.com bug 1505465 in python-openstackclient "openstack server create can't parse --block-device option source=blank,dest=volume" [Medium,Closed: insufficient_data] - Assigned to jpichon18:20
stephenfini need me a whiteboard18:20
sean-k-mooney well did you try https://docs.openstack.org/nova/latest/user/launch-instance-from-volume.html#create-volume-from-image-and-boot-instance18:20
stephenfinI did. See step 3? '--block-device' doesn't exist18:21
stephenfinthat's the syntax from the novaclient command18:21
stephenfinAnother lossy conversion from novaclient to openstackclient, I suspect18:21
stephenfinI hope I didn't review that, because I did a terrible job if so18:21
stephenfinDoesn't matter though. I have live migration working between the two baremetal nodes _finally_ (only took three hours). Time to start manual testing18:21
sean-k-mooneymriedem:  would know but i think we wanted to add some syntactic sugar to make this better18:22
sean-k-mooneystephenfin: ^18:22
stephenfinthough it sounds you'll have most of it done before I even get started. Hurrah for automated testing18:22
stephenfinWell, docs that are accurate would be a great start18:22
sean-k-mooneyas i said i use horizon for this because well the docs always sucked18:23
*** IvensZambrano has quit IRC18:23
sean-k-mooneyfor "boot from volume from image" anyway18:23
dtroyerstephenfin: osc4 was released yesterday and has the block device fixes we did over the summer18:23
dtroyerby 'we' I mean mriedem and others18:24
sean-k-mooneyi'm guessing we updated the wrong docs18:24
stephenfindtroyer: o rly? If you can point me to the tl;dr: for that, I can incorporate it in the docs rework I'll do post feature freeze18:24
sean-k-mooneydtroyer: yes, they added a cleaner syntax that hides most of the detail right18:24
mriedemhttps://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-create18:25
mriedem[--boot-from-volume <volume-size>] is new18:25
dtroyeryes, was an entire forum session in Denver18:25
sean-k-mooneydtroyer: yep i was there which is why i remember this was a thing18:25
mriedemand --block-device-mapping allows type=image18:25
stephenfinI personally tried various combinations of --image, --volume, --block-device and --block-device-mapping before giving up and falling back to novaclient /o\18:25
mriedemstephenfin: https://review.opendev.org/#/q/topic:story/2006302+(status:open+OR+status:merged)18:26
sean-k-mooneymriedem: can you do --boot-from-volume 100G --image <my image> now?18:26
mriedemeff yeah you can18:26
sean-k-mooneyi was hoping it would end up being something like ^18:26
sean-k-mooneyawesome18:27
efriedaspiers: what's causing it to be spelled "flavours"?18:27
efriedthat should be fixed18:27
aspiersefried: my locale18:27
stephenfinI guess what I wanted (not now) was the equivalent of 'nova boot --block-device source=image,id=<image ID>,dest=volume,size=10,shutdown=preserve,bootindex=0 ...'18:27
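To make the 'nova boot --block-device' spec quoted above concrete, here is a minimal sketch that turns that comma-separated string into a dict shaped like a compute API block_device_mapping_v2 entry. The function name and the key table are hypothetical, the short-key to field-name mapping is an assumption covering only the keys in the quoted command, and this is not novaclient's or OSC's parser.

```python
# Hypothetical mapping from the short keys in the quoted command to
# block_device_mapping_v2 field names (an assumption, limited to these keys).
KEY_MAP = {
    "source": "source_type",
    "dest": "destination_type",
    "id": "uuid",
    "size": "volume_size",
    "bootindex": "boot_index",
}


def parse_block_device(spec):
    """Parse 'key=value,key=value' into a block-device-mapping style dict."""
    bdm = {}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        if key == "shutdown":
            # shutdown=preserve keeps the volume when the server is deleted.
            bdm["delete_on_termination"] = value == "remove"
        elif key in KEY_MAP:
            bdm[KEY_MAP[key]] = value
        else:
            raise ValueError("unsupported key: %s" % key)
    # Numeric fields are converted for readability in this sketch.
    for field in ("volume_size", "boot_index"):
        if field in bdm:
            bdm[field] = int(bdm[field])
    return bdm
```

For example, parse_block_device("source=image,id=IMAGE_ID,dest=volume,size=10,shutdown=preserve,bootindex=0") describes a 10 GB boot volume built from the image and preserved on delete, where IMAGE_ID is a placeholder.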
aspiersand no, it should be fixed whenever it's spelt "flavors" :-p18:28
sean-k-mooneyefried: that is the correct spelling18:28
efried'flavor' should be a token18:28
mriedemstephenfin: there is a thing like that18:28
mriedemnot the exact same syntax18:28
*** hemna has joined #openstack-nova18:28
aspiersefried: en_GB != en_US18:28
stephenfinit sounds like with OSC 4 that's 'openstack server create  --boot-from-volume 10G --image <image ID>18:28
stephenfin'18:28
stephenfin*...'18:28
mriedemstephenfin: yeah18:28
efriedaspiers, sean-k-mooney: I'm not disputing that the word "flavor" is spelled "flavour". This is not a word. It's the name of a thing. You can't issue a command with "openstack flavour ..."18:28
mriedemmaybe the osc functional test makes usage more clear https://review.opendev.org/#/c/674111/5/openstackclient/tests/functional/compute/v2/test_server.py18:28
aspiersefried: which instance of "flavour" are you referring to?18:29
dtroyerefried: /me makes a note for a special build…18:29
stephenfinIt's on the list of doc fixes I want to get to post feature freeze18:29
sean-k-mooneyefried: oh if it's giving a cli example ya the US spelling needs to be used18:29
efriedI'm talking about this: https://photos.google.com/share/AF1QipOckLfsOWRShI0bwvHL-PGoVzhxwWM376UgImt2h1flG6AXslZmqiMvpF5V3EmxzA?key=MUNMbkVQVDMwbnlmQkswOVZLZldzRThyYV9BaGt318:29
efriedthe title "Update Flavour Metadata"18:29
efriedYou're not updating the metadata for a flavour.18:30
efriedYou're updating the metadata for a flavor18:30
efriedno matter where you're sitting.18:30
aspiersthat's just Horizon's translation18:30
aspiersimagine if it was Chinese18:30
efriedswhy I'm asking how that is being "translated". It shouldn't be.18:30
sean-k-mooney ya from a ui point of view it's fine, although the flavor quota is not updated18:30
stephenfinsounds like the great configuration drive vs. config drive war of early 201918:30
aspiersit would look ridiculous if there was some Mandarin surrounding the word "Flavor"18:31
donnyddtroyer: can you just alias that in osc18:31
aspiersefried: in fact, it would be ridiculous in any language. You can't have titles which are only partially translated18:31
efriedwhoah18:31
efriedof course you can.18:31
* donnyd fans flames18:31
donnydLOL18:31
aspiersOK, agree to disagree I guess :)18:32
efriedshould you have 新星 instead of "nova" in a chinese title???18:32
aspiersQuite possibly, but that's different18:32
* stephenfin suddenly feels hungry18:32
artomзагрузка с volume18:32
aspiers"nova" is a name which has no pre-existing semantics relevant to OpenStack18:32
aspiers"flavor" is a word which is being used in a way which aligns with its normal every-day meaning18:33
aspiersso they are totally different18:33
aspiersIf OpenStack used the word "purple" instead of "flavor" then I might agree with you. But it doesn't.18:33
sean-k-mooneyefried: i used to work in a localisation research center in my university before i started on openstack18:33
aspiersefried: But you are free to report a bug in Horizon ;-)18:34
sean-k-mooneythey would have said yes you should translate it except where giving examples of commandline/tools or api requests18:34
aspiersYeah I'd agree with that18:34
efriedokay, nova is clearly a terrible example. "cinder" is also totally aligned with its normal every-day meaning, so we should translate that one. And "glance".18:34
aspiersHuh :)18:34
aspiersHow are cinder and glance aligned?18:35
efriedyou know, cinder for block storage, glance for images.18:35
aspiersYeah I know but that's not alignment. That's a couple of obscure puns.18:35
efried"flavor" is just as far away from what the thing is in openstack as "cinder" and "glance".18:35
aspiersEveryone knows what "flavor" means, without needing a pun to be explained.18:35
efriedwow18:36
aspiersIt's not18:36
sean-k-mooneyefried: your saving grace is the fact that you are not meant to translate proper nouns like names, trademarks, products18:36
artomFlavors are usually in my mouth18:36
artomNot in my cloud :)18:36
sean-k-mooneyso if you want to win the argument say that flavor is the proper name of a resource type18:36
aspiershttps://www.dictionary.com/browse/flavor18:36
aspiers"3. the characteristic quality of a thing:"18:36
efriedNow, if we wanted to capitalize Nova, Glance, and Cinder I would *totally* agree they're different.18:36
aspiersThat's exactly what nova flavors are18:37
sean-k-mooneyyou know the code often also spells flavor instance-type18:37
artom"broken" is not a flavor ;)18:37
aspiers"4. a particular quality noticeable in a thing:"18:37
sean-k-mooneyanyway https://20aa241df59e72844ae9-597ff148d0ea9164d11e7cb764cf9b04.ssl.cf5.rackcdn.com/681771/1/experimental/nova-nfv-multi-numa-multinode/c1f6ccc/testr_results.html.gz18:38
aspiersBoth of those align with nova's usage18:38
sean-k-mooneyit looks like almost all the tests passed18:38
aspiersFlavors are qualities of VMs18:38
efriedor "instances"18:38
aspiersAnyway, dinner time18:38
sean-k-mooneyfor the combined numa migration + pcpus tests18:38
*** gbarros has quit IRC18:38
artomsean-k-mooney, "almost" makes me nervous18:38
sean-k-mooneythe ones that failed were not live migration18:38
sean-k-mooneythose all passed18:39
sean-k-mooneyyour code is fine18:39
efriedI suspect it would confuse the hell out of Chinese operators if "instance" was 例子 everywhere.18:39
artomsean-k-mooney, well ok, but is stephenfin's code fine?18:39
sean-k-mooneyit finished 30 seconds ago so i'm checking18:39
*** mriedem is now known as mriedem_afk18:39
mriedem_afki'll be back before the meeting18:39
aspiersefried: I have no idea, but anyway none of this is related to any of my patches, so I'm washing my hands ;-)18:40
sean-k-mooneyone of the 3 failures was failing to delete a vm18:40
efriedYeah, even at the start of this I was just being pedantic, not arguing for actual change18:40
aspiersI would have thought that the i18n team would have received enough complaints by now if it wasn't right18:40
aspiersUnless Horizon changed something recently, I dunno18:40
*** jmlowe has joined #openstack-nova18:40
mriedem_afkthe one guy at a university trying to stand up nova + xen just called out in the ML18:41
aspiersI never noticed it with the British spelling before, but maybe it was always there18:41
efriedtimeouts18:41
sean-k-mooneyok 1 was a failure to delete an instance and 2 failures were trying to migrate to the same host18:41
sean-k-mooneyartom: ^18:41
artomsean-k-mooney, which review is that from?18:41
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/c1f6ccc551c649bab6f60911fd3485ff/log/controller/logs/screen-n-cpu.txt.gz?severity=418:42
sean-k-mooneyah let me get the link18:42
sean-k-mooneyhttps://review.opendev.org/#/c/681771/118:42
dansmiththere's some failed db lookups18:43
dansmithlike upcalls or something? I don't even know18:43
dansmithSep 12 18:10:31.308872 ubuntu-bionic-expanded-fortnebula-regionone-0011217193 nova-compute[14200]: ERROR nova.compute.manager [instance: 44f764db-e0ef-492a-8ac4-c9b110037b7d] oslo_messaging.rpc.client.RemoteError: Remote error: CantStartEngineError No sql_connection parameter is established18:43
dansmithSep 12 18:10:31.376947 ubuntu-bionic-expanded-fortnebula-regionone-0011217193 nova-compute[14200]: INFO nova.compute.manager [None req-a4e37fd4-9b29-472f-82f8-4cc5d6ba78cd tempest-TestNetworkAdvancedServerOps-353388395 tempest-TestNetworkAdvancedServerOps-353388395] [instance: 44f764db-e0ef-492a-8ac4-c9b110037b7d] Setting instance back to active after: Instance rollback performed due to: Unable to migrate instance18:44
dansmith (44f764db-e0ef-492a-8ac4-c9b110037b7d) to current host (ubuntu-bionic-expanded-fortnebula-regionone-0011217193).18:44
sean-k-mooneyya i think i have seen those before18:44
sean-k-mooneybut i dont know where18:44
artomWasn't that happening when you were testing just my patches?18:44
sean-k-mooneyperhaps18:45
sean-k-mooneywe should have the ci logs to check18:45
artomYeah, trying to find them18:45
sean-k-mooneyi think i have seen that on master before too18:45
dansmithit's trying to do a reschedule,18:45
sean-k-mooneylogstash should tell us i guess18:45
dansmithbut I'm guessing that means we failed to do something we should have been able to do,18:46
dansmithlike claiming pcpus again or something18:46
sean-k-mooneyso we have this error https://zuul.opendev.org/t/openstack/build/c1f6ccc551c649bab6f60911fd3485ff/log/controller/logs/screen-n-cpu.txt.gz#482518:47
sean-k-mooneythat is what i expect to see when we live migrate without updating things properly18:47
dansmiththe logs I saw were cold migrate I think18:47
dansmithyep, resize18:48
sean-k-mooneyinventory data: {'VCPU': {'total': 7, 'reserved': 0, 'min_unit': 1, 'max_unit': 7, 'step_size': 1, 'allocation_ratio': 16.0},18:54
sean-k-mooneyso that isn't working correctly right18:54
sean-k-mooneythis should be the combination of stephen's change + artom's18:54
sean-k-mooneyso i would expect that to contain PCPUS18:55
sean-k-mooneyi have other jobs that are just stephens code running too18:55
dansmithwouldn't we fail to boot at all if there wasn't any inventory?18:55
dansmithmeaning, fail to get to the compute at all18:55
sean-k-mooneyi think it's reporting inventory of PCPU and either not translating or doing the fallback18:56
artomI didn't touch any of the scheduler stuff in my code - I assumed it handled placement allocations correctly for a live migration18:57
artomMaybe that part needs to be filled in for PCPUs?18:57
dansmithinventory data: {'VCPU': {'total': 7, 'reserved': 0, 'min_unit': 1, 'max_unit': 7, 'step_size': 1, 'allocation_ratio': 16.0}, 'MEMORY_MB': {'total': 16039, 'reserved': 512, 'min_unit': 1, 'max_unit': 16039, 'step_size': 1, 'allocation_ratio': 1.5}, 'DISK_GB': {'total': 74, 'reserved': 0, 'min_unit': 1, 'max_unit': 74, 'step_size': 1, 'allocation_ratio': 1.0}18:58
dansmitham I missing the PCPU in there?18:58
dansmithTranslating request for VCPU=2 to VCPU=0,PCPU=218:59
dansmithI thought that was supposed to happen in the scheduler request, but that's in the compute log18:59
sean-k-mooneyno18:59
sean-k-mooneyso did i19:00
sean-k-mooneycould it happen in the compute on resize maybe19:00
dansmithno, it can't call the scheduler19:00
sean-k-mooneywell its in the scheduler utils19:00
sean-k-mooneybut ya19:01
dansmithunless that's just being logged from a util method we're calling to generate an allocation update or something19:01
sean-k-mooneydo we have placement logs19:01
*** gbarros has joined #openstack-nova19:02
sean-k-mooneythat is a request for PCPUs right https://zuul.opendev.org/t/openstack/build/c1f6ccc551c649bab6f60911fd3485ff/log/controller/logs/screen-placement-api.txt.gz#34219:02
*** hemna has quit IRC19:03
sean-k-mooneyits using the fallback19:03
dansmiththat's a req from the scheduler I assume,19:03
dansmithI'm not sure why we're seeing the translation message in the compute log tho19:03
sean-k-mooneyyes and just below it we see it request again with VCPUS19:04
sean-k-mooneystephenfin: did i miss adding something in the nova conf19:05
stephenfinsean-k-mooney: what did you add?19:05
sean-k-mooneyjust vcpu_pin_set in this case  + cpu_shared_set19:05
sean-k-mooneybut it should still get PCPU, no?19:06
stephenfinnope19:06
sean-k-mooneyoh ok19:06
dansmithstephenfin: do you know why we're logging that translation thing in the compute log? is it from generating allocations to compare in the periodic or something?19:06
stephenfinthe host won't do reshape or report PCPU until '[compute] cpu_dedicated_set' is configured. That's our switchover19:06
sean-k-mooneyah ok19:06
sean-k-mooneyill update that job then19:07
stephenfindansmith: which one?19:07
sean-k-mooneythe others use cpu_shared_set and cpu_dedicated_set19:07
stephenfinoh Translating request for VCPU=2 to VCPU=0,PCPU=219:07
dansmithyeah19:07
stephenfinyeah, sec19:07
dansmithI saw that, forgot it was in the compute log and was like "cool, cool, we're ... wait a sec." :)19:08
stephenfinso this is the code that does the translation https://review.opendev.org/#/c/671801/48/nova/scheduler/utils.py19:09
stephenfinand the question is where do we build ResourceRequest object on the compute ndoe19:09
stephenfin*node19:09
stephenfinbecause I didn't add that :)19:09
dansmithif we're legit logging that we should probably change that wording a little19:09
stephenfingot it19:10
stephenfinhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/utils.py#L595-L60219:10
sean-k-mooney"cpu_dedicated_set: 0-6" is the correct spelling yes19:10
stephenfinyup, and it has to be in the compute group19:11
sean-k-mooneyya19:11
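The switchover rule stephenfin describes can be sketched as a small config check: a host only counts as reporting PCPU when cpu_dedicated_set is set in the [compute] group. The helper name and the sample config text are hypothetical, and the stdlib parser stands in for oslo.config purely for illustration.

```python
import configparser


def reports_pcpu(conf_text):
    """True only if cpu_dedicated_set is set in the [compute] group."""
    # Rename the default section so [DEFAULT] values are not inherited
    # by [compute]; the option has to live in the compute group itself.
    parser = configparser.ConfigParser(default_section='__unused__')
    parser.read_string(conf_text)
    value = parser.get('compute', 'cpu_dedicated_set', fallback='')
    return bool(value.strip())


legacy_conf = "[DEFAULT]\nvcpu_pin_set = 0-6\n\n[compute]\ncpu_shared_set = 7\n"
new_conf = "[compute]\ncpu_dedicated_set = 0-6\ncpu_shared_set = 7\n"
misplaced_conf = "[DEFAULT]\ncpu_dedicated_set = 0-6\n"
```

The legacy layout, the one the failing job used, yields False, so no reshape or PCPU inventory would be expected from that host.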
stephenfindansmith: That's from the vCPU model selection code and it was added as a quick way to extract the traits19:12
openstackgerritsean mooney proposed openstack/nova master: [DNM] numa + pcpus in placment live migration tests  https://review.opendev.org/68177119:12
stephenfinSo agreed, it's misleading, and I didn't see it before because the vCPU model selection change only merged last night19:13
dansmiths'cool, just sayin'19:13
sean-k-mooneyok that re running19:13
sean-k-mooneythe 3 jobs that are just running the PCPU code should report back in the next 20 mins or so; the last one with numa will take about an hour and a half19:14
stephenfincool :)19:14
stephenfinThe solution, fwiw (and IMO), is what efried has asked for elsewhere: a util function to pull traits and resources from a flavor in a standard way, so we don't have a load of 'resourceNN:<type>' regexes about the place19:15
stephenfinregexes? regexi?19:15
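A sketch of what such a shared helper might look like, so the key patterns live in one place. The function name and return shape are hypothetical; the key forms (`resources:CLASS`, `resourcesN:CLASS`, `trait:NAME`, `traitN:NAME`) follow the flavor extra spec syntax discussed here.

```python
import re

# One pattern per key family; the optional digits are the request group.
RESOURCE_KEY = re.compile(r'^resources(?P<group>\d*):(?P<name>[A-Z0-9_]+)$')
TRAIT_KEY = re.compile(r'^trait(?P<group>\d*):(?P<name>[A-Z0-9_]+)$')


def parse_extra_specs(extra_specs):
    """Split flavor extra specs into per-group resources and traits."""
    resources, traits = {}, {}
    for key, value in extra_specs.items():
        match = RESOURCE_KEY.match(key)
        if match:
            group = resources.setdefault(match.group('group'), {})
            group[match.group('name')] = int(value)
            continue
        match = TRAIT_KEY.match(key)
        if match:
            group = traits.setdefault(match.group('group'), {})
            group[match.group('name')] = value
    return resources, traits


specs = {
    'resources:PCPU': '2',
    'resources1:VGPU': '1',
    'trait:COMPUTE_SUPPORTS_MULTIATTACH': 'required',
    'hw:cpu_policy': 'dedicated',  # unrelated keys are ignored
}
resources, traits = parse_extra_specs(specs)
```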
dansmithsean-k-mooney: so what's the deal on the failed db queries?19:15
dansmithsean-k-mooney: worried that maybe we're lazy-loading some api-only field or something19:15
sean-k-mooneyif they don't show up in the ones that are stephen's code only, then it's from artom's series19:16
sean-k-mooneyif its in both its from master19:16
*** TxGirlGeek has quit IRC19:17
sean-k-mooneythis https://review.opendev.org/#/c/681771/1 is the combined job and https://review.opendev.org/#/c/681807/2 is the 3 pcpu only jobs19:17
*** TxGirlGeek has joined #openstack-nova19:17
sean-k-mooneyi think i did see that with artoms series but not in all versions19:17
sean-k-mooneycould it be the late upcall for anti-affinity19:18
sean-k-mooneyif we are lazy loading something i'm not sure what it would be off the top of my head19:19
stephenfinsean-k-mooney, dansmith: WIP manual testing notes here too, btw https://etherpad.openstack.org/p/nova-cpu-resources19:20
stephenfinI'm at controller+compute updated to use PCPU code with other compute using plain master (without PCPU code). Onto setting 'cpu_dedicated_set' on the former now19:21
stephenfinI went with master because DevStack wouldn't deploy from stable/stein and I figured this would be an easier way to simulate "upgrades" from nova's perspective19:21
sean-k-mooneyi meant to get food earlier and still haven't so i'm going to run for 30 mins to an hour19:22
*** itlinux is now known as itlinux-away19:22
stephenfinI just want to test the reshaper and create/move operations with nodes with and without the PCPU stuff, after all19:22
dansmithsean-k-mooney: I dunno how it could be unrelated if it works on master (the lazy load)19:22
*** awalende has joined #openstack-nova19:23
*** itlinux-away is now known as itlinux19:24
*** itlinux is now known as itlinux-away19:24
*** itlinux-away is now known as itlinux19:24
*** itlinux is now known as itlinux-away19:24
sean-k-mooneyits likely an issue in artoms code then if stephen has not seen it in his testing?19:26
*** artom has quit IRC19:26
*** itlinux-away is now known as itlinux19:26
*** itlinux is now known as itlinux-away19:26
dansmithno, I would think it's more likely that stephen added something that looks at an object field, but hasn't been testing with computes and conductors that don't have access to the api database19:27
*** itlinux-away is now known as itlinux19:27
*** TxGirlGeek has quit IRC19:27
*** itlinux is now known as itlinux-away19:27
dansmithartom's code eventually passed this right?19:27
*** bbowen has joined #openstack-nova19:27
*** awalende has quit IRC19:28
mnasermelwitt: thanks (re os-vif)19:29
*** factor has quit IRC19:30
*** factor has joined #openstack-nova19:30
*** artom has joined #openstack-nova19:31
dansmith...still not a damn thing in the gate19:31
*** itlinux-away is now known as itlinux19:31
*** itlinux is now known as itlinux-away19:31
*** itlinux-away is now known as itlinux19:32
*** itlinux is now known as itlinux-away19:32
*** lbragstad has joined #openstack-nova19:32
melwittthe Nova Penalty™19:33
*** itlinux-away is now known as itlinux19:34
*** itlinux is now known as itlinux-away19:34
artomdansmith, stuff that's been +W'ed pre-FF can be rechecked until it merges, right?19:40
artomIt's not like "whelp, FF, killall the things"19:41
* artom has to run19:42
*** artom has quit IRC19:42
dansmith...mkay19:42
*** itlinux-away is now known as itlinux19:43
*** itlinux is now known as itlinux-away19:43
stephenfindansmith: okay, so, here's what I've tested19:43
stephenfinthat I can boot pinned and unpinned instances on master19:44
stephenfinthat if I "upgrade" a compute node to use the PCPU stuff, the existing instances keep working19:44
stephenfinthat I can boot new instances on that upgraded compute node19:44
stephenfinthat if I set 'cpu_dedicated_set', all my pinned instances magically transition to having PCPU inventory19:45
stephenfin(that one was tricky because I (boldly) had pinned and unpinned instances on the same host, so I had to pick a value for 'cpu_dedicated_set' that matches the pinset of the pinned instances and still leaves some aside for 'cpu_shared_set')19:46
stephenfinthat I can continue to boot pinned instances after having set that, and they will go to both the host reporting PCPUs and the host without PCPUs19:47
stephenfinand throughout all of that, placement's view of WTF is happening remains consistent19:48
stephenfin*that placement's19:48
stephenfindansmith: How's that sounding so far? Anything in particular you want me to double back on or try?19:49
stephenfinHappy to tar up the logs from both compute nodes for your perusal too. They're a bit messy because I've made a few (user-driven) mistakes along the way but they should do to quickly grep for ERRORs19:50
stephenfinThe main ones of those I've seen are errors from RabbitMQ about trying again in 1 sec, which I seem to always get in DevStack19:50
*** itlinux-away is now known as itlinux19:51
stephenfinand libvirt b****** when I tried to set 'cpu_dedicated_set' to a range that excluded pinned cores from existing instances19:51
*** itlinux is now known as itlinux-away19:51
*** TxGirlGeek has joined #openstack-nova19:52
stephenfinThat's an issue already with vcpu_pin_set but I thought I'd fixed that with this bugger https://review.opendev.org/#/c/680107/ I've clearly missed something though and will keep investigating19:52
stephenfinAlso, there's a bug in the vCPU model selection code19:53
*** icarusfactor has joined #openstack-nova19:53
*** TxGirlGeek has quit IRC19:54
stephenfinNamely, if I have a required trait and it's not a CPU flag, this will contain at least some None entries https://github.com/openstack/nova/blob/master/nova/virt/libvirt/utils.py#L60019:54
*** itlinux-away is now known as itlinux19:55
stephenfinand we have a later check to see if the result from the function is false'y19:55
*** factor has quit IRC19:55
*** itlinux is now known as itlinux-away19:55
stephenfinset([None]) is not false'y19:55
stephenfinpatch incoming for that19:55
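The truthiness problem is easy to reproduce in isolation. The sketch below uses a hypothetical trait-to-flag table and hypothetical function names; it is not the code in nova/virt/libvirt/utils.py, only an illustration of why a set containing None slips past a falsy check.

```python
# Hypothetical trait-to-CPU-flag table; the real mapping lives in nova.
TRAIT_TO_FLAG = {'HW_CPU_X86_AVX2': 'avx2', 'HW_CPU_X86_SSE42': 'sse4.2'}


def flags_buggy(required_traits):
    # Traits that are not CPU flags map to None and end up in the set.
    return set(TRAIT_TO_FLAG.get(trait) for trait in required_traits)


def flags_fixed(required_traits):
    # Only keep traits that actually correspond to a CPU flag.
    return {TRAIT_TO_FLAG[t] for t in required_traits if t in TRAIT_TO_FLAG}


only_non_cpu = ['COMPUTE_SUPPORTS_MULTIATTACH']
buggy = flags_buggy(only_non_cpu)   # a set holding a single None
fixed = flags_fixed(only_non_cpu)   # an empty set
```

A later `if not flags:` guard is skipped for `buggy` but taken for `fixed`, which is the behaviour the check was meant to have.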
*** itlinux-away is now known as itlinux19:55
*** itlinux is now known as itlinux-away19:56
*** TxGirlGeek has joined #openstack-nova19:56
*** itlinux-away is now known as itlinux19:59
*** itlinux is now known as itlinux-away19:59
sean-k-mooneystephenfin: here is the results for https://review.opendev.org/#/c/681827/2  https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_5bb/681827/2/experimental/nova-nfv-multinode/5bb9889/testr_results.html.gz20:00
sean-k-mooneythat is your code without  artoms20:00
sean-k-mooneythe only thing that failed were 2 live migration tests20:00
stephenfinthat's exactly what we expected, right?20:01
sean-k-mooneythere are a couple of extra failures in https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_909/681840/2/experimental/nova-nfv-multinode/9090152/testr_results.html.gz20:01
sean-k-mooneywhich is using vcpu_pin_set again20:01
dansmithresize fails are not expected20:01
dansmithDetails: {'code': 400, 'created': '2019-09-12T18:52:42Z', 'message': 'CPU set to unpin [1] must be a subset of pinned CPU set []'}20:01
*** kaliya has joined #openstack-nova20:02
*** itlinux-away is now known as itlinux20:02
sean-k-mooneyyou get set[] if the update resource provider task fails20:02
*** itlinux is now known as itlinux-away20:02
sean-k-mooneywhich happens if you live migrate and cause a pinning conflict20:02
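To make the error message concrete, here is a sketch of the subset check behind it. The exception class and function are hypothetical stand-ins; only the message wording is taken from the log above.

```python
class UnpinError(Exception):
    """Hypothetical stand-in for the error the API returned."""


def unpin(pinned, to_unpin):
    """Release `to_unpin` from `pinned`, refusing CPUs that are not pinned."""
    pinned, to_unpin = set(pinned), set(to_unpin)
    if not to_unpin <= pinned:
        raise UnpinError(
            'CPU set to unpin %s must be a subset of pinned CPU set %s'
            % (sorted(to_unpin), sorted(pinned)))
    return pinned - to_unpin


remaining = unpin({1, 2}, {1})
try:
    # The host's pinned set is empty, as in the failing resize.
    unpin(set(), {1})
    error = None
except UnpinError as exc:
    error = str(exc)
```

An empty pinned set is the state sean-k-mooney describes: the host's view was never refreshed, so any unpin request fails the subset test.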
*** itlinux-away is now known as itlinux20:02
*** itlinux is now known as itlinux-away20:03
sean-k-mooneybut i just got back from getting food so now i'm going to go eat20:03
sean-k-mooneybrb20:03
dansmithah, so you think this had already caused a pinning conflict and then a resize failed or something?20:03
dansmithI thought you were running these with concurrency=120:03
sean-k-mooneyya20:03
sean-k-mooneyyes but it takes a while for it to get fixed20:03
*** itlinux-away is now known as itlinux20:04
sean-k-mooneye.g. the periodic task time20:04
dansmitheven after the delete?20:04
*** itlinux is now known as itlinux-away20:04
sean-k-mooneyit won't be fixed until the update resources periodic runs i think20:04
dansmiththat seems odd20:05
*** kaliya has quit IRC20:06
*** itlinux-away is now known as itlinux20:06
*** itlinux is now known as itlinux-away20:07
*** dolpher has joined #openstack-nova20:09
*** markvoelker has quit IRC20:11
*** markvoelker has joined #openstack-nova20:11
dansmithI see that both in the periodic as well as during a boot20:11
*** itlinux-away is now known as itlinux20:13
*** itlinux is now known as itlinux-away20:13
*** tbachman has quit IRC20:15
openstackgerritBalazs Gibizer proposed openstack/nova master: Note about Destination.forbidden_aggregates  https://review.opendev.org/68094520:18
dansmithstephenfin: sorry I missed the ping above20:18
*** itlinux-away is now known as itlinux20:18
*** itlinux is now known as itlinux-away20:18
dansmithstephenfin: I want to see tempest passing against it.. I'm glad you're doing lots of hand testing of it too, and I believe that whatever you're doing is working if you say it is20:19
*** tbachman has joined #openstack-nova20:19
dansmithnot that it matters, several other people are apparently fine +Wing this without seeing our test suite run on it, so this is just for funsies at this point20:19
stephenfinI guess having a bajillion functional tests does help things though20:20
*** itlinux-away is now known as itlinux20:20
*** itlinux is now known as itlinux-away20:20
*** oomichi_ has quit IRC20:20
stephenfinYes, I know they're not the same thing, but this is all touching the management'y parts of nova20:20
stephenfinwhich the functional tests are excellent for20:20
dansmiththis is all touching plenty of the stuff in nova that breaks all the time20:20
*** xek has quit IRC20:21
*** itlinux-away is now known as itlinux20:21
*** itlinux is now known as itlinux-away20:21
*** itlinux-away is now known as itlinux20:21
*** itlinux is now known as itlinux-away20:21
*** itlinux-away is now known as itlinux20:23
*** itlinux is now known as itlinux-away20:23
*** itlinux-away is now known as itlinux20:23
*** itlinux is now known as itlinux-away20:23
*** itlinux-away is now known as itlinux20:27
*** itlinux is now known as itlinux-away20:27
sean-k-mooneyok im back20:27
dansmithsean-k-mooney: so what's the deal with the test run that was showing the object/db errors?20:29
sean-k-mooneyi dont know but thats what im going to look into now20:29
sean-k-mooneydid we see that in the ones for just stephens code20:29
sean-k-mooneyor  is it only in the combined one20:29
*** lpetrut has joined #openstack-nova20:30
dansmithonly combined, afaik20:30
sean-k-mooneyok ill redeploy just artoms code and see if its there20:30
dansmithsean-k-mooney: it would have caused the tests to fail when we originally ran on artom's code if it was, right?20:30
sean-k-mooneyi wonder if it could be related to the instance.refresh()20:31
dansmithno, I think it was on reqspec20:31
sean-k-mooneyam im not sure it would20:31
*** itlinux-away is now known as itlinux20:31
sean-k-mooneywhen i saw that previously the migrations worked20:31
dansmithsean-k-mooney: I asked earlier why not and what is different20:31
*** itlinux is now known as itlinux-away20:31
stephenfinDo we know if that instance was moved before attempting to rebuild? I'm assuming not20:32
dansmiththese were failing in the pcpu tests as a result no?20:32
efriedstephenfin: thanks for those reviews. There's going to be a merge conflict with numa lm on test_objects, so I'm going to rebase now to resolve preemptively, cool?20:32
*** itlinux-away is now known as itlinux20:32
*** itlinux is now known as itlinux-away20:32
*** pcaruana has quit IRC20:32
stephenfinefried: yeah, go for it. I've been meaning to do that myself but got sidetracked20:33
*** itlinux-away is now known as itlinux20:33
stephenfinLemme know what ones I need to rehit if the merge conflicts are non-trivial20:33
*** itlinux is now known as itlinux-away20:33
*** itlinux-away is now known as itlinux20:33
*** itlinux is now known as itlinux-away20:33
stephenfinefried: Aaaactually, do you think you could rebase the cpu-resources series onto that too?20:33
efriedstephenfin: Yeah, I can do that, but in two stages if that's okay with you.20:34
stephenfinI pointed out earlier that I think that's going to have the same issue20:34
stephenfincool by me20:34
dansmithsean-k-mooney: File "/opt/stack/nova/nova/objects/instance_group.py"20:34
dansmithsean-k-mooney: so maybe it is the late affinity check, but.. I don't understand why we'd hit that here but not in regular/other tests.. can you explain?20:34
sean-k-mooneydansmith: we turn it off in the upstream gate but i might not be turning it off in my jobs20:35
dansmithah okay20:35
sean-k-mooneyit should be in the nova-cpu.conf right20:35
sean-k-mooneythe workarounds section20:35
*** itlinux-away is now known as itlinux20:36
*** itlinux is now known as itlinux-away20:36
dansmithoh you mean we turn off the check, not disable whatever test this is?20:36
dansmithI dunno where it is specifically but nova-cpu.conf yeah20:36
sean-k-mooneyyes we disable the upcall20:36
sean-k-mooneyalso https://8ab1fb0a384d9cbfa221-969de3c017bb40c2acaf4bf21edd2ff6.ssl.cf1.rackcdn.com/681771/2/experimental/nova-nfv-multi-numa-multinode/1faf214/testr_results.html.gz20:36
sean-k-mooneyhttps://review.opendev.org/#/c/681771/20:36
*** markvoelker has quit IRC20:37
dansmithdo we have any grenade results yet?20:37
sean-k-mooneyno i dont have a grenade job set up. but i can try to set one up quickly20:37
dansmithI don't think quickly matters anymore,20:38
dansmithbut it'd be good to get a run on it while we have context, IMHO20:38
*** trident has quit IRC20:39
sean-k-mooneyso there is a patch to convert grenade to non legacy20:39
*** markvoelker has joined #openstack-nova20:39
sean-k-mooneyif i depend on that i should be able to do the same config changes20:39
*** mriedem_afk is now known as mriedem20:39
dansmithoh, is that why it's hard because it's still a legacy job?20:39
sean-k-mooneydansmith: this https://review.opendev.org/#/c/548936/20:40
sean-k-mooneydansmith: yes20:40
dansmithgotcha20:40
dansmithwell, if that works currently, then yeah, try that I'd say20:40
sean-k-mooneydevstack-gate is a pain to get kvm working20:40
sean-k-mooneyyes it looks like its passing so ill try and modify it20:40
dansmithcool20:40
sean-k-mooneywell depend on it20:40
mriedemso...things are where they were 2 hours ago?20:41
mriedemooo down to 179 changes in the check queue20:41
dansmithstill nothing in the gate tho20:41
mriedemyup, i don't expect nova stuff to probably be fully merged until maybe this weekend20:42
mriedemif you account for rechecks20:42
sean-k-mooneydansmith: by the way the latest test run for the combined numa + pcpus job does not appear to have the db error20:42
dansmithsean-k-mooney: that db error wouldn't be flaky, so that makes me kinda wonder20:42
sean-k-mooneywell the difference is i am using cpu_dedicated_set instead of vcpu_pin_set20:43
dansmithsean-k-mooney: unless one node is configured to have api db access and the other is not, and you don't get the fail if you're lucky and land on the one20:43
sean-k-mooneystephenfin: could we be trying to lazy load the pinned cpus or something when we use vcpu_pin_set instead of cpu_dedicated_set20:45
*** itlinux-away is now known as itlinux20:45
*** TxGirlGeek has quit IRC20:45
*** itlinux is now known as itlinux-away20:45
mriedemdevstack shouldn't configure nova-cpu.conf nor either of the cell conductor configs for api db access20:45
mriedemby default anyway20:45
dansmithmriedem: yeah I dunno why this would be legit failing here and not on master tho20:46
*** lpetrut has quit IRC20:46
sean-k-mooneyyeah the db is not configured on either node20:46
mriedemtalking about this job? https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_909/681840/2/experimental/nova-nfv-multinode/9090152/testr_results.html.gz20:46
stephenfinsean-k-mooney: I'm looking but I don't see why it would.20:47
dansmithno20:47
dansmithmriedem: https://20aa241df59e72844ae9-597ff148d0ea9164d11e7cb764cf9b04.ssl.cf5.rackcdn.com/681771/1/experimental/nova-nfv-multi-numa-multinode/c1f6ccc/compute/logs/screen-n-cpu.txt.gz20:47
efriedstephenfin: that rebase isn't going to work, because numalm is based waaay back on master, so that way conflicts with (at least) cpu models. I could rebase numalm, but it would lose its place in the queue (~10.5h) so I don't want to do that. Just going to ride it out, mkay?20:47
dansmithmriedem: but I think this is just the late affinity check that must not be disabled on sean-k-mooney's20:47
stephenfin(y)20:47
sean-k-mooneythis one https://zuul.opendev.org/t/openstack/build/1faf2148524e41638672ff627cb9a011/log/compute/logs/etc/nova/nova-cpu_conf.txt.gz#9220:48
mriedemdisable_group_policy_check_upcall20:48
dansmithsean-k-mooney: that's set on your job?20:48
mriedemit's not in https://20aa241df59e72844ae9-597ff148d0ea9164d11e7cb764cf9b04.ssl.cf5.rackcdn.com/681771/1/experimental/nova-nfv-multi-numa-multinode/c1f6ccc/compute/logs/etc/nova/nova_conf.txt.gz20:48
mriedemin the subnode20:48
sean-k-mooneyyes20:48
dansmithokay not being on the subnode would explain why we hit it sometimes and not others20:48
mriedemdevstack should configure that by default if you're using superconductor mode (which is the default)20:48
*** takashin has joined #openstack-nova20:48
*** hamzy_ has quit IRC20:49
sean-k-mooneyit's set on the controller and compute20:49
mriedemin that link above,20:49
mriedemit's set on the compute on the controller host https://20aa241df59e72844ae9-597ff148d0ea9164d11e7cb764cf9b04.ssl.cf5.rackcdn.com/681771/1/experimental/nova-nfv-multi-numa-multinode/c1f6ccc/controller/logs/etc/nova/nova-cpu_conf.txt.gz20:49
mriedemthe primary20:49
mriedembut not the subnode20:49
mriedemso that's why it's flaky20:49
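The comparison mriedem did by reading both config dumps can be sketched as a small script. The option and section names come from the discussion; the helper names and the sample config text are hypothetical.

```python
import configparser

OPTION = 'disable_group_policy_check_upcall'


def upcall_disabled(conf_text):
    """True if the [workarounds] option is set to a true value."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean('workarounds', OPTION, fallback=False)


def nodes_missing_workaround(configs):
    """Names of nodes whose compute config does not disable the up-call."""
    return sorted(
        name for name, text in configs.items() if not upcall_disabled(text))


configs = {
    'controller': '[workarounds]\n%s = True\n' % OPTION,
    'compute': '[DEFAULT]\ndebug = True\n',
}
missing = nodes_missing_workaround(configs)
```

Any node listed in `missing` can still attempt the late affinity up-call, so whether a test fails depends on which host the instance lands on.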
* dansmith nods20:49
mriedemi'm not sure why it wouldn't be set, unless the zuul yaml post-config is overwriting the workarounds section20:51
mriedembut the controller devstack run fixes it but the subnode one doesn't20:51
*** trident has joined #openstack-nova20:51
sean-k-mooney i dont think that is what is happening20:51
sean-k-mooneythat is the compute/subnode https://zuul.opendev.org/t/openstack/build/1faf2148524e41638672ff627cb9a011/log/compute/logs/etc/nova/nova-cpu_conf.txt.gz#9220:52
sean-k-mooneythat is the controller/primary https://zuul.opendev.org/t/openstack/build/1faf2148524e41638672ff627cb9a011/log/controller/logs/etc/nova/nova-cpu_conf.txt.gz#11720:52
sean-k-mooneyon the run that passed it's set on both20:52
mriedemyeah i was looking at the one that failed20:52
sean-k-mooneyoh20:52
mriedemnote that i'm coming into this 2 hours late :)20:52
sean-k-mooneyso the one that failed didnt have it set on both20:53
dansmithright20:53
sean-k-mooneyok20:53
sean-k-mooneyi have no idea why that is then20:53
sean-k-mooneyi can add it explicitly to the job i guess20:53
*** itlinux-away is now known as itlinux20:54
*** itlinux is now known as itlinux-away20:54
sean-k-mooneyi also don't have that set locally so that is probably why i sometimes saw this and other times did not20:54
mriedemsean-k-mooney: different patch sets for those jobs,20:54
mriedemnot sure if that matters,20:54
mriedemi was looking at PS120:54
mriedemhttps://review.opendev.org/#/c/681771/1..2/.zuul.yaml20:54
*** itlinux-away is now known as itlinux20:54
mriedemdon't know why that would change things20:55
*** itlinux is now known as itlinux-away20:55
*** itlinux-away is now known as itlinux20:55
sean-k-mooneynor do i20:55
*** itlinux is now known as itlinux-away20:55
sean-k-mooneythat was needed to activate stephens code20:55
mriedemunless there is a devstack or zuul bug or something20:55
sean-k-mooneysince cpu_dedicated_set is what does the reshape20:55
mriedemmaybe https://github.com/openstack/devstack/commit/2468ceaa724aa5c8c44fb87ae223eb6687ff85f2 regressed something in devstack20:56
sean-k-mooneyanyway the important thing is the last run passed with both series merged together so at least that's good20:56
sean-k-mooney maybe but both of those runs were done today20:56
mriedemyeah, i guess we just keep an eye out for random failures like that20:57
*** mdbooth has quit IRC20:57
sean-k-mooneyya maybe check logstash to see if its happening20:57
sean-k-mooneyright so i was going to try and create a non legacy grenade job with all the other non merged stuff.20:58
*** hemna has joined #openstack-nova20:58
*** TxGirlGeek has joined #openstack-nova20:58
dansmithmriedem: we only do that if we have an instance group right?20:59
sean-k-mooneywhat are the chances 4 unmerged unrelated series will all work together perfectly20:59
dansmithdoes the network advanced server ops thing use groups?20:59
sean-k-mooneyi dont think so20:59
sean-k-mooneyyou mean server groups right20:59
dansmithso I'm wondering why we'd be hitting this for that20:59
dansmithyes, server grops20:59
dansmithgroups even20:59
sean-k-mooneydo we do check for resize/migrate21:00
dansmithonly if they're in a group, IIRC21:00
sean-k-mooneyit's for the anti-affinity stuff right21:00
openstackgerritEric Fried proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces  https://review.opendev.org/67845321:00
openstackgerritEric Fried proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845421:00
openstackgerritEric Fried proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845521:00
openstackgerritEric Fried proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845621:00
openstackgerritEric Fried proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67964021:00
openstackgerritEric Fried proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847021:00
openstackgerritEric Fried proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139721:00
*** slaweq has quit IRC21:00
dansmithI was wondering earlier if we were failing to boot because of pcpu stuff and running that group recalc in a reschedule or something21:01
*** itlinux-away is now known as itlinux21:01
efriedstephenfin: https://review.opendev.org/#/c/678453/33..34/nova/tests/unit/objects/test_objects.py21:01
mriedemefried: nova meeting?21:01
efriedyes21:01
*** itlinux is now known as itlinux-away21:01
mriedemdansmith: yeah https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L145721:01
ozzzoI'm trying to get live migration working without shared storage. I got libvirtd to listen and set auth_tcp = "none" to fix the auth errors; now the error is:21:02
ozzzo2019-09-12 13:05:10.154 60779 ERROR nova.virt.libvirt.driver [-] [instance: 01a9d185-f003-4f59-b87b-9256f5ea2eaa] Live Migration failure: unsupported configuration: Unable to find security driver for model apparmor: libvirtError: unsupported configuration: Unable to find security driver for model apparmor21:02
mriedemthough if it's a reschedule there are a couple of known up-call fails from that21:02
mriedemdansmith: for resize and build reschedule21:02
ozzzoDoes anyone know what that means? I don't have apparmor running21:02
mriedemto pull the AZ for the host which goes to aggregates21:02
mriedemdansmith: https://bugs.launchpad.net/nova/+bug/178128621:02
openstackLaunchpad bug 1781286 in OpenStack Compute (nova) "CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call" [Medium,Triaged]21:02
sean-k-mooneyozzzo: that is probably what it's complaining about21:03
dansmiththis one looked like setup_groups21:03
mriedemError trying to reschedule: oslo_messaging.rpc.client.RemoteError: Remote error: CantStartEngineError No sql_connection parameter is established21:03
dansmithor whatever it's called21:03
*** itlinux-away is now known as itlinux21:03
ozzzoI need apparmor for live migration?21:03
sean-k-mooneyno21:03
*** itlinux is now known as itlinux-away21:03
*** itlinux-away is now known as itlinux21:03
*** nweinber__ has quit IRC21:03
*** itlinux is now known as itlinux-away21:03
*** itlinux-away is now known as itlinux21:03
mriedemdansmith: nope it's that aggs bug21:03
mriedemhttps://20aa241df59e72844ae9-597ff148d0ea9164d11e7cb764cf9b04.ssl.cf5.rackcdn.com/681771/1/experimental/nova-nfv-multi-numa-multinode/c1f6ccc/controller/logs/screen-n-cond-cell1.txt.gz21:03
sean-k-mooneyozzzo: but if you dont have apparmor running you need to disable apparmor as the security driver in the libvirt config21:03
mriedemgrep for CantStartEngineError in there21:03
*** nweinber__ has joined #openstack-nova21:03
mriedemreschedules and then tries to get az from the cell conductor21:04
sean-k-mooneymriedem: ya we were seeing CantStartEngineError21:04
mriedemyup, known bug, see above21:04
mriedemi don't think tempest does a ton of server group testing but there are a few tests for server affinity and anti-affinity group testing21:05
dansmithmriedem: that fails from resources unavailable on the compute21:05
*** mdbooth has joined #openstack-nova21:05
ozzzoI don't find "appa" in my libvirt config. Is it a default that I need to override?21:05
dansmith"Requested instance NUMA topology cannot fit the given host NUMA topology"21:05
mriedemdansmith: oh, maybe that's a race?21:05
mriedemare the tests running in serial?21:05
dansmithsean-k-mooney: ^21:05
ozzzooic it looks like it is the default in qemu.conf21:05
ozzzo#       security_driver = [ "selinux", "apparmor" ]21:06
sean-k-mooneydansmith: or the issue was the exception in update resource21:06
ozzzowhat would that look like, to disable both?21:06
ozzzosecurity_driver = []   ?21:06
sean-k-mooneyozzzo: i think by default it will use either21:06
sean-k-mooneyyou have to explicitly disable it21:06
sean-k-mooneymriedem: and yes i have it running serially.21:07
mriedemozzzo: please see the channel topic, this is a mostly dev focused channel and today is feature freeze for the train release so as you can see there is quite a bit of dev activity trying to get this release closed out21:07
mriedemand we're having a meeting at the same time over in #openstack-meeting21:08
mriedemozzzo: #openstack-operators or just #openstack might help, or the ask.o.o forum21:08
mriedemotherwise try hitting up the openstack-discuss mailing list21:08
sean-k-mooneyozzzo: i think you need to do [] yes but i dont know.21:09
ozzzook i'll try it, ty21:09
mriedemanyway, sean-k-mooney dansmith reschedule in gate jobs with devstack doesn't really work b/c of that up-call21:09
mriedemwe just don't normally see or care about it b/c if we hit it the job fails and we're likely toast anyway, but we don't normally see reschedules in tempest runs anyway,21:10
mriedemunless like libvirtd dies on a compute and if that happens we fail anyway21:10
dansmiththat's why I'm wondering if we're failing and doing a reschedule because of the changes under test21:10
mriedemcould be latent races in the numa claims code but yeah, idk21:12
mriedemif we're running tests in serial that should be less of a risk, unless some test is creating multiple servers in one run21:12
mriedemand we're trying to pin all to the same cpus?21:12
*** nweinber__ has quit IRC21:12
dansmithoh well, we'll just wait for mnaser to shake out all the bugs in here when he rolls it out21:12
* dansmith facepalms21:13
mriedemthe job is setup to pin to the same cpus for the flavor we're using?21:13
mriedemor how does that work?21:13
mriedemiow, if a test creates more than one server on the same host, are we guaranteed to fail a claim for one and reschedule?21:14
sean-k-mooneyno21:14
sean-k-mooneywe have 8 cpus in the gate vms21:14
sean-k-mooneythe instances are each claiming 121:14
sean-k-mooneyand i have 7 of the 8 enabled21:15
sean-k-mooneybut some tests do create multiple vms21:15
mriedemyeah ok, but there aren't any tempest tests that create more than 3 servers i'm pretty sure21:17
mriedemin a single test case i mean,21:17
mriedemand i'm pretty sure the tempest tearDown waits for the servers to be gone, i.e. 404 from the API21:17
sean-k-mooneyin the case where we saw that error, the teardown failed to delete before hitting the timeout21:18
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Correctly handle non-CPU flag traits  https://review.opendev.org/68193221:18
*** artom has joined #openstack-nova21:18
stephenfinmriedem, dansmith, efried: ^21:18
mriedemstephenfin: is that a master only regression? it should probably have a bug21:19
stephenfintest is wrong - I rushed it21:19
stephenfinI can do that21:19
mriedemso only unit tests, no functional coverage21:19
sean-k-mooneydid that get introduced with the sev stuff21:19
mriedemthis might be where i asked for a nova/tests/functional/regressions test case21:20
mriedemcode: do something wrong21:20
mriedemunit test: assert the code did something wrong21:20
*** JamesBenson has quit IRC21:22
*** tbachman has quit IRC21:23
stephenfincan someone give me an example of a non-CPU flag trait we support at the moment?21:23
*** zhubx has joined #openstack-nova21:25
mriedemCOMPUTE_SUPPORTS_MULTIATTACH21:25
sean-k-mooneyi was going to mention the ones in my series that we punted...21:25
sean-k-mooneyya that21:25
mriedemso i bet we could recreate in devstack by just adding that to a tempest flavor yeah?21:26
sean-k-mooneyya21:26
sean-k-mooneyyou can do it the way im doing all the nfv/numa jobs21:27
*** JamesBenson has joined #openstack-nova21:27
mriedemheh even just COMPUTE_ATTACH_INTERFACE21:27
stephenfinmriedem: https://bugs.launchpad.net/nova/+bug/184383621:27
openstackLaunchpad bug 1843836 in OpenStack Compute (nova) "Failure to schedule if flavor contains non-CPU flag traits" [Undecided,New]21:27
*** boxiang has quit IRC21:27
sean-k-mooneywell all the compute capability traits fall into that category right21:27
mriedemstephenfin: yeah anything in here really https://github.com/openstack/nova/blob/master/nova/virt/driver.py#L10221:28
efriedartom et al, numalm bottom failed l-c on subunit parser https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_769/634827/60/check/openstack-tox-lower-constraints/7693dab/testr_results.html.gz21:28
artomefried, ugh, recheck?21:28
artomThat's not a legit thing, right?21:29
efried#2 failed functional on 6!=7 -- we fixed that mriedem right? Rebase?21:29
efriedartom: question whether to recheck or rebase21:29
efriedartom: you're way down the master branch21:29
mriedemefried: gate should pick it up from master21:29
efriedyou would think so mriedem, but it's possible this was queued from before that merged too.21:29
efriedhttps://ebfd4ec54dbc92b01f1e-dbc96e30a25abd79ad83d28f747c4e2b.ssl.cf2.rackcdn.com/635229/62/check/nova-tox-functional/a59c098/testr_results.html.gz21:29
artomefried, yeah, that was intentional to make reviewing easier21:30
mriedemthe subunit to parser failure is when a test dumps too much shit to the logging output stream21:30
sean-k-mooneyefried: it will do a merge of the patch you see in gerrit and master21:30
mriedemefried: yeah maybe21:30
mriedemthat claims one was in the gate b/c my fix was merged21:30
mriedem*before21:30
efried#3 hasn't failed yet, #4 is still queued21:30
*** JamesBenson has quit IRC21:30
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Correctly handle non-CPU flag traits  https://review.opendev.org/68193221:31
efriedso the question is, do we recheck 1&2, hoping 3&4 pass, and save their place in the queue21:31
efriedor do we rebase, and lose 3&4's place in the queue21:31
sean-k-mooneyif we are going to rebase the LM series now is probably better than in a few hours21:31
*** henriqueof1 has quit IRC21:31
* artom still isn't sure what the actionable thing to do here is21:31
efriedartom: your call21:31
stephenfinmriedem: I'll whip up a functional test for that tomorrow. It's too late to face into that now21:31
mriedemha, got a node after 11 hours, then failed21:31
stephenfinbut unit test is fixed and I've linked the bug21:31
mriedemstephenfin: sure, i was just going to push a simple devstack patch to show the failure21:31
artomefried, does it really matter? I'm assuming if I rebase now, I'll get administrative +Ws21:32
efriedartom: first option we have to wait until gate posts actual failure, which will be a while. IMO second option would be best21:32
efriedartom: yes21:32
artomIt's not like we need this by tomorrow21:32
*** hemna has quit IRC21:32
*** markvoelker has quit IRC21:32
stephenfingiant chain of patches21:32
stephenfinwheee21:32
mriedemartom: sorry but my +Ws on that series were only good through yesterday21:32
artomIt can wait the weekend or w/e, when the load on the gate isn't as high21:32
artommriedem, you sonnova21:32
mriedemi thought you saw the disclaimer?21:32
sean-k-mooneyartom: we are better off just rebasing it now i think21:33
efriedartom: by rebasing now, you kick four patches out of the queue, leaving room for other things.21:33
*** lbragstad has quit IRC21:33
* artom rebases21:33
sean-k-mooneyif we can line up the LM series then put PCPUs on top of the vPMEM i think that is the best path forward21:33
* efried waits with +Wand21:34
stephenfinsean-k-mooney: As things would have it, I've got a PCPUs on vPMEM rebase just waiting to go21:34
*** JamesBenson has joined #openstack-nova21:34
* efried stops doing that ^21:34
sean-k-mooneystephenfin: then after artom pushes can you rebase on top?21:35
efrieddon't do that ^21:35
stephenfinyarp21:35
efriedcause it'll kick vpmem out21:35
stephenfinor not21:35
stephenfinactually, yeah, no21:35
mriedemare these all conflicting?21:35
efriedminorly21:35
stephenfinPCPU doesn't conflict with LM21:35
efriedand I un-conflicted vpmem with lm21:35
mriedemi really only care about numa lm21:35
efriedyeah we know21:36
artommriedem, <321:36
stephenfinVPMEM conflicted with LM but efried fixed it21:36
mriedem...because it's been a thing people have wanted forever21:36
stephenfinPCPU and VPMEM is a horror show21:36
sean-k-mooneymriedem: since we introduced numa in like juno21:36
artommriedem, and here I thought it was because of my pretty eyes21:36
stephenfinin terms of number of merge conflicts, not complexity21:36
openstackgerritArtom Lifshitz proposed openstack/nova master: New objects for NUMA live migration  https://review.opendev.org/63482721:36
artomIncoming21:36
openstackgerritArtom Lifshitz proposed openstack/nova master: LM: Use Claims to update numa-related XML on the source  https://review.opendev.org/63522921:36
mriedemartom: you do have nice brown (?) eyes21:36
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460621:36
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002121:36
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration  https://review.opendev.org/67259521:36
mriedemi got lost in them21:36
efriedhazel is my vote21:37
mriedem+Wing21:37
stephenfinI could have worked on k8s...21:37
*** hamzy_ has joined #openstack-nova21:37
sean-k-mooneystephenfin: i hear you like yaml21:37
sean-k-mooneystephenfin: because that is all k8s is as a user21:37
*** tbachman_ has joined #openstack-nova21:38
mriedemsoon we'll all only code in yaml21:38
*** JamesBenson has quit IRC21:39
efriedmriedem: "I'll push a devstack change to put a required trait that I know will be on all libvirt computes and that can show the failure for us quicker than writing some functional libvirt test." I assume this is a joke.21:39
mriedemefried: nope21:39
mriedemit's a 2 line change in devstack,21:39
mriedemyes i know it will take awhile to get a node21:39
efriedyeah, and an 11h...21:39
mriedembut as melwitt pointed out earlier, nova gets punished by zuul21:39
mriedemdevstack probably doesn't21:39
efriedtrue story21:39
melwittyeah, a devstack change might make it in a few hours. or less21:40
sean-k-mooneyor you can cheat. notice that all my jobs completed in like 2 hours21:40
efriedI thought you meant push a change to nova/devstack21:40
mriedemi heard fungi takes a devstack change manually and runs it on his own hardware :)21:40
*** henriqueof has joined #openstack-nova21:40
sean-k-mooneythe FN special labels im using have their own pool and if you put them in the experimental queue they run almost right away21:40
melwittlet's cheat21:40
* melwitt jokes21:41
fungimriedem: i take devstack changes and manually rub them on my feet. it's better than a pedicure21:41
mriedemi was not expecting that response21:42
melwittthat is some fungi CI indeed21:42
mriedemnow i just need to look up the magic incantation to configure a flavor with a required trait21:42
fungii no longer have a basement full of server racks21:42
sean-k-mooneymelwitt: i mean if we want to reproduce quickly we could. also im meant to be writing a grenade job....21:42
funginor, you know, a basement21:42
sean-k-mooneyefried: stephenfin so what patches are where currently21:43
sean-k-mooneyvpmem is pending21:43
efriedmriedem: extra spec trait:YOUR_TRAIT_HERE='required' ?21:43
sean-k-mooneyand stephenfin you have a version of pcpu on that21:43
mriedemefried: WRONG21:43
mriedemtrait:<trait>=required21:44
mriedemquotes would kill you21:44
melwittsean-k-mooney: I only vaguely understand what y'all were talking about earlier. was just saying funny [to me] unhelpful things21:44
efriedmriedem: I was being syntax-y21:44
stephenfinsean-k-mooney: I do, yeah. Just running tests locally before I push it up21:44
efriedif you're doing it on an osc cli, the quotes will go away21:44
efrieds/n osc//21:44
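The extra spec format mriedem corrects above, trait:<trait>=required with no quotes in the stored value, can be sketched in a few lines of Python. This is only an illustration: the helper name required_trait_extra_specs is hypothetical and is not part of nova or any OpenStack client library.

```python
def required_trait_extra_specs(traits):
    """Build flavor extra specs of the form trait:<trait>=required.

    Hypothetical helper for illustration only. The value is the bare
    word "required"; no quote characters are stored in it.
    """
    return {f"trait:{name}": "required" for name in traits}


specs = required_trait_extra_specs(["COMPUTE_SUPPORTS_MULTIATTACH"])
for key, value in specs.items():
    # Prints one key=value pair per requested trait.
    print(f"{key}={value}")
```

As efried notes, quoting on a command line is a shell matter; the shell strips the quotes before the value is stored.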
mriedemhttps://review.opendev.org/68193821:45
sean-k-mooneymelwitt: :) i mean if one queue is slow and the other is fast sometimes cheating in the gate is for the greater good21:45
efriedsean-k-mooney: keeping in mind that shoving stuff in the queue from another project will push all the nova stuff down a slot too.21:46
sean-k-mooneyya i know that is why os-vif is generally way quicker to test stuff in than nova21:46
fungiyep. the change scheduling round-robin allocates nodes to changes by project, so the more changes there are for a given project requesting resources, the longer the later ones will wait for a turn21:46
mriedemhttps://bugs.launchpad.net/nova/+bug/1843836 tagged for train-rc-potential so we can start using that tag21:46
openstackLaunchpad bug 1843836 in OpenStack Compute (nova) "Failure to schedule if flavor contains non-CPU flag traits" [Undecided,In progress] - Assigned to Stephen Finucane (stephenfinucane)21:47
sean-k-mooneyfungi: isnt that there to stop all gate capacity being eaten by tripleo21:47
mriedemsean-k-mooney: and nova and neutron21:47
fungithat was a big part of it, but yes also nova and neutron ;)21:47
mriedemhttps://bugs.launchpad.net/nova/+bugs?field.tag=train-rc-potential21:48
sean-k-mooneyyes but i think tripleo still uses like the equivalent of nova and neutron combined21:48
sean-k-mooneyanyway it makes it fairer for everyone else21:48
fungiunder the old first-come-first-served scheduling, if a project has someone push a 30-change series in one shot then those all got priority and one-off changes for other projects got to wait until that entire series got the requested resources21:49
* mriedem does shifty eyes21:49
*** markvoelker has joined #openstack-nova21:50
sean-k-mooneywe never have 30+ long series in nova21:50
efriednever21:50
* efried reboots21:50
* fungi chuckles21:50
*** efried has quit IRC21:50
sean-k-mooneyonly ever have 3 of them together21:50
sean-k-mooneyits never just 121:51
sean-k-mooneyalso our downstream ci is super dumb and really does not like it when that is done to it21:51
sean-k-mooneyour downstream ci does not first apply the previous patch before the current one21:52
*** efried has joined #openstack-nova21:52
fungii'll be the first to admit that the round-robin job scheduler algorithm is still painful, but in general being backlogged on builds is going to be painful for someone regardless21:52
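The difference between the old first-come-first-served ordering and the per-project round-robin fungi describes above can be sketched as a toy model. This is not Zuul's actual scheduler; the function names and the example queue are made up for illustration.

```python
from collections import OrderedDict, deque


def fifo_order(changes):
    """First come, first served: nodes go to changes in arrival order."""
    return list(changes)


def round_robin_order(changes):
    """Take one change per project in turn until every queue is empty."""
    queues = OrderedDict()
    for project, change in changes:
        queues.setdefault(project, deque()).append((project, change))
    order = []
    while queues:
        # Iterate over a copy of the keys so empty queues can be dropped.
        for project in list(queues):
            order.append(queues[project].popleft())
            if not queues[project]:
                del queues[project]
    return order


# Hypothetical queue: a four-change nova series pushed just before a
# single os-vif change.
changes = [("nova", f"n{i}") for i in range(1, 5)] + [("os-vif", "v1")]
print(fifo_order(changes))
print(round_robin_order(changes))
```

Under FIFO the one-off change waits behind the whole series; under round-robin it is served second, and the later changes of the long series wait longer instead.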
sean-k-mooneyfungi: yes but at least zuul makes series actually work21:53
sean-k-mooneyjenkins on the other hand...21:53
fungiif people have ideas for less painful scheduling algorithms/rules we can consider those too21:53
sean-k-mooneythe only one i have come up with that i thought was better we cant do.21:54
fungistill trying to figure out if some of the packet scheduling algorithms used for adaptive rate limiting in the network space could be applied to job scheduling21:54
sean-k-mooneywhich is split check into fast-check and check21:54
sean-k-mooneyand stick the tox jobs and docs in fast check21:55
fungiwe already have an analog of qos in place by prioritizing different pipelines21:55
fungiand perform windowing and exponential backoff on failure in dependent pipelines21:55
fungiso it's not out of the realm of possibility that networking ideas have still more we can steal from21:56
sean-k-mooneyfungi: just dont follow tcp's window algorithm21:56
sean-k-mooneybut ya21:56
fungiwell, yeah, it's not the exact same algorithm, just the basic idea21:56
fungibut basically we only allocate resources to a subset of changes in dependent pipelines like the gate, and then increase or decrease that window based on how many changes pass or fail21:57
sean-k-mooneythere are some interesting algorithms in the cache/task execution domain too21:57
fungiso that even though the gate pipeline has priority over the check pipeline, a perpetually failing gate load won't monopolize all available resources and leave check starved entirely21:58
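The windowing fungi describes, growing the set of gate changes that get resources when changes pass and backing off exponentially when they fail, can be sketched as below. The linear increase, the starting window of 20, the floor of 3 and the halving factor are assumptions chosen for illustration, not values confirmed in this discussion.

```python
def next_window(window, passed, floor=3, increase=1, decrease_factor=2):
    """Return the new window size after one change passes or fails.

    Assumed policy: grow by a fixed step on success, divide by a factor
    on failure, and never shrink below the floor.
    """
    if passed:
        return window + increase
    return max(floor, window // decrease_factor)


window = 20  # assumed starting size
for outcome in [False, False, True, True]:
    window = next_window(window, outcome)
    print(outcome, window)
```

With these assumed parameters two failures take the window from 20 to 5, and it then recovers one slot per passing change, so a perpetually failing gate queue keeps only a small share of nodes.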
sean-k-mooneywell to get to gate you have to go through check21:58
sean-k-mooneyso that should not happen anyway21:58
sean-k-mooneyit would be self regulating21:58
*** mlavalle has joined #openstack-nova21:59
sean-k-mooneythis would probably be a better conversation to have in the infra or zuul channel21:59
fungiexcept dependent pipelines can eat orders of magnitude more resources due to gate resets from failures near the front of the queue21:59
efriedfungi: would be neat to have a fast-fail toggle... somewhere, somehow.21:59
sean-k-mooneyefried: its a conflict between fail fast and report as much info as possible22:00
efriedyeah, I know22:00
sean-k-mooneyfungi: how hard would it be to have each job report independently22:00
efriedbut if I know my patch *should* pass, I turn that toggle on, so if e.g. py27 blows up spuriously, the whole thing gets kicked out right away freeing up the rest.22:00
fungiit might be that fast-fail would be more appropriate in a dependent pipeline if its changes are filtered through an independent pipeline first like happens with the openstack tenant22:01
sean-k-mooneyefried: ya if a voting job fails it would be cool if it did not start any other job in the job set but continued running the ones that are running22:01
openstackgerritEric Fried proposed openstack/nova master: DB API changes to get non-matching aggregates from metadata  https://review.opendev.org/67107422:01
openstackgerritEric Fried proposed openstack/nova master: Add a new request filter to isolate aggregates  https://review.opendev.org/67107522:01
openstackgerritEric Fried proposed openstack/nova master: Docs for isolated aggregates request filter  https://review.opendev.org/66795222:01
fungigiven the expectation is that if the change has made it through check then it normally shouldn't fail in the gate either, so as soon as one failure is encountered it can just be reported and ejected22:02
efriedlike that one --^22:02
sean-k-mooneyfungi: isnt there a way to make jobs depend on other jobs in the same pipeline22:02
fungithere is, but then you delay starting them until the others finish which increases time to report on success22:02
sean-k-mooneyyes but if you only delay it by the 5-10 minutes of the tox jobs22:03
sean-k-mooneythat might be worth it22:03
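The trade-off discussed above, running every job at once versus holding the slow jobs until the quick tox jobs pass, can be put into rough numbers. This is a back-of-the-envelope model only: the job names and durations are invented, and each job is assumed to use one node for its whole run.

```python
def run_all_at_once(fast, slow):
    """Start every job immediately; report when the slowest one ends."""
    jobs = fast + slow
    time_to_report = max(minutes for _, minutes, _ in jobs)
    node_minutes = sum(minutes for _, minutes, _ in jobs)
    return time_to_report, node_minutes


def run_staged(fast, slow):
    """Start the slow jobs only if every fast job passed."""
    fast_time = max(minutes for _, minutes, _ in fast)
    node_minutes = sum(minutes for _, minutes, _ in fast)
    if not all(ok for _, _, ok in fast):
        return fast_time, node_minutes  # fail early, slow jobs never run
    slow_time = max(minutes for _, minutes, _ in slow)
    return fast_time + slow_time, node_minutes + sum(m for _, m, _ in slow)


# Hypothetical jobs as (name, minutes, passed) tuples.
fast = [("pep8", 5, True), ("py27", 10, False)]
slow = [("tempest-full", 90, True), ("grenade", 80, True)]
print(run_all_at_once(fast, slow))
print(run_staged(fast, slow))
```

In this made-up case staging turns a failing run from 185 node-minutes into 15, while a passing run would report about 10 minutes later, which is the delay fungi points out.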
sean-k-mooneyfungi: its too bad we cant simulate this22:04
fungisure, i expect that depends on the project. ultimately though i think there's a lot more throughput to be gained by finding and fixing the bugs that lead to nondeterministic test results than in optimizing for failure22:04
*** weshay is now known as weshay|ruck22:04
sean-k-mooneywell i hope we are trying to do both22:04
fungisure, but the resources consumed by build failures far outweigh the meager gains from shuffling them around in the available resource pool22:06
stephenfinefried: Alrighty, functional, unit and pep8 tests passing so I'm going to push up this rebase of PCPU onto vPMEM. Can you spin through it and check out the conflicts when I do?22:07
* mriedem looks back at irc22:08
mriedemfungi: i apologize for rousing you into this channel and discussion22:08
mriedems/rousing/conjuring/22:08
*** tbachman_ has quit IRC22:09
fungimriedem: no apologies needed. i'm just sorry if i derailed otherwise stimulating discussion in here ;)22:09
fungii do still recommend devstack changes on the feet though... so soothing22:09
efriedstephenfin: will do.22:09
efriedstephenfin: actually, hold on22:10
sean-k-mooneyfungi: the scheduling problem for gate jobs has a lot of similar challenges to the nova scheduler problem for vms but a lot of differences too22:10
stephenfinholding22:10
efriedstephenfin: cpu-resources #1 just passed check and entered gate22:10
mriedemfungi: before you showed up and professionaled it up we were talking about artom's beautiful brown eyes22:10
efriedstephenfin: but #2 is failing check22:10
stephenfinefried: where are you seeing this?22:11
mriedemsean-k-mooney: i already said we should use zuul as a nova scheduler driver in -infra months ago22:11
mriedemdibs on that idea22:11
efriedstephenfin: so if possible, just rebase #2+ onto vpmem22:11
sean-k-mooneyhehe its all yours22:11
efriedI... think that should be possible.22:11
sean-k-mooneymriedem: add ai or ml to it and you have yourself a startup22:11
stephenfinefried: I don't think I can straddle another branch like that22:12
stephenfinI'd need to pull in vpmem between #1 and #222:12
stephenfinwhich means pulling that out of the queue22:12
efriedyeah, boo22:12
sean-k-mooneyeven when its a pain im still glad we have the ci that zuul/infra provides. can you imagine merging all this stuff by hand and testing it like the kernel does22:14
* alex_xu probably can continue sleep22:14
*** hemna has joined #openstack-nova22:15
efriedalex_xu: Yes, please do, I don't think there's anything you need to do here. Thanks for all the work.22:15
alex_xuefried: stephenfin sean-k-mooney thank you all :)22:15
sean-k-mooneyalex_xu: night o/22:16
efriedstephenfin: I guess the choices are 1) wait for #1 to merge -- gate queue looks to be <2h -- then rebase the remainder on vpmem; or 2) rebase all right now and lose the +V on #122:16
stephenfinI'm just going to rebase22:16
stephenfinit's one patch22:16
stephenfinif I don't, we'll need to rebase vpmem22:16
stephenfinto pick up the newly merged patch from master22:17
efriednot sure about that. But.. okay.22:17
sean-k-mooneybecause of a conflict?22:17
efriedno, because 2+ relies on 122:17
stephenfin^ that22:17
sean-k-mooneyoh ok ya22:17
sean-k-mooneythats our green check policy22:18
stephenfinmaybe the gate will rebase for me. idk22:18
sean-k-mooneythe gate will merge in some case but never rebase22:18
stephenfindoesn't seem worth worrying about though. it's easy to hit recheck all weekend long22:18
*** eharney has quit IRC22:19
*** tbachman has joined #openstack-nova22:19
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement  https://review.opendev.org/67179322:19
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available'  https://review.opendev.org/67269322:19
openstackgerritStephen Finucane proposed openstack/nova master: objects: Add 'InstanceNUMATopology.cpu_pinning' property  https://review.opendev.org/68010622:19
openstackgerritStephen Finucane proposed openstack/nova master: Validate CPU config options against running instances  https://review.opendev.org/68010722:19
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Use sane indent  https://review.opendev.org/68022922:19
openstackgerritStephen Finucane proposed openstack/nova master: objects: Add 'NUMACell.pcpuset' field  https://review.opendev.org/68010822:19
openstackgerritStephen Finucane proposed openstack/nova master: hardware: Differentiate between shared and dedicated CPUs  https://review.opendev.org/67180022:19
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Start reporting 'HW_CPU_HYPERTHREADING' trait  https://review.opendev.org/67557122:19
openstackgerritStephen Finucane proposed openstack/nova master: tests: Additional functional tests for pinned instances  https://review.opendev.org/68175022:19
openstackgerritStephen Finucane proposed openstack/nova master: Include both VCPU and PCPU in core quota count  https://review.opendev.org/68137422:19
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180122:19
openstackgerritStephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique  https://review.opendev.org/68106022:19
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly  https://review.opendev.org/68106122:19
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489522:19
stephenfinoh thank God22:19
*** rcernin has joined #openstack-nova22:19
stephenfinI was terrified I'd accidentally rebase vpmem :-D22:19
sean-k-mooneyok so now in theory i can create a job that depends on pcpu+vpmem and numa LM right in my testing changes22:20
sean-k-mooneysince artom rebased LM there should be no conflict anymore right22:21
stephenfincorrect22:21
sean-k-mooneyim going to check that locally.22:21
sean-k-mooneyalso i really like git at times like this22:22
efriedstephenfin: that's weird, how did https://review.opendev.org/#/c/671793/ end up with a -2 from me on it? Usually that only happens if you wound up identical to an earlier PS that was -222:23
sean-k-mooneyya i can merge https://review.opendev.org/674895 with https://review.opendev.org/#/c/64002122:24
stephenfinLooks like it was one of your procedural -2s22:24
stephenfinI suspect git thinks the diff is the same22:24
stephenfinas an earlier PS22:24
sean-k-mooneyya if gerrit detects there is no change via a rebase it keeps +/- 1/222:25
sean-k-mooneythat stops people just rebasing to clear negative reviews22:25
stephenfinI wish Gerrit had some way to view just the diff between two versions side-by-side without all the additional context introduced in between22:25
*** avolkov has quit IRC22:25
stephenfinAs it is, I've to open different versions of the review in different tabs and toggle between them22:26
sean-k-mooneythats a 4 way diff and i think the new ui can do it22:26
stephenfino rly?22:26
*** BjoernT has quit IRC22:26
sean-k-mooneyapparently although i have not seen it in practice22:26
sean-k-mooneyim sure we could still break it22:26
melwittgit review -m <review num>,PSold-PSnew will do it, fwiw22:27
sean-k-mooneyyou create it by taking the original patch against base and the updated patch against new base then diff the two diffs22:27
melwitthttps://docs.openstack.org/infra/git-review/usage.html22:28
sean-k-mooneyi dont know if that ignores changes introduced by rebasing22:29
stephenfinit does not :(22:29
sean-k-mooneybut the process i describe should do that22:30
melwittit did for me last time I tried it a few days ago22:30
stephenfinat least if I do 'git review -m 671793,24-25' I see all the vpmem stuff introduced22:30
stephenfinmaybe I've an old version of git-review22:30
* melwitt tries with your example22:31
sean-k-mooneymelwitt: it will do it if you dont rebase between 4 and 1022:31
melwittI mean I did a diff after a rebase between22:31
stephenfinsean-k-mooney: yeah, 'git format-patch HEAD~N > (old|new); diff -r old/ new/' would do the trick22:31
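The "diff the two diffs" idea sean-k-mooney and stephenfin describe above can be sketched with Python's standard difflib. This is a rough heuristic and not what Gerrit or git-review do: it keeps only the added and removed lines of each patch, so context lines and hunk offsets that shift during a rebase drop out, and then diffs those two lists. The function names are hypothetical.

```python
import difflib


def changed_lines(patch_text):
    """Keep only the '+' and '-' content lines of a unified diff.

    Heuristic: any line starting with '+++' or '---' is treated as a file
    header, so a changed line whose own text begins with '--' or '++'
    would be dropped as well.
    """
    kept = []
    for line in patch_text.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            kept.append(line)
    return kept


def interdiff(old_patch, new_patch):
    """Diff the changed lines of two versions of the same patch."""
    lines = list(difflib.unified_diff(
        changed_lines(old_patch), changed_lines(new_patch),
        fromfile="old", tofile="new", lineterm="", n=0))
    # Drop the two file header lines and the '@@' hunk headers.
    return [line for line in lines[2:] if not line.startswith("@@")]
```

Each returned line carries two markers: the outer one says whether the line appears only in the newer patch set ('+') or only in the older one ('-'), and the inner one is the line's original marker inside its patch.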
stephenfinmelwitt: I imagine the difference could be that that rebase didn't introduce changes to the files you were touching22:32
stephenfinbut that's a complete guess22:32
*** markvoelker has quit IRC22:32
melwittit did, otherwise it wouldn't be useful right? I went to do it in git-review after seeing the diff in gerrit a mess with rebase stuff22:32
efriedstephenfin: looks like manual rebases on four of those? You're +A all the way.22:33
sean-k-mooneywell i just get an error when i run stephens example22:33
stephenfinefried: That's what I was seeing too22:34
stephenfinSweet. Anything else I need to do or is it home time?22:34
efriedI do what stephenfin does - open two tabs and toggle.22:34
melwittbut anyway, I'm getting the same result as you with your example :\ I don't get why this acts inconsistent depending on the change22:35
efriedand yes, rumor has it the newer gerrit has a 4-way. aspiers is always wishing misty-eyed for it.22:35
sean-k-mooneyit is very pretty22:35
efriedmaybe fungi is getting that set up for us once he gets his shoes back on22:35
fungino shoes needed with new gerrit22:36
* sean-k-mooney is impressed by how flavor less this beer tastes...22:37
fungiit's like the plushest 1970s era orange shag carpet22:37
* fungi has no clue what a 4-way diff is. wondering if he needs extra-dimensional space to render it, and what sort of monitor that entails22:38
* sean-k-mooney apparently it has ingredients but otherwise heineken light really does taste like water22:38
sean-k-mooneyfungi: its a diff that only shows you what changed between two patch sets ignoring anything that came from a rebase22:39
melwittthere's a heineken light? eesh. as if normal heineken wasn't bad enough22:39
fungiyeah, i'm probably a beer snob but i wouldn't touch heineken anything with a 3m pole22:39
*** lbragstad has joined #openstack-nova22:40
sean-k-mooneyfungi: normally i only drink stouts or porters if im drinking beer22:40
fungi(currently enjoying o'connor brewing's "el guapo" agave ipa)22:40
sean-k-mooneybut this was all i had in the fridge22:40
melwittI'm obsessed with hazy ipas lately22:40
melwittin colder weather I go for stouts and porters22:41
fungithose can indeed be tasty22:41
melwittwith the exception of north coast's old no 38, I will drink that anytime22:42
fungii do prefer drier varieties like pale ales, but especially when doing yardwork in the summer22:42
*** slaweq has joined #openstack-nova22:42
sean-k-mooneyhot summer days are cider weather22:43
stephenfinsean-k-mooney: Spoken like a true Tipp wan22:45
*** tbachman has quit IRC22:45
sean-k-mooneyof course i was born in the town where bulmers/magners cider is from22:46
*** markvoelker has joined #openstack-nova22:46
sean-k-mooneyalthough i prefer dry cider in general22:46
*** slaweq has quit IRC22:48
openstackgerritMatt Riedemann proposed openstack/nova master: Cleanup reno live-migration-with-PCI-device  https://review.opendev.org/68194222:49
mriedemhmm, i was going to avoid mentioning things in the cycle highlights about deprecating the xenapi driver, consoleauth and cells v1, but wondering if i should mention that the placement-in-nova code was removed22:50
*** markvoelker has quit IRC22:51
mriedemi've avoided mentioning things like that as "highlights"22:51
sean-k-mooneyif you mentioned it as removed you should mention cells v1 as removed22:52
sean-k-mooneydeprecation of xen is more a lowlight22:52
mriedemcells v1 was optional and not many used it,22:52
mriedemplacement-in-nova was not optional22:52
*** ivve has quit IRC22:52
mriedemnor was consoleauth really22:52
mriedemanyway, from a marketing perspective those aren't highlights22:52
efriedmriedem: if you'd rather put a positive spin, focus on "extracted placement is required"22:52
sean-k-mooneywas the removal of placement done with a release note22:53
mriedemthat's not really marketing22:53
stephenfindoes the prelude form part of the marketing?22:53
mriedemsean-k-mooney: yes, i'm scanning the release notes for highlights22:53
mriedemstephenfin: no,22:53
mriedemthat's why i think ^ is all good prelude stuff b/c it's big,22:53
mriedembut it's not marketing shit that sales people would understand22:53
melwittyeah, I was gonna say, I agree none of those are highlights but they are prelude material22:53
stephenfinI agreed22:53
stephenfin*agree22:53
mriedemif i'm selling train, why do i care if nova uses placement in nova or external22:53
sean-k-mooneymriedem: i expect the placement release note will mark its graduation to a separate service as a highlight22:53
efriedsales schmales, this is open source beyotch22:53
stephenfinefried: Not according to cdent :-D22:54
efriedmriedem: because it's way faster?22:54
cdentaw man, don't invoke me for that22:54
stephenfinsorry, cdent22:54
melwittsean-k-mooney: well, I think we already used "you can run with extracted placement" as a highlight last cycle22:54
cdentfor cycle highlights we just said that22:55
melwittre: graduation to separate service22:55
*** macz has quit IRC22:55
cdentwhoops22:55
cdentwe said "extracted placement must be used"22:55
efriedokay, I gotta go teach people how to beat each other up. No rest for the wicked and all that. Thanks all for the long hours and general ass-kicking today.22:55
cdentand then talked about nfv22:55
cdentbecause it's, like, the rage22:55
*** hemna has quit IRC22:56
sean-k-mooneycdent: ye had some pretty big features this release22:56
cdentsean-k-mooney: I'm only being half serious, see: https://review.opendev.org/68119722:56
sean-k-mooneycdent: nfv does correlate with rage however22:57
*** tkajinam has joined #openstack-nova23:03
mriedemefried: et al nova cycle highlights https://review.opendev.org/#/c/681943/123:05
mriedemif people think that should say something about sev please propose words23:05
mriedemsean-k-mooney: sriov live migration was not driver specific right?23:06
sean-k-mooneyit only works for libvirt23:06
mriedemf23:06
mriedemof course23:06
sean-k-mooneyyou said that in the highlight23:07
mriedemi didn't say it was specific to libvirt,23:07
sean-k-mooneyoh that was numa23:07
mriedemlike the numa one23:07
mriedemok i'll leave it wip until tomorrow and update it23:08
mriedemi need to family now23:08
sean-k-mooneyo/23:08
*** mriedem is now known as mriedem_afk23:08
sean-k-mooneydoes anyone know what grenade_devstack_localrc does https://review.opendev.org/#/c/548936/102/.zuul.yaml23:09
sean-k-mooneyis there a way in grenade to set different local.conf options in different releases?23:10
sean-k-mooneyah there is23:11
sean-k-mooneymaybe not...23:13
stephenfinefried: Would you mind if we pulled this out of the queue so I can do some surgery on it tomorrow? https://review.opendev.org/#/c/678470/23:13
stephenfinIt's test only so we can merge after FF, if artom's comments on https://review.opendev.org/#/c/640021/ are to be believed23:13
sean-k-mooneydansmith: i think i have hit a blocker on the grenade job23:23
sean-k-mooneyhttps://review.opendev.org/#/c/548936/102/.zuul.yaml@6923:23
sean-k-mooneyi have it mostly done but i dont know how to change the config on upgrade23:23
sean-k-mooneyi think i might need a grenade plugin to do that or an extension to the grenade job23:24
*** awalende has joined #openstack-nova23:24
*** awalende has quit IRC23:28
*** cdent has quit IRC23:31
*** icarusfactor has quit IRC23:32
*** icarusfactor has joined #openstack-nova23:33
*** igordc has quit IRC23:34
*** factor has joined #openstack-nova23:37
*** icarusfactor has quit IRC23:37
*** TxGirlGeek has quit IRC23:38
*** JamesBenson has joined #openstack-nova23:40
*** JamesBenson has quit IRC23:44
artomstephenfin, that was my understanding, yeah23:51
artomI mean, even dansmith was telling me he'd look at it next week23:51
artomTo me that's an implicit "test-only patches can wait until after FF"23:51
sean-k-mooneyspeaking of test only patches my uber test of doom https://review.opendev.org/#/c/681771/2 is pending in the experimental queue. that will test PMEM PCPUS and numa migration23:53
sean-k-mooneyits currently getting ready to stack23:53
sean-k-mooneyhttp://zuul.openstack.org/stream/6a67b28d6e9e4b739e7575df7943a69d?logfile=console.log23:53
sean-k-mooneybut im going to go to sleep now23:53
sean-k-mooneyif nothing else merges but that 31 patch chain then there are no conflicts between those 3 feature branches23:54
sean-k-mooneyso lets see how that looks in the morning23:54
