Thursday, 2019-04-11

*** owalsh has joined #openstack-nova00:00
*** Sundar has joined #openstack-nova00:06
*** liuyulong has quit IRC00:06
*** tetsuro has joined #openstack-nova00:12
*** threestrands has joined #openstack-nova00:14
*** rcernin has joined #openstack-nova00:22
*** bbowen__ has quit IRC00:24
*** betherly has joined #openstack-nova00:24
*** betherly has quit IRC00:28
*** Sundar has quit IRC00:28
*** rchurch_ has joined #openstack-nova00:30
*** mchlumsky_ has joined #openstack-nova00:30
*** tjgresha_nope has joined #openstack-nova00:31
*** tetsuro_ has joined #openstack-nova00:31
*** rchurch has quit IRC00:32
*** mchlumsky has quit IRC00:32
*** dpawlik has quit IRC00:32
*** tetsuro has quit IRC00:32
*** tjgresha has quit IRC00:32
*** Vek has quit IRC00:32
*** kukacz_ has quit IRC00:32
*** dpawlik has joined #openstack-nova00:32
*** kukacz has joined #openstack-nova00:33
openstackgerritDakshina Ilangovan proposed openstack/nova-specs master: Resource Management Daemon - Base Enablement  https://review.openstack.org/65113000:34
*** mgoddard has quit IRC00:34
*** sambetts_ has quit IRC00:35
*** mgoddard has joined #openstack-nova00:37
*** sambetts_ has joined #openstack-nova00:37
*** tbachman has quit IRC00:43
*** tbachman has joined #openstack-nova00:46
*** wolverineav has joined #openstack-nova00:52
*** wolverineav has quit IRC00:54
*** wolverineav has joined #openstack-nova00:55
*** rcernin has quit IRC00:59
*** wolverineav has quit IRC01:00
*** bbowen__ has joined #openstack-nova01:08
*** rcernin has joined #openstack-nova01:09
*** gyee has quit IRC01:10
*** betherly has joined #openstack-nova01:33
*** lbragstad has quit IRC01:37
*** betherly has quit IRC01:37
openstackgerritya.wang proposed openstack/nova-specs master: Expose auto converge and post copy  https://review.openstack.org/65168101:39
*** betherly has joined #openstack-nova01:54
*** betherly has quit IRC01:58
*** bhagyashris has joined #openstack-nova02:01
*** dave-mccowan has quit IRC02:01
*** zhongjun2_ has joined #openstack-nova02:05
*** zhongjun2_ has quit IRC02:07
*** zhongjun2_ has joined #openstack-nova02:07
*** zhongjun2_ is now known as zhongjun202:11
*** awaugama has quit IRC02:13
*** betherly has joined #openstack-nova02:15
*** betherly has quit IRC02:19
*** wolverineav has joined #openstack-nova02:24
*** wolverineav has quit IRC02:41
gmannsean-k-mooney: artom : we are good on 648123  from microversion point of view. It is not breaking any API contract. Commented inline. https://review.openstack.org/#/c/648123/402:42
gmanntempest schema test is passing, which makes sure the API contract is verified02:42
*** betherly has joined #openstack-nova02:46
*** igordc has quit IRC02:47
*** betherly has quit IRC02:50
*** bhagyashris has quit IRC02:56
*** nicolasbock has quit IRC03:03
*** hongbin has joined #openstack-nova03:03
*** cfriesen has quit IRC03:07
*** psachin has joined #openstack-nova03:09
openstackgerritBoxiang Zhu proposed openstack/nova-specs master: Add host and hypervisor_hostname flag to create server  https://review.openstack.org/64545803:11
*** wolverineav has joined #openstack-nova03:13
*** betherly has joined #openstack-nova03:17
*** betherly has quit IRC03:22
*** wolverineav has quit IRC03:43
*** betherly has joined #openstack-nova03:48
*** betherly has quit IRC03:53
*** whoami-rajat has joined #openstack-nova03:54
openstackgerritMerged openstack/nova master: Add test coverage for nova.privsep.libvirt.  https://review.openstack.org/64861603:54
*** Kevin_Zheng has joined #openstack-nova03:57
*** markvoelker has joined #openstack-nova04:02
*** imacdonn_ has quit IRC04:03
*** imacdonn_ has joined #openstack-nova04:04
*** hongbin has quit IRC04:05
openstackgerritMerged openstack/nova master: Add test coverage for nova.privsep.qemu.  https://review.openstack.org/64919104:06
*** ricolin has joined #openstack-nova04:10
openstackgerritMerged openstack/nova stable/stein: Do not persist RequestSpec.ignore_hosts  https://review.openstack.org/64932004:16
*** betherly has joined #openstack-nova04:30
*** betherly has quit IRC04:34
*** markvoelker has quit IRC04:36
*** chhagarw has joined #openstack-nova04:38
*** ratailor has joined #openstack-nova04:59
*** betherly has joined #openstack-nova05:01
*** betherly has quit IRC05:06
*** sidx64 has joined #openstack-nova05:12
*** rambo_li has joined #openstack-nova05:13
*** sidx64 has quit IRC05:26
*** bhagyashris has joined #openstack-nova05:31
*** betherly has joined #openstack-nova05:32
*** markvoelker has joined #openstack-nova05:33
*** betherly has quit IRC05:37
*** sidx64 has joined #openstack-nova05:38
*** Luzi has joined #openstack-nova05:41
*** wolverineav has joined #openstack-nova05:44
*** awalende has joined #openstack-nova05:48
*** wolverineav has quit IRC05:48
*** jaypipes has quit IRC05:50
*** jaypipes has joined #openstack-nova05:50
*** betherly has joined #openstack-nova05:53
*** awalende has quit IRC05:53
*** awalende has joined #openstack-nova05:54
*** gouthamr has quit IRC05:56
*** betherly has quit IRC05:57
*** gouthamr has joined #openstack-nova06:00
*** udesale has joined #openstack-nova06:03
*** markvoelker has quit IRC06:06
*** tetsuro_ has quit IRC06:09
*** tetsuro has joined #openstack-nova06:09
*** chhagarw has quit IRC06:13
*** whoami-rajat has quit IRC06:13
*** chhagarw has joined #openstack-nova06:14
*** sridharg has joined #openstack-nova06:21
*** sidx64 has quit IRC06:23
*** awalende has quit IRC06:24
*** whoami-rajat has joined #openstack-nova06:25
*** sidx64 has joined #openstack-nova06:27
*** awalende has joined #openstack-nova06:33
*** chhagarw has quit IRC06:39
*** chhagarw has joined #openstack-nova06:39
*** phasespace has quit IRC06:40
*** betherly has joined #openstack-nova06:45
*** betherly has quit IRC06:49
*** obre has quit IRC06:52
*** obre has joined #openstack-nova06:52
*** ivve has joined #openstack-nova06:52
*** ccamacho has joined #openstack-nova07:00
*** markvoelker has joined #openstack-nova07:03
*** boxiang has quit IRC07:03
*** slaweq has joined #openstack-nova07:03
*** betherly has joined #openstack-nova07:05
*** rpittau|afk is now known as rpittau07:06
*** luksky has joined #openstack-nova07:07
*** betherly has quit IRC07:11
*** awalende has quit IRC07:13
*** betherly has joined #openstack-nova07:26
*** mvkr has quit IRC07:27
*** threestrands has quit IRC07:30
*** betherly has quit IRC07:31
*** markvoelker has quit IRC07:36
*** tosky has joined #openstack-nova07:39
*** tssurya has joined #openstack-nova07:40
*** ralonsoh has joined #openstack-nova07:42
*** ralonsoh has quit IRC07:42
*** ralonsoh has joined #openstack-nova07:43
*** betherly has joined #openstack-nova07:45
*** maciejjozefczyk has left #openstack-nova07:49
*** betherly has quit IRC07:50
*** brinzhang has joined #openstack-nova07:55
*** phasespace has joined #openstack-nova07:56
*** ttsiouts has joined #openstack-nova08:04
*** betherly has joined #openstack-nova08:06
*** betherly has quit IRC08:10
*** maciejjozefczyk has joined #openstack-nova08:10
*** ttsiouts has quit IRC08:15
*** ttsiouts has joined #openstack-nova08:16
*** betherly has joined #openstack-nova08:19
*** owalsh has quit IRC08:19
*** ttsiouts has quit IRC08:20
*** ttsiouts has joined #openstack-nova08:22
*** priteau has joined #openstack-nova08:23
*** betherly has quit IRC08:23
*** davidsha has joined #openstack-nova08:29
*** mdbooth has quit IRC08:30
*** markvoelker has joined #openstack-nova08:34
*** owalsh has joined #openstack-nova08:34
*** derekh has joined #openstack-nova08:34
*** dtantsur|afk is now known as dtantsur08:35
*** betherly has joined #openstack-nova08:39
*** betherly has quit IRC08:44
*** tkajinam has quit IRC08:58
*** mdbooth has joined #openstack-nova08:58
*** wolverineav has joined #openstack-nova09:00
*** cdent has joined #openstack-nova09:00
*** sidx64 has quit IRC09:02
openstackgerritTheodoros Tsioutsias proposed openstack/nova-specs master: Add PENDING vm state  https://review.openstack.org/64868709:02
*** wolverineav has quit IRC09:05
*** sidx64 has joined #openstack-nova09:05
*** markvoelker has quit IRC09:07
*** sidx64 has quit IRC09:12
*** mvkr has joined #openstack-nova09:30
*** sidx64 has joined #openstack-nova09:34
*** rambo_li has quit IRC09:35
*** betherly has joined #openstack-nova09:41
*** luksky has quit IRC09:42
*** ttsiouts has quit IRC09:44
*** ttsiouts has joined #openstack-nova09:44
*** betherly has quit IRC09:46
*** ttsiouts has quit IRC09:49
*** bhagyashris has quit IRC09:50
*** sidx64 has quit IRC09:52
*** boxiang has joined #openstack-nova09:54
*** markvoelker has joined #openstack-nova10:04
*** ratailor_ has joined #openstack-nova10:08
*** ratailor has quit IRC10:11
*** luksky has joined #openstack-nova10:18
openstackgerritChris Dent proposed openstack/nova master: Use update_provider_tree in vmware virt driver  https://review.openstack.org/65161510:20
*** sidx64 has joined #openstack-nova10:21
*** bbowen__ has quit IRC10:23
*** lpetrut has joined #openstack-nova10:27
openstackgerritChris Dent proposed openstack/nova master: Delete the placement code  https://review.openstack.org/61821510:29
*** mvkr has quit IRC10:34
*** markvoelker has quit IRC10:36
openstackgerritMerged openstack/nova-specs master: Spec: Use in_tree getting allocation candidates  https://review.openstack.org/64602910:39
*** ratailor__ has joined #openstack-nova10:45
*** nicolasbock has joined #openstack-nova10:45
*** sapd1_x has joined #openstack-nova10:47
*** ratailor_ has quit IRC10:48
*** mvkr has joined #openstack-nova10:48
*** tbachman has quit IRC10:54
*** udesale has quit IRC10:57
*** francoisp_ has quit IRC10:58
*** priteau has quit IRC10:59
*** mvkr has quit IRC11:04
*** mvkr has joined #openstack-nova11:05
*** sidx64 has quit IRC11:08
*** sidx64 has joined #openstack-nova11:11
*** ttsiouts has joined #openstack-nova11:13
*** sidx64 has quit IRC11:18
*** sidx64 has joined #openstack-nova11:20
*** bbowen has joined #openstack-nova11:33
*** ricolin has quit IRC11:37
*** dave-mccowan has joined #openstack-nova11:38
*** sapd1_x has quit IRC11:38
*** ttsiouts has quit IRC11:40
*** ttsiouts has joined #openstack-nova11:40
*** yan0s has joined #openstack-nova11:42
*** mvkr has quit IRC11:44
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Remove dead nova.db functions  https://review.openstack.org/64957011:46
stephenfingibi: Wanna send this cleanup patch on its way? https://review.openstack.org/#/c/650018/11:48
*** eharney has joined #openstack-nova11:49
*** nicolasbock has quit IRC11:53
*** mvkr has joined #openstack-nova11:56
gibistephenfin: done11:58
stephenfinta11:58
*** nicolasbock has joined #openstack-nova12:00
*** pchavva has joined #openstack-nova12:00
artomgmann, cheers! (if you're still awake/around)12:03
*** tbachman has joined #openstack-nova12:11
*** francoisp_ has joined #openstack-nova12:13
*** brinzhang has quit IRC12:23
*** rambo_li has joined #openstack-nova12:34
*** lbragstad has joined #openstack-nova12:36
cdentaspiers: have you seen http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004779.html ?12:39
aspierscdent: yes, it's on my TODO list :)12:39
aspiersstill jetlagged from SUSECON12:39
cdentroger, thanks12:39
aspiersI saw that efried felt OK with the status quo though12:40
openstackgerritTetsuro Nakamura proposed openstack/nova master: Add in_tree field to RequestGroup object  https://review.openstack.org/64953412:40
aspiersI'm inclined to agree that "don't do that" is a reasonable stance12:40
openstackgerritTetsuro Nakamura proposed openstack/nova master: Add get_compute_nodes_by_host_or_node()  https://review.openstack.org/65087712:40
openstackgerritTetsuro Nakamura proposed openstack/nova master: Pass target host to RequestGroup.in_tree  https://review.openstack.org/65087812:40
openstackgerritTetsuro Nakamura proposed openstack/nova master: Query `in_tree` to placement  https://review.openstack.org/64953512:40
sean-k-mooney /query cdent12:40
cdent?12:41
sean-k-mooneyi was going to pm you and had an extra space12:41
sean-k-mooney:)12:41
aspierslucky you noticed before you said something rude in public ;-)12:42
*** tetsuro has quit IRC12:46
*** ttsiouts has quit IRC12:50
*** ttsiouts has joined #openstack-nova12:51
*** ratailor__ has quit IRC12:55
*** ttsiouts has quit IRC12:55
*** ttsiouts has joined #openstack-nova12:57
*** udesale has joined #openstack-nova13:00
*** wolverineav has joined #openstack-nova13:01
*** wolverineav has quit IRC13:05
*** sidx64 has quit IRC13:07
*** rambo_li has quit IRC13:08
*** jmlowe has quit IRC13:08
*** rcernin has quit IRC13:18
*** pcaruana has quit IRC13:20
*** mlavalle has joined #openstack-nova13:24
*** mlavalle has quit IRC13:25
*** mlavalle has joined #openstack-nova13:27
*** tbachman has quit IRC13:28
*** mriedem has joined #openstack-nova13:28
*** psachin has quit IRC13:30
mnaseris the nova archive tool not hitting cell0 a decision by design?13:31
dansmithmnaser: do you mean archive_deleted_rows?13:32
mnaseryeah13:32
dansmithisn't there an all cells flag for that?13:32
mnaserthat was an unmerged patch afaik13:32
*** tbachman has joined #openstack-nova13:32
dansmithah, okay, well, then yes by design? :)13:32
openstackgerritBoxiang Zhu proposed openstack/nova master: Make evacuation respects anti-affinity rule  https://review.openstack.org/64996313:32
dansmithrun it against a config pointing at cell013:32
mnaserhttps://review.openstack.org/#/c/587858/13:33
mnasererr well13:33
mnaserlooks like that was a dup of https://review.openstack.org/#/c/507486/13:33
mnaserwhich seems to have stalled out13:33
*** ttsiouts has quit IRC13:35
*** ttsiouts has joined #openstack-nova13:35
dansmithmnaser: you can run purge --all-cells --before13:36
dansmithoh wait,13:36
*** ttsiouts has quit IRC13:36
dansmithI was thinking that did an archive first but it does not, nevermind13:36
dansmithit's not like I wrote that...13:36
*** ttsiouts has joined #openstack-nova13:36
mnaserit's early :p13:36
dansmithyeah, YEAH.13:36
*** jmlowe has joined #openstack-nova13:37
*** phasespace has quit IRC13:37
mnaseranyways, I rechecked that patch and I can iterate here and there to get it to land eventually13:39
mnaserthough probably not at a fast pace13:39
*** pcaruana has joined #openstack-nova13:42
*** tbachman has quit IRC13:47
*** tbachman has joined #openstack-nova13:51
*** tetsuro has joined #openstack-nova13:52
stephenfindansmith, jaypipes, cdent, bauzas: Reworking the cpu-resources spec at the moment. It feels like the general preference is to have a hard break between the current behavior and the new placement-based flow, right?13:54
stephenfinas in only newly deployed compute nodes would support the new behavior13:54
bauzasstephenfin: well, my opinion would be to not have a modification if you have the same options13:54
sean-k-mooney?13:55
*** ricolin has joined #openstack-nova13:55
bauzasbut maybe with an exception for CONF.vcpu_pin_set13:55
cdentstephenfin: I'd need to load that context back in before being able to say something useful, so will defer to others for now13:55
stephenfinsean-k-mooney: Pretty much this https://review.openstack.org/#/c/555081/22/specs/train/approved/cpu-resources.rst@72513:55
bauzasstephenfin: so, see my opinion :13:56
dansmithstephenfin: no, I think that's the opposite of what we want13:56
bauzas- no modifications for the same options, except for CONF.vcpu_pin_set13:56
sean-k-mooney well i really wish we did not conflate vCPU with floating13:56
sean-k-mooneyor with pinned13:57
sean-k-mooneyit has a different meaning than either13:57
*** awaugama has joined #openstack-nova13:57
sean-k-mooneyanyway personally i would like to have two parallel implementations in train: freeze the current one and implement a parallel placement-native one and then switch in U or later13:58
stephenfindansmith: It is? What is the preference?13:58
stephenfinsean-k-mooney: Yeah, that's pretty much where my head was going13:58
stephenfinas for conflating VCPU with floating, I'm not sure what other term we could use13:59
sean-k-mooneya vCPU can be floating or pinned13:59
dansmithstephenfin: I think I said on the review that I was opposed to us solving the problem of accounting by making people move their guests around between computes that are counting the old way and new way13:59
dansmithstephenfin: so having compute nodes never transition to the new way (without being cleaned off first) is not okay14:00
mriedemmelwitt: just fyi that i replied to some of your replies in the quotas from placement change https://review.openstack.org/#/c/638073/214:00
mriedemhttps://review.openstack.org/#/c/638073/14:00
sean-k-mooneyi dont like using the VCPU resource class in placement to mean just floating as it has a different meaning than VCPU in the flavor14:00
mriedemif you're working on updates14:00
stephenfinsean-k-mooney: We're eventually going to kill vcpus in the flavor though14:01
sean-k-mooneydansmith: ok so you are asking for an in-place reshape or other mechanism14:01
jaypipessean-k-mooney: I think I'm pretty clear in the glossary of that spec.14:01
sean-k-mooneystephenfin: im not really a fan of that but we could14:01
sean-k-mooneyjaypipes: yes i know what it says in the spec14:01
mriedemmnaser: dansmith already said this but you can archive cell0 by just running it against a nova.conf with [database]/connection pointed at cell014:02
jaypipessean-k-mooney: that glossary clearly delineates between guest vCPU threads, shared host CPUs (floating CPUs) and dedicated host CPUs (pinned to a guest vCPU thread)14:02
stephenfinsean-k-mooney: Yeah, we won't need it once we're modelling this stuff in placement. That'd be a future work item though14:02
stephenfinsean-k-mooney: and VCPU and PCPU are far more succinct than VCPU_SHARED and VCPU_DEDICATED, even if we're overloading the term vCPU14:02
dansmithstephenfin: we are? (going to kill vcpu in the flavor) ?14:02
sean-k-mooneydansmith: see i didn't think we were14:03
dansmithsean-k-mooney: me either :)14:03
stephenfindansmith: https://review.openstack.org/#/c/555081/22/specs/train/approved/cpu-resources.rst@72714:03
mriedemkilling vcpu in the flavor will be a giant change14:03
*** cfriesen has joined #openstack-nova14:04
dansmithstephenfin: you mean adjusting flavor vcpus down to zero for purely physically-pinned instances14:04
sean-k-mooneystephenfin: ya that was one of the parts i disliked about the current spec but i could live with it if we needed to14:04
sean-k-mooneystephenfin: its a fairly major api breakage14:04
mriedemis this a known gate breakage and i'm just late to the party? http://logs.openstack.org/50/651650/2/check/openstack-tox-cover/ebe055d/job-output.txt.gz#_2019-04-10_21_35_25_17172714:05
sean-k-mooneyany system that used to inspect the flavor vcpu field would now have to inspect the resources dict14:05
jaypipesdansmith: yes.14:05
dansmithmaybe we need a hangout14:05
mriedemb'migrate.exceptions.ScriptError: You can only have one Python script per version, but you have: /home/zuul/src/git.openstack.org/openstack/nova/nova/db/sqlalchemy/migrate_repo/versions/393_add_instances_hidden.py and /home/zuul/src/git.openstack.org/openstack/nova/nova/db/sqlalchemy/migrate_repo/versions/393_placeholder.py'14:05
stephenfindansmith: Initially, yeah, but at some point that entire field could go. I'm trying to find the place we discussed this in the spec previously (this is a big review)14:05
mriedemoh nvm i need to rebase14:05
dansmithstephenfin: I don't agree with you :)14:05
sean-k-mooneystephenfin: i think it would be better to leave the vcpu field in the flavor as the total number of cpus and set the VCPU resource class request to 0 instead in that instance14:06
mnaserwait, why would we kill the cpu field14:06
stephenfindansmith: Then it's a good thing this isn't actually a stated goal of this particular spec14:06
sean-k-mooneyit does not break clients and achieves the same goal14:06
stephenfinLet's forget about that and move back to the original question of handling upgrades14:06
jaypipesmnaser: because there's two different actual resource classes: dedicated (pinned) CPU and shared CPU.14:07
sean-k-mooneyi dont think having parallel implementations and in-place updates are mutually exclusive14:07
stephenfinI've no idea how we can make resource claims for existing instances that currently don't have anything claimed14:07
jaypipesmnaser: and the ugliness of our NUMA and pinning code has borked how we think of the CPU resources.14:07
sean-k-mooneystephenfin: they do have claims but they are all of resource class VCPU14:08
stephenfinyeah, so moving those from VCPU to PCPU14:08
jaypipesstephenfin: by looking at the flavor and image extra specs.14:08
sean-k-mooneystephenfin: so we need to modify their existing claims in place as part of the reshape14:08
stephenfinthe migration is going to be hell14:08
jaypipesyuuup.14:08
jaypipesalways is.14:08
sean-k-mooneywill it be any worse than the vgpu reshape14:08
stephenfinwe're going to have to handle the stupid stuff that can happen now, like shared and dedicated instances being on the same host14:08
jaypipesas I've mentioned before, upgrade path is about 95% of the code and effort.14:09
mriedemi thought you couldn't have shared and dedicated on the same host today, or is that a 'recommendation' but not enforced14:09
mriedemand we just hope people use aggregates for sanity?14:09
sean-k-mooneystephenfin: by the way today the vms cant mix pinned and floating instances so the only things we have to migrate are the pinned instances and then just need to change the resource class14:09
stephenfinmriedem: yeah, the latter14:09
jaypipesmriedem: recommendation. there is no way to enforce it.14:09
stephenfinmriedem: It's all over the docs but who reads those14:10
sean-k-mooneywindriver have some downstream-only hacks to make mixed stuff work14:10
sean-k-mooneyor i guess i should say starlingx14:10
mriedemi know they have their crazy floating vcpu stuff in starlingx14:10
sean-k-mooneybut we should not port that14:10
mriedemi'm not sure there is no way we could've enforced it,14:11
stephenfinsean-k-mooney: You mean you can't have pinned and floating instances on the same host? If so, you know that's not true14:11
* jaypipes steps out of time machine. yep, I thought this looked like 2 years ago...14:11
sean-k-mooneyya so they were only able to do that because we dont enforce that14:11
mriedeme.g. if cpu_allocation_ratio=1.0, you have to have dedicated cpus14:11
stephenfinYeah, we don't so we have to handle that14:11
sean-k-mooneystephenfin: no you can but you shouldnt14:11
sean-k-mooneywithout the starlingx hacks14:11
stephenfinand because you can, we're going to have to handle that14:12
jaypipesmriedem: cpu_allocation_ratio=1.0 has nothing to do with pinned CPUs.14:12
dansmithwell,14:12
jaypipes(which is part of the problem)14:12
dansmithit's all over this spec14:12
stephenfinhence my inclination towards draining hosts and moving the instances to other, newly configured hosts14:12
mriedemmy point was we could have used that as indication a host can only have dedicated cpu guests14:12
mriedembut since we didn't do that, yeah we could have mixed on the same host i guess and have to deal wit it14:12
sean-k-mooneycpu_allocation_ratio=1.0 just disables oversubscription unfortunately14:12
mriedem*with14:12
dansmithI'm -2 on making people move instances to update counting numbers in placement14:12
stephenfinmriedem: yeah, cpu_allocation_ratio is ignored for pinned instances14:13
jaypipesdansmith: especially when those instances are pets and pandas. all of them.14:13
sean-k-mooneyanyway im personally not too worried about fixing the allocations.14:13
dansmithaye14:13
sean-k-mooneyi think we can do that14:13
stephenfinso I'd imagine it's set to 16.0 for most deployments, regardless of the workload14:14
cdentdansmith: I remain confused about why it isn't okay for those instance to remain defined/allocation in the "old way"?14:14
sean-k-mooneystephenfin: it depends on the deployment tool14:14
stephenfinyup14:14
dansmithcdent: I don't think it is14:14
bauzassorry folks, I had to go AFK14:14
dansmithcdent: we have to reshape14:14
cdentdansmith: I hear you, I'm asking "why?"14:14
bauzasdansmith: my point is that I think we only need to reshape for CONF.vcpu_pin_set14:15
bauzasfor the other options, we don't need it14:15
sean-k-mooneyvcpu_pin_set is used for hosts with shared cpus too14:15
bauzasstephenfin: ^14:15
dansmithcdent: why not just leave them with the old accounting for five years?14:15
bauzassean-k-mooney: I know, that's why we need to reshape14:15
cdentif that's how long they live, sure14:15
bauzasbut only for this option14:15
stephenfinsean-k-mooney: Is it? I thought that was totally ignored unless you had a NUMA topology14:16
*** hongbin has joined #openstack-nova14:16
sean-k-mooneyyes14:16
sean-k-mooneybut you can have numa without pinning14:16
cdentdansmith: I'm not suggesting it, I'm asking why it is not okay.14:16
sean-k-mooneyand actually no14:16
stephenfinInstead you've to use that reserved_cpus option (or whatever it's called)14:16
dansmithcdent: because it means new instances scheduled to compute nodes that don't have proper accounting would also have to be accounted the old way?14:16
stephenfincdent: Yeah, that's what I was thinking too ^14:16
mriedemcdent: by "remain defined/allocation in the "old way"?" do you mean reporting VCPU allocations to placement rather than PCPU?14:16
sean-k-mooneyif you have no numa topology the number of enabled cores in vcpu_pin_set is still used to determine the number of cores reported to the resource tracker and therefore to placement14:17
cdentdansmith: make them end up on other nodes?14:17
stephenfinWe can't schedule new instances to that host until we know how many PCPUs are actually in use there14:17
dansmithcdent: so I have to waste the capacity on those nodes until long-lived instances die?14:17
cdentmriedem: yes, if that's how they were booted in the first place, how/why should they change14:17
dansmithright, what stephenfin said14:17
cdentdansmith: yes!14:17
dansmithcdent: um, no14:17
sean-k-mooneystephenfin: the advice that many gave was never to use reserved_cpus and always use vcpu_pin_set instead14:17
mnaserthat's a terrible idea14:17
mnasertbh with my operator 2 cents: I'm not ok with moving all my instances around to magically reshape things14:18
dansmiththis ^14:18
mriedemsince i just went through mel's change to counting usage from placement, your quotas would be all out of whack too14:18
cdentI'm not suggesting that people migrate their instances14:18
mnaserand I'm not okay with my capacity sitting empty because this is $$$$14:18
stephenfinsean-k-mooney: Again, advice that wasn't enforced anywhere and therefore not something we can rely on :(14:18
dansmithmoving gigs and gigs of live instances so that we can adjust integers in placement is INSANE14:18
sean-k-mooneythe reason is vcpu_pin_set takes effect before the cpu_allocation_ratio is applied and the reserved option happens after14:18
cdentmeh14:18
cdentI never suggested anybody do any migrations14:18
dansmithreserving capacity until a five-year instance goes away so we can update integers in placement is INSANE14:18
cdentLet stuff live out its lifecycle14:18
stephenfincdent: Yup, that's all me. Sorry :)14:18
mnaserthat's not possible though, that's the thing14:18
mnaserI have no control over my environment14:19
*** lpetrut has quit IRC14:19
cdentdansmith: I'm not concerned about this from a placement standpoint: abuse placement all we want, it'll take it14:19
cdentI'm concerned about it from a magical recognition happening on the compute node14:19
dansmithcdent: yeah, this isn't a placement concern, it's a nova concern14:19
stephenfinOK, so we're rewriting flavors and moving allocations around to switch everything to the new system. If that's the case, we're back to trying to think of all the edge cases that exist14:20
stephenfinand there are many. Many many14:20
sean-k-mooneystephenfin: we can support the new flow without requiring flavors to be modified14:20
dansmithsean-k-mooney: he means the instance's flavor I think14:20
stephenfinsean-k-mooney: via a shim, I guess?14:20
stephenfindansmith: I do. The embedded one14:20
sean-k-mooneystephenfin: today we generate the placement request via the request spec14:21
dansmithstephenfin: you can write a nova-status check that verifies that all the instances can be fit to whatever minimally simplified scheme you support,14:21
* mriedem thinks about how we haven't dropped the ironic flavor migration code from pike yet14:21
dansmithto warn people before they upgrade with complex instances that can't be fixed or something14:21
sean-k-mooneywe convert the VCPU element in the flavor into a VCPU resource request14:21
sean-k-mooneythat code can take account of the hw:cpu_policy extra spec and just ask for PCPU resources instead14:22
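A minimal sketch of the translation sean-k-mooney is describing, assuming the request-spec code can look at flavor.vcpus and the hw:cpu_policy extra spec; the function and dict shapes are illustrative, not actual nova code:

    # Illustrative only: derive a placement resource request from the flavor.
    def cpu_resources_from_flavor(vcpus, extra_specs):
        # Pinned guests ask for PCPU, floating guests keep asking for VCPU.
        if extra_specs.get('hw:cpu_policy') == 'dedicated':
            return {'PCPU': vcpus, 'VCPU': 0}
        return {'VCPU': vcpus, 'PCPU': 0}

    # A 4-vCPU dedicated flavor would request 4 PCPU and no VCPU:
    assert cpu_resources_from_flavor(4, {'hw:cpu_policy': 'dedicated'}) == {'PCPU': 4, 'VCPU': 0}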
stephenfindansmith: That's going to be a big check, fair warning :)14:22
dansmithstephenfin: I'm just throwing ideas14:22
stephenfinYup, and good ones too14:22
*** awalende has joined #openstack-nova14:22
dansmithstephenfin: migrating everything, deleting everything, not upgrading until instances age out -- all not options, IMHO14:22
bauzasdansmith: mnaser: okay, sorry, that's me who proposed migrating instances, and it was a terrible idea, I reckon14:22
bauzasso we should stop thinking about this possibility14:23
mnaserbauzas: all good :) ideas are good to bring up anyways14:23
bauzasbut, then, we want to just make sure that when creating a new RC, we also look at the existing capacity14:23
*** jobewan has joined #openstack-nova14:23
stephenfinSo supporting instances with just 'hw:cpu_policy=dedicated' in a deployment that has used aggregates as we suggest seems pretty easy14:24
bauzassee the example I provided : https://review.openstack.org/#/c/555081/22/specs/train/approved/cpu-resources.rst@18114:24
bauzasstephenfin: ^14:24
*** jobewan has quit IRC14:24
bauzasif we change the capacity for VGPU, then it could be a problem14:24
dansmithstephenfin: another thing I'd support is converting instances to allocations that are maybe overly conservative.. like if you need to reserve more resources than they really have to make the math work out, that seems like a potential compromise14:24
*** Luzi has quit IRC14:24
bauzas"I'll try to clarify my thoughts with an upgrade example on a host with 8 physical CPUs, named CPU0 to 7:in Stein, instances are running and actively taking VCPUs that are consuming CPU0 to CPU7.in Train, operator wants to dedicate CPU0 and CPU1. Accordingly, CPU2 to 7 will be shared.Consequently, VCPU inventory for that host will be reduced by the amount of allocation_ratio * 2. In theory, we should then allocate VCPU resource14:24
bauzaslass for instances that are taking VCPUs located on CPU2 to 7 and allocate PCPU for instances that are taking CPU0 and 1. But if ratio is 16, we could have 32 instances (asking for 1 VCPU) to be allocated against 2 PCPU with ratio=1.0."14:24
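bauzas' example, worked through as plain arithmetic (numbers only; the variable names are illustrative, not config options):

    # 8 physical CPUs; in Train the operator dedicates CPU0 and CPU1.
    total_cpus = 8
    dedicated_cpus = 2                                   # become PCPU inventory at ratio 1.0
    shared_cpus = total_cpus - dedicated_cpus            # 6
    cpu_allocation_ratio = 16.0

    pcpu_inventory = dedicated_cpus                      # 2
    vcpu_inventory = shared_cpus * cpu_allocation_ratio  # 96.0, down from 8 * 16 = 128

    # The wrinkle: in Stein up to 32 single-VCPU instances could already be packed
    # onto CPU0-1, but after the reshape they would have to fit into 2 PCPU at ratio 1.0.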
dansmithstephenfin: and then recalculate that on migrate if they want.. depending on how that looks14:25
dansmithI dunno what the actual complexity concern looks like, so I'm just spitballing14:25
stephenfindansmith: That might be necessary for something like 'hw:cpu_threads_policy=isolate'14:25
dansmithyeah14:25
*** jobewan has joined #openstack-nova14:25
* stephenfin really regrets ever having added that feature :(14:26
*** awalende has quit IRC14:26
stephenfinbauzas: Yeah, I think we need a startup check to ensure NUM_VCPUS_USED_BY_INSTANCES <= NUM_VCPUS_AVAILABLE14:27
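A rough sketch of the kind of startup/nova-status check stephenfin means, assuming the per-host usage and the new inventories are already known; every name here is hypothetical:

    # Hypothetical check, not actual nova code: fail if existing instances no
    # longer fit the new VCPU/PCPU split on this host.
    def check_cpu_capacity(used_vcpus, used_pcpus, vcpu_inventory, pcpu_inventory):
        if used_vcpus > vcpu_inventory:
            return 'failure: floating instances exceed the VCPU inventory'
        if used_pcpus > pcpu_inventory:
            return 'failure: pinned instances exceed the PCPU inventory'
        return 'success'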
sean-k-mooneywell we say in the spec hw:cpu_threads_policy would be going away14:27
bauzasstephenfin: cool with it then14:27
bauzasstephenfin: but then we need to call placement when restarting the compute service14:27
stephenfinsean-k-mooney: Yup, but there has to be an intermediate step that lets us account for the fact that existing instances are using more cores than instance.vcpus14:28
bauzas*every time*14:28
*** jobewan has quit IRC14:28
bauzasfor vGPUs, we basically only reshape once14:28
stephenfinbauzas: Hmm, could also be a nova-status check as dansmith suggested14:28
mnaserwhat's the concern in taking the current state and translating that directly into placement when the compute node goes up?14:28
dansmithmnaser: complexity14:28
dansmithbut we have to bite that bullet I think14:29
stephenfinmnaser: there are a lot of ways things can be inconsistent and we need to handle those14:29
sean-k-mooneystephenfin: that only happens for pinned instances if the host has hyperthreading or you have emulator_threads=isolate14:29
*** jobewan has joined #openstack-nova14:29
sean-k-mooneystephenfin: but yes we do14:29
stephenfinlike the way there's nothing preventing you from scheduling pinned and unpinned instances on the same host14:29
mnasercouldn't you introspect pinned and unpinned from the libvirt definition14:30
sean-k-mooneymnaser: we can tell from the flavor14:30
stephenfin(so when we migrate, we could end up in a situation where an N core host could have N PCPUs and N * overallocation_ratio VCPUs in use at the same time)14:30
sean-k-mooneyits not a case of we dont know this happens; we told operators that its their responsibility to ensure it does not14:31
sean-k-mooneythat is the issue: we told them to do something but did not enforce it in code14:31
sean-k-mooneytherefore we have to assume the worst14:31
stephenfinor the fact that when using the isolate cpu thread policy, the instance may or may not be using twice as many cores as its supposed to be using (isolate will reserve the hyperthread siblings for each core used by the instance)14:32
stephenfinsean-k-mooney: Correct14:32
sean-k-mooneyyes although to be fair we do account for that properly in the resource tracker14:32
stephenfinyup14:32
sean-k-mooneymnaser: so we have all the data to fix things if we need to14:33
bauzasstephenfin: we *could* do it with nova-status but then operators would have to migrate (or delete some instances) :(14:33
sean-k-mooneythe issue is that the isolate policy is not compatible with placement14:33
bauzasthanks, allocation ratio14:33
mriedemlyarwood: can you hit these to keep things rolling https://review.openstack.org/#/q/topic:bug/1669054+branch:stable/rocky14:33
sean-k-mooneyit changes the quantity of resources based on the host that is selected14:33
sean-k-mooneybauzas: no i think we can fix allocations for existing instances14:34
bauzassean-k-mooney: how ? see my example14:34
sean-k-mooneythe thing we have to be ok with is removing cpu_thread policies14:34
sean-k-mooneybauzas: we can overallocate RPs if we need to initially14:35
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: libvirt: disconnect volume when encryption fails  https://review.openstack.org/65179614:36
sean-k-mooneyor we can say you asked for 2 cpus but you have isolate and are actually using 4 cpus and update the placement allocation accordingly14:36
*** tetsuro has quit IRC14:36
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: Don't warn on network-vif-unplugged event during live migration  https://review.openstack.org/65179714:37
stephenfinmriedem: If you're looking at stable stuff, think you could look at these too? https://review.openstack.org/#/c/650363/ https://review.openstack.org/#/c/650364/14:37
sean-k-mooneywith pinned cpus there was no oversubscription so the fact the vm is there means it can fit and we correctly do the accounting in the resource tracker to handle the additional cpu usage14:37
*** udesale has quit IRC14:38
mriedemstephenfin: ok14:38
stephenfinthanks14:38
stephenfinbauzas: You would, but is there any way to work around that?14:38
stephenfinI mean, if they're in a broken state, something has to change14:39
stephenfinbauzas: Also, wouldn't this exact same thing happen now if you messed with allocation ratios?14:39
stephenfini.e. If there are already instances on a host and I drop cpu_allocation_ratio from 16.0 to 2.0 and restart nova-compute, what happens?14:40
bauzaswell, I dunno what to say14:40
bauzasstephenfin: it just works14:40
sean-k-mooneyin the placement side the ratio changes14:40
bauzasstephenfin: but any other instance request would not go to this compute14:40
sean-k-mooneyif you are using more than is available you can no longer allocate until you drop below the new limit14:40
sean-k-mooneybut nothing breaks14:40
stephenfinSo it'd be the same here, right?14:40
*** jobewan has quit IRC14:41
bauzasexactly what I said :)14:41
sean-k-mooneyit just prevents new instances going to the node14:41
stephenfinOr am I missing something?14:41
bauzasactually, that's a good point14:41
sean-k-mooneyyes it would be the same14:41
bauzasif the host is oversubscribed, that's fine14:41
*** jobewan has joined #openstack-nova14:41
bauzasit's just the options mean nothing14:41
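The cpu_allocation_ratio analogy in numbers (a sketch, not output from a real deployment):

    # Dropping the ratio on a host that already has instances: nothing breaks,
    # the host just stops accepting new instances until usage drops.
    host_cpus = 8
    used_vcpus = 40                         # already allocated to existing instances

    capacity_before = host_cpus * 16.0      # 128.0 -> room for more
    capacity_after = host_cpus * 2.0        # 16.0  -> over capacity, but existing VMs keep running

    can_schedule_more = used_vcpus < capacity_after   # False until usage drops below 16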
sean-k-mooneyok i hate myself for saying this but can we separate hw:cpu_thread_policy into another spec for the removal of that option?14:43
sean-k-mooneyit can be replaced with a trait for hosts with SMT enabled14:44
sean-k-mooneyif we agree on that then thats one less thing we need to figure out in the cpu spec14:44
sean-k-mooneywe will be losing functionality by doing that but if we are not ok with removing that option we have a blocker with the larger spec for cpus in placement anyway14:46
stephenfinsean-k-mooney: Not _really_. I mean, 'isolate' results in extra cores being used and those have to be accounted for somehow14:46
sean-k-mooneystephenfin: they are in the resource tracker14:46
dansmithsean-k-mooney: you can't really remove that image property14:46
dansmithyou can translate it into something more sane, but it's basically API at this point14:47
*** sapd1_x has joined #openstack-nova14:47
sean-k-mooneyi personally see value in it but it causes huge issues for cpu in placement14:47
dansmithif you just start ignoring that, everyone's tooling is going to start spinning up instances they think are isolated (or whatever) but aren't and they'll find out when it's too late14:47
sean-k-mooneyya i know14:47
sean-k-mooneywe can certainly translate it14:48
stephenfindansmith: The migration path we'd suggested was keeping that but limiting it to hosts with "I don't have hyperthreads" trait set14:48
stephenfinI think14:48
sean-k-mooneyto trait:COMPUTE_SMT=forbidden14:48
* stephenfin goes to double check14:48
stephenfinWait, yeah, that ^14:48
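A sketch of that translation, using the COMPUTE_SMT trait name from this discussion and the forbidden-trait extra spec syntax; the eventual trait name could well differ:

    # Hypothetical translation: 'isolate' becomes "only hosts without SMT".
    def translate_thread_policy(extra_specs):
        if extra_specs.get('hw:cpu_thread_policy') == 'isolate':
            return {'trait:COMPUTE_SMT': 'forbidden'}
        return {}

    assert translate_thread_policy({'hw:cpu_thread_policy': 'isolate'}) == {'trait:COMPUTE_SMT': 'forbidden'}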
dansmithstephenfin: that's cool, it just can't like .. be removed, like sean-k-mooney was saying :)14:49
sean-k-mooneyremoved was a bad phrasing14:49
stephenfindansmith: True. We can think about deprecating it in the future though14:49
sean-k-mooneyits meaning would change14:49
stephenfinIt's that or we carry the shim forever14:49
dansmithI don't14:49
dansmithit's API14:49
stephenfinWe deprecate/remove other APIs though?14:49
dansmithyou can fail the boot if it's specified as anything if you want14:49
dansmithstephenfin: this is unversioned14:50
dansmithbut I think we have to check for it basically forever14:50
dansmithit's also API that's unversioned and spread between nova, glance and cinder14:50
stephenfinHmm, good point14:50
dansmithI'm not saying you have to honor it well, with a shim forever,14:50
sean-k-mooneyok does it help to split that bit out into another spec14:50
dansmithbut you can't ever just start ignoring it, IMHO14:51
stephenfinok, let's kick that can down the road14:51
*** dklyle has joined #openstack-nova14:51
stephenfinfor now, I quite like the idea of overallocating the PCPUs for existing instances with the 'isolate' policy14:51
dansmiththe only reason not to separate it is if it's a problem for your current proposal, like you can't continue to honor it as is after you make other changes14:51
sean-k-mooneywell its a prerequisite for dedicated cpus in placement14:51
dansmithbut if that's not a problem, then sure14:51
sean-k-mooneywell it is a problem for the current proposal14:52
openstackgerritMatt Riedemann proposed openstack/nova-specs master: Add host and hypervisor_hostname flag to create server  https://review.openstack.org/64545814:52
dansmiththen we can't separate it completely14:52
stephenfinsean-k-mooney: Yeah, I don't think we can split it out entirely14:52
dansmithI've been on a call for the last hour, and have another starting soon that I actually have to pay attention to, FYI14:52
stephenfinWe need to figure out what happens right now, once we have PCPUs in placement14:52
dansmithso don't assume my pending silence on this matter is because I have shot myself in the face14:52
stephenfinWhat we don't need to figure out now is what we're doing even further down the road (in terms of failing the instance if the image property is set or something else)14:53
mriedembauzas: you know how we reset_forced_destinations on the request spec when moving a server?14:53
stephenfindansmith: Ack, me too14:53
sean-k-mooneystephenfin: well the traits thing is mentioned here https://review.openstack.org/#/c/555081/22/specs/train/approved/cpu-resources.rst@77814:53
dansmithstephenfin: yes, I think you can punt on that part14:53
mriedembauzas: if i create a server with a query scheduler hint targeted at a host or hypervisor_hostname, i can never migrate my server off the host :) same problem - but dumber14:54
stephenfinsean-k-mooney: Oh, indeed it is. I just need to expand on that I guess14:54
stephenfinOK, let me try and jot all this down in the spec and clean up the other issues14:54
* stephenfin would like to get started on the code for this sooner rather than later as it's going to involve untangling stuff14:55
sean-k-mooneyso its literally already part of the spec. the only functionality that you lose with that is you can no longer use isolate to allow a host with hyperthreads to be shared between guests that want full cores and those that can just have threads14:55
sean-k-mooneystephenfin: the code isnt the problem with this spec14:56
sean-k-mooneystephenfin: the upgrade impact is, and i think we can move forward with this spec but the current spec still has upgrade issues14:56
sean-k-mooneystephenfin: i share dansmith's view that we should enable in-place upgrades to this new way of doing things; if we can do that i will be happy with this14:57
sean-k-mooneystephenfin: that requires either parallel implementations and a config to opt in to the new behavior, or no change to existing configs14:58
bauzasmriedem: sec, was otp15:01
mriedembauzas: somewhat related but you might have something to add to my reply here https://review.openstack.org/#/c/649534/5/nova/objects/request_spec.py@60615:02
bauzasuh, and now in meeting actually :(15:02
mriedembauzas: not high priority15:02
*** artom has quit IRC15:06
mriedemstephenfin: can we not have a py3 unit test for https://review.openstack.org/#/c/650235/ ?15:07
stephenfinmriedem: Not really, no. It's an environment thing15:08
mriedembut can't we control the environment in a test?15:08
*** tbachman has quit IRC15:09
mriedembtw zigo made the same change in osc https://review.openstack.org/#/c/541609/15:09
stephenfinYes? No? I honestly don't know. We'd be monkeypatching Python internals, I suspect15:09
mriedemidk about that,15:09
mriedemnova's tox.ini sets this:15:09
mriedemLC_ALL=en_US.utf-815:10
stephenfinright, so the Python process is correctly configured in that case15:10
stephenfinI think we'd have to reload the interpreter to misconfigure things15:10
sean-k-mooneyis this related to the gate using LC_ALL=C again15:11
stephenfinyup15:11
mriedemstephenfin: takashi also asked that something is documented about this which could probably go here https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-console-log15:11
stephenfinmriedem: I don't think OSC is an issue because of the follow up patch to the one you linked https://review.openstack.org/#/c/554698/115:11
mriedemah ok15:12
stephenfinAlas, we don't use cliff in novaclient15:12
*** gyee has joined #openstack-nova15:12
stephenfinAs for docs, it's on my todo list and I'll try drag something out by the end of the week15:12
stephenfinmriedem: There's an alternative approach we can take that doesn't require changing environment configuration, but I don't know if we want to do it as it's a huge hack https://review.openstack.org/#/c/583535/15:15
*** sridharg has quit IRC15:15
stephenfinI'd nearly rather suggest people use Python 3 if they're encountering these kinds of Unicode issues15:15
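For context, the failure mode being discussed boils down to the locale's preferred encoding; a rough illustration (not the novaclient code itself), assuming a UTF-8 console log:

    import locale

    # Under LC_ALL=C this is typically 'ANSI_X3.4-1968' (ASCII); under
    # LC_ALL=en_US.utf-8 (what nova's tox.ini sets) it is 'UTF-8'.
    preferred = locale.getpreferredencoding()

    console_log = u'sp\u00e9cial output'.encode('utf-8')   # bytes as returned by the API
    try:
        print(console_log.decode(preferred))
    except UnicodeDecodeError:
        # roughly what the original bug looked like under a C locale
        print('decode failed with locale encoding %s' % preferred)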
mriedemi'll see if i can add a unit test in novaclient15:18
*** mvkr has quit IRC15:27
melwittmriedem: thanks, will take a look. I'm in the middle of updating everything, will be able to push the updates soon today I think15:30
melwittsoon™15:34
*** ivve has quit IRC15:36
*** luksky has quit IRC15:36
*** tbachman has joined #openstack-nova15:41
*** pcaruana has quit IRC15:42
lyarwoodmelwitt: https://review.openstack.org/#/c/611974/ - finally got to this btw, LGTM after playing around with it locally.15:42
melwittlyarwood: just saw that, thanks so much15:43
*** wolverineav has joined #openstack-nova15:45
*** hamzy has quit IRC15:47
*** ccamacho has quit IRC15:49
*** wolverineav has quit IRC15:49
*** tbachman has quit IRC15:52
*** boxiang has quit IRC15:55
*** boxiang has joined #openstack-nova15:55
*** amodi has joined #openstack-nova15:56
*** sapd1_x has quit IRC15:58
*** yan0s has quit IRC15:58
*** tssurya has quit IRC16:00
openstackgerritMatt Riedemann proposed openstack/python-novaclient master: Add test for console-log and docs for bug 1746534  https://review.openstack.org/65182716:06
openstackbug 1746534 in python-novaclient "encoding error when doing console-log" [High,Fix released] https://launchpad.net/bugs/1746534 - Assigned to Thomas Goirand (thomas-goirand)16:06
mriedemstephenfin: see how ^ grabs you16:06
mriedemi couldn't reproduce the original bug in the unit test16:06
stephenfinmriedem: Yup, that looks good to me. Good find with those click docs16:08
mriedemneed a stable core to hit these https://review.openstack.org/#/q/topic:bug/1821824+branch:stable/stein16:13
*** rpittau is now known as rpittau|afk16:15
lyarwoodmriedem: have the branch open to review now btw16:15
*** tbachman has joined #openstack-nova16:15
*** artom has joined #openstack-nova16:18
*** chhagarw has quit IRC16:25
*** tbachman has quit IRC16:31
*** zbr has quit IRC16:34
*** dave-mccowan has quit IRC16:39
*** zbr has joined #openstack-nova16:40
*** ricolin has quit IRC16:44
*** ttsiouts has quit IRC16:44
*** ttsiouts has joined #openstack-nova16:45
*** ttsiouts has quit IRC16:50
*** dtantsur is now known as dtantsur|afk16:50
*** davidsha has quit IRC16:52
openstackgerritMerged openstack/nova master: devstack: Remove 'tempest-dsvm-tempest-xen-rc'  https://review.openstack.org/65001816:54
*** ivve has joined #openstack-nova16:56
*** hamzy has joined #openstack-nova17:00
*** pcaruana has joined #openstack-nova17:05
*** igordc has joined #openstack-nova17:06
*** derekh has quit IRC17:11
*** tbachman has joined #openstack-nova17:13
*** jobewan has quit IRC17:13
openstackgerritRodolfo Alonso Hernandez proposed openstack/os-vif master: Remove IP proxy methods  https://review.openstack.org/64311517:14
openstackgerritRodolfo Alonso Hernandez proposed openstack/os-vif master: Refactor functional base test classes  https://review.openstack.org/64310117:14
lyarwoodmriedem: https://review.openstack.org/#/q/status:open+topic:bug/1803961 - Would you mind taking a look at this again when you have time. I'm going to suggest that we land and backport this over the competing cinder fix for the time being.17:14
*** jobewan has joined #openstack-nova17:15
*** ivve has quit IRC17:15
*** penick has joined #openstack-nova17:16
openstackgerritMerged openstack/nova master: trivial: Remove dead nova.db functions  https://review.openstack.org/64957017:17
*** jobewan has quit IRC17:20
*** tbachman has quit IRC17:25
*** tbachman has joined #openstack-nova17:26
*** gyee has quit IRC17:29
*** Sundar has joined #openstack-nova17:30
*** cfriesen has quit IRC17:34
*** cfriesen has joined #openstack-nova17:34
*** luksky has joined #openstack-nova17:41
mriedemlyarwood: couldn't get jgriffith or another cinder person to look at https://review.openstack.org/#/c/637224/ ?17:48
lyarwoodmriedem: I did a while ago and they pointed me towards https://review.openstack.org/#/c/638995/ but that has now stalled and I'm thinking it might just be easier to fix and backport in Nova given the cinder change is touching lots of different backends.17:51
lyarwoodmriedem: I can ask again to get confirmation that using migration_status is okay with them as a workaround until ^ lands.17:52
*** ralonsoh has quit IRC17:56
*** gmann is now known as gmann_afk18:15
mriedemmelwitt: thinking out loud about quota and cross-cell resize and your placement change, when a user resizes a server today, placement will track vcpu/ram usage against both the source and dest node, but the /limits API will only show vcpu/ram usage for the new flavor since that is counted from the instance right?18:22
mriedemso i think if i'm resizing from a flavor with vcpu=2 to vcpu=4, usage from placement for that project will say a total of 6, but the compute limits API would say 418:22
mriedemi'm not necessarily saying that's wrong, but is that accurate?18:23
melwittyeah, it's definitely not going to say 6, but I don't remember if it will say 2 or 4 before the resize is confirmed18:24
mriedemplacement would say 618:24
melwittyeah18:25
melwittlooking at the code to see when the new flavor is saved to the Instance object vcpus and memory_mb attributes18:25
mriedemonce the server is resized the limits api would say 4 https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/objects/instance.py#L153518:25
melwittthose attributes are what's counted today18:26
*** tosky has quit IRC18:26
mriedemit's finish_resize on the dest host https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/compute/manager.py#L463218:26
mriedemsame method that changes the instance status to VERIFY_RESIZE18:26
melwittah ok18:27
mriedemmaybe this is part of why we were talking about adding consumer types to placement so nova could ask for usage for 'instances' to filter out the usage tracked by the migration record holding the old flavor usage18:28
mriedemanyway, it's a thing for cross-cell resize because while the instance is in VERIFY_RESIZE status we'll have the instance in both the source and target dbs and if we're counting from the dbs we don't want to double count the instance, so i need to filter out the hidden one,18:30
mriedembut that got me thinking about how vcpus/ram will be counted18:30
melwittyeah. food for thought18:31
mriedemi'm assuming no one wants to think about that or eat that food though18:31
melwittof course not18:31
*** jmlowe has quit IRC18:32
openstackgerritMerged openstack/nova stable/rocky: doc: Fix openstack CLI command  https://review.openstack.org/64842518:33
openstackgerritMerged openstack/nova stable/rocky: doc: Capitalize keystone domain name  https://review.openstack.org/65060118:33
melwittthis was brought up before in earlier discussions about counting from placement I think, and IIRC some (dansmith?) thought if placement is consuming 6 resources at that point in time, it makes sense for quota usage counting to reflect that as well18:33
dansmithit depends on the resource and the direction18:34
melwittwhen do the old allocations go away from placement? VERIFY_RESIZE or after CONFIRM_RESIZE?18:34
dansmithideally we would only consume max(old_cpus, new_cpus) from quota18:34
dansmith(and placement18:34
dansmithbecause that's all we need to revert18:34
dansmithbut potentially we need to claim sum(old_disk, new_disk) depending18:34
melwittI see18:35
dansmithand always both if we're not on the same node of course18:35
mriedemright it's a mess and resize to same host doesn't help https://bugs.launchpad.net/nova/+bug/179020418:35
openstackLaunchpad bug 1790204 in OpenStack Compute (nova) "Allocations are "doubled up" on same host resize even though there is only 1 server on the host" [High,Triaged]18:35
mriedemmelwitt: the old allocations go away on confirm18:36
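The resize accounting mismatch from the 2-vCPU to 4-vCPU example above, as plain numbers, with dansmith's max/sum point included for comparison (illustrative values only):

    old_vcpus, new_vcpus = 2, 4

    # While the server sits in VERIFY_RESIZE:
    placement_usage = old_vcpus + new_vcpus      # 6: source + dest allocations both held
    limits_api_usage = new_vcpus                 # 4: counted from the instance's new flavor

    # What would ideally be consumed for CPU quota (enough to cover a revert):
    ideal_cpu_claim = max(old_vcpus, new_vcpus)  # 4
    # ...whereas disk may genuinely need both copies, depending on the move:
    old_disk, new_disk = 20, 40
    worst_case_disk_claim = old_disk + new_disk  # 60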
melwittack18:36
mriedemit's probably fair to say that our internal resource tracking and quota usage reporting during resize have just never aligned18:36
mriedemthe resource tracker would always report usage for the old flavor on the source node while resized to save room for a revert,18:37
mriedembut the quota usage wouldn't reflect that18:37
mriedemi don't know if the old pre-counting reservations stuff held some quota during a resize or not18:37
mriedemi.e. did we hold a reservation on the target host and then /limits + reserved would report 6 in this scenario rather than 4, idk18:38
mriedemno one would probably notice18:38
*** tbachman has quit IRC18:39
melwittmriedem: doesn't look like it. from this, you would fail quota check if you didn't have room to revert https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L320118:44
mriedemthat code is all black magic to me, i'd have to setup an ocata devstack to see what happens in the API18:48
mriedemdefinitely not a priority, was just thinking about it while i write a functional test for this for my cross-cell series18:48
melwittit does reserve for the new flavor here though at the beginning of the resize https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L335118:48
melwittif it's an upsize18:50
melwittso from what I can tell, it will consume quota usage for the new flavor as soon as the resize starts18:53
melwittonly for an upsize18:53
melwittotherwise it will consume for the old flavor still18:54
mriedemok so pre-pike the RT would report usage for both the old and new flavor and /limits API would show usage for the old and new flavor (assuming upsize), and starting in pike we'll only report usage for the new flavor18:55
mriedemwhich could also be a downsize18:55
mriedemmaybe i drop vcpu but bump disk or something18:55
*** bbowen_ has joined #openstack-nova18:57
*** wolverineav has joined #openstack-nova18:58
*** wolverineav has quit IRC18:58
*** wolverineav has joined #openstack-nova18:58
*** bbowen has quit IRC18:59
melwittno, I think the limits API would show the usage for only the new flavor because it would be upsize_delta + current usage (old flavor)19:00
melwittpre-pike19:00
*** tbachman has joined #openstack-nova19:00
melwittsorry I wasn't clear that it reserves the delta between old and new19:01
*** pcaruana has quit IRC19:02
*** wolverineav has quit IRC19:15
*** jmlowe has joined #openstack-nova19:16
*** wolverineav has joined #openstack-nova19:16
*** mdbooth_ has joined #openstack-nova19:20
mriedemincoming19:21
*** wolverineav has quit IRC19:21
openstackgerritMerged openstack/nova stable/rocky: Add functional regression test for bug 1669054  https://review.openstack.org/64932519:21
openstackbug 1669054 in OpenStack Compute (nova) rocky "RequestSpec.ignore_hosts from resize is reused in subsequent evacuate" [Medium,In progress] https://launchpad.net/bugs/1669054 - Assigned to Matt Riedemann (mriedem)19:21
openstackgerritMatt Riedemann proposed openstack/nova master: Fix ProviderUsageBaseTestCase._run_periodics for multi-cell  https://review.openstack.org/64117919:22
openstackgerritMatt Riedemann proposed openstack/nova master: Improve CinderFixtureNewAttachFlow  https://review.openstack.org/63938219:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1818914  https://review.openstack.org/64152119:22
openstackbug 1818914 in OpenStack Compute (nova) "Hypervisor resource usage on source still shows old flavor usage after resize confirm until update_available_resource periodic runs" [Low,In progress] https://launchpad.net/bugs/1818914 - Assigned to Matt Riedemann (mriedem)19:22
openstackgerritMatt Riedemann proposed openstack/nova master: Remove unused context parameter from RT._get_instance_type  https://review.openstack.org/64179219:22
openstackgerritMatt Riedemann proposed openstack/nova master: Update usage in RT.drop_move_claim during confirm resize  https://review.openstack.org/64180619:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add Migration.cross_cell_move and get_by_uuid  https://review.openstack.org/61401219:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add InstanceAction/Event create() method  https://review.openstack.org/61403619:22
openstackgerritMatt Riedemann proposed openstack/nova master: DNM: Add instance hard delete  https://review.openstack.org/65098419:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add Instance.hidden field  https://review.openstack.org/63112319:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add TargetDBSetupTask  https://review.openstack.org/62789219:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add CrossCellMigrationTask  https://review.openstack.org/63158119:22
openstackgerritMatt Riedemann proposed openstack/nova master: Execute TargetDBSetupTask  https://review.openstack.org/63385319:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add can_connect_volume() compute driver method  https://review.openstack.org/62131319:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_dest compute method  https://review.openstack.org/63329319:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add PrepResizeAtDestTask  https://review.openstack.org/62789019:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_source compute method  https://review.openstack.org/63483219:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add nova.compute.utils.delete_image  https://review.openstack.org/63760519:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add PrepResizeAtSourceTask  https://review.openstack.org/62789119:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize conductor RPC method  https://review.openstack.org/63804719:22
openstackgerritMatt Riedemann proposed openstack/nova master: Revert cross-cell resize from the API  https://review.openstack.org/63804819:22
openstackgerritMatt Riedemann proposed openstack/nova master: Confirm cross-cell resize while deleting a server  https://review.openstack.org/63826819:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add archive_deleted_rows wrinkle to cross-cell functional test  https://review.openstack.org/65165019:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add CrossCellWeigher  https://review.openstack.org/61435319:22
openstackgerritMatt Riedemann proposed openstack/nova master: Add cross-cell resize policy rule and enable in API  https://review.openstack.org/63826919:22
efriedsean-k-mooney: you still awake?19:22
*** mdbooth has quit IRC19:23
*** gmann_afk is now known as gmann19:27
*** Sundar has quit IRC19:35
melwittefried: fyi, patch to add a few post-release items to the ptl guide https://review.openstack.org/65100919:36
*** baclawski has joined #openstack-nova19:37
*** Sundar has joined #openstack-nova19:38
*** awaugama has quit IRC19:39
efriedmelwitt: ack, on the radar, thank you.19:40
efriedoh, that was easy. +219:41
*** baclawski has quit IRC19:42
*** baclawski has joined #openstack-nova19:47
*** wolverineav has joined #openstack-nova19:54
*** pchavva has quit IRC19:55
*** bbowen_ has quit IRC19:56
*** wolverineav has quit IRC19:57
*** wolverineav has joined #openstack-nova19:57
*** baclawski has quit IRC19:58
openstackgerritArtom Lifshitz proposed openstack/nova master: [DNM: extra logs] Revert resize: wait for external events in compute manager  https://review.openstack.org/64488120:00
* artom beats his head against the wall for ^^20:00
*** wolverineav has quit IRC20:02
*** wolverineav has joined #openstack-nova20:03
mriedemheh i didn't realize there was a ptl guide20:08
*** wolverineav has quit IRC20:08
*** wolverineav has joined #openstack-nova20:08
*** wolverineav has quit IRC20:11
*** wolverineav has joined #openstack-nova20:12
*** wolverineav has quit IRC20:13
*** wolverineav has joined #openstack-nova20:13
openstackgerritMerged openstack/python-novaclient master: Add test for console-log and docs for bug 1746534  https://review.openstack.org/65182720:14
openstackbug 1746534 in python-novaclient "encoding error when doing console-log" [High,Fix released] https://launchpad.net/bugs/1746534 - Assigned to Thomas Goirand (thomas-goirand)20:14
melwittmriedem: I added it recently. was just a copy-paste of a google doc I had made to help me as ptl20:22
*** wolverineav has quit IRC20:26
*** eharney has quit IRC20:34
melwittthis came up in downstream bug triage: cells v2 discover_hosts can fail with DBDuplicateEntry if it's run in parallel, once per deployed compute host20:40
dansmithmelwitt: I'm sure20:41
melwittfound related issues from OSA and chef: https://bugs.launchpad.net/openstack-ansible/+bug/1752540 https://github.com/bloomberg/chef-bcpc/issues/137820:41
openstackLaunchpad bug 1752540 in openstack-ansible "os_nova cell_v2 discover failure" [Undecided,In progress] - Assigned to git-harry (git-harry)20:41
dansmithwhy would someone run it on each compute node20:41
dansmithah the OSA one seems to be that they're running it per conductor20:42
dansmithwhich is also not really what should be happening, but less bad than compute I guess20:42
melwittmaybe lack of clear documentation but probably just thinking deploy compute host => discover host and made it part of that task piece20:42
dansmithexcept you have to give the compute node db credentials it doesn't otherwise need in order to do that :)20:42
dansmithbut yeah, misunderstanding of what should be done, I'm sure20:42
mriedemi want to say mdbooth_ opened some bug like this20:43
*** wolverineav has joined #openstack-nova20:43
mriedemconcurrency something or other20:43
mriedembut it was like db sync concurrently i think20:44
dansmithit's not really a concurrency thing in the way he normally looks for such issues,20:44
*** artom has quit IRC20:44
dansmithyeah, that also is one of those "don't do that" things, IMHO20:44
melwitthm, I wasn't thinking that they give db credentials to the compute node, just running discover_hosts per compute host, but centrally? I dunno, nevermind20:44
dansmithunless we want to do our own locking in the DB to prevent it, which is kinda silly20:44
mriedemit was this https://bugs.launchpad.net/nova/+bug/180465220:44
openstackLaunchpad bug 1804652 in OpenStack Compute (nova) "nova.db.sqlalchemy.migration.db_version is racy" [Low,In progress] - Assigned to Matthew Booth (mbooth-9)20:44
dansmithmelwitt: you have to have api db credentials to run discover hosts, which are normally not on the compute node (or shouldn't be)20:45
dansmithmriedem: yeah, would love to WONTFIX that20:45
mriedemyou could -1 the change20:46
mriedemstart a war20:46
mriedembut this is now the month of positivity and motivation20:46
melwittdansmith: yeah... I was thinking maybe they're doing that in a central place that is deploying compute nodes in parallel but not necessarily running nova-manage _on_ the compute nodes?20:46
*** hamzy has quit IRC20:46
dansmithmelwitt: who are we talking about? I thought you said "per compute" so I thought you meant on a compute, but maybe you just mean spawning a bunch of nova-manage commands on one node one for each new host?20:47
melwittI'll reply on the bug with guidance to run discover_hosts once after deploying compute hosts. and then I think I'll have a docs patch in my future20:47
dansmiththat's even more.. crazy20:47
dansmithshould be using ansible triggers for that, which AFAIK, would mean a thing runs one20:47
dansmith*once20:47
dansmithlike, you don't restart apache for every module you enable, you enable a bunch of modules and run restart at the end  :)20:48
melwittdansmith: I don't actually know the details of how the deployment works, was just thinking it seems unlikely they're putting api db creds on compute hosts and running nova-manage on them20:48
dansmiths/triggers/handlers/20:48
dansmithmelwitt: ack, I just thought that was what you originally asserted20:48
openstackgerritMatt Riedemann proposed openstack/python-novaclient stable/stein: Add test for console-log and docs for bug 1746534  https://review.openstack.org/65192520:49
openstackbug 1746534 in python-novaclient "encoding error when doing console-log" [High,Fix released] https://launchpad.net/bugs/1746534 - Assigned to Thomas Goirand (thomas-goirand)20:49
melwittno, I didn't mean to make it sound like that20:49
*** takashin has joined #openstack-nova20:49
openstackgerritMatt Riedemann proposed openstack/python-novaclient stable/stein: Add test for console-log and docs for bug 1746534  https://review.openstack.org/65192520:49
dansmithmelwitt: so, we could do an external lock on nova-manage which would prevent the specific case of running multiple copies on a single host,20:50
dansmithbut it might also be confusing because it won't fix the case where it's multiple on different hosts20:50
dansmithwe could do some janky locking in the database to try to catch it, but I'd rather just say "do not do this lest ye burn in nova purgatory"20:50
dansmithor, you know, something20:50
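The external-lock idea mentioned here, sketched with oslo.concurrency; the lock name and lock_path are illustrative, not an actual nova-manage change. As dansmith notes, a file lock only serializes runs on the same host and does nothing for runs spread across hosts.

    from oslo_concurrency import lockutils

    # Illustrative only: serializes cell_v2 discover_hosts runs on this one
    # host; concurrent runs on *different* hosts would still race.
    @lockutils.synchronized('cell_v2_discover_hosts', external=True,
                            lock_path='/var/lib/nova/tmp')
    def discover_hosts(ctxt, cell_uuid=None):
        # ... the existing discovery logic would run under the lock ...
        pass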
mriedemi smell etcd20:51
mriedemand tooz20:51
melwittlol.. yes20:51
mriedemthat's like 2 etcd references for nova in the last few weeks20:51
dansmithyes, let's deploy and depend on etcd instead of people using ansible handlers! :P20:51
* mriedem starts writing the spec20:51
melwittok, I'll reply with guidance on what to do. thanks y'all20:52
dansmithmriedem: make sure the spec says that etcd is only running when nova-manage is used, and stopped at runtime20:52
* dansmith polishes his resume20:52
* mriedem thinks of polish sausage20:52
eanderssonDo you need to run nova-manage placement heal_allocations when upgrading from non-cell to Cell V2?20:53
mriedemeandersson: no20:53
mriedemdoesn't have anything to do with cells20:53
eanderssonLet me re-phrase, when upgrading from Mitaka to Rocky20:53
mriedemit was built for people migrating from caching scheduler - which didn't use placement and thus didn't create allocations - to filter scheduler20:53
mriedemeandersson: in that case...maybe20:54
eanderssonbecause placement is empty for us20:54
mriedemheh20:54
eanderssonand it was never in any of the upgrade steps20:54
mriedembecause "upgrading from 1983 to now" isn't a supported upgrade20:54
eanderssonwell we didn't upgrade from mitaka to rocky20:54
eanderssonwe upgraded from one version to another20:54
eanderssonalso not a helpful comment20:55
mriedemeandersson: sorry,20:55
mriedemyou've been at rocky for a couple of weeks now haven't you/20:55
mriedem?20:55
eanderssonYes - and things have been working great20:55
eanderssonbut we started seeing some odd errors20:55
dansmitheandersson: don't mind mriedem he's just cranky today.. it's about nap time20:55
openstackgerritMerged openstack/nova stable/stein: Add retry_on_deadlock to migration_update DB API  https://review.openstack.org/64842820:55
mriedemso you're at rocky but still not seeing anything in placement? like resource providers? or just allocations?20:55
mriedemi'm not sure how you could create a server...20:56
eanderssonWe see new servers in placement20:56
mriedembut not the old ones20:56
mriedem*you don't see allocations for old servers in placement20:56
eanderssonThe problem is that now we try to run the above command and it fails20:56
* melwitt stands amazed that ocata devstack deployed without a hitch20:56
mriedemeandersson: paste me the error20:57
mriedemi should have put a dry run on that command20:57
mriedemit was in the todo list20:57
melwittnova meeting in a few min20:57
eanderssonThe first error we saw was this20:57
eandersson> Instance xxx has allocations against this compute host but is not found in the database.20:57
eanderssonNext we ran the heal command, but for some reason it thinks the overprovisioned host is too full and fails to complete20:58
mriedemthe heal command is trying to PUT /allocations against a given resource provider (compute node) for a given instance,20:58
mriedemand if you've migrated or created servers on that compute node, placement is going to say "you're already at capacity"20:59
mriedemwhich is why that PUT is probably failing even though there is already a server on it consuming resources20:59
mriedemeandersson: so in that case, it'd be helpful to look at the inventory reported by placement for that compute node21:00
efriednova meeting now #openstack-meeting21:00
mriedemeandersson: which you can get with this https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-show21:00
mriedemuuid is the uuid of the compute node21:00
mriedemsorry https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-list is probably more helpful21:00
ceryxAfter running heal_allocations we have the same consumer_id (which I believe is the compute instance ID) have allocations created across 6 separate resource providers, each being a different compute node21:02
mriedemceryx: are you and eandersson in kahoots or just coincidental?21:02
eanderssonkahoots :D21:02
mriedemok21:03
mriedemgood21:03
ceryxFor RAM, 5 of those allocations are for 32GB but one is for 64GB which seems also strange (and yes sorry, was about to clarify, working with eandersson)21:03
*** whoami-rajat has quit IRC21:03
mriedemthe consumer id is the instance id yes21:04
mriedemhas the instance been migrating around?21:04
*** ak92514 has joined #openstack-nova21:05
efriedthat 64... if we're leaking allocations and we move an instance away and back...?21:05
ceryxhttp://paste.openstack.org/show/749212/ this is what we're seeing for one consumer ID. I checked nova.migrations and don't see this instance mentioned.21:07
openstackgerritMerged openstack/nova stable/stein: Use Selection object to fill request group mapping  https://review.openstack.org/64771321:07
mriedemit could be a migration record21:07
mriedemcheck your migrations table in the cell db for that id21:07
mriedemdo you see errors in the logs when you're doing a migration?21:08
mriedemmy guess is you've done cold migrations / resize and the source node allocations, which are held by the migration record consumer, are not getting cleaned up for some reason21:09
mriedemthe multiple allocations for the same consumer and provider are because of different resource classes, most likely VCPU and MEMORY_MB21:10
eanderssonWe have had migrations disabled for a very long time21:10
ceryxAh yep, I was checking uuid and not instance_uuid from migrations. This had previously (about a month ago) had 6 failed migrations, then one confirmed migration.21:10
mriedemif these are volume-backed servers21:10
eanderssonoh so nvm :D21:10
mriedemceryx: in a nutshell, when you start a cold or live migration, the allocations held by the instance (instances.uuid) on the source node provider are moved to a migration record consumer (consumer_id=migrations.uuid), and the dest node provider allocations are held by the instance during scheduling.21:12
mriedemhttps://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html21:13
mriedemwhen the cold migration / resize is confirmed the allocations against the source node held by the migration record should be deleted21:13
mriedemand you should just be left with the instance consuming allocations from the target provider21:14
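The consumer "swap" described above, shown with plain dicts rather than real placement API calls; the UUID placeholders and resource amounts are made up.

    # 1. Before the move: the instance consumes from the source node provider.
    allocations = {
        'instance-uuid': {'source-rp-uuid': {'VCPU': 4, 'MEMORY_MB': 8192}},
    }
    # 2. Migration in progress: the source-node allocations are now held by the
    #    migration record consumer, and the scheduler put the instance's
    #    allocations on the destination provider.
    allocations = {
        'migration-uuid': {'source-rp-uuid': {'VCPU': 4, 'MEMORY_MB': 8192}},
        'instance-uuid': {'dest-rp-uuid': {'VCPU': 4, 'MEMORY_MB': 8192}},
    }
    # 3. After confirm: the migration consumer's allocations are deleted,
    #    leaving only the instance's allocations on the destination provider.
    allocations = {
        'instance-uuid': {'dest-rp-uuid': {'VCPU': 4, 'MEMORY_MB': 8192}},
    }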
mriedemso in this paste is fec5409b-010e-4316-845c-ef68440d3593 an instance uuid or migration uuid?21:14
mriedemlooks like resource_class_id=0 is for VCPU and resource_class_id=1 is for MEMORY_MB21:15
ceryxIn this case the computes that we see unexpected allocations on were target hypervisors of an errored migration. So it looks like heal_allocations might be creating allocations for migrations in error state?21:15
mriedemand this is a volume-backed server because there is no DISK_GB allocation21:15
ceryxAnd yes - this has a cinder backed root disk21:15
mriedemheal_allocations shouldn't even be looking at migrations, just instances, but it's been awhile since i dug into this21:16
openstackgerritMerged openstack/nova stable/stein: Add functional recreate test for bug 1819963  https://review.openstack.org/64840121:17
openstackbug 1819963 in OpenStack Compute (nova) stein "Reverting a resize does not update the instance.availability_zone value to the source az" [Medium,In progress] https://launchpad.net/bugs/1819963 - Assigned to Matt Riedemann (mriedem)21:17
*** slaweq has quit IRC21:20
mriedemso heal_allocations should skip the instance if it doesn't have a node set, but if the migration failed the node shouldn't keep changing21:21
mriedemand it also shouldn't PUT new allocations if the instance already has allocations in placement, which it should have if you tried migrating it since you upgraded to rocky21:21
eanderssonWe don't use azs btw21:21
*** dansmith changes topic to "Current runways: https://etherpad.openstack.org/p/nova-runways-train -- This channel is for Nova development. For support of Nova deployments, please use #openstack."21:21
*** ChanServ sets mode: -o dansmith21:22
eandersson(or we only have one rather :p)21:22
mriedemnot sure why azs would have anything to do with this21:22
eanderssonah I was looking at the bug above :p21:23
eanderssonDidn't realize that it was unrelated21:23
mriedemso for all 6 compute node resource providers in that paste, are there actually 6 matching compute_nodes in the cell db with the same uuid as the resource provider?21:24
mriedemi wonder if you maybe have duplicate compute nodes?21:24
mriedembut with different uuids21:24
mriedemalthough unless the hostname changed i'm not sure how that could happen since there is a unique constraint on that table for host/hypervisor_hostname21:25
mriedemmelwitt: ^ this kind of problem is why switching to counting quota usage from placement by default worries me21:26
ceryxYeah, all migration attempts were post-rocky upgrade. Each one of the resource providers does match a different compute_node that is still online; they were just past targets for the failed migrations.21:26
mriedemceryx: hmm, so maybe the scheduler created the allocations for that instance and each of those providers, but then the migration failed and we failed to cleanup the allocations created by the scheduler21:27
ceryxIn the allocations DB they were all created when we ran heal_allocations though according to the created_at date, so these aren't old allocations that were failed21:27
ceryxAll the allocations for this consumer_id across all 6 resource providers were created at 2019-04-11 20:25:5721:27
mriedemlike i said, heal_allocations should only create allocations if the instance doesn't have any and even then should only be against the same node for the instance.host/node values21:27
mriedemdo you have the output of the command?21:28
melwittmriedem: ack. the more examples I see, the more I lean toward keeping it opt-in (not the default) for now, until we have the allocations issues more sorted out21:28
dansmithalso why I want to make the image filter (and other prefilters) opt-in21:29
dansmithuntil we're sure they're 100% right for everyone21:29
* melwitt nods21:29
*** tosky has joined #openstack-nova21:29
mriedemceryx: this is the method that does the actual work per instance https://github.com/openstack/nova/blob/stable/rocky/nova/cmd/manage.py#L181121:29
mriedemso the instance has to be on a node here https://github.com/openstack/nova/blob/stable/rocky/nova/cmd/manage.py#L183821:30
mriedemthen we check to see if it already has allocations https://github.com/openstack/nova/blob/stable/rocky/nova/cmd/manage.py#L184821:30
mriedemif it doesn't, we get the compute node uuid which should match the provider https://github.com/openstack/nova/blob/stable/rocky/nova/cmd/manage.py#L188321:30
mriedemso i'm not sure why that would create allocations for the same instance against 6 different providers21:31
mriedemwhat would be more likely to me is what i said before - the scheduler created the allocations during the migration, it failed, and then we didn't clean up somewhere21:31
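The per-instance flow linked above, condensed into hedged pseudocode; get_compute_node, allocations_from_flavor and the placement client calls are illustrative names, not the actual stable/rocky code.

    def heal_allocations_for_instance(ctxt, instance, placement):
        # Skip instances that are not yet assigned to a compute node.
        if not instance.node:
            return 'skipped: no node set'
        # Skip instances that already have allocations in placement.
        if placement.get_allocations_for_consumer(ctxt, instance.uuid):
            return 'skipped: allocations already exist'
        # Otherwise PUT allocations against the instance's current compute
        # node provider, derived from instance.host/instance.node.
        node = get_compute_node(ctxt, instance.host, instance.node)
        allocs = allocations_from_flavor(instance.flavor)
        placement.put_allocations(ctxt, node.uuid, instance.uuid, allocs)
        return 'healed against provider %s' % node.uuid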
ceryxmriedem: I don't have the full output, was not expecting this many allocations to get created and it ate up all my scrollback.21:32
* mriedem curses self for not adding the --dry-run option yet21:32
ceryxWhat would the process be for cleaning up allocations that exist but shouldn't? Would it be relatively safe to delete the allocations for this one consumer_id, then rerun heal_allocations and confirm what was added back?21:32
mriedemsec21:32
mriedemmnaser has a script i think21:33
mriedemhttps://bugs.launchpad.net/nova/+bug/179356921:33
openstackLaunchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed]21:33
*** Sundar has quit IRC21:33
mriedemceryx: so that bug has a link to a script from mnaser which it looks like just dumps commands to run,21:35
mriedemand links to another tool from larsks21:35
mriedemceryx: "Would it be relatively safe to delete the allocations for this one consumer_id, then rerun heal_allocations and confirm what was added back?" - i think so, but i'd also like it better if you had a --dry-run option when doing that with heal_allocations as well,21:36
mriedemwhich i could probably whip up real quick21:36
*** slaweq has joined #openstack-nova21:37
*** wolverineav has quit IRC21:37
ceryxThat would be awesome :D21:38
*** wolverineav has joined #openstack-nova21:38
mriedemok will crank something out here21:38
imacdonn_so I seem to have a problem with Stein .. haven't fully diagnosed yet, but if I run online_data_migrations a second time, fill_virtual_interface_list fails with:21:40
imacdonn_2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 4050, in _security_group_ensure_default21:40
imacdonn_2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage     default_group = _security_group_get_by_names(context, ['default'])[0]21:40
imacdonn_2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage TypeError: 'NoneType' object has no attribute '__getitem__'21:40
imacdonn_ring any bells ?21:40
imacdonn_http://paste.openstack.org/show/73aAO3bB23d62wBjt5nL/21:42
mriedemimacdonn_: not for me21:42
* mriedem needs help from other nova people at the help desk21:42
*** wolverineav has quit IRC21:42
imacdonn_k. I'll try to dig into it a bit. Tnx.21:43
*** wolverineav has joined #openstack-nova21:46
efriedimacdonn_: Looking at that method, that should be impossible21:46
efriedoh21:47
efriedimacdonn_: Do you have more than one security group named 'default'?21:47
imacdonn_well, each project has one.....21:47
efriedIt oughtta be filtering by project ID21:48
imacdonn_I do see two rows in the security_groups table with name="default" but project_id NULL .. not sure if that should be21:49
efriedbut that's the only way that method can return None21:49
efriedmhm, that'd do it.21:49
efriedif you called with project_id NULL somehow21:49
efriedin your context21:49
efriedimacdonn_: Repeatable?21:49
*** gyee has joined #openstack-nova21:49
mriedemah,21:49
mriedemthe online_data_migratoin is using an admin context,21:50
mriedemwhich doesn't have a project_id21:50
imacdonn_I'm seeing it in two different installations - one was a fresh install, the other upgraded from Rocky21:50
*** slaweq has quit IRC21:50
efriedmriedem: but that works fine unless there's multiple default security groups with project_id NULL, yah?21:51
efriedimacdonn_: So I can WIP a patch that ought to make the problem go away; or you can manually delete the extra row from your security groups table.21:51
mriedemefried: he said he's running it twice21:52
efriedcourse it'd be nice to know how it got there21:52
melwittI really hope this isn't related to the user_id instance mapping migration somehow21:52
mriedemand blows up the 2nd time right?21:52
mriedemmelwitt: he said it was the vifs one21:52
mriedem"if I run online_data_migrations a second time, fill_virtual_interface_list fails with:"21:52
melwittyeah, but I added user_id to that, which shouldn't hurt21:52
imacdonn_yeah, it seemed to work OK the first time, but blows on subsequent attempts ... not sure where these NULL security groups are coming from21:53
imacdonn_in the fresh install case, the projects probably didn't exist when migrations were run the first time21:54
melwittok, I added the user_id for the fill virtual interface list instance mapping marker record. so shouldn't be related but just wanted to mention there was a change in stein there21:55
mriedemthe vifs migration was also new in stein22:07
*** luksky has quit IRC22:08
melwittoh. nevermind me22:09
*** igordc has quit IRC22:09
mriedemimacdonn_: w/o looking at the code i think the migration is creating a marker instance record22:10
mriedemwhich is why it's using an empty admin context with no project,22:10
mriedemit should be using a sentinel for the project_id probably22:10
melwittit uses a sentinel of all zeros uuid22:10
mriedemshould be fairly easy to reproduce that by just modifying an existing test to run the command twice22:10
melwitthttps://github.com/openstack/nova/blob/master/nova/objects/virtual_interface.py#L30322:11
mriedemmelwitt: but the context doesn't have that22:11
mriedemand that's what the db api is looking for i think22:11
melwittoh, other direction22:12
mriedemhttp://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py#n403722:12
melwittI see, ok22:13
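A quick way to see the "no project on the context" point, assuming a nova checkout is importable; nothing here is specific to the migration itself.

    from nova import context as nova_context

    ctxt = nova_context.get_admin_context()
    # The anonymous admin context carries no project, so any "ensure default
    # security group" lookup or insert done with it is keyed on project_id NULL.
    print(ctxt.project_id)  # None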
openstackgerritDustin Cowles proposed openstack/nova master: WIP/PoC: Introduces the openstacksdk to nova  https://review.openstack.org/64366422:13
mriedemimacdonn_: report a bug22:13
mriedemi'm glad we didn't backport that data migration yet...i was worried about just backporting it before anyone was using it (besides ovh) since it's pretty complicated22:13
mriedemmaciejjozefczyk: ^22:14
imacdonn_OK. Are we still using launchpad? I've been a bit out of the loop22:14
mriedemceryx: i've got this --dry-run patch coming, just building docs and running tests locally first22:14
mriedemimacdonn_: of course22:14
mriedemonly crazy projects move to SB :)22:14
imacdonn_heh ok22:14
melwittimacdonn_: yes launchpad for nova. it's placement that has moved to storyboard22:14
*** tosky has quit IRC22:15
openstackgerritMatt Riedemann proposed openstack/nova master: Add --dry-run option to heal_allocations CLI  https://review.openstack.org/65193222:16
mriedemceryx: eandersson: ^ should be backportable to rocky i think, that code hasn't changed much22:16
mriedemor run it in a container or something22:16
*** slaweq has joined #openstack-nova22:16
imacdonn_https://bugs.launchpad.net/nova/+bug/182443522:18
openstackLaunchpad bug 1824435 in OpenStack Compute (nova) "fill_virtual_interface_list migration fails on second attempt" [Undecided,New]22:18
*** chhagarw has joined #openstack-nova22:21
mriedemimacdonn_: thanks triaged - are you working a fix?22:24
imacdonn_mriedem, negative .. I don't think I understand the problem well though (yet?)22:25
*** rcernin has joined #openstack-nova22:26
efriedmriedem: I still don't get how the duplicate row is getting created.22:29
efriedShouldn't https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/db/sqlalchemy/api.py#L4037 only happen the first time?22:29
openstackgerritDustin Cowles proposed openstack/nova master: WIP/PoC: Use SDK instead of ironicclient for node.get  https://review.openstack.org/64289922:32
mriedemhmm yeah i'm not sure how https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/db/sqlalchemy/api.py#L3874 can't either return at least 1 or raise22:33
*** chhagarw has quit IRC22:35
efriedoh, that part is because there's two rows in the database with the "right" project_id (NULL)22:35
efriedmriedem: So the initial check (==) fails because there's *more* db rows than expected22:36
efriedand then the for loop doesn't hit because all the names match (because they're the same)22:36
mriedemah yup22:36
mriedemi don't know why the unique constraint doesn't blow up - because NULL isn't considered unique?22:36
efriedso the problem is that we're somehow creating two rows, and I don't know how that ....22:36
mriedemschema.UniqueConstraint('project_id', 'name', 'deleted',22:36
mriedem                                name='uniq_security_groups0project_id0'22:36
mriedem                                     'name0deleted'),22:36
efriedcalling all zzzeek?22:36
mriedemimacdonn_: what db are you using? oracle?22:37
mriedemor mysql?22:37
imacdonn_mysql22:37
mriedemhttps://stackoverflow.com/questions/3712222/does-mysql-ignore-null-values-on-unique-constraints/1654168622:38
mriedemi remember something about this when i added db2 support to nova way back when22:38
mriedemdb2 was very strict about null values in a constraint but mysql isn't22:38
mriedemour tests don't fail b/c we're using sqlite22:39
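A self-contained illustration of the point (SQLAlchemy on in-memory SQLite here; per the link above MySQL behaves the same way, while DB2 was stricter): both inserts land because NULL never compares equal to NULL, so the unique constraint never fires.

    import sqlalchemy as sa

    metadata = sa.MetaData()
    security_groups = sa.Table(
        'security_groups', metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('project_id', sa.String(255), nullable=True),
        sa.Column('name', sa.String(255)),
        sa.UniqueConstraint('project_id', 'name', name='uniq_project_id0name'),
    )

    engine = sa.create_engine('sqlite://')
    metadata.create_all(engine)

    with engine.begin() as conn:
        # Neither insert trips the unique constraint: NULL != NULL.
        conn.execute(security_groups.insert(), {'project_id': None, 'name': 'default'})
        conn.execute(security_groups.insert(), {'project_id': None, 'name': 'default'})
        print(conn.execute(security_groups.select()).fetchall())  # two rows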
efriedbutbutbut22:39
efriedunique constraint or no22:39
efriedwhat, are we hitting that NotFound from multiple threads, reliably, at the same time??22:39
*** slaweq has quit IRC22:40
mriedemi'm able to recreate the same thing in the db in my devstack http://paste.openstack.org/show/749218/22:44
mriedembut i'm not hitting errors running the online data migration22:44
mriedemi created a server and ran the migrations a few times22:44
mriedemimacdonn_: are you running the CLI concurrently or something? or able to recreate manually?22:45
imacdonn_mriedem, you mean like two instances of nova-manage at the same time? no.... I can run it manually and it fails consistently22:46
mriedemon a fresh install?22:46
imacdonn_One of these was a fresh install, but I did create an instance at some point22:47
mriedemok i'm not sure how to recreate it then22:51
mriedemceryx: eandersson: it's getting late here for the work day but please follow up with me on whatever you figure out with this allocations thing22:52
melwittdansmith: would another option other than trying to lock for discover_hosts be to try-except around the host_mapping.create() and catch and ignore DBDuplicateEntry?22:52
melwitthttps://github.com/openstack/nova/blob/master/nova/objects/host_mapping.py#L19322:54
mriedemmelwitt: here as well https://github.com/openstack/nova/blob/master/nova/objects/host_mapping.py#L21122:54
mriedemthat seems reasonable though22:54
melwittyes, two places22:54
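A rough sketch of the try/except melwitt suggests around the HostMapping create (the second spot linked above would get the same treatment); it mirrors nova.objects but is illustrative, not the actual patch.

    from oslo_db import exception as db_exc

    from nova import objects


    def _create_host_mapping(ctxt, cell_mapping, host):
        hm = objects.HostMapping(ctxt, host=host, cell_mapping=cell_mapping)
        try:
            hm.create()
        except db_exc.DBDuplicateEntry:
            # Another discover_hosts run (CLI or the scheduler periodic) won
            # the race; the mapping already exists, so just move on.
            return None
        return hm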
ceryxmriedem: Thanks, will do. eandersson is building a new container now with that patch to test.22:55
dansmithmelwitt: obviously that will avoid the trace, yeah. I'd still abort and say "looks like you're doing naughty things, so I'm stopping"22:55
mriedemceryx: unfortunately there isn't a way to just run heal_allocations on a specific instance yet, but maybe you won't get too much output22:56
melwittdansmith: hm, ok22:56
mriedemthe discover hosts periodic is in the scheduler process right?22:56
mriedemso you could have multiple schedulers running discover_hosts at the same time22:57
dansmithmriedem: yup22:57
dansmithyup22:57
mriedemso...probably not a terrible idea to just handle the duplicate and move on22:57
melwittI guess I'm not sure why it's so bad because the database will synchronize everything anyway. if there's a collision, just ignore it22:57
dansmiththat periodic was really just for the ironic case where you'd have a small control plane22:57
mriedemsure, but people are going to use knobs if we give them22:58
mriedemi'm not saying that's what osa / chef are hitting here, but it's another possibility22:58
eanderssonUnfortunately a lot of changes between master and rocky in that script22:58
mriedemeandersson: oh - probably the report client refactoring stuff...22:58
dansmithmelwitt: it's not so bad, it just seems like a bad idea to act like that's okay or expected.. we're looking for un-mapped records and adding host mappings, then marking those service records as mapped22:58
mriedemeandersson: if we had a bug or something we could think about backporting that change upstream22:59
* efried just figured out why he hasn't been receiving bug mail: yet another place where email address needed to be changed22:59
dansmithmelwitt: if you make sure none of that gets skipped (like one gets set mapped without a mapping being created, etc) then it's okay I guess22:59
mriedemheh ibm loves the email22:59
efriedpretty sure they're bouncing 'em, cause I got a snail mail letter from another thing where I hadn't changed it.23:00
dansmithmelwitt: and the scheduler periodic is a case where we're kinda inviting you to run multiple in parallel, so there's that23:00
melwittdansmith: yeah... I could see that too. at the same time, I could see two discover_hosts happening to overlap. and yeah would have to be done with care23:00
*** tkajinam has joined #openstack-nova23:00
dansmithmelwitt: but if we just skip that one and keep scanning, then you could have multiples of those just hammering your database when you bring a bunch of nodes online, all but one of them losing on every attempt23:01
melwittI was just thinking about the lock thing and wondered about ignoring dupes23:01
dansmithif they all backoff and stop, then the one that won will proceed23:01
dansmiththe lock thing is just a hack to handle the sloppy ansible case23:01
dansmithor puppet or whatever23:01
melwittyeah23:01
*** wolverineav has quit IRC23:01
dansmithsince we have the same case for other things in nova-manage,23:02
dansmithlike db_sync, online_data_migrations, etc, it seems like we should just prescribe that those are to be run singly23:02
dansmithand if not, we just need to fix it all, otherwise we're inviting confusion23:02
melwittthat's true23:02
*** wolverineav has joined #openstack-nova23:02
*** wolverineav has quit IRC23:02
*** wolverineav has joined #openstack-nova23:02
dansmithheck, archive and purge probably explode if you run them in parallel23:03
dansmithand probably map_instances23:03
melwitthaha yeah23:03
melwittso the only valid concern would be the periodics23:03
dansmithyeah, the periodic is legit, although like I said we added that for tripleo undercloud where they didn't have instrumentation to even run it,23:04
dansmithand they only have one controller node23:04
melwittand I guess that would resolve itself eventually as the periodics keep running?23:04
dansmithso we could *also* just augment the help for that and place a warning and/or make sure it just logs a warning and doesn't make too much noise in the logs23:04
dansmithyes23:04
melwittlike you fail by bad luck and then next time you'll get it23:05
dansmithright23:05
dansmithit's a slow lazy discovery anyway, so who cares, and if one fails, the other likely succeeded and mapped everything anyway23:05
melwittyeah, I definitely wanted to make docs/usage update to help with this somehow23:05
*** cdent has quit IRC23:05
dansmithjesus, is there no mystique in the art of running a nova these days?23:05
melwittyeah23:05
dansmithalways with the documentation23:06
melwittlol, mystique23:06
*** wolverineav has quit IRC23:10
*** wolverineav has joined #openstack-nova23:13
*** owalsh_ has joined #openstack-nova23:14
*** owalsh has quit IRC23:15
openstackgerritsean mooney proposed openstack/nova master: extend libvirt video model support  https://review.openstack.org/64773323:17
*** owalsh has joined #openstack-nova23:21
*** kukacz has quit IRC23:21
*** dpawlik has quit IRC23:21
*** N3l1x has quit IRC23:21
*** aspiers has quit IRC23:21
*** amorin has quit IRC23:21
*** bbobrov has quit IRC23:21
*** antonym has quit IRC23:21
*** jdillaman has quit IRC23:21
*** jlvillal has quit IRC23:21
*** owalsh_ has quit IRC23:22
openstackgerritMerged openstack/nova stable/stein: Update instance.availability_zone on revertResize  https://review.openstack.org/64840223:23
*** mgoddard has quit IRC23:24
*** sambetts_ has quit IRC23:25
*** sambetts_ has joined #openstack-nova23:25
*** mgoddard has joined #openstack-nova23:27
*** kukacz has joined #openstack-nova23:27
*** dpawlik has joined #openstack-nova23:27
*** N3l1x has joined #openstack-nova23:27
*** aspiers has joined #openstack-nova23:27
*** amorin has joined #openstack-nova23:27
*** bbobrov has joined #openstack-nova23:27
*** antonym has joined #openstack-nova23:27
*** jdillaman has joined #openstack-nova23:27
*** jlvillal has joined #openstack-nova23:27
*** owalsh_ has joined #openstack-nova23:29
*** owalsh has quit IRC23:30
*** owalsh has joined #openstack-nova23:35
*** wolverineav has quit IRC23:36
*** owalsh_ has quit IRC23:36
zzzeekefried: hey23:41
*** owalsh_ has joined #openstack-nova23:42
*** owalsh has quit IRC23:43
*** hongbin has quit IRC23:43
*** igordc has joined #openstack-nova23:44
openstackgerritMerged openstack/nova stable/stein: Temporarily mutate migration object in finish_revert_resize  https://review.openstack.org/64868823:47
*** owalsh has joined #openstack-nova23:51
*** owalsh_ has quit IRC23:52
