*** owalsh has joined #openstack-nova | 00:00 | |
*** Sundar has joined #openstack-nova | 00:06 | |
*** liuyulong has quit IRC | 00:06 | |
*** tetsuro has joined #openstack-nova | 00:12 | |
*** threestrands has joined #openstack-nova | 00:14 | |
*** rcernin has joined #openstack-nova | 00:22 | |
*** bbowen__ has quit IRC | 00:24 | |
*** betherly has joined #openstack-nova | 00:24 | |
*** betherly has quit IRC | 00:28 | |
*** Sundar has quit IRC | 00:28 | |
*** rchurch_ has joined #openstack-nova | 00:30 | |
*** mchlumsky_ has joined #openstack-nova | 00:30 | |
*** tjgresha_nope has joined #openstack-nova | 00:31 | |
*** tetsuro_ has joined #openstack-nova | 00:31 | |
*** rchurch has quit IRC | 00:32 | |
*** mchlumsky has quit IRC | 00:32 | |
*** dpawlik has quit IRC | 00:32 | |
*** tetsuro has quit IRC | 00:32 | |
*** tjgresha has quit IRC | 00:32 | |
*** Vek has quit IRC | 00:32 | |
*** kukacz_ has quit IRC | 00:32 | |
*** dpawlik has joined #openstack-nova | 00:32 | |
*** kukacz has joined #openstack-nova | 00:33 | |
openstackgerrit | Dakshina Ilangovan proposed openstack/nova-specs master: Resource Management Daemon - Base Enablement https://review.openstack.org/651130 | 00:34 |
*** mgoddard has quit IRC | 00:34 | |
*** sambetts_ has quit IRC | 00:35 | |
*** mgoddard has joined #openstack-nova | 00:37 | |
*** sambetts_ has joined #openstack-nova | 00:37 | |
*** tbachman has quit IRC | 00:43 | |
*** tbachman has joined #openstack-nova | 00:46 | |
*** wolverineav has joined #openstack-nova | 00:52 | |
*** wolverineav has quit IRC | 00:54 | |
*** wolverineav has joined #openstack-nova | 00:55 | |
*** rcernin has quit IRC | 00:59 | |
*** wolverineav has quit IRC | 01:00 | |
*** bbowen__ has joined #openstack-nova | 01:08 | |
*** rcernin has joined #openstack-nova | 01:09 | |
*** gyee has quit IRC | 01:10 | |
*** betherly has joined #openstack-nova | 01:33 | |
*** lbragstad has quit IRC | 01:37 | |
*** betherly has quit IRC | 01:37 | |
openstackgerrit | ya.wang proposed openstack/nova-specs master: Expose auto converge and post copy https://review.openstack.org/651681 | 01:39 |
*** betherly has joined #openstack-nova | 01:54 | |
*** betherly has quit IRC | 01:58 | |
*** bhagyashris has joined #openstack-nova | 02:01 | |
*** dave-mccowan has quit IRC | 02:01 | |
*** zhongjun2_ has joined #openstack-nova | 02:05 | |
*** zhongjun2_ has quit IRC | 02:07 | |
*** zhongjun2_ has joined #openstack-nova | 02:07 | |
*** zhongjun2_ is now known as zhongjun2 | 02:11 | |
*** awaugama has quit IRC | 02:13 | |
*** betherly has joined #openstack-nova | 02:15 | |
*** betherly has quit IRC | 02:19 | |
*** wolverineav has joined #openstack-nova | 02:24 | |
*** wolverineav has quit IRC | 02:41 | |
gmann | sean-k-mooney: artom : we are good on 648123 from microversion point of view. It is not breaking any API contract. Commented inline. https://review.openstack.org/#/c/648123/4 | 02:42 |
gmann | the tempest schema test is passing, which makes sure the API contract is verified | 02:42 |
*** betherly has joined #openstack-nova | 02:46 | |
*** igordc has quit IRC | 02:47 | |
*** betherly has quit IRC | 02:50 | |
*** bhagyashris has quit IRC | 02:56 | |
*** nicolasbock has quit IRC | 03:03 | |
*** hongbin has joined #openstack-nova | 03:03 | |
*** cfriesen has quit IRC | 03:07 | |
*** psachin has joined #openstack-nova | 03:09 | |
openstackgerrit | Boxiang Zhu proposed openstack/nova-specs master: Add host and hypervisor_hostname flag to create server https://review.openstack.org/645458 | 03:11 |
*** wolverineav has joined #openstack-nova | 03:13 | |
*** betherly has joined #openstack-nova | 03:17 | |
*** betherly has quit IRC | 03:22 | |
*** wolverineav has quit IRC | 03:43 | |
*** betherly has joined #openstack-nova | 03:48 | |
*** betherly has quit IRC | 03:53 | |
*** whoami-rajat has joined #openstack-nova | 03:54 | |
openstackgerrit | Merged openstack/nova master: Add test coverage for nova.privsep.libvirt. https://review.openstack.org/648616 | 03:54 |
*** Kevin_Zheng has joined #openstack-nova | 03:57 | |
*** markvoelker has joined #openstack-nova | 04:02 | |
*** imacdonn_ has quit IRC | 04:03 | |
*** imacdonn_ has joined #openstack-nova | 04:04 | |
*** hongbin has quit IRC | 04:05 | |
openstackgerrit | Merged openstack/nova master: Add test coverage for nova.privsep.qemu. https://review.openstack.org/649191 | 04:06 |
*** ricolin has joined #openstack-nova | 04:10 | |
openstackgerrit | Merged openstack/nova stable/stein: Do not persist RequestSpec.ignore_hosts https://review.openstack.org/649320 | 04:16 |
*** betherly has joined #openstack-nova | 04:30 | |
*** betherly has quit IRC | 04:34 | |
*** markvoelker has quit IRC | 04:36 | |
*** chhagarw has joined #openstack-nova | 04:38 | |
*** ratailor has joined #openstack-nova | 04:59 | |
*** betherly has joined #openstack-nova | 05:01 | |
*** betherly has quit IRC | 05:06 | |
*** sidx64 has joined #openstack-nova | 05:12 | |
*** rambo_li has joined #openstack-nova | 05:13 | |
*** sidx64 has quit IRC | 05:26 | |
*** bhagyashris has joined #openstack-nova | 05:31 | |
*** betherly has joined #openstack-nova | 05:32 | |
*** markvoelker has joined #openstack-nova | 05:33 | |
*** betherly has quit IRC | 05:37 | |
*** sidx64 has joined #openstack-nova | 05:38 | |
*** Luzi has joined #openstack-nova | 05:41 | |
*** wolverineav has joined #openstack-nova | 05:44 | |
*** awalende has joined #openstack-nova | 05:48 | |
*** wolverineav has quit IRC | 05:48 | |
*** jaypipes has quit IRC | 05:50 | |
*** jaypipes has joined #openstack-nova | 05:50 | |
*** betherly has joined #openstack-nova | 05:53 | |
*** awalende has quit IRC | 05:53 | |
*** awalende has joined #openstack-nova | 05:54 | |
*** gouthamr has quit IRC | 05:56 | |
*** betherly has quit IRC | 05:57 | |
*** gouthamr has joined #openstack-nova | 06:00 | |
*** udesale has joined #openstack-nova | 06:03 | |
*** markvoelker has quit IRC | 06:06 | |
*** tetsuro_ has quit IRC | 06:09 | |
*** tetsuro has joined #openstack-nova | 06:09 | |
*** chhagarw has quit IRC | 06:13 | |
*** whoami-rajat has quit IRC | 06:13 | |
*** chhagarw has joined #openstack-nova | 06:14 | |
*** sridharg has joined #openstack-nova | 06:21 | |
*** sidx64 has quit IRC | 06:23 | |
*** awalende has quit IRC | 06:24 | |
*** whoami-rajat has joined #openstack-nova | 06:25 | |
*** sidx64 has joined #openstack-nova | 06:27 | |
*** awalende has joined #openstack-nova | 06:33 | |
*** chhagarw has quit IRC | 06:39 | |
*** chhagarw has joined #openstack-nova | 06:39 | |
*** phasespace has quit IRC | 06:40 | |
*** betherly has joined #openstack-nova | 06:45 | |
*** betherly has quit IRC | 06:49 | |
*** obre has quit IRC | 06:52 | |
*** obre has joined #openstack-nova | 06:52 | |
*** ivve has joined #openstack-nova | 06:52 | |
*** ccamacho has joined #openstack-nova | 07:00 | |
*** markvoelker has joined #openstack-nova | 07:03 | |
*** boxiang has quit IRC | 07:03 | |
*** slaweq has joined #openstack-nova | 07:03 | |
*** betherly has joined #openstack-nova | 07:05 | |
*** rpittau|afk is now known as rpittau | 07:06 | |
*** luksky has joined #openstack-nova | 07:07 | |
*** betherly has quit IRC | 07:11 | |
*** awalende has quit IRC | 07:13 | |
*** betherly has joined #openstack-nova | 07:26 | |
*** mvkr has quit IRC | 07:27 | |
*** threestrands has quit IRC | 07:30 | |
*** betherly has quit IRC | 07:31 | |
*** markvoelker has quit IRC | 07:36 | |
*** tosky has joined #openstack-nova | 07:39 | |
*** tssurya has joined #openstack-nova | 07:40 | |
*** ralonsoh has joined #openstack-nova | 07:42 | |
*** ralonsoh has quit IRC | 07:42 | |
*** ralonsoh has joined #openstack-nova | 07:43 | |
*** betherly has joined #openstack-nova | 07:45 | |
*** maciejjozefczyk has left #openstack-nova | 07:49 | |
*** betherly has quit IRC | 07:50 | |
*** brinzhang has joined #openstack-nova | 07:55 | |
*** phasespace has joined #openstack-nova | 07:56 | |
*** ttsiouts has joined #openstack-nova | 08:04 | |
*** betherly has joined #openstack-nova | 08:06 | |
*** betherly has quit IRC | 08:10 | |
*** maciejjozefczyk has joined #openstack-nova | 08:10 | |
*** ttsiouts has quit IRC | 08:15 | |
*** ttsiouts has joined #openstack-nova | 08:16 | |
*** betherly has joined #openstack-nova | 08:19 | |
*** owalsh has quit IRC | 08:19 | |
*** ttsiouts has quit IRC | 08:20 | |
*** ttsiouts has joined #openstack-nova | 08:22 | |
*** priteau has joined #openstack-nova | 08:23 | |
*** betherly has quit IRC | 08:23 | |
*** davidsha has joined #openstack-nova | 08:29 | |
*** mdbooth has quit IRC | 08:30 | |
*** markvoelker has joined #openstack-nova | 08:34 | |
*** owalsh has joined #openstack-nova | 08:34 | |
*** derekh has joined #openstack-nova | 08:34 | |
*** dtantsur|afk is now known as dtantsur | 08:35 | |
*** betherly has joined #openstack-nova | 08:39 | |
*** betherly has quit IRC | 08:44 | |
*** tkajinam has quit IRC | 08:58 | |
*** mdbooth has joined #openstack-nova | 08:58 | |
*** wolverineav has joined #openstack-nova | 09:00 | |
*** cdent has joined #openstack-nova | 09:00 | |
*** sidx64 has quit IRC | 09:02 | |
openstackgerrit | Theodoros Tsioutsias proposed openstack/nova-specs master: Add PENDING vm state https://review.openstack.org/648687 | 09:02 |
*** wolverineav has quit IRC | 09:05 | |
*** sidx64 has joined #openstack-nova | 09:05 | |
*** markvoelker has quit IRC | 09:07 | |
*** sidx64 has quit IRC | 09:12 | |
*** mvkr has joined #openstack-nova | 09:30 | |
*** sidx64 has joined #openstack-nova | 09:34 | |
*** rambo_li has quit IRC | 09:35 | |
*** betherly has joined #openstack-nova | 09:41 | |
*** luksky has quit IRC | 09:42 | |
*** ttsiouts has quit IRC | 09:44 | |
*** ttsiouts has joined #openstack-nova | 09:44 | |
*** betherly has quit IRC | 09:46 | |
*** ttsiouts has quit IRC | 09:49 | |
*** bhagyashris has quit IRC | 09:50 | |
*** sidx64 has quit IRC | 09:52 | |
*** boxiang has joined #openstack-nova | 09:54 | |
*** markvoelker has joined #openstack-nova | 10:04 | |
*** ratailor_ has joined #openstack-nova | 10:08 | |
*** ratailor has quit IRC | 10:11 | |
*** luksky has joined #openstack-nova | 10:18 | |
openstackgerrit | Chris Dent proposed openstack/nova master: Use update_provider_tree in vmware virt driver https://review.openstack.org/651615 | 10:20 |
*** sidx64 has joined #openstack-nova | 10:21 | |
*** bbowen__ has quit IRC | 10:23 | |
*** lpetrut has joined #openstack-nova | 10:27 | |
openstackgerrit | Chris Dent proposed openstack/nova master: Delete the placement code https://review.openstack.org/618215 | 10:29 |
*** mvkr has quit IRC | 10:34 | |
*** markvoelker has quit IRC | 10:36 | |
openstackgerrit | Merged openstack/nova-specs master: Spec: Use in_tree getting allocation candidates https://review.openstack.org/646029 | 10:39 |
*** ratailor__ has joined #openstack-nova | 10:45 | |
*** nicolasbock has joined #openstack-nova | 10:45 | |
*** sapd1_x has joined #openstack-nova | 10:47 | |
*** ratailor_ has quit IRC | 10:48 | |
*** mvkr has joined #openstack-nova | 10:48 | |
*** tbachman has quit IRC | 10:54 | |
*** udesale has quit IRC | 10:57 | |
*** francoisp_ has quit IRC | 10:58 | |
*** priteau has quit IRC | 10:59 | |
*** mvkr has quit IRC | 11:04 | |
*** mvkr has joined #openstack-nova | 11:05 | |
*** sidx64 has quit IRC | 11:08 | |
*** sidx64 has joined #openstack-nova | 11:11 | |
*** ttsiouts has joined #openstack-nova | 11:13 | |
*** sidx64 has quit IRC | 11:18 | |
*** sidx64 has joined #openstack-nova | 11:20 | |
*** bbowen has joined #openstack-nova | 11:33 | |
*** ricolin has quit IRC | 11:37 | |
*** dave-mccowan has joined #openstack-nova | 11:38 | |
*** sapd1_x has quit IRC | 11:38 | |
*** ttsiouts has quit IRC | 11:40 | |
*** ttsiouts has joined #openstack-nova | 11:40 | |
*** yan0s has joined #openstack-nova | 11:42 | |
*** mvkr has quit IRC | 11:44 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead nova.db functions https://review.openstack.org/649570 | 11:46 |
stephenfin | gibi: Wanna send this cleanup patch on its way? https://review.openstack.org/#/c/650018/ | 11:48 |
*** eharney has joined #openstack-nova | 11:49 | |
*** nicolasbock has quit IRC | 11:53 | |
*** mvkr has joined #openstack-nova | 11:56 | |
gibi | stephenfin: done | 11:58 |
stephenfin | ta | 11:58 |
*** nicolasbock has joined #openstack-nova | 12:00 | |
*** pchavva has joined #openstack-nova | 12:00 | |
artom | gmann, cheers! (if you're still awake/around) | 12:03 |
*** tbachman has joined #openstack-nova | 12:11 | |
*** francoisp_ has joined #openstack-nova | 12:13 | |
*** brinzhang has quit IRC | 12:23 | |
*** rambo_li has joined #openstack-nova | 12:34 | |
*** lbragstad has joined #openstack-nova | 12:36 | |
cdent | aspiers: have you seen http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004779.html ? | 12:39 |
aspiers | cdent: yes, it's on my TODO list :) | 12:39 |
aspiers | still jetlagged from SUSECON | 12:39 |
cdent | roger, thanks | 12:39 |
aspiers | I saw that efried felt OK with the status quo though | 12:40 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Add in_tree field to RequestGroup object https://review.openstack.org/649534 | 12:40 |
aspiers | I'm inclined to agree that "don't do that" is a reasonable stance | 12:40 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Add get_compute_nodes_by_host_or_node() https://review.openstack.org/650877 | 12:40 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Pass target host to RequestGroup.in_tree https://review.openstack.org/650878 | 12:40 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Query `in_tree` to placement https://review.openstack.org/649535 | 12:40 |
sean-k-mooney | /query cdent | 12:40 |
cdent | ? | 12:41 |
sean-k-mooney | i was going to pm you and had an extra space | 12:41 |
sean-k-mooney | :) | 12:41 |
aspiers | lucky you noticed before you said something rude in public ;-) | 12:42 |
*** tetsuro has quit IRC | 12:46 | |
*** ttsiouts has quit IRC | 12:50 | |
*** ttsiouts has joined #openstack-nova | 12:51 | |
*** ratailor__ has quit IRC | 12:55 | |
*** ttsiouts has quit IRC | 12:55 | |
*** ttsiouts has joined #openstack-nova | 12:57 | |
*** udesale has joined #openstack-nova | 13:00 | |
*** wolverineav has joined #openstack-nova | 13:01 | |
*** wolverineav has quit IRC | 13:05 | |
*** sidx64 has quit IRC | 13:07 | |
*** rambo_li has quit IRC | 13:08 | |
*** jmlowe has quit IRC | 13:08 | |
*** rcernin has quit IRC | 13:18 | |
*** pcaruana has quit IRC | 13:20 | |
*** mlavalle has joined #openstack-nova | 13:24 | |
*** mlavalle has quit IRC | 13:25 | |
*** mlavalle has joined #openstack-nova | 13:27 | |
*** tbachman has quit IRC | 13:28 | |
*** mriedem has joined #openstack-nova | 13:28 | |
*** psachin has quit IRC | 13:30 | |
mnaser | is the nova archive tool not hitting cell0 a decision by design? | 13:31 |
dansmith | mnaser: do you mean archive_deleted_rows? | 13:32 |
mnaser | yeah | 13:32 |
dansmith | isn't there an all cells flag for that? | 13:32 |
mnaser | that was an unmerged patch afaik | 13:32 |
*** tbachman has joined #openstack-nova | 13:32 | |
dansmith | ah, okay, well, then yes by design? :) | 13:32 |
openstackgerrit | Boxiang Zhu proposed openstack/nova master: Make evacuation respects anti-affinity rule https://review.openstack.org/649963 | 13:32 |
dansmith | run it against a config pointing at cell0 | 13:32 |
mnaser | https://review.openstack.org/#/c/587858/ | 13:33 |
mnaser | err well | 13:33 |
mnaser | looks like that was a dup of https://review.openstack.org/#/c/507486/ | 13:33 |
mnaser | which seems to have stalled out | 13:33 |
*** ttsiouts has quit IRC | 13:35 | |
*** ttsiouts has joined #openstack-nova | 13:35 | |
dansmith | mnaser: you can run purge --all-cells --before | 13:36 |
dansmith | oh wait, | 13:36 |
*** ttsiouts has quit IRC | 13:36 | |
dansmith | I was thinking that did an archive first but it does not, nevermind | 13:36 |
dansmith | it's not like I wrote that... | 13:36 |
*** ttsiouts has joined #openstack-nova | 13:36 | |
mnaser | it's early :p | 13:36 |
dansmith | yeah, YEAH. | 13:36 |
*** jmlowe has joined #openstack-nova | 13:37 | |
*** phasespace has quit IRC | 13:37 | |
mnaser | anyways, I rechecked that patch and I can iterate here and there to get it to land eventually | 13:39 |
mnaser | though probably not at a fast pace | 13:39 |
*** pcaruana has joined #openstack-nova | 13:42 | |
*** tbachman has quit IRC | 13:47 | |
*** tbachman has joined #openstack-nova | 13:51 | |
*** tetsuro has joined #openstack-nova | 13:52 | |
stephenfin | dansmith, jaypipes, cdent, bauzas: Reworking the cpu-resources spec at the moment. It feels like the general preference is to have a hard break between the current behavior and the new placement-based flow, right? | 13:54 |
stephenfin | as in only newly deployed compute nodes would support the new behavior | 13:54 |
bauzas | stephenfin: well, my opinion would be to not have a modification if you have the same options | 13:54 |
sean-k-mooney | ? | 13:55 |
*** ricolin has joined #openstack-nova | 13:55 | |
bauzas | but maybe however for CONF.vcpu_pin_set | 13:55 |
cdent | stephenfin: I'd need to load that context back in before being able to say something useful, so will defer to others for now | 13:55 |
stephenfin | sean-k-mooney: Pretty much this https://review.openstack.org/#/c/555081/22/specs/train/approved/cpu-resources.rst@725 | 13:55 |
bauzas | stephenfin: so, see my opinion : | 13:56 |
dansmith | stephenfin: no, I think that's the opposite of what we want | 13:56 |
bauzas | - no modifications for the same options except CONF.vcpu_pin_set | 13:56 |
sean-k-mooney | well i really wish we did not conflate vCPU with floating | 13:56 |
sean-k-mooney | or with pinned | 13:57 |
sean-k-mooney | it has a different meaning than either | 13:57 |
*** awaugama has joined #openstack-nova | 13:57 | |
sean-k-mooney | anyway personally i would like to have two parallel implementations in train: freeze the current one and implement a parallel placement-native one and then switch in U or later | 13:58 |
stephenfin | dansmith: It is? What is the preference? | 13:58 |
stephenfin | sean-k-mooney: Yeah, that's pretty much where my head was going | 13:58 |
stephenfin | as for conflating VCPU with floating, I'm not sure what other term we could use | 13:59 |
sean-k-mooney | a vCPU can be floating or pinned | 13:59 |
dansmith | stephenfin: I think I said on the review that I was opposed to us solving the problem of accounting by making people move their guests around between computes that are counting the old way and new way | 13:59 |
dansmith | stephenfin: so having compute nodes never transition to the new way (without being cleaned off first) is not okay | 14:00 |
mriedem | melwitt: just fyi that i replied to some of your replies in the quotas from placement change https://review.openstack.org/#/c/638073/2 | 14:00 |
mriedem | https://review.openstack.org/#/c/638073/ | 14:00 |
sean-k-mooney | i dont like using the VCPU resource class in placement to mean just floating as it has a different meaning than the VCPU in the flavor | 14:00 |
mriedem | if you're working on updates | 14:00 |
stephenfin | sean-k-mooney: We're eventually going to kill vcpus in the flavor though | 14:01 |
sean-k-mooney | dansmith: ok so you are asking for an in-place reshape or other mechanism | 14:01 |
jaypipes | sean-k-mooney: I think I'm pretty clear in the glossary of that spec. | 14:01 |
sean-k-mooney | stephenfin: im not really a fan of that but we could | 14:01 |
sean-k-mooney | jaypipes: yes i know what it says in the spec | 14:01 |
mriedem | mnaser: dansmith already said this but you can archive cell0 by just running it against a nova.conf with [database]/connection pointed at cell0 | 14:02 |
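A minimal sketch of the cell0 archiving flow mriedem describes, assuming the conventional nova_cell0 database name and a hypothetical /etc/nova/nova-cell0.conf copy of nova.conf; host and credentials are placeholders:

    # /etc/nova/nova-cell0.conf -- same as nova.conf except the DB connection
    [database]
    connection = mysql+pymysql://nova:PASSWORD@controller/nova_cell0

    # then run the archive command against that config
    nova-manage --config-file /etc/nova/nova-cell0.conf db archive_deleted_rows --until-complete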
jaypipes | sean-k-mooney: that glossary clearly delineates between guest vCPU threads, shared host CPUs (floating CPUs) and dedicated host CPUs (pinned to a guest vCPU thread) | 14:02 |
stephenfin | sean-k-mooney: Yeah, we won't need it once we're modelling this stuff in placement. That'd be a future work item though | 14:02 |
stephenfin | sean-k-mooney: and VCPU and PCPU are far more succinct than VCPU_SHARED and VCPU_DEDICATED, even if we're overloading the term vCPU | 14:02 |
dansmith | stephenfin: we are? (going to kill vcpu in the flavor) ? | 14:02 |
sean-k-mooney | dansmith: see i didnt think we were | 14:03 |
dansmith | sean-k-mooney: me either :) | 14:03 |
stephenfin | dansmith: https://review.openstack.org/#/c/555081/22/specs/train/approved/cpu-resources.rst@727 | 14:03 |
mriedem | killing vcpu in the flavor will be a giant change | 14:03 |
*** cfriesen has joined #openstack-nova | 14:04 | |
dansmith | stephenfin: you mean adjusting flavor vcpus down to zero for purely physically-pinned instances | 14:04 |
sean-k-mooney | stephenfin: ya that was one of the parts i disliked about the current spec but i could live with it if we needed to | 14:04 |
sean-k-mooney | stephenfin: its a fairly major api breakage | 14:04 |
mriedem | is this a known gate breakage and i'm just late to the party? http://logs.openstack.org/50/651650/2/check/openstack-tox-cover/ebe055d/job-output.txt.gz#_2019-04-10_21_35_25_171727 | 14:05 |
sean-k-mooney | any system that used to inspect the flavor vcpu field would now have to inspect the resources dict | 14:05 |
jaypipes | dansmith: yes. | 14:05 |
dansmith | maybe we need a hangout | 14:05 |
mriedem | b'migrate.exceptions.ScriptError: You can only have one Python script per version, but you have: /home/zuul/src/git.openstack.org/openstack/nova/nova/db/sqlalchemy/migrate_repo/versions/393_add_instances_hidden.py and /home/zuul/src/git.openstack.org/openstack/nova/nova/db/sqlalchemy/migrate_repo/versions/393_placeholder.py' | 14:05 |
stephenfin | dansmith: Initially, yeah, but at some point that entire field could go. I'm trying to find the place we discussed this in the spec previously (this is a big review) | 14:05 |
mriedem | oh nvm i need to rebase | 14:05 |
dansmith | stephenfin: I don't agree with you :) | 14:05 |
sean-k-mooney | stephenfin: i think it would be better to leave the vcpu field in the flavor as the total number of cpus and set the VCPU resource class request to 0 instead in that instance | 14:06 |
mnaser | wait, why would we kill the cpu field | 14:06 |
stephenfin | dansmith: Then it's a good thing this isn't actually a stated goal of this particular spec | 14:06 |
sean-k-mooney | it does not break clients and achieves the same goal | 14:06 |
stephenfin | Let's forget about that and move back to the original question of handling upgrades | 14:06 |
jaypipes | mnaser: because there's two different actual resource classes: dedicated (pinned) CPU and shared CPU. | 14:07 |
sean-k-mooney | i dont think having parallel implementations and in-place updates are mutually exclusive | 14:07 |
stephenfin | I've no idea how we can make resource claims for existing instances that currently don't have anything claimed | 14:07 |
jaypipes | mnaser: and the ugliness of our NUMA and pinning code has borked how we think of the CPU resources. | 14:07 |
sean-k-mooney | stephenfin: they do have claims but they are all of resource class VCPU | 14:08 |
stephenfin | yeah, so moving those from VCPU to PCPU | 14:08 |
jaypipes | stephenfin: by looking at the flavor and image extra specs. | 14:08 |
sean-k-mooney | stephenfin: so we need to modify their existing claims in place as part of the reshape | 14:08 |
stephenfin | the migration is going to be hell | 14:08 |
jaypipes | yuuup. | 14:08 |
jaypipes | always is. | 14:08 |
sean-k-mooney | will it be any worse than the vgpu reshape | 14:08 |
stephenfin | we're going to handle the stupid stuff that can happen now, like shared and dedicated instances being on the same host | 14:08 |
jaypipes | as I've mentioned before, upgrade path is about 95% of the code and effort. | 14:09 |
mriedem | i thought you couldn't have shared and dedicated on the same host today, or is that a 'recommendation' but not enforced | 14:09 |
mriedem | and we just hope people use aggregates for sanity? | 14:09 |
sean-k-mooney | stephenfin: by the way today the vms cant mix pinned and floating instances so the only things we have to migrate are the pinned instances and then we just need to change the resource class | 14:09 |
stephenfin | mriedem: yeah, the latter | 14:09 |
jaypipes | mriedem: recommendation. there is no way to enforce it. | 14:09 |
stephenfin | mriedem: It's all over the docs but who reads those | 14:10 |
sean-k-mooney | windriver have some downstream-only hacks to make mixed stuff work | 14:10 |
sean-k-mooney | or i guess i should say starlingx | 14:10 |
mriedem | i know they have their crazy floating vcpu stuff in starlingx | 14:10 |
sean-k-mooney | but we should not port that | 14:10 |
mriedem | i'm not sure there is no way we could've enforced it, | 14:11 |
stephenfin | sean-k-mooney: You mean you can't have pinned and floating instances on the same host? If so, you know that's not true | 14:11 |
* jaypipes steps out of time machine. yep, I thought this looked like 2 years ago... | 14:11 | |
sean-k-mooney | ya so they were only able to do that because we dont enforce that | 14:11 |
mriedem | e.g. if cpu_allocation_ratio=1.0, you have to have dedicated cpus | 14:11 |
stephenfin | Yeah, we don't so we have to handle that | 14:11 |
sean-k-mooney | stephenfin: no you can but you shouldnt | 14:11 |
sean-k-mooney | without the starlingx hacks | 14:11 |
stephenfin | and because you can, we're going to have to handle that | 14:12 |
jaypipes | mriedem: cpu_allocation_ratio=1.0 has nothing to do with pinned CPUs. | 14:12 |
dansmith | well, | 14:12 |
jaypipes | (which is part of the problem) | 14:12 |
dansmith | it's all over this spec | 14:12 |
stephenfin | hence my inclination towards draining hosts and moving them to other, newly configured hosts | 14:12 |
stephenfin | *moving the instances | 14:12 |
mriedem | my point was we could have used that as indication a host can only have dedicated cpu guests | 14:12 |
mriedem | but since we didn't do that, yeah we could have mixed on the same host i guess and have to deal wit it | 14:12 |
sean-k-mooney | cpu_allocation_ratio=1.0 just disables oversubscription unfortunately | 14:12 |
mriedem | *with | 14:12 |
dansmith | I'm -2 on making people move instances to update counting numbers in placement | 14:12 |
stephenfin | mriedem: yeah, cpu_allocation_ratio is ignored for pinned instances | 14:13 |
jaypipes | dansmith: especially when those instances are pets and pandas. all of them. | 14:13 |
sean-k-mooney | anyway im personally not too worried about fixing the allocations. | 14:13 |
dansmith | aye | 14:13 |
sean-k-mooney | i think we can do that | 14:13 |
stephenfin | so I'd imagine it's set to 16.0 for most deployments, regardless of the workload | 14:14 |
cdent | dansmith: I remain confused about why it isn't okay for those instance to remain defined/allocation in the "old way"? | 14:14 |
sean-k-mooney | stephenfin: it depends on the deployment tool | 14:14 |
stephenfin | yup | 14:14 |
dansmith | cdent: I don't think it is | 14:14 |
bauzas | sorry folks, I had to go AFK | 14:14 |
dansmith | cdent: we have to reshape | 14:14 |
cdent | dansmith: I hear you, I'm asking "why?" | 14:14 |
bauzas | dansmith: my point is that I think we only need to reshape for CONF.vcpu_pin_set | 14:15 |
bauzas | for the other options, we don't need it | 14:15 |
sean-k-mooney | vcpu_pin_set is used for host with shared cpus too | 14:15 |
bauzas | stephenfin: ^ | 14:15 |
dansmith | cdent: why not just leave them with the old accounting for five years? | 14:15 |
bauzas | sean-k-mooney: I know, that's why we need to reshape | 14:15 |
cdent | if that's how long they live, sure | 14:15 |
bauzas | but only for this option | 14:15 |
stephenfin | sean-k-mooney: Is it? I thought that was totally ignored unless you'd a NUMA topology | 14:16 |
*** hongbin has joined #openstack-nova | 14:16 | |
sean-k-mooney | yes | 14:16 |
sean-k-mooney | but you can have numa without pinning | 14:16 |
cdent | dansmith: I'm not suggesting it, I'm asking why it is not okay. | 14:16 |
sean-k-mooney | and actully no | 14:16 |
stephenfin | Instead you've to use that reserved_cpus option (or whatever it's called) | 14:16 |
dansmith | cdent: because it means new instances scheduled to compute nodes that don't have proper accounting would also have to be accounted the old way? | 14:16 |
stephenfin | cdent: Yeah, that's what I was thinking to ^ | 14:16 |
stephenfin | *too | 14:16 |
mriedem | cdent: by "remain defined/allocation in the "old way"?" do you mean reporting VCPU allocations to placement rather than PCPU? | 14:16 |
sean-k-mooney | if you have no numa topology the number of enabled cores in vcpu_pin_set is still used to determine the number of cores reported to the resource tracker and therefore to placement | 14:17 |
cdent | dansmith: make them end up on other nodes? | 14:17 |
stephenfin | We can't schedule new instances to that host until we know how many PCPUs are actually in use there | 14:17 |
dansmith | cdent: so I have to waste the capacity on those nodes until long-lived instances die? | 14:17 |
cdent | mriedem: yes, if that's how they were booted in the first place, how/why should they change | 14:17 |
dansmith | right, what stephenfin said | 14:17 |
cdent | dansmith: yes! | 14:17 |
dansmith | cdent: um, no | 14:17 |
sean-k-mooney | stephenfin: the advice that many gave was never to use reserved_cpus and always use vcpu_pin_set instead | 14:17 |
mnaser | that's a terrible idea | 14:17 |
mnaser | tbh with my operators 2 cents: I'm not ok with moving around all my instances around to magically reshape things | 14:18 |
dansmith | this ^ | 14:18 |
mriedem | since i just went through mel's change to counting usage from placement, your quotas would be all out of whack too | 14:18 |
cdent | I'm not suggesting that people migrate their instances | 14:18 |
mnaser | and I'm not okay with my capacity sitting empty because this is $$$$ | 14:18 |
stephenfin | sean-k-mooney: Again, advice that wasn't enforced anywhere and therefore not something we can rely on :( | 14:18 |
dansmith | moving gigs and gigs of live instances so that we can adjust integers in placement is INSANE | 14:18 |
sean-k-mooney | the reason is vcpu_pin_set takes effect before the cpu_allocation_ratio is applied and reserved_cpus happens after | 14:18 |
cdent | meh | 14:18 |
cdent | I never suggested anybody do any migrations | 14:18 |
dansmith | reserving capacity until a five-year instance goes away so we can update integers in placement is INSANE | 14:18 |
cdent | Let stuff live out its lifecycle | 14:18 |
stephenfin | cdent: Yup, that's all me. Sorry :) | 14:18 |
mnaser | that's not possible though, that's the thing | 14:18 |
mnaser | I have no control over my environment | 14:19 |
*** lpetrut has quit IRC | 14:19 | |
cdent | dansmith: I'm not concerned about this from a placement standpoint: abuse placement all we want, it'll take it | 14:19 |
cdent | I'm concerned about it from a magical recognition happening on the compute node | 14:19 |
dansmith | cdent: yeah, this isn't a placement concern, it's a nova concern | 14:19 |
stephenfin | OK, so we're rewriting flavors and moving allocations around to switch everything to the new system. If that's the case, we're back to trying to think of all the edge cases that exist | 14:20 |
stephenfin | and there are many. Many many | 14:20 |
sean-k-mooney | stephenfin: we can support the new flow without requiring the flavor to be modified | 14:20 |
dansmith | sean-k-mooney: he means the instance's flavor I think | 14:20 |
stephenfin | sean-k-mooney: via a shim, I guess? | 14:20 |
stephenfin | dansmith: I do. The embedded one | 14:20 |
sean-k-mooney | stephenfin: today we generate the placement request via the request spec | 14:21 |
dansmith | stephenfin: you can write a nova-status check that verifies that all the instances can be fit to whatever minimally simplified scheme you support, | 14:21 |
* mriedem thinks about how we haven't dropped the ironic flavor migration code from pike yet | 14:21 | |
dansmith | to warn people before they upgrade with complex instances that can't be fixed or something | 14:21 |
sean-k-mooney | we convert the VCPU element in the flavor into a VCPU resource request | 14:21 |
sean-k-mooney | that code can take account of the hw:cpu_policy extra spec and just ask for PCPU resources instead | 14:22 |
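A minimal Python sketch of the translation sean-k-mooney describes; the function name and structure here are illustrative only, not nova's actual request-spec code:

    def cpu_resources_from_flavor(flavor):
        """Map a flavor's vCPU count to a placement resource-class request."""
        extra_specs = flavor.get('extra_specs', {})
        if extra_specs.get('hw:cpu_policy') == 'dedicated':
            # pinned guests would ask placement for dedicated host CPUs
            return {'PCPU': flavor['vcpus']}
        # everything else keeps asking for shared/floating CPUs
        return {'VCPU': flavor['vcpus']}

    # e.g. {'vcpus': 4, 'extra_specs': {'hw:cpu_policy': 'dedicated'}} -> {'PCPU': 4}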
stephenfin | dansmith: That's going to be a big check, fair warning :) | 14:22 |
dansmith | stephenfin: I'm just throwing ideas | 14:22 |
stephenfin | Yup, and good ones too | 14:22 |
*** awalende has joined #openstack-nova | 14:22 | |
dansmith | stephenfin: migrating everything, deleting everything, not upgrading until instances age out -- all not options, IMHO | 14:22 |
bauzas | dansmith: mnaser: okay, sorry, that's me who proposed migrating instances, and it was a terrible idea, I reckon | 14:22 |
bauzas | so we should stop thinking about this possibility | 14:23 |
mnaser | bauzas: all good :) ideas are good to bring up anyways | 14:23 |
bauzas | but, then, we want to just make sure that when creating a new RC, we also look at the existing capacity | 14:23 |
*** jobewan has joined #openstack-nova | 14:23 | |
stephenfin | So supporting instances with just 'hw:cpu_policy=dedicated' in a deployment that has used aggregates as we suggest seems pretty easy | 14:24 |
bauzas | see the example I provided : https://review.openstack.org/#/c/555081/22/specs/train/approved/cpu-resources.rst@181 | 14:24 |
bauzas | stephenfin: ^ | 14:24 |
*** jobewan has quit IRC | 14:24 | |
bauzas | if we change the capacity for VGPU, then it could be a problem | 14:24 |
dansmith | stephenfin: another thing I'd support is converting instances to allocations that are maybe overly conservative.. like if you need to reserve more resources than they really have to make the math work out, that seems like a potential compromise | 14:24 |
*** Luzi has quit IRC | 14:24 | |
bauzas | "I'll try to clarify my thoughts with an upgrade example on a host with 8 physical CPUs, named CPU0 to 7:in Stein, instances are running and actively taking VCPUs that are consuming CPU0 to CPU7.in Train, operator wants to dedicate CPU0 and CPU1. Accordingly, CPU2 to 7 will be shared.Consequently, VCPU inventory for that host will be reduced by the amount of allocation_ratio * 2. In theory, we should then allocate VCPU resource | 14:24 |
bauzas | lass for instances that are taking VCPUs located on CPU2 to 7 and allocate PCPU for instances that are taking CPU0 and 1. But if ratio is 16, we could have 32 instances (asking for 1 VCPU) to be allocated against 2 PCPU with ratio=1.0." | 14:24 |
dansmith | stephenfin: and then recalculate that on migrate if they want.. depending on how that looks | 14:25 |
dansmith | I dunno what the actual complexity concern looks like, so I'm just spitballing | 14:25 |
stephenfin | dansmith: That might be necessary for something like 'hw:cpu_threads_policy=isolate' | 14:25 |
dansmith | yeah | 14:25 |
*** jobewan has joined #openstack-nova | 14:25 | |
* stephenfin really regrets ever having added that feature :( | 14:26 | |
*** awalende has quit IRC | 14:26 | |
stephenfin | bauzas: Yeah, I think we need a startup check to ensure NUM_VCPUS_USED_BY_INSTANCE <= NUM_VCPUS_AVAILABLE | 14:27 |
stephenfin | *INSTANCES | 14:27 |
sean-k-mooney | well we say in the spec hw:cpu_threads_policy woudl be going away | 14:27 |
bauzas | stephenfin: cool with it then | 14:27 |
bauzas | stephenfin: but then we need to call placement when restarting the compute service | 14:27 |
stephenfin | sean-k-mooney: Yup, but there has to be an intermediate step that lets us account for the fact that existing instances are using more cores than instance.vcpus | 14:28 |
bauzas | *every time* | 14:28 |
*** jobewan has quit IRC | 14:28 | |
bauzas | for vGPUs, we basically only reshape once | 14:28 |
stephenfin | bauzas: Hmm, could also be a nova-status check as dansmith suggested | 14:28 |
mnaser | what's the concern in taking the current state and translating that directly into placement when the compute node goes up? | 14:28 |
dansmith | mnaser: complexity | 14:28 |
dansmith | but we have to bite that bullet I think | 14:29 |
stephenfin | mnaser: there are a lot of ways things can be inconsistent and we need to handle those | 14:29 |
sean-k-mooney | stephenfin: that only happens for pinned instances if the host has hyperthreading or you have emulator_threads=isolate | 14:29 |
*** jobewan has joined #openstack-nova | 14:29 | |
sean-k-mooney | stephenfin: but yes we do | 14:29 |
stephenfin | like the way there's nothing preventing you from scheduling pinned and unpinned instances on the same host | 14:29 |
mnaser | couldn't you introspect pinned and unpinned from the libvirt definition | 14:30 |
sean-k-mooney | mnaser: we can tell form the flavor | 14:30 |
stephenfin | (so when we migrate, we could end up in a situation where an N core host could have N PCPUs and N * overallocation_ratio VCPUs in use at the same time) | 14:30 |
sean-k-mooney | its not a case of we dont know this happens, we told operators that its their responsibility to ensure it does not | 14:31 |
sean-k-mooney | that is the issue, we told them to do something but did not enforce it in code | 14:31 |
sean-k-mooney | therefore we have to assume the worst | 14:31 |
stephenfin | or the fact that when using the isolate cpu thread policy, the instance may or may not be using twice as many cores as its supposed to be using (isolate will reserve the hyperthread siblings for each core used by the instance) | 14:32 |
stephenfin | sean-k-mooney: Correct | 14:32 |
sean-k-mooney | yes alther to be faire we do account for that properly in the resocue tracker | 14:32 |
sean-k-mooney | *although | 14:32 |
stephenfin | yup | 14:32 |
sean-k-mooney | mnaser: so we have all the data to fix things if we need to | 14:33 |
bauzas | stephenfin: we *could* do it with nova-status but then operators would have to migrate (or delete some instances) :( | 14:33 |
sean-k-mooney | the issue is that the isolate policy is not compatible with placement | 14:33 |
bauzas | thanks, allocation ratio | 14:33 |
mriedem | lyarwood: can you hit these to keep things rolling https://review.openstack.org/#/q/topic:bug/1669054+branch:stable/rocky | 14:33 |
sean-k-mooney | it changes the quantity of resources based on the host that is selected | 14:33 |
sean-k-mooney | bauzas: no i think we can fix allocation for existing instance | 14:34 |
bauzas | sean-k-mooney: how ? see my example | 14:34 |
sean-k-mooney | the thing we have to be ok with is removing cpu_thread policies | 14:34 |
sean-k-mooney | bauzas: we can over allocate RPs if we need to initially | 14:35 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: libvirt: disconnect volume when encryption fails https://review.openstack.org/651796 | 14:36 |
sean-k-mooney | or we can say you asked for 2 cpus but you have isolate and are actually using 4 cpus and update the placement allocation accordingly | 14:36 |
*** tetsuro has quit IRC | 14:36 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Don't warn on network-vif-unplugged event during live migration https://review.openstack.org/651797 | 14:37 |
stephenfin | mriedem: If you're looking at stable stuff, think you could look at these too? https://review.openstack.org/#/c/650363/ https://review.openstack.org/#/c/650364/ | 14:37 |
sean-k-mooney | with pinned cpus there was no oversubscription so the fact the vm is there means it can fit and we correctly do the accounting in the resource tracker to handle the additional cpu usage | 14:37 |
*** udesale has quit IRC | 14:38 | |
mriedem | stephenfin: ok | 14:38 |
stephenfin | thanks | 14:38 |
stephenfin | bauzas: You would, but is there anyway to work around that? | 14:38 |
stephenfin | I mean, if they're in a broken state, something has to change | 14:39 |
stephenfin | bauzas: Also, wouldn't this exact same thing happen now if you messed with allocation ratios? | 14:39 |
stephenfin | i.e. If there are already instances on a host and I drop cpu_allocation_ratio from 16.0 to 2.0 and restart nova-compute, what happens? | 14:40 |
bauzas | well, I dunno what to say | 14:40 |
bauzas | stephenfin: it just works | 14:40 |
sean-k-mooney | in the placement side the ratio changes | 14:40 |
bauzas | stephenfin: but any other instance request would not go to this compute | 14:40 |
sean-k-mooney | if you are using more than is available you can no longer allocate until you drop below the new limit | 14:40 |
sean-k-mooney | but nothing breaks | 14:40 |
stephenfin | So it'd be the same here, right? | 14:40 |
*** jobewan has quit IRC | 14:41 | |
bauzas | exactly what I said :) | 14:41 |
sean-k-mooney | it just prevent new instance going to the node | 14:41 |
stephenfin | Or am I missing something? | 14:41 |
bauzas | actually, that's a good point | 14:41 |
sean-k-mooney | yes it would be the same | 14:41 |
bauzas | if the host is oversubscribed, that's fine | 14:41 |
*** jobewan has joined #openstack-nova | 14:41 | |
bauzas | it's just the options mean nothing | 14:41 |
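To put numbers on the ratio-change case just discussed (figures invented for illustration): an 8-core host with cpu_allocation_ratio=16.0 reports 128 VCPU of inventory; dropping the ratio to 2.0 shrinks that to 16 VCPU. Existing allocations above the new limit stay in place and nothing breaks, but the host stops getting picked for new instances until usage falls back under the limit.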
sean-k-mooney | ok i hate myself for saying this but can we separate hw:cpu_thread_policy into another spec for the removal of that option? | 14:43 |
sean-k-mooney | it can be replaced with a trait for hosts with SMT enabled | 14:44 |
sean-k-mooney | if we agree on that then thats one less thing we need to figure out in the cpu spec | 14:44 |
sean-k-mooney | we will be losing functionality by doing that but if we are not ok with removing that option we have a blocker with the larger spec for cpus in placement anyway | 14:46 |
stephenfin | sean-k-mooney: Not _really_. I mean, 'isolate' results in extra cores being used and those have to be accounted for somehow | 14:46 |
sean-k-mooney | stephenfin: they are in the resource tracker | 14:46 |
dansmith | sean-k-mooney: you can't really remove that image property | 14:46 |
dansmith | you can translate it into something more sane, but it's basically API at this point | 14:47 |
*** sapd1_x has joined #openstack-nova | 14:47 | |
sean-k-mooney | i personally see value in it but it causes huge issues for cpu in placement | 14:47 |
dansmith | if you just start ignoring that, everyone's tooling is going to start spinning up instances they think are isolated (or whatever) but arent' and they'll find out when it's too late | 14:47 |
sean-k-mooney | ya i know | 14:47 |
sean-k-mooney | we can certainly translate it | 14:48 |
stephenfin | dansmith: The migration path we'd suggested was keeping that but limiting it to hosts with "I don't have hyperthreads" trait set | 14:48 |
stephenfin | I think | 14:48 |
sean-k-mooney | to forbined:trait=COMPUTE_SMT | 14:48 |
* stephenfin goes to double check | 14:48 | |
stephenfin | Wait, yeah, that ^ | 14:48 |
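A sketch of what that replacement could look like on a flavor, assuming the COMPUTE_SMT trait name floated above and nova's trait:<NAME>=forbidden extra-spec form; the flavor name is made up:

    openstack flavor set pinned.small \
        --property hw:cpu_policy=dedicated \
        --property trait:COMPUTE_SMT=forbidden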
dansmith | stephenfin: that's cool, it just can't like .. be removed, like sean-k-mooney was saying :) | 14:49 |
sean-k-mooney | removed was a bad phrasing | 14:49 |
stephenfin | dansmith: True. We can think about deprecating it in the future though | 14:49 |
sean-k-mooney | its meaning would change | 14:49 |
stephenfin | It's that or we carry the shim forever | 14:49 |
dansmith | I don't | 14:49 |
dansmith | it's API | 14:49 |
stephenfin | We deprecate/remove other APIs though? | 14:49 |
dansmith | you can fail the boot if it's specified as anything if you want | 14:49 |
dansmith | stephenfin: this is unversioned | 14:50 |
dansmith | but I think we have to check for it basically forever | 14:50 |
dansmith | it's also API that's unversioned and spread between nova, glance and cinder | 14:50 |
stephenfin | Hmm, good point | 14:50 |
dansmith | I'm not saying you have to honor it well, with a shim forever, | 14:50 |
sean-k-mooney | ok does it help to split that bit out into another spec | 14:50 |
dansmith | but you can't ever just start ignoring it, IMHO | 14:51 |
stephenfin | ok, let's kick that can down the road | 14:51 |
*** dklyle has joined #openstack-nova | 14:51 | |
stephenfin | for now, I quite like the idea of overallocating the PCPUs for existing instances with the 'isolate' policy | 14:51 |
dansmith | the only reason not to separate it is if it's a problem for your current proposal, like you can't continue to honor it as is after you make other changes | 14:51 |
sean-k-mooney | well its a prerequisite for dedicated cpus in placement | 14:51 |
dansmith | but if that's not a problem, then sure | 14:51 |
sean-k-mooney | well it is a problem for the current proposal | 14:52 |
openstackgerrit | Matt Riedemann proposed openstack/nova-specs master: Add host and hypervisor_hostname flag to create server https://review.openstack.org/645458 | 14:52 |
dansmith | then we can't separate it completely | 14:52 |
stephenfin | sean-k-mooney: Yeah, I don't think we can split it out entirely | 14:52 |
dansmith | I've been on a call for the last hour, and have another starting soon that I actually have to pay attention to, FYI | 14:52 |
stephenfin | We need to figure out what happens right now, once we have PCPUs in placement | 14:52 |
dansmith | so don't assume my pending silence on this matter is because I have shot myself in the face | 14:52 |
stephenfin | What we don't need to figure out now is what we're doing even further down the road (in terms of failing the instance if the image property is set or something else) | 14:53 |
mriedem | bauzas: you know how we reset_forced_destinations on the request spec when moving a server? | 14:53 |
stephenfin | dansmith: Ack, me too | 14:53 |
sean-k-mooney | stephenfin: well the traits thing is mentioned here https://review.openstack.org/#/c/555081/22/specs/train/approved/cpu-resources.rst@778 | 14:53 |
dansmith | stephenfin: yes, I think you can punt on that part | 14:53 |
mriedem | bauzas: if i create a server with a query scheduler hint targeted at a host or hypervisor_hostname, i can never migrate my server off the host :) same problem - but dumber | 14:54 |
stephenfin | sean-k-mooney: Oh, indeed it is. I just need to expand on that I guess | 14:54 |
stephenfin | OK, let me try and jot all this down in the spec and clean up the other issues | 14:54 |
* stephenfin would like to get started on the code for this sooner rather than later as it's going to be a lot of untangling | 14:55 | |
sean-k-mooney | so its literally already part of the spec. the only functionality that you lose with that is you can no longer use isolate to allow a host with hyperthreads to be shared between guests that want full cores and those that can just have threads | 14:55 |
sean-k-mooney | stephenfin: the code isnt the problem with this spec | 14:56 |
sean-k-mooney | stephenfin: the upgrade impact is, and i think we can move forward with this spec but the current spec still has upgrade issues | 14:56 |
sean-k-mooney | stephenfin: i share dansmith's view that we should enable in-place upgrades to this new way of doing things; if we can do that i will be happy with this | 14:57 |
sean-k-mooney | stephenfin: that requires either parallel implementations and a config to opt in to new behavior, or no change to existing configs | 14:58 |
bauzas | mriedem: sec, was otp | 15:01 |
mriedem | bauzas: somewhat related but you might have something to add to my reply here https://review.openstack.org/#/c/649534/5/nova/objects/request_spec.py@606 | 15:02 |
bauzas | uh, and now in meeting actually :( | 15:02 |
mriedem | bauzas: not high priority | 15:02 |
*** artom has quit IRC | 15:06 | |
mriedem | stephenfin: can we not have a py3 unit test for https://review.openstack.org/#/c/650235/ ? | 15:07 |
stephenfin | mriedem: Not really, no. It's an environment thing | 15:08 |
mriedem | but can't we control the environment in a test? | 15:08 |
*** tbachman has quit IRC | 15:09 | |
mriedem | btw zigo made the same change in osc https://review.openstack.org/#/c/541609/ | 15:09 |
stephenfin | Yes? No? I honestly don't know. We'd be monkeypatching Python internals, I suspect | 15:09 |
mriedem | idk about that, | 15:09 |
mriedem | nova's tox.ini sets this: | 15:09 |
mriedem | LC_ALL=en_US.utf-8 | 15:10 |
stephenfin | right, so the Python process is correctly configured in that case | 15:10 |
stephenfin | I think we'd have to reload the interpreter to misconfigure things | 15:10 |
sean-k-mooney | is this related to the gate using LC_ALL=C again | 15:11 |
stephenfin | yup | 15:11 |
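The locale dependency being discussed can be seen with a couple of lines of Python; the behaviour shown is typical for Linux and may differ on interpreters that coerce the C locale (PEP 538/540):

    import locale, sys
    # Under LC_ALL=C the preferred encoding is ASCII (ANSI_X3.4-1968), so
    # writing non-ASCII console-log output can raise UnicodeEncodeError;
    # with LC_ALL=en_US.utf-8, as nova's tox.ini sets, it is UTF-8.
    print(locale.getpreferredencoding(), sys.stdout.encoding)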
mriedem | stephenfin: takashi also asked that something is documented about this which could probably go here https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-console-log | 15:11 |
stephenfin | mriedem: I don't think OSC is an issue because of the follow up patch to the one you linked https://review.openstack.org/#/c/554698/1 | 15:11 |
mriedem | ah ok | 15:12 |
stephenfin | Alas, we don't use cliff in novaclient | 15:12 |
*** gyee has joined #openstack-nova | 15:12 | |
stephenfin | As for docs, it's on my todo list and I'll try drag something out by the end of the week | 15:12 |
stephenfin | mriedem: There's an alternative approach we can take that doesn't require changing environment configuration, but I don't know if we want to do it as it's a huge hack https://review.openstack.org/#/c/583535/ | 15:15 |
*** sridharg has quit IRC | 15:15 | |
stephenfin | I'd nearly rather suggest people use Python 3 if they're encountering these kinds of Unicode issues | 15:15 |
mriedem | i'll see if i can add a unit test in novaclient | 15:18 |
*** mvkr has quit IRC | 15:27 | |
melwitt | mriedem: thanks, will take a look. I'm in the middle of updating everything, will be able to push the updates soon today I think | 15:30 |
melwitt | soon™ | 15:34 |
*** ivve has quit IRC | 15:36 | |
*** luksky has quit IRC | 15:36 | |
*** tbachman has joined #openstack-nova | 15:41 | |
*** pcaruana has quit IRC | 15:42 | |
lyarwood | melwitt: https://review.openstack.org/#/c/611974/ - finally got to this btw, LGTM after playing around with it locally. | 15:42 |
melwitt | lyarwood: just saw that, thanks so much | 15:43 |
*** wolverineav has joined #openstack-nova | 15:45 | |
*** hamzy has quit IRC | 15:47 | |
*** ccamacho has quit IRC | 15:49 | |
*** wolverineav has quit IRC | 15:49 | |
*** tbachman has quit IRC | 15:52 | |
*** boxiang has quit IRC | 15:55 | |
*** boxiang has joined #openstack-nova | 15:55 | |
*** amodi has joined #openstack-nova | 15:56 | |
*** sapd1_x has quit IRC | 15:58 | |
*** yan0s has quit IRC | 15:58 | |
*** tssurya has quit IRC | 16:00 | |
openstackgerrit | Matt Riedemann proposed openstack/python-novaclient master: Add test for console-log and docs for bug 1746534 https://review.openstack.org/651827 | 16:06 |
openstack | bug 1746534 in python-novaclient "encoding error when doing console-log" [High,Fix released] https://launchpad.net/bugs/1746534 - Assigned to Thomas Goirand (thomas-goirand) | 16:06 |
mriedem | stephenfin: see how ^ grabs you | 16:06 |
mriedem | i couldn't reproduce the original bug in the unit test | 16:06 |
stephenfin | mriedem: Yup, that looks good to me. Good find with those click docs | 16:08 |
mriedem | need a stable core to hit these https://review.openstack.org/#/q/topic:bug/1821824+branch:stable/stein | 16:13 |
*** rpittau is now known as rpittau|afk | 16:15 | |
lyarwood | mriedem: have the branch open to review now btw | 16:15 |
*** tbachman has joined #openstack-nova | 16:15 | |
*** artom has joined #openstack-nova | 16:18 | |
*** chhagarw has quit IRC | 16:25 | |
*** tbachman has quit IRC | 16:31 | |
*** zbr has quit IRC | 16:34 | |
*** dave-mccowan has quit IRC | 16:39 | |
*** zbr has joined #openstack-nova | 16:40 | |
*** ricolin has quit IRC | 16:44 | |
*** ttsiouts has quit IRC | 16:44 | |
*** ttsiouts has joined #openstack-nova | 16:45 | |
*** ttsiouts has quit IRC | 16:50 | |
*** dtantsur is now known as dtantsur|afk | 16:50 | |
*** davidsha has quit IRC | 16:52 | |
openstackgerrit | Merged openstack/nova master: devstack: Remove 'tempest-dsvm-tempest-xen-rc' https://review.openstack.org/650018 | 16:54 |
*** ivve has joined #openstack-nova | 16:56 | |
*** hamzy has joined #openstack-nova | 17:00 | |
*** pcaruana has joined #openstack-nova | 17:05 | |
*** igordc has joined #openstack-nova | 17:06 | |
*** derekh has quit IRC | 17:11 | |
*** tbachman has joined #openstack-nova | 17:13 | |
*** jobewan has quit IRC | 17:13 | |
openstackgerrit | Rodolfo Alonso Hernandez proposed openstack/os-vif master: Remove IP proxy methods https://review.openstack.org/643115 | 17:14 |
openstackgerrit | Rodolfo Alonso Hernandez proposed openstack/os-vif master: Refactor functional base test classes https://review.openstack.org/643101 | 17:14 |
lyarwood | mriedem: https://review.openstack.org/#/q/status:open+topic:bug/1803961 - Would you mind taking a look at this again when you have time. I'm going to suggest that we land and backport this over the competing cinder fix for the time being. | 17:14 |
*** jobewan has joined #openstack-nova | 17:15 | |
*** ivve has quit IRC | 17:15 | |
*** penick has joined #openstack-nova | 17:16 | |
openstackgerrit | Merged openstack/nova master: trivial: Remove dead nova.db functions https://review.openstack.org/649570 | 17:17 |
*** jobewan has quit IRC | 17:20 | |
*** tbachman has quit IRC | 17:25 | |
*** tbachman has joined #openstack-nova | 17:26 | |
*** gyee has quit IRC | 17:29 | |
*** Sundar has joined #openstack-nova | 17:30 | |
*** cfriesen has quit IRC | 17:34 | |
*** cfriesen has joined #openstack-nova | 17:34 | |
*** luksky has joined #openstack-nova | 17:41 | |
mriedem | lyarwood: couldn't get jgriffith or another cinder person to look at https://review.openstack.org/#/c/637224/ ? | 17:48 |
lyarwood | mriedem: I did a while ago and they pointed me towards https://review.openstack.org/#/c/638995/ but that has now stalled and I'm thinking it might just be easier to fix and backport in Nova given the cinder change is touching lots of different backends. | 17:51 |
lyarwood | mriedem: I can ask again to get confirmation that using migration_status is okay with them as a workaround until ^ lands. | 17:52 |
*** ralonsoh has quit IRC | 17:56 | |
*** gmann is now known as gmann_afk | 18:15 | |
mriedem | melwitt: thinking out loud about quota and cross-cell resize and your placement change, when a user resizes a server today, placement will track vcpu/ram usage against both the source and dest node, but the /limits API will only show vcpu/ram usage for the new flavor since that is counted from the instance right? | 18:22 |
mriedem | so i think if i'm resizing from a flavor with vcpu=2 to vcpu=4, usage from placement for that project will say a total of 6, but the compute limits API would say 4 | 18:22 |
mriedem | i'm not necessarily saying that's wrong, but is that accurate? | 18:23 |
melwitt | yeah, it's definitely not going to say 6, but I don't remember if it will say 2 or 4 before the resize is confirmed | 18:24 |
mriedem | placement would say 6 | 18:24 |
melwitt | yeah | 18:25 |
melwitt | looking at the code to see when the new flavor is saved to the Instance object vcpus and memory_mb attributes | 18:25 |
mriedem | once the server is resized the limits api would say 4 https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/objects/instance.py#L1535 | 18:25 |
melwitt | those attributes are what's counted today | 18:26 |
*** tosky has quit IRC | 18:26 | |
mriedem | it's finish_resize on the dest host https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/compute/manager.py#L4632 | 18:26 |
mriedem | same method that changes the instance status to VERIFY_RESIZE | 18:26 |
melwitt | ah ok | 18:27 |
mriedem | maybe this is part of why we were talking about adding consumer types to placement so nova could ask for usage for 'instances' to filter out the usage tracked by the migration record holding the old flavor usage | 18:28 |
mriedem | anyway, it's a thing for cross-cell resize because while the instance is in VERIFY_RESIZE status we'll have the instance in both the source and target dbs and if we're counting from the dbs we don't want to double count the instance, so i need to filter out the hidden one, | 18:30 |
mriedem | but that got me thinking about how vcpus/ram will be counted | 18:30 |
melwitt | yeah. food for thought | 18:31 |
mriedem | i'm assuming no one wants to think about that or eat that food though | 18:31 |
melwitt | of course not | 18:31 |
*** jmlowe has quit IRC | 18:32 | |
openstackgerrit | Merged openstack/nova stable/rocky: doc: Fix openstack CLI command https://review.openstack.org/648425 | 18:33 |
openstackgerrit | Merged openstack/nova stable/rocky: doc: Capitalize keystone domain name https://review.openstack.org/650601 | 18:33 |
melwitt | this was brought up before in earlier discussions about counting from placement I think, and IIRC some (dansmith?) thought if placement is consuming 6 resources at that point in time, it makes sense for quota usage counting to reflect that as well | 18:33 |
dansmith | it depends on the resource and the direction | 18:34 |
melwitt | when do the old allocations go away from placement? VERIFY_RESIZE or after CONFIRM_RESIZE? | 18:34 |
dansmith | ideally we would only consume max(old_cpus, new_cpus) from quota | 18:34 |
dansmith | (and placement) | 18:34 |
dansmith | because that's all we need to revert | 18:34 |
dansmith | but potentially we need to claim sum(old_disk, new_disk) depending | 18:34 |
melwitt | I see | 18:35 |
dansmith | and always both if we're not on the same node of course | 18:35 |
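A sketch of the ideal claim policy described here, purely illustrative (nothing like this is asserted to exist in nova; the function and parameter names are made up):

```python
# Purely illustrative: the "ideal" claim during a resize as described above.
# Nothing here exists in nova; names are made up.
def ideal_resize_claim(old, new, same_host, resource):
    if not same_host:
        # Moving between hosts: both the source and dest amounts are needed.
        return old + new
    if resource == "DISK_GB":
        # Disk may need room for both copies during the resize.
        return old + new
    # CPU/RAM on the same host: the larger of the two is enough to revert.
    return max(old, new)

# e.g. a same-host resize from 2 to 4 vCPUs would ideally claim 4, not 6.
assert ideal_resize_claim(2, 4, same_host=True, resource="VCPU") == 4
assert ideal_resize_claim(2, 4, same_host=False, resource="VCPU") == 6
```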
mriedem | right it's a mess and resize to same host doesn't help https://bugs.launchpad.net/nova/+bug/1790204 | 18:35 |
openstack | Launchpad bug 1790204 in OpenStack Compute (nova) "Allocations are "doubled up" on same host resize even though there is only 1 server on the host" [High,Triaged] | 18:35 |
mriedem | melwitt: the old allocations go away on confirm | 18:36 |
melwitt | ack | 18:36 |
mriedem | it's probably fair to say that our internal resource tracking and quota usage reporting during resize have just never aligned | 18:36 |
mriedem | the resource tracker would always report usage for the old flavor on the source node while resized to save room for a revert, | 18:37 |
mriedem | but the quota usage wouldn't reflect that | 18:37 |
mriedem | i don't know if the old pre-counting reservations stuff held some quota during a resize or not | 18:37 |
mriedem | i.e. did we hold a reservation on the target host and then /limits + reserved would report 6 in this scenario rather than 4, idk | 18:38 |
mriedem | no one would probably notice | 18:38 |
*** tbachman has quit IRC | 18:39 | |
melwitt | mriedem: doesn't look like it. from this, you would fail quota check if you didn't have room to revert https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L3201 | 18:44 |
mriedem | that code is all black magic to me, i'd have to setup an ocata devstack to see what happens in the API | 18:48 |
mriedem | definitely not a priority, was just thinking about it while i write a functional test for this for my cross-cell series | 18:48 |
melwitt | it does reserve for the new flavor here though at the beginning of the resize https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L3351 | 18:48 |
melwitt | if it's an upsize | 18:50 |
melwitt | so from what I can tell, it will consume quota usage for the new flavor as soon as the resize starts | 18:53 |
melwitt | only for an upsize | 18:53 |
melwitt | otherwise it will consume for the old flavor still | 18:54 |
mriedem | ok so pre-pike the RT would report usage for both the old and new flavor and /limits API would show usage for the old and new flavor (assuming upsize), and starting in pike we'll only report usage for the new flavor | 18:55 |
mriedem | which could also be a downsize | 18:55 |
mriedem | maybe i drop vcpu but bump disk or something | 18:55 |
*** bbowen_ has joined #openstack-nova | 18:57 | |
*** wolverineav has joined #openstack-nova | 18:58 | |
*** wolverineav has quit IRC | 18:58 | |
*** wolverineav has joined #openstack-nova | 18:58 | |
*** bbowen has quit IRC | 18:59 | |
melwitt | no, I think the limits API would show the usage for only the new flavor because it would be upsize_delta + current usage (old flavor) | 19:00 |
melwitt | pre-pike | 19:00 |
*** tbachman has joined #openstack-nova | 19:00 | |
melwitt | sorry I wasn't clear that it reserves the delta between old and new | 19:01 |
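A sketch of that pre-Pike behaviour, purely illustrative: only an upsize reserved anything up front, and what it reserved was the delta between the new and old flavor, so /limits ended up showing roughly the new flavor for an upsize and the old flavor for a downsize until confirm.

```python
# Illustrative sketch of the pre-Pike quota reservation during resize.
def resize_reservation(old_vcpus, new_vcpus):
    delta = new_vcpus - old_vcpus
    # Only an upsize reserved quota up front; a downsize reserved nothing and
    # kept consuming the old flavor until the resize was confirmed.
    return delta if delta > 0 else 0

assert resize_reservation(2, 4) == 2  # upsize: reserve the delta
assert resize_reservation(4, 2) == 0  # downsize: nothing reserved up front
```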
*** pcaruana has quit IRC | 19:02 | |
*** wolverineav has quit IRC | 19:15 | |
*** jmlowe has joined #openstack-nova | 19:16 | |
*** wolverineav has joined #openstack-nova | 19:16 | |
*** mdbooth_ has joined #openstack-nova | 19:20 | |
mriedem | incoming | 19:21 |
*** wolverineav has quit IRC | 19:21 | |
openstackgerrit | Merged openstack/nova stable/rocky: Add functional regression test for bug 1669054 https://review.openstack.org/649325 | 19:21 |
openstack | bug 1669054 in OpenStack Compute (nova) rocky "RequestSpec.ignore_hosts from resize is reused in subsequent evacuate" [Medium,In progress] https://launchpad.net/bugs/1669054 - Assigned to Matt Riedemann (mriedem) | 19:21 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix ProviderUsageBaseTestCase._run_periodics for multi-cell https://review.openstack.org/641179 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Improve CinderFixtureNewAttachFlow https://review.openstack.org/639382 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1818914 https://review.openstack.org/641521 | 19:22 |
openstack | bug 1818914 in OpenStack Compute (nova) "Hypervisor resource usage on source still shows old flavor usage after resize confirm until update_available_resource periodic runs" [Low,In progress] https://launchpad.net/bugs/1818914 - Assigned to Matt Riedemann (mriedem) | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove unused context parameter from RT._get_instance_type https://review.openstack.org/641792 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update usage in RT.drop_move_claim during confirm resize https://review.openstack.org/641806 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add Migration.cross_cell_move and get_by_uuid https://review.openstack.org/614012 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add InstanceAction/Event create() method https://review.openstack.org/614036 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: DNM: Add instance hard delete https://review.openstack.org/650984 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add Instance.hidden field https://review.openstack.org/631123 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add TargetDBSetupTask https://review.openstack.org/627892 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add CrossCellMigrationTask https://review.openstack.org/631581 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Execute TargetDBSetupTask https://review.openstack.org/633853 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add can_connect_volume() compute driver method https://review.openstack.org/621313 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_dest compute method https://review.openstack.org/633293 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add PrepResizeAtDestTask https://review.openstack.org/627890 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_source compute method https://review.openstack.org/634832 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add nova.compute.utils.delete_image https://review.openstack.org/637605 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add PrepResizeAtSourceTask https://review.openstack.org/627891 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize conductor RPC method https://review.openstack.org/638047 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Revert cross-cell resize from the API https://review.openstack.org/638048 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize while deleting a server https://review.openstack.org/638268 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add archive_deleted_rows wrinkle to cross-cell functional test https://review.openstack.org/651650 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add CrossCellWeigher https://review.openstack.org/614353 | 19:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add cross-cell resize policy rule and enable in API https://review.openstack.org/638269 | 19:22 |
efried | sean-k-mooney: you still awake? | 19:22 |
*** mdbooth has quit IRC | 19:23 | |
*** gmann_afk is now known as gmann | 19:27 | |
*** Sundar has quit IRC | 19:35 | |
melwitt | efried: fyi, patch to add a few post-release items to the ptl guide https://review.openstack.org/651009 | 19:36 |
*** baclawski has joined #openstack-nova | 19:37 | |
*** Sundar has joined #openstack-nova | 19:38 | |
*** awaugama has quit IRC | 19:39 | |
efried | melwitt: ack, on the radar, thank you. | 19:40 |
efried | oh, that was easy. +2 | 19:41 |
*** baclawski has quit IRC | 19:42 | |
*** baclawski has joined #openstack-nova | 19:47 | |
*** wolverineav has joined #openstack-nova | 19:54 | |
*** pchavva has quit IRC | 19:55 | |
*** bbowen_ has quit IRC | 19:56 | |
*** wolverineav has quit IRC | 19:57 | |
*** wolverineav has joined #openstack-nova | 19:57 | |
*** baclawski has quit IRC | 19:58 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: [DNM: extra logs] Revert resize: wait for external events in compute manager https://review.openstack.org/644881 | 20:00 |
* artom beats his head against the wall for ^^ | 20:00 | |
*** wolverineav has quit IRC | 20:02 | |
*** wolverineav has joined #openstack-nova | 20:03 | |
mriedem | heh i didn't realize there was a ptl guide | 20:08 |
*** wolverineav has quit IRC | 20:08 | |
*** wolverineav has joined #openstack-nova | 20:08 | |
*** wolverineav has quit IRC | 20:11 | |
*** wolverineav has joined #openstack-nova | 20:12 | |
*** wolverineav has quit IRC | 20:13 | |
*** wolverineav has joined #openstack-nova | 20:13 | |
openstackgerrit | Merged openstack/python-novaclient master: Add test for console-log and docs for bug 1746534 https://review.openstack.org/651827 | 20:14 |
openstack | bug 1746534 in python-novaclient "encoding error when doing console-log" [High,Fix released] https://launchpad.net/bugs/1746534 - Assigned to Thomas Goirand (thomas-goirand) | 20:14 |
melwitt | mriedem: I added it recently. was just a copy-paste of a google doc I had made to help me as ptl | 20:22 |
*** wolverineav has quit IRC | 20:26 | |
*** eharney has quit IRC | 20:34 | |
melwitt | this came up in downstream bug triage: cells v2 discover_hosts can fail with DBDuplicateEntry if it's run in parallel, once per deployed compute host | 20:40 |
dansmith | melwitt: I'm sure | 20:41 |
melwitt | found related issues from OSA and chef: https://bugs.launchpad.net/openstack-ansible/+bug/1752540 https://github.com/bloomberg/chef-bcpc/issues/1378 | 20:41 |
openstack | Launchpad bug 1752540 in openstack-ansible "os_nova cell_v2 discover failure" [Undecided,In progress] - Assigned to git-harry (git-harry) | 20:41 |
dansmith | why would someone run it on each compute node | 20:41 |
dansmith | ah the OSA one seems to be that they're running it per conductor | 20:42 |
dansmith | which is also not really what should be happening, but less bad than compute I guess | 20:42 |
melwitt | maybe lack of clear documentation but probably just thinking deploy compute host => discover host and made it part of that task piece | 20:42 |
dansmith | except you have to give the compute node db credentials it doesn't otherwise need in order to do that :) | 20:42 |
dansmith | but yeah, misunderstanding of what should be done, I'm sure | 20:42 |
mriedem | i want to say mdbooth_ opened some bug like this | 20:43 |
*** wolverineav has joined #openstack-nova | 20:43 | |
mriedem | concurrency something or other | 20:43 |
mriedem | but it was like db sync concurrently i think | 20:44 |
dansmith | it's not really a concurrency thing in the way he normally looks for such issues, | 20:44 |
*** artom has quit IRC | 20:44 | |
dansmith | yeah, that also is one of those "don't do that" things, IMHO | 20:44 |
melwitt | hm, I wasn't thinking that they give db credentials to the compute node, just that they run discover_hosts per compute host, centrally? I dunno, nevermind | 20:44 |
dansmith | unless we want to do our own locking in the DB to prevent it, which is kinda silly | 20:44 |
mriedem | it was this https://bugs.launchpad.net/nova/+bug/1804652 | 20:44 |
openstack | Launchpad bug 1804652 in OpenStack Compute (nova) "nova.db.sqlalchemy.migration.db_version is racy" [Low,In progress] - Assigned to Matthew Booth (mbooth-9) | 20:44 |
dansmith | melwitt: you have to have api db credentials to run discover hosts, which are normally not on the compute node (or shouldn't be) | 20:45 |
dansmith | mriedem: yeah, would love to WONTFIX that | 20:45 |
mriedem | you could -1 the change | 20:46 |
mriedem | start a war | 20:46 |
mriedem | but this is now the month of positivity and motivation | 20:46 |
melwitt | dansmith: yeah... I was thinking maybe they're doing that in a central place that is deploying compute nodes in parallel but not necessarily running nova-manage _on_ the compute nodes? | 20:46 |
*** hamzy has quit IRC | 20:46 | |
dansmith | melwitt: who are we talking about? I thought you said "per compute" so I thought you meant on a compute, but maybe you just mean spawning a bunch of nova-manage commands on one node one for each new host? | 20:47 |
melwitt | I'll reply on the bug with guidance to run discover_hosts once after deploying compute hosts. and then I think I'll have a docs patch in my future | 20:47 |
dansmith | that's even more.. crazy | 20:47 |
dansmith | should be using ansible triggers for that, which AFAIK, would mean a thing runs one | 20:47 |
dansmith | *once | 20:47 |
dansmith | like, you don't restart apache for every module you enable, you enable a bunch of modules and run restart at the end :) | 20:48 |
melwitt | dansmith: I don't actually know the details of how the deployment works, was just thinking it seems unlikely they're putting api db creds on compute hosts and running nova-manage on them | 20:48 |
dansmith | s/triggers/handlers/ | 20:48 |
dansmith | melwitt: ack, I just thought that was what you originally asserted | 20:48 |
openstackgerrit | Matt Riedemann proposed openstack/python-novaclient stable/stein: Add test for console-log and docs for bug 1746534 https://review.openstack.org/651925 | 20:49 |
openstack | bug 1746534 in python-novaclient "encoding error when doing console-log" [High,Fix released] https://launchpad.net/bugs/1746534 - Assigned to Thomas Goirand (thomas-goirand) | 20:49 |
melwitt | no, I didn't mean to make it sound like that | 20:49 |
*** takashin has joined #openstack-nova | 20:49 | |
openstackgerrit | Matt Riedemann proposed openstack/python-novaclient stable/stein: Add test for console-log and docs for bug 1746534 https://review.openstack.org/651925 | 20:49 |
dansmith | melwitt: so, we could do an external lock on nova-manage which would prevent the specific case of running multiple copies on a single host, | 20:50 |
dansmith | but it might also be confusing because it won't fix the case where it's multiple on different hosts | 20:50 |
dansmith | we could do some janky locking in the database to try to catch it, but I'd rather just say "do not do this lest ye burn in nova purgatory" | 20:50 |
dansmith | or, you know, something | 20:50 |
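For the single-host case mentioned above, an external file lock around the discovery call would be enough to stop two nova-manage processes on the same machine from racing, while doing nothing for multiple hosts. A rough sketch using oslo.concurrency; the lock name and lock_path here are made up:

```python
# Rough sketch only: serializing discover_hosts on a single host with an
# external (file-based) lock. The lock name and lock_path are made up.
from oslo_concurrency import lockutils

def discover_hosts_serialized(do_discover):
    with lockutils.lock('nova-manage-discover-hosts',
                        external=True,
                        lock_path='/var/lock/nova'):
        # Only one nova-manage process on this host gets past here at a
        # time; processes on other hosts are unaffected.
        return do_discover()
```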
mriedem | i smell etcd | 20:51 |
mriedem | and tooz | 20:51 |
melwitt | lol.. yes | 20:51 |
mriedem | that's like 2 etcd references for nova in the last few weeks | 20:51 |
dansmith | yes, let's deploy and depend on etcd instead of people using ansible handlers! :P | 20:51 |
* mriedem starts writing the spec | 20:51 | |
melwitt | ok, I'll reply with guidance on what to do. thanks y'all | 20:52 |
dansmith | mriedem: make sure the spec says that etcd is only running when nova-manage is used, and stopped at runtime | 20:52 |
* dansmith polishes his resume | 20:52 | |
* mriedem thinks of polish sausage | 20:52 | |
eandersson | Do you need to run nova-manage placement heal_allocations when upgrading from non-cell to Cell V2? | 20:53 |
mriedem | eandersson: no | 20:53 |
mriedem | doesn't have anything to do with cells | 20:53 |
eandersson | Let me re-phrase, when upgrading from Mitaka to Rocky | 20:53 |
mriedem | it was built for people migrating from caching scheduler - which didn't use placement and thus didn't create allocations - to filter scheduler | 20:53 |
mriedem | eandersson: in that case...maybe | 20:54 |
eandersson | because placement is empty for us | 20:54 |
mriedem | heh | 20:54 |
eandersson | and it was never in any of the upgrade steps | 20:54 |
mriedem | because "upgrading from 1983 to now" isn't a supported upgrade | 20:54 |
eandersson | well we didn't upgrade from mitaka to rocky | 20:54 |
eandersson | we upgraded from one version to another | 20:54 |
eandersson | also not a helpful comment | 20:55 |
mriedem | eandersson: sorry, | 20:55 |
mriedem | you've been at rocky for a couple of weeks now haven't you/ | 20:55 |
mriedem | ? | 20:55 |
eandersson | Yes - and things have been working great | 20:55 |
eandersson | but we started seeing some odd errors | 20:55 |
dansmith | eandersson: don't mind mriedem he's just cranky today.. it's about nap time | 20:55 |
openstackgerrit | Merged openstack/nova stable/stein: Add retry_on_deadlock to migration_update DB API https://review.openstack.org/648428 | 20:55 |
mriedem | so you're at rocky but still not seeing anything in placement? like resource providers? or just allocations? | 20:55 |
mriedem | i'm not sure how you could create a server... | 20:56 |
eandersson | We see new servers in placement | 20:56 |
mriedem | but not the old ones | 20:56 |
mriedem | *you don't see allocations for old servers in placement | 20:56 |
eandersson | The problem is that now we try to run the above command and it fails | 20:56 |
* melwitt stands amazed that ocata devstack deployed without a hitch | 20:56 | |
mriedem | eandersson: paste me the error | 20:57 |
mriedem | i should have put a dry run on that command | 20:57 |
mriedem | it was in the todo list | 20:57 |
melwitt | nova meeting in a few min | 20:57 |
eandersson | The first error we saw was this | 20:57 |
eandersson | > Instance xxx has allocations against this compute host but is not found in the database. | 20:57 |
eandersson | Next we ran the heal command, but for some reason it thinks the overprovisioned host is too full and fails to complete | 20:58 |
mriedem | the heal command is trying to PUT /allocations against a given resource provider (compute node) for a given instance, | 20:58 |
mriedem | and if you've migrated or created servers on that compute node, placement is going to say "you're already at capacity" | 20:59 |
mriedem | which is why that PUT is probably failing even though there is already a server on it consuming resources | 20:59 |
mriedem | eandersson: so in that case, it'd be helpful to look at the inventory reported by placement for that compute node | 21:00 |
efried | nova meeting now #openstack-meeting | 21:00 |
mriedem | eandersson: which you can get with this https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-show | 21:00 |
mriedem | uuid is the uuid of the compute node | 21:00 |
mriedem | sorry https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-list is probably more helpful | 21:00 |
ceryx | After running heal_allocations we have the same consumer_id (which I believe is the compute instance ID) have allocations created across 6 separate resource providers, each being a different compute node | 21:02 |
mriedem | ceryx: are you and eandersson in cahoots or just coincidental? | 21:02 |
eandersson | cahoots :D | 21:02 |
mriedem | ok | 21:03 |
mriedem | good | 21:03 |
ceryx | For RAM, 5 of those allocations are for 32GB but one is for 64GB, which also seems strange (and yes sorry, was about to clarify, working with eandersson) | 21:03 |
*** whoami-rajat has quit IRC | 21:03 | |
mriedem | the consumer id is the instance id yes | 21:04 |
mriedem | has the instance been migrating around? | 21:04 |
*** ak92514 has joined #openstack-nova | 21:05 | |
efried | that 64... if we're leaking allocations and we move an instance away and back...? | 21:05 |
ceryx | http://paste.openstack.org/show/749212/ this is what we're seeing for one consumer ID. I checked nova.migrations and don't see this instance mentioned. | 21:07 |
openstackgerrit | Merged openstack/nova stable/stein: Use Selection object to fill request group mapping https://review.openstack.org/647713 | 21:07 |
mriedem | it could be a migration record | 21:07 |
mriedem | check your migrations table in the cell db for that id | 21:07 |
mriedem | do you see errors in the logs when you're doing a migration? | 21:08 |
mriedem | my guess is you've done cold migrations / resize and the source node allocations, which are held by the migration record consumer, are not getting cleaned up for some reason | 21:09 |
mriedem | the multiple allocations for the same consumer and provider are because of different resource classes, most likely VCPU and MEMORY_MB | 21:10 |
eandersson | We have had migrations disabled for a very long time | 21:10 |
ceryx | Ah yep, I was checking uuid and not instance_uuid from migrations. This had previously (about a month ago) had 6 failed migrations, then one confirmed migration. | 21:10 |
mriedem | if these are volume-backed servers | 21:10 |
eandersson | oh so nvm :D | 21:10 |
mriedem | ceryx: in a nutshell, when you start a cold or live migration, the allocations held by the instance (instances.uuid) on the source node provider are moved to a migration record consumer (consumer_id=migrations.uuid), and the dest node provider allocations are held by the instance during scheduling. | 21:12 |
mriedem | https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html | 21:13 |
mriedem | when the cold migration / resize is confirmed the allocations against the source node held by the migration record should be deleted | 21:13 |
mriedem | and you should just be left with the instance consuming allocations from the target provider | 21:14 |
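The allocation "swap" described above can be pictured with dummy data; all UUIDs and amounts below are placeholders:

```python
# Illustrative data only; UUIDs and amounts are placeholders.
INSTANCE = "instance-uuid"
MIGRATION = "migration-uuid"

# Before the cold migration / resize starts:
allocations = {
    ("source-node-rp", INSTANCE): {"VCPU": 2, "MEMORY_MB": 2048},
}

# During the move (scheduled to the dest, not yet confirmed): the source
# allocation is now keyed by the migration uuid, the dest by the instance.
allocations = {
    ("source-node-rp", MIGRATION): {"VCPU": 2, "MEMORY_MB": 2048},
    ("dest-node-rp", INSTANCE): {"VCPU": 4, "MEMORY_MB": 4096},
}

# After confirm, the migration-held source allocation is deleted and only
# the instance's allocation on the target provider remains.
allocations = {
    ("dest-node-rp", INSTANCE): {"VCPU": 4, "MEMORY_MB": 4096},
}
```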
mriedem | so in this paste is fec5409b-010e-4316-845c-ef68440d3593 an instance uuid or migration uuid? | 21:14 |
mriedem | looks like resource_class_id=0 is for VCPU and resource_class_id=1 is for MEMORY_MB | 21:15 |
ceryx | In this case the computes that we see unexpected allocations on were target hypervisors of an errored migration. So it looks like heal_allocations might be creating allocations for migrations in error state? | 21:15 |
mriedem | and this is a volume-backed server because there is no DISK_GB allocation | 21:15 |
ceryx | And yes - this has a cinder backed root disk | 21:15 |
mriedem | heal_allocations shouldn't even be looking at migrations, just instances, but it's been awhile since i dug into this | 21:16 |
openstackgerrit | Merged openstack/nova stable/stein: Add functional recreate test for bug 1819963 https://review.openstack.org/648401 | 21:17 |
openstack | bug 1819963 in OpenStack Compute (nova) stein "Reverting a resize does not update the instance.availability_zone value to the source az" [Medium,In progress] https://launchpad.net/bugs/1819963 - Assigned to Matt Riedemann (mriedem) | 21:17 |
*** slaweq has quit IRC | 21:20 | |
mriedem | so heal_allocations should skip the instance if it doesn't have a node set, but if the migration failed the node shouldn't keep changing | 21:21 |
mriedem | and it also shouldn't PUT new allocations if the instance already has allocations in placement, which it should have if you tried migrating it since you upgraded to rocky | 21:21 |
eandersson | We don't use azs btw | 21:21 |
*** dansmith changes topic to "Current runways: https://etherpad.openstack.org/p/nova-runways-train -- This channel is for Nova development. For support of Nova deployments, please use #openstack." | 21:21 | |
*** ChanServ sets mode: -o dansmith | 21:22 | |
eandersson | (or we only have one rather :p) | 21:22 |
mriedem | not sure why azs would have anything to do with this | 21:22 |
eandersson | ah I was looking at the bug above :p | 21:23 |
eandersson | Didn't realize that it was unrelated | 21:23 |
mriedem | so for all 6 compute node resource providers in that paste, are there actually 6 matching compute_nodes in the cell db with the same uuid as the resource provider? | 21:24 |
mriedem | i wonder if you maybe have duplicate compute nodes? | 21:24 |
mriedem | but with different uuids | 21:24 |
mriedem | although unless the hostname changed i'm not sure how that could happen since there is a unique constraint on that table for host/hypervisor_hostname | 21:25 |
mriedem | melwitt: ^ this kind of problem is why switching counting quota usage from placement by default worries me | 21:26 |
ceryx | Yeah, all migration attempts were post-rocky upgrade. Each one of the resource providers does match a different compute_node that is still online; they were just past targets for the failed migration. | 21:26 |
mriedem | ceryx: hmm, so maybe the scheduler created the allocations for that instance and each of those providers, but then the migration failed and we failed to cleanup the allocations created by the scheduler | 21:27 |
ceryx | In the allocations DB they were all created when we ran heal_allocations, though, according to the created_at date, so these aren't old allocations left over from the failures | 21:27 |
ceryx | All the allocations for this consumer_id across all 6 resource providers were created at 2019-04-11 20:25:57 | 21:27 |
mriedem | like i said, heal_allocations should only create allocations if the instance doesn't have any and even then should only be against the same node for the instance.host/node values | 21:27 |
mriedem | do you have the output of the command? | 21:28 |
melwitt | mriedem: ack. the more examples I see, the more I lean toward making it opt-in by default for now. until we have allocations issues more sorted out | 21:28 |
dansmith | also why I want to make the image filter (and other prefilters) opt-in | 21:29 |
dansmith | until we're sure they're 100% right for everyone | 21:29 |
* melwitt nods | 21:29 | |
*** tosky has joined #openstack-nova | 21:29 | |
mriedem | ceryx: this is the method that does the actual work per instance https://github.com/openstack/nova/blob/stable/rocky/nova/cmd/manage.py#L1811 | 21:29 |
mriedem | so the instance has to be on a node here https://github.com/openstack/nova/blob/stable/rocky/nova/cmd/manage.py#L1838 | 21:30 |
mriedem | then we check to see if it already has allocations https://github.com/openstack/nova/blob/stable/rocky/nova/cmd/manage.py#L1848 | 21:30 |
mriedem | if it doesn't, we get the compute node uuid which should match the provider https://github.com/openstack/nova/blob/stable/rocky/nova/cmd/manage.py#L1883 | 21:30 |
mriedem | so i'm not sure why that would create allocations for the same instance against 6 different providers | 21:31 |
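Condensing the three links above, the per-instance flow is roughly the following paraphrased sketch, with a toy in-memory "placement" so the example runs; none of these names are real nova internals:

```python
# Paraphrased sketch of the per-instance heal_allocations flow linked above;
# the data structures and names are stand-ins, not real nova internals.
def heal_allocations_for_instance(instance, allocations, node_uuids):
    if not instance.get("node"):
        return "skipped: no node set"
    if instance["uuid"] in allocations:
        return "skipped: instance already has allocations"
    # The compute node uuid should match the resource provider uuid.
    rp_uuid = node_uuids[(instance["host"], instance["node"])]
    allocations[instance["uuid"]] = {
        rp_uuid: {"VCPU": instance["flavor"]["vcpus"],
                  "MEMORY_MB": instance["flavor"]["memory_mb"]},
    }
    return "healed"

allocs = {}
nodes = {("host1", "host1"): "rp-uuid-1"}
inst = {"uuid": "inst-uuid", "host": "host1", "node": "host1",
        "flavor": {"vcpus": 2, "memory_mb": 2048}}
assert heal_allocations_for_instance(inst, allocs, nodes) == "healed"
assert heal_allocations_for_instance(inst, allocs, nodes) == \
    "skipped: instance already has allocations"
```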
mriedem | what would be more likely to me is what i said before - the scheduler created the allocations during the migration, it failed, and then we didn't clean up somewhere | 21:31 |
ceryx | mriedem: I don't have the full output, was not expecting this many allocations to get created and it ate up all my scrollback. | 21:32 |
* mriedem curses self for not adding the --dry-run option yet | 21:32 | |
ceryx | What would the process be for cleaning up allocations that exist but shouldn't? Would it be relatively safe to delete the allocations for this one consumer_id, then rerun heal_allocations and confirm what was added back? | 21:32 |
mriedem | sec | 21:32 |
mriedem | mnaser has a script i think | 21:33 |
mriedem | https://bugs.launchpad.net/nova/+bug/1793569 | 21:33 |
openstack | Launchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed] | 21:33 |
*** Sundar has quit IRC | 21:33 | |
mriedem | ceryx: so that bug has a link to a script from mnaser which it looks like just dumps commands to run, | 21:35 |
mriedem | and links to another tool from larsks | 21:35 |
mriedem | ceryx: "Would it be relatively safe to delete the allocations for this one consumer_id, then rerun heal_allocations and confirm what was added back?" - i think so, but i'd also like it better if you had a --dry-run option when doing that with heal_allocations as well, | 21:36 |
mriedem | which i could probably whip up real quick | 21:36 |
*** slaweq has joined #openstack-nova | 21:37 | |
*** wolverineav has quit IRC | 21:37 | |
ceryx | That would be awesome :D | 21:38 |
*** wolverineav has joined #openstack-nova | 21:38 | |
mriedem | ok will crank something out here | 21:38 |
imacdonn_ | so I seem to have a problem with Stein .. haven't fully diagnosed yet, but if I run online_data_migrations a second time, fill_virtual_interface_list fails with: | 21:40 |
imacdonn_ | 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 4050, in _security_group_ensure_default | 21:40 |
imacdonn_ | 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage default_group = _security_group_get_by_names(context, ['default'])[0] | 21:40 |
imacdonn_ | 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage TypeError: 'NoneType' object has no attribute '__getitem__' | 21:40 |
imacdonn_ | ring any bells ? | 21:40 |
imacdonn_ | http://paste.openstack.org/show/73aAO3bB23d62wBjt5nL/ | 21:42 |
mriedem | imacdonn_: not for me | 21:42 |
* mriedem needs help from other nova people at the help desk | 21:42 | |
*** wolverineav has quit IRC | 21:42 | |
imacdonn_ | k. I'll try to dig into it a bit. Tnx. | 21:43 |
*** wolverineav has joined #openstack-nova | 21:46 | |
efried | imacdonn_: Looking at that method, that should be impossible | 21:46 |
efried | oh | 21:47 |
efried | imacdonn_: Do you have more than one security group named 'default'? | 21:47 |
imacdonn_ | well, each project has one..... | 21:47 |
efried | It oughtta be filtering by project ID | 21:48 |
imacdonn_ | I do see two rows in the security_groups table with name="default" but project_id NULL .. not sure if that should be | 21:49 |
efried | but that's the only way that method can return None | 21:49 |
efried | mhm, that'd do it. | 21:49 |
efried | if you called with project_id NULL somehow | 21:49 |
efried | in your context | 21:49 |
efried | imacdonn_: Repeatable? | 21:49 |
*** gyee has joined #openstack-nova | 21:49 | |
mriedem | ah, | 21:49 |
mriedem | the online_data_migration is using an admin context, | 21:50 |
mriedem | which doesn't have a project_id | 21:50 |
imacdonn_ | I'm seeing it in two different installations - one was a fresh install, the other upgraded from Rocky | 21:50 |
*** slaweq has quit IRC | 21:50 | |
efried | mriedem: but that works fine unless there's multiple default security groups with project_id NULL, yah? | 21:51 |
efried | imacdonn_: So I can WIP a patch that ought to make the problem go away; or you can manually delete the extra row from your security groups table. | 21:51 |
mriedem | efried: he said he's running it twice | 21:52 |
efried | course it'd be nice to know how it got there | 21:52 |
melwitt | I really hope this isn't related to the user_id instance mapping migration somehow | 21:52 |
mriedem | and blows up the 2nd time right? | 21:52 |
mriedem | melwitt: he said it was the vifs one | 21:52 |
mriedem | "if I run online_data_migrations a second time, fill_virtual_interface_list fails with:" | 21:52 |
melwitt | yeah, but I added user_id to that, which shouldn't hurt | 21:52 |
imacdonn_ | yeah, it seemed to work OK the first time, but blows up on subsequent attempts ... not sure where these NULL security groups are coming from | 21:53 |
imacdonn_ | in the fresh install case, the projects probably didn't exist when migrations were run the first time | 21:54 |
melwitt | ok, I added the user_id for the fill virtual interface list instance mapping marker record. so shouldn't be related but just wanted to mention there was a change in stein there | 21:55 |
mriedem | the vifs migration was also new in stein | 22:07 |
*** luksky has quit IRC | 22:08 | |
melwitt | oh. nevermind me | 22:09 |
*** igordc has quit IRC | 22:09 | |
mriedem | imacdonn_: w/o looking at the code i think the migration is creating a marker instance record | 22:10 |
mriedem | which is why it's using an empty admin context with no project, | 22:10 |
mriedem | it should be using a sentinel for the project_id probably | 22:10 |
melwitt | it uses a sentinel of all zeros uuid | 22:10 |
mriedem | should be fairly easy to reproduce that by just modifying an existing test to run the command twice | 22:10 |
melwitt | https://github.com/openstack/nova/blob/master/nova/objects/virtual_interface.py#L303 | 22:11 |
mriedem | melwitt: but the context doesn't have that | 22:11 |
mriedem | and that's what the db api is looking for i think | 22:11 |
melwitt | oh, other direction | 22:12 |
mriedem | http://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py#n4037 | 22:12 |
melwitt | I see, ok | 22:13 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: WIP/PoC: Introduces the openstacksdk to nova https://review.openstack.org/643664 | 22:13 |
mriedem | imacdonn_: report a bug | 22:13 |
mriedem | i'm glad we didn't backport that data migration yet...i was worried about just backporting it before anyone was using it (besides ovh) since it's pretty complicated | 22:13 |
mriedem | maciejjozefczyk: ^ | 22:14 |
imacdonn_ | OK. Are we still using launchpad? I've been a bit out of the loop | 22:14 |
mriedem | ceryx: i've got this --dry-run patch coming, just building docs and running tests locally first | 22:14 |
mriedem | imacdonn_: of course | 22:14 |
mriedem | only crazy projects move to SB :) | 22:14 |
imacdonn_ | heh ok | 22:14 |
melwitt | imacdonn_: yes launchpad for nova. it's placement that has moved to storyboard | 22:14 |
*** tosky has quit IRC | 22:15 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add --dry-run option to heal_allocations CLI https://review.openstack.org/651932 | 22:16 |
mriedem | ceryx: eandersson: ^ should be backportable to rocky i think, that code hasn't changed much | 22:16 |
mriedem | or run it in a container or something | 22:16 |
*** slaweq has joined #openstack-nova | 22:16 | |
imacdonn_ | https://bugs.launchpad.net/nova/+bug/1824435 | 22:18 |
openstack | Launchpad bug 1824435 in OpenStack Compute (nova) "fill_virtual_interface_list migration fails on second attempt" [Undecided,New] | 22:18 |
*** chhagarw has joined #openstack-nova | 22:21 | |
mriedem | imacdonn_: thanks triaged - are you working a fix? | 22:24 |
imacdonn_ | mriedem, negative .. I don't think I understand the problem well though (yet?) | 22:25 |
*** rcernin has joined #openstack-nova | 22:26 | |
efried | mriedem: I still don't get how the duplicate row is getting created. | 22:29 |
efried | Shouldn't https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/db/sqlalchemy/api.py#L4037 only happen the first time? | 22:29 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: WIP/PoC: Use SDK instead of ironicclient for node.get https://review.openstack.org/642899 | 22:32 |
mriedem | hmm yeah i'm not sure how https://github.com/openstack/nova/blob/03322bb517925a9f5a04ebdb41c3fd31e7962440/nova/db/sqlalchemy/api.py#L3874 can fail to either return at least 1 or raise | 22:33 |
*** chhagarw has quit IRC | 22:35 | |
efried | oh, that part is because there's two rows in the database with the "right" project_id (NULL) | 22:35 |
efried | mriedem: So the initial check (==) fails because there's *more* db rows than expected | 22:36 |
efried | and then the for loop doesn't hit because all the names match (because they're the same) | 22:36 |
mriedem | ah yup | 22:36 |
mriedem | i don't know why the unique constraint doesn't blow up - because NULL isn't considered unique? | 22:36 |
efried | so the problem is that we're somehow creating two rows, and I don't know how that .... | 22:36 |
mriedem | schema.UniqueConstraint('project_id', 'name', 'deleted', | 22:36 |
mriedem | name='uniq_security_groups0project_id0' | 22:36 |
mriedem | 'name0deleted'), | 22:36 |
efried | calling all zzzeek? | 22:36 |
mriedem | imacdonn_: what db are you using? oracle? | 22:37 |
mriedem | or mysql? | 22:37 |
imacdonn_ | mysql | 22:37 |
mriedem | https://stackoverflow.com/questions/3712222/does-mysql-ignore-null-values-on-unique-constraints/16541686 | 22:38 |
mriedem | i remember something about this when i added db2 support to nova way back when | 22:38 |
mriedem | db2 was very strict about null values in a constraint but mysql isn't | 22:38 |
mriedem | our tests don't fail b/c we're using sqlite | 22:39 |
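This is standard SQL behaviour: two NULLs never compare equal, so a composite UNIQUE constraint does not stop a second 'default' row with a NULL project_id. A quick self-contained demonstration (using sqlite here only so the snippet runs anywhere; it shows the same NULL handling):

```python
# Demonstration that NULL values bypass a composite UNIQUE constraint;
# sqlite treats NULLs as distinct for uniqueness purposes, as MySQL does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE security_groups (
        project_id TEXT,
        name TEXT,
        deleted INTEGER,
        UNIQUE (project_id, name, deleted)
    )
""")
# Both inserts succeed even though they look like duplicates, because the
# NULL project_id values are never considered equal to each other.
conn.execute("INSERT INTO security_groups VALUES (NULL, 'default', 0)")
conn.execute("INSERT INTO security_groups VALUES (NULL, 'default', 0)")
assert conn.execute("SELECT COUNT(*) FROM security_groups").fetchone()[0] == 2
```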
efried | butbutbut | 22:39 |
efried | unique constraint or no | 22:39 |
efried | what, are we hitting that NotFound from multiple threads, reliably, at the same time?? | 22:39 |
*** slaweq has quit IRC | 22:40 | |
mriedem | i'm able to recreate the same thing in the db in my devstack http://paste.openstack.org/show/749218/ | 22:44 |
mriedem | but i'm not hitting errors running the online data migration | 22:44 |
mriedem | i created a server and ran the migrations a few times | 22:44 |
mriedem | imacdonn_: are you running the CLI concurrently or something? or able to recreate manually? | 22:45 |
imacdonn_ | mriedem, you mean like two instances of nova-manage at the same time? no.... I can run it manually and it fails consistently | 22:46 |
mriedem | on a fresh install? | 22:46 |
imacdonn_ | One of these was a fresh install, but I did create an instance at some point | 22:47 |
mriedem | ok i'm not sure how to recreate it then | 22:51 |
mriedem | ceryx: eandersson: it's getting late here for the work day but please follow up with me on whatever you figure out with this allocations thing | 22:52 |
melwitt | dansmith: would another option other than trying to lock for discover_hosts be to try-except around the host_mapping.create() and catch and ignore DBDuplicateEntry? | 22:52 |
melwitt | https://github.com/openstack/nova/blob/master/nova/objects/host_mapping.py#L193 | 22:54 |
mriedem | melwitt: here as well https://github.com/openstack/nova/blob/master/nova/objects/host_mapping.py#L211 | 22:54 |
mriedem | that seems reasonable though | 22:54 |
melwitt | yes, two places | 22:54 |
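A rough sketch of what that try/except would look like around the HostMapping create (an illustration of the suggestion only, not a merged change; whether to ignore the duplicate or abort is exactly what gets debated next):

```python
# Sketch of the suggestion only, not a merged nova change: tolerate a
# concurrent discover_hosts having already created the same HostMapping.
from oslo_db.exception import DBDuplicateEntry

from nova.objects.host_mapping import HostMapping

def map_host(ctxt, cell_mapping, host):
    host_mapping = HostMapping(ctxt, host=host, cell_mapping=cell_mapping)
    try:
        host_mapping.create()
    except DBDuplicateEntry:
        # Another discover_hosts run (periodic or CLI) won the race and
        # already created the mapping; either ignore and carry on, or abort
        # with a warning as suggested below.
        pass
    return host_mapping
```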
ceryx | mriedem: Thanks, will do. eandersson is building a new container now with that patch to test. | 22:55 |
dansmith | melwitt: obviously that will avoid the trace, yeah. I'd still abort and say "looks like you're doing naughty things, so I'm stopping" | 22:55 |
mriedem | ceryx: unfortunately there isn't a way to just run heal_allocations on a specific instance yet, but maybe you won't get too much output | 22:56 |
melwitt | dansmith: hm, ok | 22:56 |
mriedem | the discover hosts periodic is in the scheduler process right? | 22:56 |
mriedem | so you could have multiple schedulers running discover_hosts at the same time | 22:57 |
dansmith | mriedem: yup | 22:57 |
dansmith | yup | 22:57 |
mriedem | so...probably not a terrible idea to just handle the duplicate and move on | 22:57 |
melwitt | I guess I'm not sure why it's so bad because the database will synchronize everything anyway. if there's a collision, just ignore it | 22:57 |
dansmith | that periodic was really just for the ironic case where you'd have a small control plane | 22:57 |
mriedem | sure, but people are going to use knobs if we give them | 22:58 |
mriedem | i'm not saying that's what osa / chef are hitting here, but it's another possibility | 22:58 |
eandersson | Unfortunately a lot of changes between master and rocky in that script | 22:58 |
mriedem | eandersson: oh - probably the report client refactoring stuff... | 22:58 |
dansmith | melwitt: it's not so bad, it just seems like a bad idea to act like that's okay or expected.. we're looking for un-mapped records and adding host mappings, then marking those service records as mapped | 22:58 |
mriedem | eandersson: if we had a bug or something we could think about backporting that change upstream | 22:59 |
* efried just figured out why he hasn't been receiving bug mail: yet another place where email address needed to be changed | 22:59 | |
dansmith | melwitt: if you make sure none of that gets skipped (like one gets set mapped without a mapping being created, etc) then it's okay I guess | 22:59 |
mriedem | heh ibm loves the email | 22:59 |
efried | pretty sure they're bouncing 'em, cause I got a snail mail letter from another thing where I hadn't changed it. | 23:00 |
dansmith | melwitt: and the scheduler periodic is a case where we're kinda inviting you to run multiple in parallel, so there's that | 23:00 |
melwitt | dansmith: yeah... I could see that too. at the same time, I could see two discover_hosts happening to overlap. and yeah would have to be done with care | 23:00 |
*** tkajinam has joined #openstack-nova | 23:00 | |
dansmith | melwitt: but if we just skip that one and keep scanning, then you could have multiples of those just hammering your database when you bring a bunch of nodes online, all but one of them losing on every attempt | 23:01 |
melwitt | I was just thinking about the lock thing and wondered about ignoring dupes | 23:01 |
dansmith | if they all backoff and stop, then the one that won will proceed | 23:01 |
dansmith | the lock thing is just a hack to handle the sloppy ansible case | 23:01 |
dansmith | or puppet or whatever | 23:01 |
melwitt | yeah | 23:01 |
*** wolverineav has quit IRC | 23:01 | |
dansmith | since we have the same case for other things in nova-manage, | 23:02 |
dansmith | like db_sync, online_data_migrations, etc, it seems like we should just prescribe that those are to be run singly | 23:02 |
dansmith | and if not, we just need to fix it all, otherwise we're inviting confusion | 23:02 |
melwitt | that's true | 23:02 |
*** wolverineav has joined #openstack-nova | 23:02 | |
*** wolverineav has quit IRC | 23:02 | |
*** wolverineav has joined #openstack-nova | 23:02 | |
dansmith | heck, archive and purge probably explode if you run them in parallel | 23:03 |
dansmith | and probably map_instances | 23:03 |
melwitt | haha yeah | 23:03 |
melwitt | so the only valid concern would be the periodics | 23:03 |
dansmith | yeah, the periodic is legit, although like I said we added that for tripleo undercloud where they didn't have instrumentation to even run it, | 23:04 |
dansmith | and they only have one controller node | 23:04 |
melwitt | and I guess that would resolve itself eventually as the periodics keep running? | 23:04 |
dansmith | so we could *also* just augment the help for that and place a warning and/or make sure it just logs a warning and doesn't make too much noise in the logs | 23:04 |
dansmith | yes | 23:04 |
melwitt | like you fail by bad luck and then next time you'll get it | 23:05 |
dansmith | right | 23:05 |
dansmith | it's a slow lazy discovery anyway, so who cares, and if one fails, the other likely succeeded and mapped everything anyway | 23:05 |
melwitt | yeah, I definitely wanted to make docs/usage update to help with this somehow | 23:05 |
*** cdent has quit IRC | 23:05 | |
dansmith | jesus, is there no mystique in the art of running a nova these days? | 23:05 |
melwitt | yeah | 23:05 |
dansmith | always with the documentation | 23:06 |
melwitt | lol, mystique | 23:06 |
*** wolverineav has quit IRC | 23:10 | |
*** wolverineav has joined #openstack-nova | 23:13 | |
*** owalsh_ has joined #openstack-nova | 23:14 | |
*** owalsh has quit IRC | 23:15 | |
openstackgerrit | sean mooney proposed openstack/nova master: extend libvirt video model support https://review.openstack.org/647733 | 23:17 |
*** owalsh has joined #openstack-nova | 23:21 | |
*** kukacz has quit IRC | 23:21 | |
*** dpawlik has quit IRC | 23:21 | |
*** N3l1x has quit IRC | 23:21 | |
*** aspiers has quit IRC | 23:21 | |
*** amorin has quit IRC | 23:21 | |
*** bbobrov has quit IRC | 23:21 | |
*** antonym has quit IRC | 23:21 | |
*** jdillaman has quit IRC | 23:21 | |
*** jlvillal has quit IRC | 23:21 | |
*** owalsh_ has quit IRC | 23:22 | |
openstackgerrit | Merged openstack/nova stable/stein: Update instance.availability_zone on revertResize https://review.openstack.org/648402 | 23:23 |
*** mgoddard has quit IRC | 23:24 | |
*** sambetts_ has quit IRC | 23:25 | |
*** sambetts_ has joined #openstack-nova | 23:25 | |
*** mgoddard has joined #openstack-nova | 23:27 | |
*** kukacz has joined #openstack-nova | 23:27 | |
*** dpawlik has joined #openstack-nova | 23:27 | |
*** N3l1x has joined #openstack-nova | 23:27 | |
*** aspiers has joined #openstack-nova | 23:27 | |
*** amorin has joined #openstack-nova | 23:27 | |
*** bbobrov has joined #openstack-nova | 23:27 | |
*** antonym has joined #openstack-nova | 23:27 | |
*** jdillaman has joined #openstack-nova | 23:27 | |
*** jlvillal has joined #openstack-nova | 23:27 | |
*** owalsh_ has joined #openstack-nova | 23:29 | |
*** owalsh has quit IRC | 23:30 | |
*** owalsh has joined #openstack-nova | 23:35 | |
*** wolverineav has quit IRC | 23:36 | |
*** owalsh_ has quit IRC | 23:36 | |
zzzeek | efried: hey | 23:41 |
*** owalsh_ has joined #openstack-nova | 23:42 | |
*** owalsh has quit IRC | 23:43 | |
*** hongbin has quit IRC | 23:43 | |
*** igordc has joined #openstack-nova | 23:44 | |
openstackgerrit | Merged openstack/nova stable/stein: Temporarily mutate migration object in finish_revert_resize https://review.openstack.org/648688 | 23:47 |
*** owalsh has joined #openstack-nova | 23:51 | |
*** owalsh_ has quit IRC | 23:52 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!