openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (8) https://review.openstack.org/575311 | 00:07 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (9) https://review.openstack.org/575581 | 00:07 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (10) https://review.openstack.org/576017 | 00:07 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (11) https://review.openstack.org/576018 | 00:18 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (12) https://review.openstack.org/576019 | 00:18 |
*** itlinux has joined #openstack-nova | 00:23 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (13) https://review.openstack.org/576020 | 00:32 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (14) https://review.openstack.org/576027 | 00:38 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (15) https://review.openstack.org/576031 | 00:38 |
openstackgerrit | liuming proposed openstack/nova master: Deletes evacuated instance files when source host is ok https://review.openstack.org/605987 | 00:41 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: api-ref: Remove a description in servers-actions.inc https://review.openstack.org/608796 | 00:44 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (16) https://review.openstack.org/576299 | 00:49 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (17) https://review.openstack.org/576344 | 00:49 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (18) https://review.openstack.org/576673 | 00:57 |
*** lyarwood has quit IRC | 00:59 | |
*** tristanC has quit IRC | 01:01 | |
*** tristanC has joined #openstack-nova | 01:01 | |
*** hongbin has joined #openstack-nova | 01:08 | |
*** hshiina has joined #openstack-nova | 01:09 | |
*** slaweq has joined #openstack-nova | 01:11 | |
openstackgerrit | Merged openstack/nova master: Fix nits in choices documentation https://review.openstack.org/608310 | 01:12 |
naichuans | bauzas: Thanks. | 01:14 |
*** slaweq has quit IRC | 01:15 | |
*** slagle has joined #openstack-nova | 01:20 | |
*** mrsoul has quit IRC | 01:35 | |
openstackgerrit | lei zhang proposed openstack/nova master: Remove useless TODO section https://review.openstack.org/608802 | 01:44 |
*** Dinesh_Bhor has joined #openstack-nova | 01:44 | |
*** TuanDA has joined #openstack-nova | 01:54 | |
*** hshiina has quit IRC | 01:55 | |
*** mhen has quit IRC | 01:58 | |
*** Dinesh_Bhor has quit IRC | 01:59 | |
*** mhen has joined #openstack-nova | 02:00 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (19) https://review.openstack.org/576676 | 02:05 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (20) https://review.openstack.org/576689 | 02:05 |
*** hshiina has joined #openstack-nova | 02:14 | |
openstackgerrit | Takashi NATSUME proposed openstack/python-novaclient master: Replace MB with MiB https://review.openstack.org/608807 | 02:21 |
openstackgerrit | Takashi NATSUME proposed openstack/nova stable/rocky: Remove unnecessary redirect https://review.openstack.org/607400 | 02:28 |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add compute version 36 to support ``volume_type`` https://review.openstack.org/579360 | 02:34 |
*** psachin has joined #openstack-nova | 02:40 | |
*** lbragstad has joined #openstack-nova | 02:43 | |
*** spatel has quit IRC | 02:49 | |
openstackgerrit | Merged openstack/nova master: Remove redundant irrelevant-files from neutron-tempest-linuxbridge https://review.openstack.org/606989 | 02:51 |
*** tetsuro has joined #openstack-nova | 02:52 | |
*** Dinesh_Bhor has joined #openstack-nova | 02:55 | |
*** psachin has quit IRC | 03:10 | |
openstackgerrit | Takashi NATSUME proposed openstack/python-novaclient master: doc: Start using openstackdoctheme's extlink extension https://review.openstack.org/608829 | 03:23 |
*** psachin has joined #openstack-nova | 03:44 | |
*** lbragstad has quit IRC | 03:46 | |
*** lbragstad has joined #openstack-nova | 03:48 | |
*** lbragstad has quit IRC | 03:58 | |
*** tetsuro has quit IRC | 03:58 | |
*** hongbin has quit IRC | 04:00 | |
*** udesale has joined #openstack-nova | 04:07 | |
*** Dinesh_Bhor has quit IRC | 04:09 | |
*** jaosorior has joined #openstack-nova | 04:28 | |
*** pcaruana has joined #openstack-nova | 04:38 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add compute version 36 to support ``volume_type`` https://review.openstack.org/579360 | 04:48 |
alex_xu | jaypipes: sorry, I have one more question on it https://review.openstack.org/#/c/555081/20, it's my last question I think, hope the never-ending spec review isn't getting too annoying | 04:51 |
*** Dinesh_Bhor has joined #openstack-nova | 04:51 | |
*** ratailor has joined #openstack-nova | 04:53 | |
*** brinzhang has joined #openstack-nova | 04:53 | |
*** tetsuro has joined #openstack-nova | 04:54 | |
*** pooja-jadhav has joined #openstack-nova | 05:03 | |
*** ShilpaSD has quit IRC | 05:04 | |
*** pooja_jadhav has quit IRC | 05:04 | |
*** ShilpaSD has joined #openstack-nova | 05:06 | |
*** mdbooth has joined #openstack-nova | 05:11 | |
*** mdbooth has quit IRC | 05:13 | |
*** pooja-jadhav is now known as pooja_jadhav | 05:19 | |
*** janki has joined #openstack-nova | 05:25 | |
openstackgerrit | Naichuan Sun proposed openstack/nova master: xenapi(N-R-P): Add API to support vgpu resource provider create https://review.openstack.org/520313 | 05:26 |
openstackgerrit | Naichuan Sun proposed openstack/nova master: xenapi(N-R-P):Get vgpu info from `allocations` https://review.openstack.org/521717 | 05:28 |
openstackgerrit | Naichuan Sun proposed openstack/nova master: xenapi(N-R-P): support compute node resource provider update https://review.openstack.org/521041 | 05:28 |
openstackgerrit | Naichuan Sun proposed openstack/nova master: os-xenapi(n-rp): add traits for vgpu n-rp https://review.openstack.org/604269 | 05:28 |
*** tetsuro has quit IRC | 05:34 | |
*** Luzi has joined #openstack-nova | 05:48 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: api-ref: Replace non UUID string with UUID https://review.openstack.org/608854 | 06:02 |
*** adrianc has joined #openstack-nova | 06:13 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Transform volume.usage notification https://review.openstack.org/580345 | 06:28 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Transform compute_task notifications https://review.openstack.org/482629 | 06:29 |
*** brinzhang has quit IRC | 06:30 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata https://review.openstack.org/604260 | 06:30 |
*** brinzhang has joined #openstack-nova | 06:31 | |
*** tetsuro has joined #openstack-nova | 06:32 | |
*** ttsiouts has joined #openstack-nova | 06:34 | |
*** mdbooth has joined #openstack-nova | 06:55 | |
*** slaweq has joined #openstack-nova | 06:58 | |
*** mvkr has quit IRC | 06:59 | |
*** rcernin has quit IRC | 07:08 | |
*** helenafm has joined #openstack-nova | 07:08 | |
*** alexchadin has joined #openstack-nova | 07:24 | |
*** ralonsoh has joined #openstack-nova | 07:24 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (21) https://review.openstack.org/576709 | 07:29 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (22) https://review.openstack.org/576712 | 07:29 |
*** ttsiouts has quit IRC | 07:30 | |
*** ttsiouts has joined #openstack-nova | 07:31 | |
*** ttsiouts has quit IRC | 07:35 | |
*** ttsiouts has joined #openstack-nova | 07:36 | |
bauzas | good morning nova | 07:36 |
*** ttsiouts has quit IRC | 07:38 | |
*** ttsiouts has joined #openstack-nova | 07:38 | |
*** tetsuro has quit IRC | 07:39 | |
*** belmoreira has joined #openstack-nova | 07:40 | |
*** alexchadin has quit IRC | 07:41 | |
*** priteau has joined #openstack-nova | 07:48 | |
*** priteau has quit IRC | 07:50 | |
*** ttsiouts has quit IRC | 07:52 | |
*** mdbooth has quit IRC | 07:53 | |
*** mdbooth has joined #openstack-nova | 07:54 | |
*** jangutter has quit IRC | 07:55 | |
*** jangutter has joined #openstack-nova | 07:55 | |
*** hshiina has quit IRC | 07:55 | |
gibi | good morning | 07:57 |
gibi | bauzas, mriedem, jaypipes, efried: I saw the discussion about force live migrate on the scheduler meeting. I think I will post a mail with a summary of the different pieces and possibilities on the ML | 07:58 |
bauzas | ok cool | 07:58 |
openstackgerrit | Rodolfo Alonso Hernandez proposed openstack/os-vif master: Remove IPTools deprecated implementation https://review.openstack.org/605422 | 07:59 |
openstackgerrit | Jan Gutter proposed openstack/os-vif master: Add support for generic representors https://review.openstack.org/608693 | 08:02 |
*** dtantsur|afk is now known as dtantsur | 08:08 | |
*** mdbooth has quit IRC | 08:09 | |
*** ttsiouts has joined #openstack-nova | 08:09 | |
*** mdbooth has joined #openstack-nova | 08:10 | |
*** Dinesh_Bhor has quit IRC | 08:13 | |
*** janki is now known as janki|lunch | 08:16 | |
*** tetsuro has joined #openstack-nova | 08:26 | |
*** mdbooth has quit IRC | 08:28 | |
*** mdbooth has joined #openstack-nova | 08:37 | |
*** priteau has joined #openstack-nova | 08:45 | |
*** Dinesh_Bhor has joined #openstack-nova | 08:49 | |
*** mdbooth has quit IRC | 08:50 | |
*** dpawlik has quit IRC | 08:51 | |
*** dpawlik has joined #openstack-nova | 08:52 | |
*** trungnv has joined #openstack-nova | 08:53 | |
*** psachin has quit IRC | 08:54 | |
mnaser | fwiw | 08:54 |
mnaser | http://logs.openstack.org/15/608315/4/check/gpu-test/7243464/job-output.txt.gz | 08:54 |
mnaser | https://review.openstack.org/#/c/608315/ | 08:54 |
mnaser | you should be able to do tests with access to a k80 gpu | 08:54 |
mnaser | with nested virt too | 08:56 |
mnaser | cc sean-k-mooney ^ | 08:56 |
*** lbragstad has joined #openstack-nova | 08:56 | |
*** belmorei_ has joined #openstack-nova | 08:57 | |
*** belmoreira has quit IRC | 08:57 | |
*** mdbooth has joined #openstack-nova | 09:21 | |
*** panda has quit IRC | 09:22 | |
*** mvkr has joined #openstack-nova | 09:22 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova-specs master: Detach and attach boot volumes - Stein https://review.openstack.org/600628 | 09:24 |
*** panda has joined #openstack-nova | 09:24 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add compute API version for when a ``volume_type`` is requested https://review.openstack.org/605573 | 09:25 |
jaypipes | gibi: cool with me. thank you! | 09:36 |
*** adrianc has quit IRC | 09:38 | |
*** imacdonn has quit IRC | 09:39 | |
gibi | bauzas, mriedem, jaypipes, efried: I've posted the mail. Sorry it turned out as a long one. http://lists.openstack.org/pipermail/openstack-dev/2018-October/135551.html | 09:42 |
jaypipes | gibi: :) no worries, it's a complicated subject. | 09:43 |
sean-k-mooney | mnaser: oh cool. bauzas should be pleased. thanks :) now we just need to figure out how to use devstack to deploy with vgpus | 09:46 |
sean-k-mooney | bauzas: do you have a local.conf/devstack plugin that automates the setup | 09:46 |
*** vabada has quit IRC | 09:46 | |
mnaser | yeah feel free to hack away at it | 09:47 |
*** k_mouza has joined #openstack-nova | 09:48 | |
*** dave-mccowan has joined #openstack-nova | 09:49 | |
sean-k-mooney | mnaser: actually looking at https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html nvidia may have locked out the k80s... we can still try them | 09:49 |
mnaser | #justnvidiathings | 09:50 |
sean-k-mooney | mnaser: you know, I'm really looking forward to the point where someone implements vgpu support in the nouveau | 09:51 |
sean-k-mooney | driver so there are no nvidia locks on the hardware | 09:51 |
*** imacdonn has joined #openstack-nova | 09:52 | |
sean-k-mooney | that said, I personally use the nvidia binary driver because of the performance | 09:52 |
mnaser | i'd say this is where things go beyond my knowledge | 09:54 |
sean-k-mooney | nouveau is the open source linux driver for nvidia gpus. the official binary blob gives you about 10-15% better performance in games and I think it's required for some of the nvidia-only tech like HairWorks. for vgpus, instead of the normal driver you run their GRID driver, which is designed for their datacenter gpus | 09:56 |
sean-k-mooney | that said, I'm 98% certain that if you could remove the SKU check it would likely work on their desktop and workstation gpus, but nvidia wants to charge for the privilege of using virtualisation with their gpus | 09:57 |
*** vabada has joined #openstack-nova | 09:57 | |
*** janki|lunch is now known as janki | 10:02 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova stable/rocky: Add alloc cands test with nested and aggregates https://review.openstack.org/607454 | 10:02 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova stable/rocky: Fix aggregate members in nested alloc candidates https://review.openstack.org/608903 | 10:02 |
*** TuanDA has quit IRC | 10:03 | |
*** Dinesh_Bhor has quit IRC | 10:06 | |
*** k_mouza has quit IRC | 10:15 | |
*** k_mouza has joined #openstack-nova | 10:15 | |
*** lbragstad has quit IRC | 10:16 | |
*** k_mouza has quit IRC | 10:20 | |
*** adrianc has joined #openstack-nova | 10:24 | |
*** markvoelker has joined #openstack-nova | 10:25 | |
openstackgerrit | Merged openstack/nova master: conf: Gather 'live_migration_scheme', 'live_migration_inbound_addr' https://review.openstack.org/456572 | 10:28 |
*** nehaalhat_ has joined #openstack-nova | 10:29 | |
*** k_mouza has joined #openstack-nova | 10:29 | |
nehaalhat_ | Hi, can any one help me to merge this patch: https://review.openstack.org/#/c/581218/ | 10:29 |
*** takashin has left #openstack-nova | 10:30 | |
*** tetsuro has quit IRC | 10:35 | |
*** tetsuro has joined #openstack-nova | 10:40 | |
*** tbachman has quit IRC | 10:42 | |
*** moshele has joined #openstack-nova | 10:42 | |
*** ttsiouts has quit IRC | 10:44 | |
*** mvkr has quit IRC | 10:44 | |
*** k_mouza has quit IRC | 10:44 | |
*** alexchadin has joined #openstack-nova | 10:45 | |
*** k_mouza has joined #openstack-nova | 10:45 | |
*** k_mouza has quit IRC | 10:48 | |
*** k_mouza has joined #openstack-nova | 10:48 | |
*** udesale has quit IRC | 10:52 | |
*** Dinesh_Bhor has joined #openstack-nova | 10:52 | |
*** markvoelker has quit IRC | 10:58 | |
*** threestrands has joined #openstack-nova | 11:01 | |
*** ttsiouts has joined #openstack-nova | 11:01 | |
*** mvkr has joined #openstack-nova | 11:02 | |
*** k_mouza has quit IRC | 11:05 | |
*** ttsiouts has quit IRC | 11:09 | |
*** ttsiouts has joined #openstack-nova | 11:13 | |
*** erlon_ has joined #openstack-nova | 11:16 | |
*** slagle has quit IRC | 11:22 | |
*** jangutter has quit IRC | 11:29 | |
*** jangutter has joined #openstack-nova | 11:30 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add microversion 2.67 to support volume_type https://review.openstack.org/606398 | 11:32 |
*** alexchadin has quit IRC | 11:39 | |
openstackgerrit | Merged openstack/nova master: Move test.nested to utils.nested_contexts https://review.openstack.org/608416 | 11:46 |
*** alexchadin has joined #openstack-nova | 11:47 | |
*** markvoelker has joined #openstack-nova | 11:51 | |
*** ttsiouts has quit IRC | 12:01 | |
*** ttsiouts has joined #openstack-nova | 12:02 | |
*** tetsuro has quit IRC | 12:07 | |
*** Dinesh_Bhor has quit IRC | 12:09 | |
*** ttsiouts has quit IRC | 12:10 | |
*** brinzhang has quit IRC | 12:17 | |
*** ttsiouts has joined #openstack-nova | 12:22 | |
*** tbachman has joined #openstack-nova | 12:27 | |
*** ratailor has quit IRC | 12:36 | |
pooja_jadhav | Hi team, can anyone help me with https://github.com/openstack/nova/blob/85b36cd2f82ccd740057c1bee08fc722209604ab/nova/tests/functional/api_sample_tests/test_simple_tenant_usage.py#L85-L93.. When we run the test "test_get_tenants_usage" and pass instance_uuid_1 in the query, then check the instances in the simple tenant usages controller, we can see the instance-2 object data in the instances list. If we pass instance-2 in the | 12:41 |
pooja_jadhav | query then we can see instance-3 in the instances. So how is it actually working? | 12:41 |
openstackgerrit | Lucian Petrut proposed openstack/nova master: Fix os-simple-tenant-usage result order https://review.openstack.org/608685 | 12:45 |
bauzas | mnaser: sean-k-mooney: sorry was afk | 12:49 |
bauzas | mnaser: thanks for the proposal, but unfortunately, AFAIK, k80 devices aren't supported by nvidia for vGPUs | 12:50 |
* bauzas raises fist | 12:50 | |
bauzas | gibi: saw your thread, I need proper time to read it and reply to it | 12:51 |
* bauzas is currently stuck in downstream universe | 12:51 | |
*** skatsaounis has quit IRC | 12:53 | |
*** skatsaounis has joined #openstack-nova | 12:55 | |
gibi | bauzas: sure, it needs time. The reason for the mail was to summarize the problem, as it is pretty hard to solve it consistently without seeing every corner of it | 12:55 |
*** adrianc has quit IRC | 12:57 | |
*** lpetrut has joined #openstack-nova | 12:58 | |
*** alexchadin has quit IRC | 12:59 | |
stephenfin | dansmith: What's the by-service approach you refer to here? https://review.openstack.org/#/c/608703/ (link to a spec/commit is good) | 13:02 |
*** adrianc has joined #openstack-nova | 13:03 | |
*** dpawlik has quit IRC | 13:07 | |
*** ociuhandu has joined #openstack-nova | 13:07 | |
*** udesale has joined #openstack-nova | 13:13 | |
*** panda has quit IRC | 13:13 | |
*** ttsiouts has quit IRC | 13:13 | |
*** panda has joined #openstack-nova | 13:14 | |
*** mriedem has joined #openstack-nova | 13:19 | |
*** udesale has quit IRC | 13:21 | |
*** janki has quit IRC | 13:23 | |
*** mvkr has quit IRC | 13:23 | |
*** lbragstad has joined #openstack-nova | 13:28 | |
dansmith | stephenfin: did you see the link to the bug? | 13:32 |
dansmith | stephenfin: regardless, as I said, I think mentioning the discover step (whether by service or regular) in the devstack docs is the right thing to do | 13:32 |
stephenfin | Yup, that's the plan. Didn't click the bug link though. Will do now | 13:33 |
stephenfin | Well, soon as I'm done with downstream fun | 13:33 |
*** ttsiouts has joined #openstack-nova | 13:37 | |
*** awaugama has joined #openstack-nova | 13:42 | |
*** liuyulong has joined #openstack-nova | 13:43 | |
*** adrianc has quit IRC | 13:44 | |
efried | gibi: Can you help me understand some basics of selecting a destination host during evac/migrate? | 13:47 |
*** hongbin has joined #openstack-nova | 13:48 | |
*** mlavalle has joined #openstack-nova | 13:50 | |
efried | or dansmith bauzas | 13:51 |
efried | You can specify a host without the force flag and we'll run GET /a_c, right? | 13:51 |
dansmith | efried: I'm not sure what you're asking.. it's not really any different | 13:51 |
dansmith | yeah, IIRC | 13:51 |
dansmith | only the force flag makes us totally skip I think | 13:51 |
efried | And then what happens if the GET /a_c returns no candidates for the requested host? | 13:51 |
efried | Do we fail or do we select a different host? | 13:51 |
*** ttsiouts has quit IRC | 13:52 | |
dansmith | we should fail | 13:53 |
dansmith | lemme find a thread to pull | 13:53 |
dansmith | https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L983-L995 | 13:55 |
*** mchlumsky has joined #openstack-nova | 13:55 | |
dansmith | schedule with a single host in the destination field.. if we get back novalidhost, we error the migration | 13:56 |
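A minimal sketch of the flow dansmith describes above (not the linked conductor code itself; `scheduler_client` and the exact call signature are assumptions):

```python
from nova import exception


def schedule_to_requested_host(scheduler_client, context, request_spec,
                               migration):
    # request_spec is assumed to already be restricted to the single
    # requested destination host (requested_destination in the real code)
    try:
        selection = scheduler_client.select_destinations(
            context, request_spec)[0]
    except exception.NoValidHost:
        # the requested host had no allocation candidates or failed the
        # filters, so the migration is put into an error state rather
        # than silently landing somewhere else
        migration.status = 'error'
        migration.save()
        raise
    return selection
```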
*** adrianc has joined #openstack-nova | 13:56 | |
efried | So what use was the force flag ever supposed to be? Literally an optimization to avoid running some code, but behaviorally/algorithmically no difference? | 13:58 |
efried | Because in theory since we're using placement now, which is fast, that optimization buys us almost nothing. So if ^ is true, the force flag is basically obsolete anyway. | 14:00 |
*** eharney has joined #openstack-nova | 14:00 | |
mriedem | i can field this one... | 14:02 |
mriedem | efried: before the force flag, specifying a host at all bypassed the scheduler, | 14:02 |
mriedem | then a microversion was added which made passing a host go through the scheduler for validation, but apparently at least one person thought we should preserve the ability to bypass the scheduler, so the force flag was added as that backdoor | 14:02 |
*** ttsiouts has joined #openstack-nova | 14:02 | |
efried | What was the motivation to "bypass the scheduler"? Because it was inefficient? | 14:03 |
*** s1061123 has quit IRC | 14:03 | |
*** beagles is now known as beagles_cough | 14:03 | |
*** beagles_cough is now known as beagles | 14:03 | |
mriedem | idk, i'm assuming to oversubscribe a host just to move things around | 14:03 |
mriedem | at least temporarily | 14:03 |
efried | is oversubscribe even possible at this point? | 14:04 |
efried | (Outside of allocation ratio, which doesn't count) | 14:04 |
mriedem | i don't think so, at least not for vcpu/ram/dis | 14:04 |
mriedem | *disk | 14:04 |
mriedem | as i noted in gibi's patch - we broke that in pike when force still goes through claiming resource allocations in conductor | 14:04 |
sean-k-mooney | efried: if it bypassed the scheduler it probably bypassed the placement claim too, but I have never checked that | 14:04 |
mriedem | sean-k-mooney: incorrect | 14:05 |
efried | Yeah, that's the point. Since we started claiming from placement, you can't oversubscribe. | 14:05 |
mriedem | efried: in the old days, before placement, if you bypass the scheduler, conductor would not send limits down to the RT so it wouldn't fail the limits check on the resource claim | 14:05 |
sean-k-mooney | mriedem: was that unintentional, though? as you just said, we "broke" that in pike | 14:05 |
*** Luzi has quit IRC | 14:05 | |
efried | has anyone screamed about that breakage? | 14:05 |
efried | Or do we still not have enough serious operators on pike yet? :P | 14:06 |
*** s1061123 has joined #openstack-nova | 14:06 | |
mriedem | there are like 3 people i now on >= pike but no... | 14:06 |
mriedem | *know | 14:06 |
edleafe | efried: one reason for the force option was that admins wanted to be able to say "I know what I'm doing, dammit!". It wasn't about code efficiency | 14:06 |
efried | edleafe: But IIUC, the destination host is observed regardless. | 14:07 |
mriedem | observed? | 14:07 |
efried | meaning you either get on the suggested host or you die | 14:07 |
*** itlinux has quit IRC | 14:07 | |
mriedem | correct | 14:07 |
efried | I'm not sure how we would "fix" the oversubscribe thing at this point, without adding a placement feature to allow it. | 14:07 |
mriedem | i don't think adding features to support shooting yourself is something we want to do at this point | 14:08 |
efried | which I doubt we want to do | 14:08 |
efried | "Hey placement, you know that one thing you're supposed to be designed to do? Yeah, don't do that." | 14:08 |
edleafe | efried: it isn't just oversubscription that would cause the scheduler to reject it. Things like affinity, etc., would also cause it to fail | 14:08 |
mriedem | note that even with forced live migration, conductor still runs some checks that could cause us to reject the host | 14:08 |
mriedem | edleafe: nope | 14:09 |
efried | you mean besides the allocation claim? | 14:09 |
mriedem | we don't do any affinity checks outside of the scheduler for live migration | 14:09 |
edleafe | mriedem: without the force option, we do | 14:09 |
mriedem | in the scheduler | 14:09 |
mriedem | but that's more to my point - force is bad | 14:10 |
efried | I think what I'm getting at is, force isn't so much bad as... obsolete? | 14:10 |
mriedem | you could screw up AZs too | 14:10 |
efried | Like, it doesn't do anything useful anymore. | 14:10 |
edleafe | well, yeah, but admins wanted to be able to override | 14:10 |
mriedem | efried: agree | 14:10 |
edleafe | I'm not saying it's right | 14:10 |
mriedem | efried: it made a bit more sense pre-placement | 14:10 |
edleafe | Just that that was the pushback at the time | 14:10 |
efried | because oversubscribe | 14:10 |
*** ttsiouts has quit IRC | 14:11 | |
efried | Okay, thanks, this helps validate my response to gibi's thread. | 14:11 |
openstackgerrit | Merged openstack/os-vif master: Remove IPTools deprecated implementation https://review.openstack.org/605422 | 14:13 |
*** mugsie has joined #openstack-nova | 14:14 | |
openstackgerrit | Jan Gutter proposed openstack/os-vif master: Extend port profiles with datapath offload type https://review.openstack.org/572081 | 14:15 |
*** ttsiouts has joined #openstack-nova | 14:17 | |
belmorei_ | hi. I'm having an issue with Ironic allocations in placement | 14:17 |
*** eharney has quit IRC | 14:18 | |
belmorei_ | We define the Ironic flavors with resources:VCPU=0, ... as documented in "https://docs.openstack.org/ironic/queens/install/configure-nova-flavors.html". | 14:18 |
belmorei_ | What I see is that allocations are only done for the resource class, and not vcpus, ram, disk. I imagine this is intentional? | 14:18 |
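For reference, the linked ironic docs describe baremetal flavor extra specs of roughly this shape (CUSTOM_BAREMETAL_GOLD is a placeholder resource class name):

```python
# extra specs on a baremetal flavor: zero out the standard resource
# classes and request one unit of the node's custom resource class
bm_flavor_extra_specs = {
    'resources:VCPU': '0',
    'resources:MEMORY_MB': '0',
    'resources:DISK_GB': '0',
    'resources:CUSTOM_BAREMETAL_GOLD': '1',
}
```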
efried | what do you mean, only done for the resource class? | 14:20 |
belmorei_ | the allocation for the resource_provider is only the ironic resource_class | 14:20 |
dansmith | belmorei_: yes, expected | 14:20 |
belmorei_ | great. Let me describe the issue | 14:21 |
belmorei_ | The problem is that we enable the ironic flavors per project (these projects are only for baremetal) (projects are mapped to a cell that only has the baremetal nodes) | 14:24 |
belmorei_ | However, users also have access to the "default" flavors that are for VMs in these projects (we can't remove public flavors) | 14:24 |
*** spatel has joined #openstack-nova | 14:25 | |
belmorei_ | If a user makes the mistake to use a default flavor in these projects (flavor for VMs) placement can return already in use baremetal nodes because they have cpu, ram, ... | 14:26 |
dansmith | belmorei_: the baremetal nodes should be exposing no cpu,ram,etc inventory | 14:27 |
dansmith | belmorei_: they should expose one inventory item of the baremetal resource class and nothing else | 14:27 |
belmorei_ | dansmith: Good to know I would expect that, but it's not happening. Maybe a conf issue in my side | 14:28 |
dansmith | belmorei_: yeah, I'm not sure how that could be happening anymore.. jroll dtantsur ? | 14:29 |
* dtantsur is on a meeting, but will check soon | 14:29 | |
mriedem | https://github.com/openstack/nova/blob/stable/queens/nova/virt/ironic/driver.py#L790 | 14:29 |
mriedem | we still reported cpu/ram/disk inventory for ironic nodes in queens | 14:29 |
mriedem | https://github.com/openstack/nova/commit/a985e34cdeef777fe7ff943e363a5f1be6d991b7#diff-1e4547e2c3b36b8f836d8f851f85fde7 removed that in stein | 14:30 |
dansmith | mriedem: um, we should have had a cutover so we're not exposing both right? | 14:31 |
dansmith | I thought that was like pike | 14:31 |
dansmith | comment there says zero in pike | 14:31 |
jroll | dansmith: we didn't do the cutover so people could migrate their flavor | 14:31 |
dansmith | jroll: right, but in queens? | 14:32 |
jroll | this is the first I've heard of this, fwiw, though it makes sense | 14:32 |
jroll | dansmith: people were scared to remove it | 14:32 |
dansmith | hmm | 14:32 |
jroll | a quick workaround would likely be to set 0 for cpu/ram in ironic node.properties | 14:33 |
jroll | and then it will report 0 | 14:33 |
dansmith | I thought this was long since sorted | 14:33 |
jroll | yeah, I thought it just worked as well | 14:33 |
dansmith | what was the plan during the overlap to avoid exposing the old values once everything was migrated? | 14:33 |
*** eharney has joined #openstack-nova | 14:33 | |
dansmith | override properties in ironic? | 14:33 |
dansmith | s/everything/flavors/ | 14:33 |
jroll | I don't remember, sorry | 14:33 |
belmorei_ | jroll: that means updating all ironic nodes... can't do it. | 14:34 |
belmorei_ | How about if I remove the "resources:VCPU=0", ... from the flavors? | 14:34 |
*** moshele has quit IRC | 14:34 | |
mriedem | your bm flavors aren't the problem | 14:34 |
mriedem | the problem is the bm flavors are getting scheduled to the ironic nodes right? | 14:35 |
mriedem | *vm flavors | 14:35 |
dansmith | right | 14:35 |
jroll | that would prevent VMs from landing on an active ironic node, but not an inactive one | 14:35 |
mriedem | and that's because the bm nodes are reporting ram/cpu/disk inventory | 14:35 |
belmorei_ | jroll: true | 14:35 |
belmorei_ | ok, so for me the best is to backport the commit mentioned by mriedem | 14:39 |
mriedem | belmorei_: you might be able to just patch https://github.com/openstack/nova/blob/stable/queens/nova/virt/ironic/driver.py#L797 to be 0 | 14:39 |
belmorei_ | yeah, thanks | 14:39 |
mriedem | belmorei_: i'm guessing that's not going to backport cleanly to queens | 14:39 |
bauzas | gibi: thanks for the email recap | 14:39 |
belmorei_ | any plan to still include this in rocky? | 14:39 |
bauzas | gibi: I now understand the problem :) | 14:40 |
belmorei_ | because for me is a bug | 14:40 |
gibi | bauzas: :) | 14:40 |
mriedem | belmorei_: i'm not sure how others would feel about a stable/queens / rocky only config option to disable reporting vcpu/ram/disk inventory for ironic | 14:40 |
mriedem | a workarounds option | 14:40 |
gibi | mriedem, efried, edleafe: force flag means I want to move the server so desperately that I don't care about any safeties | 14:41 |
gibi | at least for me it means that | 14:41 |
mriedem | default to disabled, but if enabled, report 0 total vcpu/ram/disk inventory for ironic nodes | 14:41 |
mriedem | dansmith: ^ how do you feel about a config option backdoor for belmorei_'s case? | 14:41 |
mriedem | we're not going to backport https://github.com/openstack/nova/commit/a985e34cdeef777fe7ff943e363a5f1be6d991b7 | 14:42 |
dansmith | mriedem: I'm for it because I'm not sure how we're not effing people over with this right now | 14:42 |
belmorei_ | mriedem: this is easy for me to patch. Having a conf backdoor option doesn't seem good | 14:42 |
dansmith | I'm guessing tripleo people don't care because undercloud/overcloud | 14:42 |
mriedem | belmorei_: a workarounds config option provides a generic solution for *everyone* with this problem | 14:42 |
belmorei_ | mriedem: true | 14:43 |
mriedem | essentially it means enabling it says you've done your ironic instance flavor migration and you're good to go | 14:43 |
dansmith | right | 14:43 |
mriedem | and we have a nova-status check for that as well | 14:43 |
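A rough sketch of the [workarounds] option shape mriedem is proposing here; the option name matches the stable/rocky patch proposed later in this log, while the default and help text are illustrative:

```python
from oslo_config import cfg

workarounds_opts = [
    cfg.BoolOpt(
        'report_ironic_standard_resource_class_inventory',
        default=True,
        help='If set to False, the ironic driver stops reporting VCPU, '
             'MEMORY_MB and DISK_GB inventory for baremetal nodes and '
             'only reports the node custom resource class, so VM flavors '
             'can no longer be scheduled onto them.'),
]
```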
mriedem | belmorei_: how about you report a bug to start and we can go from there? | 14:43 |
mriedem | jroll: btw i do remember something breaking after we removed that code in stein, but i can't remember what off the top of my head | 14:44 |
mriedem | which is why i wanted to hold off on removing it right before the rocky GA | 14:44 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Don't emit warning when ironic properties are zero https://review.openstack.org/608573 | 14:46 |
jroll | mriedem: this? https://bugs.launchpad.net/nova/+bug/1787509 | 14:46 |
openstack | Launchpad bug 1787509 in OpenStack Compute (nova) "Baremetal filters and default filters cannot be used simultaneously in the same nova" [Undecided,Won't fix] | 14:46 |
jroll | or maybe https://bugs.launchpad.net/tripleo/+bug/1787910/ | 14:46 |
openstack | Launchpad bug 1787910 in OpenStack Compute (nova) rocky "OVB overcloud deploy fails on nova placement errors" [High,Fix committed] - Assigned to Matt Riedemann (mriedem) | 14:46 |
belmorei_ | mriedem I will create the bug report | 14:47 |
mriedem | maybe | 14:47 |
belmorei_ | mriedem dansmith jroll thanks for the help | 14:47 |
mriedem | jroll: around the time we deprecated the core/ram/disk filters | 14:47 |
jroll | mriedem: yeah I don't remember what either, just going by irc logs | 14:47 |
mriedem | yeah https://bugs.launchpad.net/tripleo/+bug/1787910 | 14:47 |
openstack | Launchpad bug 1787910 in OpenStack Compute (nova) rocky "OVB overcloud deploy fails on nova placement errors" [High,Fix committed] - Assigned to Matt Riedemann (mriedem) | 14:47 |
dansmith | mriedem: so they still had ram required in those flavors and failed when ironic stopped reporting ram inventory | 14:52 |
dansmith | yeah? | 14:52 |
dansmith | we have to cut over at some point and I thought we already had.. workaround config flag to let them get over the hump seems like the best thing at this point | 14:52 |
mriedem | dansmith: they being tripleo in that bug? | 14:54 |
dansmith | well, ovb but yeah | 14:54 |
mriedem | yeah looks like it based on https://review.openstack.org/#/c/596093/ | 14:55 |
*** munimeha1 has joined #openstack-nova | 14:55 | |
*** lpetrut has quit IRC | 14:56 | |
*** hamzy has quit IRC | 14:57 | |
*** hamzy has joined #openstack-nova | 15:01 | |
dtantsur | belmorei_: a workaround may be to remove memory_mb and vcpus from ironic nodes properties | 15:05 |
dtantsur | with something like $ openstack baremetal node unset <node> --property memory_mb | 15:06 |
dtantsur | mriedem: this may be a bit easier than hacking nova ^^ | 15:07 |
belmorei_ | dtantsur: thanks, but the problem is the number of baremetal nodes that we have. Also, we would need to change the commission procedure to include that. | 15:08 |
belmorei_ | for now I will just patch this in nova | 15:08 |
*** itlinux has joined #openstack-nova | 15:08 | |
dtantsur | belmorei_: do you use something like inspection to populate these properties? | 15:08 |
dtantsur | also before resource classes I used to use host aggregates to more or less separate bm and vm nodes on the same nova | 15:09 |
belmorei_ | dtantsur: yes, inspection populate them | 15:11 |
mriedem | belmorei_: out of curiosity, before cells v2, could your vm flavors get scheduled to bm cells/ | 15:13 |
mriedem | ? | 15:13 |
mriedem | or were the flavors segregated at the top cell layer? | 15:14 |
mriedem | belmorei_: also fyi, you can't set total=0 for inventory on the resource class as i said above, placement api will reject that since total must be >=1 | 15:17 |
mriedem | so need to just omit posting those non-custom-resource-class inventories | 15:17 |
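Illustrating that point: since placement rejects total=0, the standard resource classes are simply left out of the provider inventory (CUSTOM_BAREMETAL_GOLD is a placeholder name; the fields are the standard placement inventory fields):

```python
# inventory reported for an ironic node once standard classes are omitted
inventory = {
    'CUSTOM_BAREMETAL_GOLD': {
        'total': 1,
        'reserved': 0,
        'min_unit': 1,
        'max_unit': 1,
        'step_size': 1,
        'allocation_ratio': 1.0,
    },
    # no VCPU / MEMORY_MB / DISK_GB entries at all
}
```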
belmorei_ | mriedem: with cellsV1 we were using the baremetal filters, so they would not be scheduled to an already deployed node. But yes, if a user used a vm flavor I think it would be the same (it would use the physical node to create the vm flavor instance) | 15:20 |
belmorei_ | mriedem: thanks for the heads up for the patch | 15:21 |
*** tssurya has joined #openstack-nova | 15:26 | |
*** eharney_ has joined #openstack-nova | 15:31 | |
openstackgerrit | Martin Midolesov proposed openstack/nova master: vmware:PropertyCollector for caching instance properties https://review.openstack.org/608278 | 15:32 |
*** ttsiouts has quit IRC | 15:33 | |
mriedem | dansmith: belmorei_: fyi i'm working on a rocky patch with the workaround option | 15:33 |
*** ttsiouts has joined #openstack-nova | 15:33 | |
*** eharney has quit IRC | 15:34 | |
dansmith | mriedem: cool | 15:37 |
*** ttsiouts has quit IRC | 15:38 | |
belmorei_ | mriedem: thanks | 15:39 |
belmorei_ | mriedem: https://bugs.launchpad.net/nova/+bug/1796920 | 15:39 |
openstack | Launchpad bug 1796920 in OpenStack Compute (nova) "Baremetal nodes should not be exposing non-custom-resource-class (vcpu, ram, disk)" [Undecided,New] | 15:39 |
*** munimeha1 has quit IRC | 15:43 | |
*** mdbooth has quit IRC | 15:44 | |
*** hamzy has quit IRC | 15:50 | |
*** janki has joined #openstack-nova | 15:52 | |
mriedem | dansmith: looks like zuulv3 status something or other changed and now openstack-gerrit-dashboard is getting NoneType errors - you see the same? | 15:58 |
dansmith | mriedem: I noticed it was failing this morning but didn't go to look if zuul was down. usually that's the reason | 15:59 |
mriedem | i'm guessing API change http://zuul.openstack.org/status | 15:59 |
mriedem | not sure, but the dashboard is different | 15:59 |
*** macza has joined #openstack-nova | 15:59 | |
dansmith | ah yeah | 15:59 |
imacdonn | dansmith: could you take a peek at this, please? https://review.openstack.org/608091 | 16:03 |
*** belmorei_ has quit IRC | 16:03 | |
*** helenafm has quit IRC | 16:06 | |
*** ircuser-1 has quit IRC | 16:07 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: [stable-only] Add report_ironic_standard_resource_class_inventory option https://review.openstack.org/609043 | 16:08 |
mriedem | dansmith: jroll: dtantsur: ^ belmiro took off....would be nice if he can confirm that fixes his problem | 16:08 |
jroll | thanks | 16:09 |
imacdonn | that's one long option name :) | 16:10 |
mriedem | suggestions welcome | 16:10 |
*** belmoreira has joined #openstack-nova | 16:10 | |
mriedem | i figured do_the_dew wouldn't be helpful | 16:11 |
dansmith | imacdonn: done | 16:11 |
dansmith | imacdonn: mriedem should look at that too | 16:11 |
dansmith | or rather | 16:11 |
mriedem | i did once.. | 16:11 |
dansmith | mriedem should look at and agree with me on that too | 16:11 |
imacdonn | I do see your point | 16:13 |
imacdonn | not sure if anyone is actually doing the "keep hammering on it until it concedes" approach, but yeah | 16:14 |
edmondsw | and that notification having the message is also important for PowerVC, since it has means to present errors from notifications in the PowerVC GUI | 16:14 |
dansmith | imacdonn: I expect everyone is | 16:14 |
edmondsw | oops, ignore ^, somehow jumped channels | 16:15 |
imacdonn | my suspicion is that some people are running it once, and missing the fact that there are failures, and maybe others are not running it at all | 16:15 |
dansmith | people have to run this at various times or things won't work | 16:16 |
imacdonn | that may not be immediately obvious | 16:17 |
imacdonn | I've upgraded at least pike -> queens -> rocky without doing any online migrations, and nothing obviously didn't work | 16:17 |
dansmith | we have some db migrations which have blocked if you haven't run these to completion | 16:18 |
dansmith | maybe none since pike, but.. | 16:18 |
mriedem | http://git.openstack.org/cgit/openstack/openstack-ansible-os_nova/tree/tasks/nova_db_setup.yml#n98 | 16:18 |
mriedem | osa is certainly using it | 16:18 |
dansmith | I guess the default now is to run until completion, which is probably what people are doing I guess | 16:18 |
dansmith | but I know the return value here was critical earlier when people were running batches themselves | 16:19 |
mriedem | imacdonn: we also migrate some stuff online outside of the command | 16:19 |
mriedem | like on read from the db | 16:19 |
mriedem | or new resource create | 16:19 |
imacdonn | yeah, I know ... my point is that it's possible to get away without running the command, at least in some circumstances | 16:20 |
dansmith | imacdonn: I'm not sure what that has to do with anything | 16:20 |
dansmith | OSA and tripleo, and I expect other systems run this explicitly | 16:20 |
mriedem | if it's possible it's by chance | 16:20 |
dansmith | if you don't and it doesn't break in the versions you use, then you got lucky, | 16:21 |
mriedem | like dan said, we probably just haven't had a blocker migration in awhile | 16:21 |
dansmith | but that doesn't really mean anything for how important this is to notbreak | 16:21 |
*** threestrands has quit IRC | 16:21 | |
mriedem | also depends on how old your data is, | 16:21 |
mriedem | i plan on dropping our request spec compat from newton in stein and if you don't have that migration done you'll fail to do things like migrate instances | 16:21 |
imacdonn | OK, nevermind .. I wasn't disgreeing that it's important to solve ... | 16:21 |
mriedem | so just make this return 2, doc and reno it and we're happy right? | 16:22 |
*** ralonsoh has quit IRC | 16:22 | |
imacdonn | yeah, that'd work for this particular problem ... although it's probably not backportable ? | 16:23 |
imacdonn | (since it'll break things that don't know to check for 2) | 16:23 |
mriedem | i'm not sure; if things are failing but tooling is not aware of it, i think it's probably better to err on the side of putting an upgrade release note and saying this will fail now | 16:23 |
mriedem | but i'd rather know something isn't working explicitly | 16:24 |
mriedem | dansmith: agree? ^ | 16:24 |
imacdonn | but if the automation is just repeating infinitely until it gets a 0, it'll .... repeat infinitely | 16:24 |
dansmith | what if 2 means "I didn't do anything but there were exceptions", 1 means "I did things, maybe there were some exceptions too", 0 means "I didn't do anything, but no errors" | 16:25 |
dansmith | repeat on 1, done on 0, 2 means we hit terminal fail state | 16:25 |
dansmith | either way people that loop on nonzero will break with anything you're going to do, which is why reno and doc is super important | 16:26 |
imacdonn | right | 16:26 |
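A sketch of how deployment tooling could drive the command under the return-code scheme dansmith proposes (0 = done, 1 = more to do, 2 = terminal failure); the command and its --max-count flag are real, the loop itself is illustrative:

```python
import subprocess


def run_online_data_migrations(batch_size=50):
    while True:
        rc = subprocess.call(
            ['nova-manage', 'db', 'online_data_migrations',
             '--max-count', str(batch_size)])
        if rc == 0:
            return  # everything migrated, no errors
        if rc == 1:
            continue  # progress was made, run another batch
        raise RuntimeError(
            'online_data_migrations hit a terminal failure (rc=%d)' % rc)
```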
dansmith | not sure how I feel about changing behavior in a backport with a retroactive reno, but mriedem is the authority here, so I'd do whatever he says | 16:27 |
imacdonn | I'm thinking that most people only read release notes for a new release, not for errata updates | 16:28 |
dansmith | unfortunately I don't think they even read them for new releases, but.. yeah | 16:28 |
imacdonn | heh yeah, there is that | 16:28 |
*** k_mouza has joined #openstack-nova | 16:29 | |
mriedem | i'll defer to tonyb | 16:30 |
dansmith | the AUD stops with tonyb | 16:31 |
*** panda has quit IRC | 16:34 | |
*** spatel has quit IRC | 16:35 | |
*** panda has joined #openstack-nova | 16:36 | |
melwitt | . | 16:36 |
*** moshele has joined #openstack-nova | 16:39 | |
*** gyee has joined #openstack-nova | 16:43 | |
mriedem | i get it | 16:43 |
mriedem | took me awhile | 16:43 |
*** mriedem is now known as mriedem_stew | 16:43 | |
imacdonn | dansmith: I'm on the fence about your last suggestion .... I think I tend towards exceptions being bad, requiring some problem to be addressed immediately ... but then there was a suggestion that some migrations may raise exceptions "by design" if some other migration had not yet been completed | 16:48 |
dansmith | imacdonn: well, by design or not, we've had some that won't complete until others do | 16:49 |
*** eharney_ is now known as eharney | 16:52 | |
imacdonn | dansmith: is it defined somewhere that a migration should raise an exception in such a case? (as opposed to just not doing any work) Seems like ideally there should be a way for a migration to explicitly state that "I can't do this *yet*" | 16:55 |
dansmith | the way to do that is to return nonzero found, with zero done | 16:55 |
imacdonn | so how do you distinguish that from "I can't do this *at all, ever*" ? | 16:56 |
dansmith | regardless, because of the complexity of hitting all the cases of live data, which we're historically bad at doing, making the process stop on exception is just practically not the best plan, IMHO | 16:56 |
dansmith | there's no case where found is nonzero where done is expected to remain zero forever | 16:56 |
dansmith | found is items that should be migratable | 16:56 |
imacdonn | OK | 16:59 |
imacdonn | I'll try to implement that and see what falls out | 17:00 |
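A sketch of the (found, done) contract dansmith describes: each online migration reports how many rows still need migrating and how many it migrated this batch, and non-zero found with zero done means "can't make progress yet", not "never". The helper functions here are hypothetical:

```python
def migrate_widgets_to_new_format(context, max_count):
    # hypothetical helpers, for illustration only
    candidates = _get_unmigrated_widgets(context, limit=max_count)
    done = 0
    for widget in candidates:
        if _prerequisite_migration_finished(context, widget):
            _migrate_one(context, widget)
            done += 1
    # found > 0 with done == 0 signals "blocked for now", not a failure
    return len(candidates), done
```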
*** moshele has quit IRC | 17:01 | |
*** dtantsur is now known as dtantsur|afk | 17:08 | |
*** janki has quit IRC | 17:14 | |
*** sambetts_ is now known as sambetts|afk | 17:18 | |
*** janki has joined #openstack-nova | 17:22 | |
*** moshele has joined #openstack-nova | 17:25 | |
*** ralonsoh has joined #openstack-nova | 17:28 | |
*** moshele has quit IRC | 17:29 | |
*** tssurya has quit IRC | 17:39 | |
*** ralonsoh has quit IRC | 17:40 | |
*** mchlumsky has quit IRC | 17:40 | |
*** ociuhandu has quit IRC | 17:45 | |
*** k_mouza has quit IRC | 17:45 | |
*** qqmber has joined #openstack-nova | 17:47 | |
*** k_mouza has joined #openstack-nova | 17:49 | |
*** Swami has joined #openstack-nova | 17:49 | |
*** adrianc has quit IRC | 17:51 | |
*** k_mouza_ has joined #openstack-nova | 17:52 | |
*** k_mouza has quit IRC | 17:53 | |
*** k_mouza_ has quit IRC | 17:54 | |
*** k_mouza has joined #openstack-nova | 17:54 | |
*** k_mouza has quit IRC | 17:58 | |
*** moshele has joined #openstack-nova | 18:02 | |
*** k_mouza has joined #openstack-nova | 18:02 | |
*** hamzy has joined #openstack-nova | 18:05 | |
*** k_mouza_ has joined #openstack-nova | 18:05 | |
*** k_mouza has quit IRC | 18:07 | |
*** k_mouza_ has quit IRC | 18:08 | |
*** k_mouza has joined #openstack-nova | 18:08 | |
*** k_mouza_ has joined #openstack-nova | 18:11 | |
*** k_mouza has quit IRC | 18:13 | |
*** k_mouza_ has quit IRC | 18:14 | |
*** k_mouza has joined #openstack-nova | 18:14 | |
*** k_mouza has quit IRC | 18:18 | |
*** hoangcx has quit IRC | 18:19 | |
*** trungnv has quit IRC | 18:20 | |
*** k_mouza has joined #openstack-nova | 18:30 | |
sean-k-mooney | melwitt: i left some feedback in https://review.openstack.org/#/c/575735/2 fyi. hope that helps. the code should work but it's duplicating logic that is not needed. | 18:30 |
*** mriedem_stew is now known as mriedem | 18:31 | |
melwitt | sean-k-mooney: cool thanks | 18:31 |
*** k_mouza has quit IRC | 18:34 | |
*** k_mouza has joined #openstack-nova | 18:34 | |
*** janki has quit IRC | 18:38 | |
*** k_mouza has quit IRC | 18:38 | |
*** k_mouza has joined #openstack-nova | 18:40 | |
*** k_mouza has quit IRC | 18:42 | |
mriedem | dansmith: off the top of your head, do you know much about the migration_context we stash on the instance during cold migration / resize (created by the RT move claim) and what we need it for besides routing neutron events to the source and dest host? looks like it's otherwise for tracking numa/pci on the source and dest host, | 18:43 |
mriedem | reason i ask is because i'm currently using a move claim for the cross-cell resize but i have alternatives to using a resize_claim, | 18:44 |
mriedem | both claim ways in the RT are kind of weird for how i'm doing this | 18:44 |
*** k_mouza has joined #openstack-nova | 18:46 | |
mriedem | re: https://review.openstack.org/#/c/603930/9/nova/compute/resource_tracker.py and https://review.openstack.org/#/c/603930/9/nova/compute/manager.py@5138 | 18:46 |
*** k_mouza_ has joined #openstack-nova | 18:49 | |
mriedem | i don't think i need to care about the migration_context for the same reasons as normal resize because for cross-cell, we'll have shelved offloaded from the source by the time we claim on the dest, so meh | 18:49 |
dansmith | mriedem: well, the point of it was to avoid doing things like looking up the most recent unfinished migration for an instance in order to get at things we were going to stash on there | 18:50 |
dansmith | probably for things that aren't covered by the old/new flavor, so yeah probably numaish things | 18:50 |
dansmith | maybe, but I guess I'd hope that we could make it as similar of a process as possible, | 18:50 |
dansmith | and the fact that we only have migration-context for cold moves right now is unfortunate I think | 18:50 |
*** k_mouza has quit IRC | 18:51 | |
mriedem | well, we can make it similar, but things get weird if we do, as noted in those links above | 18:51 |
mriedem | i just haven't ran into anything with this that requires needing the migration_context being set on the instance | 18:52 |
dansmith | we use it for directing notifications to both computes right? | 18:52 |
*** eharney has quit IRC | 18:53 | |
mriedem | yes, but i don't really need that with cross-cell resize, | 18:53 |
mriedem | because sending an event to the source is useless b/c we've shelved offloaded from the source | 18:53 |
*** k_mouza_ has quit IRC | 18:53 | |
mriedem | when we unshelve on the target, the instance.host gets set and the API will route the event there | 18:54 |
dansmith | sure, I was just responding to "we don't use it anywhere" | 18:54 |
mriedem | i also have to do cludgy shit like this https://review.openstack.org/#/c/603930/9/nova/compute/manager.py@5189 | 18:54 |
mriedem | if not using an instance_claim | 18:54 |
dansmith | so, one question I had but was saving was.. are you going to have a resize operation end up going through SHELVED_OFFLOADED from the user's perspective? | 18:54 |
*** k_mouza has joined #openstack-nova | 18:55 | |
mriedem | the terminal state is VERIFY_RESIZE for the user | 18:55 |
mriedem | i have a TODO related to that https://review.openstack.org/#/c/603930/9/nova/compute/manager.py@5024 | 18:55 |
dansmith | because the context would be one place to stash what we're doing to hide that | 18:56 |
mriedem | i think i need to leave the task_state set there, | 18:56 |
mriedem | and then on unshelve on the target host, we set the vm_state to RESIZED rather than ACTIVE: https://review.openstack.org/#/c/603930/9/nova/compute/manager.py@5195 | 18:57 |
mriedem | so the functional test is like a normal resize where the caller issues the resize and then polls for VERIFY_RESIZE status | 18:57 |
*** k_mouza_ has joined #openstack-nova | 18:58 | |
dansmith | aight, well, whatever.. point being, it seems bad to create more places where we don't have that set.. more places where if we needed to know which kind of migration we're doing we have to look at the list and guess | 18:58 |
dansmith | for example, | 18:58 |
dansmith | we have this long standing bug where we don't properly consider live migrations with pinned cpus as needing to match 1:1 for the destination host, | 18:59 |
dansmith | which is solvable in other ways, | 18:59 |
dansmith | but one downstream hack tries to find the migration, determine if it's a live one, so it can make better choices | 18:59 |
*** k_mouza has quit IRC | 18:59 | |
*** moshele has quit IRC | 18:59 | |
dansmith | which is icky for other reasons, but.. finding out what kind of migration is currently going on and what those details are seems like a good thing to me | 19:00 |
dansmith | so, not a real helpful opinion, but....there it is | 19:00 |
*** k_mouza has joined #openstack-nova | 19:01 | |
mriedem | i can do it either way, it's mostly just a question of how gorby we want the RT resize_claim flow to become when we're unshelving during a cross-cell resize....since as noted we have to do some things manually if we're not doing an instance_claim | 19:01 |
mriedem | *gorpy | 19:01 |
dansmith | do we not have any places where we might need to look at a late-stage migration and know if it was cross-cell before we allow or disallow something? | 19:01 |
*** k_mouza_ has quit IRC | 19:02 | |
dansmith | well, fwiw, the migration context setting via instance_claim never made sense to me, and continues to confuse me and others when we try to remember where it gets set (and why not for things like live migration) | 19:02 |
mriedem | it's not set in instance_claim, it's in resize_claim | 19:02 |
dansmith | you know what I mean gdi :) | 19:02 |
dansmith | in the claim process | 19:02 |
mriedem | i'm not aware of late stage thingies that rely on the migration_context, maybe there are in the actual finish_resize/finish_revert_resize/confirm_resize flows...which i'm not using | 19:03 |
dansmith | I'm saying things we'd need to handle in this process, not existing ones | 19:03 |
mriedem | when we revert/confirm the API looks up the migration directly, not via the migration_context: https://review.openstack.org/#/c/603930/9/nova/compute/api.py@3244 | 19:03 |
dansmith | yeah, which is kinda silly | 19:04 |
mriedem | which, i guess ^ works because we have the finished status and then it goes to 'completed' or something in the compute | 19:04 |
dansmith | that would be much nicer as "get the instance and get the current migration" instead of "sort and assume the last one is legit" | 19:04 |
mriedem | which sounds the same to me | 19:04 |
dansmith | I should just stop talking. No, I don't have any good reasons. | 19:05 |
mriedem | oh i guess on revert the migration status goes to 'reverted' | 19:05 |
*** k_mouza has quit IRC | 19:05 | |
mriedem | and 'confirmed' on confirm | 19:05 |
mriedem | alright, well, it's just a drop in the bucket of questions in here | 19:05 |
mriedem | i just wanted to plant the seed of doubt in someone else's mind about this | 19:06 |
*** qqmber has quit IRC | 19:06 | |
mriedem | you're welcome | 19:06 |
*** eharney has joined #openstack-nova | 19:08 | |
*** markvoelker has quit IRC | 19:09 | |
*** markvoelker has joined #openstack-nova | 19:09 | |
*** hamzy_ has joined #openstack-nova | 19:10 | |
*** moshele has joined #openstack-nova | 19:11 | |
*** k_mouza has joined #openstack-nova | 19:12 | |
*** hamzy has quit IRC | 19:12 | |
*** k_mouza has quit IRC | 19:15 | |
*** markvoelker has quit IRC | 19:15 | |
*** markvoelker has joined #openstack-nova | 19:17 | |
*** tbachman has quit IRC | 19:18 | |
*** k_mouza has joined #openstack-nova | 19:18 | |
sean-k-mooney | so can i ask a dumb question related to that conversation. | 19:18 |
sean-k-mooney | what is the logical difference between a migration context, a migration object and migrate_data | 19:19 |
sean-k-mooney | i know all three exist but I'm not sure why there is not just one data structure | 19:19 |
sean-k-mooney | mriedem: dansmith is the answer to ^ documented anywhere? | 19:20 |
dansmith | migration context is attached to an instance and contains a link to the migration record (i.e. the current one) and some other current details | 19:20 |
dansmith | migration objects are the in-progress and archival history of an instance's movements | 19:21 |
*** k_mouza has quit IRC | 19:21 | |
dansmith | migrate data is transient virt-specific detailage about the low-level bits that is ferried back and forth but never persisted | 19:21 |
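Summarizing the distinction dansmith draws, as a hedged sketch (the field names are an illustrative subset of the real object schemas, not the full definitions):

```python
# objects.Migration        - persisted record of each move (in-progress
#                            and archival): status, source/dest compute,
#                            old/new flavor ids.
# objects.MigrationContext - attached to the Instance only while a move
#                            is in flight; references the migration
#                            record and carries transient details such
#                            as new/old NUMA topology.
# migrate_data             - virt-driver-specific LiveMigrateData passed
#                            between computes during live migration and
#                            never persisted.

def is_mid_move(instance):
    # an instance with a migration_context has a move in flight
    return instance.migration_context is not None
```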
*** k_mouza has joined #openstack-nova | 19:21 | |
sean-k-mooney | oh ok, that actually kind of makes sense. the naming is unfortunate, but the reason for having 3 distinct entities makes sense | 19:22 |
mriedem | migrate_data is also only for live migration | 19:23 |
mriedem | hence the name, LiveMigrateData | 19:23 |
dansmith | yeah, left that detail out | 19:24 |
sean-k-mooney | mriedem: yes, but for cold migration we kind of abuse the migration_context for associating the claim with the active migration too, right | 19:24 |
*** k_mouza_ has joined #openstack-nova | 19:24 | |
*** moshele has quit IRC | 19:25 | |
sean-k-mooney | so we don't actually stuff the resource tracker claims into the migration_context as far as i know, but i think we use the uuid of the migration_context when claiming the resources or something like that. | 19:25 |
*** k_mouza has quit IRC | 19:26 | |
mriedem | based on my questions above, i'm clearly not the person to ask about the intracacies of how the migration_context is used during resize | 19:27 |
mriedem | *intricacies even | 19:27 |
dansmith | sean-k-mooney: migration context has no uuid, it's attached to the instance | 19:27 |
mriedem | MigrationContext.migration_id could be used to find the migration object if needed | 19:28 |
mriedem | within the same cell | 19:28 |
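A small illustration of mriedem's point, assuming the standard nova objects API:

```python
from nova import objects


def current_migration(context, instance):
    # within a cell, the in-flight Migration record can be looked up via
    # the id stashed on the instance's migration context
    mig_ctxt = instance.migration_context
    if mig_ctxt is None:
        return None
    return objects.Migration.get_by_id(context, mig_ctxt.migration_id)
```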
sean-k-mooney | sorry, the migration record/object referenced by the migration_context has a uuid which we use | 19:28 |
dansmith | sean-k-mooney: it has things that we don't need to persist after completion, unlike things we store in the migration record for posterity, like what flavor it was and what flavor it is now | 19:28 |
*** k_mouza_ has quit IRC | 19:28 | |
dansmith | mriedem: I meant the context doesn't have its own identifier | 19:28 |
mriedem | ah yeah | 19:28 |
dansmith | and, unfortunate that we used the id there I guess, as it makes it potentially less helpful for the cross-cell case | 19:29 |
mriedem | laura is home, the 2:30 vacuuming has started | 19:29 |
mriedem | for cross-cell everything is scoped to the cell db, so it's not a big deal | 19:29 |
*** k_mouza has joined #openstack-nova | 19:29 | |
dansmith | well, | 19:30 |
mriedem | conductor orchestrates the db switching when needed | 19:30 |
dansmith | once we want cross-cell live migration it'll probably be relevant | 19:30 |
mriedem | i think i'll probably be back at ibm working on php / xenapi by the time that happens.. | 19:30 |
* dansmith nods | 19:30 | |
mriedem | you know, the up and coming tech | 19:31 |
dansmith | it's a thing people want though | 19:31 |
mriedem | yeah, huawei's public cloud has a beta for cross-cell live migration, | 19:31 |
dansmith | especially if it were to make it easier to migrate from an older cloud to a newer one by making one a cell of the other until it's emptied | 19:31 |
mriedem | i'm not sure how they do it but... | 19:31 |
dansmith | that's been requested since icehouse or so | 19:31 |
*** k_mouza_ has joined #openstack-nova | 19:32 | |
sean-k-mooney | dansmith: we all know live migration never works :P also, if i get sriov live migration working this cycle i have no plan to test cross-cell, sriov, numa-aware live migration between different releases during upgrade :) | 19:32 |
* sean-k-mooney now that i have said it a telco will ask for it | 19:33 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: WIP: Test report_ironic_standard_resource_class_inventory=False https://review.openstack.org/609107 | 19:33 |
mriedem | there is no cross cell live migration... | 19:33 |
*** k_mouza has quit IRC | 19:34 | |
sean-k-mooney | mriedem: didn't you just say huawei's public cloud has beta support. i would assume they will ask you to upstream it at some point, or is that a completely different division from yours? | 19:34 |
*** awaugama has quit IRC | 19:35 | |
mriedem | completely different | 19:35 |
*** k_mouza has joined #openstack-nova | 19:35 | |
mriedem | their public cloud is still using cascading, which is their proprietary cells v1 | 19:35 |
mriedem | they are working on migrating off that to cells v2 | 19:36 |
dansmith | that's more like cross-deployment live migration | 19:36 |
sean-k-mooney | dansmith: is that still an ask from the edge working group? | 19:36 |
*** k_mouza_ has quit IRC | 19:36 | |
dansmith | is what? cross-deployment live migration? | 19:37 |
dansmith | I'm sure it'll be on their list at some point | 19:37 |
dansmith | I can see the cross-cell thing being fine, but I dunno about cross-deploy | 19:37 |
sean-k-mooney | ya, when i was in the edge room in dublin i spent 15 minutes explaining why cross-cloud inter-hypervisor live migration was never going to be a thing and should not be in the phase 1 basic feature support for edge | 19:38 |
*** k_mouza_ has joined #openstack-nova | 19:38 | |
sean-k-mooney | they literally wanted to live migrate from libvirt + ceph in one edge site to vmware on another | 19:39 |
jaypipes | dansmith, mriedem, melwitt: is there a reason we don't pass instance metadata to the RequestSpec? | 19:39 |
*** k_mouza has quit IRC | 19:40 | |
dansmith | jaypipes: why would we need to? | 19:40 |
jaypipes | dansmith: so that scheduler filters can look at the instance metadata? :) | 19:40 |
dansmith | they should never do that | 19:40 |
dansmith | instance metadata is owned by the user not nova | 19:40 |
jaypipes | dansmith: oh, but Oath disagrees strongly. ;) | 19:40 |
dansmith | looking at it, especially for placement would violate | 19:40 |
dansmith | jaypipes: -2 | 19:40 |
jaypipes | hehe | 19:41 |
melwitt | but what about... custom filters YALL | 19:41 |
jaypipes | dansmith: what melwitt said :) | 19:41 |
dansmith | yeah | 19:41 |
jaypipes | dansmith: example... | 19:41 |
sean-k-mooney | jaypipes: jay, just stuff the info in a scheduler hint and use the json filter | 19:41 |
* sean-k-mooney ducks | 19:41 | |
mriedem | the request spec has an instance_uuid on it, you can get the instance from that and pull the metadata off the instance; that won't work for multi-create, but you probably don't care at oath | 19:41 |
dansmith | yup | 19:42 |
jaypipes | dansmith: nova boot --property ytag=SOME_CUSTOM_YAHOO_GOOP; nova scheduler has a filter that does an external lookup to our inventory management system of the ytag to grab the availability zone (really, just a power domain) to send the instance to | 19:42 |
dansmith | or hint yourself to an external artifact, fetch it and go nuts | 19:42 |
*** k_mouza has joined #openstack-nova | 19:42 | |
dansmith | jaypipes: yep, I got it, but metadata is off limits | 19:42 |
*** k_mouza_ has quit IRC | 19:42 | |
dansmith | --hint ytag-fml -> look up fml externally, do a thing | 19:43 |
jaypipes | mriedem: you can't do that. the instance_uuid is for the mapping, but the metadata doesn't exist yet, so if you try to call Instance.get_by_uuid(instance_uuid) from the filter, that's a dead end. | 19:43 |
mriedem | jaypipes: get the build request then | 19:43 |
jaypipes | mriedem: build request doesn't store metadata. | 19:43 |
melwitt | last I heard, oath does use multi-create. there's auto-scale stuff that boots several instances at once | 19:43 |
mriedem | jaypipes: sure it does, | 19:43 |
mriedem | it stores the instance | 19:43 |
mriedem | which stores the metadata | 19:43 |
*** mvkr has joined #openstack-nova | 19:44 | |
dansmith | right, that's how we know what to create when we've picked a host :) | 19:44 |
jaypipes | mriedem: https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L1004-L1007 | 19:44 |
jaypipes | mriedem: where exactly does the build request store the instance metadata? | 19:44 |
mriedem | instance=instance | 19:44 |
dansmith | it's in instance | 19:44 |
dansmith | heh yeah | 19:44 |
mriedem | instance.update(base_options) | 19:45 |
mriedem | build request stores a serialized instance object | 19:45 |
jaypipes | ah.. | 19:45 |
jaypipes | I missed that. sorry. | 19:45 |
jaypipes | ok, so I'll change the filter to pull the build request by instance_uuid. | 19:45 |
mriedem | https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L936 | 19:45 |
jaypipes | thanks y'all | 19:45 |
mriedem | performance will suck | 19:45 |
mriedem | but... | 19:45 |
dansmith | but you're going to hell anyway? | 19:45 |
dansmith | yah. | 19:45 |
jaypipes | mriedem: yes, understood. | 19:45 |
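A rough sketch of the out-of-tree approach mriedem outlines above: look up the BuildRequest by the RequestSpec's instance_uuid and read the instance metadata from the serialized instance it carries. The class name, the 'ytag' key and the external lookup are invented for illustration, and the internal object APIs may differ between nova releases:

```python
# Hypothetical out-of-tree filter; written against roughly Ocata-era internals.
from nova import context as nova_context
from nova import exception
from nova import objects
from nova.scheduler import filters


class YTagFilter(filters.BaseHostFilter):
    """Pass only hosts the external inventory system allows for the
    instance's custom 'ytag' metadata value (key name is made up)."""

    def _allowed_hosts_for_ytag(self, ytag):
        # Placeholder for the out-of-band REST call to the inventory system.
        return set()

    def host_passes(self, host_state, spec_obj):
        ctxt = nova_context.get_admin_context()
        try:
            # The BuildRequest stores the serialized Instance (metadata
            # included) before the instance exists in a cell database.
            build_req = objects.BuildRequest.get_by_instance_uuid(
                ctxt, spec_obj.instance_uuid)
        except exception.BuildRequestNotFound:
            # e.g. a move operation rather than an initial boot
            return True
        ytag = build_req.instance.metadata.get('ytag')
        if not ytag:
            return True
        return host_state.host in self._allowed_hosts_for_ytag(ytag)
```

As noted in the channel, a DB/REST lookup per host per request is slow, so a real implementation would want to cache the lookup for the duration of a request rather than repeat it for every host.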
sean-k-mooney | hmm, in that case the json filter and compute capabilities filters can already read it and do stuff... | 19:46 |
jaypipes | sean-k-mooney: we already have a custom IronicCapabilitiesFilter. don't get me started :) | 19:46 |
melwitt | jaypipes: yeah, you might probably run into perf problems during a db lookup in a filter (but I guess you said earlier you're doing an external system lookup in a filter already and that wasn't hurting perf?) | 19:46 |
*** k_mouza has quit IRC | 19:46 | |
sean-k-mooney | jaypipes: are you thinking of pulling this stuff out in a pre-placement filter or post, out of interest | 19:46 |
jaypipes | this one's actually a network availability filter that looks for num_additional_ipv4 and num_ipv6 custom metadata key/values, calls out to our IPAM system from within the filter itself, and determines if the target system has enough IP addresses available. | 19:47 |
jaypipes | dansmith: you're welcome ^ | 19:47 |
melwitt | I remember when I worked at yahoo, we ran into perf issues with an in-tree filter that was doing db lookups and had to patch it out (and upstream fixed it soon after) | 19:47 |
* dansmith hands jaypipes a handful of poo | 19:47 | |
jaypipes | melwitt: this is even worse. :) it's doing out of band calls to a REST API from within the in-tree filter :) | 19:48 |
sean-k-mooney | jaypipes: for l3 routed networks we are storing that kind of info in placement | 19:48 |
melwitt | yeah. interesting that it's not causing perf issues | 19:48 |
jaypipes | melwitt: well, it's not like the traffic to the scheduler is huge... | 19:48 |
jaypipes | melwitt: I mean, it's not like there's thousands of concurrent callers of nova boot for ironic hosts. | 19:49 |
sean-k-mooney | jaypipes: any way you could just write some kind of bridge between neutron and the ipam to model it in placement and not use a scheduler filter | 19:49 |
melwitt | it used to be, is what I'm saying. but that was back before we had placement filtering the set of compute nodes down | 19:49 |
jaypipes | sean-k-mooney: baby steps :) | 19:49 |
jaypipes | melwitt: we're getting there.. slowly :) | 19:49 |
sean-k-mooney | jaypipes: hehe ok, it just seems like some of that code might already exist for the l3 routed networks, and with gibi's resource stuff you might be able to have neutron pass back, say, a host aggregate and ip resource request | 19:50 |
sean-k-mooney | * placement aggregate | 19:50 |
melwitt | oh, ironic. I don't know much about that. I think what I was referring to was VMs. it was since there were 1000 compute nodes, have a db lookup once per the 1000 and having concurrent requests caused the problems | 19:51 |
jaypipes | sean-k-mooney: ALL of that code already exists in the L3 routed net segments stuff :) we just need to get there first.. | 19:51 |
melwitt | *db lookup for each of the 1000 | 19:51 |
*** k_mouza has joined #openstack-nova | 19:51 | |
sean-k-mooney | jaypipes: so playing devil's advocate here: if you wrote a neutron ipam plugin driver and we had a way to get the info from neutron to nova via the port, you would not need the filter, right. | 19:53 |
*** k_mouza has quit IRC | 19:54 | |
jaypipes | sean-k-mooney: correct. | 19:54 |
sean-k-mooney | jaypipes: i'm just thinking this would be generically useful outside oath too, but for mass reuse we would not want to go the scheduler filter route | 19:54 |
jaypipes | sean-k-mooney: I welcome your imminent pull request to our repo. | 19:54 |
sean-k-mooney | :) well, have you talked to any neutron folk about how much effort that would be? | 19:55 |
sean-k-mooney | sounds like you can do the scheduler filter out of tree without nova changes in any case | 19:55 |
mriedem | dansmith: on the detach/attach root volume spec https://review.openstack.org/#/c/600628/ did you see that it was updated to also allow detaching the root volume from stopped instances in addition to shelved offloaded? | 19:56 |
jaypipes | sean-k-mooney: though the "interesting" part about this particular filter is that it's quantitative -- "make sure this baremetal host is in rack that has X number of available IPv4s in its subnet" -- but the request isn't actually *for* that amount. It's basically "ok, just make sure I *could* get this many IPs, but don't actually grab those IPs for me. right now. maybe later, ok thx bai" | 19:56 |
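In code terms, the "could I get this many IPs, but don't actually grab them" check jaypipes describes is just a read-only capacity query; a tiny hedged sketch (the ipam_client and its API are placeholders, not a real service):

```python
def has_spare_ips(ipam_client, subnet_id, num_additional_ipv4, num_ipv6):
    """Return True if the rack's subnet *could* satisfy the request.

    Nothing is reserved here, so the addresses may be gone by the time the
    port is actually bound -- that's the trade-off being discussed above.
    """
    usage = ipam_client.get_subnet_usage(subnet_id)  # hypothetical REST call
    return (usage['free_ipv4'] >= num_additional_ipv4
            and usage['free_ipv6'] >= num_ipv6)
```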
dansmith | mriedem: no I saw some activity on that this morning and have it queued | 19:57 |
mriedem | jaypipes: sounds like what you want is quotas and resource claims brother! | 19:57 |
melwitt | jaypipes: that sounds kind of like the key/value discussion we've been having. need disk_typeA=4 but don't want to actually consume them | 19:57 |
mriedem | your fingers say no but your mouth....also says no | 19:58 |
melwitt | so I wonder if you could solve your complex partitioning/layout issues similarly | 19:58 |
*** gouthamr has quit IRC | 19:58 | |
mriedem | dansmith: yeah i'm not sure how i feel about it....but i'm also not sure i have a good excuse against allowing it | 19:58 |
sean-k-mooney | jaypipes: right, ok, so you're not reserving the ips, so you're hoping that if you need them in the future you could reserve them | 19:58 |
sean-k-mooney | jaypipes: that sounds more like a weigher | 19:58 |
dansmith | mriedem: well, I thought the shelve bit was because we didn't have to worry about disconnecting on the compute node | 19:59 |
mriedem | it's definitely more straight forward if the instance is shelved | 19:59 |
dansmith | yeah | 19:59 |
dansmith | anywa | 19:59 |
dansmith | I'll try to get around to it | 19:59 |
*** k_mouza has joined #openstack-nova | 20:04 | |
sean-k-mooney | melwitt: jaypipes: not sure how the disk_typeA=4 is intended to work, but assuming this was all in placement you "could" ask placement for 4 ips in this case but instead of claiming all the resources in the allocation candidate only claim 1 ip. | 20:05 |
sean-k-mooney | melwitt: jaypipes: that said, for that code to be in the nova tree it would have to be generic and not for just this use case. not sure how you would model that in flavor extra specs however | 20:06 |
mriedem | jaypipes: efried: i'm trying to figure out what's going on with https://review.openstack.org/#/c/552105/ and https://review.openstack.org/#/c/544683/ from reading the ptg etherpad and it's not really clear to me if those are supposed to be combined or initial allocation ratios is a dependency for the other spec? | 20:06 |
mriedem | i spoke with yikun last night and he's confused as to what should be changed based on the etherpad, and i kind of am too since the etherpad is just mostly discussion | 20:07 |
melwitt | I think the initial ratios spec is a dependency for the other spec. two specs needed | 20:07 |
mriedem | https://review.openstack.org/#/c/544683/ says "#agreed in Stein PTG to squash this into [1]" | 20:07 |
mriedem | and the etherpad says "Sounds like the two specs need to be combined a bit." | 20:08 |
melwitt | ok, then I must not have understood | 20:08 |
melwitt | I thought it was two specs, one to define the initial allocation ratios and another to define how to handle the initial allocation ratios | 20:08 |
mriedem | why wouldn't that just be one spec? | 20:09 |
*** k_mouza has quit IRC | 20:09 | |
sean-k-mooney | mriedem: we said to combine them at the ptg, yes, but i'm trying to remember the details | 20:09 |
melwitt | I don't know. but that's what was being talked about in the room at the time, or so I thought | 20:10 |
jaypipes | sean-k-mooney, mriedem, dansmith, melwitt: to be clear, I'm not asking for anything at all :) I'm really just gonna do this custom filter thing as a stop-gap measure until we get on a more up to date version of nova. | 20:10 |
mriedem | agreed to add new initial_allocation ratio options with the default values from the ComputeNode object today, | 20:10 |
mriedem | change the existing *_allocation_ratio values from 0.0 defaults to None | 20:10 |
*** k_mouza has joined #openstack-nova | 20:10 | |
sean-k-mooney | jaypipes: oh i know. it just sounded like a useful thing. maybe for T | 20:10 |
mriedem | delete the code in the ComputeNode object so it's all config driven | 20:10 |
jaypipes | mriedem: yes, change them back to None from 0.0. | 20:11 |
mriedem | and then something something if config is set, that trumps the API | 20:11 |
jaypipes | mriedem: but I distinctly remember saying I was at the end of my proverbial rope with both of those specs and someone else would need to pick it up. | 20:11 |
jaypipes | :) | 20:11 |
mriedem | so mgagne's use case can use the API exclusively and CERN can use the config exclusively | 20:11 |
mriedem | jaypipes: yes yikun is happy to pick it up, | 20:11 |
jaypipes | cool, thx | 20:11 |
mriedem | but he doesn't understand what the direction is... | 20:11 |
sean-k-mooney | mriedem: yes, i think we said if you want to be api driven, in the config you set the value to none or remove it | 20:11 |
mriedem | which is why i'm trying to be a middleman here | 20:11 |
sean-k-mooney | mriedem: if you want to be config driven you set the config value and dont touch it from the api | 20:12 |
jaypipes | mriedem: you are wonderful middleware. | 20:12 |
mriedem | and yikun had a question, "How to address the upgrade case? If we already have a 0.0 cpu ratio in db, should we change it to 16.0 first? online migration?" | 20:13 |
*** k_mouza_ has joined #openstack-nova | 20:13 | |
sean-k-mooney | mriedem: specifically, you set cpu_allocation_ratio=None initial_cpu_allocation_ratio=16.0 if you want to set a default for new nodes but manage the actual value from the api | 20:13 |
mriedem | would we change the ComputeNode.*_allocation_ratio to the config value on read if the value in the db is 0.0? | 20:14 |
sean-k-mooney | and set cpu_allocation_ratio=x if you want to manage via config | 20:14 |
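Putting the two workflows sean-k-mooney just described into a nova.conf sketch. The option names follow the spec under discussion and the initial_* defaults shown mirror today's ComputeNode facade values; treat this as an illustration of the proposal, not final documented behaviour:

```ini
[DEFAULT]
# API-driven (mgagne's case): leave the existing options unset (None) so a
# value set through the API is never clobbered by config, and only seed
# brand-new compute nodes from the initial_* values.
#cpu_allocation_ratio =
#ram_allocation_ratio =
#disk_allocation_ratio =
initial_cpu_allocation_ratio = 16.0
initial_ram_allocation_ratio = 1.5
initial_disk_allocation_ratio = 1.0

# Config-driven (CERN's case): set the option explicitly and the compute
# keeps (re)applying it, overriding anything set via the API.
#cpu_allocation_ratio = 4.0
```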
*** k_mouza has quit IRC | 20:15 | |
sean-k-mooney | mriedem: does that make sense? | 20:15 |
mriedem | yes i get that, | 20:16 |
mriedem | the question is upgrades https://review.openstack.org/#/c/552105/5/specs/stein/approved/initial-allocation-ratios.rst@114 | 20:16 |
*** k_mouza has joined #openstack-nova | 20:17 | |
*** k_mouza_ has quit IRC | 20:18 | |
sean-k-mooney | mriedem: i guess on upgrade, if 0.0 is set in the db that would also imply that the resource provider allocation_ratio or whatever is 0. | 20:18 |
sean-k-mooney | 0.0 also correct | 20:18 |
sean-k-mooney | mriedem: if the resource provider exists and we get 0.0 from the db but placement has another value, i would assume we should keep the placement value, but i'm not sure if that case can happen today | 20:20 |
sean-k-mooney | the compute node will just override placement with the value it gets from the resource tracker today in update_provider_tree, right? | 20:21 |
*** k_mouza has quit IRC | 20:21 | |
*** k_mouza_ has joined #openstack-nova | 20:24 | |
*** cfriesen has joined #openstack-nova | 20:24 | |
mriedem | the allocation ratio in placement can't be 0.0 | 20:25 |
mriedem | it will literally shit itself | 20:25 |
cfriesen | I assume it | 20:26 |
cfriesen | it's a divide by zero thing? | 20:26 |
sean-k-mooney | mriedem: ok, in that case the logic is simple. on upgrade, if the placement provider exists and the db value is 0.0, set the db value to the placement value. if the placement provider does not exist, set the db to the initial_* value and create the provider as normal | 20:26 |
cfriesen | should placement check for that? | 20:26 |
openstackgerrit | iain MacDonnell proposed openstack/nova master: Handle online_data_migrations exceptions https://review.openstack.org/608091 | 20:27 |
sean-k-mooney | cfriesen: it's not actually a divide by 0, but we multiply the available capacity by 0 and see if it's larger than what we requested | 20:27 |
*** k_mouza has joined #openstack-nova | 20:27 | |
sean-k-mooney | cfriesen: so placement will not have a math error, but you won't ever be able to get an allocation against that resource provider | 20:28 |
cfriesen | ah, thanks | 20:28 |
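One way to express the upgrade rule sean-k-mooney sketches at 20:26 as code. This is illustrative pseudocode, not text from the spec; the helper names and config attributes are placeholders:

```python
def resolve_cpu_allocation_ratio(db_value, placement_value, conf):
    """Pick the effective cpu allocation ratio for an existing compute node."""
    if conf.cpu_allocation_ratio is not None:
        # Operator manages the ratio via config: config wins.
        return conf.cpu_allocation_ratio
    if db_value:
        # A real value is already recorded (e.g. set via the API): keep it.
        # (0.0 and None are both treated as "unset" here.)
        return db_value
    if placement_value is not None:
        # Legacy 0.0 in the DB but the placement provider already has a
        # usable ratio: adopt the placement value.
        return placement_value
    # Nothing usable anywhere (new node): seed from the initial_* default.
    return conf.initial_cpu_allocation_ratio
```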
*** k_mouza_ has quit IRC | 20:29 | |
*** k_mouza_ has joined #openstack-nova | 20:31 | |
*** k_mouza has quit IRC | 20:31 | |
*** k_mouza has joined #openstack-nova | 20:34 | |
*** k_mouza_ has quit IRC | 20:36 | |
*** k_mouza has quit IRC | 20:36 | |
*** k_mouza has joined #openstack-nova | 20:37 | |
*** pcaruana has quit IRC | 20:37 | |
*** k_mouza_ has joined #openstack-nova | 20:40 | |
*** k_mouza has quit IRC | 20:41 | |
*** k_mouza has joined #openstack-nova | 20:44 | |
*** k_mouza_ has quit IRC | 20:44 | |
*** hamzy_ has quit IRC | 20:46 | |
*** k_mouza_ has joined #openstack-nova | 20:47 | |
*** k_mouza has quit IRC | 20:48 | |
*** erlon_ has quit IRC | 20:50 | |
*** gouthamr has joined #openstack-nova | 20:50 | |
*** k_mouza has joined #openstack-nova | 20:50 | |
efried | mriedem: We talked about this in the sched meeting yesterday. jaypipes said he was about ready to abandon those two specs. We also discussed the possibility of generic inventory yaml leading to a solution. | 20:50 |
*** k_mouza_ has quit IRC | 20:51 | |
mriedem | efried: i know i was there and said i'd reach out to yikun to pick up the specs, | 20:51 |
mriedem | but he's confused about the direction, as am i | 20:51 |
mriedem | hence questions | 20:51 |
*** ttsiouts has joined #openstack-nova | 20:51 | |
mriedem | i'm going through https://review.openstack.org/#/c/552105/ again now, | 20:51 |
mriedem | some of that is outdated given https://github.com/openstack/nova/commit/2588af87c862cfd02d860f6b860381e907b279ff | 20:51 |
*** k_mouza has quit IRC | 20:54 | |
*** ttsiouts has quit IRC | 21:00 | |
*** rmart04 has joined #openstack-nova | 21:00 | |
*** ttsiouts has joined #openstack-nova | 21:00 | |
*** rmart04 has quit IRC | 21:01 | |
*** k_mouza has joined #openstack-nova | 21:02 | |
*** eharney has quit IRC | 21:05 | |
*** k_mouza has quit IRC | 21:05 | |
*** k_mouza has joined #openstack-nova | 21:06 | |
mriedem | alright i've dumped comments in https://review.openstack.org/#/c/552105/ - i think i could probably update the spec at this point to cover the upgrade impact | 21:07 |
mriedem | i don't think we should leave the existing options defaulting to 0.0 like bauzas is asking for - that just prolongs the confusion of what those defaults mean | 21:08 |
mriedem | jaypipes: if you can skim my comments to see if they make sense i can take over updating the spec | 21:08 |
*** k_mouza has quit IRC | 21:08 | |
*** k_mouza has joined #openstack-nova | 21:12 | |
*** k_mouza has quit IRC | 21:14 | |
*** k_mouza has joined #openstack-nova | 21:15 | |
*** priteau has quit IRC | 21:15 | |
*** cfriesen has quit IRC | 21:19 | |
*** k_mouza has quit IRC | 21:20 | |
sean-k-mooney | mriedem: are you proposing defaulting them to None, or 16.0 / whatever the real default is for that resource? | 21:20 |
mriedem | what the spec says | 21:21 |
mriedem | change the *_allocation_ratio defaults from 0.0 to None | 21:21 |
mriedem | the initial_*_allocation_ratio defaults become what is in the ComputeNode object facade today | 21:21 |
mriedem | and we drop the facade | 21:21 |
sean-k-mooney | right, that makes sense to me, and initial_*_allocation_ratios will have per resource type defaults, correct | 21:22 |
mriedem | yes | 21:22 |
sean-k-mooney | ya, that all sounds sane to me. i have not read bauzas's comment arguing for keeping 0.0 | 21:22 |
sean-k-mooney | current 0.0 has a special meaning right? e.g. use the scheduler/conductor values, not the compute node's, or something like that | 21:23 |
sean-k-mooney | i assume that is what his comment was related to. | 21:24 |
* sean-k-mooney clicks spec link to read | 21:24 |
sean-k-mooney | the nova-status check makes sense but i'm not sure you can check the config as part of it | 21:26 |
mriedem | if the config explicitly sets the *_allocation_ratios to 0.0 when we have changed the defaults to None, that means their config mgmt system is setting that on purpose and is likely busted | 21:27 |
sean-k-mooney | mriedem: true, i was just thinking for FFU, or in general, the nova-status command can't check the config on each compute unless you ran it on each compute | 21:29 |
sean-k-mooney | that said, if you use oslo.config's ability to auto generate configs, does it generate the config with all the values commented out or set to their default. just trying to think if there was a reasonable reason why it might be set to 0.0 | 21:30 |
mriedem | the allocation ratios aren't read on control plane services, so i think it's reasonable to assume if someone's config said 0.0 for those values when the defaults are None they are just setting the config globally and it's wrong | 21:30 |
sean-k-mooney | sorry, i think i missed that bit. where will these new values be read? scheduler/conductor or compute node | 21:34 |
sean-k-mooney | i had assumed this was all config that was being read by the compute node? | 21:35 |
mriedem | the compute is what sets it | 21:38 |
mriedem | the scheduler will read it | 21:38 |
mriedem | from the compute node object | 21:38 |
*** tbachman has joined #openstack-nova | 21:39 | |
*** ttsiouts has quit IRC | 21:39 | |
mriedem | the only service that reads the config for these options is the compute service | 21:39 |
*** ttsiouts has joined #openstack-nova | 21:39 | |
openstackgerrit | Adam Harwell proposed openstack/nova master: Add apply_cells to nova-manage https://review.openstack.org/568987 | 21:41 |
sean-k-mooney | mriedem: oh ok, i was under the impression that if the scheduler received a 0.0 from the compute node it would read its own config and use the allocation ratio it got. was that how it used to work or am i just imagining things. | 21:42 |
*** tbachman_ has joined #openstack-nova | 21:43 | |
sean-k-mooney | mriedem: but in any case i think the nova-status check is sufficient. if there is a 0.0 in the db for a value, the operator should first update their config and then upgrade/run the online migration, whatever is needed | 21:43 |
*** tbachman has quit IRC | 21:44 | |
*** tbachman_ is now known as tbachman | 21:44 | |
*** ttsiouts has quit IRC | 21:44 | |
sean-k-mooney | i.e. i'm agreeing with your suggestion, thanks for explaining :) | 21:44 |
imacdonn | mriedem dansmith: Fixing the migrations thing made grenade go boom ... there actually was another latent bug that I stumbled on, which was causing the exit code to be zero even though work had been done - with that fixed, grenade is not doing the right thing (repeating until exit status 0) | 21:48 |
sean-k-mooney | imacdonn: what is the other bug? | 21:48 |
mriedem | jaypipes: yikun: i've also gone through https://review.openstack.org/#/c/544683/ and left comments; i'm not on board with all it's proposing, but i think some of that is outdated now per the ptg discussions | 21:49 |
imacdonn | sean-k-mooney: "ran" gets reset to zero for each iteration of the loop here: https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L718 | 21:49 |
mriedem | i think the gist of ^ is that it's proposing to proxy aggregate allocation ratio metadata to placement, correct? | 21:49 |
imacdonn | sean-k-mooney: then it gets used later to determine if any migrations were ran/run at all here: https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L739 | 21:50 |
imacdonn | sean-k-mooney: so if the last iteration of the loop didn't do anything (even though a previous iteration did), it'd not count | 21:51 |
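The latent bug imacdonn is describing, reduced to its shape (a simplification, not the actual nova/cmd/manage.py code):

```python
def run_online_migrations(run_batch, batch_size=50):
    total_ran = 0
    while True:
        # run_batch() applies up to batch_size rows of each migration and
        # returns (per_migration_counts, rows_migrated_this_batch).
        migrations, ran = run_batch(batch_size)
        # Accumulate across batches; the bug was keeping only the *last*
        # batch's `ran`, so a final empty pass reported "no work was done"
        # even when earlier passes had migrated rows.
        total_ran += ran
        if not migrations:
            break
    return total_ran
```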
sean-k-mooney | imacdonn: i'm not sure that is incorrect. if the last iteration did nothing it means migrations was empty, so we break out of the while | 21:52 |
imacdonn | sean-k-mooney: yes, but the decision about whether or not any work had been done needs to consider ALL iterations of the loop | 21:52 |
sean-k-mooney | does it? why? | 21:53 |
imacdonn | sean-k-mooney: because, IIUC, that's what exit code 1 means (some migrations did work) | 21:53 |
*** k_mouza has joined #openstack-nova | 21:54 | |
* sean-k-mooney currently parsing "return ran and 1 or 0" | 21:54 |
imacdonn | I interpret that as "if ran is not zero, return 1, otherwise return 0" | 21:55 |
sean-k-mooney | imacdonn: yes, but i'm trying to think what that logically means | 21:56 |
sean-k-mooney | is return 0 being used to indicate success like in bash, or does 1 indicate success | 21:56 |
*** k_mouza has quit IRC | 21:57 | |
imacdonn | sean-k-mooney: it's complicated :) (again, IIUC)... zero means that there is no migration work remaining to be done, 1 means "some migrations did work, and there may be some more work that still needs to be done: | 21:58 |
imacdonn | so you're supposed to keep running the command until it doesn't return 1 | 21:58 |
sean-k-mooney | imacdonn: right, in that case you want ran to be 0 when all pending migrations have been processed, so you want to reset it in the loop | 21:58 |
imacdonn | sean-k-mooney: no... you're supposed to re-run the command ... that's not what that loop is for | 21:59 |
sean-k-mooney | that is not how i read how it is currently written | 22:00 |
sean-k-mooney | from the current code it looks like its intent is to run all migrations in batches of up to max count and then exit when there are none left | 22:01 |
imacdonn | sean-k-mooney: yes, but there are scenarios where some of them will not work the first time (due to dependencies on others), so | 22:02 |
imacdonn | it's necessary to iterate the whole command until you get a 0 | 22:02 |
imacdonn | .... is what I've been led to understand | 22:02 |
imacdonn | see comments on https://review.openstack.org/#/c/608091/ | 22:02 |
sean-k-mooney | this comment https://review.openstack.org/#/c/608091/1/nova/cmd/manage.py@753 or another one | 22:04 |
imacdonn | yes, those | 22:04 |
sorrison | mriedem I have another fun policy change https://review.openstack.org/#/c/608474/ | 22:05 |
*** k_mouza has joined #openstack-nova | 22:06 | |
sorrison | mriedem: I've been trying to figure out the whole admin/user context thing with tests and requests. I think I'm missing something | 22:06 |
sean-k-mooney | so from mriedem's comment, we did not raise exceptions before, so while we said we could exit with a 1 error code we did not | 22:06 |
imacdonn | sean-k-mooney: I'm still not completely clear on what the exit code should be ... I thought I understood that '1' means "some migrations did work" | 22:08 |
*** k_mouza has quit IRC | 22:09 | |
*** k_mouza_ has joined #openstack-nova | 22:09 | |
*** skatsaounis has quit IRC | 22:10 | |
sean-k-mooney | i think for mriedem's comment, 1 means some migrations ran but there were no errors, and he was suggesting 2 for some migrations ran and there were errors, and 0 means all migrations ran and no errors | 22:11 |
imacdonn | I'm not sure how to distinguish between "some ran" and "all ran" | 22:11 |
sean-k-mooney | imacdonn: if you specify max count, before your change, and you have more migrations pending than max count, it would have exited with 1 | 22:12 |
sean-k-mooney | so just change the return on line 753 to 2 and it would then be correct, but we need a docs update to say what 2 means | 22:13 |
*** k_mouza_ has quit IRC | 22:14 | |
imacdonn | sean-k-mooney: the suggestion from today was to change it to return 2 *only if no migrations did work this time* | 22:14 |
sean-k-mooney | when --max-count is set, unlimited is set to false, so we break out of the loop on line 735 and ran is non 0, so "return ran and 1 or 0" returns 1 | 22:14 |
sean-k-mooney | imacdonn: ok, in that case on line 734 add "if got_exceptions: return 2" | 22:16 |
*** moshele has joined #openstack-nova | 22:16 | |
*** slaweq has quit IRC | 22:17 | |
sean-k-mooney | in fact you could do the exception check on line 725 if you really wanted to exit fast | 22:17 |
imacdonn | that would be bad | 22:18 |
imacdonn | because some other migration may be a dependency for the first one to not fail, and you'd never get to run the second one | 22:18 |
sean-k-mooney | you just said you want to exit if there are exceptions | 22:19 |
imacdonn | nope, pretty sure I didn't | 22:19 |
imacdonn | I said they want the final exit status to be 2, if there were exceptions AND no migrations did any work | 22:19 |
sean-k-mooney | oh "*only if no migrations did work this time*" i misread that | 22:20 |
mriedem | sorrison: comments inline | 22:20 |
imacdonn | I guess "did work" is a confusing term ... "took effect"? ...... | 22:21 |
sean-k-mooney | mriedem: so we are guessing about what you want return 2 to mean for https://review.openstack.org/#/c/608091/1/nova/cmd/manage.py@753 | 22:21 |
sorrison | Thanks mriedem :-) | 22:21 |
sean-k-mooney | mriedem: since you're here, can you clarify for imacdonn | 22:21 |
* imacdonn reckons mriedem is in NUMA mode, and we're on the other node ;) | 22:22 | |
sean-k-mooney | mriedem: based on the current code, return 1 could have happened if we passed --max-count and we had more than max-count migrations, so i think return 1 means some migrations ran with no errors and there are more to run, and return 2 should mean there were errors when running migrations and you better check out what went wrong | 22:23 |
*** k_mouza has joined #openstack-nova | 22:24 | |
imacdonn | sean-k-mooney: per dansmith, we should only return 2 if there were exceptions *and* we've determined that no other migrations did work (i.e. modified rows) | 22:26 |
sean-k-mooney | imacdonn: i don't see that in his comment on the review. was that in the scheduler meeting? | 22:28 |
mriedem | it was in channel | 22:28 |
mriedem | several hours ago | 22:28 |
*** k_mouza has quit IRC | 22:28 | |
mriedem | and it sounds like you guys are talking about it all over again to come back to the same conclusion? | 22:28 |
sean-k-mooney | no, there was a valid case to return 1 before | 22:28 |
mriedem | "(11:25:34 AM) dansmith: what if 2 means "I didn't do anything but there were exceptions", 1 means "I did things, maybe there were some exceptions too", 0 means "I didn't do anything, but no errors"" | 22:29 |
imacdonn | mriedem: the sticking point now is the meaning of '1' | 22:29 |
sean-k-mooney | we would have returned 1 if we set a max count and there were more than max-count migrations | 22:29 |
imacdonn | mriedem: In the current implementation, if it runs through all migrations (in batches), the final return code is 0, even though work was done | 22:30 |
sean-k-mooney | mriedem: so if there were 100 migrations to run and we set --max-count=90 before, and all 90 ran successfully, we would have returned 1 to indicate there are more to run | 22:30 |
imacdonn | in my interpretation of the discussion this morning, it should be '1' if work was done, even if there is no work remaining to do | 22:31 |
*** tbachman has quit IRC | 22:31 | |
*** k_mouza has joined #openstack-nova | 22:31 | |
sean-k-mooney | mriedem: 0 used to mean i ran all migrations successfully, since 0 is success in bash | 22:32 |
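For reference, the exit-code scheme dansmith proposed (quoted above at 22:29) maps onto the accumulated loop results roughly like this; the variable names are placeholders, not the real code:

```python
def exit_code(total_ran, got_exceptions):
    if total_ran:
        return 1   # did some work (maybe with exceptions too) -- run again
    if got_exceptions:
        return 2   # did nothing, but hit exceptions -- needs investigation
    return 0       # did nothing and no errors -- all migrations are complete
```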
mriedem | sorry but i'm past the point of having attention to think about this today | 22:33 |
sean-k-mooney | mriedem: no worries, i'll leave a comment on the patch with what i understand the current logic to be and you and dan can check when ye have had some rest | 22:33 |
sean-k-mooney | imacdonn: are you ok with waiting for them to look at this tomorrow? | 22:34 |
*** k_mouza has quit IRC | 22:34 | |
imacdonn | sean-k-mooney: sure | 22:34 |
*** panda has quit IRC | 22:37 | |
*** panda has joined #openstack-nova | 22:39 | |
*** rcernin has joined #openstack-nova | 22:42 | |
*** k_mouza has joined #openstack-nova | 22:43 | |
*** owalsh is now known as owalsh_away | 22:47 | |
*** mriedem has quit IRC | 22:57 | |
*** hongbin has quit IRC | 22:58 | |
*** lbragstad has quit IRC | 23:01 | |
*** spatel has joined #openstack-nova | 23:03 | |
*** cfriesen has joined #openstack-nova | 23:05 | |
*** spatel has quit IRC | 23:07 | |
*** slaweq has joined #openstack-nova | 23:11 | |
*** efried has quit IRC | 23:12 | |
*** efried has joined #openstack-nova | 23:13 | |
*** itlinux has quit IRC | 23:14 | |
*** slaweq has quit IRC | 23:16 | |
openstackgerrit | sean mooney proposed openstack/nova master: add get_pci_requests_from_vifs to request.py https://review.openstack.org/609166 | 23:22 |
*** macza has quit IRC | 23:32 | |
*** mlavalle has quit IRC | 23:35 | |
*** cfriesen has quit IRC | 23:41 | |
*** cfriesen has joined #openstack-nova | 23:42 | |
*** Swami has quit IRC | 23:43 | |
*** erlon_ has joined #openstack-nova | 23:44 | |
*** sambetts|afk has quit IRC | 23:44 | |
*** sambetts_ has joined #openstack-nova | 23:45 | |
openstackgerrit | Chris Friesen proposed openstack/nova-specs master: Add support for emulated virtual TPM https://review.openstack.org/571111 | 23:56 |