*** zhanglong has joined #openstack-nova | 00:00 | |
*** zhanglong has quit IRC | 00:04 | |
*** mriedem has quit IRC | 00:06 | |
openstackgerrit | Dustin Cowles proposed openstack/nova-specs master: Update provider config spec for identification conflicts https://review.opendev.org/693414 | 00:06 |
*** zhanglong has joined #openstack-nova | 00:06 | |
*** ociuhandu has joined #openstack-nova | 00:07 | |
*** slaweq has joined #openstack-nova | 00:10 | |
*** macz has quit IRC | 00:11 | |
*** ociuhandu has quit IRC | 00:11 | |
*** slaweq has quit IRC | 00:15 | |
*** ociuhandu has joined #openstack-nova | 00:18 | |
*** slaweq has joined #openstack-nova | 00:19 | |
*** tosky has quit IRC | 00:22 | |
*** slaweq has quit IRC | 00:24 | |
*** ociuhandu has quit IRC | 00:32 | |
*** JamesBenson has joined #openstack-nova | 00:33 | |
*** slaweq has joined #openstack-nova | 00:37 | |
*** JamesBenson has quit IRC | 00:37 | |
*** sapd1 has joined #openstack-nova | 00:39 | |
*** slaweq has quit IRC | 00:42 | |
openstackgerrit | Merged openstack/nova stable/train: Don't delete compute node, when deleting service other than nova-compute https://review.opendev.org/695145 | 00:42 |
*** slaweq has joined #openstack-nova | 00:44 | |
*** slaweq has quit IRC | 00:48 | |
*** slaweq has joined #openstack-nova | 00:51 | |
*** slaweq has quit IRC | 00:55 | |
*** slaweq has joined #openstack-nova | 00:57 | |
*** brault has quit IRC | 00:58 | |
*** Liang__ has joined #openstack-nova | 00:58 | |
*** brault has joined #openstack-nova | 00:59 | |
*** zhanglong has quit IRC | 01:00 | |
*** slaweq has quit IRC | 01:01 | |
*** slaweq has joined #openstack-nova | 01:03 | |
*** ociuhandu has joined #openstack-nova | 01:04 | |
*** sapd1 has quit IRC | 01:06 | |
*** nanzha has joined #openstack-nova | 01:07 | |
*** ociuhandu has quit IRC | 01:08 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Handle target host cross-cell cold migration in conductor https://review.opendev.org/642591 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Validate image/create during cross-cell resize functional testing https://review.opendev.org/642592 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add zones wrinkle to TestMultiCellMigrate https://review.opendev.org/643450 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add negative test for cross-cell finish_resize failing https://review.opendev.org/643451 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add negative test for prep_snapshot_based_resize_at_source failing https://review.opendev.org/669013 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add confirm_snapshot_based_resize_at_source compute method https://review.opendev.org/637058 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add ConfirmResizeTask https://review.opendev.org/637070 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add confirm_snapshot_based_resize conductor RPC method https://review.opendev.org/637075 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize from the API https://review.opendev.org/637316 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize_at_dest compute method https://review.opendev.org/637630 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Deal with cross-cell resize in _remove_deleted_instances_allocations https://review.opendev.org/639453 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add finish_revert_snapshot_based_resize_at_source compute method https://review.opendev.org/637647 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add RevertResizeTask https://review.opendev.org/638046 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize conductor RPC method https://review.opendev.org/638047 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Revert cross-cell resize from the API https://review.opendev.org/638048 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize while deleting a server https://review.opendev.org/638268 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add archive_deleted_rows wrinkle to cross-cell functional test https://review.opendev.org/651650 | 01:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add CrossCellWeigher https://review.opendev.org/614353 | 01:11 |
*** slaweq has quit IRC | 01:13 | |
*** zhanglong has joined #openstack-nova | 01:15 | |
gmann | efried: any idea or have you seen 'openstack:' as resource provider name in https://zuul.opendev.org/t/openstack/build/b26bd7fc82d94d4cac97b5481623b629/log/logs/grenade.sh.txt.gz#30397 | 01:16 |
gmann | this is a grenade job on octavia which is failing due to that when moving to py3. | 01:16 |
*** slaweq has joined #openstack-nova | 01:17 | |
*** nanzha has quit IRC | 01:20 | |
*** slaweq has quit IRC | 01:21 | |
*** awalende has joined #openstack-nova | 01:22 | |
*** nanzha has joined #openstack-nova | 01:24 | |
*** awalende has quit IRC | 01:27 | |
*** nanzha has quit IRC | 01:30 | |
*** nanzha has joined #openstack-nova | 01:30 | |
*** zhanglong has quit IRC | 01:34 | |
*** dave-mccowan has joined #openstack-nova | 01:35 | |
*** zhanglong has joined #openstack-nova | 01:38 | |
*** dave-mccowan has quit IRC | 01:42 | |
*** yedongcan has joined #openstack-nova | 01:42 | |
*** tetsuro_ has joined #openstack-nova | 01:50 | |
*** slaweq has joined #openstack-nova | 01:51 | |
*** gyee has quit IRC | 01:52 | |
*** mdbooth has quit IRC | 01:52 | |
*** tetsuro has quit IRC | 01:52 | |
*** mdbooth has joined #openstack-nova | 01:54 | |
*** ricolin has joined #openstack-nova | 01:55 | |
*** slaweq has quit IRC | 01:55 | |
*** slaweq has joined #openstack-nova | 01:58 | |
*** larainema has joined #openstack-nova | 01:59 | |
*** ericin has joined #openstack-nova | 02:05 | |
*** TxGirlGeek has quit IRC | 02:06 | |
*** slaweq has quit IRC | 02:10 | |
*** ociuhandu has joined #openstack-nova | 02:11 | |
*** ociuhandu has quit IRC | 02:16 | |
*** slaweq has joined #openstack-nova | 02:23 | |
*** brinzhang has joined #openstack-nova | 02:29 | |
*** slaweq has quit IRC | 02:30 | |
*** slaweq has joined #openstack-nova | 02:31 | |
*** brinzhang_ has joined #openstack-nova | 02:32 | |
*** slaweq has quit IRC | 02:35 | |
*** brinzhang has quit IRC | 02:36 | |
*** brinzhang has joined #openstack-nova | 02:37 | |
*** brinzhang_ has quit IRC | 02:37 | |
*** slaweq has joined #openstack-nova | 02:48 | |
*** macz has joined #openstack-nova | 02:50 | |
*** slaweq has quit IRC | 02:52 | |
*** brinzhang has quit IRC | 03:01 | |
*** brinzhang has joined #openstack-nova | 03:02 | |
*** ericleiin has joined #openstack-nova | 03:09 | |
*** ericin has quit IRC | 03:13 | |
*** abaindur has quit IRC | 03:14 | |
*** ociuhandu has joined #openstack-nova | 03:28 | |
*** macz has quit IRC | 03:32 | |
*** macz has joined #openstack-nova | 03:33 | |
*** ociuhandu has quit IRC | 03:33 | |
*** macz has quit IRC | 03:37 | |
*** brinzhang_ has joined #openstack-nova | 03:39 | |
*** awalende has joined #openstack-nova | 03:40 | |
*** brinzhang has quit IRC | 03:42 | |
*** awalende has quit IRC | 03:44 | |
*** ericleiin has quit IRC | 03:44 | |
*** ericleiin has joined #openstack-nova | 03:45 | |
*** zhanglong has quit IRC | 03:50 | |
*** tonyb[m] has joined #openstack-nova | 03:50 | |
*** udesale has joined #openstack-nova | 03:53 | |
*** brinzhang has joined #openstack-nova | 03:54 | |
*** brinzhang_ has quit IRC | 03:56 | |
*** mkrai has joined #openstack-nova | 03:57 | |
*** bhagyashris has joined #openstack-nova | 04:05 | |
*** mkrai has quit IRC | 04:10 | |
*** mkrai has joined #openstack-nova | 04:14 | |
*** brinzhang_ has joined #openstack-nova | 04:19 | |
*** brinzhang has quit IRC | 04:22 | |
*** ericlei_ has joined #openstack-nova | 04:33 | |
*** ericleiin has quit IRC | 04:35 | |
*** bhagyashris has quit IRC | 04:39 | |
*** factor has joined #openstack-nova | 04:52 | |
*** brinzhang has joined #openstack-nova | 04:53 | |
*** brinzhang_ has quit IRC | 04:56 | |
*** bhagyashris has joined #openstack-nova | 05:09 | |
*** igordc has quit IRC | 05:11 | |
*** brinzhang_ has joined #openstack-nova | 05:13 | |
*** ratailor has joined #openstack-nova | 05:16 | |
*** brinzhang has quit IRC | 05:17 | |
*** ericleiin has joined #openstack-nova | 05:22 | |
openstackgerrit | Merged openstack/nova stable/pike: Only nil az during shelve offload https://review.opendev.org/693839 | 05:23 |
*** ericlei_ has quit IRC | 05:24 | |
*** artom has quit IRC | 05:27 | |
*** ociuhandu has joined #openstack-nova | 05:30 | |
*** ociuhandu has quit IRC | 05:35 | |
*** ericlei_ has joined #openstack-nova | 05:36 | |
*** brinzhang has joined #openstack-nova | 05:38 | |
*** ericleiin has quit IRC | 05:39 | |
*** brinzhang_ has quit IRC | 05:42 | |
*** zhanglong has joined #openstack-nova | 05:44 | |
*** yaawang has quit IRC | 05:50 | |
*** yaawang has joined #openstack-nova | 05:51 | |
*** Luzi has joined #openstack-nova | 05:58 | |
*** awalende has joined #openstack-nova | 05:59 | |
*** brinzhang_ has joined #openstack-nova | 05:59 | |
*** brinzhang has quit IRC | 06:02 | |
*** awalende has quit IRC | 06:03 | |
*** udesale has quit IRC | 06:15 | |
*** udesale has joined #openstack-nova | 06:16 | |
*** lpetrut has quit IRC | 06:21 | |
*** macz has joined #openstack-nova | 06:22 | |
*** ociuhandu has joined #openstack-nova | 06:25 | |
*** macz has quit IRC | 06:27 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova master: Imported Translations from Zanata https://review.opendev.org/694717 | 06:36 |
*** slaweq has joined #openstack-nova | 06:39 | |
*** udesale has quit IRC | 06:39 | |
*** udesale has joined #openstack-nova | 06:39 | |
*** dpawlik has joined #openstack-nova | 06:41 | |
*** slaweq has quit IRC | 06:44 | |
*** ociuhandu has quit IRC | 06:46 | |
*** brault has quit IRC | 06:48 | |
*** factor has quit IRC | 07:01 | |
*** ociuhandu has joined #openstack-nova | 07:05 | |
*** brinzhang has joined #openstack-nova | 07:07 | |
*** xek_ has joined #openstack-nova | 07:10 | |
*** brinzhang_ has quit IRC | 07:11 | |
*** rcernin has quit IRC | 07:12 | |
*** abaindur has joined #openstack-nova | 07:22 | |
*** brinzhang_ has joined #openstack-nova | 07:22 | |
*** jawad_axd has joined #openstack-nova | 07:23 | |
*** ericleiin has joined #openstack-nova | 07:24 | |
*** brinzhang has quit IRC | 07:25 | |
*** pcaruana has joined #openstack-nova | 07:27 | |
*** ericlei_ has quit IRC | 07:28 | |
*** tosky has joined #openstack-nova | 07:31 | |
*** brinzhang has joined #openstack-nova | 07:33 | |
*** abaindur has quit IRC | 07:35 | |
*** abaindur has joined #openstack-nova | 07:35 | |
*** brinzhang_ has quit IRC | 07:36 | |
*** ociuhandu has quit IRC | 07:39 | |
*** brault has joined #openstack-nova | 07:44 | |
*** damien_r has joined #openstack-nova | 07:52 | |
*** bhagyashris has quit IRC | 07:55 | |
*** ericlei_ has joined #openstack-nova | 07:56 | |
*** ericleiin has quit IRC | 07:58 | |
*** ociuhandu has joined #openstack-nova | 08:00 | |
*** brault has quit IRC | 08:01 | |
*** nanzha has quit IRC | 08:03 | |
*** ociuhandu has quit IRC | 08:05 | |
*** nanzha has joined #openstack-nova | 08:06 | |
*** brinzhang_ has joined #openstack-nova | 08:08 | |
bauzas | good morning Nova | 08:10 |
*** brinzhang has quit IRC | 08:10 | |
*** ericlei_ has quit IRC | 08:11 | |
*** awalende has joined #openstack-nova | 08:15 | |
*** awalende has quit IRC | 08:15 | |
gibi | bauzas: good morning | 08:16 |
bauzas | :) | 08:16 |
*** awalende has joined #openstack-nova | 08:19 | |
*** rpittau|afk is now known as rpittau | 08:20 | |
*** tkajinam has quit IRC | 08:28 | |
*** bhagyashris has joined #openstack-nova | 08:41 | |
*** ociuhandu has joined #openstack-nova | 08:44 | |
*** slaweq has joined #openstack-nova | 08:44 | |
*** ralonsoh has joined #openstack-nova | 08:45 | |
*** ociuhandu has quit IRC | 08:46 | |
*** ociuhandu has joined #openstack-nova | 08:47 | |
*** ociuhandu has quit IRC | 08:50 | |
openstackgerrit | ya.wang proposed openstack/nova master: Handle instance crash event in libvirt driver https://review.opendev.org/695369 | 08:50 |
*** ociuhandu has joined #openstack-nova | 08:50 | |
*** mkrai has quit IRC | 08:52 | |
*** mkrai has joined #openstack-nova | 08:53 | |
*** ociuhandu has quit IRC | 08:57 | |
*** sridharg has joined #openstack-nova | 09:01 | |
*** maciejjozefczyk has joined #openstack-nova | 09:01 | |
*** priteau has joined #openstack-nova | 09:03 | |
*** maciejjozefczyk has quit IRC | 09:14 | |
*** ricolin has quit IRC | 09:22 | |
*** tssurya has joined #openstack-nova | 09:24 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: docs: Rewrite quotas documentation https://review.opendev.org/667165 | 09:25 |
stephenfin | bauzas: Morning. Any chance you could look at one or two of the "die, nova-network, die" patches I have up today? Like, say, this really easy one https://review.opendev.org/#/c/684345/ ? | 09:26 |
bauzas | lol ok | 09:26 |
* stephenfin has logged one #success already this week (for DevStack -> Python 3). Would be nice to log another one | 09:27 | |
openstackgerrit | Tushar Patil proposed openstack/nova-specs master: Allow compute nodes to use DISK_GB from shared storage RP https://review.opendev.org/650188 | 09:28 |
*** dpawlik has quit IRC | 09:30 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: zuul: Remove unnecessary 'USE_PYTHON3' https://review.opendev.org/695380 | 09:34 |
*** shilpasd has joined #openstack-nova | 09:34 | |
*** Liang__ has quit IRC | 09:34 | |
*** dpawlik has joined #openstack-nova | 09:35 | |
*** abaindur has quit IRC | 09:45 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova stable/stein: Don't delete compute node, when deleting service other than nova-compute https://review.opendev.org/695381 | 09:45 |
*** brinzhang has joined #openstack-nova | 09:45 | |
*** abaindur has joined #openstack-nova | 09:45 | |
*** brinzhang_ has quit IRC | 09:48 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova stable/rocky: Don't delete compute node, when deleting service other than nova-compute https://review.opendev.org/695382 | 09:49 |
*** martinkennelly has joined #openstack-nova | 09:49 | |
*** brinzhang_ has joined #openstack-nova | 09:50 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova stable/queens: Don't delete compute node, when deleting service other than nova-compute https://review.opendev.org/695383 | 09:51 |
*** derekh has joined #openstack-nova | 09:53 | |
*** brinzhang has quit IRC | 09:53 | |
*** ociuhandu has joined #openstack-nova | 09:54 | |
*** ociuhandu has quit IRC | 09:57 | |
*** ociuhandu has joined #openstack-nova | 09:58 | |
*** mkrai has quit IRC | 10:01 | |
*** mkrai has joined #openstack-nova | 10:02 | |
*** ociuhandu has quit IRC | 10:03 | |
*** derekh has quit IRC | 10:04 | |
*** liuyulong has quit IRC | 10:06 | |
*** priteau has quit IRC | 10:07 | |
*** brinzhang_ has quit IRC | 10:07 | |
*** derekh has joined #openstack-nova | 10:09 | |
*** priteau has joined #openstack-nova | 10:12 | |
*** macz has joined #openstack-nova | 10:30 | |
yaawang | mdbooth: Hello, can you review the spec again? I've updated it :) https://review.opendev.org/#/c/693655/ | 10:32 |
*** macz has quit IRC | 10:35 | |
*** dtantsur|afk is now known as dtantsur | 10:49 | |
kashyap | yaawang: Hi, on the "no_performance_impact" name -- I still find it too broad and sweeping | 10:52 |
kashyap | yaawang: Thanks for addressing my feedback, though! | 10:52 |
*** derekh has quit IRC | 10:52 | |
kashyap | I agree, we don't want to expose "hypervisor"-specific features. (BTW, we are abusing the term "hypervisor" here to include QEMU; the term normally includes KVM/Kernel area only.) | 10:53 |
kashyap | Maybe "no_migration_perf_impact" | 10:53 |
*** damien_r has quit IRC | 10:53 | |
*** damien_r has joined #openstack-nova | 10:54 | |
kashyap | But just plain "no_performance_impact" is just an _awful_ name (Cc: mdbooth.) At least make it: "no_migration_perf_impact" | 10:57 |
yaawang | kashyap: Agree, "no_performance_impact" sounds like it covers both live-migrate and migrate. How about "no_perf_impact_live_migration"? | 11:10 |
*** rpittau is now known as rpittau|bbl | 11:11 | |
kashyap | yaawang: Yeah, I thought of including "live" as well; almost good - but reverse it: "no_live_emigration_perf_impact" | 11:12 |
kashyap | Typo: s/emigration/migration/ | 11:12 |
*** dpawlik has quit IRC | 11:13 | |
kashyap | yaawang: Commented on the change. I'd say, let's go with the above ("no_live_migration_perf_impact"). | 11:14 |
*** udesale has quit IRC | 11:14 | |
yaawang | kashyap: Thanks, need mdbooth to post his comment. | 11:17 |
*** ociuhandu has joined #openstack-nova | 11:18 | |
*** mlycka has joined #openstack-nova | 11:19 | |
*** ociuhandu has quit IRC | 11:23 | |
*** tbachman has quit IRC | 11:37 | |
*** zhanglong has quit IRC | 11:39 | |
*** yedongcan has left #openstack-nova | 11:43 | |
*** dpawlik has joined #openstack-nova | 11:48 | |
*** dpawlik has quit IRC | 11:52 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova stable/pike: Explicitly fail if trying to attach SR-IOV port https://review.opendev.org/695408 | 12:03 |
*** ociuhandu has joined #openstack-nova | 12:05 | |
*** ociuhandu has quit IRC | 12:11 | |
*** derekh has joined #openstack-nova | 12:12 | |
gibi | bauzas: you can send this to the gate too https://review.opendev.org/#/c/686808 | 12:16 |
*** mriedem has joined #openstack-nova | 12:21 | |
*** ricolin has joined #openstack-nova | 12:27 | |
*** tetsuro_ has quit IRC | 12:29 | |
openstackgerrit | Wei Hui proposed openstack/nova master: bugfix device_type=type-PCI passthrough failed https://review.opendev.org/695416 | 12:34 |
*** ratailor has quit IRC | 12:36 | |
gibi | mriedem, stephenfin: thanks for the laugh https://review.opendev.org/#/c/489267/2/nova/network/neutronv2/api.py@181 | 12:39 |
*** slaweq has quit IRC | 12:39 | |
gibi | stephenfin: btw you have now a lot of +2s on your nova-net removal series :) | 12:41 |
efried | gmann: looking... | 12:42 |
*** dpawlik has joined #openstack-nova | 12:48 | |
mriedem | gibi: heh, anytime | 12:49 |
efried | gmann: that log file isn't loading up for me :( | 12:51 |
efried | It would be weird for a provider managed by nova to have 'openstack:' in its name. If it was made by some other service, maybe, but I've never heard of it. | 12:51 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Include removal of ephemeral backing files in the image cache manager https://review.opendev.org/689422 | 12:53 |
johnthetubaguy | stephenfin: I just had a look at the network API removal... are we not adding a microversion to signal when we removed the API, I think that would be more consistent with our rules? | 12:58 |
johnthetubaguy | I am guessing I missed about a year of conversation, so probably missed the reasoning | 12:58 |
mriedem | fun gate bug i finally wrote up after rechecking several times https://bugs.launchpad.net/tempest/+bug/1853453 | 13:00 |
openstack | Launchpad bug 1853453 in tempest "test_shelve_volume_backed_instance intermittently fails guest ssh with dhcp lease fail" [Undecided,New] | 13:00 |
mriedem | looks like this only hits in multinode jobs, | 13:00 |
mriedem | and looking at a recent failure, we shelve from the primary node and unshelve on the subnode | 13:00 |
mriedem | i wonder if the snapshot is no good in some cases and unshelving on another node causes some kind of issue | 13:00 |
*** artom has joined #openstack-nova | 13:00 | |
*** nweinber__ has joined #openstack-nova | 13:02 | |
*** nweinber__ has quit IRC | 13:05 | |
*** nweinber__ has joined #openstack-nova | 13:06 | |
mriedem | oh nvm there is no snapshot, it's volume-backed | 13:06 |
mriedem | derp | 13:06 |
gibi | efried: do you know if provider_tree in the report client will contain RPs associated to the compute RP via aggregate? this comment states it does https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484626/nova/scheduler/client/report.py#L593 | 13:12 |
*** bhagyashris has quit IRC | 13:12 | |
gibi | efried: but I did not find in the impl where we query for those RPs from placement | 13:12 |
efried | gibi: it should, yes; the code has been written that way for a while, but we've never had a real scenario that uses it. | 13:12 |
*** rpittau|bbl is now known as rpittau | 13:12 | |
efried | hold on, let me find you the code (It's somewhere under _refresh_associations...) | 13:13 |
efried | ah, it looks like we specifically target only sharing providers (we specifically filter on MISC_SHARES) | 13:13 |
efried | which makes sense | 13:13 |
efried | we wouldn't want to just arbitrarily pick up anything in an aggregate. | 13:14 |
gibi | efried: cool, MISC_SHARES... is what I'm looking for. Do you have a link? | 13:14 |
efried | gibi: https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/client/report.py#L787-L806 ish | 13:15 |
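A rough sketch of the filtering efried describes: of the providers that share an aggregate with the compute node, only those carrying the MISC_SHARES_VIA_AGGREGATE trait (the os-traits name behind "MISC_SHARES") count as sharing providers, so ordinary aggregate members are not picked up. The Provider class and the in-memory filtering are hypothetical simplifications; as far as the linked report.py lines suggest, the real report client asks placement for aggregate members that have the trait rather than filtering locally like this.

```python
from dataclasses import dataclass, field

# Standard os-traits trait that marks a provider as sharing via aggregate.
SHARING_TRAIT = "MISC_SHARES_VIA_AGGREGATE"


@dataclass
class Provider:
    # Hypothetical, simplified stand-in for a placement resource provider.
    uuid: str
    aggregates: set = field(default_factory=set)
    traits: set = field(default_factory=set)


def sharing_providers(compute, candidates):
    """Return candidates that share at least one aggregate with `compute`
    AND carry the sharing trait; other aggregate members are ignored."""
    return [
        p
        for p in candidates
        if p.uuid != compute.uuid
        and p.aggregates & compute.aggregates
        and SHARING_TRAIT in p.traits
    ]
```

With a compute node and a shared-storage provider in the same aggregate, plus a second compute node in that aggregate without the trait, only the shared-storage provider comes back.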
gibi | efried: thanks a lot! | 13:15 |
gibi | shilpasd: ^^ | 13:15 |
bauzas | johnthetubaguy: we deprecated the APIs so AFAIK we don't need a microversion | 13:15 |
efried | gibi: that comment you pointed out should really say *sharing* providers. | 13:15 |
gibi | I can fix that quickly.. | 13:16 |
efried | bauzas: I see johnthetubaguy's point, though. If you come into one cloud and try to use nova-net at microversion 2.58 and it works, and then you come into a different cloud and it fails at the same microversion... | 13:16 |
bauzas | efried: sure, it's an interop issue | 13:16 |
bauzas | but | 13:16 |
johnthetubaguy | efried: I am more thinking, the SDK and CLI should be able to know when it's missing, like with all our other APIs | 13:17 |
bauzas | if you use nova-network since 2.58, you are already having problems | 13:17 |
johnthetubaguy | to be clear, I think we should remove it, and it should return 410 | 13:17 |
efried | Either way, I think that ship has sailed, cause we've already ripped a bunch of stuff out. Pretty sure the point of no return was in train, too. | 13:17 |
bauzas | also, if you use OSC, you won't see the APIs be deleted | 13:17 |
*** slaweq has joined #openstack-nova | 13:17 | |
johnthetubaguy | I just think we should have a microversion that tells a user, if that is here, you know those APIs are gone | 13:17 |
johnthetubaguy | we can add it as the last patch in the series, and I think that kinda works, it's just a handy hint | 13:18 |
*** tbachman has joined #openstack-nova | 13:18 | |
johnthetubaguy | it makes the docs easier though, if microversion X is available, you know this API will always return HTTP gone | 13:18 |
* johnthetubaguy end rant | 13:18 | |
zigo | bauzas: Hi there! I'm trying to start an instance with a GPU, and I get this: | 13:19 |
zigo | $ openstack server create --image bionic-server-cloudimg-amd64_20190726_GPU --nic net-id=bdb-blue-int01 --key-name yubikey-zigo --flavor cpu4-ram12-disk20-gpu-nvidia-p1000 --availability-zone=AZ2 zigo-gpu | 13:19 |
zigo | PCI alias nvidia-p1000 is not defined (HTTP 400) (Request-ID: req-b0514752-9e50-4e9b-a085-6ef9169b1d59) | 13:19 |
zigo | bauzas: Though I do have this in nova.conf: | 13:19 |
zigo | alias={"vendor_id":"10de","product_id":"1cb1","name":"nvidia-p1000","device_type":"type-PCI"} | 13:19 |
zigo | What am I missing? | 13:19 |
openstackgerrit | Elod Illes proposed openstack/nova stable/pike: Explicitly fail if trying to attach SR-IOV port https://review.opendev.org/695408 | 13:19 |
bauzas | johnthetubaguy: well, we could indeed | 13:19 |
bauzas | once all the APIs are giving HTTP410 | 13:20 |
johnthetubaguy | bauzas: it is a nice to have, really its the docs that worried me | 13:20 |
bauzas | stephenfin: ^ | 13:20 |
johnthetubaguy | zigo: is that in the API and Compute nova.conf? | 13:20 |
zigo | johnthetubaguy: Yeah... | 13:20 |
zigo | johnthetubaguy: It's OK to have it multiple times: | 13:21 |
zigo | [pci] | 13:21 |
zigo | alias=<something> | 13:21 |
zigo | right? | 13:21 |
johnthetubaguy | hmm, I think that is how I screwed it once | 13:21 |
zigo | Cause I have 2 different boards in this cloud ... | 13:21 |
zigo | johnthetubaguy: You mean it should be only in the compute? | 13:21 |
johnthetubaguy | no, sorry, I meant it needs to be everywhere, almost | 13:22 |
gmann | johnthetubaguy: bauzas efried : and that microversion is just to notify the users that nova-net APIs (including url) are gone and not maintaining those API for older microversion right ? | 13:22 |
johnthetubaguy | gmann: +1 | 13:22 |
bauzas | gmann: yup, just a signal | 13:22 |
gmann | ok. that makes sense. | 13:22 |
gibi | works for me | 13:23 |
bauzas | zigo: not sure I understand | 13:23 |
bauzas | zigo: you need to set the alias value *once* | 13:23 |
zigo | bauzas: Yeah, but I have one Nvidia p1000 and one t4 (on 2 different compute nodes). | 13:24 |
efried | johnthetubaguy: I think the patch to make those 410s already flew... | 13:24 |
efried | johnthetubaguy: okay, patches plural. And yes, they merged in train. | 13:24 |
zigo | Therefore, I have: | 13:24 |
zigo | [pci] | 13:24 |
zigo | alias={"vendor_id":"10de","product_id":"1cb1","name":"nvidia-p1000","device_type":"type-PCI"} | 13:24 |
zigo | alias={"vendor_id":"10de","product_id":"1eb8","name":"nvidia-t4","device_type":"type-PCI"} | 13:24 |
zigo | Is this correct? | 13:24 |
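Repeating alias= under [pci] is the expected way to define several aliases, since the option takes multiple values and each value is one JSON object. The alias name in the flavor's pci_passthrough:alias extra spec then has to match one of those names, and nova's PCI passthrough docs say the alias must be configured for the API service as well as the computes (the HTTP 400 above comes back from the API). A minimal sketch of that lookup, assuming a simplified parser; load_aliases and resolve_alias_request are hypothetical helpers, not nova's own code.

```python
import json


def load_aliases(raw_values):
    """Parse repeated `[pci] alias` values (one JSON object each) into a
    dict keyed by alias name. Simplified sketch, not nova's parser."""
    aliases = {}
    for raw in raw_values:
        spec = json.loads(raw)
        # Collect every spec registered under a name; other keys are kept as-is.
        aliases.setdefault(spec["name"], []).append(spec)
    return aliases


def resolve_alias_request(extra_spec, aliases):
    """Resolve a `pci_passthrough:alias` value such as "nvidia-p1000:1"
    (comma-separated name:count pairs) against the parsed aliases."""
    requests = []
    for item in extra_spec.split(","):
        name, count = item.strip().split(":")
        if name not in aliases:
            # Mirrors the wording of the error seen in the log.
            raise ValueError(f"PCI alias {name} is not defined")
        requests.append((name, int(count)))
    return requests
```

With zigo's two alias lines loaded, "nvidia-p1000:1" resolves; if the process evaluating the request never loaded the option (an empty list), the same lookup fails with the error from the log.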
efried | johnthetubaguy: so really we could do that microversion any time? | 13:25 |
efried | and the rest is just cleanup? | 13:25 |
johnthetubaguy | efried: yes, except for the docs needing to be updated to reflect how you find out when it's gone | 13:25 |
efried | okay. Pretty sure the docs were the first thing stephenfin hit -- but yeah, they wouldn't mention a microversion cutover if we didn't do that. | 13:26 |
johnthetubaguy | efried: we can go back and add that for sure | 13:26 |
efried | So you're looking for a patch that creates a new "signal microversion" and updates the docs accordingly. | 13:26 |
efried | that makes sense ++ | 13:26 |
johnthetubaguy | yeah, thanks, that clears it up in my head too | 13:27 |
johnthetubaguy | phew | 13:27 |
efried | you can still have the interop snafu I mentioned earlier. But at least in that scenario your "signal" is the 410. | 13:27 |
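If the "signal microversion" discussed here is added, a client could use it roughly as follows: when the server's maximum microversion is at or above the signal version, the nova-network APIs return HTTP 410 at every microversion. The signal version is a hypothetical parameter (no number was chosen in this discussion). The sketch also shows why microversions have to be compared as (major, minor) integers: as plain strings, "2.9" sorts above "2.58".

```python
def parse_microversion(version):
    """Turn a microversion string like "2.58" into (2, 58) so that
    versions compare numerically instead of lexically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def nova_network_apis_gone(server_max_version, signal_version):
    """True if the server advertises the (hypothetical) signal microversion,
    i.e. the nova-network APIs are gone regardless of requested version."""
    return parse_microversion(server_max_version) >= parse_microversion(signal_version)
```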
*** nanzha has quit IRC | 13:27 | |
sean-k-mooney | zigo: the nvidia-t4 apparently supports SR-IOV, so you have to set the device_type to type-PF | 13:28 |
johnthetubaguy | efried: yeah, it's that you could have known better | 13:28 |
sean-k-mooney | stephenfin: ^ that is the thing you were writing the docs patch for, right? | 13:28 |
sean-k-mooney | https://review.opendev.org/#/c/694522/ | 13:29 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Specify what RPs _ensure_resource_provider collects https://review.opendev.org/695429 | 13:29 |
zigo | sean-k-mooney: The issue is with my p1000, the t4 looks like working ... | 13:29 |
zigo | But thanks, I'll try. | 13:29 |
gibi | efried: fixed up the comment in _ensure_resource_provider https://review.opendev.org/695429 | 13:29 |
sean-k-mooney | zigo: what is the issue you are having specifically? | 13:29 |
* gibi needs to go afk for a while | 13:29 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Force config drive in nova-next multinode job https://review.opendev.org/695431 | 13:30 |
zigo | sean-k-mooney: PCI alias nvidia-p1000 is not defined (HTTP 400) | 13:30 |
zigo | (when trying to spawn my instance) | 13:30 |
gmann | efried: but from nova ussuri onwards, no microversion will have the nova-net APIs on any cloud, so interop needs to be handled through discoverability of the new microversion we will introduce rather than in code. | 13:30 |
sean-k-mooney | have you set the alias on both the compute nodes and the controller nodes? | 13:30 |
zigo | sean-k-mooney: Yeah, I did that ... | 13:30 |
zigo | I did it with puppet, so normally it will have restarted all nova services. | 13:30 |
sean-k-mooney | and you have it set in the [pci] section, not [DEFAULT]? | 13:31 |
gmann | efried: RE: provider as 'openstack:', this is what the log shows being returned from 60_nova/resources.sh:_get_inventory_value for VCPU etc: https://zuul.opendev.org/t/openstack/build/323afb9d5fd94f62b0bba4bac6004442/log/logs/grenade.sh.txt.gz#28851 | 13:32 |
gmann | and that is happening when the octavia grenade job is moving to py3. I'm not sure what it has to do with py3 things. | 13:32 |
zigo | sean-k-mooney: That's all I did yes... | 13:32 |
gmann | https://review.opendev.org/#/c/693486/ | 13:32 |
*** nanzha has joined #openstack-nova | 13:33 | |
gmann | mriedem: have you seen this before in any grenade job - https://zuul.opendev.org/t/openstack/build/323afb9d5fd94f62b0bba4bac6004442/log/logs/grenade.sh.txt.gz#28854 | 13:33 |
zigo | Oh, another thing which is very annoying, I keep having in my logs: | 13:33 |
zigo | Instance 2aab6469-4292-4d04-80de-2ae2a7174b3a has been moved to another host clint1-compute-2.infomaniak.ch(clint1-compute-2.infomaniak.ch). There are allocations remaining against the source host that might need to be removed: {'resources': {'DISK_GB': 80, 'MEMORY_MB': 24576, 'VCPU': 8}}. | 13:33 |
zigo | Many of this ... | 13:34 |
zigo | Is this a known issue with Rocky? | 13:34 |
sean-k-mooney | that usually means you have not completed migrations using resize-verify | 13:34 |
zigo | sean-k-mooney: It's mostly all live migrations. | 13:34 |
efried | gmann: those CI results are still not loading up for me :( | 13:34 |
gmann | oh | 13:35 |
zigo | sean-k-mooney: IMO, the only thing that remains is the placement record... | 13:35 |
efried | gmann: if you have them open, maybe you could pastebin the relevant chunk? | 13:35 |
zigo | I could easily write a clean-up script I suppose. | 13:35 |
sean-k-mooney | efried: https://zuul.opendev.org/t/openstack/build/323afb9d5fd94f62b0bba4bac6004442/log/logs/grenade.sh.txt.gz#28854 loaded for me fine | 13:35 |
sean-k-mooney | zigo: it should be updated automatically | 13:36 |
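As sean-k-mooney notes, nova should clean these up itself, so a script like the one zigo mentions is best limited to finding leftovers for review. A minimal read-only sketch, assuming you already have the JSON body of placement's GET /allocations/{consumer_uuid} for an instance with no migration in progress; stale_allocations is a hypothetical helper and the provider UUIDs in any usage are placeholders.

```python
def stale_allocations(consumer_allocations, current_host_rp, sharing_rps=()):
    """Given the body of placement's GET /allocations/{consumer_uuid},
    return {rp_uuid: resources} for providers that are neither the
    instance's current compute node nor a known sharing provider.
    Reporting only: review the result before deleting anything."""
    keep = {current_host_rp, *sharing_rps}
    return {
        rp_uuid: alloc["resources"]
        for rp_uuid, alloc in consumer_allocations["allocations"].items()
        if rp_uuid not in keep
    }
```

Providers passed in sharing_rps (for example shared storage) are kept, since allocations against them are legitimate even though they are not the instance's compute node.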
*** mmethot has quit IRC | 13:36 | |
efried | gmann: ah, it's working now. | 13:36 |
mriedem | gmann: the broken pipe? no | 13:36 |
*** udesale has joined #openstack-nova | 13:37 | |
gmann | it returns 'openstack:' as provider name from 60_nova/resources.sh:_get_inventory_value | 13:38 |
mriedem | there is a broken pipe right before that | 13:39 |
mriedem | 2019-11-20 17:52:23.572 | +++ /opt/stack/new/grenade/projects/60_nova/resources.sh:_get_inventory_value:57 : head -n1 2019-11-20 17:52:23.573 | +++ /opt/stack/new/grenade/projects/60_nova/resources.sh:_get_inventory_value:57 : openstack resource provider list -f value 2019-11-20 17:52:23.573 | +++ /opt/stack/new/grenade/projects/60_nova/resources.sh:_get_inventory_value:57 : cut -d ' ' -f 1 2019-11-20 17:52:24.613 | Exc | 13:39 |
mriedem | on raised: [Errno 32] Broken pipe | 13:39 |
mriedem | so parsing the output is failing | 13:39 |
mriedem | openstack resource provider list -f value | 13:40 |
mriedem | oops | 13:40 |
efried | gmann: yeah, this is gonna have nothing to do with a provider named 'openstack:'. It looks to me like we're parsing error output from the openstack command (which would start with 'openstack: $something_went_wrong') | 13:40 |
mriedem | provider=$(openstack resource provider list -f value | head -n1 | cut -d ' ' -f 1) | 13:40 |
mriedem | that's the command that's failing | 13:40 |
mriedem | well, parsing that's failing | 13:41 |
mriedem | if there was just one provider we could do: | 13:41 |
mriedem | provider=$(openstack resource provider list -f value -c uuid) | 13:41 |
mriedem | but if it's a multinode grenade job then there will be more than one and that doesn't work | 13:41 |
efried | mriedem: well, we should do that anyway, and head -n1 it | 13:42 |
efried | i.e. don't do the cut | 13:42 |
efried | not that that would help here, because clearly the command is failing. | 13:42 |
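A sketch of efried's suggestion: select only the `uuid` column and keep the first line. The UUID validation is an addition made for this sketch, not something grenade does; it makes error text fail loudly instead of being used as a provider. The helper name is hypothetical.

```python
import uuid


def pick_first_provider_uuid(output: str) -> str:
    """Return the first UUID from `openstack resource provider list -f value -c uuid`.

    Raises ValueError if the first non-blank line is not a UUID (for example an
    error message) or if no providers were listed.
    """
    for line in output.splitlines():
        candidate = line.strip()
        if not candidate:
            continue
        try:
            # uuid.UUID rejects anything that is not 32 hex digits (plus hyphens).
            return str(uuid.UUID(candidate))
        except ValueError:
            raise ValueError(f"expected a provider UUID, got {candidate!r}") from None
    raise ValueError("no resource providers were listed")
```

With a multinode job this still picks an arbitrary provider, which matches the smoke-test intent discussed below.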
efried | But why is the error output going to stdout rather than stderr? | 13:42 |
mriedem | if that provider list command allowed passing a --name for filtering, we could pass the local fqdn to get 1 result back... | 13:43 |
efried | Whole point of stderr is so exactly this doesn't happen, and you can see what actually went wrong. | 13:43 |
zigo | mriedem: A much nicer way using http://harelba.github.io/q/ : provider=$(openstack resource provider list --format csv | q -H -d, "SELECT uuid FROM - LIMIT 1") | 13:43 |
zigo | ;) | 13:43 |
efried | mriedem: to that point, it looks like we don't care *which* provider we're grabbing? That seems... weird. | 13:43 |
zigo | (q-text-as-data is such a nice tool...) | 13:43 |
mriedem | efried: we don't, | 13:43 |
mriedem | it's a smoke test to make sure that we can save off some inventory before upgrading and that after the upgrade it's still there | 13:44 |
gmann | picking any one should be fine | 13:44 |
efried | okay. | 13:44 |
zigo | mriedem: Then you could do: provider=$(openstack resource provider list --format csv | q -H -d, "SELECT uuid FROM - WHERE name='something-you-want'") | 13:45 |
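For comparison, the selection zigo describes can be done with Python's standard `csv` module instead of an extra tool. This sketch assumes the CSV output has a header row containing `uuid` and `name` columns; the sample rows are made up.

```python
import csv
import io
from typing import Optional


def provider_uuid_by_name(csv_text: str, name: Optional[str] = None) -> Optional[str]:
    """Mimic `SELECT uuid FROM - [WHERE name=...] LIMIT 1` on CSV output.

    Assumes a header row with "uuid" and "name" columns.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if name is None or row.get("name") == name:
            return row["uuid"]
    return None


sample = (
    '"uuid","name","generation"\n'
    '"11111111-1111-1111-1111-111111111111","compute1",3\n'
    '"22222222-2222-2222-2222-222222222222","compute2",5\n'
)
print(provider_uuid_by_name(sample, "compute2"))
```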
efried | anyway, the problem here seems to be that the openstack command -- the first thing in the pipe -- is failing, printing its error to stdout. | 13:45 |
mriedem | zigo: or i could just do provider=$(openstack resource provider list -f value -c uuid --name `hostname -f`) | 13:47 |
mriedem | and not rely on pipes and other tooling | 13:47 |
mriedem | but that provider list command doesn't support --name (yet - that's easy to add) | 13:47 |
gmann | does the API have the name filter? | 13:48 |
mriedem | yes | 13:48 |
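Because the placement API accepts a `name` filter on `GET /resource_providers`, the lookup can be built without the osc-placement CLI. This sketch only constructs the request. The endpoint URL, host name and token are placeholders, and sending it (with curl or an HTTP library) is left out.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def build_provider_lookup(placement_endpoint: str, name: str,
                          token: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for GET /resource_providers?name=<name>.

    Placeholder endpoint/token; this sketch does not send the request.
    """
    query = urlencode({"name": name})
    url = f"{placement_endpoint.rstrip('/')}/resource_providers?{query}"
    headers = {"X-Auth-Token": token, "Accept": "application/json"}
    return url, headers


url, headers = build_provider_lookup(
    "http://placement.example/", "compute1.example.com", "TOKEN")
print(url)  # http://placement.example/resource_providers?name=compute1.example.com
```

The filter is an exact match on the provider name, so it only helps if the name you pass is the one nova actually used for the provider.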
efried | one way to get output like 'openstack: $stuff' is if the openstack command doesn't exist. But that output goes to stderr like it should. | 13:49 |
*** haleyb has joined #openstack-nova | 13:49 | |
sean-k-mooney | gmann: it's how neutron identifies the compute node resource provider without needing to know the compute node uuid | 13:50 |
efried | I can't find anything in the code that's joining stderr to stdout. Unless the job itself is doing that. | 13:51 |
sean-k-mooney | it looks up the RP by hostname | 13:51 |
efried | Okay, apparently I'm the only one concerned about the fact that `openstack resource provider list` is producing bogus output in gmann's case, so I must be misunderstanding what we're actually trying to solve here. /me stfu, call if you need me. | 13:52 |
gmann | sean-k-mooney: and it works fine there? this is a failure in the octavia grenade job on py3. | 13:53 |
sean-k-mooney | they do that in code not via osc | 13:54 |
gmann | ohk | 13:54 |
sean-k-mooney | is looking up the provider by hostname all that is breaking the job? | 13:59 |
sean-k-mooney | it would be quick to fix osc-placement to support that but equally quick to just do it with curl | 13:59 |
gmann | sean-k-mooney: fixing osc might take time with the release etc. until the octavia job can install it from source. | 14:01 |
sean-k-mooney | do we know what caused the broken pipes? | 14:01 |
gmann | no. | 14:02 |
haleyb | sean-k-mooney: would adding osc-placement to requirements in octavia fix it as well? | 14:02 |
sean-k-mooney | i think mriedem said list does not currently support --name | 14:02 |
sean-k-mooney | i was looking at the grenade job logs, by the way. gmann haleyb, do you have the link to the octavia job? | 14:04 |
johnsom | haleyb Yes, that is the error output you are seeing. Installing osc-placement should fix it. | 14:04 |
gmann | sean-k-mooney: https://review.opendev.org/#/c/693486/ | 14:04 |
johnsom | https://www.irccloud.com/pastebin/PrDd7zG6/ | 14:04 |
openstackgerrit | Merged openstack/nova stable/pike: Delete instance_id_mappings record in instance_destroy https://review.opendev.org/684658 | 14:05 |
openstackgerrit | Kashyap Chamarthy proposed openstack/nova master: libvirt: Bump MIN_{LIBVIRT,QEMU}_VERSION for "Ussuri" https://review.opendev.org/695056 | 14:05 |
sean-k-mooney | ah right yes "'resource provider list -f value' is not an openstack command" | 14:05 |
kashyap | stephenfin: ^^ Fixed the functional test (forgot to run `tox -e functional-36`, bad me) | 14:05 |
johnsom | haleyb The question is which devstack plugin has the missing requirement | 14:05 |
sean-k-mooney | is because osc-placement is not installed | 14:05 |
gmann | johnsom: haleyb: it is the parsing of the command output that is failing; it is not that the osc command needs more installation etc. | 14:05 |
kashyap | stephenfin: Thanks for the earlier review :-) | 14:06 |
kashyap | Uh, seems like I need a rebase... | 14:06 |
sean-k-mooney | johnsom: i guess the ocatavia one | 14:07 |
gmann | this is the command we need to adjust to get the first RP: https://github.com/openstack/grenade/blob/fad62595bff3ae55b5b428a5ea00a9a168390fd2/projects/60_nova/resources.sh#L57 | 14:07 |
sean-k-mooney | but honestly it might be better for devstack to install osc-placement if placement is installed | 14:07 |
johnsom | sean-k-mooney Octavia devstack plugin isn't running that command. | 14:07 |
gmann | by command i mean the parsing logic | 14:07 |
sean-k-mooney | oh ok | 14:07 |
johnsom | It is openstack/grenade | 14:08 |
johnsom | projects/60_nova/resources.sh | 14:08 |
johnsom | line 57 | 14:08 |
sean-k-mooney | gmann: i'm confused about the octavia patch; you are linking to grenade | 14:08 |
sean-k-mooney | is this a... but there are other failing jobs too | 14:09 |
*** factor has joined #openstack-nova | 14:09 | |
sean-k-mooney | are you only looking at the grenade one at the moment? | 14:09 |
gmann | yeah, i only checked the grenade one | 14:10 |
sean-k-mooney | well, we could have grenade install it or, as i said, we could have devstack install it if placement is installed | 14:10 |
sean-k-mooney | grenade is using it so it's reasonable for it to install its own dependencies | 14:11 |
johnsom | Yeah. It's odd that grenade doesn't have a requirements.txt though it obviously has requirements in its scripts | 14:11 |
sean-k-mooney | well grenade is not a python project | 14:11 |
*** jawad_axd has quit IRC | 14:12 | |
sean-k-mooney | its almost all bash so you would not pip install it | 14:12 |
gmann | osc-placement installation is not the issue here. | 14:13 |
*** jawad_axd has joined #openstack-nova | 14:13 | |
johnsom | Ha, yeah, I just noticed that. I guess you know now how much time I have looked at grenade.... lol | 14:13 |
johnsom | gmann Yes it is. If that is not installed the first output of OSC is "openstack:" which the script tries to parse. | 14:14 |
*** dpawlik has quit IRC | 14:15 | |
sean-k-mooney | right, as your irccloud link shows the message is "openstack: 'resource provider list -f value' is not an openstack command. See 'openstack --help'." | 14:15 |
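OSC discovers plugin commands through Python entry points, so whether a plugin is visible can be checked from a given interpreter. In this sketch the group `openstack.cli.extension` and the entry-point name `placement` are assumptions about how osc-placement registers itself; verify them against its `setup.cfg`. The check is only meaningful when run by the same Python interpreter that the `openstack` script uses.

```python
from importlib import metadata


def has_entry_point(group: str, name: str) -> bool:
    """Return True if this interpreter has an entry point `name` in `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ selectable API
        candidates = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict of group -> entry points
        candidates = eps.get(group, [])
    return any(ep.name == name for ep in candidates)


# Assumed osc-placement registration; verify against its setup.cfg.
print(has_entry_point("openstack.cli.extension", "placement"))
```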
johnsom | Yep | 14:15 |
gmann | https://zuul.opendev.org/t/openstack/build/323afb9d5fd94f62b0bba4bac6004442/log/logs/grenade.sh.txt.gz#13753 | 14:17 |
*** jawad_axd has quit IRC | 14:17 | |
kashyap | Is a rebase really necessary here? - https://review.opendev.org/#/c/695056/ | 14:18 |
sean-k-mooney | gmann: interesting | 14:20 |
johnsom | I wonder if it is using the py2 python-openstackclient | 14:25 |
efried | johnsom: I had thought so too, but similar commands just above that seem to be working fine | 14:26 |
johnsom | I see most of OSC also installed in the py2 environment there. | 14:26 |
efried | ohhh | 14:27 |
efried | the working commands above that aren't mucking with providers | 14:27 |
efried | so yeah, it's probably a matter of osc-placement being installed on the wrong py version. | 14:27 |
efried | gmann: ^ | 14:28 |
sean-k-mooney | it should be python 3 https://zuul.opendev.org/t/openstack/build/b26bd7fc82d94d4cac97b5481623b629/log/logs/grenade.sh.txt.gz#16522 | 14:28 |
sean-k-mooney | but if it was installed on python 2 first | 14:29 |
johnsom | Well, it looks like it was properly installed in the 3.6 environment given it was a python3 devstack setup. It's just somehow the script is using the py2 openstack command. | 14:29 |
sean-k-mooney | then the console script would be python 2 | 14:29 |
johnsom | https://zuul.opendev.org/t/openstack/build/323afb9d5fd94f62b0bba4bac6004442/log/logs/pip2-freeze.txt.gz#77 | 14:29 |
sean-k-mooney | oh i know what the issue is | 14:30 |
sean-k-mooney | the first run install in python 2 right? | 14:30 |
gmann | yeah, let's recheck as devstack is all py3 now ? | 14:30 |
johnsom | Well, I guess that explains why it only fails when devstack is set to python3 | 14:30 |
sean-k-mooney | e.g. train would run with python 2 | 14:30 |
gmann | oh yeah | 14:30 |
sean-k-mooney | then we upgrade to ussuri with python 3 | 14:31 |
sean-k-mooney | and we keep the osc console script from python 2 | 14:31 |
sean-k-mooney | then we install osc-placement in py36 | 14:31 |
sean-k-mooney | which the python2 version wont find | 14:31 |
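sean-k-mooney's theory can be checked directly. A console script such as `openstack` is tied to the interpreter named in its shebang line, so a script generated under Python 2 keeps loading plugins from Python 2's site-packages even after osc-placement is installed for Python 3. A minimal sketch (helper names are hypothetical) that reports which interpreter a script on `PATH` will use:

```python
import shutil
from typing import Optional


def read_shebang(path: str) -> Optional[str]:
    """Return the interpreter named on the script's '#!' line, if any."""
    with open(path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if first_line.startswith("#!"):
        return first_line[2:].strip()
    return None


def console_script_interpreter(name: str) -> Optional[str]:
    """Locate `name` on PATH and return its shebang interpreter, if any."""
    path = shutil.which(name)
    return read_shebang(path) if path else None


print(console_script_interpreter("openstack"))
```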
*** tosky_ has joined #openstack-nova | 14:31 | |
openstackgerrit | Kashyap Chamarthy proposed openstack/nova master: Pick NEXT_MIN libvirt/QEMU versions for "V" release https://review.opendev.org/694821 | 14:31 |
openstackgerrit | Kashyap Chamarthy proposed openstack/nova master: libvirt: Bump MIN_{LIBVIRT,QEMU}_VERSION for "Ussuri" https://review.opendev.org/695056 | 14:31 |
*** tosky has quit IRC | 14:32 | |
*** tbachman has quit IRC | 14:33 | |
*** tosky_ is now known as tosky | 14:33 | |
sean-k-mooney | so we would not see this if we set USE_PYTHON3=True in the grenade job | 14:35 |
sean-k-mooney | since both versions would run on python 3 | 14:36 |
*** tbachman has joined #openstack-nova | 14:36 | |
sean-k-mooney | and in a real install the package manager/installer would uninstall the python 2 versions when installing the python 3 versions for ussuri | 14:37 |
sean-k-mooney | or in container land you would just spin up the new python3-only container in place of the old container | 14:37 |
sean-k-mooney | i'm pretty sure kolla already moved to python 3 containers in train, for what it's worth | 14:38 |
*** awalende has quit IRC | 14:38 | |
*** awalende has joined #openstack-nova | 14:38 | |
*** awalende has quit IRC | 14:41 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: PoC for using COMPUTE_SAME_HOST_COLD_MIGRATE https://review.opendev.org/695220 | 14:41 |
*** awalende has joined #openstack-nova | 14:41 | |
*** mmethot has joined #openstack-nova | 14:43 | |
*** Luzi has quit IRC | 14:46 | |
*** davee_ has quit IRC | 14:46 | |
mriedem | does any of this grenade talk have anything to do with nova/ | 14:48 |
mriedem | ? | 14:48 |
mriedem | still the resource provider thing or what? | 14:48 |
*** tbachman has quit IRC | 14:48 | |
*** davee_ has joined #openstack-nova | 14:50 | |
johnsom | lol, well, it's the nova scripting in grenade that is failing. That is the tie back to nova. | 14:52 |
mriedem | i could have sworn at one point in grenade we had some kind of hack where we'd re-install python-openstackclient b/c we went from 2 to 3 | 14:54 |
mriedem | but i'm not finding that | 14:54 |
*** tosky has quit IRC | 14:54 | |
*** mlycka has quit IRC | 14:54 | |
*** tosky has joined #openstack-nova | 14:57 | |
*** ayoung has joined #openstack-nova | 14:58 | |
*** JamesBenson has joined #openstack-nova | 14:59 | |
*** igordc has joined #openstack-nova | 15:01 | |
*** pcaruana has quit IRC | 15:02 | |
*** ociuhandu has joined #openstack-nova | 15:04 | |
*** igordc has quit IRC | 15:05 | |
*** igordc has joined #openstack-nova | 15:06 | |
*** ociuhandu has quit IRC | 15:09 | |
*** igordc has quit IRC | 15:14 | |
*** igordc has joined #openstack-nova | 15:14 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Avoid spurious error logging in _get_compute_nodes_in_db https://review.opendev.org/695453 | 15:15 |
ayoung | What does Nova do to modify the kernel command line? It has to happen prior to cloud-init. I can see docs that imply I should be able to do this: glance image-update --property kernel_extra_args="coreos.inst.ignition_url=http://example.com/config.ign " coreos-xx-bootstrap | 15:17 |
ayoung | but that does not show up when the image boots, and I am guessing I need to pass that through to the instance somehow | 15:17 |
sean-k-mooney | if you pass a separate kernel image in addition to the root image i think we can pass the kernel args to qemu | 15:19 |
sean-k-mooney | but in general i'm not sure how much that feature is used or tested | 15:19 |
sean-k-mooney | i do not believe you can use it with just a root image | 15:19 |
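For context, libvirt's domain XML has a place for a kernel command line in its direct-kernel-boot form. There, the `<os>` element carries `<kernel>`, an optional `<initrd>` and `<cmdline>`. This sketch builds such a fragment with ElementTree. The paths and the ignition argument are illustrative only, and this is not how nova's libvirt driver is implemented.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def direct_kernel_os_element(kernel_path: str, cmdline: str,
                             initrd_path: Optional[str] = None) -> ET.Element:
    """Build a libvirt <os> element for direct kernel boot with a command line."""
    os_el = ET.Element("os")
    ET.SubElement(os_el, "type").text = "hvm"
    ET.SubElement(os_el, "kernel").text = kernel_path
    if initrd_path:
        ET.SubElement(os_el, "initrd").text = initrd_path
    # <cmdline> belongs to the direct-kernel-boot case; when booting a plain
    # disk image the guest's own bootloader picks the kernel arguments.
    ET.SubElement(os_el, "cmdline").text = cmdline
    return os_el


el = direct_kernel_os_element(
    "/var/lib/nova/instances/INSTANCE/kernel",  # illustrative path
    "root=/dev/vda coreos.inst.ignition_url=http://example.com/config.ign",
)
print(ET.tostring(el, encoding="unicode"))
```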
johnthetubaguy | ayoung did you try os_command_line: https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484626/nova/objects/image_meta.py#L463 | 15:20 |
*** pcaruana has joined #openstack-nova | 15:20 | |
johnthetubaguy | hmm, I am not so sure we do anything with that, ignore me | 15:20 |
sean-k-mooney | johnthetubaguy: isn't os_command_line just for lxc and other containers | 15:20 |
sean-k-mooney | like openvz | 15:20 |
mriedem | "The kernel command line to be used by the libvirt driver, instead of the default. For Linux Containers (LXC), the value is used as arguments for initialization. This key is valid only for Amazon kernel, ramdisk, or machine images (aki, ari, or ami)." | 15:21 |
johnthetubaguy | yeah, my bad | 15:21 |
ayoung | BTW, this is a pretty good argument for Nova supporting ignition the same way we do cloud-init...it happens earlier in the process | 15:21 |
sean-k-mooney | its also a good argument for ignition supporting the metadata service :P | 15:22 |
ayoung | I know that the openshift install, which works via terraform, does something to inject this value. I do not know what that is | 15:22 |
*** jmlowe has joined #openstack-nova | 15:22 | |
ayoung | So, I think the metadata service would work. I think what you are saying is that the ignition mech should default to the cloud-init URL, somehow? | 15:22 |
ayoung | Well, not the URL, but the host, and a separate URL specific to ignition. | 15:23 |
sean-k-mooney | i was suggesting that ignition could try to hit the metadata url and then load info from it | 15:23 |
ayoung | Yep | 15:23 |
*** priteau has quit IRC | 15:23 | |
*** jawad_axd has joined #openstack-nova | 15:24 | |
ayoung | or a logical shift from it, like: http://meta-data-host/ignition | 15:24 |
sean-k-mooney | ayoung: anyway, the way that was all meant to work was: when you boot the vm you provide a separate kernel image with the kernel command line parameter set, and then nova would use the root image and kernel image when booting and pass the kernel args from the kernel image to qemu | 15:25 |
ayoung | And Nova/metadata server then would be responsible for multiplexing between the different instances | 15:25 |
ayoung | I see. I don't think that the installer (terraform) is doing any of that. But I can reproduce this afternoon and determine what it IS doing | 15:26 |
*** awalende has quit IRC | 15:27 | |
*** larainema has quit IRC | 15:27 | |
*** jawad_axd has quit IRC | 15:28 | |
*** damien_r has quit IRC | 15:33 | |
mriedem | efried: heh, oops https://review.opendev.org/#/c/694897/ | 15:35 |
mriedem | not sure how i/we missed that | 15:36 |
mriedem | i mean, it's zvm so totally forgettable but otherwise | 15:36 |
openstackgerrit | Dan Smith proposed openstack/nova master: ZVM: Implement update_provider_tree https://review.opendev.org/694897 | 15:37 |
*** awalende has joined #openstack-nova | 15:40 | |
*** ociuhandu has joined #openstack-nova | 15:40 | |
*** ociuhandu has quit IRC | 15:41 | |
*** ociuhandu has joined #openstack-nova | 15:42 | |
aarents | Hi dansmith, can you confirm this last update is ok for you, since you were holding a -1 on https://review.opendev.org/#/c/670000 ? thanks ! | 15:44 |
dansmith | aarents: yeah, I was waiting for CI before, but I'll circle back | 15:45 |
dansmith | aarents: we probably want to have a few people look at that and sanity check my thinking there | 15:45 |
dansmith | maybe gibi since he was previously okay with the other fix, at least | 15:45 |
dansmith | obviously mriedem is always good at everything | 15:46 |
aarents | yes good idea | 15:46 |
mriedem | umm, rescue + disk bus = i defer to lyarwood | 15:47 |
*** ociuhandu has quit IRC | 15:47 | |
dansmith | eh? | 15:47 |
dansmith | it's not really rescue related, | 15:48 |
mriedem | "There is a case during rescue where this value can be mistakenly updated | 15:48 |
mriedem | to reflect disk bus property of rescue image (hw_disk_bus)." | 15:48 |
dansmith | rescue is just one place where this side effect screws us | 15:48 |
dansmith | right, but.. the change is actually more generic | 15:48 |
dansmith | but yes, lyarwood would be a good person to look also | 15:48 |
mriedem | my gut feeling on something like this is it fixes one thing and breaks another | 15:49 |
mriedem | and i don't have the background on that | 15:49 |
dansmith | mriedem: check out my comments on the earlier set | 15:49 |
dansmith | mriedem: a patch from gary unceremoniously removed all instance.save()s he thought were unnecessary, | 15:49 |
dansmith | which removed the one right after this that was there originally, | 15:49 |
dansmith | so we just almost never save this change | 15:49 |
dansmith | and also, | 15:49 |
dansmith | making a change to instance within a "just generate me the xml" method is crazy wrong | 15:50 |
*** priteau has joined #openstack-nova | 15:50 | |
dansmith | but rescue just happens to tickle things right so we mangle the disk bus, but never save() it back to normal | 15:50 |
* lyarwood reads up | 15:50 | |
sean-k-mooney | we should not be saving the disk bus back, but we should be using it for the rescue xml | 15:51 |
sean-k-mooney | i don't think there has ever been an expectation that the /dev/sd* names for the rescue boot would be the same as when the instance is booted normally | 15:52 |
dansmith | sean-k-mooney: read the patches I linked in my analysis earlier | 15:53 |
*** mlavalle has joined #openstack-nova | 15:53 | |
mriedem | would have been useful to have that context in the commit message, if just summarized from the big comment in PS1 | 15:53 |
dansmith | sean-k-mooney: the original change from like 2012 was trying to use get_xml as a hook to update the info late after libvirt had chosen defaults | 15:53 |
mriedem | tl;dr https://review.opendev.org/#/c/119622/ removed the code that would persist this change, and yet we can still incorrectly persist it in edge cases (like rescue), and we shouldn't be modifying the instance in a _get* method anyway. | 15:54 |
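The principle in the tl;dr can be illustrated with a deliberately simplified, hypothetical example: a method that only generates configuration should not modify the instance it is given. The classes and function below are not nova's. The rescue-time disk bus is computed and returned, and the stored value is left untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiskConfig:
    target_bus: str


@dataclass
class Instance:
    root_disk_bus: str  # persisted value that must not be clobbered


def rescue_disk_config(instance: Instance,
                       rescue_image_bus: Optional[str]) -> DiskConfig:
    """Return the disk config for a rescue boot without mutating `instance`.

    The rescue image's bus (its hw_disk_bus) wins if set; otherwise fall back to
    the instance's normal bus. Nothing is written back to the instance.
    """
    return DiskConfig(target_bus=rescue_image_bus or instance.root_disk_bus)
```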
sean-k-mooney | ok ya we should not | 15:54 |
dansmith | mriedem: exactly | 15:55 |
mriedem | so do that, dan can +2 and ill +W | 15:55 |
dansmith | aarents: ^ | 15:55 |
*** TxGirlGeek has joined #openstack-nova | 15:56 | |
*** TxGirlGeek has quit IRC | 15:58 | |
aarents | dansmith: So I should rephrase the commit message with more context? | 15:59 |
*** tbachman has joined #openstack-nova | 16:00 | |
dansmith | aarents: yeah just add all that context in there to make mriedem happy | 16:00 |
aarents | ok got it ! | 16:00 |
*** TxGirlGeek has joined #openstack-nova | 16:00 | |
mgoddard | hi mriedem, got a minute to discuss https://review.opendev.org/#/c/684849 ? | 16:00 |
*** derekh has quit IRC | 16:03 | |
*** mlavalle has quit IRC | 16:03 | |
*** udesale has quit IRC | 16:04 | |
*** jawad_axd has joined #openstack-nova | 16:05 | |
*** mlavalle has joined #openstack-nova | 16:09 | |
*** jawad_axd has quit IRC | 16:09 | |
*** sapd1 has joined #openstack-nova | 16:12 | |
*** nanzha has quit IRC | 16:13 | |
mriedem | mgoddard: sure | 16:15 |
mriedem | preface: i have lost a lot of context on that bug and fix | 16:15 |
mriedem | so i'll likely just abandon my changes and you can move forward | 16:16 |
mgoddard | do you happen to remember how the RP association becomes stale after your patch | 16:16 |
mgoddard | I can't work it out from the code | 16:16 |
mgoddard | https://review.opendev.org/#/c/684849/2/nova/tests/functional/regressions/test_bug_1841481.py@129 | 16:17 |
*** jmlowe has quit IRC | 16:17 | |
*** ociuhandu has joined #openstack-nova | 16:18 | |
mriedem | i think the comment from L79 | 16:21 |
mriedem | because host1 deletes the provider between | 16:21 |
mriedem | # _check_for_nodes_rebalance and _refresh_associations. | 16:21 |
*** bhagyashris has joined #openstack-nova | 16:22 | |
mriedem | i think because we either don't add the provider uuid to _association_refresh_time or we pop it out on failure if the provider doesn't exist | 16:24 |
mriedem | it's all linked to when the ResourceTracker calls SchedulerReportClient.get_provider_tree_and_ensure_root | 16:25 |
mriedem | and then you go down the rabbit hole | 16:25 |
*** damien_r has joined #openstack-nova | 16:25 | |
*** jawad_axd has joined #openstack-nova | 16:26 | |
*** ociuhandu has quit IRC | 16:27 | |
mgoddard | ok, I think I see. The RP wasn't in _association_refresh_time because it hasn't been in the local tree yet. If the RP exists in placement, that means we only update _association_refresh_time after _refresh_associations is done | 16:29 |
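A toy model of the behaviour mgoddard describes may help. Refresh timestamps are recorded only after a successful refresh and dropped when the provider turns out not to exist, so an unknown UUID always triggers a refresh. This is a simplified sketch with hypothetical names and an injectable clock, not nova's `SchedulerReportClient` code.

```python
import time
from typing import Callable, Dict


class AssociationRefreshCache:
    """Track when each provider's associations were last refreshed."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._refresh_time: Dict[str, float] = {}

    def needs_refresh(self, rp_uuid: str) -> bool:
        last = self._refresh_time.get(rp_uuid)
        # Never seen (or forgotten) -> refresh; otherwise refresh once stale.
        return last is None or self._clock() - last >= self._ttl

    def mark_refreshed(self, rp_uuid: str) -> None:
        # Only called after a refresh actually succeeded.
        self._refresh_time[rp_uuid] = self._clock()

    def forget(self, rp_uuid: str) -> None:
        # E.g. the provider was deleted out from under us.
        self._refresh_time.pop(rp_uuid, None)
```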
*** jawad_axd has quit IRC | 16:30 | |
mriedem | i'm going to say yes | 16:30 |
*** jaosorior has joined #openstack-nova | 16:33 | |
mgoddard | that part makes sense now. I still don't see how the node not being removed from the RT compute_nodes prevents placement from getting healed though - each time through the loop we call _update, which calls _update_placement | 16:34 |
*** tbachman has quit IRC | 16:35 | |
mgoddard | I probably need to stop thinking about this, going a little mad. I ran your functional test with my patch chain and it seemed to fix the issue. | 16:36 |
efried | mriedem: yeah, I was *sure* that one was already implemented. Oh well :( | 16:37 |
*** jaosorior has quit IRC | 16:38 | |
*** jaosorior has joined #openstack-nova | 16:39 | |
*** ricolin has quit IRC | 16:39 | |
mriedem | mgoddard: i don't want to think about it anymore either which is why i abandoned my changes | 16:43 |
*** tbachman has joined #openstack-nova | 16:46 | |
*** TxGirlGeek has quit IRC | 16:48 | |
*** _mlavalle_1 has joined #openstack-nova | 16:49 | |
*** mlavalle has quit IRC | 16:52 | |
*** TxGirlGeek has joined #openstack-nova | 16:54 | |
mgoddard | mriedem: hopefully you (or someone) will face thinking about it to review my patches at some point | 16:54 |
*** JamesBen_ has joined #openstack-nova | 16:54 | |
*** JamesBen_ has quit IRC | 16:55 | |
*** sapd1 has quit IRC | 16:56 | |
mriedem | can i get a stable core here? https://review.opendev.org/#/q/topic:bug/1849409+branch:stable/rocky | 16:57 |
mriedem | the changes to queens/pike/ocata are dependent on that since i have to redo them | 16:57 |
*** JamesBenson has quit IRC | 16:58 | |
mriedem | melwitt: were these something you wanted to get downstream? https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:stable/stein+topic:heal_allocations_dry_run | 16:59 |
mriedem | i know eandersson was saying he used those in rocky | 16:59 |
*** jaosorior has quit IRC | 17:00 | |
mriedem | bauzas: could you take a look at these train backports? https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:stable/train+topic:bug/1852610 | 17:01 |
*** bhagyashris has quit IRC | 17:01 | |
melwitt | mriedem: I don't know that it's come up specifically (everything's on queens) but yeah definitely could use it I'm sure. really most likely is I'd want to get heal_allocations in queens in the first place (downstream) and then backport the --instance and --dry-run too | 17:02 |
donnyd | mriedem: :( | 17:02 |
melwitt | I dunno how doable that would be, I haven't tried it yet | 17:03 |
mriedem | donnyd: what? the lxc unicorn ci job that no one cares about? | 17:03 |
donnyd | I care | 17:03 |
donnyd | Lol | 17:03 |
mriedem | "care" and "care enough to work on" are different things, and i don't care enough to work on that anymore | 17:04 |
donnyd | Hence my :( | 17:04 |
mriedem | yeah i know | 17:05 |
mriedem | freedom ain't free and all that | 17:05 |
donnyd | Well I am very appreciative of the time that has been put in | 17:06 |
*** rpittau is now known as rpittau|afk | 17:06 | |
melwitt | https://www.youtube.com/watch?v=tzW2ybYFboQ | 17:07 |
*** ociuhandu has joined #openstack-nova | 17:07 | |
donnyd | melwitt: zomg lol | 17:07 |
melwitt | :) | 17:07 |
mriedem | heh yo'uve never seen that? | 17:08 |
mriedem | it's the only reason i say it | 17:08 |
donnyd | Oh I have, and it makes me laugh every time | 17:09 |
mriedem | costs about a buck-o-five | 17:09 |
*** tssurya has quit IRC | 17:10 | |
mriedem | ok on that note, my wife wants me out of the gd house for a few hours so i'm going to lunch, errands and then to be driven crazy working at a coffee shop so bbiab | 17:10 |
*** dtantsur is now known as dtantsur|afk | 17:10 | |
*** tbachman has quit IRC | 17:10 | |
*** mriedem has quit IRC | 17:10 | |
*** tbachman has joined #openstack-nova | 17:11 | |
*** gyee has joined #openstack-nova | 17:17 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: block_device: Copy original volume_type when missing for snapshot based volumes https://review.opendev.org/694497 | 17:17 |
*** ociuhandu has quit IRC | 17:17 | |
*** priteau has quit IRC | 17:20 | |
*** priteau has joined #openstack-nova | 17:22 | |
*** priteau has quit IRC | 17:22 | |
*** tbachman has quit IRC | 17:23 | |
*** ociuhandu has joined #openstack-nova | 17:36 | |
*** ociuhandu has quit IRC | 17:36 | |
*** ociuhandu has joined #openstack-nova | 17:37 | |
*** ociuhandu has quit IRC | 17:37 | |
*** ociuhandu has joined #openstack-nova | 17:38 | |
*** damien_r has quit IRC | 17:42 | |
*** ociuhandu has quit IRC | 17:44 | |
*** TxGirlGeek has quit IRC | 17:45 | |
*** sridharg has quit IRC | 17:48 | |
*** tbachman has joined #openstack-nova | 17:50 | |
*** pcaruana has quit IRC | 17:51 | |
*** TxGirlGeek has joined #openstack-nova | 17:54 | |
sean-k-mooney | gibi: our downstream qa just found an issue with how we report the bandwidth providers | 17:59 |
sean-k-mooney | nova creates the compute node rp with the hypervisor_hostname as the RP name | 18:00 |
sean-k-mooney | which means if you change the compute node host with the host config value in the nova and neutron config | 18:00 |
*** TxGirlGeek has quit IRC | 18:01 | |
sean-k-mooney | we cannot find the root RP | 18:01 |
sean-k-mooney | so for it to work, the compute node host and hypervisor_hostname need to match | 18:04 |
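The mismatch can be shown with the two names from sean-k-mooney's paste. Neutron looks the root provider up by its configured host, while nova named the provider after the hypervisor_hostname reported by libvirt. A hypothetical exact-match lookup by name therefore finds nothing. The provider UUID is a placeholder.

```python
from typing import Dict, Optional


def find_root_provider(providers: Dict[str, str], host: str) -> Optional[str]:
    """Return the UUID of the provider whose name equals `host`, if any.

    `providers` maps provider name -> UUID; the lookup is an exact match,
    like a `name=` filter on the placement API.
    """
    return providers.get(host)


# Provider named by nova after hypervisor_hostname (the real /etc/hostname).
providers = {"sriov01.mobius.lab.eng.rdu2.redhat.com": "rp-uuid-placeholder"}
# The host value set in the nova and neutron config.
conf_host = "sriov01.localdomain"

print(find_root_provider(providers, conf_host))  # None
```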
efried | that sounds like a problem with how we report providers in general. Are you really supposed to be able to change that config for an existing service? | 18:04 |
efried | do we actually rename the provider correctly in that case? | 18:04 |
sean-k-mooney | this is a clean deployment | 18:05 |
sean-k-mooney | so no rename | 18:05 |
sean-k-mooney | but that would also be a problem | 18:05 |
*** dacbxyz has joined #openstack-nova | 18:06 | |
sean-k-mooney | efried: http://paste.openstack.org/show/786502/ | 18:07 |
*** TxGirlGeek has joined #openstack-nova | 18:07 | |
sean-k-mooney | efried: nova is correctly using the host config value "sriov01.localdomain" for the host, as is neutron, and that is also the value set in the neutron port bindings | 18:08 |
sean-k-mooney | efried: the rp uses the hypervisor_hostname, which makes sense for ironic | 18:08 |
sean-k-mooney | but for libvirt this is an issue | 18:08 |
sean-k-mooney | sriov01.mobius.lab.eng.rdu2.redhat.com is actually the real hostname set in /etc/hostname | 18:09 |
efried | can you get a UUID from `openstack hypervisor list`? | 18:09 |
sean-k-mooney | no, it returns the internal database id instead, which is a different issue | 18:10 |
sean-k-mooney | i guess a show might work | 18:10 |
sean-k-mooney | no no uuid | 18:11 |
efried | well, my point is that the compute node's *UUID* should be predictable (it's compute_node.uuid) | 18:11 |
efried | so, once again, trying to use RP name for *anything* is dangerous and brittle. | 18:12 |
sean-k-mooney | sure but only nova knows the uuid | 18:12 |
openstackgerrit | John Garbutt proposed openstack/nova master: WIP: review comments around unit test idea https://review.opendev.org/695547 | 18:13 |
efried | that can't be true | 18:13 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: block_device: Copy original volume_type when missing for snapshot based volumes https://review.opendev.org/694497 | 18:13 |
sean-k-mooney | why cant it. | 18:13 |
efried | you telling me there's no way to discover the compute node UUID from the host? | 18:13 |
efried | seems like I asked mriedem about this the other day... | 18:13 |
sean-k-mooney | not via the hypervisors api | 18:13 |
sean-k-mooney | or the compute service list | 18:14 |
*** pcaruana has joined #openstack-nova | 18:14 | |
efried | ...okay, the result of said discussion was "the RP name matches CONF.host" -- which you've just proven ain't true. | 18:16 |
sean-k-mooney | right, it was meant to | 18:16 |
sean-k-mooney | it actually matches the result of calling get_hostname | 18:16 |
sean-k-mooney | or what ever that function is called | 18:16 |
*** ociuhandu has joined #openstack-nova | 18:16 | |
sean-k-mooney | i'll grant you that them setting the config value so that the two don't match is a little weird, but it should actually work | 18:18 |
openstackgerrit | John Garbutt proposed openstack/nova master: WIP: just an idea, adding scope checking https://review.opendev.org/695550 | 18:18 |
efried | dammit, having it be hypervisor_hostname is the *right* thing | 18:18 |
efried | but yeah, it makes discoverability problematic. | 18:18 |
sean-k-mooney | yes it would be if we used that when talking to neutron and cinder | 18:19 |
sean-k-mooney | but we dont | 18:19 |
efried | nova.compute.resource_tracker.ResourceTracker._update_to_placement would have to condition on "am I ironic?" | 18:19 |
efried | https://opendev.org/openstack/nova/src/branch/master/nova/compute/resource_tracker.py#L1111-L1112 | 18:20 |
efried | heck, I don't even know if we have a way to tell at that point in the code whether we're ironic. | 18:21 |
efried | Can't we just deprecate CONF.host? :P | 18:22 |
sean-k-mooney | possibly, although i don't think that is the only place we can create the compute node record | 18:22 |
sean-k-mooney | https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484626/nova/compute/resource_tracker.py#L671-L699 | 18:22 |
efried | the one I linked is the only place we create the provider | 18:22 |
efried | I'm not talking about changing what's in the compute node record. | 18:22 |
sean-k-mooney | ah ok | 18:23 |
sean-k-mooney | ya sorry, i was trying to figure out where the hypervisor hostname was originally set in the compute node | 18:24 |
*** dacbxyz has quit IRC | 18:24 | |
sean-k-mooney | in the libvirt case i had kind of assumed it should always match the CONF.host value | 18:25 |
sean-k-mooney | but it does not | 18:25 |
efried | it appears as though it's up to the individual driver to set hypervisor_hostname in that resources dict you pointed to. | 18:26 |
efried | And the libvirt driver asks the libvirt API. | 18:26 |
efried | nova.virt.libvirt.host.Host.get_hostname | 18:26 |
efried | which I can only imagine does effectively `hostname` | 18:27 |
sean-k-mooney | ya, which we can't change because i think it uses that for live migration | 18:27 |
efried | can't and shouldn't. | 18:27 |
efried | It would also be a nightmare at this point to try to change the resource provider name I think. | 18:27 |
sean-k-mooney | ya | 18:27 |
sean-k-mooney | so really the host config option cant be used | 18:28 |
efried | correct | 18:28 |
efried | at least with libvirt and bandwidth | 18:28 |
sean-k-mooney | unless you are setting it to the value returned by get_hostname | 18:28 |
efried | can you think of other places this could break us? | 18:28 |
sean-k-mooney | cyborg | 18:28 |
sean-k-mooney | it will be doing the same to create its resource right? | 18:29 |
*** tosky has quit IRC | 18:29 | |
sean-k-mooney | it was going to look up the RP by name | 18:29 |
efried | which also relies on the correlation between the hypervisor and the RP? | 18:29 |
efried | Sundar isn't here, could go check the code... | 18:29 |
efried | but I'm not gonna. | 18:30 |
efried | I have him on slack, will ask... | 18:30 |
sean-k-mooney | ill check | 18:30 |
sean-k-mooney | so first sign is not promising https://github.com/openstack/cyborg/blob/8f78a05a24ea17ad1036b32e05dbc75233cd374d/cyborg/conductor/manager.py#L360-L372 | 18:31 |
sean-k-mooney | https://github.com/openstack/cyborg/blob/369abe8dd06aa6648298c3256f444a63ee6268d0/cyborg/agent/manager.py#L40 | 18:33 |
sean-k-mooney | so it can be set in the config too | 18:33 |
*** martinkennelly has quit IRC | 18:34 | |
sean-k-mooney | so yes, cyborg would also need that option set to the same value | 18:35 |
sean-k-mooney | https://github.com/openstack/cyborg/blob/369abe8dd06aa6648298c3256f444a63ee6268d0/cyborg/conf/default.py#L38-L45 | 18:35 |
sean-k-mooney | they default to socket.getfqdn() | 18:36 |
sean-k-mooney | nova defaults to socket.gethostname() https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484626/nova/conf/netconf.py#L55-L59 | 18:36 |
sean-k-mooney | and neutron has their own function | 18:37 |
sean-k-mooney | https://github.com/openstack/neutron/blob/f8b990736ba91af098e467608c6dfa0b801ec19c/neutron/conf/common.py#L94-L99 | 18:37 |
sean-k-mooney | which just calls socket.gethostname() | 18:38 |
sean-k-mooney | https://github.com/openstack/neutron-lib/blob/master/neutron_lib/utils/net.py#L23-L28 | 18:38 |
*** gmann is now known as gmann_afk | 18:38 | |
sean-k-mooney | so at least nova and neutron agree on the default; cyborg could get a different value | 18:39 |
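The defaults compared above can be checked on a given machine. A minimal sketch, assuming only what the log states: nova and neutron default their host option to `socket.gethostname()`, while cyborg defaults to `socket.getfqdn()`. Both helper names are hypothetical, not part of any of these projects.

```python
import socket
from typing import Dict, List


def default_host_values() -> Dict[str, str]:
    """Default host value each service would compute on this machine.

    Assumption taken from the discussion: nova and neutron use
    socket.gethostname(), cyborg uses socket.getfqdn().
    """
    short_name = socket.gethostname()
    return {
        "nova": short_name,
        "neutron": short_name,
        "cyborg": socket.getfqdn(),
    }


def mismatched_services(values: Dict[str, str], reference: str = "nova") -> List[str]:
    """Services whose default host differs from the reference service's."""
    expected = values[reference]
    return sorted(svc for svc, host in values.items() if host != expected)
```

On a machine where `socket.getfqdn()` differs from `socket.gethostname()`, `mismatched_services(default_host_values())` reports `['cyborg']`, i.e. cyborg would look for a provider name nova never created.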
sean-k-mooney | efried: i don't think we can change anything on the nova side so i might have to file a TripleO bug | 18:40 |
sean-k-mooney | although this could be related to this specific deployment | 18:41 |
*** pcaruana has quit IRC | 18:42 | |
*** dacbxyz has joined #openstack-nova | 18:47 | |
*** mloza has joined #openstack-nova | 18:51 | |
*** tbachman has quit IRC | 18:53 | |
*** ralonsoh has quit IRC | 18:55 | |
*** pcaruana has joined #openstack-nova | 19:06 | |
*** mriedem has joined #openstack-nova | 19:15 | |
*** dviroel has joined #openstack-nova | 19:17 | |
mriedem | test_encrypted_cinder_volumes_luks might be failing all the time on multinode jobs too if the volume and server are on different hosts, i wonder if that's causing that to fail | 19:20 |
mriedem | nvm i guess that's not the case in the failure i'm looking at | 19:22 |
lyarwood | mriedem: link? I'm about for 15 and can take a look. | 19:45 |
*** openstackstatus has quit IRC | 19:50 | |
mriedem | i closed it but it's a bug we've already talked about before https://bugs.launchpad.net/os-brick/+bug/1820007 | 19:50 |
openstack | Launchpad bug 1820007 in os-brick "Failed to attach encrypted volumes after detach: volume device not found at /dev/disk/by-id" [Undecided,Confirmed] | 19:50 |
*** openstackstatus has joined #openstack-nova | 19:51 | |
*** ChanServ sets mode: +v openstackstatus | 19:51 | |
lyarwood | mriedem: ah right, that weirdness | 19:54 |
*** tbachman has joined #openstack-nova | 19:59 | |
lyarwood | mriedem: https://review.opendev.org/695564 - I'll follow up in the morning with this. | 20:05 |
*** TxGirlGeek has quit IRC | 20:06 | |
mriedem | word | 20:10 |
*** ociuhandu has quit IRC | 20:13 | |
*** ociuhandu has joined #openstack-nova | 20:15 | |
*** ociuhandu has quit IRC | 20:19 | |
*** ianw has quit IRC | 20:20 | |
*** TxGirlGeek has joined #openstack-nova | 20:25 | |
*** ianw has joined #openstack-nova | 20:27 | |
melwitt | mriedem: I left comments on the --instance and --dry-run backports for the docs | 20:29 |
mriedem | melwitt: yeah good catch, i left some thoughts/options in https://review.opendev.org/#/c/693199/1/doc/source/cli/nova-manage.rst@358 | 20:39 |
dansmith | mriedem: speaking of getting off your lawn, | 20:39 |
dansmith | does it not seem hilarious that we had "% locals()" all over the code, removed it all, and then python baked it into the language? | 20:39 |
mriedem | i was reading that article ttx linked and thought that exact same thing | 20:40 |
mriedem | "oh fun this is just using locals" | 20:40 |
dansmith | right | 20:40 |
mriedem | not breakable at all! | 20:40 |
dansmith | so a new incompatible syntax for ... something you can already do but shouldn't | 20:40 |
dansmith | hah right | 20:40 |
mriedem | i also didn't know that % pre-dated format() | 20:40 |
mriedem | i prefer % | 20:40 |
dansmith | me too, because I'm old school | 20:40 |
dansmith | I've been doing python since before .format and since before you saw the non-java light, whippersnapper | 20:41 |
mriedem | hey, now we're both losers because there is go: java for python people | 20:41 |
dansmith | heh | 20:41 |
dansmith | indeed | 20:41 |
mriedem | umm, + pointers | 20:41 |
*** ociuhandu has joined #openstack-nova | 20:42 | |
mriedem | f-strings also look like you can run functions from within strings, which .... | 20:44 |
mriedem | seems like erlang? | 20:44 |
dansmith | you can definitely hit dictionaries, which seems borderline evil | 20:44 |
dansmith | running functions and you're nearly bash | 20:44 |
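For reference, the comparison being made: `% locals()` interpolates names from the local namespace, and an f-string (Python 3.6+) does the same lookup implicitly, plus arbitrary expressions such as dict subscripts and function calls. The function names below are made up for the illustration.

```python
def percent_locals(name: str, count: int) -> str:
    # The old pattern: pull values out of the local namespace by name.
    return "%(name)s has %(count)d servers" % locals()


def f_string(name: str, count: int) -> str:
    # Same lookup, now part of the string literal syntax.
    return f"{name} has {count} servers"


def f_string_expressions(flavor: dict) -> str:
    # f-strings can also subscript dicts and call functions inline.
    return f"{flavor['name']} has {len(flavor['extra_specs'])} extra spec(s)"
```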
*** gmann_afk is now known as gmann | 20:45 | |
dansmith | I also don't like the single-char prefix to a quoted string syntax python seems to love | 20:45 |
dansmith | u"foo", r"bar", etc | 20:45 |
dansmith | and all these new kids with their hoverboards and roller skates...*fist* | 20:46 |
melwitt | mriedem: replied. no strong opinion here | 20:46 |
melwitt | I'm cool with whatever you think is best | 20:47 |
mriedem | melwitt: but what's your take on f-strings?! | 20:47 |
melwitt | lol | 20:47 |
mriedem | f'em right?! | 20:47 |
melwitt | I didn't know what f-strings is yet so I'm way ahead of the game | 20:47 |
mriedem | i learned <1 hour ago | 20:47 |
melwitt | I read a few of the posts and had no idea what was going on | 20:47 |
melwitt | and of course, I didn't read the first one, cause BORING | 20:47 |
melwitt | jk | 20:48 |
*** spatel has joined #openstack-nova | 20:48 | |
melwitt | I mean, I really didn't read it but not because it was boring. once I saw replies piling up I read a few | 20:48 |
* artom hates how cold migration, rebuild and resize are all intertwined all over the place | 20:51 | |
melwitt | I think we all do. I've been thinking about how nice it would be to refactor those things in a non-boil-the-ocean way. somehow. | 20:54 |
dansmith | I don't | 20:54 |
dansmith | I love it | 20:54 |
dansmith | LOVE | 20:55 |
melwitt | :) | 20:55 |
artom | Microversion 2.whatever "support for resize, cold migration, and rebuild is dropped" | 20:56 |
artom | Rewrite everything from scratch | 20:56 |
artom | Microversion 2.whatever+1 "Nova now supports resize, cold migration, and rebuild" | 20:56 |
melwitt | lol, nah just leave em out | 20:57 |
artom | Hah, yeah, just conveniently forget step 3 | 20:57 |
artom | Or step 2, even | 20:57 |
mriedem | artom: how are rebuild and resize/cold migrate intertwined? | 20:58 |
efried | Nova meeting in 2 minutes in #openstack-meeting | 20:58 |
artom | mriedem, through sheer willpower and anger | 20:58 |
efried | mriedem: they're not, rebuild and evacuate are the same thing though. | 20:58 |
mriedem | don't make me link the doc i wrote again | 20:58 |
artom | OK, resize and cold migrate then | 20:58 |
artom | Mea culpa | 20:58 |
mriedem | ok so write a resize vs cold migrate doc like this https://docs.openstack.org/nova/latest/contributor/evacuate-vs-rebuild.html | 20:59 |
mriedem | easy | 20:59 |
mriedem | 1. resize has a new flavor, cold migrate doesn't, | 20:59 |
gregwork | does nova in queens understand how to boot instances on a particular host aggregate? i've tried tagging the host aggregate "compute = 1" and then passing scheduler_hints: compute: 1 when deploying a stack with OS::Nova::Server .. however that doesn't appear to do anything and the instances end up on whatever compute node | 20:59 |
mriedem | 2. resize can sometimes go on the same host, cold migrate does not, except if you're using vcenter | 20:59 |
mriedem | artom: what other differences are there besides those 2 things? | 21:00 |
melwitt | gregwork: has to be availability zone. host aggregates are an admin-only hidden thing | 21:00 |
*** spatel has quit IRC | 21:00 | |
artom | mriedem, I think that's about it? | 21:00 |
artom | And resize to same host is user-configurable | 21:00 |
gregwork | melwitt: oh .. so even if the admin has defined the host agg, regular users can't reference it? | 21:00 |
mriedem | JESUS THEY ARE SO INTERTWINED!!! | 21:00 |
artom | Cold migrate is... always to a new host? | 21:00 |
mriedem | artom: you mean operator configurable... | 21:01 |
artom | mriedem, just... let me be angry | 21:01 |
artom | mriedem, sorry, yeah | 21:01 |
artom | I meant in the code, anyways | 21:01 |
mriedem | artom: remember https://review.opendev.org/#/c/676022/ | 21:01 |
melwitt | gregwork: they can but only by way of an AZ. you have to make one and map it to the aggregate(s) you want (as admin) | 21:01 |
artom | Like, you hit a method called "_cold_migrate" in the resize flow | 21:01 |
artom | mriedem, I do | 21:01 |
mriedem | yeah, so docstrings ftw | 21:01 |
mriedem | -1 people that don't write code comments | 21:02 |
gregwork | melwitt: from reading up on AZs in openstack, these seem like a very heavy abstraction to configure compared to a simple host aggregate | 21:02 |
mriedem | be the change you want to see in the world... | 21:02 |
artom | But how do I become free limitless beer? | 21:02 |
artom | (Point taken, though) | 21:02 |
melwitt | gregwork: they're really just a tag on host aggregates. shouldn't be heavy. you put tag my_az on the host aggs you want to be part of it and then the user can say my_az | 21:03 |
mriedem | artom: so if you want to see some things in a contributor doc about resize vs cold migrate, throw those into a doc bug and maybe i can crank something out | 21:03 |
*** openstack has joined #openstack-nova | 21:16 | |
*** ChanServ sets mode: +o openstack | 21:16 | |
melwitt | gregwork: sorry that link was for admin user to be able to bypass the scheduler. this is the normal end user instructions for how to specify AZ https://docs.openstack.org/nova/latest/user/availability-zones.html | 21:16 |
artom | mriedem, I think what I'd really like at this point is a sequence diagram for resize (to start with), like we have for live migration | 21:20 |
artom | Btw, stephenfin, who gave me https://review.opendev.org/#/c/662522/, I will forever remember that | 21:21 |
artom | My fault for opening my mouth and saying I have extra bandwidth, I suppose | 21:21 |
*** dave-mccowan has joined #openstack-nova | 21:21 | |
mriedem | i had an old todo somewhere to do a resize diagram like the live migrate one i put on a post-it and efried turned into that diagram | 21:21 |
mriedem | and then got the nickel 3 years later | 21:21 |
mriedem | artom: fwiw i do have a simple resize diagram in my last cells v2 summit presentation, you could lift the slides from that and turn it into seqdiag stuff in sphinx | 21:22 |
mriedem | https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/summits/26/presentations/23287/slides/Whats-new-in-Nova-CellsV2-Denver-Summit-2019.pdf | 21:23 |
mriedem | slide 26 | 21:23 |
artom | mriedem, much thanks | 21:25 |
*** pcaruana has quit IRC | 21:25 | |
* artom needs to do the school run | 21:26 | |
gregwork | hmmn "Cannot update metadata of aggregate 11: Reason: One or more hosts already in availability zones [u'nova']" | 21:26 |
*** TxGirlGeek has quit IRC | 21:28 | |
mriedem | i've noticed that artom has the same disappearing pattern as bauzas | 21:29 |
mriedem | t0: complain! | 21:30 |
mriedem | t1: here you can do this | 21:30 |
mriedem | t2: thanks! but i've got to run | 21:30 |
gregwork | alright nm i think i got it .. | 21:31 |
*** TxGirlGeek has joined #openstack-nova | 21:32 | |
mriedem | gregwork: hosts can be in M aggregates but 1 AZ | 21:32 |
gregwork | yeah i had created an aggregate and already tagged it as nova | 21:32 |
gregwork | for the AZ | 21:32 |
gregwork | so adding an additional AZ was failing | 21:32 |
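A simplified model of the rule mriedem states (a host can be in many aggregates but only one AZ), which is why tagging a second aggregate with a new AZ failed when its hosts were already in an aggregate tagged `nova`. This is a sketch of the check, not nova's implementation; the aggregate dicts only loosely mirror the API's `name`/`hosts`/`metadata` fields, and `conflicting_azs` is a hypothetical helper.

```python
from typing import List, Set


def conflicting_azs(aggregates: List[dict], target_name: str, new_az: str) -> Set[str]:
    """AZs (other than new_az) that hosts of the target aggregate already belong to."""
    target = next(a for a in aggregates if a["name"] == target_name)
    target_hosts = set(target["hosts"])
    conflicts = set()
    for agg in aggregates:
        if agg is target:
            continue
        az = agg.get("metadata", {}).get("availability_zone")
        # A shared host that is already in a different AZ blocks the update.
        if az and az != new_az and target_hosts & set(agg["hosts"]):
            conflicts.add(az)
    return conflicts
```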
gregwork | now to figure out what the scheduler_hint is to specify the AZ in OS::Nova::Server | 21:33 |
gregwork | i don't think it's group: | 21:33 |
mriedem | there is a big fat warning in the docs to not ever create an az literally called "nova" | 21:34 |
mriedem | b/c that's the default schedule zone for services, | 21:34 |
mriedem | so if you create an az that users boot into called nova they can be stuck and not migrate their servers (potentially) | 21:34 |
mriedem | we should probably just block that in the api, but no one has cared enough to yet | 21:35 |
gregwork | so apparently availability_zone is a property in OS::Nova::Server and not a scheduler hint map | 21:36 |
mriedem | correct | 21:37 |
mriedem | but let's not bring heat into this please... | 21:37 |
gregwork | well i'm trying to understand how it works in nova so i can figure out how to solve this in heat | 21:38 |
mriedem | ok https://docs.openstack.org/api-ref/compute/?expanded=create-server-detail#create-server | 21:38 |
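Per the create-server API reference linked above, the AZ is a field of the `server` object, while scheduler hints travel in a separate top-level `os:scheduler_hints` object. A minimal sketch of the request body; the helper name is hypothetical, and other fields a real request may need (for example `networks` at newer microversions) are left out.

```python
from typing import Optional


def build_create_server_body(
    name: str,
    image_ref: str,
    flavor_ref: str,
    availability_zone: Optional[str] = None,
    scheduler_hints: Optional[dict] = None,
) -> dict:
    server = {"name": name, "imageRef": image_ref, "flavorRef": flavor_ref}
    if availability_zone:
        # The AZ is a property of the server itself, not a scheduler hint.
        server["availability_zone"] = availability_zone
    body = {"server": server}
    if scheduler_hints:
        # Hints sit next to "server" in the body, not inside it.
        body["os:scheduler_hints"] = scheduler_hints
    return body
```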
gregwork | os:scheduler_hints.stuff section is very very useful | 21:39 |
gregwork | thanks | 21:39 |
mriedem | yup - thank takashin for doing the thankless work of documenting a lot of that stuff | 21:40 |
mriedem | organizing it, etc | 21:40 |
* mriedem heads home, bbiab | 21:41 | |
*** mriedem has quit IRC | 21:41 | |
*** nweinber__ has quit IRC | 21:46 | |
*** ayoung has quit IRC | 21:47 | |
*** ayoung has joined #openstack-nova | 21:49 | |
efried | sean-k-mooney: what we were talking about earlier, I guess the first step to at least acknowledge the problem would be to beef up the help for https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.host | 21:53 |
efried | it already has a bullet list of "things this is used for" | 21:53 |
efried | Adding "external services looking up the compute node resource provider; due to bug #XXXXX you will break your world if you set this when using the libvirt driver, so don't." kind of thing. | 21:54 |
*** awalende has joined #openstack-nova | 21:55 | |
*** awalende has quit IRC | 22:00 | |
*** TxGirlGeek has quit IRC | 22:01 | |
*** TxGirlGeek has joined #openstack-nova | 22:04 | |
efried | sean-k-mooney: Potential remedy (though it would be kind of a long road) would be to expose the compute node UUID through that hypervisors API. | 22:05 |
efried | then we instruct consumers (neutron, cyborg) to use that rather than CONF.host to discover the compute node RP. | 22:05 |
efried | um | 22:05 |
*** abaindur has quit IRC | 22:05 | |
efried | for libvirt | 22:05 |
efried | This whole thing makes efried :( | 22:06 |
*** mriedem has joined #openstack-nova | 22:08 | |
efried | mriedem: you may have missed this earlier, but it turns out we broke CONF.host for libvirt. | 22:08 |
*** rcernin has joined #openstack-nova | 22:09 | |
*** dacbxyz has quit IRC | 22:09 | |
efried | Because external services (neutron (problem now), cyborg (problem soon)) assume CONF.host is the name of the compute node RP | 22:10 |
efried | ...which they need to know in order to hang their nested providers (neutron: bw; cyborg: accelerator) off of | 22:10 |
efried | but the compute node RP is actually hypervisor_hostname, which for libvirt is the `hostname()` of the system. | 22:11 |
efried | which is the default for CONF.host, so you're fine... unless you actually *set* that guy. | 22:12 |
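The failure mode described here reduces to a string comparison. A minimal sketch, assuming (as stated above) that for libvirt the compute node provider is named after `hypervisor_hostname`, i.e. the system hostname, and that neutron/cyborg look the provider up by a `host` value mirroring nova's `CONF.host`; `lookup_by_conf_host_works` is hypothetical.

```python
import socket
from typing import Optional


def lookup_by_conf_host_works(conf_host: Optional[str], hypervisor_hostname: str) -> bool:
    """Would an external service find the compute node RP by CONF.host?

    conf_host=None models an operator who left the option at its default
    (socket.gethostname()).
    """
    effective_host = conf_host if conf_host is not None else socket.gethostname()
    return effective_host == hypervisor_hostname
```

Assuming libvirt reports the same name as `socket.gethostname()`, leaving the option unset keeps the lookup working, while something like `lookup_by_conf_host_works("compute-0.custom", "compute-0")` returns `False`: the broken bandwidth case.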
efried | dansmith: ^ | 22:12 |
*** jbernard has quit IRC | 22:12 | |
mriedem | "we broke CONF.host"? | 22:13 |
mriedem | who is we and when? | 22:13 |
mriedem | if we = jay and when is ocata then... | 22:14 |
artom | mriedem, all part of my cunning plan | 22:14 |
efried | okay, "If you set CONF.host to a non-default value, and you're using libvirt, bandwidth (and accelerator, and other future external-service-created nested) providers are broken" | 22:15 |
mriedem | oopsy daisy | 22:15 |
efried | sean-k-mooney: oh hey, it looks like you were maybe getting the hypervisor ID as a short ID because you were using the default microversion. Per [1] it'll come back as a UUID starting at 2.53. | 22:16 |
efried | [1] https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-detail#id306 | 22:16 |
efried | which phew gives us a path forward without further changes. | 22:16 |
efried | I think | 22:16 |
* mriedem exhales | 22:17 | |
efried | Having to hit the /os-hypervisors API is heavier than just looking at CONF.host, but that UUID is the UUID of the provider you want. | 22:17 |
mriedem | yup | 22:17 |
efried | hm, but mriedem, how do I ask /os-hypervisors for the entry for my current node? | 22:18 |
efried | cause the qparams and output still talk in terms of hypervisor_hostname :( | 22:19 |
mriedem | GET /os-hypervisors/detail?hypervisor_hostname_pattern=<hostname> | 22:19 |
efried | right, exactly. | 22:19 |
efried | I mean, if I'm going to rely on running `gethostname()` on the host, I might as well just do that from the start. | 22:20 |
mriedem | if not that, then i guess GET /os-hypervisors/detail and you iterate all until you find the one with the service.host that matches what you care about if hypervisor_hostname is not a match | 22:20 |
mriedem | i think service.host there is CONF.host | 22:21 |
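The two lookups suggested here can be sketched as follows: query `GET /os-hypervisors/detail?hypervisor_hostname_pattern=<hostname>` at microversion 2.53 or later (where `id` is the compute node UUID, i.e. the provider UUID), then filter for an exact match because the parameter is a pattern; failing that, match `service.host` (believed above to be `CONF.host`) over a full listing. Only the request is built, no HTTP client is assumed, and the helper names are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

UUID_MICROVERSION = "2.53"  # hypervisor "id" is returned as a UUID from 2.53 on


def build_hypervisor_query(compute_endpoint: str, hostname: str) -> Tuple[str, dict]:
    """Return (url, headers) for listing hypervisors whose name matches hostname."""
    query = urlencode({"hypervisor_hostname_pattern": hostname})
    url = f"{compute_endpoint.rstrip('/')}/os-hypervisors/detail?{query}"
    headers = {"OpenStack-API-Version": f"compute {UUID_MICROVERSION}"}
    return url, headers


def find_compute_node_uuid(body: dict, hostname: str,
                           service_host: Optional[str] = None) -> Optional[str]:
    """Pick the compute node UUID out of an os-hypervisors/detail response."""
    hypervisors = body.get("hypervisors", [])
    # The query parameter is a pattern, so insist on an exact name match.
    for hv in hypervisors:
        if hv.get("hypervisor_hostname") == hostname:
            return hv["id"]
    # Fallback: match on the nova-compute service host instead.
    if service_host is not None:
        for hv in hypervisors:
            if hv.get("service", {}).get("host") == service_host:
                return hv["id"]
    return None
```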
mriedem | someone had an RFE at one point to be able to filter hypervisors by service.host for ironic | 22:21 |
mriedem | it would be pretty easy to do i think | 22:21 |
mriedem | it's not clear to me if you need that though | 22:21 |
mriedem | you being neutron/cyborg | 22:22 |
efried | that would be a sure, but heavy, way to correlate CONF.host. | 22:22 |
efried | esp in a big env, that's a big payload | 22:22 |
mriedem | which thing? getting all hypervisors and then finding the service.host? | 22:22 |
mriedem | yeah it would, and likely limited to 1000 results by default so you'd have to page | 22:23 |
efried | yeah, GET /os-hypervisors/detail is going to be a big response for CERN. | 22:23 |
efried | use sdk and it pages for you, which is fine. | 22:23 |
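The heavy fallback (list everything, match `service.host`) needs paging once the response exceeds the default limit mentioned above. A minimal sketch of marker-based paging, assuming a caller-supplied `fetch_page(marker, limit)` that wraps `GET /os-hypervisors/detail?limit=...&marker=...` and returns the `hypervisors` list; `fetch_page` and both helpers are hypothetical, and an SDK would normally do this for you.

```python
from typing import Callable, Iterable, Iterator, List, Optional

FetchPage = Callable[[Optional[str], int], List[dict]]


def iter_hypervisors(fetch_page: FetchPage, limit: int = 1000) -> Iterator[dict]:
    """Yield every hypervisor, using the last id of each page as the marker."""
    marker = None
    while True:
        page = fetch_page(marker, limit)
        yield from page
        if len(page) < limit:
            # A short (or empty) page means there is nothing left to fetch.
            return
        marker = page[-1]["id"]


def find_by_service_host(hypervisors: Iterable[dict], service_host: str) -> Optional[str]:
    """Return the id of the first hypervisor whose service.host matches."""
    for hv in hypervisors:
        if hv.get("service", {}).get("host") == service_host:
            return hv["id"]
    return None
```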
mriedem | sure, so GET /os-hypervisors/detail?service_host=CONF.host | 22:23 |
mriedem | ^ is the RFE i mentioned | 22:23 |
efried | oh, that's a thing... | 22:23 |
*** kaisers1 has joined #openstack-nova | 22:23 | |
mriedem | ha, no | 22:23 |
mriedem | oh eric | 22:23 |
mriedem | https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-details-detail#list-hypervisors-details | 22:23 |
efried | yes, I'm looking at that. You're taking advantage of my weakened mental state to f with me. Is it April? | 22:24 |
mriedem | why is your mental state weak? | 22:24 |
*** kaisers has quit IRC | 22:24 | |
efried | is it ever not? | 22:25 |
mriedem | i can't find the bug, i could have sworn a guy opened one though | 22:25 |
mriedem | but that was the idea, he was trying to filter hypervisors by ironic compute service host but was only getting nodes based on the ironic node uuid which wasn't helpful | 22:25 |
efried | okay, anyway, yes, that would be a nice way to make this work that involves an API change with a microversion. | 22:25 |
efried | but for the sake of discussion... | 22:26 |
efried | would it be so wrong for neutron/cyborg to simply use the result of `gethostname()` instead of CONF.host? | 22:27 |
*** jbernard has joined #openstack-nova | 22:27 | |
efried | I guess that becomes coupled to the virt driver implementation. | 22:27 |
efried | though arguably using CONF.host at all already is. | 22:27 |
efried | in that it at least will never work for ironic | 22:28 |
*** ccamacho has quit IRC | 22:29 | |
mriedem | we don't care about ironic for nested providers | 22:29 |
mriedem | or most things | 22:29 |
efried | I can totally see needing to do... *something* with bandwidth for ironic. | 22:30 |
efried | Though I imagine the providers would be shared in that case, not nested. | 22:30 |
efried | nevertheless we would have to discover the ironic nodes for aggregation purposes. | 22:30 |
efried | anyway... | 22:30 |
mriedem | idk | 22:33 |
mriedem | filtering hypervisors by service host seems useful in general so if it could be used here then i don't see a reason not to add that earlier than later | 22:33 |
mriedem | always nice to get to N+3 release from now and be like, "we have this problem, oh but we added X in N so we can use that" | 22:33 |
efried | okay. Not backportable tho | 22:33 |
mriedem | nope | 22:34 |
efried | For backport purposes, I guess since the scope is known and constrained to libvirt, we could just ask neutron to use gethostname() instead of CONF.host. | 22:34 |
*** tosky has joined #openstack-nova | 22:35 | |
mriedem | again, idk... | 22:35 |
mriedem | does neutron have a concept of workarounds options? | 22:35 |
mriedem | sounds like either way the list of 'used as' here should be updated https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.host | 22:35 |
mriedem | or something mentioned about how this is linked across services for certain features | 22:35 |
mriedem | > does neutron have a concept of workarounds options? - meaning, could neutron be configured to say if it should use CONF.host or gethostname() | 22:36 |
mriedem | for the sake of linking nested providers i mean | 22:36 |
efried | yup. I actually mentioned that (the DEFAULT.host help text) while you were offline. | 22:36 |
mriedem | b/c that would be backportable | 22:36 |
efried | what are the rules about neutron and n-cpu versions on a given host? | 22:37 |
efried | are they allowed to differ? by how much? | 22:37 |
mriedem | i would guess (1) yes they should be able to differ and (2) assume N-1 | 22:37 |
mriedem | that's part of the idea behind passing os-vif negotiated objects around so you can do rolling upgrades of those | 22:37 |
mriedem | over the rest api i mean | 22:37 |
efried | which one gets to be -1? or are they both allowed to be? | 22:38 |
efried | ugh, am I making sense? | 22:38 |
mriedem | i know what you're asking, but i don't have a good answer | 22:38 |
efried | k | 22:38 |
efried | well | 22:38 |
mriedem | i doubt that level of upgrade granularity is documented or tested anywhere | 22:38 |
mriedem | nova and neutron and cinder and keystone etc should all be able to work with each other at wildly different versions but we don't test upgrades that way | 22:39 |
mriedem | not because we can't | 22:39 |
efried | any neutron that does bw as currently written knows that it will work properly with `gethostname()` | 22:39 |
efried | and | 22:39 |
efried | it can condition the "new" thing simply on whether nova is exposing the microversion providing the new os-hypervisors qparam | 22:39 |
efried | try: | 22:40 |
efried | new thing | 22:40 |
efried | except NoSuchMicroversion: | 22:40 |
efried | old thing | 22:40 |
efried | unless neutron does better discovery than that (which it should be able to, but that doesn't mean it does) | 22:40 |
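The try/except outline above, fleshed out. This is a hedged sketch: the "new thing" is the service-host filter on os-hypervisors that, per the discussion, does not exist yet, so `lookup_by_service_host` and `NoSuchMicroversion` are hypothetical stand-ins; the "old thing" relies on the point that `gethostname()` matches the libvirt provider name.

```python
import socket
from typing import Callable, Optional, Tuple


class NoSuchMicroversion(Exception):
    """Hypothetical: nova does not offer the microversion with the new filter."""


def resolve_compute_rp(
    lookup_by_service_host: Callable[[str], Optional[str]],
    conf_host: str,
) -> Tuple[str, str]:
    """Return ("uuid", rp_uuid) via the new API, else ("name", rp_name)."""
    try:
        # New thing: ask nova which compute node belongs to this service host.
        rp_uuid = lookup_by_service_host(conf_host)
    except NoSuchMicroversion:
        rp_uuid = None
    if rp_uuid is not None:
        return ("uuid", rp_uuid)
    # Old thing: for libvirt the provider is named after the system hostname.
    return ("name", socket.gethostname())
```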
efried | Okay, | 22:40 |
mriedem | i would probably start with just a simple workaround option in neutron which is backportable without the microversion stuff | 22:40 |
mriedem | and the way to deprecate the workaround option in neutron is the microversion in nova when it's available | 22:41 |
efried | why is a workaround necessary? | 22:41 |
mriedem | because neutron has to do a thing based on how nova CONF.host is set yeah? | 22:41 |
efried | no | 22:41 |
*** lennyb has joined #openstack-nova | 22:41 | |
efried | CONF.host is only ever right by chance | 22:41 |
efried | `gethostname()` is always right | 22:41 |
efried | ...for the cases it cares about in existing code, which is what we care about for backportability. | 22:41 |
efried | Anyway, since gibi and sean-k-mooney aren't here right now, and I don't know whether there's a bug yet, I guess I'll throw out a ML post summarizing the discussion and let it fester from there. | 22:42 |
mriedem | ok | 22:46 |
melwitt | speaking of downstream bugs, | 22:47 |
melwitt | we hit a problem downstream where compute node orphan removal was happening and destroyed the compute node record but failed to delete the RP bc keystone or placement was down, | 22:48 |
melwitt | and after that, nova-compute could not start up again bc it was trying to create compute node record and then failed with 409 dupe from placement when trying to create the same provider | 22:49 |
mriedem | unrelated, i just wanted to say this makes us look dumb https://docs.openstack.org/api-guide/compute/general_info.html#relationship-with-volume-api | 22:50 |
melwitt | searching for ResourceProviderCreationFailed led me to mriedem's patch https://review.opendev.org/#/c/678100/1/nova/compute/manager.py@8332 where he posed the question, should we swap the destroy and RP delete ordering, | 22:50 |
mriedem | and this https://docs.openstack.org/api-guide/compute/server_concepts.html#server-creation | 22:50 |
melwitt | and I think the answer is yes | 22:50 |
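The reordering being agreed on here: delete the placement provider before destroying the compute node record, so a keystone or placement outage leaves both in place to retry later, instead of a destroyed record plus a leftover provider whose name makes re-creation fail with a 409. A minimal sketch; `compute_node.destroy()` and `delete_provider()` are hypothetical stand-ins, not nova's actual call sites.

```python
class PlacementUnavailable(Exception):
    """Hypothetical stand-in for keystone or placement being down."""


def remove_orphan_compute_node(compute_node, delete_provider) -> None:
    """Delete the resource provider first, then the compute node record."""
    # If this raises, nothing has been destroyed yet, so the next attempt
    # still sees the compute node and can retry the provider delete.
    delete_provider(compute_node.uuid)
    compute_node.destroy()
```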
melwitt | that's odd. I'm not sure why someone added empty docs. and I hope it wasn't me | 22:51 |
mriedem | the api-guide was imported i think so no i'm not saying you, i just was looking for some stuff and noticed these | 22:51 |
mriedem | these giant todo gaps in our user-facing docs are embarrassing | 22:52 |
mriedem | i'd rather we just delete them than leave them | 22:52 |
melwitt | yeah, I was gonna say, just remove em | 22:52 |
melwitt | that was a joke, sometimes I see a thing and be like, wtf who did this and find it was me | 22:52 |
mriedem | your provider issue is also because in queens we didn't link the ironic node id to the compute node uuid to the provider uuid, we started that in rocky | 22:54 |
mriedem | so your recourse in queens is deleting the old providers so compute on restart can re-create them | 22:54 |
mriedem | but you'll have to heal allocations | 22:54 |
mriedem | which isn't in queens | 22:54 |
mriedem | there is a pretty beefy ML thread about all of this orphaned provider stuff months back | 22:55 |
mriedem | i've been slowly polishing these turds | 22:55 |
melwitt | yeah, sean-k-mooney mentioned that | 22:55 |
melwitt | the turd polishing | 22:55 |
mriedem | http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007135.html is the tl;dr of the first mega beef thread | 22:56 |
mriedem | http://lists.openstack.org/pipermail/openstack-discuss/2019-June/thread.html#7097 | 22:56 |
melwitt | thank you. my brain is like about to explode so tl;dr is majorly appreciated | 22:56 |
mriedem | http://lists.openstack.org/pipermail/openstack-discuss/2019-November/thread.html#10642 is the post-ptg summary | 22:56 |
melwitt | I'm adding a comment to your review just so.... it's there | 22:57 |
mriedem | yeah so related to this, | 22:57 |
mriedem | https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:stable/train+topic:bug/1852610 | 22:57 |
mriedem | and https://review.opendev.org/#/q/status:open+project:openstack/osc-placement+branch:master+topic:story/2006779 | 22:58 |
melwitt | so you're thinking a change of the ordering and backport to queens is not gonna be viable? I guess you're saying that can't happen in rocky. dansmith said the same thing and I had no understanding of how, but I trust it. the messed up thing is they're also seeing a ResourceProviderCreationFailed on an overcloud (not ironic!) BUT now that I'm thinking more, that must be the service deletion case yeah? | 22:59 |
melwitt | that you're solving in that patch | 22:59 |
mriedem | we've backported several pieces of this, and i have those train backports up for part of it as well | 22:59 |
mriedem | we didn't backport the ironic node uuid = compute node uuid = provider uuid thing to queens because the initial patch had caused some issues and other fallout that i recently fixed as well | 23:00 |
mriedem | so backporting that ironic / compute node uuid sync stuff to queens would involve a few patches | 23:00 |
melwitt | we've got customers hitting the ironic case in queens and a non-ironic case in queens. and the former is the orphan cleanup and the latter must be a service deletion issue, *maybe* | 23:00 |
mriedem | otherwise service delete orphan stuff is bugs and i've been writing these as backportable changes | 23:01 |
mriedem | bugs since....pike | 23:01 |
mriedem | i think | 23:01 |
mriedem | well ocata really | 23:01 |
melwitt | ok, so you're saying the strategy should be to backport the uuid matcher patch rather than a split out "change the order" patch | 23:01 |
melwitt | but would that not require some kinda migration actions for already existing compute node records? | 23:02 |
*** xek_ has quit IRC | 23:05 | |
mriedem | no i'm saying trying to backport the uuid matcher stuff is full of dragons | 23:05 |
melwitt | ok | 23:06 |
mriedem | it makes some things simpler though, e.g. your issue where compute failed to restart b/c it couldn't create a new provider with the same name | 23:06 |
melwitt | noted | 23:06 |
melwitt | ok, I see. yeah, I think sean-k-mooney mentioned that too but I didn't understand it at the time | 23:06 |
mriedem | if the uuids are all synced, compute restarts, creates the compute node with the same uuid and finds the provider already exists with that uuid | 23:06 |
mriedem | not so lucky with libvirt though | 23:07 |
mriedem | b/c the uuid will be unique per compute node record on the same host | 23:07 |
mriedem | granted our code that checks to see if the provider already exists could be smarter | 23:07 |
mriedem | around here https://opendev.org/openstack/nova/src/tag/19.0.0/nova/scheduler/client/report.py#L568 | 23:08 |
melwitt | ok, hm that last part is interesting because that sounds like our second case (non-ironic) | 23:08 |
mriedem | if we found a provider with the same name we should probably just use it | 23:08 |
melwitt | yeah. I think fixing this would kill two birds with one stone | 23:09 |
melwitt | ironic and non-ironic, unless I'm missing something | 23:09 |
melwitt | *fixing it in that way | 23:09 |
*** sapd1 has joined #openstack-nova | 23:09 | |
mriedem | we can also easily find a provider with the same name using GET /resource_providers?name=<name> | 23:09 |
mriedem | if we find one, use it | 23:09 |
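The name-based fallback floated here, sketched with injected HTTP helpers: try `POST /resource_providers`, and only on a 409 look the provider up with `GET /resource_providers?name=<name>`, so the extra call happens just in the conflict case. `post` and `get` are hypothetical callables returning `(status, body)`. A 409 can also mean a UUID clash, in which case the name lookup may come back empty. Note dansmith's objection that follows: reusing a provider by name can let two computes with the same name silently share inventory and allocations.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import quote

Response = Tuple[int, dict]


def create_or_reuse_provider(
    post: Callable[[str, dict], Response],
    get: Callable[[str], Response],
    name: str,
    uuid: str,
) -> Optional[dict]:
    """Create a provider; on a conflict, fall back to the one with that name."""
    status, body = post("/resource_providers", {"name": name, "uuid": uuid})
    if status in (200, 201):
        return body
    if status != 409:
        return None
    # Extra round trip only on conflict: look the provider up by name.
    status, body = get(f"/resource_providers?name={quote(name)}")
    providers = body.get("resource_providers", []) if status == 200 else []
    return providers[0] if providers else None
```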
melwitt | aye | 23:09 |
*** dacbxyz has joined #openstack-nova | 23:09 | |
mriedem | maybe | 23:09 |
mriedem | i mean there could be dragons there too i'm not thinking about | 23:10 |
melwitt | that would be an extra call tho | 23:10 |
mriedem | only if you 409 | 23:10 |
melwitt | right | 23:10 |
melwitt | oh | 23:10 |
mriedem | because clearly we aren't getting this back when we hit this name_conflict = 'Conflicting resource provider name:' | 23:10 |
melwitt | ah yeah | 23:10 |
mriedem | or maybe we are getting something like that but the message changed in placement, idk | 23:10 |
mriedem | see the todo from efried | 23:11 |
melwitt | ok | 23:11 |
melwitt | so maybe I can try a patch for this and get efried to look for dragons | 23:12 |
efried | (I haven't been following the conversation, and need to split real soon, but sure, add me to a patch) | 23:12 |
melwitt | because this looks like it could be a small backportable change that would save us in both the ironic and non-ironic cases | 23:12 |
*** awalende has joined #openstack-nova | 23:14 | |
melwitt | thanks for the help on this | 23:15 |
dansmith | I'm not sure how I feel about reusing the provider by name | 23:15 |
melwitt | ah dammit | 23:15 |
dansmith | we're kinda making that a primary key because the name has to be unique, | 23:15 |
dansmith | but if you think about compute nodes in split brain, | 23:15 |
dansmith | they're going to fight over the same provider | 23:15 |
dansmith | and with conductor groups in ironic, | 23:15 |
dansmith | or two ironics and separate computes, | 23:16 |
dansmith | if you ended up with two nodes of the same name, nova isn't going to know they're different and is going to fight over them | 23:16 |
dansmith | and by fight, I mean silently overwrite inventory | 23:16 |
dansmith | if that provider has allocations, things are going to get all messed (the eff) up I think | 23:16 |
*** dacbxyz has quit IRC | 23:16 | |
dansmith | moving the ironic node id to be the provider id is the right move (which we've already done) | 23:17 |
dansmith | so I dunno, maybe that means for <rocky, the name hack works, but... it's kinda contrary to the point of the provider id | 23:17 |
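A toy illustration of dansmith's split-brain point, assuming each compute simply replaces the inventory of "its" provider on every report. Two distinct nodes that resolve to the same name collapse onto one provider when it is looked up by name. Keying by node UUID keeps them apart. The data and `report_inventory` helper are hypothetical.

```python
def report_inventory(providers: dict, key: str, inventory: dict) -> None:
    # Each compute overwrites whatever provider is stored under its key.
    providers[key] = dict(inventory)


# Two different nodes (different UUIDs) that both report the name "compute-1",
# e.g. after some DNS/DHCP breakage.
node_a = {"uuid": "aaaa", "name": "compute-1", "inventory": {"VCPU": 64}}
node_b = {"uuid": "bbbb", "name": "compute-1", "inventory": {"VCPU": 8}}

by_name, by_uuid = {}, {}
for node in (node_a, node_b):
    report_inventory(by_name, node["name"], node["inventory"])
    report_inventory(by_uuid, node["uuid"], node["inventory"])

print(by_name)  # {'compute-1': {'VCPU': 8}}, node_a's 64 VCPUs were overwritten
print(by_uuid)  # {'aaaa': {'VCPU': 64}, 'bbbb': {'VCPU': 8}}

# If node_a's instances already consume 40 VCPU against the shared provider,
# usage now exceeds the (smaller) inventory it ends up with.
used_by_node_a = 40
print(used_by_node_a > by_name["compute-1"]["VCPU"])  # True
```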
dansmith | melwitt: I think I hinted at this in my email on the internal thread about this | 23:17 |
melwitt | what's the right move in the non-ironic case then? | 23:17 |
*** slaweq has quit IRC | 23:17 | |
dansmith | for the service delete? | 23:18 |
dansmith | that's probably even worse really, | 23:18 |
melwitt | I assume it's service delete. I don't know for sure how they got into that state | 23:18 |
dansmith | because if you end up with two computes with the same name due to some dns or dhcp breakage, they'll take over each others' providers (and allocations) which would be helzabad | 23:18 |
melwitt | that thread spun off from the ironic bz | 23:18 |
*** awalende has quit IRC | 23:19 | |
dansmith | you will have other issues if you have a name clash, obviously, but if you mix allocations from two computes together, or have one go negative because it's a smaller compute, just... hard to debug and fix | 23:19 |
dansmith | so anyway, I dunno | 23:19 |
dansmith | not saying don't do it.. glad we don't have to do it on master, but I'm not super confident that it's a great idea for <rocky either | 23:20 |
melwitt | ok... sigh.. so with the moving node id to be provider id, would that not require a migration step if we were able to backport to queens? | 23:20 |
dansmith | that only affects ironic | 23:20 |
melwitt | yeah sorry. going back to the ironic thing again, to fix that one in that way, would it require a migration step? | 23:20 |
dansmith | as I said on that thread, you'd have to make sure all the computes rolled to that change atomically, and yeah I dunno how the allocations get or got moved with that when we transitioned, | 23:21 |
dansmith | but not reasonable for a backport either way | 23:21 |
melwitt | I can and will publish the workaround steps but given the number of customers hitting it, I dunno, thinking about trying the backport | 23:21 |
dansmith | reversing the order of provider delete works around the ironic issue doesn't it? | 23:21 |
melwitt | yeah it does | 23:22 |
dansmith | that's reasonable, backporting the node uuid thing is not reasonable I think | 23:22 |
melwitt | ok. got it | 23:22 |
melwitt | thanks | 23:22 |
melwitt | the non-ironic thing I think needs more information because I don't know how it got into the state. and no idea how to workaround bc they can't migrate any instances away from it because all migrations fail with ResourceProviderCreationFailed | 23:24 |
melwitt | and there's no heal_allocations so they can't delete the allocations and RP and restore allocations | 23:24 |
dansmith | well, they can | 23:24 |
dansmith | using osc-placement | 23:25 |
dansmith | I mean, you can script that for them | 23:25 |
dansmith | or backport heal allocations, that's much less scary I think | 23:25 |
melwitt | yeah, that's what I'm thinking | 23:25 |
dansmith | heal allocations will just fix it so you can delete everything, let it create and then heal right? | 23:25 |
melwitt | osc-placement I didn't see a way to update a RP with a different uuid | 23:25 |
dansmith | you mean the nova-manage healer thing right? | 23:25 |
melwitt | I do yeah | 23:26 |
dansmith | i.e. the sunday morning public access TV version of nova.. BAH HALED! | 23:26 |
melwitt | lol | 23:26 |
dansmith | melwitt: I mean with osc-placement you can create and delete allocations, IIRC, so you can save them off, then re-add them after it creates the provider afresh | 23:26 |
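A sketch of the scripted workaround dansmith suggests, under stated assumptions. The sequence is: save each consumer's allocations (`openstack resource provider allocation show <consumer>`), delete them (`openstack resource provider allocation delete <consumer>`), delete the stale provider (`openstack resource provider delete <uuid>`), let nova-compute recreate it, then re-add the allocations against the new UUID. The helper below only builds the final `allocation set` commands. `restore_commands` and the shape of `saved` are hypothetical. The `--project-id`/`--user-id` options assume placement microversion 1.8 or later.

```python
import shlex


def restore_commands(saved: dict, old_rp: str, new_rp: str) -> list:
    """Build `allocation set` commands that point saved allocations at new_rp.

    saved: {consumer_uuid: {"project_id": ..., "user_id": ...,
                            "allocations": {rp_uuid: {resource_class: amount}}}}
    (a hypothetical shape, captured before the allocations were deleted)
    """
    cmds = []
    for consumer, entry in sorted(saved.items()):
        args = ["openstack", "--os-placement-api-version", "1.8",
                "resource", "provider", "allocation", "set", consumer,
                "--project-id", entry["project_id"],
                "--user-id", entry["user_id"]]
        # `allocation set` replaces all of a consumer's allocations, so every
        # provider is re-listed; only the stale provider UUID is swapped.
        for rp, resources in sorted(entry["allocations"].items()):
            target = new_rp if rp == old_rp else rp
            amounts = ",".join(f"{rc}={amt}"
                               for rc, amt in sorted(resources.items()))
            args += ["--allocation", f"rp={target},{amounts}"]
        cmds.append(shlex.join(args))
    return cmds


if __name__ == "__main__":
    saved = {"c1": {"project_id": "p", "user_id": "u",
                    "allocations": {"old-rp": {"VCPU": 2, "MEMORY_MB": 512}}}}
    for cmd in restore_commands(saved, "old-rp", "new-rp"):
        print(cmd)
```

Printing the commands rather than running them leaves room to review each one before touching a production placement service.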
melwitt | I wish the command were healer, that would be more fun | 23:27 |
melwitt | dansmith: oh geez. yes, that's true | 23:27 |
melwitt | I didn't even think about that, guh | 23:27 |
dansmith | https://makeameme.org/meme/come-to-me-551mb7 | 23:27 |
melwitt | lol oh I wish we could put that on the docs page for it | 23:28 |
efried | mriedem, sean-k-mooney, gibi: http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044.html | 23:28 |
efried | and I'm out | 23:28 |
melwitt | yeah, I'm gonna just backport heal_allocations instead because I'm just imagining support doing that on a one-by-one instance basis. oof | 23:29 |
melwitt | er, one by one RP | 23:29 |
dansmith | melwitt: you could, oh I dunno, write them a script :) | 23:30 |
dansmith | but whatever :) | 23:30 |
melwitt | oh | 23:30 |
dansmith | osc is like lovin up on some scriptability amirite? | 23:30 |
melwitt | lol | 23:30 |
melwitt | sure. | 23:30 |
dansmith | anyway, I'm very happy to leave that decision and work to you | 23:31 |
dansmith | just sayin' | 23:31 |
melwitt | don't worry, they're all assigned to e | 23:31 |
melwitt | *me | 23:31 |
melwitt | unfortunately for them | 23:32 |
dansmith | praise be | 23:32 |
*** slaweq has joined #openstack-nova | 23:35 | |
*** slaweq has quit IRC | 23:40 | |
*** slaweq has joined #openstack-nova | 23:44 | |
mriedem | i haven't read all the way back, but regarding "if you ended up with two nodes of the same name, nova isn't going to know they're different and is going to fight over them" - with ironic that's not possible (since rocky) since there is a unique index on the compute_nodes.uuid | 23:47 |
*** tbachman has quit IRC | 23:48 | |
*** slaweq has quit IRC | 23:48 | |
*** mkrai has quit IRC | 23:48 | |
melwitt | yeah we were talking queens and non-ironic as well | 23:49 |
melwitt | I was trying to get together a game plan for each case | 23:50 |
mriedem | well, just give this to whoever? https://docs.openstack.org/nova/latest/admin/troubleshooting/orphaned-allocations.html | 23:51 |
mriedem | i wrote that b/c of all of this | 23:51 |
melwitt | we don't have heal_allocations yet. but I'm gonna backport it after this convo | 23:51 |
mriedem | help with my service delete backports | 23:51 |
melwitt | in queens | 23:51 |
mriedem | and then review my wip thing which is the last piece | 23:51 |
eandersson | Awesome | 23:51 |
mriedem | eandersson: i think she's talking about internal to rhosp | 23:52 |
mriedem | so you're sol | 23:52 |
eandersson | awh | 23:52 |
melwitt | yeah, I have to get these things unwedged first | 23:52 |
mriedem | sylvain has this audit thing he's been working on for about 3 years as well | 23:52 |
melwitt | and I don't know for sure whether service delete caused the issue, but I think there's a fair chance | 23:53 |
mriedem | between contract negotiations and skiing | 23:53 |
*** mdbooth has quit IRC | 23:53 | |
eandersson | tbh we are almost always backporting these things ourselves, but much better to have it officially backported :p | 23:53 |
mriedem | eandersson: you know you could propose backports *upstream* | 23:53 |
eandersson | So much work to do, so little time :D | 23:53 |
mriedem | or at least be like, "can you guys backport x because that would be gr8 lol" | 23:54 |
eandersson | I am still top1 of lines contributed for U :D | 23:54 |
melwitt | yeah, just hang on, I got involved with these bugs within the last couple of weeks. I didn't understand them before | 23:54 |
mriedem | if operators speak up about needing shit we usually jump on it a bit faster | 23:54 |
mriedem | did you know belmiro has a dedicated line to dansmith's office? | 23:54 |
eandersson | I tend to backport things, but a lot of the nova backports are high complexity due to the number of changes between master and stable branches. | 23:55 |
*** mdbooth has joined #openstack-nova | 23:55 | |
eandersson | So a lot of my effort goes into projects I already understand (e.g. designate, senlin etc). | 23:55 |
eandersson | But I do try to report them here =] | 23:56 |
melwitt | all of the compute/service/host/node/RP/allocation intermingling stuff has not been my area of expertise | 23:56 |
mriedem | it's confusing | 23:56 |
melwitt | now that I have some idea wtf is going on, sure I will help with the backports and all that | 23:57 |
eandersson | Sometimes I wish I wasn't a manager. Would give me more time behind the keyboard. =] | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!