Thursday, 2019-11-21

<openstackgerrit> Dustin Cowles proposed openstack/nova-specs master: Update provider config spec for identification conflicts
<openstackgerrit> Merged openstack/nova stable/train: Don't delete compute node, when deleting service other than nova-compute
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Handle target host cross-cell cold migration in conductor
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Validate image/create during cross-cell resize functional testing
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add zones wrinkle to TestMultiCellMigrate
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add negative test for cross-cell finish_resize failing
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add negative test for prep_snapshot_based_resize_at_source failing
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add confirm_snapshot_based_resize_at_source compute method
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add ConfirmResizeTask
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add confirm_snapshot_based_resize conductor RPC method
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize from the API
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize_at_dest compute method
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Deal with cross-cell resize in _remove_deleted_instances_allocations
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add finish_revert_snapshot_based_resize_at_source compute method
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add RevertResizeTask
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize conductor RPC method
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Revert cross-cell resize from the API
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize while deleting a server
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add archive_deleted_rows wrinkle to cross-cell functional test
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Add CrossCellWeigher
<gmann> efried: any idea or have you seen 'openstack:' as resource provider name in
<gmann> this is the grenade job on octavia which is failing due to that when moving to py3.  01:16
<openstackgerrit> Merged openstack/nova stable/pike: Only nil az during shelve offload
<openstackgerrit> OpenStack Proposal Bot proposed openstack/nova master: Imported Translations from Zanata
<bauzas> good morning Nova  08:10
<gibi> bauzas: good morning  08:16
<openstackgerrit> Stephen Finucane proposed openstack/nova master: docs: Rewrite quotas documentation
<stephenfin> bauzas: Morning. Any chance you could look at one or two of the "die, nova-network, die" patches I have up today? Like, say, this really easy one ?  09:26
<bauzas> lol ok  09:26
* stephenfin has logged one #success already this week (for DevStack -> Python 3). Would be nice to log another one  09:27
<openstackgerrit> Tushar Patil proposed openstack/nova-specs master: Allow compute nodes to use DISK_GB from shared storage RP
<openstackgerrit> Stephen Finucane proposed openstack/nova master: zuul: Remove unnecessary 'USE_PYTHON3'
<openstackgerrit> Sylvain Bauza proposed openstack/nova stable/stein: Don't delete compute node, when deleting service other than nova-compute
<openstackgerrit> Sylvain Bauza proposed openstack/nova stable/rocky: Don't delete compute node, when deleting service other than nova-compute
<openstackgerrit> Sylvain Bauza proposed openstack/nova stable/queens: Don't delete compute node, when deleting service other than nova-compute
<yaawang> mdbooth: Hello, can you review the spec again? I've updated it :)
<kashyap> yaawang: Hi, on the "no_performance_impact" name -- I still find it too broad and sweeping  10:52
<kashyap> yaawang: Thanks for addressing my feedback, though!  10:52
<kashyap> I agree, we don't want to expose "hypervisor"-specific features.  (BTW, we are abusing the term "hypervisor" here to include QEMU; the term normally includes KVM/Kernel area only.)  10:53
<kashyap> Maybe "no_migration_perf_impact"  10:53
<kashyap> But just plain "no_performance_impact" is just an _awful_ name (Cc: mdbooth.)  At least make it: "no_migration_perf_impact"  10:57
<yaawang> kashyap: Agree, "no_performance_impact" sounds like it includes both live migration and cold migration. How about "no_perf_impact_live_migration"?  11:10
<kashyap> yaawang: Yeah, I thought of including "live" as well; almost good - but reverse it: "no_live_emigration_perf_impact"  11:12
<kashyap> Typo: s/emigration/migration/  11:12
<kashyap> yaawang: Commented on the change.  I'd say, let's go with the above ("no_live_migration_perf_impact").  11:14
<yaawang> kashyap: Thanks, need mdbooth to post his comment.  11:17
<openstackgerrit> Balazs Gibizer proposed openstack/nova stable/pike: Explicitly fail if trying to attach SR-IOV port
<gibi> bauzas: you can send this to the gate too
<openstackgerrit> Wei Hui proposed openstack/nova master: bugfix device_type=type-PCI passthrough failed
<gibi> mriedem, stephenfin: thanks for the laugh
<gibi> stephenfin: btw you have now a lot of +2s on your nova-net removal series :)  12:41
<efried> gmann: looking...  12:42
<mriedem> gibi: heh, anytime  12:49
<efried> gmann: that log file isn't loading up for me :(  12:51
<efried> It would be weird for a provider managed by nova to have 'openstack:' in its name. If it was made by some other service, maybe, but I've never heard of it.  12:51
<openstackgerrit> Surya Seetharaman proposed openstack/nova master: Include removal of ephemeral backing files in the image cache manager
<johnthetubaguy> stephenfin: I just had a look at the network API removal... are we not adding a microversion to signal when we removed the API? I think that would be more consistent with our rules  12:58
<johnthetubaguy> I am guessing I missed about a year of conversation, so probably missed the reasoning  12:58
<mriedem> fun gate bug i finally wrote up after rechecking several times
<openstack> Launchpad bug 1853453 in tempest "test_shelve_volume_backed_instance intermittently fails guest ssh with dhcp lease fail" [Undecided,New]  13:00
<mriedem> looks like this only hits in multinode jobs,  13:00
<mriedem> and looking at a recent failure, we shelve from the primary node and unshelve on the subnode  13:00
<mriedem> i wonder if the snapshot is no good in some cases and unshelving on another node causes some kind of issue  13:00
<mriedem> oh nvm there is no snapshot, it's volume-backed  13:06
<gibi> efried: do you know if provider_tree in the report client will contain RPs associated to the compute RP via aggregate? this comment states it does
<gibi> efried: but I did not find in the impl where we query for those RPs from placement  13:12
<efried> gibi: it should, yes; the code has been written that way for a while, but we've never had a real scenario that uses it.  13:12
<efried> hold on, let me find you the code (It's somewhere under _refresh_associations...)  13:13
<efried> ah, it looks like we specifically target only sharing providers (we specifically filter on MISC_SHARES)  13:13
<efried> which makes sense  13:13
<efried> we wouldn't want to just arbitrarily pick up anything in an aggregate.  13:14
<gibi> efried: cool, MISC_SHARES... is what I'm looking for. Do you have a link?  13:14
<efried> gibi: ish  13:15
<gibi> efried: thanks a lot!  13:15
<gibi> shilpasd: ^^  13:15
<bauzas> johnthetubaguy: we deprecated the APIs so AFAIK we don't need a microversion  13:15
<efried> gibi: that comment you pointed out should really say *sharing* providers.  13:15
<gibi> I can fix that quickly..  13:16
<efried> bauzas: I see johnthetubaguy's point, though. If you come into one cloud and try to use nova-net at microversion 2.58 and it works, and then you come into a different cloud and it fails at the same microversion...  13:16
<bauzas> efried: sure, it's an interop issue  13:16
<johnthetubaguy> efried: I am more thinking, the SDK and CLI should be able to know when its missing, like with all our other APIs  13:17
<bauzas> if you use nova-network since 2.58, you are a bit having problems  13:17
<johnthetubaguy> to be clear, I think we should remove it, and it should return 410  13:17
<efried> Either way, I think that ship has sailed, cause we've already ripped a bunch of stuff out. Pretty sure the point of no return was in train, too.  13:17
<bauzas> also, if you use OSC, you won't see the APIs be deleted  13:17
<johnthetubaguy> I just think we should have a microversion that tells a user, if that is here, you know those APIs are gone  13:17
<johnthetubaguy> we can add it as the last patch in the series, and I think that kinda works, its just a handy hint  13:18
<johnthetubaguy> it makes the docs easier though, if microversion X is available, you know this API will always return HTTP gone  13:18
* johnthetubaguy end rant  13:18
<zigo> bauzas: Hi there! I'm trying to start an instance with a GPU, and I get this:  13:19
<zigo> $ openstack server create --image bionic-server-cloudimg-amd64_20190726_GPU --nic net-id=bdb-blue-int01 --key-name yubikey-zigo --flavor cpu4-ram12-disk20-gpu-nvidia-p1000 --availability-zone=AZ2 zigo-gpu  13:19
<zigo> PCI alias nvidia-p1000 is not defined (HTTP 400) (Request-ID: req-b0514752-9e50-4e9b-a085-6ef9169b1d59)  13:19
<zigo> bauzas: Though I do have this in nova.conf:  13:19
<zigo> What am I missing?  13:19
<openstackgerrit> Elod Illes proposed openstack/nova stable/pike: Explicitly fail if trying to attach SR-IOV port
<bauzas> johnthetubaguy: well, we could indeed  13:19
<bauzas> once all the APIs are giving HTTP 410  13:20
<johnthetubaguy> bauzas: it is a nice to have, really its the docs that worried me  13:20
<bauzas> stephenfin: ^  13:20
<johnthetubaguy> zigo: is that in the API and Compute nova.conf?  13:20
<zigo> johnthetubaguy: Yeah...  13:20
<zigo> johnthetubaguy: It's ok to have it multiple times:  13:21
<johnthetubaguy> hmm, I think that is how I screwed it once  13:21
<zigo> Cause I have 2 different boards in this cloud ...  13:21
<zigo> johnthetubaguy: You mean it should be only in the compute?  13:21
<johnthetubaguy> no, sorry, I was meaning it needs to be everywhere, almost  13:22
<gmann> johnthetubaguy: bauzas efried : and that microversion is just to notify the users that nova-net APIs (including url) are gone and we are not maintaining those APIs for older microversions right ?  13:22
<johnthetubaguy> gmann: +1  13:22
<bauzas> gmann: yup, just a signal  13:22
<gmann> ok. that makes sense.  13:22
<gibi> works for me  13:23
<bauzas> zigo: not sure I understand  13:23
<bauzas> zigo: you need to set the alias value *once*  13:23
<zigo> bauzas: Yeah, but I have one Nvidia p1000 and one t4 (on 2 different compute nodes).  13:24
<efried> johnthetubaguy: I think the patch to make those 410s already flew...  13:24
<efried> johnthetubaguy: okay, patches plural. And yes, they merged in train.  13:24
<zigo> Therefore, I have:  13:24
<zigo> Is this correct?  13:24
<efried> johnthetubaguy: so really we could do that microversion any time?  13:25
<efried> and the rest is just cleanup?  13:25
<johnthetubaguy> efried: yes, except for the docs needing to be updated to reflect how you find out when its gone  13:25
<efried> okay. Pretty sure the docs were the first thing stephenfin hit -- but yeah, they wouldn't mention a microversion cutover if we didn't do that.  13:26
<johnthetubaguy> efried: we can go back and add that for sure  13:26
<efried> So you're looking for a patch that creates a new "signal microversion" and updates the docs accordingly.  13:26
<efried> that makes sense ++  13:26
<johnthetubaguy> yeah, thanks, that clears it up in my head too  13:27
<efried> you can still have the interop snafu I mentioned earlier. But at least in that scenario your "signal" is the 410.  13:27
<sean-k-mooney> zigo: the nvidia-t4 apparently supports sriov so you have to set the device_type to type-PF  13:28
<johnthetubaguy> efried: yeah, its that you could have known better  13:28
<sean-k-mooney> stephenfin: ^ that is the thing you were writing the docs patch for right  13:28
<openstackgerrit> Balazs Gibizer proposed openstack/nova master: Specify what RPs _ensure_resource_provider collects
<zigo> sean-k-mooney: The issue is with my p1000, the t4 looks like it's working ...  13:29
<zigo> But thanks, I'll try.  13:29
<gibi> efried: fixed up the comment in _ensure_resource_provider_collects
<sean-k-mooney> zigo: what is the issue you are having specifically  13:29
* gibi needs to go afk for a while  13:29
<openstackgerrit> Matt Riedemann proposed openstack/nova master: Force config drive in nova-next multinode job
<zigo> sean-k-mooney: PCI alias nvidia-p1000 is not defined (HTTP 400)  13:30
<zigo> (when trying to spawn my instance)  13:30
<gmann> efried: but from nova ussuri onwards, any microversion will be no-nova-net-api for all clouds, so interop things need to be taken care of by discoverability of the new microversion we will introduce, instead of handling it in code.  13:30
<sean-k-mooney> have you set the alias on both the compute nodes and the controller nodes  13:30
<zigo> sean-k-mooney: Yeah, I did that ...  13:30
<zigo> I did it with puppet, so normally, it will have restarted all nova services.  13:30
<sean-k-mooney> and you have it set in the [pci] section not default  13:31
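The alias setup being discussed can be sketched as a nova.conf fragment. This is a hypothetical illustration only -- zigo's actual paste was not captured in the log, the vendor/product IDs are examples, and per sean-k-mooney's point the SR-IOV-capable T4 would need device_type type-PF:

```ini
# Hypothetical [pci] aliases for two different boards; the IDs are illustrative.
# The options must live under [pci] (not [DEFAULT]) and must be present on both
# the API/controller nodes and the compute nodes.
[pci]
alias = { "vendor_id": "10de", "product_id": "1cb1", "device_type": "type-PCI", "name": "nvidia-p1000" }
alias = { "vendor_id": "10de", "product_id": "1eb8", "device_type": "type-PF", "name": "nvidia-t4" }
```

The compute nodes additionally need a matching passthrough_whitelist entry in [pci] for the devices to be reported to the scheduler at all.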
<gmann> efried: RE: provider as 'openstack:'  these are the logs it returns from 60_nova/ for VCPU etc:
<gmann> and that is happening when the octavia grenade job is moving to py3. i am not sure what it has to do with py3 things.  13:32
<zigo> sean-k-mooney: That's all I did yes...  13:32
<gmann> mriedem: have you seen this before in any grenade job -
<zigo> Oh, another thing which is very annoying, I keep having in my logs:  13:33
<zigo> Instance 2aab6469-4292-4d04-80de-2ae2a7174b3a has been moved to another host. There are allocations remaining against the source host that might need to be removed: {'resources': {'DISK_GB': 80, 'MEMORY_MB': 24576, 'VCPU': 8}}.  13:33
<zigo> Many of these ...  13:34
<zigo> Is this a known issue with Rocky?  13:34
<sean-k-mooney> that usually means you have not completed migrations using resize-verify  13:34
<zigo> sean-k-mooney: It's mostly all live migrations.  13:34
<efried> gmann: those CI results are still not loading up for me :(  13:34
<zigo> sean-k-mooney: IMO, the only thing that remains is the placement record...  13:35
<efried> gmann: if you have them open, maybe you could pastebin the relevant chunk?  13:35
<zigo> I could easily write a clean-up script I suppose.  13:35
<sean-k-mooney> efried: loaded for me fine  13:35
<sean-k-mooney> zigo: it should be updated automatically  13:36
<efried> gmann: ah, it's working now.  13:36
<mriedem> gmann: the broken pipe? no  13:36
<gmann> it returns 'openstack:' as provider name from 60_nova/
<mriedem> there is a broken pipe right before that  13:39
<mriedem> 2019-11-20 17:52:23.572 | +++ /opt/stack/new/grenade/projects/60_nova/ :   head -n1 2019-11-20 17:52:23.573 | +++ /opt/stack/new/grenade/projects/60_nova/ :   openstack resource provider list -f value 2019-11-20 17:52:23.573 | +++ /opt/stack/new/grenade/projects/60_nova/ :   cut -d ' ' -f 1 2019-11-20 17:52:24.613 | Exception raised: [Errno 32] Broken pipe  13:39
<mriedem> so parsing the output is failing  13:39
<mriedem> openstack resource provider list -f value  13:40
<efried> gmann: yeah, this is gonna have nothing to do with a provider named 'openstack:'. It looks to me like we're parsing error output from the openstack command (which would start with 'openstack: $something_went_wrong')  13:40
<mriedem> provider=$(openstack resource provider list -f value | head -n1 | cut -d ' ' -f 1)  13:40
<mriedem> that's the command that's failing  13:40
<mriedem> well, parsing that's failing  13:41
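The failure mode can be reproduced without a cloud. In the sketch below, `fake_openstack` is a made-up stand-in for `openstack resource provider list -f value` when the osc-placement plugin is missing; the error line is emitted on stdout here (which is what the job output suggests happened), so the grenade-style pipe dutifully "parses" it:

```shell
# fake_openstack stands in for the plugin-less `openstack` client; this is
# the complaint osc prints for an unknown subcommand, sent to stdout here.
fake_openstack() {
    echo "openstack: 'resource provider list -f value' is not an openstack command. See 'openstack --help'."
}

# The grenade-style parsing: first line, first space-separated field.
provider=$(fake_openstack | head -n1 | cut -d ' ' -f 1)
echo "$provider"
```

The result is exactly the bogus provider name `openstack:` seen in gmann's logs.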
<mriedem> if there was just one provider we could do:  13:41
<mriedem> provider=$(openstack resource provider list -f value -c uuid)  13:41
<mriedem> but if it's a multinode grenade job then there will be more than one and that doesn't work  13:41
<efried> mriedem: well, we should do that anyway, and head -n1 it  13:42
<efried> i.e. don't do the cut  13:42
<efried> not that that would help here, because clearly the command is failing.  13:42
<efried> But why is the error output going to stdout rather than stderr?  13:42
<mriedem> if that provider list command allowed passing a --name for filtering, we could pass the local fqdn to get 1 result back...  13:43
<efried> Whole point of stderr is so exactly this doesn't happen, and you can see what actually went wrong.  13:43
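The more robust parsing efried suggests (print only the uuid column and take the first row, skipping the cut) can be sketched with canned output; `fake_rp_list` is a made-up stand-in for `openstack resource provider list -f value -c uuid` on a multinode job, and the UUIDs are invented:

```shell
# Stand-in for `openstack resource provider list -f value -c uuid`:
# one provider UUID per line, one line per compute node.
fake_rp_list() {
    echo "6f2b4a2c-59a1-4b52-8c0d-1a2b3c4d5e01"
    echo "6f2b4a2c-59a1-4b52-8c0d-1a2b3c4d5e02"
}

# head -n1 grabs an arbitrary provider; with no cut, an unexpected error
# line would at least survive intact instead of being truncated to one word.
provider=$(fake_rp_list | head -n1)
echo "$provider"
```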
<zigo> mriedem: A much nicer way using : provider=$(openstack resource provider list --format csv | q -H -d, "SELECT uuid FROM - LIMIT 1")  13:43
<efried> mriedem: to that point, it looks like we don't care *which* provider we're grabbing? That seems... weird.  13:43
<zigo> (q-text-as-data is such a nice tool...)  13:43
<mriedem> efried: we don't,  13:43
<mriedem> it's a smoke test to make sure that we can save off some inventory before upgrading and that after the upgrade it's still there  13:44
<gmann> picking up anything should be fine  13:44
<zigo> mriedem: Then you could do: provider=$(openstack resource provider list --format csv | q -H -d, "SELECT uuid FROM - WHERE name='something-you-want'")  13:45
<efried> anyway, the problem here seems to be that the openstack command -- the first thing in the pipe -- is failing, printing its error to stdout.  13:45
<mriedem> zigo: or i could just do provider=$(openstack resource provider list -f value -c uuid --name `hostname -f`)  13:47
<mriedem> and not rely on pipes and other tooling  13:47
<mriedem> but that provider list command doesn't support --name (yet - that's easy to add)  13:47
<gmann> API has the name filter ?  13:48
<efried> one way to get output like 'openstack: $stuff' is if the openstack command doesn't exist. But that output goes to stderr like it should.  13:49
<sean-k-mooney> gmann: its how neutron identifies the compute node resource provider without needing to know the compute node uuid  13:50
<efried> I can't find anything in the code that's joining stderr to stdout. Unless the job itself is doing that.  13:51
<sean-k-mooney> it looks up the RP by hostname  13:51
<efried> Okay, apparently I'm the only one concerned about the fact that `openstack resource provider list` is producing bogus output in gmann's case, so I must be misunderstanding what we're actually trying to solve here. /me stfu, call if you need me.  13:52
<gmann> sean-k-mooney: and there it is working fine? failure case of octavia grenade job on py3.  13:53
<sean-k-mooney> they do that in code not via osc  13:54
<sean-k-mooney> is looking up the provider by hostname all that is breaking the job?  13:59
<sean-k-mooney> it would be quick to fix osc-placement to support that but equally quick to just do it with curl  13:59
<gmann> sean-k-mooney: fixing osc might take time with release etc until the octavia job can install it from source.  14:01
<sean-k-mooney> do we know what caused the broken pipes?  14:01
<haleyb> sean-k-mooney: would adding osc-placement to requirements in octavia fix it as well?  14:02
<sean-k-mooney> i think mriedem said list does not currently support --name  14:02
<sean-k-mooney> i was looking at the grenade job logs by the way gmann haleyb do you have the link to the octavia job  14:04
<johnsom> haleyb: Yes, that is the error output you are seeing. Installing osc-placement should fix it.  14:04
<openstackgerrit> Merged openstack/nova stable/pike: Delete instance_id_mappings record in instance_destroy
<openstackgerrit> Kashyap Chamarthy proposed openstack/nova master: libvirt: Bump MIN_{LIBVIRT,QEMU}_VERSION for "Ussuri"
<sean-k-mooney> ah right yes "'resource provider list -f value' is not an openstack command"  14:05
<kashyap> stephenfin: ^^ Fixed the functional test (forgot to run `tox -e functional-36`, bad me)  14:05
<johnsom> haleyb: The question is which devstack plugin has the missing requirement  14:05
<sean-k-mooney> is because osc-placement is not installed  14:05
<gmann> johnsom: haleyb: it is the parsing of the command that is failing, not that the osc command needs more installation etc  14:05
<kashyap> stephenfin: Thanks for the earlier review :-)  14:06
<kashyap> Uh, seems like I need a rebase...  14:06
<sean-k-mooney> johnsom: i guess the octavia one  14:07
<gmann> this command we need to adjust to get the first RP-
<sean-k-mooney> but honestly it might be better for devstack to install osc-placement if placement is installed  14:07
<johnsom> sean-k-mooney: Octavia devstack plugin isn't running that command.  14:07
<gmann> command i mean parsing logic  14:07
<sean-k-mooney> oh ok  14:07
<johnsom> It is openstack/grenade  14:08
<johnsom> line 57  14:08
<sean-k-mooney> gmann: im confused on the octavia patch you are linking to grenade  14:08
<sean-k-mooney> is this a bug? there are other fialng jobs too  14:09
<sean-k-mooney> *failing jobs  14:09
<sean-k-mooney> are ye only looking at the grenade one at the moment  14:09
<gmann> yeah grenade one only i checked  14:10
<sean-k-mooney> well we could have grenade install it or as i said we could have devstack install it if placement is installed  14:10
<sean-k-mooney> grenade is using it so its reasonable for it to install its own dependencies  14:11
<johnsom> Yeah. It's odd that grenade doesn't have a requirements.txt though it obviously has requirements in its scripts  14:11
<sean-k-mooney> well grenade is not a python project  14:11
<sean-k-mooney> its almost all bash so you would not pip install it  14:12
<gmann> osc-placement installation is not the issue here.  14:13
<johnsom> Ha, yeah, I just noticed that. I guess you know now how much time I have looked at grenade.... lol  14:13
<johnsom> gmann: Yes it is. If that is not installed the first output of OSC is "openstack:" which the script tries to parse.  14:14
<sean-k-mooney> right, as your irccloud link shows, the message is "openstack: 'resource provider list -f value' is not an openstack command. See 'openstack --help'."  14:15
<kashyap> Is a rebase really necessary here? -
<sean-k-mooney> gmann: interesting  14:20
johnsomI wonder if it is using the py2 python-openstackclient14:25
efriedjohnsom: I had thought so too, but similar commands just above that seem to be working fine14:26
johnsomI see most of OSC also installed in the py2 environment there.14:26
efriedthe working commands above that aren't mucking with providers14:27
efriedso yeah, it's probably a matter of osc-placement being installed on the wrong py version.14:27
efriedgmann: ^14:28
sean-k-mooneyit should be python 3
sean-k-mooneybut if it was install on python 2 first14:29
johnsomWell, it looks like it was properly installed in the 3.6 environment given it was a python3 devstack setup. It's just somehow the script is using the py2 openstack command.14:29
sean-k-mooneythen the console script would be python 214:29
sean-k-mooneyoh i know what the issue is14:30
sean-k-mooneythe first run installs in python 2, right?14:30
gmannyeah, let's recheck as devstack is all py3 now ?14:30
johnsomWell, I guess that explains why it only fails when devstack is set to python314:30
sean-k-mooneye.g. train would run with python 214:30
gmannoh yeah14:30
sean-k-mooneythen we upgrade to ussuri with python 314:31
sean-k-mooneyand we keep the osc console script from python 214:31
sean-k-mooneythen we install osc-placement in py3614:31
sean-k-mooneywhich the python2 version won't find14:31
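What sean-k-mooney is describing can be sketched generically. This is a hypothetical minimal model of entry-point-based plugin discovery, not the real python-openstackclient code; the group and plugin names follow OSC's plugin convention but are shown here as assumptions:

```python
import sys

# A console script can only see plugins registered in the site-packages of
# the interpreter its shebang points at. An `openstack` script installed
# under python2 therefore never finds an osc-placement installed only under
# python3, and the CLI falls back to "'...' is not an openstack command".
if sys.version_info >= (3, 10):
    from importlib.metadata import entry_points

    def find_plugin(group, name):
        """Return the entry point called `name` in `group`, or None."""
        return next((ep for ep in entry_points(group=group)
                     if ep.name == name), None)
else:
    import pkg_resources

    def find_plugin(group, name):
        return next((ep for ep in pkg_resources.iter_entry_points(group)
                     if ep.name == name), None)

# Whether this finds anything depends entirely on *this* interpreter's
# installed packages -- which is the whole grenade py2/py3 problem.
print(find_plugin("openstack.cli.extension", "placement"))
```

A missing plugin is not an error at lookup time; the search simply comes back empty, which is why the failure only surfaces when the script tries the command.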
*** tosky_ has joined #openstack-nova14:31
openstackgerritKashyap Chamarthy proposed openstack/nova master: Pick NEXT_MIN libvirt/QEMU versions for "V" release
openstackgerritKashyap Chamarthy proposed openstack/nova master: libvirt: Bump MIN_{LIBVIRT,QEMU}_VERSION for "Ussuri"
*** tosky has quit IRC14:32
*** tbachman has quit IRC14:33
*** tosky_ is now known as tosky14:33
sean-k-mooneyso we would not see this if we set USE_PYTHON3=True in the grenade job14:35
sean-k-mooneysince both versions would run on python 314:36
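For reference, the toggle sean-k-mooney mentions is a single devstack setting; a minimal local.conf fragment (assuming the usual localrc layout) would be:

```ini
# devstack local.conf fragment: run everything under python3 so the
# `openstack` console script and osc-placement share one interpreter
[[local|localrc]]
USE_PYTHON3=True
```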
*** tbachman has joined #openstack-nova14:36
sean-k-mooneyand in a real install the package manager/installer would uninstall the python 2 versions when installing the python3 version for ussuri14:37
sean-k-mooneyor in container land you would just spin up the new python3-only container in place of the old container14:37
sean-k-mooneyi'm pretty sure kolla already moved to python 3 containers in train, for what it's worth14:38
*** awalende has quit IRC14:38
*** awalende has joined #openstack-nova14:38
*** awalende has quit IRC14:41
openstackgerritMatt Riedemann proposed openstack/nova master: PoC for using COMPUTE_SAME_HOST_COLD_MIGRATE
*** awalende has joined #openstack-nova14:41
*** mmethot has joined #openstack-nova14:43
*** Luzi has quit IRC14:46
*** davee_ has quit IRC14:46
mriedemdoes any of this grenade talk have anything to do with nova/14:48
mriedemstill the resource provider thing or what?14:48
*** tbachman has quit IRC14:48
*** davee_ has joined #openstack-nova14:50
johnsomlol, well, it's the nova scripting in grenade that is failing. That is the tie back to nova.14:52
mriedemi could have sworn at one point in grenade we had some kind of hack where we'd re-install python-openstackclient b/c we went from 2 to 314:54
mriedembut i'm not finding that14:54
*** tosky has quit IRC14:54
*** mlycka has quit IRC14:54
*** tosky has joined #openstack-nova14:57
*** ayoung has joined #openstack-nova14:58
*** JamesBenson has joined #openstack-nova14:59
*** igordc has joined #openstack-nova15:01
*** pcaruana has quit IRC15:02
*** ociuhandu has joined #openstack-nova15:04
*** igordc has quit IRC15:05
*** igordc has joined #openstack-nova15:06
*** ociuhandu has quit IRC15:09
*** igordc has quit IRC15:14
*** igordc has joined #openstack-nova15:14
openstackgerritMatt Riedemann proposed openstack/nova master: Avoid spurious error logging in _get_compute_nodes_in_db
ayoungWhat does Nova do to modify the kernel command line?  It has to happen prior to cloud-init.  I can see docs that imply I should be able to do this: glance image-update --property kernel_extra_args="coreos.inst.ignition_url= " coreos-xx-bootstrap15:17
ayoungbut that does not show up when the image boots, and I am guessing I need to pass that through to the instance somehow15:17
sean-k-mooneyif you pass a separate kernel image in addition to the root image i think we can pass the kernel args to qemu15:19
sean-k-mooneybut in general i'm not sure how much that feature is used or tested15:19
sean-k-mooneyi do not believe you can use it with just a root image15:19
johnthetubaguyayoung did you try os_command_line:
*** pcaruana has joined #openstack-nova15:20
johnthetubaguyhmm, I am not so sure we do anything with that, ignore me15:20
sean-k-mooneyjohnthetubaguy: isn't os_command_line just for lxc and other containers15:20
sean-k-mooneylike openvz15:20
mriedem"The kernel command line to be used by the libvirt driver, instead of the default. For Linux Containers (LXC), the value is used as arguments for initialization. This key is valid only for Amazon kernel, ramdisk, or machine images (aki, ari, or ami)."15:21
johnthetubaguyyeah, my bad15:21
ayoungBTW, this is a pretty good argument for Nova supporting ignition the same way we do cloud-init, since ignition happens earlier in the process15:21
sean-k-mooneyit's also a good argument for ignition supporting the metadata service :P15:22
ayoungI know that the openshift install, which works via terraform, does something to inject this value.  I do not know what that is15:22
*** jmlowe has joined #openstack-nova15:22
ayoungSo, I think the metadata service would work.  I think what you are saying is that the ignition mech should default to the cloud-init URL, somehow?15:22
ayoungWell, not the URL, but the host, and a separate URL specific to ignition.15:23
sean-k-mooneyi was suggesting that ignition could try to hit the metadata url and then load info from it15:23
*** priteau has quit IRC15:23
*** jawad_axd has joined #openstack-nova15:24
ayoungor a logical shift from it, like:  http://meta-data-host/ignition15:24
sean-k-mooneyayoung: anyway, the way that was all meant to work was: when you boot the vm you provide a separate kernel image with the kernel command line parameter set, and then nova would use the root image and kernel image when booting and pass the kernel args from the kernel image to qemu15:25
ayoungAnd Nova/metadata server then would be responsible for multiplexing between the different instances15:25
ayoungI see.  I don't think that the installer (terraform) is doing any of that.  But I can reproduce this afternoon and determine what it IS doing15:26
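The metadata-service idea being floated above can be made concrete with a tiny sketch; the link-local address and `meta_data.json` path are the real Nova metadata layout, while the `/ignition` path is purely the hypothetical extension under discussion:

```python
# 169.254.169.254 is the real link-local metadata address; "openstack",
# "latest" and meta_data.json are real path components. The "ignition"
# sub-path is hypothetical -- the "logical shift" ayoung suggests.
METADATA_BASE = "http://169.254.169.254"

def metadata_url(path, version="latest"):
    """Build an OpenStack metadata-service URL for the given sub-path."""
    return "%s/openstack/%s/%s" % (METADATA_BASE, version, path)

print(metadata_url("meta_data.json"))
# -> http://169.254.169.254/openstack/latest/meta_data.json
print(metadata_url("ignition"))  # the speculative ignition endpoint
```

The metadata server already multiplexes between instances by source address, which is why this scheme would not need a per-instance URL.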
*** awalende has quit IRC15:27
*** larainema has quit IRC15:27
*** jawad_axd has quit IRC15:28
*** damien_r has quit IRC15:33
mriedemefried: heh, oops
mriedemnot sure how i/we missed that15:36
mriedemi mean, it's zvm so totally forgettable but otherwise15:36
openstackgerritDan Smith proposed openstack/nova master: ZVM: Implement update_provider_tree
*** awalende has joined #openstack-nova15:40
*** ociuhandu has joined #openstack-nova15:40
*** ociuhandu has quit IRC15:41
*** ociuhandu has joined #openstack-nova15:42
aarentsHi dansmith, can you confirm this last update is ok for you, since you were holding -1 on that? thanks!15:44
dansmithaarents: yeah, I was waiting for CI before, but I'll circle back15:45
dansmithaarents:  we probably want to have a few people look at that and sanity check my thinking there15:45
dansmithmaybe gibi since he was previously okay with the other fix, at least15:45
dansmithobviously mriedem is always good at everything15:46
aarentsyes good idea15:46
mriedemumm, rescue + disk bus = i defer to lyarwood15:47
*** ociuhandu has quit IRC15:47
dansmithit's not really rescue related,15:48
mriedem"There is a case during rescue where this value can be mistakenly updated15:48
mriedemto reflect disk bus property of rescue image (hw_disk_bus)."15:48
dansmithrescue is just one place where this side effect screws us15:48
dansmithright, but.. the change is actually more generic15:48
dansmithbut yes, lyarwood would be a good person to look also15:48
mriedemmy gut feeling on something like this is it fixes one thing and breaks another15:49
mriedemand i don't have the background on that15:49
dansmithmriedem: check out my comments on the earlier set15:49
dansmithmriedem: a patch from gary unceremoniously removed all the ones he thought were unnecessary,15:49
dansmithwhich removed the one right after this that was there originally,15:49
dansmithso we just almost never save this change15:49
dansmithand also,15:49
dansmithmaking a change to instance within a "just generate me the xml" method is crazy wrong15:50
*** priteau has joined #openstack-nova15:50
dansmithbut rescue just happens to tickle things right so we mangle the disk bus, but never save() it back to normal15:50
* lyarwood reads up15:50
sean-k-mooneywe should not be saving the disk bus back, but we should be using it for the rescue xml15:51
sean-k-mooneyi don't think there has ever been an expectation that the /dev/sd* names for the rescue boot would be the same as when the instance is booted normally15:52
dansmithsean-k-mooney: read the patches I linked in my analysis earlier15:53
*** mlavalle has joined #openstack-nova15:53
mriedemwould have been useful to have that context in the commit message, if just summarized from the big comment in PS115:53
dansmithsean-k-mooney: the original change from like 2012 was trying to use get_xml as a hook to update the info late after libvirt had chosen defaults15:53
mriedemtl;dr: removed the code that would persist this change, yet we can still incorrectly persist it in edge cases (like rescue), and we shouldn't be modifying the instance in a _get* method anyway.15:54
sean-k-mooneyok ya we should not15:54
dansmithmriedem: exactly15:55
mriedemso do that, dan can +2 and ill +W15:55
dansmithaarents: ^15:55
*** TxGirlGeek has joined #openstack-nova15:56
*** TxGirlGeek has quit IRC15:58
aarentsdansmith: So I rephrase commit message with more context ?15:59
*** tbachman has joined #openstack-nova16:00
dansmithaarents: yeah just add all that context in there to make mriedem happy16:00
aarentsok got it !16:00
*** TxGirlGeek has joined #openstack-nova16:00
mgoddardhi mriedem, got a minute to discuss ?16:00
*** derekh has quit IRC16:03
*** mlavalle has quit IRC16:03
*** udesale has quit IRC16:04
*** jawad_axd has joined #openstack-nova16:05
*** mlavalle has joined #openstack-nova16:09
*** jawad_axd has quit IRC16:09
*** sapd1 has joined #openstack-nova16:12
*** nanzha has quit IRC16:13
mriedemmgoddard: sure16:15
mriedempreface: i have lost a lot of context on that bug and fix16:15
mriedemso i'll likely just abandon my changes and you can move forward16:16
mgoddarddo you happen to remember how the RP association becomes stale after your patch16:16
mgoddardI can't work it out from the code16:16
*** jmlowe has quit IRC16:17
*** ociuhandu has joined #openstack-nova16:18
mriedemi think the comment from L7916:21
mriedembecause host1 deletes the provider between16:21
mriedem        # _check_for_nodes_rebalance and _refresh_associations.16:21
*** bhagyashris has joined #openstack-nova16:22
mriedemi think because we either don't add the provider uuid to _association_refresh_time or we pop it out on failure if the provider doesn't exist16:24
mriedemit's all linked to when the ResourceTracker calls SchedulerReportClient.get_provider_tree_and_ensure_root16:25
mriedemand then you go down the rabbit hole16:25
*** damien_r has joined #openstack-nova16:25
*** jawad_axd has joined #openstack-nova16:26
*** ociuhandu has quit IRC16:27
mgoddardok, I think I see. The RP wasn't in _association_refresh_time because it hadn't been in the local tree yet. If the RP exists in placement, that means we only update _association_refresh_time after _refresh_associations is done16:29
*** jawad_axd has quit IRC16:30
mriedemi'm going to say      yes16:30
*** jaosorior has joined #openstack-nova16:33
mgoddardthat part makes sense now. I still don't see how the node not being removed from the RT compute_nodes prevents placement from getting healed though - each time through the loop we call _update, which calls _update_placement16:34
*** tbachman has quit IRC16:35
mgoddardI probably need to stop thinking about this, going a little mad. I ran your functional test with my patch chain and it seemed to fix the issue.16:36
efriedmriedem: yeah, I was *sure* that one was already implemented. Oh well :(16:37
*** jaosorior has quit IRC16:38
*** jaosorior has joined #openstack-nova16:39
*** ricolin has quit IRC16:39
mriedemmgoddard: i don't want to think about it anymore either which is why i abandoned my changes16:43
*** tbachman has joined #openstack-nova16:46
*** TxGirlGeek has quit IRC16:48
*** _mlavalle_1 has joined #openstack-nova16:49
*** mlavalle has quit IRC16:52
*** TxGirlGeek has joined #openstack-nova16:54
mgoddardmriedem: hopefully you (or someone) will have to face thinking about it to review my patches at some point16:54
*** JamesBen_ has joined #openstack-nova16:54
*** JamesBen_ has quit IRC16:55
*** sapd1 has quit IRC16:56
mriedemcan i get a stable core here?
mriedemthe changes to queens/pike/ocata are dependent on that since i have to redo them16:57
*** JamesBenson has quit IRC16:58
mriedemmelwitt: were these something you wanted to get downstream?
mriedemi know eandersson was saying he used those in rocky16:59
*** jaosorior has quit IRC17:00
mriedembauzas: could you take a look at these train backports?
*** bhagyashris has quit IRC17:01
melwittmriedem: I don't know that it's come up specifically (everything's on queens) but yeah definitely could use it I'm sure. really most likely is I'd want to get heal_allocations in queens in the first place (downstream) and then backport the --instance and --dry-run too17:02
donnydmriedem: :(17:02
melwittI dunno how doable that would be, I haven't tried it yet17:03
mriedemdonnyd: what? the lxc unicorn ci job that no one cares about?17:03
donnydI care17:03
mriedem"care" and "care enough to work on" are different things, and i don't care enough to work on that anymore17:04
donnydHence my :(17:04
mriedemyeah i know17:05
mriedemfreedom ain't free and all that17:05
donnydWell I am very appreciative of the time that has been put in17:06
*** rpittau is now known as rpittau|afk17:06
*** ociuhandu has joined #openstack-nova17:07
donnydmelwitt: zomg lol17:07
mriedemheh yo'uve never seen that?17:08
mriedemit's the only reason i say it17:08
donnydOh I have, and it makes me laugh every time17:09
mriedemcosts about a buck-o-five17:09
*** tssurya has quit IRC17:10
mriedemok on that note, my wife wants me out of the gd house for a few hours so i'm going to lunch, errands and then to be driven crazy working at a coffee shop so bbiab17:10
*** dtantsur is now known as dtantsur|afk17:10
*** tbachman has quit IRC17:10
*** mriedem has quit IRC17:10
*** tbachman has joined #openstack-nova17:11
*** gyee has joined #openstack-nova17:17
openstackgerritLee Yarwood proposed openstack/nova master: block_device: Copy original volume_type when missing for snapshot based volumes
*** ociuhandu has quit IRC17:17
*** priteau has quit IRC17:20
*** priteau has joined #openstack-nova17:22
*** priteau has quit IRC17:22
*** tbachman has quit IRC17:23
*** ociuhandu has joined #openstack-nova17:36
*** ociuhandu has quit IRC17:36
*** ociuhandu has joined #openstack-nova17:37
*** ociuhandu has quit IRC17:37
*** ociuhandu has joined #openstack-nova17:38
*** damien_r has quit IRC17:42
*** ociuhandu has quit IRC17:44
*** TxGirlGeek has quit IRC17:45
*** sridharg has quit IRC17:48
*** tbachman has joined #openstack-nova17:50
*** pcaruana has quit IRC17:51
*** TxGirlGeek has joined #openstack-nova17:54
sean-k-mooneygibi: our downstream qa just found an issue with how we report the bandwidth providers17:59
sean-k-mooneynova creates the compute node rp with the hypervisor_hostname as the RP name18:00
sean-k-mooneywhich means if you change the compute node host with the host config value in the nova and neutron config18:00
*** TxGirlGeek has quit IRC18:01
sean-k-mooneywe cannot find the root RP18:01
sean-k-mooneyso for it to work, compute node host and hypervisor_hostname need to match18:04
efriedthat sounds like a problem with how we report providers in general. Are you really supposed to be able to change that config for an existing service?18:04
efrieddo we actually rename the provider correctly in that case?18:04
sean-k-mooneythis is a clean deployment18:05
sean-k-mooneyso no rename18:05
sean-k-mooneybut that would also be a problem18:05
*** dacbxyz has joined #openstack-nova18:06
*** TxGirlGeek has joined #openstack-nova18:07
sean-k-mooneyefried: nova is correctly using the host config value "sriov01.localdomain" for the host, as is neutron, and that is also the value set in the neutron port bindings18:08
sean-k-mooneyefried: the rp uses the hypervisor_hostname, which makes sense for ironic18:08
sean-k-mooneybut for libvirt this is an issue18:08 is actually the real hostname set in /etc/hosts18:09
sean-k-mooneysorry /etc/hostname18:09
efriedcan you get a UUID from `openstack hypervisor list`?18:09
sean-k-mooneyno, it returns the internal database id instead, which is a different issue18:10
sean-k-mooneyi guess a show might work18:10
sean-k-mooneyno no uuid18:11
efriedwell, my point is that the compute node's *UUID* should be predictable (it's compute_node.uuid)18:11
efriedso, once again, trying to use RP name for *anything* is dangerous and brittle.18:12
sean-k-mooneysure, but only nova knows the uuid18:12
openstackgerritJohn Garbutt proposed openstack/nova master: WIP: review comments around unit test idea
efriedthat can't be true18:13
openstackgerritLee Yarwood proposed openstack/nova master: block_device: Copy original volume_type when missing for snapshot based volumes
sean-k-mooneywhy can't it?18:13
efriedyou telling me there's no way to discover the compute node UUID from the host?18:13
efriedseems like I asked mriedem about this the other day...18:13
sean-k-mooneynot via the hypervisors api18:13
sean-k-mooneyor the compute service list18:14
*** pcaruana has joined #openstack-nova18:14
efried...okay, the result of said discussion was "the RP name matches" -- which you've just proven ain't true.18:16
sean-k-mooneyright, it was meant to18:16
sean-k-mooneyit actually matches the result of calling get_hostname18:16
sean-k-mooneyor whatever that function is called18:16
*** ociuhandu has joined #openstack-nova18:16
sean-k-mooneyi'll grant you the fact that they have set the config value so they don't match is a little weird, but it should actually work18:18
openstackgerritJohn Garbutt proposed openstack/nova master: WIP: just an idea, adding scope checking
efrieddammit, having it be hypervisor_hostname is the *right* thing18:18
efriedbut yeah, it makes discoverability problematic.18:18
sean-k-mooneyyes it would be if we used that when talking to neutron and cinder18:19
sean-k-mooneybut we dont18:19
efriednova.compute.resource_tracker.ResourceTracker._update_to_placement would have to condition on "am I ironic?"18:19
efriedheck, I don't even know if we have a way to tell at that point in the code whether we're ironic.18:21
efriedCan't we just deprecate :P18:22
sean-k-mooneypossibly, although i don't think that is the only place we can create the compute node record18:22
efriedthe one I linked is the only place we create the provider18:22
efriedI'm not talking about changing what's in the compute node record.18:22
sean-k-mooneyah ok18:23
sean-k-mooneyya sorry, i was trying to figure out where the hypervisor hostname was originally set in the compute node18:24
*** dacbxyz has quit IRC18:24
sean-k-mooneyin the libvirt case i had kind of assumed it should always match the value18:25
sean-k-mooneybut it does not18:25
efriedit appears as though it's up to the individual driver to set hypervisor_hostname in that resources dict you pointed to.18:26
efriedAnd the libvirt driver asks the libvirt API.18:26
efriedwhich I can only imagine does effectively `hostname`18:27
sean-k-mooneyya, which we can't change because i think it uses that for live migration18:27
efriedcan't and shouldn't.18:27
efriedIt would also be a nightmare at this point to try to change the resource provider name I think.18:27
sean-k-mooneyso really the host config option can't be used18:28
efriedat least with libvirt and bandwidth18:28
sean-k-mooneyunless you are setting it to the value returned by get_hostname18:28
efriedcan you think of other places this could break us?18:28
sean-k-mooneyit will be doing the same to create its resources, right?18:29
*** tosky has quit IRC18:29
sean-k-mooneyit was going to look up the RP by name18:29
efriedwhich also relies on the correlation between the hypervisor and the RP?18:29
efriedSundar isn't here, could go check the code...18:29
efriedbut I'm not gonna.18:30
efriedI have him on slack, will ask...18:30
sean-k-mooneyill check18:30
sean-k-mooneyso first sign is not promising
sean-k-mooneyso it can be set in the config too18:33
*** martinkennelly has quit IRC18:34
sean-k-mooneyso yes, cyborg would also have to have that set to the same value18:35
sean-k-mooneythey default to socket.getfqdn()18:36
sean-k-mooneynova defaults to socket.gethostname()
sean-k-mooneyand neutron has their own function18:37
sean-k-mooneywhich just calls socket.gethostname()18:38
*** gmann is now known as gmann_afk18:38
sean-k-mooneyso at least nova and neutron agree on the default; cyborg could get a different value18:39
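The default-hostname mismatch described above can be seen directly; this is a generic sketch of just the two stdlib calls in question, not any nova/neutron/cyborg code:

```python
import socket

# nova and neutron default to gethostname() (whatever /etc/hostname says);
# cyborg reportedly defaults to getfqdn(), which may append a domain --
# so the services can end up disagreeing on the host's name.
short_name = socket.gethostname()
fqdn = socket.getfqdn()
print("gethostname():", short_name)
print("getfqdn():    ", fqdn)
```

On a host named `sriov01` with a search domain configured, `getfqdn()` would typically return something like `sriov01.localdomain` while `gethostname()` returns just `sriov01` (hypothetical values for illustration).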
sean-k-mooneyefried: i don't think we can change anything on the nova side, so i might have to file a tripleo bug18:40
sean-k-mooneyalthough this could be related to this specific deployment18:41
*** pcaruana has quit IRC18:42
*** dacbxyz has joined #openstack-nova18:47
*** mloza has joined #openstack-nova18:51
*** tbachman has quit IRC18:53
*** ralonsoh has quit IRC18:55
*** pcaruana has joined #openstack-nova19:06
*** mriedem has joined #openstack-nova19:15
*** dviroel has joined #openstack-nova19:17
mriedemtest_encrypted_cinder_volumes_luks might be failing all the time on multinode jobs too if the volume and server are on different hosts, i wonder if that's causing that to fail19:20
mriedemnvm i guess that's not the case in the failure i'm looking at19:22
lyarwoodmriedem: link? I'm about for 15 and can take a look.19:45
*** openstackstatus has quit IRC19:50
mriedemi closed it but it's a bug we've already talked about before
openstackLaunchpad bug 1820007 in os-brick "Failed to attach encrypted volumes after detach: volume device not found at /dev/disk/by-id" [Undecided,Confirmed]19:50
*** openstackstatus has joined #openstack-nova19:51
*** ChanServ sets mode: +v openstackstatus19:51
lyarwoodmriedem: ah right, that weirdness19:54
*** tbachman has joined #openstack-nova19:59
lyarwoodmriedem: - I'll follow up in the morning with this.20:05
*** TxGirlGeek has quit IRC20:06
*** ociuhandu has quit IRC20:13
*** ociuhandu has joined #openstack-nova20:15
*** ociuhandu has quit IRC20:19
*** ianw has quit IRC20:20
*** TxGirlGeek has joined #openstack-nova20:25
*** ianw has joined #openstack-nova20:27
melwittmriedem: I left comments on the --instance and --dry-run backports for the docs20:29
mriedemmelwitt: yeah good catch, i left some thoughts/options in
dansmithmriedem: speaking of getting off your lawn,20:39
dansmithdoes it not seem hilarious that we had "% locals()" all over the code, removed it all, and then python baked it into the language?20:39
mriedemi was reading that article ttx linked and thought that exact same thing20:40
mriedem"oh fun this is just using locals"20:40
mriedemnot breakable at all!20:40
dansmithso a new incompatible syntax for ... something you can already do but shouldn't20:40
dansmithhah right20:40
mriedemi also didn't know that % pre-dated format()20:40
mriedemi prefer %20:40
dansmithme too, because I'm old school20:40
dansmithI've been doing python since before .format and since before you saw the non-java light, whippersnapper20:41
mriedemhey, now we're both losers because there is go: java for python people20:41
mriedemumm, + pointers20:41
*** ociuhandu has joined #openstack-nova20:42
mriedemf-strings also look like you can run functions from within strings, which ....20:44
mriedemseems like erlang?20:44
dansmithyou can definitely hit dictionaries, which seems borderline evil20:44
dansmithrunning functions and you're nearly bash20:44
*** gmann_afk is now known as gmann20:45
dansmithI also don't like the single-char prefix to a quoted string syntax python seems to love20:45
dansmithu"foo", r"bar", etc20:45
dansmithand all these new kids with their hoverboards and roller skates...*fist*20:46
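For reference, the three formatting styles being compared, including the dict lookup and method call that make f-strings "borderline evil":

```python
name = "nova"
flags = {"debug": True}

# old-school %-formatting with locals() -- works, but fragile and long
s1 = "project is %(name)s" % locals()

# str.format, added later
s2 = "project is {}".format(name)

# f-string (py3.6+): arbitrary expressions, dict lookups, even calls
s3 = f"project is {name.upper()}, debug={flags['debug']}"

print(s1, s2, s3, sep="\n")
```

The f-string bakes `% locals()`-style name capture into the language, but evaluates full expressions rather than just looking up names.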
melwittmriedem: replied. no strong opinion here20:46
melwittI'm cool with whatever you think is best20:47
mriedemmelwitt: but what's your take on f-strings?!20:47
mriedemf'em right?!20:47
melwittI didn't know what f-strings is yet so I'm way ahead of the game20:47
mriedemi learned <1 hour ago20:47
melwittI read a few of the posts and had no idea what was going on20:47
melwittand of course, I didn't read the first one, cause BORING20:47
*** spatel has joined #openstack-nova20:48
melwittI mean, I really didn't read it but not because it was boring. once I saw replies piling up I read a few20:48
* artom gates how cold migration, rebuild and resize are all intertwined all over the place20:51
artomHates, even20:52
melwittI think we all do. I've been thinking about how nice it would be to refactor those things in a non-boil the ocean way. somehow.20:54
dansmithI don't20:54
dansmithI love it20:54
artomMicroversion 2.whatever "support for resize, cold migration, and rebuild is dropped"20:56
artomRewrite everything from scratch20:56
artomMicroversion 2.whatever+1 "Nova now supports resize, cold migration, and rebuild"20:56
melwittlol, nah just leave em out20:57
artomHah, yeah, just conveniently forget step 320:57
artomOr step 2, even20:57
mriedemartom: how are rebuild and resize/cold migrate intertwined?20:58
efriedNova meeting in 2 minutes in #openstack-meeting20:58
artommriedem, through sheer willpower and anger20:58
efriedmriedem: they're not, rebuild and evacuate are the same thing though.20:58
mriedemdon't make me link the doc i wrote again20:58
artomOK, resize and cold migrate then20:58
artomMea culpa20:58
mriedemok so write a resize vs cold migrate doc like this
mriedem1. resize has a new flavor, cold migrate doesn't,20:59
gregworkdoes nova in queens understand how to boot instances on a particular host aggregate? i've tried tagging the host aggregate "compute = 1" and then passing scheduler_hints: compute: 1 when deploying a stack with OS::Nova::Server .. however that doesn't appear to do anything and the instance ends up on whatever compute node20:59
mriedem2. resize can sometimes go on the same host, cold migrate does not except if you're using vcenter20:59
mriedemartom: what other differences are there besides those 2 things?21:00
melwittgregwork: has to be availability zone. host agg is an admin-only hidden thing21:00
*** spatel has quit IRC21:00
artommriedem, I think that's about it?21:00
artomAnd resize to same host is user-configurable21:00
gregworkmelwitt: oh .. so even if the admin has defined the host agg, regular users cant reference it ?21:00
artomCold migrate is... always to a new host?21:00
mriedemartom: you mean operator configurable...21:01
artommriedem, just... let me be angry21:01
artommriedem, sorry, yeah21:01
artomI meant in the code, anyways21:01
mriedemartom: remember
melwittgregwork: they can, but only by way of an AZ. you have to make one and map it to the aggregate(s) you want (as admin)21:01
artomLike, you hit a method called "_cold_migrate" in the resize flow21:01
artommriedem, I do21:01
mriedemyeah, so docstrings ftw21:01
mriedem-1 people that don't write code comments21:02
gregworkmelwitt: from reading up on AZs in openstack, these seem like a very heavy abstraction compared to a simple host aggregate to configure21:02
mriedembe the change you want to see in the world...21:02
artomBut how do I become free limitless beer?21:02
artom(Point taken, though)21:02
melwittgregwork: they're really just a tag on host aggregates. shouldn't be heavy. you put tag my_az on the host aggs you want to be part of it and then the user can say my_az21:03
mriedemartom: so if you want to see some things in a contributor doc about resize vs cold migrate, throw those into a doc bug and maybe i can crank something out21:03
*** openstack has joined #openstack-nova21:16
*** ChanServ sets mode: +o openstack21:16
melwittgregwork: sorry that link was for admin user to be able to bypass the scheduler. this is the normal end user instructions for how to specify AZ
artommriedem, I think what I'd really like at this point is a sequence diagram for resize (to start with), like we have for live migration21:20
artomBtw, stephenfin, who gave me, I will forever remember that21:21
artomMy fault for opening my mouth and saying I have extra bandwidth, I suppose21:21
*** dave-mccowan has joined #openstack-nova21:21
mriedemi had an old todo somewhere to do a resize diagram like the live migrate one i put on a post-it and efried turned into that diagram21:21
mriedemand then got the nickel 3 years later21:21
mriedemartom: fwiw i do have a simple resize diagram in my last cells v2 summit presentation, you could lift the slides from that and turn it into that seqdiag stuff in sphinx21:22
mriedemslide 2621:23
artommriedem, much thanks21:25
*** pcaruana has quit IRC21:25
* artom needs to do the school run21:26
gregworkhmmn "Cannot update metadata of aggregate 11: Reason: One or more hosts already in availability zones [u'nova']21:26
*** TxGirlGeek has quit IRC21:28
mriedemi've noticed that artom has the same disappearing pattern as bauzas21:29
mriedemt0: complain!21:30
mriedemt1: here you can do this21:30
mriedemt2: thanks! but i've got to run21:30
gregworkalright nm i think i got it ..21:31
*** TxGirlGeek has joined #openstack-nova21:32
mriedemgregwork: hosts can be in M aggregates but 1 AZ21:32
gregworkyeah i had created an aggregate and already tagged it as nova21:32
gregworkfor the az21:32
gregworkso adding an additional az was failing21:32
gregworknow to figure out what the scheduler_hint is to specify the az in OS::Nova::Server21:33
gregworki dont think its group:21:33
mriedemthere is a big fat warning in the docs to not ever create an az literally called "nova"21:34
mriedemb/c that's the default schedule zone for services,21:34
mriedemso if you create an az that users boot into called nova they can be stuck and not migrate their servers (potentially)21:34
mriedemwe should probably just block that in the api, but no one has cared enough to yet21:35
gregworkso apparently availability_zone is a property in os::nova::server and not a scheduler hint map21:36
mriedembut let's not bring heat into this please...21:37
gregworkwell im trying to understand how it works in nova so i can figure out how to solve this in heat21:38
gregworkos:scheduler_hints.stuff section is very very useful21:39
mriedemyup - thank takashin for doing the thankless work of documenting a lot of that stuff21:40
mriedemorganizing it, etc21:40
* mriedem heads home, bbiab21:41
*** mriedem has quit IRC21:41
*** nweinber__ has quit IRC21:46
*** ayoung has quit IRC21:47
*** ayoung has joined #openstack-nova21:49
efriedsean-k-mooney: what we were talking about earlier, I guess the first step to at least acknowledge the problem would be to beef up the help for
efriedit already has a bullet list of "things this is used for"21:53
efriedAdding "external services looking up the compute node resource provider; due to bug #XXXXX you will break your world if you set this when using the libvirt driver, so don't." kind of thing.21:54
*** awalende has joined #openstack-nova21:55
*** awalende has quit IRC22:00
*** TxGirlGeek has quit IRC22:01
*** TxGirlGeek has joined #openstack-nova22:04
efriedsean-k-mooney: Potential remedy (though it would be kind of a long road) would be to expose the compute node UUID through that hypervisors API.22:05
efriedthen we instruct consumers (neutron, cyborg) to use that rather than to discover the compute node RP.22:05
*** abaindur has quit IRC22:05
efriedfor libvirt22:05
efriedThis whole thing makes efried :(22:06
*** mriedem has joined #openstack-nova22:08
efriedmriedem: you may have missed this earlier, but it turns out we broke for libvirt.22:08
*** rcernin has joined #openstack-nova22:09
*** dacbxyz has quit IRC22:09
efriedBecause external services (neutron (problem now), cyborg (problem soon)) assume is the name of the compute node RP22:10
efried...which they need to know in order to hang their nested providers (neutron: bw; cyborg: accelerator) off of22:10
efriedbut the compute node RP is actually hypervisor_hostname, which for libvirt is the `hostname()` of the system.22:11
efriedwhich is the default for CONF.host, so you're fine... unless you actually *set* that guy.22:12
efrieddansmith: ^22:12
*** jbernard has quit IRC22:12
mriedem"we broke"?22:13
mriedemwho is we and when?22:13
mriedemif we = jay and when is ocata then...22:14
artommriedem, all part of my cunning plan22:14
efriedokay, "If you set CONF.host to a non-default value, and you're using libvirt, bandwidth (and accelerator, and other future external-service-created nested) providers are broken"22:15
mriedemoopsy daisy22:15
efriedsean-k-mooney: o hey, it looks like you were maybe getting the hypervisor ID as a short ID because you were using the default microversion. Per [1] it'll come back as a UUID starting at 2.53.22:16
efriedwhich phew gives us a path forward without further changes.22:16
efriedI think22:16
* mriedem exhales22:17
efriedHaving to hit the /os-hypervisors API is heavier than just looking at CONF.host, but that UUID is the UUID of the provider you want.22:17
efriedhm, but mriedem, how do I ask /os-hypervisors for the entry for my current node?22:18
efriedcause the qparams and output still talk in terms of hypervisor_hostname :(22:19
mriedemGET /os-hypervisors/detail?hypervisor_hostname_pattern=<hostname>22:19
efriedright, exactly.22:19
efriedI mean, if I'm going to rely on running `gethostname()` on the host, I might as well just do that from the start.22:20
mriedemif not that, then i guess GET /os-hypervisors/detail and you iterate all until you find the one with the service host that matches what you care about if hypervisor_hostname is not a match22:20
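(The lookup being discussed here could be sketched roughly as below. This is a hypothetical helper, not real nova or neutron code: it assumes the parsed JSON from `GET /os-hypervisors/detail` at microversion >= 2.53, where `id` is the compute node UUID, which for non-ironic drivers equals the resource provider UUID, and where each entry carries `hypervisor_hostname` and a `service` sub-dict with the service `host`.)

```python
# Hypothetical sketch of the os-hypervisors lookup discussed above.
# ``hypervisors`` is the "hypervisors" list from a parsed
# GET /os-hypervisors/detail response at microversion >= 2.53.

def find_compute_node_uuid(hypervisors, hostname):
    """Return the compute node UUID for this host, or None.

    First try hypervisor_hostname; if that doesn't match (the
    CONF.host-vs-gethostname mismatch discussed above), fall back to
    matching the compute service host.
    """
    for hyp in hypervisors:
        if hyp.get("hypervisor_hostname") == hostname:
            return hyp["id"]
    for hyp in hypervisors:
        if hyp.get("service", {}).get("host") == hostname:
            return hyp["id"]
    return None
```

The fallback loop is what makes this heavier than it should be: without a `service_host` query parameter you have to page through the full listing and match client-side.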
mriedemi think there is CONF.host22:21
mriedemsomeone had an RFE at one point to be able to filter hypervisors by service host for ironic22:21
mriedemit would be pretty easy to do i think22:21
mriedemit's not clear to me if you need that though22:21
mriedemyou being neutron/cyborg22:22
efriedthat would be a sure, but heavy, way to correlate
efriedesp in a big env, that's a big payload22:22
mriedemwhich thing? getting all hypervisors and then finding the CONF.host match22:22
mriedemyeah it would, and likely limited to 1000 results by default so you'd have to page22:23
efriedyeah, GET /os-hypervisors/detail is going to be a big response for CERN.22:23
efrieduse sdk and it pages for you, which is fine.22:23
mriedemsure, so GET /os-hypervisors/detail?service_host=CONF.host22:23
mriedem^ is the RFE i mentioned22:23
efriedoh, that's a thing...22:23
*** kaisers1 has joined #openstack-nova22:23
mriedemha, no22:23
mriedemoh eric22:23
efriedyes, I'm looking at that. You're taking advantage of my weakened mental state to f with me. Is it April?22:24
mriedemwhy is your mental state weak?22:24
*** kaisers has quit IRC22:24
efriedis it ever not?22:25
mriedemi can't find the bug, i could have sworn a guy opened one though22:25
mriedembut that was the idea, he was trying to filter hypervisors by ironic compute service host but was only getting nodes based on the ironic node uuid which wasn't helpful22:25
efriedokay, anyway, yes, that would be a nice way to make this work that involves an API change with a microversion.22:25
efriedbut for the sake of discussion...22:26
efriedwould it be so wrong for neutron/cyborg to simply use the result of `gethostname()` instead of CONF.host?
*** jbernard has joined #openstack-nova22:27
efriedI guess that becomes coupled to the virt driver implementation.22:27
efriedthough arguably using CONF.host at all already is.22:27
efriedin that it at least will never work for ironic22:28
*** ccamacho has quit IRC22:29
mriedemwe don't care about ironic for nested providers22:29
mriedemor most things22:29
efriedI can totally see needing to do... *something* with bandwidth for ironic.22:30
efriedThough I imagine the providers would be shared in that case, not nested.22:30
efriednevertheless we would have to discover the ironic nodes for aggregation purposes.22:30
mriedemfiltering hypervisors by service host seems useful in general so if it could be used here then i don't see a reason not to add that earlier than later22:33
mriedemalways nice to get to N+3 release from now and be like, "we have this problem, oh but we added X in N so we can use that"22:33
efriedokay. Not backportable tho22:33
efriedFor backport purposes, I guess since the scope is known and constrained to libvirt, we could just ask neutron to use gethostname() instead of CONF.host
*** tosky has joined #openstack-nova22:35
mriedemagain, idk...22:35
mriedemdoes neutron have a concept of workarounds options?22:35
mriedemsounds like either way the list of 'used as' here should be updated
mriedemor something mentioned about how this is linked across services for certain features22:35
mriedem> does neutron have a concept of workarounds options? - meaning, could neutron be configured to say if it should use CONF.host or gethostname()22:36
mriedemfor the sake of linking nested providers i mean22:36
efriedyup. I actually mentioned that (the help text) while you were offline.22:36
mriedemb/c that would be backportable22:36
efriedwhat are the rules about neutron and n-cpu versions on a given host?22:37
efriedare they allowed to differ? by how much?22:37
mriedemi would guess (1) yes they should be able to differ and (2) assume N-122:37
mriedemthat's part of the idea behind passing os-vif negotiated objects around so you can do rolling upgrades of those22:37
mriedemover the rest api i mean22:37
efriedwhich one gets to be -1? or are they both allowed to be?22:38
efriedugh, am I making sense?22:38
mriedemi know what you're asking, but i don't have a good answer22:38
mriedemi doubt that level of upgrade granularity is documented or tested anywhere22:38
mriedemnova and neutron and cinder and keystone etc should all be able to work with each other at wildly different versions but we don't test upgrades that way22:39
mriedemnot because we can't22:39
efriedany neutron that does bw as currently written knows that it will work properly with `gethostname()`22:39
efriedit can condition the "new" thing simply on whether nova is exposing the microversion providing the new os-hypervisors qparam22:39
efried  new thing22:40
efriedexcept NoSuchMicroversion:22:40
efried  old thing22:40
efriedunless neutron does better discovery than that (which it should be able to, but that doesn't mean it does)22:40
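(efried's pseudocode above could be fleshed out roughly like this. Everything here is a hypothetical stand-in — `NoSuchMicroversion` and the two strategy functions do not correspond to real nova/neutron APIs; they just make the try-new/fall-back-to-old shape concrete.)

```python
import socket

# Hypothetical exception for "nova doesn't expose the needed microversion".
class NoSuchMicroversion(Exception):
    pass

def get_provider_name(new_thing, old_thing):
    """Try the new microversion-gated lookup; fall back to the old way."""
    try:
        return new_thing()
    except NoSuchMicroversion:
        return old_thing()

def new_thing():
    # Stand-in for: query /os-hypervisors at the new microversion
    # (e.g. with a service_host filter) to get the provider UUID.
    raise NoSuchMicroversion("nova too old")

def old_thing():
    # Stand-in for the current behavior: derive the compute node RP
    # name from the local hostname, which is what the libvirt driver
    # uses by default.
    return socket.gethostname()
```

As efried notes, better discovery (e.g. checking the version document up front) would be preferable to exception-driven fallback, but the shape is the same either way.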
mriedemi would probably start with just a simple workaround option in neutron which is backportable without the microversion stuff22:40
mriedemand the way to deprecate the workaround option in neutron is the microversion in nova when it's available22:41
efriedwhy is a workaround necessary?22:41
mriedembecause neutron has to do a thing based on how nova is set yeah?22:41
*** lennyb has joined #openstack-nova22:41
efriedCONF.host is only ever right by chance22:41
efried`gethostname()` is always right22:41
efried...for the cases it cares about in existing code, which is what we care about for backportability.22:41
efriedAnyway, since gibi and sean-k-mooney aren't here right now, and I don't know whether there's a bug yet, I guess I'll throw out a ML post summarizing the discussion and let it fester from there.22:42
melwittspeaking of downstream bugs,22:47
melwittwe hit a problem downstream where compute node orphan removal was happening and destroyed the compute node record but failed to delete the RP bc keystone or placement was down,22:48
melwittand after that, nova-compute could not start up again bc it was trying to create compute node record and then failed with 409 dupe from placement when trying to create the same provider22:49
mriedemunrelated, i just wanted to say this makes us look dumb
melwittsearching for ResourceProviderCreationFailed led me to mriedem's patch where he posed the question, should we swap the destroy and RP delete ordering,22:50
mriedemand this
melwittand I think the answer is yes22:50
melwittthat's odd. I'm not why someone added empty docs. and I hope it wasn't me22:51
melwitt*not sure22:51
mriedemthe api-guide was imported i think so no i'm not saying you, i just was looking for some stuff and noticed these22:51
mriedemthese giant todo gaps in our user-facing docs are embarrassing22:52
mriedemi'd rather we just delete them than leave them22:52
melwittyeah, I was gonna say, just remove em22:52
melwittthat was a joke, sometimes I see a thing and be like, wtf who did this and find it was me22:52
mriedemyour provider issue is also because in queens we didn't link the ironic node id to the compute node uuid to the provider uuid, we started that in rocky22:54
mriedemso your recourse in queens is deleting the old providers so compute on restart can re-create them22:54
mriedembut you'll have to heal allocations22:54
mriedemwhich isn't in queens23:01
mriedemthere is a pretty beefy ML thread about all of this orphaned provider stuff months back22:55
mriedemi've been slowly polishing these turds22:55
melwittyeah, sean-k-mooney mentioned that22:55
melwittthe turd polishing22:55
mriedem is the tl;dr of the first mega beef thread22:56
melwittthank you. my brain is like about to explode so tl;dr is majorly appreciated22:56
mriedem is the post-ptg summary22:56
melwittI'm adding a comment to your review just so.... it's there22:57
mriedemyeah so related to this,22:57
melwittso you're thinking a change of the ordering and backport to queens is not gonna be viable? I guess you're saying that can't happen in rocky. dansmith said the same thing and I had no understanding of how, but I trust it. the messed up thing is they're also seeing a ResourceProviderCreationFailed on an overcloud (not ironic!) BUT now that I'm thinking more, that must be the service deletion case yeah?22:59
melwittthat you're solving in that patch22:59
mriedemwe've backported several pieces of this, and i have those train backports up for part of it as well22:59
mriedemwe didn't backport the ironic node uuid = compute node uuid = provider uuid thing to queens because the initial patch had caused some issues and other fallout that i recently fixed as well23:00
mriedemso backporting that ironic / compute node uuid sync stuff to queens would involve a few patches23:00
melwittwe've got customers hitting the ironic case in queens and a non-ironic case in queens. and the former is the orphan cleanup and the latter must be a service deletion issue, *maybe*23:00
mriedemotherwise service delete orphan stuff is bugs and i've been writing these as backportable changes23:01
mriedembugs since....pike23:01
mriedemi think23:01
mriedemwell ocata really23:01
melwittok, so you're saying the strategy should be to backport the uuid matcher patch rather than a split out "change the order" patch23:01
melwittbut would that not require some kinda migration actions for already existing compute node records?23:02
*** xek_ has quit IRC23:05
mriedemno i'm saying trying to backport the uuid matcher stuff is full of dragons23:05
mriedemit makes some things simpler though, e.g. your issue where compute failed to restart b/c it couldn't create a new provider with the same name23:06
melwittok, I see. yeah, I think sean-k-mooney mentioned that too but I didn't understand it at the time23:06
mriedemif the uuids are all synced, compute restarts, creates the compute node with the same uuid and finds the provider already exists with that uuid23:06
mriedemnot so lucky with libvirt though23:07
mriedemb/c the uuid will be unique per compute node record on the same host23:07
mriedemgranted our code that checks to see if the provider already exists could be smarter23:07
mriedemaround here
melwittok, hm that last part is interesting because that sounds like our second case (non-ironic)23:08
mriedemif we found a provider with the same name we should probably just use it23:08
melwittyeah. I think fixing this would kill two birds with one stone23:09
melwittironic and non-ironic, unless I'm missing something23:09
melwitt*fixing it in that way23:09
*** sapd1 has joined #openstack-nova23:09
mriedemwe can also easily find a provider with the same name using GET /resource_providers?name=<name>23:09
mriedemif we find one, use it23:09
*** dacbxyz has joined #openstack-nova23:09
mriedemi mean there could be dragons there too i'm not thinking about23:10
melwittthat would be an extra call tho23:10
mriedemonly if you 40923:10
mriedembecause clearly we aren't getting this back when we hit this name_conflict = 'Conflicting resource provider name:'23:10
melwittI yeah23:10
mriedemor maybe we are getting something like that but the message changed in placement, idk23:10
mriedemsee the todo from efried23:11
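(The create-or-reuse idea mriedem sketches here could look roughly like the following. This is a hedged sketch, not the real SchedulerReportClient: `placement` is a hypothetical client object with `post`/`get` methods, and the `Conflict` exception is a stand-in. The one real detail is the placement query, `GET /resource_providers?name=<name>`, and the name-conflict message string quoted above.)

```python
# Hedged sketch: on a 409 name conflict while creating a resource
# provider, look the existing provider up by name and reuse it instead
# of failing. ``placement`` is a hypothetical client, not a real API.

NAME_CONFLICT = 'Conflicting resource provider name:'

class Conflict(Exception):
    """Stand-in for a placement 409 response."""
    def __init__(self, detail):
        super().__init__(detail)
        self.detail = detail

def ensure_provider(placement, name, uuid):
    try:
        return placement.post('/resource_providers',
                              {'name': name, 'uuid': uuid})
    except Conflict as exc:
        if NAME_CONFLICT not in exc.detail:
            # UUID conflict or something else; don't paper over it.
            raise
        # Another provider already owns this name; find and reuse it.
        found = placement.get('/resource_providers?name=%s' % name)
        providers = found['resource_providers']
        return providers[0] if providers else None
```

This is exactly where dansmith's dragons live, though: silently reusing a same-named provider from a different compute node would mix inventories and allocations, which is why this was never adopted as-is.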
melwittso maybe I can try a patch for this and get efried to look for dragons23:12
efried(I haven't been following the conversation, and need to split real soon, but sure, add me to a patch)23:12
melwittbecause this looks like it could be a small backportable change that would save us in both the ironic and non-ironic cases23:12
*** awalende has joined #openstack-nova23:14
melwittthanks for the help on this23:15
dansmithI'm not sure how I feel about reusing the provider by name23:15
melwittah dammit23:15
dansmithwe're kinda making that a primary key because the name has to be unique,23:15
dansmithbut if you think about compute nodes in split brain,23:15
dansmiththey're going to fight over the same provider23:15
dansmithand with conductor groups in ironic,23:15
dansmithor two ironics and separate computes,23:16
dansmithif you ended up with two nodes of the same name, nova isn't going to know they're different and is going to fight over them23:16
dansmithand by fight, I mean silently overwrite inventory23:16
dansmithif that provider has allocations, things are going to get all messed (the eff) up I think23:16
*** dacbxyz has quit IRC23:16
dansmithmoving the ironic node id to be the provider id is the right move (which we've already done)23:17
dansmithso I dunno, maybe that means for <rocky, the name hack works, but... it's kinda contrary to the point of the provider id23:17
dansmithmelwitt: I think I hinted at this in my email on the internal thread about this23:17
melwittwhat's the right move in the non-ironic case then?23:17
*** slaweq has quit IRC23:17
dansmithfor the service delete?23:18
dansmiththat's probably even worse really,23:18
melwittI assume it's service delete. I don't know for sure how they got into that state23:18
dansmithbecause if you end up with two computes with the same name due to some dns or dhcp breakage, they'll take over each others' providers (and allocations) which would be helzabad23:18
melwittthat thread spun off from the ironic bz23:18
*** awalende has quit IRC23:19
dansmithyou will have other issues if you have a name clash, obviously, but if you mix allocations from two computes together, or have one go negative because it's a smaller compute, just... hard to debug and fix23:19
dansmithso anyway, I dunno23:19
dansmithnot saying don't do it.. glad we don't have to do it on master, but I'm not super confident that it's a great idea for <rocky either23:20
melwittok... sigh.. so with the moving node id to be provider id, would that not require a migration step if we were able to backport to queens?23:20
dansmiththat only affects ironic23:20
melwittyeah sorry. going back to the ironic thing again, to fix that one in that way, would it require a migration step?23:20
dansmithas I said on that thread, you'd have to make sure all the computes rolled to that change atomically, and yeah I dunno how the allocations get or got moved with that when we transitioned,23:21
dansmithbut not reasonable for a backport either way23:21
melwittI can and will publish the workaround steps but given the number of customers hitting it, I dunno, thinking about trying the backport23:21
dansmithreversing the order of provider delete works around the ironic issue doesn't it?23:21
melwittyeah it does23:22
dansmiththat's reasonable, backporting the node uuid thing is not reasonable I think23:22
melwittok. got it23:22
melwittthe non-ironic thing I think need more information because I don't know how it got into the state. and no idea how to workaround bc they can't migrate any instances away from it because all migrations fail with ResourceProviderCreationFailed23:24
melwittand there's no heal_allocations so they can't delete the allocations and RP and restore allocations23:24
dansmithwell, they can23:24
dansmithusing osc-placement23:25
dansmithI mean, you can script that for them23:25
dansmithor backport heal allocations, that's much less scary I think23:25
melwittyeah, that's what I'm thinking23:25
dansmithheal allocations will just fix it so you can delete everything, let it create and then heal right?23:25
melwittosc-placement I didn't see a way to update a RP with a different uuid23:25
dansmithyou mean the nova-manage healer thing right?23:25
melwittI do yeah23:26
dansmithi.e. the sunday morning public access TV version of nova.. BAH HALED!23:26
dansmithmelwitt: I mean with osc-placement you can create and delete allocations, IIRC, so you can save them off, then re-add them after it creates the provider afresh23:26
melwittI wish the command were healer, that would be more fun23:27
melwittdansmith: oh geez. yes, that's true23:27
melwittI didn't even think about that, guh23:27
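(The save/re-add approach dansmith describes could be scripted around the placement API itself. A minimal sketch of the data reshaping, under these assumptions: the saved payload is the response of `GET /resource_providers/{uuid}/allocations`, which keys allocations by consumer UUID, and each restore is a `PUT /allocations/{consumer_uuid}` whose body keys allocations by provider UUID and also needs `project_id`/`user_id`, which are not in that GET response, so they're passed in here; how you obtain them is left to the tooling.)

```python
# Sketch: reshape saved allocations (keyed by consumer, from
# GET /resource_providers/{uuid}/allocations) into per-consumer
# PUT /allocations/{consumer_uuid} bodies targeting the freshly
# recreated provider. project_id/user_id are caller-supplied
# assumptions, since the GET response doesn't carry them.

def rebuild_allocation_bodies(saved, new_rp_uuid, project_id, user_id):
    bodies = {}
    for consumer_uuid, alloc in saved['allocations'].items():
        bodies[consumer_uuid] = {
            'allocations': {
                new_rp_uuid: {'resources': alloc['resources']},
            },
            'project_id': project_id,
            'user_id': user_id,
        }
    return bodies
```

osc-placement wraps the same PUT, so the output of this could equally be fed to `openstack resource provider allocation set` per consumer, which is the one-by-one process melwitt would rather spare support from.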
melwittlol oh I wish we could put that on the docs page for it23:28
efriedmriedem, sean-k-mooney, gibi:
efriedand I'm out23:28
melwittyeah, I'm gonna just backport heal_allocations instead because I'm just imagining support doing that on a one-by-one instance basis. oof23:29
melwitter, one by one RP23:29
dansmithmelwitt: you could, oh I dunno, write them a script :)23:30
dansmithbut whatever :)23:30
dansmithosc is like lovin up on some scriptability amirite?23:30
dansmithanyway, I'm very happy to leave that decision and work to you23:31
dansmithjust sayin'23:31
melwittdon't worry, they're all assigned to e23:31
melwittunfortunately for them23:32
dansmithpraise be23:32
*** slaweq has joined #openstack-nova23:35
*** slaweq has quit IRC23:40
*** slaweq has joined #openstack-nova23:44
mriedemi haven't read all the way back, but regarding "if you ended up with two nodes of the same name, nova isn't going to know they're different and is going to fight over them" - with ironic that's not possible (since rocky) since there is a unique index on the compute_nodes.uuid23:47
*** tbachman has quit IRC23:48
*** slaweq has quit IRC23:48
*** mkrai has quit IRC23:48
melwittyeah we were talking queens and non-ironic as well23:49
melwittI was trying to get together a game plan for each case23:50
mriedemwell, just give this to whoever?
mriedemi wrote that b/c of all of this23:51
melwittwe don't have heal_allocations yet. but I'm gonna backport it after this convo23:51
mriedemhelp with my service delete backports23:51
melwittin queens23:51
mriedemand then review my wip thing which is the last piece23:51
mriedemeandersson: i think she's talking about internal to rhosp23:52
mriedemso you're sol23:52
melwittyeah, I have to get these things unwedged first23:52
mriedemsylvain has this audit thing he's been working on for about 3 years as well23:52
melwittand I don't know for sure whether service delete caused the issue, but I think there's a fair chance23:53
mriedembetween contract negotiations and skiing23:53
*** mdbooth has quit IRC23:53
eanderssontbh we are almost always backporting these things ourselves, but much better to have it officially backported :p23:53
mriedemeandersson: you know you could propose backports *upstream*23:53
eanderssonSo much work to do, so little time :D23:53
mriedemor at least be like, "can you guys backport x because that would be gr8 lol"23:54
eanderssonI am still top1 of lines contributed for U :D23:54
melwittyeah, just hang on, I got involved with these bugs within the last couple of weeks. I didn't understand them before23:54
mriedemif operators speak up about needing shit we usually jump on it a bit faster23:54
mriedemdid you know belmiro has a dedicated line to dansmith's office?23:54
eanderssonI tend to backport things, but a lot of the nova backports are high complexity due to the number of changes between master and stable branches.23:55
*** mdbooth has joined #openstack-nova23:55
eanderssonSo a lot of my effort goes into projects I already understand (e.g. designate, senlin etc).23:55
eanderssonBut I do try to report them here =]23:56
melwittall of the compute/service/host/node/RP/allocation intermingling stuff has not been my area of expertise23:56
mriedemit's confusing23:56
melwittnow that I have some idea wtf is going on, sure I will help with the backports and all that23:57
eanderssonSometimes I wish I wasn't a manager. Would give me more time behind the keyboard. =]23:57

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at!