*** markvoelker has quit IRC | 00:01 | |
*** brinzhang has joined #openstack-nova | 00:14 | |
*** slaweq has quit IRC | 00:15 | |
*** erlon has quit IRC | 00:30 | |
*** betherly has joined #openstack-nova | 00:35 | |
*** betherly has quit IRC | 00:39 | |
*** mlavalle has quit IRC | 00:42 | |
*** erlon has joined #openstack-nova | 00:43 | |
*** betherly has joined #openstack-nova | 00:56 | |
*** betherly has quit IRC | 01:01 | |
*** zul has quit IRC | 01:04 | |
*** wangy has joined #openstack-nova | 01:06 | |
*** hongbin has joined #openstack-nova | 01:24 | |
*** TuanDA has joined #openstack-nova | 01:37 | |
*** betherly has joined #openstack-nova | 01:48 | |
*** betherly has quit IRC | 01:52 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:54 | |
*** erlon has quit IRC | 01:58 | |
*** sapd1 has quit IRC | 02:02 | |
*** sapd1_ has joined #openstack-nova | 02:02 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: WIP: Support attach/detach instance root volume https://review.openstack.org/614441 | 02:02 |
*** markvoelker has joined #openstack-nova | 02:03 | |
*** cfriesen has quit IRC | 02:04 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Remove useless sample and add the lack of tests in v266 https://review.openstack.org/614671 | 02:07 |
*** tetsuro has joined #openstack-nova | 02:08 | |
*** tetsuro has quit IRC | 02:11 | |
*** mhen has quit IRC | 02:13 | |
*** mhen has joined #openstack-nova | 02:16 | |
*** tiendc has joined #openstack-nova | 02:25 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid https://review.openstack.org/614672 | 02:29 |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid https://review.openstack.org/614672 | 02:33 |
*** markvoelker has quit IRC | 02:35 | |
*** tetsuro has joined #openstack-nova | 02:54 | |
*** mrsoul has quit IRC | 02:55 | |
*** tetsuro has quit IRC | 02:59 | |
*** tetsuro has joined #openstack-nova | 03:00 | |
*** psachin has joined #openstack-nova | 03:10 | |
*** sapd1_ has quit IRC | 03:15 | |
*** sapd1__ has joined #openstack-nova | 03:17 | |
*** icey has quit IRC | 03:18 | |
*** betherly has joined #openstack-nova | 03:21 | |
*** sapd1__ has quit IRC | 03:22 | |
*** sapd1_ has joined #openstack-nova | 03:22 | |
*** icey has joined #openstack-nova | 03:23 | |
*** betherly has quit IRC | 03:26 | |
*** markvoelker has joined #openstack-nova | 03:32 | |
*** threestrands has joined #openstack-nova | 03:46 | |
*** udesale has joined #openstack-nova | 03:50 | |
*** betherly has joined #openstack-nova | 03:53 | |
*** Dinesh_Bhor has quit IRC | 03:56 | |
*** betherly has quit IRC | 03:57 | |
*** bzhao__ has quit IRC | 03:58 | |
*** wangy has quit IRC | 04:04 | |
*** markvoelker has quit IRC | 04:06 | |
*** betherly has joined #openstack-nova | 04:24 | |
*** betherly has quit IRC | 04:29 | |
*** hongbin has quit IRC | 04:37 | |
*** tetsuro has quit IRC | 04:44 | |
*** Dinesh_Bhor has joined #openstack-nova | 04:46 | |
*** alex_xu has quit IRC | 04:53 | |
*** alex_xu has joined #openstack-nova | 04:56 | |
*** markvoelker has joined #openstack-nova | 05:02 | |
*** wangy has joined #openstack-nova | 05:10 | |
*** TuanDA has quit IRC | 05:10 | |
*** betherly has joined #openstack-nova | 05:16 | |
*** betherly has quit IRC | 05:21 | |
*** ircuser-1 has quit IRC | 05:24 | |
*** abhishekk has joined #openstack-nova | 05:26 | |
*** ratailor has joined #openstack-nova | 05:35 | |
*** markvoelker has quit IRC | 05:36 | |
*** betherly has joined #openstack-nova | 05:37 | |
*** betherly has quit IRC | 05:42 | |
*** fanzhang has joined #openstack-nova | 05:49 | |
*** betherly has joined #openstack-nova | 05:57 | |
*** betherly has quit IRC | 06:02 | |
*** wangy has quit IRC | 06:04 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid https://review.openstack.org/614672 | 06:15 |
*** Dinesh_Bhor has quit IRC | 06:25 | |
*** markvoelker has joined #openstack-nova | 06:33 | |
*** tiendc has quit IRC | 06:39 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid https://review.openstack.org/614672 | 06:44 |
*** wangy has joined #openstack-nova | 06:45 | |
*** threestrands has quit IRC | 06:52 | |
*** Dinesh_Bhor has joined #openstack-nova | 07:01 | |
*** brinzhang has quit IRC | 07:01 | |
*** brinzhang has joined #openstack-nova | 07:02 | |
*** tetsuro has joined #openstack-nova | 07:04 | |
*** icey has quit IRC | 07:04 | |
*** markvoelker has quit IRC | 07:05 | |
*** icey_ has joined #openstack-nova | 07:12 | |
*** icey_ is now known as icey | 07:14 | |
openstackgerrit | Merged openstack/nova stable/rocky: conductor: Recreate volume attachments during a reschedule https://review.openstack.org/612487 | 07:18 |
*** Dinesh_Bhor has quit IRC | 07:24 | |
*** lpetrut has joined #openstack-nova | 07:26 | |
*** Dinesh_Bhor has joined #openstack-nova | 07:30 | |
*** skatsaounis has quit IRC | 07:30 | |
*** alexchadin has joined #openstack-nova | 07:38 | |
*** pcaruana|elisa| has joined #openstack-nova | 07:40 | |
*** slaweq has joined #openstack-nova | 07:58 | |
*** pcaruana|elisa| has quit IRC | 07:59 | |
*** imacdonn has quit IRC | 08:00 | |
*** udesale has quit IRC | 08:02 | |
*** Dinesh_Bhor has quit IRC | 08:02 | |
*** markvoelker has joined #openstack-nova | 08:03 | |
*** pcaruana has joined #openstack-nova | 08:05 | |
*** lpetrut has quit IRC | 08:12 | |
*** slaweq has quit IRC | 08:18 | |
*** ralonsoh has joined #openstack-nova | 08:22 | |
*** ralonsoh has quit IRC | 08:22 | |
*** ralonsoh has joined #openstack-nova | 08:23 | |
*** markvoelker has quit IRC | 08:36 | |
*** gokhani has joined #openstack-nova | 08:37 | |
*** skatsaounis has joined #openstack-nova | 08:38 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova-specs master: Make scheduling weight more granular https://review.openstack.org/599308 | 08:46 |
*** mgoddard has joined #openstack-nova | 08:50 | |
*** tetsuro has quit IRC | 08:51 | |
*** tetsuro has joined #openstack-nova | 08:53 | |
*** alexchadin has quit IRC | 09:00 | |
*** alexchadin has joined #openstack-nova | 09:01 | |
*** rmk has quit IRC | 09:02 | |
*** rabel has quit IRC | 09:02 | |
*** Dinesh_Bhor has joined #openstack-nova | 09:05 | |
*** tetsuro has quit IRC | 09:09 | |
*** alexchadin has quit IRC | 09:11 | |
*** fghaas has joined #openstack-nova | 09:14 | |
*** derekh has joined #openstack-nova | 09:16 | |
*** udesale has joined #openstack-nova | 09:16 | |
*** wangy has quit IRC | 09:20 | |
*** fghaas has quit IRC | 09:23 | |
*** k_mouza has joined #openstack-nova | 09:24 | |
openstackgerrit | Martin Midolesov proposed openstack/nova master: vmware:PropertyCollector for caching instance properties https://review.openstack.org/608278 | 09:26 |
openstackgerrit | Martin Midolesov proposed openstack/nova master: VMware: Expose esx hosts to Openstack https://review.openstack.org/613626 | 09:26 |
*** rabel has joined #openstack-nova | 09:28 | |
*** masayukig[m] has joined #openstack-nova | 09:29 | |
*** markvoelker has joined #openstack-nova | 09:33 | |
*** ttsiouts has joined #openstack-nova | 09:36 | |
openstackgerrit | Lucian Petrut proposed openstack/os-vif master: Do not import pyroute2 on Windows https://review.openstack.org/614728 | 09:39 |
*** ttsiouts has quit IRC | 09:42 | |
*** ttsiouts has joined #openstack-nova | 09:47 | |
*** lpetrut has joined #openstack-nova | 09:49 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: PowerVM upt parity for reshaper, DISK_GB reserved https://review.openstack.org/614643 | 09:50 |
openstackgerrit | gaobin proposed openstack/nova master: Improve the properties of the api https://review.openstack.org/614730 | 09:50 |
stephenfin | lyarwood: Morning. Think you could take a look at these backports today? https://review.openstack.org/#/q/topic:bug/1799727+branch:stable/rocky | 09:52 |
*** spatel has joined #openstack-nova | 09:56 | |
*** panda|off is now known as panda | 09:56 | |
lyarwood | stephenfin: yup will do | 09:58 |
*** spatel has quit IRC | 10:00 | |
*** ralonsoh has quit IRC | 10:02 | |
*** ralonsoh has joined #openstack-nova | 10:03 | |
*** markvoelker has quit IRC | 10:07 | |
openstackgerrit | Lucian Petrut proposed openstack/os-vif master: Do not import pyroute2 on Windows https://review.openstack.org/614728 | 10:10 |
*** phuongnh has joined #openstack-nova | 10:12 | |
*** k_mouza has quit IRC | 10:16 | |
openstackgerrit | gaobin proposed openstack/nova master: Improve the properties of the api https://review.openstack.org/614730 | 10:17 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Fail to live migration if instance has a NUMA topology https://review.openstack.org/611088 | 10:17 |
*** k_mouza has joined #openstack-nova | 10:18 | |
*** k_mouza has quit IRC | 10:19 | |
*** k_mouza has joined #openstack-nova | 10:20 | |
*** jaosorior has quit IRC | 10:23 | |
*** tssurya has joined #openstack-nova | 10:24 | |
*** phuongnh has quit IRC | 10:31 | |
*** tbachman has quit IRC | 10:32 | |
*** k_mouza has quit IRC | 10:40 | |
openstackgerrit | Takashi NATSUME proposed openstack/python-novaclient master: Fix flavor keyerror when nova boot vm https://review.openstack.org/582147 | 10:41 |
*** k_mouza has joined #openstack-nova | 10:44 | |
*** alexchadin has joined #openstack-nova | 10:45 | |
*** dave-mccowan has joined #openstack-nova | 10:46 | |
*** Dinesh_Bhor has quit IRC | 10:57 | |
*** markvoelker has joined #openstack-nova | 11:04 | |
*** pcaruana has quit IRC | 11:05 | |
*** k_mouza has quit IRC | 11:07 | |
johnthetubaguy | stephenfin: yeah, it makes sense not to disrupt the chain. | 11:15 |
*** udesale has quit IRC | 11:24 | |
*** alexchadin has quit IRC | 11:26 | |
*** k_mouza has joined #openstack-nova | 11:26 | |
*** ttsiouts has quit IRC | 11:30 | |
*** jaosorior has joined #openstack-nova | 11:35 | |
*** ratailor has quit IRC | 11:36 | |
*** markvoelker has quit IRC | 11:36 | |
*** Nel1x has joined #openstack-nova | 11:40 | |
openstackgerrit | huanhongda proposed openstack/nova master: AZ operations: check host has no instances https://review.openstack.org/611833 | 11:50 |
*** pcaruana has joined #openstack-nova | 11:52 | |
*** erlon has joined #openstack-nova | 11:57 | |
*** ttsiouts has joined #openstack-nova | 12:06 | |
*** lpetrut has quit IRC | 12:13 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: WIP support attach/detach root volume 2 https://review.openstack.org/614750 | 12:19 |
jaypipes | johnthetubaguy: re: the unified limits thing... I should have some PoC code to show you by end of week. It will give us something more concrete to discuss. It doesn't impact the REST API in nova at all. | 12:20 |
*** alexchadin has joined #openstack-nova | 12:21 | |
johnthetubaguy | jaypipes: OK, cool. Which bit are you looking at, using placement or the oslo.limits piece, or both? | 12:21 |
johnthetubaguy | jaypipes: was hoping to start work on a PoC soon, now I have finished the previous project that has been distracting me full time! | 12:22 |
jaypipes | johnthetubaguy: both. | 12:22 |
johnthetubaguy | at the PTG we seemed to land on doing the placement thing second, but I would certainly like to see the two together | 12:22 |
jaypipes | johnthetubaguy: not actually using oslo.limits, but with a bunch of "TODO(jaypipes): This should be ported to oslo.limits" notes. :) Along with a healthy dose of "NOTE(jaypipes): Under no circumstances should this infect oslo.limits" | 12:23 |
johnthetubaguy | ah, OK, got you | 12:23 |
johnthetubaguy | sounds good | 12:23 |
jaypipes | johnthetubaguy: yeah, I'm tackling the limit-getting stuff first, placement queries second. | 12:23 |
jaypipes | johnthetubaguy: obviously, the limit-*setting* stuff along with quota classes are the things marked "under no circumstances should this infect oslo.limits" :) | 12:24 |
johnthetubaguy | jaypipes: I was thinking along the lines of a parallel quota system, so we just ditch all the old stuff, its too infected with junk like user limits | 12:25 |
johnthetubaguy | well, its clearly not quite that simple, but anyways, looking forward to seeing the PoC | 12:26 |
jaypipes | johnthetubaguy: yeah, I haven't touched any of the "develop a system to migrate nova to use unified limits" stuff. that part of your spec would still very much be needed. | 12:27 |
jaypipes | johnthetubaguy: that said, I've added the infrastructure to be able to configure CONF.quota.driver to something like "unified" and have that switch the underlying mechanisms for limits retrieval. | 12:27 |
jaypipes | johnthetubaguy: so hopefully that data migration stuff can build on top of my work. | 12:28 |
jaypipes | johnthetubaguy: hopefully it should all make sense when I push the code today or tomorrow. | 12:28 |
jaypipes | (I'm OOO this afternoon) | 12:28 |
johnthetubaguy | jaypipes: ah, I don't have code for it yet, only a plan. Yeah, I think I get what you mean, but will look out for the patches | 12:29 |
jaypipes | johnthetubaguy: cool, thanks. I'll add you to the reviews. | 12:30 |
*** udesale has joined #openstack-nova | 12:32 | |
*** Nel1x has quit IRC | 12:37 | |
*** brinzhang has quit IRC | 12:40 | |
*** munimeha1 has joined #openstack-nova | 12:47 | |
*** jmlowe has quit IRC | 12:47 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata https://review.openstack.org/614757 | 12:51 |
*** ttsiouts has quit IRC | 12:55 | |
*** zul has joined #openstack-nova | 12:56 | |
*** udesale has quit IRC | 12:56 | |
*** eharney has joined #openstack-nova | 13:01 | |
*** udesale has joined #openstack-nova | 13:01 | |
*** ttsiouts has joined #openstack-nova | 13:04 | |
*** zul has quit IRC | 13:04 | |
*** zul has joined #openstack-nova | 13:06 | |
*** jmlowe has joined #openstack-nova | 13:07 | |
*** tbachman_ has joined #openstack-nova | 13:10 | |
*** mchlumsky has joined #openstack-nova | 13:12 | |
*** tbachman_ is now known as tbachman | 13:13 | |
*** liuyulong has joined #openstack-nova | 13:15 | |
*** belmoreira has joined #openstack-nova | 13:17 | |
*** k_mouza has quit IRC | 13:20 | |
*** k_mouza has joined #openstack-nova | 13:25 | |
*** awaugama has joined #openstack-nova | 13:28 | |
*** mriedem has joined #openstack-nova | 13:28 | |
sean-k-mooney | bauzas: mriedem we have a regression in the os-vif 1.12.0 release which is fixed in one of my patches already so we are going to blacklist 1.12.0 in the global requirements. https://review.openstack.org/#/c/614764/1 | 13:38 |
sean-k-mooney | im going to work on getting 2 new os-vif gate jobs to test ovs with iptables and linux bridge next sprint to catch these kinds of things going forward | 13:39 |
sean-k-mooney | ill likely start on that next week however. | 13:40 |
sean-k-mooney | bauzas: as the nova release liaison could you comment on | 13:40 |
sean-k-mooney | https://review.openstack.org/#/c/614764/1 | 13:40 |
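For readers unfamiliar with how a broken release gets blocked: the usual mechanism is an exclusion entry in the openstack/requirements repo, which is what the review above proposes. A sketch of what such an entry might look like (the surrounding file contents are assumptions; only the `!=` exclusion syntax is the point):

```
# global-requirements.txt
os-vif!=1.12.0  # regression in 1.12.0; fixed in a later release
```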
mnaser | ok please forgive me if this sound silly but | 13:43 |
mnaser | microversion 1.4 > microversion 1.25, right? | 13:43 |
sean-k-mooney | no | 13:45 |
sean-k-mooney | its not a decimal point | 13:45 |
sean-k-mooney | its semantic versioning | 13:45 |
*** takashin has joined #openstack-nova | 13:46 | |
mnaser | ok | 13:46 |
mnaser | explains things | 13:46 |
* mnaser goes back to hacking things | 13:46 | |
mnaser | thanks sean-k-mooney | 13:46 |
*** liuyulong has quit IRC | 13:47 | |
johnthetubaguy | mnaser: its more like version 4.0 vs version 25.0 actually, as any micro-version can drop functionality | 13:50 |
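The point being made: the part after the dot in a microversion is an integer, not a decimal fraction, so 1.25 is newer than 1.4. A minimal sketch, comparing versions as (major, minor) integer tuples:

```python
# Minimal sketch: treat microversions as (major, minor) integer tuples,
# not floats, so 1.25 correctly sorts after 1.4.
def parse_microversion(version):
    major, minor = version.split('.')
    return int(major), int(minor)

assert parse_microversion('1.25') > parse_microversion('1.4')  # 25 > 4
assert 1.4 > 1.25  # comparing them as decimals gives the wrong answer
```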
mnaser | Okay, so trying to figure out why this upgrade somehow is causing nova to request a micro version 1.25 but the service is not providing that | 13:51 |
mnaser | Could be a super screwed up deployment too. | 13:51 |
johnthetubaguy | oh right, request the version from cinder or ironic? | 13:51 |
johnthetubaguy | we usually have a minimum version we need, which implies a minimum version of all the dependent services | 13:52 |
johnthetubaguy | mnaser: who is requesting 1.25 from whom? | 13:53 |
mnaser | johnthetubaguy: so it looks like os_region_name is not a valid option inside the placement section | 13:55 |
mnaser | So this multiregion deployment was probably hitting the wrong region. os_region_name was silently dropped? | 13:56 |
mnaser | So it was hitting an older region | 13:56 |
johnthetubaguy | good question, that sounds bad | 13:56 |
mnaser | It was removed after one cycle.. | 13:57 |
sean-k-mooney | mnaser: the simplest thing to do is pretend there is no . | 13:57 |
mnaser | https://github.com/openstack/nova/commit/3db815957324f4bd6912238a960a90624d97c518 | 13:58 |
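The commit above removes the deprecated os_region_name option from the [placement] group in favor of the keystoneauth-style option. A hedged sketch of the config change an operator would make (the replacement option name is assumed from the keystoneauth convention the group uses):

```ini
# old, removed by the commit above
[placement]
os_region_name = RegionTwo

# new
[placement]
region_name = RegionTwo
```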
mriedem | nova meeting in 2 minutes | 13:58 |
mnaser | A bit quick to remove it after just a cycle? | 13:58 |
*** suggestable has joined #openstack-nova | 14:01 | |
johnthetubaguy | mnaser: that has always been the norm for config, we just don't usually remember to do it | 14:01 |
*** suggestable has left #openstack-nova | 14:01 | |
*** liuyulong_ has joined #openstack-nova | 14:03 | |
mnaser | johnthetubaguy: ah okay | 14:04 |
johnthetubaguy | mnaser: now the whole skip version upgrades thing clearly makes that less of a good policy... not sure if we have an answer for that one yet. | 14:06 |
mnaser | johnthetubaguy: yeah, i dont do that (nor do i support that idea).. so i should look at logs :p | 14:07 |
johnthetubaguy | mnaser: heh :) | 14:09 |
mriedem | oslo.config has a new thing for FFU with config stuff | 14:15 |
johnthetubaguy | mriedem: ah, cool | 14:19 |
sean-k-mooney | oh, the meeting ended quicker than i thought it would | 14:20 |
*** takashin has left #openstack-nova | 14:20 | |
sean-k-mooney | i was going to ask people to assess https://blueprints.launchpad.net/nova/+spec/libvirt-neutron-sriov-livemigration and the related spec if they can, to indicate if this can proceed for this cycle | 14:21 |
sean-k-mooney | i have spec updates to make but they will be done later today. | 14:21 |
mriedem | johnthetubaguy: mnaser: this thing https://specs.openstack.org/openstack/oslo-specs/specs/rocky/handle-config-changes.html | 14:23 |
mriedem | i think that is still a WIP | 14:23 |
mriedem | jackding: if https://review.openstack.org/#/c/609180/ is ready for review please put it in the runways queue https://etherpad.openstack.org/p/nova-runways-stein | 14:24 |
*** mlavalle has joined #openstack-nova | 14:24 | |
*** Luzi has joined #openstack-nova | 14:27 | |
*** mvkr has quit IRC | 14:40 | |
mriedem | hmm, did something regress with performance? https://bugs.launchpad.net/nova/+bug/1800755 | 14:42 |
openstack | Launchpad bug 1800755 in OpenStack Compute (nova) "The instance_faults table is too large, leading to slow query speed of command: nova list --all-tenants" [Undecided,New] | 14:42 |
mriedem | that was fixed with https://bugs.launchpad.net/nova/+bug/1800755 | 14:42 |
mriedem | oops | 14:42 |
mriedem | https://review.openstack.org/#/c/409943/ | 14:42 |
mriedem | is there any reason we don't purge old faults? | 14:43 |
mriedem | we only show the latest | 14:43 |
mriedem | and we don't provide any API or nova-manage CLI to show *all* faults for a given instance | 14:43 |
*** Luzi has quit IRC | 14:44 | |
jackding | mriedem: sure, will do | 14:44 |
sean-k-mooney | mriedem: would that mess with audit logs? | 14:45 |
mriedem | you mean that config/api that no one uses? | 14:45 |
sean-k-mooney | mriedem: a nova-manage command could make sense or an admin only api | 14:45 |
mriedem | https://developer.openstack.org/api-ref/compute/?expanded=list-server-usage-audits-detail#server-usage-audit-log-os-instance-usage-audit-log | 14:45 |
sean-k-mooney | mriedem: no i was thinking that for some deployments there may be a requirement to record faults for audit/sla reasons | 14:46 |
sean-k-mooney | i was not thinking of any feature in particular | 14:46 |
sean-k-mooney | im just not sure if auto cleanup of old faults would be something we would want in all cases | 14:47 |
mriedem | i'm not suggesting an auto cleanup, | 14:47 |
mriedem | but a nova-manage db purge_faults | 14:48 |
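The idea floated here is a nova-manage command that deletes old rows from the cell database's instance_faults table, since the API only ever shows the newest fault per instance. A rough, hypothetical sketch of the kind of SQL such a purge might run (the command name and cutoff are made up; column names follow the instance_faults schema):

```sql
-- hypothetical purge: keep the newest fault per instance, drop older rows
-- that are also older than an operator-supplied cutoff
DELETE f FROM instance_faults f
JOIN (SELECT instance_uuid, MAX(created_at) AS newest
      FROM instance_faults
      GROUP BY instance_uuid) latest
  ON f.instance_uuid = latest.instance_uuid
WHERE f.created_at < latest.newest
  AND f.created_at < '2018-01-01 00:00:00';
```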
openstackgerrit | Surya Seetharaman proposed openstack/nova master: [WIP] Make _instances_cores_ram_count() be smart about cells https://review.openstack.org/569055 | 14:48 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: WIP: API microversion bump for handling-down-cell https://review.openstack.org/591657 | 14:48 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: [WIP] Add os_compute_api:servers:create:cell_down policy https://review.openstack.org/614783 | 14:48 |
sean-k-mooney | ya that i think makes total sense. the same way keystone allows you to purge the expired uuid tokens from its db | 14:48 |
sean-k-mooney | mriedem: were you thinking it would drop faults older than X from the db or move them to an archive table? | 14:50 |
mriedem | i'm not really putting much thought into this | 14:51 |
*** Swami has joined #openstack-nova | 14:52 | |
sean-k-mooney | its one of those things that if you brought it up at the ptg i would be like "sure go for it" but it also does not sound like a super high priority either so ya in any case i cant really think of a reason not to allow it off the top of my head | 14:54 |
mriedem | tssurya: in case you haven't started yet, i was thinking about how to do 2.68 down-cell functional api samples testing, which will require some kind of fixture to simulate a down cell, | 14:58 |
mriedem | and i think i have an idea of how to write that | 14:58 |
*** cfriesen has joined #openstack-nova | 15:00 | |
tssurya | mriedem: I saw your todos but I haven't started, feel free to start if you have the time your tests are surely going to be more thorough than mine. | 15:00 |
tssurya | bug thanks | 15:00 |
tssurya | big* | 15:00 |
mriedem | ok i think i'll just hack on a DownCellFixture in a separate patch below the API microversion one at the end, and then it could be used in the functional api samples tests, | 15:00 |
mriedem | the nice thing with fixtures is they are also context managers, | 15:01 |
mriedem | so you could create a server while the cell is 'up' and then do something like: | 15:01 |
mriedem | with down_cell_fixture: | 15:01 |
mriedem | get('/servers') | 15:01 |
mriedem | and you should get the minimal construct back | 15:01 |
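Fixtures from the python fixtures library implement __enter__/__exit__, which is what makes the `with down_cell_fixture:` usage above possible. A minimal illustrative sketch of that pattern (the patched target is an assumption for illustration; the real DownCellFixture proposed later in the day does more):

```python
import fixtures
from unittest import mock


class DownCellFixture(fixtures.Fixture):
    """Illustrative sketch: make every cell look 'down' while active."""

    def _setUp(self):
        # The patched path is an assumption; the point is that the patch
        # is applied on enter and automatically undone on exit.
        self.useFixture(fixtures.MonkeyPatch(
            'nova.context.scatter_gather_cells',
            mock.Mock(return_value={})))

# usage inside a test, after creating a server while the cell was 'up':
#     with DownCellFixture():
#         resp = self.api.get('/servers')  # expect minimal constructs back
```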
tssurya | oh nice | 15:02 |
tssurya | there was a doubt however with the sample tests, the jsons you have created.. I thought they were supposed to be created automatically once we write the tests ? | 15:02 |
mriedem | i think they are, | 15:02 |
mriedem | i was just trying to get the api-ref build to pass | 15:02 |
tssurya | ah okay :) | 15:03 |
*** liuyulong_ has quit IRC | 15:05 | |
*** liuyulong has joined #openstack-nova | 15:06 | |
mriedem | lyarwood: don't forget to add an etherpad for your forum session to https://wiki.openstack.org/wiki/Forum/Berlin2018 | 15:07 |
mriedem | i think i have crap to dump in there | 15:07 |
mriedem | melwitt: were you going to send https://etherpad.openstack.org/p/nova-forum-stein to the ML for the list of xp sessions to have warm nova bodies in attendance? | 15:09 |
lyarwood | mriedem: unfortunately I'm no longer attending, sent a note to the foundation when I found out yesterday. | 15:09 |
mriedem | lyarwood: hmm, i could possibly run that session | 15:09 |
mriedem | or we could find *someone* | 15:09 |
dansmith | I vote for mriedem | 15:09 |
mriedem | random berliner on the street | 15:09 |
mriedem | i'll pay them in sausage | 15:10 |
dansmith | he needs more stuff to do and I hear he loves volumes, especially multi-attached ones | 15:10 |
*** k_mouza has quit IRC | 15:10 | |
*** jmlowe has quit IRC | 15:10 | |
mriedem | i'll gladly moderate any number of forum sessions if it means i don't have to do any presentations | 15:10 |
*** mvkr has joined #openstack-nova | 15:11 | |
*** kukacz has quit IRC | 15:12 | |
mriedem | lyarwood: well if the foundation doesn't pull the session, it looks like i had it marked on my calendar to attend anyway so if you want i can moderate it | 15:12 |
mriedem | and just assign all of the work to you | 15:12 |
*** psachin has quit IRC | 15:13 | |
*** itlinux has quit IRC | 15:13 | |
lyarwood | mriedem: haha so nothing would ever get done | 15:13 |
lyarwood | mriedem: but yeah let me ping them quickly and see if I can save that session | 15:14 |
mriedem | there are a few volume-related specs for stein that would be good to discuss there, like this one to specify delete_on_termination when attaching a volume (and changing that value for existing attachments) | 15:14 |
mriedem | which reminds me, i dusted this off too https://review.openstack.org/#/c/393930/ | 15:15 |
mriedem | getting device tags out of the API | 15:15 |
mriedem | dansmith: i think you've been on board with that in the past ^ | 15:15 |
lyarwood | mriedem: ha the delete_on_termination issue just came up downstream and we NACK'd assuming it would be lots of work for little gain | 15:15 |
* dansmith nods | 15:16 | |
lyarwood | mriedem: are you getting pushed for that as well given users can do this in AWS? | 15:16 |
mriedem | the major problem i see with that one is we already have a PUT API for volume attachments, | 15:16 |
mriedem | the dreaded swap volume API | 15:16 |
mriedem | lyarwood: no i'm not getting pushed for it from our product people | 15:16 |
mriedem | as far as i know anyway | 15:16 |
mriedem | but it's one of those things that comes up every so often, like proxying the volume type on bfv | 15:17 |
mriedem | i don't think it's much work, it's just updating the DB | 15:17 |
mriedem | and taking a new parameter on attach | 15:17 |
mriedem | updating existing attachments is difficult b/c of our already f'ed up api | 15:18 |
mriedem | https://developer.openstack.org/api-ref/compute/#update-a-volume-attachment | 15:18 |
dansmith | it's not a major amount of heavy lifting, | 15:18 |
dansmith | but the gain seems very minor to me | 15:18 |
sean-k-mooney | so random quest. would people object to an api to list the currently enabled scheduler filters? specifically to enable tempest and other multicloud services to detect what scheduler features they can expect | 15:19 |
dansmith | the strongest argument I've seen for it is that AWS has it and thus the standalone EC2 thing needs to be able to proxy that in | 15:19 |
sean-k-mooney | *question however it could become a quest | 15:19 |
dansmith | but afaik, that's pretty much dead these days | 15:19 |
*** kukacz has joined #openstack-nova | 15:19 | |
dansmith | sean-k-mooney: yes I would object | 15:19 |
*** alexchadin has quit IRC | 15:19 | |
mriedem | "because AWS and Alibaba have it" is something i hear every week | 15:19 |
sean-k-mooney | dansmith: because we are exposing configuration via the api or something else | 15:20 |
dansmith | sean-k-mooney: it would literally be an api call that would make an rpc call to scheduler to return a chunk of config, which shouldn't be visible externally anyway. and if you're running multiple schedulers, which do you call? | 15:20 |
mriedem | the only reason i could see for doing something like that (scheduler filters and such) is to tell users, via the api, which hints are available | 15:20 |
*** fghaas has joined #openstack-nova | 15:20 | |
johnthetubaguy | sean-k-mooney: discovery of available scheduler hints was something we once said we would consider, which is a bit different | 15:21 |
dansmith | yep | 15:21 |
mriedem | right, it would only be feasible if it was a list of hints | 15:21 |
mriedem | which is totally pluggable btw | 15:21 |
sean-k-mooney | johnthetubaguy: its related to this tempest change https://review.openstack.org/#/c/570207/12 | 15:21 |
johnthetubaguy | yeah, that was the downside, in the general case, it means nothing useful | 15:21 |
sean-k-mooney | johnthetubaguy: the issue i have with the change is it requires us to keep the nova and tempest defaults in sync | 15:22 |
dansmith | sean-k-mooney: tempest has always been blackbox, requiring you to tell it the nova side scheduler config for this reason | 15:22 |
artom | sean-k-mooney, don't you dare bring more people into this. I will fly to Ireland and cut you, I swear. | 15:22 |
artom | We already can't agree downstream | 15:22 |
dansmith | tempest is a testing/validation tool.. keeping the two configs in sync is a few lines of bash | 15:22 |
mriedem | right, devstack configures the filters in both nova and tempest | 15:22 |
dansmith | right | 15:22 |
*** k_mouza has joined #openstack-nova | 15:22 | |
sean-k-mooney | dansmith: the issue is making tripleo do that | 15:22 |
mriedem | devstack also adds the same/different host filters which aren't in the default enabled_filters list for nova | 15:23 |
dansmith | sean-k-mooney: s/bash/puppet/ | 15:23 |
mriedem | for any nfv ci, they'd also need to configure to numa/pci filters | 15:23 |
mriedem | etc | 15:23 |
sean-k-mooney | dansmith: ya i know its just tripleo is a pain to make work instead of devstack | 15:23 |
dansmith | sean-k-mooney: adding an api to nova to work around tripleo not being able to communicate config to another module is INSANITY | 15:23 |
sean-k-mooney | mriedem: yes today they only need to enable it in nova however | 15:23 |
johnthetubaguy | I think sdague convinced me about this in the past, you don't want auto discovery, you want to tell the test system what you expect to happen, else there be dragons | 15:24 |
artom | dansmith, it's not communicate per se - if tripleo doesn't set the nova value, it shouldn't have to set the corresponding tempest value | 15:24 |
dansmith | artom: find another way | 15:25 |
dansmith | seriously. | 15:25 |
johnthetubaguy | matching defaults? | 15:25 |
artom | dansmith, my other way is https://review.openstack.org/#/c/570207/12 | 15:25 |
sean-k-mooney | johnthetubaguy: ya well it was just a thought; my main issue with artom's change is that we would have to keep it in sync if we add a filter to the default set in the future | 15:25 |
artom | johnthetubaguy, that's what ^^ is | 15:25 |
artom | But apparently everyone is literally willing to fight to the death over this. | 15:25 |
dansmith | I am | 15:25 |
* dansmith breaks a bottle on the table | 15:25 | |
dansmith | let's do it. | 15:25 |
artom | I only have this bluetooth mouse :( | 15:26 |
dansmith | forfeit? | 15:26 |
* johnthetubaguy hopes people defend themselves with pumpkins | 15:26 | |
sean-k-mooney | johnthetubaguy: the interop benefit is really only a side effect and i dont feel that strongly that its a good thing | 15:26 |
artom | Pfft, as if. I'm making brass knuckles. Wireless ones. | 15:26 |
*** k_mouza has quit IRC | 15:27 | |
johnthetubaguy | artom: curious, when nova changes a default in its config, what happens to the rest of the tempest settings? | 15:29 |
artom | johnthetubaguy, you mean for other config options where Tempest uses values from Nova? Good question - gmann was saying on that review that they just update Tempest, but I'd need to look for concrete examples | 15:31 |
johnthetubaguy | artom: cool, that is what I assumed. I know its branchless, but the default is just a helping hand. | 15:31 |
* johnthetubaguy shudders, its complicated | 15:32 | |
artom | johnthetubaguy, yeah, I grok that it can't be perfect, I figured at least making it match what Nova has in master is a good first step. | 15:32 |
*** rmk has joined #openstack-nova | 15:32 | |
artom | johnthetubaguy, because the previous default of 'all' is... well, it's a handy "feature" for CIs, because they can just enable any filter in Nova and Tempest just runs with it | 15:33 |
*** gyee has joined #openstack-nova | 15:33 | |
artom | But it becomes a problem if a filter *hasn't* been enabled in Nova, Tempest will still try to run with it. | 15:33 |
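To make the sync being argued about concrete: the filters nova's scheduler enables and the filters tempest is told to expect live in two different config files, and devstack (or any deployment tool) has to keep them aligned. A hedged sketch (the tempest option name is an assumption based on the review under discussion):

```ini
# nova.conf
[filter_scheduler]
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter

# tempest.conf -- should mirror the list above, otherwise tests that rely
# on a filter nova never enabled will still try to run
[compute-feature-enabled]
scheduler_available_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter
```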
mriedem | this is fun https://bugs.launchpad.net/nova/+bug/1800508 | 15:52 |
openstack | Launchpad bug 1800508 in OpenStack Compute (nova) "Missing exception handling mechanism in 'schedule_and_build_instances' for DBError at line 1180 of nova/conductor/manager.py" [Low,New] | 15:52 |
mriedem | "nova should set the instance to error state when nova fails to insert the instance into the db" | 15:53 |
sean-k-mooney | mriedem: um wait, if nova cant insert the instance into the db what is it setting error on? | 15:54 |
artom | Chicken, meet again. Cart, meet horse. | 15:54 |
artom | "again"? I mean egg | 15:54 |
mriedem | the only thing i could think there is we could try updating the instance within the build request, but that's pretty shitty | 15:57 |
sean-k-mooney | mriedem: in this case however it seems like they are booting with an invalid flavor id right? | 15:59 |
mriedem | no | 15:59 |
mriedem | he's injecting some kind of fault into the code | 15:59 |
mriedem | to trigger the db error | 15:59 |
mriedem | if you try to boot with an invalid flavor id, you'll get a 404 in the api looking up the flavor | 15:59 |
sean-k-mooney | oh ok i was trying to figure out how they created a flavor with id 1E+22 but then failed to boot with that flavor | 16:00 |
mriedem | so, i mean, your cell db could drop right when we're trying to create the server i guess, that would do it as well | 16:00 |
mriedem | but are we going to handle that scenario everywhere in nova? | 16:00 |
sean-k-mooney | ok right so in that case the instance would be in the api db but fail to insert into the cell db | 16:01 |
sean-k-mooney | we moved the instance status into the api db recently right, so ya in that case we could set error on the api db i guess but there are a ton of other edge cases like that we dont handle | 16:02 |
sean-k-mooney | mriedem: the other thing we could do is have a periodic task that just updates the status of perpetually building instances to error after some time e.g. a day or retry limit*build timeout or something | 16:04 |
mriedem | this guy has been busy https://bugs.launchpad.net/nova/+bug/1800204 https://bugs.launchpad.net/nova/+bug/1799949 | 16:05 |
openstack | Launchpad bug 1800204 in OpenStack Compute (nova) "n-cpu.service consuming 100% of CPU indeterminately" [Undecided,New] | 16:05 |
openstack | Launchpad bug 1799949 in OpenStack Compute (nova) "VM instance building forever when an RPC error occurs" [Undecided,New] | 16:05 |
sean-k-mooney | mriedem: actually https://bugs.launchpad.net/nova/+bug/1800204 seems familiar, there was a similar bug report a few months back around the rocky release | 16:06 |
openstack | Launchpad bug 1800204 in OpenStack Compute (nova) "n-cpu.service consuming 100% of CPU indeterminately" [Undecided,New] | 16:06 |
*** k_mouza has joined #openstack-nova | 16:10 | |
*** belmoreira has quit IRC | 16:11 | |
sean-k-mooney | oh wait thats n-cpu not the conductor, never mind | 16:13 |
*** itlinux has joined #openstack-nova | 16:14 | |
*** imacdonn has joined #openstack-nova | 16:15 | |
*** ttsiouts has quit IRC | 16:16 | |
sean-k-mooney | mriedem: do you think we will actually address any of those bugs. | 16:16 |
stephenfin | artom: So, do I need to review https://code.engineering.redhat.com/gerrit/#/c/154627/2 yet? | 16:16 |
sean-k-mooney | stephenfin: wrong irc | 16:16 |
stephenfin | ta :) | 16:16 |
mriedem | sean-k-mooney: probably not | 16:19 |
mriedem | unless there is a more obvious way to create those faults with injecting code into the path and blow up the system | 16:19 |
mriedem | *without | 16:19 |
*** ircuser-1 has joined #openstack-nova | 16:20 | |
sean-k-mooney | mriedem: i was just debating if we should triage them as incomplete or wontfix unless a different way to reproduce can be provided | 16:22 |
mriedem | i marked one of them as opinion | 16:23 |
*** k_mouza has quit IRC | 16:24 | |
johnthetubaguy | FWIW, I always wanted to be able to "timeout" tasks to try and catch that pending forever case. They caused me endless pain at Rackspace (I think mostly in the migrate/resize code path). The difference was they were more expected / user triggered errors. | 16:25 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add --before to nova-manage db archive_deleted_rows https://review.openstack.org/556751 | 16:25 |
mriedem | johnthetubaguy: how much of that was resolved with service user tokens though? | 16:26 |
mriedem | or the long_rpc_timeout we have since rocky | 16:26 |
mriedem | which we're using now in the live migration flows that do rpc calls | 16:26 |
johnthetubaguy | mriedem: yeah, I saw that go in. Although most of those cases it went to Error (eventually) when it didn't have to. | 16:27 |
*** ttsiouts has joined #openstack-nova | 16:27 | |
mriedem | that's a different bug then | 16:27 |
sean-k-mooney | johnthetubaguy: i have seen this happen with rabbitmq restarts in the past when persistence was disabled, on instance build and a few other cases. | 16:29 |
sean-k-mooney | i never really considered that a nova bug however because i caused the issue by restarting rabbit | 16:29 |
*** slaweq has joined #openstack-nova | 16:29 | |
sean-k-mooney | johnthetubaguy: but ya its probably more complicated than just setting to error after x time as some requests could still be in flight | 16:31 |
melwitt | mriedem: yes, it completely slipped my mind :( and I'm not done going through the entire list of the schedule yet | 16:36 |
mriedem | melwitt: i added several sessions in there based on my schedule | 16:40 |
melwitt | ok, thank you. that's helpful | 16:41 |
johnthetubaguy | sean-k-mooney: yeah, its hard to get right | 16:47 |
*** k_mouza has joined #openstack-nova | 16:53 | |
*** Swami has quit IRC | 16:55 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: API microversion bump for handling-down-cell https://review.openstack.org/591657 | 16:56 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add DownCellFixture https://review.openstack.org/614810 | 16:56 |
mriedem | tssurya: ^ | 16:56 |
tssurya | looking, thanks | 16:57 |
*** k_mouza has quit IRC | 16:59 | |
*** mgoddard has quit IRC | 17:02 | |
*** belmoreira has joined #openstack-nova | 17:02 | |
tssurya | mriedem: okay I am going to write the tests here https://review.openstack.org/#/c/591657/12/nova/tests/functional/api_sample_tests/test_servers.py based on your fixture | 17:04 |
*** udesale has quit IRC | 17:06 | |
openstackgerrit | Merged openstack/nova master: Make ResourceTracker.tracked_instances a set https://review.openstack.org/608781 | 17:13 |
dansmith | mriedem: tssurya I'm explaining the host-status concept to someone right now, and why an instance state doesn't go to STOPPED just because the compute node is down | 17:18 |
dansmith | mriedem: I wonder if it would make sense to integrate the use of the UNKNOWN state we're adding here with that feature, | 17:18 |
*** fghaas has quit IRC | 17:18 | |
dansmith | so that in the same microversion, instances with a down host show up as UNKNOWN as well | 17:18 |
*** fghaas has joined #openstack-nova | 17:19 | |
*** belmoreira has quit IRC | 17:19 | |
*** fghaas has quit IRC | 17:19 | |
*** k_mouza has joined #openstack-nova | 17:20 | |
*** belmoreira has joined #openstack-nova | 17:20 | |
tssurya | dansmith: you mean you want to add a new "UNKNOWN" vm_state ? | 17:25 |
*** Swami has joined #openstack-nova | 17:26 | |
dansmith | tssurya: you're already doing that from the external view right now | 17:26 |
tssurya | yea | 17:26 |
dansmith | we would do a similar thing for real instances we can look up just fine, but which have down hosts | 17:26 |
dansmith | the only problem would be that right now UNKNOWN means "the rest of the instance details aren't there" which would be slightly more ambiguous in this case | 17:27 |
*** slaweq has quit IRC | 17:27 | |
tssurya | hmm, makes sense to make the instance state UNKNOWN since we don't know the host state, I mean I guess "UNKNOWN" could mean unknown details/state right ? | 17:28 |
belmoreira | dansmith mriedem should placement/nova issues be discussed here or in placement channel | 17:28 |
cfriesen | dansmith: for what it's worth, in our environment if a compute node goes down an external entity sets all of the instances to the "error" state, until they were automatically recovered. | 17:28 |
dansmith | that seems like an improper use of the error state to me | 17:29 |
dansmith | not to mention that nova on its own won't know whether they're still up and fine or not | 17:29 |
cfriesen | if the host is "down", then we fence it off and force a reboot. those instances are guaranteed to be toast | 17:29 |
dansmith | which is why we don't call them "stopped" | 17:29 |
dansmith | cfriesen: okay, well, that's better in that case but vanilla nova can't do or know that | 17:30 |
cfriesen | agreed, nova itself can't know the bigger picture | 17:30 |
dansmith | belmoreira: depends on what it is.. if it's integration issues then probably here | 17:30 |
belmoreira | it is the increase in the number of requests to placement | 17:32 |
cfriesen | dansmith: although, if an external entity uses the nova API to tell nova that the compute node is "down", it's supposed to have already fenced off the node to prevent instances from (eg) talking to volumes. | 17:32 |
belmoreira | have a look into: https://docs.google.com/document/d/1d5k1hA3DbGmMyJbXdVcekR12gyrFTaj_tJdFwdQy-8E/edit?usp=sharing | 17:32 |
cfriesen | otherwise you could evacuate and then have two copies of an instance trying to access the same cinder volume | 17:32 |
dansmith | cfriesen: yeah, true.. I guess I just prefer something less overloaded like UNKNOWN than saying it's stopped or error | 17:33 |
belmoreira | this is the number of requests to placement when compute-nodes get upgraded to Rocky | 17:33 |
dansmith | efried: ^ | 17:33 |
efried | how far back am I reading? | 17:33 |
dansmith | efried: one line | 17:33 |
dansmith | and the url he posted | 17:33 |
dansmith | belmoreira: is it really increasing, or is that you bringing nodes on over time? | 17:35 |
belmoreira | the increase of requests shows the compute nodes being upgraded over time (queens -> rocky) | 17:36 |
dansmith | okay | 17:36 |
dansmith | (ouch) | 17:36 |
efried | What's happening at that cat-head bump? | 17:36 |
efried | or possibly batman | 17:36 |
dansmith | online migrations? | 17:36 |
efried | those look like trait requests | 17:37 |
efried | if I'm reading this right. | 17:37 |
belmoreira | no, it must be a cell that upgraded and then stopped nova-compute | 17:37 |
*** ttsiouts has quit IRC | 17:38 | |
belmoreira | so, in the second graph we can see all the new requests | 17:38 |
*** ttsiouts has joined #openstack-nova | 17:38 | |
*** liuyulong has quit IRC | 17:39 | |
belmoreira | UUID/traits ; ?in_tree; UUID/aggregates; ... | 17:39 |
*** k_mouza_ has joined #openstack-nova | 17:40 | |
efried | right, so it looks to me like, before rocky, we weren't calling ?in_tree, UUID/aggregates, or ?member_of at all. Which makes sense. | 17:40 |
belmoreira | yes, and this seems to be the reason for the increase in requests | 17:41 |
efried | but also increased number of requests for inventories. | 17:42 |
belmoreira | but it is a huge increase. Just added another graph with the response time of my placement infrastructure | 17:42 |
efried | I have to say, this isn't all that surprising. | 17:42 |
*** ttsiouts has quit IRC | 17:43 | |
*** k_mouza has quit IRC | 17:43 | |
efried | although, hm, I would have expected this jump in queens | 17:44 |
efried | belmoreira: Was this an upgrade from queens, or from earlier? | 17:44 |
*** k_mouza_ has quit IRC | 17:45 | |
belmoreira | efried from queens | 17:45 |
belmoreira | I could handle it by creating more placement nodes (x3). But it looks like too much... | 17:47 |
efried | belmoreira: Can you give me a sense of what this timeline represents? At what point are all the upgrades done and the cloud in stable state? | 17:48 |
belmoreira | efried the nova/placement control plane was upgraded between 8:00 and 9:00. ~12:00 the compute nodes started to upgrade (this takes 24h for all of them to upgrade) | 17:50 |
belmoreira | at 12:00 (today) almost all compute nodes are in Rocky. | 17:51 |
efried | belmoreira: So where it tails off at the end, that's when the upgrades are pretty much done? | 17:51 |
belmoreira | the load graphs shows when I added more capacity for placement | 17:52 |
efried | Do you have a graph for what it looks like right now? | 17:52 |
efried | I'm just wondering if it's a massive spike during upgrade, but then it evens back out afterward. | 17:52 |
efried | in which case... yeah | 17:52 |
belmoreira | efried I'm getting a new graph from now | 17:54 |
efried | though once again, I wouldn't have expected e.g. ?in_tree to be zero at queens. That should be happening every periodic. | 17:55 |
dansmith | efried: you mean you think it's startup storm? | 17:57 |
dansmith | so every time they reboot computes they'll get this? | 17:57 |
efried | dansmith: If you reboot a thousand computes... | 17:58 |
efried | dansmith: I just wanted to understand *whether* it was startup storm. | 17:58 |
dansmith | right but presumably they're not rebooting them every second | 17:58 |
dansmith | ack | 17:58 |
efried | Whether it goes back to normal once everything stabilizes | 17:58 |
openstackgerrit | Merged openstack/nova stable/rocky: De-dupe subnet IDs when calling neutron /subnets API https://review.openstack.org/608336 | 17:58 |
dansmith | they also know what upgrades look like | 17:58 |
efried | (I don't) | 17:58 |
dansmith | so the fact that they're concerned probably means something | 17:58 |
efried | Heh, I'm not tryng to weasel out of anything. Just trying to grok the problem domain. | 17:59 |
dansmith | no, I know | 17:59 |
dansmith | just sain' | 17:59 |
dansmith | even if we just made the reboot storm a lot worse, that's something we probably need to look at | 18:00 |
belmoreira | efried a new graph from now | 18:00 |
*** derekh has quit IRC | 18:00 | |
belmoreira | it is flat at the end. That is the total number of requests that we handle now | 18:00 |
efried | dansmith: Can you sanity-check me on this, though - the _refresh_associations code is in queens, including _ensure_resource_provider invoking _get_provider_in_tree, which is what invokes the ?in_tree URI. | 18:01 |
efried | the mystery being, why would they be seeing zero ?in_tree calls right before the upgrade? | 18:01 |
dansmith | I just headed into a meeting I have to pay attention to | 18:01 |
belmoreira | hmm. tssurya just pointed out the "resource_provider_association_refresh" configuration that we had in queens; we don't have it in rocky | 18:04 |
efried | mm, that'd explain a lot. Y'all added that to compensate for this kind of spike in placement traffic iirc | 18:05 |
belmoreira | efried that explains " I wouldn't have expected e.g. ?in_tree to be zero at queens" | 18:05 |
efried | yup | 18:05 |
mriedem | i thought you totally nuked resource_provider_association_refresh rather than just set it to a large value? | 18:06 |
efried | but also why all those things are zero before the upgrade and nonzero after. Like I was saying, I expect all this stuff to happen at the queens boundary, not rocky. | 18:06 |
belmoreira | in queens we patched it and set it to a very large number (to not run again). And I missed it now. My fault! | 18:07 |
efried | IOW I suspect you would have seen the same graphs simply by turning that switch off and leaving your nodes at queens | 18:08 |
belmoreira | but the number of requests is really impressive! meaning it is very difficult to keep this option in a large infrastructure | 18:08 |
efried | belmoreira: I don't disagree with that. | 18:09 |
mriedem | so by default, every compute (70K?) is refreshing inventory every 1 minute, and every 5 minutes it's also refreshing in_tree, aggregates and traits? | 18:10 |
efried | I would think moving it to a fairly generous interval and hoping your computes don't all hit that interval at the same time :) | 18:10 |
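For reference, the option in question controls how often each compute's report client re-fetches aggregate/trait/sharing associations from placement. A hedged sketch of raising it instead of patching it out (the [compute] group and 300-second default are assumptions based on recent releases):

```ini
# nova.conf on the compute nodes
[compute]
# default is 300 seconds; a larger value spreads out placement traffic at
# the cost of picking up externally-made aggregate/trait changes more slowly
resource_provider_association_refresh = 86400
```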
tssurya | mriedem: yea | 18:10 |
mriedem | and we don't use the aggregates stuff in compute yet at all from what i can tell | 18:10 |
mriedem | it was there for sharing providers which we don't support yet | 18:10 |
efried | well, didn't we start cloning host azs ? | 18:10 |
mriedem | that's in the API | 18:10 |
efried | but we're not using that in the scheduler yet? | 18:11 |
mriedem | the mirrored aggregates stuff? yes there are pre-request placement filters that rely on it (or something external doing the mirroring) | 18:11 |
mriedem | i'm not sure what that has to do with the cache / refresh for aggregates in all the computes | 18:11 |
mriedem | iow, i'm not sure what the cache in the compute buys us | 18:12 |
efried | yeah, I'm actually trying to think what we actually use the cache for at all... right. | 18:12 |
*** ralonsoh has quit IRC | 18:13 | |
efried | cdent has been grousing for a while that we should just be able to make placement calls when we need 'em. | 18:13 |
sean-k-mooney | if we really wanted to make the storm less likely we could use a random prime offset for the update | 18:14 |
efried | I was thinking it, but then you said it. | 18:14 |
mriedem | oslo already does something like that for periodics | 18:15 |
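The suggestion is to stagger the refresh so thousands of computes don't hit placement on the same tick. A minimal illustrative sketch of the jitter idea, independent of oslo's actual periodic-task machinery:

```python
import random
import time


def run_periodic(task, interval, jitter=0.5):
    """Run task() every `interval` seconds after a randomized first delay.

    The random offset spreads many workers across the interval so they do
    not all hit the server at the same moment (sketch only).
    """
    time.sleep(random.uniform(0, interval * jitter))
    while True:
        task()
        time.sleep(interval)
```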
*** jdillaman has quit IRC | 18:15 | |
efried | Trying to think what it would take to rip out the cache completely. | 18:15 |
belmoreira | I'm changing this option, it will take ~2h to propagate. Will let you know the result | 18:15 |
efried | ack | 18:15 |
belmoreira | I have to leave now for some minutes. Thanks for all your help | 18:16 |
efried | o/ | 18:16 |
efried | mriedem: We use the cache data so the virt driver has the opportunity every periodic to update the provider tree. | 18:17 |
efried | mriedem: assuming stable placement, no _refresh'ing, we would be doing a helluva lot fewer calls | 18:18 |
efried | and that's also why we cache agg data. Because upt gets to muck with that stuff also. | 18:18 |
mriedem | but nothing does right now right? | 18:18 |
mriedem | for aggregates | 18:18 |
mriedem | and assuming inventory isn't wildly changing on a compute node, we don't really need the cache | 18:19 |
efried | not sure I'm following. | 18:19 |
efried | are you saying "as long as nothing is changing, we don't need to call update_provider_tree" ? | 18:20 |
mriedem | update_provider_tree is what returns the inventory from the driver to the RT to push off to placement every 60 seocnds | 18:20 |
mriedem | *seconds | 18:20 |
mriedem | right? | 18:20 |
efried | Yes | 18:21 |
mriedem | and assuming that disk/ram/cpu on a host doesn't change all that often, at least without a restart of the host, it seems odd we need to cache that information | 18:21 |
efried | But how else would we know whether to push the info back to placement? | 18:21 |
*** ldau has joined #openstack-nova | 18:21 | |
mriedem | in the before upt times, didn't the RT/report client just pull inventory, compare to what was reported by the driver, and the PUT it back if there were changes? | 18:21 |
efried | What does "pull inventory" mean, though? | 18:22 |
*** jmlowe has joined #openstack-nova | 18:22 | |
efried | pull from placement | 18:22 |
mriedem | GET /resource_providers/{rp_uuid}/inventories | 18:22 |
efried | i.e. GET /rps/UUID/inventory | 18:22 |
efried | yeah | 18:22 |
sean-k-mooney | efried: well the driver could have a periodic check but remember the last value it sent and only send a value if it detects there was a change | 18:22 |
efried | sean-k-mooney: ^ cache | 18:22 |
sean-k-mooney | thats not the same as a cache | 18:22 |
efried | and that's what we do | 18:23 |
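The pre-update_provider_tree flow being described boils down to: GET the inventory placement already has, compare it with what the virt driver reports, and only PUT when something changed. A simplified sketch against the placement REST API (the session plumbing is assumed; the endpoints and payload shape follow the placement API):

```python
def sync_inventory(session, rp_uuid, local_inventory):
    """PUT local inventory to placement only if it differs (sketch)."""
    resp = session.get('/resource_providers/%s/inventories' % rp_uuid)
    remote = resp.json()
    if remote['inventories'] == local_inventory:
        return  # nothing changed this periodic; skip the PUT
    session.put(
        '/resource_providers/%s/inventories' % rp_uuid,
        json={'resource_provider_generation':
                  remote['resource_provider_generation'],
              'inventories': local_inventory})
```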
mriedem | get_provider_tree_and_ensure_root is what gets the provider tree from the report client and pulls the current inventory from placement, yes? | 18:23 |
mriedem | and also checks to see that the provider exists on every periodic | 18:23 |
efried | yes | 18:23 |
mriedem | which we should actually know | 18:23 |
efried | yeah, we could conceivably expect the compute RP not to disappear once we've created it. | 18:24 |
efried | I mean, I don't know how resilient we're trying to be in the face of OOB changes. | 18:24 |
efried | we do offer a placement CLI, not just for GETs but for writes as well | 18:24 |
mriedem | the compute service record, compute node record, and rp can all be deleted if the compute service record is deleted | 18:25 |
mriedem | but to get the compute service record back, you have to restart the compute service | 18:25 |
mriedem | to recreate the record which would also re-create the compute node | 18:26 |
mriedem | and then the RP | 18:26 |
mriedem | since we know https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/compute/resource_tracker.py#L780 | 18:26 |
sean-k-mooney | efried: true but in that case could we do what we do with neutron and have placement send a notification to nova that it was changed instead of polling | 18:26 |
mriedem | we can pass that down | 18:26 |
mriedem | i'll hack something up quick | 18:26 |
efried | IOW, only have the create code path on start=True | 18:26 |
efried | and don't bother with the existence check otherwise | 18:26 |
mriedem | not even start true | 18:26 |
mriedem | since https://github.com/openstack/nova/commit/418fc93a10fe18de27c75b522a6afdc15e1c49f2 we have a flag to pass through when we create the compute node | 18:26 |
mriedem | we just don't plumb it far enough | 18:27 |
mriedem | i can push up a change that does | 18:27 |
efried | mriedem: That's what pike looked like, though. The stuff that's causing the spike is necessary for *enablement* of nrp, which we haven't started using yet | 18:27 |
mriedem | that might save precious ms for belmoreira :) | 18:27 |
efried | so it seems useless atm | 18:27 |
efried | but as soon as we get e.g. neutron or cyborg adding shit to the tree, we're going to need to do that ?in_tree call every periodic. | 18:27 |
efried | unless there's some kind of async notification hook to trigger a refresh | 18:28 |
efried | yeah, what sean-k-mooney said. | 18:28 |
mriedem | i'm saying we can resolve this todo i think https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/scheduler/client/report.py#L1011 | 18:28 |
mriedem | can we agree on that? | 18:28 |
mriedem | neutron sending an event is possible, but it's also per instance... | 18:29 |
mriedem | not per host | 18:29 |
efried | mriedem: unfortunately not anymore, because we plan to allow other-than-nova to edit the tree. | 18:29 |
mriedem | including delete the compute node root provider? | 18:29 |
mriedem | that nova creates? | 18:29 |
efried | no, not that. | 18:29 |
mriedem | well isn't that what that todo is all about? | 18:29 |
mriedem | create the resource provider for the compute node if it doesn't exist | 18:30 |
efried | no | 18:30 |
sean-k-mooney | mriedem: we are going to allow them to manage their own subtrees only so they wont be allowed to modify any nodes created by nova | 18:30 |
*** tssurya has quit IRC | 18:30 | |
efried | mriedem: You could probably factor out *just* the root provider part of that; but you can't get rid of the whole method. | 18:31 |
mriedem | i'm not saying remove the method | 18:31 |
efried | and the GET that _ensure_resource_provider is doing is the ?in_tree one that we can't get rid of anyway. | 18:31 |
mriedem | because of something external adding/removing things from the root | 18:35 |
mriedem | right? | 18:35 |
sean-k-mooney | well an external entity can only legally add nested resource providers to the root node, they cant add inventories or traits | 18:36 |
sean-k-mooney | technically the api does not enforce that as we dont have owners of resource providers in the api however | 18:37 |
mriedem | what i'm hearing is we can't remove this todo https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/scheduler/client/report.py#L1011 even if we know we didn't just create the root compute node, because we need to call it anyway to determine if there are new nested providers under that pre-existing compute node | 18:38 |
mriedem | s/remove/resolve/ | 18:38 |
mriedem | iow, the todo should be removed b/c we can't do anything about it | 18:38 |
mriedem | even if we *know* the compute node record was just created | 18:38 |
sean-k-mooney | um, i dont know if we need to know if there are new nested resource providers | 18:39 |
mriedem | that's the whole in_tree thing i thought | 18:39 |
sean-k-mooney | in fact i would assert that as the compute node we dont need to know that | 18:39 |
mriedem | ok well what i'm saying is we (the RT) know when we created a new compute node record, and thus need to create its resource provider, i guess i'll wait for someone to tell me if that's worth doing so we can avoid the GET /resource_providers?in_tree=<uuid of the thing we just created and thus doesn't exist yet> case | 18:40 |
sean-k-mooney | mriedem: in that case i think you are right, we don't need the /resource_providers?in_tree=<thing i just created> call | 18:42 |
sean-k-mooney | the placement api will not allow me to create a resource provider with a parent uuid that does not exist | 18:43 |
* mriedem shoots self and moves on | 18:43 | |
efried | sean-k-mooney: We *do* need to know if there are new nested providers. | 18:44 |
sean-k-mooney | efried: there can't be nested resource providers of a compute node if we have not created the compute node yet, right | 18:44 |
efried | That's what ?in_tree is about, though. | 18:45 |
efried | ?in_tree=$compute_rp gives me the compute RP and any descendants. | 18:45 |
sean-k-mooney | yes | 18:45 |
sean-k-mooney | but if the compute RP does not exist yet then cyborg can't create nested resources under it | 18:46 |
efried | So at T0, it gives me nothing, so I create the compute RP. At T1 it gives me the compute RP. At T2, cyborg creates a child provider for a device. At T3 ?in_tree=$compute_rp gives me both providers. | 18:46 |
efried | If I didn't call ?in_tree I would never know about that device RP | 18:46 |
efried | and I need to know about that device RP e.g. from my virt driver so I can white/blacklist it and/or deploy it. | 18:46 |
sean-k-mooney | efried: sure, but you don't own that, and as nova you can't directly modify it | 18:46 |
efried | Unclear whether blacklisting happens at cyborg or at nova | 18:47 |
efried | but what you say makes sense (single ownership) so it would have to be at cyborg. | 18:47 |
*** mvkr has quit IRC | 18:47 | |
efried | Which means virt driver-esque code would need to be invoked by cyborg to do discovery in the first place. | 18:48 |
*** belmoreira has quit IRC | 18:48 | |
*** cfriesen has quit IRC | 18:48 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove TODO from get_provider_tree_and_ensure_root https://review.openstack.org/614835 | 18:48 |
sean-k-mooney | efried: well not necessarily | 18:48 |
*** cfriesen has joined #openstack-nova | 18:49 | |
sean-k-mooney | we did suggest that in update_provider_tree we could call os-acc to do that | 18:49 |
efried | yes | 18:49 |
efried | wait | 18:49 |
sean-k-mooney | but what cyborg needs to do is (1) look up the root provider for the compute node | 18:50 |
efried | update_provider_tree would call os_acc with a list of discovered-and-already-whitelist-scrubbed devices so that cyborg can create the providers? | 18:50 |
sean-k-mooney | efried: no | 18:50 |
sean-k-mooney | if nova is doing the discovery we are rebuilding cyborg in nova | 18:50 |
sean-k-mooney | the idea was that we would pass in the current tree to cyborg and it would do the discovery itself and append to that tree | 18:51 |
sean-k-mooney | but the other approach | 18:51 |
sean-k-mooney | which is what we were going to do | 18:51 |
sean-k-mooney | was cyborg polling placement for the compute node to be created. | 18:52 |
sean-k-mooney | then it would add child resource providers to the tree created by nova | 18:52 |
sean-k-mooney | but not modify any resource provider it did not create | 18:52 |
sean-k-mooney | efried: today are we doing the provider tree update by PUT or PATCH? if PUT, what would it take to make it a PATCH so nova can do a partial update and merge it on the placement side | 18:57 |
efried | mriedem: quick fix pls | 18:58 |
efried | sean-k-mooney: patch is only applicable if you're talking about modifying part of a single provider. Which I think we're not considering. | 18:59 |
efried | sean-k-mooney: IIUC, you're suggesting modifying some providers in the tree, but not others. That's still PUT - one per provider to be modified. | 18:59 |
efried | And it's what we do today, see update_from_provider_tree | 19:00 |
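(To illustrate the point: flushing a changed tree is one PUT per modified provider, each carrying that provider's generation. A rough sketch under those assumptions; `client` and the helper name are illustrative, not the actual update_from_provider_tree code.)

    # Sketch: compare the old vs. new view of the tree and PUT only the
    # providers whose inventory actually changed.
    def flush_changed_inventories(client, old_tree, new_tree):
        for uuid, new_rp in new_tree.items():
            old_rp = old_tree.get(uuid, {})
            if new_rp['inventories'] == old_rp.get('inventories'):
                continue  # untouched provider: no request at all
            client.put(
                '/resource_providers/%s/inventories' % uuid,
                json={
                    # the generation makes placement reject the write (409)
                    # if someone else modified the provider in the meantime
                    'resource_provider_generation': new_rp['generation'],
                    'inventories': new_rp['inventories'],
                })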
sean-k-mooney | efried: ok cool | 19:00 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove TODO from get_provider_tree_and_ensure_root https://review.openstack.org/614835 | 19:00 |
efried | +2 ^ | 19:00 |
*** ldau has quit IRC | 19:01 | |
*** itlinux has quit IRC | 19:01 | |
sean-k-mooney | what i was actually suggesting was making it a single PATCH call to placement to update all resource providers in a tree owned by service x, but that's a different conversation | 19:03 |
efried | totally | 19:03 |
sean-k-mooney | i am still not aware of a use case that would require nova to be aware of a resource provider created by another service, by the way | 19:04 |
sean-k-mooney | when i say nova i specifically mean the compute agent | 19:05 |
*** pcaruana has quit IRC | 19:05 | |
dansmith | efried: so was there some outcome? | 19:07 |
efried | dansmith: Remember that thing they did where they disabled the refresh_associations? | 19:08 |
dansmith | oh they reverted that? | 19:08 |
dansmith | accidentally | 19:08 |
efried | dansmith: That's why all those calls were zeroes in queens and nonzero once they upgraded (because that hack was no longer there). Yeah. | 19:08 |
dansmith | ah cool. | 19:09 |
efried | So belmiro is going to reinstate that and come back at us. | 19:09 |
dansmith | right on | 19:09 |
efried | but it's still a shit ton of calls | 19:09 |
efried | Matt and Sean and I brainstormed briefly on whether we could just get rid of the cache completely (and whether that would actually help). | 19:10 |
efried | And what we actually ended up doing was getting rid of a comment: https://review.openstack.org/614835 :( | 19:10 |
sean-k-mooney | efried: well i think we could maybe get rid of the cache but i think it needs a spec not irc ideas | 19:13 |
efried | sean-k-mooney: I would rather see a PoC in code for that one than a spec. | 19:13 |
sean-k-mooney | efried: well that's a possibility, but it would be similar to neutron's notifications for port/network events | 19:14 |
efried | yeah, a subscribable notification framework at placement itself would be cool. | 19:15 |
sean-k-mooney | yep that is what i was about to type but had not decided how to phrase it | 19:15 |
mriedem | there was a blueprint for that at one point i think | 19:15 |
mriedem | https://blueprints.launchpad.net/nova/+spec/placement-notifications | 19:16 |
sean-k-mooney | well this is a much more positive response to this idea than i had expected | 19:16 |
sean-k-mooney | if we had an owner attribute on every provider and an api to register owners with callbacks then each service could register a subscription to the rps they own | 19:17 |
mriedem | and if ifs and buts were candy and nuts we'd all have a merry christmas | 19:19 |
*** tbachman has quit IRC | 19:19 | |
efried | I was thinking much simpler to start. | 19:19 |
efried | You could register for a notification any time $rp_uuid is touched. | 19:19 |
efried | which includes "create a resource provider with $rp_uuid as a root", which solves the use case we were discussing. | 19:20 |
sean-k-mooney | efried: i considered that but that could be a lot of RPs | 19:20 |
efried | Only one per host | 19:20 |
sean-k-mooney | that said i guess you would only have to do that on creating the rp the first time | 19:20 |
sean-k-mooney | at least for clean deployments | 19:20 |
cfriesen | sean-k-mooney: does nova-compute need to know about the child resource providers that cyborg created? | 19:20 |
*** panda is now known as panda|off | 19:20 | |
sean-k-mooney | cfriesen: i dont think so | 19:21 |
sean-k-mooney | cfriesen: not in any of the interaction specs i have seen | 19:21 |
efried | cfriesen: Yeah, we talked about that above; the virt driver needs to know about them for purposes of whitelisting, deploying/attaching, etc. | 19:21 |
sean-k-mooney | efried: no it does not | 19:21 |
sean-k-mooney | the whitelisting is cyborges job | 19:21 |
*** ldau has joined #openstack-nova | 19:22 | |
sean-k-mooney | and deploying/attaching will be done in terms of the VARs or whatever the equivalent of a port binding has become | 19:22 |
cfriesen | assuming nova owns a specific set of resource providers, and it's the only thing consuming from those resource providers (can we assume that) then it should only have to update inventory once at startup. | 19:22 |
openstackgerrit | Merged openstack/os-vif master: Do not import pyroute2 on Windows https://review.openstack.org/614728 | 19:23 |
ldau | Hi, somebody has installed all-in-one openstack using vmware as hypervisor? | 19:23 |
sean-k-mooney | cfriesen: that is the assumption we stated in denver so yes i think that is still true | 19:23 |
cfriesen | now if anything else can consume those resources, then we need the periodic inventory update | 19:23 |
efried | sean-k-mooney, cfriesen: if all of that is true, then we can indeed resolve the TODO that mriedem just blew away. | 19:23 |
efried | cfriesen: You mocking this up? | 19:24 |
cfriesen | not me. :) | 19:24 |
efried | cfriesen: "Consume the resources" doesn't matter. Changing inventory matters, but only for the providers I onw. | 19:24 |
efried | own | 19:24 |
efried | we don't cache allocation/usage data. | 19:25 |
cfriesen | efried: agreed. I guess it'd have to be something like CPU/RAM hotplug where it actually changes the inventory | 19:25 |
efried | But that would be noticed by the virt driver, which would update_provider_tree, which the rt would flush back. | 19:26 |
sean-k-mooney | cfriesen: it would be the virtdirver that would do that however | 19:26 |
efried | IOW, we're not getting rid of the cache. We're getting rid of all the cache *refreshing*. | 19:26 |
sean-k-mooney | efried: yes | 19:26 |
efried | sean-k-mooney: you mocking this up? | 19:26 |
efried | or is mriedem? | 19:26 |
mriedem | i stopped paying attention, what now? | 19:27 |
sean-k-mooney | efried: if we just disable the refresh in the config, does that not effectively mock it up | 19:27 |
efried | mriedem: We're operating on the hypothesis that nova does *not* in fact need to know if outside agents create child providers that they will continue to own. | 19:27 |
efried | mriedem: And if that's true, we do *not* in fact need to refresh the cache, ever. | 19:28 |
efried | mriedem: So we *can* in fact resolve the TODO you just removed. | 19:28 |
efried | sean-k-mooney: More or less, yeah. Which is what CERN did. Which they seem to have had success with. | 19:28 |
mriedem | how about someone poop this out in the ML and get it sorted out there when gibi, jaypipes and cdent can also weigh in | 19:28 |
efried | sean-k-mooney: I think there's more we can do, though. | 19:29 |
mriedem | i think in general, we should default to *not* cache b/c of the cern issue, and only allow caching if you want to opt-in b/c you have a wildly busy env where inventory is changing a lot | 19:29 |
efried | mriedem: What was your idea to get the compute RP creation happening only once? | 19:29 |
mriedem | which i guess is a powervm thing | 19:29 |
sean-k-mooney | efried: yes proably | 19:29 |
mriedem | efried: yes, create the compute node root rp when the compute node is created, otherwise don't attempt to do the _ensure_resource_provider thing again | 19:29 |
mriedem | i.e. is it a powervm thing to be swapping out disk and such on the fly and expect nova-compute to just happily handle that? | 19:30 |
cfriesen | mriedem: changing inventory, or changing usage? | 19:30 |
mriedem | inventory | 19:31 |
mriedem | anyway, that probably doesn't matter here, | 19:31 |
mriedem | we get the inventory regardless to know if it changed so we can push updates back to placement | 19:31 |
cfriesen | do we expect anyone to have wildly changing inventory? I would have thought inventory is relatively stable. | 19:31 |
mriedem | cfriesen: that's what i said about 2 hours ago | 19:31 |
sean-k-mooney | mriedem: if it's something that is discovered by the virt driver it's not an issue if it changes | 19:31 |
mriedem | i also don't think we really need to worry about refreshing aggregate relationships in the compute, | 19:32 |
efried | mriedem: I contend it doesn't matter if powervm (or any driver) changes inventory every single periodic. Because update_provider_tree is getting whatever the previous state of placement was (because placement isn't changed yet) and then update_from_provider_tree is flushing that back to placement *and* updating the cache accordingly. | 19:32 |
mriedem | since we don't do anything with those yet | 19:32 |
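(For context, roughly how the pieces efried refers to fit together in the periodic update. The method names exist in nova, but the signatures are simplified here; this is a sketch, not the actual resource tracker code.)

    # Sketch of the periodic flow: read the current (cached) view, let the
    # virt driver mutate it, then flush only the diff back to placement,
    # which also refreshes the cache from what was written.
    def _update_to_placement(context, reportclient, driver, compute_node):
        prov_tree = reportclient.get_provider_tree_and_ensure_root(
            context, compute_node.uuid, name=compute_node.hypervisor_hostname)
        driver.update_provider_tree(prov_tree, compute_node.hypervisor_hostname)
        reportclient.update_from_provider_tree(context, prov_tree)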
*** belmoreira has joined #openstack-nova | 19:32 | |
efried | yeah, what sean-k-mooney said, only bigger. | 19:32 |
sean-k-mooney | so do we all agree we don't need to refresh the cache, at least in any case that is immediately obvious to us | 19:33 |
efried | tentatively yes | 19:33 |
mriedem | i would have thought a lot of prior discussion about why we even have a cache in the first place has happene | 19:34 |
mriedem | *happened | 19:34 |
mriedem | therefore it seems pretty severe to just all of a sudden say, "oh i guess we don't" | 19:34 |
mriedem | and if there are reasons, do those reasons justify us caching by default | 19:35 |
mriedem | anyway, those are questions for the ML, not irc | 19:35 |
sean-k-mooney | if so then should we start with a patch to default the refresh to off. and a mailing list post to see what operators think/other feedback | 19:35 |
mriedem | cern is the only deployment big enough and new enough that i've heard complain about that refresh | 19:35 |
mriedem | not sure if mnaser is doing anything about it | 19:36 |
mnaser | hi | 19:36 |
* mnaser reads | 19:36 | |
mriedem | mnaser: tl;dr do you turn this way down? https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh | 19:36 |
mriedem | to avoid computes DDoS'ing placement every 5 minutes | 19:37 |
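(The option mriedem links is a per-compute nova.conf knob; the value below is only an example of turning the refresh way down, and — per the later discussion — a 0/"never" setting is still just a proposal at this point, not something the option accepts.)

    [compute]
    # Interval, in seconds, for refreshing aggregate/trait/sharing-provider
    # associations from placement. The default is 300; a large value means
    # far fewer GETs from each nova-compute.
    resource_provider_association_refresh = 86400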
mnaser | i didn't know about that but i do feel like placement gets waaaaay too much unnecessary traffic | 19:37 |
mnaser | an idle cloud (aka literally no new vms being created/deleted) will constantly hit placement | 19:38 |
* mnaser never understood why | 19:38 | |
efried | so mriedem, to do the thing you were talking about earlier, we would add is_new_compute_node as a kwarg from _update_available_resource into _update => _update_to_placement and then only call _get_provider_tree_and_ensure_root if it's true? | 19:38 |
mriedem | mnaser: inventory updates baby! | 19:38 |
mnaser | i'm all in favour of minimizing the amount of traffic that placement gets however a lot of times we end up seeing weird stuff happen in placement db | 19:38 |
mnaser | so if we have to edit stuff via the api | 19:38 |
mnaser | it'd be nice to just know they work | 19:39 |
*** itlinux has joined #openstack-nova | 19:39 | |
mriedem | efried: that's what i was thinking yeah, and nearly started writing it, but then you guys all said in_tree was mega importante | 19:39 |
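(A rough sketch of the gating mriedem and efried are talking about: thread a flag from the resource tracker so the ensure-root call only happens when the compute node record was just created. Names and signatures here are hypothetical, and the in_tree caveat discussed above is exactly why this alone may not be enough.)

    # Sketch (hypothetical plumbing): _update_available_resource knows when
    # it just created the ComputeNode record, so it can pass that down.
    def _update_to_placement(self, context, compute_node,
                             is_new_compute_node=False):
        if is_new_compute_node:
            # Only now do we need the expensive "ensure the root RP exists"
            # call; otherwise we could trust the cached provider tree.
            self.reportclient.get_provider_tree_and_ensure_root(
                context, compute_node.uuid,
                name=compute_node.hypervisor_hostname)
        # ...followed by the usual update_provider_tree / flush dance.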
efried | mnaser: True story. If you edit something by hand, and we've switched off all this cache refreshing, the only way you're going to pick it up again is to restart the compute service. | 19:39 |
efried | or maybe we can work a HUP in there or something. | 19:39 |
mriedem | HUP is how we refresh the enabled/disabled cell cache in the scheduler | 19:40 |
sean-k-mooney | and any of the mutable config stuff if we have it so HUP makes sense | 19:40 |
mnaser | but really the only editing ive had to do was delete/remove allocations | 19:40 |
mnaser | that were stale for $reasons | 19:40 |
sean-k-mooney | mnaser: oh if its allocation that fine | 19:40 |
efried | mriedem: based on having toggled some kind of setting, right? I.e. HUP is a no-op if you haven't changed anything. | 19:40 |
mnaser | so forgive me for my silly question but | 19:40 |
mnaser | why do computes care about that information | 19:41 |
efried | mnaser: which information specifically? | 19:41 |
efried | allocations? | 19:41 |
mnaser | well, whatever api hits all the time, i think its allocations? | 19:41 |
efried | what hits the API all the time is inventory, providers, aggregates. | 19:41 |
mnaser | but inventory can be a one time hit on start up because afaik the state of that is pretty darn static right | 19:42 |
sean-k-mooney | no this checks aggregates and traits on the provider, not the associations | 19:42 |
efried | mnaser: And what we're discussing is, the virt drivers need to know what that stuff looks like so they can *modify* the provider layout, inventory, etc. | 19:42 |
sean-k-mooney | *allocations | 19:42 |
efried | BUT that stuff shouldn't change unless the virt driver changes it | 19:42 |
efried | ...or if you muck with it in the CLI :) | 19:42 |
* mnaser should probably read placement for dummies | 19:42 | |
mnaser | yeah usually my mucking around is around allocations, that's where things get out of sync usually | 19:43 |
efried | mnaser: Placement for dummies won't help you. And for placement-in-nova, there's no such thing as for-dummies. | 19:43 |
mnaser | but i mean the "traits" change dynamically? | 19:43 |
mnaser | anyways, i wont let your discusion diverge too much | 19:43 |
efried | no, traits is a good point, /me thinks... | 19:44 |
mriedem | mnaser: you are asking and saying the same thing i've been saying for an hour or so, | 19:44 |
mriedem | that inventory is pretty static | 19:44 |
mnaser | its 100% static.. things in inventory are | 19:44 |
sean-k-mooney | mnaser: we technically proposed that operators should be able to add traits to RPs but currently the virt driver just overwrites them i think | 19:44 |
efried | once again, if traits change due to external factors, you could HUP to get that flushed. | 19:44 |
mriedem | traits could be changed out of band if you're decorating capabilities on your compute node for scheduling, | 19:44 |
mnaser | memory.. disk.. vcpus.. | 19:44 |
mriedem | and aggregates aren't used in the compute service (yet) | 19:44 |
mnaser | yeah unless someone is hot plugging in memory/disk/cpu | 19:45 |
mnaser | i dont see inventory changing | 19:45 |
mnaser | and yes i agree traits make sense if you wanna say this compute node is special.. but also, does that compute node really care to know if its special? | 19:45 |
mnaser | only the scheduler cares that it's special.. | 19:45 |
sean-k-mooney | mriedem: well the only ones that are used are the ones that are created from nova host aggregates (assuming jay's stuff landed last cycle) | 19:45 |
mriedem | mnaser: almost correct | 19:45 |
efried | you could run into some interesting race conditions. If you muck with traits at the same time as the virt driver is mucking with traits, whoever gets there last will win. | 19:45 |
mriedem | right, we try to merge in what the compute is reporting for traits with what was put on the node resource provider for traits externally | 19:45 |
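(A toy illustration of the merge mriedem describes: keep externally-decorated traits while applying whatever the driver reports. This is not nova's literal code; deciding which traits count as "driver-owned" is exactly the hard part.)

    # Sketch: union of driver-reported traits with operator-added ones.
    # `known_driver_traits` stands in for "traits the driver could ever
    # report"; anything else on the provider is treated as external.
    def merge_traits(driver_traits, provider_traits, known_driver_traits):
        external = set(provider_traits) - set(known_driver_traits)
        return set(driver_traits) | external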
*** fghaas has joined #openstack-nova | 19:46 | |
sean-k-mooney | mriedem: sorry you said compute service ignore that | 19:46 |
mnaser | ok so its like | 19:46 |
mriedem | the *only* thing on the compute that would care about aggregates in placement is shared storage providers, | 19:46 |
mriedem | which we don't support yet | 19:46 |
mnaser | self reported traits + "user" decorated traits | 19:46 |
mriedem | mnaser: yes | 19:46 |
mnaser | but the nova-compute reported traits.. i feel like those are pretty static right? maybe i just don't know any wild use cases | 19:47 |
mriedem | i think i've been saying since pike, at least queens, we don't need to refresh aggregate info in compute | 19:47 |
mriedem | mnaser: proably depends on the driver | 19:47 |
mnaser | but i feel like most of the time, if a system trait changes, you probably have nova-compute restart things | 19:47 |
mnaser | ah ok | 19:47 |
mriedem | vmware would love to be able to randomly proxy traits from vcenter through nova-compute to placement | 19:47 |
cfriesen | mnaser: the one exception would be something like vTPM where the driver uses the presence of the requested trait to decide to do something with the instance. | 19:47 |
mriedem | for changes in vcenter | 19:47 |
sean-k-mooney | mnaser: the intent with the user decorated traits was to let the operator tag nodes with stuff the virt driver can't discover, or to express policy | 19:47 |
cfriesen | mnaser: but that's really looking at the requested trait, not the trait on the resource provider | 19:48 |
sean-k-mooney | cfriesen: but that is in the instance request | 19:48 |
mnaser | yeah, i dunno, i feel like those will not change much, and i think just calling a method *once* when you make some changes rather than all the time isn't problematic | 19:48 |
sean-k-mooney | cfriesen: it does not need the RP info | 19:48 |
cfriesen | sean-k-mooney: yah, that's what I realized after typing the first sentence. :) | 19:48 |
*** tbachman has joined #openstack-nova | 19:48 | |
mnaser | i mean lets be honest, we don't "refresh" allocations and those can go pretty stale | 19:48 |
mnaser | are "system" and "user" traits distingusable or in a race they can wipe each other out? | 19:49 |
*** ldau has quit IRC | 19:49 | |
mriedem | system traits might be 'standard' traits | 19:49 |
mriedem | user traits would be CUSTOM traits | 19:49 |
sean-k-mooney | efried: i assume we update the resource provider generation when updating traits? | 19:49 |
mriedem | but a user could put standard traits on a provider that the virt driver doesn't report | 19:49 |
mriedem | sean-k-mooney: yes | 19:49 |
efried | sean-k-mooney: Placement does | 19:49 |
cfriesen | sean-k-mooney: on a totally different topic, have you ever run into a scenario where qemu has a thread sitting at 100% cpu but not making any forward progress? I'm assuming it's a livelock somehow, just not sure how. | 19:50 |
sean-k-mooney | ya so one of either the user or the virt driver will fail in that case and have to retry | 19:50 |
mriedem | i think the only virt driver today that reports any traits is the libvirt driver reporting cpu features | 19:50 |
efried | so yeah, this is where re-GET-and-redrive comes into play. | 19:50 |
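(A minimal sketch of the re-GET-and-redrive efried mentions, using the PUT /resource_providers/{uuid}/traits payload shape from the placement API; the client object and helper name are illustrative.)

    # Sketch: if our write loses the race (409 generation conflict), fetch
    # the provider again to pick up the new generation and retry.
    def put_traits_with_retry(client, rp_uuid, traits, attempts=3):
        for _ in range(attempts):
            rp = client.get('/resource_providers/%s' % rp_uuid).json()
            resp = client.put(
                '/resource_providers/%s/traits' % rp_uuid,
                json={'resource_provider_generation': rp['generation'],
                      'traits': sorted(traits)})
            if resp.status_code != 409:
                return resp
        raise RuntimeError('traits update still conflicting after %d tries'
                           % attempts)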
sean-k-mooney | so there is no race | 19:50 |
mnaser | i still don't see the need for constantly updating, sounds like this is something that you can just report once on start up or when it changes | 19:50 |
sean-k-mooney | well unless the client is coded badly | 19:50 |
efried | mnaser: Yes | 19:50 |
mnaser | MAYBE pull it in when a new VM gets spun up | 19:51 |
*** mvkr has joined #openstack-nova | 19:51 | |
mnaser | if it means a different codepath | 19:51 |
*** tbachman_ has joined #openstack-nova | 19:51 | |
sean-k-mooney | so currently resource_provider_association_refresh has a min value of 1. can we allow 0 and define that to mean update the cache only on startup or SIGHUP | 19:51 |
*** cfriesen has quit IRC | 19:52 | |
*** cfriesen has joined #openstack-nova | 19:53 | |
sean-k-mooney | mriedem: did you not have a proposal to report things like support_migration as traits too | 19:53 |
cfriesen | dunno what's going on today, I keep disconnecting | 19:53 |
sean-k-mooney | mriedem: or is that handled by the compute manager above the virt driver level | 19:54 |
*** tbachman has quit IRC | 19:54 | |
*** tbachman_ is now known as tbachman | 19:54 | |
mnaser | anyways thats my 2 cents | 19:54 |
* mnaser goes back to dealing with rocky upgrades | 19:54 | |
efried | mriedem, sean-k-mooney: Is ComputeManager.reset() the right hook for that SIGHUP thing? | 19:56 |
*** tbachman has quit IRC | 19:56 | |
mriedem | sean-k-mooney: https://review.openstack.org/#/c/538498/ | 19:56 |
mriedem | efried: yeah | 19:56 |
sean-k-mooney | mriedem: ah ya that is what i was thinking of | 19:56 |
openstackgerrit | Merged openstack/nova master: PowerVM upt parity for reshaper, DISK_GB reserved https://review.openstack.org/614643 | 19:57 |
efried | mriedem: So like if I wanted to "clear the cache" I could add | 19:58 |
efried | self.scheduler_client = scheduler_client.SchedulerClient() | 19:58 |
efried | self.reportclient = self.scheduler_client.reportclient | 19:58 |
efried | to that method | 19:58 |
efried | or if I wanted to be narrower about it, I could add a reset() to the report client and invoke self.reportclient.reset() from there instead. | 19:58 |
mriedem | i reckon | 19:59 |
sean-k-mooney | efried: reset might be better | 19:59 |
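(A sketch of the narrower option being discussed: give the report client a reset() that drops its provider/association caches and invoke it from ComputeManager.reset(), the SIGHUP hook. The attribute names are illustrative of the cache in question, not a committed design.)

    # Sketch: clear the cached provider tree and association timestamps so
    # the next periodic repopulates them from placement.
    class SchedulerReportClient(object):
        def reset(self):
            # provider_tree here is nova.compute.provider_tree
            self._provider_tree = provider_tree.ProviderTree()
            self._association_refresh_time = {}

    # ...and in ComputeManager.reset() (invoked on SIGHUP):
    #     self.reportclient.reset()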
mriedem | this just seems like a 180 in attitude about how important it is to having nova-compute be totally self-healing on every periodic | 19:59 |
mriedem | which i'm sure was debated to death in releases past | 19:59 |
sean-k-mooney | mriedem: well nothing is stopping us from also reviving the notification idea to move the healing to a push model | 20:00 |
mriedem | there are plenty of things stopping me from doing anything | 20:01 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Clean up cpu_shared_set config docs https://review.openstack.org/614864 | 20:01 |
efried | mriedem: I think a lot of the reasoning in the past was because information was coming from several different places while we were transitioning to placement but not fully there yet. | 20:02 |
efried | We're getting pretty close to fully there at this point, so I think a lot of this stuff is going to get to be cleaned up. | 20:02 |
efried | Like _normalize_allocation_and_reserved_bullshit() | 20:03 |
mriedem | efried: maybe, but i specifically remember you bringing up something once about how powervm shared storage pools can have disk swapped in and out on a whim and nova-compute should be cool with reporting that as it changes - but maybe that's unrelated to this, idk | 20:04 |
efried | mriedem: 1) that's not implemented yet, but 2) even when it is, that gels just fine with this, precisely because it's being node by virt.powervm.update_provider_tree and *not* out of band. | 20:04 |
efried | s/node/done/ | 20:05 |
mriedem | ok i thought it was to handle some out of band thing | 20:05 |
mriedem | but it was awhile ago and i've been high on ether since then | 20:06 |
sean-k-mooney | well in addition to the SIGHUP stuff + disabling the cache refresh via config =0: if we inject a sleep(random(refresh interval)) seconds into that specific periodic task once, the jitter should spread out the updates over the entire interval smoothly on average | 20:06 |
sean-k-mooney | so for those that don't turn this off the same amount of updates to placement will happen, just not all at once every x seconds | 20:07 |
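(A tiny sketch of that jitter idea: a one-time random offset before the first refresh so computes don't all hit placement in the same second. Purely illustrative.)

    import random
    import time

    def stagger_first_refresh(refresh_interval):
        # One-time random delay in [0, refresh_interval); after this the
        # periodic keeps its normal cadence, just phase-shifted per host.
        time.sleep(random.uniform(0, refresh_interval))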
*** eharney has quit IRC | 20:08 | |
mriedem | i imagine cern would appreciate a way to disable the refresh altogether since they are already doing that out of tree | 20:10 |
efried | I'm working something up now. | 20:11 |
sean-k-mooney | efried: code or ml post or spec | 20:11 |
*** belmoreira has quit IRC | 20:12 | |
efried | sean-k-mooney: code | 20:13 |
efried | If it gets traction, I can spec it. | 20:13 |
sean-k-mooney | cool | 20:13 |
*** itlinux has quit IRC | 20:14 | |
sean-k-mooney | mriedem: by the way i know you're busy with other stuff but do you plan to revive https://review.openstack.org/#/c/538498/ at some point | 20:15 |
*** itlinux has joined #openstack-nova | 20:15 | |
mriedem | it's pretty low priority | 20:16 |
sean-k-mooney | mriedem: ok i starred it. so in the unlikely event i run out of things to do i might take a look at it if you don't get back to it. but ya there are many things ahead of it | 20:18 |
openstackgerrit | Matt Riedemann proposed openstack/nova-specs master: High Precision Event Timer (HPET) on x86 guests https://review.openstack.org/607989 | 20:27 |
mriedem | cfriesen: jackding: ^ i cleaned that up, +2 | 20:29 |
cfriesen | mriedem: sweet, thanks. any chance you could take another look at the vtpm one? | 20:30 |
cfriesen | sean-k-mooney: you too | 20:30 |
sean-k-mooney | cfriesen: am sure | 20:30 |
sean-k-mooney | cfriesen: i'm respinning a patch but i'll take a look after | 20:30 |
mriedem | ffs yes you know i'd love t | 20:30 |
mriedem | *to | 20:30 |
cfriesen | you're so sweet | 20:30 |
*** KeithMnemonic has joined #openstack-nova | 20:31 | |
mriedem | i have my moments | 20:32 |
mriedem | once per quarter | 20:32 |
KeithMnemonic | mriedem: can someone help move this along https://review.openstack.org/#/c/611326/1 ? | 20:32 |
*** dave-mccowan has quit IRC | 20:32 | |
* cfriesen snags another bag of leftover halloween snacks | 20:32 | |
mriedem | KeithMnemonic: umm, melwitt and/or dansmith could probably hammer that through | 20:32 |
mriedem | KeithMnemonic: how far back do you need that fix? | 20:34 |
KeithMnemonic | thanks melwitt: dansmith: can you help here ? | 20:34 |
KeithMnemonic | just pike | 20:34 |
mriedem | ok i can work on the queens and pike backports in the meantime | 20:35 |
KeithMnemonic | but it needs to get in rocky first then | 20:35 |
mriedem | yup | 20:35 |
melwitt | looking | 20:35 |
KeithMnemonic | thanks for helping out!! | 20:35 |
*** cdent has joined #openstack-nova | 20:43 | |
*** slaweq has joined #openstack-nova | 20:50 | |
openstackgerrit | sean mooney proposed openstack/os-vif master: add support for generic tap device plug https://review.openstack.org/602384 | 21:02 |
openstackgerrit | sean mooney proposed openstack/os-vif master: add isolate_vif config option https://review.openstack.org/612534 | 21:02 |
*** erlon has quit IRC | 21:05 | |
openstackgerrit | sean mooney proposed openstack/os-vif master: always create ovs port during plug https://review.openstack.org/602384 | 21:07 |
openstackgerrit | sean mooney proposed openstack/os-vif master: add isolate_vif config option https://review.openstack.org/612534 | 21:07 |
sean-k-mooney | jaypipes: sorry for the delay, i should have addressed all your comments in ^ i have also reworded the commit message for the first patch to clarify things a little | 21:08 |
mriedem | cfriesen: done https://review.openstack.org/#/c/571111/ | 21:10 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Fix NoneType error in _notify_volume_usage_detach https://review.openstack.org/614868 | 21:11 |
cfriesen | thanks. do you think we should deal with shelve/unshelve as part of this, given that it's broken for UEFI nvram currently? | 21:11 |
mriedem | i think if you're not going to deal with it now, it should be explicitly called out as a limitation | 21:12 |
cfriesen | okay, happy to do that | 21:12 |
mriedem | happier than adding shelve support anyway :) | 21:12 |
cfriesen | I think for both cases we'd need to store those files somewhere, either in glance or maybe swift (if present) | 21:13 |
mriedem | nova doesn't do anything with swift directly so idk | 21:14 |
mriedem | if only we had switched to glare 3 years ago when they wanted us to | 21:14 |
cfriesen | fyi, there are actual differences between 1.2 and 2.0 other than CRB | 21:15 |
mriedem | i figured maybe there were, but idk what they are | 21:16 |
mriedem | but assume people that care about using this would know the difference | 21:16 |
cfriesen | me too. :) | 21:16 |
mriedem | ooo https://www.dell.com/support/article/us/en/04/sln312590/tpm-12-vs-20-features | 21:16 |
cfriesen | my impression is that this stuff is all crazy complicated | 21:17 |
sean-k-mooney | cfriesen: yes yes it is | 21:17 |
mriedem | cool, let's add it to nova! | 21:17 |
mriedem | WHAT COULD GO WRONG?! | 21:17 |
sean-k-mooney | mriedem: well a version number is a lot better than traits for all the crap added in each version | 21:18 |
cfriesen | you're giving me nightmares | 21:18 |
mriedem | i'm fine with reporting the different versions as traits | 21:19 |
mriedem | https://en.wikipedia.org/wiki/Trusted_Platform_Module#TPM_1.2_vs_TPM_2.0 could be a reference in the spec if we cared | 21:19 |
mriedem | sounds like 2.0 is more secure | 21:19 |
sean-k-mooney | cfriesen: the cloud platform group gave them to me first when they wanted me to enable tpm traits 12 months ago | 21:19 |
*** awaugama has quit IRC | 21:19 | |
sean-k-mooney | mriedem: yes it is | 21:20 |
sean-k-mooney | mriedem: when i was originally trying to standardise tpm traits i had multiple version traits https://review.openstack.org/#/c/514712/3/os_traits/hw/platform/security.py | 21:21 |
sean-k-mooney | but honestly 1.2 and 2.0 are all that matter | 21:22 |
sean-k-mooney | as far as i know very few deployments of tpm 1.0 or 1.1 were ever a thing | 21:22 |
cfriesen | on a totally different topic, I'd like to draw your attention to https://review.openstack.org/#/c/473973/ | 21:24 |
cfriesen | originally we used these for the nova/neutron update where we were being blasted with a bunch of neutron updates. now with the changes to get fewer neutron updates it's probably not as big a deal, but we might want to consider using the fair locks in a few places. | 21:26 |
sean-k-mooney | cfriesen: so these are basically the opposite of priority locks hehe | 21:27 |
cfriesen | they're like ticket spinlocks | 21:27 |
sean-k-mooney | cfriesen: just looking at the implementation | 21:29 |
cfriesen | the original problem we hit was that the nova-compute thread handling "real work" (like a migration or something) was being starved by tons of incoming neutron events that always got the lock first | 21:29 |
cfriesen | sean-k-mooney: for simplicity it uses the fact that fasteners writer locks are queued | 21:29 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/pike: Fix NoneType error in _notify_volume_usage_detach https://review.openstack.org/614872 | 21:30 |
sean-k-mooney | cfriesen: how does this interact with and without eventlet's monkeypatching | 21:31 |
cfriesen | should just work. the underlying stuff is threading.Condition | 21:33 |
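(A small sketch of the idea in that review, assuming — as cfriesen says above — that fasteners queues writer acquisitions so waiters are served roughly in FIFO order. Illustrative only; the real proposal lives in https://review.openstack.org/#/c/473973/.)

    import collections
    import contextlib

    import fasteners

    # One reader/writer lock per name; always taking the write lock gives the
    # queued, "fair" behaviour instead of a free-for-all mutex, so a burst of
    # event handlers can't starve a long-running task waiting on the lock.
    _fair_locks = collections.defaultdict(fasteners.ReaderWriterLock)

    @contextlib.contextmanager
    def fair_lock(name):
        with _fair_locks[name].write_lock():
            yield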
*** slaweq has quit IRC | 21:36 | |
sean-k-mooney | cfriesen: one comment if you re-spin the patch, but ya it's neat | 21:38 |
sean-k-mooney | that said if you did not need a named lock you could just use the reader/writer lock directly | 21:39 |
jackding | mriedem: I forgot to push my change, Thank you for doing that. | 21:45 |
sean-k-mooney | cfriesen: so for realtime guests do you care that we can't disable the performance monitoring unit in the libvirt xml in nova | 22:06 |
cfriesen | sean-k-mooney: I don't think it's come up. Do they default to on? | 22:10 |
sean-k-mooney | cfriesen: yep | 22:10 |
sean-k-mooney | i have no idea what the impact of that is | 22:10 |
sean-k-mooney | i assume low | 22:10 |
sean-k-mooney | but i have an internal email asking about turning on realtime instances and that was the only item that is not already supported upstream | 22:11 |
sean-k-mooney | i could write a patch to allow disabling it in like an hour, just not sure it's worth my time and/or if people would accept the patch if i did | 22:11 |
cfriesen | sean-k-mooney: I don't see a "perf" section if I do "virsh dumpxml" | 22:12 |
cfriesen | maybe we default it to off or something in libvirt | 22:13 |
sean-k-mooney | its in this section https://libvirt.org/formatdomain.html#elementsFeatures | 22:13 |
sean-k-mooney | and it's defaulted to on in libvirt | 22:14 |
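(For reference, the domain XML element being discussed, per the formatdomain docs linked above; what sean-k-mooney wants is to be able to emit it with state='off' from nova.)

    <domain>
      <features>
        <!-- PMU virtualization; libvirt documents this as available
             since 1.2.12 -->
        <pmu state='off'/>
      </features>
    </domain>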
cfriesen | "virsh domstats --perf <domain>" gives me nothing | 22:16 |
sean-k-mooney | cfriesen: virsh dumpxml | grep pmu ? | 22:16 |
*** threestrands has joined #openstack-nova | 22:17 | |
cfriesen | nothing | 22:17 |
sean-k-mooney | what version of libvirt are you running | 22:17 |
sean-k-mooney | the docs could be wrong | 22:17 |
cfriesen | 3.5.0 | 22:18 |
sean-k-mooney | and qemu | 22:18 |
cfriesen | qemu-kvm-ev-2.10.0 | 22:18 |
*** cdent has quit IRC | 22:19 | |
sean-k-mooney | ok it said since 1.2.12 ill assume the docs are wrong until they show me a vm xml with this from an openstack instance | 22:19 |
cfriesen | I have a specific CPU model though, not host-passthrough, if that matters | 22:20 |
sean-k-mooney | it may; in this case it was using host-passthrough | 22:21 |
sean-k-mooney | that said, the pmu is not a cpu flag so it should not | 22:21 |
*** mriedem has quit IRC | 22:23 | |
sean-k-mooney | actually maybe when the default is on it just does not include it in the xml | 22:23 |
sean-k-mooney | i'll get them to verify it's actually on before spending any more time on it. thanks cfriesen :) | 22:24 |
cfriesen | how do we handle long URLs in specs? | 22:24 |
sean-k-mooney | i believe flake8 ignores them | 22:25 |
sean-k-mooney | at least it appeared to in the ones i was writing, so i just put them in the references section and use [0]_ to refer to them | 22:25 |
sean-k-mooney | i don't believe there is an openstack url shortener so just use google or something else if you need to | 22:26 |
cfriesen | hmm..just had a thought. is there a way to schedule based on libvirt version? | 22:32 |
sean-k-mooney | cfriesen: nope but you could have a trait | 22:33 |
* sean-k-mooney ducks before jaypipes see ^ | 22:33 | |
cfriesen | heh. actually, I think I'm okay. I have a trait for TPM 2.0, and that requires libvirt 4.5 which will also support CRB | 22:34 |
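(For illustration: requiring such a capability at scheduling time uses the trait:<TRAIT_NAME>=required flavor extra spec syntax; the trait and flavor names below are placeholders, not something nova reports today.)

    # Example only: schedule on a reported trait instead of a libvirt version
    openstack flavor set vtpm.small --property trait:CUSTOM_TPM_2_0=required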
sean-k-mooney | ya i think realistically we don't want to expose software versions to schedule on and should use features instead | 22:35 |
sean-k-mooney | TPM 2.0 is different as that is referring to an iso standard, and well, they take a bit more time to have revisions and get implemented in hardware | 22:36 |
jaypipes | sean-k-mooney: you're now officially on the naughty list. | 22:38 |
sean-k-mooney | hehe i did say lets not use traits for this :) also was i ever not? | 22:38 |
*** fghaas has quit IRC | 22:45 | |
jaypipes | :) | 22:48 |
*** KeithMnemonic has quit IRC | 22:55 | |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Trust the report client cache more https://review.openstack.org/614886 | 23:06 |
efried | mriedem, sean-k-mooney, jaypipes, cfriesen, belmoreira: ^^ | 23:07 |
efried | I should link today's IRC discussion in there. But I gotta run riiight now. | 23:07 |
*** owalsh_ has joined #openstack-nova | 23:14 | |
*** owalsh has quit IRC | 23:15 | |
*** tbachman has joined #openstack-nova | 23:19 | |
*** spatel has joined #openstack-nova | 23:21 | |
*** mvkr has quit IRC | 23:22 | |
*** mvkr has joined #openstack-nova | 23:23 | |
*** tbachman has quit IRC | 23:25 | |
*** spatel has quit IRC | 23:25 | |
*** mlavalle has quit IRC | 23:36 | |
*** Swami has quit IRC | 23:53 | |
*** gyee has quit IRC | 23:58 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!