*** markvoelker has quit IRC | 00:01 | |
*** brinzhang has joined #openstack-nova | 00:14 | |
*** slaweq has quit IRC | 00:15 | |
*** erlon has quit IRC | 00:30 | |
*** betherly has joined #openstack-nova | 00:35 | |
*** betherly has quit IRC | 00:39 | |
*** mlavalle has quit IRC | 00:42 | |
*** erlon has joined #openstack-nova | 00:43 | |
*** betherly has joined #openstack-nova | 00:56 | |
*** betherly has quit IRC | 01:01 | |
*** zul has quit IRC | 01:04 | |
*** wangy has joined #openstack-nova | 01:06 | |
*** hongbin has joined #openstack-nova | 01:24 | |
*** TuanDA has joined #openstack-nova | 01:37 | |
*** betherly has joined #openstack-nova | 01:48 | |
*** betherly has quit IRC | 01:52 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:54 | |
*** erlon has quit IRC | 01:58 | |
*** sapd1 has quit IRC | 02:02 | |
*** sapd1_ has joined #openstack-nova | 02:02 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: WIP: Support attach/detach instance root volume https://review.openstack.org/614441 | 02:02 |
*** markvoelker has joined #openstack-nova | 02:03 | |
*** cfriesen has quit IRC | 02:04 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Remove useless sample and add the lack of tests in v266 https://review.openstack.org/614671 | 02:07 |
*** tetsuro has joined #openstack-nova | 02:08 | |
*** tetsuro has quit IRC | 02:11 | |
*** mhen has quit IRC | 02:13 | |
*** mhen has joined #openstack-nova | 02:16 | |
*** tiendc has joined #openstack-nova | 02:25 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid https://review.openstack.org/614672 | 02:29 |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid https://review.openstack.org/614672 | 02:33 |
*** markvoelker has quit IRC | 02:35 | |
*** tetsuro has joined #openstack-nova | 02:54 | |
*** mrsoul has quit IRC | 02:55 | |
*** tetsuro has quit IRC | 02:59 | |
*** tetsuro has joined #openstack-nova | 03:00 | |
*** psachin has joined #openstack-nova | 03:10 | |
*** sapd1_ has quit IRC | 03:15 | |
*** sapd1__ has joined #openstack-nova | 03:17 | |
*** icey has quit IRC | 03:18 | |
*** betherly has joined #openstack-nova | 03:21 | |
*** sapd1__ has quit IRC | 03:22 | |
*** sapd1_ has joined #openstack-nova | 03:22 | |
*** icey has joined #openstack-nova | 03:23 | |
*** betherly has quit IRC | 03:26 | |
*** markvoelker has joined #openstack-nova | 03:32 | |
*** threestrands has joined #openstack-nova | 03:46 | |
*** udesale has joined #openstack-nova | 03:50 | |
*** betherly has joined #openstack-nova | 03:53 | |
*** Dinesh_Bhor has quit IRC | 03:56 | |
*** betherly has quit IRC | 03:57 | |
*** bzhao__ has quit IRC | 03:58 | |
*** wangy has quit IRC | 04:04 | |
*** markvoelker has quit IRC | 04:06 | |
*** betherly has joined #openstack-nova | 04:24 | |
*** betherly has quit IRC | 04:29 | |
*** hongbin has quit IRC | 04:37 | |
*** tetsuro has quit IRC | 04:44 | |
*** Dinesh_Bhor has joined #openstack-nova | 04:46 | |
*** alex_xu has quit IRC | 04:53 | |
*** alex_xu has joined #openstack-nova | 04:56 | |
*** markvoelker has joined #openstack-nova | 05:02 | |
*** wangy has joined #openstack-nova | 05:10 | |
*** TuanDA has quit IRC | 05:10 | |
*** betherly has joined #openstack-nova | 05:16 | |
*** betherly has quit IRC | 05:21 | |
*** ircuser-1 has quit IRC | 05:24 | |
*** abhishekk has joined #openstack-nova | 05:26 | |
*** ratailor has joined #openstack-nova | 05:35 | |
*** markvoelker has quit IRC | 05:36 | |
*** betherly has joined #openstack-nova | 05:37 | |
*** betherly has quit IRC | 05:42 | |
*** fanzhang has joined #openstack-nova | 05:49 | |
*** betherly has joined #openstack-nova | 05:57 | |
*** betherly has quit IRC | 06:02 | |
*** wangy has quit IRC | 06:04 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid https://review.openstack.org/614672 | 06:15 |
*** Dinesh_Bhor has quit IRC | 06:25 | |
*** markvoelker has joined #openstack-nova | 06:33 | |
*** tiendc has quit IRC | 06:39 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Add method to allow fetch root_volume BDM by instance_uuid https://review.openstack.org/614672 | 06:44 |
*** wangy has joined #openstack-nova | 06:45 | |
*** threestrands has quit IRC | 06:52 | |
*** Dinesh_Bhor has joined #openstack-nova | 07:01 | |
*** brinzhang has quit IRC | 07:01 | |
*** brinzhang has joined #openstack-nova | 07:02 | |
*** tetsuro has joined #openstack-nova | 07:04 | |
*** icey has quit IRC | 07:04 | |
*** markvoelker has quit IRC | 07:05 | |
*** icey_ has joined #openstack-nova | 07:12 | |
*** icey_ is now known as icey | 07:14 | |
openstackgerrit | Merged openstack/nova stable/rocky: conductor: Recreate volume attachments during a reschedule https://review.openstack.org/612487 | 07:18 |
*** Dinesh_Bhor has quit IRC | 07:24 | |
*** lpetrut has joined #openstack-nova | 07:26 | |
*** Dinesh_Bhor has joined #openstack-nova | 07:30 | |
*** skatsaounis has quit IRC | 07:30 | |
*** alexchadin has joined #openstack-nova | 07:38 | |
*** pcaruana|elisa| has joined #openstack-nova | 07:40 | |
*** slaweq has joined #openstack-nova | 07:58 | |
*** pcaruana|elisa| has quit IRC | 07:59 | |
*** imacdonn has quit IRC | 08:00 | |
*** udesale has quit IRC | 08:02 | |
*** Dinesh_Bhor has quit IRC | 08:02 | |
*** markvoelker has joined #openstack-nova | 08:03 | |
*** pcaruana has joined #openstack-nova | 08:05 | |
*** lpetrut has quit IRC | 08:12 | |
*** slaweq has quit IRC | 08:18 | |
*** ralonsoh has joined #openstack-nova | 08:22 | |
*** ralonsoh has quit IRC | 08:22 | |
*** ralonsoh has joined #openstack-nova | 08:23 | |
*** markvoelker has quit IRC | 08:36 | |
*** gokhani has joined #openstack-nova | 08:37 | |
*** skatsaounis has joined #openstack-nova | 08:38 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova-specs master: Make scheduling weight more granular https://review.openstack.org/599308 | 08:46 |
*** mgoddard has joined #openstack-nova | 08:50 | |
*** tetsuro has quit IRC | 08:51 | |
*** tetsuro has joined #openstack-nova | 08:53 | |
*** alexchadin has quit IRC | 09:00 | |
*** alexchadin has joined #openstack-nova | 09:01 | |
*** rmk has quit IRC | 09:02 | |
*** rabel has quit IRC | 09:02 | |
*** Dinesh_Bhor has joined #openstack-nova | 09:05 | |
*** tetsuro has quit IRC | 09:09 | |
*** alexchadin has quit IRC | 09:11 | |
*** fghaas has joined #openstack-nova | 09:14 | |
*** derekh has joined #openstack-nova | 09:16 | |
*** udesale has joined #openstack-nova | 09:16 | |
*** wangy has quit IRC | 09:20 | |
*** fghaas has quit IRC | 09:23 | |
*** k_mouza has joined #openstack-nova | 09:24 | |
openstackgerrit | Martin Midolesov proposed openstack/nova master: vmware:PropertyCollector for caching instance properties https://review.openstack.org/608278 | 09:26 |
openstackgerrit | Martin Midolesov proposed openstack/nova master: VMware: Expose esx hosts to Openstack https://review.openstack.org/613626 | 09:26 |
*** rabel has joined #openstack-nova | 09:28 | |
*** masayukig[m] has joined #openstack-nova | 09:29 | |
*** markvoelker has joined #openstack-nova | 09:33 | |
*** ttsiouts has joined #openstack-nova | 09:36 | |
openstackgerrit | Lucian Petrut proposed openstack/os-vif master: Do not import pyroute2 on Windows https://review.openstack.org/614728 | 09:39 |
*** ttsiouts has quit IRC | 09:42 | |
*** ttsiouts has joined #openstack-nova | 09:47 | |
*** lpetrut has joined #openstack-nova | 09:49 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: PowerVM upt parity for reshaper, DISK_GB reserved https://review.openstack.org/614643 | 09:50 |
openstackgerrit | gaobin proposed openstack/nova master: Improve the properties of the api https://review.openstack.org/614730 | 09:50 |
stephenfin | lyarwood: Morning. Think you could take a look at these backports today? https://review.openstack.org/#/q/topic:bug/1799727+branch:stable/rocky | 09:52 |
*** spatel has joined #openstack-nova | 09:56 | |
*** panda|off is now known as panda | 09:56 | |
lyarwood | stephenfin: yup will do | 09:58 |
*** spatel has quit IRC | 10:00 | |
*** ralonsoh has quit IRC | 10:02 | |
*** ralonsoh has joined #openstack-nova | 10:03 | |
*** markvoelker has quit IRC | 10:07 | |
openstackgerrit | Lucian Petrut proposed openstack/os-vif master: Do not import pyroute2 on Windows https://review.openstack.org/614728 | 10:10 |
*** phuongnh has joined #openstack-nova | 10:12 | |
*** k_mouza has quit IRC | 10:16 | |
openstackgerrit | gaobin proposed openstack/nova master: Improve the properties of the api https://review.openstack.org/614730 | 10:17 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Fail to live migration if instance has a NUMA topology https://review.openstack.org/611088 | 10:17 |
*** k_mouza has joined #openstack-nova | 10:18 | |
*** k_mouza has quit IRC | 10:19 | |
*** k_mouza has joined #openstack-nova | 10:20 | |
*** jaosorior has quit IRC | 10:23 | |
*** tssurya has joined #openstack-nova | 10:24 | |
*** phuongnh has quit IRC | 10:31 | |
*** tbachman has quit IRC | 10:32 | |
*** k_mouza has quit IRC | 10:40 | |
openstackgerrit | Takashi NATSUME proposed openstack/python-novaclient master: Fix flavor keyerror when nova boot vm https://review.openstack.org/582147 | 10:41 |
*** k_mouza has joined #openstack-nova | 10:44 | |
*** alexchadin has joined #openstack-nova | 10:45 | |
*** dave-mccowan has joined #openstack-nova | 10:46 | |
*** Dinesh_Bhor has quit IRC | 10:57 | |
*** markvoelker has joined #openstack-nova | 11:04 | |
*** pcaruana has quit IRC | 11:05 | |
*** k_mouza has quit IRC | 11:07 | |
johnthetubaguy | stephenfin: yeah, it makes sense not to disrupt the chain. | 11:15 |
*** udesale has quit IRC | 11:24 | |
*** alexchadin has quit IRC | 11:26 | |
*** k_mouza has joined #openstack-nova | 11:26 | |
*** ttsiouts has quit IRC | 11:30 | |
*** jaosorior has joined #openstack-nova | 11:35 | |
*** ratailor has quit IRC | 11:36 | |
*** markvoelker has quit IRC | 11:36 | |
*** Nel1x has joined #openstack-nova | 11:40 | |
openstackgerrit | huanhongda proposed openstack/nova master: AZ operations: check host has no instances https://review.openstack.org/611833 | 11:50 |
*** pcaruana has joined #openstack-nova | 11:52 | |
*** erlon has joined #openstack-nova | 11:57 | |
*** ttsiouts has joined #openstack-nova | 12:06 | |
*** lpetrut has quit IRC | 12:13 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: WIP support attach/detach root volume 2 https://review.openstack.org/614750 | 12:19 |
jaypipes | johnthetubaguy: re: the unified limits thing... I should have some PoC code to show you by end of week. It will give us something more concrete to discuss. It doesn't impact the REST API in nova at all. | 12:20 |
*** alexchadin has joined #openstack-nova | 12:21 | |
johnthetubaguy | jaypipes: OK, cool. Which bit are you looking at, using placement or the oslo.limits piece, or both? | 12:21 |
johnthetubaguy | jaypipes: was hoping to start work on a PoC soon, now I have finished the previous project that has been distracting me full time! | 12:22 |
jaypipes | johnthetubaguy: both. | 12:22 |
johnthetubaguy | at the PTG we seemed to land on doing the placement thing second, but I would certainly like to see the two together | 12:22 |
jaypipes | johnthetubaguy: not actually using oslo.limits, but with a bunch of "TODO(jaypipes): This should be ported to oslo.limits" notes. :) Along with a healthy dose of "NOTE(jaypipes): Under no circumstances should this infect oslo.limits" | 12:23 |
johnthetubaguy | ah, OK, got you | 12:23 |
johnthetubaguy | sounds good | 12:23 |
jaypipes | johnthetubaguy: yeah, I'm tackling the limit-getting stuff first, placement queries second. | 12:23 |
jaypipes | johnthetubaguy: obviously, the limit-*setting* stuff along with quota classes are the things marked "under no circumstances should this infect oslo.limits" :) | 12:24 |
johnthetubaguy | jaypipes: I was thinking along the lines of a parallel quota system, so we just ditch all the old stuff, its too infected with junk like user limits | 12:25 |
johnthetubaguy | well, its clearly not quite that simple, but anyways, looking forward to seeing the PoC | 12:26 |
jaypipes | johnthetubaguy: yeah, I haven't touched any of the "develop a system to migrate nova to use unified limits" stuff. that part of your spec would still very much be needed. | 12:27 |
jaypipes | johnthetubaguy: that said, I've added the infrastructure to be able to configure CONF.quota.driver to something like "unified" and have that switch the underlying mechanisms for limits retrieval. | 12:27 |
jaypipes | johnthetubaguy: so hopefully that data migration stuff can build on top of my work. | 12:28 |
jaypipes | johnthetubaguy: hopefully it should all make sense when I push the code today or tomorrow. | 12:28 |
jaypipes | (I'm OOO this afternoon) | 12:28 |
johnthetubaguy | jaypipes: ah, I don't have code for it yet, only a plan. Yeah, I think I get what you mean, but will look out for the patches | 12:29 |
jaypipes | johnthetubaguy: cool, thanks. I'll add you to the reviews. | 12:30 |
*** udesale has joined #openstack-nova | 12:32 | |
*** Nel1x has quit IRC | 12:37 | |
*** brinzhang has quit IRC | 12:40 | |
*** munimeha1 has joined #openstack-nova | 12:47 | |
*** jmlowe has quit IRC | 12:47 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata https://review.openstack.org/614757 | 12:51 |
*** ttsiouts has quit IRC | 12:55 | |
*** zul has joined #openstack-nova | 12:56 | |
*** udesale has quit IRC | 12:56 | |
*** eharney has joined #openstack-nova | 13:01 | |
*** udesale has joined #openstack-nova | 13:01 | |
*** ttsiouts has joined #openstack-nova | 13:04 | |
*** zul has quit IRC | 13:04 | |
*** zul has joined #openstack-nova | 13:06 | |
*** jmlowe has joined #openstack-nova | 13:07 | |
*** tbachman_ has joined #openstack-nova | 13:10 | |
*** mchlumsky has joined #openstack-nova | 13:12 | |
*** tbachman_ is now known as tbachman | 13:13 | |
*** liuyulong has joined #openstack-nova | 13:15 | |
*** belmoreira has joined #openstack-nova | 13:17 | |
*** k_mouza has quit IRC | 13:20 | |
*** k_mouza has joined #openstack-nova | 13:25 | |
*** awaugama has joined #openstack-nova | 13:28 | |
*** mriedem has joined #openstack-nova | 13:28 | |
sean-k-mooney | bauzas: mriedem we have a regression in the os-vif 1.12.0 release which is fixed in one of my patches already so we are going to blacklist 1.12.0 in the global requirements. https://review.openstack.org/#/c/614764/1 | 13:38 |
sean-k-mooney | im going to work on getting 2 new os-vif gate jobs to test ovs with iptables and linux bridge next sprint to catch these kinds of things going forward | 13:39 |
sean-k-mooney | ill likely start on that next week however. | 13:40 |
sean-k-mooney | bauzas: as the nova release liaison could you comment on | 13:40 |
sean-k-mooney | https://review.openstack.org/#/c/614764/1 | 13:40 |
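For readers unfamiliar with how a broken release gets blocked: the usual mechanism is an exclusion entry in the openstack/requirements repo, which is what the review above proposes. A sketch of what such an entry might look like (the surrounding file contents are assumptions; only the `!=` exclusion syntax is the point):

```
# global-requirements.txt
os-vif!=1.12.0  # regression in 1.12.0; fixed in a later release
```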
mnaser | ok please forgive me if this sound silly but | 13:43 |
mnaser | microversion 1.4 > microversion 1.25, right? | 13:43 |
sean-k-mooney | no | 13:45 |
sean-k-mooney | its not a decimal point | 13:45 |
sean-k-mooney | its semantic versioning | 13:45 |
*** takashin has joined #openstack-nova | 13:46 | |
mnaser | ok | 13:46 |
mnaser | explains things | 13:46 |
* mnaser goes back to hacking things | 13:46 | |
mnaser | thanks sean-k-mooney | 13:46 |
*** liuyulong has quit IRC | 13:47 | |
johnthetubaguy | mnaser: its more like version 4.0 vs version 25.0 actually, as any micro-version can drop functionality | 13:50 |
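The point being made: the part after the dot in a microversion is an integer, not a decimal fraction, so 1.25 is newer than 1.4. A minimal sketch, comparing versions as (major, minor) integer tuples:

```python
# Minimal sketch: treat microversions as (major, minor) integer tuples,
# not floats, so 1.25 correctly sorts after 1.4.
def parse_microversion(version):
    major, minor = version.split('.')
    return int(major), int(minor)

assert parse_microversion('1.25') > parse_microversion('1.4')  # 25 > 4
assert 1.4 > 1.25  # comparing them as decimals gives the wrong answer
```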
mnaser | Okay, so trying to figure out why this upgrade somehow is causing nova to request a micro version 1.25 but the service is not providing that | 13:51 |
mnaser | Could be a super screwed up deployment too. | 13:51 |
johnthetubaguy | oh right, request the version from cinder or ironic? | 13:51 |
johnthetubaguy | we usually have a minimum version we need, which implies a minimum version of all the dependent services | 13:52 |
johnthetubaguy | mnaser: who is requesting 1.25 from whom? | 13:53 |
mnaser | johnthetubaguy: so it looks like os_region_name is not a valid option inside the placement section | 13:55 |
mnaser | So this multiregion deployment was probably hitting the wrong region. os_region_name was silently dropped? | 13:56 |
mnaser | So it was hitting an older region | 13:56 |
johnthetubaguy | good question, that sounds bad | 13:56 |
mnaser | It was removed after one cycle.. | 13:57 |
sean-k-mooney | mnaser: the simplest thing to do is pretend there is no . | 13:57 |
mnaser | https://github.com/openstack/nova/commit/3db815957324f4bd6912238a960a90624d97c518 | 13:58 |
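The commit above removes the deprecated os_region_name option from the [placement] group in favor of the keystoneauth-style option. A hedged sketch of the config change an operator would make (the replacement option name is assumed from the keystoneauth convention the group uses):

```ini
# old, removed by the commit above
[placement]
os_region_name = RegionTwo

# new
[placement]
region_name = RegionTwo
```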
mriedem | nova meeting in 2 minutes | 13:58 |
mnaser | A bit quick to remove it after just a cycle? | 13:58 |
*** suggestable has joined #openstack-nova | 14:01 | |
johnthetubaguy | mnaser: that has always been the norm for config, we just don't usually remember to do it | 14:01 |
*** suggestable has left #openstack-nova | 14:01 | |
*** liuyulong_ has joined #openstack-nova | 14:03 | |
mnaser | johnthetubaguy: ah okay | 14:04 |
johnthetubaguy | mnaser: now the whole skip version upgrades thing clearly makes that less of a good policy... not sure if we have an answer for that one yet. | 14:06 |
mnaser | johnthetubaguy: yeah, i dont do that (nor do i support that idea).. so i should look at logs :p | 14:07 |
johnthetubaguy | mnaser: heh :) | 14:09 |
mriedem | oslo.config has a new thing for FFU with config stuff | 14:15 |
johnthetubaguy | mriedem: ah, cool | 14:19 |
sean-k-mooney | oh, the meeting ended quicker than i thought it would | 14:20 |
*** takashin has left #openstack-nova | 14:20 | |
sean-k-mooney | i was going to ask people to assess https://blueprints.launchpad.net/nova/+spec/libvirt-neutron-sriov-livemigration and the related spec if they can, to indicate if this can proceed for this cycle | 14:21 |
sean-k-mooney | i have spec updates to make but they will be done later today. | 14:21 |
mriedem | johnthetubaguy: mnaser: this thing https://specs.openstack.org/openstack/oslo-specs/specs/rocky/handle-config-changes.html | 14:23 |
mriedem | i think that is still a WIP | 14:23 |
mriedem | jackding: if https://review.openstack.org/#/c/609180/ is ready for review please put it in the runways queue https://etherpad.openstack.org/p/nova-runways-stein | 14:24 |
*** mlavalle has joined #openstack-nova | 14:24 | |
*** Luzi has joined #openstack-nova | 14:27 | |
*** mvkr has quit IRC | 14:40 | |
mriedem | hmm, did something regress with performance? https://bugs.launchpad.net/nova/+bug/1800755 | 14:42 |
openstack | Launchpad bug 1800755 in OpenStack Compute (nova) "The instance_faults table is too large, leading to slow query speed of command: nova list --all-tenants" [Undecided,New] | 14:42 |
mriedem | that was fixed with https://bugs.launchpad.net/nova/+bug/1800755 | 14:42 |
mriedem | oops | 14:42 |
mriedem | https://review.openstack.org/#/c/409943/ | 14:42 |
mriedem | is there any reason we don't purge old faults? | 14:43 |
mriedem | we only show the latest | 14:43 |
mriedem | and we don't provide any API or nova-manage CLI to show *all* faults for a given instance | 14:43 |
*** Luzi has quit IRC | 14:44 | |
jackding | mriedem: sure, will do | 14:44 |
sean-k-mooney | mriedem: would that mess with audit logs? | 14:45 |
mriedem | you mean that config/api that no one uses? | 14:45 |
sean-k-mooney | mriedem: a nova-manage command could make sense or an admin only api | 14:45 |
mriedem | https://developer.openstack.org/api-ref/compute/?expanded=list-server-usage-audits-detail#server-usage-audit-log-os-instance-usage-audit-log | 14:45 |
sean-k-mooney | mriedem: no i was thinking that for some deployments there may be a requirement to record faults for audit/sla reasons | 14:46 |
sean-k-mooney | i was not thinking of any feature in particular | 14:46 |
sean-k-mooney | im just not sure if auto cleanup of old faults would be something we would want in all cases | 14:47 |
mriedem | i'm not suggesting an auto cleanup, | 14:47 |
mriedem | but a nova-manage db purge_faults | 14:48 |
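The idea floated here is a nova-manage command that deletes old rows from the cell database's instance_faults table, since the API only ever shows the newest fault per instance. A rough, hypothetical sketch of the kind of SQL such a purge might run (the command name and cutoff are made up; column names follow the instance_faults schema):

```sql
-- hypothetical purge: keep the newest fault per instance, drop older rows
-- that are also older than an operator-supplied cutoff
DELETE f FROM instance_faults f
JOIN (SELECT instance_uuid, MAX(created_at) AS newest
      FROM instance_faults
      GROUP BY instance_uuid) latest
  ON f.instance_uuid = latest.instance_uuid
WHERE f.created_at < latest.newest
  AND f.created_at < '2018-01-01 00:00:00';
```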
openstackgerrit | Surya Seetharaman proposed openstack/nova master: [WIP] Make _instances_cores_ram_count() be smart about cells https://review.openstack.org/569055 | 14:48 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: WIP: API microversion bump for handling-down-cell https://review.openstack.org/591657 | 14:48 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: [WIP] Add os_compute_api:servers:create:cell_down policy https://review.openstack.org/614783 | 14:48 |
sean-k-mooney | ya that i think makes total sense. the same way keystone allows you to purge the expired uuid tokens from its db | 14:48 |
sean-k-mooney | mriedem: were you thinking it would drop faults older than X from the db or move them to an archive table? | 14:50 |
mriedem | i'm not really putting much thought into this | 14:51 |
*** Swami has joined #openstack-nova | 14:52 | |
sean-k-mooney | its one of those things that if you brought it up at the ptg i would be like "sure go for it" but it also does not sound like a super high priority either so ya in any case i cant really think of a reason not to allow it off the top of my head | 14:54 |
mriedem | tssurya: in case you haven't started yet, i was thinking about how to do 2.68 down-cell functional api samples testing, which will require some kind of fixture to simulate a down cell, | 14:58 |
mriedem | and i think i have an idea of how to write that | 14:58 |
*** cfriesen has joined #openstack-nova | 15:00 | |
tssurya | mriedem: I saw your todos but I haven't started, feel free to start if you have the time your tests are surely going to be more thorough than mine. | 15:00 |
tssurya | bug thanks | 15:00 |
tssurya | big* | 15:00 |
mriedem | ok i think i'll just hack on a DownCellFixture in a separate patch below the API microversion one at the end, and then it could be used in the functional api samples tests, | 15:00 |
mriedem | the nice thing with fixtures is they are also context managers, | 15:01 |
mriedem | so you could create a server while the cell is 'up' and then do something like: | 15:01 |
mriedem | with down_cell_fixture: | 15:01 |
mriedem | get('/servers') | 15:01 |
mriedem | and you should get the minimal construct back | 15:01 |
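Fixtures from the python fixtures library implement __enter__/__exit__, which is what makes the `with down_cell_fixture:` usage above possible. A minimal illustrative sketch of that pattern (the patched target is an assumption for illustration; the real DownCellFixture proposed later in the day does more):

```python
import fixtures
from unittest import mock


class DownCellFixture(fixtures.Fixture):
    """Illustrative sketch: make every cell look 'down' while active."""

    def _setUp(self):
        # The patched path is an assumption; the point is that the patch
        # is applied on enter and automatically undone on exit.
        self.useFixture(fixtures.MonkeyPatch(
            'nova.context.scatter_gather_cells',
            mock.Mock(return_value={})))

# usage inside a test, after creating a server while the cell was 'up':
#     with DownCellFixture():
#         resp = self.api.get('/servers')  # expect minimal constructs back
```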
tssurya | oh nice | 15:02 |
tssurya | there was a doubt however with the sample tests, the jsons you have created.. I thought they were supposed to be created automatically once we write the tests ? | 15:02 |
mriedem | i think they are, | 15:02 |
mriedem | i was just trying to get the api-ref build to pass | 15:02 |
tssurya | ah okay :) | 15:03 |
*** liuyulong_ has quit IRC | 15:05 | |
*** liuyulong has joined #openstack-nova | 15:06 | |
mriedem | lyarwood: don't forget to add an etherpad for your forum session to https://wiki.openstack.org/wiki/Forum/Berlin2018 | 15:07 |
mriedem | i think i have crap to dump in there | 15:07 |
mriedem | melwitt: were you going to send https://etherpad.openstack.org/p/nova-forum-stein to the ML for the list of xp sessions to have warm nova bodies in attendance? | 15:09 |
lyarwood | mriedem: unfortunately I'm no longer attending, sent a note to the foundation when I found out yesterday. | 15:09 |
mriedem | lyarwood: hmm, i could possibly run that session | 15:09 |
mriedem | or we could find *someone* | 15:09 |
dansmith | I vote for mriedem | 15:09 |
mriedem | random berliner on the street | 15:09 |
mriedem | i'll pay them in sausage | 15:10 |
dansmith | he needs more stuff to do and I hear he loves volumes, especially multi-attached ones | 15:10 |
*** k_mouza has quit IRC | 15:10 | |
*** jmlowe has quit IRC | 15:10 | |
mriedem | i'll gladly moderate any number of forum sessions if it means i don't have to do any presentations | 15:10 |
*** mvkr has joined #openstack-nova | 15:11 | |
*** kukacz has quit IRC | 15:12 | |
mriedem | lyarwood: well if the foundation doesn't pull the session, it looks like i had it marked on my calendar to attend anyway so if you want i can moderate it | 15:12 |
mriedem | and just assign all of the work to you | 15:12 |
*** psachin has quit IRC | 15:13 | |
*** itlinux has quit IRC | 15:13 | |
lyarwood | mriedem: haha so nothing would ever get done | 15:13 |
lyarwood | mriedem: but yeah let me ping them quickly and see if I can save that session | 15:14 |
mriedem | there are a few volume-related specs for stein that would be good to discuss there, like this one to specify delete_on_termination when attaching a volume (and changing that value for existing attachments) | 15:14 |
mriedem | which reminds me, i dusted this off too https://review.openstack.org/#/c/393930/ | 15:15 |
mriedem | getting device tags out of the API | 15:15 |
mriedem | dansmith: i think you've been on board with that in the past ^ | 15:15 |
lyarwood | mriedem: ha the delete_on_termination issue just came up downstream and we NACK'd assuming it would be lots of work for little gain | 15:15 |
* dansmith nods | 15:16 | |
lyarwood | mriedem: are you getting pushed for that as well given users can do this in AWS? | 15:16 |
mriedem | the major problem i see with that one is we already have a PUT API for volume attachments, | 15:16 |
mriedem | the dreaded swap volume API | 15:16 |
mriedem | lyarwood: no i'm not getting pushed for it from our product people | 15:16 |
mriedem | as far as i know anyway | 15:16 |
mriedem | but it's one of those things that comes up every so often, like proxying the volume type on bfv | 15:17 |
mriedem | i don't think it's much work, it's just updating the DB | 15:17 |
mriedem | and taking a new parameter on attach | 15:17 |
mriedem | updating existing attachments is difficult b/c of our already f'ed up api | 15:18 |
mriedem | https://developer.openstack.org/api-ref/compute/#update-a-volume-attachment | 15:18 |
dansmith | it's not a major amount of heavy lifting, | 15:18 |
dansmith | but the gain seems very minor to me | 15:18 |
sean-k-mooney | so random quest. would people object to an api to list the currently enabled scheduler filters? specifically to enable tempest and other multicloud services to detect what scheduler features they can expect | 15:19 |
dansmith | the strongest argument I've seen for it is that AWS has it and thus the standalone EC2 thing needs to be able to proxy that in | 15:19 |
sean-k-mooney | *question however it could become a quest | 15:19 |
dansmith | but afaik, that's pretty much dead these days | 15:19 |
*** kukacz has joined #openstack-nova | 15:19 | |
dansmith | sean-k-mooney: yes I would object | 15:19 |
*** alexchadin has quit IRC | 15:19 | |
mriedem | "because AWS and Alibaba have it" is something i hear every week | 15:19 |
sean-k-mooney | dansmith: because we are exposing configuration via the api or something else | 15:20 |
dansmith | sean-k-mooney: it would literally be an api call that would make an rpc call to scheduler to return a chunk of config, which shouldn't be visible externally anyway. and if you're running multiple schedulers, which do you call? | 15:20 |
mriedem | the only reason i could see for doing something like that (scheduler filters and such) is to tell users, via the api, which hints are available | 15:20 |
*** fghaas has joined #openstack-nova | 15:20 | |
johnthetubaguy | sean-k-mooney: discovery of available scheduler hints was something we once said we would consider, which is a bit different | 15:21 |
dansmith | yep | 15:21 |
mriedem | right, it would only be feasible if it was a list of hints | 15:21 |
mriedem | which is totally pluggable btw | 15:21 |
sean-k-mooney | johnthetubaguy: its related to this tempest change https://review.openstack.org/#/c/570207/12 | 15:21 |
johnthetubaguy | yeah, that was the downside, in the general case, it means nothing useful | 15:21 |
sean-k-mooney | johnthetubaguy: the issue i have with the change is it requires us to keep the nova and tempest defaults in sync | 15:22 |
dansmith | sean-k-mooney: tempest has always been blackbox, requiring you to tell it the nova side scheduler config for this reason | 15:22 |
artom | sean-k-mooney, don't you dare bring more people into this. I will fly to Ireland and cut you, I swear. | 15:22 |
artom | We already can't agree downstream | 15:22 |
dansmith | tempest is a testing/validation tool.. keeping the two configs in sync is a few lines of bash | 15:22 |
mriedem | right, devstack configures the filters in both nova and tempest | 15:22 |
dansmith | right | 15:22 |
*** k_mouza has joined #openstack-nova | 15:22 | |
sean-k-mooney | dansmith: the issue is making tripleo do that | 15:22 |
mriedem | devstack also adds the same/different host filters which aren't in the default enabled_filters list for nova | 15:23 |
dansmith | sean-k-mooney: s/bash/puppet/ | 15:23 |
mriedem | for any nfv ci, they'd also need to configure to numa/pci filters | 15:23 |
mriedem | etc | 15:23 |
sean-k-mooney | dansmith: ya i know its just tripleo is a pain to make work instead of devstack | 15:23 |
dansmith | sean-k-mooney: adding an api to nova to work around tripleo not being able to communicate config to another module is INSANITY | 15:23 |
sean-k-mooney | mriedem: yes today they only need to enable it in nova however | 15:23 |
johnthetubaguy | I think sdague convinced me about this in the past, you don't want auto discovery, you want to tell the test system what you expect to happen, else there be dragons | 15:24 |
artom | dansmith, it's not communicate per se - if tripleo doesn't set the nova value, it shouldn't have to set the corresponding tempest value | 15:24 |
dansmith | artom: find another way | 15:25 |
dansmith | seriously. | 15:25 |
johnthetubaguy | matching defaults? | 15:25 |
artom | dansmith, my other way is https://review.openstack.org/#/c/570207/12 | 15:25 |
sean-k-mooney | johnthetubaguy: ya well it was just a thought; my main issue with artom's change is that we would have to keep it in sync if we add a filter to the default set in the future | 15:25 |
artom | johnthetubaguy, that's what ^^ is | 15:25 |
artom | But apparently everyone is literally willing to fight to the death over this. | 15:25 |
dansmith | I am | 15:25 |
* dansmith breaks a bottle on the table | 15:25 | |
dansmith | let's do it. | 15:25 |
artom | I only have this bluetooth mouse :( | 15:26 |
dansmith | forfeit? | 15:26 |
* johnthetubaguy hopes people defend themselves with pumpkins | 15:26 | |
sean-k-mooney | johnthetubaguy: the interop benefit is really only a side effect and i dont feel that strongly that its a good thing | 15:26 |
artom | Pfft, as if. I'm making brass knuckles. Wireless ones. | 15:26 |
*** k_mouza has quit IRC | 15:27 | |
johnthetubaguy | artom: curious, when nova changes a default in its config, what happens to the rest of the tempest settings? | 15:29 |
artom | johnthetubaguy, you mean for other config options where Tempest uses values from Nova? Good question - gmann was saying on that review that they just update Tempest, but I'd need to look for concrete examples | 15:31 |
johnthetubaguy | artom: cool, that is what I assumed. I know its branchless, but the default is just a helping hand. | 15:31 |
* johnthetubaguy shudders, its complicated | 15:32 | |
artom | johnthetubaguy, yeah, I grok that it can't be perfect, I figured at least making it match what Nova has in master is a good first step. | 15:32 |
*** rmk has joined #openstack-nova | 15:32 | |
artom | johnthetubaguy, because the previous default of 'all' is... well, it's a handy "feature" for CIs, because they can just enable any filter in Nova and Tempest just runs with it | 15:33 |
*** gyee has joined #openstack-nova | 15:33 | |
artom | But it becomes a problem if a filter *hasn't* been enabled in Nova, Tempest will still try to run with it. | 15:33 |
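To make the sync being argued about concrete: the filters nova's scheduler enables and the filters tempest is told to expect live in two different config files, and devstack (or any deployment tool) has to keep them aligned. A hedged sketch (the tempest option name is an assumption based on the review under discussion):

```ini
# nova.conf
[filter_scheduler]
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter

# tempest.conf -- should mirror the list above, otherwise tests that rely
# on a filter nova never enabled will still try to run
[compute-feature-enabled]
scheduler_available_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter
```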
mriedem | this is fun https://bugs.launchpad.net/nova/+bug/1800508 | 15:52 |
openstack | Launchpad bug 1800508 in OpenStack Compute (nova) "Missing exception handling mechanism in 'schedule_and_build_instances' for DBError at line 1180 of nova/conductor/manager.py" [Low,New] | 15:52 |
mriedem | "nova should set the instance to error state when nova fails to insert the instance into the db" | 15:53 |
sean-k-mooney | mriedem: um wait, if nova cant insert the instance into the db what is it setting error on? | 15:54 |
artom | Chicken, meet again. Cart, meet horse. | 15:54 |
artom | "again"? I mean egg | 15:54 |
mriedem | the only thing i could think there is we could try updating the instance within the build request, but that's pretty shitty | 15:57 |
sean-k-mooney | mriedem: in this case however it seems like they are booting with an invalid flavor id right? | 15:59 |
mriedem | no | 15:59 |
mriedem | he's injecting some kind of fault into the code | 15:59 |
mriedem | to trigger the db error | 15:59 |
mriedem | if you try to boot with an invalid flavor id, you'll get a 404 in the api looking up the flavor | 15:59 |
sean-k-mooney | oh ok i was trying to figure out how they created a flavor with id 1E+22 but then failed to boot with that flavor | 16:00 |
mriedem | so, i mean, your cell db could drop right when we're trying to create the server i guess, that would do it as well | 16:00 |
mriedem | but are we going to handle that scenario everywhere in nova? | 16:00 |
sean-k-mooney | ok right so in that case the instance would be in the api db but fail to insert into the cell db | 16:01 |
sean-k-mooney | we moved the instance status into the api db recently right, so ya in that case we could set error on the api db i guess but there are a ton of other edge cases like that we dont handle | 16:02 |
sean-k-mooney | mriedem: the other thing we could do is have a periodic task that just updates the status of perpetually building instances to error after some time e.g. a day or retry limit*build timeout or something | 16:04 |
mriedem | this guy has been busy https://bugs.launchpad.net/nova/+bug/1800204 https://bugs.launchpad.net/nova/+bug/1799949 | 16:05 |
openstack | Launchpad bug 1800204 in OpenStack Compute (nova) "n-cpu.service consuming 100% of CPU indeterminately" [Undecided,New] | 16:05 |
openstack | Launchpad bug 1799949 in OpenStack Compute (nova) "VM instance building forever when an RPC error occurs" [Undecided,New] | 16:05 |
sean-k-mooney | mriedem: actually https://bugs.launchpad.net/nova/+bug/1800204 seems familiar, there was a similar bug report a few months back around the rocky release | 16:06 |
openstack | Launchpad bug 1800204 in OpenStack Compute (nova) "n-cpu.service consuming 100% of CPU indeterminately" [Undecided,New] | 16:06 |
*** k_mouza has joined #openstack-nova | 16:10 | |
*** belmoreira has quit IRC | 16:11 | |
sean-k-mooney | oh wait thats n-cpu not the conductor, never mind | 16:13 |
*** itlinux has joined #openstack-nova | 16:14 | |
*** imacdonn has joined #openstack-nova | 16:15 | |
*** ttsiouts has quit IRC | 16:16 | |
sean-k-mooney | mriedem: do you think we will actually address any of those bugs. | 16:16 |
stephenfin | artom: So, do I need to review https://code.engineering.redhat.com/gerrit/#/c/154627/2 yet? | 16:16 |
sean-k-mooney | stephenfin: wrong irc | 16:16 |
stephenfin | ta :) | 16:16 |
mriedem | sean-k-mooney: probably not | 16:19 |
mriedem | unless there is a more obvious way to create those faults with injecting code into the path and blow up the system | 16:19 |
mriedem | *without | 16:19 |
*** ircuser-1 has joined #openstack-nova | 16:20 | |
sean-k-mooney | mriedem: i was just debating if we should triage them as incomplete or wontfix unless a different way to reproduce can be provided | 16:22 |
mriedem | i marked one of them as opinion | 16:23 |
*** k_mouza has quit IRC | 16:24 | |
johnthetubaguy | FWIW, I always wanted to be able to "timeout" tasks to try and catch that pending forever case. They caused me endless pain at Rackspace (I think mostly in the migrate/resize code path). The difference was they were more expected / user triggered errors. | 16:25 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add --before to nova-manage db archive_deleted_rows https://review.openstack.org/556751 | 16:25 |
mriedem | johnthetubaguy: how much of that was resolved with service user tokens though? | 16:26 |
mriedem | or the long_rpc_timeout we have since rocky | 16:26 |
mriedem | which we're using now in the live migration flows that do rpc calls | 16:26 |
johnthetubaguy | mriedem: yeah, I saw that go in. Although most of those cases it went to Error (eventually) when it didn't have to. | 16:27 |
*** ttsiouts has joined #openstack-nova | 16:27 | |
mriedem | that's a different bug then | 16:27 |
sean-k-mooney | johnthetubaguy: i have seen this happen with rabbitmq restarts in the past when persistence was disabled, on instance build and a few other cases. | 16:29 |
sean-k-mooney | i never really considered that a nova bug however because i caused the issue by restarting rabbit | 16:29 |
*** slaweq has joined #openstack-nova | 16:29 | |
sean-k-mooney | johnthetubaguy: but ya its probably more complicated than just setting to error after x time as some requests could still be in flight | 16:31 |
melwitt | mriedem: yes, it completely slipped my mind :( and I'm not done going through the entire list of the schedule yet | 16:36 |
mriedem | melwitt: i added several sessions in there based on my schedule | 16:40 |
melwitt | ok, thank you. that's helpful | 16:41 |
johnthetubaguy | sean-k-mooney: yeah, its hard to get right | 16:47 |
*** k_mouza has joined #openstack-nova | 16:53 | |
*** Swami has quit IRC | 16:55 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: API microversion bump for handling-down-cell https://review.openstack.org/591657 | 16:56 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add DownCellFixture https://review.openstack.org/614810 | 16:56 |
mriedem | tssurya: ^ | 16:56 |
tssurya | looking, thanks | 16:57 |
*** k_mouza has quit IRC | 16:59 | |
*** mgoddard has quit IRC | 17:02 | |
*** belmoreira has joined #openstack-nova | 17:02 | |
tssurya | mriedem: okay I am going to write the tests here https://review.openstack.org/#/c/591657/12/nova/tests/functional/api_sample_tests/test_servers.py based on your fixture | 17:04 |
*** udesale has quit IRC | 17:06 | |
openstackgerrit | Merged openstack/nova master: Make ResourceTracker.tracked_instances a set https://review.openstack.org/608781 | 17:13 |
dansmith | mriedem: tssurya I'm explaining the host-status concept to someone right now, and why an instance state doesn't go to STOPPED just because the compute node is down | 17:18 |
dansmith | mriedem: I wonder if it would make sense to integrate the use of the UNKNOWN state we're adding here with that feature, | 17:18 |
*** fghaas has quit IRC | 17:18 | |
dansmith | so that in the same microversion, instances with a down host show up as UNKNOWN as well | 17:18 |
*** fghaas has joined #openstack-nova | 17:19 | |
*** belmoreira has quit IRC | 17:19 | |
*** fghaas has quit IRC | 17:19 | |
*** k_mouza has joined #openstack-nova | 17:20 | |
*** belmoreira has joined #openstack-nova | 17:20 | |
tssurya | dansmith: you mean you want to add a new "UNKNOWN" vm_state ? | 17:25 |
*** Swami has joined #openstack-nova | 17:26 | |
dansmith | tssurya: you're already doing that from the external view right now | 17:26 |
tssurya | yea | 17:26 |
dansmith | we would do a similar thing for real instances we can look up just fine, but which have down hosts | 17:26 |
dansmith | the only problem would be that right now UNKNOWN means "the rest of the instance details aren't there" which would be slightly more ambiguous in this case | 17:27 |
*** slaweq has quit IRC | 17:27 | |
tssurya | hmm, makes sense to make the instance state UNKNOWN since we don't know the host state, I mean I guess "UNKNOWN" could mean unknown details/state right ? | 17:28 |
belmoreira | dansmith mriedem should placement/nova issues be discussed here or in placement channel | 17:28 |
cfriesen | dansmith: for what it's worth, in our environment if a compute node goes down an external entity sets all of the instances to the "error" state, until they were automatically recovered. | 17:28 |
dansmith | that seems like an improper use of the error state to me | 17:29 |
dansmith | not to mention that nova on its own won't know whether they're still up and fine or not | 17:29 |
cfriesen | if the host is "down", then we fence it off and force a reboot. those instances are guaranteed to be toast | 17:29 |
dansmith | which is why we don't call them "stopped" | 17:29 |
dansmith | cfriesen: okay, well, that's better in that case but vanilla nova can't do or know that | 17:30 |
cfriesen | agreed, nova itself can't know the bigger picture | 17:30 |
dansmith | belmoreira: depends on what it is.. if it's integration issues then probably here | 17:30 |
belmoreira | it is the increase in the number of requests to placement | 17:32 |
cfriesen | dansmith: although, if an external entity uses the nova API to tell nova that the compute node is "down", it's supposed to have already fenced off the node to prevent instances from (eg) talking to volumes. | 17:32 |
belmoreira | have a look into: https://docs.google.com/document/d/1d5k1hA3DbGmMyJbXdVcekR12gyrFTaj_tJdFwdQy-8E/edit?usp=sharing | 17:32 |
cfriesen | otherwise you could evacuate and then have two copies of an instance trying to access the same cinder volume | 17:32 |
dansmith | cfriesen: yeah, true.. I guess I just prefer something less overloaded like UNKNOWN than saying it's stopped or error | 17:33 |
belmoreira | this is the number of requests to placement when compute-nodes get upgraded to Rocky | 17:33 |
dansmith | efried: ^ | 17:33 |
efried | how far back am I reading? | 17:33 |
dansmith | efried: one line | 17:33 |
dansmith | and the url he posted | 17:33 |
dansmith | belmoreira: is it really increasing, or is that you bringing nodes on over time? | 17:35 |
belmoreira | the increase of requests shows the compute nodes being upgraded over time (queens -> rocky) | 17:36 |
dansmith | okay | 17:36 |
dansmith | (ouch) | 17:36 |
efried | What's happening at that cat-head bump? | 17:36 |
efried | or possibly batman | 17:36 |
dansmith | online migrations? | 17:36 |
efried | those look like trait requests | 17:37 |
efried | if I'm reading this right. | 17:37 |
belmoreira | no, it must be a cell that upgraded and then stopped nova-compute | 17:37 |
*** ttsiouts has quit IRC | 17:38 | |
belmoreira | so, in the second graph we can see all the new requests | 17:38 |
*** ttsiouts has joined #openstack-nova | 17:38 | |
*** liuyulong has quit IRC | 17:39 | |
belmoreira | UUID/traits ; ?in_tree; UUID/aggregates; ... | 17:39 |
*** k_mouza_ has joined #openstack-nova | 17:40 | |
efried | right, so it looks to me like, before rocky, we weren't calling ?in_tree, UUID/aggregates, or ?member_of at all. Which makes sense. | 17:40 |
belmoreira | yes, and this seems to be the reason for the increase in requests | 17:41 |
efried | but also increased number of requests for inventories. | 17:42 |
belmoreira | but it is a huge increase. Just added another graph with the response time of my placement infrastructure | 17:42 |
efried | I have to say, this isn't all that surprising. | 17:42 |
*** ttsiouts has quit IRC | 17:43 | |
*** k_mouza has quit IRC | 17:43 | |
efried | although, hm, I would have expected this jump in queens | 17:44 |
efried | belmoreira: Was this an upgrade from queens, or from earlier? | 17:44 |
*** k_mouza_ has quit IRC | 17:45 | |
belmoreira | efried from queens | 17:45 |
belmoreira | I could handle it by creating more placement nodes (x3). But it looks like too much... | 17:47 |
efried | belmoreira: Can you give me a sense of what this timeline represents? At what point are all the upgrades done and the cloud in stable state? | 17:48 |
belmoreira | efried the nova/placement control plane was upgraded between 8:00 and 9:00. ~12:00 the compute nodes started to upgrade (this takes 24h for all of them to upgrade) | 17:50 |
belmoreira | at 12:00 (today) almost all compute nodes are in Rocky. | 17:51 |
efried | belmoreira: So where it tails off at the end, that's when the upgrades are pretty much done? | 17:51 |
belmoreira | the load graphs shows when I added more capacity for placement | 17:52 |
efried | Do you have a graph for what it looks like right now? | 17:52 |
efried | I'm just wondering if it's a massive spike during upgrade, but then it evens back out afterward. | 17:52 |
efried | in which case... yeah | 17:52 |
belmoreira | efried I'm getting a new graph from now | 17:54 |
efried | though once again, I wouldn't have expected e.g. ?in_tree to be zero at queens. That should be happening every periodic. | 17:55 |
dansmith | efried: you mean you think it's startup storm? | 17:57 |
dansmith | so every time they reboot computes they'll get this? | 17:57 |
efried | dansmith: If you reboot a thousand computes... | 17:58 |
efried | dansmith: I just wanted to understand *whether* it was startup storm. | 17:58 |
dansmith | right but presumably they're not rebooting them every second | 17:58 |
dansmith | ack | 17:58 |
efried | Whether it goes back to normal once everything stabilizes | 17:58 |
openstackgerrit | Merged openstack/nova stable/rocky: De-dupe subnet IDs when calling neutron /subnets API https://review.openstack.org/608336 | 17:58 |
dansmith | they also know what upgrades look like | 17:58 |
efried | (I don't) | 17:58 |
dansmith | so the fact that they're concerned probably means something | 17:58 |
efried | Heh, I'm not tryng to weasel out of anything. Just trying to grok the problem domain. | 17:59 |
dansmith | no, I know | 17:59 |
dansmith | just sain' | 17:59 |
dansmith | even if we just made the reboot storm a lot worse, that's something we probably need to look at | 18:00 |
belmoreira | efried a new graph from now | 18:00 |
*** derekh has quit IRC | 18:00 | |
belmoreira | it is flat at the end. That is the total number of requests that we handle now | 18:00 |
efried | dansmith: Can you sanity-check me on this, though - the _refresh_associations code is in queens, including _ensure_resource_provider invoking _get_provider_in_tree, which is what invokes the ?in_tree URI. | 18:01 |
efried | the mystery being, why would they be seeing zero ?in_tree calls right before the upgrade? | 18:01 |
dansmith | I just headed into a meeting I have to pay attention to | 18:01 |
belmoreira | hmm. tssurya just pointed out the "resource_provider_association_refresh" configuration that we had in queens; we don't have it in rocky | 18:04 |
efried | mm, that'd explain a lot. Y'all added that to compensate for this kind of spike in placement traffic iirc | 18:05 |
belmoreira | efried that explains " I wouldn't have expected e.g. ?in_tree to be zero at queens" | 18:05 |
efried | yup | 18:05 |
mriedem | i thought you totally nuked resource_provider_association_refresh rather than just set it to a large value? | 18:06 |
efried | but also why all those things are zero before the upgrade and nonzero after. Like I was saying, I expect all this stuff to happen at the queens boundary, not rocky. | 18:06 |
belmoreira | in queens we patched it and set it to a very large number (to not run again). And I missed it now. My fault! | 18:07 |
efried | IOW I suspect you would have seen the same graphs simply by turning that switch off and leaving your nodes at queens | 18:08 |
belmoreira | but the number of requests is really impressive! meaning it is very difficult to keep this option in a large infrastructure | 18:08 |
efried | belmoreira: I don't disagree with that. | 18:09 |
mriedem | so by default, every compute (70K?) is refreshing inventory every 1 minute, and every 5 minutes it's also refreshing in_tree, aggregates and traits? | 18:10 |
efried | I would think moving it to a fairly generous interval and hoping your computes don't all hit that interval at the same time :) | 18:10 |
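For reference, the option in question controls how often each compute's report client re-fetches aggregate/trait/sharing associations from placement. A hedged sketch of raising it instead of patching it out (the [compute] group and 300-second default are assumptions based on recent releases):

```ini
# nova.conf on the compute nodes
[compute]
# default is 300 seconds; a larger value spreads out placement traffic at
# the cost of picking up externally-made aggregate/trait changes more slowly
resource_provider_association_refresh = 86400
```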
tssurya | mriedem: yea | 18:10 |
mriedem | and we don't use the aggregates stuff in compute yet at all from what i can tell | 18:10 |
mriedem | it was there for sharing providers which we don't support yet | 18:10 |
efried | well, didn't we start cloning host azs ? | 18:10 |
mriedem | that's in the API | 18:10 |
efried | but we're not using that in the scheduler yet? | 18:11 |
mriedem | the mirrored aggregates stuff? yes there are pre-request placement filters that rely on it (or something external doing the mirroring) | 18:11 |
mriedem | i'm not sure what that has to do with the cache / refresh for aggregates in all the computes | 18:11 |
mriedem | iow, i'm not sure what the cache in the compute buys us | 18:12 |
efried | yeah, I'm actually trying to think what we actually use the cache for at all... right. | 18:12 |
*** ralonsoh has quit IRC | 18:13 | |
efried | cdent has been grousing for a while that we should just be able to make placement calls when we need 'em. | 18:13 |
sean-k-mooney | if we really wanted to make the storm less likely we could use a random prime offset for the update | 18:14 |
efried | I was thinking it, but then you said it. | 18:14 |
mriedem | oslo already does something like that for periodics | 18:15 |
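The suggestion is to stagger the refresh so thousands of computes don't hit placement on the same tick. A minimal illustrative sketch of the jitter idea, independent of oslo's actual periodic-task machinery:

```python
import random
import time


def run_periodic(task, interval, jitter=0.5):
    """Run task() every `interval` seconds after a randomized first delay.

    The random offset spreads many workers across the interval so they do
    not all hit the server at the same moment (sketch only).
    """
    time.sleep(random.uniform(0, interval * jitter))
    while True:
        task()
        time.sleep(interval)
```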
*** jdillaman has quit IRC | 18:15 | |
efried | Trying to think what it would take to rip out the cache completely. | 18:15 |
belmoreira | I'm changing this option, it will take ~2h to propagate. Will let you know the result | 18:15 |
efried | ack | 18:15 |
belmoreira | I have to leave now for some minutes. Thanks for all your help | 18:16 |
efried | o/ | 18:16 |
efried | mriedem: We use the cache data so the virt driver has the opportunity every periodic to update the provider tree. | 18:17 |
efried | mriedem: assuming stable placement, no _refresh'ing, we would be doing a helluva lot fewer calls | 18:18 |
efried | and that's also why we cache agg data. Because upt gets to muck with that stuff also. | 18:18 |
mriedem | but nothing does right now right? | 18:18 |
mriedem | for aggregates | 18:18 |
mriedem | and assuming inventory isn't wildly changing on a compute node, we don't really need the cache | 18:19 |
efried | not sure I'm following. | 18:19 |
efried | are you saying "as long as nothing is changing, we don't need to call update_provider_tree" ? | 18:20 |
mriedem | update_provider_tree is what returns the inventory from the driver to the RT to push off to placement every 60 seocnds | 18:20 |
mriedem | *seconds | 18:20 |
mriedem | right? | 18:20 |
efried | Yes | 18:21 |
mriedem | and assuming that disk/ram/cpu on a host doesn't change all that often, at least without a restart of the host, it seems odd we need to cache that information | 18:21 |
efried | But how else would we know whether to push the info back to placement? | 18:21 |
*** ldau has joined #openstack-nova | 18:21 | |
mriedem | in the before upt times, didn't the RT/report client just pull inventory, compare to what was reported by the driver, and the PUT it back if there were changes? | 18:21 |
efried | What does "pull inventory" mean, though? | 18:22 |
*** jmlowe has joined #openstack-nova | 18:22 | |
efried | pull from placement | 18:22 |
mriedem | GET /resource_providers/{rp_uuid}/inventories | 18:22 |
efried | i.e. GET /rps/UUID/inventory | 18:22 |
efried | yeah | 18:22 |
sean-k-mooney | efried: well the driver could have a periodic check but remember the last value it sent and only send a value if it detects there was a change | 18:22 |
efried | sean-k-mooney: ^ cache | 18:22 |
sean-k-mooney | thats not the same as a cache | 18:22 |
efried | and that's what we do | 18:23 |
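The pre-update_provider_tree flow being described boils down to: GET the inventory placement already has, compare it with what the virt driver reports, and only PUT when something changed. A simplified sketch against the placement REST API (the session plumbing is assumed; the endpoints and payload shape follow the placement API):

```python
def sync_inventory(session, rp_uuid, local_inventory):
    """PUT local inventory to placement only if it differs (sketch)."""
    resp = session.get('/resource_providers/%s/inventories' % rp_uuid)
    remote = resp.json()
    if remote['inventories'] == local_inventory:
        return  # nothing changed this periodic; skip the PUT
    session.put(
        '/resource_providers/%s/inventories' % rp_uuid,
        json={'resource_provider_generation':
                  remote['resource_provider_generation'],
              'inventories': local_inventory})
```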
mriedem | get_provider_tree_and_ensure_root is what gets the provider tree from the report client and pulls the current inventory from placement, yes? | 18:23 |
mriedem | and also checks to see that the provider exists on every periodic | 18:23 |
efried | yes | 18:23 |
mriedem | which we should actually know | 18:23 |
efried | yeah, we could conceivably expect the compute RP not to disappear once we've created it. | 18:24 |
efried | I mean, I don't know how resilient we're trying to be in the face of OOB changes. | 18:24 |
efried | we do offer a placement CLI, not just for GETs but for writes as well | 18:24 |
mriedem | the compute service record, compute node record, and rp can all be deleted if the compute service record is deleted | 18:25 |
mriedem | but to get the compute service record back, you have to restart the compute service | 18:25 |
mriedem | to recreate the record which would also re-create the compute node | 18:26 |
mriedem | and then the RP | 18:26 |
mriedem | since we know https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/compute/resource_tracker.py#L780 | 18:26 |
sean-k-mooney | efried: true but in that case could we do what we do with neutron and have placement send a notification to nova that it was changed instead of polling | 18:26 |
mriedem | we can pass that down | 18:26 |
mriedem | i'll hack something up quick | 18:26 |
efried | IOW, only have the create code path on start=True | 18:26 |
efried | and don't bother with the existence check otherwise | 18:26 |
mriedem | not even start true | 18:26 |
mriedem | since https://github.com/openstack/nova/commit/418fc93a10fe18de27c75b522a6afdc15e1c49f2 we have a flag to pass through when we create the compute node | 18:26 |
mriedem | we just don't plumb it far enough | 18:27 |
mriedem | i can push up a change that does | 18:27 |
efried | mriedem: That's what pike looked like, though. The stuff that's causing the spike is necessary for *enablement* of nrp, which we haven't started using yet | 18:27 |
mriedem | that might save precious ms for belmoreira :) | 18:27 |
efried | so it seems useless atm | 18:27 |
efried | but as soon as we get e.g. neutron or cyborg adding shit to the tree, we're going to need to do that ?in_tree call every periodic. | 18:27 |
efried | unless there's some kind of async notification hook to trigger a refresh | 18:28 |
efried | yeah, what sean-k-mooney said. | 18:28 |
mriedem | i'm saying we can resolve this todo i think https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/scheduler/client/report.py#L1011 | 18:28 |
mriedem | can we agree on that? | 18:28 |
mriedem | neutron sending an event is possible, but it's also per instance... | 18:29 |
mriedem | not per host | 18:29 |
efried | mriedem: unfortunately not anymore, because we plan to allow other-than-nova to edit the tree. | 18:29 |
mriedem | including delete the compute node root provider? | 18:29 |
mriedem | that nova creates? | 18:29 |
efried | no, not that. | 18:29 |
mriedem | well isn't that what that todo is all about? | 18:29 |
mriedem | create the resource provider for the compute node if it doesn't exist | 18:30 |
efried | no | 18:30 |
sean-k-mooney | mriedem: we are going to allow them to manage their own subtrees only so they wont be allowed to modify any nodes created by nova | 18:30 |
*** tssurya has quit IRC | 18:30 | |
efried | mriedem: You could probably factor out *just* the root provider part of that; but you can't get rid of the whole method. | 18:31 |
mriedem | i'm not saying remove the method | 18:31 |
efried | and the GET that _ensure_resource_provider is doing is the ?in_tree one that we can't get rid of anyway. | 18:31 |
mriedem | because of something external adding/removing things from the root | 18:35 |
mriedem | right? | 18:35 |
sean-k-mooney | well an external entity can only legally add nested resource providers to the root node, they cant add inventories or traits | 18:36 |
sean-k-mooney | technically the api does not enforce that as we dont have owners of resource providers in the api however | 18:37 |
mriedem | what i'm hearing is we can't remove this todo https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/scheduler/client/report.py#L1011 even if we know we didn't just create the root compute node, because we need to call it anyway to determine if there are new nested providers under that pre-existing compute node | 18:38 |
mriedem | s/remove/resolve/ | 18:38 |
mriedem | iow, the todo should be removed b/c we can't do anything about it | 18:38 |
mriedem | even if we *know* the compute node record was just created | 18:38 |
sean-k-mooney | um, i dont know if we need to know if there are new nested resource providers | 18:39 |
mriedem | that's the whole in_tree thing i thought | 18:39 |
sean-k-mooney | in fact i would assert that as the compute node we dont need to know that | 18:39 |
mriedem | ok well what i'm saying is we (the RT) know when we created a new compute node record, and thus need to create its resource provider, i guess i'll wait for someone to tell me if that's worth doing so we can avoid the GET /resource_providers?in_tree=<uuid of the thing we just created and thus doesn't exist yet> case | 18:40 |
sean-k-mooney | mriedem: in that case i think you are right, we don't need the /resource_providers?in_tree=<thing i just created> call | 18:42 |
sean-k-mooney | the placement api will not allow me to create a resource provider with a parent uuid that does not exist | 18:43 |
* mriedem shoots self and moves on | 18:43 | |
efried | sean-k-mooney: We *do* need to know if there are new nested providers. | 18:44 |
sean-k-mooney | efried: there can't be nested resource providers of a compute node if we have not created the compute node yet, right | 18:44 |
efried | That's what ?in_tree is about, though. | 18:45 |
efried | ?in_tree=$compute_rp gives me the compute RP and any descendants. | 18:45 |
sean-k-mooney | yes | 18:45 |
sean-k-mooney | but if the compute RP does not exist yet then cyborg can't create nested resources under it | 18:46 |
efried | So at T0, it gives me nothing, so I create the compute RP. At T1 it gives me the compute RP. At T2, cyborg creates a child provider for a device. At T3 ?in_tree=$compute_rp gives me both providers. | 18:46 |
efried | If I didn't call ?in_tree I would never know about that device RP | 18:46 |
efried | and I need to know about that device RP e.g. from my virt driver so I can white/blacklist it and/or deploy it. | 18:46 |
sean-k-mooney | efried: sure, but you don't own that, and as nova you can't directly modify it | 18:46 |
efried | Unclear whether blacklisting happens at cyborg or at nova | 18:47 |
efried | but what you say makes sense (single ownership) so it would have to be at cyborg. | 18:47 |
*** mvkr has quit IRC | 18:47 | |
efried | Which means virt driver-esque code would need to be invoked by cyborg to do discovery in the first place. | 18:48 |
*** belmoreira has quit IRC | 18:48 | |
*** cfriesen has quit IRC | 18:48 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove TODO from get_provider_tree_and_ensure_root https://review.openstack.org/614835 | 18:48 |
sean-k-mooney | efried: well not necessarily | 18:48 |
*** cfriesen has joined #openstack-nova | 18:49 | |
sean-k-mooney | we did suggest that in update_provider_tree we could call os-acc to do that | 18:49 |
efried | yes | 18:49 |
efried | wait | 18:49 |
sean-k-mooney | but what cyborg needs to do is (1) look up the root provider for the compute node | 18:50 |
efried | update_provider_tree would call os_acc with a list of discovered-and-already-whitelist-scrubbed devices so that cyborg can create the providers? | 18:50 |
sean-k-mooney | efried: no | 18:50 |
sean-k-mooney | if nova is doing the discovery we are rebuilding cyborg in nova | 18:50 |
sean-k-mooney | the idea was that we would pass in the current tree to cyborg and it would do the discovery itself and append to that tree | 18:51 |
sean-k-mooney | but the other approach | 18:51 |
sean-k-mooney | which is what we were going to do | 18:51 |
sean-k-mooney | was cyborg polling placement for the compute node to be created. | 18:52 |
sean-k-mooney | then it would add child resource providers to the tree created by nova | 18:52 |
sean-k-mooney | but not modify any resource provider it did not create | 18:52 |
sean-k-mooney | efried: today are we doing the provider tree update by PUT or PATCH? if PUT, what would it take to make it a PATCH so nova can do a partial update and merge it on the placement side | 18:57 |
efried | mriedem: quick fix pls | 18:58 |
efried | sean-k-mooney: patch is only applicable if you're talking about modifying part of a single provider. Which I think we're not considering. | 18:59 |
efried | sean-k-mooney: IIUC, you're suggesting modifying some providers in the tree, but not others. That's still PUT - one per provider to be modified. | 18:59 |
efried | And it's what we do today, see update_from_provider_tree | 19:00 |
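(To illustrate the point: flushing a changed tree is one PUT per modified provider, each carrying that provider's generation. A rough sketch under those assumptions; `client` and the helper name are illustrative, not the actual update_from_provider_tree code.)

    # Sketch: compare the old vs. new view of the tree and PUT only the
    # providers whose inventory actually changed.
    def flush_changed_inventories(client, old_tree, new_tree):
        for uuid, new_rp in new_tree.items():
            old_rp = old_tree.get(uuid, {})
            if new_rp['inventories'] == old_rp.get('inventories'):
                continue  # untouched provider: no request at all
            client.put(
                '/resource_providers/%s/inventories' % uuid,
                json={
                    # the generation makes placement reject the write (409)
                    # if someone else modified the provider in the meantime
                    'resource_provider_generation': new_rp['generation'],
                    'inventories': new_rp['inventories'],
                })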
sean-k-mooney | efried: ok cool | 19:00 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove TODO from get_provider_tree_and_ensure_root https://review.openstack.org/614835 | 19:00 |
efried | +2 ^ | 19:00 |
*** ldau has quit IRC | 19:01 | |
*** itlinux has quit IRC | 19:01 | |
sean-k-mooney | what i was actually suggesting was making it a single PATCH call to placement to update all resource providers in a tree owned by service x, but that's a different conversation | 19:03 |
efried | totally | 19:03 |
sean-k-mooney | i am still not aware of a use case that would require nova to be aware of a resource provider created by another service, by the way | 19:04 |
sean-k-mooney | when i say nova i specifically mean the compute agent | 19:05 |
*** pcaruana has quit IRC | 19:05 | |
dansmith | efried: so was there some outcome? | 19:07 |
efried | dansmith: Remember that thing they did where they disabled the refresh_associations? | 19:08 |
dansmith | oh they reverted that? | 19:08 |
dansmith | accidentally | 19:08 |
efried | dansmith: That's why all those calls were zeroes in queens and nonzero once they upgraded (because that hack was no longer there). Yeah. | 19:08 |
dansmith | ah cool. | 19:09 |
efried | So belmiro is going to reinstate that and come back at us. | 19:09 |
dansmith | right on | 19:09 |
efried | but it's still a shit ton of calls | 19:09 |
efried | Matt and Sean and I brainstormed briefly on whether we could just get rid of the cache completely (and whether that would actually help). | 19:10 |
efried | And what we actually ended up doing was getting rid of a comment: https://review.openstack.org/614835 :( | 19:10 |
sean-k-mooney | efried: well i think we could maybe get rid of the cache but i think it needs a spec not irc ideas | 19:13 |
efried | sean-k-mooney: I would rather see a PoC in code for that one than a spec. | 19:13 |
sean-k-mooney | efried: well that's a possibility, but it would be similar to neutron's notifications for port/network events | 19:14 |
efried | yeah, a subscribable notification framework at placement itself would be cool. | 19:15 |
sean-k-mooney | yep that is what i was about to type but had not decided how to phrase it | 19:15 |
mriedem | there was a blueprint for that at one point i think | 19:15 |
mriedem | https://blueprints.launchpad.net/nova/+spec/placement-notifications | 19:16 |
sean-k-mooney | well this is a much more positive response to this idea than i had expected | 19:16 |
sean-k-mooney | if we had an owner attribute on every provider and an api to register owners with callbacks then each service could register a subscription to the rps they own | 19:17 |
mriedem | and if ifs and buts were candy and nuts we'd all have a merry christmas | 19:19 |
*** tbachman has quit IRC | 19:19 | |
efried | I was thinking much simpler to start. | 19:19 |
efried | You could register for a notification any time $rp_uuid is touched. | 19:19 |
efried | which includes "create a resource provider with $rp_uuid as a root", which solves the use case we were discussing. | 19:20 |
sean-k-mooney | efried: i considered that but that could be a lot of RPs | 19:20 |
efried | Only one per host | 19:20 |
sean-k-mooney | that said i guess you would only have to do that on creating the rp the first time | 19:20 |
sean-k-mooney | at least for clean deployments | 19:20 |
cfriesen | sean-k-mooney: does nova-compute need to know about the child resource providers that cyborg created? | 19:20 |
*** panda is now known as panda|off | 19:20 | |
sean-k-mooney | cfriesen: i dont think so | 19:21 |
sean-k-mooney | cfriesen: not in any of the interaction specs i have seen | 19:21 |
efried | cfriesen: Yeah, we talked about that above; the virt driver needs to know about them for purposes of whitelisting, deploying/attaching, etc. | 19:21 |
sean-k-mooney | efried: no it does not | 19:21 |
sean-k-mooney | the whitelisting is cyborges job | 19:21 |
*** ldau has joined #openstack-nova | 19:22 | |
sean-k-mooney | and deploying/attaching will be done in terms of the VARs or whatever the equivalent of a port binding has become | 19:22 |
cfriesen | assuming nova owns a specific set of resource providers, and it's the only thing consuming from those resource providers (can we assume that) then it should only have to update inventory once at startup. | 19:22 |
openstackgerrit | Merged openstack/os-vif master: Do not import pyroute2 on Windows https://review.openstack.org/614728 | 19:23 |
ldau | Hi, somebody has installed all-in-one openstack using vmware as hypervisor? | 19:23 |
sean-k-mooney | cfriesen: that is the assumption we stated in denver so yes i think that is still true | 19:23 |
cfriesen | now if anything else can consume those resources, then we need the periodic inventory update | 19:23 |
efried | sean-k-mooney, cfriesen: if all of that is true, then we can indeed resolve the TODO that mriedem just blew away. | 19:23 |
efried | cfriesen: You mocking this up? | 19:24 |
cfriesen | not me. :) | 19:24 |
efried | cfriesen: "Consume the resources" doesn't matter. Changing inventory matters, but only for the providers I onw. | 19:24 |
efried | own | 19:24 |
efried | we don't cache allocation/usage data. | 19:25 |
cfriesen | efried: agreed. I guess it'd have to be something like CPU/RAM hotplug where it actually changes the inventory | 19:25 |
efried | But that would be noticed by the virt driver, which would update_provider_tree, which the rt would flush back. | 19:26 |
sean-k-mooney | cfriesen: it would be the virtdirver that would do that however | 19:26 |
efried | IOW, we're not getting rid of the cache. We're getting rid of all the cache *refreshing*. | 19:26 |
sean-k-mooney | efried: yes | 19:26 |
efried | sean-k-mooney: you mocking this up? | 19:26 |
efried | or is mriedem? | 19:26 |
mriedem | i stopped paying attention, what now? | 19:27 |
sean-k-mooney | efried: if we just disable the refresh in the config, does that not effectively mock it up | 19:27 |
efried | mriedem: We're operating on the hypothesis that nova does *not* in fact need to know if outside agents create child providers that they will continue to own. | 19:27 |
efried | mriedem: And if that's true, we do *not* in fact need to refresh the cache, ever. | 19:28 |
efried | mriedem: So we *can* in fact resolve the TODO you just removed. | 19:28 |
efried | sean-k-mooney: More or less, yeah. Which is what CERN did. Which they seem to have had success with. | 19:28 |
mriedem | how about someone poop this out in the ML and get it sorted out there when gibi, jaypipes and cdent can also weigh in | 19:28 |
efried | sean-k-mooney: I think there's more we can do, though. | 19:29 |
mriedem | i think in general, we should default to *not* cache b/c of the cern issue, and only allow caching if you want to opt-in b/c you have a wildly busy env where inventory is changing a lot | 19:29 |
efried | mriedem: What was your idea to get the compute RP creation happening only once? | 19:29 |
mriedem | which i guess is a powervm thing | 19:29 |
sean-k-mooney | efried: yes proably | 19:29 |
mriedem | efried: yes, create the compute node root rp when the compute node is created, otherwise don't attempt to do the _ensure_resource_provider thing again | 19:29 |
mriedem | i.e. is it a powervm thing to be swapping out disk and such on the fly and expect nova-compute to just happily handle that? | 19:30 |
cfriesen | mriedem: changing inventory, or changing usage? | 19:30 |
mriedem | inventory | 19:31 |
mriedem | anyway, that probably doesn't matter here, | 19:31 |
mriedem | we get the inventory regardless to know if it changed so we can push updates back to placement | 19:31 |
cfriesen | do we expect anyone to have wildly changing inventory? I would have thought inventory is relatively stable. | 19:31 |
mriedem | cfriesen: that's what i said about 2 hours ago | 19:31 |
sean-k-mooney | mriedem: if it's something that is discovered by the virt driver it's not an issue if it changes | 19:31 |
mriedem | i also don't think we really need to worry about refreshing aggregate relationships in the compute, | 19:32 |
efried | mriedem: I contend it doesn't matter if powervm (or any driver) changes inventory every single periodic. Because update_provider_tree is getting whatever the previous state of placement was (because placement isn't changed yet) and then update_from_provider_tree is flushing that back to placement *and* updating the cache accordingly. | 19:32 |
mriedem | since we don't do anything with those yet | 19:32 |
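(For context, roughly how the pieces efried refers to fit together in the periodic update. The method names exist in nova, but the signatures are simplified here; this is a sketch, not the actual resource tracker code.)

    # Sketch of the periodic flow: read the current (cached) view, let the
    # virt driver mutate it, then flush only the diff back to placement,
    # which also refreshes the cache from what was written.
    def _update_to_placement(context, reportclient, driver, compute_node):
        prov_tree = reportclient.get_provider_tree_and_ensure_root(
            context, compute_node.uuid, name=compute_node.hypervisor_hostname)
        driver.update_provider_tree(prov_tree, compute_node.hypervisor_hostname)
        reportclient.update_from_provider_tree(context, prov_tree)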
*** belmoreira has joined #openstack-nova | 19:32 | |
efried | yeah, what sean-k-mooney said, only bigger. | 19:32 |
sean-k-mooney | so do we all agree we don't need to refresh the cache, at least in any case that is immediately obvious to us | 19:33 |
efried | tentatively yes | 19:33 |
mriedem | i would have thought a lot of prior discussion about why we even have a cache in the first place has happene | 19:34 |
mriedem | *happened | 19:34 |
mriedem | therefore it seems pretty severe to just all of a sudden say, "oh i guess we don't" | 19:34 |
mriedem | and if there are reasons, do those reasons justify us caching by default | 19:35 |
mriedem | anyway, those are questions for the ML, not irc | 19:35 |
sean-k-mooney | if so then should we start with a patch to default the refresh to off. and a mailing list post to see what operators think/other feedback | 19:35 |
mriedem | cern is the only deployment big enough and new enough that i've heard complain about that refresh | 19:35 |
mriedem | not sure if mnaser is doing anything about it | 19:36 |
mnaser | hi | 19:36 |
* mnaser reads | 19:36 | |
mriedem | mnaser: tl;dr do you turn this way down? https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh | 19:36 |
mriedem | to avoid computes DDoS'ing placement every 5 minutes | 19:37 |
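(The option mriedem links is a per-compute nova.conf knob; the value below is only an example of turning the refresh way down, and — per the later discussion — a 0/"never" setting is still just a proposal at this point, not something the option accepts.)

    [compute]
    # Interval, in seconds, for refreshing aggregate/trait/sharing-provider
    # associations from placement. The default is 300; a large value means
    # far fewer GETs from each nova-compute.
    resource_provider_association_refresh = 86400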
mnaser | i didn't know about that but i do feel like placement gets waaaaay too much unnecessary traffic | 19:37 |
mnaser | an idle cloud (aka literally no new vms being created/deleted) will constantly hit placement | 19:38 |
* mnaser never understood why | 19:38 | |
efried | so mriedem, to do the thing you were talking about earlier, we would add is_new_compute_node as a kwarg from _update_available_resource into _update => _update_to_placement and then only call _get_provider_tree_and_ensure_root if it's true? | 19:38 |
mriedem | mnaser: inventory updates baby! | 19:38 |
mnaser | i'm all in favour of minimizing the amount of traffic that placement gets however a lot of times we end up seeing weird stuff happen in placement db | 19:38 |
mnaser | so if we have to edit stuff via the api | 19:38 |
mnaser | it'd be nice to just know they work | 19:39 |
*** itlinux has joined #openstack-nova | 19:39 | |
mriedem | efried: that's what i was thinking yeah, and nearly started writing it, but then you guys all said in_tree was mega importante | 19:39 |
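(A rough sketch of the gating mriedem and efried are talking about: thread a flag from the resource tracker so the ensure-root call only happens when the compute node record was just created. Names and signatures here are hypothetical, and the in_tree caveat discussed above is exactly why this alone may not be enough.)

    # Sketch (hypothetical plumbing): _update_available_resource knows when
    # it just created the ComputeNode record, so it can pass that down.
    def _update_to_placement(self, context, compute_node,
                             is_new_compute_node=False):
        if is_new_compute_node:
            # Only now do we need the expensive "ensure the root RP exists"
            # call; otherwise we could trust the cached provider tree.
            self.reportclient.get_provider_tree_and_ensure_root(
                context, compute_node.uuid,
                name=compute_node.hypervisor_hostname)
        # ...followed by the usual update_provider_tree / flush dance.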
efried | mnaser: True story. If you edit something by hand, and we've switched off all this cache refreshing, the only way you're going to pick it up again is to restart the compute service. | 19:39 |
efried | or maybe we can work a HUP in there or something. | 19:39 |
mriedem | HUP is how we refresh the enabled/disabled cell cache in the scheduler | 19:40 |
sean-k-mooney | and any of the mutable config stuff if we have it so HUP makes sense | 19:40 |
mnaser | but really the only editing ive had to do was delete/remove allocations | 19:40 |
mnaser | that were stale for $reasons | 19:40 |
sean-k-mooney | mnaser: oh if its allocation that fine | 19:40 |
efried | mriedem: based on having toggled some kind of setting, right? I.e. HUP is a no-op if you haven't changed anything. | 19:40 |
mnaser | so forgive me for my silly question but | 19:40 |
mnaser | why do computes care about that information | 19:41 |
efried | mnaser: which information specifically? | 19:41 |
efried | allocations? | 19:41 |
mnaser | well, whatever api hits all the time, i think its allocations? | 19:41 |
efried | what hits the API all the time is inventory, providers, aggregates. | 19:41 |
mnaser | but inventory can be a one time hit on start up because afaik the state of that is pretty darn static right | 19:42 |
sean-k-mooney | no this checks aggregates and traits on the provider, not the associations | 19:42 |
efried | mnaser: And what we're discussing is, the virt drivers need to know what that stuff looks like so they can *modify* the provider layout, inventory, etc. | 19:42 |
sean-k-mooney | *allocations | 19:42 |
efried | BUT that stuff shouldn't change unless the virt driver changes it | 19:42 |
efried | ...or if you muck with it in the CLI :) | 19:42 |
* mnaser should probably read placement for dummies | 19:42 | |
mnaser | yeah usually my mucking around is around allocations, that's where things get out of sync usually | 19:43 |
efried | mnaser: Placement for dummies won't help you. And for placement-in-nova, there's no such thing as for-dummies. | 19:43 |
mnaser | but i mean the "traits" change dynamically? | 19:43 |
mnaser | anyways, i wont let your discusion diverge too much | 19:43 |
efried | no, traits is a good point, /me thinks... | 19:44 |
mriedem | mnaser: you are asking and saying the same thing i've been saying for an hour or so, | 19:44 |
mriedem | that inventory is pretty static | 19:44 |
mnaser | its 100% static.. things in inventory are | 19:44 |
sean-k-mooney | mnaser: we technically proposed that operators should be able to add traits to RPs but currently the virt driver just overwrites them i think | 19:44 |
efried | once again, if traits change due to external factors, you could HUP to get that flushed. | 19:44 |
mriedem | traits could be changed out of band if you're decorating capabilities on your compute node for scheduling, | 19:44 |
mnaser | memory.. disk.. vcpus.. | 19:44 |
mriedem | and aggregates aren't used in the compute service (yet) | 19:44 |
mnaser | yeah unless someone is hot plugging in memory/disk/cpu | 19:45 |
mnaser | i dont see inventory changing | 19:45 |
mnaser | and yes i agree traits make sense if you wanna say this compute node is special.. but also, does that compute node really care to know if its special? | 19:45 |
mnaser | only the scheduler cares that it's special.. | 19:45 |
sean-k-mooney | mriedem: well the only ones that are used are the ones that are created from nova host aggregates (assuming jay's stuff landed last cycle) | 19:45 |
mriedem | mnaser: almost correct | 19:45 |
efried | you could run into some interesting race conditions. If you muck with traits at the same time as the virt driver is mucking with traits, whoever gets there last will win. | 19:45 |
mriedem | right, we try to merge in what the compute is reporting for traits with what was put on the node resource provider for traits externally | 19:45 |
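(A toy illustration of the merge mriedem describes: keep externally-decorated traits while applying whatever the driver reports. This is not nova's literal code; deciding which traits count as "driver-owned" is exactly the hard part.)

    # Sketch: union of driver-reported traits with operator-added ones.
    # `known_driver_traits` stands in for "traits the driver could ever
    # report"; anything else on the provider is treated as external.
    def merge_traits(driver_traits, provider_traits, known_driver_traits):
        external = set(provider_traits) - set(known_driver_traits)
        return set(driver_traits) | external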
*** fghaas has joined #openstack-nova | 19:46 | |
sean-k-mooney | mriedem: sorry you said compute service ignore that | 19:46 |
mnaser | ok so its like | 19:46 |
mriedem | the *only* thing on the compute that would care about aggregates in placement is shared storage providers, | 19:46 |
mriedem | which we don't support yet | 19:46 |
mnaser | self reported traits + "user" decorated traits | 19:46 |
mriedem | mnaser: yes | 19:46 |
mnaser | but the nova-compute reported traits.. i feel like those are pretty static right? maybe i just don't know any wild use cases | 19:47 |
mriedem | i think i've been saying since pike, at least queens, we don't need to refresh aggregate info in compute | 19:47 |
mriedem | mnaser: proably depends on the driver | 19:47 |
mnaser | but i feel like most of the time, if a system trait changes, you probably have nova-compute restart things | 19:47 |
mnaser | ah ok | 19:47 |
mriedem | vmware would love to be able to randomly proxy traits from vcenter through nova-compute to placement | 19:47 |
cfriesen | mnaser: the one exception would be something like vTPM where the driver uses the presence of the requested trait to decide to do something with the instance. | 19:47 |
mriedem | for changes in vcenter | 19:47 |
sean-k-mooney | mnaser: the intent with the user decorated traits was to let the operator tag nodes with stuff the virt driver can't discover, or to express policy | 19:47 |
cfriesen | mnaser: but that's really looking at the requested trait, not the trait on the resource provider | 19:48 |
sean-k-mooney | cfriesen: but that is in the instance request | 19:48 |
mnaser | yeah, i dunno, i feel like those will not change much, and i think just calling a method *once* when you make some changes rather than all the time isn't problematic | 19:48 |
sean-k-mooney | cfriesen: it does not need the RP info | 19:48 |
cfriesen | sean-k-mooney: yah, that's what I realized after typing the first sentence. :) | 19:48 |
*** tbachman has joined #openstack-nova | 19:48 | |
mnaser | i mean lets be honest, we don't "refresh" allocations and those can go pretty stale | 19:48 |
mnaser | are "system" and "user" traits distingusable or in a race they can wipe each other out? | 19:49 |
*** ldau has quit IRC | 19:49 | |
mriedem | system traits might be 'standard' traits | 19:49 |
mriedem | user traits would be CUSTOM traits | 19:49 |
sean-k-mooney | efried: i assume we update the resource provider generation when updating traits? | 19:49 |
mriedem | but a user could put standard traits on a provider that the virt driver doesn't report | 19:49 |
mriedem | sean-k-mooney: yes | 19:49 |
efried | sean-k-mooney: Placement does | 19:49 |
cfriesen | sean-k-mooney: on a totally different topic, have you ever run into a scenario where qemu has a thread sitting at 100% cpu but not making any forward progress? I'm assuming it's a livelock somehow, just not sure how. | 19:50 |
sean-k-mooney | ya so one of either the user or the virt driver will fail in that case and have to retry | 19:50 |
mriedem | i think the only virt driver today that reports any traits is the libvirt driver reporting cpu features | 19:50 |
efried | so yeah, this is where re-GET-and-redrive comes into play. | 19:50 |
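(A minimal sketch of the re-GET-and-redrive efried mentions, using the PUT /resource_providers/{uuid}/traits payload shape from the placement API; the client object and helper name are illustrative.)

    # Sketch: if our write loses the race (409 generation conflict), fetch
    # the provider again to pick up the new generation and retry.
    def put_traits_with_retry(client, rp_uuid, traits, attempts=3):
        for _ in range(attempts):
            rp = client.get('/resource_providers/%s' % rp_uuid).json()
            resp = client.put(
                '/resource_providers/%s/traits' % rp_uuid,
                json={'resource_provider_generation': rp['generation'],
                      'traits': sorted(traits)})
            if resp.status_code != 409:
                return resp
        raise RuntimeError('traits update still conflicting after %d tries'
                           % attempts)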
sean-k-mooney | so there is no race | 19:50 |
mnaser | i still don't see the need for constantly updating, sounds like this is something that you can just report once on start up or when it changes | 19:50 |
sean-k-mooney | well unless the client is coded badly | 19:50 |
efried | mnaser: Yes | 19:50 |
mnaser | MAYBE pull it in when a new VM gets spun up | 19:51 |
*** mvkr has joined #openstack-nova | 19:51 | |
mnaser | if it means a different codepath | 19:51 |
*** tbachman_ has joined #openstack-nova | 19:51 | |
sean-k-mooney | so currently resource_provider_association_refresh has a min value of 1. can we allow 0 and define that to mean update the cache only on startup or SIGHUP | 19:51 |
*** cfriesen has quit IRC | 19:52 | |
*** cfriesen has joined #openstack-nova | 19:53 | |
sean-k-mooney | mriedem: did you not have a proposal to report things like support_migration as traits too | 19:53 |
cfriesen | dunno what's going on today, I keep disconnecting | 19:53 |
sean-k-mooney | mriedem: or is that handled by the compute manager above the virt driver level | 19:54 |
*** tbachman has quit IRC | 19:54 | |
*** tbachman_ is now known as tbachman | 19:54 | |
mnaser | anyways thats my 2 cents | 19:54 |
* mnaser goes back to dealing with rocky upgrades | 19:54 | |
efried | mriedem, sean-k-mooney: Is ComputeManager.reset() the right hook for that SIGHUP thing? | 19:56 |
*** tbachman has quit IRC | 19:56 | |
mriedem | sean-k-mooney: https://review.openstack.org/#/c/538498/ | 19:56 |
mriedem | efried: yeah | 19:56 |
sean-k-mooney | mriedem: ah ya that is what i was thinking of | 19:56 |
openstackgerrit | Merged openstack/nova master: PowerVM upt parity for reshaper, DISK_GB reserved https://review.openstack.org/614643 | 19:57 |
efried | mriedem: So like if I wanted to "clear the cache" I could add | 19:58 |
efried | self.scheduler_client = scheduler_client.SchedulerClient() | 19:58 |
efried | self.reportclient = self.scheduler_client.reportclient | 19:58 |
efried | to that method | 19:58 |
efried | or if I wanted to be narrower about it, I could add a reset() to the report client and invoke self.reportclient.reset() from there instead. | 19:58 |
mriedem | i reckon | 19:59 |
sean-k-mooney | efried: reset might be better | 19:59 |
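(A sketch of the narrower option being discussed: give the report client a reset() that drops its provider/association caches and invoke it from ComputeManager.reset(), the SIGHUP hook. The attribute names are illustrative of the cache in question, not a committed design.)

    # Sketch: clear the cached provider tree and association timestamps so
    # the next periodic repopulates them from placement.
    class SchedulerReportClient(object):
        def reset(self):
            # provider_tree here is nova.compute.provider_tree
            self._provider_tree = provider_tree.ProviderTree()
            self._association_refresh_time = {}

    # ...and in ComputeManager.reset() (invoked on SIGHUP):
    #     self.reportclient.reset()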
mriedem | this just seems like a 180 in attitude about how important it is to having nova-compute be totally self-healing on every periodic | 19:59 |
mriedem | which i'm sure was debated to death in releases past | 19:59 |
sean-k-mooney | mriedem: well nothing is stopping us from also reviving the notification idea to move the healing to a push model | 20:00 |
mriedem | there are plenty of things stopping me from doing anything | 20:01 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Clean up cpu_shared_set config docs https://review.openstack.org/614864 | 20:01 |
efried | mriedem: I think a lot of the reasoning in the past was because information was coming from several different places while we were transitioning to placement but not fully there yet. | 20:02 |
efried | We're getting pretty close to fully there at this point, so I think a lot of this stuff is going to get to be cleaned up. | 20:02 |
efried | Like _normalize_allocation_and_reserved_bullshit() | 20:03 |
mriedem | efried: maybe, but i specifically remember you bringing up something once about how powervm shared storage pools can have disk swapped in and out on a whim and nova-compute should be cool with reporting that as it changes - but maybe that's unrelated to this, idk | 20:04 |
efried | mriedem: 1) that's not implemented yet, but 2) even when it is, that gels just fine with this, precisely because it's being node by virt.powervm.update_provider_tree and *not* out of band. | 20:04 |
efried | s/node/done/ | 20:05 |
mriedem | ok i thought it was to handle some out of band thing | 20:05 |
mriedem | but it was awhile ago and i've been high on ether since then | 20:06 |
sean-k-mooney | well in addition to the SIGHUP stuff + disabling the cache refresh via config =0: if we inject a sleep(random(refresh interval)) seconds into that specific periodic task once, the jitter should spread out the updates over the entire interval smoothly on average | 20:06 |
sean-k-mooney | so for those that don't turn this off the same amount of updates to placement will happen, just not all at once every x seconds | 20:07 |
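(A tiny sketch of that jitter idea: a one-time random offset before the first refresh so computes don't all hit placement in the same second. Purely illustrative.)

    import random
    import time

    def stagger_first_refresh(refresh_interval):
        # One-time random delay in [0, refresh_interval); after this the
        # periodic keeps its normal cadence, just phase-shifted per host.
        time.sleep(random.uniform(0, refresh_interval))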
*** eharney has quit IRC | 20:08 | |
mriedem | i imagine cern would appreciate a way to disable the refresh altogether since they are already doing that out of tree | 20:10 |
efried | I'm working something up now. | 20:11 |
sean-k-mooney | efried: code or ml post or spec | 20:11 |
*** belmoreira has quit IRC | 20:12 | |
efried | sean-k-mooney: code | 20:13 |
efried | If it gets traction, I can spec it. | 20:13 |
sean-k-mooney | cool | 20:13 |
*** itlinux has quit IRC | 20:14 | |
sean-k-mooney | mriedem: by the way i know you're busy with other stuff but do you plan to revive https://review.openstack.org/#/c/538498/ at some point | 20:15 |
*** itlinux has joined #openstack-nova | 20:15 | |
mriedem | it's pretty low priority | 20:16 |
sean-k-mooney | mriedem: ok i starred it. so in the unlikely event i run out of things to do i might take a look at it if you don't get back to it. but ya there are many things ahead of it | 20:18 |
openstackgerrit | Matt Riedemann proposed openstack/nova-specs master: High Precision Event Timer (HPET) on x86 guests https://review.openstack.org/607989 | 20:27 |
mriedem | cfriesen: jackding: ^ i cleaned that up, +2 | 20:29 |
cfriesen | mriedem: sweet, thanks. any chance you could take another look at the vtpm one? | 20:30 |
cfriesen | sean-k-mooney: you too | 20:30 |
sean-k-mooney | cfriesen: am sure | 20:30 |
sean-k-mooney | cfriesen: i'm respinning a patch but i'll take a look after | 20:30 |
mriedem | ffs yes you know i'd love t | 20:30 |
mriedem | *to | 20:30 |
cfriesen | you're so sweet | 20:30 |
*** KeithMnemonic has joined #openstack-nova | 20:31 | |
mriedem | i have my moments | 20:32 |
mriedem | once per quarter | 20:32 |
KeithMnemonic | mriedem: can someone help move this along https://review.openstack.org/#/c/611326/1 ? | 20:32 |
*** dave-mccowan has quit IRC | 20:32 | |
* cfriesen snags another bag of leftover halloween snacks | 20:32 | |
mriedem | KeithMnemonic: umm, melwitt and/or dansmith could probably hammer that through | 20:32 |
mriedem | KeithMnemonic: how far back do you need that fix? | 20:34 |
KeithMnemonic | thanks melwitt: dansmith: can you help here ? | 20:34 |
KeithMnemonic | just pike | 20:34 |
mriedem | ok i can work on the queens and pike backports in the meantime | 20:35 |
KeithMnemonic | but it needs to get in rocky first then | 20:35 |
mriedem | yup | 20:35 |
melwitt | looking | 20:35 |
KeithMnemonic | thanks for helping out!! | 20:35 |
*** cdent has joined #openstack-nova | 20:43 | |
*** slaweq has joined #openstack-nova | 20:50 | |
openstackgerrit | sean mooney proposed openstack/os-vif master: add support for generic tap device plug https://review.openstack.org/602384 | 21:02 |
openstackgerrit | sean mooney proposed openstack/os-vif master: add isolate_vif config option https://review.openstack.org/612534 | 21:02 |
*** erlon has quit IRC | 21:05 | |
openstackgerrit | sean mooney proposed openstack/os-vif master: always create ovs port during plug https://review.openstack.org/602384 | 21:07 |
openstackgerrit | sean mooney proposed openstack/os-vif master: add isolate_vif config option https://review.openstack.org/612534 | 21:07 |
sean-k-mooney | jaypipes: sorry for the delay, i should have addressed all your comments in ^ i have also reworded the commit message for the first patch to clarify things a little | 21:08 |
mriedem | cfriesen: done https://review.openstack.org/#/c/571111/ | 21:10 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Fix NoneType error in _notify_volume_usage_detach https://review.openstack.org/614868 | 21:11 |
cfriesen | thanks. do you think we should deal with shelve/unshelve as part of this, given that it's broken for UEFI nvram currently? | 21:11 |
mriedem | i think if you're not going to deal with it now, it should be explicitly called out as a limitation | 21:12 |
cfriesen | okay, happy to do that | 21:12 |
mriedem | happier than adding shelve support anyway :) | 21:12 |
cfriesen | I think for both cases we'd need to store those files somewhere, either in glance or maybe swift (if present) | 21:13 |
mriedem | nova doesn't do anything with swift directly so idk | 21:14 |
mriedem | if only we had switched to glare 3 years ago when they wanted us to | 21:14 |
cfriesen | fyi, there are actual differences between 1.2 and 2.0 other than CRB | 21:15 |
mriedem | i figured maybe there were, but idk what they are | 21:16 |
mriedem | but assume people that care about using this would know the difference | 21:16 |
cfriesen | me too. :) | 21:16 |
mriedem | ooo https://www.dell.com/support/article/us/en/04/sln312590/tpm-12-vs-20-features | 21:16 |
cfriesen | my impression is that this stuff is all crazy complicated | 21:17 |
sean-k-mooney | cfriesen: yes yes it is | 21:17 |
mriedem | cool, let's add it to nova! | 21:17 |
mriedem | WHAT COULD GO WRONG?! | 21:17 |
sean-k-mooney | mriedem: well a version number is a lot better than traits for all the crap added in each version | 21:18 |
cfriesen | you're giving me nightmares | 21:18 |
mriedem | i'm fine with reporting the different versions as traits | 21:19 |
mriedem | https://en.wikipedia.org/wiki/Trusted_Platform_Module#TPM_1.2_vs_TPM_2.0 could be a reference in the spec if we cared | 21:19 |
mriedem | sounds like 2.0 is more secure | 21:19 |
sean-k-mooney | cfriesen: the cloud platform group gave them to me first when they wanted me to enable tpm traits 12 months ago | 21:19 |
*** awaugama has quit IRC | 21:19 | |
sean-k-mooney | mriedem: yes it is | 21:20 |
sean-k-mooney | mriedem: when i was originally trying to standardise tpm traits i had multiple version traits https://review.openstack.org/#/c/514712/3/os_traits/hw/platform/security.py | 21:21 |
sean-k-mooney | but honestly 1.2 and 2.0 are all that matter | 21:22 |
sean-k-mooney | as far as i know very few deployments of tpm 1.0 or 1.1 were ever a thing | 21:22 |
cfriesen | on a totally different topic, I'd like to draw your attention to https://review.openstack.org/#/c/473973/ | 21:24 |
cfriesen | originally we used these for the nova/neutron update where we were being blasted with a bunch of neutron updates. now with the changes to get fewer neutron updates it's probably not as big a deal, but we might want to consider using the fair locks in a few places. | 21:26 |
sean-k-mooney | cfriesen: so these are basically the opposite of priority locks hehe | 21:27 |
cfriesen | they're like ticket spinlocks | 21:27 |
sean-k-mooney | cfriesen: just looking at the implementation | 21:29 |
cfriesen | the original problem we hit was that the nova-compute thread handling "real work" (like a migration or something) was being starved by tons of incoming neutron events that always got the lock first | 21:29 |
cfriesen | sean-k-mooney: for simplicity it uses the fact that fasteners writer locks are queued | 21:29 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/pike: Fix NoneType error in _notify_volume_usage_detach https://review.openstack.org/614872 | 21:30 |
sean-k-mooney | cfriesen: how does this interact with and without eventlet's monkeypatching | 21:31 |
cfriesen | should just work. the underlying stuff is threading.Condition | 21:33 |
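(A small sketch of the idea in that review, assuming — as cfriesen says above — that fasteners queues writer acquisitions so waiters are served roughly in FIFO order. Illustrative only; the real proposal lives in https://review.openstack.org/#/c/473973/.)

    import collections
    import contextlib

    import fasteners

    # One reader/writer lock per name; always taking the write lock gives the
    # queued, "fair" behaviour instead of a free-for-all mutex, so a burst of
    # event handlers can't starve a long-running task waiting on the lock.
    _fair_locks = collections.defaultdict(fasteners.ReaderWriterLock)

    @contextlib.contextmanager
    def fair_lock(name):
        with _fair_locks[name].write_lock():
            yield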
*** slaweq has quit IRC | 21:36 | |
sean-k-mooney | cfriesen: one comment if you re-spin the patch, but ya it's neat | 21:38 |
sean-k-mooney | that said if you did not need a named lock you could just use the reader/writer lock directly | 21:39 |
jackding | mriedem: I forgot to push my change, Thank you for doing that. | 21:45 |
sean-k-mooney | cfriesen: so for realtime guests do you care that we can't disable the performance monitoring unit in the libvirt xml in nova | 22:06 |
cfriesen | sean-k-mooney: I don't think it's come up. Do they default to on? | 22:10 |
sean-k-mooney | cfriesen: yep | 22:10 |
sean-k-mooney | i have no idea what the impact of that is | 22:10 |
sean-k-mooney | i assume low | 22:10 |
sean-k-mooney | but i have an internal email asking about turning on realtime instances and that was the only item that is not already supported upstream | 22:11 |
sean-k-mooney | i could write a patch to allow disabling it in like an hour, just not sure it's worth my time and/or if people would accept the patch if i did | 22:11 |
cfriesen | sean-k-mooney: I don't see a "perf" section if I do "virsh dumpxml" | 22:12 |
cfriesen | maybe we default it to off or something in libvirt | 22:13 |
sean-k-mooney | its in this section https://libvirt.org/formatdomain.html#elementsFeatures | 22:13 |
sean-k-mooney | and it's defaulted to on in libvirt | 22:14 |
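(For reference, the domain XML element being discussed, per the formatdomain docs linked above; what sean-k-mooney wants is to be able to emit it with state='off' from nova.)

    <domain>
      <features>
        <!-- PMU virtualization; libvirt documents this as available
             since 1.2.12 -->
        <pmu state='off'/>
      </features>
    </domain>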
cfriesen | "virsh domstats --perf <domain>" gives me nothing | 22:16 |
sean-k-mooney | cfriesen: virsh dumpxml | grep pmu ? | 22:16 |
*** threestrands has joined #openstack-nova | 22:17 | |
cfriesen | nothing | 22:17 |
sean-k-mooney | what version of libvirt are you running | 22:17 |
sean-k-mooney | the docs could be wrong | 22:17 |
cfriesen | 3.5.0 | 22:18 |
sean-k-mooney | and qemu | 22:18 |
cfriesen | qemu-kvm-ev-2.10.0 | 22:18 |
*** cdent has quit IRC | 22:19 | |
sean-k-mooney | ok it said since 1.2.12 ill assume the docs are wrong until they show me a vm xml with this from an openstack instance | 22:19 |
cfriesen | I have a specific CPU model though, not host-passthrough, if that matters | 22:20 |
sean-k-mooney | it may; in this case it was using host-passthrough | 22:21 |
sean-k-mooney | that said, the pmu is not a cpu flag so it should not | 22:21 |
*** mriedem has quit IRC | 22:23 | |
sean-k-mooney | actually maybe when the default is on it just does not include it in the xml | 22:23 |
sean-k-mooney | i'll get them to verify it's actually on before spending any more time on it. thanks cfriesen :) | 22:24 |
cfriesen | how do we handle long URLs in specs? | 22:24 |
sean-k-mooney | i believe flake8 ignores them | 22:25 |
sean-k-mooney | at least it appeared to in the ones i was writing, so i just put them in the references section and use [0]_ to refer to them | 22:25 |
sean-k-mooney | i don't believe there is an openstack url shortener so just use google or something else if you need to | 22:26 |
cfriesen | hmm..just had a thought. is there a way to schedule based on libvirt version? | 22:32 |
sean-k-mooney | cfriesen: nope but you could have a trait | 22:33 |
* sean-k-mooney ducks before jaypipes see ^ | 22:33 | |
cfriesen | heh. actually, I think I'm okay. I have a trait for TPM 2.0, and that requires libvirt 4.5 which will also support CRB | 22:34 |
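(For illustration: requiring such a capability at scheduling time uses the trait:<TRAIT_NAME>=required flavor extra spec syntax; the trait and flavor names below are placeholders, not something nova reports today.)

    # Example only: schedule on a reported trait instead of a libvirt version
    openstack flavor set vtpm.small --property trait:CUSTOM_TPM_2_0=required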
sean-k-mooney | ya i think realistically we don't want to expose software versions to schedule on and should use features instead | 22:35 |
sean-k-mooney | TPM 2.0 is different as that is referring to an iso standard, and well, they take a bit more time to have revisions and get implemented in hardware | 22:36 |
jaypipes | sean-k-mooney: you're now officially on the naughty list. | 22:38 |
sean-k-mooney | hehe i did say lets not use traits for this :) also was i ever not? | 22:38 |
*** fghaas has quit IRC | 22:45 | |
jaypipes | :) | 22:48 |
*** KeithMnemonic has quit IRC | 22:55 | |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Trust the report client cache more https://review.openstack.org/614886 | 23:06 |
efried | mriedem, sean-k-mooney, jaypipes, cfriesen, belmoreira: ^^ | 23:07 |
efried | I should link today's IRC discussion in there. But I gotta run riiight now. | 23:07 |
*** owalsh_ has joined #openstack-nova | 23:14 | |
*** owalsh has quit IRC | 23:15 | |
*** tbachman has joined #openstack-nova | 23:19 | |
*** spatel has joined #openstack-nova | 23:21 | |
*** mvkr has quit IRC | 23:22 | |
*** mvkr has joined #openstack-nova | 23:23 | |
*** tbachman has quit IRC | 23:25 | |
*** spatel has quit IRC | 23:25 | |
*** mlavalle has quit IRC | 23:36 | |
*** Swami has quit IRC | 23:53 | |
*** gyee has quit IRC | 23:58 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!