*** slaweq has joined #openstack-nova | 00:11 | |
*** mvkr has joined #openstack-nova | 00:12 | |
*** slaweq has quit IRC | 00:16 | |
*** eandersson has joined #openstack-nova | 00:16 | |
*** gyee has quit IRC | 00:16 | |
*** hshiina has joined #openstack-nova | 00:23 | |
*** pcaruana has quit IRC | 00:25 | |
*** hshiina has quit IRC | 00:27 | |
*** hshiina has joined #openstack-nova | 00:28 | |
*** brinzhang has joined #openstack-nova | 00:37 | |
*** tetsuro has joined #openstack-nova | 00:40 | |
*** k_mouza has quit IRC | 00:46 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:03 | |
*** slaweq has joined #openstack-nova | 01:11 | |
*** erlon_ has quit IRC | 01:15 | |
*** slaweq has quit IRC | 01:16 | |
*** Dinesh_Bhor has quit IRC | 01:25 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:31 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Add description of custom resource classes https://review.openstack.org/616721 | 01:32 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Fix a help string in nova-manage https://review.openstack.org/616723 | 01:50 |
openstackgerrit | Merged openstack/nova master: Harden placement init under wsgi https://review.openstack.org/610034 | 01:53 |
*** hamzy has joined #openstack-nova | 02:15 | |
*** hongbin has joined #openstack-nova | 02:42 | |
*** mrsoul has quit IRC | 02:46 | |
*** eharney has quit IRC | 02:54 | |
openstackgerrit | Merged openstack/nova master: Use SleepFixture instead of mocking _ThreadingEvent.wait https://review.openstack.org/615724 | 03:09 |
*** slaweq has joined #openstack-nova | 03:11 | |
*** slaweq has quit IRC | 03:15 | |
*** Dinesh_Bhor has quit IRC | 03:17 | |
openstackgerrit | 98k proposed openstack/os-traits master: Add python 3.6 unit test job https://review.openstack.org/616749 | 03:18 |
*** Dinesh_Bhor has joined #openstack-nova | 03:20 | |
*** tbachman has quit IRC | 03:27 | |
*** Dinesh_Bhor has quit IRC | 04:08 | |
*** janki has joined #openstack-nova | 04:36 | |
*** Dinesh_Bhor has joined #openstack-nova | 04:40 | |
*** openstackstatus has quit IRC | 04:59 | |
*** hongbin has quit IRC | 05:07 | |
*** slaweq has joined #openstack-nova | 05:11 | |
*** moshele has joined #openstack-nova | 05:12 | |
*** slaweq has quit IRC | 05:16 | |
*** moshele has quit IRC | 05:18 | |
*** openstack has joined #openstack-nova | 07:09 | |
*** ChanServ sets mode: +o openstack | 07:09 | |
*** dpawlik has joined #openstack-nova | 07:12 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Fix server query examples https://review.openstack.org/616834 | 07:13 |
*** sahid has joined #openstack-nova | 07:18 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (6) https://review.openstack.org/574113 | 07:20 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (7) https://review.openstack.org/574974 | 07:20 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (8) https://review.openstack.org/575311 | 07:20 |
*** pcaruana has joined #openstack-nova | 07:21 | |
*** dpawlik has quit IRC | 07:28 | |
*** dpawlik has joined #openstack-nova | 07:29 | |
*** dpawlik has quit IRC | 07:29 | |
*** dpawlik has joined #openstack-nova | 07:30 | |
*** slaweq has joined #openstack-nova | 07:45 | |
*** jangutter has quit IRC | 07:50 | |
*** jangutter has joined #openstack-nova | 07:50 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Support admin to specify project to create snapshot https://review.openstack.org/616843 | 07:52 |
gibi | melwitt: I think there is only one patch left from use-nested-allocation-candidates that handles allocations that only use nested RPs and not the root RP https://review.openstack.org/#/c/608298/ | 07:57 |
gibi | melwitt: that happens to be lower prio for me as it's not needed for the GPU and Bandwidth work. It would be needed for the NUMA work though | 07:58 |
gibi | melwitt: so I think we can pull use-nested-allocation-candidates from runway | 08:01 |
*** takashin has left #openstack-nova | 08:03 | |
gibi | melwitt: I did pull that from the runway now on the etherpad | 08:03 |
melwitt | gibi: got it, thank you. great work on all of that btw. it is exciting | 08:05 |
openstackgerrit | Jeffrey Zhang proposed openstack/nova master: Add feature to flatten the volume from glance image snapshort https://review.openstack.org/616461 | 08:06 |
*** tssurya has joined #openstack-nova | 08:07 | |
gibi | melwitt: thanks for looking at the bandwidth patches. It was motivating for me to see them moving forward. | 08:07 |
gibi | melwitt: this week I finally resumed work on that series | 08:08 |
melwitt | np, it is really cool to see it all coming together now | 08:08 |
*** trident has quit IRC | 08:12 | |
*** trident has joined #openstack-nova | 08:14 | |
*** ralonsoh has joined #openstack-nova | 08:15 | |
*** helenaAM has joined #openstack-nova | 08:31 | |
*** sridharg has joined #openstack-nova | 08:54 | |
*** sridharg has quit IRC | 08:54 | |
*** sridharg has joined #openstack-nova | 08:55 | |
*** hshiina has quit IRC | 09:09 | |
*** tetsuro has quit IRC | 09:20 | |
*** panda|rover|off is now known as panda|rover | 09:24 | |
*** Dinesh_Bhor has quit IRC | 09:34 | |
*** derekh has joined #openstack-nova | 09:42 | |
*** k_mouza has joined #openstack-nova | 09:52 | |
*** k_mouza has quit IRC | 09:53 | |
*** k_mouza has joined #openstack-nova | 09:54 | |
*** Dinesh_Bhor has joined #openstack-nova | 09:57 | |
*** Dinesh_Bhor has quit IRC | 10:01 | |
*** ttsiouts has joined #openstack-nova | 10:05 | |
*** ttsiouts has quit IRC | 10:10 | |
*** ttsiouts has joined #openstack-nova | 10:11 | |
*** ttsiouts has quit IRC | 10:15 | |
*** ttsiouts has joined #openstack-nova | 10:20 | |
*** ttsiouts has quit IRC | 10:21 | |
*** ttsiouts has joined #openstack-nova | 10:22 | |
*** ttsiouts has quit IRC | 10:26 | |
*** ttsiouts has joined #openstack-nova | 10:30 | |
openstackgerrit | zhouxinyong proposed openstack/nova master: delete unavailable links https://review.openstack.org/616870 | 10:30 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Make _instances_cores_ram_count() be smart about cells https://review.openstack.org/569055 | 10:31 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Make _instances_cores_ram_count() be smart about cells https://review.openstack.org/569055 | 10:32 |
*** sambetts_ is now known as sambetts|afk | 10:33 | |
*** davidsha has joined #openstack-nova | 10:36 | |
openstackgerrit | zhouxinyong proposed openstack/nova master: delete unavailable links https://review.openstack.org/616870 | 10:52 |
*** maciejjozefczyk has quit IRC | 10:59 | |
*** ttsiouts has quit IRC | 10:59 | |
*** ttsiouts has joined #openstack-nova | 11:00 | |
*** maciejjozefczyk has joined #openstack-nova | 11:01 | |
*** maciejjozefczyk has joined #openstack-nova | 11:01 | |
*** tssurya has quit IRC | 11:04 | |
*** dpawlik has quit IRC | 11:08 | |
*** rodolof has joined #openstack-nova | 11:12 | |
*** k_mouza has quit IRC | 11:31 | |
*** k_mouza has joined #openstack-nova | 11:33 | |
*** k_mouza has quit IRC | 11:52 | |
*** dtantsur is now known as dtantsur|brb | 11:59 | |
*** k_mouza has joined #openstack-nova | 12:05 | |
*** brinzhang has quit IRC | 12:10 | |
*** janki has quit IRC | 12:11 | |
*** ondrejme has joined #openstack-nova | 12:13 | |
*** maciejjozefczyk has quit IRC | 12:15 | |
*** maciejjozefczyk has joined #openstack-nova | 12:16 | |
*** erlon has joined #openstack-nova | 12:29 | |
*** panda|rover is now known as panda|rover|lch | 12:31 | |
*** alexchadin has joined #openstack-nova | 12:39 | |
*** rodolof has quit IRC | 12:55 | |
openstackgerrit | zhouxinyong proposed openstack/nova master: modify the avaliable link https://review.openstack.org/616905 | 13:08 |
*** dtantsur|brb is now known as dtantsur | 13:09 | |
*** k_mouza has quit IRC | 13:11 | |
*** Dinesh_Bhor has joined #openstack-nova | 13:18 | |
*** k_mouza has joined #openstack-nova | 13:23 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: DNM WIP zuul: Add a lioadm based multiattach job https://review.openstack.org/616916 | 13:26 |
*** Dinesh_Bhor has quit IRC | 13:34 | |
*** sahid has quit IRC | 13:42 | |
*** sahid has joined #openstack-nova | 13:47 | |
*** mriedem has joined #openstack-nova | 14:08 | |
*** tbachman has joined #openstack-nova | 14:11 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Calculate port_id rp_uuid mapping for binding https://review.openstack.org/616239 | 14:18 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Pass allocations and traits to neturonv2 api https://review.openstack.org/616240 | 14:18 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Send RP uuid in the port binding https://review.openstack.org/569459 | 14:18 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Test boot with more ports with bandwidth request https://review.openstack.org/573317 | 14:18 |
openstackgerrit | Ivaylo Mitev proposed openstack/nova master: VMware: Attach volumes using adapter type from instance https://review.openstack.org/616599 | 14:18 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: change "Ignoring supplied device name" warning to info https://review.openstack.org/616952 | 14:19 |
*** davidsha has quit IRC | 14:32 | |
*** eharney has joined #openstack-nova | 14:34 | |
*** liuyulong has joined #openstack-nova | 14:35 | |
*** eharney has quit IRC | 14:54 | |
*** tssurya has joined #openstack-nova | 14:57 | |
*** maciejjozefczyk has quit IRC | 15:00 | |
*** lbragstad has quit IRC | 15:02 | |
*** lbragstad has joined #openstack-nova | 15:03 | |
*** maciejjozefczyk has joined #openstack-nova | 15:07 | |
*** maciejjozefczyk has quit IRC | 15:07 | |
*** maciejjozefczyk has joined #openstack-nova | 15:09 | |
*** alexchadin has quit IRC | 15:10 | |
*** k_mouza has quit IRC | 15:16 | |
*** hongbin has joined #openstack-nova | 15:18 | |
*** artom has quit IRC | 15:21 | |
*** artom has joined #openstack-nova | 15:21 | |
*** k_mouza has joined #openstack-nova | 15:24 | |
*** maciejjozefczyk has quit IRC | 15:33 | |
*** maciejjozefczyk has joined #openstack-nova | 15:34 | |
*** maciejjozefczyk has quit IRC | 15:34 | |
*** maciejjozefczyk has joined #openstack-nova | 15:34 | |
*** maciejjozefczyk has quit IRC | 15:35 | |
*** maciejjozefczyk has joined #openstack-nova | 15:35 | |
*** rnoriega has quit IRC | 15:36 | |
*** maciejjozefczyk has quit IRC | 15:36 | |
*** rnoriega has joined #openstack-nova | 15:36 | |
mriedem | aspiers: done https://review.openstack.org/#/c/609779/ | 15:43 |
mriedem | i'm really not thrilled at the amount of technical debt we'll be taking on if we add this | 15:43 |
mriedem | but such is life i suppose | 15:43 |
mriedem | by technical debt i mean "oh i can create and destroy a sev-enabled vm, but that's all i can do with it" | 15:43 |
openstackgerrit | Jack Ding proposed openstack/nova master: Add cache=none option for qemu-img convert https://review.openstack.org/616692 | 15:45 |
*** spatel has joined #openstack-nova | 15:48 | |
spatel | sean-k-mooney: Howdy!!! | 15:50 |
spatel | Morning | 15:50 |
spatel | https://bugs.launchpad.net/nova/+bug/1792763 | 15:50 |
openstack | Launchpad bug 1792763 in OpenStack Compute (nova) "tap TX packet drops during high cpu load " [Undecided,Invalid] | 15:50 |
spatel | Yes, this can be resolved or closed.. because it's a design question (I won't say it's a BUG) | 15:51 |
*** bnemec is now known as beekneemech | 15:53 | |
sean-k-mooney | ya the drops you were seeing were a limitation of linux bridge as you said, so this is not something nova can fix | 15:53 |
sean-k-mooney | spatel: the kernel can only handle about 1.4mpps on a 3.4GHz cpu, and generally it's less than that | 15:54 |
sean-k-mooney | once you exceed that level you get drops. | 15:54 |
spatel | In my test after 50kpps i was seeing TX drop on tap interface | 15:55 |
*** eharney has joined #openstack-nova | 15:56 | |
spatel | I think this is an issue of the tap interface design; it runs in kernel space, which is overhead on the kernel | 15:56 |
spatel | virtio i meant | 15:56 |
sean-k-mooney | to get to 1.4 you need kernel ovs with the kernel vhost module to accelerate it | 15:56 |
sean-k-mooney | spatel: yes so its not something openstack can remedy | 15:57 |
spatel | ++ | 15:57 |
spatel | In my case i am using linuxbridge (not ovs) | 15:57 |
spatel | do you think OVS is better in performance compare to linuxbridge ? | 15:58 |
sean-k-mooney | yes it is | 15:58 |
sean-k-mooney | bar multicast tunnelling | 15:58 |
sean-k-mooney | if you have a multicast-heavy workload use linux bridge, as ovs falls back to unicast | 15:58 |
sean-k-mooney | but in general ovs outperforms linux bridge in vm based workloads | 15:59 |
spatel | when you say multicast what is the relation here? | 15:59 |
*** jistr is now known as jistr|call | 16:00 | |
sean-k-mooney | linux bridge supports using multicast endpoints for tenant networks, meaning it can more efficiently handle tenant traffic with a high proportion of broadcast or multicast traffic | 16:00 |
sean-k-mooney | ovs does not, and has to fall back to a unicast mesh topology for vxlan | 16:01 |
sean-k-mooney | but for typical workloads ovs will outperform linux bridge | 16:01 |
dansmith | mriedem: I threw a comment in there about using sysmeta to let virt drivers declare some ops as invalid for an instance. is there some reason that's not reasonable? | 16:04 |
*** burt has joined #openstack-nova | 16:04 | |
dansmith | presumably 403 is allowed for pretty much any operation on any microversion, so I would think it'd be not a huge deal, and immediately applies to existing operations in certain situations | 16:04 |
*** jangutter has quit IRC | 16:07 | |
*** Luzi has quit IRC | 16:07 | |
*** janki has joined #openstack-nova | 16:09 | |
mriedem | dansmith: yeah, replied | 16:09 |
mriedem | it really goes back to the capabilities thing we've discussed several times before | 16:09 |
mriedem | i'm mostly concerned about snapshot, because if you can't move the instance, users are at least going to want to be able to snapshot it i'd think before it has to be destroyed and recreated elsewhere because the compute it's on is going away | 16:11 |
mriedem | of course this is where someone says, "just attach a data volume and rewrite the application to use that" | 16:12 |
spatel | sean-k-mooney: thanks for clear that point.. :) i have all unicast workload | 16:12 |
*** lbragstad has quit IRC | 16:14 | |
dansmith | mriedem: did he say snapshot wasn't supported? I would think it would be | 16:15 |
*** lbragstad has joined #openstack-nova | 16:15 | |
dansmith | the airplane wifi is sucking too hard for me to even open it again | 16:16 |
mriedem | it wasn't mentioned | 16:16 |
mriedem | that's why i asked, because it sure seems like a lot can't be supported | 16:16 |
*** jistr|call is now known as jistr | 16:17 | |
*** imacdonn has quit IRC | 16:18 | |
mriedem | we got a bug b/c of the limit of tenant ids for the aggregate multitenancy isolation filter, that's resolved with the placement request filter, but doesn't look like the docs for the placement filter mention you can namespace the metadata so you can add as many tenants as you want | 16:18 |
dansmith | mriedem: the suspend/resume and live migration are about in-memory state, which is why they're hard to support I think | 16:18 |
dansmith | snapshot, reboot, cold migrate should all be fine I would think | 16:18 |
dansmith | based on my reading and assumptions about how this works | 16:18 |
*** cfriesen has joined #openstack-nova | 16:18 | |
dansmith | mriedem: hmm, I was sure I put that in there | 16:19 |
mriedem | don't see it, i can push up something for that | 16:19 |
dansmith | okay | 16:20 |
mriedem | and i'll probably update the docs for the old filter to mention the limitation (and link to the bug) and say the placement one is a better replacement | 16:20 |
dansmith | ack | 16:21 |
dansmith | did I have it in the commit message or something? | 16:21 |
dansmith | I was sure I wrote words about this | 16:21 |
*** etp has quit IRC | 16:21 | |
*** gyee has joined #openstack-nova | 16:22 | |
*** tssurya has quit IRC | 16:22 | |
mriedem | https://review.openstack.org/#/c/545002/27 "This also allows making this filter advisory but not required, and supports multiple tenants per aggregate, unlike the original filter." | 16:22 |
mriedem | maybe that | 16:22 |
dansmith | nova with the ``filter_tenant_id`` key (optionally suffixed with any string for | 16:24 |
dansmith | multiple tenants, | 16:24 |
dansmith | https://review.openstack.org/#/c/557490/8/releasenotes/notes/tenant_aggregate_placement_filter-c2fed8889f43b6e3.yaml | 16:24 |
dansmith | in the reno not hte docs | 16:24 |
dansmith | tha's mah bad | 16:25 |
mriedem | k, i'll copy that | 16:25 |
*** etp has joined #openstack-nova | 16:27 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Calculate port_id rp_uuid mapping for binding https://review.openstack.org/616239 | 16:28 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Pass allocations and traits to neturonv2 api https://review.openstack.org/616240 | 16:28 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Send RP uuid in the port binding https://review.openstack.org/569459 | 16:28 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Test boot with more ports with bandwidth request https://review.openstack.org/573317 | 16:29 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Mention meta key suffix in tenant isolation with placement docs https://review.openstack.org/616991 | 16:33 |
mriedem | sean-k-mooney: is it just me or is NeutronLinuxBridgeInterfaceDriver completely replaced with os-vif now? | 16:38 |
sean-k-mooney | ill need to check but probably | 16:41 |
mriedem | i think it would only be used via the linuxnet_interface_driver config option, but i don't see anything with neutron in nova using the code path that hits that option | 16:42 |
mriedem | only the nova-network l3 stuff | 16:42 |
sean-k-mooney | we can likely kill it when we kill nova networks | 16:42 |
mriedem | sure but this is a neutron-specific driver | 16:42 |
sean-k-mooney | mriedem: ill look into it next week while and see what its actually used for, but you're right | 16:43 |
sean-k-mooney | * while ye are at the summit | 16:44 |
*** spatel has quit IRC | 16:44 | |
cfriesen | stephenfin: question about your commit https://review.openstack.org/#/c/526329 One of our guys says he ran into a scenario in pike where "image_chunks" itself was None due to things like firewall breakage or server-side problems. Does that get handled properly currently? | 16:45 |
openstackgerrit | Merged openstack/nova stable/pike: Fix the request context in ServiceFixture https://review.openstack.org/599839 | 16:46 |
openstackgerrit | Merged openstack/nova stable/pike: Add functional test for affinity with multiple cells https://review.openstack.org/599840 | 16:46 |
*** k_mouza has quit IRC | 16:46 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Delete NeutronLinuxBridgeInterfaceDriver https://review.openstack.org/616995 | 16:54 |
*** spatel has joined #openstack-nova | 16:54 | |
*** helenaAM has quit IRC | 16:59 | |
*** dtantsur is now known as dtantsur|afk | 17:00 | |
*** hongbin has quit IRC | 17:02 | |
cfriesen | mriedem: regarding the disk sector size issue...are you aware of any 8K sector disks or did you suggest it for future expansion? | 17:03 |
stephenfin | cfriesen: Based on that as-is, it would not | 17:03 |
stephenfin | cfriesen: Though I'd have expected to see an exception raised by the client, more so than anything else | 17:04 |
mriedem | cfriesen: was just suggesting based on what was noted in the bug report | 17:06 |
sean-k-mooney | im not aware of any 8k sector discs, but i believe we can also discover the sector size by querying the disk via sysfs, so we probably dont need to hardcode it | 17:07 |
sean-k-mooney | that said 4k and 512 are the most common | 17:07 |
cfriesen | I thought that 4K was still the highest supported physical sector size (since you'd want to be able to read a whole disk sector into a memory page) | 17:07 |
dansmith | not everyone uses 4k pages :) | 17:08 |
sean-k-mooney | power pc i think is 16k | 17:08 |
dansmith | I thought there were some SAN types that used larger sector sizes just because of the network optimization, | 17:08 |
dansmith | even if not backed by actual 8k | 17:08 |
cfriesen | as far as I know the block size can be different from the sector size | 17:09 |
dansmith | also, netapp I think uses some super odd sizes, even to the point of having weirdly low-level-formatted drives for them | 17:09 |
sean-k-mooney | i know some raid controls can be configured to exposed larger sector sizes but i dont know how common that is anymore | 17:09 |
dansmith | yep | 17:09 |
sean-k-mooney | mriedem: i think one of the things you suggested was just making a config option correct | 17:10 |
sean-k-mooney | something like directio_sector_sizes=512,4096 | 17:11 |
dansmith | ah, but it's hidden to the LUN: https://kb.netapp.com/app/answers/answer_view/a_id/1001353/~/how-can-the-bytes%2Fsector-be-changed-in-a-luns-geometry%3F- | 17:13 |
cfriesen | the goal here is to figure out if the filesystem supports O_DIRECT. according to the man page, this should be set to the logical block size of the underlying storage, which can be determined using the ioctl() BLKSSZGET operation or by calling "blockdev --getss" | 17:14 |
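The `BLKSSZGET` query cfriesen mentions can be sketched in a few lines (an illustrative sketch, not nova code; the ioctl constant is taken from Linux's `<linux/fs.h>` and the function name is made up):

```python
import fcntl
import struct

# _IO(0x12, 104) from <linux/fs.h>: logical sector size in bytes
BLKSSZGET = 0x1268


def logical_block_size(device_path):
    """Return the logical sector size of a block device, equivalent
    to running `blockdev --getss <device>`."""
    with open(device_path, 'rb') as dev:
        buf = fcntl.ioctl(dev.fileno(), BLKSSZGET, struct.pack('I', 0))
        return struct.unpack('I', buf)[0]
```

Calling this needs read access to the block device node (e.g. `/dev/sda`), which is why nova would have to do it via privsep.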
sean-k-mooney | cfriesen: i think part of the issue is that on older kernels (< 2.4) O_DIRECT required aligned access | 17:16 |
sean-k-mooney | but on bsd and newer linux kernels O_DIRECT did not require aligned access | 17:16 |
sean-k-mooney | cfriesen: im not actually sure if we need to do the alignment check we are doing anymore | 17:17 |
openstackgerrit | Merged openstack/nova stable/queens: fixtures: Track volume attachments within CinderFixtureNewAttachFlow https://review.openstack.org/612494 | 17:17 |
cfriesen | the man page says: Under Linux 2.4, transfer sizes, and the alignment of the user buffer | 17:17 |
cfriesen | and the file offset must all be multiples of the logical block size | 17:17 |
cfriesen | of the filesystem. Since Linux 2.6.0, alignment to the logical block | 17:17 |
cfriesen | size of the underlying storage (typically 512 bytes) suffices. | 17:17 |
cfriesen | oh, ick. sorry | 17:17 |
openstackgerrit | Merged openstack/nova stable/queens: Add regression test for bug#1784353 https://review.openstack.org/612495 | 17:17 |
sean-k-mooney | cfriesen: ok so we still need aligned access, but the function is meant to determine if directio is possible; what it's currently doing is determining if direct 512B-aligned access is possible, which is a different thing | 17:19 |
cfriesen | agreed. I think switching to 4K would cover 95% of the cases. | 17:19 |
cfriesen | doing it totally correctly would require querying the block size from the OS for that specific device | 17:19 |
mriedem | sean-k-mooney: i suggested a config option as an option because this sounds very hit or miss | 17:20 |
sean-k-mooney | well the backing store for libvirt instance is only going to be on one mountpoint | 17:21 |
mriedem | this is definitely not something i've got a lot of experience in though | 17:21 |
sean-k-mooney | presumable it will all have the same alignment/sector size so we could jsut have a single valus and defualt it to 512 and they could set it to 4k or 8k if they have something else | 17:22 |
sean-k-mooney | the other option is just super over align to like a 64K bondary | 17:23 |
sean-k-mooney | that said im sure someone will have a 128K lun now that i have said that | 17:23 |
cfriesen | so guaranteed setting it to 4K will work for both 4K and 512b disks, so I think 4K should be the default | 17:24 |
sean-k-mooney | cfriesen: its also the most common sector size on most new disks so ya that should work | 17:24 |
dansmith | for years now | 17:25 |
cfriesen | I'd be okay with a config option if someone has weird hardware | 17:25 |
sean-k-mooney | cfriesen: so you're going to submit a patch :) | 17:26 |
cfriesen | there's already a patch in progress | 17:26 |
cfriesen | by someone else | 17:26 |
mriedem | wee https://bugs.launchpad.net/nova/+bug/1798688 | 17:27 |
openstack | Launchpad bug 1798688 in OpenStack Compute (nova) "AllocationUpdateFailed_Remote: Failed to update allocations for consumer. Error: another process changed the consumer after the report client read the consumer state during the claim" [Undecided,Triaged] | 17:27 |
mriedem | looks like our scheduler allocation claim races have shot up since nov 4 | 17:27 |
cfriesen | mriedem: did you want to get the person to update the 4K patch to add a config option? or change the hardcoded number to 4K (which is still an improvement) and add the config option later if someone complains? | 17:28 |
dansmith | mriedem: what does that mean exactly? just the scheduler having to retry the allocation part? | 17:29 |
mriedem | dansmith: we already retry on PUT allocations in the scheduler | 17:29 |
mriedem | maybe the error message changed and we're not retrying properly now? | 17:29 |
mriedem | i haven't dug in yet | 17:29 |
mriedem | cfriesen: the patch already just changes 512 to 4k right? | 17:29 |
mriedem | cfriesen: i believe i just said we might want a 'fixes' release note for it | 17:30 |
mriedem | as a heads up | 17:30 |
melwitt | cfriesen: this is the method we have to checking for directio, if that's the same thing you mentioned earlier https://github.com/openstack/nova/blob/master/nova/privsep/utils.py#L34 | 17:30 |
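The check melwitt links has roughly the following shape (a simplified sketch of nova's supports-direct-io probe, not the actual code, with the 4096 default that is under discussion; the test filename is made up):

```python
import errno
import mmap
import os

ALIGN_SIZE = 4096  # proposed default; the current code hardcodes 512


def supports_direct_io(dirpath, align=ALIGN_SIZE):
    """Probe whether files under dirpath accept O_DIRECT writes by
    attempting a single aligned write.  EINVAL means no O_DIRECT
    support (or a larger required alignment); anything else is a
    real error and is re-raised."""
    testfile = os.path.join(dirpath, '.directio.test')
    fd = None
    try:
        fd = os.open(testfile, os.O_CREAT | os.O_WRONLY | os.O_DIRECT)
        # O_DIRECT requires the buffer, length and offset to be
        # aligned to the logical block size; an anonymous mmap gives
        # page-aligned memory, which satisfies any align <= page size.
        buf = mmap.mmap(-1, align)
        os.write(fd, buf)
        return True
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False
        raise
    finally:
        if fd is not None:
            os.close(fd)
        if os.path.exists(testfile):
            os.unlink(testfile)
```

As the man page note above says, a write aligned to 4096 is also aligned to 512, so raising the probe size only risks false negatives on storage that genuinely requires a still larger alignment.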
dansmith | mriedem: I know we do, I'm wondering if you mean it's just having to retry more lately or if it's failing | 17:30 |
mriedem | haven't dug into the logs yet | 17:30 |
mriedem | hopefully we log if we are retrying | 17:30 |
sean-k-mooney | mriedem: ya i was under the impression we had planned at least to retry in this case, which is why we have the generation on the resource providers in the first place | 17:30 |
*** markvoelker has quit IRC | 17:30 | |
dansmith | sean-k-mooney: we do retry | 17:31 |
mriedem | the scheduler doesn't do anything with generations for this as far as i know | 17:31 |
mriedem | just duplicated another bug in triage to this if someone is looking for work https://bugs.launchpad.net/nova/+bug/1783338 | 17:31 |
openstack | Launchpad bug 1783338 in OpenStack Compute (nova) "Unexpected exception in API method: ValueError: year is out of range" [Medium,Confirmed] - Assigned to Ghanshyam Mann (ghanshyammann) | 17:31 |
mriedem | something in the simple tenant usage code | 17:31 |
cfriesen | mriedem: in irc you were talking about a config option I thought. but yeah, I'd be cool with just a release note for now. | 17:32 |
*** imacdonn has joined #openstack-nova | 17:32 | |
mriedem | gmann: looks like https://bugs.launchpad.net/nova/+bug/1783338 was due to bad data in the db? you're probably traveling, but if you don't plan on handling this we should unassign you https://bugs.launchpad.net/nova/+bug/1783338 | 17:32 |
openstack | Launchpad bug 1783338 in OpenStack Compute (nova) "Unexpected exception in API method: ValueError: year is out of range" [Medium,Confirmed] - Assigned to Ghanshyam Mann (ghanshyammann) | 17:32 |
cfriesen | melwitt: yes, that's the one. there's a patch in review to change the 512 to 4096 in there. which is good, but maybe not sufficient for exotic hardware | 17:32 |
mriedem | i'm mriedem | 17:33 |
melwitt | I said something to him earlier | 17:33 |
melwitt | cfriesen: ok, cool. *looks for the patch* | 17:34 |
mriedem | oh missed that | 17:34 |
cfriesen | melwitt: https://review.openstack.org/#/c/616580 | 17:34 |
melwitt | probably because our nicks blend together. maybe I need to be jgwentworth all the time | 17:34 |
mriedem | dansmith: looks like, from the logs, that we're not retrying | 17:35 |
dansmith | mriedem: maybe something changed recently then? | 17:36 |
*** derekh has quit IRC | 17:37 | |
melwitt | cfriesen: ok, so looks like trying to decide which value to use for the check | 17:39 |
*** sahid has quit IRC | 17:40 | |
mriedem | dansmith: my guess would be https://review.openstack.org/#/c/583667/ | 17:40 |
mriedem | because the scheduler logs are saying we're doing a double up allocation | 17:40 |
mriedem | Nov 06 19:48:36.969356 ubuntu-xenial-inap-mtl01-0000379614 nova-scheduler[12154]: DEBUG nova.scheduler.client.report [None req-f266a0ff-2840-413d-9877-4500e61512f5 tempest-ServersNegativeTestJSON-477704048 tempest-ServersNegativeTestJSON-477704048] Doubling-up allocation_request for move operation. {{(pid=13677) _move_operation_alloc_request /opt/stack/nova/nova/scheduler/client/report.py:203}} | 17:40 |
mriedem | but in this test, we're just unshelving a shelved offloaded server | 17:41 |
mriedem | so that shouldn't really double up any allocations | 17:41 |
cfriesen | melwitt: basically, yes. 4096 would work for the vast majority of systems | 17:42 |
mriedem | Nov 06 19:48:36.969659 ubuntu-xenial-inap-mtl01-0000379614 nova-scheduler[12154]: DEBUG nova.scheduler.client.report [None req-f266a0ff-2840-413d-9877-4500e61512f5 tempest-ServersNegativeTestJSON-477704048 tempest-ServersNegativeTestJSON-477704048] New allocation_request containing both source and destination hosts in move operation: {'allocations': {u'3ceb7eab-549c-40ba-a70c-320822c310ab': {'resources': {u'VCPU': 2, u'MEMORY_MB': 128}}}} {{(pid=13677) _move_operation_alloc_request /opt/stack/nova/nova/scheduler/client/report.py:234}} | 17:42 |
dansmith | mriedem: it also touches the code near where we raise retry... | 17:42 |
mriedem | ^ is definitely wrong | 17:42 |
mriedem | there is only one provider in that log | 17:43 |
dansmith | mriedem: it seems to specifically exclude the consumer generation conflict from the case where we retry | 17:43 |
dansmith | mriedem: do you see "another process changed the consumer" in the log? | 17:46 |
mriedem | yes | 17:46 |
dansmith | then it's hitting that consumer case and not retrying | 17:46 |
mriedem | that's why we don't retry | 17:46 |
dansmith | yeah | 17:46 |
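The non-retried path being described can be sketched as follows (a hypothetical illustration of the control flow, not nova's actual report client; `session`, the string matching, and the backoff are all made up for the sketch):

```python
import time

CONCURRENT_UPDATE = 'placement.concurrent_update'


def put_allocations(session, consumer_uuid, payload, retries=3):
    """PUT allocations, retrying only on provider-generation
    conflicts.  A consumer-generation conflict ("another process
    changed the consumer") falls through without a retry, which is
    the behavior being observed in the scheduler logs."""
    resp = None
    for attempt in range(retries):
        resp = session.put('/allocations/%s' % consumer_uuid,
                           json=payload)
        if resp.status_code != 409:
            return resp
        if 'another process changed the consumer' in resp.text:
            return resp  # consumer generation conflict: not retried
        if CONCURRENT_UPDATE in resp.text:
            time.sleep(0.1 * (attempt + 1))  # provider conflict: retry
            continue
        return resp  # some other 409: give up
    return resp
```

The design question in the conversation is exactly whether the consumer-conflict branch should instead re-read the consumer generation and retry like the provider branch does.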
mriedem | i also don't know why it thinks we're starting with existing allocations for a shelved offloaded server | 17:47 |
dansmith | and that's causing it to try to double? | 17:47 |
mriedem | well, it goes into _move_operation_alloc_request but doesn't actually double anything | 17:48 |
mriedem | Nov 06 19:48:36.969659 ubuntu-xenial-inap-mtl01-0000379614 nova-scheduler[12154]: DEBUG nova.scheduler.client.report [None req-f266a0ff-2840-413d-9877-4500e61512f5 tempest-ServersNegativeTestJSON-477704048 tempest-ServersNegativeTestJSON-477704048] New allocation_request containing both source and destination hosts in move operation: {'allocations': {u'3ceb7eab-549c-40ba-a70c-320822c310ab': {'resources': {u'VCPU': 2, u'MEMORY_MB': 128}}}} {{(pid=13677) _move_operation_alloc_request /opt/stack/nova/nova/scheduler/client/report.py:234}} | 17:48 |
mriedem | there is only one provider in that body | 17:48 |
mriedem | this is the error from placement | 17:51 |
mriedem | Nov 06 19:48:37.013780 ubuntu-xenial-inap-mtl01-0000379614 nova-scheduler[12154]: WARNING nova.scheduler.client.report [None req-f266a0ff-2840-413d-9877-4500e61512f5 tempest-ServersNegativeTestJSON-477704048 tempest-ServersNegativeTestJSON-477704048] Failed to save allocation for 6665f00a-dcf1-4286-b075-d7dcd7c37487. Got HTTP 409: {"errors": [{"status": 409, "request_id": "req-c9ba6cbd-3b6e-4e5d-b550-9588be8a49d2", "code": "placement.concurrent_update", "detail": "There was a conflict when trying to complete your request.\n\n consumer generation conflict - expected null but got 1 ", "title": "Conflict"}]} | 17:51 |
*** ralonsoh has quit IRC | 17:51 | |
*** ttsiouts has quit IRC | 17:51 | |
mriedem | idk wtf is going on, but i see 3 different PUT allocations in the placement logs for that consumer | 17:58 |
mriedem | first is probably for the initial scheduler, and then we offload and remove allocations | 17:58 |
mriedem | 2nd is for the unshelve | 17:59 |
*** ivve has joined #openstack-nova | 18:00 | |
mriedem | aha | 18:07 |
mriedem | the allocation delete on unshelve changed with this patch https://review.openstack.org/#/c/591597/ | 18:07 |
mriedem | so we no longer actually delete allocations, we PUT {} | 18:07 |
sean-k-mooney | quick question. in what cases does nova update the network info cache? | 18:07 |
*** Swami has joined #openstack-nova | 18:07 | |
sean-k-mooney | i know it does it in response to neutron notifications. there is also a periodic heal task right | 18:08 |
sean-k-mooney | is that it? | 18:08 |
mriedem | in all cases | 18:08 |
mriedem | attach vifs | 18:08 |
mriedem | etc | 18:08 |
mriedem | dansmith: yeah so there are 3 PUTs for allocations, 1st for initial schedule, 2nd for shelve offload (PUT /allocations with {}) and then the 3rd is scheduling during unshelve | 18:08 |
mriedem | since the allocations aren't deleted on shelve offload, the consumer must persist in placement with its existing consumer generation | 18:09 |
mriedem | which the scheduler in the 3rd PUT doesn't account for | 18:09 |
dansmith | okay | 18:09 |
dansmith | the scheduler assumes that the consumer is gone? | 18:09 |
dansmith | I'm surprised it would care, | 18:09 |
dansmith | because it doesn't know if we're doing an initial boot or a move right? | 18:10 |
mriedem | it would know if we're doing a move if the consumer already has allocations elsewhere | 18:10 |
dansmith | right, but otherwise it doesn't, | 18:10 |
dansmith | and in this case there are no remaining allocations right? | 18:11 |
mriedem | well, | 18:11 |
dansmith | or are you saying it assumes that if you have no allocations the consumer can't exist? | 18:11 |
*** sridharg has quit IRC | 18:11 | |
mriedem | that's not what the scheduler thinks, because it goes down that _move_operation_alloc_request path | 18:11 |
mriedem | maybe the tempest test isn't really waiting for the instance to be fully shelved offloaded before it unshelves | 18:11 |
mriedem | ah indeed, | 18:11 |
mriedem | we set the instance vm_state to SHELVED_OFFLOADED *before* we remove allocations | 18:12 |
dansmith | I guess I'm still surprised it cares | 18:12 |
mriedem | which is definitely a race | 18:12 |
*** zul has quit IRC | 18:12 | |
dansmith | why? | 18:16 |
dansmith | just because it signals to tempest that it's done? | 18:16 |
dansmith | if we didn't, | 18:17 |
dansmith | and we crashed right between deleting the allocations and marking it as offloaded we'd have lost some information I would think | 18:17 |
mriedem | well, this also doesn't seem right | 18:17 |
mriedem | https://review.openstack.org/#/c/591597/8/nova/scheduler/client/report.py@2091 | 18:17 |
mriedem | "# removing all resources from the allocation will auto delete the | 18:17 |
mriedem | # consumer in placement" | 18:17 |
dansmith | I guess maybe your point is that it's a race between starting the unshelve and there still being allocations... | 18:18 |
mriedem | maybe that is happening, i'm not sure | 18:18 |
mriedem | correct | 18:18 |
mriedem | b/c i'm seeing this being true during unshelve https://github.com/openstack/nova/blob/e27905f482ba26d2bbf3ae5d948dee37523042d5/nova/scheduler/client/report.py#L1824 | 18:18 |
dansmith | not sure that's better either way | 18:18 |
mriedem | which shouldn't be the case | 18:18 |
*** k_mouza has joined #openstack-nova | 18:19 | |
dansmith | it's certainly possible that the move to PUT{} from DELETE didn't bring over some "and delete the consumer" part | 18:19 |
dansmith | but again, I'm not sure why it should matter to the scheduler that it exists | 18:20 |
dansmith | although... | 18:20 |
dansmith | we have no api for looking at the consumer to get the generation if it already exists, IIRC | 18:20 |
dansmith | so maybe without seeing an allocation, and not being able to see the consumer directly, we have no alternative? | 18:20 |
mriedem | right, it's supposed to come back on the GET /allocations/{consumer} call | 18:20 |
dansmith | I expect jaypipes to pop in here any second and say "ah hah!" | 18:20 |
mriedem | jaypipes is busy chefing it up | 18:21 |
dansmith | his fave | 18:21 |
mriedem | and getting ready to sleep for a week while the rest of us are in berlin | 18:21 |
dansmith | lucky bastard | 18:21 |
dansmith | I'm going to be landing pretty soon, FYI | 18:21 |
mriedem | i'm overdue for lunch as well | 18:22 |
mriedem | so, can't claim_resources in the scheduler still just retry if it hits that consumer generation conflict? | 18:22 |
dansmith | well, | 18:22 |
dansmith | not if it doesn't know what the generation is | 18:23 |
dansmith | that's what I was saying.. it might not be able to find out what it is, | 18:23 |
dansmith | with no consumer api and no existing allocation to look at | 18:23 |
mriedem | i would expect GET /allocations/{consumer_uuid} to return the consumer generation | 18:23 |
mriedem | even if allocations are {} | 18:23 |
dansmith | with no alloc records? | 18:24 |
dansmith | I dunno | 18:24 |
mriedem | but i guess i'd have to dig into the placement code | 18:24 |
dansmith | I would expect that code returns 404 if none come back, | 18:24 |
*** k_mouza has quit IRC | 18:24 | |
mriedem | should probably also log in placement when the consumer is deleted b/c allocations went to 0 | 18:24 |
dansmith | because it would only get the consumer through the join, or afterwards I would expect | 18:24 |
mriedem | no it doesn't 404, you get {"allocations": {}} | 18:24 |
mriedem | if there are no allocations for the consumer | 18:24 |
dansmith | oh? | 18:24 |
mriedem | yeah it's confusing | 18:24 |
dansmith | that seems supremely weird to me, but okay | 18:25 |
mriedem | what does taylor think about all this? | 18:25 |
dansmith | she's busy with her own work | 18:26 |
jaypipes | mriedem: fuck chef. fuck ansible. fuck docker. it's all a bunch of complete assbaggery. | 18:26 |
dansmith | my little mobile wifi router lets us share the same crappy airline wifi, so after that came online, I might as well not be sitting next to her | 18:26 |
mriedem | jaypipes: but salt?! | 18:26 |
* jaypipes reads back to see something about consumers. | 18:26 | |
dansmith | jaypipes: you're gonna love it | 18:27 |
sean-k-mooney | mriedem: i think jaypipes has enough salt in his life right now | 18:27 |
mriedem | jaypipes: notes are in https://bugs.launchpad.net/nova/+bug/1798688 | 18:27 |
openstack | Launchpad bug 1798688 in OpenStack Compute (nova) "AllocationUpdateFailed_Remote: Failed to update allocations for consumer. Error: another process changed the consumer after the report client read the consumer state during the claim" [Undecided,Triaged] | 18:27 |
dansmith | jaypipes: question.. should placement have a consumers endpoint? | 18:27 |
mriedem | looks like we're racing between shelve offload (allocation removal) and unshelve (put new allocations) and hitting a consumer generation conflict | 18:27 |
sean-k-mooney | jaypipes: at least you don't have to use TripleO where we use yaml to drive heat to drive puppet to drive ansible to deploy docker containers ... | 18:28 |
sean-k-mooney | or maybe the ansible drives puppet its hard to keep track of | 18:28 |
dansmith | sean-k-mooney: it drives ... me insane | 18:29 |
mriedem | so, i see in placement handler code where it handles the "allocations are being removed on PUT" case, and it ensures a consumer exists, but then i don't see where that consumer is deleted | 18:30 |
mriedem | like the note in the compute code | 18:30 |
mriedem | https://github.com/openstack/nova/blob/e27905f482ba26d2bbf3ae5d948dee37523042d5/nova/api/openstack/placement/handlers/allocation.py#L404 | 18:32 |
mriedem | oh i guess it should happen here https://github.com/openstack/nova/blob/e27905f482ba26d2bbf3ae5d948dee37523042d5/nova/api/openstack/placement/objects/resource_provider.py#L2099 | 18:34 |
mriedem | https://github.com/openstack/nova/blob/e27905f482ba26d2bbf3ae5d948dee37523042d5/nova/api/openstack/placement/objects/consumer.py#L70 might be broken | 18:36 |
mriedem | in the same way that ensure was broken https://github.com/openstack/nova/commit/730936e535e67127c76d4f27649a16d8cf05efc9#diff-fcca11e34c1b5fce52a4ddbc418aa2d5 | 18:36 |
openstackgerrit | Merged openstack/nova stable/queens: conductor: Recreate volume attachments during a reschedule https://review.openstack.org/612496 | 18:37 |
openstackgerrit | Merged openstack/nova master: Update the description to make it more accuracy https://review.openstack.org/615362 | 18:37 |
mriedem | i can't really tell where delete_consumers_if_no_allocations is tested though... | 18:39 |
mriedem | some gabbit i'm sure | 18:40 |
dansmith | seems easily unit testable, | 18:41 |
dansmith | and I definitely can't look at that and tell that it works | 18:41 |
dansmith | since it's joining on consume id and asserting that it's none in one case | 18:41 |
mriedem | yeah i can't do anything with the sql w/o testing it | 18:42 |
dansmith | time to pack up, back later | 18:42 |
mriedem | oh i guess DeleteConsumerIfNoAllocsTestCase | 18:42 |
mriedem | i'll tweak that after lunch | 18:43 |
*** bigdogstl has joined #openstack-nova | 18:50 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add debug logs when doubling-up allocations during scheduling https://review.openstack.org/617016 | 18:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Log consumers_to_check when calling delete_consumers_if_no_allocations https://review.openstack.org/617017 | 18:53 |
mriedem | jaypipes: debug logging needed for this gate bug ^ | 18:53 |
jaypipes | still reading back, sorry | 18:54 |
*** mriedem is now known as mriedem_hangry | 18:55 | |
jaypipes | dansmith, mriedem_hangry: there is a check at the end of the server-side of PUT /allocations that will auto-delete the consumer record if there are no allocations still referring to it. | 19:04 |
jaypipes | dansmith: and yes, I've said for a long time that we should have a GET /consumers endpoint. There are placement devs that vehemently disagreed with that. | 19:07 |
*** bigdogstl has quit IRC | 19:08 | |
*** bigdogstl has joined #openstack-nova | 19:12 | |
*** ivve has quit IRC | 19:12 | |
*** janki has quit IRC | 19:22 | |
*** zigo has quit IRC | 19:25 | |
*** bigdogstl has quit IRC | 19:26 | |
sean-k-mooney | mriedem_hangry: would you have any objection to backporting https://review.openstack.org/#/c/591607/9 to newton? im pretty sure we have a customer that is hitting this as they reported instances restarting after a host reboot missing interfaces that show up in nova interface-list | 19:27 |
*** bigdogstl has joined #openstack-nova | 19:30 | |
sean-k-mooney | mriedem_hangry: actually i just realised newton is way older than i remembered and is eol | 19:34 |
*** mriedem_hangry is now known as mriedem | 19:34 | |
mriedem | jaypipes: yup found that, and the related functional test | 19:34 |
mriedem | sean-k-mooney: not to mention that isn't even approved on master | 19:35 |
*** bigdogstl has quit IRC | 19:35 | |
sean-k-mooney | mriedem: yes :) im aware. i was more asking do you think this is something that can be backported in general upstream | 19:35 |
sean-k-mooney | once it lands in master | 19:36 |
mriedem | idk | 19:36 |
mriedem | it seems to be pretty controversial | 19:37 |
mriedem | like my david bowie costume on halloween | 19:37 |
sean-k-mooney | im waiting on more logs for the downstream bug to confirm this is actully the issue | 19:38 |
sean-k-mooney | mriedem: ill try to review and digest these changes more on monday | 19:40 |
sean-k-mooney | mriedem: are you flying out to berlin today/tomorrow? | 19:41 |
mriedem | the problem the huawei ops team ran into was policy changed on the neutron side which started returning an empty list of ports, which was then saved into the info cache in the nova db, | 19:41 |
mriedem | and the heal periodic relies on the info cache rather than the source of truth to fix the cache | 19:41 |
mriedem | tonight | 19:41 |
mriedem | there are other ways to simply rebuild the cache if that's what is needed, e.g. https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-refresh-network | 19:41 |
sean-k-mooney | right. i think we discussed this in the past at some point too it feels familiar but i have not reviewed this before | 19:42 |
mriedem | it came up at the ptg i think | 19:42 |
mriedem | and the public cloud sig has brought it up before (OVH obviously) | 19:42 |
sean-k-mooney | mriedem: perhaps. oh so we can force a rebuild via the nova client today? | 19:42 |
* sean-k-mooney clicks | 19:42 | |
mriedem | it doesn't rebuild from neutron | 19:43 |
mriedem | i don't think anyway | 19:43 |
mriedem | it just sends a network-changed event to the compute | 19:43 |
sean-k-mooney | oh it rebuilds from the vif table in the nova db? | 19:43 |
mriedem | yes | 19:43 |
mriedem | well, | 19:43 |
mriedem | from the info cache | 19:43 |
mriedem | iow, | 19:44 |
sean-k-mooney | ok but if the info cache got corrupted when the host rebooted you're still stuck | 19:44 |
mriedem | the network-changed event and _heal_instance_info_cache periodic do the same thing today | 19:44 |
mriedem | correct | 19:44 |
mriedem | hence the reason for making the periodic actually "heal" from the source of truth, which is neutron | 19:44 |
mriedem | and not our potentially corrupted cache | 19:44 |
sean-k-mooney | so i suggested they detach the missing interfaces and reattach them as a workaround for now to try and force nova and neutron to resync | 19:45 |
sean-k-mooney | i think that would work in this case but its not ideal | 19:45 |
sean-k-mooney | mriedem: anyway thanks. i was scratching my head for the last day or so trying to parse what was going on from incomplete logs but im 98% sure this is it. | 19:47 |
sean-k-mooney | mriedem: have a safe trip. | 19:47 |
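The distinction mriedem draws above (the periodic "healing" from the cache itself versus healing from neutron, the source of truth) can be sketched with a toy example. All helper names here are invented for illustration; this is not nova's code.

```python
# Toy sketch of the info-cache discussion above: rebuilding from the cache
# re-saves corruption, while rebuilding from neutron's port list repairs it.
# Hypothetical helpers -- not nova's actual _heal_instance_info_cache.

def heal_from_cache(cached_ports):
    # What network-changed / the old periodic effectively do: trust the cache.
    return list(cached_ports)

def heal_from_neutron(neutron_ports):
    # The proposed behavior: rebuild from neutron's view of the ports.
    return [port['id'] for port in neutron_ports]

# A cache emptied by a bad policy response, as in the huawei ops case:
corrupted_cache = []
neutron_ports = [{'id': 'port-1'}, {'id': 'port-2'}]

print(heal_from_cache(corrupted_cache))   # [] -- corruption persists
print(heal_from_neutron(neutron_ports))   # ['port-1', 'port-2']
```

This is why detaching and reattaching the interfaces works as a workaround: it forces fresh data from neutron into the cache by a different path.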
*** bigdogstl has joined #openstack-nova | 19:53 | |
*** bigdogstl has quit IRC | 19:57 | |
*** sambetts|afk has quit IRC | 20:00 | |
*** sambetts_ has joined #openstack-nova | 20:02 | |
*** efried is now known as fried_rice | 20:09 | |
*** munimeha1 has joined #openstack-nova | 20:19 | |
*** erlon has quit IRC | 20:47 | |
mriedem | thanks | 20:48 |
*** eharney has quit IRC | 20:53 | |
dansmith | mriedem: so you found a test that confirmed the behavior of that thing? | 20:58 |
dansmith | mriedem: that deletes the consumer? | 20:58 |
*** bigdogstl has joined #openstack-nova | 20:59 | |
mriedem | DeleteConsumerIfNoAllocsTestCase is the functional test that covers that case, | 21:02 |
mriedem | and it looks like a correct test to me, | 21:02 |
mriedem | creates 2 consumers each with 2 allocations on different resource classes, | 21:03 |
mriedem | clears the allocations for one of them and asserts the consumer is gone | 21:03 |
mriedem | i think we're just hitting a race with the shelve offloaded status change before we cleanup the allocations | 21:03 |
mriedem | but i've posted a couple of patches to add debug logs to help determine if that's the case | 21:03 |
mriedem | https://review.openstack.org/617016 | 21:03 |
dansmith | okay I'm not sure how we could race and see no allocations but a consumer and get that generation conflict | 21:04 |
dansmith | it'd be one thing if we thought the consumer was there and then disappeared out from under us | 21:05 |
*** lbragstad has quit IRC | 21:08 | |
*** bigdogstl has quit IRC | 21:09 | |
*** bigdogstl has joined #openstack-nova | 21:13 | |
mriedem | during unshelve the scheduler does see allocations | 21:17 |
mriedem | and it thinks we're doing a move | 21:17 |
dansmith | okay I thought you pasted a line showing that there was only one allocation going back to placement | 21:17 |
mriedem | there are 3 PUTs for allocations | 21:18 |
mriedem | 1. create the server - initial | 21:18 |
mriedem | 2. shelve offload - wipe the allocations to {} - which should delete the consumer | 21:18 |
mriedem | 3. unshelve - scheduler claims resources with the wrong consumer generation | 21:18 |
mriedem | and when 3 happens, the scheduler gets allocations for the consumer and they are there, | 18:18 |
dansmith | ...right | 21:18 |
*** bigdogstl has quit IRC | 21:18 | |
mriedem | so it uses the consumer generation (1) from those allocations | 21:18 |
mriedem | then i think what happens is, | 21:19 |
dansmith | oh, so it passes generation=1 instead of generation=0, meaning new consumer? | 21:19 |
mriedem | placement recreates the consumer which will have generation null | 21:19 |
mriedem | yes | 21:19 |
dansmith | okay I see | 21:19 |
dansmith | I thought you were seeing consumer generation was null or zero or whatever in the third put, but still getting a conflict | 21:19 |
dansmith | but that makes sense now | 21:19 |
mriedem | Nov 06 19:48:37.013780 ubuntu-xenial-inap-mtl01-0000379614 nova-scheduler[12154]: WARNING nova.scheduler.client.report [None req-f266a0ff-2840-413d-9877-4500e61512f5 tempest-ServersNegativeTestJSON-477704048 tempest-ServersNegativeTestJSON-477704048] Failed to save allocation for 6665f00a-dcf1-4286-b075-d7dcd7c37487. Got HTTP 409: {"errors": [{"status": 409, "request_id": "req-c9ba6cbd-3b6e-4e5d-b550-9588be8a49d2", "code": "placement.concurrent_update", "detail": "There was a conflict when trying to complete your request.\n\n consumer generation conflict - expected null but got 1 ", "title": "Conflict"}]} | 21:20 |
mriedem | consumer generation conflict - expected null but got 1 | 21:20 |
mriedem | yup - so new consumer but we're passing a generation of 1 | 21:20 |
mriedem | from the old, now deleted consumer | 21:20 |
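The three-PUT race mriedem lays out can be reproduced with a toy in-memory model of placement's consumer-generation bookkeeping. Everything here (class, method names, the generation arithmetic) is invented for illustration and is not placement's actual API or implementation; it only mirrors the behavior described in the log.

```python
# Toy model of the race: 1) initial schedule creates the consumer,
# 2) shelve offload PUTs {} which auto-deletes the consumer,
# 3) unshelve claims with the stale generation read before the delete.

class ConflictError(Exception):
    pass

class FakePlacement:
    """Stand-in for placement's consumer-generation checks (illustrative)."""
    def __init__(self):
        self.consumers = {}  # uuid -> {'generation': int, 'allocations': {}}

    def put_allocations(self, uuid, allocations, consumer_generation):
        consumer = self.consumers.get(uuid)
        expected = consumer['generation'] if consumer else None
        if consumer_generation != expected:
            raise ConflictError(
                'consumer generation conflict - expected %s but got %s'
                % (expected, consumer_generation))
        if not allocations:
            # PUT {} removes all resources, which auto-deletes the consumer
            # (the delete_consumers_if_no_allocations behavior).
            self.consumers.pop(uuid, None)
            return
        gen = consumer['generation'] + 1 if consumer else 1
        self.consumers[uuid] = {'generation': gen, 'allocations': allocations}

    def get_allocations(self, uuid):
        consumer = self.consumers.get(uuid)
        if not consumer:
            # No 404: an unknown consumer just returns empty allocations.
            return {'allocations': {}}
        return {'allocations': consumer['allocations'],
                'consumer_generation': consumer['generation']}


placement = FakePlacement()
# 1. initial schedule: new consumer, generation None
placement.put_allocations('inst', {'rp1': {'VCPU': 2}}, None)
# unshelve's scheduler reads the allocations *before* offload finishes (the race)
stale = placement.get_allocations('inst')['consumer_generation']
# 2. shelve offload: PUT {} wipes the allocations and deletes the consumer
placement.put_allocations('inst', {}, stale)
# 3. unshelve: claim with the stale generation -> the 409 from the log
try:
    placement.put_allocations('inst', {'rp2': {'VCPU': 2}}, stale)
except ConflictError as exc:
    print(exc)  # consumer generation conflict - expected None but got 1
```

The last line reproduces the "expected null but got 1" conflict: the consumer was recreated-from-scratch territory (expected generation None) while the scheduler still held generation 1 from the deleted consumer.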
dansmith | cool | 21:21 |
mriedem | so, | 21:21 |
dansmith | I wish there was something better to communicate that, but any time we get "expected null" in that case, we should be able to re-try but as a non-move sort of thing | 21:21 |
mriedem | we can paper over this by deleting the allocations before marking the instance as shelved offloaded, but that's whack-a-moley | 21:21 |
dansmith | yeah | 21:21 |
mriedem | right we need to retry from claim_resources but i'm not sure what's the best way to do that | 21:22 |
dansmith | and like I said, I think it's not really any better, it just changes the problem | 21:22 |
dansmith | yeah | 21:22 |
mriedem | if we do retry that method, the next get for allocations will see there are none and we should be good | 21:22 |
dansmith | right | 21:22 |
mriedem | b/c we'll pass consumer_generation=None | 21:22 |
mriedem | i think i know what we can do | 21:22 |
mriedem | if we hit | 21:23 |
mriedem | if 'consumer generation conflict' in err['detail']: | 21:23 |
mriedem | we get the allocs again, and if empty, | 21:23 |
mriedem | we retry | 21:23 |
mriedem | easy peasy | 21:23 |
mriedem | it's a double get but meh? | 21:23 |
dansmith | yeah, it's just that it takes us an extra op, | 21:23 |
dansmith | when "expected null" should be enough | 21:23 |
dansmith | yeah | 21:23 |
mriedem | i can parse that out of the message if we want.. | 21:23 |
dansmith | I know we can, it's just icky and unfortunate | 21:23 |
mriedem | with a TODO for more granular error codes later | 21:23 |
dansmith | like all the other cases in there | 21:24 |
mriedem | right | 21:24 |
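The retry idea mriedem settles on (conflict, re-GET, retry as a new consumer if the allocations are empty) can be sketched as follows. This is a hedged illustration, not the actual patch under review; the function and client names are invented.

```python
# Sketch of the retry discussed above: on a consumer generation conflict,
# re-read the allocations; if they are now empty the consumer was
# auto-deleted, so retry the claim once with consumer_generation=None.

class ConflictError(Exception):
    def __init__(self, detail):
        super().__init__(detail)
        self.detail = detail


def claim_resources(client, consumer_uuid, allocations, generation):
    try:
        client.put_allocations(consumer_uuid, allocations, generation)
        return True
    except ConflictError as err:
        # TODO: match a granular error code instead of parsing the message.
        if 'consumer generation conflict' not in err.detail:
            raise
        # The "double get" -- only paid on the (rare) conflict path.
        if client.get_allocations(consumer_uuid)['allocations']:
            # Another process really holds allocations: genuine conflict.
            return False
        # Consumer is gone: retry once as a brand-new consumer.
        client.put_allocations(consumer_uuid, allocations, None)
        return True


class FakeClient:
    """Illustrative stand-in: the consumer was just deleted by shelve
    offload's PUT {}, but the caller still holds stale generation 1."""
    def __init__(self):
        self.calls = []
        self.generation = None  # no consumer record exists

    def put_allocations(self, uuid, allocations, generation):
        self.calls.append(generation)
        if generation != self.generation:
            raise ConflictError(
                'consumer generation conflict - expected %s but got %s'
                % (self.generation, generation))
        self.generation = 1

    def get_allocations(self, uuid):
        return {'allocations': {}}


client = FakeClient()
result = claim_resources(client, 'inst', {'rp': {'VCPU': 1}}, 1)
print(result, client.calls)  # True [1, None]
```

The first PUT fails with the stale generation, the empty GET proves the consumer is gone, and the second PUT succeeds as a new consumer.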
mriedem | i haven't even started packing yet | 21:24 |
mriedem | laura is starting to check in on me every 30 minutes | 21:24 |
mriedem | "this is what i'm wearing all week! god!" | 21:24 |
dansmith | hah | 21:24 |
mriedem | plus my mother in law is here, | 21:25 |
mriedem | so lots of teenage angst memories coming back right now | 21:25 |
mriedem | the coffee and metallica doesn't help | 21:25 |
dansmith | isn't that a good reason to pack and get out? | 21:25 |
mriedem | i've just holed up in my office | 21:26 |
mriedem | i'll crank out a patch for this bug and be off | 21:26 |
*** eharney has joined #openstack-nova | 21:43 | |
openstackgerrit | Jack Ding proposed openstack/nova master: Use virt.images.convert_image for qemu-img convert https://review.openstack.org/616692 | 21:45 |
fried_rice | mriedem: Sanity check, please. The compute manager has a report client via the scheduler client, that's *not* the same as the report client the resource tracker has. | 21:52 |
fried_rice | which means my current SIGHUP doesn't do shit to the RT's cache | 21:53 |
fried_rice | I need to make the report client a singleton. | 21:53 |
mriedem | correct | 21:53 |
mriedem | we have report clients all over the place | 21:53 |
mriedem | api, conductor, scheduler | 21:53 |
mriedem | etc | 21:53 |
fried_rice | that's a scroo | 21:54 |
fried_rice | mriedem: So - make the report client a singleton (per process), or just diddle the compute manager's reset to hit the rt's reportclient instead. | 21:54 |
fried_rice | f, without knowing what the various ones in the compute manager are used for, is it really safe to make them a singleton? | 21:55 |
mriedem | the latter would be a smaller blast area | 21:56 |
fried_rice | I think I may have actually done this to myself, by removing that LazyLoader | 21:57 |
fried_rice | I suspect that guy was incidentally singleton-ing. | 21:57 |
* fried_rice looks... | 21:57 | |
fried_rice | nah, that should still have been creating separate instances per scheduler client. | 21:58 |
dansmith | fried_rice: yeah I thought that was making it a singleton a month ago when I was looking at it | 22:03 |
dansmith | a lot of stuff in nova used to be lazy loaded because.. um, terrible reasons | 22:04 |
dansmith | lazy loaded or pluggable | 22:04 |
fried_rice | dansmith: In this case it was supposedly because of a circular import. Whether that was ever really an issue, it isn't now, so I ripped it out. But having just looked, I still don't think it was making the report client a singleton. Care to confirm? | 22:04 |
dansmith | I think I confirmed that a month ago when I was looking into a seemingly recent memory leak | 22:05 |
dansmith | so I think it's fine that it's gone | 22:05 |
dansmith | I agree that randomly making it a singleton now should be done with care | 22:06 |
dansmith | but I don't really know how to convince myself that it's okay once its done, tbh | 22:06 |
openstackgerrit | Jack Ding proposed openstack/nova master: Use virt.images.convert_image for qemu-img convert https://review.openstack.org/616692 | 22:06 |
fried_rice | well, using the RT's report client fixed the problem I was having. So maybe I pretend singleton was never suggested. | 22:07 |
fried_rice | oh, f, this is gonna break all over the place. I can't see a reason why the compute manager would possibly want or need to use separate report clients. I'd really like to put 'em together. If not making it a singleton, at least using only one of them from the compute manager. | 22:10 |
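A minimal sketch of the per-process singleton option fried_rice floats, assuming a thread-safe accessor is acceptable. The names are invented and this is not nova's implementation; it just shows how one shared report client (and thus one provider-tree cache) per process could work.

```python
# Illustrative per-process singleton report client, so the compute manager
# and the resource tracker see the same cache and a SIGHUP-triggered cache
# clear is visible to both. Hypothetical names throughout.
import threading

class ReportClient:
    def __init__(self):
        # Stands in for the provider tree / association caches being discussed.
        self.provider_tree_cache = {}

    def clear_cache(self):
        self.provider_tree_cache.clear()

_lock = threading.Lock()
_instance = None

def get_report_client():
    """Return the process-wide ReportClient, creating it on first use."""
    global _instance
    if _instance is None:
        with _lock:
            if _instance is None:  # double-checked under the lock
                _instance = ReportClient()
    return _instance

# Every caller gets the same object, so resetting it resets everyone's view.
assert get_report_client() is get_report_client()
```

The alternative with the smaller blast radius, as mriedem notes, is to leave the many report clients alone and just have the compute manager's reset poke the RT's client directly.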
* dansmith ->plane | 22:10 | |
fried_rice | o/ | 22:10 |
*** eharney has quit IRC | 22:15 | |
mriedem | fried_rice: the compute manager / RT using the same report client is probably fine, | 22:15 |
mriedem | a lot of that compute manager / RT code was cleaned up way back in ocata i think when jaypipes made the RT a singleton that tracked multiple compute nodes, | 22:16 |
fried_rice | mriedem: Ima put up an independent patch for that | 22:16 |
mriedem | whereas before it was 1 RT per compute node | 22:16 |
fried_rice | ah | 22:16 |
mriedem | they are very tightly coupled, like how the compute manager passes the virt driver into the RT | 22:16 |
openstackgerrit | Jack Ding proposed openstack/nova master: Use virt.images.convert_image for qemu-img convert https://review.openstack.org/616692 | 22:16 |
openstackgerrit | Eric Fried proposed openstack/nova master: SIGHUP n-cpu to refresh provider tree cache https://review.openstack.org/615646 | 22:18 |
openstackgerrit | Eric Fried proposed openstack/nova master: Reduce calls to placement from _ensure https://review.openstack.org/615677 | 22:18 |
openstackgerrit | Eric Fried proposed openstack/nova master: Consolidate inventory refresh https://review.openstack.org/615695 | 22:18 |
openstackgerrit | Eric Fried proposed openstack/nova master: Commonize _update code path https://review.openstack.org/615705 | 22:18 |
openstackgerrit | Eric Fried proposed openstack/nova master: Turn off rp association refresh in nova-next https://review.openstack.org/616033 | 22:18 |
fried_rice | let's see how that goes | 22:18 |
fried_rice | mriedem: Oh, my removal of lazyload probably reinstated "lockutils spam" mentioned in nova/compute/api.py@257 | 22:21 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Retry on consumer delete race in claim_resources https://review.openstack.org/617040 | 22:25 |
mriedem | dansmith: jaypipes: fried_rice: ^ bingo bango | 22:25 |
mriedem | gibi: you too ^ | 22:25 |
mriedem | the commit message is longer than the code | 22:25 |
mriedem | and with that i'm off | 22:29 |
*** mriedem has quit IRC | 22:29 | |
*** bigdogstl has joined #openstack-nova | 22:51 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Rip the report client out of SchedulerClient https://review.openstack.org/617042 | 22:52 |
openstackgerrit | Eric Fried proposed openstack/nova master: Rip the report client out of SchedulerClient https://review.openstack.org/617042 | 22:54 |
*** spatel has quit IRC | 22:56 | |
*** betherly has joined #openstack-nova | 23:02 | |
*** bigdogstl has quit IRC | 23:03 | |
*** bigdogstl has joined #openstack-nova | 23:05 | |
*** betherly has quit IRC | 23:06 | |
*** bigdogstl has quit IRC | 23:10 | |
*** bigdogstl has joined #openstack-nova | 23:11 | |
*** munimeha1 has quit IRC | 23:12 | |
openstackgerrit | Vlad Gusev proposed openstack/nova stable/pike: libvirt: Reduce calls to qemu-img during update_available_resource https://review.openstack.org/604039 | 23:17 |
*** elod has quit IRC | 23:17 | |
openstackgerrit | Merged openstack/nova stable/pike: Make scheduler.utils.setup_instance_group query all cells https://review.openstack.org/599841 | 23:18 |
openstackgerrit | Vlad Gusev proposed openstack/nova stable/pike: libvirt: Reduce calls to qemu-img during update_available_resource https://review.openstack.org/604039 | 23:18 |
*** elod has joined #openstack-nova | 23:18 | |
openstackgerrit | Merged openstack/nova master: Add recreate test for bug 1799892 https://review.openstack.org/613304 | 23:20 |
openstack | bug 1799892 in OpenStack Compute (nova) rocky "Placement API crashes with 500s in Rocky upgrade with downed compute nodes" [Medium,Triaged] https://launchpad.net/bugs/1799892 | 23:20 |
*** bigdogstl has quit IRC | 23:24 | |
aspiers | mriedem: thanks for the review! Regarding technical debt, my understanding is that the intention is very much for SUSE/AMD to carry on working to flesh out the functionality after implementation of the MVP described in the initial spec, rather than just to dump some half-baked implementation upstream and then vanish ;-) This would include adding support for attestation, migration etc. | 23:25 |
aspiers | ah, he's gone | 23:26 |
*** bigdogstl has joined #openstack-nova | 23:27 | |
openstackgerrit | Vlad Gusev proposed openstack/nova stable/pike: libvirt: Use os.stat and os.path.getsize for RAW disk inspection https://review.openstack.org/607544 | 23:27 |
*** s10 has joined #openstack-nova | 23:28 | |
*** bigdogstl has quit IRC | 23:32 | |
*** bigdogstl has joined #openstack-nova | 23:43 | |
*** bigdogstl has quit IRC | 23:54 | |
openstackgerrit | Merged openstack/nova master: Mention meta key suffix in tenant isolation with placement docs https://review.openstack.org/616991 | 23:56 |
openstackgerrit | Ken'ichi Ohmichi proposed openstack/nova master: api-ref: Add a description about sort order https://review.openstack.org/616773 | 23:57 |
*** slaweq has quit IRC | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!