*** ociuhandu has joined #openstack-nova | 00:10 | |
*** factor has joined #openstack-nova | 00:12 | |
*** ociuhandu has quit IRC | 00:15 | |
*** ganso has quit IRC | 00:23 | |
*** avolkov has quit IRC | 00:29 | |
*** gbarros has joined #openstack-nova | 00:30 | |
*** sapd1_x has quit IRC | 00:35 | |
*** macz has quit IRC | 00:39 | |
*** gbarros has quit IRC | 00:40 | |
*** gyee has quit IRC | 00:42 | |
*** gbarros has joined #openstack-nova | 00:46 | |
*** markvoelker has joined #openstack-nova | 00:46 | |
*** markvoelker has quit IRC | 00:48 | |
*** markvoelker has joined #openstack-nova | 00:49 | |
*** spatel has joined #openstack-nova | 00:49 | |
*** spatel has quit IRC | 00:53 | |
*** gbarros has quit IRC | 00:57 | |
*** markvoelker has quit IRC | 00:59 | |
*** markvoelker has joined #openstack-nova | 00:59 | |
*** markvoelker has quit IRC | 01:04 | |
*** nicolasbock has quit IRC | 01:04 | |
*** nicolasbock has joined #openstack-nova | 01:04 | |
*** markvoelker has joined #openstack-nova | 01:07 | |
*** markvoelker has quit IRC | 01:17 | |
*** markvoelker has joined #openstack-nova | 01:18 | |
*** markvoelker has quit IRC | 01:23 | |
*** mtanino has joined #openstack-nova | 01:25 | |
*** hongbin has joined #openstack-nova | 01:35 | |
*** awalende has joined #openstack-nova | 01:46 | |
*** markvoelker has joined #openstack-nova | 01:48 | |
*** awalende has quit IRC | 01:50 | |
openstackgerrit | Merged openstack/python-novaclient master: Microversion 2.79: Add delete_on_termination to volume-attach API https://review.opendev.org/673485 | 01:59 |
*** nicolasbock has quit IRC | 02:01 | |
*** spsurya has joined #openstack-nova | 02:17 | |
*** markvoelker has quit IRC | 02:20 | |
*** markvoelker has joined #openstack-nova | 02:22 | |
*** icarusfactor has joined #openstack-nova | 02:25 | |
openstackgerrit | Akira KAMIO proposed openstack/nova master: VMware: disk_io_limits settings are not reflected when resize https://review.opendev.org/680296 | 02:27 |
*** factor has quit IRC | 02:27 | |
*** ganso has joined #openstack-nova | 02:33 | |
*** larainema has joined #openstack-nova | 02:33 | |
*** macz has joined #openstack-nova | 02:36 | |
*** macz has quit IRC | 02:41 | |
*** BjoernT has joined #openstack-nova | 03:01 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: New objects for NUMA live migration https://review.opendev.org/634827 | 03:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: LM: Use Claims to update numa-related XML on the source https://review.opendev.org/635229 | 03:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: NUMA live migration support https://review.opendev.org/634606 | 03:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration https://review.opendev.org/640021 | 03:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration https://review.opendev.org/672595 | 03:01 |
*** hamzy_ has quit IRC | 03:06 | |
*** BjoernT has quit IRC | 03:24 | |
*** igordc has quit IRC | 03:31 | |
*** dave-mccowan has quit IRC | 03:35 | |
*** ash2307 has joined #openstack-nova | 03:36 | |
*** Izza_ has joined #openstack-nova | 03:37 | |
Izza_ | hello... good day, i'm doing tempest testing on an openstack-helm environment but i encountered the error "Got Server Fault" | 03:38 |
Izza_ | tempest.lib.exceptions.ServerFault: Got server fault | 03:38 |
Izza_ | Failed 1 tests - output below: | 03:39 |
Izza_ | "/usr/lib/python2.7/site-packages/tempest/test.py", line 172, in setUpClass | 03:39 |
*** Izza_ has quit IRC | 03:39 | |
*** Izza_ has joined #openstack-nova | 03:41 | |
Izza_ | Failed 1 tests - output below: | 03:41 |
Izza_ | "/usr/lib/python2.7/site-packages/tempest/test.py", line 172, in setUpClass | 03:41 |
*** Izza_ has quit IRC | 03:41 | |
*** Izza_ has joined #openstack-nova | 03:43 | |
Izza_ | hi can u pls help me on my tempest testing, scenario: tempest.api.compute.servers.test_create_server.ServersTestJSON , error: tempest.lib.exceptions.ServerFault: Got server fault | 03:44 |
Izza_ | Details: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. | 03:44 |
*** hongbin has quit IRC | 03:46 | |
*** mkrai has joined #openstack-nova | 03:59 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: NUMA live migration support https://review.opendev.org/634606 | 03:59 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration https://review.opendev.org/640021 | 03:59 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration https://review.opendev.org/672595 | 03:59 |
*** udesale has joined #openstack-nova | 04:07 | |
*** etp has joined #openstack-nova | 04:18 | |
*** mtanino has quit IRC | 04:19 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: ksa auth conf and client for Cyborg access https://review.opendev.org/631242 | 04:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add Cyborg device profile groups to request spec. https://review.opendev.org/631243 | 04:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 04:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Get resolved Cyborg ARQs and add PCI BDFs to VM's domain XML. https://review.opendev.org/631245 | 04:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 04:30 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: [WIP] add cyborg tempest job https://review.opendev.org/670999 | 04:30 |
*** ociuhandu has joined #openstack-nova | 04:30 | |
*** ociuhandu has quit IRC | 04:34 | |
*** Luzi has joined #openstack-nova | 04:36 | |
*** macz has joined #openstack-nova | 04:37 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add Cyborg device profile groups to request spec. https://review.opendev.org/631243 | 04:39 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 04:39 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Get resolved Cyborg ARQs and add PCI BDFs to VM's domain XML. https://review.opendev.org/631245 | 04:39 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 04:39 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: [WIP] add cyborg tempest job https://review.opendev.org/670999 | 04:39 |
*** etp_ has joined #openstack-nova | 04:41 | |
*** macz has quit IRC | 04:41 | |
*** etp_ has quit IRC | 04:41 | |
*** etp_ has joined #openstack-nova | 04:42 | |
*** damien_r has joined #openstack-nova | 04:42 | |
*** etp_ has quit IRC | 04:43 | |
*** mkrai has quit IRC | 04:43 | |
*** igordc has joined #openstack-nova | 04:44 | |
*** etp_ has joined #openstack-nova | 04:44 | |
*** etp has quit IRC | 04:46 | |
*** etp has joined #openstack-nova | 04:46 | |
*** damien_r has quit IRC | 04:47 | |
*** etp_ has quit IRC | 04:47 | |
*** etp_ has joined #openstack-nova | 04:47 | |
*** markvoelker has quit IRC | 04:48 | |
*** etp_ has quit IRC | 04:48 | |
*** etp_ has joined #openstack-nova | 04:48 | |
*** etp has quit IRC | 04:50 | |
*** etp_ is now known as etp | 04:50 | |
*** etp_ has joined #openstack-nova | 04:50 | |
*** etp has quit IRC | 04:51 | |
*** ricolin has joined #openstack-nova | 05:00 | |
*** ratailor has joined #openstack-nova | 05:04 | |
*** etp_ has quit IRC | 05:06 | |
*** markvoelker has joined #openstack-nova | 05:26 | |
*** markvoelker has quit IRC | 05:30 | |
*** ralonsoh has joined #openstack-nova | 05:38 | |
*** pots has quit IRC | 05:42 | |
*** etp has joined #openstack-nova | 05:47 | |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Claim resources in resource tracker https://review.opendev.org/678452 | 06:13 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces https://review.opendev.org/678453 | 06:13 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree https://review.opendev.org/678454 | 06:13 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup https://review.opendev.org/678455 | 06:13 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec https://review.opendev.org/678456 | 06:13 |
*** dtantsur|afk is now known as dtantsur | 06:15 | |
*** arshad777 has joined #openstack-nova | 06:20 | |
*** dklyle has quit IRC | 06:20 | |
*** dklyle has joined #openstack-nova | 06:21 | |
arshad777 | I have created a sync replicated volume with peer persistence enabled. Attached this volume | 06:24 |
*** sapd1_x has joined #openstack-nova | 06:25 | |
*** N3l1x has quit IRC | 06:27 | |
*** jawad_axd has joined #openstack-nova | 06:29 | |
*** slaweq has joined #openstack-nova | 06:41 | |
*** ileixe has joined #openstack-nova | 06:45 | |
*** luksky has joined #openstack-nova | 06:52 | |
*** ileixe has quit IRC | 06:52 | |
*** igordc has quit IRC | 07:00 | |
*** avolkov has joined #openstack-nova | 07:01 | |
*** tesseract has joined #openstack-nova | 07:05 | |
*** awalende has joined #openstack-nova | 07:08 | |
*** rcernin has quit IRC | 07:09 | |
*** damien_r has joined #openstack-nova | 07:10 | |
*** damien_r has quit IRC | 07:11 | |
*** damien_r has joined #openstack-nova | 07:11 | |
*** ociuhandu has joined #openstack-nova | 07:14 | |
*** maciejjozefczyk has joined #openstack-nova | 07:15 | |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces https://review.opendev.org/679640 | 07:18 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory https://review.opendev.org/678470 | 07:18 |
*** threestrands has quit IRC | 07:20 | |
*** ociuhandu has quit IRC | 07:21 | |
*** rpittau|afk is now known as rpittau | 07:28 | |
*** cdent has joined #openstack-nova | 07:30 | |
*** tssurya has joined #openstack-nova | 07:31 | |
openstackgerrit | Takashi NATSUME proposed openstack/python-novaclient master: doc: Add support microversions for options https://review.opendev.org/681174 | 07:36 |
*** macz has joined #openstack-nova | 07:39 | |
*** macz has quit IRC | 07:43 | |
*** trident has quit IRC | 07:50 | |
*** lpetrut has joined #openstack-nova | 07:55 | |
*** jangutter has joined #openstack-nova | 07:58 | |
*** trident has joined #openstack-nova | 08:01 | |
*** priteau has joined #openstack-nova | 08:07 | |
*** sapd1_x has quit IRC | 08:18 | |
*** ociuhandu has joined #openstack-nova | 08:20 | |
bauzas | stephenfin: once you're there, see dansmith's comments | 08:21 |
bauzas | stephenfin: he wants to squash https://review.opendev.org/#/c/680983/ | 08:21 |
bauzas | stephenfin: so please try to provide the new revisions this morning | 08:21 |
*** tkajinam has quit IRC | 08:22 | |
*** panda|rover has quit IRC | 08:23 | |
*** panda has joined #openstack-nova | 08:24 | |
*** ociuhandu has quit IRC | 08:25 | |
*** ociuhandu has joined #openstack-nova | 08:26 | |
*** ociuhandu has quit IRC | 08:30 | |
*** ociuhandu has joined #openstack-nova | 08:30 | |
*** takashin has left #openstack-nova | 08:32 | |
*** derekh has joined #openstack-nova | 08:33 | |
*** ociuhandu has quit IRC | 08:40 | |
*** ociuhandu has joined #openstack-nova | 08:41 | |
*** ociuhandu has quit IRC | 08:44 | |
*** yaawang_ has joined #openstack-nova | 08:49 | |
*** yaawang has quit IRC | 08:49 | |
stephenfin | sure thing | 08:51 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support reverting migration / resize with bandwidth https://review.opendev.org/676140 | 09:02 |
*** jaosorior has joined #openstack-nova | 09:11 | |
openstackgerrit | weibin proposed openstack/nova master: Add support for using ceph RBD ereasure code https://review.opendev.org/681188 | 09:11 |
*** shilpasd has joined #openstack-nova | 09:17 | |
*** tetsuro has joined #openstack-nova | 09:18 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Func test for migrate re-schedule with bandwidth https://review.opendev.org/676972 | 09:19 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth https://review.opendev.org/676980 | 09:21 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request https://review.opendev.org/671497 | 09:23 |
stephenfin | bauzas: trivial patch with no conflicts here https://review.opendev.org/#/c/679339/ | 09:24 |
stephenfin | (if you'd be so kind) | 09:24 |
bauzas | reminder : I'm French, I'm never kind | 09:25 |
stephenfin | bauzas: I'm going to rebase the cpu-resources series on top of the SEV series to head off the incoming merge conflicts too | 09:25 |
stephenfin | while I rework it, that is | 09:25 |
bauzas | stephenfin: +W'd FWIW | 09:26 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Do not query allocations twice in finish_revert_resize https://review.opendev.org/678827 | 09:26 |
bauzas | stephenfin: cool, I'll look at gibi's series then | 09:26 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Allow resizing server with port resource request https://review.opendev.org/679019 | 09:28 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Extract pf$N literals as constants from func test https://review.opendev.org/680991 | 09:30 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Improve dest service level func tests https://review.opendev.org/680998 | 09:30 |
*** yaawang_ has quit IRC | 09:33 | |
*** yaawang has joined #openstack-nova | 09:33 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API https://review.opendev.org/675023 | 09:34 |
*** boxiang has joined #openstack-nova | 09:35 | |
*** yaawang has quit IRC | 09:37 | |
*** yaawang has joined #openstack-nova | 09:37 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API https://review.opendev.org/675023 | 09:38 |
*** rcernin has joined #openstack-nova | 09:38 | |
*** yaawang has quit IRC | 09:41 | |
*** aarents has quit IRC | 09:42 | |
*** yaawang has joined #openstack-nova | 09:43 | |
bauzas | gibi: I'm almost done with https://review.opendev.org/#/c/676140 but I'll need to do some checks later today | 09:48 |
* bauzas disappears for the gym | 09:48 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API https://review.opendev.org/675023 | 09:57 |
*** dklyle has quit IRC | 10:03 | |
*** dklyle has joined #openstack-nova | 10:04 | |
*** udesale has quit IRC | 10:09 | |
aspiers | stephenfin: I'm here in case you have any last minute questions about https://review.opendev.org/#/c/644565/ | 10:09 |
*** udesale has joined #openstack-nova | 10:10 | |
*** markvoelker has joined #openstack-nova | 10:16 | |
*** tetsuro has quit IRC | 10:17 | |
*** markvoelker has quit IRC | 10:21 | |
gibi | bauzas: thanks | 10:21 |
*** macz has joined #openstack-nova | 10:25 | |
*** macz has quit IRC | 10:30 | |
openstackgerrit | Alexandra Settle proposed openstack/nova master: Fixing broken links https://review.opendev.org/681206 | 10:31 |
openstackgerrit | Alexandra Settle proposed openstack/nova master: Fixing broken links https://review.opendev.org/681206 | 10:32 |
*** jawad_axd has quit IRC | 10:40 | |
*** tbachman has quit IRC | 10:42 | |
*** markvoelker has joined #openstack-nova | 10:55 | |
aspiers | Is there a way to get a guest's libvirt XML via the nova api? I'm guessing no | 11:00 |
*** ociuhandu has joined #openstack-nova | 11:00 | |
*** markvoelker has quit IRC | 11:00 | |
aspiers | Would be nice if it was available from https://docs.openstack.org/api-ref/compute/?expanded=show-server-diagnostics-detail#show-server-diagnostics | 11:01 |
*** ociuhandu has quit IRC | 11:01 | |
aspiers | Without the XML I am struggling to see how Tempest can verify that an SEV guest was actually booted with SEV enabled | 11:01 |
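For reference, the closest the compute API gets today is the standardized server diagnostics call (microversion 2.48); a minimal sketch of calling it, assuming placeholder credentials, endpoint and server uuid. The response carries driver-agnostic state/cpu/nic/disk details, but not the domain XML:

```python
# a sketch: fetch the standardized (2.48+) server diagnostics via the compute
# API using keystoneauth directly; credentials and the server uuid are placeholders
from keystoneauth1 import adapter, identity, session

auth = identity.Password(auth_url='http://controller/identity/v3',
                         username='admin', password='secret', project_name='admin',
                         user_domain_id='default', project_domain_id='default')
compute = adapter.Adapter(session=session.Session(auth=auth),
                          service_type='compute')

server_id = 'replace-with-a-server-uuid'
diags = compute.get('/servers/%s/diagnostics' % server_id,
                    headers={'OpenStack-API-Version': 'compute 2.48'}).json()
print(diags)  # driver-agnostic state/uptime/cpu/nic/disk details -- no domain XML
```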
*** nicolasbock has joined #openstack-nova | 11:06 | |
gmann | aspiers: same case for NFV use case testing from artom L180 -https://etherpad.openstack.org/p/qa-train-ptg | 11:07 |
aspiers | hmm | 11:07 |
aspiers | gmann: which line? | 11:07 |
*** ociuhandu has joined #openstack-nova | 11:07 | |
gmann | in QA PTG, we accepted the idea of adding that tempest plugin into QA but it was action item for artom to propose that. | 11:08 |
gmann | L180 | 11:08 |
aspiers | thanks | 11:08 |
aspiers | oh nice | 11:08 |
aspiers | yeah "white box plugin" sounds like a good concept | 11:08 |
gmann | yeah and it can be expanded with more use cases which are out of scope for Tempest | 11:09 |
aspiers | right | 11:09 |
*** rouk has quit IRC | 11:09 | |
aspiers | gmann: although I wonder if adding libvirt XML to the show-server-diagnostics API call might be a simpler solution | 11:10 |
aspiers | showing the XML could be an admin-only thing | 11:11 |
gmann | but even admin-only, does it expose more info than nova should? (from a security point of view) | 11:12 |
gmann | I think it was discussed previously also but sean-k-mooney or artom might know more on that. | 11:13 |
*** udesale has quit IRC | 11:14 | |
*** larainema has quit IRC | 11:15 | |
artom | gmann, yeah, I still want to do that, I was meant to propose a spec for Train but only have a very WIP up | 11:16 |
*** dave-mccowan has joined #openstack-nova | 11:17 | |
artom | The code exists in RDO project's gerrit, and a Red Hat QE and myself wanted to clear outstanding reviews on there and merge its tests for NUMA live migration | 11:17 |
artom | But that didn't happen yet | 11:17 |
gmann | i see. | 11:17 |
sean-k-mooney | gmann: sorry i was not following chat | 11:18 |
sean-k-mooney | what were we talking about | 11:18 |
gmann | sean-k-mooney: question from aspiers on exposing the libvirt XML to the show-server-diagnostics API | 11:18 |
sean-k-mooney | aspiers: no there is not a way to get the xml from the api | 11:18 |
sean-k-mooney | and there never will be | 11:19 |
aspiers | that's a bold statement :) | 11:19 |
sean-k-mooney | it completely violates the cloud abstraction to expose that level of detail via the api | 11:19 |
sean-k-mooney | a non-admin is not even meant to know the hypervisor that is in use | 11:19 |
aspiers | sean-k-mooney: as admin-only diagnostics there is no violation | 11:20 |
*** jawad_axd has joined #openstack-nova | 11:20 | |
aspiers | sean-k-mooney: we're not talking about non-admins | 11:20 |
sean-k-mooney | as admin-only, technically they could just ssh into the host and look at the xml | 11:20 |
aspiers | sean-k-mooney: not from tempest they can't | 11:20 |
aspiers | plus that's a lot less convenient | 11:20 |
sean-k-mooney | tempest should not be asserting the behavior of the xml generation | 11:20 |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API https://review.opendev.org/675023 | 11:20 |
aspiers | sean-k-mooney: please first read the use case above to understand the need ^^^ | 11:21 |
sean-k-mooney | that is what functional or white box testing is for | 11:21 |
artom | aspiers, yeah, that's quite explicitly out of scope for tempest | 11:21 |
sean-k-mooney | tempest is for blackbox testing | 11:21 |
artom | But... quite explicitly *in* scope for whitebox :) | 11:21 |
gmann | yeah | 11:21 |
artom | So now you've landed yourself on the list of people interested, and will be poked mercilessly once it's ready ;) | 11:21 |
aspiers | again, this is repeating discussion from a few minutes ago when we talked about the white box plugin | 11:21 |
aspiers | if there is a white box plugin for tempest, then that means white box testing *is* in scope for the tempest ecosystem, even if not the core | 11:22 |
sean-k-mooney | aspiers: so why cant you boot a vm and ssh in and detect that sev is configured from within the vm? | 11:22 |
sean-k-mooney | or at least available | 11:22 |
aspiers | sean-k-mooney: how would I detect that? | 11:22 |
sean-k-mooney | lscpu? | 11:22 |
sean-k-mooney | is there not an msr or cpu flag for sev | 11:22 |
aspiers | sean-k-mooney: have you tested that? | 11:22 |
sean-k-mooney | no i dont have sev hardware | 11:22 |
gmann | aspiers: within scope of QA ecosystem not Tempest ecosystem. Tempest is just a tool under QA :) | 11:23 |
artom | aspiers, yeah, we're not saying "don't do it", we're saying "don't propose patches for it to Tempest" | 11:23 |
aspiers | artom: OK, it sounded like the former before :) | 11:23 |
aspiers | gmann: the white box tempest plugin isn't in the tempest ecosystem? ;-) | 11:24 |
sean-k-mooney | aspiers: whitebox and the intel nfv test repo use tempest as a framework | 11:24 |
sean-k-mooney | to do this type of testing | 11:24 |
sean-k-mooney | but its not in tempest as its out of scope for tempest | 11:24 |
sean-k-mooney | but a whitebox style tempest plugin for sev would be fine | 11:24 |
sean-k-mooney | or add it to whitebox | 11:24 |
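The whitebox-style check being discussed boils down to something like the following sketch: ssh to the compute host and assert the SEV launchSecurity element is present in the guest's domain XML (host and domain names here are placeholders):

```python
# a sketch: ssh to the compute host, dump the guest's domain XML with virsh and
# assert the SEV launchSecurity element is present (host/domain names are placeholders)
import subprocess
import xml.etree.ElementTree as ET

xml = subprocess.check_output(
    ['ssh', 'root@compute-0', 'virsh', 'dumpxml', 'instance-00000001'])
root = ET.fromstring(xml)
assert root.find("./launchSecurity[@type='sev']") is not None
```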
artom | sean-k-mooney, unrelated, but we had some discussion around saving the new NUMA topology in https://review.opendev.org/#/c/634606/75/nova/compute/manager.py@7223 - and in func tests at least, that instance.refresh() isn't necessary (yes, the tests check the InstanceNUMATopology) | 11:25 |
gmann | aspiers: It will be the QA ecosystem. it can be done via a tempest plugin or a separate testing framework like extreme-testing (which never got progress), but it will be a separate project under QA with a separate team. | 11:25 |
artom | sean-k-mooney, I'll try to get to the office later this morning, to see what's up with my machine, would you have the bandwidth to play around with that in the meantime in your env? | 11:25 |
sean-k-mooney | artom: since this is apparently our highest priority i can make time | 11:26 |
sean-k-mooney | what exactly do you want me to test | 11:26 |
sean-k-mooney | remove the instance.refresh | 11:26 |
artom | sean-k-mooney, I did that already in the latest patchset | 11:26 |
artom | Making sure that the new instance NUMA topology is saved in the DB | 11:27 |
sean-k-mooney | and check both the db and virsh to confirm that the state is updated correctly? | 11:27 |
sean-k-mooney | ok | 11:27 |
sean-k-mooney | ya ill do that now | 11:27 |
artom | Thank you (for the ∞'s time) | 11:27 |
artom | :) | 11:27 |
aspiers | sean-k-mooney: BTW lscpu on the guest does not mention sev at all | 11:28 |
sean-k-mooney | as i said before i have exposed the servers im using for testing via port forwarding, so if you continue to have issues then you can ssh into them | 11:28 |
sean-k-mooney | aspiers: is there anything in dmidecode/dmesg to indicate sev | 11:29 |
sean-k-mooney | i thought the guest had to set bit 48 to 1 to enable the encryption | 11:29 |
sean-k-mooney | for pointers | 11:29 |
aspiers | sean-k-mooney: http://paste.openstack.org/show/774681/ | 11:30 |
aspiers | doesn't even look right | 11:30 |
artom | sean-k-mooney, IIRC when I tried connecting last time I couldn't - but yeah, what's the connection info again? | 11:31 |
sean-k-mooney | https://events.linuxfoundation.org/wp-content/uploads/2017/12/Extending-Secure-Encrypted-Virtualization-with-SEV-ES-Thomas-Lendacky-AMD.pdf looking at slide 12 we might be able to check it via the guest msr | 11:32 |
sean-k-mooney | artom: i have two routers, my isp one and my ubiquiti one. my isp router firewall was blocking it so i turned it off and it started working | 11:32 |
aspiers | sean-k-mooney: I will ask the experts | 11:33 |
artom | sean-k-mooney, sure, but I still don't have the IP/FQDN in my bash history for some reason | 11:34 |
sean-k-mooney | artom: ya i know im looking it up in mine/my router config | 11:35 |
*** priteau has quit IRC | 11:38 | |
*** mtreinish has joined #openstack-nova | 11:39 | |
kashyap | aspiers: Randomly chiming in, but there is an MSR for SEV: https://www.kernel.org/doc/html/latest/x86/amd-memory-encryption.html | 11:41 |
kashyap | aspiers: And it's reported via `cpuid` | 11:41 |
kashyap | "Support for SME and SEV can be determined through the CPUID instruction. The CPUID function 0x8000001f reports information related to SME:" | 11:41 |
kashyap | And: | 11:41 |
kashyap | "If SEV is supported, MSR 0xc0010131 (MSR_AMD64_SEV) can be used to determine if SEV is active: [...]" | 11:42 |
aspiers | Yeah, I've already used that in the past | 11:42 |
kashyap | Ah, then disregard me. | 11:42 |
* kashyap goes back to fiddling with what he needs to fiddle with | 11:42 | |
aspiers | No, the reminder is appreciated | 11:42 |
kashyap | aspiers: Your goal is to check if the instance (in the upstream CI) has indeed booted with SEV, yeah? | 11:43 |
aspiers | right | 11:43 |
aspiers | I guess the JeOS image will need to include cpuid | 11:44 |
kashyap | Isn't the JeOS (Just Enough OS, I presume) CirrOS in this case? | 11:44 |
aspiers | It's whatever tempest is configured with | 11:44 |
kashyap | (Nod) | 11:45 |
sean-k-mooney | aspiers: has one of the upstream ci providers provided you with a label that will run on SEV hardware | 11:47 |
sean-k-mooney | aspiers: because 90% of the upstream ci clouds are probably running intel x86_64 | 11:47 |
sean-k-mooney | although rackspace did run power for a while | 11:47 |
aspiers | sean-k-mooney: SUSE has our own SEV boxes which can run 3rd party CI | 11:48 |
sean-k-mooney | ah so its for the suse third party ci not for upstream | 11:48 |
aspiers | well if upstream has SEV hardware then great, but I was not expecting that any time soon | 11:48 |
sean-k-mooney | aspiers: i dont think it currently does | 11:49 |
sean-k-mooney | or at least not in a way you can target | 11:49 |
sean-k-mooney | im sure some of the clouds probably have at least a small amd epyc inventory, if for nothing else but their own internal validation | 11:49 |
aspiers | yeah | 11:50 |
donnyd | sean-k-mooney: have the numa jobs been running? | 12:00 |
donnyd | I saw last night it was having some inbound ssh timeout issues. | 12:02 |
sean-k-mooney | i have not checked this morning but i think clarkb kicked off a job around 4/5 am so ill see if that passed | 12:04 |
sean-k-mooney | donnyd: so that one failed at 04:54 | 12:04 |
*** markvoelker has joined #openstack-nova | 12:04 | |
sean-k-mooney | donnyd: was that after your l3 agent restart | 12:05 |
donnyd | i didn't restart till well after you were talking about it on infra last night | 12:05 |
sean-k-mooney | donnyd: ya looking at the infra scroll back clarkb kicked that off after you did the restart | 12:06 |
openstackgerrit | Chris Dent proposed openstack/os-resource-classes master: Update api-ref link to canonical location https://review.opendev.org/681235 | 12:06 |
donnyd | oh yea i see | 12:06 |
donnyd | It really just makes no sense though | 12:07 |
donnyd | all the rest of the labels seem to work without issue | 12:07 |
donnyd | mostly without issue... not anymore than any other provider at least | 12:07 |
*** tbachman has joined #openstack-nova | 12:08 | |
openstackgerrit | Chris Dent proposed openstack/os-traits master: Update README to be a bit more clear https://review.opendev.org/681237 | 12:09 |
donnyd | So the old label we have setup for numa, does that still work? | 12:09 |
donnyd | I haven't set it back on my end | 12:10 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Shrink the race window in confirm resize func test https://review.opendev.org/681238 | 12:12 |
*** etp has quit IRC | 12:16 | |
kashyap | aspiers: Is there a way to detect from _inside_ the guest that it has indeed booted with SEV? | 12:25 |
kashyap | aspiers: (From the host, we can use the CPUID bit or other ways) | 12:25 |
*** macz has joined #openstack-nova | 12:26 | |
kashyap | aspiers: Meanwhile, I learnt that if SEV isn't provided by the host, an SEV-enabled kernel should fail to boot. | 12:26 |
openstackgerrit | Alexandra Settle proposed openstack/nova master: Fixing broken links https://review.opendev.org/681206 | 12:27 |
openstackgerrit | Chris Dent proposed openstack/os-traits master: Update README to be a bit more clear https://review.opendev.org/681237 | 12:29 |
gibi | efried, aspiers: Is there anything I can do regarding the SEV series? | 12:30 |
stephenfin | aspiers: Random question: why can we not add the iommu attribute for all those device types? | 12:30 |
stephenfin | gibi: I'm reviewing the last patch now. Think you've already hit it though | 12:30 |
stephenfin | by which I mean the first one | 12:30 |
stephenfin | since the other two are +2 | 12:30 |
stephenfin | +W | 12:30 |
gibi | stephenfin: cool. Yeah I tried to find somebody who can look at the first | 12:30 |
*** macz has quit IRC | 12:30 | |
gibi | but then it is in good hands now | 12:31 |
kashyap | aspiers: Also, when you're about, see my response to your comment: https://review.opendev.org/#/c/348394/10 | 12:33 |
kashyap | aspiers: If you can confirm that /usr/share/qemu/ovmf-x86_64-suse-code.bin is indeed the binary built with Secure Boot, then we can fix it | 12:34 |
luyao | Hi everyone, I have a question: when an instance is being rebuilt, how many allocations will it have? Both new and old? | 12:34 |
brinzhang | efried: Could you please review this patch https://review.opendev.org/#/c/681151/, it changes the novaclient version to 15.1.0, needed by https://review.opendev.org/#/c/673725/ | 12:43 |
*** derekh has quit IRC | 12:43 | |
*** derekh has joined #openstack-nova | 12:43 | |
stephenfin | aspiers: Is that follow-up for https://review.opendev.org/#/c/666616/ around yet? | 12:44 |
gibi | stephenfin: I did not find the followup either so I guess it isn't | 12:47 |
*** tbachman has quit IRC | 12:47 | |
*** mriedem has joined #openstack-nova | 12:55 | |
*** udesale has joined #openstack-nova | 12:57 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Apply SEV-specific guest config when SEV is required https://review.opendev.org/644565 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Reject live migration and suspend on SEV guests https://review.opendev.org/680158 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Enable booting of libvirt guests with AMD SEV memory encryption https://review.opendev.org/666616 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement https://review.opendev.org/671793 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available' https://review.opendev.org/672693 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Add 'InstanceNUMATopology.cpu_pinning' property https://review.opendev.org/680106 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Validate CPU config options against running instances https://review.opendev.org/680107 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Use sane indent https://review.opendev.org/680229 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Add 'NUMACell.pcpuset' field https://review.opendev.org/680108 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: hardware: Differentiate between shared and dedicated CPUs https://review.opendev.org/671800 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Start reporting 'HW_CPU_HYPERTHREADING' trait https://review.opendev.org/675571 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta https://review.opendev.org/671801 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique https://review.opendev.org/681060 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly https://review.opendev.org/681061 | 12:58 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add reshaper for PCPU https://review.opendev.org/674895 | 12:58 |
bauzas | gibi: added a question for you https://review.opendev.org/#/c/676140/ | 12:58 |
bauzas | hadù | 12:58 |
gibi | bauzas: looking | 12:59 |
*** hamzy_ has joined #openstack-nova | 12:59 | |
*** nweinber has joined #openstack-nova | 12:59 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Apply SEV-specific guest config when SEV is required https://review.opendev.org/644565 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Reject live migration and suspend on SEV guests https://review.opendev.org/680158 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Enable booting of libvirt guests with AMD SEV memory encryption https://review.opendev.org/666616 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement https://review.opendev.org/671793 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available' https://review.opendev.org/672693 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Add 'InstanceNUMATopology.cpu_pinning' property https://review.opendev.org/680106 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Validate CPU config options against running instances https://review.opendev.org/680107 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Use sane indent https://review.opendev.org/680229 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Add 'NUMACell.pcpuset' field https://review.opendev.org/680108 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: hardware: Differentiate between shared and dedicated CPUs https://review.opendev.org/671800 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Start reporting 'HW_CPU_HYPERTHREADING' trait https://review.opendev.org/675571 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta https://review.opendev.org/671801 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique https://review.opendev.org/681060 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly https://review.opendev.org/681061 | 13:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add reshaper for PCPU https://review.opendev.org/674895 | 13:00 |
*** tbachman has joined #openstack-nova | 13:00 | |
*** BjoernT has joined #openstack-nova | 13:02 | |
*** macz has joined #openstack-nova | 13:03 | |
stephenfin | bauzas, alex_xu: I've merged in those follow-ups to https://review.opendev.org/#/c/671793/ and the next two patches, per dansmith's request, if you fancy re +2ing | 13:04 |
bauzas | ack | 13:05 |
gibi | bauzas: replied in https://review.opendev.org/#/c/676140 | 13:07 |
bauzas | gibi: cool thanks | 13:07 |
*** macz has quit IRC | 13:07 | |
luyao | efried: Hi efried, are you around? | 13:07 |
mriedem | gibi: do you have a bug reported for https://954d3ddb67e757934983-a9cc155153d08dd30dfffbbf1d71d234.ssl.cf5.rackcdn.com/676138/16/gate/nova-tox-functional-py36/fb5d235/testr_results.html.gz yet? | 13:08 |
gibi | mriedem: not yet. I have a patch that shrinks the window | 13:08 |
gibi | https://review.opendev.org/#/c/681238/ | 13:08 |
*** jdillaman has quit IRC | 13:09 | |
*** tbachman has quit IRC | 13:09 | |
*** nweinber_ has joined #openstack-nova | 13:09 | |
efried | luyao: o/ | 13:09 |
bauzas | gibi: mriedem: just an unrelated thought, should we somehow persist VGPU resource request to the RequestSpec requested_resources field ? | 13:10 |
luyao | efried: I would like another patch to refactor this later if necessary. The current approach works well and it will be fine for me to change it after it's merged. https://review.opendev.org/#/c/678452/22/nova/compute/resource_tracker.py@1168 | 13:10 |
gibi | requested_resources field is intentionally not persisted in the db | 13:10 |
gibi | bauzas: ^^ | 13:10 |
bauzas | k | 13:11 |
efried | bauzas: imo VGPU should take its lead from what's been done with vpmem | 13:11 |
*** arshad777 has quit IRC | 13:11 | |
efried | move vgpu into the resources field | 13:11 |
bauzas | for the moment, we don't really do anything, we just take the allocation that was done | 13:11 |
stephenfin | dansmith: When you're around, we're going to need to discuss https://review.opendev.org/#/c/671801/40/nova/conf/scheduler.py@208 so I can grasp where you're coming from | 13:11 |
*** nweinber has quit IRC | 13:11 | |
bauzas | efried: WDYM? sorry, don't have a lot of context on VPMEM apart from the spec | 13:11 |
*** eharney has joined #openstack-nova | 13:12 | |
efried | luyao: Yes, I completely overlooked the fact that the report client's provider tree doesn't have the resources in it. I agree we should keep it this way for now, but I think for the refactor we may want to reduce the data structure stored in the RT to just a dict, keyed by rp_uuid, of lists of resources. | 13:12 |
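A tiny sketch of the simplified structure described there, purely illustrative rather than the actual ResourceTracker attribute:

```python
# illustrative only: assigned resources keyed by the resource provider uuid they
# were claimed from, instead of nesting them per instance
import collections

assigned_resources = collections.defaultdict(list)
assigned_resources['rp-uuid-1'].extend(['pmem_ns_0', 'pmem_ns_1'])
assigned_resources['rp-uuid-2'].append('pmem_ns_2')
print(dict(assigned_resources))
```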
*** gbarros has joined #openstack-nova | 13:12 | |
efried | bauzas: Unless you were thinking to do this for Train, let's talk about it after FF. | 13:13 |
bauzas | efried: of course not, just considering the next steps | 13:13 |
efried | stephenfin: Are you going to have time today to review the vpmem series? | 13:13 |
luyao | efried: Okey, we can discuss the details in the future. :) | 13:13 |
bauzas | since we plan to do VGPU affinity | 13:13 |
stephenfin | efried: I am, and I'm going to hold off on the follow-up requested for https://review.opendev.org/#/c/674895/ because reviews seem more important at the moment, right? | 13:14 |
efried | brinzhang: looks like you already got your novaclient release merging, yah? | 13:14 |
efried | stephenfin: totally | 13:15 |
openstackgerrit | Merged openstack/nova master: Indent fake libvirt host capabilities fixtures more nicely https://review.opendev.org/679339 | 13:16 |
gibi | mriedem: filed a bug https://bugs.launchpad.net/nova/+bug/1843433 | 13:16 |
openstack | Launchpad bug 1843433 in OpenStack Compute (nova) "functional test test_migrate_server_with_qos_port fails intermittently due to race condition" [Undecided,New] | 13:16 |
bauzas | gibi: you could triage it as critical since it impacts the gate IMHO | 13:17 |
bauzas | gibi: any ideas of the failure rate? | 13:17 |
gibi | bauzas: does it block the gate? | 13:17 |
mriedem | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22in%20test_migrate_server_with_qos_port%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d | 13:17 |
mriedem | it's not critical | 13:17 |
mriedem | yes it hits the gate but doesn't block it | 13:17 |
bauzas | mriedem: okay, thanks, I was about to ask logstash | 13:18 |
mriedem | gibi: ack, i left a few comments in the patch to close the race | 13:18 |
gibi | mriedem: thanks. I will respin soon | 13:18 |
bauzas | gibi: what's the patch up for this ? | 13:18 |
gibi | bauzas: https://review.opendev.org/#/c/681238/1 | 13:18 |
bauzas | gibi: tbh, I'm a bit afraid it's at the top of the series | 13:19 |
bauzas | if you need to respin, I'd appreciate you move it down | 13:19 |
luyao | efried: Could you help confirm whether a rebuilt instance will have two groups of allocations? I know a migrating instance will change one allocation consumer to the migration uuid. https://review.opendev.org/#/c/678452/23/nova/compute/resource_tracker.py@406 | 13:19 |
gibi | bauzas: I think it is top of master | 13:19 |
bauzas | yup | 13:19 |
gibi | bauzas: so it is not connected to the series | 13:19 |
gibi | bauzas: just by topic | 13:19 |
mriedem | right, it's a race in an already merged functional test, | 13:20 |
mriedem | so let's just fix the race and merge it | 13:20 |
bauzas | gibi: my bad, you're right | 13:20 |
gibi | mriedem: good suggestion about instance event, that will fix the race | 13:20 |
mriedem | \o/ | 13:20 |
gibi | I was afraid we'd need to poll the migration allocation in placement to close the race | 13:20 |
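For comparison, the polling fallback would look roughly like this generic helper (illustrative only, not one of nova's existing test fixtures; the predicate in the usage comment is hypothetical):

```python
# a generic polling helper of the kind a functional test could use to close a
# race window; not one of nova's fixtures
import time

def wait_for(predicate, timeout=10.0, interval=0.2):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return
        time.sleep(interval)
    raise AssertionError('condition not met within %.1f seconds' % timeout)

# usage (hypothetical check): wait_for(lambda: migration_allocation_exists(migration_uuid))
```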
gibi | mriedem: I also fixed the bug grenade found in revert | 13:21 |
gibi | mriedem: unfortunately we still don't have a test run on that patch | 13:21 |
aspiers | kashyap: oh, I thought you were talking about cpuid inside the guest | 13:21 |
mriedem | gibi: yeah i saw you updated but i haven't looked at your replies or changes, | 13:22 |
aspiers | kashyap: I think there are multiple binaries with SB | 13:22 |
mriedem | it kind of sucks that we don't store the information needed for revert in the migration context | 13:22 |
gibi | mriedem: I have better plans, described in my reply | 13:22 |
*** Sundar has joined #openstack-nova | 13:22 | |
gibi | mriedem: we can let neutron remember the mapping | 13:22 |
kashyap | aspiers: Sorry, I was not clear. I don't _know_ if it is also reported in the guest `cpuid` -- maybe you can tell from your hardware? | 13:23 |
gibi | mriedem: by using multiple portbinding | 13:23 |
aspiers | kashyap: I don't have a guest with cpuid installed currently | 13:23 |
aspiers | kashyap: I think I misremembered - before I probably ran it on the host | 13:23 |
kashyap | aspiers: For SB -- IMHO, you don't _need_ all those 4M, 2M variants -- it's beyond overkill. Your life, and the admin's life, will be far simpler if you use the "two pairs" approach I noted in the comment | 13:24 |
aspiers | kashyap: You're preaching to the choir. You really need to file a bug report on bugzilla.opensuse.org | 13:24 |
mriedem | gibi: ah yeah | 13:24 |
sean-k-mooney | artom: i have confirmed that the migration context correctly contains the new numa topology blob but it does not get saved to the db | 13:24 |
*** ratailor has quit IRC | 13:25 | |
aspiers | kashyap: or reach the right team in some other way | 13:25 |
kashyap | aspiers: Aaah, sorry; didn't realize we're on the same line, same word | 13:25 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Improve SEV documentation and other minor tweaks https://review.opendev.org/681254 | 13:25 |
aspiers | efried, gibi, stephenfin: there's the follow-up ^^^ | 13:25 |
sean-k-mooney | artom: so doing apply migration context followed by drop to delete it from the db does not work | 13:25 |
sean-k-mooney | artom: i would assume that for some reason the field is not flagged as dirty and is not being saved | 13:26 |
gibi | aspiers: ack! | 13:26 |
*** BjoernT_ has joined #openstack-nova | 13:27 | |
Sundar | Hi sean-k-mooney, how are you doing? | 13:27 |
sean-k-mooney | Sundar: hi | 13:28 |
efried | luyao: I'm trying to make sense of those scenarios right now. | 13:28 |
Sundar | sean-k-mooney: The Cyborg patches for nova-integ have merged; https://review.opendev.org/#/q/project:openstack/cyborg+branch:master+topic:nova-integ+owner:Sundar | 13:28 |
sean-k-mooney | Sundar: ok so its just the nova code that is pending | 13:28 |
Sundar | There is one related patch for Nova notification https://review.opendev.org/674520 that is close | 13:29 |
*** BjoernT has quit IRC | 13:29 | |
aspiers | kashyap: Where did you hear that an SEV-enabled kernel would fail to boot if SEV is not provided? I have empirically proven that false by booting the same image both with and without SEV | 13:29 |
sean-k-mooney | Sundar: ok yes without that nova will timeout waiting and rollback the spawn | 13:29 |
mriedem | Sundar: no testing at all for those cyborg patches? | 13:30 |
Sundar | Yes. BTW, I updated the Nova patches too. Notification works with 674520. If you look at the seeming UT failures, they are mostly unrelated to Cyborg | 13:30 |
kashyap | aspiers: DanPB; but to be fair to him, he used a qualifier "IIUC". If you've tested it, I'll go with your data | 13:30 |
aspiers | kashyap: gotcha | 13:30 |
Sundar | mriedem: What do you mean by 'no testing'? | 13:30 |
sean-k-mooney | Sundar: have we had an end-to-end run of the tempest job | 13:30 |
mriedem | Sundar: there are no tests associated with that patch | 13:31 |
Sundar | The tempest CI is working with the patches, and we are hoping to merge it by this week. | 13:31 |
efried | mriedem: do we use "migration instance" for evac? | 13:31 |
mriedem | efried: no | 13:31 |
efried | so what happens to allocations? | 13:31 |
mriedem | efried: you mean migratoin-based allocations? | 13:31 |
efried | yeah | 13:31 |
mriedem | efried: they sit on the source | 13:31 |
efried | thanks | 13:31 |
mriedem | which is why we can't delete resource providers when deleting a compute service that has evacuated instance allocations against it | 13:31 |
mriedem | that whole thread i had in the ML | 13:32 |
*** Luzi has quit IRC | 13:32 | |
mriedem | efried: related to this https://review.opendev.org/#/c/678100/ | 13:32 |
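As an aside, checking which providers still hold allocations for a given consumer (instance or migration uuid) is a single placement call; a hedged sketch with keystoneauth, using placeholder credentials and uuids:

```python
# a sketch: list which resource providers still hold allocations for a given
# consumer (instance or migration uuid); credentials and uuid are placeholders
from keystoneauth1 import adapter, identity, session

auth = identity.Password(auth_url='http://controller/identity/v3',
                         username='admin', password='secret', project_name='admin',
                         user_domain_id='default', project_domain_id='default')
placement = adapter.Adapter(session=session.Session(auth=auth),
                            service_type='placement')

consumer_uuid = 'replace-with-an-instance-or-migration-uuid'
resp = placement.get('/allocations/%s' % consumer_uuid,
                     headers={'OpenStack-API-Version': 'placement 1.28'})
for rp_uuid, alloc in resp.json()['allocations'].items():
    print(rp_uuid, alloc['resources'])
```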
sean-k-mooney | mriedem: that has been changed recently right. we delete the allocations if they exist, if and only if you bring back up the compute agent on the failed host | 13:32 |
sean-k-mooney | so that only helps if you repair whatever the issue was | 13:32 |
mriedem | sean-k-mooney: yes the source allocations are deleted if you bring up the evacuated-from compute service | 13:33 |
sean-k-mooney | if you dont you get into the situation in that ml thread | 13:33 |
mriedem | i think it's probably not uncommon to have a host failure, down the compute service, evacuate it, and then try to delete the compute service before redeploying on that host | 13:33 |
mriedem | Sundar: tempest only covers happy path testing for the most part; unit tests are good for testing error conditions and such - exceptional cases | 13:34 |
mriedem | anyway, that's up to the cyborg core team for how they want to enforce testing standards | 13:34 |
efried | mriedem: also rebuild? | 13:35 |
mriedem | efried: a rebuild isn't a migration | 13:35 |
mriedem | and you can't rebuild on a down host | 13:35 |
*** bbowen_ has joined #openstack-nova | 13:35 | |
mriedem | so the host stays the same and the flavor stays the same, but the image might change on a rebuild | 13:35 |
sean-k-mooney | Sundar: have the dependencies been fixed in https://review.opendev.org/#/c/670999/ | 13:36 |
efried | mriedem: right, separate topic, I'm saying: does rebuild also end up with multiple sets of allocations for the same instance uuid? | 13:36 |
sean-k-mooney | Sundar: you rebased it but did not run the new job via "check experimental" | 13:36 |
sean-k-mooney | so we still dont have a run that worked | 13:36 |
sean-k-mooney | the previous run failed https://c3308e17743765936b80-6c7fec3fffbf24afb7394804bcdecfae.ssl.cf2.rackcdn.com/670999/7/experimental/cyborg-tempest/2fe52ec/testr_results.html.gz | 13:37 |
mriedem | efried: no | 13:37 |
sean-k-mooney | that said we knew the dependencies were wrong so we expected that | 13:37 |
*** bbowen__ has joined #openstack-nova | 13:37 | |
*** bbowen has quit IRC | 13:37 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fixing broken links https://review.opendev.org/681206 | 13:38 |
luyao | mriedem: what about in the process of rebuild | 13:39 |
*** bbowen_ has quit IRC | 13:39 | |
luyao | mriedem: rebuild is not done, how many allocations will an instance have? | 13:40 |
efried | we destroy the instance but keep the allocations, then respawn it with the existing allocations? | 13:40 |
efried | ...and maybe a different image? | 13:40 |
sean-k-mooney | Sundar: it looks like there are 3 patches still remaining on the cyborg side. 2 for nova integration and 1 for python3 support | 13:40 |
mriedem | luyao: i don't understand your question | 13:41 |
mriedem | efried: correct | 13:41 |
efried | mriedem: luyao and I are asking about the same thing. Thanks for the help. | 13:41 |
mriedem | rebuild is basically re-spawn in place on the same host with maybe a different image but the same ports/volumes/flavor - if you're on shared storage you keep your root disk, if not your root disk is rebuilt from the specified image | 13:42 |
Sundar | sean-k-mooney: I'll rerun with 'check experimental'. Re. dependencies for https://review.opendev.org/#/c/670999/, one has already merged, and the other is the tempest code itself, which is working with the patches and should merge soon. There should be only 1 for nova-integ (i.e. Nova notification) -- I fixed the topic now. | 13:42 |
sean-k-mooney | Sundar: wait a minute | 13:42 |
*** eharney has quit IRC | 13:42 | |
mriedem | though i might be thinking of evacuate for that root disk comment | 13:42 |
sean-k-mooney | im going to fix it to not waste gate resources and fix the dependencies | 13:42 |
mriedem | anyway, root disk doesn't matter for what you're asking | 13:43 |
mriedem | sean-k-mooney: cyborg integration isn't happening in nova in train - where are we with test runs on the numa live migration series? | 13:43 |
efried | stephenfin, dansmith: btw, vpmem now has a CI passing. I haven't opened one up yet, but it's taking half an hour, so it's doing... *something* :P | 13:43 |
luyao | mriedem: I saw an instance have new and old allocations in a functional test when it's evacuating. | 13:43 |
sean-k-mooney | mriedem: i confirmed that it is not saving the updated numa topology so it looks like we need instance.refresh | 13:43 |
*** jawad_axd has quit IRC | 13:43 | |
sean-k-mooney | mriedem: the migration context has the correct numa topology | 13:44 |
luyao | mriedem: I thought rebuild is similar to evacuate | 13:44 |
sean-k-mooney | mriedem: so it looks like it's not marking the field as changed for some reason | 13:44 |
sean-k-mooney | mriedem: perhaps because apply migration context uses setattr | 13:44 |
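For background on "flagged as dirty": a minimal, standalone illustration of the oslo.versionedobjects change tracking that save() relies on, using a fake object rather than nova's Instance (note that setting a field through the normal attribute path does mark it as changed):

```python
# standalone illustration of oslo.versionedobjects change tracking; FakeInstance
# is a made-up object, not nova's Instance
from oslo_versionedobjects import base, fields

@base.VersionedObjectRegistry.register
class FakeInstance(base.VersionedObject):
    fields = {'numa_topology': fields.StringField(nullable=True)}

inst = FakeInstance(numa_topology='old-blob')
inst.obj_reset_changes()                    # pretend it was freshly loaded from the DB
setattr(inst, 'numa_topology', 'new-blob')  # what apply_migration_context effectively does
print(inst.obj_what_changed())              # {'numa_topology'} -> save() would persist it
```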
mriedem | luyao: evacuate is rebuild to another host when the source host is down | 13:44 |
sean-k-mooney | mriedem: i -1'd the patch | 13:44 |
*** rcernin has quit IRC | 13:45 | |
mriedem | evacuate and rebuild use the same code flows in conductor and compute services with conditionals for any differences | 13:45 |
efried | luyao, mriedem: I find all of it so confusing that I don't even try to remember, because that just makes things worse. It goes against the very fiber of my being, but this is one where I'll ask. Every. Time. | 13:45 |
*** jawad_axd has joined #openstack-nova | 13:45 | |
mriedem | e.g. evacuate does a claim on the dest host, rebuild does not | 13:45 |
*** jawad_axd has quit IRC | 13:45 | |
mriedem | sean-k-mooney: ok i didn't realize artom removed the instance.refresh in post live migration | 13:45 |
mriedem | but haven't looked at changes since yesterday | 13:46 |
mriedem | efried: rebuild+evacuate and resize+cold-migrate are confusing in that they share code but don't do the exact same things | 13:46 |
mriedem | but are like 90% the same | 13:46 |
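To make the difference concrete, a hedged sketch of invoking the two operations through python-novaclient (credentials, uuids and host names are placeholders; microversion 2.14 is used so evacuate does not need the on_shared_storage flag):

```python
# a sketch of the two operations via python-novaclient; credentials, image uuid
# and destination host are placeholders
from keystoneauth1 import identity, session
from novaclient import client

auth = identity.Password(auth_url='http://controller/identity/v3',
                         username='admin', password='secret', project_name='admin',
                         user_domain_id='default', project_domain_id='default')
nova = client.Client('2.14', session=session.Session(auth=auth))
server = nova.servers.list()[0]

# rebuild: same host, same flavor, possibly a different image
nova.servers.rebuild(server, image='replace-with-an-image-uuid')

# evacuate: re-spawn on another host; only valid when the source host is down
nova.servers.evacuate(server, host='replace-with-a-destination-host')
```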
mriedem | efried: do we need a "what's the difference between evacuate and rebuild" thing in the contributor docs? | 13:47 |
mriedem | b/c that's probably pretty easy to write up | 13:47 |
efried | that must be why luyao was conflating rebuild+evacuate wrt allocations. That must fall in the 10% that's different. | 13:47 |
mriedem | part of it yeah | 13:48 |
efried | mriedem: I'll admit I didn't even go looking for a doc on this. | 13:48 |
mriedem | there likely isn't one | 13:48 |
mriedem | but this isn't the first time this has come up, | 13:48 |
efried | There was that one blog post | 13:48 |
efried | "one" | 13:48 |
mriedem | so instead of explaining it each time, one could just link to a doc | 13:48 |
jangutter | efried: http://www.danplanet.com/blog/2016/03/03/evacuate-in-nova-one-command-to-confuse-us-all/ this one? | 13:48 |
mriedem | not really the same | 13:48 |
efried | that looks familiar jangutter | 13:49 |
*** jawad_axd has joined #openstack-nova | 13:49 | |
mriedem | that's about nova client commands that live migrate all instances off a host | 13:49 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Improve SEV documentation and other minor tweaks https://review.opendev.org/681254 | 13:49 |
luyao | efried: yes, actually I found an evacuating instance has two groups of allocations, and I thought rebuild is the same. | 13:49 |
* alex_xu tried to understand luyao's question | 13:49 | |
mriedem | "After a compute host has failed, rebuild my instance from the original image in another place, keeping my name, uuid, network addresses, and any other allocated resources that I had before." | 13:49 |
efried | anyway, I envision some kind of table with all these related operations on one axis, and things like "same/different destination host", "what happens to allocations", "instance/migration UUIDs", etc on the other. | 13:50 |
mriedem | that's a much bigger doc if you're trying to lump in all move operations, | 13:51 |
openstackgerrit | sean mooney proposed openstack/nova master: [WIP] add cyborg tempest job https://review.opendev.org/670999 | 13:51 |
mriedem | because you'd have to consider resize on the same host, which is complicated in different ways | 13:51 |
efried | mriedem: is there a todo list on which that ^ could be registered so it is not forgotten, but not started until after FF? We need you for more urgent things this week. | 13:51 |
sean-k-mooney | Sundar: that ^ should test with the correct deps. im going back to testing artoms code | 13:51 |
mriedem | efried: i'm not signing up to write a doc with all of the axis of evil, | 13:52 |
mriedem | writing up something quick and easy about "what's the difference between rebuild and evacuate" is pretty easy for me to crank out | 13:52 |
efried | even so | 13:52 |
mriedem | a bug is good enough for that | 13:52 |
mriedem | docs bug, link to the irc question that started tihs | 13:52 |
efried | ack | 13:52 |
*** jawad_axd has quit IRC | 13:53 | |
sean-k-mooney | mriedem: im going to add back in the instance.refresh locally and add some logging to see what the numa topology blob looks like before and after. | 13:53 |
*** med_ has joined #openstack-nova | 13:53 | |
sean-k-mooney | mriedem: artom is in dad taxi mode currently but he should be back soon ish | 13:53 |
*** nweinber__ has joined #openstack-nova | 13:53 | |
sean-k-mooney | although looking at that function im not clear why that helps | 13:55 |
Sundar | Thanks, sean-k-mooney | 13:55 |
*** eharney has joined #openstack-nova | 13:55 | |
*** mlavalle has joined #openstack-nova | 13:56 | |
efried | https://bugs.launchpad.net/nova/+bug/1843439 | 13:56 |
openstack | Launchpad bug 1843439 in OpenStack Compute (nova) "doc: Rebuild vs. Evacuate (and other move-ish ops)" [Low,Confirmed] - Assigned to Matt Riedemann (mriedem) | 13:56 |
*** nweinber_ has quit IRC | 13:56 | |
efried | mriedem: there's a 'doc' tag and a 'docs' tag. Which one is real? | 13:56 |
efried | (I used both) | 13:56 |
*** ociuhandu has quit IRC | 13:57 | |
*** ociuhandu has joined #openstack-nova | 13:58 | |
*** tbachman has joined #openstack-nova | 13:58 | |
bauzas | efried: IIRC, it was 'docs' 3 years ago | 13:59 |
bauzas | oops, I meant 'doc' | 14:00 |
bauzas | holy shit | 14:00 |
sean-k-mooney | its doc | 14:00 |
*** BjoernT has joined #openstack-nova | 14:00 | |
sean-k-mooney | https://wiki.openstack.org/wiki/Nova/BugTriage#Tag_Owner_List | 14:00 |
bauzas | efried: yeah, I'm correct : https://bugs.launchpad.net/nova/+bugs?field.tag=doc vs. https://bugs.launchpad.net/nova/+bugs?field.tag=docs | 14:00 |
efried | bauzas: are tags autovivified or do they need to be explicitly created somewhere? | 14:01 |
*** Izza_ has quit IRC | 14:01 | |
bauzas | efried: nope, you can add any tag | 14:01 |
efried | Like, could we go through the latter four bugs and remove 'docs' and that tag would disappear? | 14:01 |
mriedem | there is an official list | 14:01 |
bauzas | efried: I'm just updating the 4 'docs' | 14:01 |
mriedem | anyone can add any tag, | 14:01 |
sean-k-mooney | efried: if you go to the wiki i linked | 14:01 |
*** BjoernT_ has quit IRC | 14:01 | |
bauzas | s/docs/doc | 14:01 |
mriedem | but the list of official tags that auto-complete is curated | 14:01 |
sean-k-mooney | there is a link to manage the official tags | 14:01 |
efried | cool | 14:01 |
efried | mriedem: oh, in that case both 'docs' and 'doc' must be in there | 14:02 |
sean-k-mooney | https://bugs.launchpad.net/nova/+manage-official-tags | 14:02 |
efried | because both autocompleted for me | 14:02 |
mriedem | https://bugs.launchpad.net/nova/+manage-official-tags | 14:02 |
sean-k-mooney | no idea how to use that however | 14:02 |
*** tkajinam has joined #openstack-nova | 14:02 | |
mriedem | you move things in or out of the 'official tags' column | 14:02 |
bauzas | efried: mriedem: I removed usage for 'docs' https://bugs.launchpad.net/nova/+bugs?field.tag=docs | 14:02 |
efried | okay, I'm removing 'docs' from official... | 14:03 |
bauzas | and yeah, I just removed 'docs' from the official list | 14:03 |
bauzas | heh, jinxed | 14:03 |
*** ociuhandu has quit IRC | 14:03 | |
efried | sorted | 14:03 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Remove custom comparison methods https://review.opendev.org/472285 | 14:03 |
mriedem | i just added "documentation" | 14:03 |
efried | you're a bastard | 14:03 |
mriedem | and "docify" | 14:03 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Remove custom comparison methods https://review.opendev.org/472285 | 14:04 |
efried | gibi: Did I see something about a grenade failure - should I wait to recheck stuff? | 14:04 |
bauzas | efried: mriedem: either way, it's been a long time since I triaged a single bug in Launchpad, floor is yours, folks | 14:04 |
gibi | efried: don't have to wait | 14:04 |
efried | thanks | 14:04 |
gibi | efried: it was a failure in an open review | 14:04 |
efried | o good :) | 14:04 |
gibi | efried: which I respinned with a fix | 14:04 |
efried | I'm going to need to bring in a ringer for this vpmem patch https://review.opendev.org/#/c/678455/ | 14:05 |
efried | I can handle the rest, but that one is beyond me. | 14:05 |
efried | stephenfin is already lined up for it. alex_xu is a co-author. | 14:05 |
efried | sean-k-mooney, kashyap, aspiers, artom: I would proxy a +2 from one of you if you're able --^ | 14:06 |
alex_xu | also the vpmem CI is pretty close https://review.opendev.org/#/c/679640/14 \o/ | 14:06 |
kashyap | efried: vPMEM? | 14:06 |
* kashyap clicks | 14:07 | |
efried | kashyap: Yeah, that patch is just "do shit efried doesn't understand to the libvirt xml" | 14:07 |
efried | so I don't think you need to understand vpmem | 14:07 |
sean-k-mooney | efried: the pmem series. I can take a look if kashyap can't | 14:07 |
kashyap | efried: It's been forever on my to-read list, but I was pulled into urgent downstream stuff last/this week | 14:07 |
bauzas | efried: once I'm done with gibi's series and stephenfin's cpu-resources, I could give some look at https://review.opendev.org/#/q/topic:bp/virtual-persistent-memory | 14:07 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fixing broken links https://review.opendev.org/681206 | 14:08 |
efried | Thanks folks. bauzas kashyap sean-k-mooney in the interest of focusing resources, if it's possible to start with https://review.opendev.org/#/c/678455/ rather than going through the whole series, that's the one where we have a known reviewer gap. | 14:09 |
alex_xu | bauzas: kashyap sean-k-mooney thanks also | 14:09 |
bauzas | efried: honestly, reviewing the series needs me looking from the start, but I can surely just look at the patches without really commenting them | 14:09 |
sean-k-mooney | yes, if that is the one that needs to be reviewed I can start there in about 5 mins | 14:10 |
kashyap | efried: Nod; that's what I clicked on | 14:10 |
gibi | stephenfin: a question about the cpu series. vcpu_pin_set can be an empty string and nova will use every host CPU for the vCPUs. Is this behaviour preserved for cpu_shared_set and cpu_dedicated_set? | 14:10 |
*** brinzhang has quit IRC | 14:11 | |
efried | gibi: iiuc no, you'll get an error if you try that. | 14:11 |
bauzas | gibi: good question, we haven't agreed on that in the spec | 14:12 |
*** brinzhang has joined #openstack-nova | 14:12 | |
gibi | efried: so I can assume that when vcpu_pin_set is removed there will be no way to implicitly offer every host CPU as VCPU, and I will have to explicitly list them in one of the new configs? | 14:13 |
stephenfin | gibi: It's preserved, yes | 14:13 |
bauzas | stephenfin: only if you don't use the new options, right? | 14:14 |
bauzas | stephenfin: because if you start messing with your options, then something will come up :) | 14:14 |
stephenfin | any(vcpu_pin_set, compute.cpu_shared_set, compute.cpu_dedicated_set) == False --> allocate all host CPUs for PCPU inventory | 14:14 |
stephenfin | bauzas: correct | 14:14 |
sean-k-mooney | what's the status on the SEV series by the way? stephen, did you get a chance to look at the bottom patch? I think that was all that was left to be +W'd and the rest were ready to go? | 14:14 |
efried | Right, I guess we don't have to make the decision about what gibi is asking until a future release. | 14:15 |
efried | sean-k-mooney: it's gateward | 14:15 |
stephenfin | sean-k-mooney: It's gone through | 14:15 |
stephenfin | going, rather | 14:15 |
sean-k-mooney | ok, so I can unload the SEV context from my brain | 14:15 |
sean-k-mooney | at least for a while, that's good | 14:15 |
stephenfin | burn it. burn it all | 14:15 |
stephenfin | at least until people start using it in two years or so | 14:15 |
gibi | efried: sure, the part of what happens when we remove the vcpu_pin_set is for the future not for Train | 14:15 |
gibi | stephenfin: thanks | 14:16 |
stephenfin | gibi, efried: fwiw, when we remove 'vcpu_pin_set' I'd like to preserve the same behavior | 14:16 |
stephenfin | though sean-k-mooney disagrees | 14:16 |
stephenfin | in U or V or whatever, the above simply becomes | 14:16 |
stephenfin | any(compute.cpu_shared_set, compute.cpu_dedicated_set) == False --> allocate all host CPUs for PCPU inventory | 14:16 |
gibi | stephenfin: I have conflicting requirements internally about what an empty vcpu_pin_set "should" mean | 14:17 |
stephenfin | damn, sorry, VCPU | 14:17 |
stephenfin | in both cases | 14:17 |
gibi | stephenfin: yeah, VCPU. even better | 14:17 |
efried | alex_xu: even if not yet +2able, do you see https://review.opendev.org/#/c/671800/ being generally sane, enough to unblock the bottom and start merging cpu-resources patches? | 14:17 |
stephenfin | yeah, my mistake | 14:17 |
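For readers following the config discussion above, here is a minimal Python sketch of the Train-era fallback stephenfin describes; the helper name and structure are illustrative assumptions, not nova's actual code. If none of vcpu_pin_set, [compute] cpu_shared_set or [compute] cpu_dedicated_set are set, every host CPU is reported as VCPU inventory, preserving the old empty-vcpu_pin_set behaviour.

```python
# Illustrative sketch only; helper name and signature are hypothetical.
def vcpus_to_report(vcpu_pin_set, cpu_shared_set, cpu_dedicated_set,
                    all_host_cpus):
    """Pick the host CPUs to expose as VCPU inventory."""
    if not any([vcpu_pin_set, cpu_shared_set, cpu_dedicated_set]):
        # No CPU partitioning configured: fall back to every host CPU.
        return set(all_host_cpus)
    # Otherwise the explicit shared set (or the legacy pin set) wins; with
    # only cpu_dedicated_set configured, nothing is reported as VCPU and the
    # dedicated CPUs become PCPU inventory elsewhere.
    return set(cpu_shared_set or vcpu_pin_set or [])
```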
alex_xu | efried: yes, I think it is good. the case I tested passed | 14:18 |
efried | stephenfin: any reservations about that ^ ? | 14:18 |
stephenfin | efried: Nope. I need to fix a thing with quotas and discuss whether we need the scheduler option with dansmith ([1]) but neither of those should affect anything [1] https://review.opendev.org/#/c/671801 | 14:19 |
stephenfin | *anything lower in the series | 14:19 |
stephenfin | sean-k-mooney: Want to skim through https://review.opendev.org/#/c/671793/ again now that the fixes for your issues are merged directly in? | 14:20 |
sean-k-mooney | em, I'll add it to the queue after the vpmem patch, but sure | 14:20 |
efried | bauzas: would you please re+2 and +W the bottom cpu-resources https://review.opendev.org/#/c/671793/ once sean-k-mooney has done that ^ ? | 14:20 |
*** awalende has quit IRC | 14:24 | |
*** awalende has joined #openstack-nova | 14:25 | |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Claim resources in resource tracker https://review.opendev.org/678452 | 14:25 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Enable driver discovering PMEM namespaces https://review.opendev.org/678453 | 14:25 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree https://review.opendev.org/678454 | 14:25 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup https://review.opendev.org/678455 | 14:25 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec https://review.opendev.org/678456 | 14:25 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces https://review.opendev.org/679640 | 14:25 |
*** awalende has quit IRC | 14:25 | |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory https://review.opendev.org/678470 | 14:25 |
*** dtantsur is now known as dtantsur|afk | 14:25 | |
*** awalende has joined #openstack-nova | 14:26 | |
artom | sean-k-mooney, thanks for testing, I saw your comments while waiting at the doctor's office. I'm going to try and fix the func tests, and then get a better handle on why instance.refresh() makes it work | 14:27 |
sean-k-mooney | artom: well it does not | 14:27 |
*** brinzhang_ has joined #openstack-nova | 14:27 | |
artom | sean-k-mooney, hah, so it never worked? | 14:27 |
sean-k-mooney | at least adding it before apply does not work | 14:27 |
sean-k-mooney | I'm going to try adding it after the migration context is dropped | 14:27 |
artom | sean-k-mooney, no, it was on the source, before calling driver.cleanup() | 14:27 |
sean-k-mooney | oh i thought it was in post live migrate | 14:28 |
artom | The theory was that driver.cleanup(), at least in libvirt, calls instance.save() and clobbers what the dest saved | 14:28 |
sean-k-mooney | oh ok | 14:28 |
kashyap | efried: Ah, nice - just noticed that Luyao also wrote the upstream libvirt-proper part of the 'pmem' / related NVDIMM support bits :-). (/me needs more time, it's a massive context-switch for my slow brain.) | 14:28 |
sean-k-mooney | I'll look at the old code and try and figure out where to add it | 14:28 |
sean-k-mooney | if you know, then I can test it | 14:28 |
* kashyap bbiab; neighbour called for help | 14:29 | |
sean-k-mooney | or feel free to update it on the host I gave you access to | 14:29 |
efried | kashyap: that sounds impressive | 14:29 |
sean-k-mooney | whichever works | 14:29 |
artom | mriedem, ^^ ... would we be OK using instance.refresh() without a clear handle on why it's needed? I'm assuming not | 14:29 |
*** awalende has quit IRC | 14:30 | |
*** maciejjozefczyk has quit IRC | 14:30 | |
sean-k-mooney | efried: https://review.opendev.org/#/c/678455/24/nova/virt/libvirt/config.py is the primary file you want our input on, yes? | 14:31 |
efried | sean-k-mooney: yeah, I can at least somewhat understand the others. | 14:32 |
sean-k-mooney | I have looked at the others, and assuming this file looks sane those do too | 14:32 |
*** ociuhandu has joined #openstack-nova | 14:32 | |
sean-k-mooney | I'm almost done, but so far nothing jumps out at me as obviously wrong | 14:32 |
*** ociuhandu has quit IRC | 14:33 | |
efried | luyao: https://review.opendev.org/#/c/678456/ quick fix here please | 14:33 |
*** ociuhandu has joined #openstack-nova | 14:33 | |
luyao | efried : ok | 14:34 |
mriedem | artom: i thought sean-k-mooney said it was needed | 14:35 |
sean-k-mooney | mriedem: I said it does not work without it. I have not tested re-adding instance.refresh() in the correct place | 14:35 |
mriedem | artom: like I said in review, my guess is the copy of the instance on the source has a dirty migration_context field, the dest applies its copy and drops it, and then the source does an instance.save() and saves the dirty copy | 14:35 |
sean-k-mooney | the current code is setting the correct NUMA topology in the migration context but it's not making it into the numa_topology field in the instance_extra table | 14:36 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec https://review.opendev.org/678456 | 14:36 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces https://review.opendev.org/679640 | 14:36 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory https://review.opendev.org/678470 | 14:36 |
sean-k-mooney | luyao: efried alex_xu where are ye subtracting the alignment from the requested size? | 14:36 |
sean-k-mooney | self.target_size = kwargs.get("size_kb", 0) - self.align_size | 14:37 |
artom | mriedem, it *is* needed, sean-k-mooney confirmed this morning | 14:37 |
artom | But I don't have a good understanding of why | 14:37 |
sean-k-mooney | luyao: efried alex_xu https://review.opendev.org/#/c/678455/24/nova/virt/libvirt/config.py@3180 | 14:37 |
artom | Err, brb, my client's all wonky | 14:37 |
*** ociuhandu has quit IRC | 14:37 | |
*** artom has quit IRC | 14:38 | |
*** artom has joined #openstack-nova | 14:38 | |
*** ociuhandu has joined #openstack-nova | 14:39 | |
luyao | efried, sean-k-mooney: because the label will occupy some space, and the size must be aligned | 14:39 |
sean-k-mooney | luyao: sure, but just subtracting the alignment is not correct | 14:40 |
sean-k-mooney | it will result in a smaller allocation than you asked for. well | 14:40 |
sean-k-mooney | there are two ways to handle it | 14:40 |
sean-k-mooney | if we know the label size | 14:40 |
artom | mriedem, btw, I removed all the allocation-style pinning/non-NUMA stuff that we talked about last night. You and sean-k-mooney convinced me it was 1. tangential (long-standing issue not related to instance NUMA topology, fixed with a hard reboot) 2. scope creep that's too risky at this point. So I pushed a PS last night without it | 14:40 |
artom | mriedem, so, in my mind, all that's left is the instance.refresh() thing | 14:40 |
sean-k-mooney | we can add that and then round up to the next size, or we do what you're doing | 14:40 |
luyao | sean-k-mooney : Yes it is, it will be smaller | 14:41 |
mriedem | artom: ok. the instance.refresh() certainly doesn't hurt so i have no issues with leaving that in | 14:41 |
dansmith | same, | 14:41 |
sean-k-mooney | luyao: I normally would have assumed we would handle this by adding the overhead size and rounding up to the next alignment boundary, like we do for block devices, but I guess we can't do that as we need to fit the size in the placement allocation | 14:42 |
dansmith | I had said I wasn't totally sure why, but wasn't surprised it was needed | 14:42 |
sean-k-mooney | luyao: so you chose to subtract the alignment | 14:42 |
luyao | sean-k-mooney: the size must be aligned by alignsize | 14:42 |
mriedem | i think it would be pretty simple to debug after the fact if we have a working job - just dump the source instance.migration_context to logs, call post at dest, then refresh and dump it again to the logs | 14:42 |
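For anyone who later wants to do the debugging mriedem describes, a rough sketch of what the log dumps could look like; where exactly these calls sit in the live migration flow is an assumption for illustration, not the actual nova code.

```python
# Hedged sketch: dump the instance's migration_context around the
# post-live-migration step so the before/after values can be compared.
from oslo_log import log as logging

LOG = logging.getLogger(__name__)


def dump_migration_context(instance, when):
    # migration_context holds the "new" NUMA topology/resources during a move.
    LOG.debug('migration_context (%s): %s', when, instance.migration_context)


# On the source host (illustrative ordering only):
#   dump_migration_context(instance, 'before post at dest')
#   ... destination applies its copy of the context and drops it ...
#   instance.refresh()
#   dump_migration_context(instance, 'after refresh')
```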
sean-k-mooney | luyao: yes i know | 14:42 |
sean-k-mooney | im not arguing that point | 14:42 |
dansmith | mriedem: yeah | 14:42 |
sean-k-mooney | but that code does not actually guarantee that | 14:42 |
artom | mriedem, dansmith, ack, so first thing I'll do is put it back in, and while you guys check whether I've addressed the rest of the feedback satisfactorily, I can fix the func tests for not picking that up, and get a better handle on why it's needed in the first place | 14:43 |
luyao | sean-k-mooney: let me check it again | 14:43 |
dansmith | artom: ack | 14:43 |
sean-k-mooney | luyao: I'm referring to https://review.opendev.org/#/c/678455/24/nova/virt/libvirt/config.py@3180 by the way | 14:44 |
sean-k-mooney | luyao: oh, you are relying on the placement step size to prevent non-aligned allocations | 14:45 |
luyao | sean-k-mooney: sorry, I don't understand | 14:46 |
luyao | sean-k-mooney: the target size is aligned by alignsize, this is guaranteed by creating the namespace, then I subtract the alignment, so it will be aligned by alignsize also. | 14:47 |
luyao | sean-k-mooney: what is placement step size | 14:48 |
sean-k-mooney | yes, but this code is relying on other code making sure that size_kb is a multiple of the alignment size | 14:48 |
sean-k-mooney | it is not actually checking for that here, it's just assuming it | 14:48 |
sean-k-mooney | which is why this normally would not guarantee it's still aligned, but in this case it will be | 14:49 |
sean-k-mooney | luyao: ignore the placement step size, we decided to model it differently | 14:50 |
luyao | sean-k-mooney: the size is returned from the ndctl utility, ndctl will guarantee the size is aligned. the alignsize is also from ndctl | 14:51 |
sean-k-mooney | we had two proposals, which is why it would have been better for me to review all of the series rather than just this patch | 14:51 |
sean-k-mooney | ok | 14:51 |
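A worked example of the alignment arithmetic discussed above, with made-up numbers; the only point is that if size_kb is already a multiple of align_size (which the code assumes because both values come from ndctl), subtracting one alignment keeps the result aligned while leaving room for the namespace label.

```python
# Made-up values purely for illustration of the alignment argument.
align_size_kb = 2 * 1024           # e.g. a 2 MiB alignment
size_kb = 4 * 1024 * 1024          # a 4 GiB namespace, already aligned

target_size_kb = size_kb - align_size_kb

assert size_kb % align_size_kb == 0           # precondition from ndctl
assert target_size_kb % align_size_kb == 0    # still aligned, just smaller
```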
efried | luyao: FYI: step_size is the granularity with which you're allowed to make an allocation for a resource. | 14:51 |
efried | So if you have 64 units of something, and your step_size is 8, you're allowed to allocate 8, 16, 24... but not 9 or 3 or 14. | 14:51 |
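A tiny numeric check of the step_size rule just described, with the same made-up numbers: a total of 64 units and a step_size of 8 allow only multiples of 8 to be requested.

```python
# Illustration of placement's step_size semantics with made-up numbers.
total, step_size = 64, 8


def is_valid_request(amount):
    return 0 < amount <= total and amount % step_size == 0


assert is_valid_request(8) and is_valid_request(16) and is_valid_request(24)
assert not is_valid_request(3) and not is_valid_request(9) and not is_valid_request(14)
```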
sean-k-mooney | efried: yeah, so in the current proposal we are tracking inventories of namespace size | 14:52 |
sean-k-mooney | so I think it's 1 | 14:52 |
sean-k-mooney | but we had debated modeling it differently at one point | 14:52 |
sean-k-mooney | in its current form it's not relevant, but I need to remind myself of some of the other decisions | 14:53 |
luyao | efried, sean-k-mooney: thanks, I see. it has no step_size, it is not partitionable | 14:54 |
sean-k-mooney | luyao: well, it's set to 1 | 14:54 |
efried | Yes, I remember this now. 1 is appropriate, because they're single namespaces | 14:55 |
sean-k-mooney | so you get pmem in units of 1 namespace | 14:55 |
luyao | sean-k-mooney: yes | 14:55 |
efried | ^ | 14:55 |
efried | and the size/align shouldn't actually leak into any kind of placement, scheduling, etc. code. | 14:55 |
efried | If it's anywhere at all, it would be deep in the libvirt-isms. | 14:55 |
efried | because as far as nova+placement are concerned, a namespace is a single unit. | 14:56 |
sean-k-mooney | efried: at one point we had discussed the idea of dynamically creating the namespaces, but we rejected that | 14:56 |
sean-k-mooney | when we were considering it, the alignment was one of the possible values for the step_size I believe | 14:57 |
sean-k-mooney | anyway, that is not relevant | 14:57 |
efried | right | 14:57 |
* efried ==> meetings... | 14:57 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Fix the race in confirm resize func test https://review.opendev.org/681238 | 14:57 |
gibi | mriedem: ^^ | 14:57 |
*** nweinber_ has joined #openstack-nova | 14:58 | |
efried | dansmith, stephenfin: FYI on the pmem CI, the reason it's passing on patches outside of the pmem series is because currently the job is configured to pull down the top patch (https://review.opendev.org/#/c/678470) before running. | 14:59 |
efried | I'm still digging into what the tests are actually doing. | 14:59 |
gibi | mriedem: it seems that the top of the bw series hits the race so I'm wondering what to do. Rebase the series to include the fix at the bottom, or wait for the grenade test result first as the gate is slow | 14:59 |
dansmith | efried: are you talking about vpmem or PCPU? | 14:59 |
*** udesale has quit IRC | 15:00 | |
*** udesale has joined #openstack-nova | 15:00 | |
efried | dansmith: vpmem | 15:00 |
*** nweinber__ has quit IRC | 15:01 | |
efried | bbiab | 15:01 |
*** efried is now known as efried_afk | 15:01 | |
*** gbarros has quit IRC | 15:02 | |
mriedem | gibi: a nit inline on that race test fix | 15:03 |
mriedem | gibi: i think we'll just get the race fix merged today and then it's the same as you rebasing your series on it | 15:03 |
gibi | mriedem: OK. will put back the lower() call. | 15:04 |
gibi | mriedem: ahh good point, the gate will rebase the series for the test anyhow | 15:05 |
*** spsurya has quit IRC | 15:05 | |
*** jawad_axd has joined #openstack-nova | 15:06 | |
*** brinzhang_ has quit IRC | 15:07 | |
mriedem | gibi: left another comment, don't know if you want to fup again or not, | 15:07 |
mriedem | but since you changed all of these copy/paste blocks of confirmResize/wait for status/wait for event, those could go into a single helper method | 15:08 |
gibi | mriedem: OK I will put that in a helper. Do you suggest to only change the new confirm resize tests? | 15:09 |
mriedem | i'm not sure what you mean by 'new confirm resize tests' | 15:09 |
mriedem | like the new bw provider migration tests? | 15:10 |
gibi | mriedem: the current patch adds the instance action event waiting for every confirm resize tests in test_servers | 15:10 |
gibi | mriedem: not just to the bw related confirm resize tests | 15:10 |
mriedem | yeah, I didn't expect to see all of those other changes in tests that aren't doing exotic allocations | 15:10 |
*** Sundar has quit IRC | 15:10 | |
mriedem | i mean, it's fine, and we could just put the helper in ServerMovingTests | 15:11 |
gibi | this race does not depend on any exotic allocation. | 15:11 |
gibi | true we only saw the issue in the bw related tests | 15:12 |
mriedem | putting it in ServerMovingTests is probably not good enough then, since the new tests don't use that | 15:12 |
mriedem | in PortResourceRequestBasedSchedulingTestBase | 15:12 |
mriedem | could go in ProviderUsageBaseTestCase though... | 15:13 |
*** tkajinam has quit IRC | 15:13 | |
mriedem | up to you - i just don't love the copy/paste so we should use a helper at some point, fup or whatever - up to you | 15:13 |
*** tssurya has quit IRC | 15:13 | |
gibi | mriedem: OK. I will respin the patch | 15:13 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: NUMA live migration support https://review.opendev.org/634606 | 15:14 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration https://review.opendev.org/640021 | 15:14 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration https://review.opendev.org/672595 | 15:14 |
bauzas | gibi: could you please give me again the race fix change ? | 15:14 |
bauzas | gibi: can't easily find it | 15:14 |
gibi | bauzas: https://review.opendev.org/#/c/681238 | 15:15 |
bauzas | gibi: thanks | 15:15 |
bauzas | ah, that's because the topic name changed :) | 15:15 |
sean-k-mooney | alex_xu: luyao have you tested the vpmem code with an instance that requests hugepages or pinning but doesn't specify a NUMA topology? | 15:16 |
artom | dansmith, mriedem, ^^ | 15:17 |
*** gbarros has joined #openstack-nova | 15:17 | |
*** shilpasd has quit IRC | 15:17 | |
bauzas | gibi: so you'll respin https://review.opendev.org/#/c/681238 ? | 15:18 |
alex_xu | sean-k-mooney: I guess luyao has tested that. But that will be a case like a normal instance with NUMA, since when we specify pinning or hugepages, we will create a one-node NUMA topology for the instance, right? | 15:18 |
* bauzas context switches to cpu-resources then | 15:18 | |
sean-k-mooney | alex_xu: we should, but I need to make sure https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@4664 does not break that | 15:19 |
*** ociuhandu has quit IRC | 15:19 | |
sean-k-mooney | e.g. an instance with hw:cpu_policy=dedicated + vpmem needs to be pinned | 15:19 |
*** ociuhandu has joined #openstack-nova | 15:19 | |
sean-k-mooney | and I need to ensure adding "not need_pin" won't break that | 15:19 |
alex_xu | sean-k-mooney: https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@5454, for that case, the need_pin should be True, since the instance has numa_topology | 15:20 |
sean-k-mooney | instance.numa_topology should be set in the claim, but I need to triple check | 15:20 |
alex_xu | yea | 15:20 |
gibi | bauzas: yeah, I will respin soon. | 15:20 |
bauzas | cool | 15:21 |
alex_xu | sean-k-mooney: actually we set instance.numa_topology in the API layer I think | 15:21 |
sean-k-mooney | alex_xu: I think the code is correct but I'm not sure I'm happy with this being in a different location than the other places we create a NUMA topology implicitly | 15:21 |
alex_xu | sean-k-mooney: yeah, that was the agreement at the PTG, people said they want vpmem on non-NUMA instances, but I totally understand your point | 15:22 |
*** ociuhandu has quit IRC | 15:22 | |
alex_xu | I can't remember who is asking, but it should be redhat people | 15:23 |
*** ociuhandu has joined #openstack-nova | 15:23 | |
sean-k-mooney | well, as a redhat person I did not, but I think maybe dansmith preferred supporting this. I know we said we would do that at the PTG however | 15:23 |
alex_xu | yeah, actually I was on your side in the beginning | 15:24 |
sean-k-mooney | alex_xu: do you have an XML I can inspect? | 15:24 |
dansmith | sean-k-mooney: um, what? | 15:24 |
sean-k-mooney | e.g. an example XML that was generated using this code | 15:24 |
* dansmith feels like sean-k-mooney has been blaming a lot of stuff on him lately | 15:24 | |
*** tbachman has quit IRC | 15:25 | |
alex_xu | sean-k-mooney: luyao has, but she is on the way home~ | 15:25 |
sean-k-mooney | dansmith: you preferred to add pmem support without requiring NUMA in the initial version, correct? | 15:25 |
sean-k-mooney | dansmith: I could be misremembering, if so my apologies | 15:25 |
sean-k-mooney | dansmith: I just remember you and I were the most active redhatters in that discussion at the PTG. | 15:26 |
alex_xu | sean-k-mooney: here is one http://52.27.155.124/95/674895/35/check/pmem-tempest-plugin-filtered/af5fdef/controller/logs/screen-n-cpu.txt.gz | 15:26 |
alex_xu | sean-k-mooney: search the 'pmem', you will see the xml output by nova-compute log | 15:27 |
dansmith | sean-k-mooney: I seriously doubt I said that | 15:27 |
*** ociuhandu has quit IRC | 15:28 | |
sean-k-mooney | dansmith: in that case my apologies for invoking your name :) | 15:29 |
sean-k-mooney | alex_xu: so this is the generated domain | 15:29 |
sean-k-mooney | http://paste.openstack.org/show/774840/ | 15:29 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Fix the race in confirm resize func test https://review.opendev.org/681238 | 15:30 |
gibi | mriedem, bauzas: ^^ | 15:30 |
sean-k-mooney | alex_xu: it is pinning both the RAM and the cores of the guest to float over host NUMA node 0 | 15:30 |
alex_xu | sean-k-mooney: yes, but I think that is based on an old patchset, since that CI job always pulls an old version of luyao's patch | 15:31 |
*** ociuhandu has joined #openstack-nova | 15:31 | |
alex_xu | in new version, it shouldn't have the pinning | 15:31 |
sean-k-mooney | alex_xu: but it is not specifying a constraint on the pmem device as far as I can see | 15:31 |
alex_xu | sean-k-mooney: yes, that is the output for patchset13, sorry | 15:33 |
sean-k-mooney | alex_xu: I'm not sure that is a good thing, but ok. do you have an updated run? | 15:33 |
sean-k-mooney | I'll check the CI logs I guess | 15:33 |
alex_xu | sean-k-mooney: luyao is trying to get one for you | 15:33 |
sean-k-mooney | http://52.27.155.124/93/671793/23/check/pmem-tempest-plugin-filtered/23b32fa/ should have them, right? | 15:34 |
sean-k-mooney | that's from PS 23 | 15:34 |
sean-k-mooney | I found one | 15:34 |
alex_xu | sean-k-mooney: no, it won't, the CI always tries to apply PS13; Rui is trying to remove that. That is why you can see other patches also pass the pmem CI test | 15:35 |
*** macz has joined #openstack-nova | 15:35 | |
*** gyee has joined #openstack-nova | 15:35 | |
sean-k-mooney | in the new code, will both the numatune and cputune elements be removed if only a pmem device is requested in the flavor? | 15:37 |
*** ociuhandu has quit IRC | 15:38 | |
kashyap | sean-k-mooney: To my eyes this whole NUMA and PMEM interaction looks subtle enough that some more functional testing is required... | 15:39 |
*** tbachman has joined #openstack-nova | 15:41 | |
sean-k-mooney | kashyap: well there are functional tests in https://review.opendev.org/#/c/678470/27 | 15:41 |
sean-k-mooney | but i have not looked at them yet | 15:41 |
kashyap | sean-k-mooney: Ah, missed it; thx | 15:42 |
sean-k-mooney | ... it's locking up my Firefox window trying to open it for some reason | 15:43 |
bauzas | gibi: https://review.opendev.org/#/c/681238/3 | 15:44 |
bauzas | gibi: can I update the commit msg ? | 15:44 |
sean-k-mooney | kashyap: the functional tests will not assert anything about the XML generation | 15:44 |
gibi | bauzas: sure go ahead | 15:44 |
*** factor has joined #openstack-nova | 15:45 | |
kashyap | sean-k-mooney: Sure, I see those are already in the main change. | 15:45 |
*** icarusfactor has quit IRC | 15:45 | |
sean-k-mooney | there are no assertions made about the numa element in the main change as far as I can tell | 15:46 |
sean-k-mooney | rather numatune and cputune | 15:46 |
*** ociuhandu has joined #openstack-nova | 15:47 | |
bauzas | gibi: cool, will do and +2 | 15:48 |
gibi | bauzas: thanks a lot | 15:48 |
mriedem | gibi: nits in https://review.opendev.org/#/c/676140/19 for a follow up | 15:48 |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Fix the race in confirm resize func test https://review.opendev.org/681238 | 15:48 |
sean-k-mooney | efried_afk: alex_xu I'll try to come back to the pmem stuff in an hour or so. I need to review stephenfin's patch then sync with artom | 15:49 |
gibi | mriedem: ack. thanks | 15:49 |
sean-k-mooney | stephenfin: you wanted me to review https://review.opendev.org/#/c/671793/ specifically, right? should I also review the rest? | 15:50 |
stephenfin | sean-k-mooney: would be helpful, aye | 15:50 |
alex_xu | sean-k-mooney: i got one http://paste.openstack.org/show/774842/ | 15:50 |
stephenfin | I'm marching through the pmem stuff | 15:50 |
* bauzas is done for the day, sorry stephenfin :( | 15:50 | |
bauzas | but I'll look at your series tomorrow morning | 15:51 |
sean-k-mooney | alex_xu: looking | 15:51 |
artom | sean-k-mooney, for once, I think I'm good, and do not require your excellent services :) | 15:51 |
sean-k-mooney | alex_xu: I think that is correct. it is creating a virtual NUMA topology but it is not tying it to the host in any way | 15:52 |
sean-k-mooney | artom: the pmem device is also associated with the virtual guest NUMA node 0 | 15:53 |
*** gbarros has quit IRC | 15:53 | |
sean-k-mooney | artom: are you working on fixing the persistence issue? | 15:53 |
sean-k-mooney | artom: I just wanted to circle back and see if you had anything for me to test, or if I should start looking into where to fix the issue | 15:54 |
artom | sean-k-mooney, I added instance.refresh() back in, so that's settled | 15:54 |
alex_xu | sean-k-mooney: but I think this is wrong https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@5458 | 15:54 |
sean-k-mooney | artom: ok did you push that? | 15:54 |
artom | The func tests weren't hitting it because driver.cleanup() is called conditionally, and the func test env isn't meeting those conditions | 15:54 |
artom | sean-k-mooney, I did push | 15:54 |
alex_xu | sean-k-mooney: I checked the nova show, I saw there is "hw:numa_nodes" being added, so I guess that is persisted in the db | 15:55 |
sean-k-mooney | ok then ill check it locally unless you have already | 15:55 |
artom | If I just change the code to always call it, I can reproduce | 15:55 |
sean-k-mooney | artom: that is not correct | 15:55 |
artom | And instance.refresh() does indeed fix it | 15:55 |
artom | sean-k-mooney, I know :) It was just to test | 15:55 |
mriedem | stephenfin: did you ever re-post your PCPU upgrade ML thread with [nova] tagged on it to actually get operator visibility? | 15:55 |
artom | Next step is to trigger driver.cleanup in the "real" way in func tests | 15:55 |
sean-k-mooney | we should not see hw:numa_nodes in nova show | 15:55 |
alex_xu | sean-k-mooney: yes, we shouldn't | 15:55 |
sean-k-mooney | artom: sorry, that was for alex | 15:56 |
sean-k-mooney | artom: i have not looked at your change | 15:56 |
stephenfin | mriedem: No, it didn't seem necessary since we'd solved the upgrade issue in a way that didn't require anything special from the operator | 15:56 |
*** damien_r has quit IRC | 15:56 | |
stephenfin | outside of bog standard config options | 15:56 |
sean-k-mooney | alex_xu: we should not see hw:numa_nodes=1 if its not in the flavor | 15:56 |
sean-k-mooney | alex_xu: we do not see that when you get implicit NUMA topologies in other cases | 15:56 |
sean-k-mooney | alex_xu: so if you are seeing it then the code is incorrect | 15:57 |
artom | sean-k-mooney, we're good, don't worry :) | 15:57 |
mriedem | stephenfin: isn't dansmith's comment all about a nasty upgrade problem? | 15:57 |
alex_xu | sean-k-mooney: I think the problem is that the patch changes instance.flavor directly; after driver.spawn, nova-compute updates the instance object, then persists it into the db. | 15:57 |
mriedem | to which operators, like mnaser, might want to weigh in? | 15:57 |
sean-k-mooney | alex_xu: if we were to cold migrate the instance, the behavior would change if we save it to the db | 15:57 |
mnaser | hm | 15:57 |
* mnaser can read context now while munching food | 15:57 | |
dansmith | mriedem: yeah, the "plan" doesn't seem super great to me as currently laid out :/ | 15:57 |
sean-k-mooney | alex_xu: yeah, we should not be changing the flavor at all | 15:57 |
*** ociuhandu has quit IRC | 15:58 | |
stephenfin | mriedem: an intractable one though. Even if operators don't like the little dance we're doing, I fail to see how there's an alternative | 15:58 |
mriedem | mnaser: you'd need someone to tl;dr it (i would also) | 15:58 |
sean-k-mooney | alex_xu: the other cases don't change the flavor, they just create a NUMA topology | 15:58 |
mriedem | since there are 5 conversations going on at once in here right now | 15:58 |
mnaser | sounds like an ML post? | 15:58 |
mnaser | that i can read | 15:58 |
mriedem | mnaser: there was one which no operators read :) | 15:58 |
mriedem | b/c it wasn't tagged for [nova] or [ops] | 15:58 |
sean-k-mooney | so the pmem code is taking a shortcut by updating the flavor. on a hard reboot that instance would be pinned | 15:58 |
*** ociuhandu has joined #openstack-nova | 15:58 | |
stephenfin | sean-k-mooney: I'm looking at that at the moment. I don't like it. | 15:59 |
stephenfin | Not at all | 15:59 |
stephenfin | Assuming you're referring to https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py@5458 | 15:59 |
mriedem | stephenfin: for the sake of everyone's clarity, could you post a new ML thread with the proposed upgrade path for PCPU and tag with [nova] and [ops]? | 15:59 |
sean-k-mooney | stephenfin: if we need to create a NUMA topology we should move it to where we do it for hugepages, right? | 15:59 |
sean-k-mooney | stephenfin: yes | 15:59 |
sean-k-mooney | stephenfin: that is the hack that i dont like | 16:00 |
stephenfin | sean-k-mooney: exactly what I'm writing in a comment as we speak | 16:00 |
stephenfin | what is it with people trying to hack flavors :D | 16:00 |
stephenfin | mriedem: sure | 16:00 |
stephenfin | though I really don't see the point | 16:00 |
dansmith | stephenfin: "intractable" is a bit of a silly characterization :) | 16:00 |
alex_xu | sean-k-mooney: yea, agree with you | 16:00 |
stephenfin | because the only people that can solve this are in this channel/on the review already | 16:00 |
sean-k-mooney | stephenfin: the issue is that we want a NUMA topology in the XML, but not in the NUMA topology filter | 16:01 |
stephenfin | dansmith: Possibly :) I have been thinking about this for quite some time though and we've gone through a lot of options, so it starts looking like that to me, heh | 16:01 |
*** ociuhandu has quit IRC | 16:02 | |
*** ociuhandu has joined #openstack-nova | 16:03 | |
sean-k-mooney | alex_xu: since stephenfin is looking at it im gong to review his cpu code then ill come back to this after i test artoms code | 16:03 |
mriedem | stephenfin: saying "because the only people that can solve this are in this channel/on the review already" is not true imo - if you've got a hard upgrade thing coming for operators, you likely should get some feedback from them before pushing forward | 16:03 |
alex_xu | sean-k-mooney: thanks a lot | 16:03 |
sean-k-mooney | mriedem: the upgrade will be significantly harder if we also have to deal with NUMA in placement in the same release | 16:04 |
mriedem | sean-k-mooney: i don't know what that has to do with this at all | 16:04 |
sean-k-mooney | mriedem: if we defer PCPUs in placement to U we will have to deal with both in one go | 16:04 |
mriedem | i didn't say anything about deferring | 16:04 |
mriedem | i said, does anyone outside of the 3 people reviewing this that will actually have to deal with the upgrade know what the plan is | 16:05 |
mriedem | and are they ok with it | 16:05 |
*** tbachman has quit IRC | 16:05 | |
sean-k-mooney | right, but the current upgrade approach is the best we could come up with, and we went to the MLs and asked if the toggle was ok | 16:05 |
mriedem | and no operators even saw that thread, | 16:05 |
mriedem | which is why i asked (again) if it could be posed with a [nova][ops] tag | 16:06 |
mriedem | to get visibility | 16:06 |
mriedem | lack of feedback from operators is not agreement | 16:06 |
*** tesseract has quit IRC | 16:06 | |
sean-k-mooney | well we did ask cern in irc | 16:06 |
*** rpittau is now known as rpittau|afk | 16:06 | |
dansmith | I think it's probably good to get feedback not just from ops,. | 16:06 |
sean-k-mooney | but I would have liked others to comment too | 16:07 |
dansmith | but from people that have to do this in the deployment tools | 16:07 |
mriedem | it would be a lot better to know before releasing train that "this sucks but it's not terrible" rather than "this is a no-go for me" | 16:07 |
dansmith | as this adds at least one more atomic reconfigure/restart of the deployment | 16:07 |
mriedem | sure, i lump mnaser into the ops and tooling (OSA) camps | 16:07 |
dansmith | yup | 16:08 |
stephenfin | I don't see what the actual issue is though | 16:08 |
sean-k-mooney | dansmith: for what it's worth we talked about this internally with our tripleo folks that will be implementing it, and they were ok and actually preferred the separate config flip step | 16:08 |
mriedem | sean-k-mooney: and cern (surya? belmiro?) said what? | 16:08 |
sean-k-mooney | mriedem: we asked belmiro | 16:08 |
dansmith | sean-k-mooney: preferred to what? | 16:08 |
sean-k-mooney | and he was ok with the config | 16:08 |
stephenfin | You do your upgrade and nothing changes. At some point after the upgrade, you go tweak knobs on the compute nodes followed by a knob on the scheduler | 16:08 |
stephenfin | and you're done | 16:08 |
sean-k-mooney | ill see if i can find the irc logs | 16:08 |
dansmith | stephenfin: and restart the whole deployment atomically :) | 16:08 |
stephenfin | no, you don't need to do that | 16:09 |
dansmith | no? | 16:09 |
dansmith | you say "immediately" in your comment | 16:09 |
stephenfin | I said we'd have to do that immediately if I wasn't doing the things I was doing to prevent that | 16:09 |
sean-k-mooney | dansmith: the alternative was to do the double reporting of resources as both VCPU and PCPU, by the way, and that was not done for a specific reason I can't remember | 16:10 |
*** tbachman has joined #openstack-nova | 16:10 | |
dansmith | sean-k-mooney: yeah, that's a terrible alternative, agreed :) | 16:10 |
dansmith | "Would you prefer an extra config step or a kick in the nuts?" | 16:10 |
dansmith | I could get most people to agree to the first | 16:11 |
sean-k-mooney | it did not require the config and it was self healing but ok | 16:11 |
dansmith | seems to me that with the current plan, | 16:11 |
dansmith | after they've upgraded, | 16:12 |
sean-k-mooney | i think we did not do it because of an issue with reshapes | 16:12 |
stephenfin | sean-k-mooney: Not reshapes, no | 16:12 |
stephenfin | http://lists.openstack.org/pipermail/openstack-discuss/2019-August/008501.html | 16:12 |
dansmith | they have to change all (or some fraction) of their computes to the new config to expose the new resources, | 16:12 |
dansmith | restart them, | 16:12 |
dansmith | then tweak the scheduler config to ask for the new thing, then restart those, | 16:12 |
dansmith | then fix the rest of the computes before running out of capacity, and then restart those | 16:12 |
dansmith | right? | 16:12 |
*** lpetrut has quit IRC | 16:12 | |
sean-k-mooney | dansmith more or less | 16:12 |
dansmith | that's the most graceful thing | 16:12 |
stephenfin | dansmith: exactly, yeah | 16:13 |
dansmith | which sounds like (a) hard to automate and (b) laborious | 16:13 |
dansmith | otherwise you're looking for full atomic downtime while you do all that in one go | 16:13 |
dansmith | I can't imagine OSA is going to decide what the sufficient fraction for conversion is, | 16:13 |
sean-k-mooney | well, for FFU we take down the whole cloud control plane so that's not unprecedented | 16:13 |
dansmith | convert that set, reconfig/restart control services, etc | 16:14 |
stephenfin | the scheduler option exists to prevent the need for that atomic upgrade | 16:14 |
dansmith | sean-k-mooney: this is for rolling one release | 16:14 |
stephenfin | *exists solely | 16:14 |
*** bbowen__ has quit IRC | 16:14 | |
mriedem | sean-k-mooney: vexxhost doesn't need to FFU because they actually don't suck at CD | 16:14 |
dansmith | stephenfin: which likely only works for the case where the humans decide when to throw that switch | 16:14 |
sean-k-mooney | mriedem: :) yes, but telcos don't upgrade until the last second. | 16:15 |
stephenfin | dansmith: Yeah, I've told mschuppert et al internally to not even try automating this in TripleO | 16:15 |
dansmith | stephenfin: exactly | 16:15 |
stephenfin | dansmith: But manual human intervention is going to be necessary anyway | 16:15 |
mriedem | oh right, openstack's only consumer, telco's | 16:15 |
dansmith | stephenfin: I don't think that's a given | 16:15 |
stephenfin | yeah, it is | 16:15 |
dansmith | alright, well, end of discussion then huh? | 16:16 |
sean-k-mooney | stephenfin: well, they will be automating it as a separate step that you run, but the full details are tbd | 16:16 |
stephenfin | wait, I'm preparing my longer answer :) | 16:16 |
stephenfin | we can't tell if a host is intended for pinned workloads, unpinned workloads or (bad!) both | 16:16 |
dansmith | if converting from the current cpuset config to the new one is not something a computer can automate, then this is all unreasonable, IMHO | 16:17 |
stephenfin | so we can't therefore tell whether we should be mapping 'vcpu_pin_set' to '[compute] cpu_dedicated_set' or '[compute] cpu_shared_set' | 16:17 |
stephenfin | assuming 'vcpu_pin_set' is even set, which it doesn't have to be | 16:17 |
mriedem | stephenfin: couldn't we detect that if the compute was reporting a trait saying what it's configured for? | 16:17 |
dansmith | mriedem: he's saying the config doesn't currently include the intended behavior | 16:17 |
dansmith | which is fine, the operator may have to tell the tool what they're using their pinning for, | 16:18 |
stephenfin | mriedem: To paraphrase a kid with a spoon, there is no trait | 16:18 |
dansmith | but the actual conversion of the formats does not need to be hand-edited everywhere | 16:18 |
mriedem | stephenfin: i don't know what that means (the spoon kid thing) but sure there is never a trait unless we add one | 16:18 |
dansmith | matrix | 16:18 |
dansmith | can't believe you didn't get a 90s movie reference dude | 16:18 |
mriedem | my point was, if the control plane needs to know things about how the compute is configured/supported, then we use traits for that now | 16:18 |
mriedem | i'm not a matrix fanboy | 16:19 |
* dansmith glares | 16:19 | |
stephenfin | mriedem: We don't need a trait though - we have resources | 16:19 |
dansmith | right, that's not really the problem | 16:19 |
stephenfin | the problem is that we're going from a world where two different types of resource have been munged together, and we're trying to unmunge them as cleanly as possible | 16:20 |
dansmith | the controller side of this seems easy to make flexible enough to handle the rolling config of computes to me | 16:20 |
stephenfin | using service versions? | 16:21 |
dansmith | I would say that the scheduler should ask for the new format by default. If placement returns some options, then we filter and schedule to those if possible. Basically, prefer the upgraded machines | 16:21 |
dansmith | if we get back no candidates or filter them all out, | 16:21 |
sean-k-mooney | we are going from a world where the VCPU resource was the number of virtual CPUs available, meaning the number of shared CPUs | 16:21 |
dansmith | we check the service version to determine if there are old computes in the deployment. If so, we query again for the older format to see if there's any room that way | 16:21 |
artom | So... maybe we need entirely new resource names then? | 16:21 |
dansmith | potentially cache that determination for ten minutes or something, but... | 16:22 |
mriedem | artom: moving away from VCPU is a non starter to me | 16:22 |
dansmith | then you can upgrade and convert computes in one step, which is what OSA and other tools are going to want to do.. | 16:22 |
mriedem | it's baked into *everything* | 16:22 |
dansmith | they want to upgrade and fix the config as one step generally | 16:22 |
artom | mriedem, I'm not saying remove it, I'm saying leave it "legacy CPU resource thing", and come up with new resources to mean shared CPU and dedicated CPU | 16:23 |
dansmith | artom: -3 | 16:23 |
* artom shuts up in a corner | 16:23 | |
artom | Anyways, I have 0 context and func tests to fix | 16:23 |
* artom tries to stop bike-shed-astinating | 16:24 | |
* dansmith whips artom like a mule | 16:24 | |
sean-k-mooney | artom: we talked about SCPU and PCPU in the past but we said no, we want to keep VCPU resources | 16:24 |
artom | dansmith, hey man, dinner first | 16:24 |
sean-k-mooney | so it's way too late to go back down that route | 16:24 |
stephenfin | dansmith: so tl;dr: kill the static scheduler-only config option and instead do "give me PCPU, but if you can't give me PCPU then search for VCPU" instead | 16:24 |
dansmith | stephenfin: yes, but only the last part of there are old computes around. once you do that one time and find everything is upgraded, stop even doing that check | 16:25 |
dansmith | stephenfin: remove that compat step in U, no deprecation cycle needed for that | 16:25 |
stephenfin | What do you mean by old computes? | 16:25 |
sean-k-mooney | dansmith: if we did that we would need a trait to make the second query safe | 16:25 |
mriedem | stephenfin: older than train computes | 16:26 |
mriedem | based on the nova-compute service rpc api version | 16:26 |
dansmith | stephenfin: service version will tell you if all computes have been upgraded.. so actually, maybe just always do that in T if no candidates, because they could do the upgrade and the config tweak separately | 16:26 |
sean-k-mooney | we would need a support_PCPU capability trait and need to add it as a forbidden trait for the second query | 16:26 |
*** bbowen__ has joined #openstack-nova | 16:26 | |
stephenfin | that won't work though | 16:26 |
dansmith | stephenfin: so yeah, what you said | 16:26 |
stephenfin | because by default we don't report PCPU on Train | 16:26 |
stephenfin | doing so would force people to set new config options as soon as they upgrade | 16:27 |
stephenfin | which we can't do | 16:27 |
sean-k-mooney | stephenfin: if you look at artom's migration code, we ignore the "can live migrate with numa" config if everything is upgraded | 16:27 |
dansmith | stephenfin: that's why I backed away from the version check there | 16:27 |
sean-k-mooney | that would still require everything to be restarted however. | 16:28 |
stephenfin | sean-k-mooney: right, but that's because the "can live migrate with numa" check only needs to know that code is new enough. it doesn't need the operator to tweak some config first | 16:28 |
stephenfin | which this does | 16:28 |
stephenfin | dansmith: Ah, yeah, I missed the "so actually" | 16:28 |
dansmith | stephenfin: the reason I was heading in that direction, is because: | 16:29 |
sean-k-mooney | I think if we were to do the double query, as I said, we need the compute capability trait to protect against landing on a new host when old hosts are available and we ask for VCPUs | 16:29 |
mriedem | you'd only do that fallback for VCPU if the flavor in the request spec (in the scheduler) has resources:PCPU=x right? | 16:29 |
stephenfin | sean-k-mooney: not gonna happen - the NUMATopologyFilter or libvirt driver protect us | 16:30 |
dansmith | stephenfin: I was going to suggest we also always expose the inventory, even if we have to synthesize it from the older config, but I'm guessing you're going to say we'd potentially expose as shared or dedicated and be wrong for the intention of the operator right? | 16:30 |
sean-k-mooney | mriedem: no we are translating hw:cpu_policy=dedicated into PCPU requests | 16:30 |
dansmith | stephenfin: require them to convert their configs before U, but expose the new inventory right away | 16:30 |
mriedem | ok but my point is, just make sure we're not unconditionally doing that fallback re-query | 16:30 |
stephenfin | sean-k-mooney: because if you land on a Train compute node, that'll have explicitly set 'NUMATopology.pcpuset' to None | 16:30 |
dansmith | mriedem: yes, only do the fallback query for PCPU things | 16:31 |
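To make the proposal above concrete, here is a very rough Python sketch of the query-then-requery idea; every name here is a placeholder standing in for real scheduler and placement plumbing, not nova's actual code. Ask placement for the PCPU-based request first, and only if that returns nothing, and the request actually involved pinned CPUs, retry with the legacy VCPU translation while not-yet-reconfigured computes may still exist.

```python
# Hedged pseudocode sketch; the callables stand in for real scheduler and
# placement plumbing and are assumptions, not nova's actual interfaces.
def get_candidates_with_fallback(request_spec, query_placement,
                                 wants_pcpu, old_computes_may_exist):
    # New-style request: hw:cpu_policy=dedicated translated to resources:PCPU.
    candidates = query_placement(request_spec, use_pcpu=True)
    if candidates:
        return candidates
    # Fall back only for requests that asked for PCPU in the first place,
    # and only while not-yet-reconfigured computes may still exist.
    if wants_pcpu(request_spec) and old_computes_may_exist():
        # Legacy translation: pinned CPUs still requested as VCPU.
        return query_placement(request_spec, use_pcpu=False)
    return []
```

As the discussion above notes, something like the NUMATopologyFilter (or a compute capability trait) would still be needed to keep the fallback results off hosts that only expose the new-style inventory.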
sean-k-mooney | ok, we could have an issue with the limit parameter on placement | 16:31 |
stephenfin | and if that's None, we've got nothing to pin to | 16:31 |
stephenfin | so placement will pass but the filter will fail | 16:31 |
sean-k-mooney | but if we do the second query without a limit then yeah, the NUMA topology filter would prevent that | 16:31 |
sean-k-mooney | so we dont need the trait | 16:31 |
stephenfin | dansmith: Yeah, exactly | 16:32 |
dansmith | I don't think the limit is a problem | 16:32 |
*** N3l1x has joined #openstack-nova | 16:32 | |
dansmith | stephenfin: so they express their intent right now how? | 16:32 |
sean-k-mooney | dansmith: doesn't CERN set it to like 15 | 16:32 |
stephenfin | dansmith: My previous solution had been to expose CPU inventory on hosts without the new configuration as both PCPU and VCPU | 16:32 |
dansmith | sean-k-mooney: we're asking placement for a query that will return things with PCPU resources.. if placement returns nothing then there's nothing that will fit | 16:33 |
sean-k-mooney | the default limit is 1000 so that should not be a problem, but if it's really low then I expect it could be | 16:33 |
dansmith | sean-k-mooney: their problem is different | 16:33 |
sean-k-mooney | dansmith: I'm not talking about the first query for PCPUs | 16:33 |
dansmith | stephenfin: yeah, I don't think that's a good plan | 16:33 |
sean-k-mooney | dansmith: I meant the second query for VCPUs | 16:33 |
*** gbarros has joined #openstack-nova | 16:33 | |
stephenfin | Yeah, it's really not. I detailed why in that ML post | 16:33 |
dansmith | sean-k-mooney: I don't see what the problem is | 16:34 |
sean-k-mooney | the new host can have inventories of both | 16:34 |
sean-k-mooney | we could get 15 new hosts and no old hosts | 16:34 |
dansmith | stephenfin: I'm asking about today.. they use this ambiguous config thing.. how do they control which type land where? | 16:34 |
sean-k-mooney | then the NUMA topology filter would eliminate all hosts | 16:34 |
stephenfin | the scheduler option and the NUMATopologyFilter | 16:35 |
*** mdbooth has joined #openstack-nova | 16:35 | |
dansmith | stephenfin: meaning static config to determine how to treat pinned cpu requests? | 16:35 |
stephenfin | If the scheduler option is set to False (so it's not translating 'hw:cpu_policy=dedicated' to 'resources:PCPU') then we'll keep requesting VCPU from placement | 16:36 |
dansmith | no, | 16:36 |
dansmith | I'm asking about TODAY | 16:36 |
stephenfin | oh, today | 16:36 |
dansmith | not the mythical future where your patches are landed | 16:36 |
stephenfin | gotcha | 16:36 |
stephenfin | host aggregates | 16:36 |
dansmith | right, okay | 16:36 |
stephenfin | and metadata | 16:36 |
dansmith | that's what I thought | 16:36 |
stephenfin | that's what you're _supposed_ to use, of course. We never enforced it | 16:36 |
dansmith | so the problem is that computes literally don't have access to the information they need to know what kind of thing to expose, because they have config, but no access to the aggregate info | 16:37 |
sean-k-mooney | and Wind River developed a host agent to allow you to mix | 16:37 |
sean-k-mooney | so they exploited that we did not enforce it | 16:37 |
stephenfin | correct | 16:37 |
dansmith | stephenfin: so, if we defaulted to the looser interpretation of the inventory, | 16:37 |
stephenfin | the aggregate metadata isn't standardized | 16:37 |
stephenfin | and it doesn't even need to be set | 16:37 |
dansmith | then the aggregate configs would still be in place and we could fall back appropriately maybe? | 16:37 |
efried_afk | stephenfin: Not having looked thoroughly, if I addressed your -1s on the pmem series, a) would I still be able to +2 in your opinion; b) would it do any good over waiting for luyao to hit it overnight, in terms of you being able to get back to it today? | 16:37 |
dansmith | anyway, we're getting off on a tangent a bit I think, so let me summarize: | 16:38 |
dansmith | I think we can do away with the scheduler knob by doing the query-nay-requery approach to prioritize upgraded and converted computes | 16:39 |
*** igordc has joined #openstack-nova | 16:39 | |
dansmith | it would be nice if we could make the compute side smarter and/or default to a closer runtime scenario, | 16:39 |
stephenfin | efried_afk: So if the comments were addressed could you +2 in my absence? I guess, but I'm happy to review again tomorrow too | 16:39 |
dansmith | but I'm less concerned about that if they can be reconfigured and restarted in isolation to be picked up by the scheduler by default | 16:39 |
*** efried_afk is now known as efried | 16:39 | |
efried | stephenfin: I meant me +2ing on myself, not assuming your +2. | 16:39 |
stephenfin | ohh | 16:40 |
stephenfin | Yeah, sure. There's no major rework necessary in anything I've reviewed so far | 16:40 |
stephenfin | this will need a good bit of modification though, I suspect https://review.opendev.org/#/c/678455/25/nova/virt/libvirt/driver.py | 16:41 |
*** jawad_axd has quit IRC | 16:41 | |
stephenfin | dansmith: Yup, that all makes sense to me | 16:41 |
efried | I ain't touching that patch | 16:43 |
stephenfin | good call :) | 16:43 |
gibi | mriedem: replied in https://review.opendev.org/#/c/676140 with some questions regarding the private helper you suggested | 16:43 |
*** igordc has quit IRC | 16:44 | |
*** derekh has quit IRC | 16:44 | |
* gibi needs to drop for today | 16:45 | |
*** maciejjozefczyk has joined #openstack-nova | 16:47 | |
mriedem | gibi: replied | 16:48 |
*** awalende has joined #openstack-nova | 16:55 | |
*** maciejjozefczyk has quit IRC | 16:56 | |
*** awalende has quit IRC | 17:00 | |
*** ociuhandu_ has joined #openstack-nova | 17:00 | |
*** ociuhandu has quit IRC | 17:03 | |
*** brault has joined #openstack-nova | 17:03 | |
*** ociuhandu_ has quit IRC | 17:05 | |
*** brault has quit IRC | 17:08 | |
sean-k-mooney | I need to take a break for a bit to clear my head so I'm going to have something to eat. | 17:11 |
sean-k-mooney | I just kicked off stacking with the latest version of artom's code | 17:11 |
sean-k-mooney | I'll start testing it when I get back | 17:11 |
*** udesale has quit IRC | 17:12 | |
*** tbachman has quit IRC | 17:14 | |
openstackgerrit | Merged openstack/os-resource-classes master: Update api-ref link to canonical location https://review.opendev.org/681235 | 17:21 |
*** brault has joined #openstack-nova | 17:22 | |
efried | dansmith: bottom two vpmems on your radar for today? https://review.opendev.org/#/c/678447/ | 17:23 |
efried | hopefully easy, minor updates per your prior comments | 17:24 |
dansmith | efried: no, I gotta give a talk in a bit and I spent too much time this morning on reviews already | 17:24 |
dansmith | if comments were addressed it's probably easy for other people to confirm, | 17:24 |
efried | if you're okay with that, then sure. | 17:25 |
dansmith | but for the record, I'm scared of both pcpu and vpmems at this point | 17:25 |
efried | imo the changes are simple and in line with what you requested | 17:25 |
efried | I don't blame you. But hey, it's just FF. We have *weeks* to fix bugs :P | 17:25 |
* efried feeds face | 17:26 | |
*** brault has quit IRC | 17:27 | |
*** ralonsoh has quit IRC | 17:32 | |
*** cdent has quit IRC | 17:33 | |
*** nicolasbock has quit IRC | 17:35 | |
*** tbachman has joined #openstack-nova | 17:37 | |
*** ralonsoh has joined #openstack-nova | 17:39 | |
*** nicolasbock has joined #openstack-nova | 17:40 | |
mriedem | i'll call it, the first major bugs from pmem and pcpu regressions will be after vexxhost upgrades to train, and then in about 18-24 months when other deployments start upgrading to train :) | 17:43 |
mriedem | maybe cern in a year | 17:43 |
dansmith | heh, yeah, we won't find any of the bugs in the FF->release window | 17:43 |
*** igordc has joined #openstack-nova | 17:44 | |
mriedem | is there any way to test pcpu in a gate job if we ran tempest smoke tests in serial or something? like wouldn't that just be a matter of creating a flavor with PCPU, single node devstack? | 17:45 |
mriedem | and configuring n-cpu for the dedicated CPUs on the host? | 17:45 |
mriedem | basically all of them | 17:45 |
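A rough sketch of the pieces such a single-node devstack job would need, shown as Python data purely for illustration (the hw:cpu_policy extra spec and the [compute] cpu_dedicated_set/cpu_shared_set option names are the Train ones; every value here is an assumption, not an existing job definition):

    # Illustrative only: a flavor that requests dedicated CPUs, plus the
    # compute-side CPU partitioning a PCPU gate job would likely configure.
    pcpu_flavor = {
        "name": "pcpu-smoke",
        "vcpus": 1,
        "ram": 256,
        "disk": 1,
        # hw:cpu_policy=dedicated makes the request go to PCPU instead of VCPU
        "extra_specs": {"hw:cpu_policy": "dedicated"},
    }

    # nova.conf on the single compute node, expressed as a dict for readability:
    compute_conf = {
        "compute": {
            "cpu_dedicated_set": "2-7",  # host CPUs reserved for pinned guests
            "cpu_shared_set": "0-1",     # host CPUs left for floating guests
        },
    }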
*** trident has quit IRC | 17:46 | |
dansmith | I'm less concerned about if it actually works in a contrived scenario, and more about an existing deployment trying to get through the upgrade and/or being able to use it without regressions or other issues | 17:46 |
mriedem | sure, and functional tests are good, they just aren't a replacement for the real thing | 17:47 |
dansmith | presumably vpmem testing is pretty much impossive | 17:47 |
dansmith | *impossible | 17:47 |
mriedem | i assume so, hence the 3rd party ci | 17:47 |
*** igordc has quit IRC | 17:50 | |
*** gbarros has quit IRC | 17:51 | |
artom | Which, btw, came up super fast (though I dunno if it had been in the works for a long time before) | 17:57 |
*** igordc has joined #openstack-nova | 17:57 | |
artom | RH can learn a thing or 2 | 17:57 |
*** gbarros has joined #openstack-nova | 17:58 | |
*** trident has joined #openstack-nova | 17:59 | |
dansmith | mriedem: so numa LM should be ready for you, if I understand the current state right? | 18:02 |
dansmith | I hit the object patch again a bit ago | 18:02 |
dansmith | I think the third patch should be good too, but you wanted some verification from the NUMA boyz which sent it off into that cpu pinning tangent, so not sure if you're still waiting for something there | 18:03 |
*** igordc has quit IRC | 18:06 | |
*** igordc has joined #openstack-nova | 18:06 | |
mriedem | artom: came up super fast? you mean the 3rd party CI did? | 18:10 |
artom | mriedem, yeah | 18:10 |
artom | Intel_Zuul appeared a few days ago | 18:10 |
mriedem | dansmith: yeah it's sitting in a tab, was going through gibi's series until i hit a stopping point, which i just did | 18:10 |
*** jdillaman has joined #openstack-nova | 18:10 | |
mriedem | gibi: https://review.opendev.org/#/c/676972/ re-introduces the race you just fixed, so i'll rebase the series and fix that and +W once your other change is merged | 18:10 |
artom | I talked with efried a few days ago, mentioned the apparent disconnect between the "burden of proof" on me for NUMA LM, vs the VPMEM stuff that was apparently ready to go in without any public demonstrations of it working | 18:11 |
mriedem | and "Intel NFV CI" comments on everything immediately just to say it was skipped, which is annoying | 18:11 |
mriedem | efried: any way you can get "Intel NFV CI" to shut up if it's not going to do anything? | 18:11 |
mriedem | it hasn't done anything for years | 18:11 |
artom | Dunno if Intel_Zuul popping up was related, but the coincidence was interesting :) | 18:11 |
mriedem | artom: fwiw i'm not ready for vpmem to go in | 18:12 |
efried | mriedem, artom: The Intel_Zuul is actually running. It's only running pmem, three tests. Unfortunately at the moment it's hardcoded to pull down an old version of the pmem series. | 18:12 |
dansmith | artom: well, one difference is that yours has the potential to break existing functionality, whereas vpmem hopefully won't break anything existing, only people that try to use it | 18:12 |
efried | right ^ | 18:12 |
artom | dansmith, yeah, efried made the same argument - and it's true | 18:12 |
dansmith | artom: not that it's no risk at all, but it is a teensy bit different | 18:12 |
artom | dansmith, yep, I get you | 18:13 |
mriedem | except all of the resource tracker weird side bugs refactoring that code will probably introduce on everyone | 18:13 |
dansmith | well, that's true | 18:13 |
efried | mriedem: retrieving the allocations earlier than we were before. That's the only difference. | 18:13 |
efried | All of it runs under COMPUTE_RESOURCE_SEMAPHORE | 18:13 |
artom | Anyways, my point was: Intel spun up a CI for their RFE in a matter of days (apparently). RH sucks if we can't do the same | 18:13 |
efried | did before, does now. | 18:13 |
mriedem | artom: the guy was working on that since denver i think | 18:14 |
mriedem | efried: btw i was harassing you about "Intel NFV CI" not Intel_Zuul | 18:14 |
artom | mriedem, on the CI? | 18:14 |
efried | yeah, I can go ask, but it wasn't like days | 18:14 |
mriedem | "Intel NFV CI" is an old thing that no longer does anything except comment that it's not doing anything | 18:14 |
mriedem | artom: yeah | 18:14 |
mriedem | we were talking about 3rd party CI in denver | 18:14 |
mriedem | for vpmem | 18:14 |
artom | mriedem, OK, I'll eat my words then | 18:14 |
mriedem | you and the other red hat bros might have been hungover still from the bar the night before :P | 18:15 |
efried | mriedem: fwiw Intel NFV CI they're trying to resurrect to do the thing it was originally intended for. | 18:15 |
*** igordc has quit IRC | 18:15 | |
mriedem | efried: i've been hearing that for months | 18:15 |
efried | first step was turning it back on to be a no-op | 18:15 |
mriedem | i'd like it to shut up until it actually delivers | 18:15 |
efried | yeah I know. | 18:15 |
artom | mriedem, I have Russian roots, calling me an alcoholic is a noop ;) | 18:15 |
sean-k-mooney | mriedem: we should be able to test the pcpu stuff in the gate yes | 18:16 |
mriedem | https://www.youtube.com/watch?v=soNcOfRvOtg | 18:16 |
dansmith | <3 | 18:17 |
dansmith | I love me some early 'priest | 18:17 |
sean-k-mooney | i was talking to a few people on the infra channel and i think we might be able to replace the intel nfv ci with first party ci. | 18:17 |
*** eharney has quit IRC | 18:17 | |
sean-k-mooney | we might be able to get multi numa nested virt labels from limestone and vexxhost in the future | 18:18 |
sean-k-mooney | we can do non-numa testing in the gate already as we have 3 providers with nested virt capability | 18:18 |
sean-k-mooney | or rather single numa | 18:19 |
sean-k-mooney | the testing i set up for the numa live migration was/is testing with cpu pinning, multiple numa nodes and hugepages | 18:21 |
mriedem | hmm, it seems weird to me that the intel pmem job is running and passing on patches that don't have anything to do with pmem when pmem isn't merged | 18:23 |
mriedem | like, how does that even work? | 18:23 |
sean-k-mooney | mriedem: they have hardcoded a specific version of the patch to be merged in | 18:24 |
sean-k-mooney | they were trying to undo that earlier | 18:25 |
mriedem | 2019-09-10 03:27:01.729430 | TASK [upgrade-libvirt-qemu : apply vpmem patch] | 18:25 |
mriedem | aha | 18:25 |
efried | yeah | 18:25 |
mriedem | refs/changes/70/678470/13 -> FETCH_HEAD | 18:26 |
efried | unfortunately it's applying PS13, not the latest | 18:26 |
mriedem | that patch is up to PS27 now | 18:26 |
sean-k-mooney | yep | 18:26 |
efried | known, Rui is supposed to unwind that asap. | 18:26 |
mriedem | i'm not sure why you wouldn't just only run it on patches in that series and skip for anything else | 18:26 |
efried | that would have been another way to do it. | 18:26 |
efried | though it wouldn't have made sense to do it for patches at the bottom either | 18:27 |
mriedem | artom: sean-k-mooney: so this is the numa lm patch/job we care about right? https://review.opendev.org/#/c/680739/ | 18:27 |
sean-k-mooney | yes more or less | 18:27 |
mriedem | which hasn't run on the latest series of patches | 18:27 |
sean-k-mooney | yes so if you recheck it | 18:28 |
sean-k-mooney | it will just run the single job we want | 18:28 |
sean-k-mooney | shall i do it | 18:29 |
mriedem | ye shall | 18:29 |
artom | mriedem, it hasn't - what's FN's status, are we back up? | 18:29 |
artom | donnyd ^^? | 18:29 |
donnyd | yea | 18:29 |
donnyd | it was back up yesterday | 18:29 |
artom | 👍 | 18:29 |
sean-k-mooney | ya there is occasionally a network issue | 18:29 |
sean-k-mooney | because we currently cant fail over to another cloud its more obvious with this job | 18:30 |
donnyd | im not sure why this particular job keeps doing that too... it seems to be failing more often than not | 18:31 |
donnyd | every now and again i get something that pukes... but by and large it works | 18:32 |
sean-k-mooney | well this job cant retry on another provider and im not sure if zuul will retry on the same one | 18:32 |
sean-k-mooney | so i think its more a case of it works or you're out of luck | 18:33 |
openstackgerrit | melanie witt proposed openstack/nova master: WIP Include error 'details' in dynamic vendordata log https://review.opendev.org/681329 | 18:34 |
donnyd | Well if FN was having this kind of failure rate we are seeing with this job... I am pretty sure the other projects would have already kilt me | 18:35 |
sean-k-mooney | well i admit it does look like there is something else going on but im at a loss as to what | 18:36 |
donnyd | I don't think we were having this much of an issue when it was in a separate pool on the other label | 18:36 |
sean-k-mooney | well we have hit a few different issues. 1 was quotas when we changed pools | 18:37 |
sean-k-mooney | then the other issue is the ssh connection | 18:37 |
sean-k-mooney | i think the quota issue has gone away since you are no longer managing that on your end and controlling it via nodepool | 18:38 |
donnyd | well there is no more quota (within reason) max-servers is set to 70 and quota for instances set to 100 | 18:38 |
donnyd | everything else on quota is -1 | 18:38 |
sean-k-mooney | ya | 18:38 |
sean-k-mooney | the pools were both the same project on the same cloud right | 18:39 |
*** ociuhandu has joined #openstack-nova | 18:39 | |
sean-k-mooney | so it should not affect this behavior | 18:39 |
donnyd | correct | 18:39 |
donnyd | should is the opportune word | 18:39 |
sean-k-mooney | yes this also should work :) | 18:39 |
donnyd | LOL sean-k-mooney | 18:39 |
mriedem | https://review.opendev.org/#/c/634606/ went from PS75 to PS83 in a hurry | 18:39 |
donnyd | So the ssh thing I have some theories on and a fix in flight | 18:40 |
donnyd | my edge router could be a little (lot) better | 18:40 |
donnyd | so its possible that its struggling with all of the connections... or i have a knob i need to turn | 18:40 |
donnyd | load is not high according to cpu / mem / network... but that doesn't mean something else isn't borked... so I am digging | 18:41 |
sean-k-mooney | ya perhaps you said you're using bgp to advertise the block right but i assume you dont need to advertise each /64 you are delegating to the vms and have a /48 or something from your isp | 18:42 |
donnyd | each vm is on the same /64 that is advertised (i know... not cloudy) at the edge | 18:43 |
*** ociuhandu has quit IRC | 18:44 | |
donnyd | the zuul tenant has one /64 and its already advertised.. so if we had a routing issue it would be pretty big | 18:44 |
sean-k-mooney | donnyd: oh you have a /64 for the site and the vms are getting a /128? from that subnet | 18:44 |
donnyd | its also possible my state table is too small | 18:44 |
donnyd | I have tweaked some knobs to start | 18:44 |
donnyd | so hopefully the issue goes away | 18:44 |
sean-k-mooney | i know that a lot of hardware routers assume that endpoints are /64 so using /128 or more strict routes can cause issues sometimes | 18:45 |
donnyd | yea, i tried to get the functionality to work pretty much like v4 so people don't get too confused | 18:45 |
donnyd | The way its setup is exactly like if your isp provided you a /64 | 18:46 |
sean-k-mooney | right | 18:46 |
sean-k-mooney | ok makes sense | 18:46 |
donnyd | all the systems on your network could get addresses off that /64 and the isp routes it | 18:46 |
*** ralonsoh has quit IRC | 18:46 | |
donnyd | that is exactly how I have FN setup | 18:46 |
donnyd | not great for billing (don't have billing at FN), but great for actual networky things | 18:47 |
donnyd | so if two instances needed to talk in the same tenant... they could just start talking.. | 18:47 |
sean-k-mooney | im currently trying to get ipv6 working via a HE.net ipv4->ipv6 tunnel but i have only got as far as my router has ipv6 | 18:47 |
sean-k-mooney | when i had my client get ipv6 i had mtu issues | 18:47 |
sean-k-mooney | so you're doing better than me at getting ipv6 to work | 18:48 |
donnyd | i have my /48 routed to me via HE and then I am able to pass out /64's to tenants | 18:48 |
sean-k-mooney | ya that is what i setup on my router too but couldnt get the mtu clamping to work | 18:49 |
*** gyee has quit IRC | 18:49 | |
sean-k-mooney | so i could get to ipv6 sites but packets over about 1350-1400 bytes were dropped | 18:49 |
donnyd | that is strange | 18:50 |
donnyd | what are you using at your edge? | 18:50 |
*** gyee has joined #openstack-nova | 18:50 | |
donnyd | could be ISP not doing you any favors | 18:50 |
donnyd | i have a business connection, so they pretty much leave me alone | 18:50 |
sean-k-mooney | a ubiquiti edgerouter-x, well actually that is not my edge router | 18:50 |
sean-k-mooney | my isp router is in front of it | 18:50 |
sean-k-mooney | so ya my isp router could be messing it up | 18:51 |
* dansmith is thankful to have native ipv6 these days | 18:51 | |
dansmith | also a business connection here and they'll give me multiple /64s for my subnets | 18:51 |
sean-k-mooney | i pay extra for a static ip but they wont let me pay for a business connection and that is the only way to get native ipv6 | 18:52 |
dansmith | our residential connections here all have native ipv6, | 18:52 |
dansmith | but not sure those people can get multiple /64s like I can | 18:52 |
sean-k-mooney | most are on cable broadband here but not vdsl | 18:52 |
donnyd | dansmith: verizon fios business doesn't even have v6 to offer | 18:53 |
dansmith | donnyd: sucks | 18:53 |
sean-k-mooney | all the fiber to home stuff is ipv6 enabled | 18:53 |
dansmith | mine is cable | 18:53 |
donnyd | yea.. its pretty frustrating | 18:53 |
dansmith | I don't like to say nice things about comcast, but they do have the ipv6 stuff pretty well sorted at this point (and have for a couple years) | 18:53 |
donnyd | the network is pretty quick... so the HE overlay doesn't really seem to be hurting for performance | 18:53 |
dansmith | yeah, I loved my fios business for speed when I had it | 18:54 |
donnyd | yea I had them when i was in CoSprings and their business class was pretty good | 18:54 |
dansmith | but then moved outside their area and they mostly stopped expanding it | 18:54 |
*** brinzhang has quit IRC | 18:54 | |
*** brinzhang has joined #openstack-nova | 18:54 | |
*** panda is now known as panda|rover|off | 18:54 | |
sean-k-mooney | artom: sorry this took so long but the latest version is updating the numa topology blob in the db correctly | 18:55 |
sean-k-mooney | so putting back in the instance.refresh() or whatever you did fixed that issue | 18:56 |
sean-k-mooney | im going to try testing a bunch of different cases but are there any in particular people want me to test | 18:58 |
*** ricolin has quit IRC | 18:58 | |
mriedem | artom: so don't respin now since we need to get a result from that ci job, but queue this locally https://review.opendev.org/#/c/634606/83 | 19:05 |
artom | mriedem, ack | 19:06 |
artom | sean-k-mooney, cool, thank you :) | 19:06 |
artom | I managed to hit it in func tests, as I said. But... that was by cheating and setting do_cleanup = True in the code itself, to trigger the driver.cleanup() call | 19:07 |
mriedem | smells like a networking orgy in here | 19:08 |
artom | So I'm trying to do it correctly by forcing is_shared_instance_path to be false | 19:08 |
artom | Which leads to a whole other rabbit hole... | 19:08 |
dansmith | mriedem: efried: not sure if you saw this, but alex_xu confirmed that we just stop caring about quotas on pcpu instances (IIUC) per my question about how it works: https://review.opendev.org/#/c/674895/32/nova/virt/libvirt/driver.py | 19:08 |
mriedem | stephenfin said something about dealing with quota needs to happen yet | 19:09 |
dansmith | mriedem: efried: I can see the solution being a new quota class (ick) or lumping them together (which may be confusing) but just leaving them ignored doesn't seem like a good plan to me, especially since it differs for the user based on whether or not the deployment is configured for placement | 19:09 |
sean-k-mooney | dansmith: for what its worth i pointed out the PCPU quota issue in the unified limits spec | 19:11 |
*** eharney has joined #openstack-nova | 19:11 | |
sean-k-mooney | if we implement it next cycle we either will have 2 quotas or have to use both vcpu and pcpu count when looking at the cpu quota | 19:12 |
dansmith | we have to do something other than just pretend they're not there regardless of when we land it | 19:13 |
sean-k-mooney | well we did not intend to pretend they are not there | 19:13 |
dansmith | since it's all about consuming whole cpus, I'm pretty sure that operators won't be okay with just allowing anyone to boot enough pcpu guests to exhaust the whole deployment | 19:13 |
sean-k-mooney | but yes | 19:13 |
dansmith | I know, but... | 19:13 |
sean-k-mooney | personally i prefer having two separate quotas one per RC but i can see why some wont want to distinguish | 19:14 |
dansmith | yeah, I mean, | 19:15 |
dansmith | you'd think that the operators would want to quota those separately | 19:15 |
sean-k-mooney | so they can bill for them separately | 19:15 |
dansmith | no, | 19:15 |
sean-k-mooney | because one cost them a lot more | 19:15 |
dansmith | they will bill separately, they just need to make sure one tenant doesn't eat them all up | 19:15 |
sean-k-mooney | well ya that too | 19:16 |
sean-k-mooney | if we treat them separately we dont need to special case for them | 19:16 |
sean-k-mooney | its just 1 limit per resource class | 19:16 |
sean-k-mooney | so its simpler to reason about | 19:16 |
sean-k-mooney | the down side is we need to teach people that there would now be two limits on cpus | 19:17 |
dansmith | well, only if they configure flavors with those things | 19:18 |
dansmith | I can't imagine anyone that enables this functionality will be fine with treating them as the same.. one costing like 100x the other :) | 19:18 |
sean-k-mooney | by those things you mean cpu pinning | 19:18 |
dansmith | no, I mean allowing people to ask for dedicated cpus | 19:19 |
dansmith | isn't that the point of this? | 19:19 |
sean-k-mooney | of pcpu in placement or unified limits | 19:19 |
*** pcaruana has quit IRC | 19:20 | |
sean-k-mooney | my understanding is we are still going to request dedicated cpus the same way we always did with hw:cpu_policy=dedicated | 19:20 |
dansmith | I think I've lost grasp of this conversation | 19:21 |
sean-k-mooney | me too | 19:21 |
dansmith | heh | 19:21 |
sean-k-mooney | i was going to say i dont think the way we request things is going to change | 19:21 |
mriedem | but but but i could override my flavor vcpu with resources:VCPU=0 and define resources:PCPU=1 | 19:21 |
sean-k-mooney | i am 99% sure stephen added a check in his code to block that | 19:22 |
sean-k-mooney | we definitely discussed adding one | 19:22 |
sean-k-mooney | but yes today without his patches you could | 19:22 |
mriedem | it would be confusing anyway since flavor.vcpus would be...what? | 19:22 |
sean-k-mooney | flavor.vcpus would be 1 | 19:22 |
mriedem | even though you're not getting vcpu, you're getting pcpu | 19:23 |
sean-k-mooney | yes | 19:23 |
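For reference, the override being discussed looks roughly like this. A sketch only, since whether it should be blocked is exactly what is being debated here:

    # Hypothetical flavor with a resource-class override; values for illustration.
    flavor = {
        "vcpus": 1,  # what the guest sees, and what legacy quota counting uses
        "extra_specs": {
            "resources:VCPU": "0",  # zero out the implicit VCPU request
            "resources:PCPU": "1",  # ask placement for one dedicated host CPU
        },
    }
    # The scheduler request to placement then asks for PCPU=1 rather than VCPU=1.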
dansmith | mriedem: right, I'm hoping we go in the direction of a resource override in the flavor and not the hw:$foo stuff | 19:23 |
mriedem | yeah you can't even create a flavor with vcpu=0 | 19:23 |
sean-k-mooney | the placement resource class should never have been vcpu. flavor.vcpu means virtual cpu count exposed to the guest | 19:24 |
sean-k-mooney | dansmith: we are currently not going in that direction | 19:24 |
sean-k-mooney | dansmith: we are currently explicitly planning to block resource class overrides | 19:24 |
dansmith | mriedem: yeah, but the "rewriting" patch will cause you to get an allocation with vcpu=0, which is weirdish | 19:24 |
dansmith | sean-k-mooney: huh? | 19:24 |
dansmith | I dunno who the "we" is in that scenario | 19:25 |
mriedem | we = stephen and the people approving the changes | 19:25 |
dansmith | do you mean specifically for PCPU things? | 19:25 |
dansmith | mriedem: heh, yeah | 19:25 |
sean-k-mooney | well yes to both | 19:25 |
dansmith | I definitely don't agree with blocking resource based overrides in the future :) | 19:25 |
sean-k-mooney | eric stephen alex gibi and i were talking about this about a month ago | 19:26 |
sean-k-mooney | dansmith: the reason is if we dont you need to modify your flavor if the topology of resources changes in placement | 19:26 |
sean-k-mooney | e.g. your flavor would break if we move cpus under numa nodes or cache nodes | 19:26 |
mriedem | i get the reason in the short term | 19:26 |
dansmith | sean-k-mooney: I don't see what that has to do with it at all | 19:27 |
sean-k-mooney | you would have to change the resources: syntax to the numbered group form | 19:27 |
efried | I'm not fully swapped into this conversation, but last time I looked you get an error if flavor.vcpu != PCPU | 19:27 |
efried | (when PCPU is specified) | 19:27 |
sean-k-mooney | am that is not always correct | 19:28 |
dansmith | which is also really weird and confusing | 19:28 |
efried | except for something something hyperthread | 19:28 |
efried | yes, it's confusing, but it's the compromise that was agreed on in the spec. | 19:28 |
sean-k-mooney | except if you have hw:emulator_threads_policy=isolate | 19:28 |
efried | then you add 1 | 19:28 |
efried | right? | 19:28 |
sean-k-mooney | in that case we allocate 1 additional pcpu for the emulator thread | 19:29 |
sean-k-mooney | yes | 19:29 |
efried | that was in there too. | 19:29 |
dansmith | well, I don't see how we can be landing any of this without the quota bit in place at the very least | 19:29 |
mriedem | which is because of this right? https://github.com/openstack/nova/blob/stable/stein/nova/virt/libvirt/driver.py#L852 | 19:29 |
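A minimal sketch of the accounting described above (an illustration of the behaviour as relayed in this discussion, not nova's actual code):

    def requested_pcpus(flavor_vcpus, extra_specs):
        """Sketch: how many dedicated CPUs a pinned flavor translates into."""
        pcpus = flavor_vcpus
        if extra_specs.get("hw:emulator_threads_policy") == "isolate":
            pcpus += 1  # one extra dedicated core just for the emulator threads
        return pcpus

    # e.g. a 4-vCPU pinned flavor with isolate ends up requesting 5 PCPUs
    assert requested_pcpus(4, {"hw:emulator_threads_policy": "isolate"}) == 5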
mriedem | is hw:emulator_threads_policy and that +1 thing for pcpu restricted to the libvirt driver or does that logic creep into the api? | 19:30 |
*** nweinber_ has quit IRC | 19:30 | |
sean-k-mooney | am well that is hard to answer | 19:30 |
mriedem | GREAT | 19:31 |
sean-k-mooney | only the libvirt driver supports pinning | 19:31 |
sean-k-mooney | and this only works with pinning | 19:31 |
sean-k-mooney | so this only works with the libvirt driver | 19:31 |
dansmith | that is not the right answer | 19:31 |
dansmith | letting stuff like that creep into the API because only one driver supports it is how we have a ton of xen-specific warts on the api | 19:31 |
sean-k-mooney | the right answer is we hope to remove that in U | 19:31 |
sean-k-mooney | because we hope to remove it in U | 19:32 |
mriedem | delicious agent builds | 19:32 |
sean-k-mooney | now that we have support for share | 19:32 |
sean-k-mooney | which maps the emulator threads to the same cpu pool as the floating vms we dont think we need isolate anymore | 19:33 |
mriedem | sean-k-mooney: so by remove "that" you mean hw:emulator_threads_policy ? | 19:33 |
sean-k-mooney | yes | 19:33 |
sean-k-mooney | what stephen and i would like to propose in U is this. if you have cpu_shared_set defined the emulator threads run there. if not they run on the same cores as your pinned vm | 19:34 |
sean-k-mooney | that is what you get if you do hw:emulator_threads_policy=share today | 19:34 |
sean-k-mooney | and if we only support 1 value for the option we can remove it entirely and just do that by default | 19:35 |
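That proposal, as described, boils down to something like this sketch (an illustration of the idea, not an implementation):

    def emulator_thread_cpus(cpu_shared_set, instance_pinned_cpus):
        # Proposed U behaviour as outlined above (what emulator_threads_policy=share
        # gives today): emulator threads float over the shared set when one is
        # configured, otherwise they run on the same cores as the pinned instance.
        return cpu_shared_set if cpu_shared_set else instance_pinned_cpus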
mriedem | efried: so you might want to not drop the -2 on https://review.opendev.org/#/c/671793/23 until the quota issue is sorted out | 19:36 |
mriedem | this quota issue https://review.opendev.org/#/c/674895/32/nova/virt/libvirt/driver.py@7457 | 19:36 |
efried | done | 19:36 |
sean-k-mooney | do we have a quota issue with this? | 19:36 |
sean-k-mooney | i thought quota was being counted on flavor.vcpu | 19:36 |
dansmith | sean-k-mooney: did you read the comments? | 19:36 |
sean-k-mooney | not on the resource class in train | 19:36 |
sean-k-mooney | dansmith: no not yet | 19:37 |
*** panda|rover|off has quit IRC | 19:37 | |
dansmith | sean-k-mooney: maybe do that :) | 19:37 |
efried | good, now everyone can focus on vpmem | 19:37 |
sean-k-mooney | ok but since we only support vms with all pinned or all shared cpus with this series it does not change things as far as i was aware in train | 19:37 |
*** panda has joined #openstack-nova | 19:38 | |
dansmith | before this, we'd limit you to at least your vcpu quota | 19:38 |
dansmith | after this, no limits AFAICT | 19:38 |
dansmith | honestly, I'm not sure how you're going to fix it so it works the same with placement quotas and nova quotas | 19:39 |
sean-k-mooney | we should still be limiting on flavor.vcpu | 19:39 |
dansmith | not with placement | 19:39 |
dansmith | did you read the comments? | 19:39 |
sean-k-mooney | im trying to find which patch you put it on | 19:39 |
mriedem | (2:36:21 PM) mriedem: this quota issue https://review.opendev.org/#/c/674895/32/nova/virt/libvirt/driver.py@7457 | 19:39 |
sean-k-mooney | oh but i think i know what your saying | 19:40 |
sean-k-mooney | we have the option to track cpu quota in placement already | 19:40 |
mriedem | by default we don't use placement for quota usage counting | 19:40 |
sean-k-mooney | we enabled it by default in stein right | 19:40 |
mriedem | f naw | 19:40 |
sean-k-mooney | oh we dont | 19:40 |
mriedem | where is melwitt | 19:40 |
dansmith | we should be in train I think right? | 19:40 |
dansmith | meaning, | 19:40 |
dansmith | we should be turning that on but haven't | 19:41 |
melwitt | mriedem: hiding | 19:41 |
sean-k-mooney | dansmith: so if we turn that on then yes we have a problem | 19:41 |
sean-k-mooney | if we dont we should not | 19:41 |
dansmith | no | 19:41 |
dansmith | we have a problem because people could have that on | 19:41 |
sean-k-mooney | true | 19:41 |
dansmith | you know config knobs can be set to not the default right? :) | 19:41 |
mriedem | cern uses it | 19:41 |
sean-k-mooney | ok i see the problem. i was aware that was a thing but ya i had not considered the impact to this | 19:43 |
mriedem | quota is generally the last thing anyone thinks about | 19:43 |
dansmith | if CPUs weren't the most expensive and constrained resource in the cloud, then maybe less of an issue, but.... :D | 19:44 |
mriedem | [quota]/injected_file_content_bytes is a major concern of mine | 19:45 |
sean-k-mooney | im not sure if sarcasm or if i should feel really bad for you | 19:45 |
mriedem | heh, sarcasm | 19:45 |
sean-k-mooney | dansmith: i always ran out of ram first but ya. i think the only thing we could do would be to count both inventories. at least short term | 19:46 |
sean-k-mooney | where does that code live | 19:47 |
sean-k-mooney | e.g. where we check the quota | 19:47 |
melwitt | enforcing quota on other resource classes (DISK_GB and more) is part of the proposal in the unified limits spec, as I think sean-k-mooney mentioned earlier | 19:47 |
dansmith | melwitt: right, but this set is taking dedicated cpus out of the equation for the cpus quota, | 19:48 |
dansmith | effectively making them unconstrained (and unconstrainable) | 19:48 |
melwitt | sean-k-mooney: nova/quota.py is the main file | 19:48 |
dansmith | but they're the most expensive things | 19:48 |
mriedem | sean-k-mooney: start here https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/quota.py#L1281 | 19:48 |
sean-k-mooney | melwitt: thanks | 19:48 |
dansmith | and combining them together as a hack is pretty dumb, because the whole point of this effort is to make dedicated cpus be first class citizens instead of a hack | 19:49 |
dansmith | the other problem is, | 19:49 |
dansmith | we wouldn't want the quota behavior to be different depending on whether or not you're using placement for quota | 19:50 |
melwitt | dansmith: yeah, sorry, was trying to say that if enforcing quota for PCPU resource class is what's needed then that's a big amount of work | 19:50 |
dansmith | somewhat unrelated, | 19:50 |
dansmith | melwitt: were't we going to enable quota in placement by default soon? | 19:50 |
sean-k-mooney | dansmith: well if we combined them together it would maintain the previous behavior | 19:50 |
dansmith | melwitt: yep, and I'm saying that's the right solution | 19:50 |
dansmith | sean-k-mooney: the point of this work is to make pcpus not work like vcpus right? :) | 19:50 |
melwitt | dansmith: I don't remember a specific timeline, just that we wanted to let it bake with cern for awhile before making it default | 19:51 |
mriedem | idk that counting PCPU for cores quota from placement is that hard, you'd just sum VCPU and PCPU here wouldn't you? https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/scheduler/client/report.py#L2344 | 19:51 |
sean-k-mooney | and the two main goals of this work were coexistence of pinned and floating vms on the same host and removing the race on claiming pinned cpus | 19:51 |
mriedem | and here https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/scheduler/client/report.py#L2356 | 19:51 |
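In other words, something along these lines; a rough illustration of the "combine them" option written against a plain usages dict, not the real report client code:

    # Hypothetical helper: collapse placement usages so the existing "cores"
    # quota covers both shared and dedicated CPUs.
    def cores_usage(placement_usages):
        # placement_usages looks like {"VCPU": 3, "PCPU": 2, "MEMORY_MB": ...}
        return placement_usages.get("VCPU", 0) + placement_usages.get("PCPU", 0)

    assert cores_usage({"VCPU": 3, "PCPU": 2, "MEMORY_MB": 2048}) == 5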
dansmith | melwitt: ack | 19:51 |
sean-k-mooney | dansmith: not directly, that is a side effect | 19:51 |
dansmith | mriedem: if you combine them yes | 19:51 |
dansmith | mriedem: that's a hack to make it work, not the right solution | 19:52 |
mriedem | i know we dont want to combine them externally, but the non-placement counting method is going to be based on instances.vcpus which comes from the embedded instance.flavor.vcpus | 19:52 |
sean-k-mooney | dansmith: the follow up spec that wants to have a vm with some pinned cores and some floating cores needs them to be different things | 19:52 |
dansmith | mriedem: right but we should be moving to placement quotas anyway | 19:52 |
dansmith | sean-k-mooney: AFAIK, this work is aimed at making pcpus not just a per-host special case of pretending that they're like vcpus | 19:53 |
dansmith | sean-k-mooney: so, yeah, obviously separate quotas is not the goal of this, | 19:53 |
dansmith | but all of this is important to get right | 19:53 |
sean-k-mooney | yes | 19:53 |
sean-k-mooney | im not disputing that at all | 19:53 |
dansmith | otherwise people really need to segregate these hosts still, which means the rest of it doesn't get us anything | 19:53 |
sean-k-mooney | im not sure i agree on the last point but im not going to rathole on it either | 19:54 |
sean-k-mooney | i think this will change in U again with unified limits | 19:54 |
sean-k-mooney | so i would prefer to group them in Train to keep the pre-train behavior | 19:54 |
sean-k-mooney | and then only change the behavior once | 19:54 |
sean-k-mooney | in U when everything goes to unifed limits | 19:55 |
dansmith | lol | 19:55 |
dansmith | riiiight | 19:55 |
mriedem | i want to say making count_usage_from_placement=True by default was dependent on consumer types to smooth out some of the inconsistencies with the legacy counting method today in edge cases | 19:55 |
sean-k-mooney | the lol being ever getting to unified limits | 19:55 |
mriedem | like during a resize | 19:55 |
melwitt | unified limits is going to go through the same bake time that counting usage from placement is. it will begin defaulted to False | 19:56 |
dansmith | mriedem: I didn't think that's why we didn't make it default, because we'd still do the other thing for the non-placement-able resources | 19:56 |
dansmith | sean-k-mooney: I'm saying don't count your eggs before they hatch | 19:56 |
sean-k-mooney | ok | 19:56 |
dansmith | unified limits has been a long time coming, so.. | 19:56 |
*** gbarros has quit IRC | 19:57 | |
sean-k-mooney | ya i know. so are you opposed to internally in nova combining them for Train | 19:57 |
mriedem | dansmith: without digging up old review comments and ML thread conversations, i want to say my recollection was (1) counting usage from placement has at least 3 differences in behavior from legacy counting - which are documented in the config option help text and (2) the main benefit is for multi-cell deployments, of which there are few, | 19:57 |
mriedem | and (3) it landed late in stein, | 19:57 |
sean-k-mooney | when counting using placement | 19:57 |
mriedem | so to reduce the risk on non-multi-cell deployments with the behavior change, default to legacy counting | 19:57 |
mriedem | melwitt: ^ is that what you remember? | 19:58 |
melwitt | it landed early in train, the data migration landed late in stein | 19:58 |
dansmith | oh, I thought it was stein | 19:58 |
mriedem | i thought it was stein too... | 19:58 |
dansmith | in that case, can't default it in train | 19:58 |
* dansmith has to go to a thing | 19:58 | |
mriedem | https://review.opendev.org/#/c/638073/ it was train | 19:59 |
mriedem | time flies | 19:59 |
sean-k-mooney | well if we cant default it in train then we still need to support it in train with pcpu in placement right | 19:59 |
melwitt | mriedem: but yeah I think the combo of all those reasons were why we default to false | 19:59 |
melwitt | big delta from legacy counting and only multi-cell ppl likely to "need" it (for the down cell resilience). so we chose not to impose the big delta in behavior on the majority who don't need down cell resilience | 20:00 |
sean-k-mooney | so do we just want to add another line here https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/scheduler/client/report.py#L2356 to make it work by combining them? | 20:01 |
sean-k-mooney | *them being PCPUs | 20:01 |
melwitt | mriedem pointed that out earlier | 20:01 |
mriedem | to be fair though, some of those differences in behavior in counting also changed since i think ocata or pike | 20:01 |
mriedem | when we moved to counting in general and dropped reservations | 20:01 |
mriedem | and no one apparently noticed until we noticed in stein :) | 20:01 |
sean-k-mooney | yes im wondering if we should comment that on stephen's patch as the way forward so he can do it tomorrow | 20:01 |
melwitt | mriedem: I think there was only one change in pike, doesn't leave room for a revert resize, IIRC (I documented all of it on the review comment) | 20:02 |
mriedem | sean-k-mooney: i think one could say, "one way forward is ...." | 20:02 |
melwitt | the new stuff, there's way more deltas than that, a big laundry list | 20:02 |
mriedem | what is a laundry list anyway? 1. put stuff in washer and wash, 2. put stuff in dryer, 3. fold. wouldn't a more accurate list be a grocery list? | 20:03 |
sean-k-mooney | mriedem: ok but stephen isnt here and i wanted to provide a summary to him but ill just tell him to read scrollback in the morning | 20:03 |
melwitt | mriedem: hah | 20:03 |
melwitt | I dunno where that saying comes from actually | 20:03 |
mriedem | sean-k-mooney: whatever, that's better than nothing | 20:03 |
mriedem | https://english.stackexchange.com/questions/437507/what-is-the-origin-of-the-phrase-laundry-list | 20:04 |
melwitt | I still haven't wrapped my head around what "combining" the resource classes means and whether/how it's different from legacy counting | 20:04 |
mriedem | it wouldn't be different from legacy counting | 20:04 |
melwitt | I mean, I know it means VCPU + PCPU | 20:04 |
melwitt | ok | 20:04 |
mriedem | so i think the summary is, | 20:04 |
mriedem | 1. with legacy counting, pcpu is counted the same since we use instance.vcpus which comes from the flavor | 20:05 |
mriedem | 2. with placement counting, we'd have to combine VCPU and PCPU usage to match ^ | 20:05 |
dansmith | 3. neither are actually right | 20:05 |
mriedem | 3. long-term combining them goes against the goal of separate them as countable trackable resources | 20:05 |
mriedem | right | 20:05 |
mriedem | so the immediate question is, is #2 good enough for train with punting sorting out #3 to the future | 20:06 |
sean-k-mooney | so this is a bit of nova i dont really fully understand. melwitt is there anything we can do before unified limits to support not combining them if that is preferable | 20:07 |
sean-k-mooney | or do we just wait for unified limits to do #3 | 20:07 |
melwitt | I guess if it's not worse than what's possible with legacy counting (inability to separate) then the hack doesn't sound like a huge issue to me. it would obviously need a TODO on it to get rid of it when unified limit support exists and is enabled | 20:07 |
melwitt | sean-k-mooney: nothing reasonable, really. you'd have to add a new quota config and resource for "pcpu_whatever" and use that to let people set the limit separately. because until unified limits, there would be no way for nova to consume the unified limit for PCPU that a person sets in keystone | 20:09 |
sean-k-mooney | right and we dont really want to do that because tech debt | 20:10 |
efried | From where I sit (outside the swirling maelstrom of actual understanding) it sounds like we've had a pretty big imbalance "forever" where we've been counting dedicated and virtual cpus "the same" from a quota perspective. If so, we're not making it worse by continuing to do that, just now we're doing it with a separate resource class for the former. | 20:10 |
melwitt | yeah, that would be a big tech debt | 20:10 |
melwitt | whereas the hack of combining VCPU and PCPU doesn't sound like big debt to me, unless I've missed something more complex about it | 20:11 |
melwitt | "small debt" | 20:11 |
sean-k-mooney | it should be small to do i think. i assume its just the line mriedem linked but this code is new to me | 20:13 |
mriedem | plus functional testing | 20:13 |
sean-k-mooney | we need to handle this branch too https://github.com/openstack/nova/blob/f4ca3e70852c0a7ed7904a9f2d7177c9118d3d1c/nova/scheduler/client/report.py#L2344 | 20:13 |
mriedem | i'd want to see a functional test that creates a server that allocates PCPU inventory and then assert that cores usage for that tenant is incremented | 20:13 |
* melwitt nods | 20:14 | |
sean-k-mooney | i think stephen has a functional test for the first half so the quota check should not be too hard to add | 20:14 |
sean-k-mooney | ok so he has functional tests in the reshape patch at a minimum | 20:17 |
sean-k-mooney | ok yes he added more here https://review.opendev.org/#/c/671801/43/nova/tests/functional/libvirt/test_numa_servers.py | 20:18 |
sean-k-mooney | so thats probably where it makes sense to add the test | 20:18 |
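The assertion mriedem describes, reduced to a self-contained unit test over that hypothetical combination helper (the real thing would be a nova functional test that actually boots a pinned server):

    import unittest

    def cores_usage(usages):
        # same hypothetical "VCPU + PCPU" combination sketched earlier
        return usages.get("VCPU", 0) + usages.get("PCPU", 0)

    class TestCoresQuotaCountsPCPU(unittest.TestCase):
        def test_pcpu_counts_toward_cores(self):
            before = {"VCPU": 2, "PCPU": 0}   # one 2-vCPU floating server
            after = {"VCPU": 2, "PCPU": 1}    # plus a 1-PCPU pinned server
            self.assertEqual(cores_usage(after) - cores_usage(before), 1)

    if __name__ == "__main__":
        unittest.main()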
*** nweinber_ has joined #openstack-nova | 20:28 | |
*** spatel has joined #openstack-nova | 20:31 | |
*** spatel has quit IRC | 20:35 | |
*** ociuhandu has joined #openstack-nova | 20:37 | |
mriedem | artom: https://review.opendev.org/#/c/640021/48 | 20:41 |
mriedem | i probably won't get to the functional test patch tonight | 20:45 |
sean-k-mooney | mriedem: for what its worth i did test the [upgrade_levels]/compute=stein case previously | 20:46 |
sean-k-mooney | i can test it again tomorrow im more or less done for the day | 20:47 |
sean-k-mooney | but if either node has [upgrade_levels]/compute=stein we end up with the stein behavior | 20:47 |
sean-k-mooney | the migration succeeds as long as the cores used on the source host exist on the dest | 20:47 |
sean-k-mooney | but no xml is updated | 20:47 |
mriedem | so it goes back to the bug behavior | 20:47 |
mriedem | right? | 20:47 |
sean-k-mooney | yep | 20:48 |
sean-k-mooney | it goes back to the current master/stein behavior | 20:48 |
mriedem | yeah i don't know that we really need to bend over backward to try and detect that from conductor | 20:48 |
mriedem | if you're computes are fully upgraded and restarted and reporting as train service versions you should have also removed any manual pins | 20:49 |
sean-k-mooney | if you dont have can_live_migrate_numa set we will block it anyway | 20:49 |
mriedem | *your | 20:49 |
mriedem | not with the new checks | 20:49 |
mriedem | we would totally ignore can_live_migrate_numa | 20:49 |
mriedem | as long as all computes are reporting train | 20:49 |
sean-k-mooney | i think we have a min compute service check too before we ignore it | 20:50 |
sean-k-mooney | oh thats the compute service | 20:50 |
sean-k-mooney | not the RPC | 20:50 |
mriedem | if we pass that check we ignore the config | 20:50 |
mriedem | right | 20:50 |
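As a sketch of that decision (the version number, names and shape here are made up for illustration, they are not nova's actual conductor code), the check discussed above boils down to:

    NUMA_LM_SERVICE_VERSION = 40  # hypothetical minimum compute service version

    def decide_numa_live_migration(min_compute_service_version,
                                   workaround_enabled):
        """Sketch of the conductor-side decision; the workaround flag is the
        one referred to above as can_live_migrate_numa."""
        if min_compute_service_version >= NUMA_LM_SERVICE_VERSION:
            # All computes report Train or newer: do NUMA-aware live migration
            # and ignore the workaround config entirely. (An RPC pin via
            # [upgrade_levels]/compute=stein still degrades to the old
            # behaviour, which, per the discussion, isn't worth detecting here.)
            return "numa-aware"
        if workaround_enabled:
            return "legacy"   # old behaviour: migrates but the XML isn't updated
        return "blocked"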
sean-k-mooney | well ya i agree we likely dont need to bend over backwards to check this in the conductor | 20:51 |
sean-k-mooney | anyway my concentration is totally gone so ill call it a day. | 20:52 |
*** ociuhandu has quit IRC | 20:52 | |
sean-k-mooney | artom: ill test v49 tomorrow when you have addressed mriedem's comments | 20:52 |
*** ociuhandu has joined #openstack-nova | 20:53 | |
*** ociuhandu has quit IRC | 20:57 | |
*** luksky has quit IRC | 21:00 | |
mriedem | melwitt: brinzhang has been asking me to review this api change https://review.opendev.org/#/c/674243/ but i haven't had time with the bw provider migrate and numa lm stuff - maybe you can peruse it? | 21:10 |
artom | mriedem, ack, thanks! | 21:11 |
artom | (The func test can wait until after FF if that's what it comes to, right?) | 21:11 |
artom | I mean, it's not plan A, but... | 21:11 |
mriedem | my priority is sean's manual testing and the gate job | 21:13 |
mriedem | brinzhang: a follow up is not the correct answer to all issues https://review.opendev.org/#/c/679413/4 | 21:13 |
melwitt | mriedem: will do | 21:19 |
*** BjoernT has quit IRC | 21:20 | |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Use SDK for add/remove instance info from node https://review.opendev.org/659691 | 21:25 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Use SDK for getting network metadata from node https://review.opendev.org/670213 | 21:25 |
*** hemna has joined #openstack-nova | 21:29 | |
*** henriqueof1 has joined #openstack-nova | 21:30 | |
*** henriqueof has quit IRC | 21:31 | |
*** hemna has quit IRC | 21:34 | |
*** mriedem has quit IRC | 21:37 | |
*** nweinber_ has quit IRC | 21:43 | |
*** aloga has quit IRC | 21:45 | |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Use SDK for add/remove instance info from node https://review.opendev.org/659691 | 21:50 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Use SDK for getting network metadata from node https://review.opendev.org/670213 | 21:50 |
*** N3l1x has quit IRC | 21:50 | |
*** hemna has joined #openstack-nova | 21:57 | |
*** adriant has quit IRC | 21:59 | |
*** mriedem has joined #openstack-nova | 22:01 | |
*** slaweq has quit IRC | 22:21 | |
*** panda has quit IRC | 22:26 | |
*** panda has joined #openstack-nova | 22:28 | |
*** aloga has joined #openstack-nova | 22:29 | |
*** ociuhandu has joined #openstack-nova | 22:30 | |
*** ociuhandu has quit IRC | 22:35 | |
*** mriedem has quit IRC | 22:41 | |
*** brault has joined #openstack-nova | 22:43 | |
*** avolkov has quit IRC | 22:47 | |
*** hemna has quit IRC | 22:50 | |
*** TxGirlGeek has joined #openstack-nova | 22:57 | |
*** brault has quit IRC | 22:59 | |
*** rcernin has joined #openstack-nova | 22:59 | |
*** henriqueof1 has quit IRC | 23:02 | |
*** tkajinam has joined #openstack-nova | 23:03 | |
*** macz has quit IRC | 23:05 | |
*** slaweq has joined #openstack-nova | 23:11 | |
*** slaweq has quit IRC | 23:15 | |
*** spatel has joined #openstack-nova | 23:29 | |
*** TxGirlGeek has quit IRC | 23:30 | |
*** spatel has quit IRC | 23:34 | |
openstackgerrit | Merged openstack/nova master: api-ref: fix server topology "host_numa_node" field param name https://review.opendev.org/680775 | 23:34 |
*** mlavalle has quit IRC | 23:37 | |
*** adriant has joined #openstack-nova | 23:39 | |
*** mtreinish has quit IRC | 23:43 |