Tuesday, 2020-02-11

*** spatel has joined #openstack-nova		00:01
*** slaweq has quit IRC		00:04
*** spatel has quit IRC		00:06
*** Liang__ has joined #openstack-nova		00:20
*** spatel has joined #openstack-nova		00:34
*** mlavalle has quit IRC		00:47
*** yedongcan has joined #openstack-nova		01:05
*** gentoorax has quit IRC		01:06
*** zhanglong has joined #openstack-nova		01:15
*** openstackgerrit has joined #openstack-nova		01:15
openstackgerrit	Merged openstack/nova stable/rocky: Use stable constraint for Tempest pinned stable branches https://review.opendev.org/706716	01:15
*** gentoorax has joined #openstack-nova		01:37
*** david-lyle is now known as dklyle		02:05
openstackgerrit	Huachang Wang proposed openstack/nova-specs master: Use PCPU and VCPU in one instance https://review.opendev.org/668656	02:06
*** psachin has joined #openstack-nova		02:06
*** nweinber has joined #openstack-nova		02:07
*** nweinber has quit IRC		02:19
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Introduce scope_types in os-create-backup https://review.opendev.org/707038	02:21
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Add new default roles in os-create-backup policies https://review.opendev.org/707039	02:25
*** nicolasbock has quit IRC		02:28
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Introduce scope_types in os-console-output https://review.opendev.org/707040	02:36
*** ileixe has joined #openstack-nova		02:37
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Add new default roles in os-console-output policies https://review.opendev.org/707041	02:41
*** gyee has quit IRC		03:11
*** gentoorax has quit IRC		03:14
ileixe	Hi Nova,	03:22
ileixe	Does anyone know the current status of https://blueprints.launchpad.net/nova/+spec/ip-aware-scheduling-placement?	03:22
*** spatel has quit IRC		03:23
ileixe	I thought if nova does not aware of neutron segment, routed network does not work and the spec say it's not implemented yet.	03:23
ileixe	Does it mean routed network not yet implemented?	03:23
*** hongbin has joined #openstack-nova		03:48
*** yedongcan has quit IRC		03:50
*** gentoorax has joined #openstack-nova		03:52
*** Sundar has quit IRC		03:53
*** udesale has joined #openstack-nova		04:19
*** mkrai has joined #openstack-nova		04:37
*** hongbin has quit IRC		04:41
*** vesper has quit IRC		05:18
*** vesper11 has joined #openstack-nova		05:23
*** evrardjp has quit IRC		05:34
*** evrardjp has joined #openstack-nova		05:34
alex_xu	ileixe: I think it is implemented	05:50
ileixe	alex_xu: Thanks for response. Do you mean nova lookup segment then?	05:51
*** igordc has joined #openstack-nova		05:51
alex_xu	ileixe: no, the neutron side will do that, and report the resource to the placement	05:51
alex_xu	ileixe: https://blueprints.launchpad.net/neutron/+spec/routed-networks	05:52
alex_xu	ileixe: I never try that feature, but hope ^ that can help you	05:52
*** mkrai has quit IRC		05:52
alex_xu	ileixe: also this one https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html	05:53
*** mkrai has joined #openstack-nova		05:54
ileixe	alex_xu: Hm.. maybe I understand what neutron does for routed network	05:56
ileixe	What I do not understand is.. how nova use resource provider which neutron gave	05:57
alex_xu	ileixe: maybe i'm wrong, i saw the note from the matt, looks like neutorn side impelemnt, but yes, nova side do nohting now	05:57
ileixe	iirc, matt you said is the owner of the commit (https://review.opendev.org/#/c/656885/) right?	05:58
alex_xu	ileixe: yes, he isn't working on that anymore	05:59
ileixe	And the commit does not implemented which I thought for routed network.	05:59
ileixe	So.. I assume that routed network does not work (especially related to nova scheduling)	05:59
*** igordc has quit IRC		06:00
alex_xu	ileixe: yes, I think you are right	06:00
ileixe	alex_xu: Hm... thanks for the answer..	06:00
alex_xu	np	06:02
*** mkrai has quit IRC		06:09
*** ratailor has joined #openstack-nova		06:21
*** gentoora- has joined #openstack-nova		06:27
*** gentoorax has quit IRC		06:27
*** gentoora- is now known as gentoorax		06:27
*** ccamacho has quit IRC		06:50
*** lpetrut has joined #openstack-nova		07:03
*** mkrai has joined #openstack-nova		07:05
*** maciejjozefczyk has joined #openstack-nova		07:08
*** damien_r has joined #openstack-nova		07:18
*** damien_r has quit IRC		07:23
*** yedongcan has joined #openstack-nova		07:36
*** udesale has quit IRC		07:46
*** udesale has joined #openstack-nova		07:47
*** imacdonn has quit IRC		07:53
*** imacdonn has joined #openstack-nova		07:53
*** mkrai has quit IRC		08:02
*** mriosfer has joined #openstack-nova		08:22
mriosfer	Hi guys, after change in the flavor and image the vram value, in our openstack queens and rebuild the instance i saw in the virsh xml that its correctly added to vm config "<model type='qxl' ram='65536' vram='131072' vgamem='16384' heads='1' primary='yes'/>" with windows with dxdiag detect 0MB vram. Is it correct? Should be running?	08:24
*** tosky has joined #openstack-nova		08:27
*** priteau has joined #openstack-nova		08:28
*** tesseract has joined #openstack-nova		08:29
openstackgerrit	Brin Zhang proposed openstack/nova master: Expose instance action event details out of the API https://review.opendev.org/694430	08:33
openstackgerrit	Brin Zhang proposed openstack/nova master: Add server actions v82 samples test https://review.opendev.org/706251	08:35
*** mkrai has joined #openstack-nova		08:37
openstackgerrit	Brin Zhang proposed openstack/nova master: Add instance actions v82 samples test https://review.opendev.org/706251	08:38
*** jcosmao has joined #openstack-nova		08:40
*** ccamacho has joined #openstack-nova		08:46
openstackgerrit	Balazs Gibizer proposed openstack/nova master: Merge qos related renos for Ussuri https://review.opendev.org/706766	08:49
gibi	bauzas: I'm here if you want to chat about NUMA	08:50
bauzas	gibi: 10 mins please but yeah :)	08:50
gibi	bauzas: sure	08:50
*** ralonsoh has joined #openstack-nova		08:52
*** rpittau\|afk is now known as rpittau		08:55
*** martinkennelly has joined #openstack-nova		09:01
*** ccamacho has quit IRC		09:01
bauzas	ok, processed the whole bunch of comments for the NUMA in Placement spec...	09:01
* bauzas grabs some coffee and pings gibi later		09:01
gibi	ok	09:02
gibi	I just realized tha sean-k-mooney and efried also commented while I was away... reading them...	09:02
*** elod has quit IRC		09:03
*** amoralej\|off is now known as amoralej		09:03
*** slaweq has joined #openstack-nova		09:07
*** mkrai has quit IRC		09:09
*** mkrai_ has joined #openstack-nova		09:10
bauzas	gibi: I'm back	09:10
bauzas	gibi: FWIW efried mostly replied on your concerns	09:11
bauzas	the spec needs another round of rewrites so I'm starting it now	09:11
*** dtantsur\|afk is now known as dtantsur		09:12
*** elod has joined #openstack-nova		09:12
openstackgerrit	HYSong proposed openstack/nova master: Update request_specs.availability_zone during live migration https://review.opendev.org/706647	09:12
gibi	bauzas: I'm still reading efried's comments, I will get back to you soon	09:12
bauzas	cool	09:13
*** ociuhandu has joined #openstack-nova		09:28
*** takamatsu has joined #openstack-nova		09:28
openstackgerrit	HYSong proposed openstack/nova master: Update request_specs.availability_zone during live migration https://review.opendev.org/706647	09:28
*** ociuhandu has quit IRC		09:29
*** martinkennelly has quit IRC		09:33
openstackgerrit	HYSong proposed openstack/nova master: Update request_specs.availability_zone during live migration https://review.opendev.org/706647	09:33
openstackgerrit	HYSong proposed openstack/nova master: Update request_specs.availability_zone during live migration https://review.opendev.org/706647	09:35
*** psachin has quit IRC		09:42
*** derekh has joined #openstack-nova		09:44
bauzas	gibi: unrelated, see my last comment on https://review.opendev.org/#/c/706647/5	09:44
bauzas	I know not a lot of folks know about AZs, so I want to make sure that all of us as cores know about the design consensus :)	09:45
gibi	bauzas: ack, queued for double check	09:45
bauzas	no rush, it's more for a knowledge	09:47
*** psachin has joined #openstack-nova		09:47
*** ivve has joined #openstack-nova		09:53
gibi	bauzas: replyied in the NUMA spec. I agree Eric's proposals in his reply. I'm still a bit open in the upgrade check case but that is a low prio issue	09:53
bauzas	cool, I'm just modifying things as we speak	09:53
*** psachin has quit IRC		09:54
gibi	bauzas: you did a great job pulling the piece together I think I see the light of the end of this tunnel	09:54
*** davidsha has joined #openstack-nova		09:54
*** slaweq has quit IRC		09:56
openstackgerrit	jichenjc proposed openstack/nova master: set default value to 0 instead of '' https://review.opendev.org/706730	10:02
*** psachin has joined #openstack-nova		10:02
gibi	bauzas: https://review.opendev.org/#/c/706647 thanks for chiming in I agree with you. I missed the fact that we don't allow moving instance between AZs (in recent microversions)	10:04
bauzas	gibi: no worries, again, my ping is just for making sure we share our knowledge	10:06
*** ociuhandu has joined #openstack-nova		10:06
bauzas	I'm always on and off upstream, so the more people know about AZs, the better it will be	10:06
*** xek has joined #openstack-nova		10:07
*** ociuhandu has quit IRC		10:09
*** ociuhandu has joined #openstack-nova		10:09
*** mkrai__ has joined #openstack-nova		10:10
*** ociuhandu has quit IRC		10:10
*** ociuhandu has joined #openstack-nova		10:11
*** mkrai_ has quit IRC		10:13
*** ociuhandu has quit IRC		10:18
*** psachin has quit IRC		10:20
*** ociuhandu has joined #openstack-nova		10:20
*** psachin has joined #openstack-nova		10:21
*** ociuhandu has quit IRC		10:21
*** martinkennelly has joined #openstack-nova		10:21
*** ociuhandu has joined #openstack-nova		10:26
*** ociuhandu has quit IRC		10:27
*** priteau has quit IRC		10:28
*** ociuhandu has joined #openstack-nova		10:29
*** ociuhandu has quit IRC		10:30
bauzas	shit, I lack time for fixing all the comments	10:42
bauzas	gibi: I think we need efried and sean-k-mooney around this afternoon for discussing the upgrade pre-flight check and the Ussuri condition	10:45
bauzas	I'm personnally in favor of keeping NUMA workloads in Ussuri as they are	10:45
bauzas	ie. no migration asked	10:46
bauzas	and a pre-flight check pre-Victoria	10:46
bauzas	because if not, that's a chicken-and-egg issue	10:46
gibi	sure lets see what they think	10:50
*** priteau has joined #openstack-nova		11:03
*** ociuhandu has joined #openstack-nova		11:04
*** ociuhandu has quit IRC		11:09
*** zhanglong has quit IRC		11:12
*** ociuhandu has joined #openstack-nova		11:13
*** zhanglong has joined #openstack-nova		11:15
*** ociuhandu has quit IRC		11:18
*** mkrai__ has quit IRC		11:19
*** rpittau is now known as rpittau\|bbl		11:21
*** fungi has quit IRC		11:24
*** fungi has joined #openstack-nova		11:27
*** yedongcan has left #openstack-nova		11:37
*** tbachman has quit IRC		11:46
*** ociuhandu has joined #openstack-nova		11:51
*** ociuhandu has quit IRC		11:57
*** ociuhandu has joined #openstack-nova		12:01
*** mriedem has joined #openstack-nova		12:01
*** nicolasbock has joined #openstack-nova		12:02
*** amoralej is now known as amoralej\|lunch		12:04
*** udesale_ has joined #openstack-nova		12:11
*** udesale has quit IRC		12:13
openstackgerrit	Stephen Finucane proposed openstack/nova master: WIP: api: Add support for extra spec validation https://review.opendev.org/704643	12:14
*** jaosorior has joined #openstack-nova		12:15
openstackgerrit	Balazs Gibizer proposed openstack/nova master: Support unshelve with qos ports https://review.opendev.org/704759	12:24
openstackgerrit	Balazs Gibizer proposed openstack/nova master: Enable unshelve with qos ports https://review.opendev.org/705475	12:25
openstackgerrit	Balazs Gibizer proposed openstack/nova master: Merge qos related renos for Ussuri https://review.opendev.org/706766	12:27
*** dtantsur is now known as dtantsur\|brb		12:27
*** ociuhandu has quit IRC		12:31
*** jaosorior has quit IRC		12:32
*** ociuhandu has joined #openstack-nova		12:34
openstackgerrit	Stephen Finucane proposed openstack/nova master: WIP: api: Add support for extra spec validation https://review.opendev.org/704643	12:36
frickler	hello nova, the new alembic==1.4.0 is causing this failure, please have a look https://c264ca14759376f2bea5-6cd49316b9babb1f90743cae9cd67f9e.ssl.cf2.rackcdn.com/705380/10/check/cross-nova-py36/8bcec81/testr_results.html , I'll pin to the previous version for now, see https://review.opendev.org/705380	12:38
*** mdbooth_ has quit IRC		12:45
*** mkrai__ has joined #openstack-nova		12:46
*** damien_r has joined #openstack-nova		12:55
*** mriedem has quit IRC		13:00
*** adriant has quit IRC		13:02
*** adriant has joined #openstack-nova		13:02
*** mkrai__ has quit IRC		13:08
*** decrypt has joined #openstack-nova		13:09
*** jmlowe has joined #openstack-nova		13:10
*** tbachman has joined #openstack-nova		13:10
*** ratailor has quit IRC		13:11
openstackgerrit	Stephen Finucane proposed openstack/nova master: WIP: api: Add support for extra spec validation https://review.opendev.org/704643	13:14
*** takamatsu has quit IRC		13:14
*** amoralej\|lunch is now known as amoralej		13:18
*** rpittau\|bbl is now known as rpittau		13:22
efried	bauzas, gibi: o/	13:23
efried	What's the question?	13:23
gibi	efried: my remaning open question is https://review.opendev.org/#/c/552924/16/specs/ussuri/approved/numa-topology-with-rps.rst@223 about upgrade checks	13:25
gibi	but I have to jump on a call for the next 1 and a half hour so talk to you later	13:25
*** derekh has quit IRC		13:27
*** nweinber has joined #openstack-nova		13:28
*** jmlowe has quit IRC		13:30
*** zhanglong has quit IRC		13:33
*** zhanglong has joined #openstack-nova		13:34
bauzas	efried: gibi: will ping you later, just updating the spec now	13:36
*** takamatsu has joined #openstack-nova		13:37
*** jmlowe has joined #openstack-nova		13:37
alex_xu	bauzas: I leave one also https://review.opendev.org/#/c/552924/16/specs/ussuri/approved/numa-topology-with-rps.rst@421	13:47
*** dtantsur\|brb is now known as dtantsur		13:49
bauzas	ack	13:50
bauzas	alex_xu: good point, I was thinking of the upgrade issue when rolling the compute upgrades	13:52
alex_xu	good news, we can just copy the way of standard-cpu-resource-tracking way	13:53
alex_xu	probably need another workaround config option for disable the fallback placement query	13:53
bauzas	I'll sat this	13:53
bauzas	say*	13:54
*** nweinber has quit IRC		13:58
*** tkajinam has joined #openstack-nova		13:59
*** derekh has joined #openstack-nova		14:01
*** nweinber has joined #openstack-nova		14:07
*** jmlowe has quit IRC		14:09
*** jmlowe has joined #openstack-nova		14:11
*** zhanglong has quit IRC		14:28
*** takamatsu has quit IRC		14:29
*** tbachman has quit IRC		14:33
efried	bauzas, gibi: responded.	14:39
bauzas	dammit, need to Ctrl-R	14:39
bauzas	I'm litterally writing live	14:39
efried	bauzas: shouldn't be anything earth-shattering in my response.	14:40
bauzas	eek, sean-k-mooney did too	14:40
* bauzas needs to look again at all comments and process them		14:40
*** mriosfer has quit IRC		14:40
bauzas	efried: for the placement-ish syntax, I'm all for docs	14:42
bauzas	and not code	14:42
bauzas	FWIW	14:42
bauzas	like, you can do it but you can mess it	14:42
bauzas	your dog	14:42
bauzas	anyway, continuing to write	14:43
efried	IMO the only reason we shouldn't block placement-ish syntax is because we might miss something in our translation utility and have to provide a workaround until we fix it.	14:43
efried	even there be tygers.	14:44
*** Liang__ is now known as LiangFang		14:44
LiangFang	gibi: hi gibi, regarding https://review.opendev.org/#/c/689070/	14:45
LiangFang	gibi: how do you think to set trait for the host machine, and specify trait in flavor extra spec?	14:47
LiangFang	gibi: so the guest can be scheduled to the host with cache capability	14:47
*** vishalmanchanda has joined #openstack-nova		14:51
*** ociuhandu has quit IRC		14:57
*** elod has quit IRC		14:57
*** elod has joined #openstack-nova		14:58
*** artom has joined #openstack-nova		15:04
*** bbowen has joined #openstack-nova		15:07
*** artom has quit IRC		15:09
*** takamatsu has joined #openstack-nova		15:11
*** artom has joined #openstack-nova		15:12
*** artom has quit IRC		15:12
*** artom has joined #openstack-nova		15:13
openstackgerrit	Sylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs https://review.opendev.org/552924	15:22
*** mlavalle has joined #openstack-nova		15:22
bauzas	efried: gibi: sean-k-mooney: alex_xu: thanks for the comments, here is another baking of NUMA topology spec https://review.opendev.org/552924	15:22
gibi	LiangFang: if we only care about having a cache configured on the host then it can be a capability represented by a trait. If we also needs to think about the available size of the caches then it is a resource	15:23
*** spatel has joined #openstack-nova		15:24
gibi	bauzas, efried: ack, will check back soon	15:24
*** ociuhandu has joined #openstack-nova		15:28
*** tkajinam has quit IRC		15:29
stephenfin	dansmith: Any particular reason we don't use entrypoints for custom scheduler filters? Is it something we could/should do?	15:39
dansmith	don't we already?	15:40
dansmith	bauzas:	15:40
stephenfin	Custom scheduler drivers, yes. Not filters for the filter_scheduler though	15:40
dansmith	I was sure we did	15:40
stephenfin	You've to configure them via a python path in '[filter_scheduler] enabled_filters'	15:40
stephenfin	fwict	15:40
*** mkrai__ has joined #openstack-nova		15:43
*** jmlowe has quit IRC		15:44
*** TxGirlGeek has joined #openstack-nova		15:50
*** udesale_ has quit IRC		15:52
*** udesale_ has joined #openstack-nova		15:53
*** ociuhandu has quit IRC		15:54
*** KeithMnemonic has quit IRC		15:56
*** lpetrut has quit IRC		16:01
spatel	sean-k-mooney: morning	16:02
spatel	let me know if you around i want to share my load-test result.	16:02
*** udesale_ has quit IRC		16:06
gibi	bauzas: thanks for the update on the NUMA spec it looks good to me now	16:06
*** udesale_ has joined #openstack-nova		16:07
*** ociuhandu has joined #openstack-nova		16:11
*** ociuhandu has quit IRC		16:12
*** ociuhandu has joined #openstack-nova		16:12
*** ivve has quit IRC		16:13
*** udesale_ has quit IRC		16:16
sean-k-mooney	bauzas: im skiming through it now but ya im more or less happy with it. ill proably +1 it when i finish this pass	16:24
bauzas	dansmith: sorry was AFK	16:26
bauzas	stephenfin: yeah you have to set a specific option	16:27
dansmith	bauzas: np. I thought our scheduler filter interface was already using entry points, but stephenfin says it's not.. I'm sure he's right I was just poking you in case you were also surprised	16:27
bauzas	https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.available_filters	16:28
bauzas	stephenfin: ^	16:28
stephenfin	right, matches up with what I suspected so	16:28
stephenfin	bauzas: any reason not to deprecate that an ask people to use entrypoints instead?	16:28
stephenfin	given I'll be asking them to do that for extra spec validators	16:28
bauzas	well, I don't have any opinion	16:29
bauzas	we had a lot of entrypoints	16:29
bauzas	so for sure we could just use another one	16:29
sean-k-mooney	spatel: did you see an improvement?	16:30
bauzas	stephenfin: dansmith: that's how Nova knows about filters https://github.com/openstack/nova/blob/master/nova/loadables.py#L78	16:32
*** gyee has joined #openstack-nova		16:32
bauzas	for example you can ask to have a new custom filter by doing something like scheduler_available_filters = myownproject.scheduler.filters.climate_filter.ClimateFilter	16:33
*** dtantsur is now known as dtantsur\|afk		16:34
efried	bauzas, gibi, sean-k-mooney: I don't understand the "fallback" thing.	16:35
efried	https://review.opendev.org/#/c/552924/17/specs/ussuri/approved/numa-topology-with-rps.rst@516	16:36
sean-k-mooney	efried: its mimicing PCPUS	16:36
sean-k-mooney	basically when procesing a vm request with a numa toplogy if and only if the placement allocation candiates responce is empty we will fall back to the non numa aware query and let the numa toplogy filter elimidate the host if it cant fit	16:37
bauzas	efried: try: <call placement asking for NUMA-aware instances> except NoValidHosts: <call Placement like in Train>	16:38
sean-k-mooney	yes but without using excption for control flow :P	16:38
efried	Sorry, let me clarify	16:38
bauzas	efried: sean-k-mooney: I could have provided a link to the implementation instead of the spec :)	16:39
efried	I understand the what/how. I don't understand the why.	16:39
bauzas	efried: because,	16:39
bauzas	say a rolling upgrade	16:39
sean-k-mooney	efried: so we dont need to have a global flag in the schduler to turn on the translation	16:39
*** tbachman has joined #openstack-nova		16:39
bauzas	or just a Ussuri cloud with only one node being transformed	16:40
bauzas	then we could have some problems	16:40
sean-k-mooney	the same reason we did it for PCPUs to make upgrdade simpler	16:40
bauzas	efried: have you seen the Upgrade Impact section already ?	16:40
sean-k-mooney	bauzas: ya the single node case is basicaly during a rolling upgrade	16:40
bauzas	I tried to explain the why there	16:40
efried	Because what's going to happen here is:	16:41
efried	I'll upgrade to Ussuri, where my workaround option is set to not reshape;	16:41
efried	and all my flavors will continue to work, only a teeny bit slower because I'll get empty allocation candidates on the first query 100% of the time.	16:41
efried	And I won't notice, so I'll never flip the workaround off.	16:41
efried	so,	16:42
efried	Am I misunderstanding what release/combination will have this fallback mechanism in play?	16:42
gibi	efried: I understand the need to push the operators to switch. But pushing them to do the switch right at the upgrade feels too much to me. It would like an ultimate. "When you upgrade to Ussuri you will loose all the NUMA aware capacity of your cloud, but you can get them back iff you do the reshape one every NUMA aware compute, but you should not do that all at once as that will overload placement."	16:43
gibi	s/one/on/	16:43
dansmith	gibi: you think that's okay?	16:44
efried	Well, I don't buy that it would overload placement. The reshape will be one query per compute host.	16:44
sean-k-mooney	efried: i would hope we remove it when we remove the config option	16:44
gibi	dansmith: I think it is not OK to loose all the NUMA aware capacity at upgrade	16:44
dansmith	gibi: okay you were describing the problem, not the goal, is that right?	16:45
sean-k-mooney	we could remove it when we change the default but i would hope it will not last more then 2 releases	16:45
efried	Then we kind of have to position this as a "tech preview" or experimental change.	16:45
gibi	dansmith: my goal is not to loose the capacity at upgrade. I dont like the the quoted part	16:45
efried	It just seems pretty pointless to me.	16:45
dansmith	gibi: ++	16:45
efried	because we're going to do a bunch of work to enable something that nobody is going to use.	16:46
sean-k-mooney	well since we are not enableing it by default in ussuri it kind of will be	16:46
sean-k-mooney	but in vitora i think we should enable numa reporting by default. i would be happy to do that in ussuri but that might be a bit agressive	16:46
*** slaweq has joined #openstack-nova		16:47
efried	I buy that we can't enable it by default as long as we can't fit non-NUMA-aware workloads onto NUMA-modeled hosts.	16:47
sean-k-mooney	efried: well actully im not sure i agree with that but thats a sperate dicussion	16:48
efried	but with this fallback mechanism as designed, we're giving operators NO reason to switch over.	16:48
sean-k-mooney	efried: well with my downstream hat on i will be pushing to make numa on the default for our next lts	16:48
efried	which means we might as well not bother with this incremental improvement. We might as well just wait until we've solved the fitting problem.	16:48
sean-k-mooney	efried: we have said that for 4+ releases	16:49
sean-k-mooney	this will imporve schdulign time and reduce the chance of racecs for numa instnaces	16:49
efried	no	16:49
efried	it won't	16:49
sean-k-mooney	so it is still usful	16:49
efried	it will do exactly nothing	16:49
efried	except inject an extra placement call into every scheduling request.	16:49
sean-k-mooney	it will because non numa instance will ignore numa nosts	16:50
sean-k-mooney	and numa instance will ignore non numa nosts	16:50
efried	there will be	16:50
efried	no	16:50
efried	numa	16:50
efried	hosts.	16:50
sean-k-mooney	people will turn this on	16:50
sean-k-mooney	i can almost guarentee the then first release that we productise downstream with this feature will have it enabeld by default regardless of the upstream defaul	16:51
efried	what, do your downstream releases force only flavors with guest numa topos?	16:51
artom	sean-k-mooney, I don't see how we can do that...	16:51
artom	Wouldn't that make all computes essentially not usable for non-NUMA instances?	16:52
efried	^	16:52
artom	If they're too big to fit on 1 NUMA node?	16:52
gibi	efried: actually that what happens in my downstream project as we use pinning and huge page	16:52
sean-k-mooney	no they dont but we do force the numa hosts to be partioned	16:52
artom	So?	16:52
efried	artom: even if they're not too big. Because we're adding the HW_NUMA_ROOT trait.	16:52
sean-k-mooney	so all the hosts that will run numa workloads will have the feature turned on and the non numa host will have it off	16:52
artom	They'd still all suddenly NUMA-exposing	16:52
artom	sean-k-mooney, uh, so that's not "on by default as soon as they upgrade" :)	16:53
artom	efried, oh right	16:53
sean-k-mooney	it will be enabled in roles that configure the host for dpdk, hugepages or pinning	16:54
sean-k-mooney	i would argue it should be set to on for all host and provide a way for them to opt out if they need the giant vm case	16:54
sean-k-mooney	non of our telco customer need that case	16:54
artom	sean-k-mooney, our telco customers are not 100% of openstack users	16:55
artom	Would CERN want that, for example? :)	16:55
sean-k-mooney	i dont think so no	16:55
sean-k-mooney	i think they understand the perfomance cost and would use numa instances	16:55
sean-k-mooney	cern are not going to trow away 30% of there compute performance	16:56
sean-k-mooney	plus they have ironci if they really need to alocate all the resouce of a full host to an instance	16:56
artom	sean-k-mooney, maybe, maybe not. I don't know how they, or anyone else how isn't a RH telco customer, operate. Which is why I'm weary of making this opt-out, and would feel safer making it opt-in.	16:57
artom	If our deployment tooling wants to turn it on by default in some cases, that's cool	16:57
artom	But not a Nova default	16:57
sean-k-mooney	i think as proposed we get the best of both worlds	16:58
sean-k-mooney	the fallback mechanisum will mean that it will just work if you dont set anything	16:58
bauzas	sorry you lost my attention by not highlighting me	16:58
sean-k-mooney	and we can add a nova status check to warn that you shoudl enable this before upgrading to Victora	16:58
artom	sean-k-mooney, wait, the current proposed thing is disable_placement_numa_reporting = <bool> (default True for Ussuri)	16:59
sean-k-mooney	yes	16:59
artom	Which is what I'm advocating for	16:59
sean-k-mooney	it will be disabled in ussuri	16:59
artom	OK, so we agree :)	16:59
sean-k-mooney	and hopefully enabled by default in Victoria	16:59
dansmith	and that means what for flavors that currently have numa?	16:59
*** rpittau is now known as rpittau\|afk		17:00
sean-k-mooney	dansmith: the inital placement query will be empty	17:00
sean-k-mooney	then we fall back to the current query	17:00
artom	dansmith, IIUC same as what we did for PCPUs - try the new Placement query, if it comes back empty, try the legacy one	17:00
sean-k-mooney	and leave it to the numa toplogy filter to do all the work	17:00
*** priteau has quit IRC		17:00
sean-k-mooney	so they will just work	17:00
dansmith	ack, yep, I think that's the sane path for U	17:01
sean-k-mooney	stephenfin: by the way are you removing the fallback for PCPUs in U	17:02
sean-k-mooney	stephenfin: that was the plan but i dont think you have had time to work on it	17:02
stephenfin	I could but I was thinking I'd wait another cycle	17:02
dansmith	if it's on by default, could we do the opposite for non-numa flavors? meaning, query with numa_nodes=1 and if we get no options, then try with =2, etc?	17:02
dansmith	until we get a range capability with placement	17:02
stephenfin	To let it bake in more. It's just dead code once the correct config options have been set	17:03
sean-k-mooney	am maybe but the performace of that might not be great	17:03
sean-k-mooney	or acccptable.	17:03
artom	dansmith, that would assign a NUMA topology even if the user didn't request it (explicitly or implicitly), no?	17:03
sean-k-mooney	artom: ya but that would actully be fine	17:04
sean-k-mooney	they expressed no prefernce	17:04
artom	NUMA topologies have a whole bunch of limitations :)	17:04
sean-k-mooney	so we can decided what that is	17:04
sean-k-mooney	artom: you removed the main one e.g. live migration	17:04
artom	True...	17:04
openstackgerrit	Douglas Mendizábal proposed openstack/nova master: Allow TLS ciphers/protocols to be configurable for console proxies https://review.opendev.org/679502	17:04
dansmith	artom: yes, it would, but I definitely think that we should be getting to the place where we don't just pretend numa doesn't exist	17:05
artom	I'd still feel uncomfortable springing that on people	17:05
dansmith	artom: so we want people to not have to worry about it in detail, but not just let them totally ignore it, IMHO	17:05
sean-k-mooney	i think that is something we shoudl consider in V when we consider changing the default	17:06
dansmith	yup	17:06
sean-k-mooney	if there is a sensable way to express it to placmenet or a sensible way to make it work form our side we should explore it	17:06
dansmith	what I don't think we should do, is continue to have two distinct ways of running instances forever	17:06
sean-k-mooney	i agree with that	17:07
artom	Same here	17:07
artom	Though I'm not entirely convinced giving everyone a NUMA topology is the way to do it	17:07
dansmith	similar to cellsv1, that didn't work out well, and we never closed the feature and bug gap for people using it until cellsv2 where we just make everyone use it.. it's a little more overhead, but the gap is way smaller	17:07
dansmith	artom: but everyone _has_ a numa topology	17:08
artom	dansmith, you know what I mean ;)	17:08
dansmith	if we need to retool the numa stuff in nova (and probably placement) then that's what we need to do	17:08
*** martinkennelly has quit IRC		17:08
artom	I think I'd lean towards the can_split stuff in placement	17:08
artom	Though I admittedly have no idea how complicated that would be	17:09
dansmith	yep, it seems like the major barrier here is lack of expressivity with what we ask of placement when we don't care as much	17:09
artom	Like, if the host is NUMA, fine, expose it	17:09
artom	But as you said, being able to say "I don't care about NUMA" would be the best way to solve this, I think	17:10
*** dave-mccowan has joined #openstack-nova		17:10
sean-k-mooney	artom: can split is very ineefincet to implement in sql	17:10
artom	So, I have beef with "placement has to be SQL", but that's just me ;)	17:10
sean-k-mooney	artom: if "i dont care about numa" means we are free to invent numa toploigies then sure	17:10
artom	sean-k-mooney, weeelll... wouldn't that lead to a bunch of packing problems?	17:11
bauzas	folks, I have to disappear	17:11
bauzas	leave comments, disagreements, concers on the spec	17:11
sean-k-mooney	artom: oh it doesnt have to be sql. but it would be very hard to support can split with the current impeneation	17:11
sean-k-mooney	artom: no	17:11
artom	I guess not, if you retry with enough combinations of guest NUMA nodes and CPUs per node	17:11
*** martinkennelly has joined #openstack-nova		17:12
artom	So if you some how end up with NUMA0 with 1 CPU free, and NUMA1 with 3 CPUs free, and boot an instance with 4 CPUs, you would need to retry until you get to the numa0_cpus:1,numa2_cpus=3 combo	17:12
sean-k-mooney	honestly doing 4 placment queries with 1-4 numa nodes is still proably faster then the numa toplogy filter today	17:12
sean-k-mooney	but i dont think this is productive to continue now	17:13
artom	True	17:13
artom	(On the second point)	17:13
dansmith	if, like I said, we had a range of nodes, I would think placement could just loop internally much faster than even our retry process	17:13
dansmith	and just dump us more options in the first go	17:13
sean-k-mooney	dansmith: ya its really a proablem of being able to express our actul constratin to placmenet so they can do an efficent thing	17:14
dansmith	right	17:14
sean-k-mooney	at the moment we are over spcifying and underspecifying at the same time	17:14
artom	dansmith, I think placement would want to now know about NUMA	17:14
artom	And only think of things in terms of generic RPs	17:14
sean-k-mooney	since the way we express the query does not fully match what we need	17:14
artom	*to not know	17:15
dansmith	artom: that's the proposal, AFAIK	17:15
dansmith	artom: to model this in placement as RPs with the relevant hierarchy	17:15
artom	dansmith, how does this fit with your "internal placement retry loop" idea though?	17:15
sean-k-mooney	artom: right but we added some emantic meaning that is not helpful to come concpets in placment.	17:15
dansmith	artom: ?	17:15
artom	dansmith, well, wouldn't placement need to understand what a NUMA node is to do that?	17:16
sean-k-mooney	for example resouces:vcpu=2 must allcaote 2 cpus form the same RP	17:16
sean-k-mooney	artom: no it just need to know that some resouce need to be in the same sub tree	17:16
*** ivve has joined #openstack-nova		17:16
sean-k-mooney	with a parent contiing a opacue trating "HW_NUMA_ROOT"	17:17
artom	sean-k-mooney, I guess I can see that	17:17
dansmith	artom: no, I don't think so, we just need to provide some way to communicate to placement that it can satisfy the required resources by allowing them to exist in multiple parts in the tree, with some minimum amount per provider, with maybe matching ratios of cpus and memory	17:17
dansmith	artom: I don't mean an "&numa_nodes=1-4" level of explicitness (even though I've said that just as an example), but it really just needs to be the generic form of that	17:17
artom	dansmith, I wonder if we could tell placement something like "aggregate_at_root=true", and then it could sum the child RPs resources temporarily when handling that query	17:18
dansmith	artom: we need more than that, I think	17:18
sean-k-mooney	no that is not enouch	17:18
dansmith	artom: we need to be able to say "don't give me 7 cpus on one node with 1MB of ram, and 1 CPU on the other with 8G"	17:18
sean-k-mooney	the thing that breaks with simple models is alwasy disk space	17:18
sean-k-mooney	if i asked for 100	17:19
dansmith	artom: something like "cpus and mem must match 60/40 across the split" or something	17:19
sean-k-mooney	100G you cant give me 2x50G	17:19
dansmith	yep	17:19
sean-k-mooney	so it has to be per resouce class and we need to express grouping constratins and potaentail sizing info like the 64/40 split	17:20
*** gmann is now known as gmann_afk		17:20
sean-k-mooney	artom: right now to do ^ we have to specify the toplogy exactly rather then saying these are the limits give me anyting that matches	17:20
artom	But... the disk would never be on a NUMA child RP...	17:20
dansmith	artom: he's giving an example of a resource splitting constraint	17:21
artom	TBH I still don't see why we need to express splitting constraints, but I'm probably just being thick	17:22
artom	And as sean-k-mooney said, it's an academic discussion not relevant to the current spec	17:22
sean-k-mooney	same thing applies to hugpegaes. if i ask for 512mb of hugepages and the host only has 1G hugepages allocated you cant split it	17:22
sean-k-mooney	disk is just more approchable	17:22
dansmith	because if you ask for 16 CPUs and 32G of RAM, we need to be able to say "we're willing to take that split across two nodes as long as the ratio of the split for those resources is at most 60/40"	17:22
artom	sean-k-mooney, right, but hugepages we model as their own RP	17:22
*** mkrai__ has quit IRC		17:23
artom	So you'd never get 1GB hugepages anyways, you'd get, for example, 400MB from 1 RP, and 112 from another (unrealistic numbers, I know)	17:23
artom	dansmith, so that's my thing - why do we want to say that? What's wrong with CPUs split 1/15 and memory 30GB/2GB	17:24
sean-k-mooney	it would get rejected by the step size yes	17:24
*** ccamacho has joined #openstack-nova		17:24
dansmith	artom: because that would be pretty unhelpful?	17:24
*** igordc has joined #openstack-nova		17:24
artom	dansmith, hey, they user they don't care about NUMA topologies ^_^	17:25
sean-k-mooney	but point is there are limits on how thing can be split that depned on the resouce class and how it willl be used	17:25
artom	But seriously, unhelpful from a performance POV?	17:25
dansmith	artom: right, the user isn't opinionated, but they still want a sane instance	17:25
dansmith	artom: not caring about numa doesn't mean we should give them something completely pathologically stupid	17:25
*** ociuhandu_ has joined #openstack-nova		17:26
artom	lulz - we need a hw:sanity extra specs	17:26
artom	And if they set it to ludicrous we do ^^	17:26
artom	dansmith, but yeah, I get your point	17:26
dansmith	if we did this, we'd want to be able to specify what those splits are, and if the op really doesn't care, then they can set the split policy to something very fine	17:26
dansmith	but it wouldn't make much sense for them to do that	17:27
artom	I was mostly joking about that extra spec	17:27
sean-k-mooney	yes we know	17:27
sean-k-mooney	anyway form my view point if you dont set hw:numa_nodes at all it gives nova the freedome to do something sane	17:28
sean-k-mooney	that can be create a numa toplogy if it chooese	17:28
*** ociuhandu has quit IRC		17:29
sean-k-mooney	for now we leave libvirt invent a single numa node with no affinity	17:29
*** ociuhandu_ has quit IRC		17:30
sean-k-mooney	haveing one or multiple numa nodes in the guest and mapping them to 1 or more numa node on the host are two different thing	17:30
sean-k-mooney	so can if we want expose 1 numa node to the guest and on the host map it across them	17:31
sean-k-mooney	i think we can do better then that howwever. anyway time to go review something else	17:31
*** evrardjp has quit IRC		17:34
*** evrardjp has joined #openstack-nova		17:34
*** martinkennelly has quit IRC		17:39
*** ccamacho has quit IRC		17:42
*** spatel has quit IRC		17:50
stephenfin	ah, crap. Today was supposed to be spec review day	17:51
stephenfin	Guess tomorrow is spec review day for me now \o/	17:51
yoctozepto	stephenfin: the most busy today - tomorrow :-)	17:53
stephenfin	sean-k-mooney: Tempest is failing on my extra spec validation patch because it's using generic e.g. 'key1=value1' extra specs. What's the most generic extra spec we've got?	17:57
stephenfin	I've been using 'hw:numa_nodes' but that's libvirt/HyperV specific	17:57
*** spatel has joined #openstack-nova		17:57
spatel	sean-k-mooney: sorry i was in meeting	17:57
*** derekh has quit IRC		17:58
spatel	sean-k-mooney: as per your recommendation i have added cpu_threads=2 and cpu_sockets=2 in flavor and run test but result was OK (compare to 16 vCPU with single numa0)	17:59
spatel	still i don't understand why, erlang correctly detected CPUTopology on VM but still result was poor with 28vCPU	18:00
*** igordc has quit IRC		18:02
*** igordc has joined #openstack-nova		18:06
*** davidsha has quit IRC		18:07
openstackgerrit	Lee Yarwood proposed openstack/nova master: images: Move qemu-img info calls into privsep https://review.opendev.org/706897	18:11
openstackgerrit	Lee Yarwood proposed openstack/nova master: images: Allow the output format of qemu-img info to be controlled https://review.opendev.org/706898	18:11
openstackgerrit	Lee Yarwood proposed openstack/nova master: virt: Pass request context to extend_volume https://review.opendev.org/706899	18:11
openstackgerrit	Lee Yarwood proposed openstack/nova master: WIP libvirt: Fix attached encrypted LUKSv1 volume extension https://review.opendev.org/706900	18:11
*** xek_ has joined #openstack-nova		18:12
*** jcosmao has left #openstack-nova		18:13
openstackgerrit	Stephen Finucane proposed openstack/nova master: WIP: api: Add support for extra spec validation https://review.opendev.org/704643	18:13
openstackgerrit	Stephen Finucane proposed openstack/nova master: trivial: Remove FakeScheduler https://review.opendev.org/707224	18:13
openstackgerrit	Stephen Finucane proposed openstack/nova master: conf: Deprecate '[scheduler] driver' https://review.opendev.org/707225	18:13
openstackgerrit	Stephen Finucane proposed openstack/nova master: doc: Improve documentation on writing custom scheduler filters https://review.opendev.org/707226	18:13
stephenfin	sean-k-mooney: I went with 'hw:numa_nodes' and 'hw:cpu_policy' for want of something better	18:13
*** macz has joined #openstack-nova		18:15
*** xek has quit IRC		18:15
*** umbSublime has quit IRC		18:23
*** maciejjozefczyk has quit IRC		18:23
*** tbachman has quit IRC		18:23
*** tbachman has joined #openstack-nova		18:24
*** mriedem has joined #openstack-nova		18:30
*** ralonsoh has quit IRC		18:33
*** martinkennelly has joined #openstack-nova		18:37
efried	sean-k-mooney, bauzas, gibi: I think I would actually prefer the approach dansmith suggests, even if we only do it at the coarsest level, rather than effectively disable the topology modeling in ussuri.	18:44
*** gmann_afk is now known as gmann		18:49
efried	in other words:	18:49
efried	- Model with NUMA topology by default. Provide the [workaround] option to disable (and un-reshape) for situations where the following is just sh*tting all over itself and nothing is landing.	18:49
efried	- Flavors with hw:numa*-isms get translated as specced.	18:49
efried	- Flavors without hw:numa*-isms get looped for $n in range(0, $max_sane_number_of_numa_nodes_we_expect_a_host_to_ever_have) to behave as if they had specified hw:numa_nodes=$n, stopping as soon as we get a hit.	18:49
efried	This is without changing anything in placement.	18:50
*** amoralej is now known as amoralej\|off		18:51
efried	For uneven splits, we just get as close as we can, but make no attempt to be "fuzzy" (like "up to 60/40" or anything like that). So like, for $n=2 and VCPU=10, we try 10, then 5/5, then 3/3/4, then 2/3/2/3.	18:52
dansmith	there has to be some sort of policy tunable for how wild we're able to get I think	18:56
efried	max_numa_nodes_guessed ?	18:56
*** psachin has quit IRC		18:57
efried	Are there really hosts out there with more than, say, 8 NUMA cells?	18:57
dansmith	no, I mean how small of a footprint on a given node we're going to allow	18:57
efried	To be clear, I'm talking about splitting as evenly as possible, always.	18:58
efried	You're saying that sometimes that would result in unreasonably small footprint on one numa node anyway?	18:58
efried	obv we stop trying to split if any of the cells are going to get 0 of anything.	18:59
efried	so if you ask for 2 vcpu, we go to a max of hw:numa_nodes=2	18:59
dansmith	sure, but I think we need to not be willing to split memory at 1MB and 8191MB	18:59
dansmith	or 31 cpus and 1	18:59
dansmith	and the ratio needs to be close/equal for cpus and memory	19:00
efried	Right, I'm saying that happens implicitly	19:00
dansmith	how?	19:00
efried	each iteration of the loop simply behaves as if you said hw:numa_nodes=$n.	19:00
efried	which simply splits your CPU and mem "evenly" across $n nodes.	19:01
dansmith	that just seems to naive to me	19:01
efried	yes	19:01
efried	it is	19:01
efried	but so is "I don't care about NUMA"?	19:01
efried	And it ought to work 99% of the time	19:01
efried	and if it doesn't, switch on your [workaround] for a couple of hosts.	19:01
dansmith	I want the workaround to go away, remember	19:02
efried	Yes, the workaround goes away completely once we beef up placement to understand can_split	19:02
dansmith	okay your even splitting is just the first pass at sanity? then that's fine	19:02
efried	oh, yeah, eventually we want the utopia where all of this happens in one call with can_split, whose ratios are tunable (via placement side conf? via nova conf fed into placement qparams?)	19:04
dansmith	has to be nova-side	19:04
dansmith	communicated to placement via the query	19:04
efried	then we won't need the workaround anymore, because everything will be able to land (and if it can't, it's because it shouldn't).	19:05
efried	In practice, we may even find that nobody needs the workaround. But we'll see.	19:05
*** maciejjozefczyk has joined #openstack-nova		19:07
efried	Did we decide how we're going to deal with "control plane is updated but some computes are not"? Does the control plane wait to start using the new query style until all the computes are updated? We have that capability, right?	19:09
efried	okay, I see that discussed in the spec.	19:11
*** LiangFang has quit IRC		19:11
sean-k-mooney	efried: i think if we go that route in ussuri we wont be able to land it in time	19:18
efried	why not?	19:19
efried	we're not talking about trying to implement can_split in any form in ussuri	19:19
efried	are you concerned that the progressive-splitting algorithm is too complicated?	19:20
sean-k-mooney	just multiple queries wehre we try to progressively split	19:20
sean-k-mooney	efried: yes	19:20
efried	meh, I don't see how it's any worse than the proposed fallback.	19:21
sean-k-mooney	efried: it will have to take into accoung numa, native request groups in teh flavor, external requests form cyborng and netorn ports and not suck form a performnce point of view	19:21
sean-k-mooney	the proposed fallback is two queries, the native nuam one followed by the query we do today	19:21
sean-k-mooney	and it only impacts the perfomce of numa instances	19:22
sean-k-mooney	the other way imacpts the perfomce of all non numa instnaces	19:22
efried	sean-k-mooney: I don't think it has to take all that stuff into account at all.	19:22
efried	sean-k-mooney: It should behave exactly as if you said hw:numa_nodes=$n with no other hw:numa*-isms.	19:23
efried	(except I think we said we would bounce if we couldn't split evenly; that restriction would have to be lifted for this case.)	19:23
sean-k-mooney	so we would create a fake flaovr where we overrid that and pass it to the current get numa constratis funct	19:24
sean-k-mooney	*function	19:24
efried	if you like	19:24
efried	that would be the spirit, anyway.	19:24
sean-k-mooney	i gues that makes it simpler	19:24
*** maciejjozefczyk has quit IRC		19:25
sean-k-mooney	so the get_numa_constraits function will reject any invalid toplogy with regard to even spliting with an excetion	19:25
sean-k-mooney	so we would just loop and contiue if an excption is raised up to the limit	19:25
efried	or we relax the constraint to split as close to evenly as possible. Or do that split first.	19:26
efried	Implementation detail. Point is, it shouldn't be super hard to figure out.	19:26
sean-k-mooney	we could use the asemetric numa modeling support that is there yes	19:26
sean-k-mooney	e.g. if you had a 9 core vm and we are on numa=2 do 4cpus+5cpus	19:27
sean-k-mooney	instead of going to 3 numa nodes with 3 cpus	19:28
efried	right	19:28
sean-k-mooney	im not sure which is more likely to cause fragmentation of the top of my head	19:28
sean-k-mooney	we should tell people to just use powers of 2	19:28
efried	example I gave above was with 10 VCPUs, we would try 10, then 5/5, then 3/3/4, then 2/3/2/3.	19:28
efried	no	19:28
sean-k-mooney	i was jokeing	19:29
efried	we should tell people to use real numa specs if they care.	19:29
sean-k-mooney	it does make life easier when they do but ya i can see that working	19:29
sean-k-mooney	ok if we caluate the split in the tempory flavor we pass to the numa constratis function then it would populate the instance numa toplogy object as if the user had set it manuyaly in teh falvor	19:30
sean-k-mooney	then if we save that in the instnace we could ensure we dont break live migration by changing it	19:31
efried	just so.	19:31
efried	well, I would expect we shouldn't save it in the flavor, because we want the instance to be able to morph to fit somewhere else if it needs to.	19:31
sean-k-mooney	right	19:31
efried	but I don't know how that works, do you break an instance if you "change" its topo from under it?	19:31
sean-k-mooney	i ment save the instance_numa_toplogy object	19:31
sean-k-mooney	not the falavor	19:32
sean-k-mooney	efried: during live migration you would	19:32
sean-k-mooney	cold migration it might mess up some manula config but it should not break it in general	19:32
efried	hm, well that's a bummer. So how do we migrate numa-agnostic instances today?	19:32
sean-k-mooney	i was suggesting once we select a toplogy we store it in the request_spec and instnace	19:32
sean-k-mooney	so that it stays the same for the liftime of the instance unless you resize	19:33
sean-k-mooney	or rebuild	19:33
sean-k-mooney	am today	19:33
sean-k-mooney	non numa isntace are alwasy exposed as 1 numa node	19:33
sean-k-mooney	so it never chagnes form the gest point of view	19:33
sean-k-mooney	so we just lie to the guest	19:34
efried	couldn't we continue lying to the guest?	19:34
sean-k-mooney	we can yes	19:34
efried	does the guest do things differently if it knows CPU x is affined to memory y?	19:34
sean-k-mooney	so we would do the progessive spliting to select the host resouces and present it as 1 numa node to the guest	19:34
sean-k-mooney	but that will have worse performace then telling it its actull toplogy	19:35
sean-k-mooney	yes	19:35
sean-k-mooney	the kernel will take that into account wehn allocation memroy for a proces	19:35
sean-k-mooney	trying to use numa local memory ahead of remote numa memory	19:35
spatel	sean-k-mooney: hey	19:35
efried	well, I guess this is a problem we would have eventually anyway, right dansmith?	19:35
sean-k-mooney	if we dotn expose the toplogy to the vm the vm kernel wont know how to optimise	19:35
spatel	did you see my last mesg?	19:36
dansmith	efried: what? needing everything to support guests with numa?	19:36
spatel	running vm on single numa0 give very high performance compare to running on both numa node (with cpu_socket=2 and cpu_threads=2 option)	19:37
efried	dansmith: TL;DR: In the world where you say you don't care about NUMA, we give you an instance that's NUMA-ified, but with whatever split we were able to fit.	19:37
efried	If you migrate that instance, you have to preserve that topo on the target; you can't just munge it to a new shape.	19:37
efried	Which will then limit where you can fit it on migration.	19:37
sean-k-mooney	spatel: i was away but just saw it now. fundemtally i think erlang is jut not optimising correctly. im not really sure how to help other then suggestin run more small vms if you can scale out the application horrizontally instead of vertically	19:37
dansmith	sean-k-mooney: presumably if the flavor on the instance is not numa-aware then we can just find a new host during cold migration and give it a new topology when it moves	19:37
sean-k-mooney	dansmith: yes we could	19:37
dansmith	efried: only on cold migration	19:38
dansmith	sorry	19:38
dansmith	efried: only on live migration	19:38
dansmith	efried: on cold migration it could change because you're rebooting and there shouldn't be anything in the guest that persistently cares what the topology is..	19:38
efried	Okay, that would be a new limitation.	19:38
spatel	sean-k-mooney: totally understand but i am going to loose 16 cpu in that case. but anyway i can live with that	19:38
dansmith	efried: we have that limitation today and it's handled by the numa live migration stuff, AFAIK	19:38
efried	dansmith: oh, my understanding was that, by lying to the guest and saying it only has one NUMA node, regardless of how many are under the covers, we can change the under-the-covers on live migration without "affecting" the guest. It would still have crappy performance on both sides, but it wouldn't notice that anything had changed.	19:39
sean-k-mooney	efried: well the limitation would be dont change the view of hardwar the vm sees in live migration	19:39
dansmith	efried: and selecting a host in that situation should be the same as selecting a host for a new instance boot where the flavor cares deeply about the topology	19:39
sean-k-mooney	when framed that way its what we do today	19:39
dansmith	efried: only insofar as it is unaware of how stupid it's being, regardless of what is underneath	19:40
sean-k-mooney	if we lye to the vm so it only sees 1 numa node we can in some cases change the mapping underneth yes	19:40
dansmith	efried: so yes, if all guests are numa-aware then migrating indifferent ones becomes a little more restrictive, but that's the same goal as not pretending this stuff doesn't exist at boot time, IMHO	19:41
efried	dansmith: okay, I understand and agree with that; I'm just questioning whether that's going to effectively spike the chances of NVH trying to live migrate a NUMA-agnostic instance.	19:41
efried	sounds like the answer is yes, and we're okay with that.	19:41
dansmith	efried: it may increase the difficulty of moving things, yes	19:41
sean-k-mooney	efried: well in a non full clould it very likely that the numa=1 case will just work	19:42
sean-k-mooney	unless the vm is very large	19:42
dansmith	efried: I refer to the documented goal of not trying to schedule the last byte of memory	19:42
dansmith	yep	19:42
sean-k-mooney	and in that case the numa=2 case is likely to work	19:42
sean-k-mooney	so i dont think it would spike much	19:42
dansmith	small guests are less likely to care about numa, and thus more likely to fit into numa=1, and thus more likely to be easily movable	19:43
dansmith	large guests are more likely to need numa for proper performance and have the moving restriction today	19:43
*** vishalmanchanda has quit IRC		19:43
sean-k-mooney	efried: dansmith im wondering if we could/should have a second spec or defer this to the implematnion	19:44
efried	This spec needs to say whether we're going to try to do the progressive splitting thing.	19:44
sean-k-mooney	e.g. if we agree withthe proposed placmenet modeling should we decied how to do the query splitting seperatly	19:45
efried	But that's not the only factor in play. We also need to address the partially-upgraded cloud.	19:45
sean-k-mooney	vvs fallbackj	19:45
sean-k-mooney	so with the progress spilting i think we still need the fallback	19:45
sean-k-mooney	to cover that case	19:45
sean-k-mooney	the fall back is the only thing that can ever land on the non upgraded hosts	19:46
efried	I'm not sure we should do the fallback thing at all.	19:46
efried	Not if there's a way we can simply avoid doing the translation until all computes are upgraded.	19:46
sean-k-mooney	we cant without a global config	19:47
sean-k-mooney	which is why we have the fallback for pcpus	19:47
efried	I thought the control plane was able to tell which computes were at what level?	19:47
dansmith	efried: we have to do the fallback no?	19:47
efried	By RPC something something?	19:47
dansmith	efried: it can	19:47
sean-k-mooney	it was a gloabl config untill we agreed to do the fallback at the end of train	19:47
efried	dansmith: isn't that going to end up violating pack/spread weighing and server affinity groups, because it will always favor upgraded/reshaped computes?	19:48
*** spatel has quit IRC		19:48
dansmith	yes?	19:48
efried	I mean, I see your point, that you really can't upgrade a compute and reshape it unless the scheduler is going to do some translating.	19:49
dansmith	or, do both queries, merge the two results and let the filters/weighers decide?	19:49
efried	gross. But yeah.	19:49
dansmith	doesn't seem more gross to me.. two queries, yes, but everything will start with two queries effectively	19:49
sean-k-mooney	dansmith: do we merge them for PCPUs or is there a reason we dont?	19:50
dansmith	idk	19:50
efried	IIRC we don't	19:50
efried	we do one, then if no results, we do the other	19:50
sean-k-mooney	right which by the way we still do	19:50
efried	as for a reason, probably just didn't think about the weighing etc.	19:51
*** jmlowe has joined #openstack-nova		19:51
sean-k-mooney	no i remebere this coming up	19:51
sean-k-mooney	i just dont recally why we chose to/not to merge them	19:51
sean-k-mooney	so is the poposal always do both queries and merge them or to the progressive spliting	19:52
efried	Both	19:53
efried	because	19:53
efried	we want upgraded hosts to cut over to NUMA-modeled by default.	19:53
sean-k-mooney	ok so progressive for non numa and both qureies for numa	19:53
efried	I think we need two queries for both.	19:54
sean-k-mooney	i think it will be more like 2 for numa and 5 for non numa	19:54
efried	yeah	19:54
sean-k-mooney	5 asumign we dint max_numa_nodes=5	19:55
efried	4	19:55
sean-k-mooney	sorry 4	19:55
efried	I asked this earlier: what's the max number of NUMA nodes we know of on any system? Is it 4?	19:55
sean-k-mooney	could we bail out if the first numa query passed	19:55
efried	yes	19:55
sean-k-mooney	on a 32-64 core 2 socket amd eypci host with a numa node per l3 region 16-32 numa node	19:56
efried	ye gods	19:56
*** tbachman has quit IRC		19:57
sean-k-mooney	if you dont expose a numa node per l3 region then 8 is more realisitc	19:57
efried	Can we tell from the db?	19:57
efried	I guess we could tell by querying placement.	19:57
sean-k-mooney	its in teh hohst numa toplogy blob in the cell db	19:57
*** tbachman has joined #openstack-nova		19:57
sean-k-mooney	so yes technically.	19:57
efried	but I don't know that we really want to go discovering that every time we schedule.	19:58
sean-k-mooney	every time defintly not	19:58
sean-k-mooney	schduler config option?	19:58
efried	configurable [scheduler]max_implicit_numa_nodes ?	19:58
efried	yeah	19:58
efried	because even if it's possible to have a 32-way split, that would almost never be a good idea probably.	19:58
sean-k-mooney	set it to 0 to diable numa qureis entrily otherwise its the limit	19:58
sean-k-mooney	efried: it give you almost a 35% performace boost	19:59
efried	"disable" meaning what?	19:59
sean-k-mooney	efried: disabel mean i have disable numa reporting in my entire cloud so dont even try	19:59
efried	Need to grok how that query would be different from the non-upgraded query.	20:00
sean-k-mooney	it would be the same	20:00
sean-k-mooney	just want we do today in tran	20:00
efried	Oh, because it would have the HW_NUMA_ROOT trait.	20:00
sean-k-mooney	well require=!HW_NUMA_ROOT would be the detla form train i guess	20:01
sean-k-mooney	but if no host reports numa thats a noop	20:01
efried	In this new picture, I'm not sure we need/want to forbid that trait.	20:01
sean-k-mooney	we dont	20:02
sean-k-mooney	if we are allowed to invent numa toplogies	20:02
sean-k-mooney	we could specificly to land on un upgraded hosts	20:02
sean-k-mooney	but it has no other use	20:02
efried	in this case we don't want to target un-upgraded hosts.	20:02
efried	we want to allow landing there, but not force it ever.	20:03
sean-k-mooney	sure	20:03
efried	hum, but what we don't want is to land across numa nodes on an upgraded host. So yeah, I think the 'fallback' query in both cases should have !HW_NUMA_ROOT.	20:04
sean-k-mooney	so [scheduler]max_implicit_numa_nodes=0 means dont add HW_NUMA_ROOT or granular groups, anyting above 0 is the amount of numa nodes to try for progressive spliting	20:04
efried	yeah. But only for NUMA-agnostic flavors. For NUMA-aware flavors, we have an explicit number of nodes we're trying for.	20:04
sean-k-mooney	yes	20:05
efried	Do you have it in you to write this up, sean-k-mooney?	20:05
efried	I made a start on it, but I need to step away for... possibly the rest of the day.	20:05
sean-k-mooney	ill start a second etherpad.	20:05
sean-k-mooney	or if you have a start put it in one and i can extend it.	20:05
sean-k-mooney	efried: do we still need teh per host config option in this model	20:06
sean-k-mooney	i think no if we do the progressive spliting	20:06
sean-k-mooney	but we might want to for perfomcne reasin if we jsut want to turn it off	20:06
efried	yes we do, because sometimes the progressive splitting won't get a result, and they want to force a host to behave like a Train host.	20:06
sean-k-mooney	ok	20:07
efried	but that's why the workaround is off by default. You have to really need it to turn it on.	20:07
sean-k-mooney	ok that makes sense	20:07
sean-k-mooney	dansmith: would you be oke with a [scheduler]/max_implicit_numa_nodes config option	20:07
sean-k-mooney	to contol the progessive spliting	20:08
efried	"config-driven API behavior" warning. Not sure I see a better alternative though.	20:08
sean-k-mooney	efried: well the virt driver can today dowhatever the hell it like in this case anyway so im not sure its an observable thing	20:09
sean-k-mooney	at least form the api perspctive	20:09
sean-k-mooney	but i get where your coming form	20:09
*** jmlowe has quit IRC		20:14
dansmith	that's totally not config-driven api behavior	20:15
dansmith	and yes, I think that's fine	20:15
sean-k-mooney	ok ill try to write this up in a comment to the spec and then ill try not to melt bauzas brain when i try to explain this to him tomorrow in our downstream tech call	20:18
sean-k-mooney	i think 90% of the spec woudl remain the same we just need to update the section that refence the fallback and upgrade impact	20:18
efried	sean-k-mooney: I left a comment	20:22
efried	I think I covered the high points, but I'm pretty fried (efried) so I probably missed some things, if you want to fill in.	20:22
efried	gtg o/	20:22
*** efried is now known as efried_afk		20:22
sean-k-mooney	efried_afk: ill review it after coffee	20:22
efried_afk	thx	20:23
sean-k-mooney	efried_afk: o/	20:23
*** owalsh has quit IRC		20:30
*** martinkennelly has quit IRC		20:41
*** owalsh has joined #openstack-nova		20:44
*** spatel has joined #openstack-nova		21:18
*** nweinber has quit IRC		21:20
*** spatel has quit IRC		21:22
*** xek_ has quit IRC		21:35
*** mmethot has quit IRC		21:36
*** mmethot has joined #openstack-nova		21:37
*** mmethot has quit IRC		21:38
*** mmethot has joined #openstack-nova		21:38
*** mmethot has quit IRC		21:42
*** damien_r has quit IRC		21:44
gmann	johnthetubaguy: what you think of passing service as actual target in service policies? - https://review.opendev.org/#/c/676688/8/nova/api/openstack/compute/services.py	21:53
*** maciejjozefczyk has joined #openstack-nova		21:57
*** maciejjozefczyk has quit IRC		21:59
*** rcernin has joined #openstack-nova		22:07
*** umbSublime has joined #openstack-nova		22:11
*** kaisers has joined #openstack-nova		22:21
*** mriedem has quit IRC		22:25
*** slaweq has quit IRC		22:25
artom	We should probably address those errors when running func tests:	22:35
artom	Exception ignored in: <function _after_fork at 0x7f6382a2ed40>	22:35
artom	Traceback (most recent call last):	22:35
artom	File "/usr/lib64/python3.7/threading.py", line 1373, in _after_fork	22:35
artom	assert len(_active) == 1	22:35
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Skip all integration jobs for policies only changes. https://review.opendev.org/707268	22:42
gmann	efried_afk: stephenfins dansmith gibi melwitt ^^ this will speed up the gate for policy BP changes.	22:43
gmann	alex_xu: ^^	22:43
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Skip to run all integration jobs for policies-only changes. https://review.opendev.org/707268	22:44
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Skip to run all integration jobs for policies-only changes. https://review.opendev.org/707268	22:44
*** slaweq has joined #openstack-nova		22:50
*** iurygregory has quit IRC		22:52
*** slaweq has quit IRC		22:55
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Skip to run all integration jobs for policies-only changes. https://review.opendev.org/707268	23:06
gmann	melwitt: updated ^^	23:06
melwitt	ack	23:06
*** tkajinam has joined #openstack-nova		23:06
*** slaweq has joined #openstack-nova		23:11
*** jmlowe has joined #openstack-nova		23:15
*** slaweq has quit IRC		23:16
sean-k-mooney	dansmith: efried_afk: i did a thing. https://review.opendev.org/#/c/552924/17/specs/ussuri/approved/numa-topology-with-rps.rst@516	23:23
sean-k-mooney	dansmith: efried_afk its a trivial poc of the progresive generation of the numa toplogies for a non numa vm	23:23
sean-k-mooney	just the toplogy object not the queries but i could proably hack that up tomorrow	23:24
*** mlavalle has quit IRC		23:27
*** nicolasbock has quit IRC		23:36
*** mmethot has joined #openstack-nova		23:46
*** igordc has quit IRC		23:46
*** jmlowe has quit IRC		23:48
*** Liang__ has joined #openstack-nova		23:49
*** rcernin has quit IRC		23:51

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!