Wednesday, 2019-08-14

openstackgerrit	Takashi NATSUME proposed openstack/nova stable/queens: Fix misuse of nova.objects.base.obj_equal_prims https://review.opendev.org/676291	00:00
sean-k-mooney	i would probably do this slightly differently however, e.g. not require the operator to configure aggregates and do it like my image metadata traits changes	00:00
openstackgerrit	Takashi NATSUME proposed openstack/nova stable/pike: Fix misuse of nova.objects.base.obj_equal_prims https://review.opendev.org/676292	00:02
sean-k-mooney	melwitt: i haven't reviewed the code for this but i hope they are caching the aggregate metadata	00:02
sean-k-mooney	the only way the prefilter could work would be to retrieve all of the nova host aggregates and their metadata, then calculate the aggregates that won't support the instance and generate the &member_of=!in:<agg1>,<agg2>,<agg3> query parameter	00:04
melwitt	sean-k-mooney: here's the code, it's close to merging so get in on it while you can https://review.opendev.org/#/q/topic:bp/placement-req-filter-forbidden-aggregates+(status:open+OR+status:merged)	00:05
sean-k-mooney	ya so no caching...	00:07
sean-k-mooney	the problem with caching is keeping the cache valid	00:07
*** rcernin has joined #openstack-nova		00:08
*** whoami-rajat has joined #openstack-nova		00:09
*** igordc has joined #openstack-nova		00:09
*** markvoelker has joined #openstack-nova		00:10
sean-k-mooney	melwitt: the scheduler does not directly talk to the db right? i.e. does the scheduler access the db via the conductor like the compute service?	00:11
melwitt	sean-k-mooney: I think it talks directly	00:11
melwitt	only the compute service does not	00:11
sean-k-mooney	ok so all the "control plane" services talk directly but not computes	00:12
melwitt	I think so	00:13
sean-k-mooney	ok i was not sure if it was only the api, metadata service and conductor that went direct.	00:13
sean-k-mooney	i guess it makes sense for the scheduler to be able to take a fast path to the db	00:14
*** markvoelker has quit IRC		00:16
*** KeithMnemonic has joined #openstack-nova		00:20
*** markvoelker has joined #openstack-nova		00:21
*** trident has quit IRC		00:38
melwitt	sean-k-mooney: you can see which services go through conductor by looking for 'indirection_api' under nova/cmd	00:41
sean-k-mooney	indirection_api is new to me	00:42
melwitt	yeah, that's the magic for objects going through conductor	00:43
sean-k-mooney	ah ok.	00:44
sean-k-mooney	the way the prefilter is working bugs me	00:46
sean-k-mooney	i need to re-read the spec but i don't think it's working the way i expect it to	00:47
sean-k-mooney	it may be equivalent	00:47
*** KeithMnemonic has quit IRC		00:47
*** gyee has quit IRC		00:48
sean-k-mooney	if the image has no trait request i think it would allow all aggregates	00:48
*** KeithMnemonic has joined #openstack-nova		00:49
sean-k-mooney	and i think you would have to list all traits in the image/flavor on the aggregate for it to match.	00:49
sean-k-mooney	i probably should be less tired when reviewing this but this seems reversed to me	00:49
melwitt	yeah, I think it will get all the aggregates, add any without the trait to the "no" bin, and then pass the "no" bin to placement and say "these are the forbidden aggregates, don't return hosts in them"	00:52
sean-k-mooney	i think the logic is inverted.	00:53
melwitt	which does seem backward but I assume it's the only way to do it	00:53
sean-k-mooney	it's treating the image and flavor as the authoritative source, not the metadata	00:53
sean-k-mooney	this is broken in the same way the existing filter is	00:53
sean-k-mooney	and it's the difference between the out-of-tree one and the in-tree one	00:53
sean-k-mooney	for this prefilter to work every trait that is in the image+flavor would have to be on the host_aggregate metadata	00:55
melwitt	it does specifically say (in the proposed doc change) that image/flavor need not be set and will still not land on hosts in aggregates with =required. I just don't know how that's done	00:55
sean-k-mooney	it's done by inverting the relationship	00:55
openstackgerrit	Takashi NATSUME proposed openstack/nova stable/ocata: Fix misuse of nova.objects.base.obj_equal_prims https://review.opendev.org/676295	00:56
sean-k-mooney	so what you need to do is get the metadata for all host_aggregates. then for each aggregate you check all the keys are present in the flavor/image request	00:56
sean-k-mooney	if a key is set on the aggregate but not in the flavor/image then you add that to the forbidden list	00:56
melwitt	oh ok	00:56
melwitt	yeah	00:56
sean-k-mooney	again exactly what the out of tree filter does :)	00:56
sean-k-mooney	ok ill comment on the review	00:57
sean-k-mooney	as implemented this would work exactly the same as the aggregate_instance_extra_specs filter	00:58
*** rcernin has quit IRC		00:58
sean-k-mooney	well, sort of	00:58
sean-k-mooney	it uses traits instead of any property and also looks at the image	00:58
openstackgerrit	Takashi NATSUME proposed openstack/nova stable/ocata: Fix misuse of nova.objects.base.obj_equal_prims https://review.opendev.org/676295	00:58
sean-k-mooney	but conceptually it's the same	00:58
openstackgerrit	Takashi NATSUME proposed openstack/nova stable/ocata: Fix misuse of nova.objects.base.obj_equal_prims https://review.opendev.org/676295	01:01
sean-k-mooney	melwitt: actually it might be correct https://review.opendev.org/#/c/671074/6/nova/objects/aggregate.py@476	01:04
sean-k-mooney	i need to figure out what sql that generates	01:04
*** spsurya has joined #openstack-nova		01:05
melwitt	sean-k-mooney: note the ~ (not) squiggle	01:10
sean-k-mooney	ya looking at the docstring i think it's correct after all	01:11
melwitt	phew good	01:12
sean-k-mooney	so it's selecting the aggregate if the aggregate metadata key is not in the set passed in	01:12
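
The selection logic discussed above can be sketched roughly as follows. This is a minimal illustration, not the code under review: the function names, the plain-dict input, and the "trait:<NAME>" = "required" metadata key format are assumptions made for the example. Only the "!in:" form of the member_of query parameter is taken from the conversation.

```python
def forbidden_aggregates(aggregate_metadata, requested_traits):
    """Return aggregates that require a trait the request did not ask for.

    aggregate_metadata: dict mapping aggregate id -> metadata dict
        (hypothetical shape, e.g. {"trait:CUSTOM_X": "required"}).
    requested_traits: set of trait names taken from the flavor and image.
    """
    forbidden = []
    for agg_id, metadata in sorted(aggregate_metadata.items()):
        # Traits the aggregate insists on (assumed key format).
        required = {
            key[len("trait:"):]
            for key, value in metadata.items()
            if key.startswith("trait:") and value == "required"
        }
        # Forbid the aggregate if any required trait is missing from
        # the request; an empty request forbids every such aggregate.
        if not required <= requested_traits:
            forbidden.append(agg_id)
    return forbidden


def member_of_param(forbidden):
    """Build the value for the member_of query parameter, or None."""
    if not forbidden:
        return None
    return "!in:" + ",".join(forbidden)
```

Note that an aggregate with no required traits is never forbidden, and a request with no traits at all is kept away from every aggregate that requires one, which matches the behaviour melwitt quotes from the proposed doc change.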
*** rcernin has joined #openstack-nova		01:14
openstackgerrit	ZHOU YAO proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/621646	01:16
*** hamzy has joined #openstack-nova		01:17
*** markvoelker has quit IRC		01:23
*** markvoelker has joined #openstack-nova		01:23
*** markvoelker has quit IRC		01:28
sean-k-mooney	ok i'm going to log off for the night o/	01:32
sean-k-mooney	oh infra changed how logs are rendered in the gate.	01:43
sean-k-mooney	i don't think i like it, just because it's different than it has been for 6+ years	01:44
*** BjoernT_ has joined #openstack-nova		01:44
*** BjoernT_ has quit IRC		01:46
*** BjoernT has quit IRC		01:47
openstackgerrit	Takashi NATSUME proposed openstack/nova master: api-ref: Fix collapse of 'host_status' description https://review.opendev.org/676301	01:47
*** markvoelker has joined #openstack-nova		01:56
*** Tianhao_Hu has joined #openstack-nova		01:58
*** markvoelker has quit IRC		02:00
*** ash2307 has joined #openstack-nova		02:13
*** ash2307 has left #openstack-nova		02:16
*** KeithMnemonic has quit IRC		02:17
*** BjoernT has joined #openstack-nova		02:22
openstackgerrit	Sundar Nadathur proposed openstack/nova master: ksa auth conf and client for Cyborg access https://review.opendev.org/631242	02:23
openstackgerrit	Sundar Nadathur proposed openstack/nova master: Refactor some methods for reuse by Cyborg-related code. https://review.opendev.org/673734	02:23
openstackgerrit	Sundar Nadathur proposed openstack/nova master: Add Cyborg device profile groups to request spec. https://review.opendev.org/631243	02:23
openstackgerrit	Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244	02:23
openstackgerrit	Sundar Nadathur proposed openstack/nova master: Get resolved Cyborg ARQs and add PCI BDFs to VM's domain XML. https://review.opendev.org/631245	02:23
openstackgerrit	Sundar Nadathur proposed openstack/nova master: WIP: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735	02:23
*** whoami-rajat has quit IRC		02:28
*** yonglihe has joined #openstack-nova		02:40
*** bhagyashris has joined #openstack-nova		02:43
openstackgerrit	Yongli He proposed openstack/nova master: Add server sub-resource topology API https://review.opendev.org/621476	02:44
*** BjoernT has quit IRC		02:46
*** boxiang has joined #openstack-nova		02:59
*** skadam has joined #openstack-nova		03:03
*** takashin has left #openstack-nova		03:06
*** ccamacho has joined #openstack-nova		03:06
openstackgerrit	Boxiang Zhu proposed openstack/nova master: Make evacuation respects anti-affinity rule https://review.opendev.org/649963	03:06
openstackgerrit	Boxiang Zhu proposed openstack/nova master: Fix live migration break group policy simultaneously https://review.opendev.org/651969	03:06
boxiang	hi gibi , I have finished the regression functional test for this https://review.opendev.org/649963	03:08
*** sapd1_x has joined #openstack-nova		03:15
*** licanwei has joined #openstack-nova		03:15
*** whoami-rajat has joined #openstack-nova		03:20
*** psachin has joined #openstack-nova		03:33
*** ash2307 has joined #openstack-nova		03:34
*** tbachman has quit IRC		03:44
*** udesale has joined #openstack-nova		03:54
*** tbachman has joined #openstack-nova		03:55
*** markvoelker has joined #openstack-nova		04:01
openstackgerrit	Merged openstack/nova master: Execute TargetDBSetupTask https://review.opendev.org/633853	04:05
*** markvoelker has quit IRC		04:05
*** sapd1_x has quit IRC		04:08
*** skadam has quit IRC		04:15
*** mkrai has joined #openstack-nova		04:17
*** skadam has joined #openstack-nova		04:32
*** skadam has quit IRC		04:37
*** markvoelker has joined #openstack-nova		04:52
*** Tianhao_Hu has quit IRC		04:53
*** markvoelker has quit IRC		04:57
*** igordc has quit IRC		05:02
*** ratailor has joined #openstack-nova		05:08
*** ratailor has quit IRC		05:20
*** ash2307 has left #openstack-nova		05:31
*** Tianhao_Hu has joined #openstack-nova		05:32
*** mkrai has quit IRC		05:33
*** mkrai has joined #openstack-nova		05:36
*** adrianc has quit IRC		05:37
*** ratailor has joined #openstack-nova		05:46
openstackgerrit	ZHOU YAO proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/621646	05:49
*** dpawlik has joined #openstack-nova		06:11
*** brinzhang_ has joined #openstack-nova		06:16
*** rcernin has quit IRC		06:16
*** brinzhang_ has quit IRC		06:18
*** brinzhang_ has joined #openstack-nova		06:18
*** luksky has joined #openstack-nova		06:18
*** brinzhang has quit IRC		06:19
*** brinzhang_ has quit IRC		06:19
*** brinzhang_ has joined #openstack-nova		06:19
*** dpawlik has quit IRC		06:20
*** Tianhao_Hu has quit IRC		06:20
openstackgerrit	Yongli He proposed openstack/nova master: Add server sub-resource topology API https://review.opendev.org/621476	06:22
*** dpawlik has joined #openstack-nova		06:23
*** rcernin has joined #openstack-nova		06:31
*** slaweq has joined #openstack-nova		06:52
*** ivve has joined #openstack-nova		06:59
*** trident has joined #openstack-nova		07:01
*** xek has joined #openstack-nova		07:03
*** ccamacho has quit IRC		07:10
*** tesseract has joined #openstack-nova		07:10
*** mjozefcz has joined #openstack-nova		07:16
*** kaisers has quit IRC		07:21
*** janki has joined #openstack-nova		07:23
*** kaisers has joined #openstack-nova		07:23
openstackgerrit	Merged openstack/nova master: api-ref: Fix collapse of 'host_status' description https://review.opendev.org/676301	07:27
*** mkrai has quit IRC		07:29
*** mkrai has joined #openstack-nova		07:31
*** mkrai has quit IRC		07:38
*** rcernin has quit IRC		07:56
*** helenafm has joined #openstack-nova		07:58
*** elod_off is now known as elod		08:03
*** markvoelker has joined #openstack-nova		08:07
*** tkajinam has quit IRC		08:11
*** markvoelker has quit IRC		08:15
*** Dinesh_Bhor has joined #openstack-nova		08:16
*** rpittau\|afk is now known as rpittau		08:18
*** tssurya has joined #openstack-nova		08:22
*** rcernin has joined #openstack-nova		08:23
*** jangutter has joined #openstack-nova		08:24
*** derekh has joined #openstack-nova		08:29
*** rcernin has quit IRC		08:29
*** cdent has joined #openstack-nova		08:30
*** rcernin has joined #openstack-nova		08:38
*** luksky has quit IRC		08:39
*** markvoelker has joined #openstack-nova		08:41
*** boxiang has quit IRC		08:41
*** boxiang has joined #openstack-nova		08:41
*** janki has quit IRC		08:44
*** markvoelker has quit IRC		08:45
*** factor has quit IRC		08:58
*** boxiang has quit IRC		08:58
*** icarusfactor has joined #openstack-nova		08:58
*** boxiang has joined #openstack-nova		08:58
*** rcernin has quit IRC		09:04
*** janki has joined #openstack-nova		09:06
*** markvoelker has joined #openstack-nova		09:10
*** luksky has joined #openstack-nova		09:13
*** markvoelker has quit IRC		09:15
*** Tianhao_Hu has joined #openstack-nova		09:29
*** shilpasd has joined #openstack-nova		09:30
*** Dinesh_Bhor has quit IRC		09:35
*** janki has quit IRC		09:40
*** ociuhandu has joined #openstack-nova		09:49
*** icarusfactor has quit IRC		09:51
*** icarusfactor has joined #openstack-nova		09:51
openstackgerrit	Bhagyashri Shewale proposed openstack/nova master: Ignore root_gb for BFV in simple tenant usage API https://review.opendev.org/612626	09:55
*** ut2k3 has joined #openstack-nova		10:05
ut2k3	Hi guys, unfortunately I have two volumes stuck in "detaching". Is there any chance I can force-detach these from the instances? `nova volume-detach...` as well as resetting their state with `cinder reset-state...` did not help	10:05
*** adrianc has joined #openstack-nova		10:07
*** markvoelker has joined #openstack-nova		10:11
*** ociuhandu has quit IRC		10:14
*** ociuhandu has joined #openstack-nova		10:15
*** markvoelker has quit IRC		10:15
*** luyao has joined #openstack-nova		10:16
*** tbachman has quit IRC		10:33
*** bnemec has quit IRC		10:34
*** shilpasd has quit IRC		10:36
*** bnemec has joined #openstack-nova		10:37
*** bhagyashris has quit IRC		10:38
*** brinzhang_ has quit IRC		10:39
*** dpawlik has quit IRC		10:41
*** bnemec has quit IRC		10:45
*** belmoreira has joined #openstack-nova		10:45
*** eharney has quit IRC		10:48
*** Tianhao_Hu has quit IRC		10:49
*** bnemec has joined #openstack-nova		10:49
*** Nick_A has quit IRC		10:50
*** priteau has joined #openstack-nova		10:50
*** tbachman has joined #openstack-nova		10:51
openstackgerrit	Ghanshyam Mann proposed openstack/python-novaclient master: Microversion 2.75 - Multiple API cleanup changes https://review.opendev.org/676275	10:56
stephenfin	efried: I'm kind of stuck on the reshape for VCPU -> PCPU. I think what we're doing at the moment is wrong, and the tests are hiding that fact https://review.opendev.org/#/c/674895/7/nova/virt/libvirt/driver.py@6891	10:58
*** helenafm has quit IRC		10:58
openstackgerrit	Ghanshyam Mann proposed openstack/python-novaclient master: Microversion 2.75 - Multiple API cleanup changes https://review.opendev.org/676275	10:59
stephenfin	efried: What I think we need to do is a multi-step process: (a) figure out if the host has any pinned instances by looking at the (host) NUMATopology.pinned_cpus attribute; if so, (b) figure out if there are any PCPU allocations against the host; if not, (c) figure out how many VCPUs to migrate to PCPUs and do that	11:00
*** adrianc has quit IRC		11:00
stephenfin	efried: But I can't do (b) easily since the pattern we have for reshaping is to only provide allocations to 'update_provider_tree' if there's a 'ReshapeNeeded' exception raised, so I'm stuck	11:01
*** adrianc has joined #openstack-nova		11:03
*** bnemec has quit IRC		11:04
openstackgerrit	YAMAMOTO Takashi proposed openstack/nova master: WIP: midonet doesn't have plug-time vif events https://review.opendev.org/676388	11:04
*** udesale has quit IRC		11:05
*** dpawlik has joined #openstack-nova		11:08
*** bnemec has joined #openstack-nova		11:09
*** udesale has joined #openstack-nova		11:12
*** bnemec has quit IRC		11:13
*** udesale has quit IRC		11:16
*** belmoreira has quit IRC		11:17
*** markvoelker has joined #openstack-nova		11:21
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Testing tls with ipv6 also https://review.opendev.org/676391	11:23
*** markvoelker has quit IRC		11:26
*** bnemec has joined #openstack-nova		11:26
*** markvoelker has joined #openstack-nova		11:31
*** bnemec has quit IRC		11:31
*** markvoelker has quit IRC		11:36
*** jaosorior has joined #openstack-nova		11:38
*** bnemec has joined #openstack-nova		11:38
*** factor__ has joined #openstack-nova		11:41
*** icarusfactor has quit IRC		11:43
*** belmoreira has joined #openstack-nova		11:44
*** bnemec has quit IRC		11:45
openstackgerrit	Ghanshyam Mann proposed openstack/python-novaclient master: Microversion 2.75 - Multiple API cleanup changes https://review.opendev.org/676275	11:48
*** bnemec has joined #openstack-nova		11:48
*** bnemec has quit IRC		11:55
stephenfin	alex_xu, gibi: More than happy to let you handle the update :)	11:58
stephenfin	I shall sit at the back of the room and judge mercilessly :P	11:58
*** bnemec has joined #openstack-nova		11:59
gibi	stephenfin: I will do the placement project update with tetsuro so you can have the nova project update :)	11:59
alex_xu	the world depends on three of us~	12:01
*** factor__ has quit IRC		12:02
*** ratailor has quit IRC		12:02
*** markvoelker has joined #openstack-nova		12:02
gibi	will be fun :)	12:04
*** bnemec has quit IRC		12:04
alex_xu	hah	12:04
*** ut2k3 has quit IRC		12:06
sean-k-mooney	stephenfin: you're not allowed to announce that we are killing nova and turning it into a sig :P	12:07
alex_xu	haha	12:09
alex_xu	I'm thinking how a sig works	12:09
sean-k-mooney	alex_xu: they are allowed to have repos but have no ptl to herd the cats in a common direction	12:10
*** bnemec has joined #openstack-nova		12:10
alex_xu	that is fun	12:11
*** tbachman has quit IRC		12:11
sean-k-mooney	hehe yep, which is why stephenfin's first act as docs ptl was to eliminate his position so he will be the final docs ptl	12:12
*** belmoreira has quit IRC		12:13
*** ociuhandu has quit IRC		12:14
alex_xu	haha :)	12:14
*** artom has joined #openstack-nova		12:16
*** jangutter_ has joined #openstack-nova		12:19
*** jangutter has quit IRC		12:20
*** luksky has quit IRC		12:24
*** nweinber__ has joined #openstack-nova		12:27
*** bnemec has quit IRC		12:29
*** bnemec has joined #openstack-nova		12:33
*** rcernin has joined #openstack-nova		12:34
*** alemgeta has joined #openstack-nova		12:38
*** jangutter_ has quit IRC		12:39
*** jangutter has joined #openstack-nova		12:39
*** rcernin has quit IRC		12:40
*** bnemec has quit IRC		12:41
alemgeta	hello, i'm doing my msc thesis on openstack nova. which failure detection algorithm exactly is used in openstack, and where is the code? i'd appreciate any help	12:42
*** belmoreira has joined #openstack-nova		12:42
*** belmoreira has quit IRC		12:43
alemgeta	hello, i'm doing my msc thesis on openstack nova. which failure detection algorithm exactly is used in openstack, and where is the code? i'd appreciate any help	12:43
*** bnemec has joined #openstack-nova		12:44
*** KeithMnemonic has joined #openstack-nova		12:46
*** priteau has quit IRC		12:48
*** bnemec has quit IRC		12:49
*** tbachman has joined #openstack-nova		12:50
*** belmoreira has joined #openstack-nova		12:51
*** luksky has joined #openstack-nova		12:52
*** bnemec has joined #openstack-nova		12:54
tssurya	mriedem, dansmith: are you both around ?	12:55
dansmith	I am but I'm about to jump on a call in 4 minutes	12:55
*** jaosorior has quit IRC		12:56
tssurya	what did we finally decide on? tweaking the reno and adding the fact that the config on the ironic side should be disabled? or pushing the task state update to driver level	12:56
dansmith	not sure we decided anything specific, but I'd prefer to move the task state setting out of the api	12:57
tssurya	ok let me try to do that then	12:57
dansmith	mriedem isn't around right now	12:57
tssurya	ok I'll push the rest of his suggestions and we can come back to that when he comes online	12:58
alemgeta	hello, i have a little question about openstack nova	12:58
alemgeta	please someone	12:59
alemgeta	about failure detection	12:59
alemgeta	doing msc thesis	12:59
*** mriedem has joined #openstack-nova		13:00
luyao	dansmith: when will you be back?😀	13:02
sean-k-mooney	nova does not perform failure detection of any of the workloads that are deployed with it, if that is what you were wondering. it obviously validates that the action you asked it to perform succeeded but it does not monitor the application lifetime	13:02
sean-k-mooney	dansmith: speak his name and he shall appear	13:03
sean-k-mooney	mriedem: tssurya wanted to know if you were around ~10 minutes ago	13:04
dansmith	in an hour	13:04
luyao	dansmith: got it! See you in an hour. :)	13:05
mriedem	well here i am	13:06
*** bnemec has quit IRC		13:07
*** eharney has joined #openstack-nova		13:07
sean-k-mooney	mriedem: i'll address your vPMU feedback later today. thanks for reviewing it before you dropped off yesterday.	13:08
mriedem	yw	13:10
*** bnemec has joined #openstack-nova		13:10
*** bnemec has quit IRC		13:16
*** beekneemech has joined #openstack-nova		13:16
*** BjoernT has joined #openstack-nova		13:19
mriedem	sean-k-mooney: more lxc failures https://logs.opendev.org/24/676024/5/experimental/nova-lxc/9c06394/controller/logs/screen-n-cpu.txt.gz#_Aug_13_23_16_20_191786	13:23
*** BjoernT has quit IRC		13:23
*** BjoernT has joined #openstack-nova		13:23
mriedem	https://github.com/lxc/lxc/issues/1057 sounds similar	13:26
*** belmoreira has quit IRC		13:28
sean-k-mooney	fun. debian does not symlink /sbin to /usr/sbin so perhaps it's in /usr/sbin/init	13:32
*** beekneemech has quit IRC		13:33
*** belmoreira has joined #openstack-nova		13:34
*** bnemec has joined #openstack-nova		13:39
efried	stephenfin: That's an interesting one (your reshape chicken/egg)	13:43
*** belmoreira has quit IRC		13:46
*** bnemec has quit IRC		13:47
sean-k-mooney	stephenfin: by the way, regarding the reshape, your code needs to be able to handle a reshape in non-upgrade cases as well.	13:50
sean-k-mooney	e.g. it needs to handle the reshape when the new config options are defined	13:50
*** bnemec has joined #openstack-nova		13:51
sean-k-mooney	so you could deploy with train and have no config options set, then set them after and restart the compute agent which would trigger the reshape	13:51
*** jaosorior has joined #openstack-nova		13:51
stephenfin	sean-k-mooney: Would you actually have a reshape there?	13:51
stephenfin	surely you'd just start reporting the new VCPU/PCPU resources	13:52
stephenfin	no different to changing vcpu_pin_set today	13:52
stephenfin	you're not changing allocations from one type to the other	13:52
sean-k-mooney	well if you were previously using cpu pinning on that host	13:52
efried	We've said until now that we don't want to reshape except on upgrades. (Though I've been skeptical that we would be able to stick to that.)	13:52
sean-k-mooney	then defined the cpu_dedicated_set only, it would change from a vcpu allocation to a pcpu allocation	13:52
stephenfin	sean-k-mooney: Nope. Remember, vcpu_pin_set is used for VCPU _and_ PCPU	13:53
sean-k-mooney	yes but you dont need to enable the prefilter	13:53
stephenfin	Is that related?	13:54
sean-k-mooney	or are we doing the conversion to resources:PCPU somewhere else	13:54
stephenfin	efried: I _think_ I'm okay, actually	13:54
sean-k-mooney	in the scheduler utils maybe?	13:54
stephenfin	It seems the way the reshaping is done is that we build the list of inventory that we're going to report from the virt driver, but before we do the actual update we check for old allocations	13:55
*** bnemec has quit IRC		13:55
sean-k-mooney	this is what might require the reshape https://review.opendev.org/#/c/671801/19/nova/conf/workarounds.py	13:56
stephenfin	So I should be able to simply check "does this compute node have any PCPU resources recorded for itself and does it have any pinned instances" and if there's a mismatch then I reshape	13:56
sean-k-mooney	but i guess since you would have had to install with disable_legacy_pinning_policy_translation=true	13:56
sean-k-mooney	i guess its fair to say if you are disabling that you need to reshape	13:56
sean-k-mooney	you are right about both inventories being reported	13:57
stephenfin	sean-k-mooney: I'm kind of lost, tbh	13:57
stephenfin	That will only affect the scheduler	13:57
sean-k-mooney	it affects the placement query	13:57
stephenfin	Right, but it won't have any impact on the compute node side	13:58
*** belmoreira has joined #openstack-nova		13:58
stephenfin	I mean, unless you're suggesting we'd want to delay reshaping if that option was configured	13:58
stephenfin	But that's not the idea	13:58
sean-k-mooney	with disable_legacy_pinning_policy_translation=true flavor.vcpus is translated to resources:VCPU when hw:cpu_policy=dedicated	13:58
*** bnemec has joined #openstack-nova		13:58
stephenfin	Yeah. So if that's the case, we continue to be able to use all the pre-Train compute nodes that aren't reporting PCPUs	13:59
sean-k-mooney	yep	13:59
stephenfin	But we would not be able to use any of the Train nodes that are reporting them	13:59
sean-k-mooney	when that is removed new vms would be translated correctly.	13:59
stephenfin	Correct	13:59
sean-k-mooney	i guess old vms would be fixed when they are migrated	14:00
sean-k-mooney	so ya i guess its not too much of an issue	14:00
stephenfin	Their allocations will also be fixed when the compute node is upgrade	14:00
stephenfin	*upgraded	14:00
sean-k-mooney	you could avoid the reshape by migrating off all vms on old nodes to new ones	14:00
sean-k-mooney	right	14:01
stephenfin	I could, but I think that shuffling on instances between hosts has been rejected	14:01
stephenfin	*of instances	14:01
sean-k-mooney	ok then ya ignore me	14:01
sean-k-mooney	well operators can shuffle the instances. nova won't do it automatically	14:02
*** tbachman has quit IRC		14:02
stephenfin	So my current algorithm is if the host has no PCPU inventory and 'ComputeNode.numa_topology.cells[*].pinned_cpus' is set (i.e. there are some pinned instances on the host), reshape	14:02
stephenfin	and reshaping involves identifying the instances on the host that are pinned and migrating their VCPU allocations wholesale to PCPU allocations	14:03
sean-k-mooney	ya	14:03
sean-k-mooney	i think that should work correctly	14:03
stephenfin	I sadly can't reshape every consumers' allocations since there's a chance, however remote, that someone hasn't listened to us and has pinned and unpinned instances on the same host	14:04
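
The reshape check and per-consumer migration described above could look something like the sketch below. This is an illustration under stated assumptions, not the patch itself: the function names and the flattened allocation shape (consumer uuid -> resource class -> amount) are hypothetical, and the real placement allocation format is more nested.

```python
import copy


def needs_reshape(has_pcpu_inventory, pinned_cpus_by_cell):
    """Reshape only if the host reports no PCPU inventory yet but at
    least one NUMA cell has pinned CPUs (i.e. pinned instances exist).

    pinned_cpus_by_cell: list of sets of pinned host CPU ids, one per
        cell (stand-in for numa_topology.cells[*].pinned_cpus).
    """
    return not has_pcpu_inventory and any(pinned_cpus_by_cell)


def reshape_allocations(allocations, pinned_consumers):
    """Move VCPU allocations to PCPU for pinned consumers only.

    Unpinned consumers on the same host are left untouched, since
    mixed hosts cannot be ruled out.
    """
    reshaped = copy.deepcopy(allocations)
    for consumer, resources in reshaped.items():
        if consumer in pinned_consumers and "VCPU" in resources:
            resources["PCPU"] = resources.get("PCPU", 0) + resources.pop("VCPU")
    return reshaped
```

Working on a deep copy keeps the original allocations intact, so the caller can compare before and after or abandon the reshape without side effects.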
sean-k-mooney	its an online data migration?	14:04
stephenfin	Yeah, run on startup	14:04
stephenfin	So this code will trigger once in the entire life of the node	14:04
openstackgerrit	Balazs Gibizer proposed openstack/nova master: Support reverting migration / resize with bandwidth https://review.opendev.org/676140	14:04
stephenfin	We can theoretically remove it in U	14:04
sean-k-mooney	i'm just thinking about the FFU implications of that, which are that we must start the agent	14:05
stephenfin	I have the code rewritten to implement that and am just finishing tests off	14:05
sean-k-mooney	e.g. you can't FFU from queens to U	14:05
stephenfin	Do we not do that anyway?	14:05
sean-k-mooney	unless you stop at train to start the agent	14:05
stephenfin	If not, that might justify us keeping the reshape code around for a few cycles	14:06
stephenfin	bbiab	14:06
*** tbachman has joined #openstack-nova		14:06
sean-k-mooney	it's only the removal of the reshape code that would break ffu i think	14:06
*** belmoreira has quit IRC		14:06
openstackgerrit	Balazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request https://review.opendev.org/671497	14:06
sean-k-mooney	or rather require it to start the agents on an intermediate release	14:06
*** alemgeta has left #openstack-nova		14:07
luyao	dansmith: Are you around now?	14:10
dansmith	luyao: almost done	14:10
*** mlavalle has joined #openstack-nova		14:11
*** spatel has joined #openstack-nova		14:12
spatel	sean-k-mooney: Do you know what is going on here? http://paste.openstack.org/show/757000/	14:12
*** Tianhao_Hu has joined #openstack-nova		14:12
luyao	dansmith: Great, I'll give my question first. It's about the libvirt driver. It seems that it's not recommended to access the DB in the libvirt driver, but in my patch I need to get the flavor id from the db to populate the device manager. Is that acceptable? I would like your help with some comments.	14:12
spatel	It's clearly related to a resize issue or bug	14:12
sean-k-mooney	ya i have seen that before.	14:13
sean-k-mooney	are you deploying on nfs or shared storage	14:13
*** helenafm has joined #openstack-nova		14:13
*** Nick_A has joined #openstack-nova		14:14
dansmith	luyao: no, you can't access the database directly, but the flavor is on the instance so it should already be there and have what you need	14:14
sean-k-mooney	spatel: this can happen when an instance was deleted in the db while the compute agent was stopped and then was archived before it was started again	14:15
sean-k-mooney	e.g. after an evacuation	14:15
luyao	dansmith: https://review.opendev.org/#/c/672957/5/nova/virt/libvirt/device.py@196 I need to access db to get instance	14:16
sean-k-mooney	the host has a stale libvirt domain that references a disk that is no longer found	14:16
dansmith	luyao: you've got it right there	14:16
spatel	sean-k-mooney: you are saying someone deleted the instance but the compute agent was down and it got out of sync.	14:16
*** jaosorior has quit IRC		14:16
dansmith	lyarwood: Instance.get_by_uuid() is okay, it accesses the database, but not directly, through conductor	14:17
sean-k-mooney	yes, basically this happens because a stale libvirt domain xml is on the host that references an image file that no longer exists	14:17
dansmith	oops	14:17
spatel	I believe this is related to the resize issue when you do CPU pinning, which left the resize stalled	14:17
dansmith	luyao: ^	14:17
luyao	dansmith: so it's ok to access database like this? I just thought it's not recommended to access database in driver	14:18
dansmith	luyao: but you should try to avoid that if you can pull the instance there.. I'd have to look at the code to see where this is running	14:18
dansmith	luyao: like I just said, you're not hitting the database directly with that call	14:18
sean-k-mooney	spatel: it does look similar to https://bugs.launchpad.net/nova/+bug/1774249	14:19
openstack	Launchpad bug 1774249 in OpenStack Compute (nova) pike "update_available_resource will raise DiskNotFound after resize but before confirm" [Medium,Triaged]	14:19
spatel	sean-k-mooney: do you know how i can fix this issue or clean up the leftover disk?	14:19
dansmith	luyao: see there are now two places in that init routine that get instances, the mdev one and your pmem one	14:20
dansmith	luyao: the mdev one is more efficient than what you have, but it's not okay to look up all the instances twice, so you need to refactor that code in a patch ahead of time,	14:20
alex_xu	dansmith: yea, that code runs when nova-compute starts. we populate all the assigned vgpus or vpmems from libvirt, then we need instance_uuid and flavor_id to identify each assignment	14:20
*** dpawlik has quit IRC		14:20
sean-k-mooney	well the disk not found error means there is no leftover disk	14:20
dansmith	luyao: to look up the instances on the driver once and pass them to the mdev and pmem methods	14:20
sean-k-mooney	spatel: you have a leftover domain xml	14:20
spatel	let me check	14:20
dansmith	luyao: but ideally you would pass all that in from the actual driver init, because it too does this lookup I think	14:21
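
The refactor suggested above (look the instances up once, then hand the same list to both the mdev and pmem accounting) might be sketched like this. All names here are hypothetical stand-ins rather than nova code: `list_instances` represents whatever single lookup the caller already performs, and instances are plain dicts instead of Instance objects.

```python
def populate_mdevs(instances):
    """Map each assigned mdev to the owning instance uuid."""
    return {
        mdev: inst["uuid"]
        for inst in instances
        for mdev in inst.get("mdevs", [])
    }


def populate_pmems(instances):
    """Map each assigned pmem device to (instance uuid, flavor id)."""
    return {
        pmem: (inst["uuid"], inst["flavor_id"])
        for inst in instances
        for pmem in inst.get("pmems", [])
    }


def init_device_accounting(list_instances, host):
    """Fetch the host's instances once and reuse the list for both
    device types, instead of one lookup per device type."""
    instances = list_instances(host)  # the only lookup
    return populate_mdevs(instances), populate_pmems(instances)
```

The flavor id comes from the instance itself in this sketch, in line with the earlier point that the flavor is already on the instance.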
sean-k-mooney	so you can delete it with virsh but you should first check what the instance uuid was and then check that nova does not think that vm should be running on that host	14:21
*** ociuhandu has joined #openstack-nova		14:21
sean-k-mooney	if nova thinks the vm should be deleted or running on a different host then, provided it's not in the resize_confirm state, it's safe to delete the domain xml	14:22
spatel	sean-k-mooney: i think i have xml file in /etc/libvirt/qemu which is not active in virsh list	14:24
luyao	dansmith: Does driver init lookup all instance objects? I think this should be done in compute manager.	14:24
dansmith	luyao: I said "I think", I'll look	14:25
alex_xu	I guess not	14:25
spatel	sean-k-mooney: can i just delete that file?	14:25
*** ociuhandu has quit IRC		14:26
alex_xu	now I think it is not	14:26
*** mriedem has left #openstack-nova		14:26
*** mriedem has joined #openstack-nova		14:26
mriedem	gibi: some thoughts in your tempest change to run resize+confirm in grenade https://review.opendev.org/#/c/675371/	14:26
alex_xu	dansmith: is it ok query all the related instance objects and pass into the driver.init_host?	14:27
dansmith	alex_xu: luyao it does indirectly because compute manager calls init_instance for each, but yeah not in driver init, although I'm not sure where we get devicemanager instantiated either	14:27
*** belmoreira has joined #openstack-nova		14:27
dansmith	alex_xu: I would think refactoring init_host to take/pass all instances might be good, yeah	14:28
dansmith	let me look at something else	14:28
*** tbachman has quit IRC		14:28
alex_xu	luyao: ^ is that doable, I guess you need all the instance which instance.host = self.host and something instance in resizing?	14:29
sean-k-mooney	mriedem: regarding the lxc issue looking at https://logs.opendev.org/24/676024/5/experimental/nova-lxc/9c06394/controller/logs/screen-n-cpu.txt.gz#_Aug_13_23_16_20_055037 we are setting the container entrypoint to /sbin/init but that can be changed in the image metadata https://github.com/openstack/glance/blob/master/etc/metadefs/compute-libvirt-image.json#L63-L67	14:29
dansmith	alex_xu: luyao I was going to say, init_instance() already gets called for each instance in compute manager, and you could put your accounting code in there, but passing that list to init_host() would be just as good, if that works	14:30
sean-k-mooney	so i think for the debian issue we just need to set os_command_line in the lxc image to the path to the init system which is likely systemd now	14:30
luyao	alex_xu: I need all instances which have xml on the host.	14:30
dansmith	that would improve the current code, instead of your code making it worse	14:30
alex_xu	luyao: is there a case, you have xml for the target instance when resize, but the instance.host is still the src host?	14:31
Tianhao_Hu	@mriedem hi matt, about this issue, if the directory left after cold migration is empty and has no effect on cold migration, can you give some advice about whether we can think this is not a bug?	14:32
Tianhao_Hu	https://bugs.launchpad.net/starlingx/+bug/1824858	14:32
openstack	Launchpad bug 1824858 in StarlingX "nova instance remnant left behind after cold migration completes" [Low,Confirmed] - Assigned to hutianhao27 (hutianhao)	14:32
*** tbachman has joined #openstack-nova		14:32
alex_xu	dansmith: yea, that is better, just need to check with luyao for the resize case, in case we have domain xml but the instance.host isn't the current host. I guess init won't get the instance doesn't belong to this host	14:32
luyao	alex_xu: there is a case, the resizing is not finished ,but the instance.host is dest host	14:32
dansmith	alex_xu: yeah	14:33
sean-k-mooney	mriedem: could we use a different image in the ci job or set the path to the correct entrypoint. i'll look at this again later but i don't think this is a nova bug per se, just incorrect configuration	14:33
alex_xu	luyao: ah, so the source host can get a domain xml, but it need the instance which host is target host	14:34
*** BjoernT_ has joined #openstack-nova		14:35
openstackgerrit	Surya Seetharaman proposed openstack/nova master: API microversion 2.76: Add 'power-update' external event https://review.opendev.org/645611	14:35
luyao	alex_xu: yes, so we may also need to check migrations	14:35
alex_xu	dansmith: I think I can query more instances like the case luyao said, those instances only pass to init_host, but won't be used for later compute-manage initialize	14:36
mriedem	Tianhao_Hu: i can't say off the top of my head and without digging through the 23 comments and description and recreate of that bug, which i'm unable to do right now	14:36
*** BjoernT has quit IRC		14:36
mriedem	it also looks like it's rbd backend specific, and i don't have an environment to poke into that or recreate it right now	14:36
mriedem	someone from starlingx that is familiar with the nova code could triage it	14:36
mriedem	could/should	14:36
dansmith	alex_xu: if you pass the list of instances on this host to init_host(), then you can get any instances that have xml but are not in that list, and do an independent query for those, which will be much smaller	14:38
mriedem	sean-k-mooney: i agree it's some kind of misconfig and related to needing to tell the image to use systemd now	14:38
dansmith	alex_xu: however, I would think that if you're just trying to figure out which namespaces are assigned, the xml should be enough, so I'm not sure why you need the instance from the db (but I haven't looked closely)	14:38
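Editor's sketch of the init_host() suggestion above: reuse the instances the compute manager already loaded, and run a second, smaller lookup only for domain XML UUIDs that are not in that list. This is a minimal illustration, not nova code. `Instance`, `split_known_and_unknown`, `load_instances_for_domains` and the `fetch_by_uuids` callable are all hypothetical names standing in for whatever the real refactor would use.

```python
from collections import namedtuple

# Hypothetical stand-in for a nova Instance object; only the fields used here.
Instance = namedtuple("Instance", ["uuid", "host"])


def split_known_and_unknown(host_instances, domain_uuids):
    """Split domain UUIDs into already-loaded instances and missing UUIDs.

    host_instances: instances the compute manager already looked up for
        this host (the list that would be passed to init_host()).
    domain_uuids: instance UUIDs found in the domain XML on this host.
    """
    by_uuid = {inst.uuid: inst for inst in host_instances}
    known = [by_uuid[u] for u in domain_uuids if u in by_uuid]
    missing = sorted(set(domain_uuids) - set(by_uuid))
    return known, missing


def load_instances_for_domains(host_instances, domain_uuids, fetch_by_uuids):
    """Return one instance per domain, querying only for the missing ones.

    fetch_by_uuids is a hypothetical callable that takes a list of UUIDs
    and returns the matching instances (for example, instances that are
    mid-resize and whose host field points at another compute node).
    """
    known, missing = split_known_and_unknown(host_instances, domain_uuids)
    # Skip the extra query entirely when every domain is already known.
    extra = list(fetch_by_uuids(missing)) if missing else []
    return known + extra
```

The point of the split is that the second query is bounded by the number of domains that do not belong to this host, which should normally be small or zero.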
alex_xu	dansmith: we need the flavor id. since we use (instance_uuid, flavor_id) to identify a claim. This is due to the same host resize. Both src and dest claim on the same host, we need a way to distinguish the claim	14:40
alex_xu	the instance uuid I can get from the domain xml. the only trouble is the flavor id	14:40
sean-k-mooney	mriedem: looking at the lxc buster(debian 10) images the symlink appear to be there. /sbin/init -> /lib/systemd/systemd	14:41
sean-k-mooney	mriedem: so i think if we just use the right image it should just work	14:41
mriedem	alex_xu: the flavor name is in the domain xml metadata	14:42
mriedem	see https://logs.opendev.org/24/676024/5/experimental/nova-lxc/9c06394/controller/logs/screen-n-cpu.txt.gz#_Aug_13_23_16_20_055037	14:42
mriedem	<nova:flavor name="m1.nano">	14:42
dansmith	alex_xu: okay makes sense	14:42
alex_xu	mriedem: can we ensure all the virt drivers persist the flavor name? or is it libvirt specific?	14:42
mriedem	idk about the other drivers	14:43
mriedem	but they aren't doing pmems either	14:43
dansmith	alex_xu: mriedem you need more than that, you need the actual details of the flavor right?	14:43
sean-k-mooney	alex_xu: i think the metadata is driver specific	14:43
sean-k-mooney	alex_xu: it normally has the flavor uuid too	14:43
dansmith	alex_xu: looking up the flavor by name after the instance is spawned won't work because it may have changed, you need instance.flavor I think	14:43
mriedem	i'm wading into a conversation for which i don't have context of the problem, i'm just saying, the domain xml has flavor info	14:43
alex_xu	dansmith: only need the id	14:43
dansmith	alex_xu: what does that tell you?	14:44
alex_xu	dansmith: I just want to distinguish the src claim and dest claim on the same host. so instance_uuid and flavor_id is enough	14:44
mriedem	the domain xml metadata doesn't store the flavor id so i guess that won't help you	14:44
sean-k-mooney	alex_xu: we generate the metadata here https://github.com/openstack/nova/blob/master/nova/virt/libvirt/config.py#L2860	14:44
alex_xu	sean-k-mooney: I'm thinking we have uuid for flavor...?	14:45
mriedem	alex_xu: and this is because the move claim for pmems is moving to the driver, right?	14:45
dansmith	alex_xu: okay I don't understand, since the flavor id won't tell you size or type or anything, but I guess I'll understand when I read	14:45
sean-k-mooney	alex_xu: yes we do but we also have other info	14:45
Tianhao_Hu	mriedem: Thank you for your advice and I will get someone to work with me trying to find out why the directory is left.	14:45
mriedem	alex_xu: flavor.flavorid is the user-facing thing	14:45
mriedem	flavorid is not necessarily a uuid,	14:46
mriedem	but it must be unique	14:46
sean-k-mooney	actually we might not have the uuid unless we put it in the name field	14:46
alex_xu	dansmith: flavor id will tell you it is src or dest. the dest is new flavor, the src is old flavor	14:46
luyao	dansmith: flavor id is used to mark a namespace is assigned to which instance with which flavor	14:46
dansmith	luyao: I think that's a bad plan	14:46
sean-k-mooney	alex_xu: you can always add whatever you need to https://github.com/openstack/nova/blob/master/nova/virt/libvirt/config.py#L2902-L2930	14:46
dansmith	mriedem: IIRC, flavorid is unique, but only amongst non-deleted flavors, so if you delete and recreate a flavor it may be different right?	14:46
alex_xu	mriedem: for same host resize, also note it is useful for vgpu	14:47
mriedem	dansmith: flavors are no longer soft-deletable	14:47
mriedem	since they moved to the api db	14:47
mriedem	schema.UniqueConstraint("flavorid", name="uniq_flavors0flavorid"),	14:47
sean-k-mooney	alex_xu: we are using a custom xml namespace with a schema file that is not available anymore so you can extend it and it won't be any more broken than it already is	14:47
dansmith	mriedem: okay, right, but same point.. I can delete a flavor and recreate it and the flavors will be totally different	14:47
alex_xu	dansmith: at least we not allow resize to same flavor(check by flavor id i think)	14:47
mriedem	i don't know enough about the code or the problem here,	14:48
mriedem	note there is also instance.instance_type_id which is the instance.flavor.id,	14:49
mriedem	but not in the domain xml so likely doesn't help	14:49
dansmith	but still,	14:49
alex_xu	sean-k-mooney: it is too late to add new info to the xml. the existing instances won't have that until a restart	14:49
sean-k-mooney	alex_xu: what info do you need	14:49
dansmith	I think they're making some assumptions that flavorid is permanent and never changes, such that considering two instances with the same pmem flavorid must be the same	14:49
mriedem	alex_xu: existing instances won't have vpmems either	14:49
alex_xu	mriedem: dansmith oops, sorry, I use the flavor.id, not the flavorid	14:49
dansmith	alex_xu: so the integer id?	14:50
alex_xu	dansmith: yes	14:50
dansmith	alex_xu: okay, well, that's still not a good idea, but less problematic	14:50
alex_xu	mriedem: nice point	14:50
*** priteau has joined #openstack-nova		14:50
*** Tianhao_Hu has quit IRC		14:51
*** luksky has quit IRC		14:51
aspiers	Does anyone know why _get_guest_xml() has both instance and image_meta as params, rather than getting image_meta via instance.image_meta? I traced it back and found that prep_resize is called with image_meta from the request_spec, but I'm not sure why a request_spec would have different image_meta to the existing instance	14:52
alex_xu	mriedem: but it should be a problem for vgpu	14:52
alex_xu	mriedem: we already have vgpu instance	14:52
alex_xu	dansmith: one more fun is we support same flavor same host cold migration :)	14:52
alex_xu	dansmith: vmware virt driver is the only driver support that. the same host same flavor cold migration should make no sense for other virt driver	14:53
mriedem	alex_xu: the vmware driver isn't supporting pmems	14:53
dansmith	alex_xu: okay I'm really not sure why these are problematic cases	14:53
luyao	dansmith: because when resizing to the same host, we can't distinguish the two groups of assignments	14:54
alex_xu	yea, same flavor.id	14:54
mriedem	if you're doing a resize, the flavors have to be different,	14:55
mriedem	if you're doing a cold migration, the flavors must be the same,	14:55
dansmith	luyao: yeah I'm not sure why that's necessary, but it's also a good reason _not_ to use flavor, because it will depend on whether or not the flavor is changing,	14:55
sean-k-mooney	alex_xu: but why do you need to get flavor info from the xml. for resize we have the flavor objects and we should have them for cold migrate and live migrate	14:55
mriedem	if you're doing a resize, you can do it on the same host with libvirt,	14:55
dansmith	and building an assumption into this that same-host same-flavor migrations can't happen is a bad idea, IMHO	14:55
mriedem	if you're doing a cold migration, you cannot do it on the same host with libvirt	14:55
mriedem	and it seems folly to think any other driver is going to implement vpmem support	14:55
dansmith	sean-k-mooney: they're talking about during host init, with instances in the middle of a migration	14:55
mriedem	given how complicated this sounds even for libvirt	14:55
sean-k-mooney	dansmith: ah thanks for the context	14:56
dansmith	mriedem: we do same-host same-flavor migration on libvirt in the gate with a single worker yeah?	14:56
mriedem	no	14:56
sean-k-mooney	dansmith: i dont think so	14:56
mriedem	resize yes	14:56
mriedem	not cold migration	14:56
mriedem	you literally can't	14:56
dansmith	why not?	14:56
aspiers	Hrm, sounds like my question might coincidentally be somewhat related to the ongoing conversation, except for image_meta instead of flavors	14:56
sean-k-mooney	dansmith: the libvirt driver does not report support for it	14:57
alex_xu	sean-k-mooney: avoid db query in libvirt init	14:57
mriedem	https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L314	14:57
mriedem	because ^	14:57
mriedem	remember the other day when we were talking about a workaround option to disallow cold migrating to the same source host as a scheduler option?	14:57
mriedem	this is why	14:57
alex_xu	same flavor same host cold migration means....you stop and start the instance again :)	14:57
sean-k-mooney	alex_xu: why? we will be hammering it anyway when the resource tracker starts up	14:57
*** ociuhandu has joined #openstack-nova		14:57
dansmith	mriedem: either way, don't you think that making that assumption in the accounting for these is a bad plan, if that gets enabled in the future for whatever reason?	14:57
dansmith	alex_xu: no it doesn't	14:57
*** boxiang has quit IRC		14:58
dansmith	mriedem: okay that's just because libvirt won't handle the disk copying properly because the names are all the same right?	14:58
*** boxiang has joined #openstack-nova		14:58
mriedem	idk what cold migrating on the same host with libvirt would mean	14:58
alex_xu	yea, same question	14:58
dansmith	ages ago we had some CI system (not vmware) that could do same-host cold migration because it would find bugs in that, but maybe it was xen or smokestack or something	14:59
dansmith	just seems like a terribly deep assumption to make	14:59
mriedem	idk, the xenapi driver doesn't support it either https://github.com/openstack/nova/blob/master/nova/virt/xenapi/driver.py#L67	14:59
mriedem	only the vmware driver does	14:59
*** liuyulong has quit IRC		14:59
mriedem	i also don't understand how init_host is connected to a move claim	15:00
dansmith	alex_xu: explain again why you need to tie flavors to the pmem at all for accounting?	15:00
dansmith	mriedem: this is for host restart, rebuilding state for what has been given to what	15:00
dansmith	mriedem: but I don't understand why we need to correlate that with anything other than instance uuid, which is done in the xml, AFAIK	15:01
mriedem	yeah....i thought the pmem "allocation" type data was stored in instance_extra or something	15:01
mriedem	keyed by instance uuid	15:01
alex_xu	for same host resize, we record the claim in memory. but when we confirm or revert the resize, we need to unclaim either the dst claim or the src claim. to distinguish them, the instance_uuid alone isn't enough. I use instance_uuid + flavor.id.	15:01
alex_xu	dansmith: mriedem sean-k-mooney ^	15:01
dansmith	alex_xu: isn't the whole point of this pmem stuff that the contents get moved with the instance if it crosses to another host?	15:02
dansmith	(tbh I don't know what the point of this is, but...)	15:02
mriedem	if you're adding a new move claim interface to the virt driver, why wouldn't you just pass the information down to the driver that the RT already knows - i.e. the old and new flavor and if it's a same-host resize?	15:02
mriedem	if we have to piece information together from what the hypervisor says in the domain xml, we screwed up somewhere in the design :)	15:03
alex_xu	when you confirm resize, you need to unclaim the assignment for the source host. But for the same host, if we only use instance_uuid, when you unclaim by instance_uuid, both the source and dest claims will be unclaimed	15:03
alex_xu	same problem for revert resize	15:03
dansmith	alex_xu: so change the source allocation to be by migration uuid like we do for allocations?	15:05
alex_xu	dansmith: the problem is when we revert the migration uuid back to instance uuid. for the resize to different host, those claim on two host's memory, we can't change the migration uuid back to instance uuid when the resize finish	15:05
dansmith	alex_xu: why not?	15:06
dansmith	if you can change it from instance to migration, why can't you change it back?	15:06
*** helenafm has quit IRC		15:06
*** jangutter has quit IRC		15:07
alex_xu	dansmith: we claim the vpmem on the dest host(prep_resize), at that point, we need to change the source host claim's instance_uuid to migration_uuid.	15:08
dansmith	...right	15:08
alex_xu	yea...there is no way to change source host claim	15:09
dansmith	are you saying there's no way to rename a pmem device or whatever you're naming (instance, flavor) ?	15:11
*** ociuhandu has quit IRC		15:11
*** ociuhandu has joined #openstack-nova		15:12
artom	alex_xu, I feel like we should coordinate on the claim thing	15:12
artom	Because IIUC your series touches on the same points as NUMA LM	15:12
alex_xu	dansmith: when nova-compute startup, we populate all the assignment of vpmem and vgpu into memory, store those devices claim in the memory. key by (instance, flavor)	15:12
alex_xu	artom: yea	15:13
sean-k-mooney	alex_xu: well why don't we just store them with the instance as the key	15:13
dansmith	alex_xu: are the pmem devices actually named something specific in linux? or are they just /dev/pmem/2 and we maintain the mapping between them and the instance?	15:13
alex_xu	sean-k-mooney: back to the same host resize problem :)	15:13
artom	alex_xu, what's your series? I'm having trouble searching gerrit for patches you own	15:14
mriedem	(instance, flavor) is redundant because we have instance.flavor	15:14
alex_xu	dansmith: they are just /dev/pmem, but I don't get what relationship with the vpmem name at here	15:14
*** slaweq has quit IRC		15:14
sean-k-mooney	alex_xu: then use instance+flavor_name which is in the xml or add the uuid to the xml	15:14
dansmith	alex_xu: okay so I don't understand why you have a problem accounting things with migration uuid vs. instance uuid at any given point	15:14
mriedem	i think dan is saying, you claim the new_flavor devices on the dest host with the instance, and claim the old_flavor devices on the source host with the migration	15:15
dansmith	sean-k-mooney: it's not a matter of storing that data, and further, flavor is an object that exists and is keyed in another database, so I think using it as a key here is a bad idea as well	15:15
dansmith	mriedem: right	15:15
mriedem	and by "claim the old_flavor devices on the source host with the migration" i mean, swap the mapping for the old_flavor device claim on the source host with the migration for the instance	15:15
dansmith	mriedem: same pattern as we have elsewhere	15:15
alex_xu	dansmith: the resize call to the dest host to claim the resource first, then how can we change the in-memory claim info in the source host? by another rpc call, back to the source host?	15:16
mriedem	when you revert on the source host, you revert that mapping as well	15:16
dansmith	alex_xu: you get a finish call or a revert call, so you do it then and there right?	15:16
mriedem	alex_xu: do it when prep_resize casts to resize_instance on the source host?	15:16
sean-k-mooney	well we could flip it and claim with the migration uuid on the dest	15:16
dansmith	sean-k-mooney: no	15:16
sean-k-mooney	ok	15:17
dansmith	sean-k-mooney: because then you have to change it again if they keep the instance, which is the natural path	15:17
sean-k-mooney	ya that is true you would have to change it in confirm resize	15:17
alex_xu	dansmith: we should change to migration uuid before we add new claim for the target	15:17
mriedem	or like dan said just swap the source host claim during confirm_resize or finish_revert_resize (i'm not sure you'd even need to change the latter)	15:17
alex_xu	otherwise how can we distinguish the src and dest for the same host resize...	15:17
mriedem	for same host resize,	15:18
mriedem	you have some mapping of devices claimed on that host, right?	15:18
dansmith	alex_xu: are you familiar with how we do this for placement allocations?	15:18
mriedem	the instance maps to the new flavor device allocations, the migration maps to the old flavor device allocations	15:18
dansmith	yes ^	15:18
alex_xu	yea, I know those thing for placement	15:18
*** slaweq has joined #openstack-nova		15:18
*** dave-mccowan has joined #openstack-nova		15:19
mriedem	so for same host resize move claim it seems you'd:	15:19
mriedem	1. map migration = old flavor	15:19
mriedem	2. change instance mapping to point from old to new flavor	15:19
mriedem	on confirm, you remove the mapping for the migration,	15:19
mriedem	on revert, you:	15:19
mriedem	1. map instance to old flavor,	15:19
mriedem	2. drop the migration mapping	15:19
dansmith	don't say flavor here	15:19
dansmith	but yes	15:19
mriedem	flavor device allocation thing 2000	15:20
mriedem	will flavour work?	15:20
dansmith	no	15:20
dansmith	I think building more things using the existing pattern of using migration uuid to reserve the old resources is a good idea	15:21
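Editor's sketch of the pattern mriedem and dansmith describe above for a same-host resize: key the old devices by migration UUID and the new devices by instance UUID, so confirm and revert can release the right set without involving flavors. This is a minimal in-memory illustration under stated assumptions. `DeviceClaims` and its methods are hypothetical names, and device names such as "ns0" are made up; none of this is taken from the patch series under review.

```python
class DeviceClaims:
    """In-memory map of consumer UUID -> set of claimed device names.

    A consumer is either an instance UUID or a migration UUID, mirroring
    how placement allocations are held during a resize.
    """

    def __init__(self):
        self._claims = {}

    def claim(self, consumer_uuid, devices):
        self._claims[consumer_uuid] = set(devices)

    def devices_for(self, consumer_uuid):
        # Return a copy so callers cannot mutate the internal state.
        return set(self._claims.get(consumer_uuid, ()))

    def start_resize(self, instance_uuid, migration_uuid, new_devices):
        # 1. The migration takes over the devices of the old allocation.
        self._claims[migration_uuid] = self._claims.pop(instance_uuid, set())
        # 2. The instance now maps to the devices of the new allocation.
        self._claims[instance_uuid] = set(new_devices)

    def confirm_resize(self, migration_uuid):
        # Drop the migration mapping; returns the old devices to release.
        return self._claims.pop(migration_uuid, set())

    def revert_resize(self, instance_uuid, migration_uuid):
        # 1. Map the instance back to the old devices held by the migration.
        # 2. Drop the migration mapping (done by the pop below).
        # Raises KeyError if no resize was started for this migration.
        released = self._claims.get(instance_uuid, set())
        self._claims[instance_uuid] = self._claims.pop(migration_uuid)
        return released
```

Because the two claims never share a key, unclaiming by one UUID cannot accidentally release the other claim, which is the same-host problem raised earlier in the discussion.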
alex_xu	emm...I try to remember which case we said no for this in the beginning	15:21
alex_xu	luyao: ^ help me	15:21
luyao	alex_xu: I'm always trying...	15:22
*** slaweq has quit IRC		15:24
*** tbachman has quit IRC		15:24
tssurya	mriedem: do you also prefer to push all the instance state checking and updates into the manager like dansmith ? since we have a lock there	15:25
alex_xu	mriedem: dansmith the first rpc call to the compute node for the resize is sent to the dest src. so you can do map 'migration = old flavor' first	15:26
*** dave-mccowan has quit IRC		15:26
*** nweinber_ has joined #openstack-nova		15:26
dansmith	alex_xu: you can't, but you don't have to do the migration mapping until you hit the source for the first time	15:26
dansmith	alex_xu: isn't this the exact same set of steps for allocations in placement? so the ordering should work the same way	15:27
*** nweinber__ has quit IRC		15:27
openstackgerrit	YAMAMOTO Takashi proposed openstack/nova master: Revert "Revert resize: wait for events according to hybrid plug" https://review.opendev.org/675021	15:27
openstackgerrit	YAMAMOTO Takashi proposed openstack/nova master: Revert "Pass migration to finish_revert_migration()" https://review.opendev.org/676442	15:27
*** nweinber__ has joined #openstack-nova		15:28
*** macz has joined #openstack-nova		15:28
mriedem	well, it's a little different with placement allocations since we can manage those at the top in conductor	15:28
alex_xu	dansmith: we switch instance_uuid to migration_uuid in the conductor, right?	15:28
dansmith	okay, that's fair	15:28
mriedem	for example, if we claim the new devices on the dest in prep_resize, cast to resize_instance on the source, swap the old devices to the migration, and then something fails during the disk transfer or whatever, we won't go back to the dest to cleanup that old claim - but the RT should fix that up in a periodic	15:29
mriedem	i don't know where these "claims" are stored in memory though - in the compute manager? RT? driver?	15:29
mriedem	i'll just say, the RT logic is already super complex, and now it sounds like we're going to be duplicating parts of it elsewhere...	15:30
alex_xu	mriedem: swap should be happened first for same host resize	15:30
alex_xu	mriedem: claim store in driver	15:30
*** nweinber_ has quit IRC		15:30
*** shilpasd has joined #openstack-nova		15:30
mriedem	cleaning up from a failed same host resize is simpler since you're on the same host, but not for different hosts	15:30
sean-k-mooney	alex_xu: this is different than how we claim pci devices and cpu/hugepage right?	15:31
*** tbachman has joined #openstack-nova		15:31
mriedem	the driver doing resource tracking now.... :(	15:31
sean-k-mooney	because for those i thought we stored the claims in the db via the RT rather than in memory	15:31
alex_xu	sean-k-mooney: yes, at least the vpmem and vgpu is managed by virt driver. pci and cpu managed by resource tracker	15:32
mriedem	the resource tracker doing resource tracking still ... :(	15:32
mriedem	what a mess	15:32
mriedem	some things in placement, some things in legacy nova tables and the RT, some things now in the driver	15:32
sean-k-mooney	we should really be keeping all this in the RT until it's in placement	15:32
sean-k-mooney	if it's ever in placement	15:33
sean-k-mooney	putting this in memory in the driver is worrying	15:33
alex_xu	sean-k-mooney: mriedem in the future, when pci and numa move to placement, then we needn't store in RT, right? then also managing in the virt driver?	15:33
sean-k-mooney	or at least complex	15:33
sean-k-mooney	alex_xu: it will still be needed in some cases	15:33
alex_xu	sean-k-mooney: hah, you say different with dansmith :)	15:33
* alex_xu prepare a gun for dansmith		15:33
sean-k-mooney	well placement won't be tracking individual device assignment	15:34
sean-k-mooney	e.g. which vf (pci address) the vm is using	15:34
sean-k-mooney	to do that we would need to create a RP per vf which we are not going to do	15:34
alex_xu	sean-k-mooney: yea, that is what happen for vgpu and vpmem	15:34
dansmith	I'm not sure what sean-k-mooney said that is different than me	15:35
sean-k-mooney	so the "assignment" information will always need to live in nova	15:35
sean-k-mooney	the tally count of how many are available will be in placement	15:35
dansmith	sean-k-mooney: when we previously discussed this, I wanted to avoid nova storing a mapping between the actual pmem device and the instance in our database, for a specific reason	15:36
alex_xu	I think dansmith said we use libvirt to persist the assignment of devices, not the DB. sean-k-mooney is talking about us still needing the DB	15:36
sean-k-mooney	well currently we regenerate the xmls on lots of operations so storing the mapping in the xml will be invasive	15:37
dansmith	right, so we have the mapping between instances and pmem devices stored in the libvirt xml	15:37
dansmith	whatever, I give up, do whatever ya	15:37
dansmith	'll want	15:37
sean-k-mooney	it will be there implicitly i guess	15:37
alex_xu	dansmith: no...	15:37
sean-k-mooney	dansmith: we don't need to store it in the db provided we will never use it in a filter	15:38
sean-k-mooney	we only need to store the pci info in the db to use it with the numa/pci passthrough filters	15:38
sean-k-mooney	same for the numa topology blob	15:38
sean-k-mooney	we could calculate them locally on the host and keep it in memory otherwise	15:39
mriedem	what happens when i need a weigher to pick hosts with more or less allocated pmems?!	15:39
sean-k-mooney	so if the only scheduling for vpmem is done via placement then the assignment could be tracked via the xml	15:39
mriedem	or pmem affinity	15:40
sean-k-mooney	mriedem: we would either need to call placement for the data or we cant	15:40
mriedem	btw, are there a fair number of rhosp users using vgpus now that you're on queens?	15:40
sean-k-mooney	pmem affinity(i assume numa affinity) could be modeled in the RP tree	15:41
sean-k-mooney	mriedem: not that im aware of	15:41
sean-k-mooney	most are using full GPU passthrough when they need gpus	15:41
mriedem	was just going to say that	15:41
sean-k-mooney	nvidia licensing is $$$$	15:41
dansmith	I'll be really honest here, I think this is a very niche, very libvirt-specific, very unlikely-to-be-widely-used feature, and I think that adding a bunch of nova persistence for these things brings more impact to operations and upgrades than we need,	15:42
dansmith	and storing this information purely in the place where it matters (in libvirt) limits that impact and scope a lot	15:42
*** ivve has quit IRC		15:42
sean-k-mooney	dansmith: im not against that just wanted to point out we have always assumed the xmls are not required until now	15:43
dansmith	if an operator changes hardware after a maint cycle that changes the ordering of these devices or something, I worry about handing persistent devices to the wrong instances, and I think keeping the mapping(s) in one place that is visible and accessible to the operators if they need to remap is also a good idea	15:43
sean-k-mooney	e.g. that we can just regenerate them	15:43
dansmith	sean-k-mooney: no, that's not true I don't think	15:43
dansmith	sean-k-mooney: if we delete the instance from a guest and we restart nova I think it will freak	15:43
sean-k-mooney	yes but if an operator changes the xml with virsh and we do a hard reboot we just regenerate it	15:44
dansmith	sean-k-mooney: regenerating the xml all the time does not mean that the xml is not useful data.. we use it to determine which instances are actually on this host, vs just assigned	15:44
sean-k-mooney	if the domain is missing i don't know what happens	15:44
dansmith	sean-k-mooney: not if we don't store that detail ourselves	15:44
dansmith	anyway, I think I've already spent way more time on this than this feature is worth,	15:44
dansmith	and the column in the db to just dump a blob of data into instance_extra was already merged before this was all discussed,	15:45
dansmith	so the easiest thing is to just let that become a dumping ground for all this stuff, regardless	15:45
dansmith	alex_xu: really sorry for ever even involving myself in this, my apologies	15:46
*** tbachman has quit IRC		15:46
mriedem	onto tssurya's problem!	15:46
tssurya	yayy	15:46
dansmith	I was just going to say	15:46
dansmith	tssurya: I missed if there was a reply on the plan	15:46
tssurya	not yet waiting for mriedem's opinion	15:47
mriedem	she's just asking if i agree with changing task_state in the api	15:47
dansmith	yeah	15:47
dansmith	didn't see a response on that	15:47
tssurya	let a comment on the patch: https://review.opendev.org/#/c/645611/	15:47
tssurya	left*	15:47
alex_xu	dansmith: sorry, I'm trying my best make it simple and easy. but yea, i still found those issue need help	15:47
*** tbachman has joined #openstack-nova		15:47
tssurya	also efried ^ in case you have an opinion	15:47
mriedem	so if on power-update we don't check or set task_state in the api, we avoid the "instance is stuck with non-none task_state b/c it's on a stein compute" issue	15:47
tssurya	right only to do the same in the manager	15:48
mriedem	and the driver / compute manager would need to handle the UnexpectedTaskStateError	15:48
dansmith	yep, and it moves the "do we do anything about this" closer to the thing that makes that decision	15:48
tssurya	why does the driver have to handle UnexpectedTaskStateError ?	15:49
dansmith	tssurya: it doesn't I don't think	15:49
tssurya	I would be moving the task_state saving part into the manager	15:49
mriedem	the downside is losing some race between the api and the compute manager where the sync power states task has turned off your bm instance b/c the nova db said it should be off but it's actually on again in ironic, right?	15:49
tssurya	and since it has a lock it should be fine	15:49
dansmith	when I originally suggested this, I was imagining the driver wholly owning the "what do we do"	15:49
tssurya	dansmith: yea I remember you telling it in the spec design phase	15:49
dansmith	so only ironic will be anything other than a no-op in this case, and all it needs to do is do its poweroff	15:49
*** damien_r has quit IRC		15:49
dansmith	tssurya: the lock only works within a compute node,	15:50
tssurya	its just that the notifications/action stuff happens in the upper level	15:50
dansmith	tssurya: so you can still race with the compute node processing this and something else trying to take action on the instance	15:50
mriedem	something needs to handle UnexpectedTaskStateError for power-update otherwise we fail to process all of the events in the same request on the same host	15:50
tssurya	dansmith: oh yea true	15:50
dansmith	tssurya: meaning ironic may have sent that event, and meanwhile some user tried to reboot the instance at the same time	15:50
mriedem	and you could lose a race with the bug you're trying to fix with this, i think, right?	15:51
tssurya	so what's the point of moving it to the manager again ?	15:51
dansmith	mriedem: all we need to do is handle, in the ironic case, what happens if I do instance.save(expected_task_state=None) right?	15:51
mriedem	tssurya: rolling upgrades for one	15:51
tssurya	ah yes	15:51
dansmith	tssurya: and it makes it so we don't touch the instance at all for any drivers that don't care about this	15:51
tssurya	mriedem: and yea the downside point is valid	15:52
mriedem	dansmith: i think so, but if the driver raises then the compute manager code has to handle it and not barf for the other events in the same request	15:52
dansmith	mriedem: I think the driver should just not raise, I think the driver should handle all of this, because it's only one	15:52
mriedem	so if the driver gets UnexpectedTaskStateError, it just logs and returns?	15:53
dansmith	yeah	15:53
mriedem	wfm	15:53
dansmith	it will be contextually relevant,	15:53
tssurya	dansmith, mriedem: wait the instance.save(expected_task_state=None) is happening in the manager no ?	15:53
dansmith	where compute manager won't know what the eff it means	15:53
tssurya	and the instance.save(expected_task_state=task_states.POWERING_ON)	15:54
tssurya	will happen in the driver	15:54
mriedem	tssurya: no, your code is doing that in the driver	15:54
dansmith	tssurya: I don't think so	15:54
mriedem	it's like the only thing the driver does	15:54
mriedem	https://review.opendev.org/#/c/645611/12/nova/virt/ironic/driver.py	15:54
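
A minimal sketch of the "driver just logs and returns" behaviour agreed on above, under stated assumptions: `FakeInstance`, the local `UnexpectedTaskStateError` class, and `power_update` are hypothetical stand-ins and not nova's actual code. Only the `save(expected_task_state=...)` call shape is taken from the discussion.

```python
import logging

LOG = logging.getLogger(__name__)


class UnexpectedTaskStateError(Exception):
    """Local stand-in for the exception named in the chat (hypothetical)."""


class FakeInstance:
    """Hypothetical instance record with a compare-and-save on task_state."""

    def __init__(self):
        self.task_state = None
        self.power_state = "running"
        # What the "database" holds; a concurrent request may change it.
        self.db_task_state = None

    def save(self, expected_task_state=None):
        # Refuse to write if someone else changed task_state in the meantime.
        if self.db_task_state != expected_task_state:
            raise UnexpectedTaskStateError(
                "expected %r, found %r"
                % (expected_task_state, self.db_task_state))
        self.db_task_state = self.task_state


def power_update(instance, target_power_state):
    """Apply a power-update event; return False if it lost a race."""
    transition = ("powering-off" if target_power_state == "shutdown"
                  else "powering-on")
    try:
        instance.task_state = transition
        instance.save(expected_task_state=None)
        instance.power_state = target_power_state
        instance.task_state = None
        instance.save(expected_task_state=transition)
        return True
    except UnexpectedTaskStateError as exc:
        # Another action owns the instance: log and return so the remaining
        # events in the same request can still be processed.
        LOG.info("Ignoring power-update event: %s", exc)
        instance.task_state = instance.db_task_state
        return False
```

Because the function returns instead of raising, a caller looping over several events in one request would not be interrupted by a single lost race.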
tssurya	ok so we record an instance action start before setting the task_state to POWERING_ON ?	15:54
mriedem	the instance action stuff in the api doesn't change	15:54
mriedem	imo	15:54
dansmith	I dunno	15:55
mriedem	that's how you know from the api if the event failed or not	15:55
tssurya	so isn't it confusing to the user that the task_state hasn't changed	15:55
tssurya	but there is an instance-action record/notification	15:55
tssurya	emitted	15:55
dansmith	mriedem: if I send this event for libvirt instance, it will record an action but nothing really happened	15:55
*** rpittau is now known as rpittau\|afk		15:55
mriedem	dansmith: and i'll say, you shouldn't really do that	15:55
mriedem	same with trying to swap volumes on anything other than libvirt	15:55
dansmith	:/	15:56
*** tesseract has quit IRC		15:56
mriedem	or extend an attached volume, or do host-assisted snapshot, or anything with pmems :)	15:57
mriedem	so https://review.opendev.org/#/c/645611/12/nova/compute/api.py@250 would move to the compute manager, or the ironic driver?	15:58
dansmith	yeah, well, this is a minor detail	15:58
mriedem	i'm cool with that moving to the driver too	15:58
mriedem	until some other driver needs this same thing for whatever reason	15:58
dansmith	I would think it moves to the driver, and really doesn't need to be all that detailed I would think, but yeah	15:58
dansmith	I mean, I guess it could stay, I don't really care that much	15:59
tssurya	ok I can move that whole chunk with the no-op	15:59
tssurya	all into the driver	15:59
tssurya	only the instance action and notification stuff remain in place in the api and manager right ?	15:59
mriedem	the state checking in the api could save you some rpc traffic i guess, but you could be racing either way	15:59
tssurya	but then again, is it ok for the task_state to still be None when we emit the state notification ?	16:00
dansmith	the thing I think makes the biggest difference, is changing the power state before we know if we're going to do anything	16:00
tssurya	start*	16:00
dansmith	I don't know the semantics of any expectations around that	16:00
tssurya	ok yea then maybe it's fine :)	16:01
dansmith	I didn't really think notifications are required to be realtime and synchronous, so I don't think it'd matter	16:01
mriedem	tssurya: you can set the task_state on the instance in the compute manager before calling the driver method if we care - but then you have to handle UnexpectedTaskStateError there as well	16:01
dansmith	but if we do that,	16:01
dansmith	we	16:01
dansmith	are changing it for non-ironic instances,	16:02
tssurya	hmm yea	16:02
dansmith	then catch the NotImplementedError, and then change it right back, yeah?	16:02
mriedem	which won't actually power off or on the guest	16:02
mriedem	@reverts_task_state would set the task_state back to None yes	16:02
mriedem	but i'm not too worried about non-ironic instances because this is admin-only api stuff and if someone is abusing it for non-ironic instances and it doesn't work as expected....meh?	16:03
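
A rough sketch of the revert behaviour described above, where the task_state is set before calling the driver and put back to None when the driver raises `NotImplementedError`. The decorator below is a simplified, hypothetical re-implementation for illustration; the real `@reverts_task_state` named in the chat may differ, and `FakeInstance`, `power_update`, and `unsupported_driver` are invented names.

```python
import functools


class FakeInstance:
    """Hypothetical instance with only the field this sketch needs."""

    def __init__(self):
        self.task_state = None


def reverts_task_state(func):
    """Reset instance.task_state to None if the wrapped call raises."""
    @functools.wraps(func)
    def wrapper(instance, *args, **kwargs):
        try:
            return func(instance, *args, **kwargs)
        except Exception:
            instance.task_state = None
            raise  # the caller still sees the original error
    return wrapper


@reverts_task_state
def power_update(instance, driver_power_update):
    # The state is changed before we know whether the driver does anything.
    instance.task_state = "powering-off"
    driver_power_update(instance)
    instance.task_state = None


seen_by_driver = []


def unsupported_driver(instance):
    """Stand-in for a driver that does not implement power-update."""
    seen_by_driver.append(instance.task_state)
    raise NotImplementedError()
```

For a driver that does not implement the call, the task_state is briefly non-None and then reverted, which is the "change it, then change it right back" effect raised in the discussion.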
dansmith	okay	16:03
tssurya	so what's it going to be: manager or driver ?	16:03
tssurya	at this point I don't mind either	16:03
dansmith	"at this point" .. translation: "at this point I'm about to shoot both of you guys"	16:04
mriedem	i forgot the question	16:04
tssurya	dansmith: ;)	16:04
dansmith	tssurya: I think it doesn't matter that much, figure it out when you move the code and it's probably okay	16:04
tssurya	mriedem: meaning we move the task state and https://review.opendev.org/#/c/645611/12/nova/compute/api.py@250	16:04
dansmith	tssurya: also when are you actually leaving?	16:04
*** tssurya has quit IRC		16:05
mriedem	right now :)	16:05
*** tssurya has joined #openstack-nova		16:05
dansmith	lol	16:05
dansmith	#ragequit	16:05
mriedem	(11:04:47 AM) dansmith: tssurya: also when are you actually leaving?	16:05
tssurya	sorry bad connection	16:05
tssurya	I leave on Friday	16:05
dansmith	okay	16:05
tssurya	so I can work on it this evening and tomorrow	16:05
dansmith	okay cool	16:06
dansmith	tssurya: so I said I think manager vs. driver probably doesn't matter that much, you choose while moving the code and it'll probably be okay	16:06
dansmith	I don't have a strong opinion without seeing it, so maybe just have to pick one and see	16:06
tssurya	ok awesome thanks	16:06
dansmith	mriedem: sound okay?	16:06
tssurya	so first I'll try with the manager and then we can see	16:06
mriedem	i'll leave a couple of comments	16:06
tssurya	thanks a lot dansmith and mriedem :D	16:07
*** belmoreira has quit IRC		16:07
mriedem	done https://review.opendev.org/#/c/645611/12	16:10
*** cdent has quit IRC		16:12
tssurya	ty	16:13
dansmith	mriedem: yeah, so assuming your "don't do this for libvirt" assertion I think what you said in there is fine	16:13
openstackgerrit	Merged openstack/nova stable/stein: Fix misuse of nova.objects.base.obj_equal_prims https://review.opendev.org/676289	16:13
*** cfriesen has quit IRC		16:13
*** ricolin has quit IRC		16:29
*** dklyle has quit IRC		16:36
*** dklyle has joined #openstack-nova		16:37
*** tssurya has quit IRC		16:39
*** belmoreira has joined #openstack-nova		16:39
*** ociuhandu_ has joined #openstack-nova		16:41
*** belmoreira has quit IRC		16:41
alex_xu	dansmith: mriedem sean-k-mooney anyway thanks for your time today, I did indeed use a lot of it today :)	16:42
*** fungi has quit IRC		16:42
*** fungi has joined #openstack-nova		16:43
dansmith	alex_xu: it's okay, and I think sean-k-mooney has more than 24 hours in his days, so he has plenty to spare :)	16:43
*** shilpasd has quit IRC		16:43
mriedem	no problem	16:43
*** ociuhandu has quit IRC		16:44
*** markvoelker has quit IRC		16:44
dansmith	mars has 25 hour long days, so maybe sean-k-mooney is from mars	16:44
openstackgerrit	Matt Riedemann proposed openstack/nova stable/pike: Hook resource_tracker to remove stale node information https://review.opendev.org/676461	16:45
*** ociuhandu_ has quit IRC		16:45
*** adrianc has quit IRC		16:50
*** markvoelker has joined #openstack-nova		16:51
openstackgerrit	Matt Riedemann proposed openstack/nova stable/pike: rt: only map compute node if we created it https://review.opendev.org/676463	16:52
*** adrianc has joined #openstack-nova		16:52
*** mrjk has quit IRC		16:54
openstackgerrit	Matt Riedemann proposed openstack/nova stable/pike: Hook resource_tracker to remove stale node information https://review.opendev.org/676461	16:54
openstackgerrit	Matt Riedemann proposed openstack/nova stable/pike: rt: only map compute node if we created it https://review.opendev.org/676463	16:55
*** ociuhandu has joined #openstack-nova		16:56
*** psachin has quit IRC		16:57
*** tbachman has quit IRC		16:57
*** derekh has quit IRC		17:00
*** ociuhandu has quit IRC		17:00
efried	dustinc: Ima rebase the sdk series and bump ksa/sdk requirements and get it tracking against this devstack patch https://review.opendev.org/#/c/676268/	17:00
efried	as soon as sdk 0.34.0 lands in u-c	17:01
efried	when I say rebase - it's currently based on a really old nova commit, so I'm going to bring it up to current master.	17:01
openstackgerrit	Matt Riedemann proposed openstack/nova stable/ocata: Hook resource_tracker to remove stale node information https://review.opendev.org/676467	17:05
*** mrjk has joined #openstack-nova		17:08
rouk	is there a particular reason why OS-FLV-DISABLED:disabled is not exposed via api client? is it intended to be manual api queries only?	17:12
*** spsurya has quit IRC		17:14
rouk	i see an old proposed patch from... a long time ago, but it was abandoned.	17:15
dansmith	efried: on this:	17:18
dansmith	https://review.opendev.org/#/c/675705/2/nova/tests/functional/regressions/test_bug_1839560.py	17:18
efried	yah	17:19
dansmith	efried: you're just noting that in real life we have a service with hostname "host1" but would normally get uuids for the node names, is that right?	17:19
efried	dansmith: It appears to me as though, in real life, we actually go and create a node for the host, and then delete it when it doesn't come back in the list of nodes.	17:19
efried	and because that messes with the test, it was easier to name the host 'node1', so that we don't do that deletion.	17:20
dansmith	we don't do that in real life though	17:20
efried	well	17:21
efried	what I did was pull down the patch, change node1 to host1 where it was talking about the host, and run the test.	17:21
efried	and saw the message indicating that we deleted the host1 "node"	17:21
dansmith	right, it's just an artifact of the test that we create a host and node from the same name	17:21
dansmith	but in real life that's not how that works,	17:21
efried	oh, it's start_service that's doing that?	17:22
dansmith	it's just because of the test plumbing	17:22
dansmith	no, I think it's because of the fake driver and the service which has already been started by the time we get to the test	17:22
dansmith	well,	17:22
dansmith	maybe not been started,	17:22
dansmith	but because we're using the fake driver,	17:22
dansmith	we've already run get_available_nodes() by the time start_service has returned	17:22
dansmith	and that returns the same name as host.hostname or something	17:23
dansmith	so when he then overrides the node list to be node1, node2, you'd get a deletion of the host1 node that was already created	17:23
dansmith	anyway, I think your "I pulled down the patch" comment explains what I was asking, just that you were noticing test behavior and not concerned about relevance to real life or something else	17:24
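
The test artifact described above can be sketched as a set difference between the nodes already tracked for a host and the nodes the driver now reports. This is an illustrative assumption about the reconciliation step, not the resource tracker's actual code; `reconcile_nodes` is a hypothetical helper.

```python
def reconcile_nodes(known_nodes, reported_nodes):
    """Return (to_create, to_delete) as sorted lists of node names.

    known_nodes: node names already recorded for this host.
    reported_nodes: node names the driver currently reports.
    """
    known = set(known_nodes)
    reported = set(reported_nodes)
    return sorted(reported - known), sorted(known - reported)


# Service started as "host1": a node named after the host already exists,
# so overriding the node list to node1/node2 orphans it.
host1_result = reconcile_nodes(["host1"], ["node1", "node2"])

# Service started as "node1": the existing node is still in the reported
# list, so nothing is deleted.
node1_result = reconcile_nodes(["node1"], ["node1", "node2"])
```

In the first case the pre-existing `host1` entry lands in the delete list, which matches the deletion message efried saw after renaming the host in the test.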
efried	I was concerned about relevance to real life. But if you're telling me it's just test harness noise, I'm okay with that.	17:25
dansmith	ack	17:25
efried	however, if it is test harness noise, I feel like we ought to be able to fix it.	17:25
efried	more easily than if it's in the real code.	17:25
dansmith	we could for sure (just made a comment to that effect)	17:26
dansmith	I just don't think it's that important	17:26
dansmith	normally having them match is handy	17:26
efried	"normally" not-ironic	17:26
efried	no doubt	17:26
efried	but for ironic cases in general I would think it's never handy, n/a most of the time, but in certain cases like this one really confusing.	17:27
dansmith	yes, normally not ironic	17:27
dansmith	yep	17:27
dansmith	well, yep to the first part, not the second	17:27
dansmith	I don't really think it's confusing, but probably just because I know the details	17:28
dansmith	the test isn't testing the ironic case, it's testing the bring-back-the-dead case,	17:28
dansmith	which is related to how it works with ironic, but... not entirely ironic specific	17:28
dansmith	ANYway	17:28
efried	we could add to the fragility of FakeDriver's init_host by special-casing another CONF.host, say ironic-compute, and making self._nodes [] in that case.	17:30
*** priteau has quit IRC		17:37
*** tbachman has joined #openstack-nova		17:43
*** dpawlik has joined #openstack-nova		17:43
*** ivve has joined #openstack-nova		17:52
*** mjozefcz has quit IRC		18:00
*** gyee has joined #openstack-nova		18:06
*** ociuhandu has joined #openstack-nova		18:11
*** eharney_ has joined #openstack-nova		18:15
*** dpawlik has quit IRC		18:15
*** eharney has quit IRC		18:17
mriedem	efried: fwiw,	18:19
mriedem	when we had a fake.set_nodes global i could have used that to avoid this problem,	18:19
mriedem	but since that was removed, you get the thing described above	18:19
mriedem	"so when he then overrides the node list to be node1, node2, you'd get a deletion of the host1 node that was already created"	18:19
mriedem	so i could have started with host1:host1, and then returned available nodes host1:host1 and host1:node2 but that's also weird - which was my reply	18:20
efried	I let it go, but if y'all keep talking about it, I'm going to have to fix it.	18:25
openstackgerrit	Merged openstack/nova master: lxc: make use of filter python3 compatible https://review.opendev.org/676263	18:25
mriedem	do you want to talk about AZ design in a new private cloud instead?	18:26
mriedem	sean-k-mooney: do you want to backport https://review.opendev.org/#/c/676263/ ?	18:26
*** slaweq has joined #openstack-nova		18:29
*** ociuhandu has quit IRC		18:34
openstackgerrit	Matt Riedemann proposed openstack/nova stable/ocata: Log compute node uuid when the record is created https://review.opendev.org/676487	18:45
*** maciejjozefczyk has joined #openstack-nova		18:53
*** maciejjozefczyk has quit IRC		18:59
*** maciejjozefczyk has joined #openstack-nova		19:04
sean-k-mooney	mriedem: i can backport it. how far?	19:15
sean-k-mooney	mriedem: also i looked at what devstack is doing	19:15
sean-k-mooney	it's using a cirros image for lxc by default	19:15
sean-k-mooney	i have not unpacked it but they use a different init system or rather non standard	19:15
sean-k-mooney	so that could be the issue with the other error	19:16
sean-k-mooney	i might try to set up an lxc deployment locally and see if i can figure it out later in the week	19:16
*** boxiang has quit IRC		19:19
openstackgerrit	Eric Fried proposed openstack/nova master: Enhance SDK fixture for 0.34.0 https://review.opendev.org/676495	19:20
efried	mordred: ^	19:20
mordred	efried: ++ I think that's the right call	19:21
efried	mriedem, dansmith: can I get a fast approval (assuming zuul happy) for this --^ please? It's blocking u-c for openstacksdk https://review.opendev.org/#/c/676457/	19:21
openstackgerrit	sean mooney proposed openstack/nova stable/stein: lxc: make use of filter python3 compatible https://review.opendev.org/676496	19:21
sean-k-mooney	if i want to cherry pick back multiple branches gerrit does not update the cherry picked from commit line unless it's merged right	19:23
sean-k-mooney	so i should either do it via the commandline or update it manually?	19:24
openstackgerrit	sean mooney proposed openstack/nova stable/rocky: lxc: make use of filter python3 compatible https://review.opendev.org/676498	19:25
openstackgerrit	sean mooney proposed openstack/nova stable/queens: lxc: make use of filter python3 compatible https://review.opendev.org/676500	19:25
*** beagles_pto has quit IRC		19:26
openstackgerrit	sean mooney proposed openstack/nova stable/pike: lxc: make use of filter python3 compatible https://review.opendev.org/676502	19:26
sean-k-mooney	ok i have cherry picked it to pike since that is when we added python3 support now to fix all the topics	19:27
mriedem	efried: done	19:31
efried	thanks	19:32
*** kaisers has quit IRC		19:32
mriedem	nice to know that after talking about what to do for AZs in a new cloud for a couple of hours that my initial, "at this point i don't think we need any AZs" comment turned out to be the correct one	19:33
mriedem	$$$ professional $$$	19:33
*** kaisers has joined #openstack-nova		19:34
*** eharney_ has quit IRC		19:38
*** shilpasd has joined #openstack-nova		19:40
sean-k-mooney	the only thing i have ever used AZs for is to give people the choice of the old servers, the new servers or the ci servers	19:40
openstackgerrit	Matt Riedemann proposed openstack/nova stable/stein: Add functional regression recreate test for bug 1839560 https://review.opendev.org/676507	19:47
openstack	bug 1839560 in OpenStack Compute (nova) "ironic: moving node to maintenance makes it unusable afterwards" [High,In progress] https://launchpad.net/bugs/1839560 - Assigned to Matt Riedemann (mriedem)	19:47
mriedem	sean-k-mooney: which you could do with host aggregates and flavors tied to those aggregates	19:49
*** nweinber__ has quit IRC		20:00
*** tbachman has quit IRC		20:00
mriedem	dropping os-acc before we even have a working nova/cyborg integration seems like jumping the gun	20:02
efried	mordred: stubbing Adapter.get_endpoint is too big a hammer. I'm struggling to come up with another approach...	20:06
*** luksky has joined #openstack-nova		20:07
mordred	efried: poop. my brain is towards the end of its hours of effectiveness in terms of coming up with useful ideas... I'll dig in to it first thing in the morning when I'm fresh.	20:07
mordred	there's got to be something	20:07
efried	mordred: problem is a bunch of other things hit Adapter and Adapter.get_endpoint	20:08
mordred	yeah	20:08
openstackgerrit	Matt Riedemann proposed openstack/nova stable/stein: Restore soft-deleted compute node with same uuid https://review.opendev.org/676509	20:10
openstackgerrit	Merged openstack/nova master: Add functional regression recreate test for bug 1839560 https://review.opendev.org/675705	20:14
openstack	bug 1839560 in OpenStack Compute (nova) stein "ironic: moving node to maintenance makes it unusable afterwards" [High,In progress] https://launchpad.net/bugs/1839560 - Assigned to Matt Riedemann (mriedem)	20:14
openstackgerrit	Merged openstack/nova master: Restore soft-deleted compute node with same uuid https://review.opendev.org/675496	20:14
openstackgerrit	Matt Riedemann proposed openstack/nova stable/rocky: Add functional regression recreate test for bug 1839560 https://review.opendev.org/676513	20:16
openstack	bug 1839560 in OpenStack Compute (nova) stein "ironic: moving node to maintenance makes it unusable afterwards" [High,In progress] https://launchpad.net/bugs/1839560 - Assigned to Matt Riedemann (mriedem)	20:16
*** tbachman has joined #openstack-nova		20:18
openstackgerrit	Matt Riedemann proposed openstack/nova stable/rocky: Restore soft-deleted compute node with same uuid https://review.opendev.org/676514	20:19
mriedem	mnaser: backports are all up	20:19
mriedem	TheJulia: looks like ironic-tempest-dsvm-ipa-wholedisk-bios-agent_ipmitool-tinyipa is busted on stable/rocky,	20:20
mriedem	seeing this: It looks like a path. File ''/home/zuul/src/opendev.org/openstack/ironic-tempest-plugin'' does not exist.	20:21
mriedem	queens and stein are ok	20:23
mriedem	oh i bet it's this https://github.com/openstack/ironic/commit/ef0fde41e9c78e79d0b3b618102bcb475fa1f691	20:27
*** bbowen__ has quit IRC		20:29
*** bbowen__ has joined #openstack-nova		20:29
TheJulia	but that was on master and stein	20:30
mriedem	"The has stopped working out of a sudden." doesn't mention the root cause	20:30
mriedem	which might be something that affects rocky as well	20:30
TheJulia	mriedem: unless you're meaning that it might be the fix?	20:30
*** tbachman has quit IRC		20:30
mriedem	yeah	20:30
TheJulia	maybe, I'm wondering why we've not seen it on rocky then	20:31
TheJulia	mriedem: are you guys just invoking the job directly by name, or do you have a definition in nova's config?	20:31
mriedem	looks like you have... https://review.opendev.org/#/c/648360/	20:31
mriedem	we just use the job name	20:31
mriedem	https://github.com/openstack/nova/blob/stable/rocky/.zuul.yaml#L213	20:32
TheJulia	interesting	20:32
mriedem	yeah stable/rocky is broken for ironic as well,	20:32
mriedem	i'm working on backporting that fix but there are merge conflicts	20:32
TheJulia	well, it was working as of Aug 3	20:34
TheJulia	mriedem: okay, if you're doing that, I'll wait then	20:35
*** tbachman has joined #openstack-nova		20:41
*** dpawlik has joined #openstack-nova		20:45
*** efried has quit IRC		20:45
*** efried has joined #openstack-nova		20:46
*** maciejjozefczyk has quit IRC		21:00
*** eharney_ has joined #openstack-nova		21:05
*** slaweq has quit IRC		21:06
melwitt	mriedem: I'm thinking I should fix this up and rebase on it for the multi-cell archive patch https://review.opendev.org/675218	21:18
melwitt	bc if it does the re-use of db engine, I won't have to make any changes to the "for table" method	21:19
mriedem	melwitt: i forgot all about that patch :(	21:21
mriedem	i've got part of it fixed, i can push that up once i'm done rebasing the cross-cell-resize series and mark as WIP if you want to swing at it	21:21
mriedem	but yeah i assumed you'd rebase on that series of fixups	21:21
melwitt	me too kinda, until I went to address review feedback and was like, hm, I think mriedem did something about this in another patch	21:22
melwitt	ok, that would be great. thanks	21:22
*** markvoelker has quit IRC		21:26
*** dpawlik has quit IRC		21:35
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_dest compute method https://review.opendev.org/633293	21:38
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add PrepResizeAtDestTask https://review.opendev.org/627890	21:38
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_source compute method https://review.opendev.org/634832	21:38
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add nova.compute.utils.delete_image https://review.opendev.org/637605	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add PrepResizeAtSourceTask https://review.opendev.org/627891	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Refactor ComputeManager.remove_volume_connection https://review.opendev.org/642183	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add power_on kwarg to ComputeDriver.spawn() method https://review.opendev.org/642590	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add finish_snapshot_based_resize_at_dest compute method https://review.opendev.org/635080	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add FinishResizeAtDestTask https://review.opendev.org/635646	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add Destination.allow_cross_cell_move field https://review.opendev.org/614035	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Execute CrossCellMigrationTask from MigrationTask https://review.opendev.org/635668	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Plumb allow_cross_cell_resize into compute API resize() https://review.opendev.org/635684	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Filter duplicates from compute API get_migrations_sorted() https://review.opendev.org/636224	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Change HostManager to allow scheduling to other cells https://review.opendev.org/614037	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Start functional testing for cross-cell resize https://review.opendev.org/636253	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Handle target host cross-cell cold migration in conductor https://review.opendev.org/642591	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Validate image/create during cross-cell resize functional testing https://review.opendev.org/642592	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: Add zones wrinkle to TestMultiCellMigrate https://review.opendev.org/643450	21:39
openstackgerrit	Matt Riedemann proposed openstack/nova master: WIP: Re-use DB engine connection during archive_deleted_rows https://review.opendev.org/675218	21:45
mriedem	melwitt: ^ some test_db_api tests are failing with unique constraint errors, like an instance was archived when it didn't expect it to	21:45
melwitt	mriedem: ok, thanks	21:45
mriedem	ok i fixed it	21:49
mriedem	still need to call metadata.bind.connect() per call to _archive_deleted_rows_for_table	21:49
mriedem	not entirely sure why...	21:49
*** macz has quit IRC		21:50
openstackgerrit	Matt Riedemann proposed openstack/nova master: WIP: Re-use DB MetaData during archive_deleted_rows https://review.opendev.org/675218	21:52
*** dpawlik has joined #openstack-nova		21:52
mriedem	ah crap need to remove the WIP	21:52
openstackgerrit	Matt Riedemann proposed openstack/nova master: Re-use DB MetaData during archive_deleted_rows https://review.opendev.org/675218	21:52
mriedem	ok now i need to run	21:52
*** mriedem has quit IRC		21:53
openstackgerrit	Dustin Cowles proposed openstack/nova master: Provider config file schema and loader https://review.opendev.org/673341	21:56
openstackgerrit	Dustin Cowles proposed openstack/nova master: WIP: Public method to retrieve custom resource providers https://review.opendev.org/676029	21:56
openstackgerrit	Dustin Cowles proposed openstack/nova master: WIP: Load the custom resource providers to resource tracker https://review.opendev.org/676522	21:56
*** dpawlik has quit IRC		21:56
melwitt	thanks mriedem++	22:00
*** luksky has quit IRC		22:03
openstackgerrit	Eric Fried proposed openstack/nova master: Enhance SDK fixture for 0.34.0 https://review.opendev.org/676495	22:08
efried	mordred: A big but very focused hammer ^	22:08
*** shilpasd has quit IRC		22:09
*** artom has quit IRC		22:09
mordred	efried: wow	22:14
mordred	efried: I feel like pep8 isn't going to like it - but I do	22:14
efried	oh, rat farts, forgot to fix that part...	22:14
openstackgerrit	Eric Fried proposed openstack/nova master: Enhance SDK fixture for 0.34.0 https://review.opendev.org/676495	22:15
efried	thanks mordred for saving me a failed zuul run on a stupid	22:15
*** mlavalle has quit IRC		22:17
openstackgerrit	Merged openstack/nova stable/rocky: Fix misuse of nova.objects.base.obj_equal_prims https://review.opendev.org/676290	22:17
sean-k-mooney	gibi: by the way you asked me over a month ago if i had an example of how to configure a host to use the same interface for both ovs and sriov. i am setting up my sriov servers again and this is the local.conf i use http://paste.openstack.org/show/757169/	22:18
sean-k-mooney	i'm using port enp1s0f1 as the ovs port on the br-ex and it's also the pf of the sriov vfs with the neutron fdb extension enabled to make sure that ovs vm to sriov vm traffic does not need to hairpin at the top of rack switch	22:20
*** markvoelker has joined #openstack-nova		22:23
*** eharney_ has quit IRC		22:30
*** spatel has quit IRC		22:31
*** ivve has quit IRC		22:33
*** BjoernT_ has quit IRC		22:40
*** markvoelker has quit IRC		22:48
*** tkajinam has joined #openstack-nova		22:50
*** tyreymer has joined #openstack-nova		23:05
*** xek has quit IRC		23:14
*** markvoelker has joined #openstack-nova		23:25
*** markvoelker has quit IRC		23:30
*** tyreymer has quit IRC		23:41
