*** dasm|afk is now known as dasm|off | 00:14 | |
*** hemna0 is now known as hemna | 02:28 | |
*** hemna9 is now known as hemna | 02:45 | |
*** clarkb is now known as Guest2790 | 03:17 | |
*** bhagyashris is now known as bhagyashris|PTO | 05:43 | |
opendevreview | Stephen Finucane proposed openstack/os-resource-classes master: setup: Update Python testing classifiers https://review.opendev.org/c/openstack/os-resource-classes/+/834643 | 10:17 |
opendevreview | Stephen Finucane proposed openstack/os-resource-classes master: setup: Replace dashes with underscores, add links https://review.opendev.org/c/openstack/os-resource-classes/+/834644 | 10:17 |
*** sfinucan is now known as stephenfin | 10:18 | |
zigo | Is there a way to evacuate a host that has 3 VMs that have affinity? Can I somehow tell nova "migrate them together" ? | 10:51 |
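[editor's note: the question goes unanswered in this log; nova has no single "migrate the whole group at once" call, so what follows is only a hedged sketch of the usual per-instance approach. Host and group names are placeholders, and flag spellings vary with client version (older openstackclient uses "server migrate --live <host>").]
    # stop the scheduler from placing new instances on the source host
    $ openstack compute service set --disable compute-01 nova-compute
    # see which members of the affinity group have to move together
    $ openstack server group show my-affinity-group
    # live-migrate each member, forcing a common destination so the
    # affinity policy can still be satisfied
    $ openstack server migrate --live-migration --host compute-02 <server-uuid>
    # or let novaclient iterate over every instance on the host
    $ nova host-evacuate-live compute-01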
*** prometheanfire is now known as Guest0 | 11:48 | |
*** ChanServ changes topic to "This channel is for Nova development. For support of Nova deployments, please use #openstack" | 11:55 | |
*** osmanlicilegi is now known as Guest2 | 11:59 | |
opendevreview | anguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration https://review.opendev.org/c/openstack/nova/+/834677 | 12:41 |
opendevreview | anguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration https://review.opendev.org/c/openstack/nova/+/834677 | 12:47 |
stephenfin | sean-k-mooney: This isn't hugely important, but could you look at https://review.opendev.org/c/openstack/nova/+/723572/ and https://review.opendev.org/c/openstack/nova/+/723573/ today? | 12:50 |
opendevreview | anguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration https://review.opendev.org/c/openstack/nova/+/834677 | 12:54 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Don't use generic 'Field' container https://review.opendev.org/c/openstack/nova/+/738239 | 12:58 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Remove unnecessary type aliases, exceptions https://review.opendev.org/c/openstack/nova/+/738240 | 12:58 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Use imports instead of type aliases https://review.opendev.org/c/openstack/nova/+/738018 | 12:58 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Remove wrappers around ovo mixins https://review.opendev.org/c/openstack/nova/+/738019 | 12:58 |
opendevreview | Stephen Finucane proposed openstack/nova master: WIP: add ovo-mypy-plugin to type hinting o.vos https://review.opendev.org/c/openstack/nova/+/758851 | 12:58 |
sean-k-mooney | stephenfin: sure ill take a look at them now while i have context on this they look reasonably short and i see gmann has already reviewed them | 13:00 |
sean-k-mooney | getting rid of the dict compat layer has been long overdue | 13:00 |
sean-k-mooney | it would be nice not to have to review for new usages of them as a dict | 13:00 |
opendevreview | Stephen Finucane proposed openstack/nova master: doc: Remove useless contributor/api-2 doc https://review.opendev.org/c/openstack/nova/+/828599 | 13:02 |
EugenMayer | When deploying via terraform and changing a flavor (thus replacing it), it seems the old flavor was removed but not yet removed from the instance it had been assigned to, and then it all failed. Now I'm stuck with Unable to retrieve instance size information. Details Flavor 384bc436-a0cb-4e4a-80d1-26dd03743061 could not be found. (HTTP 404) | 13:54 |
EugenMayer | (Request-ID: req-7c68445d-a8b5-4ef6-a11d-6f037402d92a) - so basically one of my instances references a flavor that no longer exists. Is there a way to somehow fix this? | 13:54 |
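[editor's note: since the flavor is embedded in the instance record (instance_extra.flavor) from Newton onwards, the 404 above typically comes from a client such as Horizon looking the flavor up by id, not from nova itself. A hedged way to confirm what the instance still carries, assuming admin credentials and the default cell database name "nova"; the UUID would be the one from the log.]
    # with microversion 2.47+ the embedded flavor details are returned inline
    $ openstack --os-compute-api-version 2.47 server show <instance-uuid> -c flavor
    # read-only inspection of the copy cached in the cell database
    $ mysql nova -e "SELECT flavor FROM instance_extra WHERE instance_uuid='<instance-uuid>'\G"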
*** dasm|off is now known as dasm | 14:20 | |
artom | Anyone able to run functional tests on ussuri? | 14:48 |
artom | Trying to figure out if it's something local to me, or more widespread | 14:48 |
artom | Seems to be hanging/timing out on: | 14:48 |
artom | functional installdeps: -chttps://releases.openstack.org/constraints/upper/ussuri, -r/home/artom/src/nova/requirements.txt, -r/home/artom/src/nova/test-requirements.txt, openstack-placement>=1.0.0 | 14:48 |
* artom strace's | 14:49 | |
sean-k-mooney | i can try it one sec | 14:50 |
artom | Seems to be doing... something? | 14:50 |
artom | Looping on https://paste.opendev.org/show/b45jbgPA429f5iKFJSEq/ | 14:50 |
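[editor's note: for readers following along, this is roughly how the probing above is done - find the pip process that tox spawned for installdeps and attach strace to it to see whether it is making progress. The env name "functional" is the one used in this log.]
    $ pgrep -af pip                 # locate the pip run started by tox installdeps
    $ strace -f -p <pid>            # attach and watch which syscalls it is looping on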
sean-k-mooney | looks like we are missing a fixture | 14:52 |
sean-k-mooney | from that trace | 14:52 |
sean-k-mooney | we should not be doing ioctl calls in general | 14:52 |
sean-k-mooney | like that implies we are doing file io or network configuration | 14:53 |
sean-k-mooney | its running fine for me | 14:56 |
sean-k-mooney | were you having a failing test? | 14:56 |
sean-k-mooney | or just would not install | 14:56 |
sean-k-mooney | i did locally change psycopg2 to psycopg2-binary in my test-requirements.txt but that is just because i don't have or want postgres installed on my laptop | 14:57 |
sean-k-mooney | so i don't have the headers to build psycopg2 from source | 14:58 |
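[editor's note: a minimal sketch of the local-only tweak described above, assuming an ussuri nova checkout; it just avoids needing the postgres headers and should not be committed.]
    # swap the source build for the pre-built wheel, locally only
    $ sed -i 's/^psycopg2\b/psycopg2-binary/' test-requirements.txt
    # recreate the tox env so the changed requirement is picked up
    $ tox -r -e functional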
artom | So there is a backport in progress | 15:01 |
* artom tries on a pristine ussuri rpeo | 15:01 | |
artom | *repo | 15:01 |
*** Guest2790 is now known as clarkb | 15:01 | |
artom | But... it's not running any tests (yet), it's on installdeps... | 15:01 |
sean-k-mooney | got the gerrit link? i can try that explicitly if you want | 15:02 |
artom | sean-k-mooney, only local for now | 15:03 |
artom | Backporting https://review.opendev.org/c/openstack/nova/+/796907/2/nova/tests/functional/libvirt/test_pci_sriov_servers.py#73 to ussuri | 15:04 |
sean-k-mooney | i had one failure | 15:08 |
sean-k-mooney | FileNotFoundError: [Errno 2] No such file or directory: 'openssl' | 15:08 |
sean-k-mooney | which is likely just down to the fact i'm running this on nixos | 15:08 |
artom | Seems to be the same problem with a pristine ussuri... | 15:09 |
artom | I should try on Ubuntu I guess? | 15:10 |
artom | Although func tests should be platform-independent | 15:10 |
sean-k-mooney | having run them on macos last night, not as much as you would think | 15:15 |
sean-k-mooney | we have a bunch that fail because they detect it's not linux | 15:15 |
sean-k-mooney | maybe pass -r | 15:15 |
sean-k-mooney | or delete the .tox dir | 15:16 |
sean-k-mooney | in case you have some leftover issue from a previous run | 15:16 |
artom | Yep, tried with -r, same | 15:17 |
sean-k-mooney | odd, what distro are you currently using? | 15:17 |
sean-k-mooney | i can try on ubuntu if you like, i also have a centos 9 vm | 15:17 |
artom | F35 | 15:18 |
bauzas | reminder : nova meeting in 41 mins here at #openstack-nova | 15:19 |
bauzas | fwiw, DST is not impacting our meeting, as we use UTC | 15:19 |
clarkb | artom: sean-k-mooney: pip installs taking forever likely indicates a dependency resolver problem | 15:20 |
clarkb | we've seen that happen when the solver can't find a valid answer. However constraints tends to fix that and you supply constraints so maybe not that | 15:20 |
sean-k-mooney | clarkb: i dont think it was the resolver | 15:20 |
sean-k-mooney | clarkb: i think artom is getting stack traces | 15:20 |
artom | sean-k-mooney, no, just spinning in the void | 15:21 |
sean-k-mooney | oh have you added -v | 15:21 |
artom | The paste was a `strace -p` output | 15:21 |
sean-k-mooney | so you can see what's actually happening | 15:21 |
artom | sean-k-mooney, *facepalm* lemme try that | 15:23 |
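[editor's note: the verbosity being suggested, as a hedged example - exact flags and output depend on the tox and pip versions in use, and the env directory name ".tox/functional" is an assumption.]
    # -vv makes tox print the underlying pip output during installdeps
    $ tox -vv -r -e functional
    # or run pip verbosely inside the already-created env
    $ .tox/functional/bin/pip install -v \
        -c https://releases.openstack.org/constraints/upper/ussuri \
        -r requirements.txt -r test-requirements.txt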
sean-k-mooney | artom: by the way, f35 has a much newer gcc, libffi and kernel than ussuri was developed with, so that strace was probably referring to ffi as part of compiling some of the C python modules, so there might be issues with trying to install ussuri on f35 to run the func tests | 15:35 |
zigo | I just noticed that if a host is over its CPU ratio (because it has been reduced), then live-migrations are silently failing (only the scheduler gives a clue). Is this known? Is this considered a bug? Should I file the bug? | 15:42 |
zigo | The workaround is obviously to temporarily raise the CPU overcommit ratio, but that's still kind of annoying to do. | 15:43 |
sean-k-mooney | zigo: yes it's a known issue | 15:43 |
sean-k-mooney | it has to do with how placement currently validates allocation candidates | 15:43 |
sean-k-mooney | if its the issue i think it is | 15:43 |
zigo | Thanks. | 15:45 |
sean-k-mooney | if i remember correctly it also affects evacuate | 15:48 |
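[editor's note: a hedged sketch of the temporary workaround mentioned above - raising the VCPU allocation ratio on the over-committed source host's resource provider with osc-placement, then lowering it again after the evacuation/live-migration. The provider name, totals and ratio are placeholders, and flag names can differ between osc-placement releases; note also that nova-compute may overwrite the inventory on its next periodic update unless the matching cpu_allocation_ratio option is set in nova.conf.]
    # find the resource provider backing the source host
    $ openstack resource provider list --name compute-01
    # check the current VCPU inventory (total, reserved, allocation_ratio)
    $ openstack resource provider inventory list <rp-uuid>
    # temporarily raise the ratio so the existing allocations fit again
    $ openstack resource provider inventory class set <rp-uuid> VCPU --total 64 --allocation_ratio 8.0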
bauzas | last reminder : nova meeting in 9 mins | 15:51 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Mar 22 16:00:16 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | hey ho | 16:00 |
elodilles | o/ | 16:00 |
chateaulav | \o | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:00 |
gmann | o/ | 16:00 |
dansmith | o/ | 16:00 |
artom | ~o~ | 16:01 |
bauzas | ok, let's start | 16:01 |
bauzas | #topic Bugs (stuck/critical) | 16:01 |
bauzas | #info No Critical bug | 16:01 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 28 new untriaged bugs (+0 since the last meeting) | 16:01 |
bauzas | #help Nova bug triage help is appreciated https://wiki.openstack.org/wiki/Nova/BugTriage | 16:01 |
bauzas | #link https://storyboard.openstack.org/#!/project/openstack/placement 26 open stories (0 since the last meeting) in Storyboard for Placement | 16:01 |
bauzas | any bug in particular to discuss ? | 16:02 |
bauzas | I triaged a few of them but I need to create some env for verifying some others | 16:02 |
bauzas | ok, looks not | 16:03 |
bauzas | next, | 16:03 |
bauzas | #topic Gate status | 16:03 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:03 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status | 16:03 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:03 |
bauzas | I haven't seen any new problem | 16:03 |
gmann | one update for centos9 stream volume detach failure | 16:04 |
gmann | it is fixed now as SSH-able series is merged #link https://review.opendev.org/q/(topic:bug/1960346+OR+topic:wait_until_sshable_pingable)+status:merged | 16:04 |
gmann | I have made centos9-stream as voting job in tempest gate | 16:04 |
bauzas | \o/ | 16:04 |
dansmith | gmann: really, that makes it all pass reliably? | 16:05 |
gmann | and proposed to be voting in devstack side too #link https://review.opendev.org/c/openstack/devstack/+/834546 | 16:05 |
gmann | dansmith: for now yes:) | 16:05 |
dansmith | cool | 16:05 |
dansmith | fips job in glance was still failing this morning I think, but I will look and see if it ran against that or not | 16:05 |
gmann | and we will monitor it carefully now as we made it voting. n-v jobs always gets ignored somehow | 16:05 |
dansmith | yeah cool | 16:05 |
artom | So I wonder, is there anything else at the guest:host interaction level that would explain why Ubuntu doesn't need to wait for SSHABLE? | 16:06 |
dansmith | artom: I'm super curious as well, as this seems like an odd thing to have changed with just newer libvirt/qemu, although certainly possible | 16:06 |
dansmith | we'll see if more weirdness comes out of running it in the full firehose | 16:06 |
gmann | dansmith: yeah, you can try with recheck. this patch fixed the last test #link https://review.opendev.org/c/openstack/tempest/+/831608 | 16:07 |
bauzas | agreed, it's weird but ok | 16:07 |
dansmith | as I was seeing other problems (on stream 8 mind you) when we were running it voting | 16:07 |
bauzas | thanks gmann btw. for having worked on it :) | 16:07 |
gmann | np!, just carried lyarwood work in this. | 16:07 |
bauzas | can we move ? | 16:07 |
gmann | yeah | 16:08 |
bauzas | kk | 16:08 |
bauzas | #topic Release Planning | 16:08 |
bauzas | shit | 16:08 |
bauzas | #topic Release Planning | 16:08 |
bauzas | #link https://releases.openstack.org/yoga/schedule.html#y-rc1 RC1 is past now | 16:08 |
bauzas | #link https://etherpad.opendev.org/p/nova-yoga-rc-potential Etherpad for RC tracking | 16:09 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=yoga-rc-potential RC potential tags | 16:09 |
bauzas | this is Regression chasing time ! | 16:09 |
bauzas | we only have 2 days to provide a RC2 if we find a regression | 16:09 |
bauzas | for the moment, we haven't seen any of them | 16:09 |
bauzas | #info RC2 deadline is in 2 days, so we can only fix regressions before | 16:10 |
bauzas | actually, this is RC-deadline | 16:10 |
bauzas | not really a specific RC2 | 16:10 |
bauzas | we could have a RC2 release tomorrow and then a RC2 on Thursday | 16:10 |
bauzas | shit, RC3 on Thurs | 16:10 |
* dansmith watches where he steps in here | 16:11 | |
bauzas | this is just, either we find regressions before Thursday and merge the fixes before then, or we would have a Yoga GA release with some known issue and could only fix the regression in a later stable release | 16:11 |
bauzas | but, as you can see https://bugs.launchpad.net/nova/+bugs?field.tag=yoga-rc-potential is empty | 16:12 |
bauzas | anyway | 16:12 |
bauzas | that's it for me | 16:13 |
bauzas | any question or discussion for Yoga before we go to the next topic ? | 16:13 |
bauzas | looks not | 16:14 |
bauzas | #topic PTG preparation | 16:14 |
bauzas | #link https://etherpad.opendev.org/p/nova-zed-ptg Nova Zed PTG etherpad | 16:14 |
bauzas | nothing to say, please provide your topics you would like to discuss | 16:15 |
bauzas | the PTG will be in 2 weeks, so I'd prefer to see all the topics before end of the next week | 16:16 |
bauzas | for the moment, we only have a few of them | 16:16 |
bauzas | anything to discuss about the PTG ? | 16:16 |
bauzas | reminder, PTG will be April 4 - 8, 2022 | 16:17 |
Uggla | bauzas, sorry for the noob question, will we review bp/specs for zed ? | 16:17 |
bauzas | Uggla: no worries, it's your first PTG | 16:18 |
Uggla | should we put the bp/specs in the agenda ? | 16:18 |
bauzas | Uggla: in general, we discuss specs when people have something they'd like the community to reach a consensus on | 16:18 |
bauzas | Uggla: we don't generally look at all the open specs | 16:18 |
bauzas | people can also come and discuss something they'd like to see or work on, without having a spec yes | 16:19 |
bauzas | yet* | 16:19 |
bauzas | Uggla: look at the Xena PTG we had so you'll see what we discussed https://etherpad.opendev.org/p/nova-xena-ptg | 16:19 |
Uggla | bauzas, I will have a look, thanks. | 16:20 |
bauzas | ok, moving on, then | 16:21 |
bauzas | #topic Review priorities | 16:21 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B1 | 16:21 |
artom | (No osc/sdk in there?) | 16:22 |
bauzas | I have seen new changes | 16:22 |
artom | (What with moving towards deprecation of the novaclient CLI) | 16:22 |
bauzas | artom: nope | 16:22 |
bauzas | artom: osc is another community but I understand your point | 16:23 |
bauzas | artom: this is just, this label is only supported for our repos | 16:23 |
artom | Ah, right | 16:24 |
bauzas | (AFAIK) | 16:24 |
sean-k-mooney | artom: we deprecated the novaclient cli already | 16:24 |
artom | Yeah, I wasn't sure | 16:24 |
bauzas | artom: but if you want us to look at OSC changes, we can do this by some etherpad | 16:24 |
sean-k-mooney | the python bindings are still allowed to be extended | 16:24 |
bauzas | artom: but you know what ? let's discuss this at the PTG to see how the nova community can review those OSC changes :) | 16:25 |
bauzas | artom: hopefully you'll provide a topic, right? | 16:25 |
bauzas | :) | 16:25 |
artom | Should've kept my fat mouth shut :P | 16:26 |
* artom will | 16:26 | |
bauzas | artom: :p | 16:26 |
bauzas | moving on | 16:26 |
bauzas | #topic Stable Branches | 16:26 |
bauzas | elodilles: your point | 16:26 |
elodilles | #info xena branch seems to be blocked by nova-tox-functional-centos8-py36 job - https://zuul.opendev.org/t/openstack/builds?job_name=nova-tox-functional-centos8-py36 | 16:26 |
elodilles | #info pike branch is blocked - fix: https://review.opendev.org/c/openstack/nova/+/833666 | 16:26 |
elodilles | and finally a reminder: | 16:27 |
elodilles | Victoria Extended Maintenance transition is due ~ in a month (2022-04-27) | 16:27 |
bauzas | wow, time flies | 16:27 |
elodilles | yes yes | 16:27 |
elodilles | that's it i think | 16:28 |
bauzas | elodilles: can we make the centos8 job non-voting ? | 16:28 |
elodilles | bauzas: that's an option | 16:28 |
bauzas | does someone already look at the issue ? | 16:28 |
elodilles | i had a quick look only | 16:28 |
artom | Seems to be spurious... | 16:29 |
bauzas | elodilles: ping me tomorrow morning and we'll jump onto it | 16:29 |
artom | The last few runs passed | 16:29 |
elodilles | it seems to be related to some mirror issue, but not sure | 16:29 |
bauzas | artom: not the stable/xena branch | 16:29 |
gmann | yeah seems mirror issue otherwise we can see same version conflict in other places also | 16:29 |
elodilles | bauzas: sure, thanks | 16:29 |
artom | ... then which? stephenfin has a fix up for the pike one, looks like... | 16:30 |
artom | So 'INFO: pip is looking at multiple versions of openstack-placement' is new, no? | 16:30 |
bauzas | for the pike branch, agreed on reviewing the fix | 16:30 |
artom | On my laptop, for stable/ussuri, it's taking forever | 16:30 |
gmann | elodilles: let's wait for a few more runs. | 16:30 |
bauzas | I don't want us to dig into the job resolution for now | 16:31 |
bauzas | but people can start looking at it after the meeting if they want | 16:31 |
elodilles | gmann: ack | 16:31 |
bauzas | this is just, I don't want this branch holding because of one single job | 16:31 |
bauzas | gmann: elodilles: I'd appreciate some DNM patches to make sure we don't hit this every change | 16:32 |
bauzas | looks we discuss all the thingies by now | 16:33 |
bauzas | discussed* | 16:33 |
*** Guest0 is now known as prometheanfire | 16:33 | |
bauzas | can we move ? | 16:33 |
gmann | did recheck on 828413, let's see | 16:33 |
bauzas | gmann: ++ | 16:33 |
elodilles | yes, thanks, let's move on | 16:34 |
bauzas | last topic then | 16:35 |
bauzas | #topic Open discussion | 16:35 |
bauzas | I have one | 16:35 |
bauzas | (bauzas) Upgrade our minimum service check https://review.opendev.org/c/openstack/nova/+/833440 | 16:35 |
bauzas | takashi kindly provided a change for bumping our min version support | 16:35 |
bauzas | before merging it, I'd like to make sure all people here agree on it | 16:36 |
dansmith | so one thing we might want to consider, | 16:36 |
bauzas | (that said, there is a grenade issue on its change, so even with +Wing it...) | 16:36 |
dansmith | is a PTG topic about the check (and the problems with it that we didn't foresee) to see if there's any better way we could or should be doing that whole thing | 16:36 |
dansmith | and just punt on the patch until we have that discussion | 16:36 |
bauzas | I already opened a PTG topic | 16:37 |
bauzas | I'll add the service check in it | 16:37 |
dansmith | okay | 16:37 |
bauzas | just done | 16:39 |
bauzas | people agree with this plan ? | 16:39 |
bauzas | either way, as said the change itself has grenade issues that need to be fixed | 16:39 |
bauzas | and I don't see any reason for rushing on it being merged | 16:39 |
bauzas | we have the whole zed timeframe for this | 16:39 |
elodilles | (grenade issue might be because devstack does not have yet stable/yoga) | 16:40 |
elodilles | (so that should be OK in 1 or 2 days) | 16:40 |
bauzas | we haven't released stable/yoga | 16:40 |
bauzas | this will be done on next Wed | 16:40 |
bauzas | elodilles: but yeah, sounds like it | 16:41 |
elodilles | ++ | 16:41 |
gmann | yeah, we should do that soon, neutron faced the same issue. | 16:41 |
gmann | elodilles: I will discuss in release channel | 16:41 |
elodilles | gmann: ack | 16:41 |
bauzas | ok, I guess we're done then | 16:43 |
artom | Oh, can we chat about https://review.opendev.org/c/openstack/nova/+/833453? | 16:43 |
bauzas | #agreed let's hold https://review.opendev.org/c/openstack/nova/+/833440 until we correctly discuss this at the PTG | 16:43 |
* bauzas clicks on artom's patch | 16:44 | |
artom | Really only bringing it up here because, as a periodic, we'd have to check up on the status, presumably here | 16:44 |
artom | Here == the meeting | 16:44 |
bauzas | artom: yeah, that's my point | 16:45 |
bauzas | we already do a few checks during the gate topic | 16:45 |
bauzas | but I wonder whether that wouldn't be better if we could agree on this at the PTG | 16:45 |
EugenMayer | is it possible to set the flavor of an instance manually using the api? | 16:46 |
EugenMayer | Oh - sorry. Still meeting time. Ignore me. | 16:46 |
artom | bauzas, doesn't seem controversial, but OK :) | 16:46 |
bauzas | artom: yup, I don't disagree | 16:47 |
bauzas | do people have concerns with adding a periodic check on whitebox ? | 16:47 |
artom | I guess the downside is CI resource usage, but... one nightly job seems OK? | 16:47 |
bauzas | I heard news of some CI resource shortage, but I'm not in the TC | 16:47 |
artom | Yet ;) | 16:48 |
bauzas | dansmith: gmann: can we just add a periodic job without being concerned ? | 16:48 |
artom | dansmith said someone is pulling out | 16:48 |
artom | (phrasing </archer>) | 16:48 |
dansmith | periodic is probably not a big deal I would imagine | 16:48 |
dansmith | I think we're going to need to trim down nova's per-patch jobs too, as it's getting pretty heavy | 16:48 |
bauzas | yeah, I don't think this is a big thing if we add a periodic | 16:49 |
bauzas | dansmith: adding a PTG topic about it fwiw | 16:49 |
gmann | yeah, and periodic also we can see if daily or weekly? | 16:49 |
bauzas | tbh, the only question is how often we'll check its status, and that will be weekly (during the team meeting) | 16:50 |
gmann | bauzas: artom: along with periodic, add it to the experimental pipeline too for manual trigger. that helps avoid adding it to the check/gate pipelines if anyone wants to run it manually | 16:51 |
artom | bauzas, yep, no point in making it daily if we're only checking the status weekly | 16:51 |
artom | gmann, ack, can do | 16:51 |
gmann | +1 | 16:51 |
dansmith | yeah daily seems excessive | 16:51 |
bauzas | artom: update this change with the weekly period time and mention in the commit msg we'll need to verify it during weekly meetings | 16:53 |
* artom will have to find example of periodic weekly to figure out the correct Zuul words magic | 16:53 | |
bauzas | look at the placement ones | 16:53 |
artom | Oh yeah! | 16:53 |
gmann | artom: https://github.com/openstack/placement/blob/master/.zuul.yaml#L64 | 16:53 |
gmann | yeah | 16:54 |
artom | Hah, that was easy | 16:54 |
bauzas | this is another pipeline IIRC | 16:54 |
sean-k-mooney | by the way i think weekly jobs in general suit us better as we can review them in the weekly meeting | 16:54 |
sean-k-mooney | if we have a nightly we probably won't look at it every day | 16:54 |
bauzas | oh yeah | 16:54 |
bauzas | I just hope this meeting won't transform into some CI meeting | 16:54 |
chateaulav | artom: nova zuul has an example of weekly periodic now | 16:54 |
bauzas | if we start adding more periodics | 16:55 |
artom | I mean, feel free to nack the idea entirely :) | 16:55 |
sean-k-mooney | bauzas: well it should just be (are they green? no? we should look at X after the meeting) | 16:55 |
artom | I'll obviously try to debate/convince you | 16:55 |
bauzas | artom: nah, I like the idea, I just want us to buy it | 16:55 |
artom | But if we think whitebox doesn't bring value to Nova CI, let's just not do it :) | 16:55 |
bauzas | we're approaching meeting's end time | 16:56 |
artom | End times are nigh | 16:56 |
bauzas | any other item to mention before we close ? | 16:56 |
sean-k-mooney | :) | 16:56 |
* artom gets raptured | 16:56 | |
sean-k-mooney | ah i actually had two blueprints i wanted to raise | 16:56 |
sean-k-mooney | we deferred updating the defaults for allocation ratios | 16:56 |
bauzas | sean-k-mooney: oh I forgot to mention I changed Launchpad to reflect zed as the active series | 16:57 |
sean-k-mooney | shall we proceed with that or discuss at ptg | 16:57 |
sean-k-mooney | also kashyap's blueprint for using the new libvirt apis | 16:57 |
bauzas | we're a bit short in time for reapproving specless bps by now | 16:57 |
sean-k-mooney | can we retarget both to zed | 16:57 |
sean-k-mooney | ack | 16:57 |
bauzas | but we can look at them during next meeting | 16:57 |
sean-k-mooney | we can discuss it next week or at the ptg | 16:57 |
bauzas | well, Zed is open | 16:58 |
bauzas | I'm OK with approving things by now | 16:58 |
bauzas | and the specs repo is ready | 16:58 |
bauzas | sean-k-mooney: just propose your two blueprints for the next meeting so we'll reapprove them (unless concerns of course) | 16:58 |
sean-k-mooney | ack | 16:59 |
bauzas | fwiw, I leave the non-implemented blueprints in Deferred state | 16:59 |
bauzas | once we start reapproving some, I'd change back their state | 17:00 |
bauzas | but anyway, we're on time | 17:00 |
bauzas | thanks all | 17:00 |
bauzas | #endmeeting | 17:00 |
opendevmeet | Meeting ended Tue Mar 22 17:00:16 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.html | 17:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.txt | 17:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.log.html | 17:00 |
elodilles | thanks bauzas o/ | 17:00 |
bauzas | was a productive meeting, after all | 17:00 |
EugenMayer | Is there any 'good way' to set the task-state of an instance that has been stuck in 'image backup' due to an issue in glance? so the field OS-EXT-STS:task_state is on "image_backup" | 17:03 |
*** tosky is now known as Guest38 | 17:04 | |
*** tosky_ is now known as tosky | 17:04 | |
EugenMayer | i see there is 'nova set --state' or 'nova reset-state' but both seem to operate on the instance power state (OS-EXT-STS:power_state) or OS-EXT-STS:vm_state - but not the task-state | 17:05 |
zigo | sean-k-mooney: Yeah, this was an evacuate operation. | 17:07 |
dansmith | zigo: I thought you said live migrate? | 17:07 |
sean-k-mooney | zigo: ok, the reason this breaks is that for evacuation we only have 1 allocation in placement against both hosts | 17:07 |
sean-k-mooney | and since the source host is over capacity because you reduced the allocation ratio, the entire allocation is considered invalid | 17:08 |
sean-k-mooney | we discussed this at the ptg 1 or 2 ptgs ago | 17:08 |
sean-k-mooney | i can't recall if we said we should fix this after consumer types, but i don't think we had a workaround other than temporarily increasing the allocation ratio so it's no longer overcommitted | 17:09 |
dansmith | sean-k-mooney: we could also solve it the way we do for cold migration, which is hold the allocation on the source with the migration uuid right? | 17:09 |
sean-k-mooney | dansmith: yes we could, that was one of the options | 17:10 |
sean-k-mooney | im trying to find the launchpad bug | 17:10 |
bauzas | dansmith: sean-k-mooney: yeah, the Migration uuid for evacuate seems the better and cleaner approach | 17:11 |
sean-k-mooney | bauzas: that is what we were proposing doing | 17:12 |
sean-k-mooney | but i dont think anyone has worked on it since | 17:12 |
bauzas | :-) | 17:13 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1943191 | 17:13 |
sean-k-mooney | that might be it | 17:13 |
EugenMayer | I'm looking at https://wiki.openstack.org/wiki/CrashUp/Recover_From_Nova_Uncontrolled_Operations to understand how to recover from the crashed task state 'image_backup' but I'm not sure how to actually act upon that. Should i use the nova api? | 17:13 |
sean-k-mooney | and https://bugs.launchpad.net/nova/+bug/1924123 | 17:14 |
bauzas | sean-k-mooney: some people expect bugs to be fixed automatically :) | 17:14 |
bauzas | we don't have yet AI bots smart enough to close the gaps | 17:14 |
sean-k-mooney | EugenMayer: the wiki is basically unmaintained | 17:14 |
EugenMayer | i see. Thank you | 17:15 |
sean-k-mooney | in the early days of openstack we used the wiki for specs and project-created docs (docs not written by the docs team) | 17:16 |
EugenMayer | I'm really not sure how to recover from the failed task the proper way. The only way I know so far, which is heavy-handed, is: reset the state, then restart the compute the vm is hosted on so the state is somewhat recovered | 17:16 |
sean-k-mooney | there is no way to recover from it really beyond that | 17:17 |
sean-k-mooney | we don't provide an api to allow tasks to be restarted | 17:17 |
dansmith | reset state and reboot the vm is what I'd try first, | 17:17 |
sean-k-mooney | yep same | 17:18 |
dansmith | not restarting the compute I'd hope | 17:18 |
sean-k-mooney | ya that normally should not be required | 17:18 |
sean-k-mooney | i guess it would depend on why it failed | 17:18 |
dansmith | definitely not expected for anything like a glance thing | 17:18 |
EugenMayer | trying that. AFAIR i had to restart the entire compute last time. Anyway, trying that | 17:18 |
sean-k-mooney | do you recall way? | 17:19 |
sean-k-mooney | *why | 17:19 |
EugenMayer | dansmith well this happens the 4th time. A stuck glance image backup task leaves the task_state of the instance in a broken state | 17:19 |
dansmith | honestly restarting the compute shouldn't even do anything, AFAIK | 17:19 |
sean-k-mooney | i wonder if the main thread of the compute agent was blocked on an io operation | 17:19 |
sean-k-mooney | that is the only thing i can think of that would be fixed by an agent restart | 17:20 |
sean-k-mooney | we were not using a thread pool for those on some of the older releases | 17:20 |
dansmith | sean-k-mooney: compute is the thing that "consumes" the task_state and turns it into a vm_state, so to speak, so maybe we clear task_state in init_host in some cases? | 17:20 |
EugenMayer | well i'm on xena, so not really old | 17:20 |
dansmith | but either way, reset_state to error is supposed to let you clear everything by enabling force reboot I think | 17:21 |
dansmith | or that's the intent | 17:21 |
sean-k-mooney | dansmith: i think we do yes but not sure about this case | 17:21 |
EugenMayer | dansmith it is clear, software-wise, that there is more than one misconception in the microservice and task call stack. I'm not sure if glance is required to call a webhook on success or error (not sure how the result is propagated) but this is simply not the right design. | 17:22 |
EugenMayer | should the task crash on glance, neither success nor error is ever called, and there seems to be nothing to recover from that | 17:22 |
dansmith | EugenMayer: none of that :) | 17:22 |
dansmith | everything is nova->glance | 17:22 |
sean-k-mooney | i believe this is a blocking call to do the upload to glance | 17:23 |
sean-k-mooney | if it were async then either nova would poll | 17:23 |
sean-k-mooney | or we would get an external event from glance | 17:23 |
dansmith | so depending on the failure, nova should clean up whatever it can.. an upload to glance for sure should be recoverable on our end, so that's likely it's own bug if we're missing something | 17:23 |
sean-k-mooney | but i think image upload is blocking | 17:23 |
dansmith | sean-k-mooney: none of that with glance | 17:23 |
sean-k-mooney | right we dont do polling or external event right | 17:24 |
sean-k-mooney | we just do two blocking calls | 17:24 |
EugenMayer | if it is a blocking task, well the blocking call should clean up - which it seems not to do | 17:24 |
sean-k-mooney | one for creating the image and the second for the data upload | 17:24 |
dansmith | EugenMayer: if you can repro the problem that's definitely a bug candidate | 17:24 |
sean-k-mooney | EugenMayer: yes it should clean up if we get an error from glance | 17:24 |
dansmith | there are some situations where it might not make sense to clean up, but I would think a glance thing would always be something we can handle | 17:24 |
EugenMayer | dansmith i can reproduce this the 4th time. If you tell me what to gather, i will grab the logs you need the 5th time - which will happen | 17:25 |
dansmith | EugenMayer: logs | 17:25 |
EugenMayer | which logs to get? | 17:25 |
dansmith | all of them? :) | 17:25 |
sean-k-mooney | dansmith: i would expect the vm to go back to active or error if we dont clean up right | 17:25 |
dansmith | nova-compute, nova-api at least | 17:25 |
dansmith | sean-k-mooney: error, yeah | 17:25 |
EugenMayer | vm is in active state, power is on, task_state is image_backup | 17:25 |
dansmith | that said, reset_state resets task_state so that should be the way to get out here | 17:26 |
sean-k-mooney | you can reset state to active | 17:26 |
EugenMayer | reset-state --active + reboot seems to recover just right. Also viewing the console works (which is one of the problems with a partial state recovery) | 17:26 |
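[editor's note: the recovery described above, in command form - reset-state is admin-only and only fixes nova's bookkeeping (vm_state/task_state), it does not touch the guest itself.]
    # clear the stuck task_state and set vm_state back to ACTIVE
    $ nova reset-state --active <instance-uuid>
    # openstackclient equivalent
    $ openstack server set --state active <instance-uuid>
    # then reboot the instance if it still needs a kick
    $ openstack server reboot --hard <instance-uuid>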
dansmith | EugenMayer: we're saying that what we would expect is vm_state=ERROR,task_state=None | 17:26 |
sean-k-mooney | rather than error, and potentially just trigger the backup/snapshot again | 17:26 |
EugenMayer | dansmith that never happened yet | 17:26 |
dansmith | EugenMayer: I know, I'm saying that's what we expect nova should be doing | 17:27 |
sean-k-mooney | EugenMayer: do you know why the glance operation is failing. | 17:27 |
sean-k-mooney | dansmith: i could see an argument to be made that we would have vm_state=Active task_state=None but the snapshot action marked as error in the server event log | 17:28 |
sean-k-mooney | if the vm was indeed still running properly, depending on how it failed | 17:28 |
EugenMayer | there is so much one can break right now. e.g. another topic is using terraform and rescaling a flavor. In 2 of 5 cases the following happens (i cannot tell you exactly): the old flavor is deleted (too early), the new one is created, then the instance is fetched; this fails since the flavor_id of the old flavor is still set and cannot be found. TF | 17:28 |
EugenMayer | cancels and that's it | 17:28 |
dansmith | sean-k-mooney: the problem is one of signaling, which is why we (originally as designed) went to error,None for everything and then you do a start (which does nothing) to reset back to active as sort of "ack" | 17:28 |
EugenMayer | stuck again - such that one now needs to shelve the instance and restore it from glance using the 'new flavor' | 17:29 |
EugenMayer | i did not yet check the tf openstack provider implementation to see what they have implemented and how that is a timing issue in the first place (since it does not happen every time) .. but if i look at the openstack rest api / nova api .. swapping flavors is not designed at all. | 17:30 |
sean-k-mooney | EugenMayer: well flavors are intended to be immutable so ideally you would not delete them until all instances using them are resized | 17:30 |
sean-k-mooney | we do cache the flavor | 17:30 |
sean-k-mooney | in the instance | 17:30 |
dansmith | EugenMayer: are you describing two issues or one? if the former, then let's not complicate diagnosing this one | 17:30 |
sean-k-mooney | but really you should try to avoid removing flavors or images that are in use | 17:30 |
EugenMayer | well i cannot tell why the tf openstack provider deletes the flavor too early or whatever happens in detail (i did not check the sequence in the code yet) | 17:31 |
sean-k-mooney | EugenMayer: it should not delete it at all | 17:31 |
EugenMayer | dansmith sorry, my bad. second issue (the latter one with the flav) | 17:31 |
sean-k-mooney | it sounds like they are implementing the hacky workflow that horizon used to have | 17:31 |
dansmith | EugenMayer: yeah, not helping :) | 17:31 |
EugenMayer | dansmith sorry. my bad. | 17:32 |
sean-k-mooney | where they allowed you to update a flavor by deleting and recreating it, but ya let's not talk about that issue now | 17:32 |
EugenMayer | well if you ask me about the error state - one should not mark the instance as 'error' if an image_backup task failed - there is no reason for that. Creating a glance image does not require the instance to shut down or similar; this said, i assume both the instance running and the creation of the image can work in parallel and are independent | 17:35 |
dansmith | EugenMayer: going to error state is just the nova convention (in most places) | 17:35 |
EugenMayer | so this said, if the image_backup task has failed, or the task no longer exists or whatever, nova should not block 'restarting the instance' | 17:36 |
dansmith | and if the issue wasn't critical, then a start operation will clear the error state without requiring a reboot of the actual instance | 17:36 |
EugenMayer | dansmith well it is the 'better safe than sorry' convention i guess | 17:36 |
dansmith | EugenMayer: we're agreeing with you that we do not expect that this is something that should be so jammed up and that there's probably some missing error handling in this case | 17:37 |
dansmith | I'm describing what the usual nova error procedure is, regarding going to error state to signal to the user that their thing didn't happen | 17:37 |
dansmith | it's not great, it's just the convention | 17:37 |
sean-k-mooney | EugenMayer: creating the glance image might require the instance to be shut down, by the way | 17:37 |
dansmith | because if you do a backup, and the instance goes to active, you assume it worked, but it didn't | 17:37 |
sean-k-mooney | snapshots are not guaranteed to be live | 17:37 |
dansmith | right | 17:38 |
EugenMayer | if nova is the task owner, which i understood is the case, it should implement a proper state machine for the task (which i understood is blocking via REST, so very fragile). A task could complete failed or succeeded. A task could also never complete, or even be deleted (on the glance side) | 17:38 |
sean-k-mooney | EugenMayer: there was an effort to do that at one point but this is also a distributed systems problem | 17:38 |
dansmith | EugenMayer: there's no task | 17:38 |
EugenMayer | understood, but i assume the sequence is: shutdown/sleep instance, create snapshot, start/resume instance, upload snapshot to glance .. (do task tracking) | 17:38 |
sean-k-mooney | EugenMayer: right, yes, but there may be cleanup to be done on the compute node or storage backend if the upload fails | 17:39 |
EugenMayer | no task means: it's blocking only. Understood, there is no task_id or anything, just a blocking http call. So as you both suggested, this blocking call needs to clean up in all cases: 200, 500 and also 408 and others. | 17:40 |
sean-k-mooney | such as deleting the file we created that was not uploaded | 17:40 |
dansmith | EugenMayer: we're saying exactly that.. we should, assuming we can | 17:41 |
sean-k-mooney | EugenMayer: yep and nova should check the response code and start cleaning up if it failed | 17:41 |
EugenMayer | then i have seen the glance image task under images, which i was able to delete, but since the blocking request disconnected long ago, no cleanup happened on the nova side | 17:41 |
dansmith | EugenMayer: there are cases that are more complicated, such as with ceph where we might not be able to recover at all, depending on what happened, but in general we agree | 17:41 |
EugenMayer | agreed | 17:41 |
sean-k-mooney | well recovery in ceph might be to squash/merge the ceph snapshot back into the previous volume for example | 17:42 |
dansmith | depends on the failure of course | 17:42 |
sean-k-mooney | whereas for qcow we would mirror the file on disk, then upload, and if it failed delete the copy | 17:42 |
sean-k-mooney | EugenMayer: if you have logs and/or a reproducer please file a bug and we can see if we can figure out why nova is not cleaning up as expected | 17:43 |
opendevreview | Stephen Finucane proposed openstack/nova master: mypy: Add nova.cmd, nova.conf, nova.console https://review.opendev.org/c/openstack/nova/+/705657 | 17:52 |
opendevreview | Stephen Finucane proposed openstack/nova master: mypy: Add type annotations to top-level modules https://review.opendev.org/c/openstack/nova/+/705658 | 17:52 |
opendevreview | Stephen Finucane proposed openstack/nova master: trivial: Clean manager.Manager, service.Service signatures https://review.opendev.org/c/openstack/nova/+/764806 | 17:52 |
EugenMayer | sean-k-mooney dansmith will do, thank you both for your time | 17:58 |
admin1 | hi all .. i am hitting this bug, https://bugs.launchpad.net/glance/+bug/1916482 , but don't have an idea on how to solve it .. i am using openstack-ansible and the latest tag 24.0.1 | 18:22 |
admin1 | nova is local disk, glance is rbd | 18:22 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Remove unnecessary type aliases, exceptions https://review.opendev.org/c/openstack/nova/+/738240 | 18:22 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Use imports instead of type aliases https://review.opendev.org/c/openstack/nova/+/738018 | 18:22 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Remove wrappers around ovo mixins https://review.opendev.org/c/openstack/nova/+/738019 | 18:22 |
opendevreview | Stephen Finucane proposed openstack/nova master: WIP: add ovo-mypy-plugin to type hinting o.vos https://review.opendev.org/c/openstack/nova/+/758851 | 18:22 |
opendevreview | Ghanshyam proposed openstack/nova stable/xena: DNM: testing centos8 py36 job https://review.opendev.org/c/openstack/nova/+/834765 | 18:36 |
opendevreview | Ghanshyam proposed openstack/nova stable/wallaby: DNM: testing centos8 py36 job https://review.opendev.org/c/openstack/nova/+/834721 | 18:38 |
*** dasm is now known as dasm|off | 22:18 |