gibi | bauzas, dansmith: nice findings. I'm OK with the privsep fix and you can ping me with the governor check fix when it is available | 07:57 |
---|---|---|
gibi | regarding asserting the use of the decorator during testing we can build a list of filesystem.write(path, data) calls that we know require the privsep decorator and then check in the test that when those calls happen the func has _ENTRYPOINT_ATTR set. | 08:00 |
gibi | *filesystem.write_sys | 08:04 |
songwenping | sean-k-mooney:hi, does live migration pass filters if assigned the host? | 08:09 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/873216 | 08:16 |
gibi | sean-k-mooney: This ^^ was already approved but needed a rebase and a unit test fix due to the base changes. Could you check it please? | 08:17 |
gibi | I did not wait for the author with the rebase and went ahead and fixed up the patch | 08:17 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/2023.1: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885343 | 08:22 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/zed: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885344 | 08:30 |
bauzas | gibi: sorry for the late reply, but thanks | 08:33 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/yoga: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885345 | 08:36 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/xena: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885347 | 08:45 |
songwenping | gibi:morning, does live migration pass filters if assigned the host? | 08:46 |
gibi | songwenping: it depends. See the doc of the host and the force option in https://docs.openstack.org/api-ref/compute/?expanded=live-migrate-server-os-migratelive-action-detail#id131 | 08:47 |
songwenping | we use the Rocky version, and nova-conductor does not find a new destination. | 08:53 |
songwenping | and there is a problem: if the vm has affinity, it can be migrated to another host. | 08:53 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/wallaby: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885348 | 08:57 |
songwenping | then if we use the same affinity strategy to create vms, these vms are scheduled to different hosts. gibi, is this reasonable? | 08:57 |
gibi | if you use the affinity strategy then you cannot move the VM. Except if you disable the scheduler via the force flag and an old enough microversion. But if you disable the scheduler then the affinity will not be honored. | 08:59 |
gibi | If you need both affinity and move operations then you should use soft-affinity | 08:59 |
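Per the compute API reference linked above, a live migration is requested with the `os-migrateLive` server action. A sketch of the request body (the host name is illustrative; `host` may be `null` to let the scheduler choose, and the `force` field only exists in microversions 2.30 to 2.67):

```python
# Sketch of an os-migrateLive request body per the compute API reference.
# "target-host" is illustrative; pass None as host to let the scheduler pick.
body = {
    "os-migrateLive": {
        "host": "target-host",
        "block_migration": "auto",  # requires microversion >= 2.25
    }
}
```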
opendevreview | Balazs Gibizer proposed openstack/nova stable/victoria: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885349 | 09:08 |
songwenping | gibi, got it, thanks^^ | 09:09 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: make governors to be optional https://review.opendev.org/c/openstack/nova/+/885352 | 09:57 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/ussuri: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885353 | 10:04 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/train: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885355 | 10:19 |
sean-k-mooney | gibi: it's working on https://github.com/openstack-k8s-operators/nova-operator/pull/400 by the way | 10:22 |
sean-k-mooney | which ran at 2023-06-05 14:50:20 | 10:23 |
sean-k-mooney | so it passed yesterday | 10:23 |
gibi | ack | 10:23 |
gibi | you oddly switched from downstream slack to upstream irc to write that though :) | 10:24 |
sean-k-mooney | oh right, downstream is normally on the bottom half of my screen and upstream is the top | 10:24 |
sean-k-mooney | this window is in the wrong place | 10:25 |
sean-k-mooney | fixed :) | 10:25 |
gibi | :) | 10:25 |
opendevreview | Gorka Eguileor proposed openstack/nova master: Libvirt: remove old discard with virtio log https://review.opendev.org/c/openstack/nova/+/885356 | 11:07 |
*** EugenMayer44 is now known as EugenMayer4 | 11:21 | |
dvo-plv | gibi,bauzas: Hello, Could you please review nova patch: https://review.opendev.org/c/openstack/nova/+/876075 | 11:41 |
bauzas | dvo-plv: sure, I already promised but unfortunately I needed to work on my presentation for the OpenInfra Summit :( | 11:41 |
dvo-plv | Sure, thank you, no rush; review according to your plan. I just wanted to remind you in case the request was lost | 11:44 |
bauzas | I'm really sorry folks but I forgot to tell that today is the spec review day | 12:52 |
bauzas | !!! | 12:52 |
opendevmeet | bauzas: Error: "!!" is not a valid command. | 12:52 |
opendevreview | Amit Uniyal proposed openstack/nova master: Reproducer for dangling bdms https://review.opendev.org/c/openstack/nova/+/881457 | 14:12 |
opendevreview | Amit Uniyal proposed openstack/nova master: Delete dangling bdms https://review.opendev.org/c/openstack/nova/+/882284 | 14:12 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: make governors to be optional https://review.opendev.org/c/openstack/nova/+/885352 | 14:23 |
dansmith | bauzas: are you going to update the reno for the first patch? | 14:27 |
mnederlof | hi, i've created this bp https://blueprints.launchpad.net/nova/+spec/rbd-allow-glance-image-deletion and the code change required, can someone help with the next steps for review? https://review.opendev.org/c/openstack/nova/+/884595 | 14:32 |
bauzas | dansmith: yeah, I'm just fixing the series | 14:33 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: fix the privsep issue when offlining the cpu https://review.opendev.org/c/openstack/nova/+/885293 | 14:37 |
opendevreview | Sylvain Bauza proposed openstack/nova master: cpu: make governors to be optional https://review.opendev.org/c/openstack/nova/+/885352 | 14:37 |
bauzas | dansmith: gibi: just updated the cpu fixes ^ | 14:38 |
bauzas | elodilles: can you help me ? wanted to propose the train-eol patch but looked at the docs and saw https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life | 14:53 |
bauzas | "point #2 : Remove any related zuul jobs that are defined in other repositories and not needed anymore." | 14:53 |
bauzas | wdym by that ? | 14:53 |
bauzas | like in tempest ? | 14:54 |
elodilles | bauzas: any job that nova uses in its .zuul.yaml, but defined outside of nova repository | 14:59 |
bauzas | I don't see any of them | 14:59 |
elodilles | for example if there is let's say nova-special-grenade-train defined in, for example, openstack/grenade repository | 15:00 |
elodilles | bauzas: if there is none, then you're done with that step ;) | 15:01 |
bauzas | I'll doublecheck with Gerrit | 15:01 |
bauzas | gibi: btw. I have an appointment around 20 mins after the start of the meeting, can you chair it ? | 15:02 |
gibi | bauzas: I can try but I'm probably not the best person today as I will be on a flaky connection at that time | 15:10 |
bauzas | ok, I can ask someone else, I just wonder who | 15:10 |
bauzas | elodilles: want to lead it ? | 15:10 |
elodilles | bauzas: i'm not feeling quite well, so i'd rather pass this time :/ | 15:15 |
bauzas | okok | 15:16 |
bauzas | so, we'll try to have a quick meeting then | 15:16 |
elodilles | +1 | 15:16 |
gibi | bauzas: then I will jump in after you need to leave but I don't promise I will not get disconnected at some point :) | 15:18 |
dansmith | gibi: #chair me and I can recover it if you drop | 15:23 |
gibi | ack | 15:26 |
bauzas | dansmith: cool thanks for the offer | 15:38 |
bauzas | shit, my appointment just arrived | 15:50 |
bauzas | gibi: can you please lead it ? | 15:50 |
bauzas | the agenda is done https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 15:50 |
gibi | sure | 15:51 |
gibi | I will do | 15:51 |
gibi | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Jun 6 16:00:04 2023 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
gibi | #chair bauzas | 16:00 |
opendevmeet | Current chairs: bauzas gibi | 16:00 |
gibi | #chair dansmith | 16:00 |
opendevmeet | Current chairs: bauzas dansmith gibi | 16:00 |
dansmith | o/ | 16:00 |
auniyal | o/ | 16:00 |
elodilles | o/ | 16:00 |
gibi | bauzas has an appointment so I try to chair this | 16:01 |
gibi | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:01 |
gibi | #topic Bugs (stuck/critical) | 16:01 |
gibi | lets see | 16:01 |
gibi | #info No Critical bug | 16:01 |
gibi | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 18 new untriaged bugs (+3 since the last meeting) | 16:02 |
gibi | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:02 |
gibi | last week the baton was at bauzas | 16:02 |
gibi | so I'm not sure if we have any news from him now | 16:03 |
gibi | the next on the roster is me | 16:03 |
bauzas | I'll take it next week | 16:04 |
gibi | but I will be mostly away next week | 16:04 |
Uggla_ | o/ | 16:04 |
bauzas | I didn't have time to look at them this week | 16:04 |
bauzas | Ditto due to the summit | 16:04 |
gibi | so moving down the list the next on it is melwitt | 16:04 |
bauzas | But I can try to look at them | 16:04 |
bauzas | (sorry on my phone) | 16:04 |
gibi | melwitt: could you take the baton? | 16:05 |
auniyal | gibi, the week before last I looked into this bug: https://bugs.launchpad.net/nova/+bug/2018719, I could not reproduce it, so I added a comment to ask for more info | 16:05 |
gibi | auniyal: ack | 16:06 |
gibi | I guess logging in to the rescue image depends on the actual image so you are right | 16:06 |
gibi | I will ping melwitt later about the bug baton | 16:07 |
gibi | any other bugs we need to discuss? | 16:07 |
auniyal | nothing from my side, thanks | 16:08 |
gibi | #topic Gate status | 16:09 |
gibi | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:09 |
gibi | #link https://etherpad.opendev.org/p/nova-ci-failures | 16:10 |
dansmith | lots of little test failures lately which is making it challenging to get a clean result | 16:10 |
dansmith | but nothing outstanding as a super common thing to go tackle that I've seen | 16:10 |
gibi | I saw two different guest failures, in one case a disk IO error | 16:10 |
gibi | the other was probably some metadata error | 16:10 |
gibi | but I agree I did not see a pattern yet | 16:11 |
dansmith | I have seen some IO errors related to volumes yeah, but I don't know what that's coming from | 16:11 |
gibi | I don't see any new bug reported tagged with gate-failure. If I see a pattern in tomorrow's reject then I will file some | 16:12 |
gibi | s/reject/recheck/ | 16:13 |
bauzas | haven't seen any gate failure | 16:13 |
bauzas | (still otp) | 16:13 |
gibi | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement | 16:14 |
gibi | periodics look good | 16:14 |
gibi | any other gate issues to raise? | 16:15 |
gibi | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:15 |
dansmith | nothing from me | 16:15 |
bauzas | I'm back | 16:16 |
gibi | then the usual announcement | 16:16 |
gibi | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:16 |
dansmith | fwiw, | 16:16 |
dansmith | I think we're doing quite well on the blind recheck thing.. not that we shouldn't remind people, but we could probably un-ALL-CAPS-ify that now :D | 16:17 |
bauzas | gibi: want me to take the chair seat again? | 16:17 |
gibi | dansmith: cool. I'm OK to uncap it :) | 16:17 |
dansmith | it's been tracked in the TC meeting and | 16:17 |
gibi | bauzas: the chair is yours :) | 16:17 |
dansmith | we seem to be settling around pretty good behavior | 16:17 |
bauzas | dansmith: lol, I'll change it :) | 16:17 |
bauzas | #topic Release Planning | 16:19 |
bauzas | #link https://releases.openstack.org/bobcat/schedule.html | 16:19 |
bauzas | #info Nova deadlines are set in the above schedule | 16:19 |
bauzas | #info Nova spec review day today | 16:19 |
bauzas | as a reminder ^ | 16:19 |
gibi | I completely missed that :/ | 16:19 |
bauzas | tbh, I wasn't able to do my duty but I'll do this later tonight | 16:19 |
bauzas | (some internal discussion ate my whole afternoon) | 16:20 |
bauzas | so, yeah, would be nice | 16:20 |
gibi | I saw that there is a spec proposal for continuing the PCI in placement work | 16:20 |
bauzas | nothing to tell apart from this | 16:20 |
bauzas | gibi: indeed, someone proposed | 16:20 |
gibi | I need to review that | 16:20 |
bauzas | cool | 16:20 |
gibi | but others can chime in there too :) | 16:20 |
bauzas | as a reminder, if folks don't have time to review specs today, that's fine (c) | 16:20 |
bauzas | but please try to look at them this week | 16:21 |
gibi | there is always tomorrow :) | 16:21 |
bauzas | at least before the Summit in case people discuss there | 16:21 |
bauzas | anyway, good related point, | 16:21 |
bauzas | #topic pPTG Planning | 16:21 |
bauzas | #info please add your topics and names to the etherpad https://etherpad.opendev.org/p/vancouver-june2023-nova | 16:21 |
bauzas | crickets in there ^ | 16:21 |
bauzas | so I'll write an -discuss ML thread for this | 16:22 |
gibi | nah, I added one thing now :D | 16:22 |
bauzas | in case ops or devs want to discuss with us | 16:22 |
bauzas | hehe | 16:22 |
bauzas | + I'll tell ops during our forum meet&greet about our PTG | 16:23 |
bauzas | #info The table #24 is booked for the whole two days. See the Nova community there! | 16:23 |
bauzas | that's it | 16:23 |
bauzas | moving on | 16:23 |
bauzas | #topic Review priorities | 16:23 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:23 |
bauzas | #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review | 16:24 |
bauzas | #topic Stable Branches | 16:24 |
bauzas | elodilles is maybe afk | 16:24 |
bauzas | so lemme add his points | 16:24 |
bauzas | #info stable gates should be OK (from stable/2023.1 to stable/train) | 16:24 |
bauzas | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:24 |
bauzas | huzzah for this | 16:24 |
bauzas | and my point | 16:24 |
bauzas | #info train-eol patch proposed https://review.opendev.org/c/openstack/releases/+/885365 | 16:24 |
bauzas | I'd appreciate it if nova-cores could comment on it ^ | 16:25 |
dansmith | I will | 16:25 |
* gibi just proposed a train backport today | 16:25 | |
dansmith | I noticed the cinder people are taking a more aggressive approach | 16:25 |
bauzas | gibi: okay, then -1 my patch and I'll modify it to wait for your backport to merge | 16:25 |
bauzas | dansmith: yup, saw it too | 16:25 |
gibi | bauzas: I think I'm OK to drop that backport | 16:25 |
gibi | I did that to see if the patch works | 16:26 |
bauzas | gibi: as you want, just tell me your insights in the train-eol patch | 16:26 |
gibi | ack | 16:26 |
bauzas | dansmith: fwiw I'm afraid of EOLing the whole EM branches | 16:26 |
dansmith | shrug.. my argument for train applies to all the EM ones too | 16:27 |
dansmith | maybe we wait and see how it goes for cinder ;) | 16:27 |
bauzas | but since we haven't backported the os-brick CVE fix in Ussuri and Victoria, I could understand | 16:27 |
bauzas | but yeah, let's see what happens for cinder :D | 16:28 |
* bauzas takes his popcorn :) | 16:28 | |
bauzas | anyway, moving on | 16:28 |
bauzas | #topic Open discussion | 16:28 |
bauzas | none in the agenda | 16:28 |
bauzas | anything that someone wants to tell ? | 16:28 |
gibi | one thing | 16:30 |
bauzas | shoot | 16:30 |
gibi | there was a request for opinion about openstack on k8s | 16:30 |
gibi | let me find the link | 16:30 |
sean-k-mooney | its a thing that people do | 16:30 |
bauzas | okidoki, let's wait | 16:31 |
gibi | 13:01 < mdbooth> I'll be running a forum session on Kubernetes on OpenStack in Vancouver next week. It's for users and developers of all related projects to talk to each other. Etherpad is here if there's anything you'd like to discuss: https://etherpad.opendev.org/p/openinfra-2023-kubernetes-on-openstack | 16:32 |
sean-k-mooney | I'm wondering why mdbooth is running that session but ok | 16:32 |
dansmith | that's k8s on openstack not what you said right? | 16:32 |
gibi | sorry I mixed up | 16:32 |
sean-k-mooney | that's k8s on openstack | 16:32 |
sean-k-mooney | ya | 16:33 |
bauzas | ok, so lemme add the link | 16:33 |
sean-k-mooney | that makes more sense why mdbooth is involved | 16:33 |
bauzas | #link https://etherpad.opendev.org/p/openinfra-2023-kubernetes-on-openstack OpenInfra Forum session for discussing about k8s on openstack | 16:33 |
bauzas | gibi: the other way btw. :) | 16:33 |
gibi | there is some nova related question in the etherpad | 16:33 |
gibi | about getting notified if anything changed with servers | 16:34 |
gibi | I offered the nova notification inteface | 16:34 |
bauzas | yeah and it's a public API :) | 16:34 |
gibi | but apparently they want something public | 16:34 |
sean-k-mooney | the notifications are the only interface we have currently | 16:34 |
gibi | the API is documented but message bus access tends to be non-public | 16:34 |
bauzas | since someone worked on notifications objects like 6 years ago (guess who :p ) | 16:35 |
gibi | bauzas: hah :D | 16:35 |
bauzas | gibi: are you sure that the message bus can't be public ? | 16:35 |
dansmith | it shouldn't be | 16:35 |
gibi | yeah | 16:35 |
dansmith | we have instance events.. that's what they want I think | 16:35 |
sean-k-mooney | the notification bus is semi-privileged | 16:35 |
gibi | and our notifications tend to contain infra information | 16:35 |
dansmith | I think they just need an async way to get those | 16:35 |
sean-k-mooney | the notifications can have private info in them | 16:35 |
sean-k-mooney | depending on what you configure | 16:36 |
sean-k-mooney | like the bdms | 16:36 |
gibi | yeah, bottom line: the notification API is designed to be consumed by admins or other openstack services, not end users | 16:36 |
bauzas | ah they want it to be consumable by endusers ? | 16:36 |
dansmith | right, it would leak bad things between tenants for sure | 16:36 |
dansmith | not just infra things | 16:36 |
gibi | yeah | 16:36 |
bauzas | urgh | 16:37 |
sean-k-mooney | you could have a multi-tenant service that converts the notifications into a webhook callback or similar | 16:37 |
dansmith | instance events/actions is the right thing I think, it's just only polling currently | 16:37 |
sean-k-mooney | but that's still not great | 16:37 |
sean-k-mooney | dansmith: ya the event stream would work but I'm not sure that | 16:37 |
gibi | yeah, a websocket around instance actions would be nice | 16:37 |
sean-k-mooney | even if it was event based they would want to listen per instance | 16:37 |
sean-k-mooney | more like open a websocket and get all instance events for a project? | 16:37 |
dansmith | they probably want to be able to register a handler with a scope (one instance, all my instances) that lives for a period of time that we call when there's a new event | 16:38 |
sean-k-mooney | or that you are allowed to see based on the scope of the keystone token | 16:38 |
bauzas | I think everytime someone asks us to monitor some instance action, we tell them 'lookup the notifications' | 16:38 |
bauzas | but this is for admin usage | 16:38 |
dansmith | websocket will require a lot of standby resources that I think would be hard for us to manage | 16:38 |
gibi | true | 16:38 |
dansmith | anyway, | 16:38 |
bauzas | so, they want some enduser public subscription mechanism for asynchronously being notified on my instance state changes ? | 16:38 |
dansmith | not sure how many people will be there to make any sort of headway on that topic, since I think those people are likely here :) | 16:39 |
dansmith | bauzas: yeah | 16:39 |
bauzas | sounds like a client thing to me | 16:39 |
gibi | dansmith: I will be there hence collecting ideas here now :) | 16:39 |
sean-k-mooney | let's see if they can at least expand on the use cases | 16:39 |
dansmith | bauzas: a client can poll (or long poll) but that's much less efficient | 16:39 |
gibi | yeah I will pull out some specific use case and try to limit the scope to something very simple on our side | 16:39 |
dansmith | especially when instance actions could be days apart | 16:40 |
bauzas | but yeah, someone could provide some tool that would listen to the notification bus and scrub out all the admin-only data | 16:40 |
gibi | bauzas: exactly | 16:40 |
bauzas | sorry, by client I meant something unrelated to nova | 16:40 |
sean-k-mooney | so the way to do this in the past was ceilometer put the relevant events in AODH | 16:40 |
dansmith | they could, but that's basically re-constructing the tenant isolation that nova already has, so it's a big new surface to secure and new services to run | 16:40 |
sean-k-mooney | and then you would set up alarms on the events you cared about | 16:40 |
dansmith | sean-k-mooney: that's all intended to be operator-focused, not for users to get status/events on their instances right? | 16:41 |
sean-k-mooney | no | 16:41 |
gibi | if they only need a trigger to re-read the instance action API then most of the data can be hidden from our notifications | 16:41 |
sean-k-mooney | aodh and ceilometer provided user facing events/metrics | 16:41 |
dansmith | okay I didn't realize | 16:41 |
bauzas | yeah | 16:41 |
sean-k-mooney | they didn't actually expose the full notification | 16:41 |
bauzas | ceilometer was the fit | 16:42 |
sean-k-mooney | just instance boot started and instance boot finished events | 16:42 |
bauzas | and that's why we never had this in nova | 16:42 |
dansmith | honestly, I feel like this is probably something nova can/should be doing | 16:42 |
dansmith | nowadays this is how stuff plugs together | 16:42 |
sean-k-mooney | ya I think it's something we could do | 16:42 |
sean-k-mooney | but we need to think about how | 16:42 |
dansmith | making an external tool reconstruct what we already know is kinda :/ | 16:42 |
gibi | yeah | 16:43 |
dansmith | it could be a service like console that you run if you want, and scale separately to handle the amount of load you want to tolerate | 16:43 |
bauzas | dansmith: I'm still struggling to find how we would ensure the tenancy isolation by the message bus, but I'm open to ideas | 16:43 |
bauzas | unless we create a bus per tenant | 16:43 |
sean-k-mooney | if its in nova | 16:43 |
dansmith | bauzas: we wouldn't? | 16:43 |
sean-k-mooney | we can just filter | 16:43 |
bauzas | I'm maybe misunderstanding the proposal, but I thought we were saying that we may emit project-related notifications | 16:44 |
dansmith | not at the rabbit level | 16:44 |
dansmith | let's let gibi collect some data, | 16:44 |
dansmith | and then we probably need a high-bandwidth conversation about options | 16:44 |
gibi | bauzas: we def need to understand their use case better | 16:45 |
bauzas | dansmith: ah ok | 16:45 |
bauzas | dansmith: then we need to construct some HTTP/2 layer with keystone auth | 16:45 |
bauzas | or something like that | 16:45 |
dansmith | bauzas: not necessarily | 16:45 |
dansmith | it just depends.. but it should be HTTP-something, either an event stream or callbacks | 16:45 |
sean-k-mooney | you would do something like "openstack project event subscribe (instance.action.)*" which would return a websocket url that would only stream the relevant events for the current project based on the keystone token | 16:46 |
dansmith | yep, could be something like that | 16:46 |
bauzas | sounds very console-ish | 16:46 |
sean-k-mooney | so like the console, you would first create it and then get a handle for where to collect the data | 16:46 |
bauzas | but okay | 16:46 |
gibi | I can simplify it down to: give me a server uuid and I stream you data about notifications affecting the server, but only with very limited data provided | 16:46 |
dansmith | bauzas: exactly.. it's the same sort of arrangement, and the same target audience | 16:46 |
bauzas | ok, then it sounds we have an agreement on the direction, let's not overpaper the technical details | 16:47 |
sean-k-mooney | and if it's a separate binary, nova-event-proxy | 16:47 |
dansmith | gibi: yeah I just don't think you should need 100 websockets you have to read from if you have 100 instances in your wordpress deployment | 16:47 |
dansmith | sean-k-mooney: right | 16:47 |
sean-k-mooney | then its scalability and whether it's deployed is up to the operator | 16:48 |
dansmith | yup | 16:48 |
gibi | dansmith: ahh true, we can do it per project then | 16:48 |
bauzas | yeah, devil is in the details of the productization | 16:48 |
dansmith | gibi: or even server group | 16:48 |
sean-k-mooney | anyway let's see what they actually bring up | 16:48 |
bauzas | gibi: honestly, the granularity sounds per project to me | 16:48 |
sean-k-mooney | and see if this type of solution would work for them or not | 16:48 |
dansmith | well, nfv people are all one project in some cases, so that probably won't work for them | 16:48 |
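A toy sketch of the scoped filtering discussed above. This is purely hypothetical: the event fields and the idea of trimming payloads down to a bare "go re-read the instance-action API" trigger follow the discussion, not any existing nova interface:

```python
# Hypothetical sketch only: filter a notification stream down to one project's
# events, keeping just enough to act as a trigger to re-read the
# instance-action API. Field names here are illustrative, not a nova schema.
def filter_events(events, project_id):
    """Yield project-scoped events with sensitive payload fields stripped."""
    for event in events:
        if event.get("project_id") == project_id:
            yield {"server_uuid": event["server_uuid"],
                   "event_type": event["event_type"]}

events = [
    {"project_id": "p1", "server_uuid": "uuid-1",
     "event_type": "instance.update", "payload": {"bdms": "admin-only"}},
    {"project_id": "p2", "server_uuid": "uuid-2",
     "event_type": "instance.update", "payload": {}},
]
visible = list(filter_events(events, "p1"))
```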
bauzas | gibi: are you done with this topic now that we drafted a solution for you ? :D | 16:49 |
gibi | I'm done | 16:49 |
bauzas | cool | 16:49 |
gibi | thanks for the discussion | 16:49 |
gibi | I will link this to the etherpad | 16:49 |
gibi | and I will report back from the summit | 16:49 |
bauzas | cool | 16:49 |
bauzas | I'll be back watching you at the Summit anyway | 16:49 |
bauzas | so if you promise too many things, I could yell :p | 16:49 |
gibi | bauzas: please do so | 16:49 |
gibi | :D | 16:49 |
gibi | I don't need another 3 years of "notification" work | 16:50 |
dansmith | heh | 16:50 |
dansmith | still got scars eh/ | 16:50 |
bauzas | ok, I was toying with the idea of doing the paperwork for the scaphandre and manila series, but I'm exhausted today | 16:50 |
bauzas | so, let's skip it and pretend it will be discussed in two weeks from now | 16:50 |
gibi | dansmith: time makes all these memories nicer and nicer actually | 16:51 |
dansmith | heh | 16:51 |
gibi | so bauzas has a good point watching me :) | 16:51 |
dansmith | gibi: https://www.youtube.com/watch?v=dLjNzwEULG8 | 16:51 |
bauzas | gibi: you're fortunate, canadians don't open carry | 16:51 |
bauzas | sorry, was a terrible joke :) | 16:52 |
gibi | dansmith: I need to check this out after the meeting :) | 16:52 |
bauzas | anyway, I think we're done for today | 16:52 |
gibi | indeed | 16:52 |
bauzas | thanks all | 16:52 |
bauzas | and thanks gibi for the chair | 16:52 |
bauzas | #endmeeting | 16:52 |
opendevmeet | Meeting ended Tue Jun 6 16:52:50 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:52 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.html | 16:52 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.txt | 16:52 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.log.html | 16:52 |
bauzas | gibi: there you go, you have your logs :p | 16:53 |
gibi | thanks | 16:54 |
gibi | and it was the moment when my connection dropped first | 16:54 |
gibi | so it was not that flaky after all | 16:54 |
dansmith | are we canceling the nova meeting next week? | 16:55 |
bauzas | dansmith: damn shit, forgot to tell it | 16:55 |
bauzas | unless people wanna run it, which I'm cool with | 16:55 |
geguileo | dansmith: sean-k-mooney last week I mentioned the os-brick idempotency and we talked about reconstructing the disk XML on some operations. I've just opened a bug (#2023078) where Nova is not rebuilding the disk XML after block migration. Don't know nova enough to know if this would be fixed with the changes to the os-brick idempotency thingy. | 16:56 |
sean-k-mooney | the xml for live migration is built on the source node based on info passed back from the destination host | 16:57 |
sean-k-mooney | for cold migration it's built on the dest host | 16:57 |
sean-k-mooney | I assume you are referring to live-migration with local block devices? | 16:57 |
geguileo | sean-k-mooney: the XML after live migration is wrong | 16:57 |
sean-k-mooney | that xml must be generated on the source host | 16:58 |
geguileo | sean-k-mooney: it doesn't have the right discard=unmap value even when the destination is saying that it is supported | 16:58 |
sean-k-mooney | is it enabled on the source? | 16:58 |
geguileo | sean-k-mooney: but if the source didn't support it and now it does? Then it cannot be supported until reboot? | 16:59 |
geguileo | Because then I think Cinder should start reporting that everything supports discard... | 16:59 |
sean-k-mooney | geguileo: correct | 16:59 |
sean-k-mooney | so this is not something that should be changing during a live migrate | 17:00 |
geguileo | ok, so then why do we even report the support for this thing? | 17:00 |
geguileo | We should always set discard=unmap in the XML | 17:00 |
sean-k-mooney | well we can't because our min libvirt/qemu did not support it | 17:01 |
geguileo | if it works, nice, if it doesn't it will not prevent it from working after a live migration | 17:01 |
sean-k-mooney | that may have changed recently but it was a limitation in the past | 17:01 |
opendevreview | sean mooney proposed openstack/nova master: Allow discard with virtio-blk https://review.opendev.org/c/openstack/nova/+/878795 | 17:01 |
geguileo | sean-k-mooney: but if it does now then maybe we have to give it another go at this whole thing | 17:01 |
sean-k-mooney | geguileo: without ^ we also don't support discard for all disk buses | 17:01 |
sean-k-mooney | geguileo: sure but we have to be careful to ensure that the move ops work properly | 17:02 |
geguileo | sean-k-mooney: that patch should be abandoned... I added a comment to the LP bug | 17:02 |
sean-k-mooney | that is not your patch | 17:02 |
sean-k-mooney | its my patch to fix the bug | 17:02 |
sean-k-mooney | that for some reason was not in launchpad | 17:02 |
geguileo | sean-k-mooney: how is it different to mine? | 17:02 |
geguileo | it's literally the same | 17:03 |
geguileo | and it doesn't work | 17:03 |
sean-k-mooney | it makes discard work for all disk buses | 17:03 |
geguileo | it just removes a debug log message | 17:03 |
geguileo | and the log is correct | 17:03 |
geguileo | discard doesn't work with virtio | 17:03 |
sean-k-mooney | yes it does | 17:03 |
geguileo | don't know why | 17:03 |
sean-k-mooney | it requires a min version of qemu and libvirt | 17:03 |
geguileo | I could only make it work with IDE, SCSI, and SATA | 17:04 |
sean-k-mooney | i can try and find the downstream bz for virtio-blk again one sec | 17:04 |
geguileo | sean-k-mooney: I have the downstream BZ | 17:04 |
geguileo | sean-k-mooney: I'm just telling you I can't make it work | 17:04 |
sean-k-mooney | for qemu support of virtio-blk | 17:04 |
geguileo | (maybe I'm dumb) | 17:04 |
sean-k-mooney | *trim with virtio-blk | 17:04 |
geguileo | sean-k-mooney: sure, it says it supports it... I can't make it work without using IDE, SCSI or SATA | 17:05 |
sean-k-mooney | well it worked in our ci | 17:05 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/879077/1 | 17:05 |
geguileo | sean-k-mooney: did you actually check that the size was reduced? | 17:05 |
sean-k-mooney | no, but if it's not, that's a qemu bug | 17:05 |
geguileo | sean-k-mooney: but then the log should remain | 17:05 |
sean-k-mooney | why would it not be correct | 17:06 |
geguileo | just because fstrim says it has freed space within the guest OS it doesn't mean that it has actually happened | 17:06 |
sean-k-mooney | sure there are several layers at play here | 17:07 |
geguileo | sean-k-mooney: oh, I'm sure there are many, but they are beyond my expertise | 17:07 |
geguileo | I'm just reporting as a storage guy saying, don't know why I can't make it work | 17:07 |
geguileo | lol | 17:07 |
sean-k-mooney | so you were using local storage | 17:07 |
sean-k-mooney | no cinder | 17:07 |
sean-k-mooney | booted a vm | 17:07 |
sean-k-mooney | allocated space | 17:08 |
geguileo | I was booting a VM from RBD, iSCSI, NFS | 17:08 |
sean-k-mooney | and then deleted it and did a trim | 17:08 |
geguileo | Then I did a live volume migration, which made the disk lose sparseness (became thick) | 17:08 |
geguileo | then I issued the "fstrim -v --all" | 17:08 |
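Whether a trim actually reclaimed backing space (the question raised here about fstrim's output) can be checked for file-backed disks by comparing apparent size with allocated blocks. A small self-contained sketch with an illustrative file, not the actual cinder volumes from the discussion:

```shell
# A sparse 100 MiB file: the apparent (logical) size and the
# allocated (on-disk) size differ, which is what "thin" means.
truncate -s 100M disk.img
apparent=$(stat -c %s disk.img)               # logical size in bytes
allocated=$(( $(stat -c %b disk.img) * 512 )) # 512-byte blocks on disk
echo "apparent=$apparent allocated=$allocated"
```

On a real volume you would run this (or `qemu-img info`) on the backing file before and after `fstrim` to see whether the allocated size actually dropped.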
sean-k-mooney | well i filed https://bugs.launchpad.net/nova/+bug/2013123 for local storage | 17:08 |
geguileo | sean-k-mooney: yeah, I replied in that LP bug | 17:09 |
geguileo | sean-k-mooney: I'm working on a sparseness document, because this is a CF | 17:09 |
geguileo | (including the cinder side) | 17:09 |
sean-k-mooney | well discard is not just about sparseness | 17:10 |
sean-k-mooney | but for what it's worth we do not make any statement about sparseness at the api level | 17:10 |
dansmith | AFAIK, with various file formats you can only expect the used size to decrease if you fully discard a block that covers an extent | 17:11 |
geguileo | sean-k-mooney: yeah, it's also about SSDs optimization, power consumption, etc | 17:11 |
dansmith | vmware, TMK, only actually reclaims space when the guest is shutdown | 17:11 |
geguileo | dansmith: true, but even then I can clearly see the reduction | 17:11 |
geguileo | I mean, the size goes down from 1GB to 100MB or so... | 17:12 |
sean-k-mooney | so for example i don't know if qcow or raw files will actually reduce space | 17:12 |
geguileo | sean-k-mooney: qcow2 does | 17:12 |
sean-k-mooney | for qcow i would expect it to be reduced, for raw probably not | 17:12 |
geguileo | if things are set correctly | 17:12 |
geguileo | (aka all the stars align) | 17:12 |
geguileo | and I can even make NFS/qcow2 and RBD preserve sparseness on live migration | 17:13 |
geguileo | changing the nova code | 17:13 |
sean-k-mooney | how did you change the nova code | 17:13 |
geguileo | to use the detect_zeroes feature | 17:13 |
geguileo | I created this LP bug for that one https://bugs.launchpad.net/nova/+bug/2023079 | 17:13 |
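The paste linked above is not reproduced in this log, but libvirt's `detect_zeroes` attribute on the disk `<driver>` element is what is being described. An illustrative fragment (file path invented; note libvirt requires `discard='unmap'` alongside `detect_zeroes='unmap'`):

```xml
<disk type='file' device='disk'>
  <!-- detect_zeroes='unmap' makes QEMU turn all-zero writes into
       discards, which preserves sparseness during a block copy -->
  <driver name='qemu' type='qcow2' detect_zeroes='unmap' discard='unmap'/>
  <source file='/var/lib/nova/instances/INSTANCE_UUID/disk'/>
  <target dev='vda' bus='virtio'/>
</disk>
```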
sean-k-mooney | ok then we can add that as a new feature if you can explain what it is | 17:13 |
sean-k-mooney | that is not a bug it would be a new feature | 17:14 |
sean-k-mooney | a small one but its still a feature | 17:14 |
geguileo | sean-k-mooney: https://paste.openstack.org/show/brFgX6MgBlxjgCrE3rbg/ | 17:14 |
sean-k-mooney | geguileo: what do you mean by volume in https://bugs.launchpad.net/nova/+bug/2023079 | 17:14 |
geguileo | sean-k-mooney: but that's not the right code | 17:14 |
geguileo | it's what I used to test | 17:14 |
sean-k-mooney | oh "When doing a live volume migration " | 17:15 |
geguileo | but that has CPU implications when running, so it's best only to change it when the block migration is going to happen | 17:15 |
sean-k-mooney | geguileo: we have a different concept called block-migration in the context of live migrating a vm | 17:15 |
dansmith | ...yeah | 17:15 |
geguileo | sean-k-mooney: yes, I mean block live migration, I've updated the bug name, thanks | 17:16 |
dansmith | I was going to say, we should never be block migrating a volume | 17:16 |
geguileo | sean-k-mooney: ooooh, then soooooooorry for mixing terms (/me facepalms) | 17:16 |
sean-k-mooney | geguileo: block live migration in nova means live migrating a vm with local raw/qcow storage | 17:16 |
dansmith | it means we literally move all the data | 17:16 |
dansmith | (all the disk data) | 17:16 |
geguileo | my bad, I've updated the LP bug | 17:17 |
geguileo | dansmith: we are currently moving ALL the data | 17:17 |
sean-k-mooney | geguileo: ya so now i know what you're trying to fix | 17:17 |
geguileo | that's why the detect_zeroes would be good for volume live migration | 17:17 |
sean-k-mooney | so we expect this to be done by cinder using the driver-assisted migration feature | 17:17 |
geguileo | sean-k-mooney: thanks for your patience in understanding my ramblings :-) | 17:17 |
sean-k-mooney | however you want nova to be intelligent enough | 17:17 |
sean-k-mooney | so that when we fall back to nova doing the volume migration | 17:18 |
sean-k-mooney | that we also preserve the sparseness | 17:18 |
geguileo | sean-k-mooney: problem is that driver-assisted migration cannot work between different backends (and afaik it doesn't work for any driver even between volumes of the same array) | 17:18 |
geguileo | sean-k-mooney: afaik today all online volume migrations are done by nova | 17:18 |
dansmith | wait what? | 17:18 |
geguileo | I don't think any cinder driver supports it | 17:18 |
opendevreview | Amit Uniyal proposed openstack/nova master: Reproducer for dangling bdms https://review.opendev.org/c/openstack/nova/+/881457 | 17:19 |
opendevreview | Amit Uniyal proposed openstack/nova master: Delete dangling bdms https://review.opendev.org/c/openstack/nova/+/882284 | 17:19 |
sean-k-mooney | geguileo: i was pretty sure you fixed at least one vendor driver last year | 17:19 |
dansmith | why would we ever want to do that? unless you're crossing AZs or something | 17:19 |
geguileo | dansmith: all volumes that are attached to nova are migrated by nova | 17:19 |
sean-k-mooney | geguileo: if you change backend right | 17:19 |
sean-k-mooney | not just retype | 17:19 |
sean-k-mooney | within the same backend | 17:19 |
dansmith | retype is a different thing right? | 17:19 |
sean-k-mooney | yes | 17:20 |
dansmith | I'm talking about you live migrate from one server to the next one in the rack, we should not be moving all the volume data to a *new* volume... | 17:20 |
geguileo | dansmith: retype is a different thing, but many times it triggers a migration | 17:20 |
geguileo | dansmith: oh, yeah, not a nova live migration | 17:20 |
geguileo | dansmith: it's a volume live migration | 17:20 |
sean-k-mooney | retypes can be within the same backend (just different qos policy) or to a different backend | 17:20 |
geguileo | basically when you mirror the data from one volume to another | 17:20 |
dansmith | okay, I guess we're confusing too many things | 17:21 |
geguileo | sean-k-mooney: correct! | 17:21 |
sean-k-mooney | so i remember a customer issue like 4-6 months ago where i thought we fixed scaleio or one of the other drivers to explicitly preserve sparseness when doing a driver-assisted volume migration | 17:21 |
geguileo | dansmith: what's the right term for moving data from one attached volume to another while the instance is running? | 17:21 |
dansmith | swap volume I think? | 17:22 |
geguileo | sean-k-mooney: I fixed it for offline and to report the value to Nova | 17:22 |
dansmith | that's the action we see, AFAIK | 17:22 |
sean-k-mooney | on the nova side its the swap volume is what is called yes | 17:22 |
geguileo | dansmith: ok, I'll try to talk about swap volume | 17:22 |
sean-k-mooney | well no, volume migration is fine | 17:22 |
dansmith | geguileo: not trying to make you use our language, I just need to know that a bunch of terms have been re-used :) | 17:23 |
sean-k-mooney | but you are asserting that the driver asseited path never works for an online volume migration | 17:23 |
geguileo | dansmith: oh, I prefer to use the right language | 17:23 |
geguileo | sean-k-mooney: I don't think we have any cinder driver capable of doing it... | 17:23 |
sean-k-mooney | i see... | 17:24 |
sean-k-mooney | that kind of sucks since you have to transit all the data via the compute node then | 17:24 |
dansmith | I just don't understand where that happens | 17:24 |
geguileo | maybe RBD can... | 17:24 |
dansmith | if the instance is running but nova is transferring the data between volumes.. where in nova is that happening? | 17:24 |
geguileo | sean-k-mooney: agreed, it sucks | 17:24 |
geguileo | dansmith: libvirt/QEMU supports that | 17:25 |
geguileo | by adding a mirror to the disk | 17:25 |
geguileo | and once the volumes are mirrored | 17:25 |
geguileo | nova removes the old volume | 17:25 |
dansmith | ah, okay and is that what we're poking via swap? | 17:25 |
geguileo | I think so | 17:25 |
geguileo | I believe that's the swap volume | 17:26 |
dansmith | okay I thought our swap was just "pause, change the connection, unpause" | 17:26 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#LL2241C16-L2241C16 | 17:26 |
sean-k-mooney | there i think | 17:26 |
dansmith | oh okay so it really is us blocking on that action, interesting | 17:27 |
geguileo | sean-k-mooney: sounds about right | 17:27 |
sean-k-mooney | i dug into this a few months ago but i have mostly purged that info | 17:27 |
dansmith | man, that sucks :) | 17:27 |
sean-k-mooney | its actully libvirt doing this | 17:27 |
sean-k-mooney | but there are a few flags we can pass | 17:27 |
geguileo | yeah, Nova just waits for the job to complete | 17:27 |
sean-k-mooney | to have it sparsify the zeros | 17:27 |
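The "nova just waits for the job to complete" flow around the driver.py code linked above can be sketched as follows. This is an illustrative simplification, not nova's actual helper: `dom` only mimics the shape of a libvirt-python domain, and the method names/signatures are deliberately reduced (real libvirt uses flag constants rather than a `pivot=` keyword):

```python
import time

def swap_volume(dom, disk, new_source_xml, poll_interval=0.01):
    """Sketch of the mirror-then-pivot flow driven during swap volume."""
    # Start mirroring writes from the old volume to the new one.
    dom.blockCopy(disk, new_source_xml)
    # Block until the copy job reports the volumes are in sync.
    while True:
        info = dom.blockJobInfo(disk)  # e.g. {'cur': ..., 'end': ...}
        if info and info['cur'] >= info['end']:
            break
        time.sleep(poll_interval)
    # Pivot the guest onto the new volume; the old one is then dropped.
    dom.blockJobAbort(disk, pivot=True)
```

The key point from the discussion is the blocking poll loop: the guest keeps running while QEMU mirrors the data, and only the final pivot switches the disk source.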
geguileo | yeah, it's the detect_zeroes option from https://paste.openstack.org/show/brFgX6MgBlxjgCrE3rbg/ | 17:28 |
geguileo | well, that's the brute force approach for my tests | 17:28 |
geguileo | because that enables them ALL the time | 17:28 |
geguileo | which sucks | 17:28 |
geguileo | but I was able to confirm that it works for NFS/qcow2 and RBD | 17:28 |
geguileo | doesn't work for SCSI devices though | 17:28 |
sean-k-mooney | well it depends on whether there is any performance overhead to it normally | 17:28 |
geguileo | sean-k-mooney: there is performance overhead | 17:29 |
dansmith | definitely | 17:29 |
geguileo | so I think it only makes sense when you are going to be reading the whole thing | 17:29 |
geguileo | and then you save on network + writes | 17:29 |
geguileo | so only enable it during the volume swap operation | 17:29 |
sean-k-mooney | well we can't really change this in response to a volume migration api request | 17:29 |
geguileo | then disable it | 17:29 |
dansmith | you're trading cpu for disk | 17:30 |
sean-k-mooney | even if we could i'm not sure if we should | 17:30 |
geguileo | dansmith: in volume swap, we are trading cpu for network + disk + time | 17:30 |
dansmith | yeah I meant if it's enabled all the time | 17:30 |
dansmith | and yeah, network too | 17:31 |
sean-k-mooney | well it would only have an effect on live migration and on the initial disk creation | 17:31 |
geguileo | dansmith: I don't think we should enable it all the time, because if the storage supports discard, then the space will be recovered by periodically calling fstrim like some OSes do | 17:31 |
dansmith | yeah | 17:31 |
geguileo | but I think this can greatly improve some volume swap cases | 17:31 |
sean-k-mooney | well discard is off by default | 17:31 |
sean-k-mooney | and currently only works if you use virtio-scsi which is not our default | 17:32 |
geguileo | sean-k-mooney: yeah, but that's something we have to improve on the cinder side | 17:32 |
sean-k-mooney | no i mean in the nova side | 17:32 |
geguileo | so we properly report the value | 17:32 |
sean-k-mooney | we have a config option to opt into allowing discard | 17:32 |
sean-k-mooney | and by default we dont | 17:32 |
geguileo | yeah, but if cinder reports it is supported then nova does the right thing | 17:32 |
geguileo | sean-k-mooney: really? | 17:32 |
geguileo | sean-k-mooney: which one? because I don't recall touching nova conf to enable that one | 17:33 |
geguileo | (maybe devstack does automatically) | 17:33 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.hw_disk_discard | 17:33 |
sean-k-mooney | so that controls if discard works for local disk at least | 17:34 |
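For reference, the option linked above lives in the `[libvirt]` section of nova.conf; a minimal fragment (it is unset by default, and the docs list `ignore` and `unmap` as the valid values):

```ini
[libvirt]
# Opt in to passing guest discard/unmap requests through for
# nova-managed local disks; whether it takes effect also depends
# on the disk bus (virtio-scsi, or sufficiently new virtio-blk).
hw_disk_discard = unmap
```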
geguileo | sean-k-mooney: oh, but for local, not cinder | 17:34 |
geguileo | good to know about that one | 17:34 |
sean-k-mooney | i was under the impression it had an effect for cinder too but im not sure | 17:34 |
sean-k-mooney | geguileo: so the reason the discard behavior came to my attention | 17:34 |
geguileo | sean-k-mooney: I don't think it does | 17:34 |
sean-k-mooney | was i wanted to make discard the default for nova | 17:34 |
sean-k-mooney | and found it broke virtio-blk | 17:34 |
geguileo | sean-k-mooney: that would be awesome!!! | 17:34 |
sean-k-mooney | so i fixed that | 17:34 |
geguileo | what did you fix? | 17:35 |
sean-k-mooney | i removed the check based on the qemu min version | 17:35 |
sean-k-mooney | and i could boot vms | 17:35 |
sean-k-mooney | so what i need to do is reproduce this locally again | 17:35 |
sean-k-mooney | and do some manual testing | 17:35 |
sean-k-mooney | to confirm if discard with qcow actually works | 17:36 |
geguileo | sean-k-mooney: LVM with LIO doesn't currently support trimming | 17:36 |
geguileo | it's one of the bugs I have a local fix for | 17:36 |
sean-k-mooney | ok but that won't affect things right | 17:36 |
sean-k-mooney | since that driver won't report discard support | 17:36 |
geguileo | so you may see fstrim telling you it has recovered space, but it hasn't really | 17:36 |
geguileo | sean-k-mooney: we can ask cinder drivers to report discard support | 17:36 |
sean-k-mooney | well again i don't care about the cinder case, i'm trying to fix discard support for non-cinder storage | 17:37 |
geguileo | using the report_discard_supported backend option | 17:37 |
geguileo | sean-k-mooney: I was digging into the discard case for cinder lol | 17:37 |
geguileo | and forgot to look into the ephemeral case | 17:37 |
sean-k-mooney | yep i know :) | 17:37 |
sean-k-mooney | ephemeral means something else in nova | 17:38 |
geguileo | sean-k-mooney: I'm doing a writeup on my findings, so I'll send you the link later so you can add yours as well | 17:38 |
sean-k-mooney | ack | 17:38 |
geguileo | sean-k-mooney: dansmith thank you both for your time :-) | 17:41 |
dansmith | same :) | 17:41 |
carloss | o/ bauzas - what are your thoughts on having a follow-up cross-project session between Nova and Manila in the PTG next week? we can use it to chat about gouthamr's specs | 18:08 |
opendevreview | Merged openstack/nova master: Add debug logging when Instance raises OrphanedObjectError https://review.opendev.org/c/openstack/nova/+/883325 | 20:07 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!