opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra spec for sound device. https://review.opendev.org/c/openstack/nova/+/926126 | 05:23 |
opendevreview | Michael Still proposed openstack/nova master: Protect older compute managers from sound model requests. https://review.opendev.org/c/openstack/nova/+/940770 | 05:23 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection. https://review.opendev.org/c/openstack/nova/+/927354 | 05:23 |
zigo | haleyb: Thanks. | 08:06 |
zigo | haleyb: Can you give me your email address and the one of Billy (wolsen) please ? | 08:06 |
zigo | haleyb: I would welcome you guys to join #debian-openstack, and #debian-openstack-commits, which is where I discuss OpenStack packaging. | 08:07 |
opendevreview | Arnaud Morin proposed openstack/nova master: Fix limit when instances are stuck in build_requests https://review.opendev.org/c/openstack/nova/+/947804 | 09:45 |
opendevreview | Elod Illes proposed openstack/osc-placement stable/2025.1: Add bindep.txt for ubunutu 24.04 support https://review.opendev.org/c/openstack/osc-placement/+/948867 | 12:06 |
gibi | sean-k-mooney[m] dansmith FYI: the first real complication with threading https://review.opendev.org/c/openstack/oslo.service/+/945720/comments/fb5d6632_f0eaf102 | 12:18 |
elodilles | hi stable maintainers, may i get a +2+W for this simple clean cherry pick that fixes osc-placement's stable/2025.1 gate? o:) https://review.opendev.org/c/openstack/osc-placement/+/948867 | 12:20 |
sean-k-mooney | gibi: that's not really related to threading | 12:21 |
sean-k-mooney | that's related to multiprocessing and how cotyledon handles workers | 12:22 |
gibi | sean-k-mooney: the python doc says do not mix threading with os.fork | 12:22 |
gibi | we do | 12:22 |
gibi | and I see the bad results of it | 12:22 |
sean-k-mooney | well we were not but we are now | 12:23 |
gibi | if I force os.spawn as suggested then I see the pickling error due to a lambda in oslo.config | 12:23 |
sean-k-mooney | so even without the eventlet removal work | 12:23 |
sean-k-mooney | if we enable the cotyledon backend it would fail in the same way | 12:24 |
sean-k-mooney | at least in the places where we spawn threads today, if they interacted with a thread pool | 12:25 |
sean-k-mooney | im not sure if we really should be using worker processes instead of worker threads | 12:26 |
gibi | we have a limited number of thread pools and those are started in the worker process, which I guess makes them a non-problem. Now that our default pool is also threaded (a real ThreadPoolExecutor), and that is started in the master process, os.fork becomes a problem | 12:27 |
gibi | I will see if the only pickling error is the oslo.config lambda, and if I can patch that out, but if not then we have a nice complication on our hands. | 12:28 |
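For illustration, a minimal sketch (not nova code) of the fork-vs-thread-pool hazard being described here: fork() copies a pool's bookkeeping (queues, locks, thread list) into the child, but not its worker threads, so work submitted in the child can hang forever.

```python
# Minimal sketch of the hazard, assuming nothing nova-specific: a pool
# warmed up in the parent is inherited by the fork()ed child, but its
# worker thread is not, so the child's submit() never gets serviced.
import os
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as PoolTimeout

pool = ThreadPoolExecutor(max_workers=1)
pool.submit(time.sleep, 0.1).result()  # warm the pool: one worker thread now exists

pid = os.fork()  # CPython 3.12+ even warns about fork() while threads are running
if pid == 0:
    # Child: the copied thread list still looks "full", so submit() does
    # not start a replacement worker and nothing drains the work queue.
    try:
        pool.submit(print, "never runs").result(timeout=5)
        os._exit(0)
    except PoolTimeout:
        os._exit(1)
os.waitpid(pid, 0)
```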
sean-k-mooney | well my point is i think this is perhaps a bug on oslo.service | 12:28 |
sean-k-mooney | since the new backend does not behave the same as the old ProcessLauncher | 12:29 |
gibi | it is not a bug it is a behavior of os.fork. The child inherits the parent's state | 12:29 |
sean-k-mooney | although maybe it is the same https://github.com/openstack/oslo.service/blob/master/oslo_service/backend/eventlet/service.py#L556 | 12:29 |
sean-k-mooney | gibi: sure but the fork is meant to happen before we ever create any of the executors | 12:30 |
gibi | nope | 12:30 |
gibi | let me link it to you... | 12:30 |
sean-k-mooney | the thread pools are not initialised until their first use, right? | 12:31 |
gibi | right | 12:31 |
gibi | this is the point of the fork https://github.com/openstack/nova/blob/a5bcaf69b1a80d4d02fe092900471a6e7a28e292/nova/cmd/scheduler.py#L51 | 12:31 |
sean-k-mooney | which happens after the fork if there is one | 12:31 |
gibi | but Service.create() is before that | 12:31 |
gibi | and that already depends on our executors and therefore initializes them | 12:31 |
sean-k-mooney | which threadpool is causing the error | 12:31 |
sean-k-mooney | is it the oslo one or one of the ones we are creating | 12:32 |
gibi | in the scheduler startup both the default and the scatter-gather pool are initialized before the fork by Service.create() | 12:32 |
sean-k-mooney | ok i would not expect either to be used at this point | 12:33 |
sean-k-mooney | the service create is going to hit cell0 | 12:33 |
sean-k-mooney | for the scheduler at least | 12:33 |
gibi | Service.create calls https://github.com/openstack/nova/blob/a5bcaf69b1a80d4d02fe092900471a6e7a28e292/nova/service.py#L258 which calls scatter-gather | 12:33 |
sean-k-mooney | ah i see | 12:34 |
gibi | the default pool is a bit more complicated but also used for the Scheduler's async init | 12:34 |
sean-k-mooney | ok so we have a few options to address that i guess. | 12:35 |
sean-k-mooney | we could move all this init logic later, into the workers | 12:35 |
sean-k-mooney | you mentioned something about not using os.fork and using spawn? | 12:35 |
gibi | sean-k-mooney: it depends on if we can kill the master from the worker in a consistent way. the raise_if_old_compute should stop the master process | 12:36 |
gibi | sean-k-mooney: we can force os.spawn | 12:36 |
gibi | but that hits a pickling error when it spawns the worker | 12:36 |
sean-k-mooney | do we have the equivalent of a thread join for processes | 12:36 |
gibi | the interface exists, I'm not sure about the semantics | 12:37 |
sean-k-mooney | i.e. can we have the master wait for all the child process to exit | 12:37 |
gibi | also current default behavior is that if the worker dies the master creates a new worker | 12:37 |
sean-k-mooney | i would not expect that but ok | 12:37 |
sean-k-mooney | i assume that's handled in oslo.service? | 12:38 |
gibi | either oslo.service or cotyledon | 12:38 |
gibi | one of those | 12:38 |
gibi | I saw the logic once so I can find it again if needed | 12:38 |
elodilles | (thanks gibi for the +2+W o/) | 12:38 |
gibi | but bottom line, in some cases we want to respawn the worker, but in other cases we want to kill the master | 12:39 |
gibi | so we need an active information flow from worker to master | 12:39 |
sean-k-mooney | i do not really expect either to take actions like that. if things die i expect that to propagate up and systemd or whatever process is managing nova would handle that | 12:39 |
gibi | to influence the logic, if we move the version check to the worker | 12:39 |
gibi | systemd handles the master process | 12:39 |
opendevreview | Arnaud Morin proposed openstack/nova master: Fix limit when instances are stuck in build_requests https://review.opendev.org/c/openstack/nova/+/947804 | 12:39 |
gibi | so if master dies the systemd restarts that | 12:40 |
gibi | but if a single worker dies, killing the rest of the workers and then letting the master die so that systemd restarts it seems dangerous without a graceful shutdown | 12:40 |
sean-k-mooney | right and i think the master process should just spawn the child processes and wait for them to exit. any recreation of the child processes i would expect to be handled in nova | 12:40 |
sean-k-mooney | its tricky however. | 12:41 |
sean-k-mooney | is this one of the known open issues for the eventlet remove | 12:41 |
sean-k-mooney | *removal | 12:41 |
gibi | I can a) try to patch oslo.config to support os.spawn or b) try to re-init the executors in the worker if I can detect that I'm a worker or c) move the version check to the worker and try to find a way to signal the master not to respawn but to exit | 12:43 |
gibi | I linked the issue to the #openstack-eventlet-removal as well so maybe Herve will have some ideas too | 12:44 |
sean-k-mooney | looking at the eventlet oslo service code | 12:44 |
sean-k-mooney | the respawn logic was hardcoded | 12:45 |
sean-k-mooney | so while i dont really expect that to happen it seems like it's how it has always worked | 12:45 |
stephenfin | dansmith: Could you clarify what we're trying to say about nova-metadata-wsgi under local/global deployments here? https://docs.openstack.org/nova/latest/admin/cells.html#nova-metadata-api-service | 12:45 |
sean-k-mooney | gibi: we do have another option | 12:45 |
sean-k-mooney | which is to not use oslo to create the workers but to spawn them on a process pool ourselves | 12:46 |
stephenfin | Are we simply saying that `[api] local_metadata_per_cell` should be false for global and true for local? Because with the removal of the eventlet server, there's no way to run the metadata API in the same service as the compute (REST) API now so "standalone" doesn't make much sense | 12:47 |
sean-k-mooney | gibi: that would allow us to get back futures from the process pool and we can then catch the exception and decide what to do based on that | 12:47 |
stephenfin | (because it's always "standalone") | 12:47 |
gibi | yeah that is option d). But I guess other projects will have similar problems so a nova-local solution is not a nice one | 12:47 |
sean-k-mooney | gibi: well option e) would be to do d) in oslo | 12:48 |
gibi | sean-k-mooney: :) | 12:48 |
gibi | stephenfin: you are probably correct | 12:48 |
sean-k-mooney | gibi: neutron may have some ideas, or we could look at one of the services that has been using cotyledon or whatever it is for years | 12:48 |
gibi | yeah | 12:49 |
sean-k-mooney | stephenfin: local metadata per cell should be set to true if you have configured the metadata agent deployed by neutron to use the local nova metadata api endpoint | 12:49 |
sean-k-mooney | stephenfin: but i think what the doc is saying is | 12:50 |
stephenfin | > The nova metadata API service must not be run as a standalone service, using the nova-metadata-wsgi service, in this case. | 12:50 |
sean-k-mooney | if you do that you need to deploy the dedicated nova-metadata-api wsgi app | 12:51 |
sean-k-mooney | im not sure why the combined one could not work however | 12:51 |
stephenfin | There's no combined one | 12:51 |
stephenfin | Not in WSGI. Only eventlet (now removed) | 12:51 |
sean-k-mooney | oh then that would be why i guess | 12:51 |
sean-k-mooney | ack | 12:51 |
gibi | yepp the combined one was eventlet only | 12:52 |
gibi | in the nova-operator we always start the metadata wsgi as separate pod either on top or on cell level | 12:52 |
sean-k-mooney | ok i guess we were just trying to say if you want to run per-cell metadata you need to deploy additional metadata apis (at least 1 per cell) | 12:53 |
stephenfin | gibi: tbc, when you say on top or cell-level you're referring to the global and local deployment topologies described in that doc, yeah? | 12:53 |
gibi | yeapp | 12:53 |
sean-k-mooney | yep | 12:53 |
stephenfin | okay, sweet | 12:53 |
gibi | stephenfin: do you see a need for clarifying things in our doc? | 12:53 |
stephenfin | I think so. The sentence I quoted above is confusing IMO | 12:54 |
stephenfin | > The nova metadata API service must not be run as a standalone service, using the nova-metadata-wsgi service, in this case. | 12:54 |
stephenfin | To me, that says "you cannot use nova-metadata-wsgi" service if deploying in a global configuration | 12:54 |
sean-k-mooney | that was not true even before we removed the combined one | 12:55 |
sean-k-mooney | but now it can be simplified | 12:55 |
stephenfin | and since we no longer provide nova-metadata (i.e. the eventlet one), that would suggest global deployments weren't possible anymore | 12:55 |
sean-k-mooney | we only have the wsgi service on master | 12:55 |
sean-k-mooney | and only the split endpoints | 12:55 |
stephenfin | yeah, hence my concern | 12:55 |
stephenfin | but it sounds like it's just worded weirdly | 12:55 |
opendevreview | Merged openstack/osc-placement stable/2025.1: Add bindep.txt for ubunutu 24.04 support https://review.opendev.org/c/openstack/osc-placement/+/948867 | 12:55 |
sean-k-mooney | we used to have separate console scripts for nova-api and nova-api-metadata | 12:55 |
stephenfin | and I can rephrase like so | 12:55 |
sean-k-mooney | i think gibi dropped the references to the console scripts when they were removed | 12:56 |
stephenfin | > The ``api.local_metadata_per_cell`` option must be set to ``False`` | 12:56 |
sean-k-mooney | but kept the verbiage related to the pbr wsgi scripts | 12:56 |
stephenfin | Or drop the sentence entirely | 12:56 |
gibi | yeah I tried to clean up the doc when the eventlet server was removed but probably missed things | 12:56 |
sean-k-mooney | it needs to be set based on the topology | 12:57 |
gibi | yeah, if the metadata is deployed globally (top level) the local_metadata_per_cell needs to be false, but when metadata is deployed to each cell then it needs to be true | 12:57 |
sean-k-mooney | it's technically optional to set this to true for the per-cell deployment, by the way. it defaults to false | 12:58 |
sean-k-mooney | meaning if you deploy per-cell metadata they can look up metadata for other cells if they have access to the api db to do so | 12:58 |
gibi | sean-k-mooney: it depends, if the local metadata has cell db access then sure it can be set to false | 12:58 |
gibi | yeah that | 12:58 |
gibi | sean-k-mooney: cotyledon has the worker respawn implemented here https://github.com/sileht/cotyledon/blob/be444189de32a8c29c7107a9b02da44248a7e64a/cotyledon/_service_manager.py#L254-L256 | 12:59 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#api.local_metadata_per_cell is basically a configuration to prevent that cross-cell lookup | 12:59 |
gibi | yepp | 13:00 |
gibi | option b) aka re-initing the executor after the fork works, based on the process name stored on the executor compared to the current process name. But this is an ugly and nova-only solution. Fortunately we will move all our spawns to executors so at least it is easily applicable for now | 13:14 |
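For reference, a rough sketch of what option b) amounts to, with illustrative names only (the actual nova change, linked below at 14:19, compares process names rather than PIDs):

```python
# Rough sketch of option b): lazily rebuild a module-level executor when
# the current process differs from the one that created it, i.e. after an
# os.fork(). _EXECUTOR/_EXECUTOR_PID are illustrative, not nova names.
import os
import futurist

_EXECUTOR = None
_EXECUTOR_PID = None


def default_executor():
    global _EXECUTOR, _EXECUTOR_PID
    if _EXECUTOR is None or _EXECUTOR_PID != os.getpid():
        # First use, or a stale pool inherited across a fork: build a
        # fresh pool owned by the current process.
        _EXECUTOR = futurist.ThreadPoolExecutor(max_workers=10)
        _EXECUTOR_PID = os.getpid()
    return _EXECUTOR
```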
opendevreview | Merged openstack/osc-placement stable/2025.1: Update .gitreview for stable/2025.1 https://review.opendev.org/c/openstack/osc-placement/+/943757 | 13:38 |
dansmith | gibi: that move to spawn soon is surprising to me, as I suspect it will be hella slow, especially for openstack code | 14:02 |
dansmith | gibi: I'm not sure where we care about process workers, other than the api services in standalone mode.. in wsgi mode we shouldn't be forking at all, right? | 14:02 |
gibi | we don't spawn much, just at the start to get the worker processes (or when they die) | 14:02 |
gibi | dansmith: this is scheduler. The default behavior of oslo.service is to fork the workers | 14:03 |
dansmith | oh I guess conductor needs process workers too | 14:03 |
gibi | conductor too | 14:03 |
dansmith | do we need to do that for scheduler though? we did under eventlet for parallelism, but I'm not sure either does going forward | 14:04 |
gibi | threading in python 3.12 is still limited by the GIL, so having a way to spawn processes for scaling makes sense, as that way we can saturate more CPU cores if needed | 14:05 |
dansmith | sure, but there are lots of multithreaded python programs providing reasonable performance with the GIL :) | 14:06 |
gibi | Im not 100% sure but I feel that oslo.service with worker=1 still forks a worker proc from the master proc | 14:06 |
dansmith | I suspect neither are cpu-intensive enough to really need full parallelism | 14:06 |
dansmith | okay | 14:06 |
gibi | but I think I understand you, we might not need to fork, just use a single proc with big enough thread pools. I'm not sure this is something that's supported by oslo.service out of the box. | 14:09 |
gibi | could be an improvement request | 14:09 |
gibi | to provide alternative way to avoid the fork | 14:09 |
dansmith | yeah, just might be worth consideration | 14:10 |
dansmith | conductor especially I suspect is mostly waiting on mysql and rabbit connections anyway.. scheduler *might* do enough work traversing lots of host objects or something, but I suspect not anymore with placement | 14:11 |
gibi | I think we had issues with the NUMA filter where I needed to add caching to help with the execution time | 14:14 |
gibi | so meh | 14:14 |
opendevreview | Merged openstack/osc-placement stable/2025.1: Update TOX_CONSTRAINTS_FILE for stable/2025.1 https://review.opendev.org/c/openstack/osc-placement/+/943758 | 14:17 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Translate scatter-gather to futurist https://review.opendev.org/c/openstack/nova/+/947966 | 14:17 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Use futurist for _get_default_green_pool() https://review.opendev.org/c/openstack/nova/+/948072 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Replace utils.spawn_n with spawn https://review.opendev.org/c/openstack/nova/+/948076 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add spawn_on https://review.opendev.org/c/openstack/nova/+/948079 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move ComputeManager to use spawn_on https://review.opendev.org/c/openstack/nova/+/948186 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move ConductorManager to use spawn_on https://review.opendev.org/c/openstack/nova/+/948187 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make nova.utils.pass_context private https://review.opendev.org/c/openstack/nova/+/948188 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename DEFAULT_GREEN_POOL to DEFAULT_EXECUTOR https://review.opendev.org/c/openstack/nova/+/948086 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make the default executor configurable https://review.opendev.org/c/openstack/nova/+/948087 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Print ThreadPool statistics https://review.opendev.org/c/openstack/nova/+/948340 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: WIP: allow service to start with threading https://review.opendev.org/c/openstack/nova/+/948311 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM:Run nova-next with n-sch in threading mode https://review.opendev.org/c/openstack/nova/+/948450 | 14:18 |
opendevreview | Stephen Finucane proposed openstack/nova master: setup: Remove pbr's wsgi_scripts https://review.opendev.org/c/openstack/nova/+/902688 | 14:18 |
stephenfin | sean-k-mooney: gibi: Context for my question earlier ^ | 14:18 |
gibi | dansmith: sean-k-mooney: so I think I fixed the fork issue with re-initing the executor in the worker processes https://review.opendev.org/c/openstack/nova/+/947966/8/nova/utils.py#1276 not nice but seems to work. (I noticed the issue when testing slow / never-finishing scatter-gather scenarios locally) | 14:19 |
dansmith | yeah that's not great | 14:21 |
gibi | stephenfin: thanks, I will take a look | 14:21 |
dansmith | gibi: is there any way I could get you to stop putting many lines of comments in the middle of a condition like that? seems like a pattern that has been emerging and it completely breaks my brain when trying to reason about the logic like that :( | 14:21 |
gibi | feel free to leave a comment and I will move the comment | 14:22 |
gibi | I tried to be as close to the condition as possible but I can move it a bit further | 14:22 |
dansmith | gibi: I may have missed context from the above conversation about where this gets initialized so early, but is moving that an option? | 14:28 |
gibi | it is the service version check that happens before the fork | 14:28 |
gibi | and that uses scatter-gather | 14:28 |
gibi | (and also any init that happens at Service.create() today) | 14:29 |
gibi | we can move it but we have no way to signal to the master process that a worker wants to kill the master if the version check fails | 14:29 |
gibi | I need to drop for a bit, I will be back for the nova meeting | 14:29 |
dansmith | ah, I see | 14:29 |
dansmith | seems like we could maybe just disable the pooling there, or just destroy the threadpool at the end of service.create before we return to allow the fork to be clean? | 14:30 |
gibi | disable pooling needs infra as the version check uses the generic scatter-gather. I would not duplicate that | 14:31 |
gibi | destroying the pool before the fork could work | 14:31 |
gibi | probably a bit cleaner than the re-init | 14:31 |
gibi | but | 14:31 |
gibi | I need to test it | 14:32 |
dansmith | for the disable pooling suggestion, I meant something like "use a temporary pool instead of the global one" - I know we need *a* pool there | 14:32 |
dansmith | you could also have a "sequential thread pool" that just runs each cell query synchronously in order instead of the threads maybe too | 14:33 |
dansmith | but yeah, just destroying after we're done with that check would be simple and clean, I suspect | 14:33 |
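A sketch of that "destroy after the check" idea, using a hypothetical helper name:

```python
# Hypothetical sketch of the destroy-before-fork approach: once the
# pre-fork work (the service-version scatter-gather in Service.create) is
# done, shut the global pool down so the fork happens with no live pool
# threads; each worker then lazily rebuilds a process-local pool.
import futurist

_SCATTER_POOL = None  # stand-in for nova's module-level scatter-gather pool


def destroy_scatter_gather_pool():
    global _SCATTER_POOL
    if _SCATTER_POOL is not None:
        _SCATTER_POOL.shutdown(wait=True)  # let in-flight cell queries finish
        _SCATTER_POOL = None
```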
dansmith | I will miss the nova meeting today, btw. see you later | 14:34 |
sean-k-mooney | gibi: if nothing else it confirms that the issue is with the forking and not reinitialising it | 14:36 |
sean-k-mooney | gibi: dansmith i was in a meeting, but would marking the pool as thread-local help | 14:36 |
sean-k-mooney | they would not be shared with the child process, correct. the only problem with that is if work on a thread pool wanted to add something to the same thread pool | 14:37 |
sean-k-mooney | my thinking is we should not have that pattern in general | 14:37 |
sean-k-mooney | so if we mark the module-level thread pool as thread-local then on fork it will be initialised to empty | 14:38 |
sean-k-mooney | and the first attempt to use it will init it in the child process without any sharing of state | 14:38 |
sean-k-mooney | having a way to disable scatter-gather for that first call on startup would also be an option | 14:39 |
sean-k-mooney | dansmith: by "sequential thread pool" you mean a sequential executor, right? like the one we used in tests | 14:40 |
dansmith | yes | 14:41 |
sean-k-mooney | ok futurist already provides that https://github.com/openstack/futurist/blob/master/futurist/_futures.py#L227 | 14:41 |
sean-k-mooney | so that would be an option for the early init | 14:42 |
* dansmith nods | 14:42 |
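For context, the executor linked above runs each submitted call inline in the caller's thread, so no pool threads exist at fork time; a minimal usage sketch:

```python
# Minimal usage sketch of futurist's synchronous executor: submit()
# executes the call immediately in the submitting thread and returns an
# already-completed future, so nothing pool-related survives a fork.
import futurist

executor = futurist.SynchronousExecutor()
future = executor.submit(sum, [1, 2, 3])  # runs inline, same thread
assert future.result() == 6
executor.shutdown()
```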
sean-k-mooney | gibi: i commented with links to the thread-local approach but i have not tested that change; it might be worth trying with a DNM patch or locally | 14:50 |
sean-k-mooney | im just not sure if it's the correct pattern to apply to this type of problem | 14:50 |
sean-k-mooney | i.e. can we convert all module-global state to thread-local storage if we have this problem or would that only work in this case | 14:51 |
sean-k-mooney | we dont really want to have thread-local db engine facades for example | 14:51 |
opendevreview | Stephen Finucane proposed openstack/placement master: setup: Remove pbr's wsgi_scripts https://review.opendev.org/c/openstack/placement/+/919582 | 14:54 |
stephenfin | sean-k-mooney: Think you can remove your -1 on that now? ^ | 14:54 |
sean-k-mooney | probably we had that support in epoxy, right | 14:55 |
sean-k-mooney | so we should be able to remove it now if we wanted to | 14:55 |
stephenfin | Yep | 14:58 |
sean-k-mooney | stephenfin: im still hesitant to remove this entirely for the simple reason that apache mod_wsgi does not support using the module approach supported by uwsgi/gunicorn | 14:58 |
sean-k-mooney | with that said | 14:58 |
sean-k-mooney | can the wsgi module be used directly | 14:58 |
sean-k-mooney | i.e. can you just point mod_wsgi at https://github.com/openstack/placement/blob/master/placement/wsgi/api.py | 14:59 |
sean-k-mooney | that's effectively what was generated by pbr, right | 14:59 |
sean-k-mooney | so for mod_wsgi you would just point to that in site-packages ? | 15:00 |
sean-k-mooney | ok so not quite https://termbin.com/5n98 | 15:02 |
sean-k-mooney | but for the mod_wsgi usecase the answer is yes | 15:02 |
sean-k-mooney | care to call that out in the release note? | 15:02 |
sean-k-mooney | hum | 15:05 |
sean-k-mooney | python3 -c "import placement.wsgi.api; print(placement.wsgi.api.__file__);" | 15:05 |
sean-k-mooney | so because we dont guard that with if __main__ | 15:05 |
sean-k-mooney | that import actually runs some of the code | 15:06 |
sean-k-mooney | stephenfin: anyway im not sure if we need to explain how to locate /opt/stack/placement/placement/wsgi/api.py on the system | 15:07 |
sean-k-mooney | but instead of /opt/stack/data/venv/bin/placement-api they just need to use ^ | 15:07 |
sean-k-mooney | if this was not devstack that would be in /usr/lib/python3/dist-packages/ or /usr/lib/python3/site-packages/ | 15:09 |
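For the mod_wsgi case being discussed, the convention is that the file pointed to exposes a module-level WSGI callable named `application`; a generic sketch (not the actual placement module):

```python
# Generic sketch of what mod_wsgi expects from the file it is pointed at:
# a module-level callable named "application". The placement module above
# builds its application at import time, which is why the bare import at
# 15:05 already runs initialization code.
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from wsgi\n']
```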
Uggla | Nova meeting in ~50 min | 15:10 |
Uggla | Nova meeting in ~10 min, time for you to grab a cup of coffee. | 15:49 |
gibi | If I drink a coffee now then I won't sleep until 2 in the morning | 15:53 |
gibi | sean-k-mooney: the thread-local approach has a problem: if we have a periodic that wants to run a scatter-gather then the thread holding the periodic will see a different scatter-gather pool than the main one | 15:54 |
gibi | so I will look into the destroy pool before fork idea | 15:55 |
sean-k-mooney | i think there is a generic hook we can implement for that | 15:57 |
sean-k-mooney | we have some module-level reset functionality that we use for mutable config | 15:58 |
sean-k-mooney | im thinking of things like atexit() | 15:58 |
sean-k-mooney | there might be a pre/post fork hook we could register to do that | 15:58 |
gibi | https://github.com/sileht/cotyledon/blob/be444189de32a8c29c7107a9b02da44248a7e64a/cotyledon/_service_manager.py#L156 cotyledon has hooks | 16:00 |
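The stdlib hook being reached for here exists as os.register_at_fork() (Python 3.7+); a sketch with a hypothetical reset helper:

```python
# Sketch using the stdlib fork hooks. _POOL and the reset logic are
# hypothetical stand-ins for nova's module-level executors.
import os

_POOL = None


def _drop_pool_in_child():
    # The child inherits a pool whose worker threads do not exist in this
    # process; drop it so first use lazily creates a process-local one.
    global _POOL
    _POOL = None


os.register_at_fork(after_in_child=_drop_pool_in_child)
```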
Uggla | #startmeeting nova | 16:02 |
opendevmeet | Meeting started Tue May 6 16:02:07 2025 UTC and is due to finish in 60 minutes. The chair is Uggla. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:02 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:02 |
opendevmeet | The meeting name has been set to 'nova' | 16:02 |
bauzas | \o | 16:02 |
Uggla | Hello everyone | 16:02 |
Uggla | awaiting a moment for people to join. | 16:02 |
elodilles | o/ | 16:02 |
fwiesel | o/ | 16:02 |
gmaan | o/ | 16:03 |
Uggla | thanks bauzas for last week meeting. | 16:04 |
bauzas | np | 16:04 |
gibi | o/ | 16:04 |
Uggla | #topic Bugs (stuck/critical) | 16:04 |
Uggla | #info No Critical bug | 16:04 |
Uggla | #topic Gate status | 16:05 |
Uggla | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:05 |
Uggla | #link https://etherpad.opendev.org/p/nova-ci-failures-minimal | 16:05 |
Uggla | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status | 16:05 |
Uggla | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:05 |
Uggla | #info Please try to provide meaningful comment when you recheck | 16:05 |
Uggla | If I understood correctly, the gate was blocked by this https://review.opendev.org/c/openstack/nova/+/948392 last week. | 16:06 |
gibi | jepp | 16:06 |
gibi | fix is landed | 16:06 |
Uggla | It is landed now. | 16:06 |
Uggla | So the gate looks good. Please tell me if I'm wrong. | 16:06 |
sean-k-mooney | the cyborg and barbican jobs are still broken i think | 16:06 |
gibi | I have been hit by https://bugs.launchpad.net/glance/+bug/2109428 multiple times recently. Not a blocker but definitely a source of rechecks | 16:06 |
gibi | sean-k-mooney: yepp those non-votings are broken | 16:07 |
gibi | bugs are filed | 16:07 |
sean-k-mooney | i may try and find time to go fix them | 16:07 |
sean-k-mooney | they are both broken on the lack of a pyproject.toml | 16:07 |
gibi | yepp | 16:07 |
Uggla | good to know thx gibi | 16:07 |
gibi | https://bugs.launchpad.net/barbican/+bug/2109584 | 16:07 |
gibi | https://bugs.launchpad.net/openstack-cyborg/+bug/2109583 | 16:07 |
sean-k-mooney | if i can fix it without having to install the service locally i might give it a try | 16:07 |
sean-k-mooney | i can reference those bugs | 16:08 |
gibi | thanks | 16:08 |
gmaan | I think we need to add a pyproject.toml to all projects in openstack otherwise they will slowly break at some point | 16:08 |
gibi | gmaan: I agree | 16:08 |
Uggla | should we track this ? | 16:09 |
sean-k-mooney | gmaan: yes we will | 16:09 |
sean-k-mooney | Uggla: nova is mostly done; stephen and i tried to do this 2 years ago | 16:09 |
sean-k-mooney | to get ahead of things breaking | 16:10 |
sean-k-mooney | so nova and placement are done | 16:10 |
sean-k-mooney | i need to check os-* | 16:10 |
sean-k-mooney | but for the libs we are responsible for it's trivial | 16:10 |
sean-k-mooney | ok os-vif is still pending | 16:11 |
Uggla | any bug / blueprint to refer to this work ? | 16:11 |
sean-k-mooney | ill start working on them and ping folks to review | 16:11 |
gmaan | ++ | 16:11 |
Uggla | sean-k-mooney++ | 16:11 |
Uggla | anything else ? | 16:12 |
Uggla | moving on to next item. | 16:12 |
Uggla | #topic tempest-with-latest-microversion job status | 16:12 |
Uggla | #link https://zuul.opendev.org/t/openstack/builds?job_name=tempest-with-latest-microversion&skip=0 | 16:12 |
Uggla | I have just discussed it with gmann. | 16:12 |
Uggla | gmann is progressing on this periodic job. | 16:13 |
Uggla | gmaan, I let you give a quick status if you wish. | 16:13 |
gmaan | sure | 16:14 |
gmaan | it is not worst than what i suspected, only 23 tests failing | 16:14 |
gmaan | I started fixing those one by one | 16:14 |
gmaan | #link https://review.opendev.org/q/topic:%22latest-microversion-testing%22 | 16:14 |
gmaan | the hypervisor tests are fixed, I will fix a few more today and this week | 16:14 |
sean-k-mooney | do you know if some of them are invalid failure | 16:15 |
gmaan | that is ^^ topic I am adding all changes to, feel free to review/comment | 16:15 |
sean-k-mooney | i.e. the test is depending on an older microversion behavior | 16:15 |
sean-k-mooney | and should be skipped when using latest | 16:15 |
gmaan | sean-k-mooney: not invalid, either we need to fix schema or cap the test with min/max microversions | 16:15 |
sean-k-mooney | ok ya that's actually what i was wondering | 16:15 |
gmaan | sean-k-mooney: yeah, for example hypervisor uptime test should run till 2.87 | 16:15 |
sean-k-mooney | can we express in tempest | 16:15 |
sean-k-mooney | that this test has a max rather than just a min version requirement | 16:16 |
gmaan | yes with 'max_microversion' | 16:16 |
sean-k-mooney | ack | 16:16 |
sean-k-mooney | ah i see, that's how you're fixing it in https://review.opendev.org/c/openstack/tempest/+/948490 | 16:16 |
sean-k-mooney | cool | 16:16 |
gmaan | yeah ^^ | 16:16 |
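A sketch of the capping pattern described here, with a made-up test class (the real fixes are under the gerrit topic linked above):

```python
# Illustrative sketch of capping a tempest test: tempest skips the test
# when the configured microversion range falls outside the class bounds.
# The class and test names are made up for illustration.
from tempest.api.compute import base


class HypervisorUptimeTest(base.BaseV2ComputeAdminTest):
    # The hypervisor uptime API was removed in 2.88, so only run this
    # test up to and including microversion 2.87.
    max_microversion = '2.87'

    def test_show_uptime(self):
        ...
```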
gmaan | sometimes we need to refactor a test but it's not a big deal. | 16:17 |
gmaan | that is all on these, feel free to review; as I am the only active core in tempest I will merge them but keep them open for some time if anyone would like to review | 16:17 |
gmaan | most probably, i will keep all fixes open till the job is green | 16:18 |
Uggla | thx gmaan | 16:18 |
Uggla | #topic Release Planning | 16:19 |
Uggla | #link https://releases.openstack.org/flamingo/schedule.html | 16:19 |
Uggla | The patch about nova deadlines has been merged so I think it is ok. | 16:19 |
Uggla | Please let me know if it is not the case or if something is wrong. | 16:20 |
Uggla | #topic Review priorities | 16:20 |
Uggla | #link https://etherpad.opendev.org/p/nova-2025.2-status | 16:21 |
Uggla | I'd like to progress on openapi and I'll try to check with stephenfin about it. | 16:22 |
Uggla | #topic Stable Branches | 16:22 |
Uggla | elodilles, the mic is yours | 16:23 |
elodilles | thanks Uggla , so | 16:23 |
elodilles | #info stable/2023.2 (bobcat) is End of Life, branch is deleted (tag: bobcat-eol) | 16:23 |
elodilles | #info maintained stable branches: stable/2025.1, stable/2024.2, stable/2024.1 | 16:23 |
elodilles | down to 3 maintained branches again ;) | 16:23 |
elodilles | #info nova stable release from stable/2024.1 is out (29.2.1) | 16:24 |
elodilles | #info not aware of any stable gate failure | 16:24 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:24 |
elodilles | we had a broken gate on stable/2025.1 osc-placement, | 16:24 |
elodilles | but it is now fixed | 16:25 |
elodilles | thanks all for the help :) | 16:25 |
elodilles | and i think that's all from me | 16:25 |
elodilles | Uggla: back to you | 16:25 |
Uggla | elodilles, fyi 947847: nova: Release 2024.2 Dalmatian 30.0.1 | https://review.opendev.org/c/openstack/releases/+/947847 should be ok now. | 16:25 |
Uggla | As I have done 948811: Add uc check alternative method | https://review.opendev.org/c/openstack/requirements/+/948811 | 16:25 |
elodilles | Uggla: thanks for working on that! | 16:26 |
Uggla | the patch is on master with a backport to the stable branch. | 16:26 |
elodilles | i've just added a comment on your patch o:) | 16:26 |
Uggla | so it needs reviews. | 16:26 |
Uggla | but at least the verify is +1 | 16:27 |
opendevreview | sean mooney proposed openstack/os-vif master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/os-vif/+/899946 | 16:27 |
Uggla | #topic vmwareapi 3rd-party CI efforts Highlights | 16:28 |
fwiesel | #info No updates | 16:28 |
fwiesel | Uggla: Back to you | 16:28 |
Uggla | fwiesel, btw welcome back, I hope you are good. | 16:28 |
Uggla | #topic Gibi's news about eventlet removal. | 16:29 |
fwiesel | Thanks! Just still catching up with everything. | 16:29 |
gibi | o/ | 16:29 |
Uggla | #link Series: https://gibizer.github.io/categories/eventlet/ | 16:29 |
gibi | so last week we saw nova-scheduler running with native threading. There is a blogpost about it | 16:29 |
bauzas | had no time to review those patches yet :( | 16:29 |
gibi | this week I started testing the non happy path of the scatter-gather | 16:29 |
* bauzas hardly crying | 16:30 |
gibi | good news: both the mysql server and the pymysql client can be configured with timeouts to avoid hanging gather threads | 16:30 |
gibi | I will add a new doc about it in tree | 16:30 |
gibi | bad news, the oslo.service threading backend uses forks | 16:31 |
gibi | which does not play nice with our global threadpools initialized before the service workers are forked off | 16:31 |
gibi | we have workarounds | 16:31 |
gibi | details are in https://review.opendev.org/c/openstack/oslo.service/+/945720/comments/fb5d6632_f0eaf102 | 16:31 |
gibi | there are two patches to look at | 16:31 |
gibi | https://review.opendev.org/c/openstack/nova/+/948064?usp=search now with test coverage | 16:32 |
gibi | and | 16:32 |
gibi | https://review.opendev.org/c/openstack/nova/+/948437?usp=search | 16:32 |
gibi | this week I will work on cleaning up the long series to make more patches ready to review | 16:32 |
gibi | that is all | 16:32 |
gibi | Uggla: back to you | 16:32 |
sean-k-mooney | gmaan: Uggla: just a quick update on the pyproject.toml change if i may | 16:33 |
sean-k-mooney | https://etherpad.opendev.org/p/pep-517-and-pip-23 is my old ether pad to track that work | 16:33 |
Uggla | thx gibi | 16:33 |
Uggla | sean-k-mooney, sure go ahead | 16:33 |
sean-k-mooney | and only the os-vif change above is not merged for nova | 16:33 |
sean-k-mooney | so i rebased and approved that | 16:33 |
gmaan | just saw, ++ | 16:33 |
sean-k-mooney | once that is landed we should be good | 16:33 |
Uggla | \o/ | 16:34 |
sean-k-mooney | anyway that was all on that topic | 16:34 |
Uggla | gibi, just a question: you said the oslo.service threading backend uses forks. does that mean real processes and not threads? | 16:35 |
gibi | Uggla: no we want worker processes with thread pool | 16:35 |
gibi | pools | 16:35 |
gibi | but it uses os.fork to create the worker from the main process | 16:35 |
sean-k-mooney | context is oslo service | 16:35 |
sean-k-mooney | allows you to have multiple workers | 16:35 |
gibi | fork copies the state of the process | 16:36 |
sean-k-mooney | we use that in the scheduler and conductor | 16:36 |
sean-k-mooney | but not nova-compute or the api | 16:36 |
gibi | we would need os.spawn to get a totally new worker process | 16:36 |
gibi | without inheriting state | 16:36 |
Uggla | ok I think I have understood the pb. | 16:37 |
gibi | the workaround is to reset the problematic state | 16:37 |
gibi | after the fork | 16:37 |
gibi | but I think the real solution would be os.spawn | 16:37 |
gibi | but changing to that is not super simple as oslo.config has non-pickleable lambdas :/ | 16:37 |
gibi | details are in the linked gerrit comment | 16:37 |
gibi | and are up in today's IRC log | 16:38 |
dansmith | we can reset after the fork (as you're doing) or reset before the fork, as I suspect will also work and be less janky | 16:38 |
gibi | yepp | 16:38 |
gibi | I will try dansmith's suggestion next | 16:38 |
gibi | while we wait for the oslo folks to respond | 16:38 |
sean-k-mooney | https://github.com/openstack/oslo.config/blob/d6e5c96d6dbeec0db974dfb8afc8e508b74861e5/oslo_config/cfg.py#L1387 | 16:38 |
dansmith | I agree the real solution is spawn, but we probably don't want to wait | 16:38 |
sean-k-mooney | that is their only use of lambda i think, by the way | 16:38 |
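A minimal sketch of that pickling failure, assuming nothing oslo-specific: with the multiprocessing "spawn" start method, everything handed to the child must be pickled, and lambdas are not picklable.

```python
# Minimal sketch of the spawn/pickle problem: with "spawn" the worker is
# a fresh interpreter, so its arguments are pickled -- and a lambda (like
# the oslo.config default linked above) cannot be. Opt is a made-up
# stand-in, not oslo.config code.
import multiprocessing


class Opt:
    def __init__(self):
        self.default = lambda: "computed-default"  # unpicklable attribute


def worker(opt):
    print(opt.default())


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=worker, args=(Opt(),))
    proc.start()  # raises PicklingError: Can't pickle <lambda>
    proc.join()
```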
gibi | dansmith: yepp we won't wait, we will go with the WA, and adapt when oslo makes the move | 16:39 |
dansmith | +1 | 16:39 |
Uggla | moving on | 16:40 |
Uggla | #topic Open discussion | 16:40 |
Uggla | fwiesel wants to propose something about cross-hypervisor resize. | 16:41 |
gibi | I guess that was last week's topic | 16:41 |
Uggla | yep but I understood you wanted to discuss it more. | 16:42 |
gibi | or maybe a continuation | 16:42 |
fwiesel | Yes, but there was limited feedback. I mean, if no one feels strongly about it, then I can go forward with a blueprint. | 16:42 |
fwiesel | We would like to allow mobility between two hypervisors, and I was thinking the cross-cell migration might already cover it to a large degree. | 16:43 |
fwiesel | The question is though, if that is a use-case you would feel worthwhile supporting or rather not. | 16:44 |
dansmith | I think it _has_ to be more like cross-cell migration than regular | 16:44 |
dansmith | I'm a bit mixed on whether or not I think this is worth it, because I suspect there will be a lot of gotchas in the image properties | 16:45 |
dansmith | and because you can pretty much do this with snapshot yourself now | 16:45 |
dansmith | and because we won't really be able to test it regularly, I don't think | 16:45 |
gibi | yeah testing this will be painful | 16:46 |
fwiesel | Okay, got it. That's fine. | 16:46 |
sean-k-mooney | dansmith: specifically like updating the hw_vif_model for say the vmware one to virtio if they dont happen to have common values already | 16:46 |
dansmith | yeah, all that kind of stuff | 16:46 |
fwiesel | Well, we would set those in the flavours and that would override that. | 16:47 |
sean-k-mooney | so general question. is there anything we know of that would prevent you from shelving, modifying them in glance and unshelving? | 16:47 |
fwiesel | No, it is more a usability issue. | 16:47 |
fwiesel | For our users. | 16:48 |
dansmith | sean-k-mooney: if the flavor you booted from was vmware-specific you can't unshelve to a libvirt-y one right? | 16:48 |
sean-k-mooney | fwiesel: so historically flavors are for amounts and image properties are for changing how the devices are presented | 16:48 |
fwiesel | And we were thinking of using shared NFS shares to avoid going through image upload, etc... | 16:48 |
sean-k-mooney | fwiesel: so in the past the precedent was dont put device models in flavors | 16:48 |
sean-k-mooney | dansmith: correct | 16:48 |
sean-k-mooney | dansmith: what im thinking is we discussed allowing resize in the shelved state | 16:49 |
dansmith | to me, snapshot, tweak, boot fresh is the best pattern here | 16:49 |
sean-k-mooney | so the workflow would be shelve, resize, update image properties and then unshelve | 16:49 |
dansmith | sean-k-mooney: that's a whole other conversation | 16:49 |
fwiesel | But I got it... hard to test so, no takers... Perfectly understandable. | 16:49 |
sean-k-mooney | ya im just confirming if there is a way to do that today with the api we have and i think the gap is resize while shelved | 16:49 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM:Run nova-next with n-sch in threading mode https://review.opendev.org/c/openstack/nova/+/948450 | 16:50 |
fwiesel | Thanks for the feedback. | 16:50 |
sean-k-mooney | but ya snapshot, detach volumes/ports and boot new vms with those is a valid workflow too | 16:50 |
Uggla | Shall I say it is ok for fwiesel to propose something that goes in that direction ? | 16:52 |
fwiesel | My understanding is, I won't propose things as it is doable with the existing API and implementing it within Nova is hard to test and maintain. | 16:53 |
dansmith | I think anyone can propose anything they want :) | 16:53 |
dansmith | fwiesel: ++ | 16:54 |
Uggla | oh ok good. | 16:54 |
Uggla | We are almost at the top of the hour. Are you ok triaging a couple of bugs or would you prefer to do it next week ? | 16:56 |
Uggla | sean-k-mooney, dansmith ^^ | 16:58 |
Uggla | ok no answer so you might be busy. So closing the meeting. | 16:59 |
Uggla | thanks all | 16:59 |
gibi | Uggla: thanks | 17:00 |
fwiesel | thanks everyone | 17:00 |
Uggla | #endmeeting | 17:00 |
opendevmeet | Meeting ended Tue May 6 17:00:10 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2025/nova.2025-05-06-16.02.html | 17:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2025/nova.2025-05-06-16.02.txt | 17:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2025/nova.2025-05-06-16.02.log.html | 17:00 |
bauzas | thanks Uggla | 17:00 |
Uggla | thank you. We will try to do the bug scrubbing next week | 17:00 |
dansmith | sorry, multitasking | 17:01 |
elodilles | thanks Uggla o/ | 17:01 |
gibi | sean-k-mooney: I stop for today but I will get back to forking tomorrow :) | 17:04 |
sean-k-mooney | gibi: no worries | 17:19 |
sean-k-mooney | i started at 7am because i woke at 5 so i should also finish for today | 17:19 |
opendevreview | Merged openstack/os-vif master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/os-vif/+/899946 | 20:48 |
opendevreview | Merged openstack/nova master: [quota]Refactor group counting to scatter-gather https://review.opendev.org/c/openstack/nova/+/948064 | 22:40 |