Wednesday, 2025-11-12

clarkb	cardoe: got it. I suspect that we can change that to "please use an external wsgi server" or something along those lines and be less uwsgi-specific	00:05
clarkb	which is probably a good first step if we think we'll be moving away from uwsgi (which I think is being forced on us)	00:05
cardoe	I mean it does the wrong thing for modern kernels and the answer is to locally patch.	00:17
clarkb	oh I was only aware of the bugfix-only releases and slow python support updates	00:17
clarkb	I didn't realize that it was actively harmful to use at this point.	00:17
cardoe	So it’s really a problem in a k8s environment. It reads the wrong values for memory, so it’ll OOM.	00:34
cardoe	It also handles being killed after failed health checks poorly. It stops servicing its loops.	00:37
clarkb	is it not treating reload-on-rss properly because it reads some incorrect rss value?	00:49
clarkb	or maybe limit-as doesn't work correctly	00:49
cardoe	I haven't had the cycles to chase it down specifically.	01:40
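The log doesn't pin down exactly which memory figure uWSGI misreads. Assuming it is the common container pitfall of sizing memory from host-wide numbers (such as MemTotal in /proc/meminfo) instead of the pod's cgroup limit, a minimal sketch of the difference looks like this. The cgroup v2 file /sys/fs/cgroup/memory.max is standard; the helper names are hypothetical and this is not uWSGI's code.

```python
from pathlib import Path
from typing import Optional


def parse_meminfo_total(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo content, in bytes (host-wide inside a container)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:       16384000 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def parse_cgroup_v2_limit(memory_max_text: str) -> Optional[int]:
    """Return the cgroup v2 memory.max limit in bytes, or None when unlimited ("max")."""
    value = memory_max_text.strip()
    return None if value == "max" else int(value)


def effective_memory_limit(meminfo_text: str, memory_max_text: str) -> int:
    """Hypothetical helper: the budget a worker should respect is the smaller of the two."""
    host_total = parse_meminfo_total(meminfo_text)
    cgroup_limit = parse_cgroup_v2_limit(memory_max_text)
    return host_total if cgroup_limit is None else min(host_total, cgroup_limit)


def read_effective_memory_limit() -> int:
    """Read the real files on a Linux host that uses cgroup v2."""
    meminfo = Path("/proc/meminfo").read_text()
    memory_max = Path("/sys/fs/cgroup/memory.max").read_text()
    return effective_memory_limit(meminfo, memory_max)
```

With a 16 GiB host and a 512 MiB pod limit, any RSS threshold derived from the host total would never fire before the kernel's OOM killer does.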
cardoe	The other issue we've got is: let's say for glance there are 10 workers in a process. Someone kicks off a bunch of image uploads and that one pod gets 10 of those. Now there's nothing answering the k8s health check, since it doesn't have any out-of-band health check, so it fails health checks. k8s will restart the pod, which is fine, but do it after the workers finish. Nah, let's stop calling epoll() and drop everything on the ground.	01:42
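A toy model of the starvation described above, assuming a server where health probes share the same fixed worker pool as API requests (the pool is shrunk from 10 to 2 workers for brevity). It uses only the standard library; the handler names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

upload_gate = threading.Event()  # held closed to simulate slow image uploads


def slow_image_upload() -> str:
    # Blocks its worker until the simulated upload "finishes".
    upload_gate.wait()
    return "uploaded"


def health_check() -> str:
    return "ok"


def probe(pool: ThreadPoolExecutor, timeout: float) -> str:
    """Submit a health check and report what a probe with a deadline would see."""
    try:
        return pool.submit(health_check).result(timeout=timeout)
    except FutureTimeout:
        return "timeout"


shared_pool = ThreadPoolExecutor(max_workers=2)     # API and health share workers
dedicated_pool = ThreadPoolExecutor(max_workers=1)  # out-of-band health worker

uploads = [shared_pool.submit(slow_image_upload) for _ in range(2)]

shared_result = probe(shared_pool, timeout=0.2)        # every worker busy: "timeout"
dedicated_result = probe(dedicated_pool, timeout=0.2)  # still answers: "ok"

upload_gate.set()  # let the uploads finish so both pools can shut down cleanly
for fut in uploads:
    fut.result()
shared_pool.shutdown()
dedicated_pool.shutdown()
print(shared_result, dedicated_result)
```

The point of the model is only that a health endpoint helps if it is served by capacity that API requests cannot exhaust.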
frickler	fun. though in that case I'd argue that the k8s check is bogus	06:19
sean-k-mooney	clarkb: gunicorn is a good drop-in replacement for our current use of uwsgi	07:49
sean-k-mooney	clarkb: i have been putting off trying to swap devstack to it because of other things on my plate but that is the immediate replacement i see for us	07:50
sean-k-mooney	i think with oslo.wsgi we have the opportunity to recommend a more modern and maintained stack	07:51
mnasiadka	sean-k-mooney: last time I tried using gunicorn with Nova I had problems with Nova properly parsing CLI arguments - but that was like two cycles ago	08:29
mnasiadka	But yes, I agree we should have a supported alternative to uwsgi - one that can also support ASGI	08:31
sean-k-mooney	mnasiadka: that's because using cli arguments is not technically supported by the wsgi spec and is something you should not do	08:33
sean-k-mooney	support for it in other wsgi servers is all non-standard and not interoperable	08:33
sean-k-mooney	in general you do not need to use cli args for nova's api	08:34
mnasiadka	Well, we only use CLI arguments to point to config files, so I guess that’s not a problem	08:34
mnasiadka	But at that point in time we started moving everything to a “standard” Ansible role that generates config for uWSGI - since Devstack uses that and we didn’t want to reinvent the wheel	08:34
sean-k-mooney	ya so if you put the files in the default location it will just work, and you may also be able to set the path via env vars	08:35
sean-k-mooney	i think i added that but i would need to check	08:35
mnasiadka	But the good thing is that it’s now easy to add support for gunicorn - once devstack uses that for testing and we’re sure we’re not going to send operators to a bad place by using something other than uWSGI	08:36
sean-k-mooney	https://github.com/openstack/nova/blob/master/nova/api/openstack/wsgi_app.py#L43-L51	08:36
sean-k-mooney	so ya https://github.com/openstack/nova/commit/73fe84fa0ea6f7c7fa55544f6bce5326d87743a6	08:37
sean-k-mooney	it turns out that the changes i made were not really required in the end, because we already supported OS_NOVA_CONFIG_DIR and oslo.config will already search for a /etc/nova/nova.conf.d/ directory and load all files there	08:38
sean-k-mooney	but this gives you a little more control if you really need it	08:38
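Because a WSGI server only hands the application an environ dict per request, configuration has to come from the process environment or well-known paths rather than argv. A minimal sketch of that pattern follows; it is loosely modelled on the discussion, not a copy of nova's wsgi_app.py, and the OS_NOVA_CONFIG_DIR semantics here (a directory holding nova.conf plus an optional nova.conf.d/) are an assumption.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def discover_config_files(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: build the config file list without touching sys.argv.

    Assumes OS_NOVA_CONFIG_DIR overrides the default /etc/nova directory and that
    snippets in <dir>/nova.conf.d/*.conf are loaded after nova.conf in sorted order.
    """
    env = os.environ if env is None else env
    config_dir = Path(env.get("OS_NOVA_CONFIG_DIR", "/etc/nova"))
    files = []
    main_conf = config_dir / "nova.conf"
    if main_conf.is_file():
        files.append(str(main_conf))
    snippet_dir = config_dir / "nova.conf.d"
    if snippet_dir.is_dir():
        # Sorted so that e.g. 10-base.conf is applied before 99-overrides.conf.
        files.extend(str(p) for p in sorted(snippet_dir.glob("*.conf")))
    return files


def make_application(env: Optional[Mapping[str, str]] = None):
    """Return a WSGI callable; config is resolved once, at startup, not per request."""
    config_files = discover_config_files(env)

    def application(environ, start_response):
        # Toy response: echo which config files were picked up.
        body = ("\n".join(config_files) + "\n").encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return application
```

Any spec-compliant server can then import a module-level `application = make_application()` without the application ever looking at the server's command line.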
cardoe	frickler: what other check, other than seeing if the pod is still listening on the port it's supposed to be servicing, would make sense?	13:22
cardoe	sean-k-mooney: so for OpenStack Helm I've gotta pass paths right now and I've gotta come up with a better answer. The helm chart generates /etc/nova/nova.conf, the user can mount overrides into /etc/nova/nova.conf.d/, the init containers generate stuff into /tmp/nova/nova.conf.d/	13:26
sean-k-mooney	cardoe: if you're putting them in /etc/nova and /etc/nova/nova.conf.d	13:32
cardoe	They're read-only in the pod	13:33
sean-k-mooney	you do not need to pass anything to nova to read those	13:33
sean-k-mooney	yep that is fine	13:33
sean-k-mooney	we do the same	13:33
cardoe	So the init container will write a config snippet that'll have like myip set.	13:33
sean-k-mooney	your issue is that you can't copy the content from /tmp to them	13:33
cardoe	Yes. /etc is read-only.	13:34
sean-k-mooney	ya it probably should not be	13:34
sean-k-mooney	myip does not need to be set, by the way	13:34
cardoe	Bad example then. I dunno what the nova one does.	13:34
sean-k-mooney	it can be but you only need to set it to override the default	13:34
cardoe	It's a projected volume so it's gonna be read-only in the pod.	13:35
sean-k-mooney	projected from a config map?	13:35
cardoe	one or more config maps and one or more secrets	13:35
sean-k-mooney	what we do is we use kolla's init system so we project the readonly config map to /var/lib/openstack/nova/config and then copy it to /etc/nova.*	13:36
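The approach described above (project the read-only ConfigMaps and Secrets somewhere else, then copy them into a writable config directory at start-up) can be sketched as below, and the same step can also pick up init-container snippets such as those in /tmp/nova/nova.conf.d/. The function, the ordering rule, and the numeric filename prefixes are hypothetical; kolla's actual init logic is different and more general.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def assemble_config_dir(sources: Sequence[Path], dest: Path) -> List[Path]:
    """Copy *.conf snippets from read-only sources into a writable conf.d-style dir.

    Later sources win: each file is prefixed with its source index so that a
    lexically sorted load order matches the order of `sources`.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for index, source in enumerate(sources):
        if not source.is_dir():
            continue  # e.g. an optional override mount that is absent
        for snippet in sorted(source.glob("*.conf")):
            target = dest / f"{index:02d}-{snippet.name}"
            shutil.copyfile(snippet, target)  # copies content only, not read-only mode bits
            copied.append(target)
    return copied
```

A call might look like `assemble_config_dir([Path("/var/lib/openstack/nova/config"), Path("/tmp/nova/nova.conf.d")], Path("/var/lib/nova/nova.conf.d"))`, where the destination path is only an example of some writable location.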
cardoe	We did it this way so that the user can have restart-less changes like debug level by editing the configmap.	13:37
sean-k-mooney	that still needs a SIGHUP	13:37
sean-k-mooney	to the pod to have it take effect	13:37
sean-k-mooney	well to the nova process	13:37
sean-k-mooney	in any case, for the nova api and metadata you can define an environment variable to have it read additional config files or directories	13:38
cardoe	Yeah but not everything is consistent.	13:39
sean-k-mooney	sure	13:39
cardoe	Not even all services read /etc/$service/$service.conf.d when run via WSGI servers still	13:40
sean-k-mooney	but using cli options is not wsgi compliant	13:40
cardoe	I don't disagree.	13:40
sean-k-mooney	cardoe: they should if they are using oslo.config	13:40
sean-k-mooney	that is done automatically	13:40
cardoe	It's not	13:40
sean-k-mooney	it is if they have initialized oslo.config properly. so swift won't	13:41
cardoe	It's done automatically by reading /proc/self/procname	13:41
cardoe	They have to call oslo.config and pass project="nova" or whatever project it is.	13:41
sean-k-mooney	yep or have procname set properly	13:41
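A simplified model of why the project name matters, assuming (as a sketch, not a spec) that when neither `project` nor `prog` is given, the program name falls back to the basename of sys.argv[0], which under a WSGI server is the server's own binary. The directory list only approximates the shape of oslo.config's default search; the function name is hypothetical.

```python
import os
import sys
from typing import List, Optional


def candidate_config_paths(project: Optional[str] = None,
                           prog: Optional[str] = None,
                           argv0: Optional[str] = None) -> List[str]:
    """Simplified model of where a config library would look for <name>.conf.

    Assumption: mirrors the general shape of oslo.config's default search
    (derive prog from argv[0] when not given; add per-project dirs only when
    project is set), not its exact list or precedence.
    """
    if prog is None:
        prog = os.path.basename(argv0 if argv0 is not None else sys.argv[0])
        if prog.endswith(".py"):
            prog = prog[:-3]
    if project:
        dirs = [os.path.join("~", "." + project), "~", os.path.join("/etc", project), "/etc"]
    else:
        dirs = ["~", "/etc"]
    names = ([project] if project else []) + [prog]
    return [os.path.join(d, name + ".conf") for name in names for d in dirs]
```

In this model, a service started by uwsgi without `project="nova"` only ever looks for uwsgi.conf, so /etc/nova/nova.conf is never considered.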
cardoe	I've been submitting patches the past 2 cycles	13:41
cardoe	Then some of them parse sys.argv (because oslo.config defaults to that), which holds whatever the WSGI server was started with.	13:44
sean-k-mooney	yep nova will do that if it can	13:44
sean-k-mooney	but again that's not standards compliant from a pure wsgi point of view	13:44
sean-k-mooney	but it works for uwsgi and maybe apache	13:45
cardoe	yeah except when the args aren't intended for nova but the wsgi server itself and the flag is the same for nova and the wsgi server but with a different value or meaning	13:45
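A small illustration of that collision, using Python's argparse as a stand-in for the application's CLI parsing (oslo.config has its own parser, so this is an analogy rather than its exact behaviour). The `--workers` option on the application side is hypothetical; the point is that an argv built for the server can be silently consumed by the application.

```python
import argparse
from typing import List


def parse_app_args(argv: List[str]) -> argparse.Namespace:
    """Hypothetical application parser that reads the process argv."""
    parser = argparse.ArgumentParser(prog="app", add_help=False)
    parser.add_argument("--config-file", action="append", default=[])
    # Hypothetical app option that happens to share a name with a server flag.
    parser.add_argument("--workers", type=int, default=1)
    # parse_known_args tolerates server-only flags instead of exiting on them.
    known, _unknown = parser.parse_known_args(argv)
    return known


# argv as a WSGI server process might see it: these flags were meant for the server.
server_argv = ["--bind", "0.0.0.0:8774", "--workers", "8"]

args = parse_app_args(server_argv)
print(args.workers)  # 8: the server's flag, read as the app's own option
```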
sean-k-mooney	we are unfortunately still using apache https://github.com/openstack-k8s-operators/nova-operator/blob/main/templates/novaapi/config/httpd.conf but im hoping we can eventually use something lighter weight	13:45
sean-k-mooney	we do at least use apache for tls termination so it's not entirely unhelpful	13:46
sean-k-mooney	cardoe: the reason that command line args were ever supported was really because of the eventlet webserver and the associated console script	13:48
sean-k-mooney	since we were shipping a wsgi application and its hosting server as a single entry point, we could forward arguments to the application	13:49
cardoe	yeah it makes sense	13:49
cardoe	Just the number of rando bug reports for starting up a process over the past few cycles has been surprising to me.	13:50
sean-k-mooney	for gunicorn you can pass -n to set the application name, for what it's worth	13:50
sean-k-mooney	https://docs.gunicorn.org/en/latest/run.html#commonly-used-arguments	13:50
cardoe	OpenStack Helm lets the user pick their own for uwsgi and gunicorn so that's part of the problem.	13:50
sean-k-mooney	ack, i think kolla moved from apache2 to uwsgi semi-recently, in the last 2-3 years	13:51
sean-k-mooney	i think they only support one now, however	13:51
sean-k-mooney	as in they decided they were porting and only supported both as a temporary measure	13:51
sean-k-mooney	although i could be wrong	13:52
sean-k-mooney	oh they still have both https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova/templates/nova-api.json.j2	13:52
cardoe	as far as the HUP, I've done something ghetto in a few services which is touch a file early on. Then with bash if the modification time of configs are newer, touch the magic file again and send PID 1 a HUP.	13:54
cardoe	Since k8s doesn't have a signaling path	13:54
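The marker-file trick above, translated from bash to Python as a sketch: compare config mtimes against a marker file's mtime, and if any config is newer, refresh the marker and signal the main process. PID 1 and SIGHUP come from the log; the function names are hypothetical, and signal delivery is injected so the logic can be exercised without signalling a real process.

```python
import os
import signal
from pathlib import Path
from typing import Callable, Iterable


def configs_changed_since(marker: Path, configs: Iterable[Path]) -> bool:
    """True when any existing config file is newer than the marker file."""
    marker_mtime = marker.stat().st_mtime
    return any(p.stat().st_mtime > marker_mtime for p in configs if p.exists())


def maybe_send_hup(marker: Path, configs: Iterable[Path], pid: int = 1,
                   send: Callable[[int, int], None] = os.kill) -> bool:
    """Touch the marker and send SIGHUP to `pid` if configs changed; return whether it did."""
    if not configs_changed_since(marker, configs):
        return False
    marker.touch()  # record that this change has been handled
    send(pid, signal.SIGHUP)
    return True
```

A sidecar or background loop would call `maybe_send_hup` every few seconds; `Path.stat()` follows symlinks, so the projected volume's symlinked files are compared by the mtime of their current targets.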
sean-k-mooney	cardoe: ya so for GMR you can just use a path based trigger	13:55
sean-k-mooney	there is no reason oslo.config could not support it for config reloading	13:55
cardoe	GMR?	13:56
sean-k-mooney	Guru meditation reports	13:57
sean-k-mooney	https://docs.openstack.org/oslo.reports/latest/user/history.html#id38	13:57
sean-k-mooney	in 1.11.0 oslo.reports gained the ability to trigger on file modification events	13:58
sean-k-mooney	there is no reason that oslo.config could not do the same in principle	13:58
sean-k-mooney	either on a specific file or by watching all of the config files for changes	13:58
sean-k-mooney	in our downstream installer we configure gmr to trigger on touching /var/lib/nova	13:59
sean-k-mooney	https://github.com/openstack-k8s-operators/nova-operator/blob/main/templates/nova.conf#L378	13:59
sean-k-mooney	i think that uses inotify or similar behind the scenes	14:00
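A sketch of the idea that oslo.config could watch its own files: snapshot the mtimes of every loaded config file and invoke a reload callback when the snapshot changes. This uses plain mtime polling rather than inotify, and `ConfigWatcher` and its callback are hypothetical, not an oslo.config or oslo.reports API.

```python
import os
from typing import Callable, Dict, Optional, Sequence


class ConfigWatcher:
    """Poll a set of config files and call `on_change` when any of them changes."""

    def __init__(self, paths: Sequence[str], on_change: Callable[[], None]):
        self.paths = list(paths)
        self.on_change = on_change
        self._snapshot = self._take_snapshot()

    def _take_snapshot(self) -> Dict[str, Optional[float]]:
        snapshot = {}
        for path in self.paths:
            try:
                snapshot[path] = os.stat(path).st_mtime
            except FileNotFoundError:
                snapshot[path] = None  # a deleted file also counts as a change
        return snapshot

    def poll(self) -> bool:
        """Check once; call the callback and return True if anything changed."""
        current = self._take_snapshot()
        if current == self._snapshot:
            return False
        self._snapshot = current
        self.on_change()
        return True
```

In a service, `poll()` would run from a timer or daemon thread, and the callback would re-read mutable options the same way SIGHUP handling does.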
cardoe	Yeah	14:05
cardoe	That would make sense	14:05
sean-k-mooney	for what it's worth we do have a similar desire to allow some fields to update without a restart, mainly for password rotation, that sort of thing	14:09
sean-k-mooney	right now we hash the config map and do a full pod restart anytime the content changes	14:09
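The hash-and-restart pattern works by deriving a digest from the ConfigMap contents and putting it in the pod template (commonly as an annotation), so any content change alters the template and triggers a rollout. A sketch of a stable digest, assuming the ConfigMap's data is available as a str-to-str mapping; the function names and the annotation key are illustrative.

```python
import hashlib
import json
from typing import Dict, Mapping


def config_checksum(data: Mapping[str, str]) -> str:
    """Stable SHA-256 over ConfigMap data: key order does not matter, content does."""
    canonical = json.dumps(dict(data), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pod_template_annotations(data: Mapping[str, str]) -> Dict[str, str]:
    # "checksum/config" mirrors a common Helm-chart convention; any key works.
    return {"checksum/config": config_checksum(data)}
```

The trade-off discussed in the log is exactly this: every change, even a log-level tweak, restarts the pod.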
sean-k-mooney	but long term it would be nice if the reload was just built into oslo.config	14:10
cardoe	Yeah that would make sense.	14:16
cardoe	So hashing the configmap and restarting is what most of the OpenStack Helm (OSH) stuff does but trying to make it possible to change some things without that.	14:16
-opendevstatus- NOTICE: Zuul job log URLs for storage.*.cloud.ovh.net are temporarily returning an access denied/payment required error, but the provider has been engaged and is working to correct it		14:50
clarkb	cardoe: you could have uwsgi listen on a second port that isn't exposed maybe? I'm not arguing for uwsgi but usually there are workarounds like that with webservers in particular	15:44
cardoe	clarkb: all http-sockets run in the same loop so it won't service anymore	17:21
cardoe	The best approach that folks have suggested is to talk to it via the master-fifo which is a named pipe. They're not wrong.	17:22
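One way to act on the master-fifo suggestion without going through the HTTP loop is for the probe to check that something still has the FIFO open for reading: on Linux, opening a FIFO write-only with O_NONBLOCK fails with ENXIO when no reader exists. The probe function is hypothetical, and it is an assumption that uWSGI's master keeps its FIFO open continuously and tolerates a writer that opens and closes without sending a command byte.

```python
import errno
import os


def fifo_has_reader(path: str) -> bool:
    """Return True if some process currently has the FIFO at `path` open for reading."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:   # FIFO exists but nobody is reading it
            return False
        if exc.errno == errno.ENOENT:  # FIFO was never created
            return False
        raise
    os.close(fd)  # open-then-close only; no command byte is written
    return True
```

This only shows that the master is alive, which is the property the probe needs when every worker is legitimately busy.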
cardoe	uWSGI 2.1 will bring that master-fifo as a regular socket... but that version's been canceled.	17:23
clarkb	interesting. Looks like there is also snmp support	17:24
clarkb	but also looks like sigint is expected to be less bad than sigterm. Not sure if the differences are meaningful for say the glance upload problem	17:24
cardoe	I will say one of my bigger complaints with that glance problem was finally fixed in 2.0.31 so less of an argument out of me now.	17:30
cardoe	The fix that sat in their PR queue for that was merged 30 days ago.	17:33
cardoe	It had been authored by someone years ago.	17:33
cardoe	It had been brought up on openstack-discuss a number of times.	17:33
cardoe	My only point is that it seems very very passively maintained and I don't want to see us tied to another thing that we're going to struggle to migrate away from like eventlet in the future.	17:35
cardoe	The "official" source download URL had its SSL certificate expired for nearly 3 weeks when I brought this up in the TC before. OpenStack projects switched to fetching it from files.pythonhosted.org	17:36
clarkb	the main issue opendev has run into is that the compilation is not reliable on arm64. We ended up dropping arm64 as a workaround for uwsgi container image builds. But I also have some changes up to switch to a different wsgi server	17:37
clarkb	cardoe: I thought openstack consumes uwsgi from distro packages?	17:37
clarkb	anyway I'm the last person that would advocate for uwsgi. I'm actively trying to remove it from opendev	17:37
cardoe	There were a couple things fetching it and building it.	17:38
clarkb	granian is what I am experimenting with because it supports rsgi and asgi too so seems flexible. But frickler rightly called out it too appears to be maintained by a single person and could go the way of uwsgi quickly (though I think it is far more actively maintained today)	17:39
cardoe	yeah that sounds more flexible for the future.	17:41
cardoe	uwsgi also only has commits from one person cause the commercial entity behind it seems to be gone and their company website is just a redirect to the uwsgi github project page.	17:42
cardoe	That one committer also very clearly works at a company that's doing AI stuff around opentelemetry	17:42
clarkb	yes I think that is why uwsgi is on life support now. granian is strictly better but may not be appropriate for openstack	17:42
cardoe	When the entity behind uwsgi used to be a web hosting company.	17:43
clarkb	gunicorn was mentioned earlier and is probably more similar to what we thought uwsgi was	17:43
cardoe	I'm still using uwsgi as well. I've not gone to anything else.	17:43
clarkb	I've just sent email about this, but I think we often treat decisions made to achieve goals as static entities. Having a goal along the lines of "OpenStack should be compatible with a well tested and performant wsgi server" is probably still a good goal to have. uwsgi being that server is probably no longer addressing the goal.	17:48
clarkb	all that to say I think it might be healthy for us to shift to a mindset where we consider goals first, then whether or not the goals are still relevant, and then whether or not the decisions we've made are still in alignment with those goals, and adjust accordingly when making big changes	17:48
clarkb	when all we see is "uwsgi bad" or "prefer pytest" it is really easy to overlook the reasons that we may do something one way or another in the first place and create new unexpected problems when implementing changes unilaterally or without broader consideration for community needs. FWIW I think the uwsgi replacement process has been collaborative and open in the community so may	17:50
clarkb	be a bad example	17:50
clarkb	but also I think if we become more open to reevaluation then, just like I can git revert a commit that goes sideways, we build in an expectation of more agility in the first place. Which for a decision like "should we allow pytest" is probably super low risk to revert or pivot later	17:52
sean-k-mooney	clarkb: nice email by the way :)	18:23
sean-k-mooney	clarkb: on uwsgi replacement im not actually aware of any real effort yet to move off uwsgi	18:24
sean-k-mooney	we have talked about it	18:24
clarkb	sean-k-mooney: ya I think most of the effort has been in evaluating potential alternatives	18:24
sean-k-mooney	but i dont think anyone has actually made concrete proposals to do it other than using apache	18:24
sean-k-mooney	the wsgi server in general should be replaceable	18:25
gouthamr	> clarkb: nice email by the way :)	18:25
gouthamr	++	18:25
sean-k-mooney	so i think this is more a case of PoCing it, adding support to devstack and just choosing a new server	18:25
clarkb	yes, though the code you write to glue the wsgi server to the application sometimes differs by wsgi implementation. https://review.opendev.org/c/opendev/system-config/+/944806/5/playbooks/roles/lodgeit/templates/docker-compose.yaml.j2 sort of points at that	18:26
clarkb	I suspect that is actually part of the issue here as we've long tried to ship that glue as a one size fits all (at least for uwsgi)	18:26
sean-k-mooney	ya so we used to use pbr to standarise that	18:26
sean-k-mooney	but obviously that's not going to be a thing we continue to do	18:27
clarkb	ya pressure from both sides forcing us to give up on that	18:27
sean-k-mooney	we could perhaps centralise that in oslo.wsgi if we were to create that	18:27
clarkb	which is maybe a good sign we shouldn't bother	18:27
sean-k-mooney	ya maybe. i think for the most part the binding we use	18:28
sean-k-mooney	won't really vary from service to service	18:28
sean-k-mooney	but it can from server to server, so even having an example for each might be enough	18:28
mnaser	from an operator side, it "shouldn't" matter -- but it ends up mattering somehow. for example, neutron-server had some stuff where out of the box you couldn't just .. switch to wsgi because the old eventlet based server started up other tasks in the background/etc	18:28
mnaser	i am gonna guess switching to uwsgi helped us uncover all of that mess	18:29
sean-k-mooney	well neutron was a bit special	18:29
sean-k-mooney	because they didnt have a pure api	18:29
mnaser	yeah so i think we are past that period, so having a pure wsgi entrypoint should be easy now	18:29
sean-k-mooney	the neutron server was both the rest api and the conductor of long running and periodic tasks	18:29
mnaser	it would be a matter of flipping it from one server to another..	18:29
sean-k-mooney	right but it should be now	18:29
sean-k-mooney	they have now actually split the wsgi app and the rpc/conductor process	18:30
mnaser	and now i guess it can/could be as easy as a configure_wsgi api in devstack that would simply (maybe) use a different server depending on a localrc config	18:31
mnaser	and it should just work(tm)	18:31
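A sketch of the kind of switch described above: one function that turns a `module:callable` entry point into a command line for whichever server a local setting selects. Devstack itself is bash, so this Python version is only illustrative; the function name is hypothetical, the flags are the servers' standard CLI options as I understand them (gunicorn `--bind/--workers/--name`, uWSGI `--master/--http-socket/--processes/--module`, granian `--interface/--host/--port/--workers`), and the entry point string is a placeholder.

```python
from typing import List


def wsgi_command(server: str, app: str, host: str, port: int, workers: int,
                 name: str = "api") -> List[str]:
    """Build argv to serve the WSGI callable `app` ("package.module:callable")."""
    if server == "gunicorn":
        return ["gunicorn", "--bind", f"{host}:{port}", "--workers", str(workers),
                "--name", name, app]
    if server == "uwsgi":
        return ["uwsgi", "--master", "--http-socket", f"{host}:{port}",
                "--processes", str(workers), "--module", app]
    if server == "granian":
        return ["granian", "--interface", "wsgi", "--host", host,
                "--port", str(port), "--workers", str(workers), app]
    raise ValueError(f"unsupported WSGI server: {server!r}")
```

Because the application side is a plain WSGI callable, only this glue changes between servers, which is the per-server example the earlier discussion suggested might be enough.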
sean-k-mooney	now that they did the engineering work to separate it out, then ya	18:31
sean-k-mooney	im not sure if there are other projects that don't have the separation, however, that would have to do the same exercise	18:32
sean-k-mooney	we have seen glance, watcher and others all have to fix the fact that when they ran under the eventlet server	18:32
sean-k-mooney	they could do long running background tasks	18:32
sean-k-mooney	that didn't actually fit with the wsgi request lifecycle	18:33
mnaser	but that means that in theory they already dont work "with uwsgi" anyways	18:33
sean-k-mooney	correct	18:33
sean-k-mooney	they did not	18:33
sean-k-mooney	glance had many features that were just flat out broken	18:33
sean-k-mooney	it took a while to fix it	18:33
sean-k-mooney	neutron didn't really support running under uwsgi at all	18:34
sean-k-mooney	or apache really	18:34
sean-k-mooney	the eventlet removal caused many projects to reevaluate their architecture and fix things like this	18:34
sean-k-mooney	here is an example that we are fixing in watcher https://github.com/openstack/watcher/blob/master/watcher/api/scheduling.py	18:35
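The general shape of these fixes is that a request handler should only record work and return, while a separately managed component does the long-running part. A minimal in-process sketch, using `queue.Queue` as a stand-in for whatever durable hand-off (RPC, database row, message bus) a real service would use; all names are hypothetical and this is not watcher's implementation.

```python
import json
import queue
from typing import Callable, Dict, Iterable

jobs: "queue.Queue[Dict[str, str]]" = queue.Queue()


def application(environ, start_response) -> Iterable[bytes]:
    """WSGI handler: enqueue the job and return 202 instead of doing the work inline."""
    job = {"action": environ.get("PATH_INFO", "/").strip("/") or "noop"}
    jobs.put(job)
    body = json.dumps({"queued": job["action"]}).encode("utf-8")
    start_response("202 Accepted", [("Content-Type", "application/json"),
                                    ("Content-Length", str(len(body)))])
    return [body]


def drain(handle: Callable[[Dict[str, str]], None]) -> int:
    """Worker side: process everything currently queued; return how many jobs ran.

    In a real deployment this loop would live in its own service (as with
    neutron's split RPC/conductor process), not inside a WSGI worker.
    """
    done = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return done
        handle(job)
        done += 1
```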
clarkb	and going back to the big picture I think what I'm ultimately advocating for is a more proactive approach to identifying issues that we may face in the future with the decisions we've made in the past so that we can more proactively take action before it is super painful when you start making quick changes out of necessity	18:36
sean-k-mooney	+1	18:36
sean-k-mooney	not having all these decisions be urgent because we need to address them would be a nice change	18:37
fungi	the mailman community seems to prefer gunicorn overall, even though the container images we base our deployment on still rely on uwsgi	19:22
