| nick | message | time |
|---|---|---|
| clarkb | cardoe: got it. I suspect that we can change that to "please use an external wsgi server" or something along those lines and be less uwsgi specific | 00:05 |
| clarkb | which is probably a good first step if we think we'll be moving away from uwsgi (which I think is being forced on us) | 00:05 |
| cardoe | I mean it does the wrong thing for modern kernels and the answer is to locally patch. | 00:17 |
| clarkb | oh I was only aware of the bugfix-only status and slow python support updates | 00:17 |
| clarkb | I didn't realize that it was actively harmful to use at this point. | 00:17 |
| cardoe | So it’s really in a k8s environment. It reads the wrong stuff for memory. So it’ll OOM | 00:34 |
| cardoe | It also poorly handles health checks that result in killing it. It stops servicing loops. | 00:37 |
| clarkb | is it not treating reload-on-rss properly because it reads some incorrect rss value? | 00:49 |
| clarkb | or maybe limit-as doesn't work correctly | 00:49 |
| cardoe | I haven't had the cycles to chase it down specifically. | 01:40 |
| cardoe | The other issue we've got is, let's say for glance there are 10 workers in a process. Someone kicks off a bunch of image uploads and that one pod gets 10 of those. Now there's nothing answering the k8s health check since it doesn't have any out-of-band health check, so it fails health checks. k8s will restart the pod which, fine, do it after the workers finish. nah. let's stop calling epoll() and drop everything on the ground. | 01:42 |
| frickler | fun. though in that case I'd argue that the k8s check is bogus | 06:19 |
| sean-k-mooney | clarkb: gunicorn is a good drop-in replacement for our current use of uwsgi | 07:49 |
| sean-k-mooney | clarkb: i have been putting off trying to swap devstack to it because of other things on my plate but that is the immediate replacement i see for us | 07:50 |
| sean-k-mooney | i think with oslo.wsgi we have the opportunity to recommend a more modern and maintained stack | 07:51 |
| mnasiadka | sean-k-mooney: last time I tried using gunicorn with Nova I had problems with Nova properly parsing CLI arguments - but that was like two cycles ago | 08:29 |
| mnasiadka | But yes, I agree we should have a supported alternative for uwsgi - that also can support ASGI | 08:31 |
| sean-k-mooney | mnasiadka: that's because using cli arguments is not technically supported by the wsgi spec and is something you should not do | 08:33 |
| sean-k-mooney | support for it in other wsgi servers is all non-standard and not interoperable | 08:33 |
| sean-k-mooney | in general you do not need to use cli args for nova's api | 08:34 |
| mnasiadka | Well, we only use CLI arguments to point to config files, so I guess that’s not a problem | 08:34 |
| mnasiadka | But at that point in time we started moving everything to a "standard" Ansible role that generates config for uWSGI - since Devstack uses that and we didn’t want to reinvent the wheel | 08:34 |
| sean-k-mooney | ya so if you put the files in the default location it will just work and you may also be able to set the path via env vars | 08:35 |
| sean-k-mooney | i think i added that but i would need to check | 08:35 |
| mnasiadka | But the good thing is that it’s now easy to add support for gunicorn - once devstack uses that for testing and we’re sure we’re not going to send operators to a bad place with using something else than uWSGI | 08:36 |
| sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/api/openstack/wsgi_app.py#L43-L51 | 08:36 |
| sean-k-mooney | so ya https://github.com/openstack/nova/commit/73fe84fa0ea6f7c7fa55544f6bce5326d87743a6 | 08:37 |
| sean-k-mooney | it turns out that the changes i made were not really required in the end because we already supported OS_NOVA_CONFIG_DIR and oslo.config already will search for a /etc/nova/nova.conf.d/ directory and load all files there | 08:38 |
| sean-k-mooney | but this gives you a little more control if you really need it | 08:38 |
| cardoe | frickler: what other check other than seeing if the pod is still listening on the port it's supposed to be servicing would make sense? | 13:22 |
| cardoe | sean-k-mooney: so for OpenStack Helm I've gotta pass paths right now and I've gotta come up with a better answer. The helm chart generates /etc/nova/nova.conf, the user can mount overrides into /etc/nova/nova.conf.d/, the init containers generate stuff into /tmp/nova/nova.conf.d/ | 13:26 |
| sean-k-mooney | cardoe: if you're putting them in /etc/nova and /etc/nova/nova.conf.d | 13:32 |
| cardoe | They're read-only in the pod | 13:33 |
| sean-k-mooney | you do not need to pass anything to nova to read those | 13:33 |
| sean-k-mooney | yep that is fine | 13:33 |
| sean-k-mooney | we do the same | 13:33 |
| cardoe | So the init container will write a config snippet that'll have like myip set. | 13:33 |
| sean-k-mooney | your issue is that you can't copy the content from /tmp to them | 13:33 |
| cardoe | Yes. /etc is read-only. | 13:34 |
| sean-k-mooney | ya it probably should not be | 13:34 |
| sean-k-mooney | myip does not need to be set by the way | 13:34 |
| cardoe | Bad example then. I dunno what the nova one does. | 13:34 |
| sean-k-mooney | it can be but you only need to set it to override the default | 13:34 |
| cardoe | It's a projected volume so it's gonna be read-only in the pod. | 13:35 |
| sean-k-mooney | projected from a config map? | 13:35 |
| cardoe | one or more config maps and one or more secrets | 13:35 |
| sean-k-mooney | what we do is we use kolla's init system so we project the readonly config map to /var/lib/openstack/nova/config and then copy it to /etc/nova.* | 13:36 |
| cardoe | We did it this way so that the user can have restart-less changes like debug level by editing the configmap. | 13:37 |
| sean-k-mooney | that still needs a SIGHUP | 13:37 |
| sean-k-mooney | to the pod to have it take effect | 13:37 |
| sean-k-mooney | well to the nova process | 13:37 |
| sean-k-mooney | in any case for the nova api and metadata you can define an environment variable to have it read additional config files or directories | 13:38 |
| cardoe | Yeah but not everything is consistent. | 13:39 |
| sean-k-mooney | sure | 13:39 |
| cardoe | Not even all services read /etc/$service/$service.conf.d when run via WSGI servers still | 13:40 |
| sean-k-mooney | but using cli options is not wsgi compliant | 13:40 |
| cardoe | I don't disagree. | 13:40 |
| sean-k-mooney | cardoe: they should if they are using oslo.config | 13:40 |
| sean-k-mooney | that is done automatically | 13:40 |
| cardoe | It's not | 13:40 |
| sean-k-mooney | it is if they have initialized the oslo.config properly. so swift won't | 13:41 |
| cardoe | It's done automatically by reading /proc/self/procname | 13:41 |
| cardoe | They have to call oslo.config and pass project="nova" or whatever project it is. | 13:41 |
| sean-k-mooney | yep or have procname set properly | 13:41 |
| cardoe | I've been submitting patches the past 2 cycles | 13:41 |
| cardoe | Then some of them parse sys.argv (cause oslo.config defaults to that) of what the WSGI server had sent to it. | 13:44 |
| sean-k-mooney | yep nova will do that if it can | 13:44 |
| sean-k-mooney | but again that's not standards compliant from a pure wsgi point of view | 13:44 |
| sean-k-mooney | but it works for uwsgi and maybe apache | 13:45 |
| cardoe | yeah except when the args aren't intended for nova but the wsgi server itself and the flag is the same for nova and the wsgi server but with a different value or meaning | 13:45 |
| sean-k-mooney | we are unfortunately still using apache https://github.com/openstack-k8s-operators/nova-operator/blob/main/templates/novaapi/config/httpd.conf but i'm hoping we can eventually use something lighter weight | 13:45 |
| sean-k-mooney | we do at least use apache for tls termination so it's not entirely unhelpful | 13:46 |
| sean-k-mooney | cardoe: the reason that command line args were ever supported was really because of the eventlet webserver and the associated console script | 13:48 |
| sean-k-mooney | since we were shipping a wsgi application and its hosting server as a single entry point we could forward arguments to the application | 13:49 |
| cardoe | yeah it makes sense | 13:49 |
| cardoe | Just the number of rando bug reports for starting up a process over the past few cycles has been surprising to me. | 13:50 |
| sean-k-mooney | for gunicorn you can pass -n to set the application name for what it's worth | 13:50 |
| sean-k-mooney | https://docs.gunicorn.org/en/latest/run.html#commonly-used-arguments | 13:50 |
| cardoe | OpenStack Helm lets the user pick their own for uwsgi and gunicorn so that's part of the problem. | 13:50 |
| sean-k-mooney | ack, i think kolla semi-recently moved from apache2 to uwsgi in the last 2-3 years | 13:51 |
| sean-k-mooney | i think they only support one however now | 13:51 |
| sean-k-mooney | as in they decided they were porting and only supported both as a temporary measure | 13:51 |
| sean-k-mooney | although i could be wrong | 13:52 |
| sean-k-mooney | oh they still have both https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova/templates/nova-api.json.j2 | 13:52 |
| cardoe | as far as the HUP, I've done something ghetto in a few services which is touch a file early on. Then with bash if the modification time of any config is newer, touch the magic file again and send PID 1 a HUP. | 13:54 |
| cardoe | Since k8s doesn't have a signaling path | 13:54 |
| sean-k-mooney | cardoe: ya so for GMR you can just use a path based trigger | 13:55 |
| sean-k-mooney | there is no reason oslo.config could not support it for config reloading | 13:55 |
| cardoe | GMR? | 13:56 |
| sean-k-mooney | Guru meditation reports | 13:57 |
| sean-k-mooney | https://docs.openstack.org/oslo.reports/latest/user/history.html#id38 | 13:57 |
| sean-k-mooney | in 1.11.0 oslo.reports gained the ability to trigger on file modification events | 13:58 |
| sean-k-mooney | there is no reason that oslo.config could not do the same in principle | 13:58 |
| sean-k-mooney | either on a specific file or by watching all of the config files for changes | 13:58 |
| sean-k-mooney | in our downstream installer we configure gmr to trigger on touching /var/lib/nova | 13:59 |
| sean-k-mooney | https://github.com/openstack-k8s-operators/nova-operator/blob/main/templates/nova.conf#L378 | 13:59 |
| sean-k-mooney | i think that uses inotify or similar behind the scenes | 14:00 |
| cardoe | Yeah | 14:05 |
| cardoe | That would make sense | 14:05 |
| sean-k-mooney | for what it's worth we do have a similar desire to allow some fields to update without a restart, mainly for password rotation, that sort of thing | 14:09 |
| sean-k-mooney | right now we hash the config map and do a full pod restart anytime the content changes | 14:09 |
| sean-k-mooney | but long term it would be nice if the reload was just built into oslo.config | 14:10 |
| cardoe | Yeah that would make sense. | 14:16 |
| cardoe | So hashing the configmap and restarting is what most of the OpenStack Helm (OSH) stuff does but trying to make it possible to change some things without that. | 14:16 |
| -opendevstatus- | NOTICE: Zuul job log URLs for storage.*.cloud.ovh.net are temporarily returning an access denied/payment required error, but the provider has been engaged and is working to correct it | 14:50 |
| clarkb | cardoe: you could have uwsgi listen on a second port that isn't exposed maybe? I'm not arguing for uwsgi but usually there are workarounds like that with webservers in particular | 15:44 |
| cardoe | clarkb: all http-sockets run in the same loop so it won't service anymore | 17:21 |
| cardoe | The best approach that folks have suggested is to talk to it via the master-fifo which is a named pipe. They're not wrong. | 17:22 |
| cardoe | uWSGI 2.1 will bring that master-fifo as a regular socket... but that version's been canceled. | 17:23 |
| clarkb | interesting. Looks like there is also snmp support | 17:24 |
| clarkb | but also looks like sigint is expected to be less bad than sigterm. Not sure if the differences are meaningful for say the glance upload problem | 17:24 |
| cardoe | I will say one of my bigger complaints with that glance problem was finally fixed in 2.0.31 so less of an argument out of me now. | 17:30 |
| cardoe | The fix that sat in their PR queue for that was merged 30 days ago. | 17:33 |
| cardoe | It had been authored by someone years ago. | 17:33 |
| cardoe | It had been brought up on openstack-discuss a number of times. | 17:33 |
| cardoe | My only point is that it seems very very passively maintained and I don't want to see us tied to another thing that we're going to struggle to migrate away from like eventlet in the future. | 17:35 |
| cardoe | The "official" source download URL had its SSL certificate expired for nearly 3 weeks when I brought this up in the TC before. OpenStack projects switched to fetching it from files.pythonhosted.org | 17:36 |
| clarkb | the main issue opendev has run into is that the compilation is not reliable on arm64. We ended up dropping arm64 as a workaround for uwsgi container image builds. But I also have some changes up to switch to a different wsgi server | 17:37 |
| clarkb | cardoe: I thought openstack consumes uwsgi from distro packages? | 17:37 |
| clarkb | anyway I'm the last person that would advocate for uwsgi. I'm actively trying to remove it from opendev | 17:37 |
| cardoe | There were a couple things fetching it and building it. | 17:38 |
| clarkb | granian is what I am experimenting with because it supports rsgi and asgi too so seems flexible. But frickler rightly called out it too appears to be maintained by a single person and could go the way of uwsgi quickly (though I think it is far more actively maintained today) | 17:39 |
| cardoe | yeah that sounds more flexible for the future. | 17:41 |
| cardoe | uwsgi also only has commits from one person cause the commercial entity behind it seems to be gone and their company website is just a redirect to the uwsgi github project page. | 17:42 |
| cardoe | That one committer also very clearly works at a company that's doing AI stuff around opentelemetry | 17:42 |
| clarkb | yes I think that is why uwsgi is on life support now. granian is strictly better but may not be appropriate for openstack | 17:42 |
| cardoe | When the entity behind uwsgi used to be a web hosting company. | 17:43 |
| clarkb | gunicorn was mentioned earlier and is probably more similar to what we thought uwsgi was | 17:43 |
| cardoe | I'm still using uwsgi as well. I've not gone to anything else. | 17:43 |
| clarkb | I've just sent email about this, but I think we often confuse/conflate decisions made to achieve goals as static entities. Having a goal along the lines of "Openstack should be compatible with a well tested and performant wsgi server" is probably still a good goal to have. Uwsgi being that server is probably no longer addressing the goal. | 17:48 |
| clarkb | all that to say I think it might be healthy for us to shift to a mindset where we consider goals first and whether or not the goals are still relevant and then whether or not the decisions we've made are still in alignment with those goals and adjust accordingly when making big changes | 17:48 |
| clarkb | when all we see is "uwsgi bad" or "prefer pytest" it is really easy to overlook the reasons that we may do something one way or another in the first place and create new unexpected problems when implementing changes unilaterally or without broader consideration for community needs. FWIW I think the uwsgi replacement process has been collaborative and open in the community so may | 17:50 |
| clarkb | be a bad example | 17:50 |
| clarkb | but also I think if we become more open to reevaluation then just like I can git revert a commit that goes sideways we build in an expectation of more agility in the first place. Which for a decision like "should we allow pytest" is probably super low risk to revert or pivot later | 17:52 |
| sean-k-mooney | clarkb: nice email by the way :) | 18:23 |
| sean-k-mooney | clarkb: on uwsgi replacement i'm not actually aware of any real effort yet to move off uwsgi | 18:24 |
| sean-k-mooney | we have talked about it | 18:24 |
| clarkb | sean-k-mooney: ya I think most of the effort has been in evaluating potential alternatives | 18:24 |
| sean-k-mooney | but i don't think anyone has actually made concrete proposals to do it other than using apache | 18:24 |
| sean-k-mooney | the wsgi server in general should be replaceable | 18:25 |
| gouthamr | > clarkb: nice email by the way :) | 18:25 |
| gouthamr | ++ | 18:25 |
| sean-k-mooney | so i think this is more a case of PoCing it, adding support to devstack and just choosing a new server | 18:25 |
| clarkb | yes, though the code you write to glue the wsgi server to the application sometimes differs by wsgi implementation. https://review.opendev.org/c/opendev/system-config/+/944806/5/playbooks/roles/lodgeit/templates/docker-compose.yaml.j2 sort of points at that | 18:26 |
| clarkb | I suspect that is actually part of the issue here as we've long tried to ship that glue as a one size fits all (at least for uwsgi) | 18:26 |
| sean-k-mooney | ya so we used to use pbr to standardise that | 18:26 |
| sean-k-mooney | but obviously that's not going to be a thing we continue to do | 18:27 |
| clarkb | ya pressure from both sides forcing us to give up on that | 18:27 |
| sean-k-mooney | we could perhaps centralise that in oslo.wsgi if we were to create that | 18:27 |
| clarkb | which is maybe a good sign we shouldn't bother | 18:27 |
| sean-k-mooney | ya maybe, i think for the most part the binding we use | 18:28 |
| sean-k-mooney | won't really vary from service to service | 18:28 |
| sean-k-mooney | but it can from server to server so even having an example for each might be enough | 18:28 |
| mnaser | from an operator side, it "shouldn't" matter -- but it ends up mattering somehow. for example, neutron-server had some stuff where out of the box you couldn't just .. switch to wsgi because the old eventlet based server started up other tasks in the background/etc | 18:28 |
| mnaser | i am gonna guess switching to uwsgi helped us uncover all of that mess | 18:29 |
| sean-k-mooney | well neutron was a bit special | 18:29 |
| sean-k-mooney | because they didnt have a pure api | 18:29 |
| mnaser | yeah so i think we are past that period, so having a pure wsgi entrypoint should be easy now | 18:29 |
| sean-k-mooney | the neutron server was both the rest api and the conductor of long running and periodic tasks | 18:29 |
| mnaser | it would be a matter of flipping it from one server to another.. | 18:29 |
| sean-k-mooney | right but it should be now | 18:29 |
| sean-k-mooney | they have now actually split the wsgi app and the rpc/conductor process | 18:30 |
| mnaser | and now i guess it can/could be as easy as a configure_wsgi api in devstack that would simply (maybe) use a different server depending on a localrc config | 18:31 |
| mnaser | and it should just work(tm) | 18:31 |
| sean-k-mooney | now that they did the engineering work to separate it out then ya | 18:31 |
| sean-k-mooney | i'm not sure if there are other projects that don't have the separation however that would have to do the same exercise | 18:32 |
| sean-k-mooney | we have seen glance, watcher and others all have to fix the fact that when they ran under the eventlet server | 18:32 |
| sean-k-mooney | that they could do long running background tasks | 18:32 |
| sean-k-mooney | that didn't actually fit with the wsgi request lifecycle | 18:33 |
| mnaser | but that means that in theory they already dont work "with uwsgi" anyways | 18:33 |
| sean-k-mooney | correct | 18:33 |
| sean-k-mooney | they did not | 18:33 |
| sean-k-mooney | glance had many features that were just flat out broken | 18:33 |
| sean-k-mooney | it took a while to fix it | 18:33 |
| sean-k-mooney | neutron didn't really support running under uwsgi at all | 18:34 |
| sean-k-mooney | or apache really | 18:34 |
| sean-k-mooney | the eventlet removal caused many projects to reevaluate their architecture and fix things like this | 18:34 |
| sean-k-mooney | here is an example that we are fixing in watcher https://github.com/openstack/watcher/blob/master/watcher/api/scheduling.py | 18:35 |
| clarkb | and going back to the big picture I think what I'm ultimately advocating for is a more proactive approach to identifying issues that we may face in the future with the decisions we've made in the past so that we can more proactively take action before it is super painful when you start making quick changes out of necessity | 18:36 |
| sean-k-mooney | +1 | 18:36 |
| sean-k-mooney | not having all these decisions being urgent because we need to address them would be a nice change | 18:37 |
| fungi | the mailman community seems to prefer gunicorn overall, even though the container images we base our deployment on still rely on uwsgi | 19:22 |
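
The mtime-based reload trick cardoe describes at 13:54 (touch a marker file early, compare config modification times against it, then touch the marker again and HUP PID 1) could be sketched roughly as follows. This is a Python re-expression of what was described as a bash loop, not the actual OpenStack Helm code; the function names `newest_mtime` and `reload_if_changed`, the marker path, and the injectable `send` parameter are all assumptions made for illustration.

```python
import os
import signal
from pathlib import Path


def newest_mtime(paths):
    """Return the newest modification time among the paths that exist."""
    mtimes = [p.stat().st_mtime for p in map(Path, paths) if p.exists()]
    return max(mtimes, default=0.0)


def reload_if_changed(marker, config_paths, pid=1, send=os.kill):
    """Send SIGHUP to `pid` when any config file is newer than `marker`.

    `marker` is the hypothetical "magic file" touched early on.
    Returns True if a signal was sent, False otherwise.
    """
    marker = Path(marker)
    if not marker.exists():
        # First run: record a baseline and do not signal.
        marker.touch()
        return False
    if newest_mtime(config_paths) <= marker.stat().st_mtime:
        return False
    marker.touch()            # remember that this change has been handled
    send(pid, signal.SIGHUP)  # ask the service to re-read its config
    return True
```

A sidecar or wrapper script would call `reload_if_changed` periodically, for example once every few seconds. Polling modification times avoids any dependency on inotify, at the cost of a delay of up to one polling interval before the HUP is sent.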