opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra spec for sound device. https://review.opendev.org/c/openstack/nova/+/926126 | 05:23 |
opendevreview | Michael Still proposed openstack/nova master: Protect older compute managers from sound model requests. https://review.opendev.org/c/openstack/nova/+/940770 | 05:23 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection. https://review.opendev.org/c/openstack/nova/+/927354 | 05:23 |
zigo | haleyb: Thanks. | 08:06 |
zigo | haleyb: Can you give me your email address and the one of Billy (wolsen) please ? | 08:06 |
zigo | haleyb: I would welcome you guys to join #debian-openstack, and #debian-openstack-commits, which is where I discuss OpenStack packaging. | 08:07 |
opendevreview | Arnaud Morin proposed openstack/nova master: Fix limit when instances are stuck in build_requests https://review.opendev.org/c/openstack/nova/+/947804 | 09:45 |
opendevreview | Elod Illes proposed openstack/osc-placement stable/2025.1: Add bindep.txt for ubunutu 24.04 support https://review.opendev.org/c/openstack/osc-placement/+/948867 | 12:06 |
gibi | sean-k-mooney[m] dansmith FYI: the first real complication with threading https://review.opendev.org/c/openstack/oslo.service/+/945720/comments/fb5d6632_f0eaf102 | 12:18 |
elodilles | hi stable maintainers, may i get a +2+W for this simple clean cherry pick that fixes osc-placement's stable/2025.1 gate? o:) https://review.opendev.org/c/openstack/osc-placement/+/948867 | 12:20 |
sean-k-mooney | gibi: that's not really related to threading | 12:21 |
sean-k-mooney | that's related to multiprocessing and how cotyledon handles workers | 12:22 |
gibi | sean-k-mooney: the python doc says do not mix threading with os.fork | 12:22 |
gibi | we do | 12:22 |
gibi | and I see the bad results of it | 12:22 |
sean-k-mooney | well we were not but we are now | 12:23 |
gibi | if I force os.spawn as suggested then I see the pickling error due to a lambda in oslo.config | 12:23 |
sean-k-mooney | so even without the eventlet removal work | 12:23 |
sean-k-mooney | if we enable the cotyledon backend it would fail in the same way | 12:24 |
sean-k-mooney | at least in the places where we spawn threads today, if they interacted with a thread pool | 12:25 |
sean-k-mooney | im not sure if we really should be using worker processes instead of worker threads | 12:26 |
gibi | we have a limited number of thread pools and those are started in the worker process, which I guess makes them a non-problem. Now that our default pool is also threaded (a real ThreadPoolExecutor), and that is started in the master process, os.fork becomes a problem | 12:27 |
gibi | I will see if the only pickling error is the oslo.config lambda, and if I can patch that out, but if not then we have a nice complication on our hands. | 12:28 |
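For illustration, a minimal sketch (not nova code) of the fork-vs-thread-pool hazard being described here: fork() copies a pool's bookkeeping (queues, locks, thread list) into the child, but not its worker threads, so work submitted in the child can hang forever.

```python
# Minimal sketch of the hazard, assuming nothing nova-specific: a pool
# warmed up in the parent is inherited by the fork()ed child, but its
# worker thread is not, so the child's submit() never gets serviced.
import os
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as PoolTimeout

pool = ThreadPoolExecutor(max_workers=1)
pool.submit(time.sleep, 0.1).result()  # warm the pool: one worker thread now exists

pid = os.fork()  # CPython 3.12+ even warns about fork() while threads are running
if pid == 0:
    # Child: the copied thread list still looks "full", so submit() does
    # not start a replacement worker and nothing drains the work queue.
    try:
        pool.submit(print, "never runs").result(timeout=5)
        os._exit(0)
    except PoolTimeout:
        os._exit(1)
os.waitpid(pid, 0)
```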
sean-k-mooney | well my point is i think this is perhaps a bug on oslo.service | 12:28 |
sean-k-mooney | since the new backend does not behave the same as the old ProcessLauncher | 12:29 |
gibi | it is not a bug it is a behavior of os.fork. The child inherits the parent's state | 12:29 |
sean-k-mooney | although maybe it is the same https://github.com/openstack/oslo.service/blob/master/oslo_service/backend/eventlet/service.py#L556 | 12:29 |
sean-k-mooney | gibi: sure but the fork is meant to happen before we ever create any of the executors | 12:30 |
gibi | nope | 12:30 |
gibi | let me link it to you... | 12:30 |
sean-k-mooney | the thread pools are not initialised until their first use, right? | 12:31 |
gibi | right | 12:31 |
gibi | this is the point of the fork https://github.com/openstack/nova/blob/a5bcaf69b1a80d4d02fe092900471a6e7a28e292/nova/cmd/scheduler.py#L51 | 12:31 |
sean-k-mooney | which happens after the fork if there is one | 12:31 |
gibi | but Service.create() is before that | 12:31 |
gibi | and that already depends on our executors and therefore initializes them | 12:31 |
sean-k-mooney | which threadpool is causing the error | 12:31 |
sean-k-mooney | is it the oslo one or one of the ones we are creating | 12:32 |
gibi | in the scheduler startup both the default and the scatter-gather pool are initialized before the fork by Service.create() | 12:32 |
sean-k-mooney | ok i would not expect either to be used at this point | 12:33 |
sean-k-mooney | the service create is going to hit cell0 | 12:33 |
sean-k-mooney | for the scheduler at least | 12:33 |
gibi | Service.create calls https://github.com/openstack/nova/blob/a5bcaf69b1a80d4d02fe092900471a6e7a28e292/nova/service.py#L258 which calls scatter-gather | 12:33 |
sean-k-mooney | ah i see | 12:34 |
gibi | the default pool is a bit more complicated but also used for the Scheduler's async init | 12:34 |
sean-k-mooney | ok so we have a few options to address that i guess. | 12:35 |
sean-k-mooney | we could move all this init logic later, into the workers | 12:35 |
sean-k-mooney | you mentioned something about not using os.fork and using spawn? | 12:35 |
gibi | sean-k-mooney: it depends on if we can kill the master from the worker in a consistent way. the raise_if_old_compute should stop the master process | 12:36 |
gibi | sean-k-mooney: we can force os.spawn | 12:36 |
gibi | but that hits a pickling error when it spawns the worker | 12:36 |
sean-k-mooney | do we have the equivalent of a thread join for processes | 12:36 |
gibi | the interface exists, I'm not sure about the semantics | 12:37 |
sean-k-mooney | i.e. can we have the master wait for all the child process to exit | 12:37 |
gibi | also current default behavior is that if the worker dies the master creates a new worker | 12:37 |
sean-k-mooney | i would not expect that but ok | 12:37 |
sean-k-mooney | i assume that's handled in oslo.service? | 12:38 |
gibi | either oslo.service or cotyledon | 12:38 |
gibi | one of those | 12:38 |
gibi | I saw the logic once so I can find it again if needed | 12:38 |
elodilles | (thanks gibi for the +2+W o/) | 12:38 |
gibi | but bottom line, in some cases we want to respawn the worker, but in other cases we want to kill the master | 12:39 |
gibi | so we need an active information flow from worker to master | 12:39 |
sean-k-mooney | i do not really expect either to take actions like that. if things die i expect that to propagate up and systemd or whatever process is managing nova would handle that | 12:39 |
gibi | to influence the logic, if we move the version check to the worker | 12:39 |
gibi | systemd handles the master process | 12:39 |
opendevreview | Arnaud Morin proposed openstack/nova master: Fix limit when instances are stuck in build_requests https://review.opendev.org/c/openstack/nova/+/947804 | 12:39 |
gibi | so if master dies the systemd restarts that | 12:40 |
gibi | but if a single worker dies, killing the rest of the workers and then letting the master die so that systemd restarts it seems dangerous without a graceful shutdown | 12:40 |
sean-k-mooney | right and i think the master process should just spawn the child processes and wait for them to exit. any recreation of the child processes i would expect to be handled in nova | 12:40 |
sean-k-mooney | its tricky however. | 12:41 |
sean-k-mooney | is this one of the known open issues for the eventlet remove | 12:41 |
sean-k-mooney | *removal | 12:41 |
gibi | I can a) try to patch oslo.config to support os.spawn or b) try to re-init the executors in the worker if I can detect that I'm a worker or c) move the version check to the worker and try to find a way to signal the master not to respawn but to exit | 12:43 |
gibi | I linked the issue to the #openstack-eventlet-removal as well so maybe Herve will have some ideas too | 12:44 |
sean-k-mooney | looking at the eventlet oslo service code | 12:44 |
sean-k-mooney | the respawn logic was hardcoded | 12:45 |
sean-k-mooney | so while i dont really expect that to happen it seems like it's how it has always worked | 12:45 |
stephenfin | dansmith: Could you clarify what we're trying to say about nova-metadata-wsgi under local/global deployments here? https://docs.openstack.org/nova/latest/admin/cells.html#nova-metadata-api-service | 12:45 |
sean-k-mooney | gibi: we do have another option | 12:45 |
sean-k-mooney | which is to not use oslo to create the workers but to spawn them on a process pool ourselves | 12:46 |
stephenfin | Are we simply saying that `[api] local_metadata_per_cell` should be false for global and true for local? Because with the removal of the eventlet server, there's no way to run the metadata API in the same service as the compute (REST) API now so "standalone" doesn't make much sense | 12:47 |
sean-k-mooney | gibi: that would allow us to get back futures from the process pool and we can then catch the exception and decide what to do based on that | 12:47 |
stephenfin | (because it's always "standalone") | 12:47 |
gibi | yeah that is option d). But I guess other projects will have similar problems so a nova-local solution is not a nice one | 12:47 |
sean-k-mooney | gibi: well option e) would be to do d) in oslo | 12:48 |
gibi | sean-k-mooney: :) | 12:48 |
gibi | stephenfin: you are probably correct | 12:48 |
sean-k-mooney | gibi: neutron may have some ideas, or we could look at one of the services that has been using cotyledon or whatever it is for years | 12:48 |
gibi | yeah | 12:49 |
sean-k-mooney | stephenfin: local metadata per cell should be set to true if you have configured the metadata agent deployed by neutron to use the local nova metadata api endpoint | 12:49 |
sean-k-mooney | stephenfin: but i think what the doc is saying is | 12:50 |
stephenfin | > The nova metadata API service must not be run as a standalone service, using the nova-metadata-wsgi service, in this case. | 12:50 |
sean-k-mooney | if you do that you need to deploy the dedicated nova-metadata-api wsgi app | 12:51 |
sean-k-mooney | im not sure why the combined one could not work however | 12:51 |
stephenfin | There's no combined one | 12:51 |
stephenfin | Not in WSGI. Only eventlet (now removed) | 12:51 |
sean-k-mooney | oh then that would be why i guess | 12:51 |
sean-k-mooney | ack | 12:51 |
gibi | yepp the combined one was eventlet only | 12:52 |
gibi | in the nova-operator we always start the metadata wsgi as separate pod either on top or on cell level | 12:52 |
sean-k-mooney | ok i guess we were just trying to say if you want to run per-cell metadata you need to deploy additional metadata apis (at least 1 per cell) | 12:53 |
stephenfin | gibi: tbc, when you say on top or cell-level you're referring to the global and local deployment topologies described in that doc, yeah? | 12:53 |
gibi | yeapp | 12:53 |
sean-k-mooney | yep | 12:53 |
stephenfin | okay, sweet | 12:53 |
gibi | stephenfin: do you see a need for clarifying things in our doc? | 12:53 |
stephenfin | I think so. The sentence I quoted above is confusing IMO | 12:54 |
stephenfin | > The nova metadata API service must not be run as a standalone service, using the nova-metadata-wsgi service, in this case. | 12:54 |
stephenfin | To me, that says "you cannot use nova-metadata-wsgi" service if deploying in a global configuration | 12:54 |
sean-k-mooney | that was not true even before we removed the combined one | 12:55 |
sean-k-mooney | but now it can be simplified | 12:55 |
stephenfin | and since we no longer provide nova-metadata (i.e. the eventlet one), that would suggest global deployments weren't possible anymore | 12:55 |
sean-k-mooney | we only have the wsgi service on master | 12:55 |
sean-k-mooney | and only the split endpoints | 12:55 |
stephenfin | yeah, hence my concern | 12:55 |
stephenfin | but it sounds like it's just worded weirdly | 12:55 |
opendevreview | Merged openstack/osc-placement stable/2025.1: Add bindep.txt for ubunutu 24.04 support https://review.opendev.org/c/openstack/osc-placement/+/948867 | 12:55 |
sean-k-mooney | we used to have separate console scripts for nova-api and nova-api-metadata | 12:55 |
stephenfin | and I can rephrase like so | 12:55 |
sean-k-mooney | i think gibi dropped the references to the console scripts when they were removed | 12:56 |
stephenfin | > The ``api.local_metadata_per_cell`` option must be set to ``False`` | 12:56 |
sean-k-mooney | but kept the verbiage related to the pbr wsgi scripts | 12:56 |
stephenfin | Or drop the sentence entirely | 12:56 |
gibi | yeah I tried to clean up the doc when the eventlet server was removed but probably missed things | 12:56 |
sean-k-mooney | it needs to be set based on the topology | 12:57 |
gibi | yeah, if the metadata is deployed globally (top level) the local_metadata_per_cell needs to be false, but when metadata is deployed to each cell then it needs to be true | 12:57 |
sean-k-mooney | it's technically optional to set this to true for the per-cell deployment, by the way. it defaults to false | 12:58 |
sean-k-mooney | meaning if you deploy per-cell metadata they can look up metadata for other cells if they have access to the api db to do so | 12:58 |
gibi | sean-k-mooney: it depends, if the local metadata has cell db access then sure it can be set to false | 12:58 |
gibi | yeah that | 12:58 |
gibi | sean-k-mooney: cotyledon has the worker respawn implemented here https://github.com/sileht/cotyledon/blob/be444189de32a8c29c7107a9b02da44248a7e64a/cotyledon/_service_manager.py#L254-L256 | 12:59 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#api.local_metadata_per_cell is basically a configuration to prevent that cross-cell lookup | 12:59 |
gibi | yepp | 13:00 |
gibi | option b) aka re-initing the executor after the fork works, based on the process name stored on the executor compared to the current process name. But this is an ugly and nova-only solution. Fortunately we will move all our spawns to executors so at least it is easily applicable for now | 13:14 |
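For reference, a rough sketch of what option b) amounts to, with illustrative names only (the actual nova change, linked below at 14:19, compares process names rather than PIDs):

```python
# Rough sketch of option b): lazily rebuild a module-level executor when
# the current process differs from the one that created it, i.e. after an
# os.fork(). _EXECUTOR/_EXECUTOR_PID are illustrative, not nova names.
import os
import futurist

_EXECUTOR = None
_EXECUTOR_PID = None


def default_executor():
    global _EXECUTOR, _EXECUTOR_PID
    if _EXECUTOR is None or _EXECUTOR_PID != os.getpid():
        # First use, or a stale pool inherited across a fork: build a
        # fresh pool owned by the current process.
        _EXECUTOR = futurist.ThreadPoolExecutor(max_workers=10)
        _EXECUTOR_PID = os.getpid()
    return _EXECUTOR
```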
opendevreview | Merged openstack/osc-placement stable/2025.1: Update .gitreview for stable/2025.1 https://review.opendev.org/c/openstack/osc-placement/+/943757 | 13:38 |
dansmith | gibi: that move to spawn soon is surprising to me, as I suspect it will be hella slow, especially for openstack code | 14:02 |
dansmith | gibi: I'm not sure where we care about process workers, other than the api services in standalone mode.. in wsgi mode we shouldn't be forking at all, right? | 14:02 |
gibi | we don't spawn much, just at the start to get the worker processes (or when they die) | 14:02 |
gibi | dansmith: this is scheduler. The default behavior of oslo.service is to fork the workers | 14:03 |
dansmith | oh I guess conductor needs process workers too | 14:03 |
gibi | conductor too | 14:03 |
dansmith | do we need to do that for scheduler though? we did under eventlet for parallelism, but I'm not sure either does going forward | 14:04 |
gibi | threading in python 3.12 is still limited by the GIL, so having a way to spawn processes for scaling makes sense, as that way we can saturate more CPU cores if needed | 14:05 |
dansmith | sure, but there are lots of multithreaded python programs providing reasonable performance with the GIL :) | 14:06 |
gibi | Im not 100% sure but I feel that oslo.service with worker=1 still forks a worker proc from the master proc | 14:06 |
dansmith | I suspect neither are cpu-intensive enough to really need full parallelism | 14:06 |
dansmith | okay | 14:06 |
gibi | but I think I understand you, we might not need to fork, just use a single proc with big enough thread pools. I'm not sure this is something that's supported by oslo.service out of the box. | 14:09 |
gibi | could be an improvement request | 14:09 |
gibi | to provide alternative way to avoid the fork | 14:09 |
dansmith | yeah, just might be worth consideration | 14:10 |
dansmith | conductor especially I suspect is mostly waiting on mysql and rabbit connections anyway.. scheduler *might* do enough work traversing lots of host objects or something, but I suspect not anymore with placement | 14:11 |
gibi | I think we had issues with the NUMA filter where I needed to add caching to help with the execution time | 14:14 |
gibi | so meh | 14:14 |
opendevreview | Merged openstack/osc-placement stable/2025.1: Update TOX_CONSTRAINTS_FILE for stable/2025.1 https://review.opendev.org/c/openstack/osc-placement/+/943758 | 14:17 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Translate scatter-gather to futurist https://review.opendev.org/c/openstack/nova/+/947966 | 14:17 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Use futurist for _get_default_green_pool() https://review.opendev.org/c/openstack/nova/+/948072 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Replace utils.spawn_n with spawn https://review.opendev.org/c/openstack/nova/+/948076 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add spawn_on https://review.opendev.org/c/openstack/nova/+/948079 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move ComputeManager to use spawn_on https://review.opendev.org/c/openstack/nova/+/948186 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move ConductorManager to use spawn_on https://review.opendev.org/c/openstack/nova/+/948187 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make nova.utils.pass_context private https://review.opendev.org/c/openstack/nova/+/948188 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename DEFAULT_GREEN_POOL to DEFAULT_EXECUTOR https://review.opendev.org/c/openstack/nova/+/948086 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make the default executor configurable https://review.opendev.org/c/openstack/nova/+/948087 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Print ThreadPool statistics https://review.opendev.org/c/openstack/nova/+/948340 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: WIP: allow service to start with threading https://review.opendev.org/c/openstack/nova/+/948311 | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM:Run nova-next with n-sch in threading mode https://review.opendev.org/c/openstack/nova/+/948450 | 14:18 |
opendevreview | Stephen Finucane proposed openstack/nova master: setup: Remove pbr's wsgi_scripts https://review.opendev.org/c/openstack/nova/+/902688 | 14:18 |
stephenfin | sean-k-mooney: gibi: Context for my question earlier ^ | 14:18 |
gibi | dansmith: sean-k-mooney: so I think I fixed the fork issue with re-initing the executor in the worker processes https://review.opendev.org/c/openstack/nova/+/947966/8/nova/utils.py#1276 not nice but seems to work. (I noticed the issue when testing slow / never-finishing scatter-gather scenarios locally) | 14:19 |
dansmith | yeah that's not great | 14:21 |
gibi | stephenfin: thanks, I will take a look | 14:21 |
dansmith | gibi: is there any way I could get you to stop putting many lines of comments in the middle of a condition like that? seems like a pattern that has been emerging and it completely breaks my brain when trying to reason about the logic like that :( | 14:21 |
gibi | feel free to leave a comment and I will move the comment | 14:22 |
gibi | I tried to be as close to the condition as possible but I can move it a bit further | 14:22 |
dansmith | gibi: I may have missed context from the above conversation about where this gets initialized so early, but is moving that an option? | 14:28 |
gibi | it is the service version check that happens before the fork | 14:28 |
gibi | and that uses scatter-gather | 14:28 |
gibi | (and also any init that happens at Service.create() today) | 14:29 |
gibi | we can move it but we have no way to signal to the master process that a worker wants to kill the master if the version check fails | 14:29 |
gibi | I need to drop for a bit, I will be back for the nova meeting | 14:29 |
dansmith | ah, I see | 14:29 |
dansmith | seems like we could maybe just disable the pooling there, or just destroy the threadpool at the end of service.create before we return to allow the fork to be clean? | 14:30 |
gibi | disable pooling needs infra as the version check uses the generic scatter-gather. I would not duplicate that | 14:31 |
gibi | destroying the pool before the fork could work | 14:31 |
gibi | probably a bit cleaner than the re-init | 14:31 |
gibi | but | 14:31 |
gibi | I need to test it | 14:32 |
dansmith | for the disable pooling suggestion, I meant something like "use a temporary pool instead of the global one" - I know we need *a* pool there | 14:32 |
dansmith | you could also have a "sequential thread pool" that just runs each cell query synchronously in order instead of the threads maybe too | 14:33 |
dansmith | but yeah, just destroying after we're done with that check would be simple and clean, I suspect | 14:33 |
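A sketch of that "destroy after the check" idea, using a hypothetical helper name:

```python
# Hypothetical sketch of the destroy-before-fork approach: once the
# pre-fork work (the service-version scatter-gather in Service.create) is
# done, shut the global pool down so the fork happens with no live pool
# threads; each worker then lazily rebuilds a process-local pool.
import futurist

_SCATTER_POOL = None  # stand-in for nova's module-level scatter-gather pool


def destroy_scatter_gather_pool():
    global _SCATTER_POOL
    if _SCATTER_POOL is not None:
        _SCATTER_POOL.shutdown(wait=True)  # let in-flight cell queries finish
        _SCATTER_POOL = None
```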
dansmith | I will miss the nova meeting today, btw. see you later | 14:34 |
sean-k-mooney | gibi: if nothing else it confirms that the issue is with the forking and not reinitialising it | 14:36 |
sean-k-mooney | gibi: dansmith i was in a meeting, but would marking the pool as thread-local help | 14:36 |
sean-k-mooney | they would not be shared with the child process, correct. the only problem with that is if work on a thread pool wanted to add something to the same thread pool | 14:37 |
sean-k-mooney | my thinking is we should not have that pattern in general | 14:37 |
sean-k-mooney | so if we mark the module-level thread pool as thread-local then on fork it will be initialised to empty | 14:38 |
sean-k-mooney | and the first attempt to use it will init it in the child process without any sharing of state | 14:38 |
sean-k-mooney | having a way to disable scatter-gather for that first call on startup would also be an option | 14:39 |
sean-k-mooney | dansmith: by "sequential thread pool" you mean a sequential executor, right? like the one we used in tests | 14:40 |
dansmith | yes | 14:41 |
sean-k-mooney | ok futurist already provides that https://github.com/openstack/futurist/blob/master/futurist/_futures.py#L227 | 14:41 |
sean-k-mooney | so that would be an option for the early init | 14:42 |
* dansmith nods | 14:42 |
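For context, the executor linked above runs each submitted call inline in the caller's thread, so no pool threads exist at fork time; a minimal usage sketch:

```python
# Minimal usage sketch of futurist's synchronous executor: submit()
# executes the call immediately in the submitting thread and returns an
# already-completed future, so nothing pool-related survives a fork.
import futurist

executor = futurist.SynchronousExecutor()
future = executor.submit(sum, [1, 2, 3])  # runs inline, same thread
assert future.result() == 6
executor.shutdown()
```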
sean-k-mooney | gibi: i commented with links to the thread-local approach but i have not tested that change; it might be worth trying with a DNM patch or locally | 14:50 |
sean-k-mooney | im just not sure if it's the correct pattern to apply to this type of problem | 14:50 |
sean-k-mooney | i.e. can we convert all module-global state to thread-local storage if we have this problem or would that only work in this case | 14:51 |
sean-k-mooney | we dont really want to have thread-local db engine facades for example | 14:51 |
opendevreview | Stephen Finucane proposed openstack/placement master: setup: Remove pbr's wsgi_scripts https://review.opendev.org/c/openstack/placement/+/919582 | 14:54 |
stephenfin | sean-k-mooney: Think you can remove your -1 on that now? ^ | 14:54 |
sean-k-mooney | probably we had that support in epoxy, right | 14:55 |
sean-k-mooney | so we should be able to remove it now if we wanted to | 14:55 |
stephenfin | Yep | 14:58 |
sean-k-mooney | stephenfin: im still hesitant to remove this entirely for the simple reason that apache mod_wsgi does not support using the module approach supported by uwsgi/gunicorn | 14:58 |
sean-k-mooney | with that said | 14:58 |
sean-k-mooney | can the wsgi module be used directly | 14:58 |
sean-k-mooney | i.e. can you just point mod_wsgi at https://github.com/openstack/placement/blob/master/placement/wsgi/api.py | 14:59 |
sean-k-mooney | that's effectively what was generated by pbr, right | 14:59 |
sean-k-mooney | so for mod_wsgi you would just point to that in site-packages ? | 15:00 |
sean-k-mooney | ok so not quite https://termbin.com/5n98 | 15:02 |
sean-k-mooney | but for the mod_wsgi usecase the answer is yes | 15:02 |
sean-k-mooney | care to call that out in the release note? | 15:02 |
sean-k-mooney | hum | 15:05 |
sean-k-mooney | python3 -c "import placement.wsgi.api; print(placement.wsgi.api.__file__);" | 15:05 |
sean-k-mooney | so because we dont guard that with if __main__ | 15:05 |
sean-k-mooney | that import actually runs some of the code | 15:06 |
sean-k-mooney | stephenfin: anyway im not sure if we need to explain how to locate /opt/stack/placement/placement/wsgi/api.py on the system | 15:07 |
sean-k-mooney | but instead of /opt/stack/data/venv/bin/placement-api they just need to use ^ | 15:07 |
sean-k-mooney | if this was not devstack that would be in /usr/lib/python3/dist-packages/ or /usr/lib/python3/site-packages/ | 15:09 |
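For the mod_wsgi case being discussed, the convention is that the file pointed to exposes a module-level WSGI callable named `application`; a generic sketch (not the actual placement module):

```python
# Generic sketch of what mod_wsgi expects from the file it is pointed at:
# a module-level callable named "application". The placement module above
# builds its application at import time, which is why the bare import at
# 15:05 already runs initialization code.
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from wsgi\n']
```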
Uggla | Nova meeting in ~50 min | 15:10 |
Uggla | Nova meeting in ~10 min, time for you to grab a cup of coffee. | 15:49 |
gibi | If I drink a coffee now then I won't sleep until 2 in the morning | 15:53 |
gibi | sean-k-mooney: the thread-local approach has a problem: if we have a periodic that wants to run a scatter-gather then the thread holding the periodic will see a different scatter-gather pool than the main one | 15:54 |
gibi | so I will look into the destroy pool before fork idea | 15:55 |
sean-k-mooney | i think there is a generic hook we can implement for that | 15:57 |
sean-k-mooney | we have some module-level reset functionality that we use for mutable config | 15:58 |
sean-k-mooney | im thinking of things like atexit() | 15:58 |
sean-k-mooney | there might be a pre/post fork hook we could register to do that | 15:58 |
gibi | https://github.com/sileht/cotyledon/blob/be444189de32a8c29c7107a9b02da44248a7e64a/cotyledon/_service_manager.py#L156 cotyledon has hooks | 16:00 |
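The stdlib hook being reached for here exists as os.register_at_fork() (Python 3.7+); a sketch with a hypothetical reset helper:

```python
# Sketch using the stdlib fork hooks. _POOL and the reset logic are
# hypothetical stand-ins for nova's module-level executors.
import os

_POOL = None


def _drop_pool_in_child():
    # The child inherits a pool whose worker threads do not exist in this
    # process; drop it so first use lazily creates a process-local one.
    global _POOL
    _POOL = None


os.register_at_fork(after_in_child=_drop_pool_in_child)
```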
Uggla | #startmeeting nova | 16:02 |
opendevmeet | Meeting started Tue May 6 16:02:07 2025 UTC and is due to finish in 60 minutes. The chair is Uggla. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:02 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:02 |
opendevmeet | The meeting name has been set to 'nova' | 16:02 |
bauzas | \o | 16:02 |
Uggla | Hello everyone | 16:02 |
Uggla | awaiting a moment for people to join. | 16:02 |
elodilles | o/ | 16:02 |
fwiesel | o/ | 16:02 |
gmaan | o/ | 16:03 |
Uggla | thanks bauzas for last week meeting. | 16:04 |
bauzas | np | 16:04 |
gibi | o/ | 16:04 |
Uggla | #topic Bugs (stuck/critical) | 16:04 |
Uggla | #info No Critical bug | 16:04 |
Uggla | #topic Gate status | 16:05 |
Uggla | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:05 |
Uggla | #link https://etherpad.opendev.org/p/nova-ci-failures-minimal | 16:05 |
Uggla | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status | 16:05 |
Uggla | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:05 |
Uggla | #info Please try to provide meaningful comment when you recheck | 16:05 |
Uggla | If I understood correctly, the gate was blocked by this https://review.opendev.org/c/openstack/nova/+/948392 last week. | 16:06 |
gibi | jepp | 16:06 |
gibi | fix is landed | 16:06 |
Uggla | It is landed now. | 16:06 |
Uggla | So the gate looks good. Please tell me if I'm wrong. | 16:06 |
sean-k-mooney | the cyborg and barbican jobs are still broken i think | 16:06 |
gibi | I have been hit by https://bugs.launchpad.net/glance/+bug/2109428 multiple times recently. Not a blocker but definitely a source of rechecks | 16:06 |
gibi | sean-k-mooney: yepp those non-votings are broken | 16:07 |
gibi | bugs are filed | 16:07 |
sean-k-mooney | i may try and find time to go fix them | 16:07 |
sean-k-mooney | they are both broken on the lack of a pyproject.toml | 16:07 |
gibi | yepp | 16:07 |
Uggla | good to know thx gibi | 16:07 |
gibi | https://bugs.launchpad.net/barbican/+bug/2109584 | 16:07 |
gibi | https://bugs.launchpad.net/openstack-cyborg/+bug/2109583 | 16:07 |
sean-k-mooney | if i can fix it without having to install the service locally i might give it a try | 16:07 |
sean-k-mooney | i can reference those bugs | 16:08 |
gibi | thanks | 16:08 |
gmaan | I think we need to add a pyproject.toml to all projects in openstack otherwise they will slowly break at some point | 16:08 |
gibi | gmaan: I agree | 16:08 |
Uggla | should we track this ? | 16:09 |
sean-k-mooney | gmaan: yes we will | 16:09 |
sean-k-mooney | Uggla: nova is mostly done; stephen and i tried to do this 2 years ago | 16:09 |
sean-k-mooney | to get ahead of things breaking | 16:10 |
sean-k-mooney | so nova and placement are done | 16:10 |
sean-k-mooney | i need to check os-* | 16:10 |
sean-k-mooney | but for the libs we are responsible for it's trivial | 16:10 |
sean-k-mooney | ok os-vif is still pending | 16:11 |
Uggla | any bug / blueprint to refer to this work ? | 16:11 |
sean-k-mooney | ill start working on them and ping folks to review | 16:11 |
gmaan | ++ | 16:11 |
Uggla | sean-k-mooney++ | 16:11 |
Uggla | anything else ? | 16:12 |
Uggla | moving on to next item. | 16:12 |
Uggla | #topic tempest-with-latest-microversion job status | 16:12 |
Uggla | #link https://zuul.opendev.org/t/openstack/builds?job_name=tempest-with-latest-microversion&skip=0 | 16:12 |
Uggla | I have just discussed it with gmann. | 16:12 |
Uggla | gmann is progressing on this periodic job. | 16:13 |
Uggla | gmaan, I let you give a quick status if you wish. | 16:13 |
gmaan | sure | 16:14 |
gmaan | it is not worst than what i suspected, only 23 tests failing | 16:14 |
gmaan | I started fixing those one by one | 16:14 |
gmaan | #link https://review.opendev.org/q/topic:%22latest-microversion-testing%22 | 16:14 |
gmaan | the hypervisor tests are fixed, I will fix a few more today and this week | 16:14 |
sean-k-mooney | do you know if some of them are invalid failure | 16:15 |
gmaan | that is ^^ topic I am adding all changes to, feel free to review/comment | 16:15 |
sean-k-mooney | i.e. the test is depending on an older microversion behavior | 16:15 |
sean-k-mooney | and should be skipped when using latest | 16:15 |
gmaan | sean-k-mooney: not invalid, either we need to fix schema or cap the test with min/max microversions | 16:15 |
sean-k-mooney | ok ya that's actually what i was wondering | 16:15 |
gmaan | sean-k-mooney: yeah, for example hypervisor uptime test should run till 2.87 | 16:15 |
sean-k-mooney | can we express in tempest | 16:15 |
sean-k-mooney | that this test has a max rather than just a min version requirement | 16:16 |
gmaan | yes with 'max_microversion' | 16:16 |
sean-k-mooney | ack | 16:16 |
sean-k-mooney | ah i see, that's how you're fixing it in https://review.opendev.org/c/openstack/tempest/+/948490 | 16:16 |
sean-k-mooney | cool | 16:16 |
gmaan | yeah ^^ | 16:16 |
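A sketch of the capping pattern described here, with a made-up test class (the real fixes are under the gerrit topic linked above):

```python
# Illustrative sketch of capping a tempest test: tempest skips the test
# when the configured microversion range falls outside the class bounds.
# The class and test names are made up for illustration.
from tempest.api.compute import base


class HypervisorUptimeTest(base.BaseV2ComputeAdminTest):
    # The hypervisor uptime API was removed in 2.88, so only run this
    # test up to and including microversion 2.87.
    max_microversion = '2.87'

    def test_show_uptime(self):
        ...
```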
gmaan | sometimes we need to refactor a test but it's not a big deal. | 16:17 |
gmaan | that is all on these, feel free to review; as I am the only active core in tempest I will merge them but keep them open for some time if anyone would like to review | 16:17 |
gmaan | most probably, i will keep all fixes open till the job is green | 16:18 |
Uggla | thx gmaan | 16:18 |
Uggla | #topic Release Planning | 16:19 |
Uggla | #link https://releases.openstack.org/flamingo/schedule.html | 16:19 |
Uggla | The patch about nova deadlines has been merged so I think it is ok. | 16:19 |
Uggla | Please let me know if it is not the case or if something is wrong. | 16:20 |
Uggla | #topic Review priorities | 16:20 |
Uggla | #link https://etherpad.opendev.org/p/nova-2025.2-status | 16:21 |
Uggla | I'd like to progress on openapi and I'll try to check with stephenfin about it. | 16:22 |
Uggla | #topic Stable Branches | 16:22 |
Uggla | elodilles, the mic is yours | 16:23 |
elodilles | thanks Uggla , so | 16:23 |
elodilles | #info stable/2023.2 (bobcat) is End of Life, branch is deleted (tag: bobcat-eol) | 16:23 |
elodilles | #info maintained stable branches: stable/2025.1, stable/2024.2, stable/2024.1 | 16:23 |
elodilles | down to 3 maintained branches again ;) | 16:23 |
elodilles | #info nova stable release from stable/2024.1 is out (29.2.1) | 16:24 |
elodilles | #info not aware of any stable gate failure | 16:24 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:24 |
elodilles | we had a broken gate on stable/2025.1 osc-placement, | 16:24 |
elodilles | but it is now fixed | 16:25 |
elodilles | thanks all for the help :) | 16:25 |
elodilles | and i think that's all from me | 16:25 |
elodilles | Uggla: back to you | 16:25 |
Uggla | elodilles, fyi 947847: nova: Release 2024.2 Dalmatian 30.0.1 | https://review.opendev.org/c/openstack/releases/+/947847 should be ok now. | 16:25 |
Uggla | As I have done 948811: Add uc check alternative method | https://review.opendev.org/c/openstack/requirements/+/948811 | 16:25 |
elodilles | Uggla: thanks for working on that! | 16:26 |
Uggla | the patch is on master with a backport to the stable branch. | 16:26 |
elodilles | i've just added a comment on your patch o:) | 16:26 |
Uggla | so it needs reviews. | 16:26 |
Uggla | but at least the verify is +1 | 16:27 |
opendevreview | sean mooney proposed openstack/os-vif master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/os-vif/+/899946 | 16:27 |
Uggla | #topic vmwareapi 3rd-party CI efforts Highlights | 16:28 |
fwiesel | #info No updates | 16:28 |
fwiesel | Uggla: Back to you | 16:28 |
Uggla | fwiesel, btw welcome back, I hope you are good. | 16:28 |
Uggla | #topic Gibi's news about eventlet removal. | 16:29 |
fwiesel | Thanks! Just still catching up with everything. | 16:29 |
gibi | o/ | 16:29 |
Uggla | #link Series: https://gibizer.github.io/categories/eventlet/ | 16:29 |
gibi | so last week we saw nova-scheduler running with native threading. There is a blogpost about it | 16:29 |
bauzas | had no time to review those patches yet :( | 16:29 |
gibi | this week I started testing the non happy path of the scatter-gather | 16:29 |
* bauzas hardly crying | 16:30 |
gibi | good news: both the mysql server and the pymysql client can be configured with timeouts to avoid hanging gather threads | 16:30 |
gibi | I will add a new doc about it in tree | 16:30 |
gibi | bad news, the oslo.service threading backend uses forks | 16:31 |
gibi | which does not play nice with our global threadpools initialized before the service workers are forked off | 16:31 |
gibi | we have workarounds | 16:31 |
gibi | details are in https://review.opendev.org/c/openstack/oslo.service/+/945720/comments/fb5d6632_f0eaf102 | 16:31 |
gibi | there are two patches to look at | 16:31 |
gibi | https://review.opendev.org/c/openstack/nova/+/948064?usp=search now with test coverage | 16:32 |
gibi | and | 16:32 |
gibi | https://review.opendev.org/c/openstack/nova/+/948437?usp=search | 16:32 |
gibi | this week I will work on cleaning up the long series to make more patches ready to review | 16:32 |
gibi | that is all | 16:32 |
gibi | Uggla: back to you | 16:32 |
sean-k-mooney | gmaan: Uggla: just a quick update on the pyproject.toml change if i may | 16:33 |
sean-k-mooney | https://etherpad.opendev.org/p/pep-517-and-pip-23 is my old ether pad to track that work | 16:33 |
Uggla | thx gibi | 16:33 |
Uggla | sean-k-mooney, sure go ahead | 16:33 |
sean-k-mooney | and only the os-vif change above is not merged for nova | 16:33 |
sean-k-mooney | so i rebased and approved that | 16:33 |
gmaan | just saw, ++ | 16:33 |
sean-k-mooney | once that is landed we should be good | 16:33 |
Uggla | \o/ | 16:34 |
sean-k-mooney | anyway that was all on that topic | 16:34 |
Uggla | gibi, just a question: you said the oslo.service threading backend uses forks. does that mean real processes and not threads? | 16:35 |
gibi | Uggla: no we want worker processes with thread pool | 16:35 |
gibi | pools | 16:35 |
gibi | but it uses os.fork to create the worker from the main process | 16:35 |
sean-k-mooney | context is oslo service | 16:35 |
sean-k-mooney | allows you to have multiple workers | 16:35 |
gibi | fork copies the state of the process | 16:36 |
sean-k-mooney | we use that in the scheduler and conductor | 16:36 |
sean-k-mooney | but not nova-compute or the api | 16:36 |
gibi | we would need os.spawn to get a totally new worker process | 16:36 |
gibi | without inheriting state | 16:36 |
Uggla | ok I think I have understood the pb. | 16:37 |
gibi | the workaround is to reset the problematic state | 16:37 |
gibi | after the fork | 16:37 |
gibi | but I think the real solution would be os.spawn | 16:37 |
gibi | but changing to that is not super simple as oslo.config has non-pickleable lambdas :/ | 16:37 |
gibi | details are in the linked gerrit comment | 16:37 |
gibi | and are up in today's IRC log | 16:38 |
dansmith | we can reset after the fork (as you're doing) or reset before the fork, as I suspect will also work and be less janky | 16:38 |
gibi | yepp | 16:38 |
gibi | I will try dansmith's suggestion next | 16:38 |
gibi | while we wait for the oslo folks to respond | 16:38 |
sean-k-mooney | https://github.com/openstack/oslo.config/blob/d6e5c96d6dbeec0db974dfb8afc8e508b74861e5/oslo_config/cfg.py#L1387 | 16:38 |
dansmith | I agree the real solution is spawn, but we probably don't want to wait | 16:38 |
sean-k-mooney | that is their only use of lambda i think, by the way | 16:38 |
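A minimal sketch of that pickling failure, assuming nothing oslo-specific: with the multiprocessing "spawn" start method, everything handed to the child must be pickled, and lambdas are not picklable.

```python
# Minimal sketch of the spawn/pickle problem: with "spawn" the worker is
# a fresh interpreter, so its arguments are pickled -- and a lambda (like
# the oslo.config default linked above) cannot be. Opt is a made-up
# stand-in, not oslo.config code.
import multiprocessing


class Opt:
    def __init__(self):
        self.default = lambda: "computed-default"  # unpicklable attribute


def worker(opt):
    print(opt.default())


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=worker, args=(Opt(),))
    proc.start()  # raises PicklingError: Can't pickle <lambda>
    proc.join()
```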
gibi | dansmith: yepp we won't wait, we will go with the WA, and adapt when oslo makes the move | 16:39 |
dansmith | +1 | 16:39 |
Uggla | moving on | 16:40 |
Uggla | #topic Open discussion | 16:40 |
Uggla | fwiesel wants to propose something about cross-hypervisor resize. | 16:41 |
gibi | I guess that was last week's topic | 16:41 |
Uggla | yep but I understood you wanted to discuss it more. | 16:42 |
gibi | or maybe a continuation | 16:42 |
fwiesel | Yes, but there was limited feedback. I mean, if no one feels strongly about it, then I can go forward with a blueprint. | 16:42 |
fwiesel | We would like to allow mobility between two hypervisors, and I was thinking the cross-cell migration might already cover it to a large degree. | 16:43 |
fwiesel | The question is though, if that is a use-case you would feel worthwhile supporting or rather not. | 16:44 |
dansmith | I think it _has_ to be more like cross-cell migration than regular | 16:44 |
dansmith | I'm a bit mixed on whether or not I think this is worth it, because I suspect there will be a lot of gotchas in the image properties | 16:45 |
dansmith | and because you can pretty much do this with snapshot yourself now | 16:45 |
dansmith | and because we won't really be able to test it regularly, I don't think | 16:45 |
gibi | yeah testing this will be painful | 16:46 |
fwiesel | Okay, got it. That's fine. | 16:46 |
sean-k-mooney | dansmith: specifically like updating the hw_vif_model for say the vmware one to virtio if they dont happen to have common values already | 16:46 |
dansmith | yeah, all that kind of stuff | 16:46 |
fwiesel | Well, we would set those in the flavours and that would override that. | 16:47 |
sean-k-mooney | so general question. is there anything we know of that would prevent you from shelving, modifying them in glance and unshelving? | 16:47 |
fwiesel | No, it is more a usability issue. | 16:47 |
fwiesel | For our users. | 16:48 |
dansmith | sean-k-mooney: if the flavor you booted from was vmware-specific you can't unshelve to a libvirt-y one right? | 16:48 |
sean-k-mooney | fwiesel: so historically flavors are for amounts and image properties are for changing how the devices are presented | 16:48 |
fwiesel | And we were thinking of using shared NFS shares to avoid going through image upload, etc... | 16:48 |
sean-k-mooney | fwiesel: so in the past the precedent was dont put device models in flavors | 16:48 |
sean-k-mooney | dansmith: correct | 16:48 |
sean-k-mooney | dansmith: what im thinking is we discussed allowing resize in the shelved state | 16:49 |
dansmith | to me, snapshot, tweak, boot fresh is the best pattern here | 16:49 |
sean-k-mooney | so the workflow would be shelve, resize, update image properties and then unshelve | 16:49 |
dansmith | sean-k-mooney: that's a whole other conversation | 16:49 |
fwiesel | But I got it... hard to test so, no takers... Perfectly understandable. | 16:49 |
sean-k-mooney | ya im just confirming if there is a way to do that today with the api we have and i think the gap is resize while shelved | 16:49 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM:Run nova-next with n-sch in threading mode https://review.opendev.org/c/openstack/nova/+/948450 | 16:50 |
fwiesel | Thanks for the feedback. | 16:50 |
sean-k-mooney | but ya snapshot, detach volumes/ports and boot new vms with those is a valid workflow too | 16:50 |
Uggla | Shall I say it is ok for fwiesel to propose something that goes in that direction ? | 16:52 |
fwiesel | My understanding is, I won't propose things as it is doable with the existing API and implementing it within Nova is hard to test and maintain. | 16:53 |
dansmith | I think anyone can propose anything they want :) | 16:53 |
dansmith | fwiesel: ++ | 16:54 |
Uggla | oh ok good. | 16:54 |
Uggla | We are almost at the top of the hour. Are you ok triaging a couple of bugs or would you prefer to do it next week ? | 16:56 |
Uggla | sean-k-mooney, dansmith ^^ | 16:58 |
Uggla | ok no answer so you might be busy. So closing the meeting. | 16:59 |
Uggla | thanks all | 16:59 |
gibi | Uggla: thanks | 17:00 |
fwiesel | thanks everyone | 17:00 |
Uggla | #endmeeting | 17:00 |
opendevmeet | Meeting ended Tue May 6 17:00:10 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2025/nova.2025-05-06-16.02.html | 17:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2025/nova.2025-05-06-16.02.txt | 17:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2025/nova.2025-05-06-16.02.log.html | 17:00 |
bauzas | thanks Uggla | 17:00 |
Uggla | thank you. We will try to do the bug scrubbing next week | 17:00 |
dansmith | sorry, multitasking | 17:01 |
elodilles | thanks Uggla o/ | 17:01 |
gibi | sean-k-mooney: I stop for today but I will get back to forking tomorrow :) | 17:04 |
sean-k-mooney | gibi: no worries | 17:19 |
sean-k-mooney | i started at 7am because i woke at 5 so i should also finish for today | 17:19 |
opendevreview | Merged openstack/os-vif master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/os-vif/+/899946 | 20:48 |
opendevreview | Merged openstack/nova master: [quota]Refactor group counting to scatter-gather https://review.opendev.org/c/openstack/nova/+/948064 | 22:40 |