| opendevreview | Ghanshyam Maan proposed openstack/nova master: Use 2nd RPC server in compute operations https://review.opendev.org/c/openstack/nova/+/975588 | 02:57 |
|---|---|---|
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Prepare resize/cold migration for graceful shutdown https://review.opendev.org/c/openstack/nova/+/977182 | 04:34 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Use 2nd RPC server in compute operations https://review.opendev.org/c/openstack/nova/+/975588 | 05:02 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Prepare resize/cold migration for graceful shutdown https://review.opendev.org/c/openstack/nova/+/977182 | 05:46 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Use 2nd RPC server in compute operations https://review.opendev.org/c/openstack/nova/+/975588 | 06:14 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Prepare resize/cold migration for graceful shutdown https://review.opendev.org/c/openstack/nova/+/977182 | 06:24 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Use 2nd RPC server in compute operations https://review.opendev.org/c/openstack/nova/+/975588 | 07:03 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Prepare resize/cold migration for graceful shutdown https://review.opendev.org/c/openstack/nova/+/977182 | 07:11 |
| opendevreview | Dominik proposed openstack/nova master: NUMA Topology with Resource Providers: Libvirt NUMA Migrate https://review.opendev.org/c/openstack/nova/+/971177 | 08:54 |
| gibi | sean-k-mooney: hi, there is another iothreads fix from lajoskatona that is ready to land https://review.opendev.org/c/openstack/nova/+/975934 | 09:22 |
| opendevreview | Silvan Kaiser proposed openstack/nova master: libvirt: partial revert, Quobyte driver supported again https://review.opendev.org/c/openstack/nova/+/977300 | 09:44 |
| opendevreview | Silvan Kaiser proposed openstack/nova master: libvirt: partial revert, Quobyte driver supported again https://review.opendev.org/c/openstack/nova/+/977300 | 09:46 |
| *** | tkajinam_ is now known as tkajinam | 10:13 |
| sean-k-mooney | gibi: ah yes, i didn't pull it down since they respun it, but i'll take a look shortly | 10:31 |
| sean-k-mooney | that was on my todo list, thanks for the reminder | 10:31 |
| sean-k-mooney | gibi: i'll have finished testing this in about 15 minutes and then i'll submit it, but how do you feel about adding whitebox to check? we talked about it a few times but it actually caught this bug | 10:42 |
| sean-k-mooney | i mentioned it offhand to jparker too but i recently became aware that the linux kernel has the ability to emulate/report multiple numa nodes as well | 10:43 |
| sean-k-mooney | with not much effort we could properly test live migration with and without cpu pinning and hugepages and numa in the first party ci | 10:44 |
| sean-k-mooney | we have most of it already in whitebox, we just need to run it and tweak the job slightly if we want to add numa | 10:44 |
| sean-k-mooney | i need to tweak my old patch that we reverted for the basic testing of cpu_shared_set and re-propose that for nova-alt-config | 10:45 |
| opendevreview | ribaudr proposed openstack/nova master: FUP Add HW_PCI_LIVE_MIGRATABLE trait to PCI resource providers https://review.opendev.org/c/openstack/nova/+/977310 | 10:51 |
| sean-k-mooney | lajoskatona: gibi tested and approved it will hopefully merge later today and unblock https://review.opendev.org/c/openstack/whitebox-tempest-plugin/+/975225 | 11:13 |
| sean-k-mooney | it has also been tested transitively via that change | 11:14 |
| lajoskatona | sean-k-mooney, gibi: thanks | 11:17 |
| priteau | Lots of jobs hitting post_failure in 2025.1 :( ModuleNotFoundError: No module named 'typing_extensions' | 11:24 |
| priteau | That's the testtools issue | 11:25 |
| priteau | Is there a patch for this already? | 11:26 |
| gibi | priteau: as far as I know fix landed in testtools and we are waiting for a new release from testtools to get it | 11:30 |
| gibi | priteau: https://github.com/testing-cabal/testtools/pull/570 | 11:35 |
| priteau | https://pypi.org/project/testtools/ | 11:36 |
| priteau | Released: 2 minutes ago | 11:36 |
| priteau | But we would need a bump of upper-constraints? | 11:39 |
| priteau | 2024.1 has testtools===2.7.1 | 11:39 |
| tkajinam | priteau, no | 11:40 |
| tkajinam | priteau, the failing task was common for all branches and doesn't use u-c | 11:41 |
| tkajinam | s/was/is/ | 11:41 |
| priteau | OK | 11:43 |
| priteau | So recheck should be enough? | 11:43 |
| frickler | likely we will need to build new images, as IIUC the broken venv is preconfigured there | 11:52 |
| opendevreview | ribaudr proposed openstack/nova master: FUP Add HW_PCI_LIVE_MIGRATABLE trait to PCI resource providers https://review.opendev.org/c/openstack/nova/+/977310 | 12:02 |
| tkajinam | frickler, seems so, looking at the failure still appearing | 12:25 |
| opendevreview | ribaudr proposed openstack/nova master: FUP Add HW_PCI_LIVE_MIGRATABLE trait to PCI resource providers https://review.opendev.org/c/openstack/nova/+/977310 | 12:52 |
| priteau | frickler: ah, this is why I couldn't see a fresh installation of testtools | 12:56 |
| gibi | dansmith: gmaan: after I implemented the single long task executor I realized that we have a logical problem https://review.opendev.org/c/openstack/nova/+/977251/1#message-ff7b23fb4a19eee42289ff428220461ae3df2da8 The recommended limit for the concurrent live migration is wildly different from the number of parallel builds | 13:08 |
| gibi | I don't think deployers will accept this approach to have 1 lm - 1 build or 10 lm - 10 build, config. | 13:09 |
| gibi | we recommend a single lm but that is not a useful value for concurrent builds | 13:09 |
| gibi | this experiment shows me that we might not be able to avoid the complexity in https://review.opendev.org/c/openstack/nova/+/975924/1 | 13:10 |
| sean-k-mooney | gibi: ya those need to be split | 13:12 |
| sean-k-mooney | we could still use a semaphore for the rate limiting | 13:13 |
| sean-k-mooney | and just have the pool limit be separate | 13:13 |
| sean-k-mooney | we can warn if the semaphore config option exceeds the pool size | 13:13 |
| sean-k-mooney | but i think that's an ok compromise | 13:13 |
| opendevreview | ribaudr proposed openstack/nova master: FUP Add HW_PCI_LIVE_MIGRATABLE trait to PCI resource providers https://review.opendev.org/c/openstack/nova/+/977310 | 13:33 |
| gibi | sean-k-mooney: we cannot really take the semaphore approach for live migration as we rely on cancellability of live migrations. If I use the shared executor and take a semaphore within the lm task then lm task waiting for the semaphore becomes non cancellable | 13:34 |
| gibi | an lm task waiting in an executor queue is cancellable | 13:35 |
| sean-k-mooney | it depends on how we do cancellation i guess | 13:35 |
| gibi | we cancel futures | 13:35 |
| sean-k-mooney | right but live migration supports direct cancellation via the api as well | 13:36 |
| gibi | sure we can re-engineer the lm abort logic but then that is complexity | 13:36 |
| sean-k-mooney | so even if it's in progress we can actually cancel it | 13:36 |
| sean-k-mooney | ya so for now i guess we could keep those executors separate | 13:36 |
| sean-k-mooney | but i don't think we want a separate executor per config option, right | 13:37 |
| sean-k-mooney | at least not unless they idle at 0 | 13:37 |
| gibi | ahh there are different abort modes of cancellation of lm, but we basically lose the abort mode for lms that are not yet running due to the semaphore (limit'l | 13:37 |
| gibi | (limit) | 13:37 |
| sean-k-mooney | yes you can force complete or abort | 13:38 |
| gibi | if the lm is really executing then we abort via the driver | 13:38 |
| sean-k-mooney | sorry i have not had time to properly load context | 13:38 |
| sean-k-mooney | you are stating https://review.opendev.org/c/openstack/nova/+/975924/1 won't work, right | 13:38 |
| sean-k-mooney | or are you saying it will and we can't avoid the complexity | 13:39 |
| gibi | I'm saying I think we probably need the complexity from https://review.opendev.org/c/openstack/nova/+/975924/1 as a shared executor in https://review.opendev.org/c/openstack/nova/+/977251/1#message-ff7b23fb4a19eee42289ff428220461ae3df2da8 does not work well | 13:39 |
| sean-k-mooney | ack | 13:39 |
| sean-k-mooney | i have not reviewed either patch yet, hence why i'm asking for your gut feeling on which one is more likely to work | 13:40 |
| gibi | having a shared executor for build and snapshot might work, but we would need a separate for lm. Or we need the complexity from the wrapper that implements limits per task type. | 13:41 |
| sean-k-mooney | i dont hate the idea of TaskTypeLimiterExecutorWrapper | 13:41 |
| sean-k-mooney | i actually think we will want to have a task/priority aware executor in the future | 13:42 |
| sean-k-mooney | i think that is necessary complexity rather than overengineering given our use cases | 13:42 |
| sean-k-mooney | but i have only really read the doc strings at this point | 13:42 |
| sean-k-mooney | gibi: honestly i don't think https://review.opendev.org/c/openstack/nova/+/975924/1/nova/utils.py is that complex and i would personally extend it to add a priority field to the task or type struct or both | 13:46 |
| sean-k-mooney | priority on type would mean all tasks of this type have a default priority we register when we register the type | 13:47 |
| sean-k-mooney | that would allow us to still work properly if the concurrency on the executor is less than the limits for all the individual types | 13:48 |
| sean-k-mooney | we can technically do that in your version as well as proposed | 13:48 |
| sean-k-mooney | but obviously we might want to express some preference beyond arrival time in when the next task is enqueued | 13:49 |
| sean-k-mooney | gibi: if this is done correctly by the way this can just live in futurist | 13:50 |
| sean-k-mooney | but im ok with the idea of building this out in nova first | 13:50 |
| gibi | yeah | 13:53 |
| gibi | given the closeness of FF I'm trying to make a way forward with a split approach. build and snapshot in a shared executor, and lm having its own executor. | 13:58 |
| gibi | but I think the right approach is the TaskTypeLimiterExecutorWrapper | 13:58 |
| gibi | we just don't have time | 13:58 |
| sean-k-mooney | looking at it briefly i would be ok with proceeding in that direction too | 13:58 |
| sean-k-mooney | that's obviously not a rigorous code review but directionally i think the trade offs you are making make sense | 13:59 |
| gibi | yeah if I would have 2 weeks I would go all in on the wrapper | 14:01 |
| gibi | but if I want to be realistic and want nova-compute with threading in G then the split approach is less risky | 14:01 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: [compute]Use single long task executor https://review.opendev.org/c/openstack/nova/+/977251 | 14:11 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-compute in native threading mode https://review.opendev.org/c/openstack/nova/+/965467 | 14:11 |
| dansmith | gibi: you're saying you don't think it's reasonable to lift the live migration limit to match what is probably the higher build limit? | 14:42 |
| dansmith | live migrations are controlled by the admins, so I guess I don't see that as a fatal compromise for the time being, | 14:43 |
| dansmith | but even if you go with the semaphore approach, I think you can still handle cancelation if the tasks each immediately check to see if they've been canceled after they acquire the semaphore and exit if so, no? the operation will appear to be a bit sticky until the (or a) current one finishes, but I imagine that's happening today... | 14:44 |
| Uggla | FYI: https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/VTLDDSXUHPKON3WOKNMATFL7ERHFPCYB/ | 15:35 |
| sean-k-mooney | Uggla: my understanding was the bug scrub was previously meant to happen in the channel after the meeting | 15:36 |
| sean-k-mooney | not part of it | 15:36 |
| sean-k-mooney | but sure. for watcher we do bug scrubbing in the meeting each week and it works fine | 15:36 |
| gmaan | bauzas: this is also ready for your review https://review.opendev.org/c/openstack/nova/+/975586 | 15:36 |
| gmaan | gibi: bauzas: can either of you +w this one, I thought it was merged https://review.opendev.org/c/openstack/nova/+/975242/4 | 15:37 |
| gmaan | gibi: dansmith sean-k-mooney: for single executor, adding semaphore for lm makes it more complex than the approach gibi mentioned. I also thought the same. to have separate executor for lm and shared for all other long running tasks | 15:38 |
| Uggla | bauzas, gibi, elodilles, gmaan see 16:35 msg. | 15:38 |
| gmaan | Uggla: yeah, I saw the email too. this cycle, gate is giving the best feeling of FF | 15:39 |
| dansmith | gmaan: the only problem there is that 20 live migrations waiting on semaphores will consume all the builder threads, but the point is to limit the activity a bit, so I think that should be okay | 15:39 |
| Uggla | sean-k-mooney, I would like to have a sync point during the meeting to restart it. Because I was lazy on that topic... :( | 15:40 |
| opendevreview | Bence Romsics proposed openstack/nova master: WIP Functional reproducer for #2051685 https://review.opendev.org/c/openstack/nova/+/977331 | 15:41 |
| gmaan | dansmith: but as we recommend only 1 live migration at a time (currently), by common executor, will 10 (or say 5) parallel live migration be successful and on time ? I am wondering on that side too | 15:42 |
| opendevreview | Bence Romsics proposed openstack/nova master: WIP Functional reproducer for #2051685 https://review.opendev.org/c/openstack/nova/+/977331 | 15:43 |
| dansmith | gmaan: you can limit parallel migrations manually by not issuing more than one of them at a time, but a semaphore (set to 1) for live migrations will limit them to one at a time... it's just that if you start 10 in parallel, 9 will consume threads from the pool that will prevent builds from happening too | 15:43 |
| gmaan | ohk | 15:44 |
| dansmith | the suggestion of putting them both into one pool and requiring them to be the same was a compromise to avoid the complexity that gibi seemed to not think was achievable in a short period of time.. that compromise is, of course, a compromise and has some restrictions | 15:45 |
| dansmith | if we're not okay with those, then we could just punt and try to do the complex thing first | 15:46 |
| dansmith | I tend to want an incremental approach where not everything will be perfect in the first round | 15:46 |
| gmaan | live migrations are doable by the manager user also, not just admin, but still 10 as default should be ok and if operators make it lower then they know the limitation. if needed we can make it 20 as default? | 15:49 |
| dansmith | I thought the default for the limit was 10 and we _recommended_ they set it to one? | 15:50 |
| gmaan | k, one will be issue i think | 15:53 |
| sean-k-mooney | the default for the live migration is 1 and we recommend it to be one | 16:12 |
| sean-k-mooney | but for the long live pool 10 is probably a reasonable default | 16:12 |
| gmaan | I am saying with single limit for executor, build is default to 10 so that will go on live migration also | 16:13 |
| sean-k-mooney | i would prefer to still have a way to limit live migration with a semaphore if possible as i don't like the idea of changing the default by proxy to 10 | 16:16 |
| sean-k-mooney | but i don't think that is a conflicting request | 16:17 |
| sean-k-mooney | we just keep the existing config option and semaphore for live migration | 16:17 |
| opendevreview | Koya Watanabe proposed openstack/nova-specs master: Repropose instance-metadata-tag-protection https://review.opendev.org/c/openstack/nova-specs/+/977339 | 16:30 |
| opendevreview | Koya Watanabe proposed openstack/nova-specs master: Repropose newly instance metadata/tag protection feature https://review.opendev.org/c/openstack/nova-specs/+/977339 | 16:34 |
| sean-k-mooney | gibi: melwitt will ye have time to look at https://review.opendev.org/c/openstack/nova/+/975859/4 and https://review.opendev.org/c/openstack/nova/+/975872/5 next week? its a bug fix so it is not as time sensitive as features so if not that is ok | 16:55 |
| gibi | sorry I had to step away. I don't think 10 concurrent live migrations is a good thing, nor is limiting the concurrent builds to 1. The semaphore approach makes it complicated to cancel queued live migrations waiting on the semaphore (=> complexity), but also 10 lm waiting on the semaphore will block any new build requests as the semaphore is taken on the worker from the Executor. I understand this | 16:55 |
| gibi | was a compromise but I think this is too big of a compromise. As a short term I'm now pushing for build and snapshot sharing an executor and live migration having its own (as today). Then after G I would do the per task type limit with an executor wrapper | 16:55 |
| gibi | and sorry again, but I have to step away again :/ (but I will read back) | 16:56 |
| gmaan | I am good that way for G and it is simple. semaphore approach still consumes the executor so it does not really solve the operation limitation. If we are going to make it smarter in future then this approach is ok for now. | 17:03 |
| sean-k-mooney | this approach being gibi's current patch with 1 pool for live migration and another for the rest of the background tasks and no semaphore? | 17:06 |
| sean-k-mooney | i'm ok with that for this release as well, just making sure that we are talking about the same thing | 17:07 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Add manager graceful shutdown, timeout, and wait https://review.opendev.org/c/openstack/nova/+/975586 | 17:08 |
| gmaan | bauzas: gibi thanks for review on graceful shutdown changes, I fixed the bauzas comment in this change itself instead of followup because anyways I need to change the other changes in that series. | 17:11 |
| gmaan | this one https://review.opendev.org/c/openstack/nova/+/975586 | 17:11 |
| opendevreview | Merged openstack/nova master: Add 2nd RPC server for compute service https://review.opendev.org/c/openstack/nova/+/975242 | 17:36 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: fixups for live migration of `host` secret security https://review.opendev.org/c/openstack/nova/+/976316 | 18:09 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: support live migration of `deployment` secret security https://review.opendev.org/c/openstack/nova/+/925771 | 18:09 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: bump service version to enable live migration https://review.opendev.org/c/openstack/nova/+/975724 | 18:09 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: test live migration between hosts with different security https://review.opendev.org/c/openstack/nova/+/952629 | 18:09 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: add late check for supported TPM secret security https://review.opendev.org/c/openstack/nova/+/956975 | 18:09 |
| opendevreview | melanie witt proposed openstack/nova master: TPM: enable conversion of secret security modes via resize https://review.opendev.org/c/openstack/nova/+/962052 | 18:09 |
| opendevreview | melanie witt proposed openstack/nova master: DNM vtpm tempest https://review.opendev.org/c/openstack/nova/+/957477 | 18:09 |
| gmaan | gibi: In my graceful shutdown change, i saw test_submit_second_while_delaying_first failing in threading job AssertionError: 1.997275639999998 not greater than 2.0 | 18:57 |
| gmaan | https://1069208da190f941bbcb-6faf22591116ac424591f44dbeb2cb9b.ssl.cf1.rackcdn.com/openstack/0cc7f4aa239e491493e73f88b45c8986/testr_results.html | 18:57 |
| gmaan | not sure you or sean-k-mooney talked about it but just to let you know that this is happening in more places | 18:57 |
| sean-k-mooney | gmaan: melwitt mentioned it a day or two ago i think | 18:58 |
| gmaan | ohk | 18:58 |
| sean-k-mooney | i didn't look at it in too much detail but we likely need to mock time slightly differently | 18:58 |
| sean-k-mooney | that or use the assert almost equal thing for floats | 18:59 |
| sean-k-mooney | it's definitely a semi flaky test but i have not seen it fail much | 18:59 |
| gmaan | k, I just saw it in my change but not anywhere else | 19:00 |
| gmaan | its for delay task StaticallyDelayingCancellableTaskExecutorWrapper with delay of 2 so task should not finish before 2 if we check float then it will fail too | 19:01 |
| sean-k-mooney | well sleep should not be less than the amount but i have not looked at the code | 19:03 |
| gmaan | or maybe it is just a matter of the next line capturing the monotonic time after the task is submitted. maybe we can capture time before the task is submitted | 19:03 |
| gmaan | let me propose the change and see if that make sense | 19:03 |
| sean-k-mooney | ack | 19:03 |
| sean-k-mooney | im just finishing up but ill be around for a few more mins so if you push it before i wrap for the weekend ill review it quickly | 19:04 |
| gmaan | k, give me few min | 19:04 |
| sean-k-mooney | we could just assert >=1.9 | 19:09 |
| sean-k-mooney | but ya its the order of https://github.com/openstack/nova/blob/7a303bc1e28e9426f2f6d9898a18edda34bb8dd9/nova/tests/unit/test_utils.py#L2166-L2167 | 19:09 |
| sean-k-mooney | it's already submitted to the executor at that point when we record the time | 19:10 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Fix the flasky test test_submit_second_while_delaying_first https://review.opendev.org/c/openstack/nova/+/977356 | 19:13 |
| gmaan | sean-k-mooney: melwitt gibi ^^ | 19:13 |
| sean-k-mooney | that is exactly the fix i was expecting so +2 :) | 19:14 |
| gmaan | thanks | 19:14 |
| gmaan | and it seems py310 job also green | 19:14 |
| sean-k-mooney | already or locally | 19:15 |
| gmaan | already, i can see it is passing in my change | 19:15 |
| sean-k-mooney | locally i guess because it has not run yet in the gate | 19:15 |
| sean-k-mooney | https://zuul.openstack.org/status?change=977356 | 19:15 |
| gmaan | in other change i mean which is still in gate | 19:15 |
| sean-k-mooney | oh ok | 19:16 |
| sean-k-mooney | is the grenade issue fixed out of interest | 19:16 |
| sean-k-mooney | ModuleNotFoundError: No module named 'typing_extensions' | 19:17 |
| gmaan | which one? is there new one | 19:17 |
| sean-k-mooney | that's failing in watcher | 19:17 |
| gmaan | oh, i can see grenade job also passing in my nova change | 19:17 |
| gmaan | I think it was same in py310 also, i did not dig into it but same error | 19:17 |
| sean-k-mooney | oh ok i thought i saw folks talking about this in one of the irc channels | 19:18 |
| sean-k-mooney | i think it's a testtools issue and we were waiting on the new release | 19:18 |
| sean-k-mooney | which happened earlier today | 19:18 |
| gmaan | yeah | 19:18 |
| sean-k-mooney | ok well that will either pass or not but should be resolved one way or the other by monday | 19:19 |
| gmaan | this is merged so all green https://review.opendev.org/c/openstack/nova/+/975242/4 | 19:20 |
| gmaan | you can recheck maybe | 19:20 |
| gmaan | watcher should be ok too | 19:20 |
| sean-k-mooney | joan already did | 19:20 |
| gmaan | k | 19:20 |
| sean-k-mooney | and we merged some stuff too, i just was not sure if the grenade issue had been resolved | 19:20 |
| gmaan | k, at least for now but I am sure its going to be more failure in FF week :) if it start it happen at worst :) | 19:21 |
| melwitt | awesome gmaan | 19:21 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Add manager graceful shutdown, timeout, and wait https://review.opendev.org/c/openstack/nova/+/975586 | 19:49 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Use 2nd RPC server in compute operations https://review.opendev.org/c/openstack/nova/+/975588 | 20:20 |
| opendevreview | Merged openstack/nova master: Fix the flasky test test_submit_second_while_delaying_first https://review.opendev.org/c/openstack/nova/+/977356 | 20:21 |
| loth | Hey all, I'm having some trouble getting multi-cells working. Is anyone on that is familiar with it? | 20:40 |
| opendevreview | Ghanshyam Maan proposed openstack/nova master: Prepare resize/cold migration for graceful shutdown https://review.opendev.org/c/openstack/nova/+/977182 | 23:25 |
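
Editor's note on the 13:34 to 14:44 executor discussion: the difference between cancelling a queued task and cancelling one that is blocked on a semaphore can be sketched with the standard library alone. This is a minimal illustration using `concurrent.futures`, not nova's executor code; the function names, the `live_migration` body and the `abort_requested` flag are hypothetical. The second function also shows the "check for cancellation right after acquiring the semaphore" idea raised at 14:44.

```python
import concurrent.futures
import threading


def queued_task_is_cancellable():
    """A task still waiting in the executor queue can be cancelled."""
    release = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        running = pool.submit(release.wait)  # occupies the only worker
        queued = pool.submit(lambda: "ran")  # stays pending in the queue
        cancelled = queued.cancel()          # True: it never started
        release.set()                        # let the first task finish
        running.result()
    return cancelled


def semaphore_waiter_is_not_cancellable():
    """A task blocked on a semaphore already runs, so cancel() fails."""
    limit = threading.Semaphore(1)       # hypothetical lm concurrency limit
    started = threading.Event()
    abort_requested = threading.Event()  # hypothetical abort flag

    def live_migration():
        started.set()
        with limit:  # blocks here while holding a worker thread
            # Bail out immediately if an abort arrived while waiting.
            if abort_requested.is_set():
                return "aborted"
            return "migrated"

    limit.acquire()  # simulate another migration holding the limit
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        waiter = pool.submit(live_migration)
        started.wait()               # the task is now in the running state
        cancelled = waiter.cancel()  # False: running futures cannot be cancelled
        abort_requested.set()
        limit.release()              # the waiter acquires, sees the flag, exits
        outcome = waiter.result()
    return cancelled, outcome
```

While the waiter is blocked it keeps one pool thread busy, which is the effect described at 15:43: with a limit of 1 and ten migrations started in parallel, nine of them would hold workers that builds could otherwise use.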
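
Editor's note on the 18:57 to 19:14 flaky test discussion: the ordering issue can be illustrated as follows. This is a sketch only and not the content of change 977356; `delayed_task` and `elapsed_with_start_before_submit` are hypothetical names, and a plain `ThreadPoolExecutor` stands in for the delaying executor wrapper. If the start time is taken after `submit()`, the delay may already be ticking, so the measured duration can come out slightly below the configured delay (as in the reported `1.997... not greater than 2.0`). Taking the start time first avoids that.

```python
import concurrent.futures
import time


def delayed_task(delay):
    """Sleep for the configured delay, then report when the task finished."""
    time.sleep(delay)
    return time.monotonic()


def elapsed_with_start_before_submit(delay):
    """Measure task duration with the start time captured before submit()."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        start = time.monotonic()  # captured BEFORE the task is submitted
        future = pool.submit(delayed_task, delay)
        finished = future.result()
    # The sleep begins after 'start' was recorded, so this is at least 'delay'.
    return finished - start
```

With this ordering an assertion such as `elapsed >= delay` holds without needing a tolerance like the `>= 1.9` alternative mentioned at 19:09.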