opendevreview | Yalei Li proposed openstack/nova master: Implement get_power_state using domain.state https://review.opendev.org/c/openstack/nova/+/904970 | 01:12 |
opendevreview | sean mooney proposed openstack/nova master: add bin dir for non eventlet console_scripts https://review.opendev.org/c/openstack/nova/+/904424 | 03:24 |
opendevreview | sean mooney proposed openstack/nova master: remove use of eventlet.queue.LightQueue https://review.opendev.org/c/openstack/nova/+/900681 | 03:24 |
opendevreview | sean mooney proposed openstack/nova master: set rpc executor based on monkey patch state https://review.opendev.org/c/openstack/nova/+/905284 | 03:24 |
opendevreview | sean mooney proposed openstack/nova master: use futureist for scater_gather https://review.opendev.org/c/openstack/nova/+/905285 | 03:24 |
opendevreview | sean mooney proposed openstack/nova master: move context executor to utils and remove eventlet.tpool https://review.opendev.org/c/openstack/nova/+/905287 | 04:37 |
opendevreview | Merged openstack/nova master: Lower num_pcie_ports to 12 in the nova-next job https://review.opendev.org/c/openstack/nova/+/902175 | 05:00 |
sahid | o/ | 09:02 |
sahid | any chance that you make progress on that one? https://review.opendev.org/c/openstack/nova/+/896512 I think we are good with it, no? is something else needed? | 09:03 |
sahid | s/you make/we make :-) | 09:03 |
noonedeadpunk | I actually have a question slightly related to this ^ | 10:04 |
noonedeadpunk | Like right now I'm looking at AggregateMultitenancyIsolation and failing to understand how to make an aggregate isolated for a tenant. | 10:04 |
noonedeadpunk | Isolated in the sense that only the tenants specified in the aggregate can create instances in the aggregate. And not on any other computes - just this aggregate | 10:05 |
noonedeadpunk | I assumed that using placement isolation together with AggregateMultitenancyIsolation would do that, but it seems it's not... | 10:05 |
noonedeadpunk | Or well, maybe I have to add all the other computes to a different aggregate.... | 10:06 |
noonedeadpunk | So I have this: enable_isolated_aggregate_filtering = True, limit_tenants_to_placement_aggregate = True, placement_aggregate_required_for_tenants = False | 10:08 |
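For reference, the scheduler options noonedeadpunk lists above live in the `[scheduler]` section of nova.conf; a minimal sketch of that configuration (option names as given in the discussion, comments illustrative):

```ini
[scheduler]
# Pre-filter: map tenants listed in aggregate metadata to those
# aggregates via placement.
limit_tenants_to_placement_aggregate = True
# Do not require every tenant to have an aggregate mapping.
placement_aggregate_required_for_tenants = False
# Enable the isolated-aggregate request pre-filter.
enable_isolated_aggregate_filtering = True
```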
opendevreview | Rajesh Tailor proposed openstack/nova-specs master: List requested availability zones https://review.opendev.org/c/openstack/nova-specs/+/904368 | 10:12 |
bauzas | sahid: unfortunately, we haven't yet accepted https://blueprints.launchpad.net/nova/+spec/aggregatemultitenancyisolation-to-support-unlimited-tenant as a specless bp | 10:13 |
bauzas | sean-k-mooney had some concerns when we discussed it in a team meeting | 10:14 |
bauzas | for me I'm OK but we don't have consensus | 10:14 |
noonedeadpunk | So my assumption was that if a request comes from a tenant that is not in filter_tenant_id, AggregateMultiTenancyIsolation will filter out all hosts that belong to aggregates with filter_tenant_id defined. And if a request comes from a tenant with filter_tenant_id - placement will return only hosts that have this metadata | 10:15 |
noonedeadpunk | But in fact, aggregate is not restricted for "outsiders" | 10:15 |
noonedeadpunk | oh, I see now..... | 10:17 |
noonedeadpunk | https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L47 | 10:18 |
noonedeadpunk | suffix is only applicable for placement isolation: https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/request_filter.py#L115 | 10:19 |
noonedeadpunk | So kinda.... I can't even define 2 tenants that work for both placement and the filter? | 10:21 |
noonedeadpunk | as placement doesn't try to split on commas, and the filter doesn't try to search for suffixes :( | 10:24 |
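The mismatch described above can be sketched in a few lines of plain Python (illustrative only, not nova's actual parsing code): the post-scheduling filter understands a comma-separated tenant list under one metadata key, while the placement pre-filter understands multiple suffixed keys holding one tenant each, so the same metadata is read two different ways.

```python
def tenants_from_comma_key(metadata):
    """Post-scheduling filter style: one key, comma-separated tenants."""
    raw = metadata.get('filter_tenant_id', '')
    return {t.strip() for t in raw.split(',') if t.strip()}


def tenants_from_suffixed_keys(metadata):
    """Placement pre-filter style: filter_tenant_id, filter_tenant_id2, ...

    Each suffixed key is taken as a single tenant id; comma-separated
    values are NOT split, which is the mismatch being discussed.
    """
    return {v for k, v in metadata.items()
            if k.startswith('filter_tenant_id')}
```

With aggregate metadata `{'filter_tenant_id': 'a,b', 'filter_tenant_id2': 'c'}` the first helper yields `{'a', 'b'}` while the second yields `{'a,b', 'c'}` - the comma-joined value is treated as one opaque tenant id.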
noonedeadpunk | ok, so I'm actually facing the exact same usecase that sahid's patch covers.... | 10:26 |
bauzas | I think we could approve sahid's blueprint today and then we could discuss on his change then with sean-k-mooney | 10:26 |
bauzas | sean-k-mooney: are you okay then ? | 10:27 |
noonedeadpunk | ugh, I for some reason thought that this flow was already working and was counting on it ;( | 10:34 |
noonedeadpunk | I still have a feeling I'm missing smth in tenant isolation topic though... | 11:46 |
noonedeadpunk | oh, maybe I can have both pre and post enabled indeed, huh | 11:48 |
noonedeadpunk | In other words - define the list of tenants for the post-filter in `filter_tenant_id` comma-separated, and then use `filter_tenant_id_suffix` for the pre-filter. While `filter_tenant_id` will be invalid for the pre-filter, the rest should work... | 12:02 |
noonedeadpunk | Slightly nasty, but looks like it works at a glance... | 12:02 |
ratailor | bauzas, noonedeadpunk could you please review https://review.opendev.org/c/openstack/nova-specs/+/904368 spec, I have updated it as per yesterday's discussion. | 12:07 |
ratailor | sean-k-mooney, gibi_off please review ^^ | 12:08 |
dvo-plv | sean-k-mooney, Thank you. I will try to find some examples | 13:53 |
bauzas | sean-k-mooney: gibi_off: we forgot to ask auniyal to provide a blueprint for https://specs.openstack.org/openstack/nova-specs/specs/2024.1/approved/enforce-remote-console-session-timeout.html | 14:19 |
bauzas | I'll be creating the blueprint | 14:19 |
sean-k-mooney | ack | 14:31 |
opendevreview | Takashi Kajinami proposed openstack/nova master: Fix test failures with oslo.limit 2.3.0 https://review.opendev.org/c/openstack/nova/+/905314 | 15:16 |
opendevreview | Dan Smith proposed openstack/nova master: Support glance's new location API https://review.opendev.org/c/openstack/nova/+/891036 | 15:29 |
opendevreview | Dan Smith proposed openstack/nova master: DNM: Test glance new location api https://review.opendev.org/c/openstack/nova/+/891207 | 15:29 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Catch ImageNotFound on snapshot failure https://review.opendev.org/c/openstack/nova/+/905316 | 15:29 |
opendevreview | Sylvain Bauza proposed openstack/nova-specs master: add the blueprint link to the spec https://review.opendev.org/c/openstack/nova-specs/+/905342 | 16:10 |
bauzas | sean-k-mooney: could you quickly +2 this ^ | 16:11 |
bauzas | noonedeadpunk: fwiw looks like we gonna miss https://review.opendev.org/c/openstack/nova-specs/+/890779 for Caracal due to the spec approval freeze | 16:13 |
opendevreview | Takashi Kajinami proposed openstack/nova master: Fix test failures with oslo.limit 2.3.0 https://review.opendev.org/c/openstack/nova/+/905314 | 16:14 |
bauzas | noonedeadpunk: so that's cool that you abandoned your spec proposal, but fwiw I don't see any new revision for the other spec in 6 months :( | 16:15 |
sean-k-mooney | bauzas: done | 16:16 |
bauzas | sean-k-mooney: cool thanks | 16:17 |
bauzas | fwiw, I have some concerns when a good contributor abandons a change in favor of another existing one, but that other change hasn't been revised in 6 months :( | 16:18 |
noonedeadpunk | bauzas: yeah, that's clear at this point :( | 16:18 |
noonedeadpunk | But realizing complexity more I'm not that self-confident anyway to pick implementation alone | 16:19 |
noonedeadpunk | As that sounds slightly like leveraging taskflow | 16:20 |
pmarsais | Hi nova community, I am looking at enabling TPM for my instances in my openstack cluster. Looking at nova source code, it seems that the shelving/unshelving workflow designed in https://specs.openstack.org/openstack/nova-specs/specs/victoria/implemented/add-emulated-virtual-tpm.html#shelve-offload-and-unshelve is currently not implemented (no references to "tpm_object_id" and "tpm_object_sha256"). Is that correct or am I mi | 16:40 |
pmarsais | If so, are there plans to get this in, or should I try to take a stab at implementing it? (I already started the CCLA process internally with my employer) | 16:40 |
auniyal | hi sean-k-mooney, can you please have another look at this - https://review.opendev.org/c/openstack/nova/+/905134 [Refactor vf profile for PCI device] | 17:24 |
sean-k-mooney | pmarsais: no, there is no plan to proceed with storing the tpm data in swift for shelve/unshelve | 17:28 |
sean-k-mooney | if we wanted to do that we would have to repropose it as a feature via a new spec. | 17:29 |
sean-k-mooney | it was not implemented because we felt it was overly complex and we were not sure we wanted to require swift to be present to use it | 17:30 |
sean-k-mooney | today is the spec freeze deadline for caracal so this would be a topic for next cycle (starting in march) | 17:31 |
pmarsais | Ok, thanks for the context ! | 17:35 |
sean-k-mooney | melwitt: im going to set up a devstack vm to test your disk encryption series so i can start reviewing it next week | 17:43 |
sean-k-mooney | melwitt: do you have a local.conf you can share | 17:43 |
sean-k-mooney | i believe i need barbican correct | 17:47 |
melwitt | sean-k-mooney: oh, yep. sec | 17:48 |
melwitt | sean-k-mooney: I set up jobs to test myself, you can use as reference as well https://review.opendev.org/c/openstack/nova/+/862416 | 17:49 |
sean-k-mooney | im currently installing ubuntu so no rush | 17:49 |
sean-k-mooney | ah perfect | 17:49 |
melwitt | obvs everything you need will be in there | 17:49 |
sean-k-mooney | yep i can figure it out from there | 17:50 |
melwitt | ok great | 17:50 |
sean-k-mooney | now that i think about it i should probably clone this vm so i can also test with rbd | 17:50 |
sean-k-mooney | ill start with qcow | 17:50 |
melwitt | sean-k-mooney: something to know is that it's a good idea to include swap or ephemeral or both in flavors you test with bc the code takes a slightly different path for creating swap or ephemeral disks. so including swap and/or ephemeral you will cover those paths as well | 17:57 |
sean-k-mooney | cool i can add both | 17:57 |
sean-k-mooney | have you tested ephemeral with multiple ephemeral disks | 17:57 |
opendevreview | Dan Smith proposed openstack/nova master: Catch ImageNotFound on snapshot failure https://review.opendev.org/c/openstack/nova/+/905316 | 17:58 |
opendevreview | Dan Smith proposed openstack/nova master: Support glance's new location API https://review.opendev.org/c/openstack/nova/+/891036 | 17:58 |
opendevreview | Dan Smith proposed openstack/nova master: DNM: Test glance new location api https://review.opendev.org/c/openstack/nova/+/891207 | 17:58 |
sean-k-mooney | i.e. "openstack server create --ephemeral 10 --ephemeral 20 ..." with a flavor with 30 G of ephemeral | 17:58 |
sean-k-mooney | ill create an etherpad with what i tested when i get to that point | 17:59 |
melwitt | sean-k-mooney: I think I did but it's been awhile. that's a good thing to test. i'm thinking to write tempest tests for cases like that | 17:59 |
sean-k-mooney | ya we should, perhaps as follow-ups, but it would be good to cover that | 18:00 |
melwitt | yeah. I'm working on the docs now and keeping a list of a few cases that are not currently covered in tempest presently, to write tests for those gaps. I'm thinking maybe we could run the encryption jobs (once cleaned up from DNM) at least as periodic jobs or something | 18:02 |
sean-k-mooney | other than the fact it needs barbican | 18:03 |
sean-k-mooney | is there any reason not to include this in one of our other jobs | 18:03 |
sean-k-mooney | im fine with a periodic job too | 18:03 |
melwitt | encryption is pretty noticeably slow. so I'm not sure if it'll fly but we can try | 18:03 |
sean-k-mooney | but would integrating it in the nova-live-migration jobs be an issue | 18:04 |
melwitt | I'm thinking ahead to if it ends up being too slow, I would at least want periodics | 18:04 |
sean-k-mooney | yep | 18:04 |
sean-k-mooney | so what im thinking is | 18:05 |
sean-k-mooney | the periodics can do a broader range of testing | 18:05 |
sean-k-mooney | but nova-live-migration and nova-live-migration-ceph could perhaps also do some basic testing for us | 18:05 |
sean-k-mooney | as those are pretty small jobs | 18:05 |
sean-k-mooney | they just run tempest_test_regex: (^tempest\.api\.compute\.admin\.(test_live_migration|test_migration)) | 18:06 |
melwitt | initially I was running those jobs with flavor with 1G disk, 1G ephemeral and 512MB swap and I got quite a bit of timed out jobs for raw and rbd | 18:06 |
melwitt | so I removed the ephemeral disk and left swap, that runs at normal durations | 18:07 |
sean-k-mooney | ok, we'll loop back to this then and start with just periodics and experimental | 18:07 |
melwitt | yeah I think that approach makes sense | 18:07 |
melwitt | it might be fine, just saying we might need to tweak things if it's too slow | 18:08 |
sean-k-mooney | yep | 18:09 |
sean-k-mooney | thats why i was looking for jobs that currently run few tests | 18:09 |
sean-k-mooney | melwitt: i need to read the spec again | 18:10 |
sean-k-mooney | but if i recall you request encryption via a flavor/image property right | 18:10 |
sean-k-mooney | so if we had barbican in a job we could add a single test that boots a vm with a flavor that uses it | 18:10 |
sean-k-mooney | instead of configuring the jobs to use that flavor for everything | 18:11 |
melwitt | sean-k-mooney: another thing to know is that for rbd I had to do a standard snapshot (not direct/clone snapshot) because the current stable release of ceph (quincy) does not support clone with a child having a different passphrase than the parent | 18:11 |
melwitt | sean-k-mooney: right. flavor you use hw:ephemeral_encryption = true and image hw_ephemeral_encryption = true | 18:12 |
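The two properties melwitt names above would be set with the usual CLI calls, roughly as follows (the flavor and image names are placeholders):

```shell
# Request ephemeral-disk encryption via the flavor extra spec...
openstack flavor set --property hw:ephemeral_encryption=true my-flavor
# ...or via the equivalent image property
openstack image set --property hw_ephemeral_encryption=true my-image
```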
sean-k-mooney | so vtpm needs barbican to be tested too | 18:15 |
melwitt | yes | 18:15 |
sean-k-mooney | so it might be good to add barbican to one of our jobs and do a simple boot test, and also test vtpm in the future | 18:15 |
sean-k-mooney | we have some whitebox tests to verify vtpm | 18:15 |
sean-k-mooney | melwitt: why are you using a custom policy file for barbican | 18:31 |
sean-k-mooney | hum i see | 18:31 |
sean-k-mooney | you are giving admin extra permissions | 18:31 |
sean-k-mooney | we likely should fix that | 18:32 |
sean-k-mooney | either by skipping those tests | 18:32 |
sean-k-mooney | or by ensuring they are not using the admin user | 18:32 |
melwitt | sean-k-mooney: there's at least one tempest test that involves admin doing something on behalf of a user, and by default, admin don't have permission to user's barbican secrets | 18:32 |
melwitt | yeah. I'm not sure if that's really what we want but I was just looking to see if I could pass all of tempest as-is | 18:32 |
melwitt | ack | 18:32 |
sean-k-mooney | right, so i would not expect an admin to be able to start a vm with disk encryption | 18:33 |
sean-k-mooney | if they tweak the policy they could | 18:33 |
sean-k-mooney | but i dont think we need to test that | 18:33 |
melwitt | ok, that makes it easy then | 18:33 |
sean-k-mooney | im going to skip that locally for now | 18:33 |
melwitt | sure | 18:33 |
sean-k-mooney | we could document this, i just dont know if we want to test it as i would generally think its best to test with default policy | 18:34 |
melwitt | I think that would be fine. I agree it's best to test with default policy | 18:35 |
sean-k-mooney | ill be around for about another hour | 18:35 |
sean-k-mooney | im going to kick off https://termbin.com/k0sv | 18:35 |
sean-k-mooney | and then ill probably start playing with it tomorrow or monday | 18:36 |
melwitt | the admin test is good to verify that it is _possible_ to let admin do stuff if someone wanted that | 18:36 |
melwitt | but that's all it's for | 18:36 |
melwitt | ok cool | 18:37 |
sean-k-mooney | i need to do some prep on the vm first like install git and clone devstack | 18:37 |
sean-k-mooney | but that wont take that long | 18:37 |
wncslln | hi o/, anyone can help me to enable boot with uefi mode, please? | 18:47 |
sean-k-mooney | you just need to set hw_firmware_type=uefi in the glance image properties i believe | 18:48 |
sean-k-mooney | oh and hw_machine_type=q35 | 18:48 |
wncslln | just set these two properties? | 18:48 |
sean-k-mooney | assuming you have the correct ovmf firmware installed on all your compute nodes | 18:48 |
wncslln | hmmm i dont think i have | 18:49 |
sean-k-mooney | yep, if you install the ovmf package on all your compute nodes then qemu will have access to the uefi firmware | 18:49 |
sean-k-mooney | https://docs.openstack.org/nova/yoga/admin/uefi.html | 18:49 |
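Concretely, the two image properties suggested above would be set like this (the image name is a placeholder):

```shell
openstack image set \
  --property hw_firmware_type=uefi \
  --property hw_machine_type=q35 \
  my-uefi-image
```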
sean-k-mooney | hw_machine_type=q35 is semi optional | 18:51 |
sean-k-mooney | qemu considers uefi with the pc machine type to be experimental and not recommended | 18:51 |
sean-k-mooney | most distros dont ship an ovmf firmware that properly supports it | 18:52 |
wncslln | i have deployed with kolla-ansible so nova agents are containers. this is a problem? | 18:52 |
sean-k-mooney | so q35 should be used | 18:52 |
sean-k-mooney | no i have kolla-ansible at home | 18:52 |
sean-k-mooney | i am pretty sure their containers have the correct packages already | 18:52 |
sean-k-mooney | so you just need to set those two image properties in glance on the image and it should work | 18:52 |
sean-k-mooney | assuming the glance image supports uefi | 18:53 |
sean-k-mooney | the image needs to have a gpt partition table and an efi partition | 18:53 |
sean-k-mooney | same as real hardware | 18:53 |
wncslln | i set the properties but the vm wont boot. could be a problem in the image | 19:02 |
sean-k-mooney | i would try something like the latest ubuntu cloud image | 19:02 |
sean-k-mooney | i think that supports uefi boot but its been a while since i tried it | 19:03 |
wncslln | is there a way to know if the image has a gpt and an efi partition? | 19:03 |
sean-k-mooney | you can download it and inspect it in a vm or with tools like libguestfs | 19:03 |
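One way to do the inspection sean-k-mooney suggests, assuming the libguestfs tools are installed locally (the image filename is a placeholder):

```shell
# List partitions and filesystems inside the image; a UEFI-bootable
# image should show a GPT partition table and a vfat EFI system
# partition alongside the root filesystem.
virt-filesystems --long --parts --filesystems -a ubuntu.img
```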
wncslln | ok, i'll make some tests here | 19:05 |
wncslln | sean-k-mooney: thanks for the help | 19:05 |
wncslln | sean-k-mooney: i found the problem xD probably the image that im using has no uefi/gpt. i downloaded the ubuntu cloud image with support to uefi and worked fine | 19:38 |
sean-k-mooney | cool | 19:44 |
sean-k-mooney | ya that makes sense, a lot of older images wont | 19:44 |
sean-k-mooney | and if you created it from an iso in a vm you need to make sure that initial vm is also set up for uefi | 19:45 |
sean-k-mooney | same for windows as the installers have different defaults based on how the vm was booted | 19:45 |
wncslln | the image with problem is a snapshot from a vm annndddd probably the old image does not have uefi | 19:46 |
opendevreview | Merged openstack/nova-specs master: List requested availability zones https://review.opendev.org/c/openstack/nova-specs/+/904368 | 19:50 |
sean-k-mooney | melwitt: https://paste.opendev.org/show/biqXsY6SO6ePMPosdwIW/ | 19:54 |
melwitt | sean-k-mooney: that's good right? haha | 20:00 |
sean-k-mooney | yep https://termbin.com/9tk3 | 20:00 |
sean-k-mooney | well sort of | 20:01 |
sean-k-mooney | i copied that config from your raw job | 20:01 |
sean-k-mooney | so it is using disk encryption | 20:01 |
sean-k-mooney | and i can log into the vm via the console | 20:01 |
sean-k-mooney | so it booted fine | 20:02 |
sean-k-mooney | however its still booting from a qcow with a raw backing file | 20:02 |
sean-k-mooney | so its not actually checking raw images properly | 20:02 |
sean-k-mooney | but ya it looks like everything is working correctly other than that. and i booted with both swap and ephemeral | 20:02 |
sean-k-mooney | im going to leave it there for tonight, just wanted to let you know i have successfully deployed your code and booted a vm with it | 20:03 |
melwitt | sean-k-mooney: you have set both use_cow_images = False and images_type = raw? bc if you don't set both it won't do raw | 20:03 |
sean-k-mooney | images_type=raw or flat | 20:04 |
melwitt | ok, thanks :) | 20:04 |
sean-k-mooney | actually im going to check something | 20:05 |
melwitt | ok. at least for me if use_cow_images is not set to False, it will do qcow2 instead of raw despite images_type = raw or flat | 20:06 |
sean-k-mooney | i started with your raw job | 20:06 |
sean-k-mooney | but i might not have kept that in the local.conf when i finished editing it | 20:06 |
sean-k-mooney | ok ya | 20:07 |
sean-k-mooney | i removed that in my editing | 20:07 |
melwitt | ok. that's the only way I know of that what you're seeing could be happening. if you do have use_cow_images = False then something else is wrong that I haven't seen yet | 20:07 |
sean-k-mooney | so everything is working as expected, its using qcow for the vms because that is the default | 20:07 |
sean-k-mooney | no, in my local.conf i dont have those set so we are getting the default | 20:07 |
melwitt | ok I see | 20:08 |
sean-k-mooney | so cool. i wont have time to do much with this tomorrow but ill play around with it on monday and start reviewing the series | 20:09 |
sean-k-mooney | have you done any multi node testing. | 20:09 |
sean-k-mooney | i.e. cold/live migration | 20:09 |
sean-k-mooney | the current zuul jobs seem to be single node | 20:10 |
sean-k-mooney | on a related note i ported your scatter gather patch onto my series and i got a pretty clean zuul run https://review.opendev.org/c/openstack/nova/+/905285 | 20:11 |
sean-k-mooney | the main delta between what i have done and yours is how we monkeypatch | 20:12 |
melwitt | sean-k-mooney: hm looks like no. I did resize in place. I'll add a multinode job on that patch | 20:12 |
sean-k-mooney | ok cool ill add a second node and play with that locally too next week | 20:13 |
melwitt | sean-k-mooney: ok cool. I dunno if you saw I wrote up a comment on the scatter gather patch about how I am seeing requests getting "stuck" on nova-api, like it receives the request and does all the stuff but does not respond. I have seen this in other failures that are not related with my patch too but with the patch it seems to happen much more often | 20:14 |
sean-k-mooney | ya i did | 20:14 |
melwitt | oh ok | 20:15 |
sean-k-mooney | i wanted to see if i saw the same behavior | 20:15 |
sean-k-mooney | i was wondering if that could be related to how/when monkey patching was done | 20:15 |
melwitt | ok yeah | 20:15 |
sean-k-mooney | but there is also this comment on my patch https://review.opendev.org/c/openstack/nova/+/905285/1/nova/compute/multi_cell_list.py | 20:15 |
sean-k-mooney | that might be related too | 20:16 |
sean-k-mooney | the timeout might need to be done in the try? | 20:16 |
sean-k-mooney | or rather the creation of the timer | 20:16 |
melwitt | that's good. really want to find what/why that's happening | 20:16 |
melwitt | yeah I was thinking about that too | 20:16 |
sean-k-mooney | im not familiar enough with timers in python to say one way or another but i dont think it would hurt to include it in the try block | 20:17 |
sean-k-mooney | melwitt: another data point: on the previous patch that uses the standard lib queue https://review.opendev.org/c/openstack/nova/+/900681/8?tab=change-view-tab-header-zuul-results-summary i got a bunch of timeouts | 20:18 |
sean-k-mooney | but i have not had time to look at why | 20:18 |
melwitt | yeah, agree. it would be safer to move it into the try, I was going to do that anyway. I think I followed this example initially https://eventlet.net/doc/modules/timeout.html#eventlet.timeout.Timeout | 20:18 |
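The "move it into the try" point can be shown with a stdlib-only sketch (this is not nova's code, and `threading.Timer` stands in for eventlet's `Timeout`): when the timer is created inside the `try`, every path that actually started a timer is guaranteed to reach the `cancel()` in `finally`, even if something raises right after creation.

```python
import threading


def run_with_watchdog(func, seconds):
    """Run func(), flagging whether a watchdog timer fired first."""
    expired = threading.Event()
    timer = None
    try:
        # Created inside the try: if anything between here and the call
        # raises, the finally block still sees and cancels the timer.
        timer = threading.Timer(seconds, expired.set)
        timer.start()
        result = func()
        return result, expired.is_set()
    finally:
        if timer is not None:
            timer.cancel()
```

Had the `threading.Timer(...)` call sat above the `try`, an exception between creation and entering the block would leave a live timer with nothing to cancel it.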
melwitt | whoa that is a lot of timeouts indeed | 20:19 |
melwitt | sean-k-mooney: I wasn't sure how to interpret this when I looked at it too, does this mean the eventlet queue does something special to work with greenlets? "The eventlet.queue module implements multi-producer, multi-consumer queues that work across greenlets, with the API similar to the classes found in the standard Queue and multiprocessing modules." https://eventlet.net/doc/modules/queue.html | 20:21 |
melwitt | if so, that could be why | 20:22 |
sean-k-mooney | ya so you have done the queue change in the same patch that swaps to futurist | 20:22 |
sean-k-mooney | i have them split out | 20:22 |
melwitt | right, I know :) | 20:22 |
sean-k-mooney | so it could be related | 20:23 |
melwitt | I'm saying if eventlet.queue is "special" and you're not using it, maybe queue won't work right. oh, but eventlet should be patching queue right? hm | 20:23 |
sean-k-mooney | ya i was assuming eventlet was patching it | 20:23 |
melwitt | if it's patching it I dunno why it wouldn't work 🤔 | 20:23 |
melwitt | yeah | 20:23 |
sean-k-mooney | ok i need to get dinner and feed frya. she is getting barky | 20:24 |
sean-k-mooney | chat to you tomorrow | 20:24 |
JayF | some eventlet queue usages have known bugs | 20:24 |
melwitt | haha k, talk to you tomorrow | 20:24 |
JayF | Let me read scrollback and see if you are in those categories, if so I can likely find the bugs | 20:24 |
JayF | itamarst has been digging them up left and right as part of the "fix eventlet" work | 20:24 |
JayF | He leaves me frightening issue links in my slack every now and then :D | 20:25 |
melwitt | JayF: it's the opposite, what we were chatting about. sean did an in-place swap from eventlet.queue to standard lib queue and got a lot of timed out jobs. which might not be related to the switch https://review.opendev.org/c/openstack/nova/+/900681/8?tab=change-view-tab-header-zuul-results-summary | 20:26 |
JayF | Yeah this tracks | 20:26 |
JayF | because I think there's an RLock used by stdlib queue | 20:26 |
JayF | which is still getting monkey patched in this case | 20:27 |
JayF | let me dig | 20:27 |
JayF | so my hypothesis is this: | 20:27 |
melwitt | earlier runs looked fine and I dunno if I see any difference in the code so I'm not sure | 20:27 |
JayF | by using a stdlib queue, you're exposing yourself to using threading.Nlock (N can be multiple things). Some of these, particularly RLock, are bugged in inconsistent ways (I am still looking for the github issue) | 20:28 |
melwitt | I didn't look in detail yet | 20:28 |
melwitt | hm ok | 20:31 |
sean-k-mooney | hum ok, not back for long; what i might do instead is this | 20:32 |
sean-k-mooney | add an indirection where we import a different queue based on whether we are patched or not and centralise that in the utils module | 20:32 |
sean-k-mooney | so that we can drop the selection of which to use in one place if we ever fully remove eventlet | 20:33 |
melwitt | ack | 20:34 |
sean-k-mooney | that way if the code in question is not monkeypatched it will use the std lib one, and where it is it will use the eventlet one | 20:34 |
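The indirection sean-k-mooney describes could look roughly like this (a sketch, not nova's implementation; the helper name is hypothetical, and it assumes `eventlet.patcher.is_monkey_patched` is the check used):

```python
def get_queue_class():
    """Return the Queue class matching the process's concurrency model.

    Centralising the choice here means the eventlet branch can be
    deleted in exactly one place if eventlet is ever fully removed.
    """
    try:
        import eventlet.patcher
        if eventlet.patcher.is_monkey_patched('thread'):
            # Process is monkeypatched: use the greenlet-aware queue.
            import eventlet.queue
            return eventlet.queue.Queue
    except ImportError:
        pass
    # Unpatched (or no eventlet installed): plain stdlib queue.
    import queue
    return queue.Queue
```

Callers would then write `q = get_queue_class()()` instead of importing either queue module directly.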
JayF | sean-k-mooney: is "monkeypatched" a boolean for you? You're not playing games with monkeypatching on/off for different modules, are you? | 20:34 |
sean-k-mooney | its global | 20:35 |
JayF | sean-k-mooney: that I think is going to be an attractive pattern that may lead to more pain than it's worth, based on my understanding many of the eventlet bugs in python3.11/3.12 which are just ... unfixable by design are mostly at the edges of stdlib interacting with monkey patched bits of code | 20:35 |
JayF | OK good, hopefully that means you'll avoid the sharpest edges | 20:35 |
sean-k-mooney | so in my case it depends on which directory the console script is defined in | 20:35 |
sean-k-mooney | if its under nova.cmd its monkeypatched | 20:35 |
sean-k-mooney | if its under nova.bin its not | 20:35 |
JayF | but there's never a case, as conjectured above, where say, queue would be std lib but threading would be eventlet? | 20:36 |
sean-k-mooney | not unless i screwed up | 20:36 |
sean-k-mooney | so the nova api in my series can be patched or unpatched | 20:36 |
sean-k-mooney | if run via the wsgi script its unpatched | 20:36 |
sean-k-mooney | if run via nova.cmd.nova-api it is | 20:37 |
JayF | okay cool, in that case my hypothesis is invalid and unlikely to be happening | 20:37 |
JayF | and I'd suggest potentially even enforcing that if possible | 20:37 |
sean-k-mooney | but within any binary its one or the other | 20:37 |
JayF | Do you mind a quick topic change? | 20:38 |
sean-k-mooney | sure but i need to leave in a few | 20:38 |
sean-k-mooney | whats on your mind | 20:38 |
JayF | Just curious for ironic-guest-metadata, would you mind doing the piece of that work of splitting out the libvirt code to the common compute driver? | 20:38 |
JayF | I'd prefer not operate in a portion of the nova codebase outside of the ironic driver codepath, just based on familiarity | 20:38 |
JayF | but I can if you are too busy helping save the world from eventlet (a project which I too am invested in) | 20:39 |
sean-k-mooney | hum, i might be able to; ill have to see if i have time but it should not be terribly complex | 20:39 |
sean-k-mooney | ask me again next week and ill see if i can help | 20:39 |
sean-k-mooney | i need to make some progress on the healthcheck stuff but im hoping to spend most of the next 4-5 weeks until code freeze mainly on upstream nova stuff | 20:40 |
sean-k-mooney | JayF: the eventlet stuff is just something im looking at in my personal time | 20:41 |
sean-k-mooney | so im not going to be spending that much time looking at it until we have some more detailed discussions in the next ptg | 20:41 |
sean-k-mooney | i know others have concerns over whether moving to asyncio or real threading is really the right move | 20:42 |
JayF | Honestly, I joke sometimes that I'm a sysadmin who pretends to be a developer, but what that means in reality is for stuff like "should we use asyncio or $other?" type discussions, I tend to let folks with stronger opinions and foundational knowledge battle it out :) | 20:48 |
JayF | my concern is that we use something well maintained, with reasonable patterns that anyone who works on openstack for the next decade will be able to follow | 20:48 |
JayF | so like, I have zero opinions about how we move; I have lots of opinions that moving is important, and doing it well is important (and as I've mentioned elsewhere, I've gotten a part time developer assigned to OpenStack upstream at GR-OSS, he is onboarding now and will be pointed at eventlet migrations for underresourced openstack projects once we have a migration path set) | 20:49 |
opendevreview | melanie witt proposed openstack/nova master: DNM test ephemeral encryption + resize: qcow2, raw, rbd https://review.opendev.org/c/openstack/nova/+/862416 | 22:22 |
opendevreview | melanie witt proposed openstack/nova master: DNM test ephemeral encryption + resize: qcow2, raw, rbd https://review.opendev.org/c/openstack/nova/+/862416 | 23:04 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!