opendevreview | Merged openstack/nova master: Add traits for viommu model https://review.opendev.org/c/openstack/nova/+/844507 | 00:26 |
---|---|---|
gibi | bauzas: yeah, I saw the schedule but did not pondered much about it. I will be busy with downstream stuff at least until end of this year or potentially until next summer. That will have more impact on my AA contribution than the AA schedule has | 07:12 |
bauzas | gibi: sure, but that means we have less time for features this cycle | 07:19 |
gibi | yes we will have less review bandwidth in the team and sorter cycle | 07:22 |
gibi | we have couple of in progress features (image encryption, manila support, PCI in placement) if we could finish those in AA that would be totally enough for me | 07:32 |
gibi | I will not push for the neutron based sriov support in placement feature in AA. I probably baby sitt the remaining parts of the flavor based PCI in placement feature (basically the scheduling part) | 07:33 |
opendevreview | Justas Poderys proposed openstack/nova-specs master: Improve multiqueue network adapter support https://review.opendev.org/c/openstack/nova-specs/+/855514 | 08:20 |
opendevreview | Justas Poderys proposed openstack/nova-specs master: Improve multiqueue network adapter support https://review.opendev.org/c/openstack/nova-specs/+/855514 | 08:29 |
opendevreview | Justas Poderys proposed openstack/nova-specs master: Improve multiqueue network adapter support https://review.opendev.org/c/openstack/nova-specs/+/855514 | 08:32 |
opendevreview | Justas Poderys proposed openstack/nova-specs master: Improve multiqueue network adapter support https://review.opendev.org/c/openstack/nova-specs/+/855514 | 08:52 |
sean-k-mooney1 | gibi: you have a pci spec updte i assume that will describe what is complted in zed and what is left | 09:31 |
sean-k-mooney1 | https://review.opendev.org/c/openstack/nova-specs/+/855218 | 09:31 |
gibi | sean-k-mooney1: I have a -1 on it to add a note that the scheduling part is not landed | 09:31 |
sean-k-mooney1 | so will we mark the blueprint as implemnted nad redefien the scope | 09:31 |
sean-k-mooney1 | ack | 09:31 |
gibi | that is a good question | 09:31 |
*** sean-k-mooney1 is now known as sean-k-mooney | 09:31 | |
gibi | I'm not sure I want to create a separate spec for the scheduling part | 09:32 |
gibi | that would need a lot of context building in the spec | 09:32 |
gibi | I would keep the whole thing in a single doc | 09:32 |
gibi | bauzas, sean-k-mooney: how do you feel about it? | 09:32 |
sean-k-mooney | well we will need to copy it to the right repo | 09:32 |
gibi | sure | 09:32 |
sean-k-mooney | and a few minor updates | 09:32 |
gibi | re-proposing it is OK | 09:32 |
sean-k-mooney | baiscaily startign her <=== | 09:32 |
sean-k-mooney | *here | 09:33 |
gibi | yes | 09:33 |
gibi | that is OK | 09:33 |
gibi | but I don't want to split and rewrite | 09:33 |
sean-k-mooney | ok i guess we can see what that looks like | 09:33 |
sean-k-mooney | so keep the same blueprint then for AA too | 09:34 |
sean-k-mooney | thats proably ok | 09:34 |
gibi | I'm OK to create a new bp if that helps tracking, but I don't want to create a totally new spec | 09:34 |
sean-k-mooney | ack lets just update the work items to mark what is left | 09:35 |
gibi | my plan sort term: 1) move the existing two FUPs to master and merge them before RC1 2) update the Zed spec and merge that 3) re-propose the spec to AA | 09:36 |
sean-k-mooney | sure ping me when those are ready to review | 09:37 |
gibi | ack, thanks | 09:38 |
opendevreview | John Garbutt proposed openstack/nova master: update default overcommit https://review.opendev.org/c/openstack/nova/+/830829 | 09:38 |
bauzas | gibi: sorry, dad taxi here, will reply later | 09:39 |
gibi | bauzas: no worries, it is not urgent | 09:42 |
sean-k-mooney | bauzas: gibi john just remmebered one of my patches https://review.opendev.org/c/openstack/nova/+/830829 | 09:43 |
sean-k-mooney | can we land that | 09:43 |
gibi | being at FF makes this comment https://review.opendev.org/c/openstack/nova/+/830829/3#message-12aca1ab8f202fbfa1793c458103055db16a67a5 pretty heavy | 09:45 |
sean-k-mooney | we punted this last feature freeze for the same reason | 09:46 |
sean-k-mooney | i orginally created the patch right at the end of yoga because we agreed to change it in the ptg and the person that raised it did not work on it | 09:48 |
sean-k-mooney | i had tought this already landed this cycle | 09:48 |
sean-k-mooney | if we delay it to AA i would like to merge it right after RC1 | 09:48 |
gibi | it sit with a -1 since Jul unfortunately | 09:48 |
sean-k-mooney | ya i forgot this existied | 09:49 |
sean-k-mooney | which is my bad | 09:49 |
gibi | one alternative I imagine is to merge it and at the same time check the effect of it very carefully and be ready to revert it before RC1 if we see anomalies in CI due to this | 09:51 |
auniyal | O/ | 09:51 |
sean-k-mooney | ya we could | 09:51 |
auniyal | for VM resize, why its required to run resize confirm | 09:51 |
sean-k-mooney | auniyal: becaue the resize might break the vm | 09:52 |
sean-k-mooney | resizing is changing the flavor | 09:52 |
auniyal | but how can I know, that it may break, I can verify | 09:52 |
sean-k-mooney | which may increase or deccreate the resouces allocated and may also change other aspect like numa toplogy or cpu pinning | 09:52 |
sean-k-mooney | yes if the vm is running before you resize it | 09:52 |
sean-k-mooney | in the resize verify status | 09:53 |
sean-k-mooney | you can ssh in | 09:53 |
sean-k-mooney | and check the vm | 09:53 |
sean-k-mooney | its in resize_verify when you need to conrim or revert | 09:53 |
auniyal | okay. while instance.status is verify_resize, we can go and check VM | 09:53 |
sean-k-mooney | yep that is why that state exists | 09:54 |
sean-k-mooney | for you to verify the vm | 09:54 |
sean-k-mooney | we have discused changing this and actully partly agree too but i have not had time to write the spec and work on it | 09:54 |
sean-k-mooney | its a nice UX feature enahcnment but not one that PM prioritiesed | 09:55 |
sean-k-mooney | https://etherpad.opendev.org/p/r.321f34cf3eb9caa9d87a9ec8349c3d29 | 09:57 |
sean-k-mooney | https://etherpad.opendev.org/p/r.321f34cf3eb9caa9d87a9ec8349c3d29#L613 | 09:57 |
sean-k-mooney | that is where we last discussed this if you want context of what we considerd changing. | 09:57 |
sean-k-mooney | whoami-rajat: hehe ^ last line of the agreement is "wait for the rebuild of bvf solution before we do the --image part" context is server recreate api/resize enhancements we discussed back in the wallaby ptg | 09:59 |
bauzas | gibi: what you can do is to use the same spec for Antelope | 10:06 |
bauzas | and just saying what you need to do in AA in the work items if you want | 10:07 |
gibi | bauzas: ack, I think this is aligned with sean-k-mooney's view. | 10:07 |
bauzas | this way, we should just accept again this spec for Antelope quickly | 10:07 |
gibi | OK | 10:08 |
gibi | thanks | 10:08 |
sean-k-mooney | auniyal: i ment to link this before https://docs.openstack.org/nova/latest/contributor/resize-and-cold-migrate.html | 10:10 |
opendevreview | Justas Poderys proposed openstack/nova-specs master: Improve multiqueue network adapter support https://review.opendev.org/c/openstack/nova-specs/+/855514 | 10:14 |
gibi | JayF, sean-k-mooney: I think I figured out the root cause of https://bugs.launchpad.net/nova/+bug/1988482 will push a requirements patch soon | 10:18 |
sean-k-mooney | ack i assume its sqlalcemey related or similar? | 10:23 |
sean-k-mooney | oh PrettyTable | 10:23 |
sean-k-mooney | did know tha was a thing we used | 10:24 |
justas_napa | Sorry for spamming the channel with proposals. Finally pass all syntax checks. While uploading our source changes we noticed that the codebase changes, we are porting the changes tp the latest and will uppload them in coming days. | 10:37 |
sean-k-mooney | justas_napa: is that the multi queue spec? | 10:38 |
sean-k-mooney | im reviewing it now and it has many issues form a design point of view | 10:38 |
justas_napa | yes it is. I'm open to discuss all issues | 10:39 |
sean-k-mooney | if we add support for multiqueue with hardware offloaded ovs i woudl exepct it to look differnt then you propose | 10:39 |
justas_napa | should we discuss your ideas here in IRC or rather in Gerrit? | 10:42 |
sean-k-mooney | im leaving commets in gerrit now | 10:42 |
sean-k-mooney | in general requesting the number of quese via the port binding profiel is not ok | 10:43 |
sean-k-mooney | you will need a new neutron extesion | 10:43 |
auniyal | ack, thanks sean-k-mooney | 10:43 |
sean-k-mooney | that will need a neutron spec and there will have to be a dicussion of if that shoudl be something that the operator defince and a user slects | 10:44 |
sean-k-mooney | kind of like qos polcies | 10:44 |
auniyal | I was looking this one https://docs.openstack.org/nova/ussuri/user/resize.html | 10:44 |
sean-k-mooney | or if the user should be able ot request quese | 10:44 |
justas_napa | in my PoV, it should be something that operator defines, because it is related to the underlying HW | 10:49 |
justas_napa | like a flavor of an instance | 10:50 |
sean-k-mooney | justas_napa: which is not compatible how openstack is ment to work | 10:50 |
sean-k-mooney | openstack is ment to be an cloud abstraction | 10:50 |
sean-k-mooney | there are some atribute that we can expsoe but low level hardware turnign is out of scope | 10:51 |
sean-k-mooney | justas_napa: what we can do is scudle based on how many quee are avaible with some constriats | 10:51 |
sean-k-mooney | and provide a way to ask for a VF with x queue | 10:52 |
justas_napa | yes, I think this would be OK | 10:52 |
sean-k-mooney | we can also have the libvirt driver do min(vf_queue,vcpus) | 10:52 |
sean-k-mooney | on a per interface basis | 10:53 |
justas_napa | that is actually a very good idea | 10:53 |
sean-k-mooney | those are the types of abstration that we need to build to make this useable in a openstack context. | 10:53 |
sean-k-mooney | where the end user does not know anything about the hardware and the operator knows nothing about the worklaod | 10:54 |
gibi | elodilles: JayF: sean-k-mooney: the requirement fix for the stable/yoga prettytable gate block https://review.opendev.org/c/openstack/requirements/+/855643 | 10:55 |
gibi | frickler: ^^ this is also related to the prettytable issue you reported last weekend | 10:57 |
justas_napa | but allowing to ask for VF w/ X queues meant that we expose the use to some kind of knowledge of underlying HW | 10:59 |
sean-k-mooney | it depens on how its done | 11:00 |
justas_napa | Isn't this like allowing a user to choose an instance with 1G or 4G of RAM? | 11:00 |
sean-k-mooney | for example if we created a neutron qos policy that enabled (packeed queues, multi queue and 4 queue) | 11:01 |
sean-k-mooney | then the user could select that | 11:01 |
justas_napa | ant then allocating this instance on a server that has enough resources (in this case, a VF with required number of queues) | 11:01 |
sean-k-mooney | the operator is provideeing the hward ware knoladge | 11:01 |
sean-k-mooney | the user is jsut select the mutliqueue qos policy | 11:01 |
sean-k-mooney | justas_napa: if we did it ^ way | 11:02 |
sean-k-mooney | it woudl be | 11:02 |
sean-k-mooney | if we jsut put the number of quese in the bining profile its not | 11:02 |
frickler | gibi: ah, I'd rather see a fix than a pin, but anyway. also not sure how we actually py39 pins when we don't have them there globally | 11:03 |
justas_napa | I think we are aligned on the overall idea of implementation. The only question is the implementation. | 11:03 |
sean-k-mooney | i think you will need to build agreement between both nova and neutron to move this forward | 11:05 |
gibi | frickler: the above pin is for stable/yoga. I'm not sure we ever intended to test with unconstrained deps on stable | 11:05 |
sean-k-mooney | it will require 2 specs | 11:05 |
sean-k-mooney | justas_napa: and it will require quite a bit more designg work before it can proceed | 11:05 |
sean-k-mooney | justas_napa: im not against supprot multi queue with hardware offloead ovs | 11:05 |
sean-k-mooney | but just seting expectation that his is a lot more invoveld then your spec implies. | 11:06 |
frickler | gibi: right, let's discuss that in the requirements team. | 11:08 |
sean-k-mooney | justas_napa: have you read https://specs.openstack.org/openstack/nova-specs/specs/zed/approved/pci-device-tracking-in-placement.html | 11:08 |
gibi | frickler: sure | 11:08 |
sean-k-mooney | justas_napa: if not you shoudl read it in detail becasue your propoals will ahve to be compatable with that and its futrue direction | 11:08 |
sean-k-mooney | gibi: https://review.opendev.org/c/openstack/nova-specs/+/855514/ that would be good for you to add to your review list as the scdhuling aspecst will directly affect pci in placment eventually | 11:10 |
sean-k-mooney | justas_napa: have you spoken to the neutron team about this by the way | 11:11 |
sean-k-mooney | gibi: just reviewd https://review.opendev.org/c/openstack/placement/+/849348 form stephen. we are just pass FF bug this is a bug fix as far as im concerned so is it still ok to merge it | 11:15 |
gibi | sean-k-mooney: this is a bugfix so I'm OK to land it | 11:16 |
gibi | sean-k-mooney: bug we don't have the tracking bug for it | 11:16 |
gibi | bug/but | 11:16 |
sean-k-mooney | ya i noticed that. we dont always require it but it is nice to have. it would be a story because placment | 11:17 |
sean-k-mooney | i stongly dislike story broard | 11:17 |
sean-k-mooney | i suspect that is why stephen did not file one too | 11:17 |
sean-k-mooney | bauzas: any input ^ | 11:19 |
sean-k-mooney | or stephenfin if you are about | 11:19 |
elodilles | gibi: thanks, looks good! | 11:26 |
*** carloss is now known as carloss|afk | 11:28 | |
gibi | sean-k-mooney: I've added my immediate questions and suggestions the the smartnic mutiqueue spec | 11:29 |
gibi | thanks for the headsup | 11:29 |
elodilles | gibi: i just wonder why this does not affect master branch :-o | 11:29 |
gibi | elodilles: on master prettytable is pinned | 11:30 |
gibi | in upper constarints | 11:30 |
gibi | I think | 11:30 |
elodilles | oh, is it? :-o | 11:30 |
gibi | https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L141 | 11:30 |
gibi | it is | 11:30 |
gibi | without python version filter | 11:30 |
gibi | but on stable/yoga | 11:30 |
gibi | it is only pinned for py38 and py36 | 11:30 |
gibi | not for py39 | 11:31 |
gibi | the general issue that there is no py39 pins in upper constarints on stable/yoga is now raised in the requirements channel | 11:31 |
elodilles | for the record, this means whenever prettytable reqs will be bumped on master branch things will break on master as well :S | 11:32 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename _to_device_spec_conf to _to_list_of_json_str https://review.opendev.org/c/openstack/nova/+/855648 | 11:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce PCI pool filtering bug https://review.opendev.org/c/openstack/nova/+/855649 | 11:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Strictly follow placement allocation during PCI claim https://review.opendev.org/c/openstack/nova/+/855650 | 11:36 |
gibi | elodilles: yes it was detected during the last weekend by frickler here https://zuul.opendev.org/t/openstack/build/2e4535d6aa2d4fd987ddfd0d0bc296d1 | 11:37 |
gibi | elodilles: and the master constraint was not bumped | 11:37 |
gibi | https://review.opendev.org/c/openstack/requirements/+/854862/2..3/upper-constraints.txt | 11:38 |
elodilles | yet | 11:40 |
elodilles | but we'll need to bump after some time | 11:40 |
elodilles | today is the Requirements Freeze date, so for Zed this is probably OK | 11:41 |
gibi | yes, I agree that we should do something on master | 11:42 |
gibi | I'm not sure at the moment that we need to directly adapt to the breaking change in a minor version or what | 11:42 |
gibi | as I haven't looked it into what exactly changed in prettytable and what our test expects exacltly | 11:43 |
elodilles | yepp, i also don't know whether this is a feature or a bug in prettytable :) | 11:43 |
gibi | exactly | 11:43 |
justas_napa | sean-k-mooney: sorry for a late reply, work meeting. No, I have not talked with Neutron yet, I check the proposal you've linked. | 11:43 |
auniyal | I am getting lot of these - http://pastebin.test.redhat.com/1072393, for unknown reason | 11:43 |
auniyal | flavour is available, I can see it user openstack flavor show, but still .... | 11:44 |
gibi | auniyal: you have to put that into a public pastebin as the linked one is RH internal | 11:44 |
gibi | ie. you can use https://paste.opendev.org/ | 11:44 |
gibi | I can look at it as I'm in RH but like elodilles cannot | 11:45 |
auniyal | thanks, gibi, sure will use opendev | 11:46 |
auniyal | copied here as well - https://paste.opendev.org/show/bDpuAInn7ZW6bETeUVjU/ | 11:46 |
gibi | auniyal: GET /flavors/<flavor_id> expects a flavor_id not the name of the flavor https://docs.openstack.org/api-ref/compute/?expanded=show-flavor-details-detail#show-flavor-details | 11:48 |
gibi | so in "GET /compute/v2.1/flavors/m1.medium" the m1.medium is the name of the flavor not the id | 11:49 |
auniyal | I ran resize operation, last time it took name | 11:49 |
gibi | the openstack client can translate between names and ids | 11:49 |
auniyal | like today only | 11:49 |
opendevreview | Vlad Gusev proposed openstack/nova stable/stein: Ensure MAC addresses characters are in the same case https://review.opendev.org/c/openstack/nova/+/855553 | 11:50 |
auniyal | so I should try to use ID instead of name | 11:50 |
gibi | btw, during such translation the openstack client will try the name as a flavor id and if that query returns 404 then it lists the flavors and filter by name locally | 11:50 |
gibi | if you do things via the openstack client then you can use --debug to look at what API requests the client make so you can correlate the error in the nova log with the client request | 11:52 |
gibi | the above API log can be the result of the client trying to use the flavor name as id and then fallbacking to listing the the flavors whent that query returns 404 | 11:53 |
frickler | seems they are not sure yet whether the change in header alignment is a bug or a feature https://github.com/jazzband/prettytable/pull/183 | 11:55 |
sean-k-mooney | frickler: if tis a feature its tecnially a breaking change so should have been a major bump | 11:57 |
frickler | sean-k-mooney: ack, added a comment on that PR now | 11:59 |
gibi | frickler: thanks | 11:59 |
frickler | maybe then we can avoid having to fix it in nova and just exclude 3.4.0 | 11:59 |
sean-k-mooney | by the way why are we not hitting this on master? | 12:04 |
sean-k-mooney | ah | 12:05 |
sean-k-mooney | master is clampped already to 3.3.3 | 12:05 |
sean-k-mooney | 3.3.0 | 12:05 |
sean-k-mooney | https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L141 | 12:05 |
sean-k-mooney | frickler: we are not clamping it on master today https://github.com/openstack/requirements/blob/5b346cf74df6bbfc67f794c053d3c4210da4bacb/global-requirements.txt#L197 | 12:06 |
sean-k-mooney | we just have not merged the update to move it forward form 3.3.0 | 12:06 |
sean-k-mooney | but we have done that on yoga | 12:06 |
sean-k-mooney | presumable this can break all branches | 12:07 |
sean-k-mooney | so we shoudl definlaly also clamp it on master to prevent breaking the gat during FF | 12:07 |
sean-k-mooney | elodilles: gibi ^ | 12:08 |
gibi | sean-k-mooney: it is known and intentionally not updated on master by https://review.opendev.org/c/openstack/requirements/+/854862/2..3/upper-constraints.txt | 12:09 |
gibi | sean-k-mooney: we are pinned to 3.3.0 on master at the moment | 12:09 |
sean-k-mooney | where | 12:09 |
sean-k-mooney | that goign to get overrided when the bot runs | 12:10 |
gibi | https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L141 | 12:10 |
gibi | sean-k-mooney: this all happend already. the bot ran and update the constraint, the nova job failed on the requirements patch, and then frickler removed the bump from the patch | 12:11 |
sean-k-mooney | hum ok | 12:11 |
gibi | frickler even pinged us during the weekend about it :) | 12:11 |
sean-k-mooney | so instead of pinning it in global-requireemnts where the both will not try to bump it | 12:11 |
sean-k-mooney | we are pinnign it in the generate file | 12:11 |
sean-k-mooney | and relaying on the failure | 12:11 |
gibi | I think it is OK while we wait for the devs to decide if this is a bug or not | 12:12 |
sean-k-mooney | ok | 12:12 |
sean-k-mooney | gibi: if they decied its not a bug and dont fix ti we will likely need to disable those tests | 12:13 |
sean-k-mooney | or rewirte them to not use pretty table | 12:14 |
sean-k-mooney | unless the assert fails | 12:14 |
gibi | sure we need to do something with it, but I haven't looked at the tests so I don't know which direction we should go | 12:14 |
sean-k-mooney | we cant just update the string to the new format as it would fail with older version of pretty table | 12:15 |
sean-k-mooney | and we cant raise the test requirement on stable | 12:15 |
sean-k-mooney | so it will need a differnt solution that works with both formats | 12:15 |
gibi | I do believe we will kep this pinned on stable | 12:15 |
gibi | keep | 12:15 |
sean-k-mooney | we could unpin in master and use the new format | 12:16 |
sean-k-mooney | that would mean that would effectivly be our min version for zed | 12:16 |
sean-k-mooney | for testing at least | 12:16 |
sean-k-mooney | i think this is not a runtime requirement | 12:16 |
gibi | I would not change this for Zed either so close to RC1 | 12:16 |
gibi | as we dont need anything from 3.4.0 | 12:17 |
sean-k-mooney | so pin for both | 12:17 |
sean-k-mooney | zed and yoga | 12:17 |
sean-k-mooney | i guess we could do that | 12:17 |
gibi | probably yes | 12:17 |
gibi | and understandt the test code then fix it on master | 12:17 |
sean-k-mooney | this is the current test https://github.com/openstack/nova/blob/master/nova/tests/unit/cmd/test_manage.py#L48-L67 | 12:19 |
sean-k-mooney | so the alieng ment i think changed form left aligned to centered | 12:19 |
sean-k-mooney | In 3.3.0, the data and header were left-aligned. | 12:20 |
sean-k-mooney | In 3.4.0, only the data was left-aligned, the header was centre-aligned. | 12:20 |
frickler | FYI all these blockers are tracked in https://etherpad.opendev.org/p/requirements-blockers (at least that's the idea). help to unblock things is always welcome | 12:21 |
sean-k-mooney | we set the alignment here https://github.com/openstack/nova/blob/90e2a5e50fbf08e62a1aedd5e176845ee22d96c9/nova/cmd/manage.py#L124 | 12:21 |
sean-k-mooney | so we might jsut be able to set pt.header_align='l' | 12:21 |
sean-k-mooney | ill try that quickly and see if it does anything | 12:23 |
gibi | ack | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Follow up for the PCI in placement series https://review.opendev.org/c/openstack/nova/+/855185 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Doc follow up for PCI in placement https://review.opendev.org/c/openstack/nova/+/855186 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Create RequestGroups from InstancePCIRequests https://review.opendev.org/c/openstack/nova/+/852771 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support resource_class and traits in PCI alias https://review.opendev.org/c/openstack/nova/+/853316 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Split PCI pools per PF https://review.opendev.org/c/openstack/nova/+/854440 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Map PCI pools to RP UUIDs https://review.opendev.org/c/openstack/nova/+/854118 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make allocation candidates available for scheduler filters https://review.opendev.org/c/openstack/nova/+/854119 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Filter PCI pools based on Placement allocation https://review.opendev.org/c/openstack/nova/+/854120 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Factor out base class for candidate aware filters https://review.opendev.org/c/openstack/nova/+/854929 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Store allocated RP in InstancePCIRequest https://review.opendev.org/c/openstack/nova/+/854121 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Func test for PCI in placement scheduling https://review.opendev.org/c/openstack/nova/+/854122 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support cold migrate and resize with PCI tracking in placement https://review.opendev.org/c/openstack/nova/+/854247 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support evacuate with PCI in placement https://review.opendev.org/c/openstack/nova/+/854615 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support unshelve with PCI in placement https://review.opendev.org/c/openstack/nova/+/854616 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support same host resize with PCI in placement https://review.opendev.org/c/openstack/nova/+/854441 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Test reschedule with PCI in placement https://review.opendev.org/c/openstack/nova/+/854626 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Test multi create with PCI in placement https://review.opendev.org/c/openstack/nova/+/854663 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow enabling PCI scheduling in Placement https://review.opendev.org/c/openstack/nova/+/854924 | 12:25 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename _to_device_spec_conf to _to_list_of_json_str https://review.opendev.org/c/openstack/nova/+/855648 | 12:25 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce PCI pool filtering bug https://review.opendev.org/c/openstack/nova/+/855649 | 12:25 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Strictly follow placement allocation during PCI claim https://review.opendev.org/c/openstack/nova/+/855650 | 12:25 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Follow up for the PCI in placement series https://review.opendev.org/c/openstack/nova/+/855654 | 12:25 |
bauzas | sorry folks, I was taxiing again kids and all the likes | 12:25 |
bauzas | sean-k-mooney: what was your question for me ? | 12:25 |
sean-k-mooney | its not imporant | 12:25 |
sean-k-mooney | we had a minor bugfix for palcment and wanted to know if i can merge it | 12:26 |
sean-k-mooney | after talkign to gibi i +w'd it | 12:26 |
gibi | stephenfin, sean-k-mooney: I have two FUPs for the merged part of the PCI series. One for the code https://review.opendev.org/c/openstack/nova/+/855185 and one for the doc https://review.opendev.org/c/openstack/nova/+/855186 | 12:26 |
sean-k-mooney | gibi: ack ill take a look | 12:27 |
gibi | now they are up to date and on the proper base | 12:27 |
gibi | sean-k-mooney: thanks | 12:27 |
sean-k-mooney | on master | 12:27 |
sean-k-mooney | or on the last patch that is pending in the gate | 12:27 |
gibi | on top of https://review.opendev.org/c/openstack/nova/+/853835 which is being merged | 12:27 |
sean-k-mooney | ack | 12:27 |
sean-k-mooney | gibi: frickler ok that works ill push a patch shortly | 12:40 |
frickler | sean-k-mooney: great, did you confirm with older prettytable, too? | 12:42 |
sean-k-mooney | yep | 12:42 |
sean-k-mooney | i have not run all tests to see if it fixes all of them yet | 12:42 |
sean-k-mooney | but that one test now passes | 12:43 |
sean-k-mooney | so just adding the header alignemnt seames to work | 12:43 |
frickler | o.k., nice | 12:43 |
gibi | sean-k-mooney: sounds good | 12:45 |
opendevreview | sean mooney proposed openstack/nova master: add header alingment for PrettyTable 3.4.0 https://review.opendev.org/c/openstack/nova/+/855658 | 12:46 |
sean-k-mooney | ill run the full test now locally | 12:46 |
opendevreview | Balazs Gibizer proposed openstack/nova-specs master: Update the PCI in placement spec https://review.opendev.org/c/openstack/nova-specs/+/855218 | 12:46 |
gibi | sean-k-mooney, stephenfin: ^^ and this is the update of the Zed spec that sync the spec with the implementation | 12:47 |
sean-k-mooney | ok that passed unit test localy so it should be good | 12:53 |
sean-k-mooney | gibi: any prefernce on what i review first | 12:54 |
sean-k-mooney | gibi: i was goign to look at the code tehn the spec | 12:54 |
gibi | sean-k-mooney: any order is OK | 12:54 |
*** dasm|off is now known as dasm | 13:00 | |
*** carloss|afk is now known as carloss | 13:03 | |
sean-k-mooney | stephenfin: one question i gues for you in https://review.opendev.org/c/openstack/nova/+/855186 or gibi | 13:06 |
sean-k-mooney | gibi: but the two followup look ok to me | 13:06 |
gibi | heh, that is a nice question | 13:06 |
gibi | I tried many ways to combine them and get a good looking result in the html | 13:06 |
sean-k-mooney | oh i guess i should look at that | 13:06 |
gibi | the current source produces the best html result for me | 13:06 |
gibi | but I'm curious if stephenfin has a better one | 13:07 |
gibi | for me reSt can be mistery sometimes | 13:07 |
opendevreview | Balazs Gibizer proposed openstack/nova-specs master: Re-propose PCI Device Tracking In Placement for A https://review.opendev.org/c/openstack/nova-specs/+/855661 | 13:09 |
sean-k-mooney | it looks alreight i guess https://717a5ffd6a01c6a737e0-3e7434681f5fe092d7172e763d28f5ee.ssl.cf5.rackcdn.com/855186/3/check/openstack-tox-docs/1737365/docs/admin/pci-passthrough.html | 13:09 |
sean-k-mooney | gibi: im going to delegate ot stepehn on this | 13:10 |
gibi | sure | 13:11 |
gibi | if there is a better way then I will change | 13:11 |
sean-k-mooney | i actully tough thte render was goign to be more dramatic then it is | 13:12 |
sean-k-mooney | like a note or imporant | 13:12 |
sean-k-mooney | this is pretty subtle | 13:12 |
bauzas | gibi: sorry if you felt abandoned for your PCI series | 13:12 |
sean-k-mooney | so i dont mind the duplicates as much | 13:12 |
bauzas | I didn't had time to look at all the merged changes | 13:12 |
opendevreview | Merged openstack/nova master: Generate request_id for Flavor based InstancePCIRequest https://review.opendev.org/c/openstack/nova/+/853835 | 13:12 |
sean-k-mooney | bauzas: you can make it up to gibi by looking at the unmerged once after RC1 :P | 13:13 |
gibi | bauzas: no worries, sean-k-mooney and stephenfin did an superb job providing feedback and reviews | 13:13 |
gibi | this was a lot of code (the whole series is over 8KLOC) and we merged a substantial and meaningful part of it | 13:16 |
sean-k-mooney | and fixed some latent bugs | 13:16 |
sean-k-mooney | hard to find ones at that | 13:16 |
sean-k-mooney | specificaly that decorator eating the excption | 13:17 |
gibi | yes, I was able to pull out some bugs in the process too. that was nice | 13:17 |
gibi | so while I'm sad that we could not land the whole thing in Zed, I think we made a good progress on something that was on our plate at least since 2019 | 13:17 |
sean-k-mooney | only 2019 | 13:18 |
sean-k-mooney | pci in placment is more or less why we started nested resouce providers | 13:18 |
gibi | I remember we whiteboarded it in China | 13:18 |
gibi | the rest is fading in my memory | 13:18 |
gibi | but sure there is a lot of history | 13:18 |
sean-k-mooney | yep | 13:19 |
sean-k-mooney | which make it niceer to see it finally making good progress | 13:19 |
sean-k-mooney | im happy with where we got to this cycle | 13:19 |
gibi | I afraid we will lose some momentum due to downstream priorities but I also hope I can land the existing patches in AA at least | 13:22 |
gibi | btw, I fixed the last kown bug of the scheduling part. | 13:22 |
sean-k-mooney | gibi: did you see the patch that bauzas shared yesterday about the propsoed release schdule | 13:22 |
sean-k-mooney | it also look like AA will be quite short | 13:23 |
gibi | sean-k-mooney: I saw | 13:23 |
gibi | I think we should focus on finishing in AA what we started in Z. like manial support, image encryption, PCI | 13:24 |
gibi | and commit only to a small number of small new features | 13:24 |
sean-k-mooney | yep same i was thinking that likely means that we should defer the neturon part to BB | 13:24 |
gibi | I will have no time to do significant work on the neutron part in AA timeframe | 13:25 |
gibi | so I agree | 13:25 |
sean-k-mooney | basiclaly this will need to get finsihed by mid decemerb | 13:25 |
gibi | jeez that is closer that I thought | 13:25 |
sean-k-mooney | ya so m2 is proposed as jan 4th | 13:25 |
sean-k-mooney | but i dont think we will have enough pepopel to merge stuff after decemebr 10th | 13:25 |
sean-k-mooney | many a littl latere | 13:26 |
sean-k-mooney | then FF woudl be febuary 14 | 13:26 |
sean-k-mooney | so about 6 week of time after the winder break | 13:26 |
sean-k-mooney | gibi: ideally i would hope we could merge the rest of the pci seriese by the end of septemeber or early october | 13:26 |
sean-k-mooney | but we will have to see how RC1 goes | 13:27 |
gibi | it is feature complete today :) (but to be honest how I store a list of RP UUIDs in InstancePCIRequest.extra_info might need a rework) | 13:28 |
sean-k-mooney | i have not looked at that bit yet | 13:28 |
gibi | https://review.opendev.org/c/openstack/nova/+/854121/13/nova/compute/utils.py#1537 | 13:29 |
sean-k-mooney | i was expecting the uuid in the device extra info column | 13:29 |
sean-k-mooney | but only one | 13:29 |
gibi | extra_infor is a dict of strings | 13:29 |
gibi | and I needed to store a list of UUIDs | 13:29 |
gibi | so I serialized /o\ | 13:29 |
gibi | one InstancePCIRequest might be fulfilled from a list of RPs | 13:29 |
gibi | due to count > 1 | 13:30 |
gibi | we need the mapping _before_ the PciDevice object is allocated to drive the selection | 13:30 |
sean-k-mooney | hum ill have to think about htat | 13:30 |
gibi | sure | 13:30 |
sean-k-mooney | i was epecting to be able to do a 1 :1 mappign of each request object to one rp uuid from the allocation | 13:31 |
sean-k-mooney | since we are going to split the flavor based allcoation into multipel request objects | 13:31 |
gibi | request group - RP is in 1:1 but I did not split InstancePCIRequest objects jut the RequestGroups | 13:31 |
sean-k-mooney | we shoudl split the object too i think | 13:31 |
sean-k-mooney | that way you can directly map each rest object to the group and allocation | 13:32 |
sean-k-mooney | is there a downside to doing that? at least for new code path | 13:32 |
sean-k-mooney | you dont need to answer that now just woth thinking about | 13:33 |
gibi | the stats module does many things per InstancePCIRequest today | 13:33 |
gibi | but it already handles that a single instance has multiple requests | 13:34 |
sean-k-mooney | yep it need to because of neutron and the fact we can have multiple differnt alaises | 13:34 |
sean-k-mooney | i think we can simplfy the code and basicaly alway have a count of 1 | 13:34 |
gibi | yeah , if we can split, then the count filed become unused | 13:35 |
sean-k-mooney | i think thats ok | 13:35 |
gibi | and some logic where we loop on request and the loop on count can be refactored to a single loop | 13:35 |
sean-k-mooney | we can drop it in a future version if we rev the object major version | 13:35 |
gibi | the only thing where we have to be careful is code that might did some affinity or block based decision based on a single request with count > 1 | 13:36 |
gibi | instead of doing it device by device | 13:37 |
gibi | the filter_pools code need to be checked | 13:37 |
sean-k-mooney | gibi: well it was ment to be per device today | 13:37 |
sean-k-mooney | so if it was per alias that would be a bug | 13:38 |
sean-k-mooney | i think you noted that the request could be fullfiled form differnt pools already | 13:38 |
sean-k-mooney | so hopefuly that all works | 13:38 |
sean-k-mooney | but soudn like more functional test can prove that one way or another | 13:38 |
gibi | yeah, I'm not afraid of the actual consumption part as that already needs to work per device as we can have multiple RP uuid per PCIreq in the current series | 13:40 |
gibi | this was my last fix btw ^^ | 13:40 |
sean-k-mooney | ah ok | 13:41 |
sean-k-mooney | that was the bug | 13:41 |
frickler | gibi: sean-k-mooney: prettytable 3.4.1 was just releases which reverts the broken change. nova tests work fine for me with that locally. does one of you want to propose the exclude in reqs? | 13:41 |
gibi | frickler: if you already at it then feel free to propose the exclude | 13:41 |
sean-k-mooney | frickler: should we proceed with hardeing the nova code anyway | 13:41 |
sean-k-mooney | my patch works with 3.4.1 too it seams | 13:42 |
frickler | can't hurt to be on the safe side, then, I'd say | 13:43 |
sean-k-mooney | ok ill keep it open and backport it to yoga so | 13:43 |
sean-k-mooney | at least that way if a disto hits this issue or they make the chagne in 4.0 we will be fine | 13:44 |
frickler | now I only need to find out the path it get's pulled in, since it doesn't seem to be a direct dependency | 13:48 |
sean-k-mooney | frickler: PrettyTable ? | 13:52 |
sean-k-mooney | nova depend on it directly but i guess we might not list it | 13:53 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/requirements.txt#L18 | 13:53 |
sean-k-mooney | its there | 13:53 |
bauzas | sean-k-mooney: I checked all the libs and we don't need a new release | 13:53 |
bauzas | did I miss one ? | 13:54 |
sean-k-mooney | we may want anotther release of python-novaclint | 13:54 |
sean-k-mooney | brb | 13:54 |
sean-k-mooney | if we merge some pending patches | 13:54 |
opendevreview | Slawek Kaplonski proposed openstack/nova master: WIP Don't provide MTU value in metadata service if DHCP is enabled https://review.opendev.org/c/openstack/nova/+/855664 | 14:02 |
frickler | sean-k-mooney: ah, CaseMismatch in my search, thx | 14:04 |
sean-k-mooney | gibi: i may hae missed something but https://review.opendev.org/c/openstack/nova-specs/+/855218 looks good over all a cople of nits inline | 14:08 |
sean-k-mooney | im gong to read it again quickly | 14:08 |
gibi | thanks I will fix the nits a bit later todayt | 14:09 |
bauzas | fwiw, I'm not using the -2 hammer yet | 14:10 |
bauzas | for all the open changes | 14:10 |
bauzas | I'll do it only on Tuesday | 14:11 |
sean-k-mooney | unless there is anything urgnet im going to take a break form lookign at upstream reviews and work on some automation | 14:13 |
* bauzas works on the cycle highlights fwiw | 14:15 | |
bauzas | => goes getting his kid from school | 14:15 |
*** johnthetubaguy[m] is now known as johnthetubaguy | 16:14 | |
gibi | I think lockutils.synchronized(...fair=True) + eventlet.spawn_n() + fastener > 0.15 actually breaks nova proper. Not just the test code that we fixed in https://review.opendev.org/c/openstack/nova/+/813114 | 16:34 |
gibi | here is a minimal reproductionhttps://gist.github.com/gibizer/9051369e67fd46a20d52963dac534852 | 16:34 |
gibi | https://gist.github.com/gibizer/9051369e67fd46a20d52963dac534852 | 16:34 |
* sean-k-mooney clicks | 16:34 | |
gibi | the realization came when I looked at the logs in https://bugs.launchpad.net/nova/+bug/1988311// | 16:35 |
gibi | those logs shows that two rebuild_claim can take the same lock twice | 16:35 |
sean-k-mooney | isnt it an instantace lock | 16:36 |
sean-k-mooney | or is it the rt lock | 16:36 |
sean-k-mooney | it must be the rt lock actully | 16:37 |
gibi | it is the rt lock | 16:37 |
gibi | https://github.com/openstack/nova/blob/8b55b44cc605533f2a12189a2b5899c0f58c91a7/nova/compute/resource_tracker.py#L201-L202 | 16:37 |
sean-k-mooney | i havent looked in deail but the lock name would be the same | 16:38 |
sean-k-mooney | i assuem you are loking at someting in the outpu specificaly | 16:38 |
gibi | yes it is compute_resources | 16:38 |
gibi | https://bugs.launchpad.net/nova/+bug/1988311/comments/3 | 16:38 |
sean-k-mooney | that shows they are taking the same lock wtich | 16:38 |
gibi | yepp | 16:38 |
sean-k-mooney | are we context switch between the two coroutines inside the critical section under the lock? | 16:39 |
sean-k-mooney | and there for data racing? | 16:39 |
gibi | the bug is written as two pinned VM evacuated and ended up selecting overlapping cpus | 16:40 |
sean-k-mooney | yep so unlike pci devices we dont enforce that in the db | 16:41 |
sean-k-mooney | the only protection we have is the rt lock | 16:41 |
sean-k-mooney | to ensure we claim the cpus and update the host numa toplogy blob | 16:41 |
sean-k-mooney | then we regrenrate that over time based on the instance numa toplogy blob in the perodic | 16:42 |
sean-k-mooney | so if this lock is broken its very posible for that to break and not be able to fix itslef | 16:42 |
sean-k-mooney | it need to not only prorect against the concurrent evacuate btu also the preiodic running | 16:43 |
gibi | yes | 16:43 |
gibi | and we have the rt lock around many actions | 16:43 |
sean-k-mooney | have you treid https://review.opendev.org/c/openstack/nova/+/842359/5/nova/monkey_patch.py | 16:47 |
sean-k-mooney | eventlet.spawn_n = eventlet.spawn | 16:47 |
sean-k-mooney | try: | 16:47 |
sean-k-mooney | import eventlet.convenient | 16:48 |
sean-k-mooney | eventlet.convenient.spawn_n = eventlet.spawn | 16:48 |
sean-k-mooney | except ImportError: | 16:48 |
sean-k-mooney | pass | 16:48 |
sean-k-mooney | to see if that fixes the reproducer | 16:48 |
gibi | If I change eventlet.spawn_n to eventlet.spawn in https://gist.github.com/gibizer/9051369e67fd46a20d52963dac534852 the the locking works | 16:49 |
gibi | melwitt has a very good thread in https://github.com/eventlet/eventlet/issues/731 about the whole picture | 16:49 |
sean-k-mooney | right but does chanign it like that work | 16:49 |
sean-k-mooney | since that is how we tried to do that in nova | 16:49 |
sean-k-mooney | im wondering if my patch woudl fix the issue basically | 16:50 |
gibi | based on the last comment in the eventlet issue we cannot simply replace spawn_n with spawn as eventlet calls spawn_n internall too | 16:50 |
gibi | https://github.com/eventlet/eventlet/issues/731#issuecomment-969891721 | 16:51 |
gibi | https://github.com/eventlet/eventlet/blob/v0.32.0/eventlet/green/thread.py#L72 | 16:52 |
sean-k-mooney | https://github.com/eventlet/eventlet/issues/731#issuecomment-968135262 said we shoudl be able too | 16:53 |
sean-k-mooney | and when i did it it did not break anything | 16:53 |
sean-k-mooney | i know that melwitt noted a delta in some fo the behaivor | 16:54 |
sean-k-mooney | but i dont think that caused any issues for our usage | 16:54 |
gibi | we dont have test coverage to detect the break, we have this broken locking for the last 7 month I think | 16:56 |
gibi | so we actually don't know if replaceing spawn_n with spawn in nova and keeping spawn_n internally is enough | 16:56 |
sean-k-mooney | well it passed tempest and our func/unit test with the replacment | 16:56 |
gibi | we are passing tempest today | 16:56 |
gibi | with the broken lock | 16:56 |
sean-k-mooney | yep | 16:56 |
sean-k-mooney | but what im saying is that if that fixes your repoducer | 16:57 |
gibi | so we have no information if the replacement actually fixed the problem or not | 16:57 |
sean-k-mooney | then it likely will fix the issue | 16:57 |
sean-k-mooney | doing the repacement the same way in yoru standalone repoducer does not help? | 16:57 |
gibi | it does but my reproducer does not use the Threading.thread way the last comment in the issue mentions | 16:58 |
sean-k-mooney | right but does the lock | 16:58 |
gibi | and we know from melwitt that someting under nova uses Threading.thread | 16:58 |
sean-k-mooney | that should not matter by the way | 16:58 |
sean-k-mooney | even if it deoes the global replacment should mean that provided that call did not happen before we monkey patched | 16:58 |
sean-k-mooney | it should get our replaced version | 16:59 |
gibi | nope | 16:59 |
gibi | the internall call does greenlet.spawn_n | 16:59 |
gibi | I think that is not even replaceble as it is a c extension | 16:59 |
sean-k-mooney | oh well we can patch that too | 16:59 |
sean-k-mooney | oh | 16:59 |
sean-k-mooney | if its c then no | 16:59 |
sean-k-mooney | but we dont uses threading.thread normally | 17:00 |
sean-k-mooney | so while it might not fix every case it might fix the case we care about | 17:00 |
gibi | melwitt found someting that uses | 17:00 |
sean-k-mooney | teh libvirt thread | 17:00 |
gibi | what the rpc worker use? | 17:01 |
gibi | how we spawn the rpc workers listening on rabbit? | 17:01 |
sean-k-mooney | whell there are only two thread i can think of that might use it | 17:01 |
sean-k-mooney | the libvirt one and the heathbeat | 17:01 |
sean-k-mooney | if we use a pthred | 17:01 |
sean-k-mooney | im not sure we are using one for rpc explictly but we might be | 17:02 |
gibi | note that threading.thread issue path is problematic when it is monkey patched | 17:02 |
gibi | so when threading.thread actually patched to create an eventlet | 17:02 |
gibi | I can try tracing our rpc eventlet to see if it is created from spawn_n or spawn | 17:03 |
gibi | I guess it is coming from oslo.messaging somehow | 17:03 |
sean-k-mooney | well we use it in a few places | 17:03 |
sean-k-mooney | https://github.com/openstack/nova/blob/18d9c85aa4cbdbc471c6c7916ca6f1367c7ab4e5/nova/virt/libvirt/host.py#L491 | 17:04 |
sean-k-mooney | https://github.com/openstack/nova/blob/18d9c85aa4cbdbc471c6c7916ca6f1367c7ab4e5/nova/virt/hyperv/serialproxy.py#L101 | 17:04 |
sean-k-mooney | i was going to use it for the healthchecks | 17:05 |
gibi | native threading is OK that is real python Thread | 17:05 |
gibi | https://github.com/openstack/nova/blob/18d9c85aa4cbdbc471c6c7916ca6f1367c7ab4e5/nova/virt/hyperv/serialproxy.py#L101 <-- this can be a problem though | 17:05 |
gibi | hm it is native too https://github.com/openstack/nova/blob/18d9c85aa4cbdbc471c6c7916ca6f1367c7ab4e5/nova/virt/hyperv/serialproxy.py#L32 | 17:06 |
gibi | so this two places creates a real python thread that is probably OK from this issue perspective | 17:06 |
sean-k-mooney | so all other usages today are in libs | 17:07 |
sean-k-mooney | oslo.messaging, ovsdbapp (via os-vif), os-brick, and proably oslo.db | 17:08 |
sean-k-mooney | if i was to guess | 17:08 |
sean-k-mooney | https://github.com/openstack/os-brick/blob/4acfd6bc7e7e816ce8e9d5ac59cfc0f6e5e816f4/os_brick/executor.py#L71 | 17:08 |
gibi | the rabbit driver has threading but I'm not sure if that is always native https://github.com/openstack/oslo.messaging/blob/e44f286ebca0fbde5eae2f7eb9a21ba55ba2a549/oslo_messaging/_drivers/impl_rabbit.py | 17:08 |
sean-k-mooney | gibi: its not we monkey patch before we import it | 17:09 |
gibi | OK so it might or might not native | 17:09 |
gibi | https://github.com/openstack/oslo.messaging/blob/e44f286ebca0fbde5eae2f7eb9a21ba55ba2a549/oslo_messaging/_drivers/impl_rabbit.py#L607 | 17:09 |
sean-k-mooney | its sill be a GreenThread unless we disable patching threads | 17:09 |
sean-k-mooney | we currently mokeypatch every nova service | 17:10 |
sean-k-mooney | and that will happen very early | 17:10 |
gibi | OK, so if rabbit driver uses threading and it is monkeypatched then our rpc worker spawned by spawn_n and effect by the bug | 17:11 |
gibi | even if we replace eventlet.spawn_n | 17:11 |
gibi | I cannot do a reproduction in the functional as it does not use the rabbit driver | 17:11 |
gibi | so it won't be indicative | 17:11 |
gibi | doing a tempest reproducer would be a pita as the reproducer needs timing | 17:12 |
sean-k-mooney | we will need to re MonkeyPatch threading.Thread then | 17:12 |
sean-k-mooney | to make it not call greenlet.spwan_n | 17:13 |
sean-k-mooney | or change how ew do locking | 17:13 |
sean-k-mooney | do you knwo how/why it migh not be locking correctly | 17:13 |
gibi | or we need to do this globally https://review.opendev.org/c/openstack/nova/+/813114/4/nova/tests/fixtures/nova.py#425 | 17:13 |
gibi | fastener changed how it get the current thread | 17:14 |
gibi | and getting it from a thread created by spawn_n actaully returns the native thread name not the eventlet id | 17:14 |
gibi | therefore fasteners sees to different eventelt as same | 17:14 |
gibi | and allow reentry to the lock | 17:15 |
sean-k-mooney | well that we can do | 17:15 |
sean-k-mooney | you should be able to test that in your repoducer | 17:15 |
gibi | the downside of the globlack monekeypatch that it might effect the native threads | 17:15 |
sean-k-mooney | that can un monky patch it | 17:16 |
sean-k-mooney | we can save the reference to the ortinal fucniton globally | 17:16 |
sean-k-mooney | adn restore it with a context manager when launching the native thread | 17:16 |
gibi | basically all the native thread user in our under nova needs to be aware of it | 17:17 |
gibi | *in or | 17:17 |
gibi | we can sure fix nova native threads | 17:17 |
gibi | I'm not sure we can fix native threading in all libs used by nova | 17:17 |
sean-k-mooney | well ok | 17:17 |
sean-k-mooney | we dont need to do it globally | 17:17 |
sean-k-mooney | we just need the lock to use it | 17:18 |
sean-k-mooney | so we only need to do that for oslo | 17:18 |
gibi | so we wrap the lock with a local mocking | 17:18 |
sean-k-mooney | basically | 17:18 |
sean-k-mooney | yes | 17:18 |
gibi | if we not pass the lock to libs then that could work | 17:18 |
sean-k-mooney | i dont think we do | 17:18 |
gibi | yeah | 17:18 |
gibi | so backtrack a bit | 17:18 |
sean-k-mooney | we also alrady have nova util fucntion for locks already | 17:19 |
gibi | how can we test it in a way that i) uses the real things like rabbit driver ii) | 17:20 |
sean-k-mooney | https://github.com/openstack/nova/blob/4ca795536521f5e3d65d44b4de63c25294a5e0c4/nova/utils.py#L63 | 17:20 |
gibi | allows hacking timing to reproduce the problem | 17:20 |
sean-k-mooney | we just need to use threading.Thread right | 17:20 |
sean-k-mooney | or | 17:22 |
gibi | basically we need to make sure threading.current_thread points to eventlet.getcurrent if called in a patched thread | 17:22 |
sean-k-mooney | you could use my unix socket impl | 17:22 |
sean-k-mooney | ctully maybe not | 17:23 |
gibi | back to testing, I can probably add sleeps to nova running in devstack to simulat the timing but that is not something we can merge as a test | 17:23 |
gibi | ahh, actually it is haard. as I need timing right in scheduler and in the compute too | 17:24 |
gibi | the scheduler needs to run the two evac parallel enough to select the same host | 17:24 |
sean-k-mooney | ya i dint know if https://review.opendev.org/c/openstack/oslo.messaging/+/841892/4/oslo_messaging/tests/functional/notify/test_unix_socket.py will be able to trigger it | 17:24 |
gibi | I guess this does not use the rabbit driver for notifications | 17:25 |
sean-k-mooney | its my unix socket one | 17:25 |
sean-k-mooney | its using either the eventlet or threadpool executor | 17:26 |
gibi | yeah so with that I can prove that your unix impl is fixed | 17:26 |
sean-k-mooney | but i dont know if its broken | 17:26 |
gibi | true :) | 17:26 |
sean-k-mooney | https://review.opendev.org/c/openstack/oslo.messaging/+/841892/4/oslo_messaging/notify/_impl_unix_socket.py#100 | 17:26 |
sean-k-mooney | so its dynamicaly getting the exectuor and i think the futurist | 17:27 |
sean-k-mooney | threadpool executor is using threading.Thread internally | 17:27 |
sean-k-mooney | but its a streach | 17:27 |
gibi | I have to drop soon so I will sleep on this | 17:27 |
gibi | but fixing this is probalby an RC blocker | 17:27 |
sean-k-mooney | i think the best way to proceed is with a syntetic fix | 17:28 |
sean-k-mooney | import synconise for nova utiles | 17:28 |
sean-k-mooney | and tweak it until it works | 17:28 |
sean-k-mooney | using the simpler repoducer you were creating | 17:28 |
sean-k-mooney | https://github.com/openstack/futurist/blob/master/futurist/_thread.py#L43 | 17:29 |
gibi | we can test the fix with the simple repro I have, what we cannot test that such fix is applied to every places we need it in nova :) | 17:29 |
sean-k-mooney | well i was thinkign of fixing it in oslo eventurlly | 17:29 |
sean-k-mooney | just for that lock | 17:29 |
gibi | that is better | 17:29 |
gibi | we can assume everything uses lock from oslo | 17:30 |
gibi | at least within core openstack | 17:30 |
sean-k-mooney | yep | 17:30 |
sean-k-mooney | they should | 17:30 |
sean-k-mooney | so when my unix socket driver is runing with the trehad execurftor it is using threading.Thread | 17:30 |
gibi | ack that is a way forward | 17:30 |
sean-k-mooney | but its not currently locking | 17:30 |
sean-k-mooney | so i dont think that helps | 17:30 |
gibi | I will summarise what we have in the bug and target the bug to oslo too | 17:31 |
sean-k-mooney | but i think we could adapt your fucn test into and oslo.synconisation fucn test | 17:31 |
sean-k-mooney | for the lock | 17:31 |
sean-k-mooney | sorry oslo.concurrency func test | 17:32 |
gibi | yes, I think so too | 17:34 |
opendevreview | Merged openstack/placement master: Fix typo in schema https://review.opendev.org/c/openstack/placement/+/849348 | 17:34 |
gibi | bauzas: I think this is an RC blocker https://bugs.launchpad.net/oslo.concurrency/+bug/1988311 I tagged it but it seem we don't have official zed-rc-potential tag yet | 17:39 |
gibi | sean-k-mooney: an alternative fix is to roll back to fasteners < 0.15.0 | 17:45 |
gibi | but I'm not sure that is viable in the whole core openstack | 17:45 |
melwitt | are yall considering doing sean-k-mooney's patch to monkey patch all of our spawn_n with spawn? | 17:53 |
melwitt | nvm, reading backscroll and saw mention | 17:55 |
gibi | melwitt: o/ when you found https://github.com/eventlet/eventlet/issues/731#issuecomment-969891721 where the thread.Thread was called? | 17:56 |
melwitt | gibi: that isn't what I found directly and tbh I need to trace it again in case I made a mistake, but what I found in oslo.messaging is that *it* uses spawn_n in eventlet mode. but I think sean's patch would solve that right? | 17:57 |
gibi | we need to try probably | 17:57 |
gibi | it depends how oslo.messaging actaully calls spawn_n | 17:58 |
melwitt | doing s/spawn_n/spawn/ in our code wouldn't be enough but I think monkey patching the whole thing should be if I'm not missing something. let me see if I can find a link real quick | 17:58 |
gibi | melwitt: no need to rush, my brain is already toasted :) | 18:00 |
melwitt | urgh, I remembering now that it was more complicated than that. I'll try to find where I saw thread.Thread. I wish I had put code links in the comment | 18:00 |
melwitt | hehe ok | 18:01 |
melwitt | I'll find whatever it was and add comment on the bug | 18:04 |
gibi | thank you | 18:07 |
gibi | your original eventlet issue thread was a really good information source already. thank you for that too | 18:08 |
*** dasm is now known as dasm|off | 21:32 | |
melwitt | gibi, sean-k-mooney, bauzas: just fyi monday is a holiday in the US and I will be back tuesday | 23:29 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!