opendevreview | Amit Uniyal proposed openstack/nova master: Added context manager for instance lock https://review.opendev.org/c/openstack/nova/+/873648 | 05:43 |
---|---|---|
opendevreview | Amit Uniyal proposed openstack/nova master: Separate OSError with ValueError https://review.opendev.org/c/openstack/nova/+/908825 | 05:43 |
opendevreview | Amit Uniyal proposed openstack/nova master: Disconnecting volume from the compute host https://review.opendev.org/c/openstack/nova/+/877446 | 05:43 |
opendevreview | Amit Uniyal proposed openstack/nova master: Removed explicit call to delete attachment https://review.opendev.org/c/openstack/nova/+/891289 | 05:43 |
opendevreview | Amit Uniyal proposed openstack/nova master: Separate OSError with ValueError https://review.opendev.org/c/openstack/nova/+/908825 | 08:47 |
opendevreview | Amit Uniyal proposed openstack/nova master: Disconnecting volume from the compute host https://review.opendev.org/c/openstack/nova/+/877446 | 08:47 |
opendevreview | Amit Uniyal proposed openstack/nova master: Removed explicit call to delete attachment https://review.opendev.org/c/openstack/nova/+/891289 | 08:48 |
tkajinam | sean-k-mooney, hi. thanks for your feedback in libvirt related patches ! I've replied to your comment in https://review.opendev.org/c/openstack/nova/+/908546/ so please check it when you have time | 08:49 |
tkajinam | in short, swtpm_ioctl has been required since swtpm support was initially added to libvirt. | 08:50 |
opendevreview | Merged openstack/nova master: libvirt: Stop unconditionally enabling evmcs https://review.opendev.org/c/openstack/nova/+/899776 | 10:22 |
opendevreview | sean mooney proposed openstack/nova stable/2023.2: libvirt: Stop unconditionally enabling evmcs https://review.opendev.org/c/openstack/nova/+/909082 | 11:12 |
sean-k-mooney[m] | elodilles: gibi ^ can we start backporting that now that it finally merged on master. this need to go to 2023.1 to fix windows on amd | 11:13 |
sean-k-mooney[m] | there is one other releated patch for live migration but i belive that is still not merged on master | 11:13 |
sean-k-mooney[m] | it can be backported seperatly | 11:14 |
gibi | sean-k-mooney[m]: thanks for the backport, +2 | 11:16 |
artom | sean-k-mooney[m], oh you're doing the enlightenments backports? | 11:20 |
sean-k-mooney | artom: well it merged os i just did it via the ui | 11:52 |
sean-k-mooney | ill let you actully do the other one or respin it | 11:53 |
sean-k-mooney | artom: do you have the linke to the live migration patch by the way | 11:54 |
artom | sean-k-mooney, https://review.opendev.org/c/openstack/nova/+/904183?usp=email | 11:54 |
artom | sean-k-mooney, so I'm doing from bobcat to antelope, and the downstream cherry-picl? | 11:54 |
sean-k-mooney | artom: yes please | 11:55 |
sean-k-mooney | and can you backport the other one too | 11:55 |
opendevreview | Takashi Kajinami proposed openstack/os-traits master: Add traits for TPM models https://review.opendev.org/c/openstack/os-traits/+/909107 | 11:57 |
artom | sean-k-mooney, it didn't merge yet | 11:57 |
artom | But maybe we don't care, because it has +A | 11:57 |
sean-k-mooney | yes i know. we do care for the cherry picked form line i just ment when its ready can you do it and ping me | 12:00 |
sean-k-mooney | if you want to prepare them local youc an but hopefuly it will merge later today | 12:00 |
elodilles | sean-k-mooney[m]: i'm not super confident about that backport, so left a question in my comment there | 12:45 |
sean-k-mooney | elodilles: ack, right now you cant boot windows vms on amd host becuase of a reguression intoduced by enableing that feature | 12:46 |
sean-k-mooney | so we really need to undo that | 12:46 |
sean-k-mooney | live migration is also broken for a simialr reason | 12:46 |
elodilles | sean-k-mooney: ACK i understand that, but what about with intel hosts? we might remove a feature that is already used by some operators? | 12:46 |
sean-k-mooney | yep that hsoudl only reduce performace in some edge cases where the enlightnemtn was used | 12:48 |
elodilles | and yepp, probably to fix this to work both AMD and Intel hosts has higher prio, i'm just asking whether this is the right way | 12:48 |
sean-k-mooney | to be clear i would have consider this a criticla bug requireing a new RC if we had found it before the release | 12:48 |
sean-k-mooney | elodilles: so to reenable this for intel hosts we will need a trait and a new falvor/image property | 12:49 |
sean-k-mooney | because it depend on the hardware vendor of the underlying host | 12:50 |
sean-k-mooney | we cant just enable this via the exissting OS_TYPE=windows | 12:50 |
sean-k-mooney | it will need to be requested explictly to safely enable and supprot live migration as we will have to schdule on it | 12:50 |
elodilles | ACK, then i also tend to agree on backporting this. but to be on the safe side, i let other stable cores to also share their opinions on the patch o:) | 12:51 |
sean-k-mooney | elodilles: what will happen with the backport is existing vms will keep the enlightnement until they are hard rebooted, cold migrated, shelved or rebuilt | 12:51 |
elodilles | is this already a bug since 2023.1 Antelope? :-o | 12:55 |
sean-k-mooney | yep... | 12:56 |
sean-k-mooney | i was kind of shocked it took so long for it to be reported | 12:56 |
sean-k-mooney | well i gues it didnt take that long to be reported but it did take a long time to get this fixed | 12:57 |
sean-k-mooney | i moved it to high the second i realsised the impact and confirmed the limitation | 12:58 |
elodilles | i guess not many operators have windows guests on AMD hosts :-o | 12:58 |
sean-k-mooney | the workaround is just dont set OS_TYPE=windows on the image | 12:59 |
sean-k-mooney | but that disables all the enlightenment | 12:59 |
sean-k-mooney | and that reduces performance a lot | 12:59 |
sean-k-mooney | having them eaislly doubles the guest performacne and more in some workloads | 12:59 |
sean-k-mooney | not that one specificaly jsut in general | 13:00 |
sean-k-mooney | so truning off all the windows enlightenments is really not a compeling choice | 13:00 |
sean-k-mooney | elodilles: https://bugs.launchpad.net/nova/+bug/2046549 is more shocking to me | 13:04 |
sean-k-mooney | elodilles: i would have expected operators to be screaming at use for breakign live migration with windows guests | 13:05 |
elodilles | :-o | 13:05 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/904183 is the fix but that is still in ci | 13:06 |
elodilles | so that will be also backported together with the above fix i guess then | 13:08 |
sean-k-mooney | that would be the intent yes | 13:09 |
sean-k-mooney | they can be backported seperatly | 13:09 |
sean-k-mooney | but both are a high prioryt i think to backprot | 13:09 |
sean-k-mooney | i think that will be compelling ot operators to have espeiclly those who have not upgraded to antelope yet | 13:10 |
elodilles | yepp, probably the thing is that 'those' operators have noy yet upgraded (not even to 2023.1 Antelope :S) | 13:16 |
opendevreview | Merged openstack/nova master: libvirt: stop enabling hyperv feature reenlightenment https://review.opendev.org/c/openstack/nova/+/904183 | 13:21 |
*** jph5 is now known as jph | 13:45 | |
sean-k-mooney | artom: ^ fyi that merged on master so you can start the backport of that too now | 14:00 |
opendevreview | Mohammed Naser proposed openstack/nova stable/2023.2: libvirt: stop enabling hyperv feature reenlightenment https://review.opendev.org/c/openstack/nova/+/909086 | 14:20 |
sean-k-mooney | JayF: for that its worth we have mypy in some files in nova | 14:31 |
sean-k-mooney | JayF: so its not all or nothing it can be added gradully | 14:31 |
JayF | I know, and I think it would be cool, but that email the point was more.. do something rather than telling other people to do something 😀 | 14:32 |
sean-k-mooney | when startging new files like for the healthcheck stuff i have inteionally starte with mypy enabeld | 14:32 |
sean-k-mooney | yep | 14:32 |
sean-k-mooney | i am kind of trying to stay out of that thread | 14:32 |
sean-k-mooney | partly because i have been burnt on tryign to do simiear things and i just dont have time right now | 14:32 |
sean-k-mooney | i.e. i would like to moderenize some things and pay down tech debt | 14:33 |
sean-k-mooney | like i really want to see the SQLA2.0 stuff done in manilla so we can move to it | 14:34 |
sean-k-mooney | or get the wsgi/pyproject.yaml stuff doen | 14:34 |
sean-k-mooney | just time... | 14:34 |
sean-k-mooney | JayF: thanks for responding :) | 14:35 |
JayF | I have a friend who works on a major python library who always says (joking) that they're about 100 developers working on infrastructure across all technology. | 14:36 |
JayF | Finding the quote from that paper about OSS contribution that said 97% of value was created by 5% of people... It's weird to see the gist of your validated by research | 14:37 |
JayF | **your joke | 14:37 |
sean-k-mooney | the best jokes are based in reality... | 14:38 |
* sean-k-mooney like centos stream being as stable as rhel... looks at the libvirt package that was broken for 3 weeks | 14:38 | |
sean-k-mooney | i was chatting to one of the developer workign on stading up the infra for centos 10 stream yeasterday and they actully suggested that instead of using centos stream a a rhel next proxy we should be using https://docs.fedoraproject.org/en-US/eln/ | 14:40 |
sean-k-mooney | apprenetly fedora eln is what facebook and amazone use to build adn test there rhel deritivtes | 14:41 |
sean-k-mooney | if we were usign fedroa eln + rocky/almalinux we would ahve better proxies for btoh rhel next and rhel current | 14:42 |
sean-k-mooney | then useing stream gives us today | 14:42 |
opendevreview | Doug Szumski proposed openstack/nova master: Revert "[libvirt] Live migration fails when config_drive_format=iso9660" https://review.opendev.org/c/openstack/nova/+/909122 | 15:35 |
opendevreview | Doug Szumski proposed openstack/nova master: Revert "[libvirt] Live migration fails when config_drive_format=iso9660" https://review.opendev.org/c/openstack/nova/+/909122 | 15:45 |
dougszu | As promised sean-k-mooney ^. It's entirely clear to me yet. Will keep digging. | 15:56 |
dougszu | *not | 15:56 |
dansmith | bauzas: I thought you said you were going to remove the dict and use the guest XMLs to determine persistence in this: https://review.opendev.org/c/openstack/nova/+/904209 | 17:15 |
dansmith | did I misunderstand or are we still waiting for that revision/ | 17:15 |
mnaser | I'd appreciate reviews on https://review.opendev.org/c/openstack/nova/+/909082 + https://review.opendev.org/c/openstack/nova/+/909086 -- we'll need to cherry-pick it a bit further down so b enice to merge those in | 17:26 |
bauzas | dansmith : I'm not super opinionated about that, I eventually thought that it wouldn't be a problem if we reserve the mdevs with the dict | 18:00 |
dansmith | using the xmls seems better to me, but I know it's more work | 18:01 |
dansmith | I'm mostly asking because I was expecting to see that change and haven't | 18:01 |
bauzas | Well, we already look at the XMLs | 18:01 |
bauzas | So it should be automatically find the paused guests | 18:02 |
bauzas | So the dict shouldn't be needed | 18:02 |
bauzas | but again, in case libvirt would change it, I'm not sure it would continue to work | 18:03 |
bauzas | I can test it if you want | 18:03 |
dansmith | I think it would be better to not have a thing that can get out of sync, so if it's reasonable to PoC it, I think that's worth it | 18:04 |
sean-k-mooney | mnaser: i tought you might be interested in those | 18:18 |
sean-k-mooney | dansmith: bauzas so im very hesitent to addign even more reliance on the xmls | 18:18 |
sean-k-mooney | i have said this in the past that i woudl not have merged the orginal vgpu code that used the xmls to track what mdevs where in use | 18:18 |
sean-k-mooney | with that said i get dans concern about it getting out of sync | 18:18 |
sean-k-mooney | bauzas: the dict is needed because the pasued guest does not exist in pre-live-migrate | 18:20 |
sean-k-mooney | its only created by libvirt when we start the migration | 18:20 |
sean-k-mooney | so if we dont have the dict pre precreate the domain in pre live migrate on the dest we woudl have a rcace | 18:21 |
sean-k-mooney | so to remove the dict you need to add creating a domin | 18:21 |
sean-k-mooney | you shoudl not create a paused doamin however | 18:21 |
sean-k-mooney | jsut define it and create the approate cleanup codded for revert | 18:22 |
sean-k-mooney | just beause i strongly dislike what we do with the xml does not mean im -2 on replacing the dict with pre defineing a domian to reserve the mdev | 18:22 |
bauzas | sean-k-mooney: the target is automatically created by libvirt | 18:22 |
bauzas | and then paused | 18:23 |
sean-k-mooney | yes but only wehn we call migrate2URL | 18:23 |
bauzas | don't ask me how | 18:23 |
bauzas | sean-k-mooney: yup frow what I saw | 18:23 |
sean-k-mooney | it wont exist in pre-live migrate | 18:23 |
bauzas | indeed | 18:23 |
sean-k-mooney | so you cant jsut remove the dict | 18:23 |
sean-k-mooney | without intoducting a race condition | 18:23 |
bauzas | okay, so your preference is to keep the dict ? fine by me, as I said I'm not opiniated | 18:24 |
bauzas | there are pros and cons | 18:24 |
sean-k-mooney | if you want to remove the dict based on dansmith's feedback then you need to define the domain on the dest instead | 18:24 |
bauzas | not sure it may work | 18:24 |
sean-k-mooney | im ok with using a domian xml to reserve it as long as we evenutally stop using the xmls for this in the long term | 18:24 |
sean-k-mooney | bauzas: you can defien a domain without starting it | 18:24 |
bauzas | maybe the best would be to persist the mdevs in some internal state | 18:24 |
sean-k-mooney | yes it would | 18:24 |
sean-k-mooney | but out of scope of this cycle | 18:25 |
bauzas | so, if dansmith accepts, let's keep it straight as it is | 18:25 |
sean-k-mooney | we could also tack on the use of a doamin as a patch on the end | 18:25 |
sean-k-mooney | again depending on dansmith's prefernce to derisk things | 18:26 |
bauzas | sean-k-mooney: we would need to delete the domain as soon as we call migrate | 18:26 |
sean-k-mooney | no we would not | 18:26 |
sean-k-mooney | libvirt shoudl update it | 18:26 |
bauzas | and there could be residues too | 18:26 |
sean-k-mooney | there would be yes which is why i said you would have to update revert | 18:26 |
dansmith | I don't want to push us to define the domain before we migrate, I just thought bauzas said it was already there by the time we needed to account for the things | 18:26 |
bauzas | honestly, the more I think is to keep the existing | 18:27 |
sean-k-mooney | dansmith: its not we are creating the mdevs and returning the uuid in prelive migrate | 18:27 |
bauzas | dansmith: sean-k-mooney had a concern if we were removing the patch as the domain is only created by libvirt once we call migrate | 18:27 |
dansmith | personally I don't like that we're keeping an in-memory dict which could get out of sync if we have an mq blip, where restarting compute will just make it work again and for which the operator has basically no visibility | 18:27 |
dansmith | the logs help, but... | 18:27 |
sean-k-mooney | then we generate the domain on the souce after that | 18:27 |
bauzas | there is no easy path here | 18:28 |
bauzas | as I said, pros and cons for each of the solutions | 18:28 |
sean-k-mooney | dansmith: ya i would prefer to be trackign mdev in novas db. but i would also prefer to do that next cycle | 18:28 |
dansmith | yeah clearly not something we can do in the short term | 18:28 |
bauzas | I'm just saying that a dict that would be emptied as soon as we restart the compute is probably the simpliest approach | 18:28 |
sean-k-mooney | yep | 18:29 |
dansmith | bauzas: yeah it just sucks to make it fragile and restarts to fix things | 18:29 |
dansmith | but sounds like that's the option for now | 18:29 |
bauzas | dansmith: only if the MQ blip happens on abort, right? | 18:29 |
bauzas | that's only when you have a residue | 18:29 |
dansmith | bauzas: any time between when we record it and would commit/clean | 18:30 |
dansmith | just saying, it feels fragile, like I've _been_ saying, but I know the argument for it | 18:30 |
bauzas | we record it when we pre-livemigrate | 18:30 |
bauzas | then we delete it as soon as the migration is complete or aborted | 18:30 |
dansmith | I understand | 18:31 |
bauzas | if we have a MQ blip in between, then the live-migration could be impacted as well | 18:31 |
bauzas | I said "could" because the migration could be run on different NIC | 18:31 |
bauzas | but what's important is that the message saying "I'm done, please remove it" or "I was aborted, please remove it" needs to arrive | 18:32 |
bauzas | dansmith: I think I proposed a periodic, I'll try to add it on top of my series | 18:33 |
* bauzas needs to leave, I gonna work on the periodic tomorrow | 18:36 | |
dansmith | ack | 18:37 |
* bauzas just needs to consider whether we use a libvirt method which is periodically called or if we want to create a specific one | 18:37 | |
bauzas | but I think we have some reconciliation methods in libvirt for the guests | 18:37 |
bauzas | (I'm sure, at least for the guest states) | 18:37 |
sean-k-mooney | bauzas: either add a new one or do this as part of update_avaiable_resouces | 18:46 |
sean-k-mooney | i dont think there are any other perodics that would make sense | 18:47 |
sean-k-mooney | update_avaiable_resouces takes a resouce tracker lock and already acocunts for migration ectra | 18:47 |
sean-k-mooney | so removign instances form the dict that are not on this host and dont have a migration assocated with the host would be ok i think | 18:47 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!