mikal | Potentially silly question(s), but is there a git plugin or precommit or whatever that will just make me magically DCO compliant with minimal effort? I am assuming that DCO implies GPG signed commits, although I cannot find a web page which clearly states that -- it might just be some text in the commit message which alleges to come from me? If I | 01:10 |
---|---|---|
mikal | didn't in fact hallucinate the GPG signature requirement, are there minimum requirements for the GPG key used to sign the commit? My GPG key is relatively well connected, but also ancient. | 01:10 |
mikal | Like... it would be cool if "git review" added missing DCO entries like it does missing change ids. | 01:11 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra spec for sound device. https://review.opendev.org/c/openstack/nova/+/926126 | 01:11 |
opendevreview | Michael Still proposed openstack/nova master: Protect older compute managers from sound model requests. https://review.opendev.org/c/openstack/nova/+/940770 | 01:11 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection. https://review.opendev.org/c/openstack/nova/+/927354 | 01:11 |
opendevreview | sean mooney proposed openstack/nova master: move compile earlier https://review.opendev.org/c/openstack/nova/+/950516 | 04:57 |
opendevreview | sean mooney proposed openstack/nova master: Add mtty/mdpy support for testing fake mdevs https://review.opendev.org/c/openstack/nova/+/898100 | 04:57 |
opendevreview | sean mooney proposed openstack/nova master: WIP : Add mtty support to nova-next https://review.opendev.org/c/openstack/nova/+/922140 | 04:57 |
sean-k-mooney | mikal: "git commit -s" should be all you need to do, but im not sure about an automatic way yet. most seem to recommend a git alias like "git config --global alias.cs 'commit --signoff'" but `git commit -s` vs `git cs` does not help much | 05:03 |
sean-k-mooney | mikal: you do not need to sign commits with DCO | 05:04 |
sean-k-mooney | you can, but git now supports using ssh keys instead of GPG keys for signing if you want to do that | 05:04 |
mikal | Yeah, that bit confuses me. What's to stop a forgery of a signoff if its just a text string in the commit message? | 05:04 |
sean-k-mooney | https://dev.to/janderssonse/git-signoff-and-signing-like-a-champ-41f3 | 05:04 |
mikal | Ahhh, so signoff != signing. | 05:05 |
sean-k-mooney | mikal: well gerrit will prevent forging the author but the commit message can be updated | 05:05 |
sean-k-mooney | signing commits wont help because we dont use merge commits | 05:05 |
sean-k-mooney | so your commit will be rebased/cherry picked anyway when its merged by zuul | 05:06 |
mikal | Well, at least this means I can still stick with my 25 year old GPG key! | 05:06 |
mikal | Its an antique and needs to be cherished. | 05:06 |
sean-k-mooney | so i have gpg signing setup or at least i did before i reinstalled my laptop.... | 05:07 |
mikal | I GPG sign tags on some personal projects, although I am unclear on how meaningful that is with such an old key. | 05:07 |
sean-k-mooney | but i tend not to use it because i had issues with getting it to prompt both locally and if i ssh into my work laptop | 05:07 |
sean-k-mooney | i.e. if i sshed in it would try to pop up an unlock prompt in a window on my work laptop instead of in the ssh terminal | 05:08 |
sean-k-mooney | i think i eventually got it to work by always forcing the gpg agent to use the tty for prompting instead of the shell integration but i remember that being a bit of a pain | 05:08 |
sean-k-mooney | mikal: i mean if that key has never been leaked | 05:09 |
mikal | Yeah fair enough. I feel like GPG never really took off because its a huge pain in the ass. | 05:09 |
sean-k-mooney | its as valid as mine | 05:09 |
sean-k-mooney | ya it kind of used to be. i finally set it up because i wanted to be able to sign tags for networking-ovs-dpdk and my emails | 05:10 |
sean-k-mooney | but signing emails with gpg keys causes all sorts of issues at times | 05:10 |
mikal | Dr Google assures me that a 1024 bit DSA key is crackable by motivated players. So that's nice. | 05:10 |
sean-k-mooney | mikal: by the way git-review is not the thing that adds the change id for you | 05:13 |
sean-k-mooney | mikal: we have configured our gerrit to serve a commit hook which git review will download and install for you | 05:14 |
sean-k-mooney | im not sure if we can do something similar to reject or amend your commit automatically for you in a sane way | 05:14 |
mikal | Ahhh ok. My lived experience is "git review" saying "you're missing thingies" and adding them, but I've never looked at the mechanism all that hard. | 05:14 |
sean-k-mooney | ah https://gerrit.cloudera.org/Documentation/project-configuration.html#_require_signed_off_by | 05:15 |
sean-k-mooney | so ya we can just turn on the enforcement in gerrit | 05:15 |
sean-k-mooney | but im not sure about the auto fixing, i suspect we would need to dig deeper | 05:16 |
sean-k-mooney | im sure git can support this in a hook but we might need to write it | 05:16 |
mikal | The commit message hook appears to already understand "Signed-off-by trailers" based on a cursory reading? | 05:18 |
mikal | It uses https://git-scm.com/docs/git-interpret-trailers to do... things. | 05:19 |
mikal | Ahhh, ok. I think its trying quite hard to have the Change-Id before the signed off by trailers. Dr Google does suggest that if you turn on DCO enforcement on the project in gerrit then some amount of magic will happen. | 05:22 |
opendevreview | Abhishek Kekane proposed openstack/nova master: [DNM] Revert^2 "Support glance's new location API" https://review.opendev.org/c/openstack/nova/+/950623 | 05:27 |
sean-k-mooney | mikal: regarding pre-commit, it does support writing "commit-msg" hooks which are passed a single parameter, the path to the file | 05:27 |
sean-k-mooney | mikal: so we could write one to either abort the commit if Signed-off-by was not there or to add it for you before the commit was actually finalised | 05:28 |
mikal | As a forgetful person I quite like the idea of option two there. | 05:29 |
mikal | It is after all what we already do for change ids. | 05:29 |
mikal | And my global git config already knows my name and email address, which it needs to know for the log. | 05:29 |
sean-k-mooney | yep | 05:29 |
sean-k-mooney | its probably a good weekend hacking project which is also where we should put it | 05:30 |
sean-k-mooney | in the hacking repo so we can share it across other repos | 05:30 |
mikal | It would also make the transition largely transparent... People do a git review, land a new version of the hook automatically, and blam they're compliant. Much more fun than herding cats. | 05:30 |
sean-k-mooney | we can add a new hook there and then enable it in the relevant repos https://github.com/openstack/hacking/blob/master/.pre-commit-hooks.yaml | 05:31 |
mikal | Although... Other projects also use signed-off-by to track who reviewed and approved a change. Is there any intent to do that as well? | 05:31 |
sean-k-mooney | mikal: that's actually a misuse of it | 05:34 |
sean-k-mooney | mikal: there are other standard labels for that | 05:34 |
sean-k-mooney | mikal: so no, not currently. however you can actually configure gerrit to auto amend the commit with that info to some degree | 05:35 |
sean-k-mooney | we had that enabled downstream until they moved us to gitlab..... | 05:35 |
* sean-k-mooney such a downgrade :( | 05:35 | |
mikal | I kind of like the history being all in one place to be honest. | 05:35 |
mikal | So yeah, review history in the commit would be cool. I don't really care what labels are used to be honest. I do agree the kernel seems to use "Signed-off-by", "Tested-by", "Reviewed-by", etc. | 05:36 |
sean-k-mooney | you're meant to use Reviewed-by, Acked-by and Tested-by instead of Signed-off-by to track that | 05:37 |
sean-k-mooney | so you could update the commit with the review votes and add Acked-by for the person that +w'd and Tested-by for zuul (the verified label votes) | 05:38 |
sean-k-mooney | however to do that gerrit would have to be set to cherry pick mode which will always cherry pick the commit on to the target branch, changing its git sha | 05:39 |
sean-k-mooney | which will mean you will not be able to know the sha to use for backporting until after it has merged because it will always change | 05:39 |
sean-k-mooney | so to do that we lose our current cherry-picked from workflow/lines when backporting | 05:40 |
sean-k-mooney | mikal: the conventions for the by lines all come from the linux kernel and are part of their patch submission process https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes | 05:42 |
sean-k-mooney | bauzas: just so you're aware i rebased your mtty_support series and moved my patch to the bottom. | 07:07 |
sean-k-mooney | bauzas: with it i was able to fix the error you were hitting but there is a bug in your first patch https://review.opendev.org/c/openstack/nova/+/898100/11 | 07:08 |
sean-k-mooney | the error i got with my fix was libvirt.libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/6dc8d190-2bc9-45ed-a4b8-d4f0708f92bb is in use by driver QEMU, domain instance-00000001 | 07:08 |
sean-k-mooney | so somehow nova tried to reuse the same mdev between two vms. i assume that is a bug in https://review.opendev.org/c/openstack/nova/+/898100/11 that only affects virtual mdev devices. if not, that would be a real bug in our current mdev support | 07:09 |
sean-k-mooney | i suspect it has something to do with how you're tracking the virtual ones given they do not have a pci address but im not sure | 07:10 |
mikal | sean-k-mooney: when you have a moment, I feel like the state of the VDI patches has fallen out of my brain. Are you now happy enough with the first two patches in the series that I could squash them into one and then do the compute protection change to the other set of extra specs? | 07:27 |
sean-k-mooney | i share your oxidation. ill need to quickly load context | 07:29 |
sean-k-mooney | you asking about squashing https://review.opendev.org/c/openstack/nova/+/940770/9 and the patch before it | 07:30 |
sean-k-mooney | or all 3 | 07:30 |
sean-k-mooney | mikal: i know dan prefers if we separate the object change from the driver change | 07:31 |
sean-k-mooney | so i think their overall preference would be to have the first patch contain the object and notification object change. then move the rest of the changes in that to the "Protect older compute managers from sound model requests." change | 07:32 |
mikal | So at the moment the chain is: sound device extra spec -> compute manager protection for sound -> usb controller extra spec -> compute manager protection for USB (this last one not yet done) | 07:32 |
mikal | I had thought you'd want the first two squashed so you get the extra spec and the compute protection at the same time. | 07:32 |
sean-k-mooney | yes i want that but we generally prefer to have the object changes in their own patch | 07:33 |
sean-k-mooney | so it would still be 4 patches | 07:33 |
mikal | I think you're saying for those first two you'd instead prefer to see the first two squashed into one, which becomes the second patch, with the object and notification changes sitting before it? | 07:33 |
mikal | So: sound objs and notifications -> rest of sound -> usb objs and notifications -> rest of usb? | 07:33 |
sean-k-mooney | "add object changes for sound" -> "support sound device" -> "add object changes for usb" -> "support usb" | 07:34 |
sean-k-mooney | yes | 07:34 |
mikal | Cool. I can do that. Are you happy with the content of the first two patches, just not the way they're presented? Or are there other changes that need to be made too? | 07:34 |
sean-k-mooney | mikal: in the unlikely case we need to revert this its easy to revert the driver change but we cant really revert the object change | 07:35 |
sean-k-mooney | so having it be separate just makes that edge case or backport a little simpler | 07:35 |
mikal | sean-k-mooney: that's fair. I'll do that refactor ASAPish if you're otherwise happy with the code. | 07:35 |
sean-k-mooney | ill need to look at the detail but i think i can review this series today | 07:36 |
sean-k-mooney | i dont think there was anything major last time i looked | 07:36 |
sean-k-mooney | so i dont really expect there to be any thing that need to be changed | 07:37 |
mikal | Yeah, certainly I can't see anything major in the comments unless I've missed something. | 07:37 |
mikal | Ok cool. I will do the refactor then and implement the compute manager protection for USB as well. | 07:37 |
noonedeadpunk | Hey folks! Got a question. Today realized that quite an unexpected behaviour is happening with live migrations. So on memory-intense VMs migration seems to fail on post-migration step. | 09:23 |
noonedeadpunk | The weird/unexpected thing is that as a result the VM stays shut down on both hosts and moves to ERROR state | 09:23 |
noonedeadpunk | so decided to ask if this is smth quite well known? | 09:24 |
noonedeadpunk | As right now I think it's related to `live_migration_permit_auto_converge = true` | 09:24 |
noonedeadpunk | as the issue happens only on memory intense VMs. So if you run stress inside the VM - it will 99% end up in ERROR and shut down. If there's some reasonable application running - it's fine | 09:25 |
noonedeadpunk | So my guess is that if live_migration_permit_auto_converge kicks in, libvirt is simply incapable of resuming the VM on the destination, but I'm not really sure if that is true | 09:26 |
noonedeadpunk | Thus before going down this rabbit hole I wanted to ask if you might know anything about that :) | 09:26 |
frickler | noonedeadpunk: I assume you checked the nova-compute logs without success? how about the libvirt instance logs? but maybe wait for sean-k-mooney to chime in first | 09:30 |
sean-k-mooney | sorry was looking at something else reading back | 09:31 |
sean-k-mooney | noonedeadpunk: do you have post_copy migration enabled? | 09:32 |
sean-k-mooney | auto converge just adds cpu pauses so that is very unlikely to be a factor | 09:32 |
sean-k-mooney | noonedeadpunk: for what its worth if there is high ram load its basically impossible to migrate without post-copy migration. auto converge is not enough to support that usecase | 09:34 |
sean-k-mooney | it just slightly changes where the live migration will fall over | 09:34 |
sean-k-mooney | auto converge is still a type of "pre-copy" migration where qemu copies all the memory first, pauses the vm on the source, copies the delta and resumes it on the dest | 09:36 |
noonedeadpunk | no, post_copy is disabled | 09:36 |
noonedeadpunk | only converge is | 09:36 |
sean-k-mooney | the difference between live_migration_permit_auto_converge = true and false is that if its set to true, the qemu on the source is allowed to micro pause the source vm cpus before switching to the dest | 09:36 |
noonedeadpunk | yes, right, and that used to work nicely | 09:37 |
sean-k-mooney | in your case if you're getting to post_live_migration that means libvirt told nova the migration was complete | 09:37 |
noonedeadpunk | and I think regardless that should not end up with VM just being down on both sides? | 09:37 |
noonedeadpunk | just a sec, I'll re-paste logs | 09:37 |
sean-k-mooney | at which point the vm on the source is not needed any more and should be terminated by libvirt | 09:38 |
sean-k-mooney | as you cant roll back at that point | 09:38 |
noonedeadpunk | it's from source https://paste.openstack.org/show/bSc55xr9LSgCXiaj1uo1/ | 09:38 |
noonedeadpunk | and that's destination: https://paste.openstack.org/show/boBhuqrR8mYDoqCjYxex/ | 09:38 |
noonedeadpunk | the only very suspicious thing I've spotted is that I also see libvirt reset the communication to qemu | 09:39 |
sean-k-mooney | i know you said this but the vm is actively running right, its not paused when you live migrated it right | 09:40 |
noonedeadpunk | so it feels that during unpause things crash unobviously | 09:40 |
noonedeadpunk | it's running, yes | 09:40 |
noonedeadpunk | I mean paused on destination | 09:40 |
sean-k-mooney | the reason im asking is this is sort of similar to https://issues.redhat.com/browse/RHEL-48801 but that's about live migrating a paused instance twice | 09:41 |
sean-k-mooney | qemu was not properly handling something in the block storage layer in that case | 09:41 |
noonedeadpunk | there's also no block migration as storage is ceph | 09:42 |
sean-k-mooney | ack that's also good to know | 09:42 |
noonedeadpunk | but yeah, trying out post_copy was my next step to try | 09:43 |
noonedeadpunk | Was just a bit afraid of it tbh | 09:43 |
sean-k-mooney | noonedeadpunk: can you check the qemu instance logs and see if there is an error on the source or dest | 09:43 |
sean-k-mooney | for the block device one we saw a message like this | 09:43 |
sean-k-mooney | qemu-kvm: ../block.c:6979: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed. | 09:43 |
sean-k-mooney | 2024-07-16 13:29:55.795+0000: shutting down, reason=crashed | 09:43 |
sean-k-mooney | noonedeadpunk: post-copy is more robust to actively loaded vms and is effectively required if you're using 1G hugepages | 09:44 |
sean-k-mooney | you can kind of get vms with 2M hugepages to migrate without it but 1G hugepage guests are very hard to make function | 09:45 |
noonedeadpunk | I did not configure hugepages explicitly tbh | 09:45 |
sean-k-mooney | it would be worth trying but i suspect you're hitting a qemu bug and post copy might help you work around that but im not sure | 09:45 |
sean-k-mooney | noonedeadpunk: if you didnt then its not a factor | 09:45 |
sean-k-mooney | the kernel might be using transparent hugepages | 09:45 |
sean-k-mooney | but if that's the case it wont impact the migration | 09:46 |
noonedeadpunk | yeah, let me try to reproduce one more time as original VM was dropped apparently | 09:46 |
sean-k-mooney | the issue with explicit hugepages is if you dirty the page qemu has to copy the entire page again | 09:46 |
sean-k-mooney | meaning a 1 byte write requires 1G of data to be copied again | 09:46 |
* sean-k-mooney yes all memory writes are going to be a cacheline so a 1 byte write is really a 64 byte write... | 09:47 | |
noonedeadpunk | so what would be suggested value? 1G? | 09:47 |
sean-k-mooney | sorry can you say that again | 09:48 |
sean-k-mooney | are you asking what i recommend if you choose to use hugepages? | 09:48 |
noonedeadpunk | Yeah :) I'd need to read about them more, but maybe you have a suggestion from your experience | 09:48 |
sean-k-mooney | if so i recommend 2MB hugepages for vms and if you happen to use ovs-dpdk or have some other use for hugepages in the system then use 1G hugepages for those but not for the nova vms | 09:49 |
noonedeadpunk | ok, right | 09:49 |
noonedeadpunk | I don't have anything specific | 09:49 |
opendevreview | Markus Hentsch proposed openstack/nova master: Use os-brick for volume attachment passphrases https://review.opendev.org/c/openstack/nova/+/926326 | 09:49 |
noonedeadpunk | thus - never bothered to be explicit in defining the size | 09:49 |
sean-k-mooney | noonedeadpunk: well you also need to preallocate them on the kernel commandline or with a start up script and some other config. | 09:50 |
sean-k-mooney | they do improve performance but they make the vm numa affined and you cant have any memory over subscription | 09:50 |
noonedeadpunk | yeah, see that in the doc https://docs.openstack.org/nova/latest/admin/huge-pages.html | 09:50 |
sean-k-mooney | so for high performance workloads they are great but they are not for all usecases | 09:51 |
noonedeadpunk | we have ram allocation ratio to 1 anyway | 09:51 |
sean-k-mooney | then you should use them :) | 09:51 |
noonedeadpunk | numa affinity is more scary though | 09:51 |
sean-k-mooney | do you use cpu pinning | 09:51 |
noonedeadpunk | as I can recall some painful experience on scheduling/evacuating with numa pinning | 09:51 |
noonedeadpunk | no, not really | 09:52 |
sean-k-mooney | ack | 09:52 |
noonedeadpunk | there's a specific tier with cpu pinning though | 09:52 |
noonedeadpunk | so probably there it makes most sense to use hugepages... | 09:53 |
noonedeadpunk | anyway | 09:53 |
sean-k-mooney | all cpu pinned guests are numa affined and unless you're using file backed memory when you use cpu pinning you need to specify hw:mem_page_size | 09:53 |
sean-k-mooney | you can set it to small | 09:53 |
sean-k-mooney | so you dont need hugepages | 09:53 |
sean-k-mooney | but if you dont set it your pinned VMs will eventually get OOM killed | 09:53 |
sean-k-mooney | noonedeadpunk: lets park the wonderful world of numa for now | 09:54 |
noonedeadpunk | right | 09:54 |
noonedeadpunk | but it's super interesting as well;) | 09:54 |
sean-k-mooney | if you replicate the issue please check if there is anything in the qemu instance logs | 09:54 |
noonedeadpunk | thanks for enlightening me | 09:54 |
sean-k-mooney | im guessing you're hitting a bug at that level but if not that's also valuable to know | 09:54 |
sean-k-mooney | noonedeadpunk: by the way on the source node side we see in the logs that the migration completed and libvirt stopped the vm based on the auti message then it proceeded to post live migration. | 10:02 |
noonedeadpunk | yeah | 10:02 |
noonedeadpunk | so it's kinda nowhere to revert at this point | 10:02 |
sean-k-mooney | so if there is a failure i would expect it to be in the destination qemu instance log | 10:02 |
noonedeadpunk | I was thinking it's smth related to neutron port attachment, but I saw that on both OVN and OVS deployments | 10:03 |
noonedeadpunk | on Ubuntu 22.04 and 24.04 | 10:03 |
noonedeadpunk | (Caracal and Dalmatian) | 10:03 |
sean-k-mooney | ya i guess this was 22.04 based on the fact you're running caracal on 3.10 | 10:03 |
noonedeadpunk | this one was ovs one, yeah | 10:04 |
sean-k-mooney | and i assume OSA based on the fact you're installed in a venv but not the normal kolla location | 10:04 |
noonedeadpunk | sure :) | 10:04 |
sean-k-mooney | i mean that's a pretty battle tested config at this point | 10:05 |
noonedeadpunk | "normal kolla location" is a bit hilarious, but yeah :) | 10:05 |
sean-k-mooney | so its interesting that this is the first time someone raised this specific failure mode | 10:05 |
noonedeadpunk | as I think system packages would use same path as kolla? | 10:05 |
sean-k-mooney | kolla supports source or binary installs in the container | 10:06 |
noonedeadpunk | that's why my question also was if it's smth known already so I don't have to dig through the rabbit hole | 10:06 |
sean-k-mooney | if its source it builds the container with a venv at /var/lib/kolla something i think | 10:06 |
noonedeadpunk | ah, ok, right | 10:06 |
sean-k-mooney | that's how it used to work anyway. i have not contributed to kolla in a long time although my home openstack is deployed with caracal using kolla :) | 10:07 |
sean-k-mooney | my current config is definitely not a supported one however so its perhaps not the best reference | 10:08 |
noonedeadpunk | I don't think it has anything to do with deployment method tbh, unless we've messed up with mtls certificate generation | 10:08 |
noonedeadpunk | or ssh cert | 10:08 |
noonedeadpunk | but I don't think live migrations use ssh for post anyway? | 10:09 |
sean-k-mooney | i dont think so either, i was more wondering what the rough qemu/libvirt/kernel versions would be in this case | 10:09 |
noonedeadpunk | qemu 6.2 libvirt 8.0.0 kernel 6.5.0 | 10:10 |
sean-k-mooney | noonedeadpunk: so from the libvirt point of view the migration was complete. and at this point know should not need to use ssh because the only thing its copying with images_type=rbd is like the console | 10:10 |
sean-k-mooney | and that happened already | 10:10 |
sean-k-mooney | s/point know/point nova/ | 10:11 |
noonedeadpunk | and for 24.04 it was qemu 8.2.2 libvirt 10.0.0 and kernel 6.8.0 | 10:11 |
sean-k-mooney | whatever happened to the vm on the dest i think is local to the dest host. | 10:11 |
sean-k-mooney | are you migrating across distro versions or is the os the same on both ends | 10:11 |
noonedeadpunk | it's same on both ends. just different envs I've seen same thing | 10:12 |
sean-k-mooney | going old to new is supported but if this does end up being a qemu issue then that will be an important factor if the qemu versions differ | 10:12 |
noonedeadpunk | yeah, right, that would be completely different story | 10:12 |
noonedeadpunk | and where 24.04 is OVN deployment as well | 10:13 |
noonedeadpunk | as I had suspicions about port attachment tbh | 10:13 |
sean-k-mooney | so that almost makes me think its ceph related | 10:13 |
sean-k-mooney | separate ceph clusters? | 10:14 |
sean-k-mooney | so regarding the ports we create the ovs port on the destination in pre-live-migration | 10:14 |
noonedeadpunk | separate in terms of? different? yes, these are 2 completely independent clusters at different sites in different countries, so nothing is shared between them at all | 10:14 |
sean-k-mooney | and then we activate the port binding in neutron in post live migration | 10:15 |
sean-k-mooney | but we are not modifying the data path just switching which host is the active one | 10:15 |
noonedeadpunk | yeah, so I saw on OVN some fishy logs regarding port attachment... anyway, let me reproduce the thing :) | 10:15 |
sean-k-mooney | so the port binding activation should not impact the vm as that is outside its view | 10:15 |
noonedeadpunk | ah, right... | 10:15 |
noonedeadpunk | makes sense | 10:15 |
sean-k-mooney | if they are two entirely different sites with two entirely different operating system versions and qemu/libvirt that's wild | 10:16 |
sean-k-mooney | and also two different versions of openstack | 10:17 |
noonedeadpunk | but I'm not migrating between them | 10:17 |
noonedeadpunk | just 2 envs where I can reproduce failure quite reliably | 10:17 |
sean-k-mooney | so either its a regression in nova (but i dont know what it could be) that's in caracal and not fixed in dalmatian or something thinks you should have a bad day | 10:17 |
noonedeadpunk | or I messed up in setup/osa... | 10:18 |
noonedeadpunk | sean-k-mooney: destination logs do not have anything enlightening: https://paste.openstack.org/show/bcocoH9OrjscQt4gPTOr/ | 10:28 |
sean-k-mooney | well it has | 10:29 |
sean-k-mooney | 2025-05-22 06:55:49.882+0000: initiating migration | 10:29 |
sean-k-mooney | 2025-05-22 07:18:40.025+0000: shutting down, reason=crashed | 10:29 |
noonedeadpunk | shutting down, reason=migrated and next thing it crashed | 10:29 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add objects and notifications for sound model. https://review.opendev.org/c/openstack/nova/+/926126 | 10:29 |
opendevreview | Michael Still proposed openstack/nova master: Implement sound model extra spec for libvirt. https://review.opendev.org/c/openstack/nova/+/940770 | 10:29 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add objects and notifications for USB controller model. https://review.opendev.org/c/openstack/nova/+/927354 | 10:29 |
opendevreview | Michael Still proposed openstack/nova master: Implement USB controller extra spec for libvirt. https://review.opendev.org/c/openstack/nova/+/950643 | 10:29 |
noonedeadpunk | ah, ok, I think it was first attempt on migrating | 10:29 |
noonedeadpunk | yeah | 10:29 |
noonedeadpunk | but it does not point to any reason.... | 10:30 |
mikal | sean-k-mooney: the fourth patch lacks the compute protection, I'll do that tomorrow. So don't go mad and merge that one too early! | 10:31 |
* noonedeadpunk need to look at direct-spice implementation details | 10:33 | |
noonedeadpunk | *spice-direct | 10:33 |
sean-k-mooney | noonedeadpunk: this all looks very standard. not necessarily optimised for performance but this is a very typical looking vm so you're not enabling anything that raises red flags to me | 10:34 |
mikal | noonedeadpunk: So its sort of half done -- there are now Nova APIs to create a console of type spice-direct, with openstacksdk and python-openstackclient support. Sound and USB are still missing (that's the patches above), so if that's a thing you care about you might need to hold your breath a bit. I have patches for kolla-ansible to deploy it, | 10:35 |
mikal | but I am sure it has bugs like no HA -- its still a work in progress. My main focus right now is working through load testing the SPICE native proxy so I can try and quantify if it needs tweaking and what scaling to recommend with it. | 10:35 |
noonedeadpunk | mikal: not specifically care about spice-direct itself, but more wanted to implement that for OSA as well | 10:36 |
sean-k-mooney | mikal: ack | 10:36 |
mikal | noonedeadpunk: so OSA would need to know how to deploy the proxy and to twiddle some Nova config options. The existing Kolla-Ansible patches demonstrate how to do that. | 10:37 |
noonedeadpunk | mikal: do you have a link to the patches somewhere nearby? | 10:37 |
noonedeadpunk | there's support for spice-html5 but direct looks quite different | 10:38 |
mikal | noonedeadpunk: this is the list of patches I am carrying against Kolla-Ansible right now -- https://github.com/shakenfist/kerbside-patches/blob/develop/kolla-ansible/ORDER. There are also some minor patches against Kolla itself at https://github.com/shakenfist/kerbside-patches/blob/develop/kolla/ORDER. I'm sorry that's not the most helpful way to | 10:38 |
mikal | express them but its what I have right now. | 10:38 |
sean-k-mooney | noonedeadpunk: it is. its using an out of tree proxy that mikal is developing called kerbside that actually allows raw spice connections from a client to the vm | 10:39 |
mikal | noonedeadpunk: HTML5 and spice-direct can coexist I think, as long as you don't turn on the new require_secure option. The HTML5 client doesn't support TLS. | 10:39 |
sean-k-mooney | the proxy makes that safe | 10:39 |
sean-k-mooney | at least in theory | 10:39 |
mikal | noonedeadpunk: the proxy is at https://github.com/shakenfist/kerbside for what its worth. Like I said, the proxy itself is still a bit of a work in progress. | 10:39 |
noonedeadpunk | mikal: ok, so proxy needs to be started on control side I assume? | 10:40 |
noonedeadpunk | and then nova should be configured to point to it? | 10:41 |
mikal | noonedeadpunk: yes. The proxy sits running all the time and needs to be network accessible to your users. Nova is configured with the public URL to the proxy, and generates console access URLs using that URL template when a user creates a spice-direct console and then requests access to it. The proxy uses the console auth token to turn that back | 10:42 |
mikal | into hypervisor / port and then proxies the SPICE traffic between the client and the hypervisor. | 10:42 |
mikal | noonedeadpunk: https://github.com/shakenfist/kerbside-patches/blob/develop/tools/test-console#L88 is a worked example of that with a terrible fake client. | 10:42 |
noonedeadpunk | so in theory, L4 balancing can be added on top for public access? | 10:43 |
mikal | noonedeadpunk: yes. I expect the proxy would be clustered and it would be fine with traffic hitting multiple proxies. I suspect you'd want to try and keep a single client limited to a single backend for failure domain reasons, but apart from that it should just work. | 10:43 |
noonedeadpunk | ok, I should try making some patches, looks quite doable so far | 10:44 |
mikal | noonedeadpunk: state is in a central mysql database much like nova does. | 10:44 |
mikal | noonedeadpunk: by far my biggest concern is if the proxy is performant enough, but that's why that's the bit I am focused on right now. | 10:44 |
mikal | sean-k-mooney should still merge my patches though ;) | 10:44 |
mikal | (oVirt does something similar to all this, but they use squid as the proxy which means the client needs to know which hypervisor and port to connect to because its basically just doing SSL CONNECTs under the hood). | 10:45 |
noonedeadpunk | mikal: um... is the config respecting only ENV vars? | 10:47 |
mikal | noonedeadpunk: so kerbside itself isn't very "openstacky". It's written more like other personal projects of mine. It expects its configuration in environment variables, which can be in a systemd envvar file if that is less messy. Kolla-Ansible likes these start scripts, which is why it is done like that for those patches. It wouldn't be too hard to | 10:48 |
mikal | support a more traditional config file as well, it's all just pydantic under the hood -- https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py | 10:48 |
noonedeadpunk | systemd envvar works as well, ofc. was just asking if I'm not missing more "conventional" config :) | 10:49 |
noonedeadpunk | And then service itself launches gunicorn? meaning https://github.com/shakenfist/kerbside/blob/053d068b0cc5b25a6263fe39a3548ca985b972ee/kerbside/api.py#L555-L563 | 10:50 |
mikal | noonedeadpunk: you didn't miss it. Pydantic can do YAML etc as well apparently https://docs.pydantic.dev/latest/concepts/pydantic_settings/#other-settings-source | 10:50 |
mikal | noonedeadpunk: correct. | 10:50 |
noonedeadpunk | so, like, it does not make sense to try to launch it with uWSGI or smth like that? | 10:50 |
noonedeadpunk | as if I can run service as wsgi - I'd rather do that... | 10:51 |
mikal | noonedeadpunk: maybe? The gunicorn bit is only for the REST API and these Nova generated access URLs, so it's not particularly high traffic or anything. I am not opposed to supporting other runtimes, it just hasn't been a priority for me yet. | 10:51 |
mikal | noonedeadpunk: like I said my current focus is working through some sort of performance testing, because I'd say there's a 50% chance kerbside is going to get a rewrite in golang or C or something. | 10:52 |
noonedeadpunk | right. | 10:52 |
noonedeadpunk | not sure if golang will help a lot, but yeah | 10:52 |
mikal | So it works right now, but I am unclear if it is performant enough. The proxy is basically a proof of concept right now. | 10:52 |
noonedeadpunk | From my perspective separation of uwsgi API and worker itself would be really nice to have. Probably because I got too used to how openstack is written :) | 10:56 |
mikal | Ok, I'll make a note to circle back to that once I have a stronger view of if a rewrite is required or not. | 10:57 |
noonedeadpunk | As for instance we have quite established way of running wsgi apps, as well as wiring around TLS generation and deployment for it | 10:58 |
mikal | Basically the focus so far has been on getting a proxy to work at all, given the poor state of SPICE protocol documentation, and then on landing the supporting nova patches. I am only just now circling back to productionizing the proxy. | 10:58 |
noonedeadpunk | yeah, right | 10:58 |
mikal | I don't know much about uwsgi. If you have best practise recommendations I'd be quite interested in those. | 10:59 |
noonedeadpunk | it's not that I'm blaming or anything, just commenting :) | 10:59 |
mikal | Yeah it's all good. I'm mildly excited someone is considering deploying this at all. It's sometimes lonely over here. | 10:59 |
noonedeadpunk | we have a uwsgi role we're actively using, which produces a config file like this one: https://opendev.org/openstack/ansible-role-uwsgi/src/branch/master/templates/uwsgi.ini.j2 | 11:00 |
noonedeadpunk | I totally see how useful this all might be in light of VMware migrations | 11:00 |
mikal | Yeah, I'm basically aiming at Citrix-like functionality here. It's just going to take a little while to get there. | 11:01 |
sean-k-mooney | noonedeadpunk: i don't even think you need to get particularly fancy with the load balancing, just have haproxy or whatever use source ip/mac affinity to pin the connection to one of the active backends | 11:01 |
noonedeadpunk | but this also allows securing communication with the API side with TLS, as I think it's not covered now | 11:01 |
sean-k-mooney | noonedeadpunk: basically like you would for the novnc or html spice console | 11:01 |
noonedeadpunk | sean-k-mooney: exactly what I was thinking about, yes | 11:01 |
noonedeadpunk | just not L7 but L4 I guess | 11:01 |
mikal | noonedeadpunk: Kerbside API TLS is done with a frontend like apache2, not natively. SPICE protocol TLS is done natively within Kerbside. | 11:01 |
sean-k-mooney | ya so use the source ip and tcp port to balance | 11:02 |
sean-k-mooney | if it fails over they will have to reconnect | 11:02 |
noonedeadpunk | right, but how do you communicate between apache2 and Kerbside API | 11:02 |
sean-k-mooney | but i don't see a way around that without clustering in kerbside | 11:02 |
noonedeadpunk | as it's listening on 0.0.0.0 explicitly | 11:02 |
mikal | If you squint at it, Kerbside is sort of a layer 7 load balancer. It's the thing which understands the protocol well enough to direct traffic to the right backend. | 11:02 |
noonedeadpunk | sean-k-mooney: I think it's totally fine tbh and more than reasonable | 11:02 |
mikal | noonedeadpunk: that's just the default in the config, you can override it. | 11:03 |
noonedeadpunk | mikal: um, can you? https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py#L56 | 11:03 |
noonedeadpunk | and if that is a bare metal host and not docker container? | 11:03 |
mikal | noonedeadpunk: yeah, that whole template can be overridden by exporting KERBSIDE_API_COMMAND_LINE in the starting environment. | 11:04 |
noonedeadpunk | ah | 11:04 |
noonedeadpunk | ok | 11:04 |
sean-k-mooney | its just a wsgi app by the looks of it | 11:04 |
noonedeadpunk | so I can also set `echo ''` to it kinda? | 11:04 |
noonedeadpunk | it is | 11:04 |
noonedeadpunk | just started by the main service | 11:04 |
mikal | noonedeadpunk: I don't follow the echo comment? | 11:04 |
sean-k-mooney | mikal: are you using https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py#L54 to generate a script to run kerbside | 11:05 |
noonedeadpunk | so if I want to run wsgi separately, I should be able to replace KERBSIDE_API_COMMAND_LINE with noop command quite successfully I assume | 11:05 |
sean-k-mooney | as in a console script | 11:05 |
sean-k-mooney | or are you using that to spawn a separate instance of kerbside per connection | 11:05 |
mikal | noonedeadpunk: secretly, this is just a cut down (and a bit older) version of https://github.com/shakenfist/shakenfist/blob/develop/shakenfist/config.py which is the project that code was lifted from. | 11:05 |
mikal | Oh interesting. | 11:06 |
mikal | Like I said, I don't know much about uwsgi. If flask can work in that mode then I think that would work? | 11:06 |
sean-k-mooney | noonedeadpunk: https://github.com/shakenfist/kerbside/blob/develop/kerbside/api.py#L566-L568 | 11:07 |
mikal | The protocol native proxy is a separate socket server. | 11:07 |
noonedeadpunk | usually gunicorn, uwsgi and mod_wsgi are interchangeable | 11:07 |
sean-k-mooney | mikal: so your api console script currently spawns a wsgi server from api.py | 11:07 |
sean-k-mooney | but normally you would want to spin up the wsgi application directly with a webserver and not spawn it with Popen like that if it can be shared between connections | 11:08 |
noonedeadpunk | and then if your intention is to have apache2 in front for TLS, using mod_wsgi might be even more wise | 11:08 |
sean-k-mooney | if this is one process per api request however that is different | 11:08 |
mikal | The Shaken Fist Foundation [1] welcomes your feedback and potential contributions. | 11:08 |
mikal | 1: Said foundation does not exist. | 11:08 |
sean-k-mooney | mikal: without deep diving i think noonedeadpunk could ignore your main in api.py | 11:09 |
mikal | I think what I am getting from this conversation is that I should go read more about uwsgi, which I pinky swear to do. | 11:10 |
sean-k-mooney | and just spin up the wsgi app directly and as they said use apache in front to handle tls | 11:10 |
sean-k-mooney | wsgi and uwsgi are also two different things | 11:10 |
sean-k-mooney | uwsgi, gunicorn, and mod_wsgi are 3 examples of wsgi application servers | 11:11 |
sean-k-mooney | as in they host a wsgi application | 11:11 |
sean-k-mooney | wsgi itself is a protocol | 11:11 |
noonedeadpunk | yeah, so usually you just don't need to spawn gunicorn as part of the service, but do that separately | 11:12 |
noonedeadpunk | and leave user to choose wsgi server | 11:12 |
mikal | Ok, I will put reading about all this on my todo list for tomorrow. | 11:12 |
noonedeadpunk | as some might want to run granian or smth like that even | 11:12 |
mikal | Although, writing load testing tooling sounds like more fun ;) | 11:13 |
sean-k-mooney | so mikal what's actually happening when you create that subprocess is you're starting a wsgi server that is importing https://github.com/shakenfist/kerbside/blob/develop/kerbside/api.py#L39-L42 | 11:13 |
sean-k-mooney | mikal: then that application host is binding to the port in the template and handling the requests out of band of the kerbside command you use to run it | 11:14 |
sean-k-mooney | mikal: by the way, the way you wrote it looks like you wanted it to work how nova-api worked using the eventlet webserver while also not using eventlet :) | 11:15 |
sean-k-mooney | i.e. you wanted to provide a simple way to run it from the cli via the kerbside command generated by this entry point https://github.com/shakenfist/kerbside/blob/develop/setup.cfg#L24 | 11:16 |
mikal | sean-k-mooney: Oh I was definitely thinking about nova-api when I wrote it, but it was also over four years ago when I wrote that which means it's all long forgotten now. | 11:16 |
sean-k-mooney | mikal: kolla recently moved from apache with mod_wsgi to uwsgi | 11:17 |
sean-k-mooney | you should be able to skip the command without any code changes and just run that flask app directly by providing the module path like you did to gunicorn | 11:18 |
mikal | sean-k-mooney: that's probably good because this is a topic I am entirely not passionate about. I just want to do something people think doesn't suck and then move onto more interesting things. | 11:18 |
sean-k-mooney | the main benefit of that approach is you don't need to do the process management in your cli | 11:19 |
mikal | Like I want it to be performant and not make people angry, but it is not my calling. | 11:19 |
sean-k-mooney | you can leave it up to docker/systemd to do that for you | 11:19 |
mikal | Ok well I will play with this tomorrow. I am being assured it is past my bed time now. Feel free to file a github issue if you're keen, otherwise I'll just work through a fix and smash it in. | 11:20 |
sean-k-mooney | well none of this is a blocker on the nova side. but for scalability and security reasons it would be good to do when doing a production install | 11:22 |
sean-k-mooney | so it's something that noonedeadpunk and the kolla folks will care about | 11:22 |
sean-k-mooney | but it's not like it needs a big code change at first glance | 11:23 |
noonedeadpunk | I can try to submit a report later today | 11:25 |
noonedeadpunk | I need to deal with failing migrations right now :( | 11:25 |
mikal | Thanks guys. | 11:31 |
opendevreview | Kamil Sambor proposed openstack/nova master: Replace eventlet.event.Event with threading.Event https://review.opendev.org/c/openstack/nova/+/949754 | 11:43 |
opendevreview | Merged openstack/nova stable/2025.1: Amend functional reproducer for bug 1899835 https://review.opendev.org/c/openstack/nova/+/950213 | 12:39 |
opendevreview | Dan Smith proposed openstack/nova master: pci: Add more detail and examples to pci.alias docs https://review.opendev.org/c/openstack/nova/+/950659 | 13:52 |
dansmith | gibi: Uggla, per yesterday, some refinement ^ | 13:52 |
opendevreview | Dan Smith proposed openstack/nova master: pci: Add more detail and examples to pci.alias docs https://review.opendev.org/c/openstack/nova/+/950659 | 13:53 |
opendevreview | yaguang proposed openstack/nova stable/2024.2: Handle iso+gpt detections https://review.opendev.org/c/openstack/nova/+/943940 | 14:08 |
Uggla | dansmith, +1 for the new pci.alias example. | 14:14 |
opendevreview | Merged openstack/nova stable/2025.1: live migration: Avoid volume rollback mismatches https://review.opendev.org/c/openstack/nova/+/950214 | 14:33 |
opendevreview | melanie witt proposed openstack/nova-specs master: Re-propose vTPM live migration https://review.opendev.org/c/openstack/nova-specs/+/947542 | 15:23 |
melwitt | gibi: fixed the typo you found ^ | 15:23 |
melwitt | dansmith, sean-k-mooney: the re-proposal for vtpm live migration spec is ready for review if you may be interested. I have noted the changes from the last approval in the commit message | 15:26 |
opendevreview | Fabian Wiesel proposed openstack/nova master: libvirt: Expose number of migration threads https://review.opendev.org/c/openstack/nova/+/950667 | 15:34 |
gibi | melwitt: thanks | 15:55 |
gibi | I added my +2 back | 15:55 |
dansmith | melwitt: ack, I'm several layers deep in the stack right now | 16:03 |
opendevreview | Merged openstack/nova stable/2024.2: Amend functional reproducer for bug 1899835 https://review.opendev.org/c/openstack/nova/+/950215 | 16:54 |
opendevreview | Merged openstack/nova stable/2024.2: live migration: Avoid volume rollback mismatches https://review.opendev.org/c/openstack/nova/+/950216 | 16:58 |
melwitt | dansmith: ack, it is not urgent but it is there when you are ready | 18:09 |
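
Note on the WSGI discussion above (roughly 11:07 to 11:19): the point that uwsgi, gunicorn and mod_wsgi all just host a WSGI application can be illustrated with a minimal sketch. This is not kerbside's code. The names `application` and `call_app` are hypothetical, and the callable stands in for whatever Flask app object a real service would expose.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable (hypothetical stand-in for a real app).

    Any WSGI server only needs to import this object; the application
    does not have to spawn or manage the server process itself.
    """
    body = b"ok\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI callable in-process, the way a server would."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal valid environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the app reports instead of writing to a socket.
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)


status, body = call_app(application)
print(status, body)
```

Assuming the block were saved as a module named `myapp` (a made-up name), a server such as gunicorn could be pointed at it with the `module:callable` form, for example `gunicorn myapp:application`, leaving process management to systemd or the container runtime as suggested in the conversation.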
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!