Thursday, 2025-05-22

mikalPotentially silly question(s), but is there a git plugin or precommit or whatever that will just make me magically DCO compliant with minimal effort? I am assuming that DCO implies GPG signed commits, although I cannot find a web page which clearly states that -- it might just be some text in the commit message which alleges to come from me? If I01:10
mikaldidn't in fact hallucinate the GPG signature requirement, are there minimum requirements for the GPG key used to sign the commit? My GPG key is relatively well connected, but also ancient.01:10
mikalLike... it would be cool if "git review" added missing DCO entries like it does missing change ids.01:11
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add extra spec for sound device.  https://review.opendev.org/c/openstack/nova/+/92612601:11
opendevreviewMichael Still proposed openstack/nova master: Protect older compute managers from sound model requests.  https://review.opendev.org/c/openstack/nova/+/94077001:11
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection.  https://review.opendev.org/c/openstack/nova/+/92735401:11
opendevreviewsean mooney proposed openstack/nova master: move compile earlier  https://review.opendev.org/c/openstack/nova/+/95051604:57
opendevreviewsean mooney proposed openstack/nova master: Add mtty/mdpy support for testing fake mdevs  https://review.opendev.org/c/openstack/nova/+/89810004:57
opendevreviewsean mooney proposed openstack/nova master: WIP : Add mtty support to nova-next  https://review.opendev.org/c/openstack/nova/+/92214004:57
sean-k-mooneymikal: "git commit -s" should be all you need to do but im not sure about an automatic way yet. most seem to recommend a git alias like "git config --global alias.cs 'commit --signoff'" but `git commit -s` vs `git cs` does not help much05:03
sean-k-mooneymikal: you do not need to sign commits with DCO05:04
sean-k-mooneyyou can, but git now supports using ssh keys instead of GPG keys for signing if you want to do that05:04
mikalYeah, that bit confuses me. What's to stop a forgery of a signoff if its just a text string in the commit message?05:04
sean-k-mooneyhttps://dev.to/janderssonse/git-signoff-and-signing-like-a-champ-41f305:04
mikalAhhh, so signoff != signing.05:05
sean-k-mooneymikal: well gerrit will prevent forging the author but the commit message can be updated05:05
sean-k-mooneysigning commits wont help because we dont use merge commits05:05
sean-k-mooneyso your commit will be rebased/cherry-picked anyway when its merged by zuul05:06
mikalWell, at least this means I can still stick with my 25 year old GPG key!05:06
mikalIts an antique and needs to be cherished.05:06
sean-k-mooneyso i have gpg signing set up, or at least i did before i reinstalled my laptop....05:07
mikalI GPG sign tags on some personal projects, although I am unclear on how meaningful that is with such an old key.05:07
sean-k-mooneybut i tend not to use it because i had issues getting it to prompt both locally and if i ssh into my work laptop05:07
sean-k-mooneyi.e. if i sshed in it would try to pop up an unlock prompt in a window on my work laptop instead of in the ssh terminal05:08
sean-k-mooneyi think i eventually got it to work by always forcing the gpg agent to use the tty for prompting instead of the shell integration but i remember that being a bit of a pain05:08
sean-k-mooneymikal: i mean if that key has never been leaked05:09
mikalYeah fair enough. I feel like GPG never really took off because its a huge pain in the ass.05:09
sean-k-mooneyits as valid as mine05:09
sean-k-mooneyya it kind of used to be. i finally set it up because i wanted to be able to sign tags for networking-ovs-dpdk and my emails05:10
sean-k-mooneybut signing emails with gpg keys causes all sorts of issues at times05:10
mikalDr Google assures me that a 1024 bit DSA key is crackable by motivated players. So that's nice.05:10
sean-k-mooneymikal: by the way git-review is not the thing that adds the change id for you05:13
sean-k-mooneymikal: we have configured our gerrit to serve a commit hook which git review will download and install for you05:14
sean-k-mooneyim not sure if we can do something similar to reject or amend your commit automatically for you in a sane way05:14
mikalAhhh ok. My lived experience is "git review" saying "you're missing thingies" and adding them, but I've never looked at the mechanism all that hard.05:14
sean-k-mooneyah https://gerrit.cloudera.org/Documentation/project-configuration.html#_require_signed_off_by05:15
sean-k-mooneyso ya we can just turn on the enforcement in gerrit05:15
sean-k-mooneybut im not sure about the auto fixing, i suspect we would need to dig deeper05:16
sean-k-mooneyim sure git can support this in a hook but we might need to write it 05:16
mikalThe commit message hook appears to already understand "Signed-off-by trailers" based on a cursory reading?05:18
mikalIt uses https://git-scm.com/docs/git-interpret-trailers to do... things.05:19
mikalAhhh, ok. I think its trying quite hard to have the Change-Id before the signed off by trailers. Dr Google does suggest that if you turn on DCO enforcement on the project in gerrit then some amount of magic will happen.05:22
opendevreviewAbhishek Kekane proposed openstack/nova master: [DNM] Revert^2 "Support glance's new location API"  https://review.opendev.org/c/openstack/nova/+/95062305:27
sean-k-mooneymikal: regarding pre-commit, it does support writing "commit-msg" hooks, which are passed a single parameter: the path to the commit message file05:27
sean-k-mooneymikal: so we could write one to either abort the commit if Signed-off-by was not there or to add it for you before the commit was actually finalised05:28
mikalAs a forgetful person I quite like the idea of option two there.05:29
mikalIt is after all what we already do for change ids.05:29
mikalAnd my global git config already knows my name and email address, which it needs to know for the log.05:29
sean-k-mooneyyep05:29
sean-k-mooneyits probably a good weekend hacking project, which is also where we should put it 05:30
sean-k-mooneyin the hacking repo so we can share it across other repos05:30
mikalIt would also make the transition largely transparent... People do a git review, land a new version of the hook automatically, and blam they're compliant. Much more fun than herding cats.05:30
sean-k-mooneywe can add a new hook there and then enable it in the relevant repos https://github.com/openstack/hacking/blob/master/.pre-commit-hooks.yaml05:31
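A minimal sketch of "option two" from the discussion above: a commit-msg hook that adds a missing Signed-off-by trailer instead of rejecting the commit. This is not the hook Gerrit serves and not an existing hook in the hacking repo. The function name `ensure_signoff` and the trailer-detection regex are assumptions made for illustration. It takes the name and email from the git config, as suggested in the discussion, and appends the trailer after any existing trailers such as Change-Id. It does not handle comment lines in the message file.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: append a Signed-off-by trailer if missing."""
import re
import subprocess
import sys

# Rough "Token: value" trailer shape (an assumption, not git's full grammar).
TRAILER_RE = re.compile(r"^[A-Za-z0-9-]+: \S")


def ensure_signoff(message: str, name: str, email: str) -> str:
    """Return the message with a Signed-off-by trailer for name/email."""
    trailer = f"Signed-off-by: {name} <{email}>"
    body = message.rstrip("\n")
    lines = body.split("\n")
    if trailer in (line.strip() for line in lines):
        return message  # already signed off, leave untouched
    # If the last paragraph already looks like trailers (e.g. Change-Id),
    # join it; otherwise start a new paragraph for the trailer.
    ends_with_trailers = "" in lines and TRAILER_RE.match(lines[-1]) is not None
    separator = "\n" if ends_with_trailers else "\n\n"
    return body + separator + trailer + "\n"


def git_config(key: str) -> str:
    """Read a single value from the git configuration."""
    result = subprocess.run(
        ["git", "config", "--get", key],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def main(path: str) -> None:
    """Rewrite the commit message file in place if a sign-off is missing."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    updated = ensure_signoff(
        original, git_config("user.name"), git_config("user.email"))
    if updated != original:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(updated)


# git invokes commit-msg hooks with exactly one argument: the message file.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved as an executable .git/hooks/commit-msg, git would call it with the message file path as its only argument. `git commit -s` remains the built-in way to get the same trailer.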
mikalAlthough... Other projects also use signed-off-by to track who reviewed and approved a change. Is there any intent to do that as well?05:31
sean-k-mooneymikal: thats actually a misuse of it 05:34
sean-k-mooneymikal: there are other standard labels for that05:34
sean-k-mooneymikal: so no, not currently, however you can actually configure gerrit to auto amend the commit with that info to some degree05:35
sean-k-mooneywe had that enabled downstream until they moved us to gitlab.....05:35
* sean-k-mooney such a downgrade :(05:35
mikalI kind of like the history being all in one place to be honest.05:35
mikalSo yeah, review history in the commit would be cool. I don't really care what labels are used to be honest. I do agree the kernel seems to use "Signed-off-by", "Tested-by", "Reviewed-by", etc.05:36
sean-k-mooneyyou're meant to use Reviewed-by, Acked-by and Tested-by instead of Signed-off-by to track that 05:37
sean-k-mooneyso you could update the commit with the review votes and add Acked-by for the person that +w'd and Tested-by for zuul (the verified label votes)05:38
sean-k-mooneyhowever to do that gerrit would have to be set to cherry-pick mode, which will always cherry-pick the commit onto the target branch, changing its git sha05:39
sean-k-mooneywhich will mean you will not be able to know the sha to use for backporting until after it has merged because it will always change05:39
sean-k-mooneyso to do that we lose our current "cherry picked from" workflow/lines when backporting05:40
sean-k-mooneymikal: the conventions for the by-lines all come from the linux kernel and are part of their patch submission process https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes05:42
sean-k-mooneybauzas: just so you're aware i rebased your mtty_suppot series and moved my patch to the bottom.07:07
sean-k-mooneybauzas: with it i was able to fix the error you were hitting but there is a bug in your first patch https://review.opendev.org/c/openstack/nova/+/898100/1107:08
sean-k-mooneythe error i got with my fix was libvirt.libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/6dc8d190-2bc9-45ed-a4b8-d4f0708f92bb is in use by driver QEMU, domain instance-0000000107:08
sean-k-mooneyso somehow nova tried to reuse the same mdev between two vms. i assume that is a bug in https://review.opendev.org/c/openstack/nova/+/898100/11 that only affects virtual mdev devices; if not, that would be a real bug in our current mdev support07:09
sean-k-mooneyi suspect it has something to do with how you're tracking the virtual ones given they do not have a pci address but im not sure07:10
mikalsean-k-mooney: when you have a moment, I feel like the state of the VDI patches has fallen out of my brain. Are you now happy enough with the first two patches in the series that I could squash them into one and then do the compute protection change to the other set of extra specs?07:27
sean-k-mooneyi share your oxidation. ill need to quickly load context07:29
sean-k-mooneyyou're asking about squashing https://review.opendev.org/c/openstack/nova/+/940770/9 and the patch before it07:30
sean-k-mooneyor all 307:30
sean-k-mooneymikal: i know dan prefers if we separate the object change from the driver change07:31
sean-k-mooneyso i think their overall preference would be to have the first patch contain the object and notification object change, then move the rest of the changes in that to the "Protect older compute managers from sound model requests." change07:32
mikalSo at the moment the chain is: sound device extra spec -> compute manager protection for sound -> usb controller extra spec -> compute manager protection for USB (this last one not yet done)07:32
mikalI had thought you'd want the first two squashed so you get the extra spec and the compute protection at the same time.07:32
sean-k-mooneyyes i want that but we generally prefer to have the object changes in their own patch07:33
sean-k-mooneyso it would still be 4 patches07:33
mikalI think you're saying for those first two you'd instead prefer to see the first two squashed into one, which becomes the second patch, with the object and notification changes sitting before it?07:33
mikalSo: sound objs and notifications -> rest of sound -> usb objs and notifications -> rest of usb?07:33
sean-k-mooney"add object changes for sound" -> "support sound device" -> "add object changes for usb" -> "support usb"07:34
sean-k-mooneyyes07:34
mikalCool. I can do that. Are you happy with the content of the first two patches, just not the way they're presented? Or are there other changes that need to be made too?07:34
sean-k-mooneymikal: in the unlikely case we need to revert this, its easy to revert the driver change but we cant really revert the object change07:35
sean-k-mooneyso having it be separate just makes that edge case or backport a little simpler07:35
mikalsean-k-mooney: that's fair. I'll do that refactor ASAPish if you're otherwise happy with the code.07:35
sean-k-mooneyill need to look at the detail but i think i can review this series today07:36
sean-k-mooneyi dont think there was anything major last time i looked07:36
sean-k-mooneyso i dont really expect there to be any thing that need to be changed07:37
mikalYeah, certainly I can't see anything major in the comments unless I've missed something.07:37
mikalOk cool. I will do the refactor then and implement the compute manager protection for USB as well.07:37
noonedeadpunkHey folks! Got a question. Today realized that quite an unexpected behaviour is happening with live migrations. So on memory-intense VMs migration seems to fail on post-migration step.09:23
noonedeadpunkThe weird/unexpected thing is that as a result the VM stays shut down on both hosts and it moves to ERROR state09:23
noonedeadpunkso decided to ask if this is smth quite well known? 09:24
noonedeadpunkAs right now I think it's related to `live_migration_permit_auto_converge = true` 09:24
noonedeadpunkas the issue happens only on memory-intense VMs. So if you run stress inside the VM - it will 99% end up in ERROR and shut down. If there's some reasonable application running - it's fine09:25
noonedeadpunkSo my guess is that if live_migration_permit_auto_converge kicks in, libvirt is simply incapable of resuming the VM on the destination, but I'm not really sure if that is true09:26
noonedeadpunkThus before going down this rabbit hole I wanted to ask if you might know anything about that :)09:26
fricklernoonedeadpunk: I assume you checked the nova-compute logs without success? how about the libvirt instance logs? but maybe wait for sean-k-mooney to chime in first09:30
sean-k-mooneysorry was looking at something else reading back09:31
sean-k-mooneynoonedeadpunk: do you have post_copy migration enabled? 09:32
sean-k-mooneyauto converge just adds cpu pauses so that is very unlikely to be a factor09:32
sean-k-mooneynoonedeadpunk: for what its worth, if there is high ram load its basically impossible to migrate without post-copy migration. auto converge is not enough to support that usecase09:34
sean-k-mooneyit just slightly changes where the live migration will fall over09:34
sean-k-mooneyauto converge is still a type of "pre-copy" migration where qemu copies all the memory first, pauses the vm on the source, copies the delta and resumes it on the dest09:36
noonedeadpunkno, post_copy is disabled09:36
noonedeadpunkonly converge is09:36
sean-k-mooneythe difference between live_migration_permit_auto_converge = true and false is that if its set to true, the qemu on the source is allowed to micro pause the source vm cpus before switching to the dest09:36
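A toy model of the pre-copy behaviour described above, to show why a high memory dirty rate keeps a migration from converging. This is only an illustrative sketch, not QEMU's actual algorithm: the function name `precopy_rounds`, its parameters and the default downtime budget are all assumptions. Each round sends whatever memory is currently dirty, and the guest dirties more while that round is in flight.

```python
def precopy_rounds(memory_mib, bandwidth_mib_s, dirty_mib_s,
                   max_downtime_s=0.5, max_rounds=30):
    """Count iterative copy rounds before the final pause is short enough.

    Returns the number of rounds, or None if the remaining dirty memory
    never becomes small enough to send within max_downtime_s.
    """
    remaining = float(memory_mib)  # first round copies all guest memory
    for rounds in range(max_rounds):
        transfer_time = remaining / bandwidth_mib_s
        if transfer_time <= max_downtime_s:
            return rounds  # pause the guest and send the final delta
        # While this round was copying, the guest kept dirtying memory;
        # it cannot dirty more than it has.
        remaining = min(float(memory_mib), dirty_mib_s * transfer_time)
    return None


# A mostly idle 16 GiB guest over a ~1 GiB/s link converges quickly.
print(precopy_rounds(16384, 1024, 100))
# A guest dirtying memory faster than the link can copy never converges.
print(precopy_rounds(16384, 1024, 2048))
```

In this model, auto converge corresponds to lowering `dirty_mib_s` by slowing the guest down, which helps only if the throttled dirty rate drops below the link bandwidth.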
noonedeadpunkyes, right, and that used to work nicely09:37
sean-k-mooneyin your case if you're getting to post_live_migration that means libvirt told nova the migration was complete09:37
noonedeadpunkand I think regardless that should not end up with the VM just being down on both sides?09:37
noonedeadpunkjust a sec, I'll re-paste logs09:37
sean-k-mooneyat which point the vm on the source is not needed any more and should be terminated by libvirt09:38
sean-k-mooneyas you cant roll back at that point09:38
noonedeadpunkit's from source https://paste.openstack.org/show/bSc55xr9LSgCXiaj1uo1/09:38
noonedeadpunkand that's destination: https://paste.openstack.org/show/boBhuqrR8mYDoqCjYxex/09:38
noonedeadpunkthe only very suspicious thing I've spotted is that I also see that libvirt reset its communication to qemu09:39
sean-k-mooneyi know you said this but the vm is actively running, right? its not paused when you live migrate it, right?09:40
noonedeadpunkso it feels that during unpause things crash unobviously09:40
noonedeadpunkit's running, yes09:40
noonedeadpunkI mean paused on destination09:40
sean-k-mooneythe reason im asking is this is sort of similar to https://issues.redhat.com/browse/RHEL-48801 but that is about live migrating a paused instance twice09:41
sean-k-mooneyqemu was not properly handling something in the block storage layer in that case09:41
noonedeadpunkthere's also no block migration as storage is ceph09:42
sean-k-mooneyack thats also good to know09:42
noonedeadpunkbut yeah, trying out post_copy was my next step to try09:43
noonedeadpunkWas just a bit afraid of it tbh09:43
sean-k-mooneynoonedeadpunk: can you check the qemu instance logs and see if there is an error on the source or dest09:43
sean-k-mooneyfor the block device one we saw a message like this 09:43
sean-k-mooneyqemu-kvm: ../block.c:6979: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.09:43
sean-k-mooney2024-07-16 13:29:55.795+0000: shutting down, reason=crashed09:43
sean-k-mooneynoonedeadpunk: post-copy is more robust for actively loaded vms and is effectively required if you're using 1G hugepages09:44
sean-k-mooneyyou can kind of get vms with 2M hugepages to migrate without it but 1G hugepage guests are very hard to make function09:45
noonedeadpunkI did not configure hugepages explicitly tbh09:45
sean-k-mooneyit would be worth trying, but i suspect you're hitting a qemu bug and post copy might help you work around that but im not sure09:45
sean-k-mooneynoonedeadpunk: if you didnt then its not a factor09:45
sean-k-mooneythe kernel might be using transparent hugepages09:45
sean-k-mooneybut if thats the case it wont impact the migration09:46
noonedeadpunkyeah, let me try to reproduce one more time as original VM was dropped apparently09:46
sean-k-mooneythe issue with explicit hugepages is if you dirty the page qemu has to copy the entire page again09:46
sean-k-mooneymeaning a 1 byte write requires 1G of data to be copied again09:46
* sean-k-mooney yes all memory writes are going to be a cacheline so a 1 byte write is really a 64 byte write...09:47
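The arithmetic behind the point above, as a small sketch: if a dirtied page has to be copied again in full, the cost of one tiny write scales with the page size. The names `PAGE_SIZES` and `bytes_to_recopy` are made up for this illustration, and 4K is assumed as the normal small page size.

```python
# Page sizes in bytes: the assumed default small page plus the two
# explicit hugepage sizes discussed above.
PAGE_SIZES = {
    "4K": 4 * 1024,
    "2M": 2 * 1024 ** 2,
    "1G": 1024 ** 3,
}


def bytes_to_recopy(dirtied_pages: int, page_size_bytes: int) -> int:
    """Bytes that must be sent again when whole dirtied pages are recopied."""
    return dirtied_pages * page_size_bytes


# Cost of a single small write landing in one page of each size.
for label, size in PAGE_SIZES.items():
    print(f"{label}: {bytes_to_recopy(1, size)} bytes recopied")
```

Under that assumption one dirtied 1G page costs as much to resend as 512 dirtied 2M pages, which fits the later advice to prefer 2MB hugepages for the nova vms.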
noonedeadpunkso what would be suggested value? 1G?09:47
sean-k-mooneysorry can you say that again09:48
sean-k-mooneyare you asking what i recommend if you choose to use hugepages?09:48
noonedeadpunkYeah :) I'd need to read about them more, but maybe you have a suggestion from your experience09:48
sean-k-mooneyif so i recommend 2MB hugepages for vms, and if you happen to use ovs-dpdk or have some other use for hugepages in the system then use 1G hugepages for those but not for the nova vms09:49
noonedeadpunkok, right09:49
noonedeadpunkI don't have anything specific09:49
opendevreviewMarkus Hentsch proposed openstack/nova master: Use os-brick for volume attachment passphrases  https://review.opendev.org/c/openstack/nova/+/92632609:49
noonedeadpunkthus - never bothered to be explicit in defining the size09:49
sean-k-mooneynoonedeadpunk: well you also need to preallocate them on the kernel command line or with a start up script and some other config.09:50
sean-k-mooneythey do improve performance but they make the vm numa affined and you cant have any memory oversubscription09:50
noonedeadpunkyeah, see that in the doc https://docs.openstack.org/nova/latest/admin/huge-pages.html09:50
sean-k-mooneyso for high performance workloads they are great but they are not for all usecases09:51
noonedeadpunkwe have ram allocation ratio to 1 anyway09:51
sean-k-mooneythen you should use them :)09:51
noonedeadpunknuma affinity is more scary though09:51
sean-k-mooneydo you use cpu pinning09:51
noonedeadpunkas I can recall some painful experience on scheduling/evacuating with numa pinning09:51
noonedeadpunkno, not really09:52
sean-k-mooneyack09:52
noonedeadpunkthere's a specific tier with cpu pinning though09:52
noonedeadpunkso probably there it makes most sense to use hugepages...09:53
noonedeadpunkanyway09:53
sean-k-mooneyall cpu pinned guests are numa affined and unless you're using file backed memory, when you use cpu pinning you need to specify hw:mem_page_size 09:53
sean-k-mooneyyou can set it to small09:53
sean-k-mooneyso you dont need hugepages 09:53
sean-k-mooneybut if you dont set it your pinned VMs will eventually get OOM killed09:53
sean-k-mooneynoonedeadpunk: lets park the wonderful world of numa for now09:54
noonedeadpunkright09:54
noonedeadpunkbut it's super interesting as well;)09:54
sean-k-mooneyif you replicate the issue please check if there is anything in the qemu instance logs09:54
noonedeadpunkthanks for enlightening me09:54
sean-k-mooneyim guessing you're hitting a bug at that level, but if not thats also valuable to know09:54
sean-k-mooneynoonedeadpunk: by the way on the source node side we see in the logs that the migration completed and libvirt stopped the vm based on the audit message, then it proceeded to post live migration.10:02
noonedeadpunkyeah10:02
noonedeadpunkso it's kinda nowhere to revert at this point10:02
sean-k-mooneyso if there is a failure i would expect it to be in the destination qemu instance log10:02
noonedeadpunkI was thinking it's smth related to neutron port attachment, but I saw that on both OVN and OVS deployments10:03
noonedeadpunkon Ubuntu 22.04 and 24.0410:03
noonedeadpunk(Caracal and Dalmatian)10:03
sean-k-mooneyya i guess this was 22.04 based on the fact you're running caracal on 3.1010:03
noonedeadpunkthis one was ovs one, yeah10:04
sean-k-mooneyand i assume OSA based on the fact you're installed in a venv but not the normal kolla location10:04
noonedeadpunksure :)10:04
sean-k-mooneyi mean thats a pretty battle tested config at this point10:05
noonedeadpunk"normal kolla location" is a bit hilarious, but yeah :)10:05
sean-k-mooneyso its interesting that this is the first time someone raised this specific failure mode10:05
noonedeadpunkas I think system packages would use same path as kolla?10:05
sean-k-mooneykolla supports source or binary installs in the container10:06
noonedeadpunkthat's why my question also was if it's smth known already so I don't have to dig through the rabbit hole10:06
sean-k-mooneyif its source it builds the container with a venv at /var/lib/kolla something i think10:06
noonedeadpunkah, ok, right10:06
sean-k-mooneythats how it used to work anyway. i have not contributed to kolla in a long time, although my home openstack is deployed with caracal using kolla :)10:07
sean-k-mooneymy current config is definitely not a supported one however so its perhaps not the best reference10:08
noonedeadpunkI don't think it has anything to do with deployment method tbh, unless we've messed up with mtls certificate generation10:08
noonedeadpunkor ssh cert10:08
noonedeadpunkbut I don't think live migrations use ssh for post  anyway?10:09
sean-k-mooneyi dont think so either, i was more wondering what the rough qemu/libvirt/kernel versions would be in this case10:09
noonedeadpunkqemu 6.2 libvirt 8.0.0 kernel 6.5.010:10
sean-k-mooneynoonedeadpunk: so from the libvirt point of view the migration was complete. and at this point know should not need to use ssh because the only thing its copying with images_type=rbd is like the console10:10
sean-k-mooneyand that happened already10:10
sean-k-mooneys/point know/point nova/10:11
noonedeadpunkand for 24.04 it was qemu 8.2.2 libvirt 10.0.0 and kernel 6.8.010:11
sean-k-mooneywhatever happened to the vm on the dest i think is local to the dest host.10:11
sean-k-mooneyare you migrating across distro versions or is the os the same on both ends10:11
noonedeadpunkit's the same on both ends. just different envs where I've seen the same thing10:12
sean-k-mooneygoing old to new is supported, but if this does end up being a qemu issue then that will be an important factor if the qemu versions differ10:12
noonedeadpunkyeah, right, that would be completely different story10:12
noonedeadpunkand where 24.04 is OVN deployment as well10:13
noonedeadpunkas I had suspicions about port attachment tbh10:13
sean-k-mooneyso that almost makes me think its ceph related10:13
sean-k-mooneyseperate ceph clusters?10:14
sean-k-mooneyso regarding the ports, we create the ovs port on the destination in pre-live-migration10:14
noonedeadpunkseparate in terms of? different? yes, these are 2 completely independent clusters at different sites in different countries, so nothing is shared between them at all10:14
sean-k-mooneyand then we activate the port binding in neutron in post live migration10:15
sean-k-mooneybut we are not modifying the data path, just switching which host is the active one10:15
noonedeadpunkyeah, so I saw on OVN some fishy logs regarding port attachment... anyway, let me reproduce the thing :)10:15
sean-k-mooneyso the port binding activation should not impact the vm as that is outside its view10:15
noonedeadpunkah, right...10:15
noonedeadpunkmakes sense10:15
sean-k-mooneyif they are two entirely different sites with two entirely different operating system versions and qemu/libvirt, thats wild10:16
sean-k-mooneyand also two different versions of openstack10:17
noonedeadpunkbut I'm not migrating between them10:17
noonedeadpunkjust 2 envs where I can reproduce the failure quite reliably10:17
sean-k-mooneyso either its a regression in nova (but i dont know what it could be) thats in caracal and not fixed in dalmatian or some things you should have a bad day10:17
noonedeadpunkor I messed up in setup/osa...10:18
noonedeadpunksean-k-mooney: destination logs do not have anything enlightening: https://paste.openstack.org/show/bcocoH9OrjscQt4gPTOr/10:28
sean-k-mooneywell it has 10:29
sean-k-mooney2025-05-22 06:55:49.882+0000: initiating migration10:29
sean-k-mooney2025-05-22 07:18:40.025+0000: shutting down, reason=crashed10:29
noonedeadpunkshutting down, reason=migrated and next thing it crashed10:29
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add objects and notifications for sound model.  https://review.opendev.org/c/openstack/nova/+/92612610:29
opendevreviewMichael Still proposed openstack/nova master: Implement sound model extra spec for libvirt.  https://review.opendev.org/c/openstack/nova/+/94077010:29
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add objects and notifications for USB controller model.  https://review.opendev.org/c/openstack/nova/+/92735410:29
opendevreviewMichael Still proposed openstack/nova master: Implement USB controller extra spec for libvirt.  https://review.opendev.org/c/openstack/nova/+/95064310:29
noonedeadpunkah, ok, I think it was first attempt on migrating10:29
noonedeadpunkyeah10:29
noonedeadpunkbut it does not point to any reason....10:30
mikalsean-k-mooney: the fourth patch lacks the compute protection, I'll do that tomorrow. So don't go mad and merge that one too early!10:31
* noonedeadpunk need to look at direct-spice implementation details10:33
noonedeadpunk*spice-direct10:33
sean-k-mooneynoonedeadpunk: this all looks very standard. not necessarily optimised for performance but this is a very typical looking vm so you're not enabling anything that raises red flags to me10:34
mikalnoonedeadpunk: So its sort of half done -- there are now Nova APIs to create a console of type spice-direct, with openstacksdk and python-openstackclient support. Sound and USB are still missing (that's the patches above), so if that's a thing you care about you might need to hold your breath a bit. I have patches for kolla-ansible to deploy it,10:35
mikalbut I am sure it has bugs like no HA -- its still a work in progress. My main focus right now is working through load testing the SPICE native proxy so I can try and quantify if it needs tweaking and what scaling to recommend with it.10:35
noonedeadpunkmikal: not specifically care about spice-direct itself, but more wanted to implement that for OSA as well10:36
sean-k-mooneymikal: ack 10:36
mikalnoonedeadpunk: so OSA would need to know how to deploy the proxy and to twiddle some Nova config options. The existing Kolla-Ansible patches demonstrate how to do that.10:37
noonedeadpunkmikal: do you have a link to the patches somewhere nearby?10:37
noonedeadpunkthere's support for spice-html5 but direct looks quite different 10:38
mikalnoonedeadpunk: this is the list of patches I am carrying against Kolla-Ansible right now -- https://github.com/shakenfist/kerbside-patches/blob/develop/kolla-ansible/ORDER. There are also some minor patches against Kolla itself at https://github.com/shakenfist/kerbside-patches/blob/develop/kolla/ORDER. I'm sorry that's not the most helpful way to10:38
mikalexpress them but its what I have right now.10:38
sean-k-mooneynoonedeadpunk: it is. its using an out of tree proxy that mikal is developing called kerbside that actually allows a raw spice connection from a client to the vm 10:39
mikalnoonedeadpunk: HTML5 and spice-direct can coexist I think, as long as you don't turn on the new require_secure option. The HTML5 client doesn't support TLS.10:39
sean-k-mooneythe proxy makes that safe10:39
sean-k-mooneyat least in theory10:39
mikalnoonedeadpunk: the proxy is at https://github.com/shakenfist/kerbside for what its worth. Like I said, the proxy itself is still a bit of a work in progress.10:39
noonedeadpunkmikal: ok, so proxy needs to be started on control side I assume?10:40
noonedeadpunkand then nova should be configured to point to it?10:41
mikalnoonedeadpunk: yes. The proxy sits running all the time and needs to be network accessible to your users. Nova is configured with the public URL to the proxy, and generates console access URLs using that URL template when a user creates a spice-direct console and then requests access to it. The proxy uses the console auth token to turn that back10:42
mikalinto hypervisor / port and then proxies the SPICE traffic between the client and the hypervisor.10:42
mikalnoonedeadpunk: https://github.com/shakenfist/kerbside-patches/blob/develop/tools/test-console#L88 is a worked example of that with a terrible fake client.10:42
noonedeadpunkso in theory, L4 balancing can be added on top for public access?10:43
mikalnoonedeadpunk: yes. I expect the proxy would be clustered and it would be fine with traffic hitting multiple proxies. I suspect you'd want to try and keep a single client limited to a single backend for failure domain reasons, but apart from that it should just work.10:43
noonedeadpunkok, I should try making some patches, looks quite doable so far10:44
mikalnoonedeadpunk: state is in a central mysql database much like nova does.10:44
mikalnoonedeadpunk: by far my biggest concern is if the proxy is performant enough, but that's why that's the bit I am focused on right now.10:44
mikalsean-k-mooney should still merge my patches though ;)10:44
mikal(oVirt does something similar to all this, but they use squid as the proxy which means the client needs to know which hypervisor and port to connect to because its basically just doing SSL CONNECTs under the hood).10:45
noonedeadpunkmikal: um... is the config respecting only ENV vars?10:47
mikalnoonedeadpunk: so kerbside itself isn't very "openstacky". Its written more like other personal projects of mine. It expects its configuration in environment variables, which can be in a systemd envvar file if that is less messy. Kolla-Ansible likes these start scripts, which is why it is done like that for those patches. It wouldn't be too hard to10:48
mikalsupport a more traditional config file as well, its all just pydantic under the hood -- https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py10:48
noonedeadpunksystemd envvar works as well, ofc. was just asking if I'm not missing more "conventional" config :)10:49
noonedeadpunkAnd then the service itself launches gunicorn? meaning https://github.com/shakenfist/kerbside/blob/053d068b0cc5b25a6263fe39a3548ca985b972ee/kerbside/api.py#L555-L56310:50
mikalnoonedeadpunk: you didn't miss it. Pydantic can do YAML etc as well apparently https://docs.pydantic.dev/latest/concepts/pydantic_settings/#other-settings-source10:50
mikalnoonedeadpunk: correct.10:50
noonedeadpunkso, like, it does not make sense to try to launch it with uWSGI or smth like that?10:50
noonedeadpunkas if I can run service as wsgi - I'd rather do that...10:51
mikalnoonedeadpunk: maybe? The gunicorn bit is only for the REST API and these Nova generated access URLs, so its not particularly high traffic or anything. I am not opposed to supporting other runtimes, it just hasn't been a priority for me yet.10:51
mikalnoonedeadpunk: like I said my current focus is working through some sort of performance testing, because I'd say there's a 50% chance kerbside is going to get a rewrite in golang or C or something.10:52
noonedeadpunkright.10:52
noonedeadpunknot sure if golang will help a lot, but yeah10:52
mikalSo it works right now, but I am unclear if it is performant enough. The proxy is basically a proof of concept right now.10:52
noonedeadpunkFrom my perspective separation of uwsgi API and worker itself would be really nice to have. Probably because I got too used to how openstack is written :)10:56
mikalOk, I'll make a note to circle back to that once I have a stronger view of if a rewrite is required or not.10:57
noonedeadpunkAs for instance we have a quite established way of running wsgi apps, as well as wiring around TLS generation and deployment for it10:58
mikalBasically the focus so far has been on getting a proxy to work at all, given the poor state of SPICE protocol documentation, and then on landing the supporting nova patches. I am only just now circling back to productionizing the proxy.10:58
noonedeadpunkyeah, right10:58
mikalI don't know much about uwsgi. If you have best practise recommendations I'd be quite interested in those.10:59
noonedeadpunkit's not that I'm blaming or anything, just commenting :)10:59
mikalYeah it's all good. I'm mildly excited someone is considering deploying this at all. It's sometimes lonely over here.10:59
noonedeadpunkwe have a uwsgi role we're actively using, which produces a config file like this one: https://opendev.org/openstack/ansible-role-uwsgi/src/branch/master/templates/uwsgi.ini.j211:00
noonedeadpunkI totally see how useful this all might be in light of VMware migrations11:00
mikalYeah, I'm basically aiming at Citrix-like functionality here. It's just going to take a little while to get there.11:01
sean-k-mooneynoonedeadpunk: i don't even think you need to get particularly fancy with the load balancing, just have haproxy or whatever use the source ip/mac to affinitise the connection to one of the active backends11:01
noonedeadpunkbut also this allows to secure communication with API side with TLS, as I think it's not covered now11:01
sean-k-mooneynoonedeadpunk: basically like you would for the novnc or html spice console11:01
noonedeadpunksean-k-mooney: exactly what I was thinking about, yes11:01
noonedeadpunkjust not L7 but L4 I guess11:01
mikalnoonedeadpunk: Kerbside API TLS is done with a frontend like apache2, not natively. SPICE protocol TLS is done natively within Kerbside.11:01
sean-k-mooneyya so use the source ip and tcp port to balance11:02
sean-k-mooneyif it fails over they will have to reconnect11:02
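The source-based balancing suggested here can be sketched roughly as follows. This is a minimal illustration in Python, not how haproxy or Kerbside implement it: the function name pick_backend, the backend addresses, and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib


def pick_backend(src_ip, src_port, backends):
    """Map a client (source IP, source port) pair to one active backend.

    Hypothetical helper: the same source always hashes to the same
    index as long as the list of active backends does not change.
    """
    if not backends:
        raise ValueError("no active backends")
    key = f"{src_ip}:{src_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    # Example addresses only; they do not come from the conversation.
    active = ["10.0.0.1:5900", "10.0.0.2:5900", "10.0.0.3:5900"]
    print(pick_backend("192.0.2.10", 40000, active))
```

Because the index depends on the length of the backend list, removing a failed backend can move existing clients to a different backend, which is consistent with the point that clients would have to reconnect after a failover.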
noonedeadpunkright, but how you communicate between apache2 and Kerbside API11:02
sean-k-mooneybut i don't see a way around that without clustering in kerbside11:02
noonedeadpunkas it's listening on 0.0.0.0 explicitly11:02
mikalIf you squint at it, Kerbside is sort of a layer 7 load balancer. It's the thing which understands the protocol well enough to direct traffic to the right backend.11:02
noonedeadpunksean-k-mooney: I think it's totally fine tbh and more than reasonable11:02
mikalnoonedeadpunk: that's just the default in the config, you can override it.11:03
noonedeadpunkmikal: um, can you? https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py#L5611:03
noonedeadpunkand if that is a bare metal host and not docker container?11:03
mikalnoonedeadpunk: yeah, that whole template can be overridden by exporting KERBSIDE_API_COMMAND_LINE in the starting environment.11:04
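The default-plus-environment-override pattern described here can be sketched with the standard library alone. This is not Kerbside's code (which uses pydantic); the Settings fields, their defaults, and load_settings are hypothetical, and only the KERBSIDE_ prefix is taken from the variable name mentioned above.

```python
import os
from dataclasses import dataclass, fields

# Prefix borrowed from KERBSIDE_API_COMMAND_LINE; everything else is assumed.
PREFIX = "KERBSIDE_"


@dataclass
class Settings:
    # Hypothetical settings, not Kerbside's real field names or defaults.
    api_address: str = "0.0.0.0"
    api_port: int = 8080


def load_settings(environ=None):
    """Build Settings from defaults, overridden by PREFIX + FIELD_NAME."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for field in fields(Settings):
        raw = environ.get(PREFIX + field.name.upper())
        if raw is not None:
            # Convert the string from the environment to the default's type.
            overrides[field.name] = type(field.default)(raw)
    return Settings(**overrides)


if __name__ == "__main__":
    print(load_settings())
    print(load_settings({"KERBSIDE_API_ADDRESS": "127.0.0.1"}))
```

With nothing exported the defaults apply; exporting a prefixed variable in the starting environment replaces only that one value.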
noonedeadpunkah11:04
noonedeadpunkok11:04
sean-k-mooneyits just a wsgi app by the looks of it11:04
noonedeadpunkso I can also set `echo ''` to it kinda?11:04
noonedeadpunkit is11:04
noonedeadpunkjust started by the main service11:04
mikalnoonedeadpunk: I don't follow the echo comment?11:04
sean-k-mooneymikal: are you using https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py#L54 to generate a script to run kerbside11:05
noonedeadpunkso if I want to run wsgi separately, I should be able to replace KERBSIDE_API_COMMAND_LINE with a noop command quite successfully I assume11:05
sean-k-mooneyas in a console script11:05
sean-k-mooneyor are you using that to spawn a separate instance of kerbside per connection11:05
mikalnoonedeadpunk: secretly, this is just a cut down (and a bit older) version of https://github.com/shakenfist/shakenfist/blob/develop/shakenfist/config.py which is the project that code was lifted from.11:05
mikalOh interesting.11:06
mikalLike I said, I don't know much about uwsgi. If flask can work in that mode then I think that would work?11:06
sean-k-mooneynoonedeadpunk: https://github.com/shakenfist/kerbside/blob/develop/kerbside/api.py#L566-L56811:07
mikalThe protocol native proxy is a separate socket server.11:07
noonedeadpunkusually gunicorn, uwsgi and mod_wsgi are interchangeable11:07
sean-k-mooneymikal: so your api console script currently spawns a wsgi server from api.py11:07
sean-k-mooneybut normally you would want to spin up the wsgi application directly with a webserver and not spawn it with Popen like that if it can be shared between connections11:08
noonedeadpunkand then if your intention is to have apache2 in front for TLS, using mod_wsgi might be even wiser11:08
sean-k-mooneyif this is one process per api request however that is different11:08
mikalThe Shaken Fist Foundation [1] welcomes your feedback and potential contributions.11:08
mikal1: Said foundation does not exist.11:08
sean-k-mooneymikal: without deep diving i think noonedeadpunk could ignore your main in api.py11:09
mikalI think what I am getting from this conversation is that I should go read more about uwsgi, which I pinky swear to do.11:10
sean-k-mooneyand just spin up the wsgi app directly and as they said use apache in front to handle tls11:10
sean-k-mooneywsgi and uwsgi are also two different things11:10
sean-k-mooneyuwsgi, gunicorn, and mod_wsgi are 3 examples of wsgi application servers11:11
sean-k-mooneyas in they host a wsgi application11:11
sean-k-mooneywsgi itself is a protocol11:11
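To make the distinction concrete: a WSGI application is just a callable with the signature defined in PEP 3333, and servers such as uwsgi, gunicorn, or mod_wsgi host that callable. The sketch below uses only the Python standard library; application and call_app are illustrative names, and this is not the Kerbside API.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI application: any WSGI server can host this callable."""
    body = b"hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app directly, playing the role a server would play."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(application))
```

A server is then pointed at the callable rather than the other way round, for example gunicorn mymodule:application, where mymodule is a hypothetical module name.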
noonedeadpunkyeah, so usually you just don't need to spawn gunicorn as part of the service, but do that separately11:12
noonedeadpunkand leave the user to choose the wsgi server11:12
mikalOk, I will put reading about all this on my todo list for tomorrow.11:12
noonedeadpunkas some might want to run granian or smth like that even11:12
mikalAlthough, writing load testing tooling sounds like more fun ;)11:13
sean-k-mooneyso mikal what is actually happening when you create that subprocess is you're starting a wsgi server that is importing https://github.com/shakenfist/kerbside/blob/develop/kerbside/api.py#L39-L4211:13
sean-k-mooneymikal: then that application host is binding to the port in the template and handling the requests out of band of the kerbside command you use to run it11:14
sean-k-mooneymikal: by the way, the way you wrote it looks like you wanted it to work how nova-api worked using the eventlet webserver while also not using eventlet :)11:15
sean-k-mooneyi.e. you wanted to provide a simple way to run it from the cli via the kerbside command generated by this entry point https://github.com/shakenfist/kerbside/blob/develop/setup.cfg#L2411:16
mikalsean-k-mooney: Oh I was definitely thinking about nova-api when I wrote it, but it was also over four years ago when I wrote that which means it's all long forgotten now.11:16
sean-k-mooneymikal: kolla recently moved from apache with mod_wsgi to uwsgi11:17
sean-k-mooneyyou should be able to skip the command without any code changes and just run that flask app directly by providing the module path like you did to gunicorn11:18
mikalsean-k-mooney: that's probably good because this is a topic I am entirely not passionate about. I just want to do something people think doesn't suck and then move onto more interesting things.11:18
sean-k-mooneythe main benefit of that approach is you don't need to do the process management in your cli11:19
mikalLike I want it to be performant and not make people angry, but it is not my calling.11:19
sean-k-mooneyyou can leave it up to docker/systemd to do that for you11:19
mikalOk well I will play with this tomorrow. I am being assured it is past my bed time now. Feel free to file a github issue if you're keen, otherwise I'll just work through a fix and smash it in.11:20
sean-k-mooneywell none of this is a blocker on the nova side. but for scalability and security reasons it would be good to do when doing a production install11:22
sean-k-mooneyso it's something that noonedeadpunk and the kolla folks will care about11:22
sean-k-mooneybut it's not like it needs a big code change at first glance11:23
noonedeadpunkI can try to submit a report later today11:25
noonedeadpunkI need to deal with failing migrations right now :(11:25
mikalThanks guys.11:31
opendevreviewKamil Sambor proposed openstack/nova master: Replace eventlet.event.Event with threading.Event  https://review.opendev.org/c/openstack/nova/+/94975411:43
opendevreviewMerged openstack/nova stable/2025.1: Amend functional reproducer for bug 1899835  https://review.opendev.org/c/openstack/nova/+/95021312:39
opendevreviewDan Smith proposed openstack/nova master: pci: Add more detail and examples to pci.alias docs  https://review.opendev.org/c/openstack/nova/+/95065913:52
dansmithgibi: Uggla, per yesterday, some refinement ^13:52
opendevreviewDan Smith proposed openstack/nova master: pci: Add more detail and examples to pci.alias docs  https://review.opendev.org/c/openstack/nova/+/95065913:53
opendevreviewyaguang proposed openstack/nova stable/2024.2: Handle iso+gpt detections  https://review.opendev.org/c/openstack/nova/+/94394014:08
Uggladansmith, +1 for the new pci.alias example.14:14
opendevreviewMerged openstack/nova stable/2025.1: live migration: Avoid volume rollback mismatches  https://review.opendev.org/c/openstack/nova/+/95021414:33
opendevreviewmelanie witt proposed openstack/nova-specs master: Re-propose vTPM live migration  https://review.opendev.org/c/openstack/nova-specs/+/94754215:23
melwittgibi: fixed the typo you found ^15:23
melwittdansmith, sean-k-mooney: the re-proposal for vtpm live migration spec is ready for review if you may be interested. I have noted the changes from the last approval in the commit message15:26
opendevreviewFabian Wiesel proposed openstack/nova master: libvirt: Expose number of migration threads  https://review.opendev.org/c/openstack/nova/+/95066715:34
gibimelwitt: thanks15:55
gibiI added my +2 back15:55
dansmithmelwitt: ack, I'm several layers deep in the stack right now16:03
opendevreviewMerged openstack/nova stable/2024.2: Amend functional reproducer for bug 1899835  https://review.opendev.org/c/openstack/nova/+/95021516:54
opendevreviewMerged openstack/nova stable/2024.2: live migration: Avoid volume rollback mismatches  https://review.opendev.org/c/openstack/nova/+/95021616:58
melwittdansmith: ack, it is not urgent but it is there when you are ready18:09