Thursday, 2025-05-22

mikal	Potentially silly question(s), but is there a git plugin or precommit or whatever that will just make me magically DCO compliant with minimal effort? I am assuming that DCO implies GPG signed commits, although I cannot find a web page which clearly states that -- it might just be some text in the commit message which alleges to come from me? If I	01:10
mikal	didn't in fact hallucinate the GPG signature requirement, are there minimum requirements for the GPG key used to sign the commit? My GPG key is relatively well connected, but also ancient.	01:10
mikal	Like... it would be cool if "git review" added missing DCO entries like it does missing change ids.	01:11
opendevreview	Michael Still proposed openstack/nova master: libvirt: Add extra spec for sound device. https://review.opendev.org/c/openstack/nova/+/926126	01:11
opendevreview	Michael Still proposed openstack/nova master: Protect older compute managers from sound model requests. https://review.opendev.org/c/openstack/nova/+/940770	01:11
opendevreview	Michael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection. https://review.opendev.org/c/openstack/nova/+/927354	01:11
opendevreview	sean mooney proposed openstack/nova master: move compile earlier https://review.opendev.org/c/openstack/nova/+/950516	04:57
opendevreview	sean mooney proposed openstack/nova master: Add mtty/mdpy support for testing fake mdevs https://review.opendev.org/c/openstack/nova/+/898100	04:57
opendevreview	sean mooney proposed openstack/nova master: WIP : Add mtty support to nova-next https://review.opendev.org/c/openstack/nova/+/922140	04:57
sean-k-mooney	mikal: "git commit -s" should be all you need to do but im not sure about an automatic way yet. most seem to recommend a git alias like "git config --global alias.cs 'commit --signoff'" but `git commit -s` vs `git cs` does not help much	05:03
sean-k-mooney	mikal: you do not need to sign commits with DCO	05:04
sean-k-mooney	you can, but git now supports using ssh keys instead of GPG keys for signing if you want to do that	05:04
mikal	Yeah, that bit confuses me. What's to stop a forgery of a signoff if its just a text string in the commit message?	05:04
sean-k-mooney	https://dev.to/janderssonse/git-signoff-and-signing-like-a-champ-41f3	05:04
mikal	Ahhh, so signoff != signing.	05:05
sean-k-mooney	mikal: well gerrit will prevent forging the author but the commit message can be updated	05:05
sean-k-mooney	signing commits wont help because we dont use merge commits	05:05
sean-k-mooney	so your commit will be rebased/cherry picked anyway when its merged by zuul	05:06
mikal	Well, at least this means I can still stick with my 25 year old GPG key!	05:06
mikal	Its an antique and needs to be cherished.	05:06
sean-k-mooney	so i have gpg signing set up, or at least i did before i reinstalled my laptop....	05:07
mikal	I GPG sign tags on some personal projects, although I am unclear on how meaningful that is with such an old key.	05:07
sean-k-mooney	but i tend not to use it because i had issues with getting it to prompt both locally and if i ssh into my work laptop	05:07
sean-k-mooney	i.e. if i sshed in it would try to pop up an unlock prompt in a window on my work laptop instead of in the ssh terminal	05:08
sean-k-mooney	i think i eventually got it to work by always forcing the gpg agent to use the tty for prompting instead of the shell integration but i remember that being a bit of a pain	05:08
sean-k-mooney	mikal: i mean if that key has never been leaked	05:09
mikal	Yeah fair enough. I feel like GPG never really took off because its a huge pain in the ass.	05:09
sean-k-mooney	its as valid as mine	05:09
sean-k-mooney	ya it kind of used to be. i finally set it up because i wanted to be able to sign tags for networking-ovs-dpdk and my emails	05:10
sean-k-mooney	but signing emails with gpg keys causes all sorts of issues at times	05:10
mikal	Dr Google assures me that a 1024 bit DSA key is crackable by motivated players. So that's nice.	05:10
sean-k-mooney	mikal: by the way git-review is not the thing that adds the change id for you	05:13
sean-k-mooney	mikal: we have configured our gerrit to serve a commit hook which git review will download and install for you	05:14
sean-k-mooney	im not sure if we can do something similar to reject or amend your commit automatically for you in a sane way	05:14
mikal	Ahhh ok. My lived experience is "git review" saying "you're missing thingies" and adding them, but I've never looked at the mechanism all that hard.	05:14
sean-k-mooney	ah https://gerrit.cloudera.org/Documentation/project-configuration.html#_require_signed_off_by	05:15
sean-k-mooney	so ya we can just turn on the enforcement in gerrit	05:15
sean-k-mooney	but im not sure about the auto fixing. i suspect we would need to dig deeper	05:16
sean-k-mooney	im sure git can support this in a hook but we might need to write it	05:16
mikal	The commit message hook appears to already understand "Signed-off-by trailers" based on a cursory reading?	05:18
mikal	It uses https://git-scm.com/docs/git-interpret-trailers to do... things.	05:19
mikal	Ahhh, ok. I think its trying quite hard to have the Change-Id before the signed off by trailers. Dr Google does suggest that if you turn on DCO enforcement on the project in gerrit then some amount of magic will happen.	05:22
opendevreview	Abhishek Kekane proposed openstack/nova master: [DNM] Revert^2 "Support glance's new location API" https://review.opendev.org/c/openstack/nova/+/950623	05:27
sean-k-mooney	mikal: regarding pre-commit it does support writing "commit-msg" hooks which are passed a single parameter, the path to the file	05:27
sean-k-mooney	mikal: so we could write one to either abort the commit if Signed-off-by was not there or to add it for you before the commit was actually finalised	05:28
mikal	As a forgetful person I quite like the idea of option two there.	05:29
mikal	It is after all what we already do for change ids.	05:29
mikal	And my global git config already knows my name and email address, which it needs to know for the log.	05:29
sean-k-mooney	yep	05:29
sean-k-mooney	its probably a good weekend hacking project which is also where we should put it	05:30
sean-k-mooney	in the hacking repo so we can share it across other repos	05:30
mikal	It would also make the transition largely transparent... People do a git review, land a new version of the hook automatically, and blam they're compliant. Much more fun than herding cats.	05:30
sean-k-mooney	we can add a new hook there and then enable it in the relevant repos https://github.com/openstack/hacking/blob/master/.pre-commit-hooks.yaml	05:31
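A minimal sketch of the "option two" hook discussed above: a commit-msg hook that appends a missing Signed-off-by trailer, using the name and email from git config. This is an illustration only, not the hook Gerrit serves and not anything that exists in the hacking repo. The function name `ensure_signoff` and the trailer regex are assumptions made for the example, and it does not handle `#` comment lines or the other edge cases that `git interpret-trailers` covers.

```python
#!/usr/bin/env python3
"""Sketch of a commit-msg hook that adds a missing Signed-off-by trailer."""
import re
import subprocess
import sys

# Rough approximation of a "Token: value" trailer line (assumption for this sketch).
TRAILER_RE = re.compile(r"^[A-Za-z0-9-]+: \S")


def ensure_signoff(message: str, name: str, email: str) -> str:
    """Return message with a Signed-off-by trailer for name/email, adding it if absent."""
    signoff = f"Signed-off-by: {name} <{email}>"
    lines = message.rstrip("\n").split("\n")
    if signoff in lines:
        return message  # already signed off, leave the message untouched

    # Collect the last paragraph (everything after the final blank line).
    last_paragraph = []
    for line in reversed(lines):
        if not line.strip():
            break
        last_paragraph.append(line)

    # Extend an existing trailer block (e.g. after Change-Id); otherwise start
    # a new paragraph. The subject line alone never counts as a trailer block.
    in_trailers = (
        bool(last_paragraph)
        and len(last_paragraph) < len(lines)
        and all(TRAILER_RE.match(line) for line in last_paragraph)
    )
    separator = "\n" if in_trailers else "\n\n"
    return "\n".join(lines) + separator + signoff + "\n"


def git_config(key: str) -> str:
    """Read a value such as user.name from the effective git config."""
    result = subprocess.run(
        ["git", "config", "--get", key],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


# git invokes commit-msg hooks with one argument: the path to the message file.
if __name__ == "__main__" and len(sys.argv) == 2:
    path = sys.argv[1]
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    updated = ensure_signoff(
        original, git_config("user.name"), git_config("user.email"))
    if updated != original:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(updated)
```

Because this sketch appends after any existing trailers, a Change-Id that is already present stays ahead of the Signed-off-by line, which matches the ordering the Gerrit hook appears to aim for.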
mikal	Although... Other projects also use signed-off-by to track who reviewed and approved a change. Is there any intent to do that as well?	05:31
sean-k-mooney	mikal: thats actually a misuse of it	05:34
sean-k-mooney	mikal: there are other standard labels for that	05:34
sean-k-mooney	mikal: so no not currently however you can actually configure gerrit to auto amend the commit with that info to some degree	05:35
sean-k-mooney	we had that enabled downstream until they moved us to gitlab.....	05:35
* sean-k-mooney such a downgrade :(		05:35
mikal	I kind of like the history being all in one place to be honest.	05:35
mikal	So yeah, review history in the commit would be cool. I don't really care what labels are used to be honest. I do agree the kernel seems to use "Signed-off-by", "Tested-by", "Reviewed-by", etc.	05:36
sean-k-mooney	youre meant to use Reviewed-by, Acked-by and Tested-by instead of Signed-off-by to track that	05:37
sean-k-mooney	so you could update the commit with the review votes and add Acked-by for the person that +w'd and Tested-by for zuul (the verified label votes)	05:38
sean-k-mooney	however to do that gerrit would have to be set to cherry pick mode which will always cherry pick the commit on to the target branch, changing its git sha	05:39
sean-k-mooney	which will mean you will not be able to know the sha to use for backporting until after it has merged because it will always change	05:39
sean-k-mooney	so to do that we lose our current "cherry picked from" workflow/lines when backporting	05:40
sean-k-mooney	mikal: the conventions for the by lines all come from the linux kernel and are part of their patch submission process https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes	05:42
sean-k-mooney	bauzas: just so youre aware i rebased your mtty_support series and moved my patch to the bottom.	07:07
sean-k-mooney	bauzas: with it i was able to fix the error you were hitting but there is a bug in your first patch https://review.opendev.org/c/openstack/nova/+/898100/11	07:08
sean-k-mooney	the error i got with my fix was libvirt.libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/6dc8d190-2bc9-45ed-a4b8-d4f0708f92bb is in use by driver QEMU, domain instance-00000001	07:08
sean-k-mooney	so somehow nova tried to reuse the same mdev between two vms. i assume that is a bug in https://review.opendev.org/c/openstack/nova/+/898100/11 that only affects virtual mdev devices. if not that would be a real bug in our current mdev support	07:09
sean-k-mooney	i suspect it has something to do with how youre tracking the virtual ones given they do not have a pci address but im not sure	07:10
mikal	sean-k-mooney: when you have a moment, I feel like the state of the VDI patches has fallen out of my brain. Are you now happy enough with the first two patches in the series that I could squash them into one and then do the compute protection change to the other set of extra specs?	07:27
sean-k-mooney	i share your oxidation. ill need to quickly load context	07:29
sean-k-mooney	you asking about squashing https://review.opendev.org/c/openstack/nova/+/940770/9 and the patch before it	07:30
sean-k-mooney	or all 3	07:30
sean-k-mooney	mikal: i know dan prefers if we separate the object change from the driver change	07:31
sean-k-mooney	so i think their overall preference would be to have the first patch contain the object and notification object change. then move the rest of the changes in that to the "Protect older compute managers from sound model requests." change	07:32
mikal	So at the moment the chain is: sound device extra spec -> compute manager protection for sound -> usb controller extra spec -> compute manager protection for USB (this last one not yet done)	07:32
mikal	I had thought you'd want the first two squashed so you get the extra spec and the compute protection at the same time.	07:32
sean-k-mooney	yes i want that but we generally prefer to have the object changes in their own patch	07:33
sean-k-mooney	so it would still be 4 patches	07:33
mikal	I think you're saying for those first two you'd instead prefer to see the first two squashed into one, which becomes the second patch, with the object and notification changes sitting before it?	07:33
mikal	So: sound objs and notifications -> rest of sound -> usb objs and notifications -> rest of usb?	07:33
sean-k-mooney	"add object changes for sound" -> "support sound device" -> "add object changes for usb" -> "support usb"	07:34
sean-k-mooney	yes	07:34
mikal	Cool. I can do that. Are you happy with the content of the first two patches, just not the way they're presented? Or are there other changes that need to be made too?	07:34
sean-k-mooney	mikal: in the unlikely case we need to revert this its easy to revert the driver change but we cant really revert the object change	07:35
sean-k-mooney	so having it be separate just makes that edge case or backport a little simpler	07:35
mikal	sean-k-mooney: that's fair. I'll do that refactor ASAPish if you're otherwise happy with the code.	07:35
sean-k-mooney	ill need to look at the detail but i think i can review this series today	07:36
sean-k-mooney	i dont think there was anything major last time i looked	07:36
sean-k-mooney	so i dont really expect there to be anything that needs to be changed	07:37
mikal	Yeah, certainly I can't see anything major in the comments unless I've missed something.	07:37
mikal	Ok cool. I will do the refactor then and implement the compute manager protection for USB as well.	07:37
noonedeadpunk	Hey folks! Got a question. Today realized that quite an unexpected behaviour is happening with live migrations. So on memory-intense VMs migration seems to fail on post-migration step.	09:23
noonedeadpunk	The weird/unexpected thing is that as a result the VM stays shut down on both hosts and it moves to ERROR state	09:23
noonedeadpunk	so decided to ask if this is smth quite well known?	09:24
noonedeadpunk	As right now I think it's related to `live_migration_permit_auto_converge = true`	09:24
noonedeadpunk	as issue happens only on memory intense VMs. So if you run stress inside VM - it will 99% end up in ERROR and shut down. If there's some reasonable application running - it's fine	09:25
noonedeadpunk	So my guess is that if live_migration_permit_auto_converge kicks in, libvirt is simply incapable of resuming the VM on the destination, but I'm not really sure if that is true	09:26
noonedeadpunk	Thus before going down this rabbit hole I wanted to ask if you might know anything about that :)	09:26
frickler	noonedeadpunk: I assume you checked the nova-compute logs without success? how about the libvirt instance logs? but maybe wait for sean-k-mooney to chime in first	09:30
sean-k-mooney	sorry was looking at something else reading back	09:31
sean-k-mooney	noonedeadpunk: do you have post_copy migration enabled?	09:32
sean-k-mooney	auto converge just adds cpu pauses so that is very unlikely to be a factor	09:32
sean-k-mooney	noonedeadpunk: for what its worth if there is high ram load its basically impossible to migrate without post-copy migration. auto converge is not enough to support that usecase	09:34
sean-k-mooney	it just slightly changes where the live migration will fall over	09:34
sean-k-mooney	auto converge is still a type of "pre-copy" migration where qemu copies all the memory first, pauses the vm on the source, copies the delta and resumes it on the dest	09:36
noonedeadpunk	no, post_copy is disabled	09:36
noonedeadpunk	only converge is	09:36
sean-k-mooney	the difference between live_migration_permit_auto_converge = true and false is that if its set to true, the qemu on the source is allowed to micro pause the source vm cpus before switching to the dest	09:36
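A toy model of why pre-copy struggles under heavy memory load, for illustration only. It is not QEMU's actual algorithm: each pass re-sends whatever was dirtied during the previous pass, and the switchover happens once the remaining data could be sent within an allowed downtime. The function name, parameters, and default values are all assumptions made for the sketch.

```python
from typing import Optional


def precopy_rounds(ram_mib: float, bandwidth_mib_s: float, dirty_mib_s: float,
                   max_downtime_s: float = 0.5,
                   max_rounds: int = 30) -> Optional[int]:
    """Return how many copy passes run before switchover, or None if it never converges."""
    remaining = float(ram_mib)
    for rounds in range(max_rounds + 1):
        transfer_time = remaining / bandwidth_mib_s
        if transfer_time <= max_downtime_s:
            # Small enough to pause the guest and send the final delta.
            return rounds
        # While this pass was copying, the guest dirtied more memory
        # (it can never dirty more than all of its RAM).
        remaining = min(float(ram_mib), dirty_mib_s * transfer_time)
    return None


if __name__ == "__main__":
    # Light load: dirty rate well below bandwidth, converges quickly.
    print(precopy_rounds(8192, 1024, 100))
    # Heavy load (e.g. stress in the guest): dirty rate above bandwidth.
    print(precopy_rounds(8192, 1024, 2048))
```

In this model auto converge would correspond to lowering `dirty_mib_s` by throttling the guest CPUs, which helps only if it can push the dirty rate below the available bandwidth.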
noonedeadpunk	yes, right, and that used to work nicely	09:37
sean-k-mooney	in your case if youre getting to post_live_migration that means libvirt told nova the migration was complete	09:37
noonedeadpunk	and I think regardless that should not end up with VM just being down on both sides?	09:37
noonedeadpunk	just a sec, I'll re-paste logs	09:37
sean-k-mooney	at which point the vm on the source is not needed any more and should be terminated by libvirt	09:38
sean-k-mooney	as you cant roll back at that point	09:38
noonedeadpunk	it's from source https://paste.openstack.org/show/bSc55xr9LSgCXiaj1uo1/	09:38
noonedeadpunk	and that's destination: https://paste.openstack.org/show/boBhuqrR8mYDoqCjYxex/	09:38
noonedeadpunk	the only very suspicious thing I've spotted, is that I do also see that libvirt reset communication to qemu	09:39
sean-k-mooney	i know you said this but the vm is actively running, right? its not paused when you live migrate it, right?	09:40
noonedeadpunk	so it feels that during unpause things crash unobviously	09:40
noonedeadpunk	it's running, yes	09:40
noonedeadpunk	I mean paused on destination	09:40
sean-k-mooney	the reason im asking is this is sort of similar to https://issues.redhat.com/browse/RHEL-48801 but thats about live migrating a paused instance twice	09:41
sean-k-mooney	qemu was not properly handling something in the block storage layer in that case	09:41
noonedeadpunk	there's also no block migration as storage is ceph	09:42
sean-k-mooney	ack thats also good to know	09:42
noonedeadpunk	but yeah, trying out post_copy was my next step to try	09:43
noonedeadpunk	Was just a bit afraid of it tbh	09:43
sean-k-mooney	noonedeadpunk: can you check the qemu instance logs and see if there is an error on the source or dest	09:43
sean-k-mooney	for the block device one we saw a message like this	09:43
sean-k-mooney	qemu-kvm: ../block.c:6979: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.	09:43
sean-k-mooney	2024-07-16 13:29:55.795+0000: shutting down, reason=crashed	09:43
sean-k-mooney	noonedeadpunk: post-copy is more robust to actively loaded vms and is effectively required if youre using 1G hugepages	09:44
sean-k-mooney	you can kind of get vms with 2M hugepages to migrate without it but 1G hugepages guests are very hard to make function	09:45
noonedeadpunk	I did not configure hugepages explicitly tbh	09:45
sean-k-mooney	it would be worth trying but i suspect youre hitting a qemu bug and post copy might help you work around that but im not sure	09:45
sean-k-mooney	noonedeadpunk: if you didnt then its not a factor	09:45
sean-k-mooney	the kernel might be using transparent hugepages	09:45
sean-k-mooney	but if thats the case it wont impact the migration	09:46
noonedeadpunk	yeah, let me try to reproduce one more time as original VM was dropped apparently	09:46
sean-k-mooney	the issue with explicit hugepages is if you dirty the page qemu has to copy the entire page again	09:46
sean-k-mooney	meaning a 1 byte write requires 1G of data to be copied again	09:46
* sean-k-mooney yes all memory writes are going to be a cacheline so a 1 byte write is really a 64 byte write...		09:47
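The arithmetic behind that point, as a small sketch. The page sizes and the 64 byte cacheline come from the discussion above; the helper names are made up for the example, and the model assumes the worst case where every small write lands on a different page.

```python
CACHELINE = 64  # bytes the CPU really writes for a tiny store

PAGE_SIZES = {
    "4K": 4 * 1024,
    "2M": 2 * 1024**2,
    "1G": 1024**3,
}


def recopy_bytes(dirty_pages: int, page_size: int) -> int:
    """Bytes that must be sent again when dirty_pages distinct pages were touched."""
    return dirty_pages * page_size


def amplification(page_size: int, written: int = CACHELINE) -> int:
    """How many bytes get re-copied per byte actually written."""
    return page_size // written


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(f"{name}: one cacheline write re-copies {recopy_bytes(1, size)} bytes "
              f"({amplification(size)}x amplification)")
```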
noonedeadpunk	so what would be suggested value? 1G?	09:47
sean-k-mooney	sorry can you say that again	09:48
sean-k-mooney	are you asking what i recommend if you choose to use hugepages?	09:48
noonedeadpunk	Yeah :) I'd need to read about them more, but maybe you have a suggestion from your experience	09:48
sean-k-mooney	if so i recommend 2MB hugepages for vms and if you happen to use ovs-dpdk or have some other use for hugepages in the system then use 1G hugepages for those but not for the nova vms	09:49
noonedeadpunk	ok, right	09:49
noonedeadpunk	I don't have anything specific	09:49
opendevreview	Markus Hentsch proposed openstack/nova master: Use os-brick for volume attachment passphrases https://review.opendev.org/c/openstack/nova/+/926326	09:49
noonedeadpunk	thus - never bothered to be explicit in defining the size	09:49
sean-k-mooney	noonedeadpunk: well you also need to preallocate them on the kernel commandline or with a start up script and some other config.	09:50
sean-k-mooney	they do improve performance but they make the vm numa affined and you cant have any memory over subscription	09:50
noonedeadpunk	yeah, see that in the doc https://docs.openstack.org/nova/latest/admin/huge-pages.html	09:50
sean-k-mooney	so for high performance workloads they are great but they are not for all usecases	09:51
noonedeadpunk	we have ram allocation ratio to 1 anyway	09:51
sean-k-mooney	then you should use them :)	09:51
noonedeadpunk	numa affinity is more scary though	09:51
sean-k-mooney	do you use cpu pinning	09:51
noonedeadpunk	as I can recall some painful experience on scheduling/evacuating with numa pinning	09:51
noonedeadpunk	no, not really	09:52
sean-k-mooney	ack	09:52
noonedeadpunk	there's a specific tier with cpu pinning though	09:52
noonedeadpunk	so probably there it makes most sense to use hugepages...	09:53
noonedeadpunk	anyway	09:53
sean-k-mooney	all cpu pinned guests are numa affined and unless youre using file backed memory when you use cpu pinning you need to specify hw:mem_page_size	09:53
sean-k-mooney	you can set it to small	09:53
sean-k-mooney	so you dont need hugepages	09:53
sean-k-mooney	but if you dont set it your pinned VMs will eventually get OOM killed	09:53
sean-k-mooney	noonedeadpunk: lets park the wonderful world of numa for now	09:54
noonedeadpunk	right	09:54
noonedeadpunk	but it's super interesting as well;)	09:54
sean-k-mooney	if you replicate the issue please check if there is anything in the qemu instance logs	09:54
noonedeadpunk	thanks for enlightening me	09:54
sean-k-mooney	im guessing youre hitting a bug at that level but if not thats also valuable to know	09:54
sean-k-mooney	noonedeadpunk: by the way on the source node side we see in the logs that the migration completed and libvirt stopped the vm based on the auti message, then it proceeded to post live migration.	10:02
noonedeadpunk	yeah	10:02
noonedeadpunk	so it's kinda nowhere to revert at this point	10:02
sean-k-mooney	so if there is a failure i would expect it to be in the destination qemu instance log	10:02
noonedeadpunk	I was thinking it's smth related to neutron port attachment, but I saw that on both OVN and OVS deployments	10:03
noonedeadpunk	on Ubuntu 22.04 and 24.04	10:03
noonedeadpunk	(Caracal and Dalmatian)	10:03
sean-k-mooney	ya i guess this was 22.04 based on the fact youre running caracal on 3.10	10:03
noonedeadpunk	this one was ovs one, yeah	10:04
sean-k-mooney	and i assume OSA based on the fact youre installed in a venv but not the normal kolla location	10:04
noonedeadpunk	sure :)	10:04
sean-k-mooney	i mean thats a pretty battle tested config at this point	10:05
noonedeadpunk	"normal kolla location" is a bit hilarious, but yeah :)	10:05
sean-k-mooney	so its interesting that this is the first time someone raised this specific failure mode	10:05
noonedeadpunk	as I think system packages would use same path as kolla?	10:05
sean-k-mooney	kolla supports source or binary installs in the container	10:06
noonedeadpunk	that's why my question also was if it's smth known already so I don't have to dig through the rabbit hole	10:06
sean-k-mooney	if its source it builds the container with a venv at /var/lib/kolla something i think	10:06
noonedeadpunk	ah, ok, right	10:06
sean-k-mooney	thats how it used to work anyway. i have not contributed to kolla in a long time although my home openstack is deployed with caracal using kolla :)	10:07
sean-k-mooney	my current config is definitely not a supported one however so its perhaps not the best reference	10:08
noonedeadpunk	I don't think it has anything to do with deployment method tbh, unless we've messed up with mtls certificate generation	10:08
noonedeadpunk	or ssh cert	10:08
noonedeadpunk	but I don't think live migrations use ssh for post anyway?	10:09
sean-k-mooney	i dont think so either. i was more wondering what the rough qemu/libvirt/kernel version would be in this case	10:09
noonedeadpunk	qemu 6.2 libvirt 8.0.0 kernel 6.5.0	10:10
sean-k-mooney	noonedeadpunk: so from the libvirt point of view the migration was complete. and at this point know should not need to use ssh because the only thing its copying with images_type=rbd is like the console	10:10
sean-k-mooney	and that happened already	10:10
sean-k-mooney	s/point know/point nova/	10:11
noonedeadpunk	and for 24.04 it was qemu 8.2.2 libvirt 10.0.0 and kernel 6.8.0	10:11
sean-k-mooney	what ever happened to the vm on the dest i think is local to the dest host.	10:11
sean-k-mooney	are you migrating across distro versions or is the os the same on both ends	10:11
noonedeadpunk	it's the same on both ends. just different envs where I've seen the same thing	10:12
sean-k-mooney	going old to new is supported but if this does end up being a qemu issue then that will be an important factor if the qemu versions differ	10:12
noonedeadpunk	yeah, right, that would be completely different story	10:12
noonedeadpunk	and where 24.04 is OVN deployment as well	10:13
noonedeadpunk	as I had suspicions for port attachment tbh	10:13
sean-k-mooney	so that almost makes me think its ceph related	10:13
sean-k-mooney	separate ceph clusters?	10:14
sean-k-mooney	so regarding the ports we create the ovs port on the destination in pre-live-migration	10:14
noonedeadpunk	separate in terms of? different? yes, these are 2 completely independent clusters at different sites in different countries, so nothing is shared between them at all	10:14
sean-k-mooney	and then we activate the port binding in neutron in post live migration	10:15
sean-k-mooney	but we are not modifying the data path just switching which host is the active one	10:15
noonedeadpunk	yeah, so I saw on OVN some fishy logs regarding port attachment... anyway, let me reproduce the thing :)	10:15
sean-k-mooney	so the port binding activation should not impact the vm as that is outside its view	10:15
noonedeadpunk	ah, right...	10:15
noonedeadpunk	makes sense	10:15
sean-k-mooney	if they are two entirely different sites with two entirely different operating system versions and qemu/libvirt thats wild	10:16
sean-k-mooney	and also two different versions of openstack	10:17
noonedeadpunk	but I'm not migrating between them	10:17
noonedeadpunk	just 2 envs where I can reproduce failure quite reliably	10:17
sean-k-mooney	so either its a regression in nova (but i dont know what it could be) thats in caracal and not fixed in dalmatian or something thinks you should have a bad day	10:17
noonedeadpunk	or I messed up in setup/osa...	10:18
noonedeadpunk	sean-k-mooney: destination logs do not have anything enlightening: https://paste.openstack.org/show/bcocoH9OrjscQt4gPTOr/	10:28
sean-k-mooney	well it has	10:29
sean-k-mooney	2025-05-22 06:55:49.882+0000: initiating migration	10:29
sean-k-mooney	2025-05-22 07:18:40.025+0000: shutting down, reason=crashed	10:29
noonedeadpunk	shutting down, reason=migrated and next thing it crashed	10:29
opendevreview	Michael Still proposed openstack/nova master: libvirt: Add objects and notifications for sound model. https://review.opendev.org/c/openstack/nova/+/926126	10:29
opendevreview	Michael Still proposed openstack/nova master: Implement sound model extra spec for libvirt. https://review.opendev.org/c/openstack/nova/+/940770	10:29
opendevreview	Michael Still proposed openstack/nova master: libvirt: Add objects and notifications for USB controller model. https://review.opendev.org/c/openstack/nova/+/927354	10:29
opendevreview	Michael Still proposed openstack/nova master: Implement USB controller extra spec for libvirt. https://review.opendev.org/c/openstack/nova/+/950643	10:29
noonedeadpunk	ah, ok, I think it was first attempt on migrating	10:29
noonedeadpunk	yeah	10:29
noonedeadpunk	but it does not point to any reason....	10:30
mikal	sean-k-mooney: the fourth patch lacks the compute protection, I'll do that tomorrow. So don't go mad and merge that one too early!	10:31
* noonedeadpunk need to look at direct-spice implementation details		10:33
noonedeadpunk	*spice-direct	10:33
sean-k-mooney	noonedeadpunk: this all looks very standard. not necessarily optimised for performance but this is a very typical looking vm so youre not enabling anything that raises red flags to me	10:34
mikal	noonedeadpunk: So its sort of half done -- there are now Nova APIs to create a console of type spice-direct, with openstacksdk and python-openstackclient support. Sound and USB are still missing (that's the patches above), so if that's a thing you care about you might need to hold your breath a bit. I have patches for kolla-ansible to deploy it,	10:35
mikal	but I am sure it has bugs like no HA -- its still a work in progress. My main focus right now is working through load testing the SPICE native proxy so I can try and quantify if it needs tweaking and what scaling to recommend with it.	10:35
noonedeadpunk	mikal: I don't specifically care about spice-direct itself, but more wanted to implement that for OSA as well	10:36
sean-k-mooney	mikal: ack	10:36
mikal	noonedeadpunk: so OSA would need to know how to deploy the proxy and to twiddle some Nova config options. The existing Kolla-Ansible patches demonstrate how to do that.	10:37
noonedeadpunk	mikal: do you have links to the patches somewhere nearby?	10:37
noonedeadpunk	there's support for spice-html5 but direct looks quite different	10:38
mikal	noonedeadpunk: this is the list of patches I am carrying against Kolla-Ansible right now -- https://github.com/shakenfist/kerbside-patches/blob/develop/kolla-ansible/ORDER. There are also some minor patches against Kolla itself at https://github.com/shakenfist/kerbside-patches/blob/develop/kolla/ORDER. I'm sorry that's not the most helpful way to	10:38
mikal	express them but its what I have right now.	10:38
sean-k-mooney	noonedeadpunk: it is. its using an out of tree proxy that mikal is developing called kerbside that actually allows raw spice connections from a client to the vm	10:39
mikal	noonedeadpunk: HTML5 and spice-direct can coexist I think, as long as you don't turn on the new require_secure option. The HTML5 client doesn't support TLS.	10:39
sean-k-mooney	the proxy makes that safe	10:39
sean-k-mooney	at least in theory	10:39
mikal	noonedeadpunk: the proxy is at https://github.com/shakenfist/kerbside for what its worth. Like I said, the proxy itself is still a bit of a work in progress.	10:39
noonedeadpunk	mikal: ok, so proxy needs to be started on control side I assume?	10:40
noonedeadpunk	and then nova should be configured to point to it?	10:41
mikal	noonedeadpunk: yes. The proxy sits running all the time and needs to be network accessible to your users. Nova is configured with the public URL to the proxy, and generates console access URLs using that URL template when a user creates a spice-direct console and then requests access to it. The proxy uses the console auth token to turn that back	10:42
mikal	into hypervisor / port and then proxies the SPICE traffic between the client and the hypervisor.	10:42
mikal	noonedeadpunk: https://github.com/shakenfist/kerbside-patches/blob/develop/tools/test-console#L88 is a worked example of that with a terrible fake client.	10:42
noonedeadpunk	so in theory, L4 balancing can be added on top for public access?	10:43
mikal	noonedeadpunk: yes. I expect the proxy would be clustered and it would be fine with traffic hitting multiple proxies. I suspect you'd want to try and keep a single client limited to a single backend for failure domain reasons, but apart from that it should just work.	10:43
noonedeadpunk	ok, I should try making some patches, looks quite doable so far	10:44
mikal	noonedeadpunk: state is in a central mysql database much like nova does.	10:44
mikal	noonedeadpunk: by far my biggest concern is if the proxy is performant enough, but that's why that's the bit I am focused on right now.	10:44
mikal	sean-k-mooney should still merge my patches though ;)	10:44
mikal	(oVirt does something similar to all this, but they use squid as the proxy which means the client needs to know which hypervisor and port to connect to because its basically just doing SSL CONNECTs under the hood).	10:45
noonedeadpunk	mikal: um... is the config respecting only ENV vars?	10:47
mikal	noonedeadpunk: so kerbside itself isn't very "openstacky". Its written more like other personal projects of mine. It expects its configuration in environment variables, which can be in a systemd envvar file if that is less messy. Kolla-Ansible likes these start scripts, which is why it is done like that for those patches. It wouldn't be too hard to	10:48
mikal	support a more traditional config file as well, its all just pydantic under the hood -- https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py	10:48
noonedeadpunk	systemd envvar works as well, ofc. was just asking if I'm not missing more "conventional" config :)	10:49
noonedeadpunk	And then the service itself launches gunicorn? meaning https://github.com/shakenfist/kerbside/blob/053d068b0cc5b25a6263fe39a3548ca985b972ee/kerbside/api.py#L555-L563	10:50
mikal	noonedeadpunk: you didn't miss it. Pydantic can do YAML etc as well apparently https://docs.pydantic.dev/latest/concepts/pydantic_settings/#other-settings-source	10:50
mikal	noonedeadpunk: correct.	10:50
noonedeadpunk	so, like, it does not make sense to try to launch it with uWSGI or smth like that?	10:50
noonedeadpunk	as if I can run service as wsgi - I'd rather do that...	10:51
mikal	noonedeadpunk: maybe? The gunicorn bit is only for the REST API and these Nova generated access URLs, so it's not particularly high traffic or anything. I am not opposed to supporting other runtimes, it just hasn't been a priority for me yet.	10:51
mikal	noonedeadpunk: like I said my current focus is working through some sort of performance testing, because I'd say there's a 50% chance kerbside is going to get a rewrite in golang or C or something.	10:52
noonedeadpunk	right.	10:52
noonedeadpunk	not sure if golang will help a lot, but yeah	10:52
mikal	So it works right now, but I am unclear if it is performant enough. The proxy is basically a proof of concept right now.	10:52
noonedeadpunk	From my perspective, separating the uwsgi API from the worker itself would be really nice to have. Probably because I got too used to how openstack is written :)	10:56
mikal	Ok, I'll make a note to circle back to that once I have a stronger view of whether a rewrite is required or not.	10:57
noonedeadpunk	For instance, we have a quite established way of running wsgi apps, as well as wiring around TLS generation and deployment for it	10:58
mikal	Basically the focus so far has been on getting a proxy to work at all, given the poor state of SPICE protocol documentation, and then on landing the supporting nova patches. I am only just now circling back to productionizing the proxy.	10:58
noonedeadpunk	yeah, right	10:58
mikal	I don't know much about uwsgi. If you have best practise recommendations I'd be quite interested in those.	10:59
noonedeadpunk	it's not that I'm blaming or anything, just commenting :)	10:59
mikal	Yeah it's all good. I'm mildly excited someone is considering deploying this at all. It's sometimes lonely over here.	10:59
noonedeadpunk	we have a uwsgi role we're actively using, which produces a config file like this one: https://opendev.org/openstack/ansible-role-uwsgi/src/branch/master/templates/uwsgi.ini.j2	11:00
noonedeadpunk	I totally see how useful this all might be in light of VMware migrations	11:00
mikal	Yeah, I'm basically aiming at Citrix-like functionality here. It's just going to take a little while to get there.	11:01
sean-k-mooney	noonedeadpunk: i don't even think you need to get particularly fancy with the load balancing, just have haproxy or whatever use the source ip/mac to affinitise the connection to one of the active backends	11:01
noonedeadpunk	but this also allows securing communication with the API side with TLS, as I think it's not covered now	11:01
sean-k-mooney	noonedeadpunk: basically like you would for the novnc or html spice console	11:01
noonedeadpunk	sean-k-mooney: exactly what I was thinking about, yes	11:01
noonedeadpunk	just not L7 but L4 I guess	11:01
mikal	noonedeadpunk: Kerbside API TLS is done with a frontend like apache2, not natively. SPICE protocol TLS is done natively within Kerbside.	11:01
sean-k-mooney	ya so use the source ip and tcp port to balance	11:02
sean-k-mooney	if it fails over they will have to reconnect	11:02
noonedeadpunk	right, but how do you communicate between apache2 and the Kerbside API?	11:02
sean-k-mooney	but i don't see a way around that without clustering in kerbside	11:02
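The source-affinity idea discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the backend names are hypothetical, the hash uses the source IP alone (the conversation also mentions the TCP port), and it is not how haproxy or Kerbside implement anything. It shows why a client keeps landing on the same backend, and why a failover forces a reconnect to a different one.

```python
import hashlib


def pick_backend(source_ip, backends):
    """Deterministically map a client source IP onto one active backend."""
    if not backends:
        raise ValueError("no active backends")
    # Hash the address so clients spread evenly across the backend list.
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    # Sort so the result does not depend on the order the caller passes in.
    return sorted(backends)[index]


if __name__ == "__main__":
    # Hypothetical backend names, for illustration only.
    active = ["kerbside-a:5900", "kerbside-b:5900", "kerbside-c:5900"]
    first = pick_backend("192.0.2.10", active)
    # The same client always reaches the same backend while it is active.
    assert pick_backend("192.0.2.10", active) == first
    # If that backend fails, the client is remapped and must reconnect.
    remaining = [b for b in active if b != first]
    print(first, "->", pick_backend("192.0.2.10", remaining))
```

Because the proxy holds per-connection state, remapping after a failure cannot preserve the session, which matches the point that clients would have to reconnect unless Kerbside itself were clustered.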
noonedeadpunk	as it's listening on 0.0.0.0 explicitly	11:02
mikal	If you squint at it, Kerbside is sort of a layer 7 load balancer. It's the thing which understands the protocol well enough to direct traffic to the right backend.	11:02
noonedeadpunk	sean-k-mooney: I think it's totally fine tbh and more than reasonable	11:02
mikal	noonedeadpunk: that's just the default in the config, you can override it.	11:03
noonedeadpunk	mikal: um, can you? https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py#L56	11:03
noonedeadpunk	and if that is a bare metal host and not docker container?	11:03
mikal	noonedeadpunk: yeah, that whole template can be overridden by exporting KERBSIDE_API_COMMAND_LINE in the starting environment.	11:04
noonedeadpunk	ah	11:04
noonedeadpunk	ok	11:04
sean-k-mooney	it's just a wsgi app by the looks of it	11:04
noonedeadpunk	so I can also set `echo ''` to it kinda?	11:04
noonedeadpunk	it is	11:04
noonedeadpunk	just started by the main service	11:04
mikal	noonedeadpunk: I don't follow the echo comment?	11:04
sean-k-mooney	mikal: are you using https://github.com/shakenfist/kerbside/blob/develop/kerbside/config.py#L54 to generate a script to run kerbside?	11:05
noonedeadpunk	so if I want to run wsgi separately, I should be able to replace KERBSIDE_API_COMMAND_LINE with a no-op command quite successfully, I assume	11:05
sean-k-mooney	as in a console script	11:05
sean-k-mooney	or are you using that to spawn a separate instance of kerbside per connection?	11:05
mikal	noonedeadpunk: secretly, this is just a cut down (and a bit older) version of https://github.com/shakenfist/shakenfist/blob/develop/shakenfist/config.py which is the project that code was lifted from.	11:05
mikal	Oh interesting.	11:06
mikal	Like I said, I don't know much about uwsgi. If flask can work in that mode then I think that would work?	11:06
sean-k-mooney	noonedeadpunk: https://github.com/shakenfist/kerbside/blob/develop/kerbside/api.py#L566-L568	11:07
mikal	The protocol native proxy is a separate socket server.	11:07
noonedeadpunk	usually gunicorn, uwsgi and mod_wsgi are interchangeable	11:07
sean-k-mooney	mikal: so your api console script currently spawns a wsgi server from api.py	11:07
sean-k-mooney	but normally you would want to spin up the wsgi application directly with a webserver and not spawn it with Popen like that if it can be shared between connections	11:08
noonedeadpunk	and then if your intention is to have apache2 in front for TLS, using mod_wsgi might be even wiser	11:08
sean-k-mooney	if this is one process per api request, however, that is different	11:08
mikal	The Shaken Fist Foundation [1] welcomes your feedback and potential contributions.	11:08
mikal	1: Said foundation does not exist.	11:08
sean-k-mooney	mikal: without deep diving i think noonedeadpunk could ignore your main in api.py	11:09
mikal	I think what I am getting from this conversation is that I should go read more about uwsgi, which I pinky swear to do.	11:10
sean-k-mooney	and just spin up the wsgi app directly and, as they said, use apache in front to handle tls	11:10
sean-k-mooney	wsgi and uwsgi are also two different things	11:10
sean-k-mooney	uwsgi, gunicorn, and mod_wsgi are 3 examples of wsgi application servers	11:11
sean-k-mooney	as in they host a wsgi application	11:11
sean-k-mooney	wsgi itself is a protocol	11:11
noonedeadpunk	yeah, so usually you just don't need to spawn gunicorn as part of the service, but do that separately	11:12
noonedeadpunk	and leave the user to choose the wsgi server	11:12
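To make the distinction above concrete, here is a minimal sketch of what "a wsgi application" means, using only the Python standard library. The `application` and `call_app` names are illustrative and are not taken from Kerbside. The application is just a callable; the helper plays the role that a server such as gunicorn, uwsgi or mod_wsgi would normally play.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI application: a callable taking (environ, start_response)."""
    body = b"hello from a wsgi app\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Drive the app the way a WSGI server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(application))
```

Since the application never starts a server itself, the deployer is free to pick whichever WSGI server they prefer and point it at the callable, typically by module path.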
mikal	Ok, I will put reading about all this on my todo list for tomorrow.	11:12
noonedeadpunk	as some might want to run granian or smth like that even	11:12
mikal	Although, writing load testing tooling sounds like more fun ;)	11:13
sean-k-mooney	so mikal what is actually happening when you create that subprocess is you're starting a wsgi server that is importing https://github.com/shakenfist/kerbside/blob/develop/kerbside/api.py#L39-L42	11:13
sean-k-mooney	mikal: then that application host is binding to the port in the template and handling the requests out of band of the kerbside command you use to run it	11:14
sean-k-mooney	mikal: by the way, the way you wrote it looks like you wanted it to work how nova-api worked using the eventlet webserver while also not using eventlet :)	11:15
sean-k-mooney	i.e. you wanted to provide a simple way to run it from the cli via the kerbside command generated by this entry point https://github.com/shakenfist/kerbside/blob/develop/setup.cfg#L24	11:16
mikal	sean-k-mooney: Oh I was definitely thinking about nova-api when I wrote it, but it was also over four years ago when I wrote that, which means it's all long forgotten now.	11:16
sean-k-mooney	mikal: kolla recently moved from apache with mod_wsgi to uwsgi	11:17
sean-k-mooney	you should be able to skip the command without any code changes and just run that flask app directly by providing the module path like you did to gunicorn	11:18
mikal	sean-k-mooney: that's probably good because this is a topic I am entirely not passionate about. I just want to do something people think doesn't suck and then move onto more interesting things.	11:18
sean-k-mooney	the main benefit of that approach is you don't need to do the process management in your cli	11:19
mikal	Like I want it to be performant and not make people angry, but it is not my calling.	11:19
sean-k-mooney	you can leave it up to docker/systemd to do that for you	11:19
mikal	Ok well I will play with this tomorrow. I am being assured it is past my bed time now. Feel free to file a github issue if you're keen, otherwise I'll just work through a fix and smash it in.	11:20
sean-k-mooney	well none of this is a blocker on the nova side. but for scalability and security reasons it would be good to do when doing a production install	11:22
sean-k-mooney	so it's something that noonedeadpunk and the kolla folks will care about	11:22
sean-k-mooney	but it's not like it needs a big code change at first glance	11:23
noonedeadpunk	I can try to submit a report later today	11:25
noonedeadpunk	I need to deal with failing migrations right now :(	11:25
mikal	Thanks guys.	11:31
opendevreview	Kamil Sambor proposed openstack/nova master: Replace eventlet.event.Event with threading.Event https://review.opendev.org/c/openstack/nova/+/949754	11:43
opendevreview	Merged openstack/nova stable/2025.1: Amend functional reproducer for bug 1899835 https://review.opendev.org/c/openstack/nova/+/950213	12:39
opendevreview	Dan Smith proposed openstack/nova master: pci: Add more detail and examples to pci.alias docs https://review.opendev.org/c/openstack/nova/+/950659	13:52
dansmith	gibi: Uggla, per yesterday, some refinement ^	13:52
opendevreview	Dan Smith proposed openstack/nova master: pci: Add more detail and examples to pci.alias docs https://review.opendev.org/c/openstack/nova/+/950659	13:53
opendevreview	yaguang proposed openstack/nova stable/2024.2: Handle iso+gpt detections https://review.opendev.org/c/openstack/nova/+/943940	14:08
Uggla	dansmith, +1 for the new pci.alias example.	14:14
opendevreview	Merged openstack/nova stable/2025.1: live migration: Avoid volume rollback mismatches https://review.opendev.org/c/openstack/nova/+/950214	14:33
opendevreview	melanie witt proposed openstack/nova-specs master: Re-propose vTPM live migration https://review.opendev.org/c/openstack/nova-specs/+/947542	15:23
melwitt	gibi: fixed the typo you found ^	15:23
melwitt	dansmith, sean-k-mooney: the re-proposal for vtpm live migration spec is ready for review if you may be interested. I have noted the changes from the last approval in the commit message	15:26
opendevreview	Fabian Wiesel proposed openstack/nova master: libvirt: Expose number of migration threads https://review.opendev.org/c/openstack/nova/+/950667	15:34
gibi	melwitt: thanks	15:55
gibi	I added my +2 back	15:55
dansmith	melwitt: ack, I'm several layers deep in the stack right now	16:03
opendevreview	Merged openstack/nova stable/2024.2: Amend functional reproducer for bug 1899835 https://review.opendev.org/c/openstack/nova/+/950215	16:54
opendevreview	Merged openstack/nova stable/2024.2: live migration: Avoid volume rollback mismatches https://review.opendev.org/c/openstack/nova/+/950216	16:58
melwitt	dansmith: ack, it is not urgent but it is there when you are ready	18:09
