Wednesday, 2022-11-09

melwitt	clarkb: this is the test coverage we have for rescue https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_server_rescue.py and it's enabled in the tempest-integrated-compute job for example https://zuul.opendev.org/t/openstack/build/c35d560c76a24e45959aa609ac372d67/log/controller/logs/tempest_conf.txt#70	01:37
opendevreview	Amit Uniyal proposed openstack/nova stable/train: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/863806	06:23
opendevreview	Amit Uniyal proposed openstack/nova stable/train: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/864055	06:23
opendevreview	Amit Uniyal proposed openstack/nova stable/train: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/863806	07:48
opendevreview	Amit Uniyal proposed openstack/nova stable/train: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/864055	07:48
opendevreview	Nobuhiro MIKI proposed openstack/nova master: libvirt: add maxphysaddr support https://review.opendev.org/c/openstack/nova/+/864091	08:20
samuelkunkel[m]	Good morning,... (full message at <https://matrix.org/_matrix/media/r0/download/matrix.org/rADMLssdKgBpMiHEvywbsOpx>)	10:06
frickler	samuelkunkel[m]: your message has been truncated by the matrix bridge. I suggest not to use matrix in order to join IRC. if you think that this is still the right solution for you, make sure your messages are not too long	10:21
frickler	in particular avoiding to send multiline messages may be helpful	10:22
samuelkunkel[m]	ah sure, sorry. I can try to make it single line. Links still should work? gonna look for a different client...	10:23
samuelkunkel[m]	we are currently facing an issue in yoga with libvirt 8.0 for reporting mdev devices	10:23
samuelkunkel[m]	in particular https://review.opendev.org/c/openstack/nova/+/838976	10:23
samuelkunkel[m]	is this still being worked on?	10:24
samuelkunkel[m]	(hope it is readable now)	10:24
frickler	seems bauzas was the last one working on it	10:25
samuelkunkel[m]	currently I will use the quick fix provided https://review.opendev.org/c/openstack/nova/+/838976	10:25
bauzas	frickler: yup, I need to update my change	10:26
bauzas	it's a priority I have	10:26
samuelkunkel[m]	that sounds nice, if you need somebody to test - feel free to reach out to me, have some nodes with mdevs to play on	10:38
ygk_12345	HI all	11:56
sean-k-mooney	samuelkunkel[m]: we not only plan to fix that but backport the fix to wallaby as we require it for our downstream product that far and there is no point doing it downstream only since the fix is backportable	12:04
sean-k-mooney	so given you're on yoga that should hopefully also address your usecase	12:05
samuelkunkel[m]	yes, that sounds great	12:05
samuelkunkel[m]	I assume there is currently no estimation possible on a timeframe?	12:05
sean-k-mooney	well the patch that's proposed actually works we just need a few comments addressed	12:06
auniyal	Hi sean-k-mooney	12:06
sean-k-mooney	downstream we have a deadline of mid december to address this so i am strongly hoping that we can address this upstream before then so our product team does not start asking me about it	12:07
auniyal	how can we run tox functional locally in train branch	12:07
auniyal	tox -e functional fails	12:07
sean-k-mooney	use the python3 version	12:07
sean-k-mooney	or a vm/container based on ubuntu 18.04?	12:07
samuelkunkel[m]	I can second that, it also works on my yoga setup on a non productive cluster. Thanks for the clarification. Until the fix is backported I just use the patch	12:07
samuelkunkel[m]	thanks for all the information	12:07
sean-k-mooney	auniyal: so on train you can use tox -e functional-py36 or tox -e functional-py37	12:09
sean-k-mooney	auniyal: i would either use ubuntu 18.04/ubuntu-bionic or centos 8 stream to run the tests	12:10
sean-k-mooney	we use 18.04 in the ci https://github.com/openstack/nova/blob/stable/train/.zuul.yaml#L72-L119	12:11
auniyal	got same error, I think its trying to need some package/module	12:11
auniyal	https://paste.opendev.org/show/bsE4F25vNPl8BaGh7I6I/	12:11
sean-k-mooney	you are trying to use 3.8	12:11
auniyal	oh in here - /usr/lib/python3.8/runpy.py	12:12
sean-k-mooney	do you have 3.6 available	12:12
auniyal	no right now 3.6	12:12
auniyal	3.8	12:12
sean-k-mooney	ya 3.8 was not released/supported by train	12:13
auniyal	if I create venv of 3.6 and install test-requirements.txt in it	12:13
auniyal	will it work	12:13
sean-k-mooney	so if you want to run these you need to use an operating system that was supported hence why i said centos 8 stream or ubuntu 18.04	12:13
auniyal	ack, will go with ubuntu 18,	12:14
auniyal	thanks Sean	12:14
sean-k-mooney	if your host os is too new things like sqlite might have issues	12:14
sean-k-mooney	basically where we have python modules that wrap c libs	12:15
sean-k-mooney	if your host os lib is too new then the old python binding might not work	12:15
sean-k-mooney	so if you're currently using say the latest fedora you are likely to have issues with old releases like train	12:15
sean-k-mooney	i generally use vms or containers to work around that if i hit that	12:16
auniyal	yeah I am using vm , devstack on ubuntu 20	12:16
auniyal	for these tests, will go with ubuntu 18	12:17
sean-k-mooney	ack i used to keep a few vms around for backporting	12:17
sean-k-mooney	i do that less now just because its rare that i need older than 3.8	12:18
auniyal	ack	12:19
sean-k-mooney	i think we added 3.8 in ussuri so train is really the only release that does not support 3.8 officially now	12:19
sean-k-mooney	on it was victoria	12:20
auniyal	for ussuri also I was dependent on zuul, but as there less conflict so it need less tests	12:21
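Editor's note: the advice above (on stable/train use functional-py36 or functional-py37, and fall back to an older OS when neither interpreter is installed) can be sketched as a small helper. This is a minimal sketch, not part of nova: the env names come from the discussion, while the function name and the `which` parameter are hypothetical and exist only for illustration.

```python
import shutil

# Env names taken from the discussion above (stable/train).
# The mapping itself is an illustrative assumption, not nova tooling.
TRAIN_FUNCTIONAL_ENVS = {
    "python3.6": "functional-py36",
    "python3.7": "functional-py37",
}


def pick_train_functional_env(which=shutil.which):
    """Return the first tox env whose interpreter is on PATH, else None."""
    for interpreter, env in TRAIN_FUNCTIONAL_ENVS.items():
        if which(interpreter):
            return env
    return None


if __name__ == "__main__":
    env = pick_train_functional_env()
    if env is None:
        # Matches the suggestion in the log: use an older OS in a vm/container.
        print("no python3.6/3.7 found; try an ubuntu 18.04 or centos 8 stream vm/container")
    else:
        print("tox -e " + env)
```

On a host that only ships python 3.8, as in auniyal's paste, the helper returns None, which is the case where the log recommends a VM or container.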
sean-k-mooney	frickler: by the way i have been using matrix on and off via the element client pretty seamlessly for irc	12:23
sean-k-mooney	frickler: i still use weechat as my main irc client	12:24
sean-k-mooney	but if im not at my work laptop i sometimes use the element client from my personal laptop or ipad to chat via the matrix.org bridge	12:24
sean-k-mooney	so ya if you keep messages relatively short (3-4 lines) it works fine i havent hit the length limit personally	12:25
sean-k-mooney	of the irc alternatives i have used matrix is really the only one i tolerate	12:26
sean-k-mooney	if the element desktop client ever gets the ability to sign into two matrix accounts at once it might even be something i would consider as a replacement for weechat	12:27
frickler	sean-k-mooney: there have also been issues where the bridge disconnects but you do not notice on the matrix side, so my personal suggest is still to not use this, ymmv	12:27
sean-k-mooney	ya i have not had that issue but i still use irc as my primary interface and matrix as what i use when im traveling or not working from my normal location	12:29
sean-k-mooney	so i probably would not notice if there were temporary issues	12:30
admin1	i have a vm which is always in a pause state in the hypervisor .. trying to unpause using virsh gives error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags) .. the vm is backed by volume on ceph, but ceph is fine and there are no locks	12:33
admin1	what can i do to check/troubleshoot this issue	12:33
admin1	i rebooted the hypervisor as well, no luck	12:33
sean-k-mooney	this might be a lock created by qemu	12:34
sean-k-mooney	have you tried stopping the vm and starting it	12:35
sean-k-mooney	e.g. via a hard reboot	12:35
admin1	when i do a virsh destroy, it disappears from virsh list --all	12:35
admin1	when i start again (horizon/cli) appears back	12:36
admin1	with a paused state	12:36
sean-k-mooney	ack	12:36
sean-k-mooney	did you check the qemu instance log for any errors	12:37
sean-k-mooney	this does not sound like a nova issue by the way	12:37
sean-k-mooney	this sounds like an issue at the qemu/libvirt level and or perhaps the ceph interaction	12:37
sean-k-mooney	you dont happen to have a kvm error in the instance log do you?	12:38
sean-k-mooney	we hit an issue with ubuntu 22.04 where libvirt incorrectly detected the cpu model	12:38
sean-k-mooney	it enabled amd cpu flags in the domain on an intel host	12:39
sean-k-mooney	that left the vm in a paused state	12:39
sean-k-mooney	although that would not explain the lock message but i would check the qemu instance log in any case	12:39
admin1	sean-k-mooney thanks . i know what to check for now	12:40
admin1	where does qemu/libvirt read the ceph connection details like mon addresses ?	12:40
admin1	from /etc/ceph/ceph.conf ?	12:40
admin1	or is it internally somewhere else	12:41
sean-k-mooney	we get them from the cinder attachment connection info and then store them in our db and pass it to libvirt	12:43
sean-k-mooney	so no not from the ceph.conf	12:43
sean-k-mooney	in recent releases of openstack (xena+) we have a nova manage command to refresh the attachment info	12:45
sean-k-mooney	https://docs.openstack.org/nova/latest/cli/nova-manage.html#volume-attachment-refresh	12:45
admin1	this one is not xena yet	12:45
admin1	i want to remove 2x mons and use only 1 remaining mon	12:45
admin1	how do I update/edit this db ?	12:45
sean-k-mooney	with great pain and care	12:46
sean-k-mooney	so we added this command to nova-manage because this is stored in a json blob in the db	12:46
sean-k-mooney	while it can be modified its a pain to do	12:46
sean-k-mooney	admin1: one option would be to grab a xena container or create a xena virtual env and just run nova manage	12:47
sean-k-mooney	i believe this is implemented such that if you have the new version of nova manage and point it to an old cloud it can work but im not 100% certain of that	12:47
admin1	you mean have binaries of xena but connect to existing db to manage/manipulate the entries ?	12:48
sean-k-mooney	ya	12:48
sean-k-mooney	so bauzas gibi correct me if im wrong but we have had customers do that right^	12:48
sean-k-mooney	use the updated container with this command to repair old dbs when connection info is out of date	12:48
sean-k-mooney	admin1: i think we have a downstream backport of this by the way to some release which is why im not 100% sure how we used this downstream with train	12:49
sean-k-mooney	admin1: ya so we have it backported downstream to train in our 16.2 product	12:51
sean-k-mooney	and i think we have had customers use the 16.2 container to fix this on queens/osp 13	12:52
admin1	i am on osa tag 23.1.2	12:52
admin1	wallaby	12:53
sean-k-mooney	we cannot backport db/object/rpc change even downstream so the fact it works on train implies this is very self contained meaning you should be able to use it with wallaby	12:54
admin1	: invalid choice: 'volume_attachment' on this	12:54
admin1	i have to boot a new container, point to the existing one and try from there	12:54
sean-k-mooney	yep	12:55
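Editor's note: to make the "json blob in the db" remark concrete, here is a minimal sketch of what rewriting the monitor list inside an rbd connection info blob could look like. The blob layout (`driver_volume_type`, `data.hosts`, `data.ports`) is an assumption about the usual cinder rbd connection info, the sample values are made up, and the function name is hypothetical. It only transforms a string in memory and does not touch any database; the nova-manage volume attachment refresh command linked above remains the route recommended in the discussion.

```python
import json


def replace_mon_hosts(connection_info_json, hosts, ports):
    """Return a copy of an rbd connection_info blob with new monitor hosts.

    Assumes (not confirmed by the log) that monitors live in
    data["hosts"] and data["ports"] as parallel lists.
    """
    info = json.loads(connection_info_json)
    if info.get("driver_volume_type") != "rbd":
        raise ValueError("not an rbd attachment")
    if len(hosts) != len(ports):
        raise ValueError("hosts and ports must pair up")
    info["data"]["hosts"] = list(hosts)
    info["data"]["ports"] = list(ports)
    return json.dumps(info)


# Hypothetical blob, made up for illustration only.
old_blob = json.dumps({
    "driver_volume_type": "rbd",
    "data": {
        "name": "volumes/volume-0000",
        "hosts": ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
        "ports": ["6789", "6789", "6789"],
    },
})

# Going from three monitors to one, as admin1 describes wanting to do.
new_blob = replace_mon_hosts(old_blob, ["192.0.2.1"], ["6789"])
print(json.loads(new_blob)["data"]["hosts"])
```

Even in this toy form the edit has to keep hosts and ports aligned and leave every other key intact, which hints at why hand-editing the stored blob is described as painful.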
opendevreview	Amit Uniyal proposed openstack/nova stable/train: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/863806	13:46
opendevreview	Amit Uniyal proposed openstack/nova stable/train: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/864055	13:46
dvo-plv	Hello, everyone, Could you please review our comments on the next blueprint: https://review.opendev.org/c/openstack/nova-specs/+/859290	13:55
*** slaweq_ is now known as slaweq		14:09
*** dasm\|off is now known as dasm		14:10
admin1	sean-k-mooney,is this a libvirt-secrets-gone thing or a ceph thing ? https://gist.githubusercontent.com/a1git/67cc7dab45f9bff536296670ab6ce65d/raw/450a2e31125e01826a90b1f133bc9b4821f807e8/gistfile1.txt	14:19
admin1	my mons were deleted completely .. i recreated those from osds	14:20
admin1	and cinder client was added with the same keys	14:20
admin1	most vms started, a few come with this error	14:20
sean-k-mooney	did the mon ips change	14:22
sean-k-mooney	presumably the secret is the same	14:23
sean-k-mooney	but ya it could be that either the secret is missing or the user auth info changed	14:23
sean-k-mooney	the secret has the ceph keyring inside	14:23
sean-k-mooney	i dont have a ceph deployment to check but i believe that is tied to a specific pool/user uuid	14:24
sean-k-mooney	im not really sure how you recover on the ceph side from all mons going away	14:24
sean-k-mooney	but if any of the uuids changed then you might need to get new keyrings and update the secret	14:25
sean-k-mooney	changing the mon ips is not supported in an openstack env since it requires db surgery to fix	14:26
admin1	the mons were gone totally, but the ips did not change	14:26
admin1	all 3 mons are back in quorum	14:26
admin1	ceph is healthy and most of the vms started OK	14:26
sean-k-mooney	ok the ips, cluster uuid and secret are the main things	14:26
admin1	there are 2 i know of that show this behaviour	14:26
admin1	there is no lock	14:26
sean-k-mooney	so if the ips are the same no need to update the nova db unless the cluster id changed	14:27
admin1	cluster name /fsid all is same	14:27
sean-k-mooney	ya fsid was what i meant	14:27
admin1	fsid is the same	14:27
sean-k-mooney	so if that's the same then provided the keyring in the secret is still valid you are probably ok	14:27
sean-k-mooney	have you tried using that to list the volumes on the pool	14:28
admin1	got it	14:32
clarkb	melwitt: thanks. It does look like bfv is tested, but it does also appear that the image used for rescuing is modified to set its bus and device types? I wonder if that is what we are missing here. The rescue command itself doesn't appear to take those arguments so these would need to be specified before hand on a special image? I guess that lends more weight to having	15:50
clarkb	dedicated images as a required part of the rescue process?	15:50
melwitt	clarkb: can be image properties or expressed as extra specs in the flavor. I can test out a change that would add a flavor create setting bus and device types in the test	16:07
clarkb	melwitt: either way it is something that the cloud or cloud user would need to be aware of. Currently the default rescue behavior is to reuse the same image the rescued node booted off of. This is problematic because of the label specifier collisions, but also because if the image itself is broken you'd still be broken in a rescue. That leads users to using another image, but	16:09
clarkb	there isn't any clear indication to me as a user that I need to use a special image.	16:09
opendevreview	Dan Smith proposed openstack/nova master: Test ceph-multistore with a real image https://review.opendev.org/c/openstack/nova/+/860864	16:10
clarkb	I suspect the solution here is to make it clear to cloud operators that rescue has requirements x y z (I don't know what they all are yet) and that they should provide an image that meets those requirements	16:11
melwitt	clarkb: hm yeah you are probably right it's only image properties, this section doesn't mention using a flavor to do it https://docs.openstack.org/nova/latest/user/rescue.html#stable-device-instance-rescue	16:11
clarkb	also I wonder if nova should drop the default behavior of reusing the running image and instead force people to explicitly provide one	16:12
clarkb	I suspect there are scenarios where reusing the image would work, but in the vast majority it seems unlikely	16:13
clarkb	and that would help provide signal that something different is required here	16:13
melwitt	I think we could do that in a new API microversion to avoid breaking anyone who is using it the old way and succeeding ... but the fact that openstackclient defaults to lowest microversion makes it more difficult to signal imho	16:16
clarkb	ya and I think users could manually specify the same image if they really did need/want that	16:18
clarkb	it just wouldn't be provided as a default (which I think users expect to work)	16:18
melwitt	yeah, I think that makes sense	16:19
opendevreview	Merged openstack/nova stable/yoga: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/861872	17:19
opendevreview	Amit Uniyal proposed openstack/nova stable/ussuri: add regression test case for bug 1978983 https://review.opendev.org/c/openstack/nova/+/862603	17:31
opendevreview	Amit Uniyal proposed openstack/nova stable/ussuri: For evacuation, ignore if task_state is not None https://review.opendev.org/c/openstack/nova/+/862604	17:31
opendevreview	melanie witt proposed openstack/nova-specs master: Re-propose spec for ephemeral storage encryption https://review.opendev.org/c/openstack/nova-specs/+/864138	18:45
darkhorse	Hi team	19:29
darkhorse	class NovaSession():	19:29
darkhorse	def __init__(self):	19:29
darkhorse	self.auth = v3.Password(KEYSTONE_URL, username=OPENSTACK_ADMIN, password=OPENSTACK_ADMIN_PASS, project_name=ADMIN_PROJECT, user_domain_id=DOMAIN_ID,	19:29
darkhorse	project_domain_id=PROJECT_DOMAIN_ID)	19:29
darkhorse	self.sess = session.Session(self.auth)	19:29
darkhorse	self.nova2 = nova_client.Client(2, session=self.sess)	19:29
darkhorse	I use this code to create nova session. I want to use internal IP address since my app runs on controller. I set KEYSTONE_URL to internal keystone endpoint address but the client still sends requests to nova public ip address.	19:31
darkhorse	Is there a setting that I can tell the client to use internal address?	19:31
opendevreview	Dan Smith proposed openstack/nova master: Test ceph-multistore with a real image https://review.opendev.org/c/openstack/nova/+/860864	19:34
melwitt	darkhorse: you might try the 'interface' kwarg to Client (interface=$the_name_of_your_internal_endpoint_in_the_service_catalog) which will get passed to the keystone adapter https://docs.openstack.org/keystoneauth/latest/api/keystoneauth1.adapter.html	19:57
melwitt	there's also endpoint_override to provide the full url but the interface discovery is nicer I think if it works	19:58
darkhorse	melwitt: thank you i ended up using nova_client.Client(2, session=self.sess, endpoint_type='internal')	20:02
darkhorse	interface kwarg seems to be deprecated.	20:02
melwitt	darkhorse: ok cool. it's the other way around I think, endpoint_type is an old name now used as an alias	20:03
darkhorse	ok thank you.	20:04
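Editor's note: the exchange above turns on the fact that the compute URL is not derived from the auth URL; it is looked up in the service catalog by interface. The sketch below illustrates that lookup in plain Python under stated assumptions: the catalog follows the keystone v3 token layout, all URLs are made up, `endpoint_for` is a hypothetical helper rather than a keystoneauth or novaclient function, and the "public" default is chosen only to mirror the behaviour darkhorse observed. In the log, the call that worked was `nova_client.Client(2, session=self.sess, endpoint_type='internal')`.

```python
# Hypothetical catalog in the keystone v3 token layout; URLs are made up.
CATALOG = [
    {"type": "identity", "endpoints": [
        {"interface": "public", "url": "https://cloud.example.com:5000"},
        {"interface": "internal", "url": "http://10.0.0.10:5000"},
    ]},
    {"type": "compute", "endpoints": [
        {"interface": "public", "url": "https://cloud.example.com:8774/v2.1"},
        {"interface": "internal", "url": "http://10.0.0.10:8774/v2.1"},
    ]},
]


def endpoint_for(catalog, service_type, interface="public"):
    """Return the URL of the first endpoint matching type and interface."""
    for service in catalog:
        if service["type"] != service_type:
            continue
        for endpoint in service["endpoints"]:
            if endpoint["interface"] == interface:
                return endpoint["url"]
    return None


# Authenticating against the internal identity URL does not change this
# lookup: without an explicit interface the public compute URL is chosen.
default_url = endpoint_for(CATALOG, "compute")
internal_url = endpoint_for(CATALOG, "compute", interface="internal")
print(default_url)
print(internal_url)
```

The endpoint_override option melwitt mentions skips this lookup entirely by supplying the full URL, at the cost of hard-coding it.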
opendevreview	melanie witt proposed openstack/nova-specs master: Re-propose spec for ephemeral encryption for libvirt https://review.opendev.org/c/openstack/nova-specs/+/864147	22:30
*** dasm is now known as dasm\|offp		23:03
*** dasm\|offp is now known as dasm\|off		23:03
opendevreview	melanie witt proposed openstack/nova-specs master: Re-propose per process healthchecks https://review.opendev.org/c/openstack/nova-specs/+/864150	23:44
