melwitt | clarkb: this is the test coverage we have for rescue https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_server_rescue.py and it's enabled in the tempest-integrated-compute job for example https://zuul.opendev.org/t/openstack/build/c35d560c76a24e45959aa609ac372d67/log/controller/logs/tempest_conf.txt#70 | 01:37 |
---|---|---|
opendevreview | Amit Uniyal proposed openstack/nova stable/train: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/863806 | 06:23 |
opendevreview | Amit Uniyal proposed openstack/nova stable/train: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/864055 | 06:23 |
opendevreview | Amit Uniyal proposed openstack/nova stable/train: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/863806 | 07:48 |
opendevreview | Amit Uniyal proposed openstack/nova stable/train: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/864055 | 07:48 |
opendevreview | Nobuhiro MIKI proposed openstack/nova master: libvirt: add maxphysaddr support https://review.opendev.org/c/openstack/nova/+/864091 | 08:20 |
samuelkunkel[m] | Good morning,... (full message at <https://matrix.org/_matrix/media/r0/download/matrix.org/rADMLssdKgBpMiHEvywbsOpx>) | 10:06 |
frickler | samuelkunkel[m]: your message has been truncated by the matrix bridge. I suggest not using matrix to join IRC. if you think that this is still the right solution for you, make sure your messages are not too long | 10:21 |
frickler | in particular, avoiding multiline messages may be helpful | 10:22 |
samuelkunkel[m] | ah sure, sorry. I can try to make it single line. Links still should work? gonna look for a different client... | 10:23 |
samuelkunkel[m] | we are currently facing an issue in yoga with libvirt 8.0 for reporting mdev devices | 10:23 |
samuelkunkel[m] | in particular https://review.opendev.org/c/openstack/nova/+/838976 | 10:23 |
samuelkunkel[m] | is this still being worked on? | 10:24 |
samuelkunkel[m] | (hope it is readable now) | 10:24 |
frickler | seems bauzas was the last one working on it | 10:25 |
samuelkunkel[m] | currently I will use the quick fix provided https://review.opendev.org/c/openstack/nova/+/838976 | 10:25 |
bauzas | frickler: yup, I need to update my change | 10:26 |
bauzas | it's a priority I have | 10:26 |
samuelkunkel[m] | that sounds nice, if you need somebody to test - feel free to reach out to me, have some nodes with mdevs to play on | 10:38 |
ygk_12345 | HI all | 11:56 |
sean-k-mooney | samuelkunkel[m]: we not only plan to fix that but backport the fix to wallaby as we require it for our downstream product that far back and there is no point doing it downstream only since the fix is backportable | 12:04 |
sean-k-mooney | so given you're on yoga that should hopefully also address your use case | 12:05 |
samuelkunkel[m] | yes, that sounds great | 12:05 |
samuelkunkel[m] | I assume there is currently no estimation possible on a timeframe? | 12:05 |
sean-k-mooney | well the patch that's proposed actually works, we just need a few comments addressed | 12:06 |
auniyal | Hi sean-k-mooney | 12:06 |
sean-k-mooney | downstream we have a deadline of mid december to address this so i am strongly hoping that we can address this upstream before then so our product team does not start asking me about it | 12:07 |
auniyal | how can we run tox functional locally in train branch | 12:07 |
auniyal | tox -e functional fails | 12:07 |
sean-k-mooney | use the python3 version | 12:07 |
sean-k-mooney | or a vm/container based on ubuntu 18.04? | 12:07 |
samuelkunkel[m] | I can second that, it also works on my yoga setup on a non productive cluster. Thanks for the clarification. Until the fix is backported I just use the patch | 12:07 |
samuelkunkel[m] | thanks for all the information | 12:07 |
sean-k-mooney | auniyal: so on train you can use tox -e functional-py36 or tox -e functional-py37 | 12:09 |
sean-k-mooney | auniyal: i would either use ubuntu 18.04/ubuntu-bionic or centos 8 stream to run the tests | 12:10 |
sean-k-mooney | we use 18.04 in the ci https://github.com/openstack/nova/blob/stable/train/.zuul.yaml#L72-L119 | 12:11 |
auniyal | got same error, I think it needs some package/module | 12:11 |
auniyal | https://paste.opendev.org/show/bsE4F25vNPl8BaGh7I6I/ | 12:11 |
sean-k-mooney | you are trying to use 3.8 | 12:11 |
auniyal | oh in here - /usr/lib/python3.8/runpy.py | 12:12 |
sean-k-mooney | do you have 3.6 available | 12:12 |
auniyal | no right now 3.6 | 12:12 |
auniyal | 3.8 | 12:12 |
sean-k-mooney | ya 3.8 was not released/supported by train | 12:13 |
auniyal | if I create venv of 3.6 and install test-requirements.txt in it | 12:13 |
auniyal | will it work | 12:13 |
sean-k-mooney | so if you want to run these you need to use an operating system that was supported, hence why i said centos 8 stream or ubuntu 18.04 | 12:13 |
auniyal | ack, will go with ubuntu 18, | 12:14 |
auniyal | thanks Sean | 12:14 |
sean-k-mooney | if your host os is too new, things like sqlite might have issues | 12:14 |
sean-k-mooney | basically where we have python modules that wrap c libs | 12:15 |
sean-k-mooney | if your host os lib is too new then the old python binding might not work | 12:15 |
sean-k-mooney | so if you're currently using say the latest fedora you are likely to have issues with old releases like train | 12:15 |
sean-k-mooney | i generally use vms or containers to work around that if i hit that | 12:16 |
auniyal | yeah I am using vm , devstack on ubuntu 20 | 12:16 |
auniyal | for these tests, will go with ubuntu 18 | 12:17 |
sean-k-mooney | ack i used to keep a few vms around for backporting | 12:17 |
sean-k-mooney | i do that less now just because it's rare that i need older than 3.8 | 12:18 |
auniyal | ack | 12:19 |
sean-k-mooney | i think we added 3.8 in ussuri so train is really the only release that does not support 3.8 officially now | 12:19 |
sean-k-mooney | on it was victoria | 12:20 |
auniyal | for ussuri also I was dependent on zuul, but as there were fewer conflicts it needed fewer tests | 12:21 |
sean-k-mooney | frickler: by the way i have been using matrix on and off via the element client pretty seamlessly for irc | 12:23 |
sean-k-mooney | frickler: i still use weechat as my main irc client | 12:24 |
sean-k-mooney | but if im not at my work laptop i sometimes use the element client from my personal laptop or ipad to chat via the matrix.org bridge | 12:24 |
sean-k-mooney | so ya if you keep messages relatively short (3-4 lines) it works fine, i haven't hit the length limit personally | 12:25 |
sean-k-mooney | of the irc alternatives i have used, matrix is really the only one i tolerate | 12:26 |
sean-k-mooney | if the element desktop client ever gets the ability to sign into two matrix accounts at once it might even be something i would consider as a replacement for weechat | 12:27 |
frickler | sean-k-mooney: there have also been issues where the bridge disconnects but you do not notice on the matrix side, so my personal suggestion is still to not use this, ymmv | 12:27 |
sean-k-mooney | ya i have not had that issue but i still use irc as my primary interface and matrix as what i use when im traveling or not working from my normal location | 12:29 |
sean-k-mooney | so i probably would not notice if there were temporary issues | 12:30 |
admin1 | i have a vm which is always in a pause state in the hypervisor .. trying to unpause using virsh gives error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags) .. the vm is backed by volume on ceph, but ceph is fine and there are no locks | 12:33 |
admin1 | what can i do to check/troubleshoot this issue | 12:33 |
admin1 | i rebooted the hypervisor as well, no luck | 12:33 |
sean-k-mooney | this might be a lock created by qemu | 12:34 |
sean-k-mooney | have you tried stopping the vm and starting it | 12:35 |
sean-k-mooney | e.g. via a hard reboot | 12:35 |
admin1 | when i do a virsh destroy, it disappears from virsh list --all | 12:35 |
admin1 | when i start again (horizon/cli) appears back | 12:36 |
admin1 | with a paused state | 12:36 |
sean-k-mooney | ack | 12:36 |
sean-k-mooney | did you check the qemu instance log for any errors | 12:37 |
sean-k-mooney | this does not sound like a nova issue by the way | 12:37 |
sean-k-mooney | this sounds like an issue at the qemu/libvirt level and/or perhaps the ceph interaction | 12:37 |
sean-k-mooney | you dont happen to have a kvm error in the instance log do you? | 12:38 |
sean-k-mooney | we hit an issue with ubuntu 22.04 where libvirt incorrectly detected the cpu model | 12:38 |
sean-k-mooney | it enabled amd cpu flags in the domain on an intel host | 12:39 |
sean-k-mooney | that left the vm in a paused state | 12:39 |
sean-k-mooney | although that would not explain the lock message, but i would check the qemu instance log in any case | 12:39 |
admin1 | sean-k-mooney thanks . i know what to check for now | 12:40 |
admin1 | where does qemu/libvirt read the ceph connection details like mon addresses ? | 12:40 |
admin1 | from /etc/ceph/ceph.conf ? | 12:40 |
admin1 | or is it internally somewhere else | 12:41 |
sean-k-mooney | we get them from the cinder attachment connection info and then store them in our db and pass it to libvirt | 12:43 |
sean-k-mooney | so no not from the ceph.conf | 12:43 |
sean-k-mooney | in recent releases of openstack (xena+) we have a nova-manage command to refresh the attachment info | 12:45 |
sean-k-mooney | https://docs.openstack.org/nova/latest/cli/nova-manage.html#volume-attachment-refresh | 12:45 |
admin1 | this one is not xena yet | 12:45 |
admin1 | i want to remove 2x mons and use only 1 remaining mon | 12:45 |
admin1 | how do I update/edit this db ? | 12:45 |
sean-k-mooney | with great pain and care | 12:46 |
sean-k-mooney | so we added this command to nova-manage because this is stored in a json blob in the db | 12:46 |
sean-k-mooney | while it can be modified it's a pain to do | 12:46 |
sean-k-mooney | admin1: one option would be to grab a xena container or create a xena virtual env and just run nova-manage | 12:47 |
sean-k-mooney | i believe this is implemented such that if you have the new version of nova-manage and point it to an old cloud it can work but im not 100% certain of that | 12:47 |
admin1 | you mean have binaries of xena but connect to existing db to manage/manipulate the entries ? | 12:48 |
sean-k-mooney | ya | 12:48 |
sean-k-mooney | so bauzas gibi correct me if im wrong but we have had customers do that right^ | 12:48 |
sean-k-mooney | use the updated container with this command to repair old dbs when connection info is out of date | 12:48 |
sean-k-mooney | admin1: i think we have a downstream backport of this by the way to some release which is why im not 100% sure how we used this downstream with train | 12:49 |
sean-k-mooney | admin1: ya so we have it backported downstream to train in our 16.2 product | 12:51 |
sean-k-mooney | and i think we have had customers use the 16.2 container to fix this on queens/osp 13 | 12:52 |
admin1 | i am on osa tag 23.1.2 | 12:52 |
admin1 | wallaby | 12:53 |
sean-k-mooney | we cannot backport db/object/rpc changes even downstream so the fact it works on train implies this is very self contained, meaning you should be able to use it with wallaby | 12:54 |
admin1 | : invalid choice: 'volume_attachment' on this | 12:54 |
admin1 | i have to boot a new container, point to the existing one and try from there | 12:54 |
sean-k-mooney | yep | 12:55 |
opendevreview | Amit Uniyal proposed openstack/nova stable/train: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/863806 | 13:46 |
opendevreview | Amit Uniyal proposed openstack/nova stable/train: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/864055 | 13:46 |
dvo-plv | Hello, everyone, Could you please review our comments on the next blueprint: https://review.opendev.org/c/openstack/nova-specs/+/859290 | 13:55 |
*** slaweq_ is now known as slaweq | 14:09 | |
*** dasm|off is now known as dasm | 14:10 | |
admin1 | sean-k-mooney,is this a libvirt-secrets-gone thing or a ceph thing ? https://gist.githubusercontent.com/a1git/67cc7dab45f9bff536296670ab6ce65d/raw/450a2e31125e01826a90b1f133bc9b4821f807e8/gistfile1.txt | 14:19 |
admin1 | my mons were deleted completely .. i recreated those from osds | 14:20 |
admin1 | and cinder client was added with the same keys | 14:20 |
admin1 | most vms started, a few come with this error | 14:20 |
sean-k-mooney | did the mon ips change | 14:22 |
sean-k-mooney | presumably the secret is the same | 14:23 |
sean-k-mooney | but ya it could be that either the secret is missing or the user auth info changed | 14:23 |
sean-k-mooney | the secret has the ceph keyring inside | 14:23 |
sean-k-mooney | i dont have a ceph deployment to check but i believe that is tied to a specific pool/user uuid | 14:24 |
sean-k-mooney | im not really sure how you recover on the ceph side from all mons going away | 14:24 |
sean-k-mooney | but if any of the uuids changed then you might need to get new keyrings and update the secret | 14:25 |
sean-k-mooney | changing the mon ips is not supported in an openstack env since it requires db surgery to fix | 14:26 |
admin1 | the mons were gone totally, but the ips did not change | 14:26 |
admin1 | all 3 mons are back in quorum | 14:26 |
admin1 | ceph is healthy and most of the vms started OK | 14:26 |
sean-k-mooney | ok the ips, cluster uuid and secret are the main things | 14:26 |
admin1 | there are 2 i know of that show this behaviour | 14:26 |
admin1 | there is no lock | 14:26 |
sean-k-mooney | so if the ips are the same no need to update the nova db unless the cluster id changed | 14:27 |
admin1 | cluster name /fsid all is same | 14:27 |
sean-k-mooney | ya fsid was what i meant | 14:27 |
admin1 | fsid is the same | 14:27 |
sean-k-mooney | so if that's the same then provided the keyring in the secret is still valid you are probably ok | 14:27 |
sean-k-mooney | have you tried using that to list the volumes on the pool | 14:28 |
admin1 | got it | 14:32 |
clarkb | melwitt: thanks. It does look like bfv is tested, but it does also appear that the image used for rescuing is modified to set its bus and device types? I wonder if that is what we are missing here. The rescue command itself doesn't appear to take those arguments so these would need to be specified beforehand on a special image? I guess that lends more weight to having | 15:50 |
clarkb | dedicated images as a required part of the rescue process? | 15:50 |
melwitt | clarkb: can be image properties or expressed as extra specs in the flavor. I can test out a change that would add a flavor create setting bus and device types in the test | 16:07 |
clarkb | melwitt: either way it is something that the cloud or cloud user would need to be aware of. Currently the default rescue behavior is to reuse the same image the rescued node booted off of. This is problematic because of the label specifier collisions, but also because if the image itself is broken you'd still be broken in a rescue. That leads users to using another image, but | 16:09 |
clarkb | there isn't any clear indication to me as a user that I need to use a special image. | 16:09 |
opendevreview | Dan Smith proposed openstack/nova master: Test ceph-multistore with a real image https://review.opendev.org/c/openstack/nova/+/860864 | 16:10 |
clarkb | I suspect the solution here is to make it clear to cloud operators that rescue has requirements x y z (I don't know what they all are yet) and that they should provide an image that meets those requirements | 16:11 |
melwitt | clarkb: hm yeah you are probably right it's only image properties, this section doesn't mention using a flavor to do it https://docs.openstack.org/nova/latest/user/rescue.html#stable-device-instance-rescue | 16:11 |
clarkb | also I wonder if nova should drop the default behavior of reusing the running image and instead force people to explicitly provide one | 16:12 |
clarkb | I suspect there are scenarios where reusing the image would work, but in the vast majority it seems unlikely | 16:13 |
clarkb | and that would help provide signal that something different is required here | 16:13 |
melwitt | I think we could do that in a new API microversion to avoid breaking anyone who is using it the old way and succeeding ... but the fact that openstackclient defaults to lowest microversion makes it more difficult to signal imho | 16:16 |
clarkb | ya and I think users could manually specify the same image if they really did need/want that | 16:18 |
clarkb | it just wouldn't be provided as a default (which I think users expect to work) | 16:18 |
melwitt | yeah, I think that makes sense | 16:19 |
opendevreview | Merged openstack/nova stable/yoga: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/861872 | 17:19 |
opendevreview | Amit Uniyal proposed openstack/nova stable/ussuri: add regression test case for bug 1978983 https://review.opendev.org/c/openstack/nova/+/862603 | 17:31 |
opendevreview | Amit Uniyal proposed openstack/nova stable/ussuri: For evacuation, ignore if task_state is not None https://review.opendev.org/c/openstack/nova/+/862604 | 17:31 |
opendevreview | melanie witt proposed openstack/nova-specs master: Re-propose spec for ephemeral storage encryption https://review.opendev.org/c/openstack/nova-specs/+/864138 | 18:45 |
darkhorse | Hi team | 19:29 |
darkhorse | class NovaSession(): | 19:29 |
darkhorse | def __init__(self): | 19:29 |
darkhorse | self.auth = v3.Password(KEYSTONE_URL, username=OPENSTACK_ADMIN, password=OPENSTACK_ADMIN_PASS, project_name=ADMIN_PROJECT, user_domain_id=DOMAIN_ID, | 19:29 |
darkhorse | project_domain_id=PROJECT_DOMAIN_ID) | 19:29 |
darkhorse | self.sess = session.Session(self.auth) | 19:29 |
darkhorse | self.nova2 = nova_client.Client(2, session=self.sess) | 19:29 |
darkhorse | I use this code to create nova session. I want to use internal IP address since my app runs on controller. I set KEYSTONE_URL to internal keystone endpoint address but the client still sends requests to nova public ip address. | 19:31 |
darkhorse | Is there a setting that I can tell the client to use internal address? | 19:31 |
opendevreview | Dan Smith proposed openstack/nova master: Test ceph-multistore with a real image https://review.opendev.org/c/openstack/nova/+/860864 | 19:34 |
melwitt | darkhorse: you might try the 'interface' kwarg to Client (interface=$the_name_of_your_internal_endpoint_in_the_service_catalog) which will get passed to the keystone adapter https://docs.openstack.org/keystoneauth/latest/api/keystoneauth1.adapter.html | 19:57 |
melwitt | there's also endpoint_override to provide the full url but the interface discovery is nicer I think if it works | 19:58 |
darkhorse | melwitt: thank you i ended up using nova_client.Client(2, session=self.sess, endpoint_type='internal') | 20:02 |
darkhorse | interface kwarg seems to be deprecated. | 20:02 |
melwitt | darkhorse: ok cool. it's the other way around I think, endpoint_type is an old name now used as an alias | 20:03 |
darkhorse | ok thank you. | 20:04 |
opendevreview | melanie witt proposed openstack/nova-specs master: Re-propose spec for ephemeral encryption for libvirt https://review.opendev.org/c/openstack/nova-specs/+/864147 | 22:30 |
*** dasm is now known as dasm|offp | 23:03 | |
*** dasm|offp is now known as dasm|off | 23:03 | |
opendevreview | melanie witt proposed openstack/nova-specs master: Re-propose per process healthchecks https://review.opendev.org/c/openstack/nova-specs/+/864150 | 23:44 |
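
Note on the endpoint question raised by darkhorse at 19:29: the Keystone URL only decides where the token is obtained. The compute URL is then looked up in the service catalog by interface, which is why pointing KEYSTONE_URL at the internal address did not change where nova requests went. The sketch below is a simplified, stdlib-only illustration of that lookup and of the `endpoint_override` alternative melwitt mentions. It is not keystoneauth's actual implementation; the helper name `pick_endpoint`, the sample catalog, and all URLs are assumptions made up for illustration.

```python
# Simplified illustration of interface-based endpoint selection.
# The catalog layout follows the Keystone v3 token format; the URLs and the
# helper name pick_endpoint are hypothetical, not part of any OpenStack library.

SAMPLE_CATALOG = [
    {
        "type": "compute",
        "name": "nova",
        "endpoints": [
            {"interface": "public", "url": "https://cloud.example.com:8774/v2.1"},
            {"interface": "internal", "url": "http://10.0.0.10:8774/v2.1"},
            {"interface": "admin", "url": "http://10.0.0.10:8774/v2.1"},
        ],
    },
]


def pick_endpoint(catalog, service_type, interface="public", endpoint_override=None):
    """Return the URL a client would use for the given service type."""
    # An explicit override skips catalog discovery entirely.
    if endpoint_override is not None:
        return endpoint_override
    for service in catalog:
        if service.get("type") != service_type:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") == interface:
                return endpoint["url"]
    raise LookupError(
        f"no {interface!r} endpoint for service type {service_type!r}"
    )


if __name__ == "__main__":
    # Without an interface the public URL is chosen, matching what darkhorse saw.
    print(pick_endpoint(SAMPLE_CATALOG, "compute"))
    # Asking for the internal interface selects the internal URL instead.
    print(pick_endpoint(SAMPLE_CATALOG, "compute", interface="internal"))
```

Interface discovery keeps the URL out of application config, so the app follows the catalog if the endpoint moves; the override is the fallback when the catalog has no suitable entry.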