opendevreview | Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host https://review.opendev.org/c/openstack/nova/+/864812 | 10:14 |
---|---|---|
opendevreview | Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host https://review.opendev.org/c/openstack/nova/+/864812 | 10:52 |
opendevreview | Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host https://review.opendev.org/c/openstack/nova/+/864812 | 11:37 |
admin1 | i have a server with 1 vm .. i want to do maintenance on this server ... when i run openstack server migrate .. it tells me compute host X could not be found .. | 12:16 |
admin1 | i look into the logs and it has a diff uuid | 12:17 |
admin1 | if i want to delete this host, it says it has instances, clear it first | 12:17 |
admin1 | so i am in a bit of catch22 | 12:17 |
admin1 | cannot fix without migrating, cannot migrate without fixing | 12:17 |
sean-k-mooney | admin1: it sounds like you changed the hostname on the server or the host value in the nova.conf | 12:19 |
admin1 | sean-k-mooney, i have always used the openstack-ansible playbook and never touched a manual setting | 12:19 |
sean-k-mooney | thats the only way the uuid would change | 12:19 |
admin1 | how would one fix this ? | 12:20 |
sean-k-mooney | you need to determine if the hostname changed first | 12:29 |
sean-k-mooney | but it likely will need db surgery if it did and you cant set it back | 12:30 |
admin1 | would changing the resource provider uuid for this hypervisor from old ( non existent) to new one help ? | 12:36 |
admin1 | i see that the UUID appears in only 1 field in the compute_nodes table | 12:37 |
admin1 | sean-k-mooney, i think the hostname changed from fqdn -> non-fqdn | 12:46 |
admin1 | hostname remained the same | 12:46 |
admin1 | sean-k-mooney, how to check own uuid ? | 13:11 |
admin1 | from the hypervisor | 13:11 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/nova master: compute: enhance compute evacuate instance to support target state https://review.opendev.org/c/openstack/nova/+/858383 | 13:19 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/nova master: api: extend evacuate instance to support target state https://review.opendev.org/c/openstack/nova/+/858384 | 13:19 |
sahid | o/ gibi sean-k-mooney I have added you change that you were looking for, hope that makes sense | 13:21 |
sahid | https://review.opendev.org/c/openstack/nova/+/858384/20/nova/api/openstack/compute/evacuate.py#104 | 13:21 |
sahid | s/you/the | 13:22 |
sean-k-mooney | sahid: admin1 sorry was on a call downstream. sahid ill try and take a look at your change in general later in the week but that section looks like i was expecting so i think that will be fine | 13:39 |
sean-k-mooney | admin1: that is unfortunate. the best way to resolve this issue would be to set the hostname back to the fqdn | 13:39 |
admin1 | i have one vm in this i need to migrate .. after that i can just delete /re-initialize it | 13:39 |
sahid | sean-k-mooney: no worries, thanks a lot for your return | 13:40 |
sean-k-mooney | can you check the instance.host value for that vm | 13:40 |
sean-k-mooney | admin1: the instance.host value is meant to match the host value in the nova.conf | 13:41 |
sean-k-mooney | if you have just one vm the simplest fix would be to set the nova.conf host value on that node to match the instance.host on the vm | 13:41 |
sean-k-mooney | then you should be able to cold migrate the vm | 13:41 |
sean-k-mooney | live migrate might also work depending on the vm. e.g. if you are using any special feature like sriov or cpu pinning then cold migration has a higher probability of working | 13:42 |
admin1 | i was not able to find hostname or name value in nova.conf | 13:43 |
sean-k-mooney | admin1: if its not set the default is socket.gethostname() | 13:43 |
sean-k-mooney | admin1: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.host | 13:44 |
sean-k-mooney | admin1: nova does not support changing the hostname because it corrupts our db. we have had a bad experience with customers doing this accidentally of late, to the point that we are now working on detecting it and preventing the compute agent from starting when it happens https://review.opendev.org/q/topic:bp%252Fstable-compute-uuids | 13:45 |
admin1 | Failed to create resource provider record in placement API for UUID 88b9b395-784f-4d78-8497-3d674f7dff64 .. Conflicting resource provider name: h20 already exists .. this is what I have | 13:46 |
admin1 | so question is where does this UUID come from ? | 13:46 |
sean-k-mooney | ah yes that makes sense | 13:48 |
sean-k-mooney | ok remove the host value | 13:48 |
sean-k-mooney | and update the instance.host for that one instance | 13:48 |
sean-k-mooney | the way the uuid is calculated today is we use the nova.conf host value to look for a compute service record with the same host value | 13:49 |
sean-k-mooney | *we look for a compute node record with the same host value not compute service | 13:49 |
admin1 | so wherever in database, h20 with old UUID appears, i need to just update it with the new 88b9b395-784f-4d78-8497-3d674f7dff64 uuid ? | 13:50 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/wallaby: [stable-only] Use os-brick from source in wallaby https://review.opendev.org/c/openstack/nova/+/865134 | 13:55 |
sean-k-mooney | admin1: no | 14:04 |
sean-k-mooney | you should leave the compute node alone and update the host value on the one instance that is affected | 14:04 |
sean-k-mooney | admin1: presumably its the full fqdn currently right | 14:04 |
sean-k-mooney | and the host is now just the hostname not fqdn | 14:04 |
sean-k-mooney | on the compute node | 14:05 |
sean-k-mooney | so you need to make them match then migrate it | 14:05 |
sean-k-mooney | we use the instance.host to determine the rpc endpoint of the compute service that manages it | 14:05 |
sean-k-mooney | admin1: so if the compute service name changed and you have just one vm the shortest way to fix it is update that one instance and then migrate it | 14:06 |
admin1 | right .. its full fqdn, but the issue is the current hostname is also not able to register into placement .. it says Failed to create resource provider record in placement API for UUID 88b9b395-784f-4d78-8497-3d674f7dff64 .. Conflicting resource provider name: h20 already exists | 14:06 |
admin1 | so just update for this one instance, node to h20 instead of h20.fqdn | 14:06 |
admin1 | host does show h20 .. node shows h20.fqdn | 14:07 |
sean-k-mooney | ok so i need you to check a few things all of which should be the same | 14:08 |
sean-k-mooney | we need to check that the instance.host value and service.host value are the same. | 14:08 |
sean-k-mooney | the hypervisor hostname and placement RP name need to be the same | 14:08 |
sean-k-mooney | and the compute node uuid and placement uuid need to match | 14:08 |
sean-k-mooney | and the compute node host value must match the instance.host and service.host values | 14:09 |
sean-k-mooney | those are the 4 things that need to align. | 14:09 |
sean-k-mooney | nova does not support changing the hostname or the [DEFAULT]/host value after the agent is first started on a physical server | 14:10 |
sean-k-mooney | changing either will corrupt the nova db and create issues in placement | 14:10 |
admin1 | " compute node uuid and placement uuid need to match" - where/how would I see those values ? | 14:11 |
admin1 | from the db ? | 14:11 |
sean-k-mooney | yep although you can actually get them from the api too | 14:12 |
sean-k-mooney | the placement uuid is just in the placement show output | 14:12 |
sean-k-mooney | the compute node uuid is in the hypervisor api if you use a new enough version | 14:12 |
sean-k-mooney | admin1: but yes you can get it in the cell db compute_nodes table | 14:12 |
admin1 | in placement, i already have h20 as 59cc8a37-cee4-4dbc-84bf-18f56366bb2d, and h20.fqdn as 7bf78a2d-ce88-4ba2-a5b5-27fa4479f887 .. but in the h20 nova-compute logs, it tries to register itself as 5fecf61b-feb6-4af4-82d9-e5f5245e6ae9 | 14:14 |
admin1 | a grep of the whole database dump shows that that UUID ... 59c is only in 2 places ..... resource_providers and compute_nodes | 14:17 |
admin1 | so if I update those 2 tables with the new uuid 5f that the node is trying to register as instead of the 59 in the db, would it fix ? | 14:17 |
sean-k-mooney | admin1: thats because you have presumably already deleted the old compute service entry for the host | 14:20 |
sean-k-mooney | a safer approach might be to remove the resource providers for that host in placement | 14:21 |
sean-k-mooney | allow the compute service to start up and register itself | 14:22 |
sean-k-mooney | then make sure the instance aligns and then migrate it | 14:22 |
sean-k-mooney | admin1: i want to make it very clear however that the hostname changing is one of the most destructive things that can happen to the nova/placement dbs and is very non trivial to recover from | 14:23 |
sean-k-mooney | if you remove the placement rp with/without the fqdn | 14:23 |
sean-k-mooney | it will allow the compute service to start | 14:24 |
sean-k-mooney | if you ensure the instance.host matches the running compute service you should then be able to manage it and migrate it | 14:24 |
admin1 | is there an api way to delete the entry from placement | 14:24 |
admin1 | instead from db | 14:24 |
admin1 | cli way | 14:24 |
sean-k-mooney | yes if it has no allocations | 14:24 |
sean-k-mooney | https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-delete | 14:25 |
sean-k-mooney | you might have to do https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-delete to delete the allocation for the vm first | 14:25 |
sean-k-mooney | once the compute service is running and registered in placement again | 14:26 |
sean-k-mooney | update the instance.host value to match the running service | 14:27 |
sean-k-mooney | optionally run https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement-heal-allocations | 14:27 |
sean-k-mooney | for the single instance | 14:27 |
sean-k-mooney | and then migrate it | 14:27 |
sean-k-mooney | heal allocations will restore the allocations in placement that you deleted to allow you to delete the resource provider | 14:28 |
sean-k-mooney | cold migration will fix the allocation on the destination in either case when you confirm the migration | 14:28 |
admin1 | UUID of the consumer -- is the UUID of the vm ? | 14:29 |
sean-k-mooney | yep | 14:29 |
sean-k-mooney | in this case at least | 14:29 |
sean-k-mooney | if you are cold migrating a vm it will also have a second allocation using the migration uuid | 14:30 |
admin1 | resource provider allocation show UUID ( of h20 ) shows blank, but delete gives Resource provider has allocations | 14:32 |
admin1 | so there could be some more allocations in the old uuid .. but not present in the virsh list that i can see | 14:33 |
sean-k-mooney | were you able to delete the fqdn version | 14:34 |
admin1 | yes | 14:34 |
admin1 | fqdn one is gone | 14:34 |
sean-k-mooney | and the vm has the fqdn currently | 14:34 |
sean-k-mooney | if so can you check what virsh hostname outputs | 14:34 |
admin1 | in the instances.node , its set to fqdn | 14:35 |
sean-k-mooney | is it the hostname or hostname.fqdn | 14:36 |
sean-k-mooney | or i guess hostname.domain | 14:36 |
admin1 | i was able to delete both now | 14:36 |
sean-k-mooney | oh ok good | 14:37 |
sean-k-mooney | so the compute agent should now be able to start | 14:37 |
admin1 | the GUI hypervisors showed the instances .. | 14:37 |
admin1 | it registered itself now .. | 14:38 |
sean-k-mooney | ya if you have db corruption like this its hard to resolve | 14:38 |
sean-k-mooney | so if its registered itself you just need to ensure the instance.host and service.host agree and you should be able to migrate | 14:38 |
admin1 | now when i try to migrate, using openstack server migrate, it says compute host h20 could not be found | 14:39 |
admin1 | so i guess its trying to refer to some other h20 | 14:39 |
sean-k-mooney | you see the compute service in openstack compute service list right and its up | 14:40 |
sean-k-mooney | oh did you update the compute service mappings in the api db | 14:40 |
admin1 | not the last part .. | 14:40 |
sean-k-mooney | you need to run nova-manage cell_v2 discover_hosts i think | 14:40 |
admin1 | that would be from inside the nova venv ? | 14:41 |
sean-k-mooney | the new compute service record needs to be mapped to the correct cell | 14:41 |
sean-k-mooney | ideally from one of the controllers with db access | 14:41 |
admin1 | ok | 14:41 |
admin1 | from the os utils as admin, or as nova in the nova venv | 14:41 |
sean-k-mooney | nova-manage uses the credential in the nova.conf | 14:41 |
admin1 | got it | 14:42 |
sean-k-mooney | so you should use the config the conductor uses | 14:42 |
sean-k-mooney | or ap | 14:42 |
sean-k-mooney | *or api | 14:42 |
sean-k-mooney | https://docs.openstack.org/nova/latest/cli/nova-manage.html#cell-v2-discover-hosts | 14:42 |
*** dasm|off is now known as dasm | 14:48 | |
admin1 | sean-k-mooney , thank you .. finally its migrating to another host | 14:50 |
sean-k-mooney | admin1: once you have completed that and confirmed the migration | 14:50 |
admin1 | i read the spec .. having a uuid that is associated with the server and not associated with hostname will fix issues like this in future | 14:50 |
sean-k-mooney | i would advise checking if any other computes have had a host name change | 14:50 |
sean-k-mooney | admin1: thats not really what the spec is going to do | 14:51 |
admin1 | in my case, grafana showed fqdn while others were non-fqdn, so another colleague decided to change the fqdn to just hostname only and not full fqdn to make the graphs sane | 14:51 |
sean-k-mooney | admin1: the spec will record the compute node uuid in a file so we can detect if the hostname changes | 14:51 |
sean-k-mooney | admin1: ya if they did that everywhere they would have severely corrupted the db | 14:52 |
sean-k-mooney | you would need to cold migrate every workload on the affected hosts to resolve it | 14:52 |
sean-k-mooney | its much much better to correct the hostname if no new instances have been created | 14:52 |
admin1 | yeah .. we normally ensure all is what is needed before deploying the first vm | 14:54 |
admin1 | but in this case, since the server was replaced and pxe put the fqdn, it slipped | 14:54 |
sean-k-mooney | ack | 14:54 |
admin1 | and it was corrected only after the first vm was deployed | 14:54 |
sean-k-mooney | i would advise auditing the rest to make sure there are no others in this state | 14:55 |
sean-k-mooney | the longer you leave hosts like this the harder it is to fix | 14:55 |
opendevreview | Merged openstack/os-vif stable/zed: Move mtu update request into ovsdb transaction https://review.opendev.org/c/openstack/os-vif/+/863993 | 15:13 |
*** tbachman_ is now known as tbachman | 15:43 | |
opendevreview | sean mooney proposed openstack/nova-specs master: add spec for fqdn in hostname https://review.opendev.org/c/openstack/nova-specs/+/862626 | 16:23 |
sean-k-mooney | gibi: dansmith melwitt ^ that hopefully has addressed all the outstanding comments | 16:24 |
admin1 | as an operator, i would prefer to not have fqdn but just hostnames .. because when it comes to monitoring and graphs, with fqdn it clutters the whole graph .. h20.location.dev.cloud.domain.com where the location.dev.cloud.domain.com is redundant in all | 16:26 |
sean-k-mooney | admin1: that spec is for vms | 16:26 |
sean-k-mooney | and on the compute nodes you are free to use either | 16:27 |
sean-k-mooney | i prefer using hostnames on the computes nodes too | 16:27 |
sean-k-mooney | admin1: for what its worth nova did not test and recommended against using fqdns for the compute node hostname for a very long time | 16:30 |
sean-k-mooney | some installers implemented it anyway and we kind of got stuck with supporting it | 16:30 |
sean-k-mooney | unfortunately if you want to use tls i think (not 100% sure) a fqdn is required for the certs | 16:31 |
sean-k-mooney | that is the main reason they started changing to FQDNs as far as i am aware | 16:31 |
opendevreview | sean mooney proposed openstack/nova stable/xena: Record SRIOV PF MAC in the binding profile https://review.opendev.org/c/openstack/nova/+/864933 | 17:08 |
opendevreview | sean mooney proposed openstack/nova stable/xena: Remove double mocking https://review.opendev.org/c/openstack/nova/+/864934 | 17:08 |
opendevreview | sean mooney proposed openstack/nova stable/xena: Remove double mocking... again https://review.opendev.org/c/openstack/nova/+/864935 | 17:08 |
opendevreview | sean mooney proposed openstack/nova stable/xena: Add compute restart capability for libvirt func tests https://review.opendev.org/c/openstack/nova/+/864936 | 17:08 |
opendevreview | sean mooney proposed openstack/nova stable/xena: enable blocked VDPA move operations https://review.opendev.org/c/openstack/nova/+/864937 | 17:08 |
opendevreview | Merged openstack/os-vif stable/yoga: Move mtu update request into ovsdb transaction https://review.opendev.org/c/openstack/os-vif/+/863994 | 18:18 |
opendevreview | Merged openstack/nova stable/zed: Handle "no RAM info was set" migration case https://review.opendev.org/c/openstack/nova/+/860732 | 20:59 |
*** tbachman_ is now known as tbachman | 21:04 | |
opendevreview | Merged openstack/nova master: Update contributor guide for 2023.1 Antelope https://review.opendev.org/c/openstack/nova/+/858238 | 21:50 |
*** dasm is now known as dasm|off | 22:01 | |
opendevreview | Ghanshyam proposed openstack/nova master: Update gate jobs as per the 2023.1 cycle testing runtime https://review.opendev.org/c/openstack/nova/+/861111 | 22:08 |
opendevreview | Ghanshyam proposed openstack/os-vif master: Update gate jobs as per the 2023.1 cycle testing runtime https://review.opendev.org/c/openstack/os-vif/+/861468 | 22:08 |
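
The four things sean-k-mooney says must align (14:08-14:09) can be written down as a small consistency check. This is only a sketch: the `HostRecords` class, its field names, and `find_mismatches` are hypothetical helpers made up for illustration, not part of nova or placement, and the values are assumed to have been collected by hand from the cell db, the hypervisor API, and the placement resource provider output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostRecords:
    """Hand-collected values for one compute host (hypothetical field names)."""
    instance_host: str        # host value on the affected instance
    service_host: str         # host value of the nova-compute service
    compute_node_host: str    # host value on the compute node record
    hypervisor_hostname: str  # hypervisor hostname on the compute node record
    placement_rp_name: str    # resource provider name in placement
    compute_node_uuid: str    # uuid on the compute node record
    placement_rp_uuid: str    # resource provider uuid in placement


def find_mismatches(r: HostRecords) -> List[str]:
    """Return one message per alignment rule from the discussion that is violated."""
    problems = []
    if r.instance_host != r.service_host:
        problems.append("instance.host and service.host differ")
    if r.hypervisor_hostname != r.placement_rp_name:
        problems.append("hypervisor hostname and placement RP name differ")
    if r.compute_node_uuid != r.placement_rp_uuid:
        problems.append("compute node uuid and placement uuid differ")
    if not (r.compute_node_host == r.instance_host == r.service_host):
        problems.append("compute node host does not match instance.host and service.host")
    return problems


if __name__ == "__main__":
    # Illustrative values only, loosely modelled on the uuids quoted in the log.
    records = HostRecords(
        instance_host="h20",
        service_host="h20",
        compute_node_host="h20",
        hypervisor_hostname="h20",
        placement_rp_name="h20",
        compute_node_uuid="5fecf61b-feb6-4af4-82d9-e5f5245e6ae9",
        placement_rp_uuid="59cc8a37-cee4-4dbc-84bf-18f56366bb2d",
    )
    for problem in find_mismatches(records):
        print(problem)
```

An empty result only means the four values agree with each other. It does not show that the compute service is mapped to a cell, which in the log still needed `nova-manage cell_v2 discover_hosts`.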