Monday, 2022-11-21

opendevreview	Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host https://review.opendev.org/c/openstack/nova/+/864812	10:14
opendevreview	Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host https://review.opendev.org/c/openstack/nova/+/864812	10:52
opendevreview	Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host https://review.opendev.org/c/openstack/nova/+/864812	11:37
admin1	i have a server with 1 vm .. i want to do maintenance on this server ... when i run openstack server migrate .. it tells me compute host X could not be found ..	12:16
admin1	i look into the logs and it has a diff uuid	12:17
admin1	if i want to delete this host, it says it has instances, clear it first	12:17
admin1	so i am in a bit of catch22	12:17
admin1	cannot fix without migrating, cannot migrate without fixing	12:17
sean-k-mooney	admin1: it sounds like you changed the hostname on the server or the host value in the nova.conf	12:19
admin1	sean-k-mooney, i have always used the openstack-ansible playbook and never touched a manual setting	12:19
sean-k-mooney	thats the only way the uuid would change	12:19
admin1	how would one fix this ?	12:20
sean-k-mooney	you need to determine if the hostname changed first	12:29
sean-k-mooney	but it likely will need db surgery if it did and you cant set it back	12:30
admin1	would changing the resource provider uuid for this hypervisor from old ( non existent) to new one help ?	12:36
admin1	i see that the UUID appears in only 1 field in the compute_nodes table	12:37
admin1	sean-k-mooney, i think the hostname changed from fqdn -> non-fqdn	12:46
admin1	hostname remained the same	12:46
admin1	sean-k-mooney, how to check own uuid ?	13:11
admin1	from the hypervisor	13:11
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/nova master: compute: enhance compute evacuate instance to support target state https://review.opendev.org/c/openstack/nova/+/858383	13:19
opendevreview	Sahid Orentino Ferdjaoui proposed openstack/nova master: api: extend evacuate instance to support target state https://review.opendev.org/c/openstack/nova/+/858384	13:19
sahid	o/ gibi sean-k-mooney I have added you change that you were looking for, hope that makes sense	13:21
sahid	https://review.opendev.org/c/openstack/nova/+/858384/20/nova/api/openstack/compute/evacuate.py#104	13:21
sahid	s/you/the	13:22
sean-k-mooney	sahid: admin1 sorry was on a call downstream. sahid i'll try and take a look at your change in general later in the week but that section looks like what i was expecting so i think that will be fine	13:39
sean-k-mooney	admin1: that is unfortunate. the best way to resolve this issue would be to set the hostname back to the fqdn	13:39
admin1	i have one vm in this i need to migrate .. after that i can just delete /re-initialize it	13:39
sahid	sean-k-mooney: no worries, thanks a lot for your return	13:40
sean-k-mooney	can you check the instance.host value for that vm	13:40
sean-k-mooney	admin1: the instance.host value is meant to match the host value in the nova.conf	13:41
sean-k-mooney	if you have just one vm the simplest fix would be to set the nova.conf host value on that node to match the instance.host on the vm	13:41
sean-k-mooney	then you should be able to cold migrate the vm	13:41
sean-k-mooney	live migrate might also work depending on the vm. e.g. if you are using any special feature like sriov or cpu pinning then cold migration has a higher probability of working	13:42
admin1	i was not able to find hostname or name value in nova.conf	13:43
sean-k-mooney	admin1: if it's not set the default is socket.gethostname()	13:43
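
To illustrate the default mentioned above: when [DEFAULT]/host is unset, nova falls back to socket.gethostname(). The sketch below prints that value next to socket.getfqdn(), which makes a short-name vs FQDN difference on a node easy to spot. The helper names are hypothetical and are not part of nova.

```python
import socket


def default_nova_host() -> str:
    """Value used for [DEFAULT]/host when it is unset, per the discussion above."""
    return socket.gethostname()


def looks_like_fqdn(name: str) -> bool:
    """Hypothetical heuristic: treat any dotted name as FQDN-style."""
    return "." in name.strip(".")


if __name__ == "__main__":
    host = default_nova_host()
    kind = "fqdn-style" if looks_like_fqdn(host) else "short name"
    print(f"gethostname(): {host} ({kind})")
    print(f"getfqdn():     {socket.getfqdn()}")
```

Running this on the compute node before and after a reprovision shows whether the implicit host value would change.
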
sean-k-mooney	admin1: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.host	13:44
sean-k-mooney	admin1: nova does not support changing the hostname because it corrupts our db. we have had a bad experience with customers doing this accidentally of late, to the point that we are now working on detecting it and preventing the compute agent from starting when it happens https://review.opendev.org/q/topic:bp%252Fstable-compute-uuids	13:45
admin1	Failed to create resource provider record in placement API for UUID 88b9b395-784f-4d78-8497-3d674f7dff64 .. Conflicting resource provider name: h20 already exists .. this is what I have	13:46
admin1	so question is where does this UUID come from ?	13:46
sean-k-mooney	ah yes that makes sense	13:48
sean-k-mooney	ok remove the host value	13:48
sean-k-mooney	and update the instance.host for that one instance	13:48
sean-k-mooney	the way the uuid is calculated today is we use the nova.conf host value to look for a compute service record with the same host value	13:49
sean-k-mooney	*we look for a compute node record with the same host value not compute service	13:49
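
A toy model of the lookup described above may help explain the "Conflicting resource provider name" error. It is a sketch under assumptions, not nova or placement code: it assumes that a missing compute node record leads to a fresh UUID, and that placement rejects a name already bound to a different UUID. All function names and the placeholder UUID are hypothetical.

```python
import uuid


class ProviderNameConflict(Exception):
    """Raised when a provider name is already bound to a different UUID."""


def node_uuid_for_host(compute_nodes: dict, host: str) -> str:
    """Reuse the UUID of the record whose host matches, else create one."""
    if host not in compute_nodes:
        # Assumption: no matching record means a brand-new UUID is generated.
        compute_nodes[host] = str(uuid.uuid4())
    return compute_nodes[host]


def register_provider(providers: dict, name: str, rp_uuid: str) -> None:
    """providers maps name -> UUID, mimicking a unique-name constraint."""
    existing = providers.get(name)
    if existing is not None and existing != rp_uuid:
        raise ProviderNameConflict(
            f"Conflicting resource provider name: {name} already exists"
        )
    providers[name] = rp_uuid


# Placement still has "h20" under an old (placeholder) UUID, while the
# compute node record for that host value no longer exists.
providers = {"h20": "00000000-0000-0000-0000-000000000000"}
compute_nodes = {}

new_uuid = node_uuid_for_host(compute_nodes, "h20")
try:
    register_provider(providers, "h20", new_uuid)
except ProviderNameConflict as exc:
    print(exc)
```

In this model the conflict clears once the stale provider entry is removed, after which the node can register under its new UUID.
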
admin1	so wherever in database, h20 with old UUID appears, i need to just update it with the new 88b9b395-784f-4d78-8497-3d674f7dff64 uuid ?	13:50
opendevreview	Alexey Stupnikov proposed openstack/nova stable/wallaby: [stable-only] Use os-brick from source in wallaby https://review.opendev.org/c/openstack/nova/+/865134	13:55
sean-k-mooney	admin1: no	14:04
sean-k-mooney	you should leave the compute node alone and update the host value on the one instance that is affected	14:04
sean-k-mooney	admin1: presumably it's the full fqdn currently, right	14:04
sean-k-mooney	and the host is now just the hostname, not the fqdn	14:04
sean-k-mooney	on the compute node	14:05
sean-k-mooney	so you need to make them match then migrate it	14:05
sean-k-mooney	we use the instance.host to determine the rpc endpoint of the compute service that manages it	14:05
sean-k-mooney	admin1: so if the compute service name changed and you have just one vm the shortest way to fix it is to update that one instance and then migrate it	14:06
admin1	right .. its full fqdn, but the issue is the current hostname is also not able to register into placement .. it says Failed to create resource provider record in placement API for UUID 88b9b395-784f-4d78-8497-3d674f7dff64 .. Conflicting resource provider name: h20 already exists	14:06
admin1	so just update for this one instance, node to h20 instead of h20.fqdn	14:06
admin1	host does show h20 .. node shows h20.fqdn	14:07
sean-k-mooney	ok so i need you to check a few things all of which should be the same	14:08
sean-k-mooney	we need to check that the instance.host value and service.host value are the same.	14:08
sean-k-mooney	the hypervisor hostname and placement RP name need to be the same	14:08
sean-k-mooney	and the compute node uuid and placement uuid need to match	14:08
sean-k-mooney	and the compute node host value must match the instance.host and service.host values	14:09
sean-k-mooney	those are the 4 things that need to align.	14:09
sean-k-mooney	nova does not support changing the hostname or the [DEFAULT]/host value after the agent is first started on a physical server	14:10
sean-k-mooney	changing either will corrupt the nova db and create issues in placement	14:10
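
The four alignment checks listed above can be written down as a small comparison. This is a minimal sketch: the HostState class and the function name are hypothetical, and the values have to be collected by hand from the cell db or the APIs. The example values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    instance_host: str        # instances.host for the affected VM
    service_host: str         # host of the nova-compute service record
    compute_node_host: str    # compute_nodes.host
    hypervisor_hostname: str  # compute_nodes.hypervisor_hostname
    rp_name: str              # resource provider name in placement
    compute_node_uuid: str    # compute_nodes.uuid
    rp_uuid: str              # resource provider UUID in placement


def misalignments(s: HostState) -> list[str]:
    """Return a description of every check that fails; empty means aligned."""
    problems = []
    if s.instance_host != s.service_host:
        problems.append("instance.host != service.host")
    if s.hypervisor_hostname != s.rp_name:
        problems.append("hypervisor hostname != placement RP name")
    if s.compute_node_uuid != s.rp_uuid:
        problems.append("compute node uuid != placement uuid")
    if not (s.compute_node_host == s.instance_host == s.service_host):
        problems.append("compute node host != instance.host/service.host")
    return problems


# Illustrative values: the instance still points at the FQDN host value.
state = HostState(
    instance_host="h20.fqdn", service_host="h20", compute_node_host="h20",
    hypervisor_hostname="h20", rp_name="h20",
    compute_node_uuid="uuid-a", rp_uuid="uuid-a",
)
for problem in misalignments(state):
    print(problem)
```
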
admin1	"compute node uuid and placement uuid need to match" - where/how would I see those values ?	14:11
admin1	from the db ?	14:11
sean-k-mooney	yep although you can actually get them from the api too	14:12
sean-k-mooney	the placement uuid is just in the placement show output	14:12
sean-k-mooney	the compute node uuid is in the hypervisor api if you use a new enough version	14:12
sean-k-mooney	admin1: but yes you can get it in the cell db compute_nodes table	14:12
admin1	in placement, i already have h20 as 59cc8a37-cee4-4dbc-84bf-18f56366bb2d, and h20.fqdn as 7bf78a2d-ce88-4ba2-a5b5-27fa4479f887 .. but in the h20 nova-compute logs, it tries to register itself as 5fecf61b-feb6-4af4-82d9-e5f5245e6ae9	14:14
admin1	a grep of the whole database dump shows that that UUID ... 59c is only in 2 places ..... resource_providers and compute_nodes	14:17
admin1	so if I update those 2 tables with the new uuid 5f that the node is trying to register as instead of the 59 in the db, would it fix ?	14:17
sean-k-mooney	admin1: that's because you have presumably already deleted the old compute service entry for the host	14:20
sean-k-mooney	a safer approach might be to remove the resource providers for that host in placement	14:21
sean-k-mooney	allow the compute service to start up and register itself	14:22
sean-k-mooney	then make sure the instance aligns and then migrate it	14:22
sean-k-mooney	admin1: i want to make it very clear however that the hostname changing is one of the most destructive things that can happen to the nova/placement dbs and is very non trivial to recover from	14:23
sean-k-mooney	if you remove the placement rp with/without the fqdn	14:23
sean-k-mooney	it will allow the compute service to start	14:24
sean-k-mooney	if you ensure the instance.host matches the running compute service you should then be able to manage it and migrate it	14:24
admin1	is there an api way to delete the entry from placement	14:24
admin1	instead from db	14:24
admin1	cli way	14:24
sean-k-mooney	yes if it has no allocations	14:24
sean-k-mooney	https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-delete	14:25
sean-k-mooney	you might have to do https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-delete to delete the allocation for the vm first	14:25
sean-k-mooney	once the compute service is running and registered in placement again	14:26
sean-k-mooney	update the instance.host value to match the running service	14:27
sean-k-mooney	optionally run https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement-heal-allocations	14:27
sean-k-mooney	for the single instance	14:27
sean-k-mooney	and then migrate it	14:27
sean-k-mooney	heal allocations will restore the allocations in placement that you deleted to allow you to delete the resource provider	14:28
sean-k-mooney	cold migration will fix the allocation on the destination in either case when you confirm the migration	14:28
admin1	UUID of the consumer -- is the UUID of the vm ?	14:29
sean-k-mooney	yep	14:29
sean-k-mooney	in this case at least	14:29
sean-k-mooney	if you are cold migrating a vm it will also have a second allocation using the migration uuid	14:30
admin1	resource provider allocation show UUID ( of h20 ) shows blank, but delete gives Resource provider has allocations	14:32
admin1	so there could be some more allocations in the old uuid .. but not present in the virsh list that i can see	14:33
sean-k-mooney	were you able to delete the fqdn version	14:34
admin1	yes	14:34
admin1	fqdn one is gone	14:34
sean-k-mooney	and the vm has the fqdn currently	14:34
sean-k-mooney	if so can you check what virsh hostname outputs	14:34
admin1	in the instances.node , its set to fqdn	14:35
sean-k-mooney	is it the hostname or hostname.fqdn	14:36
sean-k-mooney	or i guess hostname.domain	14:36
admin1	i was able to delete both now	14:36
sean-k-mooney	oh ok good	14:37
sean-k-mooney	so the compute agent should now be able to start	14:37
admin1	the GUI hypervisors showed the instances ..	14:37
admin1	it registered itself now ..	14:38
sean-k-mooney	ya if you have db corruption like this its hard to resolve	14:38
sean-k-mooney	so if it has registered itself you just need to ensure the instance.host and service.host agree and you should be able to migrate	14:38
admin1	now when i try to migrate, using openstack server migrate, it says compute host h20 could not be found	14:39
admin1	so i guess its trying to refer to some other h20	14:39
sean-k-mooney	you see the compute service in openstack compute service list right and its up	14:40
sean-k-mooney	oh did you update the compute service mappings in the api db	14:40
admin1	not the last part ..	14:40
sean-k-mooney	you need to run nova-manage cell_v2 discover_hosts i think	14:40
admin1	that would be from inside the nova venv ?	14:41
sean-k-mooney	the new compute service record needs to be mapped to the correct cell	14:41
sean-k-mooney	ideally from one of the controllers with db access	14:41
admin1	ok	14:41
admin1	from the os utils as admin, or as nova in the nova venv	14:41
sean-k-mooney	nova-manage uses the credential in the nova.conf	14:41
admin1	got it	14:42
sean-k-mooney	so you should use the config the conductor uses	14:42
sean-k-mooney	or ap	14:42
sean-k-mooney	*or api	14:42
sean-k-mooney	https://docs.openstack.org/nova/latest/cli/nova-manage.html#cell-v2-discover-hosts	14:42
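
Pulling together the steps discussed so far, the recovery order can be sketched as a dry-run plan. The function below only builds and prints the command lines; it executes nothing. The function name and the placeholder arguments are hypothetical, the manual steps are left as comments because no CLI command was given for them, and the exact options should be checked against the linked docs for the release in use.

```python
import shlex


def recovery_plan(instance_uuid, stale_rp_uuids, server):
    """Return the discussed steps, in order, as printable lines."""
    plan = [
        # Free the VM's allocation so the stale provider can be deleted.
        shlex.join(["openstack", "resource", "provider", "allocation",
                    "delete", instance_uuid]),
    ]
    # Remove the stale resource providers (short-name and/or FQDN variants).
    plan += [
        shlex.join(["openstack", "resource", "provider", "delete", rp])
        for rp in stale_rp_uuids
    ]
    plan += [
        "# manual: start nova-compute and wait for it to register in placement",
        "# manual: make instance.host match the running compute service",
        # Optional: recreate the allocation that was deleted above.
        shlex.join(["nova-manage", "placement", "heal_allocations",
                    "--instance", instance_uuid]),
        # Map the new compute service record to its cell.
        shlex.join(["nova-manage", "cell_v2", "discover_hosts"]),
        shlex.join(["openstack", "server", "migrate", server]),
        "# manual: confirm the migration once it completes",
    ]
    return plan


if __name__ == "__main__":
    for line in recovery_plan("<instance-uuid>", ["<stale-rp-uuid>"], "<server>"):
        print(line)
```
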
*** dasm|off is now known as dasm		14:48
admin1	sean-k-mooney , thank you .. finally its migrating to another host	14:50
sean-k-mooney	admin1: once you have completed that and confirmed the migration	14:50
admin1	i read the spec .. having a uuid that is associated with the server and not associated with hostname will fix issues like this in future	14:50
sean-k-mooney	i would advise checking if any other computes have had a host name change	14:50
sean-k-mooney	admin1: that's not really what the spec is going to do	14:51
admin1	in my case, grafana showed fqdn while others were non-fqdn, so another colleague decided to change the fqdn to just hostname only and not full fqdn to make the graphs sane	14:51
sean-k-mooney	admin1: the spec will record the compute node uuid in a file so we can detect if the hostname changes	14:51
sean-k-mooney	admin1: ya if they did that everywhere they would have severely corrupted the db	14:52
sean-k-mooney	you would need to cold migrate every workload on the affected hosts to resolve it	14:52
sean-k-mooney	its much much better to correct the hostname if no new instances have been created	14:52
admin1	yeah .. we normally ensure all is what is needed before deploying the first vm	14:54
admin1	but in this case, since the server was replaced and pxe put the fqdn, it slipped	14:54
sean-k-mooney	ack	14:54
admin1	and it was corrected only after the first vm was deployed	14:54
sean-k-mooney	i would advise auditing the rest to make sure there are no others in this state	14:55
sean-k-mooney	the longer you leave hosts like this the harder it is to fix	14:55
opendevreview	Merged openstack/os-vif stable/zed: Move mtu update request into ovsdb transaction https://review.opendev.org/c/openstack/os-vif/+/863993	15:13
*** tbachman_ is now known as tbachman		15:43
opendevreview	sean mooney proposed openstack/nova-specs master: add spec for fqdn in hostname https://review.opendev.org/c/openstack/nova-specs/+/862626	16:23
sean-k-mooney	gibi: dansmith melwitt ^ that hopefully has addressed all the outstanding comments	16:24
admin1	as an operator, i would prefer to not have fqdn but just hostnames .. because when it comes to monitoring and graphs, with fqdn it clutters the whole graph .. h20.location.dev.cloud.domain.com where the location.dev.cloud.domain.com is redundant in all	16:26
sean-k-mooney	admin1: that spec is for vms	16:26
sean-k-mooney	and on the compute nodes you are free to use either	16:27
sean-k-mooney	i prefer using hostnames on the computes nodes too	16:27
sean-k-mooney	admin1: for what its worth nova did not test and recommended against using fqdns for the compute node hostname for a very long time	16:30
sean-k-mooney	some installers implemented it anyway and we kind of got stuck with supporting it	16:30
sean-k-mooney	unfortunately if you want to use tls i think (not 100% sure) a fqdn is required for the certs	16:31
sean-k-mooney	that is the main reason they started changing to FQDNs as far as i am aware	16:31
opendevreview	sean mooney proposed openstack/nova stable/xena: Record SRIOV PF MAC in the binding profile https://review.opendev.org/c/openstack/nova/+/864933	17:08
opendevreview	sean mooney proposed openstack/nova stable/xena: Remove double mocking https://review.opendev.org/c/openstack/nova/+/864934	17:08
opendevreview	sean mooney proposed openstack/nova stable/xena: Remove double mocking... again https://review.opendev.org/c/openstack/nova/+/864935	17:08
opendevreview	sean mooney proposed openstack/nova stable/xena: Add compute restart capability for libvirt func tests https://review.opendev.org/c/openstack/nova/+/864936	17:08
opendevreview	sean mooney proposed openstack/nova stable/xena: enable blocked VDPA move operations https://review.opendev.org/c/openstack/nova/+/864937	17:08
opendevreview	Merged openstack/os-vif stable/yoga: Move mtu update request into ovsdb transaction https://review.opendev.org/c/openstack/os-vif/+/863994	18:18
opendevreview	Merged openstack/nova stable/zed: Handle "no RAM info was set" migration case https://review.opendev.org/c/openstack/nova/+/860732	20:59
*** tbachman_ is now known as tbachman		21:04
opendevreview	Merged openstack/nova master: Update contributor guide for 2023.1 Antelope https://review.opendev.org/c/openstack/nova/+/858238	21:50
*** dasm is now known as dasm|off		22:01
opendevreview	Ghanshyam proposed openstack/nova master: Update gate jobs as per the 2023.1 cycle testing runtime https://review.opendev.org/c/openstack/nova/+/861111	22:08
opendevreview	Ghanshyam proposed openstack/os-vif master: Update gate jobs as per the 2023.1 cycle testing runtime https://review.opendev.org/c/openstack/os-vif/+/861468	22:08
