*** mhen_ is now known as mhen | 02:01 | |
*** yadnesh|away is now known as yadnesh | 04:35 | |
*** luigi is now known as luigi-mtg | 07:05 | |
*** rlandy|out is now known as rlandy | 10:37 | |
*** blarnath is now known as d34dh0r53 | 12:27 | |
*** yadnesh is now known as yadnesh|away | 13:13 | |
jpic_ | hi all, i'm having issues migrating instances, it fails with "cannot find compute host compute-008". nova hypervisor-list shows FQDNs as the hypervisor hostnames, but openstack server show only shows short hostnames for OS-EXT-SRV-ATTR:{host,hypervisor_hostname}. how do you suggest I fix this? | 14:15 |
jpic_ | i suppose we need matching hypervisor hostnames in both the instances' host attributes and the hypervisor names, so I'm going to have to change something one way or another, right? | 14:16 |
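For reference, a rough way to see the naming mismatch jpic_ describes (a sketch; <vm-uuid> is a placeholder):

    # hypervisor names as the API reports them (here reportedly FQDNs, e.g. compute-008.example.org)
    openstack hypervisor list -c "Hypervisor Hostname"
    # host attributes stored on the instance (here reportedly short names, e.g. compute-008)
    openstack server show <vm-uuid> \
        -c OS-EXT-SRV-ATTR:host -c OS-EXT-SRV-ATTR:hypervisor_hostname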
lowercase | it would be best if you go to the nova-scheduler logs and look at the error there | 14:50 |
lowercase | If the nova scheduler states, "no suitable hypervisor found" then yes. | 14:51 |
lowercase | as for the hypervisor hostnames? I'm not quite sure what you mean there. This might be a stupid answer, but all hypervisors should have unique hostnames. | 14:52 |
lowercase | You could look at hypervisor attributes in the admin console, and confirm that the different hypervisors have similar attributes. | 14:53 |
lowercase | "but in openstack server show it shows only short hostnames for OS-EXT-SRV-ATTR:{host,hypervisor_hostname}" I never did, those attributes are different in my environments as well. I just wrote code to handle the difference for me. | 14:54 |
lowercase | You could also avoid typing in a destination host: just do `nova live-migration <server>` and don't specify a destination. The Nova scheduler will then pick the most suitable hypervisor and migrate the server there. | 14:56 |
lowercase | jpic_: ^ | 14:56 |
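Both forms of that destination-less migration look roughly like this (a sketch; <vm-uuid> is a placeholder, and the microversion flag is the one jpic_ uses below):

    # let the scheduler pick the target host instead of naming one
    nova live-migration <vm-uuid>
    # equivalent with the unified CLI, simply omitting --host
    openstack server migrate --live-migration <vm-uuid> --os-compute-api-version=2.30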
jpic_ | lowercase: it's not "no suitable hypervisor", it says it can't find the hypervisor where the instance is actually located | 14:59 |
jpic_ | it says it can't find compute-008 where the VM is at, while compute-008 is indeed the name of the compute service, it's not the name from `nova hypervisor-list` output so I thought we had a naming problem there | 15:00 |
lowercase | nova live-migration takes the vm uuid as a param. | 15:02 |
lowercase | so the hypervisor is online and nova-compute is running? | 15:03 |
jpic_ | yes, it's up, the VM is even up | 15:05 |
lowercase | Confirmed by doing `nova service-list`? | 15:08 |
lowercase | the vm can be up with the nova-compute service being down | 15:11 |
jpic_ | yes, it's up in nova service-list, and openstack compute service list | 15:23 |
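A narrower check for that one host might look like this (sketch):

    # Status should be 'enabled' and State should be 'up' for compute-008
    openstack compute service list --service nova-compute --host compute-008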
lowercase | okay, can you please share the command you are using to migrate an instance? | 15:29 |
jpic_ | openstack server migrate --live-migration 84cfe8c4-e28a-4b2e-9586-3207492710dc --host compute-009 --os-compute-api-version=2.30 | 15:30 |
lowercase | 2.3 huh. What version of openstack is your environment running? | 15:31 |
lowercase | juno ish | 15:31 |
jpic_ | train ... | 15:31 |
lowercase | lol, you are in luck that i still have 2 environments that run train. | 15:31 |
lowercase | everything else i am up to date. | 15:31 |
jpic_ | they are using an os-ansible playbooks fork made by a company that has shut down, and they're expecting me to quote for a "migrate to kolla-ansible" xD | 15:32 |
lowercase | so uh, the openstack command isn't really fully fledged then. Do you have the `nova` command? | 15:32 |
lowercase | naw, make new environment. | 15:32 |
lowercase | lol | 15:32 |
jpic_ | yes there's the nova command | 15:33 |
lowercase | yea, try just doing `nova live-migration <vm-uuid>` | 15:33 |
jpic_ | well i guess i need to sort out some policy: ERROR (Forbidden): Policy doesn't allow os_compute_api:os-migrate-server:migrate_live to be performed. | 15:34 |
lowercase | i love a good helpful error | 15:34 |
jpic_ | thanks! | 15:34 |
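The policy name in that error is the one to open up. A minimal sketch of what an override might look like in nova's policy file, assuming a Train-era policy.yaml and that limiting the action to admins is acceptable:

    # /etc/nova/policy.yaml (sketch -- adapt the rule to your own policy model)
    "os_compute_api:os-migrate-server:migrate_live": "rule:admin_api"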
jpic_ | lowercase: actually that wasn't the VM causing the issue i was trying to describe, see https://dpaste.com/GQXF2XAB8 | 15:41 |
jpic_ | openstack server migrate is not finding the compute where the VM is actually sitting ... | 15:42 |
jpic_ | `nova live-migration UUID` doesn't output anything, but it doesn't do anything either: https://dpaste.com/AYVSJTEB2 | 15:43 |
lowercase | please perform `nova migration-list`, your migration will be the top one | 15:48 |
jpic_ | yes, it's in error | 15:48 |
lowercase | view the scheduler logs | 15:49 |
lowercase | and then check the nova-conductor logs if you don't see anything there | 15:49 |
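Roughly, the checks being suggested (a sketch; the log paths assume the /var/log/nova layout that shows up later in this log):

    nova migration-list
    # then look for the failure reason around the time of the attempt
    grep -i error /var/log/nova/nova-scheduler.log
    grep -i error /var/log/nova/nova-conductor.log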
jpic_ | Failed to compute_task_migrate_server: L'hôte de calcul compute-008 ne peut pas être trouvé.: ComputeHostNotFound: L'hôte de calcul compute-008 ne peut pas être trouvé. | 15:50 |
jpic_ | which means "the compute host compute-008 cannot be found" | 15:50 |
jpic_ | apparently they got a joyful locale | 15:50 |
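ComputeHostNotFound from the conductor can also mean the host record isn't (or isn't correctly) mapped into a cell; whether that is the cause here is only a guess, but the check is cheap (these nova-manage subcommands exist in Train):

    # list the compute hosts nova has mapped into cells
    nova-manage cell_v2 list_hosts
    # if compute-008 is missing, or only present under a different name, re-run discovery
    nova-manage cell_v2 discover_hosts --verbose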
lowercase | here is a stupid thing to check | 15:50 |
lowercase | go check if compute-008 is in the same AZ as 09 | 15:50 |
lowercase | nvm it is | 15:51 |
lowercase | sorry | 15:51 |
jpic_ | openstack availability zone list does show 4 lines with the same name "nova" | 15:52 |
lowercase | yeah, nova is the default name for an az | 15:52 |
jpic_ | but why do we have 4 of them? | 15:52 |
lowercase | compute-008 the host hypervisor cannot be found. | 15:53 |
jpic_ | check it out https://dpaste.com/HRYHSSZMQ | 15:53 |
lowercase | restart nova-compute on 008 ? | 15:53 |
lowercase | oh no. | 15:54 |
lowercase | they have 4 identical zones named nova?! | 15:54 |
jpic_ | apparently!! | 15:54 |
lowercase | yikes, start deleting them one by one and see if any hypervisors are assigned to the other novas | 15:54 |
jpic_ | no my bad they don't, they have one nova zone only for computes | 15:55 |
jpic_ | https://dpaste.com/CHGU6TKXT | 15:55 |
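To see which hosts actually sit in which zone/aggregate (a sketch; <aggregate> is a placeholder):

    openstack availability zone list --long
    # compute AZs are backed by host aggregates
    openstack aggregate list --long
    openstack aggregate show <aggregate>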
lowercase | what do the nova-compute logs say on 008? | 15:55 |
lowercase | you may need to enable debug logging at this point | 15:56 |
jpic_ | ok restarting in debug and retrying | 15:56 |
lowercase | might as well put the scheduler and conductor logs in debug as well | 15:57 |
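Turning on debug for those services is the standard oslo option; a sketch, assuming systemd service names that vary by distro/deployment:

    # /etc/nova/nova.conf on the relevant hosts:
    #   [DEFAULT]
    #   debug = True
    systemctl restart nova-compute                    # on compute-008
    systemctl restart nova-scheduler nova-conductor   # on the controller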
lowercase | i would still remove those extra AZs | 15:57 |
jpic_ | there's a nova AZ for volume, another one for storage and so on apparently | 15:57 |
lowercase | so that's 3 novas, what's the fourth one for lol | 15:58 |
jpic_ | ok there are 2 AZs with name nova for network | 15:59 |
lowercase | welp, guess that's okay | 16:00 |
lowercase | do a nova live-migration again and watch the logs. | 16:00 |
jpic_ | one is for resource router, the other for network | 16:00 |
jpic_ | well there are a lot of logs but I'm not really seeing anything related | 16:02 |
jpic_ | especially on compute-008, absolutely nothing | 16:02 |
jpic_ | in scheduler, this is logged a lot though: /var/log/nova/nova-scheduler.log:2022-05-02 18:52:13.129 42861 INFO nova.scheduler.host_manager [req-1ee6306b-4ffc-4ddd-b552-c817fdaf3996 - - - - -] The instance sync for host 'compute-008' did not match. Re-created its InstanceList. | 16:03 |
jpic_ | is it normal to have a lot of packet drops on br-ex with ovs and vxlan? like, almost half of the total packets going through? | 16:09 |
jpic_ | I don't have another environment to check | 16:09 |
jpic_ | RX packets 18008 dropped 15487 | 16:10 |
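A couple of read-only ways to see where those drops are counted (sketch; br-ex is the bridge name from the question):

    # kernel-side interface counters
    ip -s link show br-ex
    # per-port counters as OVS sees them
    ovs-ofctl dump-ports br-ex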
lowercase | We don't use ovs. | 16:11 |
lowercase | and no... that's a lot | 16:11 |
jpic_ | you have ovn? | 16:11 |
lowercase | our switches and routers do the routing. We use a combination of neutron routing, bgp and dedicated routers | 16:12 |
lowercase | but the key thing is that neutron and the routers share the same multicast routing, so that both know of each other and can perform the packet routing | 16:13 |
lowercase | s/multicast routing/multicast group | 16:13 |
jpic | i did fix a bunch of networking problems earlier today, not sure if the drops are exactly recent | 16:23 |
lowercase | if the api is unable to talk to the compute node, it could result in a very similar issue | 16:30 |
*** soniya29|ruck is now known as soniya29 | 16:30 | |
lowercase | So, let me think this out loud cause i didn't set it up. We use zebra (a linux bgp service) on the neutron linux container. The router advertises a 10.0. block of addresses to the neutron routers. The neutron routers and physical routers share a multicast group that contains the location of each address and its reservation. If the ip exists, it gets the ip address of the hypervisor as to where to route the packet (i think). | 16:35 |
lowercase | The hypervisors have multiple vlans, normal stuff: one for db, one for rabbit, one for trove, one for storage backend, one for management.. those are all vxlans. If a vm needs to talk to another vm, it goes up to neutron and back down. If the connection is external, i believe it goes to neutron, to the router and back out. | 16:35 |
lowercase | So every vm gets a fully routable ip address, a 10.x.x.x address, instead of the default, non-routable 1.x.x.x address | 16:35 |
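If the setup really is zebra/FRR announcing that 10.x block from the neutron node, a couple of read-only checks would confirm it (a sketch under that assumption; vtysh is the Quagga/FRR shell):

    # BGP sessions between the neutron node and the physical routers
    vtysh -c 'show ip bgp summary'
    # prefixes actually advertised/learned
    vtysh -c 'show ip bgp'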
lowercase | im afk, be back in an hour or so | 16:37 |
*** rlandy is now known as rlandy|bbl | 21:58 | |
*** rlandy|bbl is now known as rlandy|out | 22:15 |