*** tetsuro has joined #openstack-placement | 00:10 | |
*** tetsuro_ has joined #openstack-placement | 04:09 | |
*** tetsuro has quit IRC | 04:11 | |
*** tetsuro has joined #openstack-placement | 04:31 | |
*** tetsuro_ has quit IRC | 04:31 | |
*** openstackgerrit has joined #openstack-placement | 04:41 | |
openstackgerrit | Merged openstack/placement master: Add irrelevant files list to perfload job https://review.openstack.org/624047 | 04:41 |
*** e0ne has joined #openstack-placement | 05:52 | |
*** e0ne has quit IRC | 05:53 | |
*** avolkov has joined #openstack-placement | 06:03 | |
*** takashin has left #openstack-placement | 06:36 | |
*** tssurya has joined #openstack-placement | 07:41 | |
*** helenafm has joined #openstack-placement | 08:22 | |
*** tssurya has quit IRC | 09:25 | |
*** tetsuro has quit IRC | 09:36 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Configure database api in upgrade check https://review.openstack.org/632365 | 09:51 |
*** cdent has joined #openstack-placement | 10:12 | |
cdent | thanks for paying attention tetsuro | 10:15 |
cdent | gibi: if you're happy with https://review.openstack.org/#/c/632365/ can you kick it in? It's a bug fix to the status command that didn't get fully tested before it merged | 10:16 |
gibi | cdent: done | 10:20 |
cdent | thanks | 10:20 |
*** e0ne has joined #openstack-placement | 10:24 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Configure database api in upgrade check https://review.openstack.org/632365 | 10:32 |
*** tetsuro has joined #openstack-placement | 10:34 | |
cdent | I think maybe we should just let tetsuro do everything, he's the only one paying sufficient attention to correctness :) | 10:37 |
tetsuro | cdent: anyway thanks for re-approving! :) | 10:38 |
cdent | thank you | 10:39 |
gibi | :) | 10:46 |
*** ttsiouts has joined #openstack-placement | 12:22 | |
*** e0ne has quit IRC | 12:25 | |
*** e0ne has joined #openstack-placement | 12:30 | |
*** tssurya has joined #openstack-placement | 12:38 | |
*** tetsuro has quit IRC | 12:53 | |
*** mriedem has joined #openstack-placement | 13:18 | |
openstackgerrit | Merged openstack/placement master: Configure database api in upgrade check https://review.openstack.org/632365 | 14:01 |
*** e0ne has quit IRC | 14:05 | |
*** e0ne has joined #openstack-placement | 14:08 | |
*** mriedem has quit IRC | 14:19 | |
*** mriedem has joined #openstack-placement | 14:21 | |
*** ttsiouts has quit IRC | 14:39 | |
*** ttsiouts has joined #openstack-placement | 14:39 | |
*** ttsiouts has quit IRC | 14:41 | |
*** ttsiouts has joined #openstack-placement | 14:41 | |
*** efried_mlk is now known as efried | 14:45 | |
*** rubasov_ is now known as rubasov | 14:50 | |
*** efried1 has joined #openstack-placement | 15:00 | |
*** efried has quit IRC | 15:01 | |
*** efried1 is now known as efried | 15:01 | |
*** avolkov has quit IRC | 15:34 | |
*** openstackgerrit has quit IRC | 15:51 | |
*** efried has quit IRC | 16:00 | |
*** efried has joined #openstack-placement | 16:09 | |
*** dims has quit IRC | 16:15 | |
*** dims has joined #openstack-placement | 16:20 | |
*** e0ne has quit IRC | 16:38 | |
*** e0ne has joined #openstack-placement | 16:39 | |
*** mriedem is now known as mriedem_away | 16:39 | |
*** ttsiouts has quit IRC | 16:55 | |
*** ttsiouts has joined #openstack-placement | 16:55 | |
*** ttsiouts has quit IRC | 17:00 | |
*** efried has quit IRC | 17:00 | |
*** e0ne has quit IRC | 17:02 | |
*** helenafm has quit IRC | 17:10 | |
*** efried has joined #openstack-placement | 17:49 | |
*** avolkov has joined #openstack-placement | 18:35 | |
*** e0ne has joined #openstack-placement | 19:07 | |
*** gryf has joined #openstack-placement | 19:29 | |
*** tssurya has quit IRC | 19:36 | |
*** e0ne has quit IRC | 19:42 | |
*** mriedem_away is now known as mriedem | 20:16 | |
*** alanmeadows has joined #openstack-placement | 20:37 | |
jaypipes | alanmeadows: whatup g-money? | 20:37 |
alanmeadows | Ahoy folks. | 20:37 |
alanmeadows | We had a change go out to a number of production sites that adjusted the hostname of the nova agents (each agent stops reporting as `host` and starts reporting as `host.fqdn`) | 20:38 |
jaypipes | alanmeadows: lemme guess... doubled-up resource provider records? :) | 20:39 |
jaypipes | alanmeadows: and a scheduler that suddenly thinks you've got a shitload of extra capacity? | 20:39 |
alanmeadows | yes along those lines | 20:39 |
*** dims has quit IRC | 20:40 | |
alanmeadows | pci scheduling conflicts obviously, as the pci_devices table is populated with unallocated entries for these "new" nodes | 20:40 |
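To make the symptom concrete, a minimal diagnostic sketch against the nova database (table and column names follow the standard Ocata schema; `compute01` is a hypothetical node name):

```sql
-- Hypervisors that now appear under both a short name and an FQDN;
-- the short name is assumed to be the first label of the FQDN.
SELECT id, uuid, host, hypervisor_hostname, deleted, created_at
FROM compute_nodes
WHERE hypervisor_hostname LIKE 'compute01%'
ORDER BY created_at;
```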
alanmeadows | but let me walk through what was tried quickly | 20:40 |
jaypipes | alanmeadows: and you're looking for a quick hotfix to get the data ungoofed? | 20:40 |
alanmeadows | and link that up to the placement question | 20:40 |
jaypipes | ack | 20:40 |
alanmeadows | we got the bright idea given no resources have been created using the new compute_nodes entry | 20:41 |
alanmeadows | that we would revive the old compute_nodes entry, update it to the new fqdn, and deactivate the new one | 20:41 |
*** dims has joined #openstack-placement | 20:42 | |
alanmeadows | dancing around the unique constraints | 20:42 |
alanmeadows | we then discovered a `uuid` in compute_nodes that clearly links the node to the placement tables | 20:42 |
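A sketch of that revive/rename dance follows. Hostnames are hypothetical, and the soft-delete convention (deleted set to the row id) and the exact unique constraint on compute_nodes vary by release, so verify both against your schema before running anything like this:

```sql
-- 1) Move the freshly created FQDN row out of the way: soft-delete it and
--    rename it so it cannot collide with the revived row under either
--    form of the unique constraint.
UPDATE compute_nodes
   SET deleted = id, deleted_at = NOW(),
       hypervisor_hostname = CONCAT(hypervisor_hostname, '.orphaned')
 WHERE hypervisor_hostname = 'compute01.example.com'
   AND deleted = 0;

-- 2) Revive the original row and point it at the new FQDN, keeping its
--    uuid, which is what links it to placement's resource_providers.
UPDATE compute_nodes
   SET hypervisor_hostname = 'compute01.example.com',
       deleted = 0, deleted_at = NULL
 WHERE hypervisor_hostname = 'compute01'
   AND deleted <> 0;
```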
alanmeadows | and finally on to the bit confusing us | 20:43 |
jaypipes | alanmeadows: yes, that's essentially what you'll need to do. the only issue is that you're going to need to first delete the placement resource_providers table records that refer to the new fqdns | 20:43 |
alanmeadows | well this is what's weird | 20:43 |
alanmeadows | the resource_providers table has an entry for the new, target fqdn name, with the wrong uuid | 20:44 |
alanmeadows | we can fix that, sure | 20:44 |
alanmeadows | but what's odd is there is no entry for the old agent name like there was in compute_nodes | 20:44 |
alanmeadows | on top of that there are no allocations for the new entry generated in resource_providers | 20:44 |
jaypipes | alanmeadows: that is indeed weird. | 20:45 |
alanmeadows | much like the magic conversion nova will do for deactivating dupes in compute_nodes (set shortname to deleted=1, ...) | 20:45 |
jaypipes | alanmeadows: there cannot be allocations referring to the old provider UUIDs but no entries in the resource_providers table with those UUIDs. | 20:45 |
alanmeadows | to make a transition from short->long or long->short hostnames painless | 20:45 |
alanmeadows | it almost seems like some magic transition happened in the placement data, and all allocations lost in the process | 20:46 |
jaypipes | alanmeadows: so all allocations are gone? | 20:46 |
alanmeadows | all allocations for a node that has undergone this hostname transition of short->fqdn are gone | 20:47 |
jaypipes | yikes. | 20:47 |
alanmeadows | in a site where this happened to all nodes before it was noticed | 20:47 |
jaypipes | mriedem: ^^ | 20:47 |
alanmeadows | the allocations table is an empty set | 20:47 |
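One way to confirm the rows are truly gone rather than merely orphaned (Ocata keeps the placement tables in the nova_api schema by default; adjust database names to your deployment):

```sql
-- Allocation rows whose resource provider no longer exists.
SELECT a.id, a.resource_provider_id, a.consumer_id, a.resource_class_id, a.used
FROM nova_api.allocations a
LEFT JOIN nova_api.resource_providers rp ON rp.id = a.resource_provider_id
WHERE rp.id IS NULL;

-- Total allocation rows remaining.
SELECT COUNT(*) FROM nova_api.allocations;
```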
jaypipes | alanmeadows: is this a flashing red lights situation? | 20:49 |
jaypipes | alanmeadows: i.e. production site with no immediate way of recovering | 20:49 |
alanmeadows | it has some people quite interested in the outcome and a resolution ;-) | 20:49 |
jaypipes | heh, ok. | 20:50 |
jaypipes | we really need to put some sort of barrier/prevention in place when/if we notice a my_hostname CONF change... | 20:50 |
*** dims has quit IRC | 20:50 | |
jaypipes | or whatever the CONF option is called that determines the nova-compute service name. can't ever remember it. | 20:50 |
jaypipes | my_ip? | 20:51 |
jaypipes | meh, whatever... | 20:51 |
jaypipes | alanmeadows: lemme have a think. | 20:52 |
*** dims has joined #openstack-placement | 20:52 | |
jaypipes | alanmeadows: if you reset the hostname for a service back to its original hostname and restart the nova-compute service, what happens? | 20:52 |
alanmeadows | oh thats definitely coming under strict control as a lesson learned | 20:52 |
alanmeadows | I did not try that scenario without the mucking | 20:53 |
mriedem | hostname changes will result in a new compute node record | 20:53 |
mriedem | which means a new resource provider with new uuid | 20:53 |
mriedem | compute nodes are unique per hostname/nodename (which for kvm is the same) | 20:53 |
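That uuid link can be inspected directly; a sketch assuming the Ocata default of the placement tables living in the nova_api schema on the same server:

```sql
-- Active compute nodes and the provider (if any) that shares their uuid.
SELECT cn.id AS cn_id, cn.hypervisor_hostname, cn.uuid,
       rp.id AS rp_id, rp.name AS rp_name, rp.generation
FROM nova.compute_nodes cn
LEFT JOIN nova_api.resource_providers rp ON rp.uuid = cn.uuid
WHERE cn.deleted = 0;
```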
alanmeadows | We attempt to roll forward with the fqdn transition but preserve mappings | 20:54 |
jaypipes | mriedem: right, but apparently something deletes all the instances/allocations on the old provider in the process... | 20:54 |
alanmeadows | looking at nodes that underwent the transition | 20:54 |
mriedem | probably the resource tracker | 20:54 |
alanmeadows | they have the highest ID increment in resource_providers | 20:54 |
mriedem | https://github.com/openstack/nova/blob/31956108e6e785407bdcc31dbc8ba99e6a28c96d/nova/compute/resource_tracker.py#L1244 | 20:54 |
alanmeadows | as though something deleted their `short` version, created the `longName` version, and cascade-deleted the allocations | 20:54 |
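That observation is easy to check: list the newest providers and count the allocation rows pointing at each (schema names are assumptions as above):

```sql
SELECT rp.id, rp.name, rp.uuid, COUNT(a.id) AS allocation_rows
FROM nova_api.resource_providers rp
LEFT JOIN nova_api.allocations a ON a.resource_provider_id = rp.id
GROUP BY rp.id, rp.name, rp.uuid
ORDER BY rp.id DESC
LIMIT 20;
```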
mriedem | my guess would be either something in the RT or something on compute restart thinking an evacuation happened | 20:56 |
jaypipes | mriedem: right, but https://github.com/openstack/nova/blob/31956108e6e785407bdcc31dbc8ba99e6a28c96d/nova/compute/resource_tracker.py#L792-L794 should not delete instances from the *old* hostname, since self._compute_nodes[nodename] (where nodename == new FQDN) should yield no results for InstanceList.get_all_by_host_and_nodename(), right? | 20:56 |
mriedem | https://github.com/openstack/nova/blob/31956108e6e785407bdcc31dbc8ba99e6a28c96d/nova/compute/manager.py#L628 | 20:57 |
jaypipes | mriedem: that's migrations, though, which again, shouldn't be returning anything in this case (of a hostname rename) | 20:58 |
jaypipes | or at least, I *think* that's the case. alanmeadows, what version of openstack are you using? | 20:59 |
alanmeadows | This is ocata | 20:59 |
alanmeadows | in this instance | 20:59 |
mriedem | the evac migrations robustification was added by dan because in the olden times a hostname change would make compute think an evac happened and delete your instances | 20:59 |
mriedem | but that was liberty i think | 20:59 |
mriedem | https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/robustify_evacuate.html | 20:59 |
jaypipes | yeah, the ocata code is identical to what's there now | 21:00 |
jaypipes | so I don't believe it's the evacuate code path that is the issue here. | 21:00 |
mriedem | so the allocations were deleted for the old provider in placement or just not there for the new provider? | 21:00 |
jaypipes | alanmeadows: UNLESS... your deployment tooling issued some sort of host-evacuate call in doing this rename of hostname FQDNs? | 21:00 |
alanmeadows | I'd be ok with them not being there for the new provider | 21:01 |
alanmeadows | I could deal with that | 21:01 |
alanmeadows | its that they appear to be gone entirely | 21:01 |
alanmeadows | @jaypipes: definitely no, no nova calls, just an /etc/hosts ordering change and `domain` updates in resolv.conf. | 21:02 |
mriedem | as a workaround you could run the heal_allocations CLI but that's not in ocata you'd have to backport it or run from a container https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement | 21:02 |
alanmeadows | didn't know about that | 21:02 |
alanmeadows | potential contender | 21:03 |
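For a sense of how much heal_allocations would have to repair, a rough diagnostic approximation (not what the tool runs internally, just a way to list active instances with no allocation rows; schema names are assumptions):

```sql
SELECT i.uuid, i.host, i.node, i.vm_state
FROM nova.instances i
LEFT JOIN nova_api.allocations a ON a.consumer_id = i.uuid
WHERE i.deleted = 0
  AND i.vm_state NOT IN ('error', 'shelved_offloaded')
  AND a.id IS NULL;
```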
alanmeadows | one grounding question though | 21:05 |
alanmeadows | these agents that have undergone this name transition | 21:06 |
alanmeadows | they come up, and wish to report to the world their PciDevicePool counts are 100% available | 21:07 |
alanmeadows | when obviously resources have been assigned | 21:07 |
alanmeadows | they also flip any pci_devices entries for that node id that may have been allocated back to unallocated | 21:07 |
alanmeadows | and so at the end of the day, am I chasing the right thing by focusing on the empty allocations records for these hosts | 21:08 |
jaypipes | alanmeadows: well, PCI devices are not handled by placement unfortunately (or... fortunately, for you at least) | 21:09 |
jaypipes | alanmeadows: so we need to separate out the placement DB's allocations table issues from the pci_devices table issues, because they are handled differently. | 21:09 |
alanmeadows | sure the authority separation I get | 21:10 |
alanmeadows | I arrived at missing data in the allocations table but I started with | 21:10 |
alanmeadows | why these agents are reporting an incorrect state to the world | 21:10 |
jaypipes | alanmeadows: are there still entries in the pci_devices table that refer to the original compute nodes table records for the original hostname? | 21:13 |
alanmeadows | yes, until the agent starts up and whacks them | 21:14 |
alanmeadows | oh, re-read your question | 21:14 |
alanmeadows | yes, there were | 21:14 |
alanmeadows | but recall our brilliant idea about how to back out of this | 21:14 |
alanmeadows | and preserve mappings | 21:14 |
jaypipes | alanmeadows: ok, so at least *that* issue should be easy to resolve... | 21:14 |
jaypipes | alanmeadows: need to drop jules off ... back in about 20 mins | 21:14 |
alanmeadows | was to revive the original compute_nodes entry | 21:15 |
alanmeadows | by updating its hypervisor_name | 21:15 |
alanmeadows | and when we do this | 21:15 |
alanmeadows | the correct pci_devices entries for that older node name (but now updated) | 21:15 |
alanmeadows | are clobbered | 21:15 |
alanmeadows | and all get set to available | 21:16 |
alanmeadows | we're just trying this approach out on one host | 21:16 |
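Given that clobbering, it may be worth snapshotting the PCI assignment state for the test node before any further nova-compute restarts; a sketch with a hypothetical compute_node_id:

```sql
-- Keep a copy to compare against (or restore from) after the agent runs.
CREATE TABLE IF NOT EXISTS pci_devices_backup AS
SELECT * FROM pci_devices WHERE compute_node_id = 42;

SELECT address, status, instance_uuid
FROM pci_devices
WHERE compute_node_id = 42
ORDER BY address;
```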
alanmeadows | https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6658-L6671 | 21:18 |
* alanmeadows blinks | 21:18 | |
*** efried has quit IRC | 21:30 | |
*** efried has joined #openstack-placement | 21:30 | |
jaypipes | mriedem, alanmeadows: that looks to be it. | 21:39 |
alanmeadows | How much do we trust https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement | 21:40 |
alanmeadows | This does look to be an answer for rebuilding this data | 21:40 |
alanmeadows | without having to go off and figure out how to cobble it | 21:40 |
mriedem | i wrote it | 21:49 |
mriedem | but that doesn't mean you have to trust it | 21:50 |
mriedem | run it with --max-count of 1 if you don't trust it | 21:50 |
mriedem | i thought about adding a --dry-run option but didn't have time | 21:50 |
mriedem | mnaser also has a script to fix up allocations i think | 21:51 |
alanmeadows | whatever the nova elders believe is the best approach | 21:53 |
alanmeadows | assuming I can still read docs, I should be fine with an RPC version of 1.28 against a rocky nova-manage to leverage heal_allocations | 21:54 |
alanmeadows | aka rocky nova-manage on ocata nova/placement for `heal_allocations` seems to be "ok" | 21:55 |
jaypipes | alanmeadows: honestly, I'm still trying to figure out if the code you link above is actually the thing that is deleting the resource provider and allocation records. | 21:56 |
jaypipes | mriedem: I mean, wouldn't self.host be equal to the new FQDN in this line? https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6675 and therefore it would not find the old compute node record and call destroy() on it? | 21:58 |
mriedem | alanmeadows: heal_allocations shouldn't require anything over rpc | 22:00 |
mriedem | it's all db | 22:00 |
jaypipes | mriedem: ahhhhhhhhh | 22:00 |
jaypipes | mriedem: I think I understand now what happened... | 22:01 |
mriedem | well, was "Deleting orphan compute node" in the logs? | 22:01 |
jaypipes | alanmeadows: I bet you didn't change the nova.conf file's CONF.host option when you changed the hostname of the compute nodes, right? | 22:02 |
alanmeadows | mriedem: excellent question, working on an answer to that | 22:02 |
alanmeadows | jaypipes: we do not use `host` at this time, but clearly after this, we will drive it going forward | 22:03 |
alanmeadows | to avoid any shuffling without our consent | 22:03 |
alanmeadows | we let nova determine it | 22:04 |
alanmeadows | and of course, no one likes moving targets | 22:04 |
jaypipes | alanmeadows: and in doing so, there was a mismatch between the CONF.host value and what was returned by the virt driver's get_available_nodes() method (called from here: https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6654). the issue is that get_available_nodename() doesn't use CONF.host. It uses the hypervisor's local hostname which would be different (https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/host.py#L681-L691) | 22:06 |
jaypipes | alanmeadows: and that's what caused the delete of orphaned compute nodes to run. | 22:06 |
jaypipes | mriedem: so, basically, nova-compute started up thinking it was the old hostname, libvirt told the compute manager it was the new hostname, and the compute manager deleted the compute node record referring to the old hostname. | 22:07 |
mriedem | yup, that's what the old evac issue was like | 22:09 |
mriedem | that code in the compute manager is really meant for ironic | 22:09 |
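A detection sketch for that failure mode: active compute nodes whose service host no longer matches the hypervisor-reported name in either short or FQDN form (MySQL syntax; illustrative only):

```sql
SELECT cn.id, cn.host AS service_host, cn.hypervisor_hostname
FROM compute_nodes cn
WHERE cn.deleted = 0
  AND cn.host <> cn.hypervisor_hostname
  AND cn.host <> SUBSTRING_INDEX(cn.hypervisor_hostname, '.', 1);
```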
alanmeadows | luckily we don't allow orphan vms to be cleaned | 22:09 |
alanmeadows | or ... oops. | 22:09 |
*** cdent has quit IRC | 22:50 | |
*** efried has quit IRC | 22:53 | |
alanmeadows | looks like heal_allocations will require backporting | 23:01 |
mnaser | seems like hostname changing fun? :\ | 23:40 |
mnaser | we just reboot servers on hostname changes now | 23:40 |
alanmeadows | since you popped in mnaser... | 23:41 |
alanmeadows | mriedem mentioned you had a script for fixing up allocations | 23:42 |
mnaser | i pasted it somewhere hmm | 23:42 |
mnaser | it was more meant for cleaning up in the sense of removing entries that should not be there | 23:42 |
alanmeadows | attempting to slam heal_allocations into ocata is proving... fun | 23:42 |
alanmeadows | so if you have something more simplistic about | 23:42 |
mnaser | but you could maybe rewrite it using the foundation to do more | 23:42 |
mnaser | let me find it | 23:42 |
mnaser | it was in a launchpad somewhere.. | 23:43 |
mnaser | the world's worst site to search | 23:43 |
mnaser | alanmeadows: https://bugs.launchpad.net/nova/+bug/1793569 | 23:44 |
openstack | Launchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed] | 23:44 |
mnaser | http://paste.openstack.org/show/734146/ | 23:44 |
alanmeadows | this is much more hackable | 23:45 |
mnaser | so the idea is it hits the nova os-hypervisors api | 23:45 |
mnaser | and then kinda just does an audit comparing things back and forth | 23:45 |
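A rough SQL analogue of one direction of that audit: providers that no active compute node claims, which are the cleanup candidates (again assuming the nova and nova_api schemas sit on the same server):

```sql
SELECT rp.id, rp.uuid, rp.name
FROM nova_api.resource_providers rp
LEFT JOIN nova.compute_nodes cn
       ON cn.uuid = rp.uuid AND cn.deleted = 0
WHERE cn.id IS NULL;
```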
alanmeadows | I'm not convinced the ocata placement api has everything heal_allocations wants (the report client definitely does not, but I was fixing that) - that rabbit hole feeling was creeping over me | 23:46 |
mnaser | if you can keep somewhat the same logic and add a way to make sure entries which are missing get added, it'll be even more useful | 23:46 |