*** mriedem has quit IRC | 00:38 | |
*** altlogbot_3 has quit IRC | 01:37 | |
*** altlogbot_0 has joined #openstack-placement | 01:38 | |
*** tetsuro has joined #openstack-placement | 01:39 | |
*** tetsuro has quit IRC | 02:12 | |
*** tetsuro has joined #openstack-placement | 02:44 | |
*** tetsuro_ has joined #openstack-placement | 02:51 | |
*** tetsuro has quit IRC | 02:53 | |
*** tetsuro_ has quit IRC | 03:17 | |
*** tetsuro has joined #openstack-placement | 03:55 | |
*** ykarel|away has joined #openstack-placement | 04:05 | |
*** tetsuro has quit IRC | 05:23 | |
*** tetsuro has joined #openstack-placement | 05:24 | |
*** tetsuro has quit IRC | 05:27 | |
*** tetsuro has joined #openstack-placement | 05:30 | |
*** ykarel|away is now known as ykarel | 05:39 | |
openstackgerrit | Merged openstack/placement master: Use TraitCache for Trait.get_by_name https://review.opendev.org/673750 | 05:48 |
*** belmoreira has joined #openstack-placement | 06:36 | |
*** belmoreira has quit IRC | 06:37 | |
*** belmoreira has joined #openstack-placement | 06:37 | |
*** tssurya has joined #openstack-placement | 07:04 | |
*** cdent has joined #openstack-placement | 07:44 | |
*** ykarel is now known as ykarel|lunch | 08:02 | |
*** helenafm has joined #openstack-placement | 08:12 | |
*** tetsuro has quit IRC | 08:14 | |
openstackgerrit | Chris Dent proposed openstack/placement master: Run nested-perfload parallel correctly https://review.opendev.org/673505 | 08:19 |
openstackgerrit | Chris Dent proposed openstack/placement master: Implement a more complex nested-perfload topology https://review.opendev.org/673513 | 08:19 |
openstackgerrit | Chris Dent proposed openstack/placement master: Add apache benchmark (ab) to end of perfload jobs https://review.opendev.org/673540 | 08:19 |
openstackgerrit | Chris Dent proposed openstack/placement master: Optimize trait creation to check existence first https://review.opendev.org/673555 | 08:25 |
openstackgerrit | Chris Dent proposed openstack/placement master: Add RequestWideSearchContext.summaries_by_id https://review.opendev.org/674254 | 08:33 |
openstackgerrit | Chris Dent proposed openstack/placement master: Further optimize _build_provider_summaries https://review.opendev.org/674349 | 08:33 |
openstackgerrit | Chris Dent proposed openstack/placement master: Track usage info on RequestWideSearchContext https://review.opendev.org/674581 | 08:33 |
openstackgerrit | Chris Dent proposed openstack/placement master: Make _get_trees_with_traits return a set https://review.opendev.org/674630 | 08:33 |
openstackgerrit | Chris Dent proposed openstack/placement master: Use expanding bindparam in extend_usages_by_provider_tree https://review.opendev.org/674647 | 08:33 |
openstackgerrit | Chris Dent proposed openstack/placement master: WIP: Use orjson in python3 for allocation candidate dump https://review.opendev.org/674661 | 08:33 |
*** e0ne has joined #openstack-placement | 08:35 | |
openstackgerrit | Merged openstack/osc-placement master: Add Python 3 Train unit tests https://review.opendev.org/669478 | 08:46 |
*** tetsuro has joined #openstack-placement | 08:58 | |
*** tetsuro has quit IRC | 09:00 | |
*** tetsuro has joined #openstack-placement | 09:01 | |
*** tetsuro has quit IRC | 09:03 | |
*** ykarel_ has joined #openstack-placement | 10:17 | |
*** ykarel|lunch has quit IRC | 10:19 | |
*** ykarel_ is now known as ykarel | 10:27 | |
*** ykarel_ has joined #openstack-placement | 10:31 | |
*** ykarel has quit IRC | 10:34 | |
*** ykarel_ is now known as ykarel | 10:42 | |
*** ykarel is now known as ykarel|afk | 11:47 | |
*** ykarel|afk is now known as ykarel | 12:11 | |
edleafe | cdent: Thought you might enjoy this: https://nedbatchelder.com//blog/201908/why_your_mock_doesnt_work.html | 12:39 |
edleafe | I know you know the concepts behind it, but I thought this was the clearest explanation of why mocking can give unexpected results. | 12:39 |
cdent | yeah, saw that a couple days ago | 12:39 |
cdent | "At this point, you might be concerned: it seems like mocking is kind of delicate. " | 12:41 |
*** ykarel_ has joined #openstack-placement | 12:42 | |
*** ykarel has quit IRC | 12:44 | |
edleafe | Yep | 12:45 |
*** ykarel_ is now known as ykarel | 12:54 | |
*** mriedem has joined #openstack-placement | 13:00 | |
*** ykarel_ has joined #openstack-placement | 13:16 | |
*** ykarel has quit IRC | 13:18 | |
*** ykarel_ is now known as ykarel|afk | 13:19 | |
*** ykarel_ has joined #openstack-placement | 13:38 | |
*** ykarel|afk has quit IRC | 13:40 | |
*** belmoreira has quit IRC | 13:41 | |
*** belmoreira has joined #openstack-placement | 13:43 | |
*** ykarel__ has joined #openstack-placement | 13:45 | |
*** ykarel_ has quit IRC | 13:47 | |
*** ykarel__ is now known as ykarel | 13:48 | |
*** ykarel has quit IRC | 14:11 | |
*** ykarel has joined #openstack-placement | 14:12 | |
*** altlogbot_0 has quit IRC | 14:12 | |
*** N3l1x has joined #openstack-placement | 14:14 | |
*** N3l1x_ has joined #openstack-placement | 14:14 | |
*** altlogbot_1 has joined #openstack-placement | 14:14 | |
*** N3l1x_ has quit IRC | 14:15 | |
*** belmoreira has quit IRC | 14:49 | |
*** belmoreira has joined #openstack-placement | 15:05 | |
*** belmoreira has quit IRC | 15:13 | |
*** ykarel is now known as ykarel|away | 15:19 | |
*** helenafm has quit IRC | 15:49 | |
*** tssurya has quit IRC | 16:15 | |
*** e0ne has quit IRC | 16:48 | |
* cdent waves | 16:48 | |
*** cdent has quit IRC | 16:48 | |
*** N3l1x has quit IRC | 16:49 | |
*** e0ne has joined #openstack-placement | 18:17 | |
*** ykarel|away has quit IRC | 18:31 | |
*** e0ne has quit IRC | 18:31 | |
*** e0ne has joined #openstack-placement | 19:04 | |
*** mriedem has quit IRC | 19:08 | |
*** mriedem has joined #openstack-placement | 19:09 | |
*** spatel has joined #openstack-placement | 19:10 | |
spatel | Folks, I need help here, I hit this bug https://bugs.launchpad.net/nova/+bug/1829479 | 19:10
openstack | Launchpad bug 1829479 in OpenStack Compute (nova) "The allocation table has residual records when instance is evacuated and the source physical node is removed" [Medium,Triaged] | 19:10 |
spatel | I have deleted the compute service and rebuilt the node, and now I'm getting this error - http://paste.openstack.org/show/755583/ | 19:11
spatel | How do I delete the old uuid from the placement service? | 19:11
*** e0ne has quit IRC | 19:13 | |
*** e0ne has joined #openstack-placement | 19:14 | |
*** e0ne has quit IRC | 19:15 | |
sean-k-mooney | said a different way: when i triaged ^ my assertion was that nova could not delete the RP when the compute service was removed, due to existing allocations | 19:22
sean-k-mooney | so spatel needs to know how to list the allocations on the RP, delete them, and then delete the RP via curl | 19:22
sean-k-mooney | so that when they restart the compute agent it can create a new RP and not get a conflict on the RP name | 19:23
sean-k-mooney | there is a bug open for this in nova that mriedem was working on at one point i think | 19:23
sean-k-mooney | but im about to get dinner and am signing off for the day, so i can't walk spatel through doing this. | 19:23
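[editor's note: a minimal sketch of the curl flow sean-k-mooney describes above; $TOKEN (a valid keystone token), $PLACEMENT (the placement API endpoint) and the uuids are placeholders]

    # list the allocations held against the resource provider
    curl -s -H "x-auth-token: $TOKEN" \
        "$PLACEMENT/resource_providers/<rp_uuid>/allocations"
    # delete the allocations of each consumer found above
    curl -X DELETE -H "x-auth-token: $TOKEN" \
        "$PLACEMENT/allocations/<consumer_uuid>"
    # with its allocations gone, the provider itself can be deleted
    curl -X DELETE -H "x-auth-token: $TOKEN" \
        "$PLACEMENT/resource_providers/<rp_uuid>"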
spatel | thanks sean-k-mooney | 19:24 |
mriedem | sean-k-mooney: one of these it sounds like https://review.opendev.org/#/c/663737/ | 19:29 |
mriedem | bug 1829479 or bug 1817833 | 19:29 |
openstack | bug 1829479 in OpenStack Compute (nova) "The allocation table has residual records when instance is evacuated and the source physical node is removed" [Medium,Triaged] https://launchpad.net/bugs/1829479 | 19:29 |
openstack | bug 1817833 in OpenStack Compute (nova) "Check compute_id existence when nova-compute reports info to placement" [Undecided,In progress] https://launchpad.net/bugs/1817833 - Assigned to xulei (605423512-j) | 19:29 |
mriedem | oh right spatel said that :) | 19:29 |
spatel | mriedem: i am stuck here :( | 19:30 |
spatel | finding way to delete RP | 19:30 |
mriedem | you have to find the migration uuids, which are the consumers with allocations against the evacuated node's resource provider, | 19:30
mriedem | you should be able to list migrations by migration type (evacuate) and host | 19:30
spatel | mriedem: my issue isn't related to migration | 19:31 |
mriedem | it's related to evacuate, | 19:31 |
mriedem | but under the covers nova creates an entry in the 'migrations' table in the cell db | 19:31 |
spatel | I had a working compute node in the cluster which I rebuilt to adjust disk size, but my mistake was that I deleted the compute service :( | 19:31
mriedem | and that migration record has a uuid which is the placement allocation consumer of the source node resources during the evacuate | 19:32 |
spatel | hmmm | 19:32 |
spatel | mriedem: interesting, how do I find the migration uuid here? | 19:33
mriedem | https://docs.openstack.org/api-ref/compute/?expanded=list-migrations-detail#list-migrations | 19:33 |
mriedem | using microversion >= 2.59 | 19:33 |
mriedem | you can filter on migration_type=evacuation and source_compute=<host that you evacuated> | 19:34 |
mriedem | then cross check that https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-show (openstack resource provider show --allocations <rp_uuid>) | 19:35 |
mriedem | for each of the matching migration allocation consumers, you need to delete those using https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-delete | 19:35 |
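[editor's note: the procedure mriedem outlines, as a hedged shell sketch; $TOKEN, $COMPUTE (the compute API endpoint), the host name and the uuids are placeholders]

    # list evacuation migrations from the rebuilt host; microversion 2.59
    # is needed so the response includes the migration uuid
    curl -s -H "x-auth-token: $TOKEN" \
        -H "OpenStack-API-Version: compute 2.59" \
        "$COMPUTE/os-migrations?migration_type=evacuation&source_compute=<host>"
    # cross-check which consumers hold allocations on the old provider
    openstack resource provider show --allocations <rp_uuid>
    # delete the allocations of each matching migration consumer
    openstack resource provider allocation delete <migration_uuid>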
spatel | mriedem: i think this is too much for me to digest.. i am new and trying to understand what these documents are saying.. | 19:36
spatel | is there a command or something with which i can list and delete stuff? | 19:36
mriedem | you're not that new - you've been around for at least a year asking sean-k-mooney to help you with stuff | 19:36 |
mriedem | those links are to the placement commands | 19:36 |
spatel | yes but not good at this placement domain :) | 19:36 |
mriedem | you can use https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-migration-list for listing migrations with the nova cli | 19:36 |
spatel | do i need OSC-Placement plugin? | 19:37 |
mriedem | yes | 19:37 |
mriedem | just pip install it | 19:37 |
spatel | just did - pip install osc-placement | 19:37 |
mriedem | unfortunately 'nova migration-list' doesn't have a specific filter option for migration_type or source_compute | 19:38 |
spatel | mriedem: hey - openstack resource provider list | 19:40 |
spatel | i can see the list now.. so my osc plugin is working at least | 19:40 |
mriedem | yeah you can filter by hostname | 19:40 |
mriedem | openstack resource provider list --name <hostname> | 19:41 |
-spatel- [root@ostack-osa-2 ~ (admin)]> openstack resource provider list | grep ostack-compute-bld-sriov-2-1.v1v0x.net | 19:41 | |
-spatel- | 93d7ff00-d4ee-4b7c-9d23-8265554ed99b | ostack-compute-bld-sriov-2-1.v1v0x.net | 2 | | 19:41 | |
spatel | can i delete this uuid? | 19:41 |
mriedem | no, placement won't let you because it has allocations against it | 19:42 |
mriedem | you can try but i don't think it will work | 19:42 |
mriedem | openstack resource provider delete 93d7ff00-d4ee-4b7c-9d23-8265554ed99b | 19:42 |
spatel | done, its gone | 19:42 |
spatel | let me restart compute agent services | 19:42 |
spatel | [root@ostack-compute-bld-sriov-2-1 ~]# systemctl restart nova* | 19:43 |
spatel | checking logs for placement error | 19:44 |
spatel | so far logs are clean.. let me trying to build instance | 19:45 |
spatel | mriedem: so placement related error is gone but look like still nova isn't happy | 19:48 |
spatel | during the vm build i get this error - {"message": "No valid host was found. There are not enough hosts available | 19:49
spatel | my /var/log/nova/nova-compute.log has frozen; normally these logs are very chatty | 19:50
spatel | I can see | 19:50 |
-spatel- [root@ostack-osa-2 ~ (admin)]> openstack resource provider list | grep ostack-compute-bld-sriov-2-1.v1v0x.net | 19:50 | |
-spatel- | 5f38b898-cc22-49b6-935d-847d1b440bdc | ostack-compute-bld-sriov-2-1.v1v0x.net | 3 | | 19:50 | |
*** e0ne has joined #openstack-placement | 19:51 | |
spatel | let me restart placement service on controller nodes | 19:51 |
mriedem | NoValidHost is a scheduling issue, | 19:53
mriedem | so check scheduler logs | 19:53 |
mriedem | and/or placement-api logs | 19:53 |
mriedem | might need to enable debug logging to see which filter(s) rejected the request | 19:53 |
spatel | in scheduler logs i am seeing - Received a sync request from an unknown host 'ostack-compute-bld-sriov-2-1.v1v0x.net'. Re-created its InstanceList. | 19:54 |
spatel | but its INFO | 19:54 |
spatel | scheduler / placement logs are clean not a single error anywhere | 19:56 |
spatel | very strange | 20:07 |
spatel | all logs are clean | 20:07 |
mriedem | that sync thing is unrelated | 20:08 |
spatel | i am running nova-compute in debug and didn't find any error or issue | 20:08
mriedem | the novalidhost error might be in the conductor logs on the nova side | 20:08 |
mriedem | placement-api logs would have filtering at debug level | 20:08 |
spatel | look like nova compute not updating resources to scheduler | 20:08 |
mriedem | nova-compute is reporting information to placement, not the scheduler | 20:08
mriedem | and the scheduler is making decisions by asking placement what's available | 20:09 |
mriedem | which release are you running? | 20:09 |
spatel | stein | 20:09 |
mriedem | then if you enable debug in placement you should see some logging about filtering allocation candidates | 20:10 |
mriedem | during a scheduling request from nova | 20:10 |
spatel | you know what the interesting thing is... i re-kicked 15 compute nodes and all of them show the same behavior | 20:10
spatel | for testing i built a new compute node (new, this wasn't there before) and it is working fine.. | 20:11
spatel | looks like when you re-kick existing computes, somehow they don't like it.. | 20:11
spatel | mriedem: let me enable debug in placement; i have 3 controller nodes so let me try one | 20:12
spatel | mriedem: is this correct file - /etc/uwsgi/nova-api-os-compute.ini | 20:14 |
spatel | or /etc/nova/nova.conf | 20:15 |
spatel | let me try nova.conf first | 20:16 |
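[editor's note: in Stein, placement runs as its own service, so the debug flag usually belongs in the placement service's config rather than nova.conf; a sketch, with the path and service name varying by deployment tooling]

    # /etc/placement/placement.conf (path may differ, e.g. under openstack-ansible)
    [DEFAULT]
    debug = True

    # then restart the placement API service, e.g.
    # systemctl restart <placement uwsgi/api service>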
spatel | I can see DEBUG in logs but nothing interesting | 20:21 |
mriedem | well are you actually trying to schedule a new vm? | 20:21 |
spatel | http://paste.openstack.org/show/755585/ | 20:21 |
mriedem | there probably isn't anything interesting at steady state | 20:21
spatel | Yes, trying to build a new vm and it says no host available | 20:21
spatel | if i try to build a vm on another compute it works | 20:22
spatel | these 15 compute nodes are total zombies now; i can't build anything even though they are in the hypervisor list | 20:22
spatel | why is the compute node not sending periodic updates to placement? | 20:24
spatel | Do you think this is the smoking gun in the nova-compute.log file - http://paste.openstack.org/show/755586/ | 20:25
spatel | Lock "compute_resources" acquired by "nova.compute.resource_tracker._update_available_resource" | 20:25 |
spatel | bunch of Lock statements | 20:25
spatel | mriedem: ^^ | 20:27 |
mriedem | that's from the update_available_resource periodic task, | 20:30 |
mriedem | it's normal | 20:30 |
mriedem | runs every minute by default | 20:30 |
mriedem | check that there is inventory for the provider, openstack resource provider inventory list 5f38b898-cc22-49b6-935d-847d1b440bdc | 20:31 |
-spatel- [root@ostack-osa-2 ~ (admin)]> openstack resource provider inventory list 5f38b898-cc22-49b6-935d-847d1b440bdc | 20:33 | |
-spatel- +----------------+------------------+----------+----------+-----------+----------+-------+ | 20:33 | |
-spatel- | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total | | 20:33 | |
-spatel- +----------------+------------------+----------+----------+-----------+----------+-------+ | 20:33 | |
-spatel- | VCPU | 2.0 | 32 | 0 | 1 | 1 | 32 | | 20:33 | |
-spatel- | MEMORY_MB | 1.0 | 65501 | 2048 | 1 | 1 | 65501 | | 20:33 | |
-spatel- | DISK_GB | 1.0 | 431 | 0 | 1 | 1 | 431 | | 20:33 | |
-spatel- +----------------+------------------+----------+----------+-----------+----------+-------+ | 20:33 | |
spatel | everything looks OK | 20:33 |
spatel | looks like it's holding some old data somewhere which it doesn't like.. | 20:37
spatel | i can try to re-kick the machine and re-add it | 20:37
mriedem | well you can check your scheduler logs for this https://github.com/openstack/nova/blob/stable/stein/nova/scheduler/manager.py#L149 | 20:39 |
mriedem | if you see that, it means placement is filtering things out, and enabling debug logs on the placement side should show what is filtering the allocation candidates, | 20:39
mriedem | if placement is returning candidates, then the scheduler debug logs should show which filters are kicking out the host(s) | 20:40 |
spatel | let me try that | 20:40 |
spatel | grep -i "Got no allocation candidates from the Placement" /var/log/nova/nova-scheduler.log | 20:42 |
spatel | nothing found on all 3 controller nodes | 20:42 |
mriedem | then you should see logs from your enabled filters rejecting hosts | 20:43 |
mriedem | in here https://github.com/openstack/nova/blob/stable/stein/nova/filters.py#L68 | 20:44 |
mriedem | ^ gives the summary | 20:44 |
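[editor's note: with debug enabled, the per-filter results mriedem points at land in the scheduler log; a hedged example of what to grep for, since the exact wording varies by release]

    grep "Filtering removed all hosts" /var/log/nova/nova-scheduler.log
    grep "returned 0 hosts" /var/log/nova/nova-scheduler.log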
spatel | let me dig | 20:45 |
mriedem | i have to run | 20:46 |
*** mriedem is now known as mriedem_afk | 20:46 | |
*** e0ne has quit IRC | 20:48 | |
*** spatel has quit IRC | 21:09 | |
*** mriedem_afk is now known as mriedem | 21:20 | |
*** takashin has joined #openstack-placement | 23:27 | |
*** mriedem has quit IRC | 23:40 |