fungi | clarkb: the reason we turned off autoreload is that it basically dropped all pending tasks in the queue at reload | 00:00 |
fungi | not sure if latest gerrit still has that behavior, but it was resulting in lots of lost replication tasks and stale repo mirrors | 00:01 |
Clark[m] | I think I saw it say it does similar in the docs. That would explain it. I knew there was a good reason just didn't remember specifics | 00:08 |
fungi | well, at the time we were making much more frequent changes to the replication config. now we hardly change it at all so it might be okay? but ultimately there's still some risk | 02:13 |
*** ykarel_ is now known as ykarel | 03:57 | |
*** ysandeep|away is now known as ysandeep | 05:09 | |
*** bhagyashris|off is now known as bhagyashris | 05:34 | |
*** jpena|off is now known as jpena | 07:28 | |
newopenstack | Need to set up openstack with 6 servers and want to use MAAS and Juju | 07:46 |
newopenstack | please advise. | 07:46 |
newopenstack | and then want to grow the infrastructure to more compute nodes | 07:46 |
newopenstack | also want to use some storage center from dell | 07:46 |
newopenstack | please share some guidelines. | 07:47 |
newopenstack | anyone can help .. please | 07:47 |
*** ykarel__ is now known as ykarel | 08:03 | |
*** ykarel is now known as ykarel|lunch | 08:20 | |
*** ykarel|lunch is now known as ykarel | 09:27 | |
*** odyssey4me is now known as Guest65 | 10:07 | |
opendevreview | Michal Nasiadka proposed opendev/bindep master: Add Rocky Linux support https://review.opendev.org/c/opendev/bindep/+/809362 | 10:11 |
*** ysandeep is now known as ysandeep|brb | 10:52 | |
*** dviroel|out is now known as dviroel | 11:20 | |
*** jpena is now known as jpena|lunch | 11:21 | |
*** ysandeep|brb is now known as ysandeep | 11:52 | |
*** ykarel is now known as ykarel|afk | 11:54 | |
*** jpena|lunch is now known as jpena | 12:21 | |
fungi | newopenstack: sorry, this is the channel where we coordinate the services which make up the opendev collaboratory. you're probably looking for the #openstack channel or more likely the openstack-discuss@lists.openstack.org mailing list | 12:55 |
fungi | newopenstack: though since you mentioned maas and juju (software made by canonical, they're not really part of openstack) you might want to be looking closer at https://ubuntu.com/openstack | 12:56 |
fungi | hope that helps! | 12:56 |
opendevreview | Merged openstack/project-config master: Add openstack-loadbalancer charm and interfaces https://review.opendev.org/c/openstack/project-config/+/807838 | 13:08 |
*** ykarel|afk is now known as ykarel | 13:20 | |
*** slaweq__ is now known as slaweq | 13:23 | |
*** frenzy_friday is now known as anbanerj|ruck | 13:35 | |
*** odyssey4me is now known as Guest74 | 13:42 | |
*** ysandeep is now known as ysandeep|dinner | 14:26 | |
opendevreview | daniel.pawlik proposed opendev/puppet-log_processor master: Add capability with python3; add log request cert verify https://review.opendev.org/c/opendev/puppet-log_processor/+/809424 | 14:55 |
*** ykarel is now known as ykarel|away | 15:01 | |
*** marios is now known as marios|out | 15:33 | |
clarkb | We currently have no leaked replication tasks | 15:41 |
*** ysandeep|dinner is now known as ysandeep|out | 15:41 | |
clarkb | I've just confirmed the inmotion boots continue to fail. Will try and dig into that after some breakfast | 15:44 |
clarkb | I've got tails running against the three different servers' nova api error logs. If that doesn't record anything interesting in the next bit I'll dig in further. I expect this should give me a clue in the next few minutes though | 16:16 |
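For context, a minimal sketch of the log-watching step clarkb describes here; the hostnames and log paths are assumptions, since the real layout depends on how this deployment ships nova's logs.

```shell
# Hypothetical hosts and log paths -- adjust to the deployment's layout.
for host in api1 api2 api3; do
  ssh "$host" tail -F /var/log/nova/nova-api.log &
done
wait   # stream all three logs until interrupted
```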
clarkb | The api was very quiet. Looking at other things I find messages like "Instance f98ce366-90b1-43ba-8513-bf2ea559c931 has allocations against this compute host but is not found in the database." in the nova compute log | 16:28 |
clarkb | I suspect that may be the underlying cause? we're leaking instances that don't exist but count against quota? | 16:28 |
*** jpena is now known as jpena|off | 16:28 | |
clarkb | hrm no, quotas as reported by openstackclient look fine | 16:30 |
clarkb | "Allocations" seems to be what placement does | 16:31 |
fungi | might be a question for #openstack-nova | 16:32 |
clarkb | nova.exception_Remote.NoValidHost_Remote: No valid host was found. <- is what the conductor says | 16:32 |
clarkb | so ya I think what is happening is placement is unable to place, possibly because it has leaky allocations. | 16:33 |
clarkb | https://docs.openstack.org/nova/latest/admin/troubleshooting/orphaned-allocations.html is the indicated solution from the nova channel | 16:39 |
*** ysandeep|out is now known as ysandeep | 16:39 | |
clarkb | thank you melwitt! | 16:39 |
clarkb | I'll have to digest that and dig around and see if I can fix things. | 16:39 |
melwitt | clarkb: lmk if you run into any issues or have questions and I will help | 16:40 |
clarkb | will do | 16:40 |
fungi | yeah, this particular provider is unique in that they give us an automatically deployed turn-key/cookie-cutter openstack environment, but it's mostly us on the hook if it falls over | 16:42 |
clarkb | any idea what provides the openstack resource provider commands to osc? seems my installs don't have that | 16:43 |
melwitt | clarkb: osc-placement is the osc plugin you need | 16:44 |
melwitt | you just install it and then it works | 16:44 |
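A minimal sketch of what melwitt describes, assuming osc lives in a virtualenv; the venv path is hypothetical.

```shell
# osc-placement is a plugin for python-openstackclient; installing it into
# the same environment as osc exposes the "openstack resource provider ..."
# commands.
source ~/osc-venv/bin/activate   # hypothetical venv path
pip install osc-placement

# sanity check that the new subcommands are now available
openstack resource provider list --help
```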
clarkb | thanks | 16:44 |
clarkb | and now I've hit policy problems. I think I need to escalate my privs. I expect the next bit will just be me stumbling around to find the correct incantations :) | 16:45 |
melwitt | clarkb: placement api is defaulted to admin-only | 16:46 |
clarkb | melwitt: I've found the env to administrate the environment and can run the resource provider commands. When I run openstack server list --all-projects only one VM shows up (our mirror). In the doc you shared it showed performing actions for specific VMs but I don't seem to have that here. In this case would I just run the heal command first? | 16:54 |
clarkb | and i guess make note of the allocation for the single VM that is present first | 16:54 |
* melwitt looks | 16:55 | |
clarkb | there also doesn't appear to be a way to list all resource allocations. | 16:57 |
melwitt | clarkb: ok yeah sorry, heal_allocations is when you still have the server and want to "heal" it. but it might still work if you pass the uuid of the server from the error message | 16:58 |
melwitt | if not, we'll want to do allocation deletes directly | 16:58 |
clarkb | melwitt: got it. Do you know if there is a way to list the allocations? I can show the allocations for the uuids in the logs and they show up but I can't seem to do a listing of all of them | 16:58 |
clarkb | but worst case I can parse the log and generate a list to operate on. That should be doable | 16:59 |
melwitt | listing allocations can be done per resource provider by "openstack resource provider show <compute node uuid> --allocations" | 16:59 |
clarkb | aha thanks! | 17:00 |
melwitt | compute node uuid == resource provider uuid | 17:00 |
clarkb | I think I have what I need then. I can list all the allocations. Remove allocation(s) for the mirror VM then iterate over that list deleting the allocations and healing them | 17:01 |
melwitt | yeah you just want to remove allocations for any servers that no longer exist | 17:02 |
melwitt | i.e. "not in the database" | 17:02 |
melwitt | and the "consumer" uuids in placement map to the server uuids in nova | 17:03 |
melwitt | most of the time consumer == nova server/instance | 17:04 |
melwitt | I say "most of the time" because other services/entities can consume resources in placement as well | 17:05 |
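A hedged sketch of the listing and cleanup flow discussed above, using the osc-placement commands; the UUIDs are placeholders.

```shell
# One resource provider per compute node in this deployment:
openstack resource provider list

# Show a provider together with the allocations held against it
# (compute node uuid == resource provider uuid, as noted above):
openstack resource provider show <compute-node-uuid> --allocations

# For a consumer (server) that no longer exists in nova, drop its allocations:
openstack resource provider allocation delete <consumer-uuid>
```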
clarkb | makes sense. in this case I only see allocations that seem to map to nova | 17:06 |
clarkb | their attributes have server-y things like memory and disk and cpus | 17:06 |
melwitt | ah yeah | 17:11 |
melwitt | you are right, those are nova | 17:11 |
clarkb | melwitt: do I need to run the heal command at all if these instances don't exist? I should be able to simply delete the allocations then I am done? Or are there other side effects of the heal that I want? | 17:16 |
melwitt | clarkb: no I think heal is when the instance is still around but has some extra allocations from env "irregularities" during migrations etc. you are good to just delete for these servers that were deleted in the past | 17:16 |
clarkb | melwitt: thanks for confirming | 17:17 |
melwitt | clarkb: ok so sorry but I got the tools mixed up 😓 this is the one I should have told you https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement-audit for this case where you want to delete ones that no longer exist | 17:18 |
clarkb | melwitt: oh thanks | 17:18 |
melwitt | 'nova-manage placement audit --verbose' will iterate over all resource providers and look for orphaned allocations and if you pass --delete it will delete them for you | 17:19 |
clarkb | I'll try that before I manually delete from my list. Though I have to figure out where the nova-manage command is. I think it must be in one of the containers. Does nova-manage talk to the apis like osc and need those credentials or is it more behind the scenes? | 17:19 |
clarkb | looks like it reads configs directly in the install somewhere | 17:20 |
melwitt | yeah was just looking through, it does call the placement api as well but you don't need your own creds for it | 17:22 |
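A sketch of the nova-manage path melwitt points at; nova-manage reads nova.conf directly, so it has to run wherever that config (and the nova code) lives, e.g. inside one of the nova containers in this deployment.

```shell
# Dry run: report orphaned allocations without changing anything.
nova-manage placement audit --verbose

# Same scan, but also delete the orphaned allocations it finds.
nova-manage placement audit --verbose --delete
```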
clarkb | alright it cleaned up 65 allocations and the mirror still shows up with its allocations | 17:22 |
clarkb | now we wait and see if nodepool can launch successfully | 17:23 |
melwitt | ok cool | 17:23 |
clarkb | melwitt: doing it the more difficult way was good because I feel like I learned a bit more :) | 17:24 |
clarkb | but then having easy mode at the end was nice | 17:24 |
melwitt | :) | 17:25 |
clarkb | [node_request: 300-0015441935] [node: 0026535559] Node is ready | 17:25 |
clarkb | I think it is happy now | 17:26 |
melwitt | phew! | 17:26 |
fungi | awesome | 17:30 |
clarkb | https://grafana.opendev.org/d/4sdNjeXGk/nodepool-inmotion?orgId=1 | 17:40 |
*** ysandeep is now known as ysandeep|out | 18:27 | |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Explicit tox_extra_args in zuul-jobs-test-tox https://review.opendev.org/c/zuul/zuul-jobs/+/809456 | 19:01 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Add tox_config_file rolevar to tox https://review.opendev.org/c/zuul/zuul-jobs/+/806613 | 19:17 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Support verbose showconfig in tox siblings https://review.opendev.org/c/zuul/zuul-jobs/+/806621 | 19:17 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Include tox_extra_args in tox siblings tasks https://review.opendev.org/c/zuul/zuul-jobs/+/806612 | 19:17 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Explicit tox_extra_args in zuul-jobs-test-tox https://review.opendev.org/c/zuul/zuul-jobs/+/809456 | 19:17 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Pin protobuf<3.18 for Python<3.6 https://review.opendev.org/c/zuul/zuul-jobs/+/809460 | 19:17 |
fungi | infra-root: bad news, ticket from rackspace says they're planning a block storage maintenance for 2021-10-04 impacting afs01.dfw.opendev.org/main04 | 19:43 |
fungi | i suppose we should attach a new volume, add it as a pv in the main vg on the server, and then pvmove the extents off main04 and delete the volume | 19:44 |
fungi | i'll try to get that going today or tomorrow, it should be hitless for us | 19:45 |
fungi | at least we have a few weeks warning | 19:46 |
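A rough sketch of the volume shuffle fungi outlines above; the device names are hypothetical (the newly attached cinder volume appearing as /dev/xvdf, the doomed main04 volume as /dev/xvde).

```shell
pvcreate /dev/xvdf        # initialize the newly attached volume as an LVM PV
vgextend main /dev/xvdf   # add it to the existing "main" VG
pvmove /dev/xvde          # migrate all extents off the old PV, online
vgreduce main /dev/xvde   # drop the now-empty PV from the VG
pvremove /dev/xvde        # clear LVM metadata before detaching/deleting the volume
```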
fungi | unfortunately, cinder operations in rackspace are a pain because of the need to use the cinder v1 api which osc no longer supports | 19:47 |
*** odyssey4me is now known as Guest93 | 20:05 | |
Clark[m] | fungi: I think the osc in the venv in my homedir on bridge works with rax cinder; you just have to override the API version on the command line to v1 | 20:36 |
opendevreview | Slawek Kaplonski proposed opendev/irc-meetings master: Update Neutron meetings chairs https://review.opendev.org/c/opendev/irc-meetings/+/809478 | 20:48 |
fungi | Clark[m]: i'll give that a try, but i also have cinderclient set up on bridge i can use to do the cinder api bits | 20:49 |
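A hedged sketch of the two options mentioned here; whether a given osc release still accepts the v1 override is exactly the uncertainty being discussed.

```shell
# Clark[m]'s suggestion: force osc to the v1 volume API for this invocation
openstack --os-volume-api-version 1 volume list

# fungi's fallback: use python-cinderclient directly with the v1 API
OS_VOLUME_API_VERSION=1 cinder list
```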
*** dviroel is now known as dviroel|out | 21:00 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Run daily backups of nodepool zk image data https://review.opendev.org/c/opendev/system-config/+/809483 | 21:13 |
clarkb | infra-root ^ that isn't critical to back up but nodepool has grown the ability to do those data dumps so I figure we may as well take advantage of it | 21:13 |
fungi | i just did `curl -XPURGE https://pypi.org/simple/reno` (and a second time with a trailing / just in case) based on the discussion in #openstack-swift about job failures which look like more stale reno indices being served near montreal | 22:11 |
*** odyssey4me is now known as Guest101 | 22:56 | |
fungi | clarkb: i think ianw was able to work out how to extract the cached indices from the fs at one point, but i don't recall how he located samples | 23:29 |
ianw | fungi: ISTR it being an inelegant but ultimately fruitful application of "grep" | 23:31 |
ianw | 2020-09-16 : "pypi stale index issues ... end up finding details by walking mirror caches" is what i have in my notes | 23:32 |
fungi | sounds about right | 23:32 |
fungi | wow, and today's the anniversary! coincidence? | 23:33 |
clarkb | fwiw I did a find /var/cache/apache2/proxy -type f -name \*.header -exec grep reno {} \; | 23:36 |
ianw | https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2020-09-15.log.html#t2020-09-15T20:22:56 | 23:36 |
clarkb | then looked at all the files. It seems that pip explicitly asks for uncached data and that the only version of the file we cached was up to date | 23:37 |
clarkb | for reno's index specifically on the iweb mirror | 23:37 |
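A sketch of the cache-walking approach described above, assuming the stock mod_cache_disk layout where each cached response is stored as a <hash>.header / <hash>.data pair under the proxy cache root.

```shell
# Print the paths of cached entries whose stored headers mention reno:
find /var/cache/apache2/proxy -type f -name '*.header' \
  -exec grep -l reno {} \;

# The cached response body sits alongside each matching header file,
# so a suspect index can then be inspected with e.g.:
#   less /var/cache/apache2/proxy/<subdirs>/<hash>.data
```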
ianw | fungi: haha yes, i guess that's from my timestamp, so happened on the 15th UTC | 23:37 |