tkajinam | fungi, yeah that was merged after recheck. | 01:50 |
frickler | so even if nodepool did upload images to iad, those uploads weren't successful from a nodepool pov, the active images e.g. for rockylinux-9 are still 10 and 11d old. I'll try to clean up the older one via nodepool in the hope of getting a bit of space freed up on the builders | 04:29 |
frickler | also the osc command is "openstack image task list", it does work fine on other tenants and shows just a single task for the ci@iad tenant | 04:31 |
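For reference, the task listing frickler mentions can be run per cloud with python-openstackclient; the cloud names below are assumptions standing in for whatever clouds.yaml entries are actually configured for the ci@iad tenant and the other regions:

```shell
# Assumed clouds.yaml entry name for the ci@iad tenant; substitute the real one.
openstack --os-cloud rax-iad image task list

# The same listing against another tenant/region for comparison.
openstack --os-cloud rax-dfw image task list
```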
frickler | so the nodepool image-delete worked, I'll try to go through the other ones slowly in order not to overload the API or backend | 04:52 |
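A rough sketch of the slow cleanup pass described here, using the provider and image names from the log; the build and upload ids are placeholders:

```shell
# List the uploads nodepool still tracks for the provider.
nodepool image-list | grep rax-iad

# Delete one stale upload at a time so the API/backend isn't overloaded;
# the ids below are placeholders.
nodepool image-delete --provider rax-iad --image rockylinux-9 \
    --build-id 0000012345 --upload-id 0000000001
```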
frickler | regarding deleting the old instances, as expected a manual delete command didn't change the status, so those will likely need some rax intervention | 04:53 |
frickler | the retried acme.sh run on meetpad01 also still didn't work, so I'll do the manual cert copying steps later today | 04:54 |
frickler | seems the image deletion worked well and we now have at least about 25% free on the builders again | 06:57 |
*** amoralej is now known as amoralej|lunch | 11:04 | |
fungi | there were 3 images in rax-iad that nodepool had no record of so i deleted them just now and the counts are finally matched up | 11:42 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Revert "Temporarily pause image uploads to rax-iad" https://review.opendev.org/c/openstack/project-config/+/890908 | 11:44 |
fungi | and yes, after comparing times based on the image names, i'm certain the ones i was seeing appear after the pause took effect were somehow delayed/queued on the glance side. the builders didn't try to upload anything there after the pause, but things they had previously tried to upload were eventually appearing in the image list in glance many hours later | 11:52 |
fungi | counts in dfw and ord aren't as bad as iad was (each a little over 400) but still clearly about 95% leaked. i'll try to find time to clean those up later today | 12:03 |
frickler | fungi: regarding iad uploads, maybe try manually uploading an image first? does nodepool complain then? we could also test in the ci tenant? | 12:08 |
frickler | fungi: also any idea what to do about the stuck-in-deleting instances? | 12:09 |
fungi | for instances stuck deleting, i've usually had to open a support ticket or otherwise reach out to a cloud admin regardless of which provider it is | 12:11 |
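A minimal sketch of how one of the stuck instances might be inspected, and the delete retried, before escalating to the provider; the UUID and cloud name are placeholders:

```shell
# Check whether the instance is still wedged in a deleting task state.
openstack --os-cloud rax-iad server show 00000000-0000-0000-0000-000000000000 \
    -c status -c OS-EXT-STS:task_state

# Retrying the delete is harmless, but as noted above it usually doesn't
# change anything once the instance is stuck on the provider side.
openstack --os-cloud rax-iad server delete 00000000-0000-0000-0000-000000000000
```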
fungi | i can try manually uploading an image from one of the builders and see what happens, though working out the invocation could be tricky (i'll probably need to install osc in a venv first) | 12:13 |
fungi | and we don't install the python3-venv package on the servers | 12:14 |
fungi | we've got some room on bridge01 though, i guess i could try copying an image to it and uploading from there | 12:15 |
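The manual test from bridge probably looks roughly like this; the venv path, cloud name and image name are assumptions, and the vhd/bare formats mirror what nodepool uses for Rackspace uploads:

```shell
# One-off client install in a throwaway venv on bridge.
python3 -m venv ~/osc-venv
~/osc-venv/bin/pip install python-openstackclient

# Time the upload of the vhd copied over from nb01; "upload-test" is a
# placeholder image name.
time ~/osc-venv/bin/openstack --os-cloud rax-iad image create \
    --disk-format vhd --container-format bare \
    --file debian-bookworm-0000000167.vhd upload-test
```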
frickler | maybe start with a standard cloud image or even cirros. would also be interesting to see if smaller images work better | 12:20 |
fungi | well, i've got debian-bookworm-0000000167.vhd copying over from nb01 | 12:21 |
fungi | conveniently, bridge has ssh access to everything already | 12:23 |
*** amoralej|lunch is now known as amoralej | 12:23 | |
fungi | i've got to go do a couple other things, but i've started an image create in rax-iad with that vhd file and am timing it | 12:40 |
opendevreview | Slawek Kaplonski proposed openstack/project-config master: Allow neutron-core to act as osc/sdk service-core https://review.opendev.org/c/openstack/project-config/+/890914 | 12:54 |
fungi | created_at 2023-08-09T12:44:44Z | 13:11 |
fungi | status pending | 13:11 |
fungi | image list still isn't returning it | 13:11 |
fungi | and image show with its uuid returns "No Image found for ..." | 13:12 |
fungi | image task list is taking quite a while to return anything | 13:17 |
fungi | took about 9 minutes to complete and listed thousands of entries (i'm running it again to try and get a proper count of them since the output was longer than my buffer) | 13:32 |
fungi | 45275 tasks listed | 13:44 |
fungi | the image i uploaded still isn't being returned by the api | 13:44 |
fungi | need to go run some errands, but will bbiab | 13:45 |
guilhermesp_____ | fungi: sorry for the delay here -- we did have an incident yesterday with one of the storage nodes in ca-ymq-1 around 10:30 am est -- wonder how is this looking so far for you now | 14:32 |
opendevreview | Slawek Kaplonski proposed openstack/project-config master: Allow neutron-core to act as osc/sdk service-core https://review.opendev.org/c/openstack/project-config/+/890914 | 14:39 |
*** dviroel__ is now known as dviroel | 14:43 | |
fungi | guilhermesp_____: nothing new since 14:49:46 utc yesterday, so seems the impact was limited. thanks for confirming it was what it seemed like! | 14:59 |
opendevreview | gnuoy proposed openstack/project-config master: Add OpenStack K8S Telemetry charms https://review.opendev.org/c/openstack/project-config/+/890921 | 15:04 |
guilhermesp_____ | cool thanks for confirming fungi we took the measures to avoid that in the future | 15:34 |
fungi | much appreciated! | 15:34 |
frickler | fungi: wow, that's a lot of tasks. seems I never waited long enough for that command to return. should we try to delete all of them and see if that helps? I would hope that these are simple db operations that can work faster than image deletions | 15:57 |
fungi | frickler: i don't think they can be deleted? | 16:03 |
fungi | it seems like it's more of a log, since they all have states like "success" or "failure" | 16:04 |
fungi | `openstack image task ...` subcommands are limited to list and show, from what i can see | 16:05 |
fungi | probably some background process in glance is supposed to expire those entries after some time | 16:05 |
fungi | some of these entries are from 2015 | 16:07 |
frickler | oh, right, tab completion was giving me hallucinated extra commands | 16:10 |
fungi | i'm going to actually save the output to a file this time and then i can more easily see the newer entries (i think they're at the top of the list) and check what extended data they have using image task show | 16:10 |
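A sketch of capturing the full listing once so it can be counted and inspected without scrolling a terminal buffer; the output path is arbitrary:

```shell
openstack --os-cloud rax-iad image task list -f value > /tmp/iad-tasks.txt
wc -l /tmp/iad-tasks.txt
# Newest entries, assuming the listing really does sort newest-first.
head -n 20 /tmp/iad-tasks.txt
```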
fungi | probably the expected way to use the task listing endpoint in the api is to request the n newest entries or entries after x time | 16:12 |
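If the endpoint is used the way fungi suggests, something like the following would fetch only the newest handful of tasks; --limit is accepted by the command, while the sort flags are assumed from the Glance tasks API's sort_key/sort_dir parameters and may not be wired through in every client version:

```shell
openstack --os-cloud rax-iad image task list \
    --sort-key created_at --sort-dir desc --limit 20
```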
fungi | certainly getting 42k task entries from the past 8 years isn't terribly useful, at least in our case | 16:13 |
fungi | er, 45k | 16:13 |
fungi | also, if the image upload processing backlog is consistent with what i was seeing yesterday, that image i uploaded at 12:45 utc will probably start showing up in the image list soonish | 16:16 |
fungi | 8d06a385-edfa-4a52-aa3d-223b6ae3bd96 is the top one in the task list and it has a status of success. it's for that test image i uploaded | 16:19 |
fungi | huh, okay so the image uuid returned earlier by the create command doesn't match the one referenced in the task detail | 16:19 |
fungi | that image does exist, and is status active updated_at 2023-08-09T13:12:14Z | 16:20 |
fungi | so took just shy of 30 minutes to appear, probably | 16:20 |
fungi | no, wait that was an update from reaching its "expiration" (not sure what that does exactly). the import task was created at 12:44:44 and the image says it was created at 12:54:44, which is only 10 minutes | 16:23 |
fungi | i'll delete that and do another upload test to see if it's consistent | 16:24 |
fungi | started the new upload at 16:25:00 precisely | 16:25 |
frickler | I actually tried to list tasks with --limit 2, but that didn't seem to work any faster | 16:28 |
fungi | real 4m23.219s | 16:30 |
fungi | created_at 2023-08-09T16:29:23Z | 16:30 |
fungi | the image create response said the id was c35b96c7-5448-4e24-9d97-983ae3c2aab4 | 16:31 |
fungi | but i'll check for it by name instead since it seems like it eventually appears with a different uuid (maybe as a result of the import task) | 16:32 |
fungi | oh! the id is actually the glance task's id, not the image's id, this makes slightly more sense | 16:33 |
fungi | that uuid is an import task with status processing | 16:33 |
fungi | so once that task's status reaches success, it should mention an image id for the uploaded image | 16:34 |
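A sketch of checking that mapping directly, using the task id returned earlier; the status and result columns are assumptions about how Glance exposes the import task's detail:

```shell
openstack --os-cloud rax-iad image task show \
    c35b96c7-5448-4e24-9d97-983ae3c2aab4 -c status -c result
```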
fungi | still processing | 16:45 |
fungi | image is now appearing in the list, and claims to have been created at 2023-08-09T16:37:56Z (but wasn't there when i checked for it at 16:45 so something else is still going on) | 17:20 |
frickler | if I read the nodepool code correctly, it simply calls the create_image function in sdk. that has a default timeout of 1h, so I guess that should be fine and we can try to go on with the revert | 18:48 |
frickler | fungi: I've +2d the patch so you can proceed with it whenever you feel ready. or if you just un-wip it I can merge and watch it tomorrow in my morning | 18:50 |
fungi | thanks frickler! i'm going to do a couple more upload tests and see if i can get a clearer picture of how quickly they're appearing in the image list | 18:53 |
fungi | but if it's significantly less than an hour i agree that seems good enough | 18:53 |
fungi | started a test upload at 19:39:32 which finished at 19:44:19 and appeared in the image list between 20:11:12 and 20:12:12, so worst case call that 32m40s (but probably a few minutes less if nodepool is counting from when the create call returns from blocking) | 20:26 |
fungi | this seems reasonable, i'll approve the revert and keep an eye on it while i knock out some yardwork | 20:27 |
opendevreview | Merged openstack/project-config master: Revert "Temporarily pause image uploads to rax-iad" https://review.opendev.org/c/openstack/project-config/+/890908 | 20:43 |
fungi | keeping an eye on new uploads to rax-iad once that deploys | 20:45 |
fungi | which have now started | 20:47 |
fungi | currently uploading 15 images... hopefully that thundering herd doesn't slow ready time to more than an hour | 20:51 |
fungi | not looking good, we're already past 45 minutes and they're all still "uploading" | 21:33 |
fungi | then again, uploading one image from bridge took over 4 minutes. even splitting the load between two builders, maybe it really takes this long just to transfer the data for ~7.5 images from each builder to glance | 21:35 |
fungi | especially if there are uploads to other providers going on at the same time | 21:35 |
fungi | not good, now the uploading count has risen to 16 | 22:32 |