| opendevreview | James E. Blair proposed zuul/zuul-jobs master: Simplify testing of some upload roles https://review.opendev.org/c/zuul/zuul-jobs/+/959402 | 00:02 |
|---|---|---|
| opendevreview | James E. Blair proposed zuul/zuul-jobs master: Simplify testing of some upload roles https://review.opendev.org/c/zuul/zuul-jobs/+/959402 | 00:11 |
| opendevreview | Merged zuul/zuul-jobs master: Simplify testing of some upload roles https://review.opendev.org/c/zuul/zuul-jobs/+/959402 | 00:24 |
| *** liuxie is now known as liushy | 02:28 | |
| opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/957995 | 02:30 |
| *** bbezak_ is now known as bbezak | 06:22 | |
| *** liuxie is now known as liushy | 07:04 | |
| opendevreview | Merged openstack/diskimage-builder master: bootloader: Fix searching for grub-mkconfig https://review.opendev.org/c/openstack/diskimage-builder/+/949493 | 08:14 |
| amorin | hey team, are you aware of any ipv6 issues on openstack CI recently? For a few days I've been getting 503 errors that seem random and unrelated to my work | 09:49 |
| amorin | e.g. https://zuul.opendev.org/t/openstack/build/f0ae336a1b2142a7930d68e46f3a965f | 09:50 |
| frickler | amorin: nothing I'm aware of, but it also doesn't look related to IPv6, like I'm seeing the same (or so it seems) failure on plain devstack here https://zuul.opendev.org/t/openstack/build/47e48f0c00614ea4b215ff407b35cdfc , would rather look like some flaky/racey test? | 10:42 |
| amorin | ack, thanks, so if it's only me, there's a very good chance it's my code change that's introducing races :) | 10:44 |
| fungi | bad news on the afs01.dfw upgrade. while it did successfully move mirror.ubuntu it ran out of space on afs02.dfw for mirror.ubuntu-ports | 12:30 |
| fungi | in retrospect, i should probably have inserted a vos release after each move, i bet that's what frees the temporary consumption | 12:31 |
| fungi | i'll make sure to do that when i move them all back to their original homes | 12:31 |
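For reference, the move-then-release sequence fungi describes would look roughly like this (a sketch using standard OpenAFS commands; the exact server and partition names are assumptions based on the discussion):

```shell
# Move the RW volume to its temporary home...
vos move mirror.ubuntu afs01.dfw.openstack.org vicepa afs02.dfw.openstack.org vicepa
# ...then release it right away, which is presumably what frees the
# temporary space consumed during the move.
vos release mirror.ubuntu
```

As noted later in the discussion, a vos release needs to hold the release lock, so in practice the scheduled mirror-update processes may be the ones to run it.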
| fungi | anyway, it's in progress again | 12:37 |
| Clark[m] | If you do vos releases you need to hold the lock right? So maybe it does make sense to grab those on the mirror node for the next pass? | 13:29 |
| amorin | ahah frickler, in the mistral tests, we do a request against: https://httpbin.org/encoding/utf8 | 14:00 |
| amorin | giving 503 at the moment | 14:00 |
| fungi | Clark[m]: oh, good point, so maybe it's more a matter of pausing between the larger volume moves to let our scheduled processes execute a vos release on their own | 14:30 |
| clarkb | fungi: if you have time I have a whole bunch of changes that have come up while you were out that shouldn't require too much effort to review: https://review.opendev.org/c/opendev/system-config/+/959236 https://review.opendev.org/q/hashtag:"drop-bionic" https://review.opendev.org/c/zuul/zuul-jobs/+/958800 https://review.opendev.org/c/opendev/infra-manual/+/958571 note that some of | 14:47 |
| clarkb | them also have children | 14:47 |
| fungi | i'll take a look after lunch. trying to catch up on languishing mailing list moderation tasks first | 14:47 |
| fungi | oh, also an additional stat, the mirror.ubuntu move last week took a little over 19 hours to complete | 15:04 |
| fungi | i'm guessing the mirror.ubuntu-ports move will be a little faster owing to the removal of bionic-arm64 earlier | 15:04 |
| fungi | gonna grab a quick lunch, bbiab | 15:21 |
| opendevreview | Mohammad Issa proposed openstack/project-config master: Add repo app-metallb for starlingx https://review.opendev.org/c/openstack/project-config/+/959574 | 18:14 |
| clarkb | that change sets requireContributorAgreement = true | 18:15 |
| clarkb | I don't think we are doing that anymore and expect CI to -1 for that reason. but figured I'd call it out in case we want to provide any special messaging | 18:16 |
| opendevreview | Mohammad Issa proposed openstack/project-config master: Add repo app-metallb for starlingx https://review.opendev.org/c/openstack/project-config/+/959574 | 18:18 |
| fungi | looks like they got the hint | 18:21 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Update UA filter rules https://review.opendev.org/c/opendev/system-config/+/959576 | 18:32 |
| opendevreview | Mohammad Issa proposed openstack/project-config master: Add repo app-metallb for starlingx https://review.opendev.org/c/openstack/project-config/+/959574 | 18:33 |
| clarkb | side note on 959576 at least one UA is already blocked and shows up with a ton of 403 hits | 18:38 |
| clarkb | so they don't even back off when they get told to go away, but at least that is far cheaper for the server to respond with than trying to process expensive requests for bots that don't identify themselves properly | 18:39 |
| fungi | yeah, at least it doesn't result in more load on the db | 19:08 |
| fungi | is there a way to do a quoted reply to an inline comment in gerrit where someone else has already replied? the ui only seems to let me reply to the most recent comment, not any earlier ones | 19:11 |
| fungi | the very bottom of the comment thread has a "quote" link but it only ever quotes the last comment in the thread | 19:11 |
| opendevreview | Merged openstack/project-config master: Add repo app-metallb for starlingx https://review.opendev.org/c/openstack/project-config/+/959574 | 19:27 |
| opendevreview | Nicolas Hicher proposed zuul/zuul-jobs master: Refactor: multi-node-bridge to use linux bridge https://review.opendev.org/c/zuul/zuul-jobs/+/959393 | 19:41 |
| clarkb | fungi: it uses email like quoting > foo | 19:59 |
| clarkb | but no, the UI doesn't give you an easy button for earlier comments like it does for the latest one. But you can copy the content and prefix it with > | 20:00 |
| fungi | i didn't know if it did any sort of fancy linking or username reference beyond just the actual quote markers | 20:04 |
| clarkb | I don't think it does | 20:04 |
| clarkb | the problem from the UI perspective is you can only respond to the last comment | 20:05 |
| clarkb | regardless of replies or quoting | 20:05 |
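The manual workaround, then, is email-style quoting pasted into a reply on the last comment in the thread, e.g.:

```
> text copied from the earlier comment you actually want to answer
my response to that specific point
```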
| corvus | there actually is a comment thread, but it's really hard to piece together using the api (which is why gertty's implementation of it is not quite 100% yet) | 20:08 |
| corvus | (as in, replies do have a reference to their parent comment.) | 20:09 |
| clarkb | ya and the web UI only lets you reply to the last comment in the thread. It doesn't do branching | 20:10 |
| fungi | clarkb: question on 958809 | 20:23 |
| fungi | never mind! i should have read the next change in that series first | 20:24 |
| clarkb | fungi: zuul-jobs automatic job management doesn't let non-voting jobs go into the gate | 20:25 |
| clarkb | so when I made them non-voting while changing the behavior of the test to enforce the behavior we want, things got sad | 20:25 |
| fungi | yeah, mentioning that in the commit message might have helped, but no biggie | 20:25 |
| clarkb | ah ya you weren't around for the day of "wtf is going on" debugging that happened. Sorry, I should've included more detail there | 20:26 |
| fungi | i re-read that commit message several times to make sure i wasn't missing some additional context | 20:26 |
| fungi | i saw a bit of the discussion in here or in matrix (i forget which), but that was many beers ago for me now | 20:26 |
| clarkb | the main source of the confusion was that the uwsgi image build worked. But eventually I realized that was because we were doing a multiarch build for it with a single arch listed | 20:27 |
| clarkb | then it all sort of started to come together after some manual testing on a held node | 20:27 |
| clarkb | but it was very confusing for a while | 20:27 |
| fungi | understandably confusing, it's quite complicated | 20:27 |
| fungi | i'm still working to wrap my head around some of it | 20:28 |
| clarkb | as a side note: I think it is crazy that the default buildx builder doesn't honor the settings in the documented config file. I also think it is crazy that ssl cert management isn't simpler, but I learned long ago that docker really wants you to use things the prescribed way and anything else is "good luck, have fun" | 20:28 |
| fungi | starting with not rolling your own network connectivity | 20:28 |
| fungi | if you pass a list as the condition for an ansible "when" directive, are the items tested as a logical and? | 20:33 |
| opendevreview | Merged zuul/zuul-jobs master: Fix kubernetes install methods https://review.opendev.org/c/zuul/zuul-jobs/+/958800 | 20:33 |
| clarkb | fungi: yes listed conditions should all be satisfied to trigger the task/block | 20:34 |
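For reference, a `when` list like the following only fires when every item is true, i.e. the items are implicitly joined with `and` (a minimal sketch with made-up condition names):

```yaml
- name: Runs only if both conditions hold
  ansible.builtin.debug:
    msg: "both conditions were true"
  when:
    - ansible_facts['os_family'] == "Debian"
    - enable_feature | bool
```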
| opendevreview | Nicolas Hicher proposed zuul/zuul-jobs master: Refactor: multi-node-bridge to use linux bridge https://review.opendev.org/c/zuul/zuul-jobs/+/959393 | 20:37 |
| opendevreview | Merged opendev/system-config master: Exclude django caches from lists.o.o backups https://review.opendev.org/c/opendev/system-config/+/959236 | 20:42 |
| fungi | clarkb: okay, now i have a question on 958783. i expect i'm missing some fundamental nuance of the change | 20:45 |
| clarkb | I just remembered to remove mirror02.sjc3.raxflex.opendev.org from the emergency file. This is now done | 20:45 |
| opendevreview | Merged zuul/zuul-jobs master: Update registry tests to better cover speculative image builds https://review.opendev.org/c/zuul/zuul-jobs/+/958809 | 20:46 |
| clarkb | fungi: responded. Does that help? | 20:47 |
| fungi | ah, yep, so it was being done needlessly before in single-arch jobs? | 20:49 |
| clarkb | fungi: no single arch builds used the default docker buildx builder. They didn't use the custom one. | 20:49 |
| opendevreview | Merged opendev/infra-manual master: Drop the suggestion for using the x/ namespace https://review.opendev.org/c/opendev/infra-manual/+/958571 | 20:49 |
| clarkb | fungi: using the default builder doesn't work with our buildset registries because it seems to ignore the config file, and even then we'd have to configure /etc/hosts or DNS and ssl certs to make it all work | 20:49 |
| clarkb | fungi: the multiarch builds use a custom buildx builder that solves all of those problems. So we can run single arch builds just like the multiarch builds with a custom builder. We just leave out the multiarch specific bits (the emulation essentially) | 20:50 |
| fungi | oh, so this starts using the custom one for single-arch runs, and then you're just omitting steps that they don't need | 20:50 |
| clarkb | yes | 20:50 |
| fungi | got it | 20:50 |
| clarkb | https://review.opendev.org/c/zuul/zuul-jobs/+/958783/9/roles/build-container-image/tasks/main.yaml thats what this diff does essentially | 20:51 |
| clarkb | when container_command == docker we always use the custom buildx builder. Whereas before it split based on whether or not the build was multiarch | 20:51 |
| fungi | clarkb: and now a question on 958906 | 20:54 |
| fungi | in 958907 (for bindep) you did increase it | 20:55 |
| fungi | sadly some of the packaging improvements won't be possible until we also drop support for 3.8, not quite there yet | 20:57 |
| clarkb | fungi: responded | 20:58 |
| clarkb | I'm happy to go either way on it | 20:58 |
| fungi | a bunch of the support for more modern metadata didn't show up until setuptools 77, which requires python 3.9 or later | 20:58 |
| clarkb | also for some reason I thought that was the bindep change (which is probably evident in my response) | 20:58 |
| clarkb | I think I mixed up where I wanted to update python requires. I meant to leave it on bindep and bump it on git review | 20:58 |
| clarkb | essentially because bindep doesn't change anymore | 20:58 |
| clarkb | but I'm happy to bump it for all of them if we think that is just simpler | 20:59 |
| fungi | yeah, there are things we can't do with bindep's packaging, as i just mentioned, until we're only supporting newer python, so this is a forcing function for us to slowly get there | 20:59 |
| clarkb | ack so the purpose behind it is to simplify and converge the packaging behavior/tooling on a modern common platform | 21:00 |
| clarkb | I'll update the git review change | 21:00 |
| opendevreview | Clark Boylan proposed opendev/git-review master: Drop testing on Bionic and Python36 https://review.opendev.org/c/opendev/git-review/+/958906 | 21:00 |
| fungi | that's at least the driving concern for me. newer packaging needs newer deps which need newer python, so we're essentially stuck producing python 3.6 era packages as long as we want to support python 3.6 | 21:01 |
| clarkb | I guess this also goes back to my frustrations with python packaging breaking old stuff that was perfectly fine | 21:02 |
| clarkb | similar to how old openstack packages are no longer installable from pypi anymore or something | 21:02 |
| fungi | and then not backporting fixes for older setuptools, yeah | 21:03 |
| clarkb | ruamel.yaml had a warning about this too but that seems to still work for now | 21:03 |
| clarkb | basically if the old code isn't broken as it stands then we shouldn't be breaking it just because | 21:03 |
| clarkb | but that is the approach most of the python ecosystem has taken and trying to swim against that tide is a losing battle | 21:03 |
| fungi | they introduced a regression that nobody pointed out before they dropped old python support in the next version, and then couldn't fix setuptools for people on the older python version they broke | 21:04 |
| fungi | with the argument that such old python versions are eol upstream | 21:04 |
| fungi | so trying to continue supporting them is too much work | 21:04 |
| clarkb | it's literally more work to do what they've done... | 21:05 |
| clarkb | fungi: I'm noticing in https://review.opendev.org/c/opendev/system-config/+/959236/1/inventory/service/group_vars/mailman3.yaml I kept a trailing / I don't think it matters but none of the other values have a trailing / | 21:06 |
| clarkb | ya those values get passed to borg --exclude | 21:07 |
| fungi | i didn't spot that, i guess it's a question of whether borg thinks trailing slashes are special or just filters them out | 21:07 |
| clarkb | https://borgbackup.readthedocs.io/en/stable/usage/help.html#borg-patterns discusses patterns | 21:09 |
| clarkb | I don't see it handling the trailing / specially at all. I expect this will be fine and we can confirm after the next round of backups | 21:09 |
| fungi | anyway, on the python_requires/requires-python front, i think the setuptools situation is why i'd rather take this opportunity to increase it when we drop testing for it, because otherwise we may drop support after merging a regression for older python and be unable to go back and fix that as easily | 21:10 |
| clarkb | fungi: # Exclude the contents of '/home/user/cache' but not the directory itself: when using -e /home/user/cache/ there | 21:10 |
| clarkb | so the diskcache/ dir will get backed up but none of its contents. I think that is fine | 21:10 |
| fungi | yeah, it won't really take up any room | 21:10 |
| fungi | at most its last updated time will have some churn | 21:10 |
| clarkb | ya shouldn't be a big deal. But happy to push a change up to drop the / if we want | 21:11 |
| fungi | no need in my opinion | 21:14 |
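The behavior being discussed matches the example quoted above from the borg patterns docs; reusing the docs' own path, the difference is just the trailing slash (a sketch, not the actual backup invocation):

```shell
# Trailing slash: exclude the contents of 'home/user/cache'
# but keep the (empty) directory entry itself.
borg create -e home/user/cache/ ::archive /
# No trailing slash: exclude the directory and everything in it.
borg create -e home/user/cache ::archive /
```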
| opendevreview | Merged zuul/zuul-jobs master: Always build docker images with custom buildx builder https://review.opendev.org/c/zuul/zuul-jobs/+/958783 | 21:14 |
| fungi | as for preventing users from installing newer bindep on older platforms, the issue is that pip will end up automatically selecting a bindep not tested with python 3.6 when users `pip install bindep` there, while setting python_requires/requires-python higher will stop it from being auto-selected on platforms where it might have stopped working | 21:18 |
| fungi | i do wish pip had a middle ground and "recommended for use with" wasn't the same as "will only be installable on" | 21:19 |
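pip's gate is a hard version comparison: any release whose Requires-Python specifier the running interpreter fails is simply dropped from candidate selection, so a newer bindep never even appears as an option on old platforms. A stdlib-only sketch of the idea (real pip uses the packaging library's full specifier grammar):

```python
import sys


def satisfies_requires_python(minimum, version=None):
    """Rough model of a '>=X.Y' Requires-Python check, as applied
    by pip during candidate selection."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= tuple(minimum)


# A release declaring requires-python = ">=3.9" is invisible to a
# pip running on 3.6, which keeps resolving to older releases:
print(satisfies_requires_python((3, 9), version=(3, 6)))   # False
print(satisfies_requires_python((3, 9), version=(3, 12)))  # True
```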
| opendevreview | Clark Boylan proposed opendev/grafyaml master: Pull python base images from quay.io https://review.opendev.org/c/opendev/grafyaml/+/958601 | 21:20 |
| clarkb | with 958783 landed I can clean up changes like ^ and recheck the others that failed | 21:21 |
| fungi | the python version support removal changes probably also warrant release notes, but we can worry about those when we get ready to release. they're fairly copy-paste one sentence deals anyway | 21:23 |
| opendevreview | Clark Boylan proposed opendev/lodgeit master: Pull base images from opendevorg rather than opendevmirror https://review.opendev.org/c/opendev/lodgeit/+/958602 | 21:24 |
| fungi | at least for bindep and git-review, i don't recall if we ever bothered to do them for glean | 21:24 |
| clarkb | I can't recall for glean | 21:24 |
| fungi | i don't see any, don't think we did | 21:25 |
| fungi | also looks like we never added python_requires to the setup.cfg for glean? | 21:26 |
| fungi | but that's not something like a tool end users typically install directly, so probably fine | 21:26 |
| clarkb | probably not, because of our reliance on it and the slowness with which we remove diskimages I don't think we've run into problems there | 21:27 |
| clarkb | we could potentially run into that here with bionic testing going away before we remove bionic nodes, but we're actively trying to remove bionic nodes and that chicken-and-egg has always existed with glean, so I expect it's mostly fine | 21:27 |
| fungi | right, i'm not overly concerned | 21:27 |
| clarkb | fungi: were you going to review https://review.opendev.org/c/opendev/glean/+/953163/2 as well? | 21:29 |
| clarkb | I think I'm happy to approve that as is if not | 21:29 |
| fungi | yeah, i'm looking at it already | 21:29 |
| fungi | seems like the dib change it depends-on has already merged | 21:29 |
| opendevreview | Merged opendev/bindep master: Drop Bionic testing https://review.opendev.org/c/opendev/bindep/+/958907 | 21:29 |
| clarkb | fungi: yes there was a lot more momentum on the dib side to get those new jobs in place to test centos 10 | 21:31 |
| fungi | cool, well this finishes it off i guess | 21:32 |
| opendevreview | Merged opendev/git-review master: Drop testing on Bionic and Python36 https://review.opendev.org/c/opendev/git-review/+/958906 | 21:41 |
| fungi | looking like i'll probably be afk by the time the mirror.ubuntu-ports move completes, but assuming it does then i can likely knock out the afs01.dfw upgrade to noble tomorrow, as well as the afsdb and kdc servers, and get the rw volume moves back to afs01.dfw underway in preparation to upgrade afs01.ord and afs02.dfw early next week | 21:45 |
| fungi | though even if the volume moves start tomorrow, i expect it'll be tuesday at the earliest before they're finished | 21:46 |
| clarkb | makes sense given how long it took the first pass through | 21:49 |
| opendevreview | Nicolas Hicher proposed zuul/zuul-jobs master: Refactor: multi-node-bridge to use linux bridge https://review.opendev.org/c/zuul/zuul-jobs/+/959393 | 21:55 |
| clarkb | corvus: I think the image builds are a bit slower now. Not the end of the world I just wanted to make note of it (not surprising given the few extra steps we're taking) | 22:06 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Build gerrit image with python base from quay.io https://review.opendev.org/c/opendev/system-config/+/958597 | 22:10 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Pull hound's base python image from quay https://review.opendev.org/c/opendev/system-config/+/958593 | 22:11 |
| corvus | like pulling the builder image and starting it? or is this a situation where we've got too many ansible tasks and we should just squash them into a script? | 22:12 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Build ircbots with base python image from quay.io https://review.opendev.org/c/opendev/system-config/+/958596 | 22:13 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Pull python base image for statsd metric reporters from quay.io https://review.opendev.org/c/opendev/system-config/+/958594 | 22:16 |
| clarkb | corvus: yes I think it is all of the extra pulls and pushes | 22:16 |
| clarkb | corvus: we also have the temporary registry that we pull into from the custom buildx builder | 22:16 |
| corvus | i think that was due to multi-arch; i wonder if modern skopeo could accomplish a transfer that retains all the info. that might be a thread to pull on. | 22:17 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Update jinjia-init and gitea-init to modern image build tooling https://review.opendev.org/c/opendev/system-config/+/958598 | 22:18 |
| clarkb | corvus: oh, something like telling skopeo to directly transfer from the custom builder to the buildset/intermediate registries? | 22:18 |
| clarkb | ya that could be quicker if possible | 22:19 |
| corvus | yep | 22:19 |
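If it pans out, the transfer corvus is speculating about might be as simple as the following (registry endpoints are placeholders; whether this actually retains everything a multi-arch buildx push does is exactly the open question):

```shell
# skopeo copy --all copies every instance of a manifest list,
# not just the one matching the local architecture.
skopeo copy --all \
  docker://temp-registry.example.org/project/image:tag \
  docker://buildset-registry.example.org/project/image:tag
```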
| clarkb | in any case all of these changes I just updated/rechecked should be a good exercise of the updated system | 22:19 |
| clarkb | and extra proof that this now works as expected for docker | 22:19 |
| fungi | looking at the graphs, mirror.ubuntu-ports is roughly 77% the size of mirror.ubuntu at the moment, so projecting completion time around 03:25 utc | 22:31 |
| fungi | if i'm still awake i'll check in on it | 22:32 |
| fungi | but doesn't look like i'll be able to proceed with upgrades until tomorrow morning my time in any case | 22:33 |
| fungi | also a heads up, mirror.centos-stream seems to be over 95% of its quota, so could fill up pretty easily | 22:34 |
| clarkb | no rush. I've got to pop out this evening for a school thing myself | 22:34 |
| clarkb | fungi: yes one of the issues with centos mirroring (and why I'm wary of mirroring it more) is that they don't remove a lot of old packages like debuntu do | 22:35 |
| clarkb | and some of the packages are quite large (like firefox and thunderbird iirc) so things can grow unwieldy | 22:35 |
| fungi | so basically the only culling that happens for it is when we drop entire release series | 22:36 |
| clarkb | yup | 22:36 |
| clarkb | I think some packages do get deleted when replaced, but not all of them. And notably the very large packages don't | 22:37 |
| opendevreview | Merged opendev/glean master: Drop testing on Bionic and Xenial https://review.opendev.org/c/opendev/glean/+/958909 | 22:39 |
| clarkb | https://mirror.dfw.rax.opendev.org/centos-stream/9-stream/AppStream/aarch64/os/Packages/?C=S;O=A and https://mirror.dfw.rax.opendev.org/centos-stream/9-stream/AppStream/x86_64/os/Packages/?C=S;O=A illustrate the problem (scroll to the bottom) | 22:40 |
| clarkb | https://mirror.dfw.rax.opendev.org/centos-stream/9-stream/CRB/x86_64/os/Packages/?C=S;O=A this is a good one. There is like 25GB of unneeded data right there | 22:41 |
| fungi | looking at https://zuul.opendev.org/t/openstack/build/1f72b674eac0472fb116dfcef250f0f3 it's unclear to me whether "ensure-dib: Check if diskimage-builder is installed" should be reworked to not present a failure state | 22:42 |
| fungi | "Wait for server to boot or fail" later looks like the actual issue | 22:42 |
| clarkb | its 5 major versions of dotnet sdk with multiple versions of each one except for 10 | 22:42 |
| clarkb | fungi: that test is building an image with dib (which it appears to have done), uploading it to the openstack cloud, then booting it with openstack and waiting for the boot to finish. This last step appears to have failed. Not sure about the "diskimage-builder is installed" question | 22:44 |
| clarkb | I suspect we may not be waiting long enough for things to boot in all cases and may need to increase the time we wait there | 22:44 |
| fungi | the n-cpu log does have a bunch of tracebacks in it, but i have no idea whether that's normal | 22:45 |
| fungi | well, not a bunch, just a few. and they look like slow rabbitmq maybe | 22:45 |
| clarkb | oh I see, if you look at the console log it's doing a check to see if dib is installed and then installing dib. I think that is just noise and an artifact of how things get rendered. It's probably a failed_when: false situation | 22:46 |
| fungi | right, i was just wondering if that task should be reworked to not exhibit a failure result, since it also ends up in the summary as the likely cause for the job failure (which it isn't) | 22:46 |
| fungi | or, well, as a likely cause anyway | 22:47 |
| fungi | and yes, ultimately noise | 22:47 |
| clarkb | the task is marked ignore_errors: true | 22:48 |
| clarkb | maybe it needs to be failed_when: false to not set the status to failed and then ignore that status | 22:48 |
| clarkb | the boot timeout is ~20 minutes. Its 120 checks with a 10 second delay each time | 22:48 |
| fungi | or maybe the dashboard could also filter out its typical failure identification if ignore_errors is true on the task | 22:48 |
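The distinction in play, as a sketch (task and command names here are illustrative, not the actual ensure-dib role code): `failed_when: false` means the task result is never recorded as failed, while `ignore_errors: true` records the failure and merely continues past it, which is what UIs like the Zuul dashboard then surface:

```yaml
# Never marked failed, so nothing shows up as a likely cause:
- name: Check if diskimage-builder is installed
  ansible.builtin.command: dib-run-parts --version
  register: dib_check
  failed_when: false

# Marked failed (then ignored), which dashboards may still surface:
- name: Check if diskimage-builder is installed
  ansible.builtin.command: dib-run-parts --version
  register: dib_check
  ignore_errors: true
```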
| clarkb | oh except it logs that it only tried 100 attempts | 22:49 |
| clarkb | oh! the server entered an ERROR status with nova that is why | 22:49 |
| fungi | aha, that'd do it | 22:50 |
| fungi | maybe still a timeout, just one internal to the scheduler/controller? | 22:50 |
| clarkb | ya I think Instance failed to spawn: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 5ef90e0256e94292986613ba4ae98e64 is why it entered an ERROR state with nova | 22:50 |
| fungi | okay, so those exceptions that i mistook for slow rabbitmq responses may have been an actual indication | 22:51 |
| clarkb | yes I suspect this is a sad cloud more than anything else | 22:52 |
| clarkb | corvus: it's a bit late to do anything about ze11 now. But I'm wondering if we test it one more time and then maybe redeploy it? | 22:53 |
| corvus | oh yeah that | 22:53 |
| clarkb | corvus: the interesting thing is you just deployed ze11 on noble recently and that would've built up the git caches from scratch and been fine at that time, so something definitely changed with ipv6 on that instance, but maybe the easiest thing to do is move on like we did with the opendev.org haproxy | 22:53 |
| corvus | test cloning now | 22:54 |
| fungi | oh, i missed this bit of fun... is it exhibiting the same symptoms? | 22:54 |
| corvus | still slow | 22:55 |
| clarkb | fungi: no it never seems to drop the connection (and it is in rax not vexxhost), but it clones nova in about 12-13 minutes which is several minutes longer than our 10 minute timeout | 22:55 |
| clarkb | fungi: testing via ipv4 on ze11 and ipv6 on ze01 we get clones in the 3-4 minute range which is what we expect (and gives our timeout a typical 2.5x buffer) | 22:55 |
| fungi | ah, okay so totally different problem then | 22:55 |
| clarkb | but the node would've cloned nova under the timeout when it was rebuilt onto noble recently | 22:56 |
| clarkb | so within the last couple of months its ipv6 connection to review (and possibly elsewhere) is slower than expected | 22:56 |
| clarkb | we noticed because for some reason that executor decided it needed to reclone nova then entered a failure loop that actually impacted other jobs running on it (as their git stuff got backed up behind the things trying to clone nova) | 22:57 |
| clarkb | so we turned off ze11 (and it's been off since, with occasional manual tests to see if it is any better) | 22:57 |
| clarkb | er we turned off the executor container on ze11 but the host is running and in the emergency file | 22:57 |
| clarkb | cloudnull: ^ is slower-than-expected ipv6 connectivity for specific hosts in rax classic something you'd be interested in digging into? | 22:58 |
| clarkb | cloudnull: tl;dr is we have a host that clones nova from review.opendev.org slower than other hosts in the same region. And forcing ipv4 also seems to provide consistent behavior with the other hosts | 22:59 |
| clarkb | but it's tough to say the problem is in rackspace, it could be in vexxhost. Or it could be some intermediate router/pathway on the internet... | 22:59 |
| opendevreview | Tim Burke proposed openstack/project-config master: update_constraints.sh: Better describe what we're skipping https://review.opendev.org/c/openstack/project-config/+/959628 | 23:04 |
| corvus | clarkb: 8:41.02elapsed | 23:04 |
| corvus | not good enough to return to service i think | 23:04 |
| clarkb | corvus: interesting that it is faster. So maybe whatever the issue is is not a consistent or persistent problem? | 23:07 |
| clarkb | like if it is packet loss maybe the amount of packet loss has reduced or something | 23:07 |
| clarkb | cloudnull: I guess if this is the sort of mystery you would be interested in running down I think we're willing to help. Otherwise we may recycle the node and see if we get better results with a new one | 23:19 |
| clarkb | fungi: https://review.opendev.org/c/opendev/system-config/+/959576 is a good one to look at tomorrow too if we think that will help lists performance. Note I don't want to land it tonight as that will affect all services using the filter, so it has potentially broad impact, and I have parent-teacher meeting things tonight | 23:24 |
| fungi | yeah, would be better to have someone on hand to edit/roll back in case it blocks legit users we're not aware of | 23:28 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!