opendevreview | Takashi Kajinami proposed openstack/diskimage-builder master: Remove the usage of pkg_resource https://review.opendev.org/c/openstack/diskimage-builder/+/933324 | 05:54 |
---|---|---|
frickler | infra-root: the issues with the registry seem to persist after the restart, see e.g. https://zuul.opendev.org/t/openstack/build/3d0dca8ff09d41f49832835f0ae08c91 , I'm trying to do some debugging now but if anyone has more knowledge about that setup, help will be welcome | 08:12 |
tkajinam | frickler, thanks ! | 08:13 |
tkajinam | the issue is consistently seen but I don't see any specific failure in logs. actually log ends with "TASK [upload-logs-swift : Upload logs to swift]" so I suspect something went wrong during log upload but I'm struggling to understand details | 08:14 |
frickler | tkajinam: if you look at the timestamps, you see about 30 minutes pass after starting that task and then the timeout happens. so likely something getting stuck on the registry side, I'm trying to find logs for that now | 08:15 |
frickler | I didn't find any matching logs from the registry container. there are some ssl errors in the logs, but from different timestamps, so I'm assuming some rogue clients/portscans | 08:38 |
frickler | looking at the swift backend, it seems to be working, although the number of objects and amounts of data looks pretty huge to me, but not sure if that might be related to any issue | 08:39 |
frickler | it does seem like expiration isn't (always?) working though, as I found some objects dating back to 2020, while we should have 180d set as limit | 08:40 |
frickler | I guess I'll try to restart the container once again and see if that helps at least for some small amount of time. if not, running zuul-registry with debug enabled would be the next step | 08:46 |
frickler | fwiw I went for a reboot of the server instead of just restarting the containers, figured it couldn't hurt to pick up the updated kernel and resolve possible resource exhaustion issues | 08:52 |
frickler | recheck of the osc patch is running and I'm also holding the node in case it fails again to allow for more testing from the client side | 08:53 |
frickler | ok, now the build job passed and the container log also does contain some relevant logs for the upload. so I guess I'll need to dig deeper into the logs after the previous restart and try to locate when it stopped working | 09:18 |
frickler | I didn't find anything that looks helpful in the log, seems the last successful upload before my restart was at 2024-10-26 14:43:32,600 | 09:40 |
frickler | so I guess we'll just watch what happens now over the next 24h or so | 09:40 |
*** | ykarel_ is now known as ykarel | 09:49 |
tkajinam | yeah I see the build job now succeeds | 09:51 |
opendevreview | yatin proposed opendev/irc-meetings master: Move Neutron CI Weekly meeting 1 hour later https://review.opendev.org/c/opendev/irc-meetings/+/933637 | 10:14 |
opendevreview | Merged opendev/irc-meetings master: Revert "Update kolla meeting time" https://review.opendev.org/c/opendev/irc-meetings/+/933556 | 11:18 |
opendevreview | Merged opendev/irc-meetings master: Move Neutron CI Weekly meeting 1 hour later https://review.opendev.org/c/opendev/irc-meetings/+/933637 | 11:18 |
NeilHanlon | o/ morning folks. looks like gerritlib is back with another issue, this time due to setuptools 74. I'll try and throw a fix up this week if I can figure it out :) https://bugzilla.redhat.com/show_bug.cgi?id=2319663 | 13:37 |
NeilHanlon | (cc fungi) | 13:43 |
*** | ykarel_ is now known as ykarel | 13:59 |
fungi | thanks! | 14:19 |
fungi | NeilHanlon: i can successfully use https://pypi.org/p/build in a fresh venv to build the sdist and wheel for gerritlib 0.11.0 without pinning setuptools, is copr maybe extra strict about deprecation warnings? | 14:29 |
fungi | this was with setuptools 75.3.0 and wheel 0.44.0 | 14:32 |
NeilHanlon | hm, maybe they changed something again between 74.1.x and 75.3.x? | 14:40 |
NeilHanlon | i think it's https://github.com/pypa/setuptools/issues/931 | 14:41 |
NeilHanlon | `error: invalid command 'test'` | 14:41 |
NeilHanlon | heh | 14:41 |
NeilHanlon | feel free to ignore fungi... this is a packaging bug :) | 14:42 |
NeilHanlon | https://src.fedoraproject.org/rpms/python-gerritlib/blob/rawhide/f/python-gerritlib.spec#_67 | 14:42 |
fungi | perfect! | 14:44 |
NeilHanlon | thanks for looking at it before I got a chance to! saved me some time for sure | 14:44 |
fungi | no worries, i do normally consider myself fairly up to date on the overall python packaging scene and follow most of the discussions on the python discourse, so as to make sure we keep our projects relatively compliant (that said, there are plenty of setuptools deprecation warnings spewed when building gerritlib packages, which will need to be dealt with at some point, though i think most of them are ultimately needing adjustments to pbr) | 14:47 |
clarkb | ya we haven't dropped setup.py test support because pbr can be used with older setup.py. You just have to stop using the command if your toolchain is sufficiently new | 14:53 |
NeilHanlon | yep yep, i'm just gonna switch to running nox in the %check of the rpm | 14:58 |
fungi | sounds like a good idea | 15:01 |
frickler | fungi: picking up from the discussion in the nova channel, do we have a doc somewhere how to use the list pw for e.g. openstack-discuss (which I found on bridge) in order to access the moderation interface for the list? or should I rather get myself added as moderator with my usual mailman account? | 15:45 |
clarkb | I think the easiest way is going to be having your personal account as moderator on the various lists you are willing to moderate. For admin I think you login as admin to mailman3 like you would your normal user but then you have access to everything? fungi should be able to confirm and ya maybe we need a blurb on that in our docs | 15:48 |
fungi | frickler: yes, ideally have your own moderator account if you want to help with moderator activities, but the admin user should still be able to access the list moderation features as well | 15:50 |
frickler | ok, then how do I log in as admin user? I tried as just "admin" and that didn't seem to work, either | 15:50 |
fungi | lemme check the file, hold on | 15:51 |
frickler | and I likely wouldn't want to count as "regular" moderator, more like just in case of emergencies (although the event this morning wouldn't count as that ... maybe make it "on-demand" rather ;) | 15:52 |
opendevreview | Dmitriy Rabotyagov proposed openstack/diskimage-builder master: Add support for building Fedora 40 https://review.opendev.org/c/openstack/diskimage-builder/+/922109 | 15:53 |
opendevreview | Dmitriy Rabotyagov proposed openstack/diskimage-builder master: Add Fedora 40 to the CI tests https://review.opendev.org/c/openstack/diskimage-builder/+/933664 | 15:54 |
fungi | frickler: aha, looks like i need to clear out the old mailman v2 info from the list. you're effectively using the django admin user (password is in the private ansible inventory group vars), but i'll copy it to the usual place and make it clearer | 15:55 |
frickler | fungi: ah, ok, that explains it. so you are using a dedicated account for moderation, different from your usual subscription account? | 15:56 |
fungi | i'll throw a section in https://docs.opendev.org/opendev/system-config/latest/lists.html too | 15:56 |
fungi | frickler: it's not a dedicated account, just your normal subscriber account which can have additional roles (moderator, owner) | 15:56 |
frickler | ok, then I misread "your own moderator account" | 15:57 |
fungi | yeah, i meant your own account instead of the shared admin account | 15:57 |
clarkb | yesterday I tried to get an etherpad v2.2.6 change tested and I seem to be continuously timing out when pushing to the intermediate registry. corvus was that ssl issue with the zuul registry you mentioned showing up as a timeout in jobs or did that fail immediately? | 15:59 |
fungi | clarkb: see scrollback and recheck | 15:59 |
fungi | 08:12-09:51 utc | 16:00 |
clarkb | oh thanks I had indeed missed that | 16:00 |
fungi | but getting to the bottom of what caused it would definitely be a good idea | 16:00 |
clarkb | so restarting wasn't the fix (unlikely to be the tls issue then) and rebooting did help (maybe it was a network stack problem?) | 16:00 |
clarkb | frickler: re the 180d expiry, swift is responsible for that if you set the metadata header flag thing. We should probably check those objects properly have the flag. Maybe they predate setting it and we should clear them? | 16:01 |
clarkb | as a side note I think we can declare bankruptcy on that entire container and delete all the contents and start over you may just need to recheck things to get zuul happy again afterwards | 16:01 |
frickler | clarkb: actually it looks like the restart earlier did help, but only for < 24h | 16:02 |
frickler | clarkb: yes, I didn't check since when we do set the expiry, I'm also not sure how to detect this setting on an object | 16:05 |
clarkb | I think there is a way with osc and/or swiftclient to show object details not contents similar to a server show | 16:06 |
frickler | I did use "openstack object show", but I don't think it showed any expiry | 16:06 |
clarkb | did that show other X-* metadata headers? I wonder if it is just not listing that one or that implies we didn't set it | 16:11 |
clarkb | problem is we're trying to determine if we didn't set it at all so have to be careful about the lack of info :) | 16:12 |
clarkb | reminder that DST ended in Europe over the weekend and it will end in North America this weekend. Keep that in mind for meeting times set in UTC | 16:28 |
fungi | ooh! i just realized, the long-standing mailman bug which used to require a super user for archive message/thread deletion functions was fixed in the recent upgrade! now list owners can delete messages and threads in hyperkitty | 16:41 |
frickler | clarkb: this is how it looks like https://paste.opendev.org/show/b2Uj9MmZyeZqfXVvYKoE/ | 16:43 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add documentation about Django/Mailman super user https://review.opendev.org/c/opendev/system-config/+/933668 | 16:43 |
fungi | frickler: ^ | 16:44 |
fungi | i also updated the list so you don't have to dig it out of the private groupvars | 16:44 |
frickler | fungi: thx, will check | 16:44 |
fungi | thanks! | 16:45 |
frickler | clarkb: I also can't seem to set the expiry with "openstack object create", maybe I need to check the swift client. not sure what zuul-registry does | 16:45 |
clarkb | frickler: I think the expiry is just a generic metadata header so if you can set those you're good. Unfortunately that paste doesn't seem to have enough info on whether or not we're seeing that metadata. That said since we want a 180 day expiry deleting older objects should be fine? | 16:46 |
clarkb | at least for blobs and manifests. If there is other data stored then that may be longer lived for the server | 16:47 |
corvus | the zuul registry does not rely on automatic expiry | 17:34 |
corvus | it needs to be pruned, and the last time someone pruned it, it did not behave as expected, so it may need further debugging/development. | 17:35 |
corvus | (prune is a zuul-registry subcommand) | 17:36 |
corvus | but if it needs to be pruned because it's too big, feel free to try it and just know that there may be some corruption; it may be easier/better/faster/safer to just delete the whole container. | 17:38 |
corvus | none of that should be related to the tls errors though | 17:38 |
clarkb | re deleting the whole container I just remembered that fungi was looking into doing that efficiently. Was there ever a solution/resolution to that? | 17:55 |
corvus | different cloud i thought; may be able to delete the whole thing in the web ui; i'm not sure | 17:58 |
fungi | yeah, it sounds like the solution is to create a new container, and then ask the admins of the cloud to purge the old one | 17:58 |
clarkb | ah ya if it is a different cloud then tooling may exist for that | 17:58 |
corvus | otherwise, no i think it's just recursive delete | 17:58 |
clarkb | 158.69.73.1 is the held 2.2.6 etherpad node. I'll try to test that and the new pad deletion feature (to see if it works with anonymous pads at all) after all my meetings today | 17:59 |
fungi | at least the cli/sdk seemed like it didn't have a way to purge a container without recursively deleting contents first, and even that was capped at a fairly small number of deletions per api request | 17:59 |
fungi | i can't remember if the bulk delete call was limited to 1k or 10k objects | 18:00 |
opendevreview | Dmitriy Rabotyagov proposed openstack/diskimage-builder master: Add Fedora 40 to the CI tests https://review.opendev.org/c/openstack/diskimage-builder/+/933664 | 19:14 |
fungi | with held node 158.69.73.1 i've tested out https://etherpad.opendev.org/p/test and it looks good so far | 19:27 |
fungi | i haven't tried any admin api interactions yet though | 19:27 |
fungi | maybe after dinner | 19:27 |
clarkb | fungi: note the pad deletion isn't an admin api interaction its through the pad settings menu in the ui iirc | 19:28 |
clarkb | as creator | 19:28 |
fungi | oh | 19:29 |
fungi | neat! | 19:29 |
fungi | is that new? | 19:29 |
clarkb | yes thats the 2.2.6 change | 19:30 |
fungi | yeah, i tried it out and it seemed to work. as a non-creator of https://etherpad.opendev.org/p/test i suppose someone else should confirm that they don't have the option to delete that pad | 19:30 |
clarkb | but I don't know how that works (if at all) with no auth so want to check it | 19:30 |
clarkb | you were able to delete a pad as the creator? | 19:30 |
fungi | yes | 19:31 |
clarkb | interesting so ya we need to make sure other users can't do so | 19:31 |
clarkb | I bet if you clear your cache and reconnect it won't know you are the creator any longer as you'd have a new cookie? | 19:32 |
fungi | i created a pad called testdelete and just wrote "stuff" in it, then tried the delete option in the config ui and refreshed and was back to the original boilerplate new pad content again | 19:32 |
fungi | maybe it's by ip address? i checked in a different account container and still have a button that prompts me for confirmation... | 19:33 |
clarkb | hrm if its by ip that won't work because NAT | 19:33 |
clarkb | won't work for us I mean | 19:33 |
clarkb | maybe it will fail if you confirm? | 19:34 |
clarkb | though that seems unlikely if it is asking for confirmation | 19:34 |
fungi | "Admin message: [19.34.58]: You are not the creator of this pad, so you cannot delete it" | 19:35 |
clarkb | ok so you do get as far as confirm this action then it fails? I guess that is workable if confusing UI | 19:35 |
fungi | okay, so it basically asks you to confirm the delete action, and then throws up an error if you're not the owner session (tested in a different account container in my browser) | 19:35 |
clarkb | I'm going to eat lunch then figure out a bike ride but I'll do testing on my end after all that too | 19:37 |
fungi | similarly popping out to get a bite to eat, then i'll be back | 19:43 |
clarkb | fungi: I also meant to point out that our test suite checks some admin api tasks so I'm less worried about admin stuff | 22:49 |
clarkb | anyway I'm now back from all the things and pulling up etherpad testing now | 22:49 |
clarkb | I have reproduced the pad deletion failure (which is what we want) on fungi's test pad | 22:51 |
clarkb | it looks good to me after some basic testing. I guess we can proceed with that upgrade tomorrow too maybe? | 22:56 |
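
A note on the Swift expiry question from 16:01-16:46: Swift reports a scheduled deletion time through the `X-Delete-At` object header (a Unix timestamp), so if that header is absent from an object's HEAD response, no expiry is set on it. The following is a minimal Python sketch, assuming the headers have already been fetched into a dict by whatever client is in use. The function names and the 180-day default are illustrative only, and per corvus's comment zuul-registry itself relies on pruning rather than on this mechanism.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def expiry_from_headers(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the scheduled deletion time, or None if X-Delete-At is absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-delete-at":
            return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return None


def is_stale_without_expiry(
    headers: Mapping[str, str],
    last_modified: datetime,
    now: datetime,
    max_age: timedelta = timedelta(days=180),  # assumed limit, taken from the chat
) -> bool:
    """True if the object has no expiry set and is older than max_age."""
    return expiry_from_headers(headers) is None and now - last_modified > max_age
```

Objects for which `is_stale_without_expiry` returns true would be candidates for manual cleanup. Whether deleting them is safe for the registry is a separate question that the discussion above leaves open.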