Tuesday, 2024-10-29

opendevreviewTakashi Kajinami proposed openstack/diskimage-builder master: Remove the usage of pkg_resource  https://review.opendev.org/c/openstack/diskimage-builder/+/93332405:54
fricklerinfra-root: the issues with the registry seem to persist after the restart, see e.g. https://zuul.opendev.org/t/openstack/build/3d0dca8ff09d41f49832835f0ae08c91 , I'm trying to do some debugging now but if anyone has more knowledge about that setup, help will be welcome08:12
tkajinamfrickler, thanks !08:13
tkajinamthe issue is consistently seen but I don't see any specific failure in logs. actually log ends with "TASK [upload-logs-swift : Upload logs to swift]" so I suspect something went wrong during log upload but I'm struggling to understand details08:14
fricklertkajinam: if you look at the timestamps, you see about 30 minutes pass after starting that task and then the timeout happens. so likely something getting stuck on the registry side, I'm trying to find logs for that now08:15
fricklerI didn't find any matching logs from the registry container. there are some ssl errors in the logs, but from different timestamps, so I'm assuming some rogue clients/portscans08:38
fricklerlooking at the swift backend, it seems to be working, although the number of objects and amounts of data looks pretty huge to me, but not sure if that might be related to any issue08:39
fricklerit does seem like expiration isn't (always?) working though, as I found some objects dating back to 2020, while we should have 180d set as limit08:40
fricklerI guess I'll try to restart the container once again and see if that helps at least for some small amount of time. if not, running zuul-registry with debug enabled would be the next step08:46
fricklerfwiw I went for a reboot of the server instead of just restarting the containers, figured it couldn't hurt to pick up the updated kernel and resolve possible resource exhaustion issues08:52
fricklerrecheck of the osc patch is running and I'm also holding the node in case it fails again to allow for more testing from the client side08:53
fricklerok, now the build job passed and the container log also does contain some relevant logs for the upload. so I guess I'll need to dig deeper into the logs after the previous restart and try to locate when it stopped working09:18
fricklerI didn't find anything that looks helpful in the log, seems the last successful upload before my restart was at 2024-10-26 14:43:32,60009:40
fricklerso I guess we'll just watch what happens now over the next 24h or so09:40
*** ykarel_ is now known as ykarel09:49
tkajinamyeah I see the build job now succeeds09:51
opendevreviewyatin proposed opendev/irc-meetings master: Move Neutron CI Weekly meeting 1 hour later  https://review.opendev.org/c/opendev/irc-meetings/+/93363710:14
opendevreviewMerged opendev/irc-meetings master: Revert "Update kolla meeting time"  https://review.opendev.org/c/opendev/irc-meetings/+/93355611:18
opendevreviewMerged opendev/irc-meetings master: Move Neutron CI Weekly meeting 1 hour later  https://review.opendev.org/c/opendev/irc-meetings/+/93363711:18
NeilHanlono/ morning folks. looks like gerritlib is back with another issue, this time due to setuptools 74. I'll try and throw a fix up this week if I can figure it out :) https://bugzilla.redhat.com/show_bug.cgi?id=231966313:37
NeilHanlon(cc fungi)13:43
*** ykarel_ is now known as ykarel13:59
fungithanks!14:19
fungiNeilHanlon: i can successfully use https://pypi.org/p/build in a fresh venv to build the sdist and wheel for gerritlib 0.11.0 without pinning setuptools, is copr maybe extra strict about deprecation warnings?14:29
fungithis was with setuptools 75.3.0 and wheel 0.44.014:32
NeilHanlonhm, maybe they changed something again between 74.1.x and 75.3.x? 14:40
NeilHanloni think it's https://github.com/pypa/setuptools/issues/93114:41
NeilHanlon`error: invalid command 'test'`14:41
NeilHanlonheh14:41
NeilHanlonfeel free to ignore fungi... this is a packaging bug :) 14:42
NeilHanlonhttps://src.fedoraproject.org/rpms/python-gerritlib/blob/rawhide/f/python-gerritlib.spec#_6714:42
fungiperfect!14:44
NeilHanlonthanks for looking at it before I got a chance to! saved me some time for sure14:44
fungino worries, i do normally consider myself fairly up to date on the overall python packaging scene and follow most of the discussions on the python discourse, so as to make sure we keep our projects relatively compliant (that said, there are plenty of setuptools deprecation warnings spewed when building gerritlib packages, which will need to be dealt with at some point, though i think most of14:47
fungithem are ultimately needing adjustments to pbr)14:47
clarkbya we haven't dropped setup.py test support because pbr can be used with older setup.py. You just have to stop using the command if your toolchain is sufficiently new14:53
NeilHanlonyep yep, i'm just gonna switch to running nox in the %check of the rpm14:58
fungisounds like a good idea15:01
fricklerfungi: picking up from the discussion in the nova channel, do we have a doc somewhere how to use the list pw for e.g. openstack-discuss (which I found on bridge) in order to access the moderation interface for the list? or should I rather get myself added as moderator with my usual mailman account?15:45
clarkbI think the easiest way is going to be having your personal account as moderator on the various lists you are willing to moderate. For admin I think you login as admin to mailman3 like you would your normal user but then you have access to everything? fungi should be able to confirm and ya maybe we need a blurb on that in our docs15:48
fungifrickler: yes, ideally have your own moderator account if you want to help with moderator activities, but the admin user should still be able to access the list moderation features as well15:50
fricklerok, then how do I log in as admin user? I tried as just "admin" and that didn't seem to work, either15:50
fungilemme check the file, hold on15:51
fricklerand I likely wouldn't want to count as "regular" moderator, more like just in case of emergencies (although the event this morning wouldn't count as that ... maybe make it "on-demand" rather ;)15:52
opendevreviewDmitriy Rabotyagov proposed openstack/diskimage-builder master: Add support for building Fedora 40  https://review.opendev.org/c/openstack/diskimage-builder/+/92210915:53
opendevreviewDmitriy Rabotyagov proposed openstack/diskimage-builder master: Add Fedora 40 to the CI tests  https://review.opendev.org/c/openstack/diskimage-builder/+/93366415:54
fungifrickler: aha, looks like i need to clear out the old mailman v2 info from the list. you're effectively using the django admin user (password is in the private ansible inventory group vars), but i'll copy it to the usual place and make it clearer15:55
fricklerfungi: ah, ok, that explains it. so you are using a dedicated account for moderation, different from your usual subscription account?15:56
fungii'll throw a section in https://docs.opendev.org/opendev/system-config/latest/lists.html too15:56
fungifrickler: it's not a dedicated account, just your normal subscriber account which can have additional roles (moderator, owner)15:56
fricklerok, then I misread "your own moderator account"15:57
fungiyeah, i meant your own account instead of the shared admin account15:57
clarkbyesterday I tried to get an etherpad v2.2.6 change tested and I seem to be continuously timing out when pushing to the intermediate registry. corvus was that ssl issue with the zuul registry you mentioned showing up as a timeout in jobs or did that fail immediately?15:59
fungiclarkb: see scrollback and recheck15:59
fungi08:12-09:51 utc16:00
clarkboh thanks I had indeed missed that16:00
fungibut getting to the bottom of what caused it would definitely be a good idea16:00
clarkbso restarting wasn't the fix (unlikely to be the tls issue then) and rebooting did help (maybe it was a network stack problem?)16:00
clarkbfrickler: re the 180 expiry swift is responsible for that if you set the metadata header flag thing. We should probably check those objects properly have the flag. Maybe they predate setting it and we should clear them?16:01
clarkbas a side note I think we can declare bankruptcy on that entire container and delete all the contents and start over you may just need to recheck things to get zuul happy again afterwards16:01
fricklerclarkb: actually it looks like the restart earlier did help, but only for < 24h16:02
fricklerclarkb: yes, I didn't check since when we do set the expiry, I'm also not sure how to detect this setting on an object16:05
clarkbI think there is a way with osc and/or swiftclient to show object details not contents similar to a server show16:06
fricklerI did use "openstack object show", but I don't think it showed any expiry16:06
clarkbdid that show other X-* metadata headers? I wonder if it is just not listing that one or that implies we didn't set it16:11
clarkbproblem is we're trying to determine if we didn't set it at all so have to be careful about the lack of info :)16:12
clarkbreminder that DST ended in Europe over the weekend and it will end in North America this weekend. Keep that in mind for meeting times set in UTC16:28
fungiooh! i just realized, the long-standing mailman bug which used to require a super user for archive message/thread deletion functions was fixed in the recent upgrade! now list owners can delete messages and threads in hyperkitty16:41
fricklerclarkb: this is how it looks like https://paste.opendev.org/show/b2Uj9MmZyeZqfXVvYKoE/16:43
opendevreviewJeremy Stanley proposed opendev/system-config master: Add documentation about Django/Mailman super user  https://review.opendev.org/c/opendev/system-config/+/93366816:43
fungifrickler: ^16:44
fungii also updated the list so you don't have to dig it out of the private groupvars16:44
fricklerfungi: thx, will check16:44
fungithanks!16:45
fricklerclarkb: I also can't seem to set the expiry with "openstack object create", maybe I need to check the swift client. not sure what zuul-registry does16:45
clarkbfrickler: I think the expiry is just a generic metadata header so if you can set those you're good. Unfortunately that paste doesn't seem to have enough info on whether or not we're seeing that metadata or not. That said since we want a 180 day expiry deleting older object should be fine?16:46
clarkbat least for blobs and manifests. If there is other data stored then that may be longer lived for the server16:47
corvusthe zuul registry does not rely on automatic expiry17:34
corvusit needs to be pruned, and the last time someone pruned it, it did not behave as expected, so it may need further debugging/development.17:35
corvus(prune is a zuul-registry subcommand)17:36
corvusbut if it needs to be pruned because it's too big, feel free to try it and just know that there may be some corruption; it may be easier/better/faster/safer to just delete the whole container.17:38
corvusnone of that should be related to the tls errors though17:38
clarkbre deleting the whole container I just remembered that fungi was looking into doing that efficiently. Was there ever a solution/resolution to that?17:55
corvusdifferent cloud i thought; may be able to delete the whole thing in the web ui; i'm not sure17:58
fungiyeah, it sounds like the solution is to create a new container, and then ask the admins of the cloud to purge the old one17:58
clarkbah ya if it is a different cloud then tooling may exist for that17:58
corvusotherwise, no i think it's just recursive delete17:58
clarkb158.69.73.1 is the held 2.2.6 etherpad node. I'll try to test that and the new pad deletion feature (to see if it works with anonymous pads at all) after all my meetings today17:59
fungiat least the cli/sdk seemed like it didn't have a way to purge a container without recursively deleting contents first, and even that was capped at a fairly small number of deletions per api request17:59
fungii can't remember if the bulk delete call was limited to 1k or 10k objects18:00
opendevreviewDmitriy Rabotyagov proposed openstack/diskimage-builder master: Add Fedora 40 to the CI tests  https://review.opendev.org/c/openstack/diskimage-builder/+/93366419:14
fungiwith held node 158.69.73.1 i've tested out https://etherpad.opendev.org/p/test and it looks good so far19:27
fungii haven't tried any admin api interactions yet though19:27
fungimaybe after dinner19:27
clarkbfungi: note the pad deletion isn't an admin api interaction its through the pad settings menu in the ui iirc19:28
clarkbas creator19:28
fungioh19:29
fungineat!19:29
fungiis that new?19:29
clarkbyes thats the 2.2.6 change19:30
fungiyeah, i tried it out and it seemed to work. as a non-creator of https://etherpad.opendev.org/p/test i suppose someone else should confirm that they don't have the option to delete that pad19:30
clarkbbut I don't know how that works (if at all) with no auth so want to check it19:30
clarkbyou were able to delete a pad as the creator?19:30
fungiyes19:31
clarkbinteresting so ya we need to make sure other users can't do so19:31
clarkbI bet if you clear your cache and reconnect it won't know you are the creator any longer as you'd have a new cookie?19:32
fungii created a pad called testdelete and just wrote "stuff" in it, then tried the delete option in the config ui and refreshed and was back to the original boilerplate new pad content again19:32
fungimaybe it's by ip address? i checked in a different account container and still have a button that prompts me for confirmation...19:33
clarkbhrm if its by ip that won't work because NAT19:33
clarkbwon't work for us I mean19:33
clarkbmaybe it will fail if you confirm?19:34
clarkbthough that seems unlikely if it is asking for confirmation19:34
fungi"Admin message: [19.34.58]: You are not the creator of this pad, so you cannot delete it"19:35
clarkbok so you do get as far as confirm this action then it fails? I guess that is workable if confusing UI19:35
fungiokay, so it basically asks you to confirm the delete action, and then throws up an error if you're not the owner session (tested in a different account container in my browser)19:35
clarkbI'm going to eat lunch then figure out a bike ride but I'll do testing on my end after all that too19:37
fungisimilarly popping out to get a bite to eat, then i'll be back19:43
clarkbfungi: I also meant to point out that our test suite checks some admin api tasks so I'm less worried about admin stuff22:49
clarkbanyway I'm now back from all the things and pulling up etherpad testing now22:49
clarkbI have reproduced the pad deletion failure (which is what we want) on fungi's test pad22:51
clarkbit looks good to me after some basic testing. I guess we can proceed with that upgrade tomorrow too maybe?22:56
