Tuesday, 2024-10-29

opendevreview	Takashi Kajinami proposed openstack/diskimage-builder master: Remove the usage of pkg_resource https://review.opendev.org/c/openstack/diskimage-builder/+/933324	05:54
frickler	infra-root: the issues with the registry seem to persist after the restart, see e.g. https://zuul.opendev.org/t/openstack/build/3d0dca8ff09d41f49832835f0ae08c91 , I'm trying to do some debugging now but if anyone has more knowledge about that setup, help will be welcome	08:12
tkajinam	frickler, thanks !	08:13
tkajinam	the issue is consistently seen but I don't see any specific failure in logs. actually log ends with "TASK [upload-logs-swift : Upload logs to swift]" so I suspect something went wrong during log upload but I'm struggling to understand details	08:14
frickler	tkajinam: if you look at the timestamps, you see about 30 minutes pass after starting that task and then the timeout happens. so likely something getting stuck on the registry side, I'm trying to find logs for that now	08:15
frickler	I didn't find any matching logs from the registry container. there are some ssl errors in the logs, but from different timestamps, so I'm assuming some rogue clients/portscans	08:38
frickler	looking at the swift backend, it seems to be working, although the number of objects and amounts of data looks pretty huge to me, but not sure if that might be related to any issue	08:39
frickler	it does seem like expiration isn't (always?) working though, as I found some objects dating back to 2020, while we should have 180d set as limit	08:40
frickler	I guess I'll try to restart the container once again and see if that helps at least for some small amount of time. if not, running zuul-registry with debug enabled would be the next step	08:46
frickler	fwiw I went for a reboot of the server instead of just restarting the containers, figured it couldn't hurt to pick up the updated kernel and resolve possible resource exhaustion issues	08:52
frickler	recheck of the osc patch is running and I'm also holding the node in case it fails again to allow for more testing from the client side	08:53
frickler	ok, now the build job passed and the container log also does contain some relevant logs for the upload. so I guess I'll need to dig deeper into the logs after the previous restart and try to locate when it stopped working	09:18
frickler	I didn't find anything that looks helpful in the log, seems the last successful upload before my restart was at 2024-10-26 14:43:32,600	09:40
frickler	so I guess we'll just watch what happens now over the next 24h or so	09:40
*** ykarel_ is now known as ykarel		09:49
tkajinam	yeah I see the build job now succeeds	09:51
opendevreview	yatin proposed opendev/irc-meetings master: Move Neutron CI Weekly meeting 1 hour later https://review.opendev.org/c/opendev/irc-meetings/+/933637	10:14
opendevreview	Merged opendev/irc-meetings master: Revert "Update kolla meeting time" https://review.opendev.org/c/opendev/irc-meetings/+/933556	11:18
opendevreview	Merged opendev/irc-meetings master: Move Neutron CI Weekly meeting 1 hour later https://review.opendev.org/c/opendev/irc-meetings/+/933637	11:18
NeilHanlon	o/ morning folks. looks like gerritlib is back with another issue, this time due to setuptools 74. I'll try and throw a fix up this week if I can figure it out :) https://bugzilla.redhat.com/show_bug.cgi?id=2319663	13:37
NeilHanlon	(cc fungi)	13:43
*** ykarel_ is now known as ykarel		13:59
fungi	thanks!	14:19
fungi	NeilHanlon: i can successfully use https://pypi.org/p/build in a fresh venv to build the sdist and wheel for gerritlib 0.11.0 without pinning setuptools, is copr maybe extra strict about deprecation warnings?	14:29
fungi	this was with setuptools 75.3.0 and wheel 0.44.0	14:32
NeilHanlon	hm, maybe they changed something again between 74.1.x and 75.3.x?	14:40
NeilHanlon	i think it's https://github.com/pypa/setuptools/issues/931	14:41
NeilHanlon	`error: invalid command 'test'`	14:41
NeilHanlon	heh	14:41
NeilHanlon	feel free to ignore fungi... this is a packaging bug :)	14:42
NeilHanlon	https://src.fedoraproject.org/rpms/python-gerritlib/blob/rawhide/f/python-gerritlib.spec#_67	14:42
fungi	perfect!	14:44
NeilHanlon	thanks for looking at it before I got a chance to! saved me some time for sure	14:44
fungi	no worries, i do normally consider myself fairly up to date on the overall python packaging scene and follow most of the discussions on the python discourse, so as to make sure we keep our projects relatively compliant (that said, there are plenty of setuptools deprecation warnings spewed when building gerritlib packages, which will need to be dealt with at some point, though i think most of	14:47
fungi	them are ultimately needing adjustments to pbr)	14:47
clarkb	ya we haven't dropped setup.py test support because pbr can be used with older setup.py. You just have to stop using the command if your toolchain is sufficiently new	14:53
NeilHanlon	yep yep, i'm just gonna switch to running nox in the %check of the rpm	14:58
fungi	sounds like a good idea	15:01
frickler	fungi: picking up from the discussion in the nova channel, do we have a doc somewhere how to use the list pw for e.g. openstack-discuss (which I found on bridge) in order to access the moderation interface for the list? or should I rather get myself added as moderator with my usual mailman account?	15:45
clarkb	I think the easiest way is going to be having your personal account as moderator on the various lists you are willing to moderate. For admin I think you login as admin to mailman3 like you would your normal user but then you have access to everything? fungi should be able to confirm and ya maybe we need a blurb on that in our docs	15:48
fungi	frickler: yes, ideally have your own moderator account if you want to help with moderator activities, but the admin user should still be able to access the list moderation features as well	15:50
frickler	ok, then how do I log in as admin user? I tried as just "admin" and that didn't seem to work, either	15:50
fungi	lemme check the file, hold on	15:51
frickler	and I likely wouldn't want to count as "regular" moderator, more like just in case of emergencies (although the event this morning wouldn't count as that ... maybe make it "on-demand" rather ;)	15:52
opendevreview	Dmitriy Rabotyagov proposed openstack/diskimage-builder master: Add support for building Fedora 40 https://review.opendev.org/c/openstack/diskimage-builder/+/922109	15:53
opendevreview	Dmitriy Rabotyagov proposed openstack/diskimage-builder master: Add Fedora 40 to the CI tests https://review.opendev.org/c/openstack/diskimage-builder/+/933664	15:54
fungi	frickler: aha, looks like i need to clear out the old mailman v2 info from the list. you're effectively using the django admin user (password is in the private ansible inventory group vars), but i'll copy it to the usual place and make it clearer	15:55
frickler	fungi: ah, ok, that explains it. so you are using a dedicated account for moderation, different from your usual subscription account?	15:56
fungi	i'll throw a section in https://docs.opendev.org/opendev/system-config/latest/lists.html too	15:56
fungi	frickler: it's not a dedicated account, just your normal subscriber account which can have additional roles (moderator, owner)	15:56
frickler	ok, then I misread "your own moderator account"	15:57
fungi	yeah, i meant your own account instead of the shared admin account	15:57
clarkb	yesterday I tried to get an etherpad v2.2.6 change tested and I seem to be continuously timing out when pushing to the intermediate registry. corvus was that ssl issue with the zuul registry you mentioned showing up as a timeout in jobs or did that fail immediately?	15:59
fungi	clarkb: see scrollback and recheck	15:59
fungi	08:12-09:51 utc	16:00
clarkb	oh thanks I had indeed missed that	16:00
fungi	but getting to the bottom of what caused it would definitely be a good idea	16:00
clarkb	so restarting wasn't the fix (unlikely to be the tls issue then) and rebooting did help (maybe it was a network stack problem?)	16:00
clarkb	frickler: re the 180d expiry, swift is responsible for that if you set the metadata header flag thing. We should probably check that those objects actually have the flag. Maybe they predate setting it and we should clear them?	16:01
clarkb	as a side note I think we can declare bankruptcy on that entire container, delete all the contents and start over. you may just need to recheck things to get zuul happy again afterwards	16:01
frickler	clarkb: actually it looks like the restart earlier did help, but only for < 24h	16:02
frickler	clarkb: yes, I didn't check since when we do set the expiry, I'm also not sure how to detect this setting on an object	16:05
clarkb	I think there is a way with osc and/or swiftclient to show object details not contents similar to a server show	16:06
frickler	I did use "openstack object show", but I don't think it showed any expiry	16:06
clarkb	did that show other X-* metadata headers? I wonder if it is just not listing that one or that implies we didn't set it	16:11
clarkb	problem is we're trying to determine if we didn't set it at all so have to be careful about the lack of info :)	16:12
clarkb	reminder that DST ended in Europe over the weekend and it will end in North America this weekend. Keep that in mind for meeting times set in UTC	16:28
fungi	ooh! i just realized, the long-standing mailman bug which used to require a super user for archive message/thread deletion functions was fixed in the recent upgrade! now list owners can delete messages and threads in hyperkitty	16:41
frickler	clarkb: this is what it looks like https://paste.opendev.org/show/b2Uj9MmZyeZqfXVvYKoE/	16:43
opendevreview	Jeremy Stanley proposed opendev/system-config master: Add documentation about Django/Mailman super user https://review.opendev.org/c/opendev/system-config/+/933668	16:43
fungi	frickler: ^	16:44
fungi	i also updated the list so you don't have to dig it out of the private groupvars	16:44
frickler	fungi: thx, will check	16:44
fungi	thanks!	16:45
frickler	clarkb: I also can't seem to set the expiry with "openstack object create", maybe I need to check the swift client. not sure what zuul-registry does	16:45
clarkb	frickler: I think the expiry is just a generic metadata header so if you can set those you're good. Unfortunately that paste doesn't seem to have enough info to tell whether we're seeing that metadata or not. That said, since we want a 180 day expiry, deleting older objects should be fine?	16:46
clarkb	at least for blobs and manifests. If there is other data stored then that may be longer lived for the server	16:47
corvus	the zuul registry does not rely on automatic expiry	17:34
corvus	it needs to be pruned, and the last time someone pruned it, it did not behave as expected, so it may need further debugging/development.	17:35
corvus	(prune is a zuul-registry subcommand)	17:36
corvus	but if it needs to be pruned because it's too big, feel free to try it and just know that there may be some corruption; it may be easier/better/faster/safer to just delete the whole container.	17:38
corvus	none of that should be related to the tls errors though	17:38
clarkb	re deleting the whole container I just remembered that fungi was looking into doing that efficiently. Was there ever a solution/resolution to that?	17:55
corvus	different cloud i thought; may be able to delete the whole thing in the web ui; i'm not sure	17:58
fungi	yeah, it sounds like the solution is to create a new container, and then ask the admins of the cloud to purge the old one	17:58
clarkb	ah ya if it is a different cloud then tooling may exist for that	17:58
corvus	otherwise, no i think it's just recursive delete	17:58
clarkb	158.69.73.1 is the held 2.2.6 etherpad node. I'll try to test that and the new pad deletion feature (to see if it works with anonymous pads at all) after all my meetings today	17:59
fungi	at least the cli/sdk seemed like it didn't have a way to purge a container without recursively deleting contents first, and even that was capped at a fairly small number of deletions per api request	17:59
fungi	i can't remember if the bulk delete call was limited to 1k or 10k objects	18:00
opendevreview	Dmitriy Rabotyagov proposed openstack/diskimage-builder master: Add Fedora 40 to the CI tests https://review.opendev.org/c/openstack/diskimage-builder/+/933664	19:14
fungi	with held node 158.69.73.1 i've tested out https://etherpad.opendev.org/p/test and it looks good so far	19:27
fungi	i haven't tried any admin api interactions yet though	19:27
fungi	maybe after dinner	19:27
clarkb	fungi: note the pad deletion isn't an admin api interaction, it's through the pad settings menu in the ui iirc	19:28
clarkb	as creator	19:28
fungi	oh	19:29
fungi	neat!	19:29
fungi	is that new?	19:29
clarkb	yes thats the 2.2.6 change	19:30
fungi	yeah, i tried it out and it seemed to work. as a non-creator of https://etherpad.opendev.org/p/test i suppose someone else should confirm that they don't have the option to delete that pad	19:30
clarkb	but I don't know how that works (if at all) with no auth so want to check it	19:30
clarkb	you were able to delete a pad as the creator?	19:30
fungi	yes	19:31
clarkb	interesting so ya we need to make sure other users can't do so	19:31
clarkb	I bet if you clear your cache and reconnect it won't know you are the creator any longer as you'd have a new cookie?	19:32
fungi	i created a pad called testdelete and just wrote "stuff" in it, then tried the delete option in the config ui and refreshed and was back to the original boilerplate new pad content again	19:32
fungi	maybe it's by ip address? i checked in a different account container and still have a button that prompts me for confirmation...	19:33
clarkb	hrm if it's by ip that won't work because NAT	19:33
clarkb	won't work for us I mean	19:33
clarkb	maybe it will fail if you confirm?	19:34
clarkb	though that seems unlikely if it is asking for confirmation	19:34
fungi	"Admin message: [19.34.58]: You are not the creator of this pad, so you cannot delete it"	19:35
clarkb	ok so you do get as far as confirm this action then it fails? I guess that is workable if confusing UI	19:35
fungi	okay, so it basically asks you to confirm the delete action, and then throws up an error if you're not the owner session (tested in a different account container in my browser)	19:35
clarkb	I'm going to eat lunch then figure out a bike ride but I'll do testing on my end after all that too	19:37
fungi	similarly popping out to get a bite to eat, then i'll be back	19:43
clarkb	fungi: I also meant to point out that our test suite checks some admin api tasks so I'm less worried about admin stuff	22:49
clarkb	anyway I'm now back from all the things and pulling up etherpad testing now	22:49
clarkb	I have reproduced the pad deletion failure (which is what we want) on fungi's test pad	22:51
clarkb	it looks good to me after some basic testing. I guess we can proceed with that upgrade tomorrow too maybe?	22:56
