Tuesday, 2025-11-04

<ykarel> thanks corvus  04:46
<ykarel> Hi, https://review.opendev.org/ looks down, can someone check?  06:02
<tonyb> ykarel: on it  06:08
<ykarel> thx tonyb  06:09
<tonyb> #status log review.opendev.org was shutoff approximately an hour ago. Server restarted. Working on Gerrit restart  06:21
<opendevstatus> tonyb: finished logging  06:21
<mnasiadka> Oh boy  07:00
<tonyb> mnasiadka: Yeah.  07:01
<tonyb> #status log review.opendev.org service restored, albeit a little slow while caches rebuild  08:42
<opendevstatus> tonyb: finished logging  08:42
<tonyb> mnasiadka: I kinda recall there was something you wanted to talk about at the last OpenDev meeting? If so it's in about 11 hours  08:46
<mnasiadka> tonyb: that got sorted out outside of the OpenDev meeting  08:50
<mnasiadka> I know it's in 11 hours, because it's just after the TC one  08:51
<mnasiadka> I'll come - maybe there will be something I can help with in OpenDev - sort of looking to be more active outside of Kolla, so anywhere I can help - just shout ;)  08:51
<tonyb> mnasiadka: oh there are plenty of places to help  08:52
<tonyb> we'll sort something out during the meeting  08:52
<opendevreview> Simon Westphahl proposed zuul/zuul-jobs master: Fix ensure-rust wrapper script  https://review.opendev.org/c/zuul/zuul-jobs/+/966020  10:11
<fungi> tonyb: i guess there were no errors in the server show output? if not, then this sounds identical to the outage at the summit, so oom killer taking out the vm or similar i guess  13:53
<fungi> anyway, i confirm the service is up now, so thanks for taking care of it quickly  13:53
<clarkb> re: slowness starting up, I believe that deleting the large h2 cache files will address that  14:45
<clarkb> (this should be done while gerrit is stopped)  14:45
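
(For context, a minimal sketch of what deleting the large h2 cache files looks like: Gerrit keeps its persistent caches as *.h2.db files under the site's cache directory; the path below is an assumption about this deployment.)

    # with gerrit stopped:
    cd /home/gerrit2/review_site/cache    # assumed site path for this deployment
    ls -lSh *.h2.db                       # see which cache files are largest
    rm -f *.h2.db                         # or just the largest ones; gerrit rebuilds these caches on startup
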
<clarkb> infra-root: I made a note on the meeting agenda that we should file a ticket with vexxhost. if anyone has done that before or wants to figure out how to do it, I think that is our next step  14:46
<clarkb> something along the lines of "we noticed the server shutdown on november 11 and october whatever day it was. We don't know why this is happening as the instance doesn't seem to have any records of problems in its logs and nova doesn't show any errors. Talking to the nova team we believe that this could potentially happen if the hypervisor runs out of memory, but that is a hunch at  14:47
<clarkb> this point and would require the cloud to debug"  14:47
<opendevreview> Clark Boylan proposed opendev/system-config master: Drop /opt/project-config/gerrit/projects.ini on review  https://review.opendev.org/c/opendev/system-config/+/966083  15:21
<opendevreview> Clark Boylan proposed opendev/system-config master: Update Gerrit images to 3.10.9 and 3.11.7  https://review.opendev.org/c/opendev/system-config/+/966084  15:21
<clarkb> mnasiadka: right now the things I'm looking at are the etherpad 2.5.2 upgrade, gitea 1.25.0 upgrade, and the gerrit 3.11 upgrade. Those first two have held nodes that you can interact with and test to see if they work for you (or not). That is good feedback to have pre-upgrade and I can dig up ip addresses for you if you are interested in spot checking things.  16:35
<clarkb> mnasiadka: the gerrit upgrade just went back to limbo due to the changes above. We probably need to get gerrit sorted out in terms of uptime and restarts and being on the latest bugfixes before we dig into upgrading too much  16:36
<clarkb> beyond that I also need to look at bootstrapping matrix for opendev comms  16:37
<mnasiadka> Sure - can help with etherpad and gitea testing - if you can pass the ips of the held nodes I can have a look  16:37
<clarkb> https://etherpad.opendev.org/p/opendev-running-todo-list  16:38
<mnasiadka> Gerrit - probably the cloud hosting that VM has some issues (if that's OOM it's hitting)  16:38
<clarkb> this is our more high-level backlog/todo/wants list  16:38
<clarkb> ya, need to figure out filing a ticket  16:38
<clarkb> mnasiadka: https://213.32.78.118:3081/opendev/system-config is the held gitea 1.25.0  16:40
<clarkb> 50.56.157.144 is the held etherpad. you have to put this one in /etc/hosts for etherpad.opendev.org due to how redirects work. there is a clarkb-test pad but feel free to make new ones too  16:42
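
(A minimal sketch of the /etc/hosts override described above, using the held node's address from this message.)

    # on your workstation, point the production hostname at the held node
    50.56.157.144   etherpad.opendev.org
    # remove the line again when you're done testing
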
<mnasiadka> Any known gitea or etherpad regressions I should keep an eye on?  16:43
<clarkb> mnasiadka: for etherpad the root page at / had css issues where the open pad by name button didn't render nicely in versions 2.5.0 and 2.5.1. We believe 2.5.2 fixes that (and appears to in our screenshots and my local testing) but good to check with your browser too  16:45
<clarkb> then for gitea no known issues yet, they publish a large changelog here: https://github.com/go-gitea/gitea/blob/v1.25.0/CHANGELOG.md  16:46
<fungi> mnasiadka: at this point, hypervisor host oom is just a likely guess because we use a very-high-ram flavor for that vm and the outward symptoms (limited as they may be) are consistent with an oom kill  16:47
<fungi> it's something the cloud operators would need to confirm anyway when they (hopefully) look into it  16:48
<fungi> guilhermesp___, ricolin_ and mnaser are the folks who would usually check that, if they're around  16:48
<fungi> but if we open a ticket about it, they may be able to track it better once they have some time  16:49
<mnaser> yes, a ticket would fare much better than an irc ping, at least it goes in the right queues =)  16:50
<mnaser> i get pulled into a million things every second  16:50
<clarkb> mnaser: understood. fwiw I've logged into horizon and don't see a link to submit a ticket from there. I assume we need to log in via vexxhost id to your main website and do it from there but it looks like we only ever got keystone credentials so I'm not sure how to do that  16:51
<clarkb> mnaser: is there a workaround for that (maybe I'm missing a link, or maybe we can log in using keystone creds there too? or maybe we can send email to a special address?)  16:52
<mnaser> yeah i think you are a bit of a unicorn with an openstack account only, send email to support@vexxhost.com and life will be easy -- just call out that this is for the openinfra ci account  16:52
<clarkb> mnaser: thank you! I'll get that done soonish  16:53
<clarkb> tuesday morning is my morning of meetings but I should be able to draft something during the meeting blocks  16:53
<mnasiadka> fungi: well, from what tonyb said on #openstack-nova - it was in shutoff state - so a lot of possibilities - from OOM/libvirt killing the VM to a power off from inside the VM (but that's rather impossible without it being logged) - but basically everything outside of Nova's hands  16:54
<fungi> right  16:54
<mnaser> it is prooooobably oom, i'll have to see why, we do have a fair bit of reserved memory  16:54
<clarkb> mnaser: that's what we figured. I'll get an email sent with the two latest occurrences and their timestamps as well as info about the host (name, uuid)  16:55
<fungi> it's happened a couple of times in the past month, so figuring it out in order to avoid it better in the future is of interest to us  16:55
<clarkb> fungi: you don't happen to have rough timelines for the occurrence during the summit do you?  16:57
<clarkb> I should be able to dig it up from logs in irc if nothing else  16:57
<fungi> i would just end up digging it out of irc history myself, sorry  16:57
<fungi> i know i captured and dug into the approximate time the outage started, just don't remember the details (and wouldn't trust my memory anyway since i was on a different continent at the time)  16:58
<clarkb> ya I'll find it in the logs  16:59
<corvus> https://matrix.to/#/!GXiijUJAGqDLBuZqwV:matrix.org/$7agBEPLFoFifgX_5wjyiiWcLaREDZWTKcPi6jXF1khk?via=matrix.org&via=matrix.uhurutec.com&via=ubuntu.com  16:59
<corvus> sunday october 19  16:59
<fungi> oh, also it may have been relayed by someone else because my shell server was offline due to an unrelated incident in rackspace flex, so i wasn't in irc at the time  16:59
<clarkb> we still have syslog logs for both so I'm going off of those timestamps  17:08
<mnasiadka> clarkb: I think I've been in all dark corners of gitea and the changelog doesn't list anything that looks weird  17:10
<clarkb> mnasiadka: thank you for checking: maybe drop a note on https://review.opendev.org/c/opendev/system-config/+/965960 to record that?  17:11
<mnasiadka> done  17:14
<clarkb> thanks  17:14
<clarkb> ok email sent to vexxhost support  17:14
<clarkb> I cc'd opendev admins on it too  17:14
<fungi> thanks!  17:25
<mnasiadka> clarkb: done the same with etherpad, if everything works even in Safari, then I guess it should be fine  17:40
<clarkb> mnasiadka: yup I think we can probably proceed with that upgrade at this point. My testing all looked good as did tonyb's  17:41
<clarkb> I've got my large block of meetings then lunch (and maybe a bike ride if weather cooperates) so either later today or first thing tomorrow on etherpad  17:41
<clarkb> it's good you checked safari as I am not able to :)  17:41
<Clark[m]> As expected gitea 1.25.1 just released  20:06
<clarkb> ok I have about a 2 or 3 hour window before the next atmospheric river is supposed to arrive so I'm going to take it  20:37
<clarkb> but when I get back we can decide if we are comfortable with upgrading etherpad and I'll update the gitea change and swap its hold for 1.25.1  20:37
*** mnaser[m] is now known as mnaser  20:46
<mnaser> Did https://docs.opendev.org/opendev/infra-specs/latest/specs/matrix_for_opendev.html ever end up happening or is it still mostly IRC only?  20:48
<mnaser> I'm setting up Matrix for me and just wondering :)  20:48
<Clark[m]> mnaser: it hasn't happened yet but hopefully will soon  20:49
<fungi> mnaser: we just talked about it in the meeting an hour ago, tonyb will work on setting up the channel on our homeserver soonish, corvus is planning to work on the new statusbot for it, and mnasiadka was interested in helping with some aspects of the move too  20:50
<mnaser> sounds good!  20:50
<fungi> there is already a #openstack-ops:opendev.org channel on our homeserver that might be of interest in the meantime though, in addition to #zuul:opendev.org  20:51
<Clark[m]> I need to clean up after my bike ride but I think I have enough time to upgrade etherpad when done  22:26
<tonyb> thanks Clark[m]  22:34
<clarkb> https://review.opendev.org/c/opendev/system-config/+/956593 has been approved. I will monitor it  22:48
<opendevreview> Clark Boylan proposed opendev/system-config master: Update Gitea to 1.25  https://review.opendev.org/c/opendev/system-config/+/965960  22:50
<opendevreview> Clark Boylan proposed opendev/system-config master: DNM intentional Gitea failure to hold a node  https://review.opendev.org/c/opendev/system-config/+/848181  22:50
<clarkb> that updates to 1.25.1 and I'll recycle the autohold too  22:51
<opendevreview> Merged opendev/system-config master: Upgrade etherpad to 2.5.2  https://review.opendev.org/c/opendev/system-config/+/956593  23:22
<clarkb> the deployment job for ^ failed. I have logged into the server and it appears to be running the old image and the new image isn't even present yet so we're good for now. I'll look at job logs momentarily  23:28
<clarkb> ERROR: for etherpad  toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit  23:29
<clarkb> I'm going to put the docker hub ipv4 addresses in /etc/hosts on etherpad.o.o to force it to use ipv4 then reenqueue the deployment jobs. The only question I have about doing that is the image promotion job is part of the deployment buildset. corvus: do you know if we can safely rerun the promotion job? I think yes?  23:30
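
(A rough sketch of the /etc/hosts pinning described here; it assumes registry-1.docker.io is the relevant Docker Hub endpoint, and the addresses to pin are whatever its A records resolve to at the time.)

    # on etherpad.opendev.org, look up the current ipv4 addresses
    dig +short A registry-1.docker.io
    # then add entries like the following to /etc/hosts (A.B.C.D stands in for a real address)
    # A.B.C.D   registry-1.docker.io
    # and remove them again once the pulls are sorted
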
<clarkb> but I'm not sure if it will "safely" fail because the promotion is already done and then the deployment itself won't run because the promotion failed...  23:30
<clarkb> I've forgotten how annoying these rate limits are now that most things are on quay  23:31
<clarkb> actually we just do a docker-compose pull and then docker-compose up -d. I'll just do that rather than reenqueue the deploy buildset as that is easy  23:36
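
(The manual refresh described above is roughly the following; the compose directory is an assumption about where it lives on the server.)

    cd /etc/etherpad-docker    # assumed location of the docker-compose.yaml
    docker-compose pull        # fetch the new 2.5.2 image
    docker-compose up -d       # recreate the container on the new image
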
<clarkb> ok etherpad reports it is up and healthy now  23:41
<corvus> clarkb: ack. lmk if you have further questions but easy sounds good  23:41
<corvus> meanwhile we broke the logjam on zuul changes, so the multi-provider fix merged and was just promoted  23:41
<corvus> i'll restart the launchers  23:41
<clarkb> https://etherpad.opendev.org/p/gerrit-upgrade-3.10 loads for me which has a lot of formatting  23:41
<clarkb> corvus: thanks  23:41
<clarkb> arg the main page css formatting is still a little odd but only on firefox  23:42
<clarkb> despite all the testing it still comes out that way for some reason  23:42
<corvus> what's odd?  23:43
<clarkb> corvus: the text tries to escape the button block  23:43
<clarkb> it doesn't do that on chrom* and I reported it upstream and they fixed some stuff and when I tested with the held node it didn't do that so now I'm slightly confused  23:43
<clarkb> it's not critical  23:43
<clarkb> but I was trying to get it fixed before we upgraded and thought it was, so I'm annoyed that it isn't  23:44
<corvus> can you screenshot what you see that's wrong?  23:44
<clarkb> yes one sec  23:45
<corvus> #status log restarted zuul launchers with multi-provider ready node fix  23:45
<opendevstatus> corvus: finished logging  23:46
<corvus> ykarel: ^ fyi thanks and sorry :)  23:46
<clarkb> corvus: I shared it with you on matrix because I wasn't sure how else to do it (I guess imgur?)  23:47
<corvus> oh that main page  23:47
<corvus> yeah i see that too that's weird  23:47
<clarkb> it doesn't do that on chrom*  23:48
<corvus> sorry for not understanding the words you very clearly typed -- i just got fixated on the actual pad. :)  23:48
<clarkb> I'm going to remove the /etc/hosts overrides on the prod server then I'm going to recheck the held server node  23:48
<clarkb> it's completely functional, just looks odd  23:48
<clarkb> so I don't think we need to roll back  23:48
<clarkb> ya the held server at 50.56.157.144 doesn't do it. I wonder if apache is caching things maybe? (I'm doing my testing in an incognito tab so theoretically my browser isn't caching it)  23:52
<clarkb> but I don't see any explicit caching config in the vhost  23:52
<clarkb> corvus: do you think it is worth restarting apache to see if it's maybe doing some implicit caching?  23:52
<clarkb> I'm going to try and grab the css files first to compare  23:53
<corvus> yeah that doesn't sound like it should be a thing, but it's easy and low impact, so why not  23:53
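
(A minimal sketch of the css comparison that follows; the stylesheet path is whatever the landing page references, so the <css-path> placeholder below is hypothetical.)

    # find the stylesheet the landing page references
    curl -sk https://etherpad.opendev.org/ | grep -o '[^"]*\.css'
    # fetch it from prod and from the held node, then diff
    curl -sk https://etherpad.opendev.org/<css-path> -o prod.css
    curl -sk --resolve etherpad.opendev.org:443:50.56.157.144 https://etherpad.opendev.org/<css-path> -o held.css
    diff prod.css held.css
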
<clarkb> another idea that I hate to entertain is that the v2.5.2 tag moved between when I held the node and when we just built etherpad  23:55
<clarkb> ok I have confirmed that the css files are different  23:57
<clarkb> now to restart apache2 and see if that changes  23:57
<clarkb> nope that didn't change anything  23:58
<clarkb> so now I'm going to check if the tag moved  23:58
