*** tosky has quit IRC | 00:15 | |
clarkb | ianw: left some comments on ^ note the inline comments aren't the reason for the -1, the top level comment is | 00:27 |
clarkb | let me know if I've missed something obvious and I can amend my review | 00:27 |
ianw | cool, replied. i may have missed gitea, will check in a sec | 00:31 |
ianw | basically i think the script should work by looping through and trying to backup everything, and if one part fails, the whole thing will exit with !0 | 00:32 |
clarkb | ianw: re your reply on the pipefail: I mention it because you are doing `bash foo | something else` and that will only exit non-zero if the something else fails | 00:32 |
clarkb | I agree with your plan. I'm just worried we'll ignore if bash foo fails | 00:32 |
clarkb | I think if we set -o pipefail we'll get both things | 00:33 |
clarkb | ? | 00:33 |
clarkb | this is distinct from set -e | 00:33 |
ianw | oh yes i see. we can check PIPEFAIL or whatever that is, that's a good idea for robustness | 00:33 |
ianw | PIPESTATUS | 00:34 |
clarkb | ah ya if we can check it directly too that would work | 00:34 |
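A minimal sketch of the pattern under discussion, assuming a hypothetical mysqldump-into-borg pipeline (the real implementation is the system-config change announced just below): with `set -o pipefail` the pipeline's exit status reflects a failure in either stage, and PIPESTATUS still exposes the per-stage codes so the script can keep looping over other backups and exit non-zero at the end.

    #!/bin/bash
    # Hypothetical sketch only, not the actual backup script.
    set -o pipefail          # pipeline exit status reflects any failed stage
    RETVAL=0

    mysqldump --all-databases | borg create ::db-backup-{now} -
    # PIPESTATUS holds the per-stage exit codes of the last pipeline,
    # e.g. "0 0" on success or "2 0" if only the dump failed.
    status=( "${PIPESTATUS[@]}" )
    if [[ ${status[0]} -ne 0 || ${status[1]} -ne 0 ]]; then
        echo "database stream backup failed (dump=${status[0]}, borg=${status[1]})" >&2
        RETVAL=1             # keep going over other backups, fail the run at the end
    fi

    exit $RETVAL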
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups https://review.opendev.org/c/opendev/system-config/+/771738 | 00:40 |
ianw | clarkb: nice ideas, thanks, implemented with ^ | 00:41 |
clarkb | ianw: one little formatting thing that yaml will be sad about. Otherwise that lgtm | 00:42 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups https://review.opendev.org/c/opendev/system-config/+/771738 | 00:43 |
ianw | indeed, i was actually just playing with yamllint wrt https://review.opendev.org/c/opendev/system-config/+/733406 | 00:44 |
clarkb | side note: would it be worth testing starting review-test up against an empty accountPatchReviewDb? | 00:44 |
clarkb | and if that works with the only loss being the little check marks on the ui next to things you've reviewed maybe we just stop backing up the review database entirely? | 00:45 |
clarkb | what I'm not sure about is if there are tendrils of that data in the notedb. I don't think there are, but it is possible | 00:45 |
ianw | possibly, but i don't think it's a major size concern | 00:45 |
clarkb | ok | 00:46 |
ianw | i mean, it's not going to be atomic with any gerrit state saved in backups anyway, so if they do communicate ... i guess it's likely to be corrupt | 00:46 |
ianw | or at least ... have corruption | 00:46 |
clarkb | ya | 00:46 |
clarkb | that change lgtm now too | 00:47 |
clarkb | I still think docs telling people to set up db backups separately would be a good addition :) | 00:49 |
ianw | clarkb: yes, i've started :) it got all tangled up with my modifying the stuff we have there about rotation, which now i'm not sure what to do about. i'll separate it out | 00:50 |
clarkb | ++ to separating the two things and we can update them as we get to each piece | 00:50 |
clarkb | re the gerrit testing: my change seemed to have worked but hit a POST_FAILURE on some unrelated issues that appear to be network related while getting logs. I rechecked it | 00:50 |
ianw | it was all going to be so simple... :) | 00:50 |
clarkb | ianw: I used your screenshots to confirm the gerrit versions were as expected too :) | 00:51 |
clarkb | I thought that would end up in the container logs but didn't find them there | 00:51 |
clarkb | (I think because it logs that info to disk; the container log just gets stdout/stderr which is fairly short) | 00:51 |
ianw | yeah with some sleep() and adjusting the viewport the screenshots seem to be pretty good now | 00:52 |
clarkb | and now it is time to go figure out some dinner | 00:52 |
*** diablo_rojo has quit IRC | 00:56 | |
ianw | #status log afsdb01/02 restarted with afs 1.8 packages | 01:15 |
openstackstatus | ianw: finished logging | 01:15 |
openstackgerrit | Merged openstack/diskimage-builder master: Install last stable version of get-pip.py script https://review.opendev.org/c/openstack/diskimage-builder/+/772254 | 01:16 |
*** dviroel has quit IRC | 01:54 | |
*** mlavalle has quit IRC | 01:55 | |
openstackgerrit | Merged opendev/system-config master: Manage afsdb servers with Ansible https://review.opendev.org/c/opendev/system-config/+/771340 | 02:03 |
*** lbragstad_ has joined #opendev | 02:14 | |
*** ysandeep|away is now known as ysandeep | 02:15 | |
*** lbragstad has quit IRC | 02:17 | |
*** hemanth_n has joined #opendev | 02:19 | |
*** DSpider has quit IRC | 02:55 | |
openstackgerrit | Merged opendev/system-config master: borg-backup: implement saving a stream, use for database backups https://review.opendev.org/c/opendev/system-config/+/771738 | 03:11 |
*** ykarel has joined #opendev | 03:18 | |
*** ykarel has quit IRC | 03:27 | |
*** d34dh0r53 has quit IRC | 03:47 | |
*** d34dh0r53 has joined #opendev | 03:48 | |
*** d34dh0r53 has quit IRC | 03:48 | |
*** d34dh0r53 has joined #opendev | 03:49 | |
*** d34dh0r53 has quit IRC | 03:49 | |
*** lbragstad_ is now known as lbragstad | 03:51 | |
*** d34dh0r53 has joined #opendev | 03:53 | |
*** d34dh0r53 has quit IRC | 03:55 | |
*** d34dh0r53 has joined #opendev | 03:56 | |
*** d34dh0r53 has quit IRC | 03:56 | |
*** d34dh0r53 has joined #opendev | 03:57 | |
*** d34dh0r53 has joined #opendev | 03:58 | |
*** d34dh0r53 has joined #opendev | 03:59 | |
*** brinzhang has quit IRC | 04:52 | |
*** brinzhang has joined #opendev | 04:53 | |
*** ykarel has joined #opendev | 05:01 | |
*** brinzhang_ has joined #opendev | 05:02 | |
*** brinzhang has quit IRC | 05:05 | |
ianw | clarkb/kopecmartin : re 705258 i left review comments, but i've started https://etherpad.opendev.org/p/refstack-docker to try and flesh out the steps we'll use to bring up the host and various other things. as mentioned, i think it would be good to validate the db migration procedure on a test host before we start on the production host | 05:49 |
*** ykarel_ has joined #opendev | 05:50 | |
*** ykarel has quit IRC | 05:53 | |
*** ykarel_ is now known as ykarel | 05:53 | |
*** ykarel_ has joined #opendev | 05:58 | |
*** marios has joined #opendev | 05:59 | |
*** ykarel has quit IRC | 06:00 | |
*** ykarel_ is now known as ykarel | 06:15 | |
*** dirtygiraffe has joined #opendev | 06:58 | |
*** dirtygiraffe has quit IRC | 07:02 | |
*** brinzhang_ has quit IRC | 07:04 | |
*** brinzhang_ has joined #opendev | 07:04 | |
*** eolivare has joined #opendev | 07:28 | |
*** slaweq has joined #opendev | 07:28 | |
*** ralonsoh has joined #opendev | 07:28 | |
*** hashar has joined #opendev | 07:58 | |
*** hashar has quit IRC | 08:01 | |
*** hashar has joined #opendev | 08:01 | |
*** sboyron_ has joined #opendev | 08:04 | |
*** fressi has joined #opendev | 08:04 | |
*** ysandeep is now known as ysandeep|lunch | 08:18 | |
*** andrewbonney has joined #opendev | 08:19 | |
*** valery_t has joined #opendev | 08:21 | |
*** ykarel is now known as ykarel|lunch | 08:21 | |
valery_t | I need a reviewer for my review https://review.opendev.org/c/openstack/python-openstackclient/+/773649 | 08:22 |
*** valery_t has quit IRC | 08:32 | |
frickler | wow, that one was really hasty | 08:33 |
cgoncalves | hey folks. not sure if this issue has been reported or not, apologies in advance. https://releases.openstack.org is super slow, CI jobs timing out | 08:36 |
cgoncalves | (HTTP 443, connection timed out) | 08:38 |
*** tosky has joined #opendev | 08:40 | |
*** rpittau|afk is now known as rpittau | 08:41 | |
openstackgerrit | Merged openstack/diskimage-builder master: Remove the deprecated ironic-agent element https://review.opendev.org/c/openstack/diskimage-builder/+/771808 | 08:45 |
*** valery_t has joined #opendev | 08:49 | |
frickler | cgoncalves: works fine for me, do you have some logs? is this our CI or downstream? | 08:51 |
cgoncalves | frickler, https://zuul.opendev.org/t/openstack/build/4d4a897c012e4f7a8cd13d16fdb114f8/log/controller/logs/dib-build/amphora-x64-haproxy.qcow2_log.txt | 08:51 |
cgoncalves | I also hit HTTP 443 locally | 08:52 |
*** valery_t has quit IRC | 08:55 | |
*** jpena|off is now known as jpena | 08:57 | |
*** brinzhang_ has quit IRC | 08:59 | |
*** brinzhang_ has joined #opendev | 09:00 | |
frickler | hmm, seems to be a bit of a load spike, but I don't see anything wrong locally http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=140&rra_id=all | 09:02 |
frickler | there also seems to be a regular peak in io load starting every day at 6, not sure if that's our periodic jobs or possibly the backup, ianw? | 09:04 |
cgoncalves | frickler, FYI 2m11s http://paste.openstack.org/show/802273/ | 09:04 |
cgoncalves | and thanks for checking! | 09:05 |
*** DSpider has joined #opendev | 09:07 | |
*** valery_t_ has joined #opendev | 09:14 | |
*** ysandeep|lunch is now known as ysandeep | 09:38 | |
priteau | Good morning. tarballs.o.o is extremely slow for me today. I remember it happened some time ago and someone restarted apache (IIRC) which fixed it | 09:53 |
priteau | Yeah, that was on 2020-11-27 | 09:55 |
priteau | 16:31 fungi: #status log restarted apache2 on static.opendev.org in order to troubleshoot very long response times | 09:56 |
priteau | cgoncalves: I see the same problem | 09:57 |
priteau | frickler: See quote from fungi above ^^^ | 09:58 |
*** wanzenbug has joined #opendev | 10:00 | |
*** wanzenbug has quit IRC | 10:04 | |
ttx | Yes, affects docs.openstack.org too | 10:10 |
*** CeeMac has joined #opendev | 10:20 | |
frickler | #status log restarted apache2 on static.opendev.org in order to resolve slow responses and timeouts | 10:20 |
openstackstatus | frickler: finished logging | 10:20 |
frickler | ttx: priteau: cgoncalves: infra-root: ^^ looks better to me currently, please let us know if you see any further issues | 10:21 |
cgoncalves | frickler, functional now. thanks a lot! | 10:22 |
priteau | Thank you frickler! upper constraints fetched in 1 to 2 seconds | 10:23 |
*** ykarel|lunch is now known as ykarel | 10:29 | |
*** sshnaidm|afk is now known as sshnaidm|ruck | 10:35 | |
*** hashar has quit IRC | 10:45 | |
*** dtantsur|afk is now known as dtantsur | 10:49 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 11:13 |
*** dviroel has joined #opendev | 11:14 | |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 11:43 |
*** hrw has joined #opendev | 12:18 | |
hrw | morning | 12:18 |
hrw | can someone review/approve https://review.opendev.org/c/openstack/project-config/+/772887 patch? it adds centos 8 stream for aarch64 nodes | 12:19 |
*** jpena is now known as jpena|lunch | 12:41 | |
openstackgerrit | Pedro Luis Marques Sliuzas proposed openstack/project-config master: Add Metrics Server App to StarlingX https://review.opendev.org/c/openstack/project-config/+/773883 | 12:42 |
*** hemanth_n has quit IRC | 13:00 | |
*** hrw has quit IRC | 13:19 | |
openstackgerrit | Merged openstack/project-config master: CentOS 8 Stream initial enablement for AArch64 https://review.opendev.org/c/openstack/project-config/+/772887 | 13:25 |
openstackgerrit | Pedro Luis Marques Sliuzas proposed openstack/project-config master: Add Metrics Server App to StarlingX https://review.opendev.org/c/openstack/project-config/+/773883 | 13:26 |
*** jpena|lunch is now known as jpena | 13:37 | |
*** ykarel_ has joined #opendev | 13:51 | |
*** ykarel has quit IRC | 13:54 | |
*** whoami-rajat__ has joined #opendev | 13:55 | |
*** lbragstad has quit IRC | 13:57 | |
*** ykarel_ is now known as ykarel | 13:59 | |
*** brinzhang_ has quit IRC | 14:17 | |
*** brinzhang_ has joined #opendev | 14:17 | |
*** zoharm has joined #opendev | 14:33 | |
*** akahat|rover is now known as akahat | 14:34 | |
*** brinzhang_ has quit IRC | 14:35 | |
*** lbragstad has joined #opendev | 14:35 | |
*** brinzhang_ has joined #opendev | 14:36 | |
*** ysandeep is now known as ysandeep|afk | 14:48 | |
*** bcafarel has quit IRC | 14:58 | |
*** d34dh0r53 has quit IRC | 15:01 | |
*** d34dh0r53 has joined #opendev | 15:01 | |
*** fressi has quit IRC | 15:23 | |
*** ykarel_ has joined #opendev | 15:30 | |
*** ysandeep|afk is now known as ysandeep | 15:31 | |
*** ykarel has quit IRC | 15:32 | |
*** alfred188 has joined #opendev | 15:50 | |
*** ykarel_ is now known as ykarel | 16:00 | |
clarkb | hrw isn't here anymore, but that arm64 centos 8 stream image has me wondering if maybe the centos 8 image should be removed? I don't know if anything is using it currently though | 16:04 |
*** ysandeep is now known as ysandeep|away | 16:06 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Run gerrit 3.2 and 3.3 functional tests https://review.opendev.org/c/opendev/system-config/+/773807 | 16:08 |
*** ykarel has quit IRC | 16:17 | |
*** d34dh0r53 has quit IRC | 16:18 | |
*** d34dh0r53 has joined #opendev | 16:19 | |
openstackgerrit | Matt McEuen proposed openstack/project-config master: New Project Request: airship/gerrit-to-github-bot https://review.opendev.org/c/openstack/project-config/+/773936 | 16:19 |
*** mlavalle has joined #opendev | 16:34 | |
*** hashar has joined #opendev | 16:45 | |
fungi | clarkb: given the concern over centos 8 vs centos stream 8 it seemed like projects were going to want to have both available at least for a bit so they can make sure stream still works the same for them | 16:49 |
clarkb | fungi: yup, but I'm not sure if anything used the arm64 centos 8 image? | 16:49 |
clarkb | seemed like most of the work there was done on debuntu, but I am probably also just working on out of date info | 16:50 |
fungi | oh, the arm64 images specifically. right, that may be | 16:50 |
*** sshnaidm|ruck is now known as sshnaidm | 16:52 | |
*** zoharm has quit IRC | 16:52 | |
*** marios is now known as marios|out | 17:24 | |
*** ralonsoh has quit IRC | 17:31 | |
clarkb | using codesearch kolla uses centos-8-arm64 in the kolla-centos8-aarch64 nodeset and that is used in kolla-build-centos8-source-aarch64. Opendev also uses it to build the arm64 centos 8 wheel cache | 17:36 |
clarkb | my hunch is that hrw is adding the new image for kolla so that kolla-build-centos8-source-aarch64 job is likely to get replaced with a stream job. Once that happens we can drop the centos-8 wheel cache in favor of a stream wheel cache and then drop the image I bet | 17:36 |
clarkb | but we can't just drop it today | 17:36 |
fungi | yeah, that sounds about right | 17:37 |
clarkb | infra-root I think the stack at https://review.opendev.org/c/opendev/system-config/+/773807/ is ready for review now. These are housekeeping changes to add gerrit 3.3 image builds and testing | 17:40 |
clarkb | I figured out why those jobs were post failuring and it was because the run playbook was short-circuiting due to an error which caused a log file copy to fail since the file wasn't present | 17:41 |
clarkb | tl;dr best to look at post failures as if they are actual failures first | 17:41 |
fungi | #status log Requested Spamhaus SBL delisting for the lists.katacontainers.io IPv6 address | 17:48 |
openstackstatus | fungi: finished logging | 17:48 |
fungi | infra-root: i checked all the addresses and hostnames for lists.opendev.org and they're still clean | 17:49 |
fungi | just as a heads up | 17:49 |
clarkb | thanks! | 17:50 |
*** valery_t_ has quit IRC | 17:56 | |
*** jpena is now known as jpena|off | 17:59 | |
clarkb | iurygregory: I have approved https://review.opendev.org/c/openstack/project-config/+/772427 to allow ironic project cores to edit hashtags on the appropriate projects. I would be curious to hear how that goes | 18:04 |
iurygregory | clarkb, awesome thanks! after it merges I will give it a try | 18:05 |
corvus | i'd be in favor of allowing that for all auth'd users | 18:06 |
clarkb | iurygregory: note there will be a delay while we sync the acl update, you can follow along in the deploy pipeline on zuul status for that change (it will be the manage-projects job) | 18:07 |
*** eolivare has quit IRC | 18:07 | |
fungi | corvus: yeah, i think we mostly wanted to see how it played out for volunteer test projects before we turned it on globally | 18:07 |
clarkb | yup | 18:07 |
iurygregory | clarkb, ack | 18:08 |
fungi | main concern is that any user can remove a hashtag, so some projects may find that they want to override our global access for it and restrict it to a core reviewer group | 18:08 |
*** rpittau is now known as rpittau|afk | 18:08 | |
fungi | but honestly, there are so many ways someone can vandalize a change in gerrit, i'm not too concerned about rampant hashtag deletion | 18:08 |
*** fbo is now known as fbo|off | 18:09 | |
corvus | yep, that's my thought. a measured introduction with clear guidelines would probably help. maybe a standard place (CONTRIBUTING?) to describe a project's "reserved" hashtags | 18:10 |
*** zimmerry has joined #opendev | 18:10 | |
openstackgerrit | Merged openstack/project-config master: Update ACLs of Ironic Projects to allow Edit Hashtags https://review.opendev.org/c/openstack/project-config/+/772427 | 18:10 |
clarkb | fungi: looking at project-config changes I notice that you've got a revert to reenable gentoo image builds again. I presume that means they are off now? should we reenable them at this point? | 18:12 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 18:13 |
*** dtantsur is now known as dtantsur|afk | 18:16 | |
fungi | clarkb: prometheanfire had some fixes into dib, we probably need to check that those appear in a release before we try again | 18:20 |
clarkb | got it | 18:20 |
*** d34dh0r53 has quit IRC | 18:22 | |
openstackgerrit | Merged openstack/project-config master: Remove anachronistic jobs from scciclient https://review.opendev.org/c/openstack/project-config/+/772908 | 18:23 |
*** d34dh0r53 has joined #opendev | 18:24 | |
fungi | i see at least one gentoo-related entry in `git log --no-merges --oneline 3.6.0..origin/master` | 18:28 |
fungi | ianw: frickler: how do you feel about tagging another dib release? | 18:28 |
fungi | looks like that'll pull in the get-pip.py change too | 18:29 |
fungi | and a fix for centos stream | 18:29 |
prometheanfire | fungi: clarkb yep, the gentoo update would be nice, iirc it may help fix the build issues for gentoo | 18:33 |
openstackgerrit | Merged openstack/project-config master: Add Metrics Server App to StarlingX https://review.opendev.org/c/openstack/project-config/+/773883 | 18:52 |
corvus | i'm seeing gerrit http response times > 30s in gertty | 18:53 |
clarkb | load is a bit high right now, but not drastically so. I've been having decent luck through the web ui. I wouldn't say it's super fast, but it also hasn't been terribly slow doing project-config and zuul-jobs reviews | 18:54 |
*** marios|out has quit IRC | 18:54 | |
clarkb | dansmith is doing things with the api to get zuul comments (based on conversation in -infra) | 18:56 |
clarkb | I wonder if that could be related, or if its just another researcher | 18:56 |
clarkb | load appears to be falling off now | 18:59 |
iurygregory | clarkb, worked =) | 19:01 |
iurygregory | https://review.opendev.org/c/openstack/bifrost/+/766742 I can add hashtag for a change that I'm not the owner/uploader | 19:02 |
clarkb | iurygregory: cool, if you end up with some examples of how you are using it, I would be interested in seeing those | 19:02 |
iurygregory | \o/ | 19:02 |
iurygregory | the idea is that we will use to track priorities for review | 19:02 |
clarkb | iurygregory: you could tag changes "urgent" I guess | 19:03 |
clarkb | and then core reviewers start reviewing anything tagged urgent when they review sort of thing? | 19:03 |
iurygregory | and probably for backports also (we are thinking of adding the backport-candidate label), and maybe try to use the gerrit api to automatically add a hashtag that would tell us we need a backport in some patches | 19:04 |
iurygregory | clarkb, with the hashtag we can have a simple search in gerrit | 19:04 |
clarkb | corvus: I wonder too if possibly updating acls slows things down (maybe there are locks involved in that?) | 19:04 |
iurygregory | https://review.opendev.org/q/hashtag:bifrost | 19:04 |
iurygregory | for example | 19:04 |
iurygregory | so maybe we will have specific ironic hashtags we want to use to make things easier for us and have a dashboard that would help the community | 19:05 |
clarkb | right, in that example "bifrost" is implied because it is the bifrost repo. But I can see how other values for things like backports and urgency would help out | 19:06 |
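For reference, a hedged sketch of what that could look like against the Gerrit REST API; the hashtag names here are made up, 766742 is the bifrost change linked above, and the authenticated call assumes HTTP credentials plus the Edit Hashtags permission granted by the ACL change.

    # search open changes carrying a hypothetical priority hashtag
    curl -s 'https://review.opendev.org/changes/?q=status:open+hashtag:ironic-priority'

    # add/remove hashtags on a change via POST /changes/{id}/hashtags
    curl -s -u "$GERRIT_USER:$GERRIT_HTTP_PASSWORD" \
        -X POST -H 'Content-Type: application/json' \
        -d '{"add": ["ironic-priority"], "remove": ["needs-backport"]}' \
        https://review.opendev.org/a/changes/766742/hashtags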
*** andrewbonney has quit IRC | 19:17 | |
*** psliuzas has joined #opendev | 19:18 | |
psliuzas | Hey folks, My commit just got merged https://review.opendev.org/c/openstack/project-config/+/773883 and I would like to be the first core reviewer for the repo starlingx/metrics-server-armada-app , could someone help me with that? thanks! | 19:24 |
openstackgerrit | Matt McEuen proposed openstack/project-config master: New Project Request: airship/gerrit-to-github-bot https://review.opendev.org/c/openstack/project-config/+/773936 | 19:29 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 19:32 |
fungi | psliuzas: sure, taking care of it now, just a moment | 19:32 |
fungi | psliuzas: oh, our deployment automation hasn't run for that yet, i'll check it again in a few minutes | 19:34 |
psliuzas | Thanks! | 19:40 |
fungi | infra-prod-manage-projects TIMED_OUT in 30m 39s | 19:43 |
fungi | i guess that's why it wasn't created | 19:43 |
fungi | looking into it now | 19:43 |
fungi | "Failed to set desciption for: openstack/puppet-openstack_extras 500 Server Error: Internal Server Error for url: https://localhost:3000/api/v1/repos/openstack/puppet-openstack_extras" | 19:47 |
fungi | looks like gitea01 may be having a bad day | 19:47 |
fungi | it errored about setting descriptions on a bunch of projects | 19:48 |
openstackgerrit | Merged openstack/project-config master: New Project Request: airship/gerrit-to-github-bot https://review.opendev.org/c/openstack/project-config/+/773936 | 19:49 |
fungi | i'll keep an eye on that one ^ and see if the problem persists | 19:51 |
clarkb | fungi: thanks | 19:57 |
clarkb | I want to say we considered making the project description update failures non-fatal? | 19:58 |
clarkb | it happens in a different spot than the initial project setup iirc, so we could separate those two concerns and get the project update on the next pass whenever that happens | 19:58 |
fungi | yeah, it's not 100% clear to me from the log that's why it didn't run tasks for gerrit, but seems likely | 19:59 |
clarkb | fungi: I think the whole job short circuits if the gitea stuff fails because we don't want to create a repo in gerrit that will fail to replicate | 19:59 |
fungi | yeah | 20:03 |
clarkb | I'll look into that after my bike ride as that seems like a good improvement | 20:04 |
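Illustrative only (the real tooling is the manage-projects code, not a shell script): the improvement described above amounts to treating the description update as best effort, roughly like the sketch below, while leaving project creation and replication setup failures fatal. The URL mirrors the one from the earlier error; the token variable is a placeholder.

    if ! curl -skf -X PATCH \
        -H "Authorization: token $GITEA_TOKEN" \
        -H 'Content-Type: application/json' \
        -d '{"description": "example description"}' \
        'https://localhost:3000/api/v1/repos/openstack/puppet-openstack_extras'; then
        echo 'WARNING: description update failed; it will be retried on a later pass' >&2
        # deliberately not exiting non-zero here -- only a failed project
        # creation should short-circuit the run, since creating the repo in
        # gerrit without its gitea counterpart would break replication
    fi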
openstackgerrit | Merged zuul/zuul-jobs master: bindep: remove set_fact usage when converting string to list https://review.opendev.org/c/zuul/zuul-jobs/+/771585 | 20:09 |
*** hashar has quit IRC | 20:11 | |
*** klonn has joined #opendev | 20:16 | |
fungi | now gitea08 is returning "Internal Server Error for url: https://localhost:3000/api/v1/orgs/pypa/teams?limit=50&page=2" according to the latest log | 20:23 |
fungi | and gitea04 said "401 Client Error: Unauthorized for url: https://localhost:3000/api/v1/user/orgs?limit=50&page=1" | 20:24 |
fungi | i wonder if something is going sideways in gitea | 20:24 |
clarkb | cacti shows significant new cpu demand on 01 | 20:25 |
clarkb | 04 was in a similar situation until recently but seems to have subsided | 20:26 |
clarkb | 07 and 08 exhibit similar | 20:27 |
fungi | yeah, seeing that. maybe we're getting slammed by something/someone | 20:27 |
fungi | if it hasn't subsided by the time my kettle reaches a boil, i'll start digging into apache access logs and looking at blocking abusive client addresses | 20:29 |
clarkb | fungi: remember that you need to map the connecting port in apache to the haproxy logs in the lb syslog | 20:30 |
clarkb | fungi: since from apache's perspective all connections originate from the load balancer | 20:30 |
clarkb | hrm apache may not be logging that :/ | 20:31 |
clarkb | fungi: maybe we set up something like 'LogFormat "%h:%{remote}p %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined' and then tell CustomLog to use that format? | 20:40 |
clarkb | I'll push that change up now and we can iterate on it if necessary | 20:40 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add remote port info to gitea apache access logs https://review.opendev.org/c/opendev/system-config/+/774000 | 20:43 |
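In vhost terms the change boils down to roughly the following (paraphrased from the review; the exact log path used by the role is assumed). `%{remote}p` is the client-side port of the TCP connection, which from apache's point of view is haproxy's source port, i.e. the key needed to line apache entries up with the lb syslog.

    LogFormat "%h:%{remote}p %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
    CustomLog /var/log/apache2/gitea-ssl-access.log combined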
fungi | ahh, yep | 20:45 |
fungi | established tcp connections through the lb shot waaay up around 19z | 20:48 |
*** sshnaidm is now known as sshnaidm|afk | 20:52 | |
ianw | o/ | 20:53 |
clarkb | ianw: for the last little bit the giteas are doing the dance similar to the thing you wrote the apache vhost for | 20:54 |
clarkb | ianw: we've noticed that apache isn't logging source ports, so it's hard to map to the load balancer logs. I pushed https://review.opendev.org/c/opendev/system-config/+/774000 | 20:54 |
ianw | interesting, and it seems like they must be coming from separately hashed addresses if multiple gitea's are feeling it | 20:55 |
fungi | seems the lb is strafing problem traffic to different backends until they oom | 20:56 |
clarkb | I think too what can happen is one gets overwhelmed and the lb takes it out of the pool and then the addrs shift | 20:56 |
fungi | so it could be just one or a small handful of client addresses | 20:56 |
fungi | but memory consumption seems to be the predominant symptom, we're reaching oom conditions on backend servers | 20:57 |
fungi | huh, any idea why we've got afs set up on the gitea servers? | 20:57 |
clarkb | I don't see afs on gitea08, but I may not be looking properly | 20:58 |
fungi | d'oh, my bad. i should be on gitea08 not ze08 | 20:59 |
fungi | yeah, definite oom there | 20:59 |
fungi | killed a gitea process | 20:59 |
fungi | clarkb spotted one ipv4 address making a ton of requests which were getting directed to gitea01 where the current memory crisis seems to be unfolding. i've temporarily blocked it in iptables on the lb to see what happens | 21:03 |
clarkb | already load seems better fwiw | 21:03 |
fungi | mnaser: seems we may be getting spammed by very heavy git clone operations in volume from 38.102.83.175 which looks like a vexxhost customer (but isn't us as far as i can tell). i've temporarily blocked access from that address to the git servers | 21:05 |
clarkb | that is a bit of an imperfect correlation without the port details | 21:05 |
clarkb | we can get the logging improvement in then open things up and see what we can infer from there | 21:06 |
fungi | yeah, and i'll watch the logs here for a bit, then try to remove the block rule and see if the problem resumes | 21:06 |
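The temporary block amounts to an iptables rule of roughly this shape (chain and rule placement are assumptions; the address is the one identified in the lb logs):

    # insert a drop rule for the suspect client on the load balancer ...
    sudo iptables -I INPUT -s 38.102.83.175 -j DROP
    # ... and delete the same rule once that client is ruled out as the cause
    sudo iptables -D INPUT -s 38.102.83.175 -j DROP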
ianw | there is something cloning /vexxhost/* with an odd UA "GET /vexxhost/helm-charts/info/refs?service=git-upload-pack HTTP/1.1" 200 8436 "-" "git/1.0" | 21:10 |
*** psliuzas has quit IRC | 21:10 | |
ianw | # cat gitea-ssl-access.log | grep 'git/1.0' | awk '{print $7}' | sort | uniq -c | 21:11 |
ianw | 1229 /vexxhost/helm-charts/info/refs?service=git-upload-pack | 21:11 |
ianw | 297 /vexxhost/openstack-operator/info/refs?service=git-upload-pack | 21:11 |
ianw | 297 /vexxhost/rbac-helm/info/refs?service=git-upload-pack | 21:11 |
ianw | it has very particular interest | 21:11 |
fungi | yeah, the potentially problematic requests i was seeing all had git/1.0 as the ua | 21:11 |
ianw | https://github.com/src-d/go-git/blob/master/plumbing/transport/http/common.go#L19 | 21:12 |
fungi | looks like gitea01 also reached oom conditions | 21:12 |
fungi | yep | 21:13 |
fungi | [Wed Feb 3 20:54:22 2021] Killed process 29676 (gitea) total-vm:30048404kB, anon-rss:7604728kB, file-rss:0kB, shmem-rss:0kB | 21:13 |
fungi | problem client(s) may have gotten punted by the lb to a fresh backend after that | 21:14 |
ianw | gitea01 seems to have no "git/1.0" UA requests? | 21:15 |
fungi | so far the problem seems to have hit 01, 04, 07 and 08 | 21:16 |
fungi | 01 looks reasonably healthy again in past 10-15 minutes | 21:19 |
fungi | i don't see any indication the load has shifted to another backend | 21:20 |
fungi | the secondary symptom of established tcp connection count on the lb also seems to have subsided around the same timeframe | 21:21 |
fungi | in a few more minutes i'll try removing the firewall rule blocking 38.102.83.175 | 21:21 |
ianw | i can't pick any common themes from the logs like on gitea08 with the git/1.0 thing. although git/1.0 seems to be a pretty common thing used in a few git libraries. all it really indicates is whatever is cloning isn't actually a basic git client, but something using a library | 21:23 |
fungi | i've approved another project creation change, in hopes that might flush the incompletely applied changes from earlier | 21:25 |
fungi | gonna pop out to check the mail while that grist churns through the mill, brb | 21:27 |
*** hamalq has joined #opendev | 21:27 | |
*** klonn has quit IRC | 21:32 | |
*** whoami-rajat__ has quit IRC | 21:34 | |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 21:34 |
openstackgerrit | Merged openstack/project-config master: Add ansible-role-pki repo https://review.opendev.org/c/openstack/project-config/+/773385 | 21:36 |
fungi | and just waiting for that to deploy now | 21:37 |
fungi | seems to be in progress, tailing the log on bridge.o.o | 21:41 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 21:44 |
fungi | so far it's only gitea06 which hasn't reported any task result | 21:46 |
fungi | load average there is pretty high | 21:47 |
fungi | like around 10 right now | 21:47 |
fungi | looks like swap usage is spiking on 06 just in the last poll interval | 21:48 |
fungi | possible it's just manage-projects briefly running through all the descriptions | 21:49 |
*** sboyron_ has quit IRC | 21:49 | |
fungi | but the other 7 backends completed far faster | 21:49 |
fungi | yeah, swap is getting exhausted quickly there | 21:51 |
fungi | already basically no ram available and half the swap in use | 21:51 |
fungi | okay, seems to be subsiding now | 21:52 |
fungi | no oom (yet anyway) | 21:52 |
fungi | i need to start working on dinner but will try to keep one eye on my terminals | 21:55 |
ianw | ok sorry back now | 22:05 |
ianw | seems like we don't really have a smoking gun | 22:05 |
*** openstackgerrit has quit IRC | 22:11 | |
fungi | not so far, no | 22:15 |
clarkb | have we gotten better logging in place? | 22:39 |
ianw | umm i +2'd it but it hadn't finished testing | 22:40 |
clarkb | I think last time we weren't able to pinpoint anything until we had something similar in the gitea logs | 22:40 |
ianw | looks like it's still moving through | 22:40 |
clarkb | I expect that to be a big help given previous experiences | 22:40 |
*** slaweq has quit IRC | 22:41 | |
*** slaweq has joined #opendev | 22:43 | |
fungi | looks like the manage-projects run actually completed without timing out | 22:45 |
clarkb | lots of connections from a single vexxhost ip to gitea06 according to the lb | 22:45 |
clarkb | I wonder if it's just bouncing around a few IPs there? | 22:45 |
fungi | psliuzas is gone, but i've added them to starlingx-metrics-server-armada-app-core | 22:46 |
fungi | does look like i caught that right as load was ramping up for gitea06 | 22:46 |
fungi | can see it probably hit an oom condition a few minutes ago now | 22:47 |
clarkb | as far as I can tell these IPs from vexxhost are not part of our gitea cluster or in our nodepool logs (so not ours) | 22:47 |
fungi | [Wed Feb 3 22:30:41 2021] Killed process 14724 (gitea) total-vm:22863900kB, anon-rss:7564180kB, file-rss:0kB, shmem-rss:0kB | 22:47 |
fungi | yeah so oom on gitea06 ~47 minutes ago | 22:47 |
*** slaweq has quit IRC | 22:47 | |
fungi | er, ~17 | 22:47 |
clarkb | load was high as of a few minutes ago | 22:48 |
clarkb | ya | 22:48 |
clarkb | I didn't think it was that long ago :) | 22:48 |
clarkb | I feel like the key is to catch whichever one is next now | 22:49 |
clarkb | before it goes completely sad | 22:49 |
fungi | i've reset iptables on gitea-lb01 now so 38.102.83.175 is no longer blocked | 22:49 |
fungi | as it didn't seem that one (or that one alone anyway) was the problem | 22:50 |
clarkb | 06 has the highest system load of the set, the rest look quite happy actually | 22:50 |
fungi | system load average is back down around 1 now | 22:51 |
fungi | on gitea06 | 22:51 |
clarkb | that vexxhost IP seems to have continuously made requests that hit 06 for hours and hours and hours | 22:53 |
clarkb | which is interesting, but maybe an indication it isn't to blame | 22:54 |
clarkb | however that vexxhost IP made far and away the most requests to gitea06 while cacti reports it as being under high load | 22:58 |
fungi | eyeballing the overall impact, it's possible these two ip addresses together are the cause | 23:01 |
fungi | since it looks like blocking one of them may have roughly halved the effect | 23:01 |
fungi | but it's also possible utilization is trailing off in general for the day, and is no longer compounding the problem | 23:02 |
fungi | mildly amusing, the address i blocked earlier, when stuffed into my web browser, reveals that it's actually trunk-centos8.rdoproject.org | 23:04 |
fungi | and the other one seems to be trunk-primary.rdoproject.org | 23:05 |
fungi | so maybe we need to reach out to rdo folks and make sure everything is okay on their end? | 23:05 |
fungi | we probably even have some rdo people in here or at least in #openstack-infra who can check on things | 23:07 |
fungi | and would probably be faster than having vexxhost support act as a relay for the discussion | 23:08 |
*** openstackgerrit has joined #opendev | 23:24 | |
openstackgerrit | Merged opendev/system-config master: Add remote port info to gitea apache access logs https://review.opendev.org/c/opendev/system-config/+/774000 | 23:24 |
clarkb | I was able to use the new logging from ^ on gitea06 to correlate some requests to the rdo host fungi pointed out above. | 23:39 |
clarkb | That is the host I identified as making the bulk of the requests to gitea06 via haproxy logs | 23:39 |
clarkb | still not an indication that what they did is wrong (and in fact they seem to regularly poll repos for ref updates) | 23:39 |
clarkb | but it was a good test case for whether our logs give us what we need to correlate things now, and I think they do. | 23:39 |
clarkb | We might consider logging the apache source port on the connection to gitea so that we can correlate between apache and gitea too? | 23:40 |
clarkb | actually I don't know how to expose that with apache logging | 23:41 |
clarkb | %{format}p doesn't seem to have a format for that | 23:41 |
clarkb | ianw: fungi fwiw I think at this point we largely need to see it happening again so that we log it with the data necessary to correlate things then go from there | 23:42 |
ianw | ++ | 23:42 |
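A hedged sketch of the correlation workflow this enables; the log paths, field positions and the example port are assumptions about typical haproxy/apache layouts rather than the production configuration.

    # rank client addresses by connection count in the haproxy syslog
    # (field position depends on the configured log format)
    awk '{print $6}' /var/log/haproxy.log | cut -d: -f1 | sort | uniq -c | sort -rn | head

    # pull the haproxy entries for one suspect client ...
    grep -F '38.102.83.175:' /var/log/haproxy.log | tail -5
    # ... then look a specific lb source port (51234 is hypothetical) up in the
    # backend's apache log, which now prefixes each line with ip:port
    grep -E '^[0-9.]+:51234 ' /var/log/apache2/gitea-ssl-access.log | tail -5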
corvus | i wonder if we can get metrics from gerrit on certain operations (like how long a push takes) | 23:42 |
corvus | i was wondering that as i just pushed a change and it seemed to take a good 10-15 seconds | 23:43 |
clarkb | corvus: for replication to gitea? oh this is separate | 23:43 |
corvus | yeah, sorry, separate | 23:43 |
clarkb | corvus: I think that you can probably get that out of the ssh logs | 23:43 |
clarkb | I want to say there is timing info there and there should be enough info to split out the git operations | 23:43 |
corvus | might be a nice thing to track in a dashboard as opposed to anecdata | 23:43 |
clarkb | but its been while since I looked at that log file | 23:43 |
clarkb | fwiw I noticed that pushing to gerrit's gerrit is similarly slow (but I've only pushed a handful of times to there recently) | 23:44 |
clarkb | also that is over http not ssh | 23:44 |
corvus | clarkb: yeah; though i always chalked that up to their backend (i assume a lot of distributed locking is involved) | 23:44 |
corvus | i sort of assumed they had a high cost for each push, but that they could scale out to a lot of simultaneous pushes (to different repos at least) | 23:45 |
corvus | but that's totally just assumption/inference on my part | 23:45 |
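If the sshd_log does carry per-command timing (something to verify against the live file before trusting any numbers), a rough first look at push latency might be as simple as the following; the log path and the assumption that a trailing column holds the execution time are both guesses.

    # speculative: surface the slowest recent receive-pack operations, assuming
    # each command line ends with timing columns like "... 12ms 1532ms 0"
    grep 'git-receive-pack' /var/gerrit/logs/sshd_log | awk '{print $(NF-1), $0}' | sort -rn | head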
*** tosky has quit IRC | 23:54 | |
clarkb | fungi: it appears that updating project descriptions is already a best effort attempt and shouldn't cause things to fail | 23:59 |
clarkb | fungi: I think the implication there is that something failed when trying to create the new project in gitea and that was a valid failure | 23:59 |