Tuesday, 2025-03-11

ianwi can check 02:00 :)00:00
clarkbthe hourly service-nodepool failed and I suspect it is due to a full disk on nb0400:17
clarkbya /opt is full again arg. I'm going to start a screen on that host, stop the service, then start a long running rm for those files00:17
clarkbok that is in progress on nb04 (just want to avoid any noise in the signal later if possible)00:20
Clark[m]I think the hourly jobs enqueued ahead of the daily jobs02:01
Clark[m]Hrm I don't see system config in periodic at all02:06
Clark[m]Oh wow there they are. A whole flood of projects after the initial 1502:07
Clark[m]Bootstrap bridge is starting for the daily jobs now02:14
Clark[m]Looking ok from the status page so far02:29
ianwstill all progressing in twos ... bridge utilisation seems sane02:56
ianwi guess parallel operation makes `/var/log/ansible.log` fairly useless as it all gets mixed up02:59
ianwperhaps instead of > output of runs we should set ANSIBLE_LOG_PATH?03:01
Clark[m]I thought we output to playbook specific log files03:01
ianwwe do but with a >>03:02
Clark[m]https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-production-playbook.yaml#L2103:03
Clark[m]Ya I guess I'm not understanding where the conflict is if it's playbook name specific?03:03
opendevreviewIan Wienand proposed opendev/system-config master: run-production-playbook: redirect via ansible logger  https://review.opendev.org/c/opendev/system-config/+/94399903:06
ianwoh just because it _also_ logs to /var/log/ansible.log via the global config.  I think that if we set ANSIBLE_LOG_PATH that will override that, and not have all the prod playbooks writing to the same file03:07
Clark[m]Oh does it log to a global file too? One thing we would need to check with that change is if it is safe to do so for jobs whose logs get published publicly03:07
ianwhaha i was about to say, now i think about it, we want to make sure we put anything that comes out into a local file too :)03:08
Clark[m]Since Ansible log output may not be the same as stdout/stderr03:08
Clark[m]So probably need to understand any potential behavior differences between the two first?03:09
ianwi guess the best thing would be to >> to a .stdout.log 03:09
ianwbut then we have another log file to deal with on the encryption path03:09
ianw(not that i think anyone else's key but mine is in there for that :)03:10
Clark[m]Ya and regular capture path for those that do it03:10
ianwi wonder if ANSIBLE_LOG_PATH _and_ >> to the same file works ok03:10
Clark[m]Since we won't want to overexpose to start. One option may be to do both like you say then disable public publishing for everything. Then recheck the two files for anything that published before and add it back in03:10
ianwthe ordering may be completely out, but perhaps it doesn't matter that much03:10
Clark[m]If things look safe03:10
Clark[m]What is in the Ansible log file? Is it different?03:11
Clark[m]Sorry I'm not currently able to check that easily but can if it becomes urgent 03:11
Clark[m]Going back to general behavior here I think this is continuing to look good03:12
ianwit looks to me that what is in ansible.log as written out by ansible is the same as what is captured into each <service>.yaml.log 03:14
ianwi don't think this is a problem, as such ... just that we have the rather messy ansible.log file just a jumble of ever increasing mostly random stuff03:15
Clark[m]Got it. Less about ensuring correct behavior for the remote node config management and more about making debugging more straightforward 03:15
Clark[m]https://zuul.opendev.org/t/openstack/buildset/1c24d0f003e9427ea84e393f48120397 success!03:19
Clark[m]ianw: we use >> because we start the file with a header03:27
Clark[m]I suspect that is still fine with the proposed change though as Ansible should append too sounds like03:27
Clark[m]So probably the main thing to check is just that the content isn't more dangerous03:28
ianwyeah i don't think it's right because it will echo the output.  have to think about it :/03:43
Clark[m]It being the change to use the log env var?03:47
ianwyep04:47
opendevreviewIan Wienand proposed opendev/system-config master: run-production-playbook: redirect via ansible logger  https://review.opendev.org/c/opendev/system-config/+/94399904:59
ianw^ thought two - keep the log file capture as we have now, but burn ansible's own logging to /dev/null for Zuul runs.  but leave the default there so that if you run by hand, you still get logs in /var/log/ansible.log.  this is predicated on the idea that "log_path" in Ansible doesn't capture anything that stdout/stderr won't ... which i think is correct05:02
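The second approach described above can be sketched roughly as follows. This is an illustration only, not the code in change 943999: `playbook_env` and its `under_zuul` flag are hypothetical names, and the only Ansible-specific piece relied on is the `ANSIBLE_LOG_PATH` environment variable, which overrides `log_path` from the config file.

```python
def playbook_env(base_env, under_zuul):
    """Build the environment for one ansible-playbook run.

    Hypothetical helper. When the run is driven by Zuul, Ansible's own
    log is sent to /dev/null so the only record is the redirected
    stdout/stderr capture. Otherwise the environment is left alone and
    the configured default (for example /var/log/ansible.log) applies.
    """
    env = dict(base_env)  # copy so the caller's mapping is not modified
    if under_zuul:
        env["ANSIBLE_LOG_PATH"] = "/dev/null"
    return env


zuul_env = playbook_env({"PATH": "/usr/bin"}, under_zuul=True)
manual_env = playbook_env({"PATH": "/usr/bin"}, under_zuul=False)
```

A run by hand keeps the global log because nothing overrides it, which is the property the proposal depends on.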
*** mrunge_ is now known as mrunge06:35
opendevreviewKarolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles  https://review.opendev.org/c/opendev/glean/+/94167207:16
*** jroll02 is now known as jroll008:45
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404509:23
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404510:35
opendevreviewJeremy Stanley proposed opendev/bindep master: Drop auxiliary requirements files  https://review.opendev.org/c/opendev/bindep/+/94071114:38
fungislight efficiency improvement in the latest revision of ^14:39
tkajinamhmm I just noticed a bit strange behavior in https://review.opendev.org/c/openstack/puppet-horizon/+/94323214:47
clarkblooking at periodic buildset runtimes for system-config typical runtime was 2 to 2.5 hours and last nights was 1 hour and 8 minutes14:47
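As a quick check of the figures quoted above (a typical 2 to 2.5 hours against 1 hour 8 minutes), the ratio can be worked out directly. The helper name is made up for illustration.

```python
def runtime_ratio(new_minutes, old_minutes):
    """Fraction of the old runtime that the new run took."""
    return new_minutes / old_minutes


new_run = 60 + 8                       # 1 hour 8 minutes
old_low, old_high = 2 * 60, 2.5 * 60   # typical range before the change

best_case = round(runtime_ratio(new_run, old_high), 2)   # against 2.5 hours
worst_case = round(runtime_ratio(new_run, old_low), 2)   # against 2 hours
```

So the new run took somewhere between roughly 45% and 57% of the previous runtime, depending on which end of the old range it is compared with.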
tkajinama strange behavior of gerrit, I mean14:47
clarkba definite measurable improvement14:47
clarkbtkajinam: can you be more specific? what do you observe that is strange?14:47
tkajinamif you look at my latest comment it is duplicated.14:48
tkajinamhmmm maybe it might be a problem caused by something in my local. ignore it for now14:48
clarkbI think I've seen similar with gertty users14:49
clarkbnot sure if you use gertty14:49
tkajinamno I posted it from web interface14:49
fungii've accidentally done it with gertty's experimental inline comment threads patch by selecting the reply button on a comment after composing a reply to it already14:50
clarkbhttps://review.opendev.org/c/opendev/bindep/+/940711/ fungi's patchset 5 is the example I'm thinking of14:51
fungiyeah, that's where i saw i'd done it14:51
tkajinamok14:51
fungibut since it's all just rest api calls back to the gerrit server, i expect the webclient might retry to post a comment if it received an error or disconnect, and if the server had correctly processed the comment anyway then maybe you end up with two14:52
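The retry theory above can be illustrated with a toy model. Nothing here is Gerrit or web client code: `CommentServer` and `post_with_retry` are invented for the sketch, and it only shows how a lost response plus a blind retry yields a duplicate.

```python
class CommentServer:
    """Toy server that stores every comment it receives."""

    def __init__(self):
        self.comments = []

    def post(self, message, drop_response=False):
        self.comments.append(message)  # the comment is stored first
        if drop_response:
            # The client never sees the success response.
            raise ConnectionError("response lost")
        return "ok"


def post_with_retry(server, message, lose_first_response=True):
    """Post once and retry once if the first response never arrives."""
    try:
        return server.post(message, drop_response=lose_first_response)
    except ConnectionError:
        return server.post(message)


server = CommentServer()
post_with_retry(server, "looks good")
duplicates = server.comments
```

After the call the server holds two copies of the same comment, which matches the symptom reported in the channel. Whether that is what actually happened was not confirmed.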
tkajinamyeah that's possible14:52
tkajinamI'll come back here in case I observe the same behavior frequently.14:52
tkajinamsorry for the noise !14:53
clarkbtkajinam: thank you for the heads up and ya I think this is probably fine unless it becomes persistent. If that happens definitely let us know14:53
fungiplease do, maybe we can correlate them and figure out the commonalities14:53
fungidefinitely noy noise14:53
fungis/noy/not/14:53
tkajinam:-)14:54
tkajinamI've never seen Alex using gertty so I was wondering why his comment in patchset 2 is the legacy (non-inline comment, I mean).14:55
tkajinamthat's why I suspected something strange might be happening in comment feature.14:55
clarkbthat is also another hallmark of gertty usage. But maybe there is another client or using the ssh review feature?14:56
fungigertty currently doesn't support the newer thread-style comments (there's an experimental change to add support but it's still incomplete), so most gertty users end up leaving legacy comments14:56
tkajinamyeah14:56
clarkbI think if you do `ssh -p 29418 user@review.opendev.org gerrit review -m "message here"` you also get that behavior14:57
tkajinamI'll talk with him to know how it was posted.14:57
clarkbyou have to use the --json input to get the more modern inline commenting stuff (which the modern top level comment is a special case of)14:57
tkajinamah. ok14:59
tkajinamthere are still a lot of mystery in gerrit :-P14:59
clarkbya there is a special meta file name that if you comment on is a top level comment. That is the new behavior. The old behavior is you set a comment on the patchset and don't specify a file15:01
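For reference, the special file name in Gerrit is `/PATCHSET_LEVEL`. A minimal sketch of building review input that posts a modern top level comment, assuming the standard ReviewInput shape with a `comments` map keyed by file path (the helper name is made up):

```python
import json

# Gerrit's magic path for patchset level (top level) comments.
PATCHSET_LEVEL = "/PATCHSET_LEVEL"


def patchset_level_review(message):
    """Return review input carrying one patchset level comment."""
    return {"comments": {PATCHSET_LEVEL: [{"message": message}]}}


payload = json.dumps(patchset_level_review("message here"))
```

The resulting JSON is the kind of document that `gerrit review --json` reads from stdin, in contrast to `-m`, which sets the plain review message and produces the legacy style discussed above.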
fungii'd say there's less mystery in gerrit than in proprietary code review platforms without available source code ;)15:02
clarkbfungi: can you check my comment on https://review.opendev.org/c/opendev/system-config/+/943999 as ianw won't be awake this time of day? I'm just trying to reason about the safety of capturing stderr like that for jobs that publish public logs15:04
clarkbI'm really happy the periodic buildset ran in just over half the time it took previously15:06
clarkbeven if we don't increase the semaphore limit that is a great roi and I think we can safely bump the limit up too15:07
fungianswered15:08
clarkbthanks! Wanted to make sure my dst jet lagged brain is keeping up15:09
clarkbjamesdenton: wanted to let you know that the new dfw3 region seems to be working well. We've also managed to switch sjc3 over to the new tenant/project that matches what is in dfw3. Thanks for the help getting that done.15:13
clarkbjamesdenton: also did you know that nova ssh keys are not project/tenant specific (we learned that the hard way when we deleted them using the old project/tenant in sjc3)15:13
jamesdentonGlad to hear about DFW3! But if you could elaborate a little more on the nova keys...15:14
fungiwe had some keypairs defined which we'd been using with the old project in sjc315:15
fungiwe didn't realize that they were the same objects being used for the new project in sjc3 under the same account15:15
fungii deleted the keypairs from the account thinking they were only being deleted from the old project, but it of course led to them being unavailable in the new project as well15:15
clarkband likely to be some ancient nova thing that we're just going to have to live with for backward compatibility reasons15:16
clarkbso nothing for you to change/address. Just an interesting behavior we discovered the hard way15:16
fungii didn't think to check that they were still there for the new project, so we were erroring for a little while with nodepool telling nova to use those keypair objects which no longer existed15:17
fricklerkeypairs are a user resource in nova15:17
fungiright, i learned that the hard way ;)15:17
clarkbour config management sorted it out automatically when it ran again to update cloud things so not a big deal either15:18
fungiarguably, we're abusing the keypair feature somewhat to do things it's not strictly intended for15:18
fungiwe could just inject all the ssh keys into our images instead15:19
clarkbthere are reasons to not do that though. Including people reusing our images15:19
clarkb(though we still bake in zuul's key)15:19
clarkbso we're only half addressing that problem15:19
clarkbjamesdenton: the other thing to note is in each region we have quota sufficient for 50 instances except for the memory limit. We can only fit 32 of our 8GB RAM instances into the memory quota we have currently. I'm not sure what capacity looks like in the new deployments but we're always happy to make use of more quota if possible. I think cloudnull mentioned a third region may come15:21
clarkbonline which is the other direction we can expand too15:21
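The quota arithmetic above can be written out as a small calculation. The 256 GB memory quota is an assumption inferred from "32 of our 8GB RAM instances"; the log does not state the actual figure.

```python
def max_instances(instance_quota, ram_quota_gb, ram_per_instance_gb):
    """Instances that fit when both the count and RAM quotas apply."""
    return min(instance_quota, ram_quota_gb // ram_per_instance_gb)


# Assumed numbers: 50 instance quota, 256 GB RAM quota, 8 GB per node.
usable_now = max_instances(50, 256, 8)

# RAM quota needed for memory to stop being the limiting factor.
ram_needed_for_50 = 50 * 8
```

Under those assumptions memory caps each region at 32 nodes, and about 400 GB of RAM quota would be needed to use all 50 instance slots.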
jamesdentonfrickler thanks for the clarification on that!15:25
jamesdentonclarkb we can help with the quota, i think. 15:25
opendevreviewKarolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles  https://review.opendev.org/c/opendev/glean/+/94167215:27
clarkbfungi: I posted some additional followup to https://review.opendev.org/c/opendev/system-config/+/943999 with some further investigating and info gathering if you are curious15:46
clarkbtl;dr is yes stderr is going to zuul and can be viewed. Seems to be innocuous15:47
fungicool, thanks for confirming. that matches what i expected15:48
clarkbI didn't approve the change because I'm still not sure I trust my early morning brain and figure we can wait for ianw to review the comments before we land it15:49
fungiyeah, that one's not at all urgent15:49
clarkbshould we proceed with https://review.opendev.org/c/opendev/system-config/+/940928 to exercise parallel infra-prod more with merging changes?16:29
clarkbthat switches the haproxy and zookeeper statsd container "sidecars" to python3.1216:29
clarkbnow that the board meeting is over I should eat something too16:30
fungii've approved it16:35
clarkbit should merge shortly. Will probably end up behind the hourly jobs18:02
opendevreviewMerged opendev/system-config master: Start using python3.12  https://review.opendev.org/c/opendev/system-config/+/94092818:03
fungithere we go18:04
fungiand yes18:04
clarkboh those image updates don't end up triggering jobs for gitea-lb, zuul-lb, or zookeeper18:05
clarkbI guess I can write a change to fix that18:05
fungigood point18:05
opendevreviewClark Boylan proposed opendev/system-config master: Trigger related jobs when statsd images update  https://review.opendev.org/c/opendev/system-config/+/94406318:10
clarkbsomething like that should do it I think18:10
tonybAny ideas how to debug: ERROR: failed to solve: docker.io/opendevorg/python-builder:3.11-bookworm: failed to resolve source metadata for docker.io/opendevorg/python-builder:3.11-bookworm: failed to copy: httpReadSeeker: failed open: content at https://zuul-jobs.buildset-registry:5000/v2/opendevorg/python-builder/manifests/sha256:9dd6363ddd47c9093f0a14127cf73612b7b7e7ef39db50ab9b7e617d5b1a8e15?ns=docker.io not found: not found18:36
tonybfrom: https://zuul.opendev.org/t/zuul/build/f72e1c199d8e44fe9f0e944be74453a0/log/job-output.txt?severity=0#130818:36
clarkbtonyb: I suspect that is the buildset registry getting hit by the docker rate limit18:37
clarkbif you look in the buildset registry job logs for the registry itself you may be able to confirm18:37
tonybclarkb: Thanks.  I forgot to look there.  It just has 'Not found' and returns a 40418:42
clarkbtonyb: oh right I'm remembering now we theorize that what happens is the buildset registry says 404 I don't have that image. Then docker falls back to talking to docker.io directly then gets the rate limit error. but when it reports the errors it only reports the first of the two errors it received18:43
clarkbwe haven't confirmed that via code review or profiling but it seems to match up with infrequent occurrences due to hitting rate limits18:44
tonybHmm okay, I'll see what I can find.18:45
clarkbessentially we think this is a failure of docker to report errors sanely and we're getting the error that isn't really an error masking the actual problem (which we think is likely the rate limit throttling)18:45
clarkbas a side note: I think nodepool can switch over to using the mirrored python base images18:45
clarkband sidestep the whole issue because quay should be less problematic18:46
tonybYeah that all makes sense.18:46
tonybIt'd be nice if the ContainerFile "FROM" could support a list like FROM [docker.io/..., quay.io/...] AS Builder but that's a pipe dream and probably wouldn't work because of SHASUMs or something else I'm not considering18:48
clarkband docker just generally trying to lock down their walled garden18:48
tonybYeah18:48
corvusre zuul launcher and rax flex -- i suspect we need new image uploads for the new project; so i've triggered image builds19:16
clarkboh that makes sense since the old tenant/project was cleaned up and its images wouldn't be available in the new tenant/project19:17
corvusyeah, and i think zl isn't smart enough to know that changed (since the connection looks the same)19:17
fungii wasn't sure if anything needed to be restarted there19:23
clarkbtriggering those rebuilds is currently through the api directly or the web ui right?19:24
fungiit did at least delete its prior images in the old sjc3 project19:24
clarkb(just so everyone else is aware of how to do that should they need to)19:24
clarkbthe statsd image update respin hit docker rate limits. I have rechecked it19:37
clarkbI'm going to pop out on a bike ride soon to get out before the rain arrives. But I'll be back and can help shepherd that in (I think it is a decent candidate to land as a system-config change and see that deploy is happy with parallel jobs in that pipeline)19:38
tonybclarkb: FWIW I added a -1 and asked a question on https://review.opendev.org/c/opendev/system-config/+/943999 ... feel free to ignore if I'm wrong19:48
tonybclarkb: Also enjoy your ride19:49
clarkbtonyb: oh good catch I think you are right looking at the old side19:49
tonybhuzzah19:51
clarkbianw: ^ fyi that should be fixed. We can get to it if you prefer too, but wanted to give you the opportunity to weigh in on the comments as a whole19:54
tonybWhat's needed to +A 943216: Add option to force docker.io addresses to IPv4 | https://review.opendev.org/c/opendev/system-config/+/943216 ?  21:02
fungiprobably just someone needs to be around to spot issues so we can emergency revert it21:03
fungii'm happy to go ahead and approve it now21:03
tonybfungi: Thanks, Assuming you're also happy to keep an eye on things.  If not I'll be around tomorrow21:06
fungiyeah, i can21:06
fungiapproved now21:06
opendevreviewTony Breeds proposed openstack/diskimage-builder master: Add a tool for displaying CPU flags and QEMU version  https://review.opendev.org/c/openstack/diskimage-builder/+/93783621:13
opendevreviewTony Breeds proposed openstack/diskimage-builder master: Add a tool for displaying CPU flags and QEMU version  https://review.opendev.org/c/openstack/diskimage-builder/+/93783621:16
Clark[m]tonyb: fungi: and we want to check prod remains unaffected as expected when that lands22:15
opendevreviewIan Wienand proposed opendev/system-config master: run-production-playbook: redirect via ansible logger  https://review.opendev.org/c/opendev/system-config/+/94399922:17
ianwtonyb: thanks for checking that.  now i'm worried about the testing :)22:18
clarkbianw: I don't think that playbook is tested at all :(22:18
clarkbthat change will be a good one to exercise infra-prod parallel stuff and may actually cause the load balancer statsd stuff to update too22:19
ianwi'm actually wavering on it now looking at the testing we do do22:20
ianwhttps://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c1c/943999/2/check/system-config-run-base/c1c2ea4/bridge99.opendev.org/ansible/22:20
ianwone thing that "ansible.log" has that the stdout capture doesn't have is timestamps22:20
clarkbthat's a good call out. That said we do put the header in the file so have rough timing. But also on the remote side syslog records those actions too22:21
clarkbso its not impossible to find precise timing but is definitely more annoying22:21
ianwif we set ANSIBLE_LOG_PATH to /var/log/ansible/<service>.yaml.log (and get the timestamps) what do we do with the stdout output?22:22
ianwit then becomes redundant22:22
clarkbif they are equivalent we probably want to redirect stdout to /dev/null?22:22
clarkbjust to avoid future confusion over all this when we wonder why they go into two places22:23
ianwthen i worry "does ansible put out anything on stdout that might not be in the .log file it writes"?22:23
ianw:)22:23
clarkbmaybe someone from the ansible world can answer that question for us. Is bcoca still around?22:24
ianwwe have stdout_callback=debug which is why i think we get more info coming out of stdout22:24
clarkbfwiw I never realized we had /var/log/ansible.log recording anything and always relied on the redirected stdout file content and never had an issue with it22:26
clarkbor at least not one that I believe the log output would've addressed22:26
clarkbso if we want to stick with it due to expected increase in verbosity I think that is fine22:26
opendevreviewMerged opendev/system-config master: Add option to force docker.io addresses to IPv4  https://review.opendev.org/c/opendev/system-config/+/94321622:34
ianwyeah, if we add a '<service>.yaml.stdout.log' it requires quite a lot of extra stuff in the post-production playbooks22:34
clarkbconfirmed that 943216 just enqueued jobs that should update the statsd containers for us22:35
clarkbthats good two birds one stone here22:35
clarkbgitea's statsd has restarted22:37
clarkboh zuul-db ran and finished not zuul-lb. That is next so zuul's statsd should update shortly22:37
clarkbhttps://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-5m&to=now&timezone=utc shows a brief blip in stats but we are getting data again so I'm happy with that for now22:38
clarkband checking /etc/hosts on gitea-lb02 and zuul-lb02 it looks untouched as expected so thats great. Thank you tonyb for that update. I think that should make our CI jobs a lot more reliable when interacting with docker hub22:39
clarkbhttps://grafana.opendev.org/d/39b50608de/zuul-load-balancer?orgId=1&from=now-5m&to=now&timezone=utc zuul-lb stats lgtm too. Short blip and then back to normal22:39
clarkband as an exercise of infra-prod jobs running in parallel in the deploy pipeline this is looking great22:40
clarkbthe gitea job failed because gitea09 returned a 500 error trying to check if the gerrit user is present22:43
clarkbloading /opendev/system-config also produces a 500 error for me22:44
clarkb[E] GetUserByName: Error 1040 (08004): Too many connections22:45
clarkbI think that is the database complaining about too many connections but I'm still trying to run it down22:45
clarkbyes mariadb log confirms it is closing connections prior to auth completing due to too many connections22:46
clarkbit does look like gitea has a lot of connections open22:48
clarkbwe have local config to bump the connection limit up to 20022:49
clarkbcurrently only gitea09 seems to be in this state. I'm going to manually shut it and its db down then start it up again so that gerrit replication doesn't fall too far behind22:50
clarkbbut then I'll follow that up with a change to increase the connection limit22:50
fungiyeah, the deploy run lgtm22:51
fungii'll keep an eye out for any more system-config job failures that look like dockerhub rate limit issues22:53
fungihopefully now there will be fewer22:53
clarkbgitea09 immediately reentered the same state22:55
clarkblooking at access logs it looks like things may be hitting :3000 directly now22:55
clarkbso are bypassing the apache filters22:55
clarkbthough I'm not sure if the filters would've been effective for the set of requests22:55
clarkbI half suspect that we're getting a mariadb connection per request22:56
fungihuh22:56
clarkbnot sure what the best option is here. Can shut gitea09 services down. Then maybe bump mariadb connection limits and block port 3000 direct access?22:56
fungithis is random clients hitting 3000/tcp?22:57
clarkbwhois says it is alibaba cloud ips but yes22:57
fungibut yeah, we could definitely just limit it to listening on the loopbck22:57
fungiloopback22:57
clarkband they don't seem to be taking the 500 error as a clue to go away22:58
clarkbit is almost certainly an AI crawler bot as it appears to be going file by file and commit by commit through everything22:58
clarkbI've just double checked and haproxy is using the apache ports so we can safely block :3000 from the world22:59
clarkblets start there before we worry about mariadb connection limits23:00
fungiagreed23:00
fungithe only reason we have to leave 3000 open is for bypassing/ruling out apache issues, but we could also limit access to it with iptables and allow haproxy to reach it if we need that for some reason23:01
clarkbwe also use port 3000 for management but I think we do that from localhost so this should actually improve things for us as we get direct access for management23:01
clarkband then everyone else has to go through the proxy23:01
fungioh, right, authenticated admin access, but yeah ssh port forward wfm23:02
clarkbthe LE certs specify a port of :3000 for the host specific name23:02
clarkbfungi: in this case its ansible doing management stuff and it just runs against localhost via ansible which is kinda like a port forward23:02
fungioh, that too23:02
clarkbI don't think the :3000 in the certs is a big deal though as my browser doesn't complain when I use :308123:02
clarkbbut I'm not sure23:02
fungii don't think browsers care, no23:02
ianw(just confirming that ANSIBLE_LOG_PATH=file ANSIBLE_STDOUT_CALLBACK=... means that the log file gets the output of the selected stdout callback.  or to say that another way, the log file captures the ansible stdout.  it's not like we can have dense output from the command-line, but have it logging in the background debug level info)23:03
fungii use le issued https certs for smtp, imap, pop3 and irc on some servers without any trouble too, for that matter23:03
opendevreviewClark Boylan proposed opendev/system-config master: Drop public port 3000 access for Gitea  https://review.opendev.org/c/opendev/system-config/+/94408123:04
clarkbI'm thinking maybe we manually apply ^ to gitea09 and then if there are no problems with that by tomorrow morning land 944081?23:05
fungiyeah, that's what i was just looking at as the easiest option, no need to restart anything23:06
clarkbiptables -I openstack-INPUT -p tcp --dport 3000 -j DROP ?23:07
clarkband ip6tables23:08
clarkbneed to use -I to get it ahead of our accept rule23:08
clarkb-A would put it behind and it wouldn't take effect23:08
clarkbI'm going to do that23:10
fungiyeah23:10
clarkbthe access log went quiet and load is dropping. I still don't get a useful response from the service yet23:12
clarkbI may try restarting things again if that persists23:12
clarkbI think my rule is actually problematic I think you can't hit :3000 via localhost anymore either with that rule?23:14
clarkbya source is 0.0.0.0/023:14
clarkbit needs to go in rule slot 2 or 3 depending on whether or not iptables 0 indexes23:15
clarkbgive me a minute and I'll delete that rule and apply it after the localhost accept rules 23:15
clarkbthat seems to be happier now23:18
clarkbbut system-config also appears to be behind as anticipated23:19
clarkbI'm going to trigger gerrit replication to gitea09 now23:19
clarkbfungi: can you double check the iptables rules on gitea09 look correct to you?23:20
clarkbI ended up doing -I openstack-INPUT 5 that rule from above then -D openstack-INPUT 1 with both iptables and ip6tables23:20
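The ordering problem worked through above comes down to iptables evaluating a chain top to bottom and stopping at the first matching rule. A simplified model follows; the three rules and the packet fields are assumptions for illustration, not the real openstack-INPUT chain, which has more rules (hence position 5 in the actual command).

```python
def verdict(chain, packet):
    """Return the target of the first matching rule, as iptables does."""
    for matches, target in chain:
        if matches(packet):
            return target
    return "POLICY"  # nothing matched; the chain policy would apply


# Simplified rules as (match function, target) pairs.
accept_loopback = (lambda p: p["iface"] == "lo", "ACCEPT")
accept_3000 = (lambda p: p["dport"] == 3000, "ACCEPT")
drop_3000 = (lambda p: p["dport"] == 3000, "DROP")

local = {"iface": "lo", "dport": 3000}     # ansible managing gitea locally
remote = {"iface": "eth0", "dport": 3000}  # crawler hitting gitea directly

appended = [accept_loopback, accept_3000, drop_3000]  # like -A
at_head = [drop_3000, accept_loopback, accept_3000]   # like -I with no index
after_lo = [accept_loopback, drop_3000, accept_3000]  # like -I with an index
```

In this model `-A` never takes effect for remote traffic, inserting at the head also blocks localhost, and inserting just after the loopback accept rule gives the intended result.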
fungilooking23:22
clarkbalibaba's abuse reporting portal doesn't seem to have an option for "someone is being bad but maybe not strictly illegal"23:24
clarkbreplication is in progress against gitea0923:24
fungiyeah, that looks fine, though note that the actual desired state from 944081 is going to be deleting the existing port 3000 allow rules rather than a separate block rule23:24
clarkbfungi: yup I guess I could've just -D'd that specific rule23:25
clarkbbut I think this is fine we should update the config and then it will upate iptables in place and the new setup should be roughly equivalent?23:25
fungianyway, my guess is that a crawler stumbled across a http://gitea09...:3000 url mentioned in our irc channel logs and just kept spidering from there23:25
clarkbor some public cert registry23:26
clarkba followup to this probably wants to edit our LE certs to drop the :300023:26
fungiyes, i think this is a sufficient test23:26
fungiand agreed, the port specification is unnecessary for the certs, i expect23:26
clarkbthese crawlers are such a nuisance though. Look at robots.txt respect the crawl delay. If you get massive quantities of 500 errors maybe you should look at what you are doing etc23:27
clarkbreplication is almost half done23:29
clarkbfungi: 944081 is also a good sanity check that blocking external port :3000 doesn't break our automation. I don't think it will but good to double check23:30
fungiyep23:31
fungias for irc logs being a likely entrypoint for the crawlers, we frequently test gitea09 and so mention it a lot more often. doesn't seem like they're hitting any of the other backends23:32
clarkband for :3081 I suspect apache would meltdown before gitea did. Not ideal but at least our automation would keep working23:33
clarkbthe go webserver built into gitea is just too good at accepting all the connections23:33
clarkbmore than halfway done now. About 1k tasks remaining in the gerrit queue23:34
clarkbif anyone wants to look at the historical logs for this /var/gitea/logs/access.log on gitea0923:35
clarkb'HTTP/1.1" 500' is a search string that should work23:35
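A small sketch of using that search string to summarise the offenders. It assumes the first whitespace-separated field of each access log line is the client address, which should be checked against the real file, and the sample lines below are invented (they use documentation-range addresses).

```python
from collections import Counter

NEEDLE = 'HTTP/1.1" 500'  # the search string suggested above


def count_500s_by_client(lines):
    """Count lines containing NEEDLE, grouped by their first field."""
    counts = Counter()
    for line in lines:
        if NEEDLE in line:
            client = line.split(" ", 1)[0]  # assumed to be the client address
            counts[client] += 1
    return counts


# Invented sample lines, not taken from gitea09.
sample = [
    '203.0.113.7 - - [11/Mar/2025:22:45:01 +0000] "GET /a HTTP/1.1" 500 0',
    '203.0.113.7 - - [11/Mar/2025:22:45:02 +0000] "GET /b HTTP/1.1" 500 0',
    '198.51.100.2 - - [11/Mar/2025:22:45:03 +0000] "GET /c HTTP/1.1" 200 9',
]
counts = count_500s_by_client(sample)
```

Against the real log, the same function could be fed an open file handle for /var/gitea/logs/access.log, and `counts.most_common(10)` would list the busiest clients.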
clarkbsystem-config master is up to date on gitea09 now23:38
clarkbbut still about 500 tasks remaining23:38
clarkbit is replicating one nova change meta ref. Everything else has replicated23:42
clarkband now that is done too23:42
clarkbso ya I think once we are satisfied blocking port 3000 isn't a problem we apply that globally and move on. Gitea09 should be good for now23:43
fungiany idea if the user agent(s) for the offenders were already in our filter list?23:45
clarkbno but that is a good thing to check doing so now23:47
clarkbfungi: it is in the filter list23:49
clarkbso we've probably discovered this one before and blocked it at the apache level then they discovered a backdoor23:49
clarkbI'm being asked questions about dinner now. I think we're stable and can followup in the morning with the port block and anything else we decide we need to do23:51
opendevreviewIan Wienand proposed opendev/system-config master: install-root-key : run on localhost  https://review.opendev.org/c/opendev/system-config/+/94408423:58
ianw^ one for tomorrow :)  23:59
