*** DSpider has quit IRC | 00:15 | |
*** ysandeep|away is now known as ysandeep | 00:26 | |
*** hamalq has quit IRC | 00:43 | |
clarkb | fungi: -v/opt/project-config/gerrit/projects.yaml:/home/gerrit2/projects.yaml in /usr/local/bin/manage-projects is the problem with the command I ran. We're ignoring the content in the gerrit2 homedir | 01:13 |
clarkb | which is confusing because that is where we look according to defaults/config but we mount /opt/project-config to that location to solve that | 01:14 |
clarkb | anyway we can either change the mount or change project-config contents and try again, but tomorrow | 01:14 |
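The bind mount clarkb describes would sit in a wrapper roughly like the sketch below; only the -v mapping is taken from the discussion, while the image name and surrounding flags are hypothetical.

```shell
#!/bin/bash
# Hypothetical shape of /usr/local/bin/manage-projects; only the -v mapping
# comes from the log above, the image name and other flags are illustrative.
exec docker run --rm \
  -v /opt/project-config/gerrit/projects.yaml:/home/gerrit2/projects.yaml \
  example/manage-projects "$@"
# The bind mount means any projects.yaml edited under /home/gerrit2 on the
# host is shadowed inside the container, so either the mount source or the
# copy under /opt/project-config has to change before re-running.
```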
*** ykarel|away has joined #opendev | 01:29 | |
*** ysandeep is now known as ysandeep|afk | 01:29 | |
*** ykarel|away is now known as ykarel | 01:31 | |
*** fressi has joined #opendev | 02:16 | |
openstackgerrit | sebastian marcet proposed opendev/system-config master: OpenstackId v3.0.16 https://review.opendev.org/758322 | 03:24 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: fetch-sphinx-tarball: explain what is happening https://review.opendev.org/758323 | 03:49 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: fetch-sphinx-tarball: don't run merge-output-to-logs https://review.opendev.org/758324 | 03:49 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Revert "Revert "Refactor fetch-sphinx-tarball to be executor safe"" https://review.opendev.org/758325 | 03:49 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Revert "Revert "Refactor fetch-sphinx-tarball to be executor safe"" https://review.opendev.org/758325 | 03:56 |
*** ykarel_ has joined #opendev | 04:12 | |
*** ykarel has quit IRC | 04:14 | |
openstackgerrit | Merged opendev/system-config master: tarballs: remove incorrect redirects https://review.opendev.org/758259 | 04:32 |
*** ykarel has joined #opendev | 04:49 | |
*** auristor has quit IRC | 04:50 | |
*** ykarel_ has quit IRC | 04:51 | |
*** auristor has joined #opendev | 04:53 | |
*** ykarel has quit IRC | 05:13 | |
openstackgerrit | Ian Wienand proposed opendev/base-jobs master: Run merge-output-to-logs on the executor https://review.opendev.org/758341 | 05:29 |
*** sboyron has joined #opendev | 05:31 | |
openstackgerrit | Merged opendev/system-config master: Add four more gitea ddos UA strings https://review.opendev.org/758219 | 05:34 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 05:39 |
*** fressi has quit IRC | 05:53 | |
*** jaicaa has quit IRC | 06:00 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 06:01 |
*** jaicaa has joined #opendev | 06:02 | |
*** mkalcok has joined #opendev | 06:27 | |
*** ykarel has joined #opendev | 06:29 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 06:34 |
*** slaweq has joined #opendev | 06:35 | |
*** eolivare has joined #opendev | 06:36 | |
*** roman_g has joined #opendev | 06:38 | |
*** ysandeep|afk is now known as ysandeep | 06:38 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 06:44 |
*** ralonsoh has joined #opendev | 06:53 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 06:53 |
*** andrewbonney has joined #opendev | 07:01 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 07:05 |
*** hashar has joined #opendev | 07:05 | |
*** ykarel is now known as ykarel|afk | 07:10 | |
openstackgerrit | Logan V proposed openstack/project-config master: Revert "Disable limestone provider" https://review.opendev.org/758354 | 07:10 |
openstackgerrit | Logan V proposed openstack/project-config master: Revert "Disable limestone provider" https://review.opendev.org/758354 | 07:11 |
*** ykarel|afk has quit IRC | 07:11 | |
*** rpittau|afk is now known as rpittau | 07:22 | |
*** fressi has joined #opendev | 07:31 | |
*** cloudnull has quit IRC | 07:33 | |
*** cloudnull has joined #opendev | 07:34 | |
frickler | mnaser: RIPE atlas confirms that opendev.org is not reachable from AS3320 (German incumbent ISP) via IPv6 https://atlas.ripe.net/measurements/27671479/#!probes | 07:42 |
*** tosky has joined #opendev | 07:42 | |
frickler | mnaser: I'll try to dig further, but this may be related to you only announcing the /48 for that net and not the covering /32 | 07:43 |
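A rough way to reproduce that check from a vantage point inside the affected network; the hostname comes from the log, the tooling is generic.

```shell
# Resolve the AAAA record, then probe it over IPv6; timeouts here while IPv4
# still works would match the symptom frickler describes.
dig +short AAAA opendev.org
ping -6 -c 3 opendev.org
traceroute -6 opendev.org
```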
*** dmellado has quit IRC | 07:43 | |
*** dmellado has joined #opendev | 07:47 | |
openstackgerrit | Merged openstack/project-config master: Revert "Disable limestone provider" https://review.opendev.org/758354 | 07:48 |
*** ysandeep is now known as ysandeep|lunch | 08:03 | |
AJaeger | infra-root, infra-prod-service-nodepool failed for this ^ https://zuul.opendev.org/t/openstack/build/a18272805d054fddb30d020bde35031c . Could you check, please? | 08:11 |
frickler | AJaeger: looking | 08:14 |
*** lourot has quit IRC | 08:15 | |
*** lourot has joined #opendev | 08:16 | |
frickler | infra-root: that log has nb03 unreachable, when I try to ssh to it, I get "ssh_exchange_identification: Connection closed by remote host". might be related to the rackspace ticket mentioned earlier. | 08:17 |
AJaeger | thanks, frickler | 08:19 |
AJaeger | frickler: the rackspace ticket mentioned different nodes - but that happens sometimes with rackspace AFAIR | 08:19 |
frickler | actually nb03 isn't on rackspace, but on packethost according to the IP. but I can't find that instance there. will need to dig deeper later | 08:23 |
frickler | also nb03.opendev.org seems to be missing in cacti, that would've possibly allowed us to get a hint as to when the issue started | 08:46 |
frickler | finally found the node, it's on linaro-us. console log shows "INFO: task jbd2/dm-3-8:17697 blocked for more than 120 seconds." so some kind of disk issue, will try to reboot via the api now | 09:13 |
frickler | o.k., that seems to have worked, I can log in again | 09:15 |
frickler | #status log rebooted nb03.opendev.org via openstack API after it seemed to have gotten stuck due to disk IO issues | 09:16 |
openstackstatus | frickler: finished logging | 09:16 |
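The recovery frickler describes maps onto standard OpenStack client commands, roughly as below; the --os-cloud name is an assumption.

```shell
# Check the console for hung-task messages, then hard reboot through the API.
openstack --os-cloud linaro-us console log show nb03.opendev.org | tail -n 50
openstack --os-cloud linaro-us server reboot --hard nb03.opendev.org
```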
*** tkajinam is now known as tkajinam|away | 09:17 | |
*** tkajinam|away is now known as tkajinam | 09:17 | |
*** ysandeep|lunch is now known as ysandeep | 09:24 | |
*** hashar has quit IRC | 09:46 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: WIP: Create upload-logs-with-failover role https://review.opendev.org/758380 | 09:49 |
ianw | frickler: thanks, nb03 is the arm builder | 09:49 |
*** DSpider has joined #opendev | 09:56 | |
*** ysandeep is now known as ysandeep|afk | 11:07 | |
*** sboyron has quit IRC | 11:15 | |
*** sboyron has joined #opendev | 11:16 | |
*** elod is now known as elod_afk | 11:26 | |
*** hashar has joined #opendev | 11:45 | |
*** ysandeep|afk is now known as ysandeep | 12:35 | |
*** elod_afk is now known as elod | 12:39 | |
*** ykarel has joined #opendev | 13:01 | |
fungi | clarkb: ianw: the additional ua strings in 758219 either don't seem to have taken effect or their addition has no clear impact on the established connections graph: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66611&rra_id=all | 13:18 |
fungi | in theory they were deployed by 06:00 | 13:18 |
mnaser | frickler: we announce a /48 in each region because we have one big /32 .. don't think that should be a wild problem | 13:37 |
fungi | it would only be a problem if the isp in question has bgp filters excluding longer prefixes | 13:39 |
fungi | (or if whoever they get their bgp feeds from does) | 13:39 |
fungi | granted, it's not unheard of, and so not uncommon for folks to also announce their aggregates alongside the longer prefixes to help in such situations | 13:40 |
fungi | worst case the packets make it most of the way to the wrong facility and get redirected via a less optimal path | 13:41 |
fungi | but really they only have to get as far as the first hop which has a more complete bgp table and then they'll start getting matched against the longer prefix | 13:42 |
fungi | it's worth noting that there are recommendations floating around to filter longer prefixes from ranges which are normally used to assign shorter ones | 13:46 |
fungi | https://www.space.net/~gert/RIPE/ipv6-filters.html | 13:46 |
fungi | on the expectation that you'd request a new allocation from your rir for each multi-homed facility | 13:48 |
openstackgerrit | Aurelien Lourot proposed openstack/project-config master: Mirror charm-neutron-api-plugin-ironic to GitHub https://review.opendev.org/758429 | 13:49 |
lourot | ^ sorry I forgot that one in yesterday's review | 13:49 |
*** sshnaidm has quit IRC | 13:54 | |
*** fressi has quit IRC | 13:55 | |
fungi | mnaser: so in that example, i think following the "ipv6 prefix-list ipv6-ebgp-strict permit 2600::/12 ge 19 le 32" recommendation would cause your 2604:e100:3::/48 announcements to be dropped | 13:56 |
mnaser | honestly i don't know why they would be dropping /48 announcements -- it's not like ipv6 tables are gigantic :) | 13:59 |
*** sshnaidm has joined #opendev | 14:00 | |
fungi | i concur, my opinion is that isps should simply upgrade their routers so they have enough memory to carry a full bgp table, but then again i've been out of the business for nearly a decade | 14:01 |
fungi | it's probably seen as insurance against someone with a /28 announcing all 2^20 of their possible /48 prefixes | 14:03 |
clarkb | fungi: check /var/gitea/logs/access.log to see if they are making it through | 14:11 |
clarkb | note the connections have to establish long enough for us to 403 them and that ~2k connections may be steady state for that | 14:12 |
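A hedged version of the check clarkb suggests; the log path is from the discussion, while the user agent substring and the assumption of a combined-style log format (status code in the ninth field) are illustrative.

```shell
# How many requests is a given crawler UA still making, and how many of them
# are being answered with 403?  Substitute a real blocked UA string.
grep -c 'ExampleBadBot' /var/gitea/logs/access.log
grep 'ExampleBadBot' /var/gitea/logs/access.log | awk '$9 == 403' | wc -l
```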
*** sgw has left #opendev | 14:16 | |
*** tkajinam has quit IRC | 14:20 | |
*** hashar has quit IRC | 14:25 | |
*** ysandeep is now known as ysandeep|away | 14:43 | |
ttx | docs.openstack.org is down? | 14:47 |
ttx | (from openstack-discuss) | 14:47 |
icey | looks like a lot of the static bits are having trouble, I've got spotty access to tarballs.opendev.org for a while as well | 14:48 |
ttx | probably all the Victoria downloads (kidding) | 14:48 |
icey | (well, that's what _I'm_ doing :-P ) | 14:51 |
AJaeger | any infra-root around to check docs.o.o, please? ^ | 14:52 |
clarkb | [Thu Oct 15 07:11:15 2020] afs: Waiting for busy volume 536870992 () in cell openstack.org is the most recent afs complaint according to dmesg | 14:53 |
*** sgw has joined #opendev | 14:53 | |
bbezak | tarballs.opendev.org is not accessible either | 14:53 |
clarkb | bbezak: they are just different vhosts on the same server and serving out of afs too | 14:54 |
clarkb | trying to sort out if the problem is afs or apache | 14:54 |
clarkb | I think afs. afs01.dfw.openstack.org seems sad. I can't ssh to it. Going to check the console next | 14:55 |
clarkb | (I really wish I understood why afs doesn't fail over like it is supposed to) | 14:55 |
clarkb | (that's like half the reason we run afs) | 14:55 |
clarkb | oh wait trying ssh again succeeds so maybe not afs | 14:56 |
*** iurygregory has quit IRC | 14:57 | |
*** iurygregory has joined #opendev | 14:57 | |
clarkb | afs is navigable on static01 | 14:57 |
clarkb | time to look at apache I guess | 14:58 |
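The afs-or-apache triage clarkb just walked through condenses into a few commands, assuming an OpenAFS client and systemd on the host; the volume path is illustrative.

```shell
# Is the AFS client complaining, and are the fileservers reachable?
dmesg | grep -i afs | tail -n 5
fs checkservers
# Can content actually be read out of /afs?
ls /afs/openstack.org/ >/dev/null && echo "afs readable"
# If AFS looks healthy, suspicion shifts to the web server itself.
systemctl status apache2 --no-pager
```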
fungi | yes, apache seems like it might have gotten sad somehow | 14:59 |
clarkb | error logs don't seem to have anything useful though | 14:59 |
fungi | scoreboard full? | 15:00 |
fungi | just guessing | 15:00 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=68065&rra_id=all implies not (not sure we have the localhost debugging set up on this vhost to check directly) | 15:00 |
corvus | we have no text browsers on static01? :( | 15:01 |
clarkb | should we restart it ? | 15:01 |
fungi | fungi@static01:~$ wget -O- https://security.openstack.org/ | 15:01 |
fungi | Connecting to security.openstack.org (security.openstack.org)|2001:4800:7818:101:be76:4eff:fe04:7c28|:443... | 15:01 |
fungi | and it just sits there | 15:02 |
corvus | strace on 6155 has it reading from a pipe | 15:02 |
corvus | (maybe connection to master proc) | 15:02 |
fungi | cacti graphs don't show much out of the ordinary other than increase in established tcp connections and a drop in tcp opens (both of those seem like symptoms not causes) | 15:03 |
*** auristor has quit IRC | 15:04 | |
fungi | looks like whatever it is may have started around 12:45 | 15:04 |
corvus | the master process is in a select/wait loop | 15:04 |
clarkb | apache was reloaded about 11 hours ago to pick up the fix for tarballs redirects? | 15:04 |
clarkb | but the main process has been running for almost 2 weeks | 15:05 |
fungi | looks like we may have mod_status plumbed but unable to connect to loopback port 80 to get at it | 15:05 |
corvus | i'm out of debugging ideas and think we should restart | 15:05 |
fungi | so yeah, at this point probably a service restart is our only next step | 15:06 |
fungi | i concur | 15:06 |
corvus | (i see no dmesg entries around 12:45) | 15:06 |
clarkb | wfm | 15:06 |
corvus | (nothing interesting in syslog around that time either) | 15:07 |
clarkb | fwiw tail *_error.log shows some attempts at maybe brute-forcing /etc/passwd GETs about 20 minutes prior to 12:45 but that is all that stands out to me there | 15:07 |
clarkb | apache happily logged those as invalid URIs | 15:08 |
fungi | yeah, i saw nothing of note in any logs | 15:08 |
fungi | it's like it just magically deadlocked | 15:08 |
clarkb | who is doing the restart? should I do it? | 15:08 |
corvus | not i | 15:08 |
clarkb | I'll do it now | 15:08 |
fungi | clarkb: if you can, i'm in a meeting right now | 15:09 |
fungi | thanks! | 15:09 |
clarkb | and tarballs.opendev.org loads for me now | 15:09 |
clarkb | ttx: icey bbezak maybe you can confirm it is happier for you now too | 15:10 |
clarkb | where it == thing you noticed was sad | 15:10 |
icey | clarkb: curl just worked for me | 15:10 |
icey | unfortunately (or fortunately) I'm handing off the thing I was doing that caused me to notice to somebody else as it's EoD :) | 15:10 |
*** auristor has joined #opendev | 15:11 | |
clarkb | fungi: do you know where we've configured mod status? grepping for status in system-config/playbooks/roles/static doesn't show it | 15:12 |
fungi | clarkb: /etc/apache2/mods-enabled/status.conf | 15:12 |
fungi | clarkb: wget -O- http://127.0.0.1/server-status | 15:12 |
fungi | mmm, that seems to be getting overridden | 15:13 |
fungi | ahh, no it's working | 15:13 |
clarkb | ah ok I thought we had to manually configure that, didn't realize that ubuntu provided it out of the box | 15:14 |
clarkb | TIL and now I know to do a ssh proxy and check that out for next time | 15:14 |
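The workflow clarkb mentions for next time looks roughly like this; the hostname and local port are assumptions.

```shell
# Forward the server's loopback port 80 to a local port, then read the
# mod_status pages through the tunnel.
ssh -N -L 8080:127.0.0.1:80 static01.opendev.org &
curl -s http://127.0.0.1:8080/server-status          # human-readable scoreboard
curl -s 'http://127.0.0.1:8080/server-status?auto'   # machine-readable form
```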
fungi | unsurprisingly the scoreboard shows a lot of available workers/slots for now | 15:14 |
fungi | it was timing out for me like everything else before the restart | 15:14 |
clarkb | oh ha | 15:15 |
bbezak | yeah, looks ok now, thx | 15:18 |
fungi | wish we knew why apache just seized up | 15:18 |
fungi | that's gonna bug me now | 15:18 |
fungi | looks like whatever started around 12:45 may have been a precursor, because graphs show the server basically stopped serving anything between 14:45 and 15:05 when it was restarted | 15:22 |
openstackgerrit | Merged opendev/system-config master: OpenstackId v3.0.16 https://review.opendev.org/758322 | 15:22 |
fungi | so we probably want to keep a close eye on the scoreboard and access logs for a little while | 15:22 |
*** mlavalle has joined #opendev | 15:22 | |
fungi | the e-mail to openstack-discuss was dated 14:41 utc | 15:23 |
clarkb | I half wonder if we should cross check our gitea UAs against static apache logs too | 15:23 |
clarkb | fungi: I'm not seeing those 4 UAs I added in the gitea access logs anymore. I think that level of connections is what we see when dealing out HTTP 403s to the bad bot(s) | 15:24 |
fungi | oh, entirely possible | 15:24 |
*** mkalcok has quit IRC | 15:24 | |
clarkb | basically we're doing a bunch of cheap connections to send 403s back to the bot rather than a bunch of expensive connections trying to serve all those git commits and files to the bot | 15:25 |
fungi | #status log restarted apache on static.opendev.org (serving most static content and documentation sites) at 15:09 utc to recover from an unexplained hung process causing site content not to be served | 15:26 |
openstackstatus | fungi: finished logging | 15:26 |
ttx | LGTM! thanks | 15:26 |
fungi | outbound traffic levels for static.o.o have picked way up after the restart | 15:27 |
fungi | around 80mbps at the moment, though nothing out of the ordinary for our weekly peaks | 15:28 |
fungi | scoreboard still looks okay for now | 15:29 |
*** marios has joined #opendev | 15:34 | |
*** sgw has left #opendev | 15:39 | |
*** sgw has joined #opendev | 15:42 | |
*** fressi has joined #opendev | 15:43 | |
fungi | the scoreboard is starting to fill up now, though not looking terrible yet | 15:44 |
*** fressi has quit IRC | 15:44 | |
*** hashar has joined #opendev | 15:44 | |
fungi | and now it's not serving a response to me | 15:44 |
fungi | oh, there it went, just took a moment | 15:45 |
fungi | fwiw, there's a bunch of ip addresses retrieving many different files from tarballs.o.o/osf/openstackid/ | 15:45 |
fungi | almost all the slots are in a "sending reply" state now | 15:46 |
*** rpittau is now known as rpittau|afk | 15:52 | |
fungi | i just got a connection timeout trying to get the server status | 15:58 |
fungi | requests i'm seeing tailing the tarballs.opendev.org access log do look like the ones we'd expect from weibo :/ | 16:00 |
fungi | oh, i finally managed to get server status to reply to me again | 16:01 |
fungi | yeah, it's looking like weibo bots were each fetching copies of all the osf/openstackid tarballs and some are now moving on to starting to get the osf/groups tarballs | 16:03 |
*** eolivare has quit IRC | 16:10 | |
*** ysandeep|away is now known as ysandeep | 16:27 | |
*** marios is now known as marios|out | 16:32 | |
openstackgerrit | Noam Angel proposed openstack/diskimage-builder master: kill chroot processes before "dib-block-device umount" https://review.opendev.org/758465 | 16:35 |
openstackgerrit | Noam Angel proposed openstack/diskimage-builder master: kill chroot processes before "dib-block-device umount" https://review.opendev.org/758465 | 16:36 |
*** marios|out has quit IRC | 16:39 | |
*** lpetrut has joined #opendev | 16:44 | |
*** hashar is now known as hasharTurns42 | 16:51 | |
fungi | activity on the server seems to have calmed back down now, and the scoreboard is mostly open/waiting | 16:52 |
*** lpetrut has quit IRC | 16:59 | |
*** hamalq has joined #opendev | 17:02 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives https://review.opendev.org/758469 | 17:07 |
clarkb | fungi: ^ something like that | 17:08 |
clarkb | I've updated sjc1's mirror by hand to match ^ if people want to check it | 17:13 |
fungi | i can confirm, my personal sites are all only using require directives, no order or satisfy anywhere | 17:14 |
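For reference, the conversion 758469 is making, sketched as a before/after; the exact directives in the mirror vhosts may differ.

```shell
# Apache 2.2 style access control (needs mod_access_compat on 2.4):
#   Order allow,deny
#   Allow from all
#   Satisfy Any
# Apache 2.4 equivalent using the authz "Require" primitives:
#   Require all granted
# After editing a vhost, sanity check and reload:
sudo apachectl configtest && sudo systemctl reload apache2
```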
*** ykarel is now known as ykarel|away | 17:19 | |
*** andrewbonney has quit IRC | 17:20 | |
*** ysandeep is now known as ysandeep|away | 17:24 | |
*** roman_g has quit IRC | 17:37 | |
clarkb | related does anyone know why we have https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/gerrit.vhost.j2#L72-L81 ? | 17:40 |
*** Vadmacs has joined #opendev | 18:20 | |
*** ralonsoh has quit IRC | 18:22 | |
dmsimard | pardon me for losing track but do git.o.o and opendev.org end up at the same place ? seeing clones on git.o.o hang but not opendev.org | 18:38 |
clarkb | dmsimard: git.o.o should be a redirect to opendev.org | 18:39 |
clarkb | opendev.org does not support git:// so if you're doing git://git.openstack.org that will break | 18:39 |
clarkb | need to be http or https | 18:39 |
dmsimard | over https yeah | 18:39 |
clarkb | ah git.o.o is also hosted on static which has been sad today | 18:40 |
clarkb | I think its been happier recently though, is this an ongoing problem? or was it only happening between ~12:45UTC and 15:10UTC? | 18:40 |
dmsimard | I'm reproducing it right now | 18:40 |
dmsimard | i.e, git clone https://git.openstack.org/openstack/ospurge | 18:41 |
dmsimard | I can update our URLs to use opendev.org but I thought I'd point it out | 18:41 |
clarkb | ya this seems to be the same overwhelmed server state | 18:41 |
clarkb | I wonder if it's actually the git vhost that is eating all the connections on that server | 18:41 |
clarkb | you really should be using opendev.org | 18:41 |
dmsimard | sure thing | 18:42 |
dmsimard | I don't know who/what else might be using git.o.o out there though | 18:42 |
clarkb | hrm it says we have idle workers in apache though | 18:42 |
*** ykarel|away has quit IRC | 18:43 | |
clarkb | low system load, plenty of memory, and we have free connections, what's up | 18:43 |
dmsimard | are you seeing the issue as well ? | 18:43 |
clarkb | ya | 18:44 |
clarkb | I mean it works but it's laggy | 18:44 |
clarkb | like apache doesn't have a free connection to give me when I try to git clone from it | 18:44 |
clarkb | (in reality it's a free connection to serve the redirect) | 18:44 |
clarkb | but it's still on the order of a few seconds | 18:45 |
clarkb | noticeable but not end of the world | 18:45 |
clarkb | I wonder if it is iowait related to afs reads somehow? | 18:46 |
clarkb | would that bog down the rest of apache when all it needs to do is serve a redirect? | 18:46 |
dmsimard | good question | 18:46 |
clarkb | (side note is this the first release where tarballs have all been in afs?) | 18:46 |
dmsimard | sometimes it completes in a few seconds, others not http://paste.openstack.org/show/799096/ | 18:48 |
clarkb | fungi: when I can get server-status it shows a happy server then other times it times out | 18:48 |
fungi | yeah, same here | 18:48 |
clarkb | we could try tuning the mpm configs to allow many more connections (the server seems super idle) | 18:49 |
clarkb | and maybe we're just getting syn flooded? | 18:49 |
fungi | cacti graphs suggest the problem has spiked up again as of ~18:15 | 18:50 |
clarkb | top reports low wai which I think rules out the iowait theory? | 18:50 |
fungi | yeah | 18:51 |
clarkb | should we try adopting a mpm tuning config more similar to etherpad's? | 18:51 |
fungi | also the server status extended report indicates most of the requests are still tarballs for osf/openstackid and osf/groups | 18:51 |
clarkb | fungi: is that smarcet ddos'ing us? | 18:52 |
fungi | if he's got hundreds of random ip addresses in china who all want to download old versions of openstackid, then maybe | 18:52 |
clarkb | I can copy over the mpm config from etherpad and restart apache on static if we want to give that a shot | 18:52 |
clarkb | (it should dramatically increase the number of allowed connections) | 18:53 |
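A sketch of that kind of mpm_event bump; the numbers here are illustrative and would need sizing against the server's RAM, they are not the etherpad values.

```shell
# Raise the worker limits for mpm_event; MaxRequestWorkers must not exceed
# ServerLimit * ThreadsPerChild (32 * 64 = 2048 here).
cat <<'EOF' | sudo tee /etc/apache2/mods-available/mpm_event.conf
<IfModule mpm_event_module>
    ServerLimit              32
    ThreadsPerChild          64
    MaxRequestWorkers      2048
    MaxConnectionsPerChild    0
</IfModule>
EOF
# ServerLimit changes need a full restart rather than a graceful reload.
sudo apachectl configtest && sudo systemctl restart apache2
```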
fungi | well, we don't seem to be at risk of filling them up at the moment | 18:53 |
clarkb | it seems like maybe we are though? otherwise why does apache not respond when I try to connect sometimes? | 18:54 |
dmsimard | conntrack ? | 18:54 |
clarkb | (I guess that is something we could rule in or out by increasing the limits) | 18:54 |
clarkb | I don't think we run any special conntrack rules here | 18:54 |
clarkb | just basic iptables port and source ip based rules | 18:54 |
fungi | the open tcp connections reported by cacti is hovering around the same level as the total connections for our apache workers | 18:55 |
clarkb | ya that is why I wondered if it could be a syn flood. | 18:55 |
clarkb | that may not show up on our graphs as readily? | 18:55 |
fungi | and the scoreboard is reporting something like 100 idle connections as of a minute ago | 18:56 |
*** ozzzo has joined #opendev | 18:56 | |
fungi | but yeah, can't hurt to try *if* we have the available cpu/ram for it | 18:56 |
clarkb | 14.8 requests/sec 158.4 kB/request | 18:56 |
fungi | looking at the uas for all these tarball requests the workers seem to be spending time on, they look like the same set we blocked in gitea's apache | 18:57 |
clarkb | oh hahahahah | 18:57 |
clarkb | I mean I have to laugh otherwise I'll be sad | 18:58 |
clarkb | ok instead of changing our mpm tuning should we add the same set of UA blocking rules? | 18:58 |
fungi | that might be more useful | 18:58 |
clarkb | ++ do you want to put that change together? I'm missing lunch right now but can write it if you're in a bad spot for it | 18:59 |
fungi | connections have jumped way up too, we have one worker with 25 active connections and another 25 closing | 18:59 |
fungi | yeah, should be able to give it a shot, though i guess we need to include the same deny rules into every vhost? i guess i could start with the tarballs | 19:00 |
clarkb | ya | 19:02 |
clarkb | looks like it uses mod headers and mod rewrite | 19:02 |
fungi | we have both enabled already, looks like | 19:03 |
fungi | i've manually inserted those rules into the tarballs.opendev.org vhost to see if they help | 19:05 |
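An approximation of the kind of rules that were inserted; the user agent patterns below are placeholders, not the real blocked strings.

```shell
# Directives of this shape go inside the <VirtualHost> block for
# tarballs.opendev.org; any request matching a listed UA gets a 403.
#   RewriteEngine On
#   RewriteCond %{HTTP_USER_AGENT} "ExampleCrawlerA" [OR]
#   RewriteCond %{HTTP_USER_AGENT} "ExampleCrawlerB"
#   RewriteRule .* - [F,L]
# Validate and apply:
sudo apachectl configtest && sudo systemctl reload apache2
```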
*** roman_g has joined #opendev | 19:06 | |
fungi | churn in that vhost's access log has died way down | 19:06 |
fungi | connection counts are dropping in the scoreboard too | 19:06 |
clarkb | ya I can refresh it quickly | 19:08 |
clarkb | definitely looking healthier | 19:08 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add nim roles and job https://review.opendev.org/747865 | 19:08 |
fungi | assuming this has a continued impact, i'll see if we can centralize the list of patterns so it can be included into multiple servers without carrying copies around | 19:08 |
clarkb | that would be excellent | 19:09 |
fungi | probably just install it as a dedicated config file defining a macro, and then call that macro in each vhost | 19:09 |
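A sketch of that macro approach, assuming mod_macro; the file path and macro name are hypothetical.

```shell
# Define the filter once, e.g. in /etc/apache2/conf-enabled/ua-filter.conf:
#   <Macro UserAgentFilter>
#     RewriteEngine On
#     RewriteCond %{HTTP_USER_AGENT} "ExampleCrawlerA" [OR]
#     RewriteCond %{HTTP_USER_AGENT} "ExampleCrawlerB"
#     RewriteRule .* - [F]
#   </Macro>
# ...then each vhost that wants it just adds:
#   Use UserAgentFilter
sudo a2enmod macro rewrite && sudo systemctl restart apache2
```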
fungi | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=68065&rra_id=all has just updated and shows a dramatic drop | 19:10 |
fungi | i can browse the tarballs site just fine still too | 19:11 |
fungi | but the scoreboard is now showing almost no tarballs vhost requests in progress, whereas before those were the vast majority | 19:12 |
fungi | er, not the scoreboard but the extended status data | 19:12 |
fungi | though the scoreboard is settling to around 1/8 of the earlier connection count | 19:13 |
fungi | er, more like 1/5 to 1/8 | 19:13 |
fungi | not quite an order of magnitude drop, but substantial nonetheless | 19:13 |
clarkb | cool I'm really going to take a break now for lunch and bike ride and all that, back later today | 19:16 |
fungi | i'll keep an eye on this for a little longer and then look at a possible patch | 19:18 |
*** hasharTurns42 is now known as hashar | 19:22 | |
*** hashar has quit IRC | 19:26 | |
*** roman_g has quit IRC | 19:39 | |
*** roman_g has joined #opendev | 19:40 | |
fungi | okay, things are looking a good sight better with this in place, so i'll write up a review | 19:47 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495 | 20:01 |
fungi | so that's step 1 | 20:02 |
fungi | i'm going to wip it while i make sure doing it that way will work for the tarballs site | 20:02 |
fungi | then i'll look at splitting it into a dedicated role and deduplicating the gitea use | 20:03 |
fungi | seems to be working so far | 20:07 |
fungi | ianw: when you're around, your input would be most appreciated, as you have the most experience with this so far | 20:08 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495 | 20:27 |
fungi | okay, there it is done as a reusable ansible role added to the service-static playbook | 20:28 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496 | 20:31 |
fungi | and that ^ will remove the copy in the gitea role | 20:31 |
*** Vadmacs has quit IRC | 20:36 | |
ianw | fungi: hey, catching up ... | 20:38 |
dmsimard | wow, https://github.com/mythsman/weiboCrawler/blob/master/opener.py is all kinds of messed up o_O | 20:45 |
ianw | fungi: idea lgtm; but i do think the handler will cause issues mentioned in comment | 20:45 |
ianw | dmsimard: as will be the bill someone eventually gets ... | 20:54 |
fungi | ianw: oh, good catch | 20:54 |
* clarkb keeps getting distracted but is actually doing the bike ride next | 20:54 | |
fungi | dmsimard: maybe you know... what happens if you have multiple roles installing the same handler (or handlers with the same name)? | 20:56 |
dmsimard | I've never tried it, good question -- my guess would be that there's a matter of precedence and it would use the one from the current role before reaching out to other roles | 20:57 |
fungi | i suppose it can't hurt to try in that case | 20:58 |
dmsimard | you should be able to tell from the task output which role the handler came from -- it's prefixed by the role name | 20:58 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495 | 21:00 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496 | 21:00 |
fungi | guess we'll find out! ^ | 21:00 |
dmsimard | science \o/ | 21:00 |
fungi | science isn't science without splosions | 21:01 |
ianw | "Use unique handler names. If you trigger more than one handler with the same name, the first one(s) get overwritten. Only the last one defined will run." | 21:07 |
ianw | that isn't really clear what first and last means | 21:07 |
fungi | but in this case they have the same actions, so probably fine? | 21:08 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 21:08 |
fungi | otherwise maybe we want to extract that handler into its own apache-config-handler role or something along those lines, and just include it where it's needed? | 21:09 |
ianw | i think KISS and leave it where it is, unless it doesn't work :) | 21:12 |
fungi | in the past little while, apache on static.o.o has scaled back from 6 to 3 process slots | 21:12 |
fungi | connections graph in apache is looking stable too | 21:13 |
fungi | ooh, only 2 now | 21:14 |
fungi | most of the traffic is docs and legacy git redirects now | 21:15 |
fungi | things seem stable so i'm going to disappear for a couple hours but will check back in later | 21:18 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 21:21 |
*** slaweq has quit IRC | 21:31 | |
*** roman_g has quit IRC | 21:32 | |
*** sboyron has quit IRC | 21:47 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 22:00 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 22:16 |
*** slaweq has joined #opendev | 22:22 | |
*** hamalq has quit IRC | 22:32 | |
*** hamalq has joined #opendev | 22:33 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: WIP: boot test of containerfile image https://review.opendev.org/722148 | 22:38 |
*** slaweq has quit IRC | 22:38 | |
*** qchris has quit IRC | 22:41 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325 | 22:53 |
clarkb | I'm back from my bike ride, is there anything I should review promptly? | 22:54 |
*** qchris has joined #opendev | 22:55 | |
*** tkajinam has joined #opendev | 22:59 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives https://review.opendev.org/758469 | 22:59 |
ianw | clarkb: we can do the referrals thing on static if you like? | 23:09 |
clarkb | ianw: referrals thing? also is gerrit being slow for anyone else ? I wonder if it is being hit by the same thing gitea and static were | 23:10 |
ianw | sorry the blocking UA macro that fungi has put together | 23:10 |
clarkb | looks like CI -1'd it | 23:10 |
clarkb | but ya I'm for landing that in a general way particularly if gerrit is now being hit by it | 23:11 |
clarkb | (not sure about that yet just noticing gerrit is very slow to respond to my requests, could be a local issue) | 23:11 |
ianw | yes i have to agree on review.opendev.org | 23:12 |
clarkb | there seems to be a dotbot and a stormcrawler hitting gerrit | 23:14 |
ianw | yeah http://www.opensiteexplorer.org/dotbot | 23:14 |
clarkb | not seeing the other UAs we've come to expect | 23:14 |
ianw | it really doesn't seem to be under a lot of apache load | 23:16 |
clarkb | ya static was in the same situation though | 23:16 |
clarkb | it seemed fine except it wasn't | 23:16 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=30&rra_id=all we can see the connection spike there | 23:18 |
ianw | sigh | 23:18 |
ianw | should we iptables out the ip-range manually for a bit, so we can at least merge the UA blocking? | 23:19 |
clarkb | ianw: or we can manually apply the UA blocking assuming that will help | 23:19 |
clarkb | I'm still not seeing those UAs on review though | 23:20 |
clarkb | I also see "The Knowledge AI" hitting port 80 a lot | 23:20 |
ianw | yeah, and when i connect it logs me in apache pretty quickly, but i don't get a response | 23:20 |
ianw | ergo my request is getting to apache | 23:21 |
clarkb | ya I think like gitea the issue here is gerrit is busy as a result | 23:21 |
clarkb | system load is 12 | 23:21 |
clarkb | is there a specific IP you've identified as a bad actor? I don't think I've seen one that stands out as particularly bad yet | 23:22 |
ianw | 66.160.140.183 - - [15/Oct/2020:23:22:54 +0000] "GET /748684 HTTP/1.1" 302 558 "-" "The Knowledge AI" | 23:23 |
ianw | is currently going nuts | 23:23 |
clarkb | I'm good with blocking that one and see if things are happier | 23:23 |
ianw | i dunno if that's the root cause though | 23:23 |
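The manual block ianw floated would be a one-liner along these lines; the address comes from the log and the rule is meant to be temporary.

```shell
# Temporarily drop traffic from the noisy crawler address.
sudo iptables -I INPUT -s 66.160.140.183 -j DROP
# Remove it again once UA-based filtering is in place:
sudo iptables -D INPUT -s 66.160.140.183 -j DROP
```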
*** mlavalle has quit IRC | 23:23 | |
*** tosky has quit IRC | 23:23 | |
ianw | 66.160.140.183 - - [15/Oct/2020:07:34:43 +0000] "GET /robots.txt HTTP/1.1" 302 567 "-" "The Knowledge AI" | 23:24 |
ianw | was the first occurrence of that | 23:24 |
clarkb | hrm that's well before things got sad | 23:24 |
ianw | so we've not been seeing problems that long | 23:24 |
clarkb | I'm going to see if java melody will load and show us what is slow | 23:25 |
clarkb | not getting my hopes up | 23:25 |
clarkb | but maybe we can work backward from slow threads in melody to specific requests | 23:25 |
ianw | iotop shows nothing of interest | 23:26 |
TheJulia | I guess gerrit is down? | 23:29 |
* TheJulia looks back a little | 23:29 | |
ianw | TheJulia: yes, it is unhappy unfortunately, debugging ongoing :/ | 23:36 |
TheJulia | :( | 23:38 |
* TheJulia leaves folks be and goes to email | 23:39 | |
ianw | we are going to stop apache on review.opendev.org for a bit, to see if gerrit can clear the connections and slow down, then watch what happens if we re-enable | 23:41 |
ianw | ok, apache is stopped | 23:42 |
clarkb | ianw: maybe give it until 0000UTC and if it isn't happier by then we should consider restarting the service too? | 23:45 |
ianw | clarkb: are there two containers running? | 23:46 |
clarkb | oh looks like it | 23:47 |
clarkb | wat | 23:47 |
ianw | stupefied_bell | 23:47 |
clarkb | oh wait no, that's track-upstream | 23:47 |
clarkb | if you do a docker ps -a --no-trunc you can see the whole command | 23:47 |
clarkb | I think we're good | 23:47 |
clarkb | I mean other than that it is broken :) | 23:47 |
ianw | doh, right | 23:47 |
clarkb | theory: it's ssh that is doing it, not http | 23:48 |
ianw | it had 211 connections open, now 210 | 23:48 |
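A quick way to break those connections down by service and test the ssh-versus-http theory; the port numbers assume Gerrit's usual 29418 ssh listener.

```shell
# Established connections per listener on the gerrit host.
ss -tn state established '( sport = :29418 )' | tail -n +2 | wc -l               # gerrit ssh
ss -tn state established '( sport = :443 or sport = :80 )' | tail -n +2 | wc -l  # apache
```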
clarkb | fwiw the gerrit server having a hard time happened the other day iirc | 23:52 |
clarkb | fungi had to restart it | 23:52 |
clarkb | then it was fine after that, I wonder if this is a repeat | 23:52 |
ianw | is 102 mysql connections too many? | 23:53 |
clarkb | aiui no | 23:53 |
clarkb | at least in an openstack context you use a lot of connections | 23:53 |
clarkb | whether that is the case for java and gerrit is possibly another matter | 23:53 |
clarkb | quick google shows gerrit is happy with many connections | 23:54 |
ianw | total open tcp connections still at 209, it's not like things are dropping out quickly | 23:55 |
clarkb | but maybe our db isn't, like we could have a limit set on the db side that we're hitting | 23:55 |
clarkb | and that backed up the queues while things spun on slow db lookups? | 23:55 |
clarkb | (it would be so nice if melody was accessible) | 23:55 |
clarkb | our pool limit is 225 in the config for mysql | 23:56 |
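A hedged look at the database side; the connection details are placeholders, and 225 is the pool limit quoted from gerrit's config above.

```shell
# Compare active client threads against the server-side cap; if
# max_connections sits at or below the ~225 connections gerrit may open,
# the pool could be starving under load.
mysql -h <db-host> -u <user> -p \
  -e "SHOW STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"
```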
*** ysandeep|away is now known as ysandeep | 23:59 |