*** ysandeep is now known as ysandeep|afk | 01:33 | |
*** ysandeep|afk is now known as ysandeep | 02:19 | |
*** ysandeep is now known as ysandeep|afk | 04:15 | |
*** soniya29 is now known as soniya29|ruck | 04:32 | |
*** ysandeep|afk is now known as ysandeep | 05:04 | |
*** soniya29|ruck is now known as soniya29|ruck|afk | 05:26 | |
*** soniya29|ruck|afk is now known as soniya29|ruck | 06:00 | |
*** ysandeep is now known as ysandeep|away | 07:01 | |
*** frenzyfriday|ruck is now known as frenzyfriday|rover | 07:07 | |
*** jpena|off is now known as jpena | 07:17 | |
*** ysandeep|away is now known as ysandeep|lunch | 08:10 | |
opendevreview | Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum https://review.opendev.org/c/zuul/zuul-jobs/+/859943 | 08:13 |
opendevreview | Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum https://review.opendev.org/c/zuul/zuul-jobs/+/859943 | 08:31 |
opendevreview | Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum https://review.opendev.org/c/zuul/zuul-jobs/+/859943 | 08:45 |
*** soniya29|ruck is now known as soniya29|ruck|lunch | 08:46 | |
*** soniya29|ruck|lunch is now known as soniya29|ruck | 09:36 | |
*** ysandeep|lunch is now known as ysandeep | 09:40 | |
*** Guest1737 is now known as diablo_rojo | 10:09 | |
*** soniya29|ruck is now known as soniya29|ruck|afk | 10:24 | |
*** rlandy|out is now known as rlandy | 10:34 | |
*** soniya29|ruck|afk is now known as soniya29|ruck | 10:36 | |
*** ysandeep is now known as ysandeep|brb | 10:45 | |
*** ysandeep|brb is now known as ysandeep | 11:13 | |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 11:42 |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add check-post pipeline https://review.opendev.org/c/openstack/project-config/+/859977 | 11:43 |
*** dviroel is now known as dviroel|afk | 11:50 | |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 11:56 |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 12:15 |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 12:27 |
*** dviroel|afk is now known as dviroel | 13:06 | |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Allow-Post-Review flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 13:07 |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add post-review pipeline https://review.opendev.org/c/openstack/project-config/+/859977 | 13:07 |
*** soniya29|ruck is now known as soniya29|ruck|dinner | 13:38 | |
*** dasm|off is now known as dasm | 13:39 | |
*** soniya29|ruck|dinner is now known as soniya29|ruck | 14:04 | |
Clark[m] | fungi: re the git clone errors I'm not sure playing telephone on the mailing list is going to help us. We probably need to have those affected share IPs if we can't infer them from logs and trace specific requests to the backends and see what gitea and haproxy look like | 14:06 |
Clark[m] | However that sort of issue is one that corporate firewalls can instigate iirc | 14:07 |
fungi | yeah, that's why in my last call i asked if folks are using the same provider or in the same region of the world | 14:08 |
*** lbragstad4 is now known as lbragstad | 14:53 | |
Clark[m] | Gitea06 has a spike in connections but in line with other busy backends | 15:04 |
fungi | yeah, and it didn't persist | 15:08 |
clarkb | the other thing to check is for connections in haproxy with a CD (or is it cD?) status | 15:09 |
clarkb | that indicates haproxy believes the client disconnected iirc | 15:10 |
clarkb | and would be a good clue that something downstream of us initiated the eof | 15:10 |
frickler | headsup coming monday is a bank holiday in Germany, I may or may not be around | 15:10 |
fungi | enjoy! and thanks for the reminder | 15:11 |
*** ysandeep is now known as ysandeep|out | 15:11 | |
*** dviroel is now known as dviroel|lunch | 15:12 | |
clarkb | https://www.haproxy.com/documentation/hapee/latest/onepage/#8.5 yes CD is client disconnection | 15:17 |
fungi | clarkb: it seems this is somewhat common if git (or other things doing long-running transfers) are built against gnutls. it's apparently not as robust against network issues as openssl | 15:17 |
fungi | it might be solved in newer gnutls, or on newer distro releases which build git against openssl (since the relicensing) | 15:20 |
fungi | though the version in debian/sid still says it depends on libcurl3-gnutls | 15:22 |
clarkb | I count 301 CD-terminated connections from a german ipv6 addr since today's log began | 15:24 |
clarkb | this isn't the worst offender (an amazon IP is) and there is a .eu ipv4 addr as well | 15:24 |
fungi | does traceroute to those from the lb seem to traverse a common provider? | 15:25 |
clarkb | something like cat haproxy.log | grep -v -- ' -- ' | cut -d' ' -f 6 | sed -ne 's/\(:[0-9]\+\)$//p' | sort | uniq -c | sort | tail | 15:25 |
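(A slightly tidier sketch of that same one-liner, assuming haproxy's default syslog layout where field 6 is the client ip:port:)

    # count non-normal haproxy terminations per client IP, busiest last
    grep -v -- ' -- ' haproxy.log \
      | awk '{print $6}' \
      | sed 's/:[0-9]*$//' \
      | sort | uniq -c | sort -n | tail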
fungi | particularly one other than (or after) zayo? | 15:26 |
* clarkb checks | 15:26 | |
clarkb | the ipv4 trace has a distinct lack of hostnames and since the other is ipv6 it's hard to map across them | 15:27 |
clarkb | with ipv6 we seem to go cogent to level 3. With ipv4 it's zayo, then something I can't identify, then the final destination | 15:28 |
fungi | yeah, i've noticed a lot of backbone hops no longer publish reverse dns entries, sadly | 15:28 |
fungi | what is this net coming to? | 15:28 |
clarkb | this is the gitea load balancer | 15:28 |
fungi | sorry, that was a rhetorical quip | 15:28 |
clarkb | oh heh | 15:28 |
clarkb | In any case I do see a fairly large number of CD states recorded by haproxy. Implying to me that it is very possible this is downstream of us. But as I mentioned before playing telephone on the mailing list makes it difficult to say for sure | 15:29 |
fungi | anyway, i'm turning off my git clone loops since they never produced any errors at all | 15:29 |
clarkb | if we had someone in here that could tell us what the originating IPs are and more specific timestamps this would be easier | 15:29 |
clarkb | ++ | 15:29 |
*** marios is now known as marios|out | 15:31 | |
fungi | gtema: ^ you mentioned seeing it from somewhere in europe? | 15:33 |
fungi | neil indicated seeing it from his home provider in the uk, and says the ci system he's seeing it on is hosted in germany | 15:33 |
gtema | yes, I am seeing it from europe | 15:34 |
gtema | from germany | 15:34 |
fungi | apparently if you have a git built linking against openssl instead of gnutls it might handle that better, but it looks like debian (and so ubuntu) are at least still linking git with gnutls in their latest builds | 15:36 |
gtema | under fedora and mac I see a different error, but it still ends up failing the same way | 15:36 |
clarkb | gtema: and this is all over ipv4? | 15:38 |
gtema | yes | 15:38 |
clarkb | if you're comfortable PM'ing me a source IP addr I can check the haproxy logs to see if that IP shows the CD termination state | 15:38 |
dtantsur | gtema: I assume you also have telekom? | 15:39 |
gtema | 80.158.88.206 | 15:39 |
gtema | dtantsur - yes, telekom | 15:39 |
* dtantsur may have IPv6 though | 15:39 | |
fungi | hopefully this isn't hurricane impact on transatlantic communications, but the winds are only just now reaching mae east where most of that peering is | 15:39 |
gtema | but not only from home telekom, from the telekom cloud as well; in all cases the traceroute starts to hiccup in the zayo net | 15:40 |
gtema | but issues have also been seen between nl and uk, and sometimes uk - us | 15:40 |
clarkb | I see 10 completed connections. 5 with CD state. 3 with SD. 2 with -- (normal) | 15:40 |
dtantsur | PING opendev.org(2604:e100:3:0:f816:3eff:fe6b:ad62 (2604:e100:3:0:f816:3eff:fe6b:ad62)) 56 data bytes | 15:41 |
dtantsur | From 2003:0:8200:c000::1 (2003:0:8200:c000::1) icmp_seq=1 Destination unreachable: No route | 15:41 |
dtantsur | this is weird, I can definitely clone from it.. | 15:41 |
clarkb | dtantsur: interesting you're on ipv6 not ipv4? | 15:41 |
dtantsur | but yeah, my difference with gtema may be IPv6 | 15:41 |
clarkb | that may explain it | 15:41 |
fungi | also the mae east peering is pretty much a bunker (inside a parking garage in tyson's corner), and has massive battery backups, so very unlikely to be the weather ;) | 15:41 |
clarkb | interestingly the SD states all happened in a ~5 minute window about 5 hours ago and since then it's largely CDs spread out | 15:42 |
clarkb | definitely seems network related particularly if someone on the same isp but using ipv6 is able to get through just fine | 15:43 |
dtantsur | clarkb: aha, curl tries IPv6 and falls back to v4 | 15:43 |
clarkb | dtantsur: are you able to force ipv4 and reproduce? | 15:43 |
clarkb | (you can modify /etc/hosts for opendev.org to be the ipv4 addr only) | 15:43 |
clarkb | there might be a git flag too, but ^ should work reliably | 15:43 |
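(A minimal sketch of the /etc/hosts approach; 203.0.113.10 is a documentation placeholder, not opendev.org's real IPv4 address:)

    # /etc/hosts -- pin opendev.org to its IPv4 address only (placeholder shown)
    203.0.113.10  opendev.org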
dtantsur | "connect to 2604:e100:3:0:f816:3eff:fe6b:ad62 port 443 failed: Network is unreachable" from curl | 15:44 |
dtantsur | I do find it suspicious | 15:44 |
dtantsur | lemme try | 15:44 |
gtema | clarkb - do you see any changes with my ip - it just failed again right now | 15:44 |
clarkb | gtema: this time it was an SD | 15:44 |
gtema | SD is what? | 15:44 |
clarkb | let me check the gitea02 logs | 15:45 |
clarkb | gtema: the server initiated the disconnect | 15:45 |
gtema | hmm | 15:45 |
gtema | I see: error: RPC failed; curl 18 transfer closed with outstanding read data remaining | 15:45 |
gtema | fatal: early EOF | 15:45 |
gtema | fatal: fetch-pack: invalid index-pack output | 15:45 |
fungi | git has a "-4" command-line option | 15:45 |
dtantsur | clarkb: I forced v4 in hosts, cloning successfully so far | 15:45 |
fungi | git clone -4 https://opendev.org/openstack/devstack | 15:46 |
fungi | or whatever | 15:46 |
dtantsur | does anyone have a guess why I'm seeing "destination unreachable" using v6? | 15:46 |
clarkb | uhm did we stop recording backend ports in the haproxy logs? | 15:46 |
gtema | dtantsur - for me it clones perfectly until the last percent | 15:46 |
dtantsur | gtema: I repeated the clone in a loop | 15:47 |
dtantsur | all succeeded | 15:47 |
fungi | dtantsur: are you able to reach anything over v6 from where you're testing? | 15:47 |
clarkb | gtema: which repo was that a clone for that just failed? | 15:47 |
gtema | git clone https://opendev.org/openstack/heat | 15:47 |
fungi | though i think frickler previously reported that vexxhost's v6 prefixes are too long and get filtered by some isps | 15:47 |
gtema | but it's not 100% reproducible - sometimes it passes, but it is clearly slow | 15:48 |
fungi | and at least at the time when it was a problem they didn't have any backup v6 routes for longer prefixes | 15:48 |
fungi | er, i mean backup routes with shorter prefixes | 15:48 |
dtantsur | fungi: google and facebook curl via v6 for me | 15:48 |
gtema | dtantsur - sure you get google from us? | 15:49 |
gtema | I tested it also and was landing always in EU mirrir | 15:49 |
gtema | mirror | 15:49 |
dtantsur | gtema: the question was about v6 on my side | 15:49 |
dtantsur | I'm quite sure it works in general | 15:49 |
fungi | yeah, i guess it's possible their v6 routes are still filtered at some isps' borders because of the prefix length | 15:49 |
dtantsur | that explains | 15:49 |
fungi | i haven't looked in bgp recently to see what it's like for them | 15:49 |
dtantsur | anyway, it's funny. we have the same provider in the same region of the same country. I cannot reproduce the failure Oo | 15:51 |
fungi | there was old iana guidance that said v6 announcements shouldn't be longer than 48 bits and vexxhost was doing 56 i think in order to carve up their assignments for different regions in a more fine-grained way | 15:51 |
clarkb | dtantsur: can you try talking to https://gitea02.opendev.org:3081 to reproduce? | 15:51 |
clarkb | its possible this is backend specific and that is the backend that gtema is talking to | 15:51 |
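(i.e., roughly, to bypass the load balancer and hit the gitea02 backend directly:)

    git clone https://gitea02.opendev.org:3081/openstack/heat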
dtantsur | curl works, trying git | 15:52 |
dtantsur | (okay, same provider, but my IP address is clearly from a different subnet) | 15:54 |
fungi | https://bgpview.io/ip/2604:e100:3:0:f816:3eff:fe6b:ad62 reports seeing a /48 announced currently. maybe it was that the old iana guidance was for filtering prefixes longer than 32 bits. i'll see if i can find it | 15:54 |
dtantsur | 4 attempts worked on that backend | 15:54 |
clarkb | dtantsur: thank you for checking | 15:57 |
clarkb | fwiw I'm still trying to work my way through the logs on gitea02 to see if there is any indication of the server closing things for some reason (so far nothing) | 15:57 |
gtema | definitely something on the general routing side: time git clone https://gitea02.opendev.org:3081/openstack/python-openstackclient => 5s (from 91.7.x.x) and 2m39 from 80.158.88.x | 15:58 |
gtema | but in both cases it worked now | 15:58 |
clarkb | https://stackoverflow.com/questions/21277806/fatal-early-eof-fatal-index-pack-failed indicates that heat clone error may be due to memory needs of git | 15:59 |
clarkb | (I think that is client side?) | 16:00 |
gtema | he, this looks funny - our repos are now too big? | 16:00 |
clarkb | for your client's defaults to unpack maybe? | 16:00 |
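(For reference, the client-side workarounds that stackoverflow thread tends to suggest look roughly like this -- untested here, and only relevant if the client really is running out of room while unpacking:)

    # relax client-side limits sometimes blamed for 'early EOF' index-pack failures
    git config --global core.compression 0
    git config --global http.postBuffer 524288000
    # or sidestep the large unpack entirely with a shallow clone
    git clone --depth 1 https://opendev.org/openstack/heat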
frickler | I can confirm the issue with IPv6 from AS3320 to opendev.org is back. I can try to ping my contact there or maybe gtema has some internal link | 16:00 |
fungi | on the v6 prefix filtering, all current recommendations i'm finding are to filter out bgp6 announcements longer than 48 bits, so that vexxhost route should be fine by modern standards. even ripe seems to say so (slide 54 in this training deck): https://www.ripe.net/support/training/material/webinar-slides/bgp-security-irr-filtering.pdf | 16:00 |
frickler | yes, /48 is acceptable usually and the route6 object that mnaser created last year for this is still in place | 16:01 |
clarkb | gtema: git/2.33.1.gl1 is that you? | 16:01 |
clarkb | er is that your git client version? | 16:01 |
gtema | 2.35.3 | 16:02 |
gtema | and 2.37.3 from mac | 16:02 |
gtema | frickler - wrt as3320. I have a contact who knows a contact who ... So generally I can try, but it's definitely not going to be fast going my way | 16:03 |
*** dviroel|lunch is now known as dviroel | 16:05 | |
clarkb | gtema: interesting does that mean your heat clone that failed above ran for about half an hour? | 16:05 |
clarkb | (I'm trying to reconcile it with what I see in the logs) | 16:05 |
gtema | clarkb - nope. It either breaks after 3-4 min or I cancel it | 16:06 |
fungi | it's possible 30 minutes was when the lb gave up waiting for the client response | 16:06 |
clarkb | fungi: oh that could be | 16:06 |
fungi | because it never "got the memo" that the client went away | 16:06 |
clarkb | ya then its a server side disconnect at that point maybe | 16:07 |
fungi | right, we may have failsafe timeouts on the gitea end or in haproxy or it could be some state tracking middlebox elsewhere which terminated the session later | 16:07 |
fungi | maybe even an inactivity timeout from conntrack on the hypervisor host | 16:08 |
fungi | too many variables | 16:08 |
clarkb | but ya I see 200s and 307s. There are a couple of 401s that are curious but I have to assume those are due to parameters sent with the request that I can't view in the logs | 16:09 |
clarkb | separately I thought we had fixed the issue of having ports logged all the way through to trace requests and we definitely don't have that anymore | 16:09 |
clarkb | we need the port for the backend request logged in haproxy. Then we also need to have gitea log the port (it logs :0 for some reason) | 16:09 |
fungi | i thought we had it logged in apache on the gitea side? | 16:10 |
clarkb | fungi: we do but that is insufficient to trace a connection/request from haproxy through to gitea | 16:11 |
fungi | or maybe it was that we broke that logging when we added the apache layer | 16:11 |
clarkb | we need it in the gitea logs and haproxy as well to trace through the entire system | 16:11 |
clarkb | I'm going to look at that now as debugging this without that info is not pleasant | 16:11 |
fungi | i can't imagine how we lost the haproxy side log details, unless they changed their log formatting language | 16:12 |
clarkb | it seems we're ignoring the format specification for sure | 16:14 |
fungi | i see the source ports log-format "%ci:%cp [%t] %ft [%bi]:%bp %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/% | 16:15 |
fungi | sc/%rc %sq/%bq" | 16:15 |
fungi | grr, stray newline in my buffer | 16:15 |
fungi | but yes it doesn't seem to appear in the logs | 16:16 |
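(Stitched back together, the log-format line quoted above reads:)

    log-format "%ci:%cp [%t] %ft [%bi]:%bp %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq"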
*** jpena is now known as jpena|off | 16:17 | |
clarkb | [38.108.68.124]:52854 balance_git_https/gitea02.opendev.org 1/0/164628 | 16:17 |
clarkb | does that map to [%bi]:%bp %b/%s %Tw/%Tc/%Tt ? | 16:17 |
clarkb | and if so why is the frontend ip shown in the []s | 16:17 |
clarkb | oh that is because it is the source side of the connection | 16:18 |
clarkb | fungi: ok I think we have haproxy <-> apache info but not apache <-> gitea | 16:19 |
fungi | oh, right, we have to map the frontend entries to their corresponding backend entries in the haproxy log | 16:20 |
clarkb | the gitea log seems to be logging some forwarded-for entry and shows 38.108.68.124:0. I think I recall trying to add this info to the gitea log and apparently that doesn't work | 16:21 |
clarkb | using that extra bit of info gitea and apache appear to report an http 200 response for fatal: fetch-pack: invalid index-pack output | 16:22 |
clarkb | I think it must've failed on the client side and then not properly shut down the connection so the server eventually does it | 16:22 |
clarkb | maybe the bits are being flipped or there is some memory issue on the client; hard to say at this point, but good to know gitea seems to think it was fine | 16:23 |
clarkb | Ok and I did update the app.ini for gitea to record the request remote addr which I thought in testing was working | 16:28 |
clarkb | Either going through the haproxy breaks this or this is a gitea regression | 16:29 |
clarkb | https://github.com/go-chi/chi/issues/453 I think that is related | 16:32 |
fungi | https://github.com/go-chi/chi/issues/708 seems to propose an alternative which would preserve the port | 16:37 |
fungi | and there's this option: https://github.com/go-chi/chi/pull/518 | 16:38 |
fungi | clarkb: maybe the simple solution is to just tell apache mod_proxy not to set x-forwarded-for and then map them up from the logs ourselves? | 16:40 |
clarkb | I think gitea 1.14.0 broke this. This is when gitea migrated from macaron (which my commit for updating the access log format refers to) to chi | 16:40 |
clarkb | fungi: we'd need to record the port in gitea somehow and I'm not sure how to do that without x-forwarded-for. Or maybe you're saying it will fall back to the regular behavior? Ya that might make sense | 16:41 |
fungi | right, wondering if the realip middleware will just not replace it if it thinks the client address is already "real" (because of a lack of proxy headers) | 16:42 |
clarkb | we should be able to test that at least | 16:42 |
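(A sketch of that apache-side idea; the backend address is a placeholder, not the actual vhost config. ProxyAddHeaders Off stops mod_proxy from adding X-Forwarded-For/Host/Server:)

    # hypothetical vhost fragment -- proxy to gitea without the X-Forwarded-* headers
    ProxyAddHeaders Off
    ProxyPass        / http://127.0.0.1:3000/
    ProxyPassReverse / http://127.0.0.1:3000/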
clarkb | I need to eat something, but then I can look at updating the apache configs to try that | 16:43 |
clarkb | also I think we can drop the custom access log template now that gitea doesn't use macaron | 16:43 |
fungi | yeah, i still haven't had a chance to get my morning shower, so should take a break as well | 16:43 |
clarkb | its possible that may fix it too if we're accessing the remoteaddr differently (though I doubt it) | 16:43 |
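(For context, the relevant gitea app.ini knobs are roughly these -- a sketch, not the exact system-config change; with no ACCESS_LOG_TEMPLATE override gitea falls back to its built-in access log format:)

    [log]
    ENABLE_ACCESS_LOG = true
    ; no ACCESS_LOG_TEMPLATE override -- use gitea's default format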
clarkb | fungi: I'm not seeing any way to have apache record the port it uses to establish the proxy connection? | 17:48 |
clarkb | oh I see | 17:51 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea logs for better request tracing https://review.opendev.org/c/opendev/system-config/+/860010 | 17:58 |
clarkb | something like that maybe. We should be able to check the logs from the zuul jobs to confirm | 17:59 |
fungi | ah okay, so we were already setting a custom logformat in the vhost | 18:00 |
fungi | the suggestions i found were for adjusting loglevel for the proxy subsystem | 18:00 |
clarkb | fungi: ya I think we set the custom apache log to log the port of the upstream connection | 18:06 |
clarkb | previously this should've given us the same host:port pair in all three log files | 18:06 |
clarkb | but then gitea updated to chi and broke that | 18:06 |
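(What such a custom LogFormat might look like -- a sketch, not the exact vhost config; %{remote}p is the source port of the incoming connection, i.e. the port haproxy used, which is what lets the haproxy and apache logs be matched up:)

    # hypothetical apache access log including the client (haproxy) source port
    LogFormat "%a:%{remote}p %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_port
    CustomLog /var/log/apache2/gitea-ssl-access.log combined_port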
clarkb | fungi: I jumped onto the host and this seems to be working. I'm actually going to split this into two changes now because maybe our overriding of the access log template is at least partially to blame | 18:27 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea logs for better request tracing https://review.opendev.org/c/opendev/system-config/+/860010 | 18:30 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch back to default gitea access log format https://review.opendev.org/c/opendev/system-config/+/860017 | 18:30 |
clarkb | Looks like https://zuul.opendev.org/t/openstack/build/6622a1ec85d0476584db69047ed4673d/log/gitea99.opendev.org/logs/access.log#11 shows that simply using the default log format won't fix this. I think landing that is good cleanup though and it isn't a bigger regression | 19:47 |
clarkb | and we need to collect apache logs | 19:48 |
* clarkb writes another change | 19:48 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Collect apache logs from gitea99 host in testing https://review.opendev.org/c/opendev/system-config/+/860030 | 19:49 |
*** lbragstad1 is now known as lbragstad | 20:23 | |
*** dviroel is now known as dviroel|afk | 20:28 | |
*** frenzyfriday|rover is now known as frenzyfriday | 20:39 | |
opendevreview | Merged opendev/system-config master: Switch back to default gitea access log format https://review.opendev.org/c/opendev/system-config/+/860017 | 20:41 |
*** dasm is now known as dasm|off | 21:17 | |
opendevreview | Merged opendev/system-config master: Update gitea logs for better request tracing https://review.opendev.org/c/opendev/system-config/+/860010 | 21:22 |
clarkb | the testing update followup to ^ seems to show they both work in a way that is traceable now | 21:41 |
clarkb | its not as simple as finding the same string in three different log files, but it is doable | 21:41 |
fungi | yep | 21:41 |
clarkb | I did leave a message in the gitea dev discord channel (via matrix) asking if anyone knows why that is broken. | 21:44 |
clarkb | I still suspect go-chi but from what I can tell go-chi wants middleware installed to do that translation and go-chi isn't yet doing the :0 if no other valid value is found. That makes me think gitea itself may be doing it somehow | 21:44 |
opendevreview | Merged opendev/system-config master: Collect apache logs from gitea99 host in testing https://review.opendev.org/c/opendev/system-config/+/860030 | 22:44 |