*** ysandeep is now known as ysandeep|afk | 01:33 | |
*** ysandeep|afk is now known as ysandeep | 02:19 | |
*** ysandeep is now known as ysandeep|afk | 04:15 | |
*** soniya29 is now known as soniya29|ruck | 04:32 | |
*** ysandeep|afk is now known as ysandeep | 05:04 | |
*** soniya29|ruck is now known as soniya29|ruck|afk | 05:26 | |
*** soniya29|ruck|afk is now known as soniya29|ruck | 06:00 | |
*** ysandeep is now known as ysandeep|away | 07:01 | |
*** frenzyfriday|ruck is now known as frenzyfriday|rover | 07:07 | |
*** jpena|off is now known as jpena | 07:17 | |
*** ysandeep|away is now known as ysandeep|lunch | 08:10 | |
opendevreview | Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum https://review.opendev.org/c/zuul/zuul-jobs/+/859943 | 08:13 |
opendevreview | Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum https://review.opendev.org/c/zuul/zuul-jobs/+/859943 | 08:31 |
opendevreview | Simon Westphahl proposed zuul/zuul-jobs master: Allow overriding of Bazel installer checksum https://review.opendev.org/c/zuul/zuul-jobs/+/859943 | 08:45 |
*** soniya29|ruck is now known as soniya29|ruck|lunch | 08:46 | |
*** soniya29|ruck|lunch is now known as soniya29|ruck | 09:36 | |
*** ysandeep|lunch is now known as ysandeep | 09:40 | |
*** Guest1737 is now known as diablo_rojo | 10:09 | |
*** soniya29|ruck is now known as soniya29|ruck|afk | 10:24 | |
*** rlandy|out is now known as rlandy | 10:34 | |
*** soniya29|ruck|afk is now known as soniya29|ruck | 10:36 | |
*** ysandeep is now known as ysandeep|brb | 10:45 | |
*** ysandeep|brb is now known as ysandeep | 11:13 | |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 11:42 |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add check-post pipeline https://review.opendev.org/c/openstack/project-config/+/859977 | 11:43 |
*** dviroel is now known as dviroel|afk | 11:50 | |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 11:56 |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 12:15 |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Post-Check flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 12:27 |
*** dviroel|afk is now known as dviroel | 13:06 | |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add Allow-Post-Review flag to OpenStackSDK project https://review.opendev.org/c/openstack/project-config/+/859976 | 13:07 |
opendevreview | Artem Goncharov proposed openstack/project-config master: Add post-review pipeline https://review.opendev.org/c/openstack/project-config/+/859977 | 13:07 |
*** soniya29|ruck is now known as soniya29|ruck|dinner | 13:38 | |
*** dasm|off is now known as dasm | 13:39 | |
*** soniya29|ruck|dinner is now known as soniya29|ruck | 14:04 | |
Clark[m] | fungi: re the git clone errors I'm not sure playing telephone on the mailing list is going to help us. We probably need to have those affected share IPs if we can't infer them from logs and trace specific requests to the backends and see what gitea and haproxy look like | 14:06 |
Clark[m] | However that sort of issue is one that corporate firewalls can instigate iirc | 14:07 |
fungi | yeah, that's why in my last call i asked if folks are using the same provider or in the same region of the world | 14:08 |
*** lbragstad4 is now known as lbragstad | 14:53 | |
Clark[m] | Gitea06 has a spike in connections but in line with other busy backends | 15:04 |
fungi | yeah, and it didn't persist | 15:08 |
clarkb | the other thing to check is for connections in haproxy with a CD (or is it cD?) status | 15:09 |
clarkb | that indicates haproxy believes the client disconnected iirc | 15:10 |
clarkb | and would be a good clue that something downstream of us initiated the eof | 15:10 |
frickler | headsup coming monday is a bank holiday in Germany, I may or may not be around | 15:10 |
fungi | enjoy! and thanks for the reminder | 15:11 |
*** ysandeep is now known as ysandeep|out | 15:11 | |
*** dviroel is now known as dviroel|lunch | 15:12 | |
clarkb | https://www.haproxy.com/documentation/hapee/latest/onepage/#8.5 yes CD is client disconnection | 15:17 |
fungi | clarkb: it seems this is somewhat common if git (or other things doing long-running transfers) are built against gnutls. it's apparently not as robust against network issues as openssl | 15:17 |
fungi | it might be solved in newer gnutls, or on newer distro releases which build git against openssl (since the relicensing) | 15:20 |
fungi | though the version in debian/sid still says it depends on libcurl3-gnutls | 15:22 |
clarkb | I count 301 CD-terminated connections from a german ipv6 addr since today's log began | 15:24 |
clarkb | this isn't the worst offender (an amazon IP is) and there is a .eu ipv4 addr as well | 15:24 |
fungi | does traceroute to those from the lb seem to traverse a common provider? | 15:25 |
clarkb | something like cat haproxy.log | grep -v -- ' -- ' | cut -d' ' -f 6 | sed -ne 's/\(:[0-9]\+\)$//p' | sort | uniq -c | sort | tail | 15:25 |
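(A slightly tidier sketch of that same one-liner, assuming haproxy's default syslog layout where field 6 is the client ip:port:)

    # count non-normal haproxy terminations per client IP, busiest last
    grep -v -- ' -- ' haproxy.log \
      | awk '{print $6}' \
      | sed 's/:[0-9]*$//' \
      | sort | uniq -c | sort -n | tail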
fungi | particularly one other than (or after) zayo? | 15:26 |
* clarkb checks | 15:26 | |
clarkb | the ipv4 trace has a distinct lack of hostnames and since the other is ipv6 it's hard to map across them | 15:27 |
clarkb | with ipv6 we seem to go cogent to level 3. With ipv4 it's zayo, then something I can't identify, then the final destination | 15:28 |
fungi | yeah, i've noticed a lot of backbone hops no longer publish reverse dns entries, sadly | 15:28 |
fungi | what is this net coming to? | 15:28 |
clarkb | this is the gitea load balancer | 15:28 |
fungi | sorry, that was a rhetorical quip | 15:28 |
clarkb | oh heh | 15:28 |
clarkb | In any case I do see a fairly large number of CD states recorded by haproxy. Implying to me that it is very possible this is downstream of us. But as I mentioned before playing telephone on the mailing list makes it difficult to say for sure | 15:29 |
fungi | anyway, i'm turning off my git clone loops since they never produced any errors at all | 15:29 |
clarkb | if we had someone in here that could tell us what the originating IPs are and more specific timestamps this would be easier | 15:29 |
clarkb | ++ | 15:29 |
*** marios is now known as marios|out | 15:31 | |
fungi | gtema: ^ you mentioned seeing it from somewhere in europe? | 15:33 |
fungi | neil indicated seeing it from his home provider in the uk, and says the ci system he's seeing it on is hosted in germany | 15:33 |
gtema | yes, I am seeing it from europe | 15:34 |
gtema | from germany | 15:34 |
fungi | apparently if you have a git built linking against openssl instead of gnutls it might handle that better, but it looks like debian (and so ubuntu) are at least still linking git with gnutls in their latest builds | 15:36 |
gtema | under fedora and mac I see a different error, but it still ends up failing the same way | 15:36 |
clarkb | gtema: and this is all over ipv4? | 15:38 |
gtema | yes | 15:38 |
clarkb | if you're comfortable PM'ing me a source IP addr I can check the haproxy logs to see if that IP shows the CD termination state | 15:38 |
dtantsur | gtema: I assume you also have telekom? | 15:39 |
gtema | 80.158.88.206 | 15:39 |
gtema | dtantsur - yes, telekom | 15:39 |
* dtantsur may have IPv6 though | 15:39 | |
fungi | hopefully this isn't hurricane impact on transatlantic communications, but the winds are only just now reaching mae east where most of that peering is | 15:39 |
gtema | but not only from home telekom, from the telekom cloud as well; in all cases the traceroute starts to hiccup in the zayo net | 15:40 |
gtema | but issues have also been seen between nl and uk, and sometimes uk - us | 15:40 |
clarkb | I see 10 completed connections. 5 with CD state. 3 with SD. 2 with -- (normal) | 15:40 |
dtantsur | PING opendev.org(2604:e100:3:0:f816:3eff:fe6b:ad62 (2604:e100:3:0:f816:3eff:fe6b:ad62)) 56 data bytes | 15:41 |
dtantsur | From 2003:0:8200:c000::1 (2003:0:8200:c000::1) icmp_seq=1 Destination unreachable: No route | 15:41 |
dtantsur | this is weird, I can definitely clone from it.. | 15:41 |
clarkb | dtantsur: interesting you're on ipv6 not ipv4? | 15:41 |
dtantsur | but yeah, my difference with gtema may be IPv6 | 15:41 |
clarkb | that may explain it | 15:41 |
fungi | also the mae east peering is pretty much a bunker (inside a parking garage in tyson's corner), and has massive battery backups, so very unlikely to be the weather ;) | 15:41 |
clarkb | interestingly the SD states all happened in a ~5 minute window about 5 hours ago and since then it's largely CDs spread out | 15:42 |
clarkb | definitely seems network related particularly if someone on the same isp but using ipv6 is able to get through just fine | 15:43 |
dtantsur | clarkb: aha, curl tries IPv6 and falls back to v4 | 15:43 |
clarkb | dtantsur: are you able to force ipv4 and reproduce? | 15:43 |
clarkb | (you can modify /etc/hosts for opendev.org to be the ipv4 addr only) | 15:43 |
clarkb | there might be a git flag too, but ^ should work reliably | 15:43 |
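(A minimal sketch of the /etc/hosts approach; 203.0.113.10 is a documentation placeholder, not opendev.org's real IPv4 address:)

    # /etc/hosts -- pin opendev.org to its IPv4 address only (placeholder shown)
    203.0.113.10  opendev.org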
dtantsur | "connect to 2604:e100:3:0:f816:3eff:fe6b:ad62 port 443 failed: Network is unreachable" from curl | 15:44 |
dtantsur | I do find it suspicious | 15:44 |
dtantsur | lemme try | 15:44 |
gtema | clarkb - do you see any changes with my ip - it just failed again right now | 15:44 |
clarkb | gtema: this time it was an SD | 15:44 |
gtema | SD is what? | 15:44 |
clarkb | let me check the gitea02 logs | 15:45 |
clarkb | gtema: the server initiated the disconnect | 15:45 |
gtema | hmm | 15:45 |
gtema | I see: error: RPC failed; curl 18 transfer closed with outstanding read data remaining | 15:45 |
gtema | fatal: early EOF | 15:45 |
gtema | fatal: fetch-pack: invalid index-pack output | 15:45 |
fungi | git has a "-4" command-line option | 15:45 |
dtantsur | clarkb: I forced v4 in hosts, cloning successfully so far | 15:45 |
fungi | git clone -4 https://opendev.org/openstack/devstack | 15:46 |
fungi | or whatever | 15:46 |
dtantsur | does anyone have a guess why I'm seeing "destination unreachable" using v6? | 15:46 |
clarkb | uhm did we stop recording backend ports in the haproxy logs? | 15:46 |
gtema | dtantsur - for me it clones perfectly until the last percent | 15:46 |
dtantsur | gtema: I repeated the clone in a loop | 15:47 |
dtantsur | all succeeded | 15:47 |
fungi | dtantsur: are you able to reach anything over v6 from where you're testing? | 15:47 |
clarkb | gtema: which repo was that a clone for that just failed? | 15:47 |
gtema | git clone https://opendev.org/openstack/heat | 15:47 |
fungi | though i think frickler previously reported that vexxhost's v6 prefixes are too long and get filtered by some isps | 15:47 |
gtema | but it's not 100% reproducible - sometimes it passes, but it is clearly slow | 15:48 |
fungi | and at least at the time when it was a problem they didn't have any backup v6 routes for longer prefixes | 15:48 |
fungi | er, i mean backup routes with shorter prefixes | 15:48 |
dtantsur | fungi: google and facebook curl via v6 for me | 15:48 |
gtema | dtantsur - sure you get google from us? | 15:49 |
gtema | I tested it also and was landing always in EU mirrir | 15:49 |
gtema | mirror | 15:49 |
dtantsur | gtema: the question was about v6 on my side | 15:49 |
dtantsur | I'm quite sure it works in general | 15:49 |
fungi | yeah, i guess it's possible their v6 routes are still filtered at some isps' borders because of the prefix length | 15:49 |
dtantsur | that explains | 15:49 |
fungi | i haven't looked in bgp recently to see what it's like for them | 15:49 |
dtantsur | anyway, it's funny. we have the same provider in the same region of the same country. I cannot reproduce the failure Oo | 15:51 |
fungi | there was old iana guidance that said v6 announcements shouldn't be longer than 48 bits and vexxhost was doing 56 i think in order to carve up their assignments for different regions in a more fine-grained way | 15:51 |
clarkb | dtantsur: can you try talking to https://gitea02.opendev.org:3081 to reproduce? | 15:51 |
clarkb | its possible this is backend specific and that is the backend that gtema is talking to | 15:51 |
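(i.e., roughly, to bypass the load balancer and hit the gitea02 backend directly:)

    git clone https://gitea02.opendev.org:3081/openstack/heat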
dtantsur | curl works, trying git | 15:52 |
dtantsur | (okay, same provider, but my IP address is clearly from a different subnet) | 15:54 |
fungi | https://bgpview.io/ip/2604:e100:3:0:f816:3eff:fe6b:ad62 reports seeing a /48 announced currently. maybe it was that the old iana guidance was for filtering prefixes longer than 32 bits. i'll see if i can find it | 15:54 |
dtantsur | 4 attempts worked on that backend | 15:54 |
clarkb | dtantsur: thank you for checking | 15:57 |
clarkb | fwiw I'm still trying to work my way through the logs on gitea02 to see if there is any indication of the server closing things for some reason (so far nothing) | 15:57 |
gtema | definitely something on the general routing side: time git clone https://gitea02.opendev.org:3081/openstack/python-openstackclient => 5s (from 91.7.x.x) and 2m39 from 80.158.88.x | 15:58 |
gtema | but in both cases it worked now | 15:58 |
clarkb | https://stackoverflow.com/questions/21277806/fatal-early-eof-fatal-index-pack-failed indicates that heat clone error may be due to memory needs of git | 15:59 |
clarkb | (I think that is client side?) | 16:00 |
gtema | he, this looks funny - our repos are now too big? | 16:00 |
clarkb | for your client's defaults to unpack maybe? | 16:00 |
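(For reference, the client-side workarounds that stackoverflow thread tends to suggest look roughly like this -- untested here, and only relevant if the client really is running out of room while unpacking:)

    # relax client-side limits sometimes blamed for 'early EOF' index-pack failures
    git config --global core.compression 0
    git config --global http.postBuffer 524288000
    # or sidestep the large unpack entirely with a shallow clone
    git clone --depth 1 https://opendev.org/openstack/heat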
frickler | I can confirm the issue with IPv6 from AS3320 to opendev.org is back. I can try to ping my contact there or maybe gtema has some internal link | 16:00 |
fungi | on the v6 prefix filtering, all current recommendations i'm finding are to filter out bgp6 announcements longer than 48 bits, so that vexxhost route should be fine by modern standards. even ripe seems to say so (slide 54 in this training deck): https://www.ripe.net/support/training/material/webinar-slides/bgp-security-irr-filtering.pdf | 16:00 |
frickler | yes, /48 is acceptable usually and the route6 object that mnaser created last year for this is still in place | 16:01 |
clarkb | gtema: git/2.33.1.gl1 is that you? | 16:01 |
clarkb | er is that your git client version? | 16:01 |
gtema | 2.35.3 | 16:02 |
gtema | and 2.37.3 from mac | 16:02 |
gtema | frickler - wrt as3320. I have a contact who knows a contact who ... So generally I can try, but it's definitely not going to be fast going my way | 16:03 |
*** dviroel|lunch is now known as dviroel | 16:05 | |
clarkb | gtema: interesting does that mean your heat clone that failed above ran for about half an hour? | 16:05 |
clarkb | (I'm trying to reconcile it with what I see in the logs) | 16:05 |
gtema | clarkb - nope. It either breaks after 3-4 min or I cancel it | 16:06 |
fungi | it's possible 30 minutes was when the lb gave up waiting for the client response | 16:06 |
clarkb | fungi: oh that could be | 16:06 |
fungi | because it never "got the memo" that the client went away | 16:06 |
clarkb | ya then its a server side disconnect at that point maybe | 16:07 |
fungi | right, we may have failsafe timeouts on the gitea end or in haproxy or it could be some state tracking middlebox elsewhere which terminated the session later | 16:07 |
fungi | maybe even an inactivity timeout from conntrack on the hypervisor host | 16:08 |
fungi | too many variables | 16:08 |
clarkb | but ya I see 200s and 307s. There are a couple of 401s that are curious but I have to assume those are due to parameters sent with the request that I can't view in the logs | 16:09 |
clarkb | separately I thought we had fixed the issue of having ports logged all the way through to trace requests and we definitely don't have that anymore | 16:09 |
clarkb | we need the port for the backend request logged in haproxy. Then we also need to have gitea log the port (it logs :0 for some reason) | 16:09 |
fungi | i thought we had it logged in apache on the gitea side? | 16:10 |
clarkb | fungi: we do but that is insufficient to trace a connection/request from haproxy through to gitea | 16:11 |
fungi | or maybe it was that we broke that logging when we added the apache layer | 16:11 |
clarkb | we need it in the gitea logs and haproxy as well to trace through the entire system | 16:11 |
clarkb | I'm going to look at that now as debugging this without that info is not pleasant | 16:11 |
fungi | i can't imagine how we lost the haproxy side log details, unless they changed their log formatting language | 16:12 |
clarkb | it seems we're ignoring the format specification for sure | 16:14 |
fungi | i see the source ports log-format "%ci:%cp [%t] %ft [%bi]:%bp %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/% | 16:15 |
fungi | sc/%rc %sq/%bq" | 16:15 |
fungi | grr, stray newline in my buffer | 16:15 |
fungi | but yes it doesn't seem to appear in the logs | 16:16 |
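(Stitched back together, the log-format line quoted above reads:)

    log-format "%ci:%cp [%t] %ft [%bi]:%bp %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq"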
*** jpena is now known as jpena|off | 16:17 | |
clarkb | [38.108.68.124]:52854 balance_git_https/gitea02.opendev.org 1/0/164628 | 16:17 |
clarkb | does that map to [%bi]:%bp %b/%s %Tw/%Tc/%Tt ? | 16:17 |
clarkb | and if so why is the frontend ip shown in the []s | 16:17 |
clarkb | oh that is because it is the source side of the connection | 16:18 |
clarkb | fungi: ok I think we have haproxy <-> apache info but not apache <-> gitea | 16:19 |
fungi | oh, right, we have to map the frontend entries to their corresponding backend entries in the haproxy log | 16:20 |
clarkb | the gitea log seems to be logging some forwarded-for entry and shows 38.108.68.124:0. I think I recall trying to add this info to the gitea log and apparently that doesn't work | 16:21 |
clarkb | using that extra bit of info gitea and apache appear to report an http 200 response for fatal: fetch-pack: invalid index-pack output | 16:22 |
clarkb | I think it must've failed on the client side and then not properly shut down the connection so the server eventually does it | 16:22 |
clarkb | maybe the bits are being flipped or there is some memory issue on the client; hard to say at this point, but good to know gitea seems to think it was fine | 16:23 |
clarkb | Ok and I did update the app.ini for gitea to record the request remote addr which I thought in testing was working | 16:28 |
clarkb | Either going through the haproxy breaks this or this is a gitea regression | 16:29 |
clarkb | https://github.com/go-chi/chi/issues/453 I think that is related | 16:32 |
fungi | https://github.com/go-chi/chi/issues/708 seems to propose an alternative which would preserve the port | 16:37 |
fungi | and there's this option: https://github.com/go-chi/chi/pull/518 | 16:38 |
fungi | clarkb: maybe the simple solution is to just tell apache mod_proxy not to set x-forwarded-for and then map them up from the logs ourselves? | 16:40 |
clarkb | I think gitea 1.14.0 broke this. This is when gitea migrated from macaron (which my commit for updating the access log format refers to) to chi | 16:40 |
clarkb | fungi: we'd need to record the port in gitea somehow and I'm not sure how to do that without x-forwarded-for. Or maybe you're saying it will fall back to the regular behavior? Ya that might make sense | 16:41 |
fungi | right, wondering if the realip middleware will just not replace it if it thinks the client address is already "real" (because of a lack of proxy headers) | 16:42 |
clarkb | we should be able to test that at least | 16:42 |
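(A sketch of that apache-side idea; the backend address is a placeholder, not the actual vhost config. ProxyAddHeaders Off stops mod_proxy from adding X-Forwarded-For/Host/Server:)

    # hypothetical vhost fragment -- proxy to gitea without the X-Forwarded-* headers
    ProxyAddHeaders Off
    ProxyPass        / http://127.0.0.1:3000/
    ProxyPassReverse / http://127.0.0.1:3000/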
clarkb | I need to eat something, but then I can look at updating the apache configs to try that | 16:43 |
clarkb | also I think we can drop the custom access log template now that gitea doesn't use macaron | 16:43 |
fungi | yeah, i still haven't had a chance to get my morning shower, so should take a break as well | 16:43 |
clarkb | its possible that may fix it too if we're accessing the remoteaddr differently (though I doubt it) | 16:43 |
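(For context, the relevant gitea app.ini knobs are roughly these -- a sketch, not the exact system-config change; with no ACCESS_LOG_TEMPLATE override gitea falls back to its built-in access log format:)

    [log]
    ENABLE_ACCESS_LOG = true
    ; no ACCESS_LOG_TEMPLATE override -- use gitea's default format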
clarkb | fungi: I'm not seeing any way to have apache record the port it uses to establish the proxy connection? | 17:48 |
clarkb | oh I see | 17:51 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea logs for better request tracing https://review.opendev.org/c/opendev/system-config/+/860010 | 17:58 |
clarkb | something like that maybe. We should be able to check the logs from the zuul jobs to confirm | 17:59 |
fungi | ah okay, so we were already setting a custom logformat in the vhost | 18:00 |
fungi | the suggestions i found were for adjusting loglevel for the proxy subsystem | 18:00 |
clarkb | fungi: ya I think we set the custom apache log to log the port of the upstream connection | 18:06 |
clarkb | previously this should've given us the same host:port pair in all three log files | 18:06 |
clarkb | but then gitea updated to chi and broke that | 18:06 |
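(What such a custom LogFormat might look like -- a sketch, not the exact vhost config; %{remote}p is the source port of the incoming connection, i.e. the port haproxy used, which is what lets the haproxy and apache logs be matched up:)

    # hypothetical apache access log including the client (haproxy) source port
    LogFormat "%a:%{remote}p %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_port
    CustomLog /var/log/apache2/gitea-ssl-access.log combined_port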
clarkb | fungi: I jumped onto the host and this seems to be working. I'm actually going to split this into two changes now because maybe our overriding of the access log template is at least partially to blame | 18:27 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea logs for better request tracing https://review.opendev.org/c/opendev/system-config/+/860010 | 18:30 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch back to default gitea access log format https://review.opendev.org/c/opendev/system-config/+/860017 | 18:30 |
clarkb | Looks like https://zuul.opendev.org/t/openstack/build/6622a1ec85d0476584db69047ed4673d/log/gitea99.opendev.org/logs/access.log#11 shows that simply using the default log format won't fix this. I think landing that is good cleanup though and it isn't a bigger regression | 19:47 |
clarkb | and we need to collect apache logs | 19:48 |
* clarkb writes another change | 19:48 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Collect apache logs from gitea99 host in testing https://review.opendev.org/c/opendev/system-config/+/860030 | 19:49 |
*** lbragstad1 is now known as lbragstad | 20:23 | |
*** dviroel is now known as dviroel|afk | 20:28 | |
*** frenzyfriday|rover is now known as frenzyfriday | 20:39 | |
opendevreview | Merged opendev/system-config master: Switch back to default gitea access log format https://review.opendev.org/c/opendev/system-config/+/860017 | 20:41 |
*** dasm is now known as dasm|off | 21:17 | |
opendevreview | Merged opendev/system-config master: Update gitea logs for better request tracing https://review.opendev.org/c/opendev/system-config/+/860010 | 21:22 |
clarkb | the testing update followup to ^ seems to show they both work in a way that is traceable now | 21:41 |
clarkb | its not as simple as finding the same string in three different log files, but it is doable | 21:41 |
fungi | yep | 21:41 |
clarkb | I did leave a message in the gitea dev discord channel (via matrix) asking if anyone knows why that is broken. | 21:44 |
clarkb | I still suspect go-chi but from what I can tell go-chi wants middleware installed to do that translation and go-chi isn't yet doing the :0 if no other valid value is found. That makes me think gitea itself may be doing it somehow | 21:44 |
opendevreview | Merged opendev/system-config master: Collect apache logs from gitea99 host in testing https://review.opendev.org/c/opendev/system-config/+/860030 | 22:44 |