mordred | corvus: gotcha. so there's just a weird error and it's not version tied. yay | 01:15 |
---|---|---|
openstackgerrit | Akihiro Motoki proposed openstack/project-config master: Define stable cores for horizon plugins in neutron stadium https://review.opendev.org/722682 | 01:15 |
openstackgerrit | Akihiro Motoki proposed openstack/project-config master: Define stable cores for horizon plugins in neutron stadium https://review.opendev.org/722682 | 01:24 |
*** DSpider has joined #opendev | 07:04 | |
frickler | infra-root: there is this failure from the service-nodepool playbook http://paste.openstack.org/show/792722/ which I assume is caused by https://review.opendev.org/721098 | 07:41 |
frickler | I'm going to try and remove the old dir and hope that that'll make the rsync work, but maybe we should avoid having symlinks in git dirs? or is there some better solution? | 07:42 |
frickler | I also don't like that the rsynced files are owned by 2031:2031, which is a userid that doesn't exist on the targets, do we have a plan to create the zuul user there? otherwise we'd likely rather make things owned by root IMO | 07:48 |
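For context, a hedged sketch of the rsync behavior under discussion, not the actual playbook task (paths and the target host are placeholders): `--force` lets rsync replace a non-empty directory on the destination when the source now has a non-directory, such as a symlink, at that path.

```shell
# hedged illustration only; paths/host are placeholders, not the real playbook
# --force: allow deleting a non-empty destination directory when the source
# has a non-directory (e.g. a symlink) at the same path
rsync -av --force project-config/ nb04.opendev.org:/etc/project-config/
```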
frickler | infra-root: we also seem to have never given the "all-clear" that was mentioned in the last status notice, from scrollback I'd assume we could do that now? | 07:54 |
*** roman_g has joined #opendev | 07:54 | |
frickler | next pulse seems to have run that playbook successfully | 08:13 |
frickler | now I see that I wouldn't have needed this, because the focal image has already been built successfully on nb04 | 08:20 |
frickler | well, "successfully", because it is giving me node-failures on https://review.opendev.org/704831 . will try to debug later, got to do some plumbing first | 08:21 |
*** roman_g has quit IRC | 08:36 | |
AJaeger | frickler: let's give the all-clear... | 08:43 |
AJaeger | #status notice Zuul is happy testing changes again, changes with MERGER_FAILURE can be rechecked. | 08:44 |
openstackstatus | AJaeger: sending notice | 08:44 |
-openstackstatus- NOTICE: Zuul is happy testing changes again, changes with MERGER_FAILURE can be rechecked. | 08:44 | |
openstackstatus | AJaeger: finished sending notice | 08:48 |
*** tosky has joined #opendev | 08:57 | |
*** elod has quit IRC | 09:26 | |
*** elod has joined #opendev | 09:26 | |
*** tbarron_ is now known as tbarron | 11:43 | |
frickler | humm, can't get nodes when we don't launch'em. patch upcoming | 11:53 |
openstackgerrit | Jens Harbott (frickler) proposed openstack/project-config master: Launch focal nodes https://review.opendev.org/723213 | 11:59 |
frickler | infra-root: ^^ that'd be the whole thing, or would we want to do some more testing or a slow start first? I have a devstack patch waiting that went fine on a local instance running the stock Ubuntu cloud image | 12:00 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: hlint: add haskell source code suggestions job https://review.opendev.org/722309 | 12:05 |
tristanC | clarkb: i haven't noticed a difference between git versions with regard to .git dir presence. It seems like the rule is that a valid gitdir requires the `.git/refs` directory to exist. | 12:06 |
tristanC | clarkb: the difference was how dangling refs were handled, and old git was not removing directories efficiently, thus there is a custom function in zuul that does extra cleanup, and this function was apparently too aggressive. | 12:10 |
tristanC | clarkb: the question is how to reproduce that behavior, because it only happened when the workdir didn't have any `heads` or `tags` | 12:12 |
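If it helps with reproduction, here's a rough sketch (not the zuul code) of getting a clone into a state where `.git/refs` still exists but holds no loose heads or tags, which is roughly the situation described above; the repo URL is just an example.

```shell
# rough sketch, assuming any small public repo
git clone https://opendev.org/opendev/sandbox.git repo && cd repo
git pack-refs --all            # loose refs are moved into .git/packed-refs
find .git/refs -type f         # likely prints nothing: heads/ and tags/ are empty
git rev-parse --git-dir        # still a valid git dir as long as .git/refs exists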
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Stop translation stable branches on some projects https://review.opendev.org/723217 | 12:29 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Stop translation stable branches on some projects https://review.opendev.org/723217 | 12:52 |
yoctozepto | https://bugs.launchpad.net/devstack/+bug/1875075 | 13:29 |
openstack | Launchpad bug 1875075 in devstack "devstack setup fails: Failed to connect to opendev.org port 443: Connection refused" [Undecided,Opinion] | 13:29 |
yoctozepto | seems gitea is not doing good today :-( | 13:30 |
fungi | which would be odd since connection refused would be coming from the load balancer, not from a gitea host | 13:43 |
fungi | right now i'm able to reach 443/tcp on the load balancer's ipv4 and ipv6 addresses | 13:45 |
fungi | is it intermittent? | 13:45 |
fungi | trending on elastic-recheck? | 13:45 |
mordred | fungi, frickler: I have a fix lined up for the nodepool playbook issue - but we were fighting fires yesterday so I didn't want to do it | 13:46 |
mordred | I'll go ahead this morning | 13:46 |
yoctozepto | fungi: intermittent, as per bug report | 13:46 |
yoctozepto | not blaming backend, "gitea" = "service thereof", as seen by the great public | 13:47 |
yoctozepto | us, humble bread eaters | 13:47 |
fungi | tcp rst (connection refused) would have to be coming from the haproxy server directly, since it's configured to act as a tcp socket proxy, so it wouldn't route a connection refused from the gitea backends | 13:55 |
fungi | though it could be that at times none of the 8 backends are reachable and the pool is empty | 13:56 |
fungi | it's possible clearing all the git repo caches on our zuul mergers and executors (20 in all) has inflicted a partial ddos against our git servers as builds cause them to clone one repository or another in clusters | 14:02 |
corvus | fungi: zuul doesn't know about gitea | 14:02 |
mordred | fungi, frickler: puppet issue with set-hostname fixed | 14:03 |
fungi | corvus: oh so those are cloning from gerrit anyway | 14:04 |
fungi | yeah, seemed like a stretch regardless | 14:04 |
corvus | ya | 14:04 |
fungi | and the cacti graphs so far aren't showing anything out of the ordinary for the haproxy or gitea servers | 14:04 |
mordred | frickler: (your solution was the right one - it was an unfortunate issue of replacing a dir with a symlink) - also - I agree, 2031 is ugly, how about we update that to push them onto the remote hosts as root | 14:06 |
yoctozepto | I started getting it today, the reporter probably as well | 14:07 |
yoctozepto | it's very rare, but happens on clone/pull | 14:08 |
yoctozepto | it never happened from browser but it's probably because I only visited it after it failed on git :-) | 14:08 |
yoctozepto | I thought it was my connection, but this report made me share the problem with you | 14:09 |
fungi | i'm being told i have to get off the computer and do some chores, but i can try cloning from all the backends later just to see if i can get it to reproduce any errors | 14:10 |
fungi | this is a snapshot of the current pool state: http://paste.openstack.org/show/792729/ | 14:10 |
mordred | morning corvus ! interesting error from zuul smart-reconfigure: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ae8/723107/3/check/system-config-run-zuul/ae8d99a/bridge.openstack.org/ara-report/result/ba99d303-7226-4f15-a341-c10935a45c80/ | 14:19 |
mordred | I wonder - docker-compose exec by default allocates a tty - since we're running this in ansible, do we need to do docker-compose exec -T to disable allocating a tty? | 14:21 |
corvus | mordred: i dunno -- maybe try it out with "ansible" on bridge? | 14:21 |
mordred | corvus: safe to run a smart-reconfigure now right? | 14:21 |
corvus | mordred: yeah, i think it should be safe any time | 14:21 |
mordred | kk | 14:22 |
mordred | should we add -f to smart-reconfigure to run it in the foreground and get output? or will that matter for this? | 14:22 |
corvus | mordred: it's an async command either way, so won't matter | 14:22 |
mordred | corvus: yup - the tty error was from docker-compose - -T fixed it | 14:24 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run smart-reconfigure instead of HUP https://review.opendev.org/723107 | 14:25 |
mordred | corvus: also - yay for testing actually catching that :) | 14:25 |
corvus | \o/ | 14:26 |
mordred | corvus: -T is consistent with how we're running docker-compose exec elsewhere in ansible fwiw | 14:26 |
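For reference, a minimal sketch of the fix being described; the compose service name ("scheduler") is an assumption here, not necessarily what the playbook actually uses.

```shell
# -T disables pseudo-tty allocation, which is what fails when docker-compose
# exec is invoked from ansible rather than from an interactive shell
docker-compose exec -T scheduler zuul-scheduler smart-reconfigure
```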
openstackgerrit | Merged openstack/project-config master: Launch focal nodes https://review.opendev.org/723213 | 14:26 |
mordred | corvus: oh, I also fixed your comment from https://review.opendev.org/#/c/723105/ | 14:27 |
mordred | corvus: re: stop playbook - are systemctl stop and docker-compose down not blocking? I would have thought that each of them would be responsible for not returning until the thing was stopped :( | 14:29 |
corvus | mordred: i imagine docker-compose down is, but systemctl stop is not, which is why the zuul_restart playbook waits for the service to stop | 14:31 |
mordred | corvus: nod | 14:31 |
mordred | corvus: maybe instead of adding those new playbooks we should add tags to zuul_stop, zuul_start and zuul_restart. or I guess we could actually just use those with limit | 14:32 |
mordred | now that I think about it - those with limit actually sounds like the most flexible thing | 14:33 |
mordred | hrm. except for scheduler and web. | 14:35 |
*** sgw has joined #opendev | 14:38 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Rework zuul start/stop/restart playbooks for docker https://review.opendev.org/723048 | 14:49 |
mordred | corvus: maybe something like that ^^ is better | 14:50 |
mordred | corvus: oh - you're going to _love_ the latest failure on building multi-arch images | 15:00 |
mordred | corvus: it looks like it's ignoring /etc/hosts and doing a dns lookup directly: https://zuul.opendev.org/t/zuul/build/a487b41ba7334baca1af0b67ea21f04f/console#2/5/21/builder | 15:01 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 15:01 |
mordred | I'm gonna grab /etc/hosts just to double check - but the "add buildset registry to /etc/hosts" task ran, so I'm pretty confident it's there | 15:02 |
mordred | OH! maybe since there is a "builder" container, the push is actually happening inside of that container - yup: https://github.com/docker/buildx/issues/218 | 15:05 |
corvus | mordred: what do you think that last comment means? | 15:07 |
corvus | did they run a dnsmasq service and somehow configure the buildkit container to use that as its resolver? | 15:08 |
mordred | yeah - I think that's likely what they did | 15:10 |
mordred | corvus: also - note the thing above where the person adds registry mirror config when creating the builder container | 15:10 |
mordred | I think we probably need to do that so that the builders mirror config is right | 15:11 |
corvus | yeah :/ | 15:11 |
corvus | so basically, we get to start over from scratch for buildx | 15:11 |
mordred | but lacking support for /etc/hosts ... | 15:11 |
mordred | yeah | 15:11 |
mordred | corvus: I wonder ... the builder container is a running container ... we could exec in to it and edit its /etc/hosts | 15:12 |
corvus | mordred: between the create and build steps? | 15:13 |
corvus | i wonder if there's a way to bind-mount the files in? though if there were, i would have expected that to come up on that issue | 15:13 |
mordred | corvus: yeah - I just exec'd into a sh in my test mybuilder container | 15:14 |
mordred | and /etc/hosts and resolv.conf are both there as expected | 15:15 |
mordred | corvus: https://github.com/moby/buildkit/blob/master/docs/buildkitd.toml.md is the file that one can pass with the --config option to create | 15:22 |
mordred | corvus: and I have confirmed that passing a file to that on the create line will cause the config to show up in /etc/buildkit in the worker container | 15:24 |
mordred | so - I think we need to make sure we can create a buildkitd.toml file with the registry mirror info ... that might not be buildx-specific - it looks like buildkit understands multi-repo like buildah does, so it's possible we could achieve multi-repo with docker if we enable buildkit (waving hands) | 15:25 |
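Something along these lines is presumably what's needed: a buildkitd.toml carrying the registry mirror, handed to the builder at creation time. The mirror hostname and builder name below are made up for illustration.

```shell
# write a buildkitd config with a registry mirror (hostname is an example)
cat > buildkitd.toml <<'EOF'
[registry."docker.io"]
  mirrors = ["mirror.example.org:8082"]
EOF

# pass it to the builder when it is created
docker buildx create --name mybuilder --config buildkitd.toml --use
```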
corvus | mordred: so we might be able to use the containers-style registry config for that | 15:25 |
mordred | yeah | 15:25 |
corvus | that'd be a good reason to do this for everything | 15:25 |
mordred | yeah | 15:25 |
mordred | although we can use buildkit even without buildx - I don't know if buildkitd.toml works with normal docker when buildkit is enabled - but I'd imagine so? | 15:26 |
mordred | then we still need to edit /etc/hosts -but we might even be able to just do a docker copy ... let me try that real quick | 15:26 |
mordred | weirdly the /etc/hosts in the container has "172.17.0.3 239c1ae2f24b" ... so maybe that's for the container referencing itself by id | 15:27 |
mordred | so we might need to edit the file anyway | 15:27 |
corvus | mordred: i wonder: do we need to edit /etc/hosts if we can define the registry mirrors? | 15:28 |
mordred | corvus: I don't know | 15:28 |
corvus | is it possible we can specify those by ip address? (and, of course, we'd need to see if ipv6 works) | 15:28 |
mordred | corvus: however - we can do docker cp {container id}:/etc/hosts hosts | 15:29 |
mordred | then edit hosts | 15:29 |
mordred | then docker cp the file back | 15:29 |
mordred | so we could re-use our ansible lineinfile we already have | 15:29 |
mordred | corvus: I'm going to try just starting with the /etc/hosts editing like that and see if it at least pushes to the buildset registry properly | 15:32 |
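A quick sketch of the docker cp round-trip described above; the builder container name and the buildset-registry address are hypothetical.

```shell
# copy the builder container's /etc/hosts out, append the registry entry,
# and copy it back (container name and address are examples only)
docker cp buildx_buildkit_mybuilder0:/etc/hosts hosts
echo "203.0.113.10 buildset-registry.example.org" >> hosts
docker cp hosts buildx_buildkit_mybuilder0:/etc/hosts
```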
clarkb | is there anyone looking at fixing zuul.openstack.org's ssl cert? | 15:32 |
mordred | clarkb: what needs to be fixed? it should be part of the subject altname for zuul.opendev.org | 15:32 |
clarkb | mordred: it doesn't appear to be. My phone says it's a cert for opendev.org and complains | 15:32 |
mordred | clarkb: https://opendev.org/opendev/system-config/raw/branch/master/playbooks/host_vars/zuul01.openstack.org | 15:33 |
mordred | ah - you know what - I bet we added it after we got the initial cert | 15:33 |
mordred | clarkb: so we might need to remove the letsencrypt files so that we re-request for both | 15:33 |
corvus | what caused it to change just now? | 15:33 |
clarkb | also re gitea connection resets: I think that can happen when we restart containers | 15:33 |
mordred | I forget which files ianw figured out it should be | 15:33 |
clarkb | fungi: yoctozepto basically when mariadb or our gitea images update | 15:34 |
corvus | cause it looks like we're right in the middle of cert validity | 15:34 |
clarkb | mordred: corvus they were two separate certs before | 15:34 |
corvus | before what? | 15:34 |
clarkb | corvus: guessing ansible zuul combines them into one LE cert | 15:34 |
clarkb | corvus: before the ansibling | 15:35 |
mordred | yeah - it should be all served from the le opendev cert now | 15:35 |
clarkb | corvus: so on thursday with puppet it was two separate certs | 15:35 |
corvus | oh, this is a new cert, which we requested in the middle of march, in preparation for switching to ansible, and we started using it yesterday? | 15:35 |
corvus | or friday | 15:35 |
mordred | yeah | 15:35 |
corvus | ok | 15:35 |
corvus | i agree, sounds like asking le for a new cert is the way to go | 15:36 |
corvus | not before: 3/9/2020, 1:22:50 PM (Pacific Daylight Time) | 15:36 |
clarkb | mordred: I think something in a .dir or .file in root's homedir | 15:36 |
mordred | ianw figured out which files need to be blanked out to force that | 15:36 |
mordred | yeah | 15:36 |
mordred | I think it got documented? | 15:36 |
clarkb | that it uses to know when to refresh the cert | 15:36 |
corvus | f0b77485ec (Monty Taylor 2020-04-05 09:25:28 -0500 5) - zuul.openstack.org | 15:36 |
corvus | mordred: i think those timestamps confirm your theory | 15:37 |
mordred | corvus: ++ | 15:37 |
mordred | Refreshing keys section in letsencrypt docs | 15:37 |
corvus | https://docs.openstack.org/infra/system-config/letsencrypt.html#refreshing-keys | 15:37 |
mordred | yup | 15:37 |
mordred | clarkb: for the gitea thing - should we update the gitea playbook to remove a gitea backend from the lb when we're ansibling it? or I guess since we're doing tcp lb that wouldn't be any better would it? | 15:39 |
clarkb | mordred: we'd need to remove it from haproxy, wait for connections to stop, then do docker then add it back | 15:40 |
clarkb | mordred: maybe give it a 120 second timeout on the wait for connections to drop | 15:40 |
mordred | clarkb: seems like a thing we'd only really want to do if the image had updated - and we don't really have a great way to track whether compose is going to restart it or not atm | 15:41 |
mordred | I mean - it would always be safe - but it would make the playbook take a long time to run | 15:41 |
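If we did go that route, the shape of it would presumably be something like the below, driven via the haproxy admin socket; the socket path and backend/server names are guesses, and the 120-second cap is clarkb's suggestion above.

```shell
# take one backend out of rotation, let its sessions drain, update gitea,
# then re-enable it; names/paths here are assumptions, not the real config
echo "disable server balance_git_https/gitea01.opendev.org" | socat stdio /var/haproxy/run/stats
sleep 120   # crude stand-in for "wait for connections to drop"
# ... update/restart the gitea containers on gitea01 here ...
echo "enable server balance_git_https/gitea01.opendev.org" | socat stdio /var/haproxy/run/stats
```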
mordred | clarkb, corvus : are any of us doing that mv on zuul01? I can if y'all aren't already | 15:42 |
fungi | clarkb: in this case it sounded like more than a momentary blip. have we been restarting the gitea and/or mariadb containers a bunch in the last 24 hours? | 15:43 |
mordred | nope | 15:43 |
mordred | we haven't pushed a new gitea recently | 15:43 |
clarkb | mariadb updates semi often | 15:43 |
mordred | I have renamed the conf files for zuul01 | 15:44 |
mordred | next ansible should fix it | 15:44 |
clarkb | mordred: I'm still on a phone | 15:44 |
clarkb | mordred: we actually do know when compose is going to stop and start stuff | 15:44 |
fungi | it sounded more like when someone is running devstack locally (such that it clones a bunch of openstack repos fresh), some small percentage of those encounter a connection refused for opendev.org (reported independently by yoctozepto and also the lp bug against devstack) | 15:44 |
clarkb | that got added in the safe shutdown ordering | 15:44 |
fungi | and supposedly just in the past day | 15:45 |
fungi | entirely possible this is a problem with some backbone peering point shared by both reporters, i suppose | 15:46 |
mordred | fungi: is it worth putting in a clone retry in devstack for when people are running it locally? I mean - network failures happen | 15:46 |
fungi | (wherein something is sending a tcp/rst on behalf of opendev.org in response to the client's tcp/syn i suppose, otherwise it would show up as a connection reset by peer or a timeout or whatever) | 15:47 |
fungi | mordred: possible that's something the devstack maintainers want to consider | 15:47 |
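Not devstack's actual code, but the kind of retry wrapper being suggested would be roughly this; the repo URL is just an example.

```shell
# retry a clone a few times before giving up on a transient network failure
for attempt in 1 2 3; do
    git clone https://opendev.org/openstack/bifrost.git && break
    echo "clone failed (attempt $attempt), retrying..." >&2
    sleep 10
done
```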
clarkb | it could also be OOMs again | 15:47 |
mordred | clarkb: good point | 15:47 |
fungi | anyway, i'm not immediately seeing any issues, no oom either | 15:50 |
fungi | most recent oom was gitea08 on 2020-04-07 | 15:50 |
fungi | and shortest vm uptime is 66 days | 15:50 |
clarkb | then I'm out of ideas :) | 15:51 |
fungi | gitea web processes all last restarted sometime yesterday utc | 15:52 |
fungi | according to ps | 15:52 |
clarkb | fungi: probably due to mariadb update | 15:53 |
fungi | mysqld on all of them was also last restarted sometime yesterday, yes | 15:53 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 15:53 |
mordred | corvus: ^^ there's a stab at fixing /etc/hosts | 15:53 |
fungi | so doesn't appear to be virtual machines rebooting, containers restarting, out of memory condition... cacti graphs look typical for everything, haproxy pool seems normal at the moment | 15:54 |
fungi | only other thing i can think to try is load-testing them in an attempt to reproduce the issue and then try to match up connections | 15:54 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 15:57 |
mordred | corvus: please enjoy the difference between the last 2 patchsets | 15:57 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run smart-reconfigure instead of HUP https://review.opendev.org/723107 | 16:01 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Rework zuul start/stop/restart playbooks for docker https://review.opendev.org/723048 | 16:18 |
*** roman_g has joined #opendev | 16:23 | |
frickler | cool, devstack on focal runs fine for a bit, then it crashes mysql8, can reproduce locally https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1b1/704831/5/check/devstack-platform-focal/1b12f0b/controller/logs/mysql/error_log.txt | 16:25 |
frickler | but our image seems to work fine \o/ | 16:25 |
corvus | mordred: wow :) | 16:28 |
mordred | frickler: neat | 16:31 |
*** roman_g has quit IRC | 16:44 | |
clarkb | it's almost like it crashes just by starting | 16:44 |
clarkb | mordred: how often do we run the LE playbook? is it part of hourly? | 16:46 |
clarkb | (just wondering when I should check the cert for zuul again. Maybe after yard work?) | 16:46 |
clarkb | fungi: fwiw there was an OOM on lists yesterday but not one today | 16:47 |
clarkb | fungi: our large qrunner process for the openstack list is the largest process by memory use during that period, but then it shrinks to 1/10th its original size (maybe smaller) and listinfo processes (which are run by the web ui) end up taking over, however they are relatively small in size | 16:54 |
clarkb | fungi: that makes me think it is not a qrunner after all, because you'd expect it to grow, or to see the large process change to a different qrunner (the one that is growing), if that were the case | 16:54 |
clarkb | fungi: during the period before and after the OOM we are getting crawled by a SEMrush bot | 17:00 |
clarkb | that's basically the only web traffic (and there isn't a ton of it, but maybe mailman doesn't like it?) | 17:00 |
clarkb | anyway back to my weekend now. I think the good news is that smtp traffic doesn't seem to be a factor or we'd expect the qrunners to grow | 17:00 |
fungi | i've seen apache eating large amounts of memory, but that may just be file buffers | 17:03 |
fungi | also could be shared memory | 17:03 |
fungi | hard to really tell looking at the oom reports | 17:04 |
*** dpawlik has joined #opendev | 18:45 | |
fdegir | we've been having clone issues with opendev since yesterday and i thought it was a connection issue on our side, but seeing https://bugs.launchpad.net/devstack/+bug/1875075 made me think it is not | 18:58 |
openstack | Launchpad bug 1875075 in devstack "devstack setup fails: Failed to connect to opendev.org port 443: Connection refused" [Undecided,Opinion] | 18:58 |
fdegir | it is still happening randomly - it works for some repos and doesn't work for others | 19:02 |
AJaeger | fdegir: for which repos does it fail for you? | 19:10 |
AJaeger | fdegir: I think it's one of our git hosts | 19:12 |
fdegir | AJaeger: here are the ones we are frequently having issues with cloning | 19:14 |
AJaeger | infra-root, I tried cloning manually from the gitea hosts, and the response time really varied. Normally: 1 to 2 seconds, gitea05 now 17s, retry only 1.7 s | 19:14 |
fdegir | https://opendev.org/openstack/diskimage-builder | 19:14 |
fdegir | https://opendev.org/openstack/sushy | 19:15 |
fdegir | https://opendev.org/x/ironic-staging-drivers | 19:15 |
fdegir | https://opendev.org/openstack/python-ironic-inspector-client.git | 19:16 |
AJaeger | fdegir: I'm trying diskimage-builder now on all gitea hosts... | 19:16 |
AJaeger | fdegir: is that reproducible? Meaning, does a second clone work? | 19:16 |
fdegir | https://opendev.org/openstack/bifrost.git | 19:16 |
fdegir | AJaeger: it is not always the same repos across the runs | 19:16 |
AJaeger | fdegir: thanks | 19:17 |
fdegir | AJaeger: it works for bifrost and fails on dib, and the next time it fails on bifrost and works for dib | 19:17 |
AJaeger | I was just able to clone dib from all 8 hosts directly... | 19:17 |
AJaeger | so, could be a network or load-balancer problem | 19:17 |
fdegir | AJaeger: if you want the list to try, you can use the repos cloned by bifrost as an example, as we see the failures while bifrost clones the repos | 19:17 |
fdegir | https://opendev.org/openstack/bifrost/src/branch/master/playbooks/roles/bifrost-prep-for-install/defaults/main.yml | 19:18 |
AJaeger | fdegir: I hope one of the admins can look at it later, I was just trying the obvious things | 19:18 |
fdegir | AJaeger: yep, thanks | 19:18 |
fdegir | the repos listed in bifrost defaults could be a good set of repos to try, as we haven't been able to install bifrost since yesterday | 19:19 |
AJaeger | fdegir: is that failing on your system - or in Zuul? | 19:21 |
openstackgerrit | Andreas Jaeger proposed opendev/system-config master: Remove git0*.openstack.org https://review.opendev.org/723251 | 19:24 |
AJaeger | fdegir: locally I had no problems cloning dib, hope somebody has a better idea on how to debug | 19:26 |
fdegir | AJaeger: yes, our Jenkins jobs are failing | 19:29 |
AJaeger | fdegir: Ok. Sorry, I can't help further - and don't know when an admin will be around, you might need to wait a bit longer. | 19:34 |
fungi | fdegir: AJaeger: yeah, i looked into it earlier, no sign of any systems in distress. can you confirm the error you get is always "connection refused"? | 19:34 |
AJaeger | fungi: no problems on my side, just twice a response that took 10 times longer... | 19:35 |
fungi | opendev.org is a haproxy load balancer in socket proxying mode, not routing, so it would be odd for anything besides the load balancer to be what's emitting the tcp/rst which would indicate connection refused | 19:35 |
fungi | i also checked and none of those systems (frontend or backends) rebooted or had containers downed/upped today | 19:36 |
fungi | and the cacti graphs for all of them don't show any particularly anomalous behavior | 19:36 |
clarkb | and gitea was upgraded thursday but this sounds more recent? | 19:38 |
fungi | i'd be interested to see where all the clients getting the connection refused behavior are coming from (seems like all the reports might be from europe so far? the servers are all in california i think) | 19:38 |
clarkb | fungi: ya servers are all in california | 19:38 |
clarkb | and ya some source IPs would help us debug in logs | 19:38 |
fungi | wondering if it might not be our systems, but something along the path emitting tcp/rst on behalf of the server addresses | 19:39 |
fdegir | clone operations time out | 19:39 |
fdegir | fatal: unable to access 'https://opendev.org/openstack/bifrost.git/': Failed to connect to opendev.org port 443: Connection timed out | 19:40 |
fdegir | and yes, im in europe | 19:41 |
clarkb | oh so it fails to tcp at all | 19:41 |
clarkb | ya might try forcing ipv4 then ipv6 to see if you get different behavior? | 19:41 |
clarkb | we have had ipv6 routing oddities in that cloud before... | 19:42 |
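For anyone wanting to test that from the reporting side, git can pin the address family directly; the repo is just an example.

```shell
git clone -4 https://opendev.org/openstack/bifrost.git   # force IPv4
git clone -6 https://opendev.org/openstack/bifrost.git   # force IPv6
```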
fungi | though i did test both v4 and v6 from home earlier and was able to connect over both | 19:54 |
fungi | but yeah, connection timed out is not connection refused, so we have additional behaviors to investigate | 19:54 |
fungi | and then there's AJaeger saying he's getting very slow transfer intermittently | 19:55 |
*** dpawlik has quit IRC | 19:56 | |
fungi | so that's three different network misbehavior patterns which usually indicate different sorts of problems, but could also all be related to some general network connectivity problem (e.g., overloaded backbone peering point getting preferred in bgp) | 19:57 |
fungi | after i'm finished making dinner i'll see if i can reproduce any of these issues from systems connected through different carriers | 19:59 |
*** dpawlik has joined #opendev | 20:09 | |
AJaeger | I just updated all my repos (full checkout of non-retired opendev repos) - no problems at all ... | 20:10 |
fungi | also come to think of it, we're still doing source-based hash for load distribution, so all connections from the same client should get persisted to the same backend unless it gets taken out of rotation due to an outage | 20:12 |
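That is, roughly this shape of config: TCP-mode proxying with source-hash persistence. Server names, addresses, and ports below are illustrative, not the real opendev configuration.

```shell
# write a hedged illustration of the balancing mode described above to a
# scratch file; names, addresses and ports are examples only
cat >> haproxy.cfg.example <<'EOF'
listen balance_git_https
    bind :::443 v4v6
    mode tcp
    balance source
    server gitea01.opendev.org 203.0.113.11:3081 check
    server gitea02.opendev.org 203.0.113.12:3081 check
EOF
```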
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: stack-test: add haskell tool stack test https://review.opendev.org/723263 | 20:19 |
*** dpawlik has quit IRC | 20:31 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: stack-test: add haskell tool stack test https://review.opendev.org/723263 | 20:50 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: stack-test: add haskell tool stack test https://review.opendev.org/723263 | 20:59 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: stack-test: add haskell tool stack test https://review.opendev.org/723263 | 21:14 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: stack-test: add haskell tool stack test https://review.opendev.org/723263 | 21:21 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Add sibling container builds to experimental queue https://review.opendev.org/723281 | 22:53 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: use stage3 instead of stage4 for gentoo builds https://review.opendev.org/717177 | 22:53 |
ianw | https://zuul.openstack.org/ is throwing me a security error | 22:56 |
fungi | ianw: i think we're pending a cert re-replacement on that redirect site | 23:00 |
ianw | fungi: pending as in just wait, or pending as in someone has to do something? | 23:02 |
fungi | as in it sounded like mordred thought ansible was going to overwrite it with the proper cert again | 23:02 |
fungi | note that zuul.opendev.org is the canonical url at this point | 23:02 |
ianw | right, but going through status.openstack.org gets you the security error though | 23:03 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: use stage3 instead of stage4 for gentoo builds https://review.opendev.org/717177 | 23:04 |
ianw | ok, it looks like that was in scrollback about 6 hours ago, i'll investigate what's going on with the cert | 23:06 |
*** gouthamr has quit IRC | 23:11 | |
*** gouthamr has joined #opendev | 23:13 | |
ianw | f0b77485ec (Monty Taylor 2020-04-05 09:25:28 -0500 5) - zuul.openstack.org | 23:16 |
ianw | it really seems like zuul isn't matching in service-letsencrypt.yaml.log logs ... odd | 23:18 |
ianw | right, zuul is in the emergency file ... so that explains that ... although it's in there with no comment | 23:26 |
clarkb | ianw I think it's in there due to the reload of tenant config needing a fix | 23:26 |
clarkb | the current setup kills gearman when that happens | 23:27 |
ianw | ok ... i'm also not seeing host _acme-challenge.zuul.openstack.org | 23:28 |
ianw | that was probably why it hasn't been working since the initial add, although now it's not refreshing due to the emergency host situation | 23:33 |
ianw | #status log added _acme-challenge.zuul.openstack.org CNAME to acme.opendev.org | 23:33 |
openstackstatus | ianw: finished logging | 23:33 |
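A quick way to confirm the record landed; the expected answer is taken from the status log above.

```shell
dig +short _acme-challenge.zuul.openstack.org CNAME
# expect: acme.opendev.org.
```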
ianw | the idea of taking zuul/gearman/? down when i'm not really sure what's going on isn't terribly appealing at 9am on a monday :) | 23:34 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: status.openstack.org: send zuul link to opendev zuul https://review.opendev.org/723282 | 23:43 |