Monday, 2020-04-27

*** tosky has quit IRC		00:09
*** sgw has quit IRC		00:16
*** DSpider has quit IRC		00:19
openstackgerrit	Ian Wienand proposed opendev/glean master: Add container build jobs https://review.opendev.org/723285	00:29
openstackgerrit	Merged opendev/system-config master: status.openstack.org: send zuul link to opendev zuul https://review.opendev.org/723282	01:14
openstackgerrit	Merged opendev/system-config master: Cron module wants strings https://review.opendev.org/723106	01:39
openstackgerrit	Merged openstack/diskimage-builder master: Add sibling container builds to experimental queue https://review.opendev.org/723281	02:07
*** rkukura has quit IRC		02:13
*** rkukura has joined #opendev		02:27
ianw	mordred: there is something going on with puppet apply where it's somehow restoring back to an old change	02:49
ianw	remote_puppet_else.yaml.log.2020-04-27T01:24:05Z:Notice: /Stage[main]/Openstack_project::Status/Httpd::Vhost[status.openstack.org]/File[50-status.openstack.org.conf]/content: content changed '{md5}9185a2797200c84814be8c05195800fa' to '{md5}c9a8216d842c5c83e6910eb41d4d91ee'	02:49
ianw	remote_puppet_else.yaml.log.2020-04-27T01:35:36Z:Notice: /Stage[main]/Openstack_project::Status/Httpd::Vhost[status.openstack.org]/File[50-status.openstack.org.conf]/content: content changed '{md5}c9a8216d842c5c83e6910eb41d4d91ee' to '{md5}9185a2797200c84814be8c05195800fa'	02:49
ianw	the 01:24 run updated it, then the 01:35 run un-updated it, i think	02:50
ianw	deploy 723282,1 9 mins 16 secs 2020-04-27T01:23:44	02:51
clarkb	ianw: I think that's a zuul bug that corvus found	02:51
ianw	opendev-prod-hourly master 9 mins 16 secs 2020-04-27T01:35:16	02:51
clarkb	it uses the change merged against master and that is racy	02:52
ianw	clarkb: hrm, i think it was the opendev-prod-hourly that has seemed to revert the change, that should have seen the new change?	02:54
ianw	the hourly job checked out system-config master to 2020-04-27 01:35:45.184204 | bridge.openstack.org | 2e2be9e6873ffe7dd07d84792b2bbef47e901f02 Merge "Fix zuul.conf jinja2 template"	02:55
clarkb	hrm maybe another bug of similar variety?	02:56
clarkb	like maybe deploy ran out of order so deploy hourly ran head^ ?	02:56
ianw	if i'm correct in calculating https://opendev.org/opendev/system-config/commit/1d0d62c6a61159038be5c4e98bebb0e232131f56 merged at 2020-04-26 23:42 ... so several hours before the hourly job	02:58
ianw	going to see if i can come up with a timeline in https://etherpad.opendev.org/p/DSxEB-ViHzEHDMgxAJDp	03:00
ianw	the next run, running now, appears to have applied it	03:32
*** factor has joined #opendev		03:34
*** ykarel\|away is now known as ykarel		04:30
openstackgerrit	Merged zuul/zuul-jobs master: Update ensure-javascript-packages README https://review.opendev.org/722354	04:52
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: [wip] ensure-virtualenv https://review.opendev.org/723309	04:57
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: [wip] ensure-virtualenv https://review.opendev.org/723309	05:00
*** ysandeep\|away is now known as ysandeep		05:12
*** jaicaa has quit IRC		05:18
*** jaicaa has joined #opendev		05:20
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: [wip] ensure-virtualenv https://review.opendev.org/723309	05:37
openstackgerrit	Ian Wienand proposed openstack/diskimage-builder master: [wip] plain nodes https://review.opendev.org/723316	05:41
*** dpawlik has joined #opendev		05:56
AJaeger	infra-root, I just saw a promote job fail with timeout uploading to AFS, see https://zuul.opendev.org/t/openstack/build/413faee223e54bc1bca7051a7b49c59b	05:58
ianw	AJaeger: hrm, weird; i just checked that dir, and even touched and rm'd a file there and it was ok	06:00
ianw	/afs/.openstack.org/docs/devstack-plugin-ceph	06:01
openstackgerrit	Merged openstack/project-config master: Add Airship subproject documentation job https://review.opendev.org/721328	06:04
AJaeger	ianw: might be a temporary networking problem ;(	06:25
openstackgerrit	Andreas Jaeger proposed openstack/project-config master: Stop translation stable branches on projects without Dashboard https://review.opendev.org/723217	06:35
*** iurygregory has quit IRC		07:09
*** iurygregory has joined #opendev		07:10
*** DSpider has joined #opendev		07:22
*** rpittau\|afk is now known as rpittau		07:22
*** tosky has joined #opendev		07:26
*** sshnaidm\|afk is now known as sshnaidm		07:35
*** ysandeep is now known as ysandeep\|lunch		08:16
*** logan_ has joined #opendev		08:31
*** logan- has quit IRC		08:32
*** logan_ is now known as logan-		08:35
*** ykarel is now known as ykarel\|lunch		08:44
hrw	zuul runs all using ansible. how to force it to use py3 on zuul?	09:02
hrw	2020-04-24 12:47:53.223078 | primary | "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_pip_payload_Ffk1eE/__main__.py\", line 254, in <module>\n from pkg_resources import Requirement\nImportError: No module named pkg_resources\n",	09:05
hrw	2020-04-24 12:47:53.223192 | primary | "msg": "Failed to import the required Python library (setuptools) on debian-buster-arm64-linaro-us-0016157969's Python /usr/bin/python. Please read module documentation and install in the appropriate location"	09:05
frickler	hrw: just set it like this? https://opendev.org/opendev/system-config/src/branch/master/playbooks/group_vars/gitea.yaml#L1	09:07
hrw	frickler: thx	09:08
*** ykarel\|lunch is now known as ykarel		09:36
*** ysandeep\|lunch is now known as ysandeep		09:53
*** ykarel is now known as ykarel\|afak		10:31
*** ykarel\|afak is now known as ykarel\|afk		10:31
*** rpittau is now known as rpittau\|bbl		10:32
*** ykarel\|afk is now known as ykarel		11:31
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Use cached buildset_registry fact https://review.opendev.org/723385	11:32
donnyd	Just an FYI OpenEdge is undergoing maintenance - shouldn't affect the CI - but in case it does you will know why	11:35
*** smcginnis has quit IRC		11:40
*** DSpider has quit IRC		11:40
*** smcginnis has joined #opendev		11:41
*** DSpider has joined #opendev		11:41
openstackgerrit	Tristan Cacqueray proposed zuul/zuul-jobs master: haskell-stack-test: add haskell tool stack test https://review.opendev.org/723263	11:58
openstackgerrit	Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339	12:33
*** ykarel is now known as ykarel\|afk		12:38
*** rpittau\|bbl is now known as rpittau		12:49
*** ykarel\|afk is now known as ykarel		12:52
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: WIP: omit variable instead of ignoring errors https://review.opendev.org/723524	13:19
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: WIP: omit variable instead of ignoring errors https://review.opendev.org/723524	13:20
openstackgerrit	Monty Taylor proposed opendev/system-config master: Use gitea for gerrit gitweb links https://review.opendev.org/723526	13:24
openstackgerrit	Monty Taylor proposed opendev/base-jobs master: Define an ubuntu-focal nodeset https://review.opendev.org/723527	13:27
openstackgerrit	Monty Taylor proposed opendev/system-config master: Test zuul-executor on focal https://review.opendev.org/723528	13:29
openstackgerrit	Monty Taylor proposed opendev/system-config master: Test zuul-executor on focal https://review.opendev.org/723528	13:33
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Do not set buildset_fact if it's not present in results.json https://review.opendev.org/723524	14:01
corvus	mordred: hrm, it doesn't look like we have a working cert for zuul.openstack.org yet	14:11
fungi	corvus: mordred: ianw spotted that it was getting overwritten and tried to put together a timeline in https://etherpad.opendev.org/p/DSxEB-ViHzEHDMgxAJDp	14:13
fungi	though i guess that was the redirect url itself, not the cert	14:15
fungi	for the link on status.o.o	14:16
fungi	oh, right, ianw spotted that the acme challenge cname hadn't been created for it so added that	14:17
fungi	but also noted that the server is still in the emergency disable list so changes to it aren't getting applied	14:17
fungi	and was reluctant to take it out of the emergency disable list with nobody else around	14:18
fungi	corvus: mordred: are we clear to take zuul01.openstack.org back out of the emergency disable list in that case? there's no comment in the file saying why we disabled it and now i can't remember	14:19
corvus	fungi: i think we need https://review.opendev.org/723107	14:20
corvus	otherwise the next config change will kill geard	14:20
fungi	got it, reviewing	14:20
fungi	ahh, yep, i remember discussing this one	14:20
corvus	so it seems like we can merge that, then take the scheduler out of emergency, then run the letsencrypt playbook? then run the zuul playbook?	14:21
fungi	seems that way to me, i just approved it moments ago	14:22
mordred	yes - I agree with all of that	14:29
fungi	related, corvus: mordred seems to have addressed your comment on 723048	14:30
fungi	mordred: i had a question on 723048 about use of sighup there... is that just sending hangup to the scheduler pid, and if so shouldn't we use the rpc client instead?	14:31
mordred	fungi: oh - HUP is probably bad there - maybe we don't need to do anything other than having docker-compose shut down the container?	14:32
mordred	ah - graceful stop in the old init script was USR1	14:33
fungi	yeah, if the goal was to stop the scheduler, then hup is not the thing	14:33
mordred	yeah- lemme update	14:33
corvus	we don't do any graceful stops of the scheduler at the moment, only hard stops	14:33
corvus	mordred: so i think we just want the scheduler to stop in the normal way	14:34
fungi	also usr1 seems unlikely to be something we would want to use anyway	14:34
fungi	because it could take hours to finish	14:34
corvus	yeah that	14:34
openstackgerrit	Monty Taylor proposed opendev/system-config master: Rework zuul start/stop/restart playbooks for docker https://review.opendev.org/723048	14:34
mordred	oh - yeah? ok. me just takes it out	14:34
fungi	though maybe once we have distributed scheduler, it's basically instantaneous/hitless?	14:34
openstackgerrit	Monty Taylor proposed opendev/system-config master: Rework zuul start/stop/restart playbooks for docker https://review.opendev.org/723048	14:34
mordred	how's that look?	14:35
fungi	lgtm	14:35
*** mlavalle has joined #opendev		14:41
fungi	clarkb: cacti indicates we had a hard swap event on lists.o.o (severe enough to cause a 15-minute snmp blackout) around 12:20	14:46
*** iurygregory has quit IRC		14:47
*** iurygregory has joined #opendev		14:48
fungi	oom knocked out 9 python processes between 12:26:48 and 12:33:12	14:49
fungi	probably earlier in fact, that event seems to have overrun the dmesg ring buffer	14:50
fungi	11 "Killed process" lines recorded to syslog between 12:27:01 and 12:33:30	14:51
fungi	i guess the timestamps embedded in the kmesg events are off by a bit	14:52
fungi	oh wow, even dstat was stuttering	14:54
fungi	toward the worst, it was only managing to record roughly one snapshot a minute	14:55
clarkb	fungi: did we see mailman qrunner process memory change upwards in that period?	14:59
openstackgerrit	Merged opendev/system-config master: Run smart-reconfigure instead of HUP https://review.opendev.org/723107	14:59
clarkb	also we should cross check with that robot too maybe?	14:59
fungi	i'm working to understand the fields recorded in the csv	14:59
fungi	looks like the last two fields are process details	15:00
fungi	ahh, no, the final fields are ,"process pid cpu read write","process pid read write cpu","memory process","used","free"	15:02
fungi	i guess those correspond to --top-cpu-adv --top-io-adv --top-mem-adv and so "memory process" is the field we care about there?	15:02
clarkb	Apr 27 12:25:54 is when OOM killer was first invoked looks like	15:03
clarkb	fungi: ya I think memory process is the most important one	15:03
clarkb	the others probably have useful info too like who was busy during the lead up period	15:03
clarkb	fungi: looks like that same bot is active around the OOM	15:05
clarkb	I kinda want to add a robots.txt that tells it to go away and see if we have a behavior change	15:05
fungi	so going into this timeframe, we had 12:20:00 13543 qrunner / 40660992%	15:05
clarkb	fungi: note the % is a bit weird. It's actually just bytes. So that's 40MB ish which isn't bad	15:06
fungi	as of 12:25:32 16053 listinfo / 50327552%	15:06
fungi	and kswapd0 was the most active cpu and i/o consumer	15:07
clarkb	fungi: what that is telling me is we don't have a single process which is loading up on memory.	15:07
clarkb	which makes me more suspicious of apache	15:07
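The dstat values quoted above are easier to compare once converted from bytes. A minimal sketch, assuming the "memory process" field looks like the strings pasted in the channel (`13543 qrunner / 40660992%`, read here as pid, process name, then a byte count with a stray `%`). The helper names `parse_top_mem` and `to_mib` are hypothetical and not part of dstat.

```python
import re

# Assumed field layout, taken from the values pasted in the channel:
#   "<pid> <process name> / <bytes>%"
# The trailing "%" is a display quirk; the number is a byte count.
_FIELD = re.compile(r"^\s*(?P<pid>\d+)\s+(?P<name>.+?)\s*/\s*(?P<bytes>\d+)%?\s*$")


def parse_top_mem(field):
    """Return (pid, name, bytes) for one 'memory process' value."""
    match = _FIELD.match(field)
    if match is None:
        raise ValueError("unrecognised memory process field: %r" % field)
    return int(match["pid"]), match["name"], int(match["bytes"])


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for sample in ("13543 qrunner / 40660992%", "16053 listinfo / 50327552%"):
        pid, name, size = parse_top_mem(sample)
        print("%s (pid %d): %.1f MiB" % (name, pid, to_mib(size)))
```

Both samples come out under 50 MiB, which fits the reading that no single process was ballooning before the OOM.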
clarkb	fungi: also we seem to be using mpm_worker and not mpm_event in apache	15:09
clarkb	likely a holdover from upgrading that server in place	15:09
clarkb	iirc mpm event is far more efficient memory wise because it doesn't fork for all the things?	15:09
clarkb	maybe we should try switching that over too	15:09
fungi	can't hurt	15:09
fungi	anyway, i'm going to restart all the mailman sites... we talked about wanting a reboot of this server anyway, should i just go ahead and do that?	15:10
fungi	and then set the dstat collection back up (and rotate the old log)	15:10
clarkb	fungi: ya a reboot seems like it would at least help rule out older kernel bugs (if that is a possibility here)	15:11
clarkb	I seem to recall that xenial kernel of some variety didn't handle buffers and caches properly	15:11
clarkb	and then we need to stop apache2, a2dismod mpm_worker, a2enmod mpm_event, start apache?	15:12
clarkb	mordred: maybe we should encode that into unit files then systemctl works and ansible can just ensure a service state?	15:13
clarkb	(I realize that will take a bit more work to get the systemd incantations correct, but our testing should help with that)	15:14
fungi	lists.o.o is currently booted with linux 4.4.0-145-generic with an uptime of 380 days and will be booting linux 4.4.0-177-generic	15:17
fungi	i've checked and apt reports no packages pending upgrade	15:17
fungi	reboot underway	15:17
fungi	taking a while to come back up, probably either a pending host migration or just overdue fsck	15:19
openstackgerrit	Merged opendev/base-jobs master: Define an ubuntu-focal nodeset https://review.opendev.org/723527	15:19
clarkb	fungi: seems like thats pretty normal for us :/	15:20
fungi	when you go that long between reboots, yes	15:20
fungi	it came back	15:21
fungi	41 qrunner processes running according to ps	15:21
clarkb	I see a bunch of mailman processes. It looks happy	15:21
fungi	so seems like the sites all started back as expected	15:21
clarkb	fungi: are you wanting to do the apache thing? or should I plan to do that after breakfast? I'm happy either way, just don't want to step on toes	15:23
fungi	#status log lists.openstack.org rebooted for kernel update	15:23
openstackstatus	fungi: finished logging	15:23
fungi	#status log running `dstat -tcmndrylpg --tcp --top-cpu-adv --top-mem-adv --swap --output dstat-csv.log` in a root screen session on lists.o.o	15:23
openstackstatus	fungi: finished logging	15:23
corvus	mordred, fungi: i think we're ready to remove zuul from emergency and run some playbooks?	15:23
fungi	clarkb: i need to switch gears to do some openstack vmt stuff shortly, but can try to get to it later, or we can just observe first and see if the oom situation persists since the reboot	15:24
openstackgerrit	Merged zuul/zuul-jobs master: hlint: add haskell source code suggestions job https://review.opendev.org/722309	15:24
corvus	i think so, so i'll do that	15:25
*** ysandeep is now known as ysandeep\|away		15:25
fungi	corvus: i think so, 723107 merged ~25 minutes ago	15:25
fungi	thanks!	15:25
corvus	running le playbook now	15:25
fungi	oui, c'est bon	15:27
corvus	mordred, fungi, ianw: https://zuul.openstack.org/status lgtm now	15:30
corvus	looks like i don't need to run the zuul service playbook	15:30
fungi	awesome	15:32
clarkb	fungi: ya I'm mostly just suspicious of apache right now given the qrunner sizes don't go up when we oom and we have an indexer bot running through apache at around that same period	15:35
fungi	oh, me too. if you look back at the cacti graphs, once it's able to get snmp responses again the 5-minute load average is still >50	15:37
fungi	so likely lots and lots of processes	15:37
clarkb	fungi: did my process above look correct to you for using mpm event? I've also double checked other xenial hosts are using apache + mpm_event and not worker	15:38
fungi	which could be the mta or mailman handling a bunch of messages, but probably it's apache forking	15:38
*** _mlavalle_1 has joined #opendev		15:38
fungi	clarkb: yeah, i guess the current puppet-mailman isn't picking an mpm for apache and like you say we've inherited a non-default one due to in-place upgrades	15:39
fungi	your command sequence looks right to me	15:40
clarkb	ya I think we've basically just relied on platform defaults. Unfortunately platform default has stuck around too long in this case	15:40
*** mlavalle has quit IRC		15:40
clarkb	infra-root ^ I'd like to switch apache2 from mpm_worker to mpm_event on lists.o.o. Plan is stop apache2; a2dismod mpm_worker; a2enmod mpm_event ; start apache2. This gets it in line with our other apache servers. I'll do that shortly after some tea. Let me know if you'd like me to hold off	15:41
corvus	clarkb: ++	15:42
fungi	thanks clarkb!	15:42
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Do not set buildset_fact if it's not present in results.json https://review.opendev.org/723524	15:46
clarkb	alright tea is consumed proceeding now	15:49
clarkb	and done. website seems to respond to my browser	15:50
*** ykarel is now known as ykarel\|away		15:50
clarkb	#status log Updated lists.openstack.org to use Apache mpm_event instead of mpm_worker. mpm_worker was a holdover from doing in place upgrades of this server. All other Xenial hosts default to mpm_event.	15:51
openstackstatus	clarkb: finished logging	15:51
clarkb	fungi: then assuming the OOM persists tomorrow maybe we try a robots.txt and exclude this particular bot?	15:52
*** sshnaidm is now known as sshnaidm\|afk		15:53
AJaeger	do we still use git0x.openstack.org? https://review.opendev.org/723251 proposes to kill the only place that I could find...	15:56
clarkb	AJaeger: we do not	15:56
clarkb	AJaeger: do you also want to remove git.openstack.org from that list?	15:57
clarkb	it's the line above the block you removed	15:57
AJaeger	clarkb: sure, can do...	15:57
AJaeger	I thought that was in use, so was not sure whether we need it...	15:57
clarkb	AJaeger: it exists as a redirect host on static.opendev.org, but I don't think we need cacti data for it	15:58
clarkb	(since it is just a CNAME to static in dns)	15:58
AJaeger	Ah, good	15:58
openstackgerrit	Andreas Jaeger proposed opendev/system-config master: Remove git*.openstack.org https://review.opendev.org/723251	15:59
AJaeger	clarkb: updated ^	15:59
*** rpittau is now known as rpittau\|afk		16:07
redrobot	Would love another set of eyes on this change: https://review.opendev.org/#/c/721349/	16:08
clarkb	corvus: mordred ^ are we good to add new git repos or do zuul things still need updating?	16:09
corvus	clarkb: i think we're good	16:09
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Do not set buildset_fact if it's not present in results.json https://review.opendev.org/723524	16:13
mordred	clarkb: yeah- I think we're good	16:13
clarkb	corvus: mordred did you see the thing from ianw yesterday about periodic deploy and on demand deploy undoing each other?	16:15
fungi	example was in that etherpad i linked earlier	16:15
clarkb	https://etherpad.opendev.org/p/DSxEB-ViHzEHDMgxAJDp appears to be where notes were taken	16:15
fungi	https://etherpad.opendev.org/p/DSxEB-ViHzEHDMgxAJDp	16:15
corvus	did it perhaps enqueue the item before 1:23, thereby enqueuing the old ref?	16:17
openstackgerrit	Merged opendev/system-config master: Rework zuul start/stop/restart playbooks for docker https://review.opendev.org/723048	16:17
*** iurygregory has quit IRC		16:17
corvus	(rather, not the old ref, but the state of the repo at that point in time)	16:17
clarkb	looks like hourly starts at the top of the hour with a 2 minute jitter	16:18
mordred	clarkb: yeah - I was thinking it might be what corvus said	16:18
clarkb	so ya it would've enqueued before the non hourly job ran at least. Not sure when the non hourly job enqueued though	16:18
corvus	clarkb: well the non-hourly job would have gotten that specific change	16:18
clarkb	should the hourly jobs maybe not use the zuul provided ref? and always update?	16:19
corvus	so basically, we enqueued an hourly job and froze the repo state for it, then probably due to load didn't get around to running it for a while	16:19
mordred	maybe - I believe the intent of the hourly jobs is "run with the tip of master when you run" as opposed to "run with the tip of master when you are encoded" - so maybe putting in a flag we can set on the hourly pipeline invocation of the job that would cause the playbooks to do a pull from opendev first?	16:20
mordred	(the maybe there is in response to clarkb's "should the hourly jobs...")	16:20
openstackgerrit	Merged openstack/project-config master: Revert "Disable ovn pypi jobs temporarily" https://review.opendev.org/723073	16:21
fungi	or should there be a way to tell zuul you want timer triggered jobs to have their heads resolved when started rather than when enqueued? that may be tough to pull off though	16:22
*** hrw has quit IRC		16:22
corvus	yeah, that's an intentional design decision to ensure that all jobs in a buildset run with the same repo state	16:22
*** elod has quit IRC		16:22
*** hrw has joined #opendev		16:22
mordred	in a magical world it would be neat to be able to have a periodic pipeline that only triggers if there has been no corresponding activity in a different pipeline for X duration. I have no idea what that would look like, and would probably require v4 and required db	16:23
corvus	so i don't think changing zuul is appropriate here	16:23
*** elod has joined #opendev		16:23
fungi	i just get a little twitchy with jobs working around zuul's git handling, but maybe this is one of those circumstances where it's the better solution	16:24
mordred	corvus: what do you think about a "pull latest from opendev" flag for the run-base playbook?	16:24
corvus	maybe a pull in the job is the best workaround here -- other than minimizing what we actually need the hourly pipeline for	16:24
corvus	(eventually, it should just be for letsencrypt, right?)	16:24
mordred	corvus: I think we mostly have hourly pipeline for things that are using images but that we don't have a way to trigger otherwise	16:24
mordred	so that we don't have to wait for a day to pick up a new zuul image or similar	16:24
mordred	but I agree with the goal - it would be great to have only LE in there	16:25
mordred	I can work up a "pull from upstream" flag if we think that's an ok workaround for now	16:25
corvus	sounds reasonable to me	16:25
fungi	yeah, it seems like the most straightforward solution at this point	16:26
*** tobiash has quit IRC		16:26
*** prometheanfire has quit IRC		16:26
*** calcmandan has quit IRC		16:26
*** noonedeadpunk has quit IRC		16:26
*** jkt has quit IRC		16:26
*** dirk has quit IRC		16:26
*** AJaeger has quit IRC		16:26
mordred	kk	16:26
clarkb	wfm	16:27
*** tobiash has joined #opendev		16:32
*** prometheanfire has joined #opendev		16:32
*** calcmandan has joined #opendev		16:32
*** noonedeadpunk has joined #opendev		16:32
*** jkt has joined #opendev		16:32
*** dirk has joined #opendev		16:32
*** AJaeger has joined #opendev		16:32
fungi	yoctozepto: fdegir: did your git problems with opendev.org repos persist into today or did they mysteriously clear up?	16:36
yoctozepto	fungi: I didn't do much today regarding opendev.org clone/pull operations so hard to tell; assume they did ;-)	16:37
*** ChanServ has quit IRC		16:42
fungi	yoctozepto: thanks, hopefully it was just some temporary network problem somewhere out on the internet	16:42
*** ChanServ has joined #opendev		16:45
*** tepper.freenode.net sets mode: +o ChanServ		16:45
*** _mlavalle_1 has quit IRC		17:09
*** mlavalle has joined #opendev		17:11
openstackgerrit	Merged openstack/project-config master: Define stable cores for horizon plugins in neutron stadium https://review.opendev.org/722682	17:16
openstackgerrit	Merged openstack/project-config master: Add Portieris Armada app to StarlingX https://review.opendev.org/721343	17:16
*** tobiash has quit IRC		17:26
*** prometheanfire has quit IRC		17:26
*** calcmandan has quit IRC		17:26
*** noonedeadpunk has quit IRC		17:26
*** jkt has quit IRC		17:26
*** dirk has quit IRC		17:26
*** AJaeger has quit IRC		17:26
*** tobiash has joined #opendev		17:29
*** prometheanfire has joined #opendev		17:29
*** calcmandan has joined #opendev		17:29
*** noonedeadpunk has joined #opendev		17:29
*** jkt has joined #opendev		17:29
*** dirk has joined #opendev		17:29
*** AJaeger has joined #opendev		17:29
openstackgerrit	Merged openstack/project-config master: Add ansible role for managing Luna SA HSM https://review.opendev.org/721349	17:29
fdegir	fungi: i noticed similar issues today as well so I had to switch to mirrors	17:38
*** ChanServ has quit IRC		17:39
fungi	fdegir: and this was cloning over https via ipv4?	17:40
*** ChanServ has joined #opendev		17:41
*** tepper.freenode.net sets mode: +o ChanServ		17:41
fdegir	fungi: yes and i just started another set of clones manually right now and it's hanging - will probably timeout	17:42
fdegir	Cloning into 'shade'...	17:42
fdegir	and just waits	17:42
fungi	i'll switch to trying shade in that case. and see about forcing my testing to go on ipv4 instead of ipv6	17:43
fdegir	fungi: as i noted yesterday, it could be another repo next time	17:44
fdegir	fatal: unable to access 'https://opendev.org/openstack/shade/': Failed to connect to opendev.org port 443: Connection timed out	17:44
fungi	looks like my git client is new enough to support `git clone --ipv4 ...`	17:44
fdegir	fungi: testing the repos bifrost clones during its installation: https://opendev.org/openstack/bifrost/raw/branch/master/playbooks/roles/bifrost-prep-for-install/defaults/main.yml	17:44
fdegir	*_git_url	17:45
fdegir	now requirements hanging	17:45
fungi	i've got a loop going on my workstation now like `while git clone --ipv4 https://opendev.org/openstack/shade;do rm -rf shade;done`	17:45
fdegir	fungi: if it helps, i can keep this thing running and you can look at logs	17:46
fdegir	i can pass my public ip if it helps	17:46
fungi	fdegir: yes, i can check our load balancer for any hits from your ip address, though if a connection failed to reach the load balancer that will be hard to spot	17:46
fungi	ideally devstack is timestamping when it tries to clone	17:47
fdegir	fungi: we don't use devstack	17:47
fungi	ahh, okay, the other problem report was from a devstack user	17:48
fdegir	fungi: yes - seeing that made me realize it may be an issue on gerrit side	17:48
fdegir	originally i thought i had issues but that bug report made me report as well	17:48
fungi	if you have a timestamp for when one of the failed clone commands was attempted i can hopefully work out whether any connections arrived at the load balancer from you at that time	17:48
fungi	i have exact times for every request which reached the lb from you and what backend they were directed to	17:49
fdegir	fatal: unable to access 'https://opendev.org/openstack/requirements/': Operation timed out after 300029 milliseconds with 0 out of 0 bytes received	17:49
fungi	but obviously if a connection attempt doesn't reach us that won't be logged at our end	17:49
fdegir	i don't have timestamps as we didn't enable timestamping on our jenkins	17:50
fdegir	and can't check the slaves since we use openstack single use slave	17:51
fdegir	Cloning into 'python-ironicclient'...	17:52
fdegir	so it is totally random	17:52
fungi	these are the connections haproxy saw from your ip address: http://paste.openstack.org/show/792771	17:52
fdegir	need to have dinner now	17:53
fdegir	will be back later tonight	17:53
fungi	cool, thanks!	17:53
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Do not set buildset_fact if it's not present in results.json https://review.opendev.org/723524	17:53
fungi	looks like those are all being directed to gitea04.opendev.org, though i may have trouble mapping them to specific requests as haproxy just operating as an osi layer 4 socket proxy isn't doing any transparent forwarding	17:54
fungi	so far my shade clone loop isn't encountering any issues, but i'll switch to hitting gitea04 directly to see if that has anything to do with it	17:55
clarkb	fungi: note that you will only hit a single backend using that url	17:55
fungi	right	17:56
fungi	now trying this: while git clone --ipv4 https://gitea04.opendev.org:3000/openstack/shade;do rm -rf shade;done	17:56
clarkb	and ya mapping onto gitea side requests can be a pain	17:57
*** factor has quit IRC		17:58
clarkb	fungi: gitea04 shows things like [E] Fail to serve RPC(upload-pack): exit status 128 - fatal: the remote end hung up unexpectedly	17:58
fungi	fdegir: when you're back from dinner, it might be helpful if you could try with a simple reproducer like that from the network where you're seeing that, and then maybe also try from another location if you can, so we can tell whether it's specific to the location you're coming from. if it is, then we can compare traceroutes in both directions and possibly start to work on correlating where the problem might be	17:59
dpawlik	hi. We would like to switch openstack/validations-common from testr to stestr (https://review.opendev.org/#/c/723529/), but the requirements-check CI job raises an error that stestr is not found in lower-constraints. Is there something else I need to configure, or should I just add stestr==3.0.1 to lower-constraints.txt?	17:59
fungi	clarkb: yeah, that could be due to a number of reasons, sounds like typical premature socket termination	17:59
fungi	dpawlik: i'm not sure, you may be better off asking in #openstack-requirements as it's probably more on topic there	18:00
clarkb	fungi: fdegir the other thing that might be useful is talking to a backend (or all 8 backends) directly	18:00
dpawlik	thank you fungi	18:00
clarkb	they all have valid tls certs and are exposed publicly;	18:00
fungi	clarkb: yeah, that's what i'm suggesting, test in a loop to one backend directly so we can rule out the source hash directing some clients to good backends and others to bad	18:01
fungi	the while shell loop i pasted just above is exactly that	18:01
clarkb	http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66700&rra_id=all shows spikes in failed tcp connection attempts which may be related	18:02
clarkb	asymmetric routing could cause that	18:02
fungi	fwiw, i've been running that continuously for nearly 10 minutes, and it's taking roughly 10 seconds each iteration, so far no errors	18:03
clarkb	fungi: the other upside to directly connecting to the backends is we'll be able to filter logs for that more easily	18:03
fungi	exactly	18:03
*** factor has joined #opendev		18:04
fungi	also the ip address fdegir provided me looks like it's in the citynetwork.se kna3 pop	18:06
*** mehakmittal has joined #opendev		18:06
fungi	or at least that's the subdomain in reverse dns on the last named core router in my traceroutes to it, but there are a couple hops after that with no ptr records on their serial interfaces	18:07
clarkb	`while true; do for X in `seq 1 8` ; do echo $X ; rm -rf shade-gitea0$X && git clone https://gitea0$X.opendev.org:3000/openstack/shade shade-gitea0$X ; done ; done` I'm running that now just to see if I can trip it. note you should set -x it	18:07
clarkb	er set -e	18:07
clarkb	fungi: we have a mirror in that location, we could try to reproduce from there	18:08
fungi	my thoughts as well	18:08
fungi	note that fdegir was seeing it over ipv4, so you might want to add --ipv4 to your git clone command to reproduce faithfully	18:08
clarkb	fungi: I've only got ipv4 locally	18:09
fungi	oh, :( for you	18:09
clarkb	if I want ipv6 I have to explicitly bounce through my ipv6 cloud node	18:09
fungi	my backwater cable provider has finally managed to do a decent job with v6 prefix delegation over dhcp6	18:10
clarkb	fungi: my ISP just got bought (effective may 1st iirc). The new company is soliciting questions about the move and I asked if they planned to roll out ipv6. The answer I got did not give me confidence they even know what ipv6 is	18:10
clarkb	no errors yet in that for loop. I'm going to knead some bread now	18:11
fungi	have fun!	18:11
fungi	looks like our mirror server is in kna1 not kna3, but maybe they share the same core	18:12
fungi	testing from mirror01.kna1.citycloud.openstack.org	18:14
fungi	interesting, that server routes outbound from an rfc-1918 address, presumably through a fip	18:15
fungi	i've got a clone loop of shade from gitea04 underway on it now, seeing around 4s for each clone to complete	18:16
fungi	ooh! i've hit it!!!	18:17
fungi	this definitely seems to be client location specific	18:17
fungi	i'll reproduce again with a brief sleep between attempts and some timestamping	18:18
*** mehakmittal has quit IRC		18:20
*** mehakmittal has joined #opendev		18:21
fungi	i've got this running in a root screen session on mirror01.kna1.citycloud.openstack.org now: while :;do sleep 10;echo -n 'start ';date -Is;git clone https://gitea04.opendev.org:3000/openstack/shade;echo -n 'end ';date -Is;rm -rf shade;done	18:22
fungi	the spacing should make it easy to find in gitea's web log	18:22
*** muskan has joined #opendev		18:22
fungi	i also confirmed the timestamps on that server seem to be accurate	18:23
fungi	an attempt to clone just now at 2020-04-27T18:23:35 seems to be hanging	18:23
fungi	yeah, still hanging, this is good!	18:24
fungi	doing `docker-compose logs|grep 91.123.202.253` as root on gitea04 now with pwd /etc/gitea-docker	18:27
fungi	hopefully that's the correct thing	18:27
fungi	clone started at 18:25:53 is still hanging	18:27
clarkb	neat	18:28
clarkb	fungi: do you see it show up on the gitea side?	18:28
fungi	also started a ping from citycloud to opendev.org to see if there's any obvious packet loss	18:28
clarkb	my clone loop from home is still running successfully	18:28
fungi	gitea-web_1 | [Macaron] 2020-04-27 18:23:20: Started POST /openstack/shade/git-upload-pack for 91.123.202.253	18:28
fungi	that's the last recorded entry for 91.123.202.253	18:29
fungi	i wonder if the timestamps from gitea are accurate	18:29
clarkb	ok so we get far enough to start the upload-pack but then packets maybe disappear? we can tcpdump those to see what is going on at lower level maybe?	18:29
clarkb	fungi: it records the start and end timestamps	18:29
clarkb	as separate entries	18:29
fungi	there was a clone from that address which started at 2020-04-27T18:23:35 and ended at 2020-04-27T18:25:43	18:29
fungi	and was successful	18:30
clarkb	fungi: also maybe have mirror.kna1 fetch resources from mirror.sjc1?	18:30
clarkb	fungi: and see if we can get it to fail doing more basic http requests	18:30
fungi	no, my bad, that one timed out	18:30
fungi	last successful clone started 2020-04-27T18:23:18 and ended at 2020-04-27T18:23:25	18:30
fungi	so i think the connection is never established	18:31
fungi	i'll switch to tcpdump next	18:31
yoctozepto	fungi: actually kolla CI had issues with opendev: " \"msg\": \"Failed to download remote objects and refs: fatal: unable to access 'https://opendev.org/openstack/ironic-python-agent-builder/': Failed to connect to opendev.org port 443: Connection timed out\\n\"",	18:32
yoctozepto	Mon Apr 27 12:45:20 2020	18:32
clarkb	yoctozepto: its likely the same issue if its some transatlantic routing problem (or similar)	18:32
fungi	yoctozepto: that (connection timed out) sounds like what we're seeing then	18:32
clarkb	but also please don't talk to gitea in zuul jobs	18:32
clarkb	zuul should provide everything you need	18:32
fungi	the launchpad bug opened for devstack yesterday indicated a "connection refused" error	18:32
fungi	yoctozepto: but since the job did connect to opendev, can you let us know where that failure ran?	18:33
yoctozepto	e90b15791a067a4e6e54-7143c90e898b1b306bc3770ac4d2d8a8.ssl.cf2.rackcdn.com	18:33
yoctozepto	oops	18:33
yoctozepto	https://zuul.opendev.org/t/openstack/build/c0c89c350cb242d7abed88e80de32984	18:33
clarkb	that job ran in kna1 too	18:34
clarkb	so ya likely the same issue	18:34
*** iurygregory has joined #opendev		18:34
fungi	provider: airship-kna1	18:34
fungi	yup	18:34
fungi	starting to suspect this may be a citynetwork issue	18:35
clarkb	I'm going to stop my local clones now that we have narrowed this down with an ability to debug	18:35
clarkb	my local clones did not have any problem	18:35
clarkb	fungi: are you testing kna to all gitea backends or just 04?	18:35
fungi	clarkb: just gitea04	18:36
clarkb	fungi: might be worth checking if it is all 8 (if its a bitmask problem or something like that then some may work while others dont)	18:36
fungi	yeah, have definitely seen that in the past when you have flow-based distribution routing hashed on addresses and one of your cores is blackholing stuff	18:37
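The failure mode fungi describes (flow-hashed path selection with one blackholing core) can be sketched as below. This is only an illustration under stated assumptions: real routers use their own hash functions, and the CRC32 hash, the path count of 4, and the choice of path 0 as the broken one are all hypothetical. The addresses are the ones quoted in this log.

```python
import zlib


def pick_path(src_ip, src_port, dst_ip, dst_port, n_paths):
    """Map a flow to one of n_paths by hashing its addresses and ports.

    CRC32 is a stand-in here; it is not what any particular router uses.
    """
    key = f"{dst_ip}:{dst_port}<-{src_ip}:{src_port}".encode()
    return zlib.crc32(key) % n_paths


def blackholed_ports(src_ip, dst_ip, dst_port, ports, n_paths, bad_path):
    """Return the source ports whose flows would land on the broken path."""
    return [
        port
        for port in ports
        if pick_path(src_ip, port, dst_ip, dst_port, n_paths) == bad_path
    ]


# Hypothetical scenario: 4 equal-cost paths, path 0 silently drops packets.
ports = range(39000, 40000)
bad = blackholed_ports("91.123.202.253", "38.108.68.147", 3000, ports, 4, 0)
print(f"{len(bad)} of {len(ports)} source ports hash onto the broken path")
```

Because each new connection uses a different ephemeral source port, only some connections from the same client hash onto the broken path, which would look like intermittent timeouts. Changing the destination address changes the hash too, which is why testing all 8 backends is a reasonable suggestion.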
fungi	okay, i have tcpdump running in a root screen session on gitea04 streaming to stdout and filtering for the kna1 mirror's ip address	18:38
yoctozepto	clarkb: ack, it's actually bifrost that talked to it and we have little control over it (it is to be deprecated and replaced by a kolla-containerised solution when time allows - hopefully soon)	18:38
fungi	assuming this is reproducible, mnaser may want to get in touch with the network folks at citynet	18:39
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: tox: Use 'block: ... always: ...' instead of ignore_errors https://review.opendev.org/723640	18:39
fungi	they'll probably have a faster time of working out the connectivity issues	18:39
fungi	tcpdump is definitely capturing packets on successful clone runs	18:40
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: ensure-sphinx: use failed_when: false instead of ignore_errors: true https://review.opendev.org/723642	18:41
fungi	as soon as i snag another hung clone i'll be able to work out whether the tcp/syn ever arrived. if it does and a response is generated, i'll probably need to start up a similar tcpdump on the mirror server to see if the syn+ack ever arrives	18:41
fungi	okay, caught one	18:42
fungi	start 2020-04-27T18:41:46+00:00	18:42
fungi	and last packet to arrive at gitea04 was 18:42:01.640261 IP 38.108.68.147.3000 > 91.123.202.253.39696 (end of the previous completed clone)	18:43
clarkb	fungi: so not even getting the SYN	18:44
fungi	that's how it's looking to me	18:44
fungi	my 1k echo slow ping is just about to wrap up and i can get some icmp delivery stats	18:44
fungi	1000 packets transmitted, 1000 received, 0% packet loss, time 999929ms, rtt min/avg/max/mdev = 176.894/177.360/267.018/3.398 ms	18:45
fungi	so icmp doesn't seem impacted	18:45
fungi	i could probably install hping or something to do syn/syn+ack pings but may be best if we just hand this off to mnaser and whoever we usually talk to at citycloud	18:46
*** dpawlik has quit IRC		18:47
fungi	though first i guess we can try some connections to other places from citycloud if we want	18:47
clarkb	fungi: ya I think so. Especially since its the initial SYN disappearing	18:47
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: fo: Use 'block: ... always: ...' and failed_whne instead of ignore_errors https://review.opendev.org/723643	18:47
fungi	maybe best to do an easier reproducer with nc or something	18:47
clarkb	or even just ping?	18:47
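A simpler reproducer of the kind suggested here could skip git entirely and only attempt the TCP handshake. The following is a minimal sketch, not what was actually run in this session; the function names `probe` and `run` and the default timeout and pause values are made up for illustration.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection and classify the outcome.

    Returns (start timestamp, outcome string, elapsed seconds).
    """
    started = datetime.now(timezone.utc).isoformat(timespec="seconds")
    t0 = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except (socket.timeout, TimeoutError):
        # No SYN+ACK arrived within the timeout: the symptom discussed above.
        outcome = "timed out"
    except ConnectionRefusedError:
        outcome = "refused"
    except OSError as exc:
        outcome = f"error: {exc}"
    return started, outcome, time.monotonic() - t0


def run(host, port, attempts=10, pause=10.0):
    """Probe repeatedly, pausing between attempts so log entries are easy to match."""
    for _ in range(attempts):
        started, outcome, elapsed = probe(host, port)
        print(f"{started} {host}:{port} {outcome} in {elapsed:.2f}s")
        time.sleep(pause)
```

Calling `run("gitea04.opendev.org", 3000)` from the affected host would print one timestamped line per attempt, which can then be lined up against a tcpdump on the destination.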
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: go: Use 'block: ... always: ...' and failed_when instead of ignore_errors https://review.opendev.org/723643	18:47
mnaser	let me try and ping people..	18:48
clarkb	fungi: fwiw tobias is usually who I email	18:48
fungi	thanks mnaser! i know you know some folks there	18:48
clarkb	and mnaser is always around :)	18:48
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: ara-report: use failed_when: false instead of ignore_errors: true https://review.opendev.org/723644	18:49
fungi	yeah, i don't usually see tobberydberg around in irc	18:50
fungi	oh, he's actually in #openstack-infra at the moment	18:51
fungi	but anyway, sounds like maybe mnaser has this well in hand	18:51
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: k8-logs: use failed_when: instead of ignore_errors: https://review.opendev.org/723647	18:51
mnaser	anything in specific i can forward?	18:51
mnaser	it looks like hitting opendev.org is timing out?	18:51
fungi	mnaser: we're getting reports from users of citycloud (including ourselves) that a small percentage of tcp connections from kna to servers we have in your sjc location have their initial tcp/syn packet never make it	18:53
fungi	the result is "connection timed out" for some tcp sockets	18:53
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: container-logs: use failed_when: instead of ignore_errors: https://review.opendev.org/723648	18:54
fungi	generally manifesting so far in `git clone` connections for the opendev.org gitea load balancer (though we've reproduced it with direct connections to the backend as well)	18:54
fungi	an example is 91.123.202.253 in citycloud (a fip for 10.0.1.9) stalls attempting to establish a socket to 38.108.68.147 3000/tcp	18:55
fungi	most connections attempts are fine, but sometimes the initial tcp/syn packet from 91.123.202.253 never makes it to 38.108.68.147 according to tcpdump listening on the destination	18:56
mnaser	fungi: wonderful, thank you, i handed that over	18:56
fungi	thanks!	18:56
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: tox: Use 'block: ... always: ...' instead of ignore_errors https://review.opendev.org/723640	18:57
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: ensure-sphinx: use failed_when: false instead of ignore_errors: true https://review.opendev.org/723642	18:57
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: go: Use 'block: ... always: ...' and failed_when instead of ignore_errors https://review.opendev.org/723643	18:57
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: ara-report: use failed_when: false instead of ignore_errors: true https://review.opendev.org/723644	18:57
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: use failed_when: instead of ignore_errors: https://review.opendev.org/723653	18:57
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: add-build-sshkey: use failed_when: instead of ignore_errors: https://review.opendev.org/723654	18:57
mnaser	i dont have an ack from them but its 9pm-ish so	18:57
fungi	yep, it's likely not urgent	18:57
fungi	fdegir: ^ to summarize, we think there's something going on between citycloud kna and vexxhost sjc, likely close to (or maybe even inside) the citycloud side of the connection	18:58
fungi	mainly because i've so far been unable to reproduce from elsewhere, though i'll try to test from citycloud lon and ovh gra just to get some more transatlantic datapoints	19:00
fdegir	thanks for the debugging fungi	19:04
fdegir	i was thinking london but instead try frankfurt and stockholm regions	19:05
fdegir	plus the us one perhaps	19:05
fungi	we conveniently already have servers in kna and lon which is why i tested those	19:05
fdegir	ok	19:06
fungi	so far i'm not able to reproduce from citycloud lon nor from ovh gra	19:06
fungi	so i have doubts it's a general transatlantic issue	19:06
fungi	were your connections coming from directly-addressed servers, or through a (layer 3 or 4) nat?	19:07
fungi	all our systems in citycloud are behind fips, so that could be a common factor too	19:07
fdegir	same as your systems	19:07
fdegir	we are running in kna as well	19:07
fungi	yeah, so could just be their nat layer is overrun in that pop	19:07
fungi	and some new flows are getting dropped	19:08
fungi	fdegir: if you have quota you can shift to one of their other pops, that might be a workaround for you	19:09
openstackgerrit	Merged zuul/zuul-jobs master: fetch-sphinx-tarball: use remote_src true https://review.opendev.org/721237	19:09
fdegir	fungi: i think we do and can try moving to london	19:10
fungi	fdegir: if that solves it for you, that'll also be a useful datapoint for us	19:10
fdegir	fungi: this was really helpful as i was puzzled and searching opendev/openstack-infra maillists to see if there was a planned maintenance	19:10
fungi	i'm putting together bidirectional traceroutes now to see if they're symmetrical	19:11
fdegir	fungi: will let you know when i do that but it may not happen tomorrow	19:11
clarkb	fungi: the nat on our mirrors is 1:1	19:12
fungi	clarkb: yep, but it may very well be the same systems doing the binat and the overload pat	19:13
clarkb	but I suppose if global tables are full that won't help much	19:13
*** muskan has quit IRC		19:14
fungi	both vexxhost and citynetwork seem to be peering with cogent and preferring them, though from kna3 the traceroute seems to go through citynetwork sto2/cogent sto03 peering, while on the way back from vexxhost packets are arriving at the cogent lon01/citynetwork lon1 peering and then traverse sto2 to kna3	19:17
fungi	so basically symmetrical on the vexxhost end but somewhat asymmetric on the citynetwork end	19:19
openstackgerrit	Merged zuul/zuul-jobs master: fetch-sphinx-tarball: Do not keep owner of archived files https://review.opendev.org/721248	19:20
fungi	testing with our mirror in lon1, routing is (unsurprisingly) fully symmetrical at least to the pop level	19:20
fungi	so this suggests the problem is likely in citynetwork kna3 or sto2	19:20
fungi	or possibly cogent sto03	19:21
fungi	given i can't reproduce the issue from lon, which is following basically the same routes through cogent's core	19:22
*** mehakmittal has quit IRC		19:22
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Set owner to executor user https://review.opendev.org/701381	19:28
mnaser	fungi: seems like they were having some similar issues -- "As for traffic to opendev.org regardless of which transit provider I push traffic through there is packetloss far out on the net. So they are most likely having issues (or their transit(s))"	19:40
mnaser	we use cogent at sjc1 so i guess that theory may add up	19:40
mnaser	there is other transit but, yeah.	19:41
mnaser	only reported issue is "Welcome to the Cogent Communications status page. Some customers may be seeing latency between Singapore and Hong Kong due to a submarine fiber issue. At this time there is no ETR. The ticket for this issue is HD11103479."	19:42
mnaser	fungi: can we have mtr running from kna1, that info may be useful to reach out to transit	19:43
clarkb	mnaser: do you want the pings too or is a simple traceroute sufficient?	19:43
mnaser	clarkb: traceroutes is usually what makes transit providers happy	19:44
clarkb	well thats a first, traceroute isn't installed but mtr is	19:46
mnaser	heh	19:46
clarkb	mnaser: http://paste.openstack.org/show/792777/	19:49
clarkb	that 486 rtt to first router is rough	19:51
openstackgerrit	Merged zuul/zuul-jobs master: tox: allow running default envlist in tox https://review.opendev.org/721796	19:56
openstackgerrit	Merged opendev/gerritlib master: Use ensure-* roles https://review.opendev.org/719404	20:02
fungi	clarkb: an mtr in the other direction would probably also be good	20:08
fungi	sometimes when you see a jump like that, it's the point of convergence for an asymmetric route where the return path is going through a significant latency increase somewhere else many hops out	20:09
clarkb	fungi: k let me install mtr on opendev lb	20:09
clarkb	it doesn't have traceroute either	20:09
fungi	though also since the hops after that one are lower latency, it could just be that router is under load and deprioritizing icmp messages	20:10
fungi	not at all uncommon	20:10
fungi	especially since it looks like it's probably their datacenter distribution layer	20:11
clarkb	fungi: mnaser http://paste.openstack.org/show/792779/ the other direction	20:16
clarkb	I used traceroute there because I had to install either it or mtr and mtr has a million deps	20:16
corvus	clarkb: are we running an apache on zuul01 now?	20:34
clarkb	corvus: system-config/playbooks/roles/zuul-web seems to imply we are but I haven't double checked yet	20:36
clarkb	yup seems that we are	20:36
clarkb	I'm thinking maybe we want to compress the javascript html and css resources	20:36
corvus	clarkb: it looks like apache is still configured to serve out of /opt/zuul-web-content	20:37
corvus	which is making me wonder if we're positive anything has changed?	20:38
clarkb	corvus: the main reason I thought we had changed was the headers for main.js in my browser came from cherrypy	20:40
clarkb	it sets the server: header	20:40
clarkb	we rewrite /.* to localhost:9000/.*	20:41
clarkb	also it doesn't seem like the deflation of status.json is actually working. If I request it with accept-encoding: deflate set I get back plain text	20:41
clarkb	this might need a bit more in depth debugging	20:42
corvus	hrm, the timestamps on the apache config files are old though	20:42
corvus	clarkb: i don't see "/.* to localhost:9000/.*"	20:43
clarkb	corvus: thats in the zuul role	20:43
clarkb	corvus: 000-default.conf seems to be where we write that too	20:43
clarkb	and since it comes before the other files it wins? I think we should maybe remove the old files if they are no longer expected to be valid (to reduce confusion)	20:44
corvus	oooooh	20:44
corvus	yes those are brand new	20:44
corvus	this is a very confusing situation	20:44
clarkb	I agree	20:44
corvus	diff 40-zuul.opendev.org.conf 000-default.conf	20:45
corvus	that seems to suggest we have indeed lost some features	20:45
clarkb	corvus: if I'm reading it correctly I think a big change is going to cherrypy for all requests	21:04
clarkb	which I think is desirable, we wanted to stop consuming the js tarball, but maybe we need to figure out how to make that more efficient (better js compiles, compression, etc)	21:04
clarkb	corvus: I think the /api/status caching is all wrong now that zuul's api has been redone too?	21:05
corvus	clarkb: yeah, i don't think anything has to be different than before; apache as a reverse proxy should be able to cache the data, it should be served by cherrypy with correct headers	21:06
corvus	so i guess we need to identify what we think is different or should be improved and see if we can improve the apache config to make that happen	21:06
clarkb	corvus: for caching I think its just the path	21:09
clarkb	its /api/tenant/.*/status now iirc	21:09
*** jrichard has joined #opendev		21:11
clarkb	testing status retrieval in my browser it is coming back as gzip according to headers	21:13
clarkb	so the DEFLATE may be working with gzip and not deflate	21:13
clarkb	aha thats normal because apache	21:14
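The gzip-versus-deflate observation comes down to HTTP having two distinct content-codings that wrap the same compression algorithm in different framing: `gzip` uses the gzip container, while `deflate` uses the zlib container. A small sketch of the difference, using only the Python standard library; the helper names and the sample JSON payload are made up for illustration.

```python
import gzip
import zlib


def encode_body(body, coding):
    """Compress a response body for the given Content-Encoding value."""
    if coding == "gzip":
        return gzip.compress(body)   # gzip container (RFC 1952)
    if coding == "deflate":
        return zlib.compress(body)   # zlib container (RFC 1950)
    raise ValueError(f"unsupported coding: {coding}")


def decode_body(body, coding):
    """Decompress a response body according to its Content-Encoding value."""
    if coding == "gzip":
        return gzip.decompress(body)
    if coding == "deflate":
        return zlib.decompress(body)
    raise ValueError(f"unsupported coding: {coding}")


payload = b'{"pipelines": []}'  # hypothetical stand-in for status.json
for coding in ("gzip", "deflate"):
    wire = encode_body(payload, coding)
    print(coding, wire[:2].hex(), decode_body(wire, coding) == payload)
```

The two wire formats start with different header bytes, so a client asking only for `deflate` cannot use a `gzip` body. That is consistent with the earlier test in this log, where requesting `accept-encoding: deflate` came back as plain text while the browser received gzip.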
clarkb	that gives me an idea for an improvement here one moment please	21:14
clarkb	corvus: it doesn't look like cherrypy is setting content-type on static files it is serving	21:16
clarkb	corvus: but if it were we could do something like: AddOutputFilterByType DEFLATE application/json text/css text/javascript application/javascript	21:17
clarkb	I'll go ahead and push ^ up as well as caching improvements then if cherrypy starts doing that we'll be ready for it	21:17
jrichard	My change ( https://review.opendev.org/#/c/721343/ ) went in today to create the starlingx/portieris-armada-app repo, but I don't see it under https://zuul.opendev.org/t/openstack/projects . Do I need to do something else to add the project there?	21:17
clarkb	jrichard: no, we've been having some issues with config management that we thought were addressed but that indicates it probably isn't yet	21:18
clarkb	jrichard: is the project in gerrit?	21:19
clarkb	yes looks like gitea and gerrit are happy so its just the zuul config reload that isn't firing properly	21:19
corvus	looks like it ran manage-projects and puppet-else but not zuul	21:19
clarkb	mordred: corvus ^ fyi I know you were looking at that	21:19
corvus	i don't think i was looking at that but i can	21:20
jrichard	I do see it in gerrit. Is there anything I can do now to get it added there?	21:22
corvus	clarkb, mordred: a cursory look makes me think that project-config is just configured to run remote-puppet-else and hasn't been updated to run service-zuul	21:22
clarkb	corvus: I was assuming it was related to the sighup thing but I guess you think its earlier in the stack (not firing the job at all?)	21:23
corvus	clarkb: yeah, sighup should be fixed; i'll see about making a change to the job config	21:26
corvus	all of the job descriptions say "Run the playbook for the docker registry."	21:27
corvus	i feel like those could be more correct	21:27
corvus	clarkb: i'm really looking forward to your reorg patch	21:28
clarkb	corvus: ya I'll need to resurrect that once the dust has settled on zuul and nodepool and codesearch and eavesdrop	21:29
clarkb	I think nodepool is the last remaining set of services?	21:29
openstackgerrit	Clark Boylan proposed opendev/system-config master: Improve zuul-web apache config https://review.opendev.org/723711	21:29
clarkb	thats the first bit in making performance better I think	21:29
redrobot	Hmm... I don't think Zuul is picking up this patch to a new repo? https://review.opendev.org/#/c/723692/ Maybe I missed something? 🤔	21:30
redrobot	I had to add Zuul to reviewers manually	21:30
redrobot	but I don' think that helped, hehe	21:30
clarkb	redrobot: its the same issue jrichard has but against a different new repo	21:31
clarkb	redrobot: we basically haven't signalled zuul to let it know there are new projects	21:32
redrobot	clarkb, gotcha. Thanks!	21:32
openstackgerrit	James E. Blair proposed opendev/system-config master: Clean up some job descriptions https://review.opendev.org/723717	21:40
openstackgerrit	James E. Blair proposed openstack/project-config master: Run the zuul service playbook on tenant changes https://review.opendev.org/723718	21:43
corvus	clarkb, mordred: ^ i think that should fix the issue redrobot and jrichard observed	21:44
clarkb	looking	21:45
clarkb	corvus: also fwiw I've read up on zuul's cherrypy static file serving and it should lookup mimetypes by file extension	21:45
corvus	it looks like the tenant config is in place, so i will manually run a smart-reconfigure	21:45
clarkb	corvus: I think maybe having two .'s in the file extensions like we do with our js may confuse it? I need to set up a test for that	21:45
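A quick way to check the two-dots hunch, assuming the lookup is based on Python's standard `mimetypes` module keyed on the final extension (an assumption; the log only says the lookup is by file extension). The hashed file name below is hypothetical, and the exact JavaScript type string varies between Python versions.

```python
import mimetypes


def content_type_for(filename):
    """Return the MIME type the stdlib guesses for a file name, or None."""
    ctype, _encoding = mimetypes.guess_type(filename)
    return ctype


# Hypothetical build output names with a hash between two dots.
for name in ("main.js", "main.0a1b2c3d.js", "main.0a1b2c3d.css"):
    print(name, "->", content_type_for(name))
```

Under that assumption only the part after the last dot matters, so an extra dot in the name should not change the result.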
mordred	corvus: ah - yeah - I think that looks solid	21:45
mordred	clarkb, corvus: related: https://review.opendev.org/#/c/723022/	21:47
mordred	that will make sure service-zuul uses the zuul prepared copy of project-config	21:47
clarkb	corvus: in https://review.opendev.org/#/c/723718/1 I think we want puppet else and zuul	21:48
mordred	(which is a thing we added to other jobs after the initial zuul patch was written)	21:48
mordred	clarkb: why puppet else?	21:48
clarkb	mordred: nodepool for now	21:48
clarkb	I think it may be the last thing though	21:48
corvus	clarkb: i'm not following; we only ran puppet-else on changes to zuul/main.yaml	21:49
*** DSpider has quit IRC		21:49
mordred	yeah - I think it's ok to wait for service-nodepool before triggering nodepool config changes on p-c changes	21:49
ianw	corvus/mordred: thanks, i didn't consider the enqueue v runtime	21:49
clarkb	oh I see ya ok	21:49
clarkb	I think the original code should've maybe been run more aggressively but if we weren't already then its fine	21:50
corvus	clarkb: are you suggesting we should run puppet-else on changes to nodepool/.* ?	21:50
corvus	clarkb: it looks like service-nodepool runs puppet on the old puppet servers	21:51
corvus	so i don't think we need puppet-else	21:51
mordred	oh good point	21:51
clarkb	oh I didn't realize that had gotten split out already	21:52
corvus	jrichard, redrobot: you should be good to go now; you'll probably need to recheck those changes	21:58
openstackgerrit	Merged openstack/project-config master: Run the zuul service playbook on tenant changes https://review.opendev.org/723718	22:07
clarkb	I was mistaken about cherrypy not sending content-type. It seems that firefox forgets that info if working with a cached file	22:07
clarkb	but forcing cache bypass shows that it does send the content-type	22:07
openstackgerrit	Clark Boylan proposed opendev/system-config master: Improve zuul-web apache config https://review.opendev.org/723711	22:08
clarkb	infra-root ^ I think that may make zuul a bit more responsive for users	22:08
clarkb	I need to pop out for a bike ride now. Back in a bit	22:08
openstackgerrit	Merged opendev/system-config master: Clean up some job descriptions https://review.opendev.org/723717	22:32
*** jrichard has quit IRC		22:35
openstackgerrit	Monty Taylor proposed opendev/system-config master: Test zuul-executor on focal https://review.opendev.org/723528	22:46
redrobot	corvus, awesome, thanks for the help!	23:01
*** tosky has quit IRC		23:02
openstackgerrit	Clark Boylan proposed opendev/system-config master: Increase timeout on system-config-run-zuul https://review.opendev.org/723756	23:41
clarkb	my apache2 vhost change hit a timeout on that job so I'm bumping it	23:41
clarkb	looking at logs it seems to have been compiling openafs when it triggered the timeout	23:41