Monday, 2020-09-14

*** DSpider has quit IRC		00:37
fungi	yes, i agree the fact that we're logging both vhosts to the same file makes investigating this slightly more confusing	00:47
fungi	ultimately, i would expect requests considered for caching to either be logged as a cache hit or a cache miss	00:47
fungi	requests logged as neither are, i think, not being considered for caching at all	00:47
fungi	possibly skipped by the cache mod, possibly not routed to it, i'm not sure which	00:48
openstackgerrit	Ian Wienand proposed opendev/system-config master: zuul-web: move LogFormat combined-cache into config https://review.opendev.org/751623	01:15
*** user_19173783170 has joined #opendev		01:16
ianw	fungi: ^ i agree it's not being considered, with the mod_cache status just "-"	01:21
user_19173783170	when i register my openstack foundation account, it always prompts "Please confirm that you are not a robot", why can't i receive the captcha?	03:15
ianw	2001:4800:7819:103:be76:4eff:fe04:5870 - - [2020-09-14 03:19:27.549] "GET /api/tenant/pyca/status HTTP/1.1" 200 1140 cache hit "-" "curl/7.47.0"	03:20
ianw	so so much is wrong	03:20
ianw	user_19173783170: what's the page url?	03:20
user_19173783170	it's this:"https://openstackid.org/auth/register?client_id=7tdfQq8hu5SbLqGRXtQk0lwfD4mHBnTt.openstack.client&redirect_uri=https%3A%2F%2Fwww.openstack.org%2FSecurity%2Flogin%3FBackURL%3Djoin%252Fregister%252F%253Fmembership-type%253Dfoundation%26BackURL%3Dhttps%253A%252F%252Fwww.openstack.org%252F"	03:21
ianw	user_19173783170: ok, so you don't see the "i'm not a robot" check box down the bottom? i do	03:23
ianw	or are you saying you select that and it doesn't believe you?	03:23
user_19173783170	i dont see it	03:23
user_19173783170	is the reason my ip is in china?	03:24
ianw	user_19173783170: do you have any sort of ad-blockers or similar installed?	03:24
ianw	oh ... china ... well maybe? it's a standard reCAPTCHA box i see	03:24
ianw	Google reCAPTCHA works in China, as long as you reference reCAPTCHA library by https://www.recaptcha.net instead of https://www.google.com. See developer doc section “Can I use reCAPTCHA globally”.	03:25
user_19173783170	dont have ad-blockers	03:25
ianw	<script src='https://www.google.com/recaptcha/api.js?render=onload'></script>	03:26
ianw	so it looks like that recaptcha should probably be not referencing google.com for global support	03:26
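As a rough illustration of the fix ianw describes, the sketch below swaps the google.com host in the pasted script tag for www.recaptcha.net. The function name and the simple string replacement are assumptions for illustration only; how openstackid.org actually renders that tag is not shown in this log.

```python
# Hypothetical helper: point reCAPTCHA script references at the globally
# reachable host instead of www.google.com.
GOOGLE_PREFIX = "https://www.google.com/recaptcha/"
GLOBAL_PREFIX = "https://www.recaptcha.net/recaptcha/"


def use_global_recaptcha_host(html: str) -> str:
    """Return html with reCAPTCHA URLs rewritten to www.recaptcha.net."""
    return html.replace(GOOGLE_PREFIX, GLOBAL_PREFIX)


# The script tag as pasted in the channel.
tag = "<script src='https://www.google.com/recaptcha/api.js?render=onload'></script>"

if __name__ == "__main__":
    print(use_global_recaptcha_host(tag))
```

Only URLs under the /recaptcha/ path are touched, so other google.com references on a page would be left alone.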
ianw	user_19173783170: it looks like the website will have to fix this ... can you use a vpn :/	03:28
clarkb	fwiw I expect jimmy can help tomorrow. Maybe file a bug?	03:28
ianw	sorry not sure what else to suggest. we definitely have users from China, but I'm not sure if they worked around this or it's something new	03:28
clarkb	https://bugs.launchpad.net/openstack-org	03:29
clarkb	is the bug tracker and I can ping jimmy et al in the morning	03:30
ianw	clarkb / user_19173783170 : i can quickly file the bug	03:30
user_19173783170	i use this for the first time	03:31
ianw	https://bugs.launchpad.net/openstack-org/+bug/1895496	03:32
openstack	Launchpad bug 1895496 in openstack-org "User from China reporting reCAPTCHA does not work" [Undecided,New]	03:32
ianw	user_19173783170: ^ i'm afraid we might have to wait for a resolution on that (which i imagine will happen US daytime tomorrow) to get this going for you	03:33
user_19173783170	no problem, thanks for your help	03:33
openstackgerrit	Ian Wienand proposed opendev/system-config master: zuul-web: rework caching https://review.opendev.org/751645	04:00
ianw	fungi / clarkb: ^ i've been poking on the 000-default.conf to come up with that. with it, i'm seeing everything get its cache-event flag in the logs filled out. i think it's on the way, at least	04:11
*** lpetrut has joined #opendev		05:57
*** ysandeep\|away is now known as ysandeep		06:10
*** cgoncalves has joined #opendev		06:13
*** qchris has quit IRC		06:21
openstackgerrit	Carlos Goncalves proposed openstack/project-config master: Update branch checkout for octavia-lib DIB element https://review.opendev.org/745877	06:33
*** qchris has joined #opendev		06:33
*** hashar has joined #opendev		06:40
*** andrewbonney has joined #opendev		07:42
*** ysandeep is now known as ysandeep\|lunch		07:46
openstackgerrit	Pierre-Louis Bonicoli proposed zuul/zuul-jobs master: default test_command: don't use a shell builtin https://review.opendev.org/751659	07:53
*** moppy has quit IRC		08:01
*** moppy has joined #opendev		08:01
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Create elastic-recheck container image https://review.opendev.org/750958	08:14
*** DSpider has joined #opendev		08:24
*** tosky has joined #opendev		08:27
openstackgerrit	Dmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Add support to use stow for ensure-python https://review.opendev.org/751611	08:46
openstackgerrit	wu.shiming proposed openstack/diskimage-builder master: Remove install unnecessary packages https://review.opendev.org/751665	08:46
openstackgerrit	Pierre-Louis Bonicoli proposed zuul/zuul-jobs master: default test_command: don't use a shell builtin https://review.opendev.org/751659	09:06
*** ysandeep\|lunch is now known as ysandeep		09:10
*** sshnaidm\|pto is now known as sshnaidm		09:10
*** dtantsur\|afk is now known as dtantsur		10:27
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Create elastic-recheck container image https://review.opendev.org/750958	10:32
*** ykarel has joined #opendev		10:52
ykarel	Is there some issue with http://codesearch.openstack.org/	10:52
ykarel	it returns 500	10:52
cgoncalves	I can confirm 500s	11:01
user_19173783170	i have solved the problem of not receiving the CAPTCHA from a chinese ip	11:14
user_19173783170	the solution is installing a plugin named "Ghelper" in Chrome	11:16
user_19173783170	i also want to ask how to relate my openstack foundation account to my ubuntuone account	11:33
*** lpetrut has quit IRC		12:00
*** lpetrut has joined #opendev		12:01
*** Goneri has joined #opendev		12:11
*** priteau has joined #opendev		12:14
*** slaweq_ has joined #opendev		12:37
ttx	user_19173783170: ah, good to know. I'll pass the info by	12:41
ttx	user_19173783170: For your account issue, you should send an email to support@openstack.org so that they can help you	12:42
*** mnaser has quit IRC		13:32
*** mnaser has joined #opendev		13:32
*** mnaser has quit IRC		13:32
*** mnaser has joined #opendev		13:32
*** tkajinam has quit IRC		13:37
*** slaweq_ has quit IRC		13:38
fungi	ykarel: yeah, emilienm reported it in #openstack-infra too. taking a look now	13:59
fungi	#status log restarted houndd on codesearch.o.o following a json encoding panic at 10:03:40z http://paste.openstack.org/show/797837/	14:01
openstackstatus	fungi: finished logging	14:01
fungi	ykarel: ^ it should be on its way back up now	14:02
fungi	cgoncalves: ^	14:02
cgoncalves	fungi, thanks! waiting for reindexing to finish :)	14:02
fungi	yeah, it takes a few minutes for that to complete unfortunately	14:04
*** auristor has quit IRC		14:05
*** sshnaidm is now known as sshnaidm\|afk		14:07
ykarel	fungi, Thanks	14:08
fungi	user_19173783170: if you use the same e-mail addresses for both your openstack foundation and ubuntuone accounts, then we'll be able to correlate them	14:12
*** hashar has quit IRC		14:14
*** auristor has joined #opendev		14:18
dmsimard	btw seeing all fedora-31 based jobs fail in RETRY_LIMIT due to unbound being in an "unknown state", i.e: https://zuul.opendev.org/t/openstack/build/71a1da78b3f24d2e9883db36a5cf156c/console#0/2/9/fedora-31	14:19
dmsimard	won't have time to troubleshoot for a while longer but wanted to point out in case others have a similar issue	14:20
*** ykarel_ has joined #opendev		14:25
*** ykarel has quit IRC		14:28
*** ykarel__ has joined #opendev		14:32
*** ykarel_ has quit IRC		14:35
*** ykarel__ is now known as ykarel		14:35
openstackgerrit	Carlos Goncalves proposed openstack/project-config master: Add 'check arm64' trigger to check-arm64 pipeline https://review.opendev.org/751829	14:36
*** slaweq_ has joined #opendev		14:40
*** icey has quit IRC		14:48
*** icey has joined #opendev		14:49
*** slaweq_ has quit IRC		14:55
*** ykarel is now known as ykarel\|away		15:00
*** lpetrut has quit IRC		15:06
*** Topner has joined #opendev		15:11
*** Topner has quit IRC		15:11
*** ykarel\|away has quit IRC		15:20
*** Topner has joined #opendev		15:23
*** lpetrut has joined #opendev		15:28
*** priteau has quit IRC		16:04
openstackgerrit	Merged opendev/system-config master: zuul-web: move LogFormat combined-cache into config https://review.opendev.org/751623	16:05
*** ysandeep is now known as ysandeep\|away		16:31
*** ykarel\|away has joined #opendev		16:56
*** mlavalle has joined #opendev		16:57
*** lpetrut has quit IRC		16:57
*** Gyuseok_Jung has quit IRC		17:00
*** ykarel\|away has quit IRC		17:05
openstackgerrit	Sorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Create elastic-recheck container image https://review.opendev.org/750958	17:15
clarkb	I'm looking at the fedora issue dmsimard reported	17:20
fungi	thanks, i hadn't gotten time to dig into that yet	17:20
clarkb	https://bugzilla.redhat.com/show_bug.cgi?id=1853736 seems related	17:20
openstack	bugzilla.redhat.com bug 1853736 in systemd "systemctl show service fails with "Failed to parse bus message: Invalid argument"" [Unspecified,Closed: errata] - Assigned to systemd-maint	17:20
clarkb	it seems that they fixed fedora 32 but not 31	17:22
fungi	fedora 31 is so last year (literally!)	17:23
clarkb	ya I'm not sure how to handle this	17:23
clarkb	we could shell out on f31 but I worry the next use of the service module will just fail	17:23
clarkb	though in this case maybe having a working base job is enough for us then jobs can correct their use of service otherwise	17:24
dmsimard	o/ thanks for looking into this -- it started happening fairly recently, worked until now	17:31
*** andrewbonney has quit IRC		17:31
dmsimard	I can confirm that bumping to f32 "fixes" it	17:31
clarkb	I'm just going to brute force the service restarts with commands	17:32
clarkb	working on that change now	17:32
*** dtantsur is now known as dtantsur\|afk		17:35
clarkb	how do you use different handlers based on some criteria?	17:40
clarkb	do we need to have different notifying tasks for those criteria?	17:40
openstackgerrit	Clark Boylan proposed opendev/base-jobs master: Test handling unbound restart on fedora 31 https://review.opendev.org/751872	17:45
clarkb	I guess we can test and see if ^ works	17:45
clarkb	another option may be to just deprecate f31 quickly? I dunno where f33 is at. Something to talk to ianw about I guess	17:50
clarkb	infra-root https://review.opendev.org/751645 and https://review.opendev.org/#/c/751426/ are two different followups to the zuul web performance issues from last week. One addresses caching and the other adds more zuul-webs	17:51
dmsimard	f33 is in beta right now iirc	17:52
fungi	and includes zuul!	17:53
* fungi is so proud		17:53
clarkb	dmsimard: ya I mean if the fix works I think we should land it. Mostly concerned that ansible makes this difficult	17:53
clarkb	its intentionally hidden in a test role to start too since I'm not super confident in it	17:53
corvus	clarkb: i vote try cache first then scale; i reviewed accordingly	17:56
corvus	clarkb: that okay, or do you want to get them going in parallel?	17:56
clarkb	corvus: I think doing them one after another to better see impact is a good idea	17:56
corvus	kk	17:57
openstackgerrit	Merged opendev/system-config master: zuul-web: rework caching https://review.opendev.org/751645	18:32
openstackgerrit	Merged openstack/project-config master: Revert "Pin setuptools<50 in our image venvs" https://review.opendev.org/749777	18:37
fungi	what's the next step toward deleting nb04? just taking it out of system-config? are there any remaining blockers to that? has that change been proposed already?	18:40
clarkb	fungi: we need to ensure none of its images are still alive in clouds	18:40
fungi	ahh, right, particularly bfv clouds like vexxhost i guess	18:40
fungi	i'll take a look shortly	18:41
clarkb	there are two remaining opensuse-tumbleweed-0000240092 and opensuse-tumbleweed-0000240093	18:41
clarkb	those are the only two tumbleweed images we have	18:41
fungi	stale ready nodes maybe	18:42
clarkb	we probably aren't building new tumbleweed images otherwise nb01 and nb02 would have at least one	18:42
fungi	#status log provider maintenance 2020-09-30 01:00-05:00 utc involving ~5-minute outages for databases used by cacti, refstack, translate, translate-dev, wiki, wiki-dev	18:44
openstackstatus	fungi: finished logging	18:44
clarkb	ya our tumbleweed image builds are failing	18:46
fungi	ugh	18:46
clarkb	conflict between grep and busybox-grep	18:46
clarkb	I think we can add busybox-grep to the deinstalls list to fix it	18:46
fungi	#status log deleted old 2017-01-04 snapshot of wiki.openstack.org/main01 in rax-dfw	18:48
openstackstatus	fungi: finished logging	18:48
fungi	#status log cinder volume for wiki.o.o has been replaced and cleaned up	18:50
openstackstatus	fungi: finished logging	18:50
fungi	so that only leaves the nb04 cinder volume which would be impacted by next month's maintenance	18:51
fungi	and rackspace seems to have cleaned up all our old error_deleting volumes too	18:51
fungi	once nb04 is fully gone, i'll update the open ticket for the cinder maintenance and let them know we've replaced/deleted all the volumes they mentioned	18:54
openstackgerrit	Clark Boylan proposed openstack/diskimage-builder master: Install grep before busybox on suse distros https://review.opendev.org/751880	18:58
clarkb	fungi: ^ I think that is the fix, we already do similar for xz in dib	18:58
clarkb	(why xz doesn't just supersede busybox-xz and grep supersede busybox-grep I don't know)	18:59
fungi	interesting. yeah, debian doesn't even allow that. packages have to declare replaces or breaks if they have conflicting files, otherwise they don't make it into the distro	19:01
clarkb	zypper gives you the option of breaking rsync by keeping busybox-grep, replacing busybox-grep with grep or doing nothing	19:02
clarkb	so it has some of the info but doesn't just default to the sane thing	19:02
openstackgerrit	Clark Boylan proposed opendev/base-jobs master: Test handling unbound restart on fedora 31 https://review.opendev.org/751872	19:17
clarkb	linter didn't like that I used systemctl in command instead of the service module	19:18
* fungi sighs		19:18
clarkb	it would be fine if the service module worked :)	19:19
donnyd	So the transition to ceph and nvme for object storage at OE is complete and I think we would probably be ok to put it back in the rotation for logs	19:34
donnyd	Not sure what needs to be tested before putting it back into prod	19:36
clarkb	donnyd: nice. We'll want to update the secret at https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/secrets.yaml#L188-L229 as well as update the list of clouds used at https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base-test/post-logs.yaml#L23-L28	19:38
clarkb	then when we're happy with base-test behavior we can make the same playbook change to base/post-logs.yaml	19:38
clarkb	donnyd: should we go ahead and add that when we get time or do you have more to do?	19:38
clarkb	(figure infra-root should do it since we have to encrypt the secret)	19:38
donnyd	I think we are good to go... but I will find out for sure when the workload comes	19:39
donnyd	I have tested it and seems to work as much as one person can test	19:39
*** Topner has quit IRC		19:39
fungi	clarkb: i expect we can just revert the removal unless we have reason to believe the credentials changed?	19:42
fungi	well, revert but apply it to base-test i mean	19:42
fungi	but no need to reencrypt	19:42
clarkb	fungi: well the cloud name changed at least. Did we also change credentials or did they stay the same?	19:42
fungi	oh, if it was pre-oe then yeah the creds are likely entirely different	19:43
donnyd	yea, the creds will likely need to be redone	19:43
clarkb	yes the current secret is labeled cloud_fn_one	19:44
fungi	got it	19:44
clarkb	I can work on a change in a bit	19:44
openstackgerrit	Pierre Riteau proposed ttygroup/boartty master: Update author and home page to match gertty https://review.opendev.org/751886	19:46
openstackgerrit	Pierre Riteau proposed ttygroup/gertty master: Update author email address https://review.opendev.org/751887	19:47
*** slaweq_ has joined #opendev		19:53
openstackgerrit	Clark Boylan proposed opendev/base-jobs master: Use OpenEdge swift to host job logs https://review.opendev.org/751889	20:01
clarkb	donnyd: infra-root ^ fyi	20:01
donnyd	LGTM	20:05
openstackgerrit	Merged opendev/storyboard master: Optimise the Story browsing query https://review.opendev.org/742046	20:16
donnyd	If i wanted to help write some poorly written ansible to help the infra teams app deployments, where could I start? Is the deployment code in each app, or in a central repo?	20:37
clarkb	donnyd: most of our config management is in https://opendev.org/opendev/system-config	20:38
clarkb	donnyd: that contains our inventory and groups definitions as well as most of the playbooks and roles we use	20:38
corvus	donnyd: we aim for well tested so poorly written won't bother us :)	20:39
clarkb	donnyd: we're using more and more docker containers as well (driven by ansible and docker-compose) and the Dockerfiles for those tend to be in the application repos (like zuul or nodepool) unless we need to do a forked docker image for some reason	20:39
donnyd	corvus: we will see about that.. I write some pretty bad stuff	20:39
clarkb	fungi: good idea on the testing of OE swifts thing	20:40
clarkb	let me do a new pas	20:40
clarkb	*ps	20:40
donnyd	So the dockerfile and compose will be in the app repo and system-config is the tooling to deploy it	20:40
openstackgerrit	Clark Boylan proposed opendev/base-jobs master: Use OpenEdge swift to host job logs https://review.opendev.org/751889	20:40
fungi	donnyd: one which is halfway there is our storyboard deployment... we're publishing docker images to dockerhub for the various storyboard services but not using them yet, we're still deploying storyboard with the storyboard-puppet module at the moment	20:41
clarkb	donnyd: the dockerfile will be in the app repo but then the docker-compose and ansible to deploy it is in system-config	20:41
donnyd	so where do the containers get deployed? not that it matters.. just curious	20:41
fungi	donnyd: the service roles	20:42
clarkb	system-config/playbooks/roles/gitea may be a good example	20:42
clarkb	though we have the dockerfile for gitea in system-config/docker/gitea because we've forked it to add our own main page and branding stuff	20:42
fungi	er, service playbooks	20:42
fungi	which then use service-specific roles	20:43
donnyd	I was just looking at that one clarkb	20:43
donnyd	I was reading the thread on storyboard and it made me think that maybe I could actually make a useful contribution	20:45
donnyd	the etherpad one also looks like a decent example	20:47
diablo_rojo_phon	You definitely could and we'd love to have whatever help you'd like to offer :)	20:47
clarkb	diablo_rojo_phon: ya etherpad and gitea should be pretty similar to how we'd do storyboard except we'd put the Dockerfile in storyboard itself I bet	20:47
clarkb	er donnyd ^	20:47
clarkb	tab complete failed me	20:47
fungi	storyboard or anything else you want to help with, help is most welcome	20:49
fungi	it's also worth noting that switching from puppet to ansible (+docker where relevant) is a blocker for us updating our deployment platforms too. the version of puppet we're stuck on works on xenial but not bionic, so to upgrade past xenial we need to replace the old puppet orchestration and config management	20:51
donnyd	how many things still need to be migrated? lots??	20:51
clarkb	its a fair bit, though I think many of them should be of the more direct variety now	20:53
clarkb	the more difficult ones like gerrit and zuul have been done (though next up is working out the gerrit upgrade which I'm slowly making progress on)	20:53
fungi	yeah, the stuff which remains doesn't really have interdependencies	20:55
fungi	so a lot more manageable as a task on its own	20:55
fungi	i think ianw has graphite in progress already, but i'm not aware of any others which are in progress	21:08
openstackgerrit	Merged opendev/base-jobs master: Use OpenEdge swift to host job logs https://review.opendev.org/751889	21:14
*** diablo_rojo has joined #opendev		21:15
clarkb	I've rechecked https://review.opendev.org/#/c/680178/5 which should test ^	21:16
clarkb	the commit message is no longer accurate but its still using base-test	21:18
clarkb	donnyd: https://api.us-east.open-edge.io:8080/swift/v1/AUTH_e02c11e4e2c24efc98022353c88ab506/zuul_opendev_logs_225/680178/5/check/tox-py27/22553d1/ it seems to work	21:25
clarkb	donnyd: its a little weird to see https on port 8080 but nothing actually wrong with that. Do you want to check anything before I propose a change to add it into the production rotation?	21:25
donnyd	Yea there are containers populating in the project	21:25
donnyd	yea, it is probably a bit strange	21:26
donnyd	LOL	21:26
donnyd	I usually proxy to the 13XXX range	21:26
donnyd	but eh.. .it works	21:26
donnyd	I think we are good to hook	21:26
clarkb	ya looks functional to me /me makes another change	21:26
donnyd	I am hopeful that the nvme object storage will work well this time around	21:27
donnyd	we will see when the time for logs to expire hits	21:27
donnyd	https://usercontent.irccloud-cdn.com/file/ke89YaM4/image.png	21:28
donnyd	LGTM	21:28
openstackgerrit	Clark Boylan proposed opendev/base-jobs master: Use openedge swift for logs on all jobs https://review.opendev.org/751905	21:28
donnyd	so long as everyone can reach them, we should be good to hook	21:28
clarkb	I'm ipv4 only at home. Maybe fungi wants to hit it from ipv6 before +2'ing ^	21:29
clarkb	or if there is no ipv6 then thats fine too :) I didn't check dns	21:29
donnyd	hrm, there should be	21:29
donnyd	I do have a record	21:30
clarkb	ya there is a AAAA record so having someone like fungi confirm ipv6 access works too would be good. Otherwise I think we can land it	21:31
donnyd	https://usercontent.irccloud-cdn.com/file/Rbe9GTpf/image.png	21:34
donnyd	looks like its open outside of my network best I can test.. that whole being local thing has bitten me before though... so probably best to wait for fungi	21:35
*** slaweq_ has quit IRC		21:38
*** slaweq_ has joined #opendev		21:40
fungi	yup, sorry, food distractions here. what url am i testing ipv6 connectivity to?	21:50
clarkb	fungi: https://api.us-east.open-edge.io:8080/swift/v1/AUTH_e02c11e4e2c24efc98022353c88ab506/zuul_opendev_logs_225/680178/5/check/tox-py27/22553d1/	21:51
clarkb	fungi: that was generated by rechecking https://review.opendev.org/#/c/680178/5 which used the base-test update to have OE swift hosted logs	21:51
clarkb	if that looks good to you (it does to me via ipv4) then https://review.opendev.org/751905 should be safe to land	21:52
fungi	yeah, i have no trouble accessing that over ipv6	21:52
fungi	approving	21:52
*** slaweq_ has quit IRC		21:53
*** slaweq has joined #opendev		21:54
clarkb	unrelated: I'm about to send out the meeting agenda, Get your items in now :)	21:54
openstackgerrit	Merged opendev/base-jobs master: Use openedge swift for logs on all jobs https://review.opendev.org/751905	21:58
*** slaweq has quit IRC		22:05
*** slaweq has joined #opendev		22:09
*** slaweq has quit IRC		22:21
ianw	are we good with the zuul-web proxy bits?	22:27
clarkb	ianw: ya I think its working fine	22:27
clarkb	or at least it hasn't regressed. I haven't tried to characterize the cache hit rate or anything like that	22:27
ianw	[2020-09-14 22:29:15.103] "GET /api/tenant/openstack/status/change/706153,10 HTTP/1.1" 200 2951 "cache miss: attempting entity save" "https://review.opendev.org/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"	22:29
ianw	hrm	22:30
ianw	[2020-09-14 22:29:14.308] "GET /api/status HTTP/1.1" 200 94041 "-" "https://zuul.openstack.org/status" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"	22:30
ianw	i feel like the first we didn't expect cached, and the second we did	22:31
clarkb	I think we expected both? it will cache the urls we specify and any below	22:31
clarkb	so maybe having the changes below that is unexpected but not necessarily wrong aiui	22:31
clarkb	and ya I would've expected the second to be cached or missed	22:31
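To make the comparison of those two log lines concrete, here is a minimal sketch that pulls the cache-status field out of lines in the quoted format pasted above and labels each request. The regular expression and the function name are assumptions based only on the two pasted lines, not on the actual LogFormat in system-config; treating "-" as "not considered for caching" follows the reading fungi and ianw gave earlier in the log.

```python
import re

# Assumed layout, inferred from the pasted lines:
# [timestamp] "request" status size "cache status" "referer" "user agent"
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<cache>[^"]*)" "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify(line):
    """Return (path, verdict) for a log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    cache = m.group("cache")
    if cache.startswith("cache hit"):
        verdict = "hit"
    elif cache.startswith("cache miss"):
        verdict = "miss"
    elif cache == "-":
        verdict = "not considered"
    else:
        verdict = "other"
    parts = m.group("request").split()
    path = parts[1] if len(parts) > 1 else m.group("request")
    return path, verdict


# The two lines ianw pasted, with the user agents shortened.
change_line = (
    '[2020-09-14 22:29:15.103] "GET /api/tenant/openstack/status/change/706153,10 '
    'HTTP/1.1" 200 2951 "cache miss: attempting entity save" '
    '"https://review.opendev.org/" "Mozilla/5.0"'
)
status_line = (
    '[2020-09-14 22:29:14.308] "GET /api/status HTTP/1.1" 200 94041 "-" '
    '"https://zuul.openstack.org/status" "Mozilla/5.0"'
)

if __name__ == "__main__":
    for line in (change_line, status_line):
        print(classify(line))
```

Run over a whole access log, a tally of the "not considered" paths would show which locations are missing from the cache configuration.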
ianw	i think i've copied it wrong in the openstack.org config	22:32
*** tosky has quit IRC		22:33
clarkb	ianw: two other things came up that may interest you. The first is systemd is broken for ansible on f31. https://review.opendev.org/#/c/751872/ attempts to work around that and has links to bugs. The other is our tumbleweed image was built on nb04 and we haven't had a successful nb01 or 02 build which is preventing us from deleting nb04. https://review.opendev.org/#/c/751880/ will fix that I think	22:34
openstackgerrit	Ian Wienand proposed opendev/system-config master: zuul-web: fix zuul.openstack.org location match https://review.opendev.org/751917	22:35
ianw	yeah i saw that on f31. given f33 comes in october, maybe we just get rid of it; we have f32 now	22:35
ianw	although i think there was something to do with swap images i saw come by for that ...	22:36
clarkb	the swap change in ozj merged iirc	22:36
clarkb	basically use dd instead of fallocate because ext4 in new kernels breaks swapon on fallocate	22:36
ianw	yeah, that one	22:37
clarkb	we changed it universally too since the issue is expected to hit everywhere soon enough	22:37
ianw	i'll go through the dib queue and i guess we want a release	22:41
clarkb	ya its also possible there is a better way to handle that in the zypper context	22:43
clarkb	but we already do that workaround for xz and busybox-xz so went with it	22:43
ianw	fungi / clarkb: maybe one more eye on https://review.opendev.org/#/c/747810/ for copying keys to apt dir would be nice	22:43
clarkb	I can review that one in a few. Getting the infra meeting agenda out now	22:44
clarkb	ianw: prometheanfire fungi I thought that doesn't work for xenial and I forget the corresponding debian release	22:47
clarkb	I guess the issue is that it never worked in the first place? so if we fix it we can just fix it for newer distros?	22:47
fungi	that worked for using binary pgp keyring files, just not ascii-armored keys	22:49
clarkb	gotcha	22:49
clarkb	so this will work for older releases too with the proper input data	22:49
ianw	yeah, that's what i thought. we could possibly expand the release note to be clearer on that i guess if you want	22:50
clarkb	I think chances that anyone was using this are slim since prometheanfire found it didn't work at all due to gpg being missing	22:50
clarkb	so should be an improvement going forward. Probably fine as is	22:51
*** tkajinam has joined #opendev		22:52
clarkb	ianw: hrm I've just noticed that https://zuul.opendev.org/t/openstack/build/80798144a8b749cd846356c561d4641f failed on the suse fix	22:52
clarkb	I wonder if we should figure that out too	22:53
ianw	https://zuul.opendev.org/t/openstack/build/80798144a8b749cd846356c561d4641f/log/nodepool/builds/test-image-0000000001.log#3219	22:53
ianw	e2fsprogs-1.45.6-1.19.x86_64 requires info, but this requirement cannot be provided	22:53
clarkb	its the same basic problem with busybox-gzip I think	22:54
clarkb	I wonder if we can do an install without busybox	22:54
clarkb	since it seems to be problematic here	22:55
prometheanfire	ohhai	22:55
clarkb	busybox, busybox-gzip, and busybox-static are the 3 busybox things we install in that log	22:55
clarkb	maybe if we do gzip instead of busybox-gzip (like with xz and grep) that will be sufficient	22:56
johnsom	FYI we seem to be seeing that strange CDN/cache issue again. The recently released oslo.log 4.4.0 is returning not found on some jobs.	22:56
prometheanfire	ah, cool, +2+W	22:56
clarkb	patterns-openSUSE-base is likely what pulls in the busybox stuff	22:56
ianw	johnsom: hrm, do you have a link? is ymq involved again?	23:00
johnsom	ianw https://57cbdeeafe6bab618f2f-00780db440ef90d2fe18db9118d58aa1.ssl.cf1.rackcdn.com/751918/1/check/openstack-tox-pep8/f82c8b5/job-output.txt	23:01
openstackgerrit	Clark Boylan proposed openstack/diskimage-builder master: Install gzip instead of busybox-gzip on suse https://review.opendev.org/751919	23:01
johnsom	Some jobs pass, some aren't	23:01
clarkb	ianw: ^ that should test it at least	23:01
clarkb	ianw: looks like ovh gra1	23:02
ianw	johnsom: ovh-gra1	23:02
johnsom	yep	23:02
johnsom	ovh-bhs1 are passing and finding it fine	23:03
ianw	https://pypi.org/simple/oslo-log/ will be the thing to look at	23:06
clarkb	johnsom: they are on different continents :)	23:07
ianw	< x-served-by: cache-bwi5141-BWI, cache-cdg20739-CDG	23:07
ianw	< x-cache: HIT, HIT	23:07
ianw	< x-cache-hits: 2, 1	23:07
ianw	and that seems to show it	23:07
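The three response headers pasted above can be lined up per cache node with a small helper like the one below. This is only a sketch: the function name is hypothetical, and it assumes the comma-separated values in x-served-by, x-cache and x-cache-hits correspond position by position, which is how they appear in the paste.

```python
def cache_path(headers):
    """Pair each cache node with its HIT/MISS result and hit count.

    Assumes the three headers carry comma-separated values in the same order.
    """
    nodes = [n.strip() for n in headers["x-served-by"].split(",")]
    results = [r.strip() for r in headers["x-cache"].split(",")]
    hits = [int(h) for h in headers["x-cache-hits"].split(",")]
    return list(zip(nodes, results, hits))


# Header values as pasted in the channel.
pasted = {
    "x-served-by": "cache-bwi5141-BWI, cache-cdg20739-CDG",
    "x-cache": "HIT, HIT",
    "x-cache-hits": "2, 1",
}

if __name__ == "__main__":
    for node, result, count in cache_path(pasted):
        print(node, result, count)
```

Capturing this alongside the index body for a failing request is the kind of evidence discussed just below for an upstream report.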
clarkb	http://mirror.gra1.ovh.opendev.org/pypi/simple/oslo-log/ it is there now on the gra1 proxy	23:07
johnsom	I guess a number of the oslo libs are triggering it. I have only seen oslo.log but others are reporting other modules	23:08
clarkb	unfortunately we end up serving whatever pypi gives us and that has a short TTL	23:08
clarkb	often by the time we notice things have rolled over and are happy	23:08
johnsom	inap-mtl01 is also good	23:09
clarkb	https://github.com/pypa/warehouse/issues seems to be where they want pypi.org feedback and issues	23:10
clarkb	I wonder how terrible it would be to file an issue with a captured index file	23:11
ianw	if it had x- headers that would probably be good	23:12
clarkb	ya, now the trouble is catching one :/	23:12
ianw	i got pretty far with it last time, but fastly had status issue up for slow purging or something, so it was put down to that	23:12
clarkb	I think the key thing from the job logs is that it sees 4.3.0 as a valid version which implies this isn't a python version thing since 4.4.0 and 4.3.0 both have the same python version requirements in the source index html. It also implies we got an index.html and not an empty response	23:14
ianw	http://kafka.dcpython.org/day/pypa-dev/2020-08-24#23.31.28.ianw	23:14
clarkb	the source html also has a serial on it	23:14
ianw	http://kafka.dcpython.org/day/pypa-dev/2020-08-25#00.11.33.PSFSlack	23:15
clarkb	ianw: perhaps https://status.fastly.com/incidents/x57ghk0zvq58 was the incident this time around	23:16
clarkb	or maybe they never properly purged back then and we hit the bad servers	23:16
clarkb	fwiw I think it is likely that fastly is at fault	23:16
*** mlavalle has quit IRC		23:20
ianw	fungi: you could double check the /api/status cache match for zuul.openstack.org in https://review.opendev.org/#/c/751917/ and i can monitor it when it deploys	23:20
clarkb	ianw: your IRC logs are from about when oslo.log 4.4.0 was released too	23:20
clarkb	ianw: I suppose it could just be fallout from that original incident and fastly/pypi never did a proper rsync	23:20
ianw	... ohhh, i had just assumed oslo.log 4.4.0 had released like an hour ago :)	23:21
ianw	that perhaps makes it more interesting to pypa/fastly ...	23:21
fungi	ianw: lgtm, i guess we got rid of all the conditional matches on whether different cache modules were loaded?	23:22
clarkb	fungi: ya that was the prior change https://review.opendev.org/751645	23:22
fungi	thanks, today has been a bit hectic	23:23
fungi	i saw the title of that change but hadn't taken time to look through it	23:23
ianw	yeah, it took me quite a while to realise that the cache_mem modules was no more ...	23:23
fungi	aha! what we were seeing makes MUCH more sense now	23:23
fungi	thanks for figuring that out	23:24
ianw	i think the <ifdef> stuff is a bit of an anti-pattern; it's better for apache to just stop if you don't have the modules you want i think	23:24
fungi	it made some sense back when this was in presumed portable puppet modules, but no longer	23:25
ianw	i did also read that mod_rewrite with [p] is not considered as good as proxypass	23:27
fungi	did we change how puppet apply's stdout gets logged? is it no longer going to syslog?	23:28
clarkb	ianw: ya my related change converts it all to proxypass	23:28
fungi	i'm trying and failing to work out why we've stopped deploying new storyboard commits on storyboard.o.o	23:28
clarkb	ianw: but I think I may rewrite that one to not do the extra zuul-web servers if we don't need them with just the caching and the zuul-web bugfix	23:28
fungi	we landed a new storyboard commit at 20:16 utc and it seems to have gotten checked out on the server at /opt/storyboard but not pip installed; according to `pbr freeze` there, we're several commits behind	23:29
fungi	but i can't figure out where puppet's attempt to log that would be. we used to get puppet-user entries logged in /var/log/syslog	23:30
clarkb	fungi: I think it ends up in the ansible logs now	23:31
clarkb	on bridge	23:31
clarkb	I don't remember why it changed though	23:31
fungi	i didn't think the puppet output ended up there, i guess that's the behavior change	23:31
clarkb	I think it was so that we could have the logs show up in zuul	23:31
fungi	i grepped /var/log/ansible/remote_puppet_else.yaml.log for "pip" but didn't find anything	23:31
fungi	that log is huge	23:31
clarkb	we switched from dumping into syslog to stdout	23:32
clarkb	and ansible grabs the stdout	23:32
fungi	i guess i need to figure out what the name of the task would have been to run puppet on there	23:32
ianw	there was some wip to split out the puppet jobs from one big puppet_else into more separate things i think?	23:32
clarkb	fungi: it may be in an older log file too depending on when the triggering change landed	23:32
clarkb	ianw: that hasn't been done for storyboard yet I don't think	23:33
fungi	oh! we rotate these very aggressively	23:33
fungi	yep	23:33
clarkb	fwiw I could go back to using syslog since we aren't really using zuul for those logs	23:36
fungi	there's a massive gap between the logfiles too	23:37
fungi	the newest rotated logfile ends at 14:16:14 but the first activity in the current log is 22:49:11	23:38
fungi	where did the other 8.5 hours go?	23:38
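A quick check of the gap fungi mentions, assuming both timestamps fall on the same day (the log does not state the dates explicitly):

```python
from datetime import datetime

# Timestamps quoted in the conversation; same-day is an assumption.
last_rotated = datetime.strptime("14:16:14", "%H:%M:%S")
first_current = datetime.strptime("22:49:11", "%H:%M:%S")

gap = first_current - last_rotated
gap_hours = gap.total_seconds() / 3600
print(gap, round(gap_hours, 2))
```

That comes to 8 hours 32 minutes 57 seconds, which matches the "8.5 hours" figure.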
fungi	unfortunately our princess was in that castle, mario	23:39
clarkb	fungi: it's rotated by the jobs when they complete	23:39
clarkb	I wonder if we're timing out and breaking that setup	23:39
fungi	or maybe this was not the periodic job	23:39
fungi	i'll start from the zuul end and work backwards	23:39
clarkb	job runs and logs to service-foo.yaml.log. At the end of the job we copy that to service-foo.yaml.log.timestamp. Next job runs and does it again	23:40
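The rotation scheme clarkb describes can be sketched roughly as follows. This is an illustration only, not the actual system-config implementation: `rotate_log` is a hypothetical helper and the timestamp format is an assumption.

```python
import shutil
import tempfile
import time
from pathlib import Path


def rotate_log(log_path: Path) -> Path:
    """Copy a finished job's log to a timestamped sibling file."""
    # Timestamp format is an assumption; the real suffix may differ.
    stamp = time.strftime("%Y-%m-%dT%H%M%S", time.gmtime())
    rotated = log_path.with_name(f"{log_path.name}.{stamp}")
    shutil.copy2(log_path, rotated)
    return rotated


# Usage example against a throwaway directory.
workdir = Path(tempfile.mkdtemp())
log = workdir / "service-foo.yaml.log"
log.write_text("PLAY RECAP\n")
rotated = rotate_log(log)
print(rotated.name)
```

Because the copy only happens at the end of a run, a job that times out before reaching that step leaves no rotated file behind, which would be consistent with the gap between logfiles noted above.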
fungi	in this case there's no separate service log because it's handled by the puppet catch-all	23:41
clarkb	a	23:42
clarkb	*ya	23:42
clarkb	but it's the same system (it's per playbook)	23:42
fungi	yeah, the hourly runs for puppet are timing out	23:43
fungi	i'll have to look at this with fresh eyes tomorrow, i'm quickly getting fuzzy here	23:44
ianw	104.130.246.111	23:46
ianw	elasticsearch06.openstack.org.	23:46
ianw	i think that's the puppet holder-upper-er	23:46
ianw	it's dead, jim	23:47
clarkb	I wonder if we should proactively reboot those	23:48
clarkb	reboot first one, wait for status to settle, reboot second, wait for settling, etc	23:48
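A minimal sketch of the rolling reboot clarkb outlines. The `reboot` and `is_settled` callables are hypothetical placeholders supplied by the caller (for example, an SSH reboot and a cluster health check); nothing here reflects tooling that actually exists in system-config.

```python
import time
from typing import Callable, Iterable


def rolling_reboot(
    hosts: Iterable[str],
    reboot: Callable[[str], None],
    is_settled: Callable[[], bool],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> None:
    """Reboot hosts one at a time, waiting for the cluster to settle between each."""
    for host in hosts:
        reboot(host)
        for _ in range(max_polls):
            if is_settled():
                break
            time.sleep(poll_seconds)
        else:
            # Stop rather than reboot the next node into an unhealthy cluster.
            raise TimeoutError(f"cluster did not settle after rebooting {host}")
```

Raising on timeout instead of continuing keeps a second node from going down while the first is still recovering.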
ianw	have we done them all yet? something has clearly happened to them :/	23:48
ianw	#status log rebooted elasticsearch06.openstack.org, which was hung	23:48
openstackstatus	ianw: finished logging	23:48
clarkb	I've done I think two of them	23:49
*** DSpider has quit IRC		23:52
ianw	i've killed all the stuck processes	23:52
clarkb	we might also consider splitting them out of puppet else	23:52
clarkb	then if they fail the impact is lessened	23:52
*** Goneri has quit IRC		23:53
ianw	yeah, i think we should probably continue that work to split up all of puppet-else	23:56
clarkb	the mechanics of it are pretty straightforward iirc. We create a new .pp file for that service/hosts. We then add a job to run the puppet for that manifest and basically run it when else runs	23:57
clarkb	there are a couple examples we can look at to compare	23:57
clarkb	this would make a good meeting agenda item. I'll ninja add it tomorrow	23:57
openstackgerrit	Merged opendev/system-config master: zuul-web: fix zuul.openstack.org location match https://review.opendev.org/751917	23:59
